<center><img src="https://i.imgsafe.org/0c/0c4ef21a6f.png"></center>
There are probably at least three kinds of bots, and I will cover two. But before I get started, I need to say: never leave your password in a program. The other thing to know is that some sites, including the <a href="http://tf.nist.gov/tf-cgi/servers.cgi">NIST time server</a>, can treat frequent bot traffic as a DDoS attack, and companies can ban any accounts you hold with them or, worse, prosecute you if you disrupt their service. Also consider that if a company changes the structure of its webpage, you may have to rebuild your scripts. So if you ever get anywhere, form strategic partnerships so you can change as they change with minimal downtime. One thing this post is not: a guide to learning someone else's prebuilt bot software and configuring their bots. It is meant to give some basic, fundamental ideas to help you build your own bot from scratch.
The first kind that I shall cover involves the internet. The code produced isn't really a bot per se, just scripts that read data from websites and parse out the desired information. The second is more visual: it automates the mouse and keys, and of course the goal here is games. The third type is the one more of us may properly fear, and that is automation with or without AI.
**********************
As we noted before, these scripts are not bots but more of an introduction to having code read webpages and extract the data. Once you have the information you desire, you can either store it in a file or a database for a later decision, or make a more immediate decision. For example, if you wanted to test out theories about moving averages, Bollinger Bands, or whatever about stocks, you could read a page that stores these statistics and perform a mathematical analysis, and if they pass whatever test you want them to, you could have the script log into your stock broker on its own, but with nothing more than toy/play money. If you want to try AI by reading news headlines (scanning news sites for new links) and making split-second decisions (by having code open the new article and analyze it) and buying, don't jump into using real money. You will find when developing the key words (or dictionary) to look for that there may be negations or other bad news too, so you have to build a bad-news dictionary as well. I was trying to capture the prices of electronics from online e-tailers and link them all through affiliate programs on a social network I was developing. Whatever you want to do, the sky is the limit. One Georgia Tech student wrote a bot that entered contests on Twitter, <a href="https://fossbytes.com/how-a-coder-built-a-twitter-bot-that-entered-and-won-1000-online-contests/">and was winning an average of 4 contests a day for 9 months</a>. How many websites do nothing more than get you the best prices? It all starts with being able to gather data; negotiations and partnerships can come later.
1. The first step is basically to identify the type of data you want and which webpage it comes from.
2. The next step is to use your browser and view the source code. In Chrome this can be done by right-clicking the page and choosing "View page source". For this example we will use the headline of the Drudge Report. And of course the information on many webpages changes (including the Drudge Report), so it's not like we can always just count so many characters in and call it good.
3. We now try to identify how best to get to the information we want with the fewest jumps (strpos in PHP, .indexOf in Java), or at least get there without wasting too much time determining the best route. Instead of trying to count, say, the number of div tags down or something, note that the Drudge Report leaves an important marker in its comments. If we mentally chop off (substr() in PHP, .substring in Java) everything before that point (as that is what we'll literally do in code), we can look at the page from that point forward and keep chopping closer and closer to the data. Once we reach the data, we then identify where it ends.
<center><img src="https://i.imgsafe.org/09/0927579a87.png">
<i>you may have to click on the image to see it better.</i></center>
4. We'll use PHP. For this you would need to install the Apache web server and PHP. Alternatively, you could use Java, Visual Basic, or whatever. We'll basically implement code that stores the page in a string (in Java you should build it with a StringBuilder or StringBuffer instead of concatenating Strings, because Java strings are immutable and every concatenation creates a new object in memory). Over HTTP[S], data is usually passed by one of two methods, GET or POST. Sometimes it doesn't matter which one; sometimes it is vitally important. Just as users can edit a GET request, users can edit a POST request too. Here we will be using GET.
<i>
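<?php
$url="www.drudgereport.com";
$path="/index.html";
$fp = fsockopen($url, 80);
// the header block must end with a blank line, i.e. \r\n\r\n
fputs($fp, "GET " . $path . " HTTP/1.1\r\nHost: drudgereport.com\r\nConnection: Close\r\n\r\n");
$pagetext=""; // start empty so the append below has something to append to
while(!feof($fp)) {$pagetext=$pagetext . fgets($fp);}
fclose($fp);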
...
</i>
5. The above code stores the entire response in the variable $pagetext. So now we have to dig out the data we want. In the Drudge example, we chose a marker of '! MAIN HEADLINE' and used strpos to identify how far into the string it is. I also stored a copy in $gets, and I like to have another copy too, although here $copy is unnecessary. Anyway, after finding the location of the marker, I chop everything before it off the string variable with substr. I keep chopping to get closer and closer to the data I want, here using 'HREF' (strpos is case-sensitive) and later '>'.
<i>
...
$gets = $pagetext;
$copy=$gets;
$start=strpos($gets,'! MAIN HEADLINE');
$gets=substr($gets,$start);
$start=strpos($gets,'HREF');
$gets=substr($gets,$start);
$start=strpos($gets,'>');
$gets=substr($gets,$start);
...
</i>
6. This isn't really a step, but whether a Drudge headline is red or not can make or break the code without this correction. All this next portion does is correct for that.
<i>
...
if ($start<4) {
$gets=substr($gets,$start);
$start=strpos($gets,'>');
$gets=substr($gets,$start);
}
...
</i>
7. At this point we have pretty much closed in on our data. Now we should find a marker that determines where the desired data ends, and keep only the part we want.
<i>
...
$start=strpos($gets,'<');
$code = substr($gets, 1, strpos($gets,"<")-1);
...
</i>
8. Before making decisions, it is a good idea to output the data and make adjustments as necessary. As seen in the section above, we already made offsets. You could store the output in a file, or just use an echo command. Here we just commented it out.
<i>
...
//echo $gets;
...
</i>
9. The next stage is to determine what you want to do with the data. I chose to put it in an image (cough) using some old captcha code.
<i>
...
class img {
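public $font = 'arial';
// __construct replaces the old same-name constructor, which PHP 8 no longer calls
function __construct($width, $height, $code) {
$font_size = .7*$height;
$image = @imagecreate($width, $height) or die('Cannot initialize new GD image stream');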
$background_color = imagecolorallocate($image, 255, 255, 255);
$text_color = imagecolorallocate($image, 20, 40, 100);
$textbox = imagettfbbox($font_size, 0, $this->font, $code) or die('Error in imagettfbbox function');
$x = ($width - $textbox[4])/2;
$y = ($height - $textbox[5])/2;
imagettftext($image, $font_size, 0, $x, $y, $text_color, $this->font , $code) or die('Error in imagettftext function');
header('Content-Type: image/jpeg');
imagejpeg($image);
imagedestroy($image);
}}
$width = isset($_GET['width']) ? $_GET['width'] : 600;
$height = isset($_GET['height']) ? $_GET['height'] : 50;
$captcha = new img($width,$height,$code);
?>
</i>
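Since steps 3 through 7 mention the Java equivalents (.indexOf and .substring), here is a rough sketch of the same marker-chopping idea in Java. The class name `MarkerScrape`, the helper `extractAfterMarkers`, and the sample page fragment are all made up for illustration; the real Drudge Report markup may differ. Unlike the PHP above, this version chops the marker itself off too, so no offset correction is needed at the end.

```java
final class MarkerScrape {
    /**
     * Chops the page at each marker in turn, then returns the text up to
     * endMarker. Returns null if any marker is missing, which usually means
     * the page layout changed.
     */
    static String extractAfterMarkers(String page, String[] markers, String endMarker) {
        String rest = page;
        for (String marker : markers) {
            int pos = rest.indexOf(marker);      // like strpos in PHP
            if (pos < 0) return null;
            // like substr in PHP, but we also drop the marker itself
            rest = rest.substring(pos + marker.length());
        }
        int end = rest.indexOf(endMarker);
        if (end < 0) return null;
        return rest.substring(0, end).trim();
    }

    public static void main(String[] args) {
        // Hypothetical page fragment, invented for this example.
        String page = "<html><! MAIN HEADLINE><A HREF=\"http://example.com/story\">"
                + "BIG NEWS TODAY</A></html>";
        String[] markers = {"! MAIN HEADLINE", "HREF", ">"};
        System.out.println(extractAfterMarkers(page, markers, "<")); // prints BIG NEWS TODAY
    }
}
```

Returning null when a marker is missing is one possible design choice; it gives the caller a cheap way to notice that the site changed its page.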
B. That was pretty easy. But that example was over HTTP. More and more websites are using HTTPS, which requires a few changes. Most noticeably, the port changes from 80 to 443, and fsockopen wants an ssl:// prefix. But it is pretty much the same idea.
We'll get the data we want from https://coinmarketcap.com/currencies/views/all/ , and we will be looking for the price of bitcoin and again storing it in an image. Here is a screenshot of just a little bit of the source code.
<center><img src="https://i.imgsafe.org/09/09bbd01f7c.png"></center>
i. You'll notice the "ssl" in the $url, and the change in port number, on lines 1 & 3. And of course we have to change our GET to reflect that we aren't using the Drudge Report anymore.
<i>
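<?php
$url="ssl://coinmarketcap.com";
$path="/currencies/views/all/";
$fp = fsockopen($url, 443, $errno, $errstr, 30);
if(!$fp){ exit("$errstr ($errno)\n"); } // stop here; there is no socket to read from
// the header block must end with a blank line, i.e. \r\n\r\n
fputs($fp, "GET " . $path . " HTTP/1.1\r\nHost: coinmarketcap.com\r\nConnection: Close\r\n\r\n");
$pagetext=""; // start empty so the append below has something to append to
while(!feof($fp)) {$pagetext=$pagetext . fgets($fp);}
fclose($fp);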
...
</i>
ii. We whittled it down and made the adjustments. We left the bitcoin marker in, but commented out, in case a user wanted to try a different currency.
<i>
...
$gets = $pagetext;
//$start=strpos($gets,'bitcoin');
//$gets=substr($gets,$start);
$start=strpos($gets,'ce" data-usd=');
$gets=substr($gets,$start);
$start=strpos($gets,'=');
$gets=substr($gets,$start);
$code = 'bitcoin(usd):' . substr($gets, 2, strpos($gets," ")-3);
...
</i>
iii. Based upon the data, we make a decision and act on it. In this case, display an image.
<i>
...
class img {
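public $font = 'arial';
// __construct replaces the old same-name constructor, which PHP 8 no longer calls
function __construct($width, $height, $code) {
$font_size = .7*$height;
$image = @imagecreate($width, $height) or die('Cannot initialize new GD image stream');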
$background_color = imagecolorallocate($image, 255, 255, 255);
$text_color = imagecolorallocate($image, 20, 40, 100);
$textbox = imagettfbbox($font_size, 0, $this->font, $code) or die('Error in imagettfbbox function');
$x = ($width - $textbox[4])/2;
$y = ($height - $textbox[5])/2;
imagettftext($image, $font_size, 0, $x, $y, $text_color, $this->font , $code) or die('Error in imagettftext function');
header('Content-Type: image/jpeg');
imagejpeg($image);
imagedestroy($image);
}}
$width = isset($_GET['width']) ? $_GET['width'] : 600;
$height = isset($_GET['height']) ? $_GET['height'] : 50;
$captcha = new img($width,$height,$code);
?>
</i>
C. So anyway, I get to the point late: how can I work this into Steem? Some things are easy to do, like looking up the users who voted on a particular entry. So we try to identify the source (we'll use https://steemd.com/news/@firstamendment/bunndy-trial-results-not-guilty-on-34-charges-or-no-verdict-great-news-thank-god ), see that it is HTTPS, and look at the code.
<center><img src="https://i.imgsafe.org/09/09d440f1c7.png"></center>
There are multiple items that we want to extract. Usually (but not always), when a website reports a list of information, there is some type of similarity between the data entries. The repeated text that separates them is called a delimiter. In PHP it is useful to use the explode function; in Java, String.split does the same job (a StringTokenizer with .hasMoreTokens() and .nextToken() also works, but it treats each character of its delimiter string as a separate delimiter, so it only suits single-character delimiters). In PHP, explode basically turns your data (although it is still messy) into an array, with the delimiters removed. The foreach construct allows us to evaluate each member of the array. You'll see. One thing to note is that here the first element in the array is an empty string, since the text starts with the delimiter. So depending on the data you receive, you may want to check whether you should include the first entry. Here we accepted it because we really don't care, and it kept things simple.
i.
<i>
<?php
$url="ssl://steemd.com";
$path="/news/@firstamendment/bunndy-trial-results-not-guilty-on-34-charges-or-no-verdict-great-news-thank-god";
$fp = fsockopen($url, 443, $errno, $errstr, 30);
if(!$fp){ exit("$errstr ($errno)\n"); } // stop here; there is no socket to read from
// the header block must end with a blank line, i.e. \r\n\r\n
fputs($fp, "GET " . $path . " HTTP/1.1\r\nHost: steemd.com\r\nConnection: Close\r\n\r\n");
$pagetext=""; // start empty so the append below has something to append to
while(!feof($fp)) {$pagetext=$pagetext . fgets($fp);}
fclose($fp);
...
</i>
ii.
<i>
$gets = $pagetext;
$start = strpos($gets, '"account" href="/@');
$gets = substr($gets,$start);
$end= strpos($gets,'</a></div>');
$gets = substr($gets,0,$end);
...
</i>
iii. The explode, foreach, further refinement, and simply printing to a webpage.
<i>
...
$data=explode('"account" href="/@', $gets);
foreach($data as $a) {
//echo $a;
//echo "<br>";
$start2=strpos($a,'"');
$i=substr($a,0, $start2);
echo $i;
echo "<br>";
}
?>
</i>
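For comparison, the explode-and-foreach idea from part C might look roughly like this in Java, using String.split. The class `VoterList`, the method `parseAccounts`, and the sample HTML are hypothetical; the only thing taken from the PHP above is the delimiter string. Pattern.quote is there because split takes a regular expression, and we want the delimiter matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class VoterList {
    /** Splits the page on the delimiter and keeps the name at the start of each chunk. */
    static List<String> parseAccounts(String html) {
        String delimiter = "\"account\" href=\"/@";
        List<String> names = new ArrayList<>();
        String[] chunks = html.split(Pattern.quote(delimiter));
        // chunks[0] is whatever came before the first delimiter, so skip it
        for (int i = 1; i < chunks.length; i++) {
            int quote = chunks[i].indexOf('"');   // the name ends at the closing quote
            if (quote > 0) {
                names.add(chunks[i].substring(0, quote));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample in the same shape as the voter links.
        String html = "<a class=\"account\" href=\"/@alice\">alice</a>"
                + "<a class=\"account\" href=\"/@bob\">bob</a>";
        System.out.println(parseAccounts(html)); // prints [alice, bob]
    }
}
```

Skipping the first chunk here is the Java counterpart of deciding whether to keep the first array entry in PHP.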
Well, that's nice. But what if we wanted to do automatic upvotes? I didn't get to it this time; it is done through an HTTPS POST, and it was difficult for me to locate it in the browser. Basically what I had to do in Chrome was go to Developer Tools, click on Network, and find a page worth voting on. Reload the page and wait for everything to finish loading, except for that one script that keeps loading every 20 seconds. When you are done, upvote and you'll see a new entry. Click on that entry, which should read something like "/api/v1/record_event", and look at the source of the headers. You'll see what is being sent.
To see someone doing this on another page, watch:
https://www.youtube.com/watch?v=ue6oEH_NeNY
<i>video by Sagar S</i>
Your browser sent:
POST /api/v1/record_event HTTP/1.1
Host: steemit.com
Connection: keep-alive
Content-Length: [length]
Pragma: no-cache
Cache-Control: no-cache
Accept: application/json
Origin: https://steemit.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
Content-Type: text/plain;charset=UTF-8
DNT: 1
Referer: https://steemit.com/colorchallenge/@[user]/[path]
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
Cookie: _ga=[censored]; _gid=[censored]; stm1=[censored]; stm1.sig=[censored]
{"csrf":"[censored]","type":"Vote","value":"@[user]/[path]"}
I don't think I have ever seen what is called a payload before (or maybe I don't use POST very often), as my GET requests above simply end after the headers. In HTTP the headers end with an empty line (\r\n\r\n), and the payload follows immediately after it, with Content-Length giving its size in bytes. So from your script you would load the page you wanted, and instead of sending the "GET yadda yadda yadda", you need to identify what goes into the request and do a "POST" like the above to the record_event path. Your csrf id is found within the source of the original page and can be parsed out. Your _ga and _gid are passed through cookies, and that would probably allow you to session hijack from your browser to your script. Session hijacking is actually a hacking method. The stm1 and stm1.sig cookies are sent from the server and keep being renewed every 20 seconds. If I understand this correctly, you have 20 seconds to load the page and submit a vote.
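To make the framing of a POST concrete, here is a minimal sketch in Java that only builds the request text; it does not send anything. The class `RawPost`, the method `buildPost`, the host example.com, and the shortened JSON body are placeholders for illustration. A real vote request would also need the cookies and csrf value from the capture above, which are left out here.

```java
import java.nio.charset.StandardCharsets;

final class RawPost {
    /** Builds a bare HTTP/1.1 POST: headers, one empty line, then the payload. */
    static String buildPost(String host, String path, String body) {
        // Content-Length counts bytes, not characters
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: text/plain;charset=UTF-8\r\n"
                + "Content-Length: " + length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n"   // the empty line that ends the headers
                + body;    // payload follows directly, no trailing newline required
    }

    public static void main(String[] args) {
        // Placeholder host and body, not the real Steemit values.
        String request = buildPost("example.com", "/api/v1/record_event", "{\"type\":\"Vote\"}");
        System.out.println(request);
    }
}
```

The same string could be written to a socket the way the PHP examples use fputs; the only differences from the GET requests are the method name, the Content-Length header, and the body after the empty line.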
One thing in closing is that both Java and PHP can run other programs. In PHP it is through the exec([command]) function; in Java it is Runtime.getRuntime().exec([command]).
I think this is where I'll leave off for reading webpages.
****************************
So now we'll jump into Java robots. Of course you'll need to install Java. It isn't the type of robot you think it is. From what I see, it just gets screen information and does mouse clicks. It is convenient in visual games, or maybe for building programs to assist the technophobic. For instance, instead of getting a phone call asking how to check their email, you could make a manifest file, make the Java program executable, and pin it to their taskbar. Just don't store passwords in it.
Because I don't regularly work with Java anymore, this is basically what I do to compile Java code.
1. I go to the binary directory of Java.
2. When I have code to compile I use the javac command. I edited my username out of the screenshot.
<center>https://i.imgsafe.org/0a/0a75cb8b9e.png</center>
3. When I want to run code, I use the java command; notice the -cp flag.
<center>https://i.imgsafe.org/0a/0a815a9246.png</center>
You can automate web stuff with it too, but just about everything web-based would be better done at the data level. But ideals and reality are quite different. While some file types are fairly easy to learn, like bmp and gif, others like jpg, flash, videos, etc. are not so much. If there was good documentation on how to take a series of images and make a movie file out of it without using 3rd-party software, the Robot class would make it easy to make a movie of desktop events.
I was thinking about posting some code of a bot I made for the game Clicker Heroes. You can try Clicker Heroes on Steam for free. The code of the bot is clearly too long, too messy, and too complex for most readers to post here: <a href="http://neophytesoftware.com/steemitfirstamendment/clickerh.txt">messy</a>. Notice it is a txt file, not a java file; you may want to change that. It automatically upgrades heroes, handles the mercenaries, clicks the bonuses, and auto-advances to the next boss as the time to kill lesser monsters decreases.
So basically, when you start your program you need to import your basic libraries. I don't code with any IDEs.
<i>
import java.awt.*;
import java.awt.event.*;
import java.awt.image.*;
import java.io.*;
import java.nio.*;
import javax.swing.*;
import java.util.*;
public class test {
// new Robot() can throw AWTException, so main has to declare it
public static void main(String args[]) throws Exception {
...
</i>
When developing a visual bot, I like to keep a JFrame open at all times whose interior color matches the pixel the mouse is on, with the console constantly printing out the coordinates and the color. If there is a color anomaly (false positive, false negative), I can try to isolate the problem and code in the fix later. I'll delay the printing for now.
...
JFrame jf= new JFrame("test");
jf.setLocation(50,50);
jf.setSize(150,150);
jf.setVisible(true);
...
To begin with the Java Robot class, you create a Robot object. When working with visuals, it is a good idea to take a screenshot, or at least initialize the variables used to something besides null. We'll also include some other variables: x1 and y1 will map to a particular point on the window, x and y are the size of the screen, and a & b will be offsets.
<i>
...
Robot r=new Robot();
Dimension d= Toolkit.getDefaultToolkit().getScreenSize();
BufferedImage img= r.createScreenCapture(new Rectangle(d));
int x1=0;
int y1=0;
int a=0;
int b=0;
double x=(int)d.getWidth();
double y=(int)d.getHeight();
...
</i>
And you are going to be looping a lot if you are gaming, though the other tasks you may want to do are pretty limited. Here we'll loop until we break the program. We'll also set a flag as to whether what we want is active or not. You may need a few flags depending on how complex the game is.
<i>
...
while(true) {
isactive=false;
...
</i>
And we might as well print our mouse data at this point.
<i>
....
img= r.createScreenCapture(new Rectangle(d));
int x2=(int)MouseInfo.getPointerInfo().getLocation().getX();
int y2=(int)MouseInfo.getPointerInfo().getLocation().getY();
byte[] z= ByteBuffer.allocate(4).putInt(img.getRGB((int)x2,(int)y2)).array();
int r1=(256+(int)z[1])%256;
int g=(256+(int)z[2])%256;
int b1=(256+(int)z[3])%256;
System.out.println("("+x2+":"+y2+"):("+x1+","+y1+"):("+(x2-x1)+","+(y2-y1)+"):"+" R:"+r1+" G:"+g+" B:"+b1);
jf.getContentPane().setBackground(new Color(r1,g,b1));
...
</i>
Now this game, unlike say Chrome, has a pink icon at the top of the window. Chrome seems to have its own marker on the other side of the window, the user icon, so just follow the default blue to the left until the window ends if you want to use Chrome. From previous runs using that little window that tracks the mouse and the colors, we've identified a set of numbers belonging to that pink color and chose it as a marker. On the first pass through the loop our x1 and y1 are 0, so we can't take a shortcut (we don't assume the marker will be at 0,0, though maybe we could). The code first tries a shortcut: if the marker was previously found (x1*y1 is not equal to 0) and it is in the same spot as in the previous round, it can advance without having to check every pixel. But on the first pass, or when the window has been moved, it has to check every spot until it identifies our marker. Note that a and b are our offsets. Assuming the contents of the game stay in the same area, then no matter where we move our game window (provided it is onscreen) the contents should remain a constant distance from the point (a, b). We add 256 and take the result modulo 256 because Java bytes are signed, so byte values above 127 come out negative. What a pain. You may be wondering why we allocated 4 bytes for the color: it is because the pixel holds your standard RGB and also an alpha channel, which we don't care about.
<center>https://i.imgsafe.org/0c/0c8031826f.png</center>
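As an aside on the signed-byte workaround, the same three color numbers can be pulled straight out of the packed int that getRGB returns by shifting and masking, which avoids the ByteBuffer and the modulo. This is only an alternative sketch, not what the bot code does; the class `PixelColor` and method `rgb` are names made up for the example, and the sample value is the pink marker color with full alpha.

```java
final class PixelColor {
    /** Unpacks a 0xAARRGGBB pixel into {red, green, blue}, each in 0..255. */
    static int[] rgb(int argb) {
        int red = (argb >> 16) & 0xFF;   // masking with 0xFF keeps the value unsigned
        int green = (argb >> 8) & 0xFF;
        int blue = argb & 0xFF;
        return new int[] {red, green, blue};
    }

    public static void main(String[] args) {
        int pink = 0xFFED5DE8;           // alpha FF, then the marker color 237, 93, 232
        int[] c = rgb(pink);
        System.out.println("R:" + c[0] + " G:" + c[1] + " B:" + c[2]); // prints R:237 G:93 B:232
    }
}
```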
<i>
...
if (x1*y1!=0) {
z= ByteBuffer.allocate(4).putInt(img.getRGB(x1,y1)).array();
r1=(256+(int)z[1])%256;
g=(256+(int)z[2])%256;
b1=(256+(int)z[3])%256;
if (r1==237 && g==93 && b1==232) {a=x1; b=y1;isactive=true;}}
if (!isactive) {
for(a=0;a<x;a++) {
for(b=0;b<y;b++) {
z= ByteBuffer.allocate(4).putInt(img.getRGB(a,b)).array();
r1=(256+(int)z[1])%256;
g=(256+(int)z[2])%256;
b1=(256+(int)z[3])%256;
if (r1==237 && g==93 && b1==232 && findonce) {
//a=449 b=145
findonce=false;
x1=a;y1=b;
isactive=true;}}}}
if (!isactive) {hm=false;mm=false;}
...
</i>
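The full-screen search above can also be written as a small helper that stops at the first matching pixel. The following is a simplified sketch under a few assumptions: the names `MarkerFinder` and `findMarker` are invented, it scans row by row rather than column by column (so with several matching pixels it may report a different one than the loop above), and it runs on a tiny in-memory image instead of a real screen capture.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;

final class MarkerFinder {
    /** Returns the first pixel (scanning row by row) with the given color, or null. */
    static Point findMarker(BufferedImage img, int red, int green, int blue) {
        int target = (red << 16) | (green << 8) | blue;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                // mask off the alpha channel before comparing
                if ((img.getRGB(x, y) & 0xFFFFFF) == target) {
                    return new Point(x, y);
                }
            }
        }
        return null; // marker not on screen
    }

    public static void main(String[] args) {
        // Stand-in for a screen capture: a black 20x10 image with one pink pixel.
        BufferedImage img = new BufferedImage(20, 10, BufferedImage.TYPE_INT_RGB);
        img.setRGB(7, 3, 0xED5DE8);
        Point p = findMarker(img, 237, 93, 232);
        System.out.println(p.x + "," + p.y); // prints 7,3
    }
}
```

With a real capture, the returned point would play the role of the offsets a and b, and a null result corresponds to the marker not being found on that pass.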
Most of this gaming bot, with a lot of time delays and state controls, basically goes through the screen looking for colors it identifies as good and clicks on them. And how do you click? It is quite simple. Wax on, wax off. Err, almost that easy: mouse press, sleep, mouse release. This part of the code tries to determine whether we are in advancing mode or stuck on the same level by checking a particular pixel relative to the marker we found. If we are in farm mode, the time to kill a normal monster (t2k) is less than 4 seconds, and more than 40 seconds have passed on the wait2boss timer, then we press the button so we can advance to the boss.
<i>
...
z3= ByteBuffer.allocate(4).putInt(img.getRGB((int)a+1108,(int)b+272)).array();
r3=(256+(int)z3[1])%256;
g3=(256+(int)z3[2])%256;
b3=(256+(int)z3[3])%256;
if (r3==255 && g3==0 && b3==0) {
if (t2k<4000 && (((new Date()).getTime()-wait2boss.getTime())>40000)) {
//System.out.println("time to kill"+(x1+1107)+":"+(y1+272));
fdat2=new Date();
r.mouseMove(x1+1107,y1+272);
r.mousePress(InputEvent.BUTTON1_MASK);
try { Thread.sleep(10); } catch(Exception e) {System.out.println(e.toString());}
r.mouseRelease(InputEvent.BUTTON1_MASK);
}}
</i>
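The press, sleep, release pattern and the timing rule can each be pulled into a small helper. This is a sketch, not the bot's actual structure: `ClickHelper`, `shouldAdvance`, and `click` are hypothetical names, the thresholds are the 4 and 40 seconds from the text, and it uses Robot.delay and InputEvent.BUTTON1_DOWN_MASK in place of Thread.sleep and the older BUTTON1_MASK constant.

```java
import java.awt.Robot;
import java.awt.event.InputEvent;

final class ClickHelper {
    /** True when monsters die in under 4 s and the timer has run for more than 40 s. */
    static boolean shouldAdvance(long timeToKillMs, long timerMs) {
        return timeToKillMs < 4000 && timerMs > 40000;
    }

    /** Moves to (x, y) and does one left click: press, short pause, release. */
    static void click(Robot robot, int x, int y) {
        robot.mouseMove(x, y);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
        robot.delay(10);   // give the game a moment to register the press
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
    }

    public static void main(String[] args) {
        // Only the decision is exercised here; click() needs a real display.
        System.out.println(shouldAdvance(2500, 60000)); // prints true
        System.out.println(shouldAdvance(5000, 60000)); // prints false
    }
}
```

Keeping the decision separate from the click makes it possible to check the timing logic without a game window open.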
Key presses and releases are not that different.
<i>
if (!isupgrading) {r.keyRelease(KeyEvent.VK_CONTROL);r.keyPress(KeyEvent.VK_Z);}
else {r.keyRelease(KeyEvent.VK_Z);r.keyPress(KeyEvent.VK_CONTROL);}
</i>
Now of course this example is probably more complicated than it needed to be. If you wanted to be lazy and not care about security, you could write a Java program that looks down at your taskbar, opens Chrome, loads a webpage of your choice, and logs you in. But other than the logging-in part, automatically opening a webpage could be done through Java's Runtime.getRuntime().exec method.
**********************************
If you really wanted to be evil, you could perhaps write a virus, and have it execute a java robots package at night and install a cryptominer on hundreds of thousands of PCs feeding wallets you control....Until you get arrested and never heard of again.
**********************************
In looking back at my old captcha code, I saw how poor the code I had borrowed was. The background was set to one color, the noise to another set color, and the text to another color. A bot could identify the background color, recognize that the specks are garbage and change them to the background color, and then run the now noise-free image through OCR software.
Similarly I have seen other bad captcha in the last week that would be easy to bypass.
At least 3 of the faucets use this. So at least 40 satoshis every 5 minutes if I wanted to write a script and leave an old computer running. Once the puzzle loads, which gives you a clear picture of it, you click a little scrolling thing and move the puzzle piece into place. So you can identify the original, and take note of where the puzzle piece begins and where the empty spot is. Occasionally it does automatically fail you, and you can immediately redo it, but so long as a bot solves it at random times between 1 and 4 seconds and runs no longer than 12 hours a day, I think it would be hard to tell.
<center><img src=https://i.imgsafe.org/0b/0bcfe0dfdc.png></center>
These types of captcha are very common, and they repeat themselves. A way to identify each captcha of this style would be to store either the whole image or specific spots on each square. Because these could be updated, one may want to basically hire a team in India to keep identifying captchas not previously found.
<center><img src="https://i.imgsafe.org/0b/0bf463cbb7.png">
<img src="https://i.imgsafe.org/0b/0bf4343033.png"></center>
There are some caveats that appear every once in a while too. But again, a team of people in India should tackle them no problem.
<center><img src="https://i.imgsafe.org/0b/0bf3b37ee5.png"></center>
This next one is unique, and the images were animated. But there is a finite number of Pokémon, and Pokémon have their own colors, widths, and heights.
<center><img src="https://i.imgsafe.org/0b/0bfe82726f.png"></center>
These are some of the worst ones. Some of this type I am unable to decipher. But even the ones that are difficult to make out could be derived from various idioms, memes, lyrics, and movie quotes. Sometimes they ask questions; many times they are solvable with what would appear in a fact book plus a general idea of what the question is asking (tallest, largest, smallest, anagram). For the most part the background is easy to identify. We also see an inversion where a letter is in the background. So it is of course possible to run many tests on the same image: one that filters out large spaces, and one that considers enclosed areas. Another test could look at the boundaries where one color strikes through another and correct the colors as necessary. Similarly, little specks (not seen here) could be purged, as well as large blobs that are too big to be in the solution and do not intersect it. I suppose from there you can do some horizontal and vertical line tests to try to identify the text. I have tried free OCR software on these without corrections, and without luck.
<center>https://i.imgsafe.org/0c/0c0bec4b27.png</center>
The very worst ones ask you to draw the outline of a sign. I've never been able to get one of those right myself.
But back to the experiment I did last week. Even if we assume I wrote a bot that could produce, say, 100 sat every 5 minutes, 24/7, 365 days a year, with bitcoin at $4500, that's about $470 a year. It is not worth the effort for an American programmer to exploit, given all the time he'd have to commit to it, but there are countries where it may be worth the effort. That being said, the puzzle piece captcha may be something for a teenager to try. The other thing to consider is that if you can successfully bypass captchas through software, the greater profit may be in reselling the code and in writing better captchas. Odds are such code already exists.
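For anyone who wants to check the yearly figure or plug in other numbers, the arithmetic can be sketched as below. The class `FaucetMath` and method `yearlyUsd` are made-up names, and the helper assumes the payout interval divides evenly into 60 minutes. One bitcoin is 100,000,000 satoshis.

```java
import java.util.Locale;

final class FaucetMath {
    /** Yearly income in USD for a bot that never sleeps. */
    static double yearlyUsd(long satPerPayout, int minutesPerPayout, double usdPerBtc) {
        long payoutsPerYear = (60L / minutesPerPayout) * 24 * 365; // 12 * 24 * 365 = 105,120
        long satPerYear = payoutsPerYear * satPerPayout;           // 10,512,000 sat
        return satPerYear / 100_000_000.0 * usdPerBtc;             // 0.10512 BTC in USD
    }

    public static void main(String[] args) {
        double usd = yearlyUsd(100, 5, 4500.0);
        System.out.println(String.format(Locale.ROOT, "$%.2f per year", usd)); // prints $473.04 per year
    }
}
```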