Some of you might have read my text about web 2.0 bots. Since my bots were recently banned from one of the major Polish sites I've been testing them on, I think it's time to wrap it up. So, as a summary, I've decided to present the idea and explain how it exposes a flaw in the concept of social websites.
Summary:
The idea I've used is very simple: set up multiple machines (virtual ones in this case), make each of them use a different IP address (for example, through proxies or Tor) and write a small app that uses IE to automate websites. It's nearly impossible to detect such a bot, because IE behaves exactly as it does for a normal user: JavaScript, AJAX, image loading, iframes, cookies, everything works the same. Using this to create multiple fake accounts (40 in my case) and build up their rankings is enough to manipulate a website's content effectively: add your own content and vote for it, block unwanted content, generate comments and vote for them.
History:
All of this started last year, when I began playing the Travian game. It was obvious to me that a computer (with some manual intervention on my part) could play it better than I could by clicking whenever I happened to be at my computer. A computer can play while I'm asleep, busy at work or on vacation; basically 24/7. So I picked up the Internet Explorer COM API and started using it from Tcl. I'll save the implementation details for a few upcoming articles.
In the beginning the logic was simple: monitor resources and build things as soon as they're available. The results were quite surprising: after a reset of the game (every player starting over from the beginning) my bot stayed in the top 50 for a long time. Next I added a few more bots, a bit more logic (mainly the ability to tell each player which buildings to construct, etc.) and resource sharing, and... I failed. I had forgotten that the game featured a system that prevented weaker players from feeding resources to stronger ones for free.
After that, I dropped the idea of using IE for some time. A few months passed and I got interested in a Polish Digg clone. Originally I used a small HTTP client, along with cookie handling and a basic DOM model, to automate tasks. Tor was used to make connections come from multiple sources. While it worked for some time, the site's admins found out about it quite fast, especially since a friend of mine had built a similar system and we ended up competing on numbers. That wasn't a good idea. We all got banned, the site blocked Tor IPs, and my first account there (which was actually a legitimate one) was banned. Every attempt to use the same tool ended in account bans within a few hours.
A few more months passed, winter came, and I thought: would it make sense to use IE for the Digg clone? This time I started building a proper framework, one I could use for any site I'd want to play with; previously the code was pretty much limited to the specific sites I had played with. A few weeks later the first version was ready. Over time, I added more features: a reasonable scheduler, failover, auto-update and a web service for managing it all.
The system has been working quite well, and my bots reached quite good rankings (a third of all my bots were among the site's top 1000 users). Recently, after around 6 months, I decided to see whether using the bots aggressively would get them detected. It did; I guess either users reported it or the site had a good multi-account detection algorithm. So, now that the experiment is over, I can publish it along with some thoughts.
Thoughts:
One of the most important things I realized after doing this is that web 2.0 has a major flaw: users acting in concert, whether automated bots or organized groups of real people, can effectively manipulate a website and its content. If I was able to pull it off with minimal resources, a large company or organization could do the same to advertise its products. The same goes for political parties and so on.
Should people trust these sites? After spending some time on this issue, I'm not so sure. For fun stuff, like yet another photo or video of a cat, I'd say it's people's choice. In other cases, especially politics, I'd think twice.
More to come:
Over the next few weeks, I'll describe the architecture, the tools used and how it all works in more detail. I might also go back to online gaming.