Why not, exactly? Amazon's and probably Google Cloud's IP ranges are probably available publicly, and you can probably tell Googlebot's IPs apart from the cloud ones (by user agent, or even by IP range). You can sensibly block IP ranges belonging to dedicated hosts and cloud providers. Less than 1% of that traffic would be legitimate (i.e., someone working in AWS going to your site directly).
• For search engines with hosted services, it's unlikely their spiders and hosted traffic come from the same IP blocks.
• In general, if you're running a public-facing website, you're not particularly interested in datacenter-originating traffic. Throwing up a large blocklist is actually going to cost you very little legit traffic.
• There's a highly underappreciated service at http://asn.routeviews.org/ which gives reverse IP lookups of CIDR block and ASN for a given IP. Query the text record for the reversed dotted quad at the domain asn.routeviews.org. E.g., for news.ycombinator.com, the IP is 184.108.40.206. Reverse that to query:
$ host -t txt 220.127.116.11.asn.routeviews.org
18.104.22.168.asn.routeviews.org descriptive text "13335" "22.214.171.124" "21"
• You may need to punch holes for specific hosts. That's fairly tractable.
Like archive.org, for example...
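For anyone who wants to script the lookup and blocklist idea above, here is a rough Python sketch, not a tested production setup. It assumes the TXT record carries three strings in the order shown in the `host` output (ASN, network, prefix length). The function names are made up for illustration, the addresses are documentation-only ranges, and the actual DNS query is left to whatever resolver you prefer.

```python
import ipaddress


def routeviews_query_name(ip: str) -> str:
    """Build the TXT query name: reversed dotted quad plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_quad = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_quad}.asn.routeviews.org"


def parse_routeviews_txt(fields):
    """Convert the three TXT strings into (asn, network).

    Assumed field order: ASN, network address, prefix length.
    """
    asn, network, prefix_len = fields
    return int(asn), ipaddress.ip_network(f"{network}/{prefix_len}")


def is_blocked(ip, blocked_networks, allowed_networks):
    """Allowlist wins over blocklist, so you can punch holes for specific hosts."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in allowed_networks):
        return False
    return any(addr in net for net in blocked_networks)


if __name__ == "__main__":
    # Documentation-only values (TEST-NET-1 and a reserved example ASN).
    print(routeviews_query_name("192.0.2.10"))
    asn, net = parse_routeviews_txt(("64500", "192.0.2.0", "24"))
    holes = [ipaddress.ip_network("192.0.2.10/32")]
    print(asn, net, is_blocked("192.0.2.77", [net], holes))
```

Checking the allowlist first keeps the exceptions (a crawler you want, archive.org, and so on) cheap to maintain even when the blocklist itself is large.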
I recall, when I signed up for my free year of AWS, that it required a valid credit card. They claimed to only target providers that need essentially only an email address.
I think that detail makes the title incredibly misleading.
This also illustrates the rules of efficiencies. If there exists a resource that someone makes available for a given purpose (signing up lots of developers for free accounts), then you can assume there exist others who will figure out how to get those resources and turn them into something else of value to them.
There's no such thing as a free lunch. There's also no such thing as limiting fraud. It will always be a choice.
I think the "Amazon" spin in the title is from many other services running atop EC2 and effectively reselling EC2 instances or access to VMs that run on EC2. They could more easily create accounts with these services and technically still run "within Amazon's cloud".
That's, after all, just a permutation of the 'use scraped email addresses to create realistic-looking fake email addresses' approach. I think anyone wanting to use this for damage wouldn't be stopped by some CC requirement, especially if it's not charged.
I see nothing ethically wrong with this. As long as companies offer a free trial, they should consider the risk that users will take advantage of it to do anything that can be done on a computer, whether that means running proxies, scrapers, or miners.
The only plausible issue here is that the "hackers" automated the process, meaning they have no intention to actually convert into paid customers. But surely the companies account for the risk of free trial users not converting to paid. That's the whole point of a free trial.
The only way to mitigate this is to require a credit card up front for the free trial, which IIRC has been shown to reduce conversions. It's likely that reducing conversions by requiring a credit card will cost companies more than they lose from automated scripts signing up to their service. (And by the way, a CAPTCHA will be futile here -- the bots are only signing up to a few dozen accounts, so the "hackers" can manually solve the CAPTCHAs or farm each out to Pakistan for fractions of a cent.)
Back in my "blackhat SEO" days, these kinds of scripts were super common. Any minimally profitable process that can be automated, can and should be. Because when you scale $0.25/day across 1000s of accounts, suddenly you're making quite a bit more (1,000 accounts at that rate is already $250/day).
This is actually a great idea and I just might implement it myself. Come at me! ;)
They set up a litecoin mining botnet and, out of fear of being detected (and more generally to do the right thing), shut it down or reduced it to only a few bots. So it's not like they actively tried to hide it via some kind of rootkit or the like.
I think http://www.gutenberg.org/ does the same.
What should we do then?
I am not familiar with bitcoin, so I can't say exactly how to defend against it other than to say: at least block certain tools.
Any site that offers to run user code is always vulnerable to code execution abuse, and this is where a sandbox and a strict job execution policy (job timeouts, system call prohibitions, a whitelist approach, etc.) are required. But that will degrade the service's usefulness. Obviously, reading /etc/passwd should return permission denied or at least return nothing, and any file creation should be restricted to the local sandbox.
I'd like to ask people from Codecademy and Coursera to reveal a bit more about their security policy. You can build your own rainbow table if they allow you to use a hash library and to send HTTP requests to a remote service (with a DigitalOcean account you can pay $5 for 20GB of SSD storage).
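To make the "job timeout" part concrete, here is a minimal Python sketch of the idea, assuming a POSIX host. It only covers a wall-clock timeout plus CPU and file-size limits; it does not do system call filtering or filesystem isolation, so it is nowhere near a real sandbox. The function name `run_untrusted` and the specific limit values are mine, picked for illustration.

```python
import resource
import subprocess
import sys


def _apply_limits(cpu_seconds: int, max_file_bytes: int):
    """Return a function that sets rlimits in the child before exec (POSIX only)."""
    def setter():
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_FSIZE, (max_file_bytes, max_file_bytes))
    return setter


def run_untrusted(code: str, timeout_s: float = 3.0,
                  cpu_seconds: int = 2, max_file_bytes: int = 1_000_000):
    """Run a Python snippet in a child process under a strict job policy."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env and user site
            capture_output=True,
            text=True,
            timeout=timeout_s,                   # wall-clock limit; child is killed on expiry
            preexec_fn=_apply_limits(cpu_seconds, max_file_bytes),
        )
    except subprocess.TimeoutExpired:
        return {"timed_out": True, "returncode": None, "stdout": ""}
    return {"timed_out": False, "returncode": proc.returncode, "stdout": proc.stdout}


if __name__ == "__main__":
    print(run_untrusted("print(1 + 1)"))
    print(run_untrusted("import time; time.sleep(10)", timeout_s=1.0))
```

The wall-clock timeout catches jobs that sleep or block, while the CPU limit catches busy loops (a miner, say) even if the timeout is generous. Blocking network access and syscalls needs OS-level tooling on top of this.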
If you want my unsolicited advice, I would recommend really focusing on your cloud market offering. I think this is going to be huge in the next decade. Cloud computing is rapidly moving to a utility model, and whoever controls the platform connecting service providers to service consumers is going to make a shitload of money.
I'm actually working on a product in a similar space now; if you want to chat, shoot me an email -- firstname.lastname@example.org