
This works great, except that some sites seem to block requests from EC2 hosts. StackOverflow and Yelp are two that come to mind immediately although I'm sure there are others. If I remember right, StackOverflow only lets you access via the API if you're on an EC2 host.

On the other hand, I can see where they're coming from by banning the whole netblock. Otherwise you could scrape until your IP gets banned for blowing a rate limit, then tear down that instance and spin up a new one.
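
To make the netblock idea concrete, here is a minimal sketch of how a site might reject requests by address range rather than by single IP. The ranges and the `is_blocked` helper are hypothetical; the networks shown are reserved documentation ranges, not real EC2 address space, and nothing here is known to be what StackOverflow or Yelp actually does.

```python
import ipaddress

# Hypothetical blocked ranges. These are documentation-only networks
# (TEST-NET-3 and TEST-NET-2), not real EC2 address space.
BLOCKED_NETBLOCKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked netblock."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETBLOCKS)
```

With a check like this, a fresh instance in the same range is refused just like the one that was torn down, which is presumably the point of banning the whole block.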



Same experience using Slicehost as a VPN and searching on Google:

    We're sorry...

    ... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.


When SSHing from a Linode server, I generally get a CAPTCHA on Google and then I assume they whitelist my IP for a bit, because I don't see it again.


Same experience with Linode but unfortunately, I don't even get a CAPTCHA when using my Slicehost VPN.


I get this same message about once a week since I frequently use my Linode VPS as a proxy for my web traffic, but it's never lasted for more than a half hour or so. I've always wondered why their heuristics only seem to notice me so infrequently. Do you get it consistently?


Possible you only have that problem during peak hours? I would wager they're just rate limiting by IP or IP pool.
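
A rough sketch of what rate limiting by IP pool could look like, purely as a guess at the general technique and not at Google's actual heuristics. The `PoolRateLimiter` class, the /24 pool size, and the limits are all assumptions made for illustration; it handles IPv4 addresses and uses a sliding window per pool.

```python
import ipaddress
import time
from collections import defaultdict, deque


class PoolRateLimiter:
    """Sliding-window limiter keyed by network prefix instead of single IP."""

    def __init__(self, max_requests: int, window_seconds: float, prefix_len: int = 24):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.prefix_len = prefix_len  # assumed pool size: one /24 per bucket
        self._hits = defaultdict(deque)  # pool -> timestamps of recent requests

    def _pool_key(self, client_ip: str):
        # Collapse the address to its enclosing network, e.g. 203.0.113.0/24.
        return ipaddress.ip_network(f"{client_ip}/{self.prefix_len}", strict=False)

    def allow(self, client_ip: str, now: float = None) -> bool:
        """Record a request and return True if the pool is under its limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[self._pool_key(client_ip)]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Under a scheme like this, one heavy user on a neighbouring address in the same pool would be enough to trip the limit for everyone in it, which would fit with only seeing the message occasionally.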


Yes, I have never been able to access Google from my Slicehost VPN IIRC. I also used a Linode VPN for a while and got the message infrequently with the possibility to enter a CAPTCHA to complete my request.


Wasn't Slicehost used as part of that Google Gmail hack a while back (in 2009 or so)? Or was that Linode? I can't remember.


Related: http://blog.y3xz.com/blog/2012/03/11/stack-overflow-stop-blo...

I tried contacting SO about this, and got a negative response.


I also connect a lot through an EC2 machine that I use as a proxy. I do this because I do not live in the U.S. and many websites require U.S.-based IPs.

It's a pity that StackOverflow doesn't allow me to do this (so I end up turning the proxy on and off and back on again). It's not like you couldn't rent a cheap Linode instance (or from another provider, take your pick) and do whatever you want with SO, if you really wanted to.

SO needs more sophisticated tools for blocking bots and crawlers. IP blocking just doesn't cut it and tends to discriminate against legitimate users with special needs.


"It's a pity that StackOverflow doesn't allow me to do this (so I end up turning the proxy on and off and back on again)."

FoxyProxy (and probably other extensions) makes site-specific proxy settings feasible.



