Ask HN: Hacker News rate limits and robots policy?
10 points by jkarneges on Nov 5, 2013 | 8 comments
I'm writing a program that checks Hacker News for updates and I've discovered that it's pretty easy to get my IP address blocked, even at a modest access rate. Is there an official HN policy about robots or at least some explanation about the limits?

I notice that https://news.ycombinator.com/robots.txt declares a Crawl-delay of 30, meaning a remote client should make no more than one request every 30 seconds. However, I've managed to get myself blocked even with much longer delays than that. I'd think this discrepancy would cause problems for search engine crawlers, unless HN grants them special exemptions.
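
For reference, here's roughly how I check the declared value (a minimal Node sketch; it assumes a Node build with a global fetch, and it only matches a bare "Crawl-delay: N" line rather than parsing the full robots.txt grammar):

    // Fetch HN's robots.txt and report the declared Crawl-delay.
    // Sketch only: assumes a global fetch is available, and matches a
    // simple "Crawl-delay: N" line rather than parsing per-agent rules.
    async function getCrawlDelay(host) {
      const res = await fetch(`https://${host}/robots.txt`);
      const text = await res.text();
      const match = text.match(/^Crawl-delay:\s*(\d+)/im);
      return match ? parseInt(match[1], 10) : null;
    }

    getCrawlDelay('news.ycombinator.com')
      .then(delay => console.log(`Declared Crawl-delay: ${delay} seconds`));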

FWIW, my observation is that HN seems to block my IP address after roughly 1000 requests, even when those requests are spread over a very long period. In my latest test, polling at a 2-minute interval got me blocked after 34 hours. Maybe others can share their experiences?
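
For anyone who wants to reproduce this, my test loop is roughly the following (a sketch, again assuming a Node build with a global fetch; treating a non-200 status or a connection error as the block is an assumption on my part, since the exact blocked response may differ):

    // Poll the front page at a fixed interval and count requests, to see
    // roughly how many get through before the block kicks in.
    // Assumption: the block shows up as a non-200 status or a dropped
    // connection; the exact blocked response may look different.
    const TARGET = 'https://news.ycombinator.com/';
    const INTERVAL_MS = 2 * 60 * 1000; // the 2-minute interval from the test above

    let count = 0;
    async function poll() {
      count += 1;
      try {
        const res = await fetch(TARGET);
        console.log(`request ${count}: HTTP ${res.status}`);
      } catch (err) {
        console.log(`request ${count}: ${err.message}`);
      }
    }
    setInterval(poll, INTERVAL_MS);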



There's an official API that's much easier to work with and doesn't have aggressive rate limiting in place: http://hnsearch.com/api


I wouldn't necessarily say it's much easier to work with. I couldn't manage to pull down anything representing the front page, and from what I could tell after a lot of searching, no one else could either. So yes, you have access to the data, but for 99% of uses it's either too old or not representative of what you'd see in a browser.

It's an awesome resource for querying and running analytics, but not so much for an HN client.


Ah yes, I saw this. According to a post on their forums, the ThriftDB index runs about 15 minutes behind HN itself. Not great for my particular application (which checks for updates), but no doubt useful for other things.


BTW: there is a limit of 1000 results per search.
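
Concretely, paging stops producing anything new once you've pulled ~1000 items for a query, no matter how many total hits there are. Something like this shows it (a sketch; the start/limit parameter names and the { "results": [...] } response shape are from memory of the docs, so treat them as assumptions):

    // Page through search results for one query; the API caps the total
    // at ~1000 regardless of how many hits exist.
    // Assumptions: the "start"/"limit" parameter names and a JSON
    // response shaped like { results: [...] }; from memory, not verified.
    const BASE = 'http://api.thriftdb.com/api.hnsearch.com/items/_search';

    async function pageAll(query) {
      const items = [];
      for (let start = 0; start < 1000; start += 100) {
        const res = await fetch(`${BASE}?q=${encodeURIComponent(query)}&start=${start}&limit=100`);
        const data = await res.json();
        if (!data.results || data.results.length === 0) break;
        items.push(...data.results);
      }
      return items;
    }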


You can mitigate that by spidering users and searching domains to build a more complete database -

http://api.thriftdb.com/api.hnsearch.com/items/_search?filte...

http://api.thriftdb.com/api.hnsearch.com/items/_search?filte...

I wrote a small NodeJS spider if anyone's interested -

https://github.com/benlowry/hnsubmitterstats
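
The core of it is just a per-user search, roughly like this (a sketch; the filter[fields][username] parameter is an assumption reconstructed from the truncated URLs above, so check the repo for the real query strings):

    // Fetch one user's items from the search API; the building block of
    // the spider. Assumption: filter[fields][username] is the right
    // parameter, reconstructed from the truncated URLs above.
    const BASE = 'http://api.thriftdb.com/api.hnsearch.com/items/_search';

    async function itemsByUser(username) {
      const url = `${BASE}?filter[fields][username]=${encodeURIComponent(username)}&limit=100`;
      const res = await fetch(url);
      const data = await res.json();
      return data.results || [];
    }

    // Seed with one user, then queue every new username seen in the results.
    itemsByUser('jkarneges').then(items => console.log(items.length, 'items'));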


Coincidentally, I was looking into that option as well, and it seems perfect for my needs :) Thanks!


In my experience, the actual allowed crawl interval is somewhere between 120 and 180 seconds.


I'm actually trying a test with a 180-second interval now. We'll see if it lasts longer than 51 hours, which is when the ~1000-request pattern above would predict a block at this rate.



