

Access to HN posts - bashgrep

I am trying to use ec2 machines to scrape HN posts, but it isn't working.  Specifically, it seems like HN is rate limiting request from ec2 machines.  Does HN not want people scraping HN?  How can I get access to all HN posts?
======
itsprofitbaron
You can scrape HN as long as you respect the robots.txt[1] and don't retreive
more than a couple of pages per minute.

Have you considered just pulling the data from HNSearch's API[2] or the one by
iHackerNews[3]?

[1] <https://news.ycombinator.com/robots.txt>

[2] <https://www.hnsearch.com/api>

[3] <http://api.ihackernews.com/>

------
mindcrime
_Does HN not want people scraping HN?_

I don't remember the details, but I think pg has expressed some desire to not
have people scraping the site, or at least not to scrape often. I believe the
justification was that too many bots crawling/scraping the site hurts
performance for everybody else. You might try searching the old posts for more
info on the topic.

 _How can I get access to all HN posts?_

Try <http://api.ihackernews.com/>

