

Ask HN: How to get a list of all submissions on Hacker News? - maximumwage

Hi, does anyone know how I could get a list of all the submissions ever posted to Hacker News, or how I would go about crawling ycombinator.com? I have some time on my hands and would like to expand my skills, and the HN archives seem like a good place to start.
======
RiderOfGiraffes
What skills do you have?

~~~
maximumwage
Minor scripting skills. I was hoping to find an open source crawler, or a way
to download an entire website.

~~~
RiderOfGiraffes
You can use wget to download an entire site, but I really, really wouldn't
pull HN like that.

If you really want everything then here's one way to do it.

Start with curl, pull <http://news.ycombinator.com/item?id=1> and then use
Python and BeautifulSoup to extract the ids of the items linked from that
page. Then pick the smallest id that you haven't pulled yet, and repeat.

But I would be absolutely certain you've got it all right before letting it
loose, and you are ethically obliged to throttle your bandwidth when doing
something like this.
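
As a rough sketch of that loop, and not a tested crawler: the code below uses
Python's standard-library html.parser in place of BeautifulSoup so it has no
dependencies. The names extract_item_ids, crawl, fetch, max_id and
delay_seconds are all made up for illustration, the 30 second default delay is
an arbitrary guess at "polite", and it assumes item pages link to related
items as item?id=N.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class ItemLinkParser(HTMLParser):
    """Collects the numeric ids from every <a href="...item?id=N"> on a page."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        # Assumption: item links look like "item?id=N" (relative or absolute).
        if parsed.path.rsplit("/", 1)[-1] != "item":
            return
        for value in parse_qs(parsed.query).get("id", []):
            if value.isdigit():
                self.ids.add(int(value))


def extract_item_ids(html):
    """Return the set of item ids linked from one page of HTML."""
    parser = ItemLinkParser()
    parser.feed(html)
    return parser.ids


def crawl(fetch, max_id, delay_seconds=30.0, sleep=time.sleep):
    """Fetch the smallest id not yet seen, note the ids it links to, repeat.

    fetch is a hypothetical callable taking an item id and returning HTML.
    Returns the list of ids that were actually fetched, in order.
    """
    seen = set()
    fetched = []
    for item_id in range(1, max_id + 1):
        if item_id in seen:
            continue  # already appeared on an earlier page, skip the request
        html = fetch(item_id)
        fetched.append(item_id)
        seen.add(item_id)
        seen |= extract_item_ids(html)
        sleep(delay_seconds)  # throttle: pause after every request
    return fetched
```

Here fetch could be a small wrapper around curl or urllib.request.urlopen
pointed at http://news.ycombinator.com/item?id=N. Passing it in as an argument
means the loop can be checked against canned pages first, which is one way of
being certain it's right before letting it loose.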

But think twice - there might be better ways of learning what you want to
learn.

