Since the Hacker News API (https://github.com/HackerNews/API) used in this scraping is being brought up again, I'll ask a burning question: is development of the API dead?
From the commit notes in that repo, the only changes from the initial release in 2014 are "minor README updates."
I've been using it for a project (collecting video lectures for https://www.findlectures.com); it works pretty well and appears to stay up to date.
At minimum, there is no authentication endpoint for HN users, which is the primary reason you haven't seen many HN apps take off in the past 2 years.
A more damning reason is that the official HN API in its current state is worse than the API it replaced! The Algolia API (https://hn.algolia.com/api) is still active and can retrieve data with 1000 entries per page (vs. 1 at a time for the official API). It can also retrieve the comments plus text of a submission thread in a single HTTP request, whereas the official API requires the user to perform an HTTP request to retrieve the text of each comment in a thread.
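To make the request-count difference concrete, here is a rough Python sketch, not a definitive client. It assumes the item endpoints as I understand them from each API's docs (`/v0/item/<id>.json` on Firebase for the official API, `/api/v1/items/<id>` for Algolia). The function names and the injectable `fetch` parameter are my own, added so the logic can be exercised without touching the network.

```python
import json
from urllib.request import urlopen

# Endpoint shapes assumed from the public docs of each API.
OFFICIAL_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{id}.json"
ALGOLIA_ITEM_URL = "https://hn.algolia.com/api/v1/items/{id}"


def fetch_json(url):
    """Perform one HTTP GET and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def official_thread_texts(item_id, fetch=fetch_json):
    """Official API: one request per item, following the 'kids' ids."""
    # Guard against a null body (assumed possible for missing items).
    item = fetch(OFFICIAL_ITEM_URL.format(id=item_id)) or {}
    texts = [item["text"]] if item.get("text") else []
    for kid_id in item.get("kids", []):
        texts.extend(official_thread_texts(kid_id, fetch))
    return texts


def algolia_thread_texts(item_id, fetch=fetch_json):
    """Algolia API: one request; comments arrive nested under 'children'."""
    root = fetch(ALGOLIA_ITEM_URL.format(id=item_id))
    texts = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.get("text"):
            texts.append(node["text"])
        # Reverse so children are visited in their original order.
        stack.extend(reversed(node.get("children", [])))
    return texts
```

For a thread with N comments, the first function issues N + 1 requests and the second issues one, which is the gap described above. Passing a counting wrapper as `fetch` is an easy way to check that on a real thread.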
This is true. Without OAuth, I was not able to connect to individual user accounts. I wanted to allow users to display their own upvote/post history (see here: https://www.sizzleanalytics.com/reddit/).
I was unaware of the Algolia API; that will help with future tasks, I'm sure. Thanks!