
Scraping HN / HNewsletter? - lcrmorin
As a pet project, I would like to scrape the results of some queries on HN / Hacker Newsletter. I am just starting out with Python (following Automate the Boring Stuff with Python).

However, I have run into some practical problems.

For HN, I am able to search for a term, but I am not sure how to navigate through the different result pages (or whether that is needed at all).

How would one get all the titles (plus points and number of comments) that match a specific query? What would be the most practical way to also get the comments in a usable format?

For Hacker Newsletter, I found this page: https://hackernewsletter.com/issues/ which seems to make looping over the issues relatively easy. But issues before 250 seem to be missing. Any idea where I could find them? Or is there an already compiled archive?
======
varbhat
[https://github.com/HackerNews/API](https://github.com/HackerNews/API)

[https://hn.algolia.com/api](https://hn.algolia.com/api)
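
To make the pagination question concrete, here is a minimal sketch against the Algolia search endpoint linked above. It assumes the response is JSON with a `hits` list and an `nbPages` count, and that each hit carries `title`, `points`, `num_comments` and `objectID`. The helper names (`fetch_json`, `search_stories`) and the `max_pages` cap are made up for illustration, so check the field names against the API docs before relying on them.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://hn.algolia.com/api/v1/search"


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def search_stories(query, fetch=fetch_json, max_pages=50):
    """Yield (title, points, num_comments, story_id) for stories matching query.

    Walks the result pages one by one until the API reports no more pages
    (assumed to be the `nbPages` field) or max_pages is reached.
    """
    page = 0
    while page < max_pages:
        params = urllib.parse.urlencode(
            {"query": query, "tags": "story", "page": page}
        )
        data = fetch(f"{API_URL}?{params}")
        for hit in data.get("hits", []):
            yield (
                hit.get("title"),
                hit.get("points"),
                hit.get("num_comments"),
                hit.get("objectID"),
            )
        page += 1
        if page >= data.get("nbPages", 0):
            break


if __name__ == "__main__":
    # Hypothetical query; prints one line per matching story.
    for title, points, n_comments, story_id in search_stories("scraping", max_pages=2):
        print(points, n_comments, story_id, title)
```

For the comments, the same API appears to expose `https://hn.algolia.com/api/v1/items/<story_id>`, which should return the story with its comments nested under `children`. Passing the `story_id` from above to `fetch_json` on that URL would be the natural next step, but treat the exact response shape as something to verify.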

------
verdverm
I believe a copy of the HN data is available on BigQuery, though I'm not sure how current it is.

