Scraping HN / HNewsletter?
1 point by lcrmorin on Sept 1, 2020 | 2 comments
As a pet project, I would like to scrape the results of some queries on HN / Hacker Newsletter. I am starting out with Python (following Automate the Boring Stuff with Python).

However, I have run into some practical problems.

For HN, I am able to search for a term, but I am not sure how to navigate between the result pages (or whether that is needed at all).
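One possible way around page navigation is to skip the HTML entirely and query the Algolia-backed HN Search API, which returns JSON one page at a time. The sketch below assumes that API (the endpoint and the `hits`, `nbPages`, `points`, `num_comments` and `objectID` fields) still behaves this way; `search_stories` and `fetch_json` are names made up for this illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Algolia HN Search endpoint.
SEARCH_API = "https://hn.algolia.com/api/v1/search"


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def search_stories(query, fetch=fetch_json, hits_per_page=100):
    """Yield (title, points, num_comments, story_id) for each matching story.

    `fetch` is injectable so the paging logic can be tried without network access.
    """
    page = 0
    while True:
        params = urllib.parse.urlencode(
            {"query": query, "tags": "story", "page": page, "hitsPerPage": hits_per_page}
        )
        data = fetch(f"{SEARCH_API}?{params}")
        for hit in data.get("hits", []):
            yield (
                hit.get("title"),
                hit.get("points"),
                hit.get("num_comments"),
                hit.get("objectID"),
            )
        page += 1
        # Stop once the page count reported by the API is exhausted.
        if page >= data.get("nbPages", 0):
            break

# Example use (needs network access):
#   for row in search_stories("web scraping"):
#       print(row)
```

So paging is needed, but it reduces to incrementing a `page` parameter until the reported page count is reached. The API may cap how many results a single query can page through, so very broad queries might have to be split, for example by date range.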

How would one get all the titles (plus points and number of comments) that match a specific query? What would be the most practical way to also get the comments in a usable format?
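For the comments, one option (again assuming the Algolia HN API, here its `items` endpoint) is to fetch a story as a nested JSON tree and flatten it into one row per comment. `fetch_item` and `flatten_comments` are hypothetical helper names, and the field names (`children`, `text`, `author`, `parent_id`) are what that API is assumed to return.

```python
import json
import urllib.request

# Assumption: the Algolia HN items endpoint, which returns a story with nested comments.
ITEM_API = "https://hn.algolia.com/api/v1/items/{}"


def fetch_item(story_id):
    """Download one story, including its comment tree, as a dict."""
    with urllib.request.urlopen(ITEM_API.format(story_id), timeout=30) as resp:
        return json.load(resp)


def flatten_comments(item, depth=0):
    """Walk the nested 'children' lists and yield one flat dict per comment."""
    for child in item.get("children", []):
        if child.get("text"):  # skip entries without text (e.g. deleted comments)
            yield {
                "id": child.get("id"),
                "parent_id": child.get("parent_id"),
                "author": child.get("author"),
                "depth": depth,
                "text": child["text"],  # comment body, returned as HTML
            }
        # Replies are nested one level deeper.
        yield from flatten_comments(child, depth + 1)

# Example use (needs network access), with story_id taken from a search result:
#   rows = list(flatten_comments(fetch_item(story_id)))
```

The flat rows can then be written out with `csv.DictWriter` or as JSON lines, which keeps the reply structure recoverable through `parent_id` and `depth`.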

For Hacker Newsletter, I found this page: https://hackernewsletter.com/issues/ which seems to make looping over the issues relatively easy. But issues before 250 seem to be missing. Any idea where I could find them? Or is there an already compiled archive?





I believe a copy of the HN data is on BigQuery, though I am not sure how current it is.



