Show HN: Py package to collect normalized news from (almost) any website (github.com/kotartemiy)
26 points by artembugara on Feb 24, 2020 | 4 comments



Congratulations on the product! Will check out soon!

4 questions:

1. Do you see yourself staying committed to open sourcing all your code?

2. What agreements, if any, do you have with the news provider websites to continue to source the data?

Are they all sourced through the RSS feed endpoints? If so, the following are not that important to answer although I would still love to get your input:

3. How frequently do you plan to refresh data if I were to pay you $25/mo for the hosted service?

4. Do all tiers get the same refresh speed, just more calls?

PS: LOVED the use of Poetry for deps. Great job! I'll push a PR to package everything into a Docker container if you feel that would be helpful. It will bootstrap Poetry using pipx in the stock Python image.


Hey, thx!

1. We would love to. It depends on our time and motivation. I will definitely write about all the steps on my Medium blog (you can find the link on newscatcherapi.com).

2. Yep, RSS. With the package, you scrape this data yourself, so I do not see any problems there. Regarding the product, I still do not understand how it works, but there are many products that do the same, so I am not worried at the moment. I think it is important not to share the full text of an article.

3. Starting from 4 refreshes/hour for the MVP. I can then increase it as much as I want. It uses serverless infrastructure.

4. Totally.

Waiting for your PR!
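
For anyone wondering what "normalized news from RSS" might look like in practice, here is a rough sketch using only the standard library. It is not the package's actual code: `normalize_feed`, the output field names, and the sample feed are all made up for illustration, and it only handles plain RSS 2.0 `<item>` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be downloaded from a site's RSS endpoint.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/a</link>
      <pubDate>Mon, 24 Feb 2020 10:00:00 GMT</pubDate>
      <description>Short summary only.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/b</link>
    </item>
  </channel>
</rss>"""


def normalize_feed(xml_text):
    """Turn RSS 2.0 XML into a list of dicts with a fixed set of keys.

    The key names are an assumption for this sketch, not the package's schema.
    Missing optional fields come back as None.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            "published": item.findtext("pubDate"),    # None if the feed omits it
            "summary": item.findtext("description"),  # summary only, not full text
        })
    return items


if __name__ == "__main__":
    for entry in normalize_feed(SAMPLE_RSS):
        print(entry["title"], entry["url"])
```

Note that it only keeps the summary the feed itself publishes, which lines up with the point about not sharing the full text of an article.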


What makes this library different from something like Newspaper? https://github.com/codelucas/newspaper


It is very different. Newspaper3k lets you automatically collect data when you already know the URL of an article.

Newscatcher gives you all the latest news (with URLs) for all the news providers.

It is nice to combine the two.
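
A rough sketch of how the two could be combined: one step discovers the latest article URLs, the other extracts content from each URL. To keep it runnable without either library installed, both steps are passed in as plain functions. `collect_articles`, `fake_list_urls`, and `fake_fetch_article` are hypothetical names for this example; in real use the first would presumably wrap Newscatcher and the second Newspaper3k, and neither library's API is shown here.

```python
def collect_articles(list_urls, fetch_article, limit=5):
    """Discover URLs with one callable, then extract each with another.

    list_urls:     () -> iterable of URL strings (the discovery step)
    fetch_article: (url) -> dict of extracted fields (the extraction step)
    A failure on one URL is recorded and does not stop the others.
    """
    articles = []
    for url in list(list_urls())[:limit]:
        try:
            articles.append({"url": url, **fetch_article(url)})
        except Exception as exc:
            articles.append({"url": url, "error": str(exc)})
    return articles


# Stand-ins so the sketch runs offline; swap in real wrappers when you use it.
def fake_list_urls():
    return ["https://example.com/a", "https://example.com/broken"]


def fake_fetch_article(url):
    if url.endswith("broken"):
        raise ValueError("could not download")
    return {"title": "Title for " + url}


if __name__ == "__main__":
    for article in collect_articles(fake_list_urls, fake_fetch_article):
        print(article)
```

Keeping the two steps as separate callables means either side can be replaced (a different feed source, a different extractor) without touching the loop.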



