
Show HN: Py package to collect normalized news from (almost) any website - artembugara
https://github.com/kotartemiy/newscatcher
======
subhobroto
Congratulations on the product! Will check out soon!

4 questions:

1\. Do you see yourself staying committed to open sourcing all your code?

2\. What agreements, if any, do you have with the news provider websites to
continue to source the data?

Are they all sourced through the RSS feed endpoints? If so, the following are
not that important to answer although I would still love to get your input:

3\. How frequently do you plan to refresh data if I were to pay you $25/mo for
the hosted service?

4\. Do all tiers get the same refresh speed, just more calls?

PS: LOVED the use of poetry for deps. Great job! I'll push a PR to package
everything into a docker container if you feel that will be helpful. It will
bootstrap poetry using pipx into the stock python image.

~~~
artembugara
Hey, thx!

1\. We would love to do so. It depends on our time and motivation. I will
definitely write about all the steps on my Medium blog (you can find the link
on newscatcherapi.com).

2\. Yep, RSS. With the package, you scrape this data yourself, so I do not see
any problems here. Regarding the product, I still do not understand how it
works, but there are many products that do the same, so I am not worried at
the moment. I think it is important not to share the full text of an article.

3\. Starting from 4 refreshes per hour for the MVP. Then I can increase it as
much as I want. We use serverless infrastructure.

4\. Totally.

Waiting for your PR!
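
To make the RSS point concrete, here is a minimal sketch of what "normalizing" a feed could look like, using only the Python standard library. This is not the package's actual code: the sample feed, the `normalize_feed` name, and the output field names are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 document standing in for a real news feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/a</link>
      <pubDate>Mon, 06 Jan 2020 10:00:00 GMT</pubDate>
      <description>Short summary A</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/b</link>
    </item>
  </channel>
</rss>"""


def normalize_feed(xml_text):
    """Turn every <item> of an RSS document into a dict with fixed keys.

    Hypothetical helper: fields missing from an item come back as None,
    so every article has the same shape regardless of the source feed.
    """
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
            "summary": item.findtext("description"),
        })
    return articles


articles = normalize_feed(SAMPLE_RSS)
for article in articles:
    print(article["title"], article["link"])
```

Note that only the title, link, date, and summary are kept, which matches the point about not sharing the full text of an article.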

------
tokyolights
What makes this library different from something like Newspaper?
[https://github.com/codelucas/newspaper](https://github.com/codelucas/newspaper)

~~~
artembugara
It is very different. Newspaper3k allows you to automatically collect data
when you know the url of an article.

Newscatcher gives you all the latest news (with url) for all the news
providers.

It is nice to combine those 2.
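
A rough sketch of that combination, with both libraries hidden behind plain callables so the example runs without network access. Everything here is hypothetical: `latest_with_text`, `fake_latest`, and `fake_text` are made-up names. In practice, `list_latest` would wrap a Newscatcher lookup for one site, and `fetch_text` would wrap Newspaper's `Article` (download, then parse, then read `.text`).

```python
def latest_with_text(list_latest, fetch_text, limit=5):
    """Pair each recent headline with the full text of its article.

    list_latest: callable returning a list of dicts with a "link" key
                 (assumed shape, e.g. produced from an RSS feed).
    fetch_text:  callable taking a URL and returning the article body.
    """
    results = []
    for entry in list_latest()[:limit]:
        # Copy the feed entry and attach the body fetched from its URL.
        results.append({**entry, "text": fetch_text(entry["link"])})
    return results


# Stand-ins for the two libraries, so the sketch is self-contained.
def fake_latest():
    return [{"title": "Headline", "link": "https://example.com/a"}]


def fake_text(url):
    return "Full text for " + url


combined = latest_with_text(fake_latest, fake_text)
print(combined[0]["text"])
```

The split mirrors the division of labor described above: one side discovers URLs, the other extracts content from a known URL.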

