
Show HN: News Extract API – Pull structured data from online news articles - artembugara
Hey HN,

It will be more of a "How I released my API without managing a website, servers, users, and payments. With $0 up-front cost"

Over the past year, I have come up with a plan for how I could release my own product without having to deal with managing users and/or payment processing.

It is a 3-step procedure:
1. Make an API that solves a problem
2. Deploy it with a serverless architecture
3. Distribute through an API Marketplace

It took me about 2-3 days to develop an API using Flask, deploy it via Zappa on AWS, and release it through RapidAPI.

Source code of API: [https://github.com/kotartemiy/extract-news-api](https://github.com/kotartemiy/extract-news-api)

Subscribe to API on Rapid: [https://rapidapi.com/provider/4109621/apis/extract-news/user...](https://rapidapi.com/provider/4109621/apis/extract-news/users)

I'm on ProductHunt today: [https://www.producthunt.com/posts/extract-news-api](https://www.producthunt.com/posts/extract-news-api)

Full article on how I did it: [https://towardsdatascience.com/api-as-a-product-how-to-sell-...](https://towardsdatascience.com/api-as-a-product-how-to-sell-your-work-when-all-you-know-is-a-back-end-bd78b1449119)
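
For readers who want a feel for step 1 of the procedure above, here is a minimal sketch of a JSON endpoint written against the plain WSGI interface from the standard library. It is not the code from the linked repository, which uses Flask. The `/extract` path, the `url` query parameter, and the `extract_article` placeholder are all assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs


def extract_article(url):
    """Hypothetical placeholder for the real extraction step."""
    # Assumption: a real service would download and parse the page here.
    return {"url": url, "title": None, "text": None}


def app(environ, start_response):
    """Minimal WSGI endpoint: GET /extract?url=... returns JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    urls = query.get("url")
    if environ.get("PATH_INFO") != "/extract" or not urls:
        status = "400 Bad Request"
        payload = {"error": "use /extract?url=..."}
    else:
        status = "200 OK"
        payload = extract_article(urls[0])
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the callable can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. Flask applications expose the same WSGI interface, so the shape of the endpoint carries over.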
======
dennisy
The core of the product is a third-party lib, Newspaper3k. Not sure I would
want to pay for the Flask wrapper.

~~~
holler
Interestingly, I revisited the repo for news3k and it seems to be inactive,
with 300+ issues, 70+ open PRs, last commit 12 months ago...

~~~
thundergolfer
Damn, yeah that's pretty unloved. Weirdly the developer is still advertising
consulting services for the project and allowing donations.

Given how popular the project is and the owner's absence, community ownership
seems like a no-brainer.

~~~
holler
I'm using newspaper for a new project and this has me wondering how hard it
would be to either fork it or create a new repo and let the community run
it... hrmm. For me it's not pressing right now but if it really stays stagnant
then maybe.

~~~
dennisy
I am using this lib too in one of our projects, and I would be keen to do this.

How do you think it would be best to structure this? Move it into its own new
org?

I also assume it would be good to try and get hold of the creator first?

------
speg
Oh man, I need this. I hope it works better than whatever I was trying before.
I was using it to grab all my HN upvote/favs and using Postgres full text
search on the article text, but most of them ended up being full of gibberish
or missing loads of text. A good way for me to answer "where did I see that
article last week..."
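
As a rough illustration of the "where did I see that article" lookup, here is a small in-memory inverted index in pure Python. It is a stand-in, not Postgres full-text search: there is no stemming, ranking, or persistence, and the function names are made up for this sketch.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """Map each word to the set of article ids that contain it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for word in tokenize(text):
            index[word].add(article_id)
    return index


def search(index, query):
    """Return the ids of articles that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

If the extracted text is gibberish or truncated, any index built on it inherits the problem, which is why the extraction quality matters more than the search layer here.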

------
nikhilalmeida
Curious what the economics of working like this are.

* What is the market size of developers and companies that pay for such wrapper APIs?

* What would pricing for something like this look like?

* What is the hit / miss ratio of API ideas?

I was thinking of putting together datasets and listing on similar
marketplaces. Anyone have any information to share on this?

~~~
polymorph1sm
I googled and put together a few similar services (not affiliated), and it
seems like pricing varies quite a lot: paid packages start at 150 to 449 per
month (all provide a free service, though).

1) [https://currentsapi.services/en](https://currentsapi.services/en)

2) [https://contextualweb.io/news-api/](https://contextualweb.io/news-api/)

3) [https://webhose.io/products/news-feeds/](https://webhose.io/products/news-feeds/)

4) [https://aylien.com/news-api/](https://aylien.com/news-api/)

------
salmaanp
Nice, it has some similarities to what I had done earlier: a reddit bot which
reads articles and posts a summary in the comments. It lived entirely on
Heroku using a free dyno. Didn't know there was an API marketplace!

[https://github.com/SalmaanP/samacharbot2](https://github.com/SalmaanP/samacharbot2)

------
rkwz
I worked on something similar using a different approach:

Fetch the article

=> Run it through Mozilla Readability library

=> Extract plain content

[https://github.com/sheshbabu/readable-scraper/blob/master/in...](https://github.com/sheshbabu/readable-scraper/blob/master/index.js)
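
The fetch, clean, extract pipeline above can be sketched in Python with only the standard library. This is a deliberately naive stand-in for Mozilla Readability: it keeps the text inside `<p>` tags and drops everything else, with none of Readability's scoring heuristics. The class and function names are assumptions for this sketch.

```python
import urllib.request
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect text found inside <p> tags, ignoring <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._skip = 0
        self._buf = []

    def _flush(self):
        # Collapse runs of whitespace and keep non-empty paragraphs only.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "p":
            if self._in_p:  # previous <p> was never closed
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skip:
            self._buf.append(data)


def fetch_html(url):
    """Download a page and decode it with its declared charset."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_plain_text(html):
    """Return paragraph text separated by blank lines."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

Usage would be `extract_plain_text(fetch_html(url))`. Pages that put their body text in `<div>` elements rather than `<p>` tags will come back empty, which is the kind of case a real readability library handles.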

------
axegon_
I had done this some years ago but it was completely private, and I used it
for summarizing news, informing me about certain events and shoving stuff into
Couchbase for future analysis. Kind of lost interest at some point so I
scrapped it.

Looking at yours (this has become some sort of a habit at this point, but the
first thing I did was scroll through the code), the first thing that struck me
was the 55 dependencies. Woah...

~~~
artembugara
There are only 3-4 libraries that I installed when developing. Some libraries
require other libraries and so on.

Not sure if it is just a Python thing.

~~~
axegon_
Yeah, figured. I'm guessing you shoved everything from pip freeze into the
requirements. I don't recall flask, newspaper or langdetect having pillow as a
requirement for example...
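
One way to see how much of a frozen requirements file is transitive is to split it by the packages you installed on purpose. A minimal sketch, assuming you supply the list of direct dependencies by hand; the helper names are made up for illustration.

```python
import re


def requirement_name(line):
    """Return the normalized project name at the start of a requirement line."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
    if not match:
        return None
    # Normalize as package indexes do: runs of -, _ and . compare equal.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def split_freeze(freeze_lines, direct):
    """Split `pip freeze` lines into (direct pins, everything else)."""
    wanted = {requirement_name(name) for name in direct}
    kept, pulled_in = [], []
    for line in freeze_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Lines without a leading name (e.g. "-e ...") land in pulled_in.
        target = kept if requirement_name(line) in wanted else pulled_in
        target.append(line)
    return kept, pulled_in
```

Keeping only the first list in `requirements.txt` lets pip resolve the rest at install time, at the cost of less reproducible builds than a full freeze.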

------
sloev
If you want to stay in control of deployment, I can recommend Lambdarest
instead. It's much simpler: it only does routing and input marshaling. It has
been in prod since 2017 and is actively maintained (I am doing that ;-)

[https://github.com/trustpilot/python-lambdarest](https://github.com/trustpilot/python-lambdarest)
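
To make "routing and input marshaling" concrete, here is a hand-rolled sketch of what such a layer does inside a Lambda handler. It is not Lambdarest's API. The event fields (`httpMethod`, `path`, `body`) follow the API Gateway proxy event shape; the `route` decorator, its `required` argument, and the `/extract` handler are assumptions for illustration.

```python
import json

ROUTES = {}


def route(method, path, required=()):
    """Register a handler for an HTTP method and an exact path."""
    def register(func):
        ROUTES[(method.upper(), path)] = (func, tuple(required))
        return func
    return register


def respond(status, body):
    """Build a proxy-style response with a JSON body."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


@route("POST", "/extract", required=("url",))
def extract(payload):
    # Hypothetical handler: echoes the URL it was asked to process.
    return {"url": payload["url"]}


def handler(event, context):
    """Lambda entry point: route the request, then parse and check its body."""
    key = ((event.get("httpMethod") or "").upper(), event.get("path"))
    if key not in ROUTES:
        return respond(404, {"error": "no such route"})
    func, required = ROUTES[key]
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return respond(400, {"error": "body is not valid JSON"})
    if not isinstance(payload, dict):
        return respond(400, {"error": "body must be a JSON object"})
    missing = [name for name in required if name not in payload]
    if missing:
        return respond(400, {"error": "missing fields", "fields": missing})
    return respond(200, func(payload))
```

Everything outside these few functions (packaging, API Gateway setup, permissions) stays in your own hands, which is the trade-off against a tool that also manages deployment.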

~~~
artembugara
wow, great. I will have a look today

------
JPKab
Thanks for sharing this. I've never played around with Zappa, so looking
forward to looking at how you used it. The fact that you're parsing text is
icing on the cake, since I love all things NLP.

~~~
artembugara
yeah, Zappa is quite amazing.

Serverless seems like another great tool (supports many languages and many
platforms):
[https://github.com/serverless/serverless](https://github.com/serverless/serverless)

------
naikas82
I was looking for something like this, but with an add-on to summarize the
news text as closely as a human would. Any ideas?

~~~
artembugara
Hey man. Yeah, text summarization sucks a lot at the moment. I would say there
is no such product right now. None that I am aware of, at least.

