
Show HN: English and Spanish news articles summarizer algorithm with word clouds - Agent_Phantom
https://github.com/PhantomInsights/summarizer
======
btutal
Good job, man. Have you ever considered taking it a few steps further?

I would really appreciate an application that goes through my RSS feeds and
offers me neutral news (without any commentary): only the facts, in summary.

Imagine you have 15 different news stories from 15 different sources about the
same topic, let's say "Microsoft's new Chromium-Edge Browser". Each tech site
is writing about it from its own perspective. Some say it is quite cool, some
say it is just a Chrome clone. I would appreciate a summary of these 15
websites without additional commentary.

What do you think?

~~~
Agent_Phantom
I actually really like your idea and it can be implemented very quickly.

This project currently runs on the subreddit of my country and the users
have liked it a lot; the summaries often remove the bias and keep the facts.

I can make a subproject that will load the URLs from an RSS feed and create
shorter summaries. Thankfully, I could reuse 90% of the codebase.
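
As a rough illustration of the first step described above (loading article
URLs from an RSS feed), here is a minimal sketch that uses only the Python
standard library. It is not code from the project: the function name
`extract_links` and the `SAMPLE` feed are hypothetical, and it assumes a plain
RSS 2.0 document where each `<item>` has a `<link>` child.

```python
import xml.etree.ElementTree as ET


def extract_links(rss_xml):
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


# Hypothetical feed used only for illustration.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First</title><link>https://example.com/a</link></item>
<item><title>Second</title><link>https://example.com/b</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for url in extract_links(SAMPLE):
        print(url)
```

Each URL returned this way could then be downloaded and passed to whatever
summarizer is in use.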

~~~
btutal
It could even be a feature for platforms like Feedly. When I think about the
time that I spend checking daily news, I would be willing to pay a small
monthly fee to get that kind of summary.

------
giancarlostoro
Man, I thought word clouds were gone. I remember the word cloud craze in the
mid-to-late 2000s, then they sorta vanished. I guess other SEO enhancements
replaced them?

~~~
Agent_Phantom
Indeed, they are not as popular now, but I thought they looked cool.

Fortunately the effort to implement them was very low since I reused an
internal variable.

------
Theodores
This is really good. Right now I am doing a bit of sentence parsing myself and
I appreciate the time you have put into documenting your algorithms as well
as the tools used.

I am interested in a few metrics such as sentence length to flag run-on
sentences that are not good advertising copy. After reading your article I am
wondering what else I need to be doing since I am working at the word level.
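
The sentence-length metric mentioned above could be sketched roughly as
follows. This is only an illustration in Python (the language of the linked
project), not anything taken from it: the helper names, the naive
punctuation-based sentence splitter, and the 25-word threshold are all
assumptions chosen for the example.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def flag_long_sentences(text, max_words=25):
    """Return (word_count, sentence) pairs for sentences over max_words."""
    flagged = []
    for sentence in split_sentences(text):
        count = len(sentence.split())
        if count > max_words:
            flagged.append((count, sentence))
    return flagged


if __name__ == "__main__":
    copy = "Buy now. " + " ".join(["very"] * 30) + " long."
    for count, sentence in flag_long_sentences(copy):
        print(count, sentence[:40])
```

A splitter this simple will stumble on abbreviations such as "e.g." or
"Mr.", so real copy would probably need a proper tokenizer.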

I remember Microsoft Word had tools in it to gauge reading level - do you know
if there is a convenient library for that in the Python world? I am not using
Python myself, but there is a difference between a tabloid and a broadsheet;
maybe you could put that into the mix.

~~~
Agent_Phantom
I think this library does exactly what you need:

[https://github.com/shivam5992/textstat](https://github.com/shivam5992/textstat)

~~~
Theodores
Cough cough - still using PHP...

[https://github.com/DaveChild/Text-Statistics](https://github.com/DaveChild/Text-Statistics)

Cooking on gas though.

Thanks again for your original post.

------
Agent_Phantom
At the following URL you can see the final result after an article is
processed:

[https://www.reddit.com/user/huachibot/comments/](https://www.reddit.com/user/huachibot/comments/)

The good part is that the summary algorithm is independent from the bot logic.
It processes the text no matter how you obtained it.
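
To illustrate what that separation can look like, here is a minimal sketch of
a summarizer written as a pure function of the text. It is a generic
word-frequency extractive approach, not the project's actual algorithm (see
the repository for that); the name `summarize`, the tiny `STOP_WORDS` set, and
the regex-based tokenizing are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

WORD_RE = re.compile(r"[a-záéíóúñü]+")


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = [w for w in WORD_RE.findall(text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)

    def score(sentence):
        # Stop words are absent from freq, so they contribute zero.
        return sum(freq[w] for w in WORD_RE.findall(sentence.lower()))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    article = ("Edge now uses Chromium. The weather was nice. "
               "Chromium powers Edge and Chrome.")
    print(summarize(article))
```

Because the function only takes a string, the same call works whether the
text came from a Reddit bot, an RSS feed, or a local file.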

