Show HN: HackerNews RealTime Summarization Using Machine Learning (hn10.org)
10 points by bexp 2 hours ago | 9 comments





Good summary. How is hn10 built? It would be cool to summarize other sites' feeds with the same tech.

I used NLTK in Python (http://nltk.org) plus Twisted for serving HTTP.

Does it summarize our comment threads or the pages it links to?

Just pages for now.

Alright! Did you ever try to run a summarizer on a post's comment section? It pretty much creates an "article" for you. I've always thought that article generation from comment threads is an area that should be explored.

This would really benefit from an explanation of what I'm seeing. I can't really upvote it without knowing what's going on.

There are a couple of things here: 1. scraping the web page to get the text content, 2. using NLTK to process the text and get a summary and keywords, 3. wrapping it in a REST API and serving it as a web service.
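
For anyone curious what steps 1 and 2 can look like, here is a rough sketch. The thread doesn't say which NLTK functions hn10 actually calls, so this is not its implementation: it uses only the Python standard library, with a regex tokenizer and a tiny hand-picked stopword list standing in for NLTK's tokenizers and stopword corpus. The names `TextExtractor`, `html_to_text`, `content_words` and `summarize` are made up for illustration. The technique is plain word-frequency extractive summarization: count content words, score each sentence by the counts of its words, keep the top few in their original order.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stopword list (an assumption); NLTK ships a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


class TextExtractor(HTMLParser):
    """Step 1: collect visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def content_words(text):
    # Lowercase, split into words, drop stopwords.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2, max_keywords=5):
    """Step 2: score each sentence by the frequency of the words it contains."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))
    scores = {
        i: sum(freq[w] for w in content_words(s)) for i, s in enumerate(sentences)
    }
    # Pick the highest-scoring sentences, then restore document order.
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:max_sentences])
    return {
        "summary": " ".join(sentences[i] for i in top),
        "keywords": [w for w, _ in freq.most_common(max_keywords)],
    }
```

The dict returned by `summarize(html_to_text(page_html))` is the kind of thing step 3 would serialize to JSON and return from the web service. Note that summing word counts favors long sentences; dividing each score by the sentence length is a common tweak.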

You could probably get cleaner input for step 1 via the Mercury API [https://mercury.postlight.com/web-parser/]. It has a lot of affordances for different kinds of HTML formatting.

Thanks, will try it at some point. My biggest concern was that those kinds of APIs are almost all paid and rarely open source.

