Show HN: HackerNews RealTime Summarization Using Machine Learning
(hn10.org)
10 points
by
bexp
2 hours ago
9 comments
acd
35 minutes ago
Good summary. How is hn10 built? It would be cool to summarize other sites' feeds with the same tech.
bexp
30 minutes ago
I used NLTK in Python (http://nltk.org) + Twisted for serving HTTP
Raphmedia
28 minutes ago
Does it summarize our comment threads or the pages it links to?
bexp
23 minutes ago
Just pages for now.
Raphmedia
20 minutes ago
Alright! Did you ever try to run a summarizer on a post's comment section? It pretty much creates an "article" for you. I've always thought that article generation from comment threads is an area that should be explored.
alistproducer2
2 hours ago
This would really benefit from an explanation of what I'm seeing. Can't really upvote it without knowing what's going on...
bexp
26 minutes ago
There are a couple of things here: 1. scraping the web page to get the text content 2. using NLTK to process the text and get a summary and keywords 3. wrapping it in a REST API and serving it as a web service
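For readers who want to see the shape of steps 1 and 2, here is a minimal sketch. It is not the hn10 code: it assumes a simple frequency-based extractive summarizer and uses only the Python standard library so it runs without downloading NLTK data. The names `TextExtractor`, `html_to_text`, `summarize`, and the tiny `STOPWORDS` set are all hypothetical; in an NLTK version, `sent_tokenize`, `word_tokenize`, and the stopwords corpus would likely replace the regexes and the hand-written list.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stopword list (an assumption, not what hn10 uses).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it",
             "for", "on", "with", "that", "this", "as", "are", "was", "be"}


class TextExtractor(HTMLParser):
    """Step 1: collect visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    # Lowercased word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=2, n_keywords=5):
    """Step 2: rank sentences by the average frequency of their terms."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:n_sentences]
    # Emit the selected sentences in their original order.
    summary = [sentences[i] for i in sorted(ranked)]
    keywords = [w for w, _ in freq.most_common(n_keywords)]
    return summary, keywords
```

Calling `summarize(html_to_text(page_html))` returns a list of summary sentences and a list of keywords, which is the kind of payload step 3 would serialize to JSON and serve over HTTP.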
jeffehobbs
23 minutes ago
You could probably get cleaner input for step 1 via the Mercury API [https://mercury.postlight.com/web-parser/]; it has a lot of affordances for different kinds of HTML formatting.
bexp
10 minutes ago
Thanks, will try it at some point. My biggest concern was that those kinds of APIs are almost all paid and rarely open source.
