Hacker News
Show HN: Graph of Wikipedia articles' semantic similarity (LSI, Python, d3.js) (similarityapi.appspot.com)
54 points by lucamartinetti on March 21, 2012 | 13 comments


A small experiment in visualizing Wikipedia articles as a graph using d3.js.

Articles with more traffic are drawn bigger. I computed the semantic similarity using LSI in Python (gensim). You have to scroll down/right a bit!
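The post doesn't give the gensim settings (number of topics, TF-IDF weighting, preprocessing), so what follows is only a minimal NumPy sketch of what LSI does, not the author's pipeline: build a term-document count matrix, keep a truncated SVD, and compare documents by cosine similarity in the reduced space. The function name and the raw-count weighting are assumptions. In gensim, the comparable steps would typically be `corpora.Dictionary`, a bag-of-words corpus, `models.LsiModel`, and `similarities.MatrixSimilarity`.

```python
import numpy as np

def lsi_similarities(docs, k=2):
    """Cosine-similarity matrix of documents in a k-dimensional LSI space.

    Hypothetical helper: raw term counts, whitespace tokenization, no TF-IDF.
    """
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document matrix: rows are terms, columns are documents.
    a = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            a[index[w], j] += 1
    # Truncated SVD: keep only the k largest singular values/vectors.
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ vt[:k]).T  # one k-dim row per document
    # Normalize rows (guarding against zero vectors), then dot products = cosines.
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    unit = doc_vecs / np.where(norms == 0, 1, norms)
    return unit @ unit.T
```

Two documents that share vocabulary ("replicant", "android", "film") end up pointing the same way in the reduced space, while an unrelated one lands orthogonal to them; ranking a row of this matrix gives the "most related articles" for a page.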

http://similarityapi.appspot.com/graph/?title=blade%20runner

There is also a JSON API: http://similarityapi.appspot.com/api/v1/?limit=100&title...

All feedback is appreciated:

@lucamartinetti luca@luca.io


I've had much, much better results with LDA than LSI. Give that a shot if you have a chance; you'll be blown away. The stop-word ratio matters, and set the maximum number of tokens to 500,000.
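One reading of the advice above (an assumption, since the comment doesn't spell it out): "stop-word ratio" means dropping terms that occur in too large a fraction of documents, and "max number of tokens" means capping the vocabulary at 500,000 terms before training LDA. In gensim that corresponds to `Dictionary.filter_extremes(no_below=..., no_above=..., keep_n=500000)` followed by `models.LdaModel(corpus, id2word=dictionary, num_topics=...)`. The pure-Python sketch below only illustrates that filtering step; the function name is hypothetical, and its `no_below`/`no_above` defaults follow gensim's.

```python
from collections import Counter

def filter_vocabulary(docs, no_below=5, no_above=0.5, keep_n=500_000):
    """Return the tokens kept after document-frequency filtering.

    docs: list of token lists. Drops tokens found in fewer than `no_below`
    documents or in more than a `no_above` fraction of documents
    (stop-word-like terms), then keeps at most `keep_n` of the rest.
    """
    # Document frequency: each token counts at most once per document.
    df = Counter(tok for doc in docs for tok in set(doc))
    max_df = no_above * len(docs)
    kept = [(tok, n) for tok, n in df.items() if no_below <= n <= max_df]
    # Prefer the most widespread tokens; break ties alphabetically.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return {tok for tok, _ in kept[:keep_n]}
```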


The JSON API should degrade gracefully when no results are found, i.e. it should return a JSON message explaining that the item doesn't exist.
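A minimal sketch of the graceful failure suggested above: unknown titles get an HTTP 404 status together with a JSON body that says what went wrong, rather than an empty or broken response. The function name, the `related` lookup table, and the `error`/`message`/`results` field names are all hypothetical; the thread doesn't describe the API's actual response schema.

```python
import json

def api_response(title, related, limit=100):
    """Return (HTTP status, JSON body) for a similarity lookup.

    related: hypothetical mapping from a normalized title to a list of
    (related_title, score) pairs, best first.
    """
    if title not in related:
        body = {"error": "not_found",
                "message": f"No article titled {title!r} in the index."}
        return 404, json.dumps(body)
    results = [{"title": t, "score": s} for t, s in related[title][:limit]]
    return 200, json.dumps({"title": title, "results": results})
```

Returning a non-200 status as well as the message lets clients branch on the status code without parsing the body first.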


Right! It could use some input checking and normalization too. Right now it expects the title parameter to be lowercase.
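Since the API currently expects a lowercase `title`, a client (or the server itself) could normalize input before building the request. This is a sketch under assumptions: the helper names are hypothetical, and treating underscores like spaces (as in Wikipedia URLs) is a guess at reasonable normalization, not documented behavior. Spaces are percent-encoded as `%20`, matching the graph URL in the post.

```python
import re
from urllib.parse import quote, urlencode

API_BASE = "http://similarityapi.appspot.com/api/v1/"

def normalize_title(raw):
    """Collapse runs of whitespace/underscores to one space, trim, lowercase."""
    return re.sub(r"[\s_]+", " ", raw).strip().lower()

def api_url(raw_title, limit=100):
    """Build a JSON API request URL from arbitrary user input."""
    query = urlencode({"limit": limit, "title": normalize_title(raw_title)},
                      quote_via=quote)  # quote() encodes spaces as %20
    return f"{API_BASE}?{query}"
```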


An option to select the language edition would be a good feature (defaulting to en, as now).


How much data did you use for the semantic analysis?


The full text of all articles in the English Wikipedia (then filtered to those with more than 1k views last month).
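The view-count filter described above could be sketched as follows. The reply doesn't say where the traffic numbers came from; this assumes Wikimedia's pagecount dump format, one `<project> <page_title> <views> <bytes>` record per line, summed over a month's files. The function name is hypothetical, and real dump titles may also need URL-decoding, which this sketch skips.

```python
from collections import Counter

def popular_titles(lines, project="en", min_views=1000):
    """Sum views per title for one project; keep titles above `min_views`.

    Assumes pagecount-style lines: "<project> <page_title> <views> <bytes>".
    Malformed lines and other projects are skipped.
    """
    views = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != project or not parts[2].isdigit():
            continue
        # Dumps use underscores in titles; store them with spaces instead.
        views[parts[1].replace("_", " ")] += int(parts[2])
    return {title for title, n in views.items() if n > min_views}
```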


I've never liked these scrolling animations. You need too much precision to see a part of the page clearly, while with normal scrolling it wouldn't matter if the information you're reading is at the bottom or top of the screen.


Does the font size of a node represent its similarity to the query page?


It represents the article's traffic. The ten most related articles are displayed for each expanded node. Articles with more inbound links are darker.
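The encoding described in that reply (font size from traffic, darkness from inbound links, ten related articles per expanded node) can be sketched as a function that grows the node/link data a d3 force layout consumes. The real app did this client-side in d3.js; this Python version is only illustrative. The function name, field names, log scale for font size, and linear scale for darkness are all assumptions.

```python
import math

def expand_node(graph, title, related, traffic, inbound, n=10):
    """Add `title`'s n most related articles to a d3-style graph dict.

    graph:   {"nodes": [...], "links": [...]}; links reference node indices
    related: title -> list of (other_title, similarity), best first
    traffic: title -> monthly views (drives font size)
    inbound: title -> inbound link count (drives darkness)
    """
    index = {node["title"]: i for i, node in enumerate(graph["nodes"])}
    seen = {(l["source"], l["target"]) for l in graph["links"]}
    max_links = max(inbound.values(), default=1) or 1

    def node_index(t):
        if t not in index:
            graph["nodes"].append({
                "title": t,
                # Log scale so a few very popular pages don't dwarf the rest.
                "font_size": 10 + 4 * math.log10(1 + traffic.get(t, 0)),
                # 0.0 = lightest, 1.0 = darkest.
                "darkness": inbound.get(t, 0) / max_links,
            })
            index[t] = len(graph["nodes"]) - 1
        return index[t]

    source = node_index(title)
    for other, score in related.get(title, [])[:n]:
        target = node_index(other)
        # Skip links already present in either direction.
        if (source, target) in seen or (target, source) in seen:
            continue
        graph["links"].append({"source": source, "target": target, "value": score})
        seen.add((source, target))
    return graph
```

Calling it again on any displayed node expands the graph further, which matches the "expanded node" behavior described in the thread.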


Blank page in Chrome.


Down?


Not for me. You need a modern browser (Chrome or Firefox), and you have to scroll a bit.



