Show HN: Graph of Wikipedia articles' semantic similarity (LSI, Python, d3.js)

lucamartinetti · on March 21, 2012

A small experiment in visualizing Wikipedia articles as a graph using d3.js.

Articles with more traffic are bigger. I computed the semantic similarity using LSI with Python (gensim). You have to scroll down/right a bit!
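
For readers who want a feel for the "similarity" step, here is a minimal sketch. It is not the LSI pipeline used for the graph: it only shows cosine similarity over raw bag-of-words counts, which is the comparison an LSI model would apply after projecting documents into a lower-dimensional topic space. The function names (`bow`, `cosine`, `most_similar`) and the `docs` mapping of title to text are made up for illustration.

```python
import math
from collections import Counter


def bow(text):
    """Lower-cased, whitespace-tokenized bag of words (term -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, docs, limit=10):
    """Rank every other title in docs (title -> text) against the query title."""
    q = bow(docs[query])
    scored = [(title, cosine(q, bow(text)))
              for title, text in docs.items() if title != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]
```

Plain word counts only match documents that share exact terms; the point of LSI is to also relate documents that use different words for the same topic.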

http://similarityapi.appspot.com/graph/?title=blade%20runner

There is also a JSON API: http://similarityapi.appspot.com/api/v1/?limit=100&title...

All feedback is appreciated:

@lucamartinetti luca@luca.io

3pt14159 · on March 21, 2012

I've had much, much better results with LDA than LSI. Give that a shot if you have a chance; you'll be blown away. Stop word ratios are important, and make the max number of tokens 500,000.
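
A rough sketch of what that vocabulary pruning could look like, assuming "stop word ratio" means dropping any token that appears in more than a given fraction of documents (that reading, the 0.5 default, and the name `prune_vocabulary` are assumptions, not something stated above). gensim's `Dictionary.filter_extremes` exposes similar knobs through `no_above` and `keep_n`; the version below is plain Python so the logic is visible.

```python
from collections import Counter


def prune_vocabulary(tokenized_docs, max_doc_ratio=0.5, max_tokens=500_000):
    """Return the set of tokens worth keeping for topic modelling."""
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))  # count each token once per document

    n_docs = len(tokenized_docs)
    # Drop tokens found in too large a share of documents (likely stop words).
    kept = {tok: df for tok, df in doc_freq.items()
            if df <= max_doc_ratio * n_docs}
    # Keep the most frequent survivors, up to the cap; ties break alphabetically.
    top = sorted(kept, key=lambda tok: (-kept[tok], tok))[:max_tokens]
    return set(top)
```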

viscanti · on March 21, 2012

The JSON API should degrade gracefully if results aren't found, i.e., there should be a JSON message explaining that the item doesn't exist.
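
Something along these lines, as a sketch only. The `index` dict (title mapped to a list of related titles), the function name, and the `error` / `message` / `results` field names are all hypothetical; the real API's response shape isn't shown in this thread.

```python
import json


def similar_response(title, index, limit=100):
    """Return (status_code, json_body) for a similarity lookup.

    `index` is a hypothetical dict mapping a title to its related titles.
    """
    results = index.get(title)
    if results is None:
        # Unknown title: answer with a JSON error instead of an empty page.
        body = {
            "error": "not_found",
            "message": "No article titled %r in the index." % title,
        }
        return 404, json.dumps(body)
    return 200, json.dumps({"title": title, "results": results[:limit]})
```

Returning a 404 together with a JSON body lets clients tell "no such article" apart from "article found, but nothing is similar to it".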

lucamartinetti · on March 21, 2012

Right! It could use some input checking and normalization too. Right now it expects the title parameter to be lowercase.
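
A minimal sketch of that normalization, assuming the title arrives already URL-decoded. Treating underscores as spaces (as in Wikipedia URLs) is an assumption, and `normalize_title` is a made-up name.

```python
import unicodedata


def normalize_title(raw):
    """Trim, collapse whitespace, and lower-case a title query parameter."""
    if raw is None:
        raise ValueError("missing title parameter")
    text = unicodedata.normalize("NFC", raw)
    # Assumption: underscores stand for spaces, as in Wikipedia URLs.
    text = " ".join(text.replace("_", " ").split())
    if not text:
        raise ValueError("empty title parameter")
    return text.lower()
```

With this in front of the lookup, `Blade Runner`, `blade_runner`, and `  BLADE   RUNNER ` would all resolve to the same key.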

rplnt · on March 21, 2012

An option to select the language version could be a good feature (defaulting to en, as now).

Radim · on March 21, 2012

How much data did you use for the semantic analysis?

lucamartinetti · on March 21, 2012

The whole text of all articles from English Wikipedia (then filtered down to those with more than 1k views last month).

edo-codes · on March 21, 2012

I've never liked these scrolling animations. You need too much precision to see a part of the page clearly, while with normal scrolling it wouldn't matter if the information you're reading is at the bottom or top of the screen.

stephengoodwin · on March 21, 2012

Does the font size for a node represent its similarity with the query page?

lucamartinetti · on March 21, 2012

It represents the traffic of the article. The ten most related articles are displayed for each expanded node. Articles with more inbound links are darker.
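
One way such a mapping could be computed before handing nodes to the front end, as a sketch under assumptions: the log scale, the 10 to 40 px font range, the 0.2 to 1.0 opacity range, and the names `scale` and `style_node` are all invented for illustration and are not taken from the actual graph code. Only the 1k-view floor comes from this thread.

```python
import math


def scale(value, lo, hi, out_min, out_max):
    """Map value from [lo, hi] onto [out_min, out_max] on a log scale, clamped."""
    if hi <= lo:
        return out_min
    value = min(max(value, lo), hi)
    # log1p keeps the mapping defined when lo or value is 0.
    frac = (math.log1p(value) - math.log1p(lo)) / (math.log1p(hi) - math.log1p(lo))
    return out_min + frac * (out_max - out_min)


def style_node(views, inbound_links, max_views, max_links):
    """Font size grows with traffic; opacity (darkness) grows with inbound links."""
    font_px = scale(views, 1000, max_views, 10, 40)
    opacity = scale(inbound_links, 0, max_links, 0.2, 1.0)
    return {"font_px": round(font_px, 1), "opacity": round(opacity, 2)}
```

A log scale is used here because page views and link counts tend to be heavily skewed; a linear scale would leave most labels at the minimum size.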

lucian1900 · on March 21, 2012

Blank page in Chrome.

ssn · on March 21, 2012

Down?

lucamartinetti · on March 21, 2012

Not for me. You need a modern browser (Chrome or Firefox), and you have to scroll a bit.