
> but had fun doing it

Well, that's more than a lot of people can say about certain projects!

Want to give us a run-down of how you put it together?




Definitely. It was actually pretty simple once I figured out what I was going to do. I started from the link graph put together by Henry Haselgrove (http://users.on.net/~henry/home/wikipedia.htm), which I found while looking through the EC2 public datasets. After that there were a few easy steps.

1) Flipped the link graph from outgoing to incoming, so that for any page I can see what links to it.

2) Found all the distances and paths iteratively by exploding outward from the Adolf_Hitler page (a breadth-first search). http://www.johnandcailin.com/blog/cailin/breadth-first-graph... and blogs like it were very helpful.

3) Loaded the data into a large binary file, which I divided into indexed parts, compressed, and uploaded to App Engine to extract and load into Bigtable. (This took the most time, both to run and to write the code to make it work!)

4) ??

5) profit
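
Steps 1 and 2 above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the code that was actually run: the toy link graph and the helper names `invert_links`, `distances_to`, and `path_to_target` are made up for the example, and the real dataset would need a far more memory-conscious representation than Python dicts.

```python
from collections import defaultdict, deque

def invert_links(outgoing):
    """Step 1: turn 'page -> pages it links to' into 'page -> pages linking to it'."""
    incoming = defaultdict(set)
    for source, targets in outgoing.items():
        for target in targets:
            incoming[target].add(source)
    return incoming

def distances_to(target, incoming):
    """Step 2: breadth-first search outward from target over incoming links.

    Returns {page: (distance, next_page)}, where next_page is the link to
    follow from page to get one step closer to target (None for target itself).
    """
    found = {target: (0, None)}
    queue = deque([target])
    while queue:
        page = queue.popleft()
        distance = found[page][0]
        # sorted() only makes the traversal order deterministic
        for linker in sorted(incoming.get(page, ())):
            if linker not in found:
                found[linker] = (distance + 1, page)
                queue.append(linker)
    return found

def path_to_target(page, found):
    """Follow the stored next_page pointers from page to the target."""
    path = [page]
    while found[path[-1]][1] is not None:
        path.append(found[path[-1]][1])
    return path

# Hypothetical toy link graph (assumption, not real Wikipedia data)
outgoing = {
    "A": ["B", "C"],
    "B": ["Adolf_Hitler"],
    "C": ["B"],
    "D": ["A"],
    "Adolf_Hitler": ["A"],
}
incoming = invert_links(outgoing)
found = distances_to("Adolf_Hitler", incoming)
print(path_to_target("D", found))
```

Searching backwards from the target over the inverted graph means a single pass gives the distance and next hop for every page that can reach it, instead of running one search per starting page.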


Cool. How long did it take and how big is it?

Also, what's the longest path?



