

The Google PageRank Algorithm in 126 Lines of Python - hhm
http://kraeutler.net/vincent/essays/google%20page%20rank%20in%20python

======
pc
"I was surprised by the simplicity of the math underlying the google PageRank
algorithm, and the ease with which it seemed to be efficiently implementable"

It's actually not all that easy to implement this stuff efficiently when n
grows to any sort of useful size. E.g. the article uses an n x n adjacency
matrix, so if we take n = 1 million pages, we need 4TB just to represent it,
assuming no overhead. (Adjacency lists help here.)

See
[http://dbpubs.stanford.edu:8090/pub/showDoc.Fulltext?lang=en...](http://dbpubs.stanford.edu:8090/pub/showDoc.Fulltext?lang=en&doc=2003-16&format=pdf&compression=&name=2003-16.pdf).

(I hacked together a script to compute this on MediaWiki wikis a while ago, by
examining the DB directly -- <http://collison.ie/code/pagerank.rb>)

~~~
vincentk
As the author of the above article, I'd like to point out that the adjacency
matrix is extremely sparse. If you use a sparse representation of the link
matrix (as I have done -- your claim is incorrect), its space usage will scale
as O(N), and not as O(N^2) as you suggest.

By that of course I don't mean to say you can use this script to build your
own google.... but the problems lie elsewhere.

------
eru
On of our professors used PageRank as an example to introduce matrices. He
explained the algorithm in class. (Not this particular python implemention
though.)

------
rms
The original algorithm, I presume... I would think they're up to millions of
lines of python/c++ by now.

------
tlrobinson
Next: "Linux in 62 lines of code!!!"

~~~
trekker7
Bah, PG could write that in Lisp on the back of a napkin.

~~~
ph0rque
With arc, he'd need just a postage stamp.

<http://reddit.com/r/programming/info/6710p/comments/c0312wv>

~~~
timr
A bit more for the unicode support, though.

