

Q: Why is PageRank so complicated? - Trindaz

A potentially silly question that might hurt our pending YCombinator application, but I've got to ask it anyway:

I really have no idea how PageRank works, and I'm interested in where the magic lies. Is it in the way they create an index that can quickly be searched even though it's got billions of entries? I've read a lot about how it's based on the number of links pointing to pages, but isn't that just a matter of creating a big graph of all web pages and counting the links between them?

I'd appreciate any info on specifically what technical problems have been so ingeniously solved to create PageRank.
======
pbhjpbhj
<http://en.wikipedia.org/wiki/PageRank> is pretty good, if you want to see the
original patent then there's a link there.

Of course there are many alterations now to the original ranking algo.

Good quality links with varied text from a wide variety of high-authority
domains along with a well structured internal link graph should be all the
magic you need to get a high PR.

SEOmoz's annual ranking-factors report is a good read on the practicalities.

------
Gibbon
If you want to read up on PageRank and other search algorithms, this book goes
into more detail:
<http://www.amazon.com/Googles-PageRank-Beyond-Science-Rankings/dp/0691122024>

Larry joked that Sergey just wanted to see how cool he was by measuring how
many people were linking to him.

------
cperciva
PageRank isn't complicated. It's just an eigenvector.
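That one-liner can be made concrete with a small power-iteration sketch (a toy
4-page graph I've made up for illustration, not anything from Google): the
PageRank vector is the dominant eigenvector of the damped "Google matrix".

```python
import numpy as np

# Hypothetical 4-page link graph: page j links to the pages listed for j.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = 4
damping = 0.85

# Column-stochastic link matrix: column j splits page j's rank among its targets.
M = np.zeros((n, n))
for j, targets in links.items():
    for i in targets:
        M[i, j] = 1.0 / len(targets)

# Google matrix: damped link-following plus uniform random teleportation.
G = damping * M + (1 - damping) / n * np.ones((n, n))

# Power iteration: repeatedly apply G until the rank vector stops changing.
r = np.ones(n) / n
for _ in range(100):
    r_next = G @ r
    if np.linalg.norm(r_next - r, 1) < 1e-12:
        r = r_next
        break
    r = r_next

print(r)  # the stationary distribution, i.e. the dominant eigenvector of G
```

The loop converges because G is stochastic with spectral radius 1, so the
dominant eigenvector (eigenvalue 1) is exactly the PageRank vector.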

------
korch
It's not just about eigenvectors, though that's the coolest part. The "magic"
is that PageRank also rests on stochastic processes: the random surfer is a
Markov chain, and the ergodic theorem guarantees a unique stationary
distribution (the PageRank vector) exists no matter how large the chain grows.
Therefore it doesn't matter how big or complex the link structure of the web
gets; it will scale.

See this:

"For any matrix A = [cP + (1-c)E]' where P is an n×n row-stochastic matrix, E
is a nonnegative n×n rank-one row-stochastic matrix, and 0 ≤ c ≤ 1, the
second eigenvalue of A has modulus less than or equal to c. Furthermore, if P
has at least two irreducible closed subsets, the second eigenvalue is equal to
c.

This statement has implications for the convergence rate of the standard
PageRank algorithm as the web scales, for the stability of PageRank to
perturbations to the link structure of the web, for the detection of Google
spammers, and for the design of algorithms to speed up PageRank."

<http://answers.google.com/answers/threadview/id/379557.html>
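The quoted bound is easy to check numerically on a small random example (the
20-node row-stochastic matrix below is an arbitrary assumption, not from the
thread): the second-largest eigenvalue modulus of A never exceeds the damping
constant c, which is what pins down the power method's convergence rate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 20, 0.85

# Random row-stochastic link matrix P (toy stand-in for the web graph).
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)

# Rank-one row-stochastic teleportation matrix E (uniform jump).
E = np.ones((n, n)) / n

# The matrix from the quoted theorem: A = [cP + (1-c)E]' (transpose).
A = (c * P + (1 - c) * E).T

# Eigenvalue moduli in descending order: |lambda_1| = 1, |lambda_2| <= c.
moduli = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
print(moduli[:2])
```

Since power iteration's error shrinks roughly like |lambda_2|^k, the bound
means convergence speed depends on c, not on how big the web gets.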

~~~
Trindaz
Thanks for this, korch. Very interesting.

