

PageRank is bad math: discussion - pixcavator
http://inperc.com/blog2/2011/07/05/pagerank-is-bad-math-discussion/

======
drx
Regardless of whether PageRank is "bad math" (the author being the arbiter of
what's bad), it was never about being formally anal, it was about solving a
problem -- making search much, much better than the then-competition.

PageRank solves the problem with flying colors. There is nothing wrong with
having hidden constants that you tweak until you get the results you want. The
alternative would be to, instead of coding what has become Google, attempt to
find a more general solution. Maybe you'll find it. Maybe. And if you do, by
the time you have, someone else will have come and made Google instead of you.
And for what? Mathematical purity? Phobia of constants?
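One of the tunable constants in PageRank is the damping factor d; the 1998 paper reports using d = 0.85. As a minimal sketch (not Google's implementation), here is the common normalized power-iteration form, where each page receives (1 - d)/N plus d times the rank flowing in along its incoming links. The function name `pagerank`, the toy link graph, the iteration cap, and the choice to spread a dangling page's rank evenly over all pages are assumptions made for illustration.

```python
def pagerank(links, d=0.85, max_iter=200, tol=1e-12):
    """Normalized PageRank by power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    d: damping factor, the tunable constant under discussion.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Every page gets the "random jump" share (1 - d) / n.
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Split d * rank[p] evenly across p's outgoing links.
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Assumption: a dangling page spreads its rank over all pages.
                share = d * rank[p] / n
                for q in pages:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the scores have settled
            break
    return rank


# Hypothetical three-page web: a -> b, a -> c, b -> c, c -> a.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
for d in (0.5, 0.85):
    scores = pagerank(links, d=d)
    print(d, {p: round(v, 3) for p, v in scores.items()})
```

Changing d shifts the scores without changing the link graph, which is the kind of tweakable constant this thread is arguing about.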

I suppose the author also feels that much of physics is bad, since it's
riddled with constants upon constants, all of which are "ticking time bombs":
<http://en.wikipedia.org/wiki/Physical_constant>

~~~
pixcavator
Google was successful then with PageRank, but maybe now they need something
better? It is my personal opinion, but building something as huge as Google
search is now on a foundation as shaky as PageRank is dangerous.

I don’t think physical constants are made up and they certainly aren’t hidden.

~~~
Locke1689
PageRank is one of more than 200 metrics Google uses in its search engine.
Your presumption that it is the foundation, or even the most important metric,
is unwarranted.

~~~
pixcavator
Initially it _was_ the foundation; as for now, let’s just admit that neither
of us knows.

~~~
Locke1689
Well, how search generally works now was actually one of the first things I
learned when I started working for Google, but my point was that you don't
know and are just making things up.

~~~
pixcavator
OK, maybe you do know. However, it's odd that you bring up that you work at
Google and then add nothing new to the discussion (everybody knows about the
200 metrics). How about: "You're wrong, PageRank has _never_ been the
foundation of Google search"?

~~~
Locke1689
I never said PageRank has never been the foundation of Google search or that
PageRank isn't a part of search, just that there are other parts that you are
not taking into account.

As to why I'm not saying exactly how Google search works -- I'm obviously
under confidentiality agreements to not disclose that information.

------
akie
The math in the original 1998 PageRank paper might not be 100% sound, but why
would they need that in the first place? Do you really think you need a
formal analysis before you build something? This is not academia, you know -
if you needed a formal proof of everything you do, you'd never get anything
done.

Besides, the paper you're referring to is 13 years old. Why drag it up now?

~~~
pixcavator
They found time to publish this in 1998 but have been too busy since? I don’t
think so. The reason is that they decided to keep all new development secret.
That’s why we have to speculate.

~~~
Locke1689
Your speculation is in no way sound.

------
justin_vanw
"But π=3.14159265358979 is a time bomb! Sooner or later it will fail you when
it’s not accurate enough anymore."

Well, actually for almost everything humans do, this will never, ever fail
you. In fact, I can't think of a single thing this will fail for outside of
physics research or formal mathematics.

------
VMG
It may be flawed math, but it is solid engineering.

~~~
pixcavator
Is manually penalizing the ranks of spam sites solid engineering too?

~~~
justin_vanw
Well, sure. Google is a machine that turns small text queries into internet
links. It does this remarkably well. If manually penalizing sites gets you
solid improvements beyond what automated or clever methods give you, it is bad
engineering to not do it. What's the argument against this? Sure, it makes
engineers feel icky to have manual anything, and it's not nearly as scalable
or reliable as some hypothetical automated solution. The issue is that the
hypothetical automated solution doesn't exist. I'm sure they have 500 of the
smartest, most capable people in the world working on this every single day;
that they haven't found it is good evidence that the solution is not obvious.

Surely a fully automated, self correcting algorithm exists to penalize spam
sites and shitty content farms, and whoever comes up with it will be _heavily_
rewarded financially for doing so, either because they will make major inroads
in the search space, or because there will be a bidding war for the
technology. Since nobody outside of Google has come up with it either, it is
likely beyond the engineering ability of humans at the moment.

------
jcampbell1
> But π=3.14159265358979 is a time bomb! Sooner or later it will fail you when
> it’s not accurate enough anymore.

If you know the exact diameter of the sun and calculate the circumference
with 3.14159265358979 as an estimate for pi, your error will be only a few
microns. Using a 14-decimal-place estimate of pi is never going to be a time
bomb for any practical task. If the earth were round to 14 significant digits,
the highest mountains would tower no more than about 100 nanometers above the
deepest valley.
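These magnitudes are easy to check with exact decimal arithmetic. A minimal sketch, assuming a mean solar diameter of about 1.3927e9 m and a mean Earth radius of about 6.371e6 m (approximate reference values, not taken from the comment); the helper names are hypothetical:

```python
from decimal import Decimal, getcontext

getcontext().prec = 40  # plenty of precision for these products

# pi to 30 decimal places, used as the "exact" reference value
PI_REF = Decimal("3.141592653589793238462643383279")
PI_14 = Decimal("3.14159265358979")  # the 14-decimal-place estimate quoted above

SUN_DIAMETER_M = Decimal("1.3927e9")  # assumption: approximate mean solar diameter
EARTH_RADIUS_M = Decimal("6.371e6")   # assumption: approximate mean Earth radius


def circumference_error(diameter_m):
    """Error in metres from using PI_14 instead of pi in C = pi * d."""
    return diameter_m * (PI_REF - PI_14)


def last_digit_step(value_m, sig_digits):
    """Size of one unit in the last place when value_m has sig_digits significant digits."""
    # adjusted() gives the exponent of the most significant digit (6 for 6.371e6).
    exponent = value_m.adjusted() - sig_digits + 1
    return Decimal(10) ** exponent


sun_err = circumference_error(SUN_DIAMETER_M)
earth_step = last_digit_step(EARTH_RADIUS_M, 14)

print(f"Sun circumference error: {float(sun_err) * 1e6:.2f} micrometres")
# Rounding to 14 significant digits leaves each radius within half a step,
# so peak-to-valley is bounded by one full step.
print(f"Earth radius, last of 14 digits: {float(earth_step) * 1e9:.0f} nm")
```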

~~~
pixcavator
You are saying, I think, that we won’t ever need to know π beyond some
accuracy. By saying that, you are (if you are right) doing _good math_.

