
The Anatomy of a Large-Scale Hypertextual Web Search Engine - aduffy
http://infolab.stanford.edu/~backrub/google.html
======
danso
A great paper, not just because of its historical significance, but because of
its overall readability and insights even today. For people new to algorithms,
I use it as an example of how valuable judgments -- as far as humans are
concerned -- can be made at scale by analyzing something very mechanical (e.g.
backlinks, font size/weight as a heuristic for word-weighting). And how
evaluating algorithms ultimately involves subjective intuition and judgment.

IIRC Google didn't mine its logs to analyze user behavior until a few years
in, which means that Brin's and Page's assessment that BackRub was returning
better search results was entirely subjective, in the "it's just common
sense" way:

> _For example, we have seen a major search engine return a page containing
> only "Bill Clinton Sucks" and picture from a "Bill Clinton" query. Some
> argue that on the web, users should specify more accurately what they want
> and add more words to their query. We disagree vehemently with this
> position. If a user issues a query like "Bill Clinton" they should get
> reasonable results since there is a enormous amount of high quality
> information available on this topic. Given examples like these, we believe
> that the standard information retrieval work needs to be extended to deal
> effectively with the web._...

> _While evaluation of a search engine is difficult, we have subjectively
> found that Google returns higher quality search results than current
> commercial search engines._

------
snikeris
Interesting bit given how Google now makes its money:

> _Currently, the predominant business model for commercial search engines is
> advertising. The goals of the advertising business model do not always
> correspond to providing quality search to users. For example, in our prototype
> search engine one of the top results for cellular phone is "The Effect of
> Cellular Phone Use Upon Driver Attention", a study which explains in great
> detail the distractions and risk associated with conversing on a cell phone
> while driving. This search result came up first because of its high importance
> as judged by the PageRank algorithm, an approximation of citation importance
> on the web [Page, 98]. It is clear that a search engine which was taking money
> for showing cellular phone ads would have difficulty justifying the page that
> our system returned to its paying advertisers. For this type of reason and
> historical experience with other media [Bagdikian 83], we expect that
> advertising funded search engines will be inherently biased towards the
> advertisers and away from the needs of the consumers._

~~~
michaelmarion
"we expect that advertising funded search engines will be inherently biased
towards the advertisers and away from the needs of the consumers."

Well, yeah. Money talks.

------
tonmoy
Maybe the title should have the publication year in parentheses?

------
euyyn
I love that Figure 1, "High Level Google Architecture", is unnecessarily (but
à la mode of the time) in 3D.

