The PageRank Citation Ranking: Bringing Order to the Web (1998) [pdf] (stanford.edu)
86 points by niico on Oct 21, 2016 | 39 comments

I love how the (now famous) researchers dismiss at the time what has become Google's main chase:

"These types of personalized PageRanks are virtually immune to manipulation by commercial interests. For a page to get a high PageRank, it must convince an important page, or a lot of non-important pages to link to it. At worst, you can have manipulation in the form of buying advertisements(links) on important sites. But, this seems well under control since it costs money"

And yet, everything in that paragraph is absolutely true. If anything, they stopped short of taking it to its logical conclusion. Since it costs money, it will only happen when more money is at stake than the manipulation costs.

To go further, the implicit assumption that they were making when stopping short was: Google will never be big enough that people will spend money to manipulate it. But it did, and people do.

To me this reinforces that you should take seriously things like 50% bitcoin attacks. If bitcoins become valuable enough, someone will do it.
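
The quoted claim, that a high PageRank takes either one link from an important page or many links from unimportant ones, can be checked on a toy graph. This is a minimal sketch of the textbook power-iteration formulation, not Google's production system. The page names, the damping factor of 0.85, and the uniform random jump (rather than the personalized one the quote discusses) are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration over a dict mapping each page to the pages it links to.

    damping=0.85 is an assumed, commonly cited value; the random jump is uniform.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over every page.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical graph: "lucky" has ONE inbound link, from the well-linked "hub";
# "crowd" has TWO inbound links, both from pages nobody links to.
links = {
    "fan1": ["hub"], "fan2": ["hub"], "fan3": ["hub"], "fan4": ["hub"],
    "hub": ["lucky"],
    "small1": ["crowd"], "small2": ["crowd"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:8s} {ranks[page]:.4f}")
```

In this graph "lucky" ends up ranked above "crowd" despite having fewer inbound links, which is the property the paper relied on and the one that made links on important pages worth buying.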

And here's the original patent. I never knew until now that Sergey Brin wasn't on it.

Will expire in about a year and three months!


Well, they didn't call it Brinrank :)

the best part of the paper was this: "To test the utility of PageRank for search we built a web search engine called Go ogle" now a company worth hundreds of billions.

Also fun and related: http://infolab.stanford.edu/~sergey/

And another paper by Page and Brin: http://infolab.stanford.edu/~backrub/google.html

10+ years ago I needed to contact someone from Google and I found this page. At the time, I didn't know who Sergey Brin was. So I emailed the guy at the address provided at the bottom of the page and told my colleagues something like:

    I found a guy who works at Google, we should be all good, I just emailed him, let's see what he says. 
Everyone was laughing their heads off when they asked me the guy's name.

In the mobile app I'm using the company name in your comment has split across two lines as "Go-ogle", which suddenly made so much sense!

I was slightly disappointed that this isn't what it says in the paper, or presumably your comment!

(Still no idea why it was split when words are usually wrapped.)

Is it still possible in today's academic setting to conceive a company like Google without owing anything to the institution in which it was created? It looks like the first version of Google was even hosted on Stanford's computers.

Google paid licensing fees to Stanford for PageRank in the form of stock. http://www.ipnav.com/blog/google-algorithm-earns-337-million...

Do colleges own startups people create while going to school? If so that sounds horrible considering how much college costs.

I heard that if you work for a company while also developing a startup, then depending on the contract the company can claim it owns the startup, even if you worked on it at home, off the clock, on a computer you bought with your own money.

Typically there is a three-way split: college, dept and prof/student. I know students who dropped out rather than give up too much to the university.

I was a Stanford grad student in the 1980s when these policies were still being worked out. There were cases of grad students being abused by the commercial interests of their professors before the university clamped down. These abuses included postponing the publication of results, which could delay graduation or an academic career. Now profs and students can only spend 20% of their time on commercial interests, or should take a leave of absence and leave Stanford facilities if they want to work more. The just-retired Stanford president did so when creating his RISC computer company MIPS. Also, you can only delay publication by one year.

No, but they do own patents created by their researchers.

Stanford didn't own Google, they owned the PageRank patent, which they licensed to Google in return for equity.

Interesting. "patents created by their researchers" so only if it's an official school project, and not someone doing it on their own? That would make sense. I don't get why schools need patents in the first place though.

If you sign an invention assignment agreement (IAA) with a university, it will own inventions that fall within the scope of the agreement. Universities typically require employees including faculty, staff, and graduate students to sign these agreements. Even if a university employee doesn't sign an IAA, the university might have "shop rights" or copyright ownership of developments made with university resources or within the scope of employment.

Undergraduate students generally are not employed by the university and are not required to sign IAAs, so they keep the rights to their own inventions. (Though there can be some tricky situations where significant university resources are used.)

They don't need them, except for financial gain. They make money off their patents just like the parent said, by licensing them, selling them, etc.

Even if they're doing it on their own, it might be owned by the school if it is related to the work they're doing, just like any company.

Interesting. I thought people went to college to learn, not to work for the school.

Do they get paid to do this sort of stuff too? I know I heard of people working in like the library to help pay down their loans. Just never really thought of colleges owning IP. So just a bit of a surprise to me.

Universities are research centers, not just schools.

Undergraduate students get scholarships for doing research, grad students do research as part of their studies and professors are usually researchers as well.

According to this article Universities are hedge funds with schools attached.


It wasn't possible then either.

This link is from 1998. Is it still at least partially valid?

Probably somewhat. However, "link equity/power" has definitely (in part) been replaced by other ranking signals.

Didn't they move towards "RankBrain", which I believe is an ML-based search engine? Links may be one of those signals, but I believe user activity may be a lot more interesting for them.

Yes, is Google still using this internally? If not, then what would be a more up to date reference?

I think Google stopped updating PageRank two years back.

You're thinking of the "public" PageRank value previously displayed in Google's browser toolbars. Those are indeed no longer updated. However, there is still some variant of "PageRank" in use internally.

They are probably still using it, but I think they apply spam and duplicate content detection first and also limit PageRank flow to sites of similar topic. Another thing they could do is to nullify the old links for resold websites.

Besides that, time spent on page and social signals such as likes and tweets probably count for more than links. A crappy spam site would be disregarded pretty soon, because people would immediately bounce back to Google and try other sites.

But does it work in practice? It doesn't seem to. PageRank is mostly a marketing tool.


"The fact that in-degree features outperform PageRank under all measures is quite surprising. A possible explanation is that link-spammers have been targeting the published PageRank algorithm for many years, and that this has led to anomalies in the web graph that affect PageRank."

Well, if you are looking for a measure that is the easiest to game, abuse and spam, you can't go wrong with your PageRank killer: in-degree.
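
To make the gaming concern concrete: in-degree counts every inbound link equally, so each throwaway page a spammer creates adds exactly one point to the target's score. The sketch below is only an illustration; the graph, the page names, and the `in_degree` helper are hypothetical and not taken from the quoted study.

```python
from collections import Counter

def in_degree(links):
    """Count distinct inbound links per page; self-links are ignored."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts

# Hypothetical graph: "reference" earned three links from independent pages,
# while "spam" has none yet.
web = {
    "blog": ["reference"],
    "forum": ["reference"],
    "wiki": ["reference"],
    "spam": [],
}
before = in_degree(web)

# A spammer creates ten throwaway pages that link only to "spam".
farm = {f"farm{i}": ["spam"] for i in range(10)}
after = in_degree({**web, **farm})

print(before["reference"], before["spam"])
print(after["reference"], after["spam"])
```

Because the score ignores who is linking, the ten farm pages are enough to push "spam" past "reference" here.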

It works less well than it used to, but it's never used in isolation. Used in isolation it's a pretty good porn detector.

Any act of measuring a system changes that system. PageRank did more to destroy the value of the web of links than any other technology before it because it was so good at measuring its value. Extracting that value then instantly leads to diminishing it because others (in this case the link spammers) want a slice of that huge pie.

I believe that each and every technology that successfully manages to index the web in a new and useful way will further diminish the value of the web.

"Any act of measuring a system changes that system."

How about when I measure a stick using a ruler?

He means a system that can react to the measurements. Like how predicting the stock market is fundamentally impossible because whatever you do, the stock market can react to your prediction.

But he said 'Any'. I'm looking for a counterexample.

Are there any other good reads on how this technology is implemented, the shortcomings since it was first released, improvements in the algorithm, splitting the calculations up into a map-reduce for the modern (larger) web, and anything else that might be a good read?

I just googled for some implementations like this one in Go: https://github.com/dcadenas/pagerank
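
On the map-reduce part of the question, one PageRank iteration splits fairly naturally into a map step that emits each page's rank contributions and a reduce step that sums what each page received. The following is a single-process sketch of that idea, not how any production system does it. The function names are hypothetical, the damping factor of 0.85 is an assumed value, and the code assumes every page appears as a key and has at least one outlink, so dangling pages are not handled.

```python
from collections import defaultdict

def map_phase(page, rank, targets):
    """Map: emit a (target, contribution) pair for every outlink of one page."""
    share = rank / len(targets)
    for target in targets:
        yield target, share

def reduce_phase(contributions, n, damping=0.85):
    """Reduce: combine all contributions sent to one page into its new rank."""
    return (1.0 - damping) / n + damping * sum(contributions)

def pagerank_step(links, ranks, damping=0.85):
    """One iteration; assumes every page is a key with at least one outlink."""
    n = len(links)
    grouped = defaultdict(list)  # stands in for the shuffle: group values by key
    for page, targets in links.items():
        for target, share in map_phase(page, ranks[page], targets):
            grouped[target].append(share)
    return {page: reduce_phase(grouped[page], n, damping) for page in links}

# Hypothetical three-page graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1.0 / len(links) for page in links}
for _ in range(50):
    ranks = pagerank_step(links, ranks)
print(ranks)
```

In a real cluster the map and reduce calls would run on different machines and the grouping would be done by the framework's shuffle, with one job per iteration.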

This reminds me of the new Black Mirror season 3 pilot
