RIP Google PageRank score: A retrospective on how it ruined the web (searchengineland.com)
282 points by adamcarson on Mar 9, 2016 | 125 comments

I feel like no matter how Google or any other search engine ranked pages, SEO firms would be there to game the system and make a mess of the web. Making Pagerank visible to people with one particular toolbar does seem like a fairly major misstep on Google's part. The majority of the people who would really care about that are the kind of people you shouldn't be encouraging.

One of the obnoxious things about SEO is that if one person is doing it everybody has to do it. It's not necessarily enough to simply offer a better product at a better price. Luckily Google does try to reduce the effect of SEO. I notice for instance that StackExchange almost always beats out Expert Sex Change links these days.

I'm not sure how much of it is that Google tries to reduce the effects of SEO, or that they just keep moving the target. Every time they change their algorithm it seems like a bunch of sites that were practicing one strategy disappear, and before long a new strategy is discovered, causing new sites to rise to the top.

Ultimately Google's ability to quantify both quality and relevance will converge on actual quality and relevance. In other words, the way to "game" the system and practice SEO will be to create a site that is actually relevant with high-quality content. But we're not there yet, and who knows how close we'll actually get.

This is definitely what is happening. I work in SEO and follow the industry closely, but the SEO industry itself is a giant echo-chamber. But essentially what it boils down to is:

- make something your audience wants

- make content your audience wants to share

- "earn" links by creating useful resources & content

Essentially, the way to get great rankings in Google is by actually having a website that people want to go to and is relevant to their interests. Trying to game the system is much harder these days, and honestly for the most part isn't worth the effort.

One thing I seriously dislike about SEO is the value placed in content often above actual practice. So if a client is a plumber, we're realistically telling them to start an e-zine of engaging, shareable content rather than just providing a great service. Or pushing backlinks.

When so much business is decided by a search engine, the plumbing company with people writing "10 crazy plumbing stories you won't believe" beats the plumbers who are out on call each day. Further, if time-on-page is a Google factor, the site with a slideshow or listicle is going to top the one with efficient info and contact details.

The alternative for your hypothetical plumber to get "on Google" is to buy advertising from them. So instead of gaming SEO, or creating e-zine content that's shareable on the "social" sites, he has to buy ad space from Google so that his site shows up on searches.

Side note. Ideally, your plumber's service shouldn't have to be "googled". It should be in some sort of directory, and we shouldn't be using Google for that. The problem is that people are so very reliant on Google to find everything for them that they don't know any other place to do it. And no reasonable competitor (that I know of) has come in to provide such a directory service.

Most "yellow-page" sites I've seen are giant data-dumps of next-to-useless and out-dated pieces of information. No wonder people "google it".

Minor anecdote. Just the other day, I searched for a known name of a website/company (can't remember which). The first hit was their website, but just above it was an AdWords ad for them. Not thinking, I accidentally clicked the AdWords version. That was $1 right there taken from their advertising budget because of a broken web (I'd argue), and simply given to Google.

Yep, I've done that before on browsers without any ad blockers. I'm so used to clicking the first result that looks like what I want that I got bit by an ad. I'll always go out of my way to click the not-ad link if it goes to the place I want.

Well a bad part of it is that "something your audience wants" has proven to be "awful buzzfeed clickbait titles", so human shittiness trumps value in this way

But how much traffic do clickbait titles get from search? I would guess almost none. Clickbait titles are an optimization for social media and "content discovery" ad platforms like Outbrain or Taboola, not search.

You may just be working with a poor definition of value if you consider giving people what they want to be at odds with it.

"Value" is not equal to "what people click on". That's just what's easy to measure. The more you optimize the second goal, the more it will diverge from what people actually value.

Funny thing is, most people don't know what they want, or they do but it's bad for them, or it's good for them but bad for many others.

The set of people whose wants should be satisfied is much smaller than the set of people who want.

Only if you use some vague, hypothetical concept of want. To me wants are the things people pursue. Seeing as people act, the idea that they don't know what they want is silly.

To say that people's wants are bad for them or they should not get what they want is a paternalistic concept that I don't care for nor find particularly actionable.

You may be working with an excessively technical/economic definition of "value" rather than the common human one, if you believe that the only meaning of "value" is "monetary profit."

Huh? I didn't say anything about monetary profit. Users don't profit monetarily when they read buzzfeed articles so I have no idea how you would have got that idea.

Rather than knee-jerk dismiss the suggested interpretation, why don't you simply tell us how you define value?

Something that people want.

Thanks. That may be your definition, but it's not, oh, say, the neoclassical definition of economic value, it's not even the Austrian / Libertarian definition of value, it's not the classical economic definition of use value from Adam Smith, and it's not even the classical definition of value that you'll find through Greek and Roman philosophy.

Or, say, of concepts of value from ecology and systems theory.

But if you want to have conversations based on your Glory, by all means do.

I sincerely appreciate your letting us know that's your intent so as to avoid considerable wasting of time.

If that's your entire definition of value, it's overly narrow. Lots of people want things that are directly bad for them. Letting a five-year-old eat candy until his teeth fall out is not providing him a valuable service.

To get back to the original subject, reasonable people can perhaps debate whether the explosion in clickbait articles is "good" or "bad" in whatever sense, but it doesn't follow from that that "value" lies solely in satisfying people's most immediate wants.

Yet another case of the HN hive-mind downvoting inconvenient truths.

Is heroin a very high value product?

Absolutely, judging from what people are willing to pay.

Arguments against heroin are invariably rooted in negative externalities. I have never heard anyone argue that there is no demand for heroin.

The "value" of heroin comes from it tricking your brain with chemicals that you need it and will die otherwise. I WANT IT GIVE IT TO ME == value, without follow up questions? Similarly buzz feed tricks you with catchy titles. Oh look the end result was I clicked it, therefore it has tremendous value without further analysis.

"Tricking your brain with chemicals" is a ridiculous way to dismiss something that people like. Every desire or want we have is the result of chemicals.

Our desires and neurochemistry evolved in a very different environment from today. I think it's fair to say that a chemical that mimics an evolved desire/reward system without actually fulfilling the need that system serves is "tricking" the brain.

Because of our different environment, none of those "needs" really exist though. Being part of modern society basically guarantees we stay fed and sheltered. Very few people are trying to procreate as much as they can. Some of us are choosing not to at all.

We eat high calorie foods because our brain tells us it's good even though we're a little fat. We fuck each other because our brain tells us it's good, while we use contraceptives. We go out and try to be successful because our brain tells us it's good even though our base needs are already provided for.

Sure you can say that we're tricking our brain, but only so far as you can say everything we've done since we discovered conscious thought and developed agency is tricking the brain. Hence why I find pointing it out in a specific case a little silly. "Hey you're only doing that because it feels good!" Well yeah duh? Same reason any of us are doing anything.

> The "value" of heroin comes from it tricking your brain with chemicals that you need it and will die otherwise.

Most people who use heroin do not become addicted. The addiction rate for heroin is 23% according to National Institute on Drug Abuse [1]. Note that this is overall addiction rate, not the percentage of heroin users that are addicted, as most heroin addicts do not remain addicted.

In other words: The value of heroin for most users is the high, not that it satisfies an addiction.

[1] https://www.drugabuse.gov/publications/drugfacts/heroin

street price != value

I'm guessing by "negative externalities" you mean "people who object to having their property stolen by desperate junkies, or who would rather not get caught up in the violent fallout from drug gang wars."

How do you propose to put a value on that?

value = street price - net negative externalities

The negative externalities you mention are much more a result of prohibition than of the drug itself. But under current law I'll give you that they're real, though simple policy fixes would largely eliminate them.

The negative externalities to which I'm referring are primarily the addictive nature constraining the users' future choice, and the chance of an early death resulting from overdose (though that is arguably a result of prohibition as well).

Those aren't externalities, just costs.

an externality is the cost or benefit that affects a party who did not choose to incur that cost or benefit

Who said anything about demand? This is what I meant when I said you were defining "value" in narrowly economic terms.

Depends on the person. For some, incredibly so.

As history has proven it almost always does.

As somebody with no dog in this fight, I cannot help but wonder if there's a correlation between PageRank and the rise of referring to text on a page as "Content", rather than, well, "An Article".

Content is just a generic term. If you write a post or a Slideshare that has mostly pictures in it, it's still content but not an article.

Not sure if the parent poster referred to this, but it's not absurd to consider that the term "content" may have harmful connotations.

See Stallman's opinion on this (I know, an extremist view, but I think he has a point):

"Words to avoid (or use with care)": http://www.gnu.org/philosophy/words-to-avoid.en.html#Content

> If you want to describe a feeling of comfort and satisfaction, by all means say you are “content,” but using the word as a noun to describe publications and works of authorship adopts an attitude you might rather avoid: it treats them as a commodity whose purpose is to fill a box and make money. In effect, it disparages the works themselves. If you don't agree with that attitude, you can call them “works” or “publications.”

I think there may be a correlation between this negative phenomenon of gaming pagerank and referring to articles and pictures as "content", as if they were second-class citizens of the web.

I guess my reaction is such that the term "Content" relegates an article (or a slide deck) to being a unit of commerce, rather than a summary of thought (or of emotion).

This is what SEO firms like to say because it gives them cover and makes people think they're just content writers, but I don't think it's accurate. Backlinks are still king, even if your site has scarcely any content on it; not that it's bad to have more high-quality content, but it's not like Google weighs that strongly. A lot more human curation would be needed to determine that.

You put "earn" in quotation marks, because these aren't organically earned except in a tiny fraction of cases. Most of them are planted.

I think that by the time that Google is able to quantify quality and relevance, people will be able to use AI to automatically create websites that are relevant and high quality.

I'm not sure whether that is a good thing or not, but it would make sense that the sophistication of the AI used to game Google moves at about the same pace as Google's ability to stop people from gaming its algorithms.

I suspect it's not that hard to get a very rough assessment of quality and relevance - although it would only be useful with a giant multidimensional model of demographics and interests, because "relevant" isn't a scalar.

My guess is Google doesn't do it because it costs too many cycles. Counting backlinks, supplemented by some very basic NLP, is very much cheaper and easier.

The irony is that I wonder if people would pay for - or at least not mind - guided and prompted personal search if it produced highly relevant results.

The sad truth is that the current sneaky lumping of broad demographic guessing with search history with backlink counting and a bit of NLP/timeline voodoo produces mediocre results for many kinds of searches. (I've just spent a very frustrating 15 minutes trying to find out if the raw datasets from KIC 8462852 are available online. Usually my search fu is pretty good, but I couldn't get a definitive answer.)

I'm not sure to what extent Google's sales model relies on this. It's much easier to sell advertising if you don't offer an SLA, because it becomes the customer's fault if the service doesn't provide high quality results.

A more effective service would increase ad buyer confidence and the price of niche ad sales would increase, but maybe not by enough to compensate for the loss of more generic ad sales overall.

There is no scientific definition of quality, so this can't happen (even if garbage is easy to identify). Google tends to confuse quality with popularity, so maybe we will end up with the most popular texts, ideas, beliefs, etc. at the top.

People will still be able to game that scenario, just copy-paste from elsewhere.

Google is, like any ranking system, subject to Goodhart's law[1]. "When a measure becomes a target, it ceases to be a good measure."

What has been Google's saving grace has been their skill at updating their target, attempting to turn the chaos and effort from SEO towards improvement in content. They have made an impossible problem into a mostly-solvable control systems[2] problem.

[1] https://en.wikipedia.org/wiki/Goodhart%27s_law

[2] https://en.wikipedia.org/wiki/Control_system

I see it slightly differently, the issue is money. When page rank was simply an internal relevance metric it was fine, but when page rank was the way to change how much revenue ads on your page brought you, well then it became much more than "relevance."

Brad Templeton made the observation way early on that his humor mailing list went to over ten million people. If people sent him a penny if they laughed, and only 1% did, he would make $30,000 a month or $360,000 a year just for curating jokes. Scale + pennies can add up to big dollars on an individual level.
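The $30,000 a month figure only works out if you assume a sending frequency, which the comment doesn't state. A quick sanity check, assuming one joke mailed per day over a 30-day month (that frequency is my assumption, not something from Templeton's observation as quoted here):

```python
# Back-of-the-envelope check of the penny-per-laugh figures.
# ASSUMPTION: one joke mailed per day, 30 days per month (not stated in the comment).
subscribers = 10_000_000
laugh_rate_pct = 1      # 1% of readers send a penny
cents_per_laugh = 1     # one penny each
jokes_per_month = 30    # assumed sending frequency

payers_per_joke = subscribers * laugh_rate_pct // 100        # readers who pay per joke
dollars_per_joke = payers_per_joke * cents_per_laugh // 100  # cents converted to dollars
dollars_per_month = dollars_per_joke * jokes_per_month
dollars_per_year = dollars_per_month * 12

print(dollars_per_joke, dollars_per_month, dollars_per_year)
```

Under that assumption each joke brings in $1,000, which matches the monthly and yearly totals quoted above.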

PageRank, AdSense, AdWords, all fed this mechanism with a way to turn a small amount of work by an individual into disproportionate returns. It really has been no surprise that the web spam problem became much much worse than the email spam problem. There was a better built in payment mechanism.

>It's not necessarily enough to simply offer a better product at a better price.

It's not nearly enough to offer a better product at a better price. Our company, which was objectively superior to the bad imitators that followed us, was eviscerated by a competitor with a pedigree as a professional spammer -- his product rarely worked and we were constantly getting refugees once they were able to wade through the massive amounts of spam that promoted his solution and find some genuine information. It didn't help that he bought off niche webmasters to delete any mention of our service from their forums and only allow mention of his service.

I have accepted it as a reality that "SEO consultants" (read: professional spammers) are a necessity if your site is going to get anywhere. Note we've always been SEO optimized, meaning our site was always search engine friendly and it contained a great deal more relevant content than our competitor's site. But because our competitor enlisted his network of spam websites to manufacture backlinks and actively sought to exclude mention of my product from the internet by paying off webmasters, he performed much better.

People don't know and generally don't care if your product is better or worse. If you have a product that looks like it's functioning, that's all the development that needs to be done as long as it half-works 5-10% of the time. The rest of the money needs to go into spa--err, sorry, "internet marketing".

> I notice for instance that StackExchange almost always beats out Expert Sex Change links these days.

True, but they very often don't beat out numerous websites that have taken the open StackExchange content (or other similar, more specialized forums) and rehosted it on a new domain. This behavior should be easy to kill. Google should consider this blacklistable behavior and have an easy way to report any site doing it. With the number of StackExchange users out there, these sites would be blacklisted within hours of being launched, and before long people would simply stop doing it. But for some reason it is allowed to continue.

Consider this attack:

1) copy-paste from any random site into a StackExchange comment

2) report the original site for scraping StackExchange

3) lulz

The StackExchange content is open, so I don't think the offending sites are doing anything wrong per se, excluding making the internet worse.

And you think Google won't be able to compare timestamps of when each piece of content was inserted/updated?

Also Quora. I luckily almost never see Quora results anymore. For a while, maybe a couple of years back, they'd rank on the first page for many technical queries. Clicking through would give you the question and a blocked answer. Really glad that Google decided that those businesses shouldn't win, because it's a really sleazy model.

PageRank wasn't bad. But the 'game-able' mechanism went on way too long, and it was compounded when Google "went to bed" with SEO firms.

> it was compounded when Google "went to bed" with SEO firms

What does this comment refer to?

It just seems like Google was incentivised to have a relationship with SEO firms to make page rank work.

I doubt that. Google is Google because it was/is better than others at finding relevant information.

Really crummy title. PageRank is the reason we HAVE a search as powerful as Google, and largely the reason the web is as good as it is today.

Raise your hand if you want to go back to AltaVista/AskJeeves.

I actually miss old AltaVista. I could always find what I was looking for with their boolean expressions. Yes, perhaps it was harder for some, but it worked well.

With Google, I feel like I have to fight with it. If I'm searching for something obscure or perhaps a word that is misspelled on purpose it thinks it knows better what I'm looking for. It also often returns searches without the word that I searched for and often ignores when I prefix it with + or put in quotes.

If you quote a word or phrase, it will be treated as required, just like "+" used to work with AltaVista.

You can also click on the "Search tools" button and select "Verbatim" from the "All results" dropdown. This causes the search to only perform exact matches: https://support.google.com/websearch/answer/142143?hl=en.

And yes, I also spend a lot of time fighting Google's inference rules. I remember doing a search for Biber at one point, and it asked if I meant Bieber instead.

Not quite: I've sometimes put "CentOS 7" in quotes when looking for something CentOS 7-specific, only to have Google return pages and pages of results for older versions of CentOS. These days it's almost a crapshoot whether or not I'll get relevant results from Google.

As lobster_johnson says, verbatim mode is key. I wish it were the default - I'm always using specific phrases that Google reinterprets into something more popular. A big pet peeve given my line of work: I know AutoCAD is more popular, but please, Google, I really do want results about DraftSight!

Did you try the verbatim mode?

Ignoring words in your query that it doesn't like is a relatively new behavior for Google -- in the last three years or so. But yes, there's a reason why I keep a search keyword for Bing.

It is one of the most annoying and user unfriendly anti-patterns I've ever come across. It is SO irritating when you are searching for, say, "Hackernews ramen" and you get thousands of results all with the little gray strikethrough text saying "ramen" doesn't appear on the page. If it's not there why bother even returning the result? I know the answer (Ad money / SEO) but it's like one of the basic tenets of information retrieval and they just said fuck it.

But synonyms! I want to say that if I search for "London Ferris Wheel" and I find "London Eye" without the words "Ferris Wheel" anywhere on the page, that's fine.

However, 5 minutes looking through my Google Search History, and I don't see any examples where non-verbatim results would have been useful, so hmm.

If Google highlighted that these synonyms appeared in the article, that'd be great. They don't. Thus I, the user, am forced to conclude that the relevancy of the result is pertinent to the other words i.e. the more general search I was trying to avoid by adding in the other word! It's madness I tell you.

They have to grow query count (see: marketable inventory), and what better way to do so than serving useless results on your first queries, leading to additional queries?

AltaVista was excellent because of those booleans. Particularly, "NEAR".

"NEAR" is default in Google I think. If you look for "car" it will find sites that say "automobile"

NEAR worked differently. It would search for text where two unrelated words were physically close to each other, like in the same paragraph. It's simpler than what Google does, but surprisingly effective.

e.g. "brakes NEAR ford NEAR (problem OR issue)" would bring back results with "brake issues with fords..." as opposed to AND where all words simply appear on the page.

No other search engine at the time offered anything near this power. The amount of crap eliminated by a properly constructed query was breathtaking.
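For anyone who never used it, here is a minimal sketch of the difference between a proximity match and a plain AND. This is not AltaVista's implementation: the 10-token window, the exact-word matching (no stemming) and the way chained NEARs are read as pairwise checks are all my assumptions, and the sample pages are made up.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(tokens, alternatives):
    """Indexes of tokens that match any word in `alternatives`."""
    return [i for i, tok in enumerate(tokens) if tok in alternatives]

def near(tokens, left, right, window=10):
    """True if a `left` term occurs within `window` tokens of a `right` term.
    The window size is an assumed value, not AltaVista's documented one."""
    return any(abs(i - j) <= window
               for i in positions(tokens, left)
               for j in positions(tokens, right))

def all_present(tokens, *groups):
    """Plain AND: every group matches somewhere on the page."""
    return all(positions(tokens, group) for group in groups)

brakes, ford, trouble = {"brakes"}, {"ford"}, {"problem", "issue"}

def near_query(tokens):
    # brakes NEAR ford NEAR (problem OR issue), read here as two pairwise checks
    return near(tokens, brakes, ford) and near(tokens, ford, trouble)

# Hypothetical pages: one where the words cluster, one where they are scattered.
page_close = tokenize("Owners report a recurring problem with the brakes on the 1996 Ford pickup.")
page_far = tokenize("Ford posted record sales. " + "filler " * 50
                    + "Unrelated: my bike brakes squeak, no problem though.")

print(near_query(page_close), near_query(page_far), all_present(page_far, brakes, ford, trouble))
```

The scattered page satisfies the AND query but fails the NEAR query, which is the kind of result the proximity operator filtered out.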

LexisNexis (database of court decisions, newspaper archives, etc.) still has the NEAR feature, and I've found it very useful there.

As far as I can tell, intext:<word> still forces it to always include <word> in the results.

I miss AltaVista as well. I'm a rebel and I despise Google, however effective they are. I feel like they have way too much negative influence on the web.

The title is about "Google PageRank score," not Google PageRank.

PageRank was one part of the reason we have a search as powerful as Google, the use of links to assign an authority score. Another big part was the use of link context -- which isn't part of PageRank but part of the overall search algorithm at Google.

PageRank score is when Google revealed those scores to the public for any page. That's not something it had to do in order to use PageRank as part of its algorithm. But in doing so, it fueled an explosion in link spam.
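To make "the use of links to assign an authority score" concrete, here is a rough sketch of the textbook power-iteration form of PageRank. It is not Google's production ranking, which combines many other signals; the damping factor of 0.85 is the commonly cited textbook value, and the tiny link graph and page names are invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.
    `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of inbound links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: three pages link to "hub", nothing links to "spam".
web = {"hub": ["a", "b"], "a": ["hub"], "b": ["hub"], "spam": ["hub"]}
scores = pagerank(web)

# The same web after 20 made-up "farm" pages are created just to link to "spam".
farmed = dict(web)
for i in range(20):
    farmed[f"farm{i}"] = ["spam"]
farmed_scores = pagerank(farmed)

print(round(scores["spam"], 4), round(farmed_scores["spam"], 4))
```

In this toy graph the page with no inbound links starts with the lowest score, and pointing a batch of throwaway pages at it raises that score. That is the link-spam incentive in miniature, and a public score makes the payoff visible.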

As the author himself (sullivandanny) replied, you misread the title.

But I will actually claim that PageRank itself is a problem not just because it is so gamed but also because web page authors link to things they find via Google, creating a positive feedback loop that undermines the very premise of the PageRank algorithm.

Search engine research has stagnated because of Google's dominance, and Google itself is not motivated to change, much as Microsoft was unwilling to evolve its cash cow. Rather than innovate in search, it spends most of its resources on ways to shore up its dominance (Google+, Android), fighting the very thing its search engine feeds (Internet garbage), and strengthening its advertising business.

To fully remember the evolution of search one needs to have been over 20 in 1995 and on the internet. That's probably quite a small number of people who are now over 20 and on the internet.

Do you remember when Yahoo was a website about wrestling ?

Or when having a good section in DMOZ was important ?

I don't remember Yahoo being a wrestling site. But I was over 20 in '95 and I can remember the internet before Google. It was a bit of a mess. Fun and amazing and kooky. But tantalizingly frustrating if you were trying to do any focused research.

I can also remember the moment when I discovered Google. It was reading this article back in 2000 (I could have sworn it was 1998):


If you want to travel back in time to the internet as it was when Google appeared, give it a read.

For any gripes I may have about Google's search engine (like the fact I can never seem to easily relocate this article when I want to refer to it), it definitely solved more problems than it created.

> Do you remember when Yahoo was a website about wrestling ?

What? Really? I was around in 1995, and I don't recall that...

Jerry Yang had a webpage about wrestling that was hosted on the same server as "Jerry's Guide to the World Wide Web", the original name for Yahoo!.

i remember it being akebono.stanford.edu

immune to Google I guess

I think the problem is that Google puts too much faith in algorithms. A lot of math guys do this; they write an algorithm and are very slow to accept it when the algorithm isn't all-encompassing or totally representative. They need to give more emphasis to human tweaks and consensus, even when their algorithm is trying to figure out consensus, as PageRank does.

I'd like to see a hybrid of the high-speed algorithmic scanning of the pages that we see now combined with an army of human reviewers, including super-reviewers who are recognized industry experts, who periodically rate indexed content based on quality. Backlinks and other derived consensus measurements should be given far less weight and a combined algorithmic and human quality rating should be of at least equal importance.

So I want to go back to the days of a manually curated web index, combined with the technology needed to make that span out over billions of web pages.

If you think that's a viable proposition feel free to start it. I'm not convinced that the economics add up, nor that recognized industry experts want to spend the majority of their time reading a huge amount of content and rating it for pennies an hour.

I don't think Google have too much faith in their algorithm - they know it's flawed. But it's the least worst algorithm anyone has come up with, and adding human tweaks leaves it subject to subjective bias.

>If you think that's a viable proposition feel free to start it.

Yeah, I don't think it's a bad idea. I don't have the funding to start it, of course, and VCs crap their pants at the thought of anything that has overhead, so it's probably a non-starter.

>nor that recognized industry experts want to spend the majority of their time reading a huge amount of content and rating it for pennies an hour.

The recognized experts would be paid more than pennies an hour, and they wouldn't need to spend the majority of their time reviewing content. They'd be "super-reviewers", so their opinions would hold a lot of weight. It'd be a way for them to make some extra money without a lot of overhead, something they'd do occasionally for an hour here and there. Honestly the main thing we'd be looking for from these people is information about the cutting edge; things that are trending that we haven't picked up yet, things that are new and thus don't have a lot of consensus markers but are still worth attention, and information about the perception of the content within the industry. That classification can be used to inform on a variety of axes that could be good search parameters. We'd need to make sure we got opposing industry leaders so that the index didn't become solely representative of a single viewpoint.

Normal reviewers in a position analogous to a news reporter are more affordable, more consistent, and can classify the majority of content for a sector fine. Maybe Yahoo! could take their niche content mills and reassign the staff to rate pages in their index, then maybe their search will get somewhere.

Then you'd have MTurk style reviewers who provide the bulk of the content rankings and just give back a few basic pieces of info. These are the people that would be working for $2-$3/hr or less, at their convenience.

All of this is on top of a more traditional automated ranking algorithm that would use consensus markers and computer-perceptible quality markers to rank content. There's not necessarily an obligation that every page is sampled and reviewed by a human.
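A rough sketch of how the tiers described above could be blended with an automated score. Everything numeric here is hypothetical: the tier weights, the 0-to-1 rating scale and the 50/50 split are placeholders chosen only to illustrate the "at least equal importance" idea, with unreviewed pages falling back to the algorithmic score alone.

```python
# Hypothetical weight per reviewer tier; no numbers are given in the thread.
TIER_WEIGHTS = {"super": 5.0, "staff": 2.0, "crowd": 1.0}

def human_score(reviews):
    """Weighted mean of (tier, rating) pairs with ratings in [0, 1].
    Returns None when no human has reviewed the page."""
    if not reviews:
        return None
    total_weight = sum(TIER_WEIGHTS[tier] for tier, _ in reviews)
    weighted_sum = sum(TIER_WEIGHTS[tier] * rating for tier, rating in reviews)
    return weighted_sum / total_weight

def combined_score(algo_score, reviews, human_share=0.5):
    """Blend the automated score with the human rating.
    human_share=0.5 gives both equal importance (an assumed default)."""
    human = human_score(reviews)
    if human is None:
        # Not every page is sampled, so fall back to the automated ranking.
        return algo_score
    return (1.0 - human_share) * algo_score + human_share * human

# A page the algorithm likes (0.9) but a super-reviewer rates poorly.
print(combined_score(0.9, []))
print(combined_score(0.9, [("super", 0.2), ("crowd", 0.8)]))
```

The single super-reviewer rating outweighs the crowd rating and pulls the page's score down noticeably, which is the intended effect of giving expert opinions "a lot of weight".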

It'd be great if we could get good traffic data too; we'd be able to see where people are actually going instead of just what they put links back to.

>But it's the least worst algorithm anyone has come up with, and adding human tweaks leaves it subject to subjective bias.

The bias is there regardless, it's just filtered through different parameters. This is inescapable. In general, not just in algorithm and computer design, we need less faith in cold systems and more faith in human intervention and judgment.

Yes, you have to be aware that any process is subject to gaming, manipulation, or bias, but I think affording sufficient room for human opinion and circumstantial judgment as very highly-weighted inputs prevents most of the egregious failures caused by runaway systems.

@cookiecaper: Something like what you describe is happening.

I never knew PageRank scores were visible, and I never used the web before Google.

But this article is so far up its own ass.

> Ever gotten a crappy email asking for links? Blame PageRank.

Never mind that web rings were around long before Google and used the same tactics.

> Ever had garbage comments with link drops? Blame PageRank.

There are way more reasons spammers exist than just boosting PageRank.

The author is acting like a) Google had less of an influence on the web before PageRank was public information and b) the web was somehow better both back then and before Google existed. There will always be people who want to game search engine results, regardless of how much information they know about their own standing, and the web was pretty much un-navigable pre-Google.

So as the author, I'll try to clarify.

The article isn't about Google's influence on the web, as a whole. Google has had a huge influence on the web in many ways, from making it easier for people to locate information to sites considering how to speed up their content, to become mobile-friendly or to use secure connections, because Google rewards such things with ranking boosts.

The article was specifically about PageRank's influence on the web, in terms of link brokering and link spam. Before Google released PageRank scores, some of this happened. It would have happened even if scores had never been released, because it was well-known that Google leveraged links and thus, links had value.

But PageRank scores were an accelerant. They allowed people to use Google's own scores to assign value to pages, value that could be translated into monetary value. It really did reshape the link economy, to the degree that we had a court case with a First Amendment ruling on Google's search results (amazing, when you think about it) as well as an entire new standard to restrict the credit links could pass, nofollow.

I doubt Google anticipated this. Showing the scores, as the article explains, was meant as an incentive for Google Toolbar users -- "Hey, enable this feature, and we'll show you how valuable a page is deemed to be." Google's gain, of course, was that anyone enabling this sent their browsing patterns back to Google, so it better understood what was happening on the web outside of its own properties.

The unintended consequence was that PageRank scores fueled an explosion in link buying and selling, as well as link spam.

Also, I didn't say the web was better before Google. It had plenty of problems, though it wasn't "pretty much un-navigable pre-Google," as you say. Many people used many of the search engines that were bigger than Google successfully for years. If it were really that bad, by the time Google came along, people would have given up on the web.

Google, of course, was a huge improvement in search and for the web as a whole. The article wasn't that Google was bad for the web. It really was just focusing on one aspect that didn't help the web, how releasing PageRank scores ironically fueled some of the spam Google has to fight (and which it fights well) as well as the spam third-parties have to deal with.

> I doubt Google anticipated this.

It's nice of you to say, but Google has many very smart people who spend all their time thinking about search. I would be surprised if they failed to anticipate this outcome. Maybe they thought it was worth the cost, especially as a company that values openness.

> Never mind that web rings were around long before Google and used the same tactics.

If you "never used the web before Google," how would you know this? I suppose you might have read about it. I did use the web before Google, and I don't remember web rings back then. There simply was no reason to do so, before Google started ranking pages based on the links.

I do remember lots of BS meta keywords.

I used the web before Google too, and I remember them.

Also, I do consider reading about something to be a valid way to learn about things you didn't directly witness, so it's strange that you would discount this.

> There simply was no reason to do so, before Google started ranking pages based on the links.

Webrings were useful for people who found a website interesting and wanted to visit other similar websites. The incentive to be in a webring was that you could get more exposure. Not to mention that the 90s were full of trendy things like this: "under construction" gifs, "valid HTML" buttons, etc. Web rings were one of those "clever" things you could add to a website.

Having had pages in Webrings way back when, it was definitely about exposure, but I think it was also an extension of groups (Yahoo! Groups being the biggest I can recall, but I think there was also OneGroup (?) - it maybe had purple in its logo?).

With Tripod, Geocities, Angelfire, and the like it was fairly easy to get a really basic page up, typically with a bunch of links to pages that you checked on regularly, and might be of interest to others.

At least that's how it was for me. I think the rings I was part of included Terragen, POV-Ray, and Star Wars, and I can recall getting a couple of people that I met in the various groups started with basic pages (very ugly, of course, with that same star field background for the SW-related ones).

Good times.

EDIT: And don't forget the 'made with notepad' icons. Or recommending 800x600 or 1024x768 as the best resolution to view a site.

Fair enough guys. Thanks for reminding me about webrings. I had forgotten them and confused them with common link spam we see now.

I do consider first hand memories a more reliable contribution than hearsay (reading about it), but there is some contribution in repeating received wisdom too.

Web rings enabled access to related content. Definitely a pre-Google thing, I recall them from my introduction to the Web in high school circa 1996-1998 (not certain when I first saw it, but in that range).

Web rings were all the rage in 1990s/early 2000s, right before Google launched.

Unless they were some other kinds of web rings we're talking about.

I first learned about this after starting https://neocities.org and seeing a bunch of really garbage pages that were full of random text that linked to a derpie site somewhere.

We get pagerank SEO spam from time to time, and it's pretty annoying. I have the tools to take care of it within 5 minutes every day, but I do worry that if we grow to a certain point it may no longer be possible for me to handle the problem alone.

I'm sure many other sites have similar problems with comment spam, and I'd love to hear some advice on how to deal with this from sites that have the same problem.

Right now our main lines of defense are a recaptcha (our last remaining third party embed, ironically sending user data to Google I'd rather not send to deal with a problem Google largely created), and a daily update of an IP blacklist we get from Stop Forum Spam.

I tried to do some Bayesian classification, but didn't make much progress unfortunately. And nofollow really isn't an option for me, as it would involve me manipulating other people's web sites and I don't want to do that.
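For anyone curious what "Bayesian classification" could look like here, below is a minimal naive Bayes sketch in Python. It is only an illustration under stated assumptions: the class name `NaiveBayesSpamFilter`, the whitespace tokenizer, and the "spam"/"ham" labels are all made up for this example, and nothing here reflects what Neocities actually tried.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy naive Bayes text classifier with add-one (Laplace) smoothing.

    Hypothetical sketch: needs at least one training example per label
    before is_spam() is called, and tokenizes on whitespace only.
    """

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # Count every word occurrence under the given label.
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def _log_score(self, words: list, label: str) -> float:
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        # Log prior: fraction of training documents carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        # Log likelihood of each word, smoothed so unseen words do not zero it out.
        for word in words:
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text: str) -> bool:
        words = text.lower().split()
        return self._log_score(words, "spam") > self._log_score(words, "ham")
```

In practice the hard part is usually the training data and the features rather than the arithmetic, which may be why a simple version like this does not get far against link spam.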

This might be very heavy handed, but could you build something that added rel="nofollow" to all links on user pages until trust is verified?

I've pondered doing something like this, but it kind of crosses a line I'm not yet comfortable crossing (changing users' HTML without them noticing), and would require a way for me to convert from/back when the user edits their site. Technically it's pretty difficult: HTML/XML parsers like Nokogiri like to "fix" HTML when they parse it, so they can make a lot of changes to your document that you didn't want to make.
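One way around the parser "fixing" problem might be to skip the parser and rewrite only the opening `<a ...>` tags with a regex, leaving every other byte untouched. A rough Python sketch follows; `add_nofollow` is a hypothetical helper name, it only handles tags that have no `rel` attribute yet, and a regex will miss edge cases (for instance a `>` inside a quoted attribute value), so treat it as an illustration rather than something ready to run on user sites.

```python
import re

# Matches an opening <a> tag; group 1 holds its attributes, if any.
# "<abbr>" and similar tags do not match because "<a" must be followed
# by whitespace or ">".
A_TAG = re.compile(r"<a(\s[^>]*)?>", re.IGNORECASE)
# Detects an existing rel attribute (but not data-rel or similar).
HAS_REL = re.compile(r"\srel\s*=", re.IGNORECASE)


def add_nofollow(html: str) -> str:
    """Add rel="nofollow" to <a> tags that do not already carry a rel attribute."""

    def rewrite(match: re.Match) -> str:
        attrs = match.group(1) or ""
        if HAS_REL.search(attrs):
            return match.group(0)  # leave existing rel values alone
        # Keep the original tag text and insert the attribute before ">".
        return match.group(0)[:-1] + ' rel="nofollow">'

    return A_TAG.sub(rewrite, html)
```

This still changes users' markup, so it does nothing for the "line I'm not comfortable crossing" concern; it only addresses the worry about a parser rewriting the rest of the document.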

I've said I would decide to cross it if I ever ran into a major security issue, but so far that hasn't been the case. I've even decided against web page auto optimization ala things like ngx_pagespeed for this reason.

One place I do cross this line is for our https://neocities.org/browse gallery in order to deal with iframe issues (I make the A links on a site open a new tab with a javascript inject), but that passive model won't work for pagerank issues because it uses the browser to make the changes.

And honestly, let's be real, even if I put on nofollow and Google stops using pagerank, they're probably still going to do it because they're basically shady pagerank scam artists in the end, fooling gullible people into sending them money. They don't understand they're feeding into a botnet meets army of underpaid pagerank spammers, and it's basically impossible to fix this with education.

It would be excellent if Google gave me an API to report pagerank spam. For all the money they've made on pagerank, it would be nice if they could divert some of that money into helping us deal with this, and it would definitely help to improve their search results.

> I tried to do some Bayesian classification, but didn't make much progress unfortunately. And nofollow really isn't an option for me, as it would involve me manipulating other people's web sites and I don't want to do that.

Did you read the comment?


Back in 2003 I wrote:

"PageRank stopped working really well when people began to understand how PageRank worked. The act of Google trying to "understand" the web caused the web itself to change."


It's amazing that it took this long.

You might be interested in Goodhart's law: "When a measure becomes a target, it ceases to be a good measure."


A friend mentioned that to me recently and it was really eye-opening. Once you start looking, the pattern is everywhere.

The real problem is that Google was losing the link spam war until very, very recently. It was trivial to game them up until 2010, and only really became relatively difficult somewhere around 2012.

And, the solution looks roughly like "weigh established authority to the point where it trumps relevance".

Google has dealt with web spam by replacing it with their own ads. Search for "credit card" or "divorce lawyer". Everything above the fold is a Google ad. Air travel searches bring up Google's own travel info. No amount of SEO can compete with that.

(I still offer Ad Limiter if you'd like to trim Google's in-house search result content down to a manageable level.)

> Google has dealt with web spam by replacing it with their own ads.


Google are not interested in protecting us web users from SEO, but in protecting themselves (or more specifically their core ad business). After all, why would you pay for Google ads if you could just get free traffic from Google via SEO techniques? So SEO is the logical competitor to Google.

I also believe this. Almost all updates to the search engine seem to be to drive SEOs towards adwords. It's not about protecting users, it's about profit.

Better title: "How SEO Asshattery Turned The Web To Shit"

Does anyone here think we need a search engine which lets us maintain large blacklists of websites? For example, if I am searching for information about airbnb, I do not want news websites like NY Times, WSJ, Forbes, Business Standard etc. to show up in the results at all. Any business-related question on India is invariably dominated by Times of India and other newspapers. With Google, it's becoming increasingly difficult to filter out websites.

Edit: Changed "on airbnb" to "about airbnb"

Check out the Chrome extension "Personal Blocklist".

I had used that extension for ages to prevent w3schools from appearing in my search results, but removed it when I eventually got tired of seeing search results jump around.

(Because the extension works on a JS level, it needs to wait for the results page to load before it can strip out what you want to hide. Too often I'd see the results page begin drawing quickly, with a couple w3schools results at the top of the results, and my eyes would scan down to find the first non-w3schools result; then a second later, in the middle of finding the first such result, the w3schools results would be removed, and the non-w3schools result which I'd finally found and was about to click on has moved.)

I used that extension for the same reason you did, and like you I removed it because it was annoying. What I started doing instead on a new computer is including "mdn" in the query when it's web related, so I'd search "js history mdn" and such. After a while Google learns that I value results from MDN highly, so even when I search without saying "mdn", those results come up high. For example, when I now search for just "js history", the first result for me is "Manipulating the browser history - Web APIs | MDN".

The Google search bubble is powerful and can be harmful in some ways but once you learn of its existence and are careful about what results you click, it will work for you in a great way.

I am still uncomfortable with the amount of stuff Google knows about me. I sometimes try ddg or even yahoo or bing but they're not as good.

To achieve the same result, I've now got a browser shortcut (aka omnibox search engine) for an "I Feel Lucky" Google search restricted to the MDN site for whatever keywords I'm looking for. This means I type "mdn js history" and Chrome picks up the "mdn" prefix to use my shortcut, expands it into https://www.google.com/search?q=site:developer.mozilla.org+j... and I end up on the same page you mention. (The "feeling lucky" search isn't perfect, but it's usually good enough for me...)
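The expansion that shortcut performs can be approximated in a few lines of Python. This is a sketch only: `mdn_search_url` is a hypothetical helper, and it builds just the `site:`-restricted query. The "I Feel Lucky" redirect is left out because the full URL is truncated above.

```python
from urllib.parse import urlencode


def mdn_search_url(keywords: str) -> str:
    """Build a Google search URL restricted to developer.mozilla.org."""
    query = f"site:developer.mozilla.org {keywords}"
    # urlencode percent-encodes the colon and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


print(mdn_search_url("js history"))
# https://www.google.com/search?q=site%3Adeveloper.mozilla.org+js+history
```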

Personal Blocklist does not filter results at the level of Google's algorithm. It works at the JavaScript level. So it's not really useful.

If I understand correctly, it takes the domains and includes them in an exclusion query on every search. So it does work on an algorithm level.
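Whether or not the extension really works this way (the comments above disagree), an exclusion query is easy to build by hand with Google's `-site:` operator. A small Python sketch, where `with_exclusions` and the example domains are purely illustrative:

```python
def with_exclusions(query: str, blocked_domains: list[str]) -> str:
    """Append a -site: term for each blocked domain to a search query."""
    exclusions = " ".join(f"-site:{domain}" for domain in blocked_domains)
    # strip() drops the trailing space when there is nothing to exclude.
    return f"{query} {exclusions}".strip()


print(with_exclusions("airbnb", ["nytimes.com", "wsj.com"]))
# airbnb -site:nytimes.com -site:wsj.com
```

A query built like this is filtered on Google's side, so the results do not jump around after the page loads, though a very long blocklist would presumably make the query too long to be practical.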

For the use case you described it would work. You could also simply use the search operator site:airbnb.com

"How gravity ruined flying"? PageRank looking at links isn't some arbitrary thing, it's a source of information every good search will take into account.

I think it's odd to perceive the end of a relatively transparent metric, however relevant or not it has been, as a good thing.

Indeed, one of the things I don't like (hate?) about Google is the SEO and PageRank BS. All pages in the last 10 years are starting to look the same. All pages are becoming what Google wants them to be.

Just because I can't see the score doesn't mean I'm not going to do what I can to increase it.

What difference does it make if the semantics of PageRank are still in place for determining position in the search index, but it is just hidden?

You can still infer the approximate rank of a page by where it places relative to other pages, when searching for relevant keywords. Someone wanting to place ahead of the competition still has a function for measuring how well they are doing in SEO.

Another way to look at this is a blow to openness and a concentration of Google's power. The PageRank scores still exist, but they now will be known only by (some? all?) Google employees.

Therefore, the data is no longer open and power is now more concentrated: Those who know someone at Google can find out their page rank score; the 99.999...% of the rest of the world cannot.

I just lost a ton of respect for Danny Sullivan.

Every system can be gamed. Every system where money can be made WILL be gamed. It's a predator-prey relationship.

The way this article was written made it sound like Google Search was a bane when it arrived. And sure, it was the worst Search Engine at the time, except for all the others that had been invented up until then.

You're reading things into this that I didn't write, I'd say.

When Google arrived, it was a huge advance in search. It offered an obvious improvement in relevancy, which is why so many serious searchers switched to it from AltaVista and then users of other search engines moved over.

Nothing I wrote suggested that Google was bad, didn't offer great relevancy or anything like that.

My story is about what happened when Google revealed PageRank scores for pages across the web. That fueled an explosion in link buying and selling. It allowed people to attach Google's own score to a page, a value if you will that Google itself placed on those pages, which made it easier to then assign a monetary value.

In turn, that led to many of the woes that the web as a whole has to deal with today: understanding how to use nofollow to block links, to stay in Google's good graces. Spam mail pitching links, trying to buy links. Link spam.

I'm sure we'd have had some of this even without PageRank scores ever having been revealed. Perhaps it would have been as much, even. After all, it was well-known that Google was leveraging links as part of its ranking algorithm. The market would have been there.

But I do think that releasing the PageRank scores accelerated the market faster than it would have grown otherwise.

Back to the gaming -- again, it feels like you're reading stuff I didn't actually write. I'm certainly not saying that Google itself introduced the ability for people to try and game search engines. That was happening even before Google existed. Of course, Google initially thought it was immune. In 1998, Sergey Brin even said this on a panel that I moderated:

"Google’s slightly different in that we never ban anybody, and we don’t really believe in spam in the sense that there’s no mechanism for removing people from our index. The fundamental concept we use is, you know, is this page relevant to the search? And, you know, some pages which, you know, they may almost never appear on the search results page because they’re just not that relevant."

Google soon changed its view and introduced extensive spam fighting efforts. Those were inevitable. As you say, it was prey that would attract predators. And even with the link selling, it has done an admirable job fighting off the spam. It's not always perfect, but it's a very robust system.

Nevertheless, the spam attempts will continue regardless of whether Google actually blunts them because, as the article explained, there are simply so many people with misconceptions that they'll chase anything anyway. PageRank scores fed into this, that's all.

Thanks for the response.

> My story is about what happened when Google revealed PageRank scores for pages across the web.

And I'd assert that people already knew if they were number one in the search results, or not. And that metric continues to be the main thing they pay attention to. Well, that and their traffic numbers from Google. My point being, we all knew Google was using links to rank, and the search result rank was visible just by doing a search on a few of your synonyms and adjacent terms, market, brands, trademarks, etc. The battle over spamming links was inevitable, whether they revealed PageRank numbers or not.

> I'm sure we'd have had some of this even without PageRank scores ever having been revealed. Perhaps it would have been as much, even. After all, it was well-known that Google was leveraging links as part of its ranking algorithm. The market would have been there. But I do think that releasing the PageRank scores accelerated market faster than it would have done otherwise.

I can agree with that, but that's not the tone that I get from your article, at all.

The tone I get is that Google created this monster, visible PageRank score, and those crappy emails, link drops, and need to use nofollow, are uniquely Google's fault, and it all could have been prevented if they hadn't ruined the web in 2000 by making it visible.

> Google initially thought it was immune.

Your quote from Sergey doesn't imply to me that he thought they were immune. It tells me that the mechanism they intended to use to fight spam would be to reduce its rank so low that "...they may almost never appear on the search results page..."

You may think that's a pedantic difference, but I see it as a meaningful difference. You can't claim that we're all immune to measles, mumps, polio... But we've reduced the incidence to an incredibly low level, here in the Western world.

Reading this makes one realize how easy it was for Groovy's Tiobe ranking to jump from #82 to #17 in just 12 months, as shown at http://www.tiobe.com/tiobe_index?page=Groovy , and the other spikes in its history.

Are you saying that someone working on Groovy is doing "link optimisation" in order to get higher on Google and thus on Tiobe?

According to http://www.tiobe.com/tiobe_index?page=programminglanguages_d... , Google is by far the largest component of Tiobe's calculation: Google.com: 7.69%, Google.co.in: 5.54%, Google.co.jp: 4.92%, Google.de: 3.69%, Google.co.uk: 3.08%, Google.com.br: 2.77%, Google.fr: 2.46%, Google.it: 2.15%, Google.es: 1.85%, Google.com.mx: 1.54%, Google.ca: 0.92%, Google.co.id: 0.62%.
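Adding up the weights listed above gives a sense of how much of the index depends on Google properties. A quick Python check, using only the percentages quoted in this comment (the variable names are just for illustration):

```python
# Weights in percent, copied from the list above.
google_weights = {
    "Google.com": 7.69,
    "Google.co.in": 5.54,
    "Google.co.jp": 4.92,
    "Google.de": 3.69,
    "Google.co.uk": 3.08,
    "Google.com.br": 2.77,
    "Google.fr": 2.46,
    "Google.it": 2.15,
    "Google.es": 1.85,
    "Google.com.mx": 1.54,
    "Google.ca": 0.92,
    "Google.co.id": 0.62,
}

# Round to two decimals to hide floating-point noise in the sum.
total = round(sum(google_weights.values()), 2)
print(f"Google properties combined: {total}%")
# Google properties combined: 37.23%
```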

It seems the most likely reason for Groovy's strange behavior on Tiobe.

PageRank is still visible today?!? Where? (I am just curious, I thought it's not visible anywhere for years)


The value of "ch" is a checksum you have to precalculate.
