Evolving “nofollow” – new ways to identify the nature of links (googleblog.com)
130 points by doener 7 months ago | 61 comments

It's the crisis of the hyperlink: It's diminished to the point where Google is afraid of losing its previously dominant signal for good.

Personal publishing has moved almost entirely to platforms that `nofollow` links. And commercial sites do whatever it takes to avoid linking anywhere but back, deeper into their own site.

There have been some cases where I think people are behaving somewhat irresponsibly with "nofollow": the German Wikipedia, for example, applies it to all outbound links.

That's understandable from a spam-fighting perspective. But, if search engines actually implement it draconically, it deprives the search index of vast amounts of collaborative filtering.

I actually just checked, and couldn't find nofollows on either HN or Reddit links. Are they somehow setting them via some method other than the actual link tag?
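For anyone wanting to repeat that kind of check, here is a minimal sketch using only Python's standard library. It lists each link in a piece of HTML and whether its `rel` attribute contains `nofollow`. The sample markup and the names `RelCollector` and `nofollow_report` are made up for illustration. It only sees the HTML as served, so it cannot tell whether a script adds or removes the attribute later.

```python
from html.parser import HTMLParser


class RelCollector(HTMLParser):
    """Collect (href, rel tokens) for every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel is a space-separated token list; compare case-insensitively.
        rel_tokens = set((attr_map.get("rel") or "").lower().split())
        self.links.append((href, rel_tokens))


def nofollow_report(html):
    """Return a list of (href, has_nofollow) pairs for the given HTML."""
    parser = RelCollector()
    parser.feed(html)
    parser.close()
    return [(href, "nofollow" in rel) for href, rel in parser.links]


# Hypothetical sample markup, not taken from any real site.
sample = (
    '<a href="https://example.com/a" rel="nofollow noopener">a</a>'
    '<a href="https://example.com/b">b</a>'
)
for href, has_nofollow in nofollow_report(sample):
    print(href, "nofollow" if has_nofollow else "followed")
```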

In any case, I was going to suggest that nofollow makes sense for the "new" queue, but it wouldn't hurt, and might help, to remove it once user-generated content has reached certain milestones, e.g. user karma, or hitting the front page.

A related source of great frustration: put some good stuff on the web. Lots of people link to it, but almost entirely from social media platforms where every link is a nofollow. Somebody else working on gaming the system with SEO makes a less valuable and useful thing, but obtains (by subterfuge or diligent direct asking) some non-nofollow links. Your high-quality content will be outranked in search results by the other, lower-quality content.

A reasonable improvement to this, to reinvigorate the hyperlink: social media platforms could stop using nofollow for links put in by users who themselves have built up a degree of reputation.
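A rough sketch of what such a reputation-gated policy could look like, in Python. Everything here is an assumption made for illustration: the `Submission` fields, the threshold values, and the function names do not come from any real platform.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Submission:
    author_karma: int
    points: int
    on_front_page: bool = False


# Hypothetical thresholds, chosen only for illustration.
MIN_AUTHOR_KARMA = 500
MIN_POINTS = 10


def rel_for(submission):
    """Return the rel value for an outbound link, or None to omit rel."""
    trusted = (
        submission.on_front_page
        or submission.points >= MIN_POINTS
        or submission.author_karma >= MIN_AUTHOR_KARMA
    )
    return None if trusted else "nofollow"


def render_link(href, text, submission):
    """Render an <a> tag, adding rel="nofollow" only for untrusted content."""
    rel = rel_for(submission)
    rel_attr = f' rel="{rel}"' if rel else ""
    return f'<a href="{escape(href, quote=True)}"{rel_attr}>{escape(text)}</a>'


print(render_link("https://example.com/", "new post", Submission(3, 1)))
print(render_link("https://example.com/", "popular post", Submission(3, 42)))
```

Any fixed threshold like this also becomes a target to game, so a real policy would presumably need more signals than a single number.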

It really does seem hard to get visitors these days, even for original niche content. The first page of Google search results is half social/video/news carousels instead of genuine long-term content, and a lot of the other links point to repetitive articles published by generic big-name websites that never really add anything new.

This is the point of nofollow, to get rid of small websites and to prevent them. The spam story is nonsense; it does nothing related to spam protection.

The thing is, all this really does is promote spamming and self-promotion; it's of no real direct benefit to the user.

The English-language Wikipedia used to take an... interesting approach to nofollow, not sure if they still do. External links to most websites were nofollow with the exception of cross-wiki links to approved wikis including links to Jimmy Wales' commercial website Wikia. So the net result was that the trusted sources which Wikipedia heavily relied on got no boost but Wikia got a substantial boost in ranking.

> couldn't find nofollows on either HN or Reddit links

HN nofollows job ads and posts under ~10 points (it's 5 in the open-source code, but it seems HN has been modified there). Old Reddit has rel="nofollow" but then removes it via JavaScript for some reason. New Reddit displays old Reddit if it detects that Googlebot is asking.

> It's diminished to the point where Google is afraid of losing its previously dominant signal for good.

Is Google afraid of losing that signal? I don't have any data to back this up, but my guess is that Google's reliance on PageRank is greatly diminished now, and they probably use search result click-through rate and website engagement data (from Google Analytics and ads) as the primary signals for website relevance and quality.

PageRank is not an important signal anymore, but Google still uses links in other ways.


> but it wouldn't hurt, and might help, to remove it once user-generated content has reached certain milestones, e.g. user karma, or hitting the front page.

Wouldn't that also give a lot more incentive to reach those milestones illegitimately?

It's funny because Google created this monster by attaching a value to links.

I don't plan on implementing these, and neither should you, unless they get added as part of an official HTML specification. Maybe they are, but I don't see them on this list of link types [0]. Until they get formally specified, to me this is just Google doing Google things and fragmenting the web.

[0] https://developer.mozilla.org/en-US/docs/Web/HTML/Link_types

> unless they get added as part of an official HTML specification. Maybe they are, but I don't see them on this list of link types [0]

Extending link types is in the HTML specification:


> Extensions to the predefined set of link types may be registered in the microformats wiki existing-rel-values page [http://microformats.org/wiki/existing-rel-values#HTML5_link_...].

where "may" is defined in the usual way (they aren't in there, but the page hasn't been updated for over a year, so I'm not sure if anyone cares about it anymore after already adding so many extension values).

It's common to have user-agent-specific (in this case Googlebot-specific) rel values.

That's a fair point, thanks. I do still feel that the attribute is woefully underspecified (evidenced by the microformats list not being updated, and the fact that it's a wiki), so in my opinion the least Google could do here is to shore that up. If they contributed to the wiki or formalized it a little bit (like the HSTS list or public suffix list), it would go a long way toward changing how I feel about their additions to the attribute. I realize that's quite a lot to ask for "just" a few new link types though.

To be fair, the WHATWG HTML "Standard" is also nothing more than a Wiki, or even just a collaborative space where everything can change at any time.

> To be fair, the WHATWG HTML "Standard" is also nothing more than a Wiki, or even just a collaborative space where everything can change at any time.

Well, unless you're waiting for the second coming, for someone official to come along and bless an RFC, all standards are born of a collaborative space where everything can change at any time. Versions are just snapshots, and good luck if you want to stick with one forever (how's that TLS 1.0 server going to work out after March 2020?).

Anyways, if you want an official© W3 spec, here it is in HTML5 https://www.w3.org/TR/html50/links.html#other-link-types, HTML 4.01 https://www.w3.org/TR/html401/types.html#type-links, or hey, even back to HTML 2 https://tools.ietf.org/html/rfc1866#section-5.7.3

I'm well aware of the various W3C and WHATWG HTML specs (see my site at [1]); that was kind of my point ;)

[1]: http://sgmljs.net/

It feels like yet another way for Google to offload the hard work of indexing the web onto other people.

See also: microdata, and the monthly alerts I get from Google about the content of my sites being malformed, even though they validate just fine in a dozen other tools.

If this keeps up, eventually we'll have to log in to Google to provide it with the URLs of new content, and fill in all of the metadata about each page. All in the name of whatever buzzword Big G comes up with that month.

> If this keeps up, eventually we'll have to log in to Google to provide it with the URLs of new content, and fill in all of the metadata about each page

FWIW, most people who care about their search ranking would absolutely love a tool like this. The point of the automated crawl isn't just to find the information, it's because if you let people submit their own content to the index they'll lie about what it is.

> hard work of indexing the web

IMO if you mean that Google should check linked-to content rather than relying on link qualifiers, Google has been doing a bad job in recent years, as I can't find useful material among an ocean of low-effort clickbait most of the time. OTOH, relying on metadata by publishers won't solve this problem.

Dare I say that a better indexed web is a public good that we all benefit from immensely.

Logging into Google and providing it with the URLs isn't analogous, because that only benefits Google; it creates a barrier to entry for competitors, which isn't good for us users.

> Dare I say that a better indexed web is a public good that we all benefit from immensely.

Sure, if it were used for the benefit of the public, rather than the benefit of Google.

I mean we collectively did this to ourselves by letting browsers be super permissive about invalid markup.

The only reason that Google is able to do this is that things that don't know what rel="ugc" means will silently ignore it.

They would have found another, probably less elegant, way to do the same thing. E.g. they'd put the extra info in HTML comments.

Oh, hi Microsoft.

<!--[if gte IE 6 ]><![endif]-->

You're confusing forward-compatibility (ignore markup with tags you don't recognize, which is awesome) with accepting invalid markup (which is stupid).
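To make the forward-compatibility side concrete, here is a small Python sketch of a hypothetical link consumer written before the new values existed. It acts on the `rel` tokens it knows and quietly sets the rest aside instead of failing. The token set and the name `split_rel` are assumptions for illustration, not any crawler's actual behavior.

```python
# Tokens this hypothetical consumer was written to understand.
KNOWN_REL_TOKENS = {"nofollow", "noopener", "noreferrer"}


def split_rel(rel_value):
    """Split a rel attribute value into (known, ignored) token lists."""
    tokens = rel_value.lower().split()
    known = [t for t in tokens if t in KNOWN_REL_TOKENS]
    ignored = [t for t in tokens if t not in KNOWN_REL_TOKENS]
    return known, ignored


known, ignored = split_rel("nofollow ugc")
print("acting on:", known)    # the consumer still honors nofollow
print("ignoring:", ignored)   # the unknown token is skipped, not an error
```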

I run large forums and mark my links "nofollow". I see no reason or benefit to me to change to or add "ugc".

It's not clear that there are any benefits for me.

And it's vague enough that I don't know that there are not downsides.

Seems best to do nothing.

Complete agreement. Furthermore, bad actors out there will abuse "ugc": they will mark terabytes of robot-generated content as "user-generated". It will not be possible to rely on it as an indication of organic content or anything of the sort. Search engines won't be able to use that for ranking, for instance.

This change is not for you. This change allows them to treat facebook/twitter/reddit links with weight if the other signals are present.

If I know that I'm traversing and processing facebook/twitter/reddit links, doesn't "nofollow" already indicate user generated content? (Even if some "nofollow" links do not do that, I can probably separate those based on their position in the documents.)

If you're going to treat specific sites specially, then go in all the way and be prepared to have your logic understand any/all aspects of their structure. Or else, don't bother.

>> "...no reason or benefit..."

I dunno, seems legit to me. Adding "nofollow" just means "I'm not accountable for this" -- it's overloaded to mean both "My site is linking here but I didn't vet it first" (the UGC version) and "This isn't actually me, it's an advertiser borrowing space on my site" (the sponsored version).

Your site/forum might want "credit" for being a sort of attention aggregation hub without necessarily taking responsibility for all the content your users post -- you may want a softer middle ground if such a thing becomes available. But in the advertiser version you're just renting out page real estate; your relationship with those links ends the moment the ad is displayed.

Giving you a way to differentiate between the two could let you benefit from ranking and placement that more appropriately takes your user content into consideration while better filtering out noise from ad content.

So... this is pretty much saying they will analyze all links now, irregardless of "nofollow". Previously they outright ignored links marked "nofollow".

Not enough signals when everything is tagged "nofollow"

Nitpick: There's no such word as "irregardless". Escorts myself out

Slightly tangential, but this comes to mind:

We are not here concerned with so-called computer 'languages', which resemble human languages (English, Sanskrit, Malayalam, Mandarin Chinese, Twi or Shoshone etc.) in some ways but are forever totally unlike human languages in that they do not grow out of the unconscious but directly out of consciousness. Computer language rules ('grammar') are stated first and thereafter used. The 'rules' of grammar in natural human languages are used first and can be abstracted from usage and stated explicitly in words only with difficulty and never completely.

--Walter Ong [Orality And Literacy]

For every word in existence, there was someone who used it first.

I don't think it's that simple. There's a chicken & egg paradox at play.

Are you trying to start an inflame war?

Nitpick: there is such a word according to MW [1]

[1] https://www.merriam-webster.com/dictionary/irregardless

1. I only recognise the Oxford English Dictionary, not those upstart Americans ;-)

2. Even the Americans say (from your link):

"Irregardless was popularized in dialectal American speech in the early 20th century. Its increasingly widespread spoken use called it to the attention of usage commentators as early as 1927. The most frequently repeated remark about it is that "there is no such word." There is such a word, however. It is still used primarily in speech, although it can be found from time to time in edited prose. Its reputation has not risen over the years, and it is still a long way from general acceptance. Use regardless instead."

That doesn't read as accepting the 'word' beyond the most technical sense of noting that some people use it.

Dictionaries are historians of usage not legislators of language. At least in English, where we have no equivalent to the "Académie française" (suck it, Jonathan Swift).

Though it considers it "non-standard" the word also exists in the Oxford English Dictionary: https://www.oed.com/view/Entry/99668

It actually sounds like they've already been analyzing all links, regardless of "nofollow". Which is not surprising in any way.

Seems so. They note at the end that they still respect noindex (a robots meta directive, not a rel value) and, of course, robots.txt.

From the article:

rel="nofollow": Use this attribute for cases where you want to link to a page but don’t want to imply any type of endorsement, including passing along ranking credit to another page.

This means the meaning of 'nofollow' is changing? That seems like a horrible idea. Previously 'nofollow' meant exactly that: "don't follow this link please, Googlebot". Now it will mean "follow this link, but don't pass my site's ranking on to the destination." That's a VERY different use case, and I can't see the millions of existing 'nofollow' tags being changed by site owners to any of these new tags. Surely a 'nogrant' or some such would be a better option, leaving 'nofollow' alone.

Question that wasn't answered: Should user generated content links, that may or may not be sponsored (suppose the site owner neither knows nor cares), be marked "ugc" or "ugc sponsored" or "ugc nofollow"?

I think just "ugc" from Google's perspective, but they mention that you can use "ugc nofollow" for other services that don't understand "ugc".
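A minimal sketch of that combination in Python, assuming a site wants the new values plus a fallback for consumers that only know "nofollow". The function name and its parameters are hypothetical.

```python
def build_rel(user_generated=False, sponsored=False, legacy_fallback=True):
    """Build a rel attribute value from the link's nature.

    legacy_fallback appends "nofollow" for consumers that do not
    understand the newer "ugc" and "sponsored" values.
    """
    tokens = []
    if user_generated:
        tokens.append("ugc")
    if sponsored:
        tokens.append("sponsored")
    if tokens and legacy_fallback:
        tokens.append("nofollow")
    return " ".join(tokens)


print(build_rel(user_generated=True))
print(build_rel(user_generated=True, sponsored=True, legacy_fallback=False))
```

An editorial link the site vouches for gets an empty string, meaning no `rel` qualifier at all.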

Ugc: basically, someone I don't know posted this. Treat it as neutral.

Sponsored: someone paid or traded for this. Treat it as untrustworthy.

Both: someone I don't know posted this, and I made money. Treat it as fake.

Nofollow and its interpretation have always been interesting. Professional SEOs analyze a site's "link profile": the number of nofollow, follow, anchored, raw, etc. links that point to the site.

Consider a super-spammy site that eschews any link but a "follow". Well, that's weird, and it may be an indication that they're a spammer.

So while this (quite likely correctly) states that "nofollow" links have no search rank weight, it's not the entire story.
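As a toy illustration of a "link profile", the Python sketch below tallies a list of backlinks by whether they carry "nofollow" and computes the followed share. The data shape (pairs of source URL and `rel` value), the sample entries, and the function names are assumptions. What ratio would look suspicious is not something the sketch claims to know.

```python
from collections import Counter


def link_profile(backlinks):
    """Tally backlinks as 'nofollow' or 'follow'.

    Each backlink is a (source_url, rel_value) pair.
    """
    profile = Counter()
    for _source, rel_value in backlinks:
        tokens = set(rel_value.lower().split())
        profile["nofollow" if "nofollow" in tokens else "follow"] += 1
    return profile


def follow_ratio(profile):
    """Share of backlinks that are followed; 0.0 for an empty profile."""
    total = sum(profile.values())
    return profile["follow"] / total if total else 0.0


# Hypothetical backlink data for illustration only.
backlinks = [
    ("https://forum.example/thread/1", "nofollow"),
    ("https://blog.example/post", ""),
    ("https://social.example/status/2", "ugc nofollow"),
]
profile = link_profile(backlinks)
print(dict(profile), round(follow_ratio(profile), 2))
```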

Not sure that many sponsors will be happy to keep sponsoring the content if they get a "sponsored" (pretty much nofollow) link.

Seems like a net win.

It's not a win, it just means nobody will use that tag.

I also propose a semantic <ad> HTML tag, which would behave like <section>, to represent parts of a document that are an ad.

With an optional do-not-block attribute.

> With an optional do-not-block attribute.

Which would be honored just like Do-Not-Track? Seems only fair.

I'm curious if any attempt was made to submit this as a standard inclusion to the IANA link relations registry:


I appreciate nofollow and ugc. I run a site with user-generated content, and worry about spammers overwriting the site with junk content. Having a way to indicate that the link is from someone else is useful for eliminating the incentives for spammers.

"Do more work so that we have more info. No, it doesn't give you any benefit."

Nofollow made sense, but if everyone is using it for all links, there is no value in it and the tag should be ignored.

I like the change: it allows them to ignore nofollow for facebook/twitter/reddit or smaller sites that nofollow all links.

I'm pretty sure Google has already been treating some nofollow links as if they weren't nofollow for some time now.

It makes organic indexing impossible; there is no point in making your own website like this. It's even insulting to call hard-worked content spam.

nofollow is 15 years old? Holy crap I'm old!!!
