
How rel=nofollow Works - luigi
http://luigimontanez.com/2012/how-rel-nofollow-works/
======
Loic
The article is a bit wrong. nofollow does not mean that the crawlers must not
follow the link, but means that the ranking algorithms must not consider the
link for the ranking as the link could be dubious or spam.

As such, the relationship between the page linked and the page linking is not
to be affected by the link. And _this_ is what keeps the links from surfacing.

Basically, the current Twitter HTML says: We have no outbound links.

~~~
chalst
Google has said that it won't add a URL to its index of URLs to crawl as a
result of seeing the URL in a rel=nofollow href [1]. This is over and above
not weighting the link if the URL does otherwise make it onto its to-crawl
list.

[1] [http://googleblog.blogspot.com/2007/02/robots-exclusion-prot...](http://googleblog.blogspot.com/2007/02/robots-exclusion-protocol.html)

~~~
dolbz
The post you're linking is discussing the META tag version of nofollow, not
the hyperlink rel attribute which is detailed here:
[http://googleblog.blogspot.com/2005/01/preventing-comment-sp...](http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html)

This article doesn't say explicitly that it won't follow those links and I
would suggest it often will as comments _are_ a rich source of pages to index.
The nofollow on a link like this just indicates that the page owner can't
vouch for the quality of the linked page (e.g. user submitted link) and
therefore doesn't want to pass pagerank to it.

Edit: it's also implied in the parent's link that the META tag only prevents
the following of links to pages within _your_ site. This sounds reasonable as
who are you to tell Google they can't index a third party domain?

~~~
mapgrep
"who are you to tell Google they can't index a third party domain?"

No tweets link to third-party domains, since Twitter wraps all links in their
own URL shortener (t.co). Even if you use an independent URL shortener, it
will be wrapped by t.co.

(I suppose you could argue that there is no way for Google to determine this
algorithmically, since "twitter.com" != "t.co", so it should go ahead with the
crawl, but there's the question of how Twitter would respond to that.)

~~~
Raphael
They could create a robots.txt for t.co and set it to noindex.

------
chalst
The cited Danny Sullivan article has a little of his interview with Eric
Schmidt:

sullivan> I countered that Google seemed to have all the permission it needed,
in that they’re not blocked from crawling pages.

schmidt>“That’s your opinion,” Schmidt said, then joked: “If you could arrange
a letter from Facebook and Twitter to us, that would be helpful.”

sullivan> I pushed back that both have effectively given those letters since
their robots.txt files — a method of blocking search engines — weren’t telling
Google to go away.

Well, why stop there? Why shouldn't Google ignore the robots.txt file in its
search for shareable nuggets for its search results, which they are equally
not "blocked" from using? The answer for both rel=nofollow and robots.txt is
that Google has explicitly promised webmasters that it will not do this.
Sullivan knows this: this is bad journalism. I'd be curious to see more of the
transcript of the interview.

~~~
masklinn
> The answer for both rel=nofollow and robots.txt is that Google has
> explicitly promised webmasters that it will not do this.

No, not for a/@rel="nofollow": the only promise Google makes is that it will
not transfer PageRank credit to the link's destination[0].

This does not prevent them from using that link as an outbound in order to
discover new content, nor does it prevent Google from reversing the link for
the "Shared On" feature. Google _specifically advocates_ [0] using
a/@rel=nofollow for user-generated links in order to prevent rewarding
spammers. Twitter's use of the attribute is not just sensible, it's necessary.

niyazpk's comment[1] looks far more sensible: "Shared On" is a specific
feature for content producers and part of a special agreement and API access.

[0] [http://googleblog.blogspot.com/2005/01/preventing-comment-sp...](http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html)

[1] <http://news.ycombinator.com/item?id=3456233>

~~~
jasonlotito
> the only promise Google does is that it will not transfer pagerank credit to
> the link's destination

Google is fairly clear about this:

[http://support.google.com/webmasters/bin/answer.py?hl=en&...](http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569)

"How does Google handle nofollowed links?

In general, we don't follow them. This means that Google does not transfer
PageRank or anchor text across these links. _Essentially, using nofollow
causes us to drop the target links from our overall graph of the web._
However, the target pages may still appear in our index if other sites link to
them without using nofollow, or if the URLs are submitted to Google in a
Sitemap. Also, it's important to note that other search engines may handle
nofollow in slightly different ways."

So, if your tweet includes a link, they drop that link from the post as far as
their graph is concerned. And that graph is what makes up their search.

It's not _just_ page rank.

~~~
masklinn
Interesting, so their implementation has broadly expanded upon the original
meaning of the attribute (and the one standardized as a microformat), and in
recommending this be broadly used to fight against spam they've paved the way
for an interesting time for everybody else?

~~~
jasonlotito
Broadly expanded? Standardized? Please.

I think the nofollow attribute's intent was fairly straightforward from the
beginning: to designate links that should not be accorded attention in search
results. In 2005, this was merely via PageRank. Today, it's recommendations in
search results.

Why would I want to see, in a search result, links that shouldn't be given
weight in a search result? Explain that to me.

------
jimmy_cheese
But MG Siegler is correct, is he not, that Google has chosen not to link to
artists' Twitter pages alongside their G+ pages, and that this could easily
be seen as anti-competitive?

e.g.
[https://www.google.com/search?&q=music](https://www.google.com/search?&q=music)

~~~
luigi
That's a new sidebar feature that goes where ads have traditionally gone.
Google has decided to literally promote G+ over ads, which is a story in
itself, but a totally different one.

It doesn't change the search results in the main column.

~~~
Steko
How is it a "totally different" story when the two changes are launched at the
same time under the same name ("Search Plus Your World") and both underplay
the dominant social networks?

They seem rather related parts of the same story.

You make a big deal about not "ascribing evil motives to Google" but I think
it's pretty clear that Google is going all in on G+ and both aspects of
Search+ that Danny Sullivan has taken issue with are directly related to this.

I don't care for the word "evil" because I think Google may be perfectly
within their rights to do this but it's also very unlike how Google has
historically operated. Inasmuch as that historical behavior personified "don't
be evil", well it's hard to be shocked when people call this new, tough
negotiating Google "evil".

------
abalone
This guy completely misread MG Siegler. MG's point is rel=nofollow has nothing
to do with the twitter profile pages that google is refusing to include in
their people & places feature. Nothing to do with outbound links.

This entire debate over rel=nofollow is a red herring.

------
lambada
I'm amazed really by how much FUD has been spread around by Twitter, and by
most reports of the story. This article clears it up perfectly.

~~~
masklinn
It does not. a/@rel=nofollow has nothing to do with it: the only thing Google
says it will do with these links is break search rank credit transference, and
Google themselves recommend setting this attribute:

> anywhere that users can add links by themselves, including within comments,
> trackbacks, and referrer lists

[http://googleblog.blogspot.com/2005/01/preventing-comment-sp...](http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html)

In fact, _Google_ created this attribute _specifically and solely to combat
comment/user spam_.

If _Google_ decided to expand the role of @rel="nofollow" as a gigantic "fuck
you" to everybody that's a different issue, but the article still is not
right.

~~~
jasonlotito
You are repeating this misinformation. As I explained here:
<http://news.ycombinator.com/item?id=3456404> Google makes this clear in this
answer:
[http://support.google.com/webmasters/bin/answer.py?hl=en&...](http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569)

------
kgo
FWIW, bingbot basically DDOS'ed our site a few months ago by crawling links
that were labeled nofollow/noindex in the hrefs. Adding a rule to robots.txt
fixed the problem.
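
For illustration, here is a minimal sketch of how a well-behaved crawler would evaluate a robots.txt rule like that, using Python's standard `urllib.robotparser`. The `/search/` path and the `example.com` host are assumptions made up for the example; the comment above does not say which rule was actually added.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep bingbot out of /search/, allow everyone else everywhere.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /search/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# bingbot is blocked from the disallowed path but not from the rest of the site.
print(parser.can_fetch("bingbot", "http://example.com/search/widgets"))    # False
print(parser.can_fetch("bingbot", "http://example.com/about"))             # True
# Other crawlers fall through to the wildcard entry, which allows everything.
print(parser.can_fetch("Googlebot", "http://example.com/search/widgets"))  # True
```

Unlike a nofollow hint on an individual link, this rule is checked per URL before any request is made, which is presumably why it was the effective fix here.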

------
thadwoodman
"The hubbub is centered around a complaint by Twitter that links shared on
Twitter are not surfacing in the search results."

I think this mischaracterizes the complaint. I don't think anyone is
complaining that the LINKS themselves aren't surfacing. That complaint would
imply that there ought to be some influence between links on Twitter and the
search rank of those linked pages. And I don't think anyone wants to see
spammy tweets on Twitter skewing search ranking.

Rather, the complaint is that recommendations from Twitter are not APPEARING in
the search results. And to that point: Why is it difficult for Google to index
the fact that someone linked to a site (whether it be rel=nofollow or not)?
Why does this information need to be coupled with the passing of page rank?
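
To make the question concrete, here is a purely hypothetical sketch of what decoupling the two could look like: every observed link is recorded in a "who shared what" index, while only links without nofollow become edges in the ranking graph. This is not a description of how Google works; `index_links`, `rank_graph`, and `shared_by` are names invented for the example.

```python
from collections import defaultdict


def index_links(observations):
    """Split observed links into a ranking graph and a sharing index.

    observations: iterable of (source, target, nofollow) tuples.
    """
    rank_graph = defaultdict(set)  # edges allowed to pass ranking credit
    shared_by = defaultdict(set)   # who linked to what, regardless of nofollow
    for source, target, nofollow in observations:
        shared_by[target].add(source)
        if not nofollow:
            rank_graph[source].add(target)
    return rank_graph, shared_by


# Hypothetical observations: one nofollowed tweet link, one ordinary blog link.
rank_graph, shared_by = index_links([
    ("twitter.com/someone", "example.com/article", True),
    ("blog.example.org", "example.com/article", False),
])

print(sorted(shared_by["example.com/article"]))
print(dict(rank_graph))
```

In this sketch the tweet shows up under `shared_by` but contributes nothing to `rank_graph`, which is the separation being asked about.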

------
EthanEtienne
I don't think this is what they're complaining about. Twitter is asking why
Google isn't linking to the @wwe Twitter account in the new social
recommendations the way it links to @wwe on Google+. They're not talking about
links outbound from Twitter, but rather links inbound to Twitter, like the @wwe
Twitter or Facebook account pages.

------
latch
What would happen if twitter removed rel="nofollow"? Couldn't that open the
possibility for more spam on twitter? I tend to agree with Google here, but
maybe when it comes to social recommendation, Twitter could be recognized as
special and specific algorithms could be written to figure out the worth of a
recommendation.

~~~
sjs382
I can't imagine _more_ spam on Twitter.

In all seriousness though, you can probably analyze the Twitter "social graph"
to weed out a lot of spam.

------
raghavsethi
Brilliant article. I was starting to doubt Google for a minute, but this
clears it up concisely.

------
jerednel
There seems to be some confusion about the different kinds of nofollow.

<a rel="nofollow"...> means not to follow the link to which the nofollow
belongs. The target page may still be indexed, however, if it is found via
either an internal or external followed link.

<meta name="robots" content="nofollow"> tells crawlers not to follow any of
the links on the page. The linked pages may still be indexed, however, if they
are reached by either an internal or external followed link.

To keep a page out of the index, "noindex" should be applied in the meta tag
of the page that is to be removed from the index.

Further, a robots.txt Disallow does not remove pages from the index. A
noindex must be used on the page to remove it from the index, or a removal
request made via Google's Webmaster Tools.

That said, it seems feasible that if Google wanted to, they could parse the
external links from a Twitter stream without actually following them. This
wouldn't necessitate Twitter removing nofollow from their links.
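
As a rough sketch of the distinction between the link-level and page-level variants, the following uses Python's standard `html.parser` to report which links on a page a crawler is being asked not to follow. The class name `NofollowAudit`, the helper `audit_html`, and the sample URLs are all made up for illustration, and real crawlers handle many more cases (other meta names, HTTP headers, and so on).

```python
from html.parser import HTMLParser


class NofollowAudit(HTMLParser):
    """Collect links and record which ones carry a nofollow hint."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # <meta name="robots" content="... nofollow ...">
        self.page_noindex = False   # <meta name="robots" content="... noindex ...">
        self.links = []             # list of (href, link_level_nofollow)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            tokens = {token.strip().lower() for token in content.split(",")}
            self.page_nofollow = self.page_nofollow or "nofollow" in tokens
            self.page_noindex = self.page_noindex or "noindex" in tokens
        elif tag == "a" and attrs.get("href"):
            rel_tokens = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel_tokens))

    def followable(self):
        """Links with no nofollow hint at either the page or the link level."""
        if self.page_nofollow:
            return []
        return [href for href, nofollow in self.links if not nofollow]


def audit_html(html: str) -> NofollowAudit:
    audit = NofollowAudit()
    audit.feed(html)
    audit.close()
    return audit


page = audit_html(
    '<a href="http://example.com/a">a</a>'
    '<a rel="nofollow" href="http://example.com/b">b</a>'
)
print(page.links)
print(page.followable())
```

Note that the parser still sees both links either way. Whether a crawler records a nofollowed link without fetching it is a policy decision, not a parsing limitation.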

~~~
mapgrep
"if Google wanted to, they could parse the external links from a Twitter
stream without actually following them"

The problem is that there are no external links in the tweets. All tweets
point to t.co, which is run by Twitter. You can't find out the ultimate
destination of the link without following it through Twitter's redirector.
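
A small sketch of why that is: the final destination only becomes known by asking each shortener in turn where it redirects. To keep the example runnable offline, the lookup is passed in as `fetch_location`, a hypothetical callable standing in for an HTTP request that returns the redirect target (or `None` when there is no further redirect). The URLs in the table are invented.

```python
from typing import Callable, Optional


def resolve(url: str,
            fetch_location: Callable[[str], Optional[str]],
            max_hops: int = 5) -> str:
    """Follow a redirect chain until a URL that does not redirect is reached."""
    seen = {url}
    for _ in range(max_hops):
        target = fetch_location(url)
        if target is None:
            return url  # no further redirect: this is the destination
        if target in seen:
            raise ValueError("redirect loop at " + target)
        seen.add(target)
        url = target
    raise ValueError("too many redirects")


# Made-up redirect table standing in for real HTTP responses.
REDIRECTS = {
    "http://t.co/abc": "http://bit.ly/xyz",
    "http://bit.ly/xyz": "http://example.com/article",
}

print(resolve("http://t.co/abc", REDIRECTS.get))
```

Each hop here corresponds to a request to the shortener, so "parsing without following" would yield only the t.co URL, not the page that was actually shared.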

------
jyap
Twitter does in fact apply rel=nofollow to external links on their site,
but this does not mean content (especially profiles) cannot be indexed.
Mentioning Twitter’s use of rel=nofollow is a definite red herring.

~~~
redthrowaway
Individual tweets will be indexed, but _google has no idea who's sharing
what_. That's the issue at hand. Google's changes to Search will promote links
that your social connections have shared. If googlebot is ignoring links in
tweets, at Twitter's request, then Twitter has taken themselves out of the
Search plus Your World game.

Again, the issue is not that googlebot doesn't index tweets. The issue is that
it doesn't index any of the links in those tweets. Thus Google has no way of
displaying "@Someone tweeted this" in their Search plus Your World results.

------
keltex
Personally I think it's dumb for Twitter to nofollow its outbound links. If I
were them I would let the links be followed and then let Google come up with
an algorithm on how to rank the links from Twitter.

------
adeelarshad82
Can someone please explain to me what nofollow has to do with Google indexing
or displaying tweets in their SERPs? It's like saying Google discourages pages
which may contain relevant/important information on the topic a user may be
looking for because they contain nofollow links. What am I missing here?

------
recoiledsnake
What have outbound links to do with this at all? Many tweets can and do have
zero links. I don't see how that's relevant to this conversation where the
topic is about searching the tweets themselves, which has nothing whatsoever
to do with outbound links from those tweets.

------
niyazpk
(I am new to this debate and so may be way off, but)

As far as I understand, even this article is not very clear. It says:

 _Google is simply complying with Twitter.com’s directive to not follow
outbound links in tweets it crawls, and the consequence is that there will
never be ”… shared this on Twitter” in the search results._

Wrong.

You see, when I share some random link in my blog and then you search for that
topic, Google will not say "niyazpk shared this in ...." in the search result.
Why? Because Google probably considers shares from a few trusted
sites/partners only.

Let us read Google's explanation again:

 _We are a bit surprised by Twitter’s comments about Search plus Your World,
because they chose not to renew their agreement with us last summer
(<http://goo.gl/chKwi>), and since then we have observed their rel=nofollow
instructions._

And this quote from Google[1]:

 _Since October of 2009, we have had an agreement with Twitter to include
their updates in our search results through a special feed, and that agreement
expired on July 2. While we will not have access to this special feed from
Twitter, information on Twitter that’s publicly available to our crawlers will
still be searchable and discoverable on Google._

It is pretty clear what happened. Twitter did not renew the agreement with
Google and Google stopped considering Twitter as a source for the "shared on"
snippet. The "no-follow" attribute has nothing to do with it, except that it
works exactly like it works for any other site.

[1] [http://searchengineland.com/as-deal-with-twitter-expires-goo...](http://searchengineland.com/as-deal-with-twitter-expires-google-realtime-search-goes-offline-84175)

~~~
Terretta
> _new to this ... may be way off_

You are indeed "way off".

The article is factually and technically correct, while your "Because Google
_probably_ considers" is speculation.

The "special" feed supported crawling efficiency, giving Google a real time
firehose of new tweets, instead of having to crawl Twitter as any visitor or
spider would.

The "no follow" breaks the association between a tweeter and the shared
content, exactly as the linked article states.

In Google's own words, "Essentially, using nofollow causes us to drop the
target links from our overall graph of the web."

