
Twitter just updated its robots.txt to exclude all scrapers - sb057
https://www.twitter.com/robots.txt
======
searchmartin
Here's a write-up on why they made the changes. I was going to write it
yesterday, but life got in the way:

[http://webmarketingschool.com/no-twitter-did-not-just-de-ind...](http://webmarketingschool.com/no-twitter-did-not-just-de-index/)

~~~
sp332
Why would you not use 301 redirects or just rel=canonical?

~~~
searchmartin
My best guess is that there is some platform issue...

~~~
sp332
They could at least have allowed the Internet Archive in the robots.txt, since
the way things stand all www.twitter.com links will be unavailable from the
Wayback Machine. That will obviously be a huge loss to researchers.

~~~
sp332
(Update: the Wayback Machine will be fine, using the twitter.com/robots.txt)

------
mattbasta
Nah.

[https://twitter.com/robots.txt](https://twitter.com/robots.txt)

They blocked robots on their marketing pages.

~~~
searchmartin
Again, no, they blocked it on the www. subdomain only:
[http://webmarketingschool.com/no-twitter-did-not-just-de-ind...](http://webmarketingschool.com/no-twitter-did-not-just-de-index/)

------
sb057
Same file as of a few hours ago:

[https://web.archive.org/web/20150715164726/https://twitter.c...](https://web.archive.org/web/20150715164726/https://twitter.com/robots.txt)

------
searchmartin
Nope. No. They didn't.

What they did was some perfectly legitimate duplicate content protection.

Will write it up in a bit more detail...

------
kenkasan
So what does that mean?

~~~
minimaxir
Absolutely nothing from Twitter should be appearing in search engines.

~~~
Paulods
Actually, I believe it only applies to the "www" subdomain. Compare it with
the robots.txt without the "www":

[https://www.twitter.com/robots.txt](https://www.twitter.com/robots.txt)
[https://twitter.com/robots.txt](https://twitter.com/robots.txt)

This is likely just to prevent content duplication/nudge users to visit
without the "www".
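
To illustrate why the two files can say different things: robots.txt is looked up per host, so a crawler treats www.twitter.com and twitter.com as separate sites. Below is a minimal sketch using Python's standard `urllib.robotparser`. The two rule sets are made-up stand-ins, not copies of Twitter's actual files, and `ExampleBot` and the `allowed` helper are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Assumed, simplified rule sets (NOT the real Twitter files):
# the "www" host blocks everything, the bare host blocks only /search.
WWW_RULES = "User-agent: *\nDisallow: /\n"
BARE_RULES = "User-agent: *\nDisallow: /search\n"


def allowed(robots_txt, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Same path, different host, different answer.
print(allowed(WWW_RULES, "https://www.twitter.com/jack"))
print(allowed(BARE_RULES, "https://twitter.com/jack"))
```

Under these assumed rules the first check is refused and the second is permitted, which is the "only on www" behaviour described above.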

~~~
thenextcorner
Yep, that makes a huge difference, as explained here as well:
[http://webmarketingschool.com/no-twitter-did-not-just-de-ind...](http://webmarketingschool.com/no-twitter-did-not-just-de-index/)

~~~
sp332
You should never treat Google's estimated result count as a real count,
especially once it gets past 100. There are better ways to move content, like
301 redirects or rel=canonical.
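
For anyone unfamiliar with the two options, here is a rough sketch of what they could look like, using only Python's standard `http.server`. Everything here is an assumption for illustration: `example.com` stands in for the canonical host, and `redirect_target`, `canonical_link_tag`, and `Handler` are hypothetical names, not anything Twitter runs.

```python
from http.server import BaseHTTPRequestHandler

CANONICAL_HOST = "example.com"  # hypothetical canonical (non-www) host


def redirect_target(host, path):
    """Return the canonical URL if the request arrived on the www host, else None."""
    hostname = host.split(":")[0].lower()  # drop any port
    if hostname == "www." + CANONICAL_HOST:
        return "https://" + CANONICAL_HOST + path
    return None


def canonical_link_tag(path):
    """Build the rel=canonical tag a page would carry in its <head>."""
    return '<link rel="canonical" href="https://%s%s">' % (CANONICAL_HOST, path)


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.headers.get("Host", ""), self.path)
        if target:
            # Option 1: permanent redirect from www to the bare host.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        # Option 2: serve the page and declare its canonical URL.
        page = "<html><head>%s</head><body>ok</body></html>"
        body = (page % canonical_link_tag(self.path)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To try it locally:
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

Either approach tells search engines which host is the real one without blocking crawlers outright.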

