They could at least have allowed the Internet Archive in the robots.txt, since, as things stand, all www.twitter.com links will be unavailable from the Wayback Machine. That will obviously be a huge loss to researchers.
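For what it's worth, the Wayback Machine has historically honored robots.txt rules addressed to the ia_archiver user agent, so a carve-out would only take a couple of lines. Something like this (just a sketch of what an exception could look like, not what Twitter actually serves):

    # Hypothetical www.twitter.com/robots.txt with an Internet Archive exception
    User-agent: ia_archiver
    Disallow:

    # Everyone else stays blocked
    User-agent: *
    Disallow: /

An empty Disallow line means "crawl everything" for that user agent, while everyone else keeps the blanket block.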
You should never treat Google's result-count estimate as an accurate figure, especially once it gets past 100 or so results. And if the goal is to move content to another URL, there are better ways to do it than robots.txt, like 301 redirects or rel=canonical.
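For the 301 route, it's just a host-level redirect plus, optionally, a canonical hint in the page itself. Roughly, with nginx and example.com standing in for the real setup:

    # Redirect the www host to the bare domain, preserving the path
    server {
        listen 80;
        server_name www.example.com;
        return 301 https://example.com$request_uri;
    }

and in the HTML of each page on the preferred host:

    <link rel="canonical" href="https://example.com/some/page">

Either one tells search engines which URL to index; robots.txt only tells them what not to crawl.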
IMO, that's exactly the reason. Before, search engines could scrape the data and get the content for free. Now they'll need to reach firehose data agreements.
Take a look at the robots.txt on the domain without the "www" subdomain. It's likely there to prevent content duplication and push search engines to show only the non-"www" URL.
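In other words, something like this split across the two hosts (hypothetical, just to illustrate the theory):

    # www.twitter.com/robots.txt -- keep the duplicate host out of crawlers
    User-agent: *
    Disallow: /

    # twitter.com/robots.txt -- leave the canonical host open
    User-agent: *
    Disallow: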
http://webmarketingschool.com/no-twitter-did-not-just-de-ind...