

Bit.ly/robots.txt and the Dangers of Custom Shortened URLs - byrneseyeview
http://www.davidnaylor.co.uk/dangers-of-custom-shortened-urls.html

======
gojomo
_All the owner of that blog would have to do would be to change his post to
look like a normal robots.txt file and he could happily ban Google (or Yahoo,
or whoever) from crawling any page on bit.ly._

No crawler I know of accepts a redirected robots.txt from an alternate domain
for rules about the original domain.

~~~
NathanKP
That is true, but it must still mess things up to have robots.txt redirected.
It would appear that Google and other bots won't have any way of reaching the
real robots.txt.

And what about sitemap.xml, atom.xml, and other typical files that could also
be redirected?

~~~
timf
That is bit.ly's problem for not making an exception (assuming they want one
in the first place; it's not required), not a general problem.
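
To illustrate the kind of exception meant here, a minimal sketch of a
reserved-name check a shortener could run before accepting a custom short
name. Everything in it is hypothetical (the `RESERVED_SLUGS` set, the
`is_slug_allowed` function); it is not bit.ly's actual code, and the reserved
names are just the files mentioned in this thread.

```python
# Hypothetical reserved names: the well-known files mentioned in this thread.
RESERVED_SLUGS = frozenset({"robots.txt", "sitemap.xml", "atom.xml"})


def is_slug_allowed(slug: str) -> bool:
    """Return True if a user may claim `slug` as a custom short name."""
    # Normalize so "/Robots.TXT" is treated the same as "robots.txt".
    normalized = slug.strip().strip("/").lower()
    # Reject empty names and anything that shadows a reserved file.
    return bool(normalized) and normalized not in RESERVED_SLUGS
```

With a check like this, a request to claim `robots.txt` would be refused while
an ordinary name such as `abc123` would go through.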

~~~
NathanKP
Nope, it's not a general problem, but I bet the programmers over at bit.ly are
still going to catch some flak over this.

~~~
timf
Agreed

------
brown9-2
I'm a bit amazed at how much these URL shorteners have caught on.

I've started to see them used in email sent _internally_ by Corporate folks at
my company, as links to press releases on our _own_ website.

~~~
wmf
Of course, many corporate Web sites are so poorly designed that every URL is
longer than 80 characters and thus may be mangled in email.

~~~
snprbob86
Most corporate email travels via Exchange with Rich Text, so the 80-character
limit doesn't apply. That's just what "normal users" use; rightfully so.

For us hackers: when sending plain-text messages, you should enclose URLs in
<angle_brackets> to prevent line breaking and especially the trailing-period
problem, like <www.google.com>.
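
A rough sketch of why the brackets help, assuming a mail client that
auto-links with a naive "everything up to the next whitespace" pattern. Both
regexes and both function names are made up for illustration; real clients
use their own, usually smarter, heuristics.

```python
import re

# Naive auto-linker: grabs every non-space character after the scheme,
# so sentence punctuation gets swallowed into the link.
NAIVE_URL = re.compile(r"https?://\S+")

# Bracket-aware extraction: the URL is exactly what sits between < and >.
BRACKETED_URL = re.compile(r"<([^<>\s]+)>")


def naive_urls(text: str) -> list[str]:
    """Return URLs as a whitespace-delimited matcher would see them."""
    return NAIVE_URL.findall(text)


def bracketed_urls(text: str) -> list[str]:
    """Return URLs that the sender wrapped in angle brackets."""
    return BRACKETED_URL.findall(text)
```

For the text `See http://www.google.com.` the naive matcher keeps the final
period as part of the link, while `See <http://www.google.com>.` gives the
bracket-aware matcher an unambiguous end point.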

~~~
silvestrov
You forgot Lotus Notes. It mishandles HTML emails so badly that the new
Outlook is pure perfection compared to it. (Grumbles over having to implement
a complex receipt email for a company that uses Notes internally.)

------
tlrobinson
Heh, check out the statistics for that URL:

<http://bit.ly/info/robots.txt>

Hundreds of thousands of "direct" hits, and hardly any actual referrers. Looks
like bit.ly is counting crawlers as hits. Not too surprising given the lack of
validation elsewhere.
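
For what it's worth, a minimal sketch of the sort of filtering that would keep
crawler traffic out of the click totals. The marker list and function names
are assumptions for illustration only; nothing here reflects how bit.ly
actually computes its statistics, and user-agent matching is easy to fool.

```python
# Hypothetical substrings that commonly show up in crawler user-agent strings.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def looks_like_crawler(user_agent: str) -> bool:
    """Heuristic: True if the user-agent string contains a known bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def count_human_hits(user_agents: list[str]) -> int:
    """Count hits whose user agent does not look like a crawler."""
    return sum(1 for ua in user_agents if not looks_like_crawler(ua))
```

A hit log dominated by requests for `/robots.txt` would mostly drop out under
a filter like this, since those requests come almost entirely from crawlers.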

