

Googlebot snarfing URLs from the ether? - colanderman

So the other day I placed a private CGI script on my personal server for testing. (No links to the URL; the directory redirects to my home page.) It crashed my server, so I rebooted it and decided to come back later.

Two days later, I get an e-mail from Google's Webmaster Tools (which I have configured on that server) that my site is down. Indeed, it needs another reboot, as if someone had accessed that private URL. But surely no bot could know that URL exists, right? Checking my server logs, I find this (1.2.3.4 is my IP address):

    1.2.3.4 - - [31/May/2013:01:01:41 +0000] "GET /~chris/avcc.cgi HTTP/1.1" 200 283 "-" "Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.10.289 Version/12.02"
    66.249.73.230 - - [31/May/2013:01:02:11 +0000] "GET /robots.txt HTTP/1.1" 200 477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.73.230 - - [31/May/2013:01:02:11 +0000] "GET /~chris/avcc.cgi HTTP/1.1" 200 291 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

That is, mere _seconds_ after I had first accessed a URL the world had _never seen before_, Googlebot decided to crawl my site (as indicated by the robots.txt hit) and then accessed the URL that _was not linked anywhere_.

My only explanation is that somewhere between my home computer and my (off-site) server, a Google-affiliated network appliance _sniffed my initial GET_ and _sent the URL to Googlebot to be indexed_ within _30 seconds_.

Has anyone else experienced this?

(Yes, I know: security through obscurity, should use robots.txt, yadda yadda.)
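
For anyone who wants to check the same thing in their own logs, here is a rough Python sketch that measures the gap between the first request for a path and the first request for it from a different IP. It assumes the "combined" access-log layout seen in the excerpt above. The names `LOG_RE`, `parse_line`, and `seconds_until_other_client` are made up for illustration, and the regex is only a guess that fits these three lines, not a general log parser.

```python
import re
from datetime import datetime

# Hypothetical pattern for the "combined" access-log layout in the excerpt.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # %b expects English month abbreviations (the default C locale).
    rec["time"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def seconds_until_other_client(lines, path):
    """Seconds from the first hit on `path` to the first hit from another IP."""
    hits = [r for r in map(parse_line, lines) if r and r["path"] == path]
    hits.sort(key=lambda r: r["time"])
    if not hits:
        return None
    first = hits[0]
    for r in hits[1:]:
        if r["ip"] != first["ip"]:
            return (r["time"] - first["time"]).total_seconds()
    return None


# The three log lines from the post, copied verbatim.
LOG = [
    '1.2.3.4 - - [31/May/2013:01:01:41 +0000] "GET /~chris/avcc.cgi HTTP/1.1" 200 283 "-" "Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.10.289 Version/12.02"',
    '66.249.73.230 - - [31/May/2013:01:02:11 +0000] "GET /robots.txt HTTP/1.1" 200 477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.73.230 - - [31/May/2013:01:02:11 +0000] "GET /~chris/avcc.cgi HTTP/1.1" 200 291 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

print(seconds_until_other_client(LOG, "/~chris/avcc.cgi"))  # prints 30.0
```

Run against a full access log (one line per list entry), this would show whether a second client routinely turns up shortly after the first visit to a fresh URL or whether this was a one-off.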
======
mattsah
Did you use Chrome to access the URL?

~~~
colanderman
No, I'm using Opera.

------
stray
Did you access that URL with _Google_ Chrome mere seconds before Googlebot
miraculously found it?

~~~
galacticvoid
Or does the browser you're using have instant results from the address bar,
with Google as the search engine?

~~~
colanderman
Yes, though I've tested for this, and I only see requests from my
browser (Opera) for _incomplete_ URLs.

