The (humorous) dangers of Google's new deep search
22 points by robertk on April 17, 2008 | 11 comments
I was reading the Slashdot article on Google's new "deep search" (http://tech.slashdot.org/tech/08/04/16/2052206.shtml), where it submits forms and sees what the results are. One user posted this quite insightful and interesting anecdote:

http://tech.slashdot.org/comments.pl?sid=525058&cid=23096424

When I interned at Google, someone told me a funny anecdote about a guy who emailed their tech support insisting that the Google crawler had deleted his web site. At first, I think he was told "just because we download a copy of your site doesn't mean your local copy is gone" (à la the obligatory bash.org quote). But the guy insisted, and finally they double-checked, and his site was in fact gone. It turns out it was a home-brewed wiki-style site, and each page had a "delete" button. The only problem was, the "delete" button sent its query via GET, not POST, and so the Google spider happily followed those links one by one and deleted the poor guy's entire site. The Google guys were feeling charitable, so they sent him a backup of his site, but told him he wouldn't be so lucky the next time, and that he should change any forms that make changes to POSTs -- GETs are only for queries.

So, long story short, I wonder how Google will avoid more of this kind of problem if they're really going off the deep end and submitting random data to random forms on the web. Like the guy above, people may not design their sites with such a spider in mind, and despite that lack of foresight, this could kill a lot of goodwill if done improperly.
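
To make the failure mode concrete, here is a rough sketch of how a site like that could end up being wiped, and what the fix looks like. It's plain Python using the standard http.server module; the page names and paths are invented for illustration and aren't from the actual site in the story.

    # Hypothetical sketch of the kind of site described above (not the actual code).
    # A wiki-style server whose "delete" action is reachable by GET: any crawler
    # that follows <a href="/delete?page=..."> links will wipe pages one by one.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs, urlparse

    PAGES = {"home": "welcome", "todo": "buy milk"}  # stand-in page store

    class WikiHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = urlparse(self.path)
            if url.path == "/delete":  # the bug: a GET that changes state
                page = parse_qs(url.query).get("page", [""])[0]
                PAGES.pop(page, None)  # a spider following the link deletes the page
                self.send_response(302)
                self.send_header("Location", "/")
                self.end_headers()
                return
            body = "".join(
                '<p>%s <a href="/delete?page=%s">delete</a></p>' % (name, name)
                for name in PAGES
            )
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body.encode())

        def do_POST(self):
            # The fix: only a POSTed form may delete, so link-following crawlers stay read-only.
            url = urlparse(self.path)
            if url.path == "/delete":
                length = int(self.headers.get("Content-Length", 0))
                page = parse_qs(self.rfile.read(length).decode()).get("page", [""])[0]
                PAGES.pop(page, None)
            self.send_response(303)
            self.send_header("Location", "/")
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), WikiHandler).serve_forever()

The crawler never knows (or cares) that the GET has side effects; it only sees another link to follow.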




Very funny anecdote. What bothers me, though, is that it implies Google thinks they own the web.

For example, they told the guy that he was lucky that they were willing to give him a backup, but it seems to me that Google's the one that should be taking responsibility for their actions.

It's a short jump from "you have to use POST or we'll delete your stuff" to "you have to follow Google standard X or we won't index your site."

Welcome to the first web empire.


No, any web crawler would have done the same thing. It is simply an error to modify content with a GET.


I agree. Google simply followed the semantics of HTTP. They had similar problems with the Google Web Accelerator before. But it's really hard to blame Google for the mistakes of other programmers.


Additionally, the interface to web crawlers (robots.txt) is well defined.
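
For instance, a couple of lines of robots.txt would have kept any well-behaved crawler away from those delete links. Here's a rough sketch of how a polite crawler checks the rules before fetching, using Python's standard urllib.robotparser; the rules and URLs are made up for illustration.

    # Made-up robots.txt rules and URLs, just to show the well-defined interface.
    from urllib.robotparser import RobotFileParser

    rules = [
        "User-agent: *",
        "Disallow: /delete",  # keep every crawler off the destructive links
    ]

    rp = RobotFileParser()
    rp.parse(rules)  # a real crawler would do rp.set_url("http://example.com/robots.txt"); rp.read()

    print(rp.can_fetch("Googlebot", "http://example.com/delete?page=home"))  # False
    print(rp.can_fetch("Googlebot", "http://example.com/home"))              # True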


Yes, it is an error. What alarms me is Google's attitude.

What happens when we're dealing with interfaces that are a little bit more ill-defined? Will Google continue to demand that you follow their way of doing things?

Google's attitude in this case suggests they will.


It's not "their way of doing things"; it's just the way the web works (per the HTTP spec). Any crawler would have done the same thing in that situation -- the fact that it happened to be Google is merely coincidental. Given the scale at which they operate, you can't expect Google or any other web-scale crawler to be mind-readers.
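
At its core, a crawler is just "GET every link you can find." A bare-bones sketch (standard-library Python only; the start URL is a placeholder) shows why a delete-via-GET link gets hit exactly like any other:

    # Bare-bones crawler sketch: it GETs every <a href> it can find, with no idea
    # which links are harmless queries and which ones change state on the server.
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)

    def crawl(start_url, limit=50):
        seen, queue = set(), [start_url]
        while queue and len(seen) < limit:
            url = queue.pop()
            if url in seen or not url.startswith("http"):
                continue
            seen.add(url)
            html = urlopen(url).read().decode("utf-8", "replace")  # just a plain GET
            collector = LinkCollector()
            collector.feed(html)
            # Every href gets queued the same way -- "/delete?page=..." included.
            queue.extend(urljoin(url, href) for href in collector.links)
        return seen

    if __name__ == "__main__":
        crawl("http://example.com/")  # placeholder start URL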


At least search engines can be told not to touch a certain link; if a bored user did the same thing, who would you blame?


In a way, Google is doing QA on the whole web. This will expose all kinds of bugs sites have hidden behind their form-processing scripts. Databases will get filled up with random junk ('Google was here'), but that is also good for QA. On the other hand, a lot of that junk data will be reflected back to the web by sites that post this stuff... not so good for the SNR of the Internet as a whole.


I've seen the same thing happen with a Rails app a co-worker created and populated with data for a demo/tutorial. It got indexed by the university's web crawler overnight and was empty the next day.


Oh man, THAT is awesome! ....Backing up my site now :-P


I think they will only use drop-downs, and most of those don't contain such harsh options. I want to hear a good case of getting behind those forms... if they do enter data into text boxes, they will do it off a dictionary list that is certified safe.

- Do sites still use drop-down nav? (Is that even a real case?)



