

Googlebot on a tcp/ip level - bauchidgw
http://tupalo.com/en/blog/seo-mystery/

======
jacquesm
Apart from the auto-backslapping this is an interesting confirmation that
performance of your website impacts search engine results.

As for Google sending multiple requests: from the way the article is written,
it sounds as though Google sends the requests all at once and then waits for
the answers to come back one by one. You can cure this by switching keep-alive
off on the server side.

Typically in your HTTP configuration you would add a line like this (example
for Apache):

    KeepAlive Off

You could even do this just for Googlebot (BrowserMatch is case-sensitive, and
the crawler's user agent spells it "Googlebot"):

    BrowserMatch "Googlebot" nokeepalive

That way you can 'fix' the google bot issues without affecting the normal
users of the site.
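
To make "sends the requests all at once" concrete, here is a minimal Python sketch of what a pipelined batch looks like on the wire. The function name `build_pipelined_requests` and the host and paths are made up for illustration; this is not what Googlebot actually sends, just the general shape of HTTP/1.1 pipelining.

```python
def build_pipelined_requests(host, paths):
    """Concatenate several HTTP/1.1 GET requests into one byte string.

    A pipelining client writes all of these to the socket before reading
    any response; the server is expected to answer them in the same order.
    """
    requests = []
    for index, path in enumerate(paths):
        # Ask the server to close the connection after the last request only.
        is_last = index == len(paths) - 1
        connection = "close" if is_last else "keep-alive"
        requests.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: {connection}\r\n"
            "\r\n"
        )
    return "".join(requests).encode("ascii")


if __name__ == "__main__":
    # Hypothetical host and paths, printed rather than sent anywhere.
    print(build_pipelined_requests("example.com", ["/a", "/b"]).decode("ascii"))
```

With keep-alive switched off, the server closes the connection after the first answer, so a client cannot get more than one request served per connection.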

~~~
ivank
Even better would be to convince GoogleBot not to pipeline, while keeping
keep-alive on. Perhaps by sending some kind of IIS/4 Server header to
Googlebot?

~~~
jacquesm
Keep-alive and pipelining go hand in hand, a trick I abused for many years in
order to serve up streaming video using jpegs.

The funny thing is it works both ways: if you switch keepalive to 'on' and you
start dumping answers in to the pipe pre-emptively (because, for instance, you
know you're talking to your own little piece of javascript on the other side,
so you can predict the next request) then you can save yourself the round-trip
delays that you would have if you stuck to the regular request/answer,
request/answer pattern.

Keep-alive on pretty much implies that pipelining is ok.

For many years this was the 'secret sauce' that my company lived off. The fact
that nobody clued in to it is something that amazes me to this day; it seemed
a pretty obvious thing to do.
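
A rough back-of-the-envelope model of the round-trip saving, as a Python sketch. It assumes a fixed round-trip time and a fixed per-answer service time, and ignores bandwidth, TCP slow start and everything else; the function names and numbers are illustrative only.

```python
def sequential_time(n_requests, rtt, service_time):
    """Request/answer, request/answer: every request pays a full round trip."""
    return n_requests * (rtt + service_time)


def pipelined_time(n_requests, rtt, service_time):
    """All requests go out up front: one round trip, then answers back to back."""
    return rtt + n_requests * service_time


if __name__ == "__main__":
    # Illustrative numbers: 10 requests, 100 ms round trip, 10 ms per answer.
    n, rtt, service = 10, 0.100, 0.010
    print(f"sequential: {sequential_time(n, rtt, service):.2f} s")
    print(f"pipelined:  {pipelined_time(n, rtt, service):.2f} s")
```

In this simplified model the difference is always (n - 1) round trips, which is why the effect grows with latency and with the number of small requests.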

~~~
ivank
> dumping answers in to the pipe pre-emptively

That's a very interesting hack, but I hope no one decides to deploy it today.
You really can't predict what the next request will be - the browser can reuse
the connection for some other request at a whim.

And I'm sure there's some proxy around that will panic if it gets a response
before getting a request.

> Keep-alive on pretty much implies that pipelining is ok.

Well, sure, in theory. But since nearly every browser keeps it off (edit:
doesn't pipeline requests), obviously a ton of servers are broken. Even Flickr
was broken for many years with pipelining on (image downloads would abort
randomly). Chrome is planning to eventually enable it with a bunch of
heuristics; hopefully that will improve the situation.

Edit: "Making HTTP Pipelining Usable on the Open Web":
<http://tools.ietf.org/html/draft-nottingham-http-pipeline-00>

~~~
jacquesm
As a rule I have it 'off' because I have seen more bugs related to keep-alive
than benefits from it, but in some special cases the speedup can be dramatic,
so you should always at least test to see what it does for you.

For instance, a gallery page with lots of small thumbnails could benefit from
keep-alive being on.
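
One way to "test to see what it does for you" is to count how many TCP connections a batch of small requests actually uses. The Python sketch below is self-contained: it starts a throwaway local server and fetches a few hypothetical thumbnail paths over a single persistent connection. All names here (`ConnectionCountingHandler`, `fetch_over_one_connection`, `demo`, the `/thumb/...` paths) are made up for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ConnectionCountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    connections = 0  # incremented once per accepted TCP connection

    def setup(self):
        super().setup()
        type(self).connections += 1

    def do_GET(self):
        body = b"thumb"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_over_one_connection(host, port, paths):
    """Issue several GETs on a single keep-alive connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    statuses = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()  # drain the body before reusing the connection
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses


def demo(n=3):
    """Return (status codes, number of TCP connections the server saw)."""
    ConnectionCountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), ConnectionCountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        paths = [f"/thumb/{i}.jpg" for i in range(n)]
        statuses = fetch_over_one_connection(
            "127.0.0.1", server.server_address[1], paths
        )
    finally:
        server.shutdown()
        server.server_close()
    return statuses, ConnectionCountingHandler.connections


if __name__ == "__main__":
    print(demo())
```

Note this shows plain keep-alive reuse (one request at a time on one connection), not pipelining; with keep-alive off, each thumbnail would need its own connection.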

------
hartror
While the article was interesting the first line "We did it. We solved one of
the unsolved big SEO (Search Engine Optimization) mysteries of the modern
time." had me reaching for the close button. Is it just me or is there a LOT
of spin from SEO types?

~~~
rythie
I imagine getting on Hacker News is a good boost for their SEO, so maybe it
worked?

~~~
franze
as the author of the article in question, SEO and geek i must say: i
deliberately overspun the wording of the article, as - as a matter of fact
- the mystery was so mysterious that nobody in the so called SEO "community"
ever noticed that there was something rotten going on in their beloved google
webmaster reports.

and (i'm not sure why, it was a friday afternoon when i wrote that thing) i
thought it would be funny to take the typical
over-the-top-linkbait-kind-of-writing-style and ...... did something nobody
has ever done before ... put
some real information in it.

~~~
carbocation
I think the post conveyed the tongue-in-cheek nature pretty well. I actually
thought you'd be disclosing an even _smaller_ finding.

------
nl
Someone needs to point out that this isn't the TCP/IP level, it's just HTTP.

I know it is pedantic, but technical discussions need precision in their
agreed terminology. HTTP and TCP/IP are at different stack levels, and
everything discussed here is HTTP.

------
TimH
It would be useful if Google hooked some info like this into the error message
somehow. Can anyone pass it on to the right people at Google?

~~~
alanh
Matt Cutts already commented on the article, so I'm sure the right people will
know before Monday is over.

------
mrb
"If you don’t understand a single word i just wrote, please remember, we are
geeks."

Hum. An SEO discovers HTTP pipelining and gets all excited about it. _yawns_

~~~
Andrewski
SEO bums are not geeks, they are cancer. SEO is why Google and other search
engines suck now.

Remember to get your herbal V14gra, and to Digg this!

------
mybbor
Thank you so much. The timing of this article is fantastic; I have actually
been battling this exact bug/quirk all week.

------
whackberry
I wish we had a downvote arrow as well. This is SEO spam and it reached nr 1
on HN.

