

Google Crawls My Site 90% Faster - Speed is Important - kadavy
http://www.kadavy.net/blog/posts/google-crawls-faster-speed-performance-optimization/

======
jsm386
Can I ask you what was going wrong in the past that caused your site to take
such a long time for Google to crawl it - and what you did to bring the crawl
time down so dramatically?

For reference, when some sites I was tracking spiked above a few hundred ms of
crawl time, I grew alarmed and resolved those issues.

Also, another good way to measure things is to look at 'Site Performance' in
the Labs section of Webmaster Tools (right below the Diagnostics section used
in this post). You'll get a graph that shows how your site's load time compares
with the rest of the Internet, along with improvement suggestions.

Also, while the conclusion seems accurate (faster load time leading to better
ranking in perhaps 1% of cases is something Google has talked about), I think
your reasoning is off.

You wrote: _If a site can be crawled faster - and requires less resources to
index, doesn’t it stand to reason that it will be rewarded with higher search
rankings?_

It's not about resources; it's a matter of user experience. The faster your
site loads, the happier Google's searchers who clicked over to you are:

 _Speeding up websites is important - not just to site owners, but to all
Internet users. Faster sites create happy users and we've seen in our internal
studies that when a site responds slowly, visitors spend less time there_

See
[http://googlewebmastercentral.blogspot.com/2010/04/using-sit...](http://googlewebmastercentral.blogspot.com/2010/04/using-site-speed-in-web-search-ranking.html)

~~~
kadavy
I understand the user experience issue, but I'm trying to illustrate that
there's also the issue of Google's resources. They are a business after all,
no matter how they hope to not be evil.

As for what I did to make the speed improvements, check out the post that I
referenced (twice) in the article:
[http://www.kadavy.net/blog/posts/wordpress-optimization-drea...](http://www.kadavy.net/blog/posts/wordpress-optimization-dreamhost-rackspace/)

~~~
ericd
I don't see how it factors into their resource usage at all. A crawler that's
blocked waiting for a reply from your site is unlikely to be using much/any
CPU time compared to an active crawler, and that time is surely being used by
another crawler on that machine. All crawlers blocked and waiting? That means
they can spin up 50 more!

Unless they're doing it wrong, and I very much doubt they are given it's
central to their business, it's purely a user experience issue.
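
A toy illustration of that point: workers that are merely waiting on a slow
server cost almost nothing, so many of them can wait at once. This is a sketch
using Python's asyncio with simulated latency, not a description of how
Google's crawler is actually built. `fake_fetch`, the 50-worker count, and the
300 ms delay are assumptions chosen to mirror the numbers in this thread.

```python
import asyncio
import time


async def fake_fetch(page_id: int, latency_s: float) -> int:
    """Stand-in for a network request: it waits without using CPU."""
    await asyncio.sleep(latency_s)
    return page_id


async def crawl(page_count: int, latency_s: float) -> float:
    """Fetch all pages concurrently and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_fetch(i, latency_s) for i in range(page_count))
    )
    assert len(results) == page_count
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(crawl(page_count=50, latency_s=0.3))
    # Fetched one by one, the same work would take 50 * 0.3 = 15 seconds.
    print(f"50 simulated fetches finished in {elapsed:.2f}s")
```

The total time stays close to one response time rather than fifty, which is
why a slow site mostly costs the crawler waiting, not computing.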

~~~
kadavy
I don't know the details of how their crawler works, but it seemed to me that
if a page takes more time to serve to the crawler, there would be some lost
resources.

Even so, faster page load time also means higher AdSense CTR, and more money -
though I guess you could make an indirect user experience case for that.

~~~
ericd
There's bound to be a small amount of memory used by the idle crawler, but if
they designed it well, it seems unlikely that that would be a limiting factor.

You make a fair point about the AdSense. I would guess they care more about a
better user experience, but it's certainly possible. Everything they do can be
viewed as a means to keep people searching and using the web, and clicking on
more ads.

------
strebler
Google's reported page loading times seem kind of flaky. One of our sites'
loading times has jumped between 1.5 and 10 seconds and back four times since
they started graphing it.

But I have a cron job that tracks the loading times of random pages on our
sites (from offsite, every 10 minutes). I get very consistent loading times
(nowhere near that variance).
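
A check like this could be as small as the sketch below. It is not the actual
script; the function names, the choice of urllib, and passing URLs on the
command line are assumptions for illustration, and the 10-minute schedule
would live in the crontab rather than in the code.

```python
import random
import sys
import time
import urllib.request
from typing import Callable, Sequence, Tuple


def fetch(url: str) -> bytes:
    """Download the full response body for url."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def time_random_page(
    urls: Sequence[str],
    fetcher: Callable[[str], bytes] = fetch,
) -> Tuple[str, float]:
    """Pick one URL at random and return (url, seconds taken to fetch it)."""
    url = random.choice(urls)
    start = time.perf_counter()
    fetcher(url)
    return url, time.perf_counter() - start


if __name__ == "__main__":
    pages = sys.argv[1:]  # URLs to sample from, given on the command line
    if pages:
        url, seconds = time_random_page(pages)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {url} {seconds:.3f}s")
```

Appending each output line to a log file gives a series that can be compared
against the graph in Webmaster Tools.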

(Edit: they also suggest gzipping resources that are in fact already gzipped,
at least according to the HTTP headers)
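
One way to confirm what the headers say is to request a resource with
`Accept-Encoding: gzip` and look at the `Content-Encoding` response header. A
minimal sketch with hypothetical function names; the header check is split out
so it can be tried without touching the network.

```python
import urllib.request
from typing import Mapping


def is_gzipped(headers: Mapping[str, str]) -> bool:
    """True if a Content-Encoding header (any letter case) lists gzip."""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            encodings = [part.strip().lower() for part in value.split(",")]
            return "gzip" in encodings
    return False


def served_gzipped(url: str) -> bool:
    """Request url while advertising gzip support and report what came back."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return is_gzipped(dict(response.headers.items()))
```

If `served_gzipped` returns True for a resource the tool flags, the
suggestion is probably stale or based on a request that did not advertise gzip.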

------
ericd
I think this post has a grain of truth behind it, though I don't agree with
the reasoning about resource utilization.

I noticed that when one of my sites went from 300ms response times to 30ms,
the crawler started indexing more pages per day, and would index deeper, which
meant more of my pages in their index. The result was a healthy boost in
organic search traffic due to more long tail search matches.

~~~
kadavy
I'm starting to wonder if people are interpreting "resources" to mean "CPU
resources," when I really mean "resources" as in time, money, etc.

What's your hypothesis for why the crawler started indexing more pages per day
when your response time improved? That _seems_ like evidence of my theory, but
I'm a designer by training so maybe there's a more technical explanation I'm
missing.

~~~
ericd
Yeah, I assumed you meant machine resources. Machine resources tend to be the
limiting factor for massively parallel things like crawling, so that's why I
assumed that.

My hypothesis is that there's a single crawler that was looking at my site,
and it makes a best effort to crawl as much as it can. I have more pages on my
site than can be crawled in a single day at 300ms/page, but at 30ms/page, it
can be done in 3-4 hours. I don't know enough about their architecture to make
any further guesses than to say that it probably just grabs as much as it can
within the time it's focused on my site before something else on the queue
becomes a higher priority.
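
The arithmetic behind this hypothesis can be checked with a quick sketch. It
assumes a single crawler fetching pages strictly one at a time, with response
time as the only cost. The function names and the 400,000-page site size are
made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_day(response_ms: float) -> int:
    """Pages one sequential crawler can fetch in 24 hours (assumed model)."""
    return int(SECONDS_PER_DAY * 1000 // response_ms)


def hours_to_crawl(page_count: int, response_ms: float) -> float:
    """Hours needed to fetch page_count pages one after another."""
    return page_count * response_ms / 1000 / 3600


if __name__ == "__main__":
    print(pages_per_day(300))           # daily ceiling at 300 ms/page
    print(pages_per_day(30))            # daily ceiling at 30 ms/page
    print(hours_to_crawl(400_000, 30))  # hypothetical 400,000-page site
```

Under this model the ceiling at 300ms/page is 288,000 pages a day, so a site
of roughly 400,000 pages would not fit in a day at 300ms but would take a bit
over 3 hours at 30ms, which is consistent with the numbers above.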

------
crxnamja
nice information, will try to use some of the tips.

~~~
tropin
Where the hell are the tips? It would be really hard to find a blog post with
less meat; even the first comment here on HN is longer.

