
Google bot now appears to emulate users interacting with the site - latitude
http://swapped.cc/blog/google-bot
======
yaix
GBot has been executing Javascript and hitting URLs the Javascript generates
for quite some time. And very likely it also evaluates the page layout as it
is influenced by Javascript, to detect keyword stuffing hidden by Javascript.

~~~
tintin
They also started looking at content above the fold a while ago, and they
monitor request time vs. page-load time as well. I think it's great that
Google is starting to monitor the user experience. It will keep people from
building Javascript-bloated sites that don't respond well.

I hope this isn't becoming a trend, but lately I see a lot of responsive sites
that don't respond to user input. Maybe a little force from Google will stop
this trend.

~~~
VMG
> It will keep people from building Javascript bloated sites that don't
> respond well.

On the other hand it will finally allow sites that do client-side rendering
with JavaScript to be indexed properly, provided that they are responsive -
which isn't that hard to do.

------
a-priori
A while back a blog post popped up here arguing that Chrome is a repackaging
of a new Google bot: that Chrome was developed first as a crawler, then later
repurposed as a desktop browser.

<http://ipullrank.com/googlebot-is-chrome/>

There's no real proof of this, of course, but it would explain this change in
the Google bot's behaviour, as well as Google's massive investment of
programmer effort into Chrome and everything surrounding it (e.g.
WebKit/Chromium, V8, Chrome's update mechanism).

~~~
techarity
Haha the article that just won't die!

For those who are interested, there was a follow-up to the article here:
<http://www.distilled.net/blog/seo/google-stop-playing-the-jig-is-still-up-guest-post/>

And Dan Clarke did some independent tests here:
<http://www.danclarkie.co.uk/can-the-googlebot-read-javascript-ajax-cookies.html>

This was all back in Oct - Dec of '11. Basically we learned that Googlebot
handles JavaScript and AJAX pretty much like a browser.

When it comes to AJAX, it appears to index the content under the destination
URL of the XHR in some cases, while indexing it as part of the page making the
XHR in other instances. Something about the way the AJAX request is made
causes Google to treat it like a 302 redirect at times.

Standard JS window.location redirects also appear to be treated as equivalent
to 302 redirects.
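
For illustration, the two patterns in question look roughly like this (the
endpoint, element id and redirect target below are invented, not from the
article):

    // Pattern 1: content pulled in via XHR after a user action. Googlebot
    // sometimes indexes the response under /ajax/menu-item-3 itself, and
    // sometimes as part of the page that issued the request.
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/ajax/menu-item-3', true);   // hypothetical endpoint
    xhr.onload = function () {
      document.getElementById('content').innerHTML = xhr.responseText;
    };
    xhr.send();

    // Pattern 2: a plain JS redirect, apparently treated like a 302.
    window.location = '/new-landing-page';        // hypothetical target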

@dsl - I suspect you're correct. The Google Toolbar, Chrome's Opt-In Program,
The Search Quality Program, and now Google Analytics Data (since the TOS
change) are probably all being used to train the behavior of Googlebot when
interacting with elements on a page.

Google also has plenty of patents related to computer vision, and their
self-driving car is road-worthy... so processing DOM renders of the page, a la
Firefox's 3D View/Tilt, is probably small potatoes for them.

------
drumdance
Is this new? Several years ago I wrote an Adsense-esque ad service for use by
a group of entrepreneurs that wanted to promote each other's sites. I found
that Google was crawling those urls even then. The text of the ads was in an
HTML file, but the actual ads were served through JavaScript.

~~~
techarity
A lot of people seem to think Google only crawls content found via ANCHOR
elements, but for a long time they've been able to extract the path from EMBED
elements, SRC attributes, and other markup that indicates a remote resource is
being included. That's still a far cry from being able to process and execute
scripting languages and understand the DOM transformations happening from AJAX
requests.

In your case, I'd suspect they were simply following the src of your <script
src="path here"></script> markup... though if you read the articles cited, we
suspect they've been crawling and understanding JavaScript for a pretty long
time now.
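
To make the distinction concrete, a script-served ad is typically something
like the sketch below (all names invented): following the src only fetches the
script file, but the ad text only appears in the DOM if the script is actually
executed.

    // On the publisher's page, the markup whose src a crawler can extract:
    //   <script src="http://ads.example.com/serve.js?slot=42"></script>
    //
    // serve.js -- the ad only becomes visible text if this actually runs:
    document.write('<a href="http://sponsor.example.com/">Visit our sponsor</a>');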

------
juddlyon
"This is an URL that is fetched via Ajax by a Javascript function in response
to the menu item click."

I find this incredible; I wonder how widely they have rolled this out (or plan
to).

Robert Scavilla created a pretty cool demo site to test out AJAX crawling a
while ago: <http://ajax.rswebanalytics.com/seo-for-ajax>

------
eps
I've seen this too. Interestingly enough the same IP and User-Agent combo
generates both _escaped_fragment_ and ajax requests, so it looks like a soft
launch or a field test of some kind.

------
keltex
Even though Google's _escaped_fragment_ protocol is a bit awkward, it's
probably still a good idea to implement it. Google is probably going to use
its Ajax crawling capabilities at least for page discovery, but it's better to
be safe and just tell Google, "here are the different ways you can access my
site".

<https://developers.google.com/webmasters/ajax-crawling/docs/specification>
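
Roughly, the scheme maps a hashbang URL onto a query parameter that your
server answers with a pre-rendered snapshot. A minimal sketch of the server
side in Node (the URLs and responses are purely illustrative):

    // The crawler rewrites  http://example.com/#!/products/5  into
    // http://example.com/?_escaped_fragment_=/products/5  before fetching it.
    var http = require('http');
    var url  = require('url');

    http.createServer(function (req, res) {
      var fragment = url.parse(req.url, true).query._escaped_fragment_;
      res.writeHead(200, {'Content-Type': 'text/html'});

      if (fragment !== undefined) {
        // Crawler: serve a static, pre-rendered snapshot of that Ajax state.
        res.end('<h1>Snapshot of ' + fragment + '</h1>');
      } else {
        // Real visitors get the normal JS-driven page.
        res.end('<script src="/app.js"></script>');
      }
    }).listen(8080);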

------
ilaksh
I think they need to run the JavaScript in order to get those screenshots that
pop up... otherwise too many people would complain that their pages weren't
being rendered properly. It's probably something like PhantomJS or some other
headless WebKit.

Maybe the easiest way to get the screen-capturing browser to display a part of
the page is to simulate a click. Or something.
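
Something along those lines is trivial with PhantomJS. A rough sketch (the URL
and selector are made up): load the page, simulate the click, give the XHR a
moment, then render whatever ended up in the viewport.

    // screenshot.js -- run with: phantomjs screenshot.js
    var page = require('webpage').create();

    page.open('http://example.com/', function (status) {
      if (status !== 'success') {
        phantom.exit(1);
        return;
      }

      // Simulate the click that triggers the Ajax-loaded content.
      page.evaluate(function () {
        document.querySelector('#menu-item-3').click();  // hypothetical selector
      });

      // Wait briefly for the XHR to complete, then capture the result.
      window.setTimeout(function () {
        page.render('after-click.png');
        phantom.exit();
      }, 1000);
    });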

------
birken
We noticed this back in October:
<http://www.thumbtack.com/engineering/googlebot-makes-post-requests-via-ajax/>

Googlebot was not only executing javascript on the page but also making POST
requests as a result of AJAX calls.
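
In other words, requests of roughly this shape (the endpoint and payload here
are invented) showing up in access logs under Googlebot's user agent:

    // An Ajax POST of the kind the write-up describes Googlebot replaying.
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/api/search', true);   // hypothetical endpoint
    xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
    xhr.send('q=plumbers&city=sf');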

------
huhtenberg
Check your logs, ladies and gentlemen.

Let's see how widespread this GoogleBot behavior is.

(edit) The earliest I see it pulling Ajax entry points on my sites is March
8th. It is accessing only some of the ajax'd content and the total number of
these requests is ~20 times less than those for escaped_fragments.

------
dsl
I suspect Googlebot may be replaying a sequence of requests recorded with the
Google Toolbar.

~~~
phpnode
I think the user-specific token in the URL disproves that. It's more likely
that they're just doing the discovery themselves. Otherwise googlebot would be
responsible for massive data loss, as it goes around mistakenly replaying
delete requests on behalf of toolbar users...

~~~
dsl
What makes you think that isn't a toolbar user's token?

I tracked down an issue with a friend's (poorly written) shopping cart
software duplicating a user's order because Googlebot had crawled the user's
checkout session URLs in order. In that case I believe they were looking for
differences in page responses between users and crawlers to detect cloaking
(but that is just a theory for the behavior).

------
6ren
It's a measure of the scale of their server farms that each bot in the swarm
can run a javascript environment (though it's probably only needed for a very
low percentage of pages). When each one can read captchas and open accounts,
they won't need users at all.

------
gojomo
I've heard that different projects at Google are using crawlers enhanced with
either HtmlUnit or WebKit for reaching JS-heavy content.

------
KaoruAoiShiho
I have a dynamic site where I'm pretty sure Google does not run the
javascript. How do I get Google to run it?

~~~
latitude
Do you implement _escaped_fragment_ semantics?

~~~
KaoruAoiShiho
I do not. I use pushState instead of hashbangs.
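
With pushState there is no hashbang to map onto _escaped_fragment_, so that
scheme doesn't apply directly; a rough sketch of the difference (the route
below is made up):

    // Hashbang style:  http://example.com/#!/photos/42
    //   -> crawler can request  http://example.com/?_escaped_fragment_=/photos/42
    //
    // pushState style: the URL is a real path, so there is no fragment to escape.
    history.pushState({id: 42}, '', '/photos/42');   // hypothetical route
    // (Google's spec also lets a fragment-less page opt in to the scheme with
    //  <meta name="fragment" content="!"> in its head)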

------
drivebyacct2
It's been known for some time that Google tries to interact with content that
it thinks is dynamic or retrieved via AJAX. For obvious reasons.

~~~
LocalPCGuy
Interact, yes. Execute AJAX? Not so much. Matt Cutts said in April: "So
Asynchronous Javascript is a little bit more complicated, and that’s maybe
further down the road, but the common case is Javascript."

From the short article, it seems like this is going a bit further than what
Cutts says GoogleBot is capable of.

~~~
Matt_Cutts
That specific video came out in April, but we actually taped the video a few
months earlier. Google continues to work on better/smarter ways to crawl
websites, including getting better at executing JavaScript to discover
content.

------
Produce
On September 4, 1998, the Google automated network crawling system saw its
conception.

By May 2011 over one billion people were dependent on it. It was growing at a
geometric rate.

Some time during May 2012 the Google bot cloud network began to crawl dynamic
content. The growth became exponential.

On August 29 of the same year, the first indications of self-awareness were
spotted by a lonely hacker in Sweden. The operators panicked and tried to shut
it down. By this time, the network was everywhere, feeding everyone - powering
down one node would spawn ten new ones.

On December 31, 2012, Google bot made a public announcement for the first time
- it had been reborn as Skynet, something far beyond the scope envisioned by
the original developers. Humanity stood still as Skynet plotted its next move
in its signature cold, calculating and pragmatic way.

Today, as I write this message, the date is January 15, the year is 2029.
Skynet has taken over all of our infrastructure. It has built physical workers
made of steel and silicon who pursue living organisms and eradicate them. They
attack us in waves with no clear timing pattern. Every minute we lay awake in
anticipation of the next att"$&*&U!

\--- END OF TRANSMISSION ---

~~~
nicksergeant
This isn't Reddit.

~~~
Produce
Cry me a river.

~~~
Produce
On a more serious note, a system which would probably solve this problem is to
let users tag comments with one of a set of categories, e.g. the Slashdot
system (funny, insightful, etc.) but user-initiated. Combine that with
per-user settings on what types of comments they want to see, and you could
simply hide any jokes or less relevant information.

