
Is it possible that people are looking at the page from Google's cache? I'm thinking of the 3taps kind of "web site scraping that doesn't look like web site scraping".

Hmm, that's interesting. I don't think so, though, because the user-agent on the requests is the googlebot:

    From: googlebot(at)googlebot.com
    User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Well, an interesting check would be to look at one of your pages in the cache that fires an AJAX call and see where that call comes from. I agree it would be "weird" if it came from Googlebot instead of from the browser viewing the cache.
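One way to run that check from the server side is to tally who is hitting the AJAX endpoint. This is a minimal sketch that assumes the server writes the usual Apache/nginx "combined" access-log format and that the AJAX endpoint lives under a hypothetical `/ajax/` path prefix. It only inspects the User-Agent string, so it tells you which requests *claim* to be Googlebot, not which ones actually are.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# ip ident user [time] "METHOD path proto" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def tally_ajax_callers(lines, ajax_prefix="/ajax/"):
    """Count hits on the (hypothetical) AJAX endpoint, split by whether
    the User-Agent string claims to be Googlebot."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        # Skip unparseable lines and requests outside the AJAX endpoint.
        if not m or not m.group("path").startswith(ajax_prefix):
            continue
        kind = "googlebot" if "Googlebot" in m.group("ua") else "other"
        counts[kind] += 1
    return counts
```

If viewing a cached copy makes the visitor's browser fire the AJAX call, those hits should show up under "other" with a normal browser User-Agent rather than under "googlebot".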

At Blekko we post-process pages extracted during the crawl. If a site were putting content behind JS, that processing could trigger JS calls some time after the initial fetch, but 3 days seems like a long delay. Mostly, though, the JS is just page animation.

Would it make sense that loading from the cache makes a call to the origin server?

I just checked, through the Google cache, one of my sites that loads available delivery dates via AJAX, and yep, it caches that too: the dates shown are the ones from when the cache was taken.
