
GET, POST, and safely surfacing more of the web - Garbage
http://googlewebmastercentral.blogspot.com/2011/11/get-post-and-safely-surfacing-more-of.html
======
franze
OK, I'm highlighting two awesome pieces of information in that specific
blog post. Neither of them is new, but this is the first time either has
been officially confirmed by Google.

    
    
      Google begins indexing yummy-sundae.html and, as a part of 
      this process, decides to attempt to render the page to  
      better understand its content and/or generate the Instant 
      Preview.
    

This is the first time Google has confirmed that they internally render
the page - not only for preview purposes, but also for indexing purposes.

    
    
      Remember to include important content (i.e., the content you’d like 
      indexed) as text, visible directly on the page and without 
      requiring user-action to display. Most search engines are text-
      based and generally work best with text-based content. We’re always 
      improving our ability to crawl and index content published in a 
      variety of ways, but it remains a good practice to use text for 
      important information.
    

This is also the first time they directly advise against important content
that stays hidden until a user action reveals it.

~~~
antifuchs
One more nice piece of information (I'm sure it was out there before, but the
consequences are now more interesting): if .js files referenced in your page
are disallowed by robots.txt, Googlebot won't load them.
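A minimal sketch of that rule, using Python's stdlib robots.txt parser (the `/js/` path and the rules here are illustrative, not from the blog post):

```python
from urllib.robotparser import RobotFileParser

# Feed illustrative rules directly instead of fetching a live robots.txt.
rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /js/",
])

# A script under the disallowed path would not be fetched by a
# well-behaved crawler, so Googlebot can't execute it when rendering:
print(rp.can_fetch("Googlebot", "https://example.com/js/app.js"))
# A normal page is still fetchable:
print(rp.can_fetch("Googlebot", "https://example.com/index.html"))
```

So a Disallow line that was meant to keep scripts "private" now also keeps them out of Google's rendering pipeline.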

------
andybak
This strongly supports this earlier thread:
<http://news.ycombinator.com/item?id=3182579>

I wonder if another implication of Google actually rendering pages is that SEO
is going to have to focus much more heavily on visible page content rather
than hidden or quietly shuffled out-of-sight content.

~~~
franze
Well, there is the implication that we will see HN posts à la "Google deleted
my content!"

Google says they will only send POST requests that are automatically triggered
on page load and that have no side effects other than (fetching and)
rendering content.

OK, let's say Google makes 1,000,000,000 POST requests and gets it 99.999%
right. That still leaves 10,000 occurrences where Google actually triggers
some unintended side effect.
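The back-of-the-envelope arithmetic: a 99.999% success rate means a 0.001% failure rate, which over a billion requests is still about 10,000 failures.

```python
# Even a tiny failure rate leaves a large absolute number of
# failures at Google scale.
requests = 1_000_000_000
success_rate = 0.99999

failures = round(requests * (1 - success_rate))
print(failures)  # 10000
```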

"At scale everything breaks" (c) Urs Hölzle
[http://www.zdnet.co.uk/news/cloud/2011/06/22/google-at-
scale...](http://www.zdnet.co.uk/news/cloud/2011/06/22/google-at-scale-
everything-breaks-40093061/)

~~~
rwmj
If you've got a website and look at the logs, you'll notice dozens of badly
behaved bots, spam farms, etc. that follow any and every GET or POST link,
fill in forms with spam and submit them, and ignore robots.txt. If your site
is deleting comments based on an unauthenticated POST, you've already got a
broken site.
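The fix for that broken pattern can be sketched as follows. This is a hypothetical handler, not any real framework's API: a state-changing POST should require a valid session and a CSRF token, so a blind crawler or spam bot can't delete anything.

```python
# Illustrative in-memory session store (names are assumptions).
SESSIONS = {"session-abc": {"user": "alice", "csrf": "token-123"}}

def handle_delete_comment(session_id, csrf_token, comment_id, comments):
    """Delete a comment only for an authenticated request carrying
    the CSRF token issued to that session; otherwise reject it."""
    session = SESSIONS.get(session_id)
    if session is None or session["csrf"] != csrf_token:
        return 403  # unauthenticated or token-less POST: no side effect
    comments.discard(comment_id)
    return 200

comments = {1, 2, 3}
# A bot blindly POSTing without credentials is rejected:
print(handle_delete_comment(None, None, 1, comments))                  # 403
# A logged-in user with the matching token succeeds:
print(handle_delete_comment("session-abc", "token-123", 1, comments))  # 200
print(comments)  # {2, 3}
```

With a check like this in place, Googlebot's automatic page-load POSTs (which carry no session) simply bounce off with a 403.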

------
martian
This seems relevant (from HN a couple weeks ago):

[http://www.thumbtack.com/engineering/googlebot-makes-post-re...](http://www.thumbtack.com/engineering/googlebot-makes-post-requests-via-ajax/)

