

Does the Google Bot index css hidden divs?  yes. - tomaltman
http://tomaltman.com/does-the-google-bot-index-css-hidden-divs/

======
paulirish
He said he used `display:hidden`, which isn't valid CSS (and doesn't hide
anything). It might be a typo, but regardless I think this deserves more
investigation.

~~~
blauwbilgorgel
Google does index "display: none"/hidden divs:
[http://www.google.com/search?q=De+eerste+stap+in+het+generer...](http://www.google.com/search?q=De+eerste+stap+in+het+genereren+van+websitebezoeken+met+AdWords&pws=0)

    
    
      <div class="webinar-blurb" id="bot-164716" style="display: none;">
      De eerste stap in het genereren van websitebezoeken met AdWords is
      ...
      </div>

------
aam1r
I don't see why Google wouldn't index hidden text. As long as there is static
text on the page, Googlebot will see it and index it.

Has anyone seen any scenarios where there was static content on the page and
Google didn't index it?

~~~
delinka
This kind of article reminds me of bad PDF redaction, and the redactor then
being surprised that people can still read the redacted text.

You don't stick text where a computer can read it and then marvel that the
computer can read it. And if you want something truly hidden, you don't
publish it.

If you're worried about Google's index, they'll obey robots.txt.
Alternatively, customize your reply when you get the Googlebot user agent.
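A minimal robots.txt along these lines (the path is a made-up example) keeps
compliant crawlers away from specific content:

```
# Hypothetical example: keep Googlebot out of one directory
User-agent: Googlebot
Disallow: /private/
```

Note that robots.txt only stops crawling by well-behaved bots; it doesn't hide
content from anyone fetching the page directly.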

~~~
JonnieCache
_"Alternatively, customize your reply when you get the Googlebot user agent."_

WARNING: BE VERY CAREFUL DOING THIS!

This is one of the main ways that the googlebot detects spam/malware sites and
doing it can utterly destroy your ranking forever in some circumstances.

------
cooperadymas
Relevant references from Matt Cutts in regards to Google handling hidden
content:

<http://www.youtube.com/watch?v=UpK1VGJN4XY>

[http://www.stonetemple.com/articles/interview-matt-
cutts.sht...](http://www.stonetemple.com/articles/interview-matt-cutts.shtml)
(do a search for hidden)

------
illdave
It's true that Google is able to crawl and index content that's contained
within a hidden div. Obviously don't use this to hide a load of text on a page
to manipulate search results - you'll get caught if it wouldn't pass a manual
review - but it's useful to know this if you've got a legitimate reason to do
so (e.g. you've got a navigation that isn't revealed until a link is clicked,
or a box that displays the contents of the hidden div when a "More Info" link
is clicked, etc).

------
dlikhten
To be honest, this was fully expected. Most crawlers don't execute any JS or
CSS, so fine, this one used display: hidden; the next would use width: 1px;
height: 1px; overflow: hidden; position: absolute. Interpreting CSS rules
without a CSS engine is very hard, which is why honeypot fields work. A CSS
engine is too slow for crawling, so Google does not interpret it.

------
gbaygon
Sometimes this is very useful. A client once asked me to make a website with
image headers for every page, with the title in the image. So I added a hidden
h1, containing the title, for better indexing.

~~~
jeremyswank
That works, but probably a better technique is to put in a normal h1 and
assign the image to the background of that element, then use text-indent to
throw the display of the text off screen (as is commonly done). Like this:

h1 { background-image: url(foo.png); text-indent: -999em; }

Of course, you then have to give dimensions to the h1 to show the whole
picture, but I leave that as an exercise for the reader.
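To spell out the exercise: the h1 needs explicit dimensions matching the
image, plus overflow hidden, along these lines (the 300x100 size is an
assumed example):

```css
/* Dimensions are hypothetical; match them to the actual image */
h1 {
  background-image: url(foo.png);
  text-indent: -999em;
  width: 300px;
  height: 100px;
  overflow: hidden;
}
```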

~~~
pavel_lishin
Why is this technique better?

~~~
blauwbilgorgel
It isn't better.

It is better to just do:

    
    
      <h1><img src="" alt="" width="" height="" /></h1>
    

Negative text-indent can cause accessibility problems: in the "CSS on, images
off" scenario, users will see no content at all.

You can solve that by using a <span> overlay, with the image set as a
background:

    
    
      <h1>Heading<span class="overlay"></span></h1>
    

Then in the "CSS on, images off" scenario, people can still read the
underlying text.
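The `.overlay` class isn't shown in the comment; a sketch of how it might be
defined (the image path is an assumption) is an absolutely positioned span
that covers the heading text with the background image:

```css
h1 { position: relative; }
h1 .overlay {
  position: absolute;
  top: 0;
  left: 0;
  width: 100%;
  height: 100%;
  background-image: url(heading.png); /* hypothetical image path */
}
```

With images off, the span is transparent and the underlying heading text
shows through.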

But you are still hiding stuff from search engines, and they might not all
like that.

The source for "don't go hiding stuff with CSS like -9999 pixels on the left;
if you can just use an image alt, that is what it is for" is this Matt Cutts
video from 2009:

<http://www.youtube.com/watch?v=fBLvn_WkDJ4#t=0m13s>

BTW: The OP is confusing. Like Paul Irish said: There is a "display: none" and
a "visibility: hidden". A "display: hidden" isn't valid CSS.

~~~
Isofarro
"Negative text-indent can cause accessibility problems. Scenario "CSS on-
images off" will see no content at all."

What disability is overcome by leaving CSS on and turning images off?

~~~
blauwbilgorgel
To me, accessibility is all about providing proper access to information.

It doesn't matter to me if it is because of a screen reader, a Lynx browser, a
preference, or a no-script plug-in. All that matters to me is that my users
can access my content on a broad range of devices.

CSS on/Images off is an unlikely scenario, but it is a possible scenario (also
consider the possibility of images not loading for whatever reason). If I were
a user stuck in that scenario, I'd say: "I can't access core parts of this
website". For a website developer, you could call that an accessibility
problem, no?

Relevant thread on WebAIM:
<http://webaim.org/discussion/mail_thread?thread=3785> A little more cheeky:
<http://www.arespritesaccessible.net/explain.php> Wikipedia: Web accessibility
refers to the inclusive practice of making websites usable by people of all
abilities and disabilities.

~~~
Isofarro
"CSS on/Images off is an unlikely scenario, but it is a possible scenario
(also consider the possibility of images not loading for whatever reason). If
I were a user stuck in that scenario, I'd say: I can't access core parts of
this website. For a website developer, you could call that an accessibility
problem, no?"

No. I'd call it a connection/network issue. The universality consideration
(which is what you are really advocating here) then points to progressive
enhancement -- defensive design -- as a technique to mitigate the failure to
load a page's supplementary assets.

------
alanh
If you want to include in your page source things you want to dynamically
display, such as error messages or tooltips or confirmation messages – without
having them show up in Google† – consider using the `<script type="text/html">
... HTML ... </script>` trick. Then you can grab the content via e.g.
`.innerHTML` and place it in your lightbox or use it for templating, etc. See
also: ICanHaz.js.
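A minimal sketch of the trick (ids and content are made up for illustration):

```html
<!-- Browsers neither render nor execute script blocks with a non-JS type -->
<script type="text/html" id="error-template">
  <p class="error">Something went wrong. Please try again.</p>
</script>

<div id="message-area"></div>

<script type="text/javascript">
  // Copy the hidden markup into the page when it's actually needed
  var tpl = document.getElementById('error-template').innerHTML;
  document.getElementById('message-area').innerHTML = tpl;
</script>
```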

For the opposite – showing text to search engines and screen readers, but not
visual browsers' end-users – position the content off-screen.
[http://webaim.org/techniques/css/invisiblecontent/#technique...](http://webaim.org/techniques/css/invisiblecontent/#techniques)

 _† I haven’t explicitly tested that search engines ignore contents of script
tags w.r.t. indexing, but would be extremely surprised if the assumption was
shown to be false._

~~~
cemregr
AFAIK GoogleBot runs a minimum amount of Javascript on the page before
parsing.

------
kolbusa
This seems rather intuitive. Does the google bot even parse css? Is there any
evidence that it ignores non-displayed text present in HTML?

~~~
infinity
Googlebot and some other search engine crawlers also crawl CSS stylesheet
files; I have seen evidence in my server log files.

There are several patents describing how a search engine might analyse the
visual appearance and structure of a website, with a special view to giving
some parts of the page content (the main content) a higher weight or priority
than the rest (like footer text and navigation). I would expect that the
algorithms used by Google give a good impression of the page rendering. What
conclusions are drawn from this and how it affects the ranking of individual
pages for certain search queries, who knows; pretty much all of this is kept
secret and there is a lot of speculation.

Many search engine patents are explored and explained by Bill Slawski on his
blog "SEO by the sea":

<http://www.seobythesea.com/>

That these patents exist does of course not tell us that search engines work
exactly in this or another way, but I think it often gives a good impression
of what might be going on behind the scenes.

~~~
blauwbilgorgel
Further reading:

    
    
      Techniques for approximating the visual layout of a web 
      page and determining the portion of the page containing 
      the significant content 

[http://appft1.uspto.gov/netacgi/nph-
Parser?Sect1=PTO2&Se...](http://appft1.uspto.gov/netacgi/nph-
Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-
adv.html&r=1&f=G&l=50&d=PG01&p=1&S1=20060149775&OS=20060149775&RS=20060149775)

    
    
      Automatic Visual Segmentation Of Webpages

[http://appft.uspto.gov/netacgi/nph-
Parser?Sect1=PTO2&Sec...](http://appft.uspto.gov/netacgi/nph-
Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-
adv.html&r=1&p=1&f=G&l=50&d=PG01&S1=20090177959.PGNR.&OS=dn/20090177959&RS=DN/20090177959)

    
    
      Techniques for approximating the visual layout of a web 
      page and determining the portion of the page containing 
      the significant content 

[http://appft1.uspto.gov/netacgi/nph-
Parser?Sect1=PTO2&Se...](http://appft1.uspto.gov/netacgi/nph-
Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-
adv.html&r=1&p=1&f=G&l=50&d=PG01&S1=20080033996.PGNR.&OS=dn/20080033996&RS=DN/20080033996).

    
    
      VIPS: a Vision-based Page Segmentation Algorithm

[http://research.microsoft.com/apps/pubs/default.aspx?id=7002...](http://research.microsoft.com/apps/pubs/default.aspx?id=70027)

    
    
      Classifying functions of web blocks based on linguistic features

[http://patft.uspto.gov/netacgi/nph-
Parser?Sect1=PTO2&Sec...](http://patft.uspto.gov/netacgi/nph-
Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-
adv.htm&r=1&p=1&f=G&l=50&d=PTXT&S1=7,895,148.PN.&OS=PN/7,895,148&RS=PN/7,895,148)

    
    
      How a Search Engine Might Identify the Functions of 
      Blocks in Web Pages to Improve Search Results

[http://www.seobythesea.com/2011/02/how-a-search-engine-
might...](http://www.seobythesea.com/2011/02/how-a-search-engine-might-
identify-the-functions-of-blocks-in-web-pages-to-improve-search-results/)

------
techscruggs
This isn't exactly news, is it? Isn't this the same tactic that
ExpertsExchange used to show up in search results while trying to hide the
content behind a paywall?

~~~
soult
Nope, they do not use hidden divs.

They only display the answers if a) you are Googlebot, coming from a Google IP
address, or b) your HTTP referer contains google.com.

If it weren't for the latter, they would be banned by Google. Of course, they
still hide the answers all the way on the bottom of a very long page.

Kind of related: ever since the advent of "open" sites like Quora and
StackOverflow I have not seen Expertsexchange[0] in my results anymore.

0: I still can't read that name without snickering.

~~~
natesm
There are a lot of sites that are just mirrors of SO questions now though,
which is also incredibly annoying.

------
arb99
Isn't this quite well known? There are loads of sites using display:none for
valid reasons.

------
lean
Was there ever any doubt that it did? How could we excuse hiding/showing
"tabs", for example, if it didn't?

~~~
blauwbilgorgel
I always advise showing all tabs to users without javascript support, rather
than hard-coding them hidden like this:

    
    
      <div style="display: none">
        Hidden tab content
      </div>
    

Otherwise noscript users would have to disable CSS to see the above hidden
tabs, which is pretty inexcusable. The "correct" progressive enhancement way
to do this is to show all the tabs, and use javascript to hide tabs from view
for those that support javascript.

Basically it has always been a bad idea to "hardcode" visibility for
tabs/divs.

For example visit:
[http://adwords.google.com/support/aw/bin/static.py?hl=nl&...](http://adwords.google.com/support/aw/bin/static.py?hl=nl&page=webinars.cs)
without javascript support. The content is hidden for everyone and pressing
expand won't do anything.

Compare with:
[http://www.google.com/support/webmasters/bin/answer.py?hl=en...](http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=185417)
. Even without javascript support you can still see the examples, because
those are only hidden if you have javascript support.

~~~
mike-cardwell
I like to use javascript to add a class of "with-js" to the body element
immediately after the body tag. Then you can just use css to style everything
dependent on javascript. For example:

    
    
      <body>
         <script type="text/javascript">document.body.className='with-js';</script>
    

Then in CSS you might do this if you want a certain element to disappear only
if javascript is enabled:

    
    
      body.with-js #foo { display:none; }

------
CJefferson
I'm sure it goes without saying, but Google takes a very dim view of people
who try to trick it.

If this became a common method of hiding text from users, but giving it to
google, then I'm sure two things would happen:

1) Google would update their engine to detect hidden divs.

2) Google would penalise websites which abuse hidden divs.

~~~
jwdunne
It's not as cut and dried as that. Google does try to determine if a hidden
element is being used with wrong intentions, but there are many valid uses of
this. I've seen them mention this many times.

A number one, perfectly valid usage is for dropdown menus. Naturally you want
them hidden until activated via hover.

There are also many, many other ways of hiding elements. You can make the
position of the element absolute and move it off the page, out of sight. I
think you could also make any overflow hidden and then make the element 1px in
width and height. There's also the infamous 'same foreground and background
colours' trick. The list can and does continue.
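Sketches of the hiding techniques listed above (class names are invented for
illustration; these are the kinds of patterns an engine would have to detect):

```css
/* Positioned absolutely and moved off the page */
.offscreen { position: absolute; left: -9999px; }

/* Collapsed to a single pixel with overflow hidden */
.pixel { width: 1px; height: 1px; overflow: hidden; }

/* Same foreground and background colour */
.camouflage { color: #fff; background-color: #fff; }
```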

This isn't really considering JS, which opens up even more avenues to hide
content.

I think to truly have an idea of what's maliciously hidden, the fully rendered
page would have to be analysed. I think there are instances where this is
being done but this is edging towards theory and away from concrete facts.

