

Jeff Atwood - The Sitemap Paradox - tzury
http://webmasters.stackexchange.com/questions/4803/the-sitemap-paradox

======
staunch
Maybe I'm missing something, but in <http://stackoverflow.com/robots.txt> I
see:

    Sitemap: /sitemap.xml

But <http://stackoverflow.com/sitemap.xml> is a 404.

Perhaps he submitted a different sitemap URL to Google directly. Maybe that's
the problem though?

All the examples on <http://www.sitemaps.org/protocol.php#submit_robots> use a
fully qualified URL, not a path relative to the domain root. I've always used
the fully qualified form.
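
For reference, a minimal sketch of the fully qualified form the protocol
expects (the exact URL Stack Overflow actually submitted isn't public, so the
path here is just illustrative):

    Sitemap: http://stackoverflow.com/sitemap.xml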

~~~
carson
They are probably blocking everyone except for Google from grabbing the file.

~~~
nostromo
It still 404s with a GoogleBot user agent.

~~~
staunch
You can do a reverse (PTR) lookup followed by a forward (A) lookup to check
whether an IP is legitimately Googlebot. It's possible they're doing that.
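
A minimal sketch of that check in Python, assuming Google's published
verification procedure (the PTR hostname must end in googlebot.com or
google.com, and a forward lookup of that hostname must return the original
IP):

    import socket

    def is_googlebot(ip):
        # Reverse (PTR) lookup: what hostname is registered for this IP?
        try:
            host = socket.gethostbyaddr(ip)[0]
        except socket.herror:
            return False
        # Hostname must be in one of Google's crawler domains.
        if not host.endswith(('.googlebot.com', '.google.com')):
            return False
        # Forward (A) lookup: the hostname must resolve back to the same
        # IP, otherwise anyone controlling a PTR record could fake it.
        try:
            return ip in socket.gethostbyname_ex(host)[2]
        except socket.gaierror:
            return False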

~~~
codinghorror
correct, we are doing this

------
WillyF
It makes sense that links that are only found in the sitemap don't get
indexed. Unless you link to the sitemap somewhere on your site (not counting
robots.txt), you're not going to pass any PageRank to the sitemap, and it's
not going to pass any PageRank to the pages it links to.

Being crawled and being indexed are two different things. Sitemaps allow
Googlebot to crawl your site more easily. What gets indexed has a lot to do
with PageRank, and if you're not flowing PageRank efficiently through the
site, you're going to have indexation problems.
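
To make the orphan-page point concrete, here's a toy power-iteration sketch
(illustrative only, not Google's actual algorithm): page C is listed in the
sitemap but receives no links, so it ends up with only the baseline
"teleport" share of rank.

    import numpy as np

    # A links to B, B links to A; C is sitemap-only, with no inbound links.
    links = np.array([[0., 1., 0.],
                      [1., 0., 0.],
                      [0., 0., 0.]])
    n, d = 3, 0.85                        # page count, damping factor
    out = links.sum(axis=1, keepdims=True)
    # Row-normalize to transition probabilities; pages with no outlinks
    # (like C) spread their rank evenly across the site.
    M = np.where(out > 0, links / np.where(out > 0, out, 1.), 1. / n)
    rank = np.full(n, 1. / n)
    for _ in range(50):                   # power iteration
        rank = (1 - d) / n + d * M.T @ rank
    print(rank)                           # C scores far below A and B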

Here's an interesting post on how sitemaps affect crawlers:
<http://www.seomoz.org/blog/do-sitemaps-effect-crawlers>

~~~
mattmanser
PR isn't the only thing in the world. Google regularly states that a large
percentage of their searches are unique.

If you've got the only site in the entire world that talks about mutant killer
spider monkeys, it should come up top if someone searches for that term,
regardless of the page's PR.

After all, that site's the only one in the world with the foresight to predict
the coming apocalypse.

------
nowarninglabel
Pfft, _that guy_ only has a 25% accept rate.

~~~
tav
What is the 25% a reference to? Obviously I'm missing some kind of in-joke...?

And for those of you like me who might wonder what an "accept rate" is, it's
the percentage of accepted answers to questions asked by a given user on Stack
Overflow: <http://blog.stackoverflow.com/2009/08/new-question-asker-features/>

------
keltex
In my experience, sitemaps aren't for URL discovery, but more for URL
prioritization. Google will pretty much crawl your whole site whether or not
you include a sitemap. Where the sitemap becomes important is in whether
Google puts pages in the supplementary index or the main index. With a
sitemap, you can specify a priority for each page and basically hint to
Google that some pages are more important than others.

<http://sitemaps.org/protocol.php#prioritydef>
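
For reference, a minimal sketch of such a sitemap with per-page priorities
(the URLs and values here are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://example.com/</loc>
        <priority>1.0</priority>
      </url>
      <url>
        <loc>http://example.com/archive/old-page</loc>
        <priority>0.3</priority>
      </url>
    </urlset>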

~~~
codinghorror
the consensus is that sitemap.xml works best for rapid discovery of new
content -- it's not very good for discovery of deep content the crawler can't
get to because of the aforementioned paradox (that is, if Google can't see you
linking to your own page, it is disinclined to let the sitemap link matter)

------
simpleenigma
I've been working on an e-commerce project for a client that ended up having
some major issues with database imports and the client wanted results now.
Anyway, we got all of the items online and in Google with sitemaps, but the
links were orphans ... You could find them in the built-in site search and
from a Google search, but not through any link path ...

After about 2 or 3 months like this we started to see the traffic going down
and the item pages were getting de-indexed.

Since we fixed the import (and internal politics) problem (about 6 months
after de-indexing) we have seen a steady increase in traffic again and the
number of indexed pages is going up slowly ... very slowly ...

My take on this experience is that you need to have some sort of clickable
link path to get to your content, or any gains you might get from a sitemap
will be taken away ... Sitemaps might get your pages crawled faster, they may
even get into the index faster ... but to keep them there you need good site
structure ...

It is a tool and won't fix design problems ...

~~~
codinghorror
this is consistent with our experience as well

------
phpnode
This is why you need two types of sitemap: the HTML kind, which can pass
PageRank, and the XML kind, which can denote page priority.
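
For contrast with the XML example above, the HTML kind is just an ordinary
crawlable page of links (the paths here are placeholders):

    <!-- /sitemap.html: plain anchors Googlebot can follow, which flow
         PageRank in a way the XML file cannot -->
    <ul>
      <li><a href="/products/widgets">Widgets</a></li>
      <li><a href="/products/gadgets">Gadgets</a></li>
    </ul>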

------
jdbeast00
we make use of Google Site Search for our search, and sitemaps are the
vehicle by which we ensure a new site of ours gets indexed within 24 hours
(we have a lot of small, frequently changing sites)

