

Nordstrom's robots.txt - prokizzle
http://shop.nordstrom.com/robots.txt

======
woonord
Hi! The robots file is not a text file. I know! Products are blocked due to
possible copyright infringement. Many are disputes over a product name or
t-shirt slogan. We were ranking for some terms we did not want to be
associated with, so we added the offending search result page to the file. Not
sure how it ever got indexed in the first place. Peace!

~~~
troels
I don't think you actually want to do that. Blocking in robots.txt will
prevent Google from crawling the URL - not from indexing it. You actually
_want_ them to crawl the URL and then respond with a 404 or 410.

If there are inbound links pointing to that URL, you should disavow them in
Google Webmaster Tools (GWT).

~~~
eli
A "noindex" robots meta tag would work too.
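
A rough sketch of the two suggestions in this subthread (answer with a 410, or
send a noindex signal), written as a plain function rather than a real server.
The path set and the function name are made up for illustration and have
nothing to do with Nordstrom's actual setup. `X-Robots-Tag: noindex` is the
HTTP header counterpart of the robots meta tag.

```python
# Hypothetical list of pages that should drop out of the index.
REMOVED_PATHS = {"/sr/some-removed-term"}


def response_for(path):
    """Return (status, headers) for a request path.

    Assumption: the URL is NOT blocked in robots.txt, so the crawler
    can actually fetch it and see the status code or the noindex signal.
    """
    if path in REMOVED_PATHS:
        # 410 Gone tells the crawler the page was removed on purpose.
        return 410, {}
    if path.startswith("/sr/"):
        # Still serve the page to users, but ask search engines not to index it.
        return 200, {"X-Robots-Tag": "noindex"}
    return 200, {}
```

Either signal only works if the crawler is allowed to request the page, which
is why a robots.txt Disallow on the same URL gets in the way.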

------
rockdiesel
Why they are even bothering to allow their search results to be indexed is
beyond me. Any decent SEO would tell you this is a terrible idea. If their
in-house SEO told them this was a good idea, then they seriously need to
re-evaluate their program.

Not only has Matt Cutts said you shouldn't be doing this [1][2], but it's also
listed in Google's Webmaster Guidelines as something NOT to do [3].

A quick query of "site:nordstrom.com/sr/" shows they have 260,000 search
results pages in Google's index.

Just a few of the zero-result search pages their system is creating and
allowing Google to index:

[http://shop.nordstrom.com/sr/sorrelli](http://shop.nordstrom.com/sr/sorrelli)
[http://shop.nordstrom.com/sr/query](http://shop.nordstrom.com/sr/query)
[http://shop.nordstrom.com/sr/flogg](http://shop.nordstrom.com/sr/flogg)

There can be a benefit in creating pages at scale, but this is a textbook
example of how not to go about it.

1\. [https://www.mattcutts.com/blog/search-results-in-search-results/](https://www.mattcutts.com/blog/search-results-in-search-results/)
2\. [http://searchengineland.com/googles-cutts-auto-generated-content-search-results-in-our-index-violate-our-guidelines-171553](http://searchengineland.com/googles-cutts-auto-generated-content-search-results-in-our-index-violate-our-guidelines-171553)
3\. [https://support.google.com/webmasters/answer/35769](https://support.google.com/webmasters/answer/35769)

------
nottestuser
You're rolling along thinking this is just some bad web developer somewhere
until you get to the last Disallow...

~~~
CoreSet
I don't get it: is that disallow targeting some sort of porn spambot crawler?

~~~
jarin
They probably made a product with that slug by accident and are doing that to
try to get it removed from search engine results.

~~~
meej
It makes me wonder if Nordstrom ever stocked the now-defunct Pornstar clothing
brand.

------
woonord
Update! The offending term we rank on is "extra small teen porn"; the
offending page is /sr/petite-extra-small. Robots updated (live tomorrow)!
Asked Google to remove the URL.

~~~
temuze
So uh, how did you find out that you match for this phrase?

~~~
thephyber
Google Webmaster Tools shows you the search terms people used to click through
to your website from Google searches. I'm not sure if it still works, but
referer URLs used to carry the search terms, so they showed up in the web
access logs.
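
For the access-log route, a minimal sketch of pulling the query out of a
referer URL. It assumes the old-style Google referer that carries the search
in a `q` parameter; the function name is made up, and the host check is
deliberately crude.

```python
from urllib.parse import urlparse, parse_qs


def search_terms_from_referer(referer):
    """Return the search query carried in a Google referer URL, or None."""
    parsed = urlparse(referer)
    # Crude host check, good enough for a sketch only.
    if "google." not in parsed.netloc:
        return None
    # parse_qs decodes '+' and %XX escapes and returns a list per key.
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


print(search_terms_from_referer("http://www.google.com/search?q=petite+extra+small"))
# prints: petite extra small
```

A referer with no `q` parameter, or from any other host, simply yields `None`.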

------
troels
Hmm ...

    
    
        $ curl --dump-header - 'http://shop.nordstrom.com/robots.txt'
        HTTP/1.1 403 Forbidden
        Server: AkamaiGHost
        Mime-Version: 1.0
        Content-Type: text/html
        Content-Length: 281
        Expires: Fri, 27 Feb 2015 23:42:13 GMT
        Date: Fri, 27 Feb 2015 23:42:13 GMT
        Connection: close
        ...
    

Probably a bad idea to deny serving robots.txt to bots (even if they ought to
interpret that as a total ban).
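
To make the "total ban" reading concrete, here is a small sketch that mirrors,
as far as I understand it, how Python's standard `urllib.robotparser` reacts
to the HTTP status it gets for robots.txt. The function name and the returned
labels are my own, and other crawlers (Google included) may treat these
statuses differently.

```python
def robots_policy_for_status(status):
    """Map the HTTP status of a robots.txt fetch to a crawl policy.

    Assumption: this follows the branches in CPython's
    urllib.robotparser.RobotFileParser.read(); it is not a standard.
    """
    if status in (401, 403):
        # Access denied: treat the whole site as off limits.
        return "disallow_all"
    if 400 <= status < 500:
        # No robots.txt (e.g. 404): nothing is restricted.
        return "allow_all"
    if 200 <= status < 300:
        # Normal case: read the rules from the response body.
        return "parse_body"
    # Anything else is left undecided in this sketch.
    return "undetermined"


print(robots_policy_for_status(403))
# prints: disallow_all
```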

~~~
felixvolny
Funny, when I do the same thing I get a completely different response:

    $ curl --dump-header - 'http://shop.nordstrom.com/robots.txt'
    HTTP/1.1 200 OK
    Cache-Control: private
    Content-Type: text/html; charset=utf-8
    Server: Microsoft-IIS/7.5
    X-Powered-By: ASP.NET
    Content-Length: 865
    Date: Sat, 28 Feb 2015 07:04:36 GMT
    Connection: keep-alive

~~~
troels
Yeah, the clue is in "Server: AkamaiGHost". The site is clearly behind some
sort of anti-crawl protection which decides it doesn't like me. I think it
would make sense to make an exception for exactly that file though.

------
WestCoastJustin
Reading, reading, reading... wtf! I hope that is an abbreviation for
something, but to be honest, I don't want to Google and find out.

    
    
      Disallow: /sr/extra-small-teen-porn*

~~~
untog
I'm not sure what letters would go on the end of "porn" to make a clothing
item, but I'll bet it exists. Or someone misspelt "pony", or... well, you get
the idea. No matter what, you really don't want your site to appear in search
results for those terms.

~~~
omarish
    ➜ ~ cat /usr/share/dict/words | grep "^porn"
    pornerastic
    pornocracy
    pornocrat
    pornograph
    pornographer
    pornographic
    pornographically
    pornographist
    pornography
    pornological

~~~
jjoonathan
> pornocracy

Is this what we will call it when the NSA starts blackmailing politicians with
their internet histories?

~~~
1ris
It means rule of the whores and is a certain period in the catholic church.

~~~
DanBC
[http://en.wikipedia.org/wiki/Saeculum_obscurum](http://en.wikipedia.org/wiki/Saeculum_obscurum)

> Saeculum obscurum (Latin: the Dark Age) is a name given to a period in the
> history of the Papacy during the first half of the 10th century, beginning
> with the installation of Pope Sergius III in 904 and lasting for sixty years
> until the death of Pope John XII in 964. During this period, the Popes were
> influenced strongly by a powerful and corrupt aristocratic family, the
> Theophylacti, and their relatives.

------
prokizzle
Also, in

    
    
       Disallow: /sr/mattress*
       Disallow: /sr/mattresses*
    

Doesn't A imply B?
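
A quick way to check, assuming the wildcard semantics Google documents for
robots.txt (`*` matches any run of characters, a trailing `$` anchors the end,
and every rule is otherwise a prefix match). The helper names are made up for
this sketch.

```python
import re


def rule_to_regex(rule):
    """Compile a robots.txt path rule into a regex with prefix semantics."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(rule, path):
    # re.match anchors only at the start, which gives the prefix behaviour.
    return rule_to_regex(rule).match(path) is not None


print(is_blocked("/sr/mattress*", "/sr/mattresses"))   # prints: True
print(is_blocked("/sr/mattresses*", "/sr/mattress"))   # prints: False
```

Under those assumptions the first rule already covers everything the second
one does, and the trailing `*` is redundant as well, since a plain
`/sr/mattress` is a prefix match anyway.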

------
abakker
Do a Google search for some of those products. It ends up being even weirder
when you see some of the sites linking to them.

edit: especially considering that Google still links to them. It just doesn't
show their description and instead gives the standard error for "couldn't show
this description because it was blocked by the site's robots.txt".

~~~
gerbal
Google indexes pages that link to those pages, but can't index the pages
themselves. The PageRank algorithm still sees inbound links to unindexable
pages.

------
prokizzle
Screenshot of original robots.txt for when the changes get pushed to
production:

[http://imgur.com/8zkK8w2](http://imgur.com/8zkK8w2)

------
jjcm
Anyone have any theories on why they're blocking specific products?

~~~
jj666
Designers often pull stuff from stores or negotiate exclusives with another
store. This could have been an attempt to make sure some search results
don't show up anymore.

~~~
wmil
They might have signed an agreement restricting "lowest advertised price".
They're pretty common.

------
United857
I guess they removed that porn line. I clicked through on it and was perplexed
about what the big deal was, and I had to read the comments here to know.

~~~
Buge
It's still there for me. Maybe they use a CDN that serves different things to
different locations or does caching?

------
api
"Disallow: /sr/extra-small-teen-porn"

Yeah someone got pwned, or had a serious HR incident.

