
How do search engines treat trailing slashes and capital letters in URLs? - illdave
http://www.propellernet.co.uk/search-engines-treat-trailing-slashes-capital-letters-urls/
======
seanwilson
> There were four test URLs in total, two of which tested for how search
> engines deal with unique content

...

> There is one unusual thing, however – a site: search brings up the lowercase
> URL, but the uppercase URL is filtered out for being too similar to the
> other displayed URLs and isn’t shown unless the ‘repeat the search with the
> omitted results included’ link is clicked.

Maybe the content isn't unique enough so Google's duplicate detection
algorithm marks them as duplicate?

The two examples contain similar keywords and don't seem to have any outgoing
links. A human would probably flag them as spun articles so I wouldn't be
surprised if Google did the same.

Maybe the results would be different if each link was a unique high quality
article instead.

~~~
gowld
Yes, Google doesn't say the URL is similar, it says the "result" is similar.

------
joekrill
> Monitoring the server logs showed that Bingbot only crawled the lowercase
> version of the URL.

I've always thought it was much more common that sites written in ASP.Net have
case-sensitive URLs. At least that was quite common ~5 years ago (was it a
default setting or something? I haven't done .Net stuff in a while). So it's
pretty crazy that Bing only crawls lowercased URLs.

~~~
gowld
That's overstating the "case" (pun!). Ignoring or normalizing case is
different from ignoring a non-redundant URL with uppercase.

But it might cause problems if the IIS server isn't properly case-preserving
to normalize back to the standard (possibly capitalized) form.

The Mac filesystem was nicely case-preserving but case-agnostic in this way,
going back decades.
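
To make "case-preserving but case-agnostic" concrete, here is a minimal sketch of a toy directory. The class and method names are made up for illustration and this is not how any real filesystem or IIS is implemented; it only shows the idea of storing the original spelling while comparing names case-insensitively.

```python
class CasePreservingDir:
    """Toy directory: names keep their original case, lookups ignore it."""

    def __init__(self):
        # Maps a case-folded key to the name exactly as it was first created.
        self._names = {}

    def create(self, name):
        key = name.casefold()
        if key in self._names:
            # A name differing only by case collides with the existing entry.
            raise FileExistsError(self._names[key])
        self._names[key] = name

    def lookup(self, name):
        # Returns the stored spelling, or None if no such entry exists.
        return self._names.get(name.casefold())


d = CasePreservingDir()
d.create("Image.jpg")
print(d.lookup("IMAGE.JPG"))  # prints "Image.jpg"
```

A server built on something like this can always normalize a request back to the stored (possibly capitalized) form.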

~~~
saagarjha
It’s still like this by default, which can cause issues when working with
folders created on other platforms.

------
JeanMarcS
For years (if not decades), case hasn't mattered in file names or URLs in the
Microsoft world. Running a web server on a Microsoft machine gives exactly
this behavior with letter case: requesting Image.jpg or IMAGE.JPG will only
ever show one of the two images.

Could this be an extension of that? Looks like it.

~~~
toolslive
BUT, I've seen case-sensitive interpretation of email addresses on a Microsoft
mail server. (No kidding.)

~~~
sethammons
[http://www.faqs.org/rfcs/rfc2821.html](http://www.faqs.org/rfcs/rfc2821.html)

SMTP RFC 2821 says that the local part of the email address is case sensitive.
Some implementations ignore this and treat upper and lower case as the same.

Section 2.4:

> The local-part of a mailbox MUST BE treated as case sensitive. Therefore,
> SMTP implementations MUST take care to preserve the case of mailbox local-
> parts. Mailbox domains are not case sensitive.

~~~
toolslive
Same section, same RFC:

> However, exploiting the case sensitivity of mailbox local-parts impedes
> interoperability and is discouraged.

~~~
samatman
These are of course not in conflict.

What it means is, a mailserver _should_ resolve local part case-insensitively,
so that Bob@example.com and bob@example.com end up in the same mailbox.

But, to be spec-compliant, a mailserver _MUST_ send on an email addressed to
AlIcE@example.net to AlIcE@example.net, exactly, without downcasing it to
alice@example.net.

I'd imagine this is often honored in the breach, but there you have it.
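
A rough sketch of that distinction, assuming plain `local@domain` addresses. The function names are hypothetical and not taken from any mail server; lowercasing the domain is safe only because mailbox domains are not case sensitive.

```python
def split_address(address):
    # Split on the last "@" so the local part is kept exactly as given.
    local, _, domain = address.rpartition("@")
    return local, domain


def relay_form(address):
    """Address a relay passes on: local part untouched, domain lowercased."""
    local, domain = split_address(address)
    return f"{local}@{domain.lower()}"


def mailbox_key(address):
    """Key a final-delivery server might use to pick a mailbox, ignoring case."""
    local, domain = split_address(address)
    return f"{local.lower()}@{domain.lower()}"


print(relay_form("AlIcE@Example.NET"))  # prints "AlIcE@example.net"
print(mailbox_key("Bob@example.com") == mailbox_key("bob@example.com"))  # prints True
```

The relay never touches the local part; only the server that owns the mailbox decides to fold case.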

~~~
gowld
"Mail server" is ambiguous. SMTP (Simple Mail Transfer Protocol) is about
transferring mail between domains, and that transfer must preserve the local
part.

An MDA (Mail Delivery Agent) or LDA (Local Delivery Agent) _inside_ a domain
can choose whether to be case sensitive, but is discouraged from being so.

The same issue exists for dots '.' in the local part.

------
Doctor_Fegg
The "Kihlepa" pages linked from the footer are quite something.

------
gumby
It would have been interesting to see how case differences in the host name
portion of the URL were treated. Domain names are case insensitive, so would
Google search both [https://foo.example.com/bar](https://foo.example.com/bar)
and [https://foo.Example.com/bar](https://foo.Example.com/bar)? It should not.
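
For illustration, a small sketch of host-only normalization using Python's standard `urllib.parse`. The function name is made up, and it assumes the URL has no `user:password@` part, since lowercasing the whole netloc would alter that too. The path is left alone because its case may matter to the server.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_host_case(url):
    """Lowercase the host portion of a URL; leave path, query, fragment as-is."""
    parts = urlsplit(url)
    # Assumption: netloc is just host[:port], with no userinfo component.
    return urlunsplit(parts._replace(netloc=parts.netloc.lower()))


a = normalize_host_case("https://foo.example.com/bar")
b = normalize_host_case("https://foo.Example.com/bar")
print(a == b)  # prints True
print(normalize_host_case("https://foo.Example.com/Bar"))  # prints "https://foo.example.com/Bar"
```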

------
gowld
One case where this poses a problem is a dictionary site that supports
acronyms or proper nouns.

------
wccrawford
tl;dr - Google crawls both, but site: searches hide one of them as similar
results. Bing doesn't even bother crawling the other versions.

