
In search of the perfect URL - owksley
http://www.ollysco.de/2015/09/in-search-of-perfect-url.html
======
mysterypie
Xvideos uses the same technique to good effect as well. All of their video
links follow the format:

www xvideos
com/videoNNNNNNN/description_of_activity_which_can_be_easily_updated (NSFW)

I wanted to mention a mystery concerning Xvideos. Here's a business that is
very much in-your-face (i.e. it is not a defense contractor or an organization
that wants to be discreet), but its ownership is totally unknown.

I researched it. There are literally zero articles or information about who
owns it. No interviews with the founders. Nothing. I haven't been able to even
figure out what country it's based in.

Somewhere out there is a very rich person whose family and friends probably
don't realize that he founded a major Internet business.

Yes, a _major_ Internet business: they have several million videos (far more
than competing "tube" sites), hundreds or thousands of fast servers, and an
Alexa rank of 47 which is higher than imdb.com and only a couple steps below
microsoft.com.

But in this age of little privacy, they've managed to be super private.

~~~
icebraining
Their cover was actually blown recently (Aug. 15th), thanks to a lawsuit:

 _Another infringement suit has been waged by the MetArt Network against a
well-known online adult brand. (...) This time around the target is adult tube
site XVideos.com and two related web properties (...) along with defendant
owners Stephane and Malorie Pacaud of France_

[http://www.xbiz.com/news/197942](http://www.xbiz.com/news/197942)

Of course, those names could just be covers.

------
lwf
It also means you can trick people:

[http://www.amazon.com/Intel-Quantum-Computing-Module/dp/B001...](http://www.amazon.com/Intel-Quantum-Computing-Module/dp/B00186WR92)

~~~
nmjohn
Amazon's URLs are actually quite interesting:

    
    
        Original: 
            http://www.amazon.com/Structure-Interpretation-Computer-Programs-Engineering/dp/0262510871/
    
        Equivalent:
            http://www.amazon.com/dp/0262510871
            http://www.amazon.com/dp/0262510871/something-else
            http://www.amazon.com/something/dp/0262510871
            http://www.amazon.com/something/dp/0262510871/something-else
    

It appears that as long as 'dp/0262510871' is in the URL (with no other dp/#
before it; a second one after it is fine), it works.
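The behaviour described above can be mimicked with a small parser. This is a hypothetical sketch, not Amazon's code: it assumes product IDs are 10 uppercase letters or digits (as in the examples in this thread) and simply takes the first `/dp/<ID>` in the URL, ignoring every other segment.

```python
import re
from typing import Optional

# Assumption: IDs are 10 characters of A-Z/0-9, as in the examples above.
# The first /dp/<ID> wins; everything before and after it is ignored.
DP_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})(?=[/?#]|$)")


def extract_product_id(url: str) -> Optional[str]:
    """Return the ID from the first /dp/<ID> segment of the URL, or None."""
    match = DP_PATTERN.search(url)
    return match.group(1) if match else None


equivalent = [
    "http://www.amazon.com/Structure-Interpretation-Computer-Programs-Engineering/dp/0262510871/",
    "http://www.amazon.com/dp/0262510871",
    "http://www.amazon.com/dp/0262510871/something-else",
    "http://www.amazon.com/something/dp/0262510871",
    "http://www.amazon.com/something/dp/0262510871/something-else",
]
for url in equivalent:
    print(extract_product_id(url))  # 0262510871 every time
```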

~~~
cpeterso
Or simply [http://amzn.com/0262510871](http://amzn.com/0262510871)

~~~
rawdisk
HTTP/1.1 301 Moved Permanently

This is a URL shortener that just redirects to the full URL with the same
number. It's easier to type but otherwise accomplishes nothing: the server with
the content still needs the full URL. All this shorter URL gets you is the full URL.
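To illustrate the point, here is a minimal sketch of what such a shortener does: it serves no content, only a 301 with a `Location` header, so the client still has to request the full URL afterwards. The `FULL_URLS` table and `shortener_response` function are hypothetical; amzn.com's actual implementation isn't described here.

```python
from typing import Dict, Tuple

# Hypothetical table mapping a short key to the full canonical URL.
FULL_URLS: Dict[str, str] = {
    "0262510871": "http://www.amazon.com/Structure-Interpretation-Computer-Programs-Engineering/dp/0262510871/",
}


def shortener_response(path: str) -> Tuple[int, Dict[str, str]]:
    """Answer a request to the short domain with (status, headers).

    A hit is a 301 pointing at the full URL, which the client must then
    request separately; a miss is a bare 404.
    """
    target = FULL_URLS.get(path.strip("/"))
    if target is None:
        return 404, {}
    return 301, {"Location": target}


print(shortener_response("/0262510871"))
```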

------
X-Istence
Stack Overflow does this too:

[http://stackoverflow.com/questions/32672492/python-3-5-start...](http://stackoverflow.com/questions/32672492/python-3-5-startswith-in-if-statement-not-working-as-intended)

Is the same as:

[http://stackoverflow.com/questions/32672492/](http://stackoverflow.com/questions/32672492/)

------
adventured
I find it interesting that the author mentions making an effort to remove the
numeric ID from the URL.

I love using numeric IDs in the URL, for one specific reason: the
perma-short-link.

[http://qz.com/365810/whats-missing-from-this-13-year-old-gir...](http://qz.com/365810/whats-missing-from-this-13-year-old-girls-iphone-home-screen/)

Becomes:

[http://qz.com/365810](http://qz.com/365810)

Which then redirects to the proper full url. Total effort: almost nil.
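A minimal sketch of this "ID first, anything after" routing, assuming an in-memory `ARTICLES` table and a `route` function (both hypothetical; qz.com's actual stack is unknown). Only the numeric segment is used for the lookup; any path that isn't exactly the current canonical one gets a 301 to it.

```python
from typing import Dict, Optional, Tuple

# Hypothetical article table: numeric ID -> current slug (free to change).
ARTICLES: Dict[int, str] = {
    365810: "whats-missing-from-this-13-year-old-girls-iphone-home-screen",
}


def route(path: str) -> Tuple[int, Optional[str]]:
    """Resolve /<id>[/<anything>] paths by ID alone.

    Returns (200, None) if the path is already canonical, (301, canonical)
    if the client should be redirected, or (404, None) if the ID is unknown.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or not parts[0].isdecimal():
        return 404, None
    article_id = int(parts[0])
    slug = ARTICLES.get(article_id)
    if slug is None:
        return 404, None
    canonical = f"/{article_id}/{slug}/"
    if path == canonical:
        return 200, None
    return 301, canonical


print(route("/365810")[0])  # 301
```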

~~~
userbinator
Not only numeric but alphanumeric IDs; they also work as a nice shorthand in
communication. I've seen plenty of people referring to e.g. "video jI3i9Lq4BcX
on YouTube" on sites which would otherwise censor actual URLs.
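One common way to get short alphanumeric IDs is to encode a numeric primary key in base62. This is only an illustrative assumption: the thread doesn't say how YouTube generates its IDs, and the alphabet order below is arbitrary.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Encode a non-negative integer ID as a base62 string."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(s: str) -> int:
    """Invert encode(); raises ValueError on characters outside ALPHABET."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


short = encode(365810)
print(short, decode(short))
```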

------
franze
my battle-proven URL rules. important: rule 1 is more important than rules 2
to 6 added up, rule 2 is more important than rules 3 to 6 totaled, rule 3 is
more important than 4 to 6 together, rule 4 is more important than 5 + 6, and
rules 5 and 6 are a tradeoff (it's short, not the shortest possible URL).

the targeted phrase is the term(s) you want to be found for (e.g. in google
search)

URL-Rule 1: unique (1 URL == 1 resource, 1 resource == 1 URL)

URL-Rule 2: permanent (they do not change, no dependencies on anything)

URL-Rule 3: manageable (measurable, 1 logic per site section, no complicated
exceptions, no exceptions)

URL-Rule 4: easily scalable logic

URL-Rule 5: short

URL-Rule 6: with a variation of the targeted phrase

most common mistake: rule 6 (least important) invalidates rule 1 (most
important)

i stand by these url-rules. every time you compromise on them, or change the
priority between them, you (your
company/startup/business/website/webapp) will regret it in the long term.

about this part of the article:

> This is the sort of solution that I really like. The SEO folks can
> fiddle with the URL until the cows come home, the engineers have the luxury of
> a straightforward rule, and the user never sees a broken link. Is this simple
> structure enough to keep everybody happy?

NO

every redirect has a cost:

\- server resources

\- (web)performance a.k.a. speed

\- long term project costs: redirects need to be maintained (they will not be)
and documented (they are not)

\- added complexity (redirect complexity adds up fast; for more info see
[https://news.ycombinator.com/item?id=8891553](https://news.ycombinator.com/item?id=8891553))

~~~
lwf
> every redirect has a cost:

If you are actually just keying your content lookup on the ID and don't
redirect the user, what's the performance problem?

And use rel=canonical so search engines do the right thing.
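A sketch of that alternative: serve the page for any `/<id>/<anything>` path without redirecting, keyed only on the ID, and declare the preferred URL with a `<link rel="canonical">` tag in the head. `SITE`, `ARTICLES` and `render` are hypothetical names, and the page body is a placeholder.

```python
import html
from typing import Dict, Optional, Tuple

SITE = "http://example.com"  # hypothetical origin
# Hypothetical store: numeric ID -> (current slug, title).
ARTICLES: Dict[int, Tuple[str, str]] = {
    365810: ("whats-missing-from-this-13-year-old-girls-iphone-home-screen",
             "What's missing from this 13-year-old girl's iPhone home screen"),
}


def render(path: str) -> Optional[str]:
    """Serve any /<id>/<anything> path directly (no redirect).

    Returns the HTML page, or None for a 404. Every variant of the URL
    carries the same rel=canonical link pointing at the current slug.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or not parts[0].isdecimal():
        return None
    article_id = int(parts[0])
    if article_id not in ARTICLES:
        return None
    slug, title = ARTICLES[article_id]
    canonical = f"{SITE}/{article_id}/{slug}/"
    return (
        "<!DOCTYPE html><html><head>"
        f'<link rel="canonical" href="{html.escape(canonical)}">'
        f"<title>{html.escape(title)}</title>"
        "</head><body>...</body></html>"
    )
```

Whether search engines pick up the canonical quickly enough is exactly what the reply below disputes.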

~~~
franze
no

simplified, google works like this:

discovery (queue) -(quality check)-> crawling(optional) -QC-> indexing

google does not "follow" canonicals, but whenever google discovers a canonical
(during crawling), it pushes it back to the discovery queue -> needs to
crawl again -> needs to figure out indexing

canonical is an indexing directive

so basically there are two quality checks before google can actually apply the
indexing directive after it has discovered the canonical during crawling. also,
you can never be sure when (if ever) it will fetch the canonical URL or
choose to canonicalize it.

for small sites this is not a big issue (you will have internal duplicate
pages for google for an unknown amount of time, but at some point they will
probably be canonicalized). for big sites with millions and millions of URLs
this is a big issue. basically your example is the worst case: URL rule 6
(least important) breaks rule nr 1. then why do it at all?

additionally, you communicate different URLs to users (based on the way
they came to your site), which is just bad UX.

don't do it.

------
ckluis
I like this solution.

Essentially qz.com/122345/{anything-here} will redirect to the canonical url
allowing for experimentation on the title of articles and urls.

------
thephyber
I thought this was fairly common knowledge.

Looking up by a DB PKID is faster than looking up by a text slug, and the key
uses much less storage space in the DB.

For SEO / URL permanence reasons, the PKID is always the authoritative key
while the slug can be updated to represent the current content of the URL.
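A minimal sketch of that split using Python's built-in `sqlite3`, with a hypothetical `article` table: the integer primary key is what the URL's ID resolves against, and the slug is just an updatable column. (In SQLite an `INTEGER PRIMARY KEY` column is an alias for the rowid, so the ID lookup needs no extra index.)

```python
import sqlite3
from typing import Optional, Tuple


def make_db() -> sqlite3.Connection:
    """Create an in-memory DB with a hypothetical article table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article ("
        "id INTEGER PRIMARY KEY, slug TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def lookup(conn: sqlite3.Connection, article_id: int) -> Optional[Tuple[str, str]]:
    """The ID is authoritative; whatever slug the URL carried is ignored."""
    return conn.execute(
        "SELECT slug, body FROM article WHERE id = ?", (article_id,)
    ).fetchone()


def rename(conn: sqlite3.Connection, article_id: int, new_slug: str) -> None:
    """Update the human-readable slug; the primary key never changes."""
    conn.execute("UPDATE article SET slug = ? WHERE id = ?", (new_slug, article_id))


conn = make_db()
conn.execute("INSERT INTO article (id, slug, body) VALUES (?, ?, ?)",
             (42, "first-title", "..."))
rename(conn, 42, "better-title")
print(lookup(conn, 42))  # ('better-title', '...')
```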

------
jjsewell-ff
When building content management systems, we've taken an approach similar to
this to keep URLs constant when the names of articles, posts, or objects get
changed by a site admin. The first place I noticed this approach was Trello.

Here's an example trello URL: [https://trello.com/x/1234567/203-make-the-buttons-bigger](https://trello.com/x/1234567/203-make-the-buttons-bigger)

If you change the name of the card, the ID (203) stays the same, but the
friendly part of the URL changes. When directing you to the card, the
system doesn't care about anything past the ID.
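A sketch of "doesn't care past the ID", assuming the layout of the example URL above (the last segment is a number, a dash, then friendly text). `card_number` is a hypothetical helper; Trello's real routing isn't described in the thread.

```python
import re
from typing import Optional

# Leading digits of a segment, followed by "-" or the end of the segment.
LEADING_NUMBER = re.compile(r"^(\d+)(?=-|$)")


def card_number(path: str) -> Optional[int]:
    """Return the number at the start of the last path segment.

    Whatever follows the number is ignored, so renaming the card
    (which only changes the friendly text) doesn't change the result.
    """
    segment = path.rstrip("/").rsplit("/", 1)[-1]
    match = LEADING_NUMBER.match(segment)
    return int(match.group(1)) if match else None


print(card_number("/x/1234567/203-make-the-buttons-bigger"))  # 203
print(card_number("/x/1234567/203-renamed-card"))             # 203
```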

------
giancarlostoro
Interestingly enough, I think I tried the same thing when I saw a link from
that site. It is indeed a great workaround for the changing-URLs dilemma.

------
ambirex
We have reversed it to be example.com/seo-go-nuts/%d/ to bring the text closer
together.
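A sketch of the slug-first layout, assuming the ID is always the final path segment (`build_path` and `parse_trailing_id` are hypothetical helpers). It also shows the weakness raised in the reply: if the tail of the URL is cut off, there is no ID left to parse.

```python
import re
from typing import Optional

# The numeric ID is the last path segment, optionally followed by "/".
TRAILING_ID = re.compile(r"/(\d+)/?$")


def build_path(slug: str, article_id: int) -> str:
    """Mirror the example.com/seo-go-nuts/%d/ layout."""
    return "/%s/%d/" % (slug, article_id)


def parse_trailing_id(path: str) -> Optional[int]:
    """Return the trailing numeric ID, or None if it has been cut off."""
    match = TRAILING_ID.search(path)
    return int(match.group(1)) if match else None


full = build_path("seo-go-nuts", 42)
print(full, parse_trailing_id(full))       # /seo-go-nuts/42/ 42
print(parse_trailing_id("/seo-go-nuts/"))  # None: the ID was truncated away
```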

~~~
Walkman
The problem with that is that if the user chops off the last bits (e.g.
pasting it somewhere it cannot fit), the ID is lost and you can't look it up.
It happens more than you would think. It's important to have it early.

~~~
eli
We use this scheme as well. If only the last part with the ID is cut off and
you keep the slug text unique, you can still redirect to the correct article.
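That fallback could look roughly like this, assuming hypothetical `SLUGS`/`BY_SLUG` tables with unique slugs: try the trailing ID first and, if it is missing or unknown, look the first segment up as a slug. Note that the fallback only matches the *current* slug; a renamed slug would need its own history to resolve.

```python
from typing import Dict, Optional

# Hypothetical store: ID -> current slug, plus the reverse index (slugs unique).
SLUGS: Dict[int, str] = {42: "seo-go-nuts"}
BY_SLUG: Dict[str, int] = {slug: article_id for article_id, slug in SLUGS.items()}


def resolve(path: str) -> Optional[int]:
    """Resolve /<slug>/<id>/, falling back to the slug if the ID is missing."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return None
    last = parts[-1]
    if last.isdecimal() and int(last) in SLUGS:
        return int(last)  # normal case: the ID is authoritative
    # The ID was cut off (or mangled): try the first segment as a slug.
    return BY_SLUG.get(parts[0])


print(resolve("/seo-go-nuts/42/"))  # 42
print(resolve("/seo-go-nuts/"))     # 42, via the slug fallback
print(resolve("/old-slug/"))        # None: old slugs are not tracked
```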

~~~
Walkman
Then you don't need the ID at all :) because you're using the slug

~~~
eli
I want the slug to be able to change and I'd prefer not to have to keep track
of every variation ever assigned to that piece of content.

~~~
Walkman
if you go with /id/slug, you can redirect anything that doesn't exactly match
the current URL, so any older links still work: no matter what the slug was,
you can redirect, because the ID doesn't change.

~~~
eli
right, same as /slug/id

~~~
Walkman
no :D /id/slug is safer

