
What Every Developer Should Know About URLs - fogus
http://www.skorks.com/2010/05/what-every-developer-should-know-about-urls/?
======
rbritton
The // part isn't required for every scheme as the article implies. For
example, tel:5555555555 is perfectly acceptable.

The two slashes are now even considered to have been a bad design decision:
<http://news.bbc.co.uk/2/hi/8306631.stm>

~~~
thwarted
Yeah, the // isn't part of the scheme, it's part of the authority.

That BBC article doesn't say _why_ "net users" find them annoying. I like the
double slashes (or at least some unique separator between the scheme and the
hostname that designates where the hostname starts), since they allow building
useful relative URLs, as mentioned in the OA. As an example not covered in
the article, you don't need to serve different CSS files for
secure and insecure content when you serve media assets from another domain
which is also available via both http and https.

The CSS

    background-image: url(//media.example.com/image.png)

can be used on both HTTP and HTTPS served pages and the browser will resolve
that relative URL by filling in the protocol from the base document. Without
the double slashes, you wouldn't be able to distinguish between a relative
path and a relative URL. If I remember correctly, a scheme change on the same
hostname but with a different path like:

      base document:  http://example.com/some/path
      relative URL:   https:/some/other/path

is possible too (I wonder how the parsing should work with port numbers, if
they can be relative too -- I have not read the RFC in a while, and it's such
a rare thing to use port numbers anyway).
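
For anyone who wants to poke at this, here is a minimal sketch using Python's standard `urllib.parse.urljoin`. The base documents and the asset host are made-up examples, and this only shows what that one library does, not necessarily what every browser does. In particular, it returns the `https:/some/other/path` form unchanged rather than filling in the host from the base.

```python
from urllib.parse import urljoin

# Hypothetical base documents, one per scheme.
http_page = "http://example.com/some/path"
https_page = "https://example.com/some/path"

# Scheme-relative reference: the leading "//" marks the start of a host.
asset = "//media.example.com/image.png"
from_http = urljoin(http_page, asset)
from_https = urljoin(https_page, asset)
print(from_http)   # http://media.example.com/image.png
print(from_https)  # https://media.example.com/image.png

# The same text without "//" is treated as a relative path on the base host.
as_path = urljoin(https_page, "media.example.com/image.png")
print(as_path)  # https://example.com/some/media.example.com/image.png

# A reference that carries its own, different scheme is not merged with
# the base by urljoin; it comes back as given.
scheme_change = urljoin(http_page, "https:/some/other/path")
print(scheme_change)  # https:/some/other/path
```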

Double slashes (well, backslashes) are how Microsoft/CIFS originally designated
server names in UNC paths, which I think may have been around before URLs were
standardized (don't quote me on that, they're most likely roughly the same age
and influenced each other). This is also why the file: scheme "requires"
_three_ leading slashes: the "host" is empty to designate the local machine,
but you could put in a hostname to access network shares (I put "requires"
in quotes because file has always had some ambiguities in the parsing
implementations).
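
A small sketch of the empty-host reading, using Python's `urllib.parse.urlsplit`. The path and the `fileserver` host name are invented for illustration, and other parsers may treat file: URLs differently.

```python
from urllib.parse import urlsplit

# Three slashes: "//" opens the authority, which is empty, then "/" starts the path.
local = urlsplit("file:///etc/hosts")
print(repr(local.netloc), local.path)  # '' /etc/hosts

# Two slashes followed by a name: the authority is a (hypothetical) file server.
remote = urlsplit("file://fileserver/share/report.txt")
print(remote.netloc, remote.path)  # fileserver /share/report.txt
```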

I find it annoying when people read addresses aloud and call the slashes
"backslashes". Talk about wasting time and energy, that's a whole additional
syllable said for every path component in a URL!

~~~
pak
These tricks of combining parts of URLs to do relative linking on scheme,
host, etc. are little known but useful; I was expecting them to show up in the
article.

Going back to the debate over Chrome's potential dropping of "http://" in the
address bar, if they were to use "//" instead they would have an argument for
technical correctness because the default protocol of "http:" could be
assumed. But having no leading "//" visually confuses it with a relative path
omitting the host, because it breaks the signifier for the authority component
of the URL. Just a thought.

------
almost
Are there any web developers here who didn't know these things? Are there any
developers of any type who didn't? I'm not sure who the audience for this
article is meant to be but I'm guessing you won't find too many of them here.

~~~
daleharvey
I never knew about params. I know a few people who didn't know that things like
fragments don't get sent to the server, everyone gets encoding wrong at some
point, and I know a lot of people don't really know how base tags work.
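
For reference, a minimal sketch of where params and the fragment sit in a URL, using Python's `urllib.parse`. The URL is invented for illustration. `urldefrag` gives the part a browser would actually request, since the fragment stays on the client.

```python
from urllib.parse import urlparse, urldefrag

# Hypothetical URL with path params (";v=2"), a query, and a fragment.
url = "http://example.com/docs/page;v=2?q=url#section"

parts = urlparse(url)
print(parts.path, parts.params, parts.query, parts.fragment)
# /docs/page v=2 q=url section

# Strip the fragment to get what would be sent to the server.
request_url, fragment = urldefrag(url)
print(request_url)  # http://example.com/docs/page;v=2?q=url
print(fragment)     # section
```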

Cheers for the article, handy quick reference.

~~~
locopati
I didn't either. In-path params seem like an interesting way to address
issues that can come up when devising RESTful URL schemes (rather than relying
on query params only).

~~~
blasdel
For fucks sake, there is _no such thing_ as a RESTful URL scheme. 'Pretty'
URLs are a major anti-pattern -- they not only don't make you any more
RESTful, they actively undermine HATEOAS, which is the true signifier of
RESTfulness.

Actual REST treats URL strings (including query parameters) as being
completely opaque implementation details. The server is supposed to respond
with URLs in the hypertext -- you're _never_ supposed to be formatting them
yourself client-side using out of band knowledge. Query parameters are no
exception to that: if you want the client to pass them, give the client a form
in the response.

If you're expecting a client to munge together "path components" based on
foreknowledge of your data model, _you're doing it wrong_.
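
A rough sketch of the difference, in Python. The response body, the `links` / `rel` / `href` field names, and the `link_for` helper are all made up for illustration; real hypermedia formats vary. The point is only that the client looks up a link the server handed it and never assembles the URL itself.

```python
import json

# Hypothetical response body; the hrefs are deliberately opaque.
response_body = json.dumps({
    "status": "unpaid",
    "links": [
        {"rel": "self", "href": "/o/9f3a"},
        {"rel": "payment", "href": "/p?t=77c1"},
    ],
})


def link_for(document: dict, rel: str) -> str:
    """Return the href the server advertised for a relation name."""
    for link in document.get("links", []):
        if link["rel"] == rel:
            return link["href"]
    raise KeyError(f"server did not offer a {rel!r} link")


order = json.loads(response_body)

# The client follows what it was given instead of formatting a path
# from out-of-band knowledge of the data model.
payment_url = link_for(order, "payment")
print(payment_url)  # /p?t=77c1
```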

~~~
WesleyJohnson
Any chance you have articles to back this up? Not that I doubt you, but I've
never really understood or looked into all this REST business and if I'm ever
going to do so, I'd like to learn what being RESTful really means, and not
just jump on what sounds like a bandwagon for 'Pretty URLs', as you
mentioned. :)

~~~
adoyle
<http://tech.groups.yahoo.com/group/rest-discuss/>

------
TorKlingberg
No mention of non-ASCII characters at all? Punycode in the host name may still
be uncommon, but passing non-ASCII in the query is important. There is also
nothing about the encoding of space as + rather than %20 that happens a lot.

~~~
throw_away
Note, however, that encoding a space as a + is supposed to happen only in the
query portion of the URL, not the path.
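
A quick sketch of the two encodings with Python's `urllib.parse`, assuming UTF-8 for the non-ASCII character (the library default). The search term is just an example.

```python
from urllib.parse import quote, quote_plus, urlencode

term = "café au lait"

# Path-style encoding: space becomes %20, non-ASCII becomes UTF-8 percent-escapes.
path_form = quote(term)
print(path_form)  # caf%C3%A9%20au%20lait

# Form/query-style encoding: space becomes +.
query_form = quote_plus(term)
print(query_form)  # caf%C3%A9+au+lait

# urlencode builds a query string and uses the + form by default.
query_string = urlencode({"q": term})
print(query_string)  # q=caf%C3%A9+au+lait
```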

------
mclin
Did this guy just paraphrase the RFC? It's a lot of work, but if it gets you
on HN...

~~~
nostrademons
I suspect that there're a lot of RFCs and W3C specs that could be paraphrased
and get you on HN.

How many web developers _actually_ know HTML, for example? In a few
discussions here and on Reddit, it seemed like well over 80% did not realize
that <!doctype html> is a valid doctype, or that you do not need to close many
common tags.

~~~
blasdel
I place all the blame for the cargo-cult validation-seeking know-nothing
standardistas squarely on the shoulders of Jeffrey Zeldman.

If it wasn't for his ignorant boosterism, we probably could have euthanized
XHTML a long time ago.

~~~
blaix
If it wasn't for his boosterism, we'd probably still be writing HTML for
specific browsers and versions.

------
ars
For extra credit explain how to encode an IPv6 address in a URL without
getting it mixed up with the port.

~~~
moeffju
By putting it inside of square brackets.

[2001:db8::a00:20ff:fea7:ccea]:80
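
A minimal check with Python's `urllib.parse.urlsplit`, which understands the bracket form. The `http://` prefix and `/index.html` path are added here just to make a complete URL.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://[2001:db8::a00:20ff:fea7:ccea]:80/index.html")

# The brackets keep the address's own colons apart from the port separator.
print(parts.hostname)  # 2001:db8::a00:20ff:fea7:ccea
print(parts.port)      # 80
```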

------
tomlin
In general, I am not a big fan of the father-knows-best narrative "What Every
Developer _Should_ Know".

Thanks, pa. All I needed from you was an "atta-boy!". _sigh_

