Berners-Lee 'Sorry' for Slashes (2009) (bbc.co.uk)
47 points by lelf 5 days ago | 86 comments

I don't think it mattered so much for print, but in the early days of the internet you would hear people on the TV laboriously reading out "haich tee tee pee colon slash slash double-u double-u double-u dot contoso dot com".

This is why a friend of mine has the email address "dot at dot at dot at". Yes, that's uniquely parseable back to one valid rfc822 address that works.
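A rough way to check the "uniquely parseable" claim is to try every reading of the spoken words, where each "dot" or "at" is either the literal letters or the symbol it names. This is only a sketch: `is_plausible` is a simplified stand-in for the real RFC 822 grammar, and the rule that the domain needs at least two labels is my assumption. Without that last rule, more than one reading survives.

```python
from itertools import product

# Each spoken word is either its literal letters or the symbol it names.
READINGS = {"dot": ("dot", "."), "at": ("at", "@")}


def is_plausible(address):
    """Simplified validity check (an assumption, not the full RFC 822 grammar)."""
    if address.count("@") != 1:
        return False
    local, domain = address.split("@")
    for part in (local, domain):
        # Reject empty parts and empty labels (leading, trailing or doubled dots).
        if not part or "" in part.split("."):
            return False
    # Assumption: a deliverable domain has at least two labels.
    return "." in domain


def spoken_to_addresses(spoken):
    """Return every plausible address the spoken phrase could stand for."""
    options = [READINGS.get(word, (word,)) for word in spoken.split()]
    candidates = {"".join(choice) for choice in product(*options)}
    return sorted(a for a in candidates if is_plausible(a))


print(spoken_to_addresses("dot at dot at dot at"))
```

Under these rules exactly one candidate is left.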

I have occasionally wondered about programming language syntax that might take reading aloud into account. Punctuation ends up sounding like Victor Borge: https://www.youtube.com/watch?v=Qf_TDuhk3No and of course Python's indentation is very much at risk of ambiguity.

After years of people mistyping "jasonmill" for "jasomill" in my (originally university-assigned) email address, I registered jasomill.at so I could receive email as "jasomilldot", because I guess turning the simple act of giving out my email address over the phone into an Abbott and Costello routine sounded like a good idea at the time.

Oh, and you don't need special syntax to write code to be read aloud, to wit,


> turning the simple act of giving out my email address over the phone into an Abbott and Costello routine sounded like a good idea at the time

Hah! I've been dealing with that for a few years now -- my email is most of my username and spelling it out is fun for neither party.

I've considered getting the shortest, most phonetic-friendly email possible specifically for phone calls; something like a3gx@gmail.com but also want to self-host. It's a hassle!

I have an address specifically for that—it's just one letter and a few digits, redirecting to the main account. Works like a charm. Especially when the native alphabet in the country is not Latin and laboriously explaining which letter is which is a meme from the phone calls era. Whereas I can just call digits by the names in the local language, and everyone is familiar with hearing them.

Meanwhile, with my main-ish addresses, I've been told that they look like a random jumble of letters or a password. And once, when I gave one of them over the phone, it easily took ten minutes, with confusion even as to how many letters there were.

With my domain name, I've been using a one-letter address for humans but have found it often confuses people; it doesn't fit the pattern they expect. So, I've been thinking of switching to domain@domain.tld, which is still obviously "special" but matches the regular pattern better (and I don't have to spell out <domain> twice).

A family member has firstname@firstnamelastname.com, which is super nice, but unfortunately my name and most variants are taken.

And Sharon still denies she wrote Black Perl. (It was Larry.)

My GP is not very tech savvy and any time we call the surgery there is a lengthy unskippable message about Covid with the receptionist reading out https://111.nhs.uk/covid-19 down to the last colon.

Does it say 'slash' or 'forward slash'? Pet peeve of mine...

Fun to help someone via phone map a Windows share who doesn't know the difference/'what's a backslash?'

If you don’t know, how are you supposed to know whether a forward slash has its upper or lower end forwards?

Or if you do know that, how do you know if "forwards" is to the left or right?

It's called a virgule if you want to stop the forward slash confusion. Of course nobody will know what you're talking about.

I recollect something similar. I used to have an email id that was: at . 80 @ atdot . at

Fun times reading it out!

> "haich tee tee pee colon slash slash double-u double-u double-u dot contoso dot com"

"WWW" is one of the rare acronyms where it's easier to pronounce the full phrase. :)

Why did "triple-double-u" not catch on?

dub dub dub

+1 just for the Victor Borge reference. Maybe his first ever appearance on HN? Who knows?

.. which is also me :)

Reminds me of "E-mail Addresses It Would Be Really Annoying to Give Out Over the Phone"


One could e.g. read lines at a loudness proportional to their indentation.

I imagine it would help to make code less indented should the author be required to read their work aloud.

"By 'rockstar' we really did mean 'somebody who could shout really loudly', sorry"

Victor Borge's solution for any nebulousness around punctuation for Alexa and other voice-activated devices is sheer genius! ...And, so funny and cute! Having seen this little bit reminds me that I have not seen/enjoyed nearly enough Victor Borge material! Thanks for sharing this video!!!

Common Lisp symbols and naming conventions take reading aloud into account well.

Yes, the naming convention does but the source code itself does not lend itself to it greatly with all the parentheses.

     open paren sum open paren tick one two three close paren close paren

That's not how you say it.

How do you say it?

You elide the parentheses. They are as unnecessary when speaking Lisp code aloud as saying "comma", "period", or "question mark" is in everyday conversation.

9/10 times they would really mess it up and just go "whatever it says on the screen, go there" or something similar :)

My only hardship with this is about half the population doesn't know the difference between a slash and a backslash, which is probably not their fault because they wouldn't have encountered them outside of typesetting or computing. Which is fine until you hear the TV person above do it wrong. Luckily most browsers figure it out.

I like to tell people slash is the one you put in a date.

The only place a non-programmer user would encounter a backslash is in Windows (nee DOS) file paths, right? Yet another thing Microsoft has made worse in the world ;)

Can we talk about gboard, Google's default keyboard for Android for a moment?

On my phone at least, the backslash has a clearly and easily accessible shortcut (long press 'w'). Meanwhile, to enter a forward slash, one has to not only tap into the special symbols page, but actually tap again into the SECOND PAGE of that.

Does anyone have any insight as to just what these galaxy brains at Google are thinking?

Hashtag include open-angle-bracket ess tee dee eye oh dot aitch close-angle-bracket...

I think the greater sin by far was orienting domain names the way that they are:

com.google.www becomes tab-completable from the most generic element to the least generic. www.google.com is .. not like that.
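To make the tab-completion point concrete, here is a small sketch. The `history` list and both function names are made up for illustration; nothing here reflects how any real browser completes names.

```python
def to_big_endian(hostname):
    """Reverse the dot-separated labels: www.google.com -> com.google.www."""
    return ".".join(reversed(hostname.split(".")))


def complete(prefix, hostnames):
    """Return the big-endian names that start with the typed prefix."""
    names = sorted(to_big_endian(h) for h in hostnames)
    return [n for n in names if n.startswith(prefix)]


# Hypothetical browsing history, for illustration only.
history = ["www.google.com", "mail.google.com", "www.example.org"]
print(complete("com.google.", history))
```

With the most generic label first, every host under one organisation shares a prefix, so a plain prefix match is enough.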

Academic networks in the UK[0] actually got this the right way round from the start, but DNS won the battle in the mid 90s.

[0] https://en.wikipedia.org/wiki/JANET_NRS

This is just a guess, but reading RFC 819 (https://tools.ietf.org/rfc/rfc819.txt), which transitioned ARPANET to a hierarchical naming scheme and also predates DNS, the little-endian notation might be a simple artifact of ARPANET's e-mail addressing. E-mail addressing was already user@host, which is basically little-endian. It would be consistent to extend it like user@host.site.network. JANET also used @ notation, but used big-endian notation for the domain, which seems inconsistent, at least from a user's perspective.

Wikipedia says JANET's e-mail addressing notation was defined by the "Grey Book", and this Usenet thread, http://neil.franklin.ch/Usenet/alt.folklore.computers/200209..., says the Grey Book domain notation comes from the Network Independent File Transfer Protocol (NIFTP aka "Blue Book", which was a different protocol from ARPANET's RFC 354 FTP). This 1990 JANET<->ARPANET e-mail gateway document, http://dotat.at/tmp/JANET-Mail-Gateways.pdf, says that JANET e-mail was transferred using NIFTP, so it would make sense that the domain part of the e-mail address would use NIFTP rules. Both above sources say (explicitly or impliedly) that JANET generally, and NIFTP specifically, were based on X.25, and X.25 uses big-endian addressing.

So on JANET the hierarchical naming scheme predated the e-mail addressing scheme[1], whereas on ARPANET the reverse is true. Both formats make sense as path dependent outcomes.

[1] Presumably JANET still adopted user@ because the message format was based on RFC 822, according to that gateway document above, but it was still worth partially deviating from RFC 822, which explicitly defines little-endian domain syntax, because of JANET's pre-existing host addressing scheme.

I am still waiting for an apology for mathematics and base 10 numbers being written the wrong way round logically!

Same with SQL. I am giving SQL trainings this year, and I have to explain why the server will read your query in a completely different order than how you write it.

I'm still waiting for an apology for base 10. We should be using base 12 or dozenal.


Numbers are written correctly, most significant digit on the left. SQL is mostly correct according to that logic, first `join` then `where` then `order by`. There are inconsistencies though.

You are used to seeing the most significant digit on the left, but really we have to right-align a column of numbers in order for them to be readable, whereas everything else we read is left-aligned. And you don't know what this significant digit corresponds to (thousands, millions, billions?) until you have read all the other digits (if you read left to right).

As for SQL, you write

SELECT TOP 10 ColName FROM TableName WHERE X = 10 ORDER BY ColName

and the server reads

FROM TableName WHERE X = 10 SELECT ColName ORDER BY ColName TOP 10

It is not "mostly" the right order.
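A minimal sketch of the order described above, with plain Python lists standing in for a database. The sample rows and the name `run_query` are invented for illustration; a real engine is free to choose a different physical plan as long as the result matches this logical order.

```python
# Hypothetical contents of TableName.
rows = [
    {"ColName": "b", "X": 10},
    {"ColName": "a", "X": 10},
    {"ColName": "c", "X": 7},
]


def run_query(table, limit=10):
    """Evaluate the example query in the order the server reads it."""
    step = table                                  # FROM TableName
    step = [r for r in step if r["X"] == 10]      # WHERE X = 10
    step = [r["ColName"] for r in step]           # SELECT ColName
    step = sorted(step)                           # ORDER BY ColName
    return step[:limit]                           # TOP 10


print(run_query(rows))
```

Note that `TOP 10`, the second thing you write, is the last thing applied.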

And right to left assignment for mathematics or programming.

I suppose an enterprising browser hacker could implement this such that the browser could accept URLs in this format and reorient them when it makes the requests. Even paths would be tab-completable from browser history, would they not? I would love this in my browser.
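The rewriting step itself would be small. A sketch, assuming the input has no scheme, port or user info and that `https` is the wanted default (`reorient` is a made-up name, not an existing browser API):

```python
def reorient(big_endian_url, scheme="https"):
    """Turn 'com.example.www/a/b' into 'https://www.example.com/a/b'."""
    # Everything before the first slash is the host; the rest is left untouched.
    host, slash, rest = big_endian_url.partition("/")
    little_endian_host = ".".join(reversed(host.split(".")))
    return f"{scheme}://{little_endian_host}{slash}{rest}"


print(reorient("com.google.www/search?q=slashes"))
```

The harder part would be completion and history matching, which this does not attempt.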

That sounds so strange. I wonder if that is because I am used to the www. or because ending urls in .org or .com sounds better (pre-invention)

Probably for the same reason nobody wrote


DC, Washington

Pennsylvania Ave, 1600

Office of the President

President Donald Trump

That's actually how addresses are written in Chinese -- most general to most specific. Super disorienting when you're learning it as an English native speaker.

I think he can be forgiven for not anticipating the cost of a couple slashes. It's probably not as expensive as the addition of 'null' to ALGOL, and that was actually intended to be very widely adopted!

I wrote my MS dissertation under TimBL at MIT. I was into semantic web technologies in the late aughts and chatted with him a few times about the double-slash and web nomenclatures etc.

I think this article either exaggerates or misstates his intent. IIRC, his thinking was the double slash was just a continuation of the path operator in *nix and would represent the "hyper" in hypertext.

Don't want to get too far off topic but I'm curious what are your thoughts on Solid?

It’s true the semantic web was a far bigger mistake, although he doesn’t seem to be sorry for that yet.

I really don't think the semantic web was a "mistake" at all. And as much as Sir Tim would hate me for saying this, it was (is) still AI - more correctly Symbolic AI - to collaborate at web-scale.

Mind you, back then AI was a dirty word, even at MIT, and we had to couch it as "cognitive computing","distributed intelligence" etc.

It is a laudable effort and I still love me some semweb technologies.

If there's anything to "apologize" for with URLs, it would be the use of the ampersand character for separating query parameters, as these are interpreted as SGML ERO (entity reference open) character in SGML default concrete syntax, making SGML parsers see an entity reference in links such as http://bla.bla/doc?param=value&otherparam=othervalue. XML doesn't help here as it rejects the whole string as attribute value. This is also IMHO an oversight in the XML and WebSGML specs (the latter allowing to circumvent the issue via so-called data attributes).

URLs were supposed to be universal, not limited to any particular media format. The ampersand has to be escaped as &amp; in SGML/HTML/XML. Other formats will need other escapes - no characters are "safe" in all formats.
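For the HTML case, the escaping can be left to a library. A short sketch using Python's standard `html` module on the example URL from the parent comment (the surrounding `<a>` markup is just for illustration):

```python
from html import escape, unescape

url = "http://bla.bla/doc?param=value&otherparam=othervalue"

# Escape the URL before placing it inside an HTML attribute value.
link = '<a href="{}">doc</a>'.format(escape(url, quote=True))
print(link)

# Unescaping the attribute value gives back the original URL.
print(unescape(escape(url, quote=True)) == url)
```

The URL itself is unchanged; only its representation inside the markup differs.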

There were attempts to use ; instead, and many query parsers in standard libraries still split parameter pairs on semicolons. But who cares if browsers still generate & in GET forms.
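A lenient parser of that kind is easy to picture. The following is a hand-rolled sketch, not the behaviour of any particular library; `split_query` is a made-up name, and keeping only the last value for a repeated key is a simplification.

```python
import re
from urllib.parse import unquote_plus


def split_query(query):
    """Split a query string on '&' or ';' (last value wins for repeated keys)."""
    params = {}
    for pair in re.split(r"[&;]", query):
        if not pair:
            continue  # skip empty segments such as a trailing separator
        key, _, value = pair.partition("=")
        params[unquote_plus(key)] = unquote_plus(value)
    return params


print(split_query("param=value;otherparam=othervalue"))
```

A link written with semicolons needs no entity escaping in SGML-family markup, which was the appeal.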

I'm not so negative. The double slash provides a significant visual differentiator that a single colon would not. It may "waste" paper but it saves time.

In what way does it save time?

Not grandparent but it saves time the same way syntax highlighting saves time. You simply recognize the difference faster.

With http://example.com it's quicker to distinguish the protocol (http) and the domain name (example.com).

Compare that to http:example.com and it might take a bit longer at first glance (to some people), because they read over the : and then need to do a quick linear scan before they spot the : and are then able to distinguish http vs example.com

Given that one sees a lot of domain names, I'd say it'd save a few hours of everyone's life in the aggregate.

But humans rarely need to parse URLs into components but often need to read them aloud or type them. The extra characters make this slower and more error prone.

For most user interaction purposes the URL is just an opaque string. Only the browser needs to actually parse the URL.

I don't think it's been common for most people to need to directly type or read URLs aloud for a long time.

The article is more than a decade old.

Should have been a ligature for it, instead of the actual ://.

It's often mentioned that open source has '1000 eyes' to correct and improve things.

The web has many more than that, I'm sure he can be forgiven for not anticipating every scrutiny about URIs/HTTP/HTML.

Seems like a small nitpick; sites that don't optimise their images or compress text content over the wire put the space savings from no // into the shade.

I don't think size over the wire is the concern. The annoyance is URLs in print, or read aloud and typed manually.

Browsers have since alleviated this by adding "http://" automatically when you type a domain name.

Note that space in the sense of bytes is not mentioned in the article. Only user-visible things such as space on paper.

Fair point, especially about the browser. It could be argued that mainstream browsers' behaviour shaped how people communicated URLs offline. If they had assumed http for a URL without a protocol, then perhaps fewer people would've felt the need to include the protocol on paper.

Given the standard RFC, the two initial slashes as in scheme:// are critical in any URL when you need to be able to distinguish between the authority and path component, as in scheme://authority/path as in http://server/file.txt. Otherwise it would be impossible to know when the server part finished and the path part began (since the authority or server part is always between the second and third slashes). Given the article, I think we’re very lucky to have ended up with this design. But I suppose we could have also arrived at a syntax where the path begins with the second slash (e.g. scheme:/server/path). In fact scheme:/path is valid syntax and is simply the contracted URL form of scheme:///path so at least by today’s RFC definition, scheme:/server/path wouldn’t work since in this contracted form, the path begins with the very first slash and that ‘server’ bit wouldn’t be a server at all but also part of the path.
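Python's standard `urllib.parse.urlsplit` follows the same generic syntax, so it can illustrate the point. A sketch, where `authority_and_path` is just a helper name chosen for this example:

```python
from urllib.parse import urlsplit


def authority_and_path(url):
    """Return the (authority, path) pair that urlsplit finds in a URL."""
    parts = urlsplit(url)
    return parts.netloc, parts.path


for url in ("http://server/file.txt", "file:///etc/hosts",
            "file:/etc/hosts", "http:/server/path"):
    print(url, "->", authority_and_path(url))
```

With a single slash after the scheme there is no authority at all, so `server` in the last URL ends up as the first path segment.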

I wonder if slashdot would’ve taken off had it been called “colondot” instead.

However, scheme relative URLs (i.e. //example.com/thing.jpg) are useful in edge cases where you want to request assets using the same protocol as the document, are they not?
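For reference, this is how such a reference resolves against the document's own URL, sketched with Python's standard `urljoin` (the `site.example` base URLs are placeholders):

```python
from urllib.parse import urljoin

asset = "//example.com/thing.jpg"  # scheme-relative reference

# The scheme is inherited from whichever document contains the reference.
over_https = urljoin("https://site.example/page.html", asset)
over_http = urljoin("http://site.example/page.html", asset)
print(over_https)
print(over_http)
```

The leading `//` is what tells the resolver that `example.com` is a host and not a path segment.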

In an alternative future where the double slashes didn't exist we would just use the colon.


What would :443/thing.jpg mean? Is that a port or hostname?

Also, don’t forget user/pass prompt in a URL.

There is no ambiguity.

http:username:password@example.com/thing.jpg
http:username:password@example.com:8080@example.com/thing.jpg

But they might have picked a different delimiter for that in this alternative future. Or perhaps realized it was a bad idea earlier.

IPv6 addresses are a different story. We definitely would have chosen a different character to delimit IPv6 addresses in this future.

A port. RFC 1912 states "Labels may not be all numbers, but may have a leading digit (e.g., 3com.com)."

FWIW I had googled whether hostnames may be entirely digits before asking this, and this SO answer suggested yes. Perhaps that doesn’t apply to URLs? But to be clear I meant hostname not a FQDN, because obviously there is no ambiguity once a dot is present.


But without the double slash, how do you differentiate between:




I admit it is hard to grok in the beginning, but all you need to tell non technical people is "you can leave out the http and www" and "hyphen and slash are different symbols".

From Berners-Lee's FAQ: https://www.w3.org/People/Berners-Lee/FAQ.html

Q: What is the history of the //?

A: I wanted the syntax of the URI to separate the bit which the web browser has to know about (www.example.com) from the rest (the opaque string which is blindly requested by the client from the server). Within the rest of the URI, slashes (/) were the clear choice to separate parts of a hierarchical system, and I wanted to be able to make a link without having to know the name of the service (www.example.com) which was publishing the data. The relative URI syntax is just unix pathname syntax reused without apology. Anyone who had used unix would find it quite obvious. Then I needed an extension to add the service name (hostname). In fact this was similar to the problem the Apollo domain system had had when they created a network file system. They had extended the filename syntax to allow //computername/file/path/as/usual. So I just copied Apollo. Apollo was a brand of unix workstation. (The Apollo folks, who invented domain and Apollo's Remote procedure call system later I think went largely to Microsoft, and rumor has it that much of Microsoft's RPC system was).

I have to say that now I regret that the syntax is so clumsy. I would like http://www.example.com/foo/bar/baz to be just written http:com/example/foo/bar/baz where the client would figure out that www.example.com existed and was the server to contact. But it is too late now. It turned out the shorthand "//www.example.com/foo/bar/baz" is rarely used and so we could dispense with the "//".


however Bill \\ Gates shall never be forgiven

As I've heard the history, Microsoft tried to use a forward slash for directories in DOS 2.0, but IBM insisted it be a backslash so that it didn't conflict with the parameter switch.

Yes, the (recently released) source code for MS-DOS 2.0 is liberally sprinkled with comments that indicate Microsoft fully intended for the path separator character to be '/' and the command-line option designation character to be '-'.

Microsoft had to design a new API for DOS 2.0, as this was the first version to support hard disk drives and hence required support for subdirectories to organize the filesystem. The API was intentionally designed to mimic Unix. [1]

And it was also intended for devices (such as CON, LPT1, COM1, etc.) to be prefixed with the special directory name 'DEV', as in '/DEV/CON' and '/DEV/LPT1', just to make it feel even more like Unix. [2]

Apparently the idea was that MS-DOS would be Microsoft's single-user operating system running on cheap 8088 machines, and Xenix would be their "enterprise" multi-user operating system running on high-end 80286 systems, and programs could target a single common DOS/Xenix API and could be run on either OS.

[1] https://github.com/microsoft/MS-DOS/blob/master/v2.0/source/...

[2] https://github.com/microsoft/MS-DOS/blob/master/v2.0/source/...

I think you mean Bill \\\\ Gates.

hmm, yes the \\\\ does make a nice Gate icon.

cries in Python

With the quotes, I like to read that 'Sorry' as "Not sorry at all, stop bothering me."


> Also, there are a number of factors at play, a number of different futures that could have resulted if there were no slashes, just by chaos theory[1]. So, speculating in hindsight is not fruitful.

This sounds like an argument that one should do whatever, because whatever can result from it.

Well, at the very least you shouldn't attempt to do something under the assumption that you can perfectly predict its result.

