Pronunciations for hexadecimal numbers (1968) (twitter.com/lizhenry)
497 points by henrik_w on Aug 26, 2019 | 117 comments



In French, and more specifically French as spoken in some areas in Switzerland, there is a very natural way of pronouncing hex numbers:

1 to F are the normal 'un' to 'quinze'. A is 'dix'.

10 is 'seize'. 11 is 'seize et un' etc. Until 1F which is 'seize et quinze'.

20 to 2F are 'vingt' until 'vingt et quinze'.

Etc.

70, 80 and 90 are called 'septante', 'huitante' (Swiss-specific) and 'nonante'.

A0 is 'dixante', then we have 'onzante', 'douzante', 'treizante', 'quatorzante' and 'quinzante'.

100 is 'cent', and from there normal rules apply.

Etc.

So B78D would be 'onze mille sept cents huitante treize'.

Note that 'soixante dix' is 6A, not 70.


I see no reason this couldn't be applied to English. "-nz(e)", etc. is akin to "-teen", and beyond that it should be exactly the same, with septante/seventy, huitante/eighty, and nonante/ninety being the popular English versions (unlike e.g. Metropolitan French).

B78D: Eleven Thousand Seven Hundred and Eighty Thirteen

See also Tolkien's famous hobbit's birthday party.
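
For what it's worth, the English reading above can be sketched in a few lines of Python. This is only an illustration, not anything proposed in the thread: the name `hex_to_words` is made up, it handles at most four digits, it borrows "sixteen" for a tens digit of 1 from the French 'seize', and it simply refuses tens digits A to F because English has no counterpart to 'dixante' and friends.

```python
# Sketch: read a hex number with ordinary English number words,
# treating every hex digit as a value from 0 to 15.
UNITS = ["", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen",
         "fourteen", "fifteen"]
# Assumption: "sixteen" for a tens digit of 1 mirrors the French 'seize'.
TENS = {1: "sixteen", 2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def hex_to_words(text):
    """Return the words for a hex string of one to four digits."""
    if not 1 <= len(text) <= 4:
        raise ValueError("expected one to four hex digits")
    digits = [int(ch, 16) for ch in text]
    # Left-pad with zeros so the four places can be unpacked by name.
    thousands, hundreds, tens, units = [0] * (4 - len(digits)) + digits
    parts = []
    if thousands:
        parts.append(UNITS[thousands] + " thousand")
    if hundreds:
        parts.append(UNITS[hundreds] + " hundred")
    if tens:
        if tens not in TENS:
            raise ValueError("no English word for tens digit A to F")
        parts.append(TENS[tens])
    if units:
        parts.append(UNITS[units])
    return " ".join(parts) or "zero"


print(hex_to_words("B78D"))  # eleven thousand seven hundred eighty thirteen
print(hex_to_words("6A"))    # sixty ten
```

Each word maps to exactly one hex digit, which is the property that makes the scheme readable back into digits.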


I love this system, but how do you deal with the ambiguity around "ten"?

    9: nine
    A: ten?
    B: eleven
    ...
    10: ten?
    20: twenty
    30: thirty


A=ten, 10=onety, 11=onety-one, 1A=onety-ten, etc ?

Edit: Or as suggested in the original link, A=Ten, 10=annty, 11=annty-one and 1A=annty-ten


Following the French style:

10 is sixteen.

11 is sixteen one

A is not ten in hexadecimal.

"Ten" always means 10 in any base. The number after "nine" is "deci".


    9: nine
    A: ten
    B: eleven
    ...
    10: hexTen
    20: hexTwenty
    30: hexThirty
    ...
    100: hexHundred

My issue is that 1000 is hexThousand, but when working with hex numbers, it's far more logical to group them in 4s.

0x10000 is a number more deserving of a name than 0x1000.

In effect, I want to call 0x10000 as "hex-thousand" (correlating to the number where 16-bits overflow). And 0x100000000 as "hex-million" (correlating to the 32-bit overflow number).

Of course, that leaves 0x1000 and 0x1000000 as ambiguous.
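
A rough Python sketch of the grouping idea, in case it helps: split the digits into fours from the right, so the boundaries fall at 0x10000 and 0x100000000. The function name `group_in_fours` and the comma separator are my own choices for illustration.

```python
def group_in_fours(hex_digits, sep=","):
    """Group a string of hex digits into blocks of four, from the right."""
    groups = []
    while hex_digits:
        groups.append(hex_digits[-4:])   # take the last (up to) four digits
        hex_digits = hex_digits[:-4]     # and drop them from the string
    return sep.join(reversed(groups))


print(group_in_fours("10000"))      # 1,0000
print(group_in_fours("100000000"))  # 1,0000,0000
```

With this grouping, each separator marks another 16 bits, since 0x10000 == 2**16 and 0x100000000 == 2**32.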


‘Myriad’, via Greek¹. (Apparently though, ‘myria‐’ was a pre-SI metric prefix for ×10000⏨, so I'm afraid someone will come along and insist on ‘myribi‐’, and I'll have to smack them.)

¹ https://en.wikipedia.org/wiki/10,000#Name


Interesting.

Myriad shares similarities to "Million". So maybe Myriad / Biriad / Trilliad / Quadrilliad will scale into the higher orders.

Hmm, maybe I'll make this a thing.


Not to be confused with the long-scale numbers: milliard/billiard/trilliard


I guess the language you're looking for is Japanese, then. (Or Chinese?)

1=一 ichi

10=十 juu

100=百 hyaku

1000=千 sen

1,0000=万 man

10,0000=十万 juuman

100,0000=百万 hyakuman

1000,0000=千万 senman

1,0000,0000=億 oku

1,0000,0000,0000=兆 chou

1,0000,0000,0000,0000=京 kei


Pronounce 11,78D vs B78D.


As suggested in a cousin: onety-one thousand, seven hundred and eighty thirteen.

Every word (phrase in the case of hundred/thousand) becomes strictly one digit, as opposed to normal English where e.g. "thirteen" is actually two digits.


But onety-one thousand isn’t strictly one digit, is it?


Sixteen-one thousand, seven hundred and eighty thirteen

As, from the original French comment:

A = dix / ten

10 = seize / sixteen

11 = seize-et-un / sixteen-one

Others have suggested 10 = onety instead of sixteen. This would be more consistent, but less idiomatic (onety is not a number in English, and hex is base sixteen).


From the grandparent:

11 78D seize-un mille sept cent octante-treize.

Or, in English:

11,78D Sixteen-one thousand seven hundred eighty thirteen.


Note that this falls apart in Metropolitan French, where we say neither « septante » for 70 nor « nonante » for 90 (and even less « huitante » for 80), but « soixante-dix » for 70, « quatre-vingt » for 80 and « quatre-vingt-dix » for 90.

It's quite a shame, because that solution for reading hexadecimal numbers out loud is quite elegant :)


Belgian French uses septante/nonante. Huitante would be a great addition.


I personally prefer octante. Septante and huitante sound too much like just slapping -tante onto 7 and 8. And just to be coherent, I would love to have heptante for 70.


"septante", "octante", "nonante" would be coherently based on Latin etymons, but if you want to coherently use "heptante", which relies on a Greek etymon, you should probably also use "ennéante" in lieu of "nonante".


In the context of hex numbers "quatre-vingt" isn't a problem tho, no ambiguity.


For the folks who know zero French:

« soixante-dix » for 70 = sixty-ten

« quatre-vingt » for 80 = four-twenty

« quatre-vingt-dix » for 90 = four-twenty-ten


> Note that 'soixante dix' is 6A, not 70.

"Soixante-dix" would be 70 no matter how you look at it, and 0x6A is obviously "cent six".

That's why you need a new suffix as well as new names for exponentials instead of trying to retrofit existing number-naming convention. (Same thing is true for English as well)

So let this new hexadecimal suffix be -anze, with 16 ^ 3 as hexmille and 16 ^ 2 as hexcents.

0xB78D = 46989 would be "onze hexmille sept hexcents huitanze treize." or "quarante six mille neuf cents quatre-vingt neuf"
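
The arithmetic behind those two readings can be checked with a short Python sketch. `hex_value` is a made-up helper name; it just accumulates digits left to right, multiplying by 16 at each step.

```python
def hex_value(text):
    """Convert a hex string to an integer by positional expansion."""
    value = 0
    for ch in text:
        # Shift what we have one hex place left, then add the new digit.
        value = value * 16 + int(ch, 16)
    return value


print(hex_value("B78D"))  # 46989, i.e. 11*4096 + 7*256 + 8*16 + 13
print(hex_value("6A"))    # 106, i.e. "cent six"
```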


I think you’re missing the point. Soixante-dix becomes 6A, since 6x is « soixante » and A is « dix » Then, 7x becomes septante. For example, 7A becomes septante-dix


Sure, except that in standard (metropolitan) French, "soixante-dix" is already used for 70.


Much less interesting but I've worked in teams where "zillion" is a shorthand for "followed by all the zeroes that fit in a word of the size we're talking about". So 8 zillion is 0x80000000 if we're talking about a 32 bit address space.
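
As a rough sketch of that convention in Python (the function name `zillion` and the default 32-bit word size are assumptions for illustration): pad the spoken digits on the right with zeroes until the word is full.

```python
def zillion(prefix, word_bits=32):
    """Return the value of '<prefix> zillion' for a word of word_bits bits."""
    width = word_bits // 4  # number of hex digits in one word
    if not 1 <= len(prefix) <= width:
        raise ValueError("prefix must fit in the word")
    # Fill the rest of the word with zeroes on the right.
    return int(prefix.ljust(width, "0"), 16)


print(hex(zillion("8")))   # 0x80000000
print(hex(zillion("28")))  # 0x28000000
```

One consequence of padding on the right is that a longer prefix is not necessarily a bigger number, and trailing zeroes in the prefix change nothing.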


That's weird because 3 zillion is more than 28 zillion.


And one zillion equals ten zillions...


If you use the Rogers system rather than the Magnusson system then it is not equal, I think.


And also one hundred zillions, and even one zillion zillions.


Yeah ok, we got that from the parent comment.


Yeah although we would probably have said "two eight zillion" not "twenty eight zillion", and in the kind of conversations we'd have, it wouldn't really matter.

"Where is the stack base supposed to be?"

"O X 8 zillion"

"Oh, the system crashed with stack pointer at 0 X two eight zillion, something must have gone badly wrong!"


Funny how the same informal-isms crop up in disparate teams -- I've heard/used the same thing.


You could have enforced reduction rules so that the only valid zillion value is [0-F].[0-F]...


In that case, you could pronounce the :: in IPv6 addresses ("fill with zeroes") as "zillion" if it appears at the end of the address!

(I don't know how people pronounce that operator otherwise... I just say "colon colon".)


I have yet to need to recite an IPv6 address out loud, so I don't know how I'd say it. I'd probably just recite it digit-by-digit (which is pretty much how I recite all numbers anyway).


So like 4 zillion will be more than 5 zillion (assuming talking about unsigned 32 bit)?


Oh I guess you'd say 05 zillion ("oh-five zillion").


i'm stealing that


The extra digit names flow well, but I think shoehorning it into the same constructions English decimal numbers use doesn't, mostly because the “-ty”/“-teen” suffixes are too tied both etymologically and phonetically to being “ten”. If I say the word “eighteen”, making it 0x18 in one context and XVIII in another is close in plausible applicability and far in exact meaning, which is about the worst case for avoiding misunderstandings. In fact I have a vague recollection of my computer architecture instructor back in undergrad warning us about exactly that regarding hexadecimal numbers: do not ever be tempted to pronounce them as though they were decimal, even if all the digits fit into the pattern.

There might be ways to fix that with variants of those suffixes, though my first thought of “-xy”/“-xeen” might be too hard to distinguish in a noisy auditory environment.


https://en.wikipedia.org/wiki/Hexadecimal#Verbal_and_digital...

"An additional naming system has been published online by S. R. Rogers in 2007[21] that tries to make the verbal representation distinguishable in any case, even when the actual number does not contain numbers A-F. Examples are listed in the tables below."

"1743 one-thousek-seven-hundrek-fourtek-three"


Fine, but is it an imperial thousek, or a metric one?


Is imperial thou = 1024?


No, an imperial thousand is 1066.


Isn’t an issue what the words for numbers mean?

Does thousand mean the quantity itself or does it mean the decimal 1000?

In other words thousand = 1000 (dec) and 0x3E8 (hex). Are they both pronounced “thousand”? When someone says mix a thousand liters of glycol with ten kilograms of salt, both 1000 (dec) and 0x3E8 (hex) mean “thousand” each of something.

The only way to get around this is if every whole number on the number line had a unique name and could be represented by any numbering system.


Same question arises in binary - you could read "1000" as one thousand, but people are going to think you meant "whatever the base 10 quantity of 1000 is in binary" when what you actually meant was eight. It's easier to just always spell out anything that's not base 10.
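
To make the ambiguity concrete, here is a small Python illustration using the built-in `int` with an explicit base. The same written digits give three different quantities depending on the base the listener assumes.

```python
text = "1000"

print(int(text, 2))   # 8
print(int(text, 10))  # 1000
print(int(text, 16))  # 4096
```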


If someone said to me 'one thousand base sixteen' I would think '=> 1000_16 => 0x1000'; certainly not that they were converting all numbers to base ten solely for the purpose of speech.

Thus, I agree with you, there's value in making digits beyond 9 pronouncable for higher bases (if dealing with them enough to make devising the system worthwhile of course).


I would think exactly the opposite.


I endorse that warning. I try never to pronounce e.g. 0x10 as "ten", just "hex one zero". After I wasted an hour hunting down a spot where I had used a number in the wrong base, it just isn't worth it.


It can get especially bad when you have to deal with atee vs eighty or eighteen vs ateen


These are quite silly things. Firstly, why invent words instead of just using the letters? Obviously A creates an ambiguity with "eight", but other than that, things are fine. Perhaps A should be "aye".

The "teen" suffix is wrong; it specifically means "ten". For instance thirteen means three+ten, so it is inappropriate to pronounce 0x13 as "thirteen". So that is to say, the pronunciation issue does not begin at 1A; we shouldn't call 0x19 "nineteen", but something else.

Similarly 0x30 shouldn't be "thirty" because that word means three times ten.

There shouldn't be any common words between hex pronunciation and decimal that denote a different integer. If we say "hundred" and the context is really clear, it can be understood as 0x100, but the context isn't always clear. Attaching "hex" after every ambiguous wording ("hundred thirty-one hex" for 0x131) is verbose. How about:

   8: eight
   9: nine
   A: aye
   B: bee
   C: cee
   ...
   F: eff
  10: hex
  11: heven
  12: helve
  13: thirex
  14: forex
  15: fivex
  16: sixex
  17: sevex
  18: eightex
  19: ninex
  1A: ayex
  1B: behex
  ..
  1F: efex

  20: twexy
  30: trixy
  40: foxy
  50: fixy
  60: sixy
  70: sepsy
  80: oxy
  90: noxy
  A0: ayesy
  B0: beezy
  C0: ceezy
  D0: deezy
  E0: eezy
  F0: efzy

  100: hent (from "cent")
  1000: hil (from "mil")
  10000: han (from 万 (man))
  100000000: hoku (from 億 (oku))

  0xDEADBEEF:  deezy-ee-hent ayesy-dee han beezy-ee-hent eezy-eff.

  0xF00FCBB0:  efzy-hent-eff han, ceezy-bee-hent beezy.


Would anyone please help me understand the benefits of this? :).

This approach (and other approaches proposed in this thread) seems to add complexity but it's not that much better than just reading the characters normally.

> B78D would be 'onze mille sept cents huitante treize'

Or just read it: B, 7, 8, D. To space out a long hex number, read it in group of 4, with a pause.

The approach, besides not being shorter, also risks miscommunication, either because the receiver might not be familiar with it or because the sound system has several very similar sounds that completely change the meaning when misheard, like -teen and -ty.


Don't you think that's a bit of a strawman? The user is explaining how that number is pronounced in their native Swiss French. Clearly nobody is suggesting you have to learn French to say some numbers.


I think this might be the disconnect between us. From my point of view as a non-native English speaker, the approach proposed in the tweet and the Swiss French system both require the user to learn a new system. To me, both carry more drawbacks than just reading the hex character by character, in groups of 4.

I can give an example using the system in the tweet:

2F3E: twenty frost thirty ernest -- longer to read, 7 sounds.

2F3E: 2, F, 3, E -- shorter to read, 4 sounds.

Looking at that, I don't see the benefits and trade offs of that new complexity, and that's why I asked for help :)


I also prefer just reading off the short version; either in sets of (up to) 4 or 2, depending on the application.

For a mac address I'd not summarize and just read it off in 6 pairs.

For the example representation address on the wikipedia page... 2001:0db8:85a3:0000:0000:8a2e:0370:7334 ( https://en.wikipedia.org/wiki/IPv6_address )

Two Zero Zero One Colon Delta Bravo Eight Colon Eight Five Alpha Three Colon (pause) Colon Eight Alpha Two Echo Colon Three Seven Zero Colon Seven Three Three Four.

The part in the middle might need a better way of announcing, but given the double digits in other parts of the address it seems more natural to read it as a string of characters than to alter it.
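
A minimal Python sketch of the character-by-character reading, assuming NATO-style words for the letters a to f and plain digit names. The names `SPOKEN` and `spell_out` are made up for illustration, and the sketch reads every character, so it does not drop leading zeroes or replace runs of zeroes with a pause.

```python
DIGITS = ["Zero", "One", "Two", "Three", "Four",
          "Five", "Six", "Seven", "Eight", "Nine"]
LETTERS = {"a": "Alpha", "b": "Bravo", "c": "Charlie",
           "d": "Delta", "e": "Echo", "f": "Foxtrot"}

# One spoken word per character that can appear in a hex address.
SPOKEN = {str(i): word for i, word in enumerate(DIGITS)}
SPOKEN.update(LETTERS)
SPOKEN[":"] = "Colon"


def spell_out(address):
    """Read a hex string aloud, one word per character."""
    return " ".join(SPOKEN[ch] for ch in address.lower())


print(spell_out("85a3:8a2e"))
# Eight Five Alpha Three Colon Eight Alpha Two Echo
```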


You missed a colon but yeah. (I would just pronounce a single zero in between the colons instead of pausing to be extra clear)


I don't like this system so much, but there are other systems for hexadecimal pronouncing. I like the hex intuitor system, which uses "tex" for 0x10 and "ten" for 0xA.

I have also seen other systems of representing the digits for hexadecimal, such as Nystrom's system (which also includes pronunciation).

And now, looking at Wikipedia, there are more. (The hex intuitor system I mentioned above is also called the Rogers system.)




I was about to post the same link! In case you didn't click the link it is from the show Silicon Valley and has the following number words:

  0xA_ : Atta
  0xB_ : Bibbity
  0xC_ : City
  0xD_ : Dickety
  0xE_ : Ebbity
  0xF_ : Fleventy


ha, great link -- here's another version of the clip from the article that isn't region blocked https://www.youtube.com/watch?v=aFzze2o_NYY


I totally fail to see the point of this article. Why not just say A, B, C, D, E, F as the letters that they are? Works perfectly fine.

1F is just a two-syllable word. Just like "fiftysix" is.


I agree with you, sort of, in that if we're inventing a new system we may as well make it more intuitive for base 0x10 numbers while we're at it. At the very least we need something like the thousands separator, otherwise something like 0x10000000 becomes very difficult to say. Perhaps just use "word" as the suffix for (2^4)^4 sized groups. [1] And then "biword", "triword", "quadword", etc. So 0x10000000 would become "one aught aught aught word".

But then you start to see the advantage of named systems, because you end up saying "aught" or "zero" a lot to distinguish "0xA" from "0xA0" within words. "A salary of one aught aught word" doesn't seem as easy to me as "a salary of one hundred thousand". So perhaps something ought to be done about 0x10 and 0x100, even if we only use them in situations that can be abbreviated like that. Maybe 0x10 is "hex", so "base 0x10" becomes literally read as "base hex".

[1] Of course this fixes a meaning for how many bytes are in a word, which a lot of people aren't going to be happy with. ;-)


Because there is a shed that needs to be biked.


Simply because people love bikeshedding naming conventions. At this point, naming is just an engineering meme.


Isn't "fif-ty-six" three syllables?


Is it to better distinguish them, like the phonetic alphabet?

These letters all rhyme: B, C, D, and E. "Did you say 1B or 1D?"


AC is annty-christ


Of course it is! I never knew of these.

Such a computing lore gem. I am going to say these first chance I get.


Stay F0 out there...


It's amazing how so many people who "don't" believe in God blaspheme at the earliest opportunity.


That might be one of the earliest examples of someone trying to give names to hex numbers (if I remember correctly, Knuth's TAoCP might've also contained one), but this is one of the earlier pages I remember coming across on the Internet about it: http://www.intuitor.com/hex/words.html


Remind me of Boby's "bibi-binary" :) https://en.wikipedia.org/wiki/Bibi-binary


It's amazing what naming conventions are used over time. In the 70's/early 80's "~" was (at least in programming circles in the UK) called a swan-hyphen, today it is often called a tilde. Though I'm sure it has many other names that have come and gone throughout the fashion of time.


In Spanish most people call it "tilde" as well, but you can also call it "Virgulilla" [1]. I always call it like this just because of how it sounds, love that word.

[1] https://es.wikipedia.org/wiki/Virgulilla


A fact which may be related is that Unicode contains a character which is in effect a lengthened tilde character, called U+2053 SWUNG DASH: ⁓

Therefore, might your “swan hyphen” be a corruption of an original term “swung hyphen”?


Interesting, it may well have some bearing on the Unicode naming, which came about in 1991; that would be around the time its usage for "~" seemed to wane.

"#" was another one with many names that have consolidated over time. It's been decades since I heard anybody refer to it as a checkerboard, which was never as elegant, and I can see why "hash" became the norm.


Bell Telephone videos around the time touch tone dialing was introduced called it the tictactoe symbol. Seemed like a rather informal name from such a formal organization but I guess they wanted something that all of the public would know.


Unless it's on twitter, it's still either a "pound" or "octothorp" to me, depending on context. Hash is a breakfast food.


This list derives from revision 2.3 of the Usenet ASCII pronunciation guide. Single characters are listed in ASCII order; character pairs are sorted in by first member. For each character, common names are given in rough order of popularity, followed by names that are reported but rarely seen; official ANSI/CCITT names are surrounded by brokets: <>. Square brackets mark the particularly silly names introduced by INTERCAL.

# Common: number sign; pound; pound sign; hash; sharp; crunch; hex; [mesh]. Rare: grid; crosshatch; octothorpe; flash; <square>, pig-pen; tictactoe; scratchmark; thud; thump; splat.

http://www.catb.org/~esr/jargon/html/A/ASCII.html

So, according to the above, the most common name for # is “number sign”, and the official ANSI/CCITT term is “square”.

Also, in the 1975 movie Three Days of the Condor, the term “symbol for number” is used, presumably referring to #.


I learned # from my father, who introduced me to electronics as a lad by bringing a broken Telex machine home from work. He called it "pound," and it stuck with me.

Also, I remember * being "splat" back in my PR1MOS and Vax days.


That's a twiddle (I think I was influenced by Intercal).

Sometimes single quotes to me are "ticks," with the grave accent being a "back-tick."


Some friends and I wrote something similar as satire during undergrad. It’s ridiculous, but I was (and still am) pleased how it rolls off the tongue.

http://web.mit.edu/kade/www/misc/ieee4919.pdf


Also in 1968, the French singer and hobbyist mathematician Boby Lapointe ¹ proposed a graphical representation and pronunciation system for hexadecimal numbers, named bibi-binary ². The pronunciation part is based on the combinations of 4 consonants (HBKD) with 4 vowels (OAEI). It's probably less practical than a system relying on similarities with words for native numbers, but it was probably more of an artsy experiment and it produces shorter (and maybe funnier) results.

¹ https://en.wikipedia.org/wiki/Boby_Lapointe

² https://en.wikipedia.org/wiki/Bibi-binary


I'll have to tell my A01C about these.

And while we're talking hex semantics, is it just me being old, or do lower case letters in hex values bother anyone else?

The only time I've been OK with this is when I've had to work with machines that only have seven-segment displays.


I prefer lower case. The height difference makes the letters easier to read IMO.


I tend to prefer lowercase since it reduces the number of confusable characters, especially in handwriting (8/B, 0/D, 4/A), but I can understand that numbers might typographically or semantically feel uppercase, so lowercase A-F creates a MIxEd cASe looK.

Is uppercase the older tradition? I'm ~30yo and I've been seeing both for as long as I remember.


In my experience, it was always upper case until around the time CSS was introduced. The article pictured has it in upper case and it's from 1968.


HTML was as well, before CSS became commonplace. I still reflexively do it on occasion when writing a root page (html/head/title/body tags).


Most hardware was upper-case-only until the '60s — being descended from upper-case-only accounting cards and/or teleprinters.


RFC 5952: A Recommendation for IPv6 Address Text Representation

says "The characters "a", "b", "c", "d", "e", and "f" in an IPv6 address MUST be represented in lowercase." [1]

One reason for this is that it avoids redundancy when you relay a hexadecimal number by phone.

"However, when a letter is spelled uppercase, people tend to specify that it is uppercase, which is unnecessary information." [2]

If you care about typography and think mixing lower-case letters with upper-case numbers looks off, you could argue that old-style numerals, a.k.a. lower-case numbers, could be a better fit.

[1]https://tools.ietf.org/html/rfc5952#section-4.3

[2]https://tools.ietf.org/html/rfc5952#section-3.3.3
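
As an aside, Python's standard `ipaddress` module prints IPv6 addresses in lowercase whatever case you feed it. A small sketch, using the example address from the Wikipedia page on IPv6 addresses typed in uppercase; I am only showing what the module prints, not claiming it implements every rule in the RFC.

```python
import ipaddress

# Uppercase input; the module normalizes the text form on output.
addr = ipaddress.IPv6Address("2001:0DB8:85A3:0000:0000:8A2E:0370:7334")

print(str(addr))      # 2001:db8:85a3::8a2e:370:7334
print(addr.exploded)  # 2001:0db8:85a3:0000:0000:8a2e:0370:7334
```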


Since the RFC came 42 years after the magazine article, and 27 years after I sold my first computer program (with data in hex), I'll just consider myself old fashioned.


I'll pronounce numbers like 0xF000 as "F thousand"


Ok, but I'm still pronouncing DEADF007 as "dead foot".


There's got to be some origin for those particular names - were they just lab partners or something? I need to know!


Last night I wrote to Robert Magnusson (the author of the 1968 article) to ask him. Hope he writes back!


Did you notice the line "A01C annty christeen". I thought 616 (268 base 16) was that number. We need to talk.


Wonder why they let “Ernest” into this scheme, with two syllables when all the others have one.


Wonder why they let "Seven" into this scheme, with two syllables when all the others have one.


I wrote to the author of the system last night to ask him the backstory behind the names. Maybe his colleagues, friends, siblings - I hope he writes me back to tell the story!


Didn't Malcolm Gladwell talk about how the Chinese names for numbers were closer to their place values and therefore less confusing for calculations?

e.g. "50" is "five-tens" instead of "fifty"


Esperanto uses a similar system:

5 is kvin and 10 is dek so 50 is kvindek. If you know how to count to 10, you know how to count to 99.

100 is cent, so 500 is kvincent. 505 is kvincent kvin and 555 is kvincent kvindek kvin. Alright, now you know how to count to 999.

A "number to Esperanto" is a great exercise to code and Esperanto is a lovely language to learn. Try it!
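
Here is one possible take on that exercise, as a minimal Python sketch limited to 0 through 999. The function name `esperanto` is made up; the only rule beyond the examples above is that 10 and 100 are plain "dek" and "cent" with no "unu" in front.

```python
ONES = ["nul", "unu", "du", "tri", "kvar", "kvin", "ses", "sep", "ok", "naŭ"]


def esperanto(n):
    """Spell an integer from 0 to 999 in Esperanto."""
    if not 0 <= n <= 999:
        raise ValueError("only 0 to 999 are supported in this sketch")
    if n == 0:
        return ONES[0]
    hundreds, rest = divmod(n, 100)
    tens, units = divmod(rest, 10)
    parts = []
    if hundreds:
        # 100 is just "cent"; 200 and up get the digit glued on: "ducent".
        parts.append(("" if hundreds == 1 else ONES[hundreds]) + "cent")
    if tens:
        # Same pattern for tens: "dek", "dudek", "tridek", ...
        parts.append(("" if tens == 1 else ONES[tens]) + "dek")
    if units:
        parts.append(ONES[units])
    return " ".join(parts)


print(esperanto(50))   # kvindek
print(esperanto(555))  # kvincent kvindek kvin
```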


There was a programme where primary-school kids were taught Esperanto as their first foreign language for a year, before switching to French for N years. Not only did those in the programme speak better French at the end than those who had learnt French for the whole N+1 years (which was the hypothesis being tested) but it was also found that many of the kids translated arithmetic problems into Esperanto before solving them and translating the answers back into English.


Yes, in Mandarin, the number 5 is wǔ (五), and the number 10 is shí (十). The number 50 is written as 五十 (I tend to read it as 5*10, or “five-tens” as you put it). I think it makes some simple math concepts easier to understand.


Que? "Fifty" is "five-tens", numerically, semantically, and etymologically.


Now that you've written it out I can tell that the "-ty" of "fifty" was probably shortened from some older pronunciation more resembling the word "ten". But I guarantee you that native English speakers are not thinking this in large numbers. This is the sort of insight that comes better when it's not your native language, others of us will just internalize "fifty" as its own word.

I guess it's also no coincidence that words ending in -teen are frequently confused with and misheard for those ending in -ty.

Edit: Seems like they derive from different Germanic roots for 10.


It's nowhere near as clear as the Chinese version, which is literally five tens, though. To know fifty is 50 and five tens you have to already know both the meaning of number places and what fifty means; it's really hard to derive from just the name or from seeing 50 written out.


Well, I'm not a native speaker. All I can say is it was crystal clear to me when I began learning English at age ten.


It's not super hidden but you also already understood numbers at that age too presumably so you already had most of the puzzle. As a native speaker I don't think I ever broke the words down like that or had them broken down to me like that (then again this would have been ages ago so maybe it was and I don't remember).


As a native speaker I'm with you. Always seemed pretty obvious to me too, especially above 5 with six-ty seven-ty etc. To be fair though, Chinese says/writes 15 as ten-five which is much more obvious than the difference between five-ty and five-teen. It's interesting in spoken English the emphasis flips between first to second syllable though.


You tell us!


There was a time when people programmed in hex and punched in code using a strictly hex keyboard. I suppose this is what it was for; even in the '90s, the Heathkits were hex-only.


This appears to confuse the representation of a number, with the quantity it represents.


wondering if anybody gets the hanky code reference in the tweet.


Well, a lot of San Francisco did, anyway!


betty oh-bet


this is so impractical


this is cursed.



