One bit of trivia I really love is why "DEL" is at 127 -- oddly far away from all the other control codes.
It's because 0x7F is all ASCII bits set to "1". Back in the early punch card (and telegraph?) days, if there was a typo, you couldn't "unpunch" a hole to make it a 0 again, but you could punch out all the rest of them, indicating "ignore this char, I've deleted it" -- 0b1111111.
source: http://www.trafficways.org/ascii/ascii.pdf which is a really neat read if you like that sort of thing :)
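A quick way to see the trick as a Python sketch (the variable names are mine): punching a hole sets a bit that can never be cleared, so ORing extra holes into any character can only move it toward 0x7F.

    DEL = 0b1111111  # 0x7F, decimal 127 -- all seven holes punched

    for ch in "A7$":
        code = ord(ch)
        # punching out the remaining holes ORs in the missing 1-bits,
        # and any 7-bit code ORed with all-ones is DEL
        assert code | DEL == DEL
        print(f"{ch!r} = {code:07b} -> punch the rest -> {DEL:07b} (DEL)")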
> One feature I like is that you can neatly cut out a 64-character 6-bit all-uppercase encoding (like DEC's) if you don't need lowercase.
And if you take the digits from the 0x3X column and the non-digits from the 0x2X column (flipping one bit), you get 0123456789*+,-./ which is enough to write numbers and the basic arithmetic operators. This too was deliberate.
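Both tricks are easy to check in a few lines of Python (a sketch; the helper name and the decode rule are mine, though the 0x20 offset is how DEC's SIXBIT actually worked):

    # The 6-bit cut: the 64 codes 0x20 (space) through 0x5F ('_') fit in
    # six bits once you subtract the 0x20 base -- essentially DEC SIXBIT.
    def to_sixbit(ch):
        code = ord(ch)
        assert 0x20 <= code <= 0x5F, "not representable in 6 bits"
        return code - 0x20

    # The numeric subset: digits are 0x30-0x39 and *+,-./ are 0x2A-0x2F,
    # so the low four bits are unique across all of "0123456789*+,-./".
    for ch in "0123456789*+,-./":
        nibble = ord(ch) & 0x0F
        assert ch == chr((0x30 if nibble <= 9 else 0x20) + nibble)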
The TRS-80s had the lovely feature that all of the punctuation above the digits was consistent with the ASCII layout: shift-1 was !, as usual, but also shift-2 was ", and ' was shift-7. On the Model I they even went so far as to make shift-0 produce a space! (On the Model III, however, they reused shift-0 to toggle between uppercase and lowercase mode -- sort of a reverse caps-lock key.)
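You can verify that behavior against the table itself -- shift on the digit row is just bit 0x10 (a Python sketch):

    SHIFT_BIT = 0x10  # the one bit separating the 0x2X and 0x3X columns

    for digit in "1234567890":
        print(f"shift-{digit} -> {chr(ord(digit) ^ SHIFT_BIT)!r}")
    # shift-1 -> '!', shift-2 -> '"', shift-7 -> "'", and shift-0 -> ' '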
In about 1978 I designed my own keyboard. The unique feature it had was that the key positions were used as address lines into a ROM, and I was free to assign any character to any key I wanted. Shift/Ctrl weren't required to be at all related to the base character. Today we take it for granted, but back then it was radical.
For a more detailed look at the design decisions in ASCII and its predecessors, check out Character Coding Sets: History and Development by Charles Mackenzie, one of the best technical books I've ever read.
For sentimental reasons I still carry around my Teletype ASCII reference card which is laid out this way. For extra retro goodness it shows the bits as holes on a paper tape with the feed holes and everything.
When you have to design a device that handles ASCII entirely mechanically, this is an efficient way to structure it. I would not be surprised to hear that mechanical considerations had an influence on the question of which bits went where.
Added: OK, I have managed to not entirely surprise myself:
Does somebody want to pretty that up and put out a print-resolution (and/or well-designed web) version? I've always used http://www.asciitable.com/ -- not because it's well presented but because it's easy to remember. A 'better' ASCII table would be good for making this cool again and teaching the current generation some of the things they've missed.
The 1963 standard⁰ answers your questions. (Probably the final 1967 version does too, but it's not online anywhere.)
In this instance the more common 8-column presentation actually better reflects the design. It's important that the low 4 bits of '0' are 0. The 0x2X column and 0x3X column are related by shift (on some devices²).
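That property is why digit conversion is a single mask; a quick sketch:

    # Since the low four bits of '0' are zero, a digit's value is just
    # its low nibble -- the classic c - '0' is equally a 4-bit mask.
    for ch in "0123456789":
        assert ord(ch) & 0x0F == int(ch)
    print(f"{ord('0'):07b}")  # 0110000 -- note the low four bits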
You have a typo in the code for zero: it doesn't end with 1. The reason 0 can't be 0100000 is that 0100000 starts the printable characters, and an established standard/requirement was that space collate below all other printable characters. So you have to start with space; you can't put 0 there.
> Why isn't "&" (a ligature of "et") in the same row as "e"? "$" ("dollar") is in the same row as "d".
It matches old mechanical typewriters. I have one where the shift characters over the numbers are exactly what sits 16 codes (one bit flip) away in the table.
Oh, that's why ^J is a literal newline in Emacs. How did I miss that for 25 years?
(BTW, for anyone curious, you match a literal newline in a regexp in Emacs by typing ^Q ^J: the first is the quote (quoted-insert) command, and the second is the character you want -- ^J, i.e. newline.)
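The underlying convention is simple: the Ctrl key keeps only the low five bits of the character. A sketch (the function name is mine):

    def ctrl(key):
        # Ctrl strips the high bits, leaving the 0x00-0x1F control range
        return chr(ord(key) & 0x1F)

    assert ctrl('J') == '\n'    # ^J is a literal newline
    assert ctrl('I') == '\t'    # ^I is a tab
    assert ctrl('[') == '\x1b'  # ^[ is escape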
Very interesting, thanks. I never knew that key combinations on the terminal were actually just shortcuts to send specific control characters.
It's a bit confusing to me that the column header bits are added to the LEFT of the row identifiers. Might be helpful to report the row ids as "__00000" or similar.
This is very neat. I've done lots of work with binary text for hardware, but I would never have noticed this on my own. However, now that it's written out, it seems very straightforward.
The Linux version has two columns. Ever wonder why escape (0x1b) is visualized as ^[? Because it's a single bit off the actual character "[". All the control characters are: they're visualized by essentially flipping a single bit and displaying ^ plus that letter. This was the point of TFA.
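Caret notation is that same trick run backwards: flip bit 0x40 to turn a control code back into a printable character. A sketch of how a terminal might render it (names mine):

    def caret(code):
        # control codes (0x00-0x1F) and DEL display as '^' plus the
        # character one 0x40-flip away; everything else prints as-is
        if code < 0x20 or code == 0x7F:
            return '^' + chr(code ^ 0x40)
        return chr(code)

    assert caret(0x1B) == '^['  # escape is one bit away from '['
    assert caret(0x7F) == '^?'  # DEL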
The author's four column layout also makes the 1-bit difference between upper and lower case ASCII readily apparent, which the Linux man page regrettably does not.
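That 1-bit difference is bit 0x20, which is why the bitwise case tricks work (for letters only):

    for upper in "ABCXYZ":
        assert chr(ord(upper) | 0x20) == upper.lower()   # set 0x20 -> lower
        assert chr(ord(upper.lower()) & ~0x20) == upper  # clear it -> upper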
If you want to geek-out further on ASCII have a look at the late Bob Bemer's site: https://www.bobbemer.com/ Bob is colloquially known as the "father of ASCII" (among other things) and his writing is fun to read and interesting.
I thought the original comment was a gem as well. For me, it made more sense to switch the 5 bit column to the right side, as seen here: http://pastebin.com/AeTXg1xe
^[ and ^{ do the same thing in Terminal.app (macOS Sierra) and vim 8.0. However, I get the bell sound (which generally denotes invalid input in macOS) for ^; and it prints nothing.
I guess it could in theory. On my keyboard, however, CTRL seems to override other control chars, so typing ^{ doesn't seem possible at all, at least without hacks. (I don't have a US-layout keyboard.)
CTRL does not actually modify the character code sent from the keyboard. For letters, the same keycode (which maps to ASCII with a constant addition of 0x3D) is always sent. Another byte in the HID report contains bit flags for modifier keys (L/R CTRL, SHIFT, etc); the OS decides what happens after that.
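For the curious, a sketch of what a boot-protocol HID keyboard report looks like (the parsing helper and its names are mine; the 8-byte layout and the a-z usage range 0x04-0x1D are from the HID spec):

    LEFT_CTRL, LEFT_SHIFT = 0x01, 0x02  # bit flags in the modifier byte

    def describe(report):
        mods = report[0]                      # byte 0: modifier bitmask
        keys = [k for k in report[2:8] if k]  # bytes 2-7: up to six keys
        # letter usages run a-z = 0x04-0x1D; +0x3D lands on uppercase ASCII
        letters = [chr(k + 0x3D) for k in keys if 0x04 <= k <= 0x1D]
        return bool(mods & LEFT_CTRL), letters

    # Ctrl+C vs plain C: the key's usage ID (0x06) is identical,
    # only the modifier byte differs
    print(describe(bytes([LEFT_CTRL, 0, 0x06, 0, 0, 0, 0, 0])))  # (True, ['C'])
    print(describe(bytes([0,         0, 0x06, 0, 0, 0, 0, 0])))  # (False, ['C'])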
Not on a modern PC, but on the machine ASCII was originally developed for I could definitely see the CTRL key just pulling one bit low on the keyboard encoder.
The way this is laid out in a table, with properties suddenly lining up in rows and columns, reminds me of the Periodic Table in chemistry.
Obviously it shows how the ASCII committee used the first two bits as control bits and the remaining bits as a mixture of control and data bits, but I’d never seen it displayed this way.
I think the horizontal layout is a lot more readable, especially with the ability to read the bitstring more-or-less left-to-right. The only thing is that it's pretty wide -- maybe too wide. The document that screenshot is from is meant to take up an A3 sheet split in half lengthwise. Someone willing to spend more time on it than I was would probably be able to come up with helpful notes for the bottom half, or shift things around so elements corresponding to the low bit in the upper nibble are nestled below the items where that bit is off.
EBCDIC seems elegant in its own way: http://www.quadibloc.com/comp/cardint.htm (scroll down) -- apparently it's descended from IBM punch card formats. The discontinuity in the alphabet seems inconvenient for sorting, but it looks like it shares some properties (like bit-flip to make lower case) with ASCII.
The digits and letters actually map quite nicely to punch cards. You can see how punches 0-9 map exactly to EBCDIC F0-F9. And if you check how the letters are coded on punch cards (1-9 plus one punch "above" in the zone and 0 rows), you can see how that maps exactly to EBCDIC C1-C9, D1-D9, E2-E9. Most other characters aren't coded quite as neatly; I don't know if there is a system to them.
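A sketch of that mapping (my own encoding of the rule: the zone punch picks the high nibble, the digit punch is the low nibble):

    ZONE_TO_HIGH = {12: 0xC0, 11: 0xD0, 0: 0xE0, None: 0xF0}

    def ebcdic(zone, digit):
        # zone punch -> high nibble, digit punch -> low nibble
        return ZONE_TO_HIGH[zone] | digit

    assert ebcdic(None, 7) == 0xF7  # '7': a lone punch in row 7
    assert ebcdic(12, 1) == 0xC1    # 'A': zone 12 + digit 1
    assert ebcdic(11, 9) == 0xD9    # 'R': zone 11 + digit 9
    assert ebcdic(0, 2) == 0xE2     # 'S': zone 0 + digit 2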
I never did the column layout, but I knew about the bit flips for control and shift from both Tom Scott's video on reading ASCII, and from reading about the Meta key.