Four Column ASCII (garbagecollected.org)
569 points by nishs on Feb 1, 2017 | hide | past | favorite | 68 comments

One bit of trivia I really love is why "DEL" is at 127 -- weirdly far away from all the other control codes.

It's because 0x7F is all ASCII bits set to "1". Back in the early punch card (and telegraph?) days, if there was a typo, you couldn't "unpunch" a hole to make it a 0 again, but you could punch out all the rest of them, indicating "ignore this char, I've deleted it" -- 0b1111111.

source: http://www.trafficways.org/ascii/ascii.pdf which is a really neat read if you like that sort of thing :)
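A toy Python sketch of that rubout trick (my own illustration, not from the PDF):

```python
# On paper tape you can punch new holes (set bits) but never unpunch one.
# "Deleting" a character meant punching out all seven holes, which turns
# any code into 0b1111111 = 0x7F = DEL, and readers skipped it.
def rubout(code: int) -> int:
    """Punch every hole over an already-punched character."""
    return code | 0b1111111

assert rubout(ord('A')) == 0x7F == 127
assert rubout(0) == 127          # even a blank column becomes DEL
```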

In particular, paper tape.

ASCII was very carefully designed.

One feature I like is that you can neatly cut out a 64-character 6-bit all-uppercase encoding (like DEC's) if you don't need lowercase.

Wikipedia covers some of the design choices here: https://en.wikipedia.org/wiki/ASCII#Internal_organization

The 1963 ASCII standard also lists many of the design considerations.
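A sketch of how such a 64-character cut works, patterned after DEC's SIXBIT, where codes 0x20-0x5F map onto 0-63 (my illustration, not something from the standard):

```python
# Subtracting 0x20 from the middle of the ASCII table yields a 6-bit code
# that keeps space, digits, punctuation, and uppercase -- lowercase is gone.
def to_sixbit(ch: str) -> int:
    code = ord(ch.upper())           # fold lowercase into uppercase first
    if not 0x20 <= code <= 0x5F:
        raise ValueError("not representable in 6 bits")
    return code - 0x20

def from_sixbit(code: int) -> str:
    return chr(code + 0x20)

assert to_sixbit('A') == 0x21
assert from_sixbit(to_sixbit('z')) == 'Z'    # lowercase folds to uppercase
```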


  > One feature I like is that you can neatly cut out a 64-character 6-bit all-uppercase encoding (like DEC's) if you don't need lowercase.
And if you take the digits from the 0x3X column and the non-digits from the 0x2X column (flipping one bit), you get 0123456789*+,-./ which is enough to write numbers and the basic arithmetic operators. This too was deliberate.
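That deliberate pairing is easy to verify in Python: the 0x2X and 0x3X columns differ only in bit 4 (0x10), the bit that shift flipped on bit-paired keyboards.

```python
# Each digit is one bit-flip away from its 0x2X-column partner:
for digit, shifted in zip("0123456789", " !\"#$%&'()"):
    assert ord(digit) ^ 0x10 == ord(shifted)

# Likewise the arithmetic operators sit one bit-flip away from ':;<=>?':
for op, pair in zip("*+,-./", ":;<=>?"):
    assert ord(op) ^ 0x10 == ord(pair)
```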

I thought the pattern was from 01000 to 11111 gives you ()*+,-./0123456789:;<=>? but I can see what you are talking about.

Yes, and on the keyboard shift+1 becomes !

The TRS-80s had the lovely feature that all of the punctuation above the digits were consistent with ASCII layout. shift-1 was !, as per usual, but also shift-2 was ", and ' was shift-7. On the Model I they even went so far as to make shift-0 produce a space! (On the Model III however, they reused shift-0 to toggle between uppercase and lowercase mode -- sort of a reverse caps-lock key.)

In about 1978 I designed my own keyboard. The unique feature it had was that the key positions were used as address lines into a ROM, and I was free to assign any character to any key I wanted. Shift/Ctrl weren't required to be at all related to the base character. Today we take it for granted, but back then it was radical.

How many keys were there? I suppose there must have been a lot more keys than your ROM had address lines.

My Japanese keyboard still has that punctuation ordering.

Here is a nice history of ASCII: https://github.com/ericfischer/ascii

Thank you!

For a more detailed look at the design decisions in ASCII and its predecessors, check out Character Coding Sets: History and Development by Charles Mackenzie, one of the best technical books I've ever read.

For sentimental reasons I still carry around my Teletype ASCII reference card which is laid out this way. For extra retro goodness it shows the bits as holes on a paper tape with the feed holes and everything.

When you have to design a device that does ASCII entirely mechanically, this is an efficient way to structure it. I would not be surprised to hear that mechanical considerations had an influence on the question of which bits went where.

Added: OK, I have managed to not entirely surprise myself:

* https://en.wikipedia.org/wiki/ASCII#Internal_organization

This seems to say that the influence came partly from mechanical typewriters, but that the Teletype Model 33 followed ASCII exactly.

> For sentimental reasons I still carry around my Teletype ASCII reference card which is laid out this way.

Your comment inspired some googling. I found this chart, which, although likely not the one you carry around, is pretty doggone cool too:


I found a collection of Teletype (teleprinter?) code cards:

* http://www.rtty.com/CODECARD/codecrd1.htm

Mine is the one designated "Model 33/35 ASCII".

Somebody want to pretty that up and put out a print-resolution (and/or well-designed web) version? I've always used http://www.asciitable.com/ -- not because it's well presented but because it's easy to remember. A 'better' ASCII table would be good for making this cool again, teaching the current generation some of the things they've missed.

Don't forget "man ascii". It's not laid out any better but might be more convenient for you.

Could you please take a photo of that card and post the image up somewhere for posterity?

There is more to it.

For example, note that the numerals map to their direct binary notation plus a 011 in front. 0 => ...0000, 1 => ...0001, 2 => ...0010, etc.

Now I wonder, why don't they start in the zero row? In other words, why is 0 = 0110000, instead of 0100000?

Why are the parens not in the same row as braces and brackets?

Why isn't "&" (a ligature of "et") in the same row as "e"? "$" ("dollar") is in the same row as "d".
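The digit mapping noted above (the binary value with 011 prefixed) is why the classic char-to-digit trick works; a quick Python sketch:

```python
# '0' is 0x30 = 0b0110000, so the low four bits of a digit character
# are its numeric value.
def digit_value(ch: str) -> int:
    assert '0' <= ch <= '9'
    return ord(ch) & 0x0F        # equivalent to ord(ch) - ord('0')

assert [digit_value(c) for c in "2017"] == [2, 0, 1, 7]
```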

The 1963 standard⁰ answers your questions. (Probably the final 1967 version does too, but it's not online anywhere.)

In this instance the more common 8-column presentation actually better reflects the design. It's important that the low 4 bits of '0' are 0. The 0x2X column and 0x3X column are related by shift (on some devices²).


¹ https://en.wikipedia.org/wiki/Binary-coded_decimal

² https://en.wikipedia.org/wiki/Bit-paired_keyboard

> Probably the final 1967 version does too, but it's not online anywhere.

Is this it?


Unfortunately that's just an article on the changes in the new version.

You have a typo in the code for zero; it doesn't end with 1. The reason 0 can't be 0100000 is that that's the start of the printable characters, and an established standard/requirement was that space collate below all other printable characters. So you have to start with space; you can't put 0 there.

> Why isn't "&" (ligature of "et") not in the same row as "e"? "$" ("dollar") is in the same row as "d".

It matches old mechanical typewriters. I have one with the shift-characters over the numbers being exactly what was 16 characters (one bit flip) away in the table.

And why isn't the octothorpe in the same row as "o"?

Oh, that's why ^J is a literal newline in Emacs. How did I miss that for 25 years?

(BTW, for anyone curious, you match a newline in an Emacs regexp by typing ^Q ^J: the first is the quote operator and the second is the character you want, ^J, i.e. newline.)

If you're wondering "why not ^Q then Enter?", that gets you CR or ^M, but in UNIX, newline or ^J is what gets you to the next line.

So, yes, you normally push CR to enter the LF character.

The confusions between CR and LF run deep and wide, even in the UNIX world.
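The relevant codes, spelled out in Python (a sketch of the arithmetic, not of any particular terminal driver):

```python
CR, LF = 0x0D, 0x0A              # carriage return, line feed

# Each control code is its letter with the high bits stripped:
assert ord('M') & 0x1F == CR     # ^M
assert ord('J') & 0x1F == LF     # ^J

# Unix text uses LF alone; DOS/Windows uses the CR LF pair:
assert "a\r\nb".replace("\r\n", "\n") == "a\nb"
```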

You will also see ^M at the end of each line in vim, if you open a CRLF-terminated file that vim thinks is LF-terminated.

> that's why ^J is a literal newline

So I don't quite understand how that table explains it -- I mean, aside from the fact that the J character code with the first two bits zeroed is the LF code.

It is explained under the chart for [ and Esc. CTRL zeroes out the first two bits of a character. So that is the explanation.
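In code, that zeroing is a simple mask (a sketch, not how any particular terminal implements it):

```python
def ctrl(ch: str) -> int:
    """Zero the top two bits of a 7-bit code, as the CTRL key did."""
    return ord(ch) & 0b0011111

assert ctrl('[') == 27           # ^[ is ESC
assert ctrl('J') == 10           # ^J is LF (newline)
assert ctrl('I') == 9            # ^I is TAB
```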

and ^I for tab, etc

What about breaking this table into two parts, 00000-01111 and 10000-11111?


It's a bit easier (for me) to read, if it's on one page. Also, it makes a bit more explicit that digits start at 0110000.

Very interesting, thanks. I never knew that key combinations on the terminal were actually just shortcuts to send specific control characters.

It's a bit confusing to me that the column header bits are added to the LEFT of the row identifiers. Might be helpful to report the row ids as "__00000" or similar.

This is very neat. I've done lots of work with binary text for hardware, but I would've never noticed this on my own. However, now that it's written out, it seems very straightforward.

This also explains the Ctrl-D used to end input e.g. when using cat from stdin:


It also expl^H^H^H^H^H^H

   man ascii
on most Linux systems shows a similar layout.

I wonder if 00-1F could be added to the summary, using the Unicode Control Pictures range for added irony.

https://en.wikipedia.org/wiki/Control_Pictures (␀ ␁ ␂ ␃ ␄ ␅ ␆ ␇ ␈ ␉ ␊ ␋ ␌ ␍ ␎ ␏ ␐ ␑ ␒ ␓ ␔ ␕ ␖ ␗ ␘ ␙ ␚ ␛ ␜ ␝ ␞ ␟)

I've sshed into Linux systems just to get that particular version of the man page. It's a shame that OS X doesn't have it.

Like the author, I had my mind blown when I realized that all the Ctrl+? keys were assigned their letters due to a single bit flip.

vim also does this; this is why nul is ^@.

If you don't have a Linux system at hand, you can get the manual page online: http://man7.org/linux/man-pages/man7/ascii.7.html

OSX doesn't have it? http://imgur.com/xFRNPbo

That's exactly what mine looks like too. Compare to the Linux version: http://man7.org/linux/man-pages/man7/ascii.7.html

The Linux version has two columns; ever wonder why escape (0x1b) is visualized as ^[? It's because it's a single bit off the actual character "[". All the control characters are; they're visualized by essentially flipping that single bit and displaying ^ plus the resulting letter. This was the point of TFA.
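Caret notation is just that bit flip run in reverse; a sketch of how a terminal might render a control code:

```python
def caret_notation(code: int) -> str:
    """Render a control code (0-31) the way terminals display it."""
    assert 0 <= code < 0x20
    return '^' + chr(code ^ 0x40)    # flip bit 6 to reach the printable twin

assert caret_notation(0x1B) == '^['  # ESC
assert caret_notation(0x00) == '^@'  # NUL
assert caret_notation(0x0A) == '^J'  # LF
```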

The author's four column layout also makes the 1-bit difference between upper and lower case ASCII readily apparent, which the Linux man page regrettably does not.
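That single-bit case difference is bit 5 (0x20), which is easy to check:

```python
# Upper and lower case differ only in bit 5 (0x20):
assert ord('a') - ord('A') == 0x20
assert chr(ord('Q') | 0x20) == 'q'   # set the bit: lowercase
assert chr(ord('q') & ~0x20) == 'Q'  # clear the bit: uppercase
```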

I assume the author meant "in a 32-character grouping"; however, man ascii(7) on my Linuxes shows it in 64-character groups, so... shrug

I've a few notes on the programmatic elegance of ASCII at:


If you want to geek-out further on ASCII have a look at the late Bob Bemer's site: https://www.bobbemer.com/ Bob is colloquially known as the "father of ASCII" (among other things) and his writing is fun to read and interesting.

I thought the original comment was a gem as well. For me, it made more sense to switch the 5 bit column to the right side, as seen here: http://pastebin.com/AeTXg1xe

Does this mean that ^; and ^{ are also equivalent to ESC?

^{ is, but on my terminal ^; just prints out a ;

iTerm2, in vim 1.8

^[ and ^{ do the same thing in Terminal.app (macOS Sierra) and vim 8.0. However, I get the bell sound (which generally denotes invalid input in macOS) for ^; and it prints nothing.

You have control characters for the characters from 64 to 95.

    Control-@ is 0.  
    Control-A is 1, through control-Z is 26.
    Control-[ is 27, escape.
    Control-\ is 28.
    Control-] is 29.
    Control-^ is 30.
    Control-_ is 31.

I guess it could in theory, on my keyboard, however, CTRL seems to override other control chars, so typing ^{ doesn't seem possible at all, at least without any hacks. (I don't have a US layout keyboard.)

According to the bit-wise AND at the end of the article it should, but it seems like there's more to it than just zeroing out the 'column' bits.

Not sure if this is specifically related, but-

CTRL does not actually modify the character code sent from the keyboard. For letters, the same keycode (which maps to ASCII with a constant addition of 0x3D) is always sent. Another byte in the HID report contains bit flags for modifier keys (L/R CTRL, SHIFT, etc); the OS decides what happens after that.

Note that this holds under USB HID.
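A sketch of what that boot-protocol report looks like (the byte layout is from the USB HID spec; the decoding function is my own illustration):

```python
# An 8-byte USB HID boot-protocol keyboard report:
#   byte 0:    modifier bits (bit 0 = Left Ctrl, bit 1 = Left Shift, ...)
#   byte 1:    reserved
#   bytes 2-7: up to six concurrently pressed keycodes
def decode_report(report: bytes):
    mods = report[0]
    keys = [k for k in report[2:8] if k != 0]
    return mods, keys

# HID keycodes for letters start at 0x04 for 'A', hence the 0x3D offset
# to uppercase ASCII mentioned above:
assert 0x04 + 0x3D == ord('A')

# Ctrl+C arrives as a modifier bit plus the plain 'C' keycode (0x06):
mods, keys = decode_report(bytes([0x01, 0, 0x06, 0, 0, 0, 0, 0]))
assert mods & 0x01 and keys == [0x06]   # Left Ctrl held, 'C' pressed
```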

Not on a modern PC, but on the machine ASCII was originally developed for I could definitely see the CTRL key just pulling one bit low on the keyboard encoder.

I've been working with ASCII for over 30 years and never realized this, smh.

The way this is laid out in a table, with properties suddenly lining up in rows and columns, reminds me of the Periodic Table in Chemistry.

Obviously it shows how the ASCII committee used the first two bits as control bits and the remaining bits as a mixture of control and data bits, but I’d never seen it displayed this way.

Really neat.

Funny you say that. Here's what I created for my personal use:


I think the horizontal layout is a lot more readable, especially with the ability to read the bitstring more-or-less left-to-right. Only thing is that it's pretty wide -- maybe too wide. The document that screenshot is from is meant to take up an A3 sheet split in half lengthwise. Someone willing to spend more time on it than I was would probably be able to come up with helpful notes for the bottom half, or shift things around so elements corresponding to the low bit in the upper nibble are nestled below the items where that bit is off.

I did the same thing.

Bug report, your table has an error at 1111110.

And then there's IBM's EBCDIC, of similar vintage to ASCII, but of markedly dissimilar utility.


EBCDIC seems elegant in its own way: http://www.quadibloc.com/comp/cardint.htm (scroll down) -- apparently it's descended from IBM punch card formats. The discontinuity in the alphabet seems inconvenient for sorting, but it looks like it shares some properties (like bit-flip to make lower case) with ASCII.

The digits and letters actually map quite nicely to punch cards. You can see how punches 0 - 9 map exactly to EBCDIC F0-F9. And if you check how the letters are coded on punch cards (1 - 9 plus one punch "above" in the zone and 0 rows) you can see how it maps exactly to EBCDIC C1-C9, D1-D9, E2-E9. Most other characters aren't coded quite as neatly; I don't know if there is a system to them.
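Those zones can be written out as a small sketch (digits and letters only; I'm leaving out the punctuation, which, as noted, is less systematic):

```python
# EBCDIC groups the alphabet into three punch-card "zones":
#   A-I -> 0xC1-0xC9, J-R -> 0xD1-0xD9, S-Z -> 0xE2-0xE9, 0-9 -> 0xF0-0xF9
def ebcdic_code(ch: str) -> int:
    if ch.isdigit():
        return 0xF0 + int(ch)
    n = ord(ch) - ord('A')
    if n < 9:                     # A-I
        return 0xC1 + n
    if n < 18:                    # J-R
        return 0xD1 + (n - 9)
    return 0xE2 + (n - 18)        # S-Z

assert ebcdic_code('A') == 0xC1
assert ebcdic_code('J') == 0xD1
assert ebcdic_code('S') == 0xE2
assert ebcdic_code('9') == 0xF9
```

Note the gaps between zones (C9 to D1, D9 to E2), which is the sorting discontinuity mentioned above.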

I never did the column layout, but I knew about the bit flips for control and shift from both Tom Scott's video on reading ASCII, and from reading about the Meta key.

An 8-column layout, however, makes it clear why the ASR-33 had "!" over "1".

Which makes it clear why the Apple I, II and II+ did the same.

The ASR-33 even had [ \ ] ^ _ available only via shift + K L M N O. This is called a bit-paired keyboard.

This also shows why on IRC, [\]^ are considered to be case-shifted versions of {|}~.
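That's RFC 1459's casemapping; a sketch of a lowercasing function built on the same bit-5 trick (real servers advertise their exact mapping via the CASEMAPPING token):

```python
# Under RFC 1459 casemapping, [ \ ] ^ are the "uppercase" forms of { | } ~,
# because each pair differs only in bit 5 (0x20), just like A-Z and a-z.
def irc_lower(s: str) -> str:
    # A-Z is 0x41-0x5A and [\]^ is 0x5B-0x5E, so one contiguous range
    # covers every character that lowercases by setting bit 5.
    return ''.join(chr(ord(c) | 0x20) if 'A' <= c <= '^' else c for c in s)

assert irc_lower('NICK[away]^') == 'nick{away}~'
```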

Nostalgically interesting. But is it useful today?

Thanks for this, just posted it at my desk
