
I always thought it was a shame the ascii table is rarely shown in columns (or rows) of 32, as it makes a lot of this quite obvious. eg, http://pastebin.com/cdaga5i1

It becomes immediately obvious why, eg, ^[ becomes escape. Or that the alphabet is just 40h + the ordinal position of the letter (or 60h for lower-case). Or that we shift between upper & lower-case with a single bit.
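For anyone who wants to see the layout without following the link, here is a minimal sketch that prints the table as four columns of 32. It is an illustration only and is not claimed to match the pastebin; the helper names (`label`, `ascii_rows`, `render`) are made up for this example.

```python
# Standard abbreviations for the control codes 0-31.
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def label(code):
    """Return a printable label for a 7-bit ASCII code."""
    if code < 32:
        return CONTROL_NAMES[code]
    if code == 32:
        return "SP"
    if code == 127:
        return "DEL"
    return chr(code)


def ascii_rows():
    """Yield 32 rows; each row holds the four codes that share the same low 5 bits."""
    for low in range(32):
        yield [label((group << 5) | low) for group in range(4)]


def render():
    """Build the table as text: one header line, then one line per 5-bit value."""
    header = " " * 6 + " ".join(f"{group:02b}".ljust(4) for group in range(4))
    lines = [header]
    for low, row in enumerate(ascii_rows()):
        lines.append(f"{low:05b} " + " ".join(cell.ljust(4) for cell in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render())
```

Reading across the row for 11011 gives ESC, `;`, `[` and `{`, which is the ^[ relationship in one line.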

esr's rendering of the table - forcing it to fit hexadecimal as eight groups of 4 bits, rather than four groups of 5 bits, makes the relationship between ^I and tab, or ^[ and escape, nearly invisible.

It's like making the periodic table 16 elements wide because we're partial to hex, and then wondering why no-one can spot the relationships anymore.

The 4-bit columns were actually meaningful in the design of ASCII. The original influence was https://en.wikipedia.org/wiki/Binary-coded_decimal and one of the major design choices involved which column should contain the decimal digits. ASCII was very carefully designed; essentially no character has its code by accident. Everything had a reason, although some of those reasons are long obsolete. For instance, '9' is followed by ':' and ';' because those two were considered the most expendable for base-12 numeric processing, where they could be substituted by '10' and '11' characters. (Why base 12? Shillings.)

The original 1963 version of ASCII covers some of this; a scan is available online. See also The Evolution of Character Codes, 1874-1968 by Eric Fischer, also easily found.

I stumbled across the history of the ASCII "delete" character recently: It's character 127, which means it's 1111111 in binary. On paper tape, that translates into 7 holes, meaning any other character can be "deleted" on the tape by punching out its remaining holes.
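The paper-tape trick is just a bitwise OR: holes can be added to a row but never filled back in. A minimal sketch, with `overpunch` being a made-up name for illustration:

```python
DEL = 0b1111111  # 127: all seven holes punched


def overpunch(code):
    """Punch every remaining hole in a 7-bit row.

    Punching can only set bits, never clear them, so the result is
    the original code OR-ed with all ones.
    """
    return code | DEL


# Whatever was on the tape before, the row now reads as DEL.
assert all(overpunch(code) == DEL for code in range(128))
```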

(It's also the only non-control ASCII character that can't be typed on an English keyboard, so it's good for creating Wi-Fi passwords that your kid can't trivially steal.)

> It's also the only non-control ASCII character that can't be typed on an English keyboard

Don't count on it. There's a fairly long standing convention in some countries with some keyboard layouts that Control+Backspace is DEL. This is the case for Microsoft Windows' UK Extended layout, for example.

    [C:\]inkey Press Control+Backspace %%i & echo %@ascii[%i]
    Press Control+Backspace⌂
This is also the case for the UK keyboard maps on FreeBSD/TrueOS. (For syscons/vt at least. X11 is a different ballgame, and the nosh user-space virtual terminal subsystem has the DEC VT programmable backspace key mechanism.)

It's actually easier to add two spaces at both ends of the password :)

Wow, I never knew it was an actual character.

Sure, think of it this way: you're sitting at a terminal connected to a mainframe and press the "X" key; what bits get sent over the wire? The ones corresponding to that letter on the ASCII chart.

Now replace "X" with "Delete".

(too late for me to edit; took me a while to find online)

Another good source on the design of ASCII is Inside ASCII by Bob Bemer, one of the committee members, in three parts in Interface Age May through July 1978.




That Fischer paper does look interesting - Thanks!

I do understand that I've probably simplified "how I understand it" vs "how/why it was designed that way". This is pretty much intentional - I try to find patterns to things to help me remember them, rather than to explain any intent.

Yeah, there's not much 4-bit-ness that's an aid to understanding what it is today. One is that NUL, space, and '0' all have the low 4 bits zero because they're all in some sense ‘nothing’.
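One place the 4-bit view still shows through is the digits: they sit at 30h through 39h, so the low four bits of a digit character are its numeric value. A small sketch of that and of the "nothing" observation; the function names are invented for this example:

```python
def low_nibble(ch):
    """Return the low four bits of a character's code."""
    return ord(ch) & 0x0F


def digit_value(ch):
    """Numeric value of an ASCII digit, read straight from its low four bits."""
    if not "0" <= ch <= "9":
        raise ValueError(f"not an ASCII digit: {ch!r}")
    return low_nibble(ch)


# NUL (00h), space (20h) and '0' (30h) all end in 0000.
assert [low_nibble(c) for c in ("\x00", " ", "0")] == [0, 0, 0]
```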

I started programming BASIC and assembly at 10 years old on a Vic-20, so I don't qualify as a wizened Unix graybeard, but I've still had plenty of cause to look up the ASCII codes, and I've never seen the chart laid out that way. Brilliant.

  > on a Vic-20
Which, weirdly, used the long-obsolete ASCII characters of 1963–1967, with '↑' and '←' in place of '^' and '_'.

PETSCII was a thing throughout the 8-bit Commodore line of products. It was based on the 1963 standard, but added various drawing primitives. I spent a lot of time drawing things with PETSCII for the BBS I ran from my bedroom.


Going on my deep (and probably fallible) memory; I remember seeing the ASCII set laid out like this on an Amoeba OS man page (circa 1990).


Had one as well, how did asm work?

>"It becomes immediately obvious why, eg, ^[ becomes escape. Or that the alphabet is just 40h + the ordinal position of the letter (or 60h for lower-case). Or that we shift between upper & lower-case with a single bit."

I am not following, can you explain why ^[ becomes escape. Or that the alphabet is just 40h + the ordinal position? Can you elaborate? I feel like I am missing the elegance you are pointing out.

If you look at each byte as being 2 bits of 'group' and 5 bits of 'character';

    00 11011 is Escape
    10 11011 is [
So when we do ctrl+[ for escape (eg, in old ansi 'escape sequences', or in more recent discussions about the vim escape key on the 'touchbar' macbooks) - you're asking for the character 11011 ([) out of the control (00) set.

Any time you see a carriage return (\r) represented as ^M, it's the same thing - 01101 (M) in the control (00) set is Carriage Return.
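In code, "pick the same character out of the control set" amounts to keeping only the low 5 bits. A minimal sketch (the name `ctrl` is just for illustration, and it models the bit pattern, not any particular keyboard driver):

```python
def ctrl(ch):
    """Return the control code on the same row as ch.

    Keeping the low 5 bits forces the two 'group' bits to 00,
    which selects the control set.
    """
    return ord(ch) & 0b0011111


assert ctrl("[") == 0b0011011  # ESC, 1Bh
assert ctrl("M") == 0b0001101  # Carriage Return, 0Dh
```

Because the mask also drops the upper/lower-case bit, `ctrl("m")` and `ctrl("M")` give the same result.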

Likewise, when you realise that the relationship between upper-case and lower-case is just the same character from sets 10 & 11, it becomes obvious that you can, eg, translate upper case to lower case by just doing a bitwise or against 32 (0100000).
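A quick sketch of the single-bit case shift. The bit in question is 0100000 (20h); the helper names are made up, and the functions are only meaningful for the letters A-Z and a-z:

```python
CASE_BIT = 0b0100000  # 20h: the only bit that differs between 'A' and 'a'


def to_lower(ch):
    """Set the case bit (bitwise OR). Assumes ch is an ASCII letter."""
    return chr(ord(ch) | CASE_BIT)


def to_upper(ch):
    """Clear the case bit (bitwise AND with its complement). Assumes ch is an ASCII letter."""
    return chr(ord(ch) & ~CASE_BIT)


def swap_case(ch):
    """Flip the case bit (bitwise XOR). Assumes ch is an ASCII letter."""
    return chr(ord(ch) ^ CASE_BIT)


assert to_lower("A") == "a"
assert to_upper("z") == "Z"
assert swap_case("q") == "Q"
```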

And 40h & 60h .. having a nice round number for the offset mostly just means you can 'read' ascii from binary by only paying attention to the last 5 bits. A is 1 (00001), Z is 26 (11010), leaving us something we can more comfortably manipulate in our heads.
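The "read the last 5 bits" trick, sketched under the assumption that the input is a plain ASCII letter; `alphabet_position` and `letter` are illustrative names only:

```python
def alphabet_position(ch):
    """Return 1 for A/a through 26 for Z/z by reading the low 5 bits."""
    if not (ch.isascii() and ch.isalpha()):
        raise ValueError(f"not an ASCII letter: {ch!r}")
    return ord(ch) & 0b11111


def letter(position, lower=False):
    """Inverse: add the position to 40h (upper case) or 60h (lower case)."""
    if not 1 <= position <= 26:
        raise ValueError("position must be in 1..26")
    return chr((0x60 if lower else 0x40) + position)


assert alphabet_position("A") == 1
assert alphabet_position("Z") == 26
assert letter(1) == "A"
```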

I won't claim any of this is useful. But in the context of understanding why the ascii table looks the way it does, I do find four sets of 32 makes it much simpler in my head. I find it much easier to remember that A=65 (41h) and a=97 (61h) when I'm simply visualizing that A is the 1st character of the uppercase(40h) or lowercase(60h) set.

This single comment has cleared up so much magic voodoo. I feel like everything fell into place a little more cleanly, and that the world makes a little bit more sense.

Thank you!

I can't believe I've only just realised where the Control key gets its name from. Thank you!

The article linked mentions that the ctrl key (back then?) just clears the top 3 bits of the octet.

Awesome, yes this makes total sense. I'm glad I asked. Cheers.

Basically, that modifier keys are just flags/mask e.g. ESC is 00011011, [ is 01011011. CTRL just unsets the second MSB and shifts the column without changing the row.

Physically it might have been as simple as a press-open switch on the original hardware, each bit would be a circuit which the key would connect, the SHIFT and CONTROL keys would force specific circuits open or closed.

if you press a letter and control, it generates the control character in the left-hand column.

the letters in the third column are A = 1, B = 2 etc: 40h + the position in the alphabet.

Awesome to see ^@ as null and laying it out this way makes it easier to see ^L (form-feed, as the article says: control-L will clear your terminal screen), ^G (bell), ^D, ^C etc etc

This is so that control characters (and shifted characters — see https://en.wikipedia.org/wiki/Bit-paired_keyboard) could be generated electromechanically. Remember a teletype of the era (e.g. Model 33) has no processing power.

There's a longer explanation on Wikipedia: https://en.wikipedia.org/wiki/Caret_notation
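Going the other way, from a code to its caret form, is a one-bit flip: XOR with 40h. That also covers DEL, which comes out as ^?. A minimal sketch, with `caret` as a made-up helper name:

```python
def caret(code):
    """Render a 7-bit ASCII code in caret notation.

    Control codes (0-31) and DEL (127) become '^' followed by the
    character whose code differs only in the 40h bit; everything
    else is returned as the character itself.
    """
    if code < 32 or code == 127:
        return "^" + chr(code ^ 0x40)
    return chr(code)


assert caret(27) == "^["
assert caret(127) == "^?"
```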

ESC is on the same row as [, just in another column. So Ctrl ends up being a modifier just like Shift, in that it changes column but not row.

The 40h offset is 2 columns' worth.

I made and printed out a nicely-formatted table that's adorning my office wall right now, for when I was trying to debug some terminal issues a while back (App UART->Telnet->Terminal is an interesting pipeline[1]), because I was frustrated with the readability of the tables I could quickly find online, and they didn't have the caret notation that so many terminal apps still use (quick, what's the ASCII value for ^[ and what's its name?[2]).

Cool story bro, I know, but I meant to put the file online in response here, but I can't find the source doc anymore >_< Edit: actually, I found an old incomplete version as an Apple Numbers file. If there's interest I can whip it back up into shape and post it as PDF.

[1] For example, when a unix C program outputs "\n", it's the terminal device between the program and the user TTY that translates it into \r\n. You can control that behavior with stty. I know this is something ESR would laugh at being novel to me. On bare-metal, you have no terminal device between your program and the UART output, so you need to add those \r\n yourself.

[2] That's ESC "escape" at decimal 26/hex 1B, and you can generate it in a terminal by pressing the Escape key or Ctrl-[
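To make footnote [1] concrete: the translation in question is the tty output option usually called ONLCR (toggled with `stty onlcr` / `stty -onlcr`). On bare metal you would do the equivalent yourself before writing to the UART. A rough sketch of that mapping, assuming a byte buffer in and out; `onlcr` here is just an illustrative function, not a real driver API:

```python
def onlcr(data: bytes) -> bytes:
    """Map every LF to CR LF, roughly what a tty does on output with ONLCR set.

    Like the tty option, this does not check whether a CR is already
    present, so b"\\r\\n" becomes b"\\r\\r\\n".
    """
    return data.replace(b"\n", b"\r\n")


assert onlcr(b"hello\n") == b"hello\r\n"
```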

Consider just taking a photo of the table you found.

It's entirely possible that someone reading this thread will be able to source it.

But it's just a text table, right? It should be fairly trivial to reproduce it from a decent picture.

By the way, thanks for clarifying the existence and purpose of the now-in-kernel "terminal device." I've understood the Linux PTS mechanism and am aware of the Unix98 pty thing and all of that, but identifying it like that helps me mentally map it better.

The name that you need to know is line discipline.

Right... and that's the collective name for the terminal device's settings. I see, thanks!

Minor nitpick: isn't ESC decimal 27? I distinctly remember comparing char codes to 27 in Turbo Pascal to detect when user pressed Esc key... :)

LOL you're right. I guess I still managed to misread my table ^_^;

The 8x16 layout is more compact and fits on a single screen or book page, so it became the standard. You're absolutely right that the 4x32 layout makes the relationships more obvious. But once you've learned those relationships, you've lost the incentive to change the table.

Oh, so that's why C-m and C-j make a line break in Emacs, and C-g will make a sound if you press it one extra time.

^V will put DOS (and cmd.exe) into "next character is literal" mode, so "echo ^V^G" will make a beep. I think you needed ANSI.SYS loaded for it to work though (?).

Speaking of DOS, I've never forgotten that BEL is ASCII 7, so ALT+007 (ALT+7?) will happily insert it into things like text editors. I remember it showed up as a •. I'm not quite sure why.

Amazing. That is a thing of beauty. Can't believe I've never seen it like that before.

Thanks. With a bit of deduction from your post, I just figured out why ^H is backspace.

And ^W deletes words.

Both shortcuts work in terminal emulators.

(As an aside ^W is much easier to input^Wtype as a "fake backspace" thingamajig^H^H^H^H^H^H^H^H^H^H^Hmnemonic than ^H is.)

I didn't realize that ^W did words (and I didn't know what "ETB" was either). But that's a useful one to know!
