
> What _is_ annoying about the UNISYS boxes is the 36 bit word format, though. Characters are stored in 9 bit quarterwords that map pretty awkwardly to bytes containing 8-bit ASCII. Binary data formats are essentially incompatible with anything.

This is why the FTP protocol has a byte size command. If all you have is 8-bit bytes then that seems strange. But at the time FTP was designed the most common machines on the ARPANET had 36-bit words (mostly PDP-10s and their derivatives) and bytes (the term was used in the more general sense) were just bit strings of 1-36 bits. 7-bit ascii was common (5 characters would fit in a word, like my username GUMBY), as were six bit bytes (pack six characters into a word). I never used 9-bit characters though arrays of nine-bit bytes were not unreasonable.

BTW the PDP-10 had 18-bit addresses so each word of memory held a Lisp cons; CAR, CDR, RPLACA etc were machine instructions. Gordon Bell and Alan Kotok designed the -10 (and its predecessor the PDP-6) with Lisp in mind. The first Lisp Machines.

> Binary data formats are essentially incompatible with anything.

Well, that's true today, but look at it the other way around: Unix was really developed for an 8/16-bit machine. It was a reimplementation of Multics, which ran on 36-bit machines (the GE 645 & Honeywell 6180) and was written in PL/1. Unix itself was famously first written for the PDP-7 (an 18-bit machine), but in assembly. The famous PDP-11 version was written in a BCPL derivative you might have heard of called "C", and, since PL/1's level of machine abstraction was still new, the derivative modeled the PDP-11 architecture. So nowadays all CPUs are C machines and C runs well on them. Probably the most common non-PDP-11-like machine most programmers will program these days is a GPU.






> 7-bit ascii was common (5 characters would fit in a word, like my username GUMBY), as were six bit bytes (pack six characters into a word)

There were a bunch of different six bit character encodings, often (though not always strictly correctly) called "BCD". The horror show of IBM's EBCDIC was an eight bit extension of one of these.

Then there was 5 bit Baudot code, and...

The last time I checked, many *nix systems will still assume that you're on a 5 bit Baudot (uppercase only) teletype (i.e., a genuine physical tty) if you attempt to log in using all uppercase in your user name.

Some systems hacked in more characters by having special "shift in" and "shift out" characters. If a "shift in" character appeared in the stream, the system would switch to the alternate character set until a "shift out" character was received.


  > … *nix systems will still assume that you're on a 5 bit Baudot (uppercase only) teletype …

Akshully that would be the original 1963 version of ASCII¹, which was a 7 bit code but did not include lower case. The Model 33 teletype² (one of the terminals used by UNIX developers³, and probably a contributing factor to two-character command names) was a 1963-ASCII device. Even after 1967 ASCII added lower case, popular low end video terminals⁴ did not include it so that they could get away with 6 bits' worth of printable character ROM.

¹ http://worldpowersystems.com/archives/codes/X3.4-1963/index....

² https://en.wikipedia.org/wiki/Teletype_Model_33

³ https://commons.wikimedia.org/wiki/File:Ken_Thompson_(sittin...

⁴ https://en.wikipedia.org/wiki/ADM-3A


There are also other interesting uses for odd word lengths. For example, many UARTs support word lengths from 5 to 9-ish bits (some do more). This is commonly used to implement out-of-band signalling for protocols running over them, e.g. using 9 bit words, where the ninth bit tells whether this is the start of a command frame. Even more handily, in most MCUs the two parts arrive already separated: there is a byte register with the lower 8 bits and a different register for the remaining bits.

Or the 9th bit can indicate that the rest of the byte is an address. Some PIC UARTs had an interrupt that triggered when the 9th bit was set.

The example C function in the article uses 8 instead of CHAR_BIT, which today seems not worth complaining about. I never knew there used to be so many machines with non-8-bit bytes.

> I never knew there used to be so many machines with non-8-bit bytes.

It's not like the 8 bit byte was the initial default and others were experiments.

FWIW I believe the 8-bit byte was an IBMism, and for whatever reason I don't remember IBM machines being particularly popular on the ARPANET, which was a research network.

Although its arrival well predated me, I do remember a conversation with someone in which we were surprised by how it was becoming common to see people assume that a byte was a fixed 8 bits. I think that was entirely due to the spread of the VAX.


Yes. The early situation was very different; see e.g. http://www.ed-thelen.org/comp-hist/BRL-III-B.html from 1955.

The IBM 360 (1964–) was the killer 8-bit-byte machine.


> CAR, CDR, RPLACA etc were machine instructions. Gordon Bell and Alan Kotok designed the -10 (and its predecessor the PDP-6) with Lisp in mind. The first Lisp Machines.

The Stanford PDP-6 apparently even had a CONS instruction at opcode 257 :)



