Possible reasons for 8-bit bytes (jvns.ca)
197 points by cpach on March 7, 2023 | 118 comments



I worked on the UIUC PLATO system in the 1970s: CDC 6600 and 7600 CPUs with 60-bit words. Back then everything used magnetic core memory and that memory was unbelievably expensive! Sewn together by women in Southeast Asia, maybe $1 per word!

Having 6-bit bytes on a CDC was a terrific PITA! The byte size was a tradeoff between saving MONEY (RAM) and the hassle of shift codes (070) used to get uppercase letters and rare symbols! Once semiconductor memory began to be available (2M words of 'ECS' - "extended core storage" - actually semiconductor memory - was added to our 1M byte memory in ~1978) computer architects could afford to burn the extra 2 bits in every word to make programming easier...

At about the same time microprocessors like the 8008 were starting to take off (1975). If the basic data unit could not hold a 0-100 value it would be virtually useless! There was only one microprocessor that DID NOT use the 8-bit byte, and that was the 12-bit Intersil 6100, which copied the PDP-8 instruction set!

Also, the invention of double-precision floating point made 32-bit floating point okay. From the 40s till the 70s the most critical decision in computer architecture was the size of the floating-point word: 36, 48, 52, 60 bits... 32 bits on its own is clearly inadequate, but the idea that the FPU could handle both 32- AND 64-bit words made 32-bit floating point acceptable.

Also, in the early 1970s text processing took off, partly from the invention of ASCII (1963), partly from 8-bit microprocessors, and partly from a little-known OS whose fundamental idea was that characters should be the only unit of I/O (Unix, father of Linux).

So why do we have 8-bit bytes? Thank you, Gordon Moore!


I worked on the later CDC Cyber 170/180 machines, and yeah, there was a C compiler (two, in fact). 60-bit words, 18-bit addresses and index registers, and the choice of 5-bit or 12-bit chars. The highly extended CDC Pascal dialect papered over more of this weirdness and was much less torturous to use. The Algol compiler was interesting as well.

The 180 introduced a somewhat less wild, certainly more C friendly, 64-bit arch revision.

> There was only 1 microprocessor that DID NOT use the 8-bit byte

Toshiba had a 12-bit single-chip processor at one time that I'm pretty sure you could make a similar claim about. More of a microcontroller for automotive than a general-purpose processor, though.


Yup - we bought 1.5Mb of core for our B6700 for ~US$1.25M (in 1976 dollars). As a 48(+4)-bit machine, a word held either six 8-bit bytes or eight 6-bit bytes. In practice ASCII had 7-bit characters and EBCDIC 7-8 bit ones. Teletypes (6-bit) were still around.

From the beginning right up to the birth of microprocessors, computer architects were still experimenting around these tradeoffs - around the late 70s it all jelled into 8-bit bytes and power-of-2 word sizes.

I think that 8 bits is a nice round number if you move to a memory model that is an array of bytes and an array of words - making everything a power of 2 in size makes the hardware simpler (much the same reason why BCD and decimal arithmetic have largely disappeared).


Having said that, in the 90s I was once on a design team that built a CPU with eight 9-bit bytes in a word and an 81-bit instruction (nine 9-bit bytes).

It was a DSP; memory was still relatively expensive and MPEG needed 9-bit delta frames. However, DRAMs with byte parity were available, so it was a price point that made sense at the time.


The core dumps of these systems must have been heavy.


I used a DECSYSTEM-20: it had variable "byte" size. For ASCII you could (and normally did) pack five 7-bit bytes into its 36-bit words. There were instructions to read and write bytes of any size you wanted out of these words.

The C compiler used 9-bit bytes, just to make pointer arithmetic simpler.
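
For anyone who hasn't touched a 36-bit machine, here's a minimal C sketch of the five-characters-per-word packing described above, using a uint64_t to stand in for the 36-bit word. The helper names are made up, and the layout assumed is the usual DEC convention (leftmost character in the high bits, lowest bit left unused):

  #include <stdint.h>
  #include <stdio.h>

  /* Five 7-bit ASCII characters per 36-bit word, leftmost character in
     the high bits, lowest bit unused; a uint64_t stands in for the word. */
  static uint64_t pack5(const char *s) {
      uint64_t word = 0;
      for (int i = 0; i < 5; i++)
          word |= (uint64_t)(s[i] & 0x7F) << (29 - 7 * i);
      return word;
  }

  static char unpack1(uint64_t word, int i) {
      return (char)((word >> (29 - 7 * i)) & 0x7F);
  }

  int main(void) {
      uint64_t w = pack5("HELLO");
      printf("packed word: %012llo (octal)\n", (unsigned long long)w);
      for (int i = 0; i < 5; i++) putchar(unpack1(w, i));
      putchar('\n');
      return 0;
  }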


One of the most entertaining visual expositions of how magnetic core memory works is this little 14-min overview of the Saturn V memory system, "How did NASA steer the Saturn V":

https://youtu.be/dI-JW2UIAG0


> The word size is some multiple of the byte size. I’ve been confused about this for years, and the Wikipedia definition is incredibly vague

It is extremely confusing. A "word" means different things depending on architecture, programming language, and probably a few other things that make up the context.

For example, in the x86 assembly world you usually call a (usually aligned) 16-bit value a "word", a 32-bit value a "double word", and a 64-bit value a "quad word", all because the original 8086 had 16-bit-wide registers that could alternatively be used as two 8-bit registers.

In other architectures or more generally other contexts, a word often relates to "how wide the bus is". But even then some of the old nomenclature bleeds into newer things that may have gotten wider busses, and it's not even always clear what bus is being referred to.

Basically, unless you operate within a certain shared context, the word "word" is fuzzy and ambiguous. Maybe not quite as much as the word "object", but it's definitely up there somewhere.


Yes, that's correct. The word "word" is ambiguous and has a few meanings. Where I'm from, the "word" is the unit you can load/store to memory in a single instruction. So if you can load a 32-bit quantity, that's your word size. The byte is the address granularity, while the word is the access granularity.

Because of backwards compatibility, and because CPUs are far more flexible about loading more memory sizes than they traditionally have been, the meaning of "word" doesn't really matter anymore. In the context of Win32 and x86 programming, a WORD is 16 bits, a DWORD ("double word") is 32 bits, and a QWORD ("quad word") is 64 bits. That's most likely where you'll see it these days.
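
Just to make the convention concrete, here's a rough sketch of those fixed-size names in C. This only mirrors the sizes; the real Windows headers define WORD and DWORD via different underlying types:

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Sketch only: the point is that these sizes are fixed by convention,
     not tied to the machine's "natural" word. */
  typedef uint16_t WORD;   /* 16 bits, regardless of register width */
  typedef uint32_t DWORD;  /* "double word" = 32 bits */
  typedef uint64_t QWORD;  /* "quad word"  = 64 bits */

  static_assert(sizeof(WORD) == 2 && sizeof(DWORD) == 4 && sizeof(QWORD) == 8,
                "Win32-style sizes are fixed, independent of the native word");

  int main(void) {
      printf("WORD=%zu DWORD=%zu QWORD=%zu bytes\n",
             sizeof(WORD), sizeof(DWORD), sizeof(QWORD));
      return 0;
  }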


The reason why word size is so confusing is that we started sharing architectures (partially or fully) across multiple CPU designs, baking in the word size of whatever CPU design came first. And even architectural word size has fallen out of modern use.

CPU word size for individual CPU designs is much easier to define. It's not really the bus size, though the two were strongly correlated until caches and bus multipliers became a thing (for example, the PowerPC 601 has 32-bit words and 32-bit registers despite having a 64-bit bus).

Word size is the native operation size: the size of integer you should use in your code for the best performance. Which is why the 68000 has a 16-bit word size despite having 32-bit registers - instructions for 32-bit operations took a few cycles longer.


On the other hand, I haven't seen a dword referring to anything other than 32 bits, nor a qword, 64. There's also oword, which is 128 bits.


PowerPC, ARM, AArch64, RISC-V and MIPS all use dword to refer to 64bit data.

And AArch64 also uses qword to refer to 128 bit data types.


Unfortunately, in Windows on ARM, a dword is still 32 bits.


That's a Windows issue. When they say dword, they actually mean 32-bit. It's just a type alias.

I assume you will find the same issue on the Windows NT ports to MIPS and PowerPC.

Where you see dword as 64-bit on those platforms is in their documentation and assembly syntax. Most software in higher-level languages (that isn't Windows) avoided naming its type aliases WORD, DWORD and QWORD.


TIL that word/dword/qword comes from x86 assembly. I always thought these were invented by Microsoft as part of win16 and then carried into win32 for source-level backwards compatibility.


Binary coded decimal makes perfect sense if you're going to output the value to a succession of 7-segment displays (such as in a calculator). You would have to do that conversion in hardware anyway. A single repeated circuit mapping 4 bits to 7 segments gets you the rest of the way to readable output. Now that I think about it, it's surprising ASCII wasn't designed around ease of translation to segmented displays.
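
As a sketch of that "single repeated circuit", here's the conventional BCD-digit-to-segment lookup in C. The a..g segment bit assignments are the usual ones, but the exact values depend on how the display is wired:

  #include <stdint.h>
  #include <stdio.h>

  /* Bit 0 = segment 'a' ... bit 6 = segment 'g', active high. The same
     small table/circuit is replicated once per displayed digit. */
  static const uint8_t bcd_to_7seg[10] = {
      0x3F, 0x06, 0x5B, 0x4F, 0x66,   /* 0..4 */
      0x6D, 0x7D, 0x07, 0x7F, 0x6F    /* 5..9 */
  };

  int main(void) {
      uint8_t packed = 0x42;                 /* two BCD digits in one byte */
      int hi = packed >> 4, lo = packed & 0x0F;
      printf("digit %d -> segments 0x%02X, digit %d -> segments 0x%02X\n",
             hi, bcd_to_7seg[hi], lo, bcd_to_7seg[lo]);
      return 0;
  }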


EBCDIC 1963/64 (i.e. E-BCD-IC) was an extension of BCD to support characters.

[0] https://en.wikipedia.org/wiki/EBCDIC


370 Assembly had a machine operation for converting an EBCDIC-encoded digit string into an internal integer, which seemed odd at the time, but the more I think about it (converting digit strings to integers is a surprisingly subtle operation), the more it makes sense to do this at the lowest level possible.

The other interesting thing about EBCDIC was that character codes with A–F in the lower nibble of the code were generally unused or reserved for uncommon characters. This restriction didn’t apply to the upper nibble (most notably the digits were encoded as F0–F9).


That's not unusual for a CISC instruction set; even x86 has rudimentary decimal (BCD and ASCII) arithmetic support.

Modern z/Architecture (64-bit 360/370/390 successor) has instructions to convert numbers between EBCDIC, ASCII, Unicode, packed decimal, zoned decimal, and binary, and to convert Unicode character strings between UTF-8, UTF-16, and UTF-32.

In addition to conversion instructions, z/Architecture has full arithmetic support for binary and decimal integer operations and binary, decimal, and hexadecimal floating-point.

POWER6 and later instruction sets also include full support for decimal floating-point.


I assumed that packed decimal operations were already supported in old processors. E.g. COBOL had the 9(7)V9(2) implied-decimal-point style of money format that was well supported. Or do you mean all operations, and not only conversion and basic + - * /?


"A program running on z/OS and the zSeries mainframe can run with 24-, 31-, or 64-bit addressing (and can switch among these if needed)."

https://www.ibm.com/docs/en/cics-ts/5.4?topic=basics-24-bit-...


> Now that I think about it, its surprising ASCII wasn't designed around ease of translation to segmented display.

Wikipedia has a section on the design considerations of ASCII: https://en.wikipedia.org/wiki/ASCII#Design_considerations


I love that there's a fractal world down there; from the fact that the digits 0-9 start with bit pattern 0011 followed by their value in binary, to make conversion to/from BCD easy, to the control codes Start Message and End Message being positioned to maximise the Hamming distance between them so they're maximally different and least likely to be misinterpreted as each other in case of bits being mixed up, to 7-bit ASCII on 8-bit tape drives leaving room for a parity bit for each character, to lowercase and uppercase letters differing only by the toggling of a single bit, to some of the digit/shift-symbol pairings dating back to the first typewriter with a shift key in 1878...
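
Two of those properties are easy to see from C (a quick illustration, nothing more):

  #include <stdio.h>

  /* Digits carry their BCD value in the low nibble, and upper/lower case
     differ only in bit 5 (0x20). */
  int main(void) {
      for (char c = '0'; c <= '9'; c++)
          printf("'%c' = 0x%02X, low nibble = %d\n", c, c, c & 0x0F);
      printf("'A' ^ 0x20 = '%c', 'a' ^ 0x20 = '%c'\n", 'A' ^ 0x20, 'a' ^ 0x20);
      return 0;
  }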


And my personal favorite: DEL is 1111111 so it could be punched over any other character to "delete" it from a paper tape.


In the 8-bit variant you could also interpret it as the Morse code letter ........, which is an early form of non-printable character known as a "procedure sign" (aka "prosign"), and is often transcribed as letters without the inter-letter spacing (e.g. <SOS> is the transcription of ...---...). The prosign <eeeeeeee> means the sender has mis-keyed something and you should ignore the previous character.


The Wikipedia section is pretty good now. There are a few notable omissions. The backslash was added so that logical ∧ and ∨ could be written as /\ and \/ - two operators for the price of one. The characters : and ; were considered the least important punctuation, so they were placed immediately after 9, where they could be replaced by single-character 10 and 11 for shillings. And the characters *+,-./ were placed in rows 10-15 so that they could easily be combined with the digits to make a 16-character subset suitable for writing numbers and basic arithmetic.


Maybe because ASCII is from the early 1960s and 7-segment displays didn't become widespread until 15 years or so later.


The author doesn't mention that several of those machines with 36-bit words had byte instructions allowing you to point at a particular byte (your choice as to width, from 1-36 bits wide) and/or to stride through memory byte by byte (so an array of 3-bit fields was as easy to manipulate as any other size).

Also, the ones I used to program (PDP-6/10/20) had an 18-bit address space, which you may note means a CONS cell (two 18-bit addresses) fits exactly in one 36-bit word. In fact the PDP-6 (first installed in 1964) was designed with LISP in mind and several of its common instructions were LISP primitives (like CAR and CDR).


Even more so, 6-bit characters were often used (supporting upper case only) in order to squeeze six characters into a word. Great for filenames and user IDs. And for text files, 7 bits were enough to get upper and lower case and all the symbols, and you could pack five characters into a word. What could be better?


BCD was popular for a long time for values that had to be constantly printed in decimal, since doing repeated division by 10 could be slow on systems that had no wide multiplication and no barrel shifter (remember that integer division by a constant is just a wide multiply and shift). By storing in BCD you can skip the division when you go to print the value out, at the cost of making arithmetic on the value a bit more expensive. Even some video games of the 16-bit era used BCD for stuff that had to be displayed in the UI.
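
For reference, the multiply-and-shift form of n/10 that modern compilers emit looks roughly like this in C (a sketch; the magic constant is approximately 2^35/10, and the trick is only cheap when you have a wide multiplier):

  #include <stdint.h>
  #include <stdio.h>

  /* n / 10 as a wide multiply by a precomputed reciprocal plus a shift. */
  static uint32_t div10(uint32_t n) {
      return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
  }

  int main(void) {
      for (uint32_t n = 0; n < 1000000; n++)
          if (div10(n) != n / 10) { printf("mismatch at %u\n", n); return 1; }
      puts("div10(n) == n/10 over the tested range");
      return 0;
  }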


Knuth's MIX & MIXAL used a kind of BCD. A really strange design decision, perhaps a product of its time. A PITA to emulate.


For those who are confused about bytes vs. words:

The formal definition of a byte is that it's the smallest addressable unit of memory. Think of a memory as a linear string of bits. A memory address points to a specific group of bits (say, 8 of them). If you add 1 to the address, the new address points to the group of bits immediately after the first group. The size of those bit groups is 1 byte.

In modern usage, "byte" has come to mean "a group of 8 bits", even in situations where there is no memory addressing. This is due to the overwhelming dominance of systems with 8-bit bytes. Another term for a group of 8 bits is "octet", which is used in e.g. the TCP standard.

Words are a bit fuzzier. One way to think of a word is that it's the largest number of bits acted on in a single operation without any special handling. The word size is typically the size of a CPU register or memory bus. x86 is a little weird with its register addressing, but if you look at an ARM Cortex-M you will see that its general-purpose CPU registers are 32 bits wide. There are instructions for working on smaller or larger units of data, but if you just do a generic MOV, LDR (load), or ADD instruction, you will act on 32 register bits. This is what it means for 32 bits to be the "natural" unit of data. So we say that an ARM Cortex-M is a 32-bit CPU, even though there are a few instructions that modify 64 bits (two registers) at once.

Some of the fuzziness in the definition comes from the fact that the sizes of the CPU registers, address space, and physical address bus can all be different. The original AMD64 CPUs had 64-bit registers, implemented a 48-bit address space, and brought out 40 address lines. x86-64 CPUs now have 256-bit SIMD instructions. "32-bit" and "64-bit" were also used as marketing terms, with the definitions stretched accordingly.

What it comes down to is that "word" is a very old term that is no longer quite as useful for describing CPUs. But memories also have word sizes, and here there is a concrete definition. The word size of a memory is the number of bits you can read or write at once -- that is, the number of data lines brought out from the memory IC.

(Note that a memory "word" is technically also a "byte" from the memory's point of view -- it's both the natural unit of data and the smallest addressable unit of data. CPU bytes are split out from the memory word by the memory bus or the CPU itself. Since computers are all about running software, we take the CPU's perspective when talking about byte size.)
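
For what it's worth, the only byte-related thing C pins down is CHAR_BIT, the width of the smallest addressable unit, which must be at least 8 but not exactly 8. A quick check (on the 16-bit-char DSPs mentioned elsewhere in this thread, the first line would print 16):

  #include <limits.h>
  #include <stdio.h>

  int main(void) {
      printf("bits per byte (CHAR_BIT): %d\n", CHAR_BIT);
      printf("sizeof(int):   %zu bytes\n", sizeof(int));
      printf("sizeof(void*): %zu bytes\n", sizeof(void *));
      return 0;
  }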


In the Microsoft world, “word” generally means 16 bits, because their usage dates back to the 16 bit era. Other sizes are double words and quad words

In the ARM ARM, a word is 32 bits, because that was the Arm’s original word size. Other sizes are half words and double words.

It is a very context-sensitive term.


FWIW, those conventions come from Intel originally; Microsoft took them from there. ARM borrowed from VAX Unix conventions, which got them from DEC.


>In the Microsoft world, “word” generally means 16 bits, because their usage dates back to the 16 bit era. Other sizes are double words and quad words

Ah, yes. That terminology is still used in the Windows registry, although Windows 10 seems to be limited to DWORD and QWORD. Probably dates back to the 286 or earlier. :-)


It's not entirely historically accurate. Early machines were "word addressable" (where the word wasn't 8 bits), which by your definition should have been called "byte addressable".

There were even bit-addressable computers, but they didn't catch on :)

If it wasn't for text, there would be nothing "natural" about an 8-bit byte (but powers-of-two are natural in binary computers).


> There were even bit addressable computers, but it didn't catch on

Cortex-M has bit addressability as a vendor option. They're mapped onto virtual bytes in a region of memory so the core doesn't have to handle them in a special way.
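
For the curious, the standard (and optional) Cortex-M3/M4 SRAM bit-band mapping works out to a simple address formula. A hedged sketch, assuming the usual base addresses:

  #include <stdint.h>

  /* Each bit of SRAM starting at 0x20000000 is mirrored as a whole 32-bit
     word in the alias region at 0x22000000; writing 0/1 to the alias word
     clears/sets that single bit. */
  static volatile uint32_t *bitband_alias(uint32_t byte_addr, uint32_t bit) {
      uint32_t offset = byte_addr - 0x20000000u;
      return (volatile uint32_t *)(0x22000000u + offset * 32u + bit * 4u);
  }

  /* e.g. set bit 3 of the (hypothetical) byte at 0x20000100:
     *bitband_alias(0x20000100u, 3) = 1; */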


"byte" has become so synonymous with 8 bits that I like to use the term "addressable unit".

I found this LLVM forum thread discussing it interesting: https://groups.google.com/g/llvm-dev/c/s2yuELeQMA8

And related PR https://reviews.llvm.org/D61725


I like how the author describes their thought process for how they made their best guess when they couldn't find a definitive answer, without simply assuming something or (worse) stating something as "fact" without evidence.


Yes, it is a really refreshing style of writing.


The reason to have a distinction between bits and bytes in the first place is so that you can have a unit of addressing that is different from the smallest unit of information.

But what would we lose if we just got rid of the notion of bytes and just let every bit be addressable?

To start, we'd still be able to fit the entire address space into a 64-bit pointer. The maximum address space would merely be reduced from 16 exabytes to 2 exabytes.

I presume there's some efficiency reason why we can't address bits in the first place. How much does that still apply? I admit, I'd just rather live in a world where I don't have to think about alignment or padding ever again. :P


There are a couple of efficiency reasons besides the simple fact that every piece of hardware in existence operates on data sizes that are multiples of the byte. To start off with, it would be fantastically inefficient to build a CPU that could load arbitrary bit locations, so you would either be restricted to loading memory locations that are some reasonable fraction of the internal cache line or pay a massive performance penalty to load a bit address. Realistically, what would you gain by doing this when the CPU would have to divide every location by eight (or some other fraction) to figure out which cache line it needs to load?

The article touches on this, but having your addressable unit fit a single character is incredibly convenient. If you are manipulating text you will never worry about single bits in isolation. Ditto for mathematical operations: do you really have much need for standalone values smaller than a byte (0-255)? It is a lot more convenient to think about memory locations as some reasonable unit that covers 99% of your computing use cases.
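
The divide-by-eight step mentioned above is trivial in itself (it's just a shift), but it has to happen on every single access. A sketch of the bookkeeping a bit pointer implies:

  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
      uint64_t bit_addr  = 1234567;          /* hypothetical bit address   */
      uint64_t byte_addr = bit_addr >> 3;    /* which byte (divide by 8)   */
      unsigned bit       = bit_addr & 7;     /* which bit within that byte */
      uint64_t line      = byte_addr >> 6;   /* which 64-byte cache line   */
      printf("byte %llu, bit %u, cache line %llu\n",
             (unsigned long long)byte_addr, bit, (unsigned long long)line);
      return 0;
  }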


> There are a couple of efficiency reason besides the simple fact that every piece of hardware in existence operates on data sizes in powers of the byte. To start off with it would be fantastically inefficient to build a cpu that could load arbitrary bit locations so you would either be restricted to loading memory locations that are some reasonable fraction of the internal cache line or pay a massive performance penalty to load a bit address.

The Intel iAPX 432 did use bit-aligned instructions:

> https://en.wikipedia.org/w/index.php?title=Intel_iAPX_432&ol...


The TMS340 family used bit addresses, but pointers were 32 bits.

https://en.wikipedia.org/wiki/TMS34010


64 bits of addressing is actually much more than most (any?) actually-existing processors have, for the simple reason that there is little demand for processors that can address 16 exabytes of memory and all those address lines still cost money.


More to the point, storing the pointers cost memory. Switching from 32-bit to 64-bit effectively halved the caches for pointer-rich programs. AMD64 was a win largely due to all the things they did to compensate (including doubling the number of registers).


Computing of the '60s grew out of the field of telecommunications [0][1].

When ASCII was released in 1963[3], integrated circuits didn't exist.

Computing hardware was a build-it-from-available-off-the-shelf-parts endeavor.

It was easy to repurpose readily available 4-wire circuits[2] to operate 4-wire relays (2 relays -> 8 bits).

-----

[0] http://ed-thelen.org/comp-hist/Reckoners-ch-4.html

[1] http://quadibloc.com/comp/cardint.htm

[2] https://en.wikipedia.org/wiki/Four-wire_circuit

[3] https://www.historyofinformation.com/detail.php?id=803


One thing nobody talks about is that packaging technology back then was expensive.

4 bits fit into a 16-pin DIP well and cascade well. It is no coincidence that the 4004 operates on 4-bit units (BCD was also a factor). The 8008 needed an 18-pin DIP and was still far too constrained.

So, your choice of unit is likely a multiple of 4.


I thought 8008 was a 28-pin DIP and 8080 a 40-pin DIP ...


Nope, 8008 was an 18-pin DIP.

The 8080 was indeed a 40-pin DIP. However, it still needed something like 3 other support chips to demultiplex everything.

It wasn't until the 8085 that you didn't need so much support circuitry.


ML might benefit a lot from 10-bit bytes. Accelerators have a separate memory space from the CPU after all, and have their own HBM DRAM as close as possible to the dies. In exchange, you could get a decent exponent size on a float10 that might not kill your gradients when training a model.


There seems to be no consensus yet on the best math primitives for ML.

People have invented new ones for ML (e.g. the Brain Float16), but even then some people have demonstrated training on int8 or even int4.

There isn't even consensus on how to map the state space onto the number line - is linear (as in ints) or exponential (as in floats) better? Perhaps some entirely new mapping?

And obviously there could be different optimal number systems for different ML applications or different phases of training or inference.



The transition started before C: EBCDIC was 8 bits and ASCII was essentially a byte encoding. Unless you were designing some exotic hardware you probably needed to handle text, and that meant an eight-bit byte. One motivation for the C type system was to extend the B programming language to support ASCII characters.


> DNS has a class field which has 5 possible values (“internet”, “chaos”, “hesiod”, “none”, and “any”).

I haven't thought about the possible values for that "class" field in a long time. It does seem interesting and extremely weird from today's perspective.


Wonder why they went with AAAA for IPv6 instead of a different class.


Because no one ever used CLASS, so it was poorly supported/understood.


Speculation: 7 bits are required to represent 0..100, and 7 is Just Weird, so you add an 8th to be able to do -100..100 in one addressable unit. (Or use the 8th bit to tag a variable-length numeric value.)

4 bits for 0..9 as in BCD is probably more convincing.


This was three or four jobs ago, but I remember reviewing someone's C code and they kept different collections of char* and int* pointers where they could have used a single collection of void* and the handler code would have been a lot simpler.

The justification was that on this particular platform, char* pointers were differently structured to int* pointers, because char* pointers had to reference a single byte and int* pointers didn't.

EDIT - I appear to have cut this story short. See my response to "wyldfire" for the rest. Sorry for causing confusion.


> because char* pointers had to reference a single byte and int* pointers didn't.

I must be missing some context or you have a typo. Probably most architectures I've ever worked with had `int *` refer to a register/word-sized value, and I've not yet worked with an architecture that had single-byte registers.

Decades ago I worked on a codebase that used void * everywhere and rampant casting of pointer types to and fro. It was a total nightmare - the compiler was completely out of the loop and runtime was the only place to find your bugs.


There are architectures where all you have is word addressing to memory. If you want to get a specific byte out, you need to retrieve it and shift/mask yourself. In turn, a pointer to a byte is a software construct rather than something there's actual direct architectural support for.
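
On such a machine a "char pointer" ends up being a word address plus a byte index carried in software, with the shift and mask spelled out on every access. A rough C sketch (names are illustrative):

  #include <stdint.h>
  #include <stdio.h>

  struct byte_ptr { uint32_t *word; unsigned byte; /* 0..3 within the word */ };

  static uint8_t load_byte(struct byte_ptr p) {
      return (uint8_t)((*p.word >> (8 * p.byte)) & 0xFF);
  }

  static void store_byte(struct byte_ptr p, uint8_t v) {
      uint32_t w = *p.word;
      w &= ~(0xFFu << (8 * p.byte));       /* clear the target byte */
      w |= (uint32_t)v << (8 * p.byte);    /* insert the new value  */
      *p.word = w;
  }

  int main(void) {
      uint32_t mem[1] = { 0x44434241 };            /* four chars in one word */
      struct byte_ptr p = { &mem[0], 2 };
      printf("byte 2 = 0x%02X\n", load_byte(p));   /* 0x43 */
      store_byte(p, 0xFF);
      printf("word   = 0x%08X\n", mem[0]);         /* 0x44FF4241 */
      return 0;
  }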


x86 is byte addressable, but internally, the x86 memory bus is word addressable. So an x86 CPU does the shift/mask process you're referring to internally. Which means it's actually slower to access (for example) a 32-bit value that is not aligned to a 4-byte boundary.

C/C++ compilers often by default add extra bytes if necessary to make sure everything's aligned. So if you have struct X { int a; char b; int c; char d; } and struct Y { int a; int b; char c; char d; } actually X takes up more memory than Y, because X needs 6 extra bytes to align the int fields to 32-bit boundaries (or 14 bytes to align to a 64-bit boundary) while Y only needs 2 bytes (or 6 bytes for 64-bit).

Meaning you can sometimes save significant amounts of memory in a C/C++ program by re-ordering struct fields [1].

[1] http://www.catb.org/esr/structure-packing/
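
You can check the effect directly; on a typical ABI with 4-byte, 4-byte-aligned ints this prints 16 for X and 12 for Y (exact numbers are ABI-dependent):

  #include <stdio.h>

  struct X { int a; char b; int c; char d; };   /* 6 bytes of padding here */
  struct Y { int a; int b; char c; char d; };   /* only 2 bytes of padding */

  int main(void) {
      printf("sizeof(struct X) = %zu\n", sizeof(struct X));
      printf("sizeof(struct Y) = %zu\n", sizeof(struct Y));
      return 0;
  }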


> internally, the x86 memory bus is word addressable

The 'memory bus' is not architectural. Different microarchitectures implement things differently, but most high-performance microarchitectures these days have relatively efficient misaligned accesses.


> The 'memory bus' is not architectural

Whether we classify the issue as "architectural" (whatever that means) is beside the point. Alignment has real effects on performance, and being aware of those effects is practically useful for working programmers.

I do agree with you about one thing: The unaligned access penalty is probably less on the x86 CPU's of the 2020's than it was, say, on a 486 from the mid-1990's.


Sure, unaligned access to memory is always expensive (on architectures that allow it at all).

But I'm talking about retrieving the 9th to 16th bit of a word, which is a little different. x86 does this just fine/quickly, because bytes are addressable.


Do C compilers for those platforms transparently implement this for your char pointers as GP suggests? I would expect that you would need to do it manually and that native C pointers would only address the same words as the machine itself.


> Do C compilers for those platforms transparently implement this for your char pointers as GP suggests?

Yes. Lots of little microcontrollers and older big machines have this "feature" and C compilers fix it for you.

There are nightmarish microcontrollers with Harvard architectures and standards-compliant C compilers that fix this up all behind the scenes for you. E.g. the 8051 is ubiquitous, and it has a Harvard architecture: there are separate buses/instructions to access program memory and normal data memory. The program memory is only word addressable, and the data memory is byte addressable.

So, a "pointer" in many C environments for 8051 says what bus the data is on and stashes in other bits what the byte address is, if applicable. And dereferencing the pointer involves a whole lot of conditional operations.

Then there's things like the PDP-10, where there's hardware support for doing fancy things with byte pointers, but the pointers still have a different format than word pointers (e.g. they stash the byte offset in the high bits, not the low bits).

The C standards makes relatively few demands upon pointers so that you can do interesting things if necessary for an architecture.


Depends on how helpful the compiler is. This particular compiler had an option to switch off adding in bit shifting code when reading characters and instead set CHAR_BIT to 32, meaning strings would have each character taking up 32 bits of space. (So many zero bits, but already handles emojis.)


I have seen a DSP that could address only 16-bit words. And the C compiler did not fix it; bytes had 16 bits there.


Yeah, this is what I have heard of and was expecting. Sibling comment says it's not universal -- some C compilers for these platforms emulate byte addressing.


I forget the details (long time ago) but char* and int* pointers had a different internal structure. The assembly generated by the compiler when code accessed a char* pointer was optimized for accessing single bytes and was very different to the code generated for an int* pointer.

Digging deeper, this particular microcontroller was tuned for accessing 32 bits at a time. Accessing individual bytes needed extra bit-shuffling code to be added by the compiler.


Sounds like M68K or something similar, although Alpha AXP had similar byte-level access issues. A compiler on either of those platforms would likely add a lot of fix-up code to deal with the fact that it has to load the aligned word (either 16-bit in the M68K case or 32-bit, IIRC, in the Alpha case) and then do bitwise ANDs and shifts depending on the pointer's lower bits.

Raymond's blog on the Alpha https://devblogs.microsoft.com/oldnewthing/20170816-00/?p=96...


M68k was byte addressable just fine. Early Alpha had that issue though, as did later Cray compilers. Alpha fixed it with BWX (byte-word extension). Early Cray compilers simply defined char as being 64 bits, but later added support for the shift/mask/thick-pointer scheme to pack 8 chars in a word.


Must have depended on the variant; the one we used in college would throw a GP fault for a misaligned access. It literally didn't have an A0 line. That said, it's been over 10 years and I could be remembering the very strict instruction alignment rules as applying to data too...


16-bit accesses had to be aligned. It didn't have an A0 because of the 16-bit pathway, but it did have byte-select lines (#UDS, #LDS) for when you'd move.b d0,ADDR, so that devices external to the CPU could see an 8-bit data access if that's what you were doing.


> char* and int* pointers had a different internal structure. The assembly generated by the compiler when code accessed a char* pointer was optimized for accessing single bytes and was very different to the code generated for an int* pointer.

But -- they are different. Architectures where they're treated the same are probably the exception. Depending on what you mean by "very different" - most architectures will emit different code for byte access versus word access.


Accessing a 32-bit word was a simple read op.

When accessing an 8-bit byte through a pointer, the compiler would insert extra assembly code into the generated object code. The "normal" part of the pointer would be read, loading four characters into a 32-bit register. Two extra bits were squirreled away somewhere in the pointer, and these would feed a shift instruction so the requested byte ended up in the least-significant 8 bits of the register. Finally, an AND instruction would clear the top 24 bits.
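
In other words, the "char pointer" was really a fat pointer. Roughly this, in C (field names are mine):

  #include <stdint.h>

  struct fat_char_ptr { uint32_t *word; unsigned sel; /* the two hidden bits */ };

  static uint8_t deref(struct fat_char_ptr p) {
      uint32_t w = *p.word;          /* one 32-bit read: four characters    */
      w >>= 8 * p.sel;               /* shift the wanted byte to the bottom */
      return (uint8_t)(w & 0xFF);    /* AND clears the upper 24 bits        */
  }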


> an architecture that had single-byte registers

Wild guess, but the OP might be talking about the Intel 8051. Single-byte registers, and depending on the C compiler (and there are a few of them) an 8-bit int* pointing to the first 128/256 bytes of memory, but up to 64K of (much slower) memory is supported in different memory spaces with different instructions and a 16-bit register called DPTR (and some implementations have 2 DPTR registers). C support for these additional spaces is mostly via compiler extensions analogous to, but different from, the old 8086 NEAR and FAR pointers. I'm obviously greatly simplifying and leaving out a ton of details.

Oh, yeah...on 8051 you need to support bit addressing as well, at least for the 16 bytes from 20h to 2Fh. It's an odd chip.


It is true that on at least some platforms an int* that is 4 byte aligned is faster to access than a pointer that is not aligned. I don’t know if there are platforms where int* is assumed to be 4-byte aligned, or if the C standard allows or disallows that, but it seems plausible that some compiler somewhere defaulted to assuming an int* is aligned. Some compilers might generate 2 load instructions for an unaligned load, which incurs extra latency even if your data is already in the cache line. These days usually you might use some kind of alignment directive to enforce these things, which works on any pointer type, but it does seem possible that the person’s code you reviewed wasn’t incorrect to assume there’s a difference between those pointer types, even if there was a better option.


In C++, char* pointers are special in that they are exempt from aliasing rules.


Because it’s a power of 2…

It’s embarrassing you’re even asking this. Obviously we wouldn’t use 11 bits per byte because that’s nonsense.

Why don’t we use 4 bits? Because that’s like saying you could only write 2 digits for decimal numbers and never more. That’s not enough digits for practical use. I don’t want to have to write every single number as 10 pairs of two digits when I could just use numbers of a practical length.


I’m kind of disappointed that embedded computing was not mentioned. It is the longest running use-case for resource constrained applications and there are cases where not only are you using 8-bit bytes but also an 8 bit CPU. BCD is still widely used in this case to encode data to 7 segment displays or just as data is relayed over the wire between chips.


I agree completely! See my answer up above. Only 7 or 8 bits makes sense for a microprocessor - it's not useful if you cannot store 0-100 in a byte! With ASCII (1963) becoming ubiquitous, the 8008 had to be 8 bits! Otherwise it would have been the 7007 lol ...


>It looks like the next important machine in 8-bit-byte history was the Intel 8008

How about the PDP-11? Both the 360 and the PDP-11 heavily influenced microprocessor architecture.


There were computers which used 4 and 6 bits as well. But the 8-bit byte was cheaper and worked for the time, so now we have 8 bits everywhere.


But what about 10-bit bytes* on a serial port?

https://en.wikipedia.org/wiki/8-N-1

* words


Those are normal 8-bit bytes wrapped in a mark-space (stop/start) pair so the hardware can tell where each 8-bit byte begins and ends. That mark-space pair is just a way of separating out the beginnings and endings of the 8-bit bytes in the bit stream.


Yes, that's what start / stop bits are. Though the parity bit is something else.


But the Wiki article was for '8N1' - that's why I didn't mention it.

If you add a parity bit (say '8E1'), that makes the total 9 bits. When you wrap that in the mark-space pair, that makes a total of 11 bits.

Then if you have two stop-bits, as in (say '8O2') you have 8 data bits, one parity bit, one start-bit, and two stop-bits. That makes 12 bits altogether for the single data byte sent serially.
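
The arithmetic generalizes: bits on the wire per data byte = start + data + parity + stop, which is also why 9600 baud with 8N1 moves only 960 bytes per second. A trivial sketch:

  #include <stdio.h>

  int main(void) {
      int start = 1, data = 8, parity = 0, stop = 1;      /* 8N1 */
      int frame = start + data + parity + stop;
      printf("8N1: %d bits/frame, %d bytes/s at 9600 baud\n", frame, 9600 / frame);
      parity = 1; stop = 2;                               /* 8O2 */
      printf("8O2: %d bits/frame\n", start + data + parity + stop);
      return 0;
  }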


BCD is more accurate within the limits of the 8 digits.

For instance a binary '3' is inexact, but a BCD '3' is exact.

That means that currency transactions tend to work better as long as you can restrict the number of significant digits to less than the number of digits used in the BCD software.

I used to use North Star BASIC back in the early 80s. North Star BASIC's default number of digits was 8, but they also supplied 10, 12 and 14 digit BASIC along with the default. You used whichever one was required for accuracy in your application.


> binary '3' is inexact

Huh? Do you mean '0.3'?


No, '3'. The difference between '300', '30', '3', '0.3', etc. is just the power of 10 attached to it.

I 'cut my teeth' on North Star BASIC's BCD floating-point representation of numbers ("1 + 2 = 3"), with all numbers, even integers, stored internally as 5-byte, 8-digit BCD floating-point values. It used to drive me nuts to use Lawrence Livermore Laboratories' BASIC, which used binary floating point. You'd get things like "1 + 2 = 2.99999998" (or similar, I'm working from a 40-years-ago memory here).

LATER UPDATE: Sorry, I omitted the words 'floating point'. No wonder you said 'huh?'. It should have read 'Binary floating point '3' is inexact'.


You're remembering wrong, unless they did something crazily non-standard. Whole numbers are represented perfectly in binary floating point. At least until you run out of digits, at which point they start rounding to even numbers, then to multiples of 4, etc.

There is one spot where binary floating point has trouble compared to decimal, and that's dividing by powers of 5 (or 10). If you divide by powers of 2 they both do well, and if you divide by any other number they both do badly. If you use whole numbers, they both do well.

Also even if you do want decimal, you don't want BCD. You want to use an encoding that stores 3 digits per 10 bits.


Binary is significantly more accurate for most numbers.

If you use 12 digit BCD to store 1/3, you get .333 333 333 333 000 000

If you use the same number of bits for binary, you get .333 333 333 333 333 925


> (“a word is the natural unit of data used by a particular processor design”) Apparently on x86 the word size is 16 bits, even though the registers are 64 bits.

That's true for the original x86 instruction set. IA-32 has a 32-bit word size and x86-64 has... you guessed it, 64.

16- and 32-bit registers are still retained for compatibility reasons (just look at the instruction set!).


It's extra fun because not only are the registers retained, they were only extended. So you can use their 16 and 32 bit names to refer to smaller sized parts of them.


Words seem to be always 16 bits for x86 and derivatives - see the data types section of the software developer manual.


I think "word" as a datatype evolved from its original and more general computing meaning of "the natural unit of data used by a processor design" (the thing we talk about with an "x-bit processor") to "16 bits" during the period of 16-bit dominance and the explosion of computing that was happening around it.

Essentially, enough stuff got written assuming that "word" was 16 bits (and double word, quad word, etc. had their obvious relationship) that even though the term had not previously been fixed, it would break the world to let it change, even as processors with larger word sizes (in the "natural unit of data" sense) became available, then popular, then dominant.


Some x86 categorizations would call those dwords and qwords respectively.


Maybe another vague reason: when PCs came about in the era of the 8008...8086s, 64K of RAM was like a high, reasonable amount. So you need 16-bit pointers, which require exactly 2 bytes.


Because humans have 10 fingers and 8 is the closest power-of-two to that.


The article points out that a power of two bit count is actually less important than many of us assume at first.


> why was BCD popular?

https://www.truenorthfloatingpoint.com/problem

Floating point arithmetic has its problems.

[1] Ariane 5 ROCKET, Flight 501

[2] Vancouver Stock Exchange

[3] PATRIOT MISSILE FAILURE

[4] The sinking of the Sleipner A offshore platform

[1] https://en.wikipedia.org/wiki/Ariane_flight_V88

[2] https://en.wikipedia.org/wiki/Vancouver_Stock_Exchange#Rounding_errors_on_its_Index_price

[3] https://www-users.cse.umn.edu/~arnold/disasters/patriot.html

[4] https://en.wikipedia.org/wiki/Sleipner_A#Collapse


The Ariane bug was an overflow casting 64-bit floating point to 16-bit integer. It would still have overflowed at the same point if it had been 64-bit decimal floating point using the same units. The integer part of the floating point number still wouldn't have fit in a signed 16-bit integer.

As per the provided link, the Patriot missile error was 24-bit fixed-point arithmetic, not floating point. Granted, a fixed-point representation in tenths of a second would have fixed this particular problem, as would have using a clock frequency that's a power of 1/2 (in Hz). And though using a base-10 representation would have prevented this rounding error, it would also have reduced the time before overflow.

I think IEEE 754r decimal floating point is a huge step forward. In particular, I think there was a huge missed opportunity when the open spreadsheet formats were defined: a decimal floating-point option wasn't introduced.

However, binary floating point rounding is irrelevant to the Patriot fixed-point bug.

It's not reasonable to expect accountants and laypeople to understand binary floating point rounding. I've seen plenty of programmers make goofy rounding errors in financial models and trading systems. I've encountered a few developers who literally believed the least significant few bits of a floating point calculation are literally non-deterministic. (The best I can tell, they thought spilling/loading x87 80-bit floats from 64-bit stack-allocated storage resulted in whatever bits were already present in the low-order bits in the x87 registers.)


Can you elaborate? How/why is BCD a better alternative to floating point arithmetic?


Floating-point error. BCD guarantees you that 1/10th, 1/100th, 1/1000th, etc. (to some configurable level) will be perfectly accurate, without accumulating error during repeated calculations.

Floating point cannot do that; its precision is based on powers of 2 (1/2, 1/4, 1/8, and so on). For small values (in the range 0-1), there are _so many_ values represented that the powers of 2 map pretty tightly to the powers of 10. But as you repeat calculations, or get into larger values (say, in the range 1,000,000 - 1,000,001), the floating-point values become more sparse and errors crop up even more easily.

For example, using 32 bit floating point values, each consecutive floating point in the range 1,000,000 - 1,000,001 is 0.0625 away from the next.

  jshell> Math.ulp((float)1_000_000)
  $5 ==> 0.0625


You are confusing two things. Usually you represent decimal numbers as rational fractions p/q with two integers. If you fix q, you get a fixed-point format; if you allow q to vary, you get a floating-point format. Unless you are representing arbitrary rational numbers you usually limit the possible values of q, usually to either powers of two or powers of ten. Powers of two will give you your familiar floating-point numbers, but there are also base-ten floating-point numbers, for example currency data types.

BCD is a completely different thing: instead of tightly encoding an integer, you encode it digit by digit, wasting some fraction of a bit each time but making conversion to and from decimal numbers much easier. But there is no advantage compared to a base-ten fixed- or floating-point representation when it comes to representable numbers.


This was one of those things where I know just enough to realize something about the reasoning is not right. Thank you for putting that feeling into competent words.


As a practical example, POWER architecture uses the densely-packed decimal encoding[1] to encode decimal digits within its IEEE 754-compliant decimal floating-point format[2].

IEEE 754 also supports encoding decimal integers as binary, by converting the entire decimal integer to a single binary integer (i.e., not by storing each decimal digit as a separate binary number).

[1] https://en.wikipedia.org/wiki/Densely_packed_decimal

[2] https://files.openpower.foundation/s/dAYSdGzTfW4j2r2#page=22...


As others are pointing out, decimal fidelity and "error" are different things. Any fixed point mantissa representation in any base has a minimal precision of one unit in its last place, the question is just which numbers are exactly representable and which results have only inexact representations that can accumulate error.

BCD is attractive to human beings programming computers to duplicate algorithms (generally financial ones) intended for other human beings to execute using arabic numerals. But it's not any more "accurate" (per transistor, it's actually less accurate due to the overhead).


You can have infinite precision in pretty much any accurate representation though, no? Where is the advantage in using BCD over any other fixed point representation?


For the reasons others have mentioned, plus BCD doesn't suffer data-type issues in the same way unless the output data type is wrong, but then the coder has more problems than they realise.

The only real disadvantage of BCD is that it's not as quick as floating-point arithmetic or bit-swapping data types, but with today's faster processors, for most people I'd say the slower speed of BCD is a non-issue.

Throw in other hardware issues, like bit flips in non-ECC memory, and the chances of errors accumulating go up if you're not using BCD.


BCD is not floating point


BCD and floating point are orthogonal. Some early computer systems had BCD floating point, as did (do?) most scientific calculators.


That's the parent's point


Avoiding floating point doesn't imply BCD. Any representation for integers would do fine, including binary.

There are two reasons for BCD: (1) to avoid the cost of division for conversion to a human-readable representation, as implied in the OP, and (2) when used to represent floating point, to avoid "odd" representations in the human format resulting from the conversion (like 1/10 not shown as 0.1). (2) implies floating point.

Even in floating point represented using BCD you'd have rounding errors when doing calculations; that's independent of the conversion to human-readable formats. So I don't see any reason to think that BCD would have avoided any disasters unless humans were involved. BCD or not is all about talking to humans, not to physics.


>Avoiding floating point doesn't imply BCD

Parent didn't say it's a logical necessity, as in "avoid floating point ==> MUST use BCD".

Just casually mentioned that one reason BCD got popular was to sidestep such issues in floating point.

(I'm not saying that's the reason, or that it's the best such option. It might even be historically untrue that this was the reason - just saying the parent's statements can and probably should be read like that).


Sidestep which issue? The one of human representation, or the problems with floating point?

If they just want to sidestep problems with floating-point rounding targeting the physical world, they need to go with integers. Choosing BCD to represent those integers makes no sense at all for that purpose. All I sense is a conflation of issues.

Also, thinking about it from a different angle, avoiding issues with the physical world is a matter of calculating properly so that rounding errors become non-issues. Choosing integers probably helps with that more in the sense that it makes the programmer aware. Integers are still discrete and you'll have rounding issues. Higher precision can hide the risk of rounding errors becoming relevant, which is why f64 is often chosen over f32. Going with an explicit resolution and range will presumably (I'm not a specialist in this area) make issues more upfront. Maybe at the risk of missing some others (like with the Ariane rocket that blew up because of a range overflow on integer numbers -- Edit: that didn't happen on the integer numbers though, but when converting to them).

A BCD number representation helps over the binary representation when humans are involved who shouldn't be surprised by the machine rounding differently than what they are used to from base 10. And maybe, historically, the cost of conversion. That's all. (Pocket calculators and finance are the only areas I'm aware of where that matters.)

PS. danbruc (https://news.ycombinator.com/item?id=35057850) says it better than me.



