The naming of processor sizes is the subject of debate. I call a processor "pure 8 bit" if it has 8 bits for both data and addresses, like the Kenbak-1. But these are so rare, and educational rather than practical, that it is very reasonable to call hybrid 8 bit / 16 bit processors just "8 bit".
This use of sloppy terms shouldn't make us forget that they are using an address extension trick, just like all those 16 bit processors that wanted to go beyond 64KB (for byte addressed such as the PDP-11, Z8000 or 8086) or 128KB (for word addressed, like the Xerox Alto's modified Data General Nova model).
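To make the "address extension trick" concrete, here is a minimal sketch of the best-known case, the 8086's segment:offset scheme, plus the two plain 16 bit limits mentioned above. The function and constant names are made up for illustration; the arithmetic (segment shifted left by 4 bits, added to the offset, wrapping at 20 bits) is how real mode on the 8086 forms a physical address.

```python
# Reach of a plain 16-bit address, with no extension trick at all.
BYTE_ADDRESSED_LIMIT = 2**16        # 64KB: each address names one byte
WORD_ADDRESSED_LIMIT = 2**16 * 2    # 128KB: each address names one 16-bit word


def physical_address_8086(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address.

    The 8086 only has 20 address lines, so the sum wraps around at 1MB.
    """
    return ((segment & 0xFFFF) * 16 + (offset & 0xFFFF)) & 0xFFFFF


if __name__ == "__main__":
    print(hex(physical_address_8086(0x1234, 0x0010)))
    print(BYTE_ADDRESSED_LIMIT, WORD_ADDRESSED_LIMIT)
```

The PDP-11 and Z8000 used different mechanisms (memory management units and segment registers of their own), so this sketch only covers the 8086 flavor of the trick.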
Michael J. Flynn, best known in the computing world for his taxonomy of parallel computing (SISD, SIMD, MISD and MIMD), passed away on December 24, 2025.
Most people are not aware that after the failure of the PS/2 attempt to control the PC market, IBM tried a third time using 7 patents it had for the PC AT (they didn't have any on the original PC or the XT). In the first half of the 1990s they went after the chipset makers (mostly in Japan at that time) and in the second half of the 1990s they went after the PC makers themselves all around the world. They would threaten to sue for all machines made up to that point unless the maker licensed not only the 7 AT patents (which would expire in 2001) but also a bunch of other unrelated patents that were much newer. As far as I know everybody signed the deal, which meant that IBM could make money without actually making any PCs themselves.
This is true, but the patents were all RAND-licensed, so the press reported that IBM made about $5 per PC. Which isn't nothing, but IBM had no ability to restrict PC market segments (like they had with Micro Channel). So we soon saw "commodity" PC servers and even midrange systems stealing IBM's bread and butter.
One issue he mentioned is still true today in Brazil's universities: while in theory you can ask to transfer from one course to another, in practice you have to drop out of your current school and take the entrance exam for the other one. And then you waste a lot of time trying to get your grades for the courses you have already taken recognized as equivalent so you don't have to start from scratch.
For him to move from math to electrical engineering to physics in Brazil would mean going through this twice. This might make him take some 7 or 8 years to graduate.
I guess this inflexibility makes things easier for the administrators. They know they will have 25 students in the statistics class in 2028 and so know how many teachers to hire to handle that.
Actually, the 68000 had one full (all operations) 16 bit ALU and two simpler (add/subtract only, so AU might be a better name) 16 bit ALUs, so in the best case it could crunch 48 bits per clock cycle. The 8086 had one full 16 bit ALU and one simple 16 bit ALU (the ancestor of today's AGUs, address generation units).
The 68000 actually had both microcode and nanocode, so it was even further from hardwired control logic than the 8086. In terms of performance the 68000 was slightly faster than the 286 and way faster than the 8088 (I never used an 8086 machine).
The 286 looks like it ought to be usefully quicker in general? Motorola did a good job on the programming model, but you can tell that the 68000 is from the 1970s. Nearly all the 68000 instructions take like 8+ cycles, and addressing modes can cost extra. On the 286, on the other hand, pretty much everything is like 2-4 cycles, or maybe 5-7 if there's a memory operand. (The manual seems to imply that every addressing mode has the same cost, which feels a bit surprising to me, but maybe it's true.) 286 ordinary call/ret round trip time is also shorter, as are conditional branches and stack push/pop.
The timings given in the 286 datasheet are very optimistic and are almost never encountered in a real program.
They assume that instructions have been fetched concurrently without ever causing a stall and that memory accesses are implemented with 0 wait states.
In reality, instruction fetching was frequently a bottleneck and implementing a memory with 0 wait states for 80286 was much more difficult than for MC68000 or MC68010.
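A rough sketch of why 0 wait states was the harder target on the 286. It assumes the commonly documented minimum bus cycle lengths, 2 processor clocks for the 80286 and 4 clocks for the MC68000, and a hypothetical 8 MHz clock for both; the helper name is made up for illustration, and real designs also lose time to address decoding and buffers.

```python
def bus_cycle_ns(clock_mhz: float, clocks_per_bus_cycle: int, wait_states: int = 0) -> float:
    """Length of one memory bus cycle in nanoseconds.

    Each wait state stretches the bus cycle by one processor clock.
    """
    clock_period_ns = 1000.0 / clock_mhz
    return (clocks_per_bus_cycle + wait_states) * clock_period_ns


if __name__ == "__main__":
    # Assumed minimum bus cycles: 2 clocks (80286), 4 clocks (MC68000).
    print("80286   @ 8 MHz, 0 WS:", bus_cycle_ns(8, 2), "ns")
    print("MC68000 @ 8 MHz, 0 WS:", bus_cycle_ns(8, 4), "ns")
    print("80286   @ 8 MHz, 1 WS:", bus_cycle_ns(8, 2, wait_states=1), "ns")
```

Under those assumptions the memory behind a 286 has to respond in about half the time the 68000 allows at the same clock, and a single wait state already costs the 286 half again as much time on every access.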
With the available DRAM, normally both 80286 and 80386 would have needed a cache memory. Later, after the launch of 80386DX, cache memories became common on 386DX MBs, but I have not seen any 80286 motherboard with cache memory.
They might have existed at an earlier time when 286 was the highest end, but by the time of the coexistence with 386 the 286 became the cheap option, so its motherboards never had cache memory, thus the memory accesses always had wait states, increasing the probability of instruction fetch bottlenecks and resulting in significantly more clock cycles per instruction than in the datasheet.
> but by the time of the coexistence with 386 the 286 became the cheap option, so its motherboards never had cache memory
Not true. I vaguely remember servicing systems with chipsets from OPTi (only 2 large chips) that had it. IIRC those were functional (not exact) clones of the Chips & Technologies NEAT (4 to 5 large chips), later shrunk to a single chip as SCAT (Single Chip AT).
Also, at a time when the 386 ran at 33 MHz, or even at 40 MHz if made by AMD, Compaq introduced 386SX systems with cache, and I remember wondering "why, oh why?". Talk about overengineering...
My reading is that there aren't really a lot of addressing modes on the 286, as there are on the 68000 and friends; rather, every address is generated by summing an optional immediate 8 or 16 bit value and from zero to two registers. There aren't modes where you do one memory fetch, then use that as the base address for a second fetch, which is arguably a vaguely RISC flavored choice. There is a one cycle penalty for summing 3 elements ("based indexed mode").
What you say about memory indirect addressing is true only about MC68020 (1984) and later CPUs.
MC68000 and MC68010 had essentially the same addressing modes as the 80286, i.e. indexed addressing with up to 3 components (base register + index register + displacement).
The difference is that the addressing modes of MC68000 could be used in a very regular way. All 8 address registers were equivalent, all 8 data registers were equivalent.
In order to reduce the opcode size, the 80286 and 8086 permitted only certain combinations of registers in the addressing modes, and they did not allow auto-increment and auto-decrement modes except in special instructions with dedicated registers (PUSH, POP, MOVS, CMPS, STOS, LODS). The result is an instruction set where no 2 registers are alike, which increases the cognitive burden on the programmer.
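As a sketch of what "only certain combinations" means in practice, the table below lists the eight register choices that the 3-bit r/m field of a 16 bit ModR/M byte can select on the 8086/80286. The helper names and the register-dictionary representation are made up for illustration; segment registers are left out entirely.

```python
# r/m field value -> registers summed into the 16-bit effective address.
RM_REGISTERS = {
    0b000: ("BX", "SI"),
    0b001: ("BX", "DI"),
    0b010: ("BP", "SI"),
    0b011: ("BP", "DI"),
    0b100: ("SI",),
    0b101: ("DI",),
    0b110: ("BP",),  # with mod=00 this encoding means a bare 16-bit address instead
    0b111: ("BX",),
}

_ENCODABLE = {frozenset(regs) for regs in RM_REGISTERS.values()}


def effective_address(rm: int, regs: dict, disp: int = 0) -> int:
    """Sum the optional displacement and the selected registers, modulo 64K."""
    return (disp + sum(regs[name] for name in RM_REGISTERS[rm])) & 0xFFFF


def is_encodable(*names: str) -> bool:
    """True if this set of registers can appear together in one address."""
    return frozenset(names) in _ENCODABLE


if __name__ == "__main__":
    regs = {"BX": 0x1000, "BP": 0xFFF0, "SI": 0x0020, "DI": 0x0004}
    print(hex(effective_address(0b000, regs, disp=8)))  # [BX+SI+8]
    print(is_encodable("BX", "SI"), is_encodable("SI", "DI"))
```

Only BX and BP can serve as a base and only SI and DI as an index, so something like [SI+DI] or [AX+CX] simply has no encoding, whereas on the 68000 any address register can be the base.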
The 80386 not only added extra addressing modes taken from the DEC VAX (i.e. scaled indexed addressing) but also made the addressing modes much more regular than those of the 8086/80286, even though it preserved the restriction of auto-increment and auto-decrement modes to a small set of special instructions.
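For comparison, a minimal sketch of the 32 bit scaled indexed form that the 80386 added: base + index * scale + displacement, with the scale limited to 1, 2, 4 or 8. The function name and signature are hypothetical and segmentation is ignored.

```python
def effective_address_386(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """32-bit effective address: base + index * scale + disp, modulo 2**32."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


if __name__ == "__main__":
    # Element 3 of an array of 4-byte items that starts 8 bytes past the base.
    print(hex(effective_address_386(base=0x1000, index=3, scale=4, disp=8)))
```

The scale factor lets an array index in a register address 2, 4 or 8 byte elements directly, without a separate shift instruction.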