I haven't tested a physical chip to verify, but based on my simulations I think you are correct. For your second example, a side effect is that NMI is blocked until the end of the instruction, so you could block the NMI interrupt for an arbitrary amount of time.
Oh wow. NMI blocked, and presumably other interrupts too? That means filling a 64K segment with cs: prefixes will lock the CPU completely. IP will wrap around forever, and you have created some kind of infinitely long instruction. That's kind of cool!
Presumably, yeah. If other interrupts weren't blocked, then unless PC is somehow saved as the address of the prefix(es), returning from the interrupt would resume at the "wrong" instruction, one that is "incomplete" because it lacks its prefix.
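The wraparound can be sketched as a toy model. It is not a description of the real decoder: it assumes the whole 64K segment holds 0x2E (the CS: prefix byte) and tracks only IP, which is a 16-bit register and therefore wraps from 0xFFFF back to 0x0000. The function name and the step cap are hypothetical; the cap exists only so the sketch terminates, whereas the chip would keep going.

```python
from typing import Optional, Tuple

SEGMENT_SIZE = 0x10000  # 64K code segment
CS_PREFIX = 0x2E        # CS: segment override prefix byte

def scan_for_opcode(segment: bytes, ip: int, max_steps: int) -> Tuple[Optional[int], int]:
    """Skip prefix bytes starting at ip (hypothetical helper).

    Returns (ip of the first non-prefix byte or None, final ip).
    """
    for _ in range(max_steps):
        if segment[ip] != CS_PREFIX:
            return ip, ip
        ip = (ip + 1) & 0xFFFF  # IP is 16 bits, so it wraps around
    return None, ip

# A segment that is nothing but CS: prefixes never yields an opcode.
segment = bytes([CS_PREFIX]) * SEGMENT_SIZE
opcode_ip, final_ip = scan_for_opcode(segment, ip=0xFFF0, max_steps=3 * SEGMENT_SIZE)
print(opcode_ip, hex(final_ip))  # None 0xfff0
```

After three full laps IP is back where it started, and no opcode byte has been found.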
Love the work, Ken; I've been reading your articles since the early 2010s. Do you get paid to write these posts and do the research, or is this just a hobby? I wish I could find the time to do something similar, but between family and work I have zero free time anymore, unlike in my 20s, which I mostly wasted. Would love to know how you manage your time if this is your hobby.
I met your group at an informal meeting where Eric was showing his Monster 6502 (it seems like ages ago now), and it definitely gave me a blueprint for how I'll spend my time once I retire. :D
I've recently taken an interest in CPU architecture and your wonderful article couldn't have come at a better time for me. Thank you.
Beginner question:
In the example of the 3-byte instruction using the immediate value:
ADD AX,1234
Is this instruction 3 bytes long because 'ADD AX' is encoded in 1 byte while the immediate value '1234' is two bytes long?
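Roughly, yes. A minimal sketch of the encoding, assuming "1234" is hexadecimal (the usual convention in 8086 listings) and that the assembler picks the accumulator short form: opcode 05h means "ADD AX, imm16", and it is followed by the 16-bit immediate in little-endian order, low byte first. The general ADD-with-immediate form (opcode 81h plus a ModR/M byte) would take 4 bytes for the same operation. The function name below is hypothetical.

```python
import struct

ADD_AX_IMM16 = 0x05  # one-byte opcode: ADD AX, imm16 (accumulator short form)

def encode_add_ax_imm16(imm: int) -> bytes:
    """Encode ADD AX,imm16 as opcode byte + 16-bit little-endian immediate."""
    if not 0 <= imm <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    return bytes([ADD_AX_IMM16]) + struct.pack("<H", imm)

encoded = encode_add_ax_imm16(0x1234)
print(encoded.hex(" "), len(encoded))  # 05 34 12 3
```

So the "ADD AX" part is the single opcode byte, and the immediate supplies the other two.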
Does the differing prefetch queue size between 8088 and 8086 lead to any significant differences between the Bus Interface Units (or wherever that affects most) of the two chips, or is it basically just a "parameter" in the design that could be tuned without a lot of knock-on effects?
Also:
> If the queue ran empty, the processor waited until more instruction bytes were fetched from memory into the queue.
Does the CPU make any effort to fill up the queue before it runs empty?
I haven't studied the 8088 super-closely. There are a moderate number of changes. For instance, the prefetch registers in the 8088 need to be updated a byte at a time, so they need separate write control lines for the low and high bytes. The logic that counts queue positions also needs changing; it is optimized logic rather than a generic counter. So it's more than just changing a parameter.
As for the CPU making an effort to fill up the queue, the CPU tries to fill up the queue if the bus is idle. But if memory accesses are happening, you're better off doing the memory accesses that you need rather than performing prefetches which could get discarded.
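The fill policy can be sketched as a toy model. It assumes the 8086's 6-byte queue fetches a 16-bit word only when at least two bytes are free, and the 8088's 4-byte queue fetches one byte when at least one byte is free; it ignores word alignment, jumps that flush the queue, and real bus timing. The class, field, and method names are hypothetical, and this is not cycle-accurate.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class PrefetchQueue:
    """Toy prefetch queue model (hypothetical, not cycle-accurate)."""
    capacity: int      # assumed 6 bytes on the 8086, 4 on the 8088
    fetch_width: int   # assumed bytes per fetch: 2 on the 8086, 1 on the 8088
    next_addr: int = 0
    queue: deque = field(default_factory=deque)

    def bus_cycle(self, bus_busy: bool, memory: bytes) -> None:
        """Prefetch only if the bus is idle and a full fetch fits in the queue."""
        free = self.capacity - len(self.queue)
        if bus_busy or free < self.fetch_width:
            return
        for _ in range(self.fetch_width):
            self.queue.append(memory[self.next_addr])
            self.next_addr += 1

    def take_byte(self):
        """Consume one instruction byte; None means the queue is empty and the EU waits."""
        return self.queue.popleft() if self.queue else None

memory = bytes(range(16))
q8086 = PrefetchQueue(capacity=6, fetch_width=2)
for busy in [False, True, False, False, False]:  # the busy cycle does a data access instead
    q8086.bus_cycle(busy, memory)
print(list(q8086.queue))  # [0, 1, 2, 3, 4, 5]
```

In this model a data access simply takes priority over prefetching for that cycle, which matches the trade-off described above.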
So what you're saying is that the 8086 was sort of stack-based (like Forth): a given instruction just consumed the number of bytes it needed off the stack, and the assumption was that the next thing on the stack was the next instruction?
The instruction bytes were in a queue, not a stack, so it's not really like Forth. It's the same as reading the bytes in order from memory except the queue improved performance by reading instructions when the bus was otherwise free.
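To make the queue/stack distinction concrete, here is a small illustration using arbitrary example bytes: 05 34 12 (ADD AX,1234h) followed by 90 (NOP). A queue hands bytes back in the order they were fetched, which is program order; a stack would hand back the most recently pushed byte first.

```python
from collections import deque

instruction_bytes = [0x05, 0x34, 0x12, 0x90]  # ADD AX,1234h then NOP (example bytes)

# Queue (FIFO): bytes come out in the order they were fetched from memory.
fifo = deque(instruction_bytes)
fifo_order = [fifo.popleft() for _ in range(len(instruction_bytes))]

# Stack (LIFO): the last byte pushed would come out first.
stack = list(instruction_bytes)
lifo_order = [stack.pop() for _ in range(len(instruction_bytes))]

print([hex(b) for b in fifo_order])  # ['0x5', '0x34', '0x12', '0x90']
print([hex(b) for b in lifo_order])  # ['0x90', '0x12', '0x34', '0x5']
```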
This is something I would read as a physical book rather than online. Answers to questions like "why did they do this?" and "was it a constraint of the day, or some other reason?" would be illuminating.
Question: Does the 1BL thing imply that the 8086 is not capable of detecting useless prefixes? If so, are the following two implications correct?
Eg1: lock cs: clc is just treated as clc, and the lock and cs: are ignored?
Eg2: The 8086 has no instruction length limit, unlike some successors, which cap instructions at 15 bytes. So, e.g., with 16 segment overrides:
cs: ds: es: ss: cs: ds: es: ss: cs: ds: es: ss: cs: ds: es: ss: mov [1234],5
is just treated as ss: mov [1234],5?
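The behavior the question hypothesizes can be written down as a toy prefix scanner. This models the assumption, not verified chip behavior: any number of prefix bytes is accepted, and each segment override replaces the previous one. The prefix byte values are the standard 8086 ones (26h ES, 2Eh CS, 36h SS, 3Eh DS, F0h LOCK, F2h/F3h REPNE/REP). The encoding C6 06 34 12 05 assumes a byte-sized "mov byte ptr [1234h],5". The function name is hypothetical.

```python
from typing import Optional, Tuple

SEGMENT_PREFIXES = {0x26: "es", 0x2E: "cs", 0x36: "ss", 0x3E: "ds"}
LOCK, REPNE, REP = 0xF0, 0xF2, 0xF3

def decode_prefixes(code: bytes) -> Tuple[Optional[str], bool, int]:
    """Return (effective segment override, LOCK seen, index of the opcode byte).

    Hypothetical model: no length limit, and the last segment override wins.
    It only records prefixes; it says nothing about what the opcode does with them.
    """
    segment: Optional[str] = None
    lock = False
    i = 0
    while i < len(code) and (code[i] in SEGMENT_PREFIXES or code[i] in (LOCK, REPNE, REP)):
        if code[i] in SEGMENT_PREFIXES:
            segment = SEGMENT_PREFIXES[code[i]]
        elif code[i] == LOCK:
            lock = True
        i += 1
    return segment, lock, i

# Eg2: cs: ds: es: ss: repeated 4 times, then mov byte ptr [1234h],5
code = bytes([0x2E, 0x3E, 0x26, 0x36] * 4) + bytes([0xC6, 0x06, 0x34, 0x12, 0x05])
print(decode_prefixes(code), len(code))  # ('ss', False, 16) 21

# Eg1: lock cs: clc (F0 2E F8); whether CLC ignores the prefixes is the open question.
print(decode_prefixes(bytes([0xF0, 0x2E, 0xF8])))  # ('cs', True, 2)
```

At 21 bytes, the Eg2 sequence would exceed the 15-byte cap of later processors under this model.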