
How the 8086 processor handles power and clock internally - todsacerdoti
http://www.righto.com/2020/08/how-8086-processor-handles-power-and.html
======
kens
Author here if anyone has questions about the 8086 internals.

~~~
bogomipz
Another great article. In some photos we see small circles labeled as "vias"
and others show small circles as "contacts." What exactly is the difference
between vias and contacts?

~~~
kens
No difference except how much space I had on the diagram :-)

------
jhallenworld
Wires on a chip are interesting. On a PCB you mostly have just two cases to
deal with: in the low edge-rate case, the trace is equivalent to a capacitor;
in the fast edge-rate case, the trace is equivalent to a transmission line
with a characteristic impedance, like a coax cable - signals will bounce and
you have to worry about termination. In both cases you can pretty much assume
that resistance is zero (except at very high edge rates).

Now on a chip the traces are so thin that resistance is significant and
cannot be ignored. The trace can be modeled as a distributed or lumped RC
circuit. A consequence is that the delay is a quadratic function of its length
(doubling the length of the wire quadruples its delay). It becomes worthwhile
to add repeaters. I wonder if these show up in the 8086..

On the other hand, "For on-chip wires with a maximum length of 1 cm, one
should only worry about transmission line effects when tr < 150 psec"

[http://bwrcs.eecs.berkeley.edu/Classes/icdesign/ee141_f01/No...](http://bwrcs.eecs.berkeley.edu/Classes/icdesign/ee141_f01/Notes/chapter4.pdf)
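The quadratic growth and the payoff from repeaters are easy to sketch with
Elmore-style delay estimates. This is a toy model with illustrative unit
values, not 8086 process numbers:

```python
# Sketch: a distributed-RC wire's delay grows quadratically with length,
# and inserting repeaters restores roughly linear scaling.
# r_per_mm / c_per_mm / t_repeater are made-up illustrative constants.

def wire_delay(length_mm, r_per_mm=1.0, c_per_mm=1.0):
    """Elmore delay of a distributed RC line: 0.5 * R_total * C_total."""
    return 0.5 * (r_per_mm * length_mm) * (c_per_mm * length_mm)

def repeated_delay(length_mm, segments, t_repeater=0.2):
    """Split the wire into equal segments with fixed-delay repeaters between."""
    seg = wire_delay(length_mm / segments)
    return segments * seg + (segments - 1) * t_repeater

# Doubling the length quadruples the unrepeated delay:
assert wire_delay(4.0) == 4 * wire_delay(2.0)

# With repeaters, the same wire is much faster overall:
print(wire_delay(8.0), repeated_delay(8.0, 4))  # 32.0 vs 8.6
```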

~~~
Taniwha
Well, yes and no - at the time the 8086 was designed, gate capacitance
dominated over wire RC delays. Early design tools (roughly pre-late-90s, way
after the original 8086) didn't bother calculating RC delays during synthesis;
we only really dealt with them in late static timing analysis (and even then
only for a few long lines).

Essentially, as chip features got smaller, wire resistance didn't scale the
same way as gate capacitance (partly it's edge effects), and our tools needed
to change as RC delays started to dominate.

------
bonzini
> This chip divided its input clock by 3 to generate the 33% duty cycle clock
> required by the 8086.

And the input clock was 4.77 MHz because that is one third of the NTSC clock,
14.318 MHz. The clock is further divided by 4 and that's the input of the 8253
programmable timer.
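The division chain can be checked with simple arithmetic (14.31818 MHz is the
standard NTSC colorburst crystal figure):

```python
# IBM PC clock chain: one NTSC colorburst crystal drives everything.
ntsc = 14.31818e6            # crystal frequency, Hz

cpu_clk = ntsc / 3           # 8284A divides by 3 for the CPU clock
timer_clk = cpu_clk / 4      # a further divide-by-4 feeds the 8253 timer

print(round(cpu_clk / 1e6, 3))    # ~4.773 MHz
print(round(timer_clk / 1e6, 4))  # ~1.1932 MHz
```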

~~~
jhallenworld
The 33% duty cycle is only required if you want to run the clock at its
maximum speed. If you run it at a lower speed, you can get away with a 50%
duty cycle and dispense with the 8284A clock generator chip. I've done this on
some cheap embedded systems - up to 4.2 MHz on a 5 MHz 8088 is OK.

~~~
kens
That's interesting. The datasheet says the clock has to be 2/3 low and 1/3
high with just 17 ns of margin. It's surprising that the datasheet is so
strict if you can totally violate the specification and still have it work.

(To be clear, I'm not disagreeing with you that this works.)

~~~
jhallenworld
Well the spec is minimum cycle time = 200 ns, maximum cycle time 500 ns. But
also it has min. low time = 118 ns and min. high time = 69 ns, which is how
the 1/3 duty cycle shows up. But there is no spec violation with, for example,
low time = 125 ns, high time = 125 ns for a cycle time of 250 ns (4 MHz).
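That spec check can be sketched in a few lines, using the numbers above from
the later datasheet (all values in ns, assuming fixed minimums rather than
fractions of the cycle time):

```python
# 8086 clock limits from the later datasheet (ns), as quoted above.
MIN_CYCLE, MAX_CYCLE = 200, 500   # TCLCL
MIN_LOW, MIN_HIGH = 118, 69       # minimum low / high times

def clock_ok(low_ns, high_ns):
    """True if a low/high split satisfies the (simplified) clock spec."""
    cycle = low_ns + high_ns
    return (MIN_CYCLE <= cycle <= MAX_CYCLE
            and low_ns >= MIN_LOW
            and high_ns >= MIN_HIGH)

print(clock_ok(125, 125))   # 50% duty at 4 MHz: passes
print(clock_ok(100, 100))   # 50% duty at 5 MHz: low time too short
```

So a 50% duty cycle is fine, but only at a reduced clock rate, which matches
the point above.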

~~~
kens
Which spec are you looking at? Is it a later revision? I'm looking at the
User's Manual [1] which says the CLK Low Time has a minimum of (2/3 cycle
time)-15 and CLK High Time has a minimum of (1/3 cycle time)+2. That yields a
minimum low time of 151 ns which is violated by the 125 ns value.

[1] page B-18 in
[http://www.bitsavers.org/components/intel/_dataBooks/1981_iA...](http://www.bitsavers.org/components/intel/_dataBooks/1981_iAPX_86_88_Users_Manual.pdf)

~~~
jhallenworld
This one, page 15:
[https://course.ece.cmu.edu/~ece740/f11/lib/exe/fetch.php?med...](https://course.ece.cmu.edu/~ece740/f11/lib/exe/fetch.php?media=wiki:8086-datasheet.pdf)

Ah, I see in the older datasheet it says TCLCH = 2/3 * TCLCL - 15. But I'm
pretty sure they mean TCLCH = 2/3 * TCLCLmin - 15 which gives the 118 ns it
has in the newer datasheet.

~~~
kens
Thanks! That makes more sense.

------
supernova87a
I've loved following this series of posts as an amateur/layman wanting to know
more about the hardware behind the chips I use!

Here's a question for anyone who might have an interest to help me understand
(at a high summary level) --

Steady DC power is being provided to the chip, and the internal clock is
running at 5 MHz. What are the input and output signals like? Are they on the
order of a few kHz, in a long stream of data (or a short burst?) that the CPU
then takes and works on until the next "delivery" burst? Is this the FSB? How
many of the 45 wires coming off the CPU are for the input/output signals, and
what are the rest of them for?

Thanks!

~~~
kens
The 8086 has 40 external pins [1]. There are 20 address lines shared with 16
data lines, so it's a 16-bit computer that can address 1 megabyte. The
remaining pins are mostly assorted control signals. (Intel liked to keep the
pin count pointlessly low, so the pins are multiplexed and have several uses.)

The external pins are switching at the 5 MHz clock speed, and there are four
clock periods for one memory bus cycle. The address pins send out the address
the first cycle, wait a cycle for memory to respond, and read or write the
data the third cycle.
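That bus cycle can be sketched as a toy model (hypothetical code, heavily
simplified; a real cycle adds ALE latching, wait states, and status signals):

```python
# Toy model of an 8086 read bus cycle: the same multiplexed pins carry
# the address in T1 and the data in T3/T4.

def read_cycle(memory, address):
    bus_trace = []
    bus_trace.append(("T1", "address", address))   # CPU drives the address out
    bus_trace.append(("T2", "turnaround", None))   # pins float; memory decodes
    data = memory[address]
    bus_trace.append(("T3", "data", data))         # memory drives data back
    bus_trace.append(("T4", "data", data))         # CPU latches the data
    return data, bus_trace

memory = {0x12345: 0xAB}
data, trace = read_cycle(memory, 0x12345)
print(data, [t[0] for t in trace])   # 171 ['T1', 'T2', 'T3', 'T4']
```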

The CPU and the memory bus are working at the same clock speed. (Although
somewhat decoupled because the 8086 had prefetching.) I think you're going the
wrong direction with kHz I/O and delivery bursts.

I've simplified things a bit. The 8086 User's Manual [2] explains all the
signals in great detail if you really want to know.

[1] I see that you carefully counted the 45 wires off the die. The power and
two grounds each use two bond wires in parallel for more current. There are
two wires to bias the substrate. So that's the "extra" 5 wires.

[2]
[http://www.bitsavers.org/components/intel/_dataBooks/1981_iA...](http://www.bitsavers.org/components/intel/_dataBooks/1981_iAPX_86_88_Users_Manual.pdf)

~~~
supernova87a
Thanks! I will have a read through that manual!

------
panpanna
This is so simple and beautiful.

It is mainly because of the lack of power-saving and EMC requirements, plus no
clock-skew problems at such low frequencies.

Nowadays you need a team of 50 people just for the clock tree.

------
miohtama
Also, I find the following very interesting. Because no CAD tools existed, old
CPUs were drawn with pen and paper, and the masks for lithography were cut
from a red film called Rubylith.

[https://en.m.wikipedia.org/wiki/Rubylith](https://en.m.wikipedia.org/wiki/Rubylith)

Here are some historic photos with master prints of CPUs:

[https://www.team6502.org/](https://www.team6502.org/)

~~~
segfaultbuserr
The origin of the term "tape-out".

Circuit boards were also designed in this way.

[https://www.eetimes.com/how-it-was-pcb-layout-from-rubylith-...](https://www.eetimes.com/how-it-was-pcb-layout-from-rubylith-to-dot-and-tape-to-cad/)

~~~
Taniwha
Yup, I did this as a kid in the mid 70s - PCBs were done at 2x, red and blue
tape for 2 layers - you drilled holes yourself, no plated thru holes

------
egsmi
This is the one I personally have been waiting for. Thanks!

On the power distribution did you notice any bypassing? There are a number of
ways to do it now. I have no idea how they did it, if at all, in the 70s.

Also, did you notice any trim or anything on the clock driver so they could
match the phases of clock and not-clock?

~~~
kens
There's no power bypassing on the chip, just the power lines.

There's no clock trimming either. The clock circuitry ensures that there is a
gap between the two phases, but there's no adjustment. But at 10 MHz, the
clock is not too sensitive.

~~~
egsmi
That’s really interesting. It seems like a foreign world compared to what I’m
used to. :)

~~~
kens
Do you want to say a bit about bypassing and trimming on modern chips?

~~~
egsmi
It's a lot of information for a forum comment. I will provide a pointer
though.

I think the book at cmosvlsi.com is pretty good for an introduction to the
realities of modern digital IC design.

For more information on the topic of this thread in particular see slides
17-19 in this deck. I was asking if the cap on the far right of slide 17
existed in this design.
[http://pages.hmc.edu/harris/cmosvlsi/4e/lect/lect21.pdf](http://pages.hmc.edu/harris/cmosvlsi/4e/lect/lect21.pdf)

On the trim: when I looked at the two-phase timing figure provided, I noticed
that if the bottom path through the clock driver were slower than the top
path, due to manufacturing tolerance, that could cause skew between the phases
(see slide 24 in the deck I pointed to), which might cause phase 1 and phase 2
to get too close, or even overlap. The clock driver circuit looked pretty
regular in the layout, so I was guessing they might have redundant parallel
drivers that could be enabled or disabled by ROM bits to change the relative
strengths of the two paths after manufacture. This would let them recover the
part and improve yield. But I guess edge rates and periods were slow enough
then that this was not a concern and relative sizing alone was enough.

~~~
segfaultbuserr
I recently read the book _Principles of Power Integrity for PDN Design_ and it
discusses the same problem from a board designer's perspective. The authors
point out that in the vast majority of chips, due to the bond-wire inductance
alone, board-level bypassing is useless at frequencies above 100 MHz. The only
solution is adding on-die capacitance; otherwise a modern chip cannot operate
correctly. In practice, the on-die capacitance is often kept to a minimum due
to its high cost and is only effective above 100 MHz, leaving an anti-resonant
peak around 100 MHz, so the problem can never be fully eliminated (unless a
better package or more on-die capacitance is used, which usually isn't,
because of cost). The authors show how you can break almost any FPGA's power
distribution network by switching logic circuits in code at exactly the
anti-resonant frequency where bypassing is least effective. It's usually not a
problem in practice (until you get "lucky"...).

And contrary to popular belief, it's impossible to remove the peak, no matter
how many bypass capacitors or how much buried capacitance exists on the
circuit board. In a complex chip like an FPGA, the PDN is critical. The
authors suggest that the best thing the board designer can do is a workaround:
using bypass capacitors of multiple values to "tune" the PDN. This was a rule
of thumb in the old days, and today it's often seen as bad practice, because
it creates multiple uncontrolled resonant and anti-resonant impedance spikes
when the capacitors combine with parasitic inductance (similar to the 100 MHz
peak, but at board level); at some frequencies the impedance is actually
significantly increased, which conflicts with the goal of keeping PDN
impedance low at all frequencies. But in the approach the authors propose,
it's done in a controlled manner: the PDN impedance is intentionally shaped
into a flat, slightly higher impedance region around 100 MHz to dampen the
unwanted oscillation (verified with simulations and measurements) when it's
inadvertently excited.

Also, sometimes changing timings or operating frequencies is required so that
the impedance peak of the chip's PDN is not excited, even when it means the
end user must redesign firmware, microcode, or HDL code.
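The anti-resonance described above is easy to reproduce numerically: model
each capacitor as a series R-L-C branch and look at the impedance of two
different values in parallel. The component values here are illustrative, not
taken from the book:

```python
import math

def branch_z(f, c, esl, esr):
    """Complex impedance of a capacitor modeled as series ESR-ESL-C."""
    w = 2 * math.pi * f
    return complex(esr, w * esl - 1 / (w * c))

def parallel_z(f, branches):
    """Magnitude of the parallel combination at frequency f."""
    return abs(1 / sum(1 / branch_z(f, *b) for b in branches))

# Two bypass caps, each with 1 nH parasitic inductance and 10 mohm ESR
# (made-up but plausible values): 100 nF and 1 nF.
caps = [(100e-9, 1e-9, 0.01), (1e-9, 1e-9, 0.01)]

# Sweep 1 MHz .. 1 GHz on a log grid and find the impedance peak that
# appears between the two self-resonant frequencies.
freqs = [10 ** (6 + i / 100) for i in range(301)]
zs = [parallel_z(f, caps) for f in freqs]
peak_f = freqs[zs.index(max(zs))]
print(f"anti-resonant peak near {peak_f / 1e6:.0f} MHz, |Z| = {max(zs):.1f} ohm")
```

Between the caps' self-resonant frequencies, one branch looks inductive and
the other capacitive, so together they form a parallel LC tank: impedance
spikes instead of staying low, which is exactly the uncontrolled peak the
book warns about.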

------
ncmncm
It seems noteworthy that clock, Vdd, and Vss trees do not penetrate the
microcode array.

~~~
kens
Note 1 to be specific :-) "The microcode ROM forms a large region with no
power connections, just ground. This is because each row in the ROM is
implemented as a very large NOR gate with the power pull-up on the right-hand
edge. Thus, the ROM gates all have power and ground, even though it looks like
the ROM lacks power connections"

The outputs from the microcode go into clocked latches to the left of the
array.

The high-level motivation is that you want the microcode ROM to be as dense as
possible, so you want to minimize the number of different signals going in
there. It is constructed with just the input lines, transistors (or gaps),
ground, and the output lines, so it is about as dense as possible. Even so, it
takes up a large chunk of the 8086 die.
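A minimal model of one ROM row as a wide NOR gate (a hypothetical 8-bit row,
not actual 8086 microcode): a transistor placed at a bit position pulls the
output low whenever its select line is active, and the single pull-up on the
edge supplies the '1' otherwise.

```python
def nor_row(transistor_mask, select_lines):
    """One ROM output row: the pull-up keeps it high unless any placed
    transistor (a 1 bit in transistor_mask) sits on an active select line."""
    return 0 if (transistor_mask & select_lines) else 1

# Hypothetical row with transistors at bit positions 0 and 3:
row = 0b00001001
print(nor_row(row, 0b00000001))   # 0: transistor on an active line pulls low
print(nor_row(row, 0b00000100))   # 1: no transistor there, pull-up wins
```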

------
cheerlessbog
Is it conceivable that the clock signal could supply the power in a CPU?

~~~
kens
I'm not sure why you would want to power a CPU through the clock. However,
microcontrollers can get inadvertently powered through the I/O pin protection
diodes if you don't hook up power and ground:
[https://www.eevblog.com/2015/12/18/eevblog-831-power-a-micro...](https://www.eevblog.com/2015/12/18/eevblog-831-power-a-micro-with-no-power-pin/)

~~~
segfaultbuserr
I once spent two days debugging a microcontroller circuit for my
retrocomputing project because of this problem.

The microcontroller was an Atmel ATmega328P I extracted from an Arduino board;
I used the chip as an external debugger and monitor. The first function I
implemented was EEPROM reprogramming via a programming socket, and the code
worked without any sign of trouble for two weeks, exactly as intended. I could
burn a program to the ROM, plug the ROM into the 8-bit computer, and execute
instructions. Later I attempted to plug the microcontroller into the system
bus of my 8-bit computer to add in-system reprogramming and memory debugging
capabilities, so I wouldn't have to plug and unplug the ROM every time I
needed to reprogram it. I planned to implement it by taking over the system
bus via DMA, so the running CPU would hand control over to me. However, no
matter how I changed the code, there were always some strange bugs. RAM
reading and writing never worked reliably, there was random memory corruption,
and the CPU was never able to continue executing the program correctly after
the DMA had finished. It seemed as if there were some form of bus-contention
bug in the microcontroller, as if the GPIO pins were not properly
tristated/isolated before the beginning of a DMA cycle. But I couldn't find it
anywhere in the code.

Eventually I realized that pin 7, Vcc, was not connected! I had miswired the
power to an I/O pin. From the beginning, the ATmega328P had been operating
without its main digital power supply, sourcing all its power via the ESD
diode on the I/O pin, and/or possibly via the analog supply AVCC. I was
surprised it worked for two weeks. On second thought, during EEPROM
programming the connection to the chip was direct and tristating was mostly
not used, so the MCU had no problem driving it. But in memory debugging, the
bus is long and the entire output was tristating on and off during each DMA
cycle; the lack of a proper Vcc supply probably made the I/O drivers
malfunction, especially the input/output selection, creating unpredictable
output states.

Later, in another unrelated project, I encountered another problem due to an
incorrect PCB footprint pinout. The 4-pin SMD crystal was connected to the
wrong pins - only one side was connected to the chip, so there was basically
no system crystal at all. But the parasitic capacitance between the crystal
pins was sufficient to start a weak oscillation (on the oscilloscope you could
see a clock waveform with a very low amplitude, not at a proper logic level,
as if it were an analog RF circuit), and the chip was even able to start its
125 MHz PLL! But the logic was not fully functional until I dead-bug soldered
the crystal the correct way.

Lesson learned: Always double check. Just because the chip has power, doesn't
mean the chip is receiving power correctly. Just because the chip has a clock
output, doesn't mean the chip is receiving the clock correctly. And finally,
if there's an external power-on reset, just because the chip was initialized
after you apply power, doesn't mean the power-on reset circuit is functional.

~~~
Chouhada
Reminds me of the history of the ARM1 chip, where the chip was accidentally
running off leakage current alone [1]

[1]
[https://en.wikichip.org/wiki/acorn/microarchitectures/arm1](https://en.wikichip.org/wiki/acorn/microarchitectures/arm1)

