
8088 microprocessor IP core fits in 308 LUTs, runs at 180MHz on a Kintex-7 FPGA - ingve
https://forums.xilinx.com/t5/Xcell-Daily-Blog/8088-microprocessor-IP-core-fits-in-308-LUTs-runs-at-180MHz-on-a/ba-p/682449
======
mutagen
Note that this core runs at ~100 MHz to get equivalent performance (cycle
accurate timings) to the original 4.77 MHz 8088. The big deal is the
relatively small number of LUTs used, leaving plenty of room for more stuff.

There's more from the creator at
[http://www.eetimes.com/author.asp?section_id=216&doc_id=1328...](http://www.eetimes.com/author.asp?section_id=216&doc_id=1328967)
including this nice tidbit:

 _The result is the MCL86, which is basically a 7-instruction, 32-bit
micro-sequencer. Some of the micro-sequencer's instructions are specialized so
as to allow it to rapidly decode instructions as well as nest function calls.
With these seven instructions, I was able to microcode all of the 8086 opcodes
in a relatively small number of micro-sequencer clocks._

A video of this running 8088 MPH would be awesome. They already have a number
of videos of this running other stuff on a PC:
[https://www.youtube.com/channel/UC9B3TaEUon-araO2j7tp9jg](https://www.youtube.com/channel/UC9B3TaEUon-araO2j7tp9jg)
EDIT: There is a video of it running 8088 MPH that polpo linked!

~~~
ajross
Exactly. This is an "old is new" kind of design. Back when logic was so
expensive as to make computers almost impossible, designs were built around a
small number of execution units on a small number of buses and a comparatively
compact microcode table that could share them sequentially. So you only needed
one adder because the IP increment and ALU operation could share it on
different clocks.

Then once there was space to put all the stuff needed for a single instruction
on the die, we found ourselves clock-limited by long logic depth and started
splitting the functions out across "pipeline" stages, which begat RISC, and
we've never looked back.

But all that being said: _this design is totally cheating_. Sure, the _logic_
takes only 308 LUTs. But the microcode is stored in 4 block RAMs, which a
quick Google tells me are 36kbit apiece. That's a much more significant chunk
of chip resources than is implied in the linked article.

~~~
abortz
Perhaps in terms of _gates_, but LUTs represent a much larger fraction of
resources on an FPGA than the block RAMs. In other words, if you wanted to
pack a lot of these onto an FPGA you'd run out of LUTs before block RAM.

~~~
ajross
I'm too lazy to look up numbers for the Kintex-7 part in question, but I'm
almost certain you're wrong on this. A block RAM is a _big_ chunk of die, and
there are comparatively few of them to go around. A LUT is a tiny object
(comparable in computation power to 10-50 dedicated transistors) and there are
hundreds of thousands of them on the FPGA.

I'm willing to bet lots that 4/n_block_ram > 308/n_lut.

~~~
ajross
Yeah, I went and looked it up. The details are hairy, because Xilinx. But an
XC7K410T part, which is roughly their mid-range offering, has 63550 slices,
where IIRC a slice has two LUTs (pay no attention to their "logic cells"
number -- that's a normalized thing scaled so as to be linear with the old
4-input LUT design from long ago) and 28620kbit of block RAMs, where each
block is 36kbit.

So that design uses 308/(2*63550) =~ 0.2% of the logic resources on the FPGA,
but 4/(28620/36) = 0.5% of the RAM.

Not nearly as imbalanced as it sounded to me originally, but still: the LUT
numbers are spun by more than a factor of two. The design is more closely
equivalent to "640 LUTs". Which interestingly is very comparable to the
equivalent transistor count on the original part from Intel.
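
The arithmetic in this comment can be re-run as a small sketch. All device figures below are the ones quoted in the comment (including its "IIRC" assumption of two LUTs per slice), not values checked against a datasheet, and the variable names are only illustrative.

```python
# Figures as quoted in the comment above; treat them as assumptions.
SLICES = 63_550            # slices on the part
LUTS_PER_SLICE = 2         # the comment's "IIRC" figure
BRAM_KBIT_TOTAL = 28_620   # total block RAM on the part, in kbit
BRAM_KBIT_EACH = 36        # size of one block RAM, in kbit

CORE_LUTS = 308            # LUTs used by the core
CORE_BRAMS = 4             # block RAMs used for microcode

total_luts = SLICES * LUTS_PER_SLICE
total_brams = BRAM_KBIT_TOTAL / BRAM_KBIT_EACH

lut_frac = CORE_LUTS / total_luts      # share of the device's LUTs
bram_frac = CORE_BRAMS / total_brams   # share of the device's block RAMs

# Number of LUTs that would take up the same share of the device as the BRAMs.
equiv = bram_frac * total_luts

print(f"LUT share:  {lut_frac:.2%}")
print(f"BRAM share: {bram_frac:.2%}")
print(f"LUTs with the same share as the BRAMs: about {equiv:.0f}")
```

Under those assumptions the block RAM share comes out at roughly twice the LUT share, which is where the "640 LUTs" figure comes from.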

~~~
abortz
Yes, you were correct. I was working from the bulk stats on the chip and mixed
up Kb and KB ;-)

------
tzs
> Just how many LUTs is 308? The smallest Kintex-7 FPGA is the K70T with
> 65,600 logic cells (“the logic equivalent of a classic 4-input LUT and a
> flip-flop” according to User Guide UG474), so we’re talking about a resource
> consumption of much less than 1% of that very small programmable device.

So...you could put 100 8088 work-alikes on one FPGA?

At the risk of inducing /. nostalgia in the old timers here...can you imagine
a Beowulf cluster of these?

~~~
cnvogel
To put the mind-boggling amount of logic in a contemporary FPGA into
perspective, the smallest device in the Kintex-7 family seems to be the
XC7K70T...

[http://www.xilinx.com/products/silicon-devices/fpga/kintex-7...](http://www.xilinx.com/products/silicon-devices/fpga/kintex-7.html#productTable)
(<-- HTML overview page at Xilinx)

...which contains 10,250 slices where one slice contains four 6-in-2-out
lookup tables and 8 flipflops:

[http://www.xilinx.com/support/documentation/user_guides/ug47...](http://www.xilinx.com/support/documentation/user_guides/ug474_7Series_CLB.pdf)
(<-- family user-guide) [http://imgur.com/viEdQUv](http://imgur.com/viEdQUv)
(<-- png of page 19 with "schematic" of one logic slice)

On page 19: The four boxes on the left are the lookup tables implementing
combinational logic (A/W/O5/O6/...), the eight squares on the middle/right
(D/CE/CK/SR/...) are flipflops (store one bit of data each). There's a bunch
of random multiplexers (the trapezoid ones, they choose one output of X
inputs) scattered around. This "schematic" is of course simplified ;-).

So, 10250*6/308=199.7, i.e. the 8088 "IP" fits about 200 times. Of course this
is a very naive calculation ignoring any routing between cores or any
peripherals to make them do anything useful, and one would use one such 8-bit
CPU for easy housekeeping tasks, not 200 of them. But it shows nicely how
incredibly dense current FPGAs are.
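
As a rough check of this estimate, here is a minimal sketch. The slice count and the 308-LUT core size come from the thread. The comment's arithmetic multiplies by 6 even though it describes four LUTs per slice, so both multipliers are shown. Routing, block RAM and I/O limits are ignored, as in the comment.

```python
SLICES = 10_250    # slices in the XC7K70T, per the comment
CORE_LUTS = 308    # LUTs used by one core

def cores_that_fit(luts_per_slice: int) -> int:
    """Naive upper bound: total LUTs divided by LUTs per core."""
    return SLICES * luts_per_slice // CORE_LUTS

print(cores_that_fit(6))  # multiplier used in the comment -> 199
print(cores_that_fit(4))  # four LUTs per slice, as described -> 133
```

Either way the order of magnitude is the same: well over a hundred cores by LUT count alone.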

Of course, just one bare chip will set you back around $120.
[https://octopart.com/search?q=XC7K70T](https://octopart.com/search?q=XC7K70T)
(<-- part search)

~~~
typon
Even more mindboggling: the unreleased Stratix 10 FPGA from Altera claims to
have 5,510,000 LUTs in the largest device.

That's a total of 17,889 Intel 8088 Microprocessors.

~~~
PeCaN
What does one even use that much FPGA for? Any organization that could afford
that could probably afford making an ASIC of comparable or better performance.

...That said, that's crazy impressive.

~~~
cnvogel
One example that comes to mind is data analysis for radioastronomy. With
arrays of radiotelescopes, as far as I understand it, you will downconvert an
incoming frequency band spanning a few GHz, which gives you a few GByte/sec of
data _for_ _each_ _antenna_. Then you might have an array of 50 telescopes
(the ALMA array is planned to have 50 or so, I think).

This stream of maybe a TByte/sec of data will then be filtered and
decimated/downconverted in real time by racks full of DSP/FPGA boards. Here's
a picture of one board used for an Australian facility:

[http://www.atnf.csiro.au/news/newsletter/oct06/CABB.htm](http://www.atnf.csiro.au/news/newsletter/oct06/CABB.htm)

5x Virtex II XC2VP50 (23,616 slices, 2 PowerPC CPU blocks, $1700 each)

5x Virtex 4 XC4VSX55 (15,360 slices, $1300 each)

Yes, an ASIC might be more energy efficient and could be made faster, but
FPGAs give you the flexibility to adapt your algorithms and filter topologies.
And an ASIC run might cost you half a million dollars whereas with FPGAs you
only spend ~$15,000/board.

------
ChuckMcM
That is so freaking awesome. CPLDs are approaching 308 logic blocks :-). And
given the small footprint of the core it suggests you could probably build the
entire IBM PC architecture on a single FPGA with CGA or Hercules Mono
framebuffer support. Then boot Microsoft Flight Simulator and chortle at how
much imagination you needed to use to believe you were flying a plane.

~~~
acveilleux
Lol, I can't even suspend disbelief and play VGA games from the 1990s. I see
the pixels and not the scene. Same with games like Civ 1 or Sim City. I
remember playing for ages but now there's no way I can do it.

Maybe the size of my monitor is contributing to this. My DOS gaming days were
on 12" and 14" CRTs; current play is on a 27" LCD.

~~~
rasz_pl
This looks perfectly playable to me
[https://www.youtube.com/watch?v=qUOEAsCNZiA](https://www.youtube.com/watch?v=qUOEAsCNZiA)

------
huangc10
This is sweet. Been a while since I worked in HW after switching to full time
SW but this kind of news just makes me chuckle.

Say a modern-day FPGA has 10 million gates. 6 gates/LUT. That gives 1.6
million LUTs. Let's say half are used up by other IPs and IOs within the chip.
800k/308 = ~2500.

You could have 2500 of the 8088 running at 180MHz simultaneously. Why? For
science.
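
The back-of-the-envelope estimate above can be written out as a short sketch. Every input is one of the comment's own guesses (gate count, gates per LUT, fraction left over), not a figure for any real device.

```python
GATES = 10_000_000      # assumed gate count of a "modern day" FPGA
GATES_PER_LUT = 6       # assumed gates per LUT
USABLE_FRACTION = 0.5   # half the LUTs assumed taken by other IP and I/O
CORE_LUTS = 308         # LUTs used by one core

total_luts = GATES / GATES_PER_LUT
usable_luts = total_luts * USABLE_FRACTION
cores = int(usable_luts // CORE_LUTS)

print(f"total LUTs:  {total_luts:,.0f}")
print(f"usable LUTs: {usable_luts:,.0f}")
print(f"cores:       {cores}")
```

Without rounding the intermediate values down, the result lands a little above the ~2500 quoted, which does not change the conclusion.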

------
david-given
I wonder how much extra space it would take to emulate all the weird-ass IBM
XT peripherals and end up with a true 8088 embeddable-system-on-a-chip? Plug
that into a cheapo SD card (or even a serial EEPROM) and you'd have a
standalone machine that would run DOS.

That could actually be _useful_.

------
danjayh
I haven't done any VHDL in about 9 years. Anyone know of a good site/tutorial
to run through that starts out relatively basic and goes through to advanced
topics? (bonus points if it has step-by-step instructions for a low cost /
free for noncommercial dev environment).

~~~
analognoise
Most vendors have free (as in beer) licenses available for their smaller
parts.

Check out FreeRangeFactory. They have some good intro books. Not sure how
advanced they get (as I do mostly Verilog). Feel free to shoot me a message if
you get stuck.

------
bitwize
Let's see 8088mph run on that baby.

~~~
polpo
Okay:
[https://www.youtube.com/watch?v=b3GkPGZR4BU](https://www.youtube.com/watch?v=b3GkPGZR4BU)

~~~
duskwuff
Looks like it _mostly_ works, with the exception of a couple of tricky color
effects (which have nothing to do with the CPU, and might depend on some
composite video tricks).

In particular, the Kefrens bars at 4:42 render completely wrong.

~~~
ajenner
The Kefrens bars effect is the part of 8088 MPH that is most sensitive to the
instructions being a cycle faster or slower here or there. The MCL86 doesn't
claim to have perfect emulation of timing, so I would not necessarily expect
this effect to work reliably on it.

~~~
Scali
The artifact colours are quite off as well, so I suspect the card used was not
a real IBM CGA card. On my ATi Small Wonder and Paradise PVC4 cards, I get the
exact same Kefrens bars, with the top chopped off.

Having said that, the fact that the music slows down noticeably during the
moire-effect seems to indicate that indeed some instructions are a few cycles
off here and there.

------
fpgaminer
I'm curious why it only runs at 180MHz on a Kintex-7. The Kintex-7 can do
32-bit additions at 400-500MHz, so it's odd to see an 8088 running at less
than half that. The article mentions that removing the cycle accurate
constraint would allow it to run faster, so perhaps that's why.

~~~
vvanders
In these solutions you're less interested in raw performance (you could go
with one of their hybrid ARM-FPGA cores in that case) than in keeping the LUT
count small so that the FPGA can use them for things that it's good at.
Processors like these are kinda like the glue for larger parts.

------
muterad_murilax
I'm sorry... IP? LUT?

EDIT: Thanks for the explanations, guys!

~~~
jwise0
For some reason, in the hardware design world, we use "IP" to mean "a self-
contained functional block of hardware that we have developed". It probably
derives from 'intellectual property' (i.e., 'we bought some IP from $VENDOR to
handle HDMI', and then eventually 'we bought _an_ IP from $VENDOR to handle
HDMI').

A LUT is, more or less, the smallest logical element on an FPGA (a piece of
programmable hardware) -- it's a Look-Up Table. They're not directly
comparable from FPGA to FPGA, because some FPGAs have different sizes; in the
early days, LUTs had 3 inputs and produced one output, but on modern FPGAs,
'LUT4's (4 input, 1 output) are the smallest that you'll reasonably get, and
some FPGAs even use 'LUT6's (6 input, 1 output; sometimes divisible into 5
input, 2 output, or other subdivisions) as their basic logic element. But no
matter how you slice it, 308 LUTs is impressively small, especially for
180MHz.
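
To make the "Look-Up Table" idea concrete, here is a minimal software sketch of a LUT: a small stored truth table, with the input bits used as the index. The helper name `make_lut` and the parity example are illustrative only and do not correspond to any vendor tool or primitive.

```python
from itertools import product

def make_lut(func, n_inputs):
    """Build an n-input, 1-output LUT by precomputing func's truth table."""
    # One stored bit per input combination: 2**n_inputs entries in total.
    table = [func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]

    def lut(*bits):
        # The input bits form the address into the table (first bit is MSB).
        index = 0
        for b in bits:
            index = (index << 1) | (b & 1)
        return table[index]

    return lut, table

# A "LUT4" configured as a 4-input parity (XOR) function.
parity, table = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)

print(len(table))          # 16 stored bits for 4 inputs
print(parity(1, 0, 1, 1))  # 1 (odd number of ones)
print(parity(1, 1, 0, 0))  # 0 (even number of ones)
```

Changing the function only changes the 16 stored bits, not the structure, which is why any 4-input boolean function costs the same single LUT4.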

------
bifrost
This is pretty cool, and should remind everyone that the 8088 and even the
Zilog Z80 CPUs are still relevant and used to this day.

~~~
duskwuff
Z80? Yes. 8088? Eh, not really.

You may be confusing the 8088 CPU with the 8051 microcontroller, which is
extremely common in embedded designs.

------
atemerev
Incidentally, the smallest model organism with a nervous system, C. elegans,
has 302 neurons.

Coincidence? Don't think so.

------
chriscappuccio
Sweet Jesus

~~~
guiomie
Why? I haven't played with an FPGA since University, so I'm really
disconnected from this world. So what does this article mean?

~~~
vvanders
308 is an incredibly small number of LUTs, might even fit in a CPLD.

~~~
sliverstorm
You could fit 200+ 8088's on the smallest version of the FPGA family they
used.

~~~
cfallin
Oh wow, now I really want to see a 200-core-8088-on-FPGA! (Yeah, I know, the
necessary NoC and glue would add a ton of overhead. But still!)

~~~
sklogic
Even better than 88s:
[http://fpga.org/grvi-phalanx/](http://fpga.org/grvi-phalanx/)

------
Gratsby
I know at least a dozen of these words.

