
A Superscalar Out-of-Order x86 Soft Processor for FPGA (2017) - matt_d
https://tspace.library.utoronto.ca/handle/1807/80713
======
basementcat
From section 6.8 (p. 69) "Note that the decoder’s microarchitecture design is
complete, including the branch predictor design and micro-op sequences for
nearly every x86 instruction and behaviour. Our circuit implementation is less
complete than our microarchitecture design (implemented as a detailed pipeline
simulation)"

Still an impressive work.

------
hak8or
How is this legally handled, considering AMD owns x86-64 and Intel owns x86?

As long as it's not used in commercial settings, will they pretend not to see
it while users are in a legally gray area?

~~~
wk_end
This is x86 only, and a quick skim of the dissertation suggests that it
doesn't implement any post-P6 (Pentium Pro) instructions. The P6 is 24 years
old now, so presumably all its patents have expired?

~~~
userbinator
The last time I looked, the MMX patents were close to expiry too, and by now
they might be.

~~~
Const-me
MMX is from P5.

P6 supports MMX (from the Pentium II on), and later revisions added SSE1.

The next microarchitecture after P6 is NetBurst; it introduced SSE2, and later
revisions added SSE3.

------
userbinator
_Our microarchitecture achieves 2.7 times the per-clock performance of a
performance-tuned Nios II /f, Altera's fastest (RISC-like, single-issue,
pipelined) soft processor, and 0.8 times the frequency, for a total
performance improvement of 2.2 times._
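The 2.2x total is just the product of the two quoted ratios, under the simplifying assumption that overall performance equals per-clock performance times clock frequency:

```latex
\frac{\text{perf}_{\text{x86 soft core}}}{\text{perf}_{\text{Nios II/f}}}
  = \underbrace{2.7}_{\text{per-clock ratio}} \times \underbrace{0.8}_{\text{frequency ratio}}
  = 2.16 \approx 2.2
```

In other words, the out-of-order design gives up about 20% in clock speed but more than makes up for it in work done per clock.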

It'd be very interesting to compare this to RISC-V.

~~~
pcwalton
Note that the Nios II/f is an in-order CPU, while this is a superscalar
out-of-order CPU. A more relevant comparison would be the superscalar,
dual-issue ARM Cortex-A9, illustrated in figure 13.4. Averaged over all the
benchmarks, this design performs about the same as the Cortex-A9.

In theory, RISC-V should be at about the same performance as ARMv8 (note that
Cortex-A9 is ARMv7):
[https://news.ycombinator.com/item?id=15343287](https://news.ycombinator.com/item?id=15343287)

~~~
snvzz
>In theory, RISC-V should be at about the same performance as ARMv8

Or POWER9, the CPU with the highest IPC. And it's RISC.

There's nothing to prevent RISC-V ISA from getting high performance
implementations.

------
Skunkleton
I am by no means an expert in digital design (I have only worked with them as
a SWE), but it seems to me that the use cases for a high performance soft
processor are pretty few and far between. After all, if you want a fast
processor you can get a hard processor with excellent performance/support for
less than the FPGA fabric would likely cost.

Still a cool piece of tech though.

~~~
gh02t
The main use for soft processors is hybrid designs: systems that need
significant programmable logic for performance- or timing-sensitive work, but
where other functionality is better implemented on an easier-to-program CPU.
If you're gonna have to use an FPGA anyway, it is frequently easier/cheaper to
just implement a soft-core processor than to add a separate discrete processor
(which is more involved than just adding a single chip: you need all the
supporting circuitry, interconnects, board routing, etc.).

The other use case is sorta the same thing: a normal CPU with a few custom
extensions. Sometimes no manufacturer's product fits your needs well, and ASICs
are expensive (also difficult to change), so some companies just ship
customized CPUs on FPGAs with whatever extensions they need.

Xilinx's Zynq chips (FPGA with an ARM core) have been very successful, which
kinda demonstrates that this is an attractive combination.

~~~
wtallis
Time to market is also sometimes a factor; putting a soft processor onto the
unused parts of an FPGA is far easier than bringing up a SoC combining CPU
cores with special-purpose compute or IO.

The high-end SSD market has had a lot of FPGA-based products for years, and
recently many of them are using any leftover gates to add user-accessible CPUs
(or occasionally ML-focused compute resources). It turns out that there are
quite a few uses for having a CPU extremely close to your massive pile of
data, rather than having a relatively narrow PCIe link between the storage and
the CPU. These SSD controllers are usually forced to use pretty large FPGAs in
order to have a high enough pin count to manage several TB of flash, and it
seems that they often have logic elements to spare.

