
The History, Status, and Future of FPGAs - skovorodkin
https://queue.acm.org/detail.cfm?id=3411759
======
jcranmer
As a bit of a counterpoint:

One of my prior projects involved working with a lot of ex-FPGA developers.
This is obviously a rather biased group of people, but I heard a lot of
feedback from them that was very negative about FPGAs.

One telling comment is that since the 90s, FPGAs were seen as the obvious
"next big technology" for the HPC market... and then Nvidia came out and
pushed CUDA hard, and now GPGPUs have cornered the market. FPGAs are still
trying to make inroads (the article here mentions it), but the general sense I
have is that success has not been forthcoming.

The issue with FPGAs is that you start with a clock rate in the 100s of MHz
(the exact rate depends on the length of your critical paths), compared with a
few GHz for GPUs and CPUs. Thus you need a 5× performance win from switching
to an FPGA just to break even, and you probably need another 2× on top of that
to motivate people to go through the pain of FPGA programming. Nvidia made
GPGPU work by demonstrating performance gains meaningful enough to make the
cost of rewriting code worth it; FPGAs have yet to do that.

Edit: It's worth noting that the programming model has consistently been cited
as the thing holding FPGAs back for the past 20 years. But GPGPU succeeded
despite also requiring a move to a different programming model, while the FPGA
community has been unable to furnish the necessary magic programming model.
That suggests to me (and my FPGA-skeptic coworkers) that the programming model
isn't the actual issue preventing FPGAs from succeeding, but that FPGAs have
structural issues (e.g., low clock speeds) that prevent their utility in wider
market classes.

~~~
cmrdporcupine
See, it's funny: I (a software guy) have recently started doing a bunch of
FPGA stuff on the side for "fun", and I find the programming model not to be
the biggest challenge.

The tools, yes, because it seems like hardware engineers have a fetish for
all-encompassing, painful, vendor-specific IDEs with half the features that we
software developers have, and with a crapload of vendor lock-in... but I
digress.

I find working in Verilog to be pretty pleasant. Yes, I can see that with
sufficient complexity it wouldn't scale out well. But SystemVerilog does give
you some pretty good tools for managing complexity through modularity.
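
For instance (a toy sketch of my own, not from any real design): an interface
bundles related signals, so modules connect through one named port instead of
dozens of loose wires.

    interface stream_if #(parameter WIDTH = 8);
      logic             valid;
      logic             ready;
      logic [WIDTH-1:0] data;
      modport src  (output valid, data, input  ready);
      modport sink (input  valid, data, output ready);
    endinterface

    // A pipeline stage written against the interface, not raw pins.
    module passthru (
      input logic    clk,
      stream_if.sink in,
      stream_if.src  out
    );
      always_ff @(posedge clk) begin
        out.valid <= in.valid;
        out.data  <= in.data;
      end
      assign in.ready = out.ready;
    endmodule

Swap in a different producer or consumer and this stage doesn't change; that's
the kind of modularity I mean.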

On the other hand, I've never particularly enjoyed working with GPUs, CUDA,
etc.

So I would agree with your statement that structural issues prevent their
utility in wider market classes -- and those really are, as you say, lower
clock speeds and cost, but also vendor tooling.

FPGAs could really do with GCC/LLVM-style open, universal, modular tooling. I
use fusesoc, which is about as close to that as I will get (a declarative
build that generates the Vivado project behind the scenes), but it's still not
perfect.
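
To give a flavor: a fusesoc core file is just declarative YAML, roughly like
this (the design name, file names, and part number here are invented for
illustration):

    CAPI=2:
    name: "::blinky:1.0"
    filesets:
      rtl:
        files:
          - rtl/blinky.sv
        file_type: systemVerilogSource
    targets:
      synth:
        default_tool: vivado
        filesets: [rtl]
        toplevel: blinky
        tools:
          vivado:
            part: xc7a35tcsg324-1

Then "fusesoc run --target=synth blinky" drives Vivado in batch mode, and the
Vivado project becomes a disposable build artifact instead of the source of
truth.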

~~~
jjoonathan
I don't mean to belittle your exploration, but are you sure it's an apples-to-
apples comparison? This suggests to me that it isn't:

> it seems like hardware engineers have a fetish for all-encompassing painful
> vendor specific IDEs

Hardware engineers feel pain just like you do. The reason they put up with
those awful software suites is that they have features they need that aren't
available elsewhere. In particular, they interface with IP blocks and hard
blocks, including at a debug + simulation level. Those tend to evolve quickly,
and last time I looked -- which admittedly was a while ago -- the open source
FPGA tooling pretty much completely ignored them, even though they're critical
to commercial development.

If you are content to live without gigabit transceivers, PCIe controllers,
DRAM controllers, embedded ARM cores, and so on, I suspect it would be
relatively easy to use the open source tooling, but you would only be able to
address a small fraction of FPGA applications.

~~~
cmrdporcupine
Vivado ships all kinds of "IP" for those things, yes. And once you get past
the GUI wizards, the drag-and-drop boxes and lines, and the Tcl scripts, you
find that in the end it's just a library of Verilog, all mangled to the point
of illegibility.

I wasn't talking about open sourcing. I accept we won't get open source DRAM
controllers and the like from them. I understand the licensing restrictions. I
just don't like how they force all this stuff to be gatewayed through their
baroque, overcomplicated GUI tools.

I prefer tools that are scriptable, that can work with the build system of my
choice, that work properly with source control (imagine that!), where you have
your choice of editor rather than having their garbage one rammed down your
throat, and where there are whizbang features like reformatting and auto-
indentation... hell, even refactoring.

Vivado and Quartus just get in the way. There's no reason to tie all the stuff
you're talking about into an integrated tool. They could just ship libraries.

Fusesoc does in fact try to make them behave this way. But you can tell it's a
bit of a war to make it happen.

~~~
jjoonathan
Well yes, they shouldn't cram the awful GUI tools down HW engineers' throats,
but they do.

I'm glad Fusesoc is fighting the good fight and I'm glad you're fighting the
good fight, but as you point out, it's definitely a fight. It was hardly fair
to call the desire to avoid said fight a "fetish."

~~~
cmrdporcupine
I can only assume hardware engineers are asking for this kind of tooling,
because I can't imagine why companies would spend the enormous development
effort on these tools and then give them away for free if they weren't being
asked for.

So many things that could be done in a programmatic, testable, declarative,
scriptable, repeatable way are done with futzy GUI tools in hardware land.
Schematic design _could_ be a matter of declaring components, buses, etc. and
letting the tool produce something (and then manually manipulating the visual
layout if necessary). I mean, you could literally describe your board using
something similar to Verilog and have the tool produce the schematic for you,
as in the sketch below... we have these kinds of powers in the 21st century.
Instead it's futzing with tools that are vaguely Illustrator-esque, then
finding that half your connection points are not actually connected, etc. Why
do people want to suffer like this?
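
To make that concrete, here's a purely hypothetical sketch (all component
names invented): the "schematic" becomes a netlist you can diff, review, and
render a drawing from.

    // Stubs standing in for board components.
    module mcu    (inout wire scl, inout wire sda, inout wire [3:0] spi); endmodule
    module eeprom (inout wire scl, inout wire sda); endmodule
    module flash  (inout wire [3:0] spi); endmodule

    // The board itself: declare the nets, connect the parts.
    module board_top;
      wire scl, sda;      // shared I2C bus
      wire [3:0] spi;     // SPI to the flash
      mcu    u_mcu    (.scl(scl), .sda(sda), .spi(spi));
      eeprom u_eeprom (.scl(scl), .sda(sda));
      flash  u_flash  (.spi(spi));
    endmodule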

Want to use a DRAM controller in Vivado? Find the wizard, type into 10 text
boxes... and if you're lucky you can find the Tcl script it generated and in
the future just run your own Tcl script... but they certainly won't make it
easy.

Vivado project in source control? You're going to jump through hoops for that.

I want hardware engineers to demand better.

------
lnsru
I am working right now on a bare-metal WebSockets implementation on Xilinx
Series 7 FPGAs. Currently it's a Zynq SoC, but the final product will probably
have a Kintex 7 inside, so no Linux. The tools make me cry: no examples, and
application notes from 2014 with ancient libraries. I hope vendors will fix
the tooling, but I see Xilinx has released Vitis, so their focus is elsewhere,
with no interest in the old crap. Using Git with Vivado is already enough
pain, so I keep my text sources in Git and complete zipped projects as
releases. Ouch!

~~~
beefok
I feel you completely. The Vivado IDE/toolchain is absolutely atrocious, and
the designers should be shamed for the horrifying bloatware they push as the
STANDARD. Sometimes I have better luck doing everything in Tcl on the command
line there.

~~~
tails4e
Vivado is amazing compared with the ASIC counterparts: Design Compiler is for
RTL synthesis only, and you need years of experience to get any decent QoR out
of it. In ASIC land you have separate tools for every step: synthesis, STA,
place and route, simulation, floorplanning, power analysis, etc. Vivado does
all that in one seamless tool, and allows you to cross-probe from a routed net
right back to the RTL code it came from. Try doing that with ASIC tools.

So to me it's a matter of perspective. Once you understand how difficult the
problem of hardware design is, and what some of the existing de facto industry
standard tools (for ASIC) are like, you come to appreciate Vivado for just how
well it brings all of these complex facets together. Of course, if you come
from a SW background you may think Vivado is terrible compared to VS Code or
some other IDE, but that's an unfair comparison. I guess to reframe the
question: show me a hardware design environment that is better than Vivado.

Also, I separate Vivado from the Xilinx SDK, as they are different tools, and
Vivado is explicitly for the HW parts of the design.

~~~
jlokier
I added one small Verilog file to a Vivado project.

It froze the IDE for _45 minutes_ before I could do anything else.

This was on a beefy machine at AWS too, not some cheap home desktop thing.

That wasn't compiling, no synthesis, P&R, nothing.

There was no giant netlist I'd been working on either. Most of the FPGA was
empty.

That was literally just adding a small source file which the IDE auto-indexed
so you could browse the contents.

In Verilator, an open source Verilog simulator, that same source file loaded,
ran its simulation, and checked its test results in less than a second. So it
wasn't that the file was hard to compile and expand.
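
For scale, the kind of thing Verilator chews through instantly is a bench like
this (an illustrative toy, not the project's code; assumes a recent Verilator
with --binary/timing support):

    // Build and run: verilator --binary tb.sv && ./obj_dir/Vtb
    module tb;
      logic clk = 0;
      logic [7:0] count = 0;

      always #5 clk = ~clk;                    // free-running clock
      always @(posedge clk) count <= count + 1;

      initial begin
        repeat (20) @(posedge clk);
        #1;                                    // let the last update settle
        if (count !== 8'd20) $fatal(1, "expected 20, got %0d", count);
        $display("PASS");
        $finish;
      end
    endmodule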

Vivado is excellent for some things, but unfortunately the excellence is not
uniform. On that project, I had to do most of the Verilog development outside
Vivado because it was vastly faster outside, only importing modules when they
were pretty much ready to use and behaviorally validated.

~~~
tails4e
That's definitely an anomaly. I use Vivado with ASIC code regularly, on very
large designs, and have not seen anything like this. I use Vivado to elaborate
and analyse code intended for ASIC use, as it's better than other ASIC tools
for that purpose. Once I'm happy with it in Vivado, then I push it through
Design Compiler, etc. Elaborating a design that takes 4 hours in DC synthesis
takes about 3 minutes in Vivado.

------
d_silin
I wonder if it is possible to add a (small) FPGA to a personal computer that
could accelerate specific software tasks (video/audio encoding, ML algorithms,
compression, extra FPU capacity) _on user demand_.

~~~
jeffreyrogers
The problem with this will be the overhead of transferring data to/from the
FPGA; once that's accounted for, doing the computation on the CPU often makes
more sense. For example, at the ~16 GB/s of a PCIe 3.0 x16 link, just moving
1 GB to the card costs about 60 ms before any computation happens. It's
obviously not a show-stopper, since GPUs have the same problem and are still
useful, but it's hard to find a workload that maps well to this solution.

~~~
derefr
In a DAW, accelerating a heavy VST plugin might make sense. But often those
are amenable to being translated to GPGPU code already.

I guess the one place where GPGPU-based solutions _wouldn't_ work is when the
code you want to accelerate is necessarily acting as some kind of Turing
machine (i.e. emulation of some other architecture). However, I can't think
of a situation where an FPGA programmed with the netlist for arch A, running
alongside a CPU running arch B, would make more sense than just getting the
arch-B CPU to emulate arch A; unless, perhaps, the instructions in arch A are
_very, very CISC_, perhaps with analogue components (e.g. RF logic, like a
cellular baseband modem).

------
retro_guy
Maybe you will find this article about Large-Scale Field-Programmable Analog
Arrays [FPAAs] interesting as well:
[https://hasler.ece.gatech.edu/FPAA_IEEEXPlore_2020.pdf](https://hasler.ece.gatech.edu/FPAA_IEEEXPlore_2020.pdf)

------
justicezyx
FPGAs are good at _nothing_ at a scale that can challenge non-configurable
silicon...

They are good at a lot of things at smaller scales, like general
prototyping/testing/simulation, telecom, special-purpose real-time computing,
etc.

The underlying logic is that FPGAs can never make things as flexible as
software, and flexible software always offsets the inefficiency of a non-
configurable chip. Just comparing FPGAs against CPUs/GPUs will never teach
FPGA vendors this reality; or perhaps they choose to ignore it after all...

~~~
GeorgeTirebiter
I believe you are incorrect. A counterexample to your claim is the increasing
use of FPGAs in the datacenter. And various AI engines are FPGA-based. You'll
do better with a CPU in real silicon; but a full-featured MPU with standard
peripherals plus an FPGA for unusual, must-be-fast functions is hard to beat.

~~~
justicezyx
Tell me how many users are using FPGAs, and why Xilinx is just a fraction of
Nvidia's market cap. Five years ago, Nvidia was 2× Xilinx in market cap; now
it's 10×.

------
inaccel
There are two main challenges to FPGA utilization:

- The first is FPGA programming. Designing your own accelerators is now much
easier using OpenCL and HLS than with VHDL/Verilog.

- The second is FPGA deployment and integration. Until now it was very
difficult to integrate your design with applications, to scale out
efficiently, and to share it among multiple threads/users. The main reason was
the lack of an OS layer (or abstraction layer) that would let you treat FPGAs
like any other computing resource (CPU, GPU).

This is why at inaccel we developed a unique vendor-agnostic orchestrator for
FPGAs. The orchestrator allows much easier integration, scaling, and resource
sharing of FPGAs.

That way we have managed to decouple the FPGA designer from the software
developer. The FPGA designer creates the bitstream, and the software developer
just calls the function they want to accelerate. No need to specify the
bitstream file, define the interface, or handle the memory buffer allocation.

And the best part: it is vendor and platform agnostic. The FPGA designer
creates multiple bitstreams for different platforms, and the software
developer couldn't care less. The developer just calls the function, and the
inaccel FPGA orchestrator magically configures the right FPGA for the right
function.

------
rwmj
_> Intel, AMD, and many other companies use FPGAs to emulate their chips
before manufacturing them._

Really? I'm assuming if this is true it can only be for tiny parts of the
design, or they have some gigantic wafer-scale FPGA that they're not telling
anyone about :-) Anyway I thought they mainly used software emulation to
verify their designs.

~~~
TomVDB
Many years ago, we had a custom-made board with 8 huge Xilinx Virtex 5 FPGAs
(the largest available at the time) to emulate a large SoC. Those FPGAs were
something like $20K apiece.

We had 10 such boards, good for millions of dollars in hardware, and a small
team to keep it all running.

These platforms were mostly used by the firmware team to develop everything
before real silicon came back. They could run the full design at ~1-10 MHz,
vs 500+ MHz on silicon or ~10 kHz in simulation.

After running for a while, that FPGA platform crashed on a case where a FIFO
in a memory controller overflowed.

Our VP of engineering said that finding this one bug was sufficient to justify
the whole FPGA emulation investment.

~~~
mindentropy
Boards with multiple FPGAs like that are generally from the Dini Group, right?
Fantastic boards.

Ref:
[https://www.dinigroup.com/web/index.php](https://www.dinigroup.com/web/index.php)

~~~
duskwuff
Dini's naming schemes are hilarious. They're all named like monsters in
B-movies -- their latest system, the DNVUF4A, is called "Godzilla's Butcher on
Steroids", for instance.

Also, Dini got acquired by Synopsys a few years ago.

~~~
mindentropy
Oh, I love their humor. There is always something humorous written about their
status LEDs.

 _" Although no specific testing was performed, sophisticated statistical
finite element models and back of the envelope calculations are showing the
number of status LEDs to be bright enough to execute dermatological procedures
normally done with CO2 lasers. Contact the factory for more information about
this sophisticated feature and make sure an adult is present during operation.
These LEDs are user controllable from the FPGAs so can be used as visual
feedback in addition to burning skin."_

 _" As with all of our FPGA-based products boards, the DNVUPF4A is loaded with
LEDs. The LEDs are stuffed in several different colors (red, green, blue,
orange et al.). There are enough LEDs here to melt cheese. Please don't melt
cheese without adult supervision. These LEDs are user controllable from the
FPGAs so can be used as visual feedback in addition to the gratifying task of
creating gooey messes."_

------
andromeduck
IMO the next big application for FPGAs is going to be serving as a
programmable DMA engine of sorts, with a bunch of hard logic like ALUs and/or
I/Os strewn about. Think hardware-accelerated SQL queries, malloc/free,
data-specific compressors, and the like.

------
Koshkin
I wonder what would be the advantages of using an FPGA to _test_ a CPU design
- compared to relying on a (presumably more accurate) computer-based
simulation. (I understand the reasons one might want to _implement_ a CPU in
an FPGA.)

~~~
dbcurtis
This idea is more than 30 years old. It has been done, and once upon a time
companies were built around it.

First off, mapping an entire CPU to an FPGA cluster is a design challenge in
itself. Assuming you can build an FPGA cluster large enough to hold your CPU,
and reliable enough to get work done on it, you have the problem of
partitioning your design across the FPGAs. Second problem: observability. In
a simulator you can probe anywhere trivially; with an FPGA cluster, you must
route the probed signal to something you can observe. (I am not even going to
talk about getting stimulus in and results out, since with FPGA or simulator
you have that problem either way; it is just different mechanics.)

The big problem is that an FPGA models each signal with two states: 1 and 0. A
logic simulator can use more states, in particular U or "unknown". All latches
should come up U, and getting out of reset (a non-trivial problem), to grossly
oversimplify, is "chasing the U's away". An FPGA model could, in theory, model
signals with more than two states. The model size will grow quickly.

Source: Once upon a time I was pre-silicon validation manager for a CPU you
have heard of, and maybe used. Once upon a time I was architect of a hardware-
implemented logic simulator that used 192 states (not 2) to model the various
vagaries of wired-net resolution. Once upon a time I watched several cube-
neighbors wrestle with the FPGA model of another CPU you have heard of, and
maybe used.

Note: What would 3-state truth tables look like, with states 0, 1, U? For
AND: 0 AND 1 is 0; 0 AND U is 0; 1 AND U is U -- etc. You can work out the
rest with that hint, I think.

Edit to add: Why are U's important? They uncover a large class of reset bugs
and bus-clash bugs. I once worked on a mainframe CPU where we simulated the
design using a two-state simulator. Most of the bugs in bring-up were getting
out of reset. Once we could do load-add-store-jump, the rest just mostly
worked. Reset bugs suck.
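
For the Verilog-inclined, four-state simulation gives a taste of this: 'x'
plays the role of U. A toy sketch (mine, not from any real project) of how a
missing reset shows up in a four-state simulator but would sail straight
through a two-state one:

    module reset_demo;
      logic clk = 0, rst_n;
      logic [3:0] good, bad;

      always #5 clk = ~clk;

      always @(posedge clk or negedge rst_n)
        if (!rst_n) good <= '0;        // reset chases the x's away
        else        good <= good + 1;

      always @(posedge clk)
        bad <= bad + 1;                // never reset: x + 1 = x, forever

      initial begin
        rst_n = 0; #12 rst_n = 1;
        #40 $display("good=%b bad=%b", good, bad);  // good=0100  bad=xxxx
        $finish;
      end
    endmodule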

~~~
jacquesm
> Reset bugs suck.

Indeed they do. And even if you have working chips, you get the next stage:
board-level reset bugs. An MC68K board I helped develop didn't want to boot;
a nasty side effect of a reset line that didn't stay at the same level long
enough meant the CPU didn't reset reliably when everything else did just
fine. That took a while to debug.

------
m3kw9
The thing with FPGAs is that companies, when faced with a cash and time
crunch, will opt to use an FPGA instead of designing an ASIC. The tools suck,
but companies will hire someone who will deal with them. FPGAs fit a very
particular constraint and still solve very specific problems efficiently.

------
bsder
The problem that FPGAs have is that they are only good for low-volume
solutions that require flexibility and have no power constraints.

That's a really narrow market. Telecom equipment and lab equipment, basically.

If I need volume, I need at least an ASIC. If I need to manage power, I need a
full custom design.

~~~
GeorgeTirebiter
Microsemi (now part of Microchip) makes some low-power FPGAs. Xilinx has made
the CoolRunner CPLDs for years, which are mighty low-power (they're not huge,
but are often big enough for some needed extra logic). (Another market that
doesn't care too much about power is the military.)

------
wwarner
This is really interesting. If a CPU hardware vulnerability like Spectre could
be repaired by patching an FPGA on the SoC, that would be incredible. That
type of functionality would overtake the entire cloud market in about 3 days.

~~~
rwmj
I'm afraid it doesn't work like this. That would only be possible if the chip
was using an FPGA fabric for the relevant parts of the design. For example if
the L1 cache was implemented as an FPGA you could in theory patch around L1TF.
But they wouldn't do that because it would be far slower/larger than
implementing it directly as an ASIC.

Or you might imagine a chip that has an FPGA on the side (I expected Intel
would ship this after acquiring Altera, but it never happened). But the FPGA
would somehow need access to the paths that caused the vulnerability, which is
highly unlikely, and it would also be really slow compared to what they
actually do, which is hacking around it with microcode changes.

~~~
duskwuff
> Or you might imagine a chip that has an FPGA on the side (I expected Intel
> would ship this after acquiring Altera, but it never happened).

They did: [https://www.anandtech.com/show/12773/intel-shows-xeon-
scalab...](https://www.anandtech.com/show/12773/intel-shows-xeon-scalable-
gold-6138p-with-integrated-fpga-shipping-to-vendors)

But I get the sense this part was aimed at a few very specific customers. It
required some PCB-level power delivery changes, so you couldn't even drop it
into a standard server motherboard.

------
PanosJee
inaccel.com is taking lots of steps to bring FPGAs to 2020:

Spark/k8s integration, abstraction of popular cores, Python APIs, serverless
deployments, etc.

