
An FPGA Is an Impoverished Accelerator - samps
http://homes.cs.washington.edu/~asampson/blog/fpga.html
======
bsder
What a useless article.

The big problem is software people thinking that they have any concept of
actual hardware design.

If they understood hardware, they would understand that an FPGA is the least
efficient way to accomplish anything.

Routing is sparser than on any dedicated chip. You burn 10-100x the transistors
to do the same task. FPGAs are hot and slow.

Even for signal processing, an FPGA is going to be quite hard pressed to beat
a 2.0GHz ARM with NEON extensions unless it is _very_ expensive and your
algorithm is very dataflow oriented. How many ARMs can I put on a board for
$10,000-$100,000 (the price of the very highest-end FPGAs)?

You use an FPGA because you have a low-volume application that you can't do
any other way, and your application has enough margin that you can eat the
cost of the FPGA. And you are always looking to wipe out that FPGA and replace
it with a microprocessor because it is so much cheaper and easier to deal
with.

~~~
analognoise
"FPGA is the least efficient way to accomplish anything" Define 'efficient'.
If you're talking about cost, it costs far less than an ASIC below a certain
volume, and it certainly costs less in tooling and development.
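
The volume-crossover argument can be sketched with toy numbers. Every dollar
figure below is an illustrative assumption, not a real quote:

```python
# Hypothetical break-even sketch: FPGA vs. ASIC total program cost as a
# function of volume. The ASIC pays a large fixed NRE (masks, tooling,
# verification) for a cheap per-die cost; the FPGA is the reverse.

ASIC_NRE = 2_000_000   # assumed mask set + tooling + verification cost
ASIC_UNIT = 10         # assumed per-die cost at volume
FPGA_NRE = 50_000      # assumed development cost (no masks needed)
FPGA_UNIT = 300        # assumed per-device cost

def total_cost(nre, unit, volume):
    """Total program cost: fixed NRE plus per-unit cost times volume."""
    return nre + unit * volume

def break_even():
    """Smallest volume at which the ASIC becomes cheaper overall."""
    volume = 1
    while total_cost(FPGA_NRE, FPGA_UNIT, volume) <= total_cost(ASIC_NRE, ASIC_UNIT, volume):
        volume += 1
    return volume
```

With these made-up numbers the FPGA wins below a few thousand units; real
crossover points depend entirely on the process node and device family.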

"FPGA's are hot and slow" \- compared to a full custom IC? Sure. FPGA's
improve with every process generation (like all silicon devices) and an ASIC
design won't intrinsically take advantage of those advances; an FPGA design
that didn't meet it's power or thermal envelope 5 years ago might easily do so
now, without incurring the NRE of the ASIC - add to that the fact that the
first stage of the ASIC design can be prototyped via the FPGA, and you have a
viable product without the risk of a bad ASIC.

I've been involved in converting several Virtex-2 designs to newer devices -
the huge reduction in power and increase in available logic has led to some
extremely impressive gains. There is work to do in such a conversion, but it
is understood work - there's no real mystery to updating the CoreGen
components.

Agree it is a useless article though, because digital logic design is not
programming (it is architectural work). There is no 'abstracting' that away -
all attempts thus far have failed miserably (Vivado HLS, for example, turns
out designs that work but are HUGE compared to what even a passable designer
can do).

~~~
aninhumer
>There is no 'abstracting' that away

While hardware design often has awkward constraints that make generalised
abstractions tricky, there is still a _lot_ that can be done to improve over
Verilog or VHDL. I've been working in Bluespec for a couple of years now, and
the difference is night and day. Having a modern type system in our HDL makes
experimentation and iteration so much easier.

------
thisrod
I've heard several computational physicists make this complaint to NVIDIA
sales reps. The standard response, which I'm sure is correct, goes as follows.

Designing a fast processor is very expensive, far beyond the means of the
research community. The only way anyone can afford it is to sell millions of
the things to gamers. To put $1 of special hardware on your numerical card, we
have to put it on 1000 graphics cards too, so you'd have to pay $1000 for it.
Bad luck: scientists are destined to hack hardware that was designed for
larger markets.

~~~
jjoonathan
Yet somehow AMD manages to consistently offer better hardware (wrt double
floating point performance) for a lower price. I'm sure it's because the fine
folks at AMD are silicon wizards and not because of NVIDIA's cozy monopoly
position due to shrewd marketing of CUDA + their early-mover advantage in
academic markets.

~~~
duaneb
The tools are more valuable than raw performance, which can be bought with
time and money.

~~~
jjoonathan
Absolutely true, but there's better overlap between tooling required for the
game industry and tooling required for academic compute than there is between
the respective hardware (double vs single (or lower) float performance).

~~~
raverbashing
Well, buy AMD then...

But yeah, for games, maybe FP is going to make more sense than fixed-point
over time, as things like ray tracing begin to be used.

~~~
Tuna-Fish
Games are basically 100% floating point on the graphics side already. Even the
color shading is done on floating-point quantities in modern engines.

The problem isn't that games are not FP, it's that for games, 32-bit precision
is good enough, and for most problems actually way more than they need.

~~~
raverbashing
Yes, it's hard to justify a 48-bit or 64-bit FPU when you can instead have
roughly 1.5x or 2x the number of computing units.

Still, "not as fast as we wanted" is a "modern researcher problem" ;) Some
years ago they would have been converting it to run in integers so that it's
not unbelievably slow.
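
The integer-conversion trick alluded to here is classic fixed-point
arithmetic. A minimal Q15 sketch (the format choice is mine, purely
illustrative):

```python
# Toy Q15 fixed-point arithmetic: the classic way to run "float" math on
# integer-only hardware. 15 fractional bits represent values in [-1, 1).

Q = 15            # number of fractional bits
ONE = 1 << Q      # the value 1.0 encoded in Q15

def to_q15(x):
    """Encode a real number in [-1, 1) as a Q15 integer."""
    return int(round(x * ONE))

def from_q15(n):
    """Decode a Q15 integer back to a float."""
    return n / ONE

def q15_mul(a, b):
    """Multiply two Q15 numbers: full-width integer product, then rescale."""
    return (a * b) >> Q
```

For example, `q15_mul(to_q15(0.5), to_q15(0.25))` decodes back to 0.125, with
all intermediate work done in plain integer multiplies and shifts.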

------
gioele
> FPGAs are legacy baggage in the same way that GPGPUs are.

I hoped the author would expand on this point.

It is also my impression that GPGPUs are just "a hack": they should have been
normal coprocessors to the main CPU, just like the FPU and the vector units
are. It seems that now we are finally reaching that model (in Linux the
graphics device is almost completely separated from the computational device,
although they are on the same physical device most of the time), but we are
still far from the "coprocessor extension" opcode space of MIPS processors or
the "brain and arms" of CELL (1 generic CPU, many specialized coprocessors).

------
nullc
FPGAs would be more attractive if they weren't so overpriced... good thing
that patents are around to almost completely eliminate competition in that
space.

~~~
Alphasite_
Or, from the other view, patents reduce entering the market from an extremely
lengthy and risky R&D venture into a known fee, which you can account for and
which drastically lowers risk.

------
retroencabulato
I wish he would comment more on what he finds wrong with HDLs.

I fail to understand why using an HDL for a digital ASIC is fine, but using
one for an FPGA in the context of acceleration is not.

~~~
jjoonathan
He's annoyed that the HDL doesn't describe the entirety of a typical FPGA
"program". Some things just can't be emulated efficiently by the FPGA fabric
(or they are common enough patterns that it would be wasteful to do so), and
the workaround that has become the de facto standard is to include "ASIC
chunks" in the middle of all the programmable gates. For instance, you might
have a serial output that runs at 10Gbps while the rest of the FPGA runs at
500MHz. To bridge the gap between the slower programmable logic and the fast
transceiver you need a shift register. The way you specify this in code is by
importing a vendor-specific "library" -- except it's not really an HDL library
at all, it's a black box that the proprietary back-end hooks up to the
"ASIC chunks" at compile time.

It's like compiling against a binary library, except that the binary isn't
another piece of software, it's an etched pattern on your FPGA's wafer. Even
if you did have the "source code" it wouldn't do you any good unless you have
a foundry in your backyard :-)
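
A behavioral sketch of the gearing that shift register performs, assuming the
20:1 ratio implied by the numbers above (10 Gbps / 500 MHz); the hard macro
does this in silicon, not in fabric:

```python
# Behavioral model of a parallel-to-serial "gearbox": the fabric hands the
# transceiver one 20-bit word per slow clock, and the hard macro shifts the
# bits out at the serial line rate. The 20:1 ratio is 10e9 / 500e6.

GEAR_RATIO = 20  # serial bits per parallel word (assumed from the example)

def serialize(words, width=GEAR_RATIO):
    """Shift each parallel word out MSB-first as a flat stream of bits."""
    bits = []
    for word in words:
        for i in reversed(range(width)):
            bits.append((word >> i) & 1)
    return bits

def deserialize(bits, width=GEAR_RATIO):
    """Reassemble the serial bit stream into parallel words (the RX side)."""
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for b in bits[start:start + width]:
            word = (word << 1) | b
        words.append(word)
    return words
```

The point of the black-box primitive is precisely that this trivial-looking
loop cannot run at 10GHz in programmable fabric, only in a dedicated circuit.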

I'm skeptical of the calls for a higher level of abstraction. How are you
going to abstract away the fact that the FPGA has exactly 2 embedded memory
controllers that have precisely A, B, and C inputs and X, Y, and Z outputs?
Either you come up with a solution that's effectively just as ugly as what we
have now (because it exposes the FPGA's resources explicitly) or you come up
with a solution that hides these details and as a result becomes enormously
fragile because it's easy to accidentally change something that prevents the
compiler from inferring which embedded ASIC chunk you meant to use. You need
to be aware of limitations to work within them, and the limitations seem to be
stuck with us for the foreseeable future.

~~~
reacweb
The same way we have 3D printers now, I dream of having a foundry in my
backyard.

~~~
jjoonathan
Me too, brother. Me too.

------
socceroos
Are there any attempts out there to build a better open standard than FPGA?
I'd be interested to look into them if there were.

~~~
minthd
I believe Menta licenses FPGA cores (LUT architectures). But FPGAs have so
much that isn't LUTs, and that is critical for performance.

------
sklogic
Yes, the RTL level of abstraction is way too low, even for most ASIC work.
Yes, we need higher-level HDLs (more abstract than the aforementioned Chisel
and Bluespec). I'm working on it, stay tuned.

But what I cannot get from this article is what exactly is wrong with the
current FPGA designs. They've got DSP slices (i.e., ALU macros), they've got
block RAMs and all the routing facilities one can imagine. For the dataflow
stuff it's more than enough.

Of course it would have been much better if the vendors published the detailed
datasheets for all the available cells and the interconnect, for the bitfile
formats, etc. - to make it possible for the alternative, open source
toolchains to appear. Yes, their existing toolchains are, well, clumsy. But it
is still quite possible to abstract away from the peculiarities of these
toolchains.

~~~
minthd
Best of luck for your project. I'm curious about it and I'll wait. Instead
I'll ask: what is your opinion regarding embedded/MCU software tools? Do you
see something better than Rust that can automate the dev process?

~~~
sklogic
Thanks. I've been a Forth fan, but recently, looking at the advances in
static code analysis, I suspect that higher-level languages have a chance to
become very useful in resource-limited MCU environments too. Rust is a nice
attempt, and there is also a possibility that something doing proper region
analysis can kick in (looking at languages like Harlan, I would not say it's
impossible).

