
How FPGAs work, and why you'll buy one (2013) - PascLeRasc
https://www.embeddedrelated.com/showarticle/195.php?1
======
striking
I remember this article. This is by _yosefk.

I love FPGAs. They're awesome for interacting at a low level with just about
any kind of hardware, at nanosecond-level latency.

But although _I_ bought one, I don't think most people will end up buying one,
unfortunately; the vast majority of the consumer stuff that we considered
putting on FPGAs in the past (graphics rendering, heavy math) is solved six
ways to Sunday by today's GPU. The FPGA also lacks an open-source (or even
shared-source) toolchain, there's no way to perform resource sharing, they
have code constraints that are physical limits, and so on.

Basically, FPGAs rock when used for chip design or for powering custom
prototype or small batch hardware. But they fall into a really narrow portion
of the market, because there are few cost-effective consumer-level solutions
that make use of FPGAs.

A shame, really. They're super cool.

~~~
BrainInAJar
You can always spend $40 and use this to play with one:
[http://www.clifford.at/icestorm/](http://www.clifford.at/icestorm/)

~~~
striking
Oh, wow. I have never seen this before. That's enormously useful and could be
a huge step towards getting FPGAs on the "maker" market. Thanks for the link!

------
ShinyCyril
Just a shout-out about the Icestorm project which provides a complete open-
source toolchain for Lattice's iCE40 FPGAs. It's mostly the work of one guy
(Clifford Wolf):
[http://www.clifford.at/icestorm/](http://www.clifford.at/icestorm/).

It's worth noting that arachne-pnr ([https://github.com/cseed/arachne-
pnr](https://github.com/cseed/arachne-pnr)) is written by a different person
(Cotton Seed).

------
jcr
Two previous discussions with lots of comments.

(109 points, 331 days ago, 107 comments)

[https://news.ycombinator.com/item?id=9388751](https://news.ycombinator.com/item?id=9388751)

(379 points, 999 days ago, 152 comments)

[https://news.ycombinator.com/item?id=5895672](https://news.ycombinator.com/item?id=5895672)

------
makomk
There's an important caveat that this is missing: yes, FPGAs can do a whole
bunch of multiply-accumulate operations at once, but each of those units
really wants to carry out the same multiply-accumulate operation every clock
cycle, with inputs coming from the same sources and outputs going to the same
destinations. With DSPs you can trivially execute a completely different step
of the computation every clock using the same multiplier, but on FPGAs you
either have to set up all your own control logic to multiplex the DSP units in
and out at the right time, or have one multiplier per stage of the computation
and inevitably lose efficiency because the rate you're feeding data in is
slower than the speed the FPGA can run at. And large FPGAs are relatively
expensive and, well, big: the bottom-of-the-range Artix-7 with 45
multiply-accumulate slices (enough for one 45-stage computation without
complex tricks) is $25, and the smallest package you can get it in is 1 cm².
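The throughput gap described above can be sketched with a toy cycle-count
model (hypothetical numbers, Python purely for illustration; a real design's
cost also depends on control logic and routing):

```python
# Toy model: N-stage multiply-accumulate chain on an FPGA.
# Option A: one dedicated multiplier per stage, fully pipelined.
# Option B: one multiplier time-shared (multiplexed) across all stages.

def pipelined_cycles(n_samples, n_stages):
    # One multiplier per stage: after the pipeline fills (n_stages cycles),
    # a new result emerges every clock cycle.
    return n_stages + n_samples - 1

def time_shared_cycles(n_samples, n_stages):
    # One multiplier multiplexed across stages: each sample must make
    # n_stages sequential passes through the single unit.
    return n_samples * n_stages

print(pipelined_cycles(1000, 45))    # 1044 cycles, but needs 45 DSP slices
print(time_shared_cycles(1000, 45))  # 45000 cycles with a single DSP slice
```

The model ignores the multiplexing control logic entirely, which is exactly
the part the comment says you have to build yourself on an FPGA.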

~~~
ginko
>And large FPGAs are relatively expensive

That's an understatement.

[http://www.findchips.com/search/xcvu9p](http://www.findchips.com/search/xcvu9p)

~~~
aristidb
Intel and NVidia also sell chips in this price range.

[http://ark.intel.com/products/84685/Intel-Xeon-
Processor-E7-...](http://ark.intel.com/products/84685/Intel-Xeon-
Processor-E7-8890-v3-45M-Cache-2_50-GHz) [http://www.amazon.com/Nvidia-
Accelerator-passive-cooling-900...](http://www.amazon.com/Nvidia-Accelerator-
passive-cooling-900-22080-0000-000/dp/B00Q7O7PQA)

Now, I guess you can get decent performance out of cheaper chips already, but
paying $5000 for an Intel CPU is hardly rare.

~~~
b1340276
You are ignoring that the 12-core Xeons are still relatively cheap at $2000,
which means it's still affordable to just buy four of them and put them in a
single server.

------
scottbez1
The thing that really got me excited about FPGAs was working with the
"HappyBoard" that was used for MIT's 6.270 robotics competition[1]: it's
similar to an Arduino (AVR microcontroller with pin headers broken out and
USB), but with most of its IO (sensors, motors) going through a Xilinx FPGA
rather than directly connected to the ATMega128.

What that meant was that the AVR, which was relatively slow at 8MHz, was free
to run all your logic (including multithreading!) while the FPGA handled most
of the timing-critical things, like motor PWM and sensor decoding.

But the coolest part was that you could dynamically modify the IO if you
needed different behavior. Instead of having to find, buy, and wire up a
standalone quadrature decoder IC to count axle rotations, I just had to write
some code [2] and suddenly the board had a high-speed quadrature decoder (or
several) built in!

The biggest roadblock is the toolchain though - even though [2] was a pretty
simple change, it took a long time to download and install the Xilinx ISE
tools, and they're not the easiest to use.

I would love to see a general purpose AVR+FPGA board with the FPGA toolchain
neatly packaged the way Arduino/Wiring has done for AVR. Verilog has a
somewhat steep learning curve, so you might want to hide that in some kind of
module system, with basic library modules like PWM, edge counting, or
quadrature decoding that can just be mapped to the pins you want. Maybe
something building on top of Icestorm?

[1] [http://spacecats.mit.edu/contestants/happyboard-
manual.pdf](http://spacecats.mit.edu/contestants/happyboard-manual.pdf)

[2]
[https://github.com/sixtwoseventy/joyos/commit/44975ea9bc64e5...](https://github.com/sixtwoseventy/joyos/commit/44975ea9bc64e507106fb2afda53e0a51002e498#diff-2cca5d619d44fd4f2b13b4440c410a03R1)

------
ErikZachrisson
To be frank, after having experience with both FPGA and software development:
if you can solve the problem in software, just do it.

The cost in time and complexity of solving a problem in hardware is almost
always larger.

When writing software you gain a level of abstraction. You don't have to deal
with sourcing special hardware (the FPGA), messing with proprietary, buggy
tools, or all the nitty-gritty of proper resets, multiple clock domains, and
timing closure, where each rebuild of your code might take multiple hours.
And don't get me started on writing HDL code.

I implemented an object tracker using an FPGA and a webcam, and there were
multiple hardware problems that needed to be handled: buffering data in the
SDRAM (designing an SDRAM memory controller), researching and implementing
image filters, and debugging why the camera image sometimes got corrupted
(the connection cables were too long).

Almost all of these problems would have disappeared by instead connecting a
USB webcam to my PC and using the OpenCV library.

~~~
sklogic
> The cost in time and complexity of solving a problem in hardware is almost
> always larger.

Why?

I have exactly the opposite feeling when I'm dealing with software - "if only
I could design my own cache here, how much easier would it be to get around
this bottleneck!"

~~~
nickpsecurity
Are you arguing against the claim that it's much harder to design hardware
than software to solve a problem? That's the main claim.

~~~
sklogic
No, I'm arguing against the universality of such a claim. There is a
performance/complexity threshold above which a hardware or hybrid solution is
way easier than a purely software one.

~~~
nickpsecurity
I can see for performance. Software seems better at managing complexity if
good structure is used. What example do you have where software is harder to
develop and test than hardware due to complexity?

~~~
sklogic
> Software seems better at managing complexity if good structure is used.

Not until you hit a deep semantic difference between your hardware and the
code you're trying to build.

For example, there are many things that are best expressed as a Network-on-
Chip: a pipeline of specialized CPU cores, each doing something simple. And
such a notion is not that easy to represent in pure software without a
gigantic performance penalty. A pair of a CPU core and a small piece of code
running on top of it can be far simpler than any piece of code written for an
existing but ill-fitting architecture.

A simple example of this would be a dedicated Forth processor plus a Forth
implementation. The two together are often much simpler than, say, a
standalone Forth implementation for x86.
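To make the "Forth core is tiny" point concrete, here's a toy Forth-style
stack machine sketched in Python (a hypothetical minimal word set, nothing
like the real J1 instruction encoding):

```python
# A minimal Forth-style interpreter: the entire execution model is a data
# stack plus a dictionary of words. This is the machinery a dedicated Forth
# CPU like the J1 implements directly in hardware.

def run(program, stack=None):
    """Execute a list of tokens against a data stack; return the stack."""
    stack = [] if stack is None else stack
    words = {
        "+":    lambda: stack.append(stack.pop() + stack.pop()),
        "*":    lambda: stack.append(stack.pop() * stack.pop()),
        "dup":  lambda: stack.append(stack[-1]),
        "swap": lambda: stack.extend([stack.pop(), stack.pop()]),
        "drop": lambda: stack.pop(),
    }
    for token in program:
        if token in words:
            words[token]()        # execute a defined word
        else:
            stack.append(int(token))  # anything else is a numeric literal
    return stack

# "3 dup * 4 +" computes 3*3 + 4.
print(run("3 dup * 4 +".split()))  # [13]
```

The dispatch loop is the whole machine; both sides of the debate below hinge
on whether building that loop in RTL is really simpler than targeting x86.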

~~~
nickpsecurity
I still doubt it's better at reducing complexity because that stuff is
horrific in RTL. I can express it with a parallel programming language with
ease. A parallel programming language with macros and automatic placement onto
cores/DSP's would be even easier. Describe it, compile, and it probably runs.
With hardware, I'm looking at several more layers of verification, testing,
material checks, and so on.

High-level synthesis tools on the other hand might make your claim true.
They're a combo of HW and SW languages.

"A simple example of such would be a dedicated Forth processor plus a Forth
implementation. They two together are often much simpler than, say, a
standalone Forth implementation for x86. "

Simpler for who? The person implementing both a Forth processor and a Forth
implementation? Or the person just implementing a Forth on x86? I've seen
Forth implementations in SW. They're trivial. Compilers, libraries, and
hardware do all the heavy lifting. I'll put it to the test, though, as I feel
you might have picked a good example.

J1 Forth CPU Verilog code
[http://excamera.com/files/j1demo/verilog/j1.v](http://excamera.com/files/j1demo/verilog/j1.v)

eForth assembler for Z80 & such in few assembly instructions
[http://www.figuk.plus.com/4thres/systems.htm](http://www.figuk.plus.com/4thres/systems.htm)

Forth interpreter in Ada (1985)
[http://www.forth.com/archive/jfar/vol3/no2/article7.pdf](http://www.forth.com/archive/jfar/vol3/no2/article7.pdf)

Maybe it's just that I don't know Verilog. However, the Verilog code looks
more like the assembler in the second link. It's like a pile of gibberish. The
Ada code is very straightforward. I bet the debugging was easier, too. So, I'm
still not sold on hardware reducing complexity versus a software
implementation.

I'm 100% in agreement on improving performance, especially where a semantic
mismatch occurs. Hell, that's what inspired me to ask you and some others to
evaluate those two or three HLS tools so I could start cranking out
accelerators if one was legit. ;)

------
dkopi
FPGAs are awesome. They help solve problems that software is too slow to
solve, but aren't large enough in scale for ASICs. They're also great for
prototyping new chips before actually shelling out the hundreds of thousands
of dollars NRE for a new ASIC design.

Here's a great stackexchange discussion on why and how FPGAs can outperform
CPUs for certain tasks:
[http://electronics.stackexchange.com/questions/101472/how-
ca...](http://electronics.stackexchange.com/questions/101472/how-can-an-fpga-
outperform-a-cpu)

------
kutkloon7
In the projects where I used them at university, I always found them quite
hard to work with.

While the idea of programming them with VHDL or Verilog is quite
straightforward, there were a lot of bumps in the road to getting a working
design: specialized tools that generate reports going on for pages and pages
(our team was completely unable to do anything about the longest-delay path),
complicated clock-related information, and having to work with vague
documentation and licensed intellectual property.

FPGAs are really nice when they work, but I feel like they're just too
complicated for developers to get started with on their own at the moment.

~~~
makomk
Yeah, they're a bit of a dark art. They require hardware-design tricks like
pipelining for best performance; you can run into fun stuff like
routing-fabric congestion, which absolutely kills performance and is in turn
affected by things like the layout the tools choose and whether you're using
cheaper or more expensive FPGAs (which, at least for Xilinx, have very
different balances of routing resources); etc. I've seen timing failures due
to signals having to divert around the clock generators at the edge of the
chip, which were effectively a wedge with no routing fabric between two
sections of logic.

------
gumby
This is a nice intro, but the author forgets another downside of FPGAs: power
consumption. The hard cores help, but in the limiting case you end up with an
ASIC.

