
FPGA Design for Software Engineers - srjilarious
https://www.walknsqualk.com/post/014-tiny-fpga-bx/
======
dopeboy
Ten years ago, in grad school, I co-wrote a video conferencing module in VHDL
[0]. I haven't touched HDLs since but here's what I remember very clearly from
that project:

* It took > 3 minutes to compile our code.

* DMA made a huge performance difference once we figured it out.

* Realizing that we had to be one with the clock tick took a lot of time. Understanding synchronous programming (if that's the term) was a paradigm shift for my partner and me.

* The utter delight when we got frames streaming across the wire. The performance (though over a LAN) was silky smooth and you could tell immediately this was different than the run-of-the-mill x86 desktop program.

[0] -
[http://www1.cs.columbia.edu/~sedwards/classes/2009/4840/repo...](http://www1.cs.columbia.edu/~sedwards/classes/2009/4840/reports/RVD.pdf)

~~~
MagnumPIG
Your PDF link is a 404.

~~~
rrss
Maybe fixed? Not a 404 for me anyway.

------
xvilka
The problem with FPGAs is their proprietary nature, and Verilog/VHDL are far
from the best languages. Thankfully, there are a number of open-source projects
aiming to close this gap - Yosys[1], SymbiFlow[2], Chisel3[3]/FIRRTL[4]. Some
time ago I suggested[5] that the different open source projects should unite
and reuse a common intermediate language, akin to LLVM in many software
development and analysis tools. From my point of view, FIRRTL is the best
designed one, though its being implemented in Scala is a huge problem,
especially for projects written in C/C++/etc. Hopefully, there will be more
collaboration one day, either through a reimplementation from scratch (e.g. in
Rust or C++) or by using Scala Native.

[1] [https://github.com/YosysHQ](https://github.com/YosysHQ)

[2] [https://symbiflow.github.io/](https://symbiflow.github.io/)

[3] [https://www.chisel-lang.org/](https://www.chisel-lang.org/)

[4] [https://www.chisel-lang.org/firrtl/](https://www.chisel-lang.org/firrtl/)

[5]
[https://github.com/SymbiFlow/ideas/issues/19](https://github.com/SymbiFlow/ideas/issues/19)

~~~
alexhutcheson
If you're just learning, I'd _highly_ recommend that you stick to the
toolchains supported by the manufacturer of the chip you're using. The various
open source toolchains are very cool, but they have sharp edges and
limitations that won't be obvious to a beginner. Many of them also focus on
high-level synthesis, which isn't good for beginners, because you really need
to learn the ins and outs of digital logic to be able to debug the output of a
high-level synthesis tool.

------
lnsru
I am curious, why System Verilog isn’t mentioned in this article. It is much
much better than Verilog and is used in industrial applications.

What I am missing are 2 topics: timing analysis and debugging. Static timing
analysis and proper timing constraints are crucial for a functional design.
Without constraints, no tool can tell a slow signal toggling an LED every 10
seconds from a DDR3 533 MHz differential clock line.

Debugging an FPGA design cannot be avoided. Even if the design works in the
simulator, it very often fails on real hardware. And then the real fun starts.
Xilinx has the Integrated Logic Analyzer (formerly ChipScope), Intel has Signal
Tap. Other vendors have their own similar tools. These are the FPGA designer’s
best friends. But these tools can’t be fully trusted; they sometimes break the
design in unexpected ways. The designer must develop a gut feeling for what’s
happening. Synthesis of a design with an integrated logic analyzer takes much
longer than a regular one. Debugging cycles are insanely long. Forgetting one
signal can mean waiting 2 hours, so add them all at the very beginning.

Writing hardware description language is the easy part. As somebody already
mentioned, everybody can count ones and zeros. The first problem every software
engineer encounters is simple: there is nothing to program. FPGA design is
about describing your system with these ancient languages. The second problem
is the bulky toolchain. It’s more than compile and debug buttons. In fact,
there is a huge machine processing code into a bitstream. Its complexity
naturally takes time to understand, but you don’t need to be smart to be an
FPGA designer.

~~~
q3k
> I am curious, why System Verilog isn’t mentioned in this article. It is much
> much better than Verilog and is used in industrial applications.

Probably because it has very limited support in open source tooling, same as
VHDL.

~~~
orbifold
That is not true: Verilator is an excellent tool that compiles a large subset
of synthesizable SystemVerilog to C++. Companies like Tesla use it and report
speedups of up to 40x compared to commercial tools. Similarly, for VHDL there
is GHDL, which seems pretty feature complete and has an LLVM backend.

~~~
q3k
Right, but these are simulators, while the article is looking for a language
to both simulate and synthesize using open source tooling.

Yosys (which AFAIK is still the only non-toy open source synthesis tool)
supports a _very_ small subset of SV, and does not support VHDL (at least in
the open source version).

------
omeze
This is a pretty nice tutorial! My courses in FPGA design in school taught me
a ton about 1) concurrency and 2) good state machine design. In modern backend
web development these topics receive so little attention (from interviewing
all the way to writing technical specs, I've rarely encountered these topics
brought up explicitly) but they are important. I was a bit hesitant when this
guide suggested using C++, since I tend to dislike mixing traditional languages
with hardware languages, but I realized it was just for testbenches, which is very
reasonable (and even VHDL exposes things like `for` loops that are really only
useful for testing and meaningless otherwise - sans some special cases[1]).

[1] You can abuse some imperative paradigms to implement things like Conway's
Game of Life as a systolic array -
[https://en.wikipedia.org/wiki/Systolic_array](https://en.wikipedia.org/wiki/Systolic_array)

~~~
scott_wilson46
I disagree about for loops: you actually end up using these quite a lot in
VHDL/Verilog (with an understanding of what logic you are going to end up
with) if you want to do the same operation on multiple things:

    
    
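      // Module name and default parameter value are illustrative.
      module multi_mult #(
        parameter NUM_OF_MULTIPLIERS = 4
      ) (
        input      [NUM_OF_MULTIPLIERS*32-1:0] a_in,
        input      [NUM_OF_MULTIPLIERS*32-1:0] b_in,
        output reg [NUM_OF_MULTIPLIERS*64-1:0] mult_out  // reg: assigned in always block
      );
        reg [31:0] tmp_a, tmp_b;
        reg [63:0] tmp_mult;
        integer i;  // loop index, resolved when the loop is unrolled
    
        // Combinational logic: the loop unrolls into NUM_OF_MULTIPLIERS multipliers.
        always @(*) begin
          mult_out = {(NUM_OF_MULTIPLIERS*64){1'b0}};
          for (i = 0; i < NUM_OF_MULTIPLIERS; i = i + 1) begin
            tmp_a = a_in >> (i*32);   // truncated to the i-th 32-bit slice
            tmp_b = b_in >> (i*32);
            tmp_mult = tmp_a * tmp_b; // 32x32 -> 64-bit product
            mult_out = mult_out | (tmp_mult << (i*64));
          end
        end
      endmodule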
    

This would give you NUM_OF_MULTIPLIERS multipliers. If you wrote each multiply
out, it would be more code and also wouldn't allow you to parametrize the code.
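
For readers more at home in software, here is a rough Python model of what that loop computes. It is only a sketch: `multi_mult`, `pack`, and `lanes` are hypothetical names chosen for illustration, and Python integers stand in for the packed bit vectors. In hardware every lane exists at the same time; the Python loop merely visits them one after another.

```python
MASK32 = (1 << 32) - 1


def pack(values, width):
    """Pack a list of unsigned values into one integer, `width` bits per slot."""
    out = 0
    for i, v in enumerate(values):
        out |= (v & ((1 << width) - 1)) << (i * width)
    return out


def multi_mult(a_in, b_in, lanes):
    """Software model of the unrolled loop: one 32x32 multiply per lane."""
    mult_out = 0
    for i in range(lanes):
        tmp_a = (a_in >> (i * 32)) & MASK32   # i-th 32-bit slice of a_in
        tmp_b = (b_in >> (i * 32)) & MASK32   # i-th 32-bit slice of b_in
        tmp_mult = tmp_a * tmp_b              # product fits in 64 bits
        mult_out |= tmp_mult << (i * 64)      # place it in the i-th 64-bit slot
    return mult_out
```

For example, `multi_mult(pack([3, 5], 32), pack([7, 11], 32), 2)` yields the same value as `pack([21, 55], 64)`.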

~~~
Traster
The key is that for loops are essentially pre-processor macros (like C) so
they must have a fixed number of iterations known at compile time. So yes, you
have a for loop, but it's very different to what you expect from a for loop in
software.

~~~
kingosticks
Yes, the key is that loops are always unrolled so the number of iterations
(number of copies of the hardware) is fixed. But whether the output of each
iteration is used or not can be entirely dynamic, potentially resulting in
something very similar to a loop in software.
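
A small Python sketch of that idea, under stated assumptions: `MAX_ITEMS` and `sum_first_n` are made-up names, and the conditional expression stands in for a mux in front of each adder. The number of loop steps never changes, just as the number of hardware copies is fixed, but a runtime value decides which copies contribute to the result.

```python
MAX_ITEMS = 8  # fixed up front: the number of "hardware copies"


def sum_first_n(values, n):
    """Sum the first n entries using a fixed number of unrolled steps."""
    if len(values) != MAX_ITEMS:
        raise ValueError("expected exactly MAX_ITEMS values")
    total = 0
    for i in range(MAX_ITEMS):      # always MAX_ITEMS steps, never fewer
        enabled = i < n             # runtime signal: does this copy contribute?
        total += values[i] if enabled else 0
    return total
```

Calling `sum_first_n([1, 2, 3, 4, 5, 6, 7, 8], 3)` behaves like a software loop that stops after three items, even though all eight steps are always evaluated.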

------
jeremycw
I wrote Tetris in Verilog that can output to VGA 10 years ago for a university
project. The code is here:
[https://github.com/jeremycw/tetris-verilog](https://github.com/jeremycw/tetris-verilog)
for anyone interested. Comments are sparse but it might be interesting for
anyone looking for some example code that's relatable.

------
analog31
I recently got a TinyFPGA-BX, and have been slowly working through the
tutorials. The amusing thing is that, for the actual applications I'm working
on, a contemporary microcontroller can actually keep up just fine, and is
easier for me to comprehend. Still, one of these days, a use for an FPGA
will crop up for which I'll be glad that I learned.

~~~
cowbellemoo
I picked up a TinyFPGA BX to make a VU-meter with strips of neopixels for a
Halloween project (keystep+volca keytar with lots of reactive lights). You can
do this with microcontrollers but I wanted to stretch myself and see if I
could get a crazy-responsive 7-band meter working. I'm like 95% of the way
there after several months learning Verilog, testbenches, how a few modules
off GitHub work, the I2C protocol, and how to use a logic analyzer -- but I'm
stuck trying to get an ADS1115 to do one-shot conversions reliably and probably
have to implement the VU-meter with an Arduino to get it done for Halloween.
It's absolutely thrilling to be working with nanosecond-scale operations and
totally parallel design though.

~~~
rough-sea
Sounds like a fun project.

You might find this video helpful:
[https://www.youtube.com/watch?v=us2F8wAncw8](https://www.youtube.com/watch?v=us2F8wAncw8)

I think his design is very interesting, showing how to mix custom peripherals
with picosoc so you can get very good response but also be able to program in
C.

------
kpmcc
For people who are curious about FPGAs looking to dip their toes in, I’d
highly recommend taking a look at cocotb
[https://github.com/cocotb/cocotb](https://github.com/cocotb/cocotb)

It’s kind of similar to Verilator, in that it lets you write test benches for
your designs in programming languages as opposed to HDL. Whereas Verilator
lets you write C++, cocotb is Python based.

Both of these are probably best to take up after spending some time with an
HDL, so you learn to think from a hardware perspective.

Also check out the ZipCPU blog.

------
diarmuidc
That's a good intro. What I always have to remind software guys who switch to
FPGA design is to remember that each line you write is eventually going to end
up in hardware.

That means when you write something you should have some understanding of the
underlying hardware inferred. So is this going to give me some combinatorial
logic, a register or a RAM? It's very easy to keep the software mindset of "if
it compiles, then it's good".

------
vvanders
Great article, I wish the discussions around clocks had gone a bit more into
how the tradeoff of pipelining vs longest operation ends up impacting designs.
That and SRAM vs DRAM access latencies were the things that really connected
the dots from how performance optimization on the software side of things is
rooted in physical hardware limitations.

~~~
bogomipz
Might you or anyone else have some links or references you could share on
these two topics? Was there a specific book that helped connect the dots that
you could recommend?

~~~
mng2
Here's a greatly simplified example. Let's say you're trying to calculate y =
mx + b in your FPGA. You want this operation to run at 100 MHz. Great, you
write the code, synthesize and implement. Uh oh, the tools report that your
design has failed timing analysis. What now?

Looking at the output of the tools, they'll say something like "x to y setup
time: -2 ns slack". That means your desired operation can't meet the 10 ns
clock period; it actually takes 12 ns for all the logic to ripple through. So
now what?

You can break up the operation into two steps. Let's say the multiplication
takes 8 ns, and the addition takes 4 ns. In timestep 1 you do z = mx, and
pipeline c = b. Then in timestep 2 you do y = z + c. This way your operation
takes two clock cycles = 20 ns total in terms of latency, but you can maintain
a rate of 100 MHz.

Alternatively, you could choose a slower clock rate, say 75 MHz, and have a
clock period of 13.333 ns. Then you would be able to meet the logic delay
requirements in one cycle.

Again this is greatly simplified but it's similar to what one ends up doing in
real FPGA designs. At the beginning you're usually trying to achieve maximum
performance. Then later on you add more features to the FPGA, only to find
that in doing so, you've caused an existing portion of the design to fail
timing, so you need to twiddle things around.
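
The arithmetic in this example can be checked with a few lines of Python. This is a simplified sketch that uses only the numbers given above: `slack_ns` is a hypothetical helper, and it ignores register setup time, clock-to-output delay, and routing, all of which real timing analysis accounts for.

```python
def slack_ns(clock_mhz, logic_delay_ns):
    """Setup slack = clock period - logic delay; negative means timing fails."""
    period_ns = 1000.0 / clock_mhz
    return period_ns - logic_delay_ns


MULT_NS = 8.0  # multiplication delay from the example
ADD_NS = 4.0   # addition delay from the example

# Everything in one cycle at 100 MHz: 10 ns period, 12 ns of logic.
single_cycle_slack = slack_ns(100, MULT_NS + ADD_NS)

# Two pipeline stages at 100 MHz: the slowest stage sets the limit.
pipelined_slack = slack_ns(100, max(MULT_NS, ADD_NS))
pipelined_latency_ns = 2 * (1000.0 / 100)

# One cycle at a slower 75 MHz clock: about 13.333 ns period.
slow_clock_slack = slack_ns(75, MULT_NS + ADD_NS)

print(single_cycle_slack, pipelined_slack, pipelined_latency_ns, slow_clock_slack)
```

The single-cycle case comes out 2 ns short, the pipelined version has 2 ns to spare per stage at the cost of 20 ns latency, and the 75 MHz option leaves roughly 1.33 ns of margin.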

------
sideshowmel
I've been looking for something like this for a while. It's hard to break into
FPGA design coming from a Software Engineering viewpoint, but I think it
teaches how the machine REALLY works and can produce better Software
developers.

~~~
danharaj
I found Digital Design by Mano and Ciletti to be a nice introduction to the
basic building blocks of digital circuits. Four weeks later I'm studying
Laplace transforms and transfer functions, please send help.

~~~
rrss
Good on you, every idiot can count to one.

[https://en.wikipedia.org/wiki/Bob_Widlar#Fairchild_Semicondu...](https://en.wikipedia.org/wiki/Bob_Widlar#Fairchild_Semiconductor_\(1963%E2%80%931965\))

~~~
markrages
This audience is software developers. They will not have an appreciation for
Widlar's genius.

~~~
sumnole
Where do all the EEs hang out?

------
Vysero
If any of you are interested in FPGAs after reading the article, I highly
recommend:

Introduction to Logic Circuits & Logic Design with VHDL by Brock J LaMeres

He was my instructor in college and the text itself is extremely helpful for
all things FPGA.

------
denton-scratch
Thank you, that was interesting. I've programmed microcontrollers, but I've
never tangled with ASICs or FPGAs. Too scared, I guess.

Your article has reduced my fear (and also, you mentioned the price of that
Tiny FPGA board; at that price point, I don't mind too much if the magic smoke
gets out).

------
SemiTom
FPGAs are getting larger, more complex, and significantly harder to verify and
debug:
[https://semiengineering.com/fpga-design-tradeoffs-getting-mo...](https://semiengineering.com/fpga-design-tradeoffs-getting-more-difficult/)

------
inamberclad
I'm surprised that no one has brought up the Lattice IceStick. It's an
oversized USB stick with an ice40 FPGA on it and it's the cheapest option I've
seen.

~~~
tverbeure
The cheapest options with a much larger amount of logic are the Upduino v2.1
($20) and the never-dying EP2C5T144 board ($10).

The latter requires a USB blaster dongle to program things so the Upduino has
the upper hand IMO, especially because it also has very lightweight open
source tools.

------
HNLurker2
Wow this is a gift from God. I was doing fpga for my CV and this just made my
life a lot easier

------
nan0
Cool! Back in college we used fpgas with verilog! The labs always took some
finagling to get working, but still very fun!

------
shaklee3
Small error in the article: 200*10e6 is 2GHz, not 200MHz.
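
The slip is easy to make because `10e6` reads like "ten to the sixth" but means 10 × 10^6. A quick check in Python (the variable names are just for illustration):

```python
clock_hz_as_written = 200 * 10e6   # 10e6 is 10 * 10**6, i.e. 1e7
clock_hz_intended = 200 * 1e6      # 1e6 is 10**6

print(clock_hz_as_written / 1e9, "GHz")   # the value as written
print(clock_hz_intended / 1e6, "MHz")     # the intended value
```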

~~~
srjilarious
Ha, good eye. corrected to 1e6 :)

~~~
topspin
Nice work. FPGA design appears to be _very_ similar to GPU shader programming.
First time I've read anything about FPGA design that connected. Usually FPGA
stories get lost in data flow jargon and I learn nothing.

~~~
lnsru
There is no programming in FPGA at all. You describe your hardware using
hardware description languages like VHDL or Verilog.

~~~
rowanG077
Then no programming exists at all. When writing C code you are describing a
program that runs on the C abstract machine. The same thing holds for all
"programming" languages.

~~~
lnsru
Sorry, I am not ready for a philosophical discussion. We can take the
definition from Wikipedia:
[https://en.m.wikipedia.org/wiki/Computer_programming](https://en.m.wikipedia.org/wiki/Computer_programming)
Programming involves code execution on a computer. There is no computer in an
FPGA.

~~~
rowanG077
Then why did you start the discussion? An FPGA is a computer just as well as
any CPU.

~~~
lnsru
Technically, an FPGA is a piece of memory. The functionality of the FPGA device
depends on how the bits in this memory are set. The size of this memory is
constant. This information is not public; brave hackers are working hard to
reverse engineer it. You can make a CPU in an FPGA, but there is no way to do
the opposite performance-wise: a complex simulation with a couple of 4k
resolution pictures takes days.

Edit: the people here are decent enough to start a discussion.

~~~
rowanG077
Yes, an FPGA is a different computer than a CPU. It's still a computer though.
This is the definition of a computer:

> An electronic device for storing and processing data, typically in binary
> form, according to instructions given to it in a variable program. [1]

It fits an FPGA perfectly.

[1]
[https://www.lexico.com/en/definition/computer](https://www.lexico.com/en/definition/computer)

~~~
lnsru
There is a term “variable program” in your link. When you add peripherals to
the chip on the printed circuit board, it loses flexibility very fast. The
whole system is made for a very specific task. But yes, you convinced me that
an FPGA might be treated as a computer in an extreme case.

~~~
rowanG077
An FPGA accepts "variable programs". What you are talking about are peripherals.
A CPU with certain peripherals can also be completely inflexible. That is
completely outside the scope of what a CPU or an FPGA is though.

------
omgtehlion
little nitpick:

it's "combinational circuits", not "combinatorial" (that's a whole other part
of math)

~~~
kingosticks
Do you have a reference for that? I'm genuinely curious since both terms are
used in literature.

~~~
omgtehlion
"combinatorial" was used predominantly in 1970s books for whatever reason.

In most contemporary books and university courses "combinational" is used,
with a note that you should not confuse it with "combinatorics" and
"combinatory logic".

------
tehsauce
Good reference!

------
matiszek23
True

