FPGA Design for Software Engineers (walknsqualk.com)
496 points by srjilarious 7 months ago | 80 comments

Ten years ago, in grad school, I co-wrote a video conferencing module in VHDL [0]. I haven't touched HDLs since but here's what I remember very clearly from that project:

* It took > 3 minutes to compile our code.

* DMA made a huge performance difference once we figured it out.

* Realizing that we had to be one with the clock tick took a lot of time. Understanding synchronous programming (if that's the term) was a paradigm shift for my partner and me.

* The utter delight when we got frames streaming across the wire. The performance (though over a LAN) was silky smooth and you could tell immediately this was different than the run-of-the-mill x86 desktop program.

[0] - http://www1.cs.columbia.edu/~sedwards/classes/2009/4840/repo...

Your PDF link is a 404.

Maybe fixed? Not a 404 for me anyway.

The problem with FPGAs is their proprietary nature, and Verilog/VHDL are far from the best languages. Fortunately there are a number of open-source projects aiming to close this gap - Yosys[1], SymbiFlow[2], Chisel3[3]/FIRRTL[4]. Some time ago I suggested[5] that the different open source projects should unite and reuse a common intermediate language, akin to LLVM in many software development and analysis tools. From my point of view, FIRRTL is the best designed one; the fact that it is implemented in Scala is a huge problem though, especially for projects written in C/C++/etc. Hopefully, there will be more collaboration one day, either through a reimplementation from scratch, e.g. in Rust or C++, or by using Scala Native.

[1] https://github.com/YosysHQ

[2] https://symbiflow.github.io/

[3] https://www.chisel-lang.org/

[4] https://www.chisel-lang.org/firrtl/

[5] https://github.com/SymbiFlow/ideas/issues/19

If you're just learning, I'd highly recommend that you stick to the toolchains supported by the manufacturer of the chip you're using. The various open source toolchains are very cool, but they have sharp edges and limitations that won't be obvious to a beginner. Many of them also focus on high-level synthesis, which isn't good for beginners, because you really need to learn the ins and outs of digital logic to be able to debug the output of a high-level synthesis tool.

This is true!!! The real problem for me is the IP cores. You really do need a DDR SDRAM controller for a lot of real projects. Or a PCIe IP core to communicate with external resources. If I just use the proprietary tooling it's just a click away to integrate this into a project. Open source has no answer for this as of yet.

Now that we have some nice FPGAs we can use, I think this is the next biggest hurdle.

Don't forget https://clash-lang.org/ I've used it a bunch and it holds up in real life. I both understood better what the compiled circuit would be and could build powerful abstractions. So a better high level and low level language at the same time!

Just want to add Spatial [1] to your list. It is an innovative high level language, very different from the traditional HDLs.

[1] https://spatial-lang.org/

Also Hardcaml, which lets you write FPGA code in OCaml.


I am curious, why System Verilog isn’t mentioned in this article. It is much much better than Verilog and is used in industrial applications.

What I am missing are 2 topics: timing analysis and debugging. Static timing analysis and proper timing constraints are crucial for a functional design. Without constraints, no tool can differentiate a slow signal toggling an LED every 10 seconds from a DDR3 533 MHz differential clock line.

Debugging an FPGA design cannot be avoided. Even if a design works in the simulator, it very often fails on real hardware. And then the real fun starts. Xilinx has Integrated Logic Analyzer, Intel has Chipscope. Other vendors have their own similar tools. These are an FPGA designer's best friends. But these tools can't be fully trusted; they sometimes break the design in unexpected ways. The designer must develop a gut feeling for what's happening. Synthesis of a design with an integrated logic analyzer takes much more time than a regular one. Debugging cycles are insanely long. Forgetting one signal can mean 2 hours of waiting, so add them all at the very beginning.

Writing hardware description language is the easy part. As somebody already mentioned, everybody can count ones and zeros. The first problem every software engineer encounters is simple: there is nothing to program. FPGA design is about describing your system with these ancient languages. The second problem is using the bulky toolchain. It's more than compile and debug buttons. In fact, there is a huge machine processing code into a bitstream, and its complexity naturally takes time to understand. You don't need to be smart to be an FPGA designer.

> I am curious, why System Verilog isn’t mentioned in this article. It is much much better than Verilog and is used in industrial applications.

Probably because it has very limited support in open source tooling, same as VHDL.

That is not true: Verilator is an excellent tool that compiles a large subset of synthesizable System Verilog to C++. Companies like Tesla use it and report speed ups of up to 40 compared to commercial tools. Similarly, for VHDL there is GHDL, which seems pretty feature complete and has an LLVM backend.

Right, but these are simulators, while the article is looking for a language to both simulate and synthesize using open source tooling.

Yosys (which AFAIK is still the only non-toy open source synthesis tool) supports a _very_ small subset of SV, and does not support VHDL (at least in the open source version).

> Intel has Chipscope

I think "Chipscope" is a Xilinx thing actually.

You’re right! It’s SignalTap II.

This is a pretty nice tutorial! My courses in FPGA design in school taught me a ton about 1) concurrency and 2) good state machine design. In modern backend web development these topics receive so little attention (from interviewing all the way to writing technical specs, I've rarely encountered these topics brought up explicitly) but are important. I was a bit hesitant for this guide to suggest using C++ since I tend to dislike mixing traditional languages with hardware languages but I realized it was just for testbenches, which is very reasonable (and even VHDL exposes things like `for` loops that are really only useful for testing and meaningless otherwise - sans some special cases[1]).

[1] You can abuse some imperative paradigms to implement things like Conway's Game of Life as a systolic array - https://en.wikipedia.org/wiki/Systolic_array

I disagree about for loops, you actually end up using these quite a lot in vhdl/verilog (with understanding about what logic you are going to end up with), if you want to do the same operation on multiple things:

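  // Port list fragment: operands and products are packed side by side
  input [NUM_OF_MULTIPLIERS*32-1:0] a_in,
  input [NUM_OF_MULTIPLIERS*32-1:0] b_in,

  // "reg" because it is assigned inside an always block
  output reg [NUM_OF_MULTIPLIERS*64-1:0] mult_out

  // Declarations inside the module body
  reg [31:0] tmp_a, tmp_b;
  reg [63:0] tmp_mult;
  integer i;

  always @(*) begin
    mult_out = {(NUM_OF_MULTIPLIERS*64){1'b0}};
    for (i=0; i<NUM_OF_MULTIPLIERS; i=i+1) begin
      tmp_a = a_in>>(i*32);     // select the i-th 32-bit operand
      tmp_b = b_in>>(i*32);
      tmp_mult = tmp_a*tmp_b;   // one multiplier per iteration
      mult_out = mult_out | (tmp_mult<<(i*64));
    end
  end

Would give you NUM_OF_MULTIPLIERS multipliers. If you wrote each multiply out, it would be more code and also wouldn't allow you to parametrize the code.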

The key is that for loops are essentially pre-processor macros (like C) so they must have a fixed number of iterations known at compile time. So yes, you have a for loop, but it's very different to what you expect from a for loop in software.

Yes, the key is that loops are always unrolled so the number of iterations (number of copies of the hardware) is fixed. But whether the output of each iteration is used or not can be entirely dynamic, potentially resulting in something very similar to a loop in software.
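
To make the unrolling point concrete, here is a rough software model in Python (not HDL) of the multiplier loop from this thread. It is only a sketch: `packed_multiply`, `pack` and the `enable` mask are names made up for illustration, and the mask is an assumption added to show the "used or not" remark. The lane count stands in for a synthesis-time parameter, while the mask is a run-time value.

```python
WORD_BITS = 32      # width of each operand lane
PRODUCT_BITS = 64   # width of each product lane


def pack(words, bits):
    """Pack a list of integers into one wide integer, lane 0 in the low bits."""
    packed = 0
    for i, word in enumerate(words):
        packed |= word << (i * bits)
    return packed


def packed_multiply(a_in, b_in, num_multipliers, enable=-1):
    """Multiply lane i of a_in by lane i of b_in for a fixed number of lanes.

    num_multipliers fixes how many multiplier copies exist (hypothetical
    stand-in for NUM_OF_MULTIPLIERS). enable is a run-time bit mask that only
    decides whether each copy's result reaches the output.
    """
    word_mask = (1 << WORD_BITS) - 1
    mult_out = 0
    for i in range(num_multipliers):      # unrolled into num_multipliers copies
        tmp_a = (a_in >> (i * WORD_BITS)) & word_mask
        tmp_b = (b_in >> (i * WORD_BITS)) & word_mask
        tmp_mult = tmp_a * tmp_b          # always computed, like the hardware
        if (enable >> i) & 1:             # a mux on the result, not a skipped copy
            mult_out |= tmp_mult << (i * PRODUCT_BITS)
    return mult_out


a = pack([3, 5], WORD_BITS)
b = pack([7, 11], WORD_BITS)
all_lanes = packed_multiply(a, b, 2)
lane0_only = packed_multiply(a, b, 2, enable=0b01)
```

In the model, both products are computed on every call; the mask only gates what shows up in the result, which is roughly what "dynamic use of a fixed number of copies" means.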

I agree about using C++ for actual ip block implementation. My experience has been pretty mixed. Mostly because the tools (Intel HLS in my case) don't always give you a great idea of what constructs cause you to generate inefficient hdl code.

For example, passing a variable by reference in one context cost me an extra 10% logic blocks, and in another lowered it by 10%. It became a bit of a shotgun approach to optimising

One does not pass a variable in an HDL design ;-). Trying to pluck software principles onto FPGAs is wasting so much performance. Get one with the underlying hardware and map your problem onto them, not an intermediate SW-like representation. Like some other comment mentioned, get one with the clock and your design will fly.

Simply not true. If you realize how the tools use loops you can create useful hardware just as easily as any other construct in a given HDL language.

I wrote tetris in verilog that can output to VGA 10 years ago for a university project. The code is here: https://github.com/jeremycw/tetris-verilog for anyone interested. Comments are sparse but it might be interesting for anyone looking for some example code that's relatable.

I recently got a TinyFPGA-BX, and have been slowly working through the tutorials. The amusing thing is that, for the actual applications I'm working on, a contemporary microcontroller can actually keep up just fine, and is easier for me to comprehend. Still, one of these days, a use for an FPGA will crop up for which I'll be glad that I learned.

I find this to be true for a lot of applications. Sometimes it seems like FPGAs are hammers looking for a nail.

Where they can shine is if you need some odd combination of peripherals attached to a microcontroller: think of something like a uc with 4 uarts or multiple separate i2c buses.

Anywhere you need a lot of parallel processing that you can guarantee won't be interrupted, like a video processing pipeline is also a good fit.

I picked up a TinyFPGA BX to make a VU-meter with strips of neopixels for a Halloween project (keystep+volca keytar with lots of reactive lights). You can do this with microcontrollers but I wanted to stretch myself and see if I could get a crazy-responsive 7-band meter working. I'm like 95% of the way there after several months learning verilog, testbenches, how a few modules off github work, the I2C protocol, and how to use a logic analyzer -- but I'm stuck trying to get an ADS1115 to do one-shot conversions reliably and will probably have to implement the VU-meter with an arduino to get it done for Halloween. It's absolutely thrilling to be working with nanosecond-scale operations and totally parallel design though.

Sounds like a fun project.

You might find this video helpful: https://www.youtube.com/watch?v=us2F8wAncw8

I think his design is very interesting, showing how to mix custom peripherals with picosoc so you can get very good response but also be able to program in C.

Interfacing with arbitrary hardware, for example random LCD devices with sometimes proprietary on the wire protocols. (Don't have a graphics chip for that screen? Make yourself a graphics chip for that screen!) Or Digital Signal Processing.

If you get a high-end microcontroller like a Cortex-M, you can normally just bit-bang the interfaces to devices that you don't have hardware for. DSP is similar - for a hobby project you're normally better off using a high-end device like a Cortex-A and doing the processing in software, rather than futzing with custom digital logic in an FPGA. If you're on a power budget (e.g. for a portable device) you could use a DSP chip like the C6000, but writing the software for it is pretty complicated if you haven't used one before.

If you're building a device that's going to be mass-produced and sold, then the situation is different and using FPGAs can make sense, because you'll amortize the engineering cost for the digital logic across all the units you sell. It can be worth it if it lets you use a cheaper processor or microcontroller.

For people who are curious about FPGAs looking to dip their toes in, I’d highly recommend taking a look at cocotb https://github.com/cocotb/cocotb

It’s kind of similar to verilator, in that it lets you write test benches for your designs in programming languages as opposed to HDL. Whereas verilator lets you write c++, cocotb is python based.

Both of these are probably best to take up after spending some time with an hdl, so you learn to think from a hardware perspective.

Also check out the zipcpu blog

That's a good intro. What I always have to remind software guys who switch to FPGA design is remember each line you write is eventually going to end up in hardware.

That means when you write something you should have some understanding of the underlying hardware inferred. So is this going to give me some combinatorial logic, a register or a RAM? It's very easy to keep the software mindset of if it compiles then it's good.
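
A small Python model (a sketch, not HDL) of the register vs. combinational distinction above. The `Register` class and `swap_demo` are hypothetical names for this illustration. Combinational values follow their inputs immediately, while registers only change on a clock edge, and all of them change together.

```python
class Register:
    """A clocked storage element: the output q changes only on tick()."""

    def __init__(self, reset_value=0):
        self.q = reset_value       # current output, visible to other logic
        self._d = reset_value      # input sampled at the next clock edge

    def set_next(self, value):
        self._d = value

    def tick(self):
        self.q = self._d


def clock_edge(registers):
    """Model one rising edge: every register updates from its sampled input."""
    for reg in registers:
        reg.tick()


def swap_demo():
    a = Register(1)
    b = Register(2)

    # Combinational logic: a pure function of the current register outputs.
    total = a.q + b.q

    # Both "assignments" read the old outputs, like non-blocking assignments.
    a.set_next(b.q)
    b.set_next(a.q)

    before = (a.q, b.q)            # nothing has changed yet
    clock_edge([a, b])
    after = (a.q, b.q)             # values swapped without a temporary
    return before, after, total


result = swap_demo()
```

Writing the same two assignments sequentially in software would leave both variables holding the same value; here the swap works because nothing updates until the clock edge.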

Great article, I wish the discussions around clocks had gone a bit more into how the tradeoff of pipelining vs longest operation ends up impacting designs. That and SRAM vs DRAM access latencies were the things that really connected the dots from how performance optimization on the software side of things is rooted in physical hardware limitations.

Might you or anyone else have some links or references you could share on these two topics? Was there a specific book that helped connect the dots that you could recommend?

Here's a greatly simplified example. Let's say you're trying to calculate y = mx + b in your FPGA. You want this operation to run at 100 MHz. Great, you write the code, synthesize and implement. Uh oh, the tools report that your design has failed timing analysis. What now?

Looking at the output of the tools, they'll say something like "x to y setup time: -2 ns slack". That means your desired operation can't meet the 10 ns clock period; it actually takes 12 ns for all the logic to ripple through. So now what?

You can break up the operation into two steps. Let's say the multiplication takes 8 ns, and the addition takes 4 ns. In timestep 1 you do z = mx, and pipeline c = b. Then in timestep 2 you do y = z + c. This way your operation takes two clock cycles = 20 ns total in terms of latency, but you can maintain a rate of 100 MHz.

Alternatively, you could choose a slower clock rate, say 75 MHz, and have a clock period of 13.333 ns. Then you would be able to meet the logic delay requirements in one cycle.

Again this is greatly simplified but it's similar to what one ends up doing in real FPGA designs. At the beginning you're usually trying to achieve maximum performance. Then later on you add more features to the FPGA, only to find that in doing so, you've caused an existing portion of the design to fail timing, so you need to twiddle things around.
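
The arithmetic in this example can be checked with a few lines of Python. This is a back-of-the-envelope sketch only: the function names are made up, and it assumes the quoted delays are the whole story (no setup time, clock skew or routing delay).

```python
def period_ns(clock_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / clock_mhz


def slack_ns(clock_mhz, path_delay_ns):
    """Positive slack means the path fits in one clock period."""
    return period_ns(clock_mhz) - path_delay_ns


def pipeline_report(clock_mhz, stage_delays_ns):
    """Timing summary for a path split into registered stages."""
    period = period_ns(clock_mhz)
    worst_stage = max(stage_delays_ns)
    return {
        "meets_timing": worst_stage <= period,
        "worst_slack_ns": period - worst_stage,
        "latency_ns": len(stage_delays_ns) * period,
    }


# y = m*x + b in one cycle: 8 ns multiply + 4 ns add = 12 ns path.
single_cycle_100 = slack_ns(100, 8 + 4)      # fails at 100 MHz
single_cycle_75 = slack_ns(75, 8 + 4)        # passes at 75 MHz

# Same logic split into two stages, still clocked at 100 MHz.
pipelined_100 = pipeline_report(100, [8, 4])
```

With these numbers the single-cycle version misses timing by 2 ns at 100 MHz, while the two-stage version meets it at the cost of two cycles (20 ns) of latency.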

Sorry, no book reference. What is important to realize is that computing can be seen as a 4D problem. Get the right data to the right processing unit at the right clock cycle. This applies to CPU and GPU as well, but got forgotten under a plethora of abstraction layers.

I've been looking for something like this for a while. It's hard to break into FPGA design coming from a Software Engineering viewpoint, but I think it teaches how the machine REALLY works and can produce better Software developers.

I found Digital Design by Mano and Ciletti to be a nice introduction to the basic building blocks of digital circuits. Four weeks later I'm studying Laplace transforms and transfer functions. Please send help.

Good on you, every idiot can count to one.


This audience is software developers. They will not have an appreciation for Widlar's genius.

Where do all the EEs hang out?

Thanks for the link. I’ve never heard of Widlar before; what an amazing character. Someone had a lot of fun writing that Wikipedia article:

“However, the story about Widlar bringing a goat to trim the lawn in front of his office, retold by The New York Times after his death,[14] was incorrect.[19] It was a sheep, not a goat;[68] Widlar brought her in his Mercedes-Benz convertible for just one day, which included a photo op for the local journalists.[19] According to Pease, Widlar abandoned her in the nearest bar;[19] according to Lojek the sheep was ‘mysteriously stolen’.[68]”


Famous (in certain circles) quote attributed to Bob Widlar, a legendary analog designer who didn't think much of digital designers.

Cute snark ayaya :^)

Hah. I went down the same path! I started with some "light" signal processing for an FPGA based project, and now, 2 years or so later, I'm deep into Laplace and z-transforms (and the foundations required for that).

You might enjoy those books, available fully online: https://ccrma.stanford.edu/~jos/

This book looks great! Thanks for the tip. Cheers

I've been going through the book "But How Do It Know" (not a typo) and it's a great introduction on how to build a CPU from scratch, starting from basic transistors and working up to higher and higher levels of abstraction using a bottom up approach. You don't need a lot of math to understand it at all and it is a fairly easy read.

how is that true? which component in a modern processor is akin to an fpga or any ip deployed to an fpga?

I think OP is looking at this the other way around -- meaning that it is perfectly possible to create a soft-core CPU on an FPGA, which is a great way to understand what any processor actually does.

I know this, because I started learning and tinkering with this sort of thing a year or so ago, with no prior experience with electronics or hardware design, or a formal comp-sci education.

I had decades of programming experience already, but I think I have learned more about the fundamentals of computer science while playing with cheap FPGAs, than I have by just writing code.

All the digital logic building blocks of a processor, from comparators to ALUs up to branch predictors and pipelines, can be defined and wired together in an HDL. If you have a sufficiently large FPGA, then you can "run" that HDL specification on the FPGA to get a working processor.

It's pretty common for computer engineering students to implement a simple RISC processor (often a simplified MIPS) on an FPGA as a class project. In my experience it was a fantastic way to learn the basics of computer architecture.

The CPU itself? For example, implementing a toy CPU is much much closer to how an actual, non-FPGA CPU works than, say, an emulator. But really anything done on an FPGA should teach you some portion of gate-level logic (yeah yeah, there's LUTs and other specialized cells instead of gates, close enough).

And staying above physics, that's how computers work.
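
For anyone wondering what a LUT is in this context, here is a rough Python sketch. It rests on a simplifying assumption: a real FPGA logic cell also has flip-flops, carry logic and routing, and `make_lut` is just a name invented here. The idea is that a LUT is a tiny memory addressed by its inputs, so the stored bits decide which gate it behaves like.

```python
def make_lut(truth_table_bits, num_inputs=4):
    """Return a function that behaves like a num_inputs-input lookup table.

    truth_table_bits is the LUT's configuration: bit k is the output for the
    input combination whose binary encoding is k (input 0 is the low bit).
    """
    size = 1 << num_inputs
    if not 0 <= truth_table_bits < (1 << size):
        raise ValueError("configuration does not fit in the table")

    def lut(*inputs):
        if len(inputs) != num_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for i, bit in enumerate(inputs):
            address |= (bit & 1) << i          # inputs form the memory address
        return (truth_table_bits >> address) & 1

    return lut


# The same "hardware" becomes a different gate purely by changing its bits.
xor_gate = make_lut(0b0110, num_inputs=2)
and_gate = make_lut(0b1000, num_inputs=2)

xor_outputs = [xor_gate(a, b) for b in (0, 1) for a in (0, 1)]
and_outputs = [and_gate(a, b) for b in (0, 1) for a in (0, 1)]
```

So "close enough" is fair: any small boolean function, gates included, is just a particular choice of table contents.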

If any of you are interested in FPGAs after reading the article I highly recommend:

Introduction to Logic Circuits & Logic Design with VHDL by Brock J LaMeres

He was my instructor in college and the text itself is extremely helpful for all things FPGA.

Thank you, that was interesting. I've programmed microcontrollers, but I've never tangled with ASICs or FPGAs. Too scared, I guess.

Your article has reduced my fear (and also, you mentioned the price of that Tiny FPGA board; at that price point, I don't mind too much if the magic smoke gets out).

FPGAs are getting larger, more complex, and significantly harder to verify and debug https://semiengineering.com/fpga-design-tradeoffs-getting-mo...

I'm surprised that no one has brought up the Lattice iCEstick. It's an oversized USB stick with an iCE40 FPGA on it and it's the cheapest option I've seen.

The cheapest options with a much larger amount of logic are: Upduino v2.1 ($20) and the never dying EP2C5T144 board ($10).

The latter requires a USB blaster dongle to program things so the Upduino has the upper hand IMO, especially because it also has very lightweight open source tools.

The IceStick is also nice and a bit cheaper at ~$25 but it has a smaller 1k logic element Ice40 FPGA on it whereas the TinyFPGA-BX has a larger 8k logic element FPGA.

Wow this is a gift from God. I was doing fpga for my CV and this just made my life a lot easier

Cool! Back in college we used fpgas with verilog! The labs always took some finagling to get working, but still very fun!

Small error in the article: 200*10e6 is 2GHz, not 200MHz.

Ha, good eye. corrected to 1e6 :)
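
A quick check of the literal arithmetic, written in Python since float literals there follow the same scientific notation (the variable names are just for this sketch):

```python
# In scientific notation, 10e6 means 10 * 10**6 = 1e7, not 1e6.
original_hz = 200 * 10e6    # 2e9 Hz, i.e. 2 GHz
corrected_hz = 200 * 1e6    # 2e8 Hz, i.e. 200 MHz
literal_hz = 200e6          # same value without the multiplication
```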

Nice work. FPGA design appears to be very similar to GPU shader programming. First time I've read anything about FPGA design that connected. Usually FPGA stories get lost in data flow jargon and I learn nothing.

There is no programming in FPGA at all. You describe your hardware using hardware description languages like VHDL or Verilog.

You'll notice I didn't apply the term 'programming' to FPGA. Reading the post I noted this was a likely hang up among FPGA designers and carefully employed the preferred jargon. I imagine this sensitivity is the product of much frustration with forever being conflated with mere programmers. Must be awful.

Then no programming exists at all. When writing C code you are describing a program that runs on the C abstract machine. The same thing holds for all "programming" languages.

Sorry, I am not ready for a philosophical discussion. We can take the definition from Wikipedia: https://en.m.wikipedia.org/wiki/Computer_programming Programming involves code execution on a computer. There is no computer in an FPGA.

Then why did you start the discussion? An FPGA is a computer just as well as any CPU.

Technically an FPGA is a piece of memory. The functionality of the FPGA device depends on how the bits in this memory are set. The size of this memory is constant. This information is not public; brave hackers are working hard to reverse engineer it. You can make a CPU in an FPGA, but the opposite is not practical performance-wise. A complex simulation with a couple of 4K-resolution pictures takes days.

Edit: the people here are decent enough to start a discussion.

Yes, an FPGA is a different computer than a CPU. It's still a computer though. This is the definition of a computer:

> An electronic device for storing and processing data, typically in binary form, according to instructions given to it in a variable program. [1]

It fits an FPGA perfectly.

[1] https://www.lexico.com/en/definition/computer

There is a term “variable program” in your link. When you add peripherals to the chip on the printed circuit board it loses flexibility very fast. The whole system is made for a very specific task. But yes, you convinced me that an FPGA might be treated as a computer in an extreme case.

FPGA accepts "variable program"s. What you are talking about are peripherals. A CPU with certain peripherals can also be completely inflexible. That is completely outside the scope of what a CPU or an FPGA are though.

I don’t really see how FPGA programming is similar to shader programming.

Thanks! I too think GPU programming is quite similar in that you need to think of things in a more data streaming sense. It's sort of functional that way too; building up pipelines of transforms.

Very nice article, thanks.

That would more typically be written as 200e6, no need for the explicit multiplication when using standard float literal notation.

little nitpick:

it's "combinational circuits", not "combinatorial" (that's a whole other part of math)

Do you have a reference for that? I'm genuinely curious since both terms are used in literature.

"combinatorial" was used predominantly in 1970s books for whatever reason.

In most contemporary books and university courses "combinational" is used, with a note that you should not confuse it with "combinatorics" and "combinatory logic".

“An alternate term is combinatorial logic,[2] though this usage may be considered controversial.[3]”:


Good reference!

