
Program FPGAs with Go - cjdrake
https://reconfigure.io/
======
jerf
This being the internet, let me preface this by saying these are honest
questions, rather than attacks. I did try to read through the docs before
asking, but I don't see the answers directly.

Especially on the FPGA side, how does this interact with all the features of
Go that seem ill-suited to an FPGA implementation? Can I write functions that
generate and consume closures? Where is my garbage going on the FPGA side and
how is it collected? Or is the FPGA code being written only in a subset of Go?

I understand the idea of wrapping the primitives offered by the FPGA hardware
itself into channels, but I'm unclear on how one can sensibly implement a Go
runtime on top of that in the FPGA without making it too difficult to
understand the cost model of your Go code.
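For reference, the channel-wrapped style I mean would presumably look something like this — a hypothetical sketch in a restricted Go subset (no heap allocation, no closures, fixed-width integers); the function name and signature are my invention, not reconfigure.io's actual API:

```go
package main

import "fmt"

// Top models the shape of kernel code a Go-to-FPGA compiler might plausibly
// accept: fixed-size values streamed through channels, with each loop
// iteration standing in for a fixed-function datapath.
func Top(in <-chan uint32, out chan<- uint32) {
	for v := range in {
		out <- v*v + 1 // square-plus-one per element
	}
	close(out)
}

func main() {
	in := make(chan uint32)
	out := make(chan uint32)
	go Top(in, out)
	go func() {
		for _, v := range []uint32{1, 2, 3} {
			in <- v
		}
		close(in)
	}()
	for v := range out {
		fmt.Println(v)
	}
}
```

Even a kernel this simple raises the cost-model question: is that multiply a pipelined DSP block, and what does each channel operation cost per clock?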

~~~
marmaduke
might be like OpenCL: a subset of C99 with some extensions works for parallel
hardware including FPGAs already so why not a subset of Go targeting FPGAs?

~~~
jerf
"A subset of Go" is certainly one sensible option, but I don't see it
documented. The obvious subset that one could get with the least effort is a
rather brutally cut down subset of Go, to the point that I'd consider claiming
it to be "Go running on an FPGA" to be nearly a lie. On the other extreme,
you've got a full FPGA on hand, so nothing stops them from bringing up a
simple ARM CPU die and putting a bit of general-purpose RAM there, so it's
theoretically possible that you could write general-purpose FPGA code and have
that FPGA-CPU gloss over any runtime issues, but that's where I get my
question about how a programmer would be expected to model the costs incurred
by given constructs sensibly. (Presumably, if performance is not a big deal to
the programmer, they're not using FPGAs at all, so I'm going in assuming
performance is at least in the top 3 concerns of anyone who might use this, if
not #1.) (Also I'm not 100% sure about the intersection of what ARM cores
might be available vs. what Go runs on; IIRC Go does ARM but only very high-
end ones. So, take the principle of my point rather than the literal text.
With enough work they can do "anything" with the FPGA code.)

------
rubenfiszel
Congrats!

We, a Stanford lab, are pursuing similar goals, but open source and from a
Scala DSL, although our doc
([http://spatial-lang.readthedocs.io/en/latest/tutorial/starting.html](http://spatial-lang.readthedocs.io/en/latest/tutorial/starting.html))
is not that up-to-date:

[https://github.com/stanford-ppl/spatial-lang](https://github.com/stanford-
ppl/spatial-lang)

~~~
eeks
I'll RTFM later, but in the meantime: how do you compare to Chisel
[https://chisel.eecs.berkeley.edu](https://chisel.eecs.berkeley.edu) ?

~~~
aseipp
Spatial is built on Chisel, and from what I can tell, basically offers you an
"SDSoC" experience for Scala.

In Chisel, you write Scala that is compiled to Verilog(?) and you put it into
your synthesis toolchain. But Chisel is mostly just that: it's a HDL, and not
much more. And if you want to talk to an FPGA, especially from software, you
still have to write another pile of glue that does interfacing to your device,
over your peripherals, etc.

Spatial gives you more on top of Chisel. Instead, you simply write a single
program and say "Accelerate this bit", and it generates both the hardware
_and_ software _and_ glues it all together. This means you write a single
program once and the compiler generates all the glue for you, so the usage is
more seamless.

This is what "SDSoC" from Xilinx does, but for C/C++. You simply write a C
program and annotate functions as "Accelerate" and it compiles both the
hardware and software for you and generates the interconnections. Spatial is
like that, for Scala.

~~~
zhemao
> In Chisel, you write Scala that is compiled to Verilog(?)

Berkeley grad student here. The current Chisel compiler generates an
intermediate representation that can be compiled to any target. Our Verilog
backend is definitely the most developed, but we also have an interpreter that
can directly simulate the IR.

~~~
aseipp
Thanks! Is this a reference to FIRRTL that I've seen some light references to
(e.g. Yosys's experimental FIRRTL backend)? I was under the impression FIR was
somewhat new and wasn't sure if it was used in Chisel (yet).

(To be clear, I figured you compiled to _a_ generic IR before lowering onto
some chosen HDL, regardless if it's FIR or not, but wasn't sure if Chisel had
multiple HDL backends strictly speaking).

~~~
zhemao
Yes, I'm talking about FIRRTL, which is used in the latest version of Chisel
(Chisel 3).

There aren't official releases of this version yet, but you can get snapshots
(if you don't mind some API breakage now and then). It works fairly well now
and we use it in all our RTL designs.

------
mmastrac
Interesting. This is a neat approach to building a register-transfer language
using some high-level bindings. Go (at a very high level) seems to have a
pretty good match with the async nature of transistor-level bit-flow
operations when you use channels. In theory you could map this to any language
with strongly-typed async operations.
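As an illustration of that mapping (purely my own sketch, nothing to do with reconfigure.io's actual compiler): each typed channel stage plays the role of a clocked hardware block, and composing stages looks like wiring blocks together.

```go
package main

import "fmt"

// adder and shifter are "hardware blocks" as goroutines: each stage owns
// its output channel, mirroring how a register-transfer pipeline passes
// typed values between clocked stages. Names are illustrative only.

func adder(in <-chan uint16, k uint16) <-chan uint16 {
	out := make(chan uint16)
	go func() {
		defer close(out)
		for v := range in {
			out <- v + k
		}
	}()
	return out
}

func shifter(in <-chan uint16, n uint) <-chan uint16 {
	out := make(chan uint16)
	go func() {
		defer close(out)
		for v := range in {
			out <- v << n
		}
	}()
	return out
}

func main() {
	src := make(chan uint16, 4)
	for _, v := range []uint16{1, 2, 3, 4} {
		src <- v
	}
	close(src)
	for v := range shifter(adder(src, 1), 2) {
		fmt.Println(v) // (v+1)<<2
	}
}
```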

You get the advantages of Go's type-checker, though you're probably limited to
a _very_ small subset of the language. Note that you won't be able to use a
lot of third-party packages unless their translation is _really_ good or the
package code is _very_ simple.

Docs seem to be available here (thanks to another comment in this thread):
[http://docs.reconfigure.io/welcome.html](http://docs.reconfigure.io/welcome.html)

I think the approach of mapping a high-level language to an FPGA is not
necessarily novel, but using Go for it is.

I used a different approach to build a bit-flow graph in Java for a project in
the past. Rather than map the whole language to the circuit, I created some
APIs that would generate the graph and export it. It looks fairly similar to
what you see here.

------
dhbx9
The website in itself does not give me an understanding of whatever the hell
they're doing. I'm a computer engineering undergrad and I've done FPGAs
before. I don't see what "code in Go and deploy FPGAs to the cloud" actually
means. I think that putting some code and other actual use cases on the
website would be nice.

Looking at some of the examples it seems to me that you'd still need to know
hardware programming, memory etc. Now my comment seems very snarky, but I
still think that it's a huge achievement to have gotten this far with this and
I wish them luck! I just don't get the target user base.

------
pella
docs: [http://docs.reconfigure.io/](http://docs.reconfigure.io/)

github: [https://github.com/ReconfigureIO](https://github.com/ReconfigureIO)

------
zackmorris
This is great, I've waited 20 years for this (computer engineering degree
1999). For all the naysayers - what has gone wrong with computing, why Moore's
law no longer works, etc etc is that we've gone from general purpose computing
to proprietary narrow-use computing thanks to Nvidia and others. VHDL and
Verilog are basically assembly language and are not good paradigms for
multicore programming.

The best languages to take advantage of chips that aren't compute-limited* are
things like Erlang, Elixir, Go, MATLAB, R, Julia, Haskell, Scala, Clojure.. I
could go on. Most of those are the assembly languages of functional
programming and are also not really usable by humans for multicore
programming.

I personally vote no confidence on any of this taking off until we have a
Javascript-like language for concurrent programming. Go is the closest thing
to that now, although Elixir or Clojure are better suited for maximum
scalability because they are pure functional languages. I would give MATLAB a
close second because it makes dealing with embarrassingly parallel problems
embarrassingly easy. Most of the top rated articles on HN lately for AI are
embarrassingly parallel or embarrassingly easy when you aren't compute-
limited. We just aren't used to thinking in those terms.

* For now let's call compute-limited any chip that can't give you 1000 cores per $100

~~~
zensavona
I would argue that Elixir is far closer to being a "JavaScript-like language
for concurrent programming" than Go, due to its dynamic typing and the
relative freedom it affords in comparison to the others you mentioned (except
for Clojure, which is actually quite similar in a lot of ways).

Although it is technically a purely functional language, you can almost mutate
variables (in reality the name is rebound to a new immutable value, which
takes precedence):

    
    
      a = 1
      # a == 1
      a = 2
      # a == 2
    

Concurrency feels very natural:

    
    
      # concurrent
      numbers = [1,2,3,4,5]
      doubles =
        numbers
        |> Enum.map(fn(n) -> Task.async(fn -> n * 2 end) end)
        |> Enum.map(&Task.await/1)
      # doubles == [2,4,6,8,10]
    
      # consecutive
      numbers = [1,2,3,4,5]
      doubles = numbers |> Enum.map(fn(n) -> n * 2 end)
      # doubles == [2,4,6,8,10]
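
For comparison, a rough Go equivalent of the Task.async/Task.await pattern above (my sketch — one goroutine per element, results collected in order):

```go
package main

import "fmt"

// doubleAll doubles each element concurrently, one goroutine per element,
// roughly mirroring Elixir's Task.async/Task.await: the buffered channels
// act as the task handles, and receiving in index order is the "await".
func doubleAll(numbers []int) []int {
	results := make([]chan int, len(numbers))
	for i, n := range numbers {
		results[i] = make(chan int, 1)
		go func(n int, out chan<- int) {
			out <- n * 2
		}(n, results[i])
	}
	doubles := make([]int, len(numbers))
	for i, ch := range results {
		doubles[i] = <-ch // await, preserving order
	}
	return doubles
}

func main() {
	fmt.Println(doubleAll([]int{1, 2, 3, 4, 5})) // [2 4 6 8 10]
}
```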

~~~
runeks
> Although it is technically a purely functional language [..]

This (purity) stirred my interest, but as far as I can see it's incorrect.
This[1] Wikipedia page on pure languages does not list Elixir, and the Elixir
Wikipedia page itself does not mention purity at all.

Can anyone clarify?

[1]
[https://en.wikipedia.org/wiki/List_of_programming_languages_by_type#Pure](https://en.wikipedia.org/wiki/List_of_programming_languages_by_type#Pure)

------
kutkloon7
Let me be a bit pessimistic and ignorant here.

People who want to use an FPGA should learn VHDL or verilog. There have been a
lot of projects to make C compile to VHDL/verilog, and it's generally accepted
that it does not work very well.

What is the advantage of using Go for the same purpose?

~~~
gluggymug
As a long time HW FPGA guy, I think you might want to take a look at the C
projects again. I don't know whether Go has any advantages but the concept of
a higher level language for development is being used by the major FPGA
companies.

Both Xilinx and Altera have High Level Synthesis (HLS) tools. These use C or
C++. If you know how FPGA work is generally done, you can separate the hype
from the reality and you can understand how to use it for a real application.

The vendors have lots of libraries for IP. You don't write RTL from scratch.
It would take too long to verify. You tie IP together. It can be DSP or
generic maths or a video codec thing. The VHDL is done for you.

You write your algorithm in C++ in a particular format using compatible data
types and calling HLS libraries. You run it all in C++ first and make sure it
does exactly what you want in SW. This is where the algorithm is developed.

THEN you fire up the HLS tool and a couple of hours of synthesizing later
(lol) you get to load a bitstream onto an FPGA to verify it.

Of course there can be problems in that translation. It takes good engineering
to dive down into the design and find the issues.

My current work does not touch any HLS. I am doing the VHDL stuff. But I know
the algorithms all started from SW first. It always does. For the bulk of the
work, verification, it is somewhat irrelevant whether it is manually converted
to RTL or done via tools.

~~~
rthomas6
Another HW FPGA guy here. Albeit one who has never used HLS. My concern with
the whole idea of HLS is that it fails to take advantage of the
parallelization capability of FPGAs, which in my opinion is one of the main
reasons to use an FPGA in the first place. It sounds great for designs that
are linear in nature, that is, putting data through a bunch of sequential
processing blocks and then outputting some result. But for most of those
cases, why not just use a processor + DSP SoC? Or even something like a Zynq?
It will probably be faster.

Seeing how FPGAs do not operate in a linear way the way that software does on
a processor, why are we trying to make them work that way? It would make more
sense to me to design a high-level synthesis language with a paradigm that is
also not imperative: functional programming. Like, for example, how would this
kind of C code even be synthesized in hardware?:

    
    
      A = 5;
      B_out = A + 3;
      A = 6;
      C_out = A;
    

"A" is used as two different things, which is totally fine when the code is
run sequentially (which must be what happens when code like this is
synthesized), but it's wasteful on an FPGA: B_out and C_out don't actually
depend on each other and could be computed concurrently, which is what would
happen if we used VHDL to do something similar. We need a high-level synthesis
language that describes a _system_ which solves the algorithm we want, the
same way VHDL does, except with more abstraction capabilities. In my opinion
this could be a functional language.
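To make the point concrete, here is the same computation with the two uses of A renamed apart — the SSA-style transform a synthesis tool would have to apply — written as a Go sketch since that's the thread's subject; once renamed, nothing stops the two outputs from being computed concurrently:

```go
package main

import "fmt"

// compute renames the two uses of A apart (a1, a2), making the absence of a
// dependence between B_out and C_out explicit; each output is then produced
// by its own goroutine, standing in for an independent piece of hardware.
func compute() (bOut, cOut int) {
	a1, a2 := 5, 6 // the two "versions" of A from the C snippet above
	bCh := make(chan int, 1)
	cCh := make(chan int, 1)
	go func() { bCh <- a1 + 3 }()
	go func() { cCh <- a2 }()
	return <-bCh, <-cCh
}

func main() {
	b, c := compute()
	fmt.Println(b, c) // 8 6
}
```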

~~~
gluggymug
I agree about the parallelism but you have to understand the design
methodology.

Your example is somewhat pointless. The code is written to create the HW not
the other way around. I can't feed it just any crap.

You want parallelism you have to code it.

Zynq would actually be what I use! You start with SW. The ARM core is not that
quick. You will use the FPGA to accelerate the tough parts. You may think you
will have throughput issues but you have options via the high performance AXI
ports. Your FPGA modules access the data in memory via DMAs.

Once you KNOW which part of the algorithm you need to accelerate actually
suits FPGAs, you grab the HLS tool and start coding.

You have to look at some of the libraries to understand what abstraction level
you are working at:
[https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html](https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html)

Video, matrices, linear algebra, encoders/decoders. Etc. I can string them
together in the same way I would string HDL IP.

The advantage is I can run the algorithm in C++ first and test it all, under
the assumption that the HLS library has the equivalent HW version for
synthesis.

There is still a lot of HW work involved. For instance, in your example with A
used twice: one module would calculate B_out by reading A prior to changing
its value, then you would start the C_out module. You would need a way to
coordinate the two modules to share the same memory at A. But they would be
running in parallel, just not started at the same time.
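A sketch of that coordination (in Go rather than HDL, purely to illustrate the ordering): a handshake ensures the B_out module samples A before the C_out module overwrites it, while both still run concurrently. Values follow the C snippet upthread.

```go
package main

import "fmt"

// run models two modules sharing the location A: the B_out module samples A
// and raises a "done reading" handshake; the C_out module waits for that
// handshake before overwriting A. Both run concurrently, just not started
// at the same time.
func run() (bOut, cOut int) {
	a := 5
	sampled := make(chan struct{}) // handshake: B_out has read A
	bCh := make(chan int, 1)
	cCh := make(chan int, 1)

	go func() { // B_out module
		bCh <- a + 3 // reads A == 5
		close(sampled)
	}()
	go func() { // C_out module
		<-sampled // wait until it is safe to reuse A
		a = 6
		cCh <- a
	}()
	return <-bCh, <-cCh
}

func main() {
	fmt.Println(run()) // 8 6
}
```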

------
rob_reconfigure
Some great questions (and some other really exciting projects!)

It's early days for us at reconfigure.io, we're just working with a few core
early users at the moment and we'll be bringing more examples, benchmarks and
increased access over time.

~~~
jnordwick
How are you dealing with GC issues? Is there a paper or docs?

------
mhh__
Are they using soft-cores on a fpga or actually synthesizing a design specific
to one's Go code? Am I blind or is the website not very clear?

~~~
quadrature
You're not blind, I was expecting some code samples to make this clear. I
assume it's actually synthesizing a specific design, because otherwise it
isn't really noteworthy, but it's not clear.

------
ereyes01
This isn't quite the same thing, but it kind of reminds me of Altera's Nios
soft processor:
[https://www.altera.com/products/processors/overview.html](https://www.altera.com/products/processors/overview.html)

In one of my previous lives doing embedded development, we were able to
program the FPGA using pretty plain looking C on the Nios, which just
dedicated a portion of the FPGA's gates to running a simple, ARM-like
processor.

It was cool for us software dudes because we could just do general-purpose
computing (mostly) on the FPGA, and the verilog folks would wire it up for us
to work right. It's not the cheapest way to design a product, but the stuff I
worked on had crazy high profit margins, so it was a fair trade-off for better
productivity.

------
cosinetau
FPGAs are worth studying in case anyone doesn't really know them. I once had
an interview question right out of uni, and it was about FPGAs and I didn't
know what they were. The interviewer really looked down his nose at me after I
told him that.

~~~
analog31
Ask HN: What's an inexpensive set-up for learning? I've done a lot of work
with microcontrollers, but never FPGAs. Is there a cheap development board
with software, that I can try out at home?

~~~
cosinetau
I'm not up on a lot of the hardware stuff. Raspberry Pi? Beagleboard?

~~~
wott
So you would fail that interview again :-)

~~~
cosinetau
No. I would not take a second interview from them.

------
ris
This looks awwwwfully nonfree...

~~~
metaphor
In what sense? Target is an FPGA, so the point strikes me as moot.

~~~
ris
Investing time in writing your complex code in their Go dialect may leave you
in shackles and at the mercy of their pricing plans.

~~~
metaphor
I completely agree with your point.

To wit, pushing the performance of any FPGA with _< insert_favorite_hdl_here>_
will inevitably result in a high degree of technical debt and/or vendor lock-
in, e.g. instantiating device-specific hard IP.

At the end of the day, we--as developers--aren't quite at that point where we
can have our cake and eat it too, making this solution yet another product
lifecycle trade-off decision.

------
blacksmythe

> expensive, hard-to-source hardware engineering skills.

If only that were true.

Lots of hardware engineers have moved into software.

~~~
lumost
If it were more practical to source hardware skills and develop FPGA
components, there are quite a few companies and academic institutions
interested in exploring direct hardware acceleration. In particular, seeing
examples of matrix operations or graph traversals carried out on FPGAs would
be an excellent starting point to driving adoption.

------
gaelow
This is cool, but where are the metrics and examples?

------
deepnotderp
Someone needs a tool chain with chisel as the target.

~~~
hobo_mark
Why not target verilog directly? Chisel is just a verilog generator in the
end.

~~~
vasilia
Because of ML and neural networks hype. No one wants to learn VHDL or Verilog,
they want OpenCL or something similar.
they want OpenCL or something similar.

~~~
alunaryak
I hear what you're saying, but Chisel is a lot closer to an RTL like Verilog
than it is to OpenCL.

------
bonoetmalo
This post is peak 2017 Hacker News

