
FPGA Acceleration by Dynamically-Loaded Hardware Libraries [pdf] - Katydid
http://orbit.dtu.dk/ws/files/126373186/tr16_03_Nannarelli_A.pdf
======
wyager
I think the single biggest hurdle to FPGA adoption is that FPGA development is
horrible. The IDEs suck, everything is license-encumbered and full of DRM, and
debugging is a nightmare.

The best thing I've found so far is Clash. It translates Haskell code to
(System)Verilog or VHDL. It works remarkably well, and is _way_ easier to
write and test than writing VHDL or Verilog directly. You can compile it to a
plain old program for testing and debugging, which means you can use things
like QuickCheck for aggressive testing. If you're really serious, you can use
something like LiquidHaskell for really aggressive formal verification.
There's also Lambda-CCC, which looks a bit more theoretically rigorous, but I
couldn't get that working when I tried it.

The key here is that non-strict lambda calculus maps remarkably well to
hardware, which means that it's actually relatively straightforward to
translate a program to an exactly equivalent hardware representation. The only
exception is unbounded recursion (at the function or data level), because
obviously you can't represent infinite stuff in hardware.
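To illustrate that mapping, here is a minimal sketch in plain Haskell (no Clash dependency; `satCount` and `simulate` are invented names for illustration): a synchronous circuit is just a pure function from (state, input) to (state, output), and simulating it is a scan over the input stream. Clash's `mealy` combinator has essentially this shape, except the state becomes a register in the synthesized hardware.

```haskell
import Data.List (mapAccumL)

-- Hypothetical circuit: a saturating counter that increments on each
-- enable pulse and never exceeds a bound. One call = one clock cycle.
satCount :: Int -> Int -> Bool -> (Int, Int)
satCount bound st en = (st', st')
  where st' = if en && st < bound then st + 1 else st

-- "Run" the circuit over a list of input samples by threading the state.
simulate :: (s -> i -> (s, o)) -> s -> [i] -> [o]
simulate f s0 = snd . mapAccumL f s0

main :: IO ()
main = print (simulate (satCount 3) 0 [True, True, False, True, True, True])
-- prints [1,2,2,3,3,3]
```

Because this is an ordinary pure function, properties like "the output never exceeds the bound" can be thrown at QuickCheck directly, with no simulator in the loop.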

I've written a few FPGA projects using Clash that I never would have had the
patience to finish if I had used a traditional HDL. There's usually a bit of a
performance hit over hand-optimized HDL, but (as with using e.g. Python over
assembly) there are many cases where developer time is more expensive than a
small efficiency loss.

~~~
alain94040
What does Clash bring to FPGA design? I can easily imagine how functional
languages make it easy to express the datapath of a circuit. But if you still
specify registers explicitly and have to design your pipeline by hand, it's
not a big improvement over traditional HDLs. Can it help you infer correct
control logic? That would be useful. Explore different speed/area solutions?

When performance matters - and it does, or you wouldn't bother to design
hardware - everything gets heavily pipelined with very short logic between
pipe stages. Will Clash be any more readable than Verilog then?

~~~
wyager
I think you're focusing on too narrow a scope of improvements over
traditional HDLs.

A few advantages include: sensible language design, strong types, ADTs, easier
and much faster testing/debugging, easier formal verification, and easier,
more powerful, safer parametrization.

And yes, even if your Clash code degenerates to basically copying Verilog
(which it usually doesn't, in my experience), it's still much easier to read
thanks to the richer types and ADTs.
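As a concrete (hypothetical) illustration of that readability claim: where a Verilog state machine typically encodes its states as localparam bit patterns, a Clash-style design uses a sum type, with a total pattern match over it. The `Command` and `State` names below are invented for illustration; this is plain Haskell, but Clash picks a bit encoding for ADTs like these automatically.

```haskell
-- States and inputs as ADTs instead of magic bit vectors.
data Command = Start | Abort | Tick deriving (Show, Eq)
data State   = Idle | Busy Int | Done deriving (Show, Eq)

-- One clock cycle of hypothetical control logic. The catch-all clause
-- makes every unhandled (state, command) pair explicitly a no-op, so
-- nothing is left to an implicit default in the encoding.
step :: State -> Command -> State
step Idle     Start = Busy 3
step (Busy 0) Tick  = Done
step (Busy n) Tick  = Busy (n - 1)
step _        Abort = Idle
step s        _     = s

main :: IO ()
main = print (scanl step Idle [Start, Tick, Tick, Tick, Tick])
-- prints [Idle,Busy 3,Busy 2,Busy 1,Busy 0,Done]
```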

------
pjc50
From abstract: "Provided a library of application-specific processors, we load
on-the-fly the specific processor in the FPGA, and we transfer the execution
from the CPU to the FPGA-based accelerator."

Key takeaways: (2.2.2) neat trick to accelerate reconfiguration; (3) sample
applications involving BCD arithmetic; (4) efficient scheduling to avoid
thrashing the reconfiguration.

(Personally, I suspect that until we have a good, _open_ or OS provided API to
FPGA configuration we're going nowhere. 3D acceleration required this in the
form of OpenGL and DirectX.)

~~~
RandomOpinion
> _Personally, I suspect that until we have a good, open or OS provided API to
> FPGA configuration we're going nowhere. 3D acceleration required this in
> the form of OpenGL and DirectX._

There's no demand for it. There's nearly nothing the consumer does that
requires that level of hardware acceleration (with the possible exception of
Photoshop), and server-type applications are perfectly fine with custom
software interfaces.

~~~
angry_octet
There is a huge demand, but it is already met by GPUs. FPGAs work much better
for streaming and low-latency applications, though. See for example the
HoloLens processor; though that is an ASIC, it could be done with an FPGA
paired with a GPU.

------
jackyinger
Really neat! Having a fixed framework to plug accelerator blocks into lets
accelerator designers cut to the chase.

I'd bet the 10ms hit to reconfigure their Zynq grows on larger FPGAs...

Edit (Additional Thought): This could allow designers to use significantly
smaller/cheaper FPGAs. Rather than statically implementing all required
functions, intermediate results can be saved in off-chip RAM while the next
function is loaded. That said, it'd require significant additional engineering
effort.

------
mrlambchop
When I first heard about the co-processor extensions in ARM, I spent hours
dreaming up a dynamic FPGA-accelerated co-processor that could have dynamic
'instructions' swapped in based on the typical workloads it was receiving. The
compiler would generate the custom HW FPGA programs based on profiling data
from previous iterations of the app and static inspection - the dynamic
complex blocks then attached to the ELF for loading. A statistics block in the
CPU determines which ones are loaded via a kernel driver (I think you could do
this with pure SW actually - no HW support needed) - you'd need a "branch if
instruction-X is available" and I guess some interlocks to stop instructions
being swapped out until the SW has finished with them, etc...
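The dispatch side of that idea can be sketched in a few lines of plain Haskell: each accelerable operation carries a software fallback, and the "branch if instruction-X is available" check becomes a lookup against the set of currently loaded FPGA instructions. All names here are invented; a real system would do the availability check and the swap-out interlocking in the CPU or kernel driver, not in application code.

```haskell
import qualified Data.Set as Set

-- Instructions currently resident in the FPGA fabric.
type Loaded = Set.Set String

-- Run the named op on the accelerator if it is loaded, else fall back
-- to the software implementation.
dispatch :: Loaded -> String -> (a -> b) -> (a -> b) -> a -> b
dispatch loaded name hw sw x
  | name `Set.member` loaded = hw x  -- "branch if instruction-X is available"
  | otherwise                = sw x

main :: IO ()
main = do
  let loaded = Set.fromList ["bcd_add"]
      hwAdd (a, b) = a + b  -- stand-in for the accelerated path
      swAdd (a, b) = a + b  -- pure-software fallback
  print (dispatch loaded "bcd_add" hwAdd swAdd (2, 3 :: Int))
  print (dispatch loaded "bcd_mul" hwAdd swAdd (2, 3 :: Int))
```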

I may dream, but this team did it! Super cool. Having spent a day with Zynq
trying to load bitstreams from Linux running on the hard cores (and completely
failing such that I changed attack vector) this alone is impressive :)

------
CalChris
There was a company, GigaOps, back in the 90s that did this.

[http://arith.stanford.edu/courses/abstracts/gigaops.html](http://arith.stanford.edu/courses/abstracts/gigaops.html)

Their MVP was accelerating Photoshop effects plugins, but they developed a
decent C-like language for general kernels.

Way ahead of its time.

~~~
rch
I still remember sitting through a CS colloquium in the late 90s on the
potential of GPU acceleration, and being utterly devastated that it was
obviously going to sideline this type of FPGA work for a _long_ time.

Even if progress never stopped in certain niche areas, I'm happy to see it
clawing back into the light of day.

