How to accelerate a program using hardware

nickpsecurity · on Dec 18, 2015

Nice simple introduction. It would be interesting for someone here with access to HLS tools, esp. C-to-Verilog tools, to tell us what speedup they get using the C code straight-up. As in, with very little input beyond what the HLS tools come up with.

rmohanx · on Dec 18, 2015

PFC. Rough idea how this might compare to a pathologically and expertly tweaked assembly version?

officialchicken · on Dec 18, 2015

Faster, potentially cheaper, and more expensive to produce?

The history of the bitcoin miner has details and is a real-world example of the software on x86 ASIC -> FPGA -> custom ASIC process. It's easy to find the relative performance of the bitcoin miner running on everything from Raspberry Pis to CUDA clusters [1].

Note that the article is using the very flexible DE2-115, and there are lots of interesting trade-offs made to fit a bitcoin miner in only 115,000 gates... iirc, if you have 250k gates, it can run 4x (???) faster due to optimizations during synthesis.

https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner

Current Performance: 109 MHash/s On a Terasic DE2-115 Development Board [2]

[1] https://en.bitcoin.it/wiki/Non-specialized_hardware_comparis...

[2] https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner

nickpsecurity · on Dec 18, 2015

I guarantee it would smoke it by similarly ridiculous numbers. Assembler will inherently be doing ops sequentially while also waiting on memory accesses in between them where not cached. An expected speedup might factor in the clock difference between it and yours plus the number of cores. Yet you're not going to get the kind of parallelism and simple operation you have with custom HW. It's the lasting drawback of general-purpose CPUs.

And why Intel is buying Altera. Stuff like this article will get easier and with even bigger speedups in the near future. Just wait. :)
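A rough way to put numbers on that estimate is sketched below. Every figure in it (clock rates, core count, cycles per result) is a made-up assumption chosen for illustration, not a measurement from the article or from any real chip, and the function names are hypothetical.

```python
def throughput(clock_hz, units, cycles_per_result):
    """Results per second for `units` parallel units,
    each needing `cycles_per_result` clock cycles per result."""
    return clock_hz * units / cycles_per_result


def speedup(hw_rate, cpu_rate):
    """How many times faster the hardware design is than the CPU."""
    return hw_rate / cpu_rate


# Hypothetical CPU: 3 GHz, 4 cores, 3000 cycles per result
# (sequential ops plus stalls on uncached memory accesses).
cpu_rate = throughput(clock_hz=3e9, units=4, cycles_per_result=3000)

# Hypothetical custom HW: 100 MHz, one fully pipelined unit
# that produces a result every cycle.
hw_rate = throughput(clock_hz=100e6, units=1, cycles_per_result=1)

print(f"CPU: {cpu_rate:.0f} results/s")
print(f"HW:  {hw_rate:.0f} results/s")
print(f"Speedup: {speedup(hw_rate, cpu_rate):.1f}x")
```

With these assumed numbers the slower-clocked hardware still wins, because it finishes one result per cycle while each CPU core spends thousands of cycles per result.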

p1esk · on Dec 18, 2015

It's more interesting to see what could be done with a GPU.

Also, I wonder if we could bypass the shared main memory and turn the pixels on and off directly (by hacking the display driver or whatever).

nomel · on Dec 18, 2015

> Also, I wonder if we could bypass the shared main memory and turn the pixels on and off directly

The pixels exist in a shared memory, do they not?

gluggymug · on Dec 18, 2015

That's the problem, I think. He's changed from a shared pixel memory in the reference design to a non-shared one in his HW-accelerated design. That's the impression I get from the diagrams.

It's somewhat moving the goal posts IMO. Guess it's ok since it's a student project.

p1esk · on Dec 18, 2015

Pixels are mapped to memory locations, but they don't have to be, if you can access the map directly. I don't exactly know what I'm talking about here, just a thought.
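For reference, "pixels mapped to memory locations" can be pictured with the minimal sketch below. It assumes a hypothetical 640x480 display with one byte per pixel stored row by row; the article's actual pixel memory layout may differ, and the names here are invented for illustration.

```python
WIDTH, HEIGHT = 640, 480  # assumed resolution, not taken from the article

# One byte per pixel, stored row by row (row-major order).
framebuffer = bytearray(WIDTH * HEIGHT)


def pixel_offset(x, y, width=WIDTH):
    """Memory offset of pixel (x, y) in a row-major framebuffer."""
    return y * width + x


def set_pixel(fb, x, y, on):
    """Turn a pixel on (1) or off (0) by writing to its memory location."""
    fb[pixel_offset(x, y)] = 1 if on else 0


set_pixel(framebuffer, x=3, y=2, on=True)
print(pixel_offset(3, 2))  # offset of the pixel that was just written
```

Writing to the pixel and writing to the memory location are the same operation here, which is the sense in which the display is "just memory".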

conorpp · on Dec 18, 2015

What would be the performance difference between writing to hardware pixels directly and writing to a shared memory?

I mean, pixels are just like a memory except that they glow. They hold their value and can be written with a new value in sync with a clock (normally 60 Hz for most monitors, which is MUCH slower than on-chip memory). There could be no performance benefit.
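The refresh-rate point can be checked with back-of-the-envelope arithmetic. The sketch below assumes a 640x480 display at 60 Hz and an on-chip memory that accepts one write per cycle at 50 MHz; both are illustrative assumptions rather than figures from the article.

```python
def pixels_per_second(width, height, refresh_hz):
    """Pixel updates the display can show per second at a given refresh rate."""
    return width * height * refresh_hz


# Assumed display: 640x480 at 60 Hz.
display_rate = pixels_per_second(640, 480, 60)

# Assumed on-chip memory: one write per cycle at 50 MHz.
memory_rate = 50_000_000

print(f"Display consumes {display_rate} pixels/s")
print(f"Memory accepts   {memory_rate} writes/s")
print(f"Memory headroom: {memory_rate / display_rate:.2f}x")
```

Under these assumptions the memory already accepts writes faster than the display can show them, so writing to the pixels directly would not make anything visible sooner.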