Truly the hallmark of any reputable FPGA benchmark: "We hired an intern, put him through a lobotomy, and then had him write code for the GPU; our team of seasoned professional FPGA design engineers then spent the next 7 years writing code that really kicked his arse."
If I understand correctly, the price/performance of higher-end FPGAs is bonkers because production volume is nil and they are aimed at simulating larger circuits, not faster ones.
FPGAs' flexibility has potential, and they have come into a comparable price range, but they won't be truly competitive for some time yet.
They didn't implement it in proper VHDL/Verilog. They used an OpenCL compiler, which is a great waste of resources. Of course, that made the comparison quick.
A better solution would be high-level synthesis from Matlab/Python/C instead of blindly replicating OpenCL kernels designed for a GPU; see the sketch below. That might work even better on a less fancy FPGA than the Arria 10.
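For illustration, this is roughly what a C-based HLS kernel could look like. This is a minimal sketch in the Xilinx/AMD Vitis HLS style (the paper targets an Intel Arria 10, whose HLS tooling uses different directives); the function name, loop, and pragma placement are made up for this example, not taken from the paper.

    /* Hypothetical HLS sketch (Vitis HLS style), illustrative only:
     * a complex multiply over a stream of samples. */
    void cmul_stream(const float *in_re, const float *in_im,
                     const float *w_re,  const float *w_im,
                     float *out_re, float *out_im, int n)
    {
        for (int i = 0; i < n; i++) {
    #pragma HLS PIPELINE II=1  /* ask the tool for one result per clock */
            /* complex multiply, a core operation in imaging/gridding */
            out_re[i] = in_re[i] * w_re[i] - in_im[i] * w_im[i];
            out_im[i] = in_re[i] * w_im[i] + in_im[i] * w_re[i];
        }
    }

The point of HLS here is that you describe the arithmetic once in plain C and let the tool decide how deeply to pipeline and replicate it for the target device, rather than inheriting a GPU-shaped kernel structure.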
One of the key points of the paper is about how OpenCL makes it easier to implement things for an FPGA. Using Verilog/VHDL is about an order of magnitude more work, which would probably completely disqualify using the FPGA for a project like this.
"The source code for the FPGA imager is highly different from the GPU code.This is mostly due to the different programming models: with FPGAs, one buildsa dataflow pipeline, while GPU code is imperative."
Please explain how they used OpenCL kernels designed for a GPU.
You can think of OpenCL kernels (or any imperative sequence of low-level operations) as data flowing through math operations. Normally, we leverage a single set of math circuits to perform all of these operations in sequence, and orchestrate the data flow through a register file. You could imagine removing the register file and instantiating an actual circuit that represents the data flow of the program itself. This creates more opportunity for pipelining, which should be plentiful in a highly data-parallel computation. The issue with FPGAs is that they are clocked lower and are not very dense, so the tradeoff is generally not worth it.
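To make that concrete, here is a tiny, hypothetical OpenCL C kernel (the name and arguments are made up, not from the paper). On a GPU, thousands of work-items each execute this body on one element; an FPGA OpenCL compiler instead instantiates the multiplier and adder as a fixed circuit and streams elements through it, which is exactly the "dataflow pipeline" the paper describes.

    /* Hypothetical OpenCL C kernel, illustrative only: out = a*x + y */
    __kernel void axpy(__global const float *x,
                       __global const float *y,
                       __global float *out,
                       const float a)
    {
        size_t i = get_global_id(0);   /* which element this work-item handles */
        out[i] = a * x[i] + y[i];      /* one multiply + one add per element */
    }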
Yes, I don't think it's an issue with the compiler. The FPGA approach requires a flexible fabric that just has lots of overhead to provide programmability compared to an ASIC. For an FPGA to have value, you _really_ need to leverage its programmability. Emulating an ASIC design for verification and testing is a good use case.
The paper mentions that the SKA has specific requirements but doesn't really go into details.
If you put a radio telescope in the middle of nowhere, where you need to build your own power plant and deal with the logistics of transport, then you care about power efficiency and robustness.
That doesn't make everything irrelevant, but it's definitely weird to publish a paper about this in 2019.