The big problem with graphs is that they can get very big, and once they become too big for a single machine's memory things get tricky.
However, it's unclear whether there are many graphs that are actually too big.
Frank McSherry showed (back in 2015) that he could run 20 iterations of PageRank on a 120B-edge graph in ~12 hours on a single core of a single laptop. He also showed label propagation to convergence in 1700 seconds on the same single core.
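For anyone who hasn't seen how little machinery that takes: here's a rough single-core sketch of the PageRank loop in C++. This is a toy, not McSherry's code (IIRC his runs stream a compressed edge list off SSD instead of holding edges in RAM), and the 0.85 damping factor is just the usual default.

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <utility>
    #include <vector>

    int main() {
        // toy graph; the real runs stream ~120B edges instead of holding them in RAM
        const uint32_t n = 5;                                 // number of nodes
        std::vector<std::pair<uint32_t, uint32_t>> edges = {
            {0, 1}, {1, 2}, {2, 0}, {2, 3}, {3, 4}, {4, 0}};  // (src, dst) pairs

        // out-degree of every node, needed to split a node's rank among its targets
        std::vector<uint32_t> deg(n, 0);
        for (const auto& e : edges) deg[e.first]++;

        std::vector<double> rank(n, 1.0 / n), next(n);
        const double damping = 0.85;                          // the usual default

        for (int iter = 0; iter < 20; ++iter) {               // 20 iterations, as above
            std::fill(next.begin(), next.end(), (1.0 - damping) / n);
            // the entire inner loop: one sequential pass over the edge list
            for (auto [s, d] : edges)
                next[d] += damping * rank[s] / deg[s];
            rank.swap(next);
        }
        for (uint32_t v = 0; v < n; ++v) printf("node %u: %.4f\n", v, rank[v]);
    }

The inner loop is one sequential pass over the edges per iteration, which is why a single core with decent memory/IO bandwidth gets surprisingly far.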
Given this, exactly what advantage does an FPGA give? Who is processing ~100-billion-edge graphs so many times that you are going to save on power? And are there actually any power savings over a single core anyway?
implements work-efficient parallel algorithms for a list of fundamental graph problems. They report times on an O(200B)-edge graph on a single multicore.
reports distributed running times for BFS, connectivity, PageRank, and SSSP across a large number of hosts. The times are usually slower than the single-machine times, despite using two orders of magnitude more threads.
reports connectivity times on a truly large graph (several trillion edges) that is not publicly available.
My feeling is that unless you are at FAANG or a big national lab, the largest graphs that will show up in practice (today) are on the order of billions of edges, and can be solved quickly using single-machine multicores.
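To give a flavor of the single-machine-multicore route (and of the label propagation mentioned above), here is a toy connected-components sketch in C++ with OpenMP. It is not the work-efficient algorithm those papers actually use; it just shows that the state is a few words per node and edge, which is why billions of edges fit comfortably in one box.

    #include <algorithm>
    #include <atomic>
    #include <cstdint>
    #include <cstdio>
    #include <utility>
    #include <vector>

    // atomically lower `target` to at most `val` (compare-and-swap loop)
    static void atomic_min(std::atomic<uint32_t>& target, uint32_t val) {
        uint32_t cur = target.load();
        while (val < cur && !target.compare_exchange_weak(cur, val)) {}
    }

    int main() {
        const uint32_t n = 6;
        std::vector<std::pair<uint32_t, uint32_t>> edges = {
            {0, 1}, {1, 2}, {3, 4}};              // components {0,1,2}, {3,4}, {5}

        // every node starts labeled with its own id
        std::vector<std::atomic<uint32_t>> label(n);
        for (uint32_t v = 0; v < n; ++v) label[v] = v;

        // each round, both endpoints of every edge take the smaller label;
        // stop when a full round changes nothing
        bool changed = true;
        while (changed) {
            changed = false;
            #pragma omp parallel for reduction(||:changed)
            for (long i = 0; i < (long)edges.size(); ++i) {
                auto [s, d] = edges[i];
                uint32_t m = std::min(label[s].load(), label[d].load());
                if (m < label[s].load()) { atomic_min(label[s], m); changed = true; }
                if (m < label[d].load()) { atomic_min(label[d], m); changed = true; }
            }
        }
        for (uint32_t v = 0; v < n; ++v)
            printf("node %u -> component %u\n", v, label[v].load());
    }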
Basically time and power. FPGAs will most likely be both faster and more power-efficient than GPUs and CPUs. The memory point isn't quite relevant: you can have an FPGA board that talks to 32 GB of RAM, or one with HBM that has as much memory density and bandwidth as a 1080 Ti GPU.
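Back-of-the-envelope on the bandwidth point, since that is usually the limiting resource for graph traversal: the 484 GB/s figure below is the 1080 Ti's spec, while the bytes-per-edge, the HBM-FPGA number, and the laptop DDR number are my assumptions, not measurements. It only shows the per-iteration floor if you stream every edge from memory exactly once.

    #include <cstdio>

    int main() {
        const double edges          = 120e9;   // edge count from the comment above
        const double bytes_per_edge = 8.0;     // assumed: two 32-bit node ids per edge
        const double gpu_bw         = 484e9;   // 1080 Ti memory bandwidth, bytes/s
        const double fpga_hbm_bw    = 410e9;   // assumed HBM2-equipped FPGA board, bytes/s
        const double laptop_bw      = 20e9;    // assumed sustained laptop DDR bandwidth, bytes/s

        const double traffic = edges * bytes_per_edge;   // bytes streamed per iteration
        printf("per-iteration floor: GPU %.1f s, FPGA+HBM %.1f s, laptop %.1f s\n",
               traffic / gpu_bw, traffic / fpga_hbm_bw, traffic / laptop_bw);
    }

That prints roughly 2.0 s, 2.3 s, and 48 s per iteration, i.e. an HBM-equipped FPGA sits in the same bandwidth class as the GPU, and the interesting questions become power and latency rather than whether the problem is reachable at all.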
Not spending an hour performing the calculation, I'd guess. An hour for PageRank is probably acceptable, but there are applications where waiting an hour for a result isn't viable.
I think people upvote anything with FPGA in it, because they think FPGAs are the future of high-performance compute (and always will be, IMHO).
I've worked in environments where FPGAs were used (specialized camera stabilization and tracking). These kinds of environments are around, but generally you need a mix of high performance and some kind of embedded requirement for an FPGA to make sense.
Lightbits uses an FPGA on their storage adapter for NVMe-over-Fabrics, enabling custom processing at line rate. This is a standardized evolution of Annapurna (now AWS Nitro virtualization): https://www.lightbitslabs.com/products/lightfield/
https://NetFPGA.org has been around for a while and continues to advance, https://www.usenix.org/system/files/nsdi19spring_pontarelli_...
> programming a Smart-NIC to support a new network function requires hardware design expertise. While a tech giant can build and assign a dedicated team to the task, this is usually not the case for a large majority of companies, e.g., smaller cloud or network operators. As a result, recent network programming abstractions, such as P4 have the explicit goal of simplifying the programming of FPGA-based network devices ... introducing FlowBlaze, an abstraction that extends match-action languages such as P4 or Microsoft’s GFT to simplify the description of a large set of L2-L4 stateful functions, while making them amenable to (line-rate) implementations on FPGA-based SmartNICs. To benefit the community at large, we build FlowBlaze on open SmartNIC technology (NetFPGA), and provide our hardware and software implementations as open source.
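For anyone who hasn't touched P4/GFT-style abstractions, the shape they compile down to is roughly "match on header fields, run a tiny action, keep some per-flow state". Here is a toy software rendering of that idea in C++; the field names and the drop-after-1000-packets policy are made up for illustration and have nothing to do with FlowBlaze's actual interface.

    #include <cstdint>
    #include <cstdio>
    #include <map>
    #include <tuple>

    struct FlowKey {                 // the "match" part: a 5-tuple
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
        bool operator<(const FlowKey& o) const {
            return std::tie(src_ip, dst_ip, src_port, dst_port, proto) <
                   std::tie(o.src_ip, o.dst_ip, o.src_port, o.dst_port, o.proto);
        }
    };

    struct FlowState { uint64_t pkts = 0; };   // the per-flow state

    enum class Action { Forward, Drop };

    // the "action" part: bump the counter, drop anything past a toy threshold
    Action process(std::map<FlowKey, FlowState>& table, const FlowKey& k) {
        FlowState& st = table[k];              // match (insert on miss)
        st.pkts++;
        return st.pkts > 1000 ? Action::Drop : Action::Forward;
    }

    int main() {
        std::map<FlowKey, FlowState> table;
        FlowKey k{0x0a000001, 0x0a000002, 1234, 80, 6};   // 10.0.0.1 -> 10.0.0.2:80/tcp
        for (int i = 0; i < 1005; ++i) {
            Action a = process(table, k);
            if (a == Action::Drop) { printf("packet %d dropped\n", i); break; }
        }
    }

The appeal of doing this on an FPGA NIC is that a table-plus-register pipeline of this shape maps naturally to hardware and can keep up at line rate per packet.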
In other words, the overhead of the reconfigurable aspect of an FPGA is just too high.
a) Most applications don't need 7nm. S3 will do a custom ASIC for under $2M, and I've heard figures as low as $100K for an FPGA-to-ASIC conversion.
b) General-purpose embedded CPUs with extended instruction sets are fast enough for most purposes.
Some of the algorithms require the FPGA to be reconfigured whenever the node data has changed or when applied to a different data set.
Is this an expensive operation?
The GPU execution model is limiting for some applications because of SIMD: if the GPU threads aren't doing exactly the same thing, they stall. An FPGA can have parallel processes going on that are more differentiated; e.g. HMMs work better on FPGAs than on GPUs.
Anything with low-latency feedback (e.g. LSTMs) probably works better on an FPGA too.
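To make the divergence point concrete: SIMD/SIMT hardware runs the lanes of a warp in lockstep, so when a branch splits them it executes both paths with some lanes masked off. A crude software model of that cost (just an illustration of the execution model, not real GPU code):

    #include <cstdio>

    const int WARP = 32;

    // simulate one branchy step for a whole warp; returns "cycles" spent,
    // charging every lane for each path the warp has to walk through
    int run_warp(const bool pred[WARP], int then_cost, int else_cost) {
        bool any_then = false, any_else = false;
        for (int lane = 0; lane < WARP; ++lane)
            (pred[lane] ? any_then : any_else) = true;
        int cycles = 0;
        if (any_then) cycles += then_cost;   // all lanes stall through the then-path
        if (any_else) cycles += else_cost;   // ...and through the else-path
        return cycles;
    }

    int main() {
        bool uniform[WARP], mixed[WARP];
        for (int i = 0; i < WARP; ++i) { uniform[i] = true; mixed[i] = (i % 2 == 0); }
        // with 10-cycle branch bodies: agreement costs 10, divergence costs 20
        printf("uniform: %d cycles, divergent: %d cycles\n",
               run_warp(uniform, 10, 10), run_warp(mixed, 10, 10));
    }

On an FPGA each pipeline or process can follow its own control flow, which is part of why branchy, stateful workloads like HMM decoding are claimed to map better there.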
I'm hoping for a 10x-100x increase in GPU memory size to unlock new possibilities.
Any density upgrades for GPUs will also carry over to FPGAs, but we're physically limited in how many HBM dies you can slap on :(