
Graph Processing on FPGAs: Taxonomy, Survey, Challenges - walterbell
https://arxiv.org/abs/1903.06697
======
nl
This seems a weird problem to solve with FPGAs.

The big problem with graphs is that they can get very big, and once they
become too big for a single machine's memory things get tricky.

 _However_ it's unclear if there are many graphs that are actually too big.

Frank McSherry showed (back in 2015) that he could process 20 iterations of
PageRank on a 120B-edge graph in ~12 hours on a single core of a single
laptop[1]. He also showed label propagation to convergence in 1700 seconds on
the same single core.
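
The loop McSherry benchmarks is essentially one sequential pass over the edge
list per PageRank iteration, with no graph framework at all. A minimal Python
sketch of that edge-at-a-time update (toy graph; the names and structure here
are illustrative, not his actual Rust code):

```python
def pagerank(edges, n, iters=20, alpha=0.85):
    """Edge-at-a-time PageRank: one streaming pass over `edges` per iteration.

    Assumes every vertex has at least one outgoing edge (dangling-node
    handling omitted for brevity).
    """
    ranks = [1.0 / n] * n
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1
    for _ in range(iters):
        incoming = [0.0] * n
        for src, dst in edges:  # stream the edge list sequentially
            incoming[dst] += ranks[src] / out_degree[src]
        ranks = [(1 - alpha) / n + alpha * x for x in incoming]
    return ranks

# 4-node toy graph: 0->1, 0->2, 1->2, 2->0, 3->2
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
print(pagerank(edges, 4))
```

McSherry's point is that this inner loop is mostly limited by how fast edges
can be streamed through memory, which a single core already does quite well.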

Given this, exactly what advantage does an FPGA give? Who is processing
~100-billion-edge graphs so many times that you are going to save on power?
And is there actually any power saving over a single core anyway?

[1] [https://github.com/frankmcsherry/blog/blob/master/posts/2015-02-04.md](https://github.com/frankmcsherry/blog/blob/master/posts/2015-02-04.md)

~~~
moab
Here are a few references that solve problems on very large graphs:

[1] implements work-efficient parallel algorithms for a list of fundamental
graph problems. They report times on an O(200B)-edge graph on a single
multicore machine.

[2] reports distributed running times for BFS, connectivity, PageRank, and
SSSP on a large number of distributed hosts. The times are usually slower than
the single-machine times, despite using two orders of magnitude more threads.

[3] reports connectivity times on a truly large graph (several trillions of
edges) that is not publicly available.

My feeling is that unless you are at FAANG or a big national lab, the largest
graphs that will show up in practice (today) are on the order of billions of
edges, and can be solved quickly using single-machine multicores.

[1] [https://arxiv.org/abs/1805.05208](https://arxiv.org/abs/1805.05208)

[2] [https://www.cs.utexas.edu/~roshan/Gluon.pdf](https://www.cs.utexas.edu/~roshan/Gluon.pdf)

[3] [https://arxiv.org/abs/1807.10727](https://arxiv.org/abs/1807.10727)

~~~
nl
Yes, all these are valid cases. Hence the second part of my question:

 _Given this, exactly what advantage does an FPGA give? Who is processing
~100-billion-edge graphs so many times that you are going to save on power?
And is there actually any power saving over a single core anyway?_

------
moab
Any insight on why this paper was posted? What graph algorithms are people in
the field interested in solving on FPGAs/GPUs/other accelerators?
Unfortunately I usually see the same cluster of (toy) problems solved time and
again (BFS/BC/PageRank/(negative-weight) SSSP, usually all using
label-propagation style vertex-centric codes). Interestingly, this survey
reports some results for maximum matching (approximate?), although it does not
provide many details on the result.
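
For reference, the "label-propagation style vertex-centric" pattern mentioned
above boils down to a fixed-point loop in which every vertex repeatedly adopts
a value from its neighbors. A generic Python illustration (connected
components via min-label propagation; my sketch, not code from the survey):

```python
def connected_components(n, edges):
    """Label propagation: each vertex adopts the minimum label in its
    neighborhood until no label changes; vertices sharing a final label
    are in the same connected component."""
    labels = list(range(n))  # each vertex starts as its own label
    changed = True
    while changed:
        changed = False
        for u, v in edges:  # propagate the smaller label in both directions
            low = min(labels[u], labels[v])
            if labels[u] != low or labels[v] != low:
                labels[u] = labels[v] = low
                changed = True
    return labels

# Two components: {0, 1, 2} and {3, 4}
print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))  # prints [0, 0, 0, 3, 3]
```

BFS, PageRank, and SSSP all fit this same skeleton with a different update
rule, which is presumably why accelerator papers keep reusing that cluster of
problems.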

~~~
nl
I agree.

I think people vote for anything with FPGA in it, because people think FPGAs
are the future of high-performance compute (and always will be IMHO).

I've worked in environments where FPGAs were used (specialized camera
stabilization and tracking). These kind of environments are around, but
generally you need a mix of high performance _and_ some kind of embedded
requirement for a FPGA to make sense.

~~~
alain94040
Correct. By definition, if volume is high enough, an FPGA will always lose to
a custom chip. That's why it feels like FPGAs are always about to win, except
they never do because for every real use case, someone will make a better
chip.

In other words, the overhead of the reconfigurable fabric of an FPGA is just
too high.

~~~
banjo_milkman
Back when chip costs were reasonable, custom chips would always beat FPGAs and
were an easy option for many applications, but I think things have changed
somewhat in the last few years with the decline of Moore's law. Now fewer and
fewer companies can afford to develop custom chips at leading technology
nodes; some estimates put the cost of getting a custom chip into production at
7nm at $100M. For many applications the volumes just aren't there to justify a
custom chip: it doesn't make sense unless the market is huge and the
algorithms are static. FPGAs work well in these cases. Have you seen Microsoft
Brainwave? [https://www.microsoft.com/en-us/research/project/project-brainwave/](https://www.microsoft.com/en-us/research/project/project-brainwave/)

~~~
nl
This is true, but:

a) Most applications don't need 7nm. S3 will do a custom ASIC for under
$2M[1], and I've heard figures as low as $100K for an FPGA-to-ASIC conversion.

b) General purpose embedded CPUs with extended instruction sets are often fast
enough for most purposes.

[1] [https://www.s3semi.com/wp-content/uploads/2018/06/S3semi_silicon_economics_WhitePaper_Jun18.pdf](https://www.s3semi.com/wp-content/uploads/2018/06/S3semi_silicon_economics_WhitePaper_Jun18.pdf)

------
TBF-RnD
Very interesting read.

Some of the algorithms require the FPGA to be reconfigured whenever the node
data has changed or when applied to a different data set.

Is this an expensive operation?

~~~
banjo_milkman
Not really, but I suppose it depends on the application requirements.
Reconfiguration typically takes under a second, which is much faster than
taping out another chip ;-) but could be problematic for real-time systems in
some cases. It depends on the size of the FPGA, the bus speed, etc.

~~~
TBF-RnD
Thank you! I have been wondering about this for ages!

------
RantyDave
I admit I've only just started trying to learn about FPGAs and where their
applications might be. Outside of the painfully obvious (100G networking,
HFT), I can't see a case where their expense makes them a valid solution
compared to a GPU. Ideas?

~~~
ddorian43
Bing search uses FPGAs for scoring. See also FPGAs for storage/NVMe.

I'm hoping for a 10x-100x increase in GPU memory size to unlock new
possibilities.

~~~
michaelxia
If you take apart a GPU and look at the die, you can see they basically
frankenstein one or two HBM stacks onto the GPU die. Xilinx is doing this too
in their latest line of VU33 - VU38P HBM chips (4-8 GB on chip, 460 GB/s).

Any density upgrades for GPUs will also carry over to FPGAs, but we're
physically limited in how many HBM dies you can slap on :(

~~~
ddorian43
What about [https://www.amd.com/en/products/professional-graphics/radeon-pro-ssg](https://www.amd.com/en/products/professional-graphics/radeon-pro-ssg) (strapping an SSD onto a GPU)?

