GraphBLAS – Graph algorithms in the language of linear algebra (graphblas.org)
276 points by johlo 1 day ago | 40 comments

I have created a tutorial slideshow on the theory of GraphBLAS with lots of examples and step-by-step illustrations: http://mit.bme.hu/~szarnyas/grb/graphblas-introduction.pdf

A shorter version of this tutorial was recorded at FOSDEM earlier this year: https://fosdem.org/2020/schedule/event/graphblas/

More papers/tutorials/libraries are listed at https://github.com/GraphBLAS/GraphBLAS-Pointers.


Thanks, this is great!

This interview with Tim Davis has a bit of background information.

https://www.acm.org/articles/people-of-acm/2019/tim-davis

> I picked Gaussian elimination. "That will be quick," I thought. Not! I got started on matrices in 1985 and I'm not done with Gaussian elimination yet.

> GraphBLAS is a community effort, including industry, academics, and government labs, that is working to design a library that can implement graph algorithms based on sparse linear algebra over semirings. There are lots of great graph libraries out there that don't exploit the linear algebraic abstraction, but most of what they do can be viewed as matrix operations on adjacency matrices. GraphBLAS makes the connection to linear algebra explicit.

> GraphBLAS gives the graph algorithm developer the power and abstraction of linear algebra, with all its advantages (associative and distributive laws, AB=(B'A')', and so on). One level of breadth-first search, for example, is a single sparse matrix-times-sparse vector multiply, with no loss of asymptotic efficiency. Google's entire PageRank computation can be done with a handful of iterations around a single call to GraphBLAS, using what I call the "PageRank semiring."
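The "one level of BFS is a single sparse matrix-vector multiply" claim can be made concrete with a small sketch. This is plain Python over dict-of-sets sparse storage, not the GraphBLAS API; the function name and structure are my own illustration of the (OR, AND) Boolean semiring product with a complement mask.

```python
# One BFS level as a sparse matrix-vector product over the Boolean
# (OR, AND) semiring, with the visited set acting as a complement mask.
# Illustrative pure-Python sketch, not GraphBLAS itself.

def bfs_level(adj, frontier, visited):
    """adj maps node -> set of successors (a sparse adjacency matrix);
    frontier and visited are sets of nodes. Returns the next frontier."""
    nxt = set()
    for u in frontier:              # nonzeros of the input vector
        for v in adj.get(u, ()):    # nonzeros of row u of the matrix
            if v not in visited:    # complement mask, as in GraphBLAS
                nxt.add(v)          # OR-accumulate into the result
    return nxt

# Example: path graph 0 -> 1 -> 2
adj = {0: {1}, 1: {2}}
level1 = bfs_level(adj, {0}, {0})          # frontier after one level: {1}
level2 = bfs_level(adj, level1, {0, 1})    # after two levels: {2}
```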


>> Google's entire PageRank computation can be done with a handful of iterations around a single call to GraphBLAS, using what I call the "PageRank semiring."

The author has an example of PageRank in C:

https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/stable...

I assume his "single call" statement refers to the MATLAB version:

https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/stable...

    for i = 1:20
        r = ((c*r) * C) + a * sum (r) ;
    end
The MATLAB interface uses overloaded operators and methods on a GraphBLAS matrix object.

I was referring to the PageRank_semiring in https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/5e569f... . I wasn't counting the test for the termination condition -- in the statement in my interview, I assumed the # of iterations could just be given. I would guess that's the case for the Google PageRank computation: since the graph doesn't change much between monthly computations, they probably know how many iterations they need.
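For readers without MATLAB, the iteration above can be mirrored in plain Python. This is my own rendering on a toy 3-node cycle; I'm assuming C is the row-normalized adjacency matrix, c is the damping factor, and a is the teleport coefficient (1 - c)/n, which may not match every detail of Davis's script.

```python
# Pure-Python rendering of r = ((c*r) * C) + a * sum(r) on a 3-cycle.
# Assumed meanings: C = row-normalized adjacency, c = damping, a = teleport.

n = 3
c = 0.85
a = (1 - c) / n
# Row-normalized adjacency of the cycle 0 -> 1 -> 2 -> 0
C = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

r = [1.0 / n] * n                   # uniform starting rank
for _ in range(20):
    cr = [c * x for x in r]                          # c*r
    rC = [sum(cr[i] * C[i][j] for i in range(n))     # (c*r) * C
          for j in range(n)]
    s = a * sum(r)                                   # a * sum(r)
    r = [x + s for x in rC]

# On a symmetric cycle every page ends up with equal rank, 1/3 each.
```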

> AB=(B'A')'

Can someone clarify the notation for me? What is B' and A'? I've never seen that before.


It's the transpose operation [1]. See property 3.

[1] https://en.wikipedia.org/wiki/Transpose
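A quick numeric check of the AB = (B'A')' identity, in plain Python (the helper names are mine; any linear-algebra library would do the same):

```python
# Verify AB == (B'A')' on a small example, where ' denotes transpose.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

lhs = matmul(A, B)
rhs = transpose(matmul(transpose(B), transpose(A)))
# lhs == rhs == [[19, 22], [43, 50]]
```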


Here are two different PageRank approaches with pygraphblas:

https://github.com/michelp/pygraphblas/blob/master/pygraphbl...


RedisGraph uses GraphBLAS:

https://github.com/RedisGraph/RedisGraph/

Their paper claims that computing with adjacency (sparse) matrices is more performant than alternatives:

https://arxiv.org/pdf/1905.01294.pdf

They compile queries written in Cypher (Neo4j's language) to operations in linear algebra. An example query is:

    MATCH (n:actor{name:"Nicolas Cage"})-[:act]->(m:movie)<-[:act]-(a:actor) RETURN a.name, m.title

Note that the Cypher support is incomplete, and missing features like multiple labels.

https://oss.redislabs.com/redisgraph/cypher_support/


Is there a treatment of graph algorithms in linear algebra from the basics anywhere?

pygraphblas author here, here is a good notebook introduction to the very basics of GraphBLAS with Python. The concepts carry naturally to the C API as well:

https://github.com/michelp/pygraphblas/blob/master/pygraphbl...

I also made a quick introduction video for a paper submission you can see here:

https://www.youtube.com/watch?v=JUbXW_f03W0


I'm a graph theorist, and I spend a lot of time on various optimization problems (matchings, packings, coverings, Steiner trees, etc) doing VLSI. AFAICT, it seems like GraphBLAS is geared for information processing about data stored in a graph representation, and doesn't approach the problems I need. Am I wrong?

Right now, the graphs I'm interested in are reasonably small (tens of thousands of edges), and my oldschool approaches are good enough, but if my employer's products continue their growth trajectory, we'll need to leverage GPUs and TPUs if at all possible. Should I be learning about GraphBLAS?


GraphBLAS is meant to solve graph problems (i.e., generally the sort of ones you are interested in, but with more emphasis on parallel ones). Perhaps graph analytics is closer to what it does right now than graph theory. At the most basic level, it exploits the matrix-graph duality and casts graph problems as matrix problems. For example, BFS and SSSP are sparse matrix-vector multiplications over a suitably defined semiring. The benefit of this is that you can use high-performance matrix libraries. The downside is that not everything can be done this way -- at least not easily. You may find the book by Kepner and Gilbert useful. Also look at cugraph and gunrock for GPU-based graph frameworks.
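The SSSP-as-matrix-multiply point works over the min-plus (tropical) semiring, where "plus" is min and "times" is +. A minimal sketch, dense and pure-Python for brevity (the function name is mine):

```python
# One SSSP relaxation round as a min-plus matrix-vector product.
# Keeping min(..., d[j]) is an implicit zero-weight self-loop, so
# repeating the step n-1 times is Bellman-Ford.

INF = float("inf")

def min_plus_step(d, A):
    n = len(d)
    return [min(min(d[k] + A[k][j] for k in range(n)), d[j])
            for j in range(n)]

# A[i][j] = weight of edge i -> j, INF if absent
A = [[INF, 2, 9],
     [INF, INF, 3],
     [INF, INF, INF]]

d = [0, INF, INF]            # distances from source node 0
for _ in range(len(A) - 1):  # n-1 relaxation rounds
    d = min_plus_step(d, A)
# d == [0, 2, 5]: the 0 -> 1 -> 2 path beats the direct 0 -> 2 edge
```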

Time to hit the books, I guess. Thanks!

It's so early in this work that you might be the first person asking those specific questions. I'm no expert but I take it by VLSI you mean doing circuit routing and simulation and optimizing that topology. My gut tells me those would be good fits for the GraphBLAS.

If the base types don't suit your needs, you can always make your own. Tim Davis has talked about how he's made new types that are small nxn matrices as elements of a larger matrix. You can have complex numbers, quaternions, or whole bags of stuff as matrix elements as long as you define the operators you need to work on those types.

Here's an example, in Python, of using User Defined Types for a shortest path algorithm that also stores backlinks along the shortest path, so the shortest path tree is materialized:

https://github.com/michelp/pygraphblas/blob/master/pygraphbl...
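The idea behind that linked example, stripped down to plain Python: make the elements (distance, predecessor) pairs and define "times" as edge extension and "plus" as min-by-distance, so the shortest-path tree falls out of the iteration. This is a hypothetical sketch with my own names, not the pygraphblas code.

```python
# A (distance, predecessor) "user-defined type" with its semiring ops,
# driven by simple Bellman-Ford rounds. Illustrative only.

INF = float("inf")

def extend(entry, weight, via):
    # "times": walk an edge, remembering where we came from
    dist, _ = entry
    return (dist + weight, via)

def pick(a, b):
    # "plus": keep the shorter path (and its predecessor)
    return a if a[0] <= b[0] else b

def sssp_with_parents(edges, n, src):
    """edges: list of (u, v, weight). Returns [(dist, parent), ...]."""
    d = [(INF, None)] * n
    d[src] = (0, src)
    for _ in range(n - 1):  # Bellman-Ford rounds
        for u, v, w in edges:
            d[v] = pick(d[v], extend(d[u], w, u))
    return d

d = sssp_with_parents([(0, 1, 2), (1, 2, 3), (0, 2, 9)], 3, 0)
# d == [(0, 0), (2, 0), (5, 1)]: node 2 is reached via node 1
```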


Shortest paths is definitely a big timesink... I'll definitely give that a shot. Thanks for the link :)

As for what I'm doing, I work at D-Wave and a lot of my work is on what can be thought of as "compilers" for our quantum computers. There's tons of similarity between my problem area and compilers for FPGAs, for example, so I call it VLSI because that's the right ballpark.


Definitely not a comprehensive treatment, but I wrote about some specific use cases from the basics here:

https://andersource.dev/2019/08/25/fun-with-matrix-exponenti...


There's a few papers listed on Davis's GraphBLAS [1] site, this one might serve as an intro:

http://faculty.cse.tamu.edu/davis/GraphBLAS_files/toms_graph...

[1] http://faculty.cse.tamu.edu/davis/GraphBLAS.html


There is a book by the same title.

Graph algorithms in the language of linear algebra: https://epubs.siam.org/doi/book/10.1137/1.9780898719918 .

There is also a Python wrapper:

https://github.com/michelp/pygraphblas


Sounds like a good idea, with the added benefit that there are BLAS interfaces for many programming languages.

Can you guys give some real world examples of what graphs algorithms people are actually using in applications?

A lot of bioinformatics (my field) uses graph algorithms. DNA sequencing machines don't give you entire genomes, they give you pieces. Re-assembling the genome from those pieces is a prime graph algorithm target.

For example, Canu, one of the more popular current assemblers for PacBio and MinION data, uses a best overlap graph https://genome.cshlp.org/content/27/5/722.long
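A toy illustration of the overlap-graph idea: reads become nodes, and an edge u -> v records that a suffix of u matches a prefix of v. Real assemblers like Canu are vastly more sophisticated (error-tolerant overlaps, best-overlap filtering); this sketch, with made-up reads and names, only shows the core concept.

```python
# Build a tiny overlap graph from three fake reads.

def overlap(u, v, min_len=3):
    """Length of the longest suffix of u that is a prefix of v."""
    for k in range(min(len(u), len(v)), min_len - 1, -1):
        if u[-k:] == v[:k]:
            return k
    return 0

reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA"]
edges = {(u, v): overlap(u, v)
         for u in reads for v in reads
         if u != v and overlap(u, v) >= 5}
# edges == {("ATTAGACCTG", "AGACCTGCCG"): 7,
#           ("AGACCTGCCG", "CCTGCCGGAA"): 7}
```

Walking the chain of overlaps then reconstructs a candidate genome.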


If you don’t mind the marketing aspect, there are some pretty good overviews in graphconnect (neo4j) conferences. https://m.youtube.com/watch?v=dW6JsFccdkM

There is a talk by two Facebook engineers [0] where they talk about using graph partitioning to better distribute caching of data.

[0]: https://m.youtube.com/watch?v=QHkhyY9atkE


For those like me who prefer watching talks, Tim Davis presented for Redis a year or two ago.

https://youtu.be/xnez6tloNSQ


Is there a short example somewhere of what it looks like in practice?

Tim Davis, the developer of SuiteSparse [1], a library for calculating with sparse matrices, also developed a reference implementation of GraphBLAS. On his blog are a few short examples of how it looks when you call it from MATLAB:

http://aldenmath.com/performance-of-the-matlab-interface-to-...

http://aldenmath.com/a-matlab-interface-for-suitesparsegraph...

[1] http://faculty.cse.tamu.edu/davis/suitesparse.html


There's an example here showing you don't need to write a shortest path algorithm in terms of nodes and edges: https://github.com/gunrock/graphblast#usage

GraphBLAS is nice in theory, but actual high-performance code tends to be quite messy and not always easily encapsulated by GraphBLAS. For example, if you want to store predecessors in SSSP, the semiring becomes quite involved, and not easily transferable to GPUs etc. And there are other optimizations (direction-optimizing BFS, delta-stepping SSSP, etc.) that are not really part of the algorithmic specification, which also break the encapsulation.

Give us time; we're still in the beginning stages of writing algorithms that use it. If you dig through LAGraph, you'll find lots of messy prototypes, as we experiment on various methods. Those are drafts, not polished results. Our most recent methods are polished, extremely simple, yet very fast:

https://github.com/GraphBLAS/LAGraph/blob/master/Source/Algo...

https://github.com/GraphBLAS/LAGraph/blob/master/Source/Algo...

https://github.com/GraphBLAS/LAGraph/blob/master/Source/Algo... (that one has 6 algorithms inside)

https://github.com/GraphBLAS/LAGraph/blob/master/Source/Algo...

We're working on the GPU version, and all the built-in semirings work very nicely on the GPU. For user-defined semirings, we will need the user operators defined as strings containing C code. Then they can all be done on the GPU too.


People have shown that direction-optimizing BFS actually fits very neatly inside GraphBLAS [1]. Perhaps in a few years people will have figured out how to make delta stepping and storing predecessors fit too, even if it's not clear at the moment.

For delta stepping, it seems all you would need is a priority queue that works in batches as opposed to individual elements. Then to make it performant and match delta stepping code written from scratch, you might need something that can fuse multiple graphblas operations together so that you don't have too many extra memory ops from the priority queue operations.

[1] https://dl.acm.org/doi/pdf/10.1145/3225058.3225122
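The "priority queue that works in batches" idea is essentially the bucket queue from delta-stepping: vertices are grouped by floor(dist / delta), and each pop returns a whole bucket, which is the batch a GraphBLAS operation could consume. A hypothetical sketch (class and method names are mine):

```python
# A bucket queue keyed by floor(dist / delta), popping whole buckets.

from collections import defaultdict

class BucketQueue:
    def __init__(self, delta):
        self.delta = delta
        self.buckets = defaultdict(set)

    def push(self, v, dist):
        self.buckets[int(dist // self.delta)].add(v)

    def pop_batch(self):
        """Remove and return the lowest nonempty bucket as one batch."""
        if not self.buckets:
            return None
        k = min(self.buckets)
        return self.buckets.pop(k)

q = BucketQueue(delta=3.0)
q.push("a", 0.5)
q.push("b", 1.2)
q.push("c", 4.0)
batch = q.pop_batch()   # {"a", "b"}: both fall in bucket [0, 3)
```

Fusing the relaxations for one batch into a single semiring operation is where the "fuse multiple graphblas operations" point above would come in.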


There are some examples in the SuiteSparse implementation: https://github.com/DrTimothyAldenDavis/SuiteSparse/tree/mast...

RedisGraph is a graph database that's built on top of GraphBLAS: https://github.com/RedisGraph/RedisGraph

Is there a similar connection from algebra to hypergraphs (semantic graphs)?

Hypergraphs can be represented with incidence matrices:

https://en.wikipedia.org/wiki/Incidence_matrix

The same algebra applies, and GraphBLAS works well with them!
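A minimal illustration of the incidence-matrix encoding, in plain Python with sparse dict storage (the names and the tiny hypergraph are made up; this is not a GraphBLAS call):

```python
# A hypergraph as a node-by-hyperedge incidence matrix, stored sparsely.

nodes = ["a", "b", "c", "d"]
hyperedges = {"e1": {"a", "b", "c"},   # one edge joining three nodes
              "e2": {"c", "d"}}

# Incidence "matrix" as a dict: (node, edge) -> 1
M = {(v, e): 1 for e, members in hyperedges.items() for v in members}

def shared_edges(u, v):
    """Entry (u, v) of M M' over the (+, *) semiring:
    the number of hyperedges containing both u and v."""
    return sum(1 for e in hyperedges if (u, e) in M and (v, e) in M)

# shared_edges("a", "c") == 1; shared_edges("a", "d") == 0
```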


Thanks, I get it: instead of using only 0 and 1, use the index to indicate which property, and the index is also a node in the graph.

The reference implementation is written in C, apparently...


