
Show HN: Graph Processing with Python and GraphBLAS - michelpp
https://github.com/michelp/pygraphblas/tree/master
======
espeed
Hi Michel - You've been busy with this and the Postgres lib. Good work. The
Gremlin User Group has been working on a new universal graph DSL and compiler
that can compile down to any of the graph backends. GraphBLAS has been part of
the discussion since the start. An early draft of mm-ADT was just released
last week...

mm-ADT: A Multi-Model Abstract Data Type [http://rredux.com/mm-adt/](http://rredux.com/mm-adt/)

~~~
michelpp
Thanks espeed! I'll give this a read.

------
laurencerowe
This looks really interesting, potentially makes GraphBLAS much more
accessible for exploratory work. A few questions for the author:

Does this work in blocking or non-blocking mode? Naively I imagine there might
be more opportunity for the GraphBLAS implementation to optimize execution in
non-blocking mode.

Is there a way to efficiently store and load matrices to and from files?
Ideally in such a way that the data is just mmap'ed or copied directly into
memory on load?

Does this only work with SuiteSparse or could it potentially work with a GPU
implementation like
[https://github.com/gunrock/graphblast](https://github.com/gunrock/graphblast)
too?
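
To illustrate the store/load part, this is the kind of round trip I mean, sketched with scipy's Matrix Market helpers rather than pygraphblas itself (SuiteSparse tooling also speaks Matrix Market) - though a text format like this wouldn't give the mmap behaviour I'm after:

```python
import io

import numpy as np
import scipy.sparse as sp
from scipy.io import mmread, mmwrite

# Tiny directed graph 0->1, 1->2, 2->0 as a sparse adjacency matrix.
A = sp.coo_matrix(([1.0, 1.0, 1.0], ([0, 1, 2], [1, 2, 0])), shape=(3, 3))

buf = io.BytesIO()
mmwrite(buf, A)    # serialize to Matrix Market text
buf.seek(0)
B = mmread(buf)    # parse it back into a sparse matrix

# Same sparsity pattern and values survive the round trip.
assert (A.tocsr() != B.tocsr()).nnz == 0
```

A binary, mmap-friendly dump would need something beyond this, which is part of why I'm asking.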

~~~
espeed
The GraphBLAS GPU code is in the works [1]. For storage, see the RedisGraph
1.0 implementation released last year [2], and 'michelp has a Postgres
implementation in development [3].

[1] Sparse versus dense in GraphBLAS: sometimes dense is better
[http://aldenmath.com/sparse-verses-dense-in-graphblas-sometimes-dense-is-better/](http://aldenmath.com/sparse-verses-dense-in-graphblas-sometimes-dense-is-better/)

[2] RedisGraph: A Graph Database Module for Redis
[http://graphblas.org/?title=Graph_BLAS_Forum#Graph_analysis_...](http://graphblas.org/?title=Graph_BLAS_Forum#Graph_analysis_systems_that_integrate_GraphBLAS)

[3] Graph Processing with Postgres and GraphBLAS
[https://news.ycombinator.com/item?id=19379800](https://news.ycombinator.com/item?id=19379800)

~~~
laurencerowe
For the bioinformatics datasets I work with it is not cost-efficient to load
everything into a database. Some of these datasets (e.g. GWAS, genome-wide
association studies, which are essentially sparse matrices) might be
interesting to explore with graph queries. I guess my ideal would be a
GraphBLAS equivalent of Spark SQL queries working across files in cloud
storage / NAS.

~~~
espeed
There are several teams working on distributed GraphBLAS (the GraphBLAS/D4M
model was designed to run on supercomputers [0]). Kepner's team at MIT is one
[1].

NB: D4M was the original name before it was changed to GraphBLAS and became a
standard.

[0] GraphBLAS: Building Blocks For High Performance Graph Analytics
[https://crd.lbl.gov/news-and-publications/news/2017/graphblas-building-blocks-for-high-performance-graph-analytics/](https://crd.lbl.gov/news-and-publications/news/2017/graphblas-building-blocks-for-high-performance-graph-analytics/)

[1] A Billion Updates per Second Using 30,000 Hierarchical In-Memory D4M
Databases [https://arxiv.org/abs/1902.00846](https://arxiv.org/abs/1902.00846)

[http://www.mit.edu/~kepner/](http://www.mit.edu/~kepner/)

~~~
sandGorgon
This is very cool. I wonder if the Graphblas and the Dask team should
collaborate.

Dask has a production grade distributed computing system (that is cloud
compatible with kubernetes, yarn, EMR, Dataproc,etc).

~~~
lmeyerov
RAPIDS has picked up Dask for the multi-GPU side of cudf (think Spark/Pandas
on GPUs), and as cugraph
([https://github.com/rapidsai/cugraph](https://github.com/rapidsai/cugraph))
is single-GPU for going fast on ~billion-row datasets... I'm guessing
dask+cugraph will be happening for the next 100-1000X, if not already.

Graph partitioning is a weird world, so it will be interesting to see!

------
kuanbutts
Interesting - I'd like to see a comparison between GraphBLAS (which I had not
heard of until just now) and, for example, graph-tool's
([https://graph-tool.skewed.de/](https://graph-tool.skewed.de/)) underlying
algorithms (the Boost Graph Library).

~~~
enriquto
It seems that they serve very different purposes. GraphBLAS is mostly focused
on processing real-valued functions defined on the vertices and edges of
large-scale sparse graphs, and is optimized for this use case (think graphs
with billions of vertices). The Boost Graph Library is not tailored to numeric
functions and will probably be less efficient for this use case.
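
To make the "functions on vertices and edges" framing concrete, here is a toy sketch of the core trick with scipy.sparse (not pygraphblas): one breadth-first step is just a vector-matrix product over the adjacency matrix.

```python
import numpy as np
import scipy.sparse as sp

# Toy directed graph 0->1, 0->2, 1->3 as a sparse adjacency matrix.
A = sp.csr_matrix((np.ones(3, dtype=np.int8), ([0, 0, 1], [1, 2, 3])),
                  shape=(4, 4))

frontier = np.zeros(4, dtype=np.int8)
frontier[0] = 1                      # start a BFS at vertex 0

# One BFS step: nonzero entries of frontier @ A are exactly the
# vertices reachable in one hop from the current frontier.
next_hop = frontier @ A
print(np.flatnonzero(next_hop))      # vertices 1 and 2
```

GraphBLAS builds on this same idea but lets the "multiply" run over arbitrary semirings on huge sparse matrices, which is where it pulls away from a general-purpose library.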

~~~
kuanbutts
Thanks! That's a helpful description.

------
refset
GraphBLAS is also the engine behind RedisGraph. Sparse adjacency matrices are
interesting from a graph database perspective in particular because they are
typically faster and more compact than the mainstream index-free adjacency
systems (e.g. Neo4j).
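
The compactness point is easy to see with any CSR-style layout (sketched here with scipy.sparse rather than a GraphBLAS implementation): neighbor lookups are just row or column slices of the matrix, with no per-node pointer structures.

```python
import numpy as np
import scipy.sparse as sp

# Edges 0->1, 0->2, 3->0 stored as CSR: a row slice yields outgoing
# neighbors directly from the compressed index arrays.
A = sp.csr_matrix((np.ones(3), ([0, 0, 3], [1, 2, 0])), shape=(4, 4))

out_neighbors = A[0].indices            # column ids in row 0
in_neighbors = A.T.tocsr()[0].indices   # edges pointing into vertex 0
```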

------
enriquto
Very interesting, I didn't know about GraphBLAS!

Does anybody here know the advantages with respect to scipy.sparse? Does
scipy.sparse use GraphBLAS internally?

~~~
espeed
Tim Davis also wrote the underlying code for much of scipy.sparse - his code
is underneath almost all sparse matrix libs. See [1].

[1] Tim Davis Research
[http://faculty.cse.tamu.edu/davis/research.html](http://faculty.cse.tamu.edu/davis/research.html)
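
As to advantages over scipy.sparse: one concrete difference is semirings. scipy's `@` is fixed to (+, *), while GraphBLAS can multiply over, say, (min, +), which turns a vector-matrix product into a shortest-path relaxation step. Emulating one such step in plain scipy takes a manual loop (sketch; the graph and variable names here are illustrative):

```python
import numpy as np
import scipy.sparse as sp

INF = np.inf
# Weighted edges: 0->1 (cost 5), 0->2 (cost 2), 2->1 (cost 1).
W = sp.coo_matrix(([5.0, 2.0, 1.0], ([0, 0, 2], [1, 2, 1])), shape=(3, 3))

dist = np.array([0.0, INF, INF])    # distances from source vertex 0

# One (min, +) relaxation pass, done by hand since scipy's @ is (+, *):
new = dist.copy()
for i, j, w in zip(W.row, W.col, W.data):
    new[j] = min(new[j], dist[i] + w)

print(new)    # after one pass: [0.0, 5.0, 2.0]
```

In GraphBLAS this whole loop collapses into a single vector-matrix multiply over the min-plus semiring, and the library can fuse and optimize it - which is the kind of thing scipy.sparse doesn't expose.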

~~~
enriquto
Oh, what a man! He's also the author of the legendary CHOLMOD algorithm,
probably one of the most used numerical algorithms ever, just after the FFT.

------
dlphn___xyz
very cool

