
Show HN: Graph Processing with Postgres and GraphBLAS - michelpp
https://github.com/michelp/pggraphblas
======
methyl
Since you are mentioning Redis Graph and I'm not really into academic side of
graphs, I want to ask - how do you use it from the user perspective, who wants
to do some simple graph traversals using Cypher or some equivalent? For Redis
Graph you have some nice higher-level API, is something similar planned for
pggraphblas or outside of scope for this project?

~~~
michelpp
There is work being done by a standards committee on a common high level
Cypher like language:

[https://www.gqlstandards.org](https://www.gqlstandards.org)

Long term that's maybe a promising target. That's out of scope of pggraphblas
now, but like RedisGraph uses GraphBLAS internally, one could implement a
Cypher or GQL look alike using the foundational bits of pggraphblas.

Edit: and to answer your actual question, if Cypher, or SQL/GQL for that
matter, is "high-level" user experience then the GraphBLAS API experience is
sort of aiming for the "mid-level". Still quite technical, but more powerful
tools than for "low-level" which would be writing graph algorithms by hand in
some procedural language.

One of the key goals of GraphBLAS is this sort of meeting of the minds between
the hardware experts and the graph problem solvers. This one page poster kind
of sums that up nicely:

[https://resources.sei.cmu.edu/asset_files/Poster/2016_020_00...](https://resources.sei.cmu.edu/asset_files/Poster/2016_020_001_484268.pdf)

------
sg0
I am following your discussions on the GraphBLAS email list. Are you
interested in the distributed-memory API of GraphBLAS (for clusters, typically
over MPI) or can you make use of it if it were available (you may know,
CombBLAS already has a distributed-memory implementation)?

~~~
michelpp
Yes, I have only reviewed ComBLAS's documentation, but would eventually like
to dig into it deeper. A thought I had for distributed graphs would be
something like support on top of a distributed store like citus, where shards
hold subgraphs. This may be an interesting way to sidestep Postgres' 1GB limit
on varlena objects as well.

Edit: I'd forgot to ask, do you have some ideas on the subject? I'm all ears.

~~~
sg0
Thanks for the clarification. I basically wanted to know if database systems
could make use of existing distributed-memory HPC programming models, and it
seems that there is a possibility, however engineering efforts might be
significant. I really have no experience with database engineering, and I am
not sure if MPI and Citus will play out well. I hope you are considering
submitting a paper/poster at HPEC'19 for your current work ([http://www.ieee-
hpec.org/](http://www.ieee-hpec.org/)), I think that's a good venue to engage
with the GraphBLAS community in person.

~~~
michelpp
Good idea! Hopefully I can put something together.

------
sandGorgon
Has anyone done a similar thing with Spark and Postgresql for large graphs ?
I'm curious to know how you write and structure your algorithms, given that
graph partitioning is quite tricky (and not possible for a lot of algorithms).

I'm guessing Google uses a variant of this at scale - but not sure if their
label propagation algorithms can be implemented on something like Spark and
Postgres.

------
sntran
A slightly off topic question regarding the mxm (matrix times matrix), mxv
(matrix times vector), and vxm (vector times matrix) functions. AFAIK,
Postgres supports function overloading. With the new types of matrix and
vector, would it be easier to have a single function for multiplication, than
three separate ones?

Also, can we overload operators in Postgres?

~~~
michelpp
The * operator can be overloaded, and it is already done s so for the mxm, mxv
and vxm combinations. The catch is, all these AxB functions have more
arguments than just their operands (mask, accumulator, semiring, descriptors).
So you can `a_matrix * b_vector` and get "classical" matrix multiplication,
but if you want accumulation, masking, a non-default semiring etc, you have to
use the functional notation.

------
jarym
I found this project on Github a few weeks ago and have been quietly following
it.

It looks really cool but I didn’t get from the readme how I would best take my
relational tables and make a set of matrices from them.

I’ll take another look this weekend

~~~
michelpp
Matrices are constructed from arrays of rows, columns, and values. The
simplest way to pivot a table into a matrix is to use array_agg()

    
    
      select matrix(array_agg(row), array_agg(col), array_agg(val))) from table;
    

For the inverse operation there are set returning functions called
'matrix_elements_<type>(m)' which return the graph elements as row tuples.

From a table management standpoint, matrices are varlena objects much like
arrays. Triggers can be used to call set_element() to keep matrices and tables
sync'ed, although like arrays, this has a high re-serialization cost for large
arrays. Because of this it's best to modify matrices in bulk operations.

