Log(Graph): A Near-Optimal High-Performance Graph Representation (2018) [pdf] (mit.edu)
180 points by espeed 4 months ago | 12 comments



Most of the paper is quite good and the results seem promising. One of the simpler ideas is to represent the set of offsets as a bitvector with a select structure, which hadn't occurred to me and which I think is a nice idea. It works under the assumption that the difference between consecutive offsets is typically smaller than the bit-width of an offset, which I think could reasonably be true in many cases.
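To make the idea concrete, here's a toy sketch (my own illustration, not the paper's code) of storing a sorted offset array as a bitvector with a select structure. Offset o_i is encoded by setting bit o_i, so the space used is roughly max_offset bits rather than n machine words, and select(i) recovers the i-th offset:

```python
class OffsetBitvector:
    """Stores a sorted list of offsets as a bitvector plus a select index."""

    def __init__(self, offsets):
        # Set bit o for every offset o; total space ~ max(offsets) bits.
        self.bits = [0] * (offsets[-1] + 1)
        for o in offsets:
            self.bits[o] = 1
        # Naive select index: positions of all set bits. A real select
        # structure answers this query with only o(n) extra bits; this
        # list is just for clarity.
        self._select = [pos for pos, bit in enumerate(self.bits) if bit]

    def select(self, i):
        """Return the position of the i-th set bit, i.e. the i-th offset."""
        return self._select[i]

offsets = [0, 3, 7, 12, 20]
bv = OffsetBitvector(offsets)
assert [bv.select(i) for i in range(len(offsets))] == offsets
```

This pays off exactly when offsets are dense relative to the word size, which matches the assumption above.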

Other approaches involve permuting vertex IDs (which can potentially be quite slow), and that makes me a bit sad, although I'm not really sure why.

The mathematical notation is ugly and mangled in some places, and in a few places wrong, but this is sadly common in computer science papers, so I can't really complain much about it.

The diagram on page 2 is completely bonkers. I sympathise a bit, in that it can be amusing to create crazy graphics in TikZ, but this one doesn't really serve its purpose (explaining what the sections of the paper are) at all.


The paper claims the code is available online. The page, however, has shown "Coming soon..." for quite some time: https://spcl.inf.ethz.ch/Research/Performance/LogGraph/

Is it expected to land in the near future?


Maybe it won't be available until next month after it's presented?


This paper on succinct graph data structures will be presented at PACT (http://www.pactconf.org) next month.

To put it in the broader context of relational database programming, and why it's significant now for software developers in general: here are some thoughts from a thread I posted yesterday [0]. They concern the convergence of ideas that have been in the works for 10+ years on the graph database front, and the forces that have aligned in hardware and elsewhere to bring about the architectural paradigm shift that's to come...

[0] The design and implementation of modern column-oriented database systems https://news.ycombinator.com/item?id=18076547

   --//--
This upcoming paper [...] is more data model than database, but I think you'll find it interesting.

For something more database specific, check out the new RedisGraph module [1] implemented with GraphBLAS [2] by the venerable Tim Davis [3] himself, who as you know implements the underlying sparse matrix algos used in everything from MATLAB to Google Maps...

RedisGraph in the Language of Linear Algebra with GraphBLAS, co-presented by RedisLabs and Tim Davis [video]

https://www.youtube.com/watch?v=xnez6tloNSQ

GraphBLAS is not Redis-specific, but the Redis data structures and new modular design made it an ideal candidate for the first popular database implementation.

This is significant because GraphBLAS is more than 10 years in the making, the culmination of the initial D4M matrix model design by Jeremy Kepner [4] and his team at MIT Lincoln Laboratory Supercomputing Center.

And for the last ~5 years the software model has been designed in collaboration with hardware teams at Intel, NVIDIA, IBM, and the labs [5] to make chips and architectures optimized for these new matrix models and capable of exascale.

It all came together this summer with the release of GraphBLAS 1.0 [6]. Now that GPU and TPU accelerators are populating the data centers and linear algebra has come into vogue, hopefully enough software engineers will be ready with the background understanding and a working mental model for the underpinning architectural paradigm shift [7].

[1] RedisGraph https://oss.redislabs.com/redisgraph/

[2] GraphBLAS http://graphblas.org

[3] Tim Davis http://faculty.cse.tamu.edu/davis/

[4] Jeremy Kepner http://www.mit.edu/~kepner/

[5] GraphBLAS: Building Blocks For High Performance Graph Analytics https://crd.lbl.gov/news-and-publications/news/2017/graphbla...

[6] Graph algorithms via SuiteSparse:GraphBLAS: triangle counting and K-truss [pdf] http://faculty.cse.tamu.edu/davis/GraphBLAS/HPEC18/Davis_HPE...

[7] David Patterson Says It’s Time for New Computer Architectures and Languages https://news.ycombinator.com/item?id=18009581


> hopefully enough software engineers will be ready with the background understanding and a working mental model for the underpinning architectural paradigm shift

Good luck with that. This seems like a mess of buzzwords. GraphBLAS, Redis, RedisGraph, paradigm shift, exascale. I have no idea what it is supposed to accomplish, let alone what the advantage is.

Is this just some sort of reinvention of linear algebra mixed with a DAG?

Why did someone write an entire paper about counting triangles?


This comment breaks the site guidelines, which ask: "Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something." Please do better than this when posting here.

https://news.ycombinator.com/newsguidelines.html


Hah, "just" a reinvention of graph analysis with linear algebra. That's a "paradigm shift." Check out the performance numbers in that YouTube video for RedisGraph versus other graph databases. Reformulating graph algorithms as algebraic operations means a huge step up in abstraction, optimization, and performance, and GraphBLAS is what enables that. It's the difference between having TensorFlow and having to rewrite stochastic gradient descent for every deep-net architecture you want to test.
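As a minimal sketch of what "graph algorithms as algebra" means (my own toy illustration, not GraphBLAS code): breadth-first search becomes repeated vector-matrix products over the boolean (OR, AND) semiring, with the visited set acting as a mask. A real GraphBLAS implementation would use sparse matrices and fused, masked operations:

```python
def bfs_levels(adj, source):
    """BFS via boolean-semiring vector-matrix products on a dense
    adjacency matrix. Returns the BFS level of each vertex (-1 if
    unreachable from source)."""
    n = len(adj)
    frontier = [False] * n
    frontier[source] = True
    visited = [False] * n
    level = [-1] * n
    depth = 0
    while any(frontier):
        for v in range(n):
            if frontier[v]:
                visited[v] = True
                level[v] = depth
        # One BFS step: next = frontier * A over the (OR, AND) semiring,
        # masked by the complement of the visited set.
        nxt = [False] * n
        for u in range(n):
            if frontier[u]:
                for v in range(n):
                    if adj[u][v] and not visited[v]:
                        nxt[v] = True
        frontier = nxt
        depth += 1
    return level

# Path graph 0-1-2-3.
A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
assert bfs_levels(A, 0) == [0, 1, 2, 3]
```

The point is the step up in abstraction: the traversal is one masked algebraic operation per level, which the runtime is free to parallelize or distribute however it likes.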

For why somebody would write an entire paper about counting triangles, check out the YouTube video above; it's the answer to one of the first questions from the audience.

It's ok not to care about graphs, but why come in to a thread about graphs and say "what's the point?" Graphs are important for some areas, and not for others.


I'm not asking the point of graphs; I'm asking what is actually new. Directed acyclic graphs have been around for ages, and combining them with linear algebra is not new.

Instead of actually explaining anything that is new, original, unique, or just a general step forward, all the replies seem to be saying how silly I am for asking and how smart someone is for counting triangles.


Take any single sentence of the first paragraph above as an explanation of what's cool about the comment and the OP. Any one of them, if true, would make this material far more worthy of hacker interest than the typical HN front-page article. That the post above delivers on all of them makes it one of the better contributions to the site in the past week. I'm still making my way through the material, and I consider it without a doubt the best information I've learned all week.


Yes, graphs have been around for ages. They're as old as trees! ;-)

And you're right: first-year intro courses present graphs as matrices; it's usually the first graph representation you learn.

However, historically that's not how graphs have been represented in commercial or open-source databases, due to the computational complexity and the impracticality of supporting different architectures. Many PhD research papers over the years have been devoted to creating one-off implementations for the new hardware of the day.

For an overview of GraphBLAS in the context of Heterogeneous High-Performance Computing (HHPC) systems running on NVIDIA GPUs and Intel Xeon Phis, see the 2015 talk Scott McMillan (https://insights.sei.cmu.edu/author/scott-mcmillan/) gave at the CMU Software Engineering Institute:

Graph Algorithms on Future Architectures [video] https://www.youtube.com/watch?v=-sIdS4cz7-4

The figure Scott McMillan cites in the CMU video is that each new hardware architecture requires about 10,000 lines of code for a BFS implementation, and that's just one algorithm. The GraphBLAS standard makes this problem go away.

Furthermore, general on-demand access to GPU and TPU accelerators in cloud data centers has only just become a thing. GraphBLAS will make it possible for non-PhDs to run graph algos on clusters of accelerators in the cloud at supercomputer speeds.

https://cloud.google.com/gpu/

https://cloud.google.com/tpu/

Having the power of a Graph 500 (https://graph500.org) supercomputer at your fingertips and the ability to tap into that power on demand...well that's new! :-) And kinda crazy cool too, don't ya think?

P.S. A few years back, Jeremy Kepner did a mini-course on D4M (the precursor to GraphBLAS). The videos and material are on MIT OCW...

MIT D4M: Mathematics of Big Data and Machine Learning [video] https://www.youtube.com/watch?v=iCAZLl6nq4c&list=PLUl4u3cNGP...

Discussion: https://news.ycombinator.com/item?id=18105931


> I have no idea what it is supposed to accomplish, let alone what the advantage is

Did you read/view any of the references?

I tend to cite everything so that you have direct access to the source material for reference in case the context isn't clear.

This is not a "mess of buzzwords." It's how we move beyond the limits of Moore's Law. A lot of smart people have been working on various aspects of this for a long time. It's a big deal because it's a big change in how we model data, write algorithms, and think about computation in general, from the design of low-level hardware all the way up the stack to application design. It will behoove you to understand the change that's coming because it's what's next, and it's already here.

Re: your question on "Why did someone write an entire paper about counting triangles?"

Prof Tim Davis wrote the paper. He is one of the world's leading experts on sparse matrix implementations, and after the GraphBLAS Steering Committee finalized the 1.0 spec, they asked him to create the reference C implementation. Triangle counting is a common graph algorithm so it provides a familiar reference point when illustrating and comparing the characteristics of an implementation. Dr. Davis presented the paper this week at the HPEC 2018 conference (http://www.ieee-hpec.org), and in the paper's abstract he articulates precisely why:

Graph algorithms via SuiteSparse:GraphBLAS: triangle counting and K-truss

Abstract — SuiteSparse:GraphBLAS is a full implementation of the GraphBLAS standard, which defines a set of sparse matrix operations on an extended algebra of semirings using an almost unlimited variety of operators and types. When applied to sparse adjacency matrices, these algebraic operations are equivalent to computations on graphs. GraphBLAS provides a powerful and expressive framework for creating graph algorithms based on the elegant mathematics of sparse matrix operations on a semiring. To illustrate GraphBLAS, two graph algorithms are constructed in GraphBLAS and compared with efficient implementations without GraphBLAS: triangle counting and constructing the k-truss of a graph.

http://faculty.cse.tamu.edu/davis/GraphBLAS/HPEC18/Davis_HPE...
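To see why triangle counting is a natural showcase for matrix-based graph algorithms, here's a toy sketch (my own illustration, not the paper's method): for an undirected simple graph with adjacency matrix A, the number of triangles is trace(A^3) / 6, since each triangle yields a closed walk of length 3 starting at each of its 3 vertices, in 2 directions:

```python
def triangle_count(adj):
    """Count triangles in an undirected simple graph given its dense
    adjacency matrix, via trace(A^3) / 6."""
    n = len(adj)
    # A^2 by ordinary matrix multiplication.
    a2 = [[sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # trace(A^3) = sum over i of the (i, i) entry of A^2 * A.
    trace_a3 = sum(sum(a2[i][k] * adj[k][i] for k in range(n))
                   for i in range(n))
    return trace_a3 // 6

# K4, the complete graph on 4 vertices, contains 4 triangles.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
assert triangle_count(K4) == 4
```

Production implementations like the one in the paper use sparse, masked operations rather than dense cubic multiplication, but the algorithm stays a one-liner in matrix language.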

Watch the RedisGraph video for a general overview of how it works, but the big picture is this: data used in machine learning, AI, and relational databases is now being modeled in the same way -- as vectors/matrices/tensors -- to take advantage of modern CPU/GPU/TPU hardware accelerators optimized for vector/matrix/tensor multiply-add operations. This means we can run local and global algorithms the same way, using the same code and the same data model, and return queries in real time, executed in parallel, locally or distributed; it doesn't matter how the data is partitioned, because the computation model is the same. Not only is this fast and compact, it unifies and simplifies things.


The article was selected for discussion in our reading group.

Here you can find some very interesting remarks and important questions that were raised:

https://papers-gamma.link/paper/112/

Sorry, but I am too lazy to copy-paste them here :D



