You could have said the same thing about generalized distributed computing as well. Much like with MapReduce, the trick is to recognize that there is an 80% solution that works quite well (based around BSP):
The trick is not to build a generalized graph operator solution. It's to have a specialized graph operator solution, and then see how many solutions you can fit to the specialized graph operators. Turns out you can do a lot with a little.
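To make that concrete: the one narrow operator these BSP systems expose is a per-vertex compute() run in lockstep supersteps, and a surprising number of algorithms squeeze into it. Below is a rough from-scratch Python sketch of single-source shortest paths in that style — the class, method, and driver names are mine for illustration, not any real framework's API:

    import math

    class SSSPVertex:
        """Single-source shortest paths, expressed through the one
        specialized operator these systems give you: compute()."""
        def __init__(self, vid, out_edges, is_source=False):
            self.id = vid
            self.out_edges = out_edges            # {neighbor_id: weight}
            self.value = 0.0 if is_source else math.inf
            self.changed = is_source              # source must announce itself

        def compute(self, messages, send):
            # One superstep: fold incoming distances into local state...
            best = min(messages, default=math.inf)
            if best < self.value:
                self.value = best
                self.changed = True
            # ...and push improvements along out-edges for the next superstep.
            if self.changed:
                for nbr, w in self.out_edges.items():
                    send(nbr, self.value + w)
            self.changed = False   # vote to halt until a better path shows up

    def run_bsp(vertices, max_supersteps=50):
        # Synchronous driver: a global barrier between supersteps, nothing
        # else. (Real systems also skip vertices with no pending messages;
        # this sketch just calls everyone for simplicity.)
        inbox = {vid: [] for vid in vertices}
        for _ in range(max_supersteps):
            outbox = {vid: [] for vid in vertices}
            for v in vertices.values():
                v.compute(inbox[v.id], lambda dst, msg: outbox[dst].append(msg))
            if not any(outbox.values()):
                break                             # no messages in flight: done
            inbox = outbox
        return {vid: v.value for vid, v in vertices.items()}

    g = {
        "a": SSSPVertex("a", {"b": 1.0, "c": 4.0}, is_source=True),
        "b": SSSPVertex("b", {"c": 1.0}),
        "c": SSSPVertex("c", {}),
    }
    print(run_bsp(g))   # {'a': 0.0, 'b': 1.0, 'c': 2.0}

The only coordination is the barrier between supersteps, which is exactly why it scales — and exactly why anything that doesn't decompose into local vertex updates fights the model.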
Approaches like Pregel have been the canonical way of dealing with large graph problems for many years. Unfortunately, most interesting graph analytic problems do not fit into that model because the "graph-like" aspect is still limited to problem sizes that fit conventional algorithms.
For example, if you can reduce a trillion-edge graph analysis problem into a billion-edge graph plus some other stuff (usually materialized document structures) then you can fit that into something like Pregel. That is how almost all real-world graph analysis is done today.
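Concretely, one common flavor of that reduction looks roughly like this (a deliberately simplified Python sketch of my own, not anything Pregel itself specifies): leaf vertices flagged as "documents" get folded into materialized attributes on their lone neighbor, so they stop being edges the graph engine ever sees.

    from collections import defaultdict

    def reduce_graph(edges, is_document):
        """edges: iterable of (src, dst) pairs; is_document: vid -> bool."""
        adjacency = defaultdict(set)
        for src, dst in edges:
            adjacency[src].add(dst)
            adjacency[dst].add(src)

        kept_edges = []
        documents = defaultdict(list)             # vid -> folded-in leaves
        for src, dst in edges:
            # A document-flagged leaf with exactly one neighbor stops being
            # an edge; it's materialized onto that neighbor instead.
            if is_document(dst) and len(adjacency[dst]) == 1:
                documents[src].append(dst)
            elif is_document(src) and len(adjacency[src]) == 1:
                documents[dst].append(src)
            else:
                kept_edges.append((src, dst))
        return kept_edges, documents

    edges = [("u1", "u2"), ("u1", "doc1"), ("u2", "doc2"), ("u2", "doc3")]
    kept, docs = reduce_graph(edges, lambda v: v.startswith("doc"))
    print(kept)         # [('u1', 'u2')] -- the only edge the engine sees
    print(dict(docs))   # {'u1': ['doc1'], 'u2': ['doc2', 'doc3']}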
But by doing so, you've lost the ability to do graph analysis on the other ~trillion edges for the sake of tractability in a narrow case. You can't do relationship analysis across the attached documents. There are many, many graph analytic problems that require a true graph that is orders of magnitude larger than what can be partitioned even after accounting for graph reduction techniques such as those used in Pregel.
The Holy Grail is still the ability to run ad hoc graph analytic queries against a massively distributed graph representation. There are no shortcuts around this for many interesting applications. Right now, we are limited to mere billions of edges for most practical purposes and all of the hacks and workarounds are designed to keep the number of true edges to around this number even when the data model is much larger.
> Unfortunately, most interesting graph analytic problems do not fit into that model because the "graph-like" aspect is still limited to problem sizes that fit conventional algorithms.
"interesting" != "useful"
Again, the same thing has been true of distributed computing in general. MapReduce & Hadoop are pretty much the antithesis of where distributed computing research had been heading for the last ~20 years, because MapReduce solves what is nearly an "embarrassingly parallel" problem.
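To see why it's "nearly embarrassingly parallel": a toy word count (sketched from scratch in Python below, not Hadoop's actual API) is the whole model. Every map call is independent of every other, and the only coordination in the entire job is the shuffle that groups values by key.

    from collections import defaultdict
    from itertools import chain

    def map_fn(document):
        # Runs independently on each input split -- no shared state at all.
        return [(word, 1) for word in document.split()]

    def reduce_fn(word, counts):
        return word, sum(counts)

    def mapreduce(documents):
        groups = defaultdict(list)                # the shuffle phase
        for key, value in chain.from_iterable(map(map_fn, documents)):
            groups[key].append(value)
        return dict(reduce_fn(k, vs) for k, vs in groups.items())

    print(mapreduce(["the trick is", "the trick works"]))
    # {'the': 2, 'trick': 2, 'is': 1, 'works': 1}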
> There are many, many graph analytic problems that require a true graph that is orders of magnitude larger than what can be partitioned even after accounting for graph reduction techniques such as those used in Pregel.
There are even more distributed computing algorithms that don't fit into MapReduce terribly well (and really, it's not that the algorithms don't work with MapReduce/Pregel, it's that they don't work well), but it is still quite useful.
Turns out, the reason it's the Holy Grail is that it is just flat-out hard to do (provably so). While what Titan/Pregel do isn't nearly as difficult, it is still surprisingly hard to do at massive scale, so just doing the simple stuff they do is quite useful and game-changing.
http://kowshik.github.com/JPregel/pregel_paper.pdf
http://googleresearch.blogspot.com/2009/06/large-scale-graph...