

Computing shortest distances incrementally with Spark - wak
http://insightdataengineering.com/blog/incr-short-dist-graphx/

======
bitL
Yikes, I had to stop when the author mentioned MapReduce for computing
shortest distance. That's one of the problems for which the MapReduce
approach is extraordinarily bad.

~~~
ignoramous
A better approach at the scale being discussed in the article would be to...?

~~~
karussell
What does 'scale' mean :) ? Solving graph problems is best done in-memory
on one big machine (RAM!); otherwise you are at least an order of magnitude
slower if you try to distribute. But if you really have no choice, I would
look at whether one of the graph databases has a good distributed model. My
gut feeling tells me that even a bad approach there is faster than Spark ...
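
For scale (no pun intended): single-machine shortest paths really is just a few lines of heap-based Dijkstra. A minimal sketch in plain Python, with a made-up toy graph:

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest distances over an in-memory adjacency dict.

    graph: {node: [(neighbor, weight), ...]}
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# hypothetical toy graph
g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
print(dijkstra(g, "a"))  # {'a': 0, 'b': 1, 'c': 2}
```

As long as the adjacency data fits in RAM, this runs with no network shuffles at all, which is the point of the comparison.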

~~~
ignoramous
I guess Titan DB would fit the bill:
[http://thinkaurelius.github.io/titan/](http://thinkaurelius.github.io/titan/)

Facebook has blogged about scaling Apache Giraph to an insane number of
vertices/edges.

------
ddrum001
I don't think you have a choice: you'll have to use MapReduce if the data is
too big to fit into memory. I believe that's what Facebook and Google do:

[https://www.facebook.com/notes/facebook-engineering/scaling-...](https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920)

[http://googleresearch.blogspot.com/2009/06/large-scale-graph...](http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html)

~~~
bitL
Your 2nd link describes Pregel, Google's distributed graph-processing system
built specifically for these kinds of tasks.

They were using MapReduce prior to that, but it was a cascading mess.
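
For anyone curious why Pregel fits better than MapReduce here: it's a vertex-centric superstep loop where only vertices whose distance improved stay active, instead of rescanning the whole edge set every iteration. A single-process sketch of that model (toy graph, obviously not Google's implementation):

```python
INF = float("inf")

def pregel_sssp(edges, source):
    """Pregel-style single-source shortest paths.

    Each superstep: active vertices send (dist + weight) to their
    neighbors; a vertex updates and re-activates only if it received
    a smaller distance. Terminates when no vertex is active.
    edges: {node: [(neighbor, weight), ...]}
    """
    dist = {v: INF for v in edges}
    dist[source] = 0
    active = {source}
    while active:
        # message phase: gather the best candidate distance per target
        messages = {}
        for u in active:
            for v, w in edges[u]:
                cand = dist[u] + w
                if cand < messages.get(v, INF):
                    messages[v] = cand
        # compute phase: adopt improvements, re-activate changed vertices
        active = set()
        for v, cand in messages.items():
            if cand < dist[v]:
                dist[v] = cand
                active.add(v)
    return dist

# hypothetical toy graph
g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
print(pregel_sssp(g, "a"))  # {'a': 0, 'b': 1, 'c': 2}
```

In the real system the message phase is the only step that crosses machines, which is what makes the model distributable.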

------
karussell
If you have a hammer everything looks like a ...

~~~
jklein11
unicorn?

