
Mosaic: processing a trillion-edge graph on a single machine - contingencies
https://blog.acolyer.org/2017/05/30/mosaic-processing-a-trillion-edge-graph-on-a-single-machine/
======
jandrewrogers
This is really well executed. I am always surprised that more graph engines
are not designed using the internal architecture here. Similar designs
actually work well in more distributed environments. The design is close to a
graph engine I wrote for a supercomputer in 2009, particularly how the
parallelization and scale-out work, which performed well. The overhead of
networking is significant but can be partially mitigated with aggressive
latency hiding techniques.

The article demonstrates how efficiently a very large graph can be processed
on a single machine but the architecture has analogues that work nicely on
large compute clusters too. It is just a good way to process graphs.

~~~
jacquesm
Do you have any papers about your work online? That sounds super interesting.

~~~
jandrewrogers
Unfortunately not. I've recently started working on a series of articles and
papers to close that gap though.

------
jonathanstrange
Stupid question by someone who finds this very interesting but lacks some
background information: What kind of processing are we talking about? What do
you do with the graph? Searching? Finding paths? Mutating contents? Changing
the graph structure? Aggregating data about the graph?

The purpose seems to be presupposed, so I guess it's some common use case that
I should know.

~~~
faizshah
The major motivation for Giraph (Facebook) and GraphJet (Twitter) was the
ability to do graph based recommendations such as Friend of Friend scoring
(Facebook) or Who To Follow (Twitter). Basically looking at the followers or
friends of each of your followers or friends and using some kind of scoring
algorithm to pick which users you will most likely be interested in.

Another common example is doing some kind of clustering (community detection)
of vertices like for example labeling the users on the social graph by likely
political affiliation or likely hobbies. These labels could be used as
features for advertising recommendations later.

------
ksec
This begs the question if this scales linearly. The 1 Trillion Edge Graph on
Xeon Phi ( Knight Corners ) and Intel SSD. How well would it do on Knight
landing ( The Current Gen )? Or Knight Hill, next gen with 10nm. And with
Optane Memory instead of SSD.

Could we handle Facebook's edge graph on a single machine?

~~~
mfukar
Good question - sounds like you have a paper to write.

------
amelius
> But, with Mosaic, we are able to process large graphs, even proportional to
> Facebook’s graph, on a single machine.

What exactly does the word "proportional" mean here?

~~~
faizshah
On the order of one trillion edges like Giraph.

