
MOSAIC: Processing a Trillion-Edge Graph on a Single Machine [pdf] - blopeur
https://taesoo.gtisc.gatech.edu/pubs/2017/maass:mosaic.pdf
======
eternalban
Section 2 of OP points us to 5.5 regarding "load balancing", after reminding
us that "Optimal graph partitioning is an NP-complete problem".

In 5.5 we read about _single node_ internal processor load balancing and also
the happy comment in the following 5.6 ("Fault Tolerance") that "in MOSAIC,
due to its single-machine design, handling fault tolerance is as simple as
checkpointing the intermediate state data (i.e., vertex array)."

I would be more interested to hear about processing a "Trillion-Edge" graph
distributed over "Two" machines: Partitioning a general graph is not as simple
as partitioning k/v buckets or relational graphs; and, 10^12 edge order sounds
like a lot until you recognize the 'edge amplification' fact of capturing a
logic domain using the weak semantics of named associations.

------
blopeur
Slide deck : [https://taesoo.gtisc.gatech.edu/pubs/2017/maass:mosaic-
slide...](https://taesoo.gtisc.gatech.edu/pubs/2017/maass:mosaic-slides.pdf)

