
How to Partition a Billion-Node Graph [pdf] - espeed
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/Partition.pdf
======
mattkrause
That looks like a pre-print or review copy, and is missing the authors' names
and other citation information.

The paper is: L. Wang, Y. Xiao, B. Shao and H. Wang, "How to partition a
billion-node graph," _2014 IEEE 30th International Conference on Data
Engineering_ , Chicago, IL, 2014, pp. 568-579. doi: 10.1109/ICDE.2014.6816682

Here's the (potentially paywalled) link to IEEE Xplore:
[http://ieeexplore.ieee.org/document/6816682/](http://ieeexplore.ieee.org/document/6816682/)

~~~
espeed
Here's a direct link to the ICDE Conference version:
[https://www.graphengine.io/downloads/papers/Trinity.Partitio...](https://www.graphengine.io/downloads/papers/Trinity.Partition.pdf)

~~~
mattkrause
Even better!

Mods, maybe swap the links?

------
espeed
This paper by Fabio Checconi and Fabrizio Petrini of IBM Watson is a rare look
into design details:

Traversing Trillions of Edges in Real-time: Graph Exploration on Large-scale
Parallel Machines

[http://www.odbms.org/wp-content/uploads/2014/05/g500-ipdps14...](http://www.odbms.org/wp-content/uploads/2014/05/g500-ipdps14.pdf)

------
mozumder
OK now I'd like to see this run on a VLSI ASIC place-and-route system.

Originally these graph partitioning algorithms were largely designed to solve
that problem.

------
guard-of-terra
Can't download it.

What's the problem? Machines with 32G RAM are commonplace, you can probably
just fit it all into memory. The important bits at least.

~~~
koala_man
The examples mentioned are the web (50B nodes, 1000B links) and the Facebook
social graph (0.8B nodes, 100B links).

You might fit quite a bit into a beefy 3TB-RAM server, but that might not
offer the best RAM-to-CPU ratio for graph algorithms.
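As a rough back-of-envelope check (my arithmetic, using only the node/edge counts quoted above): a compressed sparse row (CSR) adjacency needs one integer per edge plus one offset per node, so the Facebook graph squeezes into a big single box while the web graph does not.

```python
# Back-of-envelope CSR memory estimate for the graphs mentioned above.
# Assumptions (mine): 4-byte node ids when |V| < 2^32, else 8-byte;
# 8-byte offsets; adjacency only, no weights or metadata.

def csr_bytes(num_nodes, num_edges):
    id_bytes = 4 if num_nodes < 2**32 else 8
    return num_edges * id_bytes + (num_nodes + 1) * 8

GB = 10**9

# Facebook social graph: 0.8B nodes, 100B links -> 4-byte ids suffice
fb = csr_bytes(int(0.8e9), int(100e9))
print(f"Facebook: ~{fb / GB:.0f} GB")  # ~406 GB -- fits in a 3TB server

# Web graph: 50B nodes, 1000B links -> needs 8-byte ids
web = csr_bytes(int(50e9), int(1000e9))
print(f"Web:      ~{web / GB:.0f} GB")  # ~8400 GB -- does not fit
```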

~~~
moab
You can compute non-trivial statistics about the entire web-graph (roughly
|V|=3B, |E|=100B) on a single high-end server these days. Speaking from
personal experience, a machine with > 64 cores and 1TB of RAM can compute
connectivity, MSTs, and partition this graph in minutes. Are there graphs out
there (with the exception of Facebook and Google's private graphs) that are so
large as to be intractable without resorting to external-memory/distributed
computing?
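For illustration, the connectivity computation mentioned above boils down to a union-find (disjoint-set) pass over the edge list; this toy single-threaded sketch (mine, not moab's code) shows the primitive, which parallel in-memory implementations then scale to billions of edges.

```python
# Toy sketch of connectivity via union-find with path halving.
# A real billion-edge run would use a parallel, cache-friendly
# implementation, but the core operation is the same.

def connected_components(num_nodes, edges):
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # union the two components

    return len({find(x) for x in range(num_nodes)})

# Two components: {0, 1, 2} and {3, 4}
print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))  # 2
```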

~~~
serbrech
LinkedIn + Outlook + Dynamics?

~~~
moab
Sorry, I should clarify: publicly accessible graphs. The hyperlink graphs are
the largest I've found.

[http://webdatacommons.org/hyperlinkgraph/index.html#toc1](http://webdatacommons.org/hyperlinkgraph/index.html#toc1)

~~~
espeed
Also see [https://snap.stanford.edu/data/](https://snap.stanford.edu/data/)

~~~
mattkrause
Extrapolating from a few data sets at that link, the road network for the
entire US is probably on the order of 100M nodes and around 250-500M edges.

I would imagine real biological data isn't quite there yet, but it's certainly
not for want of trying. Simulations might get pretty close. Blue Brain was
aiming for 100M neurons and (presumably) a few orders of magnitude more
connections and I think there's another simulation floating around with 10^11
neurons (but presumably less biophysical accuracy).

