
Google Pregel: a system for large-scale graph processing - helwr
http://portal.acm.org/citation.cfm?id=1807167.1807184
======
helwr
here is a brief summary:

Pregel defines a sequence of supersteps (execution cycles) which are executed
in parallel on the entire (partitioned) graph.

The supersteps are executed by worker threads, synchronized at global
synchronization points and coordinated by a master. Pregel keeps vertices and
edges on the machine that performs the computation and uses the network only
for message passing.

Within a superstep each vertex executes the same user-defined function. There
is no message delivery between vertices during a superstep; messages sent in
one superstep are received only at the start of the next one.

There are no remote reads or shared-memory constructs like locks, mutexes and
semaphores - it is a "pure message passing model".

A vertex can modify its own state or that of its outgoing edges, receive
messages sent to it in the previous superstep, send messages to other vertices
(to be received in the next superstep), or even change the topology of the
graph.

The vertex state machine has two states, active and inactive. A vertex is
activated by a message received from a previous superstep. A vertex
deactivates itself by "voting to halt". When all vertices are deactivated and
there are no messages in transit, the algorithm terminates.

The output of a Pregel program is the set of values emitted by the vertices;
it could be a directed graph, a set of disconnected vertices, or aggregate
statistics.

The authors mention Valiant's "Bulk Synchronous Parallel" model as the main
inspiration behind this design (I've found only the ACM link:
<http://portal.acm.org/citation.cfm?id=79181>)

----------------------

PageRank in Pregel:

class PageRankVertex
    : public Vertex<double, void, double> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    if (superstep() >= 1) {
      double sum = 0;
      for (; !msgs->Done(); msgs->Next())
        sum += msgs->Value();
      *MutableValue() =
          0.15 / NumVertices() + 0.85 * sum;
    }
    if (superstep() < 30) {
      const int64 n = GetOutEdgeIterator().size();
      SendMessageToAllNeighbors(GetValue() / n);
    } else {
      VoteToHalt();
    }
  }
};

------
kiwidrew
I'd love to find a PDF of the full paper; we hear so much about MapReduce, and
this is essentially the map-reduce approach as applied to graphs.

The best reference about Pregel I've been able to find so far is a one-page
announcement, published last year:
<http://jloxim.mimuw.edu.pl/redmine/attachments/11/p6-malewicz-2.pdf>

~~~
nerme
MapReduce is already being applied to graphs.

Spectral graph theory, i.e. the construction and manipulation of graphs that
are described by matrices, has been around for quite some time. Two types of
matrix representations are typically used: adjacency matrices and Laplacian
matrices.

PageRank is an example of this. It is equivalent to finding the centrality of
a graph by computing the principal eigenvector of the adjacency matrix of the
graph.

MapReduce is used for accomplishing the difficult task of computing the needed
eigenvector of a multi-billion dimensional matrix. Instead of attempting an
exact computation of the principal eigenvector, the power method is used. The
power method is an iterative approach that converges quickly to an
approximation of the eigenvector. It comes down to repeatedly multiplying the
matrix by a vector until you reach the precision you're looking for.

The cool thing about the power method is that you don't REALLY need to get
your data into some sort of linear-algebraic representation of a matrix; you
can just use more efficient data structures to mimic matrix multiplication.

Since matrix multiplication is done a row and a column at a time, you can
split it up in a parallel manner. The map function handles the computation of
each element of the resulting vector, and the reduce function puts the vector
back together again, ready for the next iteration of the power method.

As for Pregel, I didn't really understand the description of it.

I have always had a lot of gripes with publications related to algorithms.
There are very rarely any code snippets. I feel at least that the authors
should go over their algorithms in some sort of pseudo-code.

I realize that academic papers are written for a specific audience in mind, an
audience that is used to seeing a bunch of algorithms described in linear
algebraic notation, but I feel it does a huge disservice to spreading
knowledge. I mean, I'm somewhat versed in mathematics, albeit at an undergrad
level, but I rarely know what these articles are talking about in any specific
detail, at least to the point where I could reproduce their approach in
functioning computer code.

I've even been published in ACM, and I can't follow the majority of papers in
the journal!

This isn't just applicable to computer science, rather all mathematics in
general. The notations used are so arcane that even the simplest of ideas can
seem daunting. It's probably why so many people are turned off by mathematics
early on in their academic careers.

It took me a good week of brushing up on linear algebra and reading through
various descriptions and white papers to figure out what was going on with
PageRank. It really isn't that complicated an algorithm in the end...
you're just constructing an MxM matrix whose rows and columns correspond to
web pages, getting those rows to add up to 1 while making sure no elements
are equal to 0, computing an eigenvector, and then rescaling that eigenvector
so its elements add up to 1. However, you read the Wikipedia entry on
PageRank and it is borderline indecipherable!

Why isn't there a "somewhat complex math explained in normal language with
pretty pictures and graphs" textbook or website out there? Are all
mathematicians incapable of communicating their art form to the world at
large?

~~~
ajb
Well, there's "Proofs without words"
(<http://books.google.co.uk/books?id=Kx2cjyzTIYkC>) and its sequel
(<http://books.google.co.uk/books?id=wMpwQgAACAAJ>). I'm not sure that's
exactly what you meant, though.

------
ableal
Digging around a bit, without much luck - there's the research blog post of a
year ago: <http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html>
(some comments there point to similar work.)

Snippets:

 _so far we haven't come across a type of graph or a practical graph computing
problem which is not solvable with Pregel. It computes over large graphs much
faster than alternatives, and the application programming interface is easy to
use. Implementing PageRank, for example, takes only about 15 lines of code._

 _spoiler: The seven bridges of Königsberg — inspiration for Leonhard Euler's
famous theorem that established the basics of graph theory — spanned the
Pregel river._

------
Maro
Is the PDF available for download somewhere?

~~~
gregbayer
Should be available here: <http://portal.acm.org/citation.cfm?id=1807184>

