

Large-scale graph computing at Google - pfedor
http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html

======
rw
> "In Pregel, programs are expressed as a sequence of iterations. In each
> iteration, a vertex can, independently of other vertices, receive messages
> sent to it in the previous iteration, send messages to other vertices,
> modify its own and its outgoing edges' states, and mutate the graph's
> topology [...]"

This sounds _very_ similar to how neural networks are updated through time.
Could this be used to easily simulate biological cognition? Am I missing
something? And, scalability isn't a problem:

> "Currently, Pregel scales to billions of vertices and edges, but this limit
> will keep expanding."

The human brain only has about 100 billion neurons.
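The superstep model in the quote is easy to sketch. Below is a toy, single-process simulation of a Pregel-style computation (all names and the in-memory graph representation are my own invention, no partitioning or fault tolerance): every vertex finds the maximum value in its connected component by receiving messages from the previous iteration and forwarding the largest value it has seen.

```python
def pregel_max(graph, values):
    """graph: {vertex: [out-neighbors]}, values: {vertex: number}."""
    values = dict(values)
    inbox = {v: [] for v in graph}
    # Superstep 0: every vertex is active and announces its value.
    for v in graph:
        for w in graph[v]:
            inbox[w].append(values[v])
    # Run supersteps until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, msgs in inbox.items():
            if not msgs:
                continue  # a vertex with no incoming messages stays halted
            new_val = max(msgs)
            if new_val > values[v]:
                values[v] = new_val
                for w in graph[v]:  # forward the larger value along out-edges
                    outbox[w].append(new_val)
        inbox = outbox
    return values

# On an undirected triangle-free chain a - b - c, every vertex ends at 3:
result = pregel_max({'a': ['b'], 'b': ['a', 'c'], 'c': ['b']},
                    {'a': 3, 'b': 1, 'c': 2})
```

The point of the real system is that the per-vertex function is the only thing the programmer writes; the framework handles distribution of the inboxes.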

~~~
tricky
I would love to implement a graph like this where each node used Bayesian
inference to act on messages received from connected vertices.

My gut says if you could get the system's state to settle to an equilibrium,
it could react to changes in external signals in a probabilistic way and
learn.

Instead I'm busy working on an iphone app...

~~~
mattj
Check out en.wikipedia.org/wiki/Junction_tree_algorithm. Not a great article,
but the referenced literature is ok.

Don't have much time to elaborate at the moment, but look up the "junction
tree algorithm" - it's a way of performing inference in graph-structured
statistical models. You think of edges as relationships between random
variables (which are the nodes), and have the nodes communicate with each
other until all the signals have propagated. That makes inference
straightforward, though still exponential in the worst case.
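The "nodes communicate until all signals have propagated" idea is easiest to see in the tree case, where the junction tree reduces to the graph itself and message passing is cheap. Here is a minimal sum-product sketch on a 3-variable binary chain x1 - x2 - x3 (the potentials and variable names are invented for illustration): the marginal of the middle variable is the product of one message from each end.

```python
def normalize(p):
    s = sum(p)
    return [x / s for x in p]

# Pairwise "agreement" potential psi[a][b] shared by both edges.
psi = [[0.9, 0.1],
       [0.1, 0.9]]

# Unary evidence on the two endpoints x1 and x3.
phi1 = [0.8, 0.2]
phi3 = [0.3, 0.7]

# Message from x1 to x2: sum over x1 of phi1(x1) * psi(x1, x2).
m12 = [sum(phi1[a] * psi[a][b] for a in range(2)) for b in range(2)]
# Message from x3 to x2 (psi is symmetric, so the same form applies).
m32 = [sum(phi3[c] * psi[c][b] for c in range(2)) for b in range(2)]

# Marginal of x2 is the normalized product of its incoming messages.
p_x2 = normalize([m12[b] * m32[b] for b in range(2)])
```

On a general graph you first build cliques (the junction tree step), and the messages become tables over whole cliques - that clique size is where the exponential cost comes from.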

------
ratsbane
Everyone's talking about graph databases lately. Who hasn't designed some
relational data structure and then refined it a bit and tweaked it a bit and
ended up with some sort of adjacency-list monstrosity and the realization that
the relational underpinnings for that are rather awkward?

I'm really looking forward to reading more about Pregel. In the meantime I
just found Greg Malewicz's PhD thesis. He seems very interested in scheduling,
which must be crucial to Bulk Synchronous Parallel:
<http://www.cs.ua.edu/~greg/publications/Malewicz_PhD.pdf>

~~~
Tichy
Representing graphs in relational databases seems straightforward enough (not
that I have done it, but I have an opinion nevertheless). Accessing it
efficiently if you want to traverse a graph seems to be the hard problem. I
wonder if there are ANY good solutions at all, short of loading the whole
graph into memory. Otherwise I suppose one would need a good heuristic for
caching the edges and vertices that are most likely to be accessed?
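To make the two halves of that concrete, here is a sketch using an in-memory SQLite adjacency table (the table and column names are invented for illustration): storing the graph relationally is the easy part, while every BFS hop costs a separate round-trip query, which is exactly where traversal gets awkward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (2, 3), (2, 4), (4, 5)])

def bfs(conn, start):
    """Return all vertices reachable from start, one query per BFS level."""
    seen, frontier = {start}, [start]
    while frontier:
        placeholders = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT dst FROM edges WHERE src IN ({placeholders})",
            frontier).fetchall()
        nxt = {d for (d,) in rows} - seen  # only newly discovered vertices
        seen |= nxt
        frontier = list(nxt)
    return seen
```

Batching one query per frontier (rather than per vertex) already helps; recursive CTEs in modern SQL push the whole traversal into the database, but the random-access pattern on the edges table is still the bottleneck.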

------
Maro
I wonder what the similarities between Neo4J, Google's Pregel and CODASYL
are. (CODASYL lost out to RDBMS ~30 years ago.)

<http://en.wikipedia.org/wiki/CODASYL>

~~~
henning
With only a hint as to what this Pregel thing is about, my guess would be that
Neo4J and CODASYL are focused on persistence/storage whereas Pregel is meant
for high-performance OLAP stuff. Storage issues like replication are probably
less of a concern.

Notice that they're submitting the paper to a distributed computing conference
and not a database conference.

Ultimately I doubt anyone but the Googles of the world have a need for this
kind of technology.

~~~
scott_s
There was a time when nobody but the IBMs of the world had use for computers.

