
Parallel Universe open sources a novel in-memory data grid - pron
http://blog.paralleluniverse.co/post/26909672264/on-distributed-memory
======
zimbatm
Here is my understanding on how it works:

SpaceBase is a distributed in-memory chunk store. Each chunk is referenced by
an "id" and stores unstructured data (as far as the store knows). Each chunk
is "owned" by a single node but other nodes might keep a read-only copy. If
another node wants to write to that chunk, it first requests an ownership
transfer, and the new owner is broadcast. This design is inspired by how L1
caches in CPUs work. The theory is that with MMOs, the data is geo-localized
and each computer would take care of a part of the whole world. By keeping
data local to the machine you avoid network I/O.
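The ownership-transfer scheme described above can be sketched in a few lines of Python (a toy single-process model with invented names - `Node`, `directory` - not Galaxy's actual code; the shared `directory` dict stands in for the ownership broadcast):

```python
class Node:
    def __init__(self, directory):
        self.directory = directory  # chunk id -> owning node; stands in for the broadcast
        self.owned = {}             # chunks this node owns and may write
        self.cached = {}            # read-only copies of chunks owned elsewhere

    def read(self, cid):
        if cid in self.owned:
            return self.owned[cid]
        if cid not in self.cached:  # pull a read-only copy from the current owner
            self.cached[cid] = self.directory[cid].owned[cid]
        return self.cached[cid]

    def write(self, cid, value):
        if cid not in self.owned:   # a write first requires an ownership transfer
            prev_owner = self.directory[cid]
            self.owned[cid] = prev_owner.owned.pop(cid)
            self.directory[cid] = self  # "broadcast" the new owner
            self.cached.pop(cid, None)
        self.owned[cid] = value
```

After `b.write(cid, ...)`, node `b` owns the chunk and the previous owner no longer holds it; subsequent reads elsewhere fetch fresh read-only copies from `b`.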

There are two things I didn't see in the article. The first is how read-caches
are invalidated. Maybe data is eventually consistent, or cache invalidation is
broadcast on write. The second is how data from a stale node is recovered.
That's the difference I thought of when I saw the CPU reference.

~~~
pron
Yep, except that this project is called Galaxy (we use it internally as part
of our commercial offering - SpaceBase).

I will write another post in the coming weeks, explaining the cache-coherence
protocol in detail, but let me just say now that Galaxy is always consistent
(i.e. not eventually consistent). Some more details: read-caches are
invalidated when a node wishes to write (just like L1 caches), but the writer
doesn't need to wait for acknowledgements in order to proceed - this helps
with latency. Also, the invalidation doesn't need to be broadcast, because the
current owner of a cache-line maintains a list of all readers.
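The reader-list scheme can be sketched like this (a toy model with hypothetical names; real invalidations are asynchronous messages, modelled here as direct calls, which is also why the no-ack property only appears as a comment):

```python
class Reader:
    """A node holding (or losing) a read-only copy of one cache line."""
    def __init__(self):
        self.copy = None

class Owner:
    """The owning node tracks every reader of its line, so invalidations
    can be sent point-to-point instead of being broadcast."""
    def __init__(self, data):
        self.data = data
        self.readers = set()

    def serve_read(self, reader):
        self.readers.add(reader)   # remember who holds a copy
        reader.copy = self.data

    def write(self, value):
        for r in self.readers:     # invalidate only the known readers...
            r.copy = None          # ...without waiting for acknowledgements
        self.readers.clear()
        self.data = value
```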

~~~
zimbatm
Ah right. Makes sense. You can re-use the readers list that you also use in
case of ownership transfer.

Any hints you want to give on how to handle node failures? For me that's the
potential weak point.

~~~
pron
Exactly.

Fault tolerance is really the hardest part of the implementation. The
documentation contains a detailed explanation of the features and how to use
them. A future blog post will explain all algorithms in detail (it's simply
too long to post in a comment). In the meantime, you can take a look at the
code on GitHub if you feel like it.

------
javajosh
Perhaps this demonstrates a lack of imagination, but I can't see even the
sketch of how this piece fits into building a real game. Presumably one has
thousands, or even millions, of clients simultaneously interacting with the
world and with each other. This system would imply that your (low latency)
world-state would be kept in memory on a small cluster of Galaxy servers.

The novel quality of this system is that node data-locality is determined by
how you access the data.

Let's say you have 10 server nodes and 10^6 clients. That means that, in the
best case, each node deals with 10^5 clients. Presumably clients (or players)
can move through the simulated world, hopping from node to node.

Here things get hazy in my mind. For example, how does the client know which
server to connect to? Does it just make a guess and get redirected if it
guessed wrong? When a player crosses a node threshold, how does that work?

I'm thinking there must be a central character store - basically a traditional
database, that handles initial node affinity. Position in the world is part of
character state, and when you login, you'll be handed off to the correct node
based on that state.

But if this is how things work, when would you ever need your "cache lines"
moved from node to node? The world-state is spatial so why move it from node
to node? I guess that's the crux - I can't think of any other data that would
need to move between nodes other than player data, and in that case I don't
see what data it would need to take with it.

~~~
pron
I would really like people to treat this as an experiment in distributed
systems design rather than a product for games, but because our main product
is intended to be used by MMOs (as well as other industries), let me address
your scenario.

> how does the client know which server to connect to?

Galaxy doesn't handle any client connections, only connections between cluster
nodes, but if you were to build something on top of it that connects to
clients, then, yeah, starting with a guess and redirecting is OK. An initial
node that simply directs connections might work, too. And if players move from
one place to another, having your communication layer tell them to connect to
a different node is pretty much what we had in mind.

> when would you ever need your "cache lines" moved from node to node?

Yes, player data, NPC data, vehicle data - anything that moves. BUT, another
big reason for data migration is load-balancing. Continuing with your game
example, if a lot of players congregate in one area, handled by one machine,
you may decide to split it to two machines, and migrate half of the
information there.

If you were to use Galaxy for a graph database (forgetting the MMO use-case
for now), then while the graph vertices don't "move", changes in the edges
might make you decide on a better distribution of the vertices over the
cluster.
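As a toy illustration of that last point (an invented helper, not Galaxy's algorithm): a single greedy pass that moves each vertex to the node holding most of its neighbours, which reduces the number of edges that cross nodes:

```python
from collections import Counter

def rebalance(assignment, edges):
    """One greedy pass over a vertex -> node assignment: move each vertex
    to the node where most of its neighbours live. Mutates `assignment`
    in place, so later vertices see earlier moves (sequential greedy)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    for vertex, nbrs in neighbours.items():
        counts = Counter(assignment[n] for n in nbrs)
        best_node, _ = counts.most_common(1)[0]
        assignment[vertex] = best_node
    return assignment
```

With two triangles joined by a single bridge edge and one vertex initially placed on the wrong node, the pass moves that vertex home and leaves only the bridge as a cross-node edge.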

~~~
javajosh
This response fills me with unease. You present MMOs as a core use-case, and
then equivocate about your support for MMOs, preferring to hand-wave about failover
and graph databases.

The best way to experiment with distributed systems design is to build a real
distributed system - or at least be able to sketch one out.

Frankly, this seemed like a solution looking for a problem, and your vague
responses reinforce that impression. The dynamic of being able to move data
between nodes as a side-effect of access patterns is interesting, but it's not
clear how an MMO could really use it to good effect. Indeed, even in the
fault-tolerant case, it's not clear how this dynamic would help failover - I
mean, would you need to duplicate access patterns prior to node failure to
ensure dual-local data?

Frankly, I think you should focus on one use-case (MMO, graph database,
something) and hand-wave a complete solution that really leverages the novelty
of your approach. Get specific and talk about what actually happens when
"things move".

~~~
pron
Alright, sorry for the confusion. Our commercial offering (SpaceBase) is very
much targeted at MMOs and real-time LBSs. However, like many start-ups we
really do like building cool stuff. And while it is indeed the case that
Galaxy will soon be offered as a component of SpaceBase, it has a very
different design from other memory-grid projects/products, so we decided to
open-source it to the community and let it explore other possible uses.

My post was meant to be an introduction to a series of very technical blog
posts discussing theoretical and practical aspects of distributed systems. The
post was not meant to serve a clear commercial purpose, so I was trying to
steer the discussion away from commercial uses and more to its CS aspects. You
know, we really find this stuff interesting. Some of my future posts will
discuss the more theoretical sides of Galaxy and will drill very deeply into
its design and algorithms, while others will discuss how SpaceBase will make
use of Galaxy to help MMOs build huge, rich worlds, and LBSs track lots of
moving objects in real-time. To be more concrete and give just a taste, I'll
say this: when SpaceBase runs on top of Galaxy, objects are transferred from
one node to another to create a dynamic area-of-responsibility for each node.
This means that each node will be responsible for processing all objects in
some region of the game world (or real world for LBSs). But the regions are
_dynamic_ - namely, they shrink and grow to accommodate non-uniform load, so
that small busy areas will be split over several nodes, while large,
relatively vacant ones will be handled by just one.
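The shrink-and-grow behaviour can be illustrated with a toy recursive split over a 1-D world (hypothetical `assign_regions`, not SpaceBase's actual partitioning): regions split until no region holds more objects than its capacity, so a busy cluster ends up covered by many small regions while a quiet area stays one big region.

```python
def assign_regions(positions, capacity, lo=0.0, hi=1.0):
    """Recursively split the interval [lo, hi) until each region holds
    at most `capacity` objects. Returns a list of (lo, hi) regions."""
    inside = [p for p in positions if lo <= p < hi]
    if len(inside) <= capacity:
        return [(lo, hi)]
    mid = (lo + hi) / 2
    return (assign_regions(inside, capacity, lo, mid) +
            assign_regions(inside, capacity, mid, hi))
```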

------
azakai
Sounds like Project Darkstar,

<http://en.wikipedia.org/wiki/Project_Darkstar>

------
epaik
Looks pretty cool, although I don't know how useful this would be for games.

Isn't the only real bottleneck for large-scale real-time MMOs the network
bandwidth needed between the server and client? While this tech would improve
the efficiency of handling data on the server, it wouldn't be able to solve
the inherent problem of network limitations.

Major props for making it open source though, I'll enjoy looking at the code.

~~~
pron
The fact of the matter is that, other than EVE Online, there aren't any
"large-scale" MMOs out there. Most of them are limited to one server per
world/shard/instance (and even EVE is limited to one server per solar system
AFAIK), and the number of concurrent players that entails. This is a first
step in helping them build bigger worlds, with more players, that seamlessly
scale over a cluster.

~~~
Arelius
I'd argue that the lack of "large-scale" MMOs is due more to content-
production problems and activity density than to huge technical limitations
(I don't care how good this grid database is, there is a _density_ that will
bottleneck it; at the very least, it will bottleneck the clients' bandwidth
pipes if nothing else). MMO developers are pretty familiar with sharding these
days.

~~~
rdtsc
I thought of that too. If you have a large MMO, players presumably wouldn't
be uniformly distributed in the world - say, a fairly constant 100 players
per sq km. Rather, I see hubs forming (maybe with some power-law
distribution) - a large city, market, planetary system, ring of hell, or a
battlefield that would disproportionately hold a large number of entities vs
other areas.

Then, instead of node ownership based on the space grid, you'd want to
somehow have node ownership based on clustering/density.

So maybe constantly iterate a K-means clustering algorithm, where servers are
cluster centers and every player/client in the cluster belongs to that server.

That would be my back of the envelope approach. It probably has lots of
terrible flaws that I haven't thought of yet.
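That back-of-the-envelope approach amounts to repeating a Lloyd iteration; here is a minimal sketch (hypothetical names, 2-D points, no tie-breaking or capacity limits):

```python
def kmeans_step(players, centers):
    """One Lloyd iteration: assign each player to the nearest server
    'center', then move each center to the mean of its players."""
    clusters = [[] for _ in centers]
    for x, y in players:
        nearest = min(range(len(centers)),
                      key=lambda i: (x - centers[i][0])**2 + (y - centers[i][1])**2)
        clusters[nearest].append((x, y))
    new_centers = []
    for center, members in zip(centers, clusters):
        if members:
            new_centers.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
        else:
            new_centers.append(center)  # keep an empty server where it was
    return new_centers, clusters
```

Iterating this as players move would keep each server's center of responsibility drifting toward its local population hub.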

~~~
Arelius
I mean, I imagine that the system can handle the density by progressively
balancing the tree. The problem is that gameplay doesn't work the same way: a
client wants to see all characters within 10 (or whatever) meters, not the
closest 10 characters.
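The distinction is easy to see in code (toy helpers, not from any real engine): in a crowd, a radius query returns an unbounded number of characters, while a nearest-k query silently drops some of them:

```python
def dist2(a, b):
    """Squared distance between two 2-D points."""
    return (a[0] - b[0])**2 + (a[1] - b[1])**2

def within_radius(entities, me, r):
    """Everyone within r meters - what gameplay needs; unbounded in a crowd."""
    return [e for e in entities if dist2(e, me) <= r * r]

def nearest_k(entities, me, k):
    """The closest k entities - bounded work, but may omit visible characters."""
    return sorted(entities, key=lambda e: dist2(e, me))[:k]
```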

------
ukd1
Awesome. I'm going to have a play with this when I get a sec.

------
fredsters_s
love the design

------
nirvana
I guess if you live long enough you see everything.

I was part of a team that invented "spacebase" back in the mid-1990s! We were
able to support millions of simultaneous MMO players in the age when most
people were using dialup modems and had vastly less bandwidth and vastly
higher latency than they do now. This technology ended up being acquired by
Sony, and used as part of the playstation network.

Like this company, our work grew out of simulation programming done
originally for the military (in this case, the DoD), and like this company,
we provided an API and solution to rapidly partition the space so that the
game client would only need to know about objects located near it according
to in-game geometry. Like this product, ours was fully distributed, etc.

Alas, we were ahead of the age of MMOs - World of Warcraft didn't yet exist,
though Ultima Online did, and there were a lot of other attempts at MMOs.

Nowadays, if there were less temporal distance, people would say "They
ripped us off!" - but I can totally believe this company had the same idea...
and they saw a green field because there were no competitors.

The problem is, there were no competitors because (at least back then) game
developers were not interested in solutions they didn't invent themselves.
Maybe that has changed.

~~~
Arelius
> The problem is, there were no competitors because (at least back then) game
> developers were not interested in solutions they didn't invent themselves.
> Maybe that has changed.

I'd say that if you disregard indie development, that is still pretty much the
case. There is certainly a bigger market for middleware, but I think it's
generally met with skepticism.

