
Parallel Universe (YC S12) Developing Spatial Databases For Matrix-Style Games - pron
http://techcrunch.com/2012/07/09/parallel-universe-spacebase-will-make-for-new-matrix-style-mmo-games/
======
reedlaw
What a terrible article.

> Oracle and Redis are examples of traditional, spatial database environments.

Redis is traditional? Redis is spatial? What are they talking about?

> Spaccebase is an in-memory technology, which is how it can be so fast.
> Thats’ what truly separates it from the rest. It tracks not when or what
> happened but where it occurred. And that all occurs in-memory. It’s like any
> application that runs this way. It’s very fast in getting data. In this
> case, we are talking about spatial data.

This entire paragraph is ridiculous. Just a bunch of buzzwords repeated ad
nauseam. What database does not use "in-memory" technology? I guess that's why
"it's like any application that runs this way". Yeah, and French cuisine is
like any food that's cooked that way.

~~~
dmix
Yet we still link to TC articles...

YC startups always launch on Techcrunch and top HN within an hour.

Is there not a better place?

~~~
Ogre
In this case, a direct link to the site (<http://paralleluniverse.co/>) would
be better. There is actual information there. To be fair, the first two words
of the TC article were a link to the site. Signal to Noise over the first
couple words of that article is quite high!

------
Ogre
It's being marketed as a database technology, but from a game development
perspective it sounds more like "Engine" technology. The lines are admittedly
blurry sometimes, but this doesn't seem to offer durability of data, which is
what I think of when someone says "database" in the context of MMO servers.
That said, the problems this is solving are important problems for any MMO
server engine.

Requiring a JVM be embedded in your server is probably not going to go over
well with many game developers. It's not initially going over well with me,
but if I were currently trying to solve this problem, I would at least want to
run some benchmarks against it.

I guess the big question I have is, how would this make existing MMOs better?
If we take WoW as an example, dumping everyone into a single realm (shard)
would solve the problem of wanting to play with your friends, who are on
different realms. But 200x as many players competing for kills or standing
around the auction house aren't going to make the game better. Most of your
time would still be spent in instances, with a maximum of 25 players (40 if
you go back in time - a case where less massive was deemed more fun). And
that's ignoring the problem of actually rendering that much stuff on the
client.

I don't mean to sound overly critical, I could find ways to use this, but I'm
not sure they're marketing it very well to MMO developers. Maybe the Eve guys
would like it, but other than that game, I'm not sure the scaling problems
MMOs have are the same scaling problems that this solves.

~~~
chii
"Most of your time would still be spent in instances, with a maximum of 25
players (40 if you go back in time - a case where less massive was deemed more
fun). And that's ignoring the problem of actually rendering that much stuff on
the client."

You haven't played Eve Online, have you? It's a game that allows you to play, on
the same "shard", with hundreds of players on grid at the same time, fighting
it out.

Any tech that puts the massive back into MMO is good - lately, it's all been
instancing and sharding. That sort of game isn't MMO enough to make it MMO. In
fact, I've been hearing lately, people calling Diablo 3 an MMO. What a farce.

Edit: oh, you did mention Eve already.

~~~
debacle
I always thought instancing and sharding were solutions to the social aspects
of server overpopulation - it's not fun to do a quest when you have to wait
for spawns, etc.

------
kelleyk
I'd be curious to hear more about what they're actually doing (technically
speaking). Is this a wrapper around kd-trees, R-trees, and their friends and
relatives? Is it something fancier?

~~~
pron
The spatial index is an R-tree variant that doesn't degrade _and_ allows
concurrent writes (multiple writers at a time, let alone readers). Readers
don't block writers and don't use locks, while writers lock a small subset of
the database for atomic transactions (one of the most common operations we
need to support is a "move" of an object from one location to another - that
has to be atomic).

For parallelization of queries and transactions we use fork-join.
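
A minimal sketch of the atomic "move" idea described above, assuming a plain
uniform grid in place of the R-tree variant. This is not SpaceBase's
implementation: `GridStore`, `CELL`, and every method name are hypothetical,
and unlike the system described, readers here do take a lock. The point is
only that a writer locks the few cells it touches, not the whole store.

```python
import threading
from collections import defaultdict

CELL = 10.0  # hypothetical cell size, in world units


class GridStore:
    """Toy 2D store with one lock per grid cell."""

    def __init__(self):
        self._cells = defaultdict(dict)            # cell -> {obj_id: (x, y)}
        self._locks = defaultdict(threading.Lock)  # cell -> lock
        self._meta = threading.Lock()              # guards creation of cell entries

    @staticmethod
    def cell_of(x, y):
        return (int(x // CELL), int(y // CELL))

    def _slot(self, cell):
        # Create the lock and bucket for a cell on first use.
        with self._meta:
            return self._locks[cell], self._cells[cell]

    def insert(self, obj_id, x, y):
        lock, bucket = self._slot(self.cell_of(x, y))
        with lock:
            bucket[obj_id] = (x, y)

    def move(self, obj_id, old, new):
        """Move obj_id from position old to position new as one atomic step."""
        src, dst = self.cell_of(*old), self.cell_of(*new)
        # Lock each involved cell once, in sorted order, so two concurrent
        # moves in opposite directions cannot deadlock.
        slots = {c: self._slot(c) for c in sorted({src, dst})}
        for lock, _ in slots.values():
            lock.acquire()
        try:
            slots[src][1].pop(obj_id)      # KeyError if the object is not there
            slots[dst][1][obj_id] = new
        finally:
            for lock, _ in reversed(list(slots.values())):
                lock.release()

    def query_cell(self, cell):
        lock, bucket = self._slot(cell)
        with lock:
            return dict(bucket)            # snapshot copy
```

Because both cells are held for the duration of `move`, a reader of either
cell never sees the object missing from both or present in both.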

~~~
tintor
How do you do lock-free updates to the R-tree structure?

~~~
pron
Most updates aren't lock-free.

------
jblow
As a longtime professional game developer, I kind of don't get it. Is this
company out of business as soon as Oracle implements a loose octree that is
ACID? Is there something wrong with spatial hashing?

etc, etc.

~~~
jandrewrogers
It depends on the requirements of your data model. Anybody with a modicum of
competence can scale a point cloud but real-world non-point geometry models are
where systems like Oracle have difficulty. Polygons, lines, vectors, etc. are a
real problem. So-called "spatial hashing" (it had a different name in the
1970s and again in the early 1990s -- the wheel of computer science) has a
number of real limitations which is why it was never really used (Oracle has
patents on it that have already expired!).
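
To make the point-versus-geometry distinction concrete, here is a toy spatial
hash, assuming a fixed uniform cell size. `SpatialHash` and its methods are
hypothetical names, not any product's API. A point lands in exactly one cell,
while an axis-aligned box has to be registered in every cell it overlaps,
which is where the cost grows for large geometry.

```python
from collections import defaultdict
from math import floor


class SpatialHash:
    """Toy 3D spatial hash over a uniform grid."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # (i, j, k) -> set of object ids

    def _key(self, x, y, z):
        s = self.cell_size
        return (floor(x / s), floor(y / s), floor(z / s))

    def insert_point(self, obj_id, p):
        """A point occupies exactly one cell."""
        self.cells[self._key(*p)].add(obj_id)
        return 1

    def insert_box(self, obj_id, lo, hi):
        """A box occupies every cell it overlaps; returns how many."""
        (x0, y0, z0), (x1, y1, z1) = self._key(*lo), self._key(*hi)
        count = 0
        for i in range(x0, x1 + 1):
            for j in range(y0, y1 + 1):
                for k in range(z0, z1 + 1):
                    self.cells[(i, j, k)].add(obj_id)
                    count += 1
        return count

    def query_point(self, p):
        """Ids of everything registered in the cell containing p."""
        return set(self.cells.get(self._key(*p), ()))


h = SpatialHash(cell_size=10.0)
point_cells = h.insert_point("bird", (5.0, 5.0, 5.0))
box_cells = h.insert_box("storm", (0.0, 0.0, 0.0), (99.0, 99.0, 99.0))
```

With a cell size of 10, the point takes one entry and the 99-unit box takes
10 x 10 x 10 entries, all of which must be touched again whenever the box
moves.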

Also, traditional transactional database engines are not designed for the
kinds of insert/update rates that are common for many spatial applications.
This is a problem for machine-generated data sources generally. It requires an
architecture designed specifically for that kind of (ab)use case.

~~~
hastur
How would you do it then?

I mean, if you wanted to implement a 3d spatial db like that yourself?

~~~
jandrewrogers
This is a big topic but there are two main components.

First, you need a storage engine architecture that is designed for very fast
appends concurrent with queries. This is trickier than it sounds because you
can't use secondary indexes and queries still need to be efficient. Some
recent database engines focused on non-batch "real-time analytics" are
designed for this; it is a different internal model than traditional
distributed analytic engines. Database engine boffins know ways of achieving
this, esoteric but well-understood.

Second, you need a distributed interval index i.e. a distributed data
structure that can act as an efficient index for 3-dimensional cube types.
Scalable distributed interval indexing requires that data models be embedded
in a higher dimensionality space, so at least 4-dimensions. The well-known
example from literature is multi-level grids but those have many limitations.
The state-of-the-art structures are adaptive spatial sieves; advanced versions
are computationally efficient even for very high dimensionality cubes.
However, these algorithms are encumbered and little has been published on
them. (Disclosure: I am the inventor of the first useful spatial sieve
algorithms. The idea dates back to at least 1990 but had unsolved theoretical
issues until 2007.)
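
For reference, a minimal single-machine sketch of the multi-level grid
mentioned above (not the spatial sieve, about which little is published).
`MultiLevelGrid`, `base`, and `levels` are hypothetical names, and the
doubling cell sizes are an assumption. Each box is stored at the finest level
whose cell is at least as large as the box, keyed by the cell of its minimum
corner, so the key is (level, i, j, k) and object size acts as an extra
coordinate next to position.

```python
from collections import defaultdict
from itertools import product
from math import floor


class MultiLevelGrid:
    """Toy multi-level grid for axis-aligned 3D boxes."""

    def __init__(self, base=1.0, levels=8):
        self.base = base
        self.levels = levels
        self.cells = defaultdict(list)  # (level, i, j, k) -> [(id, lo, hi)]

    def _size(self, level):
        return self.base * (2 ** level)  # cell size doubles per level

    def insert(self, obj_id, lo, hi):
        extent = max(h - l for l, h in zip(lo, hi))
        for level in range(self.levels):
            s = self._size(level)
            if s >= extent:
                key = (level,) + tuple(floor(c / s) for c in lo)
                self.cells[key].append((obj_id, lo, hi))
                return key
        raise ValueError("box is larger than the coarsest cell")

    def query(self, qlo, qhi):
        """Ids of all stored boxes that overlap the query box."""
        hits = set()
        for level in range(self.levels):
            s = self._size(level)
            # A stored box is no larger than one cell, so it can reach at
            # most one cell past the cell of its minimum corner. Scanning
            # from one cell before the query start is therefore enough.
            ranges = [range(floor(a / s) - 1, floor(b / s) + 1)
                      for a, b in zip(qlo, qhi)]
            for idx in product(*ranges):
                for obj_id, lo, hi in self.cells.get((level,) + idx, ()):
                    if all(l <= qb and qa <= h
                           for l, h, qa, qb in zip(lo, hi, qlo, qhi)):
                        hits.add(obj_id)
        return hits


g = MultiLevelGrid(base=1.0, levels=8)
bird_key = g.insert("bird", (10.2, 10.2, 10.2), (10.7, 10.7, 10.7))
storm_key = g.insert("storm", (0.0, 0.0, 0.0), (100.0, 100.0, 20.0))
near = g.query((10.0, 10.0, 10.0), (11.0, 11.0, 11.0))
```

Each object is stored once regardless of size, but a query has to visit every
level, and a large query at a fine level enumerates many empty cells, which
is one of the limitations of this structure.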

I am building a real-time analytical database similar to this right now, and
petabyte-scale 3-dimensional spatial data models are a core part of its
functionality. Building fast, distributed 3d spatial databases is achievable,
it just requires a different data structures and algorithms skill set than you
would use for more traditional database designs.

------
TimJRobinson
Is this going to be targeted towards big companies or startups (in pricing)?
I'm working on a game at the moment that could definitely benefit from this
tech and I'd love to try it out but I'm bootstrapping so not sure if it would
be affordable.

~~~
pron
Shoot us an e-mail.

------
sown
So, say hypothetically, if I wanted to try my hand at implementing a toy
version of something similar for fun, what would I need to at least have a
grasp of?

------
makmanalp
Hmm, I wonder how many potential customers they have.

~~~
jandrewrogers
Far more than you might imagine, at least if they develop a more sophisticated
geometry model. The real limitation may be the pure in-memory model. Many of
the high-value applications have enormous working sets, much larger than what
will fit in a small RAM-based cluster. Hundreds of terabytes is where the
applications just start to become interesting. In this sense, focusing on game
worlds is probably a good idea because it is one of the use cases that will
fit within their immediate scaling targets.

They are correct that traditional database spatial indexes are slow and scale
terribly, being designed for relatively small and static data sets. It does
not sound like they are pushing the state-of-the-art, just meeting an under-
served need in the gaming space, a market which I can validate as existing but
with somewhat limited revenue potential even if you sign the major game
companies. It is a good "base hit" startup opportunity if they can execute it
well. (I have designed massive-scale real-time spatial database engines for a
number of years; we passed up the gaming market because the size of the market
was too small relative to other markets for this technology.)

~~~
pron
Actually, we're very much pushing state-of-the-art :)

It's just that we're tackling a different problem - low latency applications -
and when I say low latency I mean in the microseconds range. Big spatial data
is a very interesting problem, and tracking a large number (though less than
billions) of moving objects is another - though very different - interesting
problem.

For working sets that don't fit in one machine's RAM we offer a cluster.

~~~
jandrewrogers
While I do not currently work on ultra-low latency spatial databases (more
like milliseconds), I have in the past and so have some idea of what is out
there. :-) I am not all that familiar with the design of your system so I was
mostly working off the scale numbers offered.

The best example I can think of is an ultra-low latency in-memory prototype I
designed in 2009 on a parallel cluster. The working set was several billion
irregular 3-cubes ranging in (metaphorical) size from birds to hurricanes. The
average CPU cost of an access operation was sub-microsecond so the latency was
mostly interconnect related (which was a slow but proper low-latency
supercomputing fabric). The current work I do uses complex geodetic polygon
geometries so the computational cost of operations is quite a bit more but the
actual computational cost of the access method is below the noise floor of the
network fabric.

You are correct though that if you are mostly dealing with tracking points or
cubes then in-memory is sufficient to hold many applications. It is the
sensing data that really kills you... :-)

------
hendzen
Is this limited to 2d and 3d space or can it handle n-dimensions?

~~~
pron
2d and 3d only. Data of higher dimensions is usually not as dynamic (the data
points are created and deleted but rarely moved), and SpaceBase is optimized
for lots of updates. Also, higher dimensions require a different data-
structure (see:
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.6661>)

~~~
hendzen
n-dimensional would be useful for realtime machine learning applications.

