
Titan: A Highly Scalable, Distributed Graph Database - d0ugal
http://thinkaurelius.github.com/titan/
======
jandrewrogers
What makes this graph database "highly scalable, distributed"?

There are difficult theoretical computer science problems that effectively
limit the parallelization/distribution of generalized graph operators. To
achieve high scalability you have to solve these computer science problems
first. If this design offers a novel solution to the longstanding computer
science problem then kudos, but nothing at the site suggests this is the case.

Many graph databases have claimed high scalability and distributability but
none of those claims have held up over time due to the aforementioned computer
science problems. This may be a very nice graph database but I am skeptical of
the claims of "highly scalable, distributed" unless there is evidence that it
uses fundamentally new theoretical computer science to achieve that.

~~~
th0ma5
You mean CAP theorem? <http://en.wikipedia.org/wiki/CAP_theorem> I imagine it
is either the A or the P that gets to be the victim, but I'm not sure which in
this case.

~~~
jandrewrogers
No, I am referring to the set of problems related to graph partitioning.

This is essentially the same underlying problem that is the source of why
distributed NoSQL databases do not support join operations. In the case of
NoSQL databases, they simply do not support joins because a join is not a core
operation. (Technically you can still do a join, it just has terrible scaling
characteristics.)

The fundamental operations of graph databases are relational joins by another
name, which means that graph databases have the same limitation on
distribution that distributed NoSQL databases have on joins. However, unlike
NoSQL databases it is their primary operation so they can't just not support
it. Consequently, the only way to have a "graph database" that is massively
distributable is to solve the same problem that prevents distributed databases
from supporting joins.
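
To make the "traversals are joins by another name" point concrete, here is a minimal Python sketch. It is not taken from Titan or any other product; the edge list, the partition assignment, and all function names are hypothetical. It computes the same two-hop neighbourhood once as a self-join on an edge table and once as a traversal, then counts how many edges cross a partition boundary, which is roughly the work a distributed system would have to ship over the network.

```python
from collections import defaultdict

# Hypothetical edge table: (source, target) rows, as in a relational "edges" table.
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e")]


def two_hop_join(edges, start):
    """Two-hop neighbours as a self-join: e1.target == e2.source."""
    return sorted({t2
                   for (s1, t1) in edges if s1 == start
                   for (s2, t2) in edges if s2 == t1})


def two_hop_traversal(edges, start):
    """The same result computed as a traversal over an adjacency index."""
    adj = defaultdict(list)
    for s, t in edges:
        adj[s].append(t)
    return sorted({t2 for t1 in adj[start] for t2 in adj[t1]})


def cross_partition_edges(edges, partition_of):
    """Count edges whose endpoints sit on different (hypothetical) partitions."""
    return sum(1 for s, t in edges if partition_of[s] != partition_of[t])


# Assumed placement of vertices on two machines, purely for illustration.
PARTITION_OF = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 0}

print(two_hop_join(EDGES, "a"))                    # ['d', 'e']
print(two_hop_traversal(EDGES, "a"))               # ['d', 'e']
print(cross_partition_edges(EDGES, PARTITION_OF))  # 3
```

Finding a placement that keeps the cross-partition count low for arbitrary queries is the graph partitioning problem referred to above.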

~~~
ferrouswheel
I've been working on the graph partitioning issues for distributed graphdbs
for a while (my own pet project that my brain won't let me give up on) but
it's been from the perspective of someone who's not been keeping up to date
with the Comp Sci literature. Have you got any suggestions for canonical
papers from the last decade or so, for the state of the art?

~~~
jandrewrogers
There has not been much that is both new and interesting in the graph
partitioning literature in a long time. What you already know is probably not
too far off from what is in the current literature. The literature on this
topic has been stagnant for years.

Solutions to the graph partitioning problem exist and among people doing high-
end graph analytics this has been rumored for years now. It just is not
published and people that know how it is done are slathered in NDAs. I know of
two different (related) algorithms for parallelizing graph analysis. IBM
Research currently has the most advanced algorithms for graph analysis and
they disclose very little about how they work.

------
okram
We are going to address a lot of these questions in our presentation tonight.
However, for the Hacker News crew that won't be there tonight, here is an
early release of the Titan talk.

[https://speakerdeck.com/u/okram/p/titan-the-rise-of-big-
grap...](https://speakerdeck.com/u/okram/p/titan-the-rise-of-big-graph-data)

<http://titanbiggraphdata.eventbrite.com/>

Enjoy!, Marko.

~~~
hendler
Great presentation.

~~~
benjaminRRR
Also it appears like a great way to take speakerdeck down...

------
oacgnol
Very interesting. Since it implements Blueprints, does it also have support
for Furnace (graph algorithms)? If so, does this imply that graph processing
is done on disk rather than in-memory? I am rather unfamiliar with Blueprints
but I'm wondering how Titan implements that aspect.

------
niho
I don't want to spam this thread, but I feel I have to plug my personal pet
project Related (<https://github.com/sutajio/related>), since it is, well..
yeah... "related" to the topic being discussed.

It can't do half of what Titan does and is a much, much simpler design. But it
is fast, easy to use and works really well for 80% of the use cases you might
have for a graph database on the web (social graphs, semantic web stuff, etc.)

------
nchuhoai
Interesting, does anyone know how this compares with the popular Neo4j
Database?

~~~
norkakn
Each Neo4j node stores all of the data, and it doesn't scale writes
horizontally well.

OrientDB tries to scale writes (I'll be testing this in a few months), but
still stores all of the data everywhere.

This looks like it shards the data automagically. If it works well, I might be
able to bang on it a bit, but I'm guessing that it gives shit performance for
complex graph questions.

~~~
okram
Titan exposes graph data over a machine cluster. It is an OLTP system that
allows you to do local neighborhood graph traversals in sub-second time. For
OLAP processing (e.g. global graph algorithms), Aurelius will be releasing two
projects named Faunus and Fulgora in the coming months. These provide Hadoop
connectivity and compressed in-memory representations of "graph slices." We
will be publishing our talk slides tonight that discuss this eco-system of
graph technologies. See <http://titanbiggraphdata.eventbrite.com/>

~~~
norkakn
I really wish that I could attend. We ask a lot of questions like:

Find all Nodes with a property in a tree

Find all leaves L of those nodes

Find all annotations in a DAG of those leaves

Collapse similar DAG entries by backtracking up the graph based on edge
weights

Writes are bulk loaded, and right now, we are just trying to push all of the
graph stuff offline, but there are some limitations to that, and we could
really up our accuracy by being able to perform these queries quickly.
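
The first two queries in the list above could be sketched in Python roughly as follows. This is only an in-memory illustration under assumed data: the tree, the property name `flag`, and the function names are all hypothetical, and the DAG annotation and edge-weight collapsing steps are left out because the comment does not specify them.

```python
# Hypothetical tree (parent -> children) and per-node properties.
CHILDREN = {"root": ["x", "y"], "x": ["x1", "x2"], "y": ["y1"], "x2": ["x2a"]}
PROPS = {"x": {"flag": True}, "y1": {"flag": True}}


def nodes_with_property(props, name, value):
    """Step 1: find all nodes carrying a given property value."""
    return sorted(n for n, p in props.items() if p.get(name) == value)


def leaves_under(children, node):
    """Step 2: depth-first walk collecting the leaves below `node`."""
    kids = children.get(node, [])
    if not kids:
        return [node]  # a node without children is its own leaf
    found = []
    for kid in kids:
        found.extend(leaves_under(children, kid))
    return found


for node in nodes_with_property(PROPS, "flag", True):
    print(node, leaves_under(CHILDREN, node))
# x ['x1', 'x2a']
# y1 ['y1']
```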

------
joe_the_user
Also,

What is interesting about a graph database relative to a simple key-value
database? Storing the edges of a graph is trivial for a key-value store, so it
seems like any key-value store could store the basic graph structure?

Do graph databases support graph-specific queries and indices?
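
As a rough illustration of the key-value approach described in this question, here is a minimal Python sketch. A plain dict stands in for the key-value store, and the `out:<vertex>` key scheme and the function names are assumptions made up for the example. Storing the structure is indeed simple; the traversal logic, and one lookup per visited vertex, ends up in client code.

```python
def add_edge(kv, src, dst):
    """Append dst to the adjacency list stored under the key 'out:<src>'."""
    key = f"out:{src}"
    kv[key] = kv.get(key, []) + [dst]


def neighbours(kv, vertex):
    """One key-value lookup per vertex."""
    return kv.get(f"out:{vertex}", [])


def within_hops(kv, start, hops):
    """Breadth-first expansion; returns (reachable vertices, lookups made)."""
    seen, frontier, lookups = {start}, [start], 0
    for _ in range(hops):
        next_frontier = []
        for v in frontier:
            lookups += 1
            for n in neighbours(kv, v):
                if n not in seen:
                    seen.add(n)
                    next_frontier.append(n)
        frontier = next_frontier
    return seen - {start}, lookups


store = {}  # stand-in for a real key-value store
for src, dst in [("a", "b"), ("a", "c"), ("b", "d"), ("d", "e")]:
    add_edge(store, src, dst)

reached, lookups = within_hops(store, "a", 2)
print(sorted(reached), lookups)  # ['b', 'c', 'd'] 3
```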

~~~
JPKab
I can only answer for one advantage I specifically know of regarding graph
DBs over key-value stores: dynamic, mergeable schemas which enforce data integrity
WITHIN the database rather than with code on top of it.

There are many, many people on HN who are much more knowledgeable than I am on
graph DBs, and I sure as hell hope they answer this question.

I'm curious if this supports the RDF, OWL, and SPARQL standards?

I'm a little tired of graph DBs that focus on scale, rather than speed and
flexibility though. A good one to check out is Stardog. <http://stardog.com/>
I think it just went into 1.0.

~~~
philjohn
It looks like it's Gremlin only.

What triple stores have you looked at? 4store is performant, but doesn't
support reasoning. There's also BigData and Virtuoso which support various
levels of it, and Franz are apparently working on a clustered version of
Allegrograph.

------
PaulHoule
Where's the SPARQL support?

~~~
okram
Titan currently does not have "edge indexing" and thus can not implement
Blueprints' GraphSail interface
[[https://github.com/tinkerpop/blueprints/wiki/Sail-
Ouplementa...](https://github.com/tinkerpop/blueprints/wiki/Sail-
Ouplementation)]. If it did implement GraphSail, then SPARQL would be
supported via the Sail SPARQL engine.

------
sravfeyn
"Vertex-centric indices provide vertex-level querying to alleviate issues with
the infamous super node problem"

Does that mean that, for each vertex, its sub-graph is indexed?!
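
One common reading of a "vertex-centric index" is an index over a single vertex's own incident edges rather than over its whole sub-graph. The Python sketch below illustrates that general idea only; it is an assumption for discussion, not Titan's actual implementation, and the class, labels, and weights are hypothetical. Each vertex keeps its edges sorted by (label, weight, target), so a query on a super node can jump to the matching range instead of scanning every edge.

```python
import bisect


class Vertex:
    """Keeps incident edges sorted by (label, weight, target) for range lookups."""

    def __init__(self, name):
        self.name = name
        self._edges = []  # sorted list of (label, weight, target) tuples

    def add_edge(self, label, weight, target):
        bisect.insort(self._edges, (label, weight, target))

    def edges(self, label, min_weight=float("-inf")):
        """Targets of `label` edges with weight >= min_weight, without a full scan."""
        start = bisect.bisect_left(self._edges, (label, min_weight))
        out = []
        for lbl, _weight, target in self._edges[start:]:
            if lbl != label:
                break  # left the contiguous run for this label
            out.append(target)
        return out


v = Vertex("celebrity")
v.add_edge("follows", 1, "u1")
v.add_edge("likes", 5, "p1")
v.add_edge("likes", 9, "p2")
v.add_edge("likes", 2, "p3")
v.add_edge("knows", 3, "u2")

print(v.edges("likes", min_weight=5))  # ['p1', 'p2']
print(v.edges("knows"))                # ['u2']
```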

------
moondowner
Anyone tried to use this with Spring?

~~~
moondowner
Why the downvote? Spring Data Neo4j [1] works very well, it's pretty popular,
and I'm interested in whether anyone has tried working with Titan in a Spring
project (probably using the Blueprints API).

[1] <http://www.springsource.org/spring-data/neo4j>

