

Titan 0.3.0 Released: Geo, full-text, edge indexing on billion edge graphs - okram
https://github.com/thinkaurelius/titan/wiki/Getting-Started

======
espeed
Titan is a new real-time, distributed, transactional graph database that can
use either Cassandra or HBase as its distributed data store.

Titan 0.3 was stress-tested with Cassandra at 120 billion edges and is
capable of loading 1.2 million edges per second on a 16-machine hi1.4xl
cluster (<https://twitter.com/aureliusgraphs/status/316255164719828992>).

This release provides a complete performance-driven redesign of many core
components, and the primary new feature is advanced indexing.

Here are the new indexing features:

* Geo: Search for elements using shape primitives within a 2D plane.

* Full-text: Search elements for matching string and text properties.

* Numeric range: Search for elements with numeric property values using intervals.

* Edge: Edges can be indexed as well as vertices.
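
To make the numeric range bullet concrete, here is a minimal Python sketch of an interval lookup over a sorted index. It is not Titan's implementation or API: the `RangeIndex` class and its method names are hypothetical, and in Titan this work would be handled by the configured index backend.

```python
import bisect


class RangeIndex:
    """Toy sorted index from a numeric property value to element ids (hypothetical)."""

    def __init__(self):
        self._keys = []  # sorted list of (value, element_id) pairs

    def add(self, value, element_id):
        # insort keeps the list ordered, so lookups can use binary search.
        bisect.insort(self._keys, (value, element_id))

    def interval(self, low, high):
        """Return ids of elements with low <= value < high, in value order."""
        # A 1-tuple sorts before any 2-tuple that starts with the same value.
        start = bisect.bisect_left(self._keys, (low,))
        end = bisect.bisect_left(self._keys, (high,))
        return [element_id for _, element_id in self._keys[start:end]]
```

The half-open interval is a design choice for the sketch only; it makes adjacent ranges easy to chain without double-counting a boundary value.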

See <http://thinkaurelius.com/news/>

~~~
erichocean
Hey espeed, how well would the SybilGuard[0] algorithm run on Titan? Right
now, we're evaluating Twitter's in-memory graph library (Cassovary[1]), but
that obviously requires a big machine, and we already use Cassandra
elsewhere, so we would prefer to stay with that.

Up to, say, 5 second response times on large graphs would be acceptable.

[0] <http://www.math.cmu.edu/~adf/research/SybilGuard.pdf>

[1] <https://github.com/twitter/cassovary>

~~~
espeed
Titan is one piece of the Aurelius Graph Cluster
(<http://thinkaurelius.com/subscription/>).

1\. Titan is the OLTP piece, and it's very fast at running local-rank
algorithms (<http://markorodriguez.com/2011/03/30/global-vs-local-graph-ranking/>).

2\. Faunus is an OLAP graph-analytics engine that integrates Titan with Hadoop
for global analysis of Titan graphs. Graphs are analyzed using a MapReduce
implementation of the Gremlin graph traversal language. General use-cases
include computing graph derivations/transformations and global graph
statistics. You can then feed the global-algo results back into Titan.

3\. Fulgora is an in-memory, compression-based, transaction-less OLAP graph
processor capable of storing billions of edges within the memory confines of a
single machine. Fulgora is optimized for the execution of massively threaded,
global graph algorithms. It will come out later this year, and you can connect
it to Faunus or feed it directly from Titan.

If you can construct a local SybilGuard algo, you can run it in Titan and get
an immediate response. Otherwise, for global-graph algos, you would feed
Faunus from Titan directly and query Faunus' in-memory graph. There are also
things in the works that will blur these distinctions. More details to come
later this month in a series of blog posts -- stay tuned.
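
As an illustration of what "local" means here, the following Python sketch scores vertices by counting walks of a fixed length out of a single seed vertex, touching only the seed's neighborhood rather than the whole graph. It is a generic walk-counting example, not SybilGuard and not Titan's API; the `local_rank` function and the plain-dict adjacency format are assumptions made for the sketch. In Titan the same idea would be written as a Gremlin traversal.

```python
from collections import Counter


def local_rank(adjacency, seed, hops=2):
    """Score vertices by the number of walks of exactly `hops` steps from `seed`."""
    frontier = Counter({seed: 1})
    for _ in range(hops):
        next_frontier = Counter()
        for vertex, walks in frontier.items():
            # Each walk ending at `vertex` extends to every out-neighbor.
            for neighbor in adjacency.get(vertex, ()):
                next_frontier[neighbor] += walks
        frontier = next_frontier
    frontier.pop(seed, None)  # the seed itself is not a candidate
    return frontier.most_common()


# Small example graph as an adjacency dict (made-up data).
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}
ranked = local_rank(graph, "a")
```

Because the work is bounded by the seed's neighborhood, the cost depends on local degree rather than on total graph size, which is why this style of query suits an OLTP store.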

Marko or Matthias will have more insight on how to best run a SybilGuard-type
algo. Right now they're about to jump on a flight to Austin for Data Day Texas
(<http://datadaytexas.com/>), but I'm sure they'll respond when they have a
free moment.

See Marko's YOW! interview
(<http://channel9.msdn.com/posts/YOW-2012-Marko-Rodriguez-Graph-Systems-and-Databases>)
and Matthias' Titan/Cassandra talk
(<http://www.youtube.com/watch?v=ZkAYA4Kd8JE>) for more details on the
architecture.

~~~
erichocean
Thanks espeed, the links were especially helpful.

I'm looking forward to Fulgora, sounds great!

~~~
espeed
No problem. I sent a tweet to @erichocean -- is that you? If you want, let's
chat about algo options next week when everyone is back in town.

------
eitland
One of the most interesting parts seems not to have been mentioned: the
Apache license.

So far this is the only real, all-features-included graph database with a
permissive open source license, or am I missing something?

~~~
xaritas
Well, I am not a Big Data guy, so I don't know if it's "real" enough in terms
of capabilities or maturity, but when looking for a graph database to mess
around with, I came across OrientDB, which also uses the Apache license.
Clearly it's not built for the same use cases or scale as Titan but it seems
to have some case studies and commercial support. I haven't played with it
yet, so I suppose the project could be an elaborate hoax.

<http://www.orientdb.org/>

~~~
espeed
No, OrientDB is not an "elaborate hoax" :) -- it's very real and developed by
Luca Garulli (<https://github.com/lvca>). OrientDB is one of the primary
Blueprints implementations (<https://github.com/tinkerpop/blueprints>) so it
integrates with the TinkerPop stack as well.

------
feniv
I'd love to see a more detailed write-up about the performance. I'm working on
a natural language parsing problem and have had some success using graphs to
perform chunking in the past.

+1 for using Gremlin! Do you know of any Python implementations of it?

~~~
espeed
_+1 for using Gremlin! Do you know of any Python implementations of it?_

Note that Marko is the creator of Gremlin :)

There are Gremlin implementations in various stages of development for almost
every major JVM language:

Gremlin-Java (base implementation) -
<https://github.com/tinkerpop/gremlin/wiki/Using-Gremlin-through-Java>

Gremlin-Groovy (original) -
<https://github.com/tinkerpop/gremlin/wiki/Using-Gremlin-through-Groovy>

Gremlin-JavaScript - <https://github.com/entrendipity/gremlin-js>

Gremlin-Clojure - <https://github.com/zmaril/ogre>

Gremlin-Scala - <https://github.com/mpollmeier/gremlin-scala>

Within the next year (before the TinkerPop book comes out), it would be cool
to have all the languages covered, including Gremlin-Jython and Gremlin-JRuby.

Gremlin-Java and Gremlin-Groovy are maintained by TinkerPop.

Gremlin-JavaScript, Gremlin-Clojure, and Gremlin-Scala are being
developed/maintained by community members.

To create a Gremlin implementation, you essentially wrap Gremlin-Java in the
target language's idiomatic style.

If you are interested in helping develop Gremlin-Jython or Gremlin-JRuby (or
any other implementation currently in development), please post to the Gremlin
Users Group
(<https://groups.google.com/forum/?fromgroups=#!forum/gremlin-users>).

Right now most people use Gremlin-Groovy (it's the original) regardless of
what language they're developing in (think of Gremlin as a domain-specific
language like SQL you use in conjunction with your primary language).

For example, Bulbs (<http://bulbflow.com>) is a Python library I wrote that
supports Rexster, Neo4jServer, and Titan. In Bulbs, you edit Gremlin scripts
in Groovy text files, and when you create a Python Graph object, Bulbs sources
your Groovy files and caches the Gremlin scripts in a library so they are
readily available for when you want to execute them on the server.

See <http://bulbflow.com/docs/api/bulbs/rexster/gremlin/>
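
The load-and-cache idea described above can be sketched in a few lines of Python. This is not Bulbs' actual code or API: the `ScriptLibrary` class, its methods, and the sample Groovy functions are hypothetical, and the parsing assumes each function starts with `def name(` at the beginning of a line.

```python
import re

# Matches the start of a top-level Groovy function such as "def friends(id) {".
_DEF_PATTERN = re.compile(r"^def\s+(\w+)\s*\(", re.MULTILINE)


class ScriptLibrary:
    """Caches Gremlin-Groovy function sources by name (hypothetical helper)."""

    def __init__(self):
        self._scripts = {}

    def load(self, groovy_source):
        """Split source text into functions and cache each one under its name."""
        matches = list(_DEF_PATTERN.finditer(groovy_source))
        for i, match in enumerate(matches):
            # A function's text runs until the next "def" or the end of the file.
            if i + 1 < len(matches):
                end = matches[i + 1].start()
            else:
                end = len(groovy_source)
            self._scripts[match.group(1)] = groovy_source[match.start():end].strip()

    def get(self, name):
        """Return the cached source for `name`, ready to send to the server."""
        return self._scripts[name]


# Sample Groovy text standing in for the contents of a scripts file.
SAMPLE = """def friends(id) {
  g.v(id).out('knows')
}

def count_all() {
  g.V.count()
}
"""

library = ScriptLibrary()
library.load(SAMPLE)
```

Keeping the scripts in Groovy files means they can be edited and tested as ordinary Gremlin, while the Python side only looks them up by name.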

~~~
vorg
> Gremlin-Clojure, and Gremlin-Scala are being developed/maintained by
> community members

Would be nice if other Groovy-based products like Gradle enabled a Scala
and/or Clojure frontend to appeal to those of us who are fussier about what
shell language we use.

------
richardjordan
Really happy to see this. Testing with Titan at the moment and very happy with
it so far.

------
dubcanada
The coolest thing about this is the getting started narration. I love Greek
mythology!

------
Goranek
Comparison with Neo4j?

~~~
espeed
Titan is distributed (but can be run in single-server mode). Neo4j is
master/slave.

~~~
btown
Also note that Neo4j requires a license to run in
high-availability/multi-server mode.

------
agilord
Any plans to support other storage backends? PostgreSQL and Riak come to
mind.

~~~
espeed
Anyone can implement new Titan storage adapters by implementing a few
interface classes.

Look at the Titan/BDB storage adapter for a simple example:

<https://github.com/thinkaurelius/titan/tree/master/titan-berkeleyje/src/main/java/com/thinkaurelius/titan/diskstorage/berkeleyje>

It implements the KeyValueStore interfaces (there are other interfaces for
different types of DBs, such as KeyColumnValueStore, etc.):

<https://github.com/thinkaurelius/titan/blob/master/titan-core/src/main/java/com/thinkaurelius/titan/diskstorage/keycolumnvalue/keyvalue/KeyValueStoreManager.java>

<https://github.com/thinkaurelius/titan/blob/master/titan-core/src/main/java/com/thinkaurelius/titan/diskstorage/keycolumnvalue/keyvalue/KeyValueStore.java>

<https://github.com/thinkaurelius/titan/blob/master/titan-core/src/main/java/com/thinkaurelius/titan/diskstorage/common/AbstractStoreTransaction.java>
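
For readers who want the shape of the idea before reading the Java sources, here is a rough Python sketch of an adapter contract with a trivial in-memory backend. The real interfaces are the Java classes linked above; the `KeyValueBackend` and `InMemoryBackend` names and their method signatures are assumptions made for illustration and do not mirror Titan's actual method names.

```python
from abc import ABC, abstractmethod


class KeyValueBackend(ABC):
    """Hypothetical minimal contract a key-value storage backend would satisfy."""

    @abstractmethod
    def get(self, key):
        """Return the value stored under `key`, or None if absent."""

    @abstractmethod
    def put(self, key, value):
        """Store `value` under `key`, replacing any existing value."""

    @abstractmethod
    def delete(self, key):
        """Remove `key` if present."""

    @abstractmethod
    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end in key order."""


class InMemoryBackend(KeyValueBackend):
    """Dict-backed implementation, useful only for trying out the contract."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

    def scan(self, start, end):
        # Sorting on every scan is fine for a toy; a real store keeps keys ordered.
        for key in sorted(self._data):
            if start <= key < end:
                yield key, self._data[key]
```

An ordered range scan is included in the sketch because a graph store typically needs to read contiguous runs of keys; whether a given backend can offer that efficiently is the main thing to check before writing an adapter.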

If you are interested in implementing a Titan storage adapter for a new
backend datastore and you have questions, you can discuss it in the Aurelius
Graphs group:

<https://groups.google.com/forum/?fromgroups#!forum/aureliusgraphs>

