

Titan Graph Database Integration with DynamoDB - dmahajan
http://www.allthingsdistributed.com/2015/08/titan-graphdb-integration-in-dynamodb.html

======
lobster_johnson
So is Titan considered alive and actively maintained now?

Back when DataStax acquired Aurelius, they announced [1] that they would stop
developing Titan, and for a while it looked like it was completely dead. It
seems DataStax are maintaining it somewhat, but there have been only 74
commits this year, all or almost all of them from DataStax, and there's not a
lot of meat on those commits. There are plenty of open issues in the GitHub
tracker.

For example, I noticed that Wikidata was considering Titan but dropped it [2]
after the Aurelius announcement, and ended up with a fairly obscure database
called BlazeGraph instead.

[1] [https://groups.google.com/forum/#!topic/aureliusgraphs/c07WE...](https://groups.google.com/forum/#!topic/aureliusgraphs/c07WEdH-epY)

[2] [https://lists.wikimedia.org/pipermail/wikidata-tech/2015-Mar...](https://lists.wikimedia.org/pipermail/wikidata-tech/2015-March/000740.html)

~~~
jerven
I think Wikidata made the right decision :) SPARQL really is lovely when
dealing with public databases over HTTP. But then, I am biased by the success
of sparql.uniprot.org :)

On the other hand, the use cases of Wikidata and UniProt are quite different
from those of most graph database deployments, which have only internal or
controlled access via APIs.

Still, UniProt is a graph with 3 billion nodes and 15 billion edges, so not
tiny but not humongous either. Wikidata is a bit smaller, if I recall correctly.

Titan seems to have a different use case, much more oriented toward graph
traversal than analytics on graph-modelled data. So I can understand that many
systems need something like Titan.

~~~
lobster_johnson
For my part, I'm evaluating graph databases for content storage, as a backing
store for webapps and an alternative to relational databases. The number of
nodes is less than a million, so very small datasets by most standards, and I
do need fast ad-hoc queries (what you call graph traversal, not analytics),
sharding, and transaction support.

~~~
rspeer
Other people may convince you otherwise, but I believe there are no mature
graph databases.

You would use them if you want to experience the bleeding edge of databases
and exploring uncharted territory excites you, and not if you want to get
webapps built.

~~~
emileifrem
I'm curious, what is lacking from Neo4j to qualify it as a mature graph
database?

We've worked on it for over a decade, and it's used in production by thousands
of community users, hundreds of customers and 75+ Global 2000 companies (see
[http://neo4j.com/customers](http://neo4j.com/customers)). For many of those,
Neo4j is used in business-critical use cases, i.e. they require Neo4j to be up
and running every minute of every day or it'll show up in their next earnings
call. If you've shopped online or in a US retail store this week, for example,
it's very likely that you've used Neo4j. There's rich support for pretty much
any programming language and framework out there, an ecosystem of consulting
partners whose sole business it is to do Neo4j implementations, 10+ books
written specifically about Neo4j, rich online training, formal enterprise
support backed by a global commercial organization, and an active community.
What's missing?

I'm not trying to be facetious -- I'm genuinely curious as to what you feel is
missing to consider it mature.

~~~
rspeer
The last time I tried Neo4j was in 2010 or 2011, when I was trying to build
ConceptNet 5
([http://conceptnet5.media.mit.edu](http://conceptnet5.media.mit.edu)) on it.

It had show-stopping security problems when bound to anything but 127.0.0.1,
so I came up with a software firewall to put around it and hoped for the best.
It promised Lucene search, but its implementation was wide open to Lucene
query injection unless I escaped every special character I could think of,
like a freaking PHP programmer. There was no way to get data in faster than a
slow trickle, unless that data was somehow already in another Neo4j database.
Doing any interesting graph operations led to interesting messages about
running out of "PermGen". And before I could even get all the data in, it had
consumed enough resources to blow my academic AWS budget for months.
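The escaping workaround amounts to backslash-escaping every character that
Lucene's classic query parser treats as syntax before splicing user input into
a query string. A minimal Python sketch, assuming the character set handled by
Lucene's `QueryParser.escape()`; `escape_lucene` is a hypothetical helper for
illustration, not part of Neo4j's API:

```python
# Characters treated as query syntax by Lucene's classic query parser
# (assumed to match the set its QueryParser.escape() handles).
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(term: str) -> str:
    """Hypothetical helper: backslash-escape every Lucene syntax character."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in term)


print(escape_lucene("C++ (language)"))  # prints: C\+\+ \(language\)
```

Note that escaping characters alone does not neutralize the bare keywords AND,
OR and NOT, which the classic parser still reads as operators, so a client may
need to handle those separately.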

I was on the mailing list looking for support, and found it pretty lacking.
The best I ever got was a bunch of Java code to try (my code is in Python).

I use SQLite now. It doesn't do very much, but it does what it's supposed to,
and that's great.

If Neo4j has improved significantly since then, forgive me that I'm not
rushing back to try it again.

~~~
emileifrem
That sucks. :( Sorry about that. Neo4j isn't perfect today, and it certainly
wasn't perfect 4-5 years ago. We're working hard on it, though!

And thanks for being specific (amazed that you remember specific issues from
five years ago!). I don't remember the 127.0.0.1 security problems, but I
don't hear anything about them now, so my guess is they've been addressed. We
have a lot of finance and government customers that have high requirements on
security. As for your Lucene issues, we did a complete overhaul of our search
and indexing story in Neo4j 2.0 (released late 2013). We've continuously
improved import performance (which has traditionally been a weak spot), and
Neo4j 2.2 includes a batch importer that sustains >1M records/sec at scale
(10s of billions of records) on commodity hardware. As for the memory
management issues, like many other data products written in Java, we
struggled with GC for a long time, and we ultimately concluded that we had to
move a lot of the critical parts off heap and manage the memory ourselves,
which significantly improved memory utilization.

I understand that you got stung historically and therefore hesitate to check
us out again. And if SQLite is working well for you, there's no need to! But
Neo4j and the graph space have matured a LOT since 2010, and fortunately I
don't think your "bleeding edge" experience from 4-5 years ago will be
replicated anymore for someone coming new into the space.

Thanks for the feedback.

~~~
jerven
4-5 years was a very long time ago in graph DB time :) Neo4j and its
competitors have changed a lot!!

While Neo4j has its proponents, its lack of standards support means that, as a
data provider, we find it hard to support.

~~~
okram
Check out [http://tinkerpop.com](http://tinkerpop.com). Apache TinkerPop 3.0.0
was released in June 2015 and it is a quantum leap forward. Not only is it now
part of the Apache Software Foundation, but the Gremlin3 query language has
advanced significantly since Gremlin2. The language is much cleaner, provides
declarative graph pattern matching constructs, and supports both OLTP graph
databases (e.g. Titan, Neo4j, OrientDB) and OLAP graph processors (e.g. Spark,
Giraph). With almost every graph vendor providing TinkerPop connectivity, this
should make it easier for developers, as they don't have to learn a new query
language for each graph system, and they are less likely to experience vendor
lock-in, as their code (as with JDBC/SQL) can just move to another underlying
graph system.

~~~
jerven
It's more about data interchange support, e.g. we could support GraphML
instead of or alongside the RDF varieties, but it would be difficult for us to
generate in a streaming way.

Then, for our end users, we would need to hack in a namespace convention to
avoid issues when integrating our data.

And TinkerPop lacks an equivalent of the SERVICE keyword for federated
querying in SPARQL 1.1, which is essential for our end users who do knowledge
discovery (i.e. small biology labs without the in-house capability of running
their own large databases).

------
vosper
I'm considering different graph DBs for a problem at work. I'd _love_ to hear
anyone's experience with Titan in production (or any other graph DB, for that
matter!)

If you're in SF or Oakland and I could buy you lunch or a beer and talk about
graph DBs with you for an hour, please find my email in my profile and drop
me a line :)

~~~
jazzido
I tried _really_ hard to use Titan for my project, but the tooling and the
query language are, in my view, just atrocious. Setting up a running instance,
trying to import my moderately complex property graph data into it, and
navigating TinkerPop's "funny" documentation was a real pain.

Then I decided to switch to Neo4j, and I was up and running in literally an
afternoon. Also, its Java API is very well designed, which allowed me to fork
and extend an integration plugin with Elasticsearch in a couple of days
([https://github.com/jazzido/neo4j-elasticsearch/](https://github.com/jazzido/neo4j-elasticsearch/)).

~~~
optimuspaul
I've used a bunch of graph databases and I find they all fall short in one way
or another. Almost none of them do anything that a good relational model can't
do (except maybe doing it faster for some queries). The biggest thing that
leads me to believe that the graph database arena is not mature is that
TinkerPop is becoming the de facto standard. Talk about a joke. It is one of
the silliest frameworks ever designed. Most of its promise is just promise.
Apache taking it over will probably lead to its death.
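To illustrate the claim that a relational model can cover many graph queries:
a multi-hop traversal such as "every node reachable from X" can be written as
a recursive common table expression. A minimal sketch using Python's built-in
sqlite3 module, assuming a hypothetical `edges(src, dst)` table; the schema and
the `reachable` helper are illustrative and not taken from any product
discussed in this thread:

```python
import sqlite3


def reachable(conn: sqlite3.Connection, start: str) -> list[str]:
    """Hypothetical helper: every node reachable from `start` via directed edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node) AS (
            SELECT dst FROM edges WHERE src = ?
            UNION  -- UNION (not UNION ALL) skips rows already seen, so cycles terminate
            SELECT e.dst FROM edges AS e JOIN walk AS w ON e.src = w.node
        )
        SELECT node FROM walk ORDER BY node
        """,
        (start,),
    ).fetchall()
    return [node for (node,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL)")
# Index the source column so each hop is an index lookup rather than a table scan.
conn.execute("CREATE INDEX edges_by_src ON edges (src)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")],
)
print(reachable(conn, "a"))  # ['a', 'b', 'c'] (the cycle leads back to 'a')
```

Whether this is fast enough depends on fan-out and depth; the sketch only shows
that such a traversal is expressible in plain SQL.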

Anyway, I'm going to give Titan a second look with DynamoDB. I'm using
OrientDB and Neo4j right now. Both have major scaling issues; we haven't
reached their limits yet but expect to very soon. I am wary of Titan because
of TinkerPop.

~~~
jazzido
Can you elaborate on what scaling issues you've run into with Neo? I'm using
it for my thesis project so I don't have very strict scalability requirements,
but would love to know.

------
skyebook
Interesting news, though I'm curious how many people were using just Cassandra
without the additional text search and geospatial functionality provided by
Elasticsearch. It would make sense if Amazon were looking into a plugin for
CloudSearch as well.

------
wingsonfire
Nice! When AWS starts offering Elasticsearch, it would be an awesome package
to have (Titan + DynamoDB + Elasticsearch).

~~~
yeukhon
There is DynamoDB integration with ES ([https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-dy...](https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-dynamodb-elasticsearch-integration/)),
and there is CloudSearch.

~~~
wingsonfire
Thanks for pointing that out. AWS is constantly pushing things.

