

Native multi-model can compete with pure document and graph databases - fceller
https://www.arangodb.com/2015/06/multi-model-benchmark/

======
dexterchief
I've been using ArangoDB for a year now and I think they are definitely on to
something cool.

Having stumbled upon some really complex data a few times now, I am
increasingly appreciating how amazing it is to model your data any way you
need, without having to deal with the complexity of running multiple data
stores.

Cool to see that I apparently didn't give up any performance to get the
flexibility. :)

I'd love to see them push the geospatial capabilities a little further, but
they are already pretty decent.

------
amirouche
This does not explain why ArangoDB is faster. Stating that "native multi-
model" is a killer feature would be more interesting if it were explained what
it means beyond "ArangoDB = graph + k/v + document store". What is the
difference between a graph vertex without edges and a document at the storage
level? ArangoDB is faster than WiredTiger? Suspicious.

~~~
fceller
Hi amirouche, I'm Frank, CTO of ArangoDB. Claudius' benchmark looked at
queries occurring in a typical social network project. The tests show that
WiredTiger is indeed a bit faster for reads and writes. "Neighbors of
neighbors" is typically a question you would ask a graph database, not a
document store. Therefore, you would set up two databases and ask MongoDB the
document questions and Neo4j the graph questions. If you use a native
multi-model approach, you only need to set up and maintain one database. The
response times, for example for reads and shortest paths, are comparable to
the specialized solutions.

For the technical difference at the storage level: the graph and document
models are, in my opinion, a perfect match, because a vertex (and an edge, for
that matter) can be stored as an ordinary document. This allows you to use any
document query (give me all users who live in Denver) and start graph queries
from the vertices found this way (give me their 1st- and 2nd-level friends).
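As a toy illustration of that match (plain Python dicts standing in for ArangoDB's actual storage engine, with made-up data), a document filter can directly seed a graph traversal, because vertices and edges are just documents:

```python
# Toy sketch: vertices and edges stored as ordinary documents (dicts here).
# Data and keys are invented for illustration; this is not ArangoDB's engine.

users = [
    {"_key": "alice", "name": "Alice", "city": "Denver"},
    {"_key": "bob",   "name": "Bob",   "city": "Austin"},
    {"_key": "carol", "name": "Carol", "city": "Denver"},
]

# Edge documents connecting two vertex keys, like an edge collection.
friendships = [
    {"_from": "alice", "_to": "bob"},
    {"_from": "bob",   "_to": "carol"},
]

def neighbors(key):
    """1st-level friends of a vertex, following edges in both directions."""
    out = {e["_to"] for e in friendships if e["_from"] == key}
    inc = {e["_from"] for e in friendships if e["_to"] == key}
    return out | inc

# Document query: all users who live in Denver ...
denver = [u["_key"] for u in users if u["city"] == "Denver"]

# ... then graph queries starting from those vertices: 1st- and 2nd-level friends.
for key in denver:
    first = neighbors(key)
    second = set().union(*(neighbors(f) for f in first)) - first - {key}
    print(key, sorted(first), sorted(second))
```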

------
amelius
Why the "native" adjective? Aren't all databases native?

~~~
phpnode
Some people build "graph databases" on top of storage backends that are
ill-suited for such workloads. E.g. you can build a "graph database" (or K/V
store) on top of MySQL, but the performance is terrible:
[http://java.dzone.com/articles/mysql-vs-neo4j-large-scale](http://java.dzone.com/articles/mysql-vs-neo4j-large-scale)

A "native graph database" is one that is actually designed for the task.

------
codewithcheese
How does OrientDB compare to ArangoDB?

~~~
phpnode
OrientDB fails to deliver on its promises. It has a load of features but they
are poorly thought out and/or broken.

ArangoDB is OrientDB done right, but it's a lot younger.

If you're considering using either, you owe it to yourself to investigate
whether postgres's Common Table Expressions [0] can do what you want instead.
If you can stick with something more mature like postgres, then you'll be
saving yourself a lot of pain.

[0] [http://www.postgresql.org/docs/9.1/static/queries-with.html](http://www.postgresql.org/docs/9.1/static/queries-with.html)
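To make the CTE suggestion concrete, here's a small sketch using Python's built-in sqlite3 as a stand-in for Postgres (the `WITH RECURSIVE` syntax is nearly identical in both); the schema and data are invented for illustration:

```python
import sqlite3

# A friendship graph stored in a plain relational table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE knows (person TEXT, friend TEXT)")
db.executemany("INSERT INTO knows VALUES (?, ?)", [
    ("fred", "bob"), ("bob", "alice"), ("alice", "john"),
])

# Recursive CTE: everyone reachable from 'fred', with the hop count.
rows = db.execute("""
    WITH RECURSIVE reachable(person, depth) AS (
        SELECT friend, 1 FROM knows WHERE person = 'fred'
        UNION
        SELECT k.friend, r.depth + 1
        FROM knows k JOIN reachable r ON k.person = r.person
    )
    SELECT person, depth FROM reachable ORDER BY depth
""").fetchall()
print(rows)  # [('bob', 1), ('alice', 2), ('john', 3)]
```

For moderate graph sizes and bounded traversal depths, this kind of query keeps everything in one mature database, which is exactly the trade-off being suggested here.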

~~~
crudbug
"ArangoDB is OrientDB done right"

How are you backing this? I am sure Luca from OrientDB will have some
comments.

~~~
phpnode
It's my opinion based on working with OrientDB a _lot_ and ArangoDB a _bit_
for the last 12+ months. I used to be a big cheerleader for OrientDB, but now
I don't recommend it. I'm sure Luca will have some comments but he's
interested in selling his product.

~~~
lvca
Guys, IMHO Charles Pick (alias phpnode) doesn't deserve such attention. Even
if he's trying so hard to start a flame war against OrientDB, I'd rather
celebrate the Multi-Model approach. Long live the Multi-Model approach ;-)

~~~
dmarcelino
I've been using OrientDB for the last 7 months and as such I've been following
@lvca and @phpnode (OrientDB node.js driver creator/maintainer) work closely.
I have great respect and admiration for both and for what they've achieved!

@lvca, I find your comment distasteful, and it's not the first time I've seen
you trying to shut down people who criticise OrientDB
([https://news.ycombinator.com/item?id=9253488](https://news.ycombinator.com/item?id=9253488))
when you should really be thanking them for exposing the issues they face (a
lot of people don't go public). It would be more productive to engage
@phpnode, @anonwarnings and others, listen to them, raise issues in the bug
tracker, and address them. Accusing people of trolling or flaming doesn't
benefit OrientDB or its users at all...

Unfortunately I haven't tried ArangoDB so I can't make any comparisons with
OrientDB but I hope both succeed!

~~~
crudbug
I am evaluating graph DB solutions for a security analytics project; knowing
about internal project politics will surely help.

What is your opinion of the OrientDB community?

~~~
dmarcelino
Given that the posted article is about ArangoDB, I don't want to steal their
spotlight by turning this into an OrientDB discussion. Send me an e-mail with
your questions and I'll be happy to answer them.

------
maxdemarzi
Never trust a benchmark.

>>The uncompressed JSON data for the vertices need around 600 MB and the
uncompressed JSON data for the edges requires around 1.832 GB.

So why use a 60GB RAM machine for so little data?

Can we get some raw numbers instead of %?

~~~
nosideeffects
I don't understand the % either. They state it is a graph of _throughput_,
with higher percentages above the baseline meaning _less_ throughput? If I
hadn't read the backwards description I would've concluded that their DB is
really slow on most fronts.

~~~
ifcologne
Thanks for the hint; the description had been cut off. I've just updated the
chart. (Ingo from ArangoDB)

------
crudbug
Why was OrientDB left out? I would love to see the comparison.

~~~
dmarcelino
They've added a graph including OrientDB at
[https://www.arangodb.com/2015/06/performance-comparison-betw...](https://www.arangodb.com/2015/06/performance-comparison-between-arangodb-mongodb-neo4j-and-orientdb/)

------
jsteemann
Very interesting! Would like to see a comparison with relational datastores
(e.g. Postgres), too.

------
rmrfrmrf
Pardon my ignorance, but what is a graph database?

~~~
phpnode
It's a database that stores a "graph" of vertices that are connected by edges.
If you were say, creating the next LinkedIn and you wanted to find the
shortest path between two users based on their connections, a graph database
would be a good choice.

Let's imagine you want to see how Fred is connected to Steve, their network
looks like this:

    
    
        [Fred] <-knows-> [Bob]
        [Bob] <-isMarriedTo-> [Sally]
        [Bob] <-knows-> [Alice]
        [Alice] <-workedWith-> [John]
        [John] <-wentToSchoolWith-> [Sandra]
        [Sandra] <-knows-> [Steve]
    

Diagram: [http://yuml.me/6ff3074e](http://yuml.me/6ff3074e)

A "traditional" database like MySQL or Mongo makes this kind of query
prohibitively expensive and complicated, as it must perform a new join for
every connected person in the user's graph.

Graph databases come into their own because they are designed specifically for
efficient traversal of these connecting edges. They typically do this by
storing "pointers" on each vertex to its connected edges, so while a normal
RDBMS requires something like a hash table lookup to resolve a join, a graph
database can simply "jump" to the relevant record via a pointer. This means
that things like Dijkstra's algorithm [0] can be implemented efficiently.
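The pointer-chasing traversal described above boils down to walking an adjacency list. A minimal Python sketch over the Fred/Steve network (with breadth-first search standing in for Dijkstra, since all the edges here have equal weight):

```python
from collections import deque

# The example network as an adjacency list; edges are bidirectional,
# mirroring the "pointers to connected edges" a native graph store keeps.
network = {
    "Fred": ["Bob"],
    "Bob": ["Fred", "Sally", "Alice"],
    "Sally": ["Bob"],
    "Alice": ["Bob", "John"],
    "John": ["Alice", "Sandra"],
    "Sandra": ["John", "Steve"],
    "Steve": ["Sandra"],
}

def shortest_path(start, goal):
    """Breadth-first search; with unit-weight edges this is Dijkstra's
    algorithm without the priority queue."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in network[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path("Fred", "Steve"))
# ['Fred', 'Bob', 'Alice', 'John', 'Sandra', 'Steve']
```

A graph database does essentially this, but with each `network[...]` lookup replaced by following a stored pointer rather than an index or hash-table probe.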

However, "traditional" graph databases like Neo4j require everything to be
structured in terms of vertices and edges. This is often quite inconvenient,
so Multi Model databases like ArangoDB integrate this graph approach with a
document store as well, the idea being that if you can keep everything in the
same db your app gets a _lot_ simpler, you regain things like ACIDity that
you'd normally lose by using 2 separate dbs, and performance should be a lot
better too.

[0]
[http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm](http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm)

~~~
rmrfrmrf
Thank you for this detailed and informative explanation!

------
arthursilva
There should definitely be a bigger-dataset version; the test data is a very
small fraction of the available memory.

~~~
dexterchief
That would be cool. I'm not sure it would change much though. I know someone
working with search data who recently tried out Neo4j with a test data set of
500,000,000 nodes and apparently was really disappointed with the results.

I'm not sure that graph data (generally) is particularly amenable to being
spread across multiple nodes. My understanding is that ArangoDB has
implemented some clustering based on Google's Pregel framework, so I suspect
it might fare a bit better in my friend's test... but in spite of my urging I
don't know that he has had time to recreate the test with Arango. I'm keeping
my fingers crossed.

I don't know if any database is fun to deal with at that size. My experience
with Arango has been an unremarkable amount of remarkably complex data, so I
would also be interested to see the results with something huge.

~~~
jexp
I'd love to hear from your friend and his experience with Neo4j, to see how we
can make it easier / better to configure it correctly for the data volume.

