
NoSQL Benchmark Compares PostgreSQL, MongoDB, Neo4j, OrientDB and ArangoDB - sachalep
https://www.arangodb.com/2015/10/benchmark-postgresql-mongodb-arangodb/
======
cwyers
I really, really distrust these kinds of evaluations when they come from
someone whose product is included in the comparison. Even if everything is
above-board, they're not going to publish if it shows their product just
completely sucks at it. That kind of publication bias makes these kinds of
results a lot less trustworthy than independent benchmarks even if you assume
the best of intentions from the people putting them out there.

~~~
ifcologne
Ingo from ArangoDB here. I agree that vendor tests are always biased; of
course you want to show that your product is competitive.

But as there is no independent institution that compared our product and as we
want to know where we stand with ArangoDB, Claudius did his own tests. And as
the work is already done, why not share it?

We tried our best to do it as openly as possible. PostgreSQL performed very well
and we have a problem with memory consumption - have a look at the charts, we
will try to improve there.

\- Every database configuration is public

\- All test scripts are available on GitHub

\- We publish updates if we get pull-requests or comments with suggestions for
improvements

We did that before: after the last test, some database vendors sent us
improved snapshots of their databases, which found their way into the latest
products (OrientDB and Neo4j).

If you have suggestions for improvements, please let us know.

~~~
creshal
> PostgreSQL performed very well

Despite the fact that you crippled it by not using jsonb columns.

~~~
BlackFingolfin
Why so hostile? Personally, I assume this is just an instance of them not
being PostgreSQL experts. So, before jumping to conclusions (namely that they
deliberately skewed their test, despite being very open in the way they
described the setup), I'll instead wait and see -- perhaps they'll explain why
they used json instead of jsonb, or perhaps (better!) they can update the test
to also include jsonb.

~~~
SuperKlaus
If they're not Postgres experts then they shouldn't be including Postgres in
the line-up. Or they should ask someone who knows better.

~~~
X4lldux
That's why they welcome pull-requests.

------
don71
I'm Claudius, author of the tests. I've been asked to include a lot of
different databases into the test runs. The most requested databases were
Postgres/JSON and RethinkDB. I started with Postgres. The Postgres manual
states that JSONB might be faster, but some StackOverflow answers indicate
that it takes more space than JSON, while JSON might be slightly more
compatible with legacy code. I've shown the queries and setup to some local
Postgres users. They did not point out that JSONB will be much faster for the
kinds of requests used in the test setup. For instance, we do not use special
indexes apart from the primary one by choice.

I wanted to move on to RethinkDB next, but I see your point that a comparison
between the different JSON formats of Postgres can also be very enlightening.
This should replace guessing with hard facts. As always, I will update the blog
post and add these tests as well, as we did in the past; see
[https://www.arangodb.com/nosql-performance-blog-series/](https://www.arangodb.com/nosql-performance-blog-series/).

If you have any improvements concerning the configuration of Postgres or SQL
queries, I will be more than happy to include them as well in the update. I
will push the used configuration to GitHub as well.

~~~
chucky_z
Please refer to the #postgresql channel on irc.freenode.net for any postgres
inquiries, you will receive an answer from experts and core developers on the
correct processes within minutes for almost any question. It is a very active
channel full of knowledgeable folks.

------
jerven
I am just going to say: have a try with the LDBC social benchmark
[http://ldbcouncil.org/](http://ldbcouncil.org/) and
[http://ldbcouncil.org/benchmarks](http://ldbcouncil.org/benchmarks), where
you can even have audited results.

These are also graph database benchmarks that are synthetic, designed to look
like real data and are quite hard to do well on.

As someone responsible for a public, free-to-use deployment of a graph database
with more than 2 billion nodes and 15 billion edges (sparql.uniprot.org) I
must say this looks like a SPARQL benchmark from 10 years ago.

------
kbenson
I wonder why there's no equivalent of the Frameworks Benchmark[1] for
databases. It seems we could all really benefit from that. Ideally it would
get to a place where they would be able to simulate real-world worst case
scenarios and test for problems. Each database would likely want multiple
entries with different configs, but if you have some engineered failure
scenarios and tests in the results it becomes obvious what the trade-off is.
Sure, a specific setting may reduce consistency in the event of a failure for
speed, but sometimes that's what you might want, and if the failure cases
clearly show the problem, at least you aren't going in blind.

1:
[https://www.techempower.com/benchmarks/](https://www.techempower.com/benchmarks/)

~~~
crudbug
Having benchmarks for different storage models
(Relational/Document/Graph/Object/XML) would be a better solution.

------
n72
Clicking the link got me "Error establishing a database connection." :/

------
baldfat
I was kind of shocked how well PostgreSQL did.

I still think PostgreSQL and MariaDB are better tools for most jobs
considered big data.

~~~
JamesMcMinn
Postgres was actually somewhat crippled in these tests since they used json
rather than jsonb for storage, which stores the json in a binary format which
doesn't need to be serialised on reads.

~~~
lobster_johnson
That's not quite correct. The jsonb type requires that reads convert jsonb back
into textual JSON, whereas the json type can be sent directly to the client with
no processing.

jsonb is superior when:

1\. You want to use any of the built-in JSON functions, e.g. for extracting
fields from the document.

2\. You want to index the JSON (either the entire thing via GIN, or individual
fields via ordinary B-tree indexes).

3\. You want to save space; jsonb strips whitespace.

jsonb incurs an overhead on both reads and writes since it must serialize
to/from textual JSON.
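
The trade-off described above can be sketched with a toy Python analogy. This is not PostgreSQL's implementation: all function names here are hypothetical, and a Python dict stands in for jsonb's binary format. The only assumption carried over is the documented behaviour that json keeps an exact copy of the input text while jsonb stores a parsed form.

```python
import json

RAW = '{ "name":   "alice",  "age": 30 }'

def store_as_json(text):
    """Like the json type: validate the input, then keep the text verbatim."""
    json.loads(text)  # raises ValueError if the text is not valid JSON
    return text

def store_as_jsonb(text):
    """Like jsonb: parse once at write time and keep the parsed form."""
    return json.loads(text)

def read_json(stored):
    """Whole-document read: the stored text goes out untouched."""
    return stored

def read_jsonb(stored):
    """Whole-document read: the parsed form must be turned back into text."""
    return json.dumps(stored)

def get_field_json(stored, key):
    """Field access on text storage: reparse on every call."""
    return json.loads(stored)[key]

def get_field_jsonb(stored, key):
    """Field access on parsed storage: no reparsing needed."""
    return stored[key]
```

In this sketch, whole-document reads are cheapest for the text form, while field extraction is cheapest for the parsed form, which is the distinction the list above draws. Note also that the parsed form loses the original whitespace.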

------
gegtik
Looking around, it seems that different graph engines pull ahead depending on
the use case.

[http://www.slideshare.net/sympapadopoulos/adbis2014-presenta...](http://www.slideshare.net/sympapadopoulos/adbis2014-presentation)

------
howdoipython
>Error establishing a database connection

------
nevi-me
Like others mention here, I'm skeptical of these types of comparisons. If I
compare myself to my competitors, I won't publish results if they're better
than me.

I tried ArangoDB about a year ago; I think I still have the branch that I
tried it on. After spending a weekend porting some stuff from MongoDB to
Arango, I ended up regretting doing that by Sunday evening. It'd be nice to
fire things up, update the branch's code and see how it performs.

------
hardwaresofton
No RethinkDB?

------
acjohnson55
Comparison of X1, X2, ... , Xn, Y, written by Y

=> suspicion

------
exo762
Hugged to death.

[https://archive.is/cMWCQ](https://archive.is/cMWCQ)

~~~
ifcologne
No, running on XXXXX Cloud. :(

We are currently looking into it. Thanks for the mirrored page.

------
jbverschoor
and now a 10-node cluster

------
Mindstormy
Would love to see the results for CouchDB in comparison to these.

------
crudbug
Would like to see Titan with a Cassandra backend here.

~~~
lobster_johnson
Out of interest, which version of Titan are you on? I see that 1.0 was
released recently, with little apparent fanfare.

------
ilaksh
Why not include redis or rethinkdb?

~~~
amirouche
redis and rethinkdb are not ACID across documents. So it's not the same
use case _at all_.

~~~
merlincorey
Are you implying that MongoDB and friends _are_ ACID across documents?

------
covi
The graph dataset is too small in size. It makes little sense for real-world
usage.

~~~
ifcologne
Ingo from ArangoDB: Even so, it's the whole dataset of a real-world use case.
:)

[https://snap.stanford.edu/data/soc-pokec.html](https://snap.stanford.edu/data/soc-pokec.html)

But of course, you need to test and decide on the basis of your individual
requirements and use cases.

~~~
covi
Ingo - SNAP has a bunch of other "real-world use case" graphs available for
free, many of which are larger than this 1M-node, 30M-edge toy.

I've done a bunch of related benchmarking, and the _smallest_ real-world
dataset I've used is the _largest_ one on SNAP: orkut.

------
curiousjorge
I have looked at ArangoDB and really hope it takes off; it has some pretty
nifty features. I think at this point just the lack of integration with
frameworks like Meteor.js is holding me back.

