But as there is no independent institution that has compared our products, and as we wanted to know where we stand with ArangoDB, Claudius did his own tests. And as the work is already done, why not share it?
We tried our best to make it as open as possible. PostgreSQL performed very well, and we have a problem with memory consumption; have a look at the charts, we will try to improve there.
- Every database configuration is public
- All test scripts are available on Github
- We publish updates if we get pull requests or comments with suggestions for improvements
We have done that before: after the last test, some database vendors sent us improved snapshots of their databases, which found their way into the latest products (OrientDB and Neo4j).
If you have suggestions for improvements, please let us know.
Despite the fact that you crippled it by not using jsonb columns.
You can also see that the very first link outside of the official documentation explains the benefits of jsonb and how to use indexes.
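For context, a minimal sketch of what that advice amounts to (table and column names here are made up for illustration): store the documents as jsonb rather than json, and add a GIN index so containment queries can use it.

```sql
-- Hypothetical table: store documents as jsonb, not json
CREATE TABLE profiles (
    id   serial PRIMARY KEY,
    doc  jsonb NOT NULL
);

-- A GIN index lets containment queries (@>) use the index
CREATE INDEX profiles_doc_idx ON profiles USING GIN (doc);

-- This can now be answered via the index instead of a sequential scan
SELECT id FROM profiles WHERE doc @> '{"country": "DE"}';
```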
If you're going to include a database, at the very least do 20 minutes of research on it. If you can't spare that, then just don't include it.
That's like saying "I've just installed MongoDB and read the intro documentation page, I'm now going to benchmark it against a MySQL cluster, which I have years of experience with, and that I helped develop." (E.g.: they developed ArangoDB, so they should be experts in at least that, right?)
This illuminates the need for "experts" in the different components of the stack. It's intellectually dishonest to claim that a technology is not up to the task when it hasn't been properly treated in the first place.
Postgres' official documentation on json and jsonb is rather concise: http://www.postgresql.org/docs/9.4/static/datatype-json.html
The difference between the types is described in paragraph two. So, they either benchmarked Postgres while having no idea whatsoever what they were doing, or they were deliberately crippling the competition.
Neither option is confidence inspiring.
(And the first option seems sketchy, seeing how they then went and re-created the whole benchmark as a classical RDBMS setup for the second Postgres test.)
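For anyone skimming: the difference described in that paragraph is easy to reproduce in psql. json stores the input text verbatim and re-parses it on every access, while jsonb parses once into a binary form, normalizing whitespace, key order, and duplicate keys:

```sql
-- json keeps the text exactly as entered, duplicate keys and all
SELECT '{"a": 1, "a": 2}'::json;
-- {"a": 1, "a": 2}

-- jsonb parses to a binary representation; the last duplicate key wins
SELECT '{"a": 1, "a": 2}'::jsonb;
-- {"a": 2}
```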
Biggest problem of (small) German tech companies, IMO.
I worked for one myself, and they had really good core technology, but that was it. The Anglo-Saxon companies always outplayed them, because they're just so much better at PR. Even their developers are better at this than most of the marketers who burn money on a daily basis in Germany...
Not that I can verify it, because the code in the linked public "No magic, no tricks – check the code and make your own tests!" repository doesn't match the published results and doesn't even work at all with postgres…
EDIT: Okay, they pushed a new version containing the Postgres data now. They ARE using the cripplingly slow json columns, not the jsonb columns recommended by the documentation.
If anything, it just proves that even after almost a decade of these "NoSQL" solutions being around, they still can't compete even on basic queries with Postgres, which is a fairly conservative SQL solution.
In other words: I wouldn't call a screw driver a bad tool, just because it's not as good at driving nails into wood as a hammer.
And with json columns you have to parse the text on every access, which is a lot slower in the read-mostly tests.
I wanted to move on to RethinkDB next, but I see your point that a comparison between the different JSON formats of Postgres can also be very enlightening. This should replace guessing with hard facts. As always, I will update the blog post and add these tests as well - as we did in the past, see https://www.arangodb.com/nosql-performance-blog-series/.
If you have any improvements concerning the configuration of Postgres or the SQL queries, I will be more than happy to include them in the update. I will push the used configuration to GitHub as well.
For instance, we didn't use the index that makes the database go fast to make our own database look good.
These are also synthetic graph database benchmarks, designed to look like real data, and they are quite hard to do well on.
As someone responsible for a public free to use deployment of a graph database with more than 2 billion nodes and 15 billion edges (sparql.uniprot.org) I must say this looks like a SPARQL benchmark from 10 years ago.
I still think PostgreSQL and MariaDB are better tools for most jobs considered big data.
jsonb is superior when:
1. You want to use any of the built-in JSON functions, e.g. for extracting fields from the document.
2. You want to index the JSON (either the entire thing via GIN, or individual fields via ordinary B-tree indexes).
3. You want to save space; jsonb strips whitespace.
jsonb incurs an overhead on full-document reads and writes, since it must serialize to/from textual JSON at those boundaries.
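To make point 2 concrete (table and field names are illustrative, not from the benchmark): besides a GIN index over the whole document, you can put an ordinary B-tree expression index on a single extracted field, which then supports equality and range lookups on that field.

```sql
-- Assuming a jsonb column "doc" on a hypothetical table "events":
-- a B-tree expression index on one field of the document
-- (note the extra parentheses required around the expression)
CREATE INDEX events_user_idx ON events ((doc->>'user_id'));

-- Equality lookups on that field can now use the B-tree index
SELECT * FROM events WHERE doc->>'user_id' = '42';
```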
I tried ArangoDB about a year ago, I think I still have the branch that I tried it on. After spending a weekend porting some stuff from MongoDB to Arango, I ended up regretting doing that by Sunday evening. It'd be nice to fire things up, update the branch's code and see how it performs.
We are currently looking into it. Thanks for the mirrored page.
But of course, you need to test and decide on basis of your individual requirements and use cases.
I've done a bunch of related benchmarking, and the smallest real-world dataset I've used is the largest one on SNAP: orkut.