Performance Benchmark 2018 – MongoDB, PostgreSQL, OrientDB, Neo4j and ArangoDB (arangodb.com)
155 points by pluma on Feb 14, 2018 | 114 comments

I've stopped reading database benchmarks because they are extremely vague. Instead I spend my time optimizing my current solution/stack. For example, PostgreSQL has hundreds of knobs that you can adjust for almost every scenario you can imagine. Sometimes you have a special query and increase work_mem just for that session. In other cases you adjust the cost settings for another query/session. You can analyze your indexes and index types. And sometimes you need to rewrite parts of a big query.
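For instance, raising work_mem for a single session looks like this (the table and query are hypothetical; only the SET/RESET mechanics are the point):

```sql
-- Raise work_mem for this session only, run the one expensive query,
-- then drop back to the server-wide default.
SET work_mem = '256MB';

SELECT customer_id, sum(total) AS revenue
FROM orders                      -- hypothetical table
GROUP BY customer_id
ORDER BY revenue DESC;           -- the big sort/hash benefits from work_mem

RESET work_mem;
```

Inside a transaction, SET LOCAL work_mem = '256MB'; scopes the change even more tightly, reverting automatically at commit or rollback.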

Learning all this takes time, you are much better off learning more about your chosen technology stack than switching to another technology stack.

Though in a few rare cases, you need a different technology to solve your business problem. In most cases these complement your existing solution, like Elasticsearch/Solr for full-text search or ClickHouse for OLAP workloads.

Agreed. Switching to another system is expensive and the benefit is pretty questionable.

Unless you hit a very specific use-case/bottleneck, which I only ever witnessed once.

expand, please?

I imagine something very specific like having a lot of inserts into a table and that being your main use case. Depending on your data some databases may be better than others and that should be easy to measure.

In most real-world cases, however, the requirements are not very clear and often conflicting, so it's much harder to get data that shows the performance of one system over the other.

> Depending on your data some databases may be better than others and that should be easy to measure.

And the performance difference could be an accidental feature of the design and completely unintentional.

Postgres, for instance, has a native data engine, so it can store the exact row-id for a row in an index, but this means that every update to the row needs all indexes to be updated.

MySQL has many storage engines (InnoDB and MyISAM to start with), so the row-id is somewhat opaque: the index stores the primary key, which can be pushed down to the storage engine's scans, which then look up the row-id internally. An index only needs to be touched for the columns you explicitly modify, or if the primary key is updated (which is a usual no-no due to UNIQUE lookup costs).

When you have a single wide table with a huge number of indexes, where you update a lot of dimensions frequently, the performance difference between these two solutions is architectural.

And if you look up along an index with few updates but long-running open transactions, that is also materially different: one lookup versus two.

Though how it came about isn't really intentional.

We just published an Update to the Benchmark, please find it here: https://news.ycombinator.com/item?id=16473117

Haha, I almost spit out my coffee when I saw how poorly we performed here! Reading the code, it's kind of incredible that Neo4j even competes - all the databases benchmarked are running with 25 concurrent database connections:


Except Neo, which is given a single connection to work with :(


When I modify it locally to just set {maxConnectionPoolSize: 25}, like how the other databases are set up, I get roughly an order of magnitude better performance. neighbors2 went from 0.0226ms to 0.0068ms, for instance. Single write sync went from 3.36415ms to 0.77549ms avg/op.
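For context, that option is passed when constructing the driver. This is a sketch of the fix, not the benchmark's actual code; the URL and credentials are placeholders:

```javascript
// neo4j-driver 1.x (the major version current at the time of this thread)
const neo4j = require('neo4j-driver').v1;

const driver = neo4j.driver(
  'bolt://localhost:7687',                 // placeholder URL
  neo4j.auth.basic('neo4j', 'password'),   // placeholder credentials
  { maxConnectionPoolSize: 25 }            // match the other databases' 25 connections
);
```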

Benchmarking is hard!

Hey Jake... thanks again for your feedback... we just published an update to the Benchmark following your feedback for Neo4j, please find it here: https://news.ycombinator.com/item?id=16473117

True, benchmarking is really not trivial. Thanks for pointing us to the bug in our benchmark suite and sorry for our mistake. We fixed it https://github.com/weinberger/nosql-tests/pull/24 and reran the benchmark. We will publish an update soon and also fix the stats in the original post. Indeed, Neo4j with connection pooling improved stats in a few test cases but the overall ranking is not really affected.

Thanks again for helping us to improve the benchmark and creating fair conditions for all products tested!

Jan from ArangoDB here:

Did you read the appendix?

"We used a TCP/IP connection pool of up to 25 connections, whenever the driver permitted this. All drivers seem to support this connection pooling, except Neo4j. We sent instead twenty-five requests via NodeJS to Neo4j."

Just to avoid unfair conditions now: if connection pooling is possible with the neo4j driver, then you would have to configure the benchmark scripts to not send 25 requests via Node as well.

We tricked ourselves with such stuff when preparing the benchmark.

Yep, I removed the CONCURRENCY constant from the neo4j description.js, so the numbers above are for Neo running without that async.eachLimit call, just like the rest of the databases.

A single session object is backed by a single connection, so even though you "send" 25 requests via async.eachLimit, the result is that they are serialized over the single connection.

Did you read the manual where it says the driver permits connection pooling?


Hi Max.. thanks also for your feedback. We worked it into an update to the benchmark, please find it here: https://news.ycombinator.com/item?id=16473117

For future reference, I find the back ticks to decrease readability, at least on iOS.

Agree, I keep forgetting that those don't turn into monospace on HN. Changing that.

ArangoDB always makes for exciting benchmark posts.

I could see myself there in a bowler hat with a fistful of racing chits screaming “go, Postgres, go.”

I’d love to see a competition where the developers of each database got to use the same hardware and data, then tune the hell out of their configs, queries, and indices.

Red Bull could sponsor it. I’d buy a T-shirt.

That doesn't sound that hard to start. Something like RealWorld[1] and the Web Framework Benchmarks[2] combined but for DB workloads. Have one dataset that includes data amenable to OLAP and OLTP, but have separate tests each consisting of OLAP queries, OLTP queries, and combined queries. Choose a low-end, mid-range and high-end set of AWS or GCE instances/configs to normalize against. Let people submit pull requests with new technologies or configs.

You'd want to get some funding to run the tests (or maybe solicit Google or Amazon to see if you could get the instance time donated once a month or something).

If you started small, with maybe a portion of these features, and then scaled up over time, you might actually get to the point where you had tests that emulated a power failure, or master/slave and dual master scenarios and how they handle certain common network errors (split-brain). That would be an amazing resource.

Edit: It occurs to me I probably should have read more of the article, since this is sort of what they are doing already...

1: https://github.com/gothinkster/realworld

2: https://www.techempower.com/benchmarks/

Yeah after I posted it I started thinking about what it would take and what that would actually look like... and how you’d cheat :)

It would probably require a few different categories, with some sort of output assertion to validate that the query performed correctly, and a means of tracking CPU usage, RAM usage, and execution time.

It would be cool to see things like disaster recovery and chaos proofing as well.

Nice idea, I would take part in something like that too.

By the way, there seems to be a raffle of free T-shirts (quite biased, as there's only the ArangoDB logo on them) in exchange for taking part in a survey:


That sounds ridiculously good.

Like esports, for DBAs

We could call it DB sports.

On second thought that might be a bad name

My feedback:

- 1.6M documents. This is a tiny dataset. I bet it's much smaller than RAM size (122GB), which means all data is in memory. Interesting benchmark, but not what I'd typically expect for a database (data size >> RAM).

- All databases are allowed either to use all memory or are capped at 10GB. PostgreSQL is capped at 128MB, the default shared_buffers value on PostgreSQL 10! This means PostgreSQL is running off of disk, whereas the other databases have all data in memory.

- Just to insist on the previous bullet: I see the point of using the default config (even though I don't consider it interesting, compared to a minimum of tuning for every db), but this puts PG in a very unfair position (it's limited to 1/1000th of RAM!). Just set shared_buffers to 32GB (or 10GB, to cap all dbs at 10GB!) and re-run the tests. I bet PG results will improve significantly.

- Other basic tuning parameters (like work_mem or min_wal_size or random_page_cost) may also significantly affect performance. This could be left as an exercise for another benchmark with properly tuned databases. If interested, here's a recent presentation I did with many other PostgreSQL tuning recommendations: https://speakerdeck.com/ongres/postgresql-configuration-for-...
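The shared_buffers change suggested above is a one-liner plus a server restart (32GB is the commenter's suggested value for this machine, not a universal recommendation):

```sql
-- Writes the setting to postgresql.auto.conf; shared_buffers only takes
-- effect after the server is restarted.
ALTER SYSTEM SET shared_buffers = '32GB';
```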

Other than this, congratulations for the work on ArangoDB. Building a database is a really brave, hard work.

Hi @ahachete and thanks for your feedback to our benchmark 2018. We worked it into an update, please find it here: https://news.ycombinator.com/item?id=16473117

I hate these types of posts. They do nothing to help the graph database space. How good are they at bidirectional traversals, what about deep traversals in complex networks with more than 50 relationship types? What about x? Every vendor is going to be better at one thing than another, there is always a trade off, but it is irrelevant.

Last week we took a very deep and very wide sales hierarchy query from 10 seconds in Oracle to 4ms in Neo4j. Whether Arango could do it in 3ms, or Orient in 5ms, doesn't really matter. The point is that graph databases are much better at some queries than relational ones. They should be blogging about use cases and customer successes, not benchmarks.

>They should be blogging about use cases and customer successes, not benchmarks.

They do, just recently the cases of AskBlue (Location aware recommendations) and Thomson Reuters (Fast & Secure Single-View of Everything)


I am very interested in graph use cases and appreciate Neo4j's support for projects like the Panama or Paradise Papers, which show how connected graphs help to uncover formerly hidden relationships. Graphs are very powerful and a great addition to relational or document approaches.

(full disclosure: I worked for ArangoDB 2 years ago.) And, I've used Neo successfully for a fraud detection PoC recently...

I don't think these results are comparable. Query runtimes are highly dependent on how much memory the database can use. Postgres, for example, isn't configured, so it uses less memory than the other systems. Hence its performance is likely inferior and could be improved significantly by increasing the buffer size.

Hi, Mark from ArangoDB here. We used the default configuration for every database in the benchmark, under the assumption that the defaults are chosen reasonably. If you have any specific suggestions on how to configure Postgres for this specific environment, please let us know.

I think the first step would be to use the automatic configuration generation tool [1]. You give a few inputs (like the amount of server memory and the number of connections), and get the config.

[1] https://www.pgconfig.org/#/tuning

Thanks a lot for the suggestion. We have used http://pgtune.leopard.in.ua and I have appended the resulting config.

The result is that the default config is already very good for our benchmark. There is no visible difference between the old and new config when running the benchmark. We will publish an update to the blog post and show the numbers using the tuned config.

best Frank

DB version: 10 (Linux), Type: "Mixed type of applications", 122GB RAM, 25 connections, SSD storage


max_connections = 25
shared_buffers = 31232MB
effective_cache_size = 93696MB
work_mem = 639631kB
maintenance_work_mem = 2GB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1

That is a terrible approach. No database platform is tuned by default.

You may have to run a utility, or edit a config file, but a new system must be configured to fully utilize available hardware, and for the expected workload.

A benchmark run on default config is at best misleading - it smells like marketing spam. You actually dampen interest in a forum like this with such an approach.

Disclaimer: I'm part of the ArangoDB team. As written in the post the whole benchmark is open source. The idea is that you can run it on your own. Also, pull requests are welcome. If you think it's marketing spam, take the chance and improve the configuration. We will publish an update of the post.

One simple suggestion for one platform: https://github.com/weinberger/nosql-tests/issues/22

The more common pgtune CLI is not up to date for PostgreSQL 10+ at this time.

Also there are some old, open issues that indicate the benchmarks have problems:

- https://github.com/weinberger/nosql-tests/issues/16 - https://github.com/weinberger/nosql-tests/issues/13

We just published an Update to the Benchmark including PGtune, please find it here: https://news.ycombinator.com/item?id=16473117

> No database platform is tuned by default.

Not really true any more. Many of the newer databases (e.g. Mongo) and database-like systems (e.g. Elasticsearch) try to give a good default experience.

Since I don't know about current versions of MongoDB, and am genuinely interested: how does Mongo know how much memory to use, for example?

Typically with PostgreSQL, you have to expand the defaults when running on modern server class hardware, so the engine uses enough RAM. Simply letting it use all RAM available can be problematic, since the DB engine and OS have to fight for cache, etc.

Java based systems like Elastic might still need JVM tuning, especially with a distributed platform like ES, which might be run in a large number of small machines, a smaller number of large machines, or containers.

You are exactly right. You must adjust the java heap size on Elasticsearch.

Postgres is configured extremely conservatively by default. While it isn't reasonable to spend time tweaking each database, at least running pgtune would be reasonable for Postgres.

I think it's definitely reasonable to tune important production databases.

But for many people databases have become commodity applications that they install and then forget. So even in reality not all installments will be properly tuned.

Who does this?!

I know it happens because I inherit them often. But seriously knock it off.

Errrrr. Better yet don’t, job stability.

Nothing to see here. Move along.

Which is why many apps are slow... People search for a new toy, rather than learn what they already have.

> But for many people databases have become commodity applications that they install and then forget.

It seems unlikely to me that this group of people would reach benchmark posts.

People using them that way probably don't care about performance very much, though.

Lies, damn lies, and benchmarks. Using default configuration invalidates the result at best, at worst it's knowingly misleading.

Why? Because nobody who cares about performance runs their database with default configuration. Plus some databases, like PostgreSQL are well known to have very conservative settings. So benchmarking with the defaults like that is knowingly trying to make your product look better than it is. You didn't know? Then what are you doing benchmarking databases?!?

Seriously, this is shady as hell.

But which default database configuration has a reasonable tuning to run a benchmark?

The table of times notes:

> The shortest path query was not tested for MongoDB or PostgreSQL since those queries would have had to be implemented completely on the client side for those database systems.

I can't speak for Mongo, but I imagine the shortest path query could be implemented as a recursive CTE or a sproc in Postgres, which would execute server-side. It might be a bear to write, and it might be slow, but it should be possible.
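A minimal sketch of that idea, assuming a hypothetical edge table edges(_from, _to) roughly matching the benchmark's import format (the table name, column names, and node ids are all assumptions):

```sql
-- Breadth-first search from node 1 to node 99 via a recursive CTE.
-- The depth cap bounds the recursion on cyclic graphs.
WITH RECURSIVE frontier (node, depth) AS (
    SELECT e._to, 1
    FROM edges e
    WHERE e._from = 1
  UNION
    SELECT e._to, f.depth + 1
    FROM frontier f
    JOIN edges e ON e._from = f.node
    WHERE f.depth < 10
)
SELECT min(depth) AS shortest_path_length
FROM frontier
WHERE node = 99;
```

This only returns the path length; carrying the actual path (e.g. as an ARRAY of visited nodes, which also enables a proper cycle check) is a standard extension of the technique, but it makes the CTE noticeably heavier.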

I recently used RCTEs in a production environment to operate on reasonably large graphs. We eventually moved to an in-memory solution built on top of ETS for various reasons, but performance was not a deciding factor (at least not as much as you would think); Postgres handled our queries well beyond expectation, even with heavy factoring into functions. It's definitely not slow.

The largest burden was on development and maintenance. Iterating on algorithms and queries powered by RCTEs (and functions) against a changing schema is not scalable (in effort), at least with the traditional migration-scripts approach to upgrades and rollbacks. If you do want to go this route, you need to change how you ship those functions and queries.

We theorized that shipping them as part of the application and performing the necessary migrations behind the scenes would have drastically reduced the maintenance burden. We also considered composing queries completely dynamically and fully leveraging our language (Elixir) to provide a toolbag of functions that aren't functions as far as Postgres is concerned. In either case, the hassle of versioning functions and queries would no longer get in the way of iteration.

If you know how to write the shortest path query for Postgres, team ArangoDB would be happy to include it in the next update.

Not exactly what you are asking for, but this sits on top of PostgreSQL and uses non-standard indexing to improve graph queries.


Big missing piece of this was clustered / HA configurations. To run a production level application, you’re going to be running with some sort of clustering or replication, which is going to add a pretty significant latency depending on implementation. But, interesting article, I’m always impressed some of these smaller database companies are still kicking.

Don’t ever trust any vendor benchmarks: https://maxdemarzi.com/2015/10/16/benchmarks-and-supercharge...

In your 2015 article, you criticized that the ArangoDB team restarted the instances after each test run. In this 2018 edition, they don't do this anymore.

You want to trust a vendor that used to restart the db between queries for their benchmarks?

Yes, because they learned. Arango has honest people that are open to criticism.

But the larger issue -- that they have direct financial incentive to unconsciously bias the results -- still exists. This isn't anything against Arango's team, I'm sure they're lovely people, but even physics experimental results have been shown to be biased when the experimenter knew the testing hypothesis.

"I think I've been in the top 5% of my age cohort all my life in understanding the power of incentives, and all my life I've underestimated it. Never a year passes that I don't get some surprise that pushes my limit a little farther." -- Charlie Munger

Performance tests - especially those of databases - are a very complex and resource-consuming venture. And because each use case is different, the benchmark published somewhere does not fit your specific problem and the available environment/budget.

Unfortunately, there is no independent organization that takes this on and defines scenarios to be tested in different environments.

We just published an Update to the Benchmark, please find it here: https://news.ycombinator.com/item?id=16473117

Very good points. Seems like this leaves us with "trust benchmarks that probably aren't generalizable to your use-case" or "run your own expensive benchmarks before you have the scale to have the data to simulate the scale".

What's the best strategy here? I've defaulted to MySQL, pg, or sqlite, based on my preference of the moment. I haven't had enough good/bad outcomes to form a stronger opinion, but I'm now realizing I've never really thought this decision through (despite building a few data systems that I'm pretty proud of).

Or Feynman: "The first principle is that you must not fool yourself and you are the easiest person to fool."

True, but so does everybody else. You shouldn't trust vendor benchmarks, but that doesn't mean you can't read them and use them as sort of a heuristic to help you come to your own conclusions. It might not be particularly scientific, but it's probably more than accurate enough for 99% of people.

True, but that is because every major database is good enough for 99% of people even when run on the cheapest linode available ;).

I trust but verify. When dealing with Arango I'm not openly hostile to their claims. I've made several libraries for it even though I don't use it. Contrast this with Mongo: their "web scale" carpet-bombing campaign makes me not use them at all. I immediately dismiss them. That is how I found Arango during my booklet-writing days.

Has any vendor ever produced a benchmark, where their solution wasn't the clear winner?

Arango isn't the clear winner in all of these. Unconfigured PG did pretty well. PG with JSON and no indexes did well too. It also held its own using less memory. Finally, PG has transactions, whereas Arango doesn't really if you have an index constraint on a document attribute. This shows Arango is better than Neo4j and Mongo for graph-related services and document storage. It skipped PG's recursive capability, which probably allows for basic graph traversal to a known depth.

Surely! They just didn't publish it. :)

you didn't even open the link. ArangoDB is not the clear winner in these results

ArangoDB is an incredible database with some very shy marketing imo - edge indexing over json documents is too good of a sweet spot.

Not sure if this is still the case; however, among the headaches I remember race conditions in the Node driver (RocksDB engine) and maybe less straightforward clustering (compared to Couchbase/Cassandra/etc., at least).

The race conditions have been eliminated and we greatly simplified the cluster setup. With this "new" tool shipped with arangodb https://github.com/arangodb-helper/arangodb it is just a single line on each machine to get the cluster up and running.

Props to you guys, outstanding product really.

I didn't know about this project. Their homepage made me want to try it out. Is AQL their own invention? Reading their [examples] made me feel that this is a very welcome update to SQL. Congrats.

[examples] https://www.arangodb.com/why-arangodb/sql-aql-comparison/

It's curious that single purpose Graph DBs like Neo4j perform poorly at some typical graph queries. Or am I getting something wrong?

Hey, Michael from ArangoDB here. I was also a bit surprised, especially for the graph case. We see that more memory is consumed, but the throughput did not really increase in comparison to the benchmark we conducted 3 years ago. Nevertheless, we see improvements on the shortest path queries, which challenge the database more. As we are no dedicated Neo4j experts, we are interested in every configuration that will speed things up; please share them with us.

In what case does it make sense to benchmark these databases but leave MySQL/MariaDB/Percona out?

Disclaimer: I'm part of the ArangoDB team. As written in the post, the whole benchmark is open source, and you are welcome to add other DBs. That is appreciated. It is just important that there is an official Node.js driver and that a GA version is used.

Would be nice to have domain/product experts do some tweaking and host this at a third party, a la "The Benchmarks Game"[1]. A nice little graph-oriented benchmark would be great, especially avoiding things like:


[ed: and:

"For comparison, we used three leading single-model database systems:... and PostgreSQL for relational database."

JSONB with indexes doesn't rate as a "document db"; how about PostGIS for a domain-specific graph db?]

Some agreement on hw (virtual or otherwise), dataset and queries might make this actually rather useful.

[1] https://benchmarksgame.alioth.debian.org/

I think it's also a question of the effort involved in testing even more stores.

I was about to post the same thing, seems like a very obvious omission.

From the article, PostgreSQL still looks like the best option in general.

Unless you need easy clustering..

In that case, you can go for CockroachDB. ;)

You can use the same drivers for PostgreSQL to talk to CockroachDB. As specified here: https://www.cockroachlabs.com/docs/stable/frequently-asked-q...

"CockroachDB supports the PostgreSQL wire protocol, so you can use any available PostgreSQL client drivers."

Any particular reason why https://github.com/dgraph-io/dgraph missed the benchmark?

-- It seems to be highly capable (horizontal scaling/ACID) and "performant". Reference vs Neo4j: https://blog.dgraph.io/post/benchmark-neo4j/

It doesn't seem like dgraph has an official Node driver, I think?

It is a lot of effort to build all the setup scripts, benchmark code etc.

Interesting, thanks for doing this, and sharing it so openly.

Do you maybe have the data for your graphs in CSV or JSON? I want to plot my own graphs with it.

Personally I would call it a JavaScript Performance Benchmark for MongoDB, PostgreSQL, OrientDB, Neo4j and ArangoDB, which is mentioned at the end of the article under the software section. To be truly a database performance benchmark, you would need to interface with the C/native library.

Can anybody recommend other database benchmarks?

I think the Orkut dataset is already in a tab-separated format; to turn it into CSV you only need to add the column labels.

http://snap.stanford.edu/data/com-Orkut.html For an example you can have a look at the import scripts: https://github.com/weinberger/nosql-tests/blob/master/arango...
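A minimal sketch of that conversion (the input filename is the SNAP download's; the column labels are illustrative):

```shell
# Add CSV column labels, drop SNAP's leading '#' comment lines,
# and turn tabs into commas.
( echo "_from,_to"
  grep -v '^#' com-orkut.ungraph.txt | tr '\t' ',' ) > orkut.csv
```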

Sorry, I should have been more clear. I want the raw result data that was used to draw the graphs.

I wanted to use high-charts to make it interactive, eg hide dbs to compare more easily.

I don't want to run my own tests, but I did have a look at the github repo and dataset.

Here you go: https://docs.google.com/spreadsheets/d/1_unaj2x_NCVNUHcxwMUV... Let me know if you have any question or need something else

Compare to a similar post from a couple years ago:


I don't know if it's only me, but the graphs are very hard to understand. I'm viewing on a large desktop monitor and I can't imagine what it would be like on a mobile phone!

Is it fair to compare the JSONB format for PostgreSQL in this benchmark, as it is not the main highlight of PostgreSQL but more like an added feature on top of SQL?

Users asked for a JSONB comparison after the previous benchmark, so we included it. The tabular format is also there. Thus, I don't see how it would be unfair.

In-house performance stats are the least trustworthy thing on the planet. Are there any independent studies?

No, there aren't even standard benchmarks that cover all types of databases to compare performance etc. The ArangoDB benchmark is open source however. Feel free to test it yourself and tune as many parameters as you want. You can also add scripts for other database systems.

A benchmark made by a database vendor that shows that the database vendor performs well, if not the best, overall. Whether it is actually true or not, the result being published is entirely unsurprising.

Disclaimer: I'm part of the ArangoDB team. As written in the post, the whole benchmark is open source. The idea is that you can run it on your own. Take the chance and get your own impression.

Wow, that looks like a lot of work. Nice effort.

Why is Neo4j in this comparison at all? It's a graph database and its use-case is very specifically different than relational or document databases.

Please compare apples to apples.

ArangoDB is also a graph database. Many of the benchmarks (shortest path, neighbors, etc.) are specifically geared towards graph databases. Neo4j is totally appropriate in this comparison.

Which DB operations included in the test are not something you might also have to perform with Neo4j?

Much needed! Thank You

TLDR, just use postgres

This. I used to agonize over which database to use on a project. Now I just use postgres.

And if you want much nicer development tools and a database server that has real stored procedures, just use SQL Server.

Desperately trying to move from sql server to PostgreSQL.

Yeah I know, it’s hard because PG is missing a lot of features that I depend on. Like a profiler. LOL

That’s not the conclusion of this at all...

In most benchmarks Postgres with tabular data got the best results. If you need to store JSON or do graph queries and don't want to create indices yourself, then consider ArangoDB.

And even then, should you be storing JSON? Is putting GraphQL into your datastore the right thing?

That depends. If you know that your data contains a highly irregular, non-exhaustive list of extra attributes that have no relations to other parts of the data model, a JSON column might be the right choice. It's rarely the best choice, but I can totally see applications where it is.

I use JSON for data that needs structure but doesn't need to be relational. That data is then returned with the rest of the row in a SELECT. I will use JSONB for the same reason, when I also need to query that data once in a while.

JSONB is also very useful for HTTP JSON APIs because you can have the database construct the JSON response and the API server just returns it. I found it to be faster than constructing the JSON response in the API server most of the time. Of course, it depends on the use case whether you can do that.
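A hedged sketch of that pattern (the users/orders tables are hypothetical; jsonb_build_object, jsonb_agg, and FILTER are standard PostgreSQL features):

```sql
-- Build the full API payload for one user inside the database,
-- so the API server only forwards the result.
SELECT jsonb_build_object(
  'id',     u.id,
  'name',   u.name,
  'orders', COALESCE(
      jsonb_agg(jsonb_build_object('total', o.total))
        FILTER (WHERE o.user_id IS NOT NULL),   -- avoid [null] for users with no orders
      '[]'::jsonb)
) AS response
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.id = 42
GROUP BY u.id, u.name;
```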

There's no valid conclusion if they don't use the strengths of the tested products. They didn't even use indices, for example...

Hahahaha this makes me use PostgreSQL even more :D
