Every time I've bet against Postgres and used some other data storage mechanism, I always come back to Postgres.
Yes, ArangoDB is young (6 years) in comparison to PostgreSQL (30 years).
Yes, PostgreSQL is a fantastic database with an amazing open source community and is not going away any time soon.
Yes, PostgreSQL is a good choice for a project in which you need a relational single server database.
However, ArangoDB actually has a different value proposition, so in a sense, it is not a direct competitor.
ArangoDB is native multi-model, which means it is a document store (JSON), a graph database and a key/value store, all in one engine and with a uniform query language that supports all three data models and lets you mix them, even in a single query.
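To illustrate the mixing of models, here is a rough AQL sketch (collection and attribute names are made up for illustration) that filters documents and then does a graph traversal in one query:

```aql
// Hypothetical collections: documents in `users`, edge collection `follows`
FOR user IN users
  FILTER user.city == "Berlin"                  // document-style filter
  FOR friend IN 1..2 OUTBOUND user follows      // graph traversal, depth 1 to 2
    RETURN { user: user.name, friend: friend.name }
```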
Furthermore, ArangoDB is designed as a fault-tolerant distributed and scalable system.
Finally, we do our best to make this distributed system devops friendly with good tooling and k8s integration.
Last but not least, ArangoDB is backed by a company which offers professional support.
Therefore, any well-informed decision for a project needs to look at the value propositions and capabilities, and not only at age and maturity. Maturity is of course a big argument, since people are - rightfully so - conservative with their databases.
What is the consistency model and have you validated that it actually works as designed (for example with Jepsen)? I didn't find anything detailed on your website.
A huge proportion of modern software businesses can run on single-writer RDBMS instances, properly engineered, at a fraction of the operational and implementation cost of a scale-out solution. That applies to hosted and self-managed solutions equally, in my experience.
Modern distributed databases scale better, but they also offer better replication, high availability with automatic failover and no downtime, easier upgrades, easier backups, and generally less maintenance. Removing the single point of failure with efficient distribution, while being able to run easily on Docker/Kubernetes, makes a big difference over a single monolithic database server.
You might also end up with a LOT of servers unless you start sharing servers between different shards (but still with a different set of 3 servers for every shard).
Overall, I agree -- PG with JSONB is an extraordinarily powerful system, covering document-oriented and typical row-oriented use cases.
PG's ecosystem, with its foreign data wrappers and in-memory streaming solutions (like PipelineDB), must be the envy of any other db ecosystem. Two things I'd still wish for:
1. Unify JSON and SQL syntax, instead of user->>'email' make it user.email
2. Support sub tables, so I can decide if I want a sub table (schema) or JSONB (schema-less)
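For context on point 1, here is a sketch of the current JSONB access syntax versus the wished-for dot notation (table and column names are hypothetical):

```sql
-- Today: JSONB fields need the ->> operator and explicit casts
SELECT id, profile->>'email' AS email
FROM users
WHERE (profile->>'age')::int > 21;

-- Wished for: uniform dot syntax for columns and JSON fields alike
-- SELECT id, profile.email FROM users WHERE profile.age > 21;
```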
We have tens of thousands of connections and no problems. We also forked pgbouncer to use multiple cores, which allows us to properly utilize our servers.
It's just an external connection pooler when your native language driver doesn't have a good one
Not sure what solution you are imagining would be the right one.
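For reference, a minimal external pooler setup with PgBouncer looks something like this (database name, credentials and sizes are placeholders; `pool_mode`, `max_client_conn` and `default_pool_size` are real PgBouncer settings):

```ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 20
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
```

With transaction pooling, thousands of client connections multiplex over a few dozen actual server connections.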
Most distributed systems have eventual consistency. Very few achieve strong consistency. Postgres sacrifices a lot for strong consistency and predictable performance.
I'm not saying that PG has the best model, but for the vast majority of users, running out of connections is just not an issue. Most client libraries now ship with pooling, and it is very easy to set up. There are a few who need more connections, but there are poolers for that.
What you gain by separating connections is a much easier monitoring and debugging process. You can examine each connection, its impact on resources, its state, which locks it has acquired or is waiting on, ...
Additionally, because each connection opens a file descriptor, you gain a lot of operating security from the underlying OS and its kernel.
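As a sketch of the kind of per-connection introspection meant here (`pg_stat_activity` and `pg_locks` are real Postgres system views; the exact columns vary slightly by version):

```sql
-- One row per connection: who it is, what it's running, and its state
SELECT pid, usename, state, wait_event_type, query
FROM pg_stat_activity
WHERE state <> 'idle';

-- Locks held or awaited, joinable back to a connection via pid
SELECT pid, locktype, mode, granted
FROM pg_locks;
```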
PG contributors spent almost 3 decades building on this system. I'm pretty sure the gain would be so minuscule in comparison to the effort that would have to be put in to rebuild the connection model.
I for one would appreciate server-side pooling built in, and better HA including simple discoverability, handoffs, ...
Fundamentally just needs isolation to be done at a different layer.
You definitely need to think about pooling from the beginning of the design of your application.
Connection pools are a necessary workaround for a specific design decision in the system software. They can be helpful for other problems as well, but they aren't necessarily a requirement.
That to me is a compelling reason to use something other than Postgres.
One solution is to have a generic reference property paired with another column indicating which table it references.
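A minimal sketch of that shape (names are hypothetical) - one generic reference_id column plus a reference_table discriminator:

```sql
CREATE TABLE likes (
    id              serial  PRIMARY KEY,
    user_id         integer NOT NULL REFERENCES users(id),
    reference_id    integer NOT NULL,  -- points into posts, comments, ...
    reference_table text    NOT NULL   -- which table reference_id refers to
);
```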
Another solution you can look into, which I don't have much experience with myself, is table inheritance:
Another form of modeling the data in such a manner is called Exclusive Arc, which is that you just simply put all of the possible keys you might reference on the table, and add a foreign key on all of them. Then, when you need to make a reference, you just leave all but the one in use on that row as null values. If a "like" can go on a "post" or a "comment" or an "image", you would just have all 3 foreign key columns/constraints on the "like" table.
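Sketched out for the "like" example, with a CHECK constraint enforcing that exactly one of the keys is set (`num_nonnulls` is a real Postgres function, available since 9.6; table names are illustrative):

```sql
CREATE TABLE likes (
    id         serial  PRIMARY KEY,
    post_id    integer REFERENCES posts(id),
    comment_id integer REFERENCES comments(id),
    image_id   integer REFERENCES images(id),
    -- exactly one of the three references must be non-null
    CHECK (num_nonnulls(post_id, comment_id, image_id) = 1)
);
```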
And lastly, and probably best of all, is to simply use 1-2 tables for your entire data model for vertices/edges, and just treat Postgres as a graph database.
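The two-table graph layout might look like this (a hypothetical sketch, with a JSONB payload per node as described further down the thread):

```sql
CREATE TABLE nodes (
    id   serial PRIMARY KEY,
    kind text   NOT NULL,              -- 'user', 'post', 'comment', ...
    data jsonb  NOT NULL DEFAULT '{}'  -- node-specific payload
);

CREATE TABLE edges (
    src  integer NOT NULL REFERENCES nodes(id),
    dst  integer NOT NULL REFERENCES nodes(id),
    kind text    NOT NULL,             -- 'likes', 'authored', ...
    PRIMARY KEY (src, dst, kind)
);
```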
> You can still benefit from indexes for joins in the same way that you would with actual foreign key constraints, but the downside with this is that you can't actually apply a foreign key constraint.
Thanks, I had not thought JOINs possible on this - how would a query look? I'm guessing it works only for one type of reference_table.
> Another solution you can look into, which I don't have much experience with myself, is table inheritance: https://stackoverflow.com/questions/3074535/when-to-use-inhe....
Been there, tried it, banged my head on most of these (particularly the FK limitation): https://www.postgresql.org/docs/11/ddl-inherit.html :)
If it was fixed by Postgres it'd be awesome. I've lost the wiki page which was around on the topic, but it was unchanged for ~10 years, so really should've been marked 'wontfix'.
> Another form of modeling the data in such a manner is called Exclusive Arc,
> And lastly, and probably best of all, is to simply use 1-2 tables for your entire data model for vertices/edges, and just treat Postgres as a graph database.
That's what I'm doing, with the 'node' table using JSON to store the node-specific data. I feel sooooooo dirty but it works. Yet to figure out schema evolution of the JSON in a controlled manner :)
So this is for the structure with the two columns (reference_id plus the reference_table discriminator) - and yes, each query works on one reference_table type at a time, so you filter on it:

-- Get comments and their respective likes
SELECT * FROM comments
JOIN likes ON comments.id = likes.reference_id
WHERE likes.reference_table = 'comments';

-- Get posts and their respective likes
SELECT * FROM posts
JOIN likes ON posts.id = likes.reference_id
WHERE likes.reference_table = 'posts';

-- Get likes and their referenced comments
SELECT * FROM likes
JOIN comments ON comments.id = likes.reference_id
WHERE likes.reference_table = 'comments';
I was pleasantly surprised by ArangoDB. It was really user-friendly, and I liked how easy it was to set up compared to other multi-model databases. Definitely consider it for a hobby project.
Can anyone here speak about their experience using ArangoDb in a multi tenant SaaS product? How is it to manage your own cluster, backups, etc?
After I left, they came to really regret it. My team was working at scale, and basically found themselves doing QA work, troubleshooting with the Arango team. To their credit, the Arango crew was extremely responsive and helpful. Maybe they've fixed things up since then; it's been a year and a half.
At this point, I would hard pass on any database whose name doesn't start with PostgreSQL. Just got burned too hard.
Graph databases ostensibly let you write queries that would otherwise be unwieldy, but it turns out PostgreSQL's `recursive` keyword lets you achieve roughly the same things, sans having to learn a whole new query language.
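As a sketch of what that looks like, a recursive CTE traversal, assuming a hypothetical edges(src, dst) table:

```sql
-- All nodes reachable from node 1, with their traversal depth
WITH RECURSIVE reachable(id, depth) AS (
    SELECT 1, 0
  UNION
    SELECT e.dst, r.depth + 1
    FROM edges e
    JOIN reachable r ON e.src = r.id
    WHERE r.depth < 5   -- bound the depth so cycles can't recurse forever
)
SELECT * FROM reachable;
```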
I kept in touch with one of the directors, and after I left he mentioned a couple of things - they found that doing joins returning large amounts of data (maybe 10k records, IIRC) was prohibitively slow. They also found that under certain conditions, with a certain amount of data in the database, it would crash. He didn't ever describe the conditions.
They switched to couchbase, and have reported being happy with it.
I think there were various issues with cluster stability 1.5 years ago, and since then we have put great effort into making the database much more robust and faster. Many man-years have been dedicated to this since 2017.
1.5 years ago we were shipping release 3.1, which is out of service already. Since then, we have released
* ArangoDB 3.2: this release provided the RocksDB storage engine, which improves parallelism and memory management compared to our traditional mostly-memory storage engine
* ArangoDB 3.3: with a new deployment mode (active failover), plus ease-of-use and replication improvements (e.g. cross-datacenter replication)
* ArangoDB 3.4: latest release, for which we put great emphasis on performance improvements, namely for the RocksDB storage engine (which now also is the default engine in ArangoDB)
In all of the above releases we also worked on improving AQL query execution plans, in order to make queries faster in both single-server and cluster deployments. Working on the query optimizer and query execution plans is obviously a never-ending task, and not only did we achieve a lot here since 2017, we still have a lot of ideas for further improvements in this area. So there are more improvements to be expected in the following releases.
All that said, I think it is clear now that my intention is to show that things should have improved a lot compared to the situation 1.5 y ago, and that we will always be working hard to make ArangoDB a better product.
Or you could run GraphQL on top of PostgreSQL. Just because you can write anything in SQL doesn’t mean you should.
GraphQL is called that because it was created as a query language for Facebook's "social graph". It actually doesn't provide any graph operations or recursion (i.e. you explicitly tell it how many levels deep you want to go).
You can provide a GraphQL interface on top of any backend or database, though.
Even if there were an integrated web server, fully decoupled so it doesn't block the database, there would probably be no real benefit over running a separate web server on the side.
The built-in UI is really nice.
If you use documents and graphs, take a look.
The comments about Postgres (or database XX) vs ArangoDB are always the same; this is a choice depending on the use case. If your scenario is still vague, ArangoDB has good performance across a wide array of use cases.
On a personal note, the main downside is the lack of an ORM for Golang, for example; Node.js doesn't have one worth considering either.
Python seems to have arango-orm, which makes it simpler for small projects to integrate it.
Another area for improvement is the graph visualization; making it simpler to plug in another visualization solution would be nice.