I was using Raven around versions 1-3. Even though it was a single node running a simple app, we observed stale reads and lost writes, so we eventually had to migrate to SQL Server. It was really weird reading claims from Oren (an expert in the .NET space, famous for his great work on NHibernate and a few other frameworks) when his DB didn't work as advertised at all (back then it was built on the Esent key/value store, with full-text search used for map/reduce; I think it was Lucene.NET, obviously very broken tech for this purpose). Too bad; I'd really hoped things were fixed by now, since the DB has really good programming APIs.
Interesting trivia: there's "RavenDB done right" - https://martendb.io/ - just an API wrapper around PostgreSQL. Named Marten because that's a natural enemy of ravens :)
We abandoned Raven because the server tended to go into some "repair" state for hours, eating 16+ GB of memory. Multiple times in production. It was easier to migrate to Postgres than to try to fix RavenDB.
As database builders and users, we’ve made talking about systems a lot harder on ourselves by conflating the ideas of replication, active-active, atomic commitment, and concurrency control.
- Replication is a technique used to achieve higher availability and durability than a single node can offer, by making multiple copies of the data. Techniques include Paxos, Raft, chain replication, quorum protocols, etc.
- Active-active means that transactions can run against multiple different replicas at the same time, while still achieving the desired level of isolation and consistency.
- Atomic commitment is a technique used in sharded/partitioned databases (which themselves exist to scale throughput or size beyond the capabilities of a single machine) to allow transactions to be atomically (“all or nothing”) committed across multiple shards (and allow one or more shards to vote “nah, let’s not commit this”). 2 phase commit (2PC) is the classic technique.
- Concurrency control is a set of techniques to implement isolation, which is needed in any database that allows concurrent sessions (single node or multi-node). Classic techniques include 2PL and OCC, but many exist.
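To make the atomic commitment bullet concrete, here is a minimal sketch of classic two-phase commit (2PC). All names are illustrative, not any real database's API; real 2PC also has to log votes durably and handle coordinator failure, which this toy omits.

```python
# Toy sketch of two-phase commit (2PC): a coordinator asks every shard
# to prepare (vote), and only commits if all shards vote yes.

class Shard:
    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit  # a shard may vote "nah"
        self.state = "idle"

    def prepare(self):
        # Phase 1: durably stage the writes, then vote yes/no.
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(shards):
    # Phase 1: collect every participant's vote.
    votes = [s.prepare() for s in shards]
    if all(votes):
        # Phase 2: unanimous yes, so every shard commits.
        for s in shards:
            s.commit()
        return "committed"
    # Any "no" vote aborts the transaction on all shards.
    for s in shards:
        s.abort()
    return "aborted"
```

Note that 2PC by itself only gives atomicity across shards; isolation still needs concurrency control layered on top, which is exactly the conflation the parent comment warns about.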
When vendors or projects answer concurrency control questions with replication answers (which appears to be the case here), it’s worth diving deeper into those answers. There are cases where “Paxos” or “Raft” might be answers to atomic commitment or even concurrency control questions, but at best they are very partial answers and building blocks of a larger protocol. Databases that only support “single shot”/predeclared transactions can get away without a lot of concurrency control, for example, and might be able to do the required work as part of their state machine replication protocol. In general, I'd see using words like "Paxos" and "Raft" in the marketing for a database as a negative sign. It's not a fully reliable one, but it's often the least interesting part of the implementation and the choices the database is making.
To be extra clear, I’m not criticizing Aphyr here (the article clearly doesn’t conflate these concepts), but more pointing out what I think lies at the bottom of a lot of the issues we see with distributed database claims.
I think Aphyr helped a lot by creating this useful resource: https://jepsen.io/consistency which presents a clear classification of consistency models. I am not sure if talking about anything else in the context of distributed databases is reasonable.
That is one way (and a good one) of classifying consistency models and their relationship to isolation levels. But it's an incomplete one (e.g. there are linearizable variants of snapshot isolation and repeatable read that are not captured there). I'm a big fan of that stuff (and Aphyr's work in general), but that page is the beginning of a conversation and not the end of one.
You'll find many of those additional variants (e.g. strong session SI) in the linked papers, and they're cited extensively in the Jepsen reports as well. I just haven't had time to write up every single model--tried to stick to the major ones. :-)
> I am not sure if talking about anything else in the context of distributed databases is reasonable.
There's a whole world in distributed databases, and I suspect you'd agree that there's a lot of stuff worth talking about that isn't covered in your (excellent) work.
I love Jepsen, but it seriously worries me how bad software turns out to be, and how many outrageous claims companies make that turn out to be so easily proved false. Should there be more serious penalties when companies make claims which turn out to be false as soon as they are tested? I think there should be.
Aviation isn't perfect; nothing implemented by large groups of fallible humans with budget constraints will be. But it has one of the best safety track records of any industry.
Now, why won't aviation style engineering be applied in other fields, like databases? Well, because no one really cares enough. No one dies if some random database used by some ad platform loses the occasional transaction. Yeah, it's frustrating to engineers who are trying to build reliable systems, but in the grand scheme of things losing a few percent of transactions isn't the end of the world for most businesses.
You get safety cultures like that in aviation because there are real, substantial risks, so you need to have thorough engineering discipline, properly designed redundancy, etc.
For databases that are used for the majority of the business world, efficiency is generally a bigger concern than correctness; they'd rather have cheap and fast databases that lose a few transactions occasionally than something that actually provides consistency. But of course everyone thinks they need consistency, so it's advertised as a selling point while not actually being provided in practice.
> For databases that are used for the majority of the business world, efficiency is generally a bigger concern than correctness; they'd rather have cheap and fast databases that lose a few transactions occasionally than something that actually provides consistency
'majority' is a strong claim. Any figures to back it up?
No, no figures to back it up. This is just based on anecdotal experience, having seen very few businesses that actually prioritize consistency or invest in testing or validating it. The fact that Kyle keeps on finding these massive problems in popular databases is part of that anecdotal evidence.
Obviously people want correctness; they would like their databases to not randomly lose data. Hence the fact that it's a highly advertised feature.
But when it actually comes down to selecting a database, convenience and performance seem to be what people actually compare on; there are very few places that actually hire someone like Kyle to dig in and verify the consistency claims about a database.
I'm positive people want their databases to be correct. There are databases which promise great speed in return for occasionally losing bits of your data, and they get very little use outside of special uses.
Exactly; people would like their databases to be correct and consistent, but generally don't care enough to actually do something like hiring someone like Kyle to verify it before buying a database. They just find something with the right features and performance, and go ahead and use it. You see a lot of them in this thread; people who used RavenDB for a while, and then had to migrate away because of issues. If the business actually cared enough about consistency, there would have been some kind of validation and verification before selecting a database.
I don't think any company explicitly claims their implementation handles X failure scenarios, certainly not in the EULA or license. You could possibly pick apart their documentation, but I think the warranty clause in most licenses would cover the product. This may not apply to regulated products dealing with health/safety/defense, and IANAL.
Either way, best to assume most companies are lying through their teeth about any feature until you or someone you trust has validated it.
When it comes to databases, that's when I get the most conservative in tech choices. Stick with the tried and tested approaches. Data/Metadata integrity is generally the single most important thing for whatever I'm working on.
I love when Jepsen's reports hit HN. I always learn a ton about databases from them. Kudos to the projects brave enough to put their claims to the literal test. Jepsen is the best in the biz.
They claim to run that test on any changes but in fact might not be testing anything at all according to this footnote in the Jepsen report:
> RavenDB’s Jepsen test may not have measured anything at all: at least in the most recent revision, the generator included no client operations of any kind.
Remember folks, if you can't get your test to fail by intentionally breaking the implementation, you don't have a test.
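That maxim can be demonstrated mechanically: before trusting a test, inject a known fault and confirm the checker actually catches it. A minimal "test the test" sketch (all names hypothetical, nothing to do with Jepsen's actual implementation):

```python
# A toy checker for lost writes, exercised against both a correct store
# and a deliberately broken one. If the broken store passes, the test
# measures nothing.

class Store:
    def __init__(self, lose_writes=False):
        self.data = {}
        self.lose_writes = lose_writes

    def write(self, key, value):
        if not self.lose_writes:  # the broken store silently drops writes
            self.data[key] = value

    def read(self, key):
        return self.data.get(key)

def check_no_lost_writes(store, n=100):
    # Generate real client operations. An empty generator (as in the
    # footnote above) would make any check pass vacuously.
    ops = [(f"k{i}", i) for i in range(n)]
    assert ops, "empty operation generator: this test measures nothing"
    for key, value in ops:
        store.write(key, value)
    # Every acknowledged write should be readable afterwards.
    return all(store.read(key) == value for key, value in ops)
```

Running the checker against `Store(lose_writes=True)` and seeing it return `False` is the cheap sanity check the footnote suggests was missing.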
My fun RavenDB story: I briefly used it for an analytics (music royalty data, not advertising) solution somewhere late in the 1.x release series. Ayende (the initial RavenDB author) was/is an avid blogger, and made a really good case for their product in the .NET ecosystem.
It did not... go exactly as planned. Initial tests looked OK, but when I did testing with actual users, there were huge issues right away. Like: OK, I just ran your ingestion pipeline. What do you see? And the answer was 'well, nothing', or 'ehhm, a lot less than I expected'. These issues turned out to be pretty much impossible to fix: there were no real errors, but the data just seemed to... disappear randomly, even in a simple single-node cluster. I got community support involved in a bunch of particular issues, but nothing really helped: the aggregate numbers we got never added up to what they should be.
I then migrated the whole thing to a single SQLite database. That file is, as I write this, a good 2TB in size, and still performs as well as the day it was deployed and never had any unexplained-number issues, without any changes to the surrounding code. I did eventually move away from the .NET Entity Framework (as that did cause some rare, yet unexpected and hard-to-fix concurrency issues, but those were hard crashes and not silent data corruptions) to a hand-rolled entity mapper, but all has been good since then...
TL;DR: databases are very hard, and fashionable choices are not necessarily desirable.
I need a brain colonic after reading though just some of the mess of overhyped claims in RavenDB marketing and documentation. I appreciate Aphyr doing all this wonderful work and how some of the Raven claims triggered that work. I'd have hoped that anybody building a critical system would have read the mess of Raven documentation/claims/hype and run the other way.
> AP systems are known for availability, not safety;
I think in 99.9% of cases, you don't want AP. The P only matters when the network is more prone to go down than the machines. For example, if every node goes down, your AP design won't be available.
With the massive improvements in networks, connectivity, and redundancy, you should aim for CP.
If you really, really need AP, then a ground-up design based on CRDTs seems the best, most disciplined approach. With CRDTs, you can have availability because the operations can be entirely local, and you know you can sync to the other nodes when they're available, without conflict.
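For a feel of why CRDTs give conflict-free sync, here is a sketch of the grow-only counter (G-Counter), one of the simplest CRDTs. The class and method names are just illustrative:

```python
# G-Counter sketch: each node increments only its own slot locally;
# merge takes the element-wise max, which is commutative, associative,
# and idempotent, so replicas converge regardless of sync order.

class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> per-node count

    def increment(self, n=1):
        # Purely local operation: always available, no coordination.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other):
        # Element-wise max over all known nodes' slots.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())
```

Two replicas can increment independently while partitioned, then merge in either order and agree on the total, which is the AP behavior the comment describes.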
Completely agree. In-region (e.g. at the scale of a US state), CP seems like a clear winner. For more geo-distributed latency becomes challenging, and in environments like IoT and mobile unreliable connectivity becomes challenging. If you're there, you need a principled way of approaching AP (e.g. CRDTs or CALM https://arxiv.org/pdf/1901.01930.pdf).
It’s disappointing to me that the technologist desire to experiment with new DBs continually puts naive customers at correctness and durability risk they don’t (won’t) understand.
Even an unpaid report like this involves weeks of work: reading docs, designing tests, executing and refining them, filing issues, writing the report, and editing it. Each report costs thousands of dollars in hardware and editing. I'd love to do an FDB test! It's just that between contracting, volunteer work, and research, I can only do so much.
What are your most tempting/daunting databases that you haven't got a chance to put through their paces?
And, a second question, if you step back and think about the various APIs you've had to use, have you personally developed favourite styles of API to use?
I'd love to do more work with predicates in general. That's an open research problem I've been noodling on for years. Pretty much any SQL DB would be a good candidate for that work!
I'm gonna be a weirdo and say I actually loved Fauna's FQL. A little Lisp-ish functional language for queries is a great way to interact with document-structured data. SQL is fantastic for sheer breadth, though its specification is a nightmare and actually writing portable SQL is real challenging. One of those places where a stronger spec and conformance tests would have really helped.
Thanks for the shoutout. At some point if you find yourself with some spare time you can check out our new FQL version. It's closer to JS in terms of syntax now, but still a small, relatively functional language.
Your reports have reached such notoriety, I'm surprised you don't already have a dozen people working for you full-time on benchmarking tech for wealthy customers.
If you'll allow me: how many of the people you work with can actually produce these Jepsen reports? Or is it only you?
It's not a huge market. I usually have a queue of clients, but both deal flow and actual scheduling are wildly variable--sometimes I'll go without income for six months or more. I've considered hiring one or two people, but I couldn't offer the kind of stability people need from an employer. I do subcontract though! Editing, legal, finance, occasional code, that sort of thing.
There's a lot of folks out there who can do basic testing work with Jepsen. I've taught... I dunno, maybe a few hundred people directly in Jepsen workshops. A couple people have worked alongside me, and I'm sure lots more have learned from the docs online. Writing a report is a more involved problem--certainly not intractable, but for me it involves testing, experiment design, lots of reading, doc review, writing, editing, finding reviewers, and of course all the business stuff.
> There's a lot of folks out there who can do basic testing work with Jepsen. I've taught... I dunno, maybe a few hundred people directly in Jepsen workshops.
I don't know why all the DBMS vendors don't just have a guy on the QA team whose job is to run and interpret Jepsen tests for every new version. It's certainly a better option than eventually getting a damning report written by you.
Thanks, that's very interesting. I would have assumed that with today's world of distributed databases being so common, and with the various technologies available, lots of people would be interested in hiring experts to ensure databases work as advertised. But I guess people trust the product documentation too much.
As an anecdote, I was surprised to discover MongoDB had a second life in the corporate world as a standard, certified technology to store critical documents. So yeah, maybe people aren't really that aware of the kinds of nasty gotchas that lurk in their systems.
The unfortunate thing is .NET deserves to have a proper database written in pure C# because the language offers all the tools to achieve a really performant, safe and cross-platform implementation.
But RavenDB does not do it justice and uses unsafe in catastrophic amounts in places where it is not necessary or in ways which are straight up UB despite the fact that JIT/ILC is much more strict than GCC/LLVM. There have been multiple bug reports submitted to dotnet/runtime by RavenDB which required extensive debugging effort only to end up being an issue on RavenDBs end due to explicit misuse of unsafe APIs (in ways, I must reiterate, that have safe alternatives to achieve the same performance).
(if anyone's interested, I can later ask around/dig through issue history and give the references)
Healthy reminder that a pretty website and warm fuzzies all over do not make a distributed database actually work.
I witnessed RethinkDB losing to MongoDB in spite of being significantly better. I am now worried that FoundationDB isn't gaining popularity, even though it is arguably the best and most well-tested distributed database out there, with strict serializability (!) guarantees. But it doesn't have a shiny website and doesn't cause warm fuzzies, quite the opposite, it looks complex and intimidating. So it isn't popular.
This is worrying, but perhaps neither new nor surprising: we have a history of picking inferior solutions because the good ones looked too complex or intimidating (betamax vs VHS in video formats, ATM vs Ethernet in WANs).
I had understood FoundationDB to be more akin to a storage engine (e.g. a sub-component of a DBMS) than a full-on DBMS. Was I misunderstanding? If so I bet a lot of people have this understanding, going back to your point on the web site/general sentiment in the zeitgeist not necessarily reflecting what they are.
Can you share any more detail? Are you saying there are companies that build software on top of FoundationDB as their primary data store? or are those companies building software around FoundationDB that in turn presents as more of a data store in the traditional sense?
But yeah you're right for the most part. Turns out pretty much any database can be written in terms of transactions of KV pairs, which is what foundationdb gives you, so it means you can write your database query layer as a stateless, scalable service.
There have been attempts to write a SQL RDBMS layer for it, but they aren't maintained.
It's more of a database-building toolkit than a storage engine. What you get is a distributed KV store with a strict serializable consistency model (see https://jepsen.io/consistency) and very interesting versionstamp functionality.
What you do not get is a "query language" or indexing.
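To illustrate the "database-building toolkit" idea: layers implement their own indexing by updating record keys and index keys inside the same transaction. This is a toy, entirely hypothetical sketch where a plain dict stands in for the real distributed, strictly serializable KV store; it is not the FoundationDB API:

```python
# Toy "indexing layer" over a transactional KV store with tuple keys.
# The point: the layer writes the record and its secondary-index entry
# in one transaction, so the index can never drift out of sync.

store = {}  # stands in for a strictly serializable, ordered KV store

def put_user(tx, user_id, city):
    # If the user already exists, remove the stale index entry first.
    old_city = tx.get(("user", user_id))
    if old_city is not None:
        tx.pop(("idx", "city", old_city, user_id), None)
    # Write the primary record and the index entry together.
    tx[("user", user_id)] = city
    tx[("idx", "city", city, user_id)] = b""  # index entry, empty value

def users_in_city(tx, city):
    # A real store would do an ordered range read over this key prefix.
    prefix = ("idx", "city", city)
    return sorted(key[3] for key in tx if key[:3] == prefix)
```

A query layer built like this can stay stateless, since all the state (including indexes) lives in the KV store, which is roughly the architecture described in the parent comment.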
Yes, there are companies that use FoundationDB as their primary data store. It makes a lot of sense to integrate it directly with the application rather than go through additional layers and a "query language". I am working on adapting my app to use it, and so far very happy with the results.
I once joined a team which used RavenDB and literally the highest priority work which they were working on at that time was migrating completely off RavenDB because they had lost all confidence in it.
I had never heard of it until today, but a quick Google tells you it’s a Mongo-like JSON database with multi master capabilities. If you really really want those (I’m not sure it’s a good idea), then this seems way better than PG.
If you're not interested in a science experiment, Cassandra (or Scylla) are the multi-master databases that are mainstream and proven to work and scale. They're not fun or sexy and their feature set is much smaller but they do what they say and they work. Or AWS/GCP/Azure will happily give you an API for one.
It’s funny you say this when they’ve both failed jepsen tests. And by fail, I mean there are scenarios where they do not behave as intended - this is not to suggest either are broken inherently.
For sure. Virtually every Jepsen test has found bugs even with extremely mature products like Postgres. But fundamentally Cassandra (and Scylla) and most other databases tested have fixed those bugs and improved their documentation with many incorporating Jepsen tests into their ongoing process.
That's different from how Raven has just wildly misstated their capabilities.
That database spent 3-4 years primarily focused on correctness from 2018-2022.
The industry moves fast, our memories are slow. But there are millions of instances of cassandra in production across most of the fortune 500, and half of this thread has never heard of RavenDB
"Failed Jepsen tests" is relative; Kyle tends to be quite thorough in his Jepsen tests, so can usually find some issue in even the best of databases.
The big difference is in how severe the difference between claims and reality are, and how the maintainers or vendors react to such bug reports.
In some cases, the maintainers or vendors will fix bugs, or update docs to be more clear. The different transaction isolated levels are complicated and nuanced, and there are standards which disagree with the general consensus in the literature, so there can easily be ambiguities that need to be cleared up or bugs that need to be fixed.
But then there are things like RavenDB; where they make clearly impossible claims like "ACID across multiple documents and multiple cluster nodes in an AP database." There is simply no way to achieve this.
And then there's how they respond to his findings. He filed bugs and got responses a month ago, about things like "this thing that is documented to have transactional semantics does not have transactional semantics", and their response was just to say "oh, yeah, that's expected", without fixing anything or updating any of their documentation to reflect that.
So, there's a big gulf between "this is a complex topic, and even some of the best systems have some issues if you test it thoroughly enough", and "the documentation is blatantly lying about transactions and consistency, and the CEO of the company doesn't think it's a problem."
> The batched operations that are sent in the SaveChanges() will complete transactionally. In other words, either all changes are saved as a Single Atomic Transaction or none of them are. So once SaveChanges returns successfully, it is guaranteed that all changes are persisted to the database.
But the tests in this post show that no, there is no single atomic transaction for a session saved with SaveChanges, even in a single node database this will lose writes.
I dunno. If I paid for an ACID database, I'd expect, well, some ACID features, like the ability to run two transactions concurrently and have them be isolated. It looks like RavenDB is fundamentally not implementing even the most basic of its claims. This isn't some "oh, yeah, this is a complex problem, and there are a few bugs lurking in obscure corner cases", this is "it fundamentally doesn't support what they claim to support as the major front-page selling point."
Their front page selling point is "Fully transactional NoSQL database" which links to a page that says "A database without transactions is… well, not much of a database, and as far as transactions are concerned – ACID is the gold standard."
But then in response to these findings, they say that the thing that is documented to have transactional semantics doesn't actually have transactional semantics.
RavenDB has an interesting use case. It's basically as easy to use as MongoDB and similarly scalable, while at the same time being ACID, with a nicer query language, and more secure.
I actually like it more than SQL, especially for typical LOB apps.