I've had multiple instances of production data corruption with MongoDB. One of them was unrecoverable using normal tools (as in, exporting the data or attempting to replicate it just outright failed). Luckily that was a non-critical system, but if it hadn't been, it would have been a disaster. Hundreds of gigabytes of documents were unrecoverable, at least by our database folks.
The bigger issue I've seen was that some major ORMs were relying on some (apparently undefined?) behavior regarding the positional operator with nested collections of documents, and that behavior changed with a minor version. It's harder to blame MongoDB for that one, but the data corruption was cancerous – it multiplied on every document save.
I was surprised to learn that MongoDB can have multiple of the same key in the same document. So you might have a "map" that looks like this:
{"a": 1, "a": 2, "b": 3}
"a" maps to 2 different values in this case. Depending on how your language's driver works, it's often difficult to see that this is happening. In some languages you'll get "a" mapped to 1, and in others you'll see "a" mapped to 2.
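The wire format itself is what allows this: a BSON document is just a length-prefixed sequence of (type, key, value) elements, with nothing enforcing key uniqueness. Here's a stdlib-only sketch (not a real driver, and handling only int32 values) that hand-builds the document above and shows how both "a" entries survive on the wire but one silently wins once you collapse into a dict:

```python
import struct

def bson_int32(name, value):
    # element: type byte 0x10 (int32), cstring key, little-endian int32 value
    return b"\x10" + name.encode() + b"\x00" + struct.pack("<i", value)

# Hand-build {"a": 1, "a": 2, "b": 3} on the wire:
# BSON itself has no objection to the repeated key.
body = bson_int32("a", 1) + bson_int32("a", 2) + bson_int32("b", 3)
doc = struct.pack("<i", len(body) + 5) + body + b"\x00"

def parse(doc):
    # Minimal parser for int32-only BSON documents; returns (key, value) pairs.
    pos, pairs = 4, []           # skip the 4-byte length prefix
    while doc[pos] != 0x00:      # 0x00 terminates the document
        assert doc[pos] == 0x10  # this sketch only handles int32 elements
        end = doc.index(b"\x00", pos + 1)   # key's cstring terminator
        key = doc[pos + 1:end].decode()
        value = struct.unpack("<i", doc[end + 1:end + 5])[0]
        pairs.append((key, value))
        pos = end + 5            # skip terminator + 4 value bytes
    return pairs

pairs = parse(doc)     # [("a", 1), ("a", 2), ("b", 3)] - both "a"s are there
as_dict = dict(pairs)  # {"a": 2, "b": 3} - last write wins, silently
```

Whether a driver surfaces the first value, the last value, or an error is exactly the cross-language inconsistency described above.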
This article is very old (in internet years) and Mongo moves very fast. So most of the things in the article are way, way out of date. Very smart people working at Mongo understand where they need to get and how to get there, and they are well along the path. Bad old code is being replaced with better new code, limited mainly by backward compatibility bother.
It is still the case (and will remain for a long time -- also internet years) that to get reliable behavior, you need to use non-default options. Operations need a variety of extra flags that say, "please make this write durable", or "please be sure to give me the right answer". And, when you use the flags, they reliably make the DB slower. If you use enough of them to get ACIDy distributed reliability, it's hardly faster than other DBs.
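For the curious, these are real connection-string options (the hostnames here are made up), and this is roughly what "use enough of them" looks like:

```
mongodb://db1,db2,db3/myapp?replicaSet=rs0&w=majority&journal=true&readConcernLevel=majority
```

`w=majority` refuses to acknowledge a write until a majority of replica-set members have it, `journal=true` additionally waits for the on-disk journal, and `readConcernLevel=majority` only returns majority-committed data. Each one trades latency for correctness, which is the slowdown described above.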
So, in another two or three years, it will be an OK choice. In the meantime, PostgreSQL is arguably the better choice. And, of course, PG is not sitting still, so may still be a better choice in two or three years.
But most people using MongoDB don't care, because the overwhelming majority of uses of data storage just don't matter very much. When you see the list "people who bought this gewgaw also looked at this lot" on the bottom of a page, chances are that's Mongo telling you, and who cares if it's right? What matters is you didn't have to wait an extra ten seconds for the stupid page to load.
Still! Every time you order from Cisco, the order goes through a Mongo store. Since they have been using Mongo for a long time, they either implement durability in their application code -- which, dangerous secret, is much easier than doing it in the DB -- or just aren't too arsed about it. (I'm betting it's the latter.)
I guess I just don’t get why you’d use this over Postgres if it has such a heritage of unreliable behavior and still doesn’t give you actual performance benefits. If you need a document store use Couchbase, CouchDB 2.0 or Postgres blobs. If you need a cache use memcache. Why Mongo, especially after the latest licensing headache?
If you need a document-based approach there are a few options you might want to consider (and certainly not table-based relational db).
I've migrated part of our project to Couchbase (not CouchDB or others) and, technical comparison aside... I've appreciated the ease of onboarding new hires to this technology, the scalability, and the maintainability... all aspects that become critical in the long run. If your project involves mobile clients, I'm sure your devs will appreciate the easy-to-use SDKs packed with features (I personally loved the fine-grained offline sync with the backend and/or between two local DBs)
All the other points are interesting, but I'm really tired of people bringing up this fact as a slam against NoSQL databases. Not everyone needs ACID compliance.
Not being ACID is one thing, but (at least in the past?) they implemented updates as a non-atomic delete and insert, which is just crazy bad. It's one thing not to be ACID, but having a query not return an entry at all because it's being concurrently updated is... an interesting design choice.
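To make the failure mode concrete, here's a deterministic toy sketch (a plain dict standing in for the store, not MongoDB's actual code): if an update is two separate steps, a reader that lands between them sees no document at all, under either version.

```python
# Toy model: "update" implemented as delete-then-insert on a shared store.
store = {"doc1": {"v": 1}}

def non_atomic_update(store, key, new_value, concurrent_read):
    del store[key]            # step 1: the old version is gone
    concurrent_read(store)    # a concurrent query arrives exactly here
    store[key] = new_value    # step 2: the new version appears

seen = []
non_atomic_update(store, "doc1", {"v": 2}, lambda s: seen.append("doc1" in s))
# seen == [False]: the reader missed a document that logically always existed
```

In a real database the window is tiny and nondeterministic, which is what makes this so nasty to debug: the query is correct, the data is "there", and yet rows occasionally vanish from results.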
I think that's because MongoDB makes far more claims around big-data performance than CouchDB does. CouchDB has always focused a lot more on features like having a built-in REST-like API, replication, Fauxton, fault-tolerance, etc. There was also a lull where CouchDB wasn't seeing very much development, but that changed with the arrival of version 2.0. I wish CouchDB would see a little more popularity than it has, as I've found it (along with PouchDB) to be handy in a lot of projects.
It is ACID if you add all the query flags to say you want that. The majority of users don't, because it costs performance, which in many (most?) applications is more important than correctness.
I'd definitely recommend Couchbase, I feel it's a more robust and feature complete NoSQL document store for most use cases. It's easier to set up, and write performance is much higher than MongoDB in most configurations due to the master-master architecture. The N1QL query language is the most robust NoSQL query language out there, and it's SQL based so very easy to use. Additionally, in the last year, Couchbase has added a complete Full-Text Search system as well as Analytics, which is SQL-based MPP analytics querying with no ETL (think built-in Hive/Hadoop on JSON documents).
Has anyone had any success migrating from MongoDB over to Postgres? My team and I decided to try out MongoDB for a new app we wrote because the data model seemed to fit the idea of a schema-less document quite well. But many months later, Mongo has caused a few more headaches than it's been worth, and we are slowly leaning towards using a more traditional data model.
I was thinking that one approach might be first to just keep the schema-less data model and move over to using Postgres JSON. Once that is stable, then think about how we would move over to using schemas. Any thoughts on whether that is a plausible approach?
Would recommend going straight to a schema if you have the time - but if you must do it incrementally, then yeah, this will work. The JSONB columns are extremely powerful.
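The incremental path can be sketched with nothing but the Python stdlib, using sqlite3 as a stand-in for Postgres (`json_extract` here plays the role Postgres's JSONB operators would play; table and field names are invented for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: keep the schema-less documents whole, one JSON document per row.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")
conn.execute("INSERT INTO events (doc) VALUES (?)",
             (json.dumps({"user": "ada", "amount": 42}),))

# Queries can still reach into the document by path while the schema stays loose.
user = conn.execute(
    "SELECT json_extract(doc, '$.user') FROM events").fetchone()[0]

# Step 2 (later): promote fields that turned out to be stable into real columns.
conn.execute("ALTER TABLE events ADD COLUMN user_name TEXT")
conn.execute("UPDATE events SET user_name = json_extract(doc, '$.user')")
```

The same two-phase shape works in Postgres with a JSONB column and `doc->>'user'`, and you get real indexes, constraints, and types as each field graduates to a column.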
The company I worked for until recently maintains a core data store in Mongo, but mirrors the content into a Postgres database for use by other, newer applications (which don’t modify the data).
We built a tool which streams the MongoDB oplog into tables containing a JSONB column for the document - https://github.com/mudge/oplogjam - and it has been pretty good to work with. Definitely an approach that can work well, in my experience!
I actually evaluated MongoDB before I decided to go with Couchbase. Mainly because of the following reasons.
1. It was a pain to manage the shards, and the 'mongos' processes would be hotspots and SPOFs in the system. No doubt there were ways around it, but Couchbase did it right. With Couchbase, sharding was maintenance-free and the client was aware of the cluster topology and which shards lived where. Rebalancing was a breeze and you could even pause and resume it if required.
2. We didn't want to introduce MongoDB and then have to set up a separate cache like Redis in front of it. That would be too many moving parts. With Couchbase we got a built-in cache that was managed as part of the solution, a huge plus point for us.
3. Couchbase offered a great road map, at the time we ran the evaluation N1QL wasn't GA yet, but it offered great promise and look where it has come today.
4. The official Kubernetes operator offers a great way to run and scale Couchbase on any public or private cloud. It encompasses a lot of operational knowledge about the product and is a massive plus.
5. The cluster management for Couchbase is totally based on a REST API that is open. The standard web UI makes use of it too. The REST API offers the best way to automate administrative tasks and allows you to create your own customized monitoring arrangements.
So, above is a brief roundup of why we made an educated decision not to use MongoDB in favor of Couchbase.
1. Managing shards and the mongos router layer should not be that difficult, IMHO. Additionally, there should not be a SPOF if you've designed the architecture to leverage the native scalability aspects of MongoDB. That said, if you struggle with management, you can certainly leverage an online service that greatly simplifies not only scalability but security, manageability, and availability as well.
2. I don't know the read/write profile of your app but based on the hundreds of apps I've helped to configure and size, I'd be inclined to suggest that you don't need a Redis cache if you're configured properly.
3. Interested in that roadmap... haven't seen it... what specifically interested you?
4. MongoDB has similar available... happy to share that with you.
You're certainly free to choose whatever solution you believe works best for your use case. However, the points you've made don't appear to provide compelling arguments for Couchbase over MongoDB.
Precisely: you made my point when you acknowledged that MongoDB allows me to get a critical thing this wrong. And if I do get it wrong, the pain of going back and fixing it afterwards is intense. Thanks.
I forgot to mention, Couchbase also guards me against the rogue DBA situation by ensuring field level encryption under application control. A huge plus.
No. Replication still had corruption when I was dealing with WiredTiger at scale. They kept claiming to have fixed the corruption problems in each release. After that happened a few times, I wasn't willing to trust the claims, so I wouldn't use Mongo anywhere that I couldn't just as well use a JSON file as my data store.
TL;DR: All databases corrupt data; MongoDB is reasonably performant and also easier to scale than most.
I have had instances of data corruption on production servers with both Oracle and MySQL. Several SQL databases take forever to fix bugs (e.g. the crappy handshake in MySQL took 5 years to get fixed).
MongoDB is actually the best database I have ever used in production, in terms of both performance AND scalability. So I call your bluff there: no data, no facts.
> .. has an atrociously poor response time to security issues - it took them two years to patch an insecure default configuration that would expose all of your data to anybody who asked, without authentication
I think that point is a little unfair; yes, a lot of users were compromised, but none of them bothered to RTFM.
When I was learning Mongo and got a link to access the database, the literal first thing I asked was: well, how do I secure it? Taking the time to actually learn the fundamentals of a (then) new technology is never wasted!
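For reference, the default at issue was listening on all interfaces with no authentication required. The two settings that close the hole, in mongod.conf's YAML format, are:

```yaml
# mongod.conf - the settings the exposed deployments were missing
net:
  bindIp: 127.0.0.1        # listen on localhost only; list real interfaces explicitly
security:
  authorization: enabled   # require clients to authenticate
```

Recent MongoDB packages do ship with localhost-only binding, but authorization still has to be turned on deliberately.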