The RethinkDB retrospective contains a lot of insight into how MongoDB has succeeded despite being vastly inferior on a technical level back when it first launched. I have to admit a certain respect for them for executing their strategy so successfully.
Every time MongoDB shipped a new release and people congratulated them on making improvements, I felt pangs of resentment. They’d announce they fixed the BKL, but really they’d just gotten the lock granularity down from the database level to the collection level. They’d add more operations, but instead of a composable interface that fits with the rest of the system, they’d simply bolt on one-off commands. They’d make sharding improvements, but it was obvious they were unwilling or unable to make even rudimentary data consistency guarantees.
But over time I learned to appreciate the wisdom of the crowds. MongoDB turned regular developers into heroes when people needed it, not years after the fact. It made data storage fast, and let people ship products quickly. And over time, MongoDB grew up. One by one, they fixed the issues with the architecture, and now it is an excellent product. It may not be as beautiful as we would have wanted, but it does the job, and it does it well.
I have no idea how capable MongoDB is these days, as I haven't used Mongo in years (and even then it was not for long).
However, I do not know any developers who, after living through the "hype first, features later" strategy, have been left with a positive enough opinion of MongoDB to ever want to use it again.
- You think people replace a MongoDB cluster with a single Postgres instance? You guys should really run HA clusters in real life and stop reading Reddit / HN and the hype behind PG; with 3.5M+ CCU no one would use an architecture with a single master / slave (that's what PG is).
MongoDB / MySQL get bad press from people who never used them in real life and just repeat what they read online.
I could tell you a horror story about PG not having an official replication system until 2011, when PG 9.0 landed.
I could tell you a horror story that happened to me just a few weeks ago, where MariaDB corrupted data out of nowhere due to a bug. This happened multiple times and cost us multiple hours of work (including the service being down) each time, until we realized the issue wasn't hardware but a software bug.
If you ask me, I'd take PostgreSQL's approach of not having broken replication before 2011 over MySQL's still corrupting data.
Data is usually the most valuable asset a company has.
And I would re-iterate that just because something isn't in mainline, doesn't mean it's not possible. Did you know that Pg didn't have native partitioning until Pg10? Somehow we managed to do partitioning before then.
I don't buy the argument that you need to ship broken features just to have them; Pg doesn't include it into base until it's a /good/ solution which is well engineered and has appropriate toggles. That is not a horror story.
> - You think people replace a MongoDB cluster with a single Postgres instance? You guys should really run HA clusters in real life and stop reading Reddit / HN and the hype behind PG; with 3.5M+ CCU no one would use an architecture with a single master / slave (that's what PG is).
I shipped a game which had similar CCUs (within an order of magnitude), and I can confirm that you can't do it with one PostgreSQL machine. Or... actually you could, but we chose to fsync() constantly to prevent corruption from ever happening and removed the RAID cache. But you can shard on top of your database solution too.
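For what it's worth, the "fsync() constantly" discipline just means forcing every write through to the device before acknowledging it. A minimal sketch (hypothetical record format, not the game's actual code):

```python
import os
import tempfile

def durable_append(path: str, record: bytes) -> None:
    # Append one record and force it to stable storage before returning.
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # drain Python's userspace buffer
        os.fsync(f.fileno())  # push the OS page cache out to the device

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
durable_append(tmp.name, b"player_state_v1\n")
with open(tmp.name, "rb") as f:
    data = f.read()
os.unlink(tmp.name)
```

This trades throughput for durability, which also explains removing the RAID cache: presumably a write-back cache can acknowledge writes that haven't actually reached disk yet.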
"Our top focus right now is to ensure service availability. Our next steps are below:
Identify and resolve the root cause of our DB performance issues. We’ve flown Mongo experts on-site to analyze our DB and usage, as well as provide real-time support during heavy load on weekends."
How does that disagree with my post?
Nowhere in the original post do they mention issues with MongoDB itself; it was probably bad design on their side.
Nowadays I would just use redis & cassandra if you need something beyond a collection of postgres instances. Most projects do not.
(Disclosure: I'm executive director of CNCF.)
That's not really what the article is saying, unless we interpret the following text differently.
"We’ve flown Mongo experts on-site to analyze our DB and usage, as well as provide real-time support during heavy load on weekends." -> "We have started to look into the problem together with experts" and not "Experts have tried and failed".
I can for example use jsonb_agg() and get a hierarchical response for 1:N joins. It returns JSON value even though neither of the columns contain JSON.
Previously in that scenario I would either need to make more than one query or get a response that has a lot of data repeated.
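A client-side sketch of what jsonb_agg() does server-side: collapsing the repeated rows of a 1:N join into one hierarchical JSON value (table and column names here are made up):

```python
import json
from collections import defaultdict

# Flat 1:N join rows, the shape a plain relational query returns:
# the author value is repeated once per book.
rows = [
    {"author": "Ann", "book": "Ants"},
    {"author": "Ann", "book": "Bees"},
    {"author": "Bob", "book": "Cats"},
]

def aggregate(rows):
    # Group the N side under its parent, the way jsonb_agg() + GROUP BY
    # would do inside the database.
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["author"]].append(r["book"])
    return [{"author": a, "books": bs} for a, bs in grouped.items()]

result = aggregate(rows)
print(json.dumps(result))
```

In Postgres itself this would be roughly `SELECT a.name, jsonb_agg(b.title) FROM authors a JOIN books b ON b.author_id = a.id GROUP BY a.name`, returning the nested structure in one round trip.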
It is not professional or appropriate for vendors to be revealing (a) that clients are having issues and need support and (b) the specific workings of technologies or processes within the client's business.
So while "hype first" might reap a deservedly abundant and bitter harvest of developer hatred - it doesn't preclude evolving into a genuinely useful product...
I also remember it had a lot of quirks and missing features prior to v8. I assume it was leftover cruft from Ingres, but I remember PostgreSQL v6 and v7 being unreasonably complicated to get configured just because the defaults were so far from reality.
One thing you can say about PostgreSQL, though, is that its developers don't rest on their laurels. Every major release packs in a ton of new features. They've gone from being fairly low or middling on the feature set to being pretty near the top. Even point releases have me saying, "Wow, that's really nice to have."
Or maybe, probably, managed Postgres, especially in the MySQL 3.x days and earlier.
I still want to like MongoDB, I still miss its style of query vs SQL, but I'd have a hard time advocating its use again...
Sometimes it's tempting to use it for projects that I know will remain small, but even then it's not worth the overhead of standing up a different DB when I have a perfectly good SQL server I can muddle through already.
It does require building and maintaining schemas in a different manner, but when you do that, it's pretty great to work with, especially when we're doing design driven development that consists of a lot of prototyping.
I'm a fan, but I'm a manager on business development and digitisation, so I may be a little sheltered from whatever annoyances it may cause in operations.
I’ve worked on quite a few multi-municipality open source projects, like handling employee refunds for driving.
Basically I drive x kilometers for a meeting, I get paid x and the taxman gets the report. Simple stuff.
Well, among the 6 parties involved there were 6 ways to interpret tax laws, 4 different agreements with unions on what rates to pay, 3 different payment systems with 3 very different ways of taking the reported data from a flat file to a REST interface, at least one political decision to overrule tax laws for a certain set of employees, and several different ideas on how to host it and do single sign-on. Oh, and 4 different ways to obtain employee data.
That’s for a simple system with basically 1 function. We have more than 350 it-systems.
Another example is in automation. We have a scanner software and we have an archiving system. They both have APIs, but the APIs speak very different languages. This meant that our local scanner people were tasked with distribution after they scanned things, a task taking several hours each week, because putting files into many different areas of an archive sucks. What we did was ask the scanning company to build a QR reader into their software, and then we made a piece of software that put the archive recipient addresses into QR codes. We also made a MOX agent that accepts the output of our scanning software and loads it into the archive through the API. So now the process of distributing is automated.
You can certainly run a municipality without developers, using standard software and outside hires, it’s just really expensive.
I guess our government should work on writing laws that are more friendly to digitisation and stop expecting IT to fix business practices that don't really make sense in the first place. There has been a genuine movement toward that, but it's slow because none of our top politicians or bureaucrats are from technical fields, and they operate on such a high strategic level that they're often rather far from the daily challenges in a daycare institution.
Local political leadership and bureaucracy could certainly do more to focus on cooperation, standardisation and digital transformation, and they actually do, but political views differ and they change every 4 years, and the truth is that there just isn't any voter interest in IT unless it goes wrong.
We're trying to build national standards. We've had a set of architectural standards called Rammearkitekturen for a few years now, but getting them implemented is slow. For one, they're made by municipalities, and our structure of government is split in three: municipalities, counties and the state, and each branch has its own ideas, leading to bureaucracy and political differences. Some want us to use EU standards, others want us to build our own, and even if we decided, there are different sets of EU standards as well as different sets of Danish standards.
I personally think the best we can do is try to use whatever national standards are in favour, and build smaller applications on them, with open APIs, and run everything as SaaS in infrastructures such as AWS or Azure. I also think we should do a lot more work on business development, modifying business practices before we throw IT at something.
But it's complicated, and it's on a giant scale where even minor changes take years to implement.
The second time, I used MongoDB to automatically track templated email bodies that were being delivered through a third-party mail platform. We had dozens of recurring templates and many more one-off templates for different curated campaigns. If somebody complained that a link or image or token was wrong in their email, we wanted to be able to look back at the history to see if the problem was in the template data or potentially a client issue on their side. Most of the queries were ad-hoc and not very performance-sensitive. This was where a flexible JSON document format came in handy. Modern Postgres would have worked well for it too, but that wasn't available in the company at the time. With MongoDB I got good flexibility, adequate speed, and I avoided reinventing wheels by not trying to shoehorn the data into another MySQL table. I was able to solve a customer support pain point in less than a week, and the system has worked well for nearly 4 years now.
I'd be really frustrated if I had to use MongoDB as my only data store. I would guess that much of the hate for it comes from people who were forced into that position, or maybe from people who didn't take its documented limitations seriously enough before productionizing its use.
But every time I see a team transitioning from Mongo to something else, they transition to a relational database. Maybe their problem is not with MongoDB, but that their data is relational after all?
Personally, I'd take a relational db over NoSQL for most of my needs, but all these stories don't really say anything about how Mongo compares to other NoSQL databases.
As your data grows, though, you realize that your application becomes more and more complex. A single query might translate to multiple queries to the database, you need to handle scenarios where fields might not exist, etc.
With relational data you might have more work at front, but then the database solves many of the problems for you.
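To make the "fields might not exist" point concrete, here's the kind of defensive code every reader of a schemaless store ends up writing (the documents are hypothetical):

```python
# Documents written by different application versions over time;
# the "email" field was only added in a later release.
docs = [
    {"_id": 1, "name": "Ann", "email": "ann@example.com"},
    {"_id": 2, "name": "Bob"},  # older document, no email field
]

def emails(docs):
    # Every code path that reads these documents must re-implement,
    # by hand, what a single NOT NULL column enforces once for everyone.
    return [d.get("email", "") for d in docs]

result = emails(docs)
```

In a relational schema the second row could simply never exist, so the read side stays trivial.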
As another person said, when you're using databases like MongoDB you're going back in time and reliving history, because databases in the past looked a lot like that, before Codd invented the relational model.
Also the whole NoSQL thing seems to be cyclical, we had XML databases in early 2000s.
One of the biggest problems with relational DBs is that once you decide on a schema, if it's the wrong one, you're gonna be in a lot of pain. Which makes a NoSQL DB a great fit for an early stage product where you are still figuring out what your product needs to do and contain. Once you have some more experience with it, and have a better understanding of your data, it's far easier to build the correct relationships.
Not really, converting to relational data is quite a bit of work.
Actually the reverse is the correct approach. You start with normalized data, when there's a bottleneck you start denormalizing it, if that's still not enough you move /subset/ of data to NoSQL database.
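A tiny illustration of that progression, using SQLite as a stand-in relational database: start normalized, then denormalize one hot column when the join becomes a bottleneck (schema and data are made up).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ann');
INSERT INTO orders VALUES (10, 1, 9.5);
""")

# Normalized read: the name lives in one place, reached via a join.
joined = conn.execute("""
    SELECT c.name, o.total FROM orders o
    JOIN customers c ON c.id = o.customer_id
""").fetchone()

# Denormalization step: copy the hot column onto orders to skip the join,
# accepting that it now has to be kept in sync on customer renames.
conn.executescript("""
ALTER TABLE orders ADD COLUMN customer_name TEXT;
UPDATE orders SET customer_name =
    (SELECT name FROM customers WHERE id = orders.customer_id);
""")
flat = conn.execute("SELECT customer_name, total FROM orders").fetchone()
```

The point is that this is an incremental, local change, whereas going the other way (bolting relations onto a document store) is a rewrite.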
> One of the biggest problem with relational DBs is that once you decide on a schema, if it's the wrong one, you're gonna be in a lot of pain.
Not really; in my experience all migrations were done through SQL. Also, if multiple people (who understand relational databases) come up with a schema, they'll pretty much arrive at the same normalized result.
They are repeating the discoveries that people made in the 1970s about storing data in flat files vs. relational models.
Know history or be doomed to repeat it and all that.
A new crop of developers is always just a year away, though. I feel future adoption depends a lot on how well-suited the tools are for younger devs. That's where MongoDB found its initial audience!
I was going to say that I won't believe that it is on its way to being a decent database until after an article appears on https://aphyr.com/tags/jepsen saying that MongoDB actually delivers on what it claims.
So I looked for the most recent analysis of MongoDB and found https://jepsen.io/analyses/mongodb-3-4-0-rc3. I still want to see verification of the latest release, and hear battle stories from it in production. But I'm provisionally optimistic that a lot of the glaring "it is a pile of shit that doesn't work when the chips are down" issues are now addressed.
That said, I bet that it will be many years before most people who got burned by MongoDB ever rethink their attitudes about it. Once burned, twice shy. And it really was an overhyped steaming pile of shit for a very long time.
> was an overhyped steaming pile of shit for a very long time
No it wasn't. This is something you heard from people who never really used it. It had its faults but it was never a pile of shit nor was it substantially worse than other databases.
I used it at a previous job. Project to move a multi-tera dataset from an Oracle box (24 CPUs, 24G RAM, SAN) to a MongoDB cluster (10 boxes, each with 48 cores, 96G RAM and internal SSD). MongoDB couldn't perform for shit, and it couldn't stay up in a usable state for more than a few hours at a time. This is with 20x the processors and 40x the memory of the system it was replacing. It's a complete joke of a product, sold on the basis of outright lies as far as what they told us and what it could actually do. Having been that badly burned I consider it an act of selfless public service to warn people off it.
If you're just using it for a personal blog that gets 10 views a day, sure it might be barely adequate for that. But I'd still use Postgres.
But for those of us that had document-oriented data models, it allowed for performance that was orders of magnitude faster than any SQL database.
But read https://aphyr.com/posts/284-call-me-maybe-mongodb for an idea of how the promises in MongoDB documentation compared to the reality of the software under stress. And it wasn't just hypothetical either - there are plenty of horror stories floating around from people who ran into those problems in production for use cases that were supposed to be a fit for MongoDB.
And the performance argument didn't hold water either. As benchmarks like https://www.enterprisedb.com/node/3441 showed, decent relational databases consistently beat MongoDB on the same hardware. Yes, lots of people rewrote bad relational models and saw performance improve. But apples to apples, writing an application against a relational database in the same way you would against MongoDB resulted in a win for the relational database.
So yes, there were lots of people saying exactly what you are saying now. But the ones who actually tested their systems and ran performance tests came to a very, very different conclusion.
That EnterpriseDB link is completely ridiculous. Firstly, it predates WiredTiger which replaced the entire storage layer. Secondly, doing one for one comparisons with relational systems doesn't make sense. MongoDB is a document database. Compare it with other document databases.
And he's right. As pages like http://latencytipoftheday.blogspot.com/2014/06/latencytipoft... make clear, we have a lot of calls back to the application happening. Users will notice the occasional slow load surprisingly quickly, and it is worth a lot to get rid of them.
So even your chosen source agrees. A relational database is not orders of magnitude slower. In fact, a relational database is probably a better fit.
Financial time series data is exactly one of the use cases Mongo claimed to be for. Seems you’re the one who can’t tell good engineering practice from bad. And yes, they also pitched themselves as a direct replacement for Oracle. That was highly disingenuous.
This is FUD, I have used mongodb, I have a certification in mongodb even.
Unless you know precisely what you're doing, it's very easy to burn yourself. And Mongo markets itself as being "easy to use out of the box"; this is not a good thing to do.
I consider MySQL defaults to be unsafe, (as in, it used to corrupt data silently) but it's a godsend compared to the data consistency in mongodb.
There are countless promises it fails to deliver on too; I will not, ever, recommend it for a project. However, in recent months I've heard it got better. This means I will stop deriding developers who now use it. But it does not mean I will realistically allow its use in the environments I work in. I tend to care about data consistency in those.
Most people that “get it right” the first time around do not get any recognition whatsoever.
It is the people that screw up, releasing with big flaws that the customer then pressures the company about, who are heralded as heroes and bacon-savers when they fix those flaws. After 3 years and as many releases.
Nobody cares about people who are healthy all their life. But someone who suddenly realizes they need to eat better and exercise, and they do, they are applauded. They are defended too, if they go back to old ways. And so on...
I've heard similar complaints before. And, I get it, too-- at a glance, that person is playing the "superhero" by saving the project. But good management will insist on root-causing failures, and that's where this will unravel. If it's a recurring problem, you should bring it up with management.
Developers complain about management but tend to forget that managers are people just like everyone else, and we need to apply some skill to our interactions if we are to get the results we desire.
Do you have any resources you've found helpful improving your skill at this?
You can have management that understand tech who will get to the bottom of the problem and you can have management who don't understand tech. They won't.
Management who don't understand tech will either keep somebody on hand who they know and trust who does understand tech (e.g. a consultant) or, more likely, they'll just keep rewarding the faux superheroes who keep screwing up and bailing themselves out.
There's going to be a ton of survivorship bias even with them. It just goes to show that big marketing budgets are such a competitive advantage that they can outweigh not actually being any good.
I'd seriously like somebody with a passing knowledge of data integrity who believes the tech industry is meritocratic to explain what they think the success of mongo is all about.
And since PostgreSQL fills that niche very well (correctness + real ACID + extensibility + decent performance), maybe it was really PostgreSQL who killed RethinkDB?
MongoDB has identified a real pain point: many developers don't like to use SQL to interface with a transactional database. I'm not going into the merits of SQL vs. NoSQL, I'm just stating that it's clear there's a need or they wouldn't have gotten any traction.
Now they are maturing the product to the point it might be a safe bet for some use cases, it remains to be seen if their approach to product development will pay dividends or the reputation they have created for themselves has created a time bomb that will eventually kill them.
PG has astonishing feature throughput. With each yearly release, they add 1-3 wow features, 6-10 major features, and countless smaller features still worthy of the release notes.
That's really, really impressive for any database, commercial or otherwise.
There's a perception that postgres is slow to add features because sometimes the feature latency is high. The reason for that is they build a solid foundation first, and slowly build multiple major features on top of that foundation. Consider replication:
1. Write ahead log (WAL)
2. WAL archiving
3. Warm standby
4. Hot standby + Streaming replication
5. Synchronous replication
6. Logical decoding of WAL
7. Logical replication
That's a lot of engineering work there, but they delivered value to users at each stage along the way. And during this time, they did a ton of other stuff -- did you notice that we got parallel query along the way? And logical table partitioning came along too, which means the parallel query can now do partition-wise parallel joins.
Not to mention all of the SQL features and tons and tons of other stuff.
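The foundation of that whole stack, step 1, is simple in principle: record every change in an append-only log before applying it, so the later layers (archiving, standbys, logical decoding) only ever have to ship and replay log records. A toy sketch, nothing like Postgres's actual WAL format:

```python
import json

class WalStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self):
        self.wal = []    # in a real system: an append-only file, fsync'd
        self.data = {}

    def put(self, key, value):
        record = json.dumps({"op": "put", "key": key, "value": value})
        self.wal.append(record)   # 1. write to the log first
        self.data[key] = value    # 2. only then apply to the data

    def replay(self):
        # A standby (steps 3-4 above) rebuilds identical state purely by
        # replaying the shipped log records, in order.
        state = {}
        for rec in self.wal:
            r = json.loads(rec)
            state[r["key"]] = r["value"]
        return state

primary = WalStore()
primary.put("a", 1)
primary.put("a", 2)
replica_state = WalStore()
replica_state = primary.replay()
```

Once that log exists, archiving is copying it, a warm standby is replaying it, streaming is replaying it continuously, and logical replication is decoding it into row-level changes, which is exactly the layering in the list above.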
Postgres has kept the lights on for a lot of companies for a long time. I absolutely reject the idea that good engineering is at odds with business success.
I don't think they're at odds, per se, but having been around through the original dotcom bubble, PostgreSQL (or "Postgres95," as I'm pretty sure it was still called when I was introduced to it!) was mostly known to, well, database nerds for at least the first decade of its life. One person's "solid foundation" is another person's "technically correct but practically crawling" -- a perception that, rightly or wrongly, PostgreSQL fought against for a very long time. And I think that's what OP was trying to get at: if PostgreSQL was being developed primarily by a single VC-funded company, they just might not have had the luxury to spend years building that solid foundation.
(I'll allow that as an ex-RethinkDBer, I may have some bias here: I loved many things about the product, but it's hard not to suspect we should have focused on speed and, y'know, revenue earlier than we did.)
Not sure what you mean by 5) though.
Anyway, replication is a strong point of MongoDB with the oplog, and I don't think Postgres can beat it.
I would argue that it means that different companies using PostgreSQL help fund PostgreSQL development. That's not the same thing as being a single company. It's a model which clearly works very well for PostgreSQL, but it doesn't really give us good data on whether the "single company doing closed source development" (e.g., Oracle) and "single company driving the bulk of open source development" (e.g., MongoDB) models would have worked as well for them.
PostgreSQL remains one of the most mysteriously difficult common DBMSs to set up, which is unfortunate, but since the advent of MongoDB it has adopted all the ease-of-use features that are warranted from it. Developing a quick-and-dirty product prototype on Postgres is a breeze, and bootstrapping constraints and data integrity onto it afterwards is trivial. I am really not seeing any reason to start a new app on MongoDB exclusively at this point. Start off in a strong DBMS like Postgres, and if you end up needing MongoDB-style document storage you can always branch to it later; using it initially is a case of premature optimization. There is no need for it.
The pain point relates less to SQL but more to the RDBMS and the rigid schema. SQL is spreading and may become a ubiquitous query language.
Ignore its lessons at your peril.
Your job isn't to build an engineering masterpiece. Your job, as pg says, is to build something people want.
The way that I understand it is that what is "Good" depends on how you measure it. When we measure in terms of technical quality, we get one answer. When we measure in terms of suited to be widely adopted, we get a different answer.
We tend to idealize for technical quality, but popularity is what matters more. And once something is widely enough adopted, the technical inferiority tends to be fixable.
If they can be successful despite us, more power to em I suppose. I’m a little annoyed that their path to success was built on the flaming wreckage of so many products that fell apart because of Mongo, by using us as their beta testers instead of building a non-shitty product, and I’m at least going to get this comment in so we aren’t completely forgotten among the congratulations.
The TCP/IP stack was built and used while the OSI model was being designed, and it won all the mindshare. Perhaps it would have been better to have separate presentation and session layers, but we don't; the application layer handles that stuff. It works well enough.
OTOH, this quote is wise:
> It is easier to optimize correct code than to correct optimized code (Bill Harlan)
I think this is doubly true for databases; at least with obfuscated code, you can recover the underlying meaning with work and exploration.
Losing or corrupting data is the worst thing a database can do. Given "this will be correct and hopefully we can scale it" vs "this will be fast and hopefully we can keep it correct", I'd choose the former for any "source of truth" data every time.
There are tricks for speeding up queries - indexes, cacheing (including materialized views), sharding, read replicas, etc.
There are no tricks for recovering data you lost.
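For example, the index trick is the classic one: it changes how fast the answer arrives, never what the answer is. A quick SQLite illustration (table and data made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i % 1000, "x") for i in range(100_000)],  # 100 rows per user_id
)

def lookup():
    return conn.execute(
        "SELECT COUNT(*) FROM events WHERE user_id = 42"
    ).fetchone()[0]

before = lookup()                                  # full table scan
conn.execute("CREATE INDEX idx_user ON events (user_id)")
after = lookup()                                   # index scan, same result
```

Losing data is the opposite kind of problem: no `CREATE INDEX` after the fact will bring back a row that was never durably written.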
True for databases, but not true for businesses.
> Losing or corrupting data is the worst thing a database can do.
Clearly people building simple crud websites with slick JS features didn’t agree otherwise Mongo would be gone and Rethink would be worth hundreds of millions of dollars.
I doubt it's that they didn't agree, it's more likely that the thought simply never occurred to them.
Mongo's marketing is directed with laser-like focus on the beginner developer seeking out tutorials to build a website, etc. Questions about data consistency simply never arise in that context.
Later on that developer who was gently guided towards using mongo by all of the slick marketing will likely try to defend their decision when somebody attacks it ("their data consistency problems aren't that bad" or "data consistency isn't that important"), but that's something else.
Popularity is not always a good measure of what ideas are good ones.
FWIW, I don't think this was what happened. RethinkDB started out as an SSD optimized database, and quickly repositioned itself (due to "is this what people want") to something more generally useful, and was one of the most feature-rich databases at the time, I thought.
MongoDB however got first mover advantage and a bunch of cash that comes with it. They could afford to invest heavily in developer evangelism. Then they bought WiredTiger. If I sound bitter, I am a bit - not that Mongo did well in the end, but that RethinkDB went the way it did.
But be certain not to conflate your respect as a business strategist with your judgment as a mindful developer. To speak clearly: by systematically playing on a weak spot of ours, they have used countless small teams as stepping stones to sell their business contracts to large players, while hurting a lot of these small teams with an (at the time) inappropriate product for their needs. And despite these huge costs, they still made a product that is inferior to one that was designed properly.
As a community (both as a startup, as well as a developer community) we should resent these tactics and try to find ways to protect us against players that abuse the common good of mindshare. And lest you say, that is the price you have to pay to at all get a product like MongoDB in harsh business environments: We could also lobby for open source funds that are organized like research funds, producing fundamental technology that benefit everyone. Not every technology fits the model of for-profit startup innovation.
Our community has very little defense against marketing that comes from our midst, aiming to produce the (false) impression that a disproportionate amount of our fellows have evaluated the product and found it to be excellent. See https://www.nemil.com/mongo/3.html for a discussion about MongoDB specifically (HN thread: https://news.ycombinator.com/item?id=15124306 )
Build something the right way first, even if it does take longer. Though I use RDBMSs (MySQL or Postgres) all the time with an ORM, and the ORM does most of the heavy lifting (Laravel/Eloquent in my case), so I still develop pretty rapidly. I'm sure if you use pg+react on multiple projects, eventually the speed to launch will increase...
> I sympathize with RethinkDB's team — they did what thoughtful engineers are trained to do. Engineering purity and humility is a tiny part of building a sustainable, venture-backed company.
It was unfathomable to us why people would choose a system that barely does the thing it’s supposed to do (store data), has a big kernel lock, throws away errors at random, implements single node features that stop working when you shard, has a barely working sharding system despite it being one of the core features of the product, provides essentially no correctness guarantees, and exposes a hodge-podge of interfaces that have no discernible consistency or unity of vision.
I mean... that's unfathomable to me too. He explains it later, that " MongoDB turned regular developers into heroes when people needed it"
I have a hard time understanding why devs choose / chose MongoDB. Postgres with JSON columns gets you so far, why would you go with MongoDB, given the issues it's had?
Jsonb is a pretty recent addition to postgres, when compared with the MongoDB timeline. And even today postgres still doesn't have the replication/failover story that made MongoDB pretty compelling. I know, it's coming, whatever, but the point is that there was a time where if you wanted a json store that could stay alive through network issues, MongoDB was one of the only choices available, and postgres simply didn't have what was needed.
Did MongoDB truly allow anyone to really horizontally scale? Most places that need massive horizontal scaling use something like MySQL, as far as I know.
The secret ingredient in the horizontal scaling sauce is giving up inter-node ACID transactions.
Nothing prevents you from making the same tradeoff with mysql or postgresql.
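A sketch of that tradeoff: hash-route each key to one of several independent nodes, so single-key operations stay node-local (and can be fully ACID on that node) while no transaction ever spans two shards. Nodes are modeled here as plain dicts; in practice they'd be separate MySQL or Postgres instances.

```python
import hashlib

# Hypothetical: three independent single-node databases.
shards = [dict() for _ in range(3)]

def shard_for(key: str) -> dict:
    # Deterministically route each key to exactly one node.
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return shards[h % len(shards)]

def put(key: str, value) -> None:
    # Touches one node only; there is no cross-shard transaction to give up,
    # because the routing layer never opens one.
    shard_for(key)[key] = value

def get(key: str):
    return shard_for(key).get(key)

put("user:1", {"name": "Ann"})
fetched = get("user:1")
```

The application then has to be written so that anything that must change atomically shares a shard key, which is the same discipline a sharded MongoDB deployment imposes.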