I appreciate the "public service" intent of this blog post, however:
1) It is wrong to evaluate a system for bugs that are now fixed (you can evaluate a software development process this way, however that is not the same as evaluating MongoDB itself, since the latter got fixed).
2) A few of the claimed problems are hard to verify, like subsystems crashing, but users can confirm or refute them just by looking at the mailing list, assuming MongoDB has a mailing list like the Redis one, which is run by an external company (Google) and where people outside 10gen have the ability to moderate messages. (For instance, in Redis two guys from Citrusbytes can look at/moderate messages, so even if Pieter and I would like to remove a message that is bad advertising, we can't do it in a deterministic way.)
3) New systems fail, especially if they are developed in the current NoSQL arena, which is of course also full of pressure to win users ASAP (in other words, pushing new features fast is so important that perhaps sometimes stability will suffer). I can see this myself: even though my group at VMware is very focused on telling me to ship Redis as stable as possible as the first rule, I sometimes get pressure from the user base itself to release new stuff ASAP.
IMHO it is a good idea for programmers to learn to test the systems they are going to use very well, with simulations of the intended use case. Never listen to the hype, nor to the detractors.
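A minimal sketch of what such a simulation might look like, under assumptions of my own (the workload mix, key space, and `simulate_workload` helper are all invented for illustration; a plain dict stands in for the client of whatever datastore you are evaluating):

```python
import random
import time

def simulate_workload(store, n_ops=10_000, write_ratio=0.2, seed=42):
    """Replay a synthetic read/write mix and record per-operation latency."""
    rng = random.Random(seed)
    latencies = []
    for i in range(n_ops):
        key = f"user:{rng.randrange(1000)}"
        start = time.perf_counter()
        if rng.random() < write_ratio:
            store[key] = {"visits": i}   # write path
        else:
            store.get(key)               # read path
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50": latencies[len(latencies) // 2],
        "p99": latencies[int(len(latencies) * 0.99)],
    }

# A dict stands in for a real datastore client; to test for real, swap in
# the actual client and run against production-like data volumes.
stats = simulate_workload({})
```

The point is not the harness itself but the habit: replay *your* access pattern, at *your* data volumes, and look at tail latencies before trusting anyone's benchmarks.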
On the other hand, all these stories keep me motivated to stay conservative in the development of Redis and to avoid bloat and things I think will ultimately suck in the context of Redis (like VM and diskstore, two projects I abandoned).
> 1) It is wrong to evaluate a system for bugs now fixed
I disagree. A project's errata is a very good indicator of the overall quality of the code and the team. If a database system's history is littered with deadlock, data-corruption and data-loss bugs up to the present day, then that's telling a story.
> 2) A few of the problems claimed are hard to verify
The particular bugs mentioned in an anonymous pastie may be hard to verify. However, the number of elaborate horror-stories from independent sources adds up.
> 3) New systems fail, especially if they are developed in the current NoSQL arena
Bullshit. You, personally, are demonstrating the opposite with redis, which is about the same age as MongoDB (~2 years).
When you strip MongoDB down to the parts that actually have a chance of working under load, you end up pretty close to a slow and unreliable version of redis.
Namely, Mongo demonstrably slows to a crawl when your working-set exceeds your available RAM. Thus both redis and mongo are to be considered in-memory databases whereas one of them is honest about it and the other not so much.
Likewise Mongo's advanced data structures demonstrably break down under load unless you craft your access pattern very carefully; i.e. growing records is a no-no, atomic updates (transactions) are a huge headache, writes starve reads by design, the map-reduce implementation halts the world, indexing halts the world, etc. etc.
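To illustrate why awkward atomic updates are such a headache, here is a toy simulation in plain Python (no real database or client library involved) of the classic lost-update problem that server-side atomic primitives exist to prevent:

```python
# Two "clients" each read the counter, then each write back read_value + 1.
store = {"counter": 0}

# Interleaved read-modify-write: both clients read before either writes.
a = store["counter"]          # client A reads 0
b = store["counter"]          # client B reads 0
store["counter"] = a + 1      # client A writes 1
store["counter"] = b + 1      # client B also writes 1: A's update is lost
assert store["counter"] == 1  # one increment silently vanished

# A server-side atomic increment (the kind of primitive redis's INCR or
# Mongo's $inc provide) never interleaves, so no update is lost.
store["counter"] = 0
for _ in range(2):
    store["counter"] += 1     # stands in for one atomic server-side op
assert store["counter"] == 2
```

When a datastore makes the atomic path hard to reach, applications quietly fall back on the interleaved pattern above, and the corruption only shows up under concurrent load.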
My argument is that the feature disparity between mongo and redis stems mostly from the fact that Antirez has better judgement about what can be made to work reliably and what cannot. This is why redis clearly states its scope and limits on the tin and performs like a Swiss watch within those bounds.
Mongo on the other hand promises the world and then degrades into a pile of rubble once you cross one of the various undocumented and poorly understood thresholds.
You know, I didn't think about how similar Redis and Mongo are at the core when I first read your comment. The first thing that jumped out at me was the large set of disparities.
Thanks for that explanation. I agree that Mongo seems to have over-promised and under-delivered and that you do have to really craft your access pattern. I'm not a heavy MongoDB user, but from reading the docs and playing around, I was already under the impression that the performance of MongoDB is entirely up to me and that I would need a lot of understanding to get the beast working well at scale.
EDIT: zzzeek made a good point below about spending time in a low-key mode before really selling the huge feature-set, which convinced me, so I think you're right. I do like the idea of Mongo though, so hopefully they can get through it.
there's something to be said for promoting an application proportionally to the maturity of its implementation. An application with a larger and more sprawling featureset would need to spend several years in "low key" mode, proving itself in production usage by a relatively low number of shops who treat it with caution. I think the issue here is one of premature overselling.
That's a fair problem, but I think it is true for other products as well, and was true for things that feel very solid today, like MySQL. In other words there is a tension between stability and speed of development, a very "hard" tension indeed. It is up to the developers' culture and sensibility to balance the two ingredients in the best way.
One of the reasons I don't want to create a company around Redis, but want to stay with VMware forever as an employee developing Redis, is that I don't want development pressures that are not driven by users and technical arguments. That way I can balance speed of development and stability as I (and the other developers) feel is right.
Without direct reference to 10gen, I guess this is harder when there is a product-focused company around the product (but I don't know how true this is for 10gen, as I don't follow the development and behavior of other NoSQL products very closely).
MySQL is a poor analogy because the history of MySQL is very similar to 10gen: a 'hacker' solution originally patched together by people who didn't take their responsibility as database engineers very seriously. It's only after years (decades) of work that MySQL has managed to catch up with database technology of the 80s in terms of reliability and stability (and it still has plenty of issues, as the most recent debacles with 5.5 show.)
On the other hand, commercial vendors like Oracle and open source projects like PostgreSQL recognize that their role as database engineers is to first and foremost "do no harm." I.e., the database should never destroy data, period. Bugs that do get released and cause such things can be traced back to issues that are not related to a reckless pursuit of other priorities like performance. Watching the PostgreSQL engineers agonize over data integrity and correctness with any and all features that go out that are meant to improve performance is a reassuring sight to behold.
This priority list goes without saying for professional database engineers. That there is such a 'tension' between stability and speed says less about a real phenomenon being debated by database engineers and more about the fact that many people who call themselves database engineers have about as much business doing so as so-called doctors who have not gone to medical school or taken the Hippocratic oath.
I agree with you, but my comments are more about describing what is going on, in my opinion, rather than what I think the right priority list should be. Even though I agree, I still recognize that MySQL had a much bigger effect on the database world compared to PostgreSQL, so the success of a database can sometimes take strange paths.
But I think a major difference between MySQL and Redis, MongoDB, Cassandra, and all the other NoSQL solutions out there is that MySQL had an impressive test bed: all the GPL LAMP applications, from forums to blogs, shipped and used by a shitload of users. We miss this "database gym", so these new databases are evolving in small companies or other more serious production environments, and this creates all sorts of problems if they are not stable enough in the first place.
So what you say can be more important for the new databases than it was for MySQL indeed.
> MySQL had a much bigger effect on the database world compared to PostgreSQL
And if MySQL had never existed, what would have happened? Would we all have used PostgreSQL in the first place and avoided years of painful instability?
I read here all the time that fashion and ease of use are more attractive than reliability. And we introduce plenty of new software into complex architectures just because it is easy to use. We even introduce things like "eventual consistency", as if being eventually consistent were even an option for any business.
The solution is to not use random datastores. Use a database that has a proven record of stability. And if someone builds a database, he/she must prove that the ACID rules are taken seriously, not work around the CAP theorem with timestamps...
10 years ago, MySQL was not stable. PostgreSQL was. Today, most key-value databases are not stable, PostgreSQL is.
Interesting to note is that early versions of Postgres, we're talking the pre-6 versions around 1995 here, were awful. Not that I was a very sophisticated user at that time myself, but it definitely ate my data back then; we switched to MSQL at that time, which at least didn't do that.
That was 16 years ago. Since then, PostgreSQL engineers spent a LOT of time proving the reliability of their engine. And today, 16 years later, we can consider it reliable.
Most key-value databases didn't prove (as in: show me actual resistance tests, not "supercompany123 uses it") that they are reliable. The day they do, I'll be the first one to use them. Until then, it's just a toy for devs who don't want to deal with ER models.
You misunderstand me. I LOVE postgresql. It is the best database ever and I try to use it as much as possible. My only point was that they started out as unstable and untrustworthy, just like anything else would.
I agree. There was no WAL logging, for instance. Most people consider 7.4 the first actually-possibly-not-a-terrible-idea release.
Then again, Postgres -- the project -- did not try to position itself (was there even such a thing as "positioning" for Postgres 16 years ago?) as a mature, stable project that one would credibly bet one's business on.
Lots of early database releases are going to be like Mongo, the question is how much the parties at play own up to the fact that their implementation is still immature and present that starkly real truth to their customers. So far, it seems commercial vendors are less likely to do that.
Well, 8.0 is really the first really good release.
However, actually-not-a-terrible-idea is pretty relative, when you look at how the industry has evolved in the mean time. I mean, compared to MySQL at the time, PostgreSQL 6.5 was really not a terrible idea. 7.3 was the first release I didn't have to use MySQL as a prototyping system though.
PostgreSQL has evolved a LOT in the last decade even. I thought the university project was looking at OO paradigms in relational databases (inheritance between relations and the like).
The change from Postgres to PostgreSQL was largely a UI/API change and the move from QUEL to SQL. However, over time virtually all of the software has been reviewed and rewritten. It's an excellent project, and I have been using it since 6.5.
> And if MySQL never existed, what would have happened ? Would we have all used PostgreSQL in the first place and avoided years of painful instability ?
I think you're missing the point a little. Yes, MySQL is a heap, and having to work with it in a Postgres world sucks. But, the point antirez is making in that comment (at least how I read into it) is that an active user community in ANY project is hugely important in that project's formation and "maturity" (sarcastically, of course, because Postgres is clearly more mature than MySQL). There's no extrapolation here to the top-level Mongo discussion going on in this thread -- I was just clarifying antirez's point.
I still think that solid engineering on any project begins with the engineering and leadership of a few, and the feedback of many. So yes, community is important, but less important than the core of that community which is necessarily small.
> IMHO it is a good idea if programmers learn to test very well the systems they are going to use ...
Great point. It would also help if the company that makes a DB would put a flashing banner on their page to explain the trade-offs in their product. Such as "we don't have single-server durability built in by default".
I understand if they are selling dietary supplements and touting how users will acquire magic properties by trying the product for 5 easy payments of $29.99. In other words, I expect shady bogus claims there. But these people are marketing software, not to end users, but to other developers. A little honesty won't hurt. It is not bad that they had durability turned off. It is just a choice, and it is fine. What is not fine is not making that clear on the front page.
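For what it's worth, the trade-off behind "no single-server durability by default" is easy to show in code. A sketch under my own assumptions (plain Python file I/O standing in for a database's write path; the `write_record` helper and file names are invented):

```python
import os
import tempfile

def write_record(path, data, durable=False):
    """Append a record; only fsync when durability is requested."""
    with open(path, "a") as f:
        f.write(data + "\n")
        if durable:
            f.flush()
            os.fsync(f.fileno())  # data reaches the disk before we return
        # Without fsync, the record may sit in OS buffers; a crash now
        # loses it even though the "write" already returned success.

path = os.path.join(tempfile.mkdtemp(), "journal.log")
write_record(path, "order:1001", durable=False)  # fast, fire-and-forget
write_record(path, "order:1002", durable=True)   # slower, survives a crash
```

Neither choice is wrong; the non-durable path is legitimately much faster. The issue is only whether the default is stated plainly where developers will see it.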
It's good to see a voice of reason. I think we all win if NoSQL is allowed to survive. Having multiple paths to modeling and designing our applications is an enrichment of our ability to create interesting and valuable applications in our industry. The last 10 years have been about living under the modeling constraints of RDBMSs, and the industry is slowly waking up to the realization that it does not need to be like this. Now we have choices: graph DBs, column DBs, document DBs, etc.
I would like to thank you for the great job you have done and are doing on Redis. It's an awesome piece of technology and it warms my heart as a European :). Are you based in Palermo?
"Allowed to survive" is the wrong approach. "Finds a niche" is better.
What software engineers need to understand is that NoSQL is in no way a replacement for SQL in areas of data with inherent structure. In such areas, the relational model wins hands-down, and NoSQL is a big, heavy foot-gun. The caliber of the foot-gun goes up significantly when multiple applications need to access the same data.
On the other hand, the relational model breaks down in some ways in many areas. Some things that you'd think are inherently structured (like world-wide street addresses) turn out to only be semi-structured. Document management, highly performing hierarchical directory stores, and a few other areas also are bad matches for the relational model. Other stores work well in many of these areas, from the filesystem to things like NoSQL databases.
The big problem occurs when semi-structured data (say, files which contain printed invoice data in PDF format) has to be linked to inherently structured data (say, vendor invoices). In these cases, tradeoffs have to be made.
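A tiny sketch of that trade-off (using sqlite3 so it stays self-contained; the table name, columns, and sample data are all invented): the structured invoice fields get real columns and constraints, while the semi-structured PDF is reduced to an opaque blob plus whatever metadata you can extract:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE vendor_invoices (
        id INTEGER PRIMARY KEY,
        vendor TEXT NOT NULL,
        amount_cents INTEGER NOT NULL,  -- structured: queryable, constrained
        pdf_blob BLOB                   -- semi-structured: opaque to SQL
    )
""")
conn.execute(
    "INSERT INTO vendor_invoices (vendor, amount_cents, pdf_blob)"
    " VALUES (?, ?, ?)",
    ("ACME Corp", 12999, b"%PDF-1.4 ..."),  # fake PDF bytes for the sketch
)
# The relational side can aggregate and constrain the structured columns...
total, = conn.execute(
    "SELECT SUM(amount_cents) FROM vendor_invoices"
).fetchone()
# ...but the PDF content itself stays invisible to the query engine.
```

Whichever store holds the blob, the linkage itself (which document belongs to which invoice) is inherently structured data, which is why it tends to end up back in the relational side.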
I have no doubt that NoSQL is able to find a niche. I doubt that niche will involve much inherently structured data, though.
> I think we all win if NoSQL is allowed to survive.
What does that even mean? Is it some sort of cultural practice or religion we are afraid of losing? Should we overlook lost data and bad designs just because something falls under the "NoSQL" category?
I think anyone married to a technology like it is a religion is poised for failure. Technology should be evaluated as a tool. "Is this tool useful to me for this job?" Yes/No? Not "it has NoSQL in its title, it must be good, I'll use that".