I appreciate the "public service" intent of this blog post, however: 1) It is wr...

moe · on Nov 6, 2011

1) It is wrong to evaluate a system for bugs now fixed

I disagree. A project's errata are a very good indicator of the overall quality of the code and the team. If a database system's history is littered with deadlock, data-corruption and data-loss bugs up to the present day, then that's telling a story.

2) A few of the problems claimed are hard to verify

The particular bugs mentioned in an anonymous pastie may be hard to verify. However, the number of elaborate horror-stories from independent sources adds up.

3) New systems fail, especially if they are developed in the current NoSQL arena

Bullshit. You, personally, are demonstrating the opposite with redis which is about the same age as MongoDB (~2 years).

catwell · on Nov 7, 2011

> Bullshit. You, personally, are demonstrating the opposite with redis which is about the same age as MongoDB (~2 years).

Apparently you have no idea how many critical bugs have been fixed in Redis...

WayneDB · on Nov 6, 2011

I agree with your responses to 1 and 2. I take issue with the example for 3 though because Redis is nowhere near the complexity or feature set of MongoDB.

moe · on Nov 6, 2011

I don't think that counts as an argument.

When you strip MongoDB down to the parts that actually have a chance of working under load then you end up pretty close to a slow and unreliable version of redis.

Namely, Mongo demonstrably slows to a crawl when your working-set exceeds your available RAM. Thus both redis and mongo are to be considered in-memory databases whereas one of them is honest about it and the other not so much.
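To make the working-set point concrete, here is a toy simulation. It is not Mongo, just a plain LRU page cache under uniform random access, and `hit_rate` and its parameters are names made up for this sketch. The assumption is that every miss costs a disk read, so the hit rate is a rough stand-in for how much of the load stays in RAM.

```python
from collections import OrderedDict
import random

def hit_rate(cache_pages, working_set_pages, accesses=20_000, seed=0):
    """Fraction of page accesses served from an LRU cache of `cache_pages` slots."""
    rng = random.Random(seed)
    cache = OrderedDict()  # page id -> True, ordered from least to most recently used
    hits = 0
    for _ in range(accesses):
        page = rng.randrange(working_set_pages)  # uniform access over the working set
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True             # "read from disk" and cache it
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / accesses

# Hypothetical sizes: RAM holds 1000 pages, the working set grows past it.
for working_set in (500, 1000, 1200, 2000, 4000):
    print(working_set, round(hit_rate(1000, working_set), 3))
```

As long as the working set fits, the only misses are the cold ones. Once it no longer fits, a growing share of accesses goes to disk, and since a disk read is orders of magnitude slower than a memory read, throughput falls off much faster than the hit rate alone suggests.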

Likewise, Mongo's advanced data structures demonstrably break down under load unless you craft your access pattern very carefully; i.e. growing records is a no-no, atomic updates (transactions) are a huge headache, writes starve reads by design, the map-reduce impl halts the world, indexing halts the world, etc. etc.

My argument is that the feature disparity between mongo and redis stems mostly from the fact that Antirez has better judgement over what can be made to work reliably and what cannot. This is why redis clearly states its scope and limits on the tin and performs like a swiss watch within those bounds.

Mongo on the other hand promises the world and then degrades into a pile of rubble once you cross one of the various undocumented and poorly understood thresholds.

j_baker · on Nov 6, 2011

If I recall correctly, mongo only requires that the index gets stored in memory. The actual data itself can go on disk.

kanwisher · on Nov 7, 2011

If you actually use Mongo in practice, everything needs to be in RAM to have any kind of performance.

obfuscate · on Nov 6, 2011

It requires neither.

willvarfar · on Nov 6, 2011

facepalm. Indices on disk is a solved problem.

WayneDB · on Nov 6, 2011

You know, I didn't think about how similar Redis and Mongo are at the core when I first read your comment. The first thing that jumped out at me was the large set of disparities.

Thanks for that explanation. I agree that Mongo seems to have over-promised and under-delivered and that you do have to really craft your access pattern. I'm not a heavy MongoDB user, but from reading the docs and playing around, I was already under the impression that the performance of MongoDB is entirely up to me and that I would need a lot of understanding to get the beast working well at scale.

So, it's a tough call for me to say whether they over-promised or not, but like I said...I'm not a heavy user. I just read a lot. I do think it is easy to be deceived by Mongo's apparent simplicity (i.e. usage of JSON, JavaScript, schema-lessness, etc.).

EDIT: zzzeek made a good point below about spending time in a low-key mode before really selling the huge feature-set, which convinced me, so I think you're right. I do like the idea of Mongo though, so hopefully they can get through it.

zzzeek · on Nov 6, 2011

there's something to be said for promoting an application proportionally to the maturity of its implementation. An application with a larger and more sprawling featureset would need to spend several years in "low key" mode, proving itself in production usage by a relatively low number of shops who treat it with caution. I think the issue here is one of premature overselling.

WayneDB · on Nov 6, 2011

Good point.

gfodor · on Nov 6, 2011

At the end of the post the author notes his concern isn't with the technical bugs per se, but with the deep rooted cultural problems and misplaced priorities the existence of those problems reveal.

antirez · on Nov 6, 2011

That's a fair problem, but I think it is true for other products as well and was true for things that we feel are very solid today, like MySQL. In other words there is a tension between stability and speed of development, a very "hard" tension indeed. It is up to the developers' culture and sensibility to balance the two ingredients in the best way.

One of the reasons I don't want to create a company around Redis, but want to stay with VMware forever as an employee developing Redis, is that I don't want development pressures that are not driven by users and technical arguments. That way I can balance speed of development and stability as I (and the other developers) feel right.

Without direct reference to 10gen I guess this is harder when there is a product-focused company around the product (but I don't know how true this is for 10gen as I don't follow very closely the development and behavior of other NoSQL products).

gfodor · on Nov 6, 2011

MySQL is a poor analogy because the history of MySQL is very similar to 10gen: a 'hacker' solution originally patched together by people who didn't take their responsibility as database engineers very seriously. It's only after years (decades) of work that MySQL has managed to catch up with database technology of the 80s in terms of reliability and stability (and it still has plenty of issues, as the most recent debacles with 5.5 show.)

On the other hand, commercial vendors like Oracle and open source projects like PostgreSQL recognize their role as database engineers is to first and foremost "do no harm." I.e., the database should never destroy data, period. Bugs that get released that do cause such things can be traced back to issues that are not related to a reckless pursuit of other priorities like performance. Watching the PostgreSQL engineers agonize over data integrity and correctness with any and all features that go out that are meant to improve performance is a reassuring sight to behold.

This priority list goes without saying for professional database engineers. That there is such a 'tension' between stability and speed says less about a real phenomenon being debated by database engineers and more about the fact that many people who call themselves database engineers have about as much business doing so as so-called doctors who have not gone to medical school or taken the Hippocratic oath.

antirez · on Nov 6, 2011

I agree with you, but my comments are more about describing what is going on, in my opinion, than about saying what I think the right priority list should be. Even if I agree, I still recognize that MySQL had a much bigger effect on the database world than PostgreSQL, so the success of a database can sometimes take strange paths.

But I think a major difference between MySQL and Redis, MongoDB, Cassandra, and all the other NoSQL solutions out there is that MySQL had an impressive test bed: all the GPL LAMP applications, from forums to blogs, shipped and used by a shitload of users. We miss this "database gym", so these new databases are evolving in small companies or other more serious production environments, and this creates all sorts of problems if they are not stable enough in the first place.

So what you say can be more important for the new databases than it was for MySQL indeed.

jvehent · on Nov 6, 2011

> MySQL had a much bigger effect to the database world compared to PostgreSQL

And if MySQL never existed, what would have happened? Would we have all used PostgreSQL in the first place and avoided years of painful instability?

I read here all the time that fashion and ease of use are more attractive than reliability. And we introduce plenty of new software into complex architectures just because it is easy to use. We even introduce things like "eventual consistency", as if being eventually consistent was even an option for any business.

The problem is to not use random datastores. Use a database that has a proven record of stability. And if someone builds a database, he/she must prove that ACID rules are taken seriously, and not work around the CAP theorem with timestamps...
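To illustrate the timestamp complaint, here is a generic last-write-wins sketch. It is not the conflict resolution of any particular database; `Versioned`, `lww_merge` and the 5-second clock skew are all assumptions picked for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time reported by the node that accepted the write

def lww_merge(a, b):
    """Last-write-wins: keep whichever version carries the larger timestamp."""
    return a if a.timestamp >= b.timestamp else b

# Assumed scenario: node A's clock runs 5 seconds fast, node B's clock is accurate.
true_time_first, true_time_second = 100.0, 102.0
write_on_a = Versioned("old balance", true_time_first + 5.0)  # really happened first
write_on_b = Versioned("new balance", true_time_second)       # really happened 2s later

winner = lww_merge(write_on_a, write_on_b)
print(winner.value)  # prints "old balance"
```

The replicas do converge, so the store is "eventually consistent", but they converge on the older write and the newer one is silently dropped. That is the kind of failure that never shows up in a benchmark.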

10 years ago, MySQL was not stable. PostgreSQL was. Today, most key-value databases are not stable, PostgreSQL is.

zzzeek · on Nov 6, 2011

Interesting to note is that early versions of Postgres, we're talking the pre-6 versions around 1995 here, were awful. Not like I was a very sophisticated user at that time myself but it definitely ate my data back then - we switched to MSQL at that time which at least didn't do that.

marshray · on Nov 6, 2011

Wasn't it still basically a university project for researching MVCC at that point? I love universities of course but we must admit they produce interestingly-architected abandonware sometimes.

My sense was that it got a pretty thorough review and revision/rewrite in the transition from Postgres to PostgreSQL.

einhverfr · on Nov 6, 2011

PostgreSQL has evolved a LOT in the last decade even. I thought the university project was looking at OO paradigms in relational databases (inheritance between relations and the like).

The change from Postgres to PostgreSQL was largely a UI/API change and the move from QUEL to SQL. However, over time virtually all of the software has been reviewed and rewritten. It's an excellent project, and I have been using it since 6.5.......

jvehent · on Nov 6, 2011

That was 16 years ago. Since then, PostgreSQL engineers spent a LOT of time proving the reliability of their engine. And today, 16 years later, we can consider it reliable.

Most key-value databases didn't prove (as in: show me actual resistance tests, not supercompany123 uses it) that they are reliable. The day they do, I'll be the first one to use them. Until then, it's just a toy for devs who don't want to deal with ER models.

zzzeek · on Nov 6, 2011

you misunderstand me. I LOVE postgresql. It is the best database ever and I try to use it as much as possible. My only point was, they started out as unstable and untrustworthy just like anything else would.

fdr · on Nov 6, 2011

I agree. There was no WAL logging, for instance. Most people consider 7.4 the first actually-possibly-not-a-terrible-idea release.
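For anyone who hasn't looked at what a write-ahead log buys you, here is a minimal sketch of the idea. It is a toy, not how Postgres implements it: `TinyWAL` and its JSON-lines file format are invented for illustration. The rule is simply that a change is appended and fsynced to the log before it is applied, so a restart can rebuild state by replaying the log.

```python
import json
import os
import tempfile

class TinyWAL:
    """Toy key-value store that logs every write durably before applying it."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._replay()  # recovery: rebuild state from whatever reached the log
        self.log = open(path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    key, value = json.loads(line)
                except ValueError:
                    break  # torn final record from a crash mid-write: stop here
                self.data[key] = value

    def put(self, key, value):
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # the record must be on disk first
        self.data[key] = value       # only then apply the change in memory

path = os.path.join(tempfile.mkdtemp(), "wal.log")
db = TinyWAL(path)
db.put("a", 1)
db.put("a", 2)

# Simulate a crash: forget the in-memory state and recover from the log alone.
recovered = TinyWAL(path)
print(recovered.data)  # prints {'a': 2}
```

Without the log (or with the fsync skipped), an acknowledged write can exist only in memory or in the OS page cache when the machine dies, which is exactly the "ate my data" experience.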

Then again, Postgres -- the project -- did not try to position itself (was there even such a thing as "positioning" for Postgres 16 years ago?) as a mature, stable project that one would credibly bet one's business on.

Lots of early database releases are going to be like Mongo, the question is how much the parties at play own up to the fact that their implementation is still immature and present that starkly real truth to their customers. So far, it seems commercial vendors are less likely to do that.

einhverfr · on Nov 7, 2011

Well, 8.0 is really the first really good release.

However, actually-not-a-terrible-idea is pretty relative when you look at how the industry has evolved in the meantime. I mean, compared to MySQL at the time, PostgreSQL 6.5 was really not a terrible idea. 7.3 was the first release where I didn't have to use MySQL as a prototyping system, though.

And with 9.x things are getting even better.

ajsharp · on Nov 6, 2011

> And if MySQL never existed, what would have happened? Would we have all used PostgreSQL in the first place and avoided years of painful instability?

I think you're missing the point a little. Yes, MySQL is a heap, and having to work with it in a Postgres world sucks. But the point antirez is making in that comment (at least as I read it) is that an active user community in ANY project is hugely important to that project's formation and "maturity" (sarcastically, of course, because Postgres is clearly more mature than MySQL). There's no extrapolation here to the top-level Mongo discussion going on in this thread -- I was just clarifying antirez's point.

einhverfr · on Nov 7, 2011

I still think that solid engineering on any project begins with the engineering and leadership of a few, and the feedback of many. So yes, community is important, but less important than the core of that community which is necessarily small.

lurker17 · on Nov 8, 2011

"eventual consistency" was promoted by Amazon, which seems to run a pretty good business.

JonM · on Nov 6, 2011

Where can I get info on the MySQL 5.5 issues? I'm considering upgrading from 5.1 to get the new InnoDB plugin...

asianexpress · on Nov 6, 2011

Just in case you weren't already aware, you can use the InnoDB plugin in 5.1 http://dev.mysql.com/doc/innodb-plugin/1.0/en/index.html

I know benchmarks don't put this quite as fast as 5.5, but there are still possible gains to be made.

willvarfar · on Nov 6, 2011

Pssst look at tokudb

rdtsc · on Nov 6, 2011

> IMHO it is a good idea if programmers learn to test very well the systems they are going to use ...

Great point. It would also help if the company that makes a DB would put a flashing banner on their page to explain the trade-offs in their product. Such as "we don't have single server durability built in as a default".

I would understand it if they were selling dietary supplements and touting how users will acquire magic properties for trying the product for 5 easy payments of $29.99. In other words, I expect shady bogus claims there. But these people are marketing software, not to end users, but to other developers. A little honesty won't hurt. It is not bad that they had durability turned off. It is just a choice, and it is fine. What is not fine is not making that clear on the front page.

christkv · on Nov 6, 2011

It's good to see a voice of reason. I think we all win if NoSQL is allowed to survive. Having multiple paths to modeling and designing our applications is an enrichment of our ability to create interesting and valuable applications in our industry. The last 10 years have been about living under the modeling constraints of RDBMSs and the industry is slowly waking up to the realization that it does not need to be like this. Now we have choices: graph DBs, column DBs, document DBs, etc.

I would like to thank you for the great job you have done and are doing on Redis. It's an awesome piece of technology and warms my heart as a European :). Are you based in Palermo?

einhverfr · on Nov 6, 2011

"Allowed to survive" is the wrong approach. "Finds a niche" is better.

The fact that software engineers need to understand is that NoSQL is in no way a replacement for SQL in areas of data with inherent structure. In such areas, the relational model wins hands-down, and NoSQL is a big, heavy foot-gun. The caliber of the foot gun goes up significantly when multiple applications need to access the same data.

On the other hand, the relational model breaks down in some ways in many areas. Some things that you'd think are inherently structured (like world-wide street addresses) turn out to only be semi-structured. Document management, highly performing hierarchical directory stores, and a few other areas also are bad matches for the relational model. Other stores work well in many of these areas, from the filesystem to things like NoSQL databases.

The big problem occurs when semi-structured data (say files which contain printed invoice data in PDF format) have to be linked to inherently structured data (say, vendor invoices). In these cases, tradeoffs have to be made......

I have no doubt that NoSQL is able to find a niche. I doubt it will be one that involves inherently structured data.

rdtsc · on Nov 6, 2011

> I think we all win if NoSQL is allowed to survive.

What does that even mean? Is it some sort of cultural practice or religion we are afraid of losing? So we should overlook lost data and bad designs just because something falls under the "NoSQL" category?

I think anyone married to a technology like it is a religion is poised for failure. Technology should be evaluated as a tool. "Is this tool useful to me for this job?" Yes/No? Not "it has NoSQL in its title, it must be good, I'll use that".