1) It is wrong to judge a system by bugs that are now fixed (though you can judge a software development process this way; that is not the same as judging MongoDB itself, since the bugs got fixed).
2) A few of the claimed problems, like subsystems crashing, are hard to verify, but users could confirm or deny them just by looking at the mailing list, if MongoDB has a mailing list like the Redis one, which is run by an external company (Google) and where people outside 10gen have the ability to moderate messages. (For instance, on the Redis list two guys from Citrusbytes can view and moderate messages, so even if Pieter and I wanted to remove a message that is bad advertising, we could not do it reliably.)
3) New systems fail, especially when they are developed in the current NoSQL arena, which is of course also full of pressure to win users ASAP (in other words, pushing new features fast is so important that stability sometimes suffers). I can see this myself: even though my group at VMware is very focused on telling me that shipping Redis as stable as possible is the first rule, I sometimes get pressure from the user base itself to release new stuff ASAP.
IMHO it is a good idea for programmers to learn to test the systems they are going to use very well, with simulations of the intended use case. Never listen to the hype, nor to the detractors.
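A minimal sketch of the kind of use-case simulation suggested above. A plain dict stands in for the datastore client (an assumption made so the example is self-contained); for a real evaluation you would swap in the client of the system under test, plus your real key/value sizes and read/write mix.

```python
import random
import time

def simulate(store, n_ops=10_000, value_size=512, read_ratio=0.9):
    """Run a mixed read/write workload and report rough latency stats."""
    payload = "x" * value_size
    keys = [f"key:{i}" for i in range(1000)]
    latencies = []
    for _ in range(n_ops):
        key = random.choice(keys)
        start = time.perf_counter()
        if random.random() < read_ratio:
            store.get(key)           # read path (may miss early on)
        else:
            store[key] = payload     # write path
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "ops": n_ops,
        "p50": latencies[len(latencies) // 2],
        "p99": latencies[int(len(latencies) * 0.99)],
    }

print(simulate({}))
```

The point is not the numbers a dict produces, but the harness: run it against the candidate database with your own access pattern before you commit to it.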
On the other hand, all these stories keep me motivated to be conservative in the development of Redis and to avoid bloat and things I think will ultimately suck in the context of Redis (like VM and diskstore, two projects I abandoned).
I disagree. A project's errata is a very good indicator of the overall quality of the code and the team. If a database system's history is littered with deadlock, data-corruption, and data-loss bugs up to the present day, then that's telling a story.
2) A few of the problems claimed are hard to verify
The particular bugs mentioned in an anonymous pastie may be hard to verify. However, the number of elaborate horror-stories from independent sources adds up.
3) New systems fail, especially if they are developed in the current NoSQL arena
Bullshit. You, personally, are demonstrating the opposite with Redis, which is about the same age as MongoDB (~2 years).
Apparently you have no idea how many critical bugs have been fixed in Redis...
When you strip MongoDB down to the parts that actually have a chance of working under load, you end up pretty close to a slow and unreliable version of Redis.
Namely, Mongo demonstrably slows to a crawl when your working set exceeds your available RAM. Thus both Redis and Mongo are to be considered in-memory databases; one of them is honest about it and the other not so much.
Likewise, Mongo's advanced data structures demonstrably break down under load unless you craft your access pattern very carefully: growing records is a no-no, atomic updates (transactions) are a huge headache, writes starve reads by design, the map-reduce implementation halts the world, indexing halts the world, and so on.
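The "atomic updates are a huge headache" item is the classic lost-update problem. A tiny, deterministic sketch (a plain dict standing in for the database, purely illustrative) shows why naive read-modify-write against any shared store is dangerous:

```python
# Two "clients" race on one counter; the dict stands in for the database.
store = {"counter": 0}

# Each client reads the current value...
seen_by_a = store["counter"]
seen_by_b = store["counter"]

# ...and each writes back its own increment, clobbering the other's.
store["counter"] = seen_by_a + 1
store["counter"] = seen_by_b + 1

assert store["counter"] == 1  # one increment was lost, not 2

# Server-side atomic operators (Mongo's $inc, Redis's INCR, SQL's
# UPDATE ... SET n = n + 1) sidestep the race: the read and the write
# happen as a single step inside the database.
```

This is why the safe Mongo patterns mentioned above lean so heavily on in-place update operators and carefully planned document layouts.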
My argument is that the feature disparity between Mongo and Redis stems mostly from the fact that Antirez has better judgement about what can be made to work reliably and what cannot. This is why Redis clearly states its scope and limits on the tin and performs like a Swiss watch within those bounds.
Mongo on the other hand promises the world and then degrades into a pile of rubble once you cross one of the various undocumented and poorly understood thresholds.
Thanks for that explanation. I agree that Mongo seems to have over-promised and under-delivered and that you do have to really craft your access pattern. I'm not a heavy MongoDB user, but from reading the docs and playing around, I was already under the impression that the performance of MongoDB is entirely up to me and that I would need a lot of understanding to get the beast working well at scale.
EDIT: zzzeek made a good point below about spending time in a low-key mode before really selling the huge feature-set, which convinced me, so I think you're right. I do like the idea of Mongo though, so hopefully they can get through it.
One of the reasons I don't want to create a company around Redis, and want to stay at VMware forever as an employee developing Redis, is that I don't want development pressures that are not driven by users and technical arguments. That way I can balance speed of development and stability as I (and the other developers) see fit.
Without referring directly to 10gen, I guess this is harder when there is a product-focused company around the product (but I don't know how true this is for 10gen, as I don't follow the development and behavior of other NoSQL products very closely).
On the other hand, commercial vendors like Oracle and open source projects like PostgreSQL recognize that their role as database engineers is first and foremost to "do no harm"; i.e., the database should never destroy data, period. When bugs that cause such things do get released, they can be traced back to issues that are not a reckless pursuit of other priorities like performance. Watching the PostgreSQL engineers agonize over data integrity and correctness for any feature that ships to improve performance is a reassuring sight to behold.
This priority list goes without saying for professional database engineers. That there is such a 'tension' between stability and speed says less about a real phenomenon being debated by database engineers and more about the fact that many people who call themselves database engineers have about as much business doing so as so-called doctors who have not gone to medical school or taken the Hippocratic oath.
But I think a major difference between MySQL and Redis, MongoDB, Cassandra, and all the other NoSQL solutions out there is that MySQL had an impressive test bed: all the GPL LAMP applications, from forums to blogs, shipped and used by a shitload of users. These new databases lack this "database gym," so they are evolving inside small companies or other more serious production environments, and this creates all sorts of problems if they are not stable enough in the first place.
So what you say may indeed matter even more for the new databases than it did for MySQL.
And if MySQL had never existed, what would have happened? Would we all have used PostgreSQL in the first place and avoided years of painful instability?
I read here all the time that fashion and ease of use are more attractive than reliability, and we introduce plenty of new software into complex architectures just because it is easy to use. We even introduce things like "eventual consistency," as if being eventually consistent were even an option for any business.
The answer is not to use random datastores. Use a database that has a proven record of stability. And if someone builds a database, he or she must prove that the ACID rules are taken seriously, not work around the CAP theorem with timestamps...
10 years ago, MySQL was not stable. PostgreSQL was. Today, most key-value databases are not stable, PostgreSQL is.
My sense was that it got a pretty thorough review and revision/rewrite in the transition from Postgres to PostgreSQL.
The change from Postgres to PostgreSQL was largely a UI/API change and the move from QUEL to SQL. However, over time virtually all of the software has been reviewed and rewritten. It's an excellent project, and I have been using it since 6.5.
Most key-value databases have not proved (as in: show me actual resistance tests, not "supercompany123 uses it") that they are reliable. The day they do, I'll be the first to use them. Until then, they are just toys for devs who don't want to deal with ER models.
Then again, Postgres -- the project -- did not try to position itself (was there even such a thing as "positioning" for Postgres 16 years ago?) as a mature, stable project that one would credibly bet one's business on.
Lots of early database releases are going to be like Mongo, the question is how much the parties at play own up to the fact that their implementation is still immature and present that starkly real truth to their customers. So far, it seems commercial vendors are less likely to do that.
However, "actually not a terrible idea" is pretty relative when you look at how the industry has evolved in the meantime. I mean, compared to MySQL at the time, PostgreSQL 6.5 was really not a terrible idea. 7.3 was the first release where I didn't have to use MySQL as a prototyping system, though.
And with 9.x things are getting even better.
I think you're missing the point a little. Yes, MySQL is a heap, and having to work with it in a Postgres world sucks. But, the point antirez is making in that comment (at least how I read into it) is that an active user community in ANY project is hugely important in that project's formation and "maturity" (sarcastically, of course, because Postgres is clearly more mature than MySQL). There's no extrapolation here to the top-level Mongo discussion going on in this thread -- I was just clarifying antirez's point.
I know benchmarks don't put this quite as fast as 5.5, but there are still possible gains to be made.
Great point. It would also help if the companies that make these databases put a flashing banner on their pages explaining the trade-offs in their products, such as "we don't have single-server durability enabled by default."
I understand it when they are selling dietary supplements and touting how users will acquire magic properties by trying the product for 5 easy payments of $29.99. In other words, I expect shady, bogus claims there. But these people are marketing software, not to end users, but to other developers. A little honesty won't hurt. It is not bad that they had durability turned off; it is just a choice, and that's fine. What is not fine is not making that clear on the front page.
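For context, the durability default being criticized here is a per-connection setting; in later MongoDB drivers you can request acknowledged, journaled writes explicitly through connection-string options (a config sketch; host and database names are made up):

```
# MongoDB connection-string options governing durability:
#   w=1           -- wait for the primary to acknowledge each write
#   journal=true  -- wait until the write has reached the on-disk journal
mongodb://db.example.com:27017/mydb?w=1&journal=true
```

The complaint above is not that such knobs exist, but that the default (and its consequences) was not advertised up front.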
I would like to thank you for the great job you have done, and are doing, on Redis. It's an awesome piece of technology, and it warms my heart as a European :). Are you based in Palermo?
What software engineers need to understand is that NoSQL is in no way a replacement for SQL in areas where the data has inherent structure. In such areas, the relational model wins hands down, and NoSQL is a big, heavy foot-gun. The caliber of the foot-gun goes up significantly when multiple applications need to access the same data.
On the other hand, the relational model itself breaks down in many areas. Some things that you'd think are inherently structured (like worldwide street addresses) turn out to be only semi-structured. Document management, high-performance hierarchical directory stores, and a few other areas are also bad matches for the relational model. Other stores, from the filesystem to NoSQL databases, work well in many of these areas.
The big problem occurs when semi-structured data (say, files containing printed invoice data in PDF format) has to be linked to inherently structured data (say, vendor invoices). In these cases, trade-offs have to be made.
I have no doubt that NoSQL will find a niche. I doubt it will be one that involves inherently structured data.
What does that even mean? Is it some sort of cultural practice or religion we are afraid of losing? Should we overlook lost data and bad designs just because something falls under the "NoSQL" category?
I think anyone married to a technology as if it were a religion is poised for failure. Technology should be evaluated as a tool: "Is this tool useful to me for this job?" Yes or no? Not "it has NoSQL in its title, it must be good, I'll use that."