
MongoDB 2.0 Should Have Been 1.0 - joshuacc
http://luigimontanez.com/2011/mongodb-2.0-should-have-been-1.0/
======
rkalla
I don't intend this comment to be an insightful deconstruction of the NoSQL
space and/or Mongo... but does anyone else notice that the level of energy
around a project (positive or negative) _usually_ indicates its progress along
the hype-dissolution cycle[1]?

I try not to take the actual comments or articles as law, but rather use them
as a temperature reading to figure out where in the slope we are currently for
a given tech or trend.

Given that Mongo-talk has absolutely dominated HN over the weekend as well as
reddit/r/programming, I am interpreting this as being in the bottom of the
dissolution curve right now.

It is this point where the community push-back and temporary "hating on"
forces the team to go into overdrive, addressing whatever pain points the
community has griped on the loudest, in this case:

    
    
      - Write locks
      - Durability / Replication consistency (already addressed)
    

This is like tempering steel by pounding on it... but instead it is the
community pounding on the people over at 10gen. That sucks, but this rite of
passage for them will see sweeter days on the other side.

I imagine Mongo 3.0 will represent the final climb out of the dissolution
curve where all open complaints have been addressed and we actually get back
to solving problems with the technology.

I don't know that Cassandra or Redis or CouchDB have completed their hype-
cycles yet, because they haven't had the hyper-aggressive response from the
community during the dissolution step. They are all popular and well-liked,
but it seems like their popularity is still climbing.

It is all interesting nonetheless. Mongo is a huge success, regardless of
how many of these articles are written.

I've not seen a team as dedicated and involved as 10gen in a long time;
Eliot still answers 100s of messages on the group every week (along with
every new member of the team) -- which I find the biggest indicator of Mongo's
future success. If the CTO is carving out that much time during the day to
stay involved, while still bug fixing, replying to posts like these and
testing bug reports... that's a lot of love right there.

[1] <http://en.wikipedia.org/wiki/Hype_cycle>

~~~
socratic
This seems like an odd analysis if you mean that MongoDB is hitting the trough
of disillusionment. MongoDB, Cassandra, HBase, and Redis all came out at
roughly the same time (2008, 2009) according to Wikipedia and their project
pages. Is there a reason they would be on totally different hype cycles?

As far as I can tell, no one has hated on Redis or HBase (except for the brief
period when antirez tried to add VM to Redis) because they both (a) work and
(b) solve real use cases. Has there been any suggestion that Redis or HBase
lose data?

However, maybe you are right in a more general sense. Do you think that the
idea of NOSQL itself is reaching the trough of disillusionment? Are we seeing
a shake out of which of these data stores are actually designed by people who
know what they are doing, both in (database) theory and in (systems coding)
practice?

~~~
rkalla
Oh I do mean that -- I don't think their time in existence affects the rate at
which you move along the hype cycle, I think popularity and deployment does.

I would say Mongo is the most popular NoSQL data store at the moment, whether
in mindshare or deployments, and that is what caused it to move along the
cycle so much faster.

I don't mean to detract from any of the other NoSQL projects; they don't have
the marketing or manpower budget that 10gen has so I wouldn't expect them to
be at the same place in the cycle. MongoDB came on the scene with the only
NoSQL solution that promised SQL-esque queries, an insane orders-of-magnitude
jump in performance AND a big commercial company behind it. To anyone trying to
understand "NoSQL", it was the clearest and safest place to look.

Since then we've seen the cracks in that original argument (fast and unstable
means terror in production), and 10gen has changed focus as needed and
addressed those. During that time it wasn't just coding like a lot of these
other projects, they were putting on conference after conference, garnering
mindshare and getting developers on board.

An open source Apache project just won't move along a hype path as quickly as
a force like that.

(I am making no statement towards quality, performance or worthiness... just
positions on the hype-cycle).

    
    
      > Do you think that the idea of NOSQL itself is reaching 
      > the trough of disillusionment? Are we seeing a shake out 
      > of which of these data stores are actually designed by 
      > people who know what they are doing, both in (database) 
      > theory and in (systems coding) practice?
    

I couldn't have phrased it better; yes I think this is exactly what is
happening.

In the early days it was _so exciting_ to see different ways to store/retrieve
data. We had been with SQL for decade(s) and it was very exciting to see
something new/fresh and fast pop up.

Then everyone started storing data every which way they could think of.

Then a few of us started solving problems with those new ideas... so far so
good.

Then some of those projects and new projects built on those new techniques
blew up in popularity, and suddenly the "real world" came knocking and we
started to actually test the mettle of these things in production... with disk
failures, network failures, power failures and administration failures.

Like shaking out a rug, the weakest approaches got shaken out and the
strongest teams/products weathered the storm to grow stronger and more stable.

2011 was the year NoSQL "Grew up", I imagine 2012 and 2013 will be the year
that NoSQL comes all the way out of the dissolution curve completely and, in a
metaphysical sense, "goes into production".

I mean that in the most hand-wavy way, not literally... literally LOTS of
people have it in production.

I mean it in the sense that you stop seeing articles like these that sparked
all the Mongo hype recently or articles about horrible shortcomings or
failures about XYZ datastore.

Early on the teams making the NoSQL solutions AND the users didn't really
understand where this boat was going or how the puzzle pieces fit together...
they just kept working and refining.

This entire year we've seen more and more specialization in the NoSQL
community:

    
    
      - Antirez gave up on data-larger-than-ram approaches and 
        wants to focus Redis on what it is amazing at: being 
        fast, in memory.
      - CouchDB, building on its uniquely awesome m-m 
        replication, moves into the mobile space with data sync 
        solutions that are awesome.
      - MongoDB keeps replacing MySQL in production at many 
        large-scale startups in the valley; showing more and 
        more the exact migration path to take.
      - Cassandra becomes markedly easier to use with CQL and 
        combined with its CouchDB-esque replication behavior, 
        suddenly makes all sorts of sense in densely populated 
        deployments.
    

Back in 2010 I couldn't have told you which NoSQL solution was best for which
job... closing in on the end of 2011 it is glaringly obvious to me when you
would use Redis and when you would use CouchDB (for example).

This seems silly in hindsight, but I don't think we or the teams really
honestly knew where this trip was taking the technology a year or more ago.

2012 will be a year of polish, stability and deployments.

2013 will be production deployments and replacing MySQL in more and more
places.

2015, it all starts all over again as SSD-optimized data structures and data
stores revamp our understanding of databases :) -- I am half-kidding.

That's my 2 cents anyway.

~~~
rbranson
Cassandra has CouchDB-esque replication?

~~~
rkalla
In the most general sense (master-master) yes, but in a more detailed sense...
not really.

Cassandra and Riak have a similar replication model -- they are deployed into a
"ring" and the data in the ring is distributed across some (or all) of the nodes
depending on your ReplicationFactor (how many nodes to copy each piece of data
to).

If you query for a piece of data that a node doesn't have, it hashes the query
and routes you to the node that does have it.
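
To make the ring idea concrete, here is a minimal sketch in Python. It is a toy, not Cassandra's or Riak's actual code: the `Ring` class, its method names and the MD5-based placement are all assumptions made for illustration, and real deployments add things like virtual nodes and configurable partitioners.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: each key is stored on `replication_factor` nodes."""

    def __init__(self, nodes, replication_factor=3):
        if not nodes:
            raise ValueError("ring needs at least one node")
        # Cannot place more copies than there are nodes.
        self.replication_factor = min(replication_factor, len(nodes))
        # Sort nodes by their position on the ring.
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def replicas_for(self, key):
        """Return the nodes holding `key`: the first node at or after the
        key's position, then the next ones walking clockwise."""
        start = bisect.bisect_left(self._positions, _hash(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]

    def route(self, contacted_node, key):
        """Any node can take the request; if it does not hold the key,
        forward to the first replica that does."""
        replicas = self.replicas_for(key)
        return contacted_node if contacted_node in replicas else replicas[0]
```

With four nodes and a replication factor of 2, `replicas_for("user:42")` returns two distinct nodes, and `route` hands the request to one of them no matter which node the client contacted first.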

CouchDB is a bit different, in that by default it treats every node as a
master and replicates it in its entirety to any other nodes registered as a
replication target.

You can shard with something like BigCouch, but that is 3rd party.

This is different than Mongo which is master-slave-slave-* or Redis which I
believe is master-slave as well (I never got a clear answer on how "slave"
nodes in Redis resolve or push changes back upstream to the master).

------
codyrobbins
_MongoDB is on its way to becoming the default datastore for web apps._

In my experience there is no way that this is possibly true.

~~~
misterbwong
I sense another "SQL is not dead. It's still used by 90% of the web." article
being written somewhere on the internets.

This is not directed at my parent comment but, seriously, this is getting
tiring. Comparing NoSQL to SQL is like comparing a rubber mallet to a hammer.
Sure, both might be good for some of the same things, but each has its
specific use case.

~~~
jeffdavis
I'm not sure what your point is. Are you saying that MongoDB _is_ on its way
to becoming the default datastore for web apps? Or are you saying that it's
not, and the comparison never should have been made in the first place?

~~~
misterbwong
I'm saying that there isn't _a_ default datastore for web apps in the same way
there isn't a default language for programming. Different data stores do
different things and web apps are so varied that certain apps will benefit
from NoSQL, some from SQL, and yet others from straight text file storage. I
find it tiring that everyone thinks their choice of datastore is the
_bestforeverythingontheweb_ datastore.

~~~
jeffdavis
I mostly agree, but that mentality certainly came from somewhere.

I think it's pretty well established that MySQL was the default data storage
system (I say to include a broad range of systems) for web applications in the
open-source world for a good chunk of the last decade.

And there's at least some reason for a default to exist. There are many
applications where the author(s) don't have particular data storage/management
expertise, and they'll be looking to use the "best practice" or "default"
system that everyone else is using. So it sounds entirely reasonable to me
that there will, again, exist a default way to store and manage data.

And it also seems natural that various systems will vie for that title,
because there are a huge number of potential users there. Others will avoid
that title because they want only experienced users to be involved (which I
think is misguided, but it seems there are always a few).

So, I agree with you in the strict sense that there's no
_bestforeverythingontheweb_ datastore, and it's _way_ too early to assign that
title to anyone right now, but striving for broad appeal is certainly a
reasonable thing to do.

------
matthewcford
I've been using MongoDB for well over a year now in around 6 apps (moved on
from CouchDB) and I agree prior to 1.8 it should have been made more obvious
that there were still some stability issues.

I have seen first hand some of the issues raised: we've had data disappear,
recurring random crashes, etc. But I think the difference is 'everyone' knew
that there were issues with MongoDB, you just needed to check in jira. Jumping
to 1.0 too early is clearly part of the reason for this backlash as not
everyone thinks to check the issues because they've come to believe 1.0 means
it's ready for mass adoption.

That being said, I love MongoDB and I would still use it in other apps, just
got to decide if it's the right tool for the job.

~~~
dhimes
Why did you move from couch? I'm considering couch for a project, and am not
especially knowledgeable in the space. Couch has worked fine for a low-load,
minimal-functioning prototype store (no replication needs, etc.). Its scary
feature to me is dealing with compacting-- how and when to schedule it so a
large db won't get bogged-down.

~~~
daleharvey
Couch now has an inbuilt compaction daemon, so you can configure it to run
automatically

[https://github.com/apache/couchdb/blob/trunk/etc/couchdb/def...](https://github.com/apache/couchdb/blob/trunk/etc/couchdb/default.ini.tpl.in#L197)

~~~
dhimes
What I haven't tested, though, is how long compaction takes- that is, how it
scales with db size and whether more frequent compaction means closer to
constant scaling.

Once the prototype was up I started working on other parts of the system (and
the business for that matter) and only half-paid-attention to the mailing
list.

The mailing list for couch is quite good, btw.

~~~
rdtsc
It is better to do it during downtime. You can basically provide scheduling
rules such as 'compact when fragmentation % > X AND time-of-day window is Y'.
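
A rule of that shape is easy to sketch in Python. This is not CouchDB's configuration syntax; the function name, the 70% threshold and the 01:00 to 05:00 window are all made-up values for illustration only.

```python
from datetime import time


def should_compact(fragmentation_pct, now, threshold_pct=70.0,
                   window_start=time(1, 0), window_end=time(5, 0)):
    """Return True when the database is fragmented past the threshold AND
    the current time of day falls inside the allowed window."""
    if window_start <= window_end:
        in_window = window_start <= now < window_end
    else:
        # Window wraps past midnight, e.g. 23:00 to 04:00.
        in_window = now >= window_start or now < window_end
    return fragmentation_pct > threshold_pct and in_window
```

So a database at 80% fragmentation gets compacted at 02:00 but left alone at noon, and one at 50% is left alone either way.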

------
feralchimp
Raise your hand if you're relying on package version numbers to tell you which
packages to implement in your high-volume production environment against live
customer data.

~~~
justin_vanw
Your comment would be funny, if it didn't have so much truth to it.

From now on, for all my open source code, I'm going to version with this
translation table:

 _Alpha_ : Beta

 _Beta_ : increment version by random number between .1 and .2, eg: 2.0
becomes 2.1.7

 _Release Candidate_ : add the word "Enterprise Edition" to the version

 _Release_ : Add the same letter Oracle added for that version: 8 -> 8i, 9 ->
9i, 10 -> 10g. The exception is anything that happens to be a BMW model, in
which case I'll just make it the same as the BMW. 3.2.8 -> 3.2.8xi, 4 -> z4,
&c.

~~~
cpeterso
No "Technical Preview" releases?

------
latch
I'm not a big fan of having version numbers have some type of special meaning.
To me, your post implies that there's a line in the sand at some magic version
number with respect to due-diligence (on everyone's part). Gmail was in "beta"
for years...it didn't mean anything. I understand that wiki and history
disagree with me, but it's still how I feel.

Oh, and there's a chance that yesterday's drama was a hoax:
<http://news.ycombinator.com/item?id=3205573>

Edit:

Associating special meaning with these things has always been "gamed". Access
went from 2 to 7. Heck, office went from 97 to 2000!

~~~
leif
The article appears to consider the version number issue symptomatic of a
deeper mismanagement of expectations by MongoDB marketing.

------
andrewvc
Any reasonable dev knows that version numbers are useless until you know the
process and history behind them. You should have a healthy amount of fear
before committing to a brand-new-to-the-market db, regardless of 1.0 status.

There are places to fuck around with your stack choices, but databases aren't
one of them unless you absolutely need this new tech. Let's be honest and
acknowledge that most sites using Mongo today could be using an sql based
solution with no issue.

Some people are leveraging it for a reason, others.... just for kicks. If you're
doing it just for kicks, you'd better be comfortable with uncertainty.

------
jimbobimbo
"There is no doubt that MongoDB has benefitted from an aggressive marketing
push. There are more MongoDB conferences held (organized by 10gen) and MongoDB
books written (mainly by 10gen employees) than for the other NoSQL datastores
combined."

Latest developments around MongoDB remind me of the history of Oracle DB
described in "The Difference Between God and Larry Ellison..." book: they
basically were selling a DB that was far away from prime time. I'm not
surprised at all that people may run into the issues with bleeding edge
software - I figure that's the price that early adopters must pay anyway.

------
Zuzz
"Being document-based datastores, Riak and CouchDB are the most direct
competitors to MongoDB"

But Riak is a Key-Value store, not a document one. If that's the premise I
wonder how illuminating the rest can be (I kept reading: it's not)

~~~
tsuraan
Riak has secondary indices, and map/reduce for ad-hoc queries. You can store
raw binary data in it, and it's happy with that, but if you store JSON then it
can query it. From what I can tell, the biggest difference between Riak and
Bigcouch from a data model POV is that Bigcouch has materialized views, while
Riak's are ad-hoc. I'm not an expert in either though...

~~~
Zuzz
there you go, from the horse's mouth and fresh off the press:

"For better or worse, many people consider MongoDB and Riak to be competitors.
In reality, there are very few similarities between the products."

[http://seancribbs.com/tech/2011/11/07/mongodb-and-riak-in-co...](http://seancribbs.com/tech/2011/11/07/mongodb-and-riak-in-context-and-an-apology/)

------
kevingadd
"No open source project has received more criticism in recent years than
MongoDB."

[citation needed]

Suggesting that the 1.0 version number should have been reserved until more
recently makes me think that what the author of this post is really saying is
something like this:

"MongoDB wasn't really production-ready until recently. People who wanted to
test something bleeding-edge out in the real world should have still been free
to do so, but branding the product as a beta and giving it a sub-1.0 version
number would have helped set expectations correctly."

~~~
latch
[citation needed] indeed. My vote would be Java

~~~
bobz
"Java" is not an open source project.

As a Java programmer, I've not really heard a particular level of criticism of
the OpenJDK project. Java the language sure does have its faults.

Although I suspect you were just Java bashing.

[ed] s/it's/its/g

------
trustfundbaby
I remember being around when everyone was laying into PHP in much the same way
as people are tearing into mongodb now and it makes me smile, because it means
they're doing something right and they'll be around for quite a while if
they're responsive to the feedback.

------
jeffdavis
"At version 2.0, it is finally a stable product free of unexpected surprises."

Wow, that's a bold statement. I don't think I'd go out on a limb like that
considering that 2.0 has only been out for a couple months and doesn't have a
lot of production users.

------
dschoon
Here's that search-interest graph he inexplicably screencaps in the article,
rather than also linking to it:

[http://www.google.com/insights/search/#cat=0-5&q=cassand...](http://www.google.com/insights/search/#cat=0-5&q=cassandra%2Cmongodb%2Credis%2Ccouchdb%2Criak&date=1%2F2009%2035m&cmpt=q)

I added Cassandra, as it's substantially more popular than Riak (whereas HBase
is not, and you only get 5 slots).

------
PhuFighter
This is funny. I thought that the whole paradigm of startups is to first
release the minimally useful featureset and work on the other items later? I
mean - isn't the goal to first get revenue and then fix up what needs
working later?

------
willvarfar
Is 2.0 rock solid?

Some say not yet: <http://news.ycombinator.com/item?id=3202028>

------
electic
Shouldn't you have all your disclaimers on the top of the article?

------
calibraxis
People should note that yesterday's anonymous pastebin "Don't use MongoDB"
article was apparently a hoax, if you look through the comments. (At least so
claims the troll who posted it.)
(<http://news.ycombinator.com/item?id=3202081>)

I used MongoDB last year and it worked fine for me. (I maintained it for about
9 months.) But of course I can't generalize that to other people's
experiences, so YMMV. ;) I just used it squarely in the use-case which
everyone mentions — many "well-behaved" writes which occur when no one's
reading. Multiple replicas. Leveraged its indices. It wasn't an authoritative
source of data, so in principle I could repopulate it. Dealt with failure
modes, so missing data wasn't catastrophic.

I think articles like this are useful; when evaluating software, one thing I
do is assume it's buggy and its proponents are deceivers. (Whether or not they
intend to deceive. I can imagine being corrupted by wanting something to be
true.) So among other things, I hunt down criticisms. Had this article been
around last year, I'm sure I would've found it a useful hub in doing this due
diligence.

(The actual thesis, of what the version number should be, is not so important
to me... Version numbers are in some sense arbitrary.)

~~~
rdtsc
A hoax or a double-hoax?

Original post was this by nomoremongo:

<http://news.ycombinator.com/item?id=3201772>

Post actually discussed was:

<http://news.ycombinator.com/item?id=3202081>

Very clever.

The way I understand, apparently nomoremongo wrote it but it was reposted
quickly by nmongo (<http://news.ycombinator.com/threads?id=nmongo>) in hopes
that they could then become the top post, so later on they can yell in all
caps how it was a hoax and thus discredit the original post.

You know nmongo, if you are trying to help MongoDB you just did the opposite.
I've said this many times here before, but the best marketing is your
competitor's stupid marketing. You are providing stupid marketing for MongoDB
and you are hurting it.

> People should note that yesterday's anonymous pastebin "Don't use MongoDB"
> article was apparently a hoax,

Now another question is: do you know that it wasn't a hoax but continue in the
same vein in hopes to still save the day, or you actually believe it was a
hoax?

~~~
latch
There was always something suspicious about an anonymous post lacking any
verifiable facts. Then, 10gen's CTO states that none of it resonates with any
support issues they've had. Then, add this.

I know exactly what it will take for me to believe the pastebin story.

I'm curious, what will it take for you to not believe it?

~~~
rdtsc
I agree this is a shady post. Even "how" it was posted is shady. So I am not
standing 100% behind it. It is just more of a gut instinct.

At the same time, it got on the front page because the story resonated and
made sense to others.

There were quite a few people who commented how "oh yeah I've had problems
with lost data". And I think that is what pushed the post's popularity more
than the original pastebin. So the discussion got a life of its own after a
while. Followed by response posts and response posts to those and so on.

