
Don't use MongoDB - nmongo
http://pastebin.com/raw.php?i=FD3xe6Jt
======
harryh
Hi,

I run engineering for foursquare. About a year and a half ago my colleagues
and I made the decision to migrate to MongoDB for our primary data store.
Currently we have dozens of MongoDB instances across several different data
clusters storing over a TB of data and handling tens of thousands of requests
per second (mostly reads, but the write load is reasonably high as well).

Have we run into problems with MongoDB along the way? Yes, of course we have.
It is a new technology and problems happen.

Have they been problematic enough to seriously threaten our data? No they have
not.

Have Eliot and the rest of his staff @ 10gen been _extremely_ responsive and
helpful whenever we run into problems? Yes, absolutely. Their level of support
is amazing.

MongoDB is a complicated beast (as are most datastores). It makes tradeoffs
that you need to understand when thinking about using it. It's not necessarily
for everyone. But it most certainly can be used by serious companies building
serious products. Foursquare is proof of that.

I'm happy to answer any questions about our experience that the HN community
might have.

-harryh

~~~
fedd
have users of foursquare run into problems? were they serious? did someone
lose money? let's ask. it would answer whether to use an _eventually
consistent_ db.

~~~
harryh
> have users of foursquare run into problems?

Of course we've run into problems from time to time. No one goes from nothing
to foursquare's level of success without running into some bumps along the
way.

> were they serious? did someone lose money?

No.

> it would answer whether to use an eventually consistent db

MongoDB actually isn't really an eventually consistent datastore. It doesn't
(for example) allow writes to multiple nodes across a network partition and
then have a mechanism for resolving conflicts.

~~~
willvarfar
<http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/>

You had 11 hours downtime and didn't lose money?

What about opportunity cost? Reputation?

Now you have to share your secret :)

(I guess, if you weren't profitable, you had nothing to lose?)

~~~
harryh
The 11 hours of downtime was a pretty big deal, but it had very little to do
with MongoDB. It was basically a huge failure in proper monitoring.

~~~
9oliYQjP
Kudos for not blaming the tool when that would have been the easiest route.
It's worth mentioning that 10gen has MongoDB Monitoring Service out now. It
makes monitoring MongoDB instances a lot more accessible and convenient.

------
antirez
I appreciate the "public service" intent of this blog post, however:

1) It is wrong to evaluate a system for bugs now fixed (you can evaluate a
software development process this way, but that is not the same as evaluating
MongoDB itself, since the latter got fixed).

2) A few of the problems claimed are hard to verify, like subsystems crashing,
but users can confirm or deny them just by looking at the mailing list, if
MongoDB has a mailing list like the Redis one, which is run by an external
company (Google) and which people outside 10gen have the ability to moderate.
(For instance, in Redis two guys from Citrusbytes can view/moderate messages,
so even if Pieter and I wanted to remove a message that is bad advertising, we
couldn't do so deterministically.)

3) New systems fail, especially if they are developed in the current NoSQL
arena, which is of course also full of pressure to win users ASAP (in other
words, pushing new features fast is so important that perhaps sometimes
stability will suffer). I can see this myself: even though my group at VMware
is very focused on telling me to ship Redis as stable as possible as the first
rule, I sometimes get pressure to release new stuff ASAP _from the user base
itself_.

IMHO it is a good idea for programmers to learn to test the systems they are
going to use very well, with simulations of the intended use case. Never
listen to the hype, nor to the detractors.

On the other hand, all these stories keep me motivated to be conservative in
the development of Redis and to avoid bloat and things I think will ultimately
suck in the context of Redis (like VM and diskstore, two projects I
abandoned).

~~~
moe
_1) It is wrong to evaluate a system for bugs now fixed_

I disagree. A project's errata is a very good indicator of the overall
quality of the code and the team. If a database system's history is littered
with deadlock, data-corruption, and data-loss bugs up to the present day, then
that's telling a story.

 _2) A few of the problems claimed are hard to verify_

The particular bugs mentioned in an anonymous pastie may be hard to verify.
However, the number of elaborate horror-stories from independent sources adds
up.

 _3) New systems fail, especially if they are developed in the current NoSQL
arena_

Bullshit. You, personally, are demonstrating the opposite with redis, which is
about the same age as MongoDB (~2 years).

~~~
WayneDB
I agree with your responses to 1 and 2. I take issue with the example for 3
though because Redis is nowhere near the complexity or feature set of MongoDB.

~~~
moe
I don't think that counts as an argument.

When you strip MongoDB down to the parts that actually have a chance of
working under load, you end up pretty close to a slow and unreliable version
of redis.

Namely, Mongo demonstrably slows to a crawl when your working set exceeds your
available RAM. Thus both redis and mongo are to be considered in-memory
databases, whereas one of them is honest about it and the other not so much.

Likewise, Mongo's advanced data structures demonstrably break down under load
unless you craft your access pattern very carefully; i.e. growing records is a
no-no, atomic updates (transactions) are a huge headache, writes starve reads
by design, the map-reduce impl halts the world, indexing halts the world, etc.
etc.

My argument is that the feature disparity between mongo and redis stems mostly
from the fact that Antirez has better judgement about what can be made to work
reliably and what cannot. This is why redis clearly states its scope and
limits on the tin and performs like a swiss watch within those bounds.

Mongo on the other hand promises the world and then degrades into a pile of
rubble once you cross one of the various undocumented and poorly understood
thresholds.

~~~
j_baker
If I recall correctly, mongo only requires that the _index_ gets stored in
memory. The actual data itself can go on disk.

~~~
kanwisher
If you actually use Mongo in practice, everything needs to be in RAM to have
any kind of performance

------
foobarbazetc
No shit, nmongo.

Anyone with half a brain can go look at the MongoDB codebase and deduce that
it's amateur hour.

It's startup-quality code, but _it's supposed to keep your data safe_. That's
pretty much the issue here -- "cultural problems" is just another way of
saying the same thing.

Compare the code base of something like PostgreSQL to Mongo, and you'll see
how a real database should be coded. Even MySQL looks like it's written by the
world's best programmers compared to Mongo.

I'm not trying to hate on Mongo or their programmers here, but you've
basically paid the price for falling for HN hype.

Most RDBMSes have been around for 10+ years, so it's going to take a long,
long time for Mongo to catch up in quality. But it won't, because once you
start removing the write lock and all the other easy wins, you're going to hit
the same problems that people solved 30 years ago, and your request rates are
going to fall to memory/spindle speed.

Nothing's free.

~~~
itaborai83
I'm curious and I might be missing more than half of my brain. Would you be
willing to show some examples of bad coding on their source tree?

~~~
mushishi
I haven't ever used MongoDB but got interested, and first non-trivial source
file I picked is this:
<https://github.com/mongodb/mongo/blob/master/db/btree.cpp>

Take a look at for example: bool BtreeBucket<V>::find

Without even thinking about what it is doing, it's quite clear that this is
not readable code, and it's not immediately obvious what the high-level
structure of the logic is. The function doesn't even fit on two screens, so
it's hard to reason about; your short-term memory is overtaxed.
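For contrast, the core of what a node-level search has to do is small; a
hypothetical Python sketch using binary search (not MongoDB's actual logic,
which also handles duplicate keys, record locations, and insertion hints):

```python
import bisect

def find_in_bucket(keys, target):
    """Search one sorted B-tree bucket for target.

    Returns (found, index); when the key is absent, index is the
    child slot to descend into. Hypothetical sketch only.
    """
    i = bisect.bisect_left(keys, target)
    found = i < len(keys) and keys[i] == target
    return found, i
```

The point is not that BtreeBucket<V>::find could be this short, but that the
high-level structure of a search can be made visible at a glance.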

~~~
felix_krull
this is the implementation of a b+ tree. the underlying logic has been very
well researched since the 70s.

if there is a part of mongodb that I am sure does not contain bugs, it is that
very file you link to.

if you want to know what it does, go out and read the relevant papers on
database technology. or graduate in CS.

~~~
jemfinch
Clearly you didn't actually read the source file. I graduated in CS. I know B+
trees.

I also know that an 85-line, 7-argument method in a 1988-line file shouldn't
depend on a global variable ("guessIncreasing") modified from several other,
unrelated functions. I know that bt_insert, which (apparently) assigns to
"guessIncreasing" and then resets it to false just prior to exit, should be
using an RAII class to do so instead of trying to catch every exit path,
especially in a codebase that uses exceptions.

This code is amateur hour.
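The RAII pattern jemfinch describes translates directly: the flag is set and
restored by an object whose cleanup runs on every exit path, exceptions
included. A Python sketch (a context manager is the closest analogue to a C++
destructor; the names are hypothetical, not MongoDB's code):

```python
from contextlib import contextmanager

# Stand-in for a global flag like "guessIncreasing".
state = {"guess_increasing": False}

@contextmanager
def guess_increasing_guard():
    """Set the flag for the duration of an insert and always
    reset it afterwards, even if the insert raises."""
    state["guess_increasing"] = True
    try:
        yield
    finally:
        state["guess_increasing"] = False

def bt_insert(ok=True):
    # The guard means no exit path -- normal or exceptional --
    # can leak guess_increasing = True.
    with guess_increasing_guard():
        if not ok:
            raise RuntimeError("insert failed")
```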

------
LeafStorm
Just for comparison, CouchDB has had one major bug that could cause the loss
of data, detailed here: <http://couchdb.apache.org/notice/1.0.1.html>

The bug was only triggered when the delayed_commits option was on (holds off
on fsyncing when lots of write operations are coming in) and there was both a
write conflict and a period of inactivity - when the database was shut down,
any writes that happened afterwards would not be saved.

They immediately developed a process that would prevent any data from being
lost if you didn't shut down the server, then a week later released an
emergency bugfix version without the bug. Later they released a tool that
could recover any data lost to the bug, provided the database hadn't been
compacted.

That's the kind of attitude database developers need to have towards data
integrity.

~~~
itaborai83
One of the things that I love about Couch is that the standard way to shut
down the process is simply doing a kill -9 on the server process. No data
loss. No worries. Want to back up your data? rsync it and be done with it.

Couch may have its warts, but it is damn reliable.

~~~
maxogden
I've heard from many people that with Couch you get "all of your
disappointment up front"

~~~
itaborai83
I feel that Couch involves too much server-side programming. It can be
off-putting sometimes. If anyone wants to make some money, I'd suggest putting
a server on top of a couch cluster that receives mongo queries.

I mean, how hard can it be to

1) Manage some indexes,

2) Keep some metadata around and

3) Build some half-assed single index query planner?

Couch is already a solid piece of technology. It just needs a better API to
"sit" on top of it, kinda like what Membase is doing now.

edit: or on top of Riak, Cassandra, PostgreSQL, etc. ... on the API side,
Mongo has clearly won.

------
latch
There's a lot of anonymity going on here. A new HN account, an unknown company
and product, and claims with no evidence.

Why aren't links to 10gen's Jira provided? Where's the test code that shows
the problems they had with the write lock?

This is an extremely shallow analysis.

~~~
angelbob
And yet he makes some good points. Pretty much all of this is verifiable.

I don't agree with a lot of his conclusions, but mostly his data is correct.

~~~
latch
Look, I'm not the best person to do this..but...good points?

1 - Writes are unsafe by default:

MongoDB supports a number of "write concerns":

* fire-and-forget or "unsafe"

* safe mode (only written to memory, but the data is checked for "correctness", like unique constraint violations)

* journal commit

* data-file commit

* replicate to N nodes

The last 4 can be mixed and matched. Most (all?) drivers allow this to be
specified on a per-write basis. It's an incredible amount of flexibility. I
don't know of any other store that lets you do that.

When a user registers, we do a journal commit ({j:true}), 'cuz you don't want
to mess that up. When a user submits a score, we do a fire-and-forget,
because, if we lose a few scores during the 100ms period between journal
commits, it isn't the end of the world (for us; if it is for you, always use
j:true).
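The tradeoff described above is the classic buffered-append vs. fsync
tradeoff; a stdlib-only sketch (hypothetical helper, not the pymongo API):

```python
import os

def append_record(path, record, journal=False):
    """Append one record to a log file.

    journal=True (the analogue of MongoDB's {j: true}) forces the
    bytes to disk before acknowledging; the default leaves them in
    OS buffers, which is faster but can be lost if the machine
    crashes inside that window.
    """
    with open(path, "a") as f:
        f.write(record + "\n")
        if journal:
            f.flush()
            os.fsync(f.fileno())
```

A registration would call this with journal=True; a disposable score write
would not.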

So the complaint is the driver's default behavior (which I think you can
globally configure in most drivers)? Issue a pull request. Is the default
table type in MySQL still MyISAM?

2 and 6 - Lost Data

This is the most damning point. But what can I say? "No?" My word versus his?
I haven't seen those issues in production; I hang out in their Google groups
and I don't recall seeing anyone bring that up - though I do tend to avoid
anything complicated/serious and let the 10gen guys handle that. Maybe they
did something wrong? Maybe they were running a development release? Maybe they
did hit a really nasty MongoDB bug.

3 - Global Lock

MongoDB works best if your working set fits in memory. That should simply be
an operational goal. Beyond that, three points. First, the global lock will
yield, I believe (someone more informed can verify this). Second, the story
gets better with every version and it's clearly high on 10gen's list.

Most importantly though, it's a constraint of the system. All systems have
constraints. You need to test it out for your use-case. For a lot of people,
the global lock isn't an issue, and MongoDB's performance tends to be higher
than a lot of other systems'. Yes, it's a fact, but with respect to "don't use
MongoDB", it's FUD. It's an implementation detail that you should be aware of,
but it's the impact of that implementation detail, if any, that we should be
talking about.
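The cost model of a single server-wide lock is easy to see in a toy
simulation: if every read and write funnels through one lock, peak concurrency
is exactly one no matter how many threads you throw at it (this ignores
MongoDB's yielding, so it shows the worst case, not the real behavior):

```python
import threading

global_lock = threading.Lock()   # one lock for the whole "server"
counter_lock = threading.Lock()  # protects the bookkeeping below
active = 0                       # operations currently "inside" the db
max_active = 0                   # peak concurrency observed

def op(work):
    """Any read or write: all of them take the same global lock."""
    global active, max_active
    with global_lock:
        with counter_lock:
            active += 1
            max_active = max(max_active, active)
        work()
        with counter_lock:
            active -= 1

def run_mixed_load(n=20):
    threads = [threading.Thread(target=op, args=(lambda: None,))
               for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active  # always 1 under a global lock
```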

3 and 4 - Sharding

Sharding is easy, rebalancing shards is hard. Sharding is something else which
got better in 1.8 and 2.0, which the author thinks we ought to simply dismiss.
I don't have enough experience with MongoDB shard management to comment more.
I think the foursquare outage is somewhat relevant though (again, keeping in
mind that things have improved a lot since then).

7 - "Things were shipped that should have never been shipped"

This is a good verifiable point? I remember using MySQL Cluster when it first
shipped. That was a disaster. I also remember using MySQL from a .NET project
and opening up a good 3-4 separate bugs about concurrency issues where you
could easily deadlock a thread trying to pull a connection from the connection
pool.

I once had to use ClearCase. Talk about something that shouldn't have
shipped.

This is essentially an attack on 10gen that ISN'T verifiable. Again, it's his
anonymous word versus no one's. Just talking about it is giving it unjust
attention.

8 - Replication

It's unclear if this is replica sets or the older master-slave replication.
Either way, again, I don't think this is verifiable. In fact, I can say that,
relatively speaking, I see very few replica set questions in the groups. It
works for me, but I have a very small data set, my data pieces themselves are
small. Obviously some people are managing just fine (I'm not going to go
through their who's who, I think we all know some of the big MongoDB
installations).

9 - The "real" problem

We've all seen some pretty horrible things. I was using MySQL 5.0 and there
were some amazing bugs. There's a bug, which I think still exists, where SQL
Server can return you the incorrect inserted id (no, not using @@identity;
using scope_identity) when you use a multi-core system. MS spent years trying
to fix it.

I guess I can say what 10gen never could...If you were using MongoDB prior to
1.8 on a single server, it's your own fault if you lost data. To me,
replication as a means to provide durability never seemed crazy. It just means
that you have to understand what's going on.

Look, I don't doubt that this guy really ran into problems. I just think they
have a large data set with a heavy workload, they thought MongoDB was a silver
bullet, and rather than being accountable for not doing proper testing, they
want to try and burn 10gen.

They didn't act responsibly, and now they aren't being accountable.

~~~
einhverfr
"This is a good verifiable point? I remember using MySQL Cluster when it first
shipped. That was a disaster. I also remember using MySQL from a .NET project
and opening up a good 3-4 separate bugs about concurrency issues where you
could easily deadlock a thread trying to pull a connection from the connection
pool."

You can STILL deadlock a transaction against itself in MySQL w/InnoDB. How do
they let this happen? I do not know. I just know I have been bitten by
deadlocks in multi-row inserts there often enough to get really, really
frustrated when I use that db. This is in fact documented in the MySQL manual.

For better or worse, projects which start out without a goal to offer highly
reliable software from the start never seem to be able to offer it later.

~~~
latch
I've also seen a lot of SQL Server developers write large stored procedures
that manage to easily deadlock. It's been years since I dealt with it...had
something to do with lock escalation, from a read lock to an update lock to an
insert lock.

You could say "don't use SQL Server"..or you could say "it's important that
you understand SQL Server's locking behavior"

~~~
einhverfr
It's one thing for two transactions to deadlock against each other. It takes
special talent to allow a transaction to deadlock against itself, which InnoDB
apparently allows.

I have NEVER had issues with PostgreSQL transactions deadlocking against
themselves, even with monstrous stored procedures.

------
nomoremongo
Pastebin author here.

Refutations are going to fall into two categories, it seems:

1\. Questioning my honesty

2\. Questioning my competence

Re #1, I'm not sure what you imagine my incentive to lie might be. I honestly
just intended this to benefit the community, nothing more. I'm genuinely
troubled that it might cause some problems for 10gen, b/c, again, Eliot & co
are nice people.

Re #2, all I can do is attempt to reassure you that we're generally smart and
capable fellows. For example, these same systems exhibit none of these
problems on the new database system we've moved to, and we're sleeping quite
well through the night. I'll omit the name of that database system just so
there is no conflict that might undermine my integrity and motives (see #1).

edit:

(also, there are a few comments about "someone unknown/new around here"...
trust me, I'm not new or unknown. I'm a regular.)

~~~
nomoremongo
Some are also (fairly) questioning "why the anonymity?", and "where is the
evidence?"

Those two things are connected: I can't provide the evidence without revealing
my identity. And the reason for the anonymity is that we still have some small
databases with 10gen and a current support contract. I had intended to go
public with all this after we had transitioned off the system entirely, but
more and more reports have continued to pop up of people having trouble with
MongoDB, and it seemed as though delaying would be imprudent. An anonymous
warning would be more valuable than saying nothing.

So--if you choose to ignore or dismiss our claims, you're entitled. :-) I
still feel satisfied that I did what I needed to do.

~~~
ajsharp
Are you willing to reveal your and your company's identity once you're
completely off of Mongo?

~~~
nomoremongo
Yep. I do regret not GPG signing it or something so we could later claim it
without more conspiracy theories. But I'll blog about it on an official blog
as soon as we're clear of any interest in MongoDB.

------
jonpaul
I've used MongoDB in production since the 1.4 days. It should be noted that my
apps are NOT write heavy. But, many of the author's points can be refuted by
using version 2.0.

Regarding the point about using getLastError(), the author is completely
correct. But the problem is not so much that MongoDB isn't good; it's that
developers start using it and expect it to behave like a relational DB. Start
thinking in an asynchronous programming paradigm, and you'll have fewer
problems.

I got bitten by MongoDB early on. When my server crashed, I learned real
quickly what fsync, journaling, and friends can do. The best thing a dev can
do before using MongoDB is to RTFM and understand its implications.

The #1 reason that I used MongoDB was the schema-less models. That's it.
Early on in an application's life cycle, the data model changes so frequently
that I find migrations painful and unnecessary.

My two cents, hopefully it helps.

~~~
CarlHoerberg
Schema-less is imho an overrated feature. ORMs like DataMapper (Ruby) and
NHibernate (.NET) can generate the schema on the fly for an RDBMS, so there's
no need for migrations pre-production. But when your application is in
production, you need migrations even with a "schema-less" db! See, rename a
field and "all your data" is lost, unless you migrate the data from the old
field to the new one..

~~~
vidarh
"Schema-less" has the potential advantage (if you use it properly) of
allowing gradual migration.

As long as your code can handle all versions of objects in current use, you
can deploy new code, then either migrate objects as they're updated/rewritten,
and/or slowly migrate objects in the background.

For certain types of schema changes in large enough data stores, this can be a
killer feature. I remember one RDBMS setup I had to deal with where we were
"stuck" having to do a lot of suboptimal schema changes, because the changes
we actually wanted to do caused (based on tests in our dev environment) the
system to slow to a crawl, leaving it unusable for 8+ hours, and we just
couldn't afford that kind of downtime. We spent a lot of engineering time
working our way around something that'd simply be a non-issue in a schema-less
system.
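The gradual-migration idea reduces to a read path that upgrades whatever
document version it finds; a sketch with made-up document shapes:

```python
def upgrade(doc):
    """Lazily migrate a document to the current schema version.

    In this hypothetical example, v1 stored a single 'name' field
    and v2 splits it into first/last. Old and new documents coexist;
    ordinary writes (or a background job) rewrite the stragglers
    over time, so there is no big-bang migration window.
    """
    if doc.get("v", 1) == 1:
        first, _, last = doc["name"].partition(" ")
        doc = {"v": 2, "first": first, "last": last}
    return doc
```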

~~~
goldmab
Fair enough, but you can also have a schemaless store by using JSON fields in
PostgreSQL or MySQL.

~~~
angelbob
Not indexably. But you _can_ do a hideous many-tables-per-real-table thing
where each field gets a tall thin table in Postgres or MySQL, do a lot of
joins to get your data, and index the fields in _that_.

It's not as awful as it sounds, performance-wise. It _is_ as awful as it
sounds in terms of maintainability, of course.
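That tall-thin-table scheme is the classic entity-attribute-value layout; a
sketch using the stdlib's sqlite3 as a stand-in for Postgres/MySQL (the table
and field names are made up):

```python
import sqlite3

def eav_demo():
    db = sqlite3.connect(":memory:")
    # One tall, thin row per (entity, field); the value is indexable.
    db.execute("CREATE TABLE attrs (entity INTEGER, name TEXT, value TEXT)")
    db.execute("CREATE INDEX attrs_idx ON attrs (name, value)")
    db.executemany("INSERT INTO attrs VALUES (?, ?, ?)",
                   [(1, "city", "Oslo"), (1, "job", "dev"),
                    (2, "city", "Bergen"), (2, "job", "ops")])
    # Reassembling a record costs one self-join per extra field.
    cur = db.execute(
        "SELECT a.entity, a.value, b.value FROM attrs a "
        "JOIN attrs b ON a.entity = b.entity "
        "WHERE a.name = 'city' AND b.name = 'job' AND a.value = 'Oslo'")
    return cur.fetchall()
```

Each additional field you select is another join, which is where the
maintainability pain comes from.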

~~~
joevandyk
You can index hstore fields in PostgreSQL.

------
electic
We extensively tested this inside Viralheat with a write-heavy load of over
30,000 writes per second, and it basically failed our test. The conclusion we
came to is that it is not robust enough for the analytics world. Though I hope
it gets better one day... it has potential.

~~~
jdagostino
what did you end up going with?

~~~
electic
Our company is a big data company. So our amazing engineers are responsible
for storing hundreds of millions of pieces of data per week AND also crunching
and analyzing that data. So basically we need a system where we can have
incredible write and read performance but also a system that is elastic in
nature. Most importantly, it has to be available.

Before I go into more details, MongoDB is great for most people who don't have
a high transaction volume. It is easy to setup and easy to use. So if you are
in this camp, MongoDB is probably a good fit for you.

We did about two months' worth of extensive tests in our lab. Basically, two
things didn't bode well for us. One, the locking killed reads: we just had a
hard time keeping the flow of writes and the flow of data to our statistics
cluster alive. Yeah, you could use replication, but that didn't work too well
performance-wise either. Two, the sharding didn't seem that robust. As the
cluster got bigger and bigger, we started noticing that the overhead of
keeping it up was getting to be too great. Rather than write in detail, I
think this article covers some of the scaling issues we experienced:

<http://blog.schmichael.com/2011/11/05/failing-with-mongodb/>

We finally used a hybrid system. We went with Membase, now CouchBase, to
handle immediate storage and we are now implementing Hadoop for our long term
storage needs.

P.S. Our entire stack is a KV in nature.

~~~
Goldcap
Just reading about your transactional volume, it seems on its face that
MongoDB wouldn't be a good fit for this project. 30k per second is not
anywhere near where MongoDB pretends to live, I think by their own admission.
And sharding in MongoDB, while called a core feature, was bolted on after
core development, probably intended to give Mongo some credibility with those
who want it to be more scalable. IMHO if you need that kind of scalability,
you're already straying from the Mongo niche, 2.0.0 notwithstanding.

So, agreeing with a point made earlier: if you don't like the write-lock
implementation, have concerns about scaling, and have a huge transactional
volume, it's really not something that fits well with MongoDB.

I've been using Mongo now (currently using 1.8) for three (is it almost three
now?) years, 2 million hits/day, with a replicated set, and while I've needed
maintenance, reindexing, and (gasp) restarts on occasion, never had any of the
problems identified by the author of this post.

Bottom line, sounds to me like someone was in over his head from an
architectural standpoint, made a bad choice in MongoDB, and then blamed 10gen
for his own lack of foresight. So while I empathize with the struggle, I fault
him for not knowing his options in advance and not TESTING first before
betting the farm on a fairly new open-source codebase.

LOTS of other database solutions that would scale better. Analyzing lots and
lots of transactional stateless data with MongoDB map-reduce? Well, just kinda
like killing yourself by trying to sprint up from the bottom of the Grand
Canyon. "You really tried to do that?"

~~~
christkv
there's an initial hadoop plugin for mongo that might be a better fit for
doing map-reduce over large datasets <https://github.com/mongodb/mongo-hadoop>

------
nikcub
Links about Foursquare's problems with MongoDB. The site was down for a while
when their 1.6 instance crashed:

* <http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/>

* <http://www.infoq.com/news/2010/10/4square_mongodb_outage>

* [http://groups.google.com/group/mongodb-user/browse_thread/th...](http://groups.google.com/group/mongodb-user/browse_thread/thread/528a94f287e9d77e?pli=1)

I like MongoDB; it is easy to set up, work with, and understand. I think it
has an opportunity to become the MySQL of NoSQL (in more ways than one).

Foursquare and 10gen (the makers of MongoDB) share USV as an investor.

~~~
vannevar
It should be noted that this was not really a problem with MongoDB. Foursquare
used a poorly-chosen shard key that caused a disproportionate load on one of
its shards, and on top of that did not have proper system monitoring in place
to alert them that a server was running out of RAM. It should also be noted
that no data was lost in the process of resolving the problem.

~~~
latch
And both companies were extremely transparent about it and the community
generally appreciated the way it was handled:

[https://groups.google.com/forum/#!topic/mongodb-
user/UoqU8of...](https://groups.google.com/forum/#!topic/mongodb-
user/UoqU8ofp134)

------
ehwizard
From CTO of 10gen

First, I tried to find any client of ours with a track record like this and
have been unsuccessful. I have personally looked at every single customer case
that's ever come in (there are about 1600 of them) and cannot match this
story to any of them. I am confused as to the origin here, so my answers
cannot be complete in some cases.

Some comments below, but the most important thing I wanted to say is if you
have an issue with MongoDB please reach out so that we can help.
<https://groups.google.com/group/mongodb-user> is the support forum, or try
the IRC channel.

> __1\. MongoDB issues writes in unsafe ways _by default_ in order to win
> benchmarks __

The reason for this has absolutely nothing to do with benchmarks, and
everything to do with the original API design and what we were trying to do
with it. To be fair, the uses of MongoDB have shifted a great deal since then,
so perhaps the defaults could change.

The philosophy is to give the driver and the user fine-grained control over
acknowledgement of write completions. Not all writes are created equal, and it
makes sense to be able to check on writes in different ways. For example, with
replica sets, you can do things like “don’t acknowledge this write until it’s
on nodes in at least 2 data centers.”
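That per-write acknowledgement policy can be modeled in a few lines: the write
goes to every replica, but the client is acknowledged only once at least w of
them confirm (a toy model; real drivers express this through getLastError /
write-concern options):

```python
def replicated_write(replicas, record, w=1):
    """Toy model of a replicate-to-N write concern.

    Each replica is just a list here; a replica that raises is
    treated as down and contributes no acknowledgement.
    """
    acks = 0
    for replica in replicas:
        try:
            replica.append(record)
            acks += 1
        except Exception:
            pass  # replica unreachable: no ack from it
    return acks >= w  # acknowledge the client only if w acks arrived
```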

> __2\. MongoDB can lose data in many startling ways __

> 1\. They just disappeared sometimes. Cause unknown.

There has never been a case of a record disappearing that we have not been
able to trace either to a bug that was fixed immediately or to other
environmental issues. If you can link to a case number, we can at least try to
understand or explain what happened. Clearly a case like this would be
incredibly serious; if this did happen to you, I hope you told us, and if you
did, that we were able to understand and fix it immediately.

> 2\. Recovery on corrupt database was not successful, pre transaction log.

This is expected; repair was generally meant for single servers, and running a
single server without journaling is itself not recommended. If a secondary
crashes without journaling, you should resync it from the primary. As an FYI,
journaling is the default and almost always used in v2.0.

> 3\. Replication between master and slave had _gaps_ in the oplogs, causing
> slaves to be missing records the master had. Yes, there is no checksum, and
> yes, the replication status had the slaves current

Do you have the case number? I do not see a case where this happened, but if
true would obviously be a critical bug.

> 4\. Replication just stops sometimes, without error. Monitor > your
> replication status!

If you mean that an error condition can occur without errors being issued to a
client, then yes, this is possible. If you want verification that replication
is working at write time, you can do it with the w=2 getLastError parameter.

> __3\. MongoDB requires a global write lock to issue any write __

> Under a write-heavy load, this will kill you. If you run a blog, you maybe
> don't care b/c your R:W ratio is so high.

The read/write lock is definitely an issue, but a lot of progress has been
made and more is to come. 2.0 introduced better yielding, reducing the
scenarios where locks are held through slow IO operations. 2.2 will continue
the yielding improvements and introduce finer-grained concurrency.

> __4\. MongoDB's sharding doesn't work that well under load __

> Adding a shard under heavy load is a nightmare. Mongo either moves chunks
> between shards so quickly it DOSes the production traffic, or refuses to
> move chunks altogether.

Once a system is at or exceeding its capacity, moving data off it is of course
going to be hard. I talk about this in every single presentation I’ve ever
given about sharding[0]: do not wait too long to add capacity. If you try to
add capacity to a system at 100% utilization, it is not going to work.

> __5\. mongos is unreliable __

> The mongod/config server/mongos architecture is actually pretty reasonable
> and clever. Unfortunately, mongos is complete garbage. Under load, it
> crashed anywhere from every few hours to every few days. Restart supervision
> didn't always help b/c sometimes it would throw some assertion that would
> bail out a critical thread, but the process would stay running. Double fail.

I know of no such critical thread, can you send more details?

> __6\. MongoDB actually once deleted the entire dataset __

> MongoDB, 1.6, in replica set configuration, would sometimes determine the
> wrong node (often an empty node) was the freshest copy of the data
> available. It would then DELETE ALL THE DATA ON THE REPLICA (which may have
> been the 700GB of good data)

> They fixed this in 1.8, thank god.

I cannot find any relevant client issue, case, or commit. Can you please send
something that we can look at?

> __7\. Things were shipped that should have never been shipped __

> Things with known, embarrassing bugs that could cause data problems were in
> "stable" releases--and often we weren't told about these issues until after
> they bit us, and then only b/c we had a super duper crazy platinum support
> contract with 10gen.

There is no crazy platinum contract, and every issue we ever find is put into
the public jira. Every fix we make is public. Fixes have cases which are
public. Without specifics, this is incredibly hard to discuss. When we do fix
bugs, we try to get them to users as fast as possible.

> __8\. Replication was lackluster on busy servers __

This simply sounds like a case of an overloaded server. As I mentioned
before, if you want guaranteed replication, use the w=2 form of getLastError.

> __But, the real problem: __

> 1\. Don't lose data, be very deterministic with data

> 2\. Employ practices to stay available

> 3\. Multi-node scalability

> 4\. Minimize latency at 99% and 95%

> 5\. Raw req/s per resource

> 10gen's order seems to be, #5, then everything else in some order. #1 ain't
> in the top 3.

This is simply not true. Look at commits, look at what fixes we have made
when. We have never shipped a release with a secret bug or anything remotely
close to that and then secretly told certain clients. To be honest, if we were
focused on raw req/s we would fix some of the code paths that waste a ton of
cpu cycles. If we really cared about benchmark performance over anything else
we would have dealt with the locking issues earlier so multi-threaded
benchmarks would be better. (Even the most naive user benchmarks are usually
multi-threaded.)

MongoDB is still a new product, there are definitely rough edges, and a
seemingly infinite list of things to do.[1]

If you want to come talk to the MongoDB team, both our offices hold open
office hours[2] where you can come and talk to the actual development teams.
We try to be incredibly open, so please come and get to know us.

-Eliot

[0] <http://www.10gen.com/presentations#speaker__eliot_horowitz> [1]
<http://jira.mongodb.org/> [2] <http://www.10gen.com/office-hours>

~~~
skrebbel
> _If you want to come talk to the MongoDB team, both our offices hold open
> office hours[2] where you can come and talk to the actual development teams.
> We try to be incredibly open, so please come and get to know us._

I envy how all your (potential) customers are from California.

~~~
kanwisher
Half the startups in NYC use mongo, but that might be because they are
connected to Union Sq Ventures.

~~~
martin_sunset
Or it might be because MongoDb really shines in the typical start-up use
case...

~~~
einhverfr
Or at least better than MySQL for cases where not all data fits a perfect
relational model?

------
chx
This rant is completely outdated and it shows: "pre transaction log", "fixed
this in 1.8". You realize MongoDB is at 2.0 now and the transaction log was
introduced in 1.8, right? Yes, MongoDB had problems, but since the
transaction log it's pretty good. I have used MongoDB since early 1.3, I knew
what I was doing, and we never lost a bit of data. There was a tradeoff:
MongoDB easily handled write load that a MySQL box with 2-3 times the RAM and
I/O capability couldn't handle at all, but we understood we were on the
bleeding edge of using MongoDB back then. We, for example, kept a snapshot
slave which regularly shut itself down, took an LVM snapshot, then resumed
replicating. We never needed those snapshots.

We also meticulously kept a QA server pair around, and the only time I ran
into a data loss problem was when I hosed one of those -- but only one, and
even the QA department could continue. (Hosing that server was me not knowing
that Red Hat 5 had separate e4fsprogs and e2fsprogs, so it was only partially
MongoDB's fault, and MongoDB now works without O_DIRECT, so even this would
not be a problem any more.) I never understood, for example, how foursquare
got where they got to -- didn't they have a similar QA copy?

~~~
dextorious
> _This rant is completely outdated and it shows: "pre transaction log"
> "fixed this in 1.8". You realize MongoDB is at 2.0 now and the transaction
> log was introduced in 1.8, right?_

You _do_ realize that 1.8 vs 2.0 is not eons ago, but just a few months,
right? And you do realize that the cavalier, throw-all-caution-to-the-wind
development attitude that caused all these problems can and does continue to
exist? You don't eliminate that just by adding a transaction log (as late as
1.8, IIRC).

Also: <http://news.ycombinator.com/item?id=3200683>

------
openmosix
Well, I worked at Vodafone (and Nokia) on very large (laaarge) projects,
serving ~50 million users. Years ago there was no hope for NoSQL, so we used
MySQL. We hit at least 10-20 bugs, solved by hotpatches from Sun. So? I think
as developers we should get used to bugs and patches. Should I write a post
"don't use MySQL"? We also hit several bugs in the generational garbage
collector. Stop using Java? I don't feel the drama here.

~~~
wickedchicken
> Should I write a post "don't use MySQL?".

yes

> Stop using Java?

yes

Tongue-in-cheek aside, the author's point is that regardless of its current
status, MongoDB has been pushed on a lot of people hungry for
performance/simplicity; in that singular pursuit they may be setting
themselves up for disaster later on. Most developers have a (perhaps unspoken)
assumption that a successful write to a database means that data Will Not
Disappear. If Mongo violates this assumption, then either developers'
attitudes have to change or they should look at other software to avoid being
bitten.

Take something like sockets: by using TCP, I am telling my development
environment that I would like an unbroken, sequential stream of traffic to
another endpoint. Just as importantly, I would like to be notified if this is
ever not the case. If I discover errors in my TCP stack, I want those fixed
_pronto_ because any kind of workaround would be reimplementing the very
task TCP is meant to cover -- I might as well write my own sequencing and
retransmission logic on top of UDP!
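A sliver of the sequencing logic alluded to here can be sketched as follows
(a toy receiver only; real TCP also handles retransmission, ACKs, and flow
control):

```python
# A fragment of what TCP provides: a receive buffer that turns
# out-of-order, numbered datagrams back into an in-order stream.

class Resequencer:
    def __init__(self):
        self.next_seq = 0
        self.pending = {}    # out-of-order segments held back
        self.delivered = []  # the reconstructed in-order stream

    def receive(self, seq, data):
        self.pending[seq] = data
        # deliver any now-contiguous prefix of the stream
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

r = Resequencer()
for seq, data in [(1, "b"), (0, "a"), (3, "d"), (2, "c")]:
    r.receive(seq, data)
print("".join(r.delivered))  # "abcd"
```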

~~~
openmosix
Then I think it is way easier to write a post "Do not use technology, go back
to the cave". Any technology has a chance of failing, be it SQL, the cloud,
yadda yadda. And if you want to work on the 'edge' (innovating to disrupt
your competitors), that's a risk you should accept. Blaming the tools you
used to get there is childish.

------
woodhull
I couldn't agree more with this analysis, with the addition that the
single-threaded nature of the JS interpreter can also cause really bad and
unexpected performance problems.

Most of the people who are excited about mongo have never used it in a
high-volume environment or with a large dataset. We used it for a
medium-sized app at my last employer, with paid support from 10gen, and
everyone on the project walked away wishing we had stayed with a more mature
data store.

Of _course_ things work well when traffic is low, everything fits in memory,
and there are no shards.

------
davyjones
I would love to see a thorough approach in which such claims are actually
_shown_ and can be reproduced. This helps everyone immensely...from 10gen to
people looking to adopt.

------
bbulkow
Disclosure: I wrote a product called Citrusleaf, which also plays in the NoSQL
space.

My focus in starting Citrusleaf wasn't features, it was operational
dependability. I had worked at companies that had to take their systems
offline just when they had the greatest exposure - like getting massive load
from the Yahoo front page (back in the day). Citrusleaf focuses on
monitoring, integration with monitoring software, and operations. We call
ourselves a real-time database because we've focused on predictable
performance (and very high performance).

We don't have as many features as mongo. You can't do a javascript/json long
running batch job. We'll get around to features - right now we're focused on
uptime and operational efficiency. Our customers are in digital advertising,
where they have 50,000 transactions per second on terabyte datasets (see us at
ad:tech in NYC this coming week).

Here's a performance analysis we did: <http://bit.ly/rRlq9V>

This theory that "mongo is designed to run on in-memory data sets" is,
frankly, terrible --- simply because mongo doesn't give you the control to
stay in memory. You don't know when you're going to spill out of memory.
There's no way to "timeout" a page cache IO. There's no asynchronous
interface for page IO. For all of these reasons - and because our internal
testing shows page IO is 5x slower than aio, which is the reason all
professional databases use aio and raw devices - we coded Citrusleaf using
normal multithreaded IO strategies.

With Citrusleaf, we do it differently, and that difference is huge. We keep
our indexes in memory. Our indexes are the most efficient anywhere - more
objects, fea. You configure Citrusleaf with the amount of memory you want to
use, and apply policies when you start flowing out of memory. Like not taking
writes. Like expiring the least-recently-used data.

That's an example of our focus on operations. If your application's usage
pattern changes, you can't have your database go down, or slow to the point
of being nearly unusable.

Again, take my comments with a grain of salt, but with Citrusleaf you'll have
great uptime, fewer servers, a far less complex installation. Sure, it's not
free, but talk to us and we'll find a way to make it work for your project.

~~~
pbiggar
Looks interesting. May I suggest you provide a hosted service? With mongo, I
tried it online and got a feel for it before we signed up, and there are
multiple hosted services so I didn't have to worry about setting it up in the
cloud. Looking at citrusleaf.com, though the blurb sounds like I might like
it, nothing else really helps me. It's NoSQL, but that doesn't say anything. I
know that memcache has a use case, and I know mongo's use case, and redis',
but I don't see yours.

(PS I know you're enterprise software, but still).

------
donpark
Burden of proof is on 10gen, not frustrated customers. This post is believable
enough for me to avoid using MongoDB for write-heavy apps.

~~~
libria
What if it's not a frustrated customer but a libelous, frustrated competitor
instead?

~~~
peteforde
Except that those are not the words of a libelous, frustrated competitor.
I've seen these claims validated over and over again, both in posts on HN and
by people I trust who have worked with MongoDB under load.

Performance benchmarks stop being meaningful when you realize that you can't
fix the problem you're having without committing to a system-wide shutdown of
unknown duration.

The main point the author makes is that the creators of MongoDB do not
follow rigorous practices. If this doesn't bother you, please go right ahead
and use anything you wish.

I hear that /dev/null is really zippy these days.

------
davidw
People seem to be jumping on a lot of the NoSQL stuff for no good reason. You
can get a _lot_ of mileage out of something like Postgres or Mysql, and they
work pretty well for a lot of things. Ok, if you get _huge_ , you might have
to figure out something else, but that's a good problem to have. On the other
hand, if you've lost all your data, you're not going to _get_ huge.

I had to use MongoDB recently, and I wasn't very pleased with it. It wasn't
really appropriate for the project, which had data that would have fit better
in a relational DB.

------
christkv
A story from a newly created account, by a person nobody can verify is real,
asking other people to submit his rant (to gain what? credibility for his
story?):

> _nomoremongo: I'd appreciate if someone would submit this story for me.
> <http://pastebin.com/raw.php?i=FD3xe6Jt>_

What's up with the trolling here? Who are you, and what company do you work
for that has had all those problems you mentioned?

~~~
slowpoke
Attacking the messenger is shallow. How about you look at the points - whether
valid or not - he or she raises instead and try to refute them? It matters
little if that person is well known or someone entirely new. I don't see how
the relative anonymity of a person is in any way related to his or her
credibility.

Besides, calling a position you don't agree with "trolling" with no further
argumentation is 4chan level of discourse, and I know what I'm talking about
when I say this. I will not take a side in this discussion because I'm not
qualified to voice an opinion over things I do not understand well enough
(databases), but I had to point this out.

~~~
christkv
it's still a valid point as there are no references to back up any of the
claims in the post. he should at least have included links to issues in their
jira or some way of replicating the problems he is experiencing.

as it stands it's not fact based and could as easily be opinion, as there is
no way to weigh the merit of the claims against anything substantial :(

~~~
slowpoke
Now that's more of a valid argument.

I just dislike calling anyone who prefers to stay in relative anonymity (for
whatever reason) or is simply new to a community "not credible", at least if
it's only because of those attributes. It's a thinly veiled ad hominem.

------
ericflo
But it does 8,000,000 operations per second!
<http://www.snailinaturtleneck.com/blog/2010/05/05/with-a-name-like-mongo-it-has-to-be-good/>

(Sorry, possibly excessive snark. That said, I think that blog post is a good
example of one of this pastebin author's points: at least historically,
benchmark numbers have been a big focus for Mongo developers.)

~~~
j_baker
According to the link, that's 320k operations per second per server, meaning
it handles 8 million operations per second across 25 servers.

I don't think it's a stretch to say that _any_ database spread across 25
servers should be able to handle _at least_ 8 million operations a second.

------
mtkd
Anyone using Mongo currently has to be aware there are likely to be some
teething issues as it is very new technology.

I haven't used it in production (yet), but I would have no fear of using it
today. I would run regular consistency monitoring and validation around
critical data just like I do with our SQL databases.

I'm willing to take my part of the pain and inconvenience in making technology
like this stable.

You could have written this about any adolescent SQL server BITD. All the
tools you use today had to go through this process.

For me Mongo is awesome and getting more awesome. Mongo and technology like it
is the reason I still get excited about writing new apps.

------
itaborai83
Given the current discussion about MongoDB, I think that the following post is
worth revisiting.

<http://news.ycombinator.com/item?id=2538037>

I'm not a Riak user, but I agree with Basho's analysis on this case.

------
Encryptor
This is textbook projection. The team deployed an immature database, tried to
push its limits, and now they're saying: "it sucks!". Sure, a 2-year-old
database is the problem, not your ability to make architectural decisions.
Sounds like someone is looking for a scapegoat. They took a risk and failed,
and this is just a poor way of coping with it. It's OK to publish your
experiences on your blog (which they did a few days ago). It's NOT OK to go
around the Internets publishing "anonymous" articles about how MongoDB sucked
for you, as if no one will see what you did there. That's just defamation,
folks.

On a side note, we also looked at MongoDB and, after running a few tests, we
concluded that it is a glorified key-value pair storage. That said, we did use
it in a few small-scale projects and it works great.

The bottom line: choose the right tool for the job and don't bitch about the
tools when you fail.

~~~
einhverfr
I would say however that a significant subset of NoSQL deployments (perhaps
even a large majority) are by definition lacking in sound architectural
decisions. I'd argue the same goes for ORM-based database access too.....

The failure exists because many developers don't ask a few key questions up
front:

1) What exactly can the database do for us?

2) Which of these do we need? For example, is the database going to be a
point of integration?

3) What failsafe or security measures do we want to count on in the database?

These don't always have objectively right/wrong answers but failure to ask the
questions leads to poor use of databases regardless of what technologies are
chosen.

------
chrissanz
This article should be banned for its lack of references and examples. For
those of you looking to learn MongoDB, check this out:
<http://www.mongodb.org/display/DOCS/Production+Deployments>

------
brainless
Is there someone here on HN who has used MongoDB for a large-data-set,
high-concurrency application? Can someone shed some light? And maybe with a
more recent version of MongoDB...

~~~
rbanffy
There is a team at the company I work for that has deployed Mongo to
production with, I suppose, a heavy load. I can check with them. I've heard
no complaints, but the company is large enough for me not to hear everything.

------
MikeCampo
Very interesting. I recently worked on a little side project using MongoDB and
I noticed during testing that some records would disappear at random. Glad to
see this has happened to others. I suppose it's time to check out Redis.

------
itaborai83
I feel like a dick, but I have got to ask. Is it Disney? Disney is on both the
couchbase and 10gen sites. Both sites mention that they are using their NoSQL
solutions to power their social and online games. Couchbase powers Zynga and
can arguably be considered the leader on this specific market. Am I close?

------
js4all
Losing data is one of the most serious bugs. When I use a DBMS in production,
I have to rely on it 100%. I believe the complaints made could be real,
because MongoDB is highly optimized for speed. But as long as there is no
documented, and ideally reproducible, case, this post can't be taken at face
value.

~~~
vannevar
I'm very skeptical of the lost data claims. People using MongoDB are writing
new code. New code has bugs. When data is lost, it's certainly more convenient
to claim 'the datastore ate it' than to admit you have a critical bug in your
own code.

~~~
js4all
I agree. And this is why I like CouchDB's versioning. In similar cases we
could track down unwanted deletes using previous versions of the document in
question. Without those, they could easily be interpreted as "data loss".
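The kind of revision trail being described can be sketched with a toy store
(illustrative only, not CouchDB's actual API): deletes become new revisions,
so a vanished document can be traced back to the operation that removed it.

```python
# Toy CouchDB-style revision history: updates and deletes append
# revisions rather than overwriting, so an unexpected disappearance
# can be distinguished from an application-issued delete.

class VersionedStore:
    def __init__(self):
        self.revisions = {}   # doc id -> list of (rev number, body)

    def put(self, doc_id, body):
        revs = self.revisions.setdefault(doc_id, [])
        revs.append((len(revs) + 1, body))

    def delete(self, doc_id):
        self.put(doc_id, None)  # a delete is just a new empty revision

    def get(self, doc_id):
        revs = self.revisions.get(doc_id, [])
        return revs[-1][1] if revs else None

    def history(self, doc_id):
        return self.revisions.get(doc_id, [])

db = VersionedStore()
db.put("user:1", {"name": "ann"})
db.delete("user:1")

print(db.get("user:1"))      # None -- looks like "lost data"...
print(db.history("user:1"))  # ...but the trail shows an explicit delete
```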

------
hendler
10gen might become a victim of its own popularity. I have heard:

* Yes, playing with Mongo is playing with fire. Know what you are doing. We don't claim that you should use us as your only database.

* We're going to fix these issues soon. The beginning days of MySQL etc were also frightening, with Oracle and MS SQL Server admins warning of all the dangerous things that can happen.

If they confront their issues, I think it's just a matter of time before Mongo
wins the NoSQL race. They have what matters most - good people, a brand, and
great expectations from customers.

------
rshm
Not sure how MongoDB deals with writes in recent versions. It used to leave
everything blindly under the control of the OS's mmap implementation.

------
antimora
I was planning to adopt MongoDB for my big project, and this post raises some
doubts. Is this true? Can anyone confirm or deny it?

~~~
StavrosK
I've had version 1.6, if I remember correctly, just lose half my data with no
warning, so I can believe this.

The fact that the 32-bit version also truncated data with no warning doesn't
make me hopeful, either.

------
nullymcnull
These posts are exceptionally well-timed for me. I'm currently wrangling with
one of those problems that is just not solved well with relational databases,
or even the flat document store that my company already uses. I've been
looking hard at Redis and Mongo, and of late I'm leaning towards Mongo. You
know what? Having read these posts and the threads - and having extracted what
little in the way of factual datapoints I could from them - I'm pretty sure
I'll still be riding into production with Mongo.

Some of you guys who were all aboard the NOSQL UBER ALLES hype train a year or
two ago now seem to be swinging back - with scrapes and bruises from some
truly harebrained misdeployments, no doubt - to a reactionary 'All NoSQL are
doomed to reimplement everything relational' nihilism. Back to shitty OR tools
and ugly-ass joins for everyone, damnit! Harumph. I could write a novel just
quoting and responding to some of the stupid pronouncements and prescriptions
for correctness on these Mongo threads' comments.

Anyways. With regards to this specific post:

Let's rewind a couple of years. I work for a significantly smaller company
than our anon raconteur, from the sound of it. At roughly the same time as he
adopted Mongo, I was also looking hard at it, to solve some problems where the
relational options available to us weren't going to cut the mustard. Damn, did
Mongo look cool, fun even. The flexibility of having arbitrary object graphs
in it and querying down into subdocument properties with real indexing on
them, well, it sets nearly any developer's heart a-flutter, particularly those
of us who work on dynamic web stuff a fair bit.

Sadly, I have to be an engineer and pragmatist first, I have to think about
much more than what is sexy and comfortable for devs. I've been through my
share of 3AM wake-up world-enders, I've learned the hard lessons. I considered
variables like basic maintainability by ops people, credibility of the vendor,
track record, robust redundancy and availability solutions, how far up shit
creek we'd be in a disaster recovery scenario, etc. And after thorough
research I decided that, for my much smaller company which can afford to be
judiciously bleeding-edge where it makes sense to, Mongo was just not clearing
the bar. I sucked it up and used unsexy properly normalized relational
database tables, then utilized memory caching and async updates to try and
paper over the performance issues inherent in that scheme.

What was anon doing? Charging full steam ahead into the wild unknown with
Mongo, on an effort that was apparently important to a userbase of millions
at a "high profile" company. That's some mighty responsible stewardship of
the company's, or even just the IT department's, broader concerns right
there. Now,
I understand that it totally makes sense to have used Mongo 1.x as a scrappy
startup on a greenfield project, no problem. But this guy was in a different
situation. At that scale in a BFC, conservatism rules, and it rules for a
reason.

I think I am starting to understand why anon is anon.

In any case, we're likely going to roll with Mongo soon. It is indeed
maturing, and I'm a lot more comfortable with it on all of my criteria these
days. I have possibly read more of the JIRA issues than some of the devs, and
they are prioritizing the Right Things - at least for my tastes. By my
estimation it is on the right track.

Even having not _used_ it in production yet, I can identify some things people
are complaining about here as complete and utter RTFM-fail, misunderstanding
of what it is they're deploying and whether what they expect out of it is
realistic before they begin. I understand the tradeoffs of Mongo, and in my
particular situation they make good sense.

~~~
linuxhansl
Disclaimer: One of the HBase committers here.

There is/was a LOT of hype around NoSQL. Hype, and very little understanding
of what NoSQL is about, and specifically why/when choosing a NoSQL database
makes sense and when it does not.

It is not about SQL vs. not. It is about consistency, availability, and
partition tolerance, and which of these you are willing to give up.
Surprisingly few people know about the CAP theorem and what it implies.

Generally there are two main reasons why you switch to NoSQL (Not Only SQL)
databases:

1\. You need to scale out (add more storage and query capacity by adding more
machines).

2\. You do not want to be locked into a relational schema.

There is no magic in NoSQL! To scale out, these stores give up exactly those
features that would impede scaling out (for example, global transactions).

What one has to realize is that you give up a lot by letting go of relational
databases: fast ad hoc queries, transactions, consistency, and the entire
body of theory and research behind them. I don't see why relational databases
are "unsexy". A good query planner is almost a work of art, and it is amazing
what they can do. In fact, we use them alongside HBase.

Instead of ad hoc queries you either get slow map/reduce type "queries" or you
need to plan your queries ahead of time and denormalize the data accordingly
at insert time.
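That denormalize-at-insert pattern can be sketched in a few lines
(hypothetical document shapes, plain dicts standing in for a document store):
instead of an ad hoc join to count a user's checkins, the counter is
maintained on the user document in the same write path.

```python
# Sketch of "plan your queries ahead of time": rather than joining or
# scanning at read time to count a user's checkins, keep the count
# denormalized on the user document and update it at insert time.

users = {"u1": {"name": "ann", "checkin_count": 0}}
checkins = []

def insert_checkin(user_id, venue):
    # write the checkin AND the denormalized counter in one code path
    checkins.append({"user": user_id, "venue": venue})
    users[user_id]["checkin_count"] += 1

insert_checkin("u1", "cafe")
insert_checkin("u1", "park")

# The "query" is now a single key lookup -- no join, no scan:
print(users["u1"]["checkin_count"])  # 2
```

The cost, as the comment notes, is that any query you did not plan for still
requires a slow scan or map/reduce pass.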

You better have _very_ good reasons for the switch.

When we evaluated NoSQL stores a while back (for #1-type problems) I was
quite the skeptic. We looked at Riak, Redis, MongoDB, CouchDB, Cassandra, and
HBase. Eventually we settled on HBase because we needed consistency over
availability, we needed more than just a key-value store, and we already had
some Hadoop projects... and I started to drink the Kool-Aid :)

Personally, I am not a big fan of eventually consistent (but highly
available) stores, because it is extremely difficult to reason about the
state of the store, and the application layer bears a lot of extra
complexity. But your mileage may vary.
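One concrete source of that difficulty can be sketched as follows (a toy
last-write-wins merge; real eventually consistent stores may instead use
vector clocks or surface sibling values): writes accepted on both sides of a
partition, one of which silently vanishes when the replicas reconcile.

```python
# Toy illustration of why eventually consistent stores are hard to
# reason about: two replicas accept writes to the same key during a
# partition, and a naive last-write-wins merge silently drops one.

def last_write_wins(a, b):
    """Merge two (timestamp, value) versions by keeping the newer one."""
    return a if a[0] >= b[0] else b

# During a partition, both replicas accept a write to the same key:
replica_1 = (100, "alice@old-mail.example")   # (timestamp, value)
replica_2 = (101, "alice@new-mail.example")

merged = last_write_wins(replica_1, replica_2)
# replica_1's write is simply gone: the application never sees a
# conflict, it just loses the older update on reconciliation.
print(merged)
```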

HBase of course is new as well, and I have had to start fixing bugs and
adding the new features we required.

As with "Java is better than C++" type discussions, here too the store to use
depends on the use case. As the parent points out, hype about anything is a
bad thing, because it typically replaces reason as an instrument of decision
making.

(not sure what I was getting at, so I'll just stop here).

------
benatkin
I feel a kind of social responsibility to flag this anonymous FUD.

------
bitops
This post is unparalleled FUD. We use MongoDB in production and all the issues
we've encountered have been either environment or configuration related.

There are plenty of things about MongoDB I don't like but this OP is a total
coward. If you've got something to say, put your name on it and come out in
the open.

This type of post is the worst of its "hiding behind Internet anonymity" kind.

And for the record, I don't think Oracle is behind this. They're confident in
their Exadata offering and have little to gain by posting this kind of crap
around MongoDB. Besides, Larry Ellison has never been afraid to openly taunt
his competitors.

~~~
prodigal_erik
I wouldn't use my True Name for serious criticism of a tool that may become
popular, because I expect it would be a career-limiting move. E.g., I think
MySQL has reckless contempt for data integrity, but that doesn't mean I'd
rather starve than ever be considered by a hiring manager at a MySQL shop.

------
jmspring
Somewhat off topic, but mongo recently came up in a design discussion. Some
of the points here are interesting to consider/evaluate against the most
recent version.

My question is, given Mongo and the other NoSQL solutions, has anyone come up
with a comparison of strong and weak points across different application
types? Feature lists really aren't always useful - as noted about things like
code maturity, etc.

------
zobzu
In my use case (big joins and very large tables) I've had similar performance
using PostgreSQL, with sync() to disk disabled and the buffers tuned, as with
the various NoSQL stores I tried.

It seems to me that NoSQL does not really bring speed, just scalability and a
different model. Hopefully most of them don't lose data at random. PG
certainly doesn't, even with sync() off.

I have not tested the scalability of PG.

~~~
politician
"big joins" ?

Aren't joins one of the things that NoSQL loudly and proudly announces that it
isn't suitable for?

------
blago
Sounds painfully familiar...

------
ryanfitz
I think MongoDB's biggest problem is that people expect it to take care of
all their scalability issues for them. In reality, once you start hitting a
certain scale you need to start rearchitecting your system; no datastore can
automatically handle this for you, but perhaps MongoDB lets you get a little
bit bigger before this becomes a big issue.

------
pwaring
"They just disappeared sometimes. Cause unknown."

If the cause is unknown, how can you blame it on a given piece of software?

~~~
jacques_chester
I presume he's referring to the silent truncation stunt Mongo has been accused
of by others before.

"Oh THAT", say Mongo boosters. "You should have read the IRC logs of June
22nd", they continue, "there was a 3 line patch posted in the channel. It
totes fixes that problem".

------
xxqs
I wonder why nobody mentions that MongoDB supports the x86 CPU architecture
_ONLY_. It keeps unaligned data in its memory structures, and all operations
are explicitly little-endian. So there's no chance of getting it running on
ARM, MIPS, PowerPC, or SPARC.
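The endianness point can be illustrated with Python's `struct` module: an
explicit byte-order prefix yields the same bytes on any host, which is what a
portable store has to do rather than baking the x86 (little-endian) layout
into its structures.

```python
# Why byte order matters for portability: packing with an explicit
# '<' (little-endian) or '>' (big-endian) prefix produces identical
# bytes on any CPU, while native-order packing follows the host.
import struct

n = 0x01020304
little = struct.pack("<I", n)   # explicit little-endian, the x86 order
big = struct.pack(">I", n)      # explicit big-endian, e.g. SPARC order

print(little.hex())  # 04030201
print(big.hex())     # 01020304
```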

~~~
christkv
There is a fork on github for non-x86 - I don't remember the name, but it's
supposed to compile to ARM.

~~~
xxqs
yes, I talked to the guy. The problem is that the mainstream maintainers
completely ignore the problem, so there's no official support anyway.

------
mark_l_watson
Yes, if people are using MongoDB for applications requiring ACID, it is
probably not the best fit. However, there are _many_ great use cases that
MongoDB is a great fit for, sometimes characterized by needing a lot of read
slaves for complex analytics, where data loss is not a lose-the-company
proposition, rapid prototyping, etc. I just ported a Java GWT + Objectify
appengine application to run on EC2 with MongoDB and it was shockingly easy
to do. Also, you can give up some write performance for increased data
safety.

MongoDB (along with PostgreSQL, RDF data stores, and sometimes Neo4J) is
solidly in my preferred tool set.

------
rfurlan
I just recently published an article describing my experience migrating from
SQL Server to MongoDB, you can read it here:
<http://news.ycombinator.com/item?id=3203601>

I agree with the author that MongoDB is green and maybe not quite ready for
prime time yet. All things considered, I realize we took a risk by switching,
and while I am quite happy with MongoDB, I do worry that at some point we
will experience a failure condition we might not be able to recover from.

------
jtchang
I have never used MongoDB in production but have thought about it. To me
though it is just another architectural decision that you need to base around
risk and reward.

MongoDB is awesome at certain things. But it is still not at the
tried-and-true level of, say, PostgreSQL or MySQL.

I am skeptical of the article, but only because it is all too easy to fault
new projects. However, I would be curious to know how 10gen's development
practices compare to, say, Postgres's or SQLite's (I have heard awesome
things about SQLite's development testing).

------
latch
Am I the only one who's ever forgotten the WHERE clause in an UPDATE or
DELETE statement? For all the proof presented, for all we know that's the
cause of their lost data.

------
yaix
Some guy posted some unproven random claims on pastebin, and people take it
that seriously? "It must be true, I saw it on the Internet!" 500+ upvotes?
C'mon.

------
tlogan
We are all engineers and MongoDB is open source. Maybe the easiest way to
evaluate the project is to review the source code. This will give at least
some idea of MongoDB's quality - of course, MongoDB can still be a great
product even if the code is not well written, but it is an important
indicator.

What is your assessment of the code? Is it maintainable? Is it modular? Does
it seem well written?

------
freerobby
[http://www.google.com/search?gcx=c&sourceid=chrome&i...](http://www.google.com/search?gcx=c&sourceid=chrome&ie=UTF-8&q=%22powered+by+mongodb%22)

------
bdarfler
Yeah, MySQL 4.0 sucked; that's why I don't use 5.5 now. Come on.

------
HarrietTubgirl
If it's new and you are pushing its boundaries, you will get screwed. Always.
This goes for new major releases of MySQL just as well as MongoDB.

------
captaincrunch
The largest website on the internet uses MySQL, why can't you? (Facebook uses
MySQL)

~~~
ceejayoz
Facebook also has the engineering resources to smack it into submission.

------
aritraghosh007
Is this actually true? Someone from 10gen needs to confirm and accept it, if
the claims hold.

~~~
buster
As if they would officially confirm this...

~~~
aritraghosh007
Confirm or refute, it doesn't matter, but at least a comment giving some
strong evidence would surely help!

~~~
exDM69
Since the author made no specific claims and didn't show any failing test
cases that could be discussed and reasoned about in a sensible way, it's going
to be very hard to confirm or refute anything.

Responding to anonymous flames on the internet is a waste of time.

~~~
aritraghosh007
After reading most of the comments now, I believe it is. I can't understand
why people would try to bring down something this popular with such rash
publicity. HN should be a little more careful about posting such links without
verifying the source.

------
marcf
Are we sure this isn't an Oracle employee?

~~~
jacques_chester
Does it matter, if they're right?

Edit: to the downvoters, this is a serious question.

~~~
vannevar
_Does it matter, if they're right?_

No, of course not. But the point is that it would be easy to generate a post
like this just by going back over the critical bug list for previous versions,
and throwing in an unsubstantiated claim of 'mysterious data loss.' And Oracle
does have the incentive and the means to engage in an old fashioned Microsoft-
style FUD campaign. They've already launched their 'embrace-and-extend'
strategy: <http://www.oracle.com/us/corporate/press/519708> .

------
nmongo
DISCLAIMER: I submitted this story and it is in fact a hoax that has gone too
far, you got trolled, truly frightening how gullible most of you are. DO NOT
BELIEVE EVERYTHING YOU READ ON THE INTERNET!

~~~
Confusion

      how gullible most of you are
    

That's an unsubstantiated, and probably false, assumption. There are a lot of
people in this thread debating the merits of the complaints and _most_ doubt
those merits. Even more people are not participating in this thread at all and
you don't have a clue what they think. If they're anything like me, they
thought "another day, another complaint about product X" and didn't draw any
conclusions from this post.

You seem too convinced of your own greatness to seriously consider the
possibility that maybe _most_ people here are actually sensible and don't rush
to judgment. To me, this post portrays you as an 18-year-old who was the
smartest kid in his high school but has yet to realize and accept just how
many people out there are smarter and wiser. God help the people around you if
you're older than 28.

------
belbn
Can we make Google's filesystem mainstream?

~~~
SeanNieuwoudt
There is an open-source alternative: <http://hbase.apache.org/>

------
dblock
Remember Ingres? Me neither. <http://bit.ly/uhb6OY>

------
justin_vanw
Shhhhhhhh! As someone who works at startups, the fact that my competitors
would trust something like MongoDB with their data is _awesome_.

------
ricardobeat
_10gen's order seems to be, #5, then everything else in some order. #1 ain't
in the top 3._

So, #5, #1, then the other 3 in some order :)

