

Does MongoDB 1.3.x (dev) silently lose your data? - bravura
http://www.korokithakis.net/node/116

======
cheald
MongoDB VERY CLEARLY warns you about the limitations of the 32-bit install
when you spin it up. When I invoke mongod on my 32-bit development box, I get
a nice big message:

    
    
      ** NOTE: when using MongoDB 32 bit, you are limited to about 2 gigabytes of data
      **       see http://blog.mongodb.org/post/137788967/32-bit-limitations for more
    

If you manage to miss that, it's your own fault. Combined with the fact that
he was using an unstable development branch, he has absolutely zero room
whatsoever to complain. He hit a documented limit and got less-than-friendly
behavior on an unstable build of the software. Going nuclear over it is just
silly, and makes him look inattentive and/or naive.

~~~
bad_user
That seems like a brutal limitation done for non-obvious reasons ... I mean,
wtf?

> _makes him look inattentive and/or naive_

Oh I wish you gave me more options. This is not a proper ad hominem.

~~~
cheald
It's a brutal limitation if you're expecting MongoDB to be "MySQL, except with
more awesome because it's new and shiny!", sure. But once you read the blog
post, it's quite apparent why: memory-mapped files are blisteringly fast, and
raw speed at the cost of other features (like relational constraints,
transactions, and single-server durability) is part of the MongoDB trade-off.
If those are acceptable sacrifices, then great, enjoy a wicked-fast data
store. If they're not, then it's not the tool for you, no matter how much the
cool kids talk it up.
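For the curious, the mechanism is easy to sketch. Here's a rough Python illustration (not MongoDB code) of memory-mapped I/O, plus the back-of-the-envelope arithmetic behind the ~2 GB cap on 32-bit:

```python
import mmap
import os
import tempfile

# Memory-mapped file I/O: reads and writes are plain memory accesses,
# and the OS pages data to and from disk behind the scenes. That is
# where the speed comes from.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 4096)   # the file must have a size before mapping
mm = mmap.mmap(fd, 4096)
mm[0:5] = b"hello"             # writing memory writes the file (lazily)
data = bytes(mm[0:5])

# Why the cap: every mapped byte needs a virtual address, and a 32-bit
# process has only 2**32 of them, with roughly half reserved for the
# kernel, code, stack, and heap.
usable_gib = (2**32 // 2) // 2**30

mm.close()
os.close(fd)
os.unlink(path)
```

That leaves about 2 GiB of address space for data files, which is the number in the mongod warning.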

------
andrewvc
Title is misleading, he was using the unstable dev branch. There's no evidence
here that the stable branch does this.

~~~
benatkin
Flagged, but a title change would be acceptable, too.

~~~
bravura
I have updated the title in response to this criticism.

~~~
benatkin
Thanks! I unflagged it.

Come to think of it, the article is still a good cautionary tale. What it
really teaches is:

a) do your research

b) use the production version if you're using code in production, unless you
have a really good reason not to and have some safeguards in place

------
bobx11
I use MongoDB, collecting just under 200k documents a day for months, and I
haven't lost any data so far that we've found. SQLite looks pretty speedy
though; I may try going back to something like that for smaller jobs.

The one thing I found funny is that deleting documents from a database leaves
the space allocated, and you have to manually run the repair command to
reclaim it. But that requires 100% additional free space, because it creates a
new db and copies the data there... a little bit of a problem since we were
already at 99% for the device :-| Other than that, I've been really happy with
it.
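That 2x requirement falls straight out of copy-based compaction. Here's a toy Python sketch (nothing to do with MongoDB's actual on-disk format; the line-per-record layout is invented) showing why the old and new copies have to coexist on disk until the swap at the end:

```python
import os
import tempfile

def compact(path, live):
    """Rewrite `path`, keeping only records whose keys are in `live`."""
    new_path = path + ".repair"
    with open(path) as old, open(new_path, "w") as new:
        for line in old:
            key = line.split(":", 1)[0]
            if key in live:            # dead records are simply skipped
                new.write(line)
    # Until this point, BOTH files exist on disk, hence the extra space.
    os.replace(new_path, path)         # atomic swap; old space reclaimed

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    f.write("a:1\nb:2\nc:3\n")
compact(path, live={"a", "c"})
with open(path) as f:
    result = f.read()
os.unlink(path)
```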

~~~
kscaldef
That's not terribly uncommon. PostgreSQL also does not reclaim space from
deleted rows until you vacuum. Many hash table implementations also only mark
items deleted rather than actually removing them.

~~~
selenamarie
Although - for further clarification, PostgreSQL doesn't require 2x the total
amount of disk to reclaim/reuse space under normal circumstances.

As of version 8.1 (out since 2005), VACUUMing is an automatic function inside
the server. (autovacuum)

------
rit
Er, wow. I don't get the impression he has any idea what's going on.

The title is incredibly misleading, especially given that this is a ~2 month
old post.

As of _this_ writing, 1.4.x is the stable branch, with 1.5.x as the unstable.

I should note that I've been running MongoDB in production since last August.
Development, deployment, and go-live all happened on a pre-GA 0.9.x version,
and I never encountered ANY issues. With almost a year of uptime I've had no
data loss or anything else.

At the same time I understood I was using an unstable version and was careful
to also understand what was going on under the covers.

So here's the REALLY important thing you should understand if you're using
MongoDB because it probably has to do with his data "loss":

Data operations in Mongo, viz. insert/update, are asynchronous from both an
API-client and a disk-persistence standpoint. This is one of the things that
gives MongoDB the speed boost you see.

When your language driver sends new rows, it does NOT by default wait around
to see if MongoDB saved them correctly. It sends the data off and returns to
you. You can force it to wait, and you can ask for the lastError, but normally
you don't wait for an "I got it in correctly" answer.
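In pymongo of that era, this is roughly the difference between a plain `insert(doc)` and `insert(doc, safe=True)` (which issues a getLastError round-trip). A toy Python sketch of the idea, with made-up ToyServer/ToyDriver names and an invented "reject documents missing an _id" rule standing in for any bad write:

```python
class ToyServer:
    """Stand-in for mongod: accepts writes, remembers the last error."""
    def __init__(self):
        self.stored = []
        self.last_error = None

    def handle_insert(self, doc):
        if "_id" not in doc:            # a stand-in for any bad write
            self.last_error = "missing _id"
            return                      # silently dropped
        self.stored.append(doc)
        self.last_error = None

class ToyDriver:
    """Stand-in for a language driver with fire-and-forget defaults."""
    def __init__(self, server):
        self.server = server

    def insert(self, doc, safe=False):
        self.server.handle_insert(doc)  # send it off; don't wait for a verdict
        if safe:                        # ~ asking for lastError afterwards
            if self.server.last_error is not None:
                raise RuntimeError(self.server.last_error)

driver = ToyDriver(ToyServer())
driver.insert({"a": 1})                 # bad write; no error ever surfaces
try:
    driver.insert({"b": 2}, safe=True)  # same bad write; now it raises
    raised = False
except RuntimeError:
    raised = True
```

With the default fire-and-forget call, the rejected document just quietly never shows up, which looks exactly like "MongoDB lost my data."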

Additionally, once MongoDB receives the data, it goes into memory... it writes
to disk lazily, rather than immediately (if you sent 1000 increment commands
between one disk write and the next, it would batch them into a single
combined increment, etc).
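A toy sketch of that batching, with an invented LazyCounterStore standing in for Mongo's in-memory state:

```python
class LazyCounterStore:
    """Toy model: increments accumulate in memory between disk writes."""
    def __init__(self):
        self.on_disk = {}      # what has actually been "persisted"
        self.pending = {}      # in-memory deltas since the last flush
        self.disk_writes = 0

    def increment(self, key, amount=1):
        self.pending[key] = self.pending.get(key, 0) + amount

    def flush(self):
        # Collapse all pending deltas into one write per key.
        for key, delta in self.pending.items():
            self.on_disk[key] = self.on_disk.get(key, 0) + delta
            self.disk_writes += 1
        self.pending.clear()

store = LazyCounterStore()
for _ in range(1000):
    store.increment("hits")    # 1000 increments held in memory...
store.flush()                  # ...become a single write at flush time
```

If the process dies before flush(), the pending deltas simply vanish; that's the window where writes you thought you made can disappear.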

Yes, these things can lead to data "loss" - or the appearance thereof. If
you're sending in a bad update or insert statement and not checking for
errors, data will "disappear" - by which I mean it was bad data and never
accepted for write.

------
henning
Using MongoDB was a mistake to begin with, even setting aside that he picked
the wrong branch. His project isn't about databases; it's about machine
learning. For his requirements he just needs a simple datastore, and it sounds
like SQLite would have worked fine.

He procrastinated by trying out cool new software (a classic symptom of a
project that is not going well) and lost a lot of time because of the careless
way he went about it.

He should get back on track by focusing on the real meat of his project and
use the simplest bitbucket he can find for it.

~~~
bjclark
Yes, the fact that MongoDB was losing his data was all his fault because he
chose MongoDB. Sound logic.

~~~
weeksie
Well, the immediate problem was the unstable branch. The larger, more general
problem was that he wasn't paying attention to what the hell he was doing.

Oh yeah, hi BJ :)

------
vain
Hasn't for me, so far... I've been running three _heavily_ written-to mongo
installs for more than a month...

~~~
smokinn
I hope you have backups and have tested restoring them.

~~~
digitallogic
Agreed, but that has nothing to do with the use of MongoDB. Any data that
"can't be lost" needs to be backed up.

------
luigi
It should be noted that this post is over two months old.

------
oldgregg
Wahwahwah. This guy sounds like he has never used open source software before.
I have a complex mongodb install in production with no problems. Every time I
"bundle install" it breaks something because everything is running edge and
the APIs are changing all the time. If you are not competent to deal with that
then maybe you shouldn't be playing with shiny new toys.

