
Very good summary of what to look out for. Here are a few others that I ran into back when I was still entertaining the idea of using Mongo in production:

1. The keys in Mongo documents get repeated over and over again for every record in your collection (which makes sense when you remember that collections don't have a db-enforced schema). When you have millions of documents this really adds up. Consider adding an abstraction mapping layer of short 1-2 character keys to real keys in your business logic.

2. Mongo lies about being ready after an initial install. If you're trying to automate bringing mongo boxes up and down, you're going to run into the case where the mongo service says that it's ready, but in reality it's still preparing its preallocated journal. During this time, which can take 5-10 minutes depending on your system specs, all of your connections will just hang and time out. Either build the preallocated journal yourself and drop it in place before installing mongo, or touch the file locations if you don't mind the slight initial performance hit on that machine. (Note: not all installs will create a preallocated journal. Mongo runs a mini performance test on install to decide whether preallocating is better for your hardware or not. There's no way to force it one way or the other.)
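For anyone automating around #2, one workaround is to not trust the service status and instead poll with a real query until it succeeds. A minimal sketch, assuming you supply your own `probe` function (hypothetical here: something that runs a trivial query with a short client-side timeout and returns True on success). The name `wait_until_ready` and the default timings are illustrative assumptions, not anything Mongo ships:

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool],
                     timeout_s: float = 900.0,
                     interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Call probe() until it returns True or timeout_s has elapsed.

    probe is a caller-supplied (hypothetical) check, e.g. a trivial query
    issued with a short client-side timeout. sleep and clock are injectable
    so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            # A refused or timed-out connection just means "not ready yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The default timeout is set above the 5-10 minutes mentioned above; tune it to your hardware.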

I can suggest the sadly deliberately undocumented (viz. src/mongo/db/db.cpp:719) "--nopreallocj" option, as well.

Added this + listed you in the footer, thanks!

Thanks; #1 can be a problem for large collections because of the extra space. I think I've seen some pre-rolled abstraction layers for this, but I've never used an open-source one myself.
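A hand-rolled version of such a layer can be quite small. Here is a rough sketch, assuming plain dicts and top-level fields only; the field names and the `to_storage` / `from_storage` helpers are made up for illustration and are not from any particular library:

```python
# Hypothetical field names; in practice this table lives next to your models.
LONG_TO_SHORT = {
    "first_name": "fn",
    "last_name": "ln",
    "email_address": "em",
}
SHORT_TO_LONG = {short: long_name for long_name, short in LONG_TO_SHORT.items()}

# Two long keys sharing one short key would silently lose data on read.
assert len(SHORT_TO_LONG) == len(LONG_TO_SHORT), "short keys must be unique"


def to_storage(doc: dict) -> dict:
    """Rename known top-level keys to their short form before saving."""
    return {LONG_TO_SHORT.get(key, key): value for key, value in doc.items()}


def from_storage(doc: dict) -> dict:
    """Rename short top-level keys back to their long form after loading."""
    return {SHORT_TO_LONG.get(key, key): value for key, value in doc.items()}
```

Unmapped keys such as `_id` pass through untouched. Nested documents are not handled here, queries and index definitions also have to use the short names, and an unmapped field that happens to be named like a short key would be mistranslated on read.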

#2 - usually the journal should be pretty quick to allocate - I've not experienced this problem directly myself.

I'll add some extra bits to the bottom of the post with your notes.

There is a ticket for compression of both keys and values https://jira.mongodb.org/browse/SERVER-164

The Mongo team seem somewhat reluctant to implement it.

I'm not sure if they are still doing it, but they used to work on the highest voted stuff as a priority; so if you want a feature, vote.

Also, I'm working on Snappy compression with Mongo (it's already used for the journal), however it's not currently stable and work is sporadic due to my startup.

It's probably not a clear win in most use-cases.

Reducing the working set is a big win. Mongo's behaviour when the working set is larger than RAM is really bad: https://jira.mongodb.org/browse/SERVER-574

Compression could have drawbacks on documents that get updated frequently. But it will be extremely useful on documents that get created and rarely/never change, coincidentally what I mostly have.

It would also greatly help if keys are compressed or indexed in some way since it could be done transparently.
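For a rough sense of what key names cost without any compression: BSON stores each field name as a null-terminated string inside every document, so shortening a name saves the difference in byte length per document. A back-of-the-envelope sketch, assuming every document contains every field and ignoring padding and indexes; the field names and the `key_bytes_saved` helper are hypothetical:

```python
def key_bytes_saved(long_to_short: dict, doc_count: int) -> int:
    """Estimate bytes saved by renaming keys, summed over doc_count documents.

    Assumes each document contains every key in long_to_short exactly once.
    """
    per_doc = sum(
        len(long_name.encode("utf-8")) - len(short.encode("utf-8"))
        for long_name, short in long_to_short.items()
    )
    return per_doc * doc_count


# Hypothetical mapping: 10 + 9 + 13 = 32 bytes of names shrink to 6 bytes.
mapping = {"first_name": "fn", "last_name": "ln", "email_address": "em"}
saved = key_bytes_saved(mapping, 10_000_000)
print(saved)  # 260000000, i.e. 26 bytes per document
```

That is about 260 MB across ten million documents for just three fields, which is why this matters for the working set.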

You may recall the Mongo team being reluctant to make the database function well in single server setups, but they did address that with journalling.

Spring Data MongoDB's ORM provides a nice way of doing this.

I prefer Morphia as an ORM for MongoDB. It is actively developed and maintained, and it feels less "enterprise": http://code.google.com/p/morphia/

re #2: looking at the source code I don't see this behavior. Perhaps in a version from the past? Anyway, please advise or point me to a jira...

In src/mongo/db/db.cpp I see:

    void _initAndListen(int listenPort ) {
        dur::startup(); // <- I believe this preallocates the journal files
        listen(listenPort); // runs after the journal prealloc, I think

p.s. I think this blog post is a very interesting article overall, without commenting on all the specifics...

Can you create a Jira ticket for #2? If that is happening, it should be easy to fix: jira.mongodb.org
