BTW, I like your choice: MongoDB offers a great developer experience, if you can lighten up on needing immediate consistency.
Also, re: backups: why don't you run a slave that you take offline a few times an hour to snapshot? There's little ceremony in doing this, and I would feel better about the safety of your data. It is pretty easy to run a cron job that cleanly shuts down a slave, zips a backup to S3 (or wherever), and restarts the slave.
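That cron-driven snapshot loop could be sketched roughly like this (paths, bucket name, and service commands are all assumptions about the environment, not anything the poster specified):

```python
import datetime
import subprocess  # used to stop mongod, archive the data dir, upload, restart

# Hypothetical layout -- adjust to your environment.
DATA_DIR = "/var/lib/mongodb"
S3_BUCKET = "s3://my-backups"  # assumed bucket name

def backup_name(now: datetime.datetime) -> str:
    """Timestamped archive name so hourly snapshots never collide."""
    return f"mongo-slave-{now:%Y%m%d-%H%M%S}.tar.gz"

def run_backup(now=None):
    """Cleanly stop the slave, archive its data directory, upload, restart.

    The commands are illustrative; a real job would also verify the
    shutdown completed and alert on any failed step.
    """
    now = now or datetime.datetime.utcnow()
    name = backup_name(now)
    steps = [
        ["mongod", "--shutdown", "--dbpath", DATA_DIR],              # clean shutdown
        ["tar", "czf", f"/tmp/{name}", DATA_DIR],                    # snapshot the files
        ["aws", "s3", "cp", f"/tmp/{name}", f"{S3_BUCKET}/{name}"],  # upload
        ["service", "mongodb", "start"],                             # restart the slave
    ]
    for cmd in steps:
        subprocess.run(cmd, check=True)  # raise if any step fails
```

Because the slave is shut down before the files are copied, the archive is a consistent point-in-time snapshot, unlike dumping a live node.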
Thanks for the tip, we might do that eventually :)
Is it possible to know the use case of your setup? What size (storage, load, etc.) are you currently operating at? What is the projected growth in the medium term? What is the main benefit of this migration? Is MongoDB solving a specific problem that could not have been addressed by mysql/pgsql (or some document-oriented/NoSQL store other than Mongo)? Are you using any ORM and/or middleware to interface with MongoDB?
We are not using any middleware and all our queries are hand-sculpted to make sure we are never causing more I/O impact than we need to.
I am sure everything we do could have been done with mysql/pgsql with enough consideration for the performance characteristics of either database.
Ultimately, a document-based storage was simply a better, more natural fit for us.
I hope you will find it useful!
How long did it take to train the team on MongoDB?
Did you have a rollback plan in case MongoDB wasn't good enough?
What were your requirements that made MongoDB better than SQL Server and the other DBs?
Our training was very much hands-on: (1) we migrated a piece of code, (2) tested against a full-sized database, (3) profiled performance under load, (4) learned from the results, then moved on to the next piece of code, back at (1).
Our plan if MongoDB wasn't good enough consisted basically of: (1) crying ourselves to sleep, (2) drinking heavily, and (3) contributing to MongoDB's development to make it good enough.
Given our C# code base, the fact that MongoDB has a couple of somewhat mature C# drivers was definitely a factor.
We moved our whole data tier to MongoDB because we were not particularly interested in paying for more SQL Server licenses as we scaled up our operations.
Instead, we had to resort to running our MongoDB database on a 3-node cluster, and this is our main strategy for resiliency. Additionally, one of the nodes is set to do a daily full-database dump, which is pretty much guaranteed to be in an inconsistent state but still provides us with an extra degree of peace of mind.
So, when your data volume exceeds what will fit in RAM, I'm guessing that your plan is to shard to multiple MongoDB servers. Is your plan to continue to add multiple replicas to each shard to handle DR?
How did it go?
Can you talk through some of the changes you had to make, both from a library perspective, as well as any architectural changes that were required?
To make the most of it, we bit the bullet and rewrote the whole application. You can't join results from two collections in MongoDB, so we had to denormalize quite a lot.
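The denormalization step usually means folding rows that a SQL join would have combined at query time into one embedded document. A minimal sketch of that transformation (all collection and field names here are made up for illustration):

```python
# With no joins, data that is read together is stored together: instead
# of a posts table plus a comments table joined at query time, the
# comments are embedded in the post document itself.

def denormalize(post_row, comment_rows):
    """Fold relational rows into a single MongoDB-style document."""
    return {
        "_id": post_row["id"],
        "title": post_row["title"],
        "comments": [
            {"author": c["author"], "body": c["body"]}
            for c in comment_rows
            if c["post_id"] == post_row["id"]
        ],
    }

post = {"id": 1, "title": "Migrating to MongoDB"}
comments = [
    {"post_id": 1, "author": "alice", "body": "Nice writeup"},
    {"post_id": 2, "author": "bob", "body": "Unrelated"},
]
doc = denormalize(post, comments)
# A single find() on the posts collection now returns the post with its
# comments, at the cost of updating the embedded copy when a comment changes.
```

The trade-off is exactly what the comment describes: reads get cheaper and simpler, while writes that touch shared data have to fan out to every embedded copy.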
Did you have anything you could re-use, such as your high level business logic classes, or (and this is meant non-judgmentally) was your code too deeply wrapped around the existing data store, or the idea of the data store as a RDBMS, to make any of it reusable?
Was your logic primarily in Stored Procedures, or (and again I'm assuming here, this time that yours is a .NET environment since you called it 'SQL Server') did you make heavy use of LINQ to handle those kinds of things?
Can you talk about those performance issues, and how you handled them? (For my third assumption, I'll go with indexes.)
Looking forward to reading the in-depth article. Thanks for fielding these questions.
Fundamentally, while we could have migrated without a full rewrite, SQL Server was only one of the technologies we wanted to replace. One thing led to another, and eventually we decided to re-launch the product instead of iterating on it.
As for the performance issues, most of them stemmed from MongoDB's current locking strategies. I heard 2.0 handles this a bit better, but we haven't rolled it out to production yet. Planning your indexes carefully so they fit in RAM is very, very important to ensure high throughput with MongoDB.
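MongoDB reports total index size through its stats commands (`db.stats()` / `collStats`), so the "do my indexes fit in RAM" check reduces to simple arithmetic. A small helper as a sketch, with made-up numbers and an assumed 20% headroom for the rest of the working set:

```python
# MongoDB exposes index sizes via db.stats() / collStats; whether they
# fit in RAM (with some headroom left for data pages) is a quick
# capacity check. The figures below are invented for illustration.

def indexes_fit_in_ram(total_index_bytes, ram_bytes, headroom=0.2):
    """True if all indexes fit in RAM while keeping `headroom` fraction free."""
    return total_index_bytes <= ram_bytes * (1 - headroom)

# e.g. 6 GB of indexes on a 16 GB box, keeping 20% headroom:
print(indexes_fit_in_ram(6 * 1024**3, 16 * 1024**3))  # True: 6 GB <= 12.8 GB
```

When the check fails, the options are the usual ones: more RAM, fewer/leaner indexes, or sharding so each node only holds part of the index.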
Did you evaluate any other "NoSQL" DBs, such as Riak? If so, what was your main impetus for choosing Mongo? Did you go with mongod or mongos for your environment?
We tried to run Mongo on Windows and that was a bit of a disaster so we are running it on Linux.
Are you guys using Mongos (for running several nodes) or just standalone Mongod?
If Mongos, can you talk a little about your experience setting that up? If not, can you speak to why you chose to run on a single node?
Our analytics package is also based on MongoDB and it currently aggregates over 60 million events daily in real-time and with a negligible performance impact.
It is a lot like Mixpanel, but on the server side.
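The poster doesn't say how the aggregation works, but a common way to absorb tens of millions of events per day in MongoDB is pre-aggregation: each event issues an upsert with `$inc` against a per-day (or per-hour) counter document, so reads never scan raw events. A sketch that simulates that update against an in-memory dict (with pymongo it would be roughly `collection.update_one(key, {"$inc": incs}, upsert=True)`):

```python
# Simulate the pre-aggregated counters pattern: one small document per
# (metric, period), incremented in place as events arrive.

def apply_inc(store, key, incs):
    """Apply upsert-with-$inc semantics to a dict keyed by (metric, day)."""
    doc = store.setdefault(key, {})          # upsert: create the doc if missing
    for field, n in incs.items():
        doc[field] = doc.get(field, 0) + n   # $inc: add to each counter field

counters = {}
for _ in range(3):
    apply_inc(counters, ("pageview", "2011-11-05"), {"count": 1})
apply_inc(counters, ("pageview", "2011-11-05"), {"count": 1, "unique": 1})
# counters[("pageview", "2011-11-05")] == {"count": 4, "unique": 1}
```

Because each event touches one tiny document, the write load stays modest even at tens of millions of events a day, which is consistent with the "negligible performance impact" claim above.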