I have two customers (I am a consultant) who use MongoDB - one uses paid-for support from 10gen (which seems very good, BTW) and one doesn't. Just curious: are you using paid-for support, or investing instead in learning everything you can about MongoDB?
BTW, I like your choice: MongoDB is a great developer experience, if you can lighten up on needing immediate consistency.
Also, re: backups: why don't you run a slave that you take offline a few times an hour to snapshot? It is low ceremony, and I would feel better about the safety of your data. It is pretty easy to run a cron job that cleanly shuts down a slave, ZIPs a backup to S3 (or wherever), and restarts the slave.
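A minimal sketch of what such a cron job might call, assuming Python is available on the slave. The function name `snapshot_slave`, the stop/start commands, the paths, and the `upload` callback are all hypothetical placeholders (nothing here is from the original setup), and the S3 upload is deliberately left to whatever tool you already use.

```python
import subprocess
import tarfile
import time
from pathlib import Path


def snapshot_slave(data_dir, out_dir, stop_cmd, start_cmd, upload,
                   run=subprocess.check_call):
    """Stop the slave, archive its data directory, restart it, then upload.

    stop_cmd / start_cmd are whatever cleanly stops and starts mongod on
    your system (assumption: supplied by the caller as argument lists).
    upload is any callable that ships the archive off the box.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / time.strftime("mongo-%Y%m%d-%H%M%S.tar.gz")

    run(stop_cmd)  # clean shutdown so the files on disk are consistent
    try:
        with tarfile.open(str(archive), "w:gz") as tar:
            tar.add(str(data_dir), arcname="data")
    finally:
        run(start_cmd)  # always bring the slave back, even if archiving failed

    upload(archive)  # slave is already back up while the upload runs
    return archive
```

Restarting before uploading keeps the slave's downtime to the length of the archive step, so it has less replication lag to catch up on.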
Thanks for sharing your experience.
Having performed a sort of reverse migration (mongodb -> pgsql), I'd have a few questions.
Is it possible to know the use case of your setup? Which size (storage, load etc.) are you currently operating at? What is the projected growth in the medium term? What is the main benefit of this migration? Is MongoDB solving a specific problem that could not have been addressed by mysql/pgsql (or some document-oriented/NoSQL store other than Mongo)? Are you using any ORM and/or middleware to interface to MongoDB?
Yes, we did have to invest in new servers. Not only did we need more RAM, we also needed more storage (which was expensive because we use SSDs). All things considered, SQL Server licences are not cheap either (6K-7K each), so in the long run I believe we will be saving money.
Our training was very much hands-on: (1) we migrated a piece of code, (2) tested against a full-sized database, (3) profiled performance under load, (4) learned from the results, then moved on to the next piece of code and started again at (1).
Our plan if MongoDB wasn't good enough consisted basically of: (1) crying ourselves to sleep, (2) drinking heavily, and (3) contributing to MongoDB's development to make it good enough.
Regarding requirements, we were looking for something that was free and open source because we are planning to scale up our operations in the near future. Because we have both Windows and Linux servers, we also wanted something that could run on both platforms. Unfortunately, it turned out that MongoDB doesn't perform well on Windows, but you can still use Windows servers as replication targets, which is great.
Given our C# code base, the fact that MongoDB has a couple of somewhat mature C# drivers was definitely a factor.
Why? What prompted you to move off an RDBMS in general and to MongoDB in particular? Did you move your entire data tier to MongoDB or just particular parts of your data model?
We always felt that using a relational database to power our service was a bit like trying to fit a square peg in a round hole. Every application is different but in our case, we would gladly trade off consistency for performance and schema flexibility. I am not saying that Microsoft SQL Server is slow, because it isn't, but it does impose a more formal relationship with your data and that wasn't the best fit for us.
We moved our whole data tier to MongoDB because we were not particularly interested in paying for more SQL Server licenses as we scaled up our operations.
We just migrated a large production database for an existing website to MongoDB. I am currently writing an article to share the lessons learned; in the meantime, I would be happy to answer your questions!
With SQL Server (or any RDBMS) there are known backup and recovery strategies and a WAL as added security. What kind of disaster recovery strategies are you employing with MongoDB?
That is an excellent question. It is trivial to back up a live SQL Server instance because it supports shadow-volume snapshots. The same isn't true for MongoDB.
Instead, we had to resort to running our MongoDB database on a 3-node cluster and this is our main strategy for resiliency. Additionally, one of the nodes is set to do a daily full-database dump, which is pretty much guaranteed to be in an inconsistent state but still provides us an extra degree of peace of mind.
So, when your data volume exceeds what will fit in RAM, I'm guessing that your plan is to shard to multiple MongoDB servers. Is your plan to continue to add multiple replicas to each shard to handle DR?
What is really important is keeping your indexes in RAM. Our data already greatly exceeds the amount of RAM we have available. Even our indexes are only partially in memory already, and performance is still terrific.
2.0 has a new index format that should reduce your index sizes by ~20-30%, so more of them fit in RAM. If you haven't looked at upgrading yet, it is probably worth testing with 2.0.1 to see how it performs in your use case. You will need to reIndex() or restore from a dump to take advantage of the new index format.
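To put the ~20-30% figure in perspective, here is a back-of-the-envelope sketch. The 40 GB index size and 30 GB of RAM are made-up numbers for illustration, not figures from this thread, and the helper names are hypothetical.

```python
def projected_index_size_gb(current_gb, reduction):
    """Index size after a rebuild, for a fractional reduction such as 0.2."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return current_gb * (1.0 - reduction)


def fits_in_ram(index_gb, ram_gb):
    """True when the whole index could be resident in memory."""
    return index_gb <= ram_gb


# Hypothetical numbers: 40 GB of indexes on a server with 30 GB of RAM.
for reduction in (0.20, 0.30):
    size = projected_index_size_gb(40.0, reduction)
    print(f"{reduction:.0%} smaller: {size:.1f} GB, "
          f"fits in 30 GB: {fits_in_ram(size, 30.0)}")
```

With those made-up numbers, the low end of the range still leaves the indexes larger than RAM and the high end brings them under it, which is why measuring on your own data is worth the effort.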
Worth noting that you can define a slave as hidden:true (so it never gets promoted to master) and run it with a slavedelay of X hours. It's a great way to keep a rolling backup.
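For anyone who wants to try this, a rough sketch of the member entry such a node would get in the replica set configuration. It assumes the `hidden`, `priority`, and `slaveDelay` (in seconds) member fields as I understand them from the replica set configuration docs of this era; the helper name and host are hypothetical, so check the field names against the version you run.

```python
def delayed_backup_member(member_id, host, delay_hours):
    """Build a replica set member document for a hidden, delayed node."""
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,  # never eligible to become primary
        "hidden": True,  # not advertised to clients
        "slaveDelay": int(delay_hours * 3600),  # assumption: value is in seconds
    }


# Hypothetical host name, two-hour delay.
member = delayed_backup_member(3, "backup.example.com:27017", 2)
print(member)
```

The document would then be appended to the `members` array of the existing configuration before reconfiguring the set.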
Overall it went well, though it was bumpy at first because we had to learn about Mongo's performance characteristics. Having a full-size database to work with made it a bit easier to identify friction points early on.
To make the most of it, we bit the bullet and rewrote the whole application. You can't join results from two collections in MongoDB, so we had to denormalize quite a lot.
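As an illustration of what denormalizing instead of joining can look like, here is a small sketch using plain Python dicts. The `orders` and `customers` shapes are invented for the example and are not the actual schema discussed here.

```python
def denormalize_orders(orders, customers):
    """Copy the customer fields each order needs into the order document,
    so a single read returns everything without a join."""
    by_id = {c["_id"]: c for c in customers}
    docs = []
    for order in orders:
        customer = by_id[order["customer_id"]]
        doc = dict(order)  # shallow copy; the input rows are left untouched
        doc["customer"] = {"_id": customer["_id"], "name": customer["name"]}
        docs.append(doc)
    return docs


# Hypothetical rows as they might come out of two relational tables.
customers = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Grace"}]
orders = [{"_id": 10, "customer_id": 1, "total": 25},
          {"_id": 11, "customer_id": 2, "total": 40}]
docs = denormalize_orders(orders, customers)
```

The price of this layout is that a copied field such as the customer name has to be updated in every document that embeds it.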
Can you elaborate a bit more on the scope of the rewrite? I'm assuming you didn't have much front end work to re-do, mainly middle to back end code?
Did you have anything you could re-use, such as your high level business logic classes, or (and this is meant non-judgmentally) was your code too deeply wrapped around the existing data store, or the idea of the data store as a RDBMS, to make any of it reusable?
Was your logic primarily in Stored Procedures, or (and again I'm assuming here, this time that yours is a .NET environment since you called it 'SQL Server') did you make heavy use of LINQ to handle those kinds of things?
Can you talk about those performance issues, and how you handled them? (For my third assumption, I'll go with Indexes.)
Looking forward to reading the in-depth article. Thanks for fielding these questions.
We practically rewrote the whole application. Some code was spared, but I would say that less than 20% of the original source was reused. This was not just because of the migration to MongoDB but more because we decided to take the product in a different direction.
Fundamentally, while we could have migrated without a full rewrite, SQL Server was only one of the technologies we wanted to replace. One thing led to another and eventually we decided to relaunch the product instead of iterating on it.
As for the performance issues, most of them stem from MongoDB's current locking strategies. I heard 2.0 handles it a bit better, but we haven't rolled it out to production yet. Planning your indexes carefully so they fit in RAM is very, very important to ensure high throughput with MongoDB.
Did you evaluate any other "NoSQL" DBs, such as Riak? If so, what was your main impetus for choosing Mongo? Did you go with mongod or mongos for your environment?
Just to clarify: MongoDB's main target platform is Linux, and the Windows version is clearly a second-class citizen at this time. Not only does the Windows version perform poorly under I/O pressure, it also crashes and leaves the database in a corrupted state (again, this only happens under significant I/O pressure, but it does happen).
We have never used SSRS; instead, we have been slowly building an in-house analytics suite that reports all the information we need in real time.
Our analytics package is also based on MongoDB and it currently aggregates over 60 million events daily in real-time and with a negligible performance impact.
It is a lot like Mixpanel, but on the server side.