Ask HN: We just migrated from SQL Server to MongoDB - ask us anything

mark_l_watson · on Oct 22, 2011

I have two customers (I am a consultant) who use MongoDB - one uses paid-for support from 10gen (which seems very good, BTW) and one doesn't. Just curious: are you using paid-for support, or investing instead in learning everything you can about MongoDB?

BTW, I like your choice: MongoDB is a great developer experience, if you can lighten up on needing immediate consistency.

Also, re: backups: why don't you run a slave that you take offline a few times an hour to snapshot? Low ceremony doing this, and I would feel better about the safety of your data. It is pretty easy to run a cron job that cleanly shuts down a slave, ZIPs a backup to S3 (or wherever), and restarts the slave.
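A minimal sketch of the cron job described above, written as a Python helper that only builds the command sequence so it can be reviewed before anything is run. The service name `mongod`, the data path, the bucket name, and the use of `tar` and the `aws` CLI are all assumptions for illustration, not details from this thread.

```python
import datetime
import shlex


def backup_commands(dbpath, bucket, now):
    """Return the shell commands for one snapshot cycle, in order.

    Hypothetical helper: service name, paths and upload tool are assumptions.
    """
    stamp = now.strftime("%Y%m%dT%H%M%S")
    archive = f"/tmp/mongo-{stamp}.tar.gz"
    return [
        # 1. Stop the slave cleanly so its data files are consistent on disk.
        ["service", "mongod", "stop"],
        # 2. Archive the data directory.
        ["tar", "-czf", archive, "-C", dbpath, "."],
        # 3. Copy the archive off the machine (S3 here, or wherever).
        ["aws", "s3", "cp", archive, f"s3://{bucket}/{stamp}.tar.gz"],
        # 4. Restart the slave so it can catch up with the master.
        ["service", "mongod", "start"],
    ]


if __name__ == "__main__":
    plan = backup_commands(
        "/var/lib/mongodb", "example-backups", datetime.datetime.now()
    )
    for cmd in plan:
        # Print instead of executing; a real job might pass each list
        # to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Printing the plan rather than running it keeps the sketch safe to try; the ordering (stop, archive, upload, start) is the part that matters.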

rfurlan · on Oct 22, 2011

We are currently not paying for support, although 10gen is a pleasure to work with even when you are not paying them anything - they are very helpful!

Thanks for the tip, we might do that eventually :)

gmodena · on Oct 21, 2011

Thanks for sharing your experience. Having performed a sort of reverse migration (mongodb -> pgsql), I'd have a few questions.

Is it possible to know the use case of your setup? Which size (storage, load etc.) are you currently operating at? What is the projected growth in the medium term? What is the main benefit of this migration? Is MongoDB solving a specific problem that could not have been addressed by mysql/pgsql (or some document oriented/nosql other than mongo)? Are you using any ORM and/or middleware to interface to MongoDB?

rfurlan · on Oct 24, 2011

I am actually just finishing writing an article with all the details, I will post a link as soon as it is online :)

We are not using any middleware and all our queries are hand-sculpted to make sure we are never causing more I/O impact than we need to.

I am sure everything we do could have been done with mysql/pgsql with enough consideration for the performance characteristics of either database.

Ultimately, a document-based storage was simply a better, more natural fit for us.

rfurlan · on Oct 28, 2011

Thank you everyone for the incredible feedback. I have just published the article here: http://www.wireclub.com/development/TqnkQwQ8CxUYTVT90/read

I hope you will find it useful!

Joakal · on Oct 21, 2011

Did you have extra costs due to more memory requirements? Any other costs? What about savings?

How long did it take to train the team to MongoDB?

Did you have a rollback plan if MongoDB is not good enough?

What were your requirements that made MongoDB better than SQL Server and the other DBs?

rfurlan · on Oct 21, 2011

Yes, we did have to invest in new servers. Not only did we need more RAM, we also needed more storage (which was expensive because we use SSDs). All things considered, SQL Server licences are not cheap either (6K-7K each) so in the long run I believe we will be saving money.

Our training was very much hands on: (1) we migrated a piece of code, (2) tested against a full-sized database, (3) profiled performance under load, (4) learned from the results, then moved on to the next piece of code and started again at (1).

Our plan if MongoDB wasn't good enough consisted basically of: (1) crying ourselves to sleep, (2) drinking heavily, and (3) contributing to MongoDB's development to make it good enough.

rfurlan · on Oct 21, 2011

Regarding requirements, we were looking for something that was free and open source because we are planning to scale up our operations in the near future. Because we have both Windows and Linux servers, we also wanted something that could run on both platforms. Unfortunately it turned out that MongoDB doesn't perform well on Windows, but you can still use Windows servers as replication targets which is great.

Given our C# code base, the fact that MongoDB has a couple of somewhat mature C# drivers was definitely a factor.

bartonfink · on Oct 21, 2011

Why? What prompted you to move off an RDBMS in general and to MongoDB in particular? Did you move your entire data tier to MongoDB or just particular parts of your data model?

rfurlan · on Oct 21, 2011

We always felt that using a relational database to power our service was a bit like trying to fit a square peg in a round hole. Every application is different but in our case, we would gladly trade off consistency for performance and schema flexibility. I am not saying that Microsoft SQL Server is slow, because it isn't, but it does impose a more formal relationship with your data and that wasn't the best fit for us.

We moved our whole data tier to MongoDB because we were not particularly interested in paying for more SQL Server licenses as we scaled up our operations.

rfurlan · on Oct 21, 2011

We just migrated a large production database for an existing website to MongoDB. I am currently writing an article to share the lessons learned; in the meantime I would be happy to answer your questions!

peschkaj · on Oct 21, 2011

With SQL Server (or any RDBMS) there are known backup and recovery strategies and a WAL as added security. What kind of disaster recovery strategies are you employing with MongoDB?

rfurlan · on Oct 21, 2011

That is an excellent question. It is trivial to back up a live SQL Server instance because it supports shadow-volume snapshots. The same isn't true for MongoDB.

Instead, we had to resort to running our MongoDB database on a 3-node cluster and this is our main strategy for resiliency. Additionally, one of the nodes is set to do a daily full-database dump, which is pretty much guaranteed to be in an inconsistent state but still provides us an extra degree of peace of mind.

peschkaj · on Oct 21, 2011

That's what I suspected.

So, when your data volume exceeds what will fit in RAM, I'm guessing that your plan is to shard to multiple MongoDB servers. Is your plan to continue to add multiple replicas to each shard to handle DR?

rfurlan · on Oct 21, 2011

What is really important is keeping your indexes in RAM. Our data already greatly exceeds the amount of RAM we have available. Even our indexes are only partially in memory already, and performance is still terrific.
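As a rough illustration of the point about indexes and RAM, here is a small sketch that compares total index size with available memory. It assumes a stats dictionary shaped like the output of MongoDB's `dbStats` command (which reports `indexSize` in bytes); the sample numbers are made up, and real memory is also shared with data pages, so treat the result as an upper bound.

```python
def index_fit(stats, ram_bytes):
    """Return the fraction of index bytes that could fit in RAM, capped at 1.0.

    `stats` is assumed to look like the result of the dbStats command,
    e.g. what pymongo returns from db.command("dbstats").
    """
    index_size = stats["indexSize"]
    if index_size == 0:
        return 1.0
    return min(1.0, ram_bytes / index_size)


if __name__ == "__main__":
    GIB = 2**30
    # Hypothetical numbers, not taken from the thread.
    sample_stats = {"dataSize": 400 * GIB, "indexSize": 48 * GIB}
    fraction = index_fit(sample_stats, ram_bytes=32 * GIB)
    print(f"At most {fraction:.0%} of the indexes can be resident in RAM")
```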

mathias_10gen · on Oct 24, 2011

2.0 has a new index format that should reduce your index sizes by ~20-30% to fit more in RAM. If you haven't looked at upgrading yet, it is probably worth testing with 2.0.1 to see how it performs in your use case. You will need to reIndex() or restore from a dump to take advantage of the new index format.

rfurlan · on Oct 24, 2011

Yes, we are aware and we can't wait to upgrade. We will probably do it at our next scheduled downtime :)

latch · on Oct 22, 2011

Worth noting that you can define a slave as hidden:true (so it never gets promoted to master), and run it with slavedelay of X hours. It's a great way to keep a rolling backup.
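A sketch of what such a member could look like in a replica set configuration document, built here as a plain Python dict. The field names `priority`, `hidden` and `slaveDelay` (in seconds) follow the 2.0-era replica set configuration; the set name and host names are placeholders, and the helper function is hypothetical.

```python
def delayed_hidden_member(member_id, host, delay_hours):
    """Build a replica set member entry for a hidden, delayed slave."""
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,    # never eligible for promotion to master
        "hidden": True,   # not advertised to client drivers
        "slaveDelay": delay_hours * 3600,  # replication lag, in seconds
    }


# Placeholder set name and hosts, for illustration only.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.com:27017"},
        {"_id": 1, "host": "db2.example.com:27017"},
        delayed_hidden_member(2, "backup.example.com:27017", delay_hours=4),
    ],
}

if __name__ == "__main__":
    print(config["members"][2])
```

A document like this would be handed to the replica set initiate or reconfigure command. The delay gives a window in which an accidental delete on the master has not yet reached the delayed member.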

peschkaj · on Oct 23, 2011

Good to know, I wasn't aware of either of these features. Thanks!

rfurlan · on Oct 24, 2011

Yes, these are very useful features!

mathias_10gen · on Oct 24, 2011

FYI, as of 1.8 MongoDB now has a WAL implementation optionally enabled by starting with --journal. We made it the default in 2.0 for 64-bit platforms.

manuscreationis · on Oct 21, 2011

I'll start the ball rolling with an easy one...

How did it go?

Can you talk through some of the changes you had to make, both from a library perspective, as well as any architectural changes that were required?

rfurlan · on Oct 21, 2011

Overall, it went well, bumpy at first because we had to learn about Mongo's performance characteristics. Having a full-size database to work with made it a bit easier to identify friction points early on.

To make the most of it, we bit the bullet and rewrote the whole application. You can't join results from two collections in MongoDB so we had to denormalize quite a lot.
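To illustrate the kind of denormalization this implies, here is a small sketch using plain Python dicts in place of collections. The `users` and `messages` shapes are hypothetical and not taken from the actual application; the idea is simply to copy the fields a read needs into the document itself, so that no join is required at query time.

```python
# Normalized, relational-style data: a message only references its author.
users = {
    1: {"_id": 1, "name": "alice", "email": "alice@example.com"},
    2: {"_id": 2, "name": "bob", "email": "bob@example.com"},
}
messages = [
    {"_id": 10, "author_id": 1, "text": "hello"},
    {"_id": 11, "author_id": 2, "text": "hi there"},
]


def denormalize(message, users_by_id):
    """Return a copy of the message with the author's display fields embedded."""
    author = users_by_id[message["author_id"]]
    doc = {k: v for k, v in message.items() if k != "author_id"}
    # Embed only what the read path needs; the copy must be refreshed
    # by the application if the user later changes their name.
    doc["author"] = {"_id": author["_id"], "name": author["name"]}
    return doc


if __name__ == "__main__":
    for m in messages:
        print(denormalize(m, users))
```

The trade-off is that writes become more expensive (every embedded copy has to be kept in sync by the application), in exchange for reads that touch a single document.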

manuscreationis · on Oct 21, 2011

Can you elaborate a bit more on the scope of the rewrite? I'm assuming you didn't have much front end work to re-do, mainly middle to back end code?

Did you have anything you could re-use, such as your high level business logic classes, or (and this is meant non-judgmentally) was your code too deeply wrapped around the existing data store, or the idea of the data store as a RDBMS, to make any of it reusable?

Was your logic primarily in Stored Procedures, or (and again I'm assuming here, this time that yours is a .NET environment since you called it 'SQL Server') did you make heavy use of LINQ to handle those kinds of things?

Can you talk about those performance issues, and how you handled them? (For my third assumption, I'll go with Indexes)

Looking forward to reading the in depth article. Thanks for fielding these questions

rfurlan · on Oct 21, 2011

We practically rewrote the whole application. Some code was spared, but I would say that less than 20% of the original source was reused. This was not just because of the migration to MongoDB but more because we decided to take the product in a different direction.

Fundamentally while we could have migrated without a full re-write, SQL Server was only one of the technologies we wanted to replace. One thing led to another and eventually we decided to re-launch the product instead of iterating on it.

As for the performance issues, most of them stem from MongoDB's current locking strategies. I heard 2.0 handles it a bit better but we haven't rolled it out to production yet. Planning your indexes carefully so they fit in RAM is very, very important to ensure high throughput with MongoDB.

manuscreationis · on Oct 21, 2011

I just had one last question:

Did you evaluate any other "NoSQL" DBs, such as Riak? If so, what was your main impetus for choosing Mongo? Did you go with mongod or mongos for your environment?

rfurlan · on Oct 21, 2011

Yes, we evaluated several other alternatives. Ultimately, we felt that MongoDB was at the sweet spot of best fit (to our needs) and maturity.

We tried to run Mongo on Windows and that was a bit of a disaster so we are running it on Linux.

rfurlan · on Oct 21, 2011

Just to clarify: MongoDB's main target platform is Linux, and the Windows version is clearly a second-class citizen at this time. Not only does the Windows version perform poorly under I/O pressure, it also crashes and leaves the database in a corrupted state (again, this only happens under significant I/O pressure, but it does happen).

manuscreationis · on Oct 21, 2011

I've only ever run it in a Linux environment, and I have not heard great things about the Windows version.

Are you guys using Mongos (for running several nodes) or just standalone Mongod?

If Mongos, can you talk a little about your experience in setting that up? If not, can you speak to why you chose to run it in a single node?

rfurlan · on Oct 21, 2011

Sorry, I forgot to answer that, we are running mongod :)

muloka · on Oct 22, 2011

What do you use for reporting, if anything? Do you still use SSRS? If so, how? Would love to hear about how your reporting needs are met.

rfurlan · on Oct 24, 2011

We have never used SSRS, instead we have been slowly building an in-house analytics suite that reports all the information we need in real-time.

Our analytics package is also based on MongoDB and it currently aggregates over 60 million events daily in real-time and with a negligible performance impact.

It is a lot like Mixpanel, but on the server side.