Ask HN: We just migrated from SQL Server to MongoDB - ask us anything
26 points by rfurlan on Oct 21, 2011 | 32 comments

I have two customers (I am a consultant) who use MongoDB - one uses paid-for support from 10gen (which seems very good, BTW) and one doesn't. Just curious: are you using paid-for support, or investing instead in learning everything you can about MongoDB?

BTW, I like your choice: MongoDB offers a great developer experience if you can lighten up on needing immediate consistency.

Also, re: backups: why don't you run a slave that you take offline a few times an hour to snapshot? There is little ceremony in doing this, and I would feel better about the safety of your data. It is pretty easy to run a cron job that cleanly shuts down a slave, zips a backup to S3 (or wherever), and restarts the slave.
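The cron job described above can be sketched roughly like this (the paths, bucket name, and use of s3cmd are assumptions; this would live in a script invoked from crontab):

```shell
# Sketch: snapshot a dedicated slave by cleanly stopping it,
# archiving its data files, uploading, and restarting.
mongod --shutdown --dbpath /data/slave          # clean shutdown
tar czf /tmp/mongo-backup.tar.gz /data/slave    # archive the data files
s3cmd put /tmp/mongo-backup.tar.gz s3://my-bucket/backups/backup-$(date +%F).tar.gz
mongod --fork --dbpath /data/slave --logpath /var/log/mongod-slave.log  # restart
```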

We are currently not paying for support, although 10Gen is a pleasure to work with even when you are not paying them anything - they are very helpful!

Thanks for the tip, we might do that eventually :)

Thanks for sharing your experience. Having performed a sort of reverse migration (mongodb -> pgsql), I'd have a few questions.

Is it possible to know the use case of your setup? What size (storage, load, etc.) are you currently operating at? What is the projected growth in the medium term? What is the main benefit of this migration? Is MongoDB solving a specific problem that could not have been addressed by mysql/pgsql (or some document-oriented/NoSQL store other than Mongo)? Are you using any ORM and/or middleware to interface to MongoDB?

I am actually just finishing writing an article with all the details, I will post a link as soon as it is online :)

We are not using any middleware and all our queries are hand-sculpted to make sure we are never causing more I/O impact than we need to.

I am sure everything we do could have been done with mysql/pgsql with enough consideration for the performance characteristics of either database.

Ultimately, a document-based storage was simply a better, more natural fit for us.

Thank you everyone for the incredible feedback. I have just published the article here: http://www.wireclub.com/development/TqnkQwQ8CxUYTVT90/read

I hope you will find it useful!

Did you have extra costs due to more memory requirements? Any other costs? What about savings?

How long did it take to train the team on MongoDB?

Did you have a rollback plan in case MongoDB was not good enough?

What were your requirements that made MongoDB better than SQL Server and the other DBs?

Yes, we did have to invest in new servers. Not only did we need more RAM, we also needed more storage (which was expensive because we use SSDs). All things considered, SQL Server licenses are not cheap either (6K-7K each), so in the long run I believe we will be saving money.

Our training was very much hands-on: (1) we migrated a piece of code, (2) tested against a full-sized database, (3) profiled performance under load, (4) learned from the results and moved on to the next piece of code, returning to (1).

Our plan if MongoDB wasn't good enough consisted basically of: (1) crying ourselves to sleep, (2) drinking heavily, and (3) contributing to MongoDB's development to make it good enough.

Regarding requirements, we were looking for something that was free and open source because we are planning to scale up our operations in the near future. Because we have both Windows and Linux servers, we also wanted something that could run on both platforms. Unfortunately it turned out that MongoDB doesn't perform well on Windows, but you can still use Windows servers as replication targets which is great.

Given our C# code base, the fact that MongoDB has a couple of somewhat mature C# drivers was definitely a factor.

Why? What prompted you to move off an RDBMS in general and to MongoDB in particular? Did you move your entire data tier to MongoDB or just particular parts of your data model?

We always felt that using a relational database to power our service was a bit like trying to fit a square peg in a round hole. Every application is different but in our case, we would gladly trade off consistency for performance and schema flexibility. I am not saying that Microsoft SQL Server is slow, because it isn't, but it does impose a more formal relationship with your data and that wasn't the best fit for us.

We moved our whole data tier to MongoDB because we were not particularly interested in paying for more SQL Server licenses as we scaled up our operations.

We just migrated a large production database for an existing website to MongoDB. I am currently writing an article to share the lessons learned; in the meantime I would be happy to answer your questions!

With SQL Server (or any RDBMS) there are known backup and recovery strategies and a WAL as added security. What kind of disaster recovery strategies are you employing with MongoDB?

That is an excellent question. It is trivial to back up a live SQL Server instance because it supports Volume Shadow Copy (VSS) snapshots. The same isn't true for MongoDB.

Instead, we had to resort to running our MongoDB database on a 3-node cluster, and this is our main strategy for resiliency. Additionally, one of the nodes is set to do a daily full-database dump, which is pretty much guaranteed to be in an inconsistent state but still provides an extra degree of peace of mind.
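A daily dump like the one described is typically just a cron-driven mongodump; a sketch, with the host and paths being illustrative:

```shell
# Illustrative crontab entry: nightly mongodump from the designated node.
0 4 * * * mongodump --host backup-node:27017 --out /backups/mongo-$(date +\%F)
```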

That's what I suspected.

So, when your data volume exceeds what will fit in RAM, I'm guessing that your plan is to shard to multiple MongoDB servers. Is your plan to continue to add multiple replicas to each shard to handle DR?

What is really important is keeping your indexes in RAM; our data already greatly exceeds the amount of RAM we have available. Even our indexes are only partially in memory, and performance is still terrific.

2.0 has a new index format that should reduce your index sizes by ~20-30% so more fits in RAM. If you haven't looked at upgrading yet, it is probably worth testing with 2.0.1 to see how it performs in your use case. You will need to run reIndex() or restore from a dump to take advantage of the new index format.

Yes, we are aware and we can't wait to upgrade, we will probably do it at our next scheduled downtime :)

Worth noting that you can define a slave as hidden: true (so it never gets promoted to master) and run it with a slaveDelay of X hours. It's a great way to keep a rolling backup.
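In replica-set terms, a hidden, delayed member can be configured from the mongo shell roughly like this (the member index and delay are illustrative; hidden members must also have priority 0):

```
// Sketch, mongo shell: make member 2 a hidden secondary one hour behind.
cfg = rs.conf()
cfg.members[2].priority = 0       // never eligible to become primary
cfg.members[2].hidden = true      // invisible to client reads
cfg.members[2].slaveDelay = 3600  // apply writes one hour late (seconds)
rs.reconfig(cfg)
```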

Good to know, I wasn't aware of either of these features. Thanks!

Yes, these are very useful features!

FYI, as of 1.8 MongoDB now has a WAL implementation optionally enabled by starting with --journal. We made it the default in 2.0 for 64-bit platforms.
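Enabling it is a single startup flag (the dbpath is illustrative):

```shell
# Opt into write-ahead journaling on 1.8+ (default on 64-bit builds in 2.0).
mongod --dbpath /data/db --journal
```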

I'll start the ball rolling with an easy one...

How did it go?

Can you talk through some of the changes you had to make, both from a library perspective, as well as any architectural changes that were required?

Overall, it went well. It was bumpy at first because we had to learn about Mongo's performance characteristics. Having a full-size database to work with made it a bit easier to identify friction points early on.

To make the most of it, we bit the bullet and rewrote the whole application. You can't join results from two collections in MongoDB, so we had to denormalize quite a lot.
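As an illustration of what that denormalization means in practice, here is a minimal sketch (the schema is hypothetical, not the site's actual one): instead of joining a messages table to a users table at query time, the user fields needed for display are copied into each message document so a single read suffices.

```python
# Hypothetical schema sketch: relational shape vs. denormalized document.

# Relational shape: two "tables", joined at query time.
users = {1: {"name": "alice", "avatar": "a.png"}}
messages_sql = [{"user_id": 1, "text": "hi"}]

def render_sql(msg):
    # Needs a second lookup -- the join MongoDB can't do across collections.
    user = users[msg["user_id"]]
    return {"name": user["name"], "text": msg["text"]}

# Denormalized document: the user fields needed for display are embedded,
# so one read returns everything (at the cost of updating copies on change).
messages_mongo = [{"user": {"id": 1, "name": "alice"}, "text": "hi"}]

def render_mongo(msg):
    return {"name": msg["user"]["name"], "text": msg["text"]}
```

The trade-off is classic: reads become single fetches, while a user renaming themselves now means updating every embedded copy.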

Can you elaborate a bit more on the scope of the rewrite? I'm assuming you didn't have much front end work to re-do, mainly middle to back end code?

Did you have anything you could re-use, such as your high level business logic classes, or (and this is meant non-judgmentally) was your code too deeply wrapped around the existing data store, or the idea of the data store as a RDBMS, to make any of it reusable?

Was your logic primarily in stored procedures, or (and again I'm assuming here, this time that yours is a .NET environment since you called it 'SQL Server') did you make heavy use of LINQ to handle those kinds of things?

Can you talk about those performance issues, and how you handled them? (For my third assumption, I'll go with indexes.)

Looking forward to reading the in depth article. Thanks for fielding these questions

We practically rewrote the whole application. Some code was spared, but I would say that less than 20% of the original source was reused. This was not just because of the migration to MongoDB but more because we decided to take the product in a different direction.

Fundamentally, while we could have migrated without a full rewrite, SQL Server was only one of the technologies we wanted to replace. One thing led to another, and eventually we decided to re-launch the product instead of iterating on it.

As for the performance issues, most of them stemmed from MongoDB's current locking strategies. I hear 2.0 handles this a bit better, but we haven't rolled it out to production yet. Planning your indexes carefully so they fit in RAM is very, very important to ensure high throughput with MongoDB.

I just had one last question:

Did you evaluate any other "NoSQL" DBs, such as Riak? If so, what was your main impetus for choosing Mongo? Did you go with mongod or mongos for your environment?

Yes, we evaluated several other alternatives. Ultimately, we felt that MongoDB was at the sweet spot of fit (for our needs) and maturity.

We tried to run Mongo on Windows and that was a bit of a disaster so we are running it on Linux.

Just to clarify: MongoDB's main target platform is Linux; the Windows version is clearly a second-class citizen at this time. Not only does the Windows version perform poorly under I/O pressure, it also crashes and leaves the database in a corrupted state (again, this only happens under significant I/O pressure, but it does happen).

I've only ever run it in a Linux environment; I have not heard great things about the Windows version.

Are you guys using mongos (for sharding across several nodes) or just standalone mongod?

If mongos, can you talk a little about your experience in setting it up? If not, can you speak to why you chose to run it on a single node?

Sorry, I forgot to answer that, we are running mongod :)

What do you use for reporting, if anything? Do you still use SSRS? If so, how? Would love to hear about how your reporting needs are met.

We have never used SSRS; instead we have been slowly building an in-house analytics suite that reports all the information we need in real time.

Our analytics package is also based on MongoDB, and it currently aggregates over 60 million events daily in real time with negligible performance impact.

It is a lot like Mixpanel, but on the server side.
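Real-time aggregation like this is commonly built on MongoDB by upserting pre-aggregated counter documents with $inc, rather than scanning raw events at report time; here is a pure-Python sketch of the idea (the event names and date are made up):

```python
from collections import defaultdict
from datetime import datetime, timezone

# Pre-aggregation sketch: each incoming event bumps a per-day, per-event
# counter (the MongoDB analogue is an upsert with {"$inc": {"count": 1}}
# on a document keyed by event name and day).
counters = defaultdict(int)

def track(event_name, when):
    day = when.strftime("%Y-%m-%d")
    counters[(event_name, day)] += 1

now = datetime(2011, 10, 21, tzinfo=timezone.utc)
for _ in range(3):
    track("login", now)
track("message_sent", now)
```

Reports then read the small counter documents directly instead of re-scanning 60 million raw events.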
