Genuine question: who is using MongoDB successfully in production, and at scale? I'm not aware of anyone myself. I hear of it being used in hackathons and the like because it's so quick to set up, but I'd be curious to know what people are using it for.
I run two sites. One is the perfect use case for MongoDB: http://www.AUsedCar.com, a used car search engine. We've seen nothing but benefits since switching to it from MS SQL Server; queries are much faster, among other things. It's a great use case because 99.9% of DB interactions are read-only searches.
My other site, http://www.BudgetSimple.com, on the other hand uses SQL Server (and is in the process of being ported to MySQL). It would not be a great use case for Mongo, because there are usually as many updates, deletes, and inserts as there are reads, and immediate database integrity and a schema are important.
Anyone who claims a tool is perfect for every problem is probably wrong. You need to figure out the best one for your use case, and load test, security test, performance test, etc., until you have a good guess at the right answer.
No, and I actually used to work for a company that made a similar type of search engine!
That probably would have worked as well (I don't think I considered that specific solution). Mongo came out on top because of its wide use (among other things), i.e. it's pretty easy to find support and lots of stories about how to scale it under different scenarios.
The biggest difference was pulling a random record. This is an odd use case for most people, and not even a common one for me, but say I wanted to show you a random car near your location. The more common queries that were sped up were geolocation searches, i.e. searching within X miles.
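The two query shapes described above can be sketched as pymongo-style filter and pipeline documents. This is a minimal sketch, not the commenter's code: the `location` field name (a point stored as [longitude, latitude]) is a hypothetical assumption, and the `$sample` stage only exists in MongoDB 3.2 and later, so it postdates this discussion. `$centerSphere` takes its radius in radians, which is why the mileage is divided by the Earth's radius in miles (about 3963.2).

```python
EARTH_RADIUS_MILES = 3963.2  # approximate radius used to convert miles to radians


def within_miles_filter(lng, lat, miles, field="location"):
    """Filter matching documents whose `field` lies within `miles` of (lng, lat).

    `field` is a hypothetical name for a GeoJSON point or legacy coordinate pair.
    """
    radius_radians = miles / EARTH_RADIUS_MILES
    return {field: {"$geoWithin": {"$centerSphere": [[lng, lat], radius_radians]}}}


def random_nearby_pipeline(lng, lat, miles, field="location"):
    """Aggregation pipeline returning one random document near (lng, lat).

    Requires MongoDB 3.2+ for the $sample stage.
    """
    return [
        {"$match": within_miles_filter(lng, lat, miles, field)},
        {"$sample": {"size": 1}},
    ]


# Usage with pymongo (assumes a running server and a hypothetical `cars` collection):
#   cars.find(within_miles_filter(-122.42, 37.77, 25))
#   next(cars.aggregate(random_nearby_pipeline(-122.42, 37.77, 25)), None)
```

Doing the radius filter first keeps `$sample` choosing only among nearby documents rather than sampling the whole collection.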
I'm using it for the analytics suite at my company (large ecommerce multinational).
It's weird at first, coming from a background of Access, then SQLite, then MySQL/phpMyAdmin, but you get used to it. I essentially treat it like a gigantic Python dictionary object.
The sharding is too much of a ball-ache to set up, so I've created an optimal way of distributing/mapping files across our cluster to make use of all the machines.
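The commenter doesn't describe their distribution scheme, so the following is only an illustration of one common approach, not their method: a greedy "largest file first onto the least-loaded machine" assignment, which tends to keep every machine busy. The function name, machine names, and file sizes are all hypothetical.

```python
import heapq


def assign_files(file_sizes, machines):
    """Greedily assign files to machines, balancing total bytes per machine.

    file_sizes: dict mapping file name -> size in bytes (hypothetical input)
    machines: list of machine names
    Returns a dict mapping machine name -> list of assigned file names.
    """
    if not machines:
        raise ValueError("need at least one machine")
    # Min-heap of (bytes assigned so far, tie-break index, machine name).
    heap = [(0, i, m) for i, m in enumerate(machines)]
    heapq.heapify(heap)
    assignment = {m: [] for m in machines}
    # Place the largest files first so the small ones can fill the gaps.
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i, machine = heapq.heappop(heap)
        assignment[machine].append(name)
        heapq.heappush(heap, (load + size, i, machine))
    return assignment
```

For example, `assign_files({"a.log": 700, "b.log": 500, "c.log": 400, "d.log": 300}, ["node1", "node2"])` puts `a.log` and `d.log` on `node1` (1000 bytes) and `b.log` and `c.log` on `node2` (900 bytes).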
Data integration is nice. As long as you resist the temptation to output each integrated line to the terminal, pymongo and its C extensions can integrate a ~500-byte record in ~0.0001 seconds.
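At ~0.0001 seconds per record, that works out to roughly 10,000 records per second (about 5 MB/s of 500-byte records), so writing a terminal line per record can easily become the bottleneck. A minimal sketch of the alternative, assuming a pymongo `Collection` (whose `insert_many` method exists in pymongo 3.0+) and a hypothetical helper name and batch size: insert in batches and report progress only once per batch.

```python
import sys


def insert_in_batches(collection, records, batch_size=1000, log=sys.stderr):
    """Insert records in batches, logging one progress line per batch.

    `collection` is anything with an insert_many(list_of_dicts) method,
    e.g. a pymongo Collection. Returns the number of records inserted.
    """
    batch, total = [], 0
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            collection.insert_many(batch)
            total += len(batch)
            print(f"inserted {total} records", file=log)
            batch = []
    if batch:  # flush the final partial batch
        collection.insert_many(batch)
        total += len(batch)
        print(f"inserted {total} records", file=log)
    return total
```

Batching also cuts the number of round trips to the server, which is usually a larger win than the logging change alone.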
Basically, the main advantage is not having a schema whatsoever: you can just add random attributes to documents whenever the hell you want. But later you have to be careful with exception handling, since documents might not have the attributes you expect.
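Since pymongo hands documents back as plain Python dicts, much of that exception handling can be replaced by reading optional attributes with defaults. A minimal sketch; the helper name and the field names (`page`, `tags`, `referrer.domain`) are hypothetical.

```python
def read_field(doc, path, default=None):
    """Return the value at a dotted path (e.g. "referrer.domain") or `default`.

    Missing keys and non-dict intermediate values both yield `default`,
    since schemaless documents may lack any attribute.
    """
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value


# Written before the extra attributes existed:
old_doc = {"_id": 1, "page": "/home"}
# Written later, with attributes added on the fly:
new_doc = {"_id": 2, "page": "/cart", "tags": ["promo"],
           "referrer": {"domain": "example.com"}}
```

Both `read_field(old_doc, "tags", [])` and `read_field(new_doc, "tags", [])` work without a `KeyError`, which keeps the "might not have the attribute" check in one place.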
So right off the bat, you've lost the querying power that SQL offers. When dealing with data that's intended to be analyzed, that sounds like a pretty big loss.
Clearly its built-in sharding support, which is often touted as one of its biggest benefits, wasn't suitable for you. So you had to invest some time and effort coming up with an alternate system. That sounds like a loss to me.
When it comes to the schema issue, it sounds like you haven't actually reduced the effort or work in any way, but merely pushed it somewhere else. Like you admit, you still do have to deal with the schema, it's just handled within the application logic, rather than the database. That sounds worse to me, especially if there is more than one application using the database.
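To make the point concrete: with no schema in the database, the checks a relational schema would enforce end up as application code along these lines. This is a hypothetical sketch (the `REQUIRED` field map and `validate` helper are invented for illustration), and every application writing to the same collection would need its own copy of it.

```python
REQUIRED = {"user_id": int, "event": str, "ts": float}  # hypothetical schema


def validate(doc):
    """Return a list of problems; an empty list means `doc` matches REQUIRED."""
    problems = []
    for field, expected in REQUIRED.items():
        if field not in doc:
            problems.append(f"missing {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems
```

A relational database would reject such rows at insert time; here nothing stops an application that skips `validate` from writing them.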
right off the bat, you've lost the querying power that SQL offers.
Can you point me to any reasonably-priced database systems that allow me to execute SQL queries against a cluster of shared-nothing machines running commodity hardware? I'm not interested in paying $20-100K/TB/year for Vertica and friends.
SQL can be great if you have vast amounts of money or if your data can fit onto a single machine. When neither of those is true, though, NoSQL DBs become important.
Well, are we talking about built-in sharding, or rolling your own? If you can roll your own, then I suggest looking at Galera (we use it for our production MySQL stuff): http://www.codership.com
If it has to have built-in sharding then I can't think of anything off-hand.
MySQL Cluster is another option besides Galera (and Cluster actually has a NoSQL layer on top of it, FYI), but I have to say, if we want to talk about scaling issues for MongoDB, you might not be happy with some of the limitations of NDB.
I loved this comment: 'I noticed that "mongodb.org" is not on the Production Deployments List above. It appears this site uses Confluence, which I believe uses PostgreSQL (or MySQL). It's hard to have confidence in an organization that does not even use its own technology.'
How up-to-date is that list? I mean, in terms of removing entries that no longer apply. Some date from 2009 to 2011. Are these systems still in place, and actively being used today?
I've heard of or directly witnessed enough situations where somebody with influence, but maybe not much actual technical experience, pushes for the use of a NoSQL database of some sort. Yes, the project is implemented and often does end up in production for at least some amount of time. But it doesn't survive long. Problems arise, and the system is either discarded or moved to a more traditional relational database system. That's why I'm curious about that list, and how many of the entries are still valid, as we approach the beginning of 2013.
CERN. We got too excited and started using it for EVERYTHING (it started out as just part of the LHC data analysis project). It didn't work in some cases, but for some projects it fit perfectly.
We have a very large MongoDB installation running in production and at scale, and it works pretty well.
That said, 99% of our production issues involve bugs in MongoDB and its inability to effectively use all available resources before it becomes unresponsive. I would say it needs a few more generations to become truly solid.
We were very well aware of its characteristics when choosing our DB, and didn't go in expecting any magic Web Scale or somehow getting a HA setup with plenty of durability with just one server.
For a multitenant CMS where you want to store documents with custom schemas, need more than just a key/value store and want some capability to do ad-hoc queries against custom fields, MongoDB is a pretty good fit.
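A sketch of what an ad-hoc query against custom fields can look like, assuming (hypothetically) that each document stores tenant-defined fields under a `custom` subdocument and carries a `tenant_id`. A filter on a custom field is just MongoDB dot notation, scoped to the tenant so an index on `tenant_id` can narrow the scan. The helper name and field layout are illustrative only.

```python
def custom_field_filter(tenant_id, **custom):
    """Build a find() filter for one tenant's documents on custom fields.

    Field layout is hypothetical: {"tenant_id": ..., "custom": {...}}.
    Values may be plain values or operator documents such as {"$gte": 2}.
    """
    query = {"tenant_id": tenant_id}
    for name, value in custom.items():
        query[f"custom.{name}"] = value  # dot notation into the subdocument
    return query


article = {
    "tenant_id": "acme",
    "title": "Spring launch",
    "custom": {"region": "EMEA", "priority": 2},  # schema chosen by the tenant
}

# With pymongo (hypothetical `pages` collection):
#   pages.find(custom_field_filter("acme", region="EMEA", priority={"$gte": 2}))
```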
We use MongoDB for the analytics as well, yes. It's a less obvious choice there than for the CMS part, but it's a good enough fit: in-place updates can be really handy, and we prefer not having two different databases.
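For analytics, in-place updates usually mean atomic operators such as `$inc` combined with an upsert, so a counter document is created on the first hit and incremented afterwards without reading it first. A minimal sketch; the per-page, per-day document layout, the field names, and the helper name are hypothetical, and the returned pair is meant for pymongo's `update_one(filter, update, upsert=True)`.

```python
from datetime import datetime, timezone


def pageview_update(page, when):
    """Return (filter, update) for counting one pageview in a daily document.

    Hypothetical layout: one document per page per day, with an hourly
    breakdown stored as a subdocument keyed by hour.
    """
    day = when.strftime("%Y-%m-%d")
    filter_ = {"page": page, "day": day}
    update = {"$inc": {"views": 1, f"hourly.{when.hour}": 1}}
    return filter_, update


# With pymongo (hypothetical `stats` collection):
#   f, u = pageview_update("/cart", datetime.now(timezone.utc))
#   stats.update_one(f, u, upsert=True)
```

Because `$inc` is applied server-side, concurrent hits on the same page don't overwrite each other's counts.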
We don't index custom fields, and for queries where that would be required we do the actual querying with ElasticSearch, but for simple filters on a custom field or the like, Mongo does fine.