My other site, http://www.BudgetSimple.com, on the other hand, is using SQL Server (in the process of porting to MySQL). It would not be a great use case for Mongo, because there are usually as many updates, deletes, and inserts as there are reads, and instant database integrity and a schema are important.
Anyone who claims a tool is perfect for every problem is probably wrong. You need to figure out the best one for your use case, and load test, security test, performance test, etc., until you have a good guess for the right answer.
Did you ever consider a datasource like Elasticsearch? If yes, what made you choose Mongo?
That probably would have worked as well (don't think I considered that specific solution). Mongo came out on top because of its wide use (among other things), i.e., it's pretty easy to find support and lots of stories about how to scale it under different scenarios.
It's weird at first, coming from a background of Access, then SQLite, then MySQL/phpMyAdmin, but you get used to it. I essentially treat it like a gigantic Python dictionary object.
The sharding is too much of a ball-ache to set up so I've created an optimal way of distributing/mapping files across our cluster to make use of all machines.
Data integration is nice. As long as you resist the temptation to print each integrated line to the terminal, pymongo and its C extensions can integrate a ~500 byte record in ~0.0001 seconds.
Basically the main advantage is not having a schema whatsoever - you can just add random attributes to documents whenever the hell you want. But later you have to be careful with exception handling since documents might not have the attributes you expect.
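To illustrate the "be careful later" part, here is a minimal sketch of reading documents defensively. It uses plain Python dicts as stand-ins for what a driver such as pymongo hands back, so it runs without a database; the field names (`name`, `email`, `tags`) and the `describe` helper are made up for the example.

```python
# Hypothetical documents: plain dicts standing in for records read from a
# schemaless collection. Not every document carries every attribute.
docs = [
    {"_id": 1, "name": "alice", "email": "alice@example.com"},
    {"_id": 2, "name": "bob"},  # no "email", no "tags"
    {"_id": 3, "name": "carol", "email": "carol@example.com", "tags": ["admin"]},
]


def describe(doc):
    """Build a summary line without assuming optional attributes exist."""
    # dict.get returns the fallback instead of raising KeyError.
    email = doc.get("email", "<no email>")
    tags = doc.get("tags", [])
    return f"{doc['name']}: {email} ({len(tags)} tags)"


summaries = [describe(d) for d in docs]
for line in summaries:
    print(line)
```

Using `doc["email"]` directly would raise `KeyError` on the second document, which is the kind of surprise you end up guarding against once attributes are added ad hoc.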
So right off the bat, you've lost the querying power that SQL offers. When dealing with data that's intended to be analyzed, that sounds like a pretty big loss.
Clearly its built-in sharding support, which is often touted as one of its biggest benefits, wasn't suitable for you. So you had to invest some time and effort coming up with an alternate system. That sounds like a loss to me.
When it comes to the schema issue, it sounds like you haven't actually reduced the effort or work in any way, but merely pushed it somewhere else. As you admit, you still have to deal with the schema; it's just handled within the application logic rather than the database. That sounds worse to me, especially if there is more than one application using the database.
I'm just not seeing the benefit.
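For a sense of what "handled within the application logic" tends to look like, here is a rough sketch of a write-time check. The `REQUIRED_FIELDS` table and the `validate` function are hypothetical, not anything from the parent's code base; every application writing to the same database would need an equivalent check.

```python
# Hypothetical application-side schema: field name -> expected Python type.
REQUIRED_FIELDS = {"name": str, "amount": float}


def validate(doc):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(
                f"wrong type for {field}: {type(doc[field]).__name__}"
            )
    return problems


# Check a document before it would be handed to the database driver.
print(validate({"name": "rent", "amount": "950"}))
```

A relational database would reject the bad row on its own; here the rejection only happens if every writer remembers to call the check.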
Can you point me to any reasonably-priced database systems that allow me to execute SQL queries against a cluster of shared-nothing machines running commodity hardware? I'm not interested in paying $20-100K/TB/year for Vertica and friends.
SQL can be great if you have vast amounts of money or if your data can fit onto a single machine. When neither of those things is true, though, NoSQL DBs become important.
If it has to have built-in sharding then I can't think of anything off-hand.
Auto sharding, shared nothing, SQL support, it does a pretty good job of things.
Gets some decent traffic and works well.
I've heard of or directly witnessed enough situations where somebody with influence, but maybe not much actual technical experience, pushes for the use of a NoSQL database of some sort. Yes, the project is implemented and often does end up in production for at least some amount of time. But it doesn't survive long. Problems arise, and the system is either discarded or moved to a more traditional relational database system. That's why I'm curious about that list, and how many of the entries are still valid, as we approach the beginning of 2013.
And there's a growing list of stories at 10gen.com/presentations
Some good ones to point out:
Apollo Group (The University of Phoenix) http://www.10gen.com/presentations/mongosv-2012/how-we-evalu...
AOL : http://www.10gen.com/presentations/managing-large-scale-data...
These are all large deployments. It's a mix of small startups, startups that grew up, and large engineering companies (like eBay and Apollo Group).
That said, 99% of our production issues involve bugs in MongoDB and its inability to effectively use all available resources before it becomes unresponsive. I would say it needs a few more generations to become truly solid.
We were very well aware of its characteristics when choosing our DB, and didn't go in expecting any magic Web Scale or somehow getting an HA setup with plenty of durability with just one server.
For a multitenant CMS where you want to store documents with custom schemas, need more than just a key/value store and want some capability to do ad-hoc queries against custom fields, MongoDB is a pretty good fit.
We don't index custom fields, and for queries where that would be required we do the actual querying with Elasticsearch, but for simple filters on a custom field or the like, Mongo does fine.