It's weird at first, coming from a background of Access, then SQLite, then MySQL/phpMyAdmin, but you get used to it. I essentially treat it like a gigantic Python dictionary object.
The sharding is too much of a ball-ache to set up, so I've created an optimal way of distributing/mapping files across our cluster to make use of all machines.
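The post doesn't say how that mapping works, so this is only a hypothetical sketch of one common way to spread files so every machine gets used. It sorts files largest first and always hands the next one to the machine holding the fewest bytes so far (the longest-processing-time greedy heuristic). `assign_files`, the file names, and the machine names are illustrative assumptions, not the poster's code or any MongoDB API.

```python
import heapq
from typing import Dict, List, Tuple


def assign_files(file_sizes: Dict[str, int], machines: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: spread files over machines so byte totals stay balanced.

    Assumes machine names are unique. Largest files are placed first, each onto
    the machine with the smallest running total (greedy LPT heuristic).
    """
    if not machines:
        raise ValueError("need at least one machine")
    # Min-heap of (bytes assigned so far, machine index, machine name);
    # the index breaks ties so names never need to be compared.
    heap: List[Tuple[int, int, str]] = [(0, i, m) for i, m in enumerate(machines)]
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {m: [] for m in machines}
    # Biggest files first; the name is a secondary key for a deterministic order.
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx, machine = heapq.heappop(heap)
        plan[machine].append(name)
        heapq.heappush(heap, (load + size, idx, machine))
    return plan


files = {"a.csv": 700, "b.csv": 500, "c.csv": 400, "d.csv": 300, "e.csv": 100}
print(assign_files(files, ["node1", "node2"]))
# → {'node1': ['a.csv', 'd.csv'], 'node2': ['b.csv', 'c.csv', 'e.csv']}
```

A greedy plan like this is not guaranteed to be optimal in general, but it usually keeps per-machine totals close and is cheap to rerun when new files arrive.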
Data integration is nice. As long as you resist the temptation to print each integrated record to the terminal, pymongo and its C extensions can integrate a ~500-byte record in ~0.0001 seconds.
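The loading code isn't shown in the post, so here is a minimal sketch under assumptions: records are sent in batches through pymongo's `Collection.insert_many`, and nothing is printed per record. Batching is an assumption on my part, not necessarily what the poster did, and the timing above is theirs, not reproduced here. `chunked` and `load_records` are hypothetical helpers; `load_records` accepts any object with an `insert_many` method so it can be tried without a running server.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def chunked(records: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of up to `size` records from any iterable (hypothetical helper)."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_records(collection: Any, records: Iterable[Dict[str, Any]], batch_size: int = 1000) -> int:
    """Insert records in batches via collection.insert_many; return how many were sent.

    Deliberately prints nothing per record, since terminal output for every
    line can easily cost more than the insert itself.
    """
    total = 0
    for chunk in chunked(records, batch_size):
        # ordered=False lets the server keep going past a failed document in the batch.
        collection.insert_many(chunk, ordered=False)
        total += len(chunk)
    return total
```

With a real server you would pass something like `MongoClient("mongodb://localhost:27017")["mydb"]["records"]` as `collection`; the database and collection names are placeholders.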
Basically, the main advantage is not having a schema whatsoever: you can add arbitrary attributes to documents whenever the hell you want. The catch is that later you have to be careful with exception handling, since documents might not have the attributes you expect.
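A small sketch of that defensive handling, using plain Python dicts as stand-ins for documents (pymongo hands documents back as dicts). The `size` field and the sample documents are made up for illustration.

```python
from typing import Any, Dict, List

# Documents in the same collection need not share fields or types.
docs: List[Dict[str, Any]] = [
    {"_id": 1, "name": "alpha", "size": 500},
    {"_id": 2, "name": "beta"},                  # no "size" at all
    {"_id": 3, "name": "gamma", "size": "512"},  # stored as a string
]


def total_size(documents: List[Dict[str, Any]]) -> int:
    """Sum the 'size' field, tolerating documents where it is missing or malformed."""
    total = 0
    for doc in documents:
        raw = doc.get("size")  # None instead of KeyError when the field is absent
        if raw is None:
            continue
        try:
            total += int(raw)
        except (TypeError, ValueError):
            # The field exists but holds something unusable; skip it.
            continue
    return total


print(total_size(docs))  # → 1012
```

On the server side, a filter such as `{"size": {"$exists": True}}` limits a query to documents that actually have the field, which moves some of this checking out of the loop.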
So right off the bat, you've lost the querying power that SQL offers. When dealing with data that's intended to be analyzed, that sounds like a pretty big loss.
Clearly its built-in sharding support, which is often touted as one of its biggest benefits, wasn't suitable for you. So you had to invest some time and effort coming up with an alternate system. That sounds like a loss to me.
When it comes to the schema issue, it sounds like you haven't actually reduced the work in any way, but merely pushed it somewhere else. As you admit, you still have to deal with the schema; it's just handled within the application logic rather than the database. That sounds worse to me, especially if more than one application uses the database.
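To make that concrete, without a database-enforced schema each writing application ends up carrying something like the check below. `REQUIRED_FIELDS` and `validate` are hypothetical; every program that writes to the same collection would need its own copy, kept in sync by hand.

```python
from typing import Any, Dict, List, Type

# Hypothetical "schema" that lives in application code instead of the database.
REQUIRED_FIELDS: Dict[str, Type] = {"name": str, "size": int}


def validate(doc: Dict[str, Any], schema: Dict[str, Type] = REQUIRED_FIELDS) -> List[str]:
    """Return a list of problems with `doc`; an empty list means it passes."""
    problems: List[str] = []
    for field, expected in schema.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(doc[field]).__name__}"
            )
    return problems


print(validate({"name": "alpha"}))  # → ['missing field: size']
```

In a relational database the equivalent constraint is declared once in the table definition and enforced for every client.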
I'm just not seeing the benefit.
Can you point me to any reasonably priced database systems that let me run SQL queries against a cluster of shared-nothing commodity machines? I'm not interested in paying $20-100K/TB/year for Vertica and friends.
SQL can be great if you have vast amounts of money or if your data fits on a single machine. When neither of those is true, though, NoSQL DBs become important.
If it has to have built-in sharding, then I can't think of anything offhand.
Auto-sharding, shared nothing, SQL support: it does a pretty good job of things.