Instead, I'd suggest using NoSQL stores to complement SQL. Songkick use MongoDB as a fast caching layer, for example: http://effectif.com/ruby/manor/denormalising-your-rails-appl... - and I've found Redis incredibly useful for handling the write-heavy parts of my applications and for requirements to return random elements.
One of the most interesting aspects of document stores such as CouchDB is that they are schemaless, which for some problem sets is incredibly powerful - anything where you might be tempted to use key/value pairs in SQL for example.
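To make the key/value-pairs-in-SQL point concrete, here is a rough sketch using only Python's standard library. The table name, the product fields and the in-memory dict standing in for a document store are all made up for illustration; this is not CouchDB's API, just the shape of the two approaches.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational workaround: a table of key/value pairs, one row per attribute.
conn.execute("CREATE TABLE product_attrs (product_id INTEGER, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO product_attrs VALUES (?, ?, ?)",
    [
        (1, "name", "kettle"),
        (1, "volts", "240"),
        (2, "name", "novel"),
        (2, "pages", "312"),
    ],
)


def load_from_eav(product_id):
    """Rebuild one record from its attribute rows; every value comes back as text."""
    rows = conn.execute(
        "SELECT key, value FROM product_attrs WHERE product_id = ?", (product_id,)
    )
    return dict(rows)


# Document-style alternative: one self-describing JSON document per record.
# A plain dict stands in for the document store here (an assumption).
documents = {
    1: json.dumps({"name": "kettle", "volts": 240}),
    2: json.dumps({"name": "novel", "pages": 312, "authors": ["A. Writer"]}),
}


def load_document(product_id):
    """Fetch one document; types and nesting survive the round trip."""
    return json.loads(documents[product_id])
```

The key/value table loses types and needs one row per attribute, while each document carries its own shape, which is roughly what "schemaless" buys you.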
- You have live data that is changing very regularly
- You have a large quantity of flat data (think column-oriented databases)
- You don't need to index / find data and it is very flat
(perhaps simple files on disk will allow you to store more)
- You cannot possibly put off going NoSQL until you have further established yourself in the marketplace
For Gridspy, we have live data and I expect large quantities of pretty flat data. It makes sense to stream the data directly to the user via messaging rather than polling through a database. Plus, I plan to store large quantities of high resolution data in a specialised database or dumped to disk - it will be much smaller and simpler without the indexing information since I don't need to search it, only slice it.
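As a sketch of what "slice, don't search" can look like with plain files: fixed-size records appended to disk, then read back by position. The record layout (a timestamp and a watts reading), the file name and the helper names are assumptions for illustration, not Gridspy's actual format.

```python
import os
import struct
import tempfile

# Hypothetical record layout: (timestamp, watts) as two little-endian doubles.
RECORD = struct.Struct("<dd")


def append_samples(path, samples):
    """Append fixed-size records; no index is written alongside the data."""
    with open(path, "ab") as f:
        for timestamp, watts in samples:
            f.write(RECORD.pack(timestamp, watts))


def read_slice(path, start, stop):
    """Return records [start, stop) by seeking straight to their byte offset."""
    with open(path, "rb") as f:
        f.seek(start * RECORD.size)
        data = f.read((stop - start) * RECORD.size)
    return list(RECORD.iter_unpack(data))


path = os.path.join(tempfile.mkdtemp(), "samples.bin")
append_samples(path, [(float(t), 100.0 + t) for t in range(10)])
window = read_slice(path, 3, 6)
```

Because every record has the same size, the position of a sample is just arithmetic, so the file stays at exactly 16 bytes per sample with nothing extra stored for lookups.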
EDIT: I should qualify this argument by stating that I'm talking about datastores that support custom queries (like MongoDB and CouchDB) rather than ginormous k/v stores (like Tokyo Cabinet). I've found limited usefulness for the latter.
It sounds like you do; however, I think there are a lot of people who have no real reason to move away from SQL other than fashion. The database choice should be just like any other engineering decision - and often your familiarity with the tool is a very important factor.
The "mmap and update in place" design has upsides and downsides. Up: it's very fast and it doesn't allocate any new space for simple updates. Down: data will be trashed if the DB is shut down rudely, so you need a replica of each node just to keep ACID guarantees.
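For anyone who hasn't seen the pattern, here is a toy version of "mmap and update in place" in Python. It is only an illustration of the general technique, not how any particular database implements it; the file name and the four-slot counter layout are invented.

```python
import mmap
import os
import struct
import tempfile

SLOT = struct.Struct("<q")  # one 8-byte signed counter per slot (made-up layout)

path = os.path.join(tempfile.mkdtemp(), "counters.db")
with open(path, "wb") as f:
    f.write(b"\x00" * SLOT.size * 4)  # preallocate four slots up front


def increment(mm, slot):
    """Overwrite the slot's bytes in place; the file never grows."""
    offset = slot * SLOT.size
    (value,) = SLOT.unpack_from(mm, offset)
    SLOT.pack_into(mm, offset, value + 1)
    return value + 1


with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    increment(mm, 2)
    increment(mm, 2)
    mm.flush()  # before this, the update may live only in memory pages
    mm.close()

size_after = os.path.getsize(path)
with open(path, "rb") as f:
    stored = [value for (value,) in SLOT.iter_unpack(f.read())]
```

The upside is visible in the file size staying constant. The downside is the window before the flush: a crash there can leave a half-written or lost update, with no log to recover from.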
Of course I'm biased, and tend to lean towards using SQL/relational databases... FathomDB is all about trying to eliminate the pain points of running a (My)SQL database. I feel a lot of the NoSQL marketing hype is picking on weaknesses of MySQL (rather than relational databases per se), and so we're thinking about how to make MySQL better, and we don't think it's a good idea to abandon the relational model entirely. After all, our industry started with NoSQL back in the 60s, and there were good reasons for adopting the relational model 30 years ago!
The relational model and data normalization are reasonably good at stopping you from painting yourself into a corner, and SQL's bulk data manipulation operations can get you out of trouble.
With NoSQL, you're on your own, which is great for those who are infallible and omniscient. Ironically enough, I think that means that only Larry Ellison should be using NoSQL, to twist the well-known joke.
Having said that, the guys behind Harmony (http://get.harmonyapp.com/) use MongoDB for everything, as far as I know: http://railstips.org/blog/archives/2009/12/18/why-i-think-mo...
Watch some of the videos from nosqleast 2009, https://nosqleast.com/, to get a better picture of the different options and major players in this area before deciding which NoSQL solution to base any of your future projects on.
Relational database systems (+ normalization) compromise everything else to ensure the ACID properties, which, in the majority of cases, is the most important part.
The last case is hardest and I'll share some thoughts on it.
I know most of you don't live in such an environment, but you can still infer the "agile" scenarios from the waterfall one. In other words, the following waterfall issues can be areas of potential fkups under whatever development model you use.
So, the impact of switching to NoSQL for different waterfall'ish teams:
- it changes the way your data is organized -- mostly it's denormalization and some strategies tied to the specifics of the queries you'd have to use most (read: ad hoc strategies). So, it influences analysis, architects, development & release management.
- it changes the way db "schema" changes can be introduced. You'd say "there's no schema". Well, it's partly true, but in real life you have to add some metadata to the underlying db, otherwise your db queries won't run. For example, Cassandra has ColumnFamily definitions, CouchDB has its view definitions. Somebody has to agree on what needs to be changed, then write these changes and keep them in sync with the codebase. You'd probably need a mechanism like Rails migrations to maintain them - the promise of "there's no schema" won't get rid of that. Somebody has to apply such changes to production as well. So, back to the waterfall: it influences analysis, development, release management & operations.
- it changes the way your app scales. The goal of many NoSQL engines is to scale horizontally with ease -- this is a big win for operations! But we're not there yet (Cassandra? Maybe MongoDB?), see e.g. http://bjclark.me/2009/08/04/nosql-if-only-it-was-that-easy/. Also, if something you crucially need doesn't scale, you have to redesign your app. So the influence is: operations have less work, release management has more work, but in the worst case all the teams have to rework the app.
- it allows for some non-standard app behaviours. E.g. CouchDB is excellent at disconnected operation, meaning: occasionally synchronizing data between nodes which are mostly offline. It's also called "no master" as opposed to "multi master". No wonder IBM research funded CouchDB development (trying to rewrite Lotus Notes? ;) and Ubuntu also chose it for their Ubuntu One sharing platform. A feature like this is a relief for release mgmt & operations, but can need a lot of work from the architects, analysis & devs.
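On the "you'd still need something like migrations" point above, a minimal sketch of what that could look like for schemaless documents. Plain Python dicts stand in for stored documents, and the `schema_version` field, the decorator and both migrations are invented for illustration; no real document store API is used.

```python
MIGRATIONS = []


def migration(fn):
    """Register a migration; list order defines the version order."""
    MIGRATIONS.append(fn)
    return fn


@migration
def split_name(doc):
    # v0 -> v1: replace the single "name" field with first/last name fields.
    first, _, last = doc.pop("name", "").partition(" ")
    doc["first_name"], doc["last_name"] = first, last


@migration
def add_tags(doc):
    # v1 -> v2: every document gets a "tags" list.
    doc.setdefault("tags", [])


def upgrade(doc):
    """Apply the migrations this document has not seen yet, in order."""
    for version in range(doc.get("schema_version", 0), len(MIGRATIONS)):
        MIGRATIONS[version](doc)
        doc["schema_version"] = version + 1
    return doc


old_doc = upgrade({"name": "Ada Lovelace"})
half_migrated = upgrade({"first_name": "Bob", "last_name": "Ng", "schema_version": 1})
```

The list of migrations lives in the codebase and has to be agreed on, reviewed and rolled out like any SQL migration, which is the point: the work moves, it doesn't disappear.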
Hope this is useful. I'm considering convincing some BigCorp to use NoSQL in some project and these are the issues I thought of.
In testing of our new upcoming platform, that was a huge, huge win for speed. And we're a PostgreSQL shop too.
MongoDB allowed us to:
* Have embedded documents (very large performance improvements)
* Have arrays and hashes as "columns"
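To illustrate the two bullets above, here is roughly what such a document looks like, written as a plain Python dict and round-tripped through JSON. The field names and values are invented, and no MongoDB driver calls are shown; it's only meant to show the shape of the data.

```python
import json

# One blog post stored as a single document (hypothetical fields).
post = {
    "_id": 1,
    "title": "Why we moved",
    "tags": ["mongodb", "rails"],            # an array as a "column"
    "author": {"name": "sam", "karma": 42},  # a hash as a "column"
    "comments": [                            # embedded documents
        {"by": "alex", "text": "Nice write-up"},
        {"by": "kim", "text": "What about joins?"},
    ],
}


def render_summary(doc):
    """Everything needed to render the post arrives in one document."""
    return f'{doc["title"]} by {doc["author"]["name"]} ({len(doc["comments"])} comments)'


stored = json.dumps(post)  # what a single read would hand back
summary = render_summary(json.loads(stored))
```

In a normalized SQL layout the same page would typically touch separate posts, authors, tags and comments tables, which is presumably where the performance difference comes from.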
We also use Redis in a few crucial places, because of its really good support for lists (queues) and sets, and, besides that, its blazing raw speed.
Downsides? Yes. Many Rails-style plugins don't work well. But an upside is that we're forced to write leaner code and not depend too much on those.
Another downside: MongoDB is super-fast, but it is still a work in progress in some places, and the ORM we're using (mongo_mapper) is somewhat of a moving target right now.
But hey, that's what happens when you're on the bleeding edge.
* built-in replication
* basic sharding
* embedded documents
* very very fast