Ditch the mutable data and you can stop asking questions like "what do we do if 10 becomes 10.5 before it becomes 11?" and start storing values which never change.
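Something like this toy sketch, all names invented, just the shape of the idea: append timestamped facts, never overwrite, and derive the current value on read.

    from dataclasses import dataclass
    import time

    @dataclass(frozen=True)          # frozen: a recorded fact never changes
    class Fact:
        entity: str
        value: float
        at: float                    # when we observed the value

    log: list[Fact] = []             # append-only; no in-place updates

    def record(entity: str, value: float) -> None:
        log.append(Fact(entity, value, time.time()))

    def current_value(entity: str) -> float:
        facts = [f for f in log if f.entity == entity]
        return max(facts, key=lambda f: f.at).value

    record("counter", 10)
    record("counter", 10.5)          # the intermediate value is kept, not lost
    record("counter", 11)
    print(current_value("counter"))  # 11, with 10 and 10.5 still queryable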
Also, I think the focus on immutability misses the more interesting discussion about lattices and how important they are to distributed programming. Check out this video, which will take you up from the beginning: http://vimeo.com/53904989
(Seriously, if you do anything remotely distributed, this is required viewing. There's some serious stuff going on in the distributed world and this is a great intro.)
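For a taste of what "lattice" means there, here's the textbook grow-only counter (not anything from the talk specifically): merge is a least upper bound, so it's commutative, associative and idempotent, and replicas converge no matter what order they exchange state in.

    # G-Counter: each replica's state maps replica id -> local increment count.

    def merge(a: dict, b: dict) -> dict:
        # Join (least upper bound) of two states: element-wise max per replica.
        return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

    def value(state: dict) -> int:
        return sum(state.values())

    r1 = {"replica-1": 3}                 # replica 1 has made 3 increments
    r2 = {"replica-2": 5}                 # replica 2 has made 5
    m = merge(r1, r2)
    assert m == merge(r2, r1)             # commutative: order doesn't matter
    assert merge(m, m) == m               # idempotent: re-delivery is harmless
    print(value(m))                       # 8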
I'm sure it's the right answer for some people, some of the time. In fact, I'm sure it's not applied as frequently as it should be. But it's definitely a specialized tool for special applications, whereas mutable state is, for better or worse, the hammer we can and ought to continue relying on.
I think there might be a balance here: you can always garbage-collect events that are already merged into an updated value of the given object. Depending on the requirement, this GC can be done e.g. days or months after that merge...
Now the 'folding' can be defined as snapshotting the 'merged state'. Instead of fetching 10 events, after the folding + GC you will fetch e.g. 2 plus the folded snapshot. You save some CPU and bandwidth over time, and that's it.
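Roughly this shape, as a hand-wavy sketch where apply() stands in for whatever your domain logic is:

    from functools import reduce

    def apply(state: dict, event: dict) -> dict:
        # Merge one event into the accumulated state (domain-specific).
        return {**state, **event}

    def fold(snapshot: dict, events: list) -> dict:
        return reduce(apply, events, snapshot)

    def compact(snapshot: dict, events: list, fold_count: int):
        # Fold the oldest fold_count events into a new snapshot; the folded
        # events can then be GC'd (or shipped to cold storage) after whatever
        # retention period the requirements dictate.
        return fold(snapshot, events[:fold_count]), events[fold_count:]

    snapshot, events = {}, [{"a": 1}, {"b": 2}, {"a": 3}, {"c": 4}]
    snapshot, events = compact(snapshot, events, fold_count=2)
    # A reader now fetches 1 snapshot + 2 events instead of 4 events.
    print(fold(snapshot, events))   # {'a': 3, 'b': 2, 'c': 4}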
Once an event is visible to all users, it can be merged into the base state of the system and no longer needs to be stored (or at least kept online) as a separate event. For the performance reasons you allude to, you probably want to do that as much as possible.
(This is still effectively an append-only series of immutable states, but losing, at least from regular on-line access, "older" states that no one can see anymore.)
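A back-of-the-envelope version of that, with "visible to all users" approximated by the minimum acknowledged log offset (all names invented):

    def stable_offset(acked: dict) -> int:
        # Highest log position that every consumer has acknowledged.
        return min(acked.values()) if acked else 0

    def compact_log(base: dict, log: list, acked: dict):
        cut = stable_offset(acked)
        for event in log[:cut]:       # no one can see these as separate events
            base = {**base, **event}  # so merge them into the base state
        return base, log[cut:]        # older entries can drop to cold storage

    base, log = {}, [{"x": 1}, {"y": 2}, {"x": 3}]
    acked = {"reader-a": 2, "reader-b": 3}   # reader-a has only seen 2 events
    base, log = compact_log(base, log, acked)
    print(base, log)                  # {'x': 1, 'y': 2} [{'x': 3}]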
There's the rub: any GC/compaction (and when you get down to it, any read) is actually a distributed consensus problem. If you're interested in this problem, you might take a look at the CRDT garbage collection literature for more details.
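The classic illustration is tombstone GC in an OR-Set-style CRDT: purge a tombstone before every replica has merged it and the deleted element resurrects. A very rough sketch, where a state is (adds, tombstones), both sets of (element, unique-tag) pairs:

    def merge(a, b):
        return (a[0] | b[0], a[1] | b[1])     # union adds, union tombstones

    def live(state):
        adds, tombs = state
        return {elem for elem, tag in adds if (elem, tag) not in tombs}

    r1 = ({("x", "t1")}, set())               # replica 1 adds x with tag t1
    r2 = merge((set(), set()), r1)            # replica 2 syncs...
    r2 = (r2[0], r2[1] | {("x", "t1")})       # ...then removes x (tombstone)
    print(live(merge(r1, r2)))                # set(): x is gone once merged

    # Premature GC on replica 2: drop the tombstoned add and the tombstone.
    r2_gc = (r2[0] - r2[1], set())
    # A later merge with replica 1, which never saw the remove, resurrects x:
    print(live(merge(r1, r2_gc)))             # {'x'} -- the delete got lost

Safe purging needs agreement that everyone has seen the tombstone, which is exactly the consensus problem you're pointing at.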
Yes I could see that getting prohibitively expensive when SSDs cost 70 cents/GB and hard drives 5 cents/GB. You should really throw out your historical data at those kinds of costs, probably not worth 5 cents per GB.
Personally, that's why I don't keep backups: files change all the time, and I was going broke making sure I had older copies of my data. I'd rather just rewrite all my code and retake all my pictures.
Even a petabyte should fit in a rack or two.
I really fail to understand how a business could acquire that much data and not be able to sell it.
Here ya go, 180 TB for $10K in 4U, which means 10 to a rack which means 1.8 PB per rack. Who has 180 TB of database that isn't worth $10K?
Seriously, I've seen plenty of bugs found in database transaction mechanisms, and I've never seen a database pull claims of 'transactional integrity'. I can't even imagine that getting through the product team in a couple of days.
When dealing with data, you don't get any points for "well, we meant to do it the right way, but it didn't happen. sorry."
Databases are and must be held to a higher standard than the software sitting above them in the stack, just as kernels must be held to an even higher standard, because bugs in lower layers cause more damage with higher costs. DBAs and commercial databases are expensive because data is valuable and there are liabilities. If the database developer made a remark like yours, I would run in the opposite direction. A smug reply like that exposes a fundamental disrespect for other people's data--their property. If one has no respect for our property, one shouldn't find it surprising that we have no respect for one's software or services.
I note and appreciate that the Cassandra developer's reply below is even-handed and serious.
I guess what I'm trying to say is, if I see something that I think should obviously be tested go untested, my go-to assumption is that the failure was one of bad testing practices. To dispel that belief, I need something to replace it with that explains the observation even better. If you're willing to share your view of the situation, maybe that would do it. (There's no real business case for you to do it, since I'm not in your market. But as a hacker, I'm interested in what I can learn from this.)
So, I think it is relevant that LWT is still a very new feature in Cassandra and not something basic to it at all (arguably counter to a lot of its original design goals).
Personally, I was much more concerned by the server-side timestamps only having millisecond granularity (and even that is somewhat understandable given the JVM's limitations).
Conflict of interest disclosure -- I work for FoundationDB, where we put a shockingly high level of effort into testing our software in simulation and the real world. 
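To make the millisecond-granularity concern above concrete, here's a toy last-write-wins merge (an assumption about how ties would get broken, not Cassandra's actual code): two writes landing in the same millisecond carry the same timestamp, so the winner ends up decided by something other than which write actually happened last.

    import time

    def ms_now() -> int:
        return int(time.time() * 1000)   # millisecond-granularity clock

    def lww_merge(a, b):
        # (timestamp, value) pairs: highest timestamp wins; on a timestamp
        # tie, the tuple comparison falls through to the value, which says
        # nothing about causality or arrival order.
        return max(a, b)

    w1 = (ms_now(), "write A")
    w2 = (ms_now(), "write B")           # almost certainly the same millisecond
    print(lww_merge(w1, w2))             # tie broken on the value, not the order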
Umm... no. The .0 releases of the project are actually where you'd expect the most bugs. That's where you have the first commit with an implementation of a new design.
You'll note that DataStax's commercial version of DSE is still based on 1.2.x...
I remember others (employees of the company) hailing its wonderful qualities for quite a while now (years); then I went to the website and all I could find was a bunch of white papers and a registration form. And here it seems a bit of a case of "my vaporware's features are better than your shipped product's features".
No matter how many white papers there are, I would still put my data in Cassandra rather than this new thing (last I checked I couldn't even download it; I had to fill out a form of some sort).
I'm sorry for any errors/misrepresentations I made about your words, and will be more careful in the future. (Unfortunately, I can no longer edit, delete, or respond to that post.)
Nice marketing attempt though.
Otherwise, kudos for adding a potentially useful feature to cassandra.
The granularity is a CQL partition: http://www.datastax.com/documentation/cql/3.0/webhelp/index....
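In driver terms (made-up keyspace/table, DataStax Python driver, just for illustration) that means the IF condition is evaluated and serialized within a single partition; there's no LWT that spans partitions:

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("demo_ks")

    # user_id is the partition key, so this compare-and-set is scoped to
    # that one partition.
    rs = session.execute(
        "UPDATE users SET email = %s WHERE user_id = %s IF email = %s",
        ("new@example.com", 42, "old@example.com"),
    )
    print(rs.one()[0])   # the '[applied]' column: True if the condition held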
Is it still not obvious that "universal object storage" is a naive idea?)