Really? Cassandra can write 416,666GB per second? If Cassandra can write 50GB in 0.12ms, then it can write over 416TB in a second and 25 petabytes per minute. Which clearly isn't true. Of course, that figure would also mean that MySQL can save 166GB per second.
The article is a piece about technology that the writer doesn't understand. They all have trade-offs. Key-value stores eliminate your ability to access data by anything other than its key. So, you have an article with an id of 5. What if you want to look up the articles by author 9? You're just unable. Column-based databases aren't magic either. They just change the orientation of the data. Rather than seeing 1,Adam,West;2,Mark,Twain;3,Will,Smith you see 1,2,3;Adam,Mark,Will;West,Twain,Smith. That offers some advantages - such as being able to do metrics on a column easily - but it also means that if you want to get the data for one row, it has to do MORE work than a regular row-based database. To retrieve 1,Adam,West it has to do multiple lookups since the data isn't stored next to each other. And frankly, random access of a row is what you're likely to want.
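To make the orientation difference concrete, here's a toy sketch (the names and structures are illustrative, not any real engine's storage format):

```python
# Row-oriented: each record's fields sit together.
rows = [(1, "Adam", "West"), (2, "Mark", "Twain"), (3, "Will", "Smith")]

# Column-oriented: the same data, one contiguous list per column.
columns = {
    "id": [1, 2, 3],
    "first": ["Adam", "Mark", "Will"],
    "last": ["West", "Twain", "Smith"],
}

# Column metrics are easy: scan a single contiguous list.
longest_last = max(columns["last"], key=len)

# But rebuilding one row takes one lookup per column.
row0 = tuple(columns[c][0] for c in ("id", "first", "last"))
```

In a row store, `row0` is a single contiguous read; in the column layout it's three separate lookups.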
They run on clusters of cheap PC servers. PC clusters can be easily and cheaply expanded without the complexity and cost of "sharding," which involves cutting up databases into multiple tables to run on large clusters or grids.
That's like saying that if you plant a hamburger you'll get a McDonalds. They don't just magically run on clusters of cheap PCs. The fact is that there are tradeoffs. If you put 50% of the data on server1 and 50% on server2 you'll get faster query speeds except that you then have to know where to look for a specific piece of data.
So, sharding: basically, you split your data along logical (or non-logical) lines into separate databases. Like registration at a conference, users A-M are in DB1 and users N-Z are in DB2. So, when someone comes up with a query, you can easily tell which place to send them and then the query is run there. Or you can non-logically split and have a map that tells you where to go. So you query metadb and ask "where's Frank" and it says "Frank is in DB1" and then you query DB1 for Frank.
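That first scheme (logical, range-based routing) is about one line of code; the cutoff letter here is arbitrary:

```python
def shard_for(username: str) -> str:
    """Route a user to a shard by first letter, conference-registration style:
    A-M go to DB1, N-Z go to DB2."""
    return "DB1" if username[0].upper() <= "M" else "DB2"
```

The non-logical scheme replaces this function with a lookup in a metadata database, which is more flexible but adds an extra hop per query.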
BigTable isn't so different. It does do some awesome auto-splitting of tables, but when it comes down to it, it's much the same. There's a single META0 tablet that gets queried to find the location of the META1 tablet which knows the location of the actual data. And, in order not to overload the META0 tablet, one needs to cache the location that you get back.
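The metadata-lookup-plus-cache pattern described here might look like this (a hypothetical sketch, not BigTable's actual API):

```python
_location_cache: dict = {}

def lookup_location(key: str, meta_lookup) -> str:
    """Find which server holds `key`, caching the answer so the metadata
    server (the META0 tablet in BigTable's case) isn't hit on every request."""
    if key not in _location_cache:
        _location_cache[key] = meta_lookup(key)  # ask the meta tablet/db
    return _location_cache[key]
```

Without the cache, every query in the cluster would funnel through the single metadata server, which is exactly the overload the comment mentions.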
Oftentimes, the problem is that you have web programmers who wouldn't know a B-Tree from a linked list. All of these technologies exist because there are certain things that they're good at. However, what this article purports to know is that these NoSQL databases are just superior. In most ways, they're inferior. A key-value store has its place, but it's severely limiting: often you can't get by with just a key-value store because you need more querying power. Likewise, whether a column-oriented database is right for you depends more on how you want to access the data (by column or by row).
Oh, and really, learn SQL indexing and check whether your queries are doing full table scans. That's the root of a lot of problems. I mean, a good index should do lookups in log(n) time, which means that on 10,000 rows a query should take 1/769th the time; on a million rows a query should take 1/50,000th the time; on a billion rows it will take 1/33,333,333th the time of doing a full table scan of a table that size. Yeah, indexes make a huge difference, but they aren't magic either. They merely order the data in a certain way that makes it easy to pluck out certain rows since you don't have to look at every value. For example, dictionaries are alphabetized, so you don't have to look at every word. You just start jumping toward the area of the word you're looking for. Indices work the same way. Now imagine if the dictionary was unordered. You'd have to look at every single word to see if it was the word you were looking for.
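The dictionary analogy is just binary search over sorted data; a quick sketch of where those ratios come from:

```python
import math

def comparisons_linear(n: int) -> int:
    # Worst case for an unordered scan: look at every value.
    return n

def comparisons_binary(n: int) -> int:
    # Worst case over sorted (indexed) data: halve the range each step.
    return math.ceil(math.log2(n)) if n > 1 else 1

# On a million rows the indexed lookup needs ~20 comparisons
# instead of 1,000,000 - the ~1/50,000 ratio quoted above.
```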
Oh, and just saying "put an index on stuff" isn't all you need to know. It isn't too complicated, but there's a Google talk that goes over some of the mistakes one can make in assuming that indexes are doing what you want (http://www.youtube.com/watch?v=u70mkgDnDdU&feature=chann...). It's in the middle somewhere, but the whole thing is a good watch.
Really, is this the type of article we've become? I mean, column-oriented datastores and key-value datastores are both valuable technologies, but this is a NoSQL flame written by someone who doesn't know that Facebook is still MySQL-backed, uses Cassandra for only specific things, and uses many different technologies where they're good (like heavy use of memcached as a key-value store to reduce load on MySQL where queries aren't needed). Nor does the article acknowledge the drawbacks that the authors of these systems would talk about. For example, MongoDB's developers state that it lacks ACID integrity and that it's more suited for "high volume, low value data" - think of whether someone has upvoted or not; if that data gets lost it doesn't really matter, because the person can click the vote button again and all is well, or just because it isn't super important data.
Why are we supporting "we hate this technology just because"? Technologies are tools that have their places. It's foolish to write off a good technology just because you want to make fun of it or deride it. Use the right tool for the job. In your web application, you might find that using multiple tools in concert is the way to go.
If you click through to the slides, they are actually referring to latency. The slides say that for accesses of > 50GB, Cassandra's latency is 0.12ms, and MySQL's is around 300ms.
This, of course, has very little to do with the rate at which data is written.
(Yay for IT journals... quality information for PHBs...)
For cases like that you don't have one source of data; you do something like an AFTER INSERT trigger to update other flattened tables of the data that you need.
For instance, where you would normally perform a JOIN, you would instead update a JOIN table (or a VIEW maintained by incremental updates) right after each save.
So then you can filter by many other things, and space really doesn't limit you much from making those duplications.
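A minimal sketch of that flatten-on-write idea (table and column names are made up):

```python
# Two "source" tables and one flattened table kept in sync at write time.
authors = {9: "Mark Twain"}
articles = {}          # article_id -> (title, author_id)
articles_flat = {}     # article_id -> (title, author_name), pre-joined

def insert_article(article_id: int, title: str, author_id: int) -> None:
    articles[article_id] = (title, author_id)
    # The "AFTER INSERT trigger": update the denormalized copy too,
    # so reads never need a join.
    articles_flat[article_id] = (title, authors[author_id])
```

The trade-off is classic: every write does extra work and the data is duplicated, but reads become simple key lookups.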
It happens with large RDBMS systems anyway; after about 5GB of data the same type of scalability decisions have to be made.
But yes the article is off and you are right, most likely systems will approach this with multiple solutions for specific problems.
I work every day with a > 4TB database and it's all on one machine (a very big machine) with some very fast SAN. If you have money, Oracle can get very big on the right hardware. I am willing to bet MySQL or Postgres could do very well on that sort of hardware too.
More like 5TB these days.
At that point, even in RDBMS you have to stop with the joins and flatten. I have experienced these limits in Oracle and MSSQL around 20-30 million row tables.
Scalable fundamentals like database flattening, dimensional modeling, etc. are what key-value stores give you from the start. But really a good mix, or a project-specific choice, works best; I'm just saying that in the future, with TB and PB of data, the JOIN is a historic remnant.
It is possible one day that RDBMS will be seen as one of those evil optimizations that we made in our small relative worlds at the time.
It is possible one day that RDBMS will be seen as one of those evil optimizations that we made in our small relative worlds at the time.
I doubt that for the simple reason that relational databases are maths: the relational algebra and relational calculus. There's no such theoretical underpinning to object databases.
True, but the relational architecture just moves to the object or code level; if you think about it in relation to data size, it makes more sense there for the future.
The RDBMS was the brain, but it is also the storage; now code will dictate how to use the storage rather than the storage dictating it, and the storage is becoming a component.
The proper scaling issue is, of course, with the cpu and memory of your single DB server - and its disk speed - which can only be expanded so much, and at exponentially increasing cost. It's conceivable that even with a fixed amount of data, say 100G, you might have to confront scaling issues with increasing site usage, long before you reached anywhere near the limits of your raw disk space.
Seeing the number of people generalizing that to apply to all RDBMSs makes me think they don't have any DB knowledge outside MySQL.
And fyi I prefer PostgreSQL. But MySQL isn't THAT bad, c'mon.
MySQL may be able to perform better - as I haven't done in-depth tests I wouldn't know how far it can be stretched - but my point is that almost everyone complaining about the performance of the RDBMS model seems to come from MySQL, thinking that it represents the utmost limits of what relational databases can do, and that is quite sad.
MySQL is not that bad, no. With proper knowledge it can do a lot, and beyond its limits, capably configured, I am not sure any RDBMS can help. Facebook is not developing Cassandra because they can't afford Oracle.
That said, I think that calling the whole event NoSQL was kind of asking for it; of course people are going to try to squeeze some drama out of that stone. (I would have gone to the meetup anyway, though, if I hadn't already been booked.)
I would like to add one quick correction, or maybe clarification, to: "[Indexes] merely order the data in a certain way that makes it easy to pluck out certain rows since you don't have to look at every value."
Indexes don't always order the data; only clustered indexes order the data, and there can be only one clustered index on a table. You can have lots of nonclustered indexes that are merely pointers to data within the larger table.
These numbers have been taken from p.21 of the linked PDF by the Facebook engineer Avinash Lakshman where it shows this MySQL vs Cassandra comparison:
"MySQL > 50GB Data
Writes Average : ~300ms
Reads Average : ~350ms
Cassandra > 50GB Data
Writes Average : 0.12ms
Reads Average : 15ms"
Which I'm guessing actually means: with a database table which is more than 50GB total size they have measured individual row accesses at these speeds. Maybe Avinash has accidentally transposed the read and write figures - I can't think why the read would be slower than the write.
15ms to write a row to disk seems possible given that a fast modern disk has a latency of 2ms, a seek time of 4ms, and a sustained transfer rate of about 100MB/s. The track-to-track time is only 0.4ms, so maybe if you just wrote all the data to disk serial-log-style you could reconstruct from the log after a failure and handle all reads from memory. I don't know Cassandra. Obviously, from these figures, the disk couldn't do a row read in 0.12ms.
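A back-of-envelope check of that reasoning, using the drive figures from the comment (the 1KB row size is an assumption):

```python
# Assumed drive characteristics from the comment, plus a hypothetical 1KB row.
seek_ms = 4.0
rotational_latency_ms = 2.0
transfer_mb_per_s = 100.0
row_kb = 1.0

transfer_ms = row_kb / 1024 / transfer_mb_per_s * 1000  # ~0.01ms for 1KB

# A random access pays seek + rotational latency on top of the transfer...
random_access_ms = seek_ms + rotational_latency_ms + transfer_ms  # ~6ms
# ...while a serial log append mostly just pays the transfer, since the
# head is already positioned at the end of the log.
serial_append_ms = transfer_ms  # ~0.01ms
```

Which is why 0.12ms is plausible for a log-style write but not for a random disk read.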
You can (usually) make reads faster by throwing things like memcached at the problem. Writes are harder. So I think this is the right tradeoff for a modern system.
You build and maintain indexes, just like you do with an SQL db.
Yes you can, at least in CouchDB, just create a view that emits the author id and you're done.
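For example, a CouchDB design document with a by-author view could look roughly like this (field names like `author_id` are assumptions about the document schema):

```python
# A CouchDB view is a JavaScript map function stored in a design document.
# This one emits each article keyed by its author, so a query like
# /db/_design/articles/_view/by_author?key=9 returns author 9's articles.
design_doc = {
    "_id": "_design/articles",
    "views": {
        "by_author": {
            "map": "function(doc) { if (doc.author_id) emit(doc.author_id, doc._id); }"
        }
    },
}
```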
It is very good at what it does. But it is not good at, nor ever will it be good at, ad hoc queries like the GP describes. The second time you query that view it will be blindingly fast. The first time, however, it has to build the view index from scratch, which in a large-ish DB might well take hours.
You just can't assume you will have pretty fast "random access" queries like you would on MySQL or similar. Of course, it goes the other way as well, and there are many examples of views you can trivially do in Couch which would be prohibitively expensive in MySQL.
As always, you pays your money and you makes your choice.
Query on an un-indexed column and it'll take forever every time. (There's an Oracle database I deal with occasionally that takes 50 seconds to count 78 rows. No we can't add an index.)
All NoSQL (eewww) non-RDBMSs do is move the pain around. The efficiency and lookup times are (usually, mostly) orthogonal to the orientation of the data; all you can do is align your use case to hit as few pain points as possible.
I don't know what's going on with your Oracle install but even taking the small example of 1k rows, an unindexed ad hoc query in MySQL will return pretty quickly, well under a second on a decent machine/disk. That might be fine for, say, occasional use of a "reports" web page - and you don't need to then store and update an index. The same query on CouchDB will be at least 10 times slower, possibly making the page unusably slow, and if you want it to be usable you need to store the index - no choice.
But yeah, just "moving the pain around" is absolutely right. Ain't no silver bullets.
I just want to add that column stores are not in any way anti SQL. They don't change the data model at all. They're just a different implementation of the relational model, more suitable for analytics than row stores. For OLTP apps they're bad. Reconstructing one row isn't necessarily expensive (as column stores don't just store individual columns), but writing one row is expensive.
And as far as a lookup occurring in log(n) time: imagine a broadcast of "hey everyone, who has the row associated with 1030923?" Each node checks its bounds (in parallel) and then, if the key exists within those bounds, does a log(n) search. That's pretty reasonable.
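A toy sketch of that broadcast-then-search idea (the node layout and keys are invented):

```python
import bisect

# Each node holds a sorted slice of the keyspace.
nodes = [
    {"low": 0,         "high": 999_999,   "keys": [1030, 500_000]},
    {"low": 1_000_000, "high": 1_999_999, "keys": [1030923]},
]

def find(key: int):
    """'Broadcast' to every node; only a node whose bounds cover the key
    runs a log(n) binary search on its sorted key list."""
    for node in nodes:  # in a real system these checks run in parallel
        if node["low"] <= key <= node["high"]:
            i = bisect.bisect_left(node["keys"], key)
            if i < len(node["keys"]) and node["keys"][i] == key:
                return node
    return None
```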
In short, my prediction is that these things will grow way outside of key-value pairs and into a complete solution that scales very well (and is _incredibly_ redundant).
I imagine there will be some great improvements that come out, but I think we're past the point of any major data store revolutions unless something fundamentally changes - like quantum computing becoming a reality - and we all basically have to start over.