Really? Cassandra can write 416,666GB per second? If Cassandra can write 50GB in 0.12ms, then it can write over 416TB in a second and 25 petabytes per minute. Which clearly isn't true. Of course, that figure would also mean that MySQL can save 166GB per second.
The article is a piece about technology that the writer doesn't understand. They all have trade-offs. Key-value stores eliminate your ability to access data by anything other than its key. So, you have an article with an id of 5. What if you want to look up the articles by author 9? You're just unable. Column-based databases aren't magic either. They just change the orientation of the data. Rather than seeing 1,Adam,West;2,Mark,Twain;3,Will,Smith you see 1,2,3;Adam,Mark,Will;West,Twain,Smith. That offers some advantages - such as being able to do metrics on a column easily - but it also means that if you want to get the data for one row, it has to do MORE work than a regular row-based database. To retrieve 1,Adam,West it has to do multiple lookups since the data isn't stored next to each other. And frankly, random access of a row is what you're likely to want.
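To make the orientation difference concrete, here's a toy sketch (the names and structures are illustrative, not any real engine's storage format):

```python
# Row-oriented: each record's fields sit together.
rows = [(1, "Adam", "West"), (2, "Mark", "Twain"), (3, "Will", "Smith")]

# Column-oriented: the same data, one contiguous list per column.
columns = {
    "id": [1, 2, 3],
    "first": ["Adam", "Mark", "Will"],
    "last": ["West", "Twain", "Smith"],
}

# Column metrics are easy: scan a single contiguous list.
longest_last = max(columns["last"], key=len)

# But rebuilding one row takes one lookup per column.
row0 = tuple(columns[c][0] for c in ("id", "first", "last"))
```

In a row store, `row0` is a single contiguous read; in the column layout it's three separate lookups.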
They run on clusters of cheap PC servers. PC clusters can be easily and cheaply expanded without the complexity and cost of "sharding," which involves cutting up databases into multiple tables to run on large clusters or grids.
That's like saying that if you plant a hamburger you'll get a McDonalds. They don't just magically run on clusters of cheap PCs. The fact is that there are tradeoffs. If you put 50% of the data on server1 and 50% on server2 you'll get faster query speeds except that you then have to know where to look for a specific piece of data.
So, sharding: basically, you split your data along logical (or non-logical) lines into separate databases. Like registration at a conference, users A-M are in DB1 and users N-Z are in DB2. So, when someone comes up with a query, you can easily tell which place to send them and then the query is run there. Or you can non-logically split and have a map that tells you where to go. So you query metadb and ask "where's Frank" and it says "Frank is in DB1" and then you query DB1 for Frank.
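That first scheme (logical, range-based routing) is about one line of code; the cutoff letter here is arbitrary:

```python
def shard_for(username: str) -> str:
    """Route a user to a shard by first letter, conference-registration style:
    A-M go to DB1, N-Z go to DB2."""
    return "DB1" if username[0].upper() <= "M" else "DB2"
```

The non-logical scheme replaces this function with a lookup in a metadata database, which is more flexible but adds an extra hop per query.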
BigTable isn't so different. It does do some awesome auto-splitting of tables, but when it comes down to it, it's much the same. There's a single META0 tablet that gets queried to find the location of the META1 tablet which knows the location of the actual data. And, in order not to overload the META0 tablet, one needs to cache the location that you get back.
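The metadata-lookup-plus-cache pattern described here might look like this (a hypothetical sketch, not BigTable's actual API):

```python
_location_cache: dict = {}

def lookup_location(key: str, meta_lookup) -> str:
    """Find which server holds `key`, caching the answer so the metadata
    server (the META0 tablet in BigTable's case) isn't hit on every request."""
    if key not in _location_cache:
        _location_cache[key] = meta_lookup(key)  # ask the meta tablet/db
    return _location_cache[key]
```

Without the cache, every query in the cluster would funnel through the single metadata server, which is exactly the overload the comment mentions.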
Oftentimes, the problem is that you have web programmers who wouldn't know a B-Tree from a linked list. All of these technologies exist because there are certain things that they're good at. However, what this article purports to know is that these NoSQL databases are just superior. In most ways, they're inferior. A key-value store has its place, but it's severely limiting: often you can't get by with just a key-value store because you need more querying power. Likewise, whether a column-oriented database is right for you depends more on how you want to access the data (by column or by row).
Oh, and really, learn SQL indexing and check whether your queries are doing full table scans. That's the root of a lot of problems. I mean, a good index should do lookups in log(n) time, which means that on 10,000 rows a query should take 1/769th the time; on a million rows a query should take 1/50,000th the time; on a billion rows it will take 1/33,333,333th the time of doing a full table scan of a table that size. Yeah, indexes make a huge difference, but they aren't magic either. They merely order the data in a certain way that makes it easy to pluck out certain rows since you don't have to look at every value. For example, dictionaries are alphabetized, so you don't have to look at every word. You just start jumping toward the area of the word you're looking for. Indices work the same way. Now imagine if the dictionary was unordered. You'd have to look at every single word to see if it was the word you were looking for.
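The dictionary analogy is just binary search over sorted data; a quick sketch of where those ratios come from:

```python
import math

def comparisons_linear(n: int) -> int:
    # Worst case for an unordered scan: look at every value.
    return n

def comparisons_binary(n: int) -> int:
    # Worst case over sorted (indexed) data: halve the range each step.
    return math.ceil(math.log2(n)) if n > 1 else 1

# On a million rows the indexed lookup needs ~20 comparisons
# instead of 1,000,000 - the ~1/50,000 ratio quoted above.
```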
Oh, and just saying "put an index on stuff" isn't all you need to know. It isn't too complicated, but there's a Google talk that goes over some of the mistakes one can make in assuming that indexes are doing what you want (http://www.youtube.com/watch?v=u70mkgDnDdU&feature=chann...). It's in the middle somewhere, but the whole thing is a good watch.
Really, is this the type of article we've become? I mean, column-oriented datastores and key-value datastores are both valuable technologies, but this is a NoSQL flame written by someone who doesn't know that Facebook is still MySQL-backed, uses Cassandra for only specific things, and uses many different technologies where they're good (like heavy use of memcached as a key-value store to reduce load on MySQL where queries aren't needed). Nor does the article acknowledge the drawbacks that the authors of these systems would talk about. For example, MongoDB's developers state that it lacks ACID integrity and that it's more suited for "high volume, low value data" - think of whether someone has upvoted or not; if that data gets lost it doesn't really matter, because the person can click the vote button again and all is well, or just because it isn't super important data.
Why are we supporting "we hate this technology just because"? Technologies are tools that have their places. It's foolish to write off a good technology just because you want to make fun of it or deride it. Use the right tool for the job. In your web application, you might find that using multiple tools in concert is the way to go.
If you click through to the slides, they are actually referring to latency. The slides say that for accesses of > 50GB, Cassandra's latency is 0.12ms, and MySQL's is around 300ms.
This, of course, has very little to do with the rate at which data is written.
(Yay for IT journals... quality information for PHBs...)
For cases like that you don't have one source of data; you do something like an AFTER INSERT trigger to update other flattened tables of the data that you need.
For instance, where you would normally perform a JOIN, you would instead update a JOIN table (or a VIEW maintained by incremental updates) right after each save.
So then you can filter by many other things, and space really doesn't limit you much from making those duplications.
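A minimal sketch of that flatten-on-write idea (table and column names are made up):

```python
# Two "source" tables and one flattened table kept in sync at write time.
authors = {9: "Mark Twain"}
articles = {}          # article_id -> (title, author_id)
articles_flat = {}     # article_id -> (title, author_name), pre-joined

def insert_article(article_id: int, title: str, author_id: int) -> None:
    articles[article_id] = (title, author_id)
    # The "AFTER INSERT trigger": update the denormalized copy too,
    # so reads never need a join.
    articles_flat[article_id] = (title, authors[author_id])
```

The trade-off is classic: every write does extra work and the data is duplicated, but reads become simple key lookups.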
It happens with large RDBMS systems anyway; after about 5GB of data the same type of scalability decisions have to be made.
But yes the article is off and you are right, most likely systems will approach this with multiple solutions for specific problems.
I work every day with a > 4TB database and it's all on one machine (a very big machine) with some very fast SAN. If you have money, Oracle can get very big on the right hardware. I am willing to bet MySQL or Postgres could do very well on that sort of hardware too.
More like 5TB these days.
At that point, even in RDBMS you have to stop with the joins and flatten. I have experienced these limits in Oracle and MSSQL around 20-30 million row tables.
Scalable fundamentals like database flattening, dimensional modeling, etc. are what key-value stores give you from the start. But really a good mix, or a project-specific choice, works best; I'm just saying that in the future, with TB and PB of data, the JOIN is a historic remnant.
It is possible one day that RDBMS will be seen as one of those evil optimizations that we made in our small relative worlds at the time.
It is possible one day that RDBMS will be seen as one of those evil optimizations that we made in our small relative worlds at the time.
I doubt that for the simple reason that relational databases are maths: the relational algebra and relational calculus. There's no such theoretical underpinning to object databases.
True, but the relational architecture just moves to the object or code level; if you think about it in relation to data size, it makes more sense there for the future.
The RDBMS was the brain, but it is also the storage; now code will dictate how to use the storage rather than the storage dictating it, and the storage is becoming a component.
The proper scaling issue is, of course, with the cpu and memory of your single DB server - and its disk speed - which can only be expanded so much, and at exponentially increasing cost. It's conceivable that even with a fixed amount of data, say 100G, you might have to confront scaling issues with increasing site usage, long before you reached anywhere near the limits of your raw disk space.
Seeing the number of people generalizing that to apply to all RDBMSs makes me think they don't have any DB knowledge outside MySQL.
And fyi I prefer PostgreSQL. But MySQL isn't THAT bad, c'mon.
MySQL may be able to perform better - as I haven't done in-depth tests I wouldn't know how far it can be stretched - but my point is that almost everyone complaining about the performance of the RDBMS model seems to come from MySQL, thinking that it represents the utmost limits of what relational databases can do, and that is quite sad.
MySQL is not that bad, no. With proper knowledge it can do a lot, and beyond its limits, capably configured, I am not sure any RDBMS can help. Facebook is not developing Cassandra because they can't afford Oracle.
That said, I think that calling the whole event NoSQL was kind of asking for it; of course people are going to try to squeeze some drama out of that stone. (I would have gone to the meetup anyway, though, if I hadn't already been booked.)
I would like to add one quick correction, or maybe clarification, to: "[Indexes] merely order the data in a certain way that makes it easy to pluck out certain rows since you don't have to look at every value."
Indexes don't always order the data; only clustered indexes order the data, and there can be only one clustered index on a table. You can have lots of nonclustered indexes that are merely pointers to data within the larger table.
These numbers have been taken from p.21 of the linked PDF by the Facebook engineer Avinash Lakshman where it shows this MySQL vs Cassandra comparison:
"MySQL > 50GB Data
Writes Average : ~300ms
Reads Average : ~350ms
Cassandra > 50GB Data
Writes Average : 0.12ms
Reads Average : 15ms"
Which I'm guessing actually means: with a database table which is more than 50GB total size they have measured individual row accesses at these speeds. Maybe Avinash has accidentally transposed the read and write figures - I can't think why the read would be slower than the write.
15ms to write a row to disk seems possible given that a fast modern disk has a latency of 2ms, a seek time of 4ms, and a sustained transfer rate of about 100MB/s. The track-to-track time is only 0.4ms, so maybe if you just wrote all the data to disk serial-log-style you could reconstruct from the log after a failure and handle all reads from memory. I don't know Cassandra. Obviously, from these figures, the disk couldn't do a row read in 0.12ms.
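A back-of-envelope check of that reasoning, using the drive figures from the comment (the 1KB row size is an assumption):

```python
# Assumed drive characteristics from the comment, plus a hypothetical 1KB row.
seek_ms = 4.0
rotational_latency_ms = 2.0
transfer_mb_per_s = 100.0
row_kb = 1.0

transfer_ms = row_kb / 1024 / transfer_mb_per_s * 1000  # ~0.01ms for 1KB

# A random access pays seek + rotational latency on top of the transfer...
random_access_ms = seek_ms + rotational_latency_ms + transfer_ms  # ~6ms
# ...while a serial log append mostly just pays the transfer, since the
# head is already positioned at the end of the log.
serial_append_ms = transfer_ms  # ~0.01ms
```

Which is why 0.12ms is plausible for a log-style write but not for a random disk read.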
You can (usually) make reads faster by throwing things like memcached at the problem. Writes are harder. So I think this is the right tradeoff for a modern system.
You build and maintain indexes, just like you do with an SQL db.
Yes you can, at least in CouchDB, just create a view that emits the author id and you're done.
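For example, a CouchDB design document with a by-author view could look roughly like this (field names like `author_id` are assumptions about the document schema):

```python
# A CouchDB view is a JavaScript map function stored in a design document.
# This one emits each article keyed by its author, so a query like
# /db/_design/articles/_view/by_author?key=9 returns author 9's articles.
design_doc = {
    "_id": "_design/articles",
    "views": {
        "by_author": {
            "map": "function(doc) { if (doc.author_id) emit(doc.author_id, doc._id); }"
        }
    },
}
```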
It is very good at what it does. But it is not good at, nor ever will it be good at, ad hoc queries like the GP describes. The second time you query that view it will be blindingly fast. The first time, however, it has to build the view index from scratch, which in a large-ish DB might well take hours.
You just can't assume you will have pretty fast "random access" queries like you would on MySQL or similar. Of course, it goes the other way as well, and there are many examples of views you can trivially do in Couch which would be prohibitively expensive in MySQL.
As always, you pays your money and you makes your choice.
Query on an un-indexed column and it'll take forever every time. (There's an Oracle database I deal with occasionally that takes 50 seconds to count 78 rows. No we can't add an index.)
All NoSQL (eewww) non-RDBMSs do is move the pain around. The efficiency and lookup times are (usually, mostly) orthogonal to the orientation of the data; all you can do is align your use case to hit as few pain points as possible.
I don't know what's going on with your Oracle install but even taking the small example of 1k rows, an unindexed ad hoc query in MySQL will return pretty quickly, well under a second on a decent machine/disk. That might be fine for, say, occasional use of a "reports" web page - and you don't need to then store and update an index. The same query on CouchDB will be at least 10 times slower, possibly making the page unusably slow, and if you want it to be usable you need to store the index - no choice.
But yeah, just "moving the pain around" is absolutely right. Ain't no silver bullets.
I just want to add that column stores are not in any way anti SQL. They don't change the data model at all. They're just a different implementation of the relational model, more suitable for analytics than row stores. For OLTP apps they're bad. Reconstructing one row isn't necessarily expensive (as column stores don't just store individual columns), but writing one row is expensive.
And as far as a lookup occurring in log(n) time: imagine a broadcast of "hey everyone, who has the row associated with 1030923?" Each node checks its bounds (in parallel) and then, if the key exists within those bounds, does a log(n) search. That's pretty reasonable.
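A toy sketch of that broadcast-then-search idea (the node layout and keys are invented):

```python
import bisect

# Each node holds a sorted slice of the keyspace.
nodes = [
    {"low": 0,         "high": 999_999,   "keys": [1030, 500_000]},
    {"low": 1_000_000, "high": 1_999_999, "keys": [1030923]},
]

def find(key: int):
    """'Broadcast' to every node; only a node whose bounds cover the key
    runs a log(n) binary search on its sorted key list."""
    for node in nodes:  # in a real system these checks run in parallel
        if node["low"] <= key <= node["high"]:
            i = bisect.bisect_left(node["keys"], key)
            if i < len(node["keys"]) and node["keys"][i] == key:
                return node
    return None
```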
In short, my prediction is that these things will grow way outside of key-value pairs and into a complete solution that scales very well (and is _incredibly_ redundant).
I imagine there will be some great improvements that come out, but I think we're past the point of any major data store revolutions unless something fundamentally changes - like quantum computing becoming a reality - and we all basically have to start over.