"First, scaling horizontally has little to do with the database engine itself - creating a transparent, consistent hash function is the easiest part."
That is just so incorrect that it's hard to take the rest of the post seriously.
What happens when you want to add a node to your cluster? What happens when a node goes down? What happens when you drop some packets between nodes? When one node has an unbalanced number of keys?
Then, answering each of these questions brings with it many more questions. For example, if the answer to "what happens when your node goes down" is to replicate some data to another node, then how do you deal with inconsistencies between different replicas of the data? What happens if the node you try to replicate to goes down? What if the node you try to replicate to has a different idea of what that data should be?
If a database is to be truly horizontally scalable, it will have an answer for all of these questions. Which has a lot to do with "the database engine itself."
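To make the first of those questions concrete, here is a minimal consistent-hash ring sketch in Python (names are illustrative, not from any particular database). The hash placement really is the easy part; notice how much the sketch leaves out:

    import bisect
    import hashlib

    def _hash(value: str) -> int:
        # Map an arbitrary string onto the ring; MD5 is fine for placement.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class HashRing:
        """Minimal consistent-hash ring (no virtual nodes, no replication)."""

        def __init__(self, nodes):
            self._ring = sorted((_hash(n), n) for n in nodes)

        def add_node(self, node):
            # Only keys between the new node and its predecessor are remapped;
            # everything else stays put. Actually *moving* that data, and
            # handling dead nodes and imbalance, is the hard part the hash
            # function alone doesn't solve.
            bisect.insort(self._ring, (_hash(node), node))

        def node_for(self, key):
            hashes = [h for h, _ in self._ring]
            i = bisect.bisect(hashes, _hash(key)) % len(self._ring)
            return self._ring[i][1]

    ring = HashRing(["node-a", "node-b", "node-c"])
    owner = ring.node_for("user:42")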
Replication is for redundancy, not horizontal scalability. Unless you mean scaling reads; replication won't scale your writes. For scaling writes you need sharding, not replication.
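A sketch of that distinction, assuming nothing beyond the standard library (shard and replica names are made up): every write for a key lands on exactly one shard, so adding shards adds write capacity, while a shard's replicas only add read capacity and redundancy.

    import hashlib
    import random

    SHARDS = ["shard-0", "shard-1", "shard-2"]    # illustrative names
    REPLICAS = {s: [f"{s}-replica-{i}" for i in range(2)] for s in SHARDS}

    def shard_for(key: str) -> str:
        # Writes for a key go to exactly one shard: more shards,
        # more write capacity.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return SHARDS[h % len(SHARDS)]

    def read_target(key: str) -> str:
        # Reads can be spread across a shard's replicas, but replicas
        # take no writes of their own: replication scales reads only.
        shard = shard_for(key)
        return random.choice([shard] + REPLICAS[shard])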
"In the computer science terminology, an O(N) algorithm is considered “naive”, and in the computer security terminology, it even has a name - “brute force”"
* The CAP theorem is a logical proof of the impossibility of providing both consistency and availability on a network which can lose messages (or using machines which can fail). You can't implement it any more than you can implement general relativity. It's a description of reality.
* The CAP theorem is not a data model which competes with or intersects at all with relational algebras. Rather, a relational algebra is the logical model (allegedly) underlying RDBMSes which are systems which historically provide consistency at the expense of availability in the presence of faults (thus obeying the CAP theorem because they're real systems and not opium dreams).
* Scaling horizontally does not imply anything about fault-tolerance. It instead describes systems in which resources can be incrementally added to incrementally gain capacity. It's possible to build a horizontally-scalable system which is less reliable than a single-machine system; it's also possible to build fantastically fault-tolerant systems which are also horizontally-scalable (cf. Dynamo). Doing the latter is considered "a good idea;" doing the former is considered "fucking daft."
* Nothing about horizontally-scalable systems (or NoSQL or really anything the author mentions except for Redis) requires that the entire dataset be kept in memory. Systems like Riak (http://riak.basho.com) or Voldemort (http://project-voldemort.com/) use pluggable storage engines, some of which (e.g. InnoStore and BDB-JE) have excellent performance with 1:10 RAM-to-dataset ratios. By the author's own metric, the Holy Grail has not only been found but the damn things are multiplying.
* Neither epoll nor kqueue "scale indefinitely in terms of I/O concurrency." Nothing does. That's horseshit.
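For concreteness, a bare-bones epoll loop via Python's select module (Linux-only; address and buffer sizes are arbitrary). Even here, every registered descriptor costs kernel state and every wakeup costs a syscall, which is why "indefinite" scaling is an overstatement:

    import select
    import socket

    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 8080))   # illustrative address
    srv.listen(512)
    srv.setblocking(False)

    ep = select.epoll()
    ep.register(srv.fileno(), select.EPOLLIN)
    conns = {}

    while True:
        # poll() is O(ready events), not O(watched fds) -- far better than
        # select() -- but kernel memory, cache pressure, and syscall
        # overhead still grow with connection count.
        for fd, events in ep.poll(timeout=1.0):
            if fd == srv.fileno():
                conn, _ = srv.accept()
                conn.setblocking(False)
                ep.register(conn.fileno(), select.EPOLLIN)
                conns[conn.fileno()] = conn
            elif events & select.EPOLLIN:
                data = conns[fd].recv(4096)
                if not data:
                    ep.unregister(fd)
                    conns.pop(fd).close()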
You're better off huffing glue than reading this thing. I don't even care what he has to say about Redis. He could have some incredible insights about it, but they'd be completely and totally negated by the incomprehension, misinformation, untruths, and general crazytalk which preceded it.
tl;dr: 15+ years of RDBMS experience gives you 0 clues about distributed computing; reading Time Cube (http://www.timecube.com/) is preferable to reading this drek.
Redis works great as long as the dataset fits in RAM. After that, the background saving process kicks in, and performance becomes an issue. This caused my company to move away from Redis to Mongo. It's foolish to assume that a product is over-engineered just because it goes beyond storing key/value pairs. It actually seems that no research was done outside of what Redis can do, given the portion of the article claiming that namespaces for keys are not inherent in NoSQL solutions. Check out Mongo's collections. That's exactly what they are.
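For reference, the contrast is simple; a sketch using the redis-py and pymongo client libraries (database and key names are illustrative):

    import redis
    from pymongo import MongoClient

    r = redis.Redis()
    # In Redis the "namespace" is just a convention baked into the key name:
    r.incr("stats:homepage:visits")

    # In MongoDB the namespace is a first-class collection:
    db = MongoClient().myapp    # illustrative database name
    db.stats.insert_one({"page": "homepage", "visits": 1})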
> Redis works great as long as the dataset fits in RAM
This is by design. Companies moving away from Redis because of this did not understand the deal from the beginning and were looking for something else, so it was a good idea to move away.
Redis is mostly an in-memory database that happens to be disk-backed. With VM it is a different issue, and there are interesting VM uses, but the vast majority of Redis users are using the DB without VM, as it is: an in-memory store where the disk dump is used to reload the data on startup.
Because this is the Redis way, even new developments are focused in this direction: using less memory for common data types, and scalable clustering in order to make it simple to use multiple instances.
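A sketch of that "disk dump used to reload on startup" model via redis-py (the snapshot schedule shown is Redis's historical default):

    import redis

    r = redis.Redis()
    # RDB snapshotting: fork a child, dump the whole in-memory dataset to
    # dump.rdb, and reload it on startup. It is a reload mechanism, not
    # a paging layer for datasets larger than RAM.
    r.config_set("save", "900 1 300 10 60 10000")  # snapshot after N changes in M seconds
    r.bgsave()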
I see this as a very simple thing to grasp. Since this is the argument of the discussion, I instead fail to see why Mongo should not just be considered an SQL-family DB. It seems more or less a subset of SQL, but implemented with different tradeoffs. For sure they have some good motivations to avoid SQL, but what I mean is that semantically it looks like a relational database, while Redis has a completely different data model, so I can't see how the two systems are really drop-in replacements for each other and/or comparable solutions.
So I can see how MongoDB can be an alternative to MySQL when used to store a lot of row-like data, much larger than memory (call it documents or whatever you want).
And I can see how Redis can be used when you have databases that fit in memory and need very high performance, and in general for all the needs of atomic data structures and complex server-side operations on these data structures. For instance storing or caching timelines, maintaining leaderboards for a game via sorted sets, and a zillion other use cases that are running right now while we are talking.
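The leaderboard case, for instance, is a couple of commands on a sorted set; a minimal sketch with redis-py (key and member names are illustrative):

    import redis

    r = redis.Redis()
    # ZADD keeps members ordered by score; updates are O(log N).
    r.zadd("game:leaderboard", {"alice": 4200, "bob": 3150, "carol": 5100})
    r.zincrby("game:leaderboard", 250, "bob")      # atomic server-side increment

    # Top 10 with scores, highest first -- the whole leaderboard query.
    top10 = r.zrevrange("game:leaderboard", 0, 9, withscores=True)
    rank = r.zrevrank("game:leaderboard", "alice") # alice's position, 0-based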
I can't see, instead, how MongoDB could replace Redis or the other way around, except for a very small subset of cases.
There are plenty of SQL and NoSQL applications that are designed to be disk-based databases. Redis is explicitly designed to be a very fast, in-memory database. What you are complaining about is a feature, not a bug.
If the guy needs access counters without an ever-growing disk file, he dismissed MongoDB too fast. The ability to repeatedly alter fixed size data without growing the storage is something MongoDB has, and no other database engine that I know of. (Their design has its own serious disadvantages, but still, if that's the behaviour you need...)
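A sketch of that fixed-size, update-in-place counter with pymongo (collection and field names are illustrative): as long as the document's shape never grows, $inc overwrites the existing value rather than appending to the storage file, which is the behaviour described above.

    from pymongo import MongoClient

    counters = MongoClient().myapp.counters    # illustrative names

    # Pre-create the document with its final shape; numeric fields have a
    # fixed on-disk size, so later $inc updates rewrite the value in place.
    counters.update_one({"_id": "page:home"},
                        {"$setOnInsert": {"visits": 0}},
                        upsert=True)

    # Each hit mutates the existing bytes; the file does not keep growing.
    counters.update_one({"_id": "page:home"}, {"$inc": {"visits": 1}})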
To save you one click: the real problem with web scale is the integration. This is the hard part, this is where things start to break, and this is where one can have fun too.