It's currently a research project for us and we have an intern from Uppsala University working on the HA features, including Paxos leader election and multi-master conflict resolution. See the multinode branch. Some people have added additional database backends, but I don't know of any serious production users yet.
I gave a talk on it at the 2012 Erlang Factory as it's entirely Erlang-based. Here's the talk - http://vimeo.com/42630498
Benefits over Redis: faster startup on big datasets, pluggable backends, and (eventually) multi-master and HA (with a different approach than Redis is currently taking with Sentinel).
Drawbacks: 3x to 10x slower on all operations. We've done no work to optimize.
Whether Couch has or doesn't have Map/Reduce is irrelevant, really. Couch Map/Reduce is a toy (just like Riak's) and is not to be trusted. We avoid it as much as possible.
Really? How so?
The only thing Riak doesn't have is a high-speed RAM-based data structure capability. But hey, Redis does that. ;-)
The only reason to do it is if you already have something written that uses Redis and you just want to build on that.
It's not a convenient document store. It's not relational, and it's only ACID in the sense that the single thread serializes all access. It doesn't scale worth a damn. But it's fast enough that this mostly doesn't matter.
So the question arises: why create something with none of Redis's advantages, and all of its flaws?
There are whole companies who make elaborate servers that speak simple protocols. Membase is essentially a much more elaborate, multi-server replicated database that speaks the Memcache protocol.
We started edis as an experiment to teach some developers Erlang. We had a use case that was quite interesting at the time, so it was a great way to teach Erlang while working through some interesting database scenarios.
Imagine you have a bunch of users come into a site and they'll hit the server for an hour and then go away for at least a week, maybe more, like the demographics around a popular TV show. Imagine you need to handle a HUGE number of users for that hour, but after that hour, you have less than 1/10th of 1% of those users during the week. Also, imagine that you are pretty happy with Redis's speed, but that you've outgrown what you can comfortably handle on one server and so you are happy to tolerate less speed for more space.
This is a really nice use case for a disk-backed Redis server.
It's not the only option, of course, but we didn't need to store documents and we didn't need ACIDity.
I haven't tried to design for exactly that, but my first shot at it would be independent pairs of app servers and redis instances, with redis used as an expiring cache, fronted by a load balancer with affinity, backed by centralized regular SQL. Something like MySQL has no trouble hanging onto a scadload of records it mostly doesn't touch, and you can scale the expiry time on redis down as it gets more heavily loaded, so only the hottest tip of the hot set is kept in RAM.
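The read-through-with-expiry pattern described above can be sketched roughly as below. Plain dicts and a callback stand in for Redis and the SQL backend so the sketch is self-contained; all names (`ReadThroughCache`, `ttl`, `load`) are illustrative, not any particular library's API:

```python
import time

class ReadThroughCache:
    """Read-through cache with per-entry expiry.

    In the setup described above, the cache would be Redis (SETEX/GET)
    and `load` a query against the central SQL store; a plain dict
    stands in here. Shrink `ttl` under load so only the hottest tip of
    the hot set stays in RAM.
    """
    def __init__(self, load, ttl=3600.0, clock=time.monotonic):
        self.load = load           # fallback loader, e.g. a SQL SELECT
        self.ttl = ttl             # expiry window, tunable under pressure
        self.clock = clock
        self._cache = {}           # key -> (expires_at, value)

    def get(self, key):
        entry = self._cache.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]                     # hot-set hit
        value = self.load(key)                  # miss: hit the backing store
        self._cache[key] = (now + self.ttl, value)
        return value
```

With load-balancer affinity, each app server only ever warms its own paired cache, so the expired long tail lives solely in SQL.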
Maybe a nice backend for an IndexedDB implementation, but for 'serious' storage, no way.
Every level adds an additional disk read to every query. For optimal performance you should have RAM greater than 10% of the size of your dataset, so the OS disk cache can operate effectively.
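To make the level count concrete, here's a back-of-the-envelope sketch. The constants approximate LevelDB's default geometry (each level roughly 10x the one below, starting around 10 MB); they are illustrative, not exact:

```python
# Rough LevelDB geometry: each level is ~10x the previous, starting
# around 10 MB at L1 (approximations of LevelDB's defaults).
def levels_for(db_bytes, l1_bytes=10_000_000, fanout=10):
    size, levels = l1_bytes, 1
    while size < db_bytes:
        size *= fanout
        levels += 1
    return levels

# A 100 GB database ends up with ~5 levels, so a key that has sunk to
# the bottom can cost several table reads when the OS cache is cold.
print(levels_for(10**11))  # -> 5
```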
LevelDB write rate initially seems amazing, since it's simply writing unsorted keys to an append-only file until the file hits 2MB or so. For bursty loads it feels great.
But the moment writes are sustained for longer than it can merge segments (say while doing a bulk load), per-write latency spikes appear (average op time jumps from <1ms to >30000ms for a single record), and eventually it'll get so far behind that all attempts to progress will hang entirely, waiting for the background compactor to free up room in the youngest generation. The effect seems to worsen exponentially with database size. To attempt to mitigate this, when LevelDB notices it's falling behind it begins sleeping for 0.1s every write.
It's especially easy to trigger on slow CPUs with spinning rust drives.
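The throttling behavior described above can be modeled with a toy simulation. The slowdown/stop thresholds (8 and 12 level-0 files) mirror LevelDB's defaults; everything else here (one L0 file per write, per-tick compaction rate) is a deliberate simplification for illustration:

```python
# Toy model of LevelDB's write throttle: writes land in level-0 files;
# a background compactor merges only `compact_rate` of them per tick.
# Past the slowdown threshold each write is delayed (the 0.1s sleep);
# past the stop threshold writes block on the compactor entirely.
SLOWDOWN_L0, STOP_L0 = 8, 12  # LevelDB's default trigger counts

def run(ticks, writes_per_tick, compact_rate):
    l0, delayed, stalled = 0, 0, 0
    for _ in range(ticks):
        for _ in range(writes_per_tick):
            if l0 >= STOP_L0:
                stalled += 1      # write hangs waiting for compaction
            elif l0 >= SLOWDOWN_L0:
                delayed += 1      # the sleep-per-write throttle kicks in
                l0 += 1
            else:
                l0 += 1
        l0 = max(0, l0 - compact_rate)
    return delayed, stalled

# Bursty load (writes < compaction): no delays, feels great.
# Sustained load (writes > compaction): L0 backs up and writes stall.
```

Run it with a write rate below the compaction rate and the counters stay at zero; push the write rate above it and delays, then full stalls, appear, which matches the bulk-load behavior described above.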
Isn't that a problem common to all SSTable-based databases?
Is LevelDB any worse than HBase or Cassandra in this area?
The only solution I've found is the Castle backend from Acunu, but there have been no updates since 2011, and it looks really heavyweight (with a kernel module and all that).
Been using Tokyo Cabinet for a long time now, and have repeatedly hit similar hangs. Shopping for a future datastore of this sort!
Tokyo Cabinet should behave similarly. Can you tell us a bit more about your setup?
We like the speed of these datastores as some of our algorithms proceed with millions of calls every few seconds or so, and we like that it is not remote.
This doesn't make much sense to me, as in Redis you have to design how you're going to store your data with its limitations and performance in mind. So why a drop-in replacement instead of redesigning your storage?
It looks like he already made his own (3yrs ago): https://github.com/antirez/Bigdis
> in Redis you have to design how you're going to store your data with its limitations and performance in mind.
Hmm, that's interesting. This is the very reason I've always loved Redis and I'd love a Redis-for-disk.
Isn't the entire point of Redis that it's meticulously written in C by some Italian guy, specifically designed for in-memory datasets? Protocol compatible is one thing but maybe name it something with a greater edit distance?
And I was wondering if anyone was using it - and what they thought…
Also the github repo link: https://github.com/inaka/edis
That's nice. That means it would be (relatively) easy to swap in LMDB to make it fast.
I do not quite remember if there were any outstanding "didn't quite finish this corner case" items, but I don't think there were. (If I looked at it I could tell, but I'm on my phone, and should be looking at other things right now anyway.)
(edit: To be clear, I mean in "normal usage". I almost certainly didn't think through issues involving, say, replication, to make certain the semantics were always sensible; my goal was "doesn't crash or break or corrupt".)
(I think the problem was mostly that the higher-level architecture I was putting this piece into underwent a slight change that made me think I might need something slightly different, and the result became "since I literally only use the APPEND command, I can probably come up with something that more directly solves my problem".)
If you have any questions, I'm on IRC (various networks, including freenode and irc.saurik.com as "saurik").
But I remember some time ago antirez was trying to implement the same idea (it was called "virtual memory"), but failed, mainly due to the complexity of the implementation.
By the way, what was the storage that satisfied your 'append only' use case at the end of the day? AFAIK plain LMDB could get performance on par with raw disk I/O in this case.
In essence, I wanted some kind of vector array implementation (std::vector, java.util.ArrayList, etc.) on a server, but in a situation where a very large number of the values would end up becoming "dead weight" at some point and, while I needed them to not be deleted, could be flushed to disk.
(At the higher level of my system I am able to shard the strings into which I'm appending so they don't themselves get so large that the eviction/recovery process becomes insane, but within windows of time I want near constant-time performance doing random indexing into the appended data.)
Redis, with the caveat that it has a needlessly and seemingly painfully wasteful format for its append-only durability solution, pretty much does this, but was RAM-only, so I set things up such that old keys could be evicted to LMDB and later could be seamlessly recovered when/if required.
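That evict-and-recover scheme can be sketched as a two-tier store. Byte-dicts stand in for the hot tier (Redis APPEND/GET/DEL) and the cold tier (LMDB puts/gets) so the sketch is self-contained; the class and method names are illustrative, not saurik's actual code:

```python
class TieredStore:
    """Append-mostly store with eviction of cold keys to a slower tier.

    In the architecture described above, `hot` would be Redis and
    `cold` an LMDB database; plain dicts of bytes stand in here.
    """
    def __init__(self, hot=None, cold=None):
        self.hot = {} if hot is None else hot
        self.cold = {} if cold is None else cold

    def append(self, key, chunk):
        self._promote(key)                     # fault back in if evicted
        self.hot[key] = self.hot.get(key, b"") + chunk

    def get(self, key):
        self._promote(key)
        return self.hot[key]

    def evict(self, key):
        # Flush a key that has gone quiet; reads seamlessly recover it.
        if key in self.hot:
            self.cold[key] = self.hot.pop(key)

    def _promote(self, key):
        if key not in self.hot and key in self.cold:
            self.hot[key] = self.cold.pop(key)
```

The nice property is that eviction is invisible to callers: a read or append on an evicted key just pays one cold-tier round trip to pull it back into RAM.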
I do not believe "plain LMDB" (without Redis) is a good solution to this problem, but I am willing to believe that I just don't understand how to use it very well (I never found any real "documentation", so I was reading all the comments in the code, looking at the API, checking the mailing list, and falling back on some guess/check; not many people use it).
(edit: Just realized this maybe didn't answer the question while re-reading our thread. Will continue with more focus.)
One thing that I was dissatisfied with here is that the durability of the eviction to LMDB is still only going to be on a single computer; for my purpose, once a key is evicted, its latency requirements drop dramatically, so I could potentially store it to something much slower (like S3).
Honestly, though, I don't quite remember what I decided was sub-optimal about this, because my brain is right now filled with other things that I don't quite want to fully context switch out for the purpose of trying to bring this project back to the front of my mind ;P. It worked, though.
(The other comment I made about "plain LMDB" was to address a question posed by snaky, where I was assuming either embedding it into my app server or having a thin wrapper; I was assuming the suggestion was that LMDB could handle "append to key" in a durable and fast manner without relying on redis to first buffer the data to be written en masse as larger chunks, which is pretty much what it is doing in this architecture I was designing.)
(BTW, I've never talked to you, but I knew one of your former students. Hello! ;P)
OK, that makes sense. Yah, it sounded like that to me too, but LMDB's Append mode is only usable when all the input is already in sorted order. Great for bulk loads of prepared data, not much help with randomly arriving input.
You can also catch us on irc #openldap if you want to ask questions about it and aren't posting to the openldap-technical mailing list. The py-lmdb guy has been on irc with us a lot lately, which is probably why his docs have so much info - he keeps asking questions and getting answers.
And I agree with you that all the relevant pieces of information are there; for me personally, as for saurik, the source code is more than enough, especially considering how clear and compact it is.
I just wish LMDB were more popular and widely adopted, because it's a really brilliant piece of software. And I think documentation is one of the main factors in that, that's all.
Anyway, thanks for the pointer. I'm thinking about the potential complexity of adding to LMDB the ability to use secondary indexes (separately stored, due to the careful tuning of the main storage scheme), so I'm sure I'll come to discuss it with you on IRC and the mailing list.
For secondary indexing, you should take a look at what slapd back-mdb does. I know that BDB supported 2ndary as a built in feature but we never used it because it was too slow. What we're doing now is already quite fast.
But may I suggest, considering SkyDB highly experimental, maybe porting it to a more flexible backend like Mapkeeper, which gives you for free not only easily switchable storage backends, but also (not switchable, but changeable with a little effort) network/API backends (currently Thrift), an event engine (currently libevent), etc. It would be nice to focus on the core SkyDB things (which I think are data organization, the API, stored procs, etc.) while "outsourcing" the non-core low-level jobs to things like Mapkeeper.
And by the way, an LMDB backend was merged into Mapkeeper not so long ago :-)
Sky originally started out as a project to learn how to write a database and programming language. The storage was originally a block-based mmap format and the query engine was built on LLVM. It was a great learning experience but I am trying to take a higher level approach by delegating storage to LevelDB (or possibly LMDB instead) and the query engine uses LuaJIT. Focusing on a higher level use case (i.e. behavioral analytics) has given some crazy performance gains and great scalability.
I think once Sky matures more, it would be more interesting to be able to swap out the backends. I worry about having too many moving pieces. I'd also be concerned about copying data with the extra layers. It looks like LMDB does zero-copy reads, which would be awesome.
I'm confused about Edis's current version being 1.0. Is it production ready, or is it just ignoring semantic versioning?