I will be interested to see how scripting Redis with Lua measures up to that solution. The separate daemon would have the overhead of protocol handling for each communication with Redis, but it could also be written in a compiled language, a language with better concurrency support, etc. It would also be trivially sandboxed from the Redis server.
Proof of this is what variadic push made possible. You can push something like 350k items per second per core just by using a single LPUSH with a few extra arguments per item.
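To see where that saving comes from, here is a minimal Python sketch of the wire-level difference (the RESP command framing is real Redis protocol; the key name and item payloads are made up for illustration, and no server is involved):

```python
def encode_command(*parts):
    """Encode a Redis command in the RESP protocol: *N, then $len/payload pairs."""
    out = b"*%d\r\n" % len(parts)
    for p in parts:
        p = p.encode() if isinstance(p, str) else p
        out += b"$%d\r\n%s\r\n" % (len(p), p)
    return out

items = ["item:%d" % i for i in range(1000)]  # hypothetical payload

# Variadic LPUSH: all 1000 items travel in one command, one round trip.
batched = encode_command("LPUSH", "mylist", *items)

# The pre-variadic way: one LPUSH (and one network round trip) per item.
one_by_one = [encode_command("LPUSH", "mylist", i) for i in items]
```

The per-item encoding overhead is small either way; what the variadic form removes is the 1000 network round trips, which is where most of the time goes.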
Disclaimer: I wrote AlchemyDB, embedded lua in redis a while back and have played w/ it constantly (it is robust and mind bogglingly flexible).
As for the speed difference between embedded Lua and a daemon sitting next to the server: Alchemy has its test suite in Lua, and some tests I run via an external client while others I run internally via embedded Lua.
The speed difference between a client/daemon and embedded Lua becomes VERY evident (10x faster) on large loops, where per-command I/O and TCP kernel time are saved ... but since Redis is a single-threaded server, scripts block the server for the duration of their execution, so it is dangerous for novice programmers.
All in all, if Redis embeds Lua correctly, it will really open up the project, and it adds very minimal bloat: Lua is tiny, and it is just one command :)
Diskstore is just an experiment for now. Likely we'll ship 2.4, which is an improved version of 2.2, then Redis 3.0, which is 2.4 plus cluster and other things. Later, if diskstore turns out to be good enough, we'll ship it too, but it is possible that we'll mark diskstore as "off topic".
About Redis Cluster: it is no longer in a private branch. It is in unstable, and you can even play with it; check the latest antirez.com blog posts for instructions. Currently I'm designing the second layer, that is, resharding and how master-slave nodes interact. It does not need much coding, but it requires getting the details right now that we have a base.
You'll see something about Redis Cluster soon.
About scripting, don't think that every blog post I write means I'll spend a lot of time on it. What will happen is that in the following weeks, at some point, I'll spend something like one or two mornings of work to get an alpha version with scripting, put it into a topic branch, and post a message on the mailing list along with a blog post. Everything else will wait for later.
I have the attitude of talking a lot, to the point that the development of Redis is almost completely a public process.
I also change my mind often, which I think is a good thing, as going forward without reconsidering what you are doing is not good. But this does not mean that the development of Redis is a complex path that goes forward and backward: since Redis 2.2 we have actually been trying to provide continuous updates, not just to features but to the quality of the implementation.
A more precise view of the actual development path can be had by looking at the commit log messages of the 2.4 and unstable branches.
As a happy Redis user, the VM is the only sore spot - it seems like it should be possible to get Redis-like performance on cached keys and still be able to store a long tail of data on the same server.
BUT I did not stop experimenting; we'll return to this, but in the form of diskstore and of on-disk allocators using mmap(), which is another thing I'm playing with.
As for clustering, Antirez also explained in the blog post that it will work as long as you use a single key per script.
Ilya Grigorik wrote a nice article about Tokyo+Lua back in the day: http://www.igvita.com/2009/07/13/extending-tokyo-cabinet-db-...
Love the idea of server-side code
"Redis will try to be smart enough to reuse an interpreter with the command defined."
Caching the scripts should be fairly simple. Maintain a table of functions, then when a new script comes in, take its CRC32 value. If said value is in the table, just lua_pcall the script with the arguments. Otherwise, lua_load the function, store it in the table, then call it. Also, since Redis is single-threaded, you should only need one lua_State per server.
How are you planning to represent Redis values in the scripts? Would you just represent them as Lua strings and tables, or would you wrap the Redis values in userdata and allow operations to be called on them directly using metatables?
Currently, I believe Redis has enough functionality to do a lot. I think it should be possible to build more query-like features on top of Redis and preserve more flexibility in terms of how the querying works. Then, you could decide how far you want to go with SQL-like functionality.
To completely close the gap, I'd want two more things:
1. Store multi-line scripts as Redis objects. Trying to cram a loop and several expressions into one line will not be easy to understand or maintain. I'd rather deal with the cluster consistency issues in the application than be limited to one-liners.
2. Ability to execute a script in a separate thread. If the script doesn't require isolation, execution shouldn't necessarily block other operations. Some scripts might take 250ms to run, which is too long to block the main thread.
Definitely agree about being able to use separate threads, though.
But as you point out, that can always be added later.
Your performance will be absolutely horrible compared to Redis. I can't see doing this for anything production-worthy. (Miniredis was made for a very specific use-case where the performance hit was fine--and even there, we've replaced it with the real Redis for the next version.)
However, the problem with this approach is that you get much lower performance and memory efficiency.
Not sure about memory efficiency. You can use __slots__ in Python or custom data structures (e.g. blist) if you have specific needs.
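For what it's worth, here is a quick sketch of what `__slots__` buys you (class names are made up): it trades the per-instance `__dict__` for fixed attribute slots, which adds up when you hold millions of small objects:

```python
class Plain:
    def __init__(self, key, value):
        self.key = key
        self.value = value

class Slotted:
    __slots__ = ("key", "value")  # no per-instance __dict__ is allocated

    def __init__(self, key, value):
        self.key = key
        self.value = value

p = Plain("k", "v")
s = Slotted("k", "v")
# p carries a full dict for its attributes; s stores them in fixed slots,
# so it is smaller and rejects any attribute outside __slots__.
```

The trade-off is flexibility: a slotted instance cannot grow new attributes at runtime, which is exactly the dynamism that makes plain Python objects memory-hungry.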
I have no idea about __slots__, just a feeling that there are too many underscores for it to work well ;) But in general, getting as memory efficient as Redis in a scripting language is very hard, for different (good) reasons.