

Redis and Scripting - antirez
http://antirez.com/post/redis-and-scripting.html

======
teaspoon
By saving bandwidth, do you mean saving network bandwidth between a Redis
server and a client on another machine? In that case, another solution would
be for the user to write her own daemon (in a language of her choosing) that
sits on the same machine as Redis, listens for "custom" commands from the
remote client, and carries them out by communicating with the Redis server
over a local socket.

I will be interested to see how scripting Redis with Lua measures up to that
solution. The separate daemon would have the overhead of protocol handling for
each communication with Redis, but it could also be written in a compiled
language, a language with better concurrency support, etc. It would also be
trivially sandboxed from the Redis server.

~~~
antirez
Teaspoon, this was definitely one of the ideas we had, but the performance
hit of doing the I/O with another process is almost as bad as having the
socket in between. The problem is not just the amount of bandwidth (though it
is _one_ of the problems when there are a massive number of clients; we have a
use case for this that I can't describe in detail). The problem is also that
I/O is where most of the time is spent, and that is a shame. With Lua
scripting we fix part of this problem, which lets us deliver more performance.

Proof of this is what was possible with the variable-argument-list push: you
can push something like 350k items per second per core just by using an LPUSH
with a few more arguments per item.

~~~
teaspoon
Makes sense to me. Do the Lua scripts get lower-level access to the data
structures, too? E.g., could I write a variable-argument-list LINSERT that
inserts M items in O(M+N) time rather than O(M*N)?
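
To make the complexity question concrete, here is a minimal Python sketch of a single-pass multi-insert. It is not Redis code: the function name `insert_after_pivots` and the `inserts` mapping (pivot value to new value) are hypothetical, and a plain Python list stands in for the Redis list. Calling a single-item insert M times rescans the list each time, which is O(M*N); building the result in one pass with a hash lookup per element is O(M+N).

```python
def insert_after_pivots(items, inserts):
    """Return a new list with inserts[pivot] placed after the first
    occurrence of each pivot. One pass over items: O(M + N).

    `inserts` maps pivot -> value to insert (an illustrative layout,
    not a Redis API). Pivots that never appear are ignored.
    """
    pending = dict(inserts)        # pivots still waiting for a match
    result = []
    for item in items:             # single scan over the N elements
        result.append(item)
        if item in pending:        # O(1) average-case hash lookup
            result.append(pending.pop(item))
    return result


merged = insert_after_pivots(["a", "b", "c", "b"], {"b": "x", "c": "y"})
print(merged)
```

Only the first occurrence of a pivot triggers an insert here, since a matched pivot is removed from `pending`.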

~~~
antirez
I don't think such low-level access will be allowed. It would be cool, but it
is a lot of work and would mean changing the scripting layer every time we
change the internals. I'll start with something much higher level than that.

------
julien
It does sound like an amazing idea, but I can't help but worry about the focus
of Redis. Cluster has been "in the pipes" for a long time, and it never seems
to get focus for more than a couple of weeks (I may be wrong, but that's how
it looks from outside). I just hope clustered Redis will eventually show up,
as well as disk-backed Redis, etc.

~~~
antirez
Hello Julien, the cluster and diskstore are completely different projects from
the point of view of priority. Redis Cluster is more or less all I'm doing
every day, but I stop from time to time to also focus on 2.4, since the
cluster release date is too far away (later this summer) to block everything
else in the meantime: we need to provide something to users while we develop
the cluster.

Diskstore is just an exercise for now. Likely we'll ship 2.4, which is an
improved version of 2.2, then Redis 3.0, which is 2.4 + cluster and other
things. Later, _if_ diskstore turns out to be good enough, we'll ship it, but
it is possible that we'll mark diskstore as "off topic".

About Redis Cluster, it is no longer in a private branch. It is in unstable,
and you can even play with it; check the latest antirez.com blog posts for
instructions. Currently I'm designing the second layer, that is, the
resharding and how master-slave nodes interact. It does not need much coding,
but it requires getting the details right, now that we have a base.

You'll see something about Redis Cluster soon.

About scripting, don't think that every blog post I write means I'll spend a
lot of time on it. What will happen is that at some point in the following
weeks I'll spend something like one or two mornings of work to get an alpha
version with scripting, put it into a topic branch, and post a message on the
mailing list and a blog post. Everything else will wait until later.

I have a habit of talking a lot, to the point that the development of Redis is
almost completely a public process. I also change my mind often, which I think
is a good thing, since going forward without reconsidering what you are doing
is not good. But this does not mean that the development of Redis is a complex
path that goes forward and backward. Since Redis 2.2 we have actually been
trying to provide continuous updates that are not about features but about the
quality of the implementation.

A more precise view of the actual development path can be seen looking at the
2.4 and unstable branches commit log messages.

~~~
reitzensteinm
I'm quite surprised that Diskstore isn't more concrete than that - were there
unforeseen problems with it? Or is the focus going to stay on in-memory
databases?

As a happy Redis user, the VM is the only sore spot - it seems like it should
be possible to get Redis-like performance on cached keys and still be able to
store a long tail of data on the same server.

~~~
antirez
The problem is exactly the same as with VM: I believe that disk will likely
suck _except_ for a specific workload, that is, an extremely biased working
set + mostly reads. But this is exactly the use case of on-disk DBs anyway,
which do a lot of work to perform well in this use case, so why should we also
enter this business? There should be space for everybody. I'll be very happy
if we do our work well, that is, the in-memory data structure server :)

BUT I did not stop experimenting. We'll return to this, but in the form of
diskstore and of on-disk allocators using mmap(), which is another thing I'm
playing with.

------
StavrosK
Oooh, this does sound like a good idea but beware feature bloat. I don't want
Redis to end up being a full RDBMS, and this is basically the equivalent of
stored procedures, especially if the scripts are stored as redis objects...

~~~
riffraff
I'd say in a way this could avoid feature bloat, in the sense that new
operations can be added as "frequently used snippets" instead of core
commands. I wonder how this plays with replication/clustering, though.

~~~
catwell
Avoiding feature bloat is indeed one of the goals of this move, as explained
in the blog post. Several users are requesting features in the Google Group,
others like me maintain forks. With user-defined procedures, hopefully we will
no longer need all this.

As for clustering, Antirez also explained in the blog post that it will work
as long as you use a single key per script.

------
rb2k_
The idea of lua integration reminds me of Tokyo Tyrant.

Ilya Grigorik wrote a nice article about Tokyo+Lua back in the day:
<http://www.igvita.com/2009/07/13/extending-tokyo-cabinet-db-with-lua/>

Love the idea of server-side code

------
LeafStorm
I am glad that Lua was chosen as the implementation language. It really is a
perfect fit for the problem.

"Redis will try to be smart enough to reuse an interpreter with the command
defined."

Caching the scripts should be fairly simple. Maintain a table of functions,
then when a new script comes in, take its CRC32 value. If said value is in the
table, just lua_pcall the script with the arguments. Otherwise, lua_load the
function, store it in the table, then call it. Also, since Redis is single-
threaded, you should only need one lua_State per server.
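
The caching scheme described above can be sketched roughly as follows. This is a Python stand-in, not Lua C API code: `compile`/`exec` play the role of `lua_load`, and a plain call plays the role of `lua_pcall`. The `ScriptCache` class, the `compilations` counter, and the convention that a script defines a function named `main` are all assumptions made for illustration. Keeping the source text next to the cached function is an addition to the scheme, since two different scripts can share a CRC32 value.

```python
import zlib


class ScriptCache:
    """Cache compiled scripts in a table keyed by the CRC32 of their source."""

    def __init__(self):
        self._functions = {}    # crc32 -> (source, compiled function)
        self.compilations = 0   # counts how often a script was compiled

    def call(self, source, *args):
        key = zlib.crc32(source.encode("utf-8"))
        entry = self._functions.get(key)
        # Recompile on a miss, or on a CRC32 collision with another script.
        if entry is None or entry[0] != source:
            namespace = {}
            exec(compile(source, "<script>", "exec"), namespace)
            entry = (source, namespace["main"])  # hypothetical convention
            self._functions[key] = entry
            self.compilations += 1
        return entry[1](*args)


cache = ScriptCache()
script = "def main(a, b):\n    return a + b\n"
first = cache.call(script, 1, 2)    # compiled on first use
second = cache.call(script, 3, 4)   # served from the table
```

A single cache instance per server mirrors the single `lua_State` suggested above, which only works because commands run one at a time.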

How are you planning to represent Redis values in the scripts? Would you just
represent them as Lua strings and tables, or would you wrap the Redis values
in userdata and allow operations to be called on them directly using
metatables?

------
gersh
Johm lets you use data types on top of Redis. I implemented the ability to
query multiple fields at <https://github.com/gersh/johm>. I've been
discussing ways to implement more complex queries on the Johm mailing list.

Currently, I believe Redis has enough functionality to do a lot. I think it
should be possible to build more query-like features on top of Redis, and
preserve more flexibility in terms of how the querying works. Then, you could
decide how far you want to go with SQL-like functionality.

------
mef
Very excited about this, especially since nginx also has robust embedded Lua
support through lua-nginx-module. Nginx+redis+lua is fast becoming my favorite
stack for frontend stuff.

------
panarky
Simple scripting will solve some very inefficient patterns, such as retrieving
a long list of values over the network, performing some operation on them in
the client, and storing them back to Redis.

To completely close the gap, I'd want two more things:

1\. Store multi-line scripts as Redis objects. Trying to cram a loop and
several expressions into one line will not be easy to understand or maintain.
I'd rather deal with the cluster consistency issues in the application than be
limited to one-liners.

2\. Ability to execute a script in a separate thread. If the script doesn't
require isolation, execution shouldn't necessarily block other operations.
Some scripts might take 250ms to run, which is too long to block the main
thread.

~~~
prospero
Where does it say scripts need to be single lines? The Redis protocol prepends
all arguments with a description of their byte-length, so the scripts should
be able to have as many newlines as you like.

Definitely agree about being able to use separate threads, though.
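
To illustrate the point about byte-length prefixes, here is a small Python sketch that encodes a command in the Redis multi-bulk request format. The helper name `encode_command` is made up for this example, and `EVAL` is used only as a placeholder name for the scripting command under discussion. Because each argument is framed by its length rather than by a line terminator, newlines inside the script are just payload bytes.

```python
def encode_command(*args):
    """Encode arguments as a Redis multi-bulk request.

    Layout: *<number of args>, then for each argument $<byte length>
    followed by the raw bytes, with every piece terminated by CRLF.
    """
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n" % len(data))   # length prefix
        parts.append(data + b"\r\n")           # payload may contain newlines
    return b"".join(parts)


script = "local v = 1\nreturn v"
request = encode_command("EVAL", script)   # "EVAL" is a placeholder name
print(request)
```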

~~~
antirez
Indeed the script can be multiple lines without problems. About threads, I
think we'll stay single threaded for scripting as well, since otherwise we run
into trouble: Lua scripts would no longer be atomic from the point of view of
the caller. You have to take care not to write commands that do complex stuff,
or at least be aware that these commands do complex stuff :) In Redis there is
a tendency to implement a simple, raw idea and make the user aware that it is
easy to shoot yourself in the foot, instead of making things more complex to
avoid problems, and it is probably a good idea to follow this path for
scripting as well. After all, there is always time to make it more complex.

~~~
btilly
If you stay single threaded, I would strongly recommend having a yield
command, which would just exit the script, do other things, and then come back
and execute the script again with state intact.

~~~
LeafStorm
That could be implemented fairly easily using coroutines. Redis could start
the script in a coroutine, then if it yields, return whatever values it
yielded back to the user and schedule the coroutine to be finished later. When
Redis has nothing to do, it could go back and restart the coroutine, then
discard it once it finally returns. (Though of course the user would have no
way of getting whatever values it yielded after the command returned - it
would have to communicate by storing a key somewhere, or possibly with
publish/subscribe.)
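
A rough sketch of that scheduling idea, using Python generators in place of Lua coroutines. Everything here is hypothetical: the `Server` class, its `run_script` and `idle` methods, and the example script are illustrations, not a proposed Redis design. The first yielded value is returned to the caller, and the rest of the script runs one step at a time whenever the server is idle, leaving its result in a key.

```python
from collections import deque


class Server:
    def __init__(self):
        self.store = {}           # stand-in for the keyspace
        self.suspended = deque()  # scripts waiting to be resumed

    def run_script(self, script):
        """Start a script; reply with its first yielded (or returned) value."""
        co = script(self.store)
        try:
            reply = next(co)
        except StopIteration as done:
            return done.value     # script finished without yielding
        self.suspended.append(co)
        return reply

    def idle(self):
        """Resume one suspended script for one step. Returns False if none."""
        if not self.suspended:
            return False
        co = self.suspended.popleft()
        try:
            next(co)
            self.suspended.append(co)  # yielded again, keep it scheduled
        except StopIteration:
            pass                       # finished, discard the coroutine
        return True


def sum_in_chunks(store):
    """Example script: sums 0..5 in three steps, yielding between steps."""
    total = 0
    for start in range(0, 6, 2):
        total += sum(range(start, start + 2))
        yield "working"
    store["result"] = total            # result is communicated via a key


server = Server()
reply = server.run_script(sum_in_chunks)
while server.idle():
    pass
```

Between two calls to `idle` the server is free to handle other commands, so a script that yields gives up atomicity.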

~~~
LeafStorm
On second thought, doing this automatically might be a waste of time for most
one-shot commands. Perhaps a separate COEVAL command would work better for
this.

~~~
btilly
That sounds right to me. Only a few commands would need this facility. But
when you need it, you need it. A long-running script should _NOT_ lock up
Redis indefinitely.

------
ntoshev
I've had an alternative Redis-like design for a while... What if you have a
Python process that keeps the data in dicts and lists and Python objects as
usual? These data structures get persisted to disk by forking and pickling the
data as a snapshot, while the main process continues to serve requests. For a
small server, the Python process can be the single-threaded web server itself
(e.g. using the Tornado web server).
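
The fork-and-pickle snapshot described above might look roughly like this on a Unix system. It is a minimal sketch: the function names `snapshot` and `load` and the temporary-file naming are assumptions, and error handling is reduced to an exit status. The child process sees the data as it was at the moment of the fork, so the parent can keep mutating it while the snapshot is written.

```python
import os
import pickle


def snapshot(data, path):
    """Fork and let the child pickle `data` to `path`. Unix only.

    Returns the child's pid so the parent can reap it later with
    os.waitpid while it keeps serving requests.
    """
    pid = os.fork()
    if pid == 0:
        # Child: works on its own copy of memory as of the fork.
        status = 1
        try:
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump(data, f)
            os.replace(tmp, path)   # publish the finished file atomically
            status = 0
        finally:
            os._exit(status)        # never fall back into the parent's code
    return pid


def load(path):
    """Read a snapshot back, for example at startup."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Writing to a temporary file and renaming it means a reader never sees a half-written snapshot.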

~~~
gecko
I did that; it was called miniredis: <https://github.com/bpollack/miniredis>

Your performance will be absolutely horrible compared to Redis. I can't see
doing this for anything production-worthy. (Miniredis was made for a _very_
specific use-case where the performance hit was fine--and even there, we've
replaced it with the real Redis for the next version.)

~~~
ntoshev
I didn't exactly mean a Redis server implemented in Python instead of C... It
could be that, but speaking an application-specific DSL instead of a generic
data-structure DSL.

