
Redis re-implemented in Rust - wmwragg
https://github.com/seppo0010/rsedis
======
For me, Redis was the first software written in C that I could easily
customize with additional features (as I have little C knowledge). It was written
beautifully. I've been learning Rust, and I certainly find learning Rust easier
than C, and I like the fact that I can dive into software written in Rust
without worrying about GC. You've done the best of both worlds for me by
writing Redis in Rust. With that said, I'm still having an easier time reading
Redis in C than your code, as it lacks comments and well-named
function/variable names. I admire your work nonetheless.

~~~
seppo0010
That's a fair criticism, and well taken, but keep in mind I'm learning the
language as I go, and most of the commits are just rewriting things because
they were suboptimal, not idiomatic, or hard to read. At this stage, I would
consider adding comments wasteful.

I also have no intention of making this project live as long or have as many
users as Redis does.

~~~
illumen
Learning to write readable code is a good thing, and I think it's worth the effort.

Can rust be readable?

~~~
seppo0010
If you want good examples of Rust code, you should probably look into other
repositories. Today I was checking out how Rust's HashMap works and I found it
quite readable:

[https://github.com/rust-lang/rust/blob/master/src/libstd/col...](https://github.com/rust-lang/rust/blob/master/src/libstd/collections/hash/map.rs#L1115)

You can also look at `mio`, the asynchronous IO library that's popular in
Rust:

[https://github.com/carllerche/mio/blob/master/src/notify.rs#...](https://github.com/carllerche/mio/blob/master/src/notify.rs#L148)

~~~
anonfunction
I would hardly say this is readable:

    
    
        fn search_entry_hashed<'a, K: Eq, V>(table: &'a mut RawTable<K,V>, hash: SafeHash, k: K)
            -> Entry<'a, K, V>

~~~
pygy_
I don't know Rust, and the only thing opaque to me in that line is the `'a`
syntax. I suppose it is related to memory safety.

In plain English:

`search_entry_hashed` is a parametric function. It works in a type safe way on
any kind of K and V. It takes as arguments a reference to a mutable hash
table, a hash function and a key, and returns an `Entry` object whose precise
type depends on the type of the parameters.

You could hardly express that more succinctly. You can't understand it by
skimming it, but it is a complex definition.

~~~
Tuna-Fish
Your guess about `'a` is correct. Here it signifies that the return
value can only be safely used while `table` exists and is not being
concurrently modified. This implies that the value is not copied on return;
only a reference into the table is returned.
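
A minimal sketch of the same idea, with hypothetical names and the standard `HashMap` standing in for the internal `RawTable`: the lifetime `'a` ties the returned reference to the borrow of the table, so the compiler rejects any mutation while that reference is still in use.

```rust
use std::collections::HashMap;

// Simplified analogue of the signature under discussion: the returned
// reference borrows `table` for lifetime 'a, so the table cannot be
// mutated (or dropped) while the reference is alive.
fn search_entry<'a, K: Eq + std::hash::Hash, V>(table: &'a HashMap<K, V>, k: &K) -> Option<&'a V> {
    table.get(k)
}

fn main() {
    let mut table = HashMap::new();
    table.insert("key", 42);

    let entry = search_entry(&table, &"key");
    assert_eq!(entry, Some(&42));

    // Uncommenting the next line fails to compile: `table` is still
    // immutably borrowed by `entry`, which is used again below.
    // table.insert("other", 1);

    println!("{:?}", entry);
}
```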

~~~
pygy_
I'm surprised that RawTable isn't parametrized on the type of hash function.
Something like this (possibly wrong way to express it, but you get the idea)

    
    
        fn search_entry_hashed<'a, K: Eq, V, H: SafeHash>(table: &'a mut RawTable<K,V,H>, hash: H, k: K)
            -> Entry<'a, K, V>

~~~
Jweb_Guru
`RawTable` doesn't have to be, because the hash has already been
computed by that point. The actual hash table (`HashMap`) is.

------
thedufer
Before anyone starts using this as a Redis replacement on Windows as the
readme suggests, take a look at the TODO file. Notable missing features
include:

\- maxmemory key eviction

\- hash values

\- ~2/3 of the set operators

\- multi/exec

\- lua scripting

This is an interesting and potentially useful effort, but a replacement for
Redis it is not.

~~~
shankun
By the way, if you are looking for a production-quality Windows port of Redis,
there is a fork available at
[https://github.com/MSOpenTech/redis](https://github.com/MSOpenTech/redis). We
(Microsoft) provide it in production as Azure's cache service today, and are
committed to continuing to work on it.

~~~
ddlutz
Are you part of the Azure cache service? I was an intern on the Edge Caching
and Storage team and am joining back full-time next month. If things are the same
way they were, it would be worth exploring using Redis for our cache, and I'd
like to talk details.

~~~
shankun
I am not, but I work with them closely. If you'd like to talk to the team
involved, send me (shankun_at_microsoft_com) your email and I'll connect you
up!

------
kibwen
Since this seems to be just a learning project, note as well that there exist
Rust bindings to Redis itself, from Armin Ronacher (though I'm not sure if
they've yet been updated to work on 1.0): [https://github.com/mitsuhiko/redis-rs](https://github.com/mitsuhiko/redis-rs)

~~~
the_mitsuhiko
Yep, works with 1.0.

------
wyaeld
The readme says it's a learning project.

It's a very interesting piece of work though.

I'll be interested to see Antirez's view on the trade-offs between C and Rust
for this.

~~~
seppo0010
He is at least curious.
[https://twitter.com/antirez/status/611189939519229952](https://twitter.com/antirez/status/611189939519229952)

------
unfamiliar
Could somebody give me a tl;dr on Redis? I keep hearing about it but from the
summary I can't tell what kind of applications it is being used for.

~~~
aaggarwal
From their official GitHub page: Redis is an in-memory database that persists
on disk. The data model is key-value, but many different kinds of values are
supported: Strings, Lists, Sets, Sorted Sets, Hashes, HyperLogLogs, Bitmaps.

It simply means that the key-value store is loaded directly into memory
(RAM) and is available for fast access, but the data is retained (persisted)
even after the application is closed.

It is usually used as a cache store, or for queuing messages between
different processes, local or distributed.

~~~
rakoo
To add to that, Redis is a TCP server, so you can speak to it from multiple
processes on multiple machines (and Redis will easily support huge loads).

Its data structures cover a good part of what you'd need from generic data
structures, which makes Redis an easy way to handle the logic of, say, the
intersection of friend lists common to multiple people, or a sorted set of
goods ranked by their amount, all of it shared with other processes.

Redis also offers pubsub capabilities in two forms:

\- A standard PUB/SUB couple which does what you think it does

\- Blocking pop on a list for a client, and a push for another client, which
will "wake up" the first one with the value.

It's a very versatile Swiss Army knife.
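
The blocking-pop pattern described above (one client blocked on BLPOP, another doing LPUSH) behaves like a blocking queue; here is a minimal in-process analogue using Rust's standard channels (no Redis involved, names are illustrative):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// One "client" blocks waiting for a value (like BLPOP); another pushes a
// value (like LPUSH), which wakes the blocked consumer with that value.
fn blocking_pop_demo() -> &'static str {
    let (tx, rx) = mpsc::channel();

    // Consumer thread: blocks until something is pushed.
    let consumer = thread::spawn(move || rx.recv().unwrap());

    // Producer: pushes after a delay, waking the consumer.
    thread::sleep(Duration::from_millis(50));
    tx.send("job-1").unwrap();

    consumer.join().unwrap()
}

fn main() {
    assert_eq!(blocking_pop_demo(), "job-1");
}
```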

------
shmerl
I was just thinking that Rust is a great candidate for big-data processing
tools. So much more so than Java (which is annoyingly used a lot there).
Something like Spark or HDFS should be implemented in Rust.

~~~
pron
The more cores you have and the more RAM, the bigger advantage GC has. The
thing with having lots of RAM is that it's very hard to take advantage of it
with on-stack data (which can, at most, use about 1-2% of the total RAM
available -- do the math) and thread-local heaps. Once you use thread-local
heaps/arenas, you need to shard your data. Any cross-shard access would mean
locking, which doesn't scale very well. That's exactly where GCs shine: they
let you have scalable, concurrent access to data with arbitrary lifespan.
That's why Java is used for those kinds of applications -- it performs and
scales much better than Rust can hope to on large machines.

You are right, though, that if the processing is extremely "batchy" and all
data dies at the same time, then it doesn't make a difference.

~~~
shmerl
_> That's why Java is used for those kinds of applications_

I'm not convinced that's the reason why Java is used for it. There are native
alternatives like HPCC which claim to perform better.

As was noted, concurrent access to shared data is not very common in
such distributed-computation scenarios. Well-designed processing will avoid it,
and thus will avoid the need for locking as well.

~~~
pron
> There are native alternatives like HPCC which claim to perform better.

The goal with performance is almost never to get the maximum possible
performance, but the best performance/effort ratio for your needs. This is true
even for performance-critical applications. As there are diminishing returns,
every increase in performance costs more than the previous one. Very, very few
applications are willing to invest an extra 30-50% effort to go from 98% (of
maximum performance) to 100%.

As to concurrent access -- see my other comment (depends on use-case).

------
r0naa
Impressive!

Could someone (or OP) elaborate on the value that re-implementing a whole piece
of software in a new language provides, compared to just building an interface
"bridging" both worlds?

To clarify, my metric for "value" is usefulness to other people. That is,
without considering the (interesting) learning opportunity it represents
for the author.

For example, someone developed a Python interface to the Stanford CoreNLP
library (written in Java). Would re-writing the CoreNLP library in Python be
useful to the community? How do you figure out what people's needs are?

I am asking because, while I think it would be a ton of fun and would let me
learn a lot, I also value building useful software, and re-writing a whole
system sounds like overkill except for a few very niche cases.

And if I am not mistaken, you would need a team at least as large as the parent
project's to implement new features, fix bugs and keep pace with it. Looking
forward to hearing what HNers think!

edit: clarified ambiguities

~~~
themckman
The README answers this:

    
    
      To learn Rust.
    

Edit: It also mentions not being tied to UNIX and appears to claim it will run
on Windows. That's certainly something.

~~~
r0naa
Sorry if I wasn't clear, but I am looking for a more general answer! I would
like to know in which cases it is useful (to other people) and discuss its
value compared to writing interfaces to other languages.

~~~
seppo0010
I had no intention of making the end result useful, but I ran into interesting
problems.

First, I wanted to make it as pure Rust as possible. I tried to avoid
UNIX-specific code, and since there is no Rust library with Windows support for
asynchronous IO, I was pushed into spawning a thread per client and blocking
while waiting for client data. I quickly noticed that the benchmark was way below
Redis (around 60% of its ops/sec with 50 clients). But then someone pointed out to
me[1] that I was running tests on a machine with two cores, and this approach
may actually be better on machines with more cores[2]. I have yet to try it out
and benchmark the results.

So far, Rust's API has been disappointing for network operations. For example,
`TcpStream.read()`[3] and `TcpListener.incoming()`[4] do not have a timeout.
Maybe because its development is driven by Servo and not by servers.

I have thought about doubling down on multithreading and, instead of the global
database lock rsedis is using now, having one per key (or some other
arbitrary partition), allowing concurrent operations, which is hard to do
safely in C. But I have not gotten there yet.

[1]
[https://github.com/jonhoo/rucache/issues/2](https://github.com/jonhoo/rucache/issues/2)

[2] [https://github.com/jonhoo/volley/](https://github.com/jonhoo/volley/)

[3] [http://doc.rust-lang.org/1.0.0-beta/std/net/struct.TcpStream...](http://doc.rust-lang.org/1.0.0-beta/std/net/struct.TcpStream.html)

[4] [http://doc.rust-lang.org/1.0.0-beta/std/net/struct.TcpListen...](http://doc.rust-lang.org/1.0.0-beta/std/net/struct.TcpListener.html)
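
The per-partition locking idea can be sketched like this (a hypothetical illustration, not rsedis code): keys are hashed to one of N independently locked shards, so operations on different shards proceed concurrently instead of serializing on one global lock.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

const SHARDS: usize = 16;

// Sketch of "one lock per partition": each shard is its own map behind
// its own Mutex, picked by hashing the key.
struct ShardedDb {
    shards: Vec<Mutex<HashMap<String, String>>>,
}

impl ShardedDb {
    fn new() -> Self {
        ShardedDb {
            shards: (0..SHARDS).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    // Hash the key to choose which shard (and which lock) it lives in.
    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, String>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[(h.finish() as usize) % SHARDS]
    }

    fn set(&self, key: &str, val: &str) {
        self.shard_for(key).lock().unwrap().insert(key.to_string(), val.to_string());
    }

    fn get(&self, key: &str) -> Option<String> {
        self.shard_for(key).lock().unwrap().get(key).cloned()
    }
}

fn main() {
    let db = ShardedDb::new();
    db.set("foo", "bar");
    assert_eq!(db.get("foo"), Some("bar".to_string()));
    assert_eq!(db.get("missing"), None);
    println!("ok");
}
```

Only operations whose keys land on the same shard contend for a lock; the trade-off is that multi-key commands spanning shards need more careful lock ordering.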

~~~
steveklabnik

        > Maybe because its development is driven by Servo
        > and not by servers.
    

This is not true at all. If Rust's development were determined by Servo, we
would have kept green threads and implemented struct inheritance by now.

The reason timeouts were dropped is in the IO/OS reform RFC:
[https://github.com/rust-lang/rfcs/blob/8fa971a670f9b9bc30f31...](https://github.com/rust-lang/rfcs/blob/8fa971a670f9b9bc30f31bed30b9c3b679ea1ad3/text/0517-io-os-reform.md#tcp)

    
    
        > set_timeout has been removed for now (as well as other
        > timeout-related functions). It is likely that this may
        > come back soon as a binding to setsockopt to the
        > SO_RCVTIMEO and SO_SNDTIMEO options. This RFC does not
        > currently propose adding them just yet, however.
    

And on UDP:

    
    
        > All timeout support is removed. This may come back in
        > the form of setsockopt (as with TCP streams) or with 
        > a more general implementation of select.
    

I'm on shaky wifi and my phone, so I can't find a citation for this, but I
also believe it was removed due to 1) Rust not having any stable
representation of time and 2) needing to shim certain behaviors on some
platforms, which we decided wouldn't happen in the first round of stable
interfaces.

That said, the lack here certainly hurts, and we did manage to stabilize
Duration, paving the way for timeouts to return.

EDIT: Oh! I forgot that [https://github.com/rust-lang/rfcs/pull/1047](https://github.com/rust-lang/rfcs/pull/1047) got merged
recently. [https://github.com/rust-lang/rust/issues/25818](https://github.com/rust-lang/rust/issues/25818)
implemented it. [http://doc.rust-lang.org/nightly/std/net/struct.TcpStream.ht...](http://doc.rust-lang.org/nightly/std/net/struct.TcpStream.html#method.read_timeout) shows it
implemented on nightly, so you can actually even do timeouts today, just not
on the stable channel.
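
For readers on a current toolchain: the API referenced above has since stabilized as `TcpStream::set_read_timeout`. A minimal sketch of using it (the local listener exists only to manufacture a read that never completes):

```rust
use std::io::Read;
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

// Connect to a local listener that accepts but never writes, so the read
// blocks until the configured timeout fires; return the resulting error kind.
fn read_error_kind_after_timeout() -> std::io::ErrorKind {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    thread::spawn(move || {
        // Hold the accepted connection open so the client's read blocks
        // instead of seeing EOF.
        let _conn = listener.accept().unwrap();
        thread::sleep(Duration::from_secs(2));
    });

    let mut stream = TcpStream::connect(addr).unwrap();
    stream.set_read_timeout(Some(Duration::from_millis(100))).unwrap();

    let mut buf = [0u8; 16];
    stream.read(&mut buf).unwrap_err().kind()
}

fn main() {
    let kind = read_error_kind_after_timeout();
    // The timeout surfaces as WouldBlock on Unix and TimedOut on Windows.
    assert!(kind == std::io::ErrorKind::WouldBlock || kind == std::io::ErrorKind::TimedOut);
    println!("read timed out with {:?}", kind);
}
```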

~~~
seppo0010
Thanks for the clarification. I wanted to subscribe to Rust's internals
debates and proposals, but I was not sure how to find them. Should I be
looking at [https://github.com/rust-lang/rfcs](https://github.com/rust-lang/rfcs), or is there anywhere else?

~~~
steveklabnik
There's a few stages:

1\. We keep open issues on that repo to track ideas.

2\. At some point, someone may decide to formally propose an idea. They may or
may not post a "pre-RFC" to internals.rust-lang.org to get early feedback.

3\. An actual RFC will get filed at that repo as a PR.

4\. The relevant sub team will check it out and comment, and at some point,
consensus will be reached either way.

5\. The RFC will go to 'last call' for a week, making sure all voices have
been heard.

6\. Assuming nothing in last call blocks moving forward, the RFC will be
accepted.

7\. The RFC will be implemented.

So, in short, yes, subscribing to that repo will notify you of the relevant
discussions.

------
resca79
I like this kind of project. But the use case of Redis is a bit extreme: I
mean that the main features of Redis are its speed and the way memory
consumption is handled. If those requirements are not satisfied, it is only a
very good way to learn Rust (as is the author's goal) and Redis internals.

------
GeertVL
So how do you re-implement something like Redis in another language? Is it
more of a translation job, or do you start by splitting out the concepts and
trying to implement them? Or do you take the idea and go your own way with the
implementation?

------
sudhirj
I'm trying the same thing for similar reasons in Go, but I'm wondering if at some
point a Go version would perform better than C. On a machine with a large
number of cores, perhaps?

GitHub.com/sudhirj/restis

Also wondering if some rethinking is possible - would an HTTP interface a la
DynamoDB be more useful? Could complexity be reduced and performance increased
by using a purely in-memory backend with no disk persistence? If there were
pluggable back ends, would a Postgres or DynamoDB back end be more useful for
terabytes/petabytes of data? Is the beauty of Redis the API or the implementation?

~~~
endymi0n
> but I'm wondering if at some point a Go version would perform better than C.

The answer is "no" with a certain amount of probability. Redis isn't single
threaded by lack of capability, but by design. Concurrency for multiple CPUs
will actually _slow down_ a lot of the stuff you see, as you will need to
introduce locking mechanisms.

Also, memory management is highly tuned and customized in Redis for the use
case of an in-memory DB (in stark contrast to the usual allocation patterns of
an application), to the point where it's almost impossible to replicate the
performance in a garbage-collected language.

I love Go and we're a 100% Go (and Angular) shop, but for an in-memory DB it
wouldn't be a sane choice.

------
beyondcompute
Spectacular! Could you add synchronous replication though? And coalescing of
queries (so that the entire system processes queries in batches, say, 300 times
per second)?

------
vicpara
Why would someone do that? To what end? Why isn't anyone re-writing Redis in
assembler to have it kick ass like the pros? Can you write Windows in Rust?

~~~
derefr
What I'm personally really surprised about is that nobody's rewriting Redis as
a unikernel to clear away all the OS context-switching/networking overhead
from its basic operations.

~~~
Jweb_Guru
Redis is already leaving plenty of performance on the table, e.g. by not
having any concurrent shared memory data structures (the fastest concurrent
hash tables achieve better throughput even on inserts than the fastest single-
threaded ones). It does this in the name of implementation simplicity. People
focused on implementation simplicity don't generally abandon the operating
system.

~~~
derefr
People already run Redis mostly as Linux-in-a-VM (with Redis being the only
"user-serving" process), though, no? I would think Redis-as-the-entire-VM
would be less to think about, operations-wise, at least if your cloud or data
center templates its VMs with something like EC2 AMIs. You would just launch a
"Redis appliance" AMI and move on.

It would feel less like maintaining boxes and more like paying a
Redis-as-a-Service provider.

------
clu3
Man, you should have named it Rudis.

------
vamitrou
Is it compatible with the .rdb redis dumps?

------
ahmetmsft
Care to post details about this? Is it actually fast? Does it implement all
the features and guarantees of Redis? Should anybody actually use this in
production (maybe because it works on Windows)? Is it well tested?

It looks like a really cool effort, but authors of open-source projects often
think people will read the code and figure it all out; the truth is that people
usually look at what's in the readme, and that's all the attention span most
are going to give it. My 2c: improve your README.md.

~~~
detaro
He links a list of missing stuff in the readme.

And if you read "Why? To learn rust" and ask "should I use this in
production"...

