
Only one thread can access the data at any given time, so it seems like most of the things you'd expect to be guaranteed by a single thread still are. I found this comment particularly interesting:

   Unlike most databases the core data structure is the
   fastest part of the system. Most of the query time
   comes from parsing the REPL protocol and copying data
   to/from the network.
I wonder if anyone in the Redis ecosphere has explored a binary client-server protocol, something that could be parsed/compiled on the client and then executed without parsing on the server. If the above is really true, it seems like that might offer even more perf gain than multithreading on the server.
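
(For context, the existing text protocol already length-prefixes every argument, so the "parsing" cost is mostly turning those strings into a command dispatch rather than tokenizing free-form text. A simple SET looks roughly like this on the wire; the key/value here are just illustrative.)

    *3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n     (request: an array of 3 bulk strings)
    +OK\r\n                                           (reply)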



From having played with/worked on profiling and optimizing Redis in the 2.6 timeframe, I can confirm that, at least for small/simple operations, this is true: the data structure access is a small fraction of the cost.

One related choice that Redis makes (or made at the time) is to rely extremely heavily on the malloc implementation, rather than doing work to manage its memory internally. Even a very trivial, naive free list provided a modest speed-up, for example.
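
By "naive free list" I mean something roughly like this (a hypothetical sketch, not the actual experiment): keep freed fixed-size blocks on a singly linked list and hand them back out before going to malloc again.

    #include <stdlib.h>

    /* Hypothetical sketch of a trivial free list for one fixed object size
     * (assumes the object is at least as large as a pointer). */
    typedef struct free_node { struct free_node *next; } free_node;
    static free_node *free_list = NULL;

    void *obj_alloc(size_t size) {
        if (free_list) {                     /* reuse a previously freed block */
            void *p = free_list;
            free_list = free_list->next;
            return p;
        }
        return malloc(size);                 /* otherwise fall back to malloc */
    }

    void obj_free(void *p) {
        free_node *n = p;                    /* push the block back on the list */
        n->next = free_list;
        free_list = n;
    }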

There are a lot of these choices in the code base, largely owing to maintainability concerns (though antirez can surely speak for himself). Given how easy it is for an otherwise uninitiated C programmer such as myself to hack on it, I struggle to disagree with the prioritization. :)


The excerpted comment in a format mobile readers can see without left/right scrolling:

"Unlike most databases the core data structure is the fastest part of the system. Most of the query time comes from parsing the REPL protocol and copying data to/from the network."


The human-readable/writable protocol is one of my favorite things about Redis, tbh.
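
You can debug against a live server with nothing but netcat, e.g. (assuming the key foo was set to "bar" earlier):

    $ nc localhost 6379
    PING
    +PONG
    GET foo
    $3
    bar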

I can see cases where a really optimized system could benefit from a binary protocol, but I suspect it'd be a loss for most people.


Why not just offer both?


That was my thinking as well, though taking a peek at the actual code suggests that there's a pretty deep expectation that the client is speaking strings, e.g. in code that handles the ZRANGE command[1] I see

    if (c->argc == 5 && !strcasecmp(c->argv[4]->ptr,"withscores"))
and a quick grep suggests that's a common pattern

    % grep argv src/*.c | grep -c -e 'str\(case\)*cmp'
    482
I guess this means someone would have to tackle creating an intermediate binary format first, rewriting the command handlers to expect that format, and then making client libraries that can produce the format. Perhaps still worth it in the end, but not trivial.
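
Sketching what such an intermediate binary format might look like (entirely hypothetical, not anything in the Redis tree): numeric opcodes and flag bits the handlers could switch on, instead of strings they strcasecmp at dispatch time.

    #include <stdint.h>

    /* Hypothetical intermediate binary command format -- not real Redis code. */
    typedef enum { CMD_GET = 1, CMD_SET, CMD_ZRANGE /* ... */ } cmd_opcode;

    typedef struct {
        uint16_t opcode;   /* e.g. CMD_ZRANGE instead of the string "zrange"    */
        uint16_t flags;    /* e.g. bit 0 = WITHSCORES instead of a strcasecmp() */
        uint32_t argc;     /* number of arguments that follow                   */
        /* arguments would follow as length-prefixed byte strings */
    } binary_cmd_header;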

[1] https://github.com/antirez/redis/blob/unstable/src/t_zset.c#...


Is this really "unlike most databases"? I remember MySQL posting profiling data years ago showing that for looking up by primary-key, 3/4 of the time was spent parsing SQL. (They went on to introduce support for querying with the Memcached protocol to address this)


That's really surprising if true, considering the SQL should only need to be parsed once.

    SELECT foo FROM Table WHERE key = @mykey;
Then you bind the parameter to whatever you're interested in.
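
With MySQL server-side prepared statements, the parse-once/execute-many flow looks roughly like this (statement name and key value are just placeholders):

    PREPARE get_foo FROM 'SELECT foo FROM `Table` WHERE `key` = ?';
    SET @mykey = 'some-key';
    EXECUTE get_foo USING @mykey;     -- re-executed without re-parsing the SQL
    DEALLOCATE PREPARE get_foo;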


Prepared statements are per-connection, and a lot of the time you want to use connections from a single pool that's shared across all your different queries, so you can't really use them.


Even with that, the SQL would only be parsed once per connection? So the costs should be de minimis, unless the benchmark was very short?


> Even with that, the SQL would be parsed once per connection?

In a webserver-like context it's once per query one way or another: the server process is stateless-ish between page loads, so each page load either opens a from-scratch connection or takes one from a pool. Even if you're pooling, you can't use prepared statements in practice: you can't leave a prepared statement on a connection you return to the pool (you'll eventually exhaust the database server's memory that way), and you'd have to resubmit the prepared statement every time you took a connection out of the pool anyway, because there's no way to know whether this connection has run this page's query already or not.

If you assume a page that's just displaying one database row, which is not the only use case but a common one, then each page load is one query and that query will have to be parsed for each page load, short of doing something like building a global set of all your application's queries and having your connection-pool logic initialise them for each connection.


In a database product I'm familiar with, the prepared statements are cached according to their content and those cached objects are shared between connections. Only if they fall out of the cache do they have to be re-parsed. I had assumed that's how all databases worked.

I'm somewhat surprised at the mechanism you're describing, but now that I read the documentation it does seem to be the case. I wonder if a small piece of middleware might be sufficient to replicate the behavior I'm describing on top of a connection pool, and whether that would be desirable.


See my post for essentially a "binary" interface: for my key-value store (which I wrote because I thought that writing the code would be faster than understanding Redis :-)), a client uses just standard TCP/IP sockets to send a byte array. The array is the serialization of an instance of a class. My key-value store receives the byte array and deserializes it to get a copy of the client's instance of the class. So, with the byte array, maybe the interface counts as "binary"? I'm unsure of the speed of de/serialization.



