Concurrency and Redis (heroku.com)
49 points by slig 2299 days ago | 17 comments

A lot of (especially web) applications suffer from race conditions. Luckily, for most of them the effects are either temporary or not severe.

This does not only apply to Redis but to basically all data stores, unless something like RDBMS transactions is used, and even then you can get into trouble at times.

The last example still has a problem when an exception is thrown or the 'redis.del "setting_up_auth_token"' fails for whatever reason. Furthermore, it has a race condition: a process that wants to read 'web_service_auth_token' sees that 'setting_up_auth_token' has been set, but the other process that is setting it up may not have reached the redis.set line yet.

The example itself is not very good, as I'd rather do a little bit of extra work (it would not hurt here) than be vulnerable to failures.

Assuming the work really needs to be done strictly once - either because it's a lot or the logic requires it - you would probably have to do something like the following pseudocode:

  while (true) {
    watch("setup_key")
    v = get("setup_key")
    if v == 2 { unwatch(); break }              // work already finished
    if v == 1 { unwatch(); sleep(1); continue } // lock held, check again
    // key does not exist: try to take the lock atomically
    multi()
    setex("setup_key", 10, 1) // 10s timeout
    if !exec() {
      continue // someone else got the lock in the meantime, try again
    }
    // we now have a temporary exclusive "lock"
    ... do the real work ...
    set("setup_key", 2)
    break
  }
What does this do? Well, setup_key acts like an exclusive lock which has 3 states: non-existing, 1 and 2.

non-existing means we are the first, let's do the work. 1 means the lock is being held, sleep and check again. 2 means the work has been finished.

The watch(),multi(),exec() work as a CAS (really, whytf is this not a single command?) to atomically acquire the lock. If the lock is held by someone, we just sleep 1 second and try again. I use a timeout on the key in case the client holding the lock dies to prevent a deadlock.
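The loop above can be simulated with a plain in-memory dict standing in for Redis; the cas() helper below plays the role of the WATCH/MULTI/EXEC compare-and-set. This is only a sketch of the state machine (the key expiry is omitted), not a real Redis client:

```python
# Minimal in-memory sketch of the three-state lock above.
# The dict stands in for Redis; cas() plays the role of WATCH/MULTI/EXEC.
store = {}

def cas(key, expected, new):
    """Set key to new only if it currently holds expected (None = missing)."""
    if store.get(key) == expected:
        store[key] = new
        return True
    return False

def try_setup(do_work):
    """Run do_work at most once across all callers."""
    while True:
        v = store.get("setup_key")
        if v == 2:
            return "already done"
        if v == 1:
            return "busy"          # a real client would sleep(1) and retry
        if not cas("setup_key", None, 1):
            continue               # lost the race for the lock, re-check state
        do_work()                  # we hold the temporary exclusive "lock"
        store["setup_key"] = 2
        return "did the work"
```

The first caller does the work and marks the key as 2; every later caller sees state 2 and skips the work.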

But this is STILL not perfect. What if your setup work takes longer than the timeout specified for the temporary key? I haven't spent enough time thinking about it right now to come up with a solution, if there even is one for Redis, so I'll leave it as an exercise to the reader :)

To be honest: with slightly different commands/semantics, Redis could be MUCH easier to write correct applications for. Especially the "transaction" support is mediocre imho.

Conclusion: Many web applications are inherently and severely concurrent. Concurrency is HARD to get right. There are many patterns like locks, atomic operations, transactions etc. - all with completely different semantics. Redis is single-threaded for a reason :)

In our setup (using Sql Server, alas, not redis but same concept applies) the worker process sets the lock and the time of acquiring it.

Other workers look for pending jobs that either don't have the lock set or the lock has expired where the expiry time is 5min + lock time.

Since each 'job' is a small amount of work, 5 minutes is fairly generous. If a worker goes down because of a system crash or network failure, then that job wouldn't get processed again for 5 minutes. That's (for us) an acceptable overhead. There's a chance the job gets processed twice, if it gets locked, work gets done, but releasing the lock fails. That's a trade-off we're willing to make as we can't really think of any better solution.

I believe that, since Redis has built-in support for expiring records, this may be even easier. To process job 332, you would check for the existence of 'lock:332' and, if it didn't exist, set it and give it an expiry time.
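That claim-with-expiry idea can be sketched in a few lines. A dict of key to expiry timestamp stands in for Redis here (the job id 332 and ttl are from the comments above; everything else is a made-up stand-in, not a real Redis client):

```python
import time

# In-memory sketch of the idea above: take 'lock:<job_id>' only if it is
# absent or expired. The dict maps key -> expiry timestamp, standing in
# for a Redis key with an expiry.
locks = {}

def acquire(job_id, ttl_seconds, now=None):
    """Return True if we claimed the job, False if a live lock exists."""
    now = time.time() if now is None else now
    key = "lock:%d" % job_id
    expires_at = locks.get(key)
    if expires_at is not None and expires_at > now:
        return False                  # another worker holds a live lock
    locks[key] = now + ttl_seconds    # claim the job, with an expiry
    return True
```

With a 5-minute ttl this mirrors the SQL Server scheme above: a crashed worker's lock frees itself after the timeout. Note the check-and-set is only safe here because the sketch is single-threaded; against real Redis the check and the set would have to be one atomic operation.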

Would be interested in hearing how other people solve these types of issues.

Thanks for the detailed analysis. A quick thing to consider though: if redis.del fails, it can only mean Redis is busted. In that case, none of the examples that use Redis in any context have any bearing anyway. Hence, guarding against that is a little excessive imo.

No, it does not mean that redis is busted. It could be your client crashing (e.g. the OOM killer decided to pay a visit), the network between the client and redis failing (but not for other clients!) and so on. There are many reasons why the redis.del can fail and result in problems for the application.

So, this would mean every call you make to redis (not just the del) would need to be guarded against all of these reasons you have cited. To me that's just overkill. If transactionality is that important, redis may not be the right tool.

Yes, you need to make sure all your modifications lead to consistent states. If you don't, then you end up with inconsistent states and, depending on your app, deadlocks.

Then it would only be useful as a cache, and even then only if the temporary inconsistencies were tolerable.

Using a datastore without guaranteeing consistency (even if only eventual) is asking for trouble for any data that is worth something.

Redis CAN be used for many things, but it's not always trivial to do it right, even though at first glance it might look that way - and most web developers are lazy or just don't know better.

I've used counters in Redis for mutex control. If incr returns 1, it means we have succeeded. Otherwise we decrement the counter and try again after 1 second until it succeeds.

Here is the code: https://gist.github.com/1062584
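The counter-as-mutex pattern described above can be sketched with an in-memory stand-in for Redis INCR/DECR (this is an illustration of the idea, not the linked gist):

```python
# In-memory stand-in for Redis INCR/DECR to sketch the counter-as-mutex idea.
counters = {}

def incr(key):
    counters[key] = counters.get(key, 0) + 1
    return counters[key]

def decr(key):
    counters[key] = counters.get(key, 0) - 1
    return counters[key]

def try_lock(key):
    """We hold the lock iff our INCR took the counter from 0 to 1."""
    if incr(key) == 1:
        return True
    decr(key)   # undo our increment; a real client sleeps 1s and retries
    return False

def unlock(key):
    decr(key)   # counter back to 0 so the next try_lock succeeds
```

As the next comment points out, this inherits the same failure modes: a failed decr() or a dead lock holder leaves the counter stuck above zero.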

What if the decr() fails because of, for example, a network issue? What if the process that is holding the lock dies?


Good point, never thought about decr failing!

Can't seem to comment on the blog itself, so:

> This code assumes the existence of a UUID library, which returns a unique ID on every call. Now, if more than one process were to run, they would each create their own temp_list’s

Wouldn't this still race in sense that a second update to the SafeIps table may be overwritten by the atomic move?

(A inserts into SafeIps and generates unique temp list, B inserts into SafeIps and generates another temp list, B moves its temp list to "safe_ips", A moves its temp list to "safe_ips" => oops, we've lost B's insert.)
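That interleaving can be sketched with plain dicts and sets standing in for Redis keys (the key names are illustrative, not the blog's actual code):

```python
# Sketch of the interleaving above; dicts/sets stand in for Redis keys.
db = {}
db["temp:A"] = {"1.1.1.1"}   # A builds its own temp list
db["temp:B"] = {"2.2.2.2"}   # B builds another

db["safe_ips"] = db.pop("temp:B")   # B renames its temp list first...
db["safe_ips"] = db.pop("temp:A")   # ...then A's rename clobbers it
# B's insert ("2.2.2.2") is now lost, even though each rename was atomic
```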


The first example "fixes" the bug by checking first, but without any lock on the resource. If the key's existence is a central problem in that design, who guarantees that "some_key" is still there after the long computation? He might equally well be returning a nonexistent key's value.

I think the problem is he's trying to use Redis as a transactional store while his design is not a good fit for it. He should either use a RDBMS for this stuff or change his design to match Redis' capabilities.

What I've tried to show in the safe_ips example is just one thing: that the safe_ips list does not have to be deleted to be modified. The fact that Redis semantics allows us to do that atomically is awesome. Whether process B overwrites the write process A had done, or the other way around, due to some timing issue, is irrelevant in the context of this example. If that is a concern, then a multi-exec or a setnx with a time value is the way to go.

Correct. While the rename of safe_ips itself is atomic, it does not guarantee you that the value hasn't been updated in the meantime. That's what CAS is for.

You have to use Redis' INCR to generate the UUIDs.

That does not help at all with the issue at hand.

The problem is not colliding UUIDs, it is a race condition between getting the list of IPs and then writing them to redis.

Oh, well in the pseudo(?) code he seems to push the content from temp lists into the main list, so it seemed that as long as you have separate temp lists for A,B you wouldn't lose anything, but I probably misread it.

But anyway, as the other comment mentions, there is some transaction support http://redis.io/topics/transactions

I'm not sure how much Redis will focus on transactions, but if it doesn't, you're out of luck or have to reinvent the SQL wheel on top of it.

Is it necessary to use UUIDs? Why not use Redis sets instead of lists, so we get rid of the key duplication problem?
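For illustration, a Python set behaves the way a Redis set does here: adding the same member twice is a harmless no-op (a toy sketch, with made-up IPs):

```python
# A set deduplicates for free, like adding the same member to a Redis set twice.
safe_ips = set()
safe_ips.add("10.0.0.1")
safe_ips.add("10.0.0.1")   # duplicate add is a no-op
safe_ips.add("10.0.0.2")
```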
