
I've been writing ring buffers wrong all these years - b3h3moth
https://www.snellman.net/blog/archive/2016-12-13-ring-buffers/
======
jgrahamc
_This is of course not a new invention. The earliest instance I could find
with a bit of searching was from 2004, with Andrew Morton mentioning it in a
code review so casually that it seems to have been a well established trick.
But the vast majority of implementations I looked at do not do this._

I was doing this in 1992 so it's at least 12 years older than the 2004
implementation. I suspect it was being done long before that. Back then the
read and write indexes were being updated by separate processors (even more
fun, processors with different endianness) with no locking. The only
assumption being made was that updates to the read/write pointers were atomic
(in this case 'atomic' meant that the two bytes that made up a word, since the
counters were 16 bits, were written atomically). Comically, on one piece of hardware
this was _not_ the case and I spent many hours inside the old Apollo works
outside Boston with an ICE and a bunch of logic analyzers figuring out what
the hell was happening on some weird EISA bus add on to some HP workstation.

It's unclear to me why the focus on a 2^n sized buffer just so you can use &
for the mask.

Edit: having had this discussion I've realized that Juho's implementation is
different from the 1992 implementation I was using because he doesn't ever
reset the read/write indexes. Oops.

~~~
kazinator
> It's unclear to me why the focus on a 2^n sized buffer just so you can use &
> for the mask.

The cost of a mask can probably be entirely buried in the instruction
pipeline, so that it's hardly any more expensive than whatever it costs just
to move from one register to another.

Modulo requires division. Division requires a hardware algorithm that
iterates, consuming multiple cycles (pipeline stall).

~~~
ambrop7
You do not need modulo or division to implement non-power-of-2 ring buffers.
Because you will only increment by one. So instead of "x = x % BufferSize" you
can do "if (x >= BufferSize) x -= BufferSize;" or similar.

That's for "normal" ring buffers. I suspect that the design described in the
article can be implemented for non power-of-two without division but I'll need
to think about the details.
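
A minimal C sketch of the conditional-subtraction wrap described above, for a "normal" ring buffer whose indices always stay inside the array. `BUFFER_SIZE` and `ring_next` are hypothetical names chosen for illustration; nothing here comes from the article.

```c
#include <assert.h>
#include <stddef.h>

#define BUFFER_SIZE 100  /* hypothetical capacity; need not be a power of two */

/* Advance a wrapped index by one without using '%'.
   Precondition: x < BUFFER_SIZE, so x + 1 <= BUFFER_SIZE and a single
   conditional subtraction is enough to bring it back into range. */
static size_t ring_next(size_t x)
{
    assert(x < BUFFER_SIZE);
    x += 1;
    if (x >= BUFFER_SIZE)
        x -= BUFFER_SIZE;
    return x;
}
```

This only works because the index moves by one step at a time; an arbitrary jump would still need a real modulo.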

~~~
amag
Then you have to deal with branch mispredictions, which may hurt performance
pretty badly if the RB is heavily trafficked (which often is the use case for an
RB).

~~~
gpderetta
Actually it might not really be mispredicted. Slightly older Intel CPUs had a
dedicated loop predictor that would exactly predict quite complicated
taken/non-taken sequences. If the RB is of fixed size the edge case would be
always perfectly predicted.

More recent CPUs, IIRC, do away with the dedicated loop predictor as they have
a much more sophisticated general predictor which, although it won't guarantee
perfect prediction in this case, might still get close enough.

------
phaemon
> Join me next week for the exciting sequel to this post, "I've been tying my
> shoelaces wrong all these years".

Probably. Use the Ian Knot:
[http://www.fieggen.com/shoelace/ianknot.htm](http://www.fieggen.com/shoelace/ianknot.htm)

Seriously, spend 20 mins practising this, and you'll never go back to the
clumsy old way again.

~~~
Bognar
The Ian Knot is quick, but as someone who never ties their shoes and just
slips them on and off, I much prefer Ian's Secure Knot:
[http://www.fieggen.com/shoelace/secureknot.htm](http://www.fieggen.com/shoelace/secureknot.htm)

I usually tie this knot twice over the lifetime of a pair of shoes. Once when
I get them, and once more when they're worn in and need to be tightened.

~~~
chrisseaton
I don't get it - both of these knots seem to be identical to the standard
shoelace knot, just illustrated differently.

~~~
wccrawford
From the site:

"The finished "Ian Knot" is identical to either the Standard Shoelace Knot or
the Two Loop Shoelace Knot. Because it was tied much more quickly and
symmetrically, the laces suffer less wear and tear and thus last longer."

~~~
chrisseaton
Do people's laces wear out? That's not a problem I've ever experienced.

~~~
to3m
I've had shoelaces break maybe 3-4 times on shoes I wore regularly for more
than 2-3 years. It's annoying out of all proportion to the expense involved.

(The plastic bits at the end can also get frayed and fall off, which happens
more quickly, but I'm not sure knot style has much to do with that.)

~~~
__david__
I think it's so annoying because of the timing. I've never had one break when
untying the knot or when just walking around. It's always while tying it which
means I was just about to leave and now life has thrown a monkey wrench into
my plans. Depending on how close I am cutting things, this may be an event
that makes me late. Grrrrrr. Stupid shoelace!

~~~
pkolaczk
Yes, this! And this happens particularly often, when you use standard cotton
laces which make knots harder to accidentally undo. The synthetic ones last
much longer, but are slippery and easy to untie.

------
cannam
I love the way this discussion has divided neatly into thirds: history of
ringbuffers; digression on shoelaces; fragmentary, widely ignored, replies
about everything else (this one included, I'm sure).

I like this kind of article and enjoyed this particular one, but the long
discussion above about the "right" way to do it goes some way to justifying
why so many people are happy to do it the "wrong" way.

I've implemented and used ring buffers the "wrong" way many times (with the
modulus operator as well!) and the limitations of this method have never been
a problem or bottleneck for me, while its simplicity means that it's easier to
write and understand than almost any other data structure.

In most practical applications, it's memory barriers that you really have to
worry about.

------
planckscnst
This is another interesting ring buffer implementation that uses mmap.
[https://github.com/willemt/cbuffer](https://github.com/willemt/cbuffer)

~~~
AndyKelley
Here's another implementation that works on Windows too:
[https://github.com/andrewrk/libsoundio/blob/master/src/ring_...](https://github.com/andrewrk/libsoundio/blob/master/src/ring_buffer.c)

~~~
csl
This seems to use modulus. The whole point of the mmap trick is to get the
kernel/MMU to do the work for you, IIRC.

EDIT: Oops, I see they use mirrored memory here as well.

------
tveita
The Linux kernel seems to leave one element free, which surprised me, but it
does have this interesting note about it:

[https://www.kernel.org/doc/Documentation/circular-buffers.tx...](https://www.kernel.org/doc/Documentation/circular-buffers.txt)

    
    
      Note that wake_up() does not guarantee any sort of barrier unless something
      is actually awakened.  We therefore cannot rely on it for ordering.  However,
      there is always one element of the array left empty.  Therefore, the
      producer must produce two elements before it could possibly corrupt the
      element currently being read by the consumer.  Therefore, the unlock-lock
      pair between consecutive invocations of the consumer provides the necessary
      ordering between the read of the index indicating that the consumer has
      vacated a given element and the write by the producer to that same element.

------
ChuckMcM
I have always considered these "double ring" buffers, along the same lines as
how you figure out which race car is in the lead by its position
_and_ lap count. You run your indexes in the range 0 .. (2 * SIZE - 1) and then
empty and full are

    
    
        EMPTY -> (read == write)
        FULL -> (read == (write + SIZE) % (2 * SIZE))
    

Basically you're full if you're at the same relative index and you're on
different laps; you are empty if you are at the same relative index on the same
lap. If you do this with a power of 2 size then the 'lap' is just the bit with
value SIZE.
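
A minimal C sketch of this "lap count" scheme, using the EMPTY and FULL tests given above. The struct and function names are hypothetical, and `SIZE` is set to 4 purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define SIZE 4u  /* hypothetical capacity */

struct lap_ring {
    unsigned read;   /* always in [0, 2*SIZE) */
    unsigned write;  /* always in [0, 2*SIZE) */
    int data[SIZE];
};

static bool lap_empty(const struct lap_ring *r)
{
    return r->read == r->write;                        /* same slot, same lap */
}

static bool lap_full(const struct lap_ring *r)
{
    return r->read == (r->write + SIZE) % (2 * SIZE);  /* same slot, other lap */
}

static void lap_push(struct lap_ring *r, int v)
{
    assert(!lap_full(r));
    r->data[r->write % SIZE] = v;
    r->write = (r->write + 1) % (2 * SIZE);
}

static int lap_pop(struct lap_ring *r)
{
    assert(!lap_empty(r));
    int v = r->data[r->read % SIZE];
    r->read = (r->read + 1) % (2 * SIZE);
    return v;
}
```

Because the indices are reduced modulo 2 * SIZE explicitly, this variant does not need SIZE to be a power of two; a power of two just turns both `%` operations into masks.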

~~~
nickodell
No, I think the author is using the full range of a 32 bit int. So read could
be any 32 bit integer, even if the size of the ring is 1.

(The trick is that SIZE has to be a power of two, or else when you increment
from 2^32-1 to 0, your pointers will jump to a different position in the
array.)
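
For comparison, a minimal C sketch of the free-running-index version as described here: the indices use the full 32-bit range and are only masked when indexing the array. The names and the capacity of 4 are assumptions for illustration, not the article's exact code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAPACITY 4u  /* hypothetical; must be a power of two */

struct ring {
    uint32_t read;   /* free-running, never reduced to the array range */
    uint32_t write;  /* free-running, wraps naturally at 2^32 */
    int data[CAPACITY];
};

/* Unsigned subtraction wraps modulo 2^32, so this stays correct
   even after write has wrapped past zero and read has not. */
static uint32_t ring_size(const struct ring *r) { return r->write - r->read; }
static bool ring_empty(const struct ring *r) { return r->read == r->write; }
static bool ring_full(const struct ring *r) { return ring_size(r) == CAPACITY; }

static void ring_push(struct ring *r, int v)
{
    assert(!ring_full(r));
    r->data[r->write++ & (CAPACITY - 1)] = v;  /* mask only when indexing */
}

static int ring_shift(struct ring *r)
{
    assert(!ring_empty(r));
    return r->data[r->read++ & (CAPACITY - 1)];
}
```

Starting both indices just below 2^32 is a quick way to exercise the wrap-around path.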

------
ams6110
_Why do people use the version that's inferior and more complicated?_

Because it's easier to understand at first glance, has no performance penalty,
and for most busy programmers that often wins.

~~~
hzhou321
The first version always leaves a "clean" state, that is both indices point
to actual array locations. A mentally "clean" state makes understanding
easier. For the third version one has to keep in mind the wrap around behavior
of computer specific integers throughout the comprehension process, so it is a
bit more difficult (to understand).

~~~
kbenson
The third version also allows for the write index to be a counter of total
store operations, at least until overflow, which could be useful.

------
dom0
A very related post by ryg:
[https://fgiesen.wordpress.com/2010/12/14/ring-buffers-and-qu...](https://fgiesen.wordpress.com/2010/12/14/ring-buffers-and-queues/)

~~~
gcatlin
Another one by ryg:
[https://fgiesen.wordpress.com/2012/07/21/the-magic-ring-buff...](https://fgiesen.wordpress.com/2012/07/21/the-magic-ring-buffer/)

------
falcolas
Usually when I'm writing a ring buffer, it's for tasks where the loss of an
item is acceptable (even desirable - a destructive ring buffer for debugging
messages is a fantastic tool). As such, I simply push the read indicator when
I get to the r=1, w=1 case.

Using the mask method is slick (I'd cache that mask with the array to reduce
runtime calculations), but it's definitely going to add cognitive overhead and
get messy if you want to make it lockless with CAS semantics.

~~~
Bartweiss
In general, this makes sense; certainly data you're putting into a ring buffer
is data you're willing to lose.

Doesn't it break the order invariant of the buffer, though? I can't see a way
to do this without the risk of getting reads of newer data prior to older
data. That's probably fine in many cases, but something like non-timestamped-
debugging strikes me as a case where I'd want to know that the data arrived in
the order I'm seeing.

~~~
falcolas
> Doesn't it break the order invariant of the buffer, though

No, if you increment the read pointer prior to the write pointer, the read
pointer will still point at the oldest _valid_ value in the buffer.

So, in pseudo code:

    
    
        // full (one slot kept free): drop the oldest entry first
        if ((w + 1) % SIZE == r) {
           r = (r + 1) % SIZE
        }
        b[w] = value
        w = (w + 1) % SIZE
    

For a debugging ring buffer (i.e. looking at it in a core file), you have the
last value of the write pointer, so you can simply read from write pointer + 1
back around to the write pointer and have your messages in order. This makes
the assumption that there are no readers of the debug buffer, so you're only
having to deal with the one pointer.
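
A minimal C sketch of such an overwriting ("destructive") buffer, assuming wrapped indices with one slot kept free. `SLOTS`, `debug_put` and `debug_dump` are hypothetical names; the dump walks from the read index to the write index, which yields the surviving entries oldest-first.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 4  /* hypothetical; holds at most SLOTS - 1 entries */

struct debug_ring {
    size_t r, w;   /* both always < SLOTS */
    int b[SLOTS];
};

static void debug_put(struct debug_ring *d, int value)
{
    assert(d->r < SLOTS && d->w < SLOTS);
    if ((d->w + 1) % SLOTS == d->r)   /* full: drop the oldest entry */
        d->r = (d->r + 1) % SLOTS;
    d->b[d->w] = value;
    d->w = (d->w + 1) % SLOTS;
}

/* Copy the surviving entries oldest-first; returns how many were copied. */
static size_t debug_dump(const struct debug_ring *d, int out[SLOTS - 1])
{
    size_t n = 0;
    for (size_t i = d->r; i != d->w; i = (i + 1) % SLOTS)
        out[n++] = d->b[i];
    return n;
}
```

After six writes into this four-slot buffer, only the last three values remain and they come back in arrival order.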

------
RossBencina
From what I understand, this is the way you'd do it with hardware registers
(maintain the read and write indices each with one extra MSB to detect the
difference between full/empty).

We've been using similar code in PortAudio since the late 90s[0]. I'm pretty
sure Phil Burk got the idea from his hardware work.

[0]
[https://app.assembla.com/spaces/portaudio/git/source/master/...](https://app.assembla.com/spaces/portaudio/git/source/master/src/common/pa_ringbuffer.c)

------
pawadu
> This is of course not a new invention

No, this is a well known construct in digital design. Basically, for a 2^N
deep queue you only need two N+1 bit variables:

[http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIF...](http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIFO1.pdf)

------
tankfeeder
PicoLisp: last function here as circular buffer task
[https://bitbucket.org/mihailp/tankfeeder/src/3258edaded514ef...](https://bitbucket.org/mihailp/tankfeeder/src/3258edaded514ef010a1526d5a298eeaebed215d/exercism-io/a-f.l?at=default&fileviewer=file-view-default)

built-in dynamic fifo function
[http://software-lab.de/doc/refF.html#fifo](http://software-lab.de/doc/refF.html#fifo)

------
kazinator
> _don't squash the indices into the correct range when they are incremented,
> but when they are used to index into the array._

Great! Just don't use it if the indices are N bits wide and the array has 2^N
elements. :)

Not unheard of. E.g. tiny embedded system. 8 bit variables, 256 element
buffer.
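
A small C sketch of why that combination fails, assuming free-running 8-bit indices and a 256-element array. `size8` and `check_ambiguity` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Occupancy as computed from free-running 8-bit indices. */
static uint8_t size8(uint8_t read, uint8_t write)
{
    return (uint8_t)(write - read);
}

static void check_ambiguity(void)
{
    uint8_t read = 0, write = 0;

    /* Push 256 elements: the 8-bit write index wraps back to 0. */
    for (int i = 0; i < 256; i++)
        write++;

    /* The buffer is completely full, yet it looks empty:
       an 8-bit difference can only represent 0..255, never 256. */
    assert(write == read);
    assert(size8(read, write) == 0);
}
```

Widening the indices by one bit, or capping the capacity at 128 elements, removes the ambiguity.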

------
jstanley
I had to pause for a second to convince myself that the version relying on
integer wrap-around is actually correct.

I guess that's the reason most people don't do it: they'd rather waste O(1)
space than waste mental effort on trying to save it.

------
hzhou321
He keeps stating the case of a one-element ring buffer. Is that a real concern
ever?

~~~
alanbernstein
It seemed like a sarcastic comment to me. Why would that ever be used?

~~~
jsnell
It's indeed a ridiculous data structure, but I did actually need it.

It's a dynamically sized ring buffer with an optimization analogous to that of
C++ strings; if the required capacity is small enough, the buffer is stored
inline in the object rather than in a separate heap-allocated object. So
something in the spirit of (but not exactly like):

    
    
      struct rb {
          union {
              Value* array;
              // Set N such that this array uses the same amount of space as the pointer.
              Value inline_array[N];
          };
          uint16_t read;
          uint16_t write;
          uint16_t capacity;
      };
    

You'd dynamically switch between the two internal representations, and choose
whether to read from array or inline_array based on whether capacity is larger
than N. In this setup it'd be pretty common for N to be 1. Having to add a
special case to every single method would kind of suck; generic code that
could handle any size seemed like a nice property to have.
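
A hedged C sketch of how the storage selection described above might look. The `Value` typedef, the way `N` is computed, and the `rb_storage` helper are all assumptions made for illustration; the real code is only described as being "in the spirit of" the struct shown.

```c
#include <assert.h>
#include <stdint.h>

typedef void *Value;  /* hypothetical element type, pointer-sized so N == 1 */

/* Number of inline elements that fit in the space of one pointer. */
enum { N = sizeof(Value *) / sizeof(Value) };
static_assert(N >= 1, "inline storage must hold at least one element");

struct rb {
    union {
        Value *array;           /* heap storage, used when capacity > N */
        Value inline_array[N];  /* inline storage, used when capacity <= N */
    };
    uint16_t read;
    uint16_t write;
    uint16_t capacity;
};

/* Pick the active representation; every other method can go through this
   one helper instead of special-casing the small buffer. */
static Value *rb_storage(struct rb *rb)
{
    return rb->capacity > N ? rb->array : rb->inline_array;
}
```

With a single accessor like this, the push and pop paths stay identical for a capacity of 1 and a capacity of 1024.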

------
phkahler
I find the headline very interesting. It's very inviting because of the way it
expresses a sort of epiphany about doing it wrong on a mundane programming
task. One is tempted to read it in order to see if there is some great insight
to this problem. just maybe it's applicable outside this one problem. It begs
the question: if he's been doing it wrong on a fairly mundane thing, maybe I
am too. I need to see what this is about.

~~~
buzzybee
I believe it's very common to find little variations on algorithms or coding
style like this that could produce a nice gain in efficiency or elegance. They
aren't really the same problem as whole-system engineering, though, since most
of your bottlenecks come from the algorithm that is completely unsuitable, not
the one that is a little bit suboptimal.

------
ansible
Hmm..., interesting.

I've always been doing it the "wrong" way, mostly on embedded systems. My
classic application is a ring buffer for the received characters over a serial
port. What's nice is that this sort of data structure doesn't need a mutex or
such to protect access. Only the ISR changes the head, and only the main
routine changes the tail.
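
A minimal C11 sketch of that single-producer, single-consumer arrangement, using the one-slot-free layout. It assumes `<stdatomic.h>` is available on the target; many embedded codebases use `volatile` and platform-specific barriers instead. All names (`rx_put`, `rx_get`, `RX_SLOTS`) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_SLOTS 16u  /* hypothetical; one slot always stays empty */
static_assert(RX_SLOTS >= 2, "need at least one usable slot");

static uint8_t rx_buf[RX_SLOTS];
static _Atomic unsigned rx_head;  /* written only by the producer (ISR) */
static _Atomic unsigned rx_tail;  /* written only by the consumer (main loop) */

/* Producer side: returns false, dropping the byte, when the buffer is full. */
static bool rx_put(uint8_t byte)
{
    unsigned head = atomic_load_explicit(&rx_head, memory_order_relaxed);
    unsigned next = (head + 1) % RX_SLOTS;
    if (next == atomic_load_explicit(&rx_tail, memory_order_acquire))
        return false;
    rx_buf[head] = byte;
    /* release: the byte must be visible before the new head is */
    atomic_store_explicit(&rx_head, next, memory_order_release);
    return true;
}

/* Consumer side: returns false when there is nothing to read. */
static bool rx_get(uint8_t *out)
{
    unsigned tail = atomic_load_explicit(&rx_tail, memory_order_relaxed);
    if (tail == atomic_load_explicit(&rx_head, memory_order_acquire))
        return false;
    *out = rx_buf[tail];
    /* release: the slot must be read before it is handed back */
    atomic_store_explicit(&rx_tail, (tail + 1) % RX_SLOTS, memory_order_release);
    return true;
}
```

No mutex is needed because each index has exactly one writer; the acquire/release pairs only order the data accesses relative to the index updates.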

------
noiv
Just in case, StackOverflow has some variations for JavaScript, although not
that much optimized ;)

[http://stackoverflow.com/questions/1583123/circular-buffer-i...](http://stackoverflow.com/questions/1583123/circular-buffer-in-javascript)

------
falcolas
My C is rusty, but won't this act... oddly... on integer overflow?

    
    
        size()     { return write - read; }
    

0 - UINT_MAX -1 = ?

[EDIT] Changed constant to reflect use of unsigned integers, which I forgot to
specify initially.

~~~
crististm
Actually, this method counts on it.

What I find interesting are the trade-offs: machine vs explicit integer wrap-
around and buffers with maximum ~size(int)/2 vs ~size(int).

~~~
falcolas
Got it. Modular arithmetic was the term I was looking for to resolve this.

    
    
        (0 - (2^32 - 1)) % 2^32 = 1

------
ared38
Dumb question: why use power of two sized rings? If I know the reader won't be
more than 100 behind the writer, isn't it better to waste one element of a 101
sized ring instead of 28 of a 128 sized ring?

------
ts330
i love that he has 20 different shoelace knots! life was too simple before
now.

------
geophile
His favored solution introduces subtlety and complexity. Remember that 20-year
old binary search bug in the JDK a few years ago? That is the sort of bug that
could be lurking in this solution.

I understand not wanting to waste one slot. A third variable (first, last,
count) isn't too bad. But if you really hate that third variable, why not just
use first and count variables? You can then compute last from first and count,
and the two boundary cases show up as count = 0 and count = capacity.
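
A minimal single-threaded C sketch of the first + count representation suggested here. The names and the capacity of 3 are hypothetical; note that both push and pop modify `count`.

```c
#include <assert.h>
#include <stddef.h>

#define CAP 3  /* hypothetical capacity; any positive value, including 1 */

struct fc_ring {
    size_t first;  /* index of the oldest element, always < CAP */
    size_t count;  /* number of stored elements, 0..CAP */
    int data[CAP];
};

static void fc_push(struct fc_ring *r, int v)
{
    assert(r->count < CAP);                    /* full when count == CAP */
    r->data[(r->first + r->count) % CAP] = v;  /* "last" is derived, not stored */
    r->count++;
}

static int fc_pop(struct fc_ring *r)
{
    assert(r->count > 0);                      /* empty when count == 0 */
    int v = r->data[r->first];
    r->first = (r->first + 1) % CAP;
    r->count--;
    return v;
}
```

No slot is wasted and the boundary cases are explicit, at the cost of a field that both sides must write.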

~~~
simonbw
> Why not just use first and count variables?

I think he addressed that in the post:

 _The most common use for ring buffers is for it to be the intermediary
between a concurrent reader and writer (be it two threads, two processes
sharing memory, or a software process communicating with hardware). And for
that, the index + size representation is kind of miserable. Both the reader
and the writer will be writing to the length field, which is bad for caching.
The read index and the length will also need to always be read and updated
atomically, which would be awkward._

------
zimpenfish
If you use modulus instead of bitmasking, it doesn't have to be power-of-2
size, does it?

~~~
DblPlusUngood
No, the size of the array doesn't need to be a power-of-2 if you use modulus
to derive indices. But you need to deal with the overflow somehow. For
instance:

0xffffffff % 7 = 3, but (0xffffffff + 1) % 7 = 0.
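
The jump can be checked directly. A small C sketch, assuming free-running 32-bit indices; `index_to_slot` and `check_wrap_jump` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Map a free-running 32-bit index to an array slot with '%'. */
static uint32_t index_to_slot(uint32_t index, uint32_t capacity)
{
    return index % capacity;
}

static void check_wrap_jump(void)
{
    uint32_t last = UINT32_MAX;  /* 0xffffffff */
    uint32_t next = last + 1u;   /* wraps to 0 */

    /* capacity 7: slot 3 is followed by slot 0 instead of slot 4 */
    assert(index_to_slot(last, 7) == 3);
    assert(index_to_slot(next, 7) == 0);

    /* capacity 8: slot 7 is followed by slot 0, as it should be */
    assert(index_to_slot(last, 8) == 7);
    assert(index_to_slot(next, 8) == 0);
}
```

The slots only stay consecutive across the wrap when the capacity divides 2^32, which is the case exactly for powers of two.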

~~~
RBerenguel
Also as mentioned elsewhere in the comments, modulo is expensive, even more
for non-powers of 2

~~~
pklausler
Modulus by a power of two is cheap. Modulus by a constant is a multiplication
by reciprocal and a shift. And if your argument is in [0..2N), mod N is just a
conditional subtraction that doesn't even require a branch.
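
One way to write that conditional subtraction without an explicit `if`, as a C sketch. `mod_once` is a hypothetical name, and whether the generated machine code ends up branch-free is still the compiler's decision.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce x modulo n, assuming x < 2*n, with a mask instead of an 'if'. */
static uint32_t mod_once(uint32_t x, uint32_t n)
{
    assert(x < n || x - n < n);                /* precondition: x < 2*n */
    uint32_t mask = 0u - (uint32_t)(x >= n);   /* all ones if x >= n, else 0 */
    return x - (n & mask);                     /* subtracts n or 0 */
}
```

For example, with n = 7 the value 13 reduces to 6 and the value 5 is returned unchanged.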

~~~
CorvusCrypto
cheap is relative right? I mean a multiplication can be spread over shift and
add/sub instructions whereas a mask is just one instruction I think right?

------
blauditore
> I must have written a dozen ring buffers over the years

Why would someone do this instead of re-using previous (or third-party)
implementations? Of course unless it's all in different languages, but I don't
think that's the case here.

------
doktrin
> So there I was, implementing a one element ring buffer. Which, I'm sure
> you'll agree, is a perfectly reasonable data structure.

I didn't even know what a ring buffer was

where do I dispose of my programmer membership card?

edit : lol, what a hostile reaction...

~~~
doktrin
I honestly can't tell whether the downvotes are from elitist neckbeards or
offended plebs

pls explain I'd love to hear

~~~
mikekchar
Probably just because it doesn't add to the discussion. Though, from a certain
standpoint it shows one of the problems with our education system pretty
clearly. This is truly a fundamental technique. I don't know how one gets out
of school without knowing it. It doesn't say anything about you, but it says a
lot about what we are teaching people. Embarrassingly, for a long time I
thought I had invented this technique ;-)

~~~
doktrin
I didn't attend college or graduate school (yeah ik ik I'm a pos), so that may
well go a ways towards explaining my dumbassery

~~~
mikekchar
Don't worry. Programming and computer science is one of those things that
anyone can learn on their own. If you don't mind some advice, though, try not
to be embarrassed by things that you don't know. I can imagine that it is
difficult, especially if you don't feel confident about your previous
education. Even if most other people already know it, it just means that you
have the pleasure of discovering it (as a certain XKCD comic pointed out).

One thing I've said to many people starting out (especially those without an
academic background in the area) is that there is a lot to learn. Sometimes at
the beginning, you improve so quickly that it is easy to think, "I must be
getting close to knowing it all". After several decades in the industry,
though, I'm still learning brand new (to me!), important things every single
day. In many ways, the best programmers are the ones who can see how much they
_don't_ know, not how much they do know.

~~~
doktrin
I want you to know I genuinely appreciate you taking the time to write that.
It's both helpful and uplifting. I've been going through a rough patch
professionally and in life, and your comment lifted my spirits and brought me
to tears (as absurd as I'm sure that must sound).

From one stranger on the internet to another : thank you.

