
The Magic Ring Buffer (2012) - jsnell
https://fgiesen.wordpress.com/2012/07/21/the-magic-ring-buffer/
======
AndyKelley
libsoundio has a working implementation of this. It's been tested on Linux,
Windows, and MacOS.

[http://libsound.io/](http://libsound.io/)

Interesting bits of the implementation are here:
[https://github.com/andrewrk/libsoundio/blob/1fe64770bde0a4fb...](https://github.com/andrewrk/libsoundio/blob/1fe64770bde0a4fb4a9a7f08e3bce8d4ee89b7c2/src/os.c#L617)
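For readers new to the trick: the buffer's pages are mapped twice, back to back, so a read or write that runs past the end continues into the second mapping, which is the start of the buffer again. Below is a minimal Linux-only sketch, assuming `memfd_create` (Linux 3.17+, glibc 2.27+) is available. The `magic_ring` type and its functions are hypothetical names, and this is not libsoundio's code.

```c
#define _GNU_SOURCE /* for memfd_create (Linux >= 3.17, glibc >= 2.27) */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical type and function names, for illustration only. */
typedef struct {
    unsigned char *base; /* start of a 2*size region; both halves alias the same pages */
    size_t size;         /* capacity in bytes; must be a multiple of the page size */
    size_t head;         /* running count of bytes written */
    size_t tail;         /* running count of bytes read */
} magic_ring;

int magic_ring_init(magic_ring *r, size_t size) {
    assert(size > 0 && size % (size_t)sysconf(_SC_PAGESIZE) == 0);

    int fd = memfd_create("magic_ring", 0); /* anonymous backing memory, no path */
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)size) != 0) { close(fd); return -1; }

    /* Reserve 2*size of address space first so no other mapping can land in it. */
    unsigned char *base = mmap(NULL, 2 * size, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { close(fd); return -1; }

    /* Map the same pages twice, back to back, over the reservation. */
    void *lo = mmap(base, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    void *hi = mmap(base + size, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd); /* the mappings keep the memory alive */
    if (lo == MAP_FAILED || hi == MAP_FAILED) { munmap(base, 2 * size); return -1; }

    r->base = base;
    r->size = size;
    r->head = r->tail = 0;
    return 0;
}

void magic_ring_free(magic_ring *r) { munmap(r->base, 2 * r->size); }

/* Up to (size - (head - tail)) bytes may be written here with ONE memcpy,
   even if the span runs past the end of the first mapping. */
unsigned char *magic_ring_write_ptr(magic_ring *r) { return r->base + r->head % r->size; }

/* Up to (head - tail) bytes may be read from here contiguously. */
unsigned char *magic_ring_read_ptr(magic_ring *r) { return r->base + r->tail % r->size; }
```

For example, if head and tail both sit three bytes before the end of a one-page buffer, a single memcpy of "abcdef" through the write pointer puts "def" at the start of the buffer, while a reader at the tail still sees all six bytes contiguously. The capacity must be a page multiple because the two mappings alias whole pages.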

~~~
AstralStorm
What are those locks I see in realtime functions? Sources of unbounded runtime
and hung machines?

Can you prove they are always safe (at least to the level of the Linux kernel's
LOCKDEP)? I bet you cannot, because the 1.1.0 changelog mentions "fixing" a
deadlock by tossing functionality.

Likewise, I see a lot of places that do reference counting (those _ref/_unref
calls). Those are likely heap deallocations on the RT side, which have
unbounded runtime unless you use a realtime memory allocator. I haven't
noticed anything like that in the code.

This coming from a guy who supposedly "has carefully read the documentation
for every audio backend and understands the purpose of every line of code."

For people who want to get a primer on such issues, I recommend this book:
[http://www.amazon.com/Real-Time-Systems-Programming-Language...](http://www.amazon.com/Real-Time-Systems-Programming-Languages-International/dp/0321417453)
It's a bit old despite updates (e.g. no C++11, no
C99), but still highly relevant. The techniques themselves and the analysis
parts are well written.
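The concern here is that locks and heap deallocation on the realtime thread have unbounded worst-case time. A common way to pass audio between threads without either is a single-producer, single-consumer ring with atomic indices. Below is a minimal C11 sketch; `spsc_ring`, its functions, and the fixed capacity are hypothetical, and nothing here describes how libsoundio itself is written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SPSC_CAP 1024 /* hypothetical capacity; must be a power of two */

/* Single-producer, single-consumer queue of samples. No locks and no
   allocation after setup, so push/pop do a bounded amount of work. */
typedef struct {
    float buf[SPSC_CAP];
    _Atomic size_t head; /* advanced only by the producer */
    _Atomic size_t tail; /* advanced only by the consumer */
} spsc_ring;

void spsc_init(spsc_ring *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Returns false instead of blocking when the ring is full. */
bool spsc_push(spsc_ring *q, float x) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    /* acquire: the consumer has finished reading any slot it released */
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == SPSC_CAP) return false;
    q->buf[head & (SPSC_CAP - 1)] = x;
    /* release: the slot write above is visible before the new head */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (e.g. the audio callback). Returns false when empty. */
bool spsc_pop(spsc_ring *q, float *out) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail) return false;
    *out = q->buf[tail & (SPSC_CAP - 1)];
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}
```

Both sides fail fast instead of waiting, so the realtime thread never blocks on the other one; what to do on underrun or overrun (output silence, drop samples) is left to the caller.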

~~~
AndyKelley
You are mistaken about the presence of locks and memory deallocation in real-
time functions. I invite you to point out specific lines of code and I'll help
you fix your understanding.

------
pierrec
I can indeed see this being useful for realtime audio. I spent a lot of time
trying to optimize IPC performance while building a 32-to-64-bit bridge for
audio plugins. I used a related hack that reduced inter-process
synchronization latency to zero (though that phrasing might be disputable).
The hack was essentially that whenever the host requested an audio block from
a plugin, it would get the output not for that block but for the previous
block instead. This introduced a fixed audio delay for bridged plugins, but it
eliminated IPC latency, so even complex tracks with lots of bridged plugins
wouldn't stutter.
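The one-block-delay hack can be sketched as a bridge that, on each host callback, returns the output it already has for the previous block and then hands the current block to the plugin. Everything here is hypothetical: the `delayed_bridge` type, the block size, and `double_gain`, which stands in for the bridged plugin. In a real bridge the hand-off would be an asynchronous request to the other process, finished before the next callback; here it runs inline, so the sketch shows only the data flow and the fixed one-block delay, not the latency savings.

```c
#include <assert.h>
#include <string.h>

#define BLOCK 4 /* hypothetical block size, in samples */

/* Stand-in for the out-of-process plugin's processing routine. */
typedef void (*process_fn)(const float *in, float *out, int n);

typedef struct {
    float ready[BLOCK]; /* output for the previous block, already available */
    process_fn remote;  /* hypothetical handle to the bridged plugin */
} delayed_bridge;

void bridge_init(delayed_bridge *b, process_fn remote) {
    assert(remote != NULL);
    memset(b->ready, 0, sizeof b->ready); /* the first block out is silence */
    b->remote = remote;
}

/* Host asks for block N: answer with block N-1 at once, then start block N. */
void bridge_process(delayed_bridge *b, const float *in, float *out) {
    memcpy(out, b->ready, sizeof b->ready);
    /* A real bridge would send this to the other process asynchronously
       and collect the result before the next callback. Here it runs inline. */
    b->remote(in, b->ready, BLOCK);
}

/* Example plugin: 2x gain. */
void double_gain(const float *in, float *out, int n) {
    for (int i = 0; i < n; i++) out[i] = 2.0f * in[i];
}
```

The first call returns silence; every later call returns the previous block's result, which is the fixed delay described above.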

Thankfully I don't have to put up with any of this now that all the software I
use works correctly in 64-bit. Still, latency is often a huge PITA with
realtime audio.

What's interesting is that as processors get faster, audio plugin makers
publish stuff that uses more and more processor. So much that I'm now
considering getting one of those new 16-core processors for my next
workstation.

~~~
wallacoloo
> What's interesting is that as processors get faster, audio plugin makers
> publish stuff that uses more and more processor.

Uggh. I know how you feel. For a while now, I've wanted to create some sort of
audio processing engine (either a very versatile LADSPA/VST/AU/etc. plugin or a
minimal DAW) wherein all the different components that usually run their own
code to generate/process audio are described as data instead (e.g. transfer
functions), and then the host decides how to actually compute the audio. This
would give lots of room to try new optimizations that aren't usually possible,
and you only have to optimize _one_ area of critical code, rather than
optimizing each individual plugin. This is especially beneficial when trying
to make conditional use of processor extensions like SSE.
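As a rough illustration of this idea, suppose each "plugin" reaches the host as FIR filter coefficients rather than code. The host can then collapse a chain of them into one filter before processing anything, since cascading two linear time-invariant filters is equivalent to one filter whose impulse response is the convolution of theirs. The `fir_desc` type, its 16-tap limit, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TAPS 16 /* arbitrary limit for the sketch */

/* Hypothetical plugin description: an FIR filter given purely as data. */
typedef struct {
    size_t len;
    double taps[MAX_TAPS];
} fir_desc;

/* Host-side optimization: filter a followed by filter b equals one filter
   whose taps are the convolution of a's and b's taps. */
fir_desc fir_compose(const fir_desc *a, const fir_desc *b) {
    fir_desc c = {0, {0}};
    c.len = a->len + b->len - 1;
    assert(c.len <= MAX_TAPS);
    for (size_t i = 0; i < a->len; i++)
        for (size_t j = 0; j < b->len; j++)
            c.taps[i + j] += a->taps[i] * b->taps[j];
    return c;
}

/* The one routine the host has to optimize: direct-form FIR over a block,
   assuming zero history before x[0]. */
void fir_apply(const fir_desc *f, const double *x, double *y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t k = 0; k < f->len && k <= i; k++)
            acc += f->taps[k] * x[i - k];
        y[i] = acc;
    }
}
```

Running the composite filter once gives the same samples as running the two filters in sequence, up to floating-point rounding, and only `fir_apply` needs SSE or other hand tuning.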

~~~
Kristine1975
This sounds a bit like what Mac OS X's Core Image does for pixel
transformations:
[https://en.wikipedia.org/wiki/Core_Image](https://en.wikipedia.org/wiki/Core_Image)

 _Instead of applying a series of filters individually, Core Image assembles a
dynamic instruction pipeline so that only one calculation needs to be applied
to the pixel data to achieve a cumulative effect... Regardless of the number
of filters, Core Image assembles the code for this instruction pipeline with a
just-in-time compiler_
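The fusion idea in the quote can be shown on the simplest case: per-pixel adjustments of the form y = gain * x + offset. A chain of them collapses into a single op, so each pixel is touched once instead of once per filter. This only illustrates the principle; per the quote, Core Image JIT-compiles general filter kernels, and the `affine_op` type and functions here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-channel filter: y = gain * x + offset. */
typedef struct { float gain, offset; } affine_op;

/* Fold a chain (applied in array order) into one equivalent op. */
affine_op fuse_ops(const affine_op *ops, size_t n) {
    assert(ops != NULL || n == 0);
    affine_op acc = {1.0f, 0.0f}; /* identity */
    for (size_t i = 0; i < n; i++) {
        /* next(acc(x)) = next.gain * (acc.gain * x + acc.offset) + next.offset */
        acc.offset = ops[i].gain * acc.offset + ops[i].offset;
        acc.gain *= ops[i].gain;
    }
    return acc;
}

/* One pass over the pixels, however many filters were fused. */
void apply_op(affine_op op, float *px, size_t n) {
    for (size_t i = 0; i < n; i++) px[i] = op.gain * px[i] + op.offset;
}
```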

------
jevinskie
See also :) [https://mikeash.com/pyblog/friday-qa-2012-02-17-ring-buffers...](https://mikeash.com/pyblog/friday-qa-2012-02-17-ring-buffers-and-mirrored-memory-part-ii.html)

------
rossy
It's definitely a failing of modern virtual-memory operating systems that
there's no nice way of doing this without getting the filesystem involved or
creating race conditions. Windows almost does this right with the Nt* APIs for
manipulating sections, but user space programs are not supposed to use those
directly. Instead they're stuck with the Windows 9x-era public APIs. A modern
API for manipulating virtual memory would be a really powerful thing to have.
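On POSIX systems the race can at least be avoided: reserve the whole 2 * size range first, then map over your own reservation with `MAP_FIXED`. The portable way to get a shareable backing object still goes through a file, though, which is the filesystem involvement. A minimal sketch, assuming a writable /tmp; `mirror_alloc` and `mirror_free` are hypothetical names.

```c
#define _DEFAULT_SOURCE /* exposes mkstemp, ftruncate, MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a 2*size region whose halves alias the same
   pages, or NULL. size must be a positive multiple of the page size. */
unsigned char *mirror_alloc(size_t size) {
    assert(size > 0 && size % (size_t)sysconf(_SC_PAGESIZE) == 0);

    /* Portable backing object: a temp file, unlinked at once so only the
       descriptor refers to it. This is the "filesystem involved" part. */
    char path[] = "/tmp/mirror-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return NULL;
    unlink(path);
    if (ftruncate(fd, (off_t)size) != 0) { close(fd); return NULL; }

    /* Racy alternative (avoid): mmap 2*size, munmap it, then try to map at
       that address again; another thread may take the range in between.
       Instead, keep a PROT_NONE reservation and overwrite it with MAP_FIXED,
       which only replaces pages this call already owns. */
    unsigned char *base = mmap(NULL, 2 * size, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { close(fd); return NULL; }
    void *lo = mmap(base, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    void *hi = mmap(base + size, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd); /* the mappings keep the file's pages alive */
    if (lo == MAP_FAILED || hi == MAP_FAILED) { munmap(base, 2 * size); return NULL; }
    return base;
}

void mirror_free(unsigned char *base, size_t size) { munmap(base, 2 * size); }
```

On Linux, `memfd_create` can replace the temporary file and keep the filesystem out of it entirely.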

~~~
to3m
I think you can do this on OS X using the Mach vm_XXX family of functions
(I've only used them for querying state, though, rather than rearranging the
mappings.)

The documentation is pretty terrible, though, so some experimentation may be
necessary. It's also even less like the POSIX parts than the NS bits are...

(Apple doesn't seem to tell you much; OK references are
[https://www.gnu.org/software/hurd/gnumach-doc/Virtual-Memory...](https://www.gnu.org/software/hurd/gnumach-doc/Virtual-Memory-Interface.html#Virtual-Memory-Interface),
[http://www.amazon.co.uk/Mac-OS-Internals-Systems-Approach/dp...](http://www.amazon.co.uk/Mac-OS-Internals-Systems-Approach/dp/0321278542),
and, of course,
[http://www.opensource.apple.com/source/xnu/](http://www.opensource.apple.com/source/xnu/).
The source has some reference-style documentation in it.)

~~~
aktau
This is mentioned in the comments on the original article; I'll quote:

> There are a couple of in-depth articles about implementing this same idea on
> Mach / OS X:

> [http://www.mikeash.com/pyblog/friday-qa-2012-02-03-ring-buff...](http://www.mikeash.com/pyblog/friday-qa-2012-02-03-ring-buffers-and-mirrored-memory-part-i.html)

> [http://www.mikeash.com/pyblog/friday-qa-2012-02-17-ring-buff...](http://www.mikeash.com/pyblog/friday-qa-2012-02-17-ring-buffers-and-mirrored-memory-part-ii.html)

------
davvid
The article mentions that Wikipedia has a "POSIX implementation", but that code
has since been scrubbed from the article.

I dug through the page history and was able to recover it:

[https://en.wikipedia.org/w/index.php?title=Circular_buffer&o...](https://en.wikipedia.org/w/index.php?title=Circular_buffer&oldid=647976593)

------
daurnimator
That's a really cool idea!

