Hacker News
Android’s 10 Millisecond Problem explained (superpowered.com)
431 points by lmedinas on April 16, 2015 | 299 comments

Reading about this takes me back to the late 90s. I had just gotten serious about recording music on my computer and had bought an 866 MHz Dell PC. I was using Windows 98, ASIO was still mostly a Steinberg-only development, and I couldn't afford Cubase, so I got a hand-me-down copy of Cakewalk Pro Audio 8. I remember the latency of that setup was about 35 ms, and I thought that was pretty good. I had to start playing slightly early to stay on time. With guitar and keys this was doable, but I'm sure it would have been maddening with drums. I think FL Studio 3 was the first "DAW" I had that supported ASIO. I remember being blown away by how much tighter everything sounded from that point on.

For those interested in why iOS is better than Android, a good summary is that Core Audio was designed from the ground up as a low-level, low-latency audio API, and it has fewer layers than Android's stack. When designing OS X, Apple knew they had a large professional media market, so that got priority. I'm interested to see the results of these guys' efforts, and have wondered when an ASIO equivalent would pop up for Android.

Is it possible that their audio API is much simpler because they control all of the implementations (the hardware interface) of iOS?

With Android, the variety of customized builds from various manufacturers does make this more challenging than it needs to be. As the article points out, many vendors simply break realtime audio by using slow drivers and slow paths.

However, as someone who has dealt with Android, iOS, Windows Phone, and BB audio APIs, Android's audio API really is just amateur hour compared to Apple's, even to the present day. In older versions it was simply unusable for anything more complicated than playing a sound effect or a fixed-length MP3 file. Today it works for more use cases but still abstracts concepts (like codecs and container parsing) in flawed and inconvenient places.

Windows Phone and BB are still even worse though.

Do you think Ubuntu for Phones (or whatever it's called) will suffer from this too, since Ubuntu uses a portion of Android?

I'm not that familiar with Ubuntu Phone, but from a look at the docs I am concerned that it may have some of the same media issues (or at least the same variations between vendors). According to the architectural overview in the porting guide[1] their Qt-based API sits on top of some abstractions of their own, which then sit on top of stagefright, OMX, and Android's HAL (and thus Android's drivers). Not reassuring.

I'd personally be much more at ease if I saw ALSA sitting directly under Qt, but I can understand why they'd want to leverage the huge Android hardware ecosystem.

1. https://developer.ubuntu.com/en/start/ubuntu-for-devices/por...

Not really. It's the same API they've been using on Macs since the PowerPC days, so they've implemented it for a bunch of different audio chips over the years and that includes a lot of common off-the-shelf solutions. The Hackintosh scene provides fully-capable drivers for all the audio chips you're likely to find in a PC today. Hardware selection or effort put into each driver are almost certainly not factors.

Considering OS X can run on regular old Intel PCs (including the Surface Pro 2!), I'm going to guess that they just have a rock-solid, low-latency solution, and not something tied to a specific platform.

Thank you, that's exactly what I was wondering. I didn't see anything in the article that gave good reasons for why iOS would be a significant improvement.

Hmm, or could it just be that there are a ton of very good free audio apps for Android, so there's no interest in paying for them?

> There are fewer layers in it vs android.

Fewer layers usually also means less flexibility.

Actually you have it inverted. Fewer layers means fewer abstractions. Less abstraction means you're lower level, thereby having increased flexibility.

Edit: Anyone care to elaborate why the downvotes? From a technical standpoint, I'm absolutely right.

Then assembly is more flexible than your favorite high-level language? It's true that you can, by definition, do more with low-level access. And I suppose that's one definition of "flexibility". But I think most developers would use the word to mean that various pieces of functionality can be put together easily. So they might say, for example, that Unix pipes are "flexible". I think I'd usually agree with that meaning.

Yes, assembly is more flexible than my favorite higher level language.

Because if you have access to assembly, you can implement whatever higher-level language or semantics you want.

Whereas if you only have a higher level language, you have to work with/around the abstractions baked into it.

For realtime applications it's better to at least have access to a low-level, less overhead API.

That is true.

For example, in assembly we can create a higher level language with an absolutely air-tight, precisely tracing garbage collector that is free of issues like false retention.

We cannot do that in C. That's because there are areas of the program state that are "off limits", and the compiler generates "GC ignorant" code.

Do you mind elaborating on this? What would be off limits, and what in assembly would allow a garbage collector free of false retention, compared with C?

What in assembly would allow freedom from false retention is that we know exactly what is in every register and memory location (because we put it there). We know where the GC has to look for root references and where it doesn't have to look.

In C, if we have a pointer p which is the last reference to some object, and is not used any more, and add the line "p = NULL", hoping to drop a reference so the object can be reclaimed, there is no guarantee that the compiler actually generates the code which does the assignment. Since the variable has no next use, and the compiler doesn't know anything about garbage collection, the assignment looks like wasteful, dead code that should be optimized away. Even if the scope finishes executing, the compiler can leave behind a memory location which still references the object.

Here is something else, not related to GC. In assembly language, we can make ourselves a calling convention for variadic functions which know how many arguments they have. As we build up the higher level language, it will have nicely featured variadic functions.

In C, we are stuck with <stdarg.h> which doesn't have a mechanism for the callee to know where the arguments end. The language has no flexibility to add this --- without resorting to approaches which will basically involve assembly language.

In C you can create a higher level language with an absolutely air-tight, precisely tracing garbage collector that is free of issues like false retention.

You can do it portably if you just don't store objects on the C stack, and you can do it non-portably if you do so.

> In C you can create a higher level language with an absolutely air-tight, precisely tracing garbage collector that is free of issues like false retention.

If so, you will be the first; I look forward to the "Show HN:" when it's done.

> You can do it portably if you just don't store objects on the C stack,

Sure, for example, you can do it portably in C if you write a complete emulator for an 80386, and then use that to run an assembly language program. That program isn't a utility for your C code; it's not extending the host C with a garbage collector or whatever else.

If you do not store object references on the stack, your use of C is severely crippled to the point that it's not really C any more. For one thing, C function arguments are on the stack (or, more abstractly, "automatic storage"), so say goodbye to conventional use of C argument passing: the backbone of most normal C programming.

That garbage collector isn't for C. C has automatic storage, and a proper garbage collector has to traverse it.

> and you can do it non-portably if you do so.

Sure, if non-portably means going as far as forking a specific C compiler with your custom hacks, and requiring that C compiler, or else living with the imprecision and taking a whack-a-mole approach to plugging the issues as they arise.

Chicken Scheme uses C function calls, and merely requires a contiguous stack for automatic storage (which is technically non-portable, but doesn't require forking a compiler):


[edit] And here is a toy Scheme interpreter that uses precise garbage collection to show off the Ravenbrook MPS:


You're ignoring the fact that speed is paramount (the whole point of the article). If you want flexibility and speed, you're going to look at using assembler (where you can do anything that can be done, running as fast as it can run) instead of a high-level language (where you have to work with packaged functionality and aggregates hiding complexity). No, it's not going to be easy.

The fact that Apple totally controls both sides of the symbiotic audio hardware & SDK running on a tiny set of products, while Android must have an SDK which accommodates an unknown range of hardware for hundreds of products, means iOS will have an inherent advantage.

    The fact that Apple totally controls both sides of the symbiotic audio hardware & SDK running on a tiny set of products

You can use CoreAudio with third-party audio hardware.

There are a TON of third-party audio interfaces that CoreAudio supports.

One of the counter arguments would be that more layers of indirection lead to more flexibility.


Heavily downvoted for taking a one-line comment about whether more or fewer layers of abstraction are more flexible, and using it to beat your "I hate Apple" drum about terms and conditions and control and Muh Freedoms.

You're being that guy, so obsessed with a thing that you see it everywhere and think every conversation is about it.

Have you ever seen Rogue Amoeba's Audio Hijack for OS X?

It proves the system supports a ton of flexibility.


I am not sure why this is downvoted. Abstraction layers are often added to give flexibility and ease of use, sometimes at the cost of performance. For example ALSA has features like muxing together audio from several apps, while a lower level API might only allow one app to use audio.

I think it's being downvoted because the point of abstraction layers is generally to make specific things easy at the cost of making most other things much harder. For example, Rails makes building web apps easier, but it would be much harder to make a command-line program in Rails. Ruby itself is fine for command-line programs, but there are a host of things that are impossible using it which are doable with raw C. That's why Ruby allows for C extensions: it breaks Ruby's abstractions, but restores some lost flexibility.

In that sense, flexibility and ease of use aren't synonyms; they're opposites. An open flame is flexible but not easy to use, so we have the toaster, which is easy to use but not flexible at all.

iOS can mix audio while preserving low latency. It works much better because CoreAudio is basically a full audio routing and mixing engine and forces more realtime-ish requirements on implementations. The system level engine just takes audio from apps as if they were submixers.

None of that stops you from having a high-level API on top, and in fact iOS has several at different levels of abstraction: AVAudioEngine gives you a lighter-weight, object-oriented engine that's less complex to set up, and AVAudioPlayer handles almost everything for you.

Just came here to say that. The fact that core audio is a great performing API means even less performance critical stuff still gets the benefits of it even if the easy to use abstractions like AVAudioPlayer give up some flexibility and performance vs the native API. While abstraction can make things more flexible, I would argue that the main purpose is ease of use and hiding implementation details.

In your ALSA example, the lower-level API doesn't prevent you from mixing audio; it simply isn't as easy, because you must implement it yourself. Arguably that is more flexible as well.
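To make "implement it yourself" concrete: mixing streams in software is, at its simplest, a per-sample sum with clipping. A minimal sketch, assuming 16-bit signed PCM held in plain Python lists; `mix_int16` is a hypothetical helper, not an ALSA call:

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_int16(*streams: list[int]) -> list[int]:
    """Mix equal-length 16-bit PCM streams by summing and saturating."""
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("streams must have the same length")
    mixed = []
    for samples in zip(*streams):
        total = sum(samples)
        # Clamp to the int16 range instead of letting the sum wrap around,
        # which would be heard as harsh distortion.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


print(mix_int16([1000, 30000, -30000], [500, 10000, -10000]))
```

Real mixers usually work in floating point and scale or soft-limit rather than hard-clip, but the structure is the same: the "abstraction" is a loop the application could run itself.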

Abstractions can hide features of the hardware but they cannot create new hardware. Whatever the abstraction is doing the client software could do instead with the lower level API.

So I would say abstraction layers are often added to give ease of use, sometimes at the cost of performance and flexibility.

I think this whole subthread is about different meanings of "flexibility".

An audio abstraction layer can allow different kinds of audio hardware without changing the apps, which is a kind of flexibility.

Interesting to read of the details behind this issue. This has been a serious issue for me - it's actually why I own an iPhone.

My first modern touch device was an iPod touch 4. I downloaded GarageBand and, as a longtime multi-instrumentalist and composer, loved it. I was amazed by how well the touch instruments worked and how easily I could record riffs and flesh out small snippets of songs. It ran almost flawlessly on the 4th-gen Touch.

Next, I decided to buy an Android phone, a Motorola Droid 2. I was surprised to find that despite its power advantage over the iPod, none of the music apps I tried were usable. The drum programs, for instance, were so laggy and unpredictable as to be worse than useless. Hit a drum, and you might hear it seemingly instantly, maybe 1/8 of a second later, maybe half a second later, possibly never. Meanwhile the tiny iPod could play GarageBand instruments so well that one could use it for live performance.

I upgraded my phone twice, first to a Droid X, then to a Galaxy S3... Each time I was disappointed that the improved specs brought no improvement in the terrible audio performance.

Currently I have an iPhone 6 so I can use GarageBand. Kudos to Apple for doing this right; it's the best app I've ever used.

Thanks for a musician's perspective!

Google has a whole section on this here (if you want more technicals and less sales-pitch):


including a bunch of measurements of Nexus devices here (even going as far back as the Nexus One on Gingerbread):


And here are some measurements for iOS as well (Superpowered Mobile Audio Latency Test App for Android and iOS.)

The iOS devices come out at 6-18ms, Androids at 17-860ms. The faster Androids have Samsung's Professional Audio SDK.


TL;DR: the Linux layer (ALSA) and the Java layer (Audio Flinger) use widely compatible but high-latency techniques, whereas Apple designed their APIs and hardware so that these layers can be optimized to almost nothing. (From the article: http://superpowered.com/wp-content/uploads/2015/04/Android-A...)

ALSA is not the problem; it works fine for the low-latency case. I can reliably run soundcards with a light processing load (e.g. a reverb effect) at 96 kHz @ 64 frames/period (2 × 0.7 ms → 1.4 ms latency) on a quad-core i5, but most of the time I only record and will settle for 1024 frames/period or so (2 × ~10 ms → ~20 ms). The period size, just for completeness, is the number of samples (frames) recorded in each block that is forwarded to the audio processing application.

If whatever audio framework you use doesn't allow processing with an input-to-output delay (latency) of two times the period size, it's broken (probably the case for Audio Flinger on Android, though I don't know much about it).

    ➜  ~  jackd -d alsa -p 64 -r 96000
    jackdmp 1.9.10
    Copyright 2001-2005 Paul Davis and others.
    Copyright 2004-2014 Grame.
    creating alsa driver hw:0|hw:0|64|2|96000|0|0|nomon|swmeter|-|32bit
    configuring for 96000Hz, period = 64 frames (0.7 ms), buffer = 2 periods
    ALSA: final selected sample format for capture: 32bit integer little-endian
    ALSA: use 2 periods for capture
    ALSA: final selected sample format for playback: 32bit integer little-endian
    ALSA: use 2 periods for playback
(this is on my laptop, just for illustration purposes)
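The arithmetic behind those figures is frames divided by sample rate, multiplied by the number of buffered periods. A minimal sketch, assuming two periods of buffering as in the jackd output (the function names are illustrative only):

```python
def period_ms(frames: int, rate_hz: int) -> float:
    """Duration of one period (block of frames) in milliseconds."""
    return 1000.0 * frames / rate_hz


def io_latency_ms(frames: int, rate_hz: int, periods: int = 2) -> float:
    """Input-to-output delay, assuming `periods` periods are buffered."""
    return periods * period_ms(frames, rate_hz)


for frames in (64, 1024):
    print(f"{frames:4d} frames @ 96 kHz: "
          f"period {period_ms(frames, 96000):.2f} ms, "
          f"latency {io_latency_ms(frames, 96000):.2f} ms")
```

Exactly, 64 frames at 96 kHz is 0.67 ms per period and 1.33 ms for two periods; jackd rounds the period to 0.7 ms, which is where the 1.4 ms figure comes from. 1024 frames gives about 10.7 ms per period and 21.3 ms in total.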

But ... why is there a period size? Isn't that a broken design that can only introduce latency? What is wrong with "however much audio data is ready when the application asks, send it"?

Well... that's how all audio chipsets work nowadays. Only some DSPs will be able to efficiently handle single-frame data processing, but they have the help of dedicated address generators and lightweight interrupts synchronized to the digital interface.

If you write "process however much audio data is ready", then you already imply that your CPU will not be up to speed to process 48000 interrupts/second reliably and you need some buffering.

And if you have to assume that sometimes you'll miss 100 samples (which you'll then process en bloc), then to work reliably you'll have to start at least 100 samples early so that you don't miss the DAC's deadline, because the DAC will inexorably output one sample every 1/48000th of a second. This already implies some kind of periodic processing of blocks, doesn't it?

(And yes, such a scheme will theoretically allow you to halve the latency, from 2 × period size to 1 × period size + the time for processing.)

Third, a lot of the algorithms for processing audio can be implemented much more efficiently if you have a known block size and don't have to calculate your filters or convolutions with a constantly changing number of samples for every step.

Also efficiency of processing will decrease (reloading the cache after each interrupt when switching from processing plugin to processing plugin), so the time spent on calculating per frame will go up if your period size gets smaller. At one point you'll need exactly one "period size" to calculate one period size worth of samples: That's the maximum your machine can handle, and at that point you'll have a latency of your "period size"*2, which is exactly the same as running with a fixed period-size ;-). And as you can choose the period-size rather freely (maybe completely arbitrary, maybe 2^n, depends on the chipset/hardware) there's no disadvantage left.

As someone who has actually done a good amount of soft-real-time audio programming, I can tell that you probably haven't. Everything you are saying about CPU speeds is made-up nonsense. Look into how these things are done on systems where folks actually care about latency (for example, commercial audio hardware, game consoles, etc).

I understand that people are downvoting this because it is just a negative comment, or something. But, I felt it was VERY important to call out information that is clearly false. Someone who doesn't know about audio programming might read the above post and think "hey that sounds plausible, I learned something today" when in fact they were deeply misled. Registering dissent is important and I tried not to be rude about it. I did go on to give a sketch of reasons in the thread below (but it is a complex issue with a lot of details; exact situations differ on every platform; etc, etc.)

I didn't downvote you and was genuinely interested in why you were considering my information to be incorrect. And I now realize it's because I've always worked with systems where processing is always strongly synced to the central frame/sample/... clock. Also I read your initial comment as "why don't we use 'process every single sample' to reduce latency at all costs" which is -as you wrote- clearly a bad idea. Sorry for misrepresenting that.

I thought it was a tad condescending. Instead of:

>As someone who has actually done a good amount of soft-real-time audio programming, I can tell that you probably haven't. Everything you are saying about CPU speeds is made-up nonsense.

You could have written:

>I've actually done a good amount of soft-real-time audio programming and everything you're saying about CPU speeds doesn't make sense.

I think it's good to call out what you see as misleading information but that may have been going a bit too far.

Could you tell me which claim I made about CPU speeds is unsubstantiated nonsense?

I could type up a thorough explanation, but it would take about an hour, and I have a lot to do. It is actually not a bad idea to do such a write-up, but I don't think the appropriate venue for it is an ephemeral post on Hacker News ... I'd rather blog it somewhere that's more suitable for long-term reference.

But I'll drop a few hints. First of all, nobody is talking about running interrupts at 48kHz. That is complete nonsense.

The central problem to solve is that you have two loops running and they need to be coordinated: the hardware is running in a loop generating samples, and the software is running in a (much more complicated) loop consuming samples. The question is how to coordinate the passing of data between these with minimal latency and maximum flexibility.

If you force things to fill fixed-size buffers before letting the software see them (say, 480 samples or whatever), it is easy to see the problems with latency and variance: take a software loop with some ideal fixed frame time T, and look at what happens when 1/T is not 100 Hz (say it is a hard 60 Hz, as on a current game console). See what happens to latency and variance when the hardware is passing you packets every 10 ms and you are asking for them every 16.7 ms.
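A small simulation makes the 10 ms vs 16.7 ms case concrete. The assumptions are mine, not the commenter's: each hardware block arrives at the end of its 10 ms, the software loop wakes at exact multiples of 1000/60 ms and drains everything queued, and a block arriving exactly on a frame boundary is consumed at once. The function name is hypothetical; exact fractions avoid floating-point noise at the boundaries.

```python
import math
from fractions import Fraction


def push_model_waits(block_ms: int, frame_hz: int, n_blocks: int) -> list[Fraction]:
    """Extra time (ms) each hardware block sits queued before a software frame reads it."""
    frame_ms = Fraction(1000, frame_hz)
    waits = []
    for k in range(1, n_blocks + 1):
        arrival = Fraction(block_ms * k)
        # First software frame at or after the block's arrival.
        next_frame = math.ceil(arrival / frame_ms) * frame_ms
        waits.append(next_frame - arrival)
    return waits


waits = push_model_waits(block_ms=10, frame_hz=60, n_blocks=30)
print(sorted({float(w) for w in waits}))
print(f"min {float(min(waits)):.2f} ms, max {float(max(waits)):.2f} ms")
```

Under these assumptions the queueing delay cycles through 0, 3.3, 6.7, 10 and 13.3 ms, on top of the 10 ms it took to fill each block, so latency is both larger and jittery. That beat between two unrelated periods is what a pull model avoids.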

The key is to remove one of these fixed frequencies so that you don't have this problem. Since the one coming from the hardware is completely fictitious, that is the one to remove. Instead of pushing data to the software every 10ms, you let the software pull data at whatever rate it is ready to handle that data, thus giving you a system with only one coarse-grained component, which minimizes latency.

You are not running interrupts at 48kHz or ten billion terahertz, you are running them exactly when the application needs them, which in this case is 16.7ms (but might be 8.3ms or 10ms or a variable frame rate).

You don't have to recompute any of the filters in your front-end software based on changing amounts of data coming in from the driver. The very suggestion is nonsense; if you are doing that, it is a clear sign that your audio processing is terrible because there is a dependency between chunk size and output data. It should be obvious that your output should be a function of the input waveform only. To achieve this, you just save up old samples after you have played them, and run your filter over those plus the new samples. None of this has anything to do with what comes in from the driver when and how big.
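To illustrate "save up old samples and run your filter over those plus the new samples": a streaming FIR filter that carries its last few input samples across calls produces identical output however the input is chunked. A minimal sketch; `StreamingFIR` and `run` are hypothetical names, and a real implementation would use a preallocated ring buffer or SIMD rather than rebuilding a list for every sample.

```python
from collections import deque


class StreamingFIR:
    """FIR filter whose output depends only on the input waveform, not on chunk sizes."""

    def __init__(self, taps: list[float]):
        if not taps:
            raise ValueError("need at least one tap")
        self.taps = list(taps)
        # The last len(taps) - 1 inputs, oldest first; starts out as silence.
        n_hist = len(self.taps) - 1
        self.history = deque([0.0] * n_hist, maxlen=n_hist)

    def process(self, chunk: list[float]) -> list[float]:
        out = []
        for x in chunk:
            window = list(self.history) + [x]  # oldest ... newest
            # y[n] = sum over k of taps[k] * x[n - k]
            out.append(sum(t * s for t, s in zip(self.taps, reversed(window))))
            self.history.append(x)  # maxlen discards the oldest sample
        return out


def run(taps: list[float], signal: list[float], chunk_sizes: list[int]) -> list[float]:
    """Feed `signal` to a fresh filter in chunks of the given sizes."""
    fir, out, pos = StreamingFIR(taps), [], 0
    for size in chunk_sizes:
        out.extend(fir.process(signal[pos:pos + size]))
        pos += size
    return out


signal = [float(i % 7) for i in range(20)]
taps = [0.25, 0.5, 0.25]
print(run(taps, signal, [20]) == run(taps, signal, [3, 1, 7, 2, 7]))
```

Each output sample is computed from the same window in the same order regardless of chunking, so the two runs match exactly, not just approximately.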

Edit: I should point out, by the way, that this extends to purely software-interface issues. Any audio issue where the paradigm is "give the API a callback and it will get called once in a while with samples" is terrible for multiple reasons, at least one of which is explained above. I talked to the SDL guys about this and to their credit they saw the problem immediately and SDL2 now has an application-pull way to get samples (I don't know how well it is supported on various platforms, or whether it is just a wrapper over the thread thing though, which would be Not Very Good.)

> Any audio issue where the paradigm is "give the API a callback and it will get called once in a while with samples" is terrible for multiple reasons

This is, actually, how most professional audio APIs are designed and they generally work quite well. ASIO, VST, JACK, PortAudio, CoreAudio, etc.

The other commenter was talking about audio software that both consumes and produces samples at a fixed rate. Clearly, if the audio software is late grabbing 64 samples from the input device, it's also late delivering the next 64 to the output device, and there will be a dropout. The output sample clock has to be the timing master, and the software can never be late, and since it's also waiting for the input audio, it can never be early enough to "get ahead", either.

I am not sure we can make the assumption that the input and output devices are on the same clocks or run at the same rates. Maybe they are (in a good system you'd hope they would be), but I can think of a lot of cases where that wouldn't be true.

However, even when they are synced, you can still easily see the problem. The software is never going to be able to do its job in zero time, so we always take a delay of at least one buffer-size in the software. If the software is good and amazing (and does not use a garbage collector, for example) we will take only one delay between input and output. So our latency is directly proportional to the buffer size: smaller buffer, less latency. (That delay is actually at least 3x the duration represented by the buffer size, because you have to fill the input buffer, take your 1-buffer's-worth-of-time delay in the software, then fill the output buffer).

So in this specific case you might tend toward an architecture where samples get pushed to the software and the software just acts as an event handler for the samples. That's fine, except if the software also needs to do graphics or complex simulation, that event-handler model falls apart really quickly and it is just better to do it the other way. (If you are not doing complex simulation, maybe your audio happens in one thread and the main program that is doing rendering, etc just pokes occasional control values into that thread as the user presses keys. If you are doing complex simulation like a game, VR, etc, then whatever is producing your audio has to have a much more thorough conversation with the state held by the main thread.)

If you want to tend toward a buffered-chunk-of-samples-architecture, for some particular problem set that may make sense, but it also becomes obvious that you want that size to be very small. Not, for example, 480 samples. (A 10-millisecond buffer in the case discussed above implies at least a 30-millisecond latency).
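The 3× figure is simple arithmetic. A sketch assuming the simplified model described in this comment (one buffer to fill the input, one buffer's worth of processing time, one buffer to play out); real drivers may add more buffering on top, and the function name is hypothetical:

```python
def min_round_trip_ms(buffer_frames: int, rate_hz: int) -> float:
    """Lower bound on input-to-output latency under the 3-buffer model."""
    buffer_ms = 1000.0 * buffer_frames / rate_hz
    # Fill input buffer + process for up to one buffer + play out output buffer.
    return 3 * buffer_ms


for frames in (480, 64):
    print(frames, "frames @ 48 kHz ->", round(min_round_trip_ms(frames, 48000), 2), "ms")
```

At 48 kHz, 480 frames is 10 ms, hence at least 30 ms round trip; shrinking the buffer to 64 frames brings the same bound down to about 4 ms.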

Music or video production studios typically have a central clock, so for this use case the sample rates should be perfect. But even if the input and output devices are on perfect clocks, with NTSC (59.94 Hz) you'd need a very odd number of samples per video frame in your software if your processing happened at an integer fraction of the video frame rate.
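The "very odd number" can be made exact: the NTSC rate usually written as 59.94 Hz is exactly 60000/1001 Hz, so at 48 kHz each video frame spans 800.8 samples, and the pattern only repeats every five frames. A sketch using Python's `fractions` module:

```python
from fractions import Fraction

NTSC_FPS = Fraction(60000, 1001)  # the exact rate usually quoted as 59.94 Hz
SAMPLE_RATE = 48000

samples_per_frame = SAMPLE_RATE / NTSC_FPS
print(samples_per_frame, float(samples_per_frame))

# Smallest number of video frames that contains a whole number of samples.
frames_per_cycle = samples_per_frame.denominator
print(frames_per_cycle, "frames =", samples_per_frame * frames_per_cycle, "samples")
```

So software that processes audio once per video frame has to alternate block sizes (for example 801, 801, 800, 801, 801 over five frames, one of several possible orderings) to stay locked to the video.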

Do you know whether studios use 48000Hz with 59.94fps or 48000/1.001 ≈ 47952Hz? Does converting from 24fps film to 23.976fps Blu-ray require resampling the audio? Or are films recoded at 48048Hz and then slowed to 48000 for consumer release?

The short answer is that it's complicated. Digital film (DCP) is typically 24 fps, IIRC, and that doesn't divide evenly into 60 or 50. The difference is big enough that you need to drop a frame and/or stretch the audio, and sometimes this doesn't go so well.

There's a relatively recent trend to try and record digital all the way, and this is also complicated. Record at 24 fps? At 30? At 60? 60 fps 4k is a lot of data. And sound is actually the major pain point -- video frames you can generally just drop/double, speed up/down a little to even things out. But 24 fps to 60 fps creates big enough gaps that audio pitch can become an issue.

If everything happens strictly synchronous to your audio clock, then fixed block processing is the way to go.

But jblow is right in that when you have to feed samples from a non-synchronized source into your processing/game/video application/..., trying to work with the fixed audio block size will be terrible or will require additional synchronization somewhere else, such as an adaptive resampler on the input/output of your "main loop".

> Since the one coming from the hardware is completely fictitious

Why do you say this? The USB audio card (or similar) is generating blocks of audio at a fixed rate, no?

Maybe for video playback or games you need to synchronize audio and video, but there is no need to do that for music production apps.

If you are writing some sort of synth, as soon as you receive a midi note or a tap, trigger the synth and the note will play in the next audio block. No need to wait for the GUI to update.

If you are doing some sort of effect, grab the input data, process and have it ready for the next block out. I don't understand why you need a second loop.

Well, it depends on how that specific hardware is designed, but we could say that hardware that is designed to generate only fixed blocks of audio is very poor from a latency perspective.

I think you will find, though, that most hardware isn't this way, and to the extent this problem exists, it is usually an API or driver model problem.

If you're talking about a sound card for a PC, probably it is filling a ring buffer and it's the operating system (or application)'s job to DMA the samples before the ring buffer fills up, but how many samples is dependent upon when you do the transfer. But the hardware side of things is not something I know much about.
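A sketch of the pull model being described: a fixed-capacity ring buffer that the hardware side fills continuously and that the application drains by however many samples happen to be available when it asks. This is a hypothetical Python illustration of the data structure, not any driver's API; it is single-threaded as written, and a real producer/consumer version would need atomic or locked indices.

```python
from typing import Optional


class RingBuffer:
    """Fixed-capacity sample FIFO; storage is allocated once, up front."""

    def __init__(self, capacity: int):
        self.buf = [0] * capacity
        self.capacity = capacity
        self.read_pos = 0   # total samples ever read
        self.write_pos = 0  # total samples ever written

    def available(self) -> int:
        return self.write_pos - self.read_pos

    def write(self, samples: list[int]) -> int:
        """Producer side (think: DMA). Writes what fits; returns the count written."""
        n = min(len(samples), self.capacity - self.available())
        for i in range(n):
            self.buf[(self.write_pos + i) % self.capacity] = samples[i]
        self.write_pos += n
        return n

    def read(self, max_samples: Optional[int] = None) -> list[int]:
        """Consumer side: pull everything available, or at most max_samples."""
        n = self.available()
        if max_samples is not None:
            n = min(n, max_samples)
        out = [self.buf[(self.read_pos + i) % self.capacity] for i in range(n)]
        self.read_pos += n
        return out


rb = RingBuffer(8)
rb.write([1, 2, 3])
print(rb.read())          # the consumer takes whatever is there, no block to fill
rb.write(list(range(4, 13)))
print(rb.read(5), rb.read())
```

The second write only stores 8 of its 9 samples because the buffer is full; that dropped sample is the overrun the OS or application has to prevent by reading often enough.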

> If you are writing some sort of synth, as soon as you receive a midi note or a tap, trigger the synth and the note will play in the next audio block

Yeah, and waiting for "the next audio block" to start is additional latency that you shouldn't have to suffer.

> If you are doing some sort of effect, grab the input data, process and have it ready for the next block out. I don't understand why you need a second loop.

The block of audio data you are postulating is the result of one of the loops: the loop in the audio driver that fills the block and then issues the block to user level when the block is full. My whole point is you almost never want to do it that way.

Can you recommend some good code or APIs to check out that aren't block based? I usually use JUCE, which is block based, and I assumed it was just a thin wrapper around the OS APIs, which are also block based.

If you want a further analogy, it's like public transit. Which is a better commute: You take Bus A, which then drops you off at the stop for Bus B, at which you have to wait a varying and indeterminate amount of time, because the schedules for Bus A and Bus B are not synchronized; or just taking Bus C, that travels the same route without stopping?

Doesn't a lot of audio processing rely on FFTs, for which you need large block sizes? I agree that constantly varying fractional block sizes don't seem like a good idea.

1) Userspace sleeps until audio data is available because there's an eternity of clock cycles between each sample; do you want to be woken up after every sample if you're done doing work or a handful? You could also busy wait but that kills battery life.

2) In order to hand you off some samples you have to at least make one copy. It's convenient to be able to copy an entire _something_ without worrying about the sound card trying to DMA into it (and any hardware-specific details to make that possible).

The difference here is that with something like jackd (similar conceptually to CoreAudio or ASIO) there is just the hardware buffer in the kernel and the user buffer in jack which can be basically "shared" by all jack-enabled apps without additional copying on the user-space side. On the other hand you can't do sample-rate conversion and per-app volume control with something like that.

But if you're doing audio software, you're not worrying whether Flash is too loud and Skype is too soft. It's a whole-system thing enabled by the software, and all the buffers and latency are managed there.

Why is (2) dependent on anything regarding the number of samples you get at once? Sure, suppose there is a maximum block size; why does anything regarding copying "an entire something" require you to have filled that entire block size with live data? Why can't you just copy however much is available in the buffer?

I don't understand why copies are even relevant: you can make several extra copies and nobody will ever notice. Audio data is trivial in modern systems. Let's say there are two channels coming in; 48000 * 2 * 2 bytes per second is an absolutely trivial amount of data to copy and has been for many years. Building some convoluted (and unreliable) system just to prevent one copy per application, when each application is going to be doing a lot of nontrivial processing on that data, strikes me as foolish. But don't listen to me, look at the fact that Linux audio is still famously unreliable. If the way it's done were a good idea, it would actually work and everyone would be happy with it.

Typically because the hardware delivers audio in blocks. I know mostly about how USB works, but I imagine it applies to other hardware types as well.

USB has several transfer types: interrupt, bulk and isochronous.

Interrupt is initiated by the device so it wouldn't help you read variable amounts.

Bulk is good for mass data transfers, but has no guaranteed access to the bus, so you could get audio dropouts when accessing another USB device.

Isochronous can reserve bus bandwidth and have latency guarantees, but must occur on a fixed schedule. Since they are on a fixed schedule, they always have the same amount of data, hence a fixed block size.

Since data is arriving at the OS in fixed blocks, the lowest-latency way to handle the data is to deal with the blocks when they arrive. If you wanted to read variable amounts of data, you'd need to add a buffer on top that could hold a variable amount of data, which would add latency.

Copying variable amounts of data isn't slow, but dealing with dynamic memory allocation is. If you needed to allocate a different sized block every few milliseconds you'd be spending the majority of your time allocating memory rather than processing audio.

This system can work very well: OS X, iOS, Windows and Linux can all get very low latencies. The issues with Android have nothing to do with block sizes, but with something else in its architecture.

This is actually how PulseAudio attempts to work. It doesn't always work out (skype's startup sound being infamously jittery), but the thought is there.

Audio Flinger is not Java. On the graph you see absolutely no part which runs on Java.

Android's media architecture with its "push" design is not more compatible than anything else with a "pull". The many layers don't add any flexibility; they are "only" the result of many years of hacking.

ALSA with JACK has been shown to give latencies less than 2ms.

I don't know about Android but you can get quite good audio latency with a full Linux install, using ALSA(via JACK), provided your audio interface is up to the task.

If you’re interested in this topic, you should listen to episode 20 of the Android Developers Backstage podcast. [1]

Raph Levien talks about audio latency and how they started working on minimising it in Lollipop. He mentions that it’s still an ongoing process.

The relevant part starts at 35 minutes into the episode.

[1]: http://androidbackstage.blogspot.com/2015/01/episode-20-font...

Also this:

Related presentation at Google I/O 2013. https://developers.google.com/events/io/2013/sessions/325993...

This was probably the best presentation I saw during Google I/O.

This is a great description of the lowest level problem. But for those wondering why iOS doesn't have the problem or why only 10ms is such a difference, the answer is that this is only the beginning of the problem of Android audio latency.

There are still additional problems that add latency that span the entire Android stack from the actual hardware and drivers, to the kernel and scheduler, to the Android implementation of their audio APIs.

Anybody who has tried to do any serious audio on Android knows the infamous Bug 3434:


Google I/O 2013 did a pretty good talk on the problem and shows how there are problems across the entire stack. Glenn Kasten pretty much carries the brunt of all the audio problems with Android. I find it telling that he had to handcraft his own latency measurement device using an oscilloscope and LED because there were no actual tools built into the OS to help them analyze performance.


Audio has been terrible since Android's inception. It has improved a little over time, but unfortunately, 7 years later it is still pretty much unacceptable for any serious work.

Android's latency as a whole has been terrible since its inception: screen, input and audio.

Android is famous for $600 2GHz phones that drop frames when doing simple animations :(.

There is something I don't understand; maybe someone here can explain:

Sound travels at about 340 m/s (in a typical room). That means it travels about 3.4 metres in 10 milliseconds. Therefore another way to get a 10 millisecond problem is to stand 3.4 metres from the orchestra.

Most people sit farther than 3.4 metres from the orchestra, yet they don't complain about a lag between when the violin bow moves and the sound is heard. Why not?

(The speed of light is so fast that we can assume it's effectively infinite for the purposes of this argument.)

The audience doesn't need instant feedback and low latency, the performers do. If you listen to a classic rock album, you're hearing the sounds with a "latency" decades after they've been played. But the playing is cohesive and tight. If one of the guitarists was consistently 50ms off, you would notice it.

> Most Android apps have more than 100 ms of audio output latency, and more than 200 ms of round-trip (audio input to audio output) latency.

It's also much more than just 10ms latency. I play digital instruments all the time, and the latency can be as high as 15ms before I can tell. I don't know if an audience can perceive a 15ms latency, especially because you tend to "play early" to have notes land on time. But it's very upsetting for performing.

I haven't tried playing with music apps on Android recently, but when I did, the latency was not just long, but inconsistent, and would result in stuttering in the audio.

> If one of the guitarists was consistently 50ms off, you would notice it.

The best movie about this, ever: https://youtu.be/VnuImW1dWAk

that really is one of the top movies in recent history - like a shawshank for musicians...

Probably because they're not directly interacting with it, they're just watching it, which makes the delay a lot less noticeable. The 10ms delay is more of a problem for interactive apps where you eg. touch the screen and expect it to instantly make a sound. In this kind of feedback loop, even small delays are distracting and it becomes difficult to keep a beat because the lag is perceptible. If you tried to remotely play an instrument from 3.4 meters away, you would probably notice it too.

Sort of like why audience clapping tends to get really messy until the performer does the "hands above the head" thing to get them back in sync.

Clapping synchronisation is actually a well studied spontaneous occurrence, that just takes O(1) periods, see this article for some analysis: http://arxiv.org/pdf/cond-mat/0006423v1.pdf

He's talking about clapping in rhythm during the music, not applause afterward (fun pdf tho)

Audiences have a natural tendency to slow down. They're mostly not musicians and I imagine their clapping as a response to the music (and the other clappers) rather than internalizing the music as a performer would.

If you start from a perfectly synchronized audience and assume they clap reactively, you'd expect them to delay each of their claps by ~(1/2 × width of the room) / speed of sound. A good example of why in music you need a good rhythm (an internal constant-frequency driver) instead of acting reactively to perform well.

Yes, the audience will slow down regardless. Either the band slows down with them or they keep the same tempo and the clapping gets screwed up. It's frustrating as a performer trying to fight the audience and the only solution I know is to have the house levels sufficiently loud that the audience continually adjusts.

Lag is not an issue for the listener, it's an issue for the performer. An extra 10ms of lag when wearing headphones and playing a keyboard can make you feel completely disconnected from your instrument.

I'm a musician, 10ms latency would be fine, however as they note, most Android apps have 100ms latency, or 200ms round-trip latency. That is definitely not usable.

The funny part is that pipe organists would laugh at 100ms latency and say "cry me a river". With the pneumatics combined with the distance of the pipes from the performer, pipe organ latencies can be in the 200 to 500ms range. I asked my sister how she managed it and she said it was just a learned skill. She had to learn to completely tune out what she was hearing and play with the beat and music completely internalized.

The difference, of course, is that organists are usually not syncing up to other instruments. If there are other instruments involved, they tend to sync up with the organ.

The latency has to be predictable and for most people probably also consistent.

Huh, those tests show it has 35-50, not 200ms O.o

They also mention that those tests were made in the best case scenario (disabling noise correction, etc. for the sake of speed) with the best performing mobile, and by following Google's low latency guidelines. Most Android apps don't have the first two luxuries in the general case, and (apparently) don't bother with the last one, thus the much higher latencies in practice.

If you actually read the words in the article you will see that that test was the absolute best case they could find.

I am too lazy to try to find the exact numbers and versions but back in the Android 1.x, 2.x days, the latency was in the 200 ms ballpark. So things have improved a lot since then (even though there is still a lot of ground to cover)

Even at 35-50ms latency, it's still unusable.

Our brains perceive what we hear trailing what we see by small amounts to be completely normal. If you show people a video of an orchestra concert with the sound of a violin coming 50ms before seeing the bow move, most would immediately notice something is off.

Musicians performing together, however, is a much harder problem than just listening. Ask anyone who has ever performed in a DCI-style drum corps; they will tell you that compensating for hearing someone on the other side of the field 200ms or so late is incredibly difficult.

Probably because in this orchestra example the feedback is purely passive (you didn't take any action), whereas with Android the delays are after an action you took, so it's psychologically jarring (seeing the response to your action immediately, and hearing the response roughly 10ms later). Just my guess.

You are exactly right. This is why orchestras need a conductor who provides a visual signal for tempo, and marching bands have a drum major with a huge baton, while rock bands can just listen to the bass drum.

Android's problem is that it has a 100-200 ms lag. The stated 10 ms is the goal.

Also, most people don't notice the lag between the instruments of an orchestra because the instruments are close to each other. Their distance to the listener is not relevant.

80ms/30 meters rule. The human brain compensates for the audio/video discrepancy up to roughly that amount.


tldr: Brain is inherently parallel, nothing happens in sync. In order to make sense of the outside world higher level functions are presented with artificially coordinated stimuli.

If you're way in the back in a big concert hall, you can definitely notice the lag, but it's not a big deal. The problem would be if different instruments reached your ears at different times, and since even the closest seat is still a considerable distance from the orchestra, you're not going to be hearing sounds more than about ~20ms apart. The reverb of the hall also mushes everything together.

The problem you describe is a very real problem, however, for the musicians themselves. If you're sitting in a big orchestra and you try to index your playing off someone sitting on the other end of the orchestra, you will not be in time. That's why there's a conductor, so that the orchestra can be synchronized at the speed of light rather than of sound.

IANA neuroscientist but it seems reasonable that the brain will fix up small errors. It has to anyway, because it has its own input latency. Besides, the article is referring to discrepancies between multiple audio tracks, which don't depend on vision at all.

For a real-world example, listen to 2 TVs several meters apart and tuned to the same channel. At least with OTA or cable you can expect them to be playing ~simultaneously but the skew between the received signals is easily perceptible.

>Besides, the article is referring to discrepancies between multiple audio tracks, which don't depend on vision at all.

They do depend upon vision or touch if it is an interactive app.

>For a real-world example, listen to 2 TVs several meters apart and tuned to the same channel. At least with OTA or cable you can expect them to be playing ~simultaneously but the skew between the received signals is easily perceptible.

I believe what you're experiencing is the difference in decoding latency between different models of TV set. Several meters (3m) represents only ~10ns of maximum delay (in practice, low tens of ns if the cables are longer than necessary). It would not be directly perceptible by a person. Signal delay in a cable is ~1ns/ft.

>They do depend upon vision or touch if it is an interactive app.

Discrepancies between tracks are totally unrelated to the visuals. It's much easier to tell if two sounds are synced than a sound and a visual.

> Signal delay in a cable

I don't know what comment you read but it's not the one you replied to. Same model, synchronized visuals, easy to hear audio desync when you're closer to one.

Maybe because a 10 ms delay is not perceived by our eyes. The eye has no slow-motion capability; it can't resolve intervals that short.

Because they're sitting back and listening to the music. They're not pressing buttons and expecting feedback within a specific timeframe.

Isn't this kind of a known issue in Linux land? As much as it has improved, latency has always been the bane of audio applications in the Linux kernel. I remember in the days of kernel 2.2 that even XMMS would stop playing any music if I started using more than one or two applications.

Recently I got one of those cheap USB interfaces to connect my guitar. I spent a good 4 hours switching the kernel to the "low latency" one provided by Ubuntu, then trying to set up aRTs to run, then trying to make pulseaudio work with it, then figuring out how to keep both aRTs- and pulseaudio-dependent applications happy. In the end I got most of it working and I could run some kind of guitar effects application, but the next minute I realized that the volume control media keys had stopped working. That was enough for me to throw the USB interface in the drawer and give up on Linux for audio applications on workstations: buying a 50€ effects pedal was cheaper than all the time devoted to it.

It was and I'm guessing still is a problem on Windows too:


Linux has a real-time kernel you can run. I used to run it when doing audio stuff, and it works great.

Windows overcame its audio latency issues in the Vista days by introducing WASAPI (Windows Audio Session API), an API especially designed for professional audio processing.

It grants an application exclusive, direct control of the audio hardware drivers, bypassing the higher-level audio layer of the OS.

That + real-time threads (I do not know how old that feature is) allows for pro-grade latencies.

I had the exact opposite experience (about 10 years ago; I haven't tried any audio on Linux since): even after all possible RT patches and whatnot, we were happy if we got skip-free round-trip latencies of 50 ms on Debian with pretty much top-of-the-line pro soundcards. It might have been bad drivers, or maybe we were just screwing up, but on Windows (with ASIO) and OS X we would just install the driver, open the control panel, turn the buffer size down to minimum and the sampling rate up to maximum, and be looking at about 2 ms of latency with a simple loop I->O test app (IIRC, but I'm pretty sure it was 64 samples @ 96kHz).

Trying to make Pulse work is a big mistake. The first step on any Linux setup is ensuring that it isn't installed.

One can get very short latency out of Alsa, up to the point where the hardware becomes your bottleneck. But that's extremely processor intensive, and won't work well if you try to share the dsp with several processes (if you want to get that extreme, I'd recommend you get extra hardware for exclusive use of the application you want low latency from - but here I'm talking about 1ms latency).

Anyway when using a cheap USB interface, I'd focus on improving the hardware first. Low latency and high throughput USB isn't cheap (nor is it available at every computer).

I am talking about my day-to-day workstation and laptop running (x)ubuntu. Pulse may not be great, but nowadays I get to use Skype, watch videos, and listen to music all day using Pulse... why should I go through all the trouble of removing it if it works for the most common tasks?

About the interface. I doubt that was the problem. When I got jack to work, I was getting 2-3ms latency between input and processed output. I did get the guitar effect application to work, I just thought it was too inconvenient to be forced to be aware of "what-application-uses-what-sound-system-and-when-I-need-to-flip-the-switches".

In any case, my goal was to have an alternative that could be (a) cheap and (b) convenient if I wanted to play my guitar and have some DAW tools, not to see how low I could bring down latency on a Linux system. The lesson learned is that it can be cheap, but not convenient.

Ps: did you go to Unicamp?

> why should I go through all the trouble of removing it if it works for the most common tasks?

Well, it was your question that implied you wanted it. Although, yes, the pedal is probably a better choice after all.

Anyway, Pulse is only needed for advanced tasks of streaming sound through a network, using application based mixing settings, etc. If you are only doing common tasks, they'll almost certainly keep working without it. It's one of those cases of an apt-get and you are done.

Yes, I went to Unicamp, 99's class. Is your nick based on your name?

"Isn't this kind of a known issue in Linux land?"

No. If you're doing serious audio in Linux, you use jack and get fantastic latency properties.

This is a Solved Problem.

> If you're doing serious audio in Linux

Except I am not. And I guess it would be the same case for the vast majority of people who have Android devices.

It is only a "Solved Problem" if you are talking exclusively about sound systems that are designed to deal with latency. The problem I was stating is that you can not transparently run "serious audio" and common desktop applications that rely on alsa/pulseaudio.

I would consider it a "Solved Problem" when I can get to install Audacity, Skype, my web browser, the mentioned usb interface connected and I can run some effect software... and run them all concurrently without caring how to setup the sound system. People can do that in MacOS/iOS/Windows, and they can't do that in Linux (GNU or Android).

>I remember in the days of kernel 2.2 that even XMMS would stop playing any music if I started using more than one or two applications.

Back then the sound subsystem didn't do any mixing or similar, so if some program grabbed /dev/snd, everyone else had to wait.

As for low latency sound work on Linux today, Jack is what you want rather than pulseaudio. Frankly Pulseaudio is a massive detour when it comes to Linux audio.

Pulseaudio has gotten a really bad rep, and I think there was a time when it was legitimately awful, but I think it's better than it was.

On mainstream distros with pulseaudio like Fedora or Ubuntu, audio just works, when you don't have low-latency requirements.

When you do have low-latency requirements, things are a bit tricky. You do pretty much need to get a low-latency or realtime kernel, and you definitely want to use Jack or ALSA. Maybe this setup is a little more frustrating than Windows, where you may just need to install one driver like ASIO4ALL.

But it's flexible, and you can actually get Pulseaudio and Jack working pretty well together after installing the pulseaudio-module-jack. You can turn on jack when you need it, turn it off when you don't, and all of the pulseaudio stuff will get routed through it so that you don't lose sound from your other applications. If you want, you can route audio from pulseaudio into whatever other audio applications you're using - sometimes I like taking Youtube videos and routing the audio through weird effects in Pure Data.

I toggle jack on and off with this little script, works pretty well for me right now: https://gist.github.com/YottaSecond/f0a1b515f95b2e791755

Ubuntu is the reason PA got a bad rap, and PA is an example of why I don't use Ubuntu.

When Ubuntu adopted PA, the readme file still described it as "the sound server that breaks your audio"

It was the most mature thing that had the features that Ubuntu wanted, so they adopted it despite the fact that it was clearly not yet ready for prime-time.

That all being said, PA is not the choice if you want to do DAW style stuff; it tends to prefer lower cpu utilization to lower-latency.

Yeah, I mentioned aRTs but I was actually thinking of Jack (too late to change it). The annoying thing with these different sound systems is that it turns a general purpose workstation into an either/or proposition.

- The applications that depend on pulseaudio were really not happy when jack was the sink.

- Skype wouldn't work.

- Because of the real-time requirements of the sound applications, everything else felt absolutely sluggish.

- The volume control stopped working.

All in all, I'd have to setup a separate system just to run jack-dependent applications.

Well at least having Skype go Pulseaudio only prompted the creation of https://github.com/i-rinat/apulse.

You can actually layer pulseaudio on top of jack and have per-application volume controls, plus letting some applications output to your usb headset and whatnot; and also allowing pro-grade applications use jack directly, not bothering with any of the pulse nonsense at all.

Hi everyone -- Gabor and I wrote the piece -- let us know if you have any questions we can help answer for you.

What are the concrete steps you are taking to tackle the issue?

Idea: A latency ranking for devices would put pressure on manufacturers.

Will you work together with the Linux community so we all benefit from it?

Hi there -- latency ranking you ask?

As Westley from The Princess Bride might reply, "As you wish"

http://superpowered.com/latency :)

Awesome, thank you!

It's remarkable how Apple cares about certain quality aspects of their devices. They would be interesting options if they didn't jail you and lock you down :/ However, I wonder why newer and supposedly faster devices like the iPhone 6 have higher latency. I'd think surpassing the previous generation would be the goal for each successor.

Also, it seems Samsung is taking the challenge seriously.

Excellent research - thank you for doing it.

Glad you enjoyed it!

interestingly, a 2012 article[1] by Arun Raghavan mentions:

On the Galaxy Nexus, for example, the best latency I can get appears to be 176 ms. This is pretty high for certain types of applications, particularly ones that generate tones based on user input. With PulseAudio, where we dynamically adjust buffering based on what clients request, I was able to drive down the total buffering to approximately 20 ms (too much lower, and we started getting dropouts). There is likely room for improvement here, and it is something on my todo list, but even out-of-the-box, we’re doing quite well.

Let's get systemd on android and then see !

[1] http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-f...

One of the reasons for the audio latency issue on Android is the complex power management circuitry (PMIC) implemented on most if not all Qualcomm SoCs. That's why most people with experience using Android devices for audio production would agree that the older 2011 Galaxy Nexus is still the Android device to use. Unlike the recent Snapdragon-based Nexus devices, the GNex is based on the TI OMAP4460 (dual-core ARM Cortex-A9). It might not be a coincidence that Apple kept the number of cores on its A-series SoCs to a minimum (only moving to 3 with the A8X).


Can someone explain in layman's terms what the issue actually is? Neither the article nor its comments here or on Reddit seem to dumb it down for non-audiophiles.

Audio processing in Android is low-priority, highly layered and each layer has extensive buffers. This means samples take a long time to move through layers, and as these delays accumulate you end up with fairly big and definitely noticeable audio latencies.

This is obviously an issue when doing real-time audio processing or generation: if you're using virtual instruments and the feedback to your cans is 1/10th of a second behind the interaction it's unusable.

But it's also a problem for more mundane applications: a 200ms roundtrip (100ms to move samples from the mic to the application on the talker's side, and 100ms to move them from the application to the speaker on the listener's side) is more delay than the transmission time from somebody literally on the other side of the earth. Same with fast-paced games: audio feedback arriving more than 100ms after an action is highly bothersome or even game-breaking (the sound is output several frames behind the game state).

Audio processing on Android devices is slow. Even seemingly small delays (10 milliseconds) are enough to make apps seem clunky and erratic. This puts the Android platform at a disadvantage for developers whose product ideas involve delivering high-quality audio performance.

The embedded graph with the U shaped figure pretty much sums up the author's search for the components in the software that are the culprits.

> Even seemingly small delays (10 milliseconds) is enough to make apps seem clunky and erratic

Not just that, if you try to play guitar with a 10ms latency from input to output, you'll find that it's impossible to keep rhythm at all. It's the same effect as a speech jammer: http://www.stutterbox.co.uk/

And I don't think many people outside the tech audio community realize how many people have ditched their guitar amplifiers and are playing guitar purely through iOS devices. It's a great use case for tablets, but it's completely infeasible to do on Android at the moment.

> And I don't think many people outside the the tech audio community realize how many people have ditched their guitar amplifiers and are playing guitar purely through the iOS devices.

I'm super interested in this question because I honestly don't know. My assumption is that among serious guitar players (defined as individuals who play in groups or in front of people at least monthly) the number is almost nil.

I'm sure that among casual guitarists (those who rarely play, but technically own one), this number is quite high.

I'm unwilling to even concede the tubes in my amp let alone the amp itself...

If you take out the word "purely" you're absolutely right.

Few serious guitarists would resign themselves to exclusively playing through an iOS device. Nearly all serious guitarists will do it on a regular basis though, which is the more relevant point.

Say you want to create an app that adds a sound-effect to your voice, live. On Android, the time it takes for the audio from the microphone to get into the app, be processed, and sent back out to the speakers is long enough that it'll be jarring to use. For musical instrument apps, the latency between touching the screen and the speakers responding is too long to be useful.

Imagine if you were playing an instrument, and your hand moved a little (36ms) before any sound came out.

Or imagine trying to listen to your own voice, but having the sound delayed by that 35ms, so it feels like someone is talking over you every time you open your mouth.

Audiophiles are a group of people who believe that they have magical hearing powers that are enhanced with oxygen-depleted gold cables and the scratch/hiss/pop of vinyl. This is targeted at musicians.

TLDR: Press a button, wait 200-300ms for a sound to play. This is android.

Going completely off-topic: Why has everyone recently decided to use ultra-thin fonts everywhere? On my system (24" FullHD, Win8.1, Chrome 42 and Ubuntu 14.04, Chrome 43) the text is thinner than one pixel and thus unreadable below 130% zoom. Sure, I can open up DevTools and fix the font-weight, but seriously?

My theory on this is that Windows 8 hyped this.

It's particularly problematic when it's being copied in mac circles because OSX has a biased font-smoothing algorithm that adds font weight (it's a design flaw). In other words, to get something to look thin on a mac, especially a non-retina mac, it needs to be very thin, sometimes less than a pixel thin. How other systems display borderline visible strokes varies depending on the system and the details of the font.

Oh, so I can tell whether the designer uses OSX by the fact that I can't read the fonts. That's nice.

The website font? I'm viewing it on a 27" 2560x1440 screen, Chrome 42 and OS X 10.10 and the fonts are normal sized for me. In fact larger than the HN font.

This squares pretty well with my personal experience. Pretty much every music geek I know chose an iPhone over an Android because of the music apps

I'm a music geek, I just chose to carry a dedicated music player.

Yeah I wasn't talking about that kind of music geek :)

There are some pretty great instrument apps available for the iphone, like the ikaossilator by korg, a tonne of great drum machines, loop apps, synths etc. And mixed with the Audiobus app that lets you route the sound of one app into the input of another (as long as both apps support Audiobus which most seem to do), the possibilities for creating music entirely on your phone are limitless

The issue isn't about having your phone play music, it's about you playing music on your phone, in some cases with friends who play on their own phones.

Ah, for that I have a workstation. But as my guitar playing is very poor it doesn't help a great deal.

It seems you're still talking about two different things. If you have an iOS device, you can connect an electric guitar to it via a dedicated (but relatively inexpensive) interface and have it sound like a reasonable amp plus some effects processors. It's really great for mobile musicians. And that's without even mentioning Garageband and other apps that basically let you compose quite decent music on the go.

I spent a lot of hours fighting with the low level Android libraries while writing a Spotify App for Android back in late 2013. There is very little documentation on the web (especially for streaming Audio). Getting this working was a serious headache.

If anyone is interested, I pulled the OpenSL ES parts out and posted them to github.


Up to the ALSA driver step these delays would be the same on any Linux system. Do Linux desktop systems experience these types of delays? I have experienced these types of delays trying to setup a Windows box as karaoke machine. In fact, I've never seen a DJ use anything but a Mac. That leaves the question, how does Apple do it?

ALSA isn't the problem- JACK runs on top of ALSA, and people comfortably run well under 10ms. The problem is higher up in the stack. Even switching to the much-maligned pulseaudio could be a huge improvement: http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-f...

I would guess the driver model works closely together with the hardware which in practice roughly means you get samples from the ADC and put them into a buffer which is then directly accessible by the toplevel userland software. Which in turn puts output samples into a buffer which is then directly fed to the DAC. At least that is also how it works for ASIO or for example for data acquisition cards from major players like National Instruments.

So if I get it correctly the problem is twofold: there is some extra intermediate processing and the buffer size and sampling rate are fixed to 256 samples and 48kHz respectively? And which of these two does Superpowered fix? Or both? And what would be the lowest possible latency on for example the Nexus 9?

The lowest possible input-to-output latency of an audio workstation is always two times the period size (samples/buffer) (plus a few microseconds of internal delays in the ADCs, controllers, ...)

The sound chips operate on integral periods or a fixed number of samples, and when you get a period worth of audio data from your ADC, the DAC will already have started putting out the first samples of the next period. Hence, you prepare your audio samples for the second-to-next period.

Assume you have audio processing code roughly looking like this:

    while (1) {
        poll();                             /* some API function waiting for the "next period" */
        read(soundcard, block_of_samples);  /* or let DMA do it */
        process(block_of_samples);          /* placeholder: your real-time DSP runs here */
        write(soundcard, block_of_samples); /* or let DMA do it */
    }

Let's try some ASCII art:

      v- audio samples going into your soundcard
       _     _     _     _     _     _     _     _     _     _     _     _
      /1\   /2\   /3\   /4\   /5\   /6\   /7\   /8\   /9\   /0\   /1\   /2\ ...
         \_/   \_/   \_/v  \_/   \_/   \_/   \_/   \_/   \_/   \_/   \_/   \
       Period 1          Period 2          Period 3          Period 4
                        [*] here, Period 1 has been DMA'ed from the soundcard
                            to an mmapped buffer of your audio application
                          <~~~~~~~~~~~>  here processing of your audio takes place
                                       [*] this is the latest point at which processing
                                         | must complete so that there will be data for
                                         | the soundcard to output. DMA will start
                                         | from the buffer to the soundcard DAC.
       _     _     _     _     _     _   v _     _     _     _     _     _
      / \   / \   / \   / \   / \   / \   /1\   /2\   /3\   /4\   /5\   /6\ ...
         \_/   \_/   \_/   \_/   \_/   \_/   \_/   \_/   \_/   \_/   \_/   \
                                           ^- processed audio samples will come out
                                              of your laptop's speakers...
Obviously, your machine has to be fast enough to do the whole real-time audio computation in a little less than the time between two interrupts, which is the period size. And it must be able to do this reliably: if it misses the deadline for having a block of samples ready for the DAC, an "underrun" will take place, and the audio application will have to resynchronize, possibly causing clicks, intermittent audio, ...

+1 for this fine explanation. However, I'm well aware of how this works, and my actual question was really which part of the problem in achieving this ideal scheme Superpowered is solving. Or are they just targeting <10 ms latency? Are they getting rid of intermediate layers? Or are they just lowering the buffer size? Or both?

Is 10ms really that big of a deal? I'm an amateur musician so have some experience playing in bands, but I have a hard time believing 10ms would feel off when playing with others.

9ms is the industry-recognised time that it takes before the brain notices that sync is out. If you have 2 woodblocks clapping and they are more than 9ms out of sync, you will hear one as an echo instead of 2 blocks at the same time. So if you are a band wanting to play remotely and you have 100ms network latency plus all the other latencies of the software layer, it's not going to be possible. Similarly if you are trying to drum in time with a track: more than 10ms latency from when you tap your finger to when you hear the audio, you'll have a hard time staying in sync.

IIRC it's a similar latency figure for VR headtracking (15ms), and also for touch screens (10ms).

Thanks for the 9ms reference, that's helpful. I'm honestly surprised it's that low.

The title isn't clear but (sub-) 10ms is the target.

The "problem" part is that Android devices generally don't come close, the lowest-latency Android device on the market today (barring Samsung's custom "Professional Audio SDK") has a 35ms roundtrip latency (ADC -> DAC), and many devices are way beyond 100ms (http://superpowered.com/latency/)

This makes more sense. 100ms and over sounds to me like a huge problem, while 10ms does not.

I came across this (which is a commercial so exaggerates the lag) -- but still quite hilarious (Swedish audio):


So I've got a few decades of serious music training under my belt. Multi-instrument.

I notice any latency above 7ms. I can compensate pretty well between about 7-14ms. Anything above that will affect the performance. Once you get to, say, 25ms, it's like I lose several years of training.

It's hard to relate to because most people don't have experience doing tasks demanding that level of precision. Perhaps it would be useful to think about what would happen if you introduced a delay between striking a key on your keyboard and the sensation of feeling it travel. You "can type" without any sensation at all (see iPads) but people who are serious about typing go to extraordinary lengths to make the keyboard feel right.

I think the biggest problem is that 10ms is more than enough to throw off a performer because it feels like a lag between when you hit the key and the note sounds. This makes it hard to play proficiently for anything with a fast tempo because it feels like the sound engine isn't keeping up with you.

For an audience, 10ms is going to feel like sloppy timing. That will be more of an issue with tightly timed music (techno/EDM) and less with a slow jazz ballad.

Really, 10ms is OK, and you can compensate for it mentally. As they note, the problem is that Android music apps actually have a latency of 100ms or worse.

10ms is fine. The problem is that Android apps cannot reach 10ms. The title and article aren't very clear on that point.

When recording guitar on my computer, I can feel the difference between 10ms and 5ms. 10 isn't horrible, but it's definitely not optimal.

If you think about it, the speed of sound is about 340.29 meters per second... that's 0.34029 meters per millisecond... or 1.11644 feet per millisecond.

So 5ms vs 10ms latency is like the difference between having your amp roughly 5.5 feet or 11 feet away.

Worse, if the latency shifts, you start getting phase problems.

10ms is OK, 50ms not so.

This is the one downside with using Android I have experienced. I came from an iPhone 4, and as an amateur musician I used it to produce music, record ideas on the go and whatnot. When I moved to a Samsung Galaxy S4 (my first Android phone) I was mortified. I had grown to love a couple of drumming applications on iOS; when I moved to Android there were a few sub-par applications and they all had ridiculous latency, to the point they were not usable, and I had to bust out my keyboard at home to record drums (no drum ideas on the go anymore). You hit a snare in a drumming app and it felt like hundreds of milliseconds of lag, horrible.

It was so bad that I went out and purchased an iPad just so I could have that freedom of recording ideas when I am not at home. The iPad and iPhone offered what sounded like basically no latency at all. I never understood why Android devices struggled (but I speculated and assumed it was how the audio was being processed). I considered moving back to an iPhone, but I love the freedom that Android affords me and the competition, so I stuck it out and kept using the iPad.

I recently purchased a Samsung Galaxy S6 Edge and while I still notice some slight latency, it is usable again. I can finally jot down ideas when I am away from home using my phone again. From what I've read, Samsung itself has made some improvements to its hardware and software, not to mention the professional audio driver that allows the use of a third-party audio interface (great feature, by the way). Google needs to make this a priority, because believe it or not, a lot of people use their tablets and phones to produce music. We need to fix this.

Music production might not seem like a big deal to Google, but Apple definitely gets it and has from the beginning. It's the one aspect of owning an Apple device that I miss; not enough to make me switch back, but definitely a good feature of iOS devices.

Anyone with any insight about how Apple does this?

Probably by custom and known hardware. Let's say it takes 1000 working hours to produce a low latency driver/firmware; Apple only needs to do that for a few devices, but every Android manufacturer has different audio chips and most of them only care that audio works, not that it works well - so they do not invest that time for each device type. Having known hardware also works for the application developers, because they can optimize their code for the specific device (knowing that you always have a 40ms latency is way better than knowing that the latency is between 35 and 300ms).

Also, Apple is much more motivated to get the audio path right, having started with the iPods and selling a lot of music.

The latency problems arise with ALSA (mostly hardware-independent) and AudioFlinger (totally hardware-independent).

That's technically correct but an extremely inaccurate summary.

The reason ALSA and AudioFlinger add latency is to hide hardware-dependent differences as well as kernel-caused scheduling issues & policy decisions.

To achieve low latency you need real-time scheduling, something Linux has with SCHED_FIFO, but it's a bit kludgy, and getting the policy right on that is tricky (obviously you don't want a random app to be able to set a thread to SCHED_FIFO and preempt the entire system). So you have to restrict the CPU budget of a SCHED_FIFO thread, and you have to only allow apps to have a single SCHED_FIFO thread. But how much CPU time you give it needs to depend on the CPU's performance in combination with the audio buffer size that the underlying audio chip needs (and those chips also have different sample rates: is it 44.1kHz or 48kHz or etc...).

tl;dr: this is insanely hardware-dependent.

ALSA is super. The HAL connecting ALSA to AudioFlinger is the major problem, plus the audio stack's "push" philosophy.

Look at the architecture of the Core Audio API and compare it to the parent article:


Besides the fact that you are writing in C/Objective C, most heavy lifting is being provided by Audio Units which are specifically designed with common datatypes to be chained, composed and executed in low-latency situations.

Furthermore, most common things you would need in your app like mixing, conversion, timing, etc are provided as highly optimized services by the system.

Also, CoreAudio drivers (HAL) call back every 11ms to empty/fill a buffer. So it starts with lower latency?

Some answers on Quora. TL;DR: Apple has been working on audio for a long time.


Or, rather, actually care about audio. Google (/Motorola Mobility, when it was owned by it) doesn't even care if a phone ships with a broken headphone port and refuses to fix it.

Seriously, the absolute garbage headphone jacks on the two flagship Android phones I owned (Motorola Milestone and Galaxy Nexus) are the reason I now own an iPhone 6. I still mostly prefer Android as an OS, but at least Apple is willing to spend the extra 50 cents on a headphone jack that lasts for more than a couple weeks.

Same way they do with anything else they care about: by controlling everything in the stack from the hardware up.

I don't see how this has to do with hardware rather than architectural decisions.

AudioFlinger + ALSA take a lot of time, as seen in the graph.

But Android seems to take the option of least effort that "works most of the time" (for which they have their reasons).

The hardware you intend to run on will impact your architectural decisions.

Given the wide scope of hardware targeted by android, it's not that surprising that it performs less well than a system targeting a very limited set of devices.

Having said that, it performs 'well enough' for the vast majority of use cases.

By controlling the hardware, they need fewer drivers and they can skip one of the abstraction layers.

That whole theory is blown by the fact that CoreAudio supports 3rd party audio hardware. There is a thriving market of soundcards and audio peripherals for the professional Mac OS X market. Those all run CoreAudio, which is exactly what iOS runs.

> they need fewer drivers

This suggests that you can work focused on one driver and therefore save developer-time. However, the drivers on Android are made by a bigger workforce, which must be taken into account.

> they can skip one of the abstraction layers

The post explicitly states that the HAL ought to add no latency at all.

Beside the point. When you know the hardware, you can shave away the upper layers.

Consider e.g. AmigaOS.

AmigaOS let you obtain a pointer directly to the screen bitmap to update your window contents with no buffering or clipping. It could do that because originally all the hardware was the same, or close enough.

Then graphics cards came along, and you didn't necessarily have a way of writing directly to the bitmap. Suddenly you had to use WritePixel() and ReadPixel() and similar, which would obtain the screen pointer for the window, and obtain the display the screen is on, and find the driver corresponding to the screen, and call the appropriate driver function via a jump table.

Similarly, the AmigaOS had functions to e.g. install copper lists (the copper was a very primitive co-processor that could be used to do things like change the palette at specific scan lines), which also wouldn't work at all on graphics cards.

This is why knowing the hardware is part of a limited set matters: You can define your API to match the hardware very precisely, or even expose hardware features directly.

One aspect of CoreAudio that, IMO, contributes to low latency: CoreAudio provides the buffers to process to the AudioUnit. Not the other way around. This permits CoreAudio to chain AudioUnits while minimizing buffer use. In the ideal case where all AudioUnits support the same audio stream format they are all provided the same buffers to process. Which can get close to zero-copy, zero-allocation audio rendering.

I don't know enough about android to say that this is the key aspect tho. Perhaps the low level API of android is the same? I didn't find an easy reference in a quick search.

Because their devices are all roughly the same, their latency is easier to mitigate in app development than the variance in latency across Android devices.

Any word on how Windows Phone does on the latency metric?

There is this thread: https://social.msdn.microsoft.com/Forums/en-US/14f96b68-60e3...

and it seems that anything below 140 ms latency is impossible on Windows Phone.

Compare to a nice grand piano: 1 meter of air ~3ms. (hammer travel after you bottom out the key - another 10ms? A MIDI keyboard doesn't know the velocity of the note (or send the note at all) until key travel is mostly done too.)

I have only 5 ms of ASIO buffering on my PC, but I don't know the actual key->sound latency; I do know that with headphones it's only slightly less immediate than a nice grand.

I think the low MIDI bit rate (31.25 kbaud) also adds a little latency on chords.

It would be nice if keyboards in the future immediately sent a lower latency+precision keypress-initiated notice (before full key travel) so disk-based samplers can make ready for that note (and make any initial attack sound that's appropriate).

Great explanation, thanks for taking the time to write this up. A while back I figured that there had to be complicated structural reasons for the lack of progress on the notorious "issue 3434", and I decided to go with native iOS for mobile audio projects rather than wait. Seems it will be quite a difficult problem to solve (although perhaps a library that is not totally backwards-compatible would be easier to optimize). But kudos for taking a crack at it, and good luck. I'm interested to hear how it goes.

> “Consumers ... have a strong desire to buy such apps on Android, as shown by revenue data on iOS...”

That's like saying, “Consumers have a strong desire to buy gourmet steaks from McDonald's, as shown by revenue data from Ruth's Chris.”

No, McDonald's serves billions of meals by understanding its own market, not by catering to diners at Ruth's Chris. And naturally, comparing top sellers at each will give very different lists.

This is not a good analogy. McDonald's and Ruth's Chris are not direct competitors, they are targeting completely different markets. Android and iOS are in direct competition.

If we have to use a food analogy, I would propose Starbucks and a competing coffee shop. Almost all the major players in app development make corresponding Android and iOS versions of their apps. Imagine if Twitter or Snapchat only had an iOS or only had an Android app. Similarly, customers expect to have certain things available at all coffee shops, regardless of the brand behind them: cappuccinos, flavored syrups, alternative milk choices, etc. So a better analogy would be one coffee shop chain not supplying their stores with espresso machines, or flavored syrups, or alternate milk choices... or maybe just not letting them have refrigerators at all to keep milk cold.

The prevalence of orders for highly-sweetened, cold, milk-based drinks at Starbucks almost certainly means that customers would order the same thing at similarly positioned coffee shops if it was available, and indeed most coffee shops offer such drinks now.

There is no reason to think that somehow Android users are in such a different market that they would have no interest in these apps on Android even though in almost every other case platform parity is expected from large players in the app market.

> This is not a good analogy.

I'd say it's a bad analogy because most people who don't live in the US have no idea what Ruth's Chris is.

Nor do most people who do live in the US, I would imagine... I infer that it's a high-end steak joint, but I've never heard of it.

I haven't heard that name since leaving the Midwest.

They have locations in DC, San Francisco and South Texas, so they're around.

North Texas, too.

> This is not a good analogy. McDonald's and Ruth's Chris are not direct competitors, they are targeting completely different markets.

So it was good, because that was my point.

> Android and iOS are in direct competition.

Citation needed. ;-) But seriously, I don't think they are. Too many observable differences in goals and strategies across each platform for your assertion to be true.

> There is no reason to think that somehow Android users are in such a different market that they would have no interest in these apps...

On the contrary, there is every reason to think that. Outside a very narrow and limited class of tech geeks who argue on technical merits, the broader market stats point to these being very different segments with different consumer profiles valuing different things.

It's also a poor analogy because McDonald's is a futures trader with a private hedge (its stores) while Ruth's Chris is a steakhouse.

Analogizing McDonald's:Ruth's Chris::Android:iOS is dishonest. While I agree you can't necessarily draw that conclusion, it's like having a major (equal caliber!) seafood restaurant deciding whether to start selling steaks because Ruth's Chris is doing well. It's still apples to oranges, but it's not apples to Twinkies.

Oh come on, if you look at the major apps and games, Android gets the same things at like a six to eight month delay, if not better than that. And if you look at apps and games available on both platforms, the correlation between downloads is probably pretty good. It is nothing like apples to oranges. It's like Macintoshes to Granny Smiths.

There's a significant unit cost associated with high-quality beef; it would be literally impossible to sell at McDonalds prices.

This is not true for low-latency audio; it's just something that Android has not prioritized.

This is a pet peeve of mine. Hackers, when measuring time for software performance, please use something smaller than milliseconds. 0ms is a Dirty Lie!

In this case the performance measure in question is whether the latency is perceptible to humans or not. Milliseconds is the appropriate unit. Nobody is going to notice differences of less than 1ms.


If x is the unit of measure for the end result and we have n components that add together, then we need at least x/n as the unit of measure for each component's performance. x/(2*n) is more reasonable, so that after rounding the total does not deviate from the target by more than one x in the worst case.

Yes. But if some of your stages have delays of tens of milliseconds then there's no point knowing how big the delays are in the stages that are lower than 1 millisecond.

Once they're all below 1ms it's worth increasing the resolution.

Well, e.g. 20x 0.4ms (rounded to 0ms) is going to be noticed quite soon.

Their goal is to produce an "easily digestible overview", which necessarily means removing a lot of the detail that a hardcore performance measurement person would find interesting. They were right to stick with milliseconds throughout.

This is actually a known bug hanging there since 2009:


And very good video from Google about it:


I think the article may be ascribing too much technical reasoning to why iOS has a better community of music apps. Remember that Apple also has Garage Band and dominates the online music sales business. It's fair to say that music is a core part of Apple's brand, and so the platform has a much greater draw for people who prioritize music software.

It's possible for Apple to have stuff like Garage Band because they have the ability to do low latency audio.

Garage Band (and Apple's position as the kings of music) predates iOS and Android. That's my point. Even if Android had good low-latency audio, I doubt that Android would see the kind of audio community we see with iOS, because Apple is the brand for that sort of thing and has been since the fruity-iMac days.

Does anyone actually believe the reason " the Google Play store, the Music category is not even a top five revenue producing app category." is due to audio latency? I am not disagreeing with the fact that there is a lot of audio latency (no worse than Windows 8) but maybe it's time to explain cause vs correlation? Interesting article all the same.

It wouldn't be satisfying.

I'm an Android user and have given thought to developing on Android, but the only things I'm interested in doing on a mobile platform are synthesis and sequencing. Based on everything I've read.. it seems like it would be a waste of time. Forget commercial viability, it doesn't even sound like it would be worth it to make something for personal use. I get 7ms roundtrip on my Linux music workstation, with my audio interface, JACK, ALSA, Bitwig.. total cost maybe $1600 (when buying the machine I was trying to figure out how much I could leave out/how much of a cheapskate I could be).. not a huge sum, but it shows how important low latency is. A lot of the people buying music apps (synthesizers, toys) own hardware synthesizers, have computer recording/sequencing setups, or play acoustic instruments.. some of them spend multiples of 10K on their setups over time.. when they pick up a phone or a tablet, they're not comparing it to a flash site or the performance from an integrated soundcard on a $300 laptop.. they're comparing it directly to the immediate physical response of their instruments or the low-latency response of the computer audio setup they've invested in.

High latency feels incredibly sluggish and harms your sense of rhythm; unpredictable latency is just murder. Given that most makers of music software are themselves music makers.. when they pick up a device and see its audio performance is so poor the thought process is something like "If I can't even make something I would use myself on this, what's the point in trying to release something to the public?"

There are some good music apps for Android, some that I would even say are very good efforts, but they're very few, and still suffer from latency pretty badly.

I think the argument comes down to the fact that audio latency is preventing the makers of the most popular/lucrative apps from developing for Android, the implication being that they would develop for Android if the latency wasn't there and the music category would therefore increase in revenue numbers.

And so your preferred hypothesis is, what, everyone just forgot to write nice professional audio and music tools for Android?

They did not forget to write nice professional audio and music tools for Android. They can't, because of audio latency and dropouts.

Perhaps you missed the section, "How Android’s 10 Millisecond Problem and Android Audio Path Latency Impacts App Developers and Android OEMs" and in particular the bullet points with specific examples of the problems high latency causes.

The difference between the music app business on Android and iOS is a complex one that really encapsulates the meaning of a "brand."

Apple has a long history of market success with creative professionals, so to make sure that they retain this success, they focus heavily on the product issues that would be relevant to creative professionals. These creative pros are who the advanced amateurs look up to, so when the creative pros, let's say Ryan Lewis, are all using Macs and iOS, the amateurs do too... Apple's success in these markets is a long history of positive feedback between engineering, marketing, and branding.

If our site goes down (again), the webcache can be found here:


One piece of software that does do very well on Android is Caustic3. Admittedly, I am likely creating loops instead of live playing but the experience is a good one.

Even better is that you can run the OSX/Windows version for £0.00 and copy your projects over to work on a "real" machine after tinkering on your phone throughout the day.

Developing with the NDK will not solve this problem? Does it still have to go through alsa and audioflinger?

Even if you write an application in C for a GNU/Linux distribution, you have to use whatever Linux sound subsystem is deployed, unless you have root permissions and want to mess with the audio stack concurrently (you don't).

The problem is not the fact that they use ALSA and AudioFlinger. ALSA and AudioFlinger currently just take too much time. This could be improved by decreasing the period size.

All of the measurements and the latency graph show you a pure NDK implementation.

I am writing a game that implements an audio sequencer and it runs great on iOS and OSX. But I had to write a karaoke app for Android recently and found an amazing 100ms latency there which makes me think I won't be able to port my game to Android... Wish Google fixed this.

It's interesting that their business is based on Google not fixing this problem.

What happens when they fix it?

People have been assuming Apple's success depends on one or two minor features that someone else will copy and destroy their business for, well, ever.

The audio thing is the number one reason I will never use the Android platform.

At this point, iOS devs are so many years ahead of Android devs in the music department, and the simplicity of porting existing Mac OS-compatible audio stuff makes it incomparable.


Two people are going for a race, over the same distance. One starts 5 years ahead of the other.

At that point, is anyone even watching the race anymore?

Sorry, but despite iOS's absolutely God-awful flaws, such as no user-accessible filesystem and no ability to do a basic task like downloading an MP3, I just can't use Android without GarageBand (and its awesome ability to open files I sketch out on the go right in Logic on my Mac...), without iElectribe, and the list goes on. I have no reason to go back 5 years technologically.

Sorry, Android, you already lost, here is a user you can never have.

Not really good metaphors. It's not necessary to cover the intervening ground to catch up technologically. Fix latency, app developers port, and voila: Android looks pretty good again.

Did you not get the '5 year head start' bit?

Doesn't matter how fast developers create applications that are amazing, iOS has already had all these years to create an already fantastic subset of these applications.

Doesn't matter how 'good' it looks - for the use case of music, android has simply lost.

That was actually my point. There's no need for Android development to cover that ground - it can just start where we are now, with cool apps that can just be ported over. Once the Android audio latency issue is addressed.

App developers LOVE to port; it's another sale with small effort.

I feel like I learned something about the problem. I was expecting to also learn about the solution but instead got hit with sales pitch at the end there.

How does Superpowered get around ALSA and the Audio Flinger?

While the specific problems differ, in general iOS is far easier to work with when it comes to video as well as audio. Android video is a never ending headache for us.

What downsides would there be to just halving the period size?

The heart of the problem is simply the limited timing and computational resources in a multitasking non-realtime OS.

The smaller the audio buffers are, the more prone they'll be to starvation. Only if the application can be guaranteed to receive interrupt service and/or thread timeslices at a 100 Hz rate or better is it possible to achieve audio latency of 10 milliseconds. That's a difficult thing to guarantee in a modern consumer OS. It can be done, but it won't happen by accident, only by design.

The penalty for failing to service your 10-ms buffer is a dropout that sounds much worse than slightly higher latency, so there's an incentive to use larger buffers than necessary at every link in the signal chain. From the point of view of the OS vendor, musicians might complain about latency, but everyone will complain about dropouts.

My guess is that the large buffers are covering up for scheduling issues. On my devices (Moto X, Nexus 7) I can regularly get Android audio to stutter by connecting the power or switching apps. (And it'll happen on its own sometimes when it picks, as it so often does, a terrible time to update apps.)

The easy fix to that is to make the buffers bigger. Doing that, though, increases latency. The hard fix is to do what Apple has done and build for audio from the ground up.

Android's audio stack is "pushing" audio down to the audio driver, with the audio threads sleeping between every push. In other words, the audio driver's interrupts are not scheduling the actions. Having smaller period sizes increases the risk of not "guessing" the sleep scheduling right, resulting in many audio dropouts.

Using a "pull" method is required for low latency audio, where the audio driver's interrupts are scheduling when audio is passed to and pulled from applications.

Another downside of halving the period size is CPU load/battery drain. There are simply too many layers in Android's audio stack, and it has quite a lot of unoptimised code as well (such as converting between audio sample formats with plain C code).
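For a sense of what such a conversion involves, here is the generic scalar form of 16-bit PCM to float conversion (and back), one multiply per sample. This illustrates the technique only and is not Android's actual code; optimized implementations typically use SIMD (for example NEON on ARM) to convert several samples per instruction:

```python
from typing import List, Sequence

SCALE = 1.0 / 32768.0  # maps the int16 range [-32768, 32767] into [-1.0, 1.0)


def pcm16_to_float(samples: Sequence[int]) -> List[float]:
    """Scalar int16 -> float conversion, one sample at a time."""
    return [s * SCALE for s in samples]


def float_to_pcm16(samples: Sequence[float]) -> List[int]:
    """Scalar float -> int16 conversion with clipping to the int16 range."""
    out = []
    for x in samples:
        v = int(round(x * 32768.0))
        out.append(max(-32768, min(32767, v)))
    return out


if __name__ == "__main__":
    print(pcm16_to_float([0, 16384, -32768]))
    print(float_to_pcm16([0.0, 0.5, 1.0]))
```

Every layer that converts formats this way touches every sample, so the cost scales with the sample rate regardless of period size, on top of the per-period wake-up overhead.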

Given that each period means a kernel mode-user mode trip and various context switches, I'm going to guess that halving the period size globally would make your phone's audio handling consume more CPU.

Audio lag is definitely not the worst of Android's lagginess problems.

I hope they can fix this for Linux in general, and not just for Android.

Properly configured JACK should already achieve single-digit millisecond latency on Linux.

Does this cause audio de-sync issues on Android?

Happy to see this side is getting attention!

It looks like it's currently down?


I'd heard over the years that working with isochronous systems was difficult. I'd done a number of real-time systems before, and written OS schedulers and NTP-like systems and so forth. A little audio work should be a walk in the park, right? A little manly-man programming from the wrist and we move on to real problems. So I walked into an audio project thinking that "Oh, this latency and synchronization stuff, how bad could it really be?"


Nailing down buffer-bloat and sources of latency and jitter in an isochronous system took several months, the last few weeks of which were 80-100 hour weeks just before ship. Several times we thought we'd fixed the issues, only to find that our tests had been inadequate, or that some new component broke what we had built.

I remember nearly being in tears when I finally realized what the clock root of the audio system actually was, and that it wasn't what people had been using. From there everything fell into place.
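One common way an unknown clock root bites, offered as a general illustration rather than the poster's actual bug: if the producer is paced by one oscillator and the DAC by another, their small rate mismatch makes the buffer fill level drift until it over- or underflows. A minimal sketch, assuming both clocks are nominally 48 kHz and differ by a constant parts-per-million error:

```python
def drift_frames_per_sec(producer_hz: float, consumer_hz: float) -> float:
    """Net change in buffer fill level per second (positive = filling up)."""
    return producer_hz - consumer_hz


def seconds_until_xrun(slack_frames: float, producer_hz: float,
                       consumer_hz: float) -> float:
    """Time until the buffer's headroom is used up; inf if the clocks agree."""
    drift = abs(drift_frames_per_sec(producer_hz, consumer_hz))
    return float("inf") if drift == 0 else slack_frames / drift


def rate_with_ppm(nominal_hz: float, ppm: float) -> float:
    """Actual rate of a clock that is off by `ppm` parts per million."""
    return nominal_hz * (1.0 + ppm * 1e-6)


if __name__ == "__main__":
    producer = rate_with_ppm(48_000, 100.0)  # about 48004.8 Hz
    consumer = 48_000.0
    # 480 frames (10 ms) of headroom lasts roughly 100 s at a 100 ppm mismatch.
    print(round(seconds_until_xrun(480, producer, consumer), 1))
```

The failure shows up minutes into a run rather than immediately, which is one reason such bugs survive short tests.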

Don't pull the number of buffers you have out of thin air ("Oh, six is enough, maybe twelve, don't want to run out in my layer of the system, after all..."). Don't assume your local underflow or overflow recovery strategy actually works in the whole system. Don't assume your underlying "real time" hypervisor won't screw you by halting the whole god damned system for 20 ms while it mucks around with TLBs and physical pages and then resumes you saying, "Have fun putting all the pieces of your pipeline back together, toodle-oo!" Put debug taps and light-weight logging everywhere and attach them to tests. And know what your clock root is, or you will be sunk without a trace.
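The "debug taps and light-weight logging" advice can be as simple as a fixed-size ring buffer that each pipeline stage writes into without growing memory, and that a test dumps afterwards. A minimal sketch; `DebugTap`, its fields, and the stage name `mixer_out` are hypothetical, and a real-time implementation in C would preallocate and avoid locks in the same spirit:

```python
import time
from typing import List, Optional, Tuple

Event = Tuple[int, str, int]  # (timestamp_ns, stage, buffer_fill_frames)


class DebugTap:
    """Fixed-capacity ring buffer of timestamped fill-level samples.

    Storage is allocated once; recording overwrites the oldest entry,
    so logging never grows memory in the audio path.
    """

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[Event]] = [None] * capacity
        self.next = 0
        self.count = 0

    def record(self, stage: str, fill_frames: int,
               timestamp_ns: Optional[int] = None) -> None:
        ts = time.monotonic_ns() if timestamp_ns is None else timestamp_ns
        self.slots[self.next] = (ts, stage, fill_frames)
        self.next = (self.next + 1) % len(self.slots)
        self.count = min(self.count + 1, len(self.slots))

    def dump(self) -> List[Event]:
        """Return recorded events oldest-first (call outside the audio path)."""
        start = (self.next - self.count) % len(self.slots)
        events = []
        for i in range(self.count):
            e = self.slots[(start + i) % len(self.slots)]
            if e is not None:
                events.append(e)
        return events

    def min_fill(self, stage: str) -> Optional[int]:
        """Lowest fill level seen for a stage: a cheap underflow early warning."""
        fills = [e[2] for e in self.dump() if e[1] == stage]
        return min(fills) if fills else None


if __name__ == "__main__":
    tap = DebugTap(capacity=4)
    for i, fill in enumerate([960, 720, 480, 240, 480, 720]):
        tap.record("mixer_out", fill, timestamp_ns=i * 5_000_000)
    print(tap.dump())
    print(tap.min_fill("mixer_out"))
```

A test can then assert on `min_fill` for each stage, so a change that quietly eats into headroom fails the build instead of shipping.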

Isoch is hard.

A real postmortem of this would be a fascinating read.

Well, if something bad happens I might have to work there again, so . . . not for a few years, probably :-)

>when I finally realized what the clock root of the audio system actually was

How did you finally figure this out -- any recommended references?

Flagged for sexism.

None of this is correct.

Downvotes don't make me wrong.

"A little manly-man programming from the wrist and we move on to real problems."

See... it's the little things like this. I am pretty sure that was not your intention, but please do know that turns of phrases like that hurt a little, and exclude a little.

To see what I mean, s/man/jew/ or s/man/white/ or some other category and see how it reads.

Well, if I wanted to offend, I would have. It is frankly no challenge to deliberately say something un-PC and widely offensive. It is also apparently little more challenging to utter something more subtle and more narrowly offensive.

I look at "manly man" depictions of, well, manly men (for instance, watch Kevin Kline's performance in A Fish Called Wanda) as parody. Monty Python did it. Mark Twain probably did it. Do they offend people? Sure. Do people not get the joke? Oh yeah. Did that stop the artists in question? Not really. Am I comparing myself to great artists? I'm not worthy, but I study at the feet of masters.

So, in my writing (and I've done a bit of it; look at my profile and my blog) I tend not to give a flying unmentionable about who I piss off. Now, HN is different because it's not my ball game here, but hey, if everyone wrote so as not to offend then the world would be a dull place. I've opposed what I believe is the actual evil -- Political Correctness -- for decades, and I'm not going to stop now.

I like to take language out and give it a violent shake. Have fun with it, see what it can do. I'm not going to change the world by substituting something benign and utterly unoffensive for "manly-man," so fuck it. That's what down-votes are for. I'm not even sure what I'd put there instead, quite frankly. Your suggestions don't preserve the meaning at all. Kevin Kline would be sad.

I regret that you were offended.

Now, can we Godwin this stupid thread and get it over with? Someone toss in the grenade, okay? in 10, 9, 8 . . .

"So, in my writing (and I've done a bit of it; look at my profile and my blog) I tend not to give a flying unmentionable about who I piss off."

Then I'm sure you're also OK with an ever-shrinking subset of the world being willing to put up with that tone.

It all works out, I suppose.

So you like to be an asshole and impede communications because you don't have anything of value to add, and you get offended when someone calls you on it?


I enjoyed the insights in your first comment and looked forward to reading this thread for obscure tidbits on audio latency on Android. However, instead of getting replies on your experience, we get a huge meta thread that contributes nothing to the original topic.

Sadly, some users have to disrupt a thread, no matter how inane or off-topic it might be :(. Sometimes it gets to the point I don't even want to read or contribute to the discussions on HN anymore and add the site to my hosts list so I won't be tempted to read it out of habit. Eventually I come back, but the times in between get longer.

After skimming the comment history of the person you're replying to, I'd just ignore them, because they have a history of doing this. If HN had an ignore list, they would surely be on mine. Won't be the first or last time they disrupt (troll) a thread.

I don't get it.

I read his comment as self-deprecating humor. If he had used white, jew, or whatever else may apply, it would still be self-deprecating humor, which is generally non-offensive for the simple reason it applies to the self.

So, if cultural-linguistic idioms make use of a specific gender, they shouldn't be used because someone might get offended? At some point you have to assume that the other party is mature enough to understand and tolerate speech that isn't exactly tailored to their gender or sexual preference or race or what have you. I'm not going to cut out all the parts of my vocabulary that aren't perfectly 'equal' in the eyes of another.

The phrase he used has a fairly well understood meaning, so of course the substitution would read differently. It would be an entirely new phrase. Different words/phrases have different connotations.

This comment is terrifying. You cannot say anything without being called out with this sort of terrible attitude.

Of course you can. Language is intensely rich, there's such a tiny subset of stuff that we're now starting to recognize rubs people the wrong way/excludes people/makes us sound like cavemen.

I don't at all get what benefit there is to being able to say "manly-man programming," either from a communication or a social standpoint.

I don't get what's lost when we replace "manly-man programming" with anything less stupid and awful.

What's gained is language flair and color. You may not see it but I do.

This is like any other literature. Don't force the world into newspeak.


We should apply parent's logic here :)

I did the substitution in my head and the statement sounds silly. Just like "manly-man" sounds silly.

Is this genuinely offensive and exclusive though?

To call it "offensive" may be a bit strong (though I'd say it's offensive to those of us who enjoy good use of language ;) ), but exclusive? Sure. It links superior programming and masculinity.

Say you're sitting in a conference room full of women who are developers, and you start saying, "So let's do some manly-man programming and get this done." What are these women supposed to think?

One would hope they were mature enough to understand an idiom, and not so delicate that they would need to be handled with kid gloves.


So if your dev team consisted of a mix of men and women (not even counting if anyone identified as neither or somewhere in between), you'd still say "Let's put in some manly-man programming time" to them?

What if your dev team consisted of just one employee, a woman? Would you still tell her to put in her manly-man programming time? I'm having trouble seeing it. "Jane, it's crunch time. Can you man up and put in some manly-man programming time this weekend?"

I already do exactly this. In the case where it's one woman, it would be humorous to use that statement and would lighten the mood.

If someone is looking to be offended then they will find a reason no matter what is said or how it's said.

Intent is the thing that should be judged, anything else is a foolish way to live and you will constantly find yourself offended.

I'm sure the old people I know who still refer to black folks as "negros" don't mean anything by it. I also bet the people who throw around "gay" as a generic pejorative aren't thinking of actual gay people when they use the word as an insult. That doesn't make either one acceptable here in the 2010s.

Same with telling a woman that doing something "manly-man" style is doing it skillfully.

And even if you know it's going to be taken as humorous, let me give you a hint: it's not very funny. It's just not very good as humor. There is much better material out there if you're going for a joke.

I am black and would not be offended if someone referred to me as a negro; if they mean no offense, then I take no offense.

I've been referred to as African American, Afro American, colored, person of color, black, negro, etc. It makes no difference to me personally. Unless intended otherwise, it's just a point of reference.

Would you say that your personal experience is generalizable to many/most people of color?

If, say, someone from another company who's visited your offices for a day later refers to you as "that negro in product development," would it be typical that other people of color would take no offense at that use of language?

I'm curious about it. I don't have that lived-in experience, so all I can do to understand is ask and listen.

I have customers from Latin America and negro in Spanish simply means black.

Why would I ruin my day and relationship with someone by taking offense to a word when there was no malice intended?

That may be, but I would appreciate if you could also lend me your insight into the questions I had asked.

"not so delicate that they would need to be handled with kid gloves."

See... it's the little things like this. I am pretty sure that was not your intention, but please do know that turns of phrases like that hurt a little, and exclude a little.

To see what I mean, s/kid/jew/ or s/kid/white/ or some other category and see how it reads.

"phrases like that hurt a little, and exclude a little."

I agree with you on that part but your substitution doesn't make much sense.

The original comment links programming competency to masculinity, which is a problem. It's as simple as that.

Why is that a problem?

You can't really be serious.

Sure he is. I for one look forward to when we drop these asinine phrases like "manly-man" from casual conversation and start speaking with a better, broader, more meaningful, and richer kind of vocabulary.

I for one look forward to when language can be used in a way that the original author sees fit and without having to pander to every gentle sensitivity or taste.

I don't know how the original author intended it, except that they seem to equate programming ability with masculinity.

If that's what they meant, that's a pretty retrograde attitude.

If that isn't what they meant, maybe it's time to update their vocabulary.

We should be considerate in how we listen as well as how we speak. It does no good to speak insensitively, nor to listen oversensitively. Communication is a two-way street.

When I feel offended, I treat it as an opportunity to learn about a different viewpoint. Sometimes it makes me change my mind. It always gives me a better understanding of people.

The thing is, you've got a woman right here telling you, "this is exclusionary and seems stupid on the face of it." We don't even have to guess at how it's taken by someone who doesn't fit into its built-in description; they're telling us in plain language.

The different viewpoint, I guess, is that the original author has some outdated views on gender competence.

I've seen lots of comments saying "oh but it adds flair to language," and that's a reeeeeally weak defense. There is much better language available; that stuff just sounds stupid at best and retrograde at worst.

I agree that the comment was insensitive. I never said otherwise. I was pointing out that you missed a case: it is entirely possible “that is not what they meant and they do not need to update their vocabulary”.

Of course it's possible they didn't mean to equate programming skill with masculinity. That doesn't mean that they don't need to update their vocabulary--just like those elderly people I know who still use words like "negros" to refer to black folks.

Sometimes, the world changes, and it's up to us to keep up with the times.

(It's funny, I bet you anything if the original author had said "this negro gentleman at my workplace...", no one would be defending his retrograde use of language.)

You’re not wrong, but we’re arguing about different things, and I’m not sure how I can make myself clear.

I think they didn’t mean to equate programming skill and masculinity. I also think they should be more sensitive in how they speak. I also think it’s dishonest to ignore the fact that one can be true without the other.

Furthermore, “manly man” is a wacky satirical caricature of “real man”, a problematic stereotype. “Negro” is just a problematic stereotype. I think we should find a better comparison, but one doesn’t come to mind.

Actually, I think you've hit on something important: you're saying "manly man" is wacky and humorous, but "negro" is problematic. The only difference here and now is how it's no longer acceptable, at all, to play on race stereotypes, not like it was 50 years ago.

Times change, and we can either keep up, or we can let ourselves become relics from an earlier era.

“Negro” was never humorous, as far as I know—it was just the name for a black person, or more accurately “a black”, through the lens of the culture at the time that dehumanised them by identifying them solely with their skin colour. However, I do see your point and I agree that it’s important to change with the times and adapt our language to the new, and hopefully more enlightened, cultural and historical context.

1. A woman saying something is exclusionary does not make it exclusionary on the basis that it's a woman who said it.

2. You wrote that the "original author has some outdated views on gender competence." This is libel. This is not at all a conclusion that can be drawn from that user's posts. It's an unfair presumption and you're wrong to go around stating your opinions of others as facts.

On point 1: if a woman says she feels excluded, then it's exclusionary. It's not up to you to decide how she feels or tell her how she should feel.

On point 2: I can't even dignify that "libel" remark with a response.

Snesker is a holocaust-denying troll.

The fact that this comment https://news.ycombinator.com/item?id=9200905 is still live and wasn't heavily downvoted says something about the current HN audience.


Thanks for pointing that out.

What part of my comment do you object to? Challenging the official story of the alleged holocaust DOES cost you your freedom in Germany, and also to a lesser extent in Canada. Do you think it should be a crime to disagree with the government? With Jews? With the 'general consensus'?

Furthermore, my remark doesn't prove in the least that I'm a troll or a holocaust denier. Use your logic. The fact that your comment is still live and wasn't heavily downvoted says something about the current HN audience.

Someone feeling excluded does not make something exclusionary. It's not up to you to decide someone's opinion is objective fact simply because they're a woman, I might add.

Sure he is.

Indeed. I'm glad at least one person got my point.

(A minor nitpick, tho: for the record I happen to have been born with two X chromosomes. Good guess though; especially on HN threads like these.)

Sorry! My bad, thanks for pointing out my assumption.
