Hacker News
The Case of the Extra 40ms (netflixtechblog.com)
597 points by atg_abhishek on Dec 15, 2020 | 197 comments

This is still bad engineering on Netflix's part. You can't and shouldn't rely on your audio handler being called on time by a timer to keep playback stable, least of all in a use case as latency-insensitive as Netflix's.

If you're doing real-time audio, your processing loop needs to be directly driven by audio hardware buffer events/interrupts (through as many abstraction layers as you want, but a traceable event chain nonetheless, with no sleeping guesswork or timers; ideally all those threads should be marked real-time, never allocate memory, and never lock or block on anything else). This is what serious real-time audio systems like JACK do. And if you're not, you need to be able to buffer way more than one frame's worth of audio: at least 100ms, preferably much more. Not just for robustness, but also for battery life.
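To make the buffering point concrete, here is a toy simulation (a sketch with made-up numbers, not Netflix's code): a buffer only one refill period deep underruns on every scheduler stall, while a deep buffer absorbs them entirely.

```python
# Toy model: the hardware drains audio continuously; the app's handler is
# supposed to run every `period_ms` and top the buffer back up to `depth_ms`,
# but every `stall_every`-th invocation the scheduler adds `stall_ms` of delay.
# All numbers are illustrative, not taken from the article.

def underruns(depth_ms, period_ms, stall_ms, ticks=1000, stall_every=50):
    level, misses = depth_ms, 0
    for i in range(ticks):
        gap = period_ms + (stall_ms if i % stall_every == 0 else 0)
        level -= gap              # audio consumed while waiting for the handler
        if level < 0:             # buffer ran dry: audible glitch
            misses += 1
        level = depth_ms          # handler refills the buffer completely
    return misses

shallow = underruns(depth_ms=15, period_ms=15, stall_ms=40)   # glitches on every stall
deep = underruns(depth_ms=200, period_ms=15, stall_ms=40)     # absorbs every stall
```

The shallow configuration glitches once per stall; the deep one never does, which is the "buffer at least 100ms" argument in miniature.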

Sure, there was an Android bug here, but the way Netflix designed this is fragile and, as someone who has messed with audio/video programming enough, wrong. Had they done things properly, they would've been insulated from this OS bug.

This kind of bug is best used as a learning experience: what happened here consistently, completely destroying playback on one device, is something that has always happened sporadically, hurting all of your users whenever something causes the CPU to stall or slow down long enough to miss the deadline anyway. Instead of just fixing the bug, make your software robust against these cases, and then all your users win.

Android does not have a functioning audio API. And really doesn't have the ability to mark threads as real-time that will stick. I've worked on Android game ports and done audio stack work, the whole thing is a shitshow and barely functions as documented. What their support team recommends in one version is quickly broken by hacks done by chip vendors and random bugs added to OS versions. Nobody at Google even tests basic scheduler behavior such that an extra 40ms can sneak in completely unnoticed. I've seen stuff far worse than this, they once broke their own cert test and nobody noticed for half a year.

Sure, but this is all painful for real-time use cases, like audio production and games. Playing video is not one of those, and there the more you buffer the better the battery life thanks to reduced interrupt load and longer CPU idle states.

>Android does not have a functioning audio API.

I work on an app which relies on fairly low latency real-time audio (triggered by a MIDI keyboard) on both iOS and Android and have had more than my fair share of headaches with Android audio, but have to say that the Oboe audio API in the newer versions of Android has made things quite a lot better. We use JUCE as an abstraction layer over the audio APIs.

> And really doesn't have the ability to mark threads as real-time that will stick.

Yeah, we found this is the cause of a lot of real-time audio issues on Android. If you use systrace you can see your audio thread hopping from core to core. We get around this by manually setting the affinity of the audio thread to a particular core (working out which core to use requires some guesswork, sadly; there's meant to be a way to ask for the "best" core, but it doesn't really seem to work).
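A minimal sketch of that kind of pinning on Linux-based systems, assuming the heuristic (my own, not a documented API) that the highest-numbered CPU is a "big" core on big.LITTLE parts; `os.sched_setaffinity` exists only on Linux, hence the guards:

```python
import os

def pick_audio_core():
    # Heuristic guess: the highest-numbered CPU, which on many big.LITTLE
    # chips is a "big" core. There is no reliable API for asking for the
    # "best" core, which matches the guesswork described above.
    return (os.cpu_count() or 1) - 1

def pin_current_thread(core):
    # On Linux, sched_setaffinity with pid 0 applies to the calling thread.
    # The call does not exist on macOS/Windows, hence the feature check.
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {core})

try:
    pin_current_thread(pick_audio_core())
except OSError:
    pass  # e.g. the chosen core is outside this process's allowed cpuset
```

In a real app this would run at the top of the audio thread, before any time-critical work starts.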

Out of curiosity, are the audio APIs any better on iOS? Core audio seemed to work pretty well the last time I used it on OS X, so I assume it does, but I'd be curious to hear your take on it.

Much better. In general, "creative" APIs (e.g. audio playback, camera/image processing) tend to have well-rounded APIs on iOS.

Probably unrelated observation. I use a MacBook for personal use and a Windows laptop for work.

For many things, they both work well in general (Windows is better for "document stuff" and file management, Mac a little better in window management etc., but comparable in general).

But when it comes to media stuff (think: cropping a lossless audio file and compressing the result into an AAC or MP3, resizing 20 images, transcoding a movie from one format to the other, removing one page from a PDF), on Mac it can usually be done with built in tools. On Windows you often need either a prohibitively expensive software suite or go on an ad infested crapware hunt resulting in a borderline unusable tool.

> Mac a little better in window management

Can you elaborate on this? I've been using a Mac for work for a couple of years now and I find the out-of-the-box window management experience pretty bad compared to Windows. No snapping to sides or corners, no quick resize to standard sizes other than maximum, and most importantly no easy way to change between multiple windows of the same program using keyboard shortcuts. I've been using Magnet [1] and Contexts [2] to improve things, but it's a shame I have to use paid third party apps for basic functionality like this.

[1] https://magnet.crowdcafe.com/ [2] https://contexts.co/

Windows definitely improved significantly with Win10 in this regard. But Exposé (or whatever it's called now) invoked via a swipe gesture is fantastic. Swipe up with 4 fingers and I see all windows. Swipe down with 4 and I see all windows of the current app. Combine this with Spotlight to quickly find and open apps/documents.

But, everyone has a different workflow. I can imagine that if you're a heavier shortcuts user, Windows works better.

Expose is pretty good! But the Windows 10 task view [1] (Windows+Tab) does practically the same thing.

[1] https://www.windowscentral.com/how-use-task-view-windows-10

Spotlight is great and puts Windows 10's search to shame, although it feels like it has gotten worse in the past few years.

I found Spectacle (since replaced by Rectangle [0], I hear?) to be a treasure in this regard. Key bindings to maximize and resize, and even change what desktop a window is on.

I picked it because it was closest to the window-snapping / sizing bindings I had in XFCE years ago, but it basically makes OSX window management no longer frustrating.

0: https://rectangleapp.com/

It might not be quite what you want, but Command-(Shift-)-` should cycle through an app’s windows.

Only works if your keyboard layout includes backtick unfortunately.

At least on Gnome (which AFAIK copied this from Mac) it's actually mapped to "the key above Tab", even if that key is something other than backtick.

They improved on it when copying it, then, because this does not work in macOS.

You can change the shortcut in System Preferences > Keyboard > Shortcuts > Keyboard > Move focus to next window.

Try also using “Move focus to active or next window”. I have that mapped to Option-Command-` (and implicitly Option-Shift-Command-` for reverse order).

Contexts is nice but hasn't been updated since 2018! If you know a similar nice alternative, let me know.

They have said they're working on a Big Sur compatible version (it works currently, but the rendering is a bit broken).

That's fine by me to be honest—I paid for it once and it's done everything I need it to ever since.

> no easy way to change between multiple windows of the same program using keyboard shortcuts

CMD + ` (backtick)

I used various window management helpers over the years but recently settled on Hammerspoon.

This is my experience as well. I recently bought a Windows machine for the first time in a very long time, as I want access to all three big operating systems (Mac, Windows, Linux). I had forgotten how terrible Windows can be when downloading stuff from the internet.

I wanted to download a program so that I could control my fans a bit better. Before I knew it, it took over my browser and would redirect me to other sites, couldn't remove the program either. Norton didn't do its job, and I didn't do mine because I probably unchecked one box too few during the installation process. The only way to fix this was to wipe the disk completely.

I've never had this experience on Mac or Linux.

Sounds like the problem exists between keyboard and chair.

Executing strange binary files from untrusted sources could ruin your day on Mac and Linux as well. It is only less common because market share is so small compared to Windows, and as such these systems are less of a target.

How do you establish that a binary is trustworthy?

I know what I do, I generally try to find user reviews, but someone had to install it first.

You can upload the binary to virustotal and see if it got flagged.

Oh, that’s an interesting tool. Thanks.

Just as a heads up to anyone out there: if you ever get infected, run Malwarebytes (free) first, and that should probably fix your woes; otherwise you can download Bitdefender (also free) and run that. I see a lot of people using Norton or whatever else, and IMO they're all just a waste of resources and time.

I personally just uninstall Malwarebytes and Bitdefender after I clean up, make sure Windows Defender's up and running, and I'm ready to download all kinds of shit off the internet again.

I think the best way to find niche software on windows is to search for open source alternatives.

Making a system restore point from time to time can help greatly too on windows! (make sure you change the setting so that it doesn't delete your old system restore points due to low disk space making the whole thing useless)

I think that if you have malware, then formatting the hard drive is the minimum of what should be done.

I would not assume that the malware is actually purged, and unpacking a backup archive and running a script to reinstall your software is not that troublesome (assuming you have backups and a script to automate installing your programs, but if you're on HN you probably do).

Well, I normally do :P But this was a fairly fresh install, so I didn't set it up yet.

I use Clonezilla for backups.

> But when it comes to media stuff (think: cropping a lossless audio file and compressing the result into an AAC or MP3, resizing 20 images, transcoding a movie from one format to the other, removing one page from a PDF), on Mac it can usually be done with built in tools.

Which tools are these? On Mac I ended up using Audacity for cropping audio files, ffmpeg (cli) to compress it, and used either ffmpeg or Handbrake for transcoding. I don't remember what I used for pdf page split/combine, but I do remember that it was a pain.

Honest question - as I still use Mac occasionally, it'd be great to know what are those builtin tools so I can actually use the features the OS has.

For cropping audio files, QuickTime can do it. Look in its menus. Also transcoding video files (to what extent this might be “compression” depends on your use case).

PDF page manipulation is the easiest thing on macOS, also quite robust usually. Just open the document in Preview and open the page sidebar. You can reorder or remove pages there; or drag and drop pages from other documents.

>For cropping audio files, QuickTime can do it. Look in its menus. Also transcoding video files

QuickTime Pro could do all of this very well. QuickTime X, however? It can trim, but the UI is bad and the encoding options are so limited that I stopped using it altogether.

It often comes down to what we are used to. I switched from Windows to Mac on purpose: being a 20+ year Windows power user (software engineer), I wanted to learn the Mac to be a bit more versatile. It took me a year just to become a decent macOS user, and I still "hate it". For me Windows is still better, as it is easier for me and I know all the tools, etc.

I am still to discover a built in set of tools on either platform. I use 3rd party to accomplish my daily tasks on both.

On a very basic level Quicktime can do that. And Preview lets you easily remove pages from PDFs - select the page in the sidebar, press Delete.

Apart from that, more advanced 1st party apps for editing (for certain values of advanced) are available free - iMovie for video, GarageBand for audio, Pages for documents, etc.

I personally know people who switched to Mac because of Print/Save as PDF. Not just because of that but it was the proverbial last straw. (That was before it was built in in Windows.)

> Mac a little better in window management

I need to echo the sibling comment. I was overwhelmed with how bad Mac window management is in a multi-monitor setup. This is compounded by Apple's apparent hatred of keyboard shortcuts and power users. Speaking to my longtime Mac-loving coworkers, they displayed Sapir-Whorf-like inability to understand why the missing features/behaviors are even a problem.

In the Covid/WFH world, I'm happy to let the 16" MBP collect dust while I use computers that adapt to me, rather than forcefully adapting myself to one particular computer's eccentricities.

ffmpeg is open source and cross-platform, no need for infested crapware.

GUI apps are easier to pick up and learn than CLI commands with cryptic namings.

You end up using ffmpeg all the time on Mac anyway, because the built-in tools only really support Apple-approved formats, and if you're dealing with video captured elsewhere you end up with piles of WebM containers full of non-Apple-endorsed codecs.

Didn't use to be this way in the QuickTime Pro ecosystem but since QuickTime X you can't just work with any old video files.

Meh, pick your poison. I was just pointing out there are more alternatives than crapware and expensive tools.

I still remember how bewildered I was when I realized how easy it was on my first Mac to trim or convert an audio file, crop or resize a picture, build a pdf from images or sign a damn pdf ^^

> On Windows you often need either a prohibitively expensive software suite or go on an ad infested crapware hunt resulting in a borderline unusable tool.

For basic audio/video stuff on Windows, try http://avidemux.sourceforge.net/

I've used ffmpeg for these kinds of things with no problem on Windows and *nix systems.

If you're willing to spend the money, Adobe Media Encoder works perfectly fine on Windows as well.

As a person who develops a video player SDK on iOS, I can say that video processing is horribly undocumented, buggy, and randomly changes behavior depending on device and OS version. And Apple being Apple, you pretty much have no channel to report issues.

Metal is a pleasure to work with, versus the "build your own API" experience of using Vulkan on Android.

Much better. Last time I checked, there was a baseline audio latency on Android that makes realtime music apps (and decent audio responsiveness in games) basically impossible, vs on iOS where there's a giant ecosystem of them.

See this blogpost (from 2018) that seems to confirm my recollections:


> Based on MAQ stats, devices from Apple & Google are suitable for live audio use, and devices from other manufacturers generally are likely to have glitchiness and latency that makes them unsuitable.

Seems to be more of an issue with manufacturers than a specific issue with Android, as Google's own phones get low latencies.

That's kind of a moot point - most Android devices are not Google's own phones, so you can't ship a reliable user experience on most Android devices.

My point was more that if Google can do it, other manufacturers can do it.

The issue with other manufacturers is that they concentrate a lot more on features that are visible to users whereas features like these aren't necessarily visible directly to the user and aren't given attention even though they're possible to implement.

I haven't used it, to be quite honest; the games I worked on had the Mac/iOS ports done by a different team. But since I was usually called in to help diagnose customer issues in live and never had to touch the Mac/iOS audio stuff, it probably worked well enough!

You can tell by the plethora of realtime audio production apps for iOS that it’s pretty solid. CoreAudio is the best in its league.

No wonder Apple is light-years ahead in A/V-land. At least with a reliable OS and predictable hardware behaviour you don't need to worry about this.

AAudio is supposed to overcome that, although I never used it; I only care about graphics stuff.

Isn't that the case in your experience?

Author here: Fair criticism. There are some details that are missing in the interest of keeping the story simple.

1) The audio and video playback is driven by the hardware, using the audio stream as the master clock.

2) Ninja does fill the Android and hardware buffers and keep them full. In the pathological case, the hardware and Android buffers had emptied.

3) When the Android buffer isn't full, the thread yields to Android (to be a good citizen) but asks to be invoked again right away. The 40ms "background thread" delay broke this behavior. The comment about "changing this behavior involved deeper changes than I was prepared to make" was when I explored changing this behavior (copying multiple samples per invocation) and decided it was more likely to introduce more bugs.

Excellent article btw!

But why did the problem only appear on this particular TV and not any other using the same version of Android? Was this the only TV to use Lollipop?

Yes. This was the only TV device on L. It was already 2 years old at this point, which for pure Android TV would not be allowed by Google. Since this was an AOSP device, the operator could stay on an older version.

Quick simple question for you. What does the TV testing landscape look like at a Netflix? Is it hundreds of various TVs in a room on a variety of networks or am I imagining too much here?

TV and set top box testing is shared between our partners (manufacturers and operators) and Netflix. Our partners must run our test suite before submitting for certification. We don't have all the devices under active testing all the time, but almost all of the devices have been tested at Netflix at some point. Most operator devices require VPNs to the operator network to function.

So yes, hundreds of devices on a variety of networks, but at any given moment most of the devices are in storage.

The "buffering" parts of the graph show the "fill the buffer" behavior described in (3). The time between calls is very short and large amounts of data are moved.

How does the buffering proceed quickly, but then the (slower than usual) periodic calls only copy a limited amount of data? That's the part that doesn't make sense to me. Normally if you're filling buffers, you copy as much data as is needed to keep the buffer full, no matter when you're invoked.

The handler always copies the same amount of data. It has to be called more often to copy a lot of data, which is what usually happens when the Android buffer isn't full. I agree with you that this isn't usually how this is done. Doing it this way means that Android has more freedom to schedule other tasks during the buffering phase.

Okay, so the real problem here is there was effectively a yield() inside a buffer fill loop with a small block size, and Android decided to make each of those cost 40ms. The article made it sound like you were always waiting 15ms, but you were actually waiting 0ms when there was more work to do.

I still think the block size should've been larger in this case for various reasons (it's still a trade-off; larger block sizes are usually mildly more CPU-efficient, besides preventing pathological scheduling cases like this one), but this explanation makes more sense.
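A quick back-of-the-envelope model of why the yield-per-small-block loop hurts (my own illustrative numbers, not the post's): if every block is followed by a yield that the scheduler turns into a 40ms nap, the fill loop can no longer outrun playback.

```python
# Wall-clock cost of filling a buffer in small blocks when each handler call
# is followed by a scheduler yield. Copying itself is treated as free, so
# the only cost modeled is the yield latency. Numbers are illustrative.

def fill_time_ms(buffer_ms, block_ms, yield_cost_ms):
    blocks = buffer_ms // block_ms
    return blocks * yield_cost_ms

healthy = fill_time_ms(10_000, 15, 0)    # yield returns immediately
broken = fill_time_ms(10_000, 15, 40)    # each yield costs 40 ms
# `broken` exceeds 10_000 ms: buffering 10 s of audio takes longer than
# playing it, so the buffer can never get ahead.
```

With a larger block size the number of yields (and hence the worst-case penalty) shrinks proportionally, which is the trade-off being argued here.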

Yes, that's all correct. The reason for the small block size has to do with design decisions made in the rest of the streaming stack, which is shared between all TV devices.

That is indeed interesting. The pipeline should be filling up if the OS scheduler delays the buffering for 40ms, giving enough data on the next call. Except if the buffering isn't greedy enough or aborted by the OS in some way.

>You can't and shouldn't rely on your audio handler getting called on time via a timer in order to keep playback stable, especially not on a non latency sensitive use case

Why not? According to [1], using timers is how Windows, CoreAudio, and PulseAudio all work under the hood, and on Windows and in PulseAudio it replaced the previous interrupt-based implementations. On the app-end of the APIs, Windows' WASAPI code example uses Sleep polling [2], and PulseAudio's write callback is optional and VLC doesn't use it [3], foobar2000 has a polling-based output mode[4], Windows has specific APIs for audio thread scheduling [5], etc.

Is this a specific deficiency of Android?

[1] https://fedoraproject.org/wiki/Features/GlitchFreeAudio

[2] https://docs.microsoft.com/en-us/windows/win32/coreaudio/ren...

[3] http://www.videolan.org/developers/vlc/modules/audio_output/...

[4] http://wiki.hydrogenaud.io/index.php?title=Foobar2000:Compon...

[5] https://docs.microsoft.com/en-us/windows/win32/procthread/mu...
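The polling style those links describe has roughly this shape (a hedged sketch: `get_padding` and `write_frames` are hypothetical stand-ins for the real device calls, and the numbers are illustrative):

```python
import time

RATE = 48_000
BUFFER_FRAMES = 4_800                      # 100 ms of buffer at 48 kHz

def render_loop(get_padding, write_frames, should_stop):
    """Timer-driven output: sleep about half the buffer duration, then top
    up whatever space the device has drained since last time.
    `get_padding()` returns frames still queued in the device buffer;
    `write_frames(n)` pushes n fresh frames. Both are assumed callbacks."""
    while not should_stop():
        time.sleep(BUFFER_FRAMES / RATE / 2)   # ~50 ms
        free = BUFFER_FRAMES - get_padding()   # frames drained by hardware
        if free > 0:
            write_frames(free)                 # refill all free space
```

No interrupt ever reaches the application; correctness relies only on the sleep being comfortably shorter than the buffer depth.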

The way to get the best latency on a device is not to have your processing loop directly driven by audio hardware events/interrupts. This is for many reasons:

1) events/interrupts can be delayed (usually by (bad) drivers disabling interrupts for a long time).

2) latency is not adjustable. You may want to give your processing more time if you know more work needs to be done, or if the system is under variable load.

3) you will usually finish generating your audio too early. E.g.: if the interrupt fires every 2ms but your processing takes 1ms, you are adding an unnecessary 1ms of latency by starting as soon as the interrupt is received.

4) the device will consume more power, as it cannot know when it will next wake up (theoretically; I do not think it makes a difference in practice).

Modern mobile devices use DMA to copy content from the application processor (AP) to the audio DSP. This DMA periodically copies the content of an AP buffer to a DSP buffer; this is called a DMA burst.

You want to track this burst and wake up just far enough ahead of it to have time to generate your audio data and write it to the AP buffer, plus a safety margin. This lets you track system performance and application load to adjust your safety margin and optimize latency. It also lets the scheduler know far in advance when your app will need to be woken up.
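The wakeup arithmetic being described is simple (names and numbers are mine, not AAudio's):

```python
def wakeup_time_ms(next_burst_ms, render_cost_ms, safety_margin_ms):
    """Latest moment the audio thread can wake and still have the AP buffer
    written before the DMA burst copies it to the DSP."""
    return next_burst_ms - render_cost_ms - safety_margin_ms

# Next burst at t=20 ms, rendering takes 1 ms, 2 ms of margin:
# wake at t=17 ms rather than adding latency by rendering at t=0
# and letting the finished audio sit in the buffer.
```

Shrinking the safety margin as measured jitter allows is exactly the latency-optimization knob the comment mentions.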

The Android AAudio API [1] implements what I just described, as well as memory-mapping the AP buffer into the application process to achieve zero copy. It is the way to achieve the lowest latency on Android.

I believe Apple's low-latency APIs use a similar interrupt-free design.

Source: worked 3 years on the Android audio framework at Google.

Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer.

[1]: https://developer.android.com/ndk/guides/audio/aaudio/aaudio

The Android Aaudio API is newer than the device in question.

The previous options are either SLES or AudioDriver, both of which are bad in their own ways.

You are right, AAudio has been available since Oreo only, and the device is 3 versions older. Additionally, Netflix's video playback is not a low-latency use case, so it shouldn't use AAudio even on recent devices.

My comment was about the parent's affirmation that low latency apps should be scheduled by interrupt.

Saying in hindsight, once the error has been found, that something was the wrong approach and should never have been done like this is easy. But every one of us codes simple "tricks" daily to avoid implementing a more complex solution, because the simple approach seems to be working, until some months/years later it doesn't anymore. And then someone will point their fingers at you/us and say it's obvious it should never have been done like that...

This isn't just hindsight - anyone who knows audio coding or really anything about thread-sleeping will have the same reaction of disgust to the design of that audio buffering. It is a design that is bad in ways it never needed to be, that would not have saved time in coding.

This kind of attitude, that there is only one "right" way to do it, and any other method is disgusting, is why there are at least three different, incompatible, audio subsystems for Linux. Every time a self-proclaimed expert on audio decides that the current most popular implementation is wrong, they just start writing (and leave incomplete) a different one.

This same jousting over technical windmills happens in other userspace elements of Linux. Linus, for better or worse, keeps a tight rein on what's allowed in the kernel, but when it comes to audio, window management, display, UX, everything is a shitshow of mixed metaphors and dysfunctional interoperability.

Replace audio with security and you get the reason why everything is so good, everyone does what they think is best for security and security bugs slowly lessen.

Wait... That's not how that works.

Honest question (I don't know much about audio coding), but from what I've read here on HN, Netflix offer some of the best remuneration packages - so don't they have some of the best developers working there?

Surely they'd have picked this up if it were that straight forward?

> offer some of the best remuneration packages - so don't they have some of the best developers working there?

Having worked at several firms with top-tier employee compensation, I can wholeheartedly assure you that the best pay does not result in the best developers (or any other job function).

It does generally result in a higher “floor” of employee, but that’s about it.

Have you seen how broken the hiring process is at these bigger places?

If you spend your time learning about the details of audio playback what kind of additional time do you have to learn leetcode?

Can't say I'm surprised; I have a lot of other small beefs with the Netflix app that collectively hint to me that the developers who coded it may not have been at the top of their game.

Some examples:

- Unacceptably high latency for any sort of seek operation - even if it's just going back a few seconds to what you just watched and ought to still be somewhere in a local buffer.

- Trying to seek multiple times sometimes forces you to wait for the previous operation to finish instead of "accumulating" your taps to determine a new time target

- Grabbing the playback marker and dragging it left and right is very imprecise, it should converge into "logarithmic" behaviour for fine movements

- Flakey Chromecast button. More than half the time I have to kill the app and relaunch it to get the cast button to show up.

- Occasionally flakey UI layout (random bugs which present when moving between screens or searching, or flickering due to layout engine doing multiple repaints over the same controls)

- Likes to plaster up things I don't want. They seem to prioritize "discovery" over "user intent", which would work if only they surfaced articles I'm interested in (which is rare). When I just want to find a "comedy" or get to the series I'm watching, it sometimes feels like I'm fighting against the UI

- Constantly moving my cheese - it's like the UI designers have ADD and can't commit

Amazon Prime is just as bad (maybe worse). I wish these companies didn't control the vertical - i.e. just provide content, and let other vendors make the client software.


Good points, but I have to say that even with all that, Netflix app/site is by far the best of the streaming services that I've used. HBO Nordic, Viaplay and YouTube movies have pretty bad UI/UX problems and their streams become borderline unwatchable with subpar internet connections, whereas Netflix has in my experience still been able to push through video in those conditions with good enough quality that I can still enjoy it.

> Unacceptably high latency for any sort of seek operation - even if it's just going back a few seconds to what you just watched and ought to still be somewhere in a local buffer.

Unacceptable for what use-case? I had assumed this was somewhat intentional on Netflix's part, or at least not a priority for them.

For the use case of watching a movie?

You seek back a scene or two while watching something when your mind has wandered away and you missed what was going on a few seconds ago. For some non-neurotypical people it's a really common thing to do while watching a longer movie as it may be hard to follow what's going on otherwise.

I'd generalize this to:

If your application is timing-critical and you aren't using things specifically designed for time-critical applications (like an RTOS), then you should be doing as much of the low-level implementation yourself as possible.

Or the other way around: don't make your application timing-critical if it doesn't have to be. As others point out, the hardware provides a buffer, and the app only fills one frame in advance instead of filling the buffer completely when possible.

> And if you're not, you need to be capable of buffer filling way more than one frame's worth of audio - at least 100ms, preferably much more.

Author addressed this point and agrees with it:

> Why don’t you just copy more data each time the handler is called? This was a fair criticism, but changing this behavior involved deeper changes than I was prepared to make

In general, "real time audio" usually refers to low-latency audio, which uninteractive video playback is not.

Yes, he addressed it, but just said he was unwilling to do anything about it as it would have resulted in too much change/work.

That doesn't answer the question of why it was implemented like that in the first place, though.

Well, it's been my experience that you only find out the proper way to implement something after you do it one or a couple of times.

>If you're doing real time audio, your processing loop needs to be directly driven by audio hardware buffer events/interrupts

Is this realistic? I don't know much about shipping software for a wide range of hardware, but I would imagine this would require either Netflix or the integrator to write drivers for Netflix to communicate with the hardware DSP, right? (Or to standardize.) It seems more feasible to piggyback on a platform that has already done that and is already widely deployed.

If I'm reading correctly, they're saying that Netflix is not doing realtime audio, so they need not have this level of precision. They should buffer more than 40ms of audio data.

I think they are doing so ("The Netflix application is complex, but at its simplest it streams data from a Netflix server, buffers several seconds worth of video and audio data on the device").

The system was built to fill the (hardware) buffers with 15ms worth of audio. The problem is that the 15ms handler was not invoked in time (sometimes it took 40ms). Some suggested to Netflix that in that case they could fill 40ms of audio data into the hardware, but they didn't want to because it would have required low-level modifications.

Except buffering wasn't the problem, an OS-managed thread was.

The reason it caused a problem is that Netflix wasn't loading more than a frame's worth of data at a time. That's a strategy that makes sense for a game-streaming app like Stadia or Steam Link, where every frame of latency matters, but for a movie/TV streaming app it really doesn't.

The point is that there was never any good reason to make the Netflix app so timing-dependent. Had Netflix not done this they wouldn't have even noticed.


That said, if a company was releasing an Android 5.0 powered device in late 2017 when Android 8.0 had already been released I can't help but say they brought this on themselves by being idiotically out of date. There's no excuse for launching a device with an OS three years and three versions out of date.

Everybody in this story is doing really dumb things.

> wasn't loading more than a frame worth of data at a time

So the real question should be, does Netflix have a valid reason for only delivering a single frame of video and audio at a time? Is it bad software design, or good software design for an unknown problem (to us)?

I've seen plenty of things that looked stupid out of context, then made sense later on after understanding why the "stupid" thing was done in the first place.

Stupid thought: given Netflix's history as one of the earliest services streaming licensed content, it would not surprise me if this was at some point a way to minimize how much unencrypted data sat in memory, or similar.

Other thought would be, whether the architecture was somehow related to how one had to do things in Silverlight (any silverlight devs still remember how one would do this sort of thing? Any Silverlight devs still around?)

It's probably easier to keep the audio and video in sync with a simple implementation. They can't really drift out of sync if all that's being done is handling one frame at a time.

Though it sounds like they aren't even getting that advantage, because the audio thread is separate, and the described problem is starvation of the audio buffer alone.

It probably makes it easier to implement things like variable frame rates.

"There's no excuse for launching a device with an OS three years and three versions out of date." Really? Why not? I mean, if you only launch for the newest OS version, you cut out a lot of devices which were not or cannot be upgraded. Maybe in the US this is not a problem, but in Europe I wouldn't be surprised if this actually cut out a larger-than-50% market share.

You have misunderstood. The hardware device that was misbehaving had not yet been released to the public, yet it was running an ancient and out-of-date OS build. That's what I'm saying is idiotic.

Netflix can and should support their software on as old of OS releases as makes sense for them, but a hardware vendor introducing a new device to market on an outdated OS is inexcusable. Android 5.x received its final update in early 2015, over two full years before this mystery TV box was to be released. There is no good reason it couldn't have been on Android 7 or 8.

It was the interaction between the expected amount of required buffering and the thread scheduling behavior that caused the problem. As typically happens when there's a video programmer around, the audio timing is considered less important, and there's a belief that "we can always get the audio we need for the video". Ergo, not much need for buffering. In this case, the odd/incorrect behavior of the thread scheduler exposed the optimistic assumption that you can rely on waking up every 15ms and handling the audio.

And further, video latency is much less noticeable than audio latency. ~nobody will notice one dropped frame, but ~everyone will notice 16ms of audio missing, because you get a blatant click. Therefore, it is always important to prioritize audio buffering over video buffering.
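A quick way to see why a dropped audio buffer is so audible: zero-filling even ~16 ms of an otherwise smooth waveform introduces a step discontinuity, i.e. a click. A toy illustration:

```python
import math

def max_jump(samples):
    """Largest sample-to-sample step; big steps are audible clicks."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))

rate = 48000
# 100 ms of a 1 kHz sine tone:
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate // 10)]
# Insert ~16 ms (768 samples) of silence mid-stream, as an underrun
# effectively does; the cut point 2412 lands near a waveform peak:
gap = tone[:2412] + [0.0] * 768 + tone[2412:]

assert max_jump(tone) < 0.2  # the tone itself is smooth
assert max_jump(gap) > 0.5   # the gap edge is a large step: a click
```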

oh, yes, I had to use some videoconferencing software recently that appears to drop audio before video. It was basically unusable.

I see no reason that interrupts are needed at all. On a properly working system, the system clock will track the audio clock quite well. If the audio hardware tells you that it will need more data in x ms, it will be very close to x ms as seen by the CPU. If you use a deadline scheduler, you don't want to wait for an interrupt: you tell the scheduler that you need to run before a specific deadline, and that's that. A hardware interrupt is pure overhead, especially on platforms like x86 with silly interrupt latency.

(A good kernel can account for the time it takes to wake from idle and compensate when programming a timer (as could a good CPU, but I’ve never heard of hardware that does this) and can wake a user program a bit early if the CPU is otherwise idle and would take a short nap. A sound hardware interrupt can’t do these tricks.)

This is both true and false at the same time.

(1) It is absolutely not true that the system clock will track the audio clock "quite well". Real world numbers would typically involve drift measurable in seconds within a low integer number of hours.

(2) It absolutely is true that on most sensible audio interface designs, you don't need interrupts except early after device open. At that point, you use them to set up a DLL (Delay Locked Loop) that will enable you to use the system clock to know where the audio hardware is reading/writing to in the buffer used to move data to/from the hardware. Once the DLL is correctly configured, you can just be woken by the system timer, determine the current audio hardware state and read/write data appropriately.

But on (2): unless you're a low-level app directly driving the hardware, you shouldn't do this. And you especially shouldn't do this without good guarantees that system timers are indeed well behaved. Effectively, using a DLL and system timers instead of straight IRQs is a low-level implementation detail, but one that belongs either in the driver or in a very low-level piece of code anyway (like JACK or some other audio daemon), not in user applications. I was sort of throwing all of this together when I said hardware "events"; locking a DLL with a different time base to the playback hardware buffer pointer state is still effectively that, and should be done, but only at a level low enough where you know it'll work. Which definitely isn't Android's high-level thread scheduling framework.
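For the curious, the DLL idea can be sketched in a few lines (a toy first-order loop with made-up gains and names, nothing like a production implementation):

```python
class AudioClockDLL:
    """Toy delay-locked loop: learns the audio clock's rate relative to
    the system clock so the hardware read position can be predicted
    from a system timestamp, without waiting on interrupts."""

    def __init__(self, rate_guess, gain=0.1):
        self.rate = rate_guess  # samples per second of system time
        self.gain = gain
        self.t0 = None
        self.pos = 0.0

    def observe(self, t, hw_pos):
        """Feed one (system time, hardware position) measurement."""
        if self.t0 is None:
            self.t0, self.pos = t, hw_pos
            return
        dt = t - self.t0
        predicted = self.pos + self.rate * dt
        error = hw_pos - predicted
        # Nudge the rate estimate toward what we actually observed.
        self.rate += self.gain * error / dt
        self.t0, self.pos = t, hw_pos

    def predict(self, t):
        """Estimate the hardware position at system time t."""
        return self.pos + self.rate * (t - self.t0)

# The hardware actually runs at 48,017 Hz while we guessed 48,000:
dll = AudioClockDLL(rate_guess=48000.0)
true_rate = 48017.0
for i in range(200):
    t = i * 0.015                 # one observation every 15 ms
    dll.observe(t, true_rate * t)
assert abs(dll.rate - true_rate) < 1.0  # converged to the real rate
```

Once locked, the system timer alone tells you where the hardware is in the ring buffer.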

For (1), by “quite well” I mean well enough to avoid buffer underruns. So I think we agree.

No, we don't agree. The drift will exceed the buffer size in less than an hour, or certainly could with many hardware configurations.
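Some back-of-envelope math on that claim (the 50 ppm figure is an illustrative assumption, not a measurement of any particular hardware):

```python
def seconds_until_drift_exceeds(buffer_s, drift_ppm):
    # Two free-running clocks diverge by drift_ppm microseconds per
    # second of elapsed time, so a cushion of buffer_s seconds is
    # exhausted after buffer_s / (drift_ppm * 1e-6) seconds.
    return buffer_s / (drift_ppm * 1e-6)

# A 100 ms buffer with 50 ppm of relative drift between the system
# clock and the audio clock is overrun in about 33 minutes:
t = seconds_until_drift_exceeds(0.100, 50)
assert 1900 < t < 2100  # ~2000 seconds
```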

>"A hardware interrupt is pure overhead, especially on platforms like x86 with silly interrupt latency."

Relatively speaking, maybe. But I have a device I've designed that communicates with a PC using interrupt transfers at a rate of 250 requests/s. I've watched it on an oscilloscope and it is rock stable. Sure, if you overload the PC it might get into trouble, but it is a game-like application, so when the app is run the PC is not doing much else. I recently ported the same thing to a Raspberry Pi 4.

Hah, you must never have tried this on an otherwise idle stock Sandy Bridge before Linux added mitigations :)

x86 interrupts are slow but at least reasonably consistent on a non-idle system. If your system is deeply enough into its various idle states, this is not at all true any more. I’ve seen latencies over 10ms on Sandy Bridge with C1E enabled.

Linux was on Pi 4. On PC it was actually Windows. The first version went out some 7 years ago I think and was rock solid since the beginning. They must've done something right ;)

It isn't on Android, but as I said, Netflix isn't a real-time app, so Android's deficiencies in the low latency audio department should not concern them. They should just buffer more audio per wakeup.

> as I said, Netflix isn't a real-time app

I honestly had a tough time parsing the sentence when you said that due to the triple negative, so it's possible that others might have misunderstood it.

Multiple negatives are common in some languages and absolutely do connote a distinct meaning from the simplified language.


Yep, I don't disagree! I think they can still be tough to parse in English though, even as a native speaker

That sentence was indeed a bit awkward, but I hope the meaning gets across since it wouldn't really make much sense the other way around, I think :-)

I'm not a native speaker and the sentence is honestly absolutely unparseable for me and reads like a contradiction. Is it even... correct?

I think it's grammatically correct, yes! But being grammatically correct isn't always the same as being easy to understand; some correct things can be hard to understand, and some incorrect things can be pretty easy to understand!

> You can't and shouldn't rely on your audio handler getting called on time via a timer in order to keep playback stable, especially not on a non latency sensitive use case, which Netflix very much isn't.

I don't understand this criticism, but I am likely missing something.

From what I gathered from the article, Ninja is specifically for firmware. It buffers frames, and delivers those frames whenever requested.

> your processing loop needs to be directly driven by audio hardware buffer events/interrupts

Again, I could be missing the thrust of the criticism, but isn't that exactly what's happening, here? Ninja supplies an endpoint, and it's the hardware that calls it?

> It buffers frames, and delivers those frames whenever requested.

Going by the article they use a plain android thread that runs at fixed 15ms intervals and buffers exactly one frame in advance. For 30 fps this is more often than needed (yay battery life) and for 60 fps that just asks for trouble when your OS doesn't guarantee real time behavior.

> Ninja supplies an endpoint, and it's the hardware that calls it?

As far as I understand it provides a cross platform endpoint, called by a platform specific wrapper. In the Android case this was a background worker and Android considered background workers not real time critical.

Thanks for taking the time to explain! Much appreciated.

> They use a plain android thread that runs at fixed 15ms intervals and buffers exactly one frame in advance

From the article: "...buffers several seconds worth of video and audio data on the device, then delivers video and audio frames one-at-a-time to the device’s playback hardware"

And: "under normal playback the thread copies one frame of data into the Android playback API, then tells the thread scheduler to wait 15 ms and invoke the handler again"

And: "When you create an Android thread, you can request that the thread be run repeatedly, as if in a loop, but it is the Android Thread scheduler that calls the handler, not your own application."

But from OP ...you need to be capable of buffer filling way more than one frame's worth of audio - at least 100ms, preferably much more. Not just for robustness, but also for battery life.

I suspect there is a misapprehension here. "...under normal playback the thread copies one frame of data into the Android playback API, then tells the thread scheduler to wait 15 ms and invoke the handler again". I suspect that one could call that handler more often than every 15ms to fill the playback buffer, then during normal playback add frames one at a time to keep the buffer filled and no faster; but on a quick reading it sounds like what the OP stated.
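If the handler really does write exactly one frame per wakeup, the robust alternative amounts to topping the buffer up to a target fill instead. A sketch of that per-wakeup decision (hypothetical names and numbers, not Netflix's actual code):

```python
import math

def frames_to_write(buffer_level_ms, target_ms, frame_ms):
    """How many whole frames to copy on this wakeup so the buffer
    reaches at least target_ms of queued audio."""
    deficit = target_ms - buffer_level_ms
    return max(0, math.ceil(deficit / frame_ms))

# Normal wakeup: ~15 ms drained since last time -> write one frame.
assert frames_to_write(85, 100, 15) == 1
# Late wakeup: 55 ms drained -> catch up with four frames in one call.
assert frames_to_write(45, 100, 15) == 4
# Woken early for some reason: nothing to do.
assert frames_to_write(100, 100, 15) == 0
```

The key property is that a late wakeup is self-correcting rather than an underrun.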

I agree this is bad engineering: don't use Android threads for real-time stuff. This should be running at module level.

A better approach, if the micro you are using allows it, is to have an interrupt when (for example) the I2S buffer is empty. I would then point the DMA to fetch the next buffer (already processed and mixed) and fire the DMA transfer.
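The classic ping-pong arrangement being described, as a toy model (real code would be an ISR plus DMA registers, not Python):

```python
class PingPongDMA:
    """Toy model of I2S double-buffering: the 'DMA' drains one buffer
    while the CPU fills the other; the buffer-empty interrupt swaps them."""

    def __init__(self, size):
        self.buffers = [[0] * size, [0] * size]
        self.playing = 0  # index the DMA is currently draining

    def fill_idle(self, samples):
        # CPU work: mix/process into the buffer the DMA is NOT using.
        self.buffers[1 - self.playing][:] = samples

    def on_buffer_empty_irq(self):
        # ISR: point the DMA at the freshly filled buffer and hand the
        # now-free one back to the CPU for the next fill.
        self.playing = 1 - self.playing
        return self.buffers[self.playing]

dma = PingPongDMA(4)
dma.fill_idle([1, 2, 3, 4])
assert dma.on_buffer_empty_irq() == [1, 2, 3, 4]
dma.fill_idle([5, 6, 7, 8])
assert dma.on_buffer_empty_irq() == [5, 6, 7, 8]
```

The CPU's only deadline is "finish filling before the other buffer drains", which is what makes this robust.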

> at least 100ms

This calls for huge latency problems. But I understand the approach if your buffer-filling/reading procedures are slow or unreliable.

I disagree with the timer thing. There are systems that provide precise timers for media (for example Win32 multimedia timers[1], that I never used but I know they exist).

[1] https://docs.microsoft.com/en-us/windows/win32/multimedia/ab...

>This calls for huge latency problems

It doesn't mean there is 100ms of latency; it just means that 100ms of audio is buffered so that you have ~100ms of leeway about when your app's audio thread is scheduled. Changes to the audio stream, such as stop/start/volume control, can be achieved with much lower latency using buffer rewriting or by applying changes lower down the stack where the buffers are smaller, or both. By default PulseAudio will buffer ~2000ms of audio from clients [1]

[1] https://freedesktop.org/software/pulseaudio/doxygen/structpa...

It DOES mean there is 100ms of latency.

Anything above 10ms (some say max. 20ms) for real-time audio processing (especially musical instruments) is prohibitive. Imagine an electronic drum set: if you hit the snare and the audio is output 100ms later from the speakers, I bet you'll notice it :)

Ah but GP wasn't talking about real-time in that context ("And if you're not...").

You can still have 100ms buffer without 100ms latency: within 10ms of the drum being hit, write 100ms of the drum sample into the playback buffer and immediately trigger its playback (or write it into the buffer starting at the cursor position that is just about to be played).

The only trouble is when you need to modify some of that 100ms before it is played back, for example if the user hits another drum 50ms later. In that case it becomes more complex, you'd have to overwrite some of the existing buffer with a new mix of both drum samples. The complexity is not worth it for that kind of app.

For a simple app like a video player, the audio stream is much more predictable so you can buffer more. Volume changes and pausing can still be applied with no perceivable latency by modifying the existing buffered data [1]

[1] https://www.freedesktop.org/wiki/Software/PulseAudio/Documen...
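The overwrite case described above, sketched with toy integer "samples" (real mixing would also handle clipping and sample formats):

```python
def mix_into(buffer, sample, offset):
    """Rewrite part of an already-queued playback buffer by summing a
    newly triggered sample into it, starting at offset. Any part of the
    sample past the end of the buffer would go into the next buffer."""
    for i, s in enumerate(sample):
        if offset + i < len(buffer):
            buffer[offset + i] += s

# 100 "ms" of a first drum hit is already queued; a second hit lands
# 50 ms in, so the tail of the queued audio must become a mix of both:
buf = [10] * 100               # pretend each entry is 1 ms of audio
mix_into(buf, [5] * 100, 50)
assert buf[49] == 10 and buf[50] == 15 and buf[99] == 15
```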

> within 10ms of the drum being hit, write 100ms of the drum sample into the playback buffer

You can only do that if you can predict the future and fill the buffer with data you don't have yet (samples from the future). Otherwise, you still have to wait for 100ms of samples to output. So if you have to wait 100ms for the samples, then the output happens after 100ms, hence 100ms of latency.

100ms samples can be fine for a video player. For real-time you (usually) do this: 2 buffers of 10ms each. While one buffer is playing, you fill another buffer with real-time data. After 10ms has passed, you start playing the buffer with real-time data, while the other buffer gets filled.

The point is that in a video player, you can predict the future: the video will continue playing.

> Otherwise, you still have to wait for 100ms of samples to output

No, you don’t, not if you can write into the buffer at an arbitrary point. This is what the whole page I linked is about.

I was sure that I had read this story before but checking the date showed that this blog post was just published a few days ago. A little searching turned up the place that I had originally seen it: a comment on this very site. https://news.ycombinator.com/item?id=24799332

Indeed, writing that post is what inspired me to write the tech blog article.

So was this the first device that used that specific version of Android? I'm curious if it had anything to do with the device itself

It was the only device to use this version of Android. There were other issues, too, this was just the most interesting.

It reminded me of something I see on Windows also, with Netflix at least on Edge. I'm not sure how to reproduce it exactly, but I think it happens if I pause, move off the browser tab, and then play; there's a good chance the frame rate will begin to stutter.

Hah, and my own deja vu was from that rachelbythebay post that this comment is underneath

So I’m assuming the bug actually was on all Lollipop devices?

I have tried a bunch of streaming platforms (Netflix, Amazon, HBO Max, Disney Plus, Apple TV) and Netflix has the best playback experience (and overall experience), followed by Apple. They definitely take engineering more seriously than the others.

We have considered ditching Netflix (content is meh for this household) but after using one of the other apps we frequently bail and find something on NF simply because the overall experience is better.

I think Netflix puts a very high priority on "customer wants to see a video, so we shall play that video". It's on every platform and your mom's IoT hairdryer, and even with an atrocious connection speed you still get a (blocky, low-quality) video stream with audio. This is invaluable, especially with kids you need to keep occupied on a train ride with 3G coverage.

Apple on the other hand, seems to think: "we have a glorious 4K 60 fps 30 Mbit Dolby Atmos 7.1 extended edition of <movie name> on our server. What's the point in experiencing anything less than the perfect experience? You either get the full shebang, or a spinning loading indicator."

Interestingly I have found the various Prime Video apps to consistently be the worst in terms of UI and performance. I imagine Amazon as a company also takes engineering and infrastructure seriously, but the results don't show it.

HBO, Disney, Hulu, Apple etc. are all somewhere in the middle, and yes Netflix is FAR ahead of the pack.

I think Amazon take it less seriously than some of the others. For Netflix, streaming is their entire business. For Disney, HBO and Hulu, it’s the future - they’d probably rather stick with the old school cable TV, movie theatres, etc. model, but they can see the writing on the wall, and they aren’t going to let streaming take them down. But for Amazon ... it’s more just a perk to package with Prime shipping. They want it to be solid, but it’s always going to be very small for them compared to e-commerce, AWS and their platform for 3rd party sellers, fulfilment, etc. Prime Video revenue is similar for them to their co-branded credit cards, I think?

I actually think they have some great content, and a pretty decent app. Their “originals” especially are pretty great - Fleabag, the Boys, the Expanse, all excellent. But yeah, the quality of the app is a bit lower than some of the others, and they have a lot less content than Netflix, because Prime is really not a main focus of their company, more of a nice add-on.

Funny you mention this: my "smart" TV (Vizio) had a recent software update that made it so opening Prime Video crashes the whole smart-OS (no home menu and no apps)... with the exception of Netflix's app, which works flawlessly.

I assume Netflix ships a much lower-level integration that doesn't rely on nearly as many of the TV OS's APIs and can therefore launch even when Prime has decided to bork the whole system.

I agree with this (Haven't tried HBO though).

Disney streaming is pretty good, but the UI isn't as good as Netflix for use across the room. Why do they make the title cards have second level text that is so small it can't be read, and then not have the continually updated preview that Netflix does?

I agree Amazon is pretty disappointing. It seems more like a dodgy pirate video player than a major competitor.

I can't even use the Prime Video apps, because they all ignore all language settings (both those of my devices and my Amazon account) and pick which language to use based on IP Geolocation.

There's a setting for language on the web, but it only affects web.

In my case I'm a foreigner who is still learning the local language, but it must be a real nuisance for people in multilingual countries like Belgium or Switzerland where I guess it would just go "oh, Swiss IP, I guess you want German then".

My experience with Disney has been as good or better than Netflix. Hulu, OTOH, is a disaster. I have a grandfathered Spotify/Hulu plan that basically makes Hulu free so I keep it around but it's my option of last resort.

Maybe I shouldn't be surprised, but Prime Video's Chromecast integration is just terrible.

Still better than actually using a Fire Stick directly. At least the UI latency on the phone is bearable. My, admittedly old, Fire Sticks have input latency measurable in seconds.

The firestick remote replacement app (because I can't ever find the actual remote) is also a joke: it disconnects all the time, fails to reconnect and it is generally not very usable.

Because it is rare that they go above 1080p. In Chrome on a Mac, they max out at 720p, which is way below industry standard. And the worst thing is there is no option to set the quality, even if I am ready to handle some pauses in the video when my network speed can't keep up (which it can; they just err on the safer side).

> 720p max, which is way below industry standard

maybe low for western standards, but the majority of devices worldwide cannot display higher resolutions.

heck, even new budget phones today rarely exceed 720p.

anecdote: I worked at a cinema and we cast 720p onto the big screen (10 meters wide) because people would not notice a difference with anything higher. overall bitrate and quality of encoding are mostly the deciding factors.

My favorite unpopular opinion to share (I work in video) is that 99% of viewers don't care about resolution above 720p. Other encoding and playback characteristics are just way more important if you aren't building for videophiles.

Netflix limiting mac chrome playback to 720p is case in point if true (though I've never heard this). What % of their "high-end" users is this? My macbook has more than five times as many pixels than 720p... I've never noticed this, and Netflix surely recognizes that they don't stand to gain much by streaming higher resolution to most users.

Netflix, of course, does build for videophiles, but I still hold my conviction to the extent that I constantly question the priority of work to make something 1080p or 4K.

I will certainly concede that VR use cases need higher than 720p resolution.

That's a separate DRM issue with Widevine. Use Safari if you want 1080p (or 4K if you have a T2 chip inside).

IMO content is far more important than UX and Netflix has gotten pretty bad. I don't care about all the crappy Netflix originals.

I expected something about TCP_NODELAY and Nagle's Algorithm, a recurring theme on HN [1][2]. I'm left surprised, but I also can't shake the idea that that arbitrary sleep might just be tailored after TCP.

[1] https://news.ycombinator.com/item?id=24785405

[2] https://news.ycombinator.com/item?id=25133127

The first thing I did after opening the article was: CTRL-F "nagle"

200ms for that one.

> Next I started reading the Ninja source code. I wanted to find the precise code that delivers the audio data. I recognized a lot, but I started to lose the plot in the playback code and I needed help.

> I walked upstairs and found the engineer who wrote the audio and video pipeline in Ninja, and he gave me a guided tour of the code.

THERE IS NO SUBSTITUTE FOR DEVELOPER CONTINUITY. Yes, you want your code as good as possible, your docs as good as possible, your knowledge transfer as good as possible. But there is ultimately no substitute!

Domain knowledge matters, and takes years to develop. Familiarity with the codebase matters, and in a complex and changing codebase is a process that is never complete. Having access to the people who wrote the things in the first place matters.

These facts are often neglected in a world where developers are treated like swappable commodities, and it's considered normal to job hop frequently.

Organizations that treat developers well and give them paths of advancement/reward within the organization to encourage developers to stick around for a while will reap benefits.

Good observation. I've experienced the consequences of turnover on my current team. The product was mostly built 5 years ago, but no one on my team has been here more than 1.5 years. Getting even basic things done requires extensive codebase archaeology. It sucks.

I don't necessarily think you need the people who wrote the code originally though. As long as there's enough overlap between generations, and deliberate effort to train people on different parts of the system, knowledge can be preserved. That's probably not possible if everyone changes teams after 2 years though, except for relatively simple or highly standardized codebases.

I appreciate the humbleness of this writeup. The dev asked others for help explaining the code, and eventually the bug was solved by someone on another team completely! And there's no shame in that (nor should there be), but my pride or fear of inadequacy might have made me feel otherwise.

Same. I realized after reading this that I do not ask for help enough

They didn't say how they fixed it. I doubt the vendor updated the Android version. Did Netflix make a change to ensure they're always making the thread in the foreground?

My impression is that the bug workaround was for the integrator to make sure the application was in the foreground before calling the Ninja handler

> The integrator back-ported the patch (linked above) to L. It is literally only a few lines so this was straightforward. Some of the hardest problems take 1 or 2 lines to fix :)

yeah I also hit this impression. despite all the investigation and detail it seems in the end it was an exercise in hand washing and blame shifting.

I have struggled a lot with video stream decoding on Android. Every chip manufacturer does things differently. Some have an 11-frame buffer, some a 6-frame buffer, some will pipe data out after only a few frames. Leave aside the tens of different YUV formats, from diagonally interlaced streams to split-byte ones, if you want to work with the output directly rather than letting the decoder render to a surface (to avoid the latency of reading back from the surface). Back then I ended up doing a lot of crazy things like piping "empty" NAL frames to the decoder to force a flush, among other things. If you really cared about low-latency specific streams, it was very difficult to work with the fragmentation of the ecosystem.

That was a few years ago; the versions from around 2016 on (after API 23) became more stable and relied less on OEM-specific magic.

Can definitely concur with this; decoding on each compatible phone for Gear VR was a nightmare, not to mention that decoding to a buffer instead of a texture surface was noticeably faster. My code still has enums of magic numbers for the S5's and Note 3's weird YUV formats.

Interesting. Oculus team at FB has been playing with frame timings etc and just released 'Phase Sync' in Quest HMDs (Android) [1] so even though it's a highly customized ROM build, it is likely they face the same type of problems.

[1] https://uploadvr.com/facebook-phase-sync-quest-latency/

A little OT: often when chasing bugs, the diagnosis represents 99.99% of the work, and the fix is completely trivial.

I recently spent over a week trying to find why a Windows PC was acting erratically in very specific circumstances and seemed to work fine the rest of the time. It was (here also) a problem with timers. Once the problem was identified, correcting it took 15 seconds.

I sometimes worry that in healthcare it's the other way around: doctors do an immediate diagnosis and then months (or years) are spent trying to fix what they think needs fixing. Isn't it possible that we are doing it wrong and are not actually getting to the bottom of things?

The reason we can make medical diagnoses so efficient is because we have had the collective experience, and that the human body does not vary radically from one person to another such that it would be hard to keep track. While a doctor can’t be expected to know every intricacy of the human body, it has been catalogued and studied extensively over thousands of years, and this information has been made available for quick reference. Between experience and the rigorous education, doctors will know a significant amount of detail such that they can diagnose (or confidently know how to delegate this) fairly quickly. It’s not perfect, of course, and you are right that sometimes the symptoms are fixed rather than the causes.

Compare, however, to software: unlike the human body which has numerous small variations but a common base, software is free to take many different forms and frameworks. This software continues to change over time, so while documentation certainly can exist, it can be immensely fragmented and describe complex behaviours. Plus, when software developers fix an issue it often tends to stay fixed; there is a use for a postmortem but no immediate necessity. But in people, this has to be diagnosed every time it is encountered, so it is natural that documentation will occur.

(This isn’t the perfect metaphor, of course, but I hope it roughly corresponds.)

> This story really exemplifies an aspect of my job I love: I can’t predict all of the issues that our partners will throw at me, and I know that to fix them I have to understand multiple systems
Reading things like this makes me realize I've got to get out of this business at some point.

That is not a part of the job that I love. That is a part of the job that makes me want to bash my brains out with a rock.

I actually do sort of like that kind of problem-solving, in the abstract, buttttttt in reality I usually need to do it under pressure for unappreciative clients/management who just know that things are taking longer than expected. This is of course a communication challenge, and I am a good communicator in general, but... I'm kind of burned out on it.

I bet that he really meant "This story really exemplifies an aspect of my job I hate" and won't miss at a future next job. The positive spin was only there because it got on the company blog.

Hahaha. I was wondering the same thing.

Did he really, truly mean that? Was it sarcasm? Was it something PR asked him to add? Was it something PR added without asking him?

Quote from the blog:

> I walked upstairs and found the engineer who wrote the audio and video pipeline in Ninja, and he gave me a guided tour of the code.

That's what I miss working from home. Random, spontaneous, productive interactions with coworkers.

While it's not quite the same, remote has the benefit that you can do that with people not working in the same building too. (e.g. where I work spontaneous screen share sessions are totally normal)

This bug write-up reads like a suspense novel. I cheated and jumped to the ending.

It's finding bugs like these that give you an immense sense of satisfaction for discovering it once you get through all the frustration.

Well done!

Ugh, if only other companies cared that much :-(

Amazon Prime app on LG TVs has had lip sync issues for at least 4 generations of their TVs, problem has been reported both to LG and samsung, and no one cares enough to fix it. When using the app on my LG CX the audio is about 500ms behind, it's an insane amount of missing audio sync and no one cares to fix it? No other apps do that, it's specifically amazon prime, but I guess it doesn't sell their Fire Sticks so hey, who cares?

The YouTube app also has horrendous lip-sync issues on LG WebOS. They don't care.

Yeah I saw, but Netflix is pretty much perfect (I think it has maybe a frame or two of audio delay, but not enough to notice most of the time), so clearly it is technically possible; these companies just don't care about sitting down and fixing it.

*LG and Amazon, no idea why I said samsung :P

Immediately searched for “Nagle”, “TCP”, etc but was surprised to see this is something else! Not the typical TCP_NODELAY issue.

> that Android Threads are a userspace construct


Does anyone have more reading on this? Sounds interesting. I know Windows 7+ x64 has a user-mode scheduler but I hadn't heard of a Linux-based OS supporting this. How does it handle blocking calls and such?

they're not a userspace construct in any meaningful way. libutils/Thread (capital-T Thread) is a thin wrapper around pthread_t, and most Android code uses pthread_t directly. what this hit is probably a bug related to timerslack_ns--there's some code to tweak timerslack values on a per-thread basis instead of a per-process basis even though audio should never be coming from processes low enough on the importance list to get a high timerslack value--but L bugs fixed in M are before my time.

The vendor was right: you’re using unreasonably short buffers for a non-interactive use case (i.e. video playback).

With such short buffers (16.6ms) and a user-space timer (i.e. basically usleep) driving the whole thing, I'm actually surprised that this didn't break earlier...

Wait, a 40ms bug that isn't due to Nagle’s algorithm? Is that even possible?

I am curious what tools were used to make the nice visualization. I lack skills in making visualizations, and do regularly run into cases similar to this one where it would help me understand or diagnose something to chart some somewhat complex data like this.

Any spreadsheet software should get you pretty far. Most languages have some graph/chart making libraries too.

My first thought when I saw the title was that it was due to Nagle's algo.

This is a really fun read - I like the core insight of graphing the log statement timings to see what's going on.

I love this. I'd like to know a bit more about how it was fixed, but the story of diagnosing and proving the problem is great.

Sadly, Medium is now more full of crap than a herd of circus elephants, with tech blogs written by people who have nothing under their belt but a code camp, but this is a diamond in the rough.

There are a few websites that spend so much on CDNs, static site generation, and other crazy techniques to shave off a few milliseconds.

But then you have the same websites cluttered with ads all over and those super annoying pop-ups, ultimately slowing down the experience.

So how did they finally fix it? Did they do the workaround of loading more than 1 frame of data? Or did the vendor upgrade to a more recent AOSP? Or did they patch the fix into their version?

Android frustrates me on many levels, but I appreciate having the source code to dig into. If the same issue happened on an iOS device, Netflix would need to do the workaround.

Interesting - I wonder if not having the source code means Core Audio and video playback on Apple devices are actually better behaved across the entire ecosystem vs Android. I've never seen a music production app I don't think on android now that I think of it.

> I've never seen a music production app I don't think on android now that I think of it.

Have you looked for any?


I am curious what caused the thread to be started in the background. Seems like a race condition of sorts. Anyway, fascinating read. I love debugging and reading about cases like this.

Huh. I wonder if this is also the reason why my unity game stuttered below 30fps even in menus on a galaxy s6 until I put it to sleep and woke it back up.

I get this exact same problem on my SHIELD TV 2017 with all latest updates. Makes me think I should be reporting it somewhere.

Saw 40ms and was immediately expecting classic Nagle's problems. For those with the same suspicion: this isn't that.

At least tell us how you fixed it!

"The integrator and Netflix had already completed the rigorous Netflix certification process..."

That part confused me. Netflix completed its own certification process? Does the author mean himself, as "Netflix"?

My money was on: someone ported NT thread priority inversion to Linux.

It's a scheduler bug (exposed by a race), but not due to priority inversion, as it wasn't waiting on anything; it was just being treated as a low-priority task.

Android itself just isn’t very suited to real time tasks.

If it's running on a TV, it had better be. At that point it's not a plucky OS running on underpowered hardware, trying to save battery.

Man, Android is just not a well-designed OS. This is silly.
