If you're doing real-time audio, your processing loop needs to be directly driven by audio hardware buffer events/interrupts (through as many abstraction layers as you want, but a traceable event chain nonetheless; no sleeping guesswork or timers). Ideally all those threads should be marked real-time, never allocate memory, and never lock or block on anything else. This is what serious real-time audio systems like JACK do. And if you're not doing real-time audio, you need to buffer way more than one frame's worth of audio - at least 100ms, preferably much more. Not just for robustness, but also for battery life.
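To make the buffering point concrete, here's a toy simulation (not any real audio API; all numbers are illustrative) of how much scheduling jitter a playback buffer can absorb before it underruns:

```python
def underruns(buffer_ms, wakeup_intervals_ms, refill_ms):
    """Simulate a playback buffer drained in real time and topped up
    with `refill_ms` of audio at each (possibly late) wakeup."""
    level = buffer_ms  # start with a full buffer
    for interval in wakeup_intervals_ms:
        level -= interval  # hardware keeps draining in real time
        if level < 0:
            return True    # buffer ran dry: audible glitch
        level = min(buffer_ms, level + refill_ms)
    return False

# One frame (~16.7 ms) of headroom dies on a single 40 ms stall...
assert underruns(16.7, [15, 40, 15], refill_ms=16.7)
# ...while 100 ms of buffered audio shrugs it off.
assert not underruns(100, [15, 40, 15], refill_ms=16.7)
```

The deeper buffer also lets the CPU sleep longer between wakeups, which is where the battery-life point comes from.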
Sure, there was an Android bug here, but the way Netflix designed this is fragile and, as someone who has messed with audio/video programming enough, wrong. Had they done things properly, they would've been insulated from this OS bug.
This kind of bug is best used as a learning experience: what happened here consistently, completely destroying playback on one device, is something that has always happened sporadically, hurting all of your users whenever something caused the CPU to stall or slow down long enough to miss the deadline. Instead of just fixing the bug, make your software robust against these cases, and then all your users win.
I work on an app which relies on fairly low latency real-time audio (triggered by a MIDI keyboard) on both iOS and Android and have had more than my fair share of headaches with Android audio, but have to say that the Oboe audio API in the newer versions of Android has made things quite a lot better. We use JUCE as an abstraction layer over the audio APIs.
> And really doesn't have the ability to mark threads as real-time that will stick.
Yeah, we found this is the cause of a lot of real-time audio issues on Android. If you use systrace you can see that your audio thread is hopping from core to core. We get around this by manually setting the affinity of the audio thread to a certain core (working out which core to use sadly requires some guesswork; there's meant to be a way to ask for the "best" core, but it doesn't really seem to work).
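For illustration, the same idea in Python on Linux (a sketch; real Android code would use native sched/pthread calls from the audio thread itself, and which core is "best" is still guesswork):

```python
import os
import threading

# Linux-only sketch: os.sched_setaffinity wraps sched_setaffinity(2),
# and pid 0 means "the calling thread", so calling it from inside the
# audio thread pins just that thread.
result = {}

def audio_thread_main():
    # Pick one allowed core and pin this thread to it, so the scheduler
    # stops hopping it between cores. On a real device you'd pick a big
    # core; as noted above, picking the right one takes trial and error.
    target = min(os.sched_getaffinity(0))
    os.sched_setaffinity(0, {target})
    result["affinity"] = os.sched_getaffinity(0)
    # ... the real-time audio loop would run here ...

t = threading.Thread(target=audio_thread_main)
t.start()
t.join()
```

After this runs, the audio thread's affinity mask contains exactly one core, so systrace should show it staying put.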
For many things, they both work well in general (Windows is better for "document stuff" and file management, Mac a little better in window management etc., but comparable in general).
But when it comes to media stuff (think: cropping a lossless audio file and compressing the result into an AAC or MP3, resizing 20 images, transcoding a movie from one format to another, removing one page from a PDF), on Mac it can usually be done with built-in tools. On Windows you often need either a prohibitively expensive software suite, or you go on an ad-infested crapware hunt that results in a borderline unusable tool.
Can you elaborate on this? I've been using a Mac for work for a couple of years now and I find the out-of-the-box window management experience pretty bad compared to Windows. No snapping to sides or corners, no quick resize to standard sizes other than maximum, and most importantly no easy way to change between multiple windows of the same program using keyboard shortcuts. I've been using Magnet and Contexts to improve things, but it's a shame I have to use paid third party apps for basic functionality like this.
But, everyone has a different workflow. I can imagine that if you're a heavier shortcuts user, Windows works better.
Spotlight is great and puts Windows 10's search to shame, although it feels like it has gotten worse in the past few years.
I picked it because it was closest to the window-snapping / sizing bindings I had in XFCE years ago, but it basically makes OSX window management no longer frustrating.
Try also using “Move focus to active or next window”. I have that mapped to Option-Command-` (and implicitly Option-Shift-Command-` for reverse order).
That's fine by me to be honest—I paid for it once and it's done everything I need it to ever since.
CMD + ` (backtick)
I wanted to download a program so that I could control my fans a bit better. Before I knew it, it took over my browser and would redirect me to other sites, couldn't remove the program either. Norton didn't do its job, and I didn't do mine because I probably unchecked one box too few during the installation process. The only way to fix this was to wipe the disk completely.
I've never had this experience on Mac or Linux.
Executing strange binary files from untrusted sources could ruin your day on Mac and Linux as well. It is only less common because market share is so small compared to Windows, and as such these systems are less of a target.
I know what I do, I generally try to find user reviews, but someone had to install it first.
I personally just uninstall Malwarebytes and Bitdefender after I clean up, make sure Windows Defender's up and running, and I'm ready to download all kinds of shit off the internet again.
I think the best way to find niche software on windows is to search for open source alternatives.
Making a system restore point from time to time can help greatly on Windows too! (Make sure you change the setting so that it doesn't delete your old restore points due to low disk space, making the whole thing useless.)
I would not assume the malware is actually purged, and unpacking a backup archive and running a script to reinstall your software is not so troublesome (assuming you have backups and a script to automate installing your programs, but if you're on HN you probably do).
I use Clonezilla for backups.
Which tools are these? On Mac I ended up using Audacity for cropping audio files, ffmpeg (cli) to compress it, and used either ffmpeg or Handbrake for transcoding. I don't remember what I used for pdf page split/combine, but I do remember that it was a pain.
Honest question - as I still use Mac occasionally, it'd be great to know what are those builtin tools so I can actually use the features the OS has.
PDF page manipulation is the easiest thing on macOS, also quite robust usually. Just open the document in Preview and open the page sidebar. You can reorder or remove pages there; or drag and drop pages from other documents.
Quicktime Pro could do all of this very well. Quicktime X, however? It can trim, but the UI is bad and the encoding options are so limited I stopped using it altogether.
I have yet to discover a built-in set of tools on either platform. I use 3rd party tools to accomplish my daily tasks on both.
Apart from that, more advanced 1st party apps for editing (for certain values of advanced) are available free - iMovie for video, GarageBand for audio, Pages for documents, etc.
I need to echo the sibling comment. I was overwhelmed with how bad Mac window management is in a multi-monitor setup. This is compounded by Apple's apparent hatred of keyboard shortcuts and power users. Speaking to my longtime Mac-loving coworkers, they displayed Sapir-Whorf-like inability to understand why the missing features/behaviors are even a problem.
In the Covid/WFH world, I'm happy to let the 16" MBP collect dust while I use computers that adapt to me, rather than forcefully adapting myself to one particular computer's eccentricities.
It didn't use to be this way in the QuickTime Pro era, but since QuickTime X you can't just work with any old video files.
For basic audio/video stuff on Windows, try http://avidemux.sourceforge.net/
If you're willing to spend the money, Adobe Media Encoder works perfectly fine on Windows as well.
See this blogpost (from 2018) that seems to confirm my recollections:
Seems to be more of an issue with manufacturers than a specific issue with Android, as Google's own phones get low latencies.
The issue with other manufacturers is that they concentrate a lot more on features that are visible to users whereas features like these aren't necessarily visible directly to the user and aren't given attention even though they're possible to implement.
Isn't that the case in your experience?
1) The audio and video playback is driven by the hardware, using the audio stream as the master clock.
2) Ninja does fill the Android and hardware buffers and keep them full. In the pathological case, the hardware and Android buffers had emptied.
3) When the Android buffer isn't full, the thread yields to Android (to be a good citizen) but asks to be invoked again right away. The 40ms "background thread" delay broke this behavior. The comment about "changing this behavior involved deeper changes than I was prepared to make" was from when I explored changing this behavior (copying multiple samples per invocation) and decided it was more likely to introduce more bugs.
But why did the problem only appear on this particular TV and not any other using the same version of Android? Was this the only TV to use Lollipop?
So yes, hundreds of devices on a variety of networks, but at any given moment most of the devices are in storage.
I still think the blocksize should've been larger for various reasons in this case (it's a trade-off still, usually larger block sizes are mildly more CPU-efficient, besides preventing pathological scheduling cases like this one) but this explanation makes more sense.
Why not? Using timers is reportedly how Windows, CoreAudio, and PulseAudio all work under the hood, and on Windows and in PulseAudio it replaced the previous interrupt-based implementations. On the app end of the APIs, Windows' WASAPI code example uses Sleep polling, PulseAudio's write callback is optional and VLC doesn't use it, foobar2000 has a polling-based output mode, Windows has specific APIs for audio thread scheduling, etc.
Is this a specific deficiency of Android?
1) events/interrupts can be delayed (usually by (bad) drivers disabling interrupts for a long time).
2) latency is not adjustable. You may want to give your processing more time if you know more work needs to be done, or the system is under variable load.
3) you will usually finish generating your audio too early. E.g.: if the interrupt fires every 2ms but your processing takes 1ms, you are adding an unnecessary 1ms of latency by starting as soon as the interrupt is received.
4) the device will consume more power, as it cannot know when it will next wake up (theoretically; I do not think it makes a difference in practice).
Modern mobile devices use a DMA engine to copy audio from the application processor (AP) to the audio DSP. This DMA periodically copies the contents of an AP buffer to a DSP buffer; this is called a DMA burst.
You want to track this burst and wake up just enough time before it to generate your audio data and write it to the AP buffer, plus a safety margin.
This allows you to track the system performance and application load to adjust your safety margin and optimize latency. It also allows the scheduler to know far in advance when your app will need to be woken up.
The Android AAudio API implements what I just described, as well as memory-mapping the AP buffer into the application process to achieve zero-copy. It is the way to achieve the lowest latency on Android.
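A rough sketch of that just-in-time wakeup (all names and numbers here are illustrative, not the real AAudio internals):

```python
import time

BURST_PERIOD = 0.004  # a DMA burst every 4 ms (illustrative)
RENDER_TIME = 0.001   # measured time to generate one burst of audio
SAFETY = 0.0005       # margin, tuned from observed scheduling jitter

def render_and_write_burst():
    pass  # generate audio and write it into the (memory-mapped) AP buffer

def run(num_bursts):
    start = time.monotonic()
    for n in range(num_bursts):
        # Bursts happen on a fixed grid relative to stream start.
        deadline = start + (n + 1) * BURST_PERIOD
        # Sleep until just before the burst instead of rendering as
        # early as possible: lower latency, and the scheduler knows far
        # in advance when this thread will need the CPU.
        wake_at = deadline - RENDER_TIME - SAFETY
        time.sleep(max(0.0, wake_at - time.monotonic()))
        render_and_write_burst()  # must complete before `deadline`

run(5)
```

The safety margin is the tunable: widen it when you observe missed deadlines, shrink it to chase latency.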
I believe Apple's low-latency APIs use a similar no-interrupt design.
Source: worked 3 years on the Android audio framework at Google.
Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer.
The previous options are either SLES or AudioDriver, both of which are bad in their own ways.
My comment was about the parent's affirmation that low latency apps should be scheduled by interrupt.
This same jousting over technical windmills happens in other userspace elements of Linux. Linus, for better or worse, keeps a tight rein on what's allowed in the kernel, but when it comes to audio, window management, display, UX, everything is a shitshow of mixed metaphors and dysfunctional interoperability.
Wait... That's not how that works.
Surely they'd have picked this up if it were that straight forward?
Having worked at several firms with top-tier employee compensation, I can wholeheartedly assure you that the best pay does not result in the best developers (or any other job function).
It does generally result in a higher “floor” of employee, but that’s about it.
If you spend your time learning about the details of audio playback what kind of additional time do you have to learn leetcode?
- Unacceptably high latency for any sort of seek operation - even if it's just going back a few seconds to what you just watched and ought to still be somewhere in a local buffer.
- Trying to seek multiple times sometimes forces you to wait for the previous operation to finish instead of "accumulating" your taps to determine a new time target
- Grabbing the playback marker and dragging it left and right is very imprecise, it should converge into "logarithmic" behaviour for fine movements
- Flakey Chromecast button. More than half the time I have to kill the app and relaunch it to get the cast button to show up.
- Occasionally flakey UI layout (random bugs which present when moving between screens or searching, or flickering due to layout engine doing multiple repaints over the same controls)
- Likes to plaster up things I don't want. They seem to prioritize "discovery" over "user intent", which would work if only they surfaced articles I'm interested in (which is rare). When I just want to find a "comedy" or get to the series I'm watching, it sometimes feels like I'm fighting against the UI
- Constantly moving my cheese - it's like the UI designers have ADD and can't commit
Amazon Prime is just as bad (maybe worse). I wish these companies didn't control the vertical - i.e. just provide content, and let other vendors make the client software.
Unacceptable for what use-case? I had assumed this was somewhat intentional on Netflix's part, or at least not a priority for them.
You seek back a scene or two while watching something when your mind has wandered away and you missed what was going on a few seconds ago. For some non-neurotypical people it's a really common thing to do while watching a longer movie as it may be hard to follow what's going on otherwise.
If your application is timing-critical, and you aren't using things specifically designed for time-critical applications (like an RTOS), then you should be doing as much low-level implementation yourself as possible.
Author addressed this point and agrees with it:
> Why don’t you just copy more data each time the handler is called? This was a fair criticism, but changing this behavior involved deeper changes than I was prepared to make
In general, "real time audio" usually refers to low-latency audio, which uninteractive video playback is not.
That does answer the issue of why it was implemented like that in the first place
Is this realistic? I don't know much about providing software for a wide range of hardware, but I would imagine this would require either Netflix or the integrator to write drivers for Netflix to communicate with the hardware DSP (or to standardize one), right? It seems more feasible to piggyback on a platform that has already done that and is already widely deployed.
The system was built to fill the (hardware) buffers with 15ms worth of audio. The problem is that the handler, scheduled on a 15ms interval, was not invoked in time (sometimes it took 40ms). Some suggested that Netflix could, in that case, fill 40ms of audio data into the hardware, but they didn't want to because it required low-level modifications.
The point is that there was never any good reason to make the Netflix app so timing-dependent. Had Netflix not done this they wouldn't have even noticed.
That said, if a company was releasing an Android 5.0 powered device in late 2017 when Android 8.0 had already been released I can't help but say they brought this on themselves by being idiotically out of date. There's no excuse for launching a device with an OS three years and three versions out of date.
Everybody in this story is doing really dumb things.
So the real question should be, does Netflix have a valid reason for only delivering a single frame of video and audio at a time? Is it bad software design, or good software design for an unknown problem (to us)?
I've seen plenty of things that looked stupid out of context, then made sense later on after understanding why the "stupid" thing was done in the first place.
Other thought would be, whether the architecture was somehow related to how one had to do things in Silverlight (any silverlight devs still remember how one would do this sort of thing? Any Silverlight devs still around?)
Netflix can and should support their software on as old of OS releases as makes sense for them, but a hardware vendor introducing a new device to market on an outdated OS is inexcusable. Android 5.x received its final update in early 2015, over two full years before this mystery TV box was to be released. There is no good reason it couldn't have been on Android 7 or 8.
(A good kernel can account for the time it takes to wake from idle and compensate when programming a timer (as could a good CPU, but I’ve never heard of hardware that does this) and can wake a user program a bit early if the CPU is otherwise idle and would take a short nap. A sound hardware interrupt can’t do these tricks.)
(1) It is absolutely not true that the system clock will track the audio clock "quite well". Real world numbers would typically involve drift measurable in seconds within a low integer number of hours.
(2) It absolutely is true that on most sensible audio interface designs, you don't need interrupts except early after device open. At that point, you use them to set up a DLL (Delay Locked Loop) that will enable you to use the system clock to know where the audio hardware is reading/writing to in the buffer used to move data to/from the hardware. Once the DLL is correctly configured, you can just be woken by the system timer, determine the current audio hardware state and read/write data appropriately.
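A toy version of that DLL idea (illustrative, not any particular driver's code): observe a few (system time, hardware frame position) pairs, filter the rate estimate, then predict the hardware position from the system clock alone.

```python
class DelayLockedLoop:
    """Toy DLL: learn the audio hardware's actual sample rate from a
    few (system time, hardware frame position) observations, then
    predict the read pointer from the system clock alone."""

    def __init__(self, sample_rate_guess):
        self.rate = sample_rate_guess  # frames/sec estimate
        self.t0 = None                 # reference system time
        self.p0 = None                 # frame position at t0

    def observe(self, sys_time, hw_frames, gain=0.5):
        if self.t0 is None:
            self.t0, self.p0 = sys_time, hw_frames
            return
        # First-order loop filter: nudge the rate toward the error.
        error = hw_frames - self.predict(sys_time)
        self.rate += gain * error / (sys_time - self.t0)

    def predict(self, sys_time):
        return self.p0 + self.rate * (sys_time - self.t0)

# Hardware actually runs at 48,017 Hz; our initial guess is 48,000 Hz.
dll = DelayLockedLoop(sample_rate_guess=48_000.0)
for t in [0.0, 0.1, 0.2, 0.3, 0.4]:
    dll.observe(t, 48_017.0 * t)

# After a few observations the prediction tracks the real clock closely,
# so no further interrupts are needed to know where the hardware is.
assert abs(dll.predict(1.0) - 48_017.0) < 5.0
```

Real implementations use more careful loop filters and clamp the rate, but the shape is the same: interrupts to seed the loop, timers thereafter.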
Relatively speaking, maybe. But I have a device I've designed that communicates with a PC using interrupt transfers at a rate of 250 requests/s. I've watched it on an oscilloscope and it is stable as a rock. Sure, if you overload the PC it might get into trouble, but it is a game-like application, so when the app is running the PC is not doing much else. I recently ported the same thing to a Raspberry Pi 4.
x86 interrupts are slow but at least reasonably consistent on a non-idle system. If your system is deeply enough into its various idle states, this is not at all true any more. I’ve seen latencies over 10ms on Sandy Bridge with C1E enabled.
I honestly had a tough time parsing the sentence when you said that due to the triple negative, so it's possible that others might have misunderstood it.
I don't understand this criticism, but I am likely missing something.
From what I gathered from the article, Ninja is specifically for firmware. It buffers frames, and delivers those frames whenever requested.
> your processing loop needs to be directly driven by audio hardware buffer events/interrupts
Again, I could be missing the thrust of the criticism, but isn't that exactly what's happening, here? Ninja supplies an endpoint, and it's the hardware that calls it?
Going by the article, they use a plain Android thread that runs at fixed 15ms intervals and buffers exactly one frame in advance. For 30 fps this is more often than needed (yay battery life) and for 60 fps it just asks for trouble when your OS doesn't guarantee real-time behavior.
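The arithmetic behind that, using the article's 15 ms timer (a small sketch; the tolerance figures are mine, not from the article):

```python
timer_ms = 15.0                # the article's fixed handler interval
frame_30fps_ms = 1000.0 / 30   # ~33.3 ms of audio per frame
frame_60fps_ms = 1000.0 / 60   # ~16.7 ms per frame

# At 30 fps the 15 ms timer fires more than twice per frame:
# wasted wakeups, hence the battery-life complaint.
wakeups_per_frame_30 = frame_30fps_ms / timer_ms
assert wakeups_per_frame_30 > 2

# At 60 fps, buffering one frame leaves under 2 ms of slack per cycle...
slack_60 = frame_60fps_ms - timer_ms
assert slack_60 < 2.0

# ...so the 40 ms "background thread" delay guarantees an underrun.
assert 40.0 > frame_60fps_ms + slack_60
```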
> Ninja supplies an endpoint, and it's the hardware that calls it?
As far as I understand it provides a cross platform endpoint, called by a platform specific wrapper. In the Android case this was a background worker and Android considered background workers not real time critical.
> They use a plain android thread that runs at fixed 15ms intervals and buffers exactly one frame in advance
From the article: "...buffers several seconds worth of video and audio data on the device, then delivers video and audio frames one-at-a-time to the device’s playback hardware"
And: "under normal playback the thread copies one frame of data into the Android playback API, then tells the thread scheduler to wait 15 ms and invoke the handler again"
And: "When you create an Android thread, you can request that the thread be run repeatedly, as if in a loop, but it is the Android Thread scheduler that calls the handler, not your own application."
But from the OP: "...you need to be capable of buffer filling way more than one frame's worth of audio - at least 100ms, preferably much more. Not just for robustness, but also for battery life."
I suspect there is a misapprehension here. "...under normal playback the thread copies one frame of data into the Android playback API, then tells the thread scheduler to wait 15 ms and invoke the handler again". I suspect that one can call that handler more often than every 15ms to fill the playback buffer, and that in normal playback one then adds frames one at a time to keep the buffer filled and no faster, but on a quick reading it sounds like what the OP stated.
A better approach, if the micro you are using allows it, is to have an interrupt when (for example) the I2S buffer is empty. I would then point the DMA to fetch the next buffer (already processed and mixed) and fire the DMA transfer.
> at least 100ms
This calls for huge latency problems. But I understand the approach if your buffer-filling/reading procedures are slow or unreliable.
I disagree with the timer thing. There are systems that provide precise timers for media (for example Win32 multimedia timers, that I never used but I know they exist).
It doesn't mean there is 100ms of latency; it just means that 100ms of audio is buffered, so that you have ~100ms of leeway about when your app's audio thread is scheduled. Changes to the audio stream, such as stop/start/volume control, can be achieved with much lower latency by rewriting the buffer or by applying changes lower down the stack where the buffers are smaller, or both. By default PulseAudio will buffer ~2000ms of audio from clients.
Anything above 10ms (some say max. 20ms) for real-time audio processing (especially musical instruments) is prohibitive. Imagine an electronic drum set: if you hit the snare and the audio is output 100ms later from the speakers, I bet you'll notice it :)
You can still have 100ms buffer without 100ms latency: within 10ms of the drum being hit, write 100ms of the drum sample into the playback buffer and immediately trigger its playback (or write it into the buffer starting at the cursor position that is just about to be played).
The only trouble is when you need to modify some of that 100ms before it is played back, for example if the user hits another drum 50ms later. In that case it becomes more complex, you'd have to overwrite some of the existing buffer with a new mix of both drum samples. The complexity is not worth it for that kind of app.
For a simple app like a video player, the audio stream is much more predictable, so you can buffer more. Volume changes and pausing can still be applied with no perceivable latency by modifying the existing buffered data.
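A toy sketch of that rewrite trick (illustrative only; a real implementation would write into the audio API's ring buffer relative to its play cursor):

```python
# Deep buffer, low perceived latency: keep ~100 ms of audio queued ahead
# of the play cursor, but mix new events into the not-yet-played region
# as soon as they happen. All numbers are illustrative.
RATE = 48_000                    # frames per second
buffer = [0.0] * (RATE // 10)    # 100 ms of queued-up samples
play_cursor = 0                  # index the hardware will read next

def mix_at(offset_frames, sample):
    """Mix `sample` into the unplayed region, `offset_frames` past the
    play cursor, so its latency is that offset, not the buffer depth."""
    for i, s in enumerate(sample):
        j = play_cursor + offset_frames + i
        if j < len(buffer):
            buffer[j] += s  # mix with whatever was already queued

drum = [1.0] * (RATE // 100)     # a 10 ms "drum hit" burst

mix_at(0, drum)                  # first hit starts right at the cursor
mix_at(50 * RATE // 1000, drum)  # second hit lands 50 ms in: rewrite
                                 # the queued region instead of waiting

assert buffer[0] == 1.0                  # first hit queued immediately
assert buffer[50 * RATE // 1000] == 1.0  # second hit mixed mid-buffer
assert buffer[30 * RATE // 1000] == 0.0  # silence between the two hits
```

As the comments above say, this overwrite-the-queue complexity is what a drum app would need; a video player gets away with a deep buffer and occasional rewrites.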
You can do that if you can predict the future and fill the buffer with data you don't know (samples from the future). Otherwise, you still have to wait for 100ms of samples to output. So if you have to wait 100ms for the samples, then the output happens after 100ms, hence 100ms of latency.
100ms samples can be fine for a video player. For real-time you (usually) do this: 2 buffers of 10ms each. While one buffer is playing, you fill another buffer with real-time data. After 10ms has passed, you start playing the buffer with real-time data, while the other buffer gets filled.
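A minimal sketch of that ping-pong scheme (illustrative; real code would hand buffers to a DAC/driver rather than a Python list):

```python
# Ping-pong (double) buffering: two 10 ms buffers at 48 kHz mono, 8-bit,
# purely illustrative. While one buffer plays, the other is filled.
buffers = [bytearray(480), bytearray(480)]  # 480 frames = 10 ms each
played = []

def render_into(buf):
    buf[:] = b"\x01" * len(buf)  # stand-in for real-time synthesis

def play(buf):
    played.append(bytes(buf))    # stand-in for handing buf to the DAC

for cycle in range(4):
    playing = buffers[cycle % 2]
    filling = buffers[(cycle + 1) % 2]
    play(playing)         # hardware consumes this buffer for 10 ms...
    render_into(filling)  # ...while we generate the next 10 ms

# Worst case, a sample rendered at the start of a fill waits one full
# buffer period before it starts playing: ~10-20 ms total latency here.
assert played[1] == b"\x01" * 480
```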
> Otherwise, you still have to wait for 100ms of samples to output
No, you don’t, not if you can write into the buffer at an arbitrary point. This is what the whole page I linked is about.
We have considered ditching Netflix (content is meh for this household) but after using one of the other apps we frequently bail and find something on NF simply because the overall experience is better.
Apple on the other hand, seems to think: "we have a glorious 4K 60 fps 30 Mbit Dolby Atmos 7.1 extended edition of <movie name> on our server. What's the point in experiencing anything less than the perfect experience? You either get the full shebang, or a spinning loading indicator."
HBO, Disney, Hulu, Apple etc. are all somewhere in the middle, and yes Netflix is FAR ahead of the pack.
I actually think they have some great content, and a pretty decent app. Their “originals” especially are pretty great - Fleabag, the Boys, the Expanse, all excellent. But yeah, the quality of the app is a bit lower than some of the others, and they have a lot less content than Netflix, because Prime is really not a main focus of their company, more of a nice add-on.
I assume Netflix ships a much lower-level integration that doesn't rely on nearly as many of the TV OS's APIs and can therefore launch even when Prime has decided to bork the whole system.
Disney streaming is pretty good, but the UI isn't as good as Netflix for use across the room. Why do they make the title cards have second level text that is so small it can't be read, and then not have the continually updated preview that Netflix does?
I agree Amazon is pretty disappointing. It seems more like a dodgy pirate video player than a major competitor.
There's a setting for language on the web, but it only affects web.
In my case I'm a foreigner who is still learning the local language, but it must be a real nuisance for people in multilingual countries like Belgium or Switzerland where I guess it would just go "oh, Swiss IP, I guess you want German then".
The firestick remote replacement app (because I can't ever find the actual remote) is also a joke: it disconnects all the time, fails to reconnect and it is generally not very usable.
Maybe low by western standards, but the majority of devices worldwide cannot display higher resolutions.
Heck, even new budget phones today rarely exceed 720p.
Anecdote: I worked at a cinema and we projected 720p onto the big screen (10 meters wide) because people would not notice a difference with anything higher. Overall bitrate and quality of encoding are mostly the deciding factors.
Netflix limiting Mac Chrome playback to 720p is a case in point, if true (though I've never heard this). What % of their "high-end" users is this? My MacBook has more than five times as many pixels as 720p... I've never noticed this, and Netflix surely recognizes that they don't stand to gain much by streaming higher resolution to most users.
Netflix, of course, does build for videophiles, but I still hold my conviction to the extent that I constantly question the priority of work to make something 1080p or 4K.
I will certainly concede that VR use cases need higher than 720p resolution.
> I walked upstairs and found the engineer who wrote the audio and video pipeline in Ninja, and he gave me a guided tour of the code.
THERE IS NO SUBSTITUTE FOR DEVELOPER CONTINUITY. Yes, you want your code as good as possible, your docs as good as possible, your knowledge transfer as good as possible. But there is ultimately no substitute!
Domain knowledge matters, and takes years to develop. Familiarity with the codebase matters, and in a complex and changing codebase is a process that is never complete. Having access to the people who wrote the things in the first place matters.
These facts are often neglected in a world where developers are treated like swappable commodities, and it's considered normal to job hop frequently.
Organizations that treat developers well and give them paths of advancement/reward within the organization to encourage developers to stick around for a while will reap benefits.
I don't necessarily think you need the people who wrote the code originally though. As long as there's enough overlap between generations, and deliberate effort to train people on different parts of the system, knowledge can be preserved. That's probably not possible if everyone changes teams after 2 years though, except for relatively simple or highly standardized codebases.
That was a few years ago; the versions that came around 2016 and later (after API 23) became more stable and relied less on OEM-specific magic.
I recently spent over a week trying to find why a Windows PC was acting erratically in very specific circumstances and seemed to work fine the rest of the time. It was (here also) a problem with timers. Once the problem was identified, correcting it took 15 seconds.
I sometimes worry that in healthcare it's the other way around: doctors do an immediate diagnosis and then months (or years) are spent trying to fix what they think needs fixing. Isn't it possible that we are doing it wrong and are not actually getting to the bottom of things?
Compare, however, to software: unlike the human body which has numerous small variations but a common base, software is free to take many different forms and frameworks. This software continues to change over time, so while documentation certainly can exist, it can be immensely fragmented and describe complex behaviours. Plus, when software developers fix an issue it often tends to stay fixed; there is a use for a postmortem but no immediate necessity. But in people, this has to be diagnosed every time it is encountered, so it is natural that documentation will occur.
(This isn’t the perfect metaphor, of course, but I hope it roughly corresponds.)
> This story really exemplifies an aspect of my job I love: I can’t predict all of the issues that our partners will throw at me, and I know that to fix them I have to understand...
That is not a part of the job that I love. That is a part of the job that makes me want to bash my brains out with a rock.
I actually do sort of like that kind of problem-solving, in the abstract, buttttttt in reality I usually need to do it under pressure for unappreciative clients/management who just know that things are taking longer than expected. This is of course a communication challenge, and I am a good communicator in general, but... I'm kind of burned out on it.
Did he really, truly mean that? Was it sarcasm? Was it something PR asked him to add? Was it something PR added without asking him?
That's what I miss working from home. Random, spontaneous, productive interactions with coworkers.
It's finding bugs like these that give you an immense sense of satisfaction for discovering it once you get through all the frustration.
The Amazon Prime app on LG TVs has had lip-sync issues for at least 4 generations of their TVs; the problem has been reported both to LG and Samsung, and no one cares enough to fix it. When using the app on my LG CX, the audio is about 500ms behind, an insane amount of audio desync, and no one cares to fix it? No other apps do that, it's specifically Amazon Prime, but I guess it doesn't sell their Fire Sticks, so hey, who cares?
With such short buffers (16.6ms) and a user-space timer (i.e. basically usleep) driving the whole thing, I'm actually surprised that this didn't break earlier...
Sadly, Medium is now more full of crap than a herd of circus elephants, with tech blogs written by people who have nothing under their belt but a code camp, but this is a diamond in the rough.
But then you have the same websites, cluttered with Ads all over and those super annoying pop ups, ultimately slowing down the experience.
Have you looked for any?
That part confused me. Netflix completed its own certification process? Does the author mean himself, as "Netflix"?
Android itself just isn’t very suited to real time tasks.