
The other commenter was talking about audio software that both consumes and produces samples at a fixed rate. Clearly, if the audio software is late grabbing 64 samples from the input device, it's also late delivering the next 64 to the output device, and there will be a dropout. The output sample clock has to be the timing master, so the software can never be late; and since it's also waiting for the input audio, it can never be early enough to "get ahead", either.



I am not sure we can assume that the input and output devices share the same clock or run at exactly the same rate. Maybe they do (in a good system you'd hope they would), but I can think of a lot of cases where that wouldn't be true.

However, even when they are synced, you can still easily see the problem. The software is never going to be able to do its job in zero time, so we always take a delay of at least one buffer-size in the software. If the software is good and amazing (and does not use a garbage collector, for example), we will take only one buffer's delay between input and output. So our latency is directly proportional to the buffer size: smaller buffer, less latency. (The total delay is actually at least 3x the duration represented by the buffer size, because you have to fill the input buffer, take your one-buffer's-worth-of-time delay in the software, then fill the output buffer.)
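To make that arithmetic concrete, here is a tiny back-of-the-envelope sketch (plain C; the 48 kHz rate and the 64- and 480-sample buffer sizes are just the example numbers used in this thread):

    #include <stdio.h>

    /* Minimum input-to-output latency of a block-based in/out pipeline:
       one buffer to fill the input, one buffer of processing time, one
       buffer to fill the output -- i.e. at least 3x the buffer duration. */
    static double min_latency_ms(int buffer_samples, double sample_rate_hz)
    {
        double buffer_ms = 1000.0 * buffer_samples / sample_rate_hz;
        return 3.0 * buffer_ms;
    }

    int main(void)
    {
        printf("64 samples  @ 48 kHz: %5.2f ms\n", min_latency_ms(64, 48000.0));  /* ~4 ms */
        printf("480 samples @ 48 kHz: %5.2f ms\n", min_latency_ms(480, 48000.0)); /* 30 ms */
        return 0;
    }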

So in this specific case you might tend toward an architecture where samples get pushed to the software and the software just acts as an event handler for the samples. That's fine, except that if the software also needs to do graphics or complex simulation, that event-handler model falls apart really quickly and it is just better to do it the other way. (If you are not doing complex simulation, maybe your audio happens in one thread and the main program that is doing rendering, etc. just pokes occasional control values into that thread as the user presses keys. If you are doing complex simulation like a game, VR, etc., then whatever is producing your audio has to have a much more thorough conversation with the state held by the main thread.)
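For the simple case (no complex simulation), a minimal sketch of that "main thread pokes control values into the audio thread" shape might look like this; the callback signature and all names here are hypothetical and not tied to any particular audio API:

    #include <math.h>
    #include <stdatomic.h>

    #define TWO_PI 6.283185307179586

    /* Control values the main/render thread pokes in response to key presses;
       the audio thread only ever reads them. */
    static _Atomic float g_volume  = 1.0f;
    static _Atomic int   g_note_on = 0;

    /* Main thread: called when the user presses a key. */
    void on_key_press(float new_volume, int note_on)
    {
        atomic_store_explicit(&g_volume,  new_volume, memory_order_relaxed);
        atomic_store_explicit(&g_note_on, note_on,    memory_order_relaxed);
    }

    /* Hypothetical audio callback: the device pulls `frames` samples whenever
       its output buffer needs refilling (a 440 Hz sine stands in for a synth). */
    void audio_callback(float *out, int frames, double sample_rate)
    {
        static double phase = 0.0;
        float volume  = atomic_load_explicit(&g_volume,  memory_order_relaxed);
        int   note_on = atomic_load_explicit(&g_note_on, memory_order_relaxed);

        for (int i = 0; i < frames; i++) {
            out[i] = note_on ? volume * (float)sin(phase) : 0.0f;
            phase += TWO_PI * 440.0 / sample_rate;
        }
    }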

If you want to tend toward a buffered-chunk-of-samples architecture, that may make sense for some particular problem set, but it also becomes obvious that you want that chunk size to be very small. Not, for example, 480 samples. (A 10-millisecond buffer in the case discussed above implies at least a 30-millisecond latency.)


Music or video production studios typically have a central clock, so for this use-case the sample rates should be perfect. But even if the input and output devices are on perfect clocks, with NTSC (59.94 Hz) you'd need a very odd number of samples per video frame in your software, if your processing happens at an integer fraction of the video frame rate.
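To put numbers on "very odd": 48 kHz audio against NTSC's 60000/1001 ≈ 59.94 Hz rate gives a non-integer number of samples per frame. A quick check, assuming those two rates:

    #include <stdio.h>

    int main(void)
    {
        double sample_rate = 48000.0;
        double ntsc_rate   = 60000.0 / 1001.0;   /* ~59.94 Hz */

        /* 48000 / (60000/1001) = 800.8 -- not an integer, so a fixed
           samples-per-video-frame block size can never line up exactly. */
        printf("samples per NTSC frame: %.3f\n", sample_rate / ntsc_rate);
        return 0;
    }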


Do you know whether studios use 48000 Hz with 59.94 fps, or 48000/1.001 ≈ 47952 Hz? Does converting from 24 fps film to 23.976 fps Blu-ray require resampling the audio? Or are films recorded at 48048 Hz and then slowed to 48000 Hz for consumer release?


The short answer is that it's complicated. Digital film (DCP) is typically 24 fps, IIRC -- and that doesn't divide evenly into 60, or 50. And the difference is enough that you need to drop a frame and/or stretch the audio. And sometimes this doesn't go so well.

There's a relatively recent trend to try to record digitally all the way, and this is also complicated. Record at 24 fps? At 30? At 60? 60 fps 4k is a lot of data. And sound is actually the major pain point -- video frames you can generally just drop/double, or speed up/down a little to even things out. But 24 fps to 60 fps creates big enough gaps that audio pitch can become an issue.


If everything happens strictly synchronously with your audio clock, then fixed-block processing is the way to go.

But jblow is right in that when you have to feed samples from a non-synchronized source into your processing/game/video application/... then trying to work with the fixed audio block size will be terrible and/or require additional synchronization somewhere else, such as an adaptive resampler on the input/output of your "main loop".
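A minimal sketch of what that adaptive-resampler escape hatch can look like (linear interpolation only; the FIFO-fill feedback and all names here are hypothetical, and a real resampler would use a proper filter and smoother rate control):

    /* Adaptive resampler sketch: linear interpolation with a rate the caller
       nudges based on how full the FIFO between the two clock domains is. */
    typedef struct {
        double pos;    /* fractional read position into the current input block */
        double ratio;  /* input samples consumed per output sample (~1.0) */
    } resampler;

    /* Write up to `out_len` output samples from `in` (length `in_len`).
       Returns the number of output samples produced; the caller is expected to
       slide its input window and subtract the consumed whole samples from pos. */
    int resample(resampler *r, const float *in, int in_len, float *out, int out_len)
    {
        int produced = 0;
        while (produced < out_len && (int)r->pos + 1 < in_len) {
            int    i    = (int)r->pos;
            double frac = r->pos - i;
            out[produced++] = (float)((1.0 - frac) * in[i] + frac * in[i + 1]);
            r->pos += r->ratio;
        }
        return produced;
    }

    /* Called occasionally from the main loop: if the FIFO is filling up, consume
       input slightly faster; if it is draining, consume it slightly slower. */
    void adjust_ratio(resampler *r, double fifo_fill_fraction)  /* 0.0 .. 1.0 */
    {
        r->ratio = 1.0 + 0.001 * (fifo_fill_fraction - 0.5);
    }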




