But I'll drop a few hints. First of all, nobody is talking about running interrupts at 48kHz. That is complete nonsense.
The central problem to solve is that you have two loops running and they need to be coordinated: the hardware is running in a loop generating samples, and the software is running in a (much more complicated) loop consuming samples. The question is how to coordinate the passing of data between these with minimal latency and maximum flexibility.
If you force things to fill fixed-size buffers before letting the software see them (say, 480 samples or whatever), then it is easy to see problems with latency and variance: simply look at a software loop with some ideal fixed frame time T and look at what happens when the frame rate 1/T is not 100Hz. (Let's say it is a hard 60Hz, such as on a current game console.) See what happens in terms of latency and variance when the hardware is passing you packets every 10ms and you are asking for them every 16.7ms.
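A rough sketch of that experiment in Python, assuming an idealized driver that hands over a complete packet exactly every 10ms and a software loop that polls exactly at 60Hz. The names `simulate_push`, `PACKET_MS` and `FRAME_MS` are made up for illustration; exact fractions are used so rounding doesn't blur the pattern.

```python
from fractions import Fraction

PACKET_MS = Fraction(10)        # assumed: driver delivers one packet every 10 ms
FRAME_MS = Fraction(1000, 60)   # assumed: software runs a hard 60 Hz loop


def simulate_push(num_frames):
    """For each software frame, return (packets_received, staleness_ms).

    staleness_ms is the age of the newest sample the software can see when
    the frame starts. Audio captured after the last complete packet is still
    sitting in the driver, waiting for the packet to fill.
    """
    results = []
    packets_seen = 0
    for frame in range(1, num_frames + 1):
        now = frame * FRAME_MS
        packets_done = now // PACKET_MS            # packets completed by now
        received = packets_done - packets_seen     # packets new to this frame
        packets_seen = packets_done
        staleness = now - packets_done * PACKET_MS
        results.append((int(received), float(staleness)))
    return results


if __name__ == "__main__":
    for frame, (count, stale) in enumerate(simulate_push(6), start=1):
        print(f"frame {frame}: {count} packet(s), newest sample is {stale:.2f} ms old")
```

Under these assumptions the software sees 1, 2, 2 packets per frame in a repeating pattern, and the newest available sample is about 6.67ms, 3.33ms, then 0ms old. That wobble is the variance, and it comes purely from the two rates beating against each other.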
The key is to remove one of these fixed frequencies so that you don't have this problem. Since the one coming from the hardware is completely fictitious, that is the one to remove. Instead of pushing data to the software every 10ms, you let the software pull data at whatever rate it is ready to handle that data, thus giving you a system with only one coarse-grained component, which minimizes latency.
You are not running interrupts at 48kHz or ten billion terahertz, you are running them exactly when the application needs them, which in this case is every 16.7ms (but might be every 8.3ms or 10ms, or at a variable frame rate).
You don't have to recompute any of the filters in your front-end software based on changing amounts of data coming in from the driver. The very suggestion is nonsense; if you are doing that, it is a clear sign that your audio processing is terrible because there is a dependency between chunk size and output data. It should be obvious that your output should be a function of the input waveform only. To achieve this, you just save up old samples after you have played them, and run your filter over those plus the new samples. None of this has anything to do with what comes in from the driver when and how big.
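To make the "save up old samples" idea concrete, here is a minimal sketch using a plain FIR filter. The class name `StreamingFIR` and the example taps are assumptions for illustration, not anything from a real audio library. The filter keeps the last `len(taps) - 1` input samples between calls, so the output depends only on the input waveform and not on how it was chopped up.

```python
class StreamingFIR:
    """FIR filter that remembers the tail of its previous input between calls."""

    def __init__(self, taps):
        self.taps = list(taps)
        # Old samples kept around so the filter can "see" across chunk edges.
        self.history = [0.0] * (len(self.taps) - 1)

    def process(self, chunk):
        samples = self.history + list(chunk)
        n_hist = len(self.history)
        out = []
        for i in range(n_hist, len(samples)):
            # taps[0] weights the newest sample, taps[k] the one k steps back.
            out.append(sum(t * samples[i - k] for k, t in enumerate(self.taps)))
        # Save the most recent samples for the next call.
        if n_hist:
            self.history = samples[-n_hist:]
        return out


if __name__ == "__main__":
    signal = [float(i % 7) for i in range(100)]
    taps = [0.25, 0.5, 0.25]

    whole = StreamingFIR(taps).process(signal)

    f = StreamingFIR(taps)
    pieces = f.process(signal[:3]) + f.process(signal[3:50]) + f.process(signal[50:])

    print(whole == pieces)
```

Feeding the same signal in one piece or in chunks of 3, 47 and 50 samples produces identical output, which is the property being argued for.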
I should point out, by the way, that this extends to purely software-interface issues. Any audio API where the paradigm is "give the API a callback and it will get called once in a while with samples" is terrible for multiple reasons, at least one of which is explained above. I talked to the SDL guys about this and to their credit they saw the problem immediately, and SDL2 now has an application-pull way to get samples. (I don't know how well it is supported on various platforms, or whether it is just a wrapper over the thread thing, though, which would be Not Very Good.)
This is, actually, how most professional audio APIs are designed, and they generally work quite well: ASIO, VST, JACK, PortAudio, CoreAudio, etc.
However, even when they are synced, you can still easily see the problem. The software is never going to be able to do its job in zero time, so we always take a delay of at least one buffer-size in the software. If the software is good and amazing (and does not use a garbage collector, for example) we will take only one delay between input and output. So our latency is directly proportional to the buffer size: smaller buffer, less latency. (That delay is actually at least 3x the duration represented by the buffer size, because you have to fill the input buffer, take your 1-buffer's-worth-of-time delay in the software, then fill the output buffer).
So in this specific case you might tend toward an architecture where samples get pushed to the software and the software just acts as an event handler for the samples. That's fine, except if the software also needs to do graphics or complex simulation, that event-handler model falls apart really quickly and it is just better to do it the other way. (If you are not doing complex simulation, maybe your audio happens in one thread and the main program that is doing rendering, etc. just pokes occasional control values into that thread as the user presses keys. If you are doing complex simulation like a game, VR, etc., then whatever is producing your audio has to have a much more thorough conversation with the state held by the main thread.)
If you want to tend toward a buffered-chunk-of-samples-architecture, for some particular problem set that may make sense, but it also becomes obvious that you want that size to be very small. Not, for example, 480 samples. (A 10-millisecond buffer in the case discussed above implies at least a 30-millisecond latency).
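The arithmetic behind those numbers, as a small sketch. It assumes a 48kHz sample rate (so 480 samples is 10ms) and the three-stage model described above: fill the input buffer, spend one buffer's worth of time in software, fill the output buffer. The function names are made up for illustration.

```python
def buffer_duration_ms(buffer_samples, sample_rate_hz):
    """Time represented by one buffer, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz


def min_round_trip_ms(buffer_samples, sample_rate_hz, stages=3):
    """Lower bound on input-to-output latency under the three-stage model:
    fill the input buffer, process for one buffer period, fill the output buffer.
    """
    return stages * buffer_duration_ms(buffer_samples, sample_rate_hz)


if __name__ == "__main__":
    for size in (480, 128, 64):
        per_buffer = buffer_duration_ms(size, 48000)
        total = min_round_trip_ms(size, 48000)
        print(f"{size} samples: {per_buffer:.2f} ms per buffer, at least {total:.2f} ms round trip")
```

At 48kHz this gives at least 30ms for 480-sample buffers versus roughly 4ms for 64-sample buffers, which is why the buffer size needs to be small if you go this route.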
There's a relatively recent trend to try and record digital all the way, and this is also complicated. Record at 24 fps? At 30? At 60? 60 fps 4k is a lot of data. And sound is actually the major pain point -- video frames you can generally just drop/double, speed up/down a little to even things out. But 24 fps to 60 fps creates big enough gaps that audio pitch can become an issue.
But jblow is right in that when you have to feed in samples from a non-synchronized source into your processing/game/video-application/... then trying to work with the fixed audio block size will be terrible/require additional synchronization somewhere else, such as an adaptive resampler on the input/output of your "main loop".
Why do you say this? The USB audio card (or similar) is generating blocks of audio at a fixed rate, no?
Maybe for video playback or games you need to synchronize audio and video, but there is no need to do that for music production apps.
If you are writing some sort of synth, as soon as you receive a MIDI note or a tap, trigger the synth and the note will play in the next audio block. No need to wait for the GUI to update.
If you are doing some sort of effect, grab the input data, process and have it ready for the next block out. I don't understand why you need a second loop.
I think you will find, though, that most hardware isn't this way, and to the extent this problem exists, it is usually an API or driver model problem.
If you're talking about a sound card for a PC, probably it is filling a ring buffer and it's the job of the operating system (or the application) to DMA the samples out before the ring buffer fills up, but how many samples you get depends on when you do the transfer. But the hardware side of things is not something I know much about.
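A toy model of that arrangement, with the caveat that it is a single-threaded sketch and not how any particular driver works: `RingBuffer`, `write` and `pull` are hypothetical names, and the "hardware" is just a method call. The point is only that the reader gets however many samples have accumulated since it last asked.

```python
class RingBuffer:
    """Fixed-size ring buffer: a producer writes, a consumer pulls what is there."""

    def __init__(self, capacity):
        self.data = [0] * capacity
        self.capacity = capacity
        self.written = 0      # total samples ever written by the "hardware"
        self.consumed = 0     # total samples ever pulled by the "software"

    def write(self, samples):
        for s in samples:
            self.data[self.written % self.capacity] = s
            self.written += 1

    def available(self):
        return self.written - self.consumed

    def pull(self):
        """Return every sample written since the last pull."""
        if self.available() > self.capacity:
            raise OverflowError("reader fell behind; samples were overwritten")
        out = [self.data[i % self.capacity]
               for i in range(self.consumed, self.written)]
        self.consumed = self.written
        return out


if __name__ == "__main__":
    rb = RingBuffer(1024)
    rb.write(range(300))            # reader shows up early: 300 samples waiting
    print(len(rb.pull()))
    rb.write(range(300, 1100))      # reader shows up later: 800 samples waiting
    print(len(rb.pull()))
```

The two pulls return 300 and 800 samples respectively. There is no fixed block size on the reading side, only a deadline: read before the writer laps you.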
> If you are writing some sort of synth, as soon as you receive a MIDI note or a tap, trigger the synth and the note will play in the next audio block
Yeah, and waiting for "the next audio block" to start is additional latency that you shouldn't have to suffer.
> If you are doing some sort of effect, grab the input data, process and have it ready for the next block out. I don't understand why you need a second loop.
The block of audio data you are postulating is the result of one of the loops: the loop in the audio driver that fills the block and then issues the block to user level when the block is full. My whole point is you almost never want to do it that way.