However, even when they are synced, you can still easily see the problem. The software is never going to be able to do its job in zero time, so we always take a delay of at least one buffer-size in the software. If the software is good and amazing (and does not use a garbage collector, for example) we will take only one delay between input and output. So our latency is directly proportional to the buffer size: smaller buffer, less latency. (That delay is actually at least 3x the duration represented by the buffer size, because you have to fill the input buffer, take your 1-buffer's-worth-of-time delay in the software, then fill the output buffer).
So in this specific case you might tend toward an architecture where samples get pushed to the software and the software just acts as an event handler for the samples. That's fine, except if the software also needs to do graphics or complex simulation, that event-handler model falls apart really quickly and it is just better to do it the other way. (If you are not doing complex simulation, maybe your audio happens in one thread and the main program that is doing rendering, etc just pokes occasional control values into that thread as the user presses keys. If you are doing complex simulation like a game, VR, etc, then whatever is producing your audio has to have a much more thorough conversation with the state held by the main thread.)
If you want to tend toward a buffered-chunk-of-samples-architecture, for some particular problem set that may make sense, but it also becomes obvious that you want that size to be very small. Not, for example, 480 samples. (A 10-millisecond buffer in the case discussed above implies at least a 30-millisecond latency).
There's a relatively recent trend to try and record digital all the way, and this is also complicated. Record at 24 fps? At 30? At 60? 60 fps 4k is a lot of data. And sound is actually the major pain point -- video frames you can generally just drop/double, speed up/down a little to even things out. But 24 fps to 60 fps creates big enough gaps that audio pitch can become an issue.
But jblow is right in that when you have to feed in samples from a non-synchronized source into your processing/game/video-application/... then trying to work with the fixed audio block size will be terrible/require additional synchronization somewhere else, such as a adaptive resampler on the input/output of your "main loop".