What is the requirement for synced audio? Milliseconds, microseconds, nanoseconds, ...

I'm currently working on devices that synchronise themselves wirelessly indoors over a large area, and I'm now wondering if music could be an application I haven't thought about...




10 ms is fine. 100 ms works, but you get odd echoes, like a speaker system in a big building.

You need some way to compensate for clock drift between different playback devices. Even good clocks will drift 10 ms in a few hours.
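
A minimal sketch of one way to do that, assuming each device can periodically exchange timestamps with a reference clock (the class and the data collection are hypothetical, not any particular library's API):

    import numpy as np

    class DriftModel:
        """Fit reference_time ~= slope * local_time + offset from
        periodically collected (local, reference) timestamp pairs,
        then map the local clock onto the reference for playback."""

        def __init__(self):
            self.pairs = []  # (local_time, reference_time), in seconds

        def observe(self, local_t, reference_t):
            self.pairs.append((local_t, reference_t))

        def to_reference(self, local_t):
            local, ref = np.array(self.pairs).T
            slope, offset = np.polyfit(local, ref, 1)  # linear least squares
            return slope * local_t + offset

In practice you'd refit incrementally, reject outlier round trips, and apply the correction by gently resampling the audio rather than by skipping samples, since a hard skip is audible.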


1 ms is barely "ok". With 1 ms you'll still hear ugly comb-filter effects (see my other comments).


At most 10 ms. Sub-1 ms would be awesome.

And yes, definitely. If you can build in a "Chromecast receiver" so that a user could cast to multiple synced speakers (at full quality), you'd definitely have my attention.


It's worth noting that for two speakers playing the same audio, a 5 ms offset corresponds to about 1.7 m of extra path, which moves the point of perfect sync by less than a meter. Going below a millisecond is definitely overkill territory.
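
The arithmetic behind that, as a back-of-the-envelope sketch (343 m/s is the usual room-temperature speed of sound):

    SPEED_OF_SOUND = 343.0  # m/s at roughly room temperature

    offset = 0.005                              # 5 ms inter-speaker delay
    path_difference = SPEED_OF_SOUND * offset   # ~1.72 m of extra travel
    # Moving toward one speaker shortens its path and lengthens the
    # other's, so the equidistant point shifts by half the difference.
    sync_point_shift = path_difference / 2      # ~0.86 m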


In 1 ms sound travels just over 30 cm, so just moving around a reasonable room changes the offsets between two well-spaced speakers by 5-10 ms. I use my own multi-room audio system at home - http://strobe.audio - and although the sync isn't perfect by any means, the effect of moving around your house is usually the problem. E.g. if you have the volume in another room loud enough that it's louder than the music from the room you're in, you get the weird echo effect. Otherwise, nothing.

Turns out our brains are really good at compensating for relatively large offsets.


I beg to differ: a good rule of thumb is that one (audio) sample covers about 8 mm of sound travel (at 44.1 kHz). By the time you're a full 1 ms late, that's a 44-sample difference between the two (or as though the speakers were about a third of a meter apart).

At that distance, you're gonna hear some uncomfortable comb filtering. A third of a meter is the wavelength of roughly 1 kHz, which is right smack dab in the middle of a critical band, so you WILL hear the constructive and destructive interference around ~1 kHz.

(And _5 ms_? That means the speakers are now ~220 samples out of sync, so I'd expect lows around 200 Hz to have some destructive interference, making the music sound hollow.)
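
For concreteness, a sketch of the standard two-source comb-filter math behind those numbers, assuming equal levels from both speakers (nulls fall at odd multiples of 1/(2*delay)):

    import numpy as np

    def comb_nulls(delay_s, f_max=20_000.0):
        """Frequencies cancelled when two equal-level copies of a
        signal are offset by delay_s: f = (2k + 1) / (2 * delay)."""
        k = np.arange(int(f_max * delay_s))
        return (2 * k + 1) / (2 * delay_s)

    print(comb_nulls(0.001)[:3])  # 1 ms -> nulls at 500, 1500, 2500 Hz
    print(comb_nulls(0.005)[:3])  # 5 ms -> nulls at 100, 300, 500 Hz

So a 1 ms offset carves notches spaced 1 kHz apart across the whole audible band, and 5 ms pushes the first nulls down into the lows.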

1 ms of sync is FAR from adequate. It took effort, but I was able to get a distributed set of Raspberry Pis to within about 5 samples at 44.1 kHz, and I think I can do better.

Why do I know this or care? :) I've done (and still do) a lot of research (and inventions) in the field of microphone-array processing and audio source localization -- tracking objects purely by their sound. With cheap commodity equipment I can generally get a resolution of 5 cm, and with lots of fine-tuning I was able to do sub-cm tracking.
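
For flavor, a toy version of the time-difference-of-arrival estimate such tracking rests on: cross-correlate two microphone signals and take the peak. This is synthetic data and plain correlation only; real arrays need interpolated peaks, GCC-PHAT weighting, and so on:

    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s
    FS = 44_100             # sample rate, Hz

    def tdoa_samples(mic_a, mic_b):
        """Delay of mic_b relative to mic_a, in whole samples, from
        the peak of their cross-correlation (positive = b lags)."""
        corr = np.correlate(mic_b, mic_a, mode="full")
        return int(np.argmax(corr)) - (len(mic_a) - 1)

    # Synthetic check: white noise reaching mic_b 5 samples later.
    rng = np.random.default_rng(0)
    sig = rng.standard_normal(4096)
    mic_a = sig
    mic_b = np.concatenate([np.zeros(5), sig[:-5]])

    lag = tdoa_samples(mic_a, mic_b)       # -> 5
    print(lag / FS * SPEED_OF_SOUND, "m")  # ~0.039 m path difference

At ~7.8 mm of travel per sample, a +-1 sample estimate already pins the path difference down to under a centimeter, consistent with the resolutions mentioned above.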


All those problems happen even if your speakers are perfectly in sync. These aren't headphones. If you have a couch for listening, the difference between sitting on the left and sitting on the right is just as big or bigger. You get the same interference, yet it isn't a problem.


I think his thinking is still valid.

If you sit on the left or right, the sound should still come from "center stage", i.e. (x, y, z) = (0, 0, 0), without shifting or drifting.


Depends on whether you have a stage. If you're playing audio to go with a TV, you don't want it drifting much. But when the scenario is playing music over an area, you end up with a bunch of mutually contradictory centers, and it doesn't matter if they shift a little. And music over an area is what sparked this thread.



