I've noticed an interesting feature in Chrome and Chromium: they seem to isolate internal audio from the microphone input. For instance, when I'm on a Google Meet call in one tab and playing a YouTube video at full volume in another tab, the video’s audio isn’t picked up by Google Meet. This isolation doesn’t happen if I use different browsers for each task (e.g., Google Meet on Chrome and YouTube on Chromium).
Does anyone know how Chrome and Chromium achieve this audio isolation?
Given that Chromium is open source, it would be helpful if someone could point me to the specific part of the codebase that handles this. Any insights or technical details would be greatly appreciated!
Within a single process, or a tree of processes that can cooperate, this is straightforward to do (modulo the actual audio signal processing, which isn't): keep the last few hundred milliseconds of what you're playing around, compare it to what you're getting from the microphone, find the correlations, and cancel them.
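Very roughly, that idea can be sketched as a basic normalized-LMS adaptive filter. This is only an illustration of the technique, not Chromium's or Firefox's actual code; the class name, filter length and step size below are made up:

```cpp
// Sketch of sample-by-sample echo cancellation: adaptively estimate how the
// playback signal reappears in the microphone signal and subtract it.
#include <cstddef>
#include <vector>

class NlmsEchoCanceller {
 public:
  explicit NlmsEchoCanceller(std::size_t taps, float step = 0.5f)
      : weights_(taps, 0.0f), history_(taps, 0.0f), step_(step) {}

  // `playback` is the sample just sent to the speakers (the reference),
  // `mic` is the sample captured at the same time; returns the mic sample
  // with the estimated echo removed.
  float Process(float playback, float mic) {
    // Shift the playback history so history_[0] is the newest sample.
    for (std::size_t i = history_.size() - 1; i > 0; --i)
      history_[i] = history_[i - 1];
    history_[0] = playback;

    // Estimate the echo as a linear combination of recent playback samples.
    float estimate = 0.0f;
    float energy = 1e-6f;  // avoids division by zero
    for (std::size_t i = 0; i < weights_.size(); ++i) {
      estimate += weights_[i] * history_[i];
      energy += history_[i] * history_[i];
    }

    // The residual is the cancelled signal; use it to adapt the filter
    // (normalized LMS update).
    float error = mic - estimate;
    float scale = step_ * error / energy;
    for (std::size_t i = 0; i < weights_.size(); ++i)
      weights_[i] += scale * history_[i];
    return error;
  }

 private:
  std::vector<float> weights_;
  std::vector<float> history_;
  float step_;
};
```

The real cancellers are far more involved (block- and frequency-domain processing, delay estimation, handling clock drift and nonlinearities), which is the part that isn't straightforward.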
If the processes aren't related, there are multiple ways to do this. Sometimes the OS provides a capture API that does the cancellation itself (the OS knows what is being output), and you can just use that; this is what happens on macOS for Firefox and Safari, for example, and it's often available on mobile as well.
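On macOS that capture API is the voice-processing I/O audio unit; roughly, selecting it instead of the plain output unit is what opts you into the system's processing. A hedged sketch, with error handling and stream-format setup omitted and the function name made up:

```cpp
#include <AudioToolbox/AudioToolbox.h>

AudioUnit CreateVoiceProcessingUnit() {
  // Ask for the voice-processing I/O unit rather than the plain output unit;
  // this is the variant with echo cancellation built in.
  AudioComponentDescription desc = {};
  desc.componentType = kAudioUnitType_Output;
  desc.componentSubType = kAudioUnitSubType_VoiceProcessingIO;
  desc.componentManufacturer = kAudioUnitManufacturer_Apple;

  AudioComponent component = AudioComponentFindNext(nullptr, &desc);
  AudioUnit unit = nullptr;
  AudioComponentInstanceNew(component, &unit);

  // Enable input (bus 1) so the unit captures from the microphone; the
  // system's voice processing removes the echo of the playback from the
  // captured signal before the app ever sees it.
  UInt32 enable = 1;
  AudioUnitSetProperty(unit, kAudioOutputUnitProperty_EnableIO,
                       kAudioUnitScope_Input, 1, &enable, sizeof(enable));
  AudioUnitInitialize(unit);
  return unit;
}
```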
Sometimes (on Linux desktop and Windows) the OS instead provides a loopback stream: a way to capture the audio that is being played back, which can similarly be used as the reference for cancellation.
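On Windows that's WASAPI loopback capture: you open a capture stream on the render endpoint with the loopback flag and get back exactly what the machine is playing, which you then feed to your own canceller as the reference. Again a sketch; COM initialization, error handling and the actual capture loop are omitted, and the function name is made up:

```cpp
#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>

IAudioClient* OpenLoopbackClient() {
  IMMDeviceEnumerator* enumerator = nullptr;
  CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                   __uuidof(IMMDeviceEnumerator),
                   reinterpret_cast<void**>(&enumerator));

  // Note: the *render* endpoint (the speakers), not a capture device.
  IMMDevice* device = nullptr;
  enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);

  IAudioClient* client = nullptr;
  device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
                   reinterpret_cast<void**>(&client));

  WAVEFORMATEX* format = nullptr;
  client->GetMixFormat(&format);

  // The loopback flag turns this into "capture whatever is being played".
  client->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK,
                     10000000 /* 1 s buffer, in 100 ns units */, 0, format,
                     nullptr);
  return client;
}
```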
If none of that is available, you mix your own audio output and perform the cancellation against that mix yourself, and the behaviour you observe is what you get: audio coming out of another application (or another browser) never enters the mix, so it can't be cancelled.
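The mixing step itself is the easy part; a sketch (the function name is made up, and the mixed block would be handed, together with the microphone capture, to an echo canceller like the one sketched above):

```cpp
#include <cstddef>
#include <vector>

// Sum every audio stream the application itself is about to play into one
// reference block. Streams played by other processes are simply not here,
// which is why cross-browser audio isn't cancelled.
std::vector<float> MixOutputs(
    const std::vector<std::vector<float>>& output_streams,
    std::size_t block_size) {
  std::vector<float> reference(block_size, 0.0f);
  for (const auto& stream : output_streams)
    for (std::size_t i = 0; i < stream.size() && i < block_size; ++i)
      reference[i] += stream[i];
  return reference;
}
```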
Source: I do this, but at Mozilla, and we unsurprisingly have the same problems and solutions.