
How Google Meet's noise cancellation works - theanirudh
https://venturebeat.com/2020/06/08/google-meet-noise-cancellation-ai-cloud-denoiser-g-suite/
======
crazygringo
Most people have no idea of the amount of incredibly advanced signal
processing that goes into echo cancellation and noise cancellation in
videoconferencing.

This post is on noise cancellation specifically, and it actually has the
potential to be a _huge_ step forward.

One of the big audio problems with group meetings is that the background noise
from each participant adds up, to a point where it quickly becomes unbearable.
For that reason, videoconferencing generally only plays audio from one or two
participants at most, using a fairly simple estimation of whichever audio
signal is currently loudest. The problem is that this can make it really hard
to interrupt (people will literally not hear you), or tell the difference
between two people going "mm-hmm" versus the whole group. If you've ever been
in a group meeting where everybody applauds something, this is why you _see_
everyone applauding but only hear a smattering.
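
For anyone curious what "only play the loudest one or two participants" might look like, here is a minimal sketch. It is an assumption-laden illustration, not how Meet or any other product actually does it: the function names, the use of RMS level as the loudness estimate, and the per-frame dictionary are all made up for the example.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of PCM samples (floats)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def mix_loudest(frames, max_speakers=2):
    """Mix only the `max_speakers` loudest streams for one frame interval.

    `frames` maps a participant name (hypothetical) to that participant's
    samples for the current interval; all frames are assumed equal length.
    Returns (selected participant names, mixed samples).
    """
    # Rank participants from loudest to quietest by RMS level.
    ranked = sorted(frames, key=lambda name: rms(frames[name]), reverse=True)
    selected = ranked[:max_speakers]
    length = len(next(iter(frames.values()))) if frames else 0
    # Everyone not selected is simply dropped from the mix.
    mixed = [sum(frames[name][i] for name in selected) for i in range(length)]
    return selected, mixed

frames = {
    "alice": [0.5, -0.5, 0.5, -0.5],              # talking
    "bob": [0.25, -0.25, 0.25, -0.25],            # quieter "mm-hmm"
    "carol": [0.0625, 0.0625, -0.0625, -0.0625],  # background noise only
}
selected, mixed = mix_loudest(frames, max_speakers=1)
```

With `max_speakers=1`, bob's "mm-hmm" never reaches the mix at all, which is the walkie-talkie effect in miniature.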

But if this noise cancellation really succeeds, it could be a huge leap
forward because audio cues and overlap will actually work for the first time
-- hearing the "mm-hmms", hearing everyone pipe up, and so on.
Videoconferencing will feel more like an actual single shared audio
environment, rather than the kind of "walkie-talkie" effect it so often feels
like now.

I'm really looking forward to this.

~~~
the_af
> _The problem is that this can make it really hard to interrupt (people will
> literally not hear you)_

This is driving me crazy with Google Meet in these COVID19 times. Even in a
relatively small conference, I have a really hard time interrupting someone to
ask a quick question, even when the speaker is expecting interruptions. It's
always "excuse me!"; delay as person continues speaking; I stop; the other
person says "yes, please ask away"; when I restart my question the other
person already assumed I've changed my mind and continues speaking; repeat ad
infinitum. And this is _if_ they even hear me over the audio breaking up.

It's very, very frustrating. If they solve this it would hugely improve
quality of life in remote conferencing for me.

~~~
fossuser
Yeah, I wish there was a simple non-verbal option to signal intent-to-talk.

I want to just be able to hit my self-view and have it have a big icon on it
or something so the person currently speaking (and everyone else) can see that
I want to say something. (Maybe sort these in chronological order so the
speaker can see who wanted to talk first?)

In theory you could do this with a good chat, but for some reason the chat in
Zoom and the others is kind of an afterthought and nobody uses it.

One of the reasons I prefer text based chat is multiple people can talk at the
same time without needing to deal with interrupting audio. If you can type
well, the bandwidth is higher for group communication (and you get a log).

At least with video you can kind of tell when someone is waiting to speak by
seeing their expression. Audio only is worse (but maybe wouldn't be, if you
had good intent-to-speak tools built into the app?)

~~~
boogies
I really like how Jitsi Meet puts the hand raising/lowering button right on
the bottom bar, where there's just empty black/white space in Zoom/Google
Meet, and not buried inside a menu labeled "Participants" (???), where it's a
hassle to access (Zoom).

------
skybrian
> A musical instrument will probably also get filtered out. “To a pretty large
> degree, it does,” Lachapelle said. “Especially percussion instruments.
> Sometimes a guitar can sound very much like a voice — you’re starting to
> touch the limits there. But if you have music playing in the background,
> usually it’ll cut it all out.”

This is a big issue with hearing aids. The whole industry is focused on
optimizing for voice intelligibility and as a musician you end up doing trial-
and-error with the audiologist to turn all that stuff off.

We need more open source hearing aids - I've read of a few but they're not
mainstream.

~~~
aaronAgain
First, I'll say this to everyone: Get hearing aids if you need them. They can
change your world.

About music, this is getting much better in hearing aids. I've been from
analog thru digital over 15+ years of hearing aids, and my latest (3 months
ago) pair from Phonak (no affiliation) is an honest leap forward. It has a
built in Music profile that disables all sound optimizations in general, while
still attempting to correct the hearing ranges that you have a deficit in. I
was on the verge of no longer being able to hear with hearing aids, that has
probably been extended by 3-5 years with these new models. At that point I
will be approaching cochlear implant level hearing loss. I happily embrace my
cyborg future!

On top of Music, I have a Walking profile that attempts to focus on the person
that is walking to the left or right of me and can pick which side on the fly.
And they make great ear plugs when things are loud.

The Normal program, auto-magically selects between 8'ish profiles to pick the
best one for the environment. And it has finally got it right. Older models I
would daily need to force it into the best mode because it guessed wrong. The
latest model I only have to tell it what to do once every few weeks.

And to the original topic, noise cancellation, hearing aids bluetooth'ed to
the phone/PC for conference calls is hands down the best possible audio
experience. Built in noise cancellation, amazing microphones that can be used
for your voice portion of the call, tuned to your hearing, with some of the
finest sound output possible. Just amazing. These things are so good these
days that they are finally being labeled as assistive devices for people
without hearing loss. They can give someone with normal range hearing
essentially bionic hearing. Tinnitus? They play customized white noise to make
the ringing less noticeable. Doesn't help everyone, but it's really nice for
me. I hear more ringing when I take my aids out.

Oh, and it does all of this on a device that fits in your ear with a battery
the size of a few grains of rice and all in a few milliseconds so your brain
sees the mouth move at the same time it actually hears the audio.

Again, get them if you need them.

~~~
skybrian
They have some great features and I agree that people should get them (or
upgrade), but they are not optimized for musicians.

I just got a similar model I assume (M90-R) and it's definitely not switching
to music mode automatically when I play music. (Maybe it's different for
listening.) I just had the audiologist add a music mode that I can switch to
manually, but getting acceptable timbre for the instruments I play (accordion,
melodica, and piano) is work in progress. Making an expensive instrument sound
like cheap trash is disappointing, though of course I can take them out.

Having Bluetooth is nice, particularly for phone calls, but I find the sound
quality is unsatisfying for listening to music, so it won't be replacing
speakers for me.

------
xeno42
I've been using [https://krisp.ai/](https://krisp.ai/) to great effect with
Zoom while sitting outside on the laptop with road traffic, birds, etc nearby
- My team really had a "wow" moment when i turned it on the first time

~~~
meritt
Krisp is embedded into Discord (enable beta settings) and the voice chat
quality far exceeds any of the "business" focused software I've ever used.

Not to mention the screensharing is infinitely better as well. It's pretty
pathetic of the business apps; we went through a day where I was trying to
screenshare something and my remote coworkers kept complaining of lag,
blurriness, or the app would just crash (slack). We went through ms teams,
zoom, slack, and google meet. All had issues. Convinced everyone to install
Discord and suddenly I was able to share my desktop perfectly at 1080p
without noticeable lag and crystal clear audio.

~~~
the_pwner224
+1

Discord's lack of lag in audio makes a huge difference for voice comms. I've
only used it for gaming, but you can really tell the difference when you
switch to the game's voice chat feature which has probably a third of a second
of latency. And of course Zoom et al. have a lot more lag and it really hurts
the experience. In addition to low latency, the sound is also very good
quality.

~~~
sneak
Note also that no messages in Discord, including individual messages/DMs, are
end-to-end encrypted. This is precisely the same security issue that Zoom has
(and Slack, and IRC without OTR).

Discord can and does log all messages through the system, and has many
internal tools that operate on the plaintext. Anything you communicate through
Discord you should assume any/all Discord staff may read.

They claim that the voice comms are e2e but there are no further details
available (like where the keys are generated).

------
pierrebai
As far as my experience goes, the single best way to deal with background
noise is... the mute button.

In every video conf I've been in, you can instantly tell when "one of them" who
can't be bothered to mute themselves joins. The audio quality immediately goes
down the drain. It's always the same subset of people who do it, too. As soon
as they're enjoined to please mute, the audio quality is restored.

No amount of magic signal processing will ever match it.

While perhaps misguided to use it that way, the mute button thus acts as a
social-cluelessness meter.

~~~
Terretta
"Hold space to talk" is not a bad solve for this, also makes folks ramble
less.

~~~
Rebelgecko
Would probably have some major accessibility issues.

Plus I'd hate to be the intern that has to sit and hold the space bar while
the boss delivers a presentation

~~~
adrianmonk
It could be built so that the meeting organizer can exempt certain people and
so the system can exempt certain accounts.

------
jdm2212
I might be unusual, but my experience with videoconferencing has been that
ambient noise is rarely a major problem. The big issue is audio cutting out
due to a shaky network. When ambient noise is a problem, it's not so much
someone typing as their spouse talking in the background or a fire engine
going by -- and at that point the solution is for them to hit mute.

~~~
adrianmonk
> _at that point the solution is for them to hit mute_

From a technical point of view, that is really the best thing. It works, and
sometimes it's the only thing that works.

But if you try to get people to actually do it, you run into problems:

(1) They don't realize it's them. AFAIK the system doesn't play their audio
back to them, so while everyone else hears the noise, they don't. The one
person who needs to take action is the one person who doesn't know action is
necessary.

(2) They are distracted. When their spouse is talking, they are focused on
whatever their spouse is saying, not on how it affects the meeting audio. Or
the meeting is boring and they're not paying attention.

(3) They just don't care enough. They are there to attend a meeting, not
fiddle with computer stuff. Some people will never take the time to learn
where the mute button is in the software.

Perhaps #1 could be improved, though, with some kind of blindingly obvious
indicator in the UI. If "YOUR MIC IS WHAT EVERYONE IS HEARING RIGHT NOW"
flashes when your mic takes the floor, maybe you'd notice it lighting up when
you didn't intend for it to.

~~~
ShroudedNight
AWS Chime has significant drawbacks, but one of the things I most liked about
it was that anybody could mute anybody else. The number of calls where that
significantly cut down on audio discomfort was surprising.

For those wondering, unmuting is a privileged operation that only the user
could do themselves.

~~~
tgsovlerkhgsel
Same with Meet, at least if you're using the regular enterprise/GSuite version
(probably different for schools and/or the nerfed version you get with a
personal account, since kids muting the teacher gets old really quick).

------
bigtones
I was not so impressed with this demo - especially when he was scrunching his
potato chip packet, the degradation in his voice quality made it almost
impossible to understand what he was saying and his voice sounded very
synthesized and processed, and that's through a $200 Yeti professional
microphone. Seems like some of the other noise cancellation technology options
from Nvidia RTX and others are more effective.

~~~
crazygringo
Microphone quality has very little to do with it. Noise separation is just an
incredibly hard problem, particularly when a noise is loud. Scrunching potato
chips, there's no scenario where his voice won't become degraded unless you
can isolate the scrunching sound separately (microphone beamforming can help
here, but is still never perfect).
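
To make the beamforming remark concrete, here is a toy delay-and-sum sketch. Everything in it is an assumption for illustration: the two-microphone setup, the known one-sample delay, and especially the noise, which is contrived to be exactly opposite at the two mics so that it cancels completely. Real noise is at best uncorrelated between mics, in which case averaging N channels only reduces its amplitude by roughly a factor of sqrt(N), which is why this helps but is never perfect.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its delay (in samples) and average.

    channels: list of equal-length sample lists, one per microphone.
    delays:   delays[i] is how many samples later the target sound arrives
              at microphone i than at the earliest microphone.
    """
    usable = len(channels[0]) - max(delays)
    out = []
    for n in range(usable):
        # After shifting, the voice lines up across mics and adds coherently.
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        out.append(total / len(channels))
    return out

voice = [1.0, -1.0, 2.0, -2.0]
noise = [0.5, 0.5, -0.5, -0.5]
# Mic 0 hears the voice immediately; mic 1 hears it one sample later.
# Contrived best case: the noise at mic 1 is the exact negative of mic 0's.
mic0 = [v + n for v, n in zip(voice, noise)] + [0.0]
mic1 = [0.0] + [v - n for v, n in zip(voice, noise)]
cleaned = delay_and_sum([mic0, mic1], delays=[0, 1])
```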

Running this economically on servers at scale in realtime, I consider this
very impressive. I can't say how it compares with RTX, but I wonder if it has
anything to do with the amount of computing resources that can be dedicated to
it. A single expensive card dedicated to one audio stream, versus a single
Google server that needs to process hundreds (thousands?) of audio streams.

~~~
dannyw
RTX Voice uses less than 10% of a 2060 (non-super)

Given how common voice communication is in our world, I am sure Google can
build ASICs for this (if not just run it on TPUs), and get the marginal cost
of vocal processing to be negligible.

Heck, they probably would just need to divert <5% of the resources of Fuchsia or
any other of their "senior engineer retention" projects.

------
ben7799
I've been doing online guitar lessons since Covid-19 started and all these
algorithms just suck hard for that. Even in a 1:1 call.

Two repeated notes and the noise cancellation just immediately shuts you
down... we've been using Zoom and luckily you can turn all the audio
processing off if you go in "Advanced" and enable "Turn on original audio".

~~~
ehsankia
I'm surprised there's not an option to disable it?

~~~
Thorrez
What do you mean? ben7799 said there is an option to disable it.

------
graton
This would have been useful for my co-worker. They went on a trip to Europe
years ago and had a conference call scheduled when they got there.

Unfortunately for them they decided to lie on the bed in the hotel and the jet
lag hit them pretty hard. Next thing you know they are asleep and they started
snoring and I guess fairly loudly and everyone on the call could hear. So then
the people on the call spend some time trying to figure out who is the person
snoring, going through all the attendees. Eventually they figure out who it
was and they started yelling trying to wake them up, which they did after
awhile. Needless to say my coworker was very embarrassed about the incident at
the time, but it did make a good story to tell people :)

------
mleonhard
Simple non-AI solution: Require all meeting participants to use push-to-talk.
Support foot-pedals, mouse buttons, phone volume up button, and bluetooth play
button.

For large meetings, organizers can enable a single-talker mode. Holding the
talk button puts you in a queue. Your screen indicates when it's your turn to
talk. This prevents folks from talking over each other. This eliminates echo
by muting the talker's speakers while recording their voice. Also, attendees
see the current talker, not the person whose dog just barked.
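
A rough sketch of the single-talker queue, to show how little logic it needs. The class and method names are hypothetical, and it assumes the client reports button press and release events reliably:

```python
from collections import deque

class TalkQueue:
    """FIFO floor control: at most one participant holds the floor at a time."""

    def __init__(self):
        self._waiting = deque()
        self.current = None

    def press(self, user):
        """User holds the talk button: take the floor if free, else queue up."""
        if self.current is None:
            self.current = user
        elif user != self.current and user not in self._waiting:
            self._waiting.append(user)

    def release(self, user):
        """User lets go: give up the floor, or leave the queue if still waiting."""
        if user == self.current:
            self.current = self._waiting.popleft() if self._waiting else None
        elif user in self._waiting:
            self._waiting.remove(user)

    def speakers_muted(self, user):
        """Mute only the current talker's speakers, so their mic hears no echo."""
        return user == self.current

queue = TalkQueue()
queue.press("ann")   # ann gets the floor immediately
queue.press("bo")    # bo waits
queue.press("cy")    # cy waits behind bo
queue.release("bo")  # bo changes their mind and leaves the queue
queue.release("ann") # floor passes to cy
```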

~~~
dexterdog
Unfortunately push-to-talk is not standard on anything. I've hacked together a
hotkey to do it, but there's always too much lag. Somebody should make a wired
headset with a button and two modes, one is push and hold to talk and the
other is push and hold to mute.

------
miki123211
To all those here who complain about algorithms messing with their audio when
they don't want them to. Use an app called TeamTalk. It lets you disable all
that processing, so it works great for high-quality music transmission etc. I
have no affiliation with them, I have been using it for a few years and I'm
very happy.

------
mwexler
All this work to elim background is great. But we also will need business
oriented group meets which emulate real life: Allow breakouts, 2-4 people in
group "sidebar" to chat, the burble of other convos drift in, providing a
gentle low background hum. The sidebar, unless marked private, would also
contribute to hum of other users in a main group or in sidebars of their own.
Acoustic effects can even allow directionality of the sounds.

Yes, we can do VR worlds that still look like Second Life, but while we are
working on fixing that, we might solve for near term things to improve
interactivity.

Well, and solve the "one person speaks, all must listen, lag for response,
loop" which I find is similar to how morse code discussions work.

------
JoeAltmaier
Everybody makes a stab at this, and very little of it works consistently. I
applaud Google for attacking this head-on! It is a big issue and deserves
attention.

My biggest issue (when I worked in videoconferencing) was echoing, and locking
onto the delay window where echoes could occur. Depending on the distance from
a conference room speaker to all the walls, echoes could occur at one or more
offsets (appear at microphone input with some delay after presenting at the
speaker). And ambient noises could masquerade as echoes. The filters tend to
be IIR filters, and get wound up easily. It was awful.
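
For readers who have not worked on this, "locking onto the delay window" can be illustrated with a brute-force cross-correlation between what was sent to the speaker and what came back at the microphone. This is only a sketch under simplifying assumptions (one dominant echo, no ambient noise, hypothetical function name), not the approach any particular product used; with several wall reflections and noise that looks like an echo, the peak is far less clear-cut.

```python
def estimate_echo_delay(speaker, mic, max_delay):
    """Return the lag (in samples) at which `mic` best matches `speaker`.

    Brute-force cross-correlation over lags 0..max_delay; the lag with the
    highest correlation score is taken as the echo delay.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        score = sum(
            speaker[n] * mic[n + lag]
            for n in range(len(speaker))
            if n + lag < len(mic)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

speaker = [0.0, 1.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
# The microphone hears the speaker signal 3 samples later at half amplitude.
mic = [0.0, 0.0, 0.0] + [0.5 * s for s in speaker[:5]]
delay = estimate_echo_delay(speaker, mic, max_delay=5)
```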

------
PascLeRasc
Microsoft Teams’ noise cancellation has been driving me absolutely crazy the
past few weeks. I live on a busy road, and whenever a car drives past Teams
will reduce the volume of anyone else talking in a meeting. Even if I’m muted
- Teams gives me the “your microphone is muted” notification. And I use
closed-back headphones for meetings so I don’t even hear the outside noise. So
this results in my having to constantly have the Windows volume slider open,
one earpiece off, and listen for a car coming so I can raise the volume in
advance. Is anyone else dealing with this?

~~~
Drew_
Try Control Panel > Sound > Communications > Do Nothing

------
arielserafini
I'd say this is more like a demo. From the "how it works" in the title I was
expecting to see some implementation details.

Edit: I had only watched the video. The article does indeed contain a lot more
detail.

~~~
notatoad
from the "venturebeat.com" domain though, this is about what i would have
expected.

~~~
stefan_
This is not the time to be snarky, I think we should congratulate the Google
Meet team on placing this great article, not to mention the obnoxious
integrations into other beloved Google products.

Can't wait to see what the Google _Duo_ team will come up with in response. I
mean, we saw the blog post on their great new video codec (AOMedia Video 1 was
it?) but I personally felt it left much to be desired.

What happened to the Hangout guys? Are they still in this one? Product middle
management wants to be wooed.

~~~
Orphis
Duo is targeted to the general public and has E2E encryption though, so cloud
denoising is not a possibility.

------
The_Amp_Walrus
Anyone know what kind of system they're using to do this? Any papers?

I messed around for a few months with speech enhancement last year and didn't
really get anywhere beyond sort-of-reproducing a few existing models:
[https://github.com/MattSegal/speech-enhancement](https://github.com/MattSegal/speech-enhancement)

All the published "state of the art" examples I could find were pretty crap,
whereas Krisp AI were doing much better than what I've seen released publicly.

~~~
W0lf
At my last gig I worked for a startup that did speech-in-noise recognition,
and there we used a recurrent neural net approach to separate background
noise from speech. This model was trained on many many hours of audio data as
well.

------
ruffrey
Serious question - what's the risk that someone with a high pitched, outside-
the-norm voice will get denoised? If it filters out kids in the background,
will kids no longer be able to use google meet?

~~~
bradstewart
I haven't been able to confirm this, but I swear it happens to my mom on Zoom.
When I video chat my family on Zoom and she isn't sitting directly in front of
the laptop, her words rarely come through. I can see her lips moving, I can
hear my dad grunting in agreement next to her--but she's silent. If they
switch places, I can hear my dad without issue.

I don't know if it's a combination of her cheap hardware or what, but it's...
odd.

EDIT: grammar

------
zoom6628
I would really like to use a Google Noise for meeting cancellation. Excessive
use of meetings now "because we can" instead of thinking longer and harder and
asking a well structured question.

Meetings should be for review/discussion and decision making not vocal
exercise and grandstanding.

Would also be great if meeting providers would have a dial to show current
latency for all participants to make it easier to interject.

Lastly I do recommend using meeting tools that have features like letting you
vote, raise a hand, chat all in a sync to main voice. Will make life easier
for meeting moderators..... And if you don't use moderators then start the
practice of doing it - quality of meetings will improve hugely.

------
dekhn
What I'd really like to see is effective source separation and nulling. For
example, if you could mute the screaming baby in the background of a VC
speaker (this has been a fairly common occurrence now that we are WFH and it's
hard to get day care).

~~~
cyrux004
I was really looking for the baby noise test in the demo, but I guess for now
it's human vs non-human cancellation?

Edit: apparently can also remove kids crying; just not included in demo

------
pier25
As someone with 2 dogs this is going to be a good reason to switch to Meet
whenever possible.

------
newfeatureok
None of this fancy technology is necessary IMO.

Just implement push to talk with mute-by-default. 90% of the audio issues
would be resolved. Another 5% could be solved by buying everyone a decent
headset which hopefully has a push-to-talk button on it as well.

~~~
buttersbrian
In theory yes. But what about users that are mobile on a call and even when
they "push" to talk, ambient noise from the metro, crowd, or traffic is
present enough to be troubling?

You don't always get to choose if background noise is present.

Also, you just asked that people push a button, and wear a headset. That is a
lot, and this is about lowering the bar needed to get a good experience.

------
GhostVII
I wonder how much benefit you would get from targeting specific
microphone/speaker setups for noise cancelling rather than treating everything
the same. I would imagine that the noise cancellation requirements are far
different for someone video conferencing over a laptop mic and speaker versus
a good pair of Bose headphones. If you could specify what type of device you
are using it could tune the noise cancellation accordingly - if I am using a
good pair of headphones, I don't need echo cancellation, but I still need to
filter out some amount of background noise.

------
monkey26
Funny timing. Just got off my first Google Meet call an hour ago and was
thinking they need to add noise cancellation. It was awful.

~~~
the_af
I feel you! Did you also have your kid shouting in the background? If they
found a way to specifically mute kids crying or shouting it would be a huge
deal :P

------
jpalomaki
Might be interesting to train the model using user’s own voice. Maybe this
would help filtering out co-workers in open office or family members.

Maybe you could also use this personal model to hide very short network
interruptions. Other party could use this model to constantly predict my next
piece of audio and switch to prediction in case packet is lost.
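
The "switch to prediction when a packet is lost" part can be sketched without any learned model. The example below swaps in a much cruder technique, repeating the last frame with a fade, purely to show where a personal voice model would plug in. The function name, the `None` marker for a lost packet, and the fade factor are all assumptions.

```python
def conceal(frames, frame_len, fade=0.5):
    """Replace lost frames (None) with a faded copy of the previous output.

    A per-speaker predictive model, as suggested above, would replace the
    simple repetition on the marked line; this stand-in only repeats and fades.
    """
    out = []
    last = [0.0] * frame_len  # silence until the first frame arrives
    for frame in frames:
        if frame is None:
            # Lost packet: "predict" by fading the last frame we played.
            frame = [s * fade for s in last]
        out.append(frame)
        last = frame
    return out

received = [[1.0, -1.0], None, None, [0.5, 0.5]]
played = conceal(received, frame_len=2)
```

Consecutive losses fade toward silence instead of looping the same frame forever, which is about the best a model-free fallback can do for gaps longer than a frame or two.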

------
imroot
One of the things I picked up from HN a few months ago was bettering my remote
setup -- I picked up a HDMI capture card for my mirrorless camera, and bought
a few lights to brighten up my office, and then I purchased a cardioid
microphone and a pop filter.

The difference is night and day based on some of the recordings I've heard.

------
The_Amp_Walrus
Anyone know what kind of system they're using to do this? Any papers?

I messed around a little with speech enhancement last year and didn't really
get anywhere:
[https://github.com/MattSegal/speech-enhancement](https://github.com/MattSegal/speech-enhancement)

------
kemayo
> Google also made a conscious decision to put the machine learning model in
> the cloud, which wasn’t the immediately obvious choice.

Oh good. Meet is already a _huge_ battery-hog on my laptop, so adding fancy
signal processing client-side was worrying me.

------
pkaye
I need to get hearing aids soon and heard about all the advances and
limitations. Particularly the problems with noise cancellation. I hope this
kind of technology trickles into hearing aids also.

------
xchaotic
I am still waiting until AI reaches the ultimate in noise cancelling- that
meeting could have been an email. AI will automatically send meeting
cancellation and most likely meeting notes.

------
taeric
Wouldn't this be a bit more trivial with multiple microphones?

------
ComplexSpidey
Journey of a datascience feature - Start to End (still Work in Progress)

Key Takeaway - It's fine to be not 100% accurate, roll it out and learn.

tl;dr

- Approval from Execs

- Data -> Learning -> Training -> Variability -> Training -> Tuning

- Privacy matters (for everyone, digitally educated or not)

- What & Whys of UX -- ultimately what the user says

- Definitely Cloud -- It's the 21st Century

- Optimised for Speed, Cost (a bit irrelevant if I am Google ;) )

- Release (with presentation) -- Timing matters

- Feedback (with permission)

A summary of the "Denoiser" (not "Noise cancellation"; I don't want to get
ranted at by Data Science folks) feature of Google Meet, as told by the PM.
Applies to any such feature.

------
neximo64
Any battery life tests of this tech on phones?

~~~
kccqzy
None. Because the processing doesn't happen on a phone.

> When you’re on a Google Meet call, your voice is sent from your device to a
> Google datacenter, where it goes through the machine learning model on the
> TPU, gets reencrypted, and is then sent back to the meeting.

~~~
neximo64
Something's obviously going wrong. Google meets runs the iPhone vs a standard
call or Zoom.

Obviously theoretically there is basis to what you're saying but, end-to-end
somewhere it's problematic.

~~~
Thorrez
>Google meets runs the iPhone vs a standard call or Zoom.

What does this mean?

------
Vaslo
Or you could just mute your damn phones

------
jokoon
Weirdly, it seems the simplest phones already solved those problems a long time
ago.

Seems like over-engineering. The issue is either with the microphone, with the
hi-def stuff or something else.

No normal phone ever had an inch of a problem, so I'm really confused why
computers have this issue.

~~~
brokenmachine
It's because people find wearing a headset or holding a microphone too
onerous.

Better for them to just point their laptop microphone at the whole room and
let the poor saps at the other end suffer.

~~~
deathanatos
Ridiculously, I learned the other day through trial-and-error that the mic on
my headset isn't actually getting used; rather, the one on the laptop is. The
device is Bluetooth, and Bluetooth has two modes of operation: A2DP, which is
output-only (no mic), and the default. But the laptop mic gets used. The other
mode, HSP/HFP, can use the mic, but the audio quality drops _tremendously_.
Like, to garbage levels. My best research said this was because BT lacks the
bandwidth to carry both mic input & sound output simultaneously.

The headset at least still has the benefit of isolating the output from the
conference to my ears, s.t. the laptop mic picks it up, but it would be nice
to not be tethered to the laptop and be able to pace the room.

~~~
brokenmachine
Bluetooth handsfree is not great quality but seems to work semi acceptably for
phone calls in my car. I guess with multiple parties on a video call it might
be worse?

Especially if the other end was not on a headset.

------
m0zg
> How Google Meet's noise cancellation works

Very poorly. Of all the available alternatives (Zoom, Skype, FaceTime), Google
Meet seems to have the worst audio _and_ video quality. This is inexplicable
for a company very easily capable of technological and product leadership in
both of those things.

~~~
thebeefytaco
Did you even read the article? This is talking about a new noise cancelation
feature being introduced today.

~~~
m0zg
Quite obviously not, just like almost everybody else in the comments.

Shouldn't the title be "How the _new_ Google Meet noise cancellation works"
then?

------
trboyden
Not very well. Watched a Google Meet meeting for the neighbor's Honor Society
induction and the quality was horrible. Video kept freezing and audio cut in
and out. Was probably only about a dozen attendees in the meeting room. Wasn't
the neighbor's connection either, they have a solid Fios 200/200 service.

~~~
thebeefytaco
Did you even read the article? This is talking about a new noise cancelation
feature being introduced today.

------
david_draco
It would be fun if it canceled out screaming.

We somehow have this sexist social expectation that women who show their
feelings (crying, screaming) are "hysterical" (really a nasty word) and not
taken seriously. If so, men screaming should be equally considered a sign of
immaturity and lack of self-control.

Also could help with customers ("Sorry, I can't hear you!").

