How Google Meet's noise cancellation works (venturebeat.com)
390 points by theanirudh 61 days ago | 191 comments



Most people have no idea of the amount of incredibly advanced signal processing that goes into echo cancellation and noise cancellation in videoconferencing.

This post is on noise cancellation specifically, and it actually has the potential to be a huge step forward.

One of the big audio problems with group meetings is that the background noise from each participant adds up, to a point where it quickly becomes unbearable. For that reason, videoconferencing generally only plays audio from one or two participants at most, using a fairly simple estimation of whichever audio signal is currently loudest. The problem is that this can make it really hard to interrupt (people will literally not hear you), or tell the difference between two people going "mm-hmm" versus the whole group. If you've ever been in a group meeting where everybody applauds something, this is why you see everyone applauding but only hear a smattering.
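
For illustration, here is a minimal sketch of the "loudest stream wins" selection described above (not Meet's actual algorithm; the frame size, the RMS energy measure, and int16 PCM are all assumptions):

    import numpy as np

    def pick_active_speakers(frames, n_active=2):
        # Toy "loudest speaker wins" selection (illustrative only).
        # frames: dict of participant id -> 1-D int16 numpy array holding the
        # current ~20 ms audio frame. Returns the ids whose frames get mixed;
        # everyone else is dropped for this frame, which is why quiet
        # interjections and group applause mostly vanish.
        energy = {pid: np.sqrt(np.mean(x.astype(np.float64) ** 2))
                  for pid, x in frames.items()}
        ranked = sorted(energy, key=energy.get, reverse=True)
        return ranked[:n_active]

    def mix(frames, active_ids):
        # Sum only the selected streams, with naive clipping protection.
        mixed = sum(frames[pid].astype(np.float64) for pid in active_ids)
        return np.clip(mixed, -32768, 32767).astype(np.int16)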

But if this noise cancellation really succeeds, it could be a huge leap forward because audio cues and overlap will actually work for the first time -- hearing the "mm-hmms", hearing everyone pipe up, and so on. Videoconferencing will feel more like an actual single shared audio environment, rather than the kind of "walkie-talkie" effect it so often feels like now.

I'm really looking forward to this.


> The problem is that this can make it really hard to interrupt (people will literally not hear you)

This is driving me crazy with Google Meet in these COVID19 times. Even in a relatively small conference, I have a really hard time interrupting someone to ask a quick question, even when the speaker is expecting interruptions. It's always "excuse me!"; delay as person continues speaking; I stop; the other person says "yes, please ask away"; when I restart my question the other person already assumed I've changed my mind and continues speaking; repeat ad infinitum. And this is if they even hear me over the audio breaking up.

It's very, very frustrating. If they solve this it would hugely improve quality of life in remote conferencing for me.


If the speaker did hear you interrupt, then that's actually a latency issue, not a noise/mixing issue.

When a conference call is made up of people all in the same city on decent internet connections, latency is usually not a big issue.

But when a conference call has people from New York, San Francisco, and Japan on it, even if it's only 3 participants, latency can be bad just because of the speed of light, essentially (on top of what is otherwise reasonable hardware/software latency). Latency may be bad even if you're talking with a colleague in the same city, since the audio is "mixed" on the server, and that server might be across the world if a participant from across the world started the meeting. (Counterintuitively, the latency with your local colleague could be twice as bad as with the colleague from across the world.)
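
Back-of-the-envelope numbers (the fiber speed, route overhead, and distance below are my rough assumptions, not measured values) show how big the speed-of-light share alone can be:

    # Light in fiber travels at roughly 2/3 c (~200,000 km/s); assume real
    # routes are ~1.3x the great-circle distance. Rough assumptions only.
    C_FIBER_KM_S = 200_000
    ROUTE_FACTOR = 1.3

    def one_way_ms(great_circle_km):
        return great_circle_km * ROUTE_FACTOR / C_FIBER_KM_S * 1000

    # New York <-> Tokyo is ~10,850 km great-circle.
    print(f"NY -> Tokyo propagation alone: {one_way_ms(10_850):.0f} ms one way")

    # Two colleagues in New York whose audio is mixed on a Tokyo server each
    # pay the NY <-> Tokyo leg twice (up and back), so "local" latency can be
    # roughly double the latency to the far-away participant.
    print(f"NY -> Tokyo server -> NY: {2 * one_way_ms(10_850):.0f} ms")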


It's not just connection latency, it's decision making latency. As explained in the parent comment, all non-speaking mics are muted. If you want to interrupt, you have to make noise for long enough that it thinks you're talking now, before it will open up your mic. Otherwise it will think it's a cough or innocent tap on the desk.


You're probably right! Though I've just experienced this issue with three participants, all in the same city (not in the US though). It's really annoying.


This is also a problem of people not understanding that an audio conference isn't just a regular conference with headphones added.

There are a few things that most meetings could benefit from:

- Having an organizer who's aware of the differences between leading an in-person meeting and a remote meeting.

- Cutting video to save bandwidth if the meeting doesn't absolutely need it (the organizer can usually just disable the function).

- Muting when you're not speaking (by far the best quality-of-life improvement; can be done silently by the organizer if someone is just doing their Vader impersonation throughout the meeting).

- Using the "raise hand" function (again, the organizer plays a huge role here).

- Using the native app instead of the web one, which usually provides better quality and performance.

- Using a wired connection instead of wireless if possible.

- Sometimes even starting meetings at non-standard hours (like 15 to/past the hour) to avoid the rush of people logging in at the same time.

Etc.


I’ve experienced some pretty horrific latency in Google Meet that seemed to originate from my local device, where only my connection would suffer from high latency.

Typical restart-all-the-things usually made it go away. But it wasn’t unusual for 500ms of latency to slowly build up during a 30min call. Unfortunately I have nothing more useful to add, the issue resolved itself before I could track down a definitive cause.


Bluetooth.

Stop using it.

Without even looking at your setup, I would bet $100 minimum that it's Bluetooth latency. It adds a lot of latency, 500ms is not unusual, and many folks have no idea that all that latency is really just the last 18 inches. This is why you're seeing more and more cases where people are using good old iPhone wired earphones for conference calling, especially when Skyping into a TV interview.


> It adds a lot of latency, 500ms is not unusual…

How are you measuring? Half that is considered high-end from what I'm reading[1], and AirPods Pro apparently reduce that to 150ms.

[1] https://stephencoyle.net/airpods-pro


150ms is terrible. That's more time than going from London to Los Angeles and back.

Bluetooth definitely is the issue with 95% of these latency issues.


Also, video makes it 10-70 ms worse due to frame-buffering and significantly worse codec latency. If you're within 2-3h by car, try ethernet cables and Mumble/"Jitsi Meet", the former at a low-latency star point (locally, if you have to) and the latter in P2P mode. 30ms end2end is possible, but 50ms end2end is quite realistic if you have good internet routing. Inside a city even 10ms could be possible, but that'd need some ultra-low-latency software, no Coax/DSL (Ethernet-based FTTH should do), and kinda-exotic low-latency soundcards, because Opus alone eats 5ms one-way in the restricted-CELT-only low-latency mode. This is basically the delay of sharing a round meeting table for ~20 persons.
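
To make that budget concrete, here is the rough arithmetic (only the Opus figure comes from the comment above; the other per-component numbers are illustrative assumptions of mine):

    # Illustrative one-way, mouth-to-ear latency budget for an in-city setup.
    budget_ms = {
        "Opus codec delay (restricted CELT mode)": 5,   # figure from the comment
        "sound card / driver buffers": 10,              # assumption
        "in-city network (ethernet + FTTH)": 15,        # assumption
        "receive-side jitter buffer": 20,               # assumption
    }
    print(sum(budget_ms.values()), "ms mouth-to-ear")   # ~50 ms, matching the estimate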


150ms * 2 in many cases, as both parties use a bluetooth headset. Wifi adds another 5-50ms * 2 plus some packet drops.


Oh my god, I had no idea airpods were so bad.

Crazy that airpods have that much latency. I have bluetooth in the car and it has way too much latency - I thought Apple had improved it to not be noticeable but I never checked the numbers.

Incidentally, for playing midi instruments you generally want things to be below 8ms to feel natural. 150ms is an eternity.

Is the lipsync ok if you watch a video on an iphone with airpods?


> Is the lipsync ok if you watch a video on an iphone with AirPods?

For video they compensate for the latency by displaying the visuals slightly delayed (this has been done for over a decade even way back on feature phones).

Even game consoles have the option to do this since some TVs/receivers have audio or video latency.


Shure PGX-D is a prosumer / church-basement level of digital wireless microphone system, and that's 3.5ms.


I wish it was that simple. Unfortunately I was seeing this even when using the laptop's built-in speakers, not just with Bluetooth headphones.


You can check your WiFi and try with an ethernet cable. WiFi has a tendency to add unpredictable latency.


This applies to everyone on the call


> Even in a relatively small conference, I have a really hard time interrupting someone to ask a quick question, even when the speaker is expecting interruptions. It's always "excuse me!"; delay as person continues speaking; I stop; the other person says "yes, please ask away"; when I restart my question the other person already assumed I've changed my mind and continues speaking; repeat ad infinitum.

One way to solve this is to have the speaker name the person, and then wait until that person speaks. For example, if someone interrupts:

    Speaker: [Talks]
    Person A: Excuse me!
    Speaker: Yes, Mr. A? [waits]
    Person A: What about X?
Or if there are two people talking at the same time:

    Speaker: [Talks]
    Person A and B: Excuse me!
    Speaker: Yes, Mr. A? Mr. B, I'll come back to you after A. [waits]
    Person A: What about X?
    Speaker: [Talk about X]. Mr B, you were saying?
    Person B: What about Y?
Treat it as a synchronization problem, with the speaker breaking the ties. As long as it's obvious to everyone whose turn it is to speak, it works well (assuming people aren't too rowdy/impolite).


Yeah, I wish there was a simple non-verbal option to signal intent-to-talk.

I want to just be able to hit my self-view and have a big icon appear on it, or something, so the person currently speaking (and everyone else) can see that I want to say something. Maybe sort these in chronological order so the speaker can see who wanted to talk first?

In theory you could do this with a good chat, but for some reason the chat in Zoom and the others is kind of an afterthought and nobody uses it.

One of the reasons I prefer text based chat is multiple people can talk at the same time without needing to deal with interrupting audio. If you can type well, the bandwidth is higher for group communication (and you get a log).

At least with video you can kind of tell when someone is waiting to speak by seeing their expression. Audio only is worse (but maybe wouldn't be, if you had good intent-to-speak tools built into the app?)


I really like how Jitsi Meet puts the hand raising/lowering button right on the bottom bar, where there's just empty black/white space in Zoom/Google Meet, and not buried inside a menu labeled "Participants" (???), where it's a hassle to access (Zoom).


Microsoft Teams has a "raise your hand" function that's pretty handy.


>Yeah, I wish there was a simple non-verbal option to signal intent-to-talk.

This is a solved problem. WebEx has had a hand-raise feature for a decade now. Trivial for Google to just copy.


>At least with video you can kind of tell when someone is waiting to speak by seeing their expression. Audio only is worse (but maybe wouldn't be, if you had good intent-to-speak tools built into the app?)

Which is one good reason to use video. At least with smaller meetings, someone can raise their hand or just look really pained. (Bigger meetings, you probably need to use chat.)


We rolled out https://chrome.google.com/webstore/detail/nod-reactions-for-... to all of our devices. The quick 'raise hand' button is great for what you're looking for.


I meet regularly with 2 people in the same room in one city, and me and one other person in the same room in another city (4 people, 2 locations). Google Meet is unfortunately entirely unusable for this purpose. It will select one person to "listen" to and tune out the person with the higher voice (that's me!). We have switched to Zoom for these meetings, as the only way to overcome this in Meet is to physically arrange for the higher-voiced people to sit immediately adjacent to a microphone -- while Zoom just works, most of the time.


The biggest reason for this is people not using headsets. If someone is just using their laptop speakers and mic, Meet will prioritize the mic if they're talking and will duck any audio that comes through the speakers.


So much this. I'm almost at the point of stating that echo cancellation has done more harm than good, because we are now in a situation where 80% of people have no idea that wearing earbuds could make a tremendous difference in call quality, and everyone just expects the software to magically take care of it.

Sadly, the software does not just magically take care of it. Anytime two people talk, a typical echo canceler just starts decimating frequencies until both of them are unintelligible.

Add in a couple of clueless teams who mount a camera/mic against a conference room wall and introduce massive amounts of room echo into the mix, and I'm at the point where a conference call becomes an absolutely mentally exhausting experience just trying to decipher what is being said. I have no hope of contributing, because I can only hear 2/3 of the syllables, and my brain is running on overdrive trying to turn those back into words. By the time I've figured out what they just said, they're half-way into the next sentence. What a stressful hellscape.

Ironically, if we had no echo cancellation, it would force everyone to use ear buds, and the average call quality would be a lot better.
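
For anyone curious what the canceller is actually doing underneath: below is the textbook adaptive-filter version of acoustic echo cancellation, an NLMS sketch of my own, not what any particular product ships. Double-talk (both ends speaking at once) is exactly where this simple version falls apart, which is why products layer blunt suppression on top:

    import numpy as np

    def nlms_echo_cancel(far_end, mic, taps=256, mu=0.5, eps=1e-8):
        # far_end: samples played out of the loudspeaker (the remote party).
        # mic:     microphone capture = local speech + room echo of far_end.
        # Returns the mic signal with the estimated echo subtracted.
        w = np.zeros(taps)            # adaptive estimate of the echo path
        x_buf = np.zeros(taps)        # most recent far-end samples
        out = np.zeros(len(mic))
        for n in range(len(mic)):
            x_buf = np.roll(x_buf, 1)
            x_buf[0] = far_end[n]
            echo_hat = w @ x_buf      # predicted echo reaching the mic
            e = mic[n] - echo_hat     # residual: local speech + leftover echo
            out[n] = e
            # Normalised LMS update; local speech in `e` disturbs this,
            # hence the need for double-talk detection in real systems.
            w += mu * e * x_buf / (x_buf @ x_buf + eps)
        return out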


100%. I wrote a post about this (among other basic tweaks) a couple of months ago: http://jonpurdy.com/2020/03/how-to-improve-your-zoomskype-te...

I have some screenshots of waveforms showing laptop mic vs headset, and the signal-to-noise ratio with the headset destroys even good noise-cancelling using a laptop mic that's farther away from one's mouth.


I have headsets and have tried pretty much everything. There's always background noise from me. I'm even using the audio cable with my Bluetooth (!) headset, turning Bluetooth off.

I don't know what else I can do.


There can still be background noise, but if you're wearing a headset there is not a feedback loop where noise from the other participants gets looped through your mic and speakers.

Participants often don't realize that they're the culprit when somebody else sounds terrible.


The software could require everyone to do a mic check before joining the meeting. It would record a second of them saying "hello" and play it back to them.
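
Something like this is trivial to prototype; a minimal sketch assuming the third-party Python `sounddevice` package, not a feature of any existing conferencing product:

    import sounddevice as sd   # pip install sounddevice

    FS = 16_000        # sample rate in Hz
    SECONDS = 1.0

    print("Say 'hello'...")
    clip = sd.rec(int(SECONDS * FS), samplerate=FS, channels=1)
    sd.wait()          # block until the recording finishes
    print("This is what everyone else will hear:")
    sd.play(clip, FS)
    sd.wait()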


This would kill the product instantly. People don't like being reminded what their voice sounds like.


I use a Plantronics Legend bluetooth headset, which is pretty good at cutting out background noise. Tested with a phone.

Cheaper bluetooth headsets seem to pick up everything around them. Had that issue with a coworker where the headset was worse than using the internal mic.

The biggest and most annoying issue, though, is consistent Bluetooth disconnect/reconnect problems, even on different macOS machines. Latest firmware and such. Pretty sure it's not 2.4GHz interference.


I've heard that the original Bluetooth standard is pretty terrible for audio, especially for microphones. On Windows PCs at least, old protocols can cause a bad experience:

"Modern high-end Bluetooth headsets support AptX, an audio codec compression scheme that offers better sound quality. But AptX is only enabled if it’s supported on both the transmitter and receiver. When using a Bluetooth headset with a PC, it only works if your PC’s hardware and drivers are compatible." (https://www.howtogeek.com/354321/why-bluetooth-headsets-are-...)

Not sure if this applies to a Mac though.


It doesn't directly fix the disconnect/reconnect issue you're talking about, but I've found ToothFairy[1] super useful to mitigate the bulk of my MacOS bluetooth frustrations.

A few different ways it's come in handy for me:

- The Bluetooth speaker I use for music has a tendency to sporadically sound super hollow. Turns out it has a mic built in, and the voodoo of Mac's bluetooth stack would decide at random when it connected whether it would go into audio-only mode (and use the higher-bandwidth AAC codec) or go into audio+mic mode (splits the available bandwidth between the two and as a result uses a lower bitrate audio codec to compensate for the bandwidth drop). Used ToothFairy to always force that device into audio-only mode.

- After the above discovery, tested doing the same with my actual headset, and leveraging the built-in mic for input. For call audio, it's pretty erratic on whether it'll have any impact at all, and depends heavily on the circumstances of the call itself. Sometimes the audio is massively better, but most of the time the audio is already degraded when it gets to my machine and the bluetooth improvement is moot. That said, makes music in between calls far more pleasant.

- My bluetooth mouse is particularly susceptible to that consistent disconnect/reconnect issue you mentioned. ToothFairy can create a menubar icon for individual devices, which helps to act as a quick sanity check to see if my mouse has disconnected. ToothFairy can also run a shell script on disconnect, which has been handy. At this point I have it trigger a system notification[2] so I'm at least immediately made aware of it, check my idle time[3] in case it was the mouse going into sleep mode from inactivity, then conditionally leverage blueutil[4] to look for the device and reconnect if found (forcefully restarting the bluetooth stack in the process if it has issues). Doesn't fix whatever the root cause is for that consistent disconnect/reconnect issue, but this duct tape re-establishes a connection far more quickly when it happens, making the issue itself significantly less disruptive.

I only stumbled on it via Setapp[5] and tried it on a whim, but it's definitely one of the more handy utility apps I've found and well worth the $5 App Store price for anyone that has similar Bluetooth frustrations with their Mac.

[1] https://c-command.com/toothfairy/

[2] https://code-maven.com/display-notification-from-the-mac-com...

[3] https://stackoverflow.com/a/17966890

[4] https://github.com/toy/blueutil

[5] https://setapp.com/apps/toothfairy


Some folks find that switching from WiFi to Ethernet for your home office can help:

https://www.jefftk.com/p/ethernet-is-worth-it-for-video-call...


I rolled my eyes at first, because the only client on my network was my streaming laptop.

Then I realized I'd moved to an apartment complex. I did a wifi scan, and found over fifty competing SSIDs.

Switched to ethernet, and the improvement was night and day.


It's easy to forget that only one wireless device on a channel can send at any given time, even if they're on different networks. And that every channel overlaps with its neighbors.


100% agree. Latency is a killer to natural conversation so if you want to be your best on a conference call, no Wi-Fi and no Bluetooth.


I always feel this with Zoom. Interestingly it did not seem to be an issue on a recent Discord call.


Discord targets gaming which absolutely prioritizes low-latency. Zoom has a very noticeable amount of latency which makes it really awkward to have multiple people talking at the same time.


Does anyone have more details on Zoom vs. Discord latency? We've been experimenting with Zoom and Discord for online trivia tournaments where if one participant had better latency than another that would give a big advantage. I'm sure that has to happen on any platform, but if there's a bigger variance on one platform vs. the other that would be good to know.


If you read through the other comment threads, you can learn about all kinds of things that add or reduce latency.

No matter which software you use, some people will be at an advantage simply due to their isp/wired connection/wired mic.

This could be a business idea though: Conferencing software which equalizes everyone’s latency.


We're trying karaoke through Zoom tomorrow as a standup gag, wonder how that will work with the delay haha.


My perception with Zoom (based only on use, not actual knowledge of how it works) is that it has two modes: one where it tries to isolate the speaker and auto-mute everybody else, and another where it can't figure out who the speaker is and just lets all audio through. So if everybody on the call is singing together, it should all come through.


Is Zoom picking some other point on an optimization curve, and if so, what's more important to it?

Or is it just worse?


Zoom seems to be optimizing for bandwidth use, and by extension, cost to them. Its typical use case is a shared office internet connection.

Discord users are more likely to have a dedicated fast internet connection, and Discord doesn't seem to care about profitability at the moment.

It's just the difference in designing for a 100/10 connection to yourself vs sharing a 100/100 connection with 20 other people. Zoom reasonably gracefully degrades on choppy/slow connections while Discord becomes straight up unusable.


Ah, this makes sense. I have noticed that Zoom never quite gives up for participants on slow / bursty networks. When something becomes that bad I expect it to be headed towards failure, but Zoom is happy to just sit there at 0.5fps.


It would be nice if there was a 'raise your hand' button which put you in a queue to speak. Even better if it let you take a quick note in case you forget what you wanted to say.


I think part of the problem is that the tooling and the societal norms still need to evolve. The tooling is getting there - Zoom/Teams (I don't know about Meets) have buttons to communicate out-of-band beyond just text chat. We need to have more of that, I imagine eventually we'll have a wide range of ways to express ourselves (and customs/norms to match). Although I don't know if that'll happen before most people stop working from home.


Very true. At the moment the best workaround seems to be Microsoft Teams allowing you to "put your hand up" through a button press so that the speaker can give way to you, but this is far from intuitive/comfortable for most people compared to how they'd normally interact.


Even in on-location meetings I ask people not to interrupt and to raise their hands instead. Otherwise a few socially inept people keep interrupting to ask questions or bring up criticism that the speaker would have addressed if they had been allowed to keep talking.


That sounds like network latency, not an issue with switching audio sources.


Very likely. But for whatever reason, it results in very frustrating meetings. Like someone else mentioned, Discord seems to work better -- even though it's not the same use case, and I use Discord with friends and much later during the day.


Well, you think they're expecting interruptions. ;) Maybe the existing Meet is good for you.


I know they are expecting interruptions because sometimes (as in the particular example that triggered my frustration) they said so at the beginning: "please interrupt me every time you have a question, otherwise I'll feel like I'm speaking into the void and this will be a very boring meeting".

The problem is that Google Meet (or my connection, or whatever technical reason) wasn't up to the task. This has happened enough times that I dread interrupting now. Sadly, one person monologuing is not how face-to-face meetings really work.


I wish everyone would just get a headset. It drives me nuts when people call me on speakerphone, of course they never hear the problem.


Tell them you can't hear them due to noise, tell them to pick up the phone or plug in a headset, etc.


They will probably tell you everyone else hears them, so the problem is on your end.

People just don’t care about quality, they will gladly use the crappiest mic with all the noise all day..


I mean they can tell whatever they want. I organized my work around the fact that I can tell people to go fuck themselves when they are uncooperative in these small but very important ways. Obviously that's not the reality for ~90% of people today.

But people shouldn't take "workspace" abuse from others. Be polite, assertive, constructive. Give them credit for taking the time to care a bit about this problem, offer to debug with them later, try to demonstrate it - switch to speaker, capture what you hear. Alternatively just switch off voice and type. Tell them that the sound is garbage for you so you switched for typing. Try to do quick 1-on-1s instead of the group call, even offer to write a summary after.

This is basically the equivalent of constantly not giving a fuck about how loud one is in an open office. There ought to be proper channels [ha, no pun intended] to address and solve these.


See also https://krisp.ai

Mute your or participants' background noise in any communication app

https://krisp.ai/technology/


Nvidia's approach is far better.


While it works great, not everyone is video conferencing on a computer with a high end RTX GPU.


> videoconferencing generally only plays audio from one or two participants at most

I have noticed this and I hate it. It makes normal conversation absolutely impossible.

Discord, which is an audio first product, is much better than other solutions in this regard and their video conferencing while new has been very enjoyable to use.


Any thoughts on spatialization or panning? I feel like it could help a lot, but also making it a good experience could involve head tracking, since most people are (hopefully) not accustomed to speaking in a different direction than the person they're speaking to.


I used to use panning back when I played WoW "seriously" and did 25 man raids, it makes hectic audio chat sooo much more clear. Some gaming voice apps can dynamically pan voices in 3D to match where the characters are, but I didn't use any of that. Just simply putting Guild Leader center, Tanks ~25% right, Healers ~25% left, and then randomly throwing everyone else somewhere wider in the stereo field. It sounds like an actual group of people rather than a single overlapping mono mess.
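
The underlying trick is just constant-power panning; a minimal sketch of my own for illustration, not whatever the voice app actually did:

    import numpy as np

    def pan(mono, position):
        # position: -1.0 = hard left, 0.0 = centre, +1.0 = hard right.
        # Returns an (N, 2) stereo array with constant total power.
        theta = (position + 1) * np.pi / 4           # map [-1, 1] -> [0, pi/2]
        left, right = np.cos(theta), np.sin(theta)
        return np.stack([mono * left, mono * right], axis=1)

    # e.g. guild leader centred, tanks ~25% right, healers ~25% left:
    # raid_mix = pan(leader, 0.0) + pan(tank, 0.25) + pan(healer, -0.25)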


A podcast I listen to with 4-5 people talking in a room does that with its audio. It sounds like a good idea, but in practice, it drives me nuts and makes me feel like I have plugged ears. I always enable mono audio when I listen to it.

I think if it was dynamic, where turning my head towards the person speaking balanced the audio (like in real life), I would not have a problem with it. A super simple form of virtual reality that would only require a simple head-mounted gyroscope or motion sensor.

Another podcast I listen to has two people with very similar voices, and I sometimes have a hard time figuring out who's speaking, so I welcome any advancements in this space.


Haven’t they messed up positioning, and reduced the volume too much on the further ear side?


I hope the product managers of MS Teams/Zoom/Meet are reading this thread; this is pure gold right here.


How did you set this up?


It was built into whichever voice-chat software we were using, just a simple right-click action. This was a long time ago, so I don't totally remember, probably 2008-2012 or so? Trying to jog my memory with Google and I think it was TeamSpeak and the "3D Sound" feature. I feel like Mumble may also have been able to do this.


That's a really cool idea, thanks for the heads up. Didn't realize it was possible but man would it make a world of difference in meetings.


I dream of the day when our laptops come with an integrated microphone array with automatic beamforming based on head tracking.


I mean, automatic beamforming microphones are actually fairly common now, in laptops. Head tracking is probably a detour if your goal is just to get good clear voice input.


Some HP laptops have this and it’s terrible. Perhaps it would be useful if it selected the right sound 100% of the time but in reality all you get is audio cutting all the time with no explanation.


I wish Tinder came with the ability to beam me up.


>One of the big audio problems with group meetings is that the background noise from each participant adds up, to a point where it quickly becomes unbearable.

Just use proper equipment. A headset is an absolute must. The next step is software that only transmits when someone is talking. Gamers figured this out decades ago. Just look at Mumble, TeamSpeak, Discord. They know.

People without proper headsets in that environment get ignored after a while. Nobody wants to use brainpower to understand what you are saying. Corporate might be harder, but you also get paid for that.


>Most people have no idea of the amount of incredibly advanced signal processing that goes into echo cancellation and noise cancellation in videoconferencing.

We pretty much take echo cancellation for granted at this point. Using something better than your laptop microphone on a call is still a good idea but I'm not sure that wearing headphones/earphones is that big a deal at this point.

You don't need to go back very far to a time when speakerphones, other than very expensive Polycoms and the like, were pretty mediocre and would cut out because of echo.


This will be a game changer for online gaming, too (pun intended, I guess). I don't even like playing games that require headsets and teamwork because the background noise makes my ears physically hurt after long enough.


there are products that do this now with all kinds of apps:

https://www.nvidia.com/en-us/geforce/guides/nvidia-rtx-voice...

I think you can do it not only to your microphone (outgoing audio), but to the other participants in the meeting (incoming audio)


That's exciting. Isn't this difficulty one of the reasons why open-source VOIP clients are rare?


Hasn't the problem been solved decades ago, with car-kits?

It seems that what's old is new again ...


No, as evidenced by the fact that you can almost always hear when a participant calls in from their car. When they start talking, you hear the road noise in the background.

The only thing car-kits seemed to do was add minimum cut-offs before transmitting and make use of directional microphones.


No, not even close. The problem space is still mostly unsolved.


Of all companies poised to solve it, I think Apple could do it if they wanted. They could embed microphones in the laptop frame and integrate it with the camera as a premium feature, since they control the hardware and software stack.

It's more practical than a touch bar, at least.


What’s a car-kit? Just googled but didn’t find anything relevant


Those adapters that let you use your phone through the audio system of your car (adapter does the noise-cancelling).


Ah, I see, never heard of them.


What do you mean by car-kits?


> A musical instrument will probably also get filtered out. “To a pretty large degree, it does,” Lachapelle said. “Especially percussion instruments. Sometimes a guitar can sound very much like a voice — you’re starting to touch the limits there. But if you have music playing in the background, usually it’ll cut it all out.”

This is a big issue with hearing aids. The whole industry is focused on optimizing for voice intelligibility and as a musician you end up doing trial-and-error with the audiologist to turn all that stuff off.

We need more open source hearing aids - I've read of a few but they're not mainstream.


First, I'll say this to everyone: Get hearing aids if you need them. They can change your world.

About music, this is getting much better in hearing aids. I've been from analog thru digital over 15+ years of hearing aids, and my latest (3 months ago) pair from Phonak (no affiliation) is an honest leap forward. It has a built in Music profile that disables all sound optimizations in general, while still attempting to correct the hearing ranges that you have a deficit in. I was on the verge of no longer being able to hear with hearing aids, that has probably been extended by 3-5 years with these new models. At that point I will be approaching cochlear implant level hearing loss. I happily embrace my cyborg future!

On top of Music, I have a Walking profile that attempts to focus on the person that is walking to the left or right of me and can pick which side on the fly. And they make great ear plugs when things are loud.

The Normal program auto-magically selects between 8-ish profiles to pick the best one for the environment. And it has finally got it right. With older models I would daily need to force it into the best mode because it guessed wrong. The latest model I only have to tell what to do once every few weeks.

And to the original topic, noise cancellation, hearing aids bluetooth'ed to the phone/PC for conference calls is hands down the best possible audio experience. Built in noise cancellation, amazing microphones that can be used for your voice portion of the call, tuned to your hearing, with some of the finest sound output possible. Just amazing. These things are so good these days that they are finally being labeled as assistive devices for people without hearing loss. They can give someone with normal range hearing essentially bionic hearing. Tinnitus? They play customized white noise to make the ringing less noticeable. Doesn't help everyone, but it's really nice for me. I hear more ringing when I take my aids out.

Oh, and it does all of this on a device that fits in your ear with a battery the size of a few grains of rice and all in a few milliseconds so your brain sees the mouth move at the same time it actually hears the audio.

Again, get them if you need them.


They have some great features and I agree that people should get them (or upgrade), but they are not optimized for musicians.

I just got a similar model I assume (M90-R) and it's definitely not switching to music mode automatically when I play music. (Maybe it's different for listening.) I just had the audiologist add a music mode that I can switch to manually, but getting acceptable timbre for the instruments I play (accordion, melodica, and piano) is work in progress. Making an expensive instrument sound like cheap trash is disappointing, though of course I can take them out.

Having Bluetooth is nice, particularly for phone calls, but I find the sound quality is unsatisfying for listening to music, so it won't be replacing speakers for me.


I guess the flip side of your advice is, don't damage your hearing with loud noises. It will change your world.


This is also a problem with teaching online piano lessons, sometimes the piano gets filtered out and makes it hard to hear what the student is doing.


Zoom has a "use original audio" toggle that helps a lot with this.


I don't know if a similar option is available for other products, but Zoom has a direct audio option: https://support.zoom.us/hc/en-us/articles/115003279466-Enabl...


I've been using Google Hangouts, doesn't look like this option exists here...


Discord has powerful controls. I haven't messed with them myself, but it looks promising


I've been using https://krisp.ai/ to great effect with Zoom while sitting outside on the laptop with road traffic, birds, etc nearby - My team really had a "wow" moment when i turned it on the first time


Krisp is embedded into Discord (enable beta settings) and the voice chat quality far exceeds any of the "business" focused software I've ever used.

Not to mention the screensharing is infinitely better as well. It's pretty pathetic of the business apps; we went through a day where I was trying to screenshare something and my remote coworkers kept complaining of lag, blurriness, or the app would just crash (Slack). We went through MS Teams, Zoom, Slack, and Google Meet. All had issues. Convinced everyone to install Discord and suddenly I was able to share my desktop perfectly at 1080p without noticeable lag and crystal clear audio.


I will say, using Krisp, it has the same problem that basically all these 'AI' based noise cancelling seem to exhibit: sound quality deteriorates when outside noise is suppressed, and people seem to sometimes not meet the threshold and get completely cut out from talking in some scenarios.

It's still better than food noises, but I have noticed that as a disadvantage.


+1

Discord's lack of lag in audio makes a huge difference for voice comms. I've only used it for gaming, but you can really tell the difference when you switch to the game's voice chat feature, which has probably a third of a second of latency. And of course Zoom et al. have a lot more lag and it really hurts the experience. In addition to low latency, the sound is also very good quality.


Note also that no messages in Discord, including individual messages/DMs, are end-to-end encrypted. This is precisely the same security issue that Zoom has (and Slack, and IRC without OTR).

Discord can and does log all messages through the system, and has many internal tools that operate on the plaintext. Anything you communicate through Discord you should assume any/all Discord staff may read.

They claim that the voice comms are e2e but there are no further details available (like where the keys are generated).


It would be amazing if there was a tool like Krisp that could automatically noise cancel outside noise in your headphones for people who work with audio in loud environments. Not clear if that's at all possible without your headphones having microphones built into them to accurately detect incoming outside signal.


It's not possible. The only reason ANC works is because the microphones are located physically at your ears and so are the speakers/headphone drivers. If they're in some random location you can't inject anti-noise and you can't detect the noise accurately.


How is this different from noise cancelling headphones currently available? Or do you mean something like this to add the feature to non-noise cancelling headphones?


>Not clear if that's at all possible without your headphones having microphones built into them to accurately detect incoming outside signal.

I'm guessing they mean to add the feature set to standard headphones. Leveraging say the laptop microphone to provide active noise canceling to someone with a standard set of earbuds.


Noise cancelling works by inverting (phase-shifting) the sound waves of the noise coming into your ears. The ups and downs (of pressure) in the noise and the anti-noise add together, cancelling the wave altogether. Each ear gets different noise, so the microphones should be as close as possible to each ear and work absolutely independently. That's why the microphone of your laptop is not of any help here: it simply picks up completely different noise, which cannot cancel out the noise getting into your ears. This is more physics than software.
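
A toy demonstration of why the microphone has to be right at the ear (pure illustration; the 200 Hz tone and the 1 ms timing error are arbitrary choices):

    import numpy as np

    fs = 48_000
    t = np.arange(fs) / fs
    noise = np.sin(2 * np.pi * 200 * t)          # a 200 Hz hum arriving at the ear

    perfect_anti = -noise                        # anti-noise measured at the ear
    print(np.max(np.abs(noise + perfect_anti)))  # 0.0 -> silence

    # Anti-noise derived from a mic somewhere else arrives with the wrong
    # delay/phase; here it's just 1 ms late, and the result is LOUDER:
    late_anti = -np.sin(2 * np.pi * 200 * (t - 0.001))
    print(np.max(np.abs(noise + late_anti)))     # ~1.18, worse than no ANC at all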


With two different microphones on the laptop, you could triangulate sources of noise and figure out what will reach your ears. With three or more, even better. This sounds like a difficult and interesting signal processing problem, but I wouldn't rule it out.


It would also have to know where each of your ears is in relation to the microphone with millimeter accuracy.


As far as my experience goes, the single best way to deal with background noise is... the mute button.

In every video conf I've been in, you can instantly tell when "one of them" who can't be bothered to mute themselves joins. The audio quality immediately goes down the drain. It's always the same subset of people who do it, too. As soon as they're enjoined to please mute, the audio quality is restored.

No amount of magic signal processing will ever match it.

While it's perhaps misguided to use it that way, the mute button thus acts as a social-cluelessness meter.


For especially ... special people, I think the only solution is with hardware. They need a headset that has a physical toggle switch. Not a button where the mute state is an internal state of some device, but instead a switch (or knob) with two separate physical positions for mute and unmute. This way they are not faced with the mental challenge of tracking the mute state, but have a physical manifestation of it at their fingertips and can develop muscle memory to switch it on and off only for the duration they speak. It also means they can mute without having to alt-tab back to the window of the call (because of course they were checking their email in the meantime).

Then the policy is that they cannot join unless they use the headset.


Wholeheartedly agree. I often Skype-call with a group of 10 and we all have perfect mute-discipline. It’s just like a regular in person meeting, except there is just audio.


Probably time to add some mute-shaming by having a large red button "X is un-muted" next to such participants in the participants list. To those who are actually speaking, this shouldn't be a bother anyway since they are, well, actually speaking.


"Hold space to talk" is not a bad solve for this, also makes folks ramble less.


Would probably have some major accessibility issues.

Plus I'd hate to be the intern that has to sit and hold the space bar while the boss delivers a presentation


It could be built so that the meeting organizer can exempt certain people and so the system can exempt certain accounts.


I might be unusual, but my experience with videoconferencing has been that ambient noise is rarely a major problem. The big issue is audio cutting out due to a shaky network. When ambient noise is a problem, it's not so much someone typing as their spouse talking in the background or a fire engine going by -- and at that point the solution is for them to hit mute.


Most of the issues I have with ambient noise on call could be solved with people investing in a better headset. It's a significant improvement over using your laptop's built in one.

The issue I find a bigger problem is lag causing people to talk over one another. I've been on a lot of calls where the call quality was fine but conversations were difficult because it was hard to judge when the other person had stopped talking.


All voip seems to have terrible latency and it's so frustrating! Mumble does a really good job with this and has for years, why can't we get that in a mainstream solution?


> at that point the solution is for them to hit mute

From a technical point of view, that is really the best thing. It works, and sometimes it's the only thing that works.

But if you try to get people actually do it, you run into problems:

(1) They don't realize it's them. AFAIK the system doesn't play their audio back to them, so while everyone else hears the noise, they don't. The one person who needs to take action is the one person who doesn't know action is necessary.

(2) They are distracted. When their spouse is talking, they are focused on whatever their spouse is saying, not on how it affects the meeting audio. Or the meeting is boring and they're not paying attention.

(3) They just don't care enough. They are there to attend a meeting, not fiddle with computer stuff. Some people will never take the time to learn where the mute button is in the software.

Perhaps #1 could be improved, though, with some kind of blindingly obvious indicator in the UI. If "YOUR MIC IS WHAT EVERYONE IS HEARING RIGHT NOW" flashes when your mic takes the floor, maybe you'd notice it lighting up when you didn't intend for it to.


AWS Chime has significant drawbacks, but one of the things I most liked about it was that anybody could mute anybody else. The number of calls where that significantly cut down on audio discomfort was surprising.

For those wondering, unmuting is a privileged operation that only the user could do themselves.


Same with Meet, at least if you're using the regular enterprise/GSuite version (probably different for schools and/or the nerfed version you get with a personal account, since kids muting the teacher gets old really quick).


An attentive moderator can address all these issues, but it's frustrating to be just a regular participant who can't mute others.


In my experience kids randomly talking is worse than any constant background noise.


I was not so impressed with this demo - especially when he was scrunching his potato chip packet, the degradation in his voice quality made it almost impossible to understand what he was saying and his voice sounded very synthesized and processed, and that's through a $200 Yeti professional microphone. Seems like some of the other noise cancellation technology options from Nvidia RTX and others are more effective.


Microphone quality has very little to do with it. Noise separation is just an incredibly hard problem, particularly when a noise is loud. Scrunching potato chips, there's no scenario where his voice won't become degraded unless you can isolate the scrunching sound separately (microphone beamforming can help here, but is still never perfect).

Running this economically on servers at scale in realtime, I consider this very impressive. I can't say how it compares with RTX, but I wonder if it has anything to do with the amount of computing resources that can be dedicated to it. A single expensive card dedicated to one audio stream, versus a single Google server that needs to process hundreds (thousands?) of audio streams.


RTX Voice uses less than 10% of a 2060 (non-super)

Given how common voice communication is in our world, I am sure Google can build ASICs for this (if not just run it on TPUs), and get the marginal cost of vocal processing to be negligible.

Heck, they probably would just need to divert <5% of the resources of Fuchsia or any other of their "senior engineer retention" projects.


Does this run client or server side?


However, that wasn't a realistic scenario: you are unlikely to talk while scrunching plastic, and ideally you should get muted 100% when you do it.

The only way this could happen is if someone is standing 3 feet away from you and does it while you talk, which would just be rude and would probably be stopped by you immediately anyway.

I'm more curious how this would work in the metro or with a washing machine nearby.


Yeti microphones pick up everything. If anything it would make the test more difficult.


You are correct. The sound quality of Nvidia RTX is amazing, compared to this.

Unfortunately, you need a $400 graphics card, and >100 watts of power to run RTX...


Unofficially, it's pretty easy to get it to run on non-RTX nvidia GPUs. https://www.pcgamer.com/nvidia-rtx-voice-performance/


Not only that, but even when running RTX Voice on a GTX, the amount of GPU horsepower used is minuscule; it hardly even registers. The fans on my GTX 9-series GPU don't turn on when running RTX Voice.


+1. I'm using it on a 1060 and I've monitored sensors with OpenHardwareMonitor while RTX Voice is running. The only observable effect is that my GPU's clocks are turned up, there isn't even any significant load.


I've been doing online guitar lessons since Covid-19 started and all these algorithms just suck hard for that. Even in a 1:1 call.

Two repeated notes and the noise cancellation just immediately shuts you down... we've been using Zoom and luckily you can turn all the audio processing off if you go in "Advanced" and enable "Turn on original audio".


I'm surprised there's not an option to disable it?


What do you mean? ben7799 said there is an option to disable it.


This would have been useful for my co-worker. They went on a trip to Europe years ago and had a conference call scheduled when they got there.

Unfortunately for them, they decided to lie on the bed in the hotel and the jet lag hit them pretty hard. Next thing you know, they were asleep and started snoring, fairly loudly I guess, and everyone on the call could hear it. So the people on the call spent some time trying to figure out who was snoring, going through all the attendees. Eventually they figured out who it was and started yelling to try to wake them up, which they did after a while. Needless to say my coworker was very embarrassed about the incident at the time, but it did make a good story to tell people :)


Simple non-AI solution: Require all meeting participants to use push-to-talk. Support foot-pedals, mouse buttons, phone volume up button, and bluetooth play button.

For large meetings, organizers can enable a single-talker mode. Holding the talk button puts you in a queue. Your screen indicates when it's your turn to talk. This prevents folks from talking over each other. This eliminates echo by muting the talker's speakers while recording their voice. Also, attendees see the current talker, not the person whose dog just barked.
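
A sketch of that single-talker queue idea (hypothetical names, not any product's protocol):

    from collections import deque

    class FloorControl:
        # Holding the talk button enqueues you; only the participant at the
        # head of the queue is unmuted. Illustrative sketch only.
        def __init__(self):
            self.queue = deque()

        def press_talk(self, participant):
            if participant not in self.queue:
                self.queue.append(participant)

        def release_talk(self, participant):
            if participant in self.queue:
                self.queue.remove(participant)

        def current_talker(self):
            return self.queue[0] if self.queue else None

        def is_unmuted(self, participant):
            return participant == self.current_talker()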


Unfortunately push-to-talk is not standard on anything. I've hacked together a hotkey to do it, but there's always too much lag. Somebody should make a wired headset with a button and two modes, one is push and hold to talk and the other is push and hold to mute.


To all those here who complain about algorithms messing with their audio when they don't want them to. Use an app called TeamTalk. It lets you disable all that processing, so it works great for high-quality music transmission etc. I have no affiliation with them, I have been using it for a few years and I'm very happy.


All this work to eliminate background noise is great. But we will also need business-oriented group meets which emulate real life: allow breakouts, 2-4 people in a group "sidebar" to chat, the burble of other convos drifting in, providing a gentle low background hum. The sidebar, unless marked private, would also contribute to the hum for other users in the main group or in sidebars of their own. Acoustic effects could even make the sounds directional.

Yes, we can do VR worlds that still look like Second Life, but while we are working on fixing that, we might solve for near term things to improve interactivity.

Well, and solve the "one person speaks, all must listen, lag for response, loop" which I find is similar to how morse code discussions work.


Everybody makes a stab at this, and very little of it works consistently. I applaud Google for attacking this head-on! It is a big issue and deserves attention.

My biggest issue (when I worked in videoconferencing) was echoing, and locking onto the delay window where echoes could occur. Depending on the distance from a conference room speaker to all the walls, echoes could occur at one or more offsets (appear at microphone input with some delay after presenting at the speaker). And ambient noises could masquerade as echoes. The filters tend to be IIR filters, and get wound up easily. It was awful.
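
The core of locking onto that delay window is estimating at what offset the loudspeaker signal reappears in the mic capture; a minimal cross-correlation sketch (illustrative only; production cancellers track several offsets and adapt continuously):

    import numpy as np

    def estimate_echo_delay(far_end, mic, fs=16_000):
        # Cross-correlate the mic capture against the signal sent to the
        # loudspeaker; the lag with the strongest correlation is a first
        # guess at the echo offset.
        corr = np.correlate(mic, far_end, mode="full")
        lag = int(np.argmax(np.abs(corr))) - (len(far_end) - 1)
        return max(lag, 0) / fs      # delay in seconds (0 if nothing found)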


Microsoft Teams’ noise cancellation has been driving me absolutely crazy the past few weeks. I live on a busy road, and whenever a car drives past Teams will reduce the volume of anyone else talking in a meeting. Even if I’m muted - Teams gives me the “your microphone is muted” notification. And I use closed-back headphones for meetings so I don’t even hear the outside noise. So this results in my having to constantly have the Windows volume slider open, one earpiece off, and listen for a car coming so I can raise the volume in advance. Is anyone else dealing with this?


Try Control Panel > Sound > Communications > Do Nothing


I'd say this is more like a demo. From the "how it works" in the title I was expecting to see some implementation details.

Edit: I had only watched the video. The article does indeed contain a lot more detail.


From the "venturebeat.com" domain, though, this is about what I would have expected.


This is not the time to be snarky, I think we should congratulate the Google Meet team on placing this great article, not to mention the obnoxious integrations into other beloved Google products.

Can't wait to see what the Google Duo team will come up with in response. I mean, we saw the blog post on their great new video codec (AOMedia Video 1 was it?) but I personally felt it left much to be desired.

What happened to the Hangout guys? Are they still in this one? Product middle management wants to be wooed.


Duo is targeted to the general public and has E2E encryption though, so cloud denoising is not a possibility.


Ah, the HN way: keep putting down technical achievements just because they go against a personal agenda.


Anyone know what kind of system they're using to do this? Any papers?

I messed around for a few months with speech enhancement last year and didn't really get anywhere beyond sort-of-reproducing a few existing models: https://github.com/MattSegal/speech-enhancement

All the published "state of the art" examples I could find were pretty crap, whereas Krisp AI were doing much better than what I've seen released publicly.


At my last gig I worked for a startup that did speech-in-noise recognition, and there we used a recurrent neural net approach to separate background noise from speech. The model was trained on many, many hours of audio data as well.
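
For anyone wondering what such a model roughly looks like: one common recurrent recipe is to predict a time-frequency mask over noisy STFT magnitudes. The sketch below is my own illustration of that recipe, not that startup's model or Google's; layer sizes are arbitrary:

    import torch
    import torch.nn as nn

    class MaskEstimator(nn.Module):
        # A GRU looks at noisy STFT magnitudes and predicts a 0..1 mask per
        # time-frequency bin; multiplying the noisy spectrogram by the mask
        # suppresses noise-dominated bins, and the waveform is resynthesised
        # with the noisy phase.
        def __init__(self, n_freq=257, hidden=256):
            super().__init__()
            self.rnn = nn.GRU(n_freq, hidden, num_layers=2, batch_first=True)
            self.out = nn.Linear(hidden, n_freq)

        def forward(self, noisy_mag):             # (batch, frames, n_freq)
            h, _ = self.rnn(noisy_mag)
            return torch.sigmoid(self.out(h))     # mask in [0, 1]

    # Training would minimise e.g. the L1 distance between mask * noisy_mag
    # and the clean magnitude over many hours of (clean speech, noise) mixes:
    # enhanced_mag = MaskEstimator()(noisy_mag) * noisy_mag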


Serious question - what's the risk that someone with a high pitched, outside-the-norm voice will get denoised? If it filters out kids in the background, will kids no longer be able to use google meet?


I haven't been able to confirm this, but I swear it happens to my mom on Zoom. When I video chat my family on Zoom and she isn't sitting directly in front of the laptop, her words rarely come through. I can see her lips moving, I can hear my dad grunting in agreement next to her--but she's silent. If they switch places, I can hear my dad without issue.

I don't know if it's a combination of her cheap hardware or what, but it's... odd.

EDIT: grammar


I think the key here is "in the background". I would assume if you're speaking close to the microphone, with no other voices going on, it will not filter anything.


I would really like to use a Google Noise for meeting cancellation. There's excessive use of meetings now "because we can" instead of thinking longer and harder and asking a well-structured question.

Meetings should be for review/discussion and decision making not vocal exercise and grandstanding.

Would also be great if meeting providers would have a dial to show current latency for all participants to make easier to interject.

Lastly, I do recommend using meeting tools that have features like letting you vote, raise a hand, and chat, all in sync with the main voice channel. It will make life easier for meeting moderators... And if you don't use moderators, then start the practice of doing it - the quality of meetings will improve hugely.


What I'd really like to see is effective source separation and nulling. For example, if you could mute the screaming baby in the background of a VC speaker (this has been a fairly common occurrence now that we are WFH and it's hard to get day care).


I was really looking for the baby noise test in the demo, but I guess for now it's human vs. non-human cancellation?

Edit: apparently it can also remove kids crying; it's just not included in the demo.


As someone with 2 dogs this is going to be a good reason to switch to Meet whenever possible.


None of this fancy technology is necessary IMO.

Just implement push to talk with mute-by-default. 90% of the audio issues would be resolved. Another 5% could be solved by buying everyone a decent headset which hopefully has a push-to-talk button on it as well.


In theory yes. But what about users that are mobile on a call and even when they "push" to talk, ambient noise from the metro, crowd, or traffic is present enough to be troubling?

You don't always get to choose if background noise is present.

Also, you just asked that people push a button, and wear a headset. That is a lot, and this is about lowering the bar needed to get a good experience.


I wonder how much benefit you would get from targeting specific microphone/speaker setups for noise cancelling rather than treating everything the same. I would imagine that the noise cancellation requirements are far different for someone video conferencing over a laptop mic and speaker versus a good pair of Bose headphones. If you could specify what type of device you are using it could tune the noise cancellation accordingly - if I am using a good pair of headphones, I don't need echo cancellation, but I still need to filter out some amount of background noise.


Funny timing. Just got off my first Google Meet call an hour ago and was thinking they need to add noise cancellation. It was awful.


I feel you! Did you also have your kid shouting in the background? If they found a way to specifically mute kids crying or shouting it would be a huge deal :P


We've been holding 25+ participant Teams sessions, and the only rules are that only one person can speak at a time and no one unmutes unless they're speaking. The noise cancellation might as well not even exist.

Comparatively, I was impressed that we could even have a Meet without everyone needing to be on mute.


Might be interesting to train the model using the user's own voice. Maybe this would help filter out co-workers in an open office or family members.

Maybe you could also use this personal model to hide very short network interruptions. The other party could use it to constantly predict my next piece of audio and switch to the prediction in case a packet is lost.


One of the things I picked up from HN a few months ago was bettering my remote setup -- I picked up an HDMI capture card for my mirrorless camera, bought a few lights to brighten up my office, and then purchased a cardioid microphone and a pop filter.

The difference is night and day based on some of the recordings I've heard.




> Google also made a conscious decision to put the machine learning model in the cloud, which wasn’t the immediately obvious choice.

Oh good. Meet is already a huge battery-hog on my laptop, so adding fancy signal processing client-side was worrying me.


I need to get hearing aids soon and heard about all the advances and limitations. Particularly the problems with noise cancellation. I hope this kind of technology trickles into hearing aids also.


I am still waiting until AI reaches the ultimate in noise cancelling: detecting that the meeting could have been an email. The AI will automatically send the meeting cancellation and most likely the meeting notes.


Wouldn't this be a bit more trivial with multiple microphones?


Journey of a data science feature - start to end (still a work in progress)

Key takeaway - it's fine to not be 100% accurate; roll it out and learn.

tl;dr

- Approval from Execs

- Data -> Learning -> Training -> Variability -> Training -> Tuning

- Privacy matters (for everyone, digitally educated or not)

- What & whys of UX -- ultimately it's what the user says

- Definitely Cloud -- it's the 21st century

- Optimised for Speed, Cost (a bit irrelevant if I am Google ;) )

- Release (with presentation) -- Timing matters

- Feedback (with permission)

A summary of the "denoiser" (not "noise cancellation" -- don't want to get ranted at by data science folks) feature of Google Meet, from the PM's perspective. Applies to any such feature.


Any battery life tests of this tech on phones?


None. Because the processing doesn't happen on a phone.

> When you’re on a Google Meet call, your voice is sent from your device to a Google datacenter, where it goes through the machine learning model on the TPU, gets reencrypted, and is then sent back to the meeting.


Something's obviously going wrong. Google meets runs the iPhone vs a standard call or Zoom.

Obviously theoretically there is basis to what you're saying but, end-to-end somewhere its problematic.


>Google meets runs the iPhone vs a standard call or Zoom.

What does this mean?


Or you could just mute your damn phones


Weirdly, it seems the simplest phones already solved this problem a long time ago.

Seems like over-engineering. The issue is either with the microphone, with the hi-def stuff or something else.

Every normal phone never had a hint of a problem, so I'm really confused about why computers have this issue.


It's because people find wearing a headset or holding a microphone too onerous.

Better for them to just point their laptop microphone at the whole room and let the poor saps at the other end suffer.


Ridiculously, I learned the other day through trial and error that the mic on my headset isn't actually getting used; rather, the one on the laptop is. The device is Bluetooth, and Bluetooth has two modes of operation: A2DP, which is output-only (no mic) and is the default, so the laptop mic gets used. The other mode, HSP/HFP, can use the headset mic, but the audio quality drops tremendously. Like, to garbage levels. My best research said this was because BT lacks the bandwidth to carry both mic input & sound output simultaneously.

The headset at least still has the benefit of isolating the conference output to my ears, so that the laptop mic doesn't pick it up, but it would be nice to not be tethered to the laptop and be able to pace the room.


Bluetooth handsfree is not great quality but seems to work semi acceptably for phone calls in my car. I guess with multiple parties on a video call it might be worse?

Especially if the other end was not on a headset.


It's crazy how higher technology produces worse results.


> How Google Meet's noise cancellation works

Very poorly. Of all the available alternatives (Zoom, Skype, FaceTime), Google Meet seems to have the worst audio _and_ video quality. This is inexplicable for a company very easily capable of technological and product leadership in both of those things.


Did you even read the article? This is talking about a new noise cancelation feature being introduced today.


Quite obviously not, just like almost everybody else in the comments.

Shouldn't the title be "How the _new_ Google Meet noise cancellation works" then?


Not very well. Watched a Google Meet meeting for the neighbor's Honor Society induction and the quality was horrible. Video kept freezing and audio cut in and out. Was probably only about a dozen attendees in the meeting room. Wasn't the neighbor's connection either, they have a solid Fios 200/200 service.


Did you even read the article? This is talking about a new noise cancelation feature being introduced today.


It would be fun if it canceled out screaming.

We somehow have this sexist social expectation that women who show their feelings (crying, screaming) are "hysterical" (really a nasty word) and not taken seriously. If so, men screaming should be equally considered a sign of immaturity and lack of self-control.

Also could help with customers ("Sorry, I can't hear you!").




