Looking to Listen: Audio-Visual Speech Separation (googleblog.com)
86 points by chriskanan 5 months ago | 7 comments

This would be a huge improvement for hearing aids, particularly for people who can't hear in stereo. It might need an eye tracker for aiming.

Exactly! The cocktail party effect requires two functional ears, and I'm sure this could help people with only one functional ear. [0]

> The cocktail party effect works best as a binaural effect, which requires hearing with both ears. People with only one functioning ear seem much more distracted by interfering noise than people with two typical ears.

[0]: https://en.wikipedia.org/wiki/Cocktail_party_effect

This method has improvements (better quality than audio-only separation, speaker assignment, and better noise handling), but you can do pretty well with just mixed audio: https://www.youtube.com/watch?v=vW51cG1Ox98

I wonder if we will have visually assisted TTS. Humans do it: https://youtu.be/G-lN8vWm3m0?t=74

The McGurk illusion is so strong that I'm sure visual cues have a major role in error correction or voice recognition assistance for humans.

With that particular example, I noticed the issue immediately and heard "bah" the whole time. I wonder how much people's response to that illusion varies.

I wonder how a blind person experiences the cocktail party effect. If a blind person can do it, maybe this separation can be done without visual input?

This is pretty cool. It will be interesting to see it used to clean up older audio sources.
