
Deep Learning enables hearing aid wearers to pick out a voice in a crowded room - WheelsAtLarge
http://spectrum.ieee.org/consumer-electronics/audiovideo/deep-learning-reinvents-the-hearing-aid
======
tobtoh
> The greatest frustration among potential users is that a hearing aid cannot
> distinguish between, for example, a voice and the sound of a passing car if
> those sounds occur at the same time. The device cranks up the volume on
> both, creating an incoherent din.

It may be a simplification of the article that I'm misinterpreting, but as
someone who got a hearing aid in early 2016, that's not how (modern) hearing
aids work.

I got my hearing tested, which enabled a frequency response of my hearing loss
to be plotted (my hearing at low frequencies is fine; at higher frequencies I
have moderate loss). My hearing aid was then tuned to match the inverse of
that frequency plot (i.e. boost the volume of the high frequencies, leave the
low frequencies alone).
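
(For the curious, a toy Python sketch of that idea using the classic
"half-gain" rule of thumb - the audiogram numbers below are made up, and real
prescription formulas like NAL-NL2 are far more involved.)

    import numpy as np

    # Hypothetical audiogram: hearing loss in dB at standard test frequencies.
    freqs_hz = np.array([250, 500, 1000, 2000, 4000, 8000])
    loss_db  = np.array([  5,   5,   10,   35,   50,   55])

    # Half-gain rule of thumb: prescribe roughly half the loss as gain,
    # so the high frequencies get boosted and the lows are left alone.
    gain_db = loss_db / 2.0

    for f, g in zip(freqs_hz, gain_db):
        print(f"{f:>5} Hz: +{g:4.1f} dB gain")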

You don't actually want a HA that arbitrarily boosts 'speech', since that
won't be matched to your needs and has unintended side effects (like music
sounding overly harsh/bright) because unneeded frequencies are being boosted
or suppressed.

-- On a tangent, after I got my new HAs, I complained to the audiologist that
they didn't sound very good. Everything sounded far too crisp. She pointed out
that having lived with hearing loss for 5-6 years, I actually had almost no
idea what something should sound like since my brain had got used to a world
with muted high frequency sounds.

That blows my mind ... a bit like, how do you know the color green is green?
Maybe it's purple, but you've been told by someone else that it's green.

After a few weeks, my brain re-learnt what sound should sound like and now it
sounds 'normal' with HA in. Without HA, everything is a little more muffled
(as you would expect) and I really notice how much I used to struggle
understanding people (I believe my untreated hearing loss contributed to me
losing my job a couple of years ago).

Hearing aids have changed my quality of life (at age 40).

~~~
chx
I probably need a hearing aid, but I am very reluctant to go for a test,
because the last time I had one (very long ago and very far away from Canada,
I admit) I thought it was incredibly imprecise: they asked me to press a
button when I could hear a sound, and I was absolutely unsure whether I had
really heard something or just imagined it. Is it still so very subjective?

~~~
radarsat1
Ah, I see... an interesting interpretation. It's a shame they didn't explain
the process to you better.

This is how psychometric testing works. It's inherently difficult because in
order to estimate the point of subjective "loss", which we call the "just
noticeable difference" (or JND), one has to sample more in the area of the
variable (amplitude, frequency, etc) that is more difficult for you to
distinguish. Consequently, one will always walk away from such an experiment
with an impression of having "guessed" and being really not sure if you gave
the right answers. But that's because they're trying to estimate exactly that:
they're trying to find the point at which you really aren't sure whether you
hear something or not.

Basically this: if you guessed perfectly every time that you heard something,
that would be a 100% recognition rate. If you always said with 100% certainty
that you _didn't_ hear anything, that would be a 0% recognition rate. So
logically, the point of hearing loss occurs somewhere between those two
extremes.

In order to determine more precisely where, the procedure has to "zoom in" on
the point at which you answer correctly 50% of the time, a bit like a binary
search. (Or sometimes they want 75% of the time, etc.) In any case, to do so,
they need to sample the probability of you answering correctly or incorrectly
in that region. This sketches out a probability curve, and then they can fit
that curve and figure out the 50% or 75% point on the curve.

They'll sample using either a constant spacing method, random sampling, or a
staircase method that adaptively moves towards the 50% point. The latter is
more efficient, in the sense that it requires fewer answers from you, so that
is what is often used in practice. However, by its nature it is also much more
frustrating for the patient, because it will be sampling much more frequently
in the region where you are "not sure" of the answer.
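
For illustration, here's a minimal Python sketch of such an adaptive
staircase (the simulated listener and all the numbers are made up; real
audiometry software is more elaborate). It homes in on the level you detect
about 50% of the time, which is exactly why the trials feel like guessing:

    import random

    def listener_hears(level_db, threshold_db=40.0):
        """Simulated listener: detection probability rises smoothly around a
        true threshold, so responses near it look like guessing."""
        p_detect = 1.0 / (1.0 + 10 ** ((threshold_db - level_db) / 5.0))
        return random.random() < p_detect

    level = 70.0        # start well above threshold (dB, arbitrary scale)
    step = 8.0          # halve the step on each reversal, like binary search
    last_response = None
    reversals = []

    while len(reversals) < 8:
        heard = listener_hears(level)
        if last_response is not None and heard != last_response:
            reversals.append(level)
            step = max(step / 2.0, 1.0)
        last_response = heard
        level += -step if heard else step  # quieter after "yes", louder after "no"

    print("estimated ~50% threshold:", sum(reversals) / len(reversals))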

I'm really sorry they didn't explain this stuff to you, and allowed you to
walk away thinking it was a badly done experiment!

~~~
tobtoh
Thanks for that fantastic explanation! It wasn't explained to me either when I
had my hearing tests - I just assumed they were doing multiple tests to get an
'average' answer.

------
chas
This approach surprised me. Why are they doing feature extraction and then
feeding that into a DNN? It seems much more straightforward to have the input
of the network be noisy samples and the output be clean samples a la super
resolution[0] in images. They probably wouldn't want to use fully-connected
layers in that instance, but I don't see any fundamental barriers if they have
enough computational power to run a neural network already. Am I missing
something?

[0]
[https://arxiv.org/pdf/1603.08155.pdf](https://arxiv.org/pdf/1603.08155.pdf)
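
(For concreteness, a hedged sketch of what that end-to-end approach might
look like: a tiny 1-D convolutional net mapping noisy samples to clean
samples, trained as a regression like image super-resolution. The
architecture and sizes here are purely illustrative, not anything from the
article or [0].)

    import torch
    import torch.nn as nn

    # Toy end-to-end denoiser: noisy waveform in, (hopefully) clean waveform out.
    class TinyDenoiser(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(1, 32, kernel_size=9, padding=4),   # small receptive
                nn.ReLU(),                                    # field keeps the
                nn.Conv1d(32, 32, kernel_size=9, padding=4),  # look-ahead (and
                nn.ReLU(),                                    # hence latency) low
                nn.Conv1d(32, 1, kernel_size=9, padding=4),
            )

        def forward(self, noisy):           # noisy: (batch, 1, samples)
            return self.net(noisy)

    model = TinyDenoiser()
    noisy = torch.randn(8, 1, 1600)         # e.g. 100 ms chunks at 16 kHz
    clean = torch.randn(8, 1, 1600)         # stand-in for clean targets
    loss = nn.functional.mse_loss(model(noisy), clean)
    loss.backward()                         # train against clean speech in practice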

~~~
adinisom
That might work, although I think there are two limitations:

1) Hearing aids have a 10 ms latency budget. So no matter how much processing
they can do, they're limited in how many samples they can look ahead, and
that constrains the design of the filters. The brain can presumably look
ahead further to separate sound streams, so I think it's pretty impressive
that ideal binary masking works.

2) Hearing aids have a power budget. The ones I've looked at achieve low power
by running a FIR filter in hardware to shape the sound while a DSP classifies
the sound and adjusts the filter taps. The DSP doesn't have to run at the same
rate as the filter. That seems well matched to the binary filter approach.
Likewise, feature extraction might not need to run at the same rate as the
DNN.
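
(A rough numpy/scipy sketch of the ideal-binary-mask idea, computed with an
STFT rather than a hardware filterbank. The "oracle" mask below is built from
the clean signal, which is exactly what a trained DNN would have to estimate.)

    import numpy as np
    from scipy.signal import stft, istft

    fs = 16000
    t = np.arange(fs) / fs
    speech = np.sin(2 * np.pi * 300 * t)        # stand-in for speech
    mix = speech + 0.5 * np.random.randn(fs)    # speech buried in noise

    _, _, S = stft(speech, fs, nperseg=256)     # clean-speech spectrogram
    _, _, X = stft(mix, fs, nperseg=256)        # noisy-mixture spectrogram

    # Ideal binary mask: keep a time-frequency unit only where the speech
    # energy exceeds the noise energy (X - S is the noise, since STFT is linear).
    mask = (np.abs(S) ** 2 > np.abs(X - S) ** 2).astype(float)
    _, cleaned = istft(mask * X, fs, nperseg=256)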

~~~
gwern
The latency and power issues can probably be fixed, assuming a good end-to-end
model, by using model distillation into a wide shallow net with low-precision
or even binary operations. I don't know if that would be enough - we've seen
compute requirements drop by multiple orders of magnitude (think of style
transfer going from hours on top-end Titan GPUs to realtime on mobile phones)
- but the usual target is a mobile smartphone, which at least has _a_ GPU,
while it seems unlikely any hearing aids will have GPUs anytime soon... I
suppose a good enough squashed low-precision model could be turned into an
ASIC.
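
(A hedged sketch of the distillation step in Python/PyTorch - the
architectures are placeholders, and the 85-dimensional input just echoes the
feature count mentioned in the article.)

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Big, accurate "teacher" and a cheap, wide-and-shallow "student".
    teacher = nn.Sequential(nn.Linear(85, 1024), nn.ReLU(),
                            nn.Linear(1024, 1024), nn.ReLU(),
                            nn.Linear(1024, 2))
    student = nn.Sequential(nn.Linear(85, 64), nn.ReLU(),
                            nn.Linear(64, 2))

    x = torch.randn(32, 85)                 # a batch of feature vectors
    T = 4.0                                 # temperature softens the targets
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=1)

    # Student learns to mimic the teacher's soft outputs (Hinton-style loss).
    loss = F.kl_div(F.log_softmax(student(x) / T, dim=1),
                    soft_targets, reduction="batchmean") * T * T
    loss.backward()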

~~~
taliesinb
Not to detract from your larger point, but AFAIK the style transfer case is
different: if you're willing to hardcode the style into the net you can go
realtime, but the original style transfer paper is able to do different styles
without retraining. So they're different algorithms. Unless the SOTA has
changed recently.

~~~
gwern
You shouldn't need to hardcode the style if you provide the style as an
additional datapoint for it to condition on. But this doesn't really matter
since for fun mobile applications it's fine to pick from 20 or 50 pretrained
styles, and likewise for hearing aids.

------
nilsb
This sounds like something that could also be useful for non-hearing-impaired
people who have sensory overload issues (e.g. autism).

------
petra
Is deep learning useful for noise-cancelling headphones/earphones? It's
extremely hard to design great noise cancelling - only a very few companies do
it - and hence the prices are very high. If deep learning can reduce costs and
increase competition here, I think this sector could really grow.

------
rini17
Good read! Is it possible to know which hearing aid brands have this? It's a
pity that the HA manufacturers are so secretive about what their buzzwords
mean; it's impossible to get a good comparison of the features on offer.

------
nabla9
How is this method better compared to independent component analysis?

~~~
jfsantos
For one thing, independent component analysis needs signals from at least as
many microphones as there are sources in order to work properly.
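
(A toy illustration with scikit-learn's FastICA and synthetic signals: two
microphones are enough for two sources, but a single mixture would give ICA
nothing to separate.)

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 1, 8000)
    sources = np.c_[np.sin(2 * np.pi * 5 * t),           # stand-in "voice"
                    np.sign(np.sin(2 * np.pi * 3 * t))]  # stand-in "noise"

    mixing = np.array([[1.0, 0.5],      # each row: how one microphone
                       [0.5, 1.0]])     # hears the two sources
    mics = sources @ mixing.T           # shape: (samples, 2 microphones)

    ica = FastICA(n_components=2, random_state=0)
    recovered = ica.fit_transform(mics) # two unmixed source estimates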

~~~
nabla9
This is true for vanilla ICA.

Independent-subspace ICA models can be applied to problems with more sources
than signals. It's also possible to use different decomposition methods, or
to subtract already-detected signals.

------
meow_mix
Really happy to have this guy as our neural nets professor at Ohio State :)

------
phaedrus
I can't pick out a voice in a crowded room, or indeed separate speech from any
sort of background noise. Even in ideal listening conditions I miss words, so
sentences stop making sense, and often I don't realize someone has started
talking until I've already missed the first sentence. However, I don't have
any physical hearing problem. Each time I've gotten my hearing tested, I've
been told my tonal hearing is perfectly normal. Yet the problems I have with
picking out and understanding speech are absolutely debilitating, and I can't
get anyone to understand that it is a disability and that it is real.

At my insistence, my audiologist administered a speech processing test, but I
was nonplussed to discover that this test was completely unrealistic and did
not at all match the situations I have trouble with. The way it worked was
that it
would mix a perfectly clear speech track with white noise, or a repetitive
loop of background speech or cafeteria noise. But since the sound streams were
mixed together so artificially, my brain could separate the audio streams
based on source track, words or no. And since the "interruption" loops were
repetitive, my brain could learn the pattern and discount it. So of course I
passed that test, too. The speech processing problems I have occur in _real_
environments when executing functions of daily life.

In the end the audiologist told me that maybe my problem is that I have ADHD
and that my attention isn't able to lock on to or stay with a conversation
well.
He didn't know of anyone in my area who treated adults with ADHD, but promised
to send me a referral. I'm guessing he never found anyone, because that
referral never came. However, the suggestion eventually led to me getting
diagnosed and treated for ADHD on my own (although it took almost two years to
even get an appointment). I've found that getting a diagnosis and medication
for ADHD has improved my life immensely. However, it has not helped with the
original problem; I still can't separate speech from other noises.

I resonate with the commenter who says he thinks his undiagnosed (physical)
hearing loss once contributed to him losing a job. At work, I find excuses to
hide/disconnect my phone because I have so much trouble making out what people
are saying over a phone. I use chat and IM, and write everything down or ask
for things written down. Still, sometimes I'll miss or not understand some
verbal instruction and get in trouble. It also causes relationship problems -
so many misunderstandings, misheard words, doing the opposite of what my
spouse asked or not realizing she said something to me. I avoid some social
activities because I know that background noise there will prevent me from
participating, or because mishearing people might lead to a social gaffe or a
dangerous misinterpretation of safety instructions.

I can't read lips to get by, either - whatever it is in my brain that affects
speech processing affects lip reading equally, if not worse, and sometimes
when I'm receiving the "all circuits are down" message from my speech centers,
I can't even understand someone's sentence no matter how many times they
repeat it. But if they write it down on a note I can understand it. In a way,
it's like the inverse of dyslexia.

Anyway, not that I have much hope of an answer, but does anyone know where I
_can_ go to talk about it, or what kind of doctor would actually be interested
and not just brush this off?

------
csours
The tech starts about a third of the way down the article (Ctrl+F "clean speech")

> My lab was the first, in 2001, to design such a filter, which labels sound
> streams as dominated by either speech or noise. With this filter, we would
> later develop a machine-learning program that separates speech from other
> sounds based on a few distinguishing features, such as amplitude (loudness),
> harmonic structure (the particular arrangement of tones), and onset (when a
> particular sound begins relative to others).

> Next, we trained the deep neural network to use these 85 attributes to
> distinguish speech from noise.

> One important refinement along the way was to build a second deep neural
> network that would be fed by the first one and fine-tune its results. While
> that first network had focused on labeling attributes within each individual
> time-frequency unit, the second network would examine the attributes of
> several units near a particular one

> Even people with normal hearing were able to better understand noisy
> sentences, which means our program could someday help far more people than
> we originally anticipated

> There are, of course, limits to the program’s abilities. For example, in our
> samples, the type of noise that obscured speech was still quite similar to
> the type of noise the program had been trained to classify. To function in
> real life, a program will need to quickly learn to filter out many types of
> noise, including types different from the ones it has already encountered

_oh_
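
(Piecing those excerpts together, a hedged Python sketch of the two-network
arrangement they describe - every size here, including the context width, is
a guess rather than the authors' numbers.)

    import torch
    import torch.nn as nn

    # Net 1: label one time-frequency unit from its 85 attributes.
    unit_net = nn.Sequential(nn.Linear(85, 256), nn.ReLU(),
                             nn.Linear(256, 1), nn.Sigmoid())

    # Net 2: refine each label from the first net's outputs for nearby units.
    context = 5                                       # units per side (assumed)
    refine_net = nn.Sequential(nn.Linear(2 * context + 1, 64), nn.ReLU(),
                               nn.Linear(64, 1), nn.Sigmoid())

    feats = torch.randn(1000, 85)                     # units along one channel
    per_unit = unit_net(feats).squeeze(1)             # first-pass speech probability
    windows = per_unit.unfold(0, 2 * context + 1, 1)  # sliding context windows
    mask = (refine_net(windows).squeeze(1) > 0.5).float()  # final binary mask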

~~~
sdrothrock
As someone with a cochlear implant who lives with the consequences of overly
clever programmers who thought they'd "help" by filtering out noise and volume
and whatever else... I really wish they wouldn't. This is a technology that
makes me so angry some days that I sometimes wonder if it was worth getting
implanted, even though I know it was.

~~~
rasur
This is something I do wonder about in this context. I don't have a CI myself,
but my 6-year-old son does, and I am somewhat concerned that he is - or might
be - experiencing partial sound "blindness" (meaning: sure, speech processing
is adequate, but there are surely some things that are processed away). I have
a fair amount of experience in music/sound-recording environments, and it
makes me somewhat sad for him that he's still "missing out" (although
obviously this is outweighed by the fact that he can actually hear and
communicate now, but I'm sure you get what I mean).

I'd get into the area myself if I were in any way useful with DSP code or
C++...

May I ask, were your hearing issues (leading to the CI) a recent thing, or
long-term? My main interest here is in using machine learning to help people
who do not know sign language understand signers, rather than in "improving"
the actual hearing process (because - personally - my S/L skills are
_abysmal_).

~~~
brothercolor
Interesting idea on using ML to help non-signers understand sign language. In
this thread's context, the ML is designed to help people hear better. In
visual contexts (where sign language lives), would this hypothetical ML help
low-vision or blind people see better?

Because people with 20/20 vision just need a sign language dictionary handy
and some patience.

~~~
sdrothrock
> Interesting idea on using ML to help non-signers understand sign language.

The dream for me is something like Google Glass with an app that can subtitle
spoken, written, and signed language.

> Because people with 20/20 vision just need a sign language dictionary handy
> and some patience.

I would think a LOT of patience... the easiest way at that point would just be
to have the other person fingerspell or write what they're saying; if you're
watching something where that's not possible, then the dictionary will just be
an exercise in frustration.

~~~
rasur
> and some patience.

Yeah, well, that would be one way of handling it, but unfortunately the real
world has terrible issues with not impeding my progress on that front. Not
that I'm anti-learning at all, but - personally - I'm fighting a losing battle
against learning German, Swiss German, and Swiss German Sign Language whilst
also being a walking, talking English lesson :D

Taking the slow way, with dictionary in hand, is, as you point out, an
exercise in frustration (especially if the talker/signer is 6 years old).

Yes, I share your dream of something Google-Glass-like that can add subtitles.
There are people working on this (mostly in the UAE, if memory serves).
Interesting times ahead - hopefully I won't have to wait long, otherwise I'll
have to do it myself, and that really would take a while ;)

------
imron
Reinvents is such an awful term.

They didn't 'reinvent' anything; they addressed an existing shortcoming.

------
dwighttk
I was thinking they were on their way to making a Photoshop for audio, until I
heard the before-and-after samples.

