
Brain signals translated into speech using AI - Balgair
https://www.nature.com/articles/d41586-019-01328-x
======
gwern
Mirror:
[https://www.gwern.net/docs/ai/2019-anumanchipalli.pdf](https://www.gwern.net/docs/ai/2019-anumanchipalli.pdf)

~~~
godelmachine
Thank you very much :)

------
melling
I saw the article in the NYT. It mentions that the system is capable of 150
wpm, compared to older systems that could only do 8 wpm.

“Previous implant-based communication systems have produced about eight words
a minute. The new program generates about 150 a minute, the pace of natural
speech.”

[https://www.nytimes.com/2019/04/24/health/artificial-speech-...](https://www.nytimes.com/2019/04/24/health/artificial-speech-brain-injury.html)

~~~
dang
Thanks—since the paper is paywalled and HN tends to prefer the best popular
article on such a topic, we've changed the URL to that from
[https://www.nature.com/articles/s41586-019-1119-1](https://www.nature.com/articles/s41586-019-1119-1).

A pdf of the paper itself is linked from elsewhere in this thread.

~~~
return1
Nature has an editorial on the paper that is better and free

[https://www.nature.com/articles/d41586-019-01328-x](https://www.nature.com/articles/d41586-019-01328-x)

~~~
dang
Ok, we've switched to that from
[https://www.nytimes.com/2019/04/24/health/artificial-speech-...](https://www.nytimes.com/2019/04/24/health/artificial-speech-brain-injury.html). Thanks!

------
avisser
I'd love to see if it works when people are dreaming. Or people who have
Locked-in Syndrome. Or even people who are in a coma.

[https://en.wikipedia.org/wiki/Locked-in_syndrome](https://en.wikipedia.org/wiki/Locked-in_syndrome)

~~~
chanakya
It doesn't try to read thoughts. It uses commands that are sent to the
tongue/mouth to form syllables. We're nowhere _near_ being able to trace /
decode thoughts.

~~~
joshvm
Actually you can, in limited cases.

If you have prior knowledge of the person's brain activity when looking at an
image, it's possible to predict (reconstruct an image of) what they're looking
at.

[https://journals.plos.org/ploscompbiol/article/figures?id=10...](https://journals.plos.org/ploscompbiol/article/figures?id=10.1371/journal.pcbi.1006633)

~~~
Engineering-MD
That is not reading thoughts. That is via the occipital cortex, which has a
highly structured representation of what is projected onto the retina. This
is much easier, as the organisation of the data is maintained relative to the
retina, just regrouped according to visual field rather than eye.

~~~
Teever
If we're reading data from the brain isn't that by definition reading
thoughts, regardless of where in the brain they originate from?

------
thisisananth
I haven't read the full paper to know what hardware it needs, but if it can be
put into a headphone-like wearable and integrated into a phone to output text,
then we could type on a phone as fast as we can think. Looking forward to it!
Also, one of the major impediments to using voice assistants in public is not
wanting to shout into the phone; if this could be integrated directly into the
assistant, voice assistant use would explode. The possibilities are amazing!

~~~
fundamental
This technology uses an electrode array referred to as ECoG, which needs to be
surgically placed on the surface of the brain. Current non-invasive methods
cannot achieve comparable signal quality.

~~~
thisisananth
Ok. Then all those applications will have to wait!

~~~
king07828
Sounds like a neural network that converts EEG signals to ECoG signals would
be an interesting idea.

~~~
sagarm
If there is enough information in EEG signals to generate speech, an
intervening layer to "convert" EEG signals into ECoG signals is an unnecessary
complication.

~~~
entropicdrifter
Unfortunately, there probably isn't; on top of that, EEGs are extremely
susceptible to noise from nerve signals going to the head/face, such as
clenching your jaw or raising your eyebrows.
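As an illustration of the artifact problem: EEG pipelines routinely band-pass away the high-frequency end where muscle (EMG) noise dominates, though EMG overlaps EEG bands, so filtering is only a partial fix. A minimal sketch with synthetic signals (the sample rate and cutoffs are assumed typical values, not from any of the linked papers):

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 250.0  # assumed EEG sample rate (Hz)
t = np.arange(0, 2, 1 / fs)

eeg = np.sin(2 * np.pi * 10 * t)        # 10 Hz alpha-like rhythm
emg = 0.8 * np.sin(2 * np.pi * 90 * t)  # high-frequency jaw-clench stand-in

# 1-40 Hz band-pass: keeps typical EEG rhythms, attenuates the EMG artifact.
b, a = butter(4, [1.0, 40.0], btype="bandpass", fs=fs)
filtered = filtfilt(b, a, eeg + emg)
```

Broadband EMG (as from an actual jaw clench) would leak into the 1-40 Hz band as well, which is part of why EEG-based speech decoding is so much harder than ECoG.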

------
781
Why are advances like this always presented as useful in a medical context?

It seems to me like some sort of fear of talking about commercial use cases -
for example, in this case, the ability to talk to your phone in your mind;
person-to-phone telepathy, if you wish.

~~~
aeontech
Because a slight increase in convenience to be able to talk to your phone
subvocally is trivial compared to granting capability for vocal communication
to someone who is incapable of regular speech. One is a slight improvement in
convenience for an otherwise able person. The other is life-changing.

~~~
781
True, but many people weigh the convenience by the number of uses.

In that case, 0.001 * 1 million >> 1.0 * 100

~~~
SmellyGeekBoy
Are there really that many people willing to have electrodes implanted in
their brain just to... _type on their phone!?_

~~~
Dylan16807
Not right _now_, but imagine if the level of invasiveness/danger were on the
same level as a birth control implant, or botox, or LASIK. I've had surgery to
remove drainage tubes from my eardrums just because they were being
inconvenient. I'd rather put up with that in exchange for being able to do
150 wpm hands-free.

~~~
stubish
What are you going to do with your hands while you are concentrating on your
writing? They'll be dangling there, good for nothing except reflex actions
like picking your nose or driving your car (!).

I already have attached devices that let me enter text faster than I can think
it. I think the tech only gets interesting for the handy-capable when it goes
beyond words, capturing images or sounds as you imagine them, and that is a
long way off.

~~~
Dylan16807
I might be holding my phone, I might just not be near a keyboard, I might be
controlling a video game and use it for extra input...

And while I can type sentences pretty fast, sometimes I get bogged down with
things like code and entering special characters and being able to subvocalize
a single syllable to activate macros would be pretty nice. (Yes I know you
could do that without a brain interface, but the point is that hands are
limited.)

Edit: why does _this_ comment get a downvote?

~~~
stubish
These use cases are all great for speech recognition, which is decades ahead
of brain interfaces and less invasive (and just as quiet, if you want, by
subvocalizing into a throat mike). Again, not particularly interesting to most
of us, but very interesting to people with various disabilities.

------
solarkraft
I don't fully understand.

> As each participant recited hundreds of sentences, the electrodes recorded
> the firing patterns of neurons in the motor cortex. The researchers
> associated those patterns with the subtle movements of the patient’s lips,
> tongue, larynx and jaw that occur during natural speech. The team then
> translated those movements into spoken sentences.

So this means the audio example is speech synthesized from data generated
while the person was actually reading out loud, right?

Why does the text under the headline claim "no muscle movement needed", which
would imply audible speech synthesized from mere thoughts?

~~~
maebert
Former neuroscientist working on similar stuff here.

While the flashy part is the “speech synthesis”, the scientific breakthrough
is actually better framed as a machine learning problem.

Imagine you record someone moving their hand across a canvas. The hand
movement becomes the input, the drawing the output. The ML problem they solved
is to reconstruct the output (or at least something that resembles it) from
the input. In the study’s case, that’s efferent motor signals.

There is a long history of mapping these kinds of signals in the sensorimotor
homunculus to the respective muscles they control downstream (and some really
cool stuff, like prosthetic limbs, can already be controlled this way), but
speaking is a notoriously hard motor task that requires many muscles to work
in unison in very precise ways. When you implant these multi-electrode
arrays, you get a few hundred more or less random single neurons, astrocytes,
and local field potentials from the nether in between. Nonsense, noisy data.
Being able to map this back to the result it produces in the body is as
technically complex as it is astonishing!
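The mapping maebert describes - from noisy multichannel recordings back to the movements they encode - can be caricatured as a regression problem. A minimal sketch on synthetic data (the study itself used recurrent neural networks on real ECoG features; every number and name below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 1000 time steps, 256 electrode channels, and a
# 2-D "articulator trajectory" (say, jaw and tongue position).
n_steps, n_channels, n_targets = 1000, 256, 2
true_W = rng.normal(size=(n_channels, n_targets))
X = rng.normal(size=(n_steps, n_channels))                    # neural features
Y = X @ true_W + 0.5 * rng.normal(size=(n_steps, n_targets))  # noisy movements

# Ridge regression: W_hat = (X'X + lam*I)^-1 X'Y, robust to noisy channels.
lam = 1.0
W_hat = np.linalg.solve(X.T @ X + lam * np.eye(n_channels), X.T @ Y)

Y_pred = X @ W_hat
r = np.corrcoef(Y[:, 0], Y_pred[:, 0])[0, 1]
print(f"correlation with target trajectory: {r:.2f}")
```

In this toy setup the relation is linear by construction, so plain ridge recovers it; the real difficulty maebert points at is that the neural-to-articulator mapping is nonlinear, nonstationary, and sampled from effectively random neurons.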

~~~
blendo
There's a picture of the electrode array on the Nature site, and it looks
about 3cm x 3cm, with about 16 x 16 sensors. So 256 inputs to the analysis
system.

I know nothing about these kinds of electrodes, but how sensitive do they have
to be? Sub-microvolt? And how fast? Sub-millisecond?

Finally, on the results themselves. If I read it right, the sensors had been
previously implanted into epilepsy patients, and were "re-purposed" for this
study.

So I assume the patients were able to speak? If so, it's trickier to
demonstrate advantages for those who have lost the ability to speak.
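As a back-of-the-envelope on the "how fast" part of blendo's question: a 256-channel grid sampled at a few kHz produces a modest raw data rate. The sample rate and bit depth below are assumed typical ECoG acquisition values, not figures from the paper:

```python
# Hypothetical acquisition parameters, for scale only.
channels = 16 * 16          # 256-electrode grid, per the Nature figure
sample_rate_hz = 3_000      # a few kHz per channel (assumed)
bits_per_sample = 16        # typical ADC resolution (assumed)

bytes_per_second = channels * sample_rate_hz * bits_per_sample // 8
print(f"raw data rate: {bytes_per_second / 1e6:.1f} MB/s")
```

A few kHz per channel gives sub-millisecond temporal resolution, which is in the right ballpark for local field potentials; single-unit recordings would need higher rates still.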

------
rainhacker
If/when driverless cars become reliable, learning to drive may become an
obsolete skill. I wonder what the implications are if we become able to
reliably communicate our thoughts using AI. Could training our vocal cords to
speak become an obsolete skill too? On this trajectory, I also wonder what
other core capabilities could be delegated to AI, and what role that could
play in the evolution of future generations.

------
fundamental
Having worked on the neural speech-prosthesis problem in the single (or nearly
single) electrode case, I'd say this is a promising result from the ECoG side
of things. While there are a few more results I'd like to see about how things
would be expected to generalize more broadly over time/neural-input, the
results do look fairly solid.

Hopefully there can be some further development from here into some practical
applications.

------
superconformist
We have ways of making you talk.

Or at least reading words from your brain.

------
leafario2
Can't wait to control vim using my brain some day

------
sjg007
You could imagine Neuralink is doing something similar.

------
oldgun
I wish Stephen Hawking were still around.

~~~
jsnider3
Yeah. It's unfortunate he didn't get to see the black hole picture from a few
weeks ago.

------
person_of_color
Does this require mouth movement though?

------
chuckschemer
I read this as scientist creates _free_ speech from brain signals.

