
Machine translation of cortical activity to text with an encoder–decoder framework - bookofjoe
https://www.nature.com/articles/s41593-020-0608-8.epdf?referrer_access_token=yqi8gndGo3bsyqSOt-QH49RgN0jAjWel9jnR3ZoTv0PGGLjJSIdJ3hFHzrqzGtisj2DRxrMxg3xPhwYR9Or_MFFWi9FFLCitwQPN6hzDOfuddKxNu7NRPQzS80eVE9peKMdJN14zNjJ66HDiFw8OaB3K899HudKHwkGg7prtc2MKkCosR0Mg9lyu2R8tDA-O-XP-CgHFT7NJfXeq8il1mGo_26BUelda8VnFBttFRYIzC7ZAr5I03DG6S_NFvZQTiBj8zrA4P4qwNArhZ0sCMS7bc6l-KY6PS-SIqYDW2LvT2pkbfxepOjoNylWMqOyE&tracking_referrer=www.theguardian.com
======
bognition
Really cool to see progress made here, but this won't be available for public
use any time soon (likely decades).

One of the biggest challenges with decoding brain signals is getting a large
number of sensors that detect voltages from a very localized region of the
brain. This study was done with ECoG (electrocorticography), which involves
implanting small electrodes directly on the surface of the brain. Nearly all
consumer devices use EEG (electroencephalography), which involves putting
sensors on the scalp.

Commercially available ECoG is highly unlikely, as it requires an extremely
invasive brain surgery. For ethical reasons, the implants from the study were
likely placed to help diagnose existing life-threatening medical issues.

Decoding speech from EEG won't work as well as ECoG for a number of reasons.
First, the physical distance between the sensors and the brain means the
signals you pick up aren't localized. Second, the skin and skull are great
low-pass filters and filter out the really interesting signals at higher
frequencies (roughly 100 Hz to 2 kHz). Additionally, these signals have really
low signal power because they're correlated with neuronal spiking.

ECoG does a really good job of picking up these signals because the sensor is
literally on the surface of the brain. It's really hard to pick up these
signals reliably with EEG.
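
To make the filtering point concrete, here's a minimal sketch of isolating a
high-frequency band from a recording. This is my own illustration, not the
paper's pipeline; the sampling rate, band edges, and synthetic signal are all
placeholder assumptions:

    # Isolate a high-frequency band (e.g. ~70-150 Hz "high gamma") from a
    # raw recording. Sampling rate and band edges are illustrative only.
    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 1000.0                      # samples per second (placeholder)
    t = np.arange(0, 2.0, 1.0 / fs)  # 2 s of synthetic signal
    raw = np.sin(2 * np.pi * 10 * t) + 0.3 * np.sin(2 * np.pi * 110 * t)

    # 4th-order Butterworth band-pass, run forward and backward
    # (filtfilt) so the filter adds no phase distortion.
    b, a = butter(4, [70, 150], btype="bandpass", fs=fs)
    high_gamma = filtfilt(b, a, raw)

    # This band survives on ECoG electrodes sitting on the cortex, but the
    # skull and scalp largely filter it out before it reaches EEG sensors.
    print("high-gamma band power:", np.mean(high_gamma ** 2))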

~~~
2008guy
No, actually, high-density electrode implants are right around the corner.
Watch the Neuralink press event.

~~~
empath75
> One engineering challenge is that the brain's chemical environment can cause
> many plastics to gradually deteriorate. Another challenge to chronic
> electrode implants is the inflammatory reaction to brain implants.
> Transmission of chemical messengers via neurons is impeded by a barrier-
> forming glial scar that occurs within weeks after insertion followed by
> progressive neurodegeneration, attenuating signal sensitivity. Furthermore,
> the thin electrodes which Neuralink uses, are more likely to break than
> thicker electrodes, and currently cannot be removed when broken or when
> rendered useless after glial scar forming.

Yeah, you're not putting that in my brain.

[https://en.wikipedia.org/wiki/Neuralink](https://en.wikipedia.org/wiki/Neuralink)

~~~
2008guy
The inflammation and damage are seen in traditional arrays that are large and
rigid. A thread with a low enough moment of inertia will likely not cause as
much damage. And the damage is only important if you put the electrodes in an
important place... we currently screw two giant lag screws into people’s heads
and call it “deep brain stimulation”, so I feel optimistic about the long game.
But you’d still be right to be wary. And if I had it my way, this technology
would never be allowed to exist at all...

~~~
stainforth
Can you continue about your opposition to it existing?

~~~
salawat
It's a social crisis waiting to happen, especially if you can end up decoding
more than just what someone uses their speech centers to articulate.

This is invasive to the extreme, and seems to open the door for violations of
people's intimate thoughts down the road.

You may not think about it much now, but if you pay any attention to things
like intrusive thoughts, or even have to deal with carefully maintaining a
public face in the workplace, it should not be difficult to realize why these
technologies are legitimately dangerous even as read only systems.

The real nightmare begins when you finally get fed up with Read-Only and
figure out how to write in order to potentially mutate mental state.

I'm normally pretty forward-thinking in terms of embracing the march of
technological progress. However, the last decade or so has shown that we as a
society have had our grasp exceed our socio/ethical/moral framework for using
it responsibly; and the potential abuse a full read/write neural interface
would enable is one of the few things that has managed to attain a "full-stop"
in my personal socio-ethical-moral framework.

Not to sound like that adult, but we're just not ready.

Before anyone points out that the same moral outrage probably occurred with
the printing press: there is a big damn difference between changing someone's
mind through pamphlets and having a direct link to the limbic system to
tickle on a whim. We do a very bad job of correctly estimating the long-term
effects of technological advancement; just look at how destructive targeted
advertising has been.

I didn't reach my conclusion from an existing preconception/predisposition
either. I used to be massively for this particular advancement. Only through a
long time spent reflecting on it has my viewpoint done a 180.

I'm aware of all of the positive applications for the handicapped, locked-in,
and paralyzed; but I'm still reluctant to consider embracing it for their sake
when I've seen how prone our legal system is to taking a crowbar to a minor
exception/precedent.

Maybe I've just been in the industry long enough not to trust tech people to
keep society's overall well-being and stability at heart. Maybe I'm becoming a
Luddite coward as I get older. I don't know, and I ask myself every day
whether I'm being unreasonable. The answer hasn't changed in a long while,
though, even though I do keep trying to seek out opportunities to challenge it.

I hope that helps, and doesn't make me sound like too much of a nut.

~~~
IdiocyInAction
> Before anyone points out that the same moral outrage probably occurred with
> the printing press: there is a big damn difference between changing
> someone's mind through pamphlets and having a direct link to the limbic
> system to tickle on a whim.

I recently read a short story by Ted Chiang that likened the development of
writing to a fundamental cybernetic enhancement of the brain. I found it
quite enlightening, as I had never thought about how writing changes how we
see ourselves and our environment. Our memories are imperfect and inaccurate
and amplify the biases we have, while writing loses much less information.

> just look at how destructive targeted advertising has been

Can you elaborate? Targeted advertising doesn't even make my top 100 of
destructive technologies.

~~~
dandelo1953
Instead of thinking of advertising as "technology", you might want to look
into the military-esque research that brought it into the free market. Just
like the internet, psyops was first constructed and formalized by people who
value information over influence, as only one will beget the other with any
statistical certainty.

------
lars
This is cool. For those who are not super familiar with language processing,
though, I think it's good to point out the limitations of what's been done
here. They mention that professional speech transcription has a word error
rate (WER) of around 5%, and that their method gets a WER of 3%. Sure, but the
big distinction is that speech transcription must operate on an infinite
number of sentences, even sentences that have never been said before. This
method only has to distinguish between 30-50 sentences, and the same sentences
must exist at least twice in the training set and once in the test set.
Decoding word-by-word is really a roundabout way of doing a 50-way
classification here.
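
For concreteness: WER is just word-level edit distance divided by the length
of the reference transcript. A minimal sketch, with made-up sentences:

    # Word error rate: Levenshtein distance over words, divided by the
    # number of words in the reference transcript.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edits turning the first i ref words into the first j hyp words
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
        return dp[len(ref)][len(hyp)] / len(ref)

    print(wer("the quick brown fox", "the quick brown dog"))  # 0.25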

It's an invasive technique, so they need electrodes on a human cortex. This
means data collection is costly, so they're operating in a very low-data
regime compared to most other seq2seq applications. It seems theoretically
possible that this could operate at Google Translate-level accuracy if the
sentence dataset were terabyte-sized rather than kilobyte-sized. That dataset
size seems very unlikely to be collected any time soon, so we'll need massive
leaps in data efficiency in machine learning for something like this to reach
that level. They explore transfer learning for this, which is nice to see.
Subject-independent modelling is almost certainly a requirement to achieve
significant leaps in accuracy for methods like this.

~~~
kasmura
Is the following quote at odds with what you are saying about 50-way
classification?

"On the other hand, the network is not merely classifying sentences, since
performance is improved by augmenting the training set even with sentences not
contained in the testing set (Fig. 3a,b). This result is critical: it implies
that the network has learned to identify words, not just sentences, from ECoG
data, and therefore that generalization to decoding of novel sentences is
possible."

~~~
lars
The difficulty of the problem is that of a 50-way classification. If the only
goal were to minimize WER, a simple post-processing step choosing the nearest
sentence in the training set could easily bring the WER down further. They've
chosen to do it the way they did, presumably to show that it can be done that
way, and I don't fault them for it.
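
That post-processing step would only be a couple of lines. A sketch (the
helper name and similarity metric are my own, not the paper's):

    import difflib

    # Snap a decoded sentence to the most similar of the 30-50 known
    # training sentences, compared word by word.
    def snap_to_training_set(decoded, training_sentences):
        return max(training_sentences,
                   key=lambda s: difflib.SequenceMatcher(
                       None, s.split(), decoded.split()).ratio())

    print(snap_to_training_set(
        "the quick brown dog",
        ["the quick brown fox", "a completely different sentence"]))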

They claim that word-by-word decoding implies that the network has learned to
identify words. This may well be true, but it isn't possible to claim that
from their result. For example, let's say you average all electrode samples
over the relevant timespan, transform that representation with a feed-forward
neural net, and feed that into an RNN decoder. It would still predict word-by-
word, on a representation that necessarily does not distinguish between words
(because the time dimension has been averaged over). Such a model can still
output words in the right order, just from the statistics of the training
sentences being baked into the decoder RNN.
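
A sketch of that counterexample architecture, just to illustrate the argument
(my own construction, not the paper's model; all sizes are placeholders):

    import torch
    import torch.nn as nn

    class TimeAveragedDecoder(nn.Module):
        """Averages the ECoG signal over time, so no word-boundary timing
        survives, yet still emits word-by-word predictions."""
        def __init__(self, n_electrodes=250, hidden=256, vocab_size=250):
            super().__init__()
            self.ffw = nn.Sequential(nn.Linear(n_electrodes, hidden), nn.ReLU())
            self.embed = nn.Embedding(vocab_size, hidden)
            self.rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, ecog, prev_words):
            # ecog: (batch, time, electrodes); the mean over time destroys
            # any representation that could distinguish individual words.
            ctx = self.ffw(ecog.mean(dim=1))           # (batch, hidden)
            dec, _ = self.rnn(self.embed(prev_words),  # word-by-word decode
                              ctx.unsqueeze(0))        # context as initial state
            return self.out(dec)                       # (batch, seq, vocab) logits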

------
hrgiger
I tried something similar 5 years ago using the meditation device from
choosemuse.com. It was the cheapest option and provided a hackable interface
that gave you access to all the data. Then I wrote a small mobile app that
connects to the headset.

The app picked and showed a single random word from "hello world my name is
hrgiger", then showed a green light. When I saw the green light, I thought
about the word and blinked. The headset was able to detect blinks as well, so
the app created training data using the window ending at blink time - xxx
millis. I created a few thousand training examples across the 6 classes this
way, trained on them with my half-assed NN implementation, and used the
generated weights to predict the same way on mobile. I never achieved higher
than 40%, though I tried all the mixed waves, raw data, and different
time-series windows. Still, it was a fun project to mess with, and I still try
to tune that NN implementation. If they achieve a practical solution, I would
use subtitles for full-length training. A simple Netflix browser plugin might
do the trick, but I'm not sure there will be a single AI algo that would
understand everyone's different data.
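
The labeling trick (a window of signal ending just before each blink) might
look something like this; the sample rate, window length, and names here are
my reconstruction, not the actual app code:

    import numpy as np

    FS = 256          # Muse headsets sample at roughly 256 Hz (assumption)
    WINDOW_MS = 500   # how much signal before the blink to keep (placeholder)
    WIN = FS * WINDOW_MS // 1000

    def make_training_examples(eeg, blink_indices, shown_words):
        """eeg: (n_samples, n_channels) raw stream; blink_indices: sample
        index of each detected blink; shown_words: the word displayed for
        each trial. Returns labeled windows for training."""
        X, y = [], []
        for blink, word in zip(blink_indices, shown_words):
            if blink >= WIN:
                X.append(eeg[blink - WIN:blink])  # "thinking the word" span
                y.append(word)
        return np.stack(X), np.array(y)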

~~~
linschn
40% over 6 classes is way above the random baseline of 1/6 ≈ 17%. This is
actually pretty cool. Congratulations!

~~~
hrgiger
Oh thank you! I wasn't even aware those numbers were promising.

------
leggomylibro
It looks cool, but they trained their models on people reading printed
sentences out loud.

Would that actually translate to decoding the process of turning abstract
thoughts into words?

The researchers also note that their models are vulnerable to over-fitting
because of the paucity of training data, and they only used a 250-word
vocabulary. Neuralink also has a strong commercial incentive to inflate the
results, so I'm not too sure about this.

It's great to see progress in these areas, but it seems that technologies like
eye-tracking and P300 spellers are probably going to be more reliable and less
invasive for quite some time.

~~~
hyyggnj
The speaking aloud is very suspicious. Why do subjects need to speak aloud?
Are they actually decoding neural signals, or just picking up artifacts
introduced by the physical act of speaking (e.g., electrodes vibrating due to
sound)?

------
weinzierl
Fascinating work, but far from what some might hope for after reading only
the title.

The translation is restricted to a set of 30 to 50 unique sentences.

~~~
zo1
They do mention that the network is partially learning the words themselves:

> "On the other hand, the network is not merely classifying sentences, since
> performance is improved by augmenting the training set even with sentences
> not contained in the testing set (Fig. 3a,b). This result is critical: it
> implies that the network has learned to identify words, not just sentences,
> from ECoG data, and therefore that generalization to decoding of novel
> sentences is possible."

------
zo1
Can we remove the tracking query-string from this link, please? It works fine
without it:

[https://www.nature.com/articles/s41593-020-0608-8.epdf](https://www.nature.com/articles/s41593-020-0608-8.epdf)

Edit: Sorry, it seems to show only the first page if you remove the token.

~~~
IHLayman
Not only that, but if you don't run the trackjs script, the PDF won't load at
all. Sorry, but it's a hard pass from me. Don't track my reading.

------
briga
It seems like this field is at about the same stage of progress as image
recognition was in the 90s, when researchers were trying to get a handle on
MNIST-type tasks.

I wonder how much the language embeddings learned by the transformer are
reflected in the actual physical structure of the brain? Could it be that the
transformer is making the same sort of representations as those in the brain,
or is it learning entirely new representations? My guess is that it's doing
something quite different from what the brain is doing, although I wouldn't
rule out some sort of convergence. Either way, this is a fascinating branch of
research both for AI and the cognitive sciences.

------
h3ctic
Looks like a good approach and the error rate of 3% is really good, I guess.
Did they mention how they got the input data? I couldn't find it.

~~~
zo1
They use 250 ECG electrodes as input. I think that means it's above the skin,
so not invasive.

~~~
resiros
You are mistaken. They use ECoG, which is intracranial electroencephalography:
electrodes placed on the exposed surface of the brain.
[https://en.wikipedia.org/wiki/Electrocorticography](https://en.wikipedia.org/wiki/Electrocorticography)

~~~
zo1
Thanks for the info!

------
carapace
I'm pretty sure you could get that with an HD camera or two, some hypnosis,
and off-the-shelf ML.

One of the very first things I learned when I was studying hypnosis was to
induce a simple binary signal from the unconscious. (Technically it's trinary:
{y,n,mu}
[https://en.wikipedia.org/wiki/Mu_(negative)](https://en.wikipedia.org/wiki/Mu_\(negative\))
)

(In my case my right arm would twitch for "yes", left for "no", no twitch for
"mu" (I don't want to go on a long tangent about all the various shades of
meaning there, suffice it to say it's a form of "does not compute."))

Anyway, it would be trivial to set up one or more binary signals and detect
them via switches or, these days, HD cameras and ML. You could train your
computer to "read" your mind from very small muscular contractions/relaxations
of your face. (The face is the brain's primary output channel, even before
voice, eh?)

Or you could just set up a nine-bit parallel port (1 byte + clock) and
hypnotize yourself to emit ASCII or UTF-8 directly. That would be much, much
simpler, because it's so much easier and faster to write mind software than
computer software (once you know how). And you could plug yourself into any
USB port and come up as a HID (mouse & keyboard).
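
For what it's worth, the capture side of that nine-bit scheme is simple:
latch eight data lines on each rising edge of the clock line. A sketch
assuming a Raspberry Pi and RPi.GPIO (my choice of hardware; the pin numbers
are placeholders):

    import RPi.GPIO as GPIO

    DATA_PINS = [4, 17, 18, 27, 22, 23, 24, 25]  # bits 0..7 (placeholders)
    CLOCK_PIN = 5

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(DATA_PINS + [CLOCK_PIN], GPIO.IN)

    def on_clock(channel):
        # Assemble one byte from the eight data lines, decode as ASCII.
        byte = 0
        for i, pin in enumerate(DATA_PINS):
            byte |= GPIO.input(pin) << i
        print(chr(byte), end="", flush=True)

    # Latch a byte on every rising edge of the clock line.
    GPIO.add_event_detect(CLOCK_PIN, GPIO.RISING, callback=on_clock)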

I'll say it again: when you connect a brain to a computer, the more
sophisticated information-processing unit is the point of greatest leverage.
Trying to get the computer to do the work is like attaching horses to the
front of your truck to tow it. Put the horses in the back and let the engine
tow them.

~~~
astrea
Could you elaborate a little bit on the 'induce a simple binary signal from
the unconscious'? That sounds fascinating.

~~~
carapace
Sure. (Thanks for asking.) It was one of the first things I learned when I
started studying (self-)hypnosis. It's a simple way to have access to the
unconscious without going into a trance.

There's really nothing to it. You induce a light trance and ask the
unconscious mind to create a simple unambiguous yes-no signal. Finger motions
are common. After that you can ask yourself questions and get y/n answers (or
non-response, what I'm calling "mu", which indicates some issue with the
phrasing or nature of the query.)

I should mention that you should be very careful about your self-model if you
are experimenting with piercing the barrier between the conscious and
unconscious minds. In computer terms, this signal corresponds to a kind of
trans-mechanical _oracle_ and having it available to your (metaphorical)
Turing machine mind makes you into a fundamentally different kind of
processor, operating by rules that may be unfamiliar.

[https://en.wikipedia.org/wiki/Oracle_machine](https://en.wikipedia.org/wiki/Oracle_machine)

But see also:
[https://en.wikipedia.org/wiki/Oracle](https://en.wikipedia.org/wiki/Oracle)
because that's more accurate.

------
neonate
[https://sci-hub.tw/https://doi.org/10.1038/s41593-020-0608-8](https://sci-hub.tw/https://doi.org/10.1038/s41593-020-0608-8)

------
Ritsuko_akagi
Speech From Brain Signals
[https://youtu.be/YHFx6O5x5Hw](https://youtu.be/YHFx6O5x5Hw) (2019)

------
tighter_wires
Man, what is going on with Participant C's active neurons.

