
Reservoir computing - gyre007
https://en.wikipedia.org/wiki/Reservoir_computing
======
qaute
"Reservoir computing" taken very literally: wave interference in a bucket of
water computed a simple speech recognition task (differentiate "zero" and
"one") [1].

[1] https://link.springer.com/chapter/10.1007%2F978-3-540-39432-7_63

~~~
sgentle
PDF: https://pdfs.semanticscholar.org/af34/2af4d0e674aef3bced5fd90875c6f2e04abc.pdf

Also the Liquid State Machine paper they cited, "Real-Time Computing Without
Stable States: A New Framework for Neural Computation Based on Perturbations":
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.5.8123&rep=rep1&type=pdf

Fascinating idea. If I understand correctly, it's like an extreme version of
data pre-processing. If you're trying to figure out whether an audio clip is
saying "zero" or "one", analysing the raw amplitude data is pretty tough
going. Instead you could run it through a Fourier transform in the hope that
the clip's frequency content would be easier to analyse. If that doesn't help,
maybe a wavelet transform, or something more fun like, uh, the "inverse
Fourier transform of the logarithm of the squared magnitude of the Fourier
transform" (better known as the cepstrum).

In a sense, it doesn't really matter what pre-processing you do, provided that
the differences between "zero" and "one" are more distinct in the output than
in the input. This is the "separation property" that the papers mention:
important differences get magnified at the expense of unimportant ones. If
that's true, your final analysis will have a lot less work to do.
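
To see that in miniature with the Fourier example (illustrative numpy, with
pure tones standing in for the audio clips): two clips that overlap heavily
sample-by-sample become trivially distinguishable after the transform.

    import numpy as np

    t = np.linspace(0, 1, 1000, endpoint=False)    # 1 s at 1 kHz
    zero = np.sin(2 * np.pi * 100 * t)             # stand-in for "zero": 100 Hz tone
    one = np.sin(2 * np.pi * 103 * t + 1.0)        # stand-in for "one": 103 Hz, shifted phase

    # In the raw amplitude domain the difference is smeared over every sample...
    print(np.mean(np.abs(zero - one)))             # roughly the size of the signals themselves

    # ...but after the transform it is concentrated in one obvious place.
    Z, O = np.abs(np.fft.rfft(zero)), np.abs(np.fft.rfft(one))
    print(np.argmax(Z), np.argmax(O))              # distinct peaks: bins 100 and 103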

What's cool about this is that "anything that magnifies important differences"
is a pretty open-ended requirement, leaving you free to choose pre-processing
that's easy to implement in hardware. In this case, fluid dynamics has the
desired properties, and the laws of our universe make it very easy to
implement a fluid simulation using actual fluid in an actual bucket.

Perhaps there are other systems with similar separation properties that are
even easier to implement in hardware. Maybe something with electromagnetic
waves, like in time-domain reflectometry? Even if such a system's behaviour is
uninterpretable to us, it might still provide useful pre-processing to a
machine learning algorithm.
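
For anyone who wants the whole pipeline in code, here's a minimal
echo-state-network-style sketch (all names and numbers are illustrative; a
fixed random recurrent network plays the role of the bucket, and only the
linear readout is ever trained):

    import numpy as np

    rng = np.random.default_rng(42)
    n_res = 200

    # The "bucket": a fixed random recurrent network that is never trained.
    W_in = rng.uniform(-0.5, 0.5, size=n_res)
    W = rng.normal(size=(n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1, so echoes fade

    def final_state(clip):
        """Drive the reservoir with a clip; return its state after the last sample."""
        x = np.zeros(n_res)
        for u in clip:
            x = np.tanh(W @ x + W_in * u)
        return x

    # Toy stand-ins for "zero"/"one" clips: two different tones.
    t = np.linspace(0, 1, 100)
    clips = [np.sin(2 * np.pi * f * t) for f in (3, 7)] * 10
    labels = np.array([0.0, 1.0] * 10)

    # Train only the readout (ridge regression on the final reservoir states).
    Z = np.stack([final_state(c) for c in clips])
    w_out = np.linalg.solve(Z.T @ Z + 1e-2 * np.eye(n_res), Z.T @ labels)
    print(np.round(Z @ w_out))                       # should recover the 0/1 labels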

------
gyre007
There are a few interesting resources about this [1]. The concept of a
"microelectromechanical neural network application" [2] is super interesting.
We've been hearing about memristors for a while, but has anyone seen them
widely deployed anywhere?

[1] http://www.physnews.com/nano-physics-news/cluster1837307157/

[2] https://aip.scitation.org/doi/full/10.1063/1.5038038

------
woopwoop
After writing the following, I realized it might come off as more critical
than I meant it. I'm genuinely curious, I just wasn't able to understand the
motivation for this idea from the Wikipedia page.

First, let me attempt a summary in language that makes sense to me. You want
to interpolate a function f: R^n -> R^m. So you pick a "random non-linear
dynamical system" and feed your inputs through it before interpolating. To be
concrete, let's say you pick a polynomial map F: R^n -> R^n of degree at most
k, for some fixed k, with random Gaussian coefficients. Let g(x) = h(1), where
h' = F(h) and h(0) = x. Now, given training data (x_i, y_i), let z_i = g(x_i),
and interpolate a function f* on the training data (z_i, y_i) using whatever
method you like (least squares, a neural net, whatever). Given test data x,
guess f(x) = f*(g(x)).
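
In code, that summary would read something like this (illustrative Python;
F is a random degree-2 polynomial vector field, scaled down so the flow
doesn't blow up on [0, 1], integrated by forward Euler, with f* fit by least
squares):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3

    # Random polynomial map F: R^n -> R^n of degree <= 2, Gaussian coefficients
    # (scaled down so h' = F(h) stays bounded over the unit time interval).
    A = 0.3 * rng.normal(size=(n, n))
    B = 0.05 * rng.normal(size=(n, n, n))

    def F(h):
        return A @ h + np.einsum('ijk,j,k->i', B, h, h)

    def g(x, steps=100):
        """g(x) = h(1), where h' = F(h) and h(0) = x (forward Euler)."""
        h, dt = np.array(x, dtype=float), 1.0 / steps
        for _ in range(steps):
            h = h + dt * F(h)
        return h

    # Training data (x_i, y_i); interpolate f* on (z_i, y_i) with least squares.
    X = rng.normal(size=(50, n))
    y = X[:, 0] * X[:, 1]                       # some target to fit
    Z = np.stack([g(x) for x in X])
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)

    x_test = rng.normal(size=n)
    print(g(x_test) @ w)                        # the guess f(x) = f*(g(x))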

Is this correct? Why is this plausibly a good idea? Why have the function g be
the output of a dynamical system rather than a random matrix? Or, if you want
g to be nonlinear for some reason (what reason?), why not the output of a
random polynomial or a random Fourier series? What happens if two clusters in
the input data that are well separated, and that map to well-separated outputs
in the codomain, get mixed together after composing with the dynamical-system
map g? Or what if g is not even one-to-one?

------
jphoward
What I don't understand is: if you can train a final layer on the reservoir's
random representation, why is this better than just training the final layer
on your data directly? I assume the answer has something to do with
dimensionality reduction?

~~~
breuderink
The reservoir can compute non-linear functions of the input over time. So,
with a reservoir, you can train a linear projection of the reservoir state
that responds non-linearly to the input. Without the reservoir, the output
projection can only be linearly related to the input.
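
A toy illustration (illustrative code; the target is the XOR of the two most
recent bits, which no linear function of the raw inputs can compute, while a
linear readout of a random reservoir's state gets close):

    import numpy as np

    rng = np.random.default_rng(1)
    u = rng.integers(0, 2, size=500).astype(float)
    y = np.logical_xor(u[1:], u[:-1]).astype(float)   # XOR of current and previous bit

    # Fixed random reservoir driven by the bit stream.
    n = 50
    W_in = rng.normal(size=n)
    W = 0.1 * rng.normal(size=(n, n))
    x, states = np.zeros(n), []
    for u_t in u[1:]:
        x = np.tanh(W @ x + W_in * u_t)
        states.append(x.copy())
    Z = np.array(states)

    def mse_of_best_linear_fit(features, target):
        A = np.column_stack([features, np.ones(len(target))])
        w, *_ = np.linalg.lstsq(A, target, rcond=None)
        return np.mean((A @ w - target) ** 2)

    raw = np.column_stack([u[1:], u[:-1]])        # even handing it the right two bits
    print(mse_of_best_linear_fit(raw, y))         # ~0.25: linear in the input is stuck
    print(mse_of_best_linear_fit(Z, y))           # much smaller: linear in the *state*
                                                  # can be nonlinear in the input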

------
Seanny123
FYI, if you analytically determine the recurrent connections, you get way
better results than setting the connections randomly.
http://compneuro.uwaterloo.ca/publications/voelker2018.html

------
ovi256
You should be aware that echo state networks are considered a special case of
recurrent networks. One variant of recurrent networks, LSTMs, dominates the
state of the art on a wide variety of problems.

They're expensive to train, in compute time and in data requirements, but
they're the closest thing we have to trained programs.

~~~
rusticpenn
As far as I know, reservoir computing (liquid state machines) works with
spiking neural networks, whose neurons differ from "classic" artificial
neurons.

------
fnord77
eli5?

~~~
krastanov
I am simplifying this to a level where it becomes borderline useless/wrong,
but it should give the gist of the idea: it turns out that you frequently do
not need to train anything but the last layer of a deep neural network,
provided all the other layers are sufficiently big, arbitrary, and weird. In
reservoir computing you replace all but the last layer of the network with a
"dynamical system", i.e. a large and meaningless map.
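
In code, the non-recurrent version of that idea looks roughly like this
(illustrative throughout; this is essentially what the "extreme learning
machine" literature does):

    import numpy as np

    rng = np.random.default_rng(7)
    d, width = 10, 500

    # The big, arbitrary, weird part: a fixed random layer, never trained.
    W_hidden = rng.normal(size=(d, width))
    b_hidden = rng.normal(size=width)

    def hidden(X):
        return np.tanh(X @ W_hidden + b_hidden)

    # Only the last layer is trained, here by ridge regression.
    X = rng.normal(size=(200, d))
    y = np.sin(X[:, 0]) + X[:, 1] ** 2                # some nonlinear target
    H = hidden(X)
    w_out = np.linalg.solve(H.T @ H + 1e-3 * np.eye(width), H.T @ y)

    X_test = rng.normal(size=(5, d))
    print(hidden(X_test) @ w_out)                     # predictions from the readout alone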

~~~
gbrown
So treat it like a basis expansion?

~~~
GlenTheMachine
If I understand the question correctly (i.e., is a reservoir computing
approach simply projecting an input vector into a higher-dimensional space,
like a wonky support vector machine?), then I think the answer is: unclear.

Reservoir computing approaches usually have the intermediate layers be
recurrent, i.e., they implement a dynamical system. Theoretically, this is
actually Turing complete, although good luck programming it. In any case, the
range of behaviors (transformations of the data) that a dynamical system can
implement is much more powerful than a basis expansion. Whether that is what
happens in actual practice is really, really unclear to me, though. A lot of
recurrent neural networks aren't doing anything more than an equivalent
feedforward network would, and the same may be true here: reservoir
approaches might, for the most part, just be performing a nonlinear
projection into a higher-dimensional space, with the output layer then
trained to classify those patterns.

~~~
gbrown
Thanks, that makes sense.

