
The Overfitted Brain: Dreams evolved to assist generalization - johnsimer
https://arxiv.org/abs/2007.09560
======
cs702
_> ... all DNNs face the issue of overfitting as they learn, which is when
performance on one data set increases but the network's performance fails to
generalize (often measured by the divergence of performance on training vs
testing data sets). This ubiquitous problem in DNNs is often solved by
experimenters via "noise injections" in the form of noisy or corrupted inputs.
The goal of this paper is to argue that the brain faces a similar challenge of
overfitting, and that nightly dreams evolved to combat the brain's overfitting
during its daily learning._

Actually, there's compelling evidence that overfitting is a necessary step for
achieving state-of-the-art performance with DNNs!

Many state-of-the-art deep learning models today are trained until they
achieve ~100% accuracy on the training data, _and then we continue to train
them_ because once they go past this "interpolation threshold" they continue
learning to generalize better to unseen data. This is known as the "double
descent" phenomenon. See, for example:

[https://openai.com/blog/deep-double-descent/](https://openai.com/blog/deep-double-descent/)

[https://arxiv.org/abs/1912.02292](https://arxiv.org/abs/1912.02292)

[https://arxiv.org/abs/1903.08560](https://arxiv.org/abs/1903.08560)

The author makes no mention of double descent and the need for overfitting. In
fact, he seems completely unaware of it.

\--

EDIT: Also, see this comment
[https://news.ycombinator.com/item?id=23957501](https://news.ycombinator.com/item?id=23957501)
elsewhere on this page.

~~~
blackbear_
You should be careful about over-interpreting this stuff. Double descent is still
poorly understood, and some [1] argue it is an artifact caused by wrongly
assuming that the model complexity is a linear function of the number of
parameters.

I would also argue that "the need for overfitting" is a consequence of broken
benchmarks, rather than a feature of deep learning. Why else would adversarial
examples arise?

[1] [https://deepai.org/publication/rethinking-parameter-counting...](https://deepai.org/publication/rethinking-parameter-counting-in-deep-models-effective-dimensionality-revisited)

~~~
jules
Isn't double descent explained by the following?

The network contains many more parameters than data points.

Therefore there is an entire region of lowest training error.

A random point somewhere in the middle of that region probably generalises
better than at the edge.

Once SGD enters this region at the edge, generalisation can still occur
because the randomness will most likely cause it to random walk inside the
region.

To test this hypothesis you could run gradient descent with line search
instead of SGD, and then you should not see this extra generalisation. Then if
you add a bit of randomness to gradient descent you should see this extra
generalisation again, if this hypothesis is correct. Also, under this
hypothesis you'd predict that the speed at which generalisation improves
depends on the batch size.
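The "middle generalises better than the edge" part of this can be checked numerically. A toy sketch of my own (not from the comment): in overparameterized least squares, the zero-training-error solutions form an affine subspace, and the minimum-norm point on it generalises better than a point pushed far out along the subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100  # fewer data points than parameters

X = rng.normal(size=(n, p))
w_true = rng.normal(size=p) / np.sqrt(p)
y = X @ w_true

# Minimum-norm interpolator ("middle" of the zero-loss region).
w_mid = np.linalg.pinv(X) @ y

# Another exact interpolator: add a large null-space component
# ("edge" of the region). Training error stays zero for both.
null_basis = np.linalg.svd(X)[2][n:]  # rows spanning null(X)
w_edge = w_mid + 5.0 * null_basis[0]

X_test = rng.normal(size=(500, p))
y_test = X_test @ w_true

def mse(w, A, b):
    return float(np.mean((A @ w - b) ** 2))

print("train:", mse(w_mid, X, y), mse(w_edge, X, y))        # both ~0
print("test :", mse(w_mid, X_test, y_test), mse(w_edge, X_test, y_test))
```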

~~~
Kinrany
Might be a stupid question, but can we skip the random walk and just pick the
middle?

~~~
pigscantfly
Not a stupid question at all, but one problem is that the boundaries of a
zero-train-loss region are not well characterized, and evaluating the
validation loss even at a single point is computationally expensive. The
centroid of one of these regions might not even be inside it (e.g. a donut
shape, but in higher dimensions). Interesting discussion though -- probably
worth a few papers if someone were to investigate further.

~~~
chillee
This just sounds like Stochastic Weight Averaging, which works quite well:
[https://arxiv.org/abs/1803.05407](https://arxiv.org/abs/1803.05407)
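The core trick in SWA is just averaging weights collected along the SGD trajectory. A minimal 1-D sketch of the idea (hand-rolled noisy gradient descent on a toy objective, not the paper's actual procedure): the averaged iterate lands closer to the optimum than the noisy iterates typically bounce.

```python
import numpy as np

rng = np.random.default_rng(0)
w, lr = 5.0, 0.1
history = []
for _ in range(2000):
    grad = 2 * w + rng.normal(scale=1.0)  # noisy gradient of f(w) = w**2
    w -= lr * grad
    history.append(w)

# Average the iterates after a burn-in, as SWA does.
w_avg = np.mean(history[500:])
print("last iterate:", round(w, 3), "| averaged:", round(w_avg, 3))
```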

------
jere
The abstract is indeed fascinating and I'm reading through the full text,
which is so far mostly easy to understand for a layman.

The high number of typos throws me a bit though. Amusingly, they even misspell
their central idea as the "Overfitted Brian Hypothesis", and I gotta say the
opening reminds me a bit of the fluff you see in mindless high school essays.

> During the Covid-19 pandemic of 2020, many of those in isolation reported an
> increase in the vividness and frequency of their dreams (Weaver, 2020), even
> leading #pandemicdreams to trend on Twitter. Yet dreaming is so little
> understood there can be only speculative answers to the why behind this
> widespread change in dream behavior.

~~~
datenhorst
Maybe the presence of typos is a way to signal non-authorship by GPT-3?

~~~
nullc
GPT-3 can produce typos.

~~~
s_gourichon
Indeed: GPT-3 is liable to produce the same density of typos as its learning
corpus, which is from humans.

------
sgdpk
In the book "Why We Sleep", Matthew Walker suggests something similar: that
dream sleep is fundamental in making associations, in this case generalizing
and getting rid of overfitting. He talks a bit about problem solving during
sleep and how this leads to "a-ha" moments when waking up.

This means that the idea in this paper is already "out there", contrary to
what the abstract states. But it's exciting to have a framework to talk about
it quantitatively.

------
TaupeRanger
I'm foreseeing impending downvotes but I have to rant somewhere. There should
be a name for this kind of ridiculous hubris. Unfalsifiable non-insights by
people trying to apply arbitrary deep learning algorithms to a brain which is
definitely not using any of them. DLcentrism? We don't even know how the brain
does almost _anything_ and you think you can use trendy AI topics to explain
something as complex and mysterious as dreams?

This reminds me of Matt Walker's terrible book on sleep, which, as with almost
all neuroscience research recently, tries to explain "why" we have some
behavioral pattern or experience, but literally never offers an explanation at
all, opting to say "this region lights up in an fMRI machine", as if that
answers anything at all. It's like if you asked "why does the heart pump
blood?" and a cardiologist answered, "well, the heart is very important for
exercise, and people with healthy hearts live longer, and when we attach
electrodes to it we see these interesting patterns associated with pulse and
breathing...". That's Matt Walker's book applied to the brain. This allows
"neuroscience" to get away with these ridiculously overextended papers, because
you can't disprove anything about something so hard to understand in the first
place.

~~~
maps7
The answer I got from Matt Walker's "Why We Sleep" was a list of benefits
that we get from sleep and a list of negatives that we avoid. That is a
sufficient answer for me. To ask why we need those benefits is a different
question and eventually goes down a philosophical path.

~~~
andyljones
'Why We Sleep' has some, uh, issues on the benefits-and-negatives front too.

[https://guzey.com/books/why-we-sleep](https://guzey.com/books/why-we-sleep)

~~~
maps7
Thanks for posting - it's always good to see all views. It's disappointing
because the book helped me a lot. It helped me understand sleep and how I
should change my lifestyle to get more of it. I feel (no measure) that it has
benefited me.

I will not regard it as scientifically accurate now though.

Does anyone have any other books about sleep that they could recommend?

------
amitport
"Notably, all DNNs face the issue of overfitting as they learn, which is when
performance on one data set increases but the network's performance fails to
generalize (often measured by the divergence of performance on training vs
testing data sets)."

Not really. For example, "Gradient Methods Never Overfit On Separable Data"
[https://arxiv.org/abs/2007.00028](https://arxiv.org/abs/2007.00028)

~~~
blackbear_
"In this paper, we consider the implicit bias in a well-known and simple
setting, namely learning linear predictors (x->x'w) for binary classification
with respect to linearly-separable data"

Hoping that this applies to deep neural networks is a huge leap of faith to be
honest.

------
bil7
Wow. Never before has an abstract felt so mind-blowing to read.

~~~
belly_joe
Came here to say the same.

At least personally, it's the insight into human brain mechanics and new
abilities to test hypotheses in this field that gets me most excited about
deep learning developments, rather than the improvements in performance on
various tasks.

------
bfirsh
If you’re on a phone, here’s an HTML version: [https://www.arxiv-vanity.com/papers/2007.09560/](https://www.arxiv-vanity.com/papers/2007.09560/)

------
seesawtron
"The goal of this paper is to argue that the brain faces a similar challenge
of overfitting, and that nightly dreams evolved to combat the brain's
overfitting during its daily learning....Sleep loss, specifically dream loss,
leads to an overfitted brain that can still memorize and learn but fails to
generalize appropriately."

This is a beautiful idea. Will have to read the whole paper to understand how
they support this claim.

------
darksaints
While an interesting hypothesis, we should always be careful with research
that tries to derive biological understanding from AI research. AI is, by
necessity, a simplification of how our brains work. For example, current
neural networks really only have one definition of neuron. But biological
neurons can be very different...there are hundreds of types of neurons. Even
if we limit the definition of neuron type to just describe variation in
switching behavior, mice have been found to have 19 distinct neuron types with
distinct switching behavior, and humans likely have dozens more.

------
hliyan
I wonder what psychology as a discipline will look like in twenty years? I
feel like what we're learning now about the human mind through our study of
neural networks is similar to the early work in cellular biology that
eventually replaced metaphor-based models in medicine (e.g. 'humors') with
more ontological ones. Freud's Ego, Superego and Id are gone. So are
'complexes'. Our model of the human mind currently seems to be limited to
'conscious and subconscious'. I'm excited at the prospect of something much
better.

------
irrational
To the best of my knowledge, I have never dreamed. The abstract speaks of the
dangers of not dreaming due to lack of sleep, and I wonder if the same thing
applies to naturally not dreaming at all?

~~~
IAmGraydon
It’s far more likely that you just don’t remember your dreams.

~~~
tgv
What’s a dream then?

~~~
pstuart
A dream is the brain coming back online after being shut down for maintenance.

That's my supposition. It's an artifact, not a feature.

~~~
tgv
But that's a model of how you think it works. It's not what a dream is.

------
briga
I love it when two disciplines come together to find new solutions to
scientific problems, but something seems off here. An artificial neural
network is incredibly simplistic compared to the messy complexity of the
brain. Generalization seems to happen in some places and for some dreams, but
is that really the only function of dreams? I somehow doubt it.

------
g_airborne
If we’re going down this road of theorizing about the human brain based on
DNNs, what is the deal with dropout? Could we help human brains with
generalization by randomly removing 10% of our newly created connections at
the end of each day to improve long term learning? :)
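For reference, this is essentially all dropout does during training. A minimal NumPy sketch (using the comment's hypothetical 10% drop rate): zero each unit with probability p, and rescale the survivors so the expected activation is unchanged ("inverted" dropout).

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.1):
    # Each unit is dropped (zeroed) independently with probability p;
    # the survivors are scaled by 1/(1-p) to keep the expectation equal.
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1 - p)

h = np.ones(10)
print(dropout(h))  # entries are either 0 (dropped) or 1/0.9
```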

~~~
rtkaratekid
That’s called synaptic pruning and, while it mostly happens as a human matures,
there’s evidence indicating that it also occurs during sleep in adults, helping
consolidate the most important connections and remove the unimportant ones.
It’s not exactly like dropout, but at a high level it kind of looks like it.

------
AndyPatterson
Skimmed the paper so I couldn't possibly give it a fair review, but I always
feel there's something off when people make comparisons of ANNs to actual
biological brains. Even more so when it's the other way around.

------
metachor
Is this a form of mechanomorphism, where we try to reason about how human
cognition might work by drawing an analogy from how computers work
(specifically, overfitting in ANNs) and try to apply it back to humans?

------
longtom
Seems like a testable hypothesis: prepare a "training" deck of cards with an A
side and a B side, each side bearing a simple symbol. Create a second "test"
deck of cards which is identical, except each A symbol is slightly shifted in
meaning (e.g. horse -> donkey). The task is to predict side B from side A.

If less dreaming leads to overfitting, we would expect REM-sleep-deprived
subjects to do better on the "training" deck after learning with it and do
worse on the "test" deck, compared to subjects without sleep deprivation.

~~~
darksaints
This paper isn't about sleep deprivation, it's about dreams.

~~~
longtom
Right, REM phase interruption would be sufficient (according to this theory).
There are also a bunch of additional variables that one likely cannot (easily)
control for. Still, this theory should make predictions of the sort that more
overfitting occurs in absence of dreaming.

------
ancaster
I recall this being essentially the inspiration for the naming of the Wake-
Sleep algorithm for Helmholtz machines.

~~~
isanybodythere
In the paper: "It is worth noting that the proposal of a "wake/sleep" specific
algorithm for unsupervised learning of generative models based on feedback
from stochastic stimulation goes back 25 years (Hinton et al., 1995)"

------
choonway
how about daydreaming? does that work too?

