The Overfitted Brain: Dreams evolved to assist generalization (arxiv.org)
195 points by johnsimer 17 days ago | 71 comments



The abstract is indeed fascinating and I'm reading through the full text, which is so far mostly easy to understand for a layman.

The high number of typos throws me a bit though. Amusingly they even misspell their central idea "Overfitted Brian Hypothesis" and I gotta say that the opening reminds me a bit of fluff you see in mindless high school essays.

> During the Covid-19 pandemic of 2020, many of those in isolation reported an increase in the vividness and frequency of their dreams (Weaver, 2020), even leading #pandemicdreams to trend on Twitter. Yet dreaming is so little understood there can be only speculative answers to the why behind this widespread change in dream behavior.


> "Overfitted Brian Hypothesis" and I gotta say that the opening reminds me a bit of fluff you see in mindless high school essays

My first thought was Monty Python.


Maybe the presence of typos is a way to signal non-authorship by GPT-3?


GPT-3 can produce typos.


Indeed: GPT-3 is liable to produce the same density of typos as its learning corpus, which is from humans.


> ... all DNNs face the issue of overfitting as they learn, which is when performance on one data set increases but the network's performance fails to generalize (often measured by the divergence of performance on training vs testing data sets). This ubiquitous problem in DNNs is often solved by experimenters via "noise injections" in the form of noisy or corrupted inputs. The goal of this paper is to argue that the brain faces a similar challenge of overfitting, and that nightly dreams evolved to combat the brain's overfitting during its daily learning.

Actually, there's compelling evidence that overfitting is a necessary step for achieving state-of-the-art performance with DNNs!

Many state-of-the-art deep learning models today are trained until they achieve ~100% accuracy on the training data, and then we continue to train them because once they go past this "interpolation threshold" they continue learning to generalize better to unseen data. This is known as the "double descent" phenomenon. See, for example:

https://openai.com/blog/deep-double-descent/

https://arxiv.org/abs/1912.02292

https://arxiv.org/abs/1903.08560

The author makes no mention of double descent and the need for overfitting. In fact, he seems completely unaware of it.

--

EDIT: Also, see this comment https://news.ycombinator.com/item?id=23957501 elsewhere on this page.
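
To make the recipe concrete, here's a rough sketch of the training loop I mean (my own toy example, not taken from any of the papers above): keep optimizing a deliberately overparameterized model long after it hits ~100% training accuracy and keep watching the test accuracy. Whether a clean second descent actually shows up depends on the model, data, and label noise.

  # Toy sketch: train an overparameterized MLP far past the interpolation
  # threshold and log train/test accuracy along the way. (PyTorch; the
  # architecture, data, and hyperparameters here are arbitrary.)
  import torch
  import torch.nn as nn

  def make_data(n, seed):
      g = torch.Generator().manual_seed(seed)
      x = torch.randn(n, 20, generator=g)
      y = (x[:, 0] + 0.5 * x[:, 1] > 0).long()
      flip = torch.rand(n, generator=g) < 0.1   # 10% label noise
      return x, torch.where(flip, 1 - y, y)

  x_tr, y_tr = make_data(200, seed=0)
  x_te, y_te = make_data(2000, seed=1)

  torch.manual_seed(0)
  model = nn.Sequential(nn.Linear(20, 512), nn.ReLU(), nn.Linear(512, 2))
  opt = torch.optim.SGD(model.parameters(), lr=0.05)
  loss_fn = nn.CrossEntropyLoss()

  def acc(x, y):
      with torch.no_grad():
          return (model(x).argmax(1) == y).float().mean().item()

  for epoch in range(5001):
      opt.zero_grad()
      loss_fn(model(x_tr), y_tr).backward()
      opt.step()
      if epoch % 500 == 0:   # keep logging long after train accuracy hits 1.0
          print(epoch, "train", round(acc(x_tr, y_tr), 3),
                "test", round(acc(x_te, y_te), 3))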


You should be careful over-interpreting this stuff. Double descent is still poorly understood, and some [1] argue it is an artifact caused by wrongly assuming that the model complexity is a linear function of the number of parameters.

I would also argue that "the need for overfitting" is a consequence of broken benchmarks, rather than a feature of deep learning. Why else would adversarial examples arise?

[1] https://deepai.org/publication/rethinking-parameter-counting...


Isn't double descent explained by the following?

The network contains many more parameters than data points.

Therefore there is an entire region of lowest training error.

A random point somewhere in the middle of that region probably generalises better than one at the edge.

Once SGD enters this region at the edge, generalisation can still occur because the randomness will most likely cause it to random walk inside the region.

To test this hypothesis you could run gradient descent with line search instead of SGD, and then you should not see this extra generalisation. Then if you add a bit of randomness to gradient descent you should see this extra generalisation again, if this hypothesis is correct. Also, under this hypothesis you'd predict that the speed at which generalisation improves depends on the batch size.
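
One crude way to poke at the "SGD noise keeps the walk going" part of this (a toy sketch of my own, not a proper test of the hypothesis): train the same overparameterized model with full-batch gradient descent and with small mini-batches, then compare test accuracy once both have fit the training set.

  # Toy probe: full-batch GD (no sampling noise) vs. small-batch SGD
  # (noisy gradients) on the same model; under the hypothesis above the
  # noisy run should generalize a bit better after both fit the train set.
  import torch
  import torch.nn as nn

  def make_data(n, seed):
      g = torch.Generator().manual_seed(seed)
      x = torch.randn(n, 20, generator=g)
      return x, (x[:, 0] + 0.5 * x[:, 1] > 0).long()

  x_tr, y_tr = make_data(200, seed=0)
  x_te, y_te = make_data(2000, seed=1)

  def run(batch_size, epochs=500):
      torch.manual_seed(0)
      model = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 2))
      opt = torch.optim.SGD(model.parameters(), lr=0.05)
      loss_fn = nn.CrossEntropyLoss()
      for _ in range(epochs):
          perm = torch.randperm(len(x_tr))
          for i in range(0, len(x_tr), batch_size):
              idx = perm[i:i + batch_size]
              opt.zero_grad()
              loss_fn(model(x_tr[idx]), y_tr[idx]).backward()
              opt.step()
      with torch.no_grad():
          return (model(x_te).argmax(1) == y_te).float().mean().item()

  print("full-batch GD, test acc:", run(batch_size=len(x_tr)))
  print("batch-of-8 SGD, test acc:", run(batch_size=8))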


Might be a stupid question, but can we skip the random walk and just pick the middle?


Not a stupid question at all, but one problem is that the boundaries of a zero-train-loss region are not well characterized, and evaluating the validation loss even at a single point is computationally expensive. The centroid of one of these regions might not even be inside it (e.g. a donut shape, but in higher dimensions). Interesting discussion though -- probably worth a few papers if someone were to investigate further.


This just sounds like Stochastic Weight Averaging, which works quite well: https://arxiv.org/abs/1803.05407
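
For reference, PyTorch ships helpers for this in torch.optim.swa_utils; here's a rough usage sketch (the toy data and hyperparameters are mine, not from the paper):

  # Sketch of Stochastic Weight Averaging: keep training with SGD, then
  # average the weights visited late in training into a separate model.
  import torch
  import torch.nn as nn
  from torch.optim.swa_utils import AveragedModel, SWALR
  from torch.utils.data import DataLoader, TensorDataset

  x = torch.randn(512, 20)
  y = (x[:, 0] > 0).long()
  loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

  model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
  opt = torch.optim.SGD(model.parameters(), lr=0.1)
  loss_fn = nn.CrossEntropyLoss()

  swa_model = AveragedModel(model)       # running average of the weights
  swa_sched = SWALR(opt, swa_lr=0.05)    # learning rate used while averaging
  swa_start = 20                         # epoch at which averaging begins

  for epoch in range(40):
      for xb, yb in loader:
          opt.zero_grad()
          loss_fn(model(xb), yb).backward()
          opt.step()
      if epoch >= swa_start:
          swa_model.update_parameters(model)
          swa_sched.step()

  # With BatchNorm layers you'd also recompute their running statistics:
  # torch.optim.swa_utils.update_bn(loader, swa_model)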


Most of the DL models we use have millions of parameters and are already above the "interpolation threshold", which implies they are already being trained in the "double descent" regime. Those papers and others [0] try to explain why models with that many parameters, trained on much less data, still don't overfit and still manage to perform well on test data (opposing the traditional Vapnik-style machine learning argument that the number of model parameters should not exceed the number of data points, otherwise you overfit).

In this paper, I think the authors are not focusing on overfitting the data in the traditional Vapnik sense, but on overfitting to unimportant information within that data. Noise injection and data augmentation are the techniques we use to decorrelate signal from noise in the data so that the networks can focus on the signal and not the noise. Here, sleep and dreams are argued to be that source of "noise" for the brain, decorrelating the signal in the tasks we learn while awake.

[0] https://arxiv.org/abs/1903.07571
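
To make the "noise injection" analogy concrete, here's a minimal sketch of the deep-learning side of it (my own toy example, not from the paper): corrupt each batch before the gradient step so the network can't latch onto input idiosyncrasies.

  # Sketch: noise injection as a regularizer. Inputs are corrupted with
  # Gaussian noise on every pass, one simple form of data augmentation.
  import torch
  import torch.nn as nn

  torch.manual_seed(0)
  x_train = torch.randn(256, 20)
  y_train = (x_train[:, 0] > 0).long()

  model = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 2))
  opt = torch.optim.SGD(model.parameters(), lr=0.05)
  loss_fn = nn.CrossEntropyLoss()
  noise_std = 0.3   # strength of the injected noise (arbitrary here)

  for epoch in range(200):
      corrupted = x_train + noise_std * torch.randn_like(x_train)
      opt.zero_grad()
      loss_fn(model(corrupted), y_train).backward()
      opt.step()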


Not exactly. Those overparametrized models first overfit to the training data in the classical sense, and generalize worse to unseen data, and then, if we keep training past the interpolation threshold, the models start generalizing better. That's why the phenomenon is being called double descent. Injecting noise via dropout or mixup obscures this learning dynamic.


> Actually, there's compelling evidence that overfitting is a necessary step for achieving state-of-the-art performance with DNNs!

I think the term you are looking for is "overparametrization". Overfitting is, by definition, worse generalization performance despite better training performance.


Generalization gets worse, then better. That's why this phenomenon is being called double descent.


But there is no need to use a particularly small NN before using a slightly larger one still in the "ascent" region. (Overfitting is relative)


Agree :-)


This paper argues that dreams are a form of data augmentation. Data augmentation is still useful even given double descent. Double descent is theoretically interesting but every DL practitioner will tell you that overfitting is still a problem.


In the book "Why We Sleep", Matthew Walker suggests something similar: that dream sleep is fundamental in making associations, in this case generalizing and getting rid of overfitting. He talks a bit about problem solving during sleep and how this leads to "a-ha" moments when waking up.

This means that the idea in this paper is already "out there", contrary to what the abstract states. But it's exciting to have a framework to talk about it quantitatively.


I'm foreseeing impending downvotes but I have to rant somewhere. There should be a name for this kind of ridiculous hubris. Unfalsifiable non-insights by people trying to apply arbitrary deep learning algorithms to a brain which is definitely not using any of them. DLcentrism? We don't even know how the brain does almost anything and you think you can use trendy AI topics to explain something as complex and mysterious as dreams?

This reminds me of Matt Walker's terrible book on sleep, which, as with almost all neuroscience research recently, tries to explain "why" we have some behavioral pattern or experience, but literally never offers an explanation at all, opting to say "this region lights up in an fMRI machine", as if that answers anything at all. It's like if you asked "why does the heart pump blood?" and a cardiologist answered, "well, the heart is very important for exercise, and people with healthy hearts live longer, and when we attach electrodes to it we see these interesting patterns associated with pulse and breathing...". That's Matt Walker's book applied to the brain. This allows "neuroscience" to get away with these ridiculously overextended papers, because you can't disprove anything about something so hard to understand in the first place.


I am with you on the latest hype in academia and industry of getting on the Deep Learning train and not being left behind.

In this particular article, the authors propose a hypothesis drawn from the similarity between the overfitting phenomenon seen in neural networks and overfitting in the human brain, with sleep as a noise-injecting mechanism to prevent this overfit.

In order to judge this hypothesis, I would urge you to counter their arguments (esp. sections 3.1 and 3.2) with what you find misleading, incomplete, or outright ridiculous. That way we can get to know both sides of the story.


I'm not the original commenter, but I feel similarly. Perhaps offering some thoughts on sections 3.1 and 3.2 could foster some discussion.

Essentially, the hypothesis is that dreams are a kind of regularization that the brain performs on learning to prevent overfitting. The authors propose a couple of different arguments for why this may be useful (from deep learning) and why dreaming in particular is where this regularization might happen (from neuroscience).

I don't think anyone would object that overfitting is bad and regularization is useful. I might even agree with the author that some kind of anti-Hebbian mechanism or noise corruption may be happening in the brain to help with this. However, the implied analogies in this paper between the tricks used in deep learning and dreams feel really "just so" to me.

I think their main evidence is that:

  1. Dreams facilitate learning
  2. The dreams we experience generally reflect the repeated task we experienced during the day
  3. We often hallucinate variations of environments, not the exact environment we experienced

Overall, I think it is fair to say that the brain does something at night to help improve learning. It's not immediately clear that this is analogous to dropout, training augmentation, or the generative models used in deep learning. It is even less clear whether dreams are truly needed to perform it. The authors don't really dissociate dreaming from sleeping in most of the literature that they cite. Remember that your experience and the weight changes in your brain may not be that highly correlated. Many neural changes may happen during a dream which are not correlated with your conscious perceptions/recollection of that dream.


Human, animal, and machine learning processes do have common points (vide https://p.migdal.pl/2019/07/15/human-machine-learning-motiva...), even if the low-level mechanics are different (at least for biological brains vs GPU operations). We were already puzzled by some similarities (see: "Does AI have a dirty mind, too?" https://medium.com/@marekkcichy/does-ai-have-a-dirty-mind-to...).

> Unfalsifiable non-insights

By all means, it is falsifiable. We can present some training materials and alter the sleep pattern for a fraction of the subjects. If generalization is not affected more than the ability to work with the memorized material, we have falsified it.

> We don't even know how the brain does almost anything and you think you can use trendy AI topics to explain something as complex and mysterious as dreams?

With this approach, we could safely quit all science. After all, it is hubris to think that a mammalian brain can understand the universe.

Insights, ideas, and testable hypotheses offer us a way to make educated guesses. Occasionally they provide a way to understand more.


You can think of DL as a model for exploring huge solution spaces. It is clearly not the only one, but looking at things through certain models sometimes makes much more sense, and DL has been extremely useful. It is true that real neurons and artificial ones are different, but "both are systems that perform complex tasks via the updating of weights within an astronomically large parameter space." The brain is clearly not using the same mechanisms, but it probably has some of the same problems, and looking at it from a DL perspective (which we understand better than the brain) could help us understand it better.


>Unfalsifiable non-insights by people trying to apply arbitrary deep learning algorithms to a brain which is definitely not using any of them. DLcentrism? We don't even know how the brain does almost anything and you think you can use trendy AI topics to explain something as complex and mysterious as dreams?

Mistaking analogies for models.

Analogies (e.g., "machine learning" and "neural network") can help introduce ideas in a discussion. And if the analogy sparks new questions, that's also fine. One just needs to come up with a valid model and run experiments.

But some people try to harvest new insights without leaving the analogy. This is worsened when they do experiments to find supporting evidence instead of trying to disprove the hypotheses.


The answer I got from Matt Walker's "Why We Sleep" was a list of benefits that we get from sleep and a list of negatives that we avoid. That is a sufficient answer for me. To ask why we need those benefits is a different question and eventually goes down a philosophical path.


Personally, I think that most behaviours/attributes have evolved out of the basic laws of life (survive and reproduce) and of physics, especially energy-related constraints. Based on that, the most obvious basis for sleep is energy saving.

I believe that 'sleep' has been observed in animals as simple as jellyfish, so any neuroscience-centred explanation is bound to miss the root cause, in my view. But, as often happens in evolution, the development of nervous systems may have taken advantage of pre-existing sleep patterns.


'Why We Sleep' has some, uh, issues on the benefits-and-negatives front too.

https://guzey.com/books/why-we-sleep


Thanks for posting - it's always good to see all views. It's disappointing because the book helped me a lot: it helped me understand sleep and how I should change my lifestyle to get more of it. I feel (though I have no way to measure it) that it has benefited me.

I will not regard it as scientifically accurate now though.

Does anyone have any other books about sleep that they could recommend?


As someone working at the intersection of neuroscience and machine learning: thank you, very much. You might enjoy this book we use as a bible in our lab: http://cognet.mit.edu/book/principles-of-neural-design


Putting it out there as an idea might inspire others and might help.

Machine learning is the closest thing we have, as far as I know.


> Putting it out there as an idea might inspire others and might help.

Yeah, I agree. Maybe it's broscience, but maybe it will spark something interesting, and to me that's worth trying and sharing.

That said, to be honest I didn't read past the abstract as I don't have the time, but cool idea :)


>ridiculous hubris

>definitely not using any of them


Aha, this was subtle. Nice catch of the poster's own "ridiculous hubris" in assuming that the brain is "definitely not using any" algorithms.


When complex mechanical machines were big, we imagined pneumatic brains. Then we imagined electronic, digital, and now quantum computing brains. We also started to see the brain through the lens of these systems and how they work.

Meanwhile the brain continues to work however the brain works.


Once people talked about "iron horses". Now we have walking robots. From a current perspective, trains are nothing like horses; does that mean it's invalid to say that designing and making walking robots requires/provides insight into the physiology of legs?

It seems to me you're making a very general statement that there is no progress in understanding because people used to vastly overestimate their understanding. People did not know stuff, and therefore they will never know stuff, and therefore they do not know stuff.


I'm not arguing that our understanding hasn't advanced, only that our models are constrained by our internal conceptual and external verbal/written vocabulary. It's possible that brains are doing things that we have no thoughts or words for because they don't resemble any system we've ever dealt with.


> does that mean it's invalid to say that designing and making walking robots requires/provides insight into the physiology of legs?

Yes. That sounds invalid.


That sounds overly binary to me. (said with conscious irony)

You are saying that the informational inputs and outputs for walking robots have exactly the same amount of relevance to biological limbs and biological control of walking as the engineering of steam locomotives?

Obviously they have aspects that are nothing like walking creatures. But the more they solve the same problems, the more they have to be similar, either intentionally or accidentally.


They're just advancing a hypothesis. If you read the paper you'll find it is written quite humbly, makes a good case for why the hypothesis is not crazy, and makes testable predictions which could falsify the hypothesis.


No downvotes. I agree with you.

I can guarantee you that deep NN/RL researchers are thinking of these hypotheses every other week. But they don't publish them. Even I, who got more active in this kind of research very recently, came up with the exact same hypothesis. I didn't publish.

We don't publish because we have a strong sense of empiricism. The proverbial proof has to be in the pudding. We should be able to set up a DNN and run experiments. If they confirm our hypothesis, we publish. If they disprove our hypothesis, we move on to something else.

These armchair neuroscientists/psychologists come along, type out a dozen pages of brain-dump (no pun intended), and think they're doing science. They need to do better than that and do some real work with hands-on deep learning and/or experimental neuroscience (or collaborate with someone who can).


This comment on its own can be read as testimony that covid-19 self-isolation / social distancing can cause a compulsive need to try to feel relevant in some way when you meet too few people in situations where you are actually relevant. So yeah, I know it is somewhat preposterous to claim any relevancy, but one has to try to cope in some way.

I'm only just "DNN-literate" enough to get the broader points of the better-written papers. I certainly don't know enough to give any criticism on my own, but in a somewhat funny way, statistics and my experience could perhaps lend some additional weight to your argument.

I'm pretty sure this is very similar, if not identical, to a hypothesis I've amused myself with from time to time. I don't even think I knew what the word overfitting meant at the point I came up with it, though the ideas involved seem to boil down to the same thing, or close enough to gauge how common this line of thinking would be.

It's a sample of one, but since I don't believe myself to be that special, and don't believe others would believe so either, it adds some weight that this line of thinking might be relatively common among anyone with even a little knowledge in the field(s).

Perhaps one could even argue I ought to be relatively bad at coming up with novel (creative) ideas, and ideas which are not precluded by known facts. At least partially because my brain would have a higher chance of being overfitted to the few bits I know. Any of my ideas should then either be shallow or false with high probability.


There are way way worse people out there than the ones who put up curious papers on a pre-print server for others to read for free.


> There should be a name for this kind of ridiculous hubris

Pseudoscience? To me this paper is no different from astrology.


> There should be a name for this kind of ridiculous hubris. Unfalsifiable non-insights by people trying to apply arbitrary deep learning algorithms to a brain which is definitely not using any of them.

Galaxy brain-ism


"Notably, all DNNs face the issue of overfitting as they learn, which is when performance on one data set increases but the network's performance fails to generalize (often measured by the divergence of performance on training vs testing data sets)."

Not really. For example, "Gradient Methods Never Overfit On Separable Data" https://arxiv.org/abs/2007.00028


"In this paper, we consider the implicit bias in a well-known and simple setting, namely learning linear predictors (x->x'w) for binary classification with respect to linearly-separable data"

Hoping that this applies to deep neural networks is a huge leap of faith to be honest.
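
For a sense of how simple that setting is, here's a small NumPy sketch (mine, not from the paper): gradient descent on logistic loss over linearly separable data, where the loss keeps shrinking indefinitely while the predictor's direction settles toward a large-margin separator rather than degrading.

  # Sketch of the paper's setting: a linear predictor trained by gradient
  # descent on logistic loss over linearly separable data.
  import numpy as np

  rng = np.random.default_rng(0)
  n, d = 200, 5
  w_true = rng.normal(size=d)
  X = rng.normal(size=(n, d))
  y = np.sign(X @ w_true)            # labels in {-1, +1}, separable by w_true

  w = np.zeros(d)
  lr = 0.1
  for step in range(10001):
      margins = y * (X @ w)
      grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
      w -= lr * grad
      if step % 2000 == 0:
          loss = np.log1p(np.exp(-margins)).mean()
          print(step, "loss", round(float(loss), 4),
                "min margin", round(float(margins.min()), 3))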


Wow. Never before has an abstract felt so mind-blowing to read.


Came here to say the same.

At least personally, it's the insight into human brain mechanics and new abilities to test hypotheses in this field that gets me most excited about deep learning developments, rather than the improvements in performance on various tasks.


If you’re on a phone, here’s an HTML version: https://www.arxiv-vanity.com/papers/2007.09560/


"The goal of this paper is to argue that the brain faces a similar challenge of overfitting, and that nightly dreams evolved to combat the brain's overfitting during its daily learning....Sleep loss, specifically dream loss, leads to an overfitted brain that can still memorize and learn but fails to generalize appropriately."

This is a beautiful idea. Will have to read the whole paper to understand how they support this claim.


While an interesting hypothesis, we should always be careful with research that tries to derive biological understanding from AI research. AI is, by necessity, a simplification of how our brains work. For example, current neural networks really only have one definition of a neuron, but biological neurons can be very different: there are hundreds of types of neurons. Even if we limit the definition of neuron type to just describing variation in switching behavior, mice have been found to have 19 distinct neuron types with distinct switching behavior, and humans likely have dozens more.


I wonder what psychology as a discipline will look like in twenty years? I feel like what we're learning now about the human mind through our study of neural networks is similar to the early work in cellular biology that eventually replaced metaphor-based models in medicine (e.g. 'humors') with more ontological ones. Freud's Ego, Superego and Id are gone. So are 'complexes'. Our model of the human mind currently seems to be limited to 'conscious and subconscious'. I'm excited at the prospect of something much better.


To the best of my knowledge, I have ever dreamed. The abstract speaks of the dangers of not dreaming from lack of sleep and I wonder if the same thing applies to naturally not dreaming at all?


It’s far more likely that you just don’t remember your dreams.


I’ve been alive for almost 50 years. I would think that if I dreamed, I would have been aware of it at least once. That is approximately 20,000 times I have woken up without any recollection of anything since the moment I fell asleep.


What’s a dream then?


A dream is the brain coming back online after being shut down for maintenance.

That's my supposition. It's an artifact, not a feature.


But that's a model of how you think it works. It's not what a dream is.


> To the best of my knowledge, I have ever dreamed.

This typo sounds profoundly, of a butterfly; vague and formless, overgeneralizing.


I love it when two disciplines come together to find new solutions to scientific problems, but something seems off here. An artificial neural network is incredibly simplistic compared to the messy complexity of the brain. Generalization seems to happen in some places and for some dreams, but is that really the only function of dreams? I somehow doubt it.


If we’re going down this road of theorizing about the human brain based on DNNs, what is the deal with dropout? Could we help human brains with generalization by randomly removing 10% of our newly created connections at the end of each day to improve long term learning? :)


That’s called synaptic pruning and, while it mostly happens as a human matures, there’s evidence indicating that it occurs during sleep in adults to help consolidate the most important connections and remove the unimportant ones. It’s not exactly like dropout, but at a high level it kind of looks like it.
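
For anyone wanting the deep-learning side spelled out: dropout zeroes a random subset of activations on every training forward pass and is switched off at evaluation time, whereas pruning removes connections permanently. A tiny sketch (my own example):

  # Sketch: dropout in PyTorch. Active only in train() mode; a no-op in eval().
  import torch
  import torch.nn as nn

  model = nn.Sequential(
      nn.Linear(20, 128),
      nn.ReLU(),
      nn.Dropout(p=0.1),   # zero ~10% of activations on each forward pass
      nn.Linear(128, 2),
  )

  x = torch.randn(4, 20)
  model.train()
  print(model(x))          # stochastic: dropout active
  model.eval()
  print(model(x))          # deterministic: dropout disabled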


I skimmed the paper so couldn't possibly give it a fair review, but I always feel there's something off when people make comparisons of ANNs to actual biological brains. Even more so when it's the other way around.


Is this a form of mechanomorphism, where we try to reason about how human cognition might work by drawing an analogy from how computers work (specifically, overfitting in ANNs) and try to apply it back to humans?


Seems like a testable hypothesis: Prepare a "training" deck of cards with an A side and a B side. Each side has a simple symbol on it. Create a second "test" deck of cards which is identical, except each A symbol is slightly shifted in meaning (e.g. horse -> donkey). The task is to predict side B from side A.

If less dreaming leads to overfitting, we would expect REM sleep deprived probands to do better on the "training" set after learning with them and do worse on the "test" set, compared to probands without sleep deprivation.


This paper isn't about sleep deprivation, it's about dreams.


Right, REM-phase interruption would be sufficient (according to this theory). There are also a bunch of additional variables that one likely cannot (easily) control for. Still, this theory should make predictions of the sort that more overfitting occurs in the absence of dreaming.


I recall this being essentially the inspiration for the naming of the wake-sleep algorithm used to train Helmholtz machines.


In the paper: "It is worth noting that the proposal of a "wake/sleep" specific algorithm for unsupervised learning of generative models based on feedback from stochastic stimulation goes back 25 years (Hinton et al., 1995)"


How about daydreaming? Does that work too?



