The high number of typos throws me a bit though. Amusingly they even misspell their central idea "Overfitted Brian Hypothesis" and I gotta say that the opening reminds me a bit of fluff you see in mindless high school essays.
> During the Covid-19 pandemic of 2020, many of those in isolation reported an increase in the vividness and frequency of their dreams (Weaver, 2020), even leading #pandemicdreams to trend on Twitter. Yet dreaming is so little understood there can be only speculative answers to the why behind this widespread change in dream behavior.
My first thought was Monty Python.
Actually, there's compelling evidence that overfitting is a necessary step for achieving state-of-the-art performance with DNNs!
Many state-of-the-art deep learning models today are trained until they achieve ~100% accuracy on the training data, and then we continue to train them, because once they go past this "interpolation threshold" they continue learning to generalize better to unseen data. This is known as the "double descent" phenomenon. See, for example, Belkin et al. (2019), "Reconciling modern machine learning practice and the classical bias-variance trade-off", and Nakkiran et al. (2019), "Deep Double Descent: Where Bigger Models and More Data Hurt".
The author makes no mention of double descent and the need for overfitting. In fact, he seems completely unaware of it.
EDIT: Also, see this comment https://news.ycombinator.com/item?id=23957501 elsewhere on this page.
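To make it concrete, here's a minimal toy sketch of double descent (my own illustration, not from the paper) using random ReLU features and the minimum-norm least-squares fit, in the spirit of Belkin et al.; test error typically spikes near the interpolation threshold (p ≈ n_train) and then descends again as the model grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task: noisy samples of a smooth target.
n_train, d = 40, 1
X_train = rng.uniform(-1, 1, (n_train, d))
X_test = rng.uniform(-1, 1, (500, d))
y_train = np.sin(2 * np.pi * X_train).ravel() + 0.3 * rng.standard_normal(n_train)
y_test = np.sin(2 * np.pi * X_test).ravel()

def features(X, W, b):
    # Random ReLU features: a fixed, random hidden layer.
    return np.maximum(X @ W + b, 0.0)

for p in [5, 10, 20, 40, 80, 320, 1280]:  # model-size sweep
    W, b = rng.standard_normal((d, p)), rng.standard_normal(p)
    Phi_tr, Phi_te = features(X_train, W, b), features(X_test, W, b)
    # Minimum-norm least squares: the solution (S)GD converges to
    # once the model can interpolate the training data.
    theta = np.linalg.pinv(Phi_tr) @ y_train
    print(f"p={p:5d}  train MSE={np.mean((Phi_tr @ theta - y_train) ** 2):.4f}"
          f"  test MSE={np.mean((Phi_te @ theta - y_test) ** 2):.4f}")
```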
I would also argue that "the need for overfitting" is a consequence of broken benchmarks, rather than a feature of deep learning. Why else would adversarial examples arise?
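For anyone who hasn't seen one: an adversarial example is an input nudged imperceptibly, but in the worst-case direction for the model, so that a network with near-perfect benchmark accuracy still gets it wrong. A minimal sketch of the classic FGSM attack (Goodfellow et al., 2014), assuming some already-trained PyTorch classifier `model`:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    # Fast Gradient Sign Method: shift every input dimension by +/- eps
    # in whichever direction increases the classification loss.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# A model that classifies x correctly will often misclassify
# fgsm(model, x, y), even though the two look identical to a human.
```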
The network contains many more parameters than data points.
Therefore there is an entire region of lowest training error.
A random point somewhere in the middle of that region probably generalises better than one at the edge.
Once SGD enters this region at the edge, generalisation can still occur because the randomness will most likely cause it to random walk inside the region.
To test this hypothesis you could run gradient descent with line search instead of SGD, and then you should not see this extra generalisation. Then if you add a bit of randomness to gradient descent you should see this extra generalisation again, if this hypothesis is correct. Also, under this hypothesis you'd predict that the speed at which generalisation improves depends on the batch size.
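A rough sketch of that test (my own toy setup, substituting plain full-batch gradient descent for line search): train the same overparameterized MLP with and without minibatch noise, run well past zero training error, and compare test error. Under the hypothesis, only the noisy run should keep improving generalisation, and the effect should weaken as the batch size grows:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression task with a heavily overparameterized MLP (hypothetical setup).
X = torch.rand(64, 1) * 2 - 1
y = torch.sin(3 * X) + 0.1 * torch.randn_like(X)
X_test = torch.rand(512, 1) * 2 - 1
y_test = torch.sin(3 * X_test)

def train(batch_size, steps=20000, lr=1e-2):
    torch.manual_seed(1)  # same initialization for both runs
    net = nn.Sequential(nn.Linear(1, 512), nn.ReLU(), nn.Linear(512, 1))
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randperm(X.shape[0])[:batch_size]  # batch_size == 64 is full-batch GD
        loss = ((net(X[idx]) - y[idx]) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return (((net(X) - y) ** 2).mean().item(),
                ((net(X_test) - y_test) ** 2).mean().item())

for bs in (64, 8):  # no gradient noise vs. noisy minibatches
    tr, te = train(bs)
    print(f"batch={bs:2d}  train MSE={tr:.5f}  test MSE={te:.5f}")
```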
In this paper, I think the authors are not focusing on overfitting on the data in the traditional Vapnik sense, but overfitting on unimportant information within that data. Noise injection and data augmentation are the techniques we use to de-correlate signal from noise in the data so that the networks can focus on signal and not on noise. Here sleep and dreams are argued to be that source of "noise" for our brain to decorrelate "signal" from the tasks we learn while awake.
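As a concrete example of that de-correlation idea, a minimal sketch of input noise injection (a hypothetical training-loop fragment, not anything from the paper):

```python
import torch

def augment(x, sigma=0.1):
    # Input noise injection: corrupt the nuisance details of each example
    # so the network can't memorize them, while the underlying signal,
    # what the examples have in common, survives on average.
    return x + sigma * torch.randn_like(x)

# In the training loop the model never sees the same example twice:
#   loss = criterion(model(augment(x_batch)), y_batch)
```

The paper's analogy, as described above, is that dreams play this corrupting-replay role for the brain.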
I think the term you are looking for is "overparametrization". Overfitting is, by definition, worse generalization performance despite better training performance.
This means that the idea in this paper is already "out there", contrary to what the abstract states. But it's exciting to have a framework to talk about it quantitatively.
This reminds me of Matt Walker's terrible book on sleep, which, as with almost all neuroscience research recently, tries to explain "why" we have some behavioral pattern or experience, but literally never offers an explanation at all, opting to say "this region lights up in an fMRI machine", as if that answers anything at all. It's like if you asked "why does the heart pump blood?" and a cardiologist answered, "well, the heart is very important for exercise, and people with healthy hearts live longer, and when we attach electrodes to it we see these interesting patterns associated with pulse and breathing...". That's Matt Walker's book applied to the brain. This allows "neuroscience" to get away with these ridiculously overextended papers, because you can't disprove anything about something so hard to understand in the first place.
In this particular article, the authors propose a hypothesis drawn from the similarity between the over-fitting phenomenon seen in neural networks and over-fitting in the human brain, with sleep as a noise-additive approach to prevent this overfit.
In order to judge this hypothesis, I would urge you to counter their arguments (especially sections 3.1 and 3.2) with what you find misleading, incomplete, or outright ridiculous. That way we can get to know both sides of the story.
Essentially, the hypothesis is that dreams are a kind of regularization the brain applies to learning in order to prevent overfitting. The authors propose a couple of different arguments for why this may be useful (from deep learning) and why dreams in particular are where this regularization might happen (from neuroscience).
I don't think anyone would object that overfitting is bad and regularization is useful. I might even agree with the author that some kind of anti-Hebbian or noise corruption may be happening in the brain to help with this. However, the implied analogies in this paper between the tricks used in deep learning and dreams feel really "just so" to me.
I think their main evidence is that:
1. Dreams facilitate learning
2. The dreams we experience generally reflect the tasks we repeatedly performed during the day
3. We often hallucinate variations of environments, not the exact environment we experienced
> Unfalsifiable non-insights
By all means, it is falsifiable. We can present some training materials and alter the sleep pattern for a fraction of the subjects. If generalization is not affected more than the ability to work with the memorized material, we have falsified it.
> We don't even know how the brain does almost anything and you think you can use trendy AI topics to explain something as complex and mysterious as dreams?
With this approach, we can safely quit all science. After all, it is hubris to think that a mammalian brain can understand the universe.
Insights, ideas, and testable hypotheses offer us a way to make educated guesses. Occasionally they provide a way to understand more.
Mistaking analogies for models.
Analogies (e.g., "machine learning" and "neural network") can help introduce ideas in a discussion. And if the analogy sparks new questions, that's also fine. One just needs to come up with a valid model and run experiments.
But some people try to harvest new insights without leaving the analogy. This is worsened when they do experiments to find supporting evidence instead of trying to disprove the hypotheses.
I believe that 'sleep' has been observed in animals as simple as jellyfish so any neuroscience-centred explanation is bound to miss the root cause, in my view. But, as often in evolution, the development of nervous systems may have taken advantage of pre-existing sleep patterns.
I will not regard it as scientifically accurate now, though.
Does anyone have any other books about sleep that they could recommend?
Machine learning is the closest thing we have, as far as I know.
Yeah, I agree. Maybe it's broscience, but maybe it will spark something interesting, and to me that's worth trying and sharing.
That said, to be honest I didn't read past the abstract as I don't have the time, but cool idea :)
> definitely not using any of them
Meanwhile the brain continues to work however the brain works.
It seems to me you're making a very general statement that there is no progress in understanding because people used to vastly overestimate their understanding. People did not know stuff, and therefore they will never know stuff, and therefore they do not know stuff.
Yes. That sounds invalid.
You are saying that the informational inputs and outputs for walking robots have exactly the same amount of relevance to biological limbs and biological control of walking as the engineering of steam locomotives?
Obviously they have aspects that are nothing like walking creatures. But the more they solve the same problems, the more they have to be similar, either intentionally or accidentally.
I can guarantee you that deep NN/RL researchers are thinking of these hypotheses every other week. But they don't publish them. Even I, who got more active in this kind of research very recently, came up with the exact same hypothesis. I didn't publish.
We don't publish because we have a strong sense of empiricism. The proverbial proof has to be in the pudding. We should be able to set up a DNN and run experiments. If it confirms our hypothesis, we publish. If it disproves our hypothesis, we move on to something else.
These armchair neuroscientists/psychologists come along, type out a dozen pages of brain-dump (no pun intended), and think they're doing science. They need to do better than that: do some real hands-on work with deep learning and/or experimental neuroscience (or collaborate with someone who can).
I'm only just "DNN-literate" enough to get the broader points of the more well-written papers. I certainly don't know enough to offer criticism on my own, but in a somewhat funny way, statistics and my own experience could perhaps lend some additional weight to your argument.
I'm pretty sure this is very similar to one hypothesis I've amused myself with from time to time, if not exactly the same. I don't think I even knew what the word overfitting meant when I came up with it. Still, the ideas involved would seem to boil down to the same thing, or close enough to gauge how common this line of thinking would be.
It's a sample of one, but since I don't believe myself to be that special, and don't believe others would believe so either, it adds some weight that this line of thinking might be relatively common among anyone with even a little knowledge in the field(s).
Perhaps one could even argue I ought to be relatively bad at coming up with novel (creative) ideas, and ideas which are not precluded by known facts. At least partially because my brain would have a higher chance of being overfitted to the few bits I know. Any of my ideas should then either be shallow or false with high probability.
Pseudoscience? To me this paper is no different from astrology.
Not really. For example, "Gradient Methods Never Overfit On Separable Data"
Hoping that this applies to deep neural networks is a huge leap of faith to be honest.
At least personally, it's the insight into human brain mechanics and new abilities to test hypotheses in this field that gets me most excited about deep learning developments, rather than the improvements in performance on various tasks.
This is a beautiful idea. Will have to read the whole paper to understand how they support this claim.
That's my supposition. It's an artifact, not a feature.
This typo reads profoundly, like the dream of a butterfly; vague and formless, overgeneralizing.
If less dreaming leads to overfitting, we would expect REM-sleep-deprived subjects to do better on the "training" set they learned from and worse on the "test" set, compared to subjects without sleep deprivation.
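A sketch of how the analysis could look, with entirely hypothetical placeholder scores; the overfitting signature would be a larger train-test gap in the deprived group, not just lower test scores:

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject accuracy scores (placeholders, not real data).
# Each subject is tested on the memorized material ("train") and on
# novel variations of it ("test").
deprived_train = np.array([0.92, 0.95, 0.90, 0.93, 0.94])
deprived_test  = np.array([0.61, 0.58, 0.65, 0.60, 0.63])
control_train  = np.array([0.88, 0.87, 0.90, 0.86, 0.89])
control_test   = np.array([0.74, 0.71, 0.76, 0.73, 0.75])

# Compare generalization gaps between groups rather than raw test scores.
gap_deprived = deprived_train - deprived_test
gap_control = control_train - control_test
t, p = stats.ttest_ind(gap_deprived, gap_control)
print(f"mean gap: deprived={gap_deprived.mean():.2f}, "
      f"control={gap_control.mean():.2f}, p={p:.4f}")
```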