but many others exist. I'd write more but typing on my phone is driving me to distraction.
I'd like to learn more about these things - brain regions, connections, functions - and what they might imply about the kinds of computations that are going on, but my background is mainly on the AI/math side of things.
Modern applications of small networks regularly shrink larger state-of-the-art networks using distillation, which compacts a neural network while affecting its accuracy only minimally.
Instead of pruning the large network directly, you just learn how it generalizes. The result takes fewer nodes / fewer overall operations (multiplications / additions).
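As a rough sketch of what that looks like in practice (my own hypothetical PyTorch code, not from any particular paper): the small "student" network is trained to match the softened output distribution of a frozen "teacher", blended with the usual hard-label loss.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        # Soft targets: the teacher's output distribution at temperature T.
        soft_targets = F.softmax(teacher_logits / T, dim=1)
        soft_student = F.log_softmax(student_logits / T, dim=1)
        # The KL term is scaled by T^2 to keep gradient magnitudes comparable.
        soft_loss = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
        # Ordinary cross-entropy on the true labels.
        hard_loss = F.cross_entropy(student_logits, labels)
        return alpha * soft_loss + (1 - alpha) * hard_loss

    # Training loop sketch: the teacher is frozen, only the student is updated.
    # (teacher, student, loader and optimizer are assumed to exist.)
    # for x, y in loader:
    #     with torch.no_grad():
    #         t_logits = teacher(x)
    #     loss = distillation_loss(student(x), t_logits, y)
    #     optimizer.zero_grad(); loss.backward(); optimizer.step()

The temperature and blending weight here are arbitrary; the point is just that the student is fit to how the teacher generalizes rather than to the raw training labels alone.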
Also "combine" might not be the right word, since it's really transfer learning. "Distill" is really a descriptive verb.
Maybe my original wording was confusing; I shouldn't have said "distillation compacts" -- distillation is a process by which you can create a more compact version of a complex neural net.
You could train a model using neurogenesis to increase its accuracy, and then use distillation to train a smaller network to comparable accuracy.
But these are two very different, but complementary, problems.
On the other hand, giving the model more options than it necessarily needs and letting it decide what is important will usually backfire. Rather than learning a few meaningful/functional features, it can just go ahead and completely fit the training data from the very beginning. It will therefore decide that everything is important, because all those extraneous parameters will let it squeeze that last 0.5% out of your training set.
It seems like almost all effort is spent on the former, since everyone's aiming for higher accuracy numbers. Are there any widely-used methods to tackle the latter?
For example, I'm imagining a system which is either given measurements of its resource usage (time, memory, etc.) or uses some simple predictive model (e.g. time ~ number of layers * some constant), and works within some resource bound (rough sketch after the list):
- If we're below the bound, expand the model (add neurons, etc.) to allow accuracy increases (note "allow": it's ok to ignore/regularise-to-zero the extra parameters to avoid overfitting)
- If we're above the bound, prune the model (in a way which tries to preserve accuracy)
- Allocate resources to optimise some objective, e.g. reduce variance by pruning the parameters of the best-performing class/predictor/etc. and using those resources to expand the worst performer.
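In very rough, hypothetical Python (every function name here is a placeholder I made up, not a real library API), the loop I have in mind looks something like:

    def adapt_under_budget(model, data, budget, steps=100):
        # Hypothetical controller: train_step, measure_cost, per_class_error,
        # grow and prune are all placeholders for whatever your stack provides.
        for _ in range(steps):
            train_step(model, data)              # ordinary gradient-based training
            cost = measure_cost(model, data)     # e.g. wall-clock time or memory per batch
            errors = per_class_error(model, data)
            if cost < budget:
                # Below the bound: allow growth; the extra parameters can be
                # regularised toward zero so they don't immediately overfit.
                grow(model, where=errors.argmax())    # expand the worst performer
            elif cost > budget:
                # Above the bound: shrink while trying to preserve accuracy,
                # e.g. prune small-magnitude weights of the best performer.
                prune(model, where=errors.argmin())
        return model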
The closest thing I know of is artificial economies, but they seem to be more like a selection mechanism (akin to genetic programming) than a direct optimisation procedure (like gradient descent on an ANN).
For that, check out our OpenReview ICLR submission on
NEUROGENESIS-INSPIRED DICTIONARY LEARNING: ONLINE MODEL ADAPTION IN A CHANGING WORLD, by
Sahil Garg, Irina Rish, Guillermo Cecchi, Aurelie Lozano
This paper looks like it builds on pretty well-known techniques like stacked autoencoders, so let's see what first-order noteworthiness data we can gather from a quick skim of the paper. If I had to guess why it wasn't accepted into a better conference:
- It uses stacked autoencoders, which are pretty out of fashion
- It bothers reporting results on MNIST
- (more subjectively) It pulls an unfortunately common technique of saying "here's something the brain does" and then hand-waving that it's a deep reason why a technique they've come up with is useful, when in fact the relationship is just "inspired by the general idea of", not "performs the same function as" the biological mechanism. In this case, I think the tenuous connection of their technique to research on neurogenesis is pretty flimsy. Clearly neurogenesis is not how an adult human brain forms new memories or gains proficiency in new skills (which they acknowledge in the conclusion)
> If you're using the conference as a quick pass/fail as you skim through the abstracts of hundreds of papers, ok
You answered your own statement, I think. Most researchers will skip a paper in a second tier conference. In fact, most I know won't read an entire paper - they'll only read some of it and skip stuff.
You're correct that I am not an active researcher (otherwise I would not have time to be commenting). I merely did some research back in college. But honestly that little experience gives me a huge leg up on most HN commenters in understanding research. It's unfortunate that the only reason this paper is #1 on HN is because it has a cool title.
That being said, MNIST is not really a disqualifier. (Unfortunately) MNIST is the most popular dataset referenced in NIPS 2016 papers (https://twitter.com/benhamner/status/805864969065689088). The handwaving is also forgivable; many NIPS papers handwave a lot too.
NEUROGENESIS-INSPIRED DICTIONARY LEARNING: ONLINE MODEL ADAPTION IN A CHANGING WORLD
Sahil Garg, Irina Rish, Guillermo Cecchi, Aurelie Lozano
It won't be a 'general ai', though. More like a set of loosely connected systems that operate 'in the best interests of the shareholders', however that's defined.
It's pretty much the end state of the trend of pushing decision making to algorithms to remove moral and legal culpability from individuals.
I have plenty of wants and desires that could take a whole army of idiot savants working 24/7 to fulfill.
Then again.. I also don't understand how researchers come up with a yearly budget for making discoveries.
Click on "FC Forecasting" near the top to limit the list to those about AI predictions, including:
- The Errors, Insights and Lessons of Famous AI Predictions - and What They Mean for the Future
- Predicting AGI: What Can We Say When We Know So Little?
- How We're Predicting AI - or Failing To
And the thing is, we aren't exactly sure why that is... it's amazing.
Sometimes the best thing we can do is imitate nature
I think that slights neuroscience to a fair degree, since it has devoted the past 60 years to answering this question. But I agree that the biomimetic motivations offered up for various flavors of neural net feel pretty bogus. It seems to me like, among the major old-school researchers in the field, only Geoff Hinton still does this.
I could say the same about genetics, btw. Biology has turned out to be incredibly complex.
This part is a bit similar, no?
> ANNs have more in common with a CPU than a brain
How so? which parts are similar?
Here is a list of properties that ANNs share with CPUs and that differ from brains:
* Synchronized activation vs. asynchronous / partially synchronous activation
* Digital signals vs. analog signals
* Instantaneous transmission of signals vs. delay imposed by axon and dendrite length
* Uniform signal vs. use of various neurotransmitter signals
* Rapid activation speed (GHz) vs. slow activation speed (Hz)
* The use of negative signals vs. strictly positive quantities of neurotransmitters
* Low average connections (10-1000) vs. high average connections (5,000-100,000)
* Low energy efficiency vs. high energy efficiency
For a detailed essay on the topic, see: http://timdettmers.com/2015/07/27/brain-vs-deep-learning-sin...
Sure, some of the properties of biological NNs do not transfer well to ANNs, as someone pointed out in a comment here with an article showing that if you apply the same kind of signal it doesn't work.
But the fact remains: we have been more successful at advancing AI by trying to emulate parts of our brain than we were with other techniques.
We didn't know that this would happen when it all started, but it did.
No one could look at a model of a yet-to-be-implemented ANN and say, out of the blue, whether it would work and why. It has all been experimentation, taking the brain as a raw blueprint.
And although many other phenomena observed in the brain didn't work well with ANNs, neurogenesis apparently did.
It's impressive IMO and quite humbling that we are getting so many achievements out from mimicking nature, and we aren't 100% sure why it worked in the first place.
That's all I meant to say.
Would there be anything to gain by simulating this?
* Local, regular structure vs. irregular structure with global elements
Spiking Neural Networks attempt to be more accurate representations of human neurons, but haven't really caught on because they aren't much better than our perceptron model of neurons, at least for the things we are trying to do with them.
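For what it's worth, the basic unit in an SNN is usually something like a leaky integrate-and-fire neuron; here's a minimal numpy sketch of the idea (my own illustration, nothing to do with the paper):

    import numpy as np

    def leaky_integrate_and_fire(input_current, dt=1.0, tau=20.0, v_rest=0.0, v_thresh=1.0):
        # Membrane potential leaks toward rest, integrates the input current,
        # and emits a discrete spike (then resets) whenever it crosses threshold.
        v, spikes = v_rest, []
        for i_t in input_current:
            v += dt * (-(v - v_rest) + i_t) / tau
            if v >= v_thresh:
                spikes.append(1)
                v = v_rest
            else:
                spikes.append(0)
        return np.array(spikes)

    # A constant driving current produces a regular spike train: the "output"
    # is a pattern of events in time, not a single real-valued activation.
    print(leaky_integrate_and_fire(np.full(100, 1.5)).sum())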
Neurons are also a family of cells, and are very diverse in shapes and functions. We tend to oversimplify our representation of neurons. There are simple neurons and then you have neurons like the Purkinje cell that are massive.
Neurons also rely on their counterparts, the glial cells, that are much less often mentioned.
I think because of this, it will be a while until we fully understand the role of each one of them.
So much so that one of the pioneers of this was discredited by other scientists who, for some reason, simply could not accept that these ideas would work. That same pioneer saw his funding dry up and came to believe his opposition so completely that he sailed away to his death, some argue intentionally (as his life's work had come to be seen, even in his own eyes, as useless).
Now I'm having a hard time remembering the names of the people in that story; if someone knows who I'm talking about, please remind me.
- real neurons are stochastic and communicate through spikes, artificial neurons can communicate real values efficiently
- real neurons are more like automatons, they have a dynamic in time, learning happens as a continuous interaction with only its neighbors; artificial neurons are "static" (use discrete time) and implemented by forward and backward pass, and also can use nonlocal information
- real neurons can't backpropagate, because backprop requires the transmission of gradients back the same connections, but in reverse - brain connections don't support that kind of bidirectional data flow; artificial neurons work best by backprop
- real neurons can't implement convolutions, since that would require a neuron to slide over a field; real neurons also can't implement RNNs as they are, and don't use backpropagation through time (BPTT)
So, artificial neurons are much less hampered and can do many things that real neurons can't do, or could only do by some less efficient method. That means brain neurons still have some tricks up their sleeve. Artificial neurons are quite different from brain neurons, and rightly so, because they can be more efficient that way.
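To make the backprop point concrete, here's a tiny numpy sketch (my own illustration): the error has to travel back through the transpose of the very same weight matrix used in the forward pass, which is exactly the bidirectional use of connections that one-way chemical synapses don't obviously provide.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 5))        # one layer: 5 inputs -> 3 outputs
    x = rng.normal(size=5)
    target = rng.normal(size=3)        # a made-up target for the layer output

    # Forward pass: the signal flows forward through W.
    h = np.tanh(W @ x)

    # Backward pass: the gradient flows back through W.T -- the same
    # connections used in reverse, plus knowledge of each unit's derivative.
    grad_h = h - target                # error signal from above
    grad_z = grad_h * (1 - h ** 2)     # tanh derivative
    grad_W = np.outer(grad_z, x)       # local weight update
    grad_x = W.T @ grad_z              # non-local: needs the transposed weights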
Why attribute the idea of introducing new nodes to a graph to biological concepts? It seems like a simple step in exploration, similar to how one might think to vary the weights of the nodes randomly over some range... unless there is some technique biology uses to pre-configure the nodes upon introduction to the network; that might be rather interesting.
I find myself often having to reread a sentence in order to understand it.
These algorithms are often very simple and can be easily explained. Don't over complicate them.
Neural machine learning methods, such as deep neural networks (DNN), have achieved remarkable success in a number of complex data processing tasks. These methods have arguably had their strongest impact on tasks such as image and audio processing - areas where humans have always performed better than conventional algorithms. In contrast to biological neural systems, which are capable of learning continuously, deep artificial networks have a limited ability for incorporating new information after a network has been trained. As a result, continuous learning methods could be very helpful in allowing deep networks to handle data sets which change over time. Here, inspired by the process of adult neurogenesis in the hippocampus, we investigate how adding new neurons to artificial neural networks can allow them to acquire new information, while preserving what they have already learned. Our results on the MNIST handwritten digit dataset and the NIST SD 19 dataset, which includes lower and upper case letters and digits, show that neurogenesis looks like a good approach for tackling the "stability-plasticity dilemma" that has been a problem for adaptive machine learning algorithms for some time.
As an academic, I tend to agree that we frequently feel compelled to apply more verbosity than is strictly required in order to communicate the intended semantic constructs.
I showed this to my father, a surgeon, and he said he understood it. But not the original abstract.
Nobody talks like this. In my head I read this sentence and I have to translate it to "we add extra neurons to existing networks so they can learn new information while remembering everything they already know".
> I have to translate it
But you have to distinguish active and passive vocabulary. Words you commonly use and words you understand when others use them. And you also have to distinguish between written and spoken language. E.g. nobody would actually say "exempli gratia" but it's commonly used in writing.
English is not my mother tongue and for me it mostly falls into passive vocabulary, but it is perfectly understandable. I don't have to mentally translate the whole sentence into simpler words before being able to grasp its meaning.
And it's not like those words are obscure, they are just place 2 or 3 of most common uses among their respective set of synonyms.
"Extra neurons"? Input layer? Output layer? Just before a final, fully-connected layer? Somewhere in between?
"Everything it already knows"? What does it know? Character probabilities, like a charnn? Image categories like in a CNN? Input distributions, like a GAN?
From the abstract, I can immediately tell that this paper is about modifying deep auto-encoders in the hidden layers. From that, I can immediately understand that the paper is not about adapting to new input formats or output formats, but instead about inputs from a new distribution but in the same format.
The author's intended audience, researchers and academics, do talk like this. They do so because it is quickly understandable and actually information dense, as indicated in my above paragraph.
- "We specifically consider the case of...a stacked deep autoencoder (AE), which is a type of neural network designed to encode a set of data samples such that they can be decoded to produce data sample reconstructions with minimal error
- "The first step of the NDL algorithm occurs when a set of new data points fail to be appropriately reconstructed by the trained network...When a data sample’s RE is too high, the assumption is that the AE level under examination does not contain a rich enough set of features to accurately reconstruct the sample.
- "The second step of the NDL algorithm is adding and training a new node, which occurs when a critical number of input data samples (outliers) fail to achieve adequate representation at some level of the network.
- "The final step of the NDL algorithm is intended to stabilize the network’s previous representations in the presence of newly added nodes. It involves training all the nodes in a level with both new data and replayed samples from previously seen classes on which the network has been trained.