Something about this article as a whole, though, does raise a question that I've been wondering about, and I want to present it here for a bit of discussion (maybe it needs its own thread?):
Does anyone else here think that the current approach to neural networks has some fundamental flaws?
Now - I'm not an expert; call me an interested student, with only the barest of experience (beyond the ML Class, I also took Udacity's CS373 course, and I'm currently enrolled in their Self-Driving Car Engineer nanodegree program).
I understand that what we currently have and know does work. By that I mean the basic idea of an artificial neural network using forward and back-propagation, multiple layers, etc (and all the variants - RNNs, CNNs, deep learning, etc). I understand the need for and reasoning behind activation functions grounded in calculus - derivatives, the chain rule, etc (though I admit I need further education in these areas).
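For concreteness, here's a minimal toy sketch (my own, not from any of those courses) of the forward/back-prop loop being described: one hidden layer, sigmoid activations, squared-error loss, plain gradient descent. All names, sizes, and the learning rate are arbitrary choices of mine.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                  # 4 samples, 3 features
y = np.array([[0.0], [1.0], [1.0], [0.0]])   # arbitrary targets

W1 = rng.normal(scale=0.5, size=(3, 5))      # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(5, 1))      # hidden -> output weights

def loss():
    return float(np.mean((sigmoid(sigmoid(X @ W1) @ W2) - y) ** 2))

initial_loss = loss()
for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # backward pass: the chain rule applied layer by layer,
    # using sigmoid'(z) = s(z) * (1 - s(z))
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent update
    W2 -= 0.5 * (h.T @ d_out)
    W1 -= 0.5 * (X.T @ d_h)
final_loss = loss()
```

The point of the sketch is just that the "calculus" lives entirely in the two `d_*` lines - everything else is bookkeeping.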
But something nags at me.
All of this, despite the fact that it works and works well (provided all your tuning and such is right), just seems over-complicated. Real neurons don't use calculus, activation functions, or back-propagation to learn. All of those things in an ANN are abstractions and models of what occurs in nature.
Maybe (probably?) I am wrong - but it seems like what nature does is simpler. Much less power is used, for instance, and the package is far more compact. I have this feeling that we may have gone down a path which, while it has produced a working representation, produced one that is overly complex - and that had we taken another approach (whatever that might be?), our ANNs would look and work much differently, perhaps even more efficiently.
About the only alternatives I have heard of are things like spiking neural networks and some of the other "closer to nature" simulations (of ion pumps, real synapses, etc). Even those, while seemingly closer to biology, appear to carry too much complexity.
I'm probably just talking out of my nether regions as a general n00b to the field. I do wonder, though, if there might be another solution, seemingly out in "left field", that might push things forward if someone were willing to look and experiment. It's something I plan to look into myself as I find time between lessons and other work in my current learning experience.
This sounds like a (common) failure to understand how abstractions work. Bridges don't do calculus, but the bridge builder uses calculus to describe what bridges do obey (the laws of nature); the calculus abstraction encodes the behavior of bridges, and thus you can model bridges using calculus.
Similarly, neurons are modeled by calculus. Abstractions are abstract precisely because they are not the concrete thing they model: they are necessarily approximations. They give us the power to simplify at the cost of gaining the capacity to be wrong.
The point being this: you can literally use any abstraction you desire to model anything you like. Some will work better than others, and the better they work, the more closely the structure of your abstraction matches the structure of the concretion being modeled.
Some people cannot understand this, and believe that if you can closely fit the output of a process with a neural network, the process itself must somehow be related to neural networks.
At the same time, I wonder if there isn't a simpler way of doing all of this - a simpler model for the abstraction of a neuron that doesn't require back-propagation or calculus. In other words, have we become so used to the current concepts and models of ANNs that we are hesitant or resistant to imagining other possibilities?
Yes - what we have works, and it works well. Honestly, from what I know and have learned (while I may not fully grasp the mathematical underpinnings of a neural network, I do have a good feel for how both forward and back-prop are implemented), our current general knowledge of ANNs (ie "how to implement a simple neural network") isn't super-complex. I only question whether it could be made simpler (and I'm not talking about a network of perceptrons or ReLUs), or whether, because of the early work (Pitts and McCulloch mainly), we are going down a less-than-optimum path (to use an ML analogy, we are stuck in a local minimum) - one that fortunately works, but maybe there's something better out there?
Again - I recognize that, as a non-expert in this field (I only consider myself a student and hobbyist so far), I am likely well off in la-la land. Even so, quite often in engineering the optimum solution tends to be the simplest one; sometimes that solution is simpler than nature's (an airplane vs a bird, for instance). I have a feeling that may be the case with ANNs as well.
I am not wedded to this idea, though - I just want to put it out there for consideration and maybe discussion. As I noted, what we currently have works well, isn't a complexity nightmare, and is understandable to anyone willing to dig in. It very well could be that it -is- the simplest explanation.
Adult humans seem to learn faster when there is some combination of theory, examples, and experience (aka feedback). I'm a scientist, and I have very little interest in neural nets for science because they abstract away the kind of equation-based systematics that we rely on to understand our world. The theory component is missing.
I'm more interested in expert system-type AI, where the learning may help us infer more about the world in the systematic way that scientists try to grapple with it--logical rules and frameworks, physics equations, statistics, etc. But these seem to be far less effective or efficient than ANNs, at least as currently implemented.
That being said, I did see a great talk last week on using ANNs to accelerate viscoelasticity calculations--those sorts of things are definitely valuable for science by reducing the time required for simulations by orders of magnitude. And some of the discussion after the talk had to do with how a graph of biases and weights is, philosophically and practically, a fundamentally different way of representing the problem and its solution than a set of differential equations. There is no doubt that it's a useful way of solving the calculations, but it's unclear how we can build upon those weights and biases in a theoretically-meaningful way. How do we learn from machine learning?
Now replace the neurons with some tangentially related abstraction that is massively different from real neurons, well...
I'd love a reference for this. Link, please?
That said, don't underestimate the complexity of what is happening in real neurons. It's true that an individual neuron is fairly simple (more or less! some may perform very complex encodings of sensory data!), but the complexity of the "computations" they perform goes up very quickly, and you have a _lot_ of them that are highly connected, so it may be that ANNs are a good abstraction in the sense that they reduce the complexity of spike trains to a more abstract but more powerful representation.
It's true, though, that they probably don't perform back-propagation, at least in the mathematical sense; I don't know the literature, but I suspect an inhibitory mechanism is a better biological model for how "error detectors" might suppress useless or noisy output.
hmm, I did the cs231n homework and had the opposite experience. It was really easy to complete it by ignoring numpy's provided matrix methods (just write a bunch of for loops in python) but that solution was really slow. If you could use numpy's matrix methods, however, the code executed a lot faster (the loops and control flow now run in C instead of python, so to speak). The hard part of the assignments was expressing my python code, with all its ad-hoc mixture of nested loops and conditionals, in numpy.
But I think Ng wanted us to understand what was going on under the hood in Octave when we used its built-in vector primitives, and how to think about the problems in such a way that we could "vectorise" them, so the solutions would be amenable to those primitives.
There was a time when I wasn't "getting it", so I proceeded with my own implementation. In time, though, it "clicked" for me, and I could put those routines away and use the vector math more fully (and quickly). That said, I wouldn't have wanted it left as a "black box" - it was nice to understand the operations happening behind the curtain.
Which is also why I was disappointed that the math wasn't described more in either of those courses; that part was left as a "black box" and only hinted at (ie "here's the derivative of the function - you don't have to worry about it, but if you know the math, you might find it interesting").
In this latest course I am taking, though, they dive right into the math - and I have found that they assume you already know what a derivative is and how it is derived from the original function. Unfortunately, I don't have the education needed, so I am fumbling through it (and doing what I can to read up on the relevant topics - I even bought a book on teaching yourself calculus, which was recommended to me by a helpful individual).
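One thing that helped me bridge that gap: a derivative you look up or work out by hand can always be sanity-checked numerically. As a small illustration (my own example, not from the course), the sigmoid function s(x) has the analytic derivative s(x)·(1 − s(x)), and a centered finite difference should agree with it very closely - the same idea behind "gradient checking" in neural-net code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # analytic derivative: s'(x) = s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

x = 0.7
eps = 1e-5
# centered finite difference: (f(x + eps) - f(x - eps)) / (2 * eps)
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
error = abs(numeric - sigmoid_grad(x))   # should be tiny (on the order of eps**2)
```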
In fact, now that I think about it, the first assignment asked us to write a 2-loops-in-python version of some function (batch linear classifier, I think), then a 1-loop version, then a 0-loop version, and I often repeated this procedure on subsequent harder questions.
That was a useful thing I wasn't expecting to learn from the course - how to vectorize code for numpy (including using unfamiliar features like broadcasting and reshape).
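To illustrate the progression (my own sketch - the actual assignment function may have differed), here is the 2-loop / 1-loop / 0-loop pattern applied to pairwise squared Euclidean distances, with the 0-loop version relying on broadcasting and reshape:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))   # e.g. test points
Y = rng.normal(size=(40, 8))   # e.g. train points

# two loops: pure python over both axes (slow)
d2 = np.empty((50, 40))
for i in range(50):
    for j in range(40):
        d2[i, j] = np.sum((X[i] - Y[j]) ** 2)

# one loop: broadcast one row of X against all of Y
d1 = np.empty((50, 40))
for i in range(50):
    d1[i] = np.sum((X[i] - Y) ** 2, axis=1)

# zero loops: expand ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2,
# with reshape(-1, 1) turning the row norms into a column so
# broadcasting produces the full (50, 40) matrix
d0 = (np.sum(X ** 2, axis=1).reshape(-1, 1)
      - 2 * X @ Y.T
      + np.sum(Y ** 2, axis=1))

assert np.allclose(d2, d1) and np.allclose(d1, d0)
```

All three compute the same matrix; the difference is just how much of the looping happens in C instead of python.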
But do you think that real neurons are less complicated than artificial neurons? Look at the molecular structure of a single ion channel. It's crazy complicated!
How do we know that neurons don't do calculus? Chemical gradients, summation, etc.