
I do think the complaint about having to write the backward pass seems especially shallow; finding out they were working with numpy makes it even more so (since numpy takes the pain out of the matrix operations). IIRC, when I took the ML Class in 2011, we used Octave, but Ng had us first write stuff "the hard way" - so we'd understand what was going on later when we used Octave's methods.
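For anyone curious what "the hard way" looks like in numpy, here's a rough sketch (not the actual course exercise, just an illustration) of a single tanh layer with a squared-error loss, forward and backward, with the chain rule applied by hand:

    import numpy as np

    # Toy data: 4 samples, 3 features, 1 output (shapes are illustrative)
    X = np.random.randn(4, 3)
    y = np.random.randn(4, 1)
    W = np.random.randn(3, 1) * 0.1
    b = np.zeros((1, 1))

    for step in range(100):
        # Forward pass: linear layer followed by tanh
        z = X.dot(W) + b              # (4, 1)
        a = np.tanh(z)                # (4, 1)
        loss = 0.5 * np.mean((a - y) ** 2)

        # Backward pass: chain rule, step by step
        da = (a - y) / len(X)         # dLoss/da
        dz = da * (1.0 - a ** 2)      # tanh'(z) = 1 - tanh(z)^2
        dW = X.T.dot(dz)              # dLoss/dW
        db = dz.sum(axis=0, keepdims=True)

        # Plain gradient descent update
        W -= 0.5 * dW
        b -= 0.5 * db

Once you've written that out by hand a few times, the framework versions stop feeling like magic.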

Something about this article as a whole, though, does raise a question that I've been wondering about, and I want to present it here for a bit of discussion (maybe it needs its own thread?):

Does anyone else here think that the current approach to neural networks has some fundamental flaws?

Now - I'm not an expert; call me an interested student right now, with only the barest of experience (beyond the ML Class, I also took Udacity's CS373 course, and I am also currently enrolled in their Self-Driving Car Engineer nanodegree program).

I understand that what we currently have and know does work - by that I mean the basic idea of an artificial neural network using forward and back-prop, multiple layers, etc (and all the derivatives - RNNs, CNNs, deep learning, etc). I understand the need for and reasoning behind differentiable activation functions, derivatives, the chain rule, etc (though I admit I need further education in these areas).

But something nags at me.

All of this, despite the fact that it works and works well (provided all your tuning and such is right, etc), just seems like it is over-complicated. Real neurons don't use calculus and activation functions, nor back-propagation, etc in order to learn. All of those things in an ANN are just abstractions and models around what occurs in nature.

Maybe (probably?) I am wrong - but it seems like what nature does is simpler. Much less power is used, for instance, and the package is much more compact. I just have this feeling that we may have gone down a path that, while it has produced a working representation, has left that representation overly complex, and that had we taken another approach (whatever that might be?), our ANNs would look and work much differently - perhaps even more efficiently.

About the only alternatives I have heard of are things like spike-train neural networks and some of the other "closer to nature" simulations (of ion pumps, real synapses, etc). Still, even those, while seemingly closer, appear to carry too much complexity.

I'm probably just talking out of my nether regions as a general n00b to the field. I do wonder, though, if there might be another solution, seemingly out in "left field," that might push things forward if someone were willing to look and experiment. It is something I plan to look into myself, as I find time between lessons and other work for my current program.




>Real neurons don't use calculus and activation functions, nor back-propagation, etc in order to learn.

This sounds like a (common) failure to understand how abstractions work. Bridges don't do calculus, but the bridge builder uses calculus to understand what bridges do use (the laws of nature), and thus the calculus abstraction is used to encode the behavior of bridges. Thus you can model bridges using calculus.

Similarly, neurons are modeled by calculus. Abstractions are abstract precisely because they are not the concrete thing they model: they are necessarily approximations. They give us the power to simplify at the cost of gaining the capacity to be wrong.

The point being this: you can literally use any abstraction you desire to model anything you like. Some will work better than others, and the better they work, the more closely the structure of your abstraction matches the structure of the concretion being modeled.


If you fit some data with a very flexible function approximator, that does not imply any kind of isomorphism between the function approximator itself and the process generating the data.

Some people cannot understand this, and believe that if you can closely fit the output of a process with a neural network, then the process itself must be in some way related to neural networks.


Don't get me wrong - I may not fully comprehend the math behind what is going on (as I noted, I have little understanding of calculus, and it is one of my failings that I am working to improve on) - but I do understand the need for the abstraction; I understand that it allows us to model the activation function and workings of a neuron, to a certain level of accuracy. I understand that it may -not- be the same way that a real neuron does it, but that it is close enough and works well enough to be useful.

At the same time, I wonder if there isn't a simpler way of doing all of this - if there isn't a simpler model for the abstraction of a neuron that doesn't require back-propagation or calculus. In other words, have we become so used to the current concepts and models of ANNs that we have become hesitant or resistant to imagining other possibilities?

Yes - what we have works, and it works well. Honestly, from what I know and what I have learned (while I may not fully grasp the mathematical and calculus underpinnings of a neural network, I do have a good feel for how both forward and back-prop are implemented), our current general knowledge of ANNs (ie "how to implement a simple neural network") isn't super-complex. I only question whether it could be made simpler (and I'm not talking about a network of perceptrons or ReLUs), or whether, because of the early work (Pitts and McCulloch mainly), we are going down a less-than-optimum path (to use an ML analogy, we are stuck in a local minimum) - one that fortunately works, but maybe there's something better out there?

Again - I recognize that, as a non-expert in this field (I only consider myself a student and hobbyist so far), I am likely very well off in la-la land. Even so, we know that quite often in engineering the optimum solution tends to be the simplest one; sometimes that solution is simpler than nature's (an airplane vs a bird, for instance). I have a feeling that may be the case with ANNs as well.

I am not wedded to this idea, though - I just want to put it out there for consideration and maybe discussion. As I noted, what we currently have works well, and isn't a complexity nightmare, and is understandable to anyone willing to understand it. It very well could be it -is- the simplest explanation.


Very well said. It took me a long time to "get it" but once I crossed that threshold I started viewing different mathematical techniques as hammers, some more suited to modeling certain phenomena than others.


I think you are assuming the human brain is significantly simpler and better than it is. To start with: the number of neurons in the brain is ~100 billion, roughly 10,000 times more than in a large ANN; we train the human brain for years before it does anything remotely intelligent (as opposed to the expectation that neural nets will get somewhere useful in hours or days); and, most importantly, our brains probably have strong priors due to millions of years of evolution encoded in our DNA. On top of this, not all brains are as good as each other, especially when you consider animals that do not have a real system of speech.


I don't know a huge amount about neurology (or neural nets), but...

Adult humans seem to learn faster when there is some combination of theory, examples, and experience (aka feedback). I'm a scientist and I have very little interest in neural nets for science because it contorts away the kind of equation-based systematics that we rely on to understand our world. The theory component is missing.

I'm more interested in expert system-type AI, where the learning may help us infer more about the world in the systematic way that scientists try to grapple with it--logical rules and frameworks, physics equations, statistics, etc. But these seem to be far less effective or efficient than ANNs, at least as currently implemented.

That being said, I did see a great talk last week on using ANNs to accelerate viscoelasticity calculations--those sorts of things are definitely valuable for science by reducing the time required for simulations by orders of magnitude. And some of the discussion after the talk had to do with how a graph of biases and weights is, philosophically and practically, a fundamentally different way of representing the problem and its solution than a set of differential equations. There is no doubt that it's a useful way of solving the calculations, but it's unclear how we can build upon those weights and biases in a theoretically-meaningful way. How do we learn from machine learning?


There are quite a few neurons in an adult human, and they aren't just in some kind of undifferentiated randomly connected neuron soup. So when we talk about things that adult humans do in terms of neurons, it is a bit like talking about Linux in terms of transistors. It is not that transistors are irrelevant to Linux, but...

Now replace the neurons with some tangentially related abstraction that is massively different from real neurons, well...


> I did see a great talk last week on using ANNs to accelerate viscoelasticity calculations

I'd love a reference for this. Link, please?


I think everyone agrees that ANNs are a useful abstraction of what happens in real neural networks. A "closer" abstraction in some sense, as you mention, is what is being called Spiking Neural Networks, so you may want to read up on them if you're interested. I don't think they are strictly more powerful in any way, for what it's worth, as they more or less just trade continuous-domain discrete-time computations for digital-domain continuous-time computations; although there is some interesting hardware work being done to implement them.
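If it helps to get a feel for the difference, here's a toy leaky integrate-and-fire neuron (one of the simplest spiking models; the constants are arbitrary) - the "output" is a train of spike times rather than a continuous activation value:

    import numpy as np

    def lif_spikes(input_current, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
        """Toy leaky integrate-and-fire neuron: returns the time steps that spiked."""
        v = 0.0
        spikes = []
        for t, i_t in enumerate(input_current):
            # Membrane potential leaks toward zero and integrates the input
            v += dt * (-v / tau + i_t)
            if v >= v_thresh:      # crossing the threshold emits a spike...
                spikes.append(t)
                v = v_reset        # ...and the potential resets
        return spikes

    # A constant drive produces a regular spike train; the information is in the timing/rate
    print(lif_spikes(np.full(200, 0.08)))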

That said, don't underestimate the complexity of what is happening in real neurons. It's true that an individual neuron is fairly simple (more or less! some may perform very complex encodings of sensory data!), but the complexity of the "computations" they perform goes up very quickly, and you have a _lot_ of them that are highly connected, so it may be that ANNs are a good abstraction in the sense that they reduce the complexity of spike trains to a more abstract but more powerful representation.

It's true though that they probably don't perform back-propagation, at least in the mathematical sense; I don't know the literature but I think an inhibitory mechanism is probably a better biological model for how "error detectors" can be used to suppress useless or noisy output.


ANNs would be useful (or not) completely apart from any relationship to anything "neural." I often wish that the association had never been drawn; it doesn't seem very useful, and it misleads a lot of people.


There is some work in this direction. See Bengio et al.'s "Towards Biologically Plausible Deep Learning" (https://arxiv.org/pdf/1502.04156.pdf) and its references for a starting point. So far, these approaches don't achieve SOTA on anything.


Nature uses much less power in a much more compact package because it took the path of inventing molecular nanotechnology. I agree, we should do that too! But in the meantime, the neural networks we build are pretty good for what they do and the requirements they meet.


Coming from a background in neuroscience: IMO, neurons are extremely complicated. There's tons of molecular machinery beyond their inputs and outputs (which in and of themselves are complicated). They are complicated enough that neuroscience has been a field for some time now, and we still know very little about the human brain. And that's just talking about a single neuron, ignoring the enormous numbers of neurons and glial cells we have, all interconnected.


> finding out they were working with numpy makes it even more so (since numpy takes the pain out of the matrix operations)

hmm, I did the cs231n homework and had the opposite experience. It was really easy to complete it by ignoring numpy's provided matrix methods (just write a bunch of for loops in python) but that solution was really slow. If you could use numpy's matrix methods, however, the code executed a lot faster (the loops and control flow now run in C instead of python, so to speak). The hard part of the assignments was expressing my python code, with all its ad-hoc mixture of nested loops and conditionals, in numpy.
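For example (not the actual assignment, just the flavor of it): computing the score matrix of a batch linear classifier with explicit loops versus a single matrix multiply:

    import numpy as np

    X = np.random.randn(500, 3072)   # 500 samples, 3072 features (sizes are illustrative)
    W = np.random.randn(3072, 10)    # 10 classes

    # Loop version: easy to write, but the inner loops run in interpreted python
    scores_loop = np.zeros((500, 10))
    for i in range(X.shape[0]):
        for j in range(W.shape[1]):
            scores_loop[i, j] = np.sum(X[i, :] * W[:, j])

    # Vectorized version: one call, runs in optimized C
    scores_vec = X.dot(W)

    print(np.allclose(scores_loop, scores_vec))   # True

The hard part is seeing that the nested loops are really a matrix multiply in disguise.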


I found that in the ML Class, we could complete many of the assignments by doing the calcs via "for-loops" in Octave - and as you noted, it was really slow.

But I think Ng wanted us to understand what was going on under the hood in Octave when you used its in-built vector primitives, and how to think about the problems in such a way to understand how to "vectorise" them so that the solutions would be amenable to using those primitives.

There was a time where I wasn't "getting it" - and proceeded with my own implementation. In time though, it "clicked" for me, and I could put away those routines and use the vector math more fully (and quickly). That said, I wouldn't have wanted it to be left as a "black box" - it was nice to have the understanding of the operations it was doing behind the curtain.

Which is also why I was disappointed that the math wasn't described more in either of those courses; that part was left as "black boxes" and only hinted at a bit (ie "here's the derivative of the function - but you don't have to worry about it, but if you know about the math, you might find it interesting").

In this latest course I am taking, though, they are diving right into the math - and I have found that they assume you already know what a derivative is and how it is formed from the initial function. Unfortunately, I don't have the education needed, so I am fumbling through it (and doing what I can to read up on the relevant topics - I even bought a book on teaching yourself calculus which was rec'd for me by a helpful individual).


Yes, my point is that even though numpy provides a "black box" set of functions, using them is so unnatural (at least for me) that I was forced to completely understand what the numpy functions were doing internally to have any hope of applying them.

In fact, now that I think about it, the first assignment asked us to write a 2-loops-in-python version of some function (batch linear classifier, I think), then a 1-loop version, then a 0-loop version, and I often repeated this procedure on subsequent harder questions.

That was a useful thing I wasn't expecting to learn from the course - how to vectorize code for numpy (including using strange features like broadcasting and reshape).
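A concrete (made-up) example of that 2-loop-to-0-loop progression, using pairwise squared distances, where the reshape/broadcasting trick is the whole point:

    import numpy as np

    A = np.random.randn(100, 32)
    B = np.random.randn(80, 32)

    # 2-loop version: obvious, but every iteration is interpreted python
    d_loops = np.zeros((100, 80))
    for i in range(A.shape[0]):
        for j in range(B.shape[0]):
            d_loops[i, j] = np.sum((A[i] - B[j]) ** 2)

    # 0-loop version: reshape A to (100, 1, 32) so it broadcasts against B's (80, 32)
    d_bcast = np.sum((A.reshape(100, 1, 32) - B) ** 2, axis=2)

    print(np.allclose(d_loops, d_bcast))   # True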


If I can offer some advice as a former Calc I student: since you are focused on ML, ignore integrals and the fundamental theorem of calculus. Instead, understand the connection between derivatives and antiderivatives and become practiced with the rules of differentiation: the product rule, the quotient rule, trigonometric functions (tanh is a hyperbolic trig function), exponentiation, logarithms, and the chain rule. Using derivatives to find local minima and maxima will also be useful. You should be able to look at the graph of a function and quickly visualize the graph of its derivative.
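To tie that back to ML: a nice way to build confidence with the rules is to check them numerically, e.g. tanh'(x) = 1 - tanh(x)^2, which is exactly the factor that shows up in a backward pass through a tanh layer:

    import numpy as np

    x = np.linspace(-3, 3, 7)
    analytic = 1.0 - np.tanh(x) ** 2                       # derivative from the rules
    h = 1e-5
    numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)  # centered finite difference

    print(np.max(np.abs(analytic - numeric)))              # ~1e-10: the rule checks out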


That's why I enjoyed the "Hacker's Guide to Neural Networks" so much - it builds everything from the ground up.


> Real neurons don't use calculus and activation functions, nor back-propagation, etc in order to learn. All of those things in an ANN are just abstractions and models around what occurs in nature.

But do you think that real neurons are less complicated than artificial neurons? Look at the molecular structure of a single ion channel. It's crazy complicated!

How do we know that neurons don't do calculus? Chemical gradients, summation, etc.



