Back when I did some work in neural networks (10-15 years ago), the field had become saturated with 'meta' papers on topology generation, and also on optimization techniques (slightly better gradient descent, pruning techniques, etc.). Neural networks had become more art than science. By the end of that period, I got the distinct sense that these were signs the field had stalled on any significant breakthroughs. Soon after, they fell out of favor for well-known reasons.
I'm curious whether, outside of the current hype, there's a sense of that now among current practitioners in the field, given the saturation of papers.
What I have seen in recent weeks, as I've looked into news and articles about machine learning, neural networks, self-driving vehicles, etc., is that - at least for a certain class of problems - there seems to be convergence on a generalized pattern, if not a solution, for implementing the network.
This pattern or solution seems built off of LeCun's LeNet MNIST convolutional neural network; specifically, the pattern seems to be roughly:
(input) -> (1:n - conv layers) -> (flatten) -> (1:n - fully connected layers) -> (output)
(1:n - conv layers) = (conv layer) -> (loss layer) -> (activation layer)
(1:n - fully connected layers) = (fully connected layer) -> (loss layer) -> (activation layer)
The (loss layer) is optional, but given that the (activation layer) seems to have converged (in most cases I've seen, which is probably not representative) on ReLU, the (loss layer) is sometimes needed to prevent overfitting (simple dropout can work well, for instance).
I don't want to say that this pattern is the "be-all-end-all" of deep learning, but I have found it curious just how many different problems it can be successfully applied to. The ReLU operation, while not differentiable at zero, seems to work regardless (and there are other activation functions similar to ReLU that are differentiable everywhere - like softplus - if needed).
Anyhow - these "building blocks" seem like the basics - the "lego" for deep learning; taking it from art to engineering. As a student, I've been playing with TensorFlow - and recently Keras (a framework for TensorFlow). These tools make it quick and easy to build deep learning neural networks like I described.
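To make the pattern above concrete, here is a toy forward pass through (input) -> (conv layer + ReLU) -> (flatten) -> (fully connected) -> (output), in plain Python. The shapes, kernel, and weights are made-up illustration values - real code would use TensorFlow/Keras with learned weights:

```python
# Toy forward pass through the conv -> flatten -> fully-connected pattern.
# All weights below are invented for illustration, not learned.

def relu(x):
    return x if x > 0.0 else 0.0

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as in most DL libraries),
    with ReLU applied to each output - i.e. (conv layer) -> (activation layer)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = sum(image[r + i][c + j] * kernel[i][j]
                      for i in range(kh) for j in range(kw))
            row.append(relu(acc))
        out.append(row)
    return out

def dense(vec, weights, bias):
    """Fully connected layer with ReLU: one output per weight row."""
    return [relu(sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]

# 4x4 "image" with a vertical edge, 3x3 vertical-edge kernel
image  = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

fmap = conv2d_valid(image, kernel)               # (1:n - conv layers)
flat = [v for row in fmap for v in row]          # (flatten)
out  = dense(flat, [[0.5, 0.5, 0.5, 0.5],        # (fully connected layers)
                    [1.0, -1.0, 1.0, -1.0]], [0.0, 0.0])
```

In a real framework this whole thing collapses to a few `Conv2D`/`Flatten`/`Dense` layer declarations, which is exactly the "lego" quality being described.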
My gut feeling is that I'm talking from my rear; my level of knowledge isn't that great in this field. So - grain 'o salt and all that.
/disclaimer: I am currently enrolled in the Udacity Self-Driving Car Engineer Nanodegree; in the past I've taken the Udacity CS373 course (2012) and the Stanford-sponsored ML Class (2011).
I would like to augment just a couple of your observations with a few explanations --- I think the reason the architectures that you mention work so well is because they exploit hierarchical structure inherent in the data. Lots of things in the universe have hierarchical structure, e.g. vision has spatial hierarchy, video has spatial and temporal hierarchy, audio has temporal and spatial hierarchy.
(side note: that's how a baby learns vision "unsupervised" -- the spatial patterns on the retina have temporal proximity, so the supervisor is "time")
ReLU is good, but you have to remember how you're vectorizing your data. If your vectorizer normalizes to [0, 1], then ReLU fits nicely; but if you're scaling to mean 0, std. dev. 1, then half your samples are negative and you'll get information loss when ReLU clips them: http://7pn4yt.com1.z0.glb.clouddn.com/blog-relu-perf.png so remember to keep your vectorizer in mind.
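A quick toy demonstration of the point above, with made-up sample values (a real pipeline would use something like sklearn's scalers):

```python
import statistics

samples = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]

# Standardize to mean 0, std 1 -- everything below the mean goes negative,
# so ReLU clips half the samples to zero (the "information loss").
mean = statistics.mean(samples)
std = statistics.pstdev(samples)
standardized = [(x - mean) / std for x in samples]
relu_standardized = [max(0.0, x) for x in standardized]
zeroed = sum(1 for x in relu_standardized if x == 0.0)

# Min-max scaling to [0, 1] keeps every value in ReLU's pass-through region.
lo, hi = min(samples), max(samples)
minmax = [(x - lo) / (hi - lo) for x in samples]
relu_minmax = [max(0.0, x) for x in minmax]
```

With the standardized vector, four of the eight samples come out of ReLU as identical zeros; with min-max scaling, ReLU changes nothing.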
> My gut feeling is that I'm talking from my rear;
Nope not at all, you have good insight - my email address is in my profile, come join the slack group too (email me, let's go from there)
Those building blocks are precisely the art (it's both art and engineering) that siavosh is referring to. Everyone is mostly using derivatives of the same basic MNIST architecture of LeCun from 20 years ago ('97), because that's what works (no other justification). With tweaks of course (ReLU, batch norm, etc), also because "they just work."
It's funny you should mention dropout, because that's falling out of favor (fully-connected layers are also falling out of favor). Why? Because in many cases it doesn't seem to be necessary if you do batch norm. That's art and engineering (basically you try with dropout and batch norm, or with just batch norm, and just go with batch norm if no difference).
I'm not sure there's "no other justification". A couple reasons I find the structures given above should be expected to work:
1. It roughly parallels the progression of signals entering our brains through our senses, going through more-or-less convolutional sensory areas of cortex before getting mashed around in the more ad-hoc anterior regions.
2. Consider that convolution layers take advantage of structure in their inputs to radically reduce the number of parameters to train. Would the output of a fully-connected layer have the sort of structure that a convolution layer could take advantage of? (Maybe you could train the fully-connected layer to be exploitable by later conv layers, but doesn't that imply both more training time to get it to do that and more redundant output than you'd want?) It makes more sense to put your convolutions up front so they can take advantage of the correlations in your inputs (perhaps indirectly through the outputs of other conv layers).
Point 2 justifies convolution over fully-connected layers. It does not justify using convolution in the first place.
None of these justify ReLU, max pooling, or batch norm.
For example, Geoffrey Hinton (probably out of all deep learning researchers the most neuroscience motivated) actually believes max-pooling is terrible, the brain doesn't do it, and it's a shame that it actually works.
When you say
> It proves little since only the first visual layer (there are 6 visual cortex layers) does convolution-esque things.
I admit I'm not clear on the roles of the various layers of visual cortex. But I should note that there are several convolution-like steps which in an artificial network would be implemented as separate layers: center-surround detection (it happens to occur before V1, but that's not really a problem for what I'm saying - it's a distraction I skipped past for concision's sake) feeds into the convolution-like operations of oriented-edge detection, corner detection, etc. It's not as if our visual system in toto does just one convolution-like step.
My second point must have been expressed poorly, because I have apparently completely failed to convey its meaning. It is meant to justify the particular choice of ordering convolutions before fully-connected layers. If you have already chosen to include both types, my point is that it makes sense to put those convolutions before the fully-connected layers.
As you can see in my "tag" on my post - most of what I have learned came from these courses:
1. AI Class / ML Class (Stanford-sponsored, Fall 2011)
2. Udacity CS373 (2012) - https://www.udacity.com/course/artificial-intelligence-for-r...
3. Udacity Self-Driving Car Engineer Nanodegree (currently taking) - https://www.udacity.com/course/self-driving-car-engineer-nan...
For the first two (AI and ML Class) - these two MOOCs kicked off the founding of Udacity and Coursera (respectively). The classes are also available from each:
Udacity: Intro to AI (What was "AI Class"):
Coursera: Machine Learning (What was "ML Class"):
Now - a few notes: For any of these, you'll want a good understanding of linear algebra (mainly matrices/vectors and the math to manipulate them), stats and probabilities, and to a lesser extent, calculus (basic info on derivatives). Khan Academy or other sources can get you there (I think Coursera and Udacity have courses for these, too - plus there are a ton of other MOOCs, plus MIT's OpenCourseWare).
Also - and this is something I haven't noted before - the terms "Artificial Intelligence" and "Machine Learning" don't necessarily mean the same thing. Based on what I have learned, deep learning with artificial neural networks is a subset of machine learning, which is itself a subfield of the broader pursuit of artificial intelligence. Machine learning also encompasses standard "algorithmic" learning techniques, like logistic and linear regression.
The reason why neural networks are a subset of ML is that a trained neural network ultimately implements a form of logistic regression (categorization, true/false, etc.) or linear regression (a range), depending on how the network is set up and trained. The power of a neural network comes from not having to find all of the dependencies (iow, the "function") yourself; instead the network learns them from the data. It ends up being a "black box" algorithm, but it allows you to work with datasets that are much larger and more complex than the algorithmic approaches allow for (that said, the algorithmic approaches are useful, in that they use much less processing power and are easier to understand - no use attempting to drive a tack with a sledgehammer).
With that in mind, the sequence to learn this stuff would probably be:
1. Make sure you understand your basics: Linear Algebra, stats and probabilities, and derivatives
2. Take a course or read a book on basic machine learning techniques (linear regression, logistic regression, gradient descent, etc).
3. Delve into simple artificial neural networks (which may be a part of the machine learning curriculum): understand what feed-forward and back-prop are, how a simple network can learn logic (XOR, AND, etc), how a simple network can answer "yes/no" and/or categorical questions (basic MNIST dataset). Understand how they "learn" the various regression algorithms.
4. Jump into artificial intelligence and deep learning - implement a simple neural network library, learn tensorflow and keras, convolutional networks, and so forth...
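Step 3 above can be sketched end to end in a few dozen lines. Here is a tiny 2-4-1 feed-forward network trained with plain backprop to learn XOR - the classic first exercise. Pure Python for readability, with made-up hyperparameters (the hidden width of 4 is an assumption chosen for reliable training; real work would use TensorFlow/Keras):

```python
import math
import random

random.seed(42)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR truth table: the simple "logic" a small network can learn.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

H = 4  # hidden units (a bit wider than the minimal 2, to train reliably)
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
lr = 0.5

def forward(x):
    """Feed-forward pass: input -> hidden (sigmoid) -> output (sigmoid)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(w1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, y

def mse():
    return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

loss_before = mse()
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        # Backprop: output delta, then hidden deltas, then gradient steps.
        dy = (y - t) * y * (1 - y)
        dh = [dy * w2[j] * h[j] * (1 - h[j]) for j in range(H)]
        for j in range(H):
            w2[j] -= lr * dy * h[j]
            for i in range(2):
                w1[j][i] -= lr * dh[j] * x[i]
            b1[j] -= lr * dh[j]
        b2 -= lr * dy
loss_after = mse()
```

Watching `loss_after` fall well below `loss_before` is the whole point of the exercise: the network "learned" a function nobody wrote down.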
Now - regarding self-driving vehicles: they necessarily use all of the above, and more - including more than a bit of "mechanical" techniques. For example, use OpenCV or another machine vision library to pick out details of the road and other objects, which can then be processed by a deep learning CNN - e.g., have a system that picks out "road sign" objects from a camera, then categorizes them to "read" them and uses the information to make decisions about driving the car (come to a stop, or keep at a set speed). In essence, you've just made a portion of Tesla's vehicle assist system (the first project we did in the course I'm taking now was to "follow lane lines" - the main ingredient behind "lane assist" technology - using nothing but OpenCV and Python). You'll also likely learn about Kalman filters, pathfinding algorithms, sensor fusion, SLAM, PID controllers, etc.
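One of those "mechanical" pieces, a PID controller, is simple enough to sketch. This is the kind of loop used to steer a car back to the lane center once the lane lines are found; the gains and the one-line "vehicle model" below are made-up illustration values, not anything from a real project:

```python
# Minimal PID controller driving a toy cross-track error (CTE) toward zero.
# Gains and the plant model are invented for illustration.

class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        # Proportional + accumulated (integral) + rate-of-change (derivative).
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

pid = PID(kp=0.8, ki=0.05, kd=0.3)
cte = 1.0            # start one unit off the lane center
history = [cte]
for _ in range(50):
    steer = pid.step(cte, dt=0.1)
    cte -= 0.1 * steer   # crude stand-in for the vehicle's response
    history.append(cte)
```

After fifty steps the error has shrunk to a small fraction of where it started - the essence of "lane assist" once perception has produced the error signal.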
I can't really recommend any books to you, given my level of knowledge. I've read more than a few, but most of them would be considered "out of date". One that is still being used in university level courses is this:
Note that it is a textbook, with textbook pricing...
Another one that I have heard is good for learning neural networks with is:
There are tons of other resources online - the problem is separating the wheat from the chaff, because some of the stuff is outdated or even considered non-useful, and many research papers can be bewildering. Until you know which is which, take them all with a grain of salt - research papers and websites alike. There's also the problem of finding diamonds in the rough (for instance, LeNet was created in the 1990s, in the middle of an AI winter, and some of the stuff written at the time isn't considered as useful today - but LeNet is a foundational work of today's ML/AI practices).
Now - history: You would do well to understand the history of AI and ML, the debates, the arguments, etc. The foundational work comes from McCulloch and Pitts' concept of an artificial neuron, and where that led:
Also - Alan Turing anticipated neural networks of a kind that wasn't seen until much later:
...I don't know if he was aware of McCulloch and Pitts work which came prior, as they were coming at the problem from the physiological side of things; a classic case where inter-disciplinary work might have benefitted all (?).
You might want to also look into the philosophical side of things - theory of mind stuff, and some of the "greats" there (Minsky, Searle, etc); also look into the books written and edited by Douglas Hofstadter:
There's also the "lesser known" or "controversial" historical people:
* Hugo De Garis (CAM-Brain Machine)
* Igor Aleksander
* Donald Michie (MENACE)
...among others. It's interesting - De Garis was a very controversial figure, and most of his work, for whatever it is worth - has kinda been swept under the rug. He built a few computers that were FPGA based hardware neural network machines that used cellular automata a-life to "evolve" neural networks. There were only a handful of these machines made; aesthetically, their designs were as "sexy" as the old Cray computers (seriously).
Donald Michie's MENACE - interestingly enough - was a "learning computer" made of matchboxes and beads. It implemented a simple learning system that learned how to play (and win at) noughts and crosses (tic-tac-toe) - all in a physically (by hand) manipulated "machine".
Then there is one guy, who is "reviled" in the old-school AI community on the internet (take a look at some of the old comp.ai newsgroup archives, among others). His nom-de-plume is "Mentifex" and he wrote something called "MIND.Forth" (and translated it to a ton of other languages), that he claimed was a real learning system/program/whatever. His real name is "Arthur T. Murray" - and he is widely considered to be one of the earliest "cranks" on the internet:
Heck - just by posting this I might be summoning him here! Seriously - this guy gets around.
Even so - I'm of the opinion that it might be useful for people to know about him, so they don't go too far down his rabbit-hole; at the same time, I have a small feeling that there might be a gem or two hidden inside his system or elsewhere. Maybe not, but I like to keep a somewhat open mind about these kinds of things, and not just dismiss them out of hand (but I still keep in mind the opinions of those more learned and experienced than me).
There's much better tooling, and better hardware (modern GPUs), in place now to distribute the evaluation over a cluster of GPU machines, so it's becoming practical to push the boundaries further.
edit: I also forgot to mention that older methods often required the manual specification of heuristics for mutation and crossover rules. This is no longer the case: the machines can be trained with objective functions to prevent overfitting or to optimize execution speed or memory usage, and can use techniques like dropout, skip connections, and layer/node hyperparameters to direct the topology search.
Some other things you may be interested in:
Learning to learn by gradient descent by gradient descent
Neural Architecture Search with Reinforcement Learning
https://arxiv.org/abs/1606.03657, https://arxiv.org/pdf/1701.00160.pdf
Training neural networks is still an art to a degree, but it is becoming better understood, not less so.
An ML-learning group at my university recently discussed "Learning to learn by gradient descent by gradient descent"; the topic seems related. This indicates to me that at least some progress is being made.
If you were going into CS grad school for a PhD it is very possible that the bubble might pop before you graduate.
Inductive logic programming is also a field that would benefit from a breakthrough.
The story of most academic developments is that somebody develops a new technique, then they solve the problems that are easy to solve with the technique. At some point the problems that can be solved with the technique are solved and nobody has an idea of how to break the technique's limitations. Maybe 20 years after that, somebody gets a new idea.
Some examples: perceptrons, symbolic A.I., multilayer perceptrons, renormalization group theory in physics, etc.
It can take 10 years to get a PhD so it is not impossible that the field can burn out before then. I don't believe it is burning out in 2 or 3 years.
Imagine everyone who never took multi-dimensional calculus in college (most of humanity). Where are they today? Those folks have data they're not using.
But at this moment, we are far from done playing with what we have already discovered. Even as a black box, deep learning has already cracked many traditional tasks to near or above human performance, and enables many applications that were simply not possible before.
So, I guess we are going to see more papers in the coming years, until no low-hanging fruit is left.
Second: I don't get it. The primary example they use to illustrate the need has almost nothing to do with model building or selection, and everything to do with selecting and painstakingly cleaning data. This mirrors my experience with data science so far.
"A recent exercise conducted by researchers from New York University illustrated the problem. The goal was to model traffic flows as a function of time, weather and location for each block in downtown Manhattan, and then use that model to conduct “what-if” simulations of various ride-sharing scenarios and project the likely effects of those ride-sharing variants on congestion. The team managed to make the model, but it required about 30 person-months of NYU data scientists’ time and more than 60 person-months of preparatory effort to explore, clean and regularize several urban data sets, including statistics about local crime, schools, subway systems, parks, noise, taxis, and restaurants."
So - the meta part isn't such a big deal. But if DARPA has found a way to properly automate the painstaking process of selecting, cleaning, validating, and normalizing data, well THEN we'll really have something to be impressed about.
It seems that this new technology impacts the poorly and highly educated alike.
For instance, it's well documented that radiology is going to be one of the first professions to suffer as a result of ML, since much of the work is image recognition.
It seems to me that the companies manufacturing these machines used pure signal processing techniques which don't give the results doctors are expecting.
My humidor, after all, has not a single machine-made cigar - despite the occasional hand-rolled one being improperly rolled and unsmokable.
However, let's not forget that we have had algorithmic composition for some time now, and more recently we have seen style transfer through neural networks. If artificial intelligence is able to produce visual art and music, that makes me think there are very few human activities that may not be automated.
What will "save" some human activities is either technological shortcomings, which are probably temporary (e.g. the hand made cigars in your humidor: the difficulty is that machines are currently not very good at rolling cigars with a whole leaf filler, but I think eventually we will get there) or consumer's preference, which is arbitrary (e.g. you may think "OK so now, in 2025, machines are able to roll as well as the best Cuban rollers, but I choose to buy only human made cigars, just because I can").
Besides, a LOT of the music people listen to is already heavily computer generated.
People have been "producing" music using computer-assisted software for decades. I know people who have produced whole albums without ever touching an instrument and who know nothing about music theory. The product? Pretty good.
Music production software can correct melodies, autotune voices, set up chord progressions, lay down rhythms, etc. You could probably argue that knowing advanced music theory hasn't really been important for a long time, because so many people love "simple" music - pop, dance, etc.
This, combined with the practically endless amount of good music already available, makes machine-generated music almost redundant; the automation of musicians, photographers, and painters happened a long time ago.
I think music is largely about "discovery" now, there is just such an abundance of great stuff already available, you could never hear it all in a lifetime.
But hey, people still manage to make great stuff and live music is still the life blood of it all ;)
Algorithmic composition, on the other hand, is less common, and mostly relegated to avant-garde or art music, although there are examples in popular music such as Brian Eno.
Isn't that already very formatted and automation-friendly? I feel a dedicated machine could very well be the one that explains to a human how to make another machine do what he wants. The simple cases have been replaced with automated helplines already.
Also confirmed by this article https://www.naiop.org/en/Magazine/2015/Summer-2015/Business-...
Non-sequential and non-hierarchical topologies are the future, and it makes sense that machines should generate them
Doing a search shows that it's a subject that is still being taught, but I rarely see recent articles about it.
I wonder a bit if there is a bias against the gaming aspect of it.
The jumps in poker have been kind of neat lately. Would like to see more variations played.
I wish Buzz was open source too. I don't know if you've ever tried any, but there are a couple of similar open source applications. My favourite used to be Psycle, even though the project doesn't seem very active nowadays (http://psycle.pastnotecut.org/portal.php), and there is also Buzztrax, which is directly inspired by Buzz: http://buzztrax.org/
The new reference site for Buzz gear is this one: http://buzz.robotplanet.dk/
This is referred to as the intelligence explosion or AI Foom.
We sequenced human DNA, so I think we already have something kinda like it. I'm not a scientist but my basic understanding is that while we have the whole sequence (in your source code analogy, we can do git clone or git pull) we are not really good at tweaking it ("oops this codebase uses architectural solutions we're not familiar with, we will have to study it extensively before we can deploy those proposed changes to the production environment")
The effort to sequence the human genome has also led us to discover that there are other ways in which traits can be heritable that don't involve changes to DNA. The focus now is largely on epigenetics, although explanations invoking microbiomes of bacteria inside our organs were definitely popular for a while.
In short, we really don't know a whole lot about our own molecular biology, and a lot of the research in the past 60 years since the discovery of DNA has tended to show "there's more going on than we thought." Where things involve just DNA, we have very good tools for reading (sequencing) and writing (CRISPR/Cas9) it. What we don't have good tools for is modifying our epigenetics, and we don't have a good handle on what comes from DNA and what comes from epigenetics.
If you just want an overview of what's going on, I curated these links for you. For each, focus on the main idea and why it is significant for accomplishing biological goals.
https://en.wikipedia.org/wiki/Molecular_biology section 1-2
https://www.khanacademy.org/science/biology/classical-geneti... watch 2x speed
https://en.wikipedia.org/wiki/Epigenetics intro and diagram only
http://www.zymoresearch.com/learning-center/epigenetics/what... all pages
https://www.jove.com/science-education-database/2/basic-meth... understand basic research methods
https://www.cancerquest.org/cancer-biology/ is awesome. I recommend angiogenesis, metastasis, and tumor-host interactions
The idea of automating aspects of data science like data cleansing and model building makes sense. A good goal.
There is obviously a lot of excitement in deep learning success stories but I would like to see more effort also put into both fusing more traditional AI with deep learning, and also better interactive UIs for data science and machine learning work flows.
A little off topic, but I started using Pharo Smalltalk again this week after not touching it in a long time. I was thinking about using Pharo like environments instead of tools like iPython for organizing workflows. I admit this is likely not such a good idea because most of the great libraries for data science and AI are in C++, Java, Python, and Scala.
More seriously, as someone who's interested in machine learning but doesn't really know where to start in terms of really understanding it beyond "it takes input, does some weird statistical magic, and gives you an output": would one of these D3M tools (should they become readily available and less theoretical) be a decent starting point? Or would it still be better from a learning perspective to start with something more fundamental or "basic building block"? In other words: does it help to know how to build these models from scratch even if you're using some tool to do it automatically?
- Take Andrew Ng's ML class on Coursera.
- Install TensorFlow and TFLearn.
- git clone something based on TF from GitHub and hack it.
- Do some pet project from scratch.
Automatic differentiation usually adds very little overhead. Think "putting a second set of values through a slightly differently compiled function".
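That "second set of values" can be made concrete with forward-mode automatic differentiation via dual numbers - a minimal sketch, not how any particular framework implements it. Each value carries its derivative alongside it, and every arithmetic operation updates both, which is why the overhead is roughly a constant factor per operation:

```python
# Forward-mode autodiff with dual numbers: each Dual carries a value and
# its derivative with respect to the input, propagated through arithmetic.

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot   # value, and d(value)/dx

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u * v)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def f(x):
    return x * x + 3 * x + 1   # f'(x) = 2x + 3

x = Dual(2.0, 1.0)   # seed the input's derivative as 1
y = f(x)             # y.val is f(2); y.dot is f'(2), computed for free
```

The function body is untouched; the derivative falls out of running the same code on a richer number type.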
Although there are calculus-related issues like saturation, smoothness of transfer functions, etc. "solving calculus" is not the major problem with machine learning efficiency and throughput. Linear algebra is the main bottleneck.
A multiplication operation can be done with a single transistor: one operand is the gate voltage, the other is the source-to-drain voltage, and the resulting current through the transistor is (approximately) proportional to the product of the two.
An addition operation is even simpler: just connect two wires carrying two different currents - the resulting current on the output wire will be the sum (by Kirchhoff's current law).
Of course, you pay for this efficiency with precision, which for some applications (e.g. neural networks) can be a reasonable trade off.
Please correct me if I misunderstood your point.
Otherwise you might need to clarify your idea a bit more.
> OK, so how do you store your model? In analog or digital format?
This is a good question. Do you think that conversion back and forth would diminish the gains from analog computing?
I think of it as the output being the solution of an equation/dynamic system. It's OK if you lose the equations; you care about the solution. I've been doing some light internet research into this recently and came across this title https://www.amazon.com/Neuromorphic-Photonics-Paul-R-Prucnal... which seems to suggest that analog photonics might be the way forward.
But I can't really comment on the state of this. Photonics is definitely becoming more and more viable it appears.
This makes me think of the premise behind Excel and Access: provide (mostly) non-programmers with a nice wizard-like tool to analyze data. Of course, the (painful) shortcomings of such tools are all too well known. Hope this DARPA project fares better. On the plus side, it democratizes access to a technology - the good and the bad.
But the example in the article of traffic modeling, with 30 person-months of analysis and 60 person-months of cleansing, illustrates how hard such analysis actually is. Ain't no way you can automate it.
Useful intel signal is hard to find in noise, especially in what by now must be zettabytes. Enlisting anyone without deep domain expertise to do analysis, especially someone as dumb as a computer, is not a promising strategy for success. But it sounds like a great way for beltway bandits to get funding for long-term blue-sky R&D contracts...
Any reason for this?