It turns out that when you base your post on a ridiculous unsubstantiated premise, you can conclude anything.
Cardinal rule of drug dealing: never get high off your own stash.
Cardinal rule of everything else: follow the money.
Cynicism aside, I tend to believe the OP's premise that these methods will contribute to a new and unrecognizable kind of software development (for better or worse).
It feels more like training a cybernetic animal than programming, and I'm ok with that. It's a puzzle of trying to figure out how to morph reality so that ML systems can gain traction, and that's just a fascinating new way of looking at problems.
So if we're comparing it to software, it's like shipping only a binary blob, which is basically impossible to debug, and in many cases throwing away most of the source code (the training data). How many people are actually maintaining the full data sets used to train their networks?
Artificial neural networks were invented in 1943, some years before the earliest electronic computer prototypes. In fact, the McCulloch & Pitts paper that first described them served as an inspiration (possibly the main inspiration) for von Neumann's design of the digital circuits that would make the earliest computers. Alan Turing discussed using genetic algorithms for training neural networks in 1948, roughly concurrently with the earliest work on software. So "software 2.0", if it did not predate "software 1.0", certainly evolved concurrently with it right from the very beginning.
Early progress seemed so promising that a whole movement dedicated to the belief in the power of neural networks and similar techniques, called Cybernetics, was established in 1948 by some of the leading minds of that generation, like Norbert Wiener and von Neumann. They were so optimistic that they believed that a full mathematical theory of intelligence and the brain would be completed in five years. The skeptical Turing, who attended some of the group's meetings but found McCulloch to be a charlatan, thought it would take 50.
80 years later, while better hardware has certainly finally allowed us to enjoy some of the fruits of that old promise, we have yet to make any theoretical breakthrough. The algorithms we use today for training neural networks were invented 50 years ago, and there hasn't been even a modest theoretical breakthrough in at least a few decades. I don't think we are in any position to announce any sort of revolution.
It's not true.
For example, the LSTM (the default architecture for use of a NN on text) wasn't invented until 1997.
If we want to look at just very recent work, ResNet wasn't invented until 2015, Attention until 2010 (being generous), NMTs in 2014, etc. One can argue these built on other things - which is absolutely true - but we don't make the same argument about, say, Wayland under Linux: "Oh it's just a graphics API with X and Y co-ordinates and more powerful computers".
Until ~2014, people couldn't train neural networks over 4 layers deep. Until ResNet they couldn't do more than ~10 layers. Now hundreds of layers are normal for image recognition tasks, and people still say "there is nothing new".
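To make the depth point concrete, here is a toy sketch of why skip connections help. It is not ResNet itself: the scalar "layer", the weight 0.5 and the depth of 50 are all made-up assumptions, chosen only to show how the chain-rule product behaves with and without the identity shortcut.

```python
import math

def layer(x, w):
    """A toy scalar 'layer' (assumed for illustration): weight times tanh."""
    return w * math.tanh(x)

def layer_grad(x, w):
    """Derivative of layer() with respect to x."""
    return w * (1.0 - math.tanh(x) ** 2)

def gradient_through_stack(x, w, depth, residual):
    """Run x through `depth` identical layers and return d(output)/d(input)."""
    grad = 1.0
    for _ in range(depth):
        local = layer_grad(x, w)
        if residual:
            # y = x + f(x), so dy/dx = 1 + f'(x): the shortcut keeps a path
            # along which the gradient is passed through unchanged.
            grad *= 1.0 + local
            x = x + layer(x, w)
        else:
            # y = f(x), so dy/dx = f'(x): small factors multiply together.
            grad *= local
            x = layer(x, w)
    return grad

plain = gradient_through_stack(0.5, 0.5, depth=50, residual=False)
skip = gradient_through_stack(0.5, 0.5, depth=50, residual=True)
print(f"plain stack gradient:    {plain:.3e}")
print(f"residual stack gradient: {skip:.3e}")
```

In this setup the plain stack's gradient shrinks towards zero with depth, while the residual stack's gradient never drops below 1. Real networks are obviously far messier than a scalar chain.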
It wasn't faster hardware that got these techniques to work, it was new theoretical breakthroughs.
Well, maybe not actively any more, but I was working with NNs a lot in the 90s.
> For example, the LSTM (the default architecture for use of a NN on text) wasn't invented until 1997.
1. We know no more about the theory of LSTM networks than we do about ordinary NNs.
2. McCulloch and Pitts's original 1943 paper explicitly mentions hooking a recurrent NN to external memory (they talk about a Turing machine tape).
> Until ~2014, people couldn't train neural networks over 4 layers deep. Until ResNet they couldn't do more than ~10 layers. Now hundreds of layers are normal for image recognition tasks, and people still say "there is nothing new".
That's not what I said. I said that there has been great progress thanks to better hardware and heuristics that have accumulated over the years. But the theory has hardly moved an inch in at least a couple of decades, and has yet to make a breakthrough.
> it was new theoretical breakthroughs.
AFAIK, it's been all hardware and heuristics (like LSTM) gained through trial and error, but I'd love to see an example of a theoretical breakthrough.
Perhaps, but we do know a lot more about how (deep) NNs work than we did 40 years ago.
Take for example: Generalization in Deep Learning which gives a theoretical grounding of generalization.
> McCulloch and Pitts's original 1943 paper explicitly mentions hooking a recurrent NN to external memory (they talk about a Turing machine tape).
Yes, and DaVinci invented the helicopter, he just failed to get it working.
That really depends on what you mean by "a lot more". I have no qualms about the ongoing research and steady progress in machine learning. My problem is with claims (hinted at in the article, as in many other places) that this progress has anything to do with progress towards AI. AI has been projected to be within 5 years by optimists and 50 years by pessimists for the last 70 years. I don't think any responsible researcher in machine learning can say that we can now give a narrower range than 50 or 60 (or even 70) years ago.
> Take for example: Generalization in Deep Learning which gives a theoretical grounding of generalization.
So I gave it a quick read. It's certainly about theory, but I would be amazed if anyone called it anywhere near a breakthrough (and it is only marginally about NNs). It is certainly a step towards understanding the statistics of why we see decent generalizations in many scenarios, especially when considerable hand-tuning of the architecture is involved.
> Yes, and DaVinci invented the helicopter, he just failed to get it working.
There's a difference between advancing theory and practice. It wasn't a breakthrough in theory that led to the invention of LSTM, but better hardware and years of tinkering. Again -- this is important and certainly useful, but my point is that we are nowhere near able to proclaim that deep learning is the right path to AI, or that we are anywhere close to achieving it.
That's a different claim to the NN complaint you made.
: Not all, sadly, but machine learning is certainly among the worst offenders when it comes to claims vs. reality, although programming language theory is occasionally a close contender.
by that measure I don't think anyone can point to any substantial progress, let alone to major breakthroughs.
Because by the above metric, all those things which required intelligence before (image recognition, image captioning, good Go playing etc) and clearly aren't AI count as moved goal posts.
As for "I would suggest that machine learning people avoid that term, which is loaded, ill-defined and with a lot of embarrassing historical baggage of failed promises." I think it is interesting to note that Karpathy's article only mentions AI once (as AGI) in the closing sentence as a future work thing.
Personally, I think this is a crappy argument. I'm not at all sure "intelligence" is anything more than good pattern recognition, evolved heuristics and logical reasoning. I think good progress can be shown in all those areas.
There are only two things that "count" as AI: human (or perhaps animal) "intelligence" (this requires a definition of intelligence, which we don't have, but I'll take "we'll know it when we see it" for now), and the field of research working towards that goal. Anything else that some people call AI is nothing but empty marketing speech or the name given to whatever it is that the people researching AI are now doing. The second use seems more reasonable, and what counts as AI by that definition has never changed.
That's not to say that "AI" algorithms don't have some common features. They tend to be less discrete and more continuous, choosing a "best" answer rather than the definitely correct one. But, for example, back in the 40s and 50s, what we would now call control systems were also packaged under the same umbrella of Cybernetics. And, if you think about it, control systems use learning without memory (and some even do have memory; a Kalman filter is basically a single layer NN that employs backpropagation). Still, control systems have long been studied and produced by people who are not AI researchers, so we no longer consider them AI (although, do you remember the fuzzy logic craze of the '90s? It was considered a hybrid of AI and control).
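For anyone who hasn't seen one, here is a minimal sketch of the error-driven update inside a Kalman filter, reduced to the simplest case I can think of: estimating a single constant from noisy readings, with no process noise. The function name, the defaults and the measurement values are assumptions for illustration; how closely this resembles NN training is the analogy above, not something the code proves.

```python
def kalman_1d(measurements, meas_var, init_est=0.0, init_var=1.0):
    """Scalar Kalman filter for a constant hidden value (no process noise).

    Returns the final estimate and its variance.
    """
    est, var = init_est, init_var
    for z in measurements:
        gain = var / (var + meas_var)   # how much to trust the new reading
        est = est + gain * (z - est)    # correct the estimate by the weighted error
        var = (1.0 - gain) * var        # uncertainty shrinks after each update
    return est, var

estimate, variance = kalman_1d([5.0] * 9, meas_var=1.0)
print(estimate, variance)
```

Each step nudges the estimate by a fraction of the prediction error, and that fraction (the gain) drops as the filter becomes more certain.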
> all those things which required intelligence before (image recognition, image captioning, good Go playing etc) and clearly aren't AI count as moved goal posts.
Doing arithmetic and recalling information based on queries had also been considered once to require intelligence, but they have never been considered "AI" because those were not the problems people in AI research have been working on. The goal posts have not moved an inch: AI is still human/animal intelligence, or whatever product AI researchers (working toward that goal) produce. These days, what AI researchers produce amounts to statistical clustering algorithms, so any statistical clustering algorithm is called AI. I don't see anything harder or more special about image recognition than DB technology, distributed systems, etc.
> I think it is interesting to note that Karpathy's article only mentions AI once (as AGI) in the closing sentence as a future work thing.
That's what I was referring to (I don't see what difference mentioning it only once makes). We simply don't know whether deep learning, i.e. deep neural networks trained through a variant of backpropagation, is the approach that would one day lead us to AI.
My other point was about the special status he assigns to machine learning as Software 2.0, something that is wrong both historically (machine-learning predates almost any other CS field) and in practice (machine learning is not taking over DBs, OSes, etc.; it's doing what it can do well, namely statistical learning).
> I'm not at all sure "intelligence" is anything more than good pattern recognition, evolved heuristics and logical reasoning. I think good progress can be shown in all those areas.
I don't know what I think about your definition of intelligence, but "good progress" is relative. I think current machine learning systems are quite disappointing (they're nowhere near as impressive as what, say, even insects can do, and I don't think we'd call insects intelligent, and they're prone to very "unintelligent" mistakes, not to mention that their learning process does not seem to resemble anything done by humans or animals). I think that in terms of theory, progress could be said to be slow at best, but in any event, we are certainly not in any position to say with any reasonable confidence that AI is less than 50 years in the future.
In terms of theory, a breakthrough would be a better understanding of what intelligence is on one hand, and of how "unorganized systems", to use Turing's terminology, evolve sophisticated algorithms on the other. At some stage, Turing believed that as a precursor to intelligence, we should study simpler biological phenomena, and turned to so-called "artificial life". There hasn't been much progress on that front either, but the work done by Stuart Kauffman since the late sixties seems to be moving in the right direction, albeit very slowly.
Don't get me wrong: I'm not an AI skeptic. I believe that we will achieve it one day. I just think it is very irresponsible for machine learning researchers to hint we're getting close, when, in fact, they have no idea whether we are or we aren't. To be more specific, we don't know whether deep learning, i.e. deep neural networks trained through a variant of backpropagation, is the approach that would one day lead us to AI.
Isn't this true for pretty much any science done ever? It only becomes a problem when marketing is good, and science is not.
so it's theory more in the sense of a hypothesis rather than what is meant by "theory" in math or physics
Again, isn't it true for pretty much any neuroscience research? How would you judge a neuroscience paper's importance?
Also, going back to your earlier answer: what would convince you that a program displays the learning and reasoning abilities of some advanced invertebrates?
Maybe, but when it comes to a commercial entity I'm more suspicious.
> How would you judge a neuroscience paper's importance?
I wouldn't; I'd let a neuroscientist judge. It's just that I believe most of us would hear of a major breakthrough in neuroscience.
> what would convince you that a program displays the learning and reasoning abilities of some advanced invertebrates?
It's hard to say precisely (largely because we don't know what intelligence is, let alone have a good quantitative measure for it), but if you read about insect behavior it's very clear that we're nowhere near that (just as it's clear people are more intelligent than spiders even though there are probably mental tasks that spiders can perform better/faster than humans). So ask me again when the question becomes harder to answer :)
Not the OP, but what has happened that is perhaps revolutionary is that hardware capabilities are now unlocking capabilities that were latent in the existing algorithms but not previously accessible due to hardware limitations. Sometimes quantity creates its own quality.
First, you really should look into what Mr. Karpathy has done. Second, none of these "old" algorithms were useful in the '40s and '50s because there was no internet and very little data. They are only becoming useful now due to the explosion of data. So, it stands to reason that Mr. Karpathy isn't ignoring history out of malice; he is ignoring it because it doesn't fucking matter... at all.
The attitude displayed in the post is not only unjustified by the actual achievement (we are still far from achieving even insect-level "intelligence" or actually replacing a significant portion of "software 1.0") but has, in the past, seriously harmed the very research subject that I assume Mr. Karpathy wants to foster. AI research has been seriously burned by over-enthusiastic optimism before; it should learn the lesson.
They're more complementary than overlapping. It's something new in the toolkit, not something that replaces existing programs.
Some things will always be written in the "1.0" fashion.
Do you have a clear specification and expect exact results? "Software 1.0" is the best choice.
Do you have problems that are computationally intractable, or simply a huge amount of data and you can accept an approximate solution? This is where "Software 2.0" makes sense. It's already being used and it will keep expanding.
The factors that will determine the ratio between "Software 1.0" and "Software 2.0" will likely be:
* how much we will be willing to accept approximate solutions
* how easy it will be to collect the training data and to train a neural network.
I can totally imagine a hybrid model where there is going to be a lot of "Software 1.0" with some black boxes trained using machine learning techniques.
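As a rough sketch of what such a hybrid might look like, the snippet below wraps a tiny learned component (a cutoff fitted from labeled examples) in hand-written, exact rules. The function names, the toy data and the "score" framing are all hypothetical, and a threshold search stands in for whatever model would really be trained.

```python
def fit_threshold(samples):
    """'Software 2.0' part: choose the cutoff that classifies the most
    labeled (score, label) examples correctly. Approximate by design."""
    best_cut, best_correct = None, -1
    for cut in sorted(score for score, _ in samples):
        correct = sum((score >= cut) == label for score, label in samples)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

def make_classifier(samples):
    """Build a classifier mixing explicit rules with the learned cutoff."""
    cut = fit_threshold(samples)

    def classify(score):
        # 'Software 1.0' part: an exact, specified rule, checked first.
        if score < 0:
            raise ValueError("score must be non-negative")
        # The learned black box only decides the fuzzy part.
        return score >= cut

    return classify

# Made-up training data: (score, is_positive)
training = [(0.1, False), (0.3, False), (0.4, False), (0.7, True), (0.9, True)]
classify = make_classifier(training)
print(classify(0.8), classify(0.2))
```

The input validation stays exact and debuggable, while the cutoff can be refitted whenever new data comes in.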
I'm not very convinced we will have many 100% "Software 2.0" applications. That works well for some specific problems (like AlphaGo Zero mentioned in the article) but many other domains don't map that well to a machine learning problem.
Like the article clearly stated.