Software 2.0 (medium.com)
81 points by stablemap 9 months ago | 36 comments



"It turns out that a large portion of real-world problems have the property that it is significantly easier to collect the data than to explicitly write the program."

It turns out that when you base your post on a ridiculous unsubstantiated premise, you can conclude anything.


I like how he then proceeds to enumerate all the things which, if they weren't doable, no one would really care about.

Cardinal rule of drug dealing: never get high off your own stash.


Individuals don't care, but the corporate surveillance state cares a lot. Face recognition, voice recognition, and speech recognition alone are an Orwellian dream come true.

Cardinal rule of everything else: follow the money.

Cynicism aside, I tend to believe the OP's premise that these methods will contribute to a new and unrecognizable kind of software development (for better or worse).


This seems like a poor framing when you consider that the vast majority of "software 1.0" projects are of a form that does not map onto the semantic structure of a machine learning problem.


Not mentioned in the disadvantages is that you can't really explain "how" a neural network produces a given output from given inputs. Some things can be figured out from the training data, if it were available, but more often than not it isn't; you just get a pre-trained network.

So if we're comparing it to software, it's like shipping only a binary blob, which is basically impossible to debug, and in many cases throwing away most of the source code (the training data). How many people are actually maintaining the full data sets used to train their networks?


On my journey to catch back up with ML after having abandoned it around 2000, I find myself thinking the same things as Karpathy. There is something profound here, and as someone that has been writing software for 40 years since the ripe age of 8, it is scintillating in its potential. I'm working in a field distant from most current applications of ML (DDOS detection/mitigation and "cyber" security), and so I'm reaping the benefits of being able to take this stuff and apply it to lots of problems. Results are shocking so far.

It feels more like training a cybernetic animal than programming, and I'm ok with that. It's a puzzle of trying to figure out how to morph reality so that ML systems can gain traction, and that's just a fascinating new way of looking at problems.

Great article!


Not everything is data. Interesting computer science (and just science, for that matter) problems have a lot to do with data, but the rest falls under entertainment or utility problems. To write this type of software using only ML models, you need something that mimics human nature - that would effectively be AI.


What bothers me about articles like this is not their faith in the power of neural networks -- power that, I'm sure, we have only just begun to harness effectively -- but that by ignoring the history of neural networks (and machine learning in general), they paint a somewhat misleading picture.

Artificial neural networks were invented in 1943, some years before the earliest electronic computer prototypes. In fact, the McCulloch & Pitts paper that first described them served as an inspiration (possibly the main inspiration) for von Neumann's design of the digital circuits that would make the earliest computers. Alan Turing discussed using genetic algorithms for training neural networks in 1948[1], roughly concurrently with the earliest work on software. So "software 2.0", if it didn't predate "software 1.0", certainly evolved concurrently with it right from the very beginning.

Early progress seemed so promising that a whole movement dedicated to the belief in the power of neural networks and similar techniques, called Cybernetics, was established in 1948[2] by some of the leading minds of that generation, like Norbert Wiener and von Neumann. They were so optimistic that they believed that a full mathematical theory of intelligence and the brain would be completed in five years. The skeptical Turing, who attended some of the group's meetings but found McCulloch to be a charlatan, thought it would take 50.

Nearly 75 years later, while better hardware has certainly finally allowed us to enjoy some of the fruits of that old promise, we have yet to make any theoretical breakthrough. The algorithms we use today for training neural networks were invented 50 years ago, and there hasn't been even a modest theoretical breakthrough in at least a few decades. I don't think we are in any position to announce any sort of revolution.

[0]: https://en.wikipedia.org/wiki/Artificial_neuron

[1]: https://en.wikipedia.org/wiki/Unorganized_machine

[2]: https://en.wikipedia.org/wiki/Cybernetics


So this ("NN's were invented a long time ago and there has been no theoretical progress since") is a common meme put forward by people who read bits about AI, usually work in software but aren't actively doing machine learning.

It's not true.

For example, the LSTM (the default architecture for use of a NN on text) wasn't invented until 1997.

If we want to look at just very recent work, ResNet wasn't invented until 2015, Attention until 2010 (being generous), NMT in 2014, etc. One can argue these built on other things - which is absolutely true - but we don't make the same argument about, say, Wayland under Linux: "Oh, it's just a graphics API with X and Y co-ordinates and more powerful computers".

Until ~2014, people couldn't train neural networks over 4 layers deep. Until ResNet they couldn't do more than ~10 layers. Now hundreds of layers are normal for image recognition tasks, and people still say "there is nothing new".
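To make the skip-connection point concrete, here's a rough numpy sketch (my own illustration, not the ResNet authors' code) of why the identity path matters: a plain stack of small layers crushes the signal to nothing, while a residual stack keeps it alive even a hundred blocks deep.

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def plain_block(x, w1, w2):
        return w2 @ relu(w1 @ x)        # no identity path

    def residual_block(x, w1, w2):
        return x + w2 @ relu(w1 @ x)    # y = x + F(x): identity path preserved

    rng = np.random.default_rng(0)
    x_plain = x_res = rng.standard_normal(16)
    for _ in range(100):                # a "hundreds of layers" deep stack
        w1 = rng.standard_normal((16, 16)) * 0.05
        w2 = rng.standard_normal((16, 16)) * 0.05
        x_plain = plain_block(x_plain, w1, w2)
        x_res = residual_block(x_res, w1, w2)

    # The plain stack's activations (and, symmetrically, its gradients) collapse
    # toward zero; the residual stack still carries a usable signal.
    print(np.linalg.norm(x_plain), np.linalg.norm(x_res))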

It wasn't faster hardware that got these techniques to work, it was new theoretical breakthroughs.


> usually work in software but aren't actively doing machine learning.

Well, maybe not actively any more, but I was working with NNs a lot in the 90s.

> For example, the LSTM (the default architecture for use of a NN on text) wasn't invented until 1997.

1. We know no more about the theory of LSTM networks than we do about ordinary NNs.

2. McCulloch and Pitts' original 1943 paper explicitly mentions hooking a recurrent NN to external memory (they talk about a Turing machine tape).

> Until ~2014, people couldn't train neural networks over 4 layers deep. Until ResNet they couldn't do more than ~10 layers. Now hundreds of layers are normal for image recognition tasks, and people still say "there is nothing new".

That's not what I said. I said that there has been great progress thanks to better hardware and heuristics that have accumulated over the years. But the theory has hardly moved an inch in at least a couple of decades, and has yet to make a breakthrough.

> it was new theoretical breakthroughs.

AFAIK, it's been all hardware and heuristics (like LSTM) gained through trial and error, but I'd love to see an example of a theoretical breakthrough.


> We know no more about the theory of LSTM networks than we do about ordinary NNs.

Perhaps, but we do know a lot more about how (deep) NNs work than we did 40 years ago.

Take for example: Generalization in Deep Learning[1] which gives a theoretical grounding of generalization.

> McCulloch and Pitts' original 1943 paper explicitly mentions hooking a recurrent NN to external memory (they talk about a Turing machine tape).

Yes, and da Vinci invented the helicopter; he just failed to get it working.

[1] https://arxiv.org/pdf/1710.05468.pdf


> Perhaps, but we do know a lot more about how (deep) NNs work than we did 40 years ago.

That really depends on what you mean by "a lot more". I have no qualms about the ongoing research and steady progress in machine learning. My problem is with claims (hinted in the article, as in many other places) that this progress has anything to do with the progress towards AI. AI has been projected to be within 5 years by optimists and 50 years by pessimists for the last 70 years. I don't think any responsible researcher in machine learning can say that we can now give a narrower range than we could 50 or 60 (or even 70) years ago.

> Take for example: Generalization in Deep Learning[1] which gives a theoretical grounding of generalization.

So I gave it a quick read. It's certainly about theory, but I would be amazed if anyone called it anywhere near a breakthrough (and it is only marginally about NNs). It is certainly a step towards understanding the statistics of why we see decent generalizations in many scenarios, especially when considerable hand-tuning of the architecture is involved.

> Yes, and DaVinci invented the helicopter, he just failed to get it working.

There's a difference between advancing theory and practice. It wasn't a breakthrough in theory that led to the invention of LSTM, but better hardware and years of tinkering. Again -- this is important and certainly useful, but my point is that we are nowhere near able to proclaim that deep learning is the right path to AI, or that we are anywhere close to achieving it.


> My problem is with claims (hinted in the article, as in many other places) that this progress has anything to do with the progress towards AI

That's a different claim to the NN complaint you made.


Because progress is hard to measure when you have no goal to compare it against, and therefore no metric. Most disciplines don't try to measure it and largely avoid making grandiose claims [1]. But some in machine learning, for some reason, choose to measure progress in terms of AI, and by that measure I don't think anyone can point to any substantial progress, let alone to major breakthroughs. But even by any other measure, I don't think there has been a major theoretical breakthrough in machine learning in decades, unlike, say, breakthroughs in complexity theory, distributed systems, and other CS disciplines. In any event, marketers can use the term AI to refer to Quicksort for all I care, but I would suggest that machine learning people avoid that term, which is loaded, ill-defined and with a lot of embarrassing historical baggage of failed promises. Instead, they should be pleased that the discipline has finally produced workable practices that are proving useful in some important domains.

[1]: Not all, sadly, but machine learning is certainly among the worst offenders when it comes to claims vs. reality, although programming language theory is occasionally a close contender.


How about this metric: How many times does the metric for "what counts as AI" move?

> by that measure I don't think anyone can point to any substantial progress, let alone to major breakthroughs.

Because by the above metric, all those things which required intelligence before (image recognition, image captioning, good Go playing, etc.) but are now deemed clearly not AI count as moved goal posts.

As for "I would suggest that machine learning people avoid that term, which is loaded, ill-defined and with a lot of embarrassing historical baggage of failed promises." I think it is interesting to note that Karpathy's article only mentions AI once (as AGI) in the closing sentence as a future work thing.

Personally, I think this is a crappy argument. I'm not at all sure "intelligence" is anything more than good pattern recognition, evolved heuristics and logical reasoning. I think good progress can be shown in all those areas.


> How about this metric: How many times does the metric for "what counts as AI" move?

There are only two things that "count" as AI: human (or perhaps animal) "intelligence" (this requires a definition of intelligence, which we don't have, but I'll take "we'll know it when we see it" for now), and the field of research working towards that goal. Anything else that some people call AI is nothing but empty marketing speech or the name given to whatever it is that the people researching AI are now doing. The second use seems more reasonable, and what counts as AI by that definition has never changed.

That's not to say that "AI" algorithms don't have some common features. They tend to be less discrete and more continuous, choosing a "best" answer rather than the definitely correct one. But, for example, back in the 40s and 50s, what we would now call control systems were also packaged under the same umbrella of Cybernetics. And, if you think about it, control systems use learning without memory (and some even do have memory; a Kalman filter is basically a single layer NN that employs backpropagation). Still, control systems have long been studied and produced by people who are not AI researchers, so we no longer consider them AI (although, do you remember the fuzzy logic craze of the '90s? It was considered a hybrid of AI and control).
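To make that comparison concrete, here is a minimal 1-D Kalman filter update in Python (a toy sketch, not drawn from any of the linked material); each step corrects the state estimate with a gain-weighted prediction error, which is the kind of online, error-driven updating I have in mind.

    def kalman_step(x_est, p_est, z, q=1e-3, r=1e-1):
        """One predict/update cycle for a constant-state model.

        x_est, p_est: prior state estimate and its variance
        z: new noisy measurement; q, r: process and measurement noise variances
        """
        # Predict: the state is modelled as constant, so only uncertainty grows
        p_pred = p_est + q
        # Update: the gain decides how much of the prediction error to absorb
        k = p_pred / (p_pred + r)
        x_new = x_est + k * (z - x_est)
        p_new = (1.0 - k) * p_pred
        return x_new, p_new

    x, p = 0.0, 1.0
    for z in [0.9, 1.1, 1.0, 0.95, 1.05]:   # noisy observations of a value near 1.0
        x, p = kalman_step(x, p, z)
    print(round(x, 3))                      # estimate converges toward ~1.0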

> all those things which required intelligence before (image recognition, image captioning, good Go playing, etc.) but are now deemed clearly not AI count as moved goal posts.

Doing arithmetic and recalling information based on queries had also been considered once to require intelligence, but they have never been considered "AI" because those were not the problems people in AI research have been working on. The goal posts have not moved an inch: AI is still human/animal intelligence, or whatever product AI researchers (working toward that goal) produce. These days, what AI researchers produce amounts to statistical clustering algorithms, so any statistical clustering algorithm is called AI. I don't see anything harder or more special about image recognition than DB technology, distributed systems, etc.

> I think it is interesting to note that Karpathy's article only mentions AI once (as AGI) in the closing sentence as a future work thing.

That's what I was referring to (I don't see what difference mentioning it only once makes). We simply don't know whether deep learning, i.e. deep neural networks trained through a variant of backpropagation, is the approach that would one day lead us to AI.

My other point was about the special status he assigns to machine learning as Software 2.0, something that is wrong both historically (machine-learning predates almost any other CS field) and in practice (machine learning is not taking over DBs, OSes, etc.; it's doing what it can do well, namely statistical learning).

> I'm not at all sure "intelligence" is anything more than good pattern recognition, evolved heuristics and logical reasoning. I think good progress can be shown in all those areas.

I don't know what I think about your definition of intelligence, but "good progress" is relative. I think current machine learning systems are quite disappointing (they're nowhere near as impressive as what, say, even insects can do, and I don't think we'd call insects intelligent, and they're prone to very "unintelligent" mistakes, not to mention that their learning process does not seem to resemble anything done by humans or animals). I think that in terms of theory, progress could be said to be slow at best, but in any event, we are certainly not in any position to say with any reasonable confidence that AI is less than 50 years in the future.


What would be an example of substantial progress/major breakthrough in AI?


In terms of practice, I would say a program that displays the learning and reasoning abilities of some advanced invertebrates (say wasps or spiders). But we don't know how far away from AI that would put us. Once we achieve that, are we 5 years away from human-level intelligence or 50? We simply do not know.

In terms of theory, a breakthrough would be a better understanding of what intelligence is on one hand, and how "unorganized systems", to use Turing's terminology, evolve sophisticated algorithms on the other. At some stage, Turing believed that as a precursor to intelligence we should study simpler biological phenomena, and turned to so-called "artificial life". There hasn't been too much progress on that front either, but the work done by Stuart Kauffman [1] since the late sixties seems to be moving in the right direction, albeit very slowly.

Don't get me wrong: I'm not an AI skeptic. I believe that we will achieve it one day. I just think it is very irresponsible for machine learning researchers to hint we're getting close, when, in fact, they have no idea whether we are or we aren't. To be more specific, we don't know whether deep learning, i.e. deep neural networks trained through a variant of backpropagation, is the approach that would one day lead us to AI.

[1]: https://en.wikipedia.org/wiki/Stuart_Kauffman


Thanks. Do you follow Numenta's work? They published several papers recently (e.g. [1])

[1] https://numenta.com/papers/a-theory-of-how-columns-in-the-ne...


I find it very hard to judge papers by Numenta, because they make it very hard to separate science from marketing; their entire marketing motto is "we're better because we're more like the actual brain" (their HTM networks), rather than "because we perform drastically better". So claims to better biological accuracy (a controversial direction since the birth of AI) are pretty much their raison d'être. So they say they're writing about theory in this paper, but I see mostly observations, so it's theory more in the sense of a hypothesis rather than what is meant by "theory" in math or physics. But I am really not qualified to judge this paper's importance.


> they make it very hard to separate science from marketing

Isn't this true for pretty much any science done ever? It only becomes a problem when marketing is good, and science is not.

> so it's theory more in the sense of a hypothesis rather than what is meant by "theory" in math or physics

Again, isn't it true for pretty much any neuroscience research? How would you judge a neuroscience paper's importance?

Also, going back to your earlier answer: what would convince you that a program displays the learning and reasoning abilities of some advanced invertebrates?


> Isn't this true for pretty much any science done ever?

Maybe, but when it comes to a commercial entity I'm more suspicious.

> How would you judge a neuroscience paper importance?

I wouldn't; I'd let a neuroscientist judge. It's just that I believe most of us would hear of a major breakthrough in neuroscience.

> what would convince you that a program displays the learning and reasoning abilities of some advanced invertebrates?

It's hard to say precisely (largely because we don't know what intelligence is, let alone have a good quantitative measure for it), but if you read about insect behavior it's very clear that we're nowhere near that (just as it's clear people are more intelligent than spiders even though there are probably mental tasks that spiders can perform better/faster than humans). So ask me again when the question becomes harder to answer :)


Thanks for the deeper perspective, very interesting.

Not the OP, but what has happened that is perhaps revolutionary is that hardware capabilities are now unlocking capabilities that were latent in the existing algorithms but not previously accessible due to hardware limitations. Sometimes quantity creates its own quality.


You stated that Andrej Karpathy ignores the history of neural networks and paints a misleading picture. While you gave a very brief history of neural networks, you failed (imo, BIG TIME) to make a point.

First, you really should look into what Mr. Karpathy has done. Second, none of these "old" algorithms were useful in the '40s and '50s because there was no internet and very little data. They are only becoming useful now due to the explosion of data. So it stands to reason that Mr. Karpathy isn't ignoring history out of malice; he is ignoring it because it doesn't fucking matter... at all.


The point, simply put, is that neural networks have felt like a revolution for 75 years, while little actual progress has been made. That they have finally proven to be effective at statistical clustering, and so are becoming very practical for some important tasks, is very cool, but declaring them to be a revolution or the road to AI is premature.

The attitude displayed in the post is not only unjustified by the actual achievement (we are still far from achieving even insect-level "intelligence" or actually replacing a significant portion of "software 1.0") but has, in the past, actually seriously harmed the very research subject that I assume Mr. Karpathy wants to foster. AI research has been seriously burned by over-enthusiastic optimism before; it should learn the lesson.


I don't agree with the author. There are plenty of use cases which aren't covered by machine learning and won't be covered by existing machine learning methods in the future. Yes, we've come far with ML, but a lot of time will pass before we really have Software 2.0.


I don't think software 2.0 is a good name, because I don't think the programs we write right now and neural nets are that comparable.

They're more complementary than overlapping. It's something new in the toolkit, not something that replaces existing programs.


The main issue with ML/AI is that it has not yet been seriously exploited by hackers and criminals. We actually don't know ML/AI weaknesses at all! We have some basic examples, like feeding fabricated pages to Google's search AI to game its ranking, or exploiting news-aggregator AI to push fake news to the top. This area is basically ignored by researchers and the mass media.
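One family of weaknesses that is at least partly documented is adversarial inputs. A toy Python sketch (my own, against a hand-made linear scorer, not any real system mentioned here) shows the shape of the attack: a tiny, targeted nudge to the input flips the model's decision.

    import numpy as np

    w = np.array([0.9, -0.4, 0.2])     # weights of a hand-made linear classifier
    x = np.array([1.0, 2.0, -1.0])     # a legitimate input, scored negative

    def score(v):
        return float(w @ v)            # positive score => "accept", negative => "reject"

    eps = 0.5                          # attacker's per-feature perturbation budget
    x_adv = x + eps * np.sign(w)       # push every feature in the direction the weights reward

    print(score(x), score(x_adv))      # roughly -0.1 vs 0.65: the decision flips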


So in 2017 he suddenly realized that visual recognition, speech recognition, etc. are done with neural networks. How enlightening!


Apples vs. oranges, IMHO. I feel the OP is speaking from a narrow perspective.


Karpathy beautifully phrases what I felt before. This is important. Slightly frustrating too if you come from a 'hardcoding' background, but there is hope that many non-data problems remain to be solved. Data is software.


I despise the name "Software 2.0" because it implies a certain betterness.

Some things will always be written in the "1.0" fashion.


I think that, as it already happens today, different classes of problems are going to be solved with different techniques.

Do you have a clear specification and expect exact results? "Software 1.0" is the best choice.

Do you have problems that are computationally intractable, or simply a huge amount of data and you can accept an approximate solution? This is where "Software 2.0" makes sense. It's already being used and it will keep expanding.

The factors that will determine the ratio between "Software 1.0" and "Software 2.0" will likely be:
* how much we will be willing to accept approximate solutions
* how easy it will be to collect the training data and to train a neural network

I can totally imagine a hybrid model where there is going to be a lot of "Software 1.0" with some black boxes trained using machine learning techniques.
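Something like the sketch below, say: ordinary hand-written control flow, with one fuzzy judgement delegated to a trained model treated as a black box (the loader and model file here are made-up placeholders, just to show the shape).

    def load_model(path):
        """Stand-in for loading a trained classifier from disk (hypothetical)."""
        return lambda image: 0.92                # pretend confidence that the image contains a cat

    classifier = load_model("cat_detector.bin")  # the "Software 2.0" black box
    CONFIDENCE_THRESHOLD = 0.8

    def handle_upload(image, user):
        # Plain "Software 1.0": explicit, fully specified rules...
        if user.get("banned"):
            return "rejected"
        # ...with the perceptual judgement handed off to the learned component.
        if classifier(image) > CONFIDENCE_THRESHOLD:
            return "tagged:cat"
        return "untagged"

    print(handle_upload(image=b"\x89PNG...", user={"banned": False}))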

I'm not very convinced we will have many 100% "Software 2.0" applications. That works well for some specific problems (like AlphaGo Zero mentioned in the article) but many other domains don't map that well to a machine learning problem.


> Some things will always be written in the "1.0" fashion.

As the article clearly stated.


But the software they describe is better. And besides, higher version numbers don't necessarily mean better. Is 2017 better than 2016?


How can it be better when it fails randomly and we don't even understand how it works? Where I come from, that's the worst kind of software.



