Deep Learning – The “Why” Question (2018) (piekniewski.info)
81 points by YeGoblynQueenne 16 days ago | 64 comments



Everybody knows that biologists are the ones doing real science. They look under microscopes and see how life works firsthand.

But any chemist will tell you that everything the biologists see is really down to chemical reactions, all those behaviors are the function of electrons being transferred and molecules changing shape. Yes, chemists are the ones doing real science. They weigh things, measure the temperature and pressure of their reaction vessels, and see the effects of all the chemicals that interact firsthand.

But any physicist will tell you that everything the chemists see is down to electrical interactions, that their chemical models are just crude implementations of the laws of nature that only physicists understand, studying them in colliders and synchrotrons and seeing the fundamental particles that make up the universe. Yes, everybody knows that physicists are the ones doing real science.

But any mathematician will tell you that the physicists simply take the most beautiful equations and strip away all the elegant details to a crude approximation, and claim that these basic objects represent well the simple experiments and measurements they do. Only mathematicians who understand the fundamental rules and symmetries of the universe can truly understand the universe through the study of equations that reflect how everything works. Yes, everybody knows that mathematicians are the only ones doing real science.

But any logician can tell you that a mathematician is just applying axioms and predicates willy-nilly to get to some expression that piles together fundamental truths in such a spaghetti heap that the beauty of the real, knowable rules of existence is lost. The mathematicians merely apply the findings of logicians to reach much more trivial and applied goals. Yes, logicians are the ones doing real science.

But no. EVEN a biologist knows: a logician does not do science.



SATnet is an interesting paper in this regard. It's built on a theory of backpropping through the semi-definite programming relaxation of a MAXSAT problem, to learn the MAXSAT problem associated with a class of training examples. It gets good results: it learns to solve Sudoku puzzles with only binary feedback on its performance (i.e. "that's the solution" or "that's not the solution", with no partial feedback on partially-correct solutions). They say it's the first time a neural net has been able to do that. On the other hand, if you look at the coefficient matrix they learn for the MAXSAT problem, it doesn't correspond to any logical proposition, because the overwhelming majority of the learned coefficients are close to 0, whereas in the representation of a MAXSAT problem the coefficients must be -1, 0 or 1, where 0 means the corresponding variable doesn't occur.
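For concreteness, here is a minimal, hypothetical sketch of the kind of check described above (this is not code from the paper; the matrix name, sizes and threshold are my own assumptions): snapping a learned coefficient matrix to the discrete literal values {-1, 0, +1} and seeing how much of it is really near zero.

    import numpy as np

    # Hypothetical learned coefficient matrix from a SATnet-style layer:
    # rows = clauses, columns = variables. In a discrete MAXSAT encoding,
    # each entry should be -1 (negated literal), 0 (absent) or +1 (positive).
    rng = np.random.default_rng(0)
    S_learned = rng.normal(0, 0.1, size=(32, 16))  # stand-in for trained weights

    # Snap each coefficient to the nearest value in {-1, 0, +1} and measure the gap.
    S_discrete = np.clip(np.round(S_learned), -1, 1)
    gap = np.abs(S_learned - S_discrete).mean()
    frac_near_zero = np.mean(np.abs(S_learned) < 0.1)

    print(f"mean distance to nearest literal value: {gap:.3f}")
    print(f"fraction of coefficients near zero:     {frac_near_zero:.2%}")
    # If most coefficients hover near zero rather than near +/-1, the matrix
    # doesn't read back as a crisp propositional formula - the concern above.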

Is it a major result? Well I'm certainly impressed. Do we understand what the network is doing? I don't think we do... I'm still glad to have read the paper, though. I don't know what standard you would hold researchers to, to let through results like SATnet but reject spray-and-pray results which only look good because of offline multiple sampling. If you could require a lab notebook of authors, detailing every step of development, that might work.
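As an aside, here is a toy simulation (entirely my own, not from any paper) of what "offline multiple sampling" does to reported numbers: if you run many seeds and report only the best, a tweak with zero real effect still looks like an improvement.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assume a tweak with no real effect: every run's test accuracy is just
    # a 0.90 baseline plus seed-dependent noise.
    baseline, noise, runs_per_paper = 0.90, 0.01, 20

    # Simulate 1000 "papers" that each report only their best run.
    best_scores = [baseline + rng.normal(0, noise, runs_per_paper).max()
                   for _ in range(1000)]

    print(f"typical single run:    {baseline:.3f}")
    print(f"typical reported best: {np.mean(best_scores):.3f}")
    # Cherry-picking the best of 20 seeds inflates the reported score by roughly
    # 1.9 standard deviations even though nothing real improved.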

https://arxiv.org/abs/1905.12149


Call it what you will, but from what I can see progress exploded when theoretical justification was thrown out the window. Maybe intelligence just is messy at its core? Science is all about induction, but what if there are no simple rules to break this down?

The only successful intelligence we know of came about through evolution, not science. The way I see it, researchers are now doing what evolution did, but in a more guided way and much more quickly. Whether you call that science or engineering or tinkering, it doesn't really matter.


Actually, most of the scientific progress in these tasks had already been made and all that was missing was the compute power. Just like the know-how and basic principles of electric cars have been around for more than 100 years, and what has mostly been missing is cheap and powerful enough batteries.

So it depends on how you measure progress: scientifically, as the advancement of knowledge, or as success in applying such knowledge to business problems and use cases. That the deep learning community insists on publishing most of these breakthroughs as research is the basis of the critique being made in the OP.


I think it was Nassim Taleb[1] who made the case that engineering and tinkering drive science rather than the other way around, and that most of the knowledge gained comes from the practitioners rather than the scientists, who swoop in to claim credit after the fact (he uses the jet engine as an example of this). That's not to say fundamental science is unimportant, just that it's getting too much of the credit for all the stuff that's been built.

There's a tremendous amount of know-how required to successfully build an airplane, a jet-engine, a processor, or to successfully train a large neural network. And while theoretical physics no doubt can explain it all, it's not sufficient to build something that works. I think the fact that we're seeing so much progress in deep learning without any theoretical basis is another example of this.

[1] Antifragile, Chapter: History Written By the Losers


This isn't correct.

No amount of compute power will let a RNN/LSTM outperform a Transformer.

No amount of compute power will let an AlexNet or Inception style network train as well as a ResNet.


> I think the fact that we're seeing so much progress in deep learning without any theoretical basis

I don't understand what you mean by this. Any neural network is first built in theory and then tried out practically by coding and testing. I think practitioners and scientists play an equal role.


> progress exploded

Did it? Deep Learning provided us with some mindblowing performance art (FaceApp, DeepFake, etc.), but as far as I know any actual business value that claims to be derived from 'AI' is fake news.


Voice recognition, translation, OCR, image classification, etc. are mostly done through neural networks nowadays.

If https://en.m.wikipedia.org/wiki/Google_Translate does not seem like having business value to you, then I don't understand what you mean by "business value".


> If https://en.m.wikipedia.org/wiki/Google_Translate does not seem like having business value to you, then I don't understand what you mean by "business value".

There's no way Google Translate will ever be monetized. So yes, it's effectively an expensive art project for Google, not something that brings real business value.

Hotword detection ('ok google') is a practical application of AI, but that's really slim pickings considering how hyped 'AI' was. (Also, the costs of data collection for this feature are still way too high.)


I think we are really torturing language here. I don't want to get into specifics of what constitutes business value, how to value brand and customer satisfaction and so on. I can add more examples like https://deepmind.com/blog/article/deepmind-ai-reduces-google... that show how neural networks help in the pure bottom-line sense you care about.

Let me close my argument: neural network techniques developed in the last ten years are super useful here and now, they are used by billions of people every day and they do make their lives easier.


> I think we are really torturing language here.

Not really. My point is simple: as of 2020, any proposition for a business to 'invest in AI' is a money-losing proposition. (Just like 'blockchain'.)



Why are you so confident Google Translate is not monetized?

"We collect information to provide better services to all our users — from figuring out basic stuff like which language you speak, to more complex things like which ads you’ll find most useful, the people who matter most to you online, or which YouTube videos you might like..."


> Why are you so confident Google Translate is not monetized?

Obviously I don't have access to Google's accounting sheets, but there's no way in hell Google Translate pays for itself if you take all the costs of R&D into account.

Reason being that professionals who might pay big money for it want stuff other than just shiny 'AI' - they might want stuff like professional dictionaries and cited sources, etc.

And obviously people who translate random webpages for the lulz won't be paying much for this service.


Google's main income comes from its adverts business and that heavily relies on data collected from its users. Mail, translate, maps, and search are monetized. It's just not you paying for them.

If you all of a sudden start using Google Translate for Spanish and you think the phrases you are translating won't get used to serve you more targeted adverts for your upcoming vacation, you're being a bit naive.

Google Translate is being monetized. It's just as easy to guess that there's no way in hell Google Translate would stay around if it hadn't already paid for itself a long time ago. But let's stop guessing please.


Here's the pricing for Translate: https://cloud.google.com/translate/pricing

I personally know companies spending tens of thousands per month on Google Translate.


I'm at my third company making money from neural networks, so I don't know where you get this idea.

Two have been around text and one around video.


Business value is hardly the only or even main determinant of progress.


Of course, just pointing out that not everyone's point of view agrees with the glib pronouncement that 'progress exploded'.

In my little world attitudes have soured on 'AI'; 'AI' is poised to become the next 'blockchain'.


This is the natural course of the hype cycle. Like kids with a new toy, infatuation and then boredom. But the underlying technology in this case is absolutely world changing. I think we are only a few orders of magnitude away from systems that are genuinely intelligent. GPT-5 or 6 will have your jaw on the floor and GPT-10 will probably be smarter than all of us.


> I think we are only a few orders of magnitude away from systems that are genuinely intelligent.

A few orders of magnitude is potentially very significant.

It's worth remembering that despite the hype and fake-clever systems playing chess and constructing plausible sentences, we still have nothing remotely approaching AGI at the moment, and no reason to think it is on the horizon.

But I agree that the AI tools we have have made significant progress in the line of powerful un-intelligence, and will have significant impact in automation.


"It's worth remembering that despite the hype and fake-clever systems playing chess and constructing plausible sentences". I completely disagree. I'm a connectionist and subscribe to the "Meaning is use" mantra from wittgenstein, when it comes to language. Language is the medium of thoughts, if GPT-x understands language, it is a general intelligence. If you read into the the output of GPT-3 you can see glimmers of genuine intelligence. An intelligent machine is not something you design, it is a machine that designs itself. All that is needed is to step out of the way of the machine. All you need is gradient descent and extraordinary processing power, there's no magic sauce left to find. Being clever and creating complex learning strategies just buys you more power at the expense of generality. Transformers and convolutional layers and the other tricks we have are probably now enough to get us within shooting distance of AGI. If you look at the progress of machine learning the best predictor of performance is processing power. I see no reason why it will not continue to scale. In the minds of many there is something very un-satisfying, unsettling even, in the idea progress without understanding. But the very definition of intelligence requires that you give up the responsibility of understanding to the machine. Now there is probably an overarching theory here that we don't understand - why is it that search within the space of algorithms described by a neural network is an effective strategy when solving the problems we tend to encounter in the real world? And that probably isn't a pure theory question, it's a question about physics - why do the emergent physics of the world permit easy solutions within the space of possible functions described by neural networks? Why are these solutions discoverable? No one has a handle on these questions yet, I suspect it's a very very deep question and unlikely to be solved within our lifetime. So everyone should "stop worrying and learn to love the neural network"


In other words, shut up and calculate? :)

That's an interesting POV, "if it looks like a duck... it doesn't matter that we don't understand why".

I agree with your point that (as per evolution) there is no reason why we can't just set the initial conditions and set it running, and with luck/wise choices (though the latter implies some understanding, which we don't have) an AGI will emerge.

However, I don't see any evidence of that despite the impressive power of GPT etc.

One reason why is that while I agree that "meaning is use", the use in the human case is in our interaction with the world and our inner thoughts.

The use in the GPT case is more of a Chinese Room style affair, isolated, unthinking and without motivation or interest.

During the typing of this comment I've considered a myriad of things, some related, some not - for instance, going to get a coffee in a minute, turning away to respond to a Whatsapp message. This motivated, conscious and subconscious context switching is just one example - let alone the myriad emotions, memories, feelings, inner life and experience that drive and inform it.

Intelligence as surface is only plausible in the philosophic zombie sense.

And we have nothing yet that comes near to mimicking even that general intelligence at surface level, just (massively impressive) powerful niche-explorers and domain entity generators like GPT.


AI is like the internet pre 2000's or electricity a century ago - barely been applied to the world. It's not time to get disappointed yet.


It's also like AI last century in the lack of progress towards AGI, so there's been plenty of time to already have been disappointed - certainly nothing to get excited about.


I agree with both those claims completely. Absurd hype, discussed many times here on HN.


It's one of the fundamental critiques Chomsky already brought up decades ago with the ascent of statistical methods in linguistics.

They may be useful in the sense of generating commercial value, just like ML is useful to random-walking you to some solution that you couldn't have come up with, but it is not science.

There is little insight to be gained from this; it is more guesswork and art than anything else, and likely at some point the practical results will diminish as soon as one stumbles upon more fundamental problems and has no model or structure to reason with.

ML is essentially behaviourism on steroids.


My problem is that Chomsky seems ardently committed to the "botanical" approach to natural language as the One True Way (TM).

By this, I mean the assumption that there is some universal logical structure to human language, and that our job is simply to come up with the labels for each constituent part and arrange them all correctly like a jigsaw puzzle. Once the puzzle is complete - boom! We've "solved" language.

That would have been a reasonable starting point back in the 60s, but I don't think it's advanced the field nearly as far as statistical methods.

Chomsky would retort that "advancing the field" is meaningless unless it allows us to "understand the field". But if natural language is a largely arbitrary collection of rules in constant flux, there would be very little of this "understanding" to be had. You could be bashing your head against the wall for centuries trying to induce structure into a bunch of symbols where none was ever to be had.

Now I do believe that purely statistical methods will hit a wall - the same way you could never teach a baby to communicate by throwing it a copy of Wikipedia and nothing else.

But if some parts of language are essentially arbitrary, then statistical methods are the best available tool to uncover that.


Language is infinite but the generating processes (i.e. our brains or computers) are finite. The generators come in two flavors: rigid rule-based (as Chomsky suggests) or stochastic (the more modern view). The stochastic generators are obviously more powerful but difficult to "understand". Unfortunately, we associate the word "understand" with the ability to have a complete list of static rules. In my opinion, probabilistic rules are just as "understandable". Then there are dynamic probabilistic generators, which I think our brains aren't evolved enough to truly "understand".
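As a toy illustration of that distinction (entirely my own sketch, with made-up rules), the same grammar can be driven as a rigid rule system or as a stochastic generator:

    import random

    # A toy grammar used two ways: rigid (always the first production)
    # vs stochastic (sample among productions).
    rules = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "cat"], ["the", "dog"]],
        "VP": [["sleeps"], ["chases", "NP"]],
    }

    def generate(symbol, rng=None):
        if symbol not in rules:            # terminal word
            return [symbol]
        productions = rules[symbol]
        chosen = productions[0] if rng is None else rng.choice(productions)
        return [word for part in chosen for word in generate(part, rng)]

    print(" ".join(generate("S")))                    # rigid: always "the cat sleeps"
    print(" ".join(generate("S", random.Random(7))))  # stochastic: varies with the seed

The rule table is the same in both cases; what changes is whether the generating process is deterministic or probabilistic, which is roughly the contrast being drawn above.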


> Now I do believe that purely statistical methods will hit a wall - the same way you could never teach a baby to communicate by throwing it a copy of Wikipedia and nothing else.

Nick Chater and Paul Vitányi wrote papers[0] demonstrating that "the learner has sufficient data to learn successfully from positive evidence, if it favors the simplest encoding of the linguistic input":

- ‘Ideal learning’ of natural language: Positive results about learning from positive evidence,

- The probabilistic analysis of language acquisition: Theoretical, computational, and experimental analysis,

- Language Learning From Positive Evidence, Reconsidered: A Simplicity-Based Approach.

[0] https://homepages.cwi.nl/~paulv/learning.html


>> Chomsky would retort that "advancing the field" is meaningless unless it allows us to "understand the field". But if natural language is a largely arbitrary collection of rules in constant flux, there would be very little of this "understanding" to be had.

Imagine a historian of the future reading reams of code written for all the software we have in the modern era, in all the programming languages of the modern era, from Java to Brainfuck and back. Would the historian of the future be justified in concluding that there is nothing that connects these languages with one another, that there is no common structure, nothing that can unify them, "very little understanding to be had"? If so, the historian of the future would be wrong, because all modern programming languages are designed to run on the same substrate, modern digital computers. The existence of the digital computer is the unifying characteristic that explains the existence of any modern programming language.

In the same way, the variability of human language, internal or external, is without any shadow of a doubt explained by some processes in the human mind. These processes may be too complex to understand, perhaps, but understanding them is a worthy scientific goal. And there is, indeed, much "understanding" to be had in that direction.


I think Chomsky's opinions held back linguistics by decades, and we are only making progress because of the ascension of statistical methods.

Chomsky didn't just argue that the statistical method lacked insight, but that humans can't really learn language statistically. Given that our artificial neural networks are much worse at learning than humans are, and yet they can simulate language much better than expected, this refutes Chomsky's arguments.

Non-statistical approaches have basically zero successes, commercial or otherwise. ML at least gives us evidence of what aspects of language statistical models can explain and what they cannot, which is a hint for future research if linguists are just willing to take it.


> Chomsky didn't just argue that the statistical method lacked insight, but that humans can't really learn language statistically. Given that our artificial neural networks are much worse at learning than humans are, and yet they can simulate language much better than expected, this refutes Chomsky's arguments.

How does demonstrating what artificial neural networks can do refute the argument that humans can't learn something in the same way?

All it does is refute the idea that artificial neural networks can't do that.

I'm not commenting on how we learn or Chomsky's correctness or otherwise; just pointing out the unjustified leap from what AI tools can do to what actual intelligence does.


Chomsky argued that it was literally impossible for a statistical model to learn language, so to the extent it's possible, it refutes Chomsky's argument.

We won't know how humans learn language until we understand how humans learn language, but the fact that a crude general-purpose model like an ANN works as well as it does is certainly evidence that humans can learn language statistically. We don't know for sure, but we rarely do.


> but the fact that a crude general-purpose model like ANN works as well as it does is certainly evidence that humans can learn language statistically

Humans learn language intuitively (subconsciously), as well as formally (through academic training). Humans understand language (to a greater or lesser extent in any particular case).

Statistical NLP is not, in any human-translatable way, understanding. It is approximation and prediction, in a purely mathematical sense.

Perhaps one of us misunderstands, or is imprecisely using, the phrase "learning language statistically", but I am unaware of any evidence that humans do so.


Chomsky's linguistic theories are not about understanding. Hence his famous example of "colorless green ideas sleep furiously", which is a meaningless sentence that we recognize as grammatical.

Actually understanding language the way humans do would require AGI, I suspect.


> Chomsky argued that it was literally impossible for a statistical model to learn language, so to the extent it's possible, it refutes Chomsky's argument.

That's a different claim to the one I countered.

I agree with your last point with the caution that there is no reason yet to think the hyped current AI tools have great relation to the way the human mind works, solely on the basis of the lack of progress towards AGI.


>They may be useful in the sense of generating commercial value, just like ML is useful to random-walking you to some solution that you couldn't have come up with, but it is not science.

>There is little insight to be gained from this; it is more guesswork and art than anything else, and likely at some point the practical results will diminish as soon as one stumbles upon more fundamental problems and has no model or structure to reason with.

This is just normal for the scientific process, and I'd argue that ML currently is more rigorous than any of the soft sciences. The truth is that we likely need a new field of math to properly model neural networks...until then we are gleaning vast amounts of empirical but still scientific data as to the behavior of this new tech.

The proof is that these nets work and are already doing amazing things.


This seems to be more a broadcasting of the author's values than anything else; the part of it which is objective observation is IMHO obvious.

I think there is an appropriate generic response to critiques like this, which is that people are generally doing their best, and they may have different motivations for getting into the field than you do, and this diversity is a good thing. Probably every nascent science has its Faradays and Maxwells, who complement one another.


There was a similar discussion in another thread, someone pointed to the piekniewski.info blog and I found that article that I think is spot-on (though I don't like the blog overall because it seems to have a real bone to pick).

Scientific progress is marked by hypotheses that explain observations and generate predictions that can then be verified (or falsified) by new observations. Machine learning research has produced scant few such verifiable hypotheses in its very long history. To a great extent the same goes for the entire field of AI. I quote from John McCarthy's article on the Lighthill Report [1]:

Much work in AI has the "look ma, no hands" disease. Someone programs a computer to do something no computer has done before and writes a paper pointing out that the computer did it. The paper is not directed to the identification and study of intellectual mechanisms and often contains no coherent account of how the program works at all. As an example, consider that the SIGART Newsletter prints the scores of the games in the ACM Computer Chess Tournament just as though the programs were human players and their innards were inaccessible. We need to know why one program missed the right move in a position - what was it thinking about all that time? We also need an analysis of what class of positions the particular one belonged to and how a future program might recognize this class and play better.

McCarthy wrote that in 1974. The criticism is every bit as valid today as it was back then. Machine learning and AI research remains an endeavour that is rarely and only incidentally scientific. Despite 70 years of AI research and the recent amazement at the "progress" in machine learning (a "progress" only in the context of the field's own measures of progress) we have learned very, very little from AI research that we didn't know already. Big machines can compute big programs. So, what?

_________

[1] The quote is from McCarthy's review of the Lighthill report. The review was published in the journal Artificial Intelligence Vol. 5, No. 3, 1974. A pdf copy is here:

jmc.stanford.edu/artificial-intelligence/reviews/lighthill.html

McCarthy is best known to computer scientists as the father of Lisp, but he is also one of the founders of AI and the man who named it. The Lighthill Report was a report by James Lighthill on the progress of AI research, commissioned by the UK government and published in 1973. It was extremely negative and caused AI funding to freeze for years, thus bringing on the first "AI winter" and basically killing Good Old-Fashioned, logic-based AI.


Science isn't about understanding how something works. It is, as you say, about generating and testing hypotheses. In the case of machine learning, those hypotheses take the form of a program and an associated criteria. The hypothesis is verified by demonstrating the efficacy of the program. That a machine learning specialist is able to consistently produce effective programs demonstrates that they have a good understanding of the relationship between the program and its functioning. This is true even if they couldn't explain how the program "works" to the satisfaction of an observer.


>> In the case of machine learning, those hypotheses take the form of a program and an associated criteria. The hypothesis is verified by demonstrating the efficacy of the program.

This is the first time I hear anything like that. Can you say where this idea comes from? I've never seen it mentioned in any machine learning paper or textbook etc.


Recommend the paper "A Pendulum Swung Too Far"[0] as a nice take on Rationalism vs Empiricism. Interestingly, the paper was written in 2007, when the current crop of deep learning methods was not around (the provocative Hinton/Sutskever/Krizhevsky paper was in 2012) ... so the pendulum has swung even farther! To the point that, as the post points out, we don't even have good statistical justifications for network architecture design choices. We have some answers, for some choices, but mostly a compendium of techniques empirically validated to work very well.

[0] http://languagelog.ldc.upenn.edu/myl/ldc/swung-too-far.pdf


>What does that tell us? A few things, first the authors are completely ignoring the danger of multiple hypothesis testing and generally piss on any statistical foundations of their "research"

This is a really poor take, considering I make the same gut feel choices professionally and they generally work in production.

This is still a brand new field and we fundamentally don't have answers as to why many of these tweaks work better than others. That shouldn't stop someone from publishing a novel architecture or improvement.


I do not object to publishing a better result. Like the article said, serious craft goes into that and there's nothing wrong with typing it up and having it published. But it's not science; one day, one time, somehow, some way, the black boxes have got to open up to address the why. This is a very valid question and on-point criticism. Even in basic software development the why (e.g. requirements) should be known. It's not consequence-free to proceed otherwise.


>But it's not science

That's just not true. We are probing a novel domain.

In fact how else would you expect this to proceed? We've discovered a new phenomenon, which likely requires novel mathematics, yet through this exact kind of experimentation we are building the intuition that will guide more rigorous formalization later.

Sure, I get it, the quality on arxiv isn't the same as some physics journal; but to dismiss this as unscientific is not only wrong but very much unfair. I'm working at the cutting edge at work, and we're documenting our discoveries as we map the structure of a new frontier - if that isn't science, I don't know what is.

tldr this is how science progresses in new domains before novel mathematics and formalisms are developed to address the new class of problems. This is a really exciting time if neural nets don't hit any serious blocks.

Edit: and by the way, my coworkers are all graduate-educated scientists from various backgrounds. What else are they doing if not science?


Machine learning is not a brand new field. The name was first used in 1959 by Arthur Samuel. The first "artificial neuron", the McCulloch & Pitts neuron, was described in 1943. Rosenblatt described the Perceptron in 1958. Backpropagation was first proposed to train neural networks by Werbos in 1976 and then again in 1986 by Rumelhart, Hinton and Williams. Even "modern" deep learning architectures like LSTMs and CNNs are already more than 20 years old. LSTMs were first described by Hochreiter and Schmidhuber in 1997. The Neocognitron was described by Fukushima in 1980. And so on.

Machine learning did not start in 2012.


Yes, the foundations for ML are somewhat old, but the explosion of progress has put everyone on the edge of a fresh domain for which we are only now developing a theoretical framework.

At the very least you must concede that applied machine learning is effectively a new field. Yes I'm sure a handful of neuron-like components have probably been assembled here or there in the past to do something interesting, but no one was going to college trying to make it big with a career in ML.


>> Yes I'm sure a handful of neuron-like components have probably been assembled here or there in the past to do something interesting, but no one was going to college trying to make it big with a career in ML.

I'm at a loss as to how to respond to your comment. You seem to really believe that before the big success of CNNs in 2012 there were no neural networks to speak of. That could not be farther from the truth.

Wikipedia has a decent introduction to the history of neural networks:

https://en.wikipedia.org/wiki/History_of_artificial_neural_n...

I suggest you start from there, then follow the links etc.


You're underappreciating how much more we are doing with neural networks now. Image recognition, general purpose contextual searching, NLP, accelerated 3D modeling, abstract problem solving (applied derivatives of alphago and other RL), and we're on the cusp of image synthesis, music synthesis, news summarization, not to mention translation...

All of this powered by dozens of exotic architectures and hundreds of discovered tweaks and optimizations - we are finally digging into the iceberg of which research before the 2010s only scratched the surface. That's not to discount the visionary work of the past!

But if you plotted some general measure of progress in this field, you would see an enormous discontinuity in the derivative starting sometime in the 2010s with GPUs and the open source/open science initiatives.

For all the shit I give to Google and Facebook and FAANG in general, they absolutely brought immense good to society by democratizing practical general function approximators. If there aren't any major roadblocks, ML will touch every aspect of our near-future lives, much like the internet. There's just too much potential - the cutting edge is just hitting the industry, our tech is already proven, and it goes far deeper than classifying cats and dogs or digits. This is real science and it's huge, and the money has never even remotely matched the kind of progress we are making; not to mention the drastic difference between running ML on 1980s hardware vs a modern GPU. The kind of science we can now do from the comfort of our homes was impossible until very recently.

It's like coming out of the bronze age and starting to work with steel. We build the tools as we go along. This is part of the process and we're actually doing a great job.


>> You're underappreciating how much more we are doing with neural networks now.

From your comment above, that "a handful of neuron-like components have probably been assembled here or there" I understand that you do not have any background in AI or machine learning etc. I am curious then, where all this enthusiasm in the form of "we are doing X" statements comes from. In particular I'm interested in your use of "we". Clearly, you're not doing any of that stuff, so where does the "we" come in? Is it really prudent to express such strong views, without good understanding or personal experience of the subject matter? Are you adding anything to the conversation, by asserting all those things with such impetuousness, other than noise?

I imagine that your source for all this information is articles you've read in the lay press. Unfortunately, such articles can't very well represent the real state of research in deep learning. The truth is that there has been an enormous increase in the amount of work on deep learning being published every month - there are probably thousands of articles written in that period and uploaded to arxiv or even submitted to reputable venues - and even researchers in the field have trouble keeping up. What is abundantly clear however is that the vast majority of this work is very poor quality, and even the published work is not much better. It's clear also that the vast majority of this work has no lasting impact and is superseded within weeks anyway. The truth is that deep learning research is in a deep crisis and despite appearances and breathless announcements by large companies, progress has stalled and no new things are really being done. Many of the luminaries of the field, including Geoff Hinton and Yoshua Bengio, have said this in various ways.


Wait a second. I am missing something. The definition of science is:

sci·ence /ˈsīəns/ (noun): the intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment.

I might be biased (as I build such GPU rigs), but I don't see how trying different architecture configurations is not a scientific process.

I like the Sturgeon's Law reference from this thread.


I'd rephrase the critique to say that you're doing descriptive science. It's like a naturalist drawing pictures of birds. The most interesting work comes later, when someone suggests a theory to explain it all.


There's lots of theories! It's just that there's not really sufficient mathematics available to determine which ones are provably correct. (My favorite example of this is the seemingly never-ending arguments over the REAL reason that batch normalization works...)

And there are real advances happening because of advances on the theory side, as well: I would say that ResNet is an excellent example of this, bringing some insights from differential equations into model architecture, and greatly advancing the quality of classification.

All that said, I do worry that 'fundamental science' advances might get overlooked if they don't contribute to new high scores. For example, if you can prove that a certain model will 'work' (for some value of 'work'), the model may be hobbled by the need to prove things, and thus not competitive... In which case, the proof techniques might be lost in the howling void of the arxiv.
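For what it's worth, here is a minimal sketch (toy sizes and names, entirely my own) of the residual update and the Euler-step reading of it that the differential-equations connection refers to:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(0, 0.1, size=(4, 4))   # toy weights for the residual branch

    def f(h):
        """A toy residual branch: one linear map followed by a nonlinearity."""
        return np.tanh(W @ h)

    # Residual block: h_next = h + f(h), i.e. the skip connection.
    # Compare with an explicit Euler step of dh/dt = f(h): h_next = h + dt * f(h).
    # With dt = 1 the two coincide, which is the ODE reading of ResNet.
    h = rng.normal(size=4)
    for _ in range(3):        # three stacked residual blocks
        h = h + f(h)          # identity path keeps gradients from vanishing
    print(h)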


Was Edison's invention of the light bulb not science because Thomson hadn't discovered the electron yet? Should Edison have politely waited for a quantum description of charge before daring to build something with it?

The fact that we don't fully understand why neural networks are so effective does not imply it's not science.


> Was Edison's invention of the light bulb not science because Thomson hadn't discovered the electron yet?

Umm... yes, it wasn't? The invention of the light bulb was not a scientific discovery. No new physical laws were discovered in the process. You can invent something useful by brute-forcing through many trials and errors, or even by just plain luck, without creating any new scientific understanding along the way. (Btw, Wikipedia does not even regard Edison as a scientist.)


For many and perhaps most state-of-the-art models, the answer to "why?" is "because it works."

As a practitioner, I would say Deep Learning is a trade. The more you work with these models, the more you develop intuitions and habits for things that work and things that don't, like a sculptor who learns to craft beautiful or functional objects by chiseling stone. ("Why did you hit the marble that way?" "Because it works.")

If we want to be more generous, we can call Deep Learning an experimental science, because some researchers working with deep models are truly doing methodical, tedious, experimental work and documenting it for posterity. They and everyone else, including me, hope that we will eventually be able to answer the "whys." But there's no guarantee, a priori, that we will find satisfactory answers.

Quoting Rich Sutton: "the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done." (Source: http://incompleteideas.net/IncIdeas/BitterLesson.html)

Quoting Geoff Hinton: "One place where I do have technical expertise that’s relevant is [whether] regulators should insist that you can explain how your AI system works. I think that would be a complete disaster. People can’t explain how they work, for most of the things they do. When you hire somebody, the decision is based on all sorts of things you can quantify, and then all sorts of gut feelings. People have no idea how they do that. If you ask them to explain their decision, you are forcing them to make up a story. Neural nets have a similar problem. When you train a neural net, it will learn a billion numbers that represent the knowledge it has extracted from the training data. If you put in an image, out comes the right decision, say, whether this was a pedestrian or not. But if you ask “Why did it think that?” well if there were any simple rules for deciding whether an image contains a pedestrian or not, it would have been a solved problem ages ago." (Source: https://www.wired.com/story/googles-ai-guru-computers-think-...)


>> As a practitioner, I would say Deep Learning is a trade.

"Trades" don't have conferences and journals and researchers whose job it is to publish in them. No, deep learning is a field of research and it should be treated as such and evaluated and -if necessary- criticised accordingly.


Deep learning practitioners in industry practice a trade. These individuals should not be treated, evaluated, or criticized as if they were scientists. However, I do agree that people who claim to be doing scientific research should be judged as scientists :-)

> "Trades" don't have conferences and journals and researchers whose job it is to publish in them

Actually, there is a remarkably large number of associations, conferences, journals, and researchers for a remarkably large number of trades. I mean, you can find researchers in, say, the trucking trade figuring out whether using one type of wheel or another can reduce costs per mile driven by a cent or two in a particular class of trailer truck. The number of trade associations in the US alone is close to 100,000, covering nearly every aspect and level of skill in our economy: https://www.google.com/search?q=how+many+trade+associations+...


>> Deep learning practitioners in industry practice a trade.

That's a different statement than the one in the previous comment that I quoted, that "deep learning is a trade". I disagree with that statement specifically. Deep learning is not a trade. It's a subject of research in the field of AI, which remains a scientific field, despite the shoddy science that is typical in it.

That there are people who apply (or try to) the results of deep learning research in the industry is another matter. People in the industry apply the results of computer science research. That doesn't make computer science research "a trade" in the sense that you say it.


Q: Why did you build that super-car, Mr. Ferrari?

A: Because I can.

Q: Why did you build those tractors, Mr. Lamborghini?

A: Because people are hungry and need food.

Q: Mr. Lamborghini, then why did you build a super-car?

A: Because fuck Mr. Ferrari.


Sturgeon's law.



