Disclosure: I work on the open-source DL project Deeplearning4j: http://deeplearning4j.org/
However, as someone who builds them for vision applications I'm increasingly convinced that some form of ANN will underlie AGI - what he calls a universal algorithm.
If we assume that general intelligence comes from highly trained, highly connected single processors (neurons) with a massive and complex sensor system, then replicating that neuron is step one - which arguably is what we are building, albeit comparatively crudely, with ANNs.
If you compare at a high level how infants learn and how we train RNN/CNNs they are remarkably similar.
I think the author, and the ML crowd in general, focus too much on unsupervised learning as being pivotal for AGI.
In fact if you look again at biological models the bulk of animal learning is supervised training in the strict technical sense. Just look at feral children studies as proof of this.
Where the author detours too much is in assuming the academic world would prove a broader scope for ANNs if it were there. In fact, research priorities across the board are not focused on general intelligence, and most machine learning programs explicitly forbid this research for graduate students because it's not productive over the timeline of a program.
Bengio and others, I think, are on the right track, focusing on the question of ANNs as a path towards AGI, and I think it will start producing results as our training methods improve.
First, I'm curious where you thought I was focused on unsupervised learning? It certainly didn't cross my mind when I was writing this --- I was (implicitly) strictly talking about supervised machine learning.
My post is actually in support of what the latter part of your comment says, in a roundabout way. In general, the people that are making huge strides in deep learning (Bengio, Hinton, LeCun are obviously the big three) understand the capabilities and, maybe more importantly, the limitations of DL. My main point is that the ML community at large is actually not on the same page as the experts, and that causes many more problems.
I want us, as a community, to stop treating deep learning any differently from other ML algorithms --- have a consensus, based on scientific facts, about the possibilities and limitations thus far. If we, "the experts", don't understand these things about our own algorithms, how can we expect the rest of the world to understand them?
I agree. It's interesting watching the "debate" around deep learning. All the empirical results are available for free online, yet there's so much misinformation and confusion. If you're familiar with the work, you can fill in the blanks on where things are headed. For instance, in 2011, I think it became clear that RNNs were going to become a big thing, based on work from Ilya Sutskever and James Martens. Ilya was then doing his PhD and is now running OpenAI, doing research backed by a billion dollars.
The pace of change in deep learning is accelerating. It used to be fairly easy for me to stay current with new papers that were coming out; now I have a backlog. To a certain extent, it doesn't matter what other people think, much of the debate is just noise. I don't know what AGI is. If it's passing the Turing test, we're pretty close, 3 years max, maybe by the end of the year. Anything more than that is too metaphysical and prone to interpretation. But there have been a bunch of benchmark datasets/tasks established now. ImageNet was the first one that everyone heard about I think, but sets like COCO, 1B words, and others have come out since then and established benchmarks. Those benchmarks will keep improving, pursuing those improvements will lead to new discoveries re: "intelligence as computation", and something approximately based on "deep learning" will drive it for a while.
Well yes? If a Turing test you realize the simulation of some idiot in the online chat, it has long been there - and nobody wants. But the system, which can lead a meaningful conversation, today there is no trace. And there is even no harbingers of its occurrence.
<- This was translated by Google Translate from a piece of perfectly intelligible and grammatically correct text in another language. If this is the state of the art in machine translation, how on Earth can you expect a machine that can converse on human level in three years?
Although, your comment illustrates the main problem with the Turing Test. It depends on too many factors and assumptions that have nothing to do with the AI itself.
A good AGI test should be constructed in such a way that any normal person passes it with 100% certainty and no trivial system can pass it at all.
Here are 2 good papers--one from last year, one very recent--that might change your mind.
Over the past 10 years (deep learning got going in ~2006, really), the state of the art has improved at an exponential rate, and that's not slowing down. There are plenty of reasons to be bullish. Or at least, now seems like a bad time to leave the field.
The first one looks very impressive from the examples they've provided, but extraordinary claims require extraordinary proof. I will believe it only when I see an interactive demo. It's been nearly a year and I haven't seen it surfacing in a real product or usable prototype of any sort. Why?
Somehow all the papers that have "deep neural" stuff get 1/100th of the scrutiny applied to other AI research. I don't see anyone hyping up MIT's GENESIS system, for example.
The second paper has a really weird experiment setup. The point of one-shot learning is to be able to extract information from a limited number of examples. The authors, however, pretrain the network on a very large number of examples highly similar to the test set. Whether or not their algorithm is impressive depends on how well it is able to generalize, and they're not really testing generality -- at all. Again, why?
I suppose I should have stated more clearly that I was talking more broadly. However, the discussion about hand training being what holds the discipline back implies, I think, that unsupervised learning is the key.
Perhaps I misunderstood your writing, but I think there is a nuance there that isn't well discussed.
Feral children are incapable of learning a language as adults. But children don't require input (or feedback) from an existing language in order to "learn" a language; a community of children is capable of developing a full natural language without exposure to adult language. The problem the feral children encounter is not that no one teaches them to speak, it's that they're alone.
This is much as if you gave a neural network "supervised training" by letting it interact with several other untrained copies of itself, but no coded data. Does that meet your definition of "supervised training"?
(Obviously, a language created by an isolated group of children will be a new language rather than a known one. Children learning an existing language do need input, but there are some interesting quirks to the process -- they don't seem to need feedback, or use it if they get it, and in the natural course of things the input they get is 100% "correct" and 0% "incorrect". Try training a neural net to discriminate dogs from non-dogs when you can't code anything as "not a dog".)
I think as a second layer, yes.
To your original statement though, being alone is the same as no one teaching them to talk.
Adults, and other children, in fact teach infants to talk. It might not seem explicit (though there are very explicit programs for infant language development) but in fact it works the same way. See my other comment about object segmentation and classification.
No it's not. You can have one without the other. Being alone implies no one teaching them to talk. However, children who are not alone still learn to talk despite no one teaching them.
This is where we disagree. The distinction is that you probably see training/teaching as intentional - but it's not always. You can (and do) learn just by watching and listening without it being an explicit procedural training.
Simply being around others = being taught.
The same thing happens among the children of pidgin-speaking communities -- they receive input that does not belong to any language and, acting as a group, construct a full language from that. (This process is called creolization.) But the case of deaf children is especially clear because they don't even receive the malformed linguistic input that creolizing children do.
That doesn't sound correct. Infants engage in active inference to resolve their uncertainties, engage in transfer learning between different domains by default, and use their transfer-learning abilities to learn new categories or domains from even just one example.
A perfect example I use all the time which is much later in development but relevant, is when a 1.5 year old will point at something and ask if it's a [thing]. The initial training was people segmenting (holding, moving different ways) classifying (calling it [thing], asking for [thing], telling the subject it is [thing]), which was then reinforced/weighted by asking if [thing] was indeed [thing].
So it doesn't look like explicit training, but it follows all the same steps.
This assumption is wrong, of course.
Let me ask a very simple question. What set of hand-engineered features gives <5% error on ImageNet?
Google's image and video recognition? Deep Dream? That's all convolution.
Speech-to-text? That's convolution.
AlphaGo? That's convolution.
Convnets are a great advance in machine learning, don't get me wrong. I hope that soon we get a generalizable way to apply convolution layers to text or music.
There are a few other major techniques (other than convnets) that are important, like RNNs in general.
My concern about the uses of convnets on text that I've seen is that I don't think they can deal with little things like the word "not". (The Stanford movie review thing can definitely handle the word "not", but that's different.) I'm unconvinced as of yet that we're convolving over the right thing. But maybe the right thing is on the way, especially when Google gave us a pretty good parser. And maybe the right thing involves other things like RNNs, sure.
I guess image recognition could have similar cases, and the results just look more impressive to me because I work with text and not with images.
I think there are some papers out of the IBM Watson group on question answering where they use ConvNets. I don't remember looking at the negation case specifically, but Question Answering generally has cases where that is important.
Yet no one solved ImageNet with some convolutions and hand-engineered features. You can't get that performance even if you shift a set of hand-engineered features across an image in a convolutional fashion. No one solved it with any previous approach. (Are we to believe that CNNs are the very first method to ever try to exploit some spatial structure...?)
So deep networks do bring things to the table beyond hand-engineered features and you are simply wrong.
a. recent findings are documented incredibly well in both research and code
b. because of its success, there are many areas for useful contribution at relatively less effort from the researcher
c. because of its success, it'll help you develop marketable skills
d. it's fun
Maybe it won't solve General AI, but it seems like a damn good foundation for the person/people that will eventually come out with ideas that move us closer in that direction.
I don't see a huge competitive advantage in not releasing code. It's probably more beneficial to share ideas and collaborate. I imagine the reason it's not done more often is that it takes time and effort to put this stuff out there.
The drawback is that we are still hand tuning architectures, slowly inventing (or incorporating) things like LSTM and the like into the model.
One goal would be to achieve a universal building block that can be stacked/repeated without the need for architectural tuning.
Maybe something that combines recurrence, one-shot learning, deep learning, and something stolen from AI (like alpha-beta, graph search, or something self-referencing and stochastic with secondary neural networks) into a single "node". Then we won't have to worry about architecture so much.
Artificial intelligence, if it can even be defined, does not seem like a particularly valuable goal. Why is emulating human cognition the metric by which we assess the utility of machine learning systems?
I'd take a bunch of systems with superhuman abilities in specialized fields (driving, Go, etc.) over so-called "artificial intelligence" any day.
Oh, I know that. In fact, I think there are some very useful and very hard tasks which even AGI (at least a basic one) would not be able to solve.
My point is that we should focus on applications, not person-ness. Whether a technique is similar to human cognition or not, or the result is similar to a person, is immaterial.
Put simply, I don't think it matters much if all our super-powerful machines are ultimately Blockheads.
> Why is emulating human cognition the metric by which we assess the utility of machine learning systems?
It's not. It's only of interest to people and researchers who are primarily interested in AGI.
I don't think one necessarily follows from the other though. We might be able to make an AGI with the intelligence of a human teenager. That doesn't necessarily mean it would be better at driving a truck than a more specialized algorithm.
> That would probably mean testing machine learning systems with something like Turing tests which no one is doing or advocating.
I see a lot of people complaining that various machine learning techniques won't lead to AGI and would never create a system which could pass the Turing test. Even this article laments that neural nets don't actually resemble human cognition.
I agree that an AGI wouldn't necessarily be good at driving trucks. I meant it in the sense that an AGI would be capable of producing the "truck driving algorithm" (if we humans can do it, the AGI can do it too, almost by definition).
> I see a lot of people complaining that various machine learning techniques won't lead to AGI and would never create a system which could pass the Turing test.
Most of the complaining seems geared towards media and outsiders who portray X technique as being the "Solution to AGI", not towards the techniques themselves.
One way to think of AI is what you say, highly specialized systems that have become very good at a specific task. This is very useful. It is also cool because if you can define your problem well and come up with a fitness function, then the algorithm solves your problem for you.
Another way to think about AI, and this is a much more general type, is that it could possibly learn new things and create new theories by performing its own experiments, much like a human scientist does.
I agree. A "scientist" algorithm would be quite cool and helpful.
Where I disagree is whether that scientist algorithm should necessarily have intelligence that in any way resembles humanity. For all I care, it doesn't even need to be able to talk to be useful.
A machine must also labor under these constraints, and so ultimately must produce knowledge in the form of language/semantics in order to be doing "science".
Might as well say that German mathematics isn't real mathematics, because it's all incomprehensible gobbledegook to someone who doesn't speak German.
I suspect the real issue will be difference in mental skills... there might not be a human who can understand what the AI understands, even if it is very good at explaining.
An alien system could equally do "science" even if their entire communication system was binary bits.
A purely machine science is anathema; if the machine does science, it must participate in the community of human scientists to validate its conclusions, or else its 'knowledge', however encoded, is as obscure and terrifying as the path of an impending asteroid.
Really, we cannot draw a clear line between a learning algorithm and intelligence. Any algorithm that can incorporate information from the world it is exposed to into itself and use it to predict that world is "intelligent" in a sense, and all machine learning algorithms do that.
Not really. Some symbolic/inductive systems do that. Traditional deep neural nets have a single model that is being adjusted to fit the distribution of input samples. That is not the same as creating a theory, let alone many theories.
> Reinforcement learning systems do explore their environment and perform experiments
Since when does trying something 1000 times with incremental changes qualify as experiments and exploration?
Sure, we can't guarantee that the ways we have of training this model will be able to learn these function approximations. But the same is true of my brain.
So why would multiple models be better? Any set of multiple models can be aggregated and called one model, anyway.
Being adjusted to fit the distribution of input samples sounds unimpressive, until you remember it can in principle do it for all distributions. And it learns to generalize from the input samples to unseen samples. Making a model to generalize from what we've seen to what we haven't yet seen is exactly the same as creating a theory, as I see it.
Since when doesn't it? If it's 100 times, is it experimentation and exploration then in your eyes? What about 10 times? Sounds like an arbitrary distinction to me.
If your definition of "experiment" includes stuff like practicing a baseball pitch 1000 times, it is too generic to be meaningful. Experimentation implies purposeful gathering of knowledge by trying different things.
I don't know what you mean by the "system's output" when talking about theories, but it seems to me that even by your definition, the weights of the ANN can be understood as a theory.
Having several theories is the same as having one theory, "one of these theories is more correct", or possibly "some combination of these theories is correct". If you define it as one theory instead of several, you can still improve it. (As I recall, some RL systems have multiple "heads", in a way corresponding to multiple theories).
I think the "universal algorithm" in the article refers to some kind of emergent intelligence. Well, nothing that he mentions precludes it. Our brains aren't magical machines. Neural nets may not model real neurons, yet it is amazing how they can produce results that we identify as similar to the way we think. There is nothing in computational neuroscience that comes close to this. If anything, the success of deep nets bolsters my belief in connectionism rather than the opposite. I would expect it is very difficult to formulate "intelligence" mathematically, and to prove that DNs can or cannot produce it.
SVMs, AdaBoost & co, random forests - just to name a few popular ones.
>Neural nets are universal function approximators. There isnt something they can't learn.
Which types of functions you can approximate has nothing to do with being able to "learn everything". Simple neural nets are a great illustration of this point. Networks that can theoretically approximate the target function can easily overfit or oscillate if not set up properly.
In other words, the ability to approximate is the property of your model, while the ability to learn things is the property of your training algorithm.
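A minimal sketch of that distinction, assuming a toy one-parameter model (the `train` function, the data, and both learning rates are made up for illustration). The model `y = w * x` can represent the target `y = 2 * x` exactly, yet whether gradient descent finds `w = 2` depends entirely on how the training is set up:

```python
def train(lr, steps=50):
    """Fit y = w * x to the target y = 2 * x by gradient descent on squared error."""
    xs = [1.0, 2.0, 3.0]
    ys = [2.0 * x for x in xs]
    w = 0.0
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

w_good = train(lr=0.05)  # small step: settles on w = 2
w_bad = train(lr=0.3)    # step too large: overshoots further on every update
print(w_good, w_bad)
```

Same model, same representational power in both runs; only the training procedure differs.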
Are humans universal approximators? Absolutely not. The mere proof that a NN with an infinite number of hidden nodes (which isn't practical, which could end the discussion there) can find a statistical mapping between any inputs and outputs is hardly a benefit. There is something so much more to intelligence --- the ability to generalize well (aka not overfitting) is first and foremost.
I'd push you to ask why you perceive that stacking RBMs produces "eerily human-like" results. I'd argue that they don't: http://blog.keras.io/how-convolutional-neural-networks-see-t...
I do see many results in NLP, for example, that have an eerie human-likeness, even if nobody can explain why. I am not an expert in the field, but I would think it's a challenge to find similar results made using another technique.
On the face of it, I would have to disagree. It would seem to me that's exactly what the universe selected for, among many other things of course, in our tiny little corner of it.
Otherwise, we wouldn't be here debating it.
How about all the other universes that never generated intelligence?
But going back to our own universe, how widespread is intelligence? It's pretty much microscopic, isn't it? I think our universe seems to be mostly geared toward generating empty space, and the rare areas that are not empty are certainly extremely hostile to intelligence.
Our universe is massively hostile to both life and intelligence.
I'm oversimplifying here, but what linear regression and logistic regression did to kNN is that you can automate the "distance" function, but you still have to manually construct the features. What DL did is one step further -- don't even bother feature-engineering, the network can construct the features themselves.
You see, there isn't a function that kNN can't approximate. If you have an impressive feature list and a training datum that has every feature exactly the same as your input, there is no reason not to directly use the output of said datum. It's the feasibility that matters.
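A rough sketch of that lookup idea, assuming a 1-nearest-neighbour rule and a hand-picked distance function (`nearest_neighbor_predict`, `squared_euclidean`, and the toy data are illustrative names, not from any library):

```python
def nearest_neighbor_predict(train, query, distance):
    """Return the label of the training point closest to `query`."""
    _, best_label = min(train, key=lambda pair: distance(pair[0], query))
    return best_label

def squared_euclidean(a, b):
    # The hand-written "distance" that kNN needs you to supply.
    return sum((x - y) ** 2 for x, y in zip(a, b))

train = [((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((5.0, 5.0), "c")]

# A query that matches a stored datum exactly just returns that datum's output.
exact = nearest_neighbor_predict(train, (1.0, 1.0), squared_euclidean)
# Anything else falls back to whichever stored datum is closest.
near = nearest_neighbor_predict(train, (4.0, 4.5), squared_euclidean)
print(exact, near)
```

The catch is the one named above: it only works as well as the features and the stored examples allow, which is a feasibility question rather than a representational one.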
Of course, DL is a huge step forward. It has made "impossible things possible, and hard things easy". The author also acknowledged DL's importance. However, that doesn't mean we should stop at DL.
'important' sure but not always a good idea. When you are learning from finite (but potentially very large) number of noisy examples, the universal function approximator will try to approximate the noise also.
Learning is a delicate dance between the capacity of your learner and the complexity of the thing that you are trying to learn. The key notion is that it takes two to tango. If you have few examples, you are better off not using deep nets.
Stated another way, a universal approximator has infinite potential for distraction. "Hey shiny" and it could veer off in a direction you don't want it to go. You want to pick out army tanks from pictures and it might learn to distinguish pictures taken on a cloudy day from those taken on sunny days.
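To make the "approximates the noise also" point concrete, here is a small sketch under stated assumptions: the true signal is `y = x`, the noise is a deterministic +/-0.3 wiggle (chosen so the result is reproducible), and a degree-9 interpolating polynomial stands in for the high-capacity learner. All function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_interpolant(xs, ys):
    """Polynomial through every training point (Lagrange form): zero training error."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x = [i / 9 for i in range(10)]
train_y = [x + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(train_x)]
# Held-out points: midpoints between training inputs, taken from the clean signal.
test_x = [(a + b) / 2 for a, b in zip(train_x, train_x[1:])]
test_y = list(test_x)

line = fit_line(train_x, train_y)
wiggly = fit_interpolant(train_x, train_y)
print("train:", mse(line, train_x, train_y), mse(wiggly, train_x, train_y))
print("test: ", mse(line, test_x, test_y), mse(wiggly, test_x, test_y))
```

The flexible model wins on the training points and loses badly on the held-out ones, because it spent its capacity reproducing the wiggle.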
A Fourier series can model pretty much any function. Doesn't mean it's a good model for your problem.
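As a quick illustration (a sketch only, using the textbook sine series for a square wave; `square_wave_partial_sum` is a made-up helper name), a Fourier series does converge on a simple step, but it needs many terms to describe something a single `if` would capture:

```python
import math

def square_wave_partial_sum(x, terms):
    """Sum of the first `terms` odd harmonics of the square wave sign(sin(x))."""
    return (4.0 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(terms)
    )

x = math.pi / 2  # the true square wave equals 1 here
error_1 = abs(square_wave_partial_sum(x, 1) - 1.0)
error_100 = abs(square_wave_partial_sum(x, 100) - 1.0)
print(error_1, error_100)
```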
Teach one to sort an arbitrary list. A universal function approximator is not a Turing machine.
 - http://lipas.uwasa.fi/stes/step96/step96/hyotyniemi1/
It's proved for a subset of functions. Maybe you can prove it for more.
The problem is, would deep learning scale efficiently enough that it is feasible to learn the "universal algorithm" with DL?
This is way too sloppy. There isn't something neural nets can't be made to represent - but possibility of representation isn't the same as learning. Learning is not a property of neural nets, it is a property of parameter-adjusting algorithms like backpropagation which don't exactly match up with the representational universality of neural nets.
"Neural nets" have nothing to do with brains or "the way we think." This is marketing crapola.
Neural nets have a little bit of something "to do with brains". They were inspired by connectionist ideas. I am all for avoiding marketing speak, but I don't find extremes useful.
I'm assuming the old feed-forward error back-propagation neural networks, before they got "deep". They had three layers (input, hidden and output) and were trained using straightforward error back-propagation (the derivative of the error wrt the weights).
This was a relatively popular technique in--I think?--the 90s (this is a guess, I started my master's in ML in 2003 and at that moment neural networks were seen as a thing of the past). So at some point these traditional neural networks hit a wall. We wanted to use more layers but the error-backprop wasn't quite up to it. And as it later turned out, computing power and size of datasets were also lacking, but IIRC we didn't quite realize those aspects back then and mostly saw it as a limitation of the Neural Network training algorithm.
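For readers who never met those networks, here is a minimal sketch of one, assuming sigmoid units, a squared-error loss, and a bias folded in as the last weight of each row. The weights, input, and learning rate are arbitrary illustrative values, not taken from any particular paper or library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w_hidden, w_out, x):
    """Input -> hidden (sigmoid) -> output (sigmoid); last weight in each row is a bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output

def loss(w_hidden, w_out, x, target):
    _, output = forward(w_hidden, w_out, x)
    return 0.5 * (output - target) ** 2

def backprop(w_hidden, w_out, x, target):
    """Gradient of the squared error with respect to every weight."""
    hidden, output = forward(w_hidden, w_out, x)
    # Error signal at the output unit.
    delta_out = (output - target) * output * (1.0 - output)
    grad_out = [delta_out * v for v in hidden + [1.0]]
    # Push the error signal back through the output weights to the hidden units.
    delta_hidden = [delta_out * w_out[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
    grad_hidden = [[d * v for v in x + [1.0]] for d in delta_hidden]
    return grad_hidden, grad_out

w_hidden = [[0.5, -0.4, 0.1], [-0.3, 0.8, -0.2]]
w_out = [0.7, -0.6, 0.05]
x, target = [1.0, 0.0], 1.0

before = loss(w_hidden, w_out, x, target)
grad_hidden, grad_out = backprop(w_hidden, w_out, x, target)
lr = 0.5  # one plain gradient-descent step
new_hidden = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(w_hidden, grad_hidden)]
new_out = [w - lr * g for w, g in zip(w_out, grad_out)]
after = loss(new_hidden, new_out, x, target)
print(before, after)
```

With more layers the same chain rule applies, but each extra sigmoid multiplies the error signal by another factor of at most 0.25, which is one commonly cited reason deeper versions were hard to train this way.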
True, but that's kind of like saying Rule 110 is capable of universal computation — possible in theory, but not really useful in practice.
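For anyone who hasn't seen it, Rule 110 is a one-dimensional cellular automaton whose whole "program" is an eight-entry lookup table. A small sketch, assuming cells beyond the edges are treated as 0 (that boundary choice and the `step` helper are mine):

```python
# New cell value for each (left, self, right) neighbourhood; 01101110 in binary is 110.
RULE_110 = {
    (1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0,
}

def step(cells):
    """Apply Rule 110 once, treating cells beyond the edges as 0."""
    padded = [0] + cells + [0]
    return [RULE_110[tuple(padded[i:i + 3])] for i in range(len(cells))]

row = [0, 0, 0, 0, 0, 0, 0, 1]
history = [row]
for _ in range(3):
    row = step(row)
    history.append(row)

for r in history:
    print("".join("#" if c else "." for c in r))
```

Universal in principle, but nobody would choose to write a sorting routine as an initial row of cells.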
The logic is written by humans. The main mechanism by which computers / robots begin to outperform people at, e.g., playing chess, Go or driving, is copying what works.
Humans outperformed animals because they were able to try stuff, recognize what works and transmit that abstract information using language.
The main advantage of computers is being able to quickly and easily copy bits and check for errors. You can have perfect copies now, preserving things that before could only be copied imperfectly.
And now you copy algorithms that work. The selection process might need work but the actual logic is still written by some human somewhere. It's almost never written by a computer. Almost all the code is actually either written by a human or at most generated by an algorithm written by a human, which takes as input code written by another human.
What's the "smarter" thing is the system of humans banging away at a platform, all making little contributions, and the selection process for what goes into the next version. That's what's smarter than a single human. That and the ability to collect and process tons of data.
All the current AI does is throw a lot of machines at a problem, and stores the result in giant databases as precomputed input for later. That's what most big data is today. Whoever has the training sets and the results is now hoarding it for a competitive advantage.
But really, the thing that makes the whole system smart is that so many humans can make their own diff/patch/"pull request". Anyone can write a test and submit a bug that something doesn't work. That openness is what made science and open source successful.
Open source has served the long tail better, too. Microsoft builds software that runs on some hardware. Linux has been forked to run on toasters. Open source drug platforms would have helped solve malaria, zika and other diseases faster.
If we had patentleft in drugs, we'd outpace bacterial resistance. Instead we have the profit motive, which stagnated development of new drugs.
Yes they can run unsupervised, and yes they run for multiple iterations. But they are written by humans for extracting features from data, not much different than, say, PageRank 20 years ago iterating to approximate the "popularity" of a page on the web.
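The PageRank comparison can be sketched in a few lines. This is a simplified power iteration, assuming the common damping factor of 0.85 and a toy three-page graph in which every page has at least one outgoing link (the `pagerank` function and the `links` data are illustrative, not Google's implementation):

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # Assumes no dangling pages: every page links somewhere.
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(ranks)
```

The loop runs "unsupervised" for many iterations, yet every line of the procedure was decided by a person in advance.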
This AI doesn't see a problem and come up with an algorithm to solve it. It uses algorithms written by humans to accomplish goals set by humans. Nearly all AI today is still simply a preprogrammed machine.
What I am saying, though, is that humans + a system becomes smarter than a single human over time because we find ways to express "what works" in such a manner that a massively parallel computing platform can then find a better answer, faster and more consistently. That's it. This is the algorithm written by humans and collectively refined by humans banging away at a platform.
And why open source would be better than the profit motive when it comes to drugs.
Anyways, computers definitely outdo people by orders of magnitude at computation, and it's only recently that they've been able to beat us at pattern recognition. Apply some inductive reasoning and see where that goes.
Besides, humans aren't exactly the perfect beings. Just like we can build cars heavier and faster than us, it's no problem building machines that are beginning to take over what we thought were uniquely human things.
Of course, humans wrote the framework, but nothing about how to play the game. Granted, the goals are set by humans.
But at the core the approach we use is also really quite profoundly dumb (though I understand it’s easy to make such claims in retrospect). Anyway, I’d like to walk you through Policy Gradients (PG), our favorite default choice for attacking RL problems at the moment.
What the human is doing is identifying the problem, writing a solution, and simply letting the computer work out the parameters through various statistical methods. But the resulting algorithm itself is pretty much in the narrow class of algorithms that were already described by the human. The computer just executed a dumb and straightforward search through a space of parameters, which itself was a simple preprogrammed algorithm.
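A stripped-down sketch of what that "search through a space of parameters" looks like, assuming a two-action bandit with a softmax policy. To keep it deterministic this uses the exact gradient of expected reward rather than sampled episodes, so it is a simplification of Policy Gradients as usually run; the payoffs and names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(logits, rewards, lr):
    """One exact gradient-ascent step on expected reward J = sum_i pi_i * r_i."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    # For a softmax policy, dJ/dlogit_i = pi_i * (r_i - J).
    return [v + lr * p * (r - expected) for v, p, r in zip(logits, probs, rewards)]

rewards = [0.2, 1.0]  # hypothetical payoffs of the two actions
logits = [0.0, 0.0]   # start from a uniform policy
for _ in range(200):
    logits = policy_gradient_step(logits, rewards, lr=1.0)
probs = softmax(logits)
print(probs)
```

The human chose the policy form, the reward, and the update rule; the loop only nudges two numbers until the better action dominates.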
Look, most of our science is also pretty much parametrized models, often with smoothness assumptions for calculus. Now with Deep Learning we may indeed find more interesting parametrized models. But that is a far cry from understanding abstract logical concepts and manipulating them to come up with entire algorithms from scratch to solve problems.
Do we have any good reason to suppose that a parametrized model isn't enough for everything we'd want, including a system that has human-level or higher intelligence and creativity? (assuming adequate structure that we don't yet know, that allows for a sufficiently large solution space)
We have good reasons to assume that we ourselves, the sum of a particular person's memories, skills, identity and intelligence, are contained within a particular set of parameters encoded by different biochemical means in our brains, and the process of how we learn skills, facts and habits is literally a search through that space of parameters.
"far cry from understanding abstract logical concepts" is more related to the types of problems we're tackling - symbolic manipulation and reasoning is a valid but very distinct field of AI, but it's not particularly useful for these problems any more than it's useful to having a human programmer craft explicit algorithms for computer vision.
You would expect a computer system "to come up with entire algorithms from scratch to solve problems" if you were making a computer system to solve the general problem of machine learning, i.e., a program to replace the research scientist making learning systems, not the human who currently solves the particular problem. We aren't trying to do that, it is a bit of a different direction, isn't it?
Actually, I have better reasons to assume that a parametrized model would poorly describe a human brain. We grow organically out of cells replicating in an environment for which they have been adapted. These cells make trillions of neural connections. Each cell has its own DNA etc.
We have already tried understanding just the DNA using straightforward parametric models, and they are too organic to be described that way.
It is far more likely that human brains are specifically adapted to the world they live in, and can operate with abstract concepts which are encoded in fuzzy ways (like a progressive JPEG, for example) that allow us to apply concepts to situations and search for concepts that fit situations. The concepts themselves are the hard part. It's not really a parametric model. Each concept represents experience that is stored between neurons.
Yes, we can teach these concepts to a computer eventually but we would have to figure out a language to express this info and data structures to store it. We'd still be designing the computer to mimic what we think we do. Ultimately for the computer to truly replicate what humans do it might need to simulate a gut brain, neurotransmitters etc. And even then it would be only a simulation.
I think computer intelligence is just of a very different sort than human intelligence. Less organic, far less ability to come up with new concepts or reprogram itself to "understand" concepts. It is fed parametrized models and does a brute force search or iterative statistical approximations, and then saves the precomputed results, that's all. That's why humans can recognize a cat with a brain that fits inside your head and consumes low energy, and computers need a huge data center which consumes a lot of energy.
We aren't replicating human intelligence. We are building huge number crunchers, and the algorithms are still written by humans.
Even our languages are so tied up in organic experience acquired over the years (references to current events, puns, emotions, fear of some animals vs dominance over others, inside jokes of each community etc) that language recognition is currently quite dumb and has trouble with context. Once again we solve this by dumbing down the human input, making people talk to computers differently than they would if the computer "understood" what they say the way a human with similar experience would.
When computers write algorithms to solve arbitrary problems the way groups of humans do, then I'd admit we made a huge leap forward. As it is, AlphaGo and self-driving cars are the result primarily of human work and refinement of the algorithms. The result just looks amazingly smart because computers crunch numbers fast and consistently, and replicate what works across all the instances.
Computer AI does raise philosophical questions of identity and uniqueness, but current systems are not capable of true abstract thought.
The closest system I know is Cyc: http://www.wired.com/2016/03/doug-lenat-artificial-intellige...
And once again the rules "we all know" were fed to it by humans through a language and data structures and code devised by humans and now we will judge whether it does well and replicate the result to millions of machines. We are still doing nearly all the actual design.
"Deep learning" is a red herring. They're just doing exactly what the shallow learning pioneers told them to do twenty years ago, just with more computing power.
None of it involves the word "deep", it's all just bog standard 1980s-style nets with tweaks.
"Do Deep Nets Really Need to be Deep?"
While I (emotionally) agree, it will be interesting to see if the complexity (and non-linearity) of these algorithms permit 'emergent' behavior to appear.
There is a wonderful free course about this kind of stuff:
Highly recommend it.
If those analogies hold, then intelligence is still another concept expressed by some configurations of matter. Who knows how much "latent intelligence" is available to be released, but I guess the assumption is that it's much greater than what's already manifesting in our brains.
The third step is usually finding the optimum of the function. Deep neural networks help you move complexity from step 2 to step 3. One example you mentioned is feature engineering, which moves from step 2 to step 3. So you can use simpler methods in step 2 to solve the same problems, or extend the range of problems you can solve with the same complexity in step 2.
I've been working my way through http://neuralnetworksanddeeplearning.com/ (with a big detour back into maths thanks to the Khan Academy) and have done a few ML courses, but they mainly cover a couple of algorithms, not all the ones available in Spark's MLlib or TensorFlow, for example.
Why people pull in artificial intelligence is both naively optimistic and quite understandable. Modelling something of a neural system is so close to how a biological brain works that the parallel is blatantly obvious. On the other hand, the current deep networks do not translate to intelligence; not at all. Machine learning might be, in part, something we could describe as "intelligent", as it's able to connect dots that are very difficult to connect with traditional algorithms, but it absolutely is not intelligence. Then again, we do hang out in the same neighbourhood. If we ever create an artificial intelligence in software, I'm quite certain it will be very much based on some sort of massively deep and parallel network of dynamic connections.
I'm not that interested in artificial intelligence myself. I would be interested in artificial creativity and emotional senses, but to model those there are bigger metaphysical questions to be answered first.
While the myriad nuances of the entire human body are indeed significant roadblocks to drug development, we have a long way to go before those concerns represent the primary bottleneck to progress. If a simulation of a protein's local environment were to reach chemical accuracy (via either some algorithmic breakthrough in quantum chemistry or the development of scalable quantum computers), that would be a huge boon to drug development.
I'm curious what HN users think the "universal algorithm" will end up looking like?
My own guess (wild speculation) is that we'll start moving in the direction of concepts like tensor networks. While that term sounds like it has something to do with machine learning, it actually falls under the domain of theoretical physics. Tensor networks are a relatively recent development in quantum mechanics that show promise because of their ability to extract the "interesting" information from a quantum state. Generally speaking, it's very difficult to compute/describe/compress a quantum state because it "lives" in an exponentially large Hilbert space. Traditionally, the field of quantum chemistry has built this space up using Gaussian basis functions, and the field of solid state physics has built it up using plane waves. The problem is that regardless of the basis set chosen, it appears as though exponentially more basis vectors are required to accurately describe a quantum state as the system becomes larger.
Tensor networks are an attempt to alleviate this problem. While it is true that the state space of an arbitrary quantum system is exponentially large in the number of particles, it turns out that for realistic quantum systems, the relevant state space is actually much smaller — i.e., real systems seem to live in a tiny corner of Hilbert space. And this tiny subspace even includes all of the possible states that one could put a collection of qubits into within the lifetime of the universe.
The projection of a system's state vector into either the position or momentum basis is known as the system's "wavefunction" (some texts allow more than these two bases). Since the wavefunction exhibits the highly desirable property of being localized in position/momentum space, this allows one to build up a good approximation to the state using Gaussians or plane waves — that is, unless the wavefunction exhibits strong electron correlation (quantum entanglement). Quantum entanglement is the exception to nature's tendency to localize state space about a point in spacetime, and thus it is frequently the case that the most commonly used basis sets are highly suboptimal for many real electronic systems (superconductors stand out as a notable and somewhat pathological example).
I'm not entirely familiar with all of the math behind it, but tensor networks essentially describe the small but relevant region of Hilbert space by exploiting properties of the renormalization group. In this sense, a compact way of describing "real world" quantum states is developed. I think this has applications to a "universal algorithm", because real world data rarely consists of a random or uniform scattering of information across the data's state space. In my own research, I've found that a lot of the NP-hard problems I run into are efficiently solvable in practice (stuff involving low rank PSD matrices) precisely because the data isn't random. If tensor networks are good at finding a basis set that is "local" in abstract Hilbert space with regard to some real-world set of quantum states, then it seems as though they would work equally well for a lot of the real world data that lives on a low-dimensional manifold in a high-dimensional space — the kind of data that machine learning (and eventually artificial general intelligence) seeks to tackle.
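To make the compression idea above a bit more concrete, here is a minimal sketch of a matrix product state (MPS), one of the simplest tensor networks. The choice of the n-qubit GHZ state as the example, and the helper names `ghz_mps` and `amplitude`, are my own assumptions for illustration and are not taken from any particular library. The point is only that a state whose full vector has 2^n amplitudes can sometimes be stored as n small matrices per site.

```python
import math
from itertools import product

def ghz_mps(n):
    """Hypothetical helper: MPS for the n-qubit GHZ state (|00..0> + |11..1>)/sqrt(2).

    Each site holds two 2x2 matrices, one per value of that site's bit.
    """
    a0 = [[1.0, 0.0], [0.0, 0.0]]  # matrix used when the bit is 0
    a1 = [[0.0, 0.0], [0.0, 1.0]]  # matrix used when the bit is 1
    return [(a0, a1) for _ in range(n)]

def amplitude(mps, bits):
    """Contract the chain for one bitstring: left boundary, one matrix per site, right boundary."""
    vec = [1.0, 1.0]  # left boundary vector
    for site, b in zip(mps, bits):
        m = site[b]
        # Row vector times 2x2 matrix.
        vec = [vec[0] * m[0][0] + vec[1] * m[1][0],
               vec[0] * m[0][1] + vec[1] * m[1][1]]
    # Close with the right boundary vector [1, 1] and normalise.
    return (vec[0] + vec[1]) / math.sqrt(2)

n = 20
mps = ghz_mps(n)
stored = n * 2 * 2 * 2  # numbers held by the MPS: 160 for n = 20
full = 2 ** n           # amplitudes in the full state vector: 1048576 for n = 20

print(amplitude(mps, [0] * n))            # amplitude of |00...0>
print(amplitude(mps, [1] * n))            # amplitude of |11...1>
print(amplitude(mps, [0, 1] + [0] * 18))  # any mixed bitstring has amplitude 0
print(stored, full)
```

The GHZ state is an easy case (bond dimension 2); the claim in the comment above is the stronger one that many physically realistic states also admit small bond dimensions, which this sketch does not demonstrate.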
I doubt that a general algorithm exists (why should it?).
But well, if we are talking about human-level (or superhuman-level) AI, it is good to remember that WE are deep, recurrent neural networks (with a very different implementation, and spikes instead of floats, but still). If it works in vivo, why shouldn't its abstracted version work in silico?
While I get what he is saying here, and more or less agree, I think it is not to be taken lightly that there is a significant difference in this discussion now as compared to 30 years ago. The difference is not how neural networks work, which clearly differs but is related in some ways to the brain, but rather what neural networks see.
What is really significant when you can handle lots and lots of data, and throw it all at a giant neural network, is what we see happening in the network. The observation that the hidden-layer filters developed as an optimal feature for classifying images appear to be Gabor-like directional filters (I'm referring of course to this type of thing) is not random, and not an insignificant result. It really does relate to perception, in the sense that 1) we know that the brain has directional filters in the visual cortex and 2) more importantly, from signal processing theory we know that such filters are "optimal" from a certain mathematical point of view, and if they develop naturally as the best way to interpret "natural" images (or other natural data, such as audio), it shows that development of such filters in the brain is perhaps also quite likely. There is quite some research in neuroscience at the moment looking for evidence of such optimal filters in early neural pathways.
So yes, neural networks are not models of "how the brain works", but the newly established ability to process huge amounts of data, and to examine what kind of learning happens in order to optimise this processing, can tell us a lot about the brain -- not how it works, but what it must do. Complemented with work in neuroscience, the idea of modeling information processing is not unrelated and can really lead to some significant contributions in our understanding of perception... and perhaps, eventually, cognition -- but who knows.
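For readers who haven't seen one, a Gabor filter is just a sinusoid multiplied by a Gaussian envelope, rotated to some orientation. The sketch below builds one by hand and checks that it responds more strongly to stripes matching its orientation than to stripes at right angles. The function names (`gabor_kernel`, `response`, `stripes`) and all parameter values are assumptions picked for illustration; this is the textbook formula, not the filters any particular network learned.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Build a size x size Gabor kernel (size should be odd) as nested lists."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def response(kernel, patch):
    """Sum of elementwise products: the filter's output at a single image location."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))

def stripes(size, wavelength, along_x):
    """Synthetic patch: a cosine pattern varying along x (or along y if along_x is False)."""
    half = size // 2
    coords = range(-half, half + 1)
    return [[math.cos(2 * math.pi * (x if along_x else y) / wavelength) for x in coords]
            for y in coords]

kernel = gabor_kernel(size=7, wavelength=4.0, theta=0.0, sigma=2.0)
matched = response(kernel, stripes(7, 4.0, along_x=True))     # same orientation as the filter
unmatched = response(kernel, stripes(7, 4.0, along_x=False))  # rotated by 90 degrees
print(matched, unmatched)  # matched is much larger in magnitude than unmatched
```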
The misunderstanding here is thinking that the be-all and end-all of neuroscience is studying how neurons fire and interact. Neuroscience is much more than that. Neuroscientists want to know how we experience and understand the world, and a big part of that is understanding what is required to process and interpret information, what is the information, what are its statistics, and what kind of neural processing would be required to extract it from our sensory inputs. Of course, this must be complemented by studies of how humans do react to stimuli, to try to verify that we do process information according to some model. But that model being verified -- that comes from what we know about information processing, and computer science can contribute there in a significant way.
I suspect your question was somewhat rhetorical, but since you asked.
The term has specific meaning within a specific field (not clearly identified), and has been growing rapidly (which is to say: hasn't reached a stable level of penetration) within printed material over the past decade or so.