PS: Kudos to DeepMind for pushing for median or, even better, bottom-percentile metrics instead of a simplistic average, which also hides variance.
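To see why the aggregate matters, here's a tiny illustration (Python; the per-game scores are made up): the mean can look healthy while the hardest games remain essentially unsolved.

    import numpy as np

    # Hypothetical human-normalized scores: strong on most games,
    # near-zero on a few hard-exploration ones.
    scores = np.array([4.1, 3.8, 2.9, 5.2, 3.3, 0.02, 0.05, 0.01])

    print(f"mean:           {scores.mean():.2f}")            # dragged up by the easy wins
    print(f"median:         {np.median(scores):.2f}")        # more robust to outliers
    print(f"5th percentile: {np.percentile(scores, 5):.3f}") # exposes the worst games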
The sole contribution of DQN was to introduce two ugly band-aids, experience replay and a target network, to make CNN training stable in an RL setting.
So, DQN = CNN (elegant and simple) + ugly band-aids
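For the curious, here's a minimal sketch of those two band-aids, with a toy tabular Q-function standing in for the CNN and random transitions standing in for a real environment (illustrative only, not DeepMind's code):

    import copy
    import random
    from collections import deque

    import numpy as np

    class ReplayBuffer:
        """Band-aid #1: decorrelate updates by sampling old and new transitions."""
        def __init__(self, capacity=10_000):
            self.data = deque(maxlen=capacity)

        def push(self, transition):
            self.data.append(transition)

        def sample(self, batch_size):
            return random.sample(self.data, batch_size)

    class QNet:
        """Toy tabular Q-function; a real DQN would use a CNN here."""
        def __init__(self, n_states, n_actions):
            self.w = np.zeros((n_states, n_actions))

    online = QNet(4, 2)
    buffer = ReplayBuffer()
    gamma, lr, sync_every = 0.99, 0.1, 100

    for step in range(1000):
        # Band-aid #2: a frozen copy provides the bootstrap target,
        # re-synced only occasionally so the regression target stays stable.
        if step % sync_every == 0:
            target = copy.deepcopy(online)
        s, a = np.random.randint(4), np.random.randint(2)
        r, s_next, done = np.random.rand(), np.random.randint(4), False
        buffer.push((s, a, r, s_next, done))
        if len(buffer.data) >= 32:
            for s, a, r, s_next, done in buffer.sample(32):
                td_target = r + (0.0 if done else gamma * target.w[s_next].max())
                online.w[s, a] += lr * (td_target - online.w[s, a])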
The brain is massively complex; we still don't understand in detail how it computes and learns, beyond generalities such as its use of electrical and chemical signals, neurotransmitters, and some knowledge of its architecture and region differentiation.
Those models all point to an extremely complex system with quite a number of different specialized parts, with neurons individually having quite amazing complexity (and so on for capsules, neuronal systems).
I think it's unlikely a "simple" system (if you could consider a vanilla CNN simple, for example) will solve human-like AGI efficiently (either data-efficiently or computationally efficiently). There are too many requirements.
Also, while simplicity is good... a researcher shouldn't obsess over it exclusively, as it is not the end goal; to me, the end goal is the advancement of understanding and of our capabilities.
Again, not really. Broadly speaking, the AI community is pretty far behind the neuroscience community in terms of understanding the basic principles underlying how brains work. https://mitpress.mit.edu/books/principles-neural-design
The often-quoted 6×10^9 base pairs overstates the complexity, as a lot of non-coding DNA exists, and much of what's left is redundant or unrelated to the brain.
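Even taking the quoted figure at face value, a back-of-the-envelope bound shows how little raw information it is (2 bits per base is the standard encoding, since there are 4 possible bases):

    base_pairs = 6e9   # the figure quoted above
    bits_per_bp = 2    # 4 possible bases -> log2(4) = 2 bits
    total_gb = base_pairs * bits_per_bp / 8 / 1e9
    print(f"~{total_gb:.1f} GB raw upper bound, before discounting "
          "non-coding, redundant, and non-brain DNA")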
PS: We actually know quite a bit about how the brain learns and computes information. There is really interesting work on how the optic nerve encodes information, for example.
Basically, DNA includes the design for the car, the car factory, all the component factories that feed the car factory, and the mines that feed the component factories. People have dietary dependencies for things like vitamin C, but that's a whole other issue.
PS: Mitochondria also have DNA, but that’s just location specific and can simply be added to the total for the cell.
Same DNA that is referred to in other discussions as 'the ultimate spaghetti code'.
Expert systems were tailored to the problem at hand.
The components of the solution here are based on learning and are meant to address what are widely believed to be important and general facets of human cognition and/or fundamental machine learning problems (e.g. credit assignment, episodic and short-term memory, meta-cognition).
It's unclear how general the solutions here actually are, but they certainly don't look very specific to ATARI, on the face of it.
It's also worth noting that DeepMind's research has a pretty good track record of not being overly engineered to solve specific tasks. E.g. DQN, AlphaGo and successors (with the possible exception of Alpha*?)...
Criticisms about the aesthetics of an algorithm do not make sense. We have objective metrics for judging code and algorithms, such as Big O complexity, resource usage, epochs, reproducibility, etc. The questions the R&D team tries to answer are
a) "Is this possible?", and b) "Is it efficient?".
This has nothing to do with aesthetics. Simplicity is not a metric.
Neural networks are not simple to understand or use. The DeepMind team shows that they can use them as building blocks in different places of a bigger algorithm to solve a goal, iteratively. All of these results required time, effort, and resources to obtain. They are not obligated to share their results with the public, so stop criticizing without even suggesting a better solution.
For example, for Montezuma's Revenge:
- "average human" score is 4753.30
- Agent57 score is 9352.01
- human record is 1219200.0
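For context, results like these are usually compared via the human-normalized score from the Atari literature. A minimal sketch using the numbers above (the random-agent baseline of 0.0 is my assumption, not a figure from the comment):

    def human_normalized(agent, human, random_baseline=0.0):
        """Human-normalized score as a percentage: 0 = random, 100 = human."""
        return 100.0 * (agent - random_baseline) / (human - random_baseline)

    print(human_normalized(9352.01, 4753.30))    # ~197% of the average human
    print(human_normalized(9352.01, 1219200.0))  # <1% of the human record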
Think of your best employee. You show them how to operate the videoconferencing system, and that's done.
Now think of your worst employee. You have to show them how to operate the videoconferencing system 3 times because they keep getting it wrong or not understanding how to start a meeting. Eventually, when they've got the hang of it, they're fine too.
AI is like an employee that needs to be shown how to do it a million times. Unless you have the time or the data to show the AI how to do a simple task millions of distinct times, or someone else has already done it, AI can't help you.
Here DeepMind is just showing that for slightly more complex tasks, that ~1e6 factor turns into ~1e11...
A better analogy would be a baby and an adult. The adult is full of past experiences and can use that experience to learn some other tasks faster.
Currently, our AI technology isn't even at the "baby" stage, as it is not able to transfer knowledge between arbitrary tasks. This is an active research domain.
There are no fundamental new ideas in this paper compared to the preceding papers. What they do is tune hyperparameters (BPTT length, exploration/exploitation tradeoff, and policy parameterization) in a smart fashion so as to fit the bottom 5% of Atari games. Obviously the parameters, or equivalently, architecture choices, are tuned to achieve exactly that: good performance on the bottom 5% of Atari games. None of these choices will generalize outside of this specific set of Atari games.
The reasons we are doing badly at these games are well-understood. They typically require "world knowledge" (what is a key? what is a door?) and reasoning (I found a key, that can be used to open a door). That is, the visual representations need to encode such knowledge. Algorithms don't possess this world knowledge as they are not embodied in our world, so they need to learn it from scratch, i.e. brute-force it. That's exactly what this paper is doing - brute-forcing the solutions by finding just the right hyperparameters with millions of hours worth of compute.
A good analogy is what would happen if you took the game but flipped the pixels in some deterministic way so that the screen would look like noise to a human. A key would no longer be a key, but the structure is still the same. If someone asked you to solve Montezuma's Revenge with that representation, you would not be able to. Does that make you stupid or non-human? So, because these games require human world knowledge, solving them in the same way as simpler games is kind of beside the point.
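That thought experiment is easy to make concrete; a minimal sketch where a random array stands in for a real frame:

    import numpy as np

    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(210, 160), dtype=np.uint8)  # stand-in Atari frame

    # One fixed, deterministic pixel permutation: the screen becomes noise
    # to a human, but the mapping is a bijection, so every bit of game
    # structure is still there for an agent learning from scratch.
    perm = rng.permutation(frame.size)
    scrambled = frame.reshape(-1)[perm].reshape(frame.shape)

    # The permutation is invertible, i.e. no information was lost.
    inverse = np.empty_like(perm)
    inverse[perm] = np.arange(perm.size)
    assert (scrambled.reshape(-1)[inverse].reshape(frame.shape) == frame).all()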
While Never Give Up (NGU) is not fundamentally new, it is an important step in computers learning. You need to be able to generalize solutions to problems where you don't have contextual information. Imagine if you were a caveman and asked to operate an iPhone. You're not stupid if you don't know how, but if I tell you, "never give up", and put you in a room for 5 years I'd expect some results from a sentient being. This is an important process too.
We would be much closer to good AI if it can figure things out by itself instead of being constantly fed "clean" data.
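For reference, the "figure it out by itself" machinery in NGU is an intrinsic reward for reaching states unlike anything seen earlier in the episode. A heavily simplified sketch (the real agent uses learned embeddings and a more elaborate kernel):

    import numpy as np

    class EpisodicNovelty:
        """NGU-style episodic novelty, heavily simplified: the reward decays
        as similar states accumulate in this episode's memory."""
        def __init__(self, k=3, eps=1e-3):
            self.memory, self.k, self.eps = [], k, eps

        def intrinsic_reward(self, embedding):
            if self.memory:
                dists = np.linalg.norm(np.array(self.memory) - embedding, axis=1)
                nearest = np.sort(dists)[: self.k]
                # Pseudo-count: near-duplicate past states each contribute ~1.
                pseudo_count = np.sum(self.eps / (nearest + self.eps))
            else:
                pseudo_count = 0.0
            self.memory.append(embedding)
            return 1.0 / np.sqrt(pseudo_count + 1.0)

    novelty = EpisodicNovelty()
    s = np.ones(8)
    for _ in range(5):
        print(novelty.intrinsic_reward(s))  # shrinks as the same state repeats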
Does it teach us anything fundamentally new? No, it still has horrible sample complexity and does not generalize to anything outside of Atari unless you completely re-tune it. And I don't mean re-train. I mean changing the architecture and assumptions. That's different from projects such as e.g. AlphaZero or MuZero.
IMO this would've been more appropriate as an open-source release that others could apply and tune to other problems, rather than a research paper. As research, nobody outside of DeepMind can ever reproduce this.
You are completely changing the topic with this:
> While Never Give Up (NGU) is not fundamentally new, it is an important step in computers learning. You need to be able to generalize solutions to problems where you don't have contextual information
We're not even talking about NGU, we're talking about the paper linked in this post. This specific paper proposes little new in that regard. It just engineers a system to do this specifically for Atari games by taking a previous paper and changing some parameters. Neat, but it's not some kind of breakthrough.
I thought it was interesting that they let the agent learn the exploration/exploitation tradeoff, while also combining memory and intrinsic motivation.
Another contribution of this paper is showing that all these tricks can build on each other and are thus complementary.
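The learned exploration/exploitation part is essentially a bandit problem over a family of exploration policies. A simplified, non-windowed sketch (the paper uses a sliding-window UCB over many (beta, gamma) pairs; the arms and the fake episode returns here are made up):

    import math
    import random

    def ucb_pick(counts, values, t, c=1.0):
        """Pick the exploration policy (arm) with the best UCB score."""
        for arm, n in enumerate(counts):
            if n == 0:
                return arm  # try every policy at least once
        return max(range(len(counts)),
                   key=lambda a: values[a] + c * math.sqrt(math.log(t) / counts[a]))

    # Each arm pairs an intrinsic-reward weight beta with a discount gamma,
    # from exploratory to exploitative (values are illustrative).
    arms = [(0.3, 0.99), (0.1, 0.997), (0.0, 0.9997)]
    counts, values = [0] * len(arms), [0.0] * len(arms)

    for t in range(1, 200):
        a = ucb_pick(counts, values, t)
        beta, gamma = arms[a]
        episode_return = random.gauss(10 * (1 - beta), 1.0)  # stand-in for a real episode
        counts[a] += 1
        values[a] += (episode_return - values[a]) / counts[a]  # running mean per arm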
Human brains are also a bag of tricks fine-tuned to the goal and requirements of making more humans.
The fundamental question is, is a representational (contextual) bootstrap required in the long run for a contained computational system to perform at human level across a large number of domains? This isn't a solved problem.
So yes, AI would be better if it could "figure things out by itself". However, humans don't "figure things out by themselves" either: they come pre-wired with a lot out of the box, and they get a lot of help cleaning the data (parents, teachers, literal labels, etc.).
You mean, almost like a human baby?
Keep in mind DeepMind has a lot of money for PR - but nice prose and diagrams shouldn't affect your judgement of whether something is important or not!