Juergen Schmidhuber (co-author of the original LSTM paper) had a very similar idea: http://people.idsia.ch/~juergen/driven2009.pdf
"This drive maximizes interestingness, the
first derivative of subjective beauty or compressibility, that is, the steepness of the
learning curve. It motivates exploring infants, pure mathematicians, composers,
artists, dancers, comedians, yourself, and (since 1990) artificial systems."
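To make the "first derivative of compressibility" idea concrete, here is a toy sketch (my own illustration, not Schmidhuber's actual system): the intrinsic reward is how much a simple predictor's loss drops after one learning step, so incompressible noise, which permits no learning progress, earns nothing.

```python
import torch
import torch.nn as nn

# Toy illustration (not Schmidhuber's formulation): intrinsic reward as
# learning progress, i.e. how much a predictor's loss improves after one update.
predictor = nn.Linear(8, 8)   # stands in for the agent's world model/compressor
opt = torch.optim.SGD(predictor.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

def curiosity_reward(obs, next_obs):
    # Loss before one learning step...
    loss = loss_fn(predictor(obs), next_obs)
    before = loss.item()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # ...and after. The improvement ("steepness of the learning curve") is the
    # reward: pure noise is incompressible, so progress, and hence reward,
    # stays near zero there no matter how long the agent stares at it.
    with torch.no_grad():
        after = loss_fn(predictor(obs), next_obs).item()
    return before - after
```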
> These choices make RND immune to the noisy-TV problem.
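For context, RND (Random Network Distillation) trains a predictor network to match the output of a fixed, randomly initialized target network on the current observation, and uses the prediction error as the exploration bonus. Roughly, and with illustrative layer sizes (the real implementation uses convolutional networks on pixels):

```python
import torch
import torch.nn as nn

def make_net(obs_dim, out_dim=64):
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

obs_dim = 32                       # illustrative size
target = make_net(obs_dim)         # fixed, randomly initialized, never trained
for p in target.parameters():
    p.requires_grad_(False)
predictor = make_net(obs_dim)      # trained to match the target's outputs
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs):
    # Error in predicting a *deterministic* function of the current observation
    # is the bonus; there is no stochastic next frame to model, which is the
    # source of RND's claimed immunity to the noisy-TV trap.
    return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)

# One training step on a batch of recent observations (random stand-ins here):
obs_batch = torch.randn(16, obs_dim)
loss = intrinsic_reward(obs_batch).mean()
opt.zero_grad()
loss.backward()
opt.step()
```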
Just to add a bit on the "AI winter reality check" vs. "Stephen Hawking hype" point: I think both are right and wrong at the same time. I do believe that the analysis of human reasoning, and its simulation in computers, can be achieved with much less computing power than most insects' sensory systems require (so in this sense I agree with Moravec, but...). Computing power can accelerate learning in a system, but that is a question of time and nothing more: it doesn't really matter whether it takes a week or a year to "set-train-load" a human-level reasoning AI; the achievement would be amazing either way.

And that, in my opinion, is also the mistake in the direction of the whole wave of hype: neural nets are just for acceleration and for the generation of datasets (what, you say!? Yes, generation, that's what they are really for, totally contrary to the world's opinion :) !!!) and, obviously, for the "lesser utility" of autonomously programming sensory systems, which is what they are being used for right now. But that has nothing to do with "reasoning intelligence", which would be, in my humble opinion, let's say, a program that could take part in, and make some insightful contribution to, the famous conversation between Albert Einstein and Rabindranath Tagore.

That said, I do believe a non-neural-net "real deal" AI is just around the corner, not that far off, really. Here I do lean a bit toward the "Hawking hype" people, but completely differently from them: I believe it has nothing to do with neural nets (which are just tools) and will be, let's say, "non-neural-network, heuristic based".
For example, speech recognition AI is supposedly within a fraction of a percent of "average human level", and yet auto-generated captions are awful. They have no punctuation, they don't distinguish between speakers, they aren't visually grouped, and they fail miserably with slang. So it turns out researchers are measuring only the one aspect of the problem their algorithm is good at and ignoring the rest.
On the flip side, we have animal intelligence. Bees aren't nearly as smart as humans. So surely modern AI, which surpasses humans at this and that, would have no problem outperforming a bee with its 960,000 neurons, right? But in reality, there is nothing that even approaches a bee's versatile intelligence. Of course, modern AI researchers would just hand-wave this away, saying the problem is not well defined. Convenient.
YouTube captioning != SOTA, any more than Google Translate for years and years represented anything close to the NMT SOTA.
(In this case, if an agent can beat 'human performance' by only clearing 1 of 9 total levels, one is entitled to a little skepticism about how useful 'human performance' is as a benchmark for this particular game. Focus on the improvement over other DRL agents, not that.)
That blog post also mentions the noisy-TV problem.
I am not skilled enough to describe how these approaches differ.
> The environment also contains a TV for which the agent has the remote control. There is a limited number of channels (each with a distinct show)... even if the order of shows appearing on the screen is random and unpredictable, all those shows are already in memory!
So in Google's "Curiosity and Procrastination in Reinforcement Learning", the agent could not handle a TV showing pure noise (snow), since it could not store all that noise in memory.
> But how do we decide whether the agent is seeing the same thing as an existing memory? Checking for an exact match could be meaningless: in a realistic environment, the agent rarely sees exactly the same thing twice. For example, even if the agent returned to exactly the same room, it would still see this room under a different angle compared to its memories.
> Instead of checking for an exact match in memory, we use a deep neural network that is trained to measure how similar two experiences are.
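For concreteness, here is a bare-bones sketch of that memory check. In the actual paper the similarity score comes from a trained reachability network (predicting whether two observations are within a few steps of each other); below, a fixed random projection plus cosine similarity is a hypothetical stand-in for it:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))      # stand-in for a trained embedding network

def embed(obs):
    v = W @ obs
    return v / (np.linalg.norm(v) + 1e-8)

memory = []

def novelty_bonus(obs, threshold=0.9):
    e = embed(obs)
    # "Similar enough" to anything already in memory counts as seen before,
    # so an exact pixel match is never required.
    if memory and max(float(e @ m) for m in memory) > threshold:
        return 0.0
    memory.append(e)               # genuinely new: store it and pay a bonus
    return 1.0
```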
If there is an unlimited number of shows and the agent walks right up to the TV, I think it's still trapped.
Here are some interesting comments on the topic:
Is Global Reinforcement Learning (RL) a Fantasy?
But "RL" is currently used to describe both problems (MDPs etc), and solutions (RL algos).
RL problems are defined very broadly and have so many subtypes (episodic/non, with/without goals, continuous/discrete, ...) that you can frame most things as an RL problem. And "AGI" would by definition be able to solve any RL problem.
So imo to some extent this is tautological.
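One way to see the problem/solution split: the "problem" is just an environment interface, and a "solution" is anything that learns against it. A minimal sketch (names are illustrative, loosely modeled on the Gym-style API):

```python
from typing import Any, Protocol, Tuple

class RLProblem(Protocol):
    # The "problem" half of "RL": anything exposing this interface is an RL
    # problem, which is why so many tasks can be framed as one.
    def reset(self) -> Any: ...
    def step(self, action: Any) -> Tuple[Any, float, bool]:
        """Returns (observation, reward, done)."""
        ...

class RLAlgorithm(Protocol):
    # The "solution" half: a learner that acts and updates against any problem.
    def act(self, observation: Any) -> Any: ...
    def update(self, observation: Any, action: Any,
               reward: float, done: bool) -> None: ...
```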
However, it's pretty disappointing that OpenAI, a non-profit organization that claims to want to distribute AI as evenly as possible, does not release the source code of its research to make its findings reproducible by others. This paper is an example, but so is other high-profile work, such as their Dota bot.
Edit: My mistake, this paper was open-sourced; see the comment below.
The code is linked from the top of the blog post: https://github.com/openai/random-network-distillation.
While we do produce lots of open-source code, our mandate is much broader than that: https://blog.openai.com/openai-charter/. We are attempting to build safe AGI and distribute its benefits.
In the short term, that means we focus on building working systems and will sometimes, but not always, release source code.
In the long term, per the charter: "we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research."