
Why I’m Remaking OpenAI Universe - evc123
https://blog.aqnichol.com/2017/06/11/why-im-remaking-openai-universe/
======
Houshalter
Why not use game emulators? With popular NES emulators you can advance the
game frame by frame. You can read the raw memory addresses that correspond to
the score. You can dump the memory at any time and reload the game to a
specific game state. You can even manipulate the games in many fun ways by
messing around with the game memory. Or give an AI algorithm access to memory
addresses as additional information, instead of relying on pure machine
vision, if you want to do that.

Here's an example of a guy who made a general game-playing algorithm that
brute-forces its way through any NES game:
[https://www.youtube.com/watch?v=xOCurBYI_gY](https://www.youtube.com/watch?v=xOCurBYI_gY)
This isn't necessarily interesting from an AI perspective - the playing
algorithm is just brute force. But it shows what can be done with the
platform: easily reloading previous states and exploring counterfactual
futures (which is exactly the sort of thing RL algorithms do). He also has a
cool algorithm for finding the objective function of an arbitrary game:
watch a human play and see which memory addresses increment. That is a lot
easier than writing OCR code to read the score and game-over state from the
screen.
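The memory-increment trick is easy to sketch. Assuming we can dump RAM
snapshots while a human plays (the snapshot format and function name here are
made up for illustration, not taken from his tool), a simple filter narrows
down candidate score addresses:

```python
def find_score_addresses(snapshots):
    """Given a sequence of RAM snapshots (equal-length byte sequences)
    taken while a human plays, return the addresses whose values never
    decrease -- candidate score counters."""
    candidates = set(range(len(snapshots[0])))
    for prev, cur in zip(snapshots, snapshots[1:]):
        candidates = {a for a in candidates if cur[a] >= prev[a]}
    # Discard addresses that never change at all (constants, padding).
    changed = {a for a in candidates
               if any(s[a] != snapshots[0][a] for s in snapshots)}
    return sorted(changed)

# Toy example: address 1 counts up like a score, address 0 jitters,
# address 2 stays constant.
ram = [bytes([5, 0, 7]), bytes([3, 1, 7]), bytes([9, 4, 7]), bytes([2, 9, 7])]
print(find_score_addresses(ram))  # [1]
```

A real version would also have to handle multi-byte and BCD-encoded counters,
but the monotonicity filter is the core idea.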

------
gdb
(I work at OpenAI.)

Great project. We've found that the VNC Universe environments are hard for
today's RL algorithms, primarily due to their async nature. We're currently
working on a new set of Universe environments without VNC; I'm very happy to
see others inspired by the core ideas of Universe as well.

~~~
pixelHD
Honest question: how interested are academia and industry in integrations
between deep learning libraries and game engines? I've worked with Unreal and
TensorFlow over the last semester, and I found that there aren't any existing
integrations. I will probably work on a plugin, but I wanted to know if there
is any interest?

The way I see it, having hooks into the engines themselves helps with what the
article talks about - not needing to go through VNCs or other _glue_ to get
realtime data. It could potentially send the framebuffers themselves directly
from the game/simulation and tie in the actions back to the game/simulation.
And using framebuffers is just one direction; we could instead stream the
coordinates, the current payoff, etc.

Also, having such plugins would help with the adoption in both directions -
games now have an always updating/learning AI (might need a network connection
+ cloud backend), and researchers can have training/testing environments.

~~~
evc123
Arthur Juliani is about to open source his interface for connecting ML agents
to Unity3D game engine:
[https://twitter.com/awjuliani/status/879142906178785281](https://twitter.com/awjuliani/status/879142906178785281)

~~~
pixelHD
Wow, that looks great! I guess that's what I'm intending to do with Unreal.

------
hackpert
This is great. Using HTML5 games in a headless browser makes a lot of sense
because the need for VNC is circumvented. However, while OpenAI's
implementation is certainly not the best, I think having access to just the
information on the screen is not a bad idea in itself as a (maybe optional)
constraint. With access to the game's internal state we don't even need RL to
solve a large number of games; algorithms like NEAT are sufficient.

~~~
Houshalter
This project doesn't change that. The agents still only get screenshots of the
game as far as I understand.

However, I think this approach is bad. Machine vision is a separate problem
from reinforcement learning; you shouldn't need to do both well. Machine
vision consumes a ton of processing power and researcher time spent tuning
hyperparameters, and all it's doing is recovering information that's already
in memory, like the locations of various objects and the score. It really
limits what can be done. E.g., DeepMind's famous Atari-playing AIs were
limited to no memory and only the last few frames, because backpropagating
through thousands of frames was too expensive.

Because of the way NNs work, it's trivial to split the machine vision out
into a separate module. So if you have a good RNN reinforcement learning
system, you can easily add a machine vision learning system to it later if
you need to.
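The modular split being described is just an interface boundary. Nothing
below is real DeepMind or µniverse code; it's a toy numpy sketch showing a
vision front-end and a recurrent policy as separate, independently
replaceable modules (all class and parameter names are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

class VisionEncoder:
    """Stand-in for a learned vision module: maps a raw frame to a
    compact state vector. It could be trained separately (e.g. as an
    autoencoder) and swapped out without touching the policy."""
    def __init__(self, frame_dim, state_dim):
        self.W = rng.standard_normal((state_dim, frame_dim)) * 0.01

    def encode(self, frame):
        return np.tanh(self.W @ frame)

class RecurrentPolicy:
    """Stand-in for the RL half: an RNN over *state vectors*, never raw
    pixels, so the vision module is an interchangeable front-end."""
    def __init__(self, state_dim, hidden_dim, n_actions):
        self.Wx = rng.standard_normal((hidden_dim, state_dim)) * 0.01
        self.Wh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.01
        self.Wa = rng.standard_normal((n_actions, hidden_dim)) * 0.01
        self.h = np.zeros(hidden_dim)

    def act(self, state):
        self.h = np.tanh(self.Wx @ state + self.Wh @ self.h)
        return int(np.argmax(self.Wa @ self.h))

encoder = VisionEncoder(frame_dim=64, state_dim=8)
policy = RecurrentPolicy(state_dim=8, hidden_dim=16, n_actions=4)
frame = rng.standard_normal(64)  # a fake "screenshot"
action = policy.act(encoder.encode(frame))
print(action)
```

The point is the call signature: the policy only ever sees
`encoder.encode(frame)`, so an agent trained on ground-truth state can later
have a learned vision system bolted on in front.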

~~~
unixpickle
In terms of "backpropagating through thousands of frames", it's not as
expensive as you might think. I've used TRPO to train RNNs on games like
Atari Pong with thousands of frames per episode. This can be done via an
algorithm
that reduces the memory complexity of RNN backpropagation (these algorithms
didn't exist in 2013). See for example
[https://arxiv.org/abs/1606.03401](https://arxiv.org/abs/1606.03401).
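The memory trick (checkpointed backpropagation through time, in the spirit of
the linked paper) can be illustrated without any deep-learning library. This
toy sketch stores only ~√T hidden states during the forward pass and
recomputes the rest on demand; the `step` function is a made-up stand-in for
a real recurrent cell:

```python
import math

def step(h, x):
    # Toy recurrent cell: a real one would be an RNN/LSTM update.
    return 0.5 * h + x

def checkpointed_states(h0, xs):
    """Forward pass that stores only ~sqrt(T) hidden states.
    The backward pass recomputes each segment from its checkpoint,
    trading extra compute for O(sqrt(T)) instead of O(T) memory."""
    T = len(xs)
    k = max(1, math.isqrt(T))
    checkpoints = {}
    h = h0
    for t, x in enumerate(xs):
        if t % k == 0:
            checkpoints[t] = h  # state *entering* step t
        h = step(h, x)
    return checkpoints, k

def recompute_state(checkpoints, k, xs, t):
    """Recover h_t from the nearest earlier checkpoint -- what the
    backward pass does once per segment."""
    base = (t // k) * k
    h = checkpoints[base]
    for u in range(base, t):
        h = step(h, xs[u])
    return h

# Sanity check: recomputed states match a full forward pass.
xs = [float(i) for i in range(100)]
checkpoints, k = checkpointed_states(0.0, xs)
full = [0.0]
for x in xs:
    full.append(step(full[-1], x))
assert all(abs(recompute_state(checkpoints, k, xs, t) - full[t]) < 1e-9
           for t in range(100))
print(len(checkpoints), "checkpoints for", len(xs), "steps")  # 10 checkpoints for 100 steps
```

With 100 timesteps, only 10 states are ever held in memory at once, which is
why episodes thousands of frames long become tractable.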

------
daveguy
According to the author, "Universe never really took off in the AI world."

That's a bit premature for a project that was released less than 7 months
ago, isn't it?

[https://blog.openai.com/universe/](https://blog.openai.com/universe/)

Edit: that said, the project seems to have some interesting and needed
improvements (esp. the time adjustment). Glad to see dialog between muniverse
and OpenAI here.

------
evc123
[https://github.com/unixpickle/muniverse](https://github.com/unixpickle/muniverse)

[https://github.com/unixpickle/demoverse](https://github.com/unixpickle/demoverse)

------
strin
Awesome project.

Despite the flaws, the nice thing about VNC is its universality: it can
support any app on a computer. Using HTML5 in a browser limits the scope of
things we can encapsulate as environments, and makes it less of a "universe".

However, there is a difference between the universality of the tech stack and
that of the exposed interface. In my opinion, the future Universe would be
rich clusters of RL environments behind a unified API, each implemented with
different underlying technology to meet the desired synchronicity and frame
performance.

HTML5 could deliver one such cluster.

~~~
unixpickle
I'm pretty sure that was the goal of OpenAI Gym. Gym tries to provide a
generic interface for RL environments, and imho it does a nice job. I am
working on Python bindings for µniverse now, which should allow µniverse to
integrate with Gym.

------
dswalter
I'm a little surprised, but this seems like a good idea. HTML5 certainly has a
brighter present and future than Flash, and skipping the OCR step should save
quite a few CPU cycles.

------
zzh8829
I am also working on a related project. Flash and HTML5 games in Chrome are
great, but they are very far from the initially promised full-blown GTA5,
StarCraft, and other complex envs. I am in the process of remaking the
Universe framework for the host machine, since running those
computation-intensive games at a reasonable frame rate is nearly impossible
inside Docker or virtual machines.

------
misiti3780
Did OpenAI really unofficially abandon Universe?

~~~
aerovistae
Yeah, really interested in hearing their take on this. It's not often you see
a Musk-sponsored enterprise cast a major project aside without public comment.

~~~
chronic61a
It's because the people actually working on AI, including at OpenAI, finally
knocked some sense into Elon Musk. He finally realized how far behind AI is
(it is glorified linear regression) and that we won't be seeing general AI
for at least another 40 years.

Source: Am an AI research scientist.

~~~
computerex
Would be interested to know how you reached that 40 years number. I don't
think we are even remotely close to AGI, 40 years to me seems extremely
optimistic. That's within my lifetime.

~~~
ewjordan
Probably the same way everyone does: by pulling it out of thin air as a
guess. When nobody even knows what theoretical breakthroughs are necessary,
estimates will always end up scattered all over the place, even amongst
experts. Try
asking working mathematicians how long until the Riemann hypothesis is
resolved one way or another, or look at what people were saying about Fermat's
Last Theorem up until it was solved.

What we do know is that current techniques won't get us close to AGI, so
something new is needed (or perhaps like backprop, something old will work
once we have enough compute power). Personally I'm bullish on AGI because I
have strikingly low faith in the ability of evolution to operate very
effectively as a tool for algorithm discovery, so I suspect that once we've
hit the compute threshold we'll find that many different algorithms can do the
trick, and 40 years is probably not out of the question for us to hit that
point (or 10, or 100), depending who you talk to about what the compute
threshold might be.

I'd caution against putting too much weight on what experts say, though,
since with a tiny set of exceptions, anyone working on "AI" today is actually
just working on narrow AI, which is, as someone put it, just glorified linear
regression. Those tools will almost certainly be part of the solution, but
only in the sense that the classical theory of Diophantine equations was part
of Weil's proof of Fermat's Last Theorem - they are not the core of the
theoretical approach.

~~~
evc123
Evolution has been running ~10^19 experiments in parallel for billions of
years: [http://reducing-suffering.org/how-many-wild-animals-are-there/](http://reducing-suffering.org/how-many-wild-animals-are-there/)

Evolution is a slow algorithm, but it had access to an absurd amount of
compute (all neuronal organic matter on Earth) and environment simulation (all
of physical reality on Earth) when discovering us; so the discovery of the
algorithms/architectures/principles in our heads shouldn't be viewed as
trivial.

~~~
backpropaganda
The massive compute/time advantage evolution has makes me bearish about AGI.
We really need to fix our compute capabilities before we can start overtaking
evolution. The math dictates it'll happen, but exponentially slowly if we
don't innovate in compute.

~~~
espadrine
There's more to the story, too: advances on top of CRISPR may give us better
tools to self-improve the species, accelerating evolution.

Personally, I'm bearish about AGI because I believe we will eventually realize
that the brain is a glorified linear regression too, with a custom wiring to
help learn language and vision.

~~~
computerex
What do you mean when you say that the brain is a glorified linear regression?

------
namuol
Funny, I have an old (unfinished) HTML5 space-exploration game by the same
name:

[https://github.com/namuol/muniverse](https://github.com/namuol/muniverse)

If I had more time I'd submit a PR to integrate it...

------
make3
I wonder what's happening with OpenAI. Most big names are leaving.

~~~
fggh
Please elaborate...

~~~
make3
Well, Ian Goodfellow and Andrej Karpathy for starters

------
zach417
I echo all of your issues with running Universe. I have a decrepit MacBook,
and it was actually not possible for me to use it at all.

~~~
forgotmyhnacc
If you have trouble running Universe, how are you going to run RL algorithms
that use lots of GPU and CPU?

------
tomjacobs
Missed opportunity for a Rick and Morty Microverse reference as the name.

------
Cellestro
Congratulations on the initiative, it looks very cool! Indeed, we found that
running asynchronous environments, while possible, proved to be too cumbersome
for research. We're now working on a synchronous set of environments for
universe that are easier to use.

