
OpenAI Universe - sama
https://universe.openai.com
======
llSourcell
Hey guys, it's Siraj. OpenAI asked me to make a promotional video for it on my
Youtube channel and I gladly said yes! You can check it out here:

[https://www.youtube.com/watch?v=mGYU5t8MO7s](https://www.youtube.com/watch?v=mGYU5t8MO7s)

~~~
hacker_9
Nice video, but the jump from solving super-simple 2D games, driven by
feedback from binary win/lose conditions, to solving tasks in 3D open-world
simulations will require an unimaginably large leap in processing and
knowledge. Additionally, neural nets have already shown that they are not
good at generalizing, and only work well at the specific tasks they were
trained for. So the idea that an AI that can play GTA would also be able to
'solve' climate change is odd.

~~~
altrus
I'm far from an expert, but I thought the poor generalized performance of
neural nets was largely associated with the complexity of the network (number
of neurons, etc), and the training data.

Is there something more specific about the application of neural nets to
generalized problems that makes them unsuitable?

------
mulcahey
With this platform (and Gym), it seems like a large part of their strategy
for "democratizing AI" is to grow the amateur research community. By making
it easier for an individual to play around and conduct experiments, they
are hoping to enable progress to emerge from anywhere, instead of just from
wealthy companies and elite universities.

It is also a great way to be able to track and organize what is being created
rather than having to sort through amateur projects scattered across the web
or research publications that often lack accompanying code.

Edit:

Some key ways they're making it easier for amateurs:

* Starting point for problems to solve

* Way to get noticed (instead of needing a university/company brand)

* Technological infrastructure for building and testing. The diversity of tools they brought together to build this platform is very impressive.

------
d_burfoot
Disclaimers: I cannot see the future. These are just my opinions. I really
appreciate the work and money that SamA, Elon, and others have put into the
OpenAI project. The Universe work in particular might help encourage young
people, many of whom love video games, to study AI.

But I feel that contrarians, such as myself, have an ethical commitment to
young people to voice our doubts and criticisms, so that they can avoid making
a long journey down a career/research path that leads to a dead end. That
being said, I think this project leads in a very unpromising direction. Here
are some reasons:

1\. Games aren't a good testbed for studying intelligence. In a game the main
challenge is to map an input percept to an output action (am I drifting off
the side of the road? Okay swerve right). The real challenge of intelligence
is to find hidden abstractions and patterns in large quantities of mostly
undifferentiated data (language, vision, and science all share this goal).

2\. This platform is not going to help "democratize" AI. To succeed in one
of these domains, contestants will need VAST amounts of computing power to
simulate many games and to train their DL and/or RL algos. DeepMind and
others with sufficient CPU/GPU power will almost certainly dominate in all
of these settings.

3\. Deep Learning, as it is practiced, isn't intellectually deep. With a few
exceptions, there is nothing comparable to the great discoveries of physics,
not even anything comparable to the big ideas of previous AI work (A*, belief
propagation, VC theory, MaxEnt, boosting, etc). Progress in DL mostly comes
from architecture hacking: tweak the network setup, run the training algo, and
see if we get a better result. The apparent success of DL doesn't depend on
any special scientific insight, but on the fact that DL algos can run on the
GPU. That, combined with the fact that, except for the GPU, Moore's Law broke
down roughly 10 years ago, means that relative to everything else, DL looks
amazingly successful - because all other approaches to AI are frozen in time
in terms of computing power.

~~~
hacker_9
_" Deep Learning, as it is practiced, isn't intellectually deep. With a few
exceptions, there is nothing comparable to the great discoveries of physics
... Progress in DL mostly comes from architecture hacking: tweak the network
setup, run the training algo, and see if we get a better result."_

To be fair isn't this what physicists do all day at CERN too? Smash some
particles together, analyse the numbers, try to find patterns, tweak a few
things and try again?

~~~
pakl
I take the point to be that there aren't "deeper" fundamental principles at
play in these models. Tremendous progress has come simply from tweaking the
number of layers, or how they feed forward to each other (skipping layers,
etc.), or from throwing more compute power or data at the same basic
algorithm.

Where might we look for deeper principles? One idea is to consider what brains
do and how they might be doing it. (I'm not saying we need to go down the
rabbit hole of biological detail -- on the contrary I'm suggesting we look at
known or even hypothesized principles of brain operation and import them into
AI.)

Two ideas we have used in our work: prediction (over time), recurrent feedback
(most brain regions have more feedback than feedforward inputs)

~~~
noobermin
As a physicist judging from the outside, I share some of your feeling. Are
there general laws governing "learning"? Theorems? Are there "deeper"
things for humans to learn? The thing is, people in the field don't need
heavy intuition or math. In some ways that's good (if you just want a
result to utilize) and in others it's bad (if you are a curious person).

~~~
pakl
In a sense, I would say yes there are learning laws, but it's still early in
codifying them.

Along one axis, you could compare: supervised, semi-supervised, self-
supervised and unsupervised learning. Along another axis, consider that there
are versions of each method that take into account temporal/dynamic data,
versus others that require randomly shuffled static data.

For the current problems of visual perception, I think the field would
benefit greatly from a shift in focus toward multiscale
interaction/dynamics, rather than the (static) statistics it focuses on
currently (for more on this, see my colleague's blog: [1]).

[1] Statistics and dynamics.
([http://blog.piekniewski.info/2016/11/01/statistics-and-
dynam...](http://blog.piekniewski.info/2016/11/01/statistics-and-dynamics/))

~~~
eli_gottlieb
>For the current problems of visual perception, I think the field would
benefit greatly from a shift in focus toward multiscale
interaction/dynamics, rather than the (static) statistics it focuses on
currently (for more on this, see my colleague's blog: [1]).

Your friend's blog has a lot of good insights that I've seen in the
theoretical neuroscience and computational cognitive science literature as
well. Where do you guys work?

~~~
pakl
I work at LeEco US out of San Diego, and my colleagues work at other ML/AI
companies also in San Diego. We originally met and collaborated at Brain
Corporation.

------
flaviojuvenal
Related but slightly off-topic: there is a great sci-fi story by Ted Chiang
(the author of the story behind the film Arrival) about humans raising AIs
in an artificial world. The premise is that if we want AIs to act like
humans, we must teach them the way we teach humans:
[http://subterraneanpress.com/magazine/fall_2010/fiction_the_...](http://subterraneanpress.com/magazine/fall_2010/fiction_the_lifecycle_of_software_objects_by_ted_chiang/)

~~~
noobermin
The people who will end up raising AI's will not want them to act like humans.
You already see it in their current uses. They create them to maximize profit.
So, in a way, their owners (corporations) have created them in _their_ image.

------
state_less
I'd love to see AI, using games, master the art of estimating the depth of
objects in a scene. If you ask a person, "about how far away is that car?",
they can usually give an answer that is at least within an order of
magnitude of the actual distance: 1 m, 10 m, 100 m, 1000 m. If AI could do
that, it could then navigate an environment in the real world using only a
camera or two. So you start with a virtual world that looks real, train up
the bot, then use it to navigate the real world. Has this already been
accomplished?

~~~
pakl
That's a great (and hard) problem!

More generally, imagine AI that could learn the physics of the world. For
example, if the ball is rolling away, the AI should be able to predict that
the ball will look smaller on the next frame.

Going further, if the ball is about to roll under a shadow, the AI should
predict that the ball will become a darker shade of green.

(After several years working in a robotics research company, these kinds of
capabilities are exactly what we determined would be necessary for robot AI.)

~~~
state_less
Agree, it's not easy. Start with the basics: for example, project a
rectangle's 3D coordinates to 2D coordinates, then feed the 2D coordinates
into a NN and ask for the (depth) third dimension. Can you teach the NN a
perspective transform? Can you rotate the rectangle and recognize the
rotation? Can you add other rectangles to the scene and detect each one?
Can you add color and lighting to infer more properties and get better
results? Shine some more info on the problem ;)

These are like unit tests for AI (basic shapes and transforms), and I agree
physical reckoning sits at the top: one of the big capstone tests, and
something beautiful to behold in nature (e.g. sports). Maybe a virtual
soccer game at the end?

From my lidar experience, I wanted to reach for a model rather than deal with
noisy sensor data. I want to generate the output (3d world) with my model,
then the NN learns the inverse (eg. the scene graph used to generate the
scene).

I enjoy thinking about this stuff, though it really makes my head spiral
sometimes when I relate it to my own reality. It's easy to feel like you're
losing touch.

~~~
chronic92
This is a trivially easy problem if you have stereo cameras, just like humans.
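
With rectified stereo cameras the geometry does reduce to one line,
z = f * B / d (a standard textbook relation; the focal length and baseline
below are made-up example values):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Rectified stereo: depth z = f * B / d, where d is the horizontal
    pixel offset of the same point between the two images."""
    return focal_px * baseline_m / disparity_px

# e.g. 700 px focal length, 6.5 cm baseline (about human eye spacing):
z = depth_from_disparity(700, 0.065, 10)  # 10 px disparity -> ~4.55 m
```

The hard part, and what makes the monocular version interesting, is
estimating the disparity (or other depth cues) reliably from raw pixels in
the first place.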

------
poppingtonic
This is amazing! I was thinking of this problem when I saw a friend making
a stop-motion video. The steps are super repetitive, and I asked him,
"maybe a DeepMind Atari-style RL agent could learn how to do this?" But I
didn't want to have to do what DeepMind did to emulate Atari games, just
for an Adobe editing tool. This is an experiment that I can now run.

~~~
halflings
I've never seen how stop-motion videos are edited. What's so repetitive? I
thought that once you've taken all your pictures, you just put them in
sequence and remove any frames that seem off. Maybe you also need to decide
how long to leave every frame up?

~~~
poppingtonic
This is true, but some are more complex, which is also due to the choice of
tool. For example, [1], made by Chris King, using Adobe Premiere. This is a
time-lapse of the process that he used to make parts of [2]. Notice the
pattern that emerges when sequential images start lining up.

[1] [https://youtu.be/M7Hr83OI-rs](https://youtu.be/M7Hr83OI-rs)

[2] [https://youtu.be/1-rFV_d6RH8](https://youtu.be/1-rFV_d6RH8)

------
Cybiote
This is astounding!

If requests are being taken, it would be useful to be able to search
through the listed environments. And a poker environment for the Internet
section would be a good balance of fun and wide appeal, and a
straightforward but very non-trivial environment.

~~~
Romajashi
You'll lose your job and be replaced by AI. Astounding?

------
NhanH
This is a bit out there, but it would be fun if OpenAI can get one of the mega
popular multiplayer games under this (WoW, League of Legends, DOTA etc.).

Imagine an AI team in League of Legends world championship!

~~~
conradev
ICYMI, DeepMind partnered with Blizzard to do this for StarCraft II:

[https://deepmind.com/blog/deepmind-and-blizzard-release-
star...](https://deepmind.com/blog/deepmind-and-blizzard-release-starcraft-ii-
ai-research-environment/)

------
shykes
This is perhaps my favorite use of Docker ever.

~~~
tlb
We've been pushing hard on some parts of Docker, and it's working pretty well.
For example, reconfiguring iptables depending on what game you ask for. And it
works fine to test things like that on my MacBook and then deploy to
Kubernetes. Amazing.

------
jclay
I noticed the OpenAI team wrote their own VNC driver in Go for performance
reasons[0].

I would love to hear more about how they were able to achieve increased
performance over other VNC drivers.

[0][https://github.com/openai/go-vncdriver](https://github.com/openai/go-
vncdriver)

~~~
gdb
We wrote it for somewhat subtle reasons. First, there aren't too many
alternatives out there — VNC is meant for human consumption, not for bots
after all :). Second, for a single connection, once you're using Tight
encoding, the bottleneck becomes server-side encoding and libjpeg-turbo,
neither of which will depend on your driver. As you scale to many connections,
the important thing becomes managing the parallelism well. Go is great for
this.

We'd started by adapting an existing Python driver in Twisted, implementing
additional encodings and offloading to threads for calls into C libraries like
zlib. We got this working reasonably on small environments like Atari, but for
environments which generated many update rectangles, we started to be bitten
by the GIL. I still believe that one _could_ make Python work, but it'd take
quite a lot of effort.

libvncserver is a fast C driver, but it's GPL, and doesn't have any particular
support for parallelization. We wanted Universe to be usable by everyone, from
hobbyists to companies, so GPL was a no-go. (We actually talked to the
libvncserver maintainers, who said that they would be interested in dropping
GPL restriction, but there have been far too many contributors over its long
history to figure out how to do so.)

Our Go driver, based on [https://github.com/mitchellh/go-
vnc](https://github.com/mitchellh/go-vnc), has scaled quite well. It takes
advantage of Go's lightweight thread model: each connection runs in its own
goroutine, which makes it easy to run hundreds of connections in parallel
without needing hundreds of threads.

------
soared
[http://reddit.com/r/WatchMachinesLearn](http://reddit.com/r/WatchMachinesLearn)
is about to get a lot more popular. I can't wait. Also from the linked blog
post, you can play with (against?) your agent in realtime:

>You can keep your own VNC connection open, and watch the agent play, or even
use the keyboard and mouse alongside the agent in a human/agent co-op mode.

------
minimaxir
Interesting announcement timing at 10:30 PM PST on a Sunday. :P

The list of third-party gaming partners is extremely impressive, and a Docker
config helps resolve the dependency hell that some of the AI packages require.

~~~
mappingbabeljc
We wanted people at NIPS in Barcelona to have something nice to read over
their morning coffee and such. [I work at OpenAI - @jackclarksf on Twitter]

~~~
Hydraulix989
Well, you just got a few extra followers.

------
Hydraulix989
What is state of the art in reinforcement learning right now?

[https://arxiv.org/abs/1602.01783](https://arxiv.org/abs/1602.01783)

Is there a way to deal with "sparse" training data (state, action, reward)
triples -- sparse in "state"?

~~~
sapphireblue
Looks like the "UNREAL"
([https://arxiv.org/abs/1611.05397](https://arxiv.org/abs/1611.05397)),
"Learning to reinforcement learn"
([https://arxiv.org/abs/1611.05763](https://arxiv.org/abs/1611.05763)) and
"RL^2" ([https://arxiv.org/abs/1611.02779](https://arxiv.org/abs/1611.02779))
are the state of the art in pure RL for now.

Finally, there is a trend of using recurrent neural networks as the top
component of the Q-network. Perhaps we will see even more sophisticated
RNNs, like DNC and Recurrent Entity Networks, applied here. We'll also see
meta-reinforcement learning applied to a curriculum of environments.

~~~
Hydraulix989
The crazy thing is that these stacked model architectures are starting to
become another layer of "lego blocks" so to speak.

------
CodinM
I'll just go on a limb and consider this to be fucking awesome.

------
grondilu
All the listed PC game environments are tagged as "coming-soon":

[https://universe.openai.com/envs#pc_games](https://universe.openai.com/envs#pc_games)

------
cing
End game; I'd really like an AI agent for "in real life" tabletop games (like
boardgames).

~~~
iampherocity
I call those friends.

~~~
ctchocula
There are hardcore board games for which you will find it difficult to find
human players willing to play with you. Campaign for North Africa takes
8-10 players and has an estimated playing time of 1000 hours [1]. An
excerpt from a review of this game:

> Are you a logistics major? Are you masochistic? Do you think that the
> calculations required to play a game should take longer than actually moving
> the units? Then do I have a game for you! Get yourself a copy of The
> Campaign for North Africa, and say goodbye to the family for a couple of
> months, if not years.

The Campaign for North Africa is the most detailed game that I have ever
played. It isn't necessarily the most complicated, but for the sheer size
of the detail and planning involved, it is by far the most laborious and
detail-oriented game that has ever been produced. As a first example, this
is the only game that I know of that differentiates between British and
German jerry cans for fuel. More about this later on.

The Campaign for North Africa is Richard Berg and SPI's simulation of the
war in North Africa in the Second World War. The seven-foot-long map sheet
(divided into five sections), two sets of rulebooks, charts and tables
galore and, oh yes, thousands of counters complete the game in a nice
sturdy box, not the usual SPI flat game holder that falls apart. Most of
this is standard SPI fare, with the functional but not pretty counters,
standard three-column-style SPI rulebooks, and a fairly attractive map that
does an excellent job of creating an epic sense of scale. True, this is the
desert, and most of it is desolate, but the numerous tracks and roads, the
coastal plains and mountains, and the railroad (both already built and
railroad you can build as the game goes on) all combine to present an
appealing picture of the area.

Each turn is one week of time, and each turn is broken down into several
stages. There is an initiative determination stage, a naval convoy stage, a
stores expenditure stage, and then three operations stages. The Ops Stages
are where most of the activity occurs. There are also stages that are used
in the air game. I did not play the Air Game for the purpose of this
review, but did play with the advanced logistics.

The game also includes one of each type of chart, which can be used to make
copies. I made my own in Excel. There are charts for Division and Brigade
organization, truck convoy sheets, naval convoy sheets, prisoner sheets,
broken-down and destroyed vehicle sheets, supply dump sheets, sheets for
the air game and more. I even created a couple of my own for production and
independent units. As each Division in the game needs its own org chart,
which fits best on legal-size paper, this is a lot of charts and sheets to
keep track of. All of these must be filled out before the game even starts,
and just setting up for the beginning of the game requires filling out
hours (literally) of paperwork. And for heaven's sake, don't use pen! Much
of what you write in the charts at the beginning of the game will be erased
by the end of the first turn. Every movement, every combat, even just
sitting there doing nothing, will require updating the org charts for every
unit in the game.

[1] [https://boardgamegeek.com/boardgame/4815/campaign-north-
afri...](https://boardgamegeek.com/boardgame/4815/campaign-north-africa)

~~~
ishi
That sounds horrible. It also sounds like a game that should definitely be
played on a computer, not as a board game.

~~~
nannal
I don't know about you, but I love playing unreal tournament by moving rocks
around on the ground similar to [https://xkcd.com/505/](https://xkcd.com/505/)

------
noobermin
>other applications

Any applications with a keyboard and mouse? Can I use emacs and have it start
learning to code?

~~~
moconnor
Sure, just define a good score function...

~~~
levesque
That should be easy, I'll need a 5 million dollar grant and 5 years.

~~~
dagw
How much do you need to train a model to write grant applications for 5
million dollars or more over 5 years?

------
jakozaur
Browser tasks seem to be a greenfield area with amazing potential.

What if AI could do anything a human can do with a browser or over the
phone?

Also love the "bring your own Docker container" approach.

~~~
tianlins
Well, if all GUI interactions can be automated, what would be our next human
interface to computers/AIs?

~~~
robryk
Voice in one direction and voice and graphics in the other?

------
aratno
I just hope no self-driving vehicle is applying anything learned in GTA.

~~~
cpmsmith
I can't recall where, but I read that Tesla or Google _were_ actually using
GTA to train their self-driving cars, because it is a spectacularly advanced
simulation of driving through an urban environment, so they didn't have to
build their own.

~~~
seasonedschemer
That would be the Berkeley DeepDrive project.
[http://deepdrive.io/](http://deepdrive.io/) and
[http://bdd.berkeley.edu/](http://bdd.berkeley.edu/).

~~~
cr4zy
deepdrive.io creator here - I'm actually not affiliated with the Berkeley
project of the same name. There's also a DeepDriving at Princeton plus plenty
of other (mostly perception) projects using GTAV, so it can be confusing. I'm
hoping the GTAV for self-driving car efforts can start to standardize around
the Universe integration though. Having worked on it, I can say firsthand that
the Universe architecture is definitely amenable to sending radar, lidar,
controlling the camera, bounding boxes, segmentation, and other types of info
that the various sub-fields of self-driving are interested in. Super-excited
to see how people use it!

------
swah
Layman question: isn't adjusting "hyperparameters" similar to writing an
algorithm for playing a game, using human intelligence?

Related to the blog post:
[https://openai.com/blog/universe/](https://openai.com/blog/universe/)

~~~
tlb
It depends how many hyperparameters there are. Many popular general-purpose ML
algorithms have only a handful of numbers for hyperparameters, so they don't
embody much human input. And they can sometimes be tuned automatically.

Also, an algorithm that can learn hundreds of different games with the same
hyperparameters is more highly regarded than one that needs different
hyperparameters for each.
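
"Tuned automatically" can be as simple as a random search over a handful of
values. A toy sketch (the objective function here is invented, standing in
for "train the model and return its validation score"):

```python
import random

def validation_score(lr, hidden):
    # Stand-in for training a model and scoring it on held-out data;
    # this made-up function peaks near lr=0.1, hidden=64.
    return -(lr - 0.1) ** 2 - ((hidden - 64) / 64) ** 2

random.seed(0)
trials = [
    {"lr": 10 ** random.uniform(-4, 0),
     "hidden": random.choice([16, 32, 64, 128])}
    for _ in range(50)
]
best = max(trials, key=lambda h: validation_score(h["lr"], h["hidden"]))
```

With only two knobs this embodies very little human input; the
human-intelligence content grows with the number of hyperparameters.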

------
BaronSamedi
Unless I missed something it looks like the AI has to learn from screen pixels
instead of getting game state data. I don't like that approach at all. I
understand that it's easy to implement for OpenAI but I think having the game
developers provide a real bot-capable API is much better. I hope the latter is
what Blizzard will provide for their DeepMind collaboration.

~~~
mtgx
Seems unlikely. The focus seems to be on improving AI through "vision". The
idea is to make the AI learn skills the same way a human would (at least in
the first years of life). Google's AlphaGo also learned from screen pixels.

So these would be _human-like_ bots, rather than _bot-like_ bots, like you
normally have in games. The bot would simply learn by doing, until it masters
the game, not by getting access to game algorithms.

~~~
SamBam
> Google's AlphaGo also learned from screen pixels.

Source? That literally seems to make zero sense to me. Go can be
represented as a super-simple state. Why make it spend millions of cycles
learning to categorize pixels into a state you already have?
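
For scale: the full Go position fits in a 19x19 grid with three possible
values per point. A quick sketch of the comparison (the screenshot size is
an arbitrary example of mine):

```python
import numpy as np

EMPTY, BLACK, WHITE = 0, 1, 2

# The complete board state: 361 points, one byte each.
board = np.zeros((19, 19), dtype=np.int8)
board[3, 3] = BLACK     # a stone on the 4-4 point
board[15, 15] = WHITE

state_bytes = board.nbytes          # 361 bytes
screenshot_bytes = 1024 * 1024 * 3  # one RGB frame at 1024x1024, ~3 MB

# Learning from pixels means first recovering those 361 bytes
# from roughly 3 MB of input, every move.
```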

~~~
tiler
I would guess that they trained AlphaGo from many thousands of hours of match
footage. Writing a computer vision script to segment / extract the data may
cost cycles as you say, but would save many human hours by eliminating the
need to re-watch the footage and literally type out state information for each
move.

~~~
ironrabbit
AlphaGo was actually trained directly on game state (plus some extra
computed state like "how many liberties will I have if I play this move" or
"will I win this ladder?"). A huge number of pro games (and countless
amateur games) are available on servers like KGS in a nice
computer-digestible format.

------
thallukrish
Being able to "infer" from what it learns and "apply" it to new scenarios
in a general way is what intelligence is all about. I do not see how making
it win one game, or one million, will move it toward achieving general
intelligence of this sort.

------
iotb
Does OpenAI Universe communicate in any way with OpenAI remotely regarding
activity in OpenAI Universe? Essentially, are there any call-home aspects to
the code base? Or, is it possible to run this locally without any outside
communication?

If there is remote communication, can you detail why and where it exists in
code?

~~~
gdb
We don't call home. Once you've downloaded the Docker container, the only
outbound network traffic should be downloading the requested SWF once for
Flash games on demand (or for actually playing the game online, in the case of
e.g. Slither). You can cache the SWF if you don't want it to be downloaded
each time you start a new container.

Other than SWF downloading or specific Internet-enabled environments, running
offline should just work.

------
mariusz79
I might be wrong, but I think this was created mainly to monitor progress
in AI research. If someone uses OpenAI Universe and gets better results
than virtually everyone else, OpenAI will be able to get to them first.

~~~
mrfusion
Kind of like SETI for AI, if you think about it. I wonder if they have a
protocol for what to do if they detect an advanced AI on their system?

------
NamTaf
From my initial reading, the end user can't create environments? Is that a
feature that I can expect will eventually come?

~~~
mappingbabeljc
You can create environments - it's coming! We'll be releasing many
components over the next few months.

~~~
lawless123
Brilliant :)

------
naveen99
Too bad the iPhone doesn't support a VNC server. Would be nice to add some
Android apps if they could get permission.

------
mrfusion
Is the user's AI responsible for parsing the screen pixels that come back,
or does each game give you relevant events?

------
dcslack
Some designers from Stripe absolutely helped with the design of this page.

------
daveloyall
Didn't we all agree to NOT let the AGI out of its box?

...That being said...

Instead of presenting the agent with a 2D plane of pixels, it should be
presented with a sphere of pixels, with its POV inside.

