OpenAI Universe (openai.com)
1021 points by sama on Dec 5, 2016 | 129 comments

Hey guys, it's Siraj. OpenAI asked me to make a promotional video for it on my YouTube channel and I gladly said yes! You can check it out here:


Nice video, but the jump from solving super simple 2D games, using feedback from binary win/lose conditions, to solving tasks in 3D open-world simulations will require an unimaginably gigantic leap in processing and knowledge. Additionally, neural nets have already shown they are not good enough at generalizing, and only work well at the specific tasks they were trained for. So the idea that an AI that can play GTA would also be able to 'solve' climate change is odd.

I'm far from an expert, but I thought the poor generalized performance of neural nets was largely associated with the complexity of the network (number of neurons, etc), and the training data.

Is there something more specific about the application of neural nets to generalized problems that makes them unsuitable?

Awesome Video! You just got yourself a new subscriber :)

very boring and predictable one. "whayyy, very cool, more, bigger, better, whayyyy".

Wow wow, Amazing!

Thank you! This was very helpful for me :)

Siraj, Ima go ahead and say that your videos have a bit of cognitive friction.

You and your personality are fine, but the jump between the intro and the payoff is jarring.

You need to hold our hand a little more ... jus' sayin'.

With this platform (and Gym) it seems like a large part of their strategy for "democratizing AI" is to grow the amateur research community. By making it easier for an individual to play around and conduct experiments, they are hoping to enable progress to emerge from anywhere instead of just from wealthy companies and elite universities.

It is also a great way to be able to track and organize what is being created rather than having to sort through amateur projects scattered across the web or research publications that often lack accompanying code.


Some key ways they're making it easier for amateurs:

* Starting point for problems to solve

* Way to get noticed (instead of needing a university/company brand)

* Technological infrastructure for building and testing. The diversity of tools they brought together to build this platform is very impressive.

Disclaimers: I cannot see the future. These are just my opinions. I really appreciate the work and money that SamA, Elon, and others have put into the OpenAI project. The Universe work in particular might help encourage young people, many of whom love video games, to study AI.

But I feel that contrarians, such as myself, have an ethical commitment to young people to voice our doubts and criticisms, so that they can avoid making a long journey down a career/research path that leads to a dead end. In that spirit, I think this project leads in a very unpromising direction. Here are some reasons:

1. Games aren't a good testbed for studying intelligence. In a game the main challenge is to map an input percept to an output action (am I drifting off the side of the road? Okay swerve right). The real challenge of intelligence is to find hidden abstractions and patterns in large quantities of mostly undifferentiated data (language, vision, and science all share this goal).

2. This platform is not going to help "democratize" AI. To succeed in one of these domains, contestants will need to use VAST amounts of computing power to simulate many games and to train their DL and/or RL algos. DeepMind and others with sufficient CPU/GPU power will almost certainly dominate in all of these settings.

3. Deep Learning, as it is practiced, isn't intellectually deep. With a few exceptions, there is nothing comparable to the great discoveries of physics, not even anything comparable to the big ideas of previous AI work (A*, belief propagation, VC theory, MaxEnt, boosting, etc). Progress in DL mostly comes from architecture hacking: tweak the network setup, run the training algo, and see if we get a better result. The apparent success of DL doesn't depend on any special scientific insight, but on the fact that DL algos can run on the GPU. That, combined with the fact that, except for the GPU, Moore's Law broke down roughly 10 years ago, means that relative to everything else, DL looks amazingly successful - because all other approaches to AI are frozen in time in terms of computing power.

1. Games are great, especially the closer they get to the real 3d worlds with all the basic visual transforms in play. You can generate training data cheaper that helps you bootstrap the AI.

2. Your argument, which boils down to "large organizations can accomplish more than individuals," is in general true. But then that isn't saying anything new. Still, I'd prefer the car company give me its blueprints than not. My factory (my CPU) can then at least build the car, albeit at a smaller scale. And soon my factory will be bigger/cheaper. Yeah, I know, I'd like to be wealthy like Google too.

3. This is your own value judgement. The insight that we can simplify AI architectures down to some addition and multiplication is a big idea - in my opinion. Transistors are getting cheaper and I believe they will continue to do so. DNNs are better suited to take advantage of this new computing power. Turns out, all those great discoveries were just some multiplications and adds the whole time ;)

> The insight that we simplify AI architecture to some addition and multiplication is a big idea - in my opinion

Multi-linear fitting is an insight? Perhaps its application to certain problems is. That might be what you mean.

What you mention in point #1 isn't the main issue with games, but with the "reactive" approach in AI. Reactive AI systems are necessarily going to be more limited than say, predictive AI systems (a new emerging class of AI models). [1]

Agreed, though, that the focus on games is a long-standing source of problems holding back AI. My colleague has referred to this as "ludic AI" in a recent blog post where he expands on your statement about the real challenge of intelligence. [2]

[1] Reactive vs Predictive AI (http://blog.piekniewski.info/2016/11/03/reactive-vs-predicti...)

[2] Learning Physics (http://blog.piekniewski.info/2016/11/30/learning-physics-is-...)

"Deep Learning, as it is practiced, isn't intellectually deep. With a few exceptions, there is nothing comparable to the great discoveries of physics ... Progress in DL mostly comes from architecture hacking: tweak the network setup, run the training algo, and see if we get a better result."

To be fair isn't this what physicists do all day at CERN too? Smash some particles together, analyse the numbers, try to find patterns, tweak a few things and try again?

I take the point to be that there aren't "deeper" fundamental principles at play in these models. Tremendous progress has come from simply tweaking the number of layers, or how they feed forward to each other (skipping layers, etc.), or by throwing more computing power or data at the same basic algorithm.

Where might we look for deeper principles? One idea is to consider what brains do and how they might be doing it. (I'm not saying we need to go down the rabbit hole of biological detail -- on the contrary I'm suggesting we look at known or even hypothesized principles of brain operation and import them into AI.)

Two ideas we have used in our work: prediction (over time), recurrent feedback (most brain regions have more feedback than feedforward inputs)

As a physicist judging from the outside, I share some of your feeling. Are there general laws governing "learning"? Theorems? Are there "deeper" things to learn as humans? The thing is, people in the field don't need heavy intuition or math. In some ways that's good (if you just want a result to utilize) and in others it's bad (if you are a curious person).

In a sense, I would say yes there are learning laws, but it's still early in codifying them.

Along one axis, you could compare: supervised, semi-supervised, self-supervised and unsupervised learning. Along another axis, consider that there are versions of each method that take into account temporal/dynamic data, versus others that require randomly shuffled static data.

In the current problems of visual perception, I think the field would benefit greatly from a shift of focus to multiscale interaction/dynamics rather than the (static) statistics it focuses on currently (for more on this, see my colleague's blog: [1]).

[1] Statistics and dynamics. (http://blog.piekniewski.info/2016/11/01/statistics-and-dynam...)

>In the current problems of visual perception, I think the field would benefit greatly from a shift of focus to multiscale interaction/dynamics rather than the (static) statistics it focuses on currently (for more on this, see my colleague's blog: [1]).

Your friend's blog has a lot of good insights that I've seen in the theoretical neuroscience and computational cognitive science literature as well. Where do you guys work?

I work at LeEco US out of San Diego, and my colleagues work at other ML/AI companies also in San Diego. We originally met and collaborated at Brain Corporation.

Are you thinking of laws along the lines of biological evolution or physical laws of motion, where very simple ideas can produce extremely complex emergent behavior?

If so I think a layman law of ML is already understood.

To draw an analogy, some programmers might use trial and error to produce programs: fiddle with a few lines of code, see if it passes more unit tests, repeat.

If you believe that a human brain can be represented by machine code, then given infinite time, that trial-and-error programmer can write down the "source code" of the brain.

Then machine learning is just a "Turing-complete programming language" (i.e., a neural network architecture) with "source code" (in the form of matrix weights), where "passing more unit tests" is done by numerically following a gradient to update the "source code".

Everything else is just finding a better "programming language" that can make this run very fast on our current machines.
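That gradient-following loop can be sketched concretely. Here is a toy illustration with a linear model standing in for the network; every name and number below is made up for the example:

```python
import numpy as np

# Toy "program": a linear model whose "source code" is the weight vector w.
# The "unit tests" are (input, target) pairs; the loss measures how badly we fail them.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w = np.zeros(3)  # initial "source code"
lr = 0.1
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(X)  # gradient of mean squared error
    w -= lr * grad                          # "edit the source" downhill

print(np.round(w, 3))  # approaches [2.0, -1.0, 0.5]
```

The same loop, with fancier "source code" (deep nets) and fancier "tests" (game scores, labels), is essentially what the training runs discussed above are doing.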

> Are there general laws governing "learning"?

This has been my question too...

Based on my understanding, it all really boils down to probability, statistics, and a few important theories like Vapnik-Chervonenkis (VC) theory, which provides a mathematical foundation for what "learning" is, whether we can even learn from the given data, and how well we can learn (VC dimension, etc.).

But I would love it if someone can point me to or explain / derive from core first principles the concept of "learning".
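One concrete first-principles result along those lines is the VC generalization bound, which quantifies "how well can we learn" in terms of the VC dimension $d$ and sample size $N$ (stated informally here; see Vapnik's work for the precise conditions and constants). With probability at least $1-\delta$ over the training sample,

$$
R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{d\left(\ln\frac{2N}{d} + 1\right) + \ln\frac{4}{\delta}}{N}}
$$

where $R(h)$ is the true error and $\hat{R}(h)$ the training error of hypothesis $h$. The bound tightens as $N$ grows and loosens as the hypothesis class gets richer (larger $d$) — one formal sense in which "learning" trades model capacity against data.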

This is what the experiment does. The theory they are looking for/validating/falsifying is much deeper and richer than just a multi-linear fit.

Part of me says that you are correct on all of these points. However I think you went too far on 3.

Experimentation does not always follow theory in science. I would argue that many of the great discoveries of physics in the 20th century followed directly from startling results of experiments conducted at the end of the 19th century (the photoelectric effect for example). I agree that there seems to be an awful lot of experimenting going on and that hardware and large datasets have helped tremendously but there are also many researchers poking and prodding at deep learning theory [1, 2].

So from a glass-half-full perspective we have rapid (sometimes iterative) experimentation coupled with yawning gaps in theory to explain surprising results. In other words, the opposite of previous AI booms and a big reason to be optimistic despite all the hype.

[1] https://arxiv.org/pdf/1410.3831v1.pdf

[2] https://arxiv.org/abs/1608.08225

You're forgetting about Mother Nature's miracle work. Wait till you see some unknown genius from some god-forsaken village far from civilization in an undeveloped country come up with a new breakthrough idea that will kick the asses of all the Silicon Valleys in the world.

1. Games are great for studying intelligent agent behavior and intelligence itself. An intelligent agent must find hidden abstractions and patterns in high-dimensional data. Language understanding, vision, decision science, solving long-temporal credit assignment, etc. are all present. Russell & Norvig (2003) considered goal-directed behavior the essence of intelligence. Also see possibly the first papers on AI and their focus on parlor games: https://en.wikipedia.org/wiki/Theory_of_Games_and_Economic_B... & http://www.loebner.net/Prizef/TuringArticle.html

2. This platform will provide a test bed for AI algos and help "democratize" AI. One does not need to set up their own platform. One can compare approaches. One can learn from other implementations running on common ground. Being resource constrained forces one to be more creative, and this paves the way for more energy-friendly methods. Sure, a high school student will not dominate powerhouses like DeepMind et al. But the high school student can get up and running in a few days or a week.

3. https://arxiv.org/abs/1608.08225 Physics and Deep Learning are well entwined. Deep Learning certainly is a big idea, up there with VC theory and boosting. It has existed for decades now; I agree the more recent incarnation was made possible by more computing power and bigger, better datasets, and relies less on new tricks. Yet tricks that have majorly contributed to better generalization are still being invented in recent years, Dropout being one. DL, and especially the relevant Deep Reinforcement Learning, is not just GPUs, but a lot of new (and budding) theory. One can run Random Forests (and other approaches) on CUDA too. The Neural Turing Machine heralded a whole new, intellectually deep and stimulating field in Deep Learning, and we haven't seen the best of it yet. There are also fields, like vision, where other approaches significantly underperform relative to DL. Try to train an SVM or RF on ImageNet. Also, one is not required to use DL for one's agent. Experiment with the classic approaches and see which is better (https://arxiv.org/abs/1603.04119).

I actually shared some of your concern, not for AI research, but for game development. I thought it was very hard to actually get a good job in that field. Then the mobile game market started booming, and indie developers could make a living. AI has got the backing of all the major players in industry. Instead of pipe dreams and philosophical meanderings, we now have working models that actually add business value. It is not going anywhere soon.

If all else fails, you remain a good coder or data analyst with a lot of automation skills.

Related but slightly off-topic, there is a great sci-fi story by Ted Chiang (the same author who wrote the story behind the film Arrival) about humans raising AIs in an artificial world. The premise is that if we want AIs to act like humans, we must teach them like we teach humans: http://subterraneanpress.com/magazine/fall_2010/fiction_the_...

The people who will end up raising AI's will not want them to act like humans. You already see it in their current uses. They create them to maximize profit. So, in a way, their owners (corporations) have created them in their image.

At Asteria we're using an Agent-System-Interface model, and are building models around observations of your own activity.

I 100% agree we need to teach them like humans, so they at least can build a model of how humans interact and participate with one another. At the least this will teach them about us, more than it will teach them about anything else. And if we want to participate and collaborate with that future of AI we need to have these models.

Read it on your suggestion. It was pretty good.

I'm getting a weekly dose of Ted Chiang by just farming links to free stories people post on HN! :D Thanks!

I'd love to see AI, using games, master the art of determining depth for objects in a scene. If you ask a person, "about how far away is that car?", they often give you an okay answer that is at least within an order of magnitude of the actual distance: 1 m, 10 m, 100 m, 1000 m. If AI could do that, you could navigate an environment in the real world better using only a camera or two. So you start with a virtual world that looks real, train up the bot, then use it to navigate in the real world. Has this already been accomplished?

That's a great (and hard) problem!

More generally, imagine AI that could learn the physics of the world. For example, if the ball is rolling away, the AI should be able to predict that the ball will look smaller on the next frame.

Going further, if the ball is about to roll under a shadow, the AI should predict that the ball will become a darker shade of green.

(After several years working in a robotics research company, these kinds of capabilities are exactly what we determined would be necessary for robot AI.)

Agree, it's not easy. Learning the basics, for example projecting a rectangle with 3d coordinates to 2d coordinates, then feeding the 2d coordinates into a NN and ask for the (depth) third dimension. Can you teach the NN a perspective transform? Can you rotate the rectangle and recognize rotation. Can you add other rectangles to the scene and detect each? Can you add color and lighting to infer more properties and get better results? Shine some more info on the problem ;)

These are like unit tests for AI (basic shapes and transforms), and I agree physical reckoning is at the top: one of the big capstone tests, and something beautiful to behold in nature (e.g. sports). Maybe a virtual soccer game at the end?

From my lidar experience, I wanted to reach for a model rather than deal with noisy sensor data. I want to generate the output (3d world) with my model, then the NN learns the inverse (eg. the scene graph used to generate the scene).

I enjoy thinking about this stuff, though it really makes my head spiral sometimes when I relate it to my own reality. It's easy to feel like you're losing touch.
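A minimal version of the rectangle "unit test" above can be set up synthetically. This sketch uses a hypothetical pinhole camera (focal length of 500 px is an assumed value) and a least-squares fit standing in for the NN, just to show that the depth signal is there to be learned:

```python
import numpy as np

# Pinhole camera: a 3-D point (X, Y, Z) projects to (f*X/Z, f*Y/Z).
f = 500.0  # hypothetical focal length in pixels

def project(points_3d):
    pts = np.asarray(points_3d, dtype=float)
    return f * pts[:, :2] / pts[:, 2:3]

# A 1 m-wide square, fronto-parallel to the camera at depth z.
def square_at(z):
    return np.array([[0, 0, z], [1, 0, z], [1, 1, z], [0, 1, z]])

# Synthetic "training data": projected width in pixels vs. true depth.
depths = np.linspace(2.0, 50.0, 25)
widths = np.array([project(square_at(z))[1, 0] - project(square_at(z))[0, 0]
                   for z in depths])

# Apparent width = f * W / Z, so depth is linear in 1/width; a least-squares
# fit on that feature recovers the projection law exactly.
a = np.linalg.lstsq(np.c_[1.0 / widths], depths, rcond=None)[0]
print(a)  # ~[500.0], i.e. f * W with W = 1 m
```

A real NN would have to discover the 1/width relationship itself, and cope with rotation, lighting, and unknown object sizes, which is where it gets hard.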

This is a trivially easy problem if you have stereo cameras, just like humans.
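For reference, the stereo geometry itself is a one-liner: depth falls out of the disparity between the two views. The focal length and baseline below are assumed values; the genuinely hard part in practice is computing the disparity, i.e. matching pixels between the two images:

```python
# Stereo depth: two cameras a baseline B apart see the same point at
# horizontal pixel positions differing by disparity d; then Z = f * B / d.
f = 700.0   # focal length in pixels (assumed)
B = 0.06    # baseline in meters, roughly human eye spacing

def depth_from_disparity(d_px):
    return f * B / d_px

print(depth_from_disparity(4.2))  # 10.0 (meters)
```

Note how accuracy degrades with distance: at large Z the disparity shrinks toward zero, so a fixed pixel-matching error translates into a large depth error, which is roughly the regime where humans also fall back on monocular cues.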

This is amazing! I was thinking of this problem when I saw a friend making a stop-motion video. The steps are super repetitive and I asked him, "maybe a DeepMind Atari-style RL agent can learn how to do this?" But I didn't want to do what DeepMind did to emulate Atari games with an Adobe editing tool. This is an experiment that I can now run.

Never saw how stop-motion videos are edited. What's so repetitive? I thought that once you've taken all your pictures, you just put them sequentially and removed any frames that seemed off. Maybe you also need to decide how much time to leave every frame?

This is true, but some are more complex, which is also due to the choice of tool. For example, [1], made by Chris King, using Adobe Premiere. This is a time-lapse of the process that he used to make parts of [2]. Notice the pattern that emerges when sequential images start lining up.

[1] https://youtu.be/M7Hr83OI-rs

[2] https://youtu.be/1-rFV_d6RH8

Ha! Fascinating - I had the exact same thought when I saw a friend of mine make stop motion videos!

This is astounding!

If requests are being taken, it would be useful to be able to search through the listed environments. And a poker environment for the internet section would be a good balance of fun, widely appreciable, and straightforward but very non-trivial.

you'll lose your job and will be replaced by AI. Astounding?

This is a bit out there, but it would be fun if OpenAI can get one of the mega popular multiplayer games under this (WoW, League of Legends, DOTA etc.).

Imagine an AI team in League of Legends world championship!

ICYMI, DeepMind partnered with Blizzard to do this for StarCraft II:


Facebook open sourced their library to connect Torch to Starcraft Brood War last week: https://github.com/TorchCraft/TorchCraft

Valve is already involved, I imagine we'll see Dota 2 support.

WoW I would be very surprised by. But all the MOBAs and esport FPSes and the like are fair game.

Mostly because, with the latter, improved AI means better competition and a deeper understanding (which boosts sales). With the former, improved AI means improved automation which means imbalanced economies.

This is perhaps my favorite use of Docker ever.

We've been pushing hard on some parts of Docker, and it's working pretty well. For example, reconfiguring iptables depending on what game you ask for. And it works fine to test things like that on my MacBook and then deploy to Kubernetes. Amazing.

I noticed the OpenAI team wrote their own VNC driver in Go for performance reasons[0].

I would love to hear more about how they were able to achieve increased performance over other VNC drivers.


We wrote it for somewhat subtle reasons. First, there aren't too many alternatives out there — VNC is meant for human consumption, not for bots after all :). Second, for a single connection, once you're using Tight encoding, the bottleneck becomes server-side encoding and libjpeg-turbo, neither of which will depend on your driver. As you scale to many connections, the important thing becomes managing the parallelism well. Go is great for this.

We'd started by adapting an existing Python driver in Twisted, implementing additional encodings and offloading to threads for calls into C libraries like zlib. We got this working reasonably on small environments like Atari, but for environments which generated many update rectangles, we started to be bitten by the GIL. I still believe that one could make Python work, but it'd take quite a lot of effort.

libvncserver is a fast C driver, but it's GPL, and doesn't have any particular support for parallelization. We wanted Universe to be usable by everyone, from hobbyists to companies, so GPL was a no-go. (We actually talked to the libvncserver maintainers, who said that they would be interested in dropping GPL restriction, but there have been far too many contributors over its long history to figure out how to do so.)

Our Go driver, based on https://github.com/mitchellh/go-vnc, has scaled quite well. It takes advantage of Go's lightweight thread model: each connection runs in its own goroutine, which makes it easy to run hundreds of connections in parallel without needing hundreds of threads.

http://reddit.com/r/WatchMachinesLearn is about to get a lot more popular. I can't wait. Also from the linked blog post, you can play with (against?) your agent in realtime:

>You can keep your own VNC connection open, and watch the agent play, or even use the keyboard and mouse alongside the agent in a human/agent co-op mode.

Interesting announcement timing at 10:30 PM PST on a Sunday. :P

The list of third-party gaming partners is extremely impressive, and a Docker config helps resolve the dependency hell that some of the AI packages require.

We wanted people at NIPS in Barcelona to have something nice to read over their morning coffee and such. [I work at OpenAI - @jackclarksf on Twitter]

Well, you just got a few extra followers.

What is state of the art in reinforcement learning right now?


Is there a way to deal with "sparse" training data (state, action, reward) triples -- sparse in "state"?

Looks like "UNREAL" (https://arxiv.org/abs/1611.05397), "Learning to reinforcement learn" (https://arxiv.org/abs/1611.05763) and "RL^2" (https://arxiv.org/abs/1611.02779) are the state of the art in pure RL for now.

Finally there is a trend of using recurrent neural network as a top component of the Q-network. Perhaps we will see even more sophisticated RNNs like DNC and Recurrent Entity Networks applied here. Also we'll see meta-reinforcement learning applied to a curriculum of environments.

The crazy thing is that these stacked model architectures are starting to become another layer of "lego blocks" so to speak.

That paper was 10 months ago. There have been many RL papers in the meantime, but sparsity is only a problem with respect to reward, not state or action, from what I can see.

You didn't answer my question. :(

I'll just go on a limb and consider this to be fucking awesome.

All the listed PC games environments are tagged as "coming-soon"


Endgame: I'd really like an AI agent for "in real life" tabletop games (like board games).

I call those friends.

Unfortunately, those "friends" have a lot of annoying issues that come with meatspace-produced wetware, and don't take (kindly to) pull requests.

There are hardcore boardgames you will find difficult to find human players willing to play with you. Campaign for North Africa takes 8-10 players and has an estimated playing time of 1000 hours [1]. An excerpt of a review written for this game:

> Are you a logistics major? Are you masochistic? Do you think that the calculations required to play a game should take longer than actually moving the units? Then do I have a game for you! Get yourself a copy of The Campaign for North Africa, and say goodbye to the family for a couple of months, if not years.

The Campaign for North Africa is the most detailed game that I have ever played. It isn't necessarily the most complicated, but for sheer size of the detail and planning involved, it is by far the most laborious and detail-oriented game that has ever been produced. As a first example, this is the only game that I know of that differentiates between British and German jerry cans for fuel. More about this later on.

The Campaign for North Africa is Richard Berg and SPI's simulation of the war in North Africa in the Second World War. The seven-foot-long mapsheet (divided into five sections), two sets of rulebooks, charts and tables galore and, oh yes, thousands of counters complete the game in a nice sturdy box, not the usual SPI flat game holder that falls apart. Most of this is standard SPI fare, with the functional but not pretty counters, standard three-column style SPI rulebooks, and a fairly attractive map that does an excellent job of creating an epic sense of scale. True, this is the desert, and most of it is desolate, but the numerous tracks and roads, the coastal plains and mountains, and the railroad (both already built and railroad you can build as the game goes on) all combine to present an appealing picture of the area.

Each turn is one week of time, and each turn is broken down several stages. There is an initiative determination, naval convoy stage, stores expenditure stage, and then three operations stages. The Ops Stages are where most of the activity occurs. There are also stages that are used in the air game. I did not play the Air Game for the purpose of this review, but did play with the advanced logistics.

The game also includes one of each type of chart, which can be used to make copies. I made my own in Excel. There are charts for Division and Brigade organization, truck convoy sheets, naval convoy sheets, prisoner sheets, broken-down and destroyed vehicle sheets, supply dump sheets, sheets for the air game and more. I even created a couple of my own for production and independent units. As each Division in the game needs its own Org chart, which fit best on legal-size paper, these are a lot of charts and sheets to keep track of. All of these must be filled out before the game even starts, and just setting up for the beginning of the game requires filling out hours (literally) of paperwork. And for heaven's sake, don't use pen! Much of what you write in the charts at the beginning of the game will be erased by the end of the first turn. After every movement, every combat, even just sitting there and doing nothing will require updating the org charts for every unit in the game.

[1] https://boardgamegeek.com/boardgame/4815/campaign-north-afri...

That sounds horrible. It also sounds like a game that should definitely be played on a computer, not as a board game.

I don't know about you, but I love playing Unreal Tournament by moving rocks around on the ground, similar to https://xkcd.com/505/

Tabletop Simulator would make an interesting training environment. I may not be able to train a good Go player, but a table-flipping sore loser is probably doable.

>other applications

Any applications with a keyboard and mouse? Can I use emacs and have it start learning to code?

Sure, just define a good score function...

That should be easy, I'll need a 5 million dollar grant and 5 years.

How much do you need to train a model that writes grant applications for 5 million dollars or more over 5 years?

Browser tasks seem to be a greenfield area with amazing potential.

What if an AI could do anything a human can do with a browser or over the phone?

Also love "bring your own Docker container format".

Well, if all GUI interactions can be automated, what would be our next human interface to computers/AIs?

Voice in one direction and voice and graphics in the other?

for some reason, the idea of autonomous bots crawling around the internet also unsettles me. i guess it really depends on what kind of rewards you train it for.

Which tasks do you think are the ones with more potential in this field?

I just hope no self-driving vehicle is applying anything learned in GTA.

I can't recall where, but I read that Tesla or Google were actually using GTA to train their self-driving cars, because it is a spectacularly advanced simulation of driving through an urban environment, so they didn't have to build their own.

There was an interesting academic research paper that showed you could train in GTA and transfer over to the KITTI dataset and do ok: https://arxiv.org/abs/1610.01983

That would be the Berkeley DeepDrive project. http://deepdrive.io/ and http://bdd.berkeley.edu/.

deepdrive.io creator here - I'm actually not affiliated with the Berkeley project of the same name. There's also a DeepDriving at Princeton plus plenty of other (mostly perception) projects using GTAV, so it can be confusing. I'm hoping the GTAV for self-driving car efforts can start to standardize around the Universe integration though. Having worked on it, I can say firsthand that the Universe architecture is definitely amenable to sending radar, lidar, controlling the camera, bounding boxes, segmentation, and other types of info that the various sub-fields of self-driving are interested in. Super-excited to see how people use it!

>spectacularly advanced driving

Gran Turismo is advanced, for a video game at least

GTA is Grand Theft Auto, not Gran Turismo.

What is spectacularly advanced about the police ramming me off the road as they chase another suspect?

Least fun GTA player ever.

Write once, run over pedestrians.

Layman question: isn't adjusting "hyperparameters" similar to writing an algorithm for playing a game, using human intelligence?

Related to the blog post: https://openai.com/blog/universe/

It depends how many hyperparameters there are. Many popular general-purpose ML algorithms have only a handful of numbers for hyperparameters, so they don't embody much human input. And they can sometimes be tuned automatically.

Also, an algorithm that can learn hundreds of different games with the same hyperparameters is more highly regarded than one that needs different hyperparameters for each.
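Automatic tuning can be as simple as random search: sample hyperparameter settings, train, keep the best. A toy sketch follows, where `evaluate` is a made-up stand-in for an actual training-and-scoring run:

```python
import random

# Stand-in for "train a model with these hyperparameters and score it";
# here the pretend sweet spot is lr ~ 0.1 and depth ~ 3.
def evaluate(lr, depth):
    return -((lr - 0.1) ** 2 + 0.01 * (depth - 3) ** 2)

random.seed(0)
best = None
for _ in range(100):
    params = {"lr": 10 ** random.uniform(-4, 0),   # log-uniform learning rate
              "depth": random.randint(1, 8)}
    score = evaluate(**params)
    if best is None or score > best[0]:
        best = (score, params)

print(best[1])  # the best-scoring hyperparameter setting found
```

No human insight goes into the loop itself, which is part of why only a handful of hyperparameters is considered a modest amount of human input.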

Unless I missed something it looks like the AI has to learn from screen pixels instead of getting game state data. I don't like that approach at all. I understand that it's easy to implement for OpenAI but I think having the game developers provide a real bot-capable API is much better. I hope the latter is what Blizzard will provide for their DeepMind collaboration.

If your goal is to build general AI, getting access to the game state information is cheating and ultimately self-defeating, because it does not generalize across games. Expert humans are able to pick up and play a brand new game with a high degree of initial success despite zero knowledge of the memory states within the game. We do this by relying on our vast knowledge of video game and literary tropes as well as experience playing past games. I have yet to see any video game bot make use of this stuff to figure out how to play Zelda, for example.

An AI which can learn any game from "looking" at a screen is a very ambitious goal. I doubt it is achievable in the near term. Humans can do it because of years of learning about the world. Personally, I prefer the more modest and achievable goal of teaching an AI to play complex real-time games such as Starcraft, LoL, and Dota--especially the latter 2 since they are team games.

Personally, I prefer the more modest and achievable goal of teaching an AI to play complex real-time games

Those sorts of goals are better suited to individual hobbyists. OpenAI is a blue sky research project set up by billionaires with the goal of improving all of mankind. I hope I'm not being too uncharitable when I say that your comment reminds me of those who scorned the Apollo program for its ambition.

I support research in AI of all kinds, however, its history is one of vastly over-promising and under-delivering. Modest but demonstrable progress is better, in my view, than yet more hype.

That's not their goal though. They want AGI.

Seems unlikely. The focus seems to be on improving AI through "vision". The idea is to make the AI learn skills the same way a human would (at least in the first years of life). Google's AlphaGo also learned from screen pixels.

So these would be human-like bots, rather than bot-like bots, like you normally have in games. The bot would simply learn by doing, until it masters the game, not by getting access to game algorithms.

> Google's AlphaGo also learned from screen pixels.

It did not. It received the state of the board as one array, another board state array for capture/komi (since Go does have global state which is not visible purely from the board representation), and a few additional features to help it out with stuff like ladders. It was architected with convolutional layers, but over the Go grid, not pixels. See the AlphaGo paper pg11 for the exact structure of the input: http://www.postype.com/files/2016/04/08/16/05/03384c91046e8e...

Could it have learned from pixels (augmented by the additional necessary global state)? Sure. But that would've been a waste of computation since the visual layout of a Go board is fixed and static, unlike Atari games.
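A back-of-the-envelope comparison shows why pixels would have been wasteful here (the frame resolution below is picked purely for illustration):

```python
# Direct Go board representation vs. raw RGB pixels of a rendered board.
board_values = 19 * 19          # one value per intersection: 361
pixel_values = 640 * 480 * 3    # RGB values in a modest frame: 921,600

# The pixel input is over three orders of magnitude larger,
# all to recover the same 361 underlying values.
print(pixel_values // board_values)
```

And unlike Atari, none of that extra input carries information the board array doesn't already have.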

> Google's AlphaGo also learned from screen pixels.

Source? That literally seems to make zero sense to me. Go can be represented in a super-simple state. Why make it spend millions of cycles learning to categorize pixels into that state you already have?

I would guess that they trained AlphaGo from many thousands of hours of match footage. Writing a computer vision script to segment / extract the data may cost cycles as you say, but would save many human hours by eliminating the need to re-watch the footage and literally type out state information for each move.

AlphaGo was actually trained directly on game state (plus some extra computed state like "how many liberties will I have if I play this move" or "will I win this ladder"). A huge number of pro games (and countless amateur games) are available on servers like KGS in a nice computer-digestible format.

Again, that doesn't make sense because the moves have almost certainly already been typed out by somebody. It's the same in chess. There are databases containing millions of games.

I'm not too familiar with Go, but in chess we have millions of text-format games and very little real-time footage.

Getting the game state data means deciding a priori what features the AI should learn on. The whole point of the deep learning paradigm is to allow a machine to learn such features itself, enabling good prediction, visualization, generation (a.k.a. hallucination), etc.

Instead, researchers have provided the raw feed input data to these agents with the hope that the learned features could be interpreted as game state data by humans.

I would say that it is a point of deep learning rather than "the whole point". For an AI to interact with the real world, building models from vision (as we do) makes a lot of sense. In the virtual world, however, it makes no sense. Model data is already available, and the AI has no need for something as inefficient as vision. We humans have to use vision (and sound, etc.) in games because we do not have access to direct data feeds; computers have no such limitation. Why cripple the AI by imposing human limitations on it?

If they want their AGI to be applicable to the real world, or to software with incomplete or insufficient APIs, they have to do it the way they are doing it here.

There isn't an API for me to check if I'm still on the footpath and not the road as I walk down the street.

I can't use an API to tell me water is boiling and that I shouldn't stick my hand in it.

If the goal is to have AI that is aware of the real world (even a chatbot), then using game state is a crutch that doesn't help us solve the real problem.

Being able to "infer" from what it learns and "apply" it to new scenarios in a general way is what intelligence is all about. I do not see how making it win one game, or one million, will move it toward achieving general intelligence of this sort.

Does OpenAI Universe communicate in any way with OpenAI remotely regarding activity in OpenAI Universe? Essentially, are there any call-home aspects to the code base? Or, is it possible to run this locally without any outside communication?

If there is remote communication, can you detail why and where it exists in code?

We don't call home. Once you've downloaded the Docker container, the only outbound network traffic should be downloading the requested SWF once for Flash games on demand (or for actually playing the game online, in the case of e.g. Slither). You can cache the SWF if you don't want it to be downloaded each time you start a new container.

Other than SWF downloading or specific Internet-enabled environments, running offline should just work.

I might be wrong but I think this was created mainly to monitor progress in AI research. If someone uses OpenAI Universe and can get better results than virtually everyone else, they will be able to get to them first.

Kind of like SETI for AI, if you think about it. I wonder if they have a protocol for what to do if they detect an advanced AI on their system?

From my initial reading, the end user can't create environments? Is that a feature that I can expect will eventually come?

It looks like the image from the server and the control information to the server are sent through the VNC protocol. Other information, such as the reward signal from the environment server, is sent over a WebSocket protocol using JSON:


You should be able to implement this protocol for your environment and run a VNC server for the rest. A new class for the client representing your environment can be based on this:


Then register the class with OpenAI Gym:


After creating the environment using gym.make you need to add information about your remote in the call to configure:

env = gym.make('gtav.SaneDriving-v0')




This is only based on a cursory reading, but it should be possible to use custom environments with OpenAI Universe as it is today.
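To illustrate the reward channel described above, a rewarder message arriving over the WebSocket might be decoded like this. The method name and body fields are my assumptions based on a cursory reading of the client code, not documented protocol:

```python
import json

# Hypothetical JSON payload from the environment's reward WebSocket;
# the "v0.env.reward" method name and body fields are illustrative.
raw = json.dumps({
    "method": "v0.env.reward",
    "body": {"reward": 1.0, "done": False},
})

msg = json.loads(raw)
if msg["method"] == "v0.env.reward":
    reward = msg["body"]["reward"]   # scalar reward for this step
    done = msg["body"]["done"]       # whether the episode has ended
    print(reward, done)
```

A custom environment server would emit messages of roughly this shape alongside its VNC stream, and the client would fold them into the `(observation, reward, done, info)` tuple that `env.step` returns.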

You can create environments - it's coming! We'll be releasing many components over next few months.

Brilliant :)

Does that mean we can't train it on new games? Only preexisting ones?

If that's true, I believe we have to wait for the OpenAI team to build new Gym environments before we can train on new games.

I only briefly poked around because it's nearing midnight here - maybe you can pull open the examples included and work out how to rewire them to work on new games, maybe not. Either way, I've got a particular use case I'd like to make a gym for, so I'm interested in finding out.

Too bad the iPhone doesn't support a VNC server. It would be nice to add some Android apps if they could get permission.

Is the user's AI responsible for parsing the screen pixels that come back, or does each game give you relevant events?

Some designers from Stripe absolutely helped with the design of this page.

Didn't we all agree to NOT let the AGI out of its box?

...That being said...

Instead of presenting the agent with a 2D plane of pixels, it should be presented with a sphere of pixels, with its POV inside.


Whoa. Please don't be nasty like that here, even when someone's video has flaws.

Actually many of us can empathize with the feeling of not getting detailed information—the real goods—when it's what you want and you know you're capable of absorbing it. But this is a bad way to express such a strong feeling on Hacker News. A good way might be to give someone the benefit of the doubt and explain what you'd really like without insulting them.

We detached this subthread from https://news.ycombinator.com/item?id=13104019 and marked it off-topic.

Making a technical video for a general audience is a real challenge. This video succeeds in that regard.

The audio cuts are likely because he had a deadline that didn't allow him to completely re-record the audio. I appreciate he inserted clips that added clarity despite knowing that he'd get negative comments for that effort.

Thanks for creating this video and sharing.

Agreed, I'd rather have the condensed information the video provided today than to wait for a more refined version.

You could have made these criticisms without the hugely inflammatory language.

Your comment comes off as needlessly abrasive; it reads as something intended to wound rather than edify.

I agree with you. You are most likely being downvoted by his fans. But this video (and all his others apparently) is empty of substance.

From my experience, HN users will downvote if a comment comes across as uncivil, as at least three commenters have already noted. How you say it matters on HN.

I bet you have some far superior videos for your own open source projects--why don't you post links to those babies so we can learn how to do it right?

I never made that argument.

I made the argument that nearly every single one of this guy's videos is not the least bit helpful in actually learning how machine learning works, and gets viewed based on click-bait titles and click-bait thumbnails (attractive girls a lot of the time). Which I will stand by. Every one of them is "how to write an AI that ___" when it's just importing TensorFlow, setting 2-3 hyperparameters, then letting it run.

