
Visual Doom AI Competition - nopakos
http://vizdoom.cs.put.edu.pl/competition-cig-2016
======
fdej
I'd like to see a single-player bot that can do human-level speedruns and/or
beat stuff like
[https://www.twitch.tv/blooditekrypto/v/30795033](https://www.twitch.tv/blooditekrypto/v/30795033)

Baby steps. Beating other bots in deathmatch is a good start. I love that they
only use the rocket launcher, giving the careless bot an equal chance of
blowing itself up.

Parsing what's on the screen in Doom is potentially a lot easier than in
modern games: since there is no texture filtering or anti-aliasing, and due to
the 2.5D perspective, most vertical runs of pixels on the screen map exactly
to (linearly?) scaled columns in wall textures or sprites. I would not be
surprised if you could come up with a fairly simple algorithm to determine the
exact position and orientation of the player and the objects on screen within
the map, without any real AI/learning involved.
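For instance, under the 2.5D projection a wall column's on-screen height is roughly inversely proportional to its distance from the camera, so a distance estimate falls out of similar triangles. A minimal sketch (the projection constant below is made up for illustration, not Doom's real value):

```python
# Toy sketch of the idea above: in Doom's 2.5D projection, the on-screen
# height of a wall column is (roughly) inversely proportional to its
# distance from the camera, so distance falls out of similar triangles.
# The projection constant below is hypothetical, not Doom's real value.

PROJECTION_CONSTANT = 4096.0  # focal length * wall height, made up for illustration

def column_distance(column_height_px: float) -> float:
    """Estimate distance to a wall slice from its on-screen pixel height."""
    if column_height_px <= 0:
        raise ValueError("column must be visible")
    return PROJECTION_CONSTANT / column_height_px

def column_height(distance: float) -> float:
    """Inverse mapping: on-screen height of a wall slice at a given distance."""
    return PROJECTION_CONSTANT / distance

# A wall twice as far away renders at half the height, and the two
# mappings are inverses of each other.
assert column_distance(column_height(100.0)) == 100.0
assert column_height(200.0) == column_height(100.0) / 2
```

Doing this per screen column over detected wall edges would give a crude depth map, which is the starting point for the kind of localization described above.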

~~~
hacker_9
How do you figure? The Doom maps are almost all completely dull-looking and
gray, they aren't lit realistically (so depth from shading becomes harder to
determine), and because there is no anti-aliasing, lines are far more jagged
(meaning the computer needs to be smarter in order to join disparate lines it
'thinks' are connected). There are barely any distinguishing features
separating wall boundaries from texture details either. I think you might be
underestimating the challenge here.

~~~
nothis
How is an anti-aliased line easier for a computer to read? It should be fairly
easy to find lines and use those to get vanishing points for orientation. The
distance of enemies should be more or less a mapping of screen height, since
you're mostly looking straight ahead. It sounds super doable, especially with
machine learning becoming a thing: you more or less delegate to the computer
the task of putting all those inputs together and making sense of them.

~~~
gene-h
Well, you have more unique features to do localization from.

------
Kristine1975
Related: [https://www.newscientist.com/article/2076552-google-
deepmind...](https://www.newscientist.com/article/2076552-google-deepmind-ai-
navigates-a-doom-like-3d-maze-just-by-looking/)

 _Google DeepMind AI navigates a Doom-like 3D maze just by looking_

Paper: [http://arxiv.org/abs/1602.01783](http://arxiv.org/abs/1602.01783)

------
Nr7
Google cache:
[http://webcache.googleusercontent.com/search?q=cache:bVb0ETV...](http://webcache.googleusercontent.com/search?q=cache:bVb0ETVF1p8J:vizdoom.cs.put.edu.pl/competition-
cig-2016+&cd=1&hl=en&ct=clnk&gl=us)

------
owenwealro
Interesting article. It poses the question of what I would theorize is the
correct way for an A.I. to learn to be more human: a virtual environment.
I.e., create a 3D game to teach it how to interact with the physical world,
in order to advance robotic A.I. and human interaction. Similar to how
Google's self-driving cars train in simulation and are then tested on the
road in real life.

On the point of visual processing, a stealth A.I. startup (not for long)
working in this space is Magic Pony Technology
([http://www.magicpony.technology/](http://www.magicpony.technology/)),
operating from London; I spotted them at London A.I., which previously hosted
Prediction IO and SwiftKey at its events.

Another caveat for this test is sound: human players will have audio, but the
A.I. will be purely visual, so it's at a slight disadvantage.

We are working on speech-to-text and text-to-speech technology for our A.I.
voice-enabled finance assistant, which is in beta at WealRo
([http://www.wealro.com](http://www.wealro.com)), with a view to enabling
visual face recognition at some point so the A.I. can gather information from
facial expressions. Always happy to get any thoughts on the potential
usefulness of such an integration.

~~~
chrisan
> Another caveat for this test is sound, human players will have audio but the
> A.I. will be purely visual so a slight disadvantage.

This project says it only uses the screen buffer, but perhaps (and this is
beyond my low-level knowledge) there is also an equivalent "sound buffer"
where it could tap into the left and right channels?
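Purely hypothetically, if an engine did expose stereo samples, a crude left/right bearing could be read off the level difference between the two channels. A sketch (everything here is made up for illustration):

```python
import math

# Hypothetical sketch: IF an engine exposed a stereo "sound buffer",
# a crude left/right bearing could be read off the level difference
# between the two channels. Nothing here is a real ViZDoom feature.

def rms(samples):
    """Root-mean-square level of one channel."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def pan_estimate(left, right):
    """Return a value in [-1, 1]: -1 = hard left, 0 = center, 1 = hard right."""
    l, r = rms(left), rms(right)
    if l + r == 0:
        return 0.0
    return (r - l) / (r + l)

# A rocket fired to the agent's right is louder in the right channel.
assert pan_estimate([0.1, -0.1, 0.1], [0.4, -0.4, 0.4]) > 0
```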

~~~
jcauchy
No, there is no sound buffer. It is an interesting extension to our project
but I think the visual information is significantly more important.

------
yorwba
The site seems to have moved to
[http://www.cs.put.poznan.pl/visualdoomai/competition-
cig-201...](http://www.cs.put.poznan.pl/visualdoomai/competition-
cig-2016.html)

------
logicrook
The question seems flawed: having an AI make decisions based only on visual
information conflates how you get the information (visually) with what
information the AI gets (only limited information, similar to what the player
has). These are two different problems that can be solved completely
independently. The first makes no sense for a game (think how computationally
intensive it would be), while the latter could be very interesting, since it
would push the AI design toward that of a natural player. The catch, however,
is that it's a "could": in itself, there is no reason to believe such AIs
would make the game better in any way (over "cheating" AIs).

~~~
argonaut
Your argument is very unclear, and your last sentence makes no sense. What
exactly is the problem with jointly learning perception and control? They are
widely considered to be intertwined problems in robotics. Not independent at
all. But for what it's worth, you will find researchers working on all sorts
of different approaches.

Clearly this competition is an attempt to build off successes in reinforcement
learning with agents that play games using only images and scores.
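That setup can be sketched in miniature with tabular Q-learning on a made-up corridor task, where the agent sees only a state and a score (a real entry would swap the table for a deep network over raw pixels, but the feedback loop has the same shape):

```python
import random

# Toy illustration of "learning from observations and score alone":
# tabular Q-learning on a made-up one-dimensional corridor with a
# reward at the right end. A real Doom entry would replace the table
# with a deep network over raw pixels.

random.seed(0)
N, GOAL = 6, 5                  # corridor cells 0..5, reward in cell 5
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.1

def greedy(s):
    """Best known action in state s, breaking ties randomly."""
    if Q[(s, -1)] == Q[(s, 1)]:
        return random.choice((-1, 1))
    return -1 if Q[(s, -1)] > Q[(s, 1)] else 1

for episode in range(200):
    s = 0
    while s != GOAL:
        a = random.choice((-1, 1)) if random.random() < eps else greedy(s)
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == GOAL else 0.0      # the score is the only feedback
        best_next = max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned values now prefer walking right toward the reward.
assert Q[(4, 1)] > Q[(4, -1)] and Q[(0, 1)] > 0.0
```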

~~~
logicallee
I don't know if your parent comment is correct, but their argument is really
easy to follow. I'll put what I think their argument is in different words.

"this contest is like making Google make Alphago have to also include a robot
and image recognition and making the robot have to place the stones."
obviously that has "nothing to do with" the game and is "how Alphago gets the
information."

But Go (and Tetris, etc.) are games of perfect information where perception
of the game state is not a challenge. Having access to the internal data
structure representing the Go or Tetris board is effectively the same as
scraping it off a screen and recognizing it; there's no real-world image
recognition involved.

If your parent comment is wrong, it's because that's not the kind of game
Doom is.

So what you consider "intertwined" really isn't; otherwise you'd have to say
Google hasn't even built a Go engine, since a human was doing the perception
for AlphaGo.

(again, I am just saying your parent's argument is easy to follow, not that
they're correct in this case.)

~~~
logicrook
Thank you for the excellent rephrasing.

> But Go (and tetris etc) are games of perfect information where perception of
> the game state is not a challenge.

In general, "perception of the game state" is not a challenge, at least
according to good game design principles (e.g. in danmaku shmups, perception
can be a challenge because of visual effects that are not really part of the
game, but this is seen as poor game design, similar to how being unable to
differentiate backgrounds from platforms in a run&jump is bad design). There
are games where perception of the game state is a game mechanic, but Doom
isn't really one of them.

But even in Doom, you can separate the two tasks quite neatly. The vision
task essentially aims to reconstruct a model of the world, but in a video
game this model comes for free. You can trivially limit the information an
agent gets to what it would get as a player (games like MGS already do this,
albeit in a very simplistic way). It's fairly easy to write a function that
computes what is visible, what sounds a player would hear, etc. You can then
rephrase the problem as: make an AI that can only access this function. This
wouldn't change anything.
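A minimal sketch of such a function, with all names made up for illustration: the engine keeps the full world state, but the agent only ever receives what falls in a player's field of view (a real version would also ray-cast against walls for occlusion, which is omitted here):

```python
import math
from dataclasses import dataclass

# Hypothetical sketch of the "observation function" described above:
# the engine knows the full world state, but the agent only sees what
# a player could perceive. All names here are made up for illustration.

@dataclass
class Entity:
    kind: str      # e.g. "imp", "medikit"
    x: float
    y: float

def observe(agent_x, agent_y, agent_angle, entities,
            fov=math.pi / 2, max_range=1000.0):
    """Return only entities inside the agent's field of view and range.

    A real version would also ray-cast against walls for occlusion;
    that step is omitted here.
    """
    visible = []
    for e in entities:
        dx, dy = e.x - agent_x, e.y - agent_y
        dist = math.hypot(dx, dy)
        if dist > max_range:
            continue
        # signed angle between the agent's facing and the entity
        rel = math.atan2(dy, dx) - agent_angle
        rel = math.atan2(math.sin(rel), math.cos(rel))  # wrap to [-pi, pi]
        if abs(rel) <= fov / 2:
            visible.append((e.kind, dist, rel))
    return visible

# An imp straight ahead is visible; one behind the agent is not.
world = [Entity("imp", 100.0, 0.0), Entity("imp", -100.0, 0.0)]
seen = observe(0.0, 0.0, 0.0, world)
assert [kind for kind, _, _ in seen] == ["imp"] and seen[0][1] == 100.0
```

An AI restricted to calling `observe` gets exactly the player-level information, without any pixel scraping.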

So for the AI community, I think a more interesting question would have been
to design an AI over such a function.

~~~
eutectic
If the contestants are using deep learning, I don't see why it should be any
more difficult to generate a _meaningful_, low-dimensional representation of
the game state from raw pixels than from an abstract view input.

------
Joof
This sounds fun. I hope they stream some of it on twitch as well.

------
deepnet
This looks great fun, I am working on a convnet to play Doom.

------
brudgers
[Content after page loading]

Motivation

Doom has been considered one of the most influential titles in the game
industry since it popularized the first-person shooter (FPS) genre and
pioneered immersive 3D graphics. Even though more than 20 years have passed
since Doom's release, the methods for developing AI bots have not improved
significantly in newer FPS productions. In particular, bots still have to
"cheat" by accessing the game's internal data, such as maps, locations of
objects, and positions of (player or non-player) characters. In contrast, a
human can play FPS games using a computer screen as the only source of
information. Can AI effectively play Doom using only raw visual input?

Goal

The participants of the Visual Doom AI competition are expected to submit a
controller (C++, Python, or Java) that plays Doom. The provided software
gives real-time access to the screen buffer as the only information the agent
can base its decisions on. The winner of the competition will be chosen in a
deathmatch tournament.

Machine Learning

Although the participants are allowed to use any technique to develop a
controller, the design and efficiency of the Visual Doom AI environment allow
and encourage participants to use machine learning methods such as deep
reinforcement learning.

Competition Tracks

1. Limited deathmatch on a known map.

The only available weapon is the Rocket Launcher, with which the agents
start. The agents can also gather Medikits and ammo.

2. Full deathmatch on an unknown map.

Different weapons and items are available. Two maps are provided for
training. The final evaluation will take place on three maps unknown to the
participants beforehand.

Important Dates

    
    
        31.05.2016: Warm-up deathmatch submission deadline
        15.08.2016: Final deathmatch submission deadline
        20-23.09.2016: Results announcement (CIG)
    

Contact

    
    
        For announcements and questions subscribe to vizdoom@googlegroups.com
        Bugs: Open a new GitHub ticket
    

Getting started

    
    
        Download (or compile) the ViZDoom environment.
        Follow the instructions.
    

What will the Deathmatch Look Like?

Your controller will fight against all other controllers for 10 minutes on a
single map. Each game will be repeated 12 times for track 1 and 4 times for
track 2, which involves three maps. The controllers will be ranked by the
number of frags.

    
    
        In the case of many submissions, we will introduce some eliminations.
    

Technical Information

    
    
        Each controller will be executed on a separate machine having a single CPU and GPU at its only disposal.
        The machine specification: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz + GTX 960 4GB
        Operating system: Windows or Ubuntu Linux 15.04
    

How to Submit my Entry?

To accept your submission we will need the following data:

    
    
        the name of the team
        team members and their affiliations
        a description (max. 2 pages, PDF) of the method used to create the controller
        a list of (sensible) software requirements for the agent to run (ask beforehand)
        a link to the source code of your controller and additional files (max. 1 GB in total)
        instructions on how to build and execute the controller
    

The form to submit the above data will be provided later.

    
    
                In the spirit of open science, all submissions will be published on this website after the competition is finished.
    

Organizers

Wojciech Jaśkowski, Michał Kempka, Marek Wydmuch, Jakub Toczek

------
banach
So they are trying to make it learn how to go on a killing spree. Doesn't
sound like such a great plan.

~~~
Kristine1975
Replace the sprites, and instead of slaughtering enemies with the chainsaw,
the AI heals wounded civilians with a medikit.

Doom is similar to the Lenna picture in image processing: Somewhat prurient,
but well-known. See for example psDooM, where you kill Linux processes:
[http://psdoom.sourceforge.net/](http://psdoom.sourceforge.net/)

P.S.: Who will teach the AI cheat codes such as idchoppers?

