Hacker News
Python plays Grand Theft Auto V (pythonprogramming.net)
416 points by adamnemecek 67 days ago | 74 comments

Wow, hello hacker news! I haven't seen pythonprogramming.net load this slow in a ... ever.

Thanks for sharing my silly project!

Any ideas, pull requests, or critiques are welcome. (https://github.com/sentdex/pygta5/)

I'm currently working on PID control, and also contemplating switching the model to a DQN, with a reward for speed (whether this is perceived, or read from the game directly) IF we're on a paved surface. If you know about either, don't be shy.
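Here is a minimal sketch of the reward idea above. It assumes that speed and a paved-surface flag are already available at each step; whether they come from the screen or from game memory is left open. The names `driving_reward` and `discounted_return` are hypothetical. The discounted return is the quantity a DQN's Q-values estimate in expectation.

```python
def driving_reward(speed_kmh, on_paved_surface, offroad_reward=0.0, speed_scale=100.0):
    """Reward speed only while the car is on a paved surface.

    offroad_reward is an assumption: 0.0 simply withholds reward off-road,
    while a negative value would actively penalize leaving the road.
    """
    if not on_paved_surface:
        return offroad_reward
    return speed_kmh / speed_scale  # keep rewards in a small numeric range


def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over an episode."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three steps on pavement at 50 km/h, then one step off-road.
episode = [driving_reward(50, True)] * 3 + [driving_reward(80, False)]
print(discounted_return(episode, gamma=0.5))  # 0.875
```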

If you hate long loading, you can watch just the videos on youtube: https://www.youtube.com/playlist?list=PLQVvvaa0QuDeETZEOy4Vd...

edit: added link to github, as well as links to just the videos since pythonprogramming.net is going slow from traffic.

I'm always curious when someone shows up with a super impressive project, but has never registered a user name on Hacker News.

Can I ask, "Where have you been all this time?" Do you have some other aggregator or blogroll that you follow? Do you try to avoid the time wasted surfing different articles and comments? Did you know about HN before and avoid it due to its reputation, etc.?

I've just never felt the urge to post on hacker news, but it's in my list of places I check frequently for news, I just lurk though. Maybe it's because I dunno how to hacker news, but I just look at the front page. Usually, by the time I am seeing something on HN, whatever I think or have to offer has already been said, usually more eloquently than I'd come up with.

I have more of a presence on reddit (https://www.reddit.com/user/sentdex/). The bar is lower there. ;)

For news, I have a bookmark dir with a bunch of sources in it, I right click and open all whenever I want to check in and see what the world is up to.

> whatever I think or have to offer has already been said

I've only been on HN for about six months now, but from what I have seen in this short time, I wouldn't let "fear" of saying something or posting something you think has already been done stop you.

Case in point: Your first posting. Look at the attention it has gotten. I enjoyed it, and others have enjoyed it. But guess what?

In my short time here I've seen other similar projects posted - and that's not anything against you or them! Ultimately, it's how we all learn from each other, and you shouldn't let prior work stop you from posting - there may be a nugget of new work or something else interesting about it that makes it worthwhile.

For instance, while your project isn't the first of its kind, I do think it may be one of the first to be done in OpenAI's GTA V universe system; even if not, it's one of the few that have been posted here - so it's relatively "fresh".

Just my observations, of course - and thank you for posting!

Although I don't have a super-impressive project, I didn't know Hacker News existed until a year or two ago, despite over a decade of reading about IT, INFOSEC, etc. I was mostly on blogs/forums where I saw knowledgeable people talking about my interests with low noise, especially Schneier's blog before the Snowden leaks, when the noise showed up.

I think Hacker News has a marketing problem. Many people here assume all the technical folks know about it and read it, whereas most I meet have never heard of it. So I just tell them about it. Now, we don't want to give it a bad rap by being zealots shouting about it everywhere. But if you bookmark excellent discussions, you might drop a link to one when it solves a problem or provides insight in a thread on another forum: "Hey, that exact thing was discussed on Hacker News with a lot of interesting ideas. Link below. You should probably check out the homepage, as it often has lots of neat tech, businesses, etc., sometimes with their inventors in the comments."

That's what I've been doing.

Personally, I very much prefer that HN is not publicized.

How about only doing it on forums or threads with potentially valuable members? That's what I do with Lobste.rs, given I'm responsible for who I invite.

He's been posting great Python tutorials on YouTube for years under the same channel name as his username - super impressive

On the first page, thanks for documenting what actually happens on a daily basis. Sometimes we don't know how to do things, we copy and paste something from Stack Overflow, there are a few typos, we do a little more searching, and we get something working that's good enough for the next step. Thanks for teaching others that it's OK to get stuck and that solutions don't come in a flash.

As the saying goes: "Make it work, make it right, make it fast."

Hi, I watched some of your videos at https://www.youtube.com/user/sentdex and I found them helpful. Keep up the good work :)

Your videos are great/inspirational. Keep it up!!

I'm currently enrolled in Udacity's Self-Driving Car Engineer Nanodegree; I just skimmed your article, but it seems like you've implemented a form of lane-line tracking to control steering.

In the first term of Udacity's course, we had to implement something similar as part of a project - but it was a later project that was most interesting.

We were given this tool:


...though at the time it was closed source and only had a couple of tracks. Anyhow, the goal was to extract frames from the simulator (it provided for this) and, using Python, OpenCV, TensorFlow, and Keras, develop a neural network to guide the vehicle around the track. For the project, speed was fixed in code, and we only had to worry about steering (though the simulator also output accelerator and braking values, so we could use them if we wanted to).

The simulator had three "cameras" - left, right, and center. You could drive it around the course (using a keyboard, mouse, or joystick/pad), and it would dump these frames to a directory, as well as a labeled dataset of the frame filenames and the various datapoints.

So you had a labeled dataset for training a neural network: input a frame, and train it to output the proper values for that frame (steering, brake, throttle). The left and right frames (with other adjustments) could be used to simulate drifting near the left or right side and counter-steering away (to prevent going off the road). We also augmented the data by adding fake shadows, shifting or skewing the images to simulate turns or whatnot, altering the brightness of the images to simulate glare, etc. So you could get a very large dataset to play with, more than enough to split into training, test, and validation sets.
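Here is a rough NumPy sketch of the augmentations described above. It assumes frames are H x W x 3 `uint8` arrays and that positive steering means a right turn. The side-camera correction of 0.2 and all function names are illustrative assumptions, not values from the course.

```python
import numpy as np

SIDE_CAMERA_CORRECTION = 0.2  # assumed steering offset for the side cameras


def side_camera_samples(center, left, right, steering, correction=SIDE_CAMERA_CORRECTION):
    """Treat left/right frames as if the car had drifted, labeled with a counter-steer."""
    return [
        (center, steering),
        (left, steering + correction),   # drifted left -> steer more to the right
        (right, steering - correction),  # drifted right -> steer more to the left
    ]


def flip_horizontally(image, steering):
    """Mirror the frame; a left curve becomes a right curve, so negate steering."""
    return image[:, ::-1], -steering


def adjust_brightness(image, factor):
    """Scale brightness (factor > 1 imitates glare, < 1 imitates dusk)."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)


def add_vertical_shadow(image, x_start, x_end, darkness=0.5):
    """Darken a vertical band of columns to fake a shadow."""
    out = image.astype(np.float32)  # astype returns a copy, so the input is untouched
    out[:, x_start:x_end] *= darkness
    return out.astype(np.uint8)
```

Flipping also helps with balance: every left-hand curve in the recording becomes a right-hand one.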

For myself (and more than a few of the other students), rather than try to implement a custom network to handle the task, we instead implemented a version of NVidia's End-to-End CNN:


Some of the architecture details you have to make educated guesses about, plus we had to scale our images down to reduce the number of input parameters to the network (but if you have the memory, go for it!). I ended up using my GTX 750 Ti SC for TensorFlow with CUDA (running everything under Ubuntu 14.04). It has roughly the same power as NVidia's embedded Jetson PX car computer platform (which is better for running a model than for training one, but I don't have such a rig yet).
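To see why scaling the input down matters, this small pure-Python sketch walks the layer shapes of the NVIDIA end-to-end architecture as described in NVIDIA's paper. That design has five unpadded convolutions: three 5x5 with stride 2 and 24/36/48 filters, then two 3x3 with 64 filters. These are followed by fully connected layers of 100, 50, 10, and 1 units. Normalization and dropout are ignored (they add no weights here), and the helper names are hypothetical.

```python
CONV_LAYERS = [  # (kernel, stride, filters)
    (5, 2, 24), (5, 2, 36), (5, 2, 48), (3, 1, 64), (3, 1, 64),
]
DENSE_LAYERS = [100, 50, 10, 1]


def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def count_params(height, width, channels=3):
    """Return (flattened conv features, total trainable weights + biases)."""
    params = 0
    for kernel, stride, filters in CONV_LAYERS:
        params += kernel * kernel * channels * filters + filters
        height, width, channels = conv_out(height, kernel, stride), conv_out(width, kernel, stride), filters
    features = height * width * channels
    inputs = features
    for units in DENSE_LAYERS:
        params += inputs * units + units
        inputs = units
    return features, params


print(count_params(66, 200))   # (1152, 252219)     NVIDIA's 66x200 input
print(count_params(120, 160))  # (6656, 802619)     640x480 scaled to 160x120
print(count_params(480, 640))  # (247616, 24898619) full-size frame
```

The convolution weights stay the same regardless of input size. It is the first fully connected layer that grows with the image, so downscaling keeps the model manageable.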

It worked really well. I'd imagine that if you did the same in GTA V, which is more realistic, the resulting model could be used as a foundation (transfer learning) to further train a vehicle for real-world driving. I suspect that Udacity's simulator has three cameras because the self-driving vehicle they developed has the same setup (I'm not sure how it is going to work, but the final term project is supposed to be related to the real vehicle in some manner; they claim our code will be run on the vehicle, but we'll see how that goes).

So...you might want to give that a shot! One piece of advice: for Udacity's simulator, I found that using a steering wheel controller (with pedals) was more natural and generated better data than a keyboard/mouse/joystick. Also, while implementing this kind of system sounds simple, in reality you might find yourself frustrated by certain aspects. Make sure your dataset is "balanced": in one of my runs, for instance, my model was biased toward left-hand turns and couldn't cope with a right-hand turn. But these issues can lead to interesting and humorous results:

I had one model which got confused, and would turn off the road in the simulator and drive on a dirt area; this dirt area was defined, but "edges" consisted of berms and warning signs; other than that it looked the same as the ground. The dirt area wound around then eventually led back out to the track.

My model, driving the car, would invariably go off-roading on this section, but despite never having been trained on this area, it drove it almost perfectly! It would follow the dirt around and get back on the track, and when it reached the dirt entrance again, it would do it all over. It did this several times, and it was fun to watch. Somehow, the CNN model had learned enough from on-road driving to handle a simple case of off-road driving as well...
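One simple way to act on the "balanced dataset" advice is undersampling: bucket samples by steering direction and keep the same number from each bucket. This is a minimal sketch that assumes samples are `(frame, steering)` pairs with negative steering meaning left. The 0.1 dead-zone threshold and the function names are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def direction(steering, threshold=0.1):
    """Bucket a steering value into left / straight / right."""
    if steering < -threshold:
        return "left"
    if steering > threshold:
        return "right"
    return "straight"


def balance(samples, threshold=0.1, seed=0):
    """Undersample so every direction present keeps as many samples as the rarest one.

    A direction with zero samples cannot be fixed this way; it needs more
    recorded data or mirrored (flipped) frames.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[direction(sample[1], threshold)].append(sample)
    keep = min(len(group) for group in groups.values())
    rng = random.Random(seed)
    balanced = [s for group in groups.values() for s in rng.sample(group, keep)]
    rng.shuffle(balanced)
    return balanced


data = [(None, -0.5)] * 5 + [(None, 0.4)] * 2 + [(None, 0.0)] * 10
print(Counter(direction(s) for _, s in balance(data)))
```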

So you might want to give this option a shot in the future. I personally have waiting in the wings a machine that I planned to build for running GTA V, based around a GTX 970 - it was going to be a gaming machine, but now I am seriously considering making it my next workstation for machine learning (and perhaps upgrading to a 1080).

You can just re-structure your post and make it a blog article :)

Here's a faster way to get a frame using PyGTK:


Takes around 5ms per frame for me, rather than 50.
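To compare capture methods like this on your own machine, a tiny timing harness helps. This sketch assumes `grab` is any zero-argument capture function, for example a wrapper around the PyGTK approach or the original screen grab. It reports milliseconds per frame; 5 ms versus 50 ms per frame works out to 200 versus 20 frames per second.

```python
import time


def ms_per_frame(grab, n_frames=100):
    """Average wall-clock milliseconds per call to grab()."""
    start = time.perf_counter()
    for _ in range(n_frames):
        grab()
    return (time.perf_counter() - start) * 1000.0 / n_frames


def frames_per_second(ms):
    return 1000.0 / ms


print(frames_per_second(5), frames_per_second(50))  # 200.0 20.0
```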

Slightly shorter code (still using the same method): https://askubuntu.com/a/400384/73044

Thanks for sharing this, will definitely look into it. With the ever improving FPS, this AI's reflexes are going to be insane.

No problem, glad to have helped!

Thank you so much for sharing this! I'm working on something similar right now (a framework to build game agents using computer vision tools) and I just went from capturing at 10-15fps with ImageGrabber, to 15-25fps with FFMpeg and finally 70-80 fps with this technique. Insane!

You're quite welcome! I used it to capture a single pixel for my Super Hexagon LED strip (https://www.stavros.io/posts/wifi-enabled-rgb-led-strip-cont...) and it works really well.

Really nice, and I can see why you needed the performance for this project. Funny that you mention Super Hexagon, because it's the game I'm using for my POCs!

Oh cool, what are you doing with it? I was thinking of doing the same, I'm really interested in reading about it.

Is there a cross platform way to do this? I'd like to be able to do this on a Mac, but would prefer not to have to set up GTK (if it even works).

Self-playing games ought to be the newer generation's Logo turtle!

There's something magical about creating a game AI. It's like you're a god and you built a divine being in your divine game world.

Making an artificial player for a simulated world we'd built was part of an assignment in Dr. Robert Harper's 15-212 class when I took it at CMU. On each tick of time, each character would randomly do one of the things it could do. Seeing these characters randomly pick up change, put it into vending machines, get out guns, ammo, and more of themselves, and then collectively gun me down was a thrilling illustration of an AI control problem. I felt like a god being killed by his creations. Mindchildren?
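The rule "on each tick, every character randomly does one of the things it can do" is easy to sketch. Everything here is a hypothetical toy: the actions, the coin/gun state, and the function names. It is not the original assignment.

```python
import random


def available_actions(agent):
    """What this agent can do right now, given its state."""
    actions = ["wander", "pick_up_change"]
    if agent["coins"] > 0:
        actions.append("use_vending_machine")
    if agent["guns"] > 0:
        actions.append("shoot")
    return actions


def apply_action(agent, action):
    """Update state; 'wander' and 'shoot' have no effect in this toy."""
    if action == "pick_up_change":
        agent["coins"] += 1
    elif action == "use_vending_machine":  # trade a coin for a gun
        agent["coins"] -= 1
        agent["guns"] += 1


def simulate(n_agents, ticks, seed=0):
    rng = random.Random(seed)
    agents = [{"coins": 0, "guns": 0} for _ in range(n_agents)]
    log = []
    for tick in range(ticks):
        for i, agent in enumerate(agents):
            action = rng.choice(available_actions(agent))  # uniformly random legal move
            apply_action(agent, action)
            log.append((tick, i, action))
    return agents, log
```

Because each choice is drawn only from the currently legal actions, no agent can ever spend a coin it doesn't have.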

By the way, if anyone knows, I would LOVE to track down the origins of this idea. A dozen years later, when I first thought to ask Dr. Harper, he did not remember the source; it was probably one of his TAs circa Fall 1992.

It probably goes back further than that; one could argue that things like Conway's LIFE implement certain aspects of the idea, as well as military "tabletop" game simulations (which go back a long ways). Other kinds of fantasy RPGs are similar, and then you have older video games on systems in the 1970s that could be considered.

But I think military and other more "physical" simulations likely pre-date all of this; these systems have their own rule systems for the various "pieces" on the "board", and use things like dice and other random number generators to determine how things progress.

There's also probably more than a few fictional stories which set up this idea as well.

I wish GTA V had the GTA IV vehicle physics engine settings.

Driving in GTA IV feels so realistic, with so much attention to detail, from pedestrians who really react to small things like the misfiring of a half-broken car engine, and much more.

(it's the same engine (RAGE with bullet physics), just with dumbed down settings for more noob-friendly/casual gameplay)

And not only the vehicle physics but also the damage model has been dumbed down; the GTA IV one was superb. (Only the completely different and far more resource-intensive damage models in Rigs of Rods/BeamNG.drive are even more realistic.)

They had to decrease the simulation level for GTA V because the PS3/Xbox 360 weren't capable of offering the richness of GTA IV on a bigger scale. Sadly, they never added the details back in the remastered PS4/Xbox One/PC release.

So to get the most detailed simulation one has to use GTA IV and install mods to drive around on the GTA V map. See YouTube videos, it's possible and a lot of fun.

It's realistic but it's so annoying to play, you have to slow down A LOT to make a turn, compared to III/VC/SA where driving was actually fun.

IV was such a disappointment to me that I haven't even tried V yet. SA felt so expansive and open to me that when I played IV, it felt like I was trapped on story-telling rails.

Have they reversed that trend in V?

I'm not a gamer. I've only barely played any of GTA, but what I have I liked.

When GTA V came out, I couldn't believe the level of detail and the size of the world, and how much could be done within it (especially in "sandbox" mode). I was so stoked about it, I plunked the money down to build a dedicated GTA V Windows box (and I haven't used 'doze at home in well over 15 years). It was the first and only game to get me to do this.

Now - I'll level with ya here - I never put that box together; I still have all the pieces (it was going to be a Mini-ITX with 16 GB, GTX 970, etc stuffed in a ThermalTake Core V21) - I just never got around to it (I even bought the game on Steam).

Now I'm thinking about using all of that for a new ML workstation to replace my current aging system (but I can still use it for GTA V I guess).

I know this isn't a ringing endorsement, but if a game can make someone like me that hyped about it - to drop money into parts for a complete new system just to play the game, from somebody who isn't a gamer, nor a Windows user (or fan) - well, that game has to be something special, ya know?

The only other game that moves me like this is currently near vaporware - Cyberpunk 2077 (and I haven't heard anything recent on it, since I've been heads down in my ML course).

You should play V. I don't know if you'll enjoy it, but it's the biggest, most loaded world they've ever done. The controls are way better. The characters and dialogue are super fun, too, even when missions make it pretty linear. Hell, even my mom liked watching me play for the characters. You also get some replay value on missions from the ability to switch characters in real time: you can try different ways of doing a mission with different people. The world itself obviously has piles of stuff to do, as shown in the trailer, which I luckily still have bookmarked because it was put together so well.


My problem with it was that, while the single-player was great (lots of people and things I ran into, lots of fun random things), online was broken for that first month, so I couldn't play that. A glitch also made my cars disappear from garages. I eventually just kind of got done with it after a month or two. It was a great experience, but only for a month or two, given I wasn't going to go hunting, endlessly customize my character, play golf, etc. Might as well be on The Witcher or Skyrim for that kind of thing. People who like setting up crazy stunts, chain explosions, etc. still had more fun, but I just watch the best ones on YouTube to skip the work part of it. :) So you should get at least a month of enjoyment out of it, and it's pretty cheap by now. Trevor was worth it by himself, since I'm pretty sure the hypothesis is true that he's the first character modeled after the way most GTA players play GTA.

A question for the author of this, or other authors who create step-by-step guides like this: do you write it as you go along, or fiddle around and then go back and write the blogpost?

Even with small commits, I usually have a hard time reconstructing the history of what I've done well enough to write it up like this.

Hey there, author here. It's just me who does the entire site of pythonprogramming.net and the youtube channel.

The way I have gone about things has changed over time, but I have found the worst way to do documentation is to write 100% of the code, the full series in this case, then go back and document/write a tutorial on it.

I used to just do the videos, but had lots of requests to do write-ups too, so I started that and really hated it at first, because I was just timing it all wrong.

So now, I usually will do maybe 1-5 videos, and then make sure to do the write-ups on them before continuing...otherwise the write-ups suffer significantly. I also try to do the write-ups in the same day, and before I personally progress on to the next topics.

It's just hard, because the last thing I want to do after I've made something that I think is cool is document it.

Other times I will actually work locally, and just either save my scripts in a step-by-step manner, or just work in an ipython notebook to save the steps I took in development, and then film the video, and then go back to the notebook to do the documentation.

Not sure that really helps much, it's an "it depends," but the main thing is to not get too far ahead of any accompanying documentation.

Not the author, but usually when writing step-by-step stuff like this I alternate between short bursts of messing around and then going back to write down what I just learned.

So I'll fiddle for 10-15 minutes, occasionally pausing to write small notes and important details down (like the CLI commands I used to achieve some particular result), but only after I achieve some significant amount of progress will I go back and write about what I just did in more verbose language.

Would it be possible to read the game's framebuffer directly to get rid of the screen grab step? I'm thinking that might be faster.

You can also sometimes grab the individually drawn elements from the graphics stack as well - it's likely to be less obfuscated than the game's memory layout.


With source code, obviously yes, though it would need an interface between Python and C++ (likely what GTA V uses), plus internal screen resizing. And since most of the time is spent in OpenCV anyway, it might not give you the speedup you're hoping for.

You should look into method swizzling/hooking: hook into a C++ method that is called on every frame, and add a line of code to save the frame buffer to disk. You can do this either at runtime via some kind of code injection, or at compile time if you can manage to re-sign the game.

I don't know much about C++ reverse engineering, but I've done this when reversing iOS apps (this is what "tweaking" means in the jailbreak community). The same concepts should apply.

The first step is obtaining a class dump (what it's called with iOS) that shows you all the header files and gives you the information necessary to understand what's going on in a debugger so you can determine which method to hook.

Edit: this should point you in the right direction - https://rafalcieslak.wordpress.com/2013/04/02/dynamic-linker...
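Binary hooking in C++ or Objective-C is beyond a short snippet. The underlying idea, though, can be illustrated with Python monkey-patching: wrap a method that runs every frame so that it also hands you the frame. This is only an analogy, not a way to hook GTA V. `Renderer.present` is a hypothetical stand-in for the game's per-frame call.

```python
import functools


class Renderer:
    """Hypothetical stand-in for a game's rendering object."""

    def __init__(self):
        self.frame_number = 0

    def present(self):
        self.frame_number += 1
        return f"framebuffer #{self.frame_number}"


def install_hook(cls, method_name, callback):
    """Replace cls.method_name with a wrapper that passes each result to callback."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def hooked(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        callback(result)  # e.g. save the frame to disk or push it onto a queue
        return result

    setattr(cls, method_name, hooked)
    return original  # keep this to uninstall the hook later


captured = []
install_hook(Renderer, "present", captured.append)
game = Renderer()
game.present()
game.present()
print(captured)  # ['framebuffer #1', 'framebuffer #2']
```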

ReShade https://reshade.me is an open source DX9/DX11/OGL hook that adds shader effects… but also includes screenshot functionality! Should be possible to change that to streaming frames to your app…

(Just in general, doing this shouldn't depend on a particular game, it's a graphics API level thing)

This being such a popular multiplayer game, I'm sure the memory offsets of the position in the map/acceleration/etc are documented and you can just hook the process and read/write these

Or, because it is a popular multiplayer game (and thus prone to cheating), maybe some addresses are randomized & obfuscated.

Just google "gta 5 trainer". I'm sure there are dozens, but it's possible that none of them is open source.

Obviously with source code, I was thinking of something like RenderDoc but in the form of a Python library.

DXtory does this for games http://exkode.com/dxtory-features-en.html

You could also directly intercept/inject graphics calls, as ENB and SweetFX do; ENB already supports GTA 5.

This is really cool. I love how those "stupid, time-wasting" games turn out to be, with help from some smart individual, great learning tools.

This is also relevant - https://github.com/ai-tor/DeepGTAV (A plugin for GTAV that transforms it into a vision-based self-driving car research environment.)

Anyone know of a similar game that you can run under Linux?

I want to play with this, but I'd rather not spend the time to set up a VM and go through the whole pass-through dance, or set up a dev env on Windows.

Check out openai. They have a lot of Atari games (https://gym.openai.com/envs#atari) and a car racing game (https://gym.openai.com/envs/CarRacing-v0).

For the Atari/car racing games you get direct pixel output of the game so using a neural network for image processing is required. But other games have a much simpler environment where you can focus on reinforcement learning.
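When an environment hands you raw pixels, a common first step before any network is to shrink the input. Convert to grayscale, scale to [0, 1], and stack a few recent frames so that motion is visible. This NumPy sketch assumes H x W x 3 `uint8` RGB frames. The BT.601 luma weights and the stack depth of 4 are conventional choices, not requirements of any particular environment.

```python
import numpy as np

LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # ITU-R BT.601


def to_gray(frame):
    """H x W x 3 uint8 RGB -> H x W float32 in [0, 1]."""
    return (frame.astype(np.float32) @ LUMA_WEIGHTS) / 255.0


def stack_frames(frames, depth=4):
    """Stack the last `depth` grayscale frames into H x W x depth.

    If fewer frames are available (e.g. right after a reset), the oldest
    one is repeated to pad the stack.
    """
    frames = list(frames)[-depth:]
    while len(frames) < depth:
        frames.insert(0, frames[0])
    return np.stack(frames, axis=-1)
```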

There's even classic Doom: https://gym.openai.com/envs#doom

Thank you.

A couple of options:

Trigger Rally: This is available as a nodejs/javascript version (arguably the most advanced):


There is also a native Linux version (check your repo - also here: http://trigger-rally.sourceforge.net/).

There are other racing game simulators available for Linux - TORCS (The Open Racing Car Simulator) is another:


Udacity, as part of the Self-Driving Car Engineer Nanodegree, also open-sourced their driving simulator (I posted this link elsewhere):


I used it in the first term to implement a version of NVidia's End-to-End CNN (https://images.nvidia.com/content/tegra/automotive/images/20...). Something I ran into with the simulator, which may apply to other simulators (though I've played Trigger Rally before with keyboard only and no problems) is that joysticks were really touchy (major fast acceleration on movement of sticks, and no way to really adjust it easily under Linux - which was kinda a disappointment to see - I could do some adjustment, but nothing like what is available on Windows); I found that using a steering wheel controller worked best.

Look into the Dolphin emulator for running GameCube and Wii games. It's very well maintained, with a smart community behind it, and I'm pretty sure it runs on Linux. I've used it on my Mac for playing Mario Kart, and it was great. I actually found it through a post on here where someone was doing AI training on Mario Kart through Dolphin.

You could use SuperTuxKart. I'm setting this up, actually.

I was thinking of Euro Truck simulator. Or SuperTuxKart.

Liftoff (quadcopter flying sim) would be interesting.

Now this is a great way to get people interested in coding. I'd have loved to have something like this when I was learning!

Great tutorial, watched the first few when he did them a week or so ago. I was one of the people suggesting he use PID controllers. While this is a ways off, it is fun to think of a project like this evolving into an open source self driving car package. pip install jalopy, or something like that.

Any chance you know how to implement PID control? I'm still throwing errors. Last night I was looking through a pull request that threw a breaking error and realized it was due to two uses of D (one for the key, one for the D in PID). Still having issues though after fixing that.

If you have any ideas: https://github.com/Sentdex/pygta5/pull/3

Yes, I do. Also, I was the guy who suggested using two PI controllers; you can probably get by without the D term. I will look at it later tonight.
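For reference, a textbook discrete PID controller is only a few lines. This is a generic sketch, not the code in the pull request; setting `kd=0` gives the PI controller suggested above. The gains are named `kp`/`ki`/`kd` so they can't collide with a key constant such as `D`, which was the clash described earlier in the thread. Mapping the clamped output in [-1, 1] to steering keys is left to the caller.

```python
class PID:
    """Discrete PID controller: u = kp*e + ki*sum(e*dt) + kd*de/dt."""

    def __init__(self, kp, ki=0.0, kd=0.0, output_limits=(-1.0, 1.0)):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.low, self.high = output_limits
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        # No derivative on the first call, since there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp only the output; this sketch has no integral anti-windup.
        return max(self.low, min(self.high, output))


# Example: steer toward the lane centre (0.0) from an offset of -0.3.
steering = PID(kp=0.8, ki=0.1)  # PI control, kd defaults to 0
print(steering.update(setpoint=0.0, measurement=-0.3, dt=0.1))
```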

>pip install jalopy


The tutorial is great, and the other website resources seem really interesting. Glad it was shared here!

I would love to have a public API for The Witcher 3 or GTA V. Imagine all the crazy things you could do.

Part of my idea for this course was that you could use these methods on any game. You don't need anyone to make an API, you read the frames in, send direct keys, and you're all set.

That said, there are various things like scripthookpy that let you communicate with GTA V to do things other than AI with Python.

I am personally more curious about the AI aspect, so I figure any game that I can play visually, an AI can too with these methods.
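The approach described here boils down to a perception-action loop: grab a frame, decide, send keys, and repeat. Below is a minimal, testable sketch in which the capture and key-sending steps are injected as callables. In the real project those would be the screen grab and direct key presses; here they are fakes. The toy policy, which steers toward the brighter half of the frame, is purely illustrative.

```python
import numpy as np


def toy_policy(frame):
    """Hypothetical policy: accelerate, and steer toward the brighter half of the frame."""
    half = frame.shape[1] // 2
    left, right = frame[:, :half].mean(), frame[:, half:].mean()
    if left > right:
        return ["W", "A"]
    if right > left:
        return ["W", "D"]
    return ["W"]


def run_agent(grab_frame, policy, send_keys, steps):
    """Perception-action loop: read pixels, choose keys, send them."""
    for _ in range(steps):
        send_keys(policy(grab_frame()))


# Fake capture and fake key sender, so the loop can run without a game.
bright_right = np.zeros((4, 8), dtype=np.uint8)
bright_right[:, 4:] = 255
sent = []
run_agent(lambda: bright_right, toy_policy, sent.append, steps=3)
print(sent)  # [['W', 'D'], ['W', 'D'], ['W', 'D']]
```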

In a way, you do. The inputs are well defined (mouse and keyboard), and the outputs (graphics) can be captured fairly easily. Interpreting those outputs is a lot harder than interpreting a stream of "character A at point (X,Y,Z), character B at point...", I agree.

Screen capturing also lacks things like find(X,Y,Z) or events/triggers (e.g. when a collision happens).

True, it fails to provide explicit notifications, but it does provide implicit notifications which could be recognized.
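As an example of recognizing an implicit notification, a sudden large change between consecutive frames (a crash, a cut, a respawn) can be flagged with a mean absolute pixel difference. This is a rough sketch. The threshold of 40 is an arbitrary assumption that would need tuning per game and resolution.

```python
import numpy as np


def mean_abs_diff(prev, curr):
    """Mean absolute per-pixel difference; int16 avoids uint8 wrap-around."""
    return float(np.mean(np.abs(curr.astype(np.int16) - prev.astype(np.int16))))


def detect_events(frames, threshold=40.0):
    """Indices of frames that differ sharply from the frame before them."""
    return [i for i in range(1, len(frames)) if mean_abs_diff(frames[i - 1], frames[i]) > threshold]


dark = np.zeros((4, 4), dtype=np.uint8)
flash = np.full((4, 4), 200, dtype=np.uint8)
print(detect_events([dark, dark, flash, flash]))  # [2]
```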

What I want is a full public API to control all aspects of the current view and character.

At a minimum, I want to be able to send positional data to control the orientation and position of the head, and at least one of the hands (ragdoll physics and IK can take care of the rest). That way, I could control where I look with my head, and where I point or grab with my arm - independently.

Ideally - you'd have both arms/hands free - plus you'd need inputs to control other aspects.

For the head, if you crouch down, it would know this, and adjust the body model as well (or perhaps those parts could be controlled as well). The goal?

Better immersion for VR - but platform agnostic; the thing is, I don't want future games to be beholden to a particular or set of HMDs (ie, this game only works with the RIFT, but not the Vive, etc). I also own a nice older kit of magnetic trackers (an old Ascension Flock) that I could use, along with some older HMDs (as a contributor to the KS, I also have the original DK1, plus the later KS backer CV1 - but both of these aren't well supported under Linux, which ticks me off). Basically, let me (or anyone else) control the character in full, and just provide a way to output two stereo frames (don't even distort them - let third-party software deal with that - because you might have an older HMD that doesn't need the distortion). Then get out of the way.

I doubt that we'll see such a development though.

Wow, just wow. I am going to try doing this with my GTA SA.

I am amazed with how versatile and fun Python is!

Very neat! I can't wait to take a look at this and give it a shot. The author mentioned it would be better to offload the screen capture to the GPU, would this be achieved through something like CUDA?

Using Python for all those individual Win32 calls looks awkward. This interfacing would be easier in pure C.

I think the main contribution here is just finding a way to get the frames from the game into OpenCV. It would be interesting to see a more "real" self-driving AI method with this, i.e. a CNN.

Could you run GTA5 and a CNN at the same time? I think that would fry your GPU.

It can work. I won't re-post everything here, but I've already mentioned in this thread how I used my 750 Ti to train a CNN based on NVidia's End-to-End model to self-drive a virtual vehicle around a simulated track (this was part of Udacity's Self-Driving Car Engineer Nanodegree).

Udacity supplied the simulator, I set up CUDA and such to work with TensorFlow and Keras under Python. I had to drop the res down on the simulator (windowed at 800x600 IIRC), and training was done without the simulator (I used the simulator to generate the datasets - this is built into the simulator). The resulting model was actually fairly small (less than 300 kbytes); I scaled down the input images (originally 640x480, scaled to 160x120) to limit the number of initial parameters to the CNN, then applied aggressive dropout and other tricks to further keep things in check thru the layers as I trained (I could have used batch processing, too). The resulting model worked well with the simulator afterward - and it had no problem with memory or anything else in order to keep up.
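The 640x480 to 160x120 scaling mentioned above is a factor of 4 per axis. For an RGB frame, that cuts the input from 921,600 values to 57,600, which is 16x fewer. In practice you would probably call OpenCV's `cv2.resize` (for example with `cv2.INTER_AREA`). This NumPy block-averaging sketch just shows what the downscale does, assuming both dimensions are multiples of the factor; any remainder is cropped.

```python
import numpy as np


def downscale(image, factor):
    """Average non-overlapping factor x factor blocks (crops any remainder)."""
    h, w = image.shape[0] // factor, image.shape[1] // factor
    cropped = image[: h * factor, : w * factor]
    blocks = cropped.reshape(h, factor, w, factor, *cropped.shape[2:])
    return blocks.mean(axis=(1, 3))


frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
small = downscale(frame, 4)
print(small.shape, frame.size // small.size)  # (120, 160, 3) 16
```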

yo dawg! I heard U like GTA V and Python tutorials, so I put a tutorial inside your tutorial so you can play GTA V while you're learning!
