TensorKart: self-driving MarioKart with TensorFlow (kevinhughes.ca)
631 points by pickle27 on Jan 4, 2017 | 68 comments

In contrast, here's what is effectively an oracle machine playing mario kart: https://www.youtube.com/watch?v=ZBNgbJ5hXtQ

(Amazingly detailed) info: http://tasvideos.org/5243S.html

I like how it just glitches itself to an almost instant win on half of the maps.

These are Tool-Assisted Speedruns. That means that it's a human player using things like slow motion, mem dumps and other mechanisms to play perfect games. It's more an example of human abilities when augmented with computers than of an AI discovering those glitches itself.

The most amazing run I've seen so far was an RTA (realtime time attack) of Mega Man 2. A human player is manually collision glitching and writing over memory with a sequence of inputs. And the RTA time in 2016 is now faster than the initial TAS records.

Do you have a link? Sounds like it'd be a very interesting watch.

Edit: I found one that has an example (I think) around 7:38 http://www.nicovideo.jp.am/watch/sm13963118 - the collision detection pushes megaman into the wall and jumps between different sections. Very interesting indeed!

Sorry I think the one I was thinking of was Megaman 2. I did find a link for you. Starts at 2:36. http://www.nicovideo.jp/watch/sm23825129

Also you might be interested in the Final Fantasy 6 memory overwrite bug that was discovered in 2016 as well. It uses the Window Color menu settings as the data reference.

Btw regarding the Megaman 2 RTA, there's an even more ridiculous collision bug being used around 11:30 http://www.nicovideo.jp/watch/sm28321223

Thank you for sharing! I thought the Moo Moo Farm run was incredible on its own, THEN I saw the write-up. Blown away.

This is awesome! So much fun to watch these shortcuts.

Congrats on finishing the project! As you've already linked at the bottom of your post, it's possible that OpenAI could've solved most of your I/O issues.

One thing I'd suggest is exploring a reward function, instead of using only pre-recorded training data. That is, give the AI a goal to complete (in this case, finish the race) and let it learn by itself!
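To make the reward-function idea a bit more concrete, here's a rough sketch. Everything in it is my own assumption rather than anything from the post: the idea that you can read race progress as a fraction between 0 and 1, the names `step_reward` and `episode_return`, and all the constants. The agent gets a small reward for moving forward, a small penalty for every step (so dawdling costs something), and a bonus for crossing the finish line.

```python
def step_reward(prev_progress, progress, finished,
                time_penalty=0.01, finish_bonus=10.0):
    """Reward for a single time step.

    Progress values are fractions of the race completed, in [0, 1].
    The penalty and bonus constants are invented for illustration.
    """
    reward = (progress - prev_progress) - time_penalty
    if finished:
        reward += finish_bonus
    return reward


def episode_return(progress_trace, gamma=0.99):
    """Discounted sum of step rewards over a recorded progress trace."""
    total = 0.0
    discount = 1.0
    for i in range(1, len(progress_trace)):
        finished = progress_trace[i] >= 1.0
        total += discount * step_reward(
            progress_trace[i - 1], progress_trace[i], finished
        )
        discount *= gamma
        if finished:
            break  # the episode ends at the finish line
    return total


if __name__ == "__main__":
    print(episode_return([0.0, 0.5, 1.0]))   # a run that finishes
    print(episode_return([0.0, 0.1, 0.1]))   # a run that stalls
```

A reinforcement learner would then try to pick controls that maximize this return, instead of copying recorded inputs.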

I would love to learn how to do that - any suggestions?

EDIT: to clarify: what should I google for?

Here's what I could find in a couple minutes:


OpenAI's example universe agent. Remember that while their goal is an agent that works in any and all environments (read: games), you could certainly optimize yours just for MarioKart.

Thanks, looks promising! Can't wait to try it! :)

Reinforcement Learning. Here's a good intro: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

Perfect, thank you!!! You made my day. :-D

...later found this nice explanation of RL concepts if it helps someone: https://www.nervanasys.com/demystifying-deep-reinforcement-l...

This is pretty cool; as someone who is currently working on the second project (traffic sign recognition) for the Udacity "Self-Driving Car Engineer" nanodegree, using TensorFlow - it is interesting to me how it seems like the "standard" MNIST CNN can be adapted to so many other use cases.

For the project I am currently working on, I'm using a slightly modified form of LeNet - which isn't too different from the TF MNIST tutorial; after all, recognizing traffic signs isn't much different than recognizing hand-written numbers...

...but "driving" a course? That seems radically different to my less-than-expert-at-TensorFlow understanding, but that is only due to my ignorance.

I'm glad that these examples and demos are being investigated and made public for others - especially people learning like myself - to look at and learn from.

From the post:

> Later, I switched to use Nvidia’s Autopilot...

So I guess he didn't use the MNIST CNN model.

However, if you look at the code:


You can see that it follows much the same pattern as LeNet CNN for MNIST - a few (ok, more than a few!) convolutional layers followed by a few fully connected layers.

Maybe you could call it a "follow on" or perhaps an ANN pattern?:

Conv -> Conv -> Reshape/Flatten -> FC -> FC -> FC

(disregarding activation and such)

...which is really the lesson of the LeNet MNIST CNN - at least, that's my takeaway.
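If it helps to see the pattern as numbers, here's a small sketch that just traces tensor shapes through a Conv -> Conv -> Flatten -> FC -> FC -> FC stack. The layer sizes are made up for illustration (they are not taken from the TensorKart code or from LeNet), it assumes unpadded convolutions, and like the diagram above it ignores activations and pooling.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def trace_shapes(height, width, channels, conv_layers, fc_layers):
    """Return the tensor shape after each stage of Conv... -> Flatten -> FC...

    conv_layers: list of (out_channels, kernel, stride) tuples
    fc_layers:   list of fully connected layer widths
    """
    shapes = [(height, width, channels)]
    for out_channels, kernel, stride in conv_layers:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        channels = out_channels
        shapes.append((height, width, channels))
    # Flatten: every remaining value becomes one input to the first FC layer.
    shapes.append((height * width * channels,))
    for units in fc_layers:
        shapes.append((units,))
    return shapes


if __name__ == "__main__":
    # Hypothetical sizes: a 28x28 grayscale image, two 5x5 convolutions,
    # then three fully connected layers ending in 10 outputs.
    for shape in trace_shapes(28, 28, 1, [(6, 5, 1), (16, 5, 1)], [120, 84, 10]):
        print(shape)
```

Swapping the final width from 10 class scores to a handful of controller outputs is the only structural change needed to go from "classify a digit" to "predict a joystick state".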

You're right, that does look similar... I expected this to be based on some type of RNN!

As someone who's interested in taking the Udacity course, would you recommend it? Do you think the course prepares you enough to find a Self-Driving developer job? Would you learn enough to compete/work alongside people who got their Masters/PhD in Machine Learning? Appreciate your input.

> As someone who's interested in taking the Udacity course, would you recommend it?

So far, yes - but that has a few caveats:

See - I have some background prior to this, and I think it biases me a bit. First, I was one of the cohort that took the Stanford-sponsored ML Class (Andrew Ng) and AI Class (Thrun/Norvig), in 2011. While I wasn't able to complete the AI Class (due to personal reasons), I did complete the ML Class.

Both of these courses are now offered by Udacity (AI Class) and Coursera (ML Class):



If you have never done any of this before, I encourage you to look into these courses first. IIRC, they are both free and self-paced online. I honestly found the ML Class to be easier than the AI class when I took them - but that was before the founding of these two MOOC-focused companies, so the content may have changed or been made more understandable since then.

In fact, now that I think about it, I might try taking those courses again myself as a refresher!

After that (and kicking myself for dropping out of the AI Class - but I didn't have a real choice there at the time), in 2012 Udacity started, and because of (reasons...) they couldn't offer the AI Class as a course (while for some reason, Coursera could offer the ML Class - there must have been licensing issues or something) - so instead, they offered their CS373 course in 2012 (at the time, titled "How to Build Your Own Self-Driving Vehicle" or something like that - quite a lofty title):


I jumped at it - and completed it as well; I found it to be a great course, and while difficult, it was very enlightening on several fronts (for the first time, it clearly explained to me exactly how a Kalman filter and PID worked!).

So - I have that background, plus everything else I have read before then or since (AI/ML has been a side interest of mine since I was a child - I'm 43 now).

My suggestion if you are just starting would be to take the courses in roughly this order - and only after you are fairly comfortable with both linear algebra concepts (mainly vectors/matrices math - dot product and the like) and stats/probabilities. To a certain extent (and I have found this out with this current Udacity course), having a knowledge of some basic calculus concepts (derivatives mainly) will be of help - but so far, despite that minor handicap, I've been ok without that greater knowledge - but I do intend to learn it:

1. Coursera ML Class
2. Udacity AI Class
3. Udacity CS373 course
4. Udacity Self-Driving Car Engineer Nanodegree

> Do you think the course prepares you enough to find a Self-Driving developer job?

I honestly think it will - but I also have over 25 years under my belt as a professional software developer/engineer. Ultimately, it - along with the other courses I took - will help (and has already helped) give me other tools and ideas to bring to bear on problems. Also - realize that this knowledge can apply to multiple domains - not just vehicles. Marketing, robotics, design - heck, you name it - all will need or do currently need people who understand machine learning techniques.

> Would you learn enough to compete/work along side people who got their Masters/PhD in Machine Learning?

I believe you could, depending on your prior background. That said, don't think that these courses could ever substitute for a graduate degree in ML - but I do think they could be a great stepping stone. I am actually planning on looking into getting my BA then Masters (hopefully) in Comp Sci after completing this course. It's something I should have done long ago, but better late than never, I guess! All I currently have is an associates from a tech school (worth almost nothing), and my high school diploma - but that, plus my willingness to constantly learn and stay ahead in my skills has never let me down career-wise! So I think having this ML experience will ultimately be a plus.

Worst-case scenario: I can use what I have learned in the development of a homebrew UGV (unmanned ground vehicle) I've been working at on and off for the past few years (mostly "off" - lol).

> Appreciate your input.

No problem, I hope my thoughts help - if you have other questions, PM me...

> the Udacity "Self-Driving Car Engineer" nanodegree

That looks like a great course by the way, thanks for sharing.

I'm one of cr0sh's classmates. I don't have any background in ML/AI/etc, so I've had to supplement the Udacity course materials with a lot of external resources (just finished watching the Stanford CS231n course, which was very helpful), but overall the course has been really interesting+fun so far. It's really nice to be exposed to new kinds of tech I've never heard of / used before. Refreshing change from webdev.

If you're strapped for cash and don't want to pay the $800/term, you could definitely learn these things on your own using free online resources. If you don't mind the price, though, I've found this course worth the time+money+effort so far. [they're not paying me to say this :)]

A bit off-topic, but does anyone know how good this[0] course is?

[0] https://www.udacity.com/course/deep-learning--ud730

i was quite put off by it. i feel like the teaching technique is pretty poor and the focus is on all the wrong things. mainly the tech gets in the way of learning. i don't want to figure out how to learn numpy when i'm trying to learn how to understand deep learning, that in itself is hard enough. i quit after a week (i did the stanford course first and this was going to be my second).

i would recommend the coursera course by andrew ng. i had an amazing time. the code stays out of your way and he walks you through the algorithms and explains the theory very well.

i just started the fast.ai course by jeremy howard, and literally have been blown away by the course. it is AMAZING! by lesson 3 i'm able to build cnn models and score in the top 20% in kaggle competitions. not bad for a complete novice. HIGHLY RECOMMENDED.

once im done with the fast.ai course i may look back around to google's deep learning course. i think it may be easier for more experienced users to digest its info.



Edit: added fast.ai link

Felt very rushed for a beginner, okay if you have some background.

Personally, I found Stanford dl courses (image classification, nlp) to be much more suitable for beginners.

I'm also a web developer and I plan on taking the SDC nanodegree program in February.

Ideally, I want to work on self driving cars or AI in my day to day job, but I don't want to get my hopes up. Do you think that after you complete the nanodegree you will attempt to change your career to an SDC engineer or AI/ML engineer? Or is this just meant to fulfill a curiosity of yours?

As someone who's interested in taking the Udacity course, could you help me understand the course better by answering the following questions? Do you think the course prepares you enough to find a Self-Driving developer job? Would you learn enough to compete/work alongside people who got their Masters/PhD in Machine Learning? Appreciate your input.

The inevitable follow-up article that delves into training offensive banana peel usage should be interesting.

Quote: "Driving a new (untrained) section of the Royal Raceway:"

So the author did a proper test of the model by scoring it on an unseen track to make sure it generalizes! This is very awesome!

How did we get from "bare minimum sensible testing" to "This is very awesome!"? Are things that bad on average?

There's probably a broad range of people in the hn community.

Fair point.

NB: generalization should be one of, if not the first thing you think about and plan for in any ML project.
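For anyone wondering what planning for it looks like in practice, one simple habit is to split recordings by track (or track section) rather than shuffling individual frames, so the test set really is unseen. A minimal sketch, where the sample layout `(group, frame, controls)` and the group names are placeholders of mine, not the author's data format:

```python
def split_by_group(samples, held_out_groups):
    """Split samples so that held-out groups never appear in the training set.

    samples: list of (group, frame, controls) tuples, where group names the
             track or track section the frame was recorded on.
    """
    held_out = set(held_out_groups)
    train = [s for s in samples if s[0] not in held_out]
    test = [s for s in samples if s[0] in held_out]
    return train, test


if __name__ == "__main__":
    # Placeholder data: strings stand in for screenshots, floats for steering.
    recordings = [
        ("track_a", "frame_001", -0.2),
        ("track_a", "frame_002", 0.0),
        ("track_b", "frame_003", 0.4),
    ]
    train, test = split_by_group(recordings, ["track_b"])
    print(len(train), len(test))
```

Shuffling frames from the same lap into both sets tends to overstate how well the model generalizes, because neighbouring frames look almost identical.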

It is pretty cool! I'd love to see how it did on the other parts of that track as well, if that was tested

Personally I think the most impressive thing here isn't that you created a self-driving MarioKart, but that you trained TensorFlow based on input screenshots of your desktop.

I feel like that could be a good next step - a ubiquitous neural net model that, after mapping inputs, will learn to play any video game that's on your screen.

Especially since the hard work of increasing the screen resolution has already been done.

Also, bravo on including the stupid little bugs that gave you trouble. It always sustains me working on a hard project to know that a self-driving video game was blocked by a missing newline in a C HTTP request. It makes me step back and laugh at the ridiculous complexity of what we take for granted in our day to day work.

There's work being done to allow reinforcement nets to do transfer learning.

Best part, "With this in mind I played more MarioKart to record new training data. I remember thinking to myself while trying to drive perfectly, “is this how parents feel when they’re driving with their children who are almost 16?”"

It's basically a project these days at Udacity's Self-driving car nanodegree under "Behavioral Cloning" ;-)

I was ready to be impressed about seeing an AI that could consistently beat the game's own AI, with blue turtle shells and all. Oh well, still pretty impressive to be able to drive on the easiest course without opponents.

Check out also MarI/O, very impressive: https://www.youtube.com/watch?v=qv6UVOQ0F44

How are the original computer opponents able to play MarioKart?

1. The AI in games has access to internal representations of game state and does not have to recognize it from pixels on screen. This is a massive difference.

2. The logic is usually a bunch of (human-authored) scripts consisting of if-else spaghetti.

Also, the AI opponents don't have to play by the same rules. They go by fun > fairness to keep things interesting. That's why you normally can't keep a huge lead on AI opponents, because they "rubberband" back up to you faster than they should be able to.
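A toy version of rubberbanding, to show the idea. This is emphatically not Mario Kart's actual algorithm, which I don't know; the function name, the linear formula, and every constant here are invented. The CPU kart's speed is simply scaled by how far it trails (or leads) the player, within limits.

```python
def rubberband_speed(base_speed, cpu_position, player_position,
                     gain=0.5, max_factor=1.5, min_factor=0.8):
    """Scale a CPU kart's speed by its gap to the player.

    Positions are distances along the track. A CPU that is behind gets a
    boost, one that is ahead gets slowed, and both effects are clamped.
    """
    gap = player_position - cpu_position  # positive when the CPU is behind
    factor = 1.0 + gain * gap / 100.0
    factor = max(min_factor, min(max_factor, factor))
    return base_speed * factor


if __name__ == "__main__":
    print(rubberband_speed(10.0, 0.0, 0.0))     # level with the player
    print(rubberband_speed(10.0, 0.0, 50.0))    # trailing by 50 units
    print(rubberband_speed(10.0, 1000.0, 0.0))  # far ahead of the player
```

Because the boost depends only on the gap, a big lead never stays big for long, which matches the "can't keep a huge lead" feeling.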

Wouldn't surprise me if they don't even 'drive' in any sense while off-screen, just increment some abstract position relative to the track length. But I don't know this for a fact.

"Wouldn't surprise me if they don't even 'drive' in any sense while off-screen, just increment some abstract position relative to the track length. But I don't know this for a fact." This seems unlikely, especially given how item pickup zones operate. Since an item box disappears for a short period of time after someone drives over it, it's imperative that the position of the CPU player who drove over it, and the one that comes after that (and inherently gets no item) is represented accurately. Even off-screen, AI continues to collect and utilize items.

Then again, maybe this is just done by cheating simply with an RNG.

This is indeed what they do. Mario Kart 64 is known for having the most egregious rubberbanding ever.

How do you know this stuff? I find it insanely interesting.

Tell me more!

Mostly experience, but you should get plenty of results by just Googling "mario kart rubber banding". Looks like the top result mentions a patent on an algorithm for it, but I'm blocked at work.

It's a common enough term, though, there's even a page for the trope here:


Personally, I'm just a little impressed that you can train an active agent to play a game using old-fashioned supervised learning on screen states and controller states rather than relying on "action-oriented" learning techniques like reinforcement learning, online learning, or even a recurrent model.

It really shows how simple many control tasks actually are!

This is exactly what I wondered about. So what exactly is the function you are training for? Is it basically like "if the screen (showing the track) looks like this, apply these controls"?

A more accurate description of the function would be "given this picture of a screen, what is the most likely key my author was pressing in this situation" - no goals, no values, no optimization, but simply learning to imitate the actions performed by a human.

Coincidentally, one of the neural network components in AlphaGo did pretty much the same, i.e. attempted to guess what a human player would usually play in this situation purely based on the image and nothing else.

In TFA it says that he was training a supervised learner to predict the control state from the screen state. So yes, "if the screen looks like this, apply these controls", and that can play Mario Kart 64.
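Here's about the smallest sketch of "predict the control state from the screen state" I can think of. To keep it dependency-free I've swapped the CNN for a 1-nearest-neighbour lookup, so this is a stand-in for the idea and not what the post does; the three-number "frames" and the class name are also made up. The policy just returns whatever the human did on the most similar recorded frame.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestNeighbourPolicy:
    """Imitate recorded play by copying the controls of the closest frame."""

    def __init__(self):
        self.frames = []
        self.controls = []

    def fit(self, frames, controls):
        # "Training" is just remembering the (frame, controls) pairs.
        self.frames = [tuple(f) for f in frames]
        self.controls = list(controls)
        return self

    def predict(self, frame):
        best = min(
            range(len(self.frames)),
            key=lambda i: squared_distance(self.frames[i], frame),
        )
        return self.controls[best]


if __name__ == "__main__":
    # Toy frames: (road on left, road in centre, road on right).
    # Toy controls: steering in [-1, 1].
    recorded_frames = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    recorded_steering = [-1.0, 0.0, 1.0]
    policy = NearestNeighbourPolicy().fit(recorded_frames, recorded_steering)
    print(policy.predict((0.9, 0.1, 0.0)))
```

There is no notion of winning anywhere in there, only "what did the human do when the screen looked like this".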

Next, have it upload its race results to kartlytics

Yeah - that was a very cool project!

omg! thanks for the reference to kartlytics. I had no idea this existed!

I'm interested in knowing why the Python and C components communicate with HTTP, beyond reading about the bugfix. Wouldn't it be easier to use sockets or files or some other mechanism to integrate the two languages?

Just something to think about as a developer. I would imagine that on a local machine, using HTTP as the protocol might add latency.

This was my initial reaction as well; it seems like a raw socket or even embedding a Python interpreter would be better ways to go.
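For what it's worth, a plain socket version doesn't need much. Below is a rough sketch of a newline-delimited request/reply exchange. The message strings are invented, and `socket.socketpair()` stands in for the real connection between the emulator plugin (C side) and the model (Python side), so treat it as an illustration of the framing rather than a drop-in replacement for what the project does.

```python
import socket


def send_line(sock, text):
    """Send one message, terminated by a newline so the reader knows where it ends."""
    sock.sendall(text.encode("ascii") + b"\n")


def recv_line(sock):
    """Read bytes until a newline and return the message without it."""
    chunks = []
    while True:
        byte = sock.recv(1)
        if not byte:
            raise ConnectionError("socket closed before a newline arrived")
        if byte == b"\n":
            return b"".join(chunks).decode("ascii")
        chunks.append(byte)


def demo():
    """Run one request/reply round trip over a connected socket pair."""
    plugin, model = socket.socketpair()
    try:
        send_line(plugin, "GET_CONTROLS")    # hypothetical request
        request = recv_line(model)
        send_line(model, "steer=0.25 a=1")   # hypothetical reply
        reply = recv_line(plugin)
        return request, reply
    finally:
        plugin.close()
        model.close()


if __name__ == "__main__":
    print(demo())
```

The trade-off is that you own the framing yourself: forget the terminator and the other side waits forever, which HTTP libraries normally handle for you.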

Pretty interesting I must say. Have to admit though, I kind of expected the self driving AI to be trying to win Grand Prix or Versus races instead of doing well in Time Trials. But hey, I can see how that would be utterly painful to try and set up, especially given how many times you get hit by items or rammed off the track in more recent games.

Step 1 is to make the AI find an ideal path through the course.

Step 2 is to make AI figure out how to return to the ideal path through the course when other people are stealing your items or shelling you.

Step 3 is to make the AI figure out how to counter attack to slow down the opponents.


Step 2.5 would be to make the AI figure out how to evade or minimize the effect of or ability to initiate opponents' offensive moves. That would be the most interesting bit to me. Would be neat to see an AI intentionally stay in 2nd place with an item at the ready until the home stretch, to avoid being blue-shelled.

Intentionally stay in second, unless it has reason to believe that it can stay in 1st place, even after getting blue-shelled.

But yes. Point being, self driving is a feat of its own; competing with opponents is a whole different ballgame with its own set of challenges.

It would be very interesting to see how well this does with more training data, especially with multiple players.

This is very cool and I think if Kevin spends a bit of time learning reinforcement learning it could be amazing.

It seems like a lot of people doing reinforcement learning on video games get bogged down on training on raw pixels only... it would take a tremendous amount of data to make the driver recognise when and where to use certain power ups, however if you encoded this as a variable, wow it could be really cool.

I believe this is fundamentally how we humans learn with so few examples. Other humans "encode features for our brain to track" by telling us how it should be done and what information to prioritise.

This is really cool, and any reason to bring this game back into my life is warmly welcomed

I love this!

I'm working (albeit very slowly, as a beginner) on a similar project with Geometry Dash and Python. You're a great inspiration!

I appreciate the write-up. Thank you!

great write up! this is awesome

No power slide? Failure.
