Hacker News
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos (arxiv.org)
63 points by jonbaer on June 27, 2022 | 10 comments



This is a clever approach to a very old problem, namely that random exploration really doesn't work for long time sequences:

A while ago, we participated in the Bomberland competition (won semi-finals, 2nd place overall) and we went with deep reinforcement learning. Here's the rough math: it takes 1 step to place the bomb, 5 for the bomb to arm, 1 to detonate, and 10 while there's fire. So you might not know until 17 steps later if placing that bomb was a good idea. You have 7 actions (up, down, left, right, bomb, detonate, nop), so the chance of getting that 17-step sequence correct by pure chance is (1/7)^17 ≈ 4*10^-15. Or, said the other way around, you need to let the AI try around 232,630,513,987,207 times to get it correct once.
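For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes, as the estimate above does, that exactly one 17-step sequence counts as "correct" and that every action is picked uniformly at random; the variable names are made up for illustration.

```python
from fractions import Fraction

# Step counts as quoted in the comment: place, arm, detonate, fire.
steps = 1 + 5 + 1 + 10
# up, down, left, right, bomb, detonate, nop
actions = 7

# Number of distinct action sequences of that length.
sequences = actions ** steps

# Chance that one uniformly random sequence is the single "correct" one
# (assumption: exactly one sequence counts as correct).
p_correct = Fraction(1, sequences)

# Expected number of independent random tries until the first success
# is 1/p for a geometric distribution.
expected_tries = 1 / p_correct

print(f"steps           = {steps}")
print(f"sequences       = {sequences:,}")
print(f"p(correct)      = {float(p_correct):.2e}")
print(f"expected tries  = {int(expected_tries):,}")
```

In practice more than one sequence would usually be acceptable, so this is a pessimistic bound, but the order of magnitude is the point.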

And that's why imitating someone else - no matter if they are an expert or not - will massively boost your learning performance. Because even the worst Minecraft player on YouTube is still doing 1000x better than truly random exploration.

(That said, we lost in the Bomberland finals against someone who analyzed the game theoretically and then just hard-coded the perfect strategy. Sometimes thinking hard is superior to all AI approaches...)


This is exciting stuff! I've been using the same process / algorithms for a video game analytics platform I've been building for the past few years. I uploaded a couple of videos of my work-in-progress which demonstrate many of these techniques first-hand in some games. Here are some videos to check out:

https://www.youtube.com/watch?v=HO2c1RPdPpI - Recording mouse / keyboard

https://www.youtube.com/watch?v=ASd5kdXz3hA - Learning a fighting game combo in Tekken

https://www.youtube.com/watch?v=q6SgsyAY2G4 - Tracking / graphing character progression in Path of Exile


This looks really great. Best of luck.

Crowd-sourcing labels and tags on top of the videos would be (too) amazing. Could actually ruin online gaming...


Yannic Kilcher video on this paper: https://youtu.be/oz5yZc9ULAc


Super interesting, tx for the link.


So their untrained model watches a lot of online videos of people playing Minecraft and then plays like a pro?

If YouTube can be used to train models we'll have AGI pretty soon!


Not exactly. They train it on 2,000 hours of labeled video (they have people play the game and record the inputs at the same time), then use those 2,000 hours to figure out the inputs for the 70,000 hours of unlabeled video. Then it uses all of that data to learn to play the game by itself from a video feed.
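A toy sketch of that three-stage idea, in Python. This is not the paper's code: the "frames" here are just integer positions on a line, the "models" are lookup tables built by majority vote, and every function name is hypothetical. It only shows the flow of labeled data -> action guesser -> pseudo-labeled video -> imitation policy.

```python
from collections import Counter, defaultdict

def train_action_guesser(labeled):
    """labeled: list of (frame_before, frame_after, action).
    Learns which action most often goes with each frame-to-frame change."""
    votes = defaultdict(Counter)
    for before, after, action in labeled:
        votes[after - before][action] += 1
    table = {delta: c.most_common(1)[0][0] for delta, c in votes.items()}
    # Unknown changes fall back to "nop" (an arbitrary choice for this toy).
    return lambda before, after: table.get(after - before, "nop")

def pseudo_label(guesser, video):
    """Turn an unlabeled list of frames into (frame, guessed_action) pairs."""
    return [(video[i], guesser(video[i], video[i + 1]))
            for i in range(len(video) - 1)]

def behavior_clone(pairs):
    """Imitation by majority vote: for each frame, pick the most common action."""
    votes = defaultdict(Counter)
    for frame, action in pairs:
        votes[frame][action] += 1
    return {frame: c.most_common(1)[0][0] for frame, c in votes.items()}

# Small labeled set: people played and their inputs were recorded.
labeled = [(0, 1, "right"), (1, 0, "left"), (3, 3, "nop"), (2, 3, "right")]
# Larger unlabeled set: frames only, no inputs.
unlabeled_videos = [[0, 1, 2, 3, 4], [1, 2, 3, 4, 4]]

guesser = train_action_guesser(labeled)
pairs = [p for video in unlabeled_videos for p in pseudo_label(guesser, video)]
policy = behavior_clone(pairs)
print(policy)
```

The real system replaces both lookup tables with neural networks working on raw video, but the division of labor between the small labeled set and the large unlabeled set is the same as described above.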


Yeah, I think the key behind all the AI we are seeing is user-generated content. The longer we have the internet, the more content is available and tagged for algorithmic consumption.


> If YouTube can be used to train models we'll have AGI pretty soon!

And its face will be stuck in a permanent :O grimace.

And. every little. thing. will be the most shocking. biggest surprise since the. invention of. surprises. Plus a heaping. spoonful of jump.cuts.

But hey at least it's gonna talk really fast to squeeze all the info into the first 3 minutes, then repeat itself a lot to meet an arbitrary length requirement.


I tried installing the Minecraft example on a couple of machines (Win 10 and Ubuntu 22.04) and it failed on both, for the same reason: the code is already so deprecated that it no longer compiles without fiddling with flags. Bummer.



