This is a clever approach to a very old problem: random exploration really doesn't work for long time sequences.
A while ago, we participated in the Bomberland competition (won the semi-finals, placed 2nd overall) and went with deep reinforcement learning. Here's the rough math: it takes 1 step to place the bomb, 5 for the bomb to arm, 1 to detonate, and 10 while there's fire. So you might not know until 17 steps later whether placing that bomb was a good idea. You have 7 actions (up, down, left, right, bomb, detonate, nop), so the chance of getting that 17-step sequence right by chance is (1/7)^17 ≈ 4×10^-15. Put the other way around, you'd need to let the AI try around 232,630,513,987,207 times to get it right once.
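The arithmetic above can be checked in a couple of lines (the step counts and action set are as described; nothing else is assumed):

```python
# Chance of randomly hitting one specific 17-step action sequence in Bomberland
actions = 7               # up, down, left, right, bomb, detonate, nop
horizon = 1 + 5 + 1 + 10  # place + arm + detonate + fire = 17 steps

sequences = actions ** horizon
probability = 1 / sequences

print(sequences)    # 232630513987207
print(probability)  # ~4.3e-15
```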
And that's why imitating someone else - no matter if they are an expert or not - will massively boost your learning performance. Even the worst Minecraft player on YouTube is still doing 1000x better than truly random exploration.
(That said, we lost in the Bomberland finals against someone who analyzed the game theoretically and then just hard-coded the perfect strategy. Sometimes thinking hard is superior to all AI approaches...)
This is exciting stuff! I've been using the same process and algorithms for a video game analytics platform I've been building for the past few years. I uploaded a couple of videos of my work in progress that demonstrate many of these techniques first-hand in some games. Here are some videos to check out:
Not exactly: they train it on 2,000 hours of labeled video (they have people play the game and record their inputs at the same time), then use those 2,000 hours to infer the inputs for the 70,000 hours of unlabeled video. Then it uses all of that data to learn to play the game by itself from a video feed.
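The pipeline described above can be sketched in a toy form: train an inverse dynamics model on the small labeled set, pseudo-label the large unlabeled set with it, then behavior-clone on everything. Everything below is an illustrative stand-in (the "frames" are random feature vectors and the "model" is a one-feature threshold), not the actual training code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: the small labeled set - frames paired with recorded inputs
# (stands in for the ~2,000 hours of video with keypresses recorded).
labeled_frames = rng.normal(size=(100, 8))          # toy "frames" as feature vectors
labeled_actions = (labeled_frames[:, 0] > 0).astype(int)  # toy recorded inputs

# An "inverse dynamics model" trained on that set: frames -> inferred action.
# (Toy version: a threshold on one feature; the real thing is a large network.)
def inverse_dynamics(frames):
    return (frames[:, 0] > 0).astype(int)

# Stage 2: pseudo-label the much larger unlabeled set
# (stands in for the 70,000 hours of plain video).
unlabeled_frames = rng.normal(size=(3500, 8))
pseudo_actions = inverse_dynamics(unlabeled_frames)

# Stage 3: behavioral cloning on labeled + pseudo-labeled data together,
# yielding a policy that maps the raw video feed to actions.
all_frames = np.concatenate([labeled_frames, unlabeled_frames])
all_actions = np.concatenate([labeled_actions, pseudo_actions])
print(all_frames.shape, all_actions.shape)  # (3600, 8) (3600,)
```

The point of the structure is leverage: the expensive labeled data only has to be big enough to train the inverse dynamics model, which then unlocks a dataset 35x larger for free.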
Yeah, I think the key behind all the AI we're seeing is user-generated content. The longer we have the internet, the more of it is available and tagged for algorithmic consumption.
> If YouTube can be used to train models we'll have AGI pretty soon!
And its face will be stuck in a permanent :O grimace.
And. every little. thing. will be the most shocking. biggest surprise since the. invention of. surprises. Plus a heaping. spoonful of jump.cuts.
But hey at least it's gonna talk really fast to squeeze all the info into the first 3 minutes, then repeat itself a lot to meet an arbitrary length requirement.
I tried installing the minecraft example on a couple of machines (Win 10 and Ubuntu 22.04) and it failed on both for the same reason: the code is already so deprecated that it no longer compiles without fiddling with flags. Bummer.