> The claim that the code is inefficient is really not substantiated well in this blog post.
I didn't run benchmarks, but where clang writes zeros to memory that is never read afterwards, there's no way that particular code is optimal.
For the gcc output, it seems unlikely that the three versions are all optimal, given the inconsistent strategies used. In particular, the code that sets the output value to 0 or 1 in the size = 3 version is highly unlikely to be optimal in my opinion. I'd be amazed if it is!
Your point that unintuitive code is sometimes actually optimal is well taken though :)
> In addition, the agent learns to exploit a flaw in the emulator to make a key re-appear at minute 4:25 of the video
After a bit of debugging, this appears to be a very intentional feature in the game rather than a flaw. That key appears after a while if you're not in the room (and don't have one).
I assume the agent somehow found this out and developed the behavior of going in and out of the room until the key shows up (which, with enough agent randomness, it apparently will).
I'm not sure if this is a previously known feature in the game (a quick Google search does not reveal much). It would be quite interesting if the RL agent was the first to find it!
PS: If you launch MAME with the "-debug" option and press CTRL+M, you can see the whole memory (the Atari 2600 has only 128 bytes of RAM!!) while playing the game. If you keep an eye on the byte at 0xEA, you will know when the key is about to pop up.
Alternatively, you can speed things along by changing it yourself to a value just below 0x3F.
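If you'd rather watch that byte from a script than from the debugger window, here is a rough Python sketch. It relies on the Atari 2600's RAM living at CPU addresses 0x80-0xFF, so 0xEA lands at index 0x6A in a 128-byte RAM dump indexed from 0. The helper names (`ram_index`, `ticks_until_key`) are made up for illustration, and the idea that the byte counts upward until it reaches 0x3F is only my reading of the behavior described above, not something verified against the game code.

```python
RAM_BASE = 0x80           # Atari 2600 RAM occupies CPU addresses 0x80-0xFF
RAM_SIZE = 128            # the console has only 128 bytes of RAM
KEY_TIMER_ADDR = 0xEA     # byte mentioned in the comment above
KEY_TIMER_TARGET = 0x3F   # ASSUMPTION: the key appears when the byte reaches this


def ram_index(addr: int) -> int:
    """Map a CPU address to an index into a 128-byte RAM dump."""
    if not RAM_BASE <= addr < RAM_BASE + RAM_SIZE:
        raise ValueError(f"0x{addr:02X} is outside Atari 2600 RAM")
    return addr - RAM_BASE


def ticks_until_key(ram: bytes) -> int:
    """Remaining timer steps, ASSUMING the byte counts up toward the target."""
    if len(ram) != RAM_SIZE:
        raise ValueError("expected a 128-byte RAM dump")
    value = ram[ram_index(KEY_TIMER_ADDR)]
    return max(KEY_TIMER_TARGET - value, 0)


# Fake RAM dump with the timer byte set just below the target.
ram = bytearray(RAM_SIZE)
ram[ram_index(KEY_TIMER_ADDR)] = 0x3D
print(ticks_until_key(bytes(ram)))
```

With a real emulator you would pass in its RAM dump each frame instead of the fake `ram` buffer; how fast the byte actually advances is something you'd have to observe yourself.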