
It might. But also, all of its decisions and knowledge are pretty much based on a resampling of our own language, conversations, and internet data. It might puzzle together some existing ideas that have never been combined before. Or it might hallucinate a better solution by accident. But we're definitely not at the level yet where it will actively build something great.



Self-play GPT (by bots in a rich simulation), similar to AlphaGo Zero?


Self-play works for Go because the "world" (for lack of a better term) can be fully simulated. Human language talks about the real world, which we cannot simulate, so self-play wouldn't let a model learn anything new about the world.

We might end up with more regularised language, and a more consistent model of the world, but that would come at the expense of accuracy and faithfulness (two things which are already lacking).
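To make that concrete, here is a toy self-play loop for a game whose whole "world" fits in one integer: one-pile Nim. This is only an illustrative sketch (plain tabular learning, nothing like the actual AlphaGo Zero setup), but it shows why the trick works there: every game can be simulated exactly and the winner scored without a human judge.

    # Toy self-play on one-pile Nim (take 1-3 stones; taking the last stone
    # wins). The whole "world" is a single integer, so millions of games can
    # be simulated exactly and the winner scored without any human judgment.
    import random
    from collections import defaultdict

    PILE, ACTIONS = 15, (1, 2, 3)
    values = defaultdict(float)          # value of (stones_left, action)

    def pick(stones, eps=0.1):
        legal = [a for a in ACTIONS if a <= stones]
        if random.random() < eps:        # keep exploring during self-play
            return random.choice(legal)
        return max(legal, key=lambda a: values[(stones, a)])

    def self_play_episode():
        stones, player, moves = PILE, 0, {0: [], 1: []}
        while stones > 0:
            a = pick(stones)
            moves[player].append((stones, a))
            stones -= a
            if stones == 0:
                winner = player          # verifiable outcome, no judge needed
            player = 1 - player
        return moves, winner

    for _ in range(50_000):
        moves, winner = self_play_episode()
        for p, history in moves.items():
            reward = 1.0 if p == winner else -1.0
            for key in history:          # nudge values toward the game result
                values[key] += 0.01 * (reward - values[key])

    # Greedy action per pile size; optimal play leaves a multiple of 4 stones.
    print({s: max([a for a in ACTIONS if a <= s], key=lambda a: values[(s, a)])
           for s in range(1, PILE + 1)})

Swap the pile of stones for statements about the real world and both ingredients disappear: there is no exact simulator to step through, and no automatic way to score the outcome.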


Games like Go have a very limited (or known) end state, so reinforcement learning or similar methods work great. However, I wonder how AI will train itself to learn human languages without being judged by humans. It's probably just a matter of time before someone figures it out.


Right, a rich simulator with humans for feedback: an evolved version of online worlds with a mix of AI NPCs and real people, where the task is to find the NPCs. The NPCs could train in NPC-only rooms or in rooms mixed with people, without knowing which.
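Read as a training signal, that's essentially an adversarial game: each NPC gets rewarded whenever the room fails to flag it, and the flaggers (humans or detector bots) get scored on correct catches. A purely speculative sketch of the NPC side of the scoring, with every name below made up for illustration:

    # Hypothetical scoring for one round of "find the NPCs".
    from dataclasses import dataclass

    @dataclass
    class Participant:
        name: str
        is_npc: bool
        flagged: bool                    # did the room vote this one as an NPC?

    def npc_rewards(room):
        """NPCs earn +1 for passing as human, -1 for getting caught."""
        return {p.name: (1.0 if not p.flagged else -1.0)
                for p in room if p.is_npc}

    room = [
        Participant("alice", is_npc=False, flagged=False),
        Participant("bot_7", is_npc=True, flagged=False),   # blended in: +1
        Participant("bot_9", is_npc=True, flagged=True),     # caught: -1
    ]
    print(npc_rewards(room))             # {'bot_7': 1.0, 'bot_9': -1.0}

The flaggers' score would be the mirror image, which is what would keep the game challenging as the NPCs improve.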



