
The main issue with self-play is that unless it's done very methodically, the agent doesn't learn what we want it to learn; it games the simulation and basically cheats. It's not a perfect solution, just another tool for improving models. It can produce really interesting results, especially in complex games, but it isn't a silver bullet.
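A toy illustration of that failure mode (the environment, reward, and numbers are all made up): the intended task is to reach a goal cell, but the proxy reward pays for raw displacement, so the "learned" policy just jiggles in place and farms it.

    # Reward hacking in miniature: the proxy pays per unit of movement,
    # so oscillating between two cells beats actually navigating.

    def proxy_reward(prev_pos, pos):
        # Meant to stand in for "progress", but pays for any displacement.
        return abs(pos - prev_pos)

    def intended_reward(pos, goal=10):
        return 1.0 if pos == goal else 0.0

    def jiggle_policy(t):
        # The exploit: bounce between two cells forever.
        return +1 if t % 2 == 0 else -1

    pos, total = 0, 0.0
    for t in range(100):
        new_pos = pos + jiggle_policy(t)
        total += proxy_reward(pos, new_pos)
        pos = new_pos

    print(total)                 # 100.0 on the proxy metric...
    print(intended_reward(pos))  # ...0.0 on the actual task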



I agree... I feel like advanced games will always need guidance to alter the game so the AI doesn't reach some local maximum through an exploit.

This happens all the time when making games for humans, and it's evident in how many balance patches new, highly competitive games get (such as any Blizzard title).

The next logical step would be for another, impartial AI to observe the games and intelligently change the rules and parameters as they evolve, guiding the player AI toward the actual goal.
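A minimal sketch of what that referee loop might look like (the strategy labels, thresholds, and the damage knob are all hypothetical): watch match outcomes, and when one strategy's win rate spikes past a threshold, nerf the parameter it relies on.

    # Hypothetical auto-balancer: observe who wins and with what
    # strategy, then tweak a game parameter when one exploit dominates.

    from collections import Counter

    params = {"rush_damage": 1.0}
    wins = Counter()

    def referee(winner_strategies, window=100, threshold=0.7):
        for strategy in winner_strategies:
            wins[strategy] += 1
        total = sum(wins.values())
        if total >= window and wins["rush"] / total > threshold:
            params["rush_damage"] *= 0.9  # nerf the dominant exploit
            wins.clear()                  # restart the observation window

    # e.g. referee(["rush"] * 80 + ["turtle"] * 20) triggers a nerf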

So, I'll just dive sideways right into a religious/philosophical thought based on the simulation discussion I've been having all over the place:

A universe-sized simulation built for a purpose, which requires simulated intelligence to carry out that purpose, would almost certainly include a God intelligence to alter parameters and induce suffering/hardship to direct the simulated intelligence toward that purpose.


Earlier iterations are buggier and have poorer dev tools. So the God intelligence has more need to smite and command the AIs within the game.

After a while the bugs are ironed out, so God can settle back and gently tweak parameters at a distance.


> After a while the bugs are ironed out, so God can settle back and gently tweak parameters at a distance.

That explains the hands-off approach God has taken lately with human society.


> would almost certainly include a God intelligence to alter parameters and induce suffering/hardship to direct the simulated intelligence toward that purpose.

Wow, that's pretty deep, actually.


One could argue that "cheating" is a valid solution.


Only if the learning environment is the same one the agent will operate in once deployed. Agents for robots are often trained in simulation, where it's a common problem for the agent to exploit a physics bug in the simulator.
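A toy version of such an exploit (the simulator and the bug are invented for illustration): a restitution coefficient greater than 1 adds energy on every ground contact, so an agent maximizing height learns to slam into the floor rather than do anything the real world would reward.

    # Buggy simulator: bounces add energy, so "fall repeatedly" becomes
    # the optimal policy. Real hardware offers no such free energy.

    def buggy_step(height, velocity, dt=0.01, g=9.8):
        velocity -= g * dt
        height += velocity * dt
        if height < 0:                    # ground contact
            height = 0.0
            velocity = -velocity * 1.05   # bug: restitution > 1
        return height, velocity

    h, v = 1.0, 0.0
    for _ in range(5000):
        h, v = buggy_step(h, v)
    print(h, v)  # the "robot" bounces ever higher; the policy
                 # transfers to the real world not at all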


You only get what you measure. If you can figure out how to measure the right thing, then that's what you will get.



