It confused me that their stated human evaluations compare video clips rather than evaluate gameplay.

Short clips are the only way a human will make any errors determining which is which.

More relevant would be whether they could tell which is which by _playing_ it.

They obviously can within seconds, so it wouldn't be a result. Being able to generate gameplay that looks right even if it doesn't play right is one step.
