tbalsam on Jan 7, 2023 \| on: Playing games with AIs: The limits of GPT-3 and si...

In a sense, I thought it was more human-in-the-loop than an explicit RL objective (i.e. a potentially somewhat limited reward surface, even if there's a reward model trained from it).