Hacker News | dpf's comments

code-davinci-002 is a base LM, and the other 3.5 models (text-davinci-{002,003}, gpt-3.5-turbo, and ChatGPT) use instruction tuning and/or RLHF. Source: https://platform.openai.com/docs/model-index-for-researchers



These environments are often used as a testbed for reinforcement learning, e.g. https://arxiv.org/abs/1502.05477
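The common thread is that these testbeds all expose the same step/reset interface. A toy sketch of that interface (a made-up corridor environment, not any specific benchmark, with a random policy standing in for the RL algorithm's exploration):

```python
import random

class ToyCorridorEnv:
    """Minimal Gym-style environment: an agent walks a 1-D corridor
    and receives reward 1.0 for reaching the right end."""

    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial observation

    def step(self, action):
        # action: 0 = move left, 1 = move right (clamped to [0, length])
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length, self.pos + delta))
        done = self.pos == self.length
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}

# Roll out a random policy until the episode terminates.
random.seed(0)
env = ToyCorridorEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    obs, reward, done, info = env.step(random.choice([0, 1]))
    total_reward += reward
print(total_reward)
```

An RL algorithm like TRPO only interacts with the environment through `reset` and `step`, which is what makes these environments convenient benchmarks.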


"However, a policy that succeeds in simulation often doesn't work when deployed on a real robot. Nevertheless, often the overall gist of what the policy does in simulation remains valid in the real world. In this paper we investigate such settings, where the sequence of states traversed in simulation remains reasonable for the real world, even if the details of the controls are not, as could be the case when the key differences lie in detailed friction, contact, mass and geometry properties."

from Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

https://arxiv.org/abs/1610.03518

by Christiano, Shah, Mordatch, Schneider, Blackwell, Tobin, Abbeel, & Zaremba
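The core idea can be sketched on a toy problem (this is my own illustration, not code from the paper): fit an inverse dynamics model that maps a state transition (s_t, s_{t+1}) to the action a_t that produces it, using transitions collected on the "real" system, then use it to turn a simulated state trajectory into real-world controls. Here the real system is a made-up noise-free linear system and the model is plain linear least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real world": linear dynamics s' = A s + B a (unknown to the learner).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

# Collect (s, a, s') transitions on the real system with random actions.
n = 500
S = rng.normal(size=(n, 2))
U = rng.normal(size=(n, 1))
S_next = S @ A.T + U @ B.T

# Inverse dynamics model: regress the action on (s, s') by least squares.
X = np.hstack([S, S_next])
W, *_ = np.linalg.lstsq(X, U, rcond=None)

# Given a desired transition (e.g. taken from simulation), recover the action.
s = np.array([0.5, -0.2])
a_true = np.array([0.3])
s_next = A @ s + B @ a_true
a_pred = np.hstack([s, s_next]) @ W
print(a_pred)  # recovers a_true on this noise-free toy system
```

The paper's setting is of course harder: the model is a deep network, the dynamics are nonlinear and noisy, and the simulated state sequence is only approximately achievable on the real robot.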

