Hacker News | dpf's comments

code-davinci-002 is a base LM, and the other 3.5 models (text-davinci-{002,003}, gpt-3.5-turbo, and ChatGPT) use instruction tuning and/or RLHF. Source: https://platform.openai.com/docs/model-index-for-researchers



These environments are often used as a testbed for reinforcement learning, e.g. https://arxiv.org/abs/1502.05477
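The common thread is that these testbeds all expose the same step/reset interface. A toy sketch of that interface (a made-up corridor environment, not any specific benchmark, with a random policy standing in for the RL algorithm's exploration):

```python
import random

class ToyCorridorEnv:
    """Minimal Gym-style environment: an agent walks a 1-D corridor
    and receives reward 1.0 for reaching the right end."""

    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial observation

    def step(self, action):
        # action: 0 = move left, 1 = move right (clamped to [0, length])
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length, self.pos + delta))
        done = self.pos == self.length
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}

# Roll out a random policy until the episode terminates.
random.seed(0)
env = ToyCorridorEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    obs, reward, done, info = env.step(random.choice([0, 1]))
    total_reward += reward
print(total_reward)
```

An RL algorithm like TRPO only interacts with the environment through `reset` and `step`, which is what makes these environments convenient benchmarks.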


"However, a policy that succeeds in simulation often doesn't work when deployed on a real robot. Nevertheless, often the overall gist of what the policy does in simulation remains valid in the real world. In this paper we investigate such settings, where the sequence of states traversed in simulation remains reasonable for the real world, even if the details of the controls are not, as could be the case when the key differences lie in detailed friction, contact, mass and geometry properties."

from Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

https://arxiv.org/abs/1610.03518

by Christiano, Shah, Mordatch, Schneider, Blackwell, Tobin, Abbeel, & Zaremba
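The core idea can be sketched on a toy problem (this is my own illustration, not code from the paper): fit an inverse dynamics model that maps a state transition (s_t, s_{t+1}) to the action a_t that produces it, using transitions collected on the "real" system, then use it to turn a simulated state trajectory into real-world controls. Here the real system is a made-up noise-free linear system and the model is plain linear least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real world": linear dynamics s' = A s + B a (unknown to the learner).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

# Collect (s, a, s') transitions on the real system with random actions.
n = 500
S = rng.normal(size=(n, 2))
U = rng.normal(size=(n, 1))
S_next = S @ A.T + U @ B.T

# Inverse dynamics model: regress the action on (s, s') by least squares.
X = np.hstack([S, S_next])
W, *_ = np.linalg.lstsq(X, U, rcond=None)

# Given a desired transition (e.g. taken from simulation), recover the action.
s = np.array([0.5, -0.2])
a_true = np.array([0.3])
s_next = A @ s + B @ a_true
a_pred = np.hstack([s, s_next]) @ W
print(a_pred)  # recovers a_true on this noise-free toy system
```

The paper's setting is of course harder: the model is a deep network, the dynamics are nonlinear and noisy, and the simulated state sequence is only approximately achievable on the real robot.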

