(arxiv: cs > arXiv:2203.14465, Computer Science > Machine Learning, Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman, 2022)
>"We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat.
[...]
Thus, STaR lets an [AI] model improve itself by learning from its own generated reasoning."
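The loop in the abstract can be sketched in a few lines of Python. This is a toy illustration only, not the paper's implementation: `generate_rationale`, `is_correct`, and `fine_tune` are hypothetical stand-ins for the few-shot sampling, answer checking, and fine-tuning steps STaR describes.

```python
def star_iteration(model, dataset, generate_rationale, is_correct, fine_tune):
    """One STaR iteration over (question, answer) pairs.

    All three callables are hypothetical placeholders for the paper's steps:
    few-shot rationale generation, answer checking, and fine-tuning.
    """
    kept = []
    for question, answer in dataset:
        # 1. Generate a rationale and answer, prompted with a few examples.
        rationale, predicted = generate_rationale(model, question)
        if not is_correct(predicted, answer):
            # 2. "Rationalization": if wrong, retry while conditioning
            #    on the correct answer as a hint.
            rationale, predicted = generate_rationale(model, question, hint=answer)
        if is_correct(predicted, answer):
            # 3. Keep only rationales that ultimately yielded correct answers.
            kept.append((question, rationale, answer))
    # 4. Fine-tune on the kept rationales; the outer loop then repeats.
    return fine_tune(model, kept)
```

Repeating `star_iteration` with the updated model is what "bootstraps" successively better reasoning: each round's fine-tuning data comes from the previous round's correct rationales.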
"What is Q<star>?":
https://www.youtube.com/watch?v=Z6E41eXStsU&t=329s
"LLMs [AIs] today still don't reason very well":
https://www.youtube.com/watch?v=Z6E41eXStsU&t=405s
Peter Liu, GSM8K/STaR tweet:
https://www.youtube.com/watch?v=Z6E41eXStsU&t=465s
"STaR: Bootstrapping Reasoning With Reasoning"
https://arxiv.org/abs/2203.14465