(arxiv: cs > arXiv:2203.14465, Computer Science > Machine Learning, Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman, 2022)
>"We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat.
[...]
Thus, STaR lets an [AI] model improve itself by learning from its own generated reasoning."
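The loop in the abstract can be sketched in a few lines of Python. This is a toy illustration only, not the paper's implementation: `generate_rationale`, `is_correct`, and `fine_tune` are hypothetical stand-ins for the few-shot sampling, answer checking, and fine-tuning steps STaR describes.

```python
def star_iteration(model, dataset, generate_rationale, is_correct, fine_tune):
    """One STaR iteration over (question, answer) pairs.

    All three callables are hypothetical placeholders for the paper's steps:
    few-shot rationale generation, answer checking, and fine-tuning.
    """
    kept = []
    for question, answer in dataset:
        # 1. Generate a rationale and answer, prompted with a few examples.
        rationale, predicted = generate_rationale(model, question)
        if not is_correct(predicted, answer):
            # 2. "Rationalization": if wrong, retry while conditioning
            #    on the correct answer as a hint.
            rationale, predicted = generate_rationale(model, question, hint=answer)
        if is_correct(predicted, answer):
            # 3. Keep only rationales that ultimately yielded correct answers.
            kept.append((question, rationale, answer))
    # 4. Fine-tune on the kept rationales; the outer loop then repeats.
    return fine_tune(model, kept)
```

Repeating `star_iteration` with the updated model is what "bootstraps" successively better reasoning: each round's fine-tuning data comes from the previous round's correct rationales.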
"What is Q<star>?":
https://www.youtube.com/watch?v=Z6E41eXStsU&t=329s
"LLMs [AIs] today still don't reason very well":
https://www.youtube.com/watch?v=Z6E41eXStsU&t=405s
Peter Liu, GSM8K/STaR tweet:
https://www.youtube.com/watch?v=Z6E41eXStsU&t=465s
"STaR: Bootstrapping Reasoning With Reasoning"
https://arxiv.org/abs/2203.14465