
OAI revealed on Twitter that there is no "system" at inference time; it's just a model.

Did they maybe expand to a tree during training to learn more robust reasoning? Maybe. But it still comes down to a regular transformer model at inference time.






Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

https://arxiv.org/pdf/2403.09629

> In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting – ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions.

>[...]

>We generate thoughts, in parallel, following all tokens in the text (think). The model produces a mixture of its next-token predictions with and without a thought (talk). We apply REINFORCE, as in STaR, to increase the likelihood of thoughts that help the model predict future text while discarding thoughts that make the future text less likely (learn).


I don't think you can claim you know what's happening internally when OpenAI processes a request. They are a competitive company and will lie for competitive reasons. Most people think Q-Star is doing multiple inferences to accomplish a single task, and that's what all the evidence suggests. Whatever Sam Altman says means absolutely nothing, but I don't think he's claimed they use only a single inference either.
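
For concreteness, "multiple inferences to accomplish a single task" usually means something like sampling several chains of thought and aggregating them. Here's a minimal self-consistency-style sketch in Python, where `generate` is a placeholder for any model call and the answer-voting rule is just an illustrative assumption, not anything OpenAI has confirmed.

    # Minimal sketch of "multiple inferences per task": sample several completions
    # and take a majority vote over the extracted final answers.
    # `generate` is a placeholder for any LLM call; the answer-extraction rule is
    # an assumption for illustration only.
    from collections import Counter
    import random

    random.seed(0)

    def generate(prompt: str) -> str:
        """Placeholder for a sampled LLM completion ending in 'Answer: <x>'."""
        reasoning = random.choice(["short path", "long path", "careful path"])
        answer = random.choice(["42", "42", "41"])  # noisy, but biased toward 42
        return f"Reasoning ({reasoning})... Answer: {answer}"

    def extract_answer(completion: str) -> str:
        return completion.rsplit("Answer:", 1)[-1].strip()

    def solve_with_multiple_inferences(prompt: str, n_samples: int = 8) -> str:
        completions = [generate(prompt) for _ in range(n_samples)]
        votes = Counter(extract_answer(c) for c in completions)
        return votes.most_common(1)[0][0]

    print(solve_with_multiple_inferences("What is 6 * 7? Think step by step."))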

what is “all the evidence”? please share

I recommend getting on Twitter to closely follow the leading individuals in the field of AI, and also watching the leading YouTube channels dedicated to AI research.

can you link to one speculating about multiple inferences for their CoT? i am curious

e: answer to my own question https://x.com/_xjdr/status/1835352391648158189


So far it's been unanimous. Everyone I've heard talk about it believes Strawberry is mainly just CoT. I'm not saying they didn't fine tune a model too, I'm just saying I agree with most people that clever CoT is where most of the leap in capability seems to have come from.

> believes Strawberry is mainly just CoT. I'm not saying they didn't fine tune a model too

You don't see the scaling with respect to token length with non-FT'd CoT like this, in my opinion.


I haven't even added Strawberry support to my app yet, and so haven't checked what its context length is, but you're right that additional context length is a scaling factor that's totally independent of whether CoT is used or not.

I'm just saying whatever they did in their [new] model, I think they also added CoT on top of it, as the outer layer of the onion so to speak.


Source?

> I wouldn't call o1 a "system". It's a model, but unlike previous models, it's trained to generate a very long chain of thought before returning a final answer

https://x.com/polynoamial/status/1834641202215297487


That answer seems to conflict with "in the future we'd like to give users more control over the thinking time".

I've gotten mini to think harder by asking it to, but it didn't produce a better answer. Though now I've run out of usage limits for both of them, so I can't try any more…


I'm not convinced there isn't more going on behind the scenes, but influencing test-time compute via prompt is a pretty universal capability.
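
As a toy example of what "influencing test-time compute via prompt" looks like, here's a small Python helper that just wraps a question in instructions asking for longer deliberation; the wording and the `with_extra_thinking` wrapper are my own illustration, not a documented knob from any provider.

    # Toy example of nudging test-time compute purely through the prompt: wrap the
    # question in instructions that ask for explicit, lengthy deliberation first.
    # The exact wording is an illustrative assumption, not a documented API knob.

    def with_extra_thinking(question: str, min_steps: int = 10) -> str:
        return (
            "Before giving a final answer, reason through the problem in at "
            f"least {min_steps} explicit numbered steps, re-checking each one. "
            "Only then state the final answer on its own line.\n\n"
            f"Question: {question}"
        )

    # Longer prompts like this tend to elicit more output tokens (more test-time
    # compute), though, as noted above, more tokens doesn't guarantee a better answer.
    print(with_extra_thinking("Which is larger, 9.11 or 9.9?"))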

Not in a way that's effectively used: in practice, the papers using CoT compare against a weak baseline, and the benefits level off extremely quickly.

Nobody except for recent DeepMind research has shown test-time scaling like o1.


I've been telling Claude not to give me the obvious answer. That pushed the thinking time up, and the quality of the answers is better. Hope it helps.


