The model performance is driven by chain of thought, but they will not be providing chain of thought responses to the user for various reasons including competitive advantage.

After the release of GPT4 it became very common to fine-tune non-OpenAI models on GPT4 output. I’d say OpenAI is rightly concerned that fine-tuning on chain of thought responses from this model would allow for quicker reproduction of their results. This forces everyone else to reproduce it the hard way. It’s sad news for open weight models but an understandable decision.




The open-source/open-weights models so far have proved that OpenAI doesn't have some special magic sauce. I'm confident we'll soon have a model from Meta or others that's close to this level of reasoning. [Also consider that some of their top researchers have departed]

On a cursory look, the chain of thought appears to be a long series of reasoning steps, evaluated one at a time, with a small backtrack whenever a step hits a negative result, sort of like solving a maze.
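If that reading is right, the shape is roughly depth-first search with backtracking. A tiny illustrative sketch of that pattern (the maze, start/goal, and step order are all made up here, and obviously none of this is OpenAI's actual mechanism):

    # Try a step; back up as soon as a branch hits a negative result.
    def solve(maze, pos, goal, path=None, seen=None):
        path = (path or []) + [pos]
        seen = seen if seen is not None else set()
        if pos == goal:
            return path                       # positive result: keep this line of thought
        seen.add(pos)
        r, c = pos
        for nr, nc in [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]:
            if (0 <= nr < len(maze) and 0 <= nc < len(maze[0])
                    and maze[nr][nc] == 0 and (nr, nc) not in seen):
                found = solve(maze, (nr, nc), goal, path, seen)
                if found:
                    return found
        return None                           # negative result: backtrack one step

    maze = [[0, 1, 0],
            [0, 0, 0],
            [1, 0, 0]]
    print(solve(maze, (0, 0), (2, 2)))        # [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]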


I suspect that the largest limiting factor for a competing model will be the dataset. Unless they somehow used GPT-4 to generate the dataset, this is an extremely novel dataset to have to build.


They almost definitely used existing models for generating it. The human feedback part, however, is the expensive aspect.


I would love to see Meta release a CoT-specialized model as a LoRA we can apply to existing 3.1 models.


Isn't that what Reflection 70B (https://news.ycombinator.com/item?id=41459781) does on top of Llama 3.1?


Reflection 70B is a scam. The creator was just routing requests to Claude.

https://old.reddit.com/r/LocalLLaMA/comments/1fc98fu/confirm...


That's unfortunate. When an LLM makes a mistake it's very helpful to read the CoT and see what went wrong (input error/instruction error/random shit)


Yeah, exposed chain of thought is more useful as a user, as well as being useful for training purposes.


I think we may discover that the model does some cryptic mess inside instead of clean reasoning.


Loops back to: "my code works. why does my code work?"


I’d say it depends. If the model iterates 100x, I’d just say give me the output.

Same with problem solving in my brain: sure, sometimes it helps to think out loud. But taking a break and letting my unconscious do the work is helpful as well. For complex problems that’s actually nice.

I think eventually we don’t care as long as it works or we can easily debug it.


CoT is now their primary method for alignment. Exposing that information would negate that benefit.

I don't agree with this, but it definitely carries more weight in their decision-making than the concern about leaking relevant training info to other models.


This. Please go read and understand the alignment argument against exposing chain of thought reasoning.


Given the significant number of chain-of-thought tokens being generated, it also feels a bit odd to hide them from a cost-fairness perspective. How can we be sure they aren't inflating the count for profit?


That sounds like the GPU labor theory of value that was debunked a century ago.


No, it's the fraud theory of charging for unaccountable usage, which has been repeatedly proven true whenever unaccountable bases for charges have been deployed.


The one-shot models aren't going away for anyone who wants to program the chain-of-thought themselves


Yeah, if they are charging for some specific resource like tokens then it better be accurate. But ultimately utility-like pricing is a mistake IMO. I think they should try to align their pricing with the customer value they're creating.


Not sure why you didn’t bother to check their pricing page (1) before dismissing my point. They are charging significantly more for both input (3x) and output (4x) tokens when using o1.

Per 1M in/out tokens:

GPT-4o: $5 / $15

o1-preview: $15 / $60

(1) https://openai.com/api/pricing


My point is that "cost fairness" is not a thing. Either o1 is worth it to you or it isn't.


It’s really unclear to me what you understood by “cost fairness”.

I’m saying if you charge me per brick laid, but you can’t show me how many bricks were laid, nor can I calculate how many should have been laid - how do I trust your invoice?

Note: The reason I say all this is because OpenAI is simultaneously flailing for funding, while being inherently unprofitable as it continues to boil the ocean searching for strawberries.


If there's a high premium, then one might want to wait for a year or two for the premium to vanish.


Eh it’s not worth it to me because it’s unfair.


It'd be helpful if they exposed a summary of the chain-of-thought response instead. That way they'd not be leaking the actual tokens, but you'd still be able to understand the outline of the process. And, hopefully, understand where it went wrong.


They do, according to the example


That's exactly what I see in the Android app.


When are they going to change the name to reflect their complete change of direction?

Also, what is going to be their excuse to defend themselves against copyright lawsuits if they are going to "understandably" keep their models closed?


[flagged]


AFAIK, they are the least open of the major AI labs. Meta is open-weights and partly open-source. Google DeepMind is mostly closed-weights, but has released a few open models like Gemma. Anthropic's models are fully closed, but they've released their system prompts, safety evals, and have published a fair bit of research (https://www.anthropic.com/research). Anthropic also haven't "released" anything (Sora, GPT-4o realtime) without making it available to customers. All of these groups also have free-usage tiers.


Sure, but none of that publicly existed when OpenAI was named.


> literally anyone can use it for free, you don't even need an account

How can you access it without an account?


chatgpt.com allowed me to, last I checked.


Am I right that this CoT is not actual reasoning in the same way that a human would reason, but rather just a series of queries to the model that still return results based on probabilities of tokens?


Tough question (for me). Assuming the model is producing its own queries, am I wrong to wonder how it's fundamentally different from human reasoning?


It could just be programmed to follow up by querying itself with a prompt like "Come up with arguments that refute what you just wrote; if they seem compelling, try a different line of reasoning, otherwise continue with what you were doing." Different such self-administered prompts along the way could guide it through what seems like reasoning, but would really be just a facsimile thereof.
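A loop like the following would be enough to produce that facsimile. To be clear, ask_model() here is a hypothetical stand-in for whatever completion call the system actually makes, and the prompts and round count are invented for illustration:

    def ask_model(prompt: str) -> str:
        raise NotImplementedError("hypothetical stand-in for a real completion call")

    def reason(question: str, max_rounds: int = 3) -> str:
        draft = ask_model(f"Answer step by step: {question}")
        for _ in range(max_rounds):
            critique = ask_model(
                "Come up with arguments that refute what you just wrote:\n" + draft
            )
            verdict = ask_model(
                f"Are these objections compelling? Answer YES or NO.\n{critique}"
            )
            if verdict.strip().upper().startswith("NO"):
                break                   # objections unconvincing: keep the draft
            draft = ask_model(          # otherwise try a different line of reasoning
                f"The previous attempt was refuted:\n{critique}\n"
                f"Try a different approach to: {question}"
            )
        return draft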


Maybe the model doesn't do multiple queries but just one long query guided by thought tokens.


> I'd say OpenAI is rightly concerned that fine-tuning on chain of thought responses from this model would allow for quicker reproduction of their results.

Why? They're called "Open" AI after all ...


I see chain-of-thought responses in the ChatGPT Android app.


Tested the cipher example, and it got it right. But the "thinking logs" I see in the app look like a summary of the actual chain-of-thought messages, which are not visible.


The o1 models might use multiple methods to come up with an idea; only one of them might be correct, and that's the one they show in ChatGPT. So it just summarises the CoT and does not include the whole reasoning behind it.


I don't understand how they square that with their pretense of being a non-profit that wants to benefit all of humanity. Do they not believe that competition is good for humanity?


Can you explain what you mean by this?


You can see an example of the Chain of Thought in the post; it's quite extensive. Presumably they don't want to release this so that it stays raw and unfiltered and they can better monitor for cases of manipulation or deviation from training. What GP is also referring to is explicitly stated in the post: they also aren't releasing the CoT for competitive reasons, so that competitors like Anthropic presumably are unable to use the CoT to train their own frontier models.


> Presumably they don't want to release this so that it stays raw and unfiltered and they can better monitor for cases of manipulation or deviation from training.

My take was:

1. A genuine, un-RLHF'd "chain of thought" might contain things that shouldn't be told to the user. E.g., it might at some point think to itself, "One way to make an explosive would be to mix $X and $Y" or "It seems like they might be able to poison the person".

2. They want the "Chain of Thought" as much as possible to reflect the actual reasoning that the model is using; in part so that they can understand what the model is actually thinking. They fear that if they RLHF the chain of thought, the model will self-censor in a way which undermines their ability to see what it's really thinking

3. So, they RLHF only the final output, not the CoT, letting the CoT be as frank within itself as any human; and post-filter the CoT for the user.
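As a serving-time sketch of point 3 (the function names here are invented, not OpenAI's API): the raw chain of thought never leaves the server, and only the RLHF'd answer plus a sanitized summary go out:

    def generate_with_cot(question: str) -> tuple[str, str]:
        """Hypothetical: returns (raw_chain_of_thought, final_answer)."""
        raise NotImplementedError

    def summarize(raw_cot: str) -> str:
        """Hypothetical: produces the filtered summary shown in the app."""
        raise NotImplementedError

    def respond(question: str) -> dict:
        raw_cot, final_answer = generate_with_cot(question)
        return {
            "answer": final_answer,           # RLHF'd final output, shown as-is
            "reasoning": summarize(raw_cot),  # raw CoT stays internal
        }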


RLHF is one thing, but now that the training is done it has no bearing on whether or not you can show the chain of thought to the user.


This is a literal quote from the article:

> Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users


At least they're open about not being open. Very meta OpenAI.


I think they mean that you won’t be able to see the “thinking”/“reasoning” part of the model’s output, even though you pay for it. If you could see that, you might be able to infer better how these models reason and replicate it as a competitor


Including the chain of thought would provide competitors with training data.



