You make the assumption that Q* is an LLM, but I think the OpenAI guys know very well that the current LLM architecture cannot achieve AGI.
As the name suggests, this thing is likely using some form of Q-learning algorithm, which makes it closer to the DeepMind models than to a transformer.
My guess is that they pipe their LLM into some Q-learned net. The LLM may transform a natural language task into some internal representation that can then be handled by the Q-learned model, which spits out something that can be transformed back into natural language.
The real world is a space of continuous actions. To this day, Q algorithms have mostly been ones with discrete action outputs. I'd be surprised if a Q algorithm could handle the huge action space of language. Honestly, it's weird they'd consider the Q family. I figured we were done with that after PPO performed so well.
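To make the discrete-action point concrete, here is a minimal sketch of the textbook tabular Q-learning update. It says nothing about what Q* actually is; the table sizes, `alpha`, and `gamma` are made-up values for illustration. The thing to notice is the `max` over the next state's actions, which has to enumerate every action and is why a huge or continuous action space is awkward.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step; Q is a list of per-state action-value lists."""
    # The bootstrap target takes a max over ALL actions in the next state,
    # so the action set must be finite and small enough to enumerate.
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q

# Hypothetical toy problem: 3 states, 2 discrete actions, all values start at 0.
Q = [[0.0, 0.0] for _ in range(3)]
q_update(Q, s=0, a=1, r=1.0, s_next=2)
```

With a vocabulary-sized action set per step (let alone whole sentences), that `max` is the part that gets expensive.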
As an ML programmer, I think that approach sounds far too complicated.
It is always a bad idea to render the output of one neural network into output space before feeding it into another, rather than having them communicate in feature space.
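A toy illustration of why rendering to output space loses information, assuming a hypothetical 2-token output head `W` and two made-up feature vectors. Both vectors collapse to the same token id, so a downstream net that only sees the token cannot tell a confident state from a near-tie.

```python
def to_token(features, W):
    """Project features to logits and keep only the argmax token id."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in W]
    return max(range(len(logits)), key=logits.__getitem__)

# Hypothetical output head: identity projection onto 2 tokens.
W = [[1.0, 0.0], [0.0, 1.0]]

h1 = [0.90, 0.10]  # confident in token 0
h2 = [0.51, 0.49]  # nearly undecided

t1 = to_token(h1, W)
t2 = to_token(h2, W)
# Both feature vectors render to the same token; the difference between
# them is only visible if the second network reads h1/h2 directly.
```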