You make the assumption that Q\* is an LLM, but I think the OpenAI guys know very well...

jansan · on Nov 23, 2023

There is a paper about something called Q*. I have no idea whether they are connected or whether the names match coincidentally.

https://arxiv.org/abs/2102.04518

wegfawefgawefg · on Nov 23, 2023

The real world is a space of continuous actions. To this day, Q algorithms have been ones with discrete action outputs. I'd be surprised if a Q algorithm could handle the huge action space of language. Honestly, it's weird they'd consider the Q family. I figured we were done with that after PPO performed so well.
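
To make the discrete-action point concrete, here is a minimal sketch of a tabular Q-learning step. The toy problem (2 states, 3 actions) and the names `q_update` and `greedy_action` are made up for illustration, not taken from any Q\* work. Note that both the update and the action choice take a max over every action, which is cheap for 3 actions and much less obviously workable for something the size of a language vocabulary.

```python
def greedy_action(q_row):
    """Return the index of the highest-valued action.

    This enumerates every action, which is why the action set must be
    discrete and reasonably small.
    """
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state])  # max over all actions in the next state
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


# Hypothetical toy problem: 2 states, 3 discrete actions, values start at 0.
q_table = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
new_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
best = greedy_action(q_table[0])
```

After this single update, action 1 in state 0 has the only nonzero value, so the greedy choice in state 0 is action 1.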

wegfawefgawefg · on Nov 23, 2023

As an ML programmer, I think that approach sounds far too complicated. It is always a bad idea to render the output of one neural network into output space before feeding it into another, rather than have them communicate in feature space.
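
A rough sketch of what gets lost, assuming the first network emits a score vector and "rendering to output space" means taking the argmax token. The vectors and the function name `render_to_output` are hypothetical, chosen only to illustrate the point.

```python
def render_to_output(scores):
    """Collapse a score vector to a single token id (the argmax)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Two hypothetical outputs of the first network. In the first, token 0
# barely beats token 1; in the second, token 0 wins by a wide margin.
uncertain = [2.0, 1.9, -1.0]
confident = [2.0, -3.0, -1.0]

# Output space: both collapse to the same token, so a downstream network
# cannot tell the two situations apart.
token_a = render_to_output(uncertain)
token_b = render_to_output(confident)

# Feature space: the full vectors are passed along and remain distinct.
features_differ = uncertain != confident
```

Here both vectors render to token 0, while the feature-space inputs still carry the difference in confidence.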