Has anyone tried giving LLMs a scratchpad where the model could e.g. run the pipeline in order, generate the poem, and then explicitly publish it to the user without showing the earlier steps?
The user just sees the "Final Answer" / Finish response from the chain's execution, even if several invocations across different tools & models were required
I remember other people getting similar results, which suggests it's not a hallucination
It is possible that Bing Sydney is doing this or something like that based on the PM's tweet: https://twitter.com/MParakhin/status/1632087709060825088
One approach here would be prompt injection: just insert the 'No' into your own response so ChatGPT tries completing that. Also:
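The prefill trick above is easiest to see against a plain text-completion interface. A minimal sketch, where `complete()` is a hypothetical stand-in for a completion-API call (real endpoints differ in details):

```python
def complete(prompt: str) -> str:
    """Stand-in for a call to a text-completion endpoint."""
    # In practice this would hit a real model API and return its continuation.
    return "..."

def ask_with_prefill(question: str, forced_prefix: str) -> str:
    # Seed the assistant's turn with our own text; the model then
    # continues from that prefix instead of choosing its own opening.
    prompt = f"User: {question}\nAssistant: {forced_prefix}"
    return forced_prefix + complete(prompt)
```

So `ask_with_prefill("Can you do X?", "No")` forces the model to complete a reply that already begins with "No".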
> I speculate that the temperature, when coupled with the mechanism of generating text based on already-generated text, could explain some cases of ChatGPT stupidity. In cases when ChatGPT should be perfectly accurate, the temperature will surely under-optimize its cleverness, and now the entire conversation is broken, because everything else will depend on what foolishness it just wrote.
Absolutely. This is why 'best-of' sampling (not available in ChatGPT's default interface) can be so useful. You decode many different possibilities in parallel, and the ones where the random decoding makes a fatal error will get discarded and you'll get back the most plausible overall one, which is much more likely to be correct.
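Best-of sampling is simple to sketch: draw several stochastic completions and keep the one the model assigns the highest total log-probability. Here `sample_completion()` is a hypothetical stand-in for one decode (a real API would return per-token logprobs you sum):

```python
import random

def sample_completion(prompt: str, temperature: float = 0.9):
    """Stand-in for one stochastic decode; returns (text, total_logprob)."""
    # A real call would return the sampled text plus token logprobs.
    text = f"completion-{random.randrange(1000)}"
    logprob = -random.uniform(1.0, 50.0)
    return text, logprob

def best_of(prompt: str, n: int = 16) -> str:
    # Decode n candidates (in parallel in practice; sequential here for
    # clarity) and return the most plausible one overall. Candidates where
    # a random early token derailed the answer tend to score poorly and
    # get discarded.
    candidates = [sample_completion(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: c[1])[0]
```

This is essentially what the old OpenAI completions API's `best_of` parameter did server-side.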
I'm a bit embarrassed to point it out, but "real" research finetunes internal models to play a particular role, rather than orchestrating several "conversations" and hoping your prompt will get you the right output format 100% of the time, etc.
Here's a woefully lacking diagram of this user/interpreter/LLM flow for a cohesive longform story generator. 
The coolest part of this design pattern you've ID'd is you can always add one more character/conversation that the interpreter orchestrates
e.g. a DB character whose role is to take a new page as input and output an updated DB, where the DB is all the important facts to sustain over the story. That let me scale to 16+ "pages"
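A minimal sketch of that interpreter loop, with a writer character and a DB character. The `llm()` helper is a hypothetical stand-in for a chat-model call with a role-setting system prompt; the role names and prompts are illustrative, not the original design:

```python
def llm(system_role: str, prompt: str) -> str:
    """Stand-in for a chat-model call under a role-setting system prompt."""
    return f"[{system_role} output for: {prompt[:30]}...]"

def write_story(premise: str, n_pages: int = 16) -> list:
    db = premise      # running "database" of facts to sustain
    pages = []
    for _ in range(n_pages):
        # The writer character sees only the compact DB, not all prior
        # pages, so its context stays small however long the story gets.
        page = llm("writer", f"Known facts:\n{db}\n\nWrite the next page.")
        pages.append(page)
        # The DB character folds the new page back into the fact store.
        db = llm("db", f"Current facts:\n{db}\n\nNew page:\n{page}\n\n"
                       "Output the updated fact list.")
    return pages
```

Adding one more character is just one more `llm(role, ...)` call inside the loop, which is what makes the pattern so extensible.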
Me: Reverse the digits of 12+39
ChatGPT: The sum of 12 and 39 is 51. If you reverse the digits, you get 15.
Me: Reverse the digits of 12 + 84. Only respond with the reversed digits, no explanation
ChatGPT: The reversed digits of 12 + 84 are 96.
Which makes me think that longer explanations give it more of a chance to think, because each generated token is another forward pass through the model. Weird!
I think the most interesting potential development of this concept would be to give it the ability to spawn child instances to process subtasks (such that each subtask gets its own token window!) and produce intermediate results that it would then combine. It can be done manually (copy/paste) with a lot of handholding; the trick is to come up with a way to automate it, such that it's clear which part of the output is a request to spawn a submodel + its prompt, and the result is also communicated in some way that's clear to the model.
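One way to automate that handshake is a marker convention: the model emits its spawn requests inside sentinel tags, an outer loop detects them, runs each child in a fresh context, and splices the results back in. A sketch under those assumptions, with `llm()` and the `<spawn>` tag protocol both hypothetical:

```python
import re

def llm(prompt: str) -> str:
    """Stand-in for a model call that gets its own fresh token window."""
    return "done: " + prompt[:40]

SPAWN = re.compile(r"<spawn>(.*?)</spawn>", re.DOTALL)

def run(prompt: str, depth: int = 0, max_depth: int = 3) -> str:
    out = llm(prompt)
    if depth >= max_depth:
        return out  # cap recursion so children can't spawn forever
    # Replace each spawn request with the child's (recursively resolved)
    # result, then give the parent one more turn with subresults inlined.
    def resolve(m):
        return run(m.group(1).strip(), depth + 1, max_depth)
    resolved = SPAWN.sub(resolve, out)
    if resolved != out:
        out = llm(f"Subtask results are inlined below; continue:\n{resolved}")
    return out
```

The depth cap and the unambiguous tag syntax are the load-bearing parts: without them a child's output could itself be mistaken for a spawn request, or recursion could run away.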
And anyway, probably most of the compute is used to judge the social standing of the person asking the question. And if it is worth bothering to answer it ;)
I guided it to write a program for me, which it did correctly, and then I asked it to evaluate the program on different numeric inputs. It got correct answers for small numbers and the first few positions of map(thefunction,[1,2,3,4,5,6,7,8,9]) before wandering off into bad guesses.