Hacker News

The challenge here is that ChatGPT and other LLMs can only think out loud. They only "think" through writing, and that's always displayed to the user.

Has anyone tried giving LLMs a scratchpad where the model could e.g. run the pipeline in order, generate the poem, and then explicitly publish it to the user without showing the earlier steps?

They have! The ReAct[1] model, which is available in LangChain[2]. It can be quite powerful, especially when given access to search tools.

The user just sees the "Final Answer" / Finish response from the chain's execution, even if reaching it required several calls across different tools and models.

1: https://react-lm.github.io/

2: https://langchain.readthedocs.io/en/latest/modules/agents/im...
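
The core ReAct loop is easy to sketch without LangChain. Here's a toy harness with a stubbed model and a fake search tool (everything below is illustrative, not the LangChain API): the model alternates Thought/Action lines, the harness runs tools and feeds back Observations, and only the Final Answer reaches the user.

```python
import re

def react_loop(model, tools, question, max_steps=5):
    """Minimal ReAct loop: the model emits Thought/Action steps, the harness
    executes tools and appends Observations, and only the text after
    'Final Answer:' is returned to the user."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)  # e.g. "Thought: ...\nAction: search[foo]"
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.*)", step)
        if final:
            return final.group(1)  # only this is shown to the user
        action = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if action:
            name, arg = action.groups()
            transcript += f"Observation: {tools[name](arg)}\n"
    return None

# Stubs standing in for a real LLM and a real search API:
def fake_model(transcript):
    if "Observation" not in transcript:
        return "Thought: I should look this up.\nAction: search[capital of France]"
    return "Thought: I have the answer.\nFinal Answer: Paris"

answer = react_loop(fake_model,
                    {"search": lambda q: "Paris is the capital of France."},
                    "What is the capital of France?")
print(answer)  # Paris
```

The intermediate Thought/Action/Observation lines all stay in `transcript`, which is exactly the "scratchpad hidden from the user" being asked about above.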

Bing Chat according to the leaks[1] uses an inner monologue.

I remember other people getting similar results, which suggests it's not a hallucination.

[1] https://www.reddit.com/r/bing/comments/11ironc/bing_reveals_...

If you tweaked inner-monologue prompts to specify delimiters like pipes, then you could presumably parse the monologue out before showing the response to the reader.

It's possible that Bing/Sydney is doing this, or something like it, based on the PM's tweet: https://twitter.com/MParakhin/status/1632087709060825088
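
The parsing side is trivial if the prompt reliably produces the delimiter. A sketch, assuming a prompt that asks for "inner monologue | visible reply" (the function name and delimiter are made up for illustration):

```python
def split_monologue(raw, delimiter="|"):
    """If the model was prompted to emit 'inner monologue | visible reply',
    show the user only the part after the last delimiter."""
    if delimiter in raw:
        monologue, reply = raw.rsplit(delimiter, 1)
        return monologue.strip(), reply.strip()
    return "", raw.strip()  # no delimiter: treat the whole thing as visible

hidden, visible = split_monologue(
    "The user seems frustrated; I should apologize first. | I'm sorry about that!")
print(visible)  # only this reaches the reader
```

The fragile part isn't the parsing, it's getting the model to emit the delimiter 100% of the time, which is presumably why people also log the `hidden` half for debugging.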


One approach here would be prompt injection: just insert the 'No' into your own response so ChatGPT tries completing that. Also:

> I speculate that the temperature, when coupled with the mechanism of generating text based on already-generated text, could explain some cases of ChatGPT stupidity. In cases when ChatGPT should be perfectly accurate, the temperature will surely under-optimize its cleverness, and now the entire conversation is broken, because everything else will depend on what foolishness it just wrote.

Absolutely. This is why 'best-of' sampling (not available in ChatGPT's default interface) can be so useful. You decode many different possibilities in parallel, and the ones where the random decoding makes a fatal error will get discarded and you'll get back the most plausible overall one, which is much more likely to be correct.
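
The selection step itself is just an argmax over scored candidates. A sketch with made-up completions and made-up log-probabilities standing in for real parallel decodes:

```python
def best_of(candidates, score):
    """Best-of decoding sketch: draw several candidate completions at
    temperature > 0, then keep the one scored as most plausible
    (e.g. highest total log-probability under the model)."""
    return max(candidates, key=score)

# Stub: completions paired with invented log-probs in place of model scores.
candidates = [
    ("12 + 39 = 51", -0.2),   # plausible -> high log-prob
    ("12 + 39 = 41", -1.7),   # decoding slip
    ("12 + 39 = 511", -2.3),  # worse slip
]
best = best_of(candidates, score=lambda c: c[1])
print(best[0])
```

In the old OpenAI Completions API this was exposed directly as the `best_of` parameter; the chat interface never surfaced it.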

Yes, hopefully I'll write it up soon. TL;DR: I built this on top of GPT-3.5 to generate a magazine page of personalized recommendations: three sets of a title, a paragraph, AI art, a font name, and a rationale. For images, I use SD 2.1 via stability.ai. Be sure to add "5400 dpi digital art" at the front of your prompt :)

I'm a bit embarrassed to, since "real" research finetunes internal models to play a particular role, rather than orchestrating several "conversations" and hoping your prompt gets you the right output format 100% of the time, etc.

Here's a woefully lacking diagram of this user/interpreter/LLM flow for a cohesive longform story generator. [1]

The coolest part of this design pattern you've identified is that you can always add one more character/conversation for the interpreter to orchestrate.

E.g. a DB character whose role is to take a new page as input and output the updated DB, where the DB holds all the important facts to sustain over a story. That let me scale to 16+ "pages".

[1] https://twitter.com/jpohhhh/status/1632082749317054468?s=20

You can ask GPT what the result of executing a Python program would be, one that needs a multi-step calculation. It will readily output the result, with no thinking aloud.

I just had this interaction with ChatGPT.

Me: Reverse the digits of 12+39

ChatGPT: The sum of 12 and 39 is 51. If you reverse the digits, you get 15.

Me: Reverse the digits of 12 + 84. Only respond with the reversed digits, no explanation

ChatGPT: The reversed digits of 12 + 84 are 96.

Which makes me think that longer explanations give it more of a chance to think, because it gets more passes through the model. Weird!
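
For reference, a two-line check confirms the terse second answer really was wrong (96 reversed is 69, not 96):

```python
def reversed_digits(expr):
    """Evaluate the arithmetic expression, then reverse the digits."""
    return str(eval(expr))[::-1]

print(reversed_digits("12 + 39"))  # "15" -- matches ChatGPT's first, verbose answer
print(reversed_digits("12 + 84"))  # "69" -- not the "96" ChatGPT gave
```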

It's never going to be great at math problems; it's a language model.

I wonder if ChatGPT could be "wired up" to https://www.wolfram.com/ somehow to "strengthen" that "weakness"?

Yes. And if you give it a database schema, it can answer free-form questions about the data in it by generating SQL queries, so long as you wire up the results (or just manually copy/paste them). Although it does hallucinate fields in tables sometimes - but if your wiring reports errors in a readable way, it will usually self-correct.
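
The error-feedback wiring described above is a short loop. A sketch with a stub model (the function names and the two-attempt stub are made up; the point is feeding the SQL error back into the prompt):

```python
import sqlite3

def ask_with_sql(model, question, schema, conn, max_retries=3):
    """Turn a question into SQL via the model; on execution errors, append
    the error text to the prompt so the model can self-correct."""
    prompt = f"Schema:\n{schema}\nQuestion: {question}\nSQL:"
    for _ in range(max_retries):
        sql = model(prompt)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as e:
            prompt += f"\n{sql}\nError: {e}\nCorrected SQL:"
    return None

# Stub model: first hallucinates a column, then "self-corrects" on retry.
answers = iter(["SELECT username FROM users", "SELECT name FROM users"])
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('ada')")
rows = ask_with_sql(lambda p: next(answers), "Who are the users?",
                    "users(name TEXT)", conn)
print(rows)  # [('ada',)]
```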

I think the most interesting potential development of this concept would be to give it the ability to spawn child instances to process subtasks (such that each subtask gets its own token window!) and produce intermediate results that it would then combine. It can be done manually (copy/paste) with a lot of handholding; the trick is to come up with a way to automate it, such that it's clear which part of the output is a request to spawn a submodel plus its prompt, and the result is also communicated back in a way that's clear to the model.
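
One way to make the spawn request machine-readable: reserve a marker line like `SPAWN:` in the output. A sketch with a stub model (the marker, names, and stub replies are all invented for illustration):

```python
def solve(model, task, depth=0, max_depth=2):
    """If the model's reply contains SPAWN: lines, each subtask gets its own
    fresh call (its own token window); the results are then combined in a
    follow-up call."""
    reply = model(task)
    spawns = [line[len("SPAWN: "):] for line in reply.splitlines()
              if line.startswith("SPAWN: ")]
    if not spawns or depth >= max_depth:
        return reply
    results = [solve(model, sub, depth + 1, max_depth) for sub in spawns]
    return model(f"Task: {task}\nSubtask results: {results}\nCombine:")

# Stub model: splits the top-level task, answers subtasks, combines results.
def fake_model(prompt):
    if prompt == "summarize two documents":
        return "SPAWN: summarize doc A\nSPAWN: summarize doc B"
    if prompt.startswith("Task:"):
        return "combined summary"
    return "summary of " + prompt.removeprefix("summarize ")

out = solve(fake_model, "summarize two documents")
print(out)  # combined summary
```

The `max_depth` guard matters: without it, a model that keeps emitting SPAWN lines would recurse forever.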

Or it could write code in python and evaluate it, people are experimenting with that sort of thing.
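
The usual shape of those experiments: extract a fenced code block from the reply and execute it. A sketch (the `result` convention is made up, and a real system would need sandboxing, since exec'ing model output is dangerous):

```python
import re

FENCE = "`" * 3 + "python"  # a ```python fence, built to avoid nesting issues

def run_generated_code(reply):
    """Pull the first fenced python block out of a model reply, exec it,
    and return whatever the code stores in `result`."""
    match = re.search(FENCE + r"\n(.*?)" + "`" * 3, reply, re.DOTALL)
    if not match:
        return None
    namespace = {}
    exec(match.group(1), namespace)  # NOT safe on untrusted output
    return namespace.get("result")

reply = "Sure! Here's the code:\n" + FENCE + "\nresult = sum(range(10))\n" + "`" * 3
value = run_generated_code(reply)
print(value)  # 45
```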

OOoo… Hook that up to the ChatGPT API and let it modify itself with additional code? SkyNet / Matrix here we come!

The amount of compute applied to the problem is roughly linear in the number of input+output tokens. It is hard to predict at what stage the compute goes to parsing and creating the embedding representing the problem, and when it goes to actually solving it.

And anyway, probably most of the compute is used to judge the social standing of the person asking the question. And if it is worth bothering to answer it ;)

Does it have a python coprocessor?

I guided it to write a program for me, which it did correctly, and then I asked it to evaluate it on different numeric inputs. It got correct answers for small numbers and the first few positions of map(thefunction, [1,2,3,4,5,6,7,8,9]) before wandering off into bad guesses.
