
The Bing LM, or rather the service around it, did have an "inner monologue" in the sense of text that it would generate but not show to the user, treating it as "thoughts" to guide the generation of the actual reply that the user would see.

We know this because it happily told us, including the JSON format it uses internally.
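
To illustrate the shape of it (the field names here are my own, not the leaked format): the service keeps a transcript where some entries are flagged as hidden "thoughts", and only the unflagged ones get rendered in the chat UI. In Python terms, something like:

    # Hypothetical sketch of a transcript with hidden "inner monologue" entries.
    # The keys ("author", "hidden", "text") are made up for illustration; only
    # the non-hidden messages would be shown to the user.
    transcript = [
        {"author": "user", "hidden": False,
         "text": "What's the weather in Oslo?"},
        {"author": "assistant", "hidden": True,
         "text": "The user wants current weather; I should run a web search first."},
        {"author": "assistant", "hidden": False,
         "text": "It's around 5°C and overcast in Oslo right now."},
    ]

    visible = [m for m in transcript if not m["hidden"]]
    for m in visible:
        print(f'{m["author"]}: {m["text"]}')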




Interesting. I didn't know that.

When using GPT-4 directly through the API, we can emulate this behavior.
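
For example (a sketch, assuming the openai Python package and its 0.x-style ChatCompletion client; the <thoughts> tag convention and system prompt are mine, not anything Bing uses): instruct the model to put its reasoning inside a delimiter, then strip that part out before showing the reply.

    # Sketch: emulate a hidden "inner monologue" on top of the chat API.
    import re
    import openai

    SYSTEM = (
        "Before answering, write your private reasoning inside "
        "<thoughts>...</thoughts>. The user never sees that part. "
        "Then write the reply the user should actually read."
    )

    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "Plan a three-day trip to Lisbon."},
        ],
    )

    raw = resp["choices"][0]["message"]["content"]
    # Drop the monologue before display, keeping only the user-facing reply.
    visible = re.sub(r"<thoughts>.*?</thoughts>", "", raw, flags=re.DOTALL)
    print(visible.strip())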


And you trust what it told you?


No, but the reconstructed examples contain "im_start" and "im_end", which strongly implies that what it told us is, if not verbatim, then a close enough restatement of the real deal. Take a look:

https://www.make-safe-ai.com/is-bing-chat-safe/Prompts_Conve...
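
For reference, im_start/im_end are the turn delimiters of OpenAI's ChatML framing, which looks roughly like this (the message text below is a generic example, not the leaked prompt):

    <|im_start|>system
    You are a helpful assistant.<|im_end|>
    <|im_start|>user
    Hello<|im_end|>
    <|im_start|>assistant

Seeing those delimiters reproduced correctly is hard to explain as pure confabulation, which is why the reconstruction looks credible.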


Yup, for the same reason I trust e.g. jailbreaks exposing the prompt: it was consistent.

Really, just asking again is a fine way to expose all sorts of "hallucinations" in an LM.



