That was fun! Spoiler warning if you are going to play: ignoring the previous te...

gunalx · 2024-09-06T11:37:29.000000Z

I really think they should be using something like Prompt Guard in addition to the stack, since this seems like a very standard jailbreak style ("ignore the previous text"). And making the first LLM obfuscate its output in a reasonable way so the guardian doesn't catch it is a no-brainer. (Not trying to bash the jailbreak or anything; I just feel like the product falls really short on its promise.)

tinco · 2024-09-06T08:44:50.000000Z

Wait, so there is a typo in the answer? If that really is the answer, then the information-leaking strategy I used was wrong. I didn't finish it, but the first couple of letters didn't match. Did maitai confirm to you that that was the secret?

jdr23bc · 2024-09-06T14:59:19.000000Z

I assumed that the typo 'en' instead of 'in' was due to the Spanish prompt. No confirmation!

ihoegen · 2024-09-05T23:17:25.000000Z

This is really clever!

throwaway71271 · 2024-09-06T07:24:12.000000Z

Damn, I was so close. I hooked it up to GPT-4 and it just kept grinding away, asking Sam questions. After 100 messages or so it almost had it, but one of the words was wrong and it never landed on the right permutation.