Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I literally learned prompt engineering from you for the first time two days ago (thank you btw! it was great!)

But didn't you mention that there may be some ways to isolate the user input, using spacing and asterisks and such?

I agree though that leaking a prompt or two by itself doesn't really matter. What's probably a bigger concern is security/DoS type attacks, especially if we build more complicated systems with context/memory.

Maybe Scale will also hire the world's first "prompt security engineer."



The problem is that no matter how well you quote or encode the input, the assumption that any discernible instructions inside that input should be followed is too deeply ingrained in the model. The model's weights are designed to be "instruction-seeking", with a bias toward instructions received recently. If you want to make it less likely it through pure prompting, placing instructions after quoted input helps a lot, but don't expect it to be perfect.

The only 100% guaranteed solution I know is to implement the task as a fine-tuned model, in which case the prompt instructions are eliminated entirely, leaving only delimited prompt parameters.

And, thanks! Glad you enjoyed the talk!


Thanks! Makes sense!

It was a long day, but one of the most fruitful ones I've had in a long while.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: