Hacker News new | past | comments | ask | show | jobs | submit login

The problem is that no matter how well you quote or encode the input, the assumption that any discernible instructions inside that input should be followed is too deeply ingrained in the model. The model's weights are designed to be "instruction-seeking", with a bias toward instructions received recently. If you want to make it less likely it through pure prompting, placing instructions after quoted input helps a lot, but don't expect it to be perfect.

The only 100% guaranteed solution I know is to implement the task as a fine-tuned model, in which case the prompt instructions are eliminated entirely, leaving only delimited prompt parameters.

And, thanks! Glad you enjoyed the talk!




Thanks! Makes sense!

It was a long day, but one of the most fruitful ones I've had in a long while.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: