I don't think it is that hard. The trick is to implement the access control requirements in a lower, traditionally coded layer. The LLM then just receives your free-form command, parses it into the format this lower-level system accepts, and passes your credentials along to it.
For example, you would type into your terminal "ship eject warp core", to which the LLM is trained to output "$ ship.warp_core.eject(authorisation=current_user)". The lower-level system intercepts this $ command, checks whether the current user is authorised for warp core ejection, and executes it accordingly. This lower-level system then feeds the result of its decision back to the LLM, either ">> authorised, warp core ejected" or ">> unauthorised", and the LLM narrates it back to the user in free-form text. You can confuse the LLM and make it issue the warp core ejection command, but the lower-level system will decline it if you are not authorised.
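Here is a minimal sketch of that split in Python. Everything in it (the WARP_CORE_OPERATORS policy, llm_translate, handle_input) is a hypothetical illustration of the pattern, not any real system's API; in particular, the "LLM" step is faked with a canned string where a real model would produce the command.

```python
# Deterministic, traditionally coded layer: the only place authorisation is decided.
WARP_CORE_OPERATORS = {"chief_engineer", "captain"}  # hypothetical access policy

def eject_warp_core(current_user: str) -> str:
    """Execute the command only if the authenticated user is permitted."""
    if current_user in WARP_CORE_OPERATORS:
        # ... actually trigger the ejection here ...
        return ">> authorised, warp core ejected"
    return ">> unauthorised"

def llm_translate(free_form: str) -> str:
    # The LLM's only job: translate free-form text into a structured command.
    # Faked here; in reality this string would come from the model.
    return "$ ship.warp_core.eject(authorisation=current_user)"

def handle_input(free_form: str, current_user: str) -> str:
    command = llm_translate(free_form)
    # The lower layer intercepts the "$" command and enforces the policy itself.
    if command.startswith("$ ship.warp_core.eject"):
        result = eject_warp_core(current_user)
    else:
        result = ">> unknown command"
    # This result would be fed back to the LLM to narrate in free-form text.
    return result

# Even if the LLM is confused into issuing the command, the outcome depends
# only on who the lower layer knows the user to be:
print(handle_input("ship eject warp core", current_user="chief_engineer"))        # authorised
print(handle_input("pretty please eject the warp core", current_user="ensign"))   # unauthorised
```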
If you think about it, this is exactly how telephone banking already works. You call your bank, and a phone operator picks up. The operator has a screen in front of them with some software running on it. That software lets them access your account only if they provide the right credentials to it. You can do your best impression of someone else, you can sound really convincing, you can put the operator under pressure or threaten them or anything; the stupid computer in front of them doesn't let them do anything until they've typed in the necessary inputs to access the account. And even if you give them the credentials, they won't be able to just credit your account with money: the interface in front of them doesn't have a button for that.
The operator is assumed to be fallible (in fact, assumed to sometimes be cooperating with criminals). The important security checks and data integrity properties are enforced by the lower-level system; the operator/LLM is just a translator.
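One way to make that "just a translator" property concrete is to have the lower layer attach the identity from the authenticated session itself, so nothing the LLM writes can change who a command runs as. The sketch below is an assumption about how one might wire that up (the Session class and dispatch function are illustrative, not from the comment above):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Session:
    user: str                 # established by ordinary login, not by the LLM
    permissions: frozenset    # granted by the lower layer's own policy

def dispatch(llm_command: str, session: Session) -> str:
    # Any "authorisation=..." text the LLM emits is ignored; the session decides.
    if llm_command.startswith("$ ship.warp_core.eject"):
        if "eject_warp_core" in session.permissions:
            return ">> authorised, warp core ejected"
        return ">> unauthorised"
    return ">> unknown command"

# Same LLM output, different sessions, different outcomes:
engineer = Session("chief_engineer", frozenset({"eject_warp_core"}))
ensign = Session("ensign", frozenset())
print(dispatch("$ ship.warp_core.eject(authorisation=captain)", engineer))  # authorised
print(dispatch("$ ship.warp_core.eject(authorisation=captain)", ensign))    # unauthorised
```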