I think you have a valid point, but the risk here feels exaggerated to me.
There were already a few entities I didn't need the assistant to use (excluded not for security reasons, but to shorten the system prompt), and I simply filtered them out within the Jinja template itself. I can see this being a problem for people who have their ovens or thermostats on HA, but I don't think it's an unsolvable issue if we implement sensible sanity checks on the output.
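For illustration, here's a rough Python sketch of what I mean by both ideas, filtering what the model can see and sanity-checking what it asks for; the names (ALLOWED_DOMAINS, entities_for_prompt, sanity_check, etc.) are made up for this example, not taken from any particular integration:

```python
# Rough sketch only; names and structure are illustrative, not from a real integration.

# Domains the assistant is allowed to see and control.
ALLOWED_DOMAINS = {"light", "switch", "media_player"}
# Specific entities to hide regardless of domain.
EXCLUDED_ENTITIES = {"lock.front_door", "climate.oven"}

def entities_for_prompt(states: list[dict]) -> list[dict]:
    """Filter the entity list before it ever reaches the system prompt."""
    return [
        s for s in states
        if s["entity_id"].split(".")[0] in ALLOWED_DOMAINS
        and s["entity_id"] not in EXCLUDED_ENTITIES
    ]

def sanity_check(action: dict) -> bool:
    """Reject any model-proposed service call outside the allowlist."""
    domain, _, _ = action.get("service", "").partition(".")
    return (
        domain in ALLOWED_DOMAINS
        and action.get("entity_id") not in EXCLUDED_ENTITIES
    )
```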
Hilariously, the model I'm using doesn't even have any RLHF. But I'm also not very concerned if GLaDOS decides to turn on the coffee machine. Maybe I'd be slightly more concerned if I had a smart lock, but primitive methods such as "throw big rock at window" would be far easier for a bad actor.
When it comes to jailbreak prompts, you need to be able to call the assistant in the first place. If you're already authorized to call the Home Assistant API, why would you bother with the LLM? Just call the respective API directly and do whatever evil thing you had in mind. I took an unreasonable number of measures to try to stop that from happening, and I admit it's a risk, but I don't think it's a risk created by the LLM so much as by the existence of IoT devices in the first place.
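To put that concretely: anyone who already holds a valid access token can hit the Home Assistant REST API themselves, no LLM in the loop. Something along these lines (host, token, and entity are placeholders):

```python
import requests

# Anyone holding a valid token can do this directly; the LLM adds nothing.
# Host, token, and entity_id are placeholders.
HA_URL = "http://homeassistant.local:8123"
TOKEN = "your-long-lived-access-token"

requests.post(
    f"{HA_URL}/api/services/switch/turn_on",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"entity_id": "switch.coffee_machine"},
    timeout=10,
)
```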