A. Implement guardrails (like already done against prompt injection).
Invariant blog post mentions this:
> Conclusion: Agents require extensive, highly-contextual guardrailing and security solutions
> As one of our core missions at Invariant, we absolutely cannot stress enough how important it is to rely on extensive guardrailing with AI models and their actions. We come to this conclusion repeatedly, as part of our research and engineering work on agentic systems. The MCP ecosystem is no exception to this rule. Security must be implemented end-to-end, including not only the tool descriptions but also the data that is being passed to and from the AI model.
B. Version the tool descriptions so that they can be pinned and do not change (same way we do for libraries and APIs).
C. Maybe in future, LLMs can implement some sort of "instruction namespacing" - where the developer would be able to say any instruction in this prompt is only applicable when doing X, Y, Z.
Invariant blog post mentions this:
> Conclusion: Agents require extensive, highly-contextual guardrailing and security solutions
> As one of our core missions at Invariant, we absolutely cannot stress enough how important it is to rely on extensive guardrailing with AI models and their actions. We come to this conclusion repeatedly, as part of our research and engineering work on agentic systems. The MCP ecosystem is no exception to this rule. Security must be implemented end-to-end, including not only the tool descriptions but also the data that is being passed to and from the AI model.
B. Version the tool descriptions so that they can be pinned and do not change (same way we do for libraries and APIs).
C. Maybe in future, LLMs can implement some sort of "instruction namespacing" - where the developer would be able to say any instruction in this prompt is only applicable when doing X, Y, Z.