I have seen posts about how you can use LLM to proxy queries into your dataset. This seems powerful, but how to prevent it from being abused or hacked? Are there other strategies to deploy a text to SQL use case?
Yes! I've worked on something [1] in the space lately, and can share learnings. It generates queries, which gets fed to the application layer, and it executes them based on some deterministic controls.
Basically, if you're working with a database that has some actually useful/critical data, put an application layer in the middle that can gatekeep LLM access.
- Depending on your risk appetite, make a decision on providing a read-only URL to the thing that interfaces with the LLM.
- For maximum control over the queries, you can wrap then in functions (tools) and let the LLM only call the tools. But this takes away a lot of the power. What we've basically done is allowing the LLM to read the context progressively [2] and give it free reign over the queries.
- The said application layer can ask for approval on all of the queries. (Human in the loop). [3]
- (Optionally) If your query returns a lot of data, putting all of that through the LLM context might be expensive. You need a way to return data out of band from a query, directly to the user.
Basically, if you're working with a database that has some actually useful/critical data, put an application layer in the middle that can gatekeep LLM access.
- Depending on your risk appetite, make a decision on providing a read-only URL to the thing that interfaces with the LLM.
- For maximum control over the queries, you can wrap then in functions (tools) and let the LLM only call the tools. But this takes away a lot of the power. What we've basically done is allowing the LLM to read the context progressively [2] and give it free reign over the queries.
- The said application layer can ask for approval on all of the queries. (Human in the loop). [3]
- (Optionally) If your query returns a lot of data, putting all of that through the LLM context might be expensive. You need a way to return data out of band from a query, directly to the user.
[1] https://github.com/inferablehq/inferable/tree/main/data-conn...
[2] https://github.com/inferablehq/inferable/blob/ec504b8593f41b...
[3] https://github.com/inferablehq/inferable/tree/main/data-conn...
reply