Ask HN: How to Safeguard an LLM-to-SQL Prompt from SQL Injection?
3 points by halamadrid 2 days ago | 4 comments
I have seen posts about how you can use an LLM to proxy queries into your dataset. This seems powerful, but how do you prevent it from being abused or hacked? Are there other strategies for deploying a text-to-SQL use case?





Yes! I've worked on something [1] in this space lately and can share some learnings. It generates queries, which get fed to the application layer, which executes them based on some deterministic controls.

Basically, if you're working with a database that has actually useful/critical data, put an application layer in the middle that gatekeeps LLM access (a rough sketch of that layer follows below the links).

- Depending on your risk appetite, decide whether the thing that interfaces with the LLM gets a read-only connection URL.

- For maximum control over the queries, you can wrap them in functions (tools) and let the LLM call only those tools, but that takes away a lot of the power. What we've basically done is let the LLM read the context progressively [2] and give it free rein over the queries.

- The application layer can ask for approval on every query (human in the loop). [3]

- (Optionally) If your query returns a lot of data, putting all of it through the LLM context might be expensive. You need a way to return data from a query out of band, directly to the user.

[1] https://github.com/inferablehq/inferable/tree/main/data-conn...

[2] https://github.com/inferablehq/inferable/blob/ec504b8593f41b...

[3] https://github.com/inferablehq/inferable/tree/main/data-conn...
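To make those deterministic controls concrete, here's a minimal sketch of such an application layer. It is not the actual Inferable code; it assumes a hypothetical SQLite file app.db, a read-only connection, and a console prompt for the approval step:

    # Minimal sketch of an application-layer gatekeeper for LLM-generated SQL.
    # Assumes "app.db" is a hypothetical SQLite file; swap for your own DB driver.
    import sqlite3

    def approved_by_human(sql: str) -> bool:
        # Human in the loop: show the query and require an explicit yes.
        return input(f"Run this query?\n{sql}\n[y/N] ").strip().lower() == "y"

    def run_llm_query(sql: str, row_limit: int = 100):
        # Deterministic checks before anything touches the database.
        if not sql.lstrip().lower().startswith("select"):
            raise ValueError("Only SELECT statements are allowed")
        if not approved_by_human(sql):
            raise PermissionError("Query rejected by operator")

        # Read-only connection, so even a malicious query can't write.
        conn = sqlite3.connect("file:app.db?mode=ro", uri=True)
        try:
            cursor = conn.execute(sql)
            # Return rows out of band (straight to the user); only a small
            # preview needs to go back into the LLM context.
            return cursor.fetchmany(row_limit)
        finally:
            conn.close()

The same shape works for Postgres et al.: replace the mode=ro URI with a read-only role, and replace the input() call with whatever approval UI you have.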


DB Level:

- Create SQL credentials for LLM queries with limited, read-only access (exclude sensitive fields, tables, views, etc.).

- Use SQL views to exclude access to specific fields (a sketch of both steps follows below).
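A rough sketch of that DB-level setup, driven from Python with Postgres-flavored SQL. The role, view, column names, and DSN are all made up for illustration:

    # Sketch of DB-level hardening: a read-only role plus a view that hides
    # sensitive columns. Names and the DSN are placeholders.
    import psycopg2

    ADMIN_DSN = "dbname=app user=admin"  # hypothetical admin connection

    SETUP_SQL = """
    -- Read-only role with no default privileges.
    CREATE ROLE llm_reader LOGIN PASSWORD 'change-me';
    REVOKE ALL ON ALL TABLES IN SCHEMA public FROM llm_reader;

    -- Expose only non-sensitive fields through a view.
    CREATE VIEW customers_public AS
        SELECT id, name, country FROM customers;  -- no email, no payment data

    GRANT SELECT ON customers_public TO llm_reader;
    """

    with psycopg2.connect(ADMIN_DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(SETUP_SQL)

The LLM-facing side of the app then connects as llm_reader only, so even a fully injected query can't reach the underlying tables.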

App Level:

- Use SQL linters and treat the input as regular user input.

- Convert the SQL query string into a query tree, escape ALL fields, and then assemble the query back (see the sketch below).
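One way to do that parse/validate/reassemble step is with a SQL parser such as sqlglot; a minimal sketch (the table allowlist is invented):

    # Sketch of "parse into a tree, validate, re-render" using sqlglot.
    # The allowlist is hypothetical; extend the checks to fit your schema.
    import sqlglot
    from sqlglot import exp

    ALLOWED_TABLES = {"customers_public", "orders_public"}

    def sanitize(sql: str) -> str:
        tree = sqlglot.parse_one(sql)  # raises on SQL it can't parse
        if not isinstance(tree, exp.Select):
            raise ValueError("Only single SELECT statements are allowed")
        for table in tree.find_all(exp.Table):
            if table.name not in ALLOWED_TABLES:
                raise ValueError(f"Table not allowed: {table.name}")
        # Re-render from the tree so only what the parser understood gets executed.
        return tree.sql()
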


Treat your LLM output like it's user input and don't execute SQL that comes directly from it.

Instead of plain SQL, the LLM should output an AST (or some other structured query description) that the application turns into SQL itself.
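A minimal sketch of that idea, assuming the LLM returns a small JSON query description instead of SQL (the schema, allowlist, and operators here are invented):

    # Sketch: the LLM emits a constrained query spec; the app builds the SQL
    # with bound parameters, so the model never writes raw SQL.
    ALLOWED = {"customers_public": {"id", "name", "country"}}

    def build_sql(spec: dict) -> tuple[str, list]:
        table, columns = spec["table"], spec["columns"]
        if table not in ALLOWED or not set(columns) <= ALLOWED[table]:
            raise ValueError("Table or column not allowed")
        sql = f"SELECT {', '.join(columns)} FROM {table}"
        params = []
        clauses = []
        for f in spec.get("filters", []):
            if f["column"] not in ALLOWED[table] or f["op"] not in {"=", "<", ">"}:
                raise ValueError("Filter not allowed")
            clauses.append(f'{f["column"]} {f["op"]} ?')  # value is bound, never interpolated
            params.append(f["value"])
        if clauses:
            sql += " WHERE " + " AND ".join(clauses)
        return sql, params

    # e.g. the LLM returns: {"table": "customers_public", "columns": ["name"],
    #   "filters": [{"column": "country", "op": "=", "value": "NZ"}]}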


