This would be interesting from a query analytics perspective, but in terms of protection, i would personally not want to rely on something at the DB level. I would place a significant bet that this opinion is not unique to me, and it is why you do not find this built into xxSQL already.
If the 'attacker' has access to execute unsafe queries, that is a problem you should address much further up the ladder rather than by adding additional work / responsibility to your DB.
I see your comment is raising some controversy, but I tend to be in your camp on this one. I do a significant amount of development involving PostgreSQL and I don't think I would want to rely on this. I don't want to belittle the work in question, but as someone that can develop for the database, I have other, better techniques which can provide defensive in depth such as injection prevention while not relying on yet another third party extension. I can set up stored procedure barriers, view barriers, and more between the data in my database and outside access from application tiers. Some of these techniques provide injection prevention as a side effect and in those cases where I do have to consciously provide such protection, I can do so in a central point of access that can service distributed application tiers.
Having said that, there are many development shops that don't have strong database development talent on hand and really treat the database as some black box where to stuff data. In these instances, the database development centric approaches aren't feasible as practical matter. In these cases, I think such a tool can be helpful since at least it gives you the choke point. Where you'd run into issues in this scenario is that this tool requires a learning period to know which queries are good and which aren't. When an application developer changes code, I would imagine that you'd have to address the SQL firewall training as well; a developer that might not be as comfortable developing for the database might not consider that factor and you might get some wonky deployments. I might be missing something on this count only having scanned the docs a bit for this extension.
> there are many development shops that don't have strong database development talent on hand and really treat the database as some black box where to stuff data.
Exactly. That's the reason why we still see lots of SQL injection attacks and incidents, and the reason why I have created this module. :)
White it's not unique to you there would likely be many people here who are aware of the concept of trust boundaries and would appreciate this product :)
Yeah you're right and I can think of several cases, especially in legacy software, where this would be helpful. I just feel that this is one of those things that delegates responsibility to a lower level component due to a flaw in one above it, with the major benefit being that no code changes are required to implement the system, but coming with the cost of increased DB CPU and IO consumption.
I agree with you on that, but at the same time the best security strategies are usually multi-level.
Protection at the application and access layers are always ideal but in that rare case that some major exploit comes along it would be ideal to have extra security one level deeper.
Neat! I did something similar as a patch to PostgreSQL proper in May of this year and gave a talk on it at BerlinSides[1]. I like the fact that your solution is implemented as an extension rather than as a patch to the core, but I worry about the fact that the extension API only has access to the post-analysis parse trees. While I'm comfortable saying that all of the types of SQLi attack that I'm aware of will cause a difference in the raw parse tree, I'm not so sure about whether those differences will carry through into the results of the analysis.
Also, how much overhead does this add? My software is linear (albeit with a fairly large constant) in the size of the input query and constant in the number of queries in the training set.
> but I worry about the fact that the extension API only has access to the post-analysis parse trees.
I doubt that's a problem. Parse analysis won't remove information from the query - otherwise it'll not be available for the actual planning and execution ;)
> Parse analysis won't remove information from the query
That's right.
PostgreSQL's parse analysis keeps a statement structure with token-by-token in the parse tree, and PostgreSQL's query jumbling calculates a hash value from the parse tree.
So, it's possible to find something strange in the statement(s) if someone attempts to cheat.
I'm assuming this works for prepared queries? That is, I should be able to execute a query which is the same, aside from different values for the bound parameters. Anyway, looks pretty cool :)
If the 'attacker' has access to execute unsafe queries, that is a problem you should address much further up the ladder rather than by adding additional work / responsibility to your DB.