

14 Years of SQL Injection and still the most dangerous vulnerability - robert681
https://www.mavitunasecurity.com/blog/sql-injection-vulnerability-history/
SQL injection has been around for 14 years and it is still the most dangerous and widely exploited web application vulnerability. Ever wondered why?
======
Shish2k
TL;DR: "Because you aren't using our snazzy new product; try it today!"

Also:

    
    
        This is primarily because of the most obvious problem:
        We are still using relational SQL databases.
        Were we to use NoSQL database systems such as MongoDB
        or CouchDB, none of these attacks would ever happen
    

Yeah, and motorbikes are safer than cars because the third and fourth wheels
never fall off...

~~~
SigmundA
Make perfect sense, since there is no SQL in NoSQL its impossible to have a
SQL injection, problem solved...

~~~
astrodust
In MongoDB you could theoretically have a JavaScript injection, which I'd
argue is pretty bad.

~~~
SigmundA
Sorry the sarcasm didn't transfer to my post, you are correct.

------
clarkevans
When combining software components, you have to gateway between them -- this
is done by escaping inputs and validating outputs. It's often non-trivial to
grok where the junctures between components are, and how to properly connect
them. This requires thought, experience, and diligence.

Why it's so common is that you can easily connect components incorrectly,
still have it "work" enough to do a demo, and, there is lot of pressure to put
things that appear to work into production without auditing them.

Addendum: Shish2k observes this article claims NoSQL databases would somehow
solve the problem, rather than seeing NoSQL as just a different kind of
component with it's own escaping/validation needs. Classic -- someone writing
on this very topic is blinded to the phenomenon he discusses; the article
itself demonstrates why we have SQL injections.

~~~
betterunix
"this is done by escaping inputs"

Something about that sounds wrong. It is as though you are suggesting that we
use in-band signaling with a bunch of notch filters to ensure that Cap'n
Crunch whistles cannot be used to get free calls. The right answer is out-of-
band signaling -- in other words, not constructing queries / command strings /
etc. from user inputs.

Major SQL databases all support prepared statements; this seems like a far
more robust approach than trying to sanitize inputs.

~~~
mikeash
I think that comment is a succinct summary of why SQL injections are still
common, anyway. The answer: because people think "escaping" is the answer.

Imagine constructing a function in some other language this way:

    
    
        function square(x) {
            code = x + "*" + x;
            return eval(code);
        }
    

This is obviously wrong. Now, let's say I suggest fixing it with escaping:

    
    
        function square(x) {
            code = escape(x) + "*" + escape(x);
            return eval(code);
        }
    

Any sane programming community would string me up by my thumbs for doing this,
and rightly so.

Yet, do the exact same thing in SQL and it's just business as usual.
_Occasionally_ people talk about prepared statements and parameterized queries
and such, but usually people just talk about escaping. As if the problem is
just that you need more backslashes, not that you're glomming strings together
at runtime and then calling eval() on them.

It's so bizarre. People would think you're crazy for doing it in one
environment, but doing it in a different environment where the exact same
problems are present is just fine! I can't fathom how we got to this place.

~~~
clarkevans
I should have used properly "construct input" (as suggested by dragon-writer)
rather than "escape". If possible, you'd use query parameters or some other
reusable and verified mechanism to encode user content. If you're embedding a
data stream of one type within another, you still have to manage boundaries to
encode content properly.

I'd note for database languages, query construction is often two phase, where
you build your query template based on selection/filtering needs, and then,
use parameter substitution with user-provided content. I've seen experienced
programmers do both at the same time and resort to manual escaping of user
content rather than seeing them as distinct phases.

I think it's about managing the movement of information between contexts...
one process' data is another process' code. The most important context switch
happening between untrusted inputs and the rest of your system.

I apologize for deleting my other comment; I was trying to shorten it.

~~~
mikeash
I'll move my reply here, then.

You make good points, and I'm trying to figure out exactly how your points and
mine hook together.

I _think_ what it ultimately comes down to is distinguishing between data and
code. More specifically, data can contain code, but code should never contain
(foreign) data. So escaping in general is fine, but escaping _in this context_
, where you're building a string of code, is no good.

That, I think, is why I'm comfortable with e.g. the escaping done to a string
when emitting JSON, but not with escaping a string to put it into an SQL
query.

With parameterized queries, you're still ultimately passing the code and data
over to the database over some sort of stream (assuming an out-of-process
database server), but _hopefully_ that stream is designed as data containing
both the parameterized SQL code and the parameters, rather than just SQL code
with escaped parameters.

~~~
rpedela
Doesn't the database server just escape the data in the parametrized query? So
if you used the database's escape functions directly wouldn't it be the same
thing? I guess if you are arguing that developers are more likely to mess it
up than the database server then I see your point.

~~~
vertex-four
Not usually. What usually happens is something like this:

    
    
        SELECT * FROM bar_table WHERE x = 123;
    

Is converted into a datastructure like this in the database engine (really
simplified):

    
    
        {
          "querytype": "select",
          "fields": ["bar_table.x", "bar_table.y", "bar_table.z"],
          "tables": ["bar_table"],
          "where_filters": ["x", "=", 123]
        }
    

Which is then converted into a list of things to do to run that query.

Now, if you give it a prepared statement:

    
    
        SELECT * from bar_table WHERE x = ?;
    

The database engine can convert that into a closure, something like this:

    
    
        function(arg1){
          return {
            "querytype": "select",
            "fields": ["bar_table.x", "bar_table.y", "bar_table.z"],
            "tables": ["bar_table"],
            "where_filters": ["x", "=", arg1]
          };
        }
    

Then when you run the prepared query, it can simply call the closure, putting
the right value in the datastructure without escaping anything, and continuing
on from there.

Of course, the realities of it are more complicated than that for optimization
reasons, but that's the basis of it.

~~~
rpedela
Makes sense. Thanks for the explanation.

------
16s
Unrealistic deadlines and sales reps promising features to clients is a large
part of the problem. Get the changes in, test the usability, and ship it. Get
it out to the client pronto. We have an earnings report due by X date and this
will help the bottom line.

It's not that devs and DBAs don't care, it's that in many cases (not all) they
aren't given the time to carefully test the system to see how it might be
abused.

And also, many companies view security testing as an afterthought or an audit
requirement. If they do not have to do it (time is money) they won't. And when
they are forced to do it, they rush through it.

In some ways this is a philosophical fight between a group that wants to ship
software and make a profit and a group that wants to design and test systems
to make sure they are correct. Suits vs technologists.

~~~
untothebreach
Deadlines are not an excuse for not doing basic pen/vuln/fuzz testing prior to
a release. Even the most basic of tools can detect SQL injections.

~~~
16s
I understand this. When security is not part of the culture of the company,
and management just wants to ship software, devs won't have time to test. MS
used to be this way (back before XP SP2). They stopped and made security a
focus and allowed devs to work on securing the software. Today, they are much
better off because of that. Not all companies think about security. Many only
think about the bottom line.

------
munimkazia
Because there are developers and project managers out there who still don't
understand it very well, or (dare I say) don't care enough to understand it.
There will always be people who will just get the work done to a level that
will reasonably satisfy their bosses, so that they can take their paycheck,
and go home.

This is not software engineering alone, by the way.

------
chriswarbo
SQL injection is so prevalent because developers overwhelmingly think about
their data in terms of its representation, rather than its meaning.

User input and SQL statements are two completely separate things, but it just
so happens that they can both be represented in memory using the same layout.
Hence many developers stop thinking in meaningful terms like "user input" and
"SQL query", and instead start thinking in terms of implementation details,
like "strings".

The solution to code injection is not to jump on the bandwagon du jour, but to
maintain and enforce distinctions between meaningful datatypes. See, for
instance, [http://blog.moertel.com/posts/2006-10-18-a-type-based-
soluti...](http://blog.moertel.com/posts/2006-10-18-a-type-based-solution-to-
the-strings-problem.html)

"Systems Hungarian" has the same cause; Hungarian notation means prefixing
variable names with useful information about the contents, for example "dAge"
could be the difference between two ages and "cUsers" could be a count of how
many users there are. When this practice became enforced at, for example
Microsoft, it quickly degraded into the meaningless "Systems Hungarian" which
just gives unhelpful implementation details, eg. "iAge" and "iUsers" to
indicate that they're both integers. Widespread use of Systems Hungarian,
enforced by clueless management, lead to many developers hating Hungarian
notation completely, despite never knowing that it has nothing to do with
machine representations.

------
praptak
SQL is hard to get rid of injections and here's why I think it is so.

How would an ideal injection-free application look? I'd imagine a set of
unmutable precompiled SQL statements (the code) each controlled by a set of
parameters (the data.) No gluing of statements from strings at runtime and the
parameters are obviously passed out-of band. You can't forget to escape user-
provided data because in this setup the SQL code and the user data never mix,
so escaping is not necessary.

Unfortunately is not feasible to use the above design with SQL. A simple
filter with a parameter that can be unspecified (as in "don't care") would
require 2 different precompiled statements ("select ... from foo" and "select
... from foo where param=<placeholder>".) The number grows exponentially with
the complexity of the filter, so the runtime construction of SQL statements is
inevitable.

You can go about it in a smarter or a dumber way but the danger remains.

~~~
eru
> The number grows exponentially with the complexity of the filter, so the
> runtime construction of SQL statements is inevitable.

You can still construct at runtime. Just let your typesystem handle the
distinction between SQL code and data.

~~~
praptak
Good point, but this would require composable SQL in the typesystem, right?
Maybe libraries like SQLAlchemy come close to that (or maybe are already
there?) but look how long it took.

~~~
chriswarbo
Type systems tell you when your code is wrong. You don't have to do any SQL
manipulation _in_ your type system, you just make sure that SQL and non-SQL
data have different types, and carry on as normal. If your code is joining
functions together in an unsafe way, the type system will reject it and you
have to go back and change it. If it's safe, they type system accepts it and
then the code can be compiled into some untyped* language (assembly,
Javascript, JVM bytecode, whatever)

* By 'untyped' I mean that the type-system that you were using is no longer enforcing anything. There may be another type system in the compiler target.

~~~
chriswarbo
It would actually be possible to use a type system to compose SQL; we could
use this to guarantee there are no syntax errors. For example we could have a
types "SqlTable", "SqlQueryType", "SqlWhereCondition",
"SqlComparisonOperator", etc.

However, we don't need to do anything nearly so elaborate to stop SQL
injection. We just need a type "SqlQuery", since that will be instantly
unusable by all string concatenation functions. We then make a concatenation
function for SqlQuery values and a "stringToSqlQuery" function (or
".toSqlQuery" method, if you prefer) which converts strings to SqlQuery values
by escaping them.

This way, we've turned SQL escaping into a type coercion, so we can only pass
the type checker by escaping every string we put in our queries. Note also
that it solves the double-escaping problem: since escaped strings have a
different type to unescaped strings, we can't send them back through the
escape function; ie. "stringToSqlQuery(stringToSqlQuery(foo))" is a type
error.

------
SigmundA
Isn't this more about using strings to build executing code from untrusted
input?

This kind of stuff always seems to surface when one language is embedded in
another and this tends to be database query languages and one of the most
popular is SQL, but it's not limited to it, just look at javascript injection
from parsing JSON using eval.

I am not seeing injection going away until we stop sending bits of code
between runtimes as strings, which will be hard to avoid so long as databases
and applications use different languages. It's a lot of work to abstract one
language into the other (See LINQ to SQL), where as string concatenation is
understandably the easy way out.

Ideally we could all agree on a good binary data format(something like Go
GOBs) and a AST structure to serialize to it then we would have the common
building blocks to send dynamically built code between runtimes/languages
without using strings.

~~~
dragonwriter
> Isn't this more about using strings to build executing code from untrusted
> input?

No.

Its about lack of care in building executing code from untrusted input. If you
took data from the outside world, converted it into x86 machine code, and sent
to it another system, you could still have the same problem if you didn't
handle the input properly.

SQL-by-string-interpolation/concatenation is an easy target because its a
common enough mistake that the same style knowledge of how to exploit the
holes in it can be used against lots of targets, not because the transfer
format from the poorly-built system generating the query to the critical
backend system handling the query uses strings.

~~~
SigmundA
My point is it's much easier to lack care when using strings. If for example
SQL where only exposed as an AST object model to the client language injection
would be much harder to accidentally allow. Also you could have another layer
of your app actually examine the AST for security issues (no DML statements
allowed) or certain tables restricted etc.

We actually use a component in our app the builds an AST from SQL which can
then be verified, this is not trivial. Also LINQ builds an AST which is then
transformed to SQL statements in LINQ to SQL, this AST can also be examined
before executed and prevents injection.

------
eli
Because it's easy to do things the wrong way and a site built the wrong way
_still works_ , it's just insecure.

~~~
dragonwriter
i.e., the problem is that "properly secures access to data" is all too often
not part of the definition of "works" applied to systems.

~~~
eli
Sure, probably true. But I would assume most people who write insecure code do
so because they don't know how to do it the right way, not because "must be
secure" wasn't in the requirements doc.

~~~
dragonwriter
If security (operationalized properly) was part of the requirements against
which it was evaluated prior to release, insecure code wouldn't be released.

("Must be secure" is a much higher level requirement than anything that is
testable, but a high level requirement is meaningless except to the extent its
operationalized into lower-level requirements that are testable -- or
analytically provable, but that's even harder.)

------
harrytuttle
Why?

Because most of the products are developed by the lowest bidder.

------
rip747
blogspam. just one big advertisement for their product

