
SQL injections vulnerabilities in Stack Overflow PHP questions - laurent123456
https://laurent22.github.io/so-injections/
======
jorkro
It's important to note that the author seems to be flagging injections based
on the use of a PHP variable in the query string (
[https://github.com/laurent22/so-sql-
injections/blob/master/s...](https://github.com/laurent22/so-sql-
injections/blob/master/src/AppBundle/InjectionFinder.php#L62) ).

Although a strong indicator, this does not necessarily imply an SQLi. If the
variable is sanitized before the query (for instance, the variable is cast to
an int), there is no SQLi to exploit.

So it's very possible that these graphs are riddled with false positives.

~~~
laurent123456
Yes there are definitely some false positives (as well as false negatives),
though the lack of prepared statements and use of concatenated strings like
this are usually a good indicator that there's in fact an SQL injection
vulnerability. If you manually check some of these questions, you'll often
find that the parameters come from $_POST or other user input, so that's why I
didn't try to make it any smarter since just checking for "some sql... some
$variable" is a good enough indicator.

~~~
sakopov
Wouldn't it make sense to strip out a lot of unnecessary code to get to the
point faster and allow others to reproduce the issue at hand easier? Just
because the question is omitting validation and sanitation doesn't mean it has
flaws. Isn't that the nature of SE questions? If your question is about a
piece of logic deep down in your stack, you're likely not going to post 3
layers of code above it.

~~~
laurent123456
Hmm, could be, but I think if someone cares enough to write a minimal working
sample, they'll also care enough not to have SQL injection vulnerabilities in
their code (if only because people might lecture them about this instead of
answering the question :). More often than not these issues are found in code
dump like this[0] where nothing has been cleaned up.

[0]
[https://stackoverflow.com/questions/40958763](https://stackoverflow.com/questions/40958763)

------
sschueller
Not really, these are just non parametrierend SQL Statements. It doesn't mean
that the variables in the SQL Statements have not been sanitized before hand.
[1]

It may not be best practice but doesn't automatically mean SQL injection
vulnerable.

And this is not limited to PHP

[1] [https://github.com/laurent22/so-sql-
injections/issues/1](https://github.com/laurent22/so-sql-injections/issues/1)

~~~
nkozyra
Yeah, that was my immediate thought. Something like this:

$delete = "DELETE FROM `cart` WHERE id='$id'";

Which is currently among the top three, could be prefaced by the built in
mysqli escape function or at minimum cast to (int).

This is not a great approach, the query should be prepared, but it's not
inherently an injection risk based on that snippet alone.

And the underlying functions come from the C library for MySQL. Almost any
language's SQL libraries will allow you to send an arbitrary string to the
database. This is not a PHP problem by any measure.

~~~
dep_b
Perhaps it's the whole mindset. Why even use that when you have perfectly fine
built in mechanisms for SQL parameters for over a decade? I wouldn't even
start writing a query like that.

You can do stuff like that too in a .Net / SQL stack for example (or in
Django) but somehow everybody defaults to Entity framework or similar
solutions that take care of it.

I'm way too lazy to deal with lines and lines of sanitizing, raw queries and
mysql_blahblahblah anyway so I write wrappers that give me the results from a
query and a bunch of parameters in one line. Secure. Simple.

~~~
flomo
If you google "PHP MySQL", you can find all sorts of horrible tutorials like
[1] teaching you to write SQL injections. So unlike users of other languages,
PHP programmers are taught the wrong way first and then are expected to figure
it out somehow.

[1]
[http://www.freewebmasterhelp.com/tutorials/phpmysql/7](http://www.freewebmasterhelp.com/tutorials/phpmysql/7)

~~~
towindontplay
That website is from 2001 though

~~~
DCoder
The bigger problem is that Google ranks that website higher than the modern
stuff. You can make much better tutorials and it won't help as long as that
legacy cruft is sitting at the top of every newbie's search results.

~~~
dep_b
True. I notice that I filter Google results on last year only just to filter
out outdated stuff. Just today I wondered why everything on the web (read:
Google) was getting so old.

------
tyingq
It's finding some things where placeholders aren't a solution....

29/11/2016 08:23:13: $update = 'UPDATE '.$table.' SET ';

You can't use a placeholder for a table name. Using a dynamic table name is
pretty standard practice. The reason for that is to allow a table prefix,
allowing multiple of the same app within a single database. And, typically,
the table name originates from a config file, not web POSTS/GETS or other
"taintable" sources.

~~~
oneeyedpigeon
Also, one of the examples is:

    
    
        03/12/2016 19:33:11:	$pdo->prepare("UPDATE users SET avatar=? WHERE id=22 ")->execute([$u]) ;
    

Isn't that the correct way to do it?

~~~
tyingq
Found the regex's being used: [https://github.com/laurent22/so-sql-
injections/blob/master/s...](https://github.com/laurent22/so-sql-
injections/blob/master/src/AppBundle/InjectionFinder.php#L62)

It's pretty basic. For UPDATE, it's matching anything with UPDATE followed by
SET, followed by a dollar sign+alphanumeric

And, it's specifically going after dynamic table names. Hmm.

    
    
      '/UPDATE\s+.*?\$[a-zA-Z0-9].*?\sSET.*?/i', // UPDATE $table SET ...
    

Edit: You can also escape the scans by using variable names that start with an
underscore :) $_dontscanme

~~~
oneeyedpigeon
I just did the same :) It flags, for example, the following as an SQL
injection:

    
    
        query("DELETE FROM foo WHERE id=1");$id=2;
    

In other words, it's flagging any line that begins with "DELETE ... FROM" and
then later has a variable. Now, that might be a fairly strong correlation, but
it's definitely far from perfect.

~~~
laurent123456
Yes there are false positives. It would be too time consuming to extract and
parse the PHP code from the questions since there are millions of them to
process, however I found that the regexes, though not perfect, provide a good
estimate and are quick to execute.

~~~
tyingq
You could tweak the regexes a bit to fix some things...

\- it's currently skipping variable names with underscores, and it's matching
variable names that start with numbers, which isn't a valid variable name

\- the false positives coming from ->execute($whatever) could be fixed by not
searching after a > character. I don't think that would create any notable
amount of false negatives.

Like, for example, changing:

    
    
      '/UPDATE\s+.*?\sSET\s.*?\$[a-zA-Z0-9].*?/i'
    

To

    
    
      '/UPDATE\s+.*?\sSET\s[^>]*?\$[a-zA-Z_]/i'

------
JoshMandel
StackOverflow accepts edits to posts, if you have the karma for it.

Somebody might enjoy writing a bot to propose updates that demonstrate better
practices (like prepared SQL statements) for the 80% of bad examples where
this is a pretty trivial manipulation.

~~~
oneeyedpigeon
Does this site link to the relevant questions, though? Because I can't see it.

~~~
laurent123456
Not currently, but the data is there so it would be easy to provide a link for
each question.

Edit: Done - the date is now a link to the question.

~~~
oneeyedpigeon
Thanks very much. I know I've been a bit critical about the implementation in
these comments, but it's only because I see such value in the concept, and
that's much more important, tbf.

------
askmike
I haven't seen any examples (just code snippets), but I wouldn't call all
unsanitized queries SQL injection vulnerabilities:

    
    
        $sql = "SELECT * FROM inventory_item WHERE product_category_id='$pcid'";
    

The pcid var might have been checked/cleaned up before this line (might even
be part of the same SO answer).

\-----

Obviously that doesn't mean this is very bad practise, especially at
stackoverflow where you know people will just copy and paste your code.

~~~
vomitcuddle
yeah, unless the snippet also includes a definition for variable $pcid, there
is no proof that the code is actually vulnerable - using such simplistic
logic, the stats are likely hugely inflated

but i don't agree that it should be the poster's responsibility to show their
sanitation logic to prevent people from blindly copying-and-pasting their
code, if it's not relevant to their question

------
cdevs
These are the people we don't hire or if we communicate will with and they
seem teachable this is always are first test and step for teaching. I use to
be suprised how many college grads come out with a blank face when you ask
them about string cleaning or binding and MOST have worked some small
internship at a business which means they created a CRUD with plenty of sql
injection working for cheap instead of learning.

Side story had to yell at a new employee this year for obsessively posting to
stack queries like this instead of asking the senior devs. I've asked 5
stackoverflow questions over 6 years and answered 61..if your asking 3 a day
you need to rethink things.

------
cm2187
What I would also find interesting is to do the same exercise on the top
responses.

------
thisrod
In principle, it seems like this could be mitigated by some changes to the
language. Has anyone tried this? E.g.:

Strings have a _dirty_ flag. Literal strings are initially clean, CGI
variables are dirty, argv and file/socket input might be configurable.

Function parameters can be declared _evaluated_. Passing a dirty string as an
evaluated argument fails with extreme prejudice. The database primitives
enforce this. Maybe sockets have an _evaluated_ flag too, and only clean
strings can be written to an evaluated socket.

The string primitives propagate dirty flags: clean and dirty concatenate to
dirty, and so on.

There is an obfuscated way for pdo->prepare to make dirty strings clean, and a
special circle of hell for anyone who mentions that in user documentation.

~~~
DCoder
Perl has such a feature, it's called "taint mode" [1], it marks all variables
receiving external data as "tainted" and disallows their usage in sensitive
contexts.

PHP has an inactive proposal to implement it as well [2].

[1] [http://perldoc.perl.org/perlsec.html#Taint-
mode](http://perldoc.perl.org/perlsec.html#Taint-mode)

[2] [https://wiki.php.net/rfc/taint](https://wiki.php.net/rfc/taint)

------
pasta
I think it's even more shocking they appear in answers. But then again this
does only show they don't use prepared statements. You can't tell anything
about vulnerabilities.

------
chiefalchemist
Perhaps this is natural? That is, you're looking for help so you're likely to
have other flaws in your tool box.

Certainly, given the importance, something can be done to prevent relying on
the human element to get this right every time. We're how many years into
programming and SQL injection is still very real? So much for progress?

~~~
astrodust
It's not natural. It's unique to PHP.

Python, Perl, Ruby, Java, Node, they all have respect for best practices and
libraries that make coding SQL queries safely quite easy and pleasant.

PHP is a whole different beast. People are allergic to package managers, to
using "third party code", to using frameworks, even to reading documentation.
At times it's amazing how aggressively backwards it can be.

I'm not trying to characterize the entire PHP community, there are large parts
of it with people trying hard to do the right thing, but there's also this
vast wasteland of incompetent people publishing tutorials that are so
dangerously wrong with even more people willing to defend these tutorials as
being educational and useful.

Progress means learning from your mistakes, but some would prefer everyone
repeat them, painfully if necessary, to promote "education".

~~~
developer2
PHP has had the PDO extension since version 5.1, released in 2005. Blaming the
language for SQL injection hasn't been a valid excuse for over a decade.

The only problem with PHP is that it has the easiest barrier to entry - or
perhaps more accurately, no barrier to entry whatsoever. The kind of developer
you're describing would never have written a line of code if PHP didn't exist.
Python, Perl, Ruby, and Java just wouldn't have been an entry point into
programming for them. Node is different, as like PHP, it is accessible to
frontend developers who want to branch off into backend as an "addon" rather
than as a primary endeavor.

~~~
astrodust
PHP the language and PHP the ecosystem are different but inseparable. Although
_technically_ PHP has had PDO since 5.1 the community is extremely,
frustratingly slow to adopt new things.

I remember considerable resistance on the part of the community to the very
idea of object-oriented problem, something that massively limits uptake on
things like PDO and is the driving reason behind mysqli having a bizarre mixed
mode, OO or procedural based on preference.

Likewise, PHP tutorials, references, and guides kept pushing the absolutely
garbage mysql_query interface on people. I don't know how many books published
post-PDO insisted on using this approach because it was "easier" or whatever,
and some books don't even touch on SQL injection bugs, it's like they don't
exist, even while the examples lack escaping of any sort and would crash on
certain kinds of input.

> The kind of developer you're describing would never have written a line of
> code if PHP didn't exist.

Bullshit. These are people that want to learn to program, and as they look
around they see PHP is popular, it's easy to get support for, and hosting it
is easy. It's got a lot of positive points.

I've seen people with zero technical experience pick up Ruby and build a Rails
app inside of months, they get productive very quickly. Same could be said for
Node with the right mentoring, though the async stuff is a little harder to
get lined up. Client-side apps aren't all that bad, though.

PHP has a lot of good things going for it, but too much of the community is
absolutely in the dark about it. There's almost zero outreach being done.

~~~
chiefalchemist
It sounds like a case of more is less. That is, there's a tipping point -
that's not really PHP's fault - where quantity of participants makes quality
close to impossible. Yes, there is quality PHP usage but it's rare, if only
because the masses create so much noise. Look at WordPress for example.

~~~
developer2
WordPress is one of the worst examples you could ever point a new PHP
developer to. A friend once pointed me to Automattic as an entity to work for
(on WordPress) as a nomad developer. I took one look at wp-login.php in the
root directory and after 30 seconds of reading, knew I could never take the
people that work on that codebase seriously.

The first line of the file is "require( dirname(__FILE__) . '/wp-load.php'
);". a) "require" is a statement, not a function; it should not have
parentheses around it at all. b) Oh. my. god. Their coding standard require
spaces _inside_ parentheses; I have _never_ seen this in any language, even
from legacy codebases or esoteric developers - it's incomprehensible that some
senior developer managed to enforce this on the entire company. c) Oh, mixed
PHP class and HTML; a 1000 line file for a login script. RUN! SERIOUSLY, RUN
AWAY!

After this many years, backwards compatibility is not an excuse. WordPress
should be avoided at all costs, unless it is the only way you believe you can
build and sustain your business. If you have the choice to take _any_ other
avenue, take it. The developers working there have no clue, and/or are content
with a status quo that is 20 years behind acceptable standards.

~~~
chiefalchemist
To be clear, I wasn't saying WordPress was a positive example. In fact, as you
pointed out, quite the opposite. It's been drinking its own Kool Aid for so
long that the nOObs don't dare ask questions, and The Elites are too
comfortable to progress.

Yeah. You're right. Ugly.

------
based2
solution: ci with [http://stackoverflow.com/questions/4863969/php-source-
code-a...](http://stackoverflow.com/questions/4863969/php-source-code-
analyzers)

------
0hn0
It shows only bad practices of PHP and SQL. SO question were asked probably by
novice programmers or amateurs. I'm a PHP developer. I use PDO or other data
binding ORM/lib (ex. Eloquent ORM).

------
dorianm
Next step could be to auto-edit the answers with proper SQL queries? Most of
them look pretty simple to fix

------
xytop
!!! PHP MySQL extension doesn't support placeholders so everyone has to escape
params before mysql_query and then put escaped params inside query string. Of
course if there's used mysqli or PDO then need to use placeholders but in
other case - there is actually no choice for developer. So I'd rather not call
most of those - "vulnerabilities".

------
niklasber
I think it's very natural that code in questions is vulnerable to SQL
injections as it should be a stripped down example demonstrating the issue
that the poster is dealing with, not necessarily a copy-paste of code that's
in production.

