
No-SQL injection in Mongo PHP - taylorbuley
http://www.php.net/manual/en/mongo.security.php
======
tptacek
This is not the only NoSQL-injection issue. Extra fun bonus warning: nobody
knows what the quoting domains are for all the NoSQL's. Be extraordinarily
careful with your keys.

~~~
rbanffy
Any query language can be injected if you don't sanitize your input. Any
object persistence layer that doesn't sanitize parameters is vulnerable to
this kind of attack.

~~~
btilly
You are advocating a semi-reasonable solution to a common mistake, possibly
because you can't conceive of not making the mistake. The better solution is
to follow the advice in <http://cr.yp.to/qmail/guarantee.html> of using APIs
that avoid parsing user input. Then no possibility of problems exists at all.

This is explained in item #5 of <http://cr.yp.to/qmail/guarantee.html> \-
_Don't parse._

Note that this is definitely true in this case. MongoDB's API is based on the
passing of data structures where you have well-defined places to put user
input directly. The problem emerges because PHP decides to attempt to _parse_
that user input in a way that it had no real need to, with bad results.
Without that parsing, there would have been no need to sanitize data because
there would be no possibility of ever having had bad data in the first place.

~~~
bmm6o
A light bulb went off for me when I saw SQL injection attacks described as
"subverting the structure of the query". The problem isn't sneaking bad data
into your query, the problem is that the data injected into the string alters
the parse tree in the dbms. This distinction isn't immediately obvious if
you've moved to a parametrized query solution, because there's still a lot
left to parse - it's just that the part that gets parsed is now a compile-time
constant.

~~~
tptacek
For whatever it's worth, even in parameterized SQL queries, the query
structure isn't always entirely fixed. There are injectable parameterized
query constructs, and some of them are fairly common.

~~~
bmm6o
I haven't heard of this before. Can you link to an example?

~~~
tptacek
On-the-fly table selection is a simple example, but there are more subtle
examples that happen even more often.

~~~
bmm6o
Unless I'm misunderstanding you, changing the table doesn't affect the shape
of the parse tree. (and I'm not trying to be pedantic - I'm genuinely curious
about this).

~~~
btilly
The issue isn't how you parse the query. The issue is how you choose to
execute it.

I'm not sure what tptacek is thinking, but here is an important case where you
really want the execution path to change.

Tables often have a very uneven distribution of data. This can mean that the
right query plan will depend on the data provided. Consider the following
query:

    
    
      SELECT ...
      FROM lease l
        JOIN property p
          ON p.id = l.property_id
        JOIN management_company c
          ON c.id = p.management_company_id
      WHERE l.created >= ?
        AND l.created < ?
        AND c.name = ?;
    

Pretty straightforward query, right? Now here is the question, should the
query plan start by filtering leases on date then flow forward to finally
filter on the management company, or by getting the right management company
and then flow backwards to filter on dates.

It turns out that there is no right answer. You want to start with whichever
restriction eliminates the most records, quickest. If the date range is wide,
we want to start with the management company. If the date range is narrow, we
want to start with that.

So let us simplify our life:

    
    
      SELECT ...
      FROM lease l
        JOIN property p
          ON p.id = l.property_id
        JOIN management_company c
          ON c.id = p.management_company_id
      WHERE l.created >= '2011-02-01'
        AND l.created < '2011-03-01'
        AND c.name = ?;
    

There is still no right answer! Let's suppose some company, call it 'AIMCO',
has a huge number of properties with a lot of lease volume while other
companies generally have 1-5. Then it is better to filter on dates first for
'AIMCO', and better to filter on the company first for anything else. Getting
the order wrong gives an order of magnitude performance difference. (This is a
real life example. I really did have a bug that basically read, "Make this
report run for AIMCO." And this was exactly the problem.)

Most databases will pick one query plan and go with it. But Oracle allows you
to turn on Adaptive Cursor Sharing. With this it will analyze the data, see
that AIMCO should get special treatment, and actually execute a different
query plan if you passed in 'AIMCO' than if you passed in something else.

So now you have a data dependent query plan.

~~~
bmm6o
Ok, so your example consists of switching between semantically equivalent
queries in the application layer to compensate for weaknesses in the DBMS
execution planner. I get that CPU time is a valuable resource and you don't
want to open yourself up to DOSing and all, but this isn't really anything
like SQL injections. The attacker still can't subvert the structure of the
query - changing the structure of the query was the solution to the problem.

~~~
btilly
_Ok, so your example consists of switching between semantically equivalent
queries in the application layer to compensate for weaknesses in the DBMS
execution planner._

Not quite. My example consists of the DBMS execution planner deciding that it
will switch between different semantically equivalent query plans depending on
the parametrized data passed. This switch should not be noticed in the
application layer.

Back to the main thread, I suspect that you are interpreting tptacek as saying
something he didn't. You seem to be looking for a potential security flaw. But
he didn't ever say that there was one. He only said that the query structure
can vary with the parameters passed to a parametrized query. I just described
one way that this can happen.

~~~
tptacek
Yeah these are great comments and all but I'm just saying that, last time I
checked, you can't bind a table name as a parameter in the MySQL protocol; if
your table names change at runtime based on input, you're back to concatenated
queries.

~~~
rbanffy
> you're back to concatenated queries.

Wouldn't it be safer to use a _switch_ statement?

~~~
bmm6o
In theory, it should be possible to enumerate your SQL queries ahead of time
rather than ever generating one on the fly. In practice, that could be a huge
pain of the O(2^n) variety, depending on the application type.

~~~
rbanffy
I assume the choice of table names or query templates in the case mentioned
(make a report for atypical client C run faster) would be limited to a handful
of possible query templates.

------
RyanMcGreal
> which PHP will magically turn into an associative array

I think I found the issue.

~~~
jameskilton
Rails does the same thing and it's insanely useful, you just need to know what
you're doing and ensuring you're protecting against this.

~~~
emil0r
Which means a) You're going to forget it at least once. b) For every developer
mindful of this, there will be hundreds not knowing about it.

If you ever put "you just need" into a sentence about security you've already
lost.

~~~
boundlessdreamz
The default Rails way of using Activerecord will automatically sainitize
inputs for you. Not turning it into an associative array doesn't add anything
to security.

User input need to be sanitized always. Rails make it easy and is the default
way.

~~~
rapind
There's nothing stopping you from passing unsanitized params to the database.
I.e. where("name LIKE '%#{params[:name]}%'") instead of where("name like ?",
"%#{params[:name]}%"). While the 2nd option is definitely the recommended way
to do it, there's nothing stopping you from doing it the wrong way, and I bet
many do.

~~~
RyanMcGreal
The thing is, it's _easier to do it the right way_. It requires a special,
deliberate effort to do it wrong. That seems like a reasonable implementation
of a "magic" feature. PHP, as usual, gets it exactly backwards. It's easier to
do it wrong, and requires a special, deliberate effort to do it the secure
way.

~~~
rapind
Yeah maybe. Depends on the person I think. I actually find it easier to
assemble a SQL string myself. If you're new to rails / AR and you're not
familiar with SQL then you'll look up and figure out how to assemble the SQL
via the ORM methods. If you're new to rails but an old hand at SQL, just
building the string is super easy and takes you 2 seconds without looking up
any documentation. Of course assembling the string AND sanitizing it requires
a heck of a lot more effort.

Regardless, it's not a knock on rails. Just saying it's not necessarily a
magic bullet to secure apps (and I don't believe rails nor PHP needs to be).

------
mscarborough
<http://www.php.net/manual/en/book.filter.php>

Lots of built-in options, the only reason this should be an issue is if you're
not validating inputs at all, or hacking your own as ronnoch noted.

~~~
sausagefeet
Which is quite possible unless you have a compiler forcing you to validate. I
make silly mistakes all the time in Python.

~~~
grncdr
Not to pick on your comment specifically, but I've seen a couple of comments
that imply this is the kind of mistake that one would make in a moment of
absent-mindedness, and I disagree. Nobody with any semblance of a clue is
writing (new) apps that access user input from PHP super globals directly. If
you are _ever_ using $_GET in your app-specific code you are doing it
__horribly wrong __, not making a silly mistake.

That being said, there are legions of clueless devs out there who will do
this, and that _is_ unfortunate.

~~~
sausagefeet
I agree with the clueless argument, but I also think there are mountains of
bugs that exist simply because one made a mistake and a dynamically typed
languages can't really do much for them there.

------
s00pcan
I might be missing something here, having never used MongoDB, but what does
this have to do with it? How else can this attack vector be used?

~~~
nbpoole
There's a very good example here:

[http://www.idontplaydarts.com/2010/07/mongodb-is-
vulnerable-...](http://www.idontplaydarts.com/2010/07/mongodb-is-vulnerable-
to-sql-injection-in-php-at-least/)

------
antirez
Non existing problem with Redis (not possible at protocol level), for what it
is worth.

~~~
mrkurt
It's not exactly a protocol issue for mongo, though. If it weren't for
Rails/PHP (and more) converting magical strings into objects of another type,
it wouldn't be possible in Mongo either.

This would be the equivalent of some helper wrapper on top of a redis driver
that turned strings with a special format into, say, an MGET command that
returned more than a developer meant for it to.

redis is safer because it doesn't have the complex query-over-document
functionality that mongo does, but it's not inherently immune.

~~~
antirez
The new Redis protocol (the only one used at this point) is inherently immune,
with SORT we have complex queries. But the point is, the way we send every
different argument to the server as a single entity with prefixed length,
makes the attack impossible.

