
Mongo BSON Injection: Ruby Regexps Strike Again - homakov
http://sakurity.com/blog/2015/06/04/mongo_ruby_regexp.html?
======
jasonmp85
> 3 years ago I wrote a blog post about broken regular expressions in Ruby, ^$
> meaning new lines \n.

Are they actually _broken_? One of the first quirks I learned writing Ruby is
that you use \A and \z instead of ^ and $.

I know better than to blame security vulnerabilities on "bad programmers"
rather than usability problems with a language (or plain old PoLA violations),
but changing the meaning of these anchors would probably be a tough migration.
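
For anyone who has not run into the quirk, here is a minimal sketch of the difference between the two anchor pairs. The payload string is made up for illustration; it is not the exploit from the article.

```ruby
# Hypothetical input: a valid-looking 24-character hex line followed by a second line.
payload = "0123456789abcdef01234567\nanything else"

line_anchored   = /^[0-9a-f]{24}$/i    # ^ and $ match at every line boundary in Ruby
string_anchored = /\A[0-9a-f]{24}\z/i  # \A and \z match only at the string boundaries

puts line_anchored.match?(payload)    # true: the first line alone satisfies the pattern
puts string_anchored.match?(payload)  # false: the whole string must be 24 hex digits
```

With `\A` and `\z` the trailing line makes the whole input fail, which is what a validation check wants.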

~~~
developer1
Why is everybody talking about ruby's regex implementation? ^, \A, $, \Z, \z
are standard across many languages, based on PCRE syntax. This isn't a ruby
problem; this is simply developers who are not experts with regular
expressions not fully understanding how to write patterns with the tool they
are using. There are no problems with regex implementations, only problems
with regex use.

Anybody interested should have a read through
[http://pcre.org/pcre.txt](http://pcre.org/pcre.txt). The syntax presented
here is used in perl, php, ruby, python, and many others.

Also, nobody ever uses \A without the /m flag. You use ^, it has the same
meaning unless you specifically add the /m flag to allow ^ to match at the
beginning of any line rather than at the front of the string only. This
distinction will only bite developers who just add flags like /msig for every
regex, because again they don't understand exactly what every flag actually
does.

~~~
riffraff
ruby has half of multiline mode on by default, which causes the issue. Namely,
/m only changes the behaviour of "." to match \n, but ^ and $ work the same.

(and incidentally, that would make changing it easier, you can just request
that users specify a flag all the time and deprecate the one without).
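
A small sketch of that behaviour, using a made-up two-line string:

```ruby
text = "first\nsecond"

# ^ matches right after the newline with or without /m.
puts text.match?(/^second/)    # true
puts text.match?(/^second/m)   # true

# /m only changes whether "." is allowed to match "\n".
puts text.match?(/first.second/)   # false
puts text.match?(/first.second/m)  # true
```

So in Ruby there is no flag that turns the line-anchored meaning of ^ and $ off; only \A and \z pin a pattern to the whole string.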

~~~
developer1
Eww what the hell. I don't use ruby, and if I did I might very well - as a
PCRE half-expert - fall into this trap based purely on the assumption that
ruby was using PCRE. I just looked at their Regexp class, and it matches PCRE
in most regards. The fact that /m makes . match newline \n is horrible - every
PCRE-based implementation uses /s for that, whereas /m only affects ^ and $.

It still falls on the developer to understand the exact flavor of regex
available in their language. And yet ruby is doing a disservice to anybody
coming to their language with existing PCRE knowledge by having syntax that is
almost an exact match to PCRE used in many languages... only to find out
someday that it's not. Harsh.

------
gabeio
To clarify, this isn't _Mongo's_ BSON, it's _Moped's_ implementation of
BSON/Ruby's implementation of BSON (again). The title is fairly misleading,
making it sound like it's actually Mongo which is vulnerable. Still
interesting stuff though.

~~~
mbell
This is not correct.

The vulnerability is in `bson-ruby`[1] which is written by MongoDB and used by
Moped (and thus Mongoid), the official Ruby driver from MongoDB, and Mongo
Mapper.

The only thing that _isn't_ vulnerable is Moped's BSON implementation (if
reasonably recent), but it was dropped in Moped 2.x.

In reality, if you're using Mongo with Ruby, you're most likely vulnerable,
unless you happen to be on Moped 1.x.

[1] [https://github.com/mongodb/bson-ruby/blob/84d8acd32ce9067ad6...](https://github.com/mongodb/bson-ruby/blob/84d8acd32ce9067ad646755e52f472a6ad685918/lib/bson/object_id.rb#L285)

~~~
gabeio
> This is not correct.

> The vulnerability is in `bson-ruby`[1] which is written by MongoDB and used
> by Moped (and thus Mongoid), the official Ruby driver from MongoDB, and
> Mongo Mapper.

Then it's in the _ruby gem_ of MongoDB's driver for ruby, NOT in _MongoDB_. The
title is still misleading for people who do not code in ruby and therefore are
not vulnerable to the apparently ever-present ruby BSON bug.

> Mongo BSON Injection

A better title would be Mongo _gem_ BSON Injection

I am not trying to nitpick; I was fairly confused when seeing the title because
I don't code in ruby and was 99% sure Mongo's core was C, not ruby.

~~~
lvh
This doesn't detract from your point, but Mongo is primarily C++, not C.

------
maerF0x0
Here's one lurking in javascript:

/[A-z]/ also matches "[\]^_`" [1]

Now go search github [2] and see the +1k repos that have this bug in their
parsing of base64

Sources: [1]: [http://wtfjs.com/2014/01/29/regular-expression-and-slash](http://wtfjs.com/2014/01/29/regular-expression-and-slash)

[2]: [https://github.com/search?utf8=%E2%9C%93&q=INVALID_BASE64_RE...](https://github.com/search?utf8=%E2%9C%93&q=INVALID_BASE64_RE&type=Code&ref=searchresults)

~~~
pxndx
This isn't WTF at all, that's exactly how regexes should work!

~~~
kiallmacinnes
In case anyone is wondering why this isn't a WTF - See [1]..

[1]: [http://www.asciitable.com/](http://www.asciitable.com/)
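
The parent comment is about JavaScript, but the same code-point arithmetic applies to character class ranges in Ruby, so here is a sketch in Ruby to stay with the language the thread is mostly about. The variable names are only for illustration.

```ruby
# Characters whose code points fall strictly between "Z" (90) and "a" (97).
between = (("Z".ord + 1)..("a".ord - 1)).map(&:chr).join
puts between  # [\]^_`

wide     = /[A-z]/     # one range spanning code points 65..122
intended = /[A-Za-z]/  # two ranges, letters only

# Every in-between character is accepted by the wide range and by none of the letter ranges.
puts between.each_char.all?  { |ch| wide.match?(ch) }      # true
puts between.each_char.none? { |ch| intended.match?(ch) }  # true
```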

------
jph
One way to help protect against this kind of issue is to be explicit about
validation steps.

This is the buggy code:

    
    
        !!str.match(/^[0-9a-f]{24}$/i)
    

That regex is trying to do three different things: validate that the length is
24, validate that the string contains only hex digits, and ensure the match is
pinned from start to finish.

I prefer code that makes the validation steps explicit and simpler:

    
    
        str.length==24 && str!~/[^0-9a-f]/i
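
A quick sketch of how the explicit version behaves on inputs containing a newline. The method name is hypothetical (it is not the bson-ruby or Moped API), and the character class assumes hex digits, as in the buggy pattern.

```ruby
# Hypothetical helper: explicit length check plus a negated character class.
def legal_object_id?(str)
  str.length == 24 && str !~ /[^0-9a-f]/i
end

puts legal_object_id?("0123456789abcdef01234567")         # true
puts legal_object_id?("0123456789abcdef01234567\nextra")  # false: length is not 24
puts legal_object_id?("0123456789abcdef0123456\n")        # false: "\n" is not a hex digit
```

No anchors are involved, so there is nothing for the ^/$ versus \A/\z distinction to get wrong.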

~~~
arielby
It's a whitelist. It is "verify the string is a 24-character alphanumeric
string".

------
jpatokal
Egor Homakov strikes again. This guy is a _machine_, just look at the
vulnerabilities he's found this year alone:

[http://sakurity.com/blog](http://sakurity.com/blog)

~~~
atonse
Oh boy I'd love to watch this guy in action in a twitch.tv stream, taking down
a site (to clarify... white hat stuff!). It would be so damn fascinating to
see how his mind works and how he makes the leap from 0 to exploit.

~~~
icpmacdo
Is there anything currently like this for other people?

~~~
tobeportable
I've been thinking about live coding on twitch but having to think about not
revealing any sensitive information would make it a hassle. I will give it a
try for some side projects.

------
hnanon6
I suspect that many of these same Rubyists who mistakenly assumed ^ and $ have
the normal PCRE semantics would still readily claim to know Ruby, and in many
cases, even to know it well. I think this type of vulnerability undermines the
optimistic belief that any decent programmer can quickly learn a new language,
and further, it shows the danger of adopting new languages generally,
especially those with extremely complex and not particularly well-defined
syntax and semantics, like Ruby.

I also wonder how many vulnerabilities result just from Rubyists favoring
cutesy APIs (or "DSLs," as they call them) that while making for great demos,
hide the often times unignorable, crucial details of what they do from their
users.

------
gshutler
Where's the attempt to submit a patch to fix the problem before disclosing?

------
cheald
The 1.x versions of BSON are vulnerable, too, FWIW.

~~~
rsutphin
I tested on our app (which uses BSON-ruby 1.9.2) and was surprised to find
that the detection code indicated it was not vulnerable. Turned out it was
because we also use bson_ext — bson_ext replaces the vulnerable method with a
C implementation which doesn't use regexes.
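
The bson_ext C code is not shown in this thread, so the following is only a rough Ruby sketch of the general idea of checking the string without a regex. The constant and method names are made up.

```ruby
# Hypothetical regex-free check: exact byte length plus a per-byte membership test.
HEX_BYTES = "0123456789abcdefABCDEF".bytes.freeze

def hex24_without_regex?(str)
  str.bytesize == 24 && str.each_byte.all? { |b| HEX_BYTES.include?(b) }
end

puts hex24_without_regex?("0123456789abcdef01234567")         # true
puts hex24_without_regex?("0123456789abcdef01234567\nextra")  # false
```

With no pattern involved, line anchors cannot be misused in the first place.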

~~~
ploxiln
Kinda funny to see a "safe" language saved by C. Just sayin'

------
eddanger
Added a patch for Moped::BSON in Rails here

[https://gist.github.com/eddanger/9408317d5d508d8e9ba7](https://gist.github.com/eddanger/9408317d5d508d8e9ba7)

------
veesahni
This has been fixed. Just update moped:

gem "moped", "~> 2.0.5"

------
allcentury
Is this only an issue if you are defining the BSON _id?

