Hacker News
Regex: badly needs fuzzing (boost.org)
58 points by infogulch 1 hour ago | 27 comments





Google's cache of this bug (loads quite slowly for some reason):

https://webcache.googleusercontent.com/search?q=cache:mVrrFL...


Text only link: http://webcache.googleusercontent.com/search?q=cache:https:/...


The Trac instance (and legacy SVN server) isn't really designed for Slashdot-effect (well, HN-effect) traffic; it runs on some jiggly piece of rust at some uni somewhere.


Any Rust lovers out there: could I ask you to do a benchmark comparison and a fuzz comparison? I'd be genuinely interested in the result, and if (as you might hope) the Rust regex crate is as fast as boost::regex and never crashes, that would persuade at least me to finally learn some Rust!


It does look like the afl.rs project (AFL for Rust code) has been run on regex:

https://github.com/frewsxcv/afl.rs#trophy-case

Which resulted in just one issue? I'm not sure how long they fuzzed or what the methodology was, but this was the panic they found (still not a memory safety issue, more akin to an unchecked exception in Java than a crash in C++):

https://github.com/rust-lang/regex/issues/84

Since that issue is from the same time as Rust's 1.0 release, I suspect that either it hasn't been run again recently or that things are pretty stable in the regex crate w.r.t. fuzzing.


You can look at a well-known (but not very complete) benchmark comparison here [0]. Rust wins: the fastest Boost program is C++ g++ #3 and takes 8.5 times as long, and the fastest C++ implementation (using RE2) takes twice as long.

I don't know of a fuzz comparison, but fuzzing has been done on the Rust library without finding anything bad, e.g. see this issue [1].

[0] http://benchmarksgame.alioth.debian.org/u64q/performance.php...

[1] https://github.com/rust-lang/regex/issues/203


If you were interested in performance, you probably would not have been using boost::regex to begin with. RE2 is often an order of magnitude faster. You might choose Boost if you require backtracking, but that's crazy anyway because backtracking can take exponential time.
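
For anyone wondering how an engine avoids backtracking at all, here is a minimal sketch of the general automata-style idea: keep the set of every pattern position that could currently be active and advance the whole set one input character at a time, so the work is bounded by pattern length times input length. This is not RE2's actual implementation. The mini-syntax (literals, '.', postfix '?' and '*') and the names `Item`, `parse_pattern`, `add_state`, and `match_linear` are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One pattern element: a character ('.' matches anything) plus an optional
// postfix quantifier. quant is 0 (exactly once), '?' or '*'.
struct Item {
    char ch;
    char quant;
};

// Hypothetical mini-syntax: literals, '.', and postfix '?' / '*'.
std::vector<Item> parse_pattern(const std::string& pat) {
    std::vector<Item> items;
    for (std::size_t i = 0; i < pat.size(); ++i) {
        Item it{pat[i], 0};
        if (i + 1 < pat.size() && (pat[i + 1] == '?' || pat[i + 1] == '*')) {
            it.quant = pat[++i];
        }
        items.push_back(it);
    }
    return items;
}

// Mark position i as active, plus every position reachable from it without
// consuming input (items with a quantifier can be skipped).
void add_state(const std::vector<Item>& items, std::vector<bool>& active,
               std::size_t i) {
    assert(i < active.size());
    while (!active[i]) {
        active[i] = true;
        if (i == items.size() || items[i].quant == 0) break;
        ++i;
    }
}

// Whole-string match; each input character is examined once per position.
bool match_linear(const std::string& pat, const std::string& text) {
    const std::vector<Item> items = parse_pattern(pat);
    std::vector<bool> cur(items.size() + 1, false);
    std::vector<bool> next;
    add_state(items, cur, 0);
    for (char c : text) {
        next.assign(items.size() + 1, false);
        for (std::size_t i = 0; i < items.size(); ++i) {
            if (!cur[i]) continue;
            if (items[i].ch != '.' && items[i].ch != c) continue;
            // A '*' item may match again; anything else moves on.
            add_state(items, next, items[i].quant == '*' ? i : i + 1);
        }
        cur.swap(next);
    }
    return cur[items.size()];  // the position past the last item means "accept"
}
```

The trade-off is that a position set cannot remember what an earlier group captured, which is roughly why features like backreferences push engines toward backtracking.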


In my experience, the exponential time thing isn't really a big deal. I've used Perl regular expressions on a very regular basis for about 16 years now. Exponential time has been an issue only once. Obviously if I were accepting regular expressions from random people, I'd use RE2. But for my day-to-day purposes, it's pretty much a complete non-issue.


What is backtracking?


https://regex101.com/r/G23xYd/2


A feature of certain types of regular expression engines. It allows for certain kinds of regular expressions, but at the cost of possibly taking exponential time if you aren't careful about how you write your expression. [1]

1: http://www.regular-expressions.info/catastrophic.html
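
To see the blow-up without a real regex library, here is a minimal sketch of a backtracking matcher with a step counter. It is a toy under stated assumptions, not how Boost or Perl implement matching: the mini-syntax (literals, '.', postfix '?' and '*') and the names `match_backtrack` and `steps_for` are invented for illustration. The test pattern is n optional 'a's followed by n required 'a's, matched against a string of n 'a's.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal backtracking matcher: literals, '.', and postfix
// '?' / '*'. Matches the whole string; `steps` counts recursive calls.
bool match_backtrack(const std::string& pat, std::size_t pi,
                     const std::string& text, std::size_t ti,
                     std::size_t& steps) {
    ++steps;
    if (pi == pat.size()) return ti == text.size();
    const bool first_ok =
        ti < text.size() && (pat[pi] == '.' || pat[pi] == text[ti]);
    const char quant = (pi + 1 < pat.size()) ? pat[pi + 1] : '\0';
    if (quant == '?') {
        // Greedy: try consuming one character; on failure, back up and skip.
        if (first_ok && match_backtrack(pat, pi + 2, text, ti + 1, steps)) {
            return true;
        }
        return match_backtrack(pat, pi + 2, text, ti, steps);
    }
    if (quant == '*') {
        // Greedy: consume as long as possible, giving characters back on failure.
        if (first_ok && match_backtrack(pat, pi, text, ti + 1, steps)) {
            return true;
        }
        return match_backtrack(pat, pi + 2, text, ti, steps);
    }
    return first_ok && match_backtrack(pat, pi + 1, text, ti + 1, steps);
}

// Steps needed to match n optional 'a's followed by n required 'a's
// against a string of n 'a's.
std::size_t steps_for(std::size_t n) {
    assert(n > 0);
    std::string pat;
    for (std::size_t i = 0; i < n; ++i) pat += "a?";
    pat += std::string(n, 'a');
    const std::string text(n, 'a');
    std::size_t steps = 0;
    const bool matched = match_backtrack(pat, 0, text, 0, steps);
    assert(matched);
    (void)matched;  // silence the unused-variable warning when NDEBUG is set
    return steps;
}
```

The only successful path is the one where every optional 'a' matches nothing, and the greedy matcher tries that choice last, so it walks every combination of take and skip first. In this toy, `steps_for(n + 1)` is more than twice `steps_for(n)`.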


http://stackoverflow.com/questions/9011592/in-regular-expres...


Regex seems to be one of the big strengths of Rust, performance-wise: https://benchmarksgame.alioth.debian.org/u64q/performance.ph... (C++ g++ #4 uses Boost)


The Rust regex library is written by the same guy who wrote ripgrep. I'd be surprised if it didn't run circles around boost::regex.


Another counterexample to the idea that modern C++ written by experts is free of memory safety issues.


Not that I necessarily think it'd avoid the problem even then, but I wouldn't call this modern C++ exactly. For one thing, it's (c) 2002, so written in C++98. And for another, it looks very uh, C-ish (not that uncommon in the C++98 era). In the bad sense of lots of error-prone pointer-based string manipulation, stuff like this:

    if(STR_COMP(s1, p) >= 0)
    {
       do{ ++p; }while(*p);
       ++p;
       if(STR_COMP(s1, p) <= 0)
          return set_->isnot ? next : ++next;
    }


Well, yes if the strawman "all modern C++ written by experts is free from memory safety issues" is what you're countering. I find that to be gratuitous and petty, and not a good representation of Rust, however.


If the solution doesn't make it any easier to avoid memory issues (it just forces you to avoid them), it's not an attractive solution.


Given that we're talking about bugs, I don't think this is a valid argument. Being forced to avoid bugs is good, yes? Even if it's not necessarily less hard to make them in the first place (which is arguable, too!).

I mean, I spend most of my time coding searching for bugs, not avoiding them.


Avoiding bugs is a pretty effective way to not have bugs.


I disagree? It's better to not have memory safety problems than to have them!


Practical upshot: do not feed untrusted regexes into boost::regex.


Or any untrusted user input for that matter?


They seem to be fuzzing the regex, not just the input it is applied to. This may or may not change the results, but if you're allowing users to input arbitrary regex patterns you have a whole lot of other problems.
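
To illustrate what "fuzzing the regex" means, as opposed to fuzzing the subject string, here is a minimal sketch of a harness that generates random pattern strings and feeds them to an engine. It is an assumption-laden toy, not the methodology used in the linked bug report: the names `random_pattern` and `fuzz_patterns`, the character alphabet, and the `target` callback (a hypothetical wrapper around whatever engine is under test) are all invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Build a random string of regex-looking characters, up to max_len long.
std::string random_pattern(std::mt19937& rng, std::size_t max_len) {
    static const std::string alphabet = "ab.*+?()[]{}|\\^$-";
    std::uniform_int_distribution<std::size_t> len_dist(0, max_len);
    std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
    std::string pattern;
    for (std::size_t n = len_dist(rng); n > 0; --n) {
        pattern += alphabet[pick(rng)];
    }
    return pattern;
}

// Feed `iterations` random patterns to `target` (a hypothetical wrapper
// around the engine under test) and collect the ones that made it throw.
// Caveat: this only observes C++ exceptions. Memory corruption would take
// down the whole process, which is why real fuzzers watch from outside.
std::vector<std::string> fuzz_patterns(
    const std::function<void(const std::string&)>& target,
    unsigned seed, std::size_t iterations, std::size_t max_len) {
    assert(static_cast<bool>(target));
    std::mt19937 rng(seed);  // fixed seed so a failing run can be replayed
    std::vector<std::string> failures;
    for (std::size_t i = 0; i < iterations; ++i) {
        const std::string pattern = random_pattern(rng, max_len);
        try {
            target(pattern);
        } catch (const std::exception&) {
            failures.push_back(pattern);
        }
    }
    return failures;
}
```

A target would typically compile the pattern and run it against a short subject string. Purely random generation like this is much weaker than a coverage-guided fuzzer such as AFL, which mutates inputs based on which code paths they reach.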


Huh? It would be perfectly valid to have it power the regex part of the scripting engine in a browser, for example. If that would then lead to memory safety errors, you just got yourself a 0-day.

Regex engines used in browsers are both fast and hardened against these attacks.


I've seen plenty of places where you may want to accept an arbitrary regex from the user. An app could allow the user to set up a filter for messages or usernames by putting in a regex. Or an interpreter for a sandboxed language could provide regex support.


All manner of problems in the programmer's mind become trivial if only we allow users to input what is essentially code to express exactly what they want. Of course this is basically never a good solution.

The issue with allowing arbitrary regex patterns is DoS through exponential blowup. But if you allow running code anyway, you might not care much about that.
