
The point is, with the NFA approach, there are no regexes that exhibit catastrophic backtracking.

So you can't shoot yourself in the foot with expressions that seem fine but send your CPU off into la-la land when asked to match certain input data.
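
A concrete illustration (the pattern and inputs here are made up for the demo, not taken from this thread): the nested quantifier below looks harmless, but on a backtracking engine such as Python's built-in re module the matching time roughly doubles with every extra character of a near-miss input.

    # Illustrative sketch of catastrophic backtracking in Python's built-in
    # (backtracking) re module; the pattern and inputs are assumptions chosen
    # for the demo.
    import re
    import time

    pattern = re.compile(r"^(a+)+$")   # nested quantifiers: the classic ReDoS shape

    for n in (18, 20, 22, 24):
        text = "a" * n + "!"           # the trailing "!" forces the engine to retry
                                       # every way of splitting the run of a's
        start = time.perf_counter()
        pattern.match(text)
        print(f"{n} a's: {time.perf_counter() - start:.3f}s")
    # The time roughly doubles with each extra 'a'; a few dozen characters of
    # hostile input is enough to peg a CPU core for a very long time.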




It's a strange property to want. It's not a property of most other parts of programming languages. You can write loops or function calls in any language that take exponential time. So why demand that regexps be immune to it?

It would make sense to worry about it if it were a common mistake, but I've never seen a gratuitously exponential regexp in the wild.


It's a thing: https://en.wikipedia.org/wiki/ReDoS

And it really happens. StackOverflow had a ~30-minute outage a couple of years ago because of it: http://stackstatus.net/post/147710624694/outage-postmortem-j...

One of the key points here is that it's not always obvious from looking at a regex whether it will exhibit catastrophic backtracking (at least, to an untrained eye). Even more insidious, catastrophic backtracking may only happen on certain inputs, which means that tests won't necessarily catch it either. Therefore, desiring a performance guarantee here is certainly not "strange."
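
To make the "only on certain inputs" point concrete (again an illustrative sketch, reusing the classic nested-quantifier pattern with Python's backtracking re module): the pattern is instantaneous on an input that actually matches and only blows up on a near-miss, which is exactly why a happy-path test suite won't surface it.

    # Illustrative sketch: the same pathological pattern is fast on matching
    # input and slow only on a near-miss input.
    import re
    import time

    pattern = re.compile(r"^(a+)+$")

    for text in ("a" * 24, "a" * 24 + "!"):   # matching input vs. near-miss input
        start = time.perf_counter()
        pattern.match(text)
        print(f"ends with {text[-1]!r}: {time.perf_counter() - start:.3f}s")
    # The first case returns immediately; the second takes orders of magnitude
    # longer because every partition of the a's is tried before failing.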

The bottom line is that the various implementation strategies for regex engines have trade-offs. On Internet forums, it's popular to proclaim for one side or the other. The backtracking advocates like to believe that catastrophic cases never happen or are easily avoided, and the linear-time advocates like to believe that as long as matching doesn't take exponential time, it will be fast enough. :-)


Because most of the time you don't need the features that demand backtracking, and it's simple to use an NFA approach that avoids those kinds of issues.
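
As a minimal sketch of what the NFA ("set of reachable states") approach looks like, here is the pathological pattern from the examples above, ^(a+)+$, hand-compiled into a two-state NFA and simulated without backtracking. The state numbers and transition table are assumptions for this one pattern; a real linear-time engine (RE2, Go's regexp, Rust's regex crate) builds them from the pattern automatically. The point is that each input character is examined once, so the near-miss input that ruins a backtracking engine is rejected in a single linear pass.

    # Hand-built NFA equivalent to ^(a+)+$ (illustrative; a real engine
    # compiles this from the pattern).
    TRANSITIONS = {          # (state, character) -> set of next states
        (0, "a"): {1},       # first 'a' of the (a+)+ group
        (1, "a"): {1},       # each further 'a', whether it extends the inner a+
                             # or starts a new repetition of the outer group
    }
    ACCEPT = {1}

    def nfa_fullmatch(text: str) -> bool:
        """Simulate the NFA by tracking the set of states reachable so far."""
        states = {0}
        for ch in text:
            states = {nxt for s in states for nxt in TRANSITIONS.get((s, ch), ())}
            if not states:                 # no live states left: cannot match
                return False
        return bool(states & ACCEPT)

    print(nfa_fullmatch("a" * 40))         # True, in one pass over the input
    print(nfa_fullmatch("a" * 40 + "!"))   # False, still one pass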

Besides, people make mistakes all the time. Why not prevent the possibility?



