
How do both of those compare to the C++11 std::regex or Boost::regex? At what point is it worth the switch?



If you really need the backtracking features in those other systems, the point is "never".

Similarly, if you have short-lived regexes that are compiled, used for a small amount of data, and discarded, you might never see a performance benefit.

Multiple regexes, a substantial amount of data to scan, and/or a requirement to 'stream' (i.e. process successive writes of input data when you can't hold on to old data) are the sort of things that make using Hyperscan more sensible. We see a lot of use in network security, where these assumptions all generally hold.
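To make the 'stream' requirement concrete, here is a minimal sketch of a matcher that carries only a small state between writes and never looks at old data again. It is not Hyperscan's API or implementation: the class name StreamingLiteralMatcher and its write() method are made up for illustration, and it handles only a single literal pattern (via the Knuth-Morris-Pratt failure table) rather than real regexes.

```python
class StreamingLiteralMatcher:
    """Hypothetical illustration: find a literal pattern across chunk boundaries.

    Only the pattern, its failure table, and two integers are kept between
    writes; earlier chunks are never revisited.
    """

    def __init__(self, pattern: str):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.failure = self._build_failure(pattern)
        self.state = 0   # how many pattern characters are currently matched
        self.offset = 0  # total characters consumed across all chunks

    @staticmethod
    def _build_failure(pattern: str) -> list:
        # failure[i] = length of the longest proper prefix of pattern[:i+1]
        # that is also a suffix of it (standard KMP table).
        failure = [0] * len(pattern)
        k = 0
        for i in range(1, len(pattern)):
            while k > 0 and pattern[i] != pattern[k]:
                k = failure[k - 1]
            if pattern[i] == pattern[k]:
                k += 1
            failure[i] = k
        return failure

    def write(self, chunk: str) -> list:
        """Consume one chunk; return exclusive end offsets of matches completed in it."""
        ends = []
        for ch in chunk:
            # Fall back through the failure table until ch can extend a prefix.
            while self.state > 0 and ch != self.pattern[self.state]:
                self.state = self.failure[self.state - 1]
            if ch == self.pattern[self.state]:
                self.state += 1
            self.offset += 1
            if self.state == len(self.pattern):
                ends.append(self.offset)
                # Keep going so overlapping matches are also reported.
                self.state = self.failure[self.state - 1]
        return ends


matcher = StreamingLiteralMatcher("attack")
first = matcher.write("xxatt")         # match is still incomplete here
second = matcher.write("ackyyattack")  # completes one match, then finds another
print(first, second)
```

The point of the sketch is the shape of the interface: each write() sees only the new bytes, and a match that straddles two writes is still found because the partial-match state survives between calls.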


Also, if you're feeding user input into a regex, you want to use something like RE2 or Hyperscan. To do otherwise is to expose yourself to DoS.


Is that just because they don't support the features that can be used to make pathological regexes in other engines?

Or do they have additional sanitisation or security features?


Neither. There are regexes that can be written in the common subset supported by (say) libpcre, RE2 and Hyperscan that will induce exponential backtracking with libpcre but not with the other libraries.

I'm not aware of any difference in terms of sanitisation or security.
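A toy way to see the difference, under loud assumptions: the two functions below are hand-written matchers for the single pattern ^(a|aa)*b$, not the code of libpcre, RE2 or Hyperscan, and real engines have optimisations that may defuse this particular pattern. The names backtrack_steps and automaton_steps are invented for this sketch. Both use only features in the common subset (alternation, star, literals), yet the backtracking one does exponentially growing work on a run of 'a's with no trailing 'b', while the state-set simulation stays linear.

```python
def backtrack_steps(text: str):
    """Match ^(a|aa)*b$ by naive backtracking; return (matched, steps)."""
    steps = 0

    def match_from(i: int) -> bool:
        nonlocal steps
        steps += 1
        # Try the 'a' alternative, then the 'aa' alternative, then leave the loop.
        if text.startswith("a", i) and match_from(i + 1):
            return True
        if text.startswith("aa", i) and match_from(i + 2):
            return True
        # Loop exited: exactly one 'b' must remain.
        return i == len(text) - 1 and text[i] == "b"

    return match_from(0), steps


def automaton_steps(text: str):
    """Match the same pattern by tracking a set of NFA states; return (matched, steps)."""
    # S = at the start of the loop, M = after the first 'a' of 'aa', F = accepted.
    transitions = {
        ("S", "a"): {"S", "M"},
        ("M", "a"): {"S"},
        ("S", "b"): {"F"},
    }
    states = {"S"}
    steps = 0
    for ch in text:
        next_states = set()
        for state in states:
            steps += 1
            next_states |= transitions.get((state, ch), set())
        states = next_states
    return "F" in states, steps


for n in (5, 10, 15, 20):
    text = "a" * n  # no trailing 'b', so both matchers must report failure
    _, slow = backtrack_steps(text)
    _, fast = automaton_steps(text)
    print(n, slow, fast)
```

The backtracker's step count follows a Fibonacci-style recurrence (each position tries both alternatives), whereas the state-set version does at most a few steps per input character because the number of live states is bounded by the size of the pattern.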


C++ says what the regex must do, but not how it should do it or what the performance should be.

To be honest, I have no idea if the "what the regex must do" description rules out particular algorithms. I suspect that, in keeping with Committee tradition, it's basically a new interface over POSIX ( http://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html ). By "Committee tradition," I mean "compare 'catgets' with C++ message catalogs."


std::regex does not work on half of the distros, Boost is a disaster, Qt is OK. So: never.



