Hacker News new | past | comments | ask | show | jobs | submit login

FWIW, the conversion from NFA (non-deterministic, i.e. backtracking) to DFA (deterministic, linear time) can take exponential space. So there's another avenue for DDOS; it's a lot harder to exploit, though, because it requires the attacker to control the input (i.e. regular expression) to the NFA->DFA transformation, rather than merely provide a string that takes a long time for an NFA to recognize.



I'm actually not aware of any popular DFA based engines that suffer from this vulnerability. grep and RE2, for example, build the DFA lazily and cap the size of the DFA such that matching is still linear time even if generating the full DFA would take exponential space. (This is because at most one DFA state is generated for each byte in the input, so technically, you only ever need space for one or two states. In practice, you give a little more room to avoid constantly recomputing states, but it's still bounded.)


Random side note... Recognized your name, we use your TOML go package in a bunch of our tools. Thanks and keep up the great work!


There is a misunderstanding here -- NFA simulation is NOT backtracking. A NFA can be simulated in time LINEAR to the input length (holding the regex length as a constant). A DFA state corresponds to a set of NFA states. It is an additional optimization to convert NFA to DFA, but it just prevents doing repeated calculation -- it doesn't affect the computational complexity. Cox's articles address this and are quite lucid.

The problem is that the friedl O'Reilly book uses the terms NFA and DFA in a completely wrong and boneheaded way. I was confused by that book for awhile.


Only if you want to model it with every combination of states from the nfa getting one from the dfa. If you don't mind describing the current state as a list of potential states from the nfa, the conversion is linear space.


Usually there's no need to convert NFA to DFA. NFA can be simulated directly, i.e. by using bit vector instead of integer for state accounting. Also, no backtracking is necessary to simulate an NFA.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: