The "Papers" section on re2c's web site continues Laurikari's work: http://re2c.org/

... but I haven't found them particularly accessible. And it's not clear it's a viable strategy in a general purpose regex engine. Namely, I'm not sure how much bigger it makes the DFA.

Also, AFAIK, these aren't DFAs. They are different theoretical structures with explicitly more power.

> and then an NDFA is used to match a third time, to extract the capture groups.

That's the PikeVM. It's an NFA simulation, although it uses additional storage and is otherwise more computationally powerful than a plain NFA.
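To make the "additional storage" point concrete, here's a rough sketch of a Pike-style VM in Python. This is my own toy illustration, not code from any real engine: the instruction encoding, the `pike_vm` name, and the hand-compiled program for `(a+)(b*)` are all made up for the example. The only thing it's meant to show is that every NFA thread carries its own copy of the capture slots, which is the extra storage a plain NFA simulation doesn't need.

```python
# Hypothetical instruction encoding for this sketch:
#   ("char", c), ("split", x, y), ("jmp", x), ("save", slot), ("match",)

def pike_vm(prog, text, nslots):
    """Run prog anchored at the start of text; return capture slots or None."""

    def add_thread(threads, seen, pc, pos, slots):
        # Follow epsilon transitions (jmp/split/save) eagerly, in priority order.
        if pc in seen:
            return  # a higher-priority thread already reached this state
        seen.add(pc)
        op = prog[pc]
        if op[0] == "jmp":
            add_thread(threads, seen, op[1], pos, slots)
        elif op[0] == "split":
            add_thread(threads, seen, op[1], pos, slots)  # preferred branch
            add_thread(threads, seen, op[2], pos, slots)
        elif op[0] == "save":
            new_slots = list(slots)  # per-thread copy: the extra storage
            new_slots[op[1]] = pos
            add_thread(threads, seen, pc + 1, pos, new_slots)
        else:
            threads.append((pc, slots))  # "char" or "match"

    matched = None
    clist = []
    add_thread(clist, set(), 0, 0, [None] * nslots)
    for pos in range(len(text) + 1):
        nlist, seen = [], set()
        for pc, slots in clist:
            op = prog[pc]
            if op[0] == "match":
                matched = slots
                break  # lower-priority threads are cut off
            if pos < len(text) and op[1] == text[pos]:
                add_thread(nlist, seen, pc + 1, pos + 1, slots)
        clist = nlist
        if not clist:
            break
    return matched


# Hand-compiled program for (a+)(b*), both loops greedy.
# Slots 0/1: whole match, 2/3: group 1, 4/5: group 2.
PROG = [
    ("save", 0), ("save", 2),
    ("char", "a"), ("split", 2, 4),
    ("save", 3), ("save", 4),
    ("split", 7, 9), ("char", "b"), ("jmp", 6),
    ("save", 5), ("save", 1),
    ("match",),
]

print(pike_vm(PROG, "aabbc", 6))
```

The thread list stays bounded by the program size (that's what the `seen` set is for), so it's still a linear-time simulation, but each thread drags a slot array along with it.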




Thanks! I hadn't encountered that top paper "A closer look at TDFA."

That paper claims to be "the first practical submatch extraction and parsing algorithm based on DFA" and it came out only last year! It shows how new this theory is.



