The "Papers" section on re2c's web site continues Laurikari's work: http://re2c.... | Hacker News


burntsushi on April 9, 2023 | on: Irregular Expressions

The "Papers" section on re2c's web site continues Laurikari's work: http://re2c.org/

... but I haven't found them particularly accessible. And it's not clear the approach is viable in a general-purpose regex engine. Namely, I'm not sure how much bigger it makes the DFA.

Also, AFAIK, these aren't DFAs. They are different theoretical structures with explicitly more power.

> and then an NDFA is used to match a third time, to extract the capture groups.

That's the PikeVM. It's an NFA simulation, although it uses additional storage (for capture group positions) and is otherwise more computationally powerful than just a plain NFA.
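
To make the "additional storage" point concrete, here is a minimal sketch of a Pike-style VM in Python. It is not the code of any real regex engine: the instruction names (`char`, `split`, `jmp`, `save`, `match`), the function `pike_vm`, and the hand-compiled program `PROG` for `(a+)(b*)` are all assumptions made up for illustration. Each thread carries its own tuple of capture slots, which is exactly the extra state a plain NFA simulation does not have. The match is anchored at the start of the text and uses leftmost-first priority.

```python
def pike_vm(prog, text):
    """Simulate the NFA program over text, anchored at position 0.

    Returns the capture slots of the highest-priority match, or None.
    """
    def add_thread(threads, seen, pc, caps, pos):
        # Follow epsilon transitions (jmp/split/save) eagerly; visit each
        # pc at most once per step so the thread list stays bounded.
        if pc in seen:
            return
        seen.add(pc)
        op = prog[pc]
        if op[0] == "jmp":
            add_thread(threads, seen, op[1], caps, pos)
        elif op[0] == "split":
            # The first branch has higher priority than the second.
            add_thread(threads, seen, op[1], caps, pos)
            add_thread(threads, seen, op[2], caps, pos)
        elif op[0] == "save":
            # The per-thread storage: record the current position in a slot.
            new_caps = caps[:op[1]] + (pos,) + caps[op[1] + 1:]
            add_thread(threads, seen, pc + 1, new_caps, pos)
        else:
            threads.append((pc, caps))

    nslots = 1 + max((op[1] for op in prog if op[0] == "save"), default=-1)
    clist = []
    add_thread(clist, set(), 0, (None,) * nslots, 0)
    matched = None
    for pos in range(len(text) + 1):
        nlist, seen = [], set()
        for pc, caps in clist:
            op = prog[pc]
            if op[0] == "match":
                matched = caps
                break  # cut off all lower-priority threads
            if pos < len(text) and op[0] == "char" and text[pos] == op[1]:
                add_thread(nlist, seen, pc + 1, caps, pos + 1)
        clist = nlist
        if not clist:
            break
    return matched


# Hypothetical hand-compiled program for (a+)(b*).
# Slots 0/1 delimit group 1, slots 2/3 delimit group 2.
PROG = [
    ("save", 0),
    ("char", "a"),
    ("split", 1, 3),   # greedy: prefer another 'a'
    ("save", 1),
    ("save", 2),
    ("split", 6, 8),   # greedy: prefer another 'b'
    ("char", "b"),
    ("jmp", 5),
    ("save", 3),
    ("match",),
]

caps = pike_vm(PROG, "aabbc")
print(caps)  # (0, 2, 2, 4): group 1 is "aa", group 2 is "bb"
```

All threads advance in lockstep over the input, so the running time is bounded by roughly (program size) x (input length), but every thread has to copy and carry its capture slots along. A plain NFA or DFA simulation only tracks a set of states.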

ridiculous_fish on April 9, 2023

Thanks! I hadn't encountered that top paper "A closer look at TDFA."

That paper claims to be "the first practical submatch extraction and parsing algorithm based on DFA" and it came out only last year! It shows how new this theory is.
