
For any finite set of quotation character pairs, you can get away with a strategy like `(left_char1)[^right_char1]*(right_char1)|(left_char2)[^right_char2]*(right_char2)|...`. Escape characters aren't much harder to accommodate.
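To make the "escape characters aren't much harder" point concrete, here is a minimal sketch. It assumes single-character delimiters and a single escape character (backslash by default), and the helper name `quoted_pattern` is hypothetical. The usual trick is to replace `[^right]*` with `(?:[^right esc]|esc.)*`, so an escaped closing character is consumed as content instead of ending the match.

```python
import re

def quoted_pattern(pairs, escape="\\"):
    """Hypothetical helper: compile one alternation matching any quoted span.

    pairs:  iterable of (left, right) single-character delimiters.
    escape: character that makes the following character literal.
    """
    e = re.escape(escape)
    alternatives = []
    for left, right in pairs:
        l, r = re.escape(left), re.escape(right)
        # left, then any run of (not right, not escape) or (escape + any char), then right
        alternatives.append(f"{l}(?:[^{r}{e}]|{e}.)*{r}")
    # DOTALL so that an escape followed by a newline is still consumed by `.`
    return re.compile("|".join(alternatives), re.DOTALL)

pattern = quoted_pattern([('"', '"'), ("(", ")")])
print(pattern.findall(r'say "a \"b\" c" (x \) y) done'))
```

Because every group is non-capturing, `findall` returns the whole quoted spans, escapes included; an unterminated quote simply fails to match.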


Of course, but you are out of luck in terms of complexity. (Colloquial) regular expressions lack any kind of abstractions.


Absolutely. I don't for a moment think they're the right tool for the job.

They are fairly powerful in terms of what they can parse, however (not enough for an arbitrary HTML document, but enough to handle the hairier situations in this thread that people thought they couldn't). That also means a regular expression generator can handle all of those situations, and potentially be much more readable.

If I found myself writing code like this I'd still want to reach for a better parsing technology, but you can use other languages to add abstractions to regex. Here's a Python 3.6+ example, assuming any needed backslash escaping has already been applied to the characters in `pairs`:

  '|'.join(rf'{a}[^{b}]*{b}' for a,b in pairs)
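A self-contained variant of that one-liner, as a sketch: it calls `re.escape` instead of assuming the backslashes are already in place, and gives each alternative its own capture group so you can tell which pair matched. The example `pairs` list and the `quoted_spans` helper are hypothetical, and delimiters are still assumed to be single characters with no escape handling.

```python
import re

# Hypothetical example input: (opening, closing) delimiter pairs.
pairs = [('"', '"'), ("'", "'"), ("[", "]")]

# Same shape as the one-liner, but re.escape supplies the backslashes and
# each alternative captures its contents in its own numbered group.
pattern = re.compile('|'.join(
    rf'{re.escape(a)}([^{re.escape(b)}]*){re.escape(b)}' for a, b in pairs
))

def quoted_spans(text):
    """Yield (opening delimiter, contents) for each quoted span in text."""
    for m in pattern.finditer(text):
        # Only one alternative's group participates, so lastindex identifies it
        i = m.lastindex
        yield pairs[i - 1][0], m.group(i)

print(list(quoted_spans('a "x" b \'y\' [z] ""')))
```

Using `m.lastindex` works here because each alternative contributes exactly one group, in the same order as `pairs`.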



