
There are many considerations, of course, and I might not have fully grasped your point, but in reference to your last paragraph:

By default RE2 matches directly against the UTF-8 encoding, see this neat state diagram:

https://swtch.com/~rsc/regexp/regexp3.html#step3
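To make the idea concrete, here is a minimal sketch of what "matching directly against the UTF-8 encoding" can look like: a tiny byte-level matcher for "one code point", driven by lead-byte ranges rather than decoded characters. This is not RE2's code; the names `LEAD_RANGES`, `match_code_point` and `spans` are made up for illustration, and the table is simplified (it does not reject overlong forms, surrogates, or values above U+10FFFF).

```python
from typing import List, Optional, Tuple

# Hypothetical, simplified table: (lowest lead byte, highest lead byte,
# number of continuation bytes in 0x80-0xBF that must follow).
# Assumption: overlong forms, surrogates and > U+10FFFF are NOT rejected.
LEAD_RANGES = [
    (0x00, 0x7F, 0),  # ASCII, one byte
    (0xC2, 0xDF, 1),  # two-byte sequences
    (0xE0, 0xEF, 2),  # three-byte sequences
    (0xF0, 0xF4, 3),  # four-byte sequences
]


def match_code_point(data: bytes, pos: int = 0) -> Optional[int]:
    """Match one UTF-8 encoded code point at data[pos:] without decoding.

    Returns the byte offset just past the match, or None if the bytes at
    pos do not form a sequence accepted by LEAD_RANGES.
    """
    if pos >= len(data):
        return None
    lead = data[pos]
    for lo, hi, n_cont in LEAD_RANGES:
        if lo <= lead <= hi:
            end = pos + 1 + n_cont
            if end > len(data):
                return None  # truncated sequence
            if all(0x80 <= b <= 0xBF for b in data[pos + 1:end]):
                return end
            return None  # bad continuation byte
    return None  # stray continuation byte or invalid lead byte


def spans(text: str) -> List[Tuple[int, int]]:
    """Byte spans of consecutive code-point matches over the UTF-8 bytes."""
    data = text.encode("utf-8")
    out = []
    pos = 0
    while pos < len(data):
        end = match_code_point(data, pos)
        if end is None:
            break
        out.append((pos, end))
        pos = end
    return out


print(spans("aé€"))  # [(0, 1), (1, 3), (3, 6)]
```

The offsets that come back are byte offsets, not character indices, which is what you would expect from an engine that never decodes its input.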

In terms of backtracking engines, Perl also stores strings internally as UTF-8 (on most platforms), and its regexp engine runs directly on them. It can also match "eXtended grapheme clusters" with \X.


