
The biggest point of complexity, IMO, is the representation in which the characters to be matched are mixed in with the syntactical elements in a single string.

This tends to cause confusion, makes escaping complex, and is just really different from most other ways of expressing logic in programming.

Even advanced regex features like extended mode and named captures, which I love, don't get around this fundamental issue with the representation.
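To illustrate, here is a small sketch using Python's `re` module (my choice of language; the names `VERSION` and `parts` are made up for the example). Extended mode and named captures make the pattern far more readable, but the literal dot still has to be escaped because it lives in the same string as the syntax.

```python
import re

# Extended (verbose) mode: unescaped whitespace and "#" comments inside the
# pattern are ignored, and named groups label the captures.
VERSION = re.compile(
    r"""
    (?P<major>\d+)   # major version number
    \.               # a literal dot, which still needs a backslash
    (?P<minor>\d+)   # minor version number
    """,
    re.VERBOSE,
)

match = VERSION.fullmatch("3.12")
parts = match.groupdict()  # captures keyed by group name
```

Without the backslash, the dot would be treated as syntax ("any character") rather than as a character to match.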

It does have advantages, though. The big ones are terseness and the ability to be expressed in an implementation-independent way. Another is that for some types of regexes, the expression "sort of looks like" the strings it can match, which can help intuition.

I think the way forward is to consider the existing syntax a "terse mode" and to use regex builder libraries [0] as a "verbose mode." The "terse mode" does have merits, though, and I wouldn't want to get rid of it or anything.

[0]: For example https://github.com/VerbalExpressions/JSVerbalExpressions is a library that lets developers express regexps using a fluent API. This changes things from being syntactic elements of a string to just being traditional function calls.
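A rough sketch of the idea in Python, to show what "function calls instead of string syntax" looks like. The `Pattern` class and its method names are hypothetical, written only for this example; they are not the VerbalExpressions API.

```python
import re


class Pattern:
    """Hypothetical fluent regex builder, for illustration only."""

    def __init__(self, source=""):
        self.source = source

    def _add(self, fragment):
        # Return a new builder so that chains never mutate shared state.
        return Pattern(self.source + fragment)

    def start_of_line(self):
        return self._add("^")

    def end_of_line(self):
        return self._add("$")

    def then(self, literal):
        # Literal text is escaped here, so the caller never writes a backslash.
        return self._add(re.escape(literal))

    def maybe(self, literal):
        return self._add("(?:" + re.escape(literal) + ")?")

    def anything_but(self, chars):
        return self._add("[^" + re.escape(chars) + "]*")

    def compile(self):
        return re.compile(self.source)


url = (
    Pattern()
    .start_of_line()
    .then("http")
    .maybe("s")
    .then("://")
    .maybe("www.")
    .anything_but(" ")
    .end_of_line()
    .compile()
)
```

The characters to match are ordinary string arguments and the syntax is ordinary method calls, so the two never share a string and escaping happens in one place.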




> For example https://github.com/VerbalExpressions/JSVerbalExpressions is a library that lets developers express regexps using a fluent API.

Except from what I can see, it only supports trivial cases. What's a syntax that supports grouping, alternation, a numerical range for the number of matches (e.g. \d{3,10}), and the combining of any number of those? I suspect it quickly devolves into one of two cases: either you use the extended syntax and the comments are what really matter, or the terse syntax turns out to be useful precisely because it avoids the boilerplate, which otherwise dwarfs the functional bits and makes the expression verbose, meandering, and hard to comprehend.
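To make the question concrete, here is a sketch of what grouping, alternation, and a bounded repeat might look like as plain functions in Python. The helpers `lit`, `group`, `either`, and `repeat` are made up for this example and do not come from any library; the `tel:`/`fax:` pattern is just a toy.

```python
import re


def lit(text):
    # Escape literal text so that it cannot be read as syntax.
    return re.escape(text)


def group(*parts):
    # Non-capturing group around a sequence of sub-patterns.
    return "(?:" + "".join(parts) + ")"


def either(*options):
    # Alternation between sub-patterns.
    return "(?:" + "|".join(options) + ")"


def repeat(part, lo, hi):
    # Between lo and hi repetitions of a sub-pattern.
    return group(part) + "{%d,%d}" % (lo, hi)


DIGIT = r"\d"

# Builder version: a "tel:" or "fax:" prefix followed by 3 to 10 digits.
built = group(
    either(lit("tel:"), lit("fax:")),
    repeat(DIGIT, 3, 10),
)

# The same pattern written directly in the terse syntax.
terse = r"(?:tel:|fax:)\d{3,10}"
```

Both strings accept the same inputs. The builder version takes several lines of helper calls to say what the terse version says in about twenty characters, which is the trade-off in question.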

Regular expressions work best when matching at a character level. The non-terse mode you are grasping at works best when matching at a symbolic level, and it has existed for a very long time. They're called grammars. A popular form is BNF[1] (and variants), which will probably look somewhat familiar from technical specifications or RFCs, if you've looked at any before.

If you want something beyond regular expressions and you're willing to use a library for it, just use a library that provides grammar parsing. Or use Raku (fka Perl 6), which has support built in (with hooks for calling code at particular parsing points).[2] Hopefully most languages have a good library for parsing grammars at this point.
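As a rough illustration of matching at the symbolic level, here is a tiny grammar in EBNF-style notation with a hand-written recursive-descent parser for it in Python. This is a sketch without any library (a real grammar library or Raku would generate the parsing for you), and the names `tokenize`, `Parser`, and `evaluate` are my own.

```python
import re

# Grammar (EBNF-style):
#   expr   ::= term   ( "+" term   )*
#   term   ::= factor ( "*" factor )*
#   factor ::= NUMBER | "(" expr ")"

TOKEN = re.compile(r"\s*(\d+|[()+*])")


def tokenize(text):
    # The regex works at the character level: it only splits out tokens.
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    # One method per grammar rule; each returns the value of what it parsed.
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError("expected %r, got %r" % (expected, tok))
        self.i += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() == "+":
            self.take("+")
            value += self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.take("*")
            value *= self.factor()
        return value

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            value = self.expr()  # recursion handles arbitrary nesting
            self.take(")")
            return value
        tok = self.take()
        if not tok.isdigit():
            raise ValueError("expected a number, got %r" % tok)
        return int(tok)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError("trailing input: %r" % parser.peek())
    return value
```

The regex is only used to find tokens. The nesting and operator precedence come from the grammar rules, where each rule is named and can run code at the point it matches.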

1: http://matt.might.net/articles/grammars-bnf-ebnf/

2: https://docs.perl6.org/language/grammar_tutorial



