[A-Z]*H # prefix
\d+ # digits
[a-z]* # suffix
(?: \( \s* )? # maybe open paren and maybe space
(?&code) # one code
(?: \s* \+ \s* (?&code) )* # maybe followed by other codes, plus-separated
(?: \s* [\):+] )? # maybe space and maybe close paren or colon or plus
( (?&multicode) ) # code (capture)
( .*? ) # message (capture): everything ...
(?= # ... up to (but excluding) ...
(?&multicode) # ... the next code
(?! [^\w\s] ) # (but not when followed by punctuation)
| $ # ... or the end
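The prefix/digits/suffix fragment at the top can be compiled directly with Python's `re.VERBOSE` flag, which is what lets the comments live inside the pattern. (A minimal sketch; the test strings are my own. Note the `(?&code)` subroutine calls in the later fragments are PCRE-style and need the third-party `regex` module — stdlib `re` does not support them.)

```python
import re

# The prefix/digits/suffix fragment, compiled with re.VERBOSE so that
# whitespace is ignored and the # comments survive inside the pattern.
code = re.compile(r"""
    [A-Z]*H   # prefix
    \d+       # digits
    [a-z]*    # suffix
""", re.VERBOSE)

assert code.fullmatch("XH123abc")    # prefix "XH", digits "123", suffix "abc"
assert not code.fullmatch("xh123")   # lowercase prefix: no match
```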
fr"this is both an f- and an r-string"
int x = ...;
I didn't know about this and was a happy regular expression user without it, but this looks like a good feature for the specific use case of wanting other people to understand the structure of your regular expressions. And much more portable than I would have expected.
A regex is just a fast, usually integrated into the language, universally understood way to do some simple parsing. Parser combinators are an amazing specialized tool for building parsers, but one that is generally harder to integrate into a code-base (outside of e.g. Haskell), requires lesser known libraries, and is a paradigm that people need time to get used to.
It's like saying people shouldn't use a mitre-box and should instead use a full-fledged circular mitre-saw. Yes, the second tool is much more versatile, powerful, and useful. But it requires much more setup, skill, and investment to actually use.
It looks like this:
pattern = (
    r'[A-Z]*H'   # prefix
    r'\d+'       # digits
    r"[a-z']*"   # suffix
)
identifier <- [A-Za-z][A-Za-z0-9_]*
charsetchar <- "\\" . / [^\]]
charset <- "[" "^"? (charsetchar ("-" charsetchar)?)+ "]"
The fact one can get similar ergonomics this way in straight Python is wonderful! I'm definitely going to leverage this.
I've done similar in other languages, but it's never felt quite right. re.VERBOSE is also handy to know.
Parsing expression grammars are easily one of my favourite tools. Honestly, I find them superior to regexes.
I decided to go looking for when Perl got verbose regex. The oldest thing I can find on perldoc.perl.org or cpan.org is 5.004 (1997), where they were an existing feature.
EDIT: Found 4.036 sources (1993). A quick scan of the man page (troff source!) does not find verbose regular expressions. So it looks like they were introduced very early in the Perl 5 series.
The first definition is as you say: an expression that has a constant value.
The second definition is: an expression that is the primary syntactic form to construct a type. For example: “array literals” construct arrays, but may contain arbitrary expressions within.
The first definition is more common in low-level languages where there is a place in the compiled executable to put constant data. These languages might call the second form an initializer rather than a literal. But in a dynamic language such as Python the distinction is less important.
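The two senses can be shown side by side in a couple of lines of Python (my own illustration):

```python
# Sense 1: an expression with a constant value.
n = 42                 # an integer literal; its value is fixed

# Sense 2: the primary syntactic form for constructing a type,
# which may contain arbitrary (non-constant) expressions.
def f():
    return 7

xs = [1, n + 1, f()]   # a "list literal" in sense 2, but not a constant
assert xs == [1, 43, 7]
```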
I suspect a more proper description would be: #1 is the correct definition, formally used for 80 years now. #2 is incorrect, and is being abused by people in JS and Python who should know better.
I would have called them "interpolated strings" or even e-string but the f-string moniker had already caught on and there was no stopping it.
By straight python I mean things like 'for', 'split', 'startswith', 'find', and regular character indexing.
So for me this post is a solution to a problem that I just avoid.
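The "straight Python" approach the commenter describes might look something like this for the prefix/digits/suffix shape from the article — plain indexing and character tests, no regex. (A sketch under my own assumptions about the input shape; it treats the whole leading uppercase run, including the final H, as the prefix.)

```python
def parse_code(s: str):
    # Split e.g. "AH123abc" into (prefix, digits, suffix) using only
    # character predicates and slicing.
    i = 0
    while i < len(s) and s[i].isupper():
        i += 1
    j = i
    while j < len(s) and s[j].isdigit():
        j += 1
    return s[:i], s[i:j], s[j:]

assert parse_code("AH123abc") == ("AH", "123", "abc")
```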
No. f-strings are an awful idea. And combining them with r-strings is yet another awful idea. Please, never do that.
I'm also not a big fan of extended / verbose regular expressions because it creates ambiguity in interpretation (a slightly different language to define regular expressions). It's a bad solution to the problem of building longer expressions, which should've been addressed in a different way: by making the language of regular expressions more modular, not through allowing more hard-to-interpret language details.
― Tim Peters
For example, the title is "Star Wars" and the subtitle is "The Empire Strikes Back".
So does string formatting. You don't need f-strings for this pattern to work.
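The point is easy to demonstrate: `str.format` predates f-strings and composes the same result. (The title/subtitle values are taken from the comment above.)

```python
title = "Star Wars"
subtitle = "The Empire Strikes Back"

# str.format and an f-string build the identical string.
assert "{}: {}".format(title, subtitle) == f"{title}: {subtitle}"
```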
Even more reason to love Python. Right now, slots and dataclasses are my new obsession. There was a great article posted here that went into detail about 3.8 and up and featured all these great Python hacks.
This chart alone should be telling. Ruby has a steeper learning curve, a smaller set of libraries, and a stable but slow rate of innovation. The returns aren't that great for Python developers, who can be trained on most tools in its ecosystem.
I think the bigger problem is not so much that regexes are generally defined by strings specifying the desired regex, but that at some point between the development of printf and the development of regex libraries people forgot that they could use whatever escape character they wanted when implementing a new conceptual data type. In C, the compiler deals with strings, and the printf function deals with strings, and they try not to conflict with each other by assigning the escape character \ to the compiler while printf uses % instead.
But in Java, and Python, and presumably many, many other languages, some idiot decided that if strings used \ for their escape character, regex functions should also use \ for their escape character. Since they accept strings, suddenly the regex escape character is actually "\\". How do you match a single literal backslash? "\\\\", obviously. What's wrong with that?
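Python at least offers raw strings as an escape hatch, which halves the backslash count the commenter is complaining about (a small demonstration of my own):

```python
import re

# With an ordinary string, matching one literal backslash takes four:
# two consumed by the string escape, two by the regex escape.
assert re.search("\\\\", "a\\b")

# A raw string leaves the backslashes for the regex engine alone.
assert re.search(r"\\", "a\\b")
```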
I mean, Python's regex module was added in 1997 and hasn't fundamentally changed since. I don't think that concept was super common in '97.
(E did have this concept back in '97, iirc, though yeah I wasn't expecting Guido to have run into it then, that wasn't when I meant.)
An f-string fills holes in a format string to build a string.
A template literal parses a template with holes to produce any datatype you like, filling it with arguments of any appropriate type.
Lisp's quasiquotation is similar in spirit though different in appearance.
This is my last try to explain here. I guess this thread shows that tagged template strings are much less well known on HN than I thought. (Yes it also shows me I was unusually bad at communicating.)
> the complaint that you wish the formatting API made regexes a first class citizen is odd
That was not what I was trying to say. The template literal mechanism knows nothing about regexes. Regexes are just one particular type and one particular syntax.
But any regex engine that can work with a parse tree shows the same principle, e.g. https://edicl.github.io/cl-ppcre/#create-scanner2
Since you explicitly talked about filling the fields in an f-string with already-parsed regex objects instead of strings, it's hard to see what else you could mean. But even if I s/regex engine/DSL parsing engine in general/, I would like to see an actual example of a language or library where I can have a string like a Python f-string whose fields can be filled with some kind of parsed "engine" object instead of another string.
> Composing DSL programs by string concatenation is such a famous source of security bugs you see it in top-10 lists.
I don't see how composing DSL programs by filling in string fields with parsed "engine" objects is much better. I personally don't like regexes in general because I find them too hard to reason about unless they're extremely simple (and regexes that simple usually aren't necessary). I would rather try to write library functions (which might include functions that build other functions) in the same language as the rest of my program.
You could make one called, say, rx for regex. Then
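Python has no tagged template literals, but the idea can be approximated with an ordinary helper. (Everything here is hypothetical: the `rx` name comes from the comment above, and the escape-then-compile behavior is my assumption about what such a tag would do.)

```python
import re

def rx(pattern: str, **fields: str) -> re.Pattern:
    # Hypothetical "tag" analogue: escape each interpolated field so it
    # matches literally, then compile the assembled pattern.  This avoids
    # the injection problems of raw string concatenation.
    escaped = {name: re.escape(value) for name, value in fields.items()}
    return re.compile(pattern.format(**escaped))

# The parentheses in the field are escaped, not treated as a group.
title = rx(r"{name}: (\d+)", name="Star Wars (1977)")
m = title.match("Star Wars (1977): 42")
assert m and m.group(1) == "42"
```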