? Otherwise it matches many other strings (e.g. 001011) in addition to 0^n 1^n; you actually want it to match zero or one times instead of zero or more. Unfortunately the explaination why (1) works is a bit faulty.
I didn't know recursive regexes before, thanks for the article.
To sum it all up: most "regular expression" implementations are actually powerful enough to describe context-free grammars, which are essentially regular expressions mixed with balanced parentheses. The fact that the author manages to never use the term "context-free grammar" leads me to believe he's unduly surprised at his discovery.
Well said. I will include this in the article. Couldn't figure out how to say it well before because they are still not equivalent to context-free languages, they are just sometimes powerful enough to describe some of the context-free languages. Therefore I didn't include it. I think only the regex implementations with recursion are context-free. Others are not, therefore I wouldn't say "most" of them are.
You're right that I was surprised because I hadn't heard of recursive regular expressions in Perl before.
Update: I just remembered that extended regular expressions can even recognize recursive languages. Here is a regex that decides if the number is a prime:
perl -lne '(1x$_) !~ /^1?$|^(11+?)\1+$/ && print "$_ is prime"'
This goes beyond context-free, therefore I couldn't just say "context-free" carelessly in the article.
Technically, a CFG is the intersection of an n-balanced parentheses language and a regular language. Therefore, any regular language can be described by a CFG (but obviously not vice-versa).
I am completely unfamiliar with Perl and its regular expressions, but from your article and this comment I get the impression that they are Turing-complete, which would mean they're capable of describing any formal language as well as doing any "doable" computation. Is that correct?
This is incorrect. What is being described is closer to PEGs. A parser for context-free grammars will correctly handle ambiguity between alternatives, whereas a PEG (and modern regex engines) will just try all of the alternatives and choose the first one that matches.
Allison Randal, the chief architect of the Parrot virtual machine (that Perl 6 runs on) has one of the most under-rated views regarding this --
There's an odd misconception in the computing world that writing compilers is hard. This view is fueled by the fact that people don't write compilers very often. People used to think writing CGI code was hard. Well, it is hard, if you do it in C without any tools. I don't know anyone who writes CGI code in C anymore. If we wrote compilers as often as we write shopping carts, or web forums, or wikis, there would be just as many tools available to make the job easy. (And just like web tools, only 10% of them would be worth using. That's evolution in action.)
Completely true. In fact nothing is hard in the computing world. Everything is doable. And if someone says it's not doable, they are not trying hard enough.
If we can write a program that translates said arbitrary program into the rules for an n-state, m-symbol turing machine, where n = (2|3|4|5) and m = 2, then yes - we can determine if the program terminates or not.
We can make some fuzzy guesses whether or not the program terminates or not in other state/symbol combinations.
If we ever did write said program that could determine whether any /arbitrary/ program is infinite, then we have solved the halting problem. And that would be quite something.
Allison's point is instead that the combination of the perception that compilers are difficult and the relative paucity of tools to make writing compilers easier is a vicious cycle which writing better tools might fix.
It's a neat trick but as the author says, it's probably too cryptic to use for "real" code.
Lua's LPeg lets you create simple recursive grammars to solve problems that are often hard with regular expressions (e.g., matching balanced parentheses). It's the one library from Lua I find myself seriously missing when programming in other languages.
I didn't know recursive regexes before, thanks for the article.