My favorite regex is the following, /^1?$|^(11+?)\1+$/ Which finds prime numbers...

lubutu · on Nov 13, 2012

I do dislike people calling that expression a "regex", because it isn't: regular expressions cannot contain backreferences, and must be computable in linear time, whereas primality tests are polynomial.

boyter · on Nov 13, 2012

While I agree I believe this comment by _delirium sums this up rather well,

http://news.ycombinator.com/item?id=1486502

full comment thread here http://news.ycombinator.com/item?id=1486158

lubutu · on Nov 13, 2012

I agree more with philh's response that there is no alternative term for the true meaning of "regular expression" — a regular language, as suggested by _delirium, is not the same thing.

I suppose I could accept "regex" as not being a regular expression as such, but the two are used so interchangeably that maintaining a distinction isn't very realistic. I'd personally rather a regular expression described a regular language, and "PCRE" (or so) used for the Turing-complete expressions with a similar syntax.

tshaddox · on Nov 13, 2012

I'm not a big fan of your explanation. To be more precise, true "regular expressions" are computationally equivalent to deterministic finite automata, which indeed can test an n-character string in O(n) time.

MileyCyrax · on Nov 13, 2012

NFAs and DFAs both recognise the regular languages (and only them).

laumars · on Nov 13, 2012

It's PCRE (Perl Compatible Regular Expressions) which is one of the most popular dialects of regex. But AFAIK there's isn't a hard and fast RegEx standard.

So I'd argue that code is RegEx.

I guess it's just a matter of perspective though.

mvzink · on Nov 12, 2012

I had to prove that in a formal languages class once and I still have no idea how it works.

emillon · on Nov 13, 2012

First, it does not match "prime numbers". It matches composite numbers in unary notation (n is represented by n '1' characters).

The first part (^1?$) allows "" and "1" to match (so that 1 is not detected as a prime).

The second part matches groups of two or more ones (11+?), repeated twice or more, ie products n*m, n ≥ 2, m ≥ 2.

The backreference means that \1 should match the exact same string as the first (11+?). It's different from using (11+?){2,} which would match n_1+n_2+n_3..., n_1 ≥ 2, n_2 ≥ 2, n_3 ≥ 2 (where submatch is independent).