Hacker News new | past | comments | ask | show | jobs | submit login

My favorite regex is the following,


Which finds prime numbers. Although, I can't for the life of me think of a reason for using it.


I do dislike people calling that expression a "regex", because it isn't: regular expressions cannot contain backreferences, and must be computable in linear time, whereas primality tests are polynomial.

While I agree I believe this comment by _delirium sums this up rather well,


full comment thread here http://news.ycombinator.com/item?id=1486158

I agree more with philh's response that there is no alternative term for the true meaning of "regular expression" — a regular language, as suggested by _delirium, is not the same thing.

I suppose I could accept "regex" as not being a regular expression as such, but the two are used so interchangeably that maintaining a distinction isn't very realistic. I'd personally rather a regular expression described a regular language, and "PCRE" (or so) used for the Turing-complete expressions with a similar syntax.

I'm not a big fan of your explanation. To be more precise, true "regular expressions" are computationally equivalent to deterministic finite automata, which indeed can test an n-character string in O(n) time.

NFAs and DFAs both recognise the regular languages (and only them).

It's PCRE (Perl Compatible Regular Expressions) which is one of the most popular dialects of regex. But AFAIK there's isn't a hard and fast RegEx standard.

So I'd argue that code is RegEx.

I guess it's just a matter of perspective though.

I had to prove that in a formal languages class once and I still have no idea how it works.

First, it does not match "prime numbers". It matches composite numbers in unary notation (n is represented by n '1' characters).

The first part (^1?$) allows "" and "1" to match (so that 1 is not detected as a prime).

The second part matches groups of two or more ones (11+?), repeated twice or more, ie products n*m, n ≥ 2, m ≥ 2.

The backreference means that \1 should match the exact same string as the first (11+?). It's different from using (11+?){2,} which would match n_1+n_2+n_3..., n_1 ≥ 2, n_2 ≥ 2, n_3 ≥ 2 (where submatch is independent).

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact