A regular expression to check for prime numbers 89 points by kqr2 on July 4, 2010 | hide | past | web | favorite | 13 comments

 Note: this is not a regular expression in the technical sense. It's common to talk about "Perl-compatible regular expressions" (PCREs) or regexes to avoid confusion.No regular expression can check for prime numbers because the language { 1^n | n is prime } is not regular. You can see this by applying the pumping lemma: http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_langu...
 My theory-of-computation prof a few years ago was of the opinion that CS-theory people should stop making that argument, and go with the standard usage of "regular expression" being a particular syntax for writing matching statements, of which PCRE syntax is the most popular, and use "regular language" if they want the CS-theory meaning.
 I don't disagree. But the programmers have "regexes" which refers unambiguously to their type of "regular expression"; there's no good equivalent for a CS theorist. (A regular language is not a regular expression, just like the set of prime numbers is not a test for primality.)
 The problem with that is that it ruins the excitement from hearing that "a regular expression can solve problem X." I believe most regex implementations are actually Turing complete, so it's no surprise that a problem can be solved with them. But that's just another CS-theory person argument.
 Yeah, I see regexes as more of a question of expression, in the pragmatic human programmer sense, rather than formal expressibility. Some things are natural and easy to write using regexes, and some are much better done some other way, but I'm not sure the Chomsky hierarchy shows where that boundary is.
 I agree, however I feel that in almost every case, regexes are only natural for matching regular languages.
 Yeah, it uses minimal matching and backreferences to test divide the number by everything from 2 to whenever the regex parser gives up (probably around n/2).It was really confusing to me how this would work correctly on 12, 13, etc. until I read through there and realized that the program rewrote the number in unary (i.e. 2 -> 11, 3 -> 111, 4 -> 1111, etc.) before applying that regular expression.From there, it does a quick test to correctly handle the zero and 1 case (empty string or a single 1), then it goes into the backtracking pattern, which tries to divide it by each number less than it until the parser succeeds or gives up. Anything matching is composite, anything not matching is prime.Incidentally, whatever site he originally linked it from appears to be dead now.
 The pattern detects if a number N is not prime by searching for A and B such that N = A*B. The (11+?) searches for A bigger than 1, and the repetition of this group (the \1+ part) repeats A as much as possible, eventually B times. If an exact match isn't possible, it backtracks and advances A. If no such A and B exist, A will keep increasing until it becomes bigger than N, and then the match fails.An interesting pattern that wasn't immediately obvious, but a very inefficient way to detect primes :)
 This regex has been mentioned on HN a few times before.If your browser has Java applet support, here you can watch the regex succeed (demonstrating a number is composite) or fail (prime):
 If I recall correctly, Hofstatder uses the same principal for developing a formal language for classifying primes in Goedel Escher Bach.
 This has been around for a while but it's a really nice way to show what's possible if you have backtracking capabilities.
 The comments make for fun reading :)My favourite:for calculations with big numbers, try F#
 Fascinating!!Yet scary how many folks confuse odd numbers with primes on the original blog's comments.

Search: