
A regular expression to check for prime numbers - Moyamo
http://www.noulakaz.net/weblog/2007/03/18/a-regular-expression-to-check-for-prime-numbers/
======
btilly
This regular expression is due to Abigail of Perl fame. Who is still very
active, see
[http://search.cpan.org/~abigail/](http://search.cpan.org/~abigail/) for more.

However while cute, it is very slow. It tries every possible factorization as
a pattern match. When it succeeds, on a string of length n that means that n
times it tries to match a string of length n against a specific pattern. This
is O(n^2). Try it on primes like 35509, 195341, 526049 and 1030793 and you can
observe the slowdown. (Some regular expression engines may employ tricks that
speed this up over the naive analysis, but none will be super speedy about
it.)

Usual arithmetic is much more efficient. :-)

~~~
j2kun
It's even worse, because the speed of factoring algorithms is measured against
the number of digits of the binary expansion, and this is a _unary_ expansion.
So it's really O(2^n^2)

~~~
nhaehnle
You really have to put parentheses when building "towers" of exponentials. The
bound is (2^n)^2, which simplifies to 2^(2n). This is of course still
horrible, but much less so than the 2^(n^2) for which what you wrote may be
mistaken.

~~~
j2kun
Yeah, I realized that after the edit link expired. Let's just disambiguate the
expression in whatever way makes me look best :)

------
anyfoo
For some this will be obvious, but for others it may still be interesting:

If this was a "true" regular expression, in the sense that it describes a
regular language
([http://en.wikipedia.org/wiki/Regular_language](http://en.wikipedia.org/wiki/Regular_language)),
then checking whether the number is prime or not, by checking if the
expression matches, would run in linear time. This would be worth a few nobel
prizes at least.

Unfortunately, it is an "extended" regular expression, i.e. not regular at all
in the strictest sense, and (as the article says) requires backtracking.

~~~
btilly
Not true!

First of all there is no Nobel in math/CS.

But more importantly, when we talk about the running time of algorithms, we
talk about the time in relation to how many bits it takes to describe the
number. So linear in the size of the number is exponential in the number of
bits, and that is not very interesting.

And a better primality test would not rock the world of cryptography. At the
moment
[http://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_...](http://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test)
is demonstrably good enough as a prime test that nobody really is hurting for
better ones. The thing that you want is a better _factoring_ algorithm. And
the General Number Sieve is already subexponential, and so better than any
regular expression could be.

~~~
anyfoo
_First of all there is no Nobel in math /CS._

I know. An algorithm this efficient would still be worth that much. :)

 _But more importantly, when we talk about the running time of algorithms, we
talk about the time in relation to how many bits it takes to describe the
number. So linear in the size of the number is exponential in the number of
bits, and that is not very interesting._

No, the regular expression described here works on numbers in unary
representation. Linear to the size of the unary representation is linear to
the number itself.

~~~
sold
Unary representation is useless when speaking about complexity in number
theory.

If you take a number N given in unary, convert it to binary and do trial
division up to the square root, it will take O(N log N) time for conversion to
binary and O(sqrt(N) * log(N)^2) for trial division (depending on your
computational model, it could be O(N) and O(sqrt(N)) - I am counting bit
complexity). In total, it's O(N log N). The runtime is dominated by reading
the input! The complexity of trial division and the brilliant AKS algorithm is
the same from this viewpoint.

Even if you had an algorithm that did not have to convert to binary and could
tell in time linear to unary represenation whether a number is prime, it would
be interesting trivia but nothing worthy a Nobel prize. In practice numbers
are given in binary (or some other base>1 number system). To use your
algorithm, you would have to convert to unary, which already means trial
division would be faster.

~~~
anyfoo
Ah, I see, thanks for the explanation.

------
xyzzyz
It has been posted before [2], so I'll repost my comment from back then [3]:

It's cool, but it's regexp, not regular expression -- regular expressions
cannot test for primeness. The proof is a simple application of pumping
lemma[1]: we want to show that language { a^p: p prime } is not regular.
Suppose it is. By pumping lemma we have numbers N, n >= 1, such that for every
k >= 0, we have a^(N+nk) in our language. But that means that for every k,
N+nk is prime, and it fails for k = N, since N+nN = N(1+n) is not a prime,
since both N and 1+n are greater than 1.

[1] -
[http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_langu...](http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages)
[2] -
[https://news.ycombinator.com/item?id=3391547](https://news.ycombinator.com/item?id=3391547)
[3] -
[https://news.ycombinator.com/item?id=3391572](https://news.ycombinator.com/item?id=3391572)

------
elzr
This is indeed beautiful. But the author limits itself to a low-level
unpacking of the regex and running a few tests cases, all the while remaining
mystified as to why this works.

A high-level description helps makes obvious what's happening:

    
    
      To check the primality of a number N with string matching,     
        repeat a string (say, "1") N times
        and declare N prime:
          UNLESS the repetitions match the string 0 or 1 times
          OR the repetitions can be EVENLY matched into groups of MORE than one string.

~~~
city41
I wouldn't say he remained mystified. His exploration of the 9 case basically
explained how it worked, just the author chose to leave it a bit implied for
the reader to figure out. I would agree for a blog post it's usually better to
hammer the point home though.

------
andrewflnr
Uses back-references, so it doesn't describe a regular language in the strict
sense. Still, it's pretty cool to use them to trick a string-processing engine
to do integer factoring.

~~~
ubernostrum
"Regular expression" lost the sense of "strictly only regular languages"
around the time that Perl crushed the world.

It's probably time to give up on that terminology nit-pick.

~~~
kazinator
> _It 's probably time to give up on that terminology nit-pick._

Many Perl coders are self-taught programmers with no CS background. If a man
page calls something a regular expression, that is good enough for them.

Why drop to that level if you know better?

I don't want to appear "gratuitously ignorant" in front of someone who might
be a decent computer scientist.

There are already ways in which I will appear ignorant due to actual
ignorance; why go out of my way to dumb down the way I speak about something
that I _do_ understand?

If we are going to wreck math terminology why don't we just redefine "prime
number". Let's see: 3 is prime, 5 is prime, 7 is prime, oh ... so 9 is prime
11, is prime, 13 is prime, 15 is prime ... And, what do you know: now we can
use a regular regular regex for primality testing over the binary
representation: why, it's just (0|1)*1. As a bonus, we can restore 1 to its
proper status as a prime number and kick out the oddball 2.

Say, if Perl called an odd test function "isprime", would you start calling
odd numbers prime?

~~~
teddyh
I believe the next line in this traditional dialogue is “Language is a fluid
thing and words are defined by their use!”

Your turn.

~~~
FeeTinesAMady
I can't tell if you're defending this line of thought or making fun of it, but
the term "regular language" has a specific mathematical meaning and perverting
it to include how Perl uses the term is wrong, plain and simple.

~~~
teddyh
I was, in fact, slightly mocking both the argument and the related argument
that people who use sloppy language like to bring out in favor of using words
to mean whatever they like. I certainly agree that words with precise meanings
are important to use correctly.

------
tom-lord
I'm currently working on a ruby gem to generate _examples_ of given regular
expression. For example:

    
    
        /this|is|awesome/.examples # => ["this", "is", "awesome"]
    

So, I tried it out on this "prime number validator" regex:

    
    
        /1?|(11+?)\1+/.examples
          #=> ["", "1", "1111", "111111", "11111111"]
        /1?|(11+?)\1+/.examples(max_repeater_variance: 50, max_group_results: 100).uniq.sort.map(&:length).first(10)
          #=> [0, 1, 4, 6, 8, 9, 10, 12, 14, 15]
    
        require 'prime'
        Array(1..100) - /1?|(11+?)\1+/.examples(max_repeater_variance: 50, max_group_results: 3000).uniq.sort.map(&:length) == Prime.first(25)
          #=> true
    

Horribly inefficient, but pretty cool! Here's my gem if case anyone's
interested: [https://github.com/tom-lord/regexp-
examples](https://github.com/tom-lord/regexp-examples)

~~~
arielpts
Thank you very much for your great gem :)

Very usefull!

~~~
tom-lord
Thanks :)

It's still "under development", while I add a few missing features, but as you
can see from the README these are only for fairly obscure parts of the regex
language! - E.g. POSIX groups, named properties, character set intersection,
...

------
charlesacknin
Nice (abuse of regexp) !

AFAICT the way the string is going to be parsed is equivalent to this:

    
    
      def isprime(n):
        if n == 0 or n == 1:
          return False
        i = 2
        while i <= n:
          if (n - i) % i == 0:
            return False
          i += 1
        return True
    

A parser implementation may optimize and do "while (n - i) >= 1" instead,
which cuts in half the number of iterations.

~~~
jdmichal
A better implementation will only check to sqrt(n), as there can be no prime
factor greater than that.

~~~
punee
Yes, there can.

~~~
tom-lord
The point is that if there is a prime factor greater than sqrt(n), then your
will have already found its co-factor by then.

~~~
jdmichal
Correct. I misspoke. I should have said,

A better implementation will only check to sqrt(n), as there can be no prime
factor greater than that without there also being a prime factor lower than
that.

------
fxn
I explored this technique applied to a few other problems just for fun,
integer square root, coprimes, continued fractions, modulo, etc.

[https://github.com/fxn/math-with-regexps/blob/master/one-
lin...](https://github.com/fxn/math-with-regexps/blob/master/one-liners.sh)

------
nsajko
It uses a string of '1' as a unary numeral representation of a number and then
checks it for divisibility with 2,3,4... using brute force.

------
logicallee
less sexy when you write

    
    
      /^(11+?)\1+$/ is a prime test on whether a unary string  > 2 is composite
    

then people have a real chance of figuring it out. (go ahead and try, if you
haven't read the article.)

As it stands that 1 x shift is super confusing. who would have thought 1 would
become a literal! we wouldn't even write it that way in english.

------
bantic
The thing that made this click for me was realizing the "1" in the regex was
not special. The regex is just looping through integers (starting with 2) and
checking if the target number is divisible by any of them. It can use any
digit if the target string uses the same digit. Ruby example using 5 instead
of 1:

def is_prime(n); str = "5"*n; str !~ /^5?$|^(55+?)\1+$/; end

The regex essentially says, for integer n, "does 2 ones fit evenly into n
ones", then "does 3 ones fit into n ones", etc., which is the same as "does 2
fit into n", "does 3 fit into n", etc.

------
avinash
Thanks for sharing. This is a post I wrote way back in 2007 and, even though,
as many have pointed out, the algorithm is very inefficient, it is still a
good illustration of how powerful regular expressions are.

When I am explaining the basics of regular expressions to people, I always
refer them to this post to make them realize that regular expressions are not
only glorified search terms. In general, they are amazed :-)

------
ekimekim
An aspect of the syntax is confusing me here. I would have assumed that the
expression 'x+?' would be parsed as '((x)+)?', ie. 'x*'. However, it seems the
parser special-cases this as (x)(+?) instead, and interprets that as a non-
greedy operator. Isn't this horribly inconsistent?

------
eridal
nice trick!

here a JS implementation

    
    
        function isPrime(n) {
          var re = /^1?$|^(11+?)\1+$/;
          var s = [];
          while (n -- > 0)
            s.push(1);
          return !re.test(s.join(''));
        }

~~~
maemilius
If I may suggest an improvement:

    
    
      function isPrime(n) {
        var re = /^1?$|^(11+?)\1+$/,
              s = (new Array(n+1)).join('1');
      
        return !re.test(s);
      }

~~~
myfonj
Why not just function isPrime(n) { return
!/^1?$|^(11+?)\1+$/.test(Array(n+1).join('1')) } then? :]

------
Amorymeltzer
We all learned this way back with the Regex Game:
[http://regex.alf.nu/6](http://regex.alf.nu/6)

------
TheLoneWolfling
Is there a way to write a (non-regular, of course) regular expression that
checks for prime numbers in a non-unary base?

~~~
Someone
If there were, there would be finitely many repunit primes (via the pumping
lemma and the notion that there are arbitrary large non-prime repunits).

Edit: that's not quite correct. The above would imply some simple structure in
repunit primes. If you add the known asymptotical behaviour of the 'number of
primes < N)' function, I think it would imply it.

Also, in binary, it would either imply there are finitely many Mersenne
primes, or that there is a N such that all Mersenne numbers larger than N are
prime (edit: Mersenne numbers are repunits in binary)

------
geertj
This one has been going around for a while. I'd be curious to see a base-2 or
base-10 variant of it.

------
okey
Evidently beauty is in the eye of the beholder.

