A Perl Regular Expression That Matches Prime Numbers 123 points by pkrumins on Dec 25, 2011 | hide | past | web | favorite | 27 comments

 It's cool, but it's regexp, not regular expression -- regular expressions cannot test for primeness. The proof is a simple application of pumping lemma[1]: we want to show that language { a^p: p prime } is not regular. Suppose it is. By pumping lemma we have numbers N, n >= 1, such that for every k >= 0, we have a^(N+nk) in our language. But that means that for every k, N+nk is prime, and it fails for k = N, since N+nN = N(1+n) is not a prime, since both N and 1+n are greater than 1.
 That argument is theoretically sound, and I've upvoted you for bringing up such a helpful contribution, but it's practically nonsense.We are all running computers that are finite-state machines: if you have roughly a trillion bits of storage, you can have up to 2^1T states.For most numbers that are interesting to humans, we are able to test their primality with regular languages, which are detectable by finite-state machines. The number of states might be quite large indeed.
 You're mostly right, I made a similar point few months ago[1].However, I must disagree that we are able to use regular expressions to test for primality all the number we're interested in. Your argument only proves that there exists a regular expression that tests for primality all the numbers below N, where N is arbitratily large. It does not shows any way how to construct it[2], nor does it guarantee that the DFA we will be using to test for primality will fit in available memory.[2] - It's actually trivial: example regular expression we're looking for is /a^p_1|a^p_2|...|a^p_N/, where p_1, ..., p_N are primes below. Sadly, to use it to test for primality, we need to know if the number is prime beforehand.Actually, it sounds like an interesting problem -- to come up with an interesting and nontrivial regular expression schema that for every N gives us a regular expression that tests for primeness all numbers less than N.Come to think of it, the existence of any such interesting schema seems highly unlikely -- the Parikh's theorem implies that the set of lengths of words matched by regular expression (or even context-free grammar) seems to be too constrained to allow for such scheme -- we can easily find non-prime matched by an such regular expression, and Parikhs theorem seems to imply that it is not much bigger than the automaton state count.
 The only thing practically nonsense here is viewing a computer as a finite state machine with 2^trillion states! That state machines cannot decide primality is downright practical by comparison.
 >We are all running computers that are finite-state machinesThe amount of storage of a computer is not fixed, and is easily extendable, and extending memory fits simply in the Turing Model, and does not affect the internal states of the Turing Machine. In contrast, any time a new source of memory is introduced (say, a new hard drive or aws cluster), you'd have to change your DFA, which breaks your statement 'a computer is a DFA'.I really don't see how calling computers DFA's can in anyway be considered 'practical'.
 It's tough to define or predict what is "interesting to humans." I'm interested in primality testing of arbitrarily large integers that wouldn't be feasible to test with a huge regex consisting of a bunch of known primes combined with alternation.I don't think that solution would be very feasible anyway. The four largest known primes alone would take about 20 megabytes if stored as unsigned integers.
 Yeah, if the title wasn't clear enough, this isn't a "regular expression" in the Computer Science definition (type-3 in the Chomsky hierarchy). It's a Perl regular expression, so it supports context-free grammars (type-2) I think. http://en.wikipedia.org/wiki/Chomsky_hierarchy
 Incidentally it supports quite a bit more. Even the less powerful Perl Compatible Regular Expressions (PCRE), which are used by PHP, can also easily handle context sensitive grammars (type 1): http://stackoverflow.com/questions/7434272/match-an-bn-cn-e-...
 Perl regular expressions probably accept more than just context-free grammars since the language that accepts prime numbers is not Context-Free, you can't build a pushdown automaton for such a language, there is a relatively trivial proof of this using the pumping lemma for context-free languages.
 Comments like this are the reason I read HN, which is really an oasis in a desert of confident ignorance.
 Posts like this and comments like this (the GF) are the reason I read HN, so I can discover how stupid I am in comparison. It's an important reminder.
 Comments like these are my least favorite part of HN -- the sheer level of implied smug pedantry running through this part of the thread is disgusting.
 Exactamundo. It's not like it takes a genius to know what is and what is not a regular expression as defined in theory. It's CS 101 (well, usually more like CS 3xx, the class where you learn about lexers, parsers and compilers).And you can get similarly informed comments in Reddit, Slashdot, heck, with enough eyes reading a thread even CNET...
 regexp != regular expression like xyzzyz said. This script uses backreferences which give regex's more "computing power" than regular expressions actually have, but remove the guaranteed runtime of an O(n) regular expression.Quote: One common regular expression extension that does provide additional power is called backreferences. A backreference like \1 or \2 matches the string matched by a previous parenthesized expression, and only that string: (cat|dog)\1 matches catcat and dogdog but not catdog nor dogcat. As far as the theoretical term is concerned, regular expressions with backreferences are not regular expressions. The power that backreferences add comes at great cost: in the worst case, the best known implementations require exponential search algorithms, like the one Perl uses. Perl (and the other languages) could not now remove backreference support, of course, but they could employ much faster algorithms when presented with regular expressions that don't have backreferences, like the ones considered above.If that and the Wiki article still don't help you understand why regexp != regular expression, please ask. =]
 Beware: Depending on your backtracking limits, at some point the regex will start deciding numbers are prime that aren't.
 Former incarnations of the same thing here on HN:http://news.ycombinator.com/item?id=1486158
 that's what, five years now. Imagine what you would need to pay someone who didn't know the code to come debug the current version.That's Perl built-in job security right there. Also why I don't use it except as throw-away scripting at the bash prompt:)
 >>That's Perl built-in job security right there.Huh? You mean, this is about Perl because regexps was created in Perl and the Perl regexp lib isn't ported anywhere else?Oh, wait...(If you must troll, at least be interesting. :-( )
 I'm sorry must have been drunk, and that one-liner is in reality easy to debug, or even understand.
 OK, that assumption that this is how serious people normally writes REs was at least entertaining. But (still) not really contributing to the HN quality.
 First seen in 1998: http://diswww.mit.edu/bloom-picayune.mit.edu/perl/10138
 Unless I'm missing something obvious, this post lacks a proper attribution. It is an old, very old and very well-known one-liner that made rounds and rounds in the Perl community over the years. Reading Krumins's post one might think he invented it.
 possibly added reently, but the last line of the post reads "PS. I didn't invent this regular expression, it was invented in 1998 by Abigail."
 Yep, it wasn't there when I first looked.
 Yup, I updated the article so it didn't sound like I invented it. Also added a message that it's not a real regular expression.
 I wrote a coprimality test based on the same technique https://gist.github.com/1520171
 Pretty awesome!

Search: