
Extreme regex foo: what you need to know to become a regular expression pro - Anon84
http://immike.net/blog/2007/06/21/extreme-regex-foo-what-you-need-to-know-to-become-a-regular-expression-pro/
======
neilc
_you should be able to construct basic regular expressions to match things
like email addresses_

Actually, writing a _correct_ (per RFC) regex to recognize email addresses is
far from simple.

~~~
smanek
You can say that again. Here's a Perl regex to validate according to RFC 822:
<http://ex-parrot.com/~pdw/Mail-RFC822-Address.html>. It's well over a page
long.

The problem is that email validation is really better suited to a pushdown
automaton than to a finite state machine, which is what true regexes are. For
better or for worse, Perl regular expressions aren't strictly finite state
machines, though, so you can kind of extend them beyond what is sane.
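
To illustrate the limit: RFC 822 allows comments in parentheses, and those comments may nest. Checking that arbitrarily deep nesting is balanced needs a counter or a stack, which a true finite-state regex doesn't have. Here is a rough Python sketch of just that one check. The function name `comments_balanced` is made up for illustration, it ignores quoted strings (where parentheses are literal), and it is nowhere near a full address validator.

```python
def comments_balanced(text: str) -> bool:
    """Check that RFC 822-style parenthesised comments are properly nested.

    Simplifying assumption: quoted strings are not handled, so a literal
    parenthesis inside double quotes would be miscounted.
    """
    depth = 0  # current comment nesting level (the "stack" a regex lacks)
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and depth > 0:
            i += 2  # inside a comment, a backslash quotes the next character
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing paren with no comment open
        i += 1
    return depth == 0  # every comment that was opened has been closed


print(comments_balanced("user(a (nested) comment)@example.com"))
print(comments_balanced("user(a (broken comment)@example.com"))
```

The single integer `depth` is the whole point: it can grow without bound, which is exactly what a finite state machine can't track.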

P.S. Here's a ridiculously cool regex that can be used to determine if a
number is prime: /^1?$|^(11+?)\1+$/

If you've got a few hours (a few seconds for cperciva ;-)) to waste, try and
figure it out :-D

~~~
Anon84
I'm not sure how this ( /^1?$|^(11+?)\1+$/ ) would match 3, 5, 7, etc... I'm
no regex expert, maybe I'm missing something?

~~~
smanek
Well, you're right - there are some more details. It needs a bit of wrapper
code around it.

It actually tests whether n is not prime (0, 1, or composite) when presented
with a string of n ones. E.g., '111' doesn't match so 3 is prime, and '1111'
does match so 4 is composite.
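
The wrapper code could look roughly like this in Python. It's only a sketch: the name `is_prime_unary` is made up for illustration, and it runs on Python's `re` module rather than Perl's engine, so speed and limits on large inputs will differ.

```python
import re

# The pattern from the thread: it matches a string of ones whose length
# is 0, 1, or composite. A prime length produces no match.
NOT_PRIME = re.compile(r"^1?$|^(11+?)\1+$")


def is_prime_unary(n: int) -> bool:
    """Return True if n is prime, by testing a string of n ones."""
    if n < 0:
        raise ValueError("n must be non-negative")
    unary = "1" * n  # e.g. 5 -> "11111"
    return NOT_PRIME.match(unary) is None


print([n for n in range(30) if is_prime_unary(n)])
# -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```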

~~~
rplevy
Wow, this is really amazing (to me at least because I had not seen or thought
of this before).

[ Warning: Spoiler! :) ]

It's in base 1 (tally system).

The first part, up to the "|", matches the case where a single 1 is the whole
string, covering the fact that 1 is not considered prime. It also matches the
empty string, I guess to mean 0.

The next part matches a captured string consisting of 2 or more ones, and the
captured string must be repeated at least one more time. That is, 2+
repetitions of chunks of 2+ ones.

This much is obvious, but it wasn't until I wrote it out that I realized what
it is doing:

multiples of 2

11+11 or 11+11+11 or 11+11+11+11 or ... = 4, 6, 8, 10, 12, ...

multiples of 3

111+111 or 111+111+111 or 111+111+111+111 or ... = 6, 9, 12, ...

multiples of 4

1111+1111 or 1111+1111+1111 or ... = 8, 12, 16, 20, ...

multiples of 5

11111+11111 or 11111+11111+11111 or ... = 10, 15, 20, 25, ...

multiples of n

which leaves only primes remaining.
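
One way to see the chunks directly is to look at what the capture group ends up holding. This is a small Python sketch (the helper name `smallest_factor` is an assumption for illustration). Because `+?` is lazy, a backtracking engine tries the shortest chunk first, so for a composite n the captured chunk's length should be n's smallest divisor greater than 1.

```python
import re

PATTERN = re.compile(r"^1?$|^(11+?)\1+$")


def smallest_factor(n: int):
    """Length of the captured chunk for a string of n ones, or None."""
    m = PATTERN.match("1" * n)
    if m is None or m.group(1) is None:
        # No match: n is prime. Match without a capture: n is 0 or 1,
        # handled by the first alternative.
        return None
    return len(m.group(1))


for n in (4, 9, 15, 35, 7):
    print(n, smallest_factor(n))
```

For 15, the engine first tries chunks of 2 (no repetition count fits), then chunks of 3, which works: 111 + 111 + 111 + 111 + 111.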

Side note: In Perl, executing it this way:

    $nstr = 1 x $ARGV[0];
    print ($nstr =~ m/^1?$|^(11+?)\1+$/ ? "composite\n" : "prime\n");

The largest prime I can find without a seg fault is 37397

And the largest composite I can find without a seg fault is 37399

------
mhartl
He means "regex fu": "fu" as in "kung fu", not "foo bar". In other words, the
construction is "[arbitrary thing] fu". Since hackers often use "foo" for an
arbitrary thing, perhaps this is better rendered as "foo fu". In Ruby we might
write it as follows:

    
    
      class String
        def foo_fu?
          foo = ".*"
          self =~ /#{foo} fu/
        end
      end

------
jrockway
His regex for adding commas to numbers is a little obscure. I would just write:

    
    
       scalar reverse join ',', split /\d{3}\K/, reverse $number;
    

The reason people hate regexes is that people use them as a general-purpose
programming language instead of a concise way of matching strings.
Note that the regex I use just says to "split the string every third digit"
and the rest of the logic is actual code.
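
The same reverse, split, join, reverse idea can be sketched in Python. Python's `re` has no `\K`, so this version swaps the split for `re.findall` picking off groups of up to three digits. The function name `add_commas` is just for illustration, and it assumes the input is a plain string of digits.

```python
import re


def add_commas(number: str) -> str:
    """Insert thousands separators into a string of digits."""
    reversed_digits = number[::-1]                     # "1234567" -> "7654321"
    groups = re.findall(r"\d{1,3}", reversed_digits)   # ["765", "432", "1"]
    return ",".join(groups)[::-1]                      # "765,432,1" -> "1,234,567"


print(add_commas("1234567"))  # -> 1,234,567
```

As in the Perl one-liner, the regex only says "every three digits" and ordinary code does the rest. For real integers, Python's built-in `format(1234567, ",")` gives the same result without any regex.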

------
DTrejo
If you have any interest in regex, visit

txt2re.com

It is extremely helpful.

------
rhcpd
And really, I think the chap meant "fu" anyway...

