

Advanced Regular Expression Tips and Techniques - cfontes
http://pypix.com/tools-and-tips/advanced-regular-expression-tips-techniques/

======
tmslnz
My all-time favourite reference for regex syntax is
[http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt](http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt)
It's compact and clear enough for most cases.

------
gmac
If you want readable spacing and comments in JS RegExps, my customary plug:
[http://blog.mackerron.com/2010/08/08/extended-multi-line-js-...](http://blog.mackerron.com/2010/08/08/extended-multi-line-js-regexps/)

~~~
esalman
Was going to ask that question. Thanks.

------
draegtun
These days I try to use _Named Capture Buffers_ where feasible to make my
regex code much easier to understand & update.

For example, here is one way I would do the first example shown (using Perl
this time):

    
    
      use 5.016;
      use warnings;
      use re '/x';  # apply /x to all regexes in scope: unescaped whitespace in patterns is ignored
     
      my $delimiter      = qr/ (?<delimiter>    [\-\.\s])      /;
      my $prefix         = qr/ (?<prefix>       1)             /;
      my $area_code      = qr/ (?<area_code>    \d{3})         /;
      my $first3digits   = qr/ (?<first3digits> \d{3})         /;
      my $last4digits    = qr/ (?<last4digits>  \d{4})         /;
    
      my $prefix_rule    = qr/ ($prefix $delimiter)            /;
      my $area_code_rule = qr/ ($area_code | \( $area_code \)) /;
    
      my $pattern = qr/
          ^
          ($area_code_rule | ($prefix_rule $area_code_rule))
          $delimiter
          $first3digits
          $delimiter
          $last4digits
          \Z
      /;
     
      my @numbers = (
          "123 555 6789",
          "1-(123)-555-6788",
          "(123-555-6787",
          "(123).555.6786",
          "123 55 6785",
      );
     
      for my $number (@numbers) {
          if ($number =~ $pattern) {
              print "$number is valid : ";
              say join "-", 
                      $+{prefix} || 'none', 
                      $+{area_code}, 
                      $+{first3digits}, 
                      $+{last4digits};
          } 
      } 
    

Output:

    
    
      123 555 6789 is valid : none-123-555-6789
      1-(123)-555-6788 is valid : 1-123-555-6788
      (123).555.6786 is valid : none-123-555-6786

------
dspillett
I'm getting "Panda Endpoint Protection has blocked access to this page,
Reason:The page contains malware or exploits than could infect your PC." from
my office machine - is/was there something there that I should inform the site
owner about, or a false alarm that I should let Panda know about?

~~~
pypix
The site has no malware or viruses; it's clean. Your Panda Antivirus is
reporting a false positive.

------
leeoniya
I never understood why non-capturing syntax requires extra work - '(?:' vs '('
- seems very backwards.

~~~
pcmonk
I think it's simply because capturing groups are much more common than
non-capturing groups. Even though non-capturing is "simpler", it's less
common, so we default to capturing and use the more verbose syntax for
non-capturing.
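
For anyone comparing the two forms, here is a minimal Perl sketch of the difference. The date string and variable names are made up for illustration and are not from the article. Both patterns match the same text, but only plain parentheses contribute to the list of captures:

    use strict;
    use warnings;

    my $date = "2013-07-04";   # hypothetical sample input

    # Plain parentheses capture: each group becomes one element of the result list.
    my @all = $date =~ /^(\d{4})-(\d{2})-(\d{2})$/;

    # (?:...) still groups, but is skipped when captures are numbered.
    my @some = $date =~ /^(\d{4})-(?:\d{2})-(\d{2})$/;

    print "capturing:     @all\n";
    print "non-capturing: @some\n";

With the second pattern the day ends up in $2 rather than $3, which is the main practical effect of swapping '(' for '(?:'.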

