Regex: a little Clojure DSL for readable, compositional regexes

draegtun · on Sept 23, 2010

Looks good.

Regexes have long been stained with a reputation for bad readability, but that needn't always be the case. For example, the Clojure example:

    (def datestamp-re
      (let [d {\0 \9}]
        (regex [d d d d :as :year] \- [d d :as :month] \- [d d :as :day])))

Could be written in Perl like so:

    sub datestamp_re {
        qr/ (?<year> \d \d \d \d) - (?<month> \d \d) - (?<day> \d \d ) /x;
    }

Perl has had named captures since 2007 (Perl 5.10). The captures are stored in the %+ variable (i.e. $+{year}, $+{month} & $+{day}).
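A minimal sketch of reading the captures back out of %+, assuming Perl 5.10 or later. The helper name `ymd` and the sample date are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (year, month, day) on a match,
# or an empty list when the text does not contain a datestamp.
sub ymd {
    my ($text) = @_;
    return unless $text =~ / (?<year> \d{4}) - (?<month> \d{2}) - (?<day> \d{2}) /x;

    # Each named group is a key of %+ after a successful match.
    return ($+{year}, $+{month}, $+{day});
}

my ($y, $m, $d) = ymd('2007-10-23');
print "$y/$m/$d\n";   # prints 2007/10/23
```

Note that %+ only reflects the last successful match in the current scope, so copy the values out before running another regex.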

The /x flag (on the end of the regex object qr) allows whitespace & newlines in the regex, making it much easier to read. Anyone writing a regex longer than a few characters should always use it :)
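As a rough illustration of what /x buys you, here is the same datestamp pattern spread over several lines with a comment per piece. The variable name and test string are just examples:

```perl
use strict;
use warnings;

# Under /x, unescaped whitespace and "# ..." comments inside the
# pattern are ignored, so each piece can sit on its own line.
my $datestamp_re = qr{
    (?<year>  \d{4} )   # four digit year
    -
    (?<month> \d{2} )   # two digit month
    -
    (?<day>   \d{2} )   # two digit day
}x;

print "matched\n" if '2010-09-23' =~ $datestamp_re;
```

The trade-off is that a literal space or # in the pattern now has to be escaped (`\ ` or `\#`) or put in a character class.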

And composing your regex in bite-sized pieces is definitely a good idea:

    sub datestamp_re  {
        my $year = qr/(?<year>\d{4})/;  # defined with named capture
        my $dom  = qr/\d{2}/;           # day or month (2 digit)
    
        qr/ $year - (?<month>$dom) - (?<day>$dom) /x;
    }

    sub re_match {
        my ($re, $text) = @_;
        if ($text =~ $re) { return { %+ } }
        return;
    }

    re.pl> re_match datestamp_re, '2007-10-23';
      {
        day => 23,
        month => 10,
        year => 2007
      }

    re.pl> re_match datestamp_re, '20X7-10-23';
    
    re.pl>

henning · on Sept 23, 2010

The main issue I see with this Clojure library at the moment is the lack of tests. Other than that, the general idea is worthwhile and not hard to explore in most functional languages.

draegtun · on Sept 24, 2010

Something like Perl 6 Rules / Grammars may be a good target to explore: http://en.wikipedia.org/wiki/Perl_6_rules