I like the way Perl 6 handles this with its grammar feature. (A grammar is just a special kind of class, and a regex is just a special kind of method.)

The grammar could be simpler, but I wanted the resulting data structure to be easy to use.

  grammar Url {
  
    # default regex/token/rule/method to call
    # (token disables backtracking)
    token TOP {
      <protocol> <domain> <path> <query> <fragment>
    }
  
    token protocol {
      <(
  
        <[a..z]> ** 3..10
  
      )>     # don't include :// in the stringified result
  
      '://'  # must be quoted, as it isn't alphanumeric
    }
  
    token domain-segment {  <-[?#/.]>+  }
    token domain {
      <domain-segment> ** 2..* # at least 2 domain segments
        % '.'                  # separated by .
  
      <?{
        # make sure that the last segment is at least 3 chars
        # (using the Boolean result of regular Perl 6 code)
        @<domain-segment>.tail.chars >= 3
      }>
    }
  
    token path-segment {  <-[?#/\\]>+  }
    token path {
      [
        <[/\\]>
        <path-segment>*
          %% <[/\\]>     # separated by path separator (allow trailing)
      ]?
    }
  
    token query-segment {
      # store as named, rather than positional
      $<key>   = ( <-[#=&]>+ )
      '='
      $<value> = ( <-[#=&]>+ )

      # run regular Perl 6 code in the regex
      {

        # attach a Pair object as the AST
        make ~$<key> => val(~$<value>)
        # (`val` turns a numeric-looking string into an allomorph, e.g. IntStr)

      }
    }
    token query {
      [
        '?'
        <( # don't include ? in the stringified result

          <query-segment>*
            % '&'         # separated by & (no trailing allowed)

        )>
      ]?
  
      {
        # attach an immutable Map of the key/value pairs
        # as the AST
        make Map.new: (@<query-segment>».ast if @<query-segment>.elems)
      }
    }
  
    token fragment {
      [
        '#'
        <(  .*  )> # don't include '#' in the stringified result
      ]?
    }
  }

Example usage:

  > my $result = Url.parse('http://perl6.org/foo/bar/baz/?a=1&b=2#fragment');
  > say $result;
  「http://perl6.org/foo/bar/baz/?a=1&b=2#fragment」
   protocol => 「http」
   domain => 「perl6.org」
    domain-segment => 「perl6」
    domain-segment => 「org」
   path => 「/foo/bar/baz/」
    path-segment => 「foo」
    path-segment => 「bar」
    path-segment => 「baz」
   query => 「a=1&b=2」
    query-segment => 「a=1」
     key => 「a」
     value => 「1」
    query-segment => 「b=2」
     key => 「b」
     value => 「2」
   fragment => 「fragment」

  > say $result<query>.ast;
  Map.new((:a(IntStr.new(1, "1")),:b(IntStr.new(2, "2"))))

  > my %query := $result<query>.ast;
  > say %query<b> ~~ Int; # True (because of val(…))
  True
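About the `token` comment at the top of the grammar: `token` is the same as `regex` with the `:ratchet` adverb switched on, so once a quantifier has matched, the engine won't backtrack into it to try a shorter match. A standalone illustration (these two patterns are made up for this sketch, not taken from the `Url` grammar):

```raku
# With backtracking, a+ first grabs all three a's, then gives one back
# so the trailing `a` can match.
say so 'aaa' ~~ / a+ a /;            # True

# With :ratchet (token semantics) a+ keeps every a it grabbed, so the
# trailing `a` never has anything left to match, at any start position.
say so 'aaa' ~~ / :ratchet a+ a /;   # False
```

That is why the tokens above are written so each piece only consumes what it should (e.g. `domain-segment` excludes `.`), rather than relying on the engine to back off.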

A more advanced approach would be to use an actions class.
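A minimal sketch of what that can look like, on a cut-down query-string grammar rather than the full `Url` one (the names `QueryString` and `QueryActions` are invented for illustration). The actions object is passed to `.parse` via `actions`; after each token matches, the method of the same name in the actions class is called with that token's match object, and whatever it passes to `make` becomes that match's `.made` value:

```raku
# Hypothetical names: QueryString and QueryActions are made up for this
# sketch; they are not part of the Url grammar above.
grammar QueryString {
    token TOP  { <pair>* % '&' }          # pairs separated by &
    token pair {
        $<key>   = ( <-[=&]>+ )
        '='
        $<value> = ( <-[=&]>+ )
    }
}

class QueryActions {
    # Called after each <pair> matches; $/ is that pair's Match object.
    method pair($/) {
        make ~$<key> => val(~$<value>);
    }
    # Called after TOP matches; gather the Pairs built above into a Hash.
    method TOP($/) {
        make $<pair>».made.Hash;
    }
}

my %query = QueryString.parse('a=1&b=2', actions => QueryActions).made;
say %query<a> + %query<b>;   # 3
say %query<b> ~~ Int;        # True (val produced an IntStr)
```

Because the grammar itself contains no `make` calls, the same grammar could be reused with a different actions class to build a different data structure.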

Basically, Perl 6 treats regular expressions as code written in a domain-specific sub-language, with grammars acting as the structure to hang them off.
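Since a grammar is just a class and its tokens are just methods, ordinary inheritance applies. A small illustration (the `Greeting` and `LoudGreeting` grammars are made up for this sketch, not part of the URL example): the subclass overrides a single token and inherits everything else, including `TOP`.

```raku
grammar Greeting {
    token TOP      { <greeting> ', ' <name> }
    token greeting { 'Hello' }
    token name     { \w+ }
}

# Override just one token; TOP and name are inherited from Greeting.
grammar LoudGreeting is Greeting {
    token greeting { 'HELLO' }
}

say so Greeting.parse('Hello, World');       # True
say so LoudGreeting.parse('HELLO, World');   # True
say so LoudGreeting.parse('Hello, World');   # False
```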



