Real regexes can be matched in time linear in the input size; you'll need to use an applicative parser combinator like regex-applicative (https://hackage.haskell.org/package/regex-applicative) to get that kind of asymptotic complexity. A monadic one like Parsec can't be analyzed statically and turned into a finite automaton.
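To make the applicative-vs-monadic distinction concrete, here is a minimal OCaml sketch (two hypothetical interfaces, not any particular library's API):

    (* The applicative interface builds an expression tree that is fully
       known before any input is consumed, so it can be compiled to a
       finite automaton up front. With monadic bind, the next parser is
       computed from a value parsed at run time, so no such static
       compilation is possible in general. *)
    module type APPLICATIVE_RE = sig
      type 'a t
      val map  : ('a -> 'b) -> 'a t -> 'b t
      val both : 'a t -> 'b t -> ('a * 'b) t    (* structure known statically *)
    end

    module type MONADIC_PARSER = sig
      type 'a t
      val return : 'a -> 'a t
      val bind   : 'a t -> ('a -> 'b t) -> 'b t  (* next parser chosen at run time *)
    end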
However, I don't think this solution will help much with regexes. To me it's overly complex and less readable than regex strings.
I didn't invent the combinators for regexes; they were already in ocaml-re (which is used as the backend for tyre). The combinator approach has many advantages:
- It's "self documenting". Your combinators are just functions, so you just expose them and give them type signatures, and the usual documentation/autocompletion/whatevertooling works.
- It composes better. You don't have to mash string together to compose your regex, you can name intermediary regexs with normal variables, etc.
- Related to the point above: No string quoting hell.
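Roughly, composition with ocaml-re looks like this (a sketch from memory of its interface; I believe Re.seq, Re.rep1, Re.digit, Re.group, Re.compile and Re.Group.get exist under these names, but check the docs):

    (* Sub-expressions are ordinary OCaml values: they can be named,
       given signatures and reused, instead of being spliced into a string. *)
    let digits = Re.rep1 Re.digit                  (* one or more digits *)
    let dim    = Re.seq [ Re.group digits; Re.char 'x'; Re.group digits ]
    let dim_re = Re.compile dim

    let () =
      let g = Re.exec dim_re "640x480" in
      Printf.printf "%s by %s\n" (Re.Group.get g 1) (Re.Group.get g 2)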
Now, tyre doesn't really improve any of that. If anything, the combinators are simpler in ocaml-re. What tyre gives you is the automatic extraction and conversion of the matched groups into the desired datatype (and the reverse direction, but that's almost a free bonus). No need to select groups manually and convert them to integers, no need to reconstruct tuples/lists/records; tyre will do that for you (and handle conversion failures cleanly).
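For illustration, a hedged sketch of what that looks like, using the combinators I remember from tyre's README (int, char, str, <&>, *>, compile, exec, eval; the exact signatures may differ):

    (* The typed regex below should have type (int * int) Tyre.t, so matching
       returns a pair of ints directly: no manual group selection, no
       int_of_string, and conversion failures come back as Error. *)
    let dim = Tyre.( str "dim:" *> int <&> char 'x' *> int )
    let dim_re = Tyre.compile dim

    let _ = Tyre.exec dim_re "dim:640x480"   (* Ok (640, 480) on success *)
    let _ = Tyre.eval dim (640, 480)         (* reverse direction: "dim:640x480" *)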
Also, how flexible is the unparsing? Does it have to be the same regular expression? [Sorry, I haven't read it in detail and I'm only slightly familiar with OCaml - I'm interested from an academic perspective.]
If you are interested in unparsing, I would advise you to read the (famous) paper "Functional Unparsing" by Danvy. The Printf and Format modules of the OCaml standard library are basically this paper on steroids. I recently wrote a blog post on part of the implementation technique.
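For a flavour of the technique (just a minimal continuation-passing sketch in the spirit of the paper, not Printf's actual implementation):

    (* Each directive transforms a continuation; composing directives with (%)
       lets OCaml infer the type of sprintf from the format itself. *)
    let lit s k acc = k (acc ^ s)                         (* literal text *)
    let int   k acc = fun i -> k (acc ^ string_of_int i)  (* an int hole *)
    let ( % ) f g k = f (g k)                             (* directive composition *)
    let sprintf fmt = fmt (fun s -> s) ""

    let () =
      (* [lit "x = " % int % lit "!"] gives this sprintf the type int -> string *)
      print_endline (sprintf (lit "x = " % int % lit "!") 42)   (* prints "x = 42!" *)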
Hope that helps. :)
Uh... did you see my bug report on your webpage (first sentence above)? That link still doesn't work properly.
It's a standalone library and may well be used outside of Cucumber.
The example on the home page is hard to parse and understand.
Technically they're not regular expressions, since they can match much more than regular grammars, but that's true for almost any programming language's "regular expressions".
IMO the word "improvements" should be in quotes. Pathological regular expressions are an issue because of things like backreferences and look-behind matching. If Perl hadn't enshrined the idea of "a regular expression for every problem", people would probably be less skeptical of regular expressions in code.
(The extensions to regexps in Perl 5 (and presumably now also in PCRE?) have made it possible to write them more like grammars than anything else. But you would probably not approve. :-) )
Make syntax consistent across RE engines, for one.
Improve the syntax.
Introduce clear modes with different tradeoffs, so you don't get e.g. regex-based DoS attacks.
Make them easier to compose, not just through copy-pasting...
The real sweet spot I've found is the parse dialect in Rebol / Red - http://blog.hostilefork.com/why-rebol-red-parse-cool/ | http://www.rebol.com/docs/core23/rebolcore-15.html
In general you have the same problem when parsing CSV, command line flags, URL query parameters, JSON, etc.
For a bit of history: I started this for URLs. I couldn't (and still can't) figure out a proper interface for that, but someone mentioned they would like something like this for regular expressions, so here it is.
We already have something similar for command-line flags in OCaml (and several for JSON).
(For those running into the type-safety problem with JSON in Python, may I recommend my own library, obiwan? https://pypi.python.org/pypi/obiwan/ :) )
"Parser combinators", for me, denotes actual parsers, for non-contextual or contextual grammars.
It's pretty fast too. I don't know whether you can compare the performance of Ragel to regex libraries like RE2; they look like they solve different problems.