Hacker News
Tyre – Typed regular expressions (drup.github.io)
109 points by dwc on Aug 15, 2016 | 32 comments

See also an interesting typed regex implementation by Gabriel Gonzalez: https://github.com/Gabriel439/slides/blob/master/regex/regex...

Is there any advantage to using regex over a parser combinator?

Depends on the parser combinator library.

Real regexes can be matched in time linear in the input size; you'll need to use an applicative parser combinator like regex-applicative (https://hackage.haskell.org/package/regex-applicative) to get that kind of asymptotic complexity. A monadic one like Parsec is impossible to analyze statically to turn into a finite automaton.
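For a concrete picture, here is a minimal OCaml sketch using ocaml-re (the library mentioned downthread; assumes the `re` opam package is installed): the pattern is built from combinators, compiled once to an automaton, and matching then runs without backtracking.

```ocaml
(* A pattern built from combinators and compiled to an automaton;
   ocaml-re matches without backtracking, so no exponential blow-up. *)
let date_re =
  Re.(compile (seq [ group (rep1 digit); char '-'; group (rep1 digit) ]))

let () =
  let g = Re.exec date_re "08-15" in
  assert (Re.Group.get g 1 = "08");
  assert (Re.Group.get g 2 = "15")
```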

Interesting. Attoparsec is usually my preferred parsing library, but regex-applicative looks very useful for smaller, less complicated parsers.

It's an interesting experiment and an inspiring one.

However, I don't think this solution will help much with regexes. To me it's overly complex and less readable than plain regex strings.

Hi, author here!

I didn't invent the combinators for regexes; they were already in ocaml-re (which is used as the backend for tyre). The combinator approach has many advantages:

- You don't need to remember which regex syntax the library is using. Is it the Emacs one? The Perl one? The JavaScript one? The Bash one?! (shudder) ...

- It's self-documenting. Your combinators are just functions, so you expose them and give them type signatures, and the usual documentation/autocompletion/whatever tooling works.

- It composes better. You don't have to mash strings together to compose your regexes; you can name intermediate regexes with normal variables, etc.

- Related to the point above: No string quoting hell.
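For instance, a minimal sketch with ocaml-re (the `octet` pattern here is deliberately loose and the names are just illustrative): sub-patterns are ordinary OCaml values, so composition is plain binding rather than string splicing.

```ocaml
(* Name intermediate patterns like any other value and reuse them;
   no escaping or quoting when gluing pieces together. *)
let digits = Re.rep1 Re.digit
let octet = digits  (* loose: any digit run, good enough for a sketch *)
let dot = Re.char '.'
let ipv4 = Re.seq [ octet; dot; octet; dot; octet; dot; octet ]

let () = assert (Re.execp (Re.compile ipv4) "192.168.0.1")
```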

Now, tyre doesn't really improve any of that. If anything, the combinators are simpler in ocaml-re[1]. What tyre gives you is the automatic extraction and casting of matching groups into the desired datatype (and the reverse direction, but that's an almost-free bonus). No need to select groups manually and transform them into integers, no need to reconstruct tuples/lists/records; tyre does that for you (and handles conversion failures cleanly).

[1]: https://github.com/ocaml/ocaml-re/blob/master/lib/re.mli#L20...
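Concretely, a sketch along the lines of the example in tyre's own documentation (operator names are from my reading of its interface, so treat them as approximate):

```ocaml
(* A typed pattern: the two ints are extracted and converted
   automatically, and a conversion failure surfaces as Error. *)
let dim = Tyre.( str "dim:" *> int <&> str "x" *> int )
let dim_re = Tyre.compile dim

let () =
  match Tyre.exec dim_re "dim:3x4" with
  | Ok (w, h) -> assert (w = 3 && h = 4)
  | Error _ -> assert false
```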

Hi, your homepage "unparsing" link goes to #evaluating, but the anchor on the docs page has id="eval"

Also, how flexible is the unparsing? Does it have to be the same regular expression? [Sorry, I haven't read in detail and I'm only slightly familiar with ocaml - I'm interested from an academic perspective.]

If I understand your question correctly, no, it doesn't. Actually, the thing that you "unparse" doesn't even have to come from a regex match to begin with. As long as the type matches, everything is fine.
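A sketch of that, reusing the dim pattern from tyre's documentation (assuming `Tyre.eval` is the unparsing entry point; operator names are approximate):

```ocaml
(* Unparse a value that never came from a match; only the type matters. *)
let dim = Tyre.( str "dim:" *> int <&> str "x" *> int )
let () = assert (Tyre.eval dim (1920, 1080) = "dim:1920x1080")
```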

If you are interested in unparsing, I would advise you to read Danvy's (famous) paper "Functional Unparsing"[1]. The Printf and Format modules of the OCaml standard library are basically this paper on steroids. I recently wrote a blog post on part of the implementation technique[2].
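A taste of typed unparsing from the stdlib: the format string's type drives which arguments `Printf.sprintf` accepts, checked at compile time.

```ocaml
(* The format string "%s scored %d" has type
   (string -> int -> string, unit, string) format, so sprintf
   demands exactly a string then an int; a mismatch is a type error. *)
let s = Printf.sprintf "%s scored %d" "tyre" 109
let () = assert (s = "tyre scored 109")
```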

Hope that helps. :)

[1]: http://www.brics.dk/RS/98/12/BRICS-RS-98-12.pdf

[2]: https://drup.github.io/2016/08/02/difflists/

[Printf]: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Printf.htm...

[Format]: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Format.htm...

Ah, thanks. I think one needs (structural) type transformations for the flexibility I had in mind: things like swapping sum operands or re-ordering concatenation operands.

Uh... did you see my bug report on your webpage (first sentence above)? That link still doesn't work properly.

Slightly related, Cucumber will soon support a typed expression format called Cucumber Expressions: https://docs.cucumber.io/cucumber-expressions/

It's a standalone library and may well be used outside of Cucumber.

It's currently implemented in JavaScript, Java and Ruby.

A minor nitpick: wrong syntax highlighting is worse than no syntax highlighting at all.

The example on the home page is hard to parse and understand.

What could anyone possibly do to improve regular expressions?

Perl6 made a lot of improvements https://docs.perl6.org/language/regexes

Technically not regular expressions since they are appropriate for matching much more than regular grammars, but that's true for almost any programming language's "regular expressions".

> Perl6 made a lot of improvements https://docs.perl6.org/language/regexes

IMO the word "improvements" should be in quotes. Pathological regular expressions are an issue because of things like back references and look-behind matching. If Perl hadn't enshrined the idea of "a regular expression for every problem" people would probably be less skeptical of regular expressions in code.

So why not use grammars, if regexps are too complex...? https://docs.perl6.org/language/grammars

(The extensions to regexps in Perl 5 (and presumably now also in PCRE?) have made it possible to write them more like grammars than anything else. But you would probably not approve. :-) )

Lots of things.

Make syntax consistent across RE engines for one.

Improve the syntax.

Introduce clear modes with different tradeoffs, so you don't get e.g. regex based DOS attacks.

Make them easier to compose, not just through copy pasting...

I've steered more toward composing grammars by using tools like Regexp::Grammars - https://metacpan.org/pod/Regexp::Grammars

The real sweet spot I've found is parse in Rebol / Red - http://blog.hostilefork.com/why-rebol-red-parse-cool/ | http://www.rebol.com/docs/core23/rebolcore-15.html

A LINQ interface to visit the resulting parse tree.

Add types of course.

How do these improve regular expressions?

It doesn't improve regexp; it improves your code that uses regexp to populate type-checked data. Instead of writing a lot of "if foo is int then ..." type checking code you can just annotate the regexp with types and it either extracts those types for you or throws an error or something.

In general you have the same problem when parsing CSV, command line flags, URL query parameters, JSON, etc.
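The manual extraction code that a typed regex replaces might look like this (a hypothetical sketch with ocaml-re; `parse_dim` is an illustrative name): every group needs its own lookup and conversion, each one a chance for a bug.

```ocaml
(* Untyped extraction: select groups by index, convert by hand,
   and remember to catch the conversion failures yourself. *)
let dim_re = Re.compile (Re.Pcre.re {|dim:(\d+)x(\d+)|})

let parse_dim s =
  match Re.exec_opt dim_re s with
  | None -> None
  | Some g ->
    (try Some (int_of_string (Re.Group.get g 1),
               int_of_string (Re.Group.get g 2))
     with Failure _ | Not_found -> None)

let () = assert (parse_dim "dim:3x4" = Some (3, 4))
```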

Indeed, that is the exact goal.

For a little history: I started this for URLs[1]. I couldn't (and still can't) figure out a proper interface for that, but someone mentioned they would like something like that for regular expressions, so here it is.

We already have something similar for command line flags in OCaml[2] (and several for JSON).

[1]: https://github.com/Drup/furl [2]: http://erratique.ch/software/cmdliner

Those type-safety problems when parsing tainted data are a royal pain and often security issues! Typed regexes are a very good idea and very welcome. At the moment, when I use regexes and extraction, I have a lot of boilerplate and could easily have lots of unfuzzed bugs.

(For those having the type-safety problem with JSON in Python, can I recommend my own library obiwan? https://pypi.python.org/pypi/obiwan/ :) )

At this point, why not just use the more powerful, maintainable, and flexible parser combinators[0]? I thought the point of regex was for quick and dirty bash/grep/perl one-liners and search/replace in vi/emacs. If you're going to actually maintain the code, then use something designed for that.

[0] https://en.wikipedia.org/wiki/Parser_combinator

Well, there is the speed argument. Regular expressions (not the pcre kind with backtracking and stuff) are blazing fast. Ragel[1] is a good example of that. In general, though, I actually agree.

[1]: http://www.colm.net/open-source/ragel/

Among parser combinator libraries, attoparsec[0] is no slouch. It gets very close to the speed of the hand-rolled http-parser library while being much shorter and easier to read and maintain[1].

[0] https://hackage.haskell.org/package/attoparsec

[1] http://www.serpentine.com/blog/2010/03/03/whats-in-a-parser-...

As I mentioned in another comment, there are applicative parser combinator libraries for regular languages, e.g. https://hackage.haskell.org/package/regex-applicative so asymptotic speed is orthogonal to 'combinatorialness'.

If you look at the API of ocaml-re (and tyre), you'll see it's pretty much equivalent to regex-applicative. Regex combinators are nice.

"Parser combinators", for me, denotes actual parsers, for context-free or context-sensitive grammars.

Another good example is RE2: https://github.com/google/re2

It's pretty fast too. I don't know whether you can compare the performance of Ragel to regex libraries like RE2; they look like they solve different problems.

Consider that the regex may be program input, so you can't just code it away. A search engine is a good example.

I see! Thanks.
