
What the heck is a parser-combinator? - lihaoyi
http://www.kimpel.com/2017/02/28/what-the-heck-is-a-parser-combinator/
======
tel
Consider the most basic parsers you might want. For instance, a parser that
only succeeds if it matches a string exactly, a parser that matches any single
character and always succeeds, a parser that matches nothing and returns some
constant, a parser that always fails. They're all simple and stupid and let's
give them names:

    
    
        string("foobar") : Parser<String>
        char : Parser<Character>
        always<A>(x: A): Parser<A>
        never<A>: Parser<A>
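
A minimal Python sketch of these four atoms, assuming a parser is just a function from the remaining input to either `(value, rest)` or `None`. That representation is an assumption, not something the signatures above dictate, and note that in this sketch `char` has to fail on empty input:

```python
# Assumed representation: parser(text) -> (value, rest) on success, None on failure.

def string(expected):
    """Succeed only if the input starts with `expected`."""
    def parse(text):
        if text.startswith(expected):
            return expected, text[len(expected):]
        return None
    return parse

def char(text):
    """Match any single character; fails only when the input is empty."""
    return (text[0], text[1:]) if text else None

def always(value):
    """Consume nothing and return the constant `value`."""
    return lambda text: (value, text)

def never(text):
    """Always fail, whatever the input."""
    return None
```

Here `string("foobar")("foobarbaz")` gives `("foobar", "baz")`, while `never` returns `None` on any input.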
    

These are parser combinators. They form the atoms at the foundation of a
language for constructing parsers. For instance, we can imagine two parsers
happening in sequence

    
    
        then<A, B>(a: Parser<A>, b: Parser<B>): Parser<(A, B)>
    
        then(char, char): Parser<(Character, Character)>
    

Or more fancily, two parsers happening in sequence, with the second parser
defined using the output of the first. This creates context sensitivity:

    
    
        thenDependent<A, B>(a: Parser<A>, b: A -> Parser<B>): Parser<(A, B)>
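
A rough Python sketch of both sequencing combinators, under the same assumed representation (a parser maps the remaining input to `(value, rest)` or `None`). The helpers `digit` and `take` are hypothetical, added only to show context sensitivity: the first digit decides how many characters the second parser consumes.

```python
def char(text):
    """Match any single character; fails on empty input."""
    return (text[0], text[1:]) if text else None

def then_dependent(a, make_b):
    """Run `a`, build the second parser from its value, then run that."""
    def parse(text):
        first = a(text)
        if first is None:
            return None
        x, rest = first
        second = make_b(x)(rest)
        if second is None:
            return None
        y, rest = second
        return (x, y), rest
    return parse

def then(a, b):
    """Plain sequencing: the second parser ignores the first result."""
    return then_dependent(a, lambda _: b)

def digit(text):
    """Hypothetical helper: one decimal digit, returned as an int."""
    if text and text[0] in "0123456789":
        return int(text[0]), text[1:]
    return None

def take(n):
    """Hypothetical helper: consume exactly n characters."""
    def parse(text):
        return (text[:n], text[n:]) if len(text) >= n else None
    return parse

# A length-prefixed field: "3abc" means "read 3 characters".
length_prefixed = then_dependent(digit, take)
```

In this sketch `then` falls out as the special case of `then_dependent` that throws away the first value when choosing the second parser.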
    

We're constructing a fundamental language for building parsers up from our
atoms to larger and larger things. All we need now is a way to "run" them

    
    
        run<A>(p: Parser<A>): String -> Option<A>
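
One way `run` could look in Python, again assuming parsers return `(value, rest)` or `None`. Treating leftover input as failure is my assumption; the signature above does not say either way. Using `None` for the empty `Option` also means a parser whose value is itself `None` cannot be told apart from a failure.

```python
def string(expected):
    """Succeed only if the input starts with `expected`."""
    def parse(text):
        if text.startswith(expected):
            return expected, text[len(expected):]
        return None
    return parse

def run(parser):
    """Turn a parser into a plain function from input to an optional value."""
    def apply(text):
        result = parser(text)
        if result is None:
            return None
        value, rest = result
        # Assumption: unconsumed input counts as a failed parse.
        return value if rest == "" else None
    return apply
```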
    

Tada---parser combinators!

~~~
omaranto
Is that really how the term is used? I would have naively thought that among
your examples only 'then' and 'thenDependent' are parser _combinators_ , and
that 'string("foobar")', 'char', 'always(x)' and 'never' are simply called
parsers. (Maybe 'string', as opposed to the evaluated 'string("foobar")',
should be called a combinator even though it doesn't take any parsers as
arguments...)

Just like I'd call '+' and '*' arithmetic operators but I'd call '3' a number.

~~~
DonaldPShimoda
In my experience (limited) yes, this is how the term is used.

"Combinator" != "combiner". A combinator is a thing which is combined with
other combinators. When you combine two combinators, you end up with yet
another combinator which can be combined with other combinators. Very
composable, in the functional spirit of things.

~~~
omaranto
I'm not so sure, and whoever wrote the Wikipedia page seems to agree with me.
The first sentence is "In computer programming, a parser combinator is a
higher-order function that accepts several parsers as input and returns a new
parser as its output." Also people name the type something like Parser<A>, not
ParserCombinator<A>.
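
To make the distinction concrete, a small Python sketch with illustrative names (not from any library): `string` builds a parser from a plain value, while `either` matches the quoted definition because it takes parsers and returns a new one.

```python
def string(expected):
    """Builds a parser from a string; takes no parsers as input."""
    def parse(text):
        if text.startswith(expected):
            return expected, text[len(expected):]
        return None
    return parse

def either(*parsers):
    """Higher-order: takes parsers, returns a parser that tries them in order."""
    def parse(text):
        for p in parsers:
            result = p(text)
            if result is not None:
                return result
        return None
    return parse

# `boolean` is a parser; `either` is the thing that combined two parsers into it.
boolean = either(string("true"), string("false"))
```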

------
Cieplak
Gotta love Parsec for Haskell and pyparsing for Python. Some other awesome
parsing libs:

PEGTL - _C++ Parsing Expression Grammar Template Library_
[https://github.com/taocpp/PEGTL](https://github.com/taocpp/PEGTL)

Parboiled - _Java & Scala PEG Library_
[https://github.com/sirthias/parboiled](https://github.com/sirthias/parboiled)

Nom - _Rust parser combinator framework_
[https://github.com/Geal/nom](https://github.com/Geal/nom)

Nearley - _JavaScript parser toolkit_
[https://github.com/Hardmath123/nearley](https://github.com/Hardmath123/nearley)

Neotoma - _Erlang library and packrat parser-generator for PEGs_
[https://github.com/seancribbs/neotoma](https://github.com/seancribbs/neotoma)

~~~
DominoTree
I've used Nom to parse a couple RFC-based protocols now, and I'm amazed at how
easy and performant the result ends up being.

There's a great walk-through of using it here, including fuzzing your parsers
to make sure they're solid.
[https://github.com/Geal/langsec-2017-hackathon-code](https://github.com/Geal/langsec-2017-hackathon-code)

~~~
beefsack
Geal is very helpful too, he pops up around the place in the Rust community
and is responsive on the nom Gitter channel.

------
LukaD
Slightly off-topic but _please_ stop this js smooth scroll nonsense. If I
wanted to use smooth scroll I would have enabled it in my browser.

~~~
Lev1a
^ Someone talking sense right here, I thought some of my browser settings had
changed for a minute.

~~~
to3m
Sing it! I wonder what purpose it is even supposed to serve?

The weird thing is that all people have to do to make this stuff work is:
_nothing at all_. But for some reason that's just too much effort.

------
brianberns
This is great, but if you're going to use a parser combinator in .NET, you
might as well switch to F# and use FParsec instead of hacking LINQ in C#. The
F# code is so much easier to read and understand.

------
wahern
Here's a JSON parser in LPeg, the excellent PEG parser from one of the Lua
authors. The function "decode" returns the final, complete tree data
structure.

    
    
      local lpeg = require"lpeg"
      local P, S, R, V = lpeg.P, lpeg.S, lpeg.R, lpeg.V
      local C, Cc, Cf, Cg, Ct = lpeg.C, lpeg.Cc, lpeg.Cf, lpeg.Cg, lpeg.Ct
    
      local function to8(n)
        ... -- Lua code to normalize UTF-16 to UTF-8
      end
    
      local unicode = P"u" * (R("09", "AF", "af")^4 / to8)
      local named = C'"' + C"\\" + C"/" + (P"b" * Cc"\b") + (P"f" * Cc"\f") + (P"n" * Cc"\n") + (P"r" * Cc"\r") + (P"t" * Cc"\t")
      local escaped = P"\\" * (named + unicode)
      local unescaped = C((P(1) - S'\\"')^1)
      local qstring = Ct(P'"' * (unescaped + escaped)^0 * P'"') / table.concat
    
      local exp = S"Ee" * S"-+"^-1 * R"09"^1
      local frac = P"." * R"09"^1
      local number = (S"-+"^-1 * R"09"^1 * frac^-1 * exp^-1) / tonumber
    
      local boolean = (P"true" * Cc(true)) + (P"false" * Cc(false))
      local null = P"null" * Cc(nil)
      local space = S" \t\r\n"^0
    
      local JSON = { "Value",
        Value = space * (V"Object" + V"Array" + V"Simple") * space,
        Object = Cf(Ct"{" * space * Cg(qstring * space * P":" * V"Value" * P","^-1 * space)^0 * P"}", rawset),
        Array = Ct(P"[" * space * (V"Value" * P","^-1 * space)^0 * P"]"),
        Simple = number + boolean + null + qstring,
      }
    
      local function decode(txt)
        return lpeg.match(JSON, txt)
      end
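
For anyone who doesn't read LPeg: `*` is sequence, `+` is ordered choice, `^1` is one-or-more and `^-1` is optional. Below is a rough Python sketch of just the `number` rule, using hypothetical helper names (this is not an LPeg binding). Each matcher takes the text and a position and returns the end position, or `None` on failure.

```python
def one_of(chars):
    """Match one character out of `chars` (roughly lpeg.S / lpeg.R)."""
    def match(text, pos):
        if pos < len(text) and text[pos] in chars:
            return pos + 1
        return None
    return match

def seq(*patterns):
    """Match patterns one after another (LPeg's `*`)."""
    def match(text, pos):
        for p in patterns:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return match

def at_least_one(pattern):
    """Greedy one-or-more repetition (LPeg's `^1`)."""
    def match(text, pos):
        end = pattern(text, pos)
        if end is None:
            return None
        while True:
            nxt = pattern(text, end)
            if nxt is None or nxt == end:
                return end
            end = nxt
    return match

def optional(pattern):
    """Zero-or-one (LPeg's `^-1`): fall back to the start position."""
    def match(text, pos):
        end = pattern(text, pos)
        return pos if end is None else end
    return match

DIGITS = at_least_one(one_of("0123456789"))
EXP = seq(one_of("Ee"), optional(one_of("-+")), DIGITS)
FRAC = seq(one_of("."), DIGITS)
NUMBER = seq(optional(one_of("-+")), DIGITS, optional(FRAC), optional(EXP))

def parse_number(text, pos=0):
    """Return (value, end position), or None if no number starts at pos."""
    end = NUMBER(text, pos)
    return None if end is None else (float(text[pos:end]), end)
```

The conversion through `float` stands in for the `/ tonumber` step in the Lua version.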

------
rebolek
This reminds me of Red's parse
[http://www.red-lang.org/2013/11/041-introducing-parse.html](http://www.red-lang.org/2013/11/041-introducing-parse.html)

------
pwm
type Parser a = String -> [(a, String)]

    
    
      "A Parser for Things
      Is a function from Strings
      To Lists of Pairs
      Of Things and Strings!"
      - Fritz Ruehr
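
The rhyme translates almost word for word. A small Python sketch of that "list of successes" shape (function names are mine): an empty list means failure, and more than one pair means the input can be read in more than one way.

```python
def item(text):
    """Consume one character; the empty list signals failure."""
    return [(text[0], text[1:])] if text else []

def string(expected):
    """Match a fixed prefix, yielding one (thing, remaining string) pair."""
    def parse(text):
        if text.startswith(expected):
            return [(expected, text[len(expected):])]
        return []
    return parse

def alt(p, q):
    """Try both alternatives and keep every result."""
    return lambda text: p(text) + q(text)

# Both alternatives match "abc", so the result list has two pairs.
ambiguous = alt(string("a"), string("ab"))
```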

------
albi_lander
Reminds me of SuperCombinators: a parser combinator framework written in Swift
[https://github.com/snipsco/SuperCombinators](https://github.com/snipsco/SuperCombinators)

------
unmole
Another good introduction: [http://theorangeduck.com/page/you-could-have-
invented-parser...](http://theorangeduck.com/page/you-could-have-invented-
parser-combinators)

------
tangus
Here is a great explanation of parser combinators, not tied to any specific
implementation:

[https://qntm.org/combinators](https://qntm.org/combinators)

------
RodgerTheGreat
I implemented a basic set of primitive parsers and combinators here in K,
along with a usage example- a complete parser for the bittorrent "Bencode"
data interchange format:

[https://github.com/JohnEarnest/ok/blob/gh-
pages/examples/par...](https://github.com/JohnEarnest/ok/blob/gh-
pages/examples/parsing.k)

Not particularly efficient, but fairly concise.

------
blacksmythe
Parser combinator library in Go:
[https://github.com/prataprc/goparsec](https://github.com/prataprc/goparsec)

------
bitwize
A tiny, elemental parser in a universe where parsers are closed under certain
parser composition operators. Such tiny parsers can be combined using such
operators to produce a bigger parser which could theoretically parse anything.

------
louthy
Wanted to leave a comment on the site, but it seems comments are broken, so
I'll stick this here just in case the author sees it:

You've linked to my csharp-monad library for C# parser combinators. This has
been superseded by my language-ext project:
[https://github.com/louthy/language-ext/](https://github.com/louthy/language-ext/)

It is a much more advanced and efficient port of the Haskell Parsec library.
Would you mind linking to that instead?

Your Sprache examples would look like this in language-ext:

    
    
        public static readonly Parser<JCLCommand> JCLText =
            from open    in ch('$') 
            from ws1     in spaces
            from command in asString(many(noneOf(' ')))
            from ws2     in spaces
            from content in asString(many(noneOf('"')))
            select new JCLCommand(command, content);
         
        public static readonly Parser<JCLCommand> GlobalText =
            from variablename in asString(many(noneOf('=')))
            from ws2          in ch('=')
            from openbrack    in ch('(')
            from filepath     in asString(many(noneOf(')')))
            from closebrack   in ch(')')
            select new JCLCommand(variablename, filepath);
    

But the power of parser combinators is their reusable nature. So I'd break
that down to a set of tools:

    
    
        static Parser<A> token<A>(Parser<A> p) =>
            from x in p
            from _ in either(spaces, eof)
            select x;
    
        static Parser<string> symbol(string x) =>
            token(str(x));
    
        static Parser<string> identifier =
            token(asString(many1(alphaNum)));
    
        static Parser<A> quotes<A>(Parser<A> p) =>
            between(symbol("\""), symbol("\""), p);
    
        static Parser<A> parens<A>(Parser<A> p) =>
            between(symbol("("), symbol(")"), p);
    
        static Parser<string> quoteText =
            token(quotes(asString(many(satisfy(x => x != '"')))));
    
        static Parser<string> parensText =
            token(parens(asString(many(satisfy(x => x != ')')))));
    

Then your final parsers would look like this:

    
    
        static readonly Parser<JCLCommand> JCLText =
            from open    in symbol("$")
            from command in identifier
            from content in quoteText
            select new JCLCommand(command, content);
    
        static readonly Parser<JCLCommand> GlobalText =
            from variablename in identifier
            from ws2          in symbol("=")
            from filepath     in parensText
            select new JCLCommand(variablename, filepath);
    
        static readonly Parser<Seq<JCLCommand>> Commands =
            from _        in spaces
            from commands in many1(either(JCLText, GlobalText))
            select commands;
    
    

Which is much easier to understand, I think. It's not exactly the same, as it
defines what an identifier is, but it's much more tolerant of rogue spaces
because of the token parser. This is definitely the most compelling aspect of
parser combinators for me, the way they compose so elegantly.
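
The same `token` idea carries over to other languages. Here is a rough Python sketch, not language-ext's API: the names mirror the C# above but the implementations are assumptions, with a parser being a function from the remaining input to `(value, rest)` or `None`.

```python
def string(expected):
    """Match a fixed prefix."""
    def parse(text):
        if text.startswith(expected):
            return expected, text[len(expected):]
        return None
    return parse

def token(p):
    """Run `p`, then discard any whitespace that follows it."""
    def parse(text):
        result = p(text)
        if result is None:
            return None
        value, rest = result
        return value, rest.lstrip()
    return parse

def symbol(s):
    """A fixed string that tolerates trailing whitespace."""
    return token(string(s))

def between(open_, close, p):
    """Parse `open_`, then `p`, then `close`; keep only the value of `p`."""
    def parse(text):
        opened = open_(text)
        if opened is None:
            return None
        inner = p(opened[1])
        if inner is None:
            return None
        value, rest = inner
        closed = close(rest)
        if closed is None:
            return None
        return value, closed[1]
    return parse

def alnum_run(text):
    """One or more alphanumeric characters."""
    n = 0
    while n < len(text) and text[n].isalnum():
        n += 1
    return (text[:n], text[n:]) if n else None

identifier = token(alnum_run)
parens_identifier = between(symbol("("), symbol(")"), identifier)
```

Because every piece is wrapped in `token`, `"(  foo   )  rest"` and `"(foo)"` both parse to `"foo"` without any whitespace handling in `parens_identifier` itself.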

------
ravishah
Discount Y - Combinator

------
manishmarahatta
basically a parser inception, takes parser and returns parser, woah

------
flor1s
A few years ago I worked on a similar kind of project as part of my bachelor's.
We were converting SQL for Oracle to SQL for Microsoft SQL Server. One problem
is dealing with features which language A supports and language B doesn't.
Another problem is that it is a huge amount of work to match all syntax
elements as well as all library usages. In fact, I guess it's not a good thing
to say on HackerNews, but isn't it better to just buy a tool which does it for
you? It seems like they are available for the author's use case:
[http://www.ispirer.com/application-conversion/jcl-to-
powersh...](http://www.ispirer.com/application-conversion/jcl-to-powershell-
conversion)

~~~
majewsky
> We were converting SQL for Oracle to SQL for Microsoft SQL Server.

I had a similar problem a few months ago with a Golang application that uses
Postgres in production, but SQLite for unit tests. Since Go's SQL support has
pluggable driver backends, I made a generic proxy driver [1] that can rewrite
the incoming query, and used that in my application to rewrite from Postgres
to SQLite syntax [2].

[1]
[https://godoc.org/github.com/majewsky/sqlproxy](https://godoc.org/github.com/majewsky/sqlproxy)

[2]
[https://github.com/sapcc/limes/blob/205f9980a41d75fc0315e0ac...](https://github.com/sapcc/limes/blob/205f9980a41d75fc0315e0acc1e3daeb9853048f/pkg/db/connection.go#L48-L60)
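
The general shape of such a rewriting proxy, sketched in Python rather than Go. Everything here is hypothetical: the class and function names are made up, and I'm assuming placeholder style (`$1` versus `?`) as the example difference, since the comment does not say which differences the linked code handles.

```python
import re
import sqlite3

def postgres_to_sqlite(query):
    """Rewrite Postgres-style `$1, $2, ...` placeholders to SQLite-style `?`.

    Assumptions: each placeholder appears once, in ascending order, and no
    `$N` sequence occurs inside a string literal.
    """
    return re.sub(r"\$\d+", "?", query)

class RewritingConnection:
    """Hypothetical wrapper: rewrite each query, then delegate to the real connection."""

    def __init__(self, conn, rewrite):
        self._conn = conn
        self._rewrite = rewrite

    def execute(self, query, params=()):
        return self._conn.execute(self._rewrite(query), params)

# Application code keeps writing Postgres-flavoured SQL; tests run on SQLite.
conn = RewritingConnection(sqlite3.connect(":memory:"), postgres_to_sqlite)
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t (id, name) VALUES ($1, $2)", (1, "a"))
rows = conn.execute("SELECT name FROM t WHERE id = $1", (1,)).fetchall()
```

A regex is enough for this toy case; anything that has to understand the query structure would need a real SQL parser.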

~~~
SteveBash
Why not just use a local Postgres instead of SQLite? Your case seems like
overengineering to me.

