
A Handwritten Math Parser in 100 lines of Python - gnebehay
https://github.com/gnebehay/parser
======
wryun
Slightly more amusingly, but non-equivalently:

    
    
      import sys
      from functools import reduce
      from operator import add, sub, mul, truediv as div
    
      def calc(s, ops=[(add, '+'), (sub, '-'), (mul, '*'), (div, '/')]):
        if not ops:
          return float(s)
    
        (func, token), *remaining_ops = ops
        return reduce(func, (calc(part, remaining_ops) for part in s.split(token)))
    
      if __name__ == '__main__':
        print(calc(''.join(sys.argv[1:])))
    
    

Does support order of operations, in spite of appearances. Extending to
support parentheses is left as an exercise for the reader :)
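The precedence and associativity claims are easy to check in-process; this is the same function as above, just exercised directly:

```python
import sys
from functools import reduce
from operator import add, sub, mul, truediv as div

def calc(s, ops=[(add, '+'), (sub, '-'), (mul, '*'), (div, '/')]):
    if not ops:
        return float(s)
    (func, token), *remaining_ops = ops
    return reduce(func, (calc(part, remaining_ops) for part in s.split(token)))

# Splitting on the lowest-precedence operator first yields correct
# precedence, and reduce() folds left to right, giving left associativity:
print(calc('3-2-1'))   # 0.0
print(calc('2+3*4'))   # 14.0
print(calc('8/4/2'))   # 1.0
```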

~~~
tgb
This is pretty clever. There's a video by Jonathan Blow, creator of Braid and
currently making a programming language called Jai, that talks about parsing
expressions with order of operations. He gives a simple approach that
apparently was discovered recently (though he didn't remember the paper name),
that does this in a single pass, and that is hardly any more complicated than
what you wrote. You can see it here [1]. He also trashes using yacc/bison
before that, along with most academic parsing theory.

[1]
[https://www.youtube.com/watch?v=MnctEW1oL-E&t=4578s](https://www.youtube.com/watch?v=MnctEW1oL-E&t=4578s)

~~~
mucinoab
I would like to know the name of that paper if anyone knows.

~~~
lioeters
I believe he was talking about Pratt parsing in the video.

Top Down Operator Precedence - Vaughan R. Pratt -
[https://tdop.github.io/](https://tdop.github.io/)

What he called "priority" is called "binding power" in the paper.
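To give a flavor of the technique: each operator gets a binding power, and a single recursive function keeps consuming operators as long as they bind tightly enough. This is a minimal hypothetical sketch (the binding-power values and names are illustrative, not taken from the paper or the video):

```python
import re
import operator

# Binding powers: higher binds tighter. Values are illustrative.
BINDING_POWER = {'+': 10, '-': 10, '*': 20, '/': 20}
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def tokenize(s):
    return re.findall(r'\d+\.?\d*|[-+*/()]', s)

def parse(tokens, min_bp=0):
    # Parse a "prefix" item: a number or a parenthesized subexpression.
    tok = tokens.pop(0)
    if tok == '(':
        left = parse(tokens, 0)
        tokens.pop(0)  # discard ')'
    else:
        left = float(tok)
    # Consume operators while they bind at least as tightly as the caller
    # requires; recursing with bp + 1 makes same-level operators left-assoc.
    while (tokens and tokens[0] in BINDING_POWER
           and BINDING_POWER[tokens[0]] >= min_bp):
        op = tokens.pop(0)
        right = parse(tokens, BINDING_POWER[op] + 1)
        left = OPS[op](left, right)
    return left

print(parse(tokenize('7-4+2')))    # 5.0
print(parse(tokenize('2+3*4')))    # 14.0
print(parse(tokenize('(1+2)*3')))  # 9.0
```

Evaluation is done inline here for brevity; building tree nodes instead of computing `OPS[op](left, right)` gives a parser rather than a calculator.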

~~~
Leszek
I like Jonathan Blow and all, but he should be careful with trashing academic
parsing theory if he thinks that Pratt parsing was only discovered "in the
last few years".

------
pansa2
> Luckily, one can make the grammar LL(1)-parser-friendly by rewriting all the
> left recursions in the grammar rules as right recursions.

The problem with this is that it changes all the left-associative operators to
be right-associative. For example, `python compute.py '3-2-1'` gives the
result 2, but it should be 0.

Edit: Why the downvote? Am I wrong?
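One standard way to keep left associativity without left recursion is to express the repetition iteratively, i.e. `Exp -> Exp2 (('+'|'-') Exp2)*`, and fold left to right with a loop. A hypothetical sketch (not the submitted repo's code; names are made up):

```python
import re

def tokenize(s):
    return re.findall(r'\d+\.?\d*|[-+*/]', s)

def parse_exp(tokens):
    # Exp -> Exp2 (('+' | '-') Exp2)*, folded left to right.
    value = parse_exp2(tokens)
    while tokens and tokens[0] in '+-':
        op = tokens.pop(0)
        rhs = parse_exp2(tokens)
        value = value + rhs if op == '+' else value - rhs
    return value

def parse_exp2(tokens):
    # Exp2 -> Exp3 (('*' | '/') Exp3)*, same trick one level down.
    value = float(tokens.pop(0))
    while tokens and tokens[0] in '*/':
        op = tokens.pop(0)
        rhs = float(tokens.pop(0))
        value = value * rhs if op == '*' else value / rhs
    return value

print(parse_exp(tokenize('3-2-1')))  # 0.0, not 2.0
```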

~~~
bmn__
You are correct: the result of 2 from the submitted parser.py is erroneous.
Downvoters should be ashamed of their abusive behaviour.

It is not well known amongst programmers who would like to use parsers that
most algorithms, as implemented in countless software libraries, are wrong and
broken beyond repair. We should strive to make it a well-known fact and
promote those parsers that work properly, so that fewer programmers and end-
users need to suffer.

------
azhenley
This does not properly handle associativity.

It erroneously computes "7-4+2" as 1, not 5.

------
justinmeiners
Doesn't shunting yard solve this pretty simply? Once you have an expression in
RPN, building expression trees is pretty easy.
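A compact illustration of that idea, under the usual textbook precedence rules (a sketch, not tied to the submitted repo; parentheses omitted for brevity):

```python
import re
import operator

PREC = {'+': 1, '-': 1, '*': 2, '/': 2}  # all left-associative
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def to_rpn(s):
    # Dijkstra's shunting-yard: numbers go straight to the output; an
    # operator first pops any stacked operator of >= precedence, which
    # is what makes same-precedence operators left-associative.
    output, stack = [], []
    for tok in re.findall(r'\d+\.?\d*|[-+*/]', s):
        if tok in PREC:
            while stack and PREC[stack[-1]] >= PREC[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(float(tok))
    return output + stack[::-1]

def eval_rpn(rpn):
    # Replacing this arithmetic with node construction yields the
    # expression tree the parent comment mentions.
    stack = []
    for tok in rpn:
        if tok in PREC:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[tok](a, b))
        else:
            stack.append(tok)
    return stack[0]

print(eval_rpn(to_rpn('7-4+2')))  # 5.0
print(eval_rpn(to_rpn('3-2-1')))  # 0.0
```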

~~~
rbtprograms
I recently did this same project in JavaScript and you're correct, shunting
yard is perfect for this.

------
glxxyz
Similar calculator written in C++ by Bjarne Stroustrup:
[https://www.stroustrup.com/dc.c](https://www.stroustrup.com/dc.c) and in Go
by me:
[https://github.com/glxxyz/snippets/blob/main/calculator/dc.g...](https://github.com/glxxyz/snippets/blob/main/calculator/dc.go)

Not that much longer, and supports an interactive session with named
variables.

------
BerislavLopac
When I saw "handwritten math" I thought it was something in the vein of
[https://photomath.app](https://photomath.app)

------
kuesji
Shameless plug: probably messy code, but same mission.

[https://github.com/kuesji/mec](https://github.com/kuesji/mec)

------
tom_mellior

        (3) Exp -> Exp2
        (4) Exp2 -> Exp2 * Exp3
        (5) Exp2 -> Exp2 / Exp3
        (6) Exp2 -> Exp3
    

This is horrible to read and to maintain. Real grammars use names like
"additive_expr", "multiplicative_expr" etc., or maybe "term" and "factor",
though I can never remember which is which.

~~~
ImprobableTruth
For me it's actually the opposite, I find using the priority makes it actually
a lot more readable, especially once you have a lot of operators. It's also
quite close to a generalized (and potentially dynamic) version that is just a
function that takes in a priority and returns the correct parser.

~~~
tom_mellior
It's not that I'm opposed to priorities, it's just that I think it's
unreasonably hard to see the typos here:

    
    
        (3) Exp -> Exp2
        (4) Exp2 -> Exp2 * Exp3
        (5) Exp3 -> Exp2 / Exp2
        (6) Exp2 -> Exp3
    

There are reasons why we usually give variables more meaningful names than
var1, var2, var3.

(Obviously I agree that if the parser can be expanded dynamically with
operators with numeric priorities, then you need numeric priorities. But that
is a very different case.)

------
spinningslate
Really nicely-written project. The documentation is clear and sufficiently
comprehensive; the project structure is immediately obvious and navigable; and
each source file is both intuitively named and very readable.

------
skohan
Where would I read up on what's currently considered best practice for
implementing a parser? We learned recursive descent in school, but I have
heard several remarks that this is no longer considered en-vogue.

~~~
eliben
Recursive descent is what I use today. En-vogue or not, it's the most robust,
flexible and - most importantly - maintainable approach in the long term.

~~~
tzs
It's also easy to explain to someone who has not studied parsing, and easy for
them to implement. That was why I chose recursive descent when I needed to
trick someone into writing a C compiler.

This was back in the early '80s, back when there was much more variety in
personal computers both in CPU architecture and operating system, and C was
fairly rare on such small machines.

A friend who was a really good programmer but who had not taken the compiler
course in college (he was a math major, not a CS major) had started a company
aiming to produce a C compiler for a new small computer that was coming out
in a few months, with the compiler meant to be available as soon as the
computer came out. He'd hired another friend to write the compiler.

This was well before the internet, when advertising was a lot slower than it
is now. The company had to buy its ads in the relevant computer magazines well
ahead of time.

That guy writing the compiler was a genius, and was on track to produce an
amazing compiler especially given the limited resources of the personal
computers back then...but it turned out he was not a _fast_ genius. The
compiler would not be finished until long after the new computer launched.

The first friend was getting rather depressed, knowing he had to get someone
faster to write the first version of the compiler but not knowing anyone else
who could do it and was available.

So one evening I decided to convince him that he could do it himself. I sat
him down in front of a whiteboard, and told him he needed three things: a
lexical analyzer to tokenize the input, a parser to parse the token stream,
and a code generator to turn the parse tree into assembly.

I then outlined recursive descent parsing on the whiteboard, giving examples of
using it on C. I was able to convince him that parsing was not hard, and that
he could do it. I then moved on to code generation, and showed him that he could
generate code for a simple stack-based virtual machine, and that could easily
then be mapped to the real, register-based CPU--it would be inefficient, but I
also went over how later he could add a simple optimizer that could remove a
lot of the inefficiency.

Once I had him convinced that he could handle parsing and code generation and
some optimization by their deadline, I pretended we were done and started to
leave. He interrupted and asked about lexical analysis.

I acted embarrassed that I had forgotten that, and "admitted" that it was the
hardest part, and said I'd try to think of something and went home.

Of course that was a lie. When I got home, I quickly wrote a lexical analyzer
for C, and then the next morning went back and handed it to him.

He now actually believed that the hardest part was done, and was now sure that
he could do the remaining easier parts himself. And he did. We both got a good
laugh later when I finally told him that lexical analysis is the easiest part.

