
Programming Your Own Language in C++ - etrevino
https://accu.org/index.php/journals/2252
======
wcrichton
This article shows a lot of the silliness in both a) writing compilers in C++
and b) re-creating well-worn compiler tools like parsers from scratch.

1\. Their definition of values in the language (unfortunately called
"Variables") is a product (class) over all possible types:

    
    
      class Variable {
        ...
        double numValue;
        string strValue;
        vector<Variable> tuple;
        string action;
        string varname;
        Constants::Type type;
      };
    

This means each value at runtime has a large memory footprint, as each
instance of a value contains slots for all possible types even though only one
will ever be filled. Sum types solve this problem.
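
A minimal sketch of the sum-type alternative, assuming a C++17 compiler with `std::variant`. The names `Value`, `Tuple`, and `describe` are illustrative and do not come from the article; only the three payload kinds (number, string, tuple) mirror the class above.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical replacement for the article's Variable class.
// Exactly one alternative is active at a time, and the variant
// itself records which one, so no separate type field is needed.
struct Value;
using Tuple = std::vector<Value>;

struct Value {
    std::variant<double, std::string, Tuple> data;
};

// std::visit dispatches on the active alternative. A missing
// overload is a compile-time error, unlike a forgotten switch case.
std::string describe(const Value& v) {
    struct Visitor {
        std::string operator()(double) const { return "number"; }
        std::string operator()(const std::string&) const { return "string"; }
        std::string operator()(const Tuple& t) const {
            return "tuple of " + std::to_string(t.size());
        }
    };
    return std::visit(Visitor{}, v.data);
}
```

With this layout a value occupies roughly the size of its largest alternative plus a discriminator, rather than the sum of all fields.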

2\. The author intermixes parsing and code generation. This makes it difficult
to implement any kind of optimization, even for an interpreter. For example,
in order to do function inlining in the author's language, you would have to
manually inline calls in every single part of the code generator instead of
having IR transformations as separate modules (see: LLVM). It also tightly
couples the two pieces of the compiler, making it difficult to change parsing
strategies without rewriting the code generator, for instance.
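
To illustrate the separation being argued for, here is a small sketch in which a parser would only build a tree, and optimization and evaluation are independent passes over that tree. Everything here (`Expr`, `fold`, `eval`, the single variable `x`) is hypothetical and much simpler than a real IR; constant folding stands in for inlining because it fits in a few lines.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical AST for arithmetic over one variable.
struct Expr {
    enum class Kind { Num, Var, Add, Mul };
    Kind kind = Kind::Num;
    double value = 0.0;              // used when kind == Num
    std::unique_ptr<Expr> lhs, rhs;  // used when kind == Add or Mul
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr num(double v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Num;
    e->value = v;
    return e;
}

ExprPtr var() {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Var;
    return e;
}

ExprPtr binary(Expr::Kind k, ExprPtr l, ExprPtr r) {
    auto e = std::make_unique<Expr>();
    e->kind = k;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Pass 1: constant folding, a tree-to-tree transformation that
// knows nothing about how the tree was parsed or how it will run.
ExprPtr fold(ExprPtr e) {
    if (e->kind == Expr::Kind::Num || e->kind == Expr::Kind::Var) return e;
    e->lhs = fold(std::move(e->lhs));
    e->rhs = fold(std::move(e->rhs));
    if (e->lhs->kind == Expr::Kind::Num && e->rhs->kind == Expr::Kind::Num) {
        double l = e->lhs->value, r = e->rhs->value;
        return num(e->kind == Expr::Kind::Add ? l + r : l * r);
    }
    return e;
}

// Pass 2: evaluation, which works on folded and unfolded trees alike.
double eval(const Expr& e, double x) {
    switch (e.kind) {
        case Expr::Kind::Num: return e.value;
        case Expr::Kind::Var: return x;
        case Expr::Kind::Add: return eval(*e.lhs, x) + eval(*e.rhs, x);
        case Expr::Kind::Mul: return eval(*e.lhs, x) * eval(*e.rhs, x);
    }
    return 0.0;  // unreachable for valid trees
}
```

Because `fold` and `eval` only see `Expr`, either one can be replaced, or the parser swapped out, without touching the others.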

3\. It took the author nearly 5000 lines of C++ to write an interpreter for a
simple JavaScript-like scripting language. Some of this is due to the
verbosity of C++, and some is due to the needless effort of reimplementing
parsing from scratch. I implemented a similar language in <1k lines of Rust
[1], and I'm sure that others could do even better in Haskell or OCaml. On top
of that, the author's claimed benefit of extensibility requires writing large
amounts of boilerplate.

[1] [https://github.com/willcrichton/lia](https://github.com/willcrichton/lia)

~~~
chubot
At the risk of stating the obvious -- in C and C++, unions solve that problem.

~~~
wcrichton
They solve the memory issue, although they are inconvenient to use vs. pattern
matching and aren't type safe. Additionally, I've had no luck using classes
(e.g. string) inside of unions because the compiler reasonably complains about
initializers. I believe this is somewhat solved with boost::variant (and
that's also getting put into C++17, iirc), but ideally it would be a part of
the type system.

~~~
wolfgke
Replace the type (e.g. std::string) by the pointer type (e.g. std::string*) in
the union.
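
A rough sketch of that pointer-in-union approach, with a hand-written tag. The class and member names are made up for illustration. Note how much manual bookkeeping (copying, destruction) the wrapper needs to stay safe; C++11 unrestricted unions also allow a `std::string` member directly, but then require explicit placement `new` and destructor calls instead.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical tagged union: the string lives on the heap, so the
// union itself only holds trivially copyable members.
class Value {
public:
    enum class Type { Number, String };

    explicit Value(double d) : type_(Type::Number) { data_.num = d; }
    explicit Value(const std::string& s) : type_(Type::String) {
        data_.str = new std::string(s);
    }

    // Deep copy, so two Values never own the same string.
    Value(const Value& o) : type_(o.type_) {
        if (type_ == Type::String) data_.str = new std::string(*o.data_.str);
        else data_.num = o.data_.num;
    }

    // Copy-and-swap: the by-value parameter takes over the old contents
    // and releases them when it goes out of scope.
    Value& operator=(Value o) {
        std::swap(type_, o.type_);
        std::swap(data_, o.data_);
        return *this;
    }

    ~Value() {
        if (type_ == Type::String) delete data_.str;
    }

    Type type() const { return type_; }
    // Callers must check type() first; nothing stops a wrong access.
    double asNumber() const { return data_.num; }
    const std::string& asString() const { return *data_.str; }

private:
    Type type_;
    union Data {
        double num;
        std::string* str;
    } data_;
};
```

The unchecked accessors are the type-safety gap mentioned above: reading `asString()` on a number compiles fine and is undefined behavior at runtime.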

------
packetslave
Cached version, since we appear to have hugged their database to death.

[http://web.archive.org/web/20160616053900/http://accu.org/in...](http://web.archive.org/web/20160616053900/http://accu.org/index.php/journals/2252)

~~~
warriorkitty
I just love how sites become unavailable after reaching the top of HN. :)

------
partycoder
If you are serious about making your own programming language, I would
strongly suggest targeting LLVM.

~~~
spriggan3
> If you are serious about making your own programming language, I would
> strongly suggest targeting LLVM.

Strange advice given the state of LLVM on platforms such as Windows. Both
Swift and Crystal, which are LLVM-based, don't run on Windows, for instance.
LLVM gets you further more quickly, but at the cost of portability.

~~~
Ono-Sendai
LLVM works fine on Windows.

------
n00b101
How does this contrast with a recursive descent parser?

~~~
jfoutz
ANTLR generates a recursive descent parser. The value is that it's (hopefully)
easier to compare your EBNF to the ANTLR spec than it is to compare your EBNF
to lower-level code.
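
For a sense of what the hand-written "lower-level code" looks like, here is a small recursive descent sketch with one function per grammar rule. The grammar and the `Parser` class are invented for illustration and are not taken from the article or from ANTLR output; to stay short, it evaluates while parsing instead of building a tree.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Grammar (EBNF), chosen for illustration:
//   expr   = term   { ("+" | "-") term } ;
//   term   = factor { ("*" | "/") factor } ;
//   factor = number | "(" expr ")" ;
class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    // expr = term { ("+" | "-") term }
    double parseExpr() {
        double v = parseTerm();
        for (;;) {
            if (accept('+')) v += parseTerm();
            else if (accept('-')) v -= parseTerm();
            else return v;
        }
    }

private:
    // term = factor { ("*" | "/") factor }
    double parseTerm() {
        double v = parseFactor();
        for (;;) {
            if (accept('*')) v *= parseFactor();
            else if (accept('/')) v /= parseFactor();
            else return v;
        }
    }

    // factor = number | "(" expr ")"
    double parseFactor() {
        if (accept('(')) {
            double v = parseExpr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        skipSpaces();
        std::size_t used = 0;
        // std::stod throws std::invalid_argument if no number is present.
        double v = std::stod(src_.substr(pos_), &used);
        pos_ += used;
        return v;
    }

    void skipSpaces() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) {
            ++pos_;
        }
    }

    // Consume c if it is the next non-space character.
    bool accept(char c) {
        skipSpaces();
        if (pos_ < src_.size() && src_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

Each function mirrors one rule, which is what makes this style readable, but the correspondence to the EBNF has to be checked by eye.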

~~~
PeCaN
Doesn't ANTLR usually generate a bottom-up parser? I've never used ANTLR but
it would be very odd to generate a recursive descent parser; LR grammars are
parsed bottom-up (presumably the ‘LR’ in ‘ANTLR’).

Bottom-up parsers are nigh unreadable by mortals.

~~~
chubot
ANTLR generates top-down parsers which use a few different algorithms,
including Parr's LL(star) algorithm and the new ALL(star) algorithm for ANTLR
4:

[https://scholar.google.com/scholar?cluster=11340764028758422...](https://scholar.google.com/scholar?cluster=11340764028758422091&hl=en&as_sdt=2005&sciodt=0,5)

[https://scholar.google.com/scholar?cluster=70305927860962254...](https://scholar.google.com/scholar?cluster=7030592786096225431&hl=en&as_sdt=0,5&sciodt=0,5)

The ANTLR acronym is explained here and doesn't have anything to do with LR
grammars: [http://www.antlr.org/](http://www.antlr.org/)

------
chachram
ANTLR's C++ backend is pretty good for complex projects.

