
Writing a compiler in Python using Lex, Yacc and LLVM - Alcides
http://wiki.alcidesfonseca.com/blog/writing-compiler-using-python-lex-yacc-and-llvm/
======
sketerpot
Inspired by this, I've started writing a compiler for a subset of Matlab. I'm
sick of Matlab being an interpreted language year after year, and while I
don't hope to _change_ that, I can at least do a proof-of-concept to lend
weight to my derision.

I've got the lexer and parser done, thanks to Parsec, and some basic C code
generation as a sanity check. Now for the runtime LLVM code generation, to
make it feel like an interpreter! Thanks, HN!

~~~
maximilian
Could you post a link to Parsec? Quick googling didn't turn up any
compiler tools. Thanks!

~~~
GeoJawDguJin
It's a Haskell library: <http://hackage.haskell.org/package/parsec>

Here's a tutorial that uses it to parse a simplified Scheme:
[http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_H...](http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours)

------
jwecker
He mentioned that llvm-py documentation was lacking... I was going through it,
seems pretty darn good to me (relatively).
<http://mdevan.nfshost.com/llvm-py/userguide.html>

------
ilkhd2
Impractical: 1) The compiler is gonna be very slow; 2) If you make a compiler
for a low-level language such as C, you need a precise correspondence between
the data types used by the CPU, the data types of the compiler's
implementation language, and the data types of the target language.

~~~
daeken
On the contrary. Compilers written in a high-level language are at a distinct
advantage: they're easier to optimize (significantly so), easier to debug, and
support forms that are difficult to deal with in a low(er)-level language like
C or C++.

I do compiler development every day, and I no longer touch low-level
compilers. Everything I write is Boo (for my OS), Ruby (for my startup -- by
far my favorite language for compiler dev, despite not liking it in general),
or Python. As a result, I have far more maintainable code than the majority,
and the code I output is incredibly well optimized.

Edit: To clarify, by 'support forms that are difficult to deal with ...' I
mean things like using a pure S-exp structure. Rather than a traditional
intermediary form, you can represent your compiler state as an S-exp and
iteratively optimize and compile it. Standardizing around a form like that,
when it's easy to deal with, greatly simplifies code. (This is actually the
reason Ruby is my compiler language of choice these days.)
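A minimal sketch of the S-exp approach described above, in Python rather than Ruby: the compiler state is just nested tuples, and a pass walks the tree rewriting nodes in place until nothing changes. The node names (`'add'`, `'mul'`) and the `fold_constants` helper are illustrative, not from any real compiler.

```python
def fold_constants(expr):
    """Recursively fold ('add', a, b) / ('mul', a, b) nodes whose
    operands are all integer literals."""
    if not isinstance(expr, tuple):
        return expr  # a literal or a variable name
    op, *args = expr
    args = [fold_constants(a) for a in args]
    if op == 'add' and all(isinstance(a, int) for a in args):
        return sum(args)
    if op == 'mul' and all(isinstance(a, int) for a in args):
        result = 1
        for a in args:
            result *= a
        return result
    return (op, *args)  # leave non-constant subtrees alone

tree = ('add', ('mul', 2, 3), ('add', 'x', ('mul', 4, 5)))
print(fold_constants(tree))  # ('add', 6, ('add', 'x', 20))
```

The nice property is that the input and output of every pass are the same S-exp shape, so passes compose and you can re-run them to a fixed point.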

~~~
ilkhd2
If you read what I said carefully, I never said that high-level languages are
bad - I said only this: 1) A compiler written in Python, Ruby, or another
"slow" language (do you not believe that Python and Ruby are slow?) is going
to take an eternity to compile the Linux kernel. 2) You need a _precise_
mapping between types in the compiler and the language you implement. It is so
important that gcc uses a software library for floating-point computation;
otherwise you are locked in to your CPU's FP implementation and cannot write
cross-compilers.

~~~
viraptor
> You need to have _precise_ mapping between types in compiler and the
> language you implement.

No. You don't need that. You can treat your data however you want until you
place it into the final binary. Before that, you can even represent all your
numbers as strings if you really want to. If you need constant-expression
evaluation, you just have to replicate the target machine's math operations in
software - no magic involved here. As long as you get the right result, no one
cares what you do internally.

If your compilation machine == target machine, you can even construct a
function that calculates the expression and actually run it to get the result.
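To make the "replicate the target machine's math in software" point concrete, here is a sketch of folding a constant addition with 32-bit two's-complement wraparound, even though the host language (Python) has arbitrary-precision integers. The helper name `target_add32` is made up for illustration.

```python
def target_add32(a, b):
    """Add two integers with the wraparound semantics of a 32-bit
    two's-complement target, regardless of host integer width."""
    result = (a + b) & 0xFFFFFFFF          # keep the low 32 bits
    # Reinterpret the bit pattern as a signed value.
    return result - 0x100000000 if result >= 0x80000000 else result

# Python itself would happily compute 2**31, but the target register
# overflows to INT_MIN:
print(target_add32(2**31 - 1, 1))  # -2147483648
```

The compiler's internal representation never has to match the target's; only the folded constant it emits does.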

~~~
ilkhd2
When you do optimization you need to perform intermediate calculations
_exactly_ the same way the target machine does. Even if the number is stored
as an ASCII or Unicode string, you need to do the computations strictly the
same way the target machine does; otherwise you'd break the programmer's
assumptions and produce non-reproducible code.
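This is easiest to see with floating point: folding a constant at the host's double precision can produce a different bit pattern than the target's 32-bit floats would. A sketch of the usual fix, rounding every intermediate result to target precision with `struct` (the helper `as_float32` is illustrative):

```python
import struct

def as_float32(x):
    """Round a Python float (IEEE double) to IEEE single precision."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

a, b = 0.1, 0.2
host_fold = a + b                                       # double precision
target_fold = as_float32(as_float32(a) + as_float32(b)) # target precision
print(host_fold == target_fold)  # False: the two foldings disagree
```

This is essentially why gcc carries a software floating-point library for cross-compilation: the fold must match the target, not the machine the compiler happens to run on.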

