One-pass Compiler (keleshev.com)
161 points by halst 2 days ago | 43 comments

Lua is an example of a programming language that still uses a one-pass compiler. It keeps the implementation small, which is useful in embedded contexts. Additionally, not constructing an AST helps when parsing very large files. Lua files are sometimes used to store data, sort of like JSON.

If anyone's interested in a simple single-pass compiler for a full language, you can also check out munificent's Crafting Interpreters[0]. It walks you through a Java-based AST interpreter and a C-based bytecode compiler.

[0] http://craftinginterpreters.com/


Reminds me of "Let's Build a Compiler"

https://compilers.iecc.com/crenshaw/ circa 1995

And a new C version https://github.com/lotabout/Let-s-build-a-compiler


There are a few trivial optimizations that you can do even in a one-pass compiler. See http://t3x.org/t3x/comp.html

Details are described in the book Write Your Own Compiler, http://t3x.org/t3x/book.html


Cool stuff! Register management is a pain to do. It's nice to be able to leave that to LLVM or treat x86 as a stack machine.

I wrote a compiler series exploring all three of these variations for a lisp dialect, with the compiler written in JavaScript.

https://notes.eatonphil.com/compiler-basics-an-x86-upgrade.h...


Tangential - in your work-in-progress compiler book https://keleshev.com/compiling-to-assembly-from-scratch-the-... are you going to write the lexer and parser yourself or use Flex and Bison like in the article?

In the book, lexing and parsing are done from scratch. I think tools like Flex and Bison are very handy, but for the book my focus is on the learning value.

I remember some of the old DOS compilers would generate similar code, using the stack as an actual expression evaluation stack when registers weren't sufficient. It's not a bad idea even now, since x86 has an internal cache of the top of the stack so operations on it are specifically optimised, and the opcodes are very small: push/pop reg is 1 byte, push s8 is 2.
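
For instance, a rough sketch of that style of code generation, emitting AT&T-syntax text for a little expression tree (the names and representation here are made up, not taken from any particular compiler):

    # Minimal sketch: compile a binary expression tree to x86-64,
    # using the hardware stack as the expression stack.
    def emit_expr(node):
        if isinstance(node, int):          # leaf: integer literal
            print(f"    push ${node}")
        else:
            op, left, right = node         # e.g. ("+", lhs, rhs)
            emit_expr(left)
            emit_expr(right)
            print("    pop %rbx")          # right operand
            print("    pop %rax")          # left operand
            if op == "+":
                print("    add %rbx, %rax")
            elif op == "-":
                print("    sub %rbx, %rax")
            print("    push %rax")         # result goes back on the stack

    emit_expr(("-", ("+", 4, 3), 2))       # (4 + 3) - 2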

I wrote something very similar recently, also a single-pass, but with a trivial internal representation:

https://github.com/skx/math-compiler

The real difference between the compiler in the example and mine is that I handle floating-point operations and also explicitly use printf to show the output. (Because otherwise your return value is limited to an 8-bit number.)


This is nice. The -debug flag is a cool feature. But if you first build a representation of the whole program in memory, then traverse that representation to generate code, it's not really single-pass.

Sorry, my comment was wrong; I should have said "not a single-pass", that's why I mentioned the trivial internal representation.

Too late to edit now!


Because otherwise your return value is limited to an 8-bit number

Only on Linux; Windows lets you use the full 32 bits.


That's true. Though I guess even there, there are limitations, because you can't return non-integers :)

I'm currently working on a 3-part blog series on writing a one-pass compiler in Python that emits C. It is quite a bit longer though because we make the lexer and parser from scratch. I started it because my students were having trouble with other tutorials because of all the jargon, theory, and libraries required.

http://web.eecs.utk.edu/~azh/blog/teenytinycompiler1.html


Good text, but let’s nitpick. I’m too lazy/confident/arrogant to verify, so let me potentially embarrass myself here.

    def getToken(self):
        self.skipWhitespace()
        self.skipComment()
How does that handle multiple consecutive comments? Whitespace following a comment?

(returning ‘comment’ and ‘whitespace’ tokens would fix this, and would make it possible to reuse the lexer for pretty-printing/syntax coloring)


It works because a comment by definition goes until a newline, so you can’t have consecutive comments without a newline token in between.

I’ll look into whether there’s a better way to do it based on your suggestion, thanks!
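
For instance, a single loop that keeps skipping until neither whitespace nor a comment comes next would handle any interleaving. A rough sketch (made-up helper names, not the tutorial's actual code):

    def skipTrivia(self):
        # Keep looping so any mix of whitespace and comments is consumed.
        while True:
            if self.curChar in (' ', '\t', '\r'):
                self.nextChar()
            elif self.curChar == '#':      # comment runs to the end of the line
                while self.curChar not in ('\n', '\0'):
                    self.nextChar()
            else:
                break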


I've done the same thing for the same reason but in JavaScript and compiling a lisp dialect.

https://notes.eatonphil.com/compiler-basics-lisp-to-assembly...



That's very well done, thanks for sharing. I really like the straightforward writing style. I've never attempted to write a compiler and have only done basic parsers for day-to-day tasks, so I like the clean, simple approach.

Very glad to hear it! Part 2 will be out in a few days and part 3 the next week.

I’ve been writing a one-pass optimising compiler for Wasm. It’s not perfect, but the code it outputs currently outperforms Cranelift's by a factor of 2-10x for many of the files in the Wasm specification testsuite.

Wow, that sounds incredible. Is that a .wasm -> .wasm compiler?

Is that due to SIMD or something? I got about 80% of LLVM performance when I was using Cranelift.

That was a short and epic read.

> This limited both the language features that were possible and the quality of the produced code.

What are the limitations?


Older versions of C had the restriction that variables could only be declared at the start of a block. Supposedly this was due to the original C compiler being one-pass. It's easier to keep track of the size of the stack frame if you see all declarations in one place. Though I think with a frame pointer you could make it work anyway. You have essentially the same complications if you allow programmers to open a block anywhere and declare variables there. I don't know if the very first versions of C allowed this.

Maybe more exotically, here is a legal Haskell program:

    compute = f x y
    x = 3
    y = 4
    f = (+)
Using typed identifiers before they are declared would not be possible in a single-pass compiler. You wouldn't know what code to emit for compute since you wouldn't even know its type until you have seen the definitions of x, y, and f.

Interestingly, this restriction does not apply to goto labels, which you can use first and define later: The compiler can just emit the label as written into the assembly code and let the assembler worry about patching up the jump target.


>Older versions of C had the restriction that variables could only be declared at the start of a block.

Pascal also has that restriction

Although Delphi does not have it anymore. But FreePascal still has it, even in Delphi-compatible mode. The FreePascal developers have also said they will keep that restriction to improve readability; the code could not be read anymore if variables were placed willy-nilly everywhere.


I’ve heard the readability justification for the C restriction, but it doesn’t make sense: this prevents ‘constification’ of variables (making a variable hold only one value).

I wonder if Delphi is still a one-pass compiler?

> Interestingly, this restriction does not apply to goto labels, which you can use first and define later: The compiler can just emit the label as written into the assembly code and let the assembler worry about patching up the jump target.

A one-pass compiler can manage this by maintaining a mapping of undefined labels to a list of goto statements that refer to them. Once the definition is located, unwind the list and fill in the jump values. Types are trickier because the size is unknown.
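
Roughly, that bookkeeping might look like this (a toy sketch with made-up names, assuming a flat list of opcodes and operands):

    # Sketch: backpatching forward goto references in emitted bytecode.
    code = []            # flat list of opcodes/operands
    label_addr = {}      # label name -> resolved position in `code`
    unresolved = {}      # label name -> operand slots still waiting for it

    def emit_goto(label):
        code.append("JMP")
        if label in label_addr:
            code.append(label_addr[label])     # target already known
        else:
            unresolved.setdefault(label, []).append(len(code))
            code.append(None)                  # placeholder, patched later

    def define_label(label):
        label_addr[label] = len(code)
        for slot in unresolved.pop(label, []): # unwind the list
            code[slot] = label_addr[label]     # and fill in the jump value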


There is no way to "fill in the jump values" if you have already emitted the goto to a place you can't modify later, like the featured article does by emitting its code with printf. I agree that you could emit code to an intermediate buffer, fix it up later, and still call this (a slightly more relaxed version of) one-pass compilation.

Fair point... kinda feels like passing the buck, but not having that memory overhead is nice

I'm actually working on a Pascal compiler right now. Register management has been a really tricky part for me, but I didn't even think of treating an x86 processor as a stack-based machine. Very clever, especially when you just want a simple compiler and not anything super optimized.

A simple optimization is to make a stack machine with the top of stack cached in a register. This can improve speed about 10% in general operation.
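
In generated code that roughly means the current top of stack lives in a register (say %rax) and only gets spilled when a second operand has to be computed. A made-up sketch for binary operators, not from any particular compiler:

    # Sketch: stack-machine codegen with the top of stack cached in %rax.
    def emit_expr(node):
        if isinstance(node, int):
            print(f"    mov ${node}, %rax")    # new top of stack straight into the register
        else:
            op, left, right = node
            emit_expr(left)
            print("    push %rax")             # spill old top before computing the right side
            emit_expr(right)
            print("    pop %rbx")              # left operand back from memory
            if op == "+":
                print("    add %rbx, %rax")    # result stays cached in %rax
            elif op == "-":
                print("    sub %rax, %rbx")    # rbx = left - right
                print("    mov %rbx, %rax")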

The author states:

>"An optimizing compiler would constant-fold our expression into a single number ahead of time."

Could someone say what is meant by constant-folding here?


As a concrete example I wrote a simple interpreter a while back, in golang. Operations would be compiled to a simple bytecode which would be executed at runtime. So:

     3 + 4
Would get "compiled" to:

     PUSH_INT 3
     PUSH_INT 4
     OP_PLUS
As you can imagine this was a virtual stack-machine. After a while I started folding these operations:

* If there was a "push int"

* And another "push int"

* And a maths operation.

* Then replace.

So in my case I'd first generate the naive version, then have a separate "optimise" pass which would rewrite the program:

     PUSH_INT 7
     NOP
     NOP
     NOP
     NOP
A later optimization pass would remove consecutive NOP operations. I made a brief blog-post here:

https://blog.steve.fi/adventures_optimizing_a_bytecode_based...

There were a couple of bugs found along the way, but the actual implementation of this optimization pass was pretty simple:

https://github.com/skx/evalfilter/pull/66/files

The biggest issue I had to face was that "1 + 3 => 4" is trivial, but my virtual machine only supports loading integers. So I couldn't collapse "2 * 3.3" into the constant "6.6".
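
The fold itself can be a simple scan over the generated ops. A simplified sketch of the idea (not the actual evalfilter code; it assumes PUSH_INT takes a single operand slot, hence three NOPs of padding rather than the four shown above):

    def fold_constants(ops):
        out = list(ops)
        i = 0
        while i + 4 < len(out):
            if (out[i] == "PUSH_INT" and out[i + 2] == "PUSH_INT"
                    and out[i + 4] == "OP_PLUS"):
                total = out[i + 1] + out[i + 3]
                # same length as before, so nothing after it moves
                out[i:i + 5] = ["PUSH_INT", total, "NOP", "NOP", "NOP"]
            i += 1
        return out

    print(fold_constants(["PUSH_INT", 3, "PUSH_INT", 4, "OP_PLUS"]))
    # ['PUSH_INT', 7, 'NOP', 'NOP', 'NOP']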


For anyone wondering, the NOPs are so that jumps don't get messed up. Assembly-level GOTOs generally work on byte/word offsets, so if something intended to jump back before the calculation, expecting two PUSH_INTs and an OP_PLUS (5 words: 3 opcodes, 2 arguments), but found only a single PUSH_INT (2 words), it would jump back 3 words too far.

Good clarification, and exactly right!

Later I do walk back over the bytecode and remove the nops. But I have to update all JMP/CALL instructions to cope with the changed destination offsets. Not a hard job in my case, as there are only a couple of instructions which refer to byte-offsets.

(If I allowed "LD A,[BC]", or similar permutations, I'd have a lot more work to do.)
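
Roughly, that cleanup step looks like this. A simplified sketch of the idea, with instructions as (op, arg) pairs and jump targets as instruction indices (the real bytecode uses byte offsets, and this isn't the actual code):

    def strip_nops(code):
        new_index, out = [], []
        for op, arg in code:
            new_index.append(len(out))         # where this instruction lands once NOPs are gone
            if op != "NOP":
                out.append([op, arg])
        for ins in out:
            if ins[0] in ("JMP", "CALL"):
                ins[1] = new_index[ins[1]]     # repoint at the survivor's new position
        return out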


Basically you have all the info you need at compile time, so you just do the work then rather than having to do it every time while the program is running.

Ex: 4 + 4 would be condensed down to just 8 at compile time rather than writing the code to get 4 and 4 both into registers and add them at execution time.


https://en.wikipedia.org/wiki/Constant_folding: "Constant folding is the process of recognizing and evaluating constant expressions at compile time rather than computing them at runtime. Terms in constant expressions are typically simple literals, such as the integer literal 2, but they may also be variables whose values are known at compile time. Consider the statement:

    i = 320 * 200 * 32;
Most compilers would not actually generate two multiply instructions and a store for this statement. Instead, they identify constructs such as these and substitute the computed values at compile time (in this case, 2,048,000)."

Say you have a simple expression within your code: 4+3*2. Rather than computing the result at run time, you can reduce (fold) that expression into 10, because it's a simple constant that only needs to be computed once.
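
A tiny sketch of what that looks like over an expression tree (made-up tuple representation):

    # Sketch: fold constant subtrees of ("op", left, right) tuples into numbers.
    def fold(node):
        if isinstance(node, tuple):
            op, left, right = node
            left, right = fold(left), fold(right)
            if isinstance(left, (int, float)) and isinstance(right, (int, float)):
                return {"+": left + right, "*": left * right}[op]
            return (op, left, right)
        return node

    print(fold(("+", 4, ("*", 3, 2))))         # -> 10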

A one-pass compiler is going to generate code that is slower than a modern JIT/interpreter. Why is a one-pass compiler interesting?

Because they're really fast. The lower tiers of modern JITs actually need that speed, so being able to do code generation in one pass there is a huge boon.

How so?

The F# compiler, say, is close to single-pass, I believe: it produces CIL, which goes to a JIT/AOT compiler later.



