Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: PlanckForth – Bootstrapping an interpreter from handwritten 1kb binary (github.com/nineties)
132 points by nineties on Dec 6, 2021 | hide | past | favorite | 19 comments



This project is to bootstrap a language processor from a hand-written ELF binary. This is a hobby project and is not intended for any practical use, but constructing a language processor completely from scratch is a lot of fun. Please try running, reading and playing with the code!

Bootstrapping proceeds as follows:

1. Implement a simple Forth interpreter (really) by handwriting ELF.

2. Grow it into a relatively expressive language through self-extension.

Since writing a program in machine language by hand is a tough work, the first interpreter is designed to be very simple. Every built-in word is single-letter and the interpreter just repeats that reads a character, looks it up from the dictionary and executes it. Also there is no error checking.

This is the actual code for the first interpreter, which is a 136-byte implementation of the interpreter followed by a built-in dictionary of 888 bytes.

https://github.com/nineties/planckforth/blob/main/planck.xxd

The first interpreter and language is so esoteric that, for example, the Hello World looks like this.

    $ ./planck
    kHtketkltkltkotk tkWtkotkrtkltkdtk!tk:k0-tQ
bootstrap.fs self-extends this. After running bootstrap.fs, the Hello World program can be written like this:

    $ ./planck < bootstrap.fs
    ." Hello World!" cr
At the beginning of bootstrap.fs, you can only use one-letter words, so the code is difficult to read as shown below.

    $ head bootstrap.fs
    h@l@h@!h@C+h!k1k0-h@$k:k0-h@k1k0-+$h@C+h!ih@!h@C+h!kefh@!h@C+h!l!
    h@l@h@!h@C+h!k1k0-h@$k h@k1k0-+$h@C+h!ih@!h@C+h!kefh@!h@C+h!l!
    
    h@l@ h@!h@C+h! k1k0-h@$ k\h@k1k0-+$ h@C+h!
        i       h@!h@C+h!
        kkf     h@!h@C+h!
        kLf     h@!h@C+h!
        k:k0-   h@!h@C+h!
        k=f     h@!h@C+h!
        kJf     h@!h@C+h!
However, at the end, you can write a normal Forth program as follows. In this way, the interpreter and the language are self-extended during the execution of the script.

    $ tail bootstrap.fs
            next-arg dup argv @ !
            included
        else
            ." Welcome to PlanckForth " version type
            ."  [" runtime type ." ]" cr
            copyright
            ." Type 'bye' to exit." cr
            s" /dev/tty" included
        then
    ; execute
Various functionalities like compile mode, immediate mode, control-flow structures, literals, variables, constants, file I/O, heap memory, etc. will be available after running bootstrap.fs.

Also, the language and bootstrap.fs are designed to be as environment-independent as possible, so you can implement the runtime in other languages as well. As an example, I added an implementation of the runtime in C and Python 3.

There is no space for detailed explanation here. So please read the following source code and comments.

https://github.com/nineties/planckforth/blob/main/bootstrap....

Also, there is a commentary article in Japanese here. I hope you can read it with google translate. (I will write it in English if I have time. someday.)

https://qiita.com/9_ties/items/349b2ed65b7cd8a7d580

Feedbacks and pull requests such as runtime implementations in other languages are very welcome.

Thank you for reading.


Sorry. I found a bug in the *Hello World* program I posted above. The Q operation requires an integer argument, an exit status.

So

    kHtketkltkltkotk tkWtkotkrtkltkdtk!tk:k0-tQ
is wrong. The below is correct.

    kHtketkltkltkotk tkWtkotkrtkltkdtk!tk:k0-tk0k0-Q


It would make a great addition to the Bootstrapping Wiki [1]!

[1] https://bootstrapping.miraheze.org/ (also a great trove of resources if you are into this kind of things)


pretty cool wiki


bootstrap.fs is a thing of beauty

https://github.com/nineties/planckforth/blob/main/bootstrap....

It starts off looking like line noise (the very simple interpreter defined in hex) and gradually turns into the forth we know and love.

Fantastic!


I love that the first thing added is support for comments, followed by a comment explaining that we just added comments and how it did so.

"Before I can explain anything, I have to implement the ability to explain things and then explain that I just added the ability to explain things."


It is most certainly inspired, in style, by JonesForth [1] which is a classic moment in the "Programming Pedagogy Hall of Fame."

[1] https://github.com/nornagon/jonesforth/blob/master/jonesfort...

[2] https://github.com/nornagon/jonesforth/blob/master/jonesfort...


This is so over my head I couldn't grasp it with a jetpack.


Don’t underestimate yourself. His interpreter just has to read a byte, look up the definition in a dictionary, and then jump to the appropriate subroutine. The one he wrote reads from stdin; if you were actually bootstrapping a new machine from scratch you would be writing it to the boot sector of a disk, so it would need to read input from later sectors of that disk. Not really difficult to arrange, though the details would differ on different types of computers. You would just have it read a fixed number of sectors, long enough to read in your bootstrap program, and then everything would be in memory from then on out.

It has a rather minimal set of commands in that dictionary to start with. The next thing you want to do is give it a program that adds new commands to the dictionary to make the language less annoying and more capable. This is in the bootstrap.fs file. The first two lines define a command called `\`, which reads and discards bytes from the program until it finds a newline byte. This allows the language to include comments. It also defines a command called ` ` which does nothing except allow the program to include spaces for formatting. :)

Commands are added by writing to memory to add new entries to the dictionary. The dictionary is a linked list, and the bootstrap program “allocates” memory by simply starting at the beginning of free memory and writing values consecutively. That’s good enough to extend the dictionary as long as it never goes backwards. I haven’t read the whole bootstrap program, but presumably it implements something more sophisticated than that. On the other hand, memory is cheap. You can do a lot even if you never deallocate anything.

Since the language is so simple, you can learn to decypher it pretty easily. You’ll pretty quickly start to notice elements that are frequently repeated, such as `h@!h@C+h!`, which stores the byte on the stack to the current address in memory, then moves the current address to point to the next location in memory. In C that would be `*ptr++ = foo`. These repeated bits are soon defined as new commands, so that the program gradually gets less repetitive.

My favorite trick is actually one he does to improve readability. The command to read input, `k`, reads a single byte as a signed integer. If you want to read in something like -5, you could include that as a single byte with the value 251 (or 0xFB). Instead, he writes it as `k0k5-`. This reads the byte `0`, which has value 48 (0x30), then the value 53 (0x35), and subtracts them. But _visually_, it looks almost exactly like `0-5` to the programmer which is indeed the same as negative 5. This is used in the `\` command to jump backwards by 5 instructions after every read that doesn’t find a newline, for example. On line 645 he finally gets around to defining a command that negates a number, on line 1153 he defines one to read in a number in either decimal or hexadecimal, and on line 1304 is the command that reads in a double–quoted string. After that he can start writing out numbers almost directly: `s" -5" >number` reads in a decimal -5, but now it can be trivially modified to contain any number you need.


Related – bootsector Forth https://github.com/NieDzejkob/miniforth


Reminds me of buzzard.2 from the 1992 ioccc contest.

http://www.ioccc.org/years.html#1992


Does the "metacircular" part just mean that the compiler can use definitions it's already compiled in the current pass?


No, it more or less means that a language is described in (part of) the language itself. And before you ask, no, it's not quite the same as self-hosting (good question though). The latter is more associated with compilers that can take a source file and produce another compiler. The former is often more like an interpreter that provides support for part of a language, but those part are enough to define support for the rest of the language.

So in this case, the base implementation starts with support for only a subset of the Forth language, picked for being either essential or much more performant when implemented "natively" in the binary. A subset of this subset is then used to construct support for the rest of the language. This is very typical for Forth- and Lisp-like languages.

[0] https://en.wikipedia.org/wiki/Meta-circular_evaluator

[1] https://en.wikipedia.org/wiki/Self-hosting_(compilers)


It would be more accurate to describe a metacompiler as a cross compiler to the machine an interpreted Forth is implemented on.

I recommend Moving Forth [1] for the full explanation. (That's the part that addresses metacompliers, the rest is still quite a great book.)

[1] https://www.bradrodriguez.com/papers/moving4.htm


Reminds me of Mes[1] from Boostrappable project.

[1] https://bootstrappable.org/projects/mes.html


@nineties, I'm curious why you included things like 'mul', 'divmod', 'and', 'or' in the initial instruction set? These can all be composed, so would make your initial ELF binary a bit smaller.


I’d bet it was just because those are easy; there is a machine instruction to compute them. Might as well use it and have one less subroutine to define later.


Thank you db48x. Yes, that's the reason. I added them to the first dictionary since these instructions are very simple even when writing in machine language.

In case of mod and divmod, implementing them with add, sub and loop has a serious performance issue. Replacing it with shift operations, bit operations, add, sub and loops could be an option.


For those who are interested there is also a very similar project called StoneKnifeForth




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: