
Bootstrapping a Forth in 40 lines of Lua code (2008) - tosh
http://angg.twu.net/miniforth-article.html
======
AQXt
If you want to understand Forth, I recommend this article:

A sometimes minimal FORTH compiler and tutorial for Linux / i386 systems by
Richard W.M. Jones

"LISP is the ultimate high-level language, and features from LISP are being
added every decade to the more common languages. But FORTH is in some ways the
ultimate in low level programming. Out of the box it lacks features like
dynamic memory management and even strings. In fact, at its primitive level it
lacks even basic concepts like IF-statements and loops.

Why then would you want to learn FORTH? There are several very good reasons.
First and foremost, FORTH is minimal. You really can write a complete FORTH
in, say, 2000 lines of code. I don't just mean a FORTH program, I mean a
complete FORTH operating system, environment and language. You could boot such
a FORTH on a bare PC and it would come up with a prompt where you could start
doing useful work. (...)

Secondly FORTH has a peculiar bootstrapping property. By that I mean that
after writing a little bit of assembly to talk to the hardware and implement a
few primitives, all the rest of the language and compiler is written in FORTH
itself. Remember I said before that FORTH lacked IF-statements and loops? Well
of course it doesn't really because such a language would be useless, but my
point was rather that IF-statements and loops are written in FORTH itself."

[https://github.com/nornagon/jonesforth/blob/master/jonesfort...](https://github.com/nornagon/jonesforth/blob/master/jonesforth.S)

~~~
kbr2000
Agreed; also, this book is golden: Threaded Interpretive Languages (R. G.
Loeliger)

~~~
LargoLasskhyfv
Since it seems to be out of print but is on the Internet Archive, I'll drop this:

[0]
[https://archive.org/stream/R.G.LoeligerThreadedInterpretiveL...](https://archive.org/stream/R.G.LoeligerThreadedInterpretiveLanguagesTheirDesignAndImplementationByteBooks1981)

------
dfischer
I’ve been in a “retro computing” deep dive lately and it’s been fascinating to
discover Forth. I was surprised that an entirely novel paradigm had stayed
hidden from me despite much digging over many years. All of a sudden I am
seeing it everywhere too - confirmation bias from growing up skill-wise in the
age of the dot-com boom vs right before?

I like the feeling of concatenative languages. I especially like postfix
notation over prefix, and in my opinion that is largely what sets up the Lisp
vs Forth divide at the first step.

I’ve recently dug in to Factor and it’s quite an amazing piece of art that is
highly usable and up to date despite such a small community. It almost feels
like a secret weapon and too good to be true. I’m having a ton of fun hacking
on it. It’s like Smalltalk and Forth had a baby.

I dream of making an OS with Forth/Factor similar to Plan9 and Oberon.

I get a kick out of Forth being made of words and Genesis from the Bible.

At first was the Word, and the Word was with Forth, and the Word was Forth.

[https://factorcode.org/](https://factorcode.org/)

[https://en.wikipedia.org/wiki/Oberon_(operating_system)](https://en.wikipedia.org/wiki/Oberon_\(operating_system\))

Tidbit if anyone cares: I’ve been obsessing over returning to first principles
of computing. I’m bored of the internet and browsers. I want to have fun with
software AND hardware and Forth seems like such a perfect language to do
exactly that with. Factor is a nice grown up example of that but it’s
definitely a few steps removed from portability and self bootstrapping
behavior like in CollapseOS.

Retro Forth is extremely impressive: [http://forth.works](http://forth.works)

~~~
pjmlp
A note on Oberon: don't focus on version 1 alone.

Oberon System 3 with its gadgets framework or the current AOS (still used as a
teaching tool at ETHZ) also have interesting ideas, and better support for
systems programming.

The book "The Oberon Companion. A Guide to Using and Programming Oberon System
3" is available as part of the System 3 environment delivered as part of AOS.

[http://cas.inf.ethz.ch/news/2](http://cas.inf.ethz.ch/news/2)

[http://builds.cas.inf.ethz.ch/](http://builds.cas.inf.ethz.ch/)

You can find some old ISOs here:
[https://github.com/cubranic/oberon-a2](https://github.com/cubranic/oberon-a2)

~~~
dfischer
Thank you. I’m incredibly interested in the UX paradigms demonstrated within
Oberon. I’d like to play with that and get some inspiration for a few ideas
I’ve been mulling about. Thanks!

~~~
pjmlp
See my blog article on Oberon; you will find screenshots of AOS there.

Also have a look at Mesa/Cedar, as it was the inspiration for Oberon ideas.

[https://news.ycombinator.com/item?id=24450970](https://news.ycombinator.com/item?id=24450970)

The comments include a link to a one-hour demo on YouTube with members of the team.

As an idea, many of these concepts can be done on modern OSes, on top of
something like COM/D-BUS/gRPC/XPC, but none of them go as far as they did.

------
arethuza
In the 1980s there was a home computer, the Jupiter Ace, that used Forth
rather than Basic:

[https://en.wikipedia.org/wiki/Jupiter_Ace](https://en.wikipedia.org/wiki/Jupiter_Ace)

Edit: I never used one (or indeed ever saw one) but for some reason reading
about Forth at the time gave me a lingering fascination with Forth type
languages - which was useful when I worked on a project for a few years that
used PostScript, C & Lisp ...

~~~
ume
Forth was available for the Acorn Atom (early 80s). Sadly, didn't have the
money or time to go down that route then. Had fun with its Basic though...

~~~
Wildgoose
I had the Forth for the Acorn Atom. It was great, a real mind-opener. Still
have it in a box upstairs!

------
kmill
Something I like about this article is how it helps reveal what Forth's
essence is:

Forth is an extensible virtual machine

Most interpreted languages that are implemented with a virtual machine
(sometimes known as a bytecode interpreter) have a fixed set of instructions.
Modulo some implementation details, writing new Forth words is akin to
extending the virtual machine with new instructions.

However, the article's Forth is a more extensible form of this. The entire
evaluation of the virtual machine is split into an extensible set of states
(starting with only interpret, compile, head, forth, lit), new types of
instructions can be introduced by adding to a list of heads (starting with
only DOCOL and EXIT), and then words are, specifically, lists of addresses
that the forth state interprets, whose evaluation is triggered by a DOCOL seen
in the head state.
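
A rough Python sketch of that structure may help (this is not the article's Lua code; every name below is an assumption made for illustration). States are plain functions, `heads` is the extensible table of head handlers, and a colon word is a DOCOL head followed by a thread of cells:

```python
mem = []                 # flat memory: heads, thread cells and literals
DS, RS = [], []          # data stack; return stack of thread positions
vm = {"state": None, "instr": None}

def define(*cells):
    """Lay down a DOCOL word in memory and return its address."""
    addr = len(mem)
    mem.extend(["DOCOL", *cells, "EXIT"])
    return addr

def forth_state():
    # Fetch the next cell of the current thread and step past it.
    vm["instr"] = mem[RS[-1]]
    RS[-1] += 1
    vm["state"] = head_state

def head_state():
    instr = vm["instr"]
    vm["state"] = forth_state            # default; handlers may override
    if isinstance(instr, int):           # address of a word: run its head
        heads[mem[instr]](instr)
    elif instr in heads:                 # a bare head such as EXIT
        heads[instr](instr)
    else:                                # a primitive
        prims[instr]()

def lit_state():
    # The cell after LIT is data, not an instruction.
    DS.append(mem[RS[-1]])
    RS[-1] += 1
    vm["state"] = forth_state

def docol(addr):
    RS.append(addr + 1)                  # start reading the word's thread

def exit_(_):
    RS.pop()                             # resume the caller's thread
    if not RS:
        vm["state"] = None               # nothing left to run

def set_state(fn):
    vm["state"] = fn

heads = {"DOCOL": docol, "EXIT": exit_}
prims = {
    "LIT": lambda: set_state(lit_state),
    "DUP": lambda: DS.append(DS[-1]),
    "*": lambda: DS.append(DS.pop() * DS.pop()),
}

def run(addr):
    heads[mem[addr]](addr)               # enter the word through its head
    vm["state"] = forth_state
    while vm["state"] is not None:
        vm["state"]()

SQUARE = define("DUP", "*")
MAIN = define("LIT", 5, SQUARE)
run(MAIN)
print(DS)  # [25]
```

In this sketch, a new kind of word would be one more entry in `heads`, and a new evaluation mode would be one more state function, without touching the run loop.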

This opens up possibilities to define other sorts of interpreters for specific
purposes. The article gives the example of polynomial evaluation, but you
could also do something like have a variant of the forth state for token
threaded code in memory-constrained environments. Sort of like how ARM has an
additional Thumb instruction set.

More detail for this last idea: you'd have an array of, say, 256 pointers, and
in the bforth state you'd read the next byte of the thread, look up the
address in the table, push it on the RS, then go into the head state. This
gives you perfect interoperability. You'd just need a special bcompiler state
that would look up the bytecode for each word.
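
A standalone sketch of the token-threading part, assuming a table capped at 256 entries so that every token fits in one byte. The names `register`, `bcompile` and `brun` are made up here, and the handoff to the RS and head state described above is left out to keep it short:

```python
table = []       # token (0..255) -> handler; plays the role of the pointer array
token_of = {}    # word name -> token, used only at compile time

def register(name, handler):
    """Give a word the next free one-byte token."""
    if len(table) >= 256:
        raise OverflowError("token table is full")
    token_of[name] = len(table)
    table.append(handler)

def bcompile(source):
    """Compile whitespace-separated words into a thread of one-byte tokens."""
    return bytes(token_of[word] for word in source.split())

def brun(thread, stack):
    """Read the thread one byte at a time and dispatch through the table."""
    for token in thread:
        table[token](stack)
    return stack

register("2", lambda s: s.append(2))
register("3", lambda s: s.append(3))
register("+", lambda s: s.append(s.pop() + s.pop()))
register("dup", lambda s: s.append(s[-1]))
register("*", lambda s: s.append(s.pop() * s.pop()))

thread = bcompile("2 3 + dup *")
print(len(thread), brun(thread, []))  # 5 [25]
```

Each call costs one byte in the thread instead of a full address, at the price of an extra table lookup per step.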

~~~
MaxBarraclough
> an extensible virtual machine

I've read similar ideas on Lisp. I don't put much stock in it in either
context, but I think Lisp has a better claim to extensibility. All programming
languages enable decomposition of a program. Functions/methods/macros/words
don't really 'extend' a language.

You can't easily add garbage collection to Forth. You can't easily add static
type-checking to Forth. Why call it 'extensible'? It's no more extensible than
C.

It may be easier to hack on a Forth implementation than a C compiler, sure.

~~~
frompdx
> Why call it 'extensible'? It's no more extensible than C.

Extensibility refers to the ability to extend the compiler directly in the
language. Lisp supports this with macros, which are not functions and are
different than macros in C because they alter the behavior of the compiler,
and Forth supports this with defining words and compiling words, which behave
differently from other forth words because they alter the behavior of the
compiler. Compare this to a macro in C, which pre-processes the macro into
more C prior to compilation rather than at the time of compilation. In this
respect, both Forth and Lisp are more extensible than C because facilities to
extend the language exist within the language itself.

~~~
MaxBarraclough
I think I follow, but it seems rather fuzzy. Compile-time functionality
doesn't strike me as really extending the language.

How about _BOOST_SCOPE_EXIT_ in C++? [0] Is it extending the language, or just
a trivial macro-based hack on RAII? What's the difference?

My other issue with it is that the major features of programming languages
that people actually care about, aren't the kinds of things you can implement
with clever macros. They tend to rely on serious engineering.

Even in a highly extensible language, you can't easily throw together features
like industry-strength garbage collection, or a type system, or borrow
checking. (I considered listing formal verification but that would be unfair,
it can't really be added to an existing language.) These things are
implemented by compiler engineers, and always will be. (If I'm mistaken about
that, and am underestimating anything, I'd be interested to know, but I don't
think I am.)

What's the 'killer example' of LISP macros? I can see a pretty neat example at
[1].

[0]
[https://www.boost.org/doc/libs/release/libs/scope_exit/doc/h...](https://www.boost.org/doc/libs/release/libs/scope_exit/doc/html/BOOST_SCOPE_EXIT.html)

[1] [https://docs.racket-lang.org/guide/pattern-macros.html#%28pa...](https://docs.racket-lang.org/guide/pattern-macros.html#%28part._pattern-macro-example%29)

~~~
kmill
You can add garbage collection to Forth, but for the trivial reason that Forth
is a low-level virtual machine to which you can add instructions/words for
managed objects, and then you can add some words to help you interact with
them more ergonomically -- words that run at compile time to change how code
is being compiled. I don't think it's right to think of Forth as a "language"
per se, although it's true there is a certain language that Forth programs
tend to be written in, the one people call reverse-polish and concatenative.

I can't say anything about adding a type system, because it's unclear what
sort of type system would even apply here. But at least in Common Lisp (which
incidentally already has a type system), I think the way you'd do it is create
your own front-end that interprets and checks all the typing/borrowing
information and then passes this to the usual compiler. There's a long
tradition in Lisps to define interpreters that extend the base language in
some way.

SBCL even lets you hook into the compiler, and I saw some article about
someone adding vectorization to SBCL using only user-level code.

Regarding BOOST_SCOPE_EXIT, it's yes or no depending on what extending the
language means to you. I think it's more useful to use a definition where the
answer is 'yes.' It's true that C macros are not a very powerful interface for
language extension, though.

Also, it seems like you're saying that "extensible" means "easily extended" or
"doesn't take serious engineering." It just means you can extend it (and, I'd
add, in a principled way). Writing good Lisp macros can take serious
engineering, but also there are other ways to extend Lisp.

In Lisp, you can (and do) define pattern match destructuring and lvalue
assignments entirely using macros. These are features that require a new
edition of a language standard, usually.

~~~
MaxBarraclough
> You can add garbage collection to Forth

Sure, but to do this properly you'd need to get your hands dirty with the
implementation. Garbage collection doesn't work well as a bolted-on library,
it needs to be handled in the language implementation. Forth is no different
from any other language in that regard.

> I can't say anything about adding a type system, because it's unclear what
> sort of type system would even apply here.

It's directly analogous to static type-checking in a 'normal' programming-
language, except we use the stack for accepting parameters and for returning
values.

It would ensure each word manipulates the stack in the expected way, ensuring
that the appropriate number of elements are consumed and pushed, and that each
element is treated as being the correct kind of data (type-checking should
fail if you attempt to dereference an integer).

This has been done in the _Kitten_ programming language. [0] Java bytecode
verifiers do something similar.
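
To make that concrete, here is a toy checker in Python. It is not Kitten's algorithm or a bytecode verifier; the two types, the word signatures and the `check` function are all assumptions for illustration. It walks a word sequence with a stack of types instead of values:

```python
# word -> (types consumed, types produced); rightmost entry is top of stack
SIGNATURES = {
    "lit":  ([], ["int"]),             # push an integer literal
    "here": ([], ["addr"]),            # push an address
    "+":    (["int", "int"], ["int"]),
    "@":    (["addr"], ["int"]),       # fetch: dereference an address
    "!":    (["int", "addr"], []),     # store a value at an address
}

def check(words, stack=()):
    """Return the resulting type stack, or raise TypeError on misuse."""
    types = list(stack)
    for word in words:
        consumed, produced = SIGNATURES[word]
        if len(types) < len(consumed):
            raise TypeError(f"{word}: stack underflow")
        split = len(types) - len(consumed)
        if types[split:] != consumed:
            raise TypeError(f"{word}: expected {consumed}, got {types[split:]}")
        types[split:] = produced       # replace inputs with outputs
    return types

print(check(["here", "@", "lit", "+"]))   # ['int']

try:
    check(["lit", "@"])                   # tries to dereference an integer
except TypeError as err:
    print("rejected:", err)
```

A polymorphic word like `dup` would need type variables, which this sketch leaves out.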

> SBCL even lets you hook into the compiler, and I saw some article about
> someone adding vectorization to SBCL using only user-level code.

Neat. I like the idea of advanced optimisations as libraries, sounds like a
good research topic.

> it seems like you're saying that "extensible" means "easily extended" or
> "doesn't take serious engineering." It just means you can extend it (and,
> I'd add, in a principled way)

Perhaps I seem dismissive, but it just doesn't seem to me like many serious
language features can be done properly in extensible languages.

The latest major features in mainstream programming languages are async/await
and borrow checking. I imagine it might be possible to implement async/await
using Lisp macros, but I really doubt the same goes for borrow checking.

> In Lisp, you can (and do) define pattern match destructuring and lvalue
> assignments entirely using macros.

That's pretty impressive, but it still seems that there are plenty of useful
language-extensions that can't practically be implemented in extensible
languages.

[0] [https://kittenlang.org/](https://kittenlang.org/)

~~~
kmill
You seem to be thinking about Forth-the-language, and I've tried to be clear
that Forth is referring to a way of programming where you extend the machine,
which seems to be what Chuck Moore has meant by Forth. Your program _is_ a
language implementation, defining a language in which you can describe the
program in the best way. There is no avoiding getting your hands dirty with
the implementation. No abstraction layers between your program and the
machine.

You can add garbage collection because the Forth program is a language
implementation. It probably won't look exactly like ANSI Forth in the end, but
that's (theoretically though potentially not in practice) ok.

> It's directly analogous to static type-checking in a 'normal' programming-
> language, except we use the stack for accepting parameters and for returning
> values.

I'm familiar with type systems for stack languages, but the issue is that this
only really could apply to a standardized Forth language. Forth is only
incidentally concatenative and stack-based --- this just happens to have both
a simple implementation and good compositional properties. There is
nothing stopping you from introducing new programming models in a Forth, and
the article gives a few examples. You can add arbitrary bytecode interpreters
to a Forth, for example, and easily make it interoperable with the threaded
interpreter in the base Forth. Nothing stopping you from adding words so your
Forth feels like a register machine, either. It's because of this that any
sort of type system seems doomed (other than one that describes state
transitions of the CPU...).

> Perhaps I seem dismissive ...

I sort of don't see the point of saying "it's not really extensible if I can't
extend it in all possible ways." In any case, Forth and Lisp are, for trivial
reasons, languages that let you embed arbitrary other languages within them
(it's about as exciting as how Turing completeness seems to be easy to meet --
which is to say, not very). Worst case, the cost is implementing a whole
compiler or interpreter for said language, but, still, it's possible. Common
Lisp gives you many hooks to change low level behaviors, so there is usually a
better way than this worst case.

Something like borrow checking, though, is a pervasive new feature. The issue
is that _everything_ needs to know about how ownership is transferred. It's no
different from, even in C, changing some basic struct the whole application
uses and then having to update everything to account for it. You could add
borrow checking to Common Lisp, but it would have to be in demarcated areas in
which borrow checking is being done, and, like Rust, you'd have to figure out
an 'unsafe' to be able to use all the Lisp (respectively, C and C++) that's
already out there.

> formal methods

Being able to bolt on embedded formal methods to a programming language is
active research. I think it's unreasonable to expect this of any language
right now, other than ones specifically designed for it :-) (Speaking as a
Lean user.)

Forth doesn't really seem to be the kind of thing where practitioners would
care about formal methods... It's very defensible to say, then, that Forth is
wrong because of this (we depend quite a lot on software being correct!). But,
I don't see anything about Forth that prevents you from defining words that
check formal specifications -- and I don't mean this in the trivial adding-a-
formally-checked-language-into-Forth way.

> extensibility

Anyway, I don't think it's good or bad to be extensible. It's just a property,
and Forth and Common Lisp happen to be examples of languages that are much
more extensible than usual. There are certainly engineering challenges either
way when it comes to extensibility. And, with an extensible language, while
you might be able to make bespoke solutions, you now have a bespoke (hopefully
small) language to maintain, too. Language design ability and programming
ability don't necessarily go hand-in-hand, either...

~~~
lxdesk
The kind of issue I see this thread getting at is what I call the "terrarium
problem": to make a whole software ecosystem, like a language environment or
API mechanisms bounded by a standard, one has to build a
sufficiently large terrarium for it and maintain that. The largest, most
flexible terrariums we use as developers tend to be operating system
standards, browser standards, Internet protocols, etc.; in the case of Forth
it's defined by the bootstrapping process, since the bootstrapping code can
present a Forth that is near the hardware, or one that is embedded alongside
another language environment like Lua. It is even relatively straightforward
to have a Forth that generates Lua source code, which the Lua interpreter can
then apply its own checks to and evaluate efficiently.

Extensibility by itself doesn't solve the terrarium problem because it doesn't
define any standard, so there isn't anything to build on. When presented with
extensibility, you still build the terrarium, it's just a customized one, and
if you do it ground-up like Forth, you can potentially build it smaller and
simpler. But in the end you still have a terrarium with assumed boundaries.

This has led me away from Forth-the-language in the last few months to explore
the terrarium problem further, and I hit on the idea of treating this as an
organizational issue solved at the level of the core UX. If the problem is
defining boundaries in software, we should have better ways of doing that. At
first I considered this in terms of selection - selection being one of the
pillars of structured programming, and many of our improvements taking the
form of easier selection metaphors. This gradually led me to explore the idea
of document editing with a binder and sticky notes metaphor, with a
supplemental compiler process taking the resulting complex, layered documents
and processing them into a linear form for consumption. The presumption is
that if I present a rich set of tools for defining types of divisions and
groupings defined skeuomorphically - pages, bins, tape, wire, guides, stickies,
overlays - and then add a bit of labelling and indirection on top, a powerful
"thinking tool" should emerge where the organization is easy and the
compilation system makes it easy to query and traverse it in a customized way.

If I can finish designing it.

------
gustinian
Yea! An informed, intelligent discussion about my now-favourite language. I've
been near fanatical about it for the last 3-4 years mainly using Mecrisp
Stellaris on the STM32F4xx. I find the elegant simplicity and raw power so
refreshing after all the bloatware and nanny-knows-best typed code. Even the
stack juggling is absorbing puzzle-solving fun. I've successfully built OLED
and MIDI drivers from scratch and am now tackling DSP SIMD assembly FFT
routines - stuff I would never normally attempt without this experience. It
harks back to the simpler days of the 1980s, yet it really is only limited
by one's approach - given some thought, you can adapt it to just about any
programming paradigm - should you actually need it, that is. I'm probably not
currently considered a full Forthwright, not having built myself a Forth
from scratch (yet). I know Forth is not perfect, but it's the closest I've
found and it deserves a renaissance. If it withered on the vine that would be
a travesty - it's too good for that.

------
mikewarot
I've always had an interest in Forth. It's a very powerful tool that I just
never got the hang of. When OS/2 was still young, and Microsoft hadn't killed
it off yet... I was told that you couldn't write assembler programs in
OS/2.... which I took as a challenge. Forth/2 was the result. Under OS/2 V2 it
generated native code, and actually got quite a few users once Brian
Matthewson wrote a manual for it.

These days, I wonder if it's possible to run Forth inside a VM as the
operating system. How hard would it be to stand up a networking stack?

~~~
JdeBP
Who told you that? The whole of Ed Iacobucci's _OS/2 Programmer's Guide_ was
in assembly language.

~~~
mikewarot
I could have used that book back then. That was life before Amazon and online
searches ruled everything. If it wasn't at Borders or Barnes and Noble, I
didn't know about it.

------
tzs
Here's a comment from a few years ago where I go over how to make a FORTH-like
language starting from a simple basic RPN calculator in C [1] and then adding
programmability (mostly implemented in the language itself with only a little
bit of C support), and finishing with some ways you could optimize it to
actually get decent performance.

This would be a reasonable afternoon project (except for the optimization
stuff).

None of the C code is complex or tricky or big. You could easily do it in
assembly instead. That's why these kinds of languages are great for a lot of
embedded applications. You only need to write a small assembly core to support
your language, plus assembly functions to support your hardware, and then you
can do everything else in your higher level FORTH-like language.
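
For a feel of the first two steps, here is a minimal sketch in Python rather than C. It is not the code from the linked comment, and the names are invented for illustration: an RPN calculator plus `: name ... ;` definitions, with the definition handling done in the host language instead of in the language itself.

```python
def _binary(op):
    """Wrap a two-argument function as a stack word (second operand on top)."""
    def word(stack):
        b = stack.pop()
        a = stack.pop()
        stack.append(op(a, b))
    return word

BUILTINS = {
    "+": _binary(lambda a, b: a + b),
    "-": _binary(lambda a, b: a - b),
    "*": _binary(lambda a, b: a * b),
    "dup": lambda stack: stack.append(stack[-1]),
}

def evaluate(source, stack, words):
    """Run whitespace-separated tokens; ': name body ;' defines a new word."""
    tokens = iter(source.split())
    for tok in tokens:
        if tok == ":":
            name = next(tokens)
            body = []
            for t in tokens:              # collect the body up to ';'
                if t == ";":
                    break
                body.append(t)
            words[name] = " ".join(body)  # user words are stored as source
        elif tok in words:
            entry = words[tok]
            if callable(entry):
                entry(stack)              # built-in word
            else:
                evaluate(entry, stack, words)  # user-defined word
        else:
            stack.append(int(tok))        # anything else must be a number
    return stack

words = dict(BUILTINS)
print(evaluate(": square dup * ; 3 square 4 square +", [], words))  # [25]
```

Storing user words as source text and re-evaluating them is the slow, simple option; compiling them to a list of references is the kind of optimization mentioned above.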

[1]
[https://news.ycombinator.com/item?id=13082825](https://news.ycombinator.com/item?id=13082825)

~~~
stevekemp
I hacked up a quick implementation of your writeup in Go just now. As I said
at the time, your summary was wonderfully direct and "obvious".

[https://github.com/skx/foth](https://github.com/skx/foth)

Thanks again, four years later!

I'll get the compilation phase done tomorrow, barring surprises.

------
codetrotter
An interesting curiosity about Forth is the fact that the final stage of
FreeBSD kernel bootstrapping, the loader program, contains a Forth
interpreter. If you have a look in the /boot directory on a FreeBSD system,
you will see quite a few files named with the extension "4th" there, and in
the loader(8) man page we find among other things the following:

> loader [...] provides a scripting language that can be used to automate
> tasks, do pre-configuration or assist in recovery procedures. This scripting
> language is roughly divided in two main components. [...] The bigger
> component is an ANS Forth compatible Forth interpreter based on FICL, by
> John Sadler.

> [...]

> BUILTINS AND FORTH

> All builtin words are state-smart, immediate words. If interpreted, they
> behave exactly as described previously. If they are compiled, though, they
> extract their arguments from the stack instead of the command line.

> If compiled, the builtin words expect to find, at execution time, the
> following parameters on the stack:

> addrN lenN ... addr2 len2 addr1 len1 N

> where addrX lenX are strings which will compose the command line that will
> be parsed into the builtin's arguments. Internally, these strings are
> concatenated in from 1 to N, with a space put between each one.

> If no arguments are passed, a 0 must be passed, even if the builtin accepts
> no arguments.

> While this behavior has benefits, it has its trade-offs. If the execution
> token of a builtin is acquired (through ' or [']), and then passed to catch
> or execute, the builtin behavior will depend on the system state at the time
> catch or execute is processed! This is particularly annoying for programs
> that want or need to handle exceptions. In this case, the use of a proxy is
> recommended. For example:

> : (boot) boot;

> FICL

> FICL is a Forth interpreter written in C, in the form of a forth virtual
> machine library that can be called by C functions and vice versa.

> In loader, each line read interactively is then fed to FICL, which may call
> loader back to execute the builtin words. The builtin include will also feed
> FICL, one line at a time.

> The words available to FICL can be classified into four groups. The ANS
> Forth standard words, extra FICL words, extra FreeBSD words, and the builtin
> commands; the latter were already described. The ANS Forth standard words
> are listed in the STANDARDS section. The words falling in the two other
> groups are described in the following subsections.

[https://www.freebsd.org/cgi/man.cgi?query=loader&apropos=0&s...](https://www.freebsd.org/cgi/man.cgi?query=loader&apropos=0&sektion=8&manpath=FreeBSD+12.1-RELEASE+and+Ports&arch=default&format=html)
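
The `addrN lenN ... addr2 len2 addr1 len1 N` convention quoted above can be modeled roughly like this in Python. This is only my reading of the man page text, not loader's actual code; the flat `memory` buffer and the function name are assumptions:

```python
def build_command_line(memory, stack):
    """Pop N, then N (addr, len) string pairs, and join strings 1..N with spaces."""
    n = stack.pop()                      # N is on top of the stack
    parts = []
    for _ in range(n):
        length = stack.pop()             # len1 first, since it sits just below N
        addr = stack.pop()
        parts.append(memory[addr:addr + length].decode("ascii"))
    return " ".join(parts)

memory = b"load kernel"
# addr2 len2 addr1 len1 N, with the top of the stack on the right
stack = [5, 6, 0, 4, 2]
print(build_command_line(memory, stack))   # load kernel

# A builtin called with no arguments still needs the 0 count on the stack.
print(repr(build_command_line(memory, [0])))  # ''
```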

~~~
yjftsjthsd-h
Forth seems to have made its way into a lot of boot environments; it was also
used for Open Firmware.

------
caramelsuit
noice!

