
Creating a language using only assembly language - nineties
https://speakerdeck.com/nineties/creating-a-language-using-only-assembly-language
======
cbasoftware
Ok. I'll toot my own horn.

In ~1983 I wrote the first iteration of Sensible Solutions for O'Hanlon
Computer Systems using MASM. Of course, it had to run in less than 64k.
Basically, it was a VM that would run pseudo code. And, the compiler was
written in ASM also.

In 1985 I wrote TAS, again using MASM. Again, a VM. The language it compiled
(in both cases) was written to create business accounting applications, which
I also wrote. Originally I called them Level 1, 2 and 3 where Level 2 was
basic bookkeeping and Level 3 was a full blown Accounting package including
invoicing, purchasing, etc.

Unfortunately, this was during the time of dBase and I got clobbered in
Infoworld for not being like dBase. So, I wrote a compiler for dBase and
called it dBFast. Sold a lot of copies through Egghead and others.

In all three of these we made a ton of money and, even more, created a whole
group of people who became "pseudo programmers." I say pseudo because they
could program in TAS, not necessarily in anything else.

I was very proud of what I did and hope there is one more language in me to
make programming on the web significantly easier than it is now.

If you've never programmed in assembler, I recommend it. Just try
something simple. Until you do, you won't appreciate what we have today.
Especially if you're running on CP/M or early DOS!

~~~
macjohnmcc
I used to do some assembly on 68000, 6502 and some Z80. It's nice to have
those simpler CPUs and the simpler hardware to learn an assembly language on.

~~~
protomyth
I did 6502 in high school from a book and later did 6809 (EE class) and 8088
in college. The funny part was our compiler target for the CSci Compiler
Course was an IBM 370 on which they taught the required assembler course. Of
that group I liked the 6809 best. The 370 was ok and it sure had a lot more
registers. We didn't write our compilers in assembly, but instead used the
department's chosen language: Modula-2. It was very odd translating from the
dragon book to Modula-2.

~~~
friendzis
NB: Just a comment, could not (yet) watch the video. Our course was very
simple with 8051 compiled with IAR, IIRC. I hated the experience: quite a lot
of logic is backwards and you have to come up with something like an ABI in
order not to make a huge mess.

Anyway, this actually skyrocketed my understanding of C: C is nothing more
than (rather thin, I would say) wrappers around assembly, which in turn is
wrappers around machine instructions. That's why C is fast. That's why you
cannot have first-class functions, return multiple (compile time unknown)
values, etc.

But nothing beats spending 10+ hours debugging a simple I2C (or SPI, can't
remember) bare-metal ARM program, first weeding out vendor libs, then actually
diving into assembly, only to find out that the CPU is buggy.

~~~
jwdunne
This is similar to what I experienced, sort of like an enlightenment moment.
In C, I see a thin veneer over assembly, but that veneer has been designed to
look thicker than it actually is. I came to appreciate the abstraction and now
understand why it's so long lived.

------
ORioN63
Step 1: Make a language high-level enough to build a parser without being a
masochist.

Step 2: Make a lisp.

Step 3: Do it all in lisp.

Step 4: Profit.

Even Amber
([https://github.com/nineties/amber](https://github.com/nineties/amber))
has a syntax that looks a lot like the lisp concept of grammar. But instead of
(s expressions) you've got something very similar to M[expressions]
([https://en.wikipedia.org/wiki/M-expression](https://en.wikipedia.org/wiki/M-expression)).

It's amazing what you can do with Lisp.

~~~
WildUtah
Parsing LISP in assembly isn't even so hard. Without a lot of syntactic
symbols, you're just finding whitespace and parens. Even syntactic sugar like
' for quote and , or @ in macros are pretty simple to parse compared to C or
-- code gods forbid -- C++.

What I'd dread is writing the garbage collector in assembly. Maybe you could
rely on reference counting, but that could still be a mess.
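
To illustrate how little a Lisp reader needs, here is a minimal sketch in
Python (not assembly, for brevity). The names `tokenize` and `parse` are
hypothetical, only the `'` quote sugar is handled, and unbalanced input is not
reported gracefully.

```python
def tokenize(src):
    """Split Lisp source into tokens: parens, quote marks, and atoms."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch in "()'":
            tokens.append(ch)
            i += 1
        else:
            # An atom runs until the next whitespace or paren.
            start = i
            while i < len(src) and not src[i].isspace() and src[i] not in "()":
                i += 1
            tokens.append(src[start:i])
    return tokens


def parse(tokens, pos=0):
    """Return (expression, next_position) for the form starting at pos."""
    tok = tokens[pos]
    if tok == "(":
        items = []
        pos += 1
        while tokens[pos] != ")":
            item, pos = parse(tokens, pos)
            items.append(item)
        return items, pos + 1  # skip the closing paren
    if tok == "'":
        # 'x is sugar for (quote x)
        quoted, pos = parse(tokens, pos + 1)
        return ["quote", quoted], pos
    return tok, pos + 1  # a bare atom


expr, _ = parse(tokenize("(car '(a b))"))
print(expr)  # ['car', ['quote', ['a', 'b']]]
```

The whole reader is two loops and one recursive call, which is roughly the
amount of control flow an assembly version would need as well.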

~~~
Arelius
> What I'd dread is writing the garbage collector in assembly.

Then don't. If you're just building a lisp just so you can bootstrap your lisp
compiler in lisp, the free-everything-and-quit should be a good enough GC
strategy, and quite quick too.
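
A minimal sketch of what "never free anything" can look like, assuming a fixed
pool of cons cells handed out by a bump pointer. The `Arena` class and its
field names are hypothetical, and Python stands in for assembly here.

```python
class Arena:
    """Fixed pool of cons cells handed out by a bump pointer; nothing is freed."""

    def __init__(self, capacity):
        self.car = [None] * capacity
        self.cdr = [None] * capacity
        self.next_free = 0  # index of the next unused cell

    def cons(self, a, d):
        """Allocate one cell and return its index."""
        if self.next_free == len(self.car):
            raise MemoryError("arena exhausted")
        idx = self.next_free
        self.car[idx] = a
        self.cdr[idx] = d
        self.next_free += 1  # the only bookkeeping allocation ever does
        return idx


arena = Arena(capacity=1024)
tail = arena.cons(2, None)   # the list (2)
head = arena.cons(1, tail)   # the list (1 2)
```

Allocation is an index increment and a bounds check, and "collection" is the
process exiting, so the approach only works when the whole bootstrap run fits
in the pool.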

~~~
geon
Especially with 16 GiB of ram...

------
vidarh
I love this. My first (toy) compiler was bootstrapped a similar way, though
much more ad-hoc (I didn't write a formal grammar until I was well into it).
Back then a lot of compilers were written in assembler, so it wasn't such an
unusual starting point - I assume I was drawing inspiration from e.g. PDQ (a
public domain Pascal implementation for the Amiga) and others that I think I
would have seen first.

I started basically by having the compiler do _very_ basic parsing of M68k
assembler and just pass everything that looked like assembler straight
through.

Then I added support for defining functions, but all the content of functions
was still assembler. Then I started adding expression support and things like
local variables and types.

I wish I still had the source (lost in a move, long after I stopped doing
anything on it) - one of the fun things about it was that I kept the ability
to intersperse assembler instructions everywhere - they were treated as normal
statements. And the M68k registers were first class variables in the language
that could occur in any expression. So e.g. you could write "D0.w = D1.w + a",
where D0.w would refer to the low 16 bits of the D0 register, and D1.w the low
16 bits of the D1 register, and "a" would be a variable allocated on the
stack.
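
As a rough sketch of what such a statement computes, here is the arithmetic in
Python. It assumes the language followed the usual M68k word-size behaviour
(the result wraps at 16 bits and the upper half of D0 is left untouched); the
helper names are hypothetical and the original compiler may have differed.

```python
MASK16 = 0xFFFF


def low_word(reg):
    """Read the low 16 bits of a 32-bit register value (the .w view)."""
    return reg & MASK16


def set_low_word(reg, value):
    """Replace the low 16 bits of reg, keeping its high 16 bits."""
    return (reg & 0xFFFF0000) | (value & MASK16)


d0 = 0x12345678
d1 = 0x0000FFFF
a = 2  # stands in for the stack-allocated variable

# D0.w = D1.w + a
d0 = set_low_word(d0, low_word(d1) + a)
print(hex(d0))  # 0x12340001
```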

Basically the features I added were largely guided by what seemed like it'd
let me shave more lines off the compiler itself...

You learn a lot about the language you're writing when you bootstrap from asm
or try to condense everything into a tiny core - a lot of dependencies in the
language that are non-obvious become a lot clearer.

~~~
niklasni1
> one of the fun things about it was that I kept the ability to intersperse
> assembler instructions everywhere - they were treated as normal statements.
> And the M68k registers were first class variables in the language that could
> occur in any expression.

That sounds like so much horror and so much fun at the same time.

~~~
pjc50
BBC Micro Basic let you interpose assembler in the middle of the program -
although it was a two-pass assembler and you had to call both passes
separately with a small FOR loop if you wanted labels to work.
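
The reason two passes are needed can be sketched with a toy assembler in
Python. The instruction names, the one-word-per-instruction layout, and the
`assemble` function are all assumptions made for illustration, not BBC BASIC's
actual behaviour.

```python
def assemble(lines):
    """Walk the source twice: pass 0 collects labels, pass 1 emits final code."""
    labels = {}
    output = []
    for pass_no in range(2):  # the "small FOR loop"
        address = 0
        output = []
        for line in lines:
            line = line.strip()
            if not line:
                continue
            if line.endswith(":"):
                labels[line[:-1]] = address  # label marks the current address
                continue
            op, _, arg = line.partition(" ")
            arg = arg.strip()
            if arg in labels:
                operand = labels[arg]
            elif arg.isdigit():
                operand = int(arg)
            elif arg and pass_no == 1:
                raise ValueError(f"undefined label: {arg}")
            else:
                operand = None  # no operand, or a forward reference on pass 0
            output.append((address, op, operand))
            address += 1
    return output


program = ["JMP end", "start:", "NOP", "JMP start", "end:", "RTS"]
for row in assemble(program):
    print(row)
# (0, 'JMP', 3)
# (1, 'NOP', None)
# (2, 'JMP', 1)
# (3, 'RTS', None)
```

On the first walk `JMP end` cannot be resolved because `end` has not been seen
yet; the second walk reuses the label table built by the first.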

~~~
bbcbasic
I remember that! I wrote a short assembler program on a BBC to control a
milling machine.

------
tptacek
This is the coolest thing I've seen posted on HN in a while.

The title might not do it justice; the point is: there's very little assembly
language involved, if you're careful.

~~~
agumonkey
I smell bootstrap. #onto-reading

ps: I remember an article with a similar structure but using lambda calculus.
Starting with a very poor subset and gradually expanding features... Too bad I
can't find it again.

~~~
baldfat
Is it this project?
[http://matt.might.net/articles/implementing-a-programming-la...](http://matt.might.net/articles/implementing-a-programming-language/)

~~~
agumonkey
Nope, I remember numbered evaluators like lc-0 lc-1. Thanks anyway.

~~~
Jtsummers
Possibly this: [https://github.com/zlizta/LC](https://github.com/zlizta/LC)?

~~~
agumonkey
Arf, it was a blog post, and IIRC in sexp syntax. Good find anyway.

------
Jtsummers
Very neat. If you find this interesting you may also like jonesforth, another
project developing a language starting from assembly.

[http://git.annexia.org/?p=jonesforth.git;a=summary](http://git.annexia.org/?p=jonesforth.git;a=summary)

~~~
vidarh
I second that; Forth is extremely well suited to bootstrap from asm too.

------
jflatow
This is wonderful and impressive.

For others curious how long it took: judging from the first commit on GitHub
([https://github.com/nineties/amber/commit/bdb49d27968d6bc4588...](https://github.com/nineties/amber/commit/bdb49d27968d6bc458827394d4a9d6779adcb9ba)),
which references a first commit from 2009, it seems at least 6 years.

------
nickpsecurity
I love seeing this because it matches my recommendation for redoing the stack
post-Snowden. I gave two options: (a) Wirth-style [1] with assembler -> high-
level assembler -> Modula-2-like language -> safe Oberon-like language -> 4GL-
like batteries included language; (b) VLISP-like [2] setup with assembler ->
high-level assembler -> LISP interpreter -> PreScheme compiler -> integrated
PreScheme/LISP/assembler system -> AOT or JIT compiler for full LISP.

This is kind of like a mix between the two. I like how the author illustrates
each step well. The best illustration is showing how easily the core language
can transform into a mainstream-grade language with extensible syntax and
macros. A strength worth copying in any new language albeit with guidelines
on proper use. I bet it was all pretty fun, too.

[1]
[http://www.cfbsoftware.com/modula2/Lilith.pdf](http://www.cfbsoftware.com/modula2/Lilith.pdf)

[2]
[http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=E0F...](http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=E0F133C9131F36924AD1BD7A9E806B5A?doi=10.1.1.36.9989&rep=rep1&type=pdf)

------
k1w1
Speaking of languages in assembly. Here is a Java virtual machine in assembly
language:

[https://github.com/k1w1/javelin-stamp/blob/master/asm/javeli...](https://github.com/k1w1/javelin-stamp/blob/master/asm/javelin.src)

Until I looked at it again now I had forgotten my favorite instruction, a nop
to avoid a bug in the silicon:

[https://github.com/k1w1/javelin-stamp/blob/master/asm/javeli...](https://github.com/k1w1/javelin-stamp/blob/master/asm/javelin.src#L2291)

~~~
jacquesm
That's for the SX microcontroller.

------
Galanwe
Really refreshing article. Huge kudos for that great achievement! Seems like
you had a lot of fun.

------
jhallenworld
I wrote BASIC in VAX assembly as a freshman. It compiled each line to byte-
code (like Atari BASIC), and used a hand-written recursive descent parser. I
remember using the VAX SKPC and SPANC instructions when dealing with strings.
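
For readers who have not seen the technique, here is a minimal sketch of a
hand-written recursive descent parser that emits byte-code for a stack
machine. It is in Python, handles only integer arithmetic, and the opcode
names and the `compile_expr` / `run` functions are hypothetical; it is not a
reconstruction of the VAX program. Characters the tokenizer does not
recognise are silently skipped.

```python
import re


def compile_expr(src):
    """Compile an arithmetic expression to a list of stack-machine ops."""
    tokens = re.findall(r"\d+|[-+*/()]", src)
    code = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():  # factor := NUMBER | "(" expr ")"
        tok = advance()
        if tok == "(":
            expr()
            if advance() != ")":
                raise SyntaxError("expected )")
        elif tok.isdigit():
            code.append(("PUSH", int(tok)))
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():  # term := factor (("*" | "/") factor)*
        factor()
        while peek() in ("*", "/"):
            op = advance()
            factor()
            code.append(("MUL",) if op == "*" else ("DIV",))

    def expr():  # expr := term (("+" | "-") term)*
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            code.append(("ADD",) if op == "+" else ("SUB",))

    expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return code


def run(code):
    """Execute the byte-code on a simple operand stack."""
    stack = []
    for ins in code:
        if ins[0] == "PUSH":
            stack.append(ins[1])
            continue
        b = stack.pop()
        a = stack.pop()
        if ins[0] == "ADD":
            stack.append(a + b)
        elif ins[0] == "SUB":
            stack.append(a - b)
        elif ins[0] == "MUL":
            stack.append(a * b)
        else:  # DIV, integer division
            stack.append(a // b)
    return stack.pop()


print(compile_expr("1+2*3"))
# [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('MUL',), ('ADD',)]
print(run(compile_expr("(1+2)*3")))  # 9
```

Each grammar rule becomes one function, and operator precedence falls out of
which function calls which.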

Sadly my only record of it was a fading fan-fold listing which is now gone.

Looking at the presentation: I really should have learned LISP earlier than I
did.

~~~
sklogic
Programming in MACRO-32 was more like using a high-level language, did not
feel like an assembly at all.

------
kevin_thibedeau
I recently needed to make a cross platform assembler for a niche 8-bit micro
that only has DOS tooling. After cloning it in a small bit of Python I was
looking for a simple way to extend it with macros. It turns out that m4 works
really well as an asm macro language and I now have advanced facilities like a
code generating expression evaluator and high level control flow statements
without any significant change to the core assembler. It blows the doors off
what contemporary assemblers (nasm, masm, tasm) can do and hearkens back to
the high point of high level assemblers when they peaked in the 70s.

------
kabdib
In the late 70s and early 80s the BASIC interpreters in personal computers
were awesome exercises in assembly language.

There's a fantastic description of How Atari BASIC works. It's pretty
sophisticated for something that doesn't really JIT, and the code fits in
8-10K. Link:
[http://users.telenet.be/kim1-6502/6502/p1.html](http://users.telenet.be/kim1-6502/6502/p1.html)

I was saddened by the lack of performance and sophistication of BASIC
implementations on later computers. By comparison they were pedestrian, slow,
buggy and harder to use.

(I haven't used BASIC since 1981, with the exception of a stint of Visual
Basic in the mid 90s that was . . . eye-opening and really quite positive).

------
Nate75Sanders
Awesome post.

I like the assumption that he didn't care that his "rlci" lisp interpreter
didn't garbage collect _at all_ because it was just a way to get to the next
step and he knew it was going to have enough RAM to do so.

------
diginux
I wrote my own Lisp compiler in college using this as the foundation:
[http://schemeworkshop.org/2006/11-ghuloum.pdf](http://schemeworkshop.org/2006/11-ghuloum.pdf)

~~~
AlexeyBrin
Thanks for this. Do you have a link to the extended version of the article
mentioned at the end of the article? The provided link is dead now.

~~~
diginux
Unfortunately, I do not. I would try contacting the author, you never know.

------
smitec
Does anyone have any good resource for someone looking to try this sort of
thing? Preferably aimed at someone with High Level + C/ASM knowledge looking
to apply it in this direction.

------
tluyben2
I'm implementing a Forth-like language in Z80 assembler. I implemented a Basic
when I was younger (... +-30 years ago) and I must say it's quite painful to
do... Even with an emulator, which makes crashing painless, it feels harder
than I remember it being back in the day.

------
sudeepj
I am trying to learn assembly myself (for fun) and really appreciate this.
Cool stuff!

------
vmorgulis
This is really great.

I agree with comments about Forth.

Among projects of this kind, there is also Urbit
([https://github.com/urbit](https://github.com/urbit)).

~~~
iyn
This is the first time I've seen Urbit; it looks interesting. Do you have some
context about the project? The project homepage
([http://urbit.org/](http://urbit.org/)) doesn't seem to explain anything and
the GitHub page just describes the language.

~~~
vmorgulis
The author is Curtis Yarvin
([https://vimeo.com/75312418](https://vimeo.com/75312418)).

You can search also for "Moldbug". He has an old blog.

He is also a very controversial guy with libertarian ideas.

His project is very neat with distributed processes called "ships" or
"submarines" with rights based on reputation.

It's a little bit like Erlang processes but with an anti-spam security model.

~~~
graycoder
Had a few subs a while back myself... :)

------
faragon
That's amazing. Doing the first step in assembly instead of C is quite
overkill, in my opinion, but the whole thing is impressive. Kudos.

------
tankfeeder
Picolisp's author created his own assembler to implement the 64-bit version of
picolisp.

------
zinkem
This should be a yawn for those with web and mobile dev experience, people I'd
expect to already understand virtual machines and fractured encodings.

