Hacker News
Creating a language using only assembly language (speakerdeck.com)
347 points by nineties on June 11, 2015 | 65 comments

Ok. I'll toot my own horn.

In ~1983 I wrote the first iteration of Sensible Solutions for O'Hanlon Computer Systems using MASM. Of course, it had to run in less than 64k. Basically, it was a VM that would run pseudo code. And, the compiler was written in ASM also.

In 1985 I wrote TAS, again using MASM. Again, a VM. The language it compiled (in both cases) was written to create business accounting applications, which I also wrote. Originally I called them Level 1, 2 and 3 where Level 2 was basic bookkeeping and Level 3 was a full blown Accounting package including invoicing, purchasing, etc.

Unfortunately, this was during the time of dBase and I got clobbered in Infoworld for not being like dBase. So, I wrote a compiler for dBase and called it dBFast. Sold a lot of copies through Egghead and others.

In all three of these we made a ton of money and, even more, created a whole group of people who became "pseudo programmers." I say pseudo because they could program in TAS, not necessarily in anything else.

I was very proud of what I did and hope there is one more language in me to make programming on the web significantly easier than it is now.

If you've never programmed in assembler I recommend it to anyone. Just try something simple. Until you do, you won't appreciate what we have today. Especially if you're running on CP/M or early DOS!

> If you've never programmed in assembler I recommend it to anyone. Just try something simple. Until you do, you won't appreciate what we have today. Especially if you're running on CP/M or early DOS!

I found it very pleasant to write a clone of Wozniak's machine code for my toy CPU/computer.

I used to do some assembly on 68000, 6502 and some Z80. It's nice to have those simpler CPUs and the simpler hardware to learn an assembly language on.

I did 6502 in high school from a book and later did 6809 (EE class) and 8088 in college. The funny part was our compiler target for the CSci Compiler Course was an IBM 370 on which they taught the required assembler course. Of that group I liked the 6809 best. The 370 was ok and it sure had a lot more registers. We didn't write our compilers in assembly, but instead used the department's chosen language: Modula-2. It was very odd translating from the dragon book to Modula-2.

NB: Just a comment; I could not watch the video (yet). Our course was very simple: an 8051, compiled with IAR, IIRC. I hated the experience: quite a lot of the logic is backwards, and you have to come up with something like an ABI in order not to make a huge mess.

Anyway, this actually skyrocketed my understanding of C: C is nothing more than a (rather thin, I would say) wrapper around assembly, which in turn is a wrapper around machine instructions. That's why C is fast. That's why you cannot have first-class functions, return multiple (compile-time unknown) values, etc.

But nothing beats spending 10+ hours debugging a simple I2C (or SPI, can't remember) bare-metal ARM program, first weeding out the vendor libs, then actually diving into assembly, only to find out that the CPU is buggy.

This is similar to what I experienced, sort of like an enlightenment moment. In C, I see a thin veneer over assembly, but that veneer has been designed to look thicker than it actually is. I came to appreciate the abstraction and now understand why it's so long lived.

Argh! Modula-2, that brings back memories. They were still teaching it as an introductory language when I went to uni in 1997. The language seemed a tad irrelevant.

It wasn't bad and did well to teach a lot of concepts, but it being on an IBM 370 made for some pain. I will say XEDIT did have some ok features but was a painful experience overall.

I've always had a soft spot for assembler code. I've been writing hobby kernels for almost a decade now, à la Plan 9. I haven't ever really taken any of them far enough to be useful, but it's a rich journey of discovery. I love discovery.

Now there is a language I haven't heard about in a very long time. I actually briefly worked for a company that developed a whole system with TAS, that they then sold to optometrists.

Step 1: Make a language high-level enough to build a parser without being a masochist.

Step 2: Make a lisp.

Step 3: Do it all in lisp.

Step 4: Profit.

Even Amber (https://github.com/nineties/amber) and its syntax look a lot like Lisp's concept of grammar. But instead of (s-expressions) you've got something very similar to M[expressions] (https://en.wikipedia.org/wiki/M-expression).

It's amazing what you can do with Lisp.

Absolutely. Two languages that anybody who wants to truly understand computing, programming languages, and compilers should learn are Lisp and Forth.

- Lisp is a fundamental, extensible, simple, unbounded AST system (and brings in interesting lambda calculus).

- Forth is a fundamental, extensible, simple, unbounded expression evaluator.

From there, the world is yours.

(Note I use "simple" in Rich Hickey's sense of simple vs easy)
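As a rough illustration of the "expression evaluator" point, here is a minimal Forth-style sketch in Python. It assumes whitespace-separated tokens, integer-only literals, and a handful of primitive words; the class name `Forth` and its word set are illustrative, not taken from any real Forth system. The extensibility comes from `: name ... ;`, which adds new words built from existing ones.

```python
class Forth:
    """Toy Forth-style evaluator: a data stack plus an extensible dictionary of words."""

    def __init__(self):
        self.stack = []
        # Primitive words operate directly on the data stack.
        self.words = {
            "+": lambda: self._binop(lambda a, b: a + b),
            "-": lambda: self._binop(lambda a, b: a - b),
            "*": lambda: self._binop(lambda a, b: a * b),
            "dup": lambda: self.stack.append(self.stack[-1]),
            "drop": lambda: self.stack.pop(),
            "swap": self._swap,
        }

    def _binop(self, fn):
        b = self.stack.pop()
        a = self.stack.pop()
        self.stack.append(fn(a, b))

    def _swap(self):
        self.stack[-1], self.stack[-2] = self.stack[-2], self.stack[-1]

    def run(self, source):
        tokens = source.split()
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok == ":":
                # ": name body ;" defines a new word in terms of existing words.
                end = tokens.index(";", i)
                name, body = tokens[i + 1], " ".join(tokens[i + 2:end])
                self.words[name] = lambda body=body: self.run(body)
                i = end + 1
            elif tok in self.words:
                self.words[tok]()
                i += 1
            else:
                self.stack.append(int(tok))  # anything else must be an integer literal
                i += 1
        return self.stack
```

For example, `Forth().run(": square dup * ; 3 square 4 square +")` leaves `[25]` on the stack. This sketch re-interprets a colon definition's text every time the word runs; a real Forth compiles the body into threaded code at definition time, so redefining a word later behaves differently here than it would there.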

Yeah, it's kind of a cheat. A real language built in assembler would be a headache, and I bet an HLL prototype would be hidden somewhere. HLA might be doable, though, especially if macros are used diligently for pre-optimization work.

Competition has gone down since the 70s and 80s, though. I remember reading a Scheme paper that builds a software interpreter and then implements the thing in hardware too. There were also people doing HLL-to-microcode compilers to squeeze extra performance out of their chips. These days most projects stop at assembler. I'd love to see people whip out their FPGAs and start trying to outdo 70s-80s-era innovation in hardware/software designs, maybe for modern languages.

One of my concepts collecting dust is the Secure Python Target, which merges a capability machine, LISP-like hardware at its core for native bytecodes, and a compiler from Python to that language. The result could be quite productive, reliable, and secure. Another was a Secure Haskell (or ML/OCaml) Target. I don't know that kind of stuff as well, though. Python has enough documentation, community, and simplicity (to a degree) that academics could port some of its core to hardware.

"A real language built in assembler would be headaches"

Depending on what you consider a "real language", it's not that bad. I did a Forth in 80x86 assembler years ago, and it was reasonably easy and fun.

(I did use the BIOS routines for keyboard/screen I/O...they were one character at a time, IIRC).

It does depend on the definition. I'd disqualify Forth as a real language, too. I've always considered it a cross-platform macro assembler for an abstract machine design, like P-code was but more flexible. It's definitely interesting and useful but doesn't feel like a 3GL or even a full LISP. I'd see it as a bootstrap (firmware) or compilation target (e.g. BootSafe) of a high-level language.

Cool that you assembled one, though. (pun intended)

Forth has an ANSI standard as a programming language. What more do you need? What features does a language require to make it a 3GL?

Hmm. I was introduced to the term 3GL to mean languages such as ALGOL, PL/1, Fortran, Pascal, C, and so on. Substantially more structured or English-like than assembler. Most definitions say they take care of details unnecessary to writing the programming logic.

So, I just looked up the wikipedia article to find the same languages but also that anything above assembler is technically a 3GL. Even HLA is a 3GL by that standard. So, Forth is... at the lowest rung... a 3GL. By that standard, so are macro-assemblers with control flow and typing constructs. Personally, I think that definition is crap and we should reserve 3GL for languages that significantly raised abstraction like those I cited.

This is just a matter of opinion, though. Each will see it a different way and there's no right answer. That's what I've learned researching after your comment.

It's not that hard to write a "real language" compiler in assembler, lots of people have done so. Amiga-E is a good example for which source code is available.

A simple, Wirth-type compiler is very straight-forward to write for most simple ALGOL-family languages.

By Wirth-type compiler, I mean recursive descent parser with direct code generation (no AST). Single pass if the language allows it.

You need a handful of utility functions for lexical analysis/tokenization, symbol table management, and for outputting code, and then most of the rest is calling subroutines and simple control flow.

In fact, some parts, like code generation, can be made very simple in assembler, because you can directly inline code fragments and in effect use them as "templates".
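A minimal sketch of the Wirth-style approach, assuming a tiny expression grammar and a hypothetical stack-machine target (`PUSH`, `LOAD`, `ADD` and the rest are made up for illustration). Each grammar rule is one method, code is emitted as soon as a construct is recognized, and no AST is ever built; the `TEMPLATES` table plays the role of the inlined code fragments. Symbol-table management is left out.

```python
import re

# Hypothetical target: a simple stack machine. Each operator maps to a fixed
# code "template", much like inlined fragments in an assembler-based compiler.
TEMPLATES = {"+": ["ADD"], "-": ["SUB"], "*": ["MUL"], "/": ["DIV"]}


class Compiler:
    def __init__(self, source):
        # Lexical analysis: integers, identifiers, and single-character operators.
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", source)
        self.pos = 0
        self.code = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.advance() != tok:
            raise SyntaxError(f"expected {tok!r}")

    def emit(self, *instrs):
        self.code.extend(instrs)

    # expr := term { ("+" | "-") term }
    def expr(self):
        self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            self.term()                 # both operands' code is already emitted...
            self.emit(*TEMPLATES[op])   # ...so the operator's template goes last

    # term := factor { ("*" | "/") factor }
    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            self.factor()
            self.emit(*TEMPLATES[op])

    # factor := number | identifier | "(" expr ")"
    def factor(self):
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            self.emit(f"PUSH {tok}")
        elif tok == "(":
            self.expr()
            self.expect(")")
        elif tok[0].isalpha() or tok[0] == "_":
            self.emit(f"LOAD {tok}")
        else:
            raise SyntaxError(f"unexpected {tok!r}")


def compile_expr(source):
    c = Compiler(source)
    c.expr()
    if c.peek() is not None:
        raise SyntaxError(f"trailing input {c.peek()!r}")
    return c.code
```

For instance, `compile_expr("2 + 3 * x")` yields `["PUSH 2", "PUSH 3", "LOAD x", "MUL", "ADD"]`: operator precedence falls out of the call structure, with no tree in between.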

I'll be darned: your example [1] is indeed a real language written in M68K assembly. Also one I'd never heard of, so thanks. The old guard's tenacity to get stuff done with constrained hardware and software continues to impress me.

Regarding compiler, yeah I was thinking of the complexity of one with AST and optimizations because who would use it without that? That would be an impressive compiler in assembler. Amiga-E is impressive enough, for now, while also proving the concept. Good points as well on how an assembler style could easily use templates and such during code gen phase. I think the analysis and transform phases might be the hardest in an optimizing compiler. Just a guess, though.

[1] https://en.wikipedia.org/wiki/Amiga_E

Yeah, even operating systems it seems. :)

Parsing LISP in assembly isn't even so hard. Without a lot of syntactic symbols, you're just finding whitespace and parens. Even syntactic sugar like ' for quote and , or @ in macros are pretty simple to parse compared to C or -- code gods forbid -- C++.
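To show how little a Lisp reader needs, here is a sketch in Python that handles parentheses, whitespace, integers, symbols, and the `'` quote sugar only. Backquote, comma, strings, and comments are left out, and symbols are represented as plain strings for simplicity.

```python
def tokenize(text):
    # Only parens and the quote character need special treatment;
    # everything else is split on whitespace.
    return text.replace("(", " ( ").replace(")", " ) ").replace("'", " ' ").split()


def read(tokens):
    """Read one expression from the front of the token list (consumes tokens)."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "'":
        # Reader sugar: 'x becomes (quote x)
        return ["quote", read(tokens)]
    if tok == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing )")
        tokens.pop(0)  # drop the closing paren
        return items
    if tok == ")":
        raise SyntaxError("unexpected )")
    try:
        return int(tok)
    except ValueError:
        return tok  # symbols are represented as plain strings


def read_str(text):
    return read(tokenize(text))
```

For example, `read_str("'(a 1)")` returns `["quote", ["a", 1]]`.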

What I'd dread is writing the garbage collector in assembly. Maybe you could rely on reference counting, but that could still be a mess.

> What I'd dread is writing the garbage collector in assembly.

Then don't. If you're just building a Lisp so you can bootstrap your Lisp compiler in Lisp, free-everything-and-quit should be a good enough GC strategy, and quite quick too.

Especially with 16 GiB of ram...

> What I'd dread is writing the garbage collector in assembly. Maybe you could rely on reference counting, but that could still be a mess.

You don't have to write the final Lisp in assembly; you write the first Lisp in assembly, and the final Lisp in Lisp.

Sure. At a naive (but workable) level, you could just use assembly to link all the free space up into one humongoid list at startup, then operate on that using Lisp (either directly, or by having a later stage in the startup transform this rudimentary free list into something more sophisticated).
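A sketch of that naive startup step, assuming a fixed pool of cons cells stored as parallel `car`/`cdr` arrays and a hypothetical `NIL` sentinel of -1. Every cell is threaded onto one free list at startup, `cons` just pops the head, and when the list runs out the sketch gives up instead of collecting, in the free-everything-and-quit spirit suggested earlier in the thread.

```python
NIL = -1  # hypothetical sentinel meaning "no cell"


class ConsHeap:
    """Fixed pool of cons cells; free cells are chained through their cdr field."""

    def __init__(self, size):
        self.car = [None] * size
        # Startup: link every cell to the next one, forming one long free list.
        self.cdr = list(range(1, size)) + [NIL]
        self.free = 0 if size > 0 else NIL

    def cons(self, a, d):
        if self.free == NIL:
            # No collector: a bootstrap interpreter can simply give up here.
            raise MemoryError("heap exhausted")
        cell = self.free
        self.free = self.cdr[cell]  # unlink the head of the free list
        self.car[cell], self.cdr[cell] = a, d
        return cell

    def to_list(self, cell):
        out = []
        while cell != NIL:
            out.append(self.car[cell])
            cell = self.cdr[cell]
        return out
```

A later stage written in Lisp could replace `cons` with something that reclaims cells, while keeping the same array layout.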

> What I'd dread is writing the garbage collector in assembly.

There are very simple garbage collectors out there, GC doesn't necessarily mean generational mark-sweep-compact or whatever. Semispace collectors are pretty simple and don't even need a stack: https://en.wikipedia.org/wiki/Cheney%27s_algorithm
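A compact sketch of Cheney's algorithm in Python, assuming heap cells are two-field `[car, cdr]` lists and that pointers are wrapped in a hypothetical `Ptr` class so they can be told apart from immediates. Live cells are copied into to-space, a `Forward` marker is left behind in from-space, and the region of to-space between the scan index and the end serves as the breadth-first queue, which is why no stack or recursion is needed.

```python
class Ptr:
    """Reference to a cell index in the current semispace (hypothetical representation)."""
    __slots__ = ("index",)

    def __init__(self, index):
        self.index = index


class Forward:
    """Marker left in from-space once a cell has been copied."""
    __slots__ = ("new_index",)

    def __init__(self, new_index):
        self.new_index = new_index


def collect(from_space, roots):
    """Copy every cell reachable from roots into a fresh to-space.

    from_space: list of [car, cdr] cells whose fields are immediates or Ptr.
    Returns (to_space, new_roots).
    """
    to_space = []

    def forward(value):
        if not isinstance(value, Ptr):
            return value                       # immediates are not heap objects
        cell = from_space[value.index]
        if isinstance(cell, Forward):
            return Ptr(cell.new_index)         # already copied: follow the forwarding pointer
        new_index = len(to_space)              # the "free" pointer: next slot in to-space
        to_space.append(list(cell))            # copy; fields still point into from-space
        from_space[value.index] = Forward(new_index)
        return Ptr(new_index)

    new_roots = [forward(r) for r in roots]
    scan = 0
    # Cells between scan and the end of to-space have been copied but not yet
    # scanned; fixing their fields may copy more cells, extending the queue.
    while scan < len(to_space):
        cell = to_space[scan]
        cell[0] = forward(cell[0])
        cell[1] = forward(cell[1])
        scan += 1
    return to_space, new_roots
```

Cells not reachable from `roots` are simply never copied; discarding the old `from_space` afterwards frees them all at once.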

That only works if you don't have self-referential structures, though.

(Disclaimer: I've never actually used one /or/ seen an example of one being used... (See, those AI Koans do have useful tidbits of knowledge in them!)).

I love this. My first (toy) compiler was bootstrapped a similar way, though much more ad-hoc (I didn't write a formal grammar until I was well into it). Back then a lot of compilers were written in assembler, so it wasn't such an unusual starting point - I assume I was drawing inspiration from e.g. PDQ (a public domain Pascal implementation for the Amiga) and others that I think I would have seen first.

I started basically by having the compiler do very basic parsing of M68k assembler and just pass everything that looked like assembler straight through.

Then I added support for defining functions, but all the content of functions was still assembler. Then I started adding expression support and things like local variable and types.
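A sketch of that kind of bootstrapping stage, assuming a hypothetical `func name { ... }` wrapper syntax: lines that open or close a function are translated, and everything else is assumed to be M68k assembler and passed straight through. Later stages would move more constructs out of the pass-through branch.

```python
import re

# Hypothetical stage-1 syntax: "func name {" ... "}" wraps raw assembler.
FUNC_RE = re.compile(r"^\s*func\s+(\w+)\s*\{\s*$")


def compile_stage1(source):
    out = []
    for line in source.splitlines():
        m = FUNC_RE.match(line)
        if m:
            out.append(f"{m.group(1)}:")  # function entry becomes a label
        elif line.strip() == "}":
            out.append("    rts")         # function exit becomes a return
        else:
            out.append(line)              # anything else is assumed to be assembler
    return "\n".join(out)
```

For example, `compile_stage1("func double {\n    add.l d0,d0\n}")` produces `double:`, the unchanged `add.l d0,d0` line, and `rts`.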

I wish I still had the source (lost in a move, long after I stopped doing anything on it) - one of the fun things about it was that I kept the ability to intersperse assembler instructions everywhere - they were treated as normal statements. And the M68k registers were first class variables in the language that could occur in any expression. So e.g. you could write "D0.w = D1.w + a", where D0.w would refer to the low 16 bits of the D0 register, and D1.w the low 16 bits of the D1 register, and "a" would be a variable allocated on the stack.
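Reading `D0.w` as "the low 16 bits of D0", here is a worked sketch of what that assignment would do, assuming `.w` writes behave like a word-sized MOVE to a data register on the 68000, which replaces only the low word and leaves the high word alone. The helper names are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def read_w(reg):
    """Dn.w as a value: the low 16 bits of a 32-bit data register."""
    return reg & MASK16


def write_w(reg, value):
    """Assigning to Dn.w: replace the low word, keep the high word."""
    return ((reg & ~MASK16) | (value & MASK16)) & MASK32


# D0.w = D1.w + a
d0, d1, a = 0x12345678, 0x0000FFFF, 2
d0 = write_w(d0, read_w(d1) + a)
print(hex(d0))  # prints 0x12340001
```

Here the 16-bit sum wraps to `0x0001`, and the upper word `0x1234` of D0 survives the assignment.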

Basically the features I added were largely guided by what seemed like it'd let me shave more lines off the compiler itself...

You learn a lot about the language you're writing when you bootstrap from asm or try to condense everything into a tiny core: a lot of non-obvious dependencies in the language become much clearer.

> one of the fun things about it was that I kept the ability to intersperse assembler instructions everywhere - they were treated as normal statements. And the M68k registers were first class variables in the language that could occur in any expression.

That sounds like so much horror and so much fun at the same time.

It was awesome for slowly migrating the compiler itself from assembler, and also for things like interfacing with the OS - I could write all the glue code inline.

But yes, it was easy to shoot your foot off with it. The main saving grace was that compared to i386, the M68k architecture has plenty of general purpose registers - 8 data registers and 8 address registers (including the stack pointer), so it was reasonably easy to avoid clobbering registers by having some strict rules about which registers were used for what combined with a very simple extra pass to the register allocator that'd mark any registers that were mentioned by name in a function as off limits.
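A sketch of that extra pass, assuming function bodies are available as source text and that registers appear as `d0`-`d7`/`a0`-`a7` with an optional `.b`/`.w`/`.l` suffix. The regex-based scan and the function names are hypothetical; the point is only that any register named explicitly in a function is removed from the pool before allocation starts.

```python
import re

# M68k data registers d0-d7 and address registers a0-a6; a7 is the stack
# pointer, so this sketch never hands it out.
ALL_REGS = [f"d{i}" for i in range(8)] + [f"a{i}" for i in range(7)]

REG_RE = re.compile(r"\b([da][0-7])(?:\.[bwl])?\b", re.IGNORECASE)


def reserved_registers(function_body):
    """Registers mentioned by name anywhere in the function body."""
    return {m.group(1).lower() for m in REG_RE.finditer(function_body)}


def allocatable_registers(function_body):
    """Registers the allocator may still use for temporaries in this function."""
    reserved = reserved_registers(function_body)
    return [r for r in ALL_REGS if r not in reserved]
```

With a body such as `D0.w = D1.w + a`, only `d0` and `d1` are reserved; the stack variable `a` has no register number, so it is not matched.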

It actually let me defer adding "real" local variables for quite some time since I could simply use the registers.

BBC Micro BASIC let you embed assembler in the middle of the program, although it was a two-pass assembler and you had to run both passes separately with a small FOR loop if you wanted labels to work.
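Two passes are needed because of forward references: a jump to a label defined later can't be encoded until that label's address is known. A sketch of the idea in Python, assuming a tiny subset of 6502 instructions (the BBC Micro's CPU) with fixed sizes: `LDA #imm` (0xA9), `JMP abs` (0x4C), and `NOP` (0xEA). Parsing is deliberately simplistic and the `origin` default is arbitrary.

```python
# Tiny 6502 subset: mnemonic -> (opcode, operand size in bytes).
OPCODES = {
    "LDA": (0xA9, 1),  # LDA #imm
    "JMP": (0x4C, 2),  # JMP abs, little-endian address
    "NOP": (0xEA, 0),
}


def parse(line):
    """Split 'label: OP operand' into (label, op, operand); any part may be None."""
    label = None
    if ":" in line:
        label, line = line.split(":", 1)
        label = label.strip()
    parts = line.split()
    op = parts[0].upper() if parts else None
    operand = parts[1] if len(parts) > 1 else None
    return label, op, operand


def assemble(lines, origin=0x2000):
    # Pass 1: walk the program only to learn each label's address.
    labels, addr = {}, origin
    for line in lines:
        label, op, _ = parse(line)
        if label:
            labels[label] = addr
        if op:
            addr += 1 + OPCODES[op][1]
    # Pass 2: emit bytes; forward references now resolve.
    out = bytearray()
    for line in lines:
        _, op, operand = parse(line)
        if not op:
            continue
        opcode, size = OPCODES[op]
        out.append(opcode)
        if size:
            value = labels[operand] if operand in labels else int(operand.lstrip("#"), 0)
            out += value.to_bytes(size, "little")
    return bytes(out)
```

For example, `assemble(["start: JMP skip", "NOP", "skip: LDA #5", "JMP start"])` produces `4C 04 20 EA A9 05 4C 00 20`: the first `JMP` can only be encoded because pass 1 already found `skip` at 0x2004.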

I remember that! I wrote a short assembler program on a BBC to control a milling machine.

This is the coolest thing I've seen posted on HN in a while.

The title might not do it justice; the point is: there's very little assembly language involved, if you're careful.

I smell bootstrap. #onto-reading

ps: I remember an article with a similar structure but using lambda calculus. Starting with very poor subset and gradually expanding features... Too bad I can't find it again.

Nope, I remember numbered evaluators like lc-0 lc-1. Thanks anyway.

Arf, it was a blog post, and IIRC in sexp syntax. Good find anyway.

Very neat. If you find this interesting you may also like jonesforth, another project developing a language starting from assembly.


I second that; Forth is extremely well suited to bootstrap from asm too.

That's what I was thinking of too; glad you mentioned it. jonesforth consists of 2 files: an assembly language file which implements the minimal commands to be able to load the rest of the language (in forth) from the second file. It is written in literate programming style so that the comments are the documentation which teaches you all about it.

Since then I've often daydreamed about implementing a forth OS on something like the Raspberry Pi (just another project I might not get around to). I'm looking forward to reading the linked article as well.

This is beautiful, thank you for the link!

This is wonderful and impressive.

For others curious how long it took: judging from the first commit on GitHub (https://github.com/nineties/amber/commit/bdb49d27968d6bc4588...), which references a first commit from 2009, it seems at least 6 years.

I love seeing this because it matches my recommendation for redoing the stack post-Snowden. I gave two options: (a) Wirth-style [1] with assembler -> high-level assembler -> Modula-2-like language -> safe Oberon-like language -> 4GL-like batteries included language; (b) VLISP-like [2] setup with assembler -> high-level assembler -> LISP interpreter -> PreScheme compiler -> integrated PreScheme/LISP/assembler system -> AOT or JIT compiler for full LISP.

This is kind of like a mix between the two. I like how the author illustrates each step well. The best illustration is showing how easily the core language can transform into a mainstream-grade language with extensible syntax and macros. That's a strength worth copying in any new language, albeit with guidelines on proper use. I bet it was all pretty fun, too.

[1] http://www.cfbsoftware.com/modula2/Lilith.pdf

[2] http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=E0F...

Speaking of languages in assembly. Here is a Java virtual machine in assembly language:


Until I looked at it again now I had forgotten my favorite instruction, a nop to avoid a bug in the silicon:


That's for the SX microcontroller.

Really refreshing article. Huge kudos for that great achievement! Seems like you had a lot of fun.

I wrote BASIC in VAX assembly as a freshman. It compiled each line to byte-code (like Atari BASIC), and used a hand-written recursive descent parser. I remember using the VAX SKIP and SPAN instructions when dealing with strings.

Sadly my only record of it was a fading fan-fold listing which is now gone.

Looking at the presentation: I really should have learned LISP earlier than I did.

Programming in MACRO-32 was more like using a high-level language; it did not feel like assembly at all.

I recently needed to make a cross-platform assembler for a niche 8-bit micro that only has DOS tooling. After cloning it in a small bit of Python, I was looking for a simple way to extend it with macros. It turns out that m4 works really well as an asm macro language, and I now have advanced facilities like a code-generating expression evaluator and high-level control flow statements without any significant change to the core assembler. It blows the doors off what contemporary assemblers (nasm, masm, tasm) can do and hearkens back to the high point of high-level assemblers when they peaked in the 70s.

In the late 70s and early 80s the BASIC interpreters in personal computers were awesome exercises in assembly language.

There's a fantastic description of How Atari BASIC works. It's pretty sophisticated for something that doesn't really JIT, and the code fits in 8-10K. Link: http://users.telenet.be/kim1-6502/6502/p1.html

I was saddened by the lack of performance and sophistication of BASIC implementations on later computers. By comparison they were pedestrian, slow, buggy and harder to use.

(I haven't used BASIC since 1981, with the exception of a stint of Visual Basic in the mid 90s that was . . . eye-opening and really quite positive).

Awesome post.

I like the assumption that he didn't care that his "rlci" lisp interpreter didn't garbage collect at all because it was just a way to get to the next step and he knew it was going to have enough RAM to do so.

I wrote my own Lisp compiler in college using this as the foundation: http://schemeworkshop.org/2006/11-ghuloum.pdf

Thanks for this. Do you have a link to the extended version mentioned at the end of the article? The provided link is dead now.

Unfortunately, I do not. I would try contacting the author, you never know.

Does anyone have any good resource for someone looking to try this sort of thing? Preferably aimed at someone with High Level + C/ASM knowledge looking to apply it in this direction.

I'm implementing a Forth-like language in Z80 assembler. I implemented a BASIC when I was younger (about 30 years ago), and I must say it's quite painful to do. Even with an emulator, which makes crashing painless, it feels harder than I remember it being back in the day.

I am trying to learn assembly myself (for fun) and really appreciate this. Cool stuff!

This is really great.

I agree with comments about Forth.

In this kind of project, there is also Urbit (https://github.com/urbit).

This is the first time I've seen Urbit; it looks interesting. Do you have some context about the project? The project homepage (http://urbit.org/) doesn't seem to explain anything, and the GitHub page just describes the language.

The author is Curtis Yarvin (https://vimeo.com/75312418).

You can search also for "Moldbug". He has an old blog.

He is also a very controversial guy with libertarian ideas.

His project is very neat with distributed processes called "ships" or "submarines" with rights based on reputation.

It's a little bit like Erlang processes but with an anti-spam security model.

Had a few subs a while back myself... :)

That's amazing. Doing the first step in assembly instead of C is quite overkill, in my opinion, but the whole thing is impressive. Kudos.

PicoLisp's author created his own assembler to implement the 64-bit version of PicoLisp.

This should be a yawn for those with web and mobile dev experience, people I'd expect to already understand virtual machines and fractured encodings.
