In ~1983 I wrote the first iteration of Sensible Solutions for O'Hanlon Computer Systems using MASM. Of course, it had to run in less than 64k. Basically, it was a VM that would run pseudo code. And, the compiler was written in ASM also.
In 1985 I wrote TAS, again using MASM. Again, a VM. The language it compiled (in both cases) was written to create business accounting applications, which I also wrote. Originally I called them Level 1, 2 and 3 where Level 2 was basic bookkeeping and Level 3 was a full blown Accounting package including invoicing, purchasing, etc.
Unfortunately, this was during the time of dBase and I got clobbered in Infoworld for not being like dBase. So, I wrote a compiler for dBase and called it dBFast. Sold a lot of copies through Egghead and others.
In all three of these we made a ton of money and, even more, created a whole group of people who became "pseudo programmers." I say pseudo because they could program in TAS, not necessarily in anything else.
I was very proud of what I did and hope there is one more language in me to make programming on the web significantly easier than it is now.
If you've never programmed in assembler, I recommend it to anyone. Just try something simple. Until you do, you won't appreciate what we have today. Especially if you're running on CP/M or early DOS!
I found it very pleasant to write a clone of Wozniak's machine code for my toy CPU/computer.
Anyway, this actually skyrocketed my understanding of C: C is little more than a (rather thin, I would say) wrapper around assembly, which in turn is a wrapper around machine instructions. That's why C is fast. That's also why you can't have first-class functions, return multiple (compile-time-unknown) values, etc.
But nothing beats spending 10+ hours debugging a simple I2C (or SPI, can't remember) bare-metal ARM program, first weeding out the vendor libs, then actually diving into the assembly, only to find out that the CPU is buggy.
Step 2: Make a lisp.
Step 3: Do it all in lisp.
Step 4: Profit.
Even Amber (https://github.com/nineties/amber) has a syntax that looks a lot like the Lisp concept of a grammar. But instead of (s-expressions) you get something very similar to M[expressions] (https://en.wikipedia.org/wiki/M-expression).
It's amazing what you can do with Lisp.
- Lisp is a fundamental, extensible, simple, unbounded AST system (and brings in interesting lambda calculus).
- Forth is a fundamental, extensible, simple, unbounded expression evaluator.
From there, the world is yours.
(Note I use "simple" in Rich Hickey's sense of simple vs easy)
Competition has gone down since the '70s and '80s, though. I remember reading a Scheme paper that built a software interpreter and then implemented the thing in hardware too. There were also people doing HLL-to-microcode compilers to cheat extra performance out of their chips. These days most projects stop at assembler. I'd love to see people whip out their FPGAs and start trying to outdo '70s-'80s-era innovation in hardware/software designs, maybe for modern languages.
One of my concepts collecting dust is the Secure Python Target, which merges a capability machine, LISP-like hardware at the core for native bytecodes, and a compiler from Python to that language. The result could be quite productive, reliable, and secure. Another was a Secure Haskell (or ML/OCaml) Target. I don't know that kind of stuff as well, though. Python has enough documentation, community, and simplicity (to a degree) that academics could port some of its core to hardware.
Depending on what you consider a "real language", it's not that bad. I did a Forth in 80x86 assembler years ago, and it was reasonably easy and fun.
(I did use the BIOS routines for keyboard/screen I/O...they were one character at a time, IIRC).
Cool that you assembled one, though. (pun intended)
So, I just looked up the Wikipedia article to find the same languages, but also that anything above assembler is technically a 3GL. Even HLA is a 3GL by that standard. So Forth is... at the lowest rung... a 3GL. By that standard, so are macro-assemblers with control flow and typing constructs. Personally, I think that definition is crap and we should reserve 3GL for languages that significantly raised abstraction, like those I cited.
This is just a matter of opinion, though. Each will see it a different way and there's no right answer. That's what I've learned researching after your comment.
A simple, Wirth-type compiler is very straightforward to write for most simple ALGOL-family languages.
By Wirth-type compiler, I mean recursive descent parser with direct code generation (no AST). Single pass if the language allows it.
You need a handful of utility functions for lexical analysis/tokenization, symbol table management, and for outputting code, and then most of the rest is calling subroutines and simple control flow.
In fact, some parts, like code generation, can be made very simple in assembler, because you can directly inline code fragments and in effect use them as "templates".
Regarding the compiler, yeah, I was thinking of the complexity of one with an AST and optimizations, because who would use it without that? That would be an impressive compiler in assembler. Amiga-E is impressive enough, for now, while also proving the concept. Good points as well on how an assembler style could easily use templates and such during the code-gen phase. I think the analysis and transform phases might be the hardest in an optimizing compiler. Just a guess, though.
What I'd dread is writing the garbage collector in assembly. Maybe you could rely on reference counting, but that could still be a mess.
Then don't. If you're just building a lisp just so you can bootstrap your lisp compiler in lisp, the free-everything-and-quit should be a good enough GC strategy, and quite quick too.
You don't have to write the final Lisp in assembly; you write the first Lisp in assembly, and the final Lisp in Lisp.
There are very simple garbage collectors out there, GC doesn't necessarily mean generational mark-sweep-compact or whatever. Semispace collectors are pretty simple and don't even need a stack: https://en.wikipedia.org/wiki/Cheney%27s_algorithm
(Disclaimer: I've never actually used one /or/ seen an example of one being used... (See, those AI Koans do have useful tidbits of knowledge in them!)).
I started basically by having the compiler do very basic parsing of M68k assembler and just pass everything that looked like assembler straight through.
Then I added support for defining functions, but all the content of the functions was still assembler. Then I started adding expression support and things like local variables and types.
I wish I still had the source (lost in a move, long after I stopped doing anything on it) - one of the fun things about it was that I kept the ability to intersperse assembler instructions everywhere - they were treated as normal statements. And the M68k registers were first class variables in the language that could occur in any expression. So e.g. you could write "D0.w = D1.w + a", where D0.w would refer to the low 16 bits of the D0 register, and D1.w the low 16 bits of the D1 register, and "a" would be a variable allocated on the stack.
Basically the features I added were largely guided by what seemed like it'd let me shave more lines off the compiler itself...
You learn a lot about the language you're writing when you bootstrap from asm or try to condense everything into a tiny core - a lot of non-obvious dependencies in the language become a lot clearer.
That sounds like so much horror and so much fun at the same time.
But yes, it was easy to shoot your foot off with it. The main saving grace was that, compared to i386, the M68k architecture has plenty of general-purpose registers - 8 data registers and 8 address registers (including the stack pointer). So it was reasonably easy to avoid clobbering registers by having some strict rules about which registers were used for what, combined with a very simple extra pass in the register allocator that'd mark any registers mentioned by name in a function as off limits.
It actually let me defer adding "real" local variables for quite some time since I could simply use the registers.
The title might not do it justice; the point is: there's very little assembly language involved, if you're careful.
ps: I remember an article with a similar structure but using lambda calculus. Starting with a very minimal subset and gradually expanding the features... Too bad I can't find it again.
Since then I've often daydreamed about implementing a forth OS on something like the Raspberry Pi (just another project I might not get around to). I'm looking forward to reading the linked article as well.
For others curious how long it took: judging from the first commit on GitHub (https://github.com/nineties/amber/commit/bdb49d27968d6bc4588...), which references a first commit from 2009, it seems at least 6 years.
This is kind of like a mix between the two. I like how the author illustrates each step well. The best illustration is showing how easily the core language can transform into a mainstream-grade language with extensible syntax and macros. A strength worth copying in any new language, albeit with guidelines on proper use. I bet it was all pretty fun, too.
Until I looked at it again now I had forgotten my favorite instruction, a nop to avoid a bug in the silicon:
Sadly my only record of it was a fading fan-fold listing which is now gone.
Looking at the presentation: I really should have learned LISP earlier than I did..
There's a fantastic description of how Atari BASIC works. It's pretty sophisticated for something that doesn't really JIT, and the code fits in 8-10K. Link: http://users.telenet.be/kim1-6502/6502/p1.html
I was saddened by the lack of performance and sophistication of BASIC implementations on later computers. By comparison they were pedestrian, slow, buggy and harder to use.
(I haven't used BASIC since 1981, with the exception of a stint of Visual Basic in the mid 90s that was . . . eye-opening and really quite positive).
I like that he didn't care that his "rlci" Lisp interpreter didn't garbage collect at all, because it was just a way to get to the next step and he knew it would have enough RAM to get there.
I agree with comments about Forth.
In this kind of project, there is also Urbit (https://github.com/urbit).
You can also search for "Moldbug". He has an old blog.
He is also a very controversial guy with libertarian ideas.
His project is very neat with distributed processes called "ships" or "submarines" with rights based on reputation.
It's a little bit like Erlang processes but with an anti-spam security model.