We did the same thing for the Jailbreak CTF at Starfighter (we compile to AVR, though, and "ported" it to Go).
If I'm right, the big thing this will be missing (apart from structs, integer types, for loops, arrays, function pointers, &c) is register allocation. C4 compiles to code for a stack-based virtual machine, which evaluates expressions somewhat like the JVM does. That's fine for an interpreted runtime, but to run code on an actual ARM, you're going to want to either replace the codegen or (what we did) postprocess it into SSA form and do register allocation.
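To make the "postprocess into SSA" idea concrete, here is a minimal sketch. It assumes a hypothetical five-opcode subset loosely modeled on C4's accumulator-plus-stack VM: `IMM` loads a constant into the accumulator, `PSH` pushes the accumulator, and a binary op pops its left operand and combines it with the accumulator. The names `Insn` and `stack_to_ssa` are made up for illustration. Straight-line stack code can be turned into SSA-style three-address code by simulating the stack symbolically, tracking which virtual register each slot holds instead of its value. Branches would also need phi nodes at join points, which this sketch leaves out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical opcodes, loosely modeled on C4's accumulator-plus-stack VM. */
enum Op { IMM, PSH, ADD, SUB, MUL };

struct Insn { enum Op op; int arg; };   /* arg is only used by IMM */

#define MAX_STACK 64

/* Translate straight-line stack code into SSA-style three-address code.
   The stack holds virtual register numbers, not values; every register is
   assigned exactly once. Writes text into out (capacity cap) and returns the
   register holding the final result. */
int stack_to_ssa(const struct Insn *code, int n, char *out, size_t cap)
{
    static const char *sym[] = { [ADD] = "+", [SUB] = "-", [MUL] = "*" };
    int stack[MAX_STACK], sp = 0;
    int acc = -1, next = 0;   /* acc: register currently in the accumulator */
    size_t len = 0;

    assert(cap > 0);
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        int w = 0;
        switch (code[i].op) {
        case IMM:   /* a constant gets a fresh register */
            w = snprintf(out + len, cap - len, "v%d = %d\n", next, code[i].arg);
            acc = next++;
            break;
        case PSH:   /* a push emits nothing: it just records which register is on the stack */
            assert(sp < MAX_STACK && "symbolic stack overflow");
            stack[sp++] = acc;
            break;
        default: {  /* ADD, SUB, MUL: left operand comes off the stack, right is in acc */
            assert(sp > 0 && "symbolic stack underflow");
            int lhs = stack[--sp];
            w = snprintf(out + len, cap - len, "v%d = v%d %s v%d\n",
                         next, lhs, sym[code[i].op], acc);
            acc = next++;
        }
        }
        if (w > 0) len += (size_t)w;
        if (len >= cap) len = cap - 1;  /* output truncated; later writes become no-ops */
    }
    return acc;
}
```

For `2 + 3 * 4`, the stack code `IMM 2; PSH; IMM 3; PSH; IMM 4; MUL; ADD` becomes `v0 = 2`, `v1 = 3`, `v2 = 4`, `v3 = v1 * v2`, `v4 = v0 + v3`. The pushes disappear entirely, and a register allocator then only has to map `v0`..`v4` onto real ARM registers.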
C4 is a really beautiful piece of code, but I don't think it was intended as a real starting point for compiler development. Everything we did to it to make it work more realistically made me feel like we were ruining it and missing its point.
Oh, neat, they added struct support! :)
> Everything we did to it to make it work more realistically made me feel like we were ruining it and missing its point.
On the contrary, I think the parser (and tokeniser) is the most interesting part, as the codegen in C4 was basically "for free" since it emits code as it parses. The parser is amazingly simple yet featureful for its size, and that's what makes it a good starting point. It could easily be made to generate AST nodes instead of stack instructions, and then you have the beginnings of a "real" compiler. The simplicity makes it easy to start "hacking on" and extend/modify, because it's straightforward to understand where everything is.
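Here is roughly what "generate AST nodes instead of stack instructions" could look like. This is a minimal sketch over a toy grammar (integers, `+`, `*` and parentheses), not C4's or AMaCC's actual grammar. Each parsing function keeps the same recursive-descent shape as a code-emitting parser but returns a `Node *` instead of appending instructions. `Node`, `parse`, `eval` and `free_tree` are hypothetical names.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical AST node: an integer literal (op == 0) or a binary operator. */
typedef struct Node {
    char op;                 /* '+', '*', or 0 for a literal */
    long val;                /* literal value when op == 0 */
    struct Node *lhs, *rhs;
} Node;

static const char *p;        /* cursor into the source text */

static Node *mk(char op, long val, Node *l, Node *r)
{
    Node *n = malloc(sizeof *n);
    if (!n) abort();
    n->op = op; n->val = val; n->lhs = l; n->rhs = r;
    return n;
}

static void skip(void) { while (isspace((unsigned char)*p)) p++; }

static Node *expr(void);

/* primary := number | '(' expr ')' */
static Node *primary(void)
{
    skip();
    if (*p == '(') {
        p++;
        Node *n = expr();
        skip();
        if (*p == ')') p++;  /* a real parser would report a missing ')' */
        return n;
    }
    return mk(0, strtol(p, (char **)&p, 10), NULL, NULL);
}

/* term := primary ('*' primary)*  (builds a node where C4 would emit MUL) */
static Node *term(void)
{
    Node *n = primary();
    for (skip(); *p == '*'; skip()) {
        p++;
        n = mk('*', 0, n, primary());
    }
    return n;
}

/* expr := term ('+' term)*  (builds a node where C4 would emit ADD) */
static Node *expr(void)
{
    Node *n = term();
    for (skip(); *p == '+'; skip()) {
        p++;
        n = mk('+', 0, n, term());
    }
    return n;
}

Node *parse(const char *src) { p = src; return expr(); }

/* A separate pass over the tree; evaluation stands in for code generation. */
long eval(const Node *n)
{
    if (n->op == 0) return n->val;
    long l = eval(n->lhs), r = eval(n->rhs);
    return n->op == '+' ? l + r : l * r;
}

void free_tree(Node *n)
{
    if (n) { free_tree(n->lhs); free_tree(n->rhs); free(n); }
}
```

Once a tree exists, code generation becomes its own pass, so later stages (constant folding, register allocation) can see a whole expression before anything is emitted.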
The one idea I have for the parser is to refactor it to be table-driven, indexed by precedence level, instead of the large switch with lots of very similar code in each case. At a glance it looks like this version can compile itself, and already supports structures, arrays of structures, and maybe function pointers.
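A table-driven take on the precedence levels might look like this minimal precedence-climbing sketch. The operator table, the restriction to single-character operators, and the names `calc`, `climb` and `find_op` are all hypothetical. It evaluates directly rather than emitting code, to keep it short. Each table row carries a precedence and a handler, so adding an operator means adding a row rather than another near-identical case.

```c
#include <ctype.h>
#include <stdlib.h>

typedef long (*BinFn)(long, long);

static long f_or(long a, long b)  { return a | b; }
static long f_and(long a, long b) { return a & b; }
static long f_lt(long a, long b)  { return a < b; }
static long f_add(long a, long b) { return a + b; }
static long f_sub(long a, long b) { return a - b; }
static long f_mul(long a, long b) { return a * b; }
static long f_div(long a, long b) { return a / b; }  /* no zero check in this sketch */

/* One row per operator; higher prec binds tighter (same relative order as C). */
static const struct { char op; int prec; BinFn fn; } ops[] = {
    { '|', 1, f_or  }, { '&', 2, f_and },
    { '<', 3, f_lt  },
    { '+', 4, f_add }, { '-', 4, f_sub },
    { '*', 5, f_mul }, { '/', 5, f_div },
};

static const char *src;

static void skip_ws(void) { while (isspace((unsigned char)*src)) src++; }

/* Index of the table row for c, or -1 if c is not a binary operator. */
static int find_op(char c)
{
    for (int i = 0; i < (int)(sizeof ops / sizeof ops[0]); i++)
        if (ops[i].op == c) return i;
    return -1;
}

static long climb(int min_prec);

/* atom := number | '(' expression ')' */
static long atom(void)
{
    skip_ws();
    if (*src == '(') {
        src++;
        long v = climb(1);
        skip_ws();
        if (*src == ')') src++;
        return v;
    }
    return strtol(src, (char **)&src, 10);
}

/* One loop handles every precedence level. All operators here are
   left-associative, so the right operand is parsed one level higher. */
static long climb(int min_prec)
{
    long lhs = atom();
    for (;;) {
        skip_ws();
        int i = find_op(*src);
        if (i < 0 || ops[i].prec < min_prec) return lhs;
        src++;
        long rhs = climb(ops[i].prec + 1);
        lhs = ops[i].fn(lhs, rhs);
    }
}

long calc(const char *s) { src = s; return climb(1); }
```

`calc("10 - 4 - 3")` gives 3 because the right operand is parsed at `prec + 1`, which makes the operators left-associative; a right-associative operator such as assignment would recurse at the same level instead.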
In any case it's great to see more little compilers that are so close to "real" ones in functionality.
Edit: yes, it does self-compile. From the Makefile:
./amacc amacc.c tests/hello.c
Just to be a little clearer about my concern with the C4 codebase:
C4 isn't just written in its idiosyncratic style in order to be smaller; it's also designed to compile the minimal subset of C required to self-host. For instance, global variables and, in particular, global arrays are there because C4 didn't parse structs.
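For a sense of what "global arrays instead of structs" means in practice: C4's symbol table is one flat int array whose fields are reached through an enum of offsets (`id[Tk]`, `id[Hash]`, and so on, stepping by `Idsz`). Below is a reduced sketch of that pattern next to the struct version a compiler that parses structs could use. The field set, `MAX_SYMS`, `lookup_flat`, `lookup_struct`, and the "in use" marker value are illustrative assumptions, not copied from C4 or AMaCC.

```c
#include <stddef.h>

/* Flat-array style: field offsets into one big int array, in the spirit of
   C4's symbol table. The field set here is trimmed and illustrative. */
enum { Tk, Hash, Val, Idsz };

#define MAX_SYMS 16
static int sym[MAX_SYMS * Idsz];   /* entry k occupies Idsz consecutive ints */

/* Find the entry for hash, adding it in the first free slot if absent. */
int *lookup_flat(int hash)
{
    int *id = sym;
    while (id < sym + MAX_SYMS * Idsz && id[Tk]) {  /* Tk == 0 marks a free slot */
        if (id[Hash] == hash) return id;
        id = id + Idsz;
    }
    if (id == sym + MAX_SYMS * Idsz) return NULL;   /* table full */
    id[Tk] = 1;                                     /* 1: hypothetical "in use" token */
    id[Hash] = hash;
    return id;
}

/* Struct style: the same data, once the compiler can parse structs. */
struct Sym { int tk, hash, val; };
static struct Sym syms[MAX_SYMS];

struct Sym *lookup_struct(int hash)
{
    for (struct Sym *s = syms; s < syms + MAX_SYMS; s++) {
        if (!s->tk) { s->tk = 1; s->hash = hash; return s; }
        if (s->hash == hash) return s;
    }
    return NULL;                                    /* table full */
}
```

Both behave identically; the flat version just needs nothing beyond `int`, pointers and arrays, which is exactly the subset C4 had to be able to compile in order to self-host.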
This compiler is inheriting those design decisions, but has discarded the goal of compiling a minimal subset of C, so its design is a little incoherent. (Ours was too!)
C4 is more like a piece of sculpture than it is a real compiler.