Things like atom and symbols are functions in the runtime. A Lisp compiler only has to handle special forms, and function calls. If we see the compiler source code using a special operator that it doesn't handle, either directly using it or via macro expansion, then we know it's not yet self-hosting.
Handling specific functions is optional. If the compiler can compile a function call, it can compile a call to the + or car functions, which then have to be president in the run-time.
Functions like these are obvious targets for special recognition and inlining. Arithmetic code won't be as fast as it could be if every + has to be a call to a function, but it will work.
A compiled Lisp implementation can be bootstrapped to the point where the definition of car in the library looks like:
(defun car (x) (car x))
And similarly for some other functions. Then there are only two places in the system that know how to actually extract the car field: the compiler source code, and the corresponding compiler executable needed for bootstrapping.
In that case if you remove the car handling from the compiler then the system's self-hosting and bootstrapping ability breaks.
Of course! That's what I did last time I wrote a Lisp compiler, but this one seems a little more limited. The page explains:
> Finally, comp-funcall compiles code for function calls to the built-in functions, or a recursive call to the main function: (...)
(Emphasis mine.)
And none of his example functions calls any function other than the builtin ones (which, indeed, his compiler does inline) and itself. But it isn't obvious where the limitation comes from; the subroutine call is just a jal instruction to an assembler label, which you would think would work just as well to call another function as to make a recursive call.
Maybe the limitation is that the assembler only assembles a single function at a time, and he doesn't have a link-editing stage or a symbol table implicitly or explicitly shared between assembler calls. In that case it would be fairly simple to extend the compiler to support more general calls, as you reasonably but apparently incorrectly assumed this one already does.
Looking a bit at the assembler http://www.ulisp.com/list?31OE it looks like it only supports jumps to previously defined labels? In $jal and offset I don't see anything that resembles adding a relocation to a list of relocations for a label so it can be backpatched later. But I also don't see how it gets the numerical value for a label that it subtracts *pc* from in offset. In the compiler itself http://www.ulisp.com/list?4Y4Q it seems to be consing up lists of assembly instructions that eventually get evaled, which seems like a kind of janky way to invoke your assembler but whatever, but I don't see where the binding of label names to addresses happens. I can't find anything resembling a symbol table.
I think I'd make a few other changes, though. I'd add closure support and some kind of type tagging; right now it depends on knowing the types are compile time so it's kind of more like a Forth or C compiler with Lisp syntax. And I'd indirect the calls through a runtime-mutable symbol table so you could redefine a function without having to rewrite all the calls to the old function in existing code.
That's what I did last time I wrote a Lisp compiler, anyway; maybe it's not a good tradeoff on today's hardware anymore, since people presumably still only interactively load new definitions a few times a minute at most, but CPUs have gone from a MIPS to a hundred BIPS. So making all your function calls much slower by frustrating the CPU's branch predictor with a PLT in order to speed up relinking after an edit might no longer be a good tradeoff, even if it is what glibc does—glibc doesn't have FASLs.
The assembler is two-pass and the labels are simply local variables in the defcode form. They are assigned the value of the program counter in the first pass, and the assembler instructions are evaluated in the second pass. I got the idea from the assembler in the Acorn Atom, if anyone remembers that.
> So I think the compiler may not be able to compile calls to arbitrary functions?
A Lisp compiler will by default compile a call to ANY function as a "jump subroutine" (or as a non-returning jump in the case of a tail call) machine code call. The function then has to be present at runtime (in the runtime) and the code will call it at runtime. Lisp code by default also calls a global function through its symbol's cell -> late binding.
"supported by the compiler" here probably means that the compiler can generate inline code for these functions in various cases. Thus if the call is to the function 1+ and it knows that the argument is an integer number (and, possibly, the result also has to be an integer number), then it will not use a subroutine call via the global function, but will inline the call to an integer addition machine instruction. Obviously this then defeats late binding.
If a Lisp (for a tiny machine) only has fixnum integers as its single numeric data type, then the thing is simple -> every 1+ will get an integer argument and thus one can inline it. Inlining makes for tiny computers (which uLisp was developed for) only sense if the inlined function code isn't too long and doesn't use to much of the precious memory for the machine code. Inlining OTOH like will have a positive effect in reducing execution time and a negative effect by increasing compilation time.
In languages like Common Lisp or Scheme, which have several numeric types (fixnums, bignums, floats, ratios, complex, ...) this gets more complicated. Common Lisp compilers typically use optional type declarations and type inference to determine what inlined code to generate.