
Numbers and tagged pointers in early Lisp implementations - jsnell
https://www.snellman.net/blog/archive/2017-09-04-lisp-numbers/
======
ScottBurson
A piece of more recent history that people might not know: the 32-bit SPARC
chips had some special instructions for adding and subtracting integers,
assuming a representation in which the two low-order bits were used as tag
bits, and a value of 00 in those bits indicated a fixnum (Lisp-speak for an
immediate integer). The TADDCC instruction ("Tagged ADD and set Condition
Codes") was just like ADD, except that it would also set the overflow
condition code if either operand had a tag value other than 00 (it would also
be set if the addition overflowed). The next instruction would call a handler
if the overflow bit was set; the handler would determine whether (a) this was
a case of actual integer overflow requiring construction of a "bignum" (a
heap-allocated integer); (b) one or both operands were already heap-allocated
integers (or perhaps other kinds of numbers) requiring some other form of
addition; or (c) one or both operands were not numbers at all, a runtime
error. But the point is, the most common case was handled in two instructions,
and those could even be combined into one using the TADDCCTV variant, which
trapped automatically on overflow. (There were also TSUBCC and TSUBCCTV, of
course.)
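That two-instruction fast path can be sketched in portable C; the helper names and the use of GCC's `__builtin_add_overflow` are my illustration, not anything SPARC-specific:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_MASK 0x3  /* low two bits; 00 marks a fixnum */

/* Fixnums carry their value shifted left past the two tag bits. */
static intptr_t make_fixnum(intptr_t n) { return n << 2; }
static intptr_t fixnum_value(intptr_t f) { return f >> 2; }

/* Fast-path add: succeeds only when both operands carry the 00 fixnum
   tag and the sum doesn't overflow -- the same two conditions under
   which TADDCC leaves the overflow condition code clear. On a false
   return, a real runtime would branch to the slow-path handler
   described above (bignum construction, other numeric types, or a
   type error). */
static bool tagged_add(intptr_t a, intptr_t b, intptr_t *sum)
{
    if ((a | b) & TAG_MASK)
        return false;                           /* non-fixnum operand */
    return !__builtin_add_overflow(a, b, sum);  /* false on overflow  */
}
```

Note that because both fixnum tags are 00, the tagged representations can be added directly without untagging, which is exactly what made the single-instruction fast path possible.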

My understanding of the history here is that a lot of Sun people, notably Bill
Joy, came from Berkeley, where Franz Lisp was also developed; and in the
mid-1980s (the first SPARC came out in 1987) it still seemed possible that
Lisp would be of great commercial importance. AFAIK, these instructions were
dropped for the 64-bit SPARC.

But the point of posting this is to underscore that the use of "fixnums",
tagged immediate integers, is well known. Not only, as Juho has helpfully
uncovered, does it date back to the 1960s, but it had explicit support in what
was once a major microprocessor architecture.

I'll digress a little further and add that I really wish Java had used the
Lisp approach for arbitrary-precision integers. Then Joshua Bloch would never
have had to write this blog post [0], among many other benefits.

[0] [https://research.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html](https://research.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html)

~~~
FullyFunctional
Furthermore, SPARC, like most RISCs at the time, didn't support unaligned
accesses, so to a large extent you got tag checking for free. E.g. assuming
"01" tags for pointers and a simple two-word CONS cell, car would simply be
one load instruction.

    
    
      ld [%r1 - 1], %rd
    

If %r1 didn't have a "01" tag, this load would trap.
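In C terms (the names and the "01" tag value are just the assumption above), the address arithmetic looks like this:

```c
#include <assert.h>
#include <stdint.h>

/* A plain two-word cons cell; tagged pointers to it carry a 01 tag. */
typedef struct { intptr_t car, cdr; } cons_t;

#define PTR_TAG 0x1

static intptr_t tag_cons(cons_t *c) { return (intptr_t)c | PTR_TAG; }

/* car is a single load at offset -1, which cancels the 01 tag. On
   SPARC, any value without the 01 tag yields a misaligned address and
   the load traps, so the type check costs nothing on the fast path.
   (This C version shows only the address arithmetic; x86 would
   happily perform the misaligned load.) */
static intptr_t car(intptr_t tagged)
{
    return *(intptr_t *)((char *)tagged - PTR_TAG);
}
```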

Finally, SPARC reserved 8 registers for "global" use, that is, registers that
wouldn't otherwise be touched by the compiler. Using one was as simple as

    
    
      register int *sp asm("%g3");
    

which was immensely valuable for VM implementations (and others).

Unfortunately, the tagged add instructions didn't get much use; perhaps SPARC
wasn't a big enough base to justify designing around it.

~~~
lukego
... and these days tag checking seems to be almost for free on out-of-order
processors (e.g. x86) because the check is only a couple of instructions with
a predictable control dependency and no data dependency.

~~~
ScottBurson
Right. This is probably one reason TADDCC was omitted from SPARC64.

------
lukego
Just tangentially: I have high hopes for an experiment with a 96-bit word size
in RaptorJIT (a fork of LuaJIT, a compiler for a Lisp-like language.) That's a
32-bit tag and a 64-bit value.

This seems practical for a few reasons:

Full 64-bit pointers and numbers can fit into tagged words. (No boxing and so
no heap allocation for u64 and pointer arithmetic.)

Naked 64-bit values can still be stored in registers and used directly. The
JIT knows the type of each register and does not need to load the tag bits
(they are checked on load and added on store.)

The x86-64 target CPU is efficient for unaligned access.
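A minimal C sketch of such a 96-bit word (the struct layout, field names, and tag values are my illustration of the idea, not RaptorJIT's actual representation; `__attribute__((packed))` is GCC/Clang-specific):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 96-bit tagged word: a 32-bit tag next to a full 64-bit
   payload, 12 bytes in memory. Any pointer or u64 fits in the payload
   without boxing. */
typedef struct __attribute__((packed)) {
    uint32_t tag;
    uint64_t payload;   /* unaligned 8-byte field; cheap on x86-64 */
} tword_t;

enum { TAG_U64 = 1, TAG_PTR = 2 };  /* illustrative tag values */

/* The JIT keeps naked 64-bit values in registers; tags are added on
   store and checked on load. */
static void tword_store(tword_t *slot, uint32_t tag, uint64_t value)
{
    slot->tag = tag;
    slot->payload = value;
}

static int tword_load(const tword_t *slot, uint32_t expect, uint64_t *out)
{
    if (slot->tag != expect)
        return 0;       /* type check failed; a JIT would side-exit here */
    *out = slot->payload;
    return 1;
}
```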

We will see how this works out in practice. Link:
[https://github.com/raptorjit/raptorjit/pull/93](https://github.com/raptorjit/raptorjit/pull/93)

~~~
gsg
How do you pass arguments? In two registers? On the stack?

~~~
lukego
This turns out to be much simpler than expected, on paper at least.

First, this is a tracing JIT, so it basically inlines everything. There is no
argument-passing convention at all within a block of JIT code because
subroutine bodies are completely inlined. So passing tagged values as
arguments is actually a fairly infrequent operation.

Then in cases where tagged values really do need to be passed around they will
always be passed by pointer reference. The JIT will write them to memory - the
Lua stack or the heap - and pass a pointer reference in a register.

The really happy circumstance is that the whole C runtime system is already
written to reference tagged values using pointers, and there is already a port
that supports 64-bit Lua values using 32-bit machine words.

~~~
gsg
I see. It seems like that would still make calling supporting routines
annoyingly expensive, but perhaps that (and the extra tag memory) doesn't
matter as much as I think it does.

Good luck with the project.

~~~
lukego
Thanks!

There are really two kinds of supporting routines here.

Runtime system routines operate on tagged values and do have to swallow the
stack convention. Thankfully the code is already written in this style and the
JIT seldom invokes any of these routines in optimized programs (it generates
inline versions.)

Second is C routines called via the FFI. These use native C data types and the
standard platform calling convention; they never see the tagged values (they
get an untagged uint64_t in a register instead of a TValue struct.) These are
easy, cheap, and frequently used.

------
rjsw
I would read the PDP-6 Lisp description as meaning that negative and larger
integers were boxed, not that there wasn't a way of representing them.

~~~
jsnell
Indeed. But I think the text of the article already agrees with you :)

~~~
rjsw
Yes, you did write that you were just considering immediate values.

I'm not sure that PDP-6 Lisp used tags, though. MacLisp used BiBop to
determine the type of an object, so the entries in the type table
corresponding to the special pointers would have contained the fixnum type.

~~~
jsnell
They were still writing up BIBOP as the new hotness in 1977, so I'm pretty
sure this predates BIBOP by a decent margin.

But it's a fair point that the encoding in PDP-6 LISP doesn't match what we'd
think of as tagging today. They left the 0-pointer as a non-integer (probably
that meant NIL), so the type test would have been done using two comparisons
rather than a mask+compare.

