
The purpose of NaN Boxing - akg
http://programmers.stackexchange.com/questions/185406/what-is-the-purpose-of-nan-boxing
======
lmkg
NaN-boxing is morally equivalent to using tagged unions to represent values of
any type. It optimizes for fast floating-point operations, and for allowing a
larger amount of meaningful values to be represented in 64 bits.

Most tagged-union approaches use some sort of punning so that at least one
type of value is directly meaningful, without any bit manipulation. For
example, some Common Lisp implementations will choose their tag bits for Cons
cells such that NIL is represented with all zeros. Since NIL is the only
falsey value and everything else is truthy, this means that any bit-pattern
can be used directly for testing conditionals, with the correct semantics.

NaN-boxing is choosing a different type of punning for the tags, to optimize a
different use case. First, all bit-patterns can be used directly as 64-bit
floats with technically correct semantics--anything that's NaN-boxed is
actually not a floating-point number. I'm given to understand that modern
architectures are slow to mix float and bit ops, so it's nice that you don't
need to mask your tags off of your float before operating on them.

Second, (double-precision) floats are usually the only type that requires the
full 64 bits to have meaning. In many applications, 50ish bits is 'good
enough' for ints and pointers (with some extra ops to handle overflow), but
floats are mandated by a standard and don't scale down gracefully. A tagged
union wouldn't be able to contain a 64-bit float directly without spilling
into an extra machine word. Unless, of course, the tags are punned to be part
of the float value itself. And that's exactly what NaN-boxing is.
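
To make that concrete, here is a minimal C sketch of the idea. The layout is my own assumption for illustration (quiet-NaN prefix, a 3-bit tag in bits 48..50, a 48-bit payload), and every `nb_*` name is hypothetical; real engines each pick their own bit assignments. It also assumes pointers fit in 48 bits, which holds for typical user-space addresses on current x86-64 but is not guaranteed everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t nb_value;

/* Exponent all ones plus the quiet bit: every boxed value contains these. */
#define NB_QNAN      UINT64_C(0x7FF8000000000000)
#define NB_TAG_SHIFT 48
#define NB_TAG_MASK  (UINT64_C(7) << NB_TAG_SHIFT)        /* bits 48..50 */
#define NB_PAYLOAD   ((UINT64_C(1) << NB_TAG_SHIFT) - 1)  /* low 48 bits */
#define NB_TAG_INT32 (UINT64_C(1) << NB_TAG_SHIFT)        /* hypothetical tag */
#define NB_TAG_PTR   (UINT64_C(2) << NB_TAG_SHIFT)        /* hypothetical tag */

/* Doubles are stored as-is. Real NaNs are canonicalized to tag 0 so they
 * can never be mistaken for a boxed value. */
static inline nb_value nb_box_double(double d) {
    nb_value v;
    if (d != d) return NB_QNAN;
    memcpy(&v, &d, sizeof v);
    return v;
}

/* No masking needed: the bits are already a valid double. */
static inline double nb_unbox_double(nb_value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* A double is anything that is not a quiet NaN carrying a non-zero tag. */
static inline int nb_is_double(nb_value v) {
    return (v & NB_QNAN) != NB_QNAN || (v & NB_TAG_MASK) == 0;
}

static inline nb_value nb_box_int32(int32_t i) {
    return NB_QNAN | NB_TAG_INT32 | (nb_value)(uint32_t)i;
}

static inline int nb_is_int32(nb_value v) {
    return (v & (NB_QNAN | NB_TAG_MASK)) == (NB_QNAN | NB_TAG_INT32);
}

static inline int32_t nb_unbox_int32(nb_value v) {
    uint32_t u = (uint32_t)(v & UINT32_MAX);
    int32_t i;
    memcpy(&i, &u, sizeof i);   /* reinterpret the low 32 bits as signed */
    return i;
}

static inline nb_value nb_box_ptr(void *p) {
    nb_value a = (nb_value)(uintptr_t)p;
    assert((a & ~NB_PAYLOAD) == 0);   /* assumption: address fits in 48 bits */
    return NB_QNAN | NB_TAG_PTR | a;
}

static inline int nb_is_ptr(nb_value v) {
    return (v & (NB_QNAN | NB_TAG_MASK)) == (NB_QNAN | NB_TAG_PTR);
}

static inline void *nb_unbox_ptr(nb_value v) {
    return (void *)(uintptr_t)(v & NB_PAYLOAD);
}
```

Note that only the non-float cases pay for masking; `nb_unbox_double` is a plain bit copy, which is the use case this scheme optimizes for.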

------
gus_massa
In Racket, they use the alignment of the memory blocks:

* The even integer numbers are pointers.

* The odd integers represent fixnums (small integers) (n <---> 2*n+1).

With this representation the calculations with fixnums are quite fast because
they don't need unboxing.
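
A rough C sketch of that encoding, to show why addition needs no unboxing. This is not Racket's actual source; the `fx_*` names are made up for illustration, overflow is ignored, and it assumes `>>` on a negative `intptr_t` is an arithmetic shift (true on mainstream compilers, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t fx_word;   /* one machine word: tagged fixnum or pointer */

/* Odd words are fixnums; even words are (aligned) pointers. */
static inline int fx_is_fixnum(fx_word w) {
    return (w & 1) != 0;
}

/* n  -->  2*n + 1 */
static inline fx_word fx_make(intptr_t n) {
    return (fx_word)(((uintptr_t)n << 1) | 1u);
}

/* 2*n + 1  -->  n  (assumes arithmetic right shift for negatives) */
static inline intptr_t fx_value(fx_word w) {
    return w >> 1;
}

/* (2a+1) + (2b+1) - 1 == 2(a+b) + 1, so the sum is already tagged. */
static inline fx_word fx_add(fx_word a, fx_word b) {
    assert(fx_is_fixnum(a) && fx_is_fixnum(b));   /* debug-only check */
    return a + b - 1;
}
```

For example, `fx_add(fx_make(20), fx_make(22))` computes 41 + 45 - 1 = 85, which is `fx_make(42)`, with a single add and subtract on the tagged words.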

~~~
apaprocki
It probably isn't relevant to most people anymore, but misaligned memory
access on certain architectures (e.g. Sparc) incurs a penalty.

~~~
rayiner
The tagging scheme mentioned above accounts for that. Objects are stored at
aligned addresses, which leaves a couple or three bits available for
signifying integers and other data types. On architectures that trap on
unaligned accesses, you can actually use the trap to avoid an explicit type
check if you're expecting a pointer and someone passes an integer.
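
For illustration, a small C sketch of stealing the low three bits of an 8-byte-aligned pointer. The tag assignments and the `lt_*` names are hypothetical, not taken from any particular implementation, and the trap-based type check is not shown since it is platform-specific.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t lt_word;

/* With 8-byte alignment the low 3 bits of every object address are zero. */
#define LT_TAG_MASK ((uintptr_t)7)

/* Hypothetical tag assignments, for illustration only. */
enum lt_tag { LT_PAIR = 0, LT_FIXNUM = 1, LT_STRING = 2, LT_SYMBOL = 3 };

/* Combine an aligned address with a tag in the unused low bits. */
static inline lt_word lt_tag_ptr(void *p, enum lt_tag tag) {
    uintptr_t a = (uintptr_t)p;
    assert((a & LT_TAG_MASK) == 0);   /* object must be 8-byte aligned */
    return a | (uintptr_t)tag;
}

static inline enum lt_tag lt_tag_of(lt_word w) {
    return (enum lt_tag)(w & LT_TAG_MASK);
}

/* Clear the tag bits to recover the original address. */
static inline void *lt_untag(lt_word w) {
    return (void *)(w & ~LT_TAG_MASK);
}
```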

------
bitwize
_I'm not really sure I understood the utility of this technique, that I see as
an hack (it relies on the hardware not caring on the value of the mantissa in
a NaN) but coming from a Java background I'm not used to the roughness of C._

The use of "signalling NaNs" is supported by IEEE 754. It may be a hack, but
it's a hack that's in the standard.

------
malkia
LuaJIT was the first implementation I've seen, but its author Mike Pall said
somewhere that this was an old technique, or rather that it could've been
known before. I remember reading some Lisp paper from the 90s where different
encoding techniques/boxings were mentioned, and one of them was this (I think).

~~~
krickle
I first read about it on wingolog.org, and IIRC there was mention of an
implementation of Guile using it. I think that supports your paper as
evidence. Very cool technique, but I would rather give preference to integers
and pointers.

------
general_failure
v8 doesn't use this technique - they use SMI.

Read also
http://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations

