
Smuggling Data in Pointers - phamilton
http://www.undiscoveredfeatures.com/2011/11/smuggling-data-in-pointers.html
======
jbeda
I implemented this in the core data structures for IE5. We were very memory
constrained as we had to run on Win95 machines with something like 12MB of
RAM. I was counting bits.

The base level of the tree wasn't really a DOM of nodes but rather a stream of
document events (start element, text, end element, etc). We put these in a
binary tree that was balanced as a splay tree. To drive memory usage down I
reduced the {parent, left, right} pointers down to {child, next,
left_child_bit, next_is_parent_bit}. 'child' pointed to the first child. I
knew if it was the left or the right child based on the left_child_bit. 'next'
pointed either to the sibling or the parent, depending on the
'next_is_parent_bit'. However, to really realize the savings, I needed to hide
these bits in the pointers themselves.
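A minimal sketch of that encoding (names are illustrative; it assumes nodes come from malloc and are therefore at least 4-byte aligned, which frees the two low bits of each pointer):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LEFT_CHILD_BIT ((uintptr_t)1)  /* hidden in the low bit of 'child' */
#define NEXT_IS_PARENT ((uintptr_t)1)  /* hidden in the low bit of 'next'  */

typedef struct node {
    uintptr_t child; /* first-child pointer, tagged with left_child_bit      */
    uintptr_t next;  /* sibling-or-parent pointer, tagged with next_is_parent */
} node;

static void set_child(node *n, node *c, int is_left) {
    n->child = (uintptr_t)c | (is_left ? LEFT_CHILD_BIT : 0);
}
static node *get_child(const node *n)    { return (node *)(n->child & ~LEFT_CHILD_BIT); }
static int   child_is_left(const node *n){ return (int)(n->child & LEFT_CHILD_BIT); }

static void set_next(node *n, node *target, int is_parent) {
    n->next = (uintptr_t)target | (is_parent ? NEXT_IS_PARENT : 0);
}
static node *get_next(const node *n)      { return (node *)(n->next & ~NEXT_IS_PARENT); }
static int   next_is_parent(const node *n){ return (int)(n->next & NEXT_IS_PARENT); }
```

The accessors mask the bit off before dereferencing, so the rest of the tree code never sees the tags.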

Other tricks that we used to save memory:

1) Put rare data into a hash table indexed off of the 'this' pointer of the
object. Have a bit to indicate if it was there or not. We called these
lookaside pointers.

2) Embed objects in other objects to save pointers back and forth. The
embedded object would do math on its this pointer (based on bits for which
embedded member it is) to get the this pointer of the parent object.
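In C terms, that parent-recovery trick looks roughly like the following (names are illustrative; the Linux kernel's container_of macro does the same arithmetic):

```c
#include <assert.h>
#include <stddef.h>

struct embedded {
    int which;  /* plays the role of the "which embedded member am I" bits */
};

struct parent {
    int payload;
    struct embedded a;  /* which == 0 */
    struct embedded b;  /* which == 1 */
};

/* Recover the containing object from a pointer to an embedded member by
   subtracting the member's offset -- no back-pointer has to be stored. */
static struct parent *parent_of(struct embedded *e) {
    size_t off = (e->which == 0) ? offsetof(struct parent, a)
                                 : offsetof(struct parent, b);
    return (struct parent *)((char *)e - off);
}
```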

I'm not sure, but I suspect that part of why current IE is slower than it
should be is that a lot of these optimizations are no longer necessary and now
cause more problems than they solve.

~~~
gravitronic
Kudos to you for such a widely released implementation of somewhat insane
coding.

Crashes back in Windows 95 days suddenly make a lot more sense to me...

~~~
jbeda
Actually, by the time it shipped, 80% of the crashes were due to insane
reentrancy issues arising from the componentized nature of IE.

For example, when tearing down a page, we would go ahead and release the
last of the references to the ActiveMovie COM objects. That component was
using a separate thread and wanted to shut it down cleanly during clean up. It
used windows messaging to communicate to the thread and had to run a message
loop. That message loop would also dispatch messages to our top level window.
If the wrong message came in at the wrong time in this situation we would have
code dealing with a semi-destructed data structure and it would crash. I dealt
with a lot of these issues by being very defensive when calling out to
anything. Things like lots of null checks, saving references to be released
until the end, etc. Of course there was a perf cost for each of these that we
monitored closely.

I don't _think_ that any of these were the root of security problems. Those
were more due to the impedance mismatches at the level of the shell/browser
host and the URL delivery services.

------
ot
This technique is commonly called a "tagged pointer":

<http://en.wikipedia.org/wiki/Tagged_pointer>

They are commonly used in interpreter implementations to distinguish between
pointers and integers (if the LSB is 0 then the value is a pointer, otherwise
(v >> 1) is an integer value)
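A sketch of that convention (assuming heap pointers are at least 2-byte aligned, so their LSB is naturally 0):

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value;

static value box_int(intptr_t i)   { return ((uintptr_t)i << 1) | 1; }  /* LSB 1 = integer */
static value box_ptr(void *p)      { return (uintptr_t)p; }             /* LSB 0 = pointer */
static int   is_int(value v)       { return (int)(v & 1); }
/* Arithmetic right shift restores the sign; strictly implementation-
   defined for negative values, but universal in practice. */
static intptr_t unbox_int(value v) { return (intptr_t)v >> 1; }
static void *unbox_ptr(value v)    { return (void *)v; }
```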

~~~
sb
Just for the record and to provide additional references: the technique of
reusing a pointer for integer representation is rather old. For example,
fixnums refer to the same technique in Lisp implementations, and IIRC in
Smalltalk these are called SmallIntegers.

------
batterseapower
The Glasgow Haskell Compiler uses tagged pointers to implement lazy
evaluation. Suspended computations are created as heap-allocated "thunks", and
the runtime will tag any pointers to a thunk that is known to have already
been evaluated, which can save one indirection when you need to "force" the
thunk to get a value.

Furthermore, for thunks of algebraic data types with only a few alternatives
(e.g. booleans, the Maybe type) the tag will encode _which_ of the
alternatives the pointer actually points to. So if you are case scrutinising a
boolean you can decide which branch to jump to by inspecting the pointer only
with no memory access.

This last trick doesn't work if the number of alternatives is > 3 (for 4-byte
alignment) because there aren't enough tag bits available.
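A rough illustration of that constructor-tag scheme (not GHC's actual representation; with 4-byte alignment there are two tag bits, where 0 means "no information / possibly a thunk" and 1..3 name the constructor):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TAG_MASK ((uintptr_t)3)

/* Tag a (4-byte-aligned) heap pointer with a constructor index in 1..3. */
static void *tag_ptr(void *heap_obj, unsigned ctor) {
    return (void *)((uintptr_t)heap_obj | (ctor & TAG_MASK));
}
static unsigned ptr_tag(void *p) { return (unsigned)((uintptr_t)p & TAG_MASK); }
static void *untag_ptr(void *p)  { return (void *)((uintptr_t)p & ~TAG_MASK); }
```

Scrutinising the tag alone decides the branch; the object itself is only touched when its fields are needed.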

------
xenomachina
Some software for the Motorola 68000 did something a bit like this (though
probably more for space savings/convenience than atomicity): the 68k had
32-bit pointers but the top 8 bits were ignored so some programs would store
extra data there. You didn't even need to mask off the bits to dereference! I
remember the Amiga ROM Kernel Reference Manuals actually warned against doing
this, as later CPUs in the 68k family _did_ pay attention to the top byte in a
pointer.

~~~
yason
Surprisingly, the original AmigaBASIC written by _Microsoft_ (yes) used this
very technique and thus failed catastrophically on later AmigaOS versions
where the memory bus was no longer 24 bits.

Further, on the Amiga, any sensible pointers were generally 4-aligned anyway
(you couldn't read words or longs from odd addresses and I think reading longs
from 2-aligned addresses was slower, too, so nobody wanted to do that) so,
theoretically, if you were to abuse this to the maximum effect you could've
stashed 10 bits of extra payload data into an address: 8 bits in bits 24-31
and 2 bits in bits 0-1. Any pointer was internally just a
32-bit address register value so you certainly had bits to spare!

~~~
xenomachina
Ha! I didn't know AmigaBASIC did that.

I almost mentioned the fact that pointers were pretty much always aligned on
the 68k, but I never really heard of anyone using those bits to store anything.
The top-8 had the "nice" property that the hardware did the masking for you.

I do remember that certain parts of the Amiga OS (particularly in the file
system, I believe) used "BPTRs" which were essentially pointers shifted down 2
bits -- in other words, the longword index instead of the individual byte index.
Normal pointers were called "APTRs" by the code that used BPTRs.
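The conversion is just a 2-bit shift either way; a sketch (AmigaOS's real macros are BADDR/MKBADDR on 32-bit values -- here widened to uintptr_t so the round trip works on any machine):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t BPTR;  /* longword index; 32 bits on the real Amiga */

static BPTR  mk_baddr(void *aptr) { return (BPTR)((uintptr_t)aptr >> 2); }  /* APTR -> BPTR */
static void *baddr(BPTR b)        { return (void *)((uintptr_t)b << 2); }   /* BPTR -> APTR */
```

The round trip only works because every legal APTR here is 4-byte aligned, so the two bits shifted out are always zero.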

~~~
yason
Yes, the APTR / BPTR divergence had something to do with AmigaDOS (the disk
operating subsystem) being originally written in BCPL. While it was later
rewritten in C, the ABI had to remain compatible and thus the pointer-mangling
continued. I don't remember if they had any use for the extra two bits or
whether the addressing scheme was coined just to access a 4x larger address
space.

------
regularfry
This is precisely how Ruby's object ids work. An object id is an 8-byte-aligned
pointer to the object's location in memory. The low 3 bits are used to
identify booleans, fixnums, nil and symbols.

I'm sure this was borrowed from a lisp implementation.

~~~
arethuza
Lisp machines typically had direct hardware support for these tags - Symbolics
machines used 36 bit words, 32 bits of data and 4 bits of tags:

<http://en.wikipedia.org/wiki/Symbolics>

~~~
dfox
Also, SPARC has direct hardware support for this (for the cases that are
relevant for JITed code, i.e. arithmetic). But sadly, on current
implementations it is often emulated somehow and thus slow.

~~~
rayiner
It's because there isn't much point. Most Lisps use a tagging scheme like:

  00 = Integer
  01 = Pointer

With 00 as the integer tag, integer addition and subtraction can be done
directly without masking off the tags. Multiplication and division need the
tags masked, but are slow anyway. The tag can also be stripped off for free on
any architecture that offers BASE + OFFSET addressing (just subtract 1 from
the desired constant offset).
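Concretely, with a 2-bit tag where 00 marks integers, a fixnum is just the value shifted left 2, and tagged addition and subtraction are plain machine addition and subtraction (an illustrative sketch):

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t lispval;

#define FIXNUM(i) ((lispval)(i) << 2)  /* low bits 00 = integer */
#define FIXVAL(v) ((v) >> 2)

/* 00 + 00 = 00 and 00 - 00 = 00, so no untagging is needed. */
static lispval fix_add(lispval a, lispval b) { return a + b; }
static lispval fix_sub(lispval a, lispval b) { return a - b; }
```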

------
kaushiks
This is also a common technique used by modern JavaScript implementations. A
much less common and slightly more interesting approach (to me) is NaN boxing,
which packs tags into the unused NaN bit patterns of IEEE-754 doubles.

Chakra uses the bottommost bit to distinguish SMIs from pointers.

~~~
ralph
LuaJIT uses NaN-tagging too.

------
apaprocki
While this tagged pointer construct is pretty common and makes some people
retch, realize that the entire web is built upon this same concept, only
bigger. The NaN/nun-boxing used by various dynamic engines (e.g. all the major
JS engines) relies on the fact that pointers on x86-64 hardware only use
48 bits of the memory address space. They use the remaining 16 bits of the
double for flags describing what type of object it is. This has the nasty side
effect of preventing these engines from running on any 64-bit architecture
that does not have this address-space restriction (e.g. SPARC, POWER).
One either has to force the environment to only allocate space out of the
lower 48 bits or come up with some other kind of encoding scheme to indicate
that an address is a "far" address and should be added to a 48-bit base.
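A minimal sketch of the idea (an illustrative layout, not any particular engine's): doubles are stored verbatim, and a pointer is stuffed into the payload of a quiet-NaN bit pattern that no ordinary double produces, relying on x86-64 addresses fitting in 48 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t boxed;

#define TAG_MASK 0xFFFF000000000000ULL
#define PTR_TAG  0xFFFC000000000000ULL  /* sign + quiet-NaN bits: unused by real doubles */

static boxed  box_double(double d)  { boxed b; memcpy(&b, &d, sizeof b); return b; }
static double unbox_double(boxed b) { double d; memcpy(&d, &b, sizeof d); return d; }
static int    is_ptr(boxed b)       { return (b & TAG_MASK) == PTR_TAG; }
static boxed  box_ptr(void *p)      { return PTR_TAG | (uint64_t)(uintptr_t)p; }
static void  *unbox_ptr(boxed b)    { return (void *)(uintptr_t)(b & ~TAG_MASK); }
```

A real engine would also canonicalize incoming NaNs so a computed NaN can never masquerade as a pointer.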

------
agazso
In the Linux kernel, the internal Red-Black tree (rbtree) implementation uses
this trick to save memory.

Quoting LWN:

"The anticipatory, deadline, and CFQ I/O schedulers all employ rbtrees to
track requests; the packet CD/DVD driver does the same. The high-resolution
timer code uses an rbtree to organize outstanding timer requests. The ext3
filesystem tracks directory entries in a red-black tree. Virtual memory areas
(VMAs) are tracked with red-black trees, as are epoll file descriptors,
cryptographic keys, and network packets in the "hierarchical token bucket"
scheduler."

<http://lwn.net/Articles/184495/>
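The kernel's trick is to fold each node's red/black color (one bit) into the otherwise-aligned parent pointer; a sketch of the same idea (illustrative names -- the kernel's combined field is called __rb_parent_color, if I recall correctly):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RB_RED   0u
#define RB_BLACK 1u

struct rb_node {
    uintptr_t parent_color;  /* parent pointer with the color in bit 0 */
};

static void rb_set_parent_color(struct rb_node *n, struct rb_node *parent,
                                unsigned color) {
    n->parent_color = (uintptr_t)parent | color;
}
static struct rb_node *rb_parent(const struct rb_node *n) {
    return (struct rb_node *)(n->parent_color & ~(uintptr_t)1);
}
static unsigned rb_color(const struct rb_node *n) {
    return (unsigned)(n->parent_color & 1);
}
```

That saves one word per node, which adds up across the millions of rbtree nodes a running kernel holds.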

------
rvkennedy
If you're serious about smuggling data in pointers, there's probably a lot of
mileage on the other end of the pointer if you're writing 64-bit code. While
32 bits can address 4GB, you only need another 8 bits (say) to get a terabyte,
leaving maybe 24 bits to cram with whatever you feel like.
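One way to sketch that (a hypothetical layout: a 40-bit offset from some arena base covers a terabyte, leaving the top 24 bits as payload):

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 40
#define OFFSET_MASK ((1ULL << OFFSET_BITS) - 1)

typedef uint64_t fatptr;

/* 'extra' must fit in 24 bits; 'offset' is relative to an arena base. */
static fatptr fat_make(uint64_t offset, uint32_t extra) {
    return ((uint64_t)extra << OFFSET_BITS) | (offset & OFFSET_MASK);
}
static uint64_t fat_offset(fatptr f) { return f & OFFSET_MASK; }
static uint32_t fat_extra(fatptr f)  { return (uint32_t)(f >> OFFSET_BITS); }
```

Dereferencing then costs one add (base + offset), which BASE+OFFSET addressing modes give you for free.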

~~~
dubya
The size of 64-bit pointers is something Don Knuth has complained about,
because it wastes a lot of memory for programs that don't need that much
address space. And it actually matters for some of the stuff he writes. The
old Motorola had an addressing mode that used 16-bit relative pointers. Does
Intel have anything similar for x64?

~~~
dubya
Replying to myself :( The mode was called PC-relative addressing, and resulted
in smaller, faster code on the 68000, but things had to be written in 32k
chunks. I seem to recall that it was required for the original Macintosh
before they wrote a proper program loader. Anyhow, some of Knuth's code (his
BDD package at least) has hacks to reduce the size of pointers.

------
exDM69
This is a really dirty hack to do and it only works if your addresses are
always aligned to a 2^n byte boundary. That said, there are plenty of cases
where this dirty hack is perfect for the job.

Atomic lock-free data structures are one example. Tagging the pointers can be
used to signal that a node is going to be deleted, in order to avoid the ABA
problem (as mentioned in the article). Tagging pointers is absolutely
essential for more complex lock-free data structures, like doubly linked
lists.

This can also be used in a garbage collector to add a few bits of metadata. In
a GC you can (and probably should) decide to use memory objects with proper
alignment, which leaves you with a few bits to store data in. A perfect
solution to implement something like Lisp cons cells.

------
anon1385
OS X Lion uses tagged pointers for some CF objects:
[http://objectivistc.tumblr.com/post/7872364181/tagged-pointe...](http://objectivistc.tumblr.com/post/7872364181/tagged-pointers-and-fast-pathed-cfnumber-integers-in)

Search for OBJC_IS_TAGGED_PTR in
[http://opensource.apple.com/source/objc4/objc4-493.9/runtime...](http://opensource.apple.com/source/objc4/objc4-493.9/runtime/objc-private.h)

