
XOR Linked List - no_more_death
http://en.wikipedia.org/wiki/Xor_linked_list
======
duskwuff
It's a cute trick, but please don't use it. It will confound debuggers,
garbage collectors, memory leak detectors, and future readers. :)

~~~
DennisP
Oh, I dunno. Every now and then I write a massive simulation of some p2p idea
or other and max out my RAM. This might just come in handy.

~~~
sukuriant
So that you can get 30 more nodes? Surely the L/R pointers of the linked lists
aren't taking up all the memory in your program

~~~
alecbenzer
If you're on a 64-bit machine storing 32-bit ints, this will decrease the node
size from 32 + 64 + 64 = 160 bits to 32 + 64 = 96 bits, so you should be able
to store about 66% more data.

edit: In fact, when do you ever really need to store particularly large
objects in something like a linked list? It seems like you can always get some
relatively small reference to the data (e.g., a pointer) and store that in
your list instead of storing your large objects themselves.

~~~
benwr
If you're storing references to larger objects, your larger objects are what's
taking up the space, which I think was the point (66% more pointers are
useless if you can't also store 66% more of the objects they point to).

Also, I think on a 64-bit machine with common DDR SDRAM, a memory location is
64 bits wide, no matter the size of the integer contained there. Not sure
about this, however.

~~~
zanny
Most modern architectures like AMD64 use 40 bit memory references in the CPU,
but write them as 64 bit to memory. That is why they have much lower
addressable memory than the logical limit.

~~~
DiabloD3
I think Bulldozer and Ivy Bridge are both 48, but I'm not going to dive into
500 page PDFs to find out.

~~~
zanny
Maybe, I just remember that number from PowerPC / ARM back in my assembly
class in 2010.

------
seanmcq
Please profile tricks like this, as they may actually be significantly slower
than their naive counterparts on modern hardware. The prefetcher knows what a
linked list looks like, and it knows how to get it somewhere closer than main
memory before the nodes are needed.

~~~
jacquesm
> The prefetcher knows what a linked list looks like

That's some pretty advanced magic.

Even the compiler has very limited insight into what your code is actually
doing without simulating it. The prefetcher might be able to look a bit ahead
in the execution stream and do branch prediction, but _absolutely no way_ does
that extend to knowing things about your data structures.

Unless I have just been transported by a time warp I really think this is
fiction.

If you're thinking of cache pre-fetching, that actually has a really hard time
dealing with structures like linked lists, because it has no idea at all about
the data structure it is looking at. The 'next' and 'previous' pointers in the
linked list might simply be values without any significance at all, and if
they are dereferenced as pointers, that memory could be just about anywhere
within the valid address range.

For arrays on the other hand such pre-fetching can be useful.

~~~
exDM69
> Unless I have just been transported by a time warp I really think this is
> fiction.

Nope, the Intel guys and gals do this kinda magic day in, day out. Or at least
the chips they manufacture do. I'm not familiar with the internals of the
prefetcher of any CPU at this level, but let me wave hands here. This is what
the prefetcher could do:

All it takes is for the prefetcher to get a cache line when requested, then
observe what is inside, looking for pointer-sized, aligned values that look
like pointers and are pretty close to the original cache line's (virtual)
address. If these values happen to be sane virtual addresses in the current
process, the prefetcher might as well fetch them one cache level closer to the
CPU. If it hits, it might yield a big performance boost in real-world apps.
If it misses, it's just a little wasted electricity.

All modern CPUs do dirty little tricks like this if it helps them outshine
their competitors.

Btw. you can add prefetch instructions to your code manually if you do linked
list traversals or similar. In GCC you can use the __builtin_prefetch()
compiler intrinsic.

~~~
malkia
Any refs, docs about this behavior? First time hearing it. How does it work
with MMU, protected memory, or external memory mapped as normal one?

I know a bit about prefetch and its variants, but haven't directly used it in
10 years (I was on a software project that had one of the first Katmais, back
in 1999, doing bilinear/bicubic texture filtering for a media/drawing
application, and had to code extensively in MMX assembly back then).

~~~
exDM69
> Any refs, docs about this behavior? First time hearing it. How does it work
> with MMU, protected memory, or external memory mapped as normal one?

edit: try to look for performance tuning guides for your CPU architecture,
there might be some details about systems like this.

Nope. Some might exist but this stuff is generally considered trade secrets. A
patent search might yield something. You can find some material about
speculative execution, branch prediction, prefetching, etc but the real beef
is hidden somewhere in Intel's (and their competitors') vaults.

It's all supposed to be transparent to the programmer so there's no need to
write and release detailed public documentation about it.

As I said earlier, I have no idea how they work (or whether they even exist),
but I'll give a handwaving example of how they _could_ work.

------
peterderivaz
The nicest use of a similar XOR trick I have seen is to eliminate dead cycles
in hardware bus arbiters.

Suppose you need to multiplex between inputs A,B,C to make output D (e.g. you
have a 3d block, a display engine, and a CPU all trying to access DRAM). The
idea is to xor all valid inputs together.

When there is only one input, this results in output=input. If all 3 are
active, then the output is A^B^C. The trick is that on the next cycle only the
granted input (say A) is removed, so the output is B^C.

Receivers are required to XOR two sequential values to extract the actual
data. In this case (A^B^C)^(B^C)=A.

See the "NoX Router" for details and explanation of how this compares to
alternative arbitration approaches:
pharm.ece.wisc.edu/papers/micro2011_nox.pdf

~~~
pheon
oooh very cool.

------
TheBoff
Believe it or not, we actually had a C exam question where we were asked to
sketch out an implementation of this.

[http://www.cl.cam.ac.uk/teaching/exams/pastpapers/y2010p3q6....](http://www.cl.cam.ac.uk/teaching/exams/pastpapers/y2010p3q6.pdf)
With classic mildly amusing exam humour introduction!

~~~
goldeneye
This question still haunts me.

------
shirro
I haven't used an XOR linked list since my Amstrad 6128. I think I probably
discovered it independently, but I'm not sure; it's a pretty obvious thing to
do on a memory-limited machine. Nice to know people still know about this
stuff after the 8-bit micro era, but outside of embedded systems, why would
anyone use this today?

~~~
excuse-me
Saving memory still matters if you are trying to fit something into cache -
although the items you are storing in the list would have to be pretty small
for the extra pointer to matter.

~~~
shirro
Most people don't need more than 4G per process, so I wish something like the
Linux x32 ABI had become standard. All the extra registers without the huge
pointers. It would be handy for squeezing more into a cheap VPS, not to
mention the CPU cache.

------
Tossrock
Anyone actually using this is almost certainly committing a heinous premature
optimization.

~~~
glimcat
Why would you not just do the following?

interface LinkedList

class BasicLinkedList implements LinkedList

class XORLinkedList implements LinkedList

~~~
kevingadd
Hm, I'm super concerned about performance! I know, I'll turn every function
call into a virtual function call!

~~~
ComputerGuru
Don't be silly, the principle is sound.

    
    
        #ifdef _DEBUG
        typedef BasicLinkedList LinkedList;
        #else
        typedef XorLinkedList LinkedList;
        #endif

------
no_more_death
I know, maybe it's not very useful.

But isn't it fascinating? Some information moved from the link node into the
state of the program processing it. This halves the link redundancy of a
doubly linked list. Suddenly a doubly linked list has the footprint of a
singly linked list. How did that happen? :-)

It taught me something. Tricks like this make me a better programmer, even if
I won't use the principle involved in this exact way.

------
dubya
This is an exercise in Knuth, TAOCP v.1.

------
drallison
As I remember, this was used in the operating system of the XDS Sigma 7.
Decoding core dumps when things went awry was a challenge, but it did save
memory.

------
mkup
Some implementations of heap memory allocators manage free memory chunks as
doubly-linked lists. There are multiple lists, one per chunk size: 4, 8, 16,
32, ..., 2^n bytes.

This XOR linked list hack makes sense for the smallest chunk sizes. On a
32-bit system, two pointers take 8 bytes, so the minimum size of an allocated
memory block is 8 bytes without this hack and 4 bytes with it. On a 64-bit
system, the hack makes it possible to allocate an 8-byte block from the heap
without rounding up to 16 bytes.

For other practical purposes, I think that trading speed and code
maintainability for memory size isn't a good idea.

------
joeld42
Cool! Hadn't seen that before, neat trick.

Seems to me like (as other commenters have pointed out) this wouldn't be a
great idea for something that is stored as a straightforward linked list.

Where it seems really useful to me is a cheap way to store an alternate
ordering on top of an existing data structure, without having to keep a
separate list around. I've done this before, using next pointers to preserve
an alternate ordering, but this is a nice way to allow it to be bidirectional.
neat.

------
Vlaix
It looks fun and all, but I really wonder about the implementation and
practicality of it in C. But once again, it does look really cool.

~~~
nknight
It's technically undefined behavior, but in practice it will work perfectly
well on almost all C implementations, and in the infinitesimal number of cases
where this structure is actually useful, portability isn't a major concern.

~~~
mzl
Just curious, what part of an XOR-linked list is undefined? AFAIK, casting to
and from sufficiently wide integers is ok.

~~~
zurn
I remember hearing claims that a C++ implementation would be allowed to use
conservative GC (and nop out free/delete). I wonder what the language lawyers
make of this.

~~~
nuje
What about destructors? I guess only using the GC on destructorless objects
could work.

------
codezero
Just curious, this wouldn't work for a circularly linked list in the case
where there is only one element, right?

~~~
tresta
Why wouldn't it?

A xor A = 0.

A xor 0 = A.

That should work like a regular circularly linked list or am I missing
something?

 _edit_

Doh, now I understand. If this was allowed it wouldn't be possible to have a
one-node non-circular list.

~~~
bbrtyth
The non-circular one-node case is the mirror image: its link is NULL xor NULL,
which is also 0, so the two cases are indistinguishable.

------
pheon
It only works if you are linearly iterating from the start/end of a list.

IMHO one of the (few) primary benefits of linked lists is random access:
inserting/deleting/iterating at an element without having to traverse the
entire list. That breaks with XOR lists.

Secondly, if you have so many nodes that the memory saving of one pointer is
significant, traversing that list will likely take forever and force you to
use a different algorithm anyway.

... an improvement on basic linked lists would be 16B/32B/128B/512B etc.
alignment of the nodes' memory addresses, with the left/right offsets
bit-packed in units of the alignment. Hell, you could probably beat XOR lists
for space on a range of workloads.

~~~
adsr
A regular linked list doesn't have random access to elements; typically a head
node is passed to a function that iterates the list.

~~~
pheon
.. or you pass any node to a function that operates on the list: iterate
up/down, insert/delete, etc.

The point being, you don't have to iterate from the start of the list.

~~~
dpark
So your function needs to be passed current and next (or prev, depending on
which direction you want to iterate). XOR-lists have lots of issues. This
isn't one of them.

~~~
pheon
And you're back to storing 2 pointers again.

~~~
dpark
No you're not. Your function needs to accept two pointers. Your data structure
doesn't need to store two pointers.

Unless you're storing external pointers to every node (in which case you'll
actually break even with the traditional implementation in terms of size),
you'll still have significant savings. Passing an extra pointer to a function
is pretty trivial in terms of size.

