My favourite cursed XOR trick that I think wasn't mentioned is XOR doubly-linked lists. https://en.m.wikipedia.org/wiki/XOR_linked_list

Instead of each node storing the next- and previous-pointers separately, store a single pointer which is the XOR of the two. Which is obviously an invalid pointer. But when iterating, XOR the previous node's pointer with the combined pointer to get the next node's pointer, and so on. You can iterate this way in both directions. Feels illegal. :)
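
A minimal sketch of the traversal in C (node layout and names are mine; it assumes the usual pointer/uintptr_t round trip behaves and that (uintptr_t)NULL == 0, neither of which the standard strictly guarantees, as discussed below):

  #include <stdint.h>
  #include <stddef.h>

  struct node {
      int value;
      uintptr_t link;  /* (uintptr_t)prev ^ (uintptr_t)next */
  };

  /* Start at either end with prev = NULL; XORing link with the address
     we came from yields the address we go to next. */
  void traverse(struct node *head) {
      struct node *prev = NULL, *cur = head;
      while (cur != NULL) {
          /* ... visit cur->value ... */
          struct node *next = (struct node *)(cur->link ^ (uintptr_t)prev);
          prev = cur;
          cur = next;
      }
  }

Running the same loop from the tail walks the list backwards, with no code change beyond the starting node.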




One thing this loses relative to a conventional doubly linked list is the ability to remove an item while having only its address (or any other iterator pointing to it that is stable across other items being inserted or deleted). And unfortunately that’s often why you’re using a doubly linked list in the first place.

(A less intrinsic flaw is that coding an XOR linked list in strictly conformant C is exceedingly annoying. The standard does not guarantee that equal pointers turn into equal integers upon casting, so you’re forced to make everything into an uintptr_t to, essentially, maintain a canonical integer-cast version.)


> The standard does not guarantee that equal pointers turn into equal integers upon casting, so you’re forced to make everything into an uintptr_t to, essentially, maintain a canonical integer-cast version.

Of course! The standard does not guarantee that the size of an int is the same as the size of a pointer, i.e. `sizeof(int) != sizeof(int*)`. IIRC, this was the case on some of the PowerPCs I worked on decades ago. Now with x86-64 & AArch64 having taken over the world (and with saner 32-bit on the embedded end) we've almost forgotten the Cambrian explosion of "interesting" word-size machines.

The whole point of the C standard is that it allows implementers flexibility in the sizing of the basic data types, to match a given machine's architecture, with the standard only defining relationships between types (e.g. char <= short <= int <= long). The only surprise is that it took so long for fixed-width types to be standardized (C99).


I meant integers in general, not ints specifically. That is to say, the standard does not guarantee that, if p and q are (say) void pointers and p == q, then (uintptr_t)p == (uintptr_t)q. (Neither does it guarantee that, if p == NULL, then (uintptr_t)p == 0, but I digress.) Mere size shenanigans are not enough for things to get that bad. What you need is for there to be multiple ways to represent a pointer to the same place in memory, like on real-mode x86 in the large memory model.

The practical result is, you need to write XOR-linked-list operations something like this:

  struct node { uintptr_t link; };  /* link = prev-address ^ next-address */
  void prepend(uintptr_t head, struct node *n) {
          struct node *h = (void *)head;           /* sentinel; its "prev" is 0 */
          uintptr_t node = (uintptr_t)(void *)n;   /* canonical integer for n */
          n->link = head ^ h->link;                /* n's neighbours: head and the old first node */
          ((struct node *)(void *)h->link)->link ^= head ^ node;  /* old first: swap head for n */
          h->link = node;                          /* head's only neighbour is now n */
  }
and you cannot, for example, make head, the sentinel node pointer, into a struct node * instead; it has to be exposed as an uintptr_t (a canonical integer representation of that pointer).


Wow, TIL. So in the version unaware of this, the list could get corrupted if the pointer you use, while correctly pointing to the head or tail of the list, happens to convert to a different bit representation for the XOR than the one encoded into the node. Did you ever happen to see this in a real system?


I don’t have it in me right now to work out whether this can actually cause corruption, but as for equal pointers simply converting to unequal integers, sure:

  $ cat mismatch.c
  #include <stdio.h>
  char abc[48*1024U], def[48*1024U];
  int main(void) {
      void *p = &abc[sizeof abc], *q = &def[0];
      printf("%d\n", p == q); /* pointers are equal */
      printf("0x%.5lX == 0x%.5lX\n", /* linear addresses are equal */
          ((unsigned long)p >> 16 << 4) + (unsigned short)p,
          ((unsigned long)q >> 16 << 4) + (unsigned short)q);
      printf("0x%.8lX != 0x%.8lX\n", (unsigned long)p, (unsigned long)q);
      return 0;
  }
  $ # compile&link for DOS, 8086, huge memory model, create map file
  $ wcl -bcl=dos -0 -mh -fm mismatch.c
  Open Watcom C/C++16 Compile and Link Utility Version 1.9
  [snip]
  creating a DOS executable
  $ egrep '_abc|_def' mismatch.map
  0371:0000+     _abc
  0f71:0000+     _def
  $ emu2 mismatch.exe
  1
  0x10080 == 0x10080
  0x0408C000 != 0x10080000
(I said large memory model earlier. That was incorrect: if you compile for the large memory model, with -ml, the first line of the output will be 0, because then pointer comparisons will not canonicalize pointers. You need the huge memory model for that. Both 0 and 1 are OK according to the standard, as it does not guarantee anything about comparing a pointer one element past the end of one object to a pointer to another object.)


On modern 64-bit machines, sizeof(int) != sizeof(int*) certainly holds. But there's probably a significant amount of code that assumes it's equal to sizeof(uint64_t) or sizeof(long). :)


True, but if you always keep a pair of adjacent pointers where before you would have kept just the first, efficient deletion and insertion come back. (You could even go the whole hog and encapsulate this pair as a new WidePointer type, with operator++() and operator--() internally operating on both, etc.)

This will likely be a small constant factor slower for some operations, of course, but if you need bidirectional traversal and node values are small compared to pointers, it's a solid space savings -- and the smaller cache footprint that results may even make it a speed win overall.
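
A sketch in C of that idea (wideptr and remove_at are made-up names; same node layout as the sketch upthread): carrying the previous node alongside the current one is exactly enough to step either way and to unlink the current node in O(1).

  struct wideptr { struct node *prev, *cur; };

  /* O(1) removal of it->cur: its two neighbours are recoverable again. */
  void remove_at(struct wideptr *it) {
      uintptr_t prev = (uintptr_t)it->prev;
      uintptr_t cur  = (uintptr_t)it->cur;
      struct node *next = (struct node *)(it->cur->link ^ prev);
      if (it->prev) it->prev->link ^= cur ^ (uintptr_t)next;  /* swap cur for next */
      if (next)     next->link     ^= cur ^ prev;             /* swap cur for prev */
      it->cur = next;  /* iterator moves to the following node */
  }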


Storage can be reduced further if we accept that, even on a 64-bit processor, a 32-bit address space is probably enough for most applications (those requiring less than 4 GB of RAM).

Maybe we can go even deeper with 16-bit near/relative pointers. Perhaps data-oriented design fits well in this situation? With blocks of 64k elements and uint16 indices addressing elements within them.
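
Half-serious or not, a sketch of the block scheme in C (all names here are made up): a "far" reference is a block number plus a 16-bit slot, and links that stay within one block need only the uint16.

  #include <stdint.h>

  struct item { uint16_t next; /* intra-block link: 2 bytes instead of 8 */ };
  struct far_ref { uint32_t block; uint16_t slot; };

  extern struct item *blocks[];  /* each block is one allocation of 64k items */

  static inline struct item *deref(struct far_ref r) {
      return &blocks[r.block][r.slot];
  }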


> with a 64-bit processor, probably a 32-bit address space is enough for most applications

This is related to the Java virtual machine's use of 32-bit "compressed ordinary object pointers (OOPs)" on a 64-bit platform. Objects are 8-byte aligned, though, so a 32-bit compressed pointer can address 32 GiB of memory. There is also a non-compressed-OOPs mode that can address more than 32 GiB.
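
Roughly, and glossing over the zero-based vs. non-zero-based variants, the decoding works like this (a hand-written sketch, not the JVM's actual code):

  #include <stdint.h>

  static uint8_t *heap_base;  /* start of the managed heap */

  /* A 32-bit compressed oop is an offset in 8-byte units from heap_base,
     so 2^32 units * 8 bytes = 32 GiB of heap is reachable. */
  static inline void *decode(uint32_t oop) {
      return heap_base + ((uint64_t)oop << 3);
  }
  static inline uint32_t encode(void *p) {
      return (uint32_t)(((uint8_t *)p - heap_base) >> 3);
  }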


I see that you're being silly, but the problem is conflating pointers and indices. 16-bit indices are fine. 16-bit pointers are terrible.

Separately, 32-bit pointers are a good optimization in many situations. Java will save lots of space by using 32-bit pointers until you go above 32 GB or 64 GB or more (depending on what you set as the minimum object alignment).


16-bit relative pointers are about as useful/terrible as 16-bit indices.


A relative pointer with limited range is something that should rarely exist, and all too easily it can end up with a target that is no longer guaranteed to be close to its base. There are a lot of bonus landmines compared to just having a 64k-element limit.


Something like Rust's borrow checker can make sure that your references / pointers only live as long as they should.


By "landmine" I mean situations where it gets very unpleasant to use because the data isn't guaranteed to be close enough to fit into one of those pointers. Even if it's true when the code is first written, minor updates might change that. I don't think something like a borrow checker can do much to help with that.


Yes, you lose flexibility. There's a price to pay for the increased performance.


Let me put it this way: 16-bit indices are already rarely worth it. 16-bit pointers are 10x the trouble for 20% more situational coverage. It's not worth it. They are not similar levels of useful/terrible.


Relative pointers and indices are pretty much the same; it's just the ergonomics that differ a bit (depending on your language or library).

I agree that for most people in most situations neither of them are worth it.


Indices all go in the same allocation, which makes a big difference. You're just putting a max size on your arrays, instead of making your memory allocations a lot more complicated. And if you need bigger, you'll know in a much more straightforward and deterministic way. It also doesn't require the same support from the language, whether that's explicit small pointer support or support for turning integers into pointers.


So essentially, you are using indices rather than pointers.

Indices have advantages: they can be more compact, and they are position-independent. But you need a base pointer, and a bit more computation. It also requires you to have your own allocator, as your typical heap allocator will just give you an arbitrary pointer to your object.
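
The usual shape, as a C sketch (arena, NIL, and the sizes are made up): the arena is the base pointer, indices are the "pointers", and allocation is a bump or free-list scheme you run yourself.

  #include <stddef.h>
  #include <stdint.h>

  #define NIL UINT32_MAX  /* plays the role of NULL */
  #define ARENA_CAP (1u << 20)

  struct node32 { uint32_t prev, next; int value; };  /* 4-byte links, position-independent */

  static struct node32 arena[ARENA_CAP];  /* our own pool */
  static uint32_t arena_used;             /* trivial bump allocator */

  static inline struct node32 *deref(uint32_t i) {
      return i == NIL ? NULL : &arena[i];
  }
  static inline uint32_t alloc_node(void) {
      return arena_used < ARENA_CAP ? arena_used++ : NIL;
  }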


Re: data-oriented data structures... a random tangent (not XOR-related)...

I have been running multiple iterations of a full port scan of the entire IPv4 space. For now I'm only scanning the top 25 most-used ports (nmap and masscan ship such toplists). I wanted to be able to do simple data analysis _very_ efficiently time-wise, and was OK with sacrificing memory for that.

If you want (time-wise) O(1) insert (into a sorted structure, btw), fetch, and lookup, and want checking a port's status to be simply a matter of bit shifting (similarly for counting), then:

1. A bitfield array storing the status of up to 32 ports per address (1 bit per port state (open/closed) => a 32-bit bitfield)

2. ...that's it. Each scan result is found at `bitfields[(unsigned int) ipv4_address]`.

In C:

```
  // bitfield for port status, for each IP (bool bitfields need <stdbool.h> pre-C23)
  struct port_field {
    bool p1:1;
    bool p2:1;
    bool p3:1;
    bool p4:1;
    bool p5:1;
    // in C, gotta write it out - of course we could use a macro to generate this
    ...
    bool p32:1;
  };
```

This will use 16 GiB of memory for the whole mapped space:

```
  #define NUM_IPV4 4294967296UL
  // ...
  // sizeof(struct port_field) => 4 bytes
  struct port_field *ip_space = calloc(NUM_IPV4, sizeof(struct port_field));
```

When scanning (or reading in scan results):

```
  in_addr_t u32_ip; // unsigned 32-bit int, from <arpa/inet.h>
  struct port_field *p_target_bitfield;
  uint32_t *p_target_u32;

  // ... to insert:
  // (inet_addr returns INADDR_NONE on error, not 0; `token` is a string
  // holding the IP, e.g. read from a text file)
  if ((u32_ip = inet_addr(token)) == INADDR_NONE) {
    printf("line %lu: IPv4 address not valid: %s\n", line_count, token);
  } else {
    p_target_bitfield = &ip_space[u32_ip]; // the unsigned 32-bit IPv4 as 'index'
    // (note: inet_addr yields network byte order; apply ntohl() first if the
    // array order should match numeric IP order)
    p_target_u32 = (uint32_t *) ((void *) p_target_bitfield); // view bitfield as uint32_t

    // set the bit at port_index (unsigned literal: 1 << 31 would overflow an int):
    *p_target_u32 |= 1u << port_index;
    // the status of the port identified by `port_index` is now the bit
    // `*p_target_u32 & (1u << port_index)`
  }
```

It works - pretty nifty :) I'm sure I could make it much prettier, though.

But it's a kind of 'columnar-bitfieldish' in-memory store with O(1) everything :)


Wouldn't a columnar store be 25 512MB arrays? And that's probably a better layout for doing analysis with.

Also you might as well set the number of addresses to 224<<24.
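
Concretely, that layout might look like this in C (a sketch; the 25 ports and the 224<<24 cutoff are taken from this thread):

  #include <stdint.h>
  #include <stdlib.h>

  #define NUM_PORTS 25
  #define NUM_ADDRS (224u << 24)  /* stop below 224.0.0.0 (multicast and up) */

  static uint8_t *ports[NUM_PORTS];  /* one bit per address, one array per port */

  static void init(void) {
      for (int p = 0; p < NUM_PORTS; p++)
          ports[p] = calloc(NUM_ADDRS / 8, 1);  /* 448 MiB each */
  }
  static inline void set_open(int p, uint32_t a) { ports[p][a >> 3] |= 1u << (a & 7); }
  static inline int  is_open(int p, uint32_t a)  { return (ports[p][a >> 3] >> (a & 7)) & 1; }

Per-port analysis (counting open hosts, say) then scans one contiguous array sequentially, which is the cache-friendliness columnar layouts are after.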


> Wouldn't a columnar store be 25 512MB arrays?

Hm I guess you're right - I'm misusing the term. And yeah! Will experiment with it; what's neat is that it's not that much code to mess around with these basic notions...

Re: 224 << 24 - you're right; so many of the actual addresses are unusable. It's just kind of neat to map the whole IPv4 space into memory. But yes, lots of it is unneeded; I'll see if I can add a minimal-computation mapping translation so that everything still stays ~kind of O(1).

Thank you for your comments! :)

edit P.S. 25 x 512 MiB arrays - actually, thank you; I thought of doing something like that at first but now forget why I didn't start with that sort-of-actual-columnar-store from the beginning. Anyway, it's nice to be able to quickly mess around with multiple base data layouts (I'll try that one next, I think) - I'd recommend it to anyone wanting to build base knowledge of, e.g., data layouts for data analysis...


So an 8086 with 64 address pins? Or did I miss the /s?


If you do this, your garbage collector will hate you.

Or, at least it will decide your data structure is garbage.


Why would I want to use this trick?


To cut down the storage overhead of the linked list. You're saving 8 bytes per element on 64-bit systems.


> Instead of each node storing the next- and previous-pointers separately, store a single pointer which is the XOR of the two. Which is obviously an invalid pointer. But when iterating, XOR the previous node's pointer with the combined pointer to get the next node's pointer, and so on. You can iterate this way in both directions. Feels illegal. :)

Meh, it's not a pointer, and it's not really any different from storing the difference between the two pointers, which would obviously let you iterate in both directions.


Yes, but as mentioned in TFA, storing the difference would require an extra bit: the difference between two 32-bit numbers is in the range [-(2^32 - 1), 2^32 - 1], needing 33 bits to store. The XOR is the same size as the original pointer, simplifying the data flow.

But even so, storing a doubly linked list with only pointer differences and no absolute pointers (other than head and tail) feels illegal too.


> storing the difference [of prev and next pointers] would require an extra [sign] bit [relative to XOR]

No it wouldn’t. Just let wraparound happen normally and things will work out.

Effectively what you need are functions

  pair  : ptr × ptr → uintptr
  left  : uintptr × ptr → ptr
  right : ptr × uintptr → ptr
  
  left(pair(x, y), y) ≡ x
  right(x, pair(x, y)) ≡ y
Setting all three to XOR is one possibility. But

  pair(x, y)  = (x + y) mod 2^(width of ptr)
  left(p, y)  = (p - y) mod 2^(width of ptr)
  right(x, p) = (p - x) mod 2^(width of ptr)
is an equally valid one, because addition and subtraction modulo any fixed value are related in exactly the same way as normal addition and subtraction.
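
In C this is just unsigned arithmetic, since uintptr_t already wraps mod 2^width (a sketch mirroring the signatures above):

  #include <stdint.h>

  static inline uintptr_t pair (uintptr_t x, uintptr_t y) { return x + y; }  /* mod 2^w for free */
  static inline uintptr_t left (uintptr_t p, uintptr_t y) { return p - y; }  /* recovers x */
  static inline uintptr_t right(uintptr_t x, uintptr_t p) { return p - x; }  /* recovers y */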


Uh, no. Assuming 32-bit pointers, you'd add or subtract them using normal unsigned arithmetic, which is modulo 2^32. Essentially the same trick definitely works, because all these operations (adding/subtracting/XORing a number) are invertible.

The neat thing about xoring with a number is simply that that operation is its own inverse.


One difference between XOR and difference is that, because of the symmetry of XOR, you can use the same code for walking the list forwards and backwards.


Feels like the kind of tricks we pulled in the "640K ought to be enough for anyone" era. I would reserve them for severely constrained devices today.



