X32 ABI (wikipedia.org)
65 points by rocky1138 on Oct 10, 2013 | 19 comments



There's some exciting preliminary experimentation with an x32 ABI variant for GHC (Haskell). The benchmarks of the hacky first attempt yielded a 15% perf boost. I'm hoping an x32 variant with largish heap support will make it into the GHC 7.10 release (which won't be for another 8-12 months).

Btw: ghc always welcomes new contributors! Getting started can be as simple as trying to build ghc HEAD on your favorite platform and reporting any test suite bugs or bugs in your own experimentation. If you get stuck or confused, the ghc irc channel on freenode is full of folks happy to help out too!


With a potential 32% boost in integer performance, I've been excited to try out x32 for years now. I was hoping this post meant I could finally use it. After re-reading the Wikipedia article, I was left wondering: can my Ubuntu 12.04.3 run it? Where can I get x32 software? According to Phoronix, no tier-one Linux distribution is shipping any official x32 images yet.[1]

With all the talk of the 64-bit CPU in the new iPhone 5s, which has only 1 GB of RAM, I wonder if it will be subject to the 64-bit memory-address penalty, or if iOS is using something similar to x32.

[1] http://www.phoronix.com/scan.php?page=news_item&px=MTM1OTA


Note that the boost is only that dramatic for programs that make very heavy use of pointers. For instance, the benchmark that gets a 32% boost [1] requires "about 100 or 190 MB" of memory on 32/64 bit, respectively. If we call the memory used for pointers A and the memory for other data B, this means that

    A + B = 100
    2A + B = 190
    => A = 90, B = 10
So if 90% of your memory usage is pointers, this is great. But I don't think that this is a typical workload. Otherwise, the performance advantage is not as clear.
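
To make the doubling concrete, here is a minimal C sketch (a hypothetical node type, not taken from the benchmark) showing how a pointer-heavy structure shrinks when pointers go from 8 to 4 bytes:

    #include <stdio.h>

    /* Hypothetical pointer-heavy node: three pointers, one small payload.
     * Pointers are 8 bytes on x86-64 and 4 bytes under the x32 ABI. */
    struct node {
        struct node *next;
        struct node *prev;
        struct node *parent;
        int          value;
    };

    int main(void) {
        /* Prints 32 when built as x86-64 (3*8 + 4, padded to 8-byte alignment)
         * and 16 when built with gcc -mx32 (3*4 + 4, 4-byte alignment). */
        printf("sizeof(struct node) = %zu\n", sizeof(struct node));
        return 0;
    }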

Wikipedia notes that "On average x32 is 5–8% faster on the SPEC CPU integer benchmarks compared to x86-64 but it can as likely be much slower. There is no speed advantage over x86-64 in the SPEC CPU floating point benchmarks."

Perhaps the reason that no tier-one Linux distro ships with x32 images is that the additional complexity of supporting an additional ABI is not seen to be worth the modest performance increases.

[1]: http://www.spec.org/cpu2000/CINT2000/181.mcf/docs/181.mcf.ht...


Another relevant comparison is x32 vs. x86. In a memory-constrained environment you may be forced to use x86; x32 gives you the performance benefits of x86-64 (possibly plus some) while staying within your memory constraints.


A 30% performance boost in Java, which we know is built on pointers. Also maybe Scala.


This is not true. Since this optimization helps Java so much, it is already implemented in HotSpot itself without changing the ABI.

https://wiki.openjdk.java.net/display/HotSpot/CompressedOops
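
For those who haven't seen it, the idea behind compressed oops is to store object references as 32-bit offsets from the heap base, scaled by the object alignment, and decode them back into full pointers on access. A rough C sketch of the concept (illustrative only, not HotSpot's actual code):

    #include <stdint.h>

    /* Assumptions: 8-byte object alignment and a single heap base set at
     * VM startup. A 32-bit offset shifted left by 3 can then address up
     * to 32 GB of heap while each reference field stays 4 bytes wide. */
    static uint8_t *heap_base;      /* set when the heap is reserved */
    #define OOP_SHIFT 3

    typedef uint32_t narrow_oop;    /* compressed reference stored in objects */

    static inline void *decode_oop(narrow_oop n) {
        return heap_base + ((uintptr_t)n << OOP_SHIFT);
    }

    static inline narrow_oop encode_oop(void *p) {
        return (narrow_oop)(((uint8_t *)p - heap_base) >> OOP_SHIFT);
    }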


Does this break vectorization? Specifically, saxpy (y = a*x + y) requires extra operations to dereference every element and won't let you insert an FMA instruction.
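
For what it's worth, saxpy as usually written walks two contiguous arrays, so the inner loop only needs the two base pointers rather than a dereference per element; something like the minimal C version below typically still vectorizes (and contracts into FMA with e.g. -O3 -mfma), though whether x32 changes the generated code is exactly the question being asked.

    #include <stddef.h>

    /* Minimal saxpy: y[i] = a * x[i] + y[i] over contiguous arrays.
     * Pointer width mostly affects the setup, not the inner loop; with
     * -O3 -mfma, GCC/Clang will usually emit fused multiply-adds here
     * whether the target is x86-64 or x32. */
    void saxpy(size_t n, float a, const float *x, float *y) {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }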


ARM64 has 64-bit pointers, so even though only 33 of those bits are currently used for addressing memory, iOS still suffers the increased memory penalty[1].

[1] http://www.mikeash.com/pyblog/friday-qa-2013-09-27-arm64-and...


On the other hand, it's not like the bits of the pointer which are not used for addressing memory are necessarily wasted.

You can use them as tagged pointers [1], and I think that Apple's Obj-C runtime actually does this. (I don't know for sure, as it's not really something I'm interested in, but I think I saw an article about it. I could be wrong.)

[1]: https://en.wikipedia.org/wiki/Tagged_pointer
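
The basic trick: because heap allocations are aligned, the low bits of every object pointer are zero, so they can carry a small tag as long as it's masked off before dereferencing. A minimal C sketch (illustrative only, not Apple's actual scheme):

    #include <stdint.h>

    /* With 16-byte allocation alignment the low 4 bits of a heap pointer
     * are always zero, so they can hold a small tag (e.g. a type code). */
    #define TAG_MASK ((uintptr_t)0xF)

    static inline void *tag_ptr(void *p, unsigned tag) {
        return (void *)((uintptr_t)p | (tag & TAG_MASK));
    }

    static inline unsigned ptr_tag(const void *p) {
        return (unsigned)((uintptr_t)p & TAG_MASK);
    }

    static inline void *untag_ptr(void *p) {
        return (void *)((uintptr_t)p & ~TAG_MASK);
    }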


Yes, those bits are put to very good use! In ARM64, some bits are actually used to store the reference count for the object, making -retain/-release calls super fast.


Any details on that posted somewhere?

The wackiest thing about that is... if there are multiple pointers to an object (kind of the point of reference counting) and the reference count is part of the pointer... well... I don't really see that working well at all. OTOH I could see there being some optimization that means some callers might not need to modify the global reference count, only the one in their reference.


If I'm not mistaken, those extra bits are pulled from the "isa" pointer, present in all objects and similar to a C++ vtable pointer. That is, they're in the object itself, not the pointers to it.

Mike Ash wrote an excellent summary:

http://www.mikeash.com/pyblog/friday-qa-2013-09-27-arm64-and...
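
To make the trick concrete, here is a rough C sketch of the general idea (a made-up bit layout, not Apple's actual one): because the class pointer doesn't need all 64 bits, a small reference count can live in the spare high bits of the isa word, so a retain can be a single add on that word.

    #include <stdint.h>

    /* Hypothetical packing: the class pointer fits in the low 48 bits of
     * the 64-bit isa word, and a small inline reference count lives in the
     * spare high bits. Retain/release become single adds/subtracts on the
     * isa word (overflow handling to a side table omitted). */
    #define RC_SHIFT   48
    #define RC_ONE     ((uint64_t)1 << RC_SHIFT)
    #define CLASS_MASK (RC_ONE - 1)

    static inline uint64_t isa_retain(uint64_t isa)  { return isa + RC_ONE; }
    static inline uint64_t isa_release(uint64_t isa) { return isa - RC_ONE; }
    static inline void    *isa_class(uint64_t isa)   { return (void *)(isa & CLASS_MASK); }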


OK that makes a lot more sense. Thanks.


There is nothing preventing you from using a 32-bit pointer ABI with AArch64.


FWIW I've been running x32 in production for over a year now. The system has its quirks, such as busted busybox/iptables support, but compiling those two packages against amd64 allows them to run, which is totally fine. I'm using Gentoo; with x32 you get the best of both worlds.


An experimental Debian x32 port is currently being built[0], already covering about 79% of the whole package archive[1].

[0] http://www.debian-ports.org/

[1] http://buildd.debian-ports.org/stats/x32.txt


Looking at the X32Port details page[1], I'm pretty quick to rate this "not ready for mainstream" at this point ;)

Definitely interesting though. Once support matures, I'll be interested in trying it out.

[1] https://wiki.debian.org/X32Port


The first thing that came to mind after reading this was Redis. See "Using 32 bit instances" in http://redis.io/topics/memory-optimization.


x32 code is different from "plain" 32-bit (x86-style) code in that it still runs in long mode and can use all 16 general-purpose registers.
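
Assuming a toolchain and libc built with x32 support, the difference is easy to see: the same C file compiled with gcc's -m32, -m64, and -mx32 flags reports different pointer sizes, and only -mx32 gives you 4-byte pointers while still running in long mode with the full register set.

    #include <stdio.h>

    int main(void) {
        /* gcc -m64  test.c   -> sizeof(void *) == 8  (x86-64)
         * gcc -m32  test.c   -> sizeof(void *) == 4  (legacy i386, 8 GPRs)
         * gcc -mx32 test.c   -> sizeof(void *) == 4  (long mode, 16 GPRs) */
        printf("pointer size: %zu bytes\n", sizeof(void *));
        return 0;
    }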



