
Linux Kernel Developers Discuss Dropping x32 Support - pantalaimon
https://www.phoronix.com/scan.php?page=news_item&px=Linux-Potentially-Drops-x32
======
dragontamer
If you want 32-bit pointers for compression, it seems more like the job of a
"32-bit malloc()" like interface. The malloc_32bit() would return a pointer in
a 32-bit 4GB space, and you just add a 64-bit pointer to get the true address.

An add takes 1 clock cycle (latency). An L1 cache lookup takes 4 clock cycles
(latency). So an add + L1 lookup would be only 5 cycles of latency. Skylake
can do two memory lookups per clock and many adds per clock (I think 3 adds
per clock?), so it wouldn't take many resources at all.

The actual memory lookup takes ~100+ cycles if it's outside of the cache.

It seems like this problem can be 100% solved in userland without needing
kernel support. For something like following a linked-list chain, I'd expect
almost no change in speed due to out-of-order executions and memory latency
hiding techniques of the CPU.

\----------

64-bit user space should remain the default, if only for ASLR / security
reasons. It's actually trivial for CPUs to traverse an entire 32-bit space
(4 billion isn't a big number: most computers today run at ~3GHz, or 3 billion
operations per second...)

So if you want your address space randomization to actually have any security
whatsoever, you'd better use the 64-bit (erm... really 48-bit) address space.

~~~
Vogtinator
Should be trivial to implement using ptmalloc/dlmalloc/whatevermalloc and
constraining it to 4GiB.

Main issue I see is that this requires a new ABI for passing pointers around.

~~~
adrianratnapala
> new ABI for passing pointers around.

Isn't that what x32 actually is? I mean it's not just for passing pointers
around.

Aaaand that's the point. With x32, people who care about these sizes can use
the x32 ABI which is consistent in its own way. While everyone else uses the
x64 ABI.

Without x32, people who want to have 32-bit pointers need to reintroduce the
NEAR/FAR distinction into the source-level-APIs.

Ouch.

I'm guessing the real rationale for getting rid of x32 is that 32-bit pointers
are just not a big enough use case to justify it.

------
hyperman1
Java has an interesting pointer model between 32 bit and 64 bit pointers:
Compressed OOps.

A 64-bit pointer to a data structure is shifted right by 3, a.k.a. divided by
8, and stored in 32 bits. Those 32 bits then effectively address 35 bits, i.e.
32GB. Data structures must be aligned on an 8-byte boundary.

At runtime, the pointer is expanded to 64 bit in a register, multiplied by 8,
and the correct offset for a struct field is added.

So you can't point at individual bytes, and you have to shift, but you gain 4
bytes per pointer. It is said this hits the current sweet spot for Java.

~~~
bogomipz
I believe this is why you sometimes see recommendations not to set the JVM
heap larger than 32 gigs.

Is this still the case, though? Does JVM performance fall off a cliff without
compressed pointers? Or is the issue more about GC beyond 32 gigs?

~~~
user5994461
A common misconception unfortunately.

The heap is limited to roughly 30 GB, plus a bit of overhead. You will never
run in 32-bit pointer mode if you set a 32GB heap, because of that overhead.

From experience, a 30GB heap is similar to a 40GB one in usable space, with
32-bit vs 64-bit pointers respectively.

~~~
bogomipz
Sorry, what is the common misconception? That exceeding a 32 gig heap size
will cause the JVM to stop using compressed pointers? This seems to be widely
accepted:

[https://www.elastic.co/guide/en/elasticsearch/guide/current/...](https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html)

I understand there's overhead; I wasn't suggesting that all 32 gigs were
usable.

------
sprash
I think the X32 feature is vastly underused.

Ideally every application that is unlikely to use more than 2GB of memory
should use the X32 ABI.

Cache latency is the most important bottleneck of today's general-purpose
CPUs. X32 helps a lot with that, and not just in "synthetic" benchmarks.

~~~
ajross
Yes, but the issue at hand isn't whether or not apps can or should be compiled
to x32. That capability isn't going away (it happens entirely in userspace and
is basically just a recompiled set of libraries).

This is a discussion about whether the kernel itself wants to run using x32
mode, which is arguably much less useful. Such a system would be limited to 4G
of physical memory, etc... There aren't that many such systems left in the x86
world (that run Linux, anyway -- I'm literally working on an x86_64 port for
Zephyr now, and it uses x32).

~~~
jabl
> Yes, but the issue at hand isn't whether or not apps can or should be
> compiled to x32. That capability isn't going away (it happens entirely in
> userspace and is basically just a recompiled set of libraries).

If the kernel removes X32 support, I guess it won't take long for toolchains
to remove it either. Why should they pay the maintenance burden of it if
nobody uses it?

> This is a discussion about whether the kernel itself wants to run using x32
> mode,

No, the kernel has never supported running in X32 mode itself. This is about
the compatibility syscalls allowing an X32 userspace with an x86-64 kernel.

------
loeg
Blogspam website. Original source:
[https://lkml.org/lkml/2018/12/10/1145](https://lkml.org/lkml/2018/12/10/1145)

------
geocar
I read this as just dropping building an x32 kernel: It seems that
mmap(MAP_32BIT) should continue to allow an x32 userspace (which is all I ever
want anyway).

Does anyone have more details on this?

~~~
nwmcsween
What? How would MAP_32BIT reimplement an entire ABI? x32 has more in common
with x86-64 than with i386.

~~~
geocar
x32 isn't i386. It's x86_64 with truncated pointers.

There aren't many syscalls that return pointers; mmap() is one of them, and
using MAP_32BIT means that the pointer will be 32-bit.

~~~
AstralStorm
But the offset still is 64bit right? (As per large file support)

~~~
jabl
X32 has its own (partial) set of syscalls, similar to how an x86-64 kernel can
run traditional x86-32 code.

32-bit code tells the kernel to allocate in the lower 32-bit address space
(via MAP_32BIT) precisely so that the 64-bit pointers the kernel works with
can be safely truncated to 32-bit user space pointers.

------
Wowfunhappy
I'm a little confused: is X32 support different from 32-bit x86 support, a la
32 bit mode in Windows?

~~~
gsnedders
Yes. It's essentially x86_64, but with all addresses being 32 bits, so you get
all the things that x86_64 added, like the extra registers, without the added
memory bloat of longer addresses for everything.

~~~
mg794613
What "bloat"? It's curious to see that whenever a technique comes out, it gets
misunderstood by the masses and praised as the best thing since sliced bread,
and suddenly robustness, accountability and predictability turn into "bloat".
Do you have a real-world example?

~~~
zakk
If your code does intensive computation with structures containing a lot of
pointers, you'll get many more cache misses when using 64-bit pointers.

The performance hit can be quite heavy!

Cache is a limited resource, and you want as many pointers as possible to fit
there.

------
gnode
If the problem is that a significant amount of cache is routinely wasted by
storing many pointers of the form: 0xffffffffxxxxxxxx and 0x00000000xxxxxxxx,
then I wonder why CPU designers haven't added an optimisation to their caches
to store these numbers in 32-bit (or slightly larger) cells.

~~~
pedrocr
The CPU caches hold normal memory pages. From the CPU's point of view, it
doesn't know whether what it's caching is a pointer or just any other random
64-bit number until the code actually does something with it.

What you are suggesting might work generally if it was somehow worthwhile to
try and do some lossless compression of page contents before storing them in
the CPU cache. If that's not done yet I'm sure it's because the energy and
extra transistors needed for it are not worth it, and not because CPU
designers haven't thought about it.

~~~
gok
Memory compression is actually pretty widely used. Linux has zram, which
Android and ChromeOS have on by default. macOS/iOS has something similar.

~~~
pedrocr
I was aware of those. I meant memory compression within the CPU core itself
before going into cache, to match what the OP was describing.

------
qwerty456127
I just wonder how many consumer applications actually need more than 3 GiB of
RAM per process (and I fear to imagine such an app, and the idea that these
are going to become common). For now, only professional (data science,
graphics/multimedia editing, CAD, etc.) and server-side apps may need that
much, AFAIK; 64-bit pointers seem a waste on an average PC.

~~~
kbirkeland
64 bit pointers are pretty important for security. When using ASLR, a certain
number of the bits cannot be randomized. This leaves you with a randomization
space of about 12 bits with 32-bit addresses, but over 40 bits of
randomization with 64 bit addresses.

~~~
hyperman1
Our company bought a nasty piece of proxy/security software that injects a
bunch of DLLs right at the center of the address space. 2GB is small if you
lose 256MB to this kind of contraption. We had regular problems when ASLR made
unfortunate random choices at startup and not enough contiguous address space
was available for end-user software. The only option was a reboot.

------
bogomipz
The author states:

>"The x32 ABI allows for making use of the additional registers and other
features of x86_64 but with just 32-bit pointers in order to provide faster
performance when 64-bit pointers are unnecessary."

Can someone say why 32-bit pointers would provide faster performance than
64-bit on modern hardware? Is this performance difference really
non-negligible?

~~~
jgowdy
It’s simple. Pointer sizes being twice as big consume twice as much cache,
thus you’re effectively cutting your cache size in half (worst case, if the
cache line were stuffed end to end with pointers). I imagine the idea that
cutting the cache size by up to 50% causes a severe performance impact doesn’t
require much explanation.

~~~
floatboth
> effectively cutting your cache size in half

Only if the cache is full of pointers, and not the numbers you're crunching :)

~~~
jgowdy
> (worst case, if the cache line were stuffed end to end with pointers)

Worst case scenario is probably a GOT or vtable.

