
JVM Anatomy Quarks: Compressed References - luu
https://shipilev.net/jvm/anatomy-quarks/23-compressed-references/
======
fulafel
Data layout optimizations are a really neglected part of compiler engineering.
Performance bottlenecks are now in caching and in moving data through the
memory hierarchy rather than in ALU throughput, so a compact data
representation lets you fit more items in the same amount of cache and move
more items in the same amount of bandwidth.

Compressed references are one of the most basic optimizations that programmers
routinely do by hand when doing performance work. Others include storing data
in a columnar form (aka AoS->SoA transformation), altering array layout to be
cache-friendly (tiling / swizzling), moving rarely accessed members out of
line, and field shrinking based on dynamic range analysis.
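
To make the AoS->SoA item concrete, here is a minimal Java sketch of doing it by hand. The `Particle` type, its fields, and all method names are hypothetical, invented for illustration; this is not something the JVM does for you.

```java
// Hypothetical example: the Particle type and its fields are made up for illustration.
class ColumnarLayoutSketch {

    // Array-of-structures (AoS): every element is its own heap object,
    // reached through a reference stored in the array.
    static final class Particle {
        final float x;
        final float y;
        final int id;

        Particle(float x, float y, int id) {
            this.x = x;
            this.y = y;
            this.id = id;
        }
    }

    // Structure-of-arrays (SoA): one primitive array per field,
    // so a scan over x touches only x values.
    static final class ParticleColumns {
        final float[] x;
        final float[] y;
        final int[] id;

        ParticleColumns(int n) {
            x = new float[n];
            y = new float[n];
            id = new int[n];
        }
    }

    // Copy row-oriented objects into column-oriented arrays.
    static ParticleColumns toColumns(Particle[] rows) {
        ParticleColumns cols = new ParticleColumns(rows.length);
        for (int i = 0; i < rows.length; i++) {
            cols.x[i] = rows[i].x;
            cols.y[i] = rows[i].y;
            cols.id[i] = rows[i].id;
        }
        return cols;
    }

    // AoS scan: follows one reference per element.
    static double sumX(Particle[] rows) {
        double sum = 0;
        for (Particle p : rows) {
            sum += p.x;
        }
        return sum;
    }

    // SoA scan: walks a single contiguous float[].
    static double sumX(ParticleColumns cols) {
        double sum = 0;
        for (float v : cols.x) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        Particle[] rows = new Particle[4];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = new Particle(i, 2 * i, i);
        }
        ParticleColumns cols = toColumns(rows);
        System.out.println(sumX(rows) + " " + sumX(cols));
    }
}
```

Both scans compute the same sum; the columnar version just avoids the per-element object header and the pointer chase.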

The history of "compressed references", afaict, is reactive on the JVM: people
were shocked at memory usage increases when moving from 32-bit to 64-bit
machines. So it came about semi-accidentally and is not really indicative of
systematic work on data layout optimizations in the JVM...

There is also a term of art in CS that refers to economical use of memory:
[https://en.wikipedia.org/wiki/Succinct_data_structure](https://en.wikipedia.org/wiki/Succinct_data_structure)
and a second one that is for shrinking data structures based on content
entropy:
[https://en.wikipedia.org/wiki/Compressed_data_structure](https://en.wikipedia.org/wiki/Compressed_data_structure)
This literature deals with more advanced encoding tricks that are rather far
from what a compiler could realistically do on its own, and it often makes big
tradeoffs such as making the structure read-only.

~~~
the8472
Rust does quite a bit of layout optimization[0]. It can do so because its
internal ABI, including struct layouts, is unspecified, so the compiler
developers can change it while still offering the C ABI for interop. They're
also keeping the option open for PGO-based layouts.

Of course those are still compile-time decisions, so compressed pointers are
out since they only work if you restrict yourself to ~32GB, depending on
alignments.

[0] [https://github.com/rust-lang/rust/pull/45225](https://github.com/rust-lang/rust/pull/45225)
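
The "~32GB, depending on alignments" figure falls out of simple arithmetic: a 32-bit reference that counts in units of the object alignment can reach 2^32 times that alignment. A rough Java sketch follows; the class, method names, and the heap base value are illustrative assumptions, not the JVM's actual code.

```java
// Illustrative arithmetic only; names and the sample heap base are assumptions.
class CompressedRefSketch {

    // Bytes reachable by a 32-bit reference scaled by the object alignment.
    static long maxHeapBytes(int alignmentBytes) {
        return (1L << 32) * alignmentBytes;
    }

    // Shift corresponding to a power-of-two alignment (8 bytes -> 3).
    static int shiftFor(int alignmentBytes) {
        return Integer.numberOfTrailingZeros(alignmentBytes);
    }

    // Compress: offset from the heap base, divided by the alignment.
    static int encode(long address, long heapBase, int shift) {
        return (int) ((address - heapBase) >>> shift);
    }

    // Decompress: treat the 32 bits as unsigned, scale, add the base back.
    static long decode(int compressed, long heapBase, int shift) {
        return heapBase + ((compressed & 0xFFFFFFFFL) << shift);
    }

    public static void main(String[] args) {
        System.out.println(maxHeapBytes(8) >> 30);   // 32 (GiB)
        System.out.println(maxHeapBytes(16) >> 30);  // 64 (GiB)

        long heapBase = 0x1_0000_0000L;              // made-up base address
        int shift = shiftFor(8);
        long address = heapBase + (31L << 30) + 8;   // 8-byte aligned, 31 GiB into the heap
        int compressed = encode(address, heapBase, shift);
        System.out.println(decode(compressed, heapBase, shift) == address);
    }
}
```

Raising the alignment raises the limit but wastes more padding per object, which is why the limit is a tradeoff rather than a free knob.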

~~~
fulafel
Can you point out some layout optimizations that are done? I could not make
out much of your link sadly.

~~~
the8472
Look at the "Size optimizations implemented so far" section. It's mostly about
packing enum discriminants into unused bits and bytes.

Beyond that, Rust has always done field reordering to minimize wastage from
alignment gaps.

Other optimizations are still being planned.
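
To show why field reordering saves space, here is a small Java simulation of struct layout. It assumes each field's alignment equals its size and that fields are simply sorted largest-first; that is a common textbook strategy, not necessarily the exact algorithm rustc uses.

```java
import java.util.Arrays;
import java.util.Comparator;

// Simulation only: assumes alignment == size for every field (sizes are powers of two).
class FieldReorderSketch {

    // Round offset up to the next multiple of align.
    static int alignUp(int offset, int align) {
        return (offset + align - 1) / align * align;
    }

    // Total size of a struct whose fields are placed in the given order,
    // including padding between fields and tail padding to the largest alignment.
    static int layoutSize(int[] fieldSizes) {
        int offset = 0;
        int maxAlign = 1;
        for (int size : fieldSizes) {
            offset = alignUp(offset, size) + size;
            maxAlign = Math.max(maxAlign, size);
        }
        return alignUp(offset, maxAlign);
    }

    // Reorder fields from largest to smallest alignment.
    static int[] sortedDescending(int[] fieldSizes) {
        return Arrays.stream(fieldSizes)
                .boxed()
                .sorted(Comparator.reverseOrder())
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        int[] declared = {1, 8, 1, 4};  // e.g. u8, u64, u8, u32 in declaration order
        System.out.println(layoutSize(declared));                    // 24
        System.out.println(layoutSize(sortedDescending(declared)));  // 16
    }
}
```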

------
colechristensen
The short version is that you are often better off running multiple instances
of a JVM application with a heap <32GB when you have the choice between
scaling up and scaling out.

~~~
callmeal
That depends on your memory usage. There is an awkward range (32GB - 48GB)
where the increase in heap size does not give you a proportional increase in
available memory, because references double in size. That is where you're
better off running multiple instances. But for heaps bigger than 48GB, running
a single JVM is more efficient.
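
A back-of-the-envelope model of where a figure like 48GB can come from. It assumes, purely for illustration, that a fixed fraction of a compressed heap's bytes are reference fields and that nothing else changes when references double; real workloads will differ.

```java
// Toy model: refFraction is an assumed share of heap bytes taken by reference fields.
class HeapBreakEvenSketch {

    // Heap size needed with 8-byte references to hold the same objects that fit
    // in compressedHeapGb with 4-byte references: the reference bytes double.
    static double uncompressedEquivalentGb(double compressedHeapGb, double refFraction) {
        return compressedHeapGb * (1.0 + refFraction);
    }

    public static void main(String[] args) {
        // If half of a 32GB compressed heap is references, the same data needs 48GB uncompressed.
        System.out.println(uncompressedEquivalentGb(32, 0.5));   // 48.0
        // With a quarter of the heap in references the break-even point is lower.
        System.out.println(uncompressedEquivalentGb(32, 0.25));  // 40.0
    }
}
```

Under this model, any heap between 32GB and the break-even size holds less data than a 32GB heap with compressed references.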

------
zamalek
_> Once the heap size gets larger than the threshold under which compressed
references are working, a surprising thing happens: references suddenly become
uncompressed_

Am I correct in assuming that this is made possible because HotSpot can
re-JIT? So the GC/Allocator just warns the JITter, "hey, all of your Oops
[flags] code is no longer valid," resulting in a re-JIT of everything? Or a
prologue in every method that checks the Oops flags (and triggers a re-JIT)?

~~~
deadc0de
No, compressed oops mode doesn't change at runtime. That's pretty hard to pull
off. You'd need to change object layout dynamically and unless you want to
reformat the whole heap you'd need tags or regions with different pointer
rules. Any of these techniques would have a very substantial runtime overhead.
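
Since the mode is fixed for the life of the process, you can simply ask a running JVM which one it started with. A small sketch using the `HotSpotDiagnosticMXBean` management interface; it assumes a HotSpot-based JVM, and falls back to "unknown" where the `UseCompressedOops` option is not available (for example on a 32-bit VM).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Assumes a HotSpot-based JVM; the class and method names here are illustrative.
class CompressedOopsCheck {

    // Returns "true" or "false" as reported by the VM, or "unknown" if the
    // option cannot be queried on this JVM.
    static String compressedOopsSetting() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unknown";
            }
            return bean.getVMOption("UseCompressedOops").getValue();
        } catch (RuntimeException e) {
            // e.g. IllegalArgumentException when the VM has no such option
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + compressedOopsSetting());
    }
}
```

Running it once with a small `-Xmx` and once with a heap above the threshold should show the setting differ between the two processes, never within one.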

------
kristianp
"The name underlines the fact that the single post is not enough to form
reasonable matter"

What does that bit of legalese mean?

~~~
userbinator
That is an exceedingly awkward sentence --- either due to non-native influence
or an attempt to sound fancy. It's really saying something like "the articles
refer to each other so you might not understand everything from reading just
one."

