
Compress objects, not cache lines: an object-based compressed memory hierarchy - feross
https://blog.acolyer.org/2019/05/24/zippads/
======
leni536
In the article they mention that most programs are dominated by objects rather
than arrays. Another cure for that is to use arrays of primitives a lot more:

[https://www.youtube.com/watch?v=yy8jQgmhbAU](https://www.youtube.com/watch?v=yy8jQgmhbAU)
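A minimal sketch of the "arrays of primitives" idea (the names and fields here are my own illustration, not from the talk): the same particle data stored array-of-structs versus struct-of-arrays. A pass over only `x` touches far fewer cache lines in the second layout.

```cpp
#include <vector>

// Array-of-structs: each element mixes the "hot" field x with cold fields,
// so a pass over x drags y, z, and id through the cache as well.
struct ParticleAoS { float x, y, z; int id; };

// Arrays of primitives (struct-of-arrays): the same data as parallel
// arrays, so a loop over x reads one dense, prefetch-friendly stream.
struct ParticlesSoA { std::vector<float> x, y, z; std::vector<int> id; };

inline float sum_x(const std::vector<ParticleAoS>& ps) {
    float s = 0;
    for (const auto& p : ps) s += p.x;  // strides sizeof(ParticleAoS) bytes
    return s;
}

inline float sum_x(const ParticlesSoA& ps) {
    float s = 0;
    for (float v : ps.x) s += v;        // strides sizeof(float) bytes
    return s;
}
```

Both loops compute the same result; the SoA version simply reads a denser stream.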

~~~
idsout
Another great talk on data-oriented design, given by Mike Acton at CppCon 2014:
[https://www.youtube.com/watch?v=rX0ItVEVjHc](https://www.youtube.com/watch?v=rX0ItVEVjHc)

------
fjfaase
I once wrote an interpreter that performed some operation while interpreting a
data structure. Because it seemed that the interpreter was spending most of its
time interpreting the data structure, I thought it would be a good idea to
'compile' the data structure into code. That resulted in a lot of code, which
took a long time to compile, but I was hoping that the compiled code would be
much faster. Instead it turned out to be almost twice as slow. Then it dawned
on me: the interpreter and the data structure fit in the cache, while the
compiled code did not.

I bet that in some cases using a dedicated memory allocator that ensures
objects that belong together are stored together can result in execution-time
improvements. If you use the default memory allocator, you may get pieces of
memory far from each other, especially if other threads are also allocating
memory or if temporary objects (think string manipulations) are created during
the construction of the data structure.
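The dedicated-allocator idea can be sketched as a bump-pointer arena (a toy, not production code: no per-object free, no growth). All nodes of one data structure come from a single contiguous buffer, so objects built together stay adjacent in memory.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Minimal bump-pointer arena: allocations just advance an offset into
// one contiguous buffer, so consecutively built objects sit side by side
// instead of being scattered by a general-purpose allocator that is also
// serving other threads and short-lived temporaries.
class Arena {
    std::vector<char> buf_;
    std::size_t used_ = 0;
public:
    explicit Arena(std::size_t bytes) : buf_(bytes) {}
    void* alloc(std::size_t n, std::size_t align) {
        std::size_t p = (used_ + align - 1) / align * align;  // round up
        if (p + n > buf_.size()) return nullptr;              // out of space
        used_ = p + n;
        return buf_.data() + p;
    }
};

struct Node { int value; Node* next; };

// Builds a list of n nodes; consecutive nodes are contiguous in the arena.
inline Node* build_list(Arena& a, int n) {
    Node* head = nullptr;
    for (int i = 0; i < n; ++i) {
        void* mem = a.alloc(sizeof(Node), alignof(Node));
        if (!mem) break;
        head = new (mem) Node{i, head};
    }
    return head;
}
```

Traversing such a list walks linearly through one buffer, which is exactly the access pattern caches and prefetchers reward.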

------
kristianp
Although the authors have designed hardware for this, I can imagine it being
used by a language's VM (e.g. the JVM), making use of cache-size-aware
Zippads.
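A software version of the compression step could look something like this toy sketch of base-object delta compression, loosely inspired by the article's description of compressing similar objects against a representative base object (the function names and the byte-level diff format are my own simplification, not the paper's actual scheme):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// A compressed object: only the byte positions where it differs from the
// shared base object, stored as (offset, byte) pairs.
struct Diff { std::vector<std::pair<std::size_t, std::uint8_t>> patches; };

// Assumes obj and base have the same size (objects of the same type).
inline Diff compress(const Bytes& base, const Bytes& obj) {
    Diff d;
    for (std::size_t i = 0; i < obj.size(); ++i)
        if (obj[i] != base[i]) d.patches.push_back({i, obj[i]});
    return d;
}

// Reconstruction patches a copy of the base object.
inline Bytes decompress(const Bytes& base, const Diff& d) {
    Bytes obj = base;
    for (const auto& [off, b] : d.patches) obj[off] = b;
    return obj;
}
```

Objects that closely resemble the base shrink to a handful of patches; a VM managing its own heap could apply the same trick per object type.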

------
Haga
Adapt a perfectly working machine to a mental crutch instead of learning
data-oriented design?

~~~
dm3
If this means running millions of already written programs more efficiently -
yes!

~~~
vardump
The question is whether those gates are really best used for this feature.
Also, this feature might increase latency and make tens of millions of
programs run slower.

~~~
AstralStorm
It could probably be done generically enough in a CPU, reordering data layouts
on the fly using access and temporal-locality tracing like that done for
caches, at a slight memory cost.

The security implications are important, though, with neither the programmer
nor the kernel any longer controlling memory layout.

Initial stages of this are already seen in the various ways memory is
interleaved across multiple banks, cores, and CPUs.

The truly fun part would be speculative, as opposed to tracing-based,
reordering...

