
Memory layout of multi-dimensional arrays - ingve
http://eli.thegreenplace.net/2015/memory-layout-of-multi-dimensional-arrays/
======
lispit
Related to this is the concept of arrays of structures (AoS) vs. structures of
arrays (SoA). Do you store all of the data that makes up one "object" in a
single data structure, or do you store all of each "member" of each object in
separate flat arrays?

If you need to use all of the "members" of an "object" at once, AoS is more
efficient. But if you only need to use 1 or 2 members for a certain task, you
will waste lots of memory bandwidth on data that isn't needed for the current
operation. In this case, an SoA layout can improve your performance by an
order of magnitude, by ensuring that only the data that is actually necessary
for your task is loaded into cache.
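
As a concrete illustration, here's a minimal NumPy sketch of the two layouts (the array names and sizes are mine, just for illustration):

```python
import numpy as np

n = 1_000_000

# AoS: each "object" is one record; x, y, z are interleaved in memory.
aos = np.zeros(n, dtype=[("x", "f8"), ("y", "f8"), ("z", "f8")])

# SoA: one flat, contiguous array per member.
soa_x = np.zeros(n)
soa_y = np.zeros(n)
soa_z = np.zeros(n)

# Summing just x in the AoS layout strides over 24-byte records, so each
# cache line also pulls in the unwanted y and z; in the SoA layout it
# reads a dense buffer with an 8-byte stride.
aos["x"].sum()   # strided: 24 bytes between consecutive x values
soa_x.sum()      # contiguous: 8 bytes between consecutive x values
```

The arithmetic performed is identical in both cases; only the memory traffic differs.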

The same principle applies to row vs. column-oriented databases, which
obviously are orders of magnitude more sensitive to IO bandwidth usage.

One of the most damning criticisms of OOP, from this perspective, is that it
encourages you to wrap up lots of potentially unnecessary data into monolithic
"objects" without concern to memory layout or usage patterns, giving you AoS
_and_ bloated data structures.

~~~
jofer
Good points. However, object oriented programming doesn't necessarily mean
arrays of objects. I'd disagree that it encourages it in any way.

I realize that's a common paradigm in Java and C++, but it's an architectural
decision that you make, not a part of a programming paradigm. The array-of-
structs prevalence really stems from C, not object-oriented approaches.

You can do both in an object-oriented manner. For example:

    
    
        points = Points(xarray, yarray, zarray)
    

vs

    
    
        points = [Point(x0, y0, z0), Point(x1, y1, z1), ...]
    

If you're mostly operating on a series, it makes more sense to structure
objects and methods around the former "plural" approach, anyway.
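
A `Points` class like the one above isn't from any particular library; a minimal SoA-style sketch of it might look like:

```python
import numpy as np

class Points:
    """SoA-style container: one flat array per coordinate."""
    def __init__(self, x, y, z):
        self.x = np.asarray(x, dtype=float)
        self.y = np.asarray(y, dtype=float)
        self.z = np.asarray(z, dtype=float)

    def distance_from_origin(self):
        # Vectorized over all points; only touches the arrays it needs.
        return np.sqrt(self.x**2 + self.y**2 + self.z**2)

points = Points([0.0, 3.0], [0.0, 4.0], [0.0, 0.0])
points.distance_from_origin()  # array([0., 5.])
```

Methods like `distance_from_origin` operate on the whole series at once, which is exactly the "plural" style the comment describes.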

------
jofer
Excellent summary!

Layout of N-dimensional arrays becomes even more important when you start
considering on-disk formats, where seek times are relatively slow.

People often talk about out-of-core calculations without fully considering how
their data is laid out on disk. If your access pattern doesn't match the on-
disk ordering of your large, N-dimensional array, you're in for a large
surprise. Accessing a portion of an array that takes milliseconds in RAM can
easily take multiple hours if you naively write a large multidimensional array
to disk with an ordering that doesn't match your access pattern.

Of course, this is why chunking exists, but surprisingly few people are as
familiar with it as they should be.

For example, at work, I often need to use a specific legacy format that's
basically a large 3D fortran-ordered array stored on disk. I often get asked
why I'm loading the entire thing into memory instead of taking an out-of-core
approach. The problem is that our access pattern deals entirely with accessing
a single "trace" (z-direction) at a time. For a fortran-ordered array stored
on disk, that's the worst possible access pattern. Consecutive elements of a
single trace are a long way apart, scattered throughout the entire file.
Loading a single trace takes far longer than loading the entire file into
memory on most (non-SSD) systems.
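
The offset arithmetic makes the problem obvious. Here's a sketch with made-up (but plausible) dimensions, not the actual format:

```python
# Hypothetical 3D volume, Fortran-ordered on disk (x varies fastest).
nx, ny, nz = 1000, 1000, 500
itemsize = 4                       # e.g. float32 samples

# Byte distance between consecutive z samples of one trace:
# a full x-y slice lies between them.
stride_z = nx * ny * itemsize      # 4_000_000 bytes (~4 MB per step)

# Reading one trace therefore means nz widely scattered reads/seeks.
# In a C-ordered (z-fastest) file the same trace would be contiguous:
c_order_span = nz * itemsize       # 2000 bytes, one sequential read
```

With ~4 MB of file between every pair of neighboring samples, a spinning disk spends essentially all of its time seeking.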

I've recently been on a bit of an evangelical kick for trying to persuade more
people to think about the implications of on-disk layout. (For example:
[http://geology.beer/2015/02/10/hdf-for-large-arrays/](http://geology.beer/2015/02/10/hdf-for-large-arrays/)
While that's specifically about HDF5, most of it is explaining memory/disk
layout and its implications.)

------
agibsonccc
This is a fascinating topic I have had to spend a lot of time studying while
writing [http://nd4j.org](http://nd4j.org) (a numpy port for the JVM).

When you want to represent, say, 6D data, you have to account for all sorts of
interesting details.

One interesting detail is strides along particular dimensions.

For example, given an ndarray of shape

2 x 3 x 1 x 5

each dimension has a stride: the distance in the linear buffer between
consecutive elements along that dimension.

The strides along the k dimensions are determined by the ordering (C vs.
Fortran).

The crazy thing here, though, is the 1 in the middle of the shape.

If you find a 1 in the shape, you can actually skip that dimension's stride
when indexing into the array, since its only valid index is 0.
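
For a C-ordered float64 array of that shape, NumPy's strides make this concrete: each stride is the byte distance to the next element along that axis, and the size-1 axis's stride can never contribute to an offset (its only valid index is 0), which is why an implementation may skip it:

```python
import numpy as np

a = np.zeros((2, 3, 1, 5))   # float64, C-ordered, itemsize 8 bytes

# C-order stride along an axis = product of all later axis sizes
# times the itemsize:
#   axis 0: 3 * 1 * 5 * 8 = 120
#   axis 1:     1 * 5 * 8 =  40
#   axis 2:         5 * 8 =  40
#   axis 3:                    8
print(a.strides)             # (120, 40, 40, 8)
```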

Numpy's internals go into this quite a bit:
[https://scipy-lectures.github.io/advanced/advanced_numpy/](https://scipy-lectures.github.io/advanced/advanced_numpy/)

The linear storage of these arrays also helps with distributed systems and
with reading blocks of bytes.

Lots of interesting little things in here to consider.

Without going off on another tangent, this is also highly related to BLAS.
There is a lot more to this in practice. This is a wonderful introduction to
the topic, though.

~~~
Retra
You can also use space-filling curves to produce nice cache behavior under
certain assumptions: if you only need to view a small, contiguous portion of a
very large array, for instance.

~~~
lispit
GPUs use tiling and morder-order storage for textures for this reason.

[https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/](https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/)

~~~
sitkack
... Morton Order
[https://en.wikipedia.org/wiki/Z-order_curve](https://en.wikipedia.org/wiki/Z-order_curve)

~~~
lispit
...oops...

------
colanderman
Was hoping this touched on Z-ordering. If you have a processor that can
efficiently interleave bits (i.e. combine 0110 and 1011 to get 01101101), you
can lay out 2D and 4D arrays in Z-order, which will improve spatial locality
(nearby elements in both the X and Y dimensions are also nearby in
memory/cache/TLBs).

You can actually efficiently achieve this even on processors without an
interleave instruction if you're only walking stepwise -- basically, mask out
the odd (resp. even) bits separately, setting the even (resp. odd) bits to 1
before adding 1, or to 0 before subtracting 1, then recombine.
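
Both tricks can be sketched in a few lines of Python (the bit-spreading routine below is the standard software fallback for a hardware interleave instruction such as PDEP from x86 BMI2; 16-bit coordinates are assumed):

```python
def part1by1(n):
    """Spread the low 16 bits of n so a zero bit sits between each bit."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton2(x, y):
    """Z-order index: x occupies the even bit positions, y the odd."""
    return part1by1(x) | (part1by1(y) << 1)

EVEN, ODD = 0x55555555, 0xAAAAAAAA

def morton_inc_x(z):
    """Step one unit in x without de/re-interleaving: set the y (odd)
    bits so the +1 carry ripples straight through them, then restore y."""
    return (((z | ODD) + 1) & EVEN) | (z & ODD)

z = morton2(3, 5)
assert morton_inc_x(z) == morton2(4, 5)   # stepping x: 3 -> 4, y fixed
```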

------
rcpt
There's more about multi-dimensional numpy iterators in "Beautiful Code":
[https://www.safaribooksonline.com/library/view/beautiful-code/9780596510046/ch19.html](https://www.safaribooksonline.com/library/view/beautiful-code/9780596510046/ch19.html)

------
halayli
relevant:

[https://software.intel.com/en-us/articles/optimize-data-structures-and-memory-access-patterns-to-improve-data-locality](https://software.intel.com/en-us/articles/optimize-data-structures-and-memory-access-patterns-to-improve-data-locality)

