

C++ encapsulation for Data-Oriented Design - joaquintides
http://bannalia.blogspot.com/2015/08/c-encapsulation-for-data-oriented-design.html

======
yoklov
This is awful. Some reasons, in no particular order

\- Straight up, this isn't data oriented. You're dealing with one object at a
time.

\- This code is vastly harder to understand in its entirety, much less extend,
than the data oriented approach. This is template craziness to its fullest
degree, and what it's doing is going to be unfamiliar to the vast majority of
programmers. Hell, I've read the article, and have been staring at it for 15
minutes now, and I'm still not fully sure what it does. Admittedly, I'm a bit
rusty with my templates, but that's a very bad sign for maintainability.

\- Memory access patterns aren't the only reason to do DOD, a big one is being
able to easily simd optimize your code. This defeats that.

\- If you're _not_ going to simd optimize your code, then storing everything
as SoA is often not a great idea (you want to group the things you touch at
the same time, so that you have fewer cache misses).

\- You can still have encapsulation with data oriented code. It's just
encapsulated at a different granularity (you encapsulate access to the data
arrays).

~~~
Veedrac
I think the idea here is just to put names onto traditional SOA. Thus, instead
of doing `foo_xs[k] += foo_dxs[k]`, one does `foo.x += foo.dx` and lets it all
magically resolve itself.

In theory, this should help the compiler automatically vectorize (compared to
AoS), but even if it doesn't, it still gives you a slightly higher-level
interface for loading your values for manual SIMD.

Note that at the end, the poster says

> Traversal is at the core of DOD, as the paradigm is oriented towards
> handling large numbers of like objects. In a later entry we will extend this
> framework to provide for easy object traversal and measure the resulting
> performance as compared with OOP.

I also agree that recursive templates are a pain, but "modern" C++ kinda
expects you to use them anyway. As recursive templates go, this one isn't all
that bad to use.

~~~
kllrnohj
I suspect it's far more likely for "foo_xs[k] += foo_dxs[k]" to be auto-vec'd
than for "foo.x += foo.dx" via templates to be auto-vec'd. The first is very
straightforward and obvious with a clean auto-vec route, the latter is not.

~~~
Veedrac
I've obviously not benchmarked, but from what I can see all the complexity is
in the templates, which only exist at compile-time. From what it looks like to
me, it should all dissolve before the optimizer gets to peek at it.

~~~
joaquintides
Hi Veedrac, some results at

[https://news.ycombinator.com/item?id=10177262](https://news.ycombinator.com/item?id=10177262)

------
jamii
I would worry that trying to shoehorn the code back into an entity-at-a-time
OO-style interface still encourages poor access patterns. I much prefer the
bitsquid approach which treats component managers as the objects and provides
nice bulk-access apis for each:

[http://bitsquid.blogspot.com/2014/08/building-data-oriented-entity-system.html](http://bitsquid.blogspot.com/2014/08/building-data-oriented-entity-system.html)

[http://bitsquid.blogspot.com/2014/09/building-data-oriented-entity-system.html](http://bitsquid.blogspot.com/2014/09/building-data-oriented-entity-system.html)

[http://bitsquid.blogspot.com/2014/10/building-data-oriented-entity-system.html](http://bitsquid.blogspot.com/2014/10/building-data-oriented-entity-system.html)

[http://bitsquid.blogspot.com/2014/10/building-data-oriented-entity-system_10.html](http://bitsquid.blogspot.com/2014/10/building-data-oriented-entity-system_10.html)

------
louithethrid
Best approach imo would be an IDE which keeps the whole object-oriented
approach as a purely developer-view-only set of restrictions and allows
adapting the underlying data structure to the algorithmically hottest spots in
the code, while keeping only the component interfaces from OO.

Thanks for the Bitsquid links.

~~~
kibibu
If you have a few hours, have a look at Jon Blow's new language, Jai.

Most of the content is in video form so not easy to grok, but there is
essentially a way to switch from Array of Structs to Struct of Arrays with a
single keyword. This gets you many of the benefits of Data-Oriented
Programming with minimal effort.

------
scotty79
I really hate private, protected, sealed and such. At least a few times I've
found myself in a situation where I needed to use some library in ways the
author hadn't anticipated, or to create a workaround for a bug there.

I had to recompile it and duplicate large parts of the library code just
because somebody put private, protected or sealed in a few places for no
apparent reason.

In JS, keeping classes only as local variables, not exposed in any way to the
user of the library, has the same effect. Please don't do it. If you want me
not to shoot myself in the foot, then create a good API, not just try to hide
the gun in the closet. I'll find it anyway, and the more time I spend looking
for it, the less time I'll have to understand how to operate it safely.

------
okasaki
It'd be nice to see some benchmarks...

------
JoeAltmaier
We're at a crossroads in CPU design. The hardware guys implement heuristics to
make ordinary code work "pretty good", with cache lines and multiple layered
caches and register calculus and stalling pipelines.

Then compiler folks reverse-engineer all that to get "really good"
performance. But they can't generate one version of the code for every model
of CPU, because this stuff changes with each release.

How about the CPU leaves these decisions up to the compiler? Flatten and
simplify, and allow compiler writers to know exactly what's going on, and even
control it!

~~~
Veedrac
I'm not sure what you're suggesting. Software prefetching? Software branch
prediction? Those sound like pretty poor ideas if that is indeed what you mean
- a lot of time will be spent emulating something that hardware does really
well.

~~~
JoeAltmaier
E.g. the Mill architecture, which has no register-collision issues at all, at
the cost of the compiler keeping track of the address of results as they pass
through the Mill.

And hardware doesn't do it well at all, at least not in the super-optimization
cases discussed in the OP. They're working hard to reverse-engineer around the
hardware, which is at least as hard as the compiler programming caches and
pipelines explicitly.

~~~
kllrnohj
Mill has two issues:

1) It doesn't exist. Theory is irrelevant, build an actual chip and run some
actual software on it and then let's talk.

2) VLIW of 33 operations? Ain't no way in hell they are keeping that fed. Not
happening. AMD couldn't keep a VLIW of 5 fed on their GPUs. They dropped to
VLIW4, then switched to a RISC SIMD approach. And that's with a compiler that
only has to deal with a very constrained grammar that almost never encounters
branches. Which is about as far from general purpose workloads as you can get.

~~~
Veedrac
Issue #1 is a big deal, but they've never said otherwise.

Issue #2 is less of a reason to worry. The 33 operations is an upper limit,
and will likely be a lot lower on most implementations. They also neither
expect nor need it to be saturated.

Further, their instruction set is much more amenable to parallelism than
normal instruction sets because

* Their instructions have special values (NaN and NaR) that enable extra parallelism.

* Their instructions avoid side effects that prevent parallelism. This includes not just condition codes, but effectively the whole belt mechanism.

* Their instructions can move data across themselves horizontally (under certain phase boundaries), so certain data dependencies are allowed _inside_ an instruction. In essence, you're running at three-to-five times the clock rate but each clock is only allowed to run a given subset of instructions.

There's an example they give where the whole of a vectorized strcpy iteration
is done in one instruction, with no set-up or tear-down needed. This is of
course a best-case scenario, in that every technique they have is in use
simultaneously, but all of the techniques they give are generally applicable
and (relatively speaking) simple for compilers to use.

