
Three Big Lies (2008) - Tomte
http://cellperformance.beyond3d.com/articles/2008/03/three-big-lies.html
======
jstimpfle
Thanks for posting! I'm a big fan of Mike Acton and the data oriented
paradigm. Abstraction and encapsulation (which is what OOP is to me) often
hinder deeper understanding of the problem, or at least hinder efficient
implementation. Instead they create conceptual and maintenance boundaries
(which might be what's needed).

~~~
vvanders
Yup I'm a fan of Mike Acton as well, always fun to show someone who hasn't
seen Data Oriented Design a path to a 10-50x perf improvement that they didn't
think was possible.

The first time I saw it was in one of Bruce Dawson's courses where we did
simple image manipulation. I think the task was to implement bitblit (move one
rect of a bitmap into another rect on another bitmap).

I remember him grabbing one random student's assignment, throwing in some
rudimentary timing and then saying, "let's see if we can do any better". By the
end of ~15 minutes he'd sped the thing up 800x with a combination of reading
by row rather than column, aligned reads and some other tricks that looked
like black magic at the time. Looking back now it all seems fairly obvious
once you know how the hardware works.

Good times.
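
A minimal C++ sketch of the "read by row rather than column" idea, assuming a hypothetical 8-bit single-channel bitmap stored row-major with a `pitch` (bytes per row). `Bitmap` and `blit` are illustrative names, not the course's code. Each row of the rectangle is one contiguous `std::memcpy`, so both bitmaps are touched sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 8-bit, single-channel bitmap stored row-major:
// pixel (x, y) lives at pixels[y * pitch + x], so neighbouring x values
// are adjacent in memory.
struct Bitmap {
    std::size_t width, height, pitch;  // pitch = bytes per row
    std::vector<std::uint8_t> pixels;

    Bitmap(std::size_t w, std::size_t h)
        : width(w), height(h), pitch(w), pixels(w * h, 0) {}
};

// Copy a w x h rectangle from (sx, sy) in src to (dx, dy) in dst.
// Assumes src and dst are different bitmaps (memcpy must not overlap).
// The outer loop walks rows and each row is a single contiguous memcpy,
// so both bitmaps are read and written sequentially.
void blit(const Bitmap& src, std::size_t sx, std::size_t sy,
          Bitmap& dst, std::size_t dx, std::size_t dy,
          std::size_t w, std::size_t h) {
    assert(&src != &dst);
    assert(sx + w <= src.width && sy + h <= src.height);
    assert(dx + w <= dst.width && dy + h <= dst.height);
    for (std::size_t row = 0; row < h; ++row) {
        const std::uint8_t* from = src.pixels.data() + (sy + row) * src.pitch + sx;
        std::uint8_t* to = dst.pixels.data() + (dy + row) * dst.pitch + dx;
        std::memcpy(to, from, w);
    }
}
```

A column-first version would copy one pixel at a time with the inner loop over rows, stepping `pitch` bytes between consecutive accesses; same result, much worse memory access pattern.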

~~~
stagger87
Why call it data oriented design? Why not just call it what it is? Cache
optimization.

~~~
dottrap
Because it isn't just about cache optimization. How you organize your data
impacts the types of transformations that can and need to be done. Utilizing
SIMD is another thing that is extremely sensitive to data layout.

~~~
vvanders
Yup also relevant for subsystems that need to shuffle data across a bus like
the SPUs on the PS3 that only have 256 KB of addressable memory.

------
AnimalMuppet
> If there's a rocket in the game, rest assured that there is a "Rocket" class
> (Assuming the code is C++) which contains data for exactly one rocket and
> does rockety stuff.

Probably true.

> With no regard at all for what data transformation is really being done, or
> for the layout of the data.

I agree that OO design is usually not done with regard for the layout of the
data.

> Or for that matter, without the basic understanding that where there's one
> thing, there's probably more than one.

Say what? In C++, you'd simply create another object that is another instance
of the Rocket class. Behold, "more than one".

> Though there are a lot of performance penalties for this kind of design, the
> most significant one is that it doesn't scale. At all. One hundred rockets
> costs one hundred times as much as one rocket.

Well, see, one hundred rockets have one hundred times the data that one rocket
has. I don't see how you can avoid that, _no matter how you represent the data
or whether you do it as OO or something else._

But you don't duplicate the _code_ 100 times, just the data. You might also
have 100 instances of a pointer to a virtual function table, which means
you've wasted 100 times the size of one pointer. That doesn't strike me as a
significant inefficiency.

So you can color me unimpressed with this whole argument on lie #2.
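
A small C++ sketch of the per-object vtable-pointer cost mentioned above, using hypothetical `PlainRocket` and `VirtualRocket` types. The standard doesn't fix object layout, but on common ABIs (Itanium C++ ABI, MSVC) a polymorphic class carries one hidden pointer per object, so the sketch only compares sizes.

```cpp
#include <cstddef>

// Hypothetical rocket state with no virtual functions.
struct PlainRocket {
    float x, y, vx, vy;
};

// Same state, but polymorphic: typical ABIs add a hidden vtable pointer
// (plus any padding) to every object.
struct VirtualRocket {
    virtual ~VirtualRocket() = default;
    virtual void update(float dt) { x += vx * dt; y += vy * dt; }
    float x = 0, y = 0, vx = 0, vy = 0;
};

// Extra bytes each object pays for polymorphism on the current platform.
constexpr std::size_t per_object_overhead() {
    return sizeof(VirtualRocket) - sizeof(PlainRocket);
}
```

On a typical 64-bit target that comes to 8 bytes per rocket, small in absolute terms; the reply below argues the larger costs come from where the objects live and how their fields are laid out.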

~~~
dottrap
You’ve missed a lot of important real world details about what’s going on in
the system, which is the heart of what Mike Acton talks about.

> Say what? In C++, you'd simply create another object that is another
> instance of the Rocket class. Behold, "more than one".

So now you have a bunch of separate instances of rockets which are probably
scattered throughout heap memory which will lead to cache misses on every
access. Mike is describing the lost optimization potential here because you
didn’t think to reason about how you will use this data.
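
A minimal C++ sketch of the contrast, with a hypothetical `Rocket` struct: one container holds each rocket in its own heap allocation behind a pointer, the other stores rockets by value, back to back. Both run the same update. How scattered the heap objects really are depends on the allocator and allocation pattern; this sketch does not measure it.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Rocket {
    float x = 0, y = 0, vx = 0, vy = 0;
};

// Each Rocket is a separate heap allocation; the vector stores only
// pointers, so every iteration dereferences a pointer to wherever the
// allocator happened to put that rocket.
void update_scattered(std::vector<std::unique_ptr<Rocket>>& rockets, float dt) {
    for (auto& r : rockets) {
        assert(r && "null rocket");
        r->x += r->vx * dt;
        r->y += r->vy * dt;
    }
}

// Rockets stored by value: element i + 1 directly follows element i,
// so the loop is a linear scan that hardware prefetchers handle well.
void update_contiguous(std::vector<Rocket>& rockets, float dt) {
    for (auto& r : rockets) {
        r.x += r.vx * dt;
        r.y += r.vy * dt;
    }
}
```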

A simple example: every rocket needs to update its position every frame and
probably obeys a velocity equation defined in the game. Iterating through
every rocket scattered out through heap memory is already going to kill you
with cache misses. But additionally, since all the rockets obey the same
equation, we should be using SIMD to compute everything, which can let us do
4-16 operations (or more depending on hardware) for the same cost as doing
one. But chances are your Rocket class ivars are not nicely laid out for SIMD
(AoS vs. SoA), so you will be forced to copy or swizzle a bunch of data which
negates the performance benefits you are trying to win with SIMD. If you
designed your data upfront with this idea in mind (many rockets for batch
operations), then you get both cache optimization wins and SIMD wins, and now
we are talking about speed ups that can be easily 10x-100x. And we haven’t
even touched the possibility of further parallelizing this across multiple
cores.
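
A minimal C++ sketch of the AoS vs. SoA layouts, assuming 2D rockets and a hypothetical constant-velocity equation `p += v * dt`. `RocketAoS` and `RocketsSoA` are illustrative names. In the SoA loop each field is a unit-stride float stream, which compilers can usually auto-vectorize at higher optimization levels; whether they actually do depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: fields interleaved in memory (x0 y0 vx0 vy0 x1 ...).
// Loading four x values into one SIMD register requires a gather/shuffle.
struct RocketAoS {
    float x, y, vx, vy;
};

// Structure-of-arrays: one contiguous array per field (x0 x1 x2 ...),
// so consecutive rockets' x values sit next to each other.
struct RocketsSoA {
    std::vector<float> x, y, vx, vy;
};

void update(std::vector<RocketAoS>& rockets, float dt) {
    for (auto& r : rockets) {
        r.x += r.vx * dt;
        r.y += r.vy * dt;
    }
}

// Same equation for every rocket; each iteration touches element i of
// separate arrays, which maps directly onto SIMD lanes.
void update(RocketsSoA& r, float dt) {
    const std::size_t n = r.x.size();
    assert(r.y.size() == n && r.vx.size() == n && r.vy.size() == n);
    float* x = r.x.data();
    float* y = r.y.data();
    const float* vx = r.vx.data();
    const float* vy = r.vy.data();
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += vx[i] * dt;
        y[i] += vy[i] * dt;
    }
}
```

Both versions compute the same positions; the difference is only in how the data is arranged for the hardware.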

A real world example they gave at GDC on SIMD was a many-players-to-many-doors
problem they had to solve. On every frame, any door near any player had to
open automatically, like in Star Trek. 30 doors and 100 players means they have
3000 tests they have to run. The original algorithm wasn’t data oriented and
what Mike Acton would probably call typical C++ BS: A single Door class. On
strict CPU budgets to handle everything else in the game, the cache misses
alone were worrisome. In the talk they (obviously) convert to SIMD with the
idea of ‘many’ using Data Oriented Design (which also solves the cache miss
problem). They got a 20x-100x speed up (depending on the number of players and
doors).
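
A C++ sketch of the same problem in data-oriented form. It does not reproduce the talk's SIMD code; it assumes 2D positions, one hypothetical `open_radius`, and doors and players stored as separate x/y arrays. Every door/player pair is tested (30 doors x 100 players = 3000 tests), using squared distances to avoid a square root.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SoA storage: x and y coordinates in separate arrays.
struct Positions {
    std::vector<float> x, y;
    std::size_t size() const { return x.size(); }
};

// For every door, report (1/0) whether any player is within open_radius.
std::vector<std::uint8_t> doors_to_open(const Positions& doors,
                                        const Positions& players,
                                        float open_radius) {
    assert(doors.x.size() == doors.y.size());
    assert(players.x.size() == players.y.size());
    const float r2 = open_radius * open_radius;
    std::vector<std::uint8_t> open(doors.size(), 0);
    for (std::size_t d = 0; d < doors.size(); ++d) {
        const float door_x = doors.x[d];
        const float door_y = doors.y[d];
        int hits = 0;
        // No early exit: the inner loop is a straight count over contiguous
        // floats, the shape compilers can typically auto-vectorize.
        for (std::size_t p = 0; p < players.size(); ++p) {
            const float dx = players.x[p] - door_x;
            const float dy = players.y[p] - door_y;
            hits += (dx * dx + dy * dy <= r2) ? 1 : 0;
        }
        open[d] = hits > 0 ? 1 : 0;
    }
    return open;
}
```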

------
klodolph
That font is one of the hardest fonts to read I've ever seen used seriously on
a website.

~~~
CharlesW
Readability ([https://www.readability.com/](https://www.readability.com/)) is
a helpful tool for reading sites with hard-to-read designs:
[http://rdd.me/sjahtctg](http://rdd.me/sjahtctg)

~~~
dredmorbius
Readability seems to have ceased all development and communications for the
past three years or more. I'd suggest other options.

------
Pica_soO
With DOD (data-oriented design) you are building a refinery: you look at what
comes through, how long it takes to process, how it's stored, where your pipe
has the smallest diameter and which distillation unit takes the longest.

With Object Orientation you do the same, but with n differently typed
bottles in boxes, the boxes in containers, which are driven around as a whole
by trucks. You might be able to do so in great comfort - everybody on this
planet knows how to ship boxes. But you trade away control and
efficiency.

------
devishard
I totally disagree that #2 is a lie. Code _should_ be designed around a model
of some part of the world, it's just that what the author is describing is a
pretty bad way of modeling the world.

Take this part: _" If there's a rocket in the game, rest assured that there is
a "Rocket" class (Assuming the code is C++) which contains data for exactly
one rocket and does rockety stuff. With no regard at all for what data
transformation is really being done, or for the layout of the data. Or for that
matter, without the basic understanding that where there's one thing, there's
probably more than one."_

This is an enormous straw man. You don't need to be using C++, or even a real
OO language, to design your code around a model of the world. I'd go so far as
to say that C++ is a pretty poor choice of language for modeling the real
world. And even in an OO language, no decent practitioner writes one-class-
per-object. That's not how objects are intended to work, and even pretty bad
practitioners of OO don't usually screw it up _that_ badly.

And the underlying thing here is that if you model interactions in the real
world accurately, at least the parts that are relevant to what you're trying
to do, the data transformations and layout tend to fall into place naturally.
Of course there are exceptions; we don't have leak-free abstractions yet.

~~~
jstimpfle
> And even in an OO language, no decent practitioner writes one-class-per-
> object.

But that's the stereotypical example of OO design. You have duck->paint(),
duck->quack(), duck->plunge() all in one class (file) and of course the
dependency mess and the scattering of aspects throughout the project.

These have definitely been problems in my own software design attempts and in
many of the projects I've seen.

And even if you make more classes, so that your design is more like one class
per concept/aspect, I think the criticism of Mr. Acton is: if you have many
instances of a given class, then there must be a better way than calling a
method on each individual instance.

In other words, the idea is that classes are fine (they promote
modularization), but there shouldn't be more than one instance of each class.

Can't see a straw man there.

~~~
devishard
>> And even in an OO language, no decent practitioner writes one-class-per-object.

> But that's the stereotypical example of OO design. You have duck->paint(),
> duck->quack(), duck->plunge() all in one class (file) and of course the
> dependency mess and the scattering of aspects throughout the project.

So much nonsense here. One class per object is absolutely not the
stereotypical example of OO design. class != file. And dependency management
is usually a problem because junior devs pull in a billion half-baked
libraries to solve a problem--it's not an inherent problem with OO and it's
_certainly_ not a problem with trying to model the real world.

I'm not even particularly in love with OO. I actually think that
functional paradigms often do a better job of modeling the real world. What
I'm really disagreeing with is the claim that modeling the real world is a bad
practice.

> And even if you make more classes, so that your design is more like one
> class per concept/aspect, I think the criticism of Mr. Acton is: if you have
> many instances of a given class, then there must be a better way than
> calling a method on each individual instance.

If that were Acton's criticism, then he should have said that instead of saying
that "code should be designed around a model of the world" is a lie.
Particularly since, if you're acting on a large list of objects, representing
that as each object acting on its own is a pretty bad representation of
reality.

> In other words, the idea is that classes are fine (they promote
> modularization), but there shouldn't be more than one instance of each
> class.

Now you're just confused. Acton specifically was criticizing one instance per
class in the section I quoted, and now you're saying that's what he's
supporting?

And for the record, if there's only one instance of your class, you didn't
need a class.

~~~
jstimpfle
No need to get personal.

> So much nonsense here. One class per object is absolutely not the
> stereotypical example of OO design.

I didn't say that. I said: The stereotypical example is "one class per
(concept / class of) real world thing(s)". Like "Rocket" or "Duck".

> If that was Acton's criticism, then he should have said that instead of
> saying that "code should be designed around a model of the world" is a lie.

It needs just a little context or reading between the lines to understand the
intentions instead of twisting words to make accusations.

> Now you're just confused. Acton specifically was criticizing one instance
> per class in the section I quoted, and now you're saying that's what he's
> supporting?

I am not confused. He wasn't criticizing "one runtime instance per class", but
"one runtime instance per real world thing". That's something different.

The quote reads _rest assured that there is a "Rocket" class which contains
data for exactly one rocket_. As I read it, he suggests combining all "real
world rockets" in a single runtime object instead of representing each rocket
in its own runtime object.

Concretely, he would make a "Rockets" class instead of a "Rocket" class,
because that typically allows for simpler and more efficient implementation.
(Of course, if there were also planes or missiles or bullets, he would think
twice before making a Rockets class).
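
One possible shape for a "Rockets" class under that reading, sketched in C++ with hypothetical names: one object owns a dense array per field, exposes only batch operations, and removes rockets by swap-and-pop so the arrays stay contiguous (at the cost of not preserving order).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One object for all rockets; each field is a dense array indexed by slot.
class Rockets {
public:
    // Append a rocket and return its slot index.
    std::size_t add(float x, float y, float vx, float vy) {
        x_.push_back(x); y_.push_back(y);
        vx_.push_back(vx); vy_.push_back(vy);
        return x_.size() - 1;
    }

    // Remove slot i by moving the last rocket into it (order not kept).
    void remove(std::size_t i) {
        assert(i < size());
        const std::size_t last = size() - 1;
        x_[i] = x_[last]; y_[i] = y_[last];
        vx_[i] = vx_[last]; vy_[i] = vy_[last];
        x_.pop_back(); y_.pop_back(); vx_.pop_back(); vy_.pop_back();
    }

    // Batch update: one call advances every rocket.
    void update(float dt) {
        for (std::size_t i = 0; i < size(); ++i) {
            x_[i] += vx_[i] * dt;
            y_[i] += vy_[i] * dt;
        }
    }

    std::size_t size() const { return x_.size(); }
    float x(std::size_t i) const { return x_[i]; }
    float y(std::size_t i) const { return y_[i]; }

private:
    std::vector<float> x_, y_, vx_, vy_;
};
```

Slot indices are not stable across `remove`; code that needs long-lived references to individual rockets would need some extra indirection on top of this.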

As I commented elsewhere on this page, there are very close analogies to
relational databases -- especially the column-store flavour.

~~~
devishard
Okay, given your understanding of what he said, I can see why you might agree
with him (although I don't), but critically, that's not what he said.

~~~
dottrap
Mike has given multiple talks on Data Oriented Design.

Another example he gave at CppCon was a Chair class. In a real game, you may
have a static chair, a dynamic lighting chair, a breakable chair, a physics
chair. There is a tendency to make these all relate through some common Chair
class because they all share some "chairness" in the real world. But in
reality, the data and transformations each one needs have nothing in common and
trying to shoehorn them into some relationship because it resembles something
in the real world is counterproductive.
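
A hypothetical C++ sketch of the alternative to a shared Chair base class: each kind of chair gets its own storage holding only the data its own transformation uses, and each system loops over just that kind. The types and fields are illustrative, not from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// No common Chair base class: each kind of chair stores only the data
// its own system transforms. (All types and fields are hypothetical.)

// Static chairs: only drawn, so just an index into some render instance table.
struct StaticChairs {
    std::vector<std::size_t> mesh_instance;
};

// Breakable chairs: health and a broken flag, nothing about motion.
struct BreakableChairs {
    std::vector<float> health;
    std::vector<std::uint8_t> broken;
};

// Physics chairs: vertical position and velocity, integrated each frame.
struct PhysicsChairs {
    std::vector<float> y, vy;
};

// Subtract per-chair damage and flag chairs whose health reached zero.
void apply_damage(BreakableChairs& c, const std::vector<float>& damage) {
    assert(c.health.size() == c.broken.size());
    assert(damage.size() == c.health.size());
    for (std::size_t i = 0; i < c.health.size(); ++i) {
        c.health[i] -= damage[i];
        c.broken[i] = c.health[i] <= 0.0f ? 1 : 0;
    }
}

// Semi-implicit Euler step under gravity g (velocity first, then position).
void apply_gravity(PhysicsChairs& c, float g, float dt) {
    assert(c.y.size() == c.vy.size());
    for (std::size_t i = 0; i < c.y.size(); ++i) {
        c.vy[i] -= g * dt;
        c.y[i] += c.vy[i] * dt;
    }
}
```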

------
pnathan
I'm not sure that code is ephemeral. It seems to congeal into a thixotropic
mass. But it's clear that data itself - that mutable-state heterogeneous-
structured blob - has deep value, and _handling_ that data appropriately is
very important. This isn't treated adequately in the zeitgeist.

