
Data-Oriented Design (2018) - chrispsn
http://www.dataorienteddesign.com/dodbook/
======
carapace
I think what's missing is something like "Data Model Patterns: A Metadata Map"
by David C. Hay

It's like C. Alexander's "Pattern Language" but for data models.

> ...I was modeling the structure—the language—of a company, not just the
> structure of a database. How does the organization understand itself and
> how can I represent that so that we can discuss the information
> requirements?

> Thanks to this approach, I was able to go into a company in an industry
> about which I had little or no previous knowledge and, very quickly, to
> understand the underlying nature and issues of the organization—often better
> than most of the people who worked there. Part of that has been thanks to
> the types of questions data modeling forces me to ask and answer. More than
> that, I quickly discovered common patterns that apply to all industries.

> It soon became clear to me that what was important in doing my work
> efficiently was not conventions about syntax (notation) but rather
> conventions about semantics (meaning). ... I had discovered that nearly all
> commercial and governmental organizations—in nearly all industries—shared
> certain semantic structures, and understanding those structures made it very
> easy to understand quickly the semantics that were unique to each.

> The one industry that has not been properly addressed in this regard,
> however, is our own—information technology. ...

[https://www.goodreads.com/book/show/20479.Data_Model_Pattern...](https://www.goodreads.com/book/show/20479.Data_Model_Patterns)

I've pushed this at every company/startup I've worked at for years now and
nobody was interested. You can basically just extract the subset of models
that cover your domain and you're good to go. Or you can reinvent those wheels
all over again, and probably miss stuff that is already in Hay's
(meta-)models.

------
carapace
Two (more) things I'd like to point out:

Prolog clauses are the same _logical relations_ as in the Relational Model of
DBs. (Cf. Datalog:
[https://en.wikipedia.org/wiki/Datalog](https://en.wikipedia.org/wiki/Datalog))
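To make the correspondence concrete, here is a toy sketch (mine, not from the
Datalog article): a Datalog fact base is just a relation, i.e. a set of
tuples, and a rule body is a relational join, which you can write directly
over sets.

```python
# parent(X, Y) facts, stored as a relation (a set of tuples):
parent = {("alice", "bob"), ("bob", "carol")}

# Datalog rule:  grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
# The same thing, written as a self-join on the "parent" relation:
grandparent = {(x, z)
               for (x, y1) in parent
               for (y2, z) in parent
               if y1 == y2}

print(grandparent)  # {("alice", "carol")}
```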

The next big thing IMO is _categorical_ databases. Go get your mind blown by
CQL: [https://www.categoricaldata.net/](https://www.categoricaldata.net/) It's
"an open-source implementation of the data migration sketch from" the book
"Seven Sketches in Compositionality: An Invitation to Applied Category Theory"
which went by yesterday here:
[https://news.ycombinator.com/item?id=20376325](https://news.ycombinator.com/item?id=20376325)

> The open-source Categorical Query Language (CQL) and integrated development
> environment (IDE) performs data-related tasks — such as querying, combining,
> migrating, and evolving databases — using category theory, ...

~~~
dmux
In reading both your comments, I kept thinking about Smalltalk's Browser and
Method Finder functionalities.

In Hay's book, he talks about modeling not just Object Classes, but how you
can go about modeling ObjectClass classes (the metadata). In Smalltalk, this
metadata is automatically available via the Browser.

In reviewing the CQL Tutorial, the Typesides' "java_functions" specification
section made me think of the Method Finder (wherein you could provide an input
value and an expected output value, and methods that satisfied the
transformation would be shown). I'm not sure if in CQL you'd be able to search
for a function that satisfies criteria, but that may be beyond the scope of
what that system is trying to provide.

In any case, interesting threads of thought to be followed and explored.

~~~
carapace
Cheers!

Apparently Conal Elliott[1] has reached out to the CQL folks so if they get
together there should be some interesting and useful stuff emerging there. I
told Wisnesky they should make a self-hosting CQL compiler in CQL. ;-D Maybe
modeled on Arcfide's thing[2]. Categorical APL...

[1] [http://conal.net/papers/compiling-to-categories/](http://conal.net/papers/compiling-to-categories/)

[2]
[https://news.ycombinator.com/item?id=13797797](https://news.ycombinator.com/item?id=13797797)

------
slifin
When designing data structures that hold my program's state, my structures
tend to become coupled and brittle, because I use them in multiple contexts
and do a poor job of separating them. I want each context to have the perfect
shape for its own needs, even when contexts share data.

It makes me wonder what would happen if, instead of creating a tree of data, I
put my data into a flat structure of datoms and then used a syntax like
Datalog to conform the data into the right shape for each new context.
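A hypothetical sketch of that idea (attribute names invented): state as a flat
list of (entity, attribute, value) datoms instead of a nested tree, with each
context pulling out just the shape it needs via a small query.

```python
# The whole program state as flat datoms, not a tree:
datoms = [
    (1, "user/name",  "Ada"),
    (1, "user/email", "ada@example.com"),
    (2, "user/name",  "Grace"),
]

def q(attr):
    """Return {entity: value} for one attribute -- a toy stand-in for Datalog."""
    return {e: v for (e, a, v) in datoms if a == attr}

# Each context conforms the flat data into its own shape:
contact_card = {"name": q("user/name")[1], "email": q("user/email")[1]}
print(contact_card)  # {'name': 'Ada', 'email': 'ada@example.com'}
```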

Unfortunately, I still haven't installed Datomic Free to try this out, because
I feel dirty if it doesn't come from brew.

~~~
lilactown
With things like EAV and Datalog, you are still coupled to the names and kinds
of facts that you store and to the relationships between entities in your
structure. It turns out that there are many ways to store facts about an
entity, and to model the relationships between entities, which may or may not
be convenient depending on the context. But it does help in that the store can
much more easily evolve with the kinds and shapes of data you need as your
app changes and grows.

More abstract constructs like lenses can help with this as well. By building
lenses for reading that can transform to a domain context structure from your
global program state, you can keep them relatively decoupled and ensure all of
the glue lives only in those lenses.
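As a minimal sketch of the reading direction (all names here are invented), a
lens can be as simple as a (get, put) pair that projects a domain-context view
out of the global state and writes an updated view back, so the glue stays in
one place:

```python
def profile_lens_get(state):
    # project the domain-context shape out of the global state
    u = state["users"][state["current_user"]]
    return {"name": u["name"], "email": u["email"]}

def profile_lens_put(state, view):
    # write an updated view back without touching unrelated fields
    uid = state["current_user"]
    users = dict(state["users"])
    users[uid] = {**users[uid], **view}
    return {**state, "users": users}

state = {"current_user": 1,
         "users": {1: {"name": "Ada", "email": "a@x", "internal": True}}}
view = profile_lens_get(state)                        # local-context shape
state2 = profile_lens_put(state, {**view, "name": "Ada L."})
print(state2["users"][1]["name"])  # Ada L.
```

Note the put direction is exactly where the bidirectionality concern below
bites: the lens must know how to merge the view back without clobbering fields
it never read.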

I'm undecided on the utility of leveraging lenses for mutation as well; on the
one hand, mutation allows one to operate strictly in the local context, but on
the other hand, it requires your lenses to be truly bidirectional (which I
think is harder in practice than in theory), and counts on either synchronous
sync of local state and global state _OR_ eventual consistency with the rest
of the app.

Going a more CQRS-esque way like Redux/re-frame allows you to not rely on the
bidirectionality of your lenses and also ensures consistency in that local
state is only ever driven by changes to the global state.

------
mmsimanga
As someone who works in data warehousing and business intelligence, I always
feel the pain of developers treating data as an afterthought. The biggest
issues are always that not all data is persisted and that not all changes are
tracked in the database. That means you can never build all the reports the
business wants, and it is always a mission to explain to the business that
you cannot report off data that doesn't exist or isn't properly stored
(referential integrity). My dream has always been that one day developers
will think of the data first :-).

------
dang
Thread from 2016:
[https://news.ycombinator.com/item?id=11064762](https://news.ycombinator.com/item?id=11064762)

~~~
steveklabnik
(Not to nitpick, but the thread is from 2013; that thread said the site is
from 2013, but the page itself says that it's 2018...)

~~~
dang
In 2016 there was a discussion of a 2013 book. Now in 2019 there is a
discussion of the 2018 edition of that book. I think we're sorted now!

(I originally said "thread from 2013" above but that was wrong and I edited
it.)

~~~
steveklabnik
Makes perfect sense, and words are hard. Thanks :)

------
blub
I've skimmed over the book and I think it _definitely_ needs more code
examples and before-and-afters, because what little code I saw is
unconvincing.

In fact, when developing I don't want to think too much about mere _data_; I
want to see the algorithms and data structures and even the overall program
logic. Having to worry about each individual struct member, and about whether
those are used properly and kept in sync, is a pain and throws away the
brilliant idea of encapsulation.

The chapter on "managers" reads like a caricature of poor OOP practices; I
don't quite understand how it became a best practice in data-oriented design
:)

All in all, this seems like a clumsy way of designing things.

~~~
drainyard
There are plenty of good talks and literature on the subject and on why it is
applicable to game development, but also more generally.

This talk by Mike Acton (formerly at Insomniac, now at Unity):
[https://www.youtube.com/watch?v=rX0ItVEVjHc](https://www.youtube.com/watch?v=rX0ItVEVjHc)

I recommend this talk all the time, since it is the one that got me convinced
to look into DoD seriously.

Also: More Mike Acton (now at Unity):
[https://www.youtube.com/watch?v=p65Yt20pw0g](https://www.youtube.com/watch?v=p65Yt20pw0g)

Stoyan Nikolov “OOP Is Dead, Long Live Data-oriented Design”:
[https://www.youtube.com/watch?v=yy8jQgmhbAU](https://www.youtube.com/watch?v=yy8jQgmhbAU)

Overwatch Gameplay Architecture and Netcode (More specifically about ECS):
[https://www.youtube.com/watch?v=W3aieHjyNvw](https://www.youtube.com/watch?v=W3aieHjyNvw)

The main argument is that you work with data, so you should care about data.
All a program does is take some input, do something with it, and give you an
output. If you structure your program around this mindset, a lot of problems
become much simpler. Concurrency is one thing that becomes much easier to
reason about once you understand how your data is managed and moved around.
You realise that most of the time you have many of something, and very rarely
just a single thing. So instead of a lot of small objects of the same type,
each doing some work within its own deep call stack, you run through an array
of them and do the work you need to do.
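A toy illustration of that last point (numbers and names are mine): keep one
array per field for all entities and transform it in a single pass, instead of
N objects each running its own virtual update.

```python
# One array per field, covering ALL entities at once:
positions = [0.0, 10.0, 20.0]
velocities = [1.0, 2.0, 3.0]

def integrate(positions, velocities, dt):
    # one tight loop over homogeneous data, instead of N virtual Update() calls
    return [p + v * dt for p, v in zip(positions, velocities)]

print(integrate(positions, velocities, 0.5))  # [0.5, 11.0, 21.5]
```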

I disagree that encapsulation is a brilliant idea _in general_, because it
promotes hiding data, and hiding data is not inherently good. There are
obviously cases where keeping state internal is critical, but since all your
state is simply tables of data, your program is just choosing to represent
that data in some way, which can make it easier to localize a bug early.

There's obviously pros and cons, but I don't think you should discount the
possibility of it being a good idea just because it questions ideas that seem
standard.

------
gameswithgo
I got this book recently in print form. It's partially from a gamedev
perspective, but interestingly a lot of the book will be familiar to people
who work with databases, and thus to a lot of the webdev world. Many of the
ideas come from how we organize data in databases, applied to the data in
your code.

------
ChicagoDave
This is exactly the wrong design pattern to follow. Translation from code to
business model has always been problematic. Reducing that translation by
modeling code after a business domain (not a business object) is the best way
to reduce complexity and enable a longer life for a system.

~~~
drainyard
Can you elaborate?

I would argue that a business domain is defined by its data and by how that
data is transformed and displayed based on user input.

So simply put:

1. Data goes into the system
2. Data is displayed to the user
3. The user interacts with the data (button, CLI, etc.)
4. Data is transformed based on the interaction

That is ALL a program ever is. So you can model a program by just specifying
the transformations that happen on each user interaction.
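That loop can be sketched as a reducer over interaction events (all names
here are invented, a minimal toy rather than a real architecture):

```python
def handle(state, event):
    # one pure transformation per kind of user interaction
    if event == "increment":
        return {**state, "count": state["count"] + 1}
    if event == "reset":
        return {**state, "count": 0}
    return state

state = {"count": 0}
for event in ["increment", "increment", "reset", "increment"]:  # data goes in
    state = handle(state, event)                                # transformed
print(state)  # output/displayed: {'count': 1}
```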

This can be optimized heavily if each interaction requires a large set of data
to be transformed, e.g. through data oriented design you can work on big sets
of data based on previous interactions and transformations and only work on
data you need to work on.

It's not a design pattern. It's just a way of thinking about programs as what
they are. A design pattern is something you put on top of data because you
want a human understanding of business logic.

~~~
dragonwriter
> So simply put: 1. Data goes into system 2. Data is displayed to user 3. User
> interacts with data (button, CLI etc.) 4. Data is transformed based on
> interaction

> That is ALL a program ever is.

Well, no, that's a fairly typical pattern for an interactive program, sure,
but plenty of programs involve transformations that are not in response to
user interaction, and may not even include user interaction or display to user
at all.

~~~
drainyard
Sure, you can cut out the user interaction, but in general something goes in
and something comes out. I'd argue that's just a simplification and the point
still stands.

1. Data goes in
2. Data is transformed
3. Data is output/displayed

~~~
drainyard
Also, I clearly don't know how to format lists on Hackernews...

~~~
steveklabnik
Nobody does; there's no list support:
[https://news.ycombinator.com/formatdoc](https://news.ycombinator.com/formatdoc)

------
euske
I attempted to read the first section expecting something that reduces the
complexity of software design, and I'm completely lost. They don't really
explain what data-oriented design _is_ in a single summed-up paragraph. So I
turned to Wikipedia and found this:

    In computing, data-oriented design is a program optimization approach
    motivated by efficient usage of the CPU cache, used in video game
    development.
???

~~~
corysama
One explanation is: In the search for higher performance and greater
flexibility, game developers are converging on restructuring their data from
hierarchies of objects into something that resembles in-memory databases.

This is based on a design philosophy that gets strict about structuring code
as input-transform-output, even though C++ encourages "modify parts of an
object in place". It also gets strict about cache utilization.

And so you get the "Entity-Component-System" approach (a three-noun term like
model-view-controller). The Components are conceptually the columns of the
database. The Entities are the rows. Each entity has only a subset of
columns, so it's actually a primary key shared across many tables. And
"Systems" are data transformation functions applied to some subset of
components.

So, instead of defining the cars, trees, and soldiers in your game as a tree
of individual objects participating in an inheritance hierarchy with a
virtual Update() method, you define each actor as a subset of the available
components, which update en masse during the data transforms applied to
arrays of raw structs.
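The ECS-as-database picture above can be sketched in a few lines (a toy of my
own, not a real engine): components as columns keyed by entity id, and a
system as a transform over every entity that has the required columns.

```python
# Component "columns": entity id -> value. Entity 2 has no velocity row.
position = {1: (0.0, 0.0), 2: (5.0, 5.0)}
velocity = {1: (1.0, 0.0)}

def movement_system(position, velocity, dt):
    # join on entity id: only rows present in BOTH columns are updated
    for e in position.keys() & velocity.keys():
        px, py = position[e]
        vx, vy = velocity[e]
        position[e] = (px + vx * dt, py + vy * dt)

movement_system(position, velocity, 1.0)
print(position)  # {1: (1.0, 0.0), 2: (5.0, 5.0)}
```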

------
iamcurious
If you are interested in tools for automating data-oriented design, check out
GeneXus. You describe what the user needs from the data, and it automagically
builds the normalized tables and the CRUD code in various languages, and
deploys it.

------
lincpa
The Pure Function Pipeline Data Flow is based on the philosophy of Taoism and
the Great Unification Theory. In the computer field, it realizes for the
first time the unification of hardware engineering and software engineering
at the level of the logical model. It has been extended from `Lisp
language-level code and data unification` to `system engineering-level
software and hardware unification`. Whether in the appearance of the code or
in the runtime mechanism, it is highly consistent with an integrated circuit
system. It has also been widely unified with other disciplines (such as
management, large industrial assembly lines, water conservancy projects,
power engineering, etc.). It is also very simple and clear, and its support
for concurrency, parallelism, and distribution is simple and natural.

There are only five basic components:

1. Pipeline (pure function)
2. Branch
3. Reflow (feedback, whirlpool, recursion)
4. Shunt (concurrent, parallel)
5. Confluence

The whole system consists of these five basic components. It perfectly
achieves unity and simplicity. It must be the ultimate programming
methodology.

This method has been applied to a pure Clojure project of about 100,000 lines
of code, which demonstrates its practicability.

[The Pure Function Pipeline Data Flow](https://github.com/linpengcheng/PurefunctionPipelineDataflow)

~~~
xuejie
Although I do agree data flow programming can be useful sometimes, it has
been pointed out that data-oriented design is not about data flow:
[https://sites.google.com/site/macton/home/onwhydodisntamodel...](https://sites.google.com/site/macton/home/onwhydodisntamodellingapproachatall)

And looking from the other side, even when you consider high-performance data
flow programming, there are great people pointing out that going functional
might not be a very good idea:
[https://www.freelists.org/post/luajit/Ramblings-on-languages-and-architectures-was-Re-any-benefit-to-throwing-off-lua51-constraints](https://www.freelists.org/post/luajit/Ramblings-on-languages-and-architectures-was-Re-any-benefit-to-throwing-off-lua51-constraints)

------
foobar_
This beats OOP and FP a million times.

~~~
tabtab
I'm a fan of "table-oriented programming". But what's lacking is good
reference implementations for others to study. When I explain it with text
and short examples, most people go "Huh? Why not just use code?"

Plus, existing development tools are not well suited to table-oriented
programming. One would have to build such tools as well for the benefits to
show.

A reference example could be a CRUD framework or a gaming framework for a
Trek-ish style universe (not too much supernatural like Star Wars). Making
either is a non-trivial process. Maybe when I retire I'll get around to making
such...

To make it practical, we may also need somebody to implement "Dynamic
Relational", because existing RDBMS are too stiff for certain things, like
representing different kinds of UI widgets. Having a dedicated table for each
"kind" of widget is overbearing. With a "static" RDBMS, one either has to use
attribute tables (AKA "EAV" tables) or dedicated per-widget tables. That's
not ideal.

~~~
datashaman
A "static" RDBMS supports JSON columns, which can store arbitrarily shaped
data.
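For example (a small sketch of mine, assuming an SQLite build that includes
the JSON1 functions): a fixed schema with one JSON column holding each
widget's kind-specific attributes.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE widget (id INTEGER PRIMARY KEY, kind TEXT, attrs TEXT)")
# each "kind" of widget stores its own attribute shape in the JSON column:
db.execute("INSERT INTO widget VALUES (1, 'slider', ?)",
           (json.dumps({"min": 0, "max": 100}),))
db.execute("INSERT INTO widget VALUES (2, 'label', ?)",
           (json.dumps({"text": "hello"}),))

# dynamic attributes remain queryable, though less cleanly than real columns:
rows = db.execute(
    "SELECT id FROM widget WHERE json_extract(attrs, '$.max') = 100"
).fetchall()
print(rows)  # [(1,)]
```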

~~~
tabtab
That's part way there, but I see at least two problems: first, you treat
"regular" columns differently from dynamic columns, and you have to change
your SQL if you switch which "type" a column is; second, it's hard to index a
blob of text well. Dynamic Relational wouldn't have these issues (if done
right). Marking a column as permanent is like adding a constraint, rather
than changing the container "type".

