It's like C. Alexander's "Pattern Language" but for data models.
> ...I was modeling the structure—the language—of a company, not just the structure of a database. How does the organization understand itself and how can I represent that so that we can discuss the information requirements?
> Thanks to this approach, I was able to go into a company in an industry about which I had little or no previous knowledge and, very quickly, to understand the underlying nature and issues of the organization—often better than most of the people who worked there. Part of that has been thanks to the types of questions data modeling forces me to ask and answer. More than that, I quickly discovered common patterns that apply to all industries.
> It soon became clear to me that what was important in doing my work efficiently was not conventions about syntax (notation) but rather conventions about semantics (meaning). ... I had discovered that nearly all commercial and governmental organizations—in nearly all industries—shared certain semantic structures, and understanding those structures made it very easy to understand quickly the semantics that were unique to each.
> The one industry that has not been properly addressed in this regard, however, is our own—information technology. ...
I've pushed this at every company/startup I've worked at for years now and nobody was interested. You can basically just extract the subset of models that cover your domain and you're good to go. Or you can reinvent those wheels all over again, and probably miss stuff that is already in Hay's (meta-)models.
Prolog clauses are the same logical relations as in the Relational Model of DBs. ( Cf. Datalog https://en.wikipedia.org/wiki/Datalog )
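To make the correspondence concrete, here is a minimal sketch (in Python, with made-up facts) of how a Datalog rule is just a relational join over a set of tuples:

```python
# Datalog rule:  grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
# The same relation expressed as a join over a set of tuples,
# mirroring how a relational database would evaluate it.

parent = {("alice", "bob"), ("bob", "carol"), ("bob", "dave")}

grandparent = {(x, z)
               for (x, y1) in parent
               for (y2, z) in parent
               if y1 == y2}          # the join condition: shared Y

print(sorted(grandparent))  # [('alice', 'carol'), ('alice', 'dave')]
```

The rule's body is the join, the head is the projection; that is the whole correspondence.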
The next big thing IMO is categorical databases. Go get your mind blown by CQL: https://www.categoricaldata.net/ It's "an open-source implementation of the data migration sketch from" the book "Seven Sketches in Compositionality: An Invitation to Applied Category Theory" which went by yesterday here: https://news.ycombinator.com/item?id=20376325
> The open-source Categorical Query Language (CQL) and integrated development environment (IDE) performs data-related tasks — such as querying, combining, migrating, and evolving databases — using category theory, ...
In Hay's book, he talks about modeling not just Object Classes, but how you can go about modeling ObjectClass classes (the metadata). In Smalltalk, this metadata is automatically available via the Browser.
In reviewing the CQL Tutorial, the Typesides' "java_functions" specification section made me think of the Method Finder (wherein you could provide an input value and the expected output value, and methods that satisfied the transformation would be shown). I'm not sure if in CQL you'd be able to search for a function that satisfied criteria, but that may be beyond the scope of what that system is trying to provide.
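The Method Finder idea is simple enough to sketch: given an example input and the expected output, search a set of candidate functions for ones that perform that transformation. This toy version (candidate list arbitrary, for illustration only) is nothing like the Smalltalk implementation, just the shape of the idea:

```python
# A toy "method finder": given an example input and the expected
# output, report which candidate functions perform that transformation.

candidates = [str.upper, str.lower, str.strip, str.title]

def find_methods(value, expected):
    found = []
    for fn in candidates:
        try:
            if fn(value) == expected:
                found.append(fn.__name__)
        except Exception:
            pass  # skip functions that reject this input
    return found

print(find_methods("hello", "HELLO"))  # ['upper']
```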
In any case, interesting threads of thought to be followed and explored.
Apparently Conal Elliott has reached out to the CQL folks so if they get together there should be some interesting and useful stuff emerging there. I told Wisnesky they should make a self-hosting CQL compiler in CQL. ;-D Maybe modeled on Arcfide's thing. Categorical APL...
It makes me wonder what happens if instead of creating a tree of data I put my data into a Datom flat data structure then used syntax like Datalog to conform data into shape for each new context
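The flat-facts idea can be sketched without Datomic at all. Below, facts are stored as (entity, attribute, value) "datoms" and a pull function reshapes them for one context; this is only an illustration, not Datomic's actual API, and the attribute names are made up:

```python
# Facts stored flat as (entity, attribute, value) "datoms",
# then reshaped into a structure for one particular context.

datoms = [
    (1, "person/name",  "Ada"),
    (1, "person/email", "ada@example.com"),
    (2, "person/name",  "Grace"),
]

def pull(entity_id, attrs):
    """Shape the flat facts into a dict for one context."""
    return {a: v for (e, a, v) in datoms if e == entity_id and a in attrs}

print(pull(1, {"person/name"}))  # {'person/name': 'Ada'}
```

Each new context just pulls a different attribute subset from the same flat store, which is the conforming-into-shape the comment describes.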
Unfortunately, I still haven't installed Datomic free to try this out because I feel dirty if it doesn't come from brew
More abstract constructs like lenses can help with this as well. By building lenses for reading that can transform to a domain context structure from your global program state, you can keep them relatively decoupled and ensure all of the glue lives only in those lenses.
I'm undecided on the utility of leveraging lenses for mutation as well; on the one hand, mutation allows one to operate strictly in the local context, but on the other hand, it requires your lenses to be truly bidirectional (which I think is harder in practice than in theory), and counts on either synchronous sync of local state and global state _OR_ eventual consistency with the rest of the app.
Going a more CQRS-esque way like Redux/re-frame allows you to not rely on the bidirectionality of your lenses and also ensures consistency in that local state is only ever driven by changes to the global state.
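For the discussion above, a lens can be sketched as a (get, set) pair over nested global state; the set direction returns a new state rather than mutating, so all the glue between the global shape and the local shape lives inside the lens (state shape and names are illustrative):

```python
# A minimal lens as a (get, set) pair over nested global state.
# "set" returns a new state rather than mutating in place.
from typing import Any, Callable, NamedTuple

class Lens(NamedTuple):
    get: Callable[[dict], Any]
    set: Callable[[dict, Any], dict]

user_name = Lens(
    get=lambda state: state["user"]["name"],
    set=lambda state, v: {**state, "user": {**state["user"], "name": v}},
)

state = {"user": {"name": "Ada"}, "theme": "dark"}
print(user_name.get(state))           # Ada
print(user_name.set(state, "Grace"))  # theme untouched, name replaced
```

The read direction is the easy half; it is keeping get and set lawfully inverse as the state evolves that makes true bidirectionality hard in practice.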
(I originally said "thread from 2013" above but that was wrong and I edited it.)
In fact, when developing I don't want to think too much about mere data, I want to see the algorithms and data structures and even the overall program logic. Having to worry about each individual struct member, and whether those are used properly and kept in sync is a pain and is throwing away the brilliant idea of encapsulation.
The chapter on "managers" reads like a caricature of poor OOP practices, I don't quite understand how it's become a best practice in data-oriented design :)
All in all, this seems like a clumsy way of designing things.
This talk by Mike Acton (formerly Insomniac, now Unity):
I recommend this talk all the time, since it is the one that got me convinced to look into DoD seriously.
More Mike Acton (now at Unity): https://www.youtube.com/watch?v=p65Yt20pw0g
Stoyan Nikolov “OOP Is Dead, Long Live Data-oriented Design”: https://www.youtube.com/watch?v=yy8jQgmhbAU
Overwatch Gameplay Architecture and Netcode (More specifically about ECS): https://www.youtube.com/watch?v=W3aieHjyNvw
The main argument is that you work with data so you should care about data. All a program does is take some input, do something with it, and give you an output.
If you structure your program around this mindset, a lot of problems become much simpler. Concurrency is one thing that becomes much simpler to reason about now that you understand how your data is managed and moved around.
A key realisation is that most of the time you have many of something, and very rarely do you have a single thing. So instead of a lot of small objects of the same type each doing some work within its own deep call stack, you run through an array of them and do the work you need to do in one pass.
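The contrast can be sketched as follows (Python won't show the cache effects that motivate this in a systems language, but the shape of the code is the same; the particle example is mine, not from the talks):

```python
# Object-per-entity style: each particle updates itself.
class Particle:
    def __init__(self, x, vx):
        self.x, self.vx = x, vx
    def update(self, dt):
        self.x += self.vx * dt

# Data-oriented style: one pass over parallel arrays ("struct of
# arrays"). In a systems language this contiguous layout is what
# makes the loop cache-friendly.
xs  = [0.0, 1.0, 2.0]
vxs = [1.0, 1.0, 1.0]

def update_all(xs, vxs, dt):
    for i in range(len(xs)):
        xs[i] += vxs[i] * dt

update_all(xs, vxs, 0.5)
print(xs)  # [0.5, 1.5, 2.5]
```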
I disagree that encapsulation is a brilliant idea _in general_, because it promotes hiding data, and hiding data is not inherently good. There are obviously cases where internal invariants are critical, but since all your state is simply tables of data, your program is just choosing how to represent that data, which can make it easier to catch a bug early in one central place.
There's obviously pros and cons, but I don't think you should discount the possibility of it being a good idea just because it questions ideas that seem standard.
I would argue that a business domain is defined by its data and how that data is transformed and displayed based on user input.
So simply put:
1. Data goes into system
2. Data is displayed to user
3. User interacts with data (button, CLI etc.)
4. Data is transformed based on interaction
That is ALL a program ever is. So you can model a program by just specifying the transformations that happen on each user interaction.
This can be optimized heavily if each interaction requires a large set of data to be transformed, e.g. through data oriented design you can work on big sets of data based on previous interactions and transformations and only work on data you need to work on.
It's not a design pattern. It's just a way of thinking about programs as what they are. A design pattern is something you put on top of data because you want a human understanding of business logic.
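The four steps above can be sketched as a loop in which state is just data and every interaction maps to a pure transformation of it (the counter and interaction names are hypothetical):

```python
# 1. data goes into the system
state = {"count": 0}

# 4. each interaction is a pure transformation of the data
transforms = {
    "increment": lambda s: {**s, "count": s["count"] + 1},
    "reset":     lambda s: {**s, "count": 0},
}

def handle(state, interaction):
    # 3. user interacts; we look up the matching transformation
    return transforms[interaction](state)

for event in ["increment", "increment", "reset", "increment"]:
    state = handle(state, event)

print(state)  # 2. displayed to user: {'count': 1}
```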
> That is ALL a program ever is.
Well, no, that's a fairly typical pattern for an interactive program, sure, but plenty of programs involve transformations that are not in response to user interaction, and may not even include user interaction or display to user at all.
So as with any architecture, you start with "it depends".
Is the system a simple CRUD application with minimal integrations to other systems? In this case, just use your best judgment and build it as simply as possible. A data-driven approach is economical and perfectly acceptable for this kind of system.
Does the system require inbound and/or outbound integrations with other systems? In this case you'll want to understand those integrations clearly before proceeding, because they will have an impact on how you develop the system. If the integrated systems are unreliable or work in unexpected ways, you may need an anti-corruption layer to buffer your new system from the older systems.
Now, if the new system is complex with many integrations, you'll want to move away from data-driven and into domain-driven design. You'll want to explore bounded contexts, the relationships between them, translation layers, aggregate roots, value objects, and event messaging. All of this is detailed in Eric Evans' book and further discussed in other books like Vaughn Vernon's "Implementing Domain-Driven Design".
I'd also highly recommend learning Event Storming as a workshop to clearly understand a system and identify strategic opportunities.
But the overriding concern I have with data-driven design is that data, tables, and objects don't always align with a bounded context in a one-to-one fashion. You will likely have several models for the same concept. For instance, "user" seems to be a singular object/domain, but it can have many contexts (employee, external, internal, vendor, sales, support, manager, etc.). Each of these contexts carries different meaning and therefore a different model. Our past object-oriented philosophy was to build do-everything objects with interfaces to manage the complexity.
In Domain-Driven Design, you would actually create separate implementations for each model (an employee service would be different from a vendor service even though the underlying data may have similarities).
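A minimal sketch of that idea, assuming a shared "user" record and two invented contexts (none of these names come from Evans' or Vernon's books):

```python
# Same underlying "user" data, but each bounded context gets its own
# model and its own translation from the raw record.
from dataclasses import dataclass

raw = {"id": 7, "name": "Ada", "dept": "R&D", "company": "Acme"}

@dataclass
class Employee:          # the HR context cares about the department
    id: int
    name: str
    dept: str

@dataclass
class Vendor:            # the procurement context cares about the company
    id: int
    name: str
    company: str

def employee_from(row):  # each context owns its own translation layer
    return Employee(row["id"], row["name"], row["dept"])

def vendor_from(row):
    return Vendor(row["id"], row["name"], row["company"])

print(employee_from(raw))
print(vendor_from(raw))
```

Neither model knows about the other's fields, which is the point: the context boundary, not the table, determines the model.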
And before we go down the relational database avenue, you have to realize that relational databases are a product and solution, not an architecture. It is convenient for reporting and aggregation, but it actually is an anti-pattern for building transactional systems. Transactional systems are often better suited to key/value, NoSQL data stores. (not to say always, but often)
Lastly, a business domain is defined by its data and behavior. You cannot separate the two and I'd argue behavior takes precedence when designing an architecture.
There are only five basic components:
1. Pipeline (pure function)
3. Reflow (feedback, whirlpool, recursion)
4. Shunt (concurrent, parallel)
The whole system consists of five basic components.
It perfectly achieves unity and simplicity. It must be the ultimate programming methodology.
This method has been applied to a pure Clojure project of about 100,000 lines of code, which demonstrates its practicability.
[The Pure Function Pipeline Data Flow](https://github.com/linpengcheng/PurefunctionPipelineDataflow)
And taking from the other side of the view, even when you consider high performance data flow programming, there're great people pointing out going functional might not be a very good idea: https://www.freelists.org/post/luajit/Ramblings-on-languages...
In computing, data-oriented design is a program optimization approach motivated by efficient usage of the CPU cache, used in video game development.
This is based on a design philosophy that gets strict about structuring code into input-transform-output, even though C++ encourages "modify parts of an object in place". It also gets strict about cache utilization.
And so, you get the "Entity-Component-System" (which is a 3-noun term like model-view-controller) approach. The Components are conceptually the columns of the database. The Entities are the rows. Each entity only has a subset of columns, so it's actually a primary key across many tables. And "Systems" are data transformation functions to be applied to some subset of components.
So, instead of defining cars, trees and soldiers in your game as a tree of individual objects that participate in an inheritance hierarchy with a virtual Update() method, you define each actor as a subset of available components that update en-masse during the data transforms applied to arrays of raw structs.
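A toy sketch of that columns/rows/functions reading of ECS (component names and values invented; real engines use packed arrays rather than dicts, precisely for the cache behaviour discussed above):

```python
# Minimal ECS sketch: each component is a "column" (a dict keyed by
# entity id); an entity is just an id appearing in some columns; a
# system is a plain function over entities that have the components
# it needs.
position = {1: [0.0, 0.0], 2: [5.0, 5.0]}   # component: position
velocity = {1: [1.0, 2.0]}                  # component: only entity 1 moves

def movement_system(dt):
    # the "join": entities present in both columns
    for eid in position.keys() & velocity.keys():
        position[eid][0] += velocity[eid][0] * dt
        position[eid][1] += velocity[eid][1] * dt

movement_system(1.0)
print(position)  # {1: [1.0, 2.0], 2: [5.0, 5.0]}
```

Entity 2 has no velocity component, so the movement system never touches it; no inheritance hierarchy or virtual Update() is involved.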
The net result is that it’s often significantly better to use an algorithm that packs its data contiguously (to allow easy pre-fetching) and with only the information required (to get the most elements in each cache line) than one with better time complexity.
Then the DB-like model falls out of that as a way to represent domain objects (like game entities) as contiguous collections of the data that represents them, broken up into the units that are actually used together.
This DB-like model got semi-divorced from its origin, and you end up with the Entity-Component-System architecture that you'd be forgiven for thinking everyone in game development was using. In actuality they are not, even if the game engine itself has been built to exploit the concepts of Data-Oriented Design.
This is also why you can often find a huge performance improvement from moving data into typed arrays in JS. So it isn't just a concern for people working in systems languages.
Plus, existing development tools are not well-suited for table-oriented programming. One would also have to build such tools for the benefits to show.
A reference example could be a CRUD framework or a gaming framework for a Trek-ish style universe (not too much supernatural like Star Wars). Making either is a non-trivial process. Maybe when I retire I'll get around to making such...
To make it practical, we may also need somebody to implement "Dynamic Relational" because existing RDBMS are too stiff for certain things, like representing different kinds of UI widgets. Having a dedicated table for each "kind" of widget is overbearing. With "static" RDBMS, one either has to use attribute tables (AKA, "EAV" tables) or dedicated per-widget tables. That's not ideal.
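The EAV workaround mentioned above can be sketched in a few lines (widget kinds and attributes invented for illustration): one generic table holds every kind of widget, at the cost of pushing the schema into the data itself.

```python
# EAV ("entity-attribute-value") storage: one flat table of rows
# instead of a dedicated table per widget kind.
eav = [
    (1, "kind",  "button"),
    (1, "label", "OK"),
    (2, "kind",  "slider"),
    (2, "min",   0),
    (2, "max",   100),
]

def widget(entity_id):
    """Reassemble one widget's attributes from the flat rows."""
    return {a: v for (e, a, v) in eav if e == entity_id}

print(widget(2))  # {'kind': 'slider', 'min': 0, 'max': 100}
```

This is exactly the stiffness trade-off described: the database no longer knows that sliders have a min and a max, so every integrity check moves into application code, which is what a hypothetical "Dynamic Relational" system would aim to fix.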