It's abstract, but I'll try to get something down.

First, look at what happens to the system from the outside, say a web request that leads to a web response. In between, information is gathered from other areas (databases, program logic) and combined with the request data. There are also possibly other effects generated (writes to database state, messages to other users, etc.).

Now take all of those “effects”--the web response, but also the database updates, logs, messages, etc.--and look at each of them as a tree (going left to right, with the root, the result, on the right) where different kinds of information were combined and transformations were performed in order to get the result.
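To make the picture concrete, here is a minimal sketch of two effects modeled as trees. Everything in it is hypothetical and only for illustration: the `Node` class, the `evaluate` helper, and the names `request`, `user_row`, `response`, and `audit_log`. Leaves are raw inputs, interior nodes are transformations, and the root is the effect.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """One node of an effect tree (hypothetical model, for illustration)."""
    name: str
    combine: Optional[Callable[..., Any]] = None  # None for a leaf (raw input)
    inputs: Tuple["Node", ...] = ()


def evaluate(node: Node, sources: Dict[str, Any]) -> Any:
    """Compute a node's value by evaluating its inputs first, then combining."""
    if not node.inputs:
        # Leaf: read the raw information by name.
        return sources[node.name]
    return node.combine(*(evaluate(child, sources) for child in node.inputs))


# Leaves: where information enters the flow.
request = Node("request")
user_row = Node("user_row")

# Two different effects (roots) built from the same inputs.
response = Node(
    "response",
    lambda req, user: f"Hello {user['name']}, you asked for {req['path']}",
    (request, user_row),
)
audit_log = Node(
    "audit_log",
    lambda req, user: {"user_id": user["id"], "path": req["path"]},
    (request, user_row),
)

sources = {"request": {"path": "/home"}, "user_row": {"id": 7, "name": "Ada"}}
print(evaluate(response, sources))
print(evaluate(audit_log, sources))
```

Note that nothing here says "user service" or "log service"; there are only inputs, transformations, and results.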

We’re being conceptual here, so imagine we’re not simplifying or squashing things together--the tree can be big and complicated. Also temporarily ignore any ideas you may have that there’s a difference between information coming from the “user” area versus the “admin” area versus the “domain object #1” area. In this world, those stores of information only exist to the extent they enable the flow that produces our results.

Now notice that there are many different requests and many different effects and responses. Thankfully, some of the inputs are shared and reusable. Further, entire spans of nodes are in common (an event type) or entire subtrees are in common (a subsystem). These are your data streams and your modules. You didn’t add them in because you felt like there had to be a “user service” or an “object #1 service”--those commonalities factored out of the requirements of the data flows (to the extent they did).
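As a rough sketch of what "factoring out" could look like, the snippet below represents each effect as a nested tuple `(name, *children)` and reports the subtrees that appear in more than one effect. The tree shapes and every name in it (`load_session`, `load_profile`, `render_page`, and so on) are invented for the example, and a real system would be far messier.

```python
from collections import Counter


def subtrees(tree):
    """Yield every subtree of a nested-tuple tree of the form (name, *children)."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


def shared_subtrees(effects):
    """Return the subtrees that occur in more than one effect tree."""
    counts = Counter()
    for tree in effects.values():
        # Count each distinct subtree at most once per effect.
        counts.update(set(subtrees(tree)))
    return {tree for tree, n in counts.items() if n > 1}


# Hypothetical flows: each tuple is (transformation or input name, *inputs).
session = ("load_session", ("request",), ("session_store",))
profile = ("load_profile", session, ("user_table",))

effects = {
    "web_response": ("render_page", profile, ("page_template",)),
    "email": ("render_email", profile, ("email_template",)),
    "audit_log": ("format_log", session),
}

shared = shared_subtrees(effects)

# Shared subtrees with children are the candidate "modules";
# shared leaves are the reusable inputs.
candidate_modules = {tree for tree in shared if len(tree) > 1}
for tree in sorted(candidate_modules):
    print(tree[0])
```

In this toy case the candidates are `load_session` and `load_profile`. Something like a "user" module shows up because three flows happen to need the same subtree, not because anyone decreed it in advance.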

Often, there isn’t an “object #1” at all--that was a presumption used to put stakes down so you had somewhere to start. And systems that are made up of things like “object #1 service” and “object #2 service” very frequently end up with problems of the form: “we can’t do that because object #1s don’t know about [aspect of object #2s]! Everyone knows that! We need a whole new sub-system!” In the data-driven world the question is always the same: what data do you need to combine in order to get your result?

This isn’t to say all the modules we usually come up with will turn out to be false ones (especially since a lot of the time we’re basing our architectures on past experience). For instance, some kind of “user” management system is probably made inevitable by the common paths user-related data takes to enter the system.

Now for the reverse argument: imagine you have a system that was done with the sort of modeling where there is an “object #1 service” that must get info from the “user service” and work with the “object #2 service” through the “object set mediator service”. You’re tracing through all the code that goes into formulating a response to requests, from start to finish, but someone has played a trick on you: they’ve put one of those censoring black bars over deployment artifacts, package names, and class names. The punchline is that your architecture inevitably is one of the trees described above--it’s just a question of how badly things are distorted because someone presumed the system comes from the behavior of “object #1”s and “object #2”s and not the other way around.