
Combining event sourcing and stateful systems - brendt_gd
https://stitcher.io/blog/combining-event-sourcing-and-stateful-systems
======
evdev
I think to truly be an event-driven architecture you need to go a step or two
further and be _data-driven_.

In other words, the appropriate way to describe your system would not be
(subscribable) relationships between a set of components that describe your
presumptive view of a division of responsibilities. (This is the non-event-
driven way of doing things, but with the arrows reversed.)

Instead, you track external input types, put them into a particular stream of
events, transform those events to database updates or more events, etc. Your
_entire system_ is this graph of event streams and transformations.
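A hedged sketch of that graph in Python (all event names and shapes are invented for illustration): external inputs enter a stream, and the system is nothing more than composed transformations that turn events into further events or database updates.

```python
# A toy "system as a graph of event streams". Event types and update
# shapes here are hypothetical; the point is that the whole system is
# just transformations chained over streams.

def transform(events):
    """Turn raw input events into derived business events."""
    for event in events:
        if event["type"] == "order_placed":
            yield {"type": "stock_reserved", "sku": event["sku"]}

def to_db_updates(events):
    """Turn derived events into database updates (modeled as dicts)."""
    for event in events:
        if event["type"] == "stock_reserved":
            yield {"table": "stock", "op": "decrement", "sku": event["sku"]}

inputs = [{"type": "order_placed", "sku": "A-1"}]
updates = list(to_db_updates(transform(inputs)))
```

Each stage only knows the shape of the events it consumes, not which "service" produced them.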

These streams may cut across what you thought were the different
responsibilities, and you will have either saved yourself headaches or removed
a fatal flaw in your design.

If you're thinking about doing work in this area, don't just reverse the
arrows in your component design!

~~~
dpc_pw
I'm really interested to understand your comment better.

Can you give an example for "presumptive view of a division of
responsibilities" and generally the whole comment? Something like "bad way" vs
"good way"? Thanks!

~~~
evdev
It's abstract, but I'll try to get something down.

First, look at what happens to the system from the outside, say a web request
that leads to a web response. In between, information is gathered from other
areas (databases, program logic) and combined with the request data. There are
also possibly other effects generated (writes to database state, messages to
other users, etc.).

Now take all of those “effects”--the web response, but also the database
updates, logs, messages, etc.--and look at each of them as a tree (going left
to right, with the root, the result, on the right) where different kinds of
information were combined and transformations were performed in order to get
the result.

We’re being conceptual here, so imagine we’re not simplifying or squashing
things together--the tree can be big and complicated. Also temporarily ignore
any ideas you may have that there’s a difference between information coming
from the “user” area versus the “admin” area versus the “domain object #1”
area. In this world, those stores of information only exist to the extent they
enable the flow that produces our results.

Now notice that there are many different requests and many different effects
and responses. Thankfully, some number of the inputs are shared and reusable.
Further, entire spans of nodes are in common (an event type) or entire
subtrees are in common (a subsystem). These are your data streams and your
modules. You didn’t add them in because you felt like there had to be a “user
service” or an “object #1 service”--those commonalities factored out (to the
extent they did) of the requirements of the data flows.

Often, there isn’t an “object #1” at all--that was a presumption used to put
stakes down so you had somewhere to start. And our systems that are made up
of things like “object #1 service” and “object #2 service” very frequently end
up with problems of the form: “we can’t do that because object #1s don’t know
about [aspect of object #2s]! Everyone knows that! We need a whole new sub-
system!”. In the data-driven world the question is always the same: what data
do you need to combine in order to get your result?

This isn’t to say all modules we usually come up with will turn out to be
false ones (especially since a lot of the time we’re basing our architectures
on past experience). For instance, that there is some kind of “user”
management system is probably made inevitable by the common paths user-related
data take to enter the system.

Now for the reverse argument: imagine you have a system that was done with the
sort of modeling where there is an “object #1 service” that must get info from
the “user service” and work with the “object #2 service” through the “object
set mediator service”. You’re tracing through all the code that goes into
formulating a response to requests, from start to finish, but someone has
played a trick on you: they’ve put one of those censoring black bars over
deployment artifacts, package names, and class names. The punchline is that
your architecture _inevitably is_ one of the trees described above--it’s just
a question of how badly things are distorted because someone presumed the
system comes from the behavior of “object #1”s and “object #2”s and not the
other way around.

~~~
heavenlyblue
It is the same as arguing whether lambda calculus is better than pi-calculus
or a Turing machine.

These are all isomorphic structures. Neither of them can do more than the
other.

For example - you’re speaking of dependencies, etc - but any language based on
statements can be reduced to a dependency graph defined by its single-
assignment form.

Event sourcing is not a panacea.

------
Fire-Dragon-DoL
I'm working with an event-sourced system, and we made some mistakes in the
design process, so some areas that didn't need event sourcing do have it.

The biggest downside has been the UI: events are not real time and these
objects are just CRUD stuff, so the user wants to see that what they just
wrote was saved. You might not have this information yet, so you need to
mitigate it, for example by updating the UI through sockets (a lot of
additional work).

On the upside, we are acquiring a lot more insights in what business processes
bring value and are meaningful, versus what I call "just configuration".

We figured out quite a few rules of thumb over time that are helpful though.

One thing I noticed over time is that on average there is no need for separate
"created" and "updated" events; usually there is one meaningful business event
that encompasses both (not always the case), e.g. "product listed", or
something along those lines. This not only saves lines, but code reacting
to this event has a reduced interaction surface (fewer bugs and less
coupling), as well as being more expressive.
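A hedged sketch of that idea (the event name and fields are invented): one business-meaningful event carries the whole fact, instead of a generic created/updated pair.

```python
from dataclasses import dataclass

# Instead of separate "product_created" and "product_updated" events,
# a single business event carries the complete fact. All names here
# are illustrative, not from any particular codebase.
@dataclass(frozen=True)
class ProductListed:
    sku: str
    title: str
    price_cents: int

def handle(event):
    # One event type means one, smaller interaction surface for
    # downstream code to react to.
    if isinstance(event, ProductListed):
        return f"listed {event.sku} at {event.price_cents} cents"

result = handle(ProductListed(sku="A-1", title="Widget", price_cents=999))
```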

If you're interested, we chat a lot about event sourcing in the Eventide
Slack channel: [https://eventide-project.org/#community-
section](https://eventide-project.org/#community-section)

~~~
agentultra
This is super important and I cannot stress it enough! If your events contain
words like, "create, update, delete, associate, disassociate," then you're
building a weak domain model that won't benefit from the added complexity of
deriving state from the source of events.

Your events should use the same words your customer would actually use to
describe their business process. For example, a system to manage intake of
patients in an ER would have events such as _Patient Arrived_ , _Patient
Screen Completed_ , _Patient Admitted_ , etc.

If you don't have such a vocabulary then you're not capturing interesting
events so don't store them. You probably want something that is _event driven_
instead or perhaps simply to log actions to an audit table.

~~~
Fire-Dragon-DoL
Indeed! It became pretty evident at some point: we had 4 models that had just
"created", "updated", and "deleted" events and felt weak, and then we arrived
at one model that had only a "configured" event, which needed all the
information from the previous 4.

That's when we figured out that the other 4 were pointless, just "UI gimmicks
to make the life of the user easier"; the only event that was relevant was
that "Configured", with all the related information in there.

The event modeling is the key part. It also helps you gain a much deeper
understanding of the business problem the software is trying to solve.

------
withinboredom
You should take a look at Microsoft's Durable Functions which pairs event
sourcing + (optional) actor model + serverless. It's some pretty neat tech.

I tried doing something similar to this several years ago, and here's a few
issues I ran into:

1\. Pub/sub in Event Sourcing is a bad idea. It's really hard to get right.
(what to do if sub happens after pub due to scaling issues/infrastructure,
etc?) Instead it's better to push commands deliberately to a process manager
that handles the inter-domain communication and orchestration.

2\. Concurrency. Ensuring aggregates are essentially single-threaded entities
is a must. Having the same aggregate id running in multiple places can cause
some really fun bugs. This usually requires a distributed lock of some sort.

3\. Error handling. I ended up never sending a command to a domain directly,
instead I sent it to a process manager that could handle all the potential
failure cases.

~~~
sbellware
> Pub/sub in Event Sourcing is a bad idea

I find this point surprising. I would say the exact opposite. I would say that
pub/sub and event sourcing are two sides of the same coin: events.

> what to do if sub happens after pub

That should only ever be a problem with a non-durable transport that doesn't
have serialized writes per topic. Which, admittedly, can be pretty common. But
it's not so much an event sourcing or pub/sub issue as much as a choice of
message transport issue.

> Concurrency. Ensuring aggregates are essentially single-threaded entities is
> a must. Having the same aggregate id running in multiple places can cause
> some really fun bugs. This usually requires a distributed lock of some sort.

Or it requires partitioning the queues and using an optimistic lock when
writing (just to be on the safe side).
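An optimistic lock on the write side can be sketched like this (an in-memory toy, not any particular store's API): the writer states the stream version it expects, and an append that raced with another write is rejected instead of silently interleaving.

```python
# Optimistic concurrency on an in-memory event store (illustrative).
# A writer states the stream version it expects; if a concurrent write
# moved the stream forward, the append is rejected and can be retried.

class ConcurrencyError(Exception):
    pass

class EventStore:
    def __init__(self):
        self.streams = {}  # stream_id -> list of events

    def append(self, stream_id, event, expected_version):
        stream = self.streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected v{expected_version}, stream is at v{len(stream)}")
        stream.append(event)
        return len(stream)

store = EventStore()
store.append("account-1", {"type": "opened"}, expected_version=0)
try:
    # A second writer that read the stream at v0 loses the race.
    store.append("account-1", {"type": "opened"}, expected_version=0)
    conflict = False
except ConcurrencyError:
    conflict = True
```

Partitioning by stream id then ensures each aggregate's writes land on one queue, and the version check catches anything that slips through.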

~~~
withinboredom
> I find this point surprising. I would say the exact opposite. I would say
> that pub/sub and event sourcing are two sides of the same coin: events.

I meant in the context of getting it right. I didn't experiment with all the
pub/sub systems at the time, but most I experimented with would lose data in a
catastrophic event and cause inconsistencies. This was several years ago
though.

~~~
sbellware
> most I experimented with would lose data in a catastrophic event and cause
> inconsistencies

Fair enough. Those are probably message buses or message queues that are
ephemeral transports. Since event sourcing is predicated upon permanent
storage of events, there's no way to lose events that have already been
committed (unless someone actually physically deletes the events).

------
agentultra
I did a formal model of an event-sourced system used in production and it was
quite illuminating. It turns out that concurrency is something one should take
into account when designing these systems. _Versioning_ often refers to two
things:

1\. The event data itself; when business cases change or understanding grows
we wish to add, remove, or change the type of different fields in an event
record.

2\. The current _state_ of a projected model

The latter is what requires some form of co-ordination; otherwise you can end
up with events being applied in an incorrect order, producing the wrong
state.
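One hedged way to sketch that co-ordination (all names invented): the projection tracks its own version and only applies the next expected sequence number, so a late or duplicated event cannot corrupt the state.

```python
# A projection that refuses out-of-order events (illustrative). Each
# event carries a sequence number; the projection applies only the next
# expected one, so ordering mistakes become visible instead of silent.

class Projection:
    def __init__(self):
        self.version = 0
        self.balance = 0

    def apply(self, event):
        if event["seq"] != self.version + 1:
            return False  # out of order: skip, or buffer for later
        if event["type"] == "deposited":
            self.balance += event["amount"]
        self.version = event["seq"]
        return True

p = Projection()
p.apply({"seq": 1, "type": "deposited", "amount": 100})
skipped = not p.apply({"seq": 3, "type": "deposited", "amount": 50})
```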

It is a good idea though to avoid event sourcing all of your models. Microsoft
wrote about their experiences implementing an event-sourced application and
how they reached that conclusion [0]. In my experience it's because of
temporal properties: event sourced systems are inherently _eventually
consistent_ systems. When you have domain models that depend on one another
you will need to be quite certain that A _eventually_ leads to B which
_eventually_ leads to C and that if a failure happens along the way that
nothing is lost or irrecoverable.

[0] [https://docs.microsoft.com/en-us/previous-versions/msp-
n-p/j...](https://docs.microsoft.com/en-us/previous-versions/msp-
n-p/jj554200\(v=pandp.10\)?redirectedfrom=MSDN)

------
gen220
We encounter a similar problem at my current job (mixing systems that we want
to keep stateful with systems that we want to make “real-time”/stream-based).

I think you’ve covered most of the problems you’ll encounter. One thing that
sticks out to me is downtime: how will your order subscriber handle a product
publisher that’s down or otherwise delayed? Then, the events will be
potentially out of order, is that a problem for you?

On another note, we follow the same bounded context principles, but we
implemented it with Kafka+confluent, since that infrastructure and those
libraries were already available. Teams make their data accessible via a mix
of “raw” Kafka topics and very refined gRPC services. Your subscriber is
implemented as a cron job that reads from N streams and "reduces" them to 1
stream.

FWIW, we also store a transaction log in each of our databases, so we can
generate a stream of object states relatively easily later on. This has helped
a lot with converting old tables into streams, and vice versa.
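A minimal sketch of that transaction-log pattern, using SQLite for illustration (table and column names invented): every write to the main table also appends the new object state to a log table in the same transaction, so a stream of object states can be generated later.

```python
import json
import sqlite3

# A per-table transaction log (illustrative). Each upsert to the main
# table also appends the full new object state to a log table, inside
# one transaction, so the table can later be replayed as a stream.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, price INTEGER)")
db.execute("CREATE TABLE products_log "
           "(seq INTEGER PRIMARY KEY AUTOINCREMENT, state TEXT)")

def upsert_product(sku, price):
    with db:  # one transaction covers both writes
        db.execute(
            "INSERT INTO products VALUES (?, ?) "
            "ON CONFLICT(sku) DO UPDATE SET price = excluded.price",
            (sku, price))
        db.execute("INSERT INTO products_log (state) VALUES (?)",
                   (json.dumps({"sku": sku, "price": price}),))

upsert_product("A-1", 999)
upsert_product("A-1", 899)
log = [json.loads(s) for (s,) in
       db.execute("SELECT state FROM products_log ORDER BY seq")]
```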

The only thing that’s a persistent issue is schema changes. My only
recommendation there is to never make them... In all seriousness, keep your
data models small, and whenever you want to experiment with a schema change,
add the new data as a FK’d table with its own transaction log, rather than a
schema mutation to your core table. It’s never worth the headache if you take
data integrity seriously.

------
stormageddon
How long did it take you to come up with this approach? How many meetings etc?
As a lone developer I always wonder this stuff.

~~~
brendt_gd
It took several hours of individual research, watching talks, and reading blog
posts, plus several pair-programming sessions of a few hours each over the
span of four weeks, to come up with a solution we liked.

We informed our client that this was a new area for us that we didn't have
hands-on experience with, but that we believed it would be beneficial to spend
time to explore it, as it would be an elegant solution to several of their
business problems. They agreed and we kept them in the loop with weekly
meetings.

We're now in the phase of actually implementing real-life processes; the
project will probably be in active development for another year or two.

~~~
sdiupIGPWEfh
How do you approach estimating effort for this sort of thing? I find it
awkward enough in Scrum to guess up front how many days of effort research
will take and commit to delivering a plan or design by the end. If you have
clients and aren't strictly bound by someone else's framework, they still want
some rough idea how long research will take. Especially if the client is
footing the bill. If the research is un-billed time, then the estimate is
critical to _you_. What do you do?

~~~
loopz
If you are perceived as just a cog delivering software patches, you've already
lost. If you're presenting business proposals and designs, you've already
provided research, analysis and business cases. Often this is alot of extra
unrewarded work. Though, if a vendor has proven themselves already, they're in
position to ask for billable time for more different types of roles. This
upfront work is an investment in own business and can be used to attract other
clients, but sure, most of it goes down the toilet unless a good outlet is
found for all that creative work. A bigger consultancy can have many people
cooperating on such work in order to add that extra value to hiring companies.

------
carapace

        Events + state = state machine
    

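That equation can be read literally as a fold (a small illustrative sketch; the event names are invented): current state is just a transition function reduced over the event list.

```python
from functools import reduce

# "Events + state = state machine": the current state is a left fold
# of a transition function over the list of events (illustrative).

def apply(state, event):
    if event == "opened":
        return "open"
    if event == "closed":
        return "closed"
    return state

events = ["opened", "closed", "opened"]
state = reduce(apply, events, "initial")
```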
I was working (briefly) at a startup once and we were having a meeting and the
CTO sketched out his idea for our internal architecture and I looked at it and
thought, "That's ethernet." I quit that job.

(Technically, I was let go. Friday I went to head of HR and said, "I think I'm
gonna quit." Monday morning I was laid off. * _shrug_ *)

~~~
jdkoeck
I don't get what this has to do with ethernet.

~~~
icedchai
They're both reinventing the wheel...

------
animeshjain
I would be interested in knowing how the reactors handle side effects which
should never be replayed. Is there some well-established pattern for doing
this?

~~~
agentultra
Reactors can keep their own state, including their current position in the
event stream. When a replay is initiated, the reactor ignores events older
than its current "head".
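A hedged sketch of that checkpointing (toy in-memory version; in practice the position would live in durable storage): on replay the reactor skips events at or before its stored head, so side effects are not re-executed.

```python
# A reactor that checkpoints its position (illustrative). On a full
# replay it skips events at or before its stored "head", so side
# effects already performed are not repeated.

class Reactor:
    def __init__(self):
        self.position = 0   # durable storage in a real system
        self.effects = []

    def process(self, stream):
        for pos, event in enumerate(stream, start=1):
            if pos <= self.position:
                continue  # already handled before the replay
            self.effects.append(f"email for {event}")
            self.position = pos

stream = ["signup:alice", "signup:bob"]
r = Reactor()
r.process(stream)
r.process(stream)  # full replay: no duplicate side effects
```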

~~~
sbellware
What happens when the server that holds the thing that holds that state is
restarted?

~~~
agentultra
That's an important question to ask!

Can you assume the reactor has durable, local storage with atomic
transactions?

The answer is to model your design and use a sufficient level of rigour in
validating that your system meets your requirements.

Maybe you could use a database server that has the right properties to ensure
your reactor could survive a restart.

What if you want to add multiple reactors so that you can process an event
stream with a high volume of events, faster?

~~~
sbellware
What happens when the write of the current position fails?

It's the same problem as presuming that ACK messages in message brokers/queues
are guaranteed to not fail.

Since the message transport and other durable resources are rarely able to be
enlisted in the same atomic transaction, and since distributed transactions
would largely be an antipattern, it would seem that a reactor that records its
current position can't be presumed to be an infallible way of ensuring that
messages aren't processed more than once.

In the end, it always comes back to ensuring that handlers are idempotent,
having something that can be used as a stable and consistent idempotence key,
and accepting that the idempotence logic is the responsibility of the handler
coders rather than something we can count on generalized infrastructure for.
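That handler-level idempotence can be sketched like this (names invented; the processed-key set would need to be committed atomically with the effect in a real system): each message carries a stable key, and redelivery becomes a no-op.

```python
# Idempotence in the handler itself (illustrative). Each message
# carries a stable idempotence key; the handler records keys it has
# processed and turns redelivery into a no-op. In practice the key set
# must be persisted atomically with the effect itself.

class PaymentHandler:
    def __init__(self):
        self.processed = set()
        self.charges = []

    def handle(self, message):
        key = message["idempotence_key"]
        if key in self.processed:
            return "skipped"
        self.charges.append(message["amount"])
        self.processed.add(key)
        return "charged"

h = PaymentHandler()
first = h.handle({"idempotence_key": "pay-1", "amount": 100})
second = h.handle({"idempotence_key": "pay-1", "amount": 100})  # redelivered
```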

------
pdexter
How were the diagrams made?

~~~
brendt_gd
With [https://excalidraw.com/](https://excalidraw.com/)

Believe me: a whole new world will open once you've discovered it.

~~~
the_arun
Pretty cool! I like the idea of edit a file -> save in your repo -> edit ->
save. Export as needed.

