
What they don’t tell you about event sourcing - jackthebadcat
https://medium.com/@hugo.oliveira.rocha/what-they-dont-tell-you-about-event-sourcing-6afc23c69e9a
======
lscharen
The last section on Operational Flexibility and the inability to change the
event history raises a very good point.

Like most of the issues, the solution requires experience to know when you are
at the Goldilocks point (just right). This specific issue has a lot in common
with managing database migrations in Django or any other migration system.

The ideal situation is to create migrations that can always be rolled back,
but sometimes this is not possible to do operationally. For example, a schema
change that restricts a field from NVARCHAR to INTEGER can only be generically
rolled back if all of the unconvertible data is persisted. This can be
mitigated by structuring the database to avoid these dead-ends, and that
really is only gained by hard-won experience.

The problem with undoing operations via new events is the same thing -- unless
you have foreknowledge of this kind of problem, it is very easy to
accidentally create events that perform un-undoable actions. A very simple
example of a problematic event would be something that modifies a foreign-key
relationship -- let's say it's a digital asset in a game and you want the
ability to transfer ownership of the item from one player to another.

The simple solution is an event like SetAssetOwner(asset_pk, player_pk). This
would set the item's player foreign key field to the player's primary key.
Easy. However, you have lost knowledge of where the asset came from and cannot
undo this operation. A better solution would be to make an event
SwapAssetOwner(asset_pk, owner_pk, recipient_pk). Yes, the owner_pk is
technically redundant, but it provides a check against someone trying to steal
an item with a maliciously crafted SetAssetOwner event that performs no
checking. Even better, this operation can be reverted by sending the same
event with the owner and recipient arguments reversed. Since these properties
are part of the event message, they will be persisted in the event history and
all of the information to undo the event is self-contained.
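
The reversible-event idea can be sketched like this (the event names mirror the comment; the `{asset_pk: owner_pk}` state model and the `apply` helper are my own illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SwapAssetOwner:
    """Transfer an asset between players; carries enough data to be undone."""
    asset_pk: int
    owner_pk: int       # technically redundant, but guards against forged transfers
    recipient_pk: int

    def inverse(self) -> "SwapAssetOwner":
        # The undo is just the same event with the parties reversed.
        return SwapAssetOwner(self.asset_pk, self.recipient_pk, self.owner_pk)

def apply(state: dict, event: SwapAssetOwner) -> dict:
    """Apply the event to a {asset_pk: owner_pk} mapping, validating ownership."""
    if state.get(event.asset_pk) != event.owner_pk:
        raise ValueError("owner_pk does not match current owner; rejecting event")
    return {**state, event.asset_pk: event.recipient_pk}
```

Applying an event and then its inverse returns the state to where it started, and both events sit in the history with all the data needed to audit or replay them.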

~~~
Yokohiii
Idk why people are saying that ES data is always immutable. Sure, it can be
immutable by default, but if a facility to change the history is useful, why not
have one?

~~~
hesdeadjim
If you need to change history you can just create new events that accomplish
your mutation -- and even mark them as a type 'change history' or some such
obvious identifier so when you inspect the event stream you know exactly what
you are looking at.

~~~
mianos
I agree, this is 100% normal in accounting, as the earlier thread pointed out.
If there is an error you add journal entries to the end to make the
adjustment. It is funny that this sort of thing was invented in accounting in
1494.

------
joefreeman
Good article. I've spent the last year migrating to an event sourced system,
so thought I'd share some thoughts.

On the eventual consistency point, I've found you can get quite far by having
the read model manage the race condition. This probably doesn't work
everywhere, but in our system, multiple users can accept an invitation, so we
have something like `InvitationAccepted{invitation_id, user_id}`. It's
possible that multiple users might accept the invitation at roughly the same
time, but the command-side doesn't really have to be concerned with this - it
can happily allow multiple users to accept an invitation. It's up to the read
model to ask, 'has this invitation already been accepted?' - if not, the
acceptance is successful and will be indicated when queried, otherwise the
acceptance is unsuccessful (and as a bonus, we can separately record who
unsuccessfully tried to accept it). From the user's point of view, when they
accept the invitation, they see a spinner until we confirm with the read model
one way or the other (this could be done by polling the read model, but in our
case we have an event sent back to the client).
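
A minimal sketch of that read-model strategy (the class and event names are illustrative, not the poster's actual code): the command side appends `InvitationAccepted` events freely, and the read model treats the first one per invitation as the winner.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InvitationAccepted:
    invitation_id: str
    user_id: str

class InvitationReadModel:
    def __init__(self):
        self.accepted_by: dict[str, str] = {}          # invitation_id -> winning user
        self.rejected: list[InvitationAccepted] = []   # bonus: who tried too late

    def handle(self, event: InvitationAccepted) -> None:
        if event.invitation_id in self.accepted_by:
            self.rejected.append(event)                # already taken
        else:
            self.accepted_by[event.invitation_id] = event.user_id

    def did_accept(self, invitation_id: str, user_id: str) -> bool:
        """What the waiting client polls (or is notified of)."""
        return self.accepted_by.get(invitation_id) == user_id
```

The client's spinner resolves once `did_accept` returns an answer for its user, whichever way the race went.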

Coming up with the event schema and versioning/granularity are hard. We have
version numbers on all our events to make this a bit more manageable/explicit
(`InvitationAccepted1`, for example). Storing events in a relational database
does make it a bit easier to go back and edit/upgrade/delete them (sort-of
cheating, but also relevant for GDPR). Also, I think we're going to end up
suffering a bit from the 'whole system fallacy', but at the moment namespacing
all the events (keeping in mind their expected volume) makes it a lot easier
to manage.

~~~
sbov
> It's up to the read model to ask, 'has this invitation already been
> accepted?'

I feel like you've skipped over the interesting part of your strategy here. If
it's an eventually consistent system, what keeps the read model from having
the wrong answer to this question?

~~~
hvidgaard
Not OP, but I am working with a CQRS system.

In CQRS, eventual consistency does not mean that we have multiple servers with
two different answers. It means there is a delay between when a command is
issued and when the resulting event has propagated to all read models.

You need to handle the race condition at some point or another. From a CQRS
point of view, a user accepting an invitation is just an event like any other.
What happens based on that event must account for the possibility that multiple
users have accepted, and it's a rather straightforward thing to solve with an
ES: the "accept" event with the lowest sequence number is the first.

Having the read side handle it would probably mean that, in your read model for
accepted invitations, you ignore all but the first acceptance of an
"invitationId".

------
7sigma
Regarding eventual consistency, a CQRS/ES system can also be synchronous, or
partially so. You could have listeners for events that need to supply a
strongly consistent model, and other events that feed parts of the system that
don't need strong consistency.

"However the events in a event store are immutable and can’t be deleted, to
undo an action means sending the command with the opposite action"

Well they don't have to be immutable. I don't see why you can't update/migrate
events.

~~~
zihotki
I share the same opinion based on my experience. Events can be modified and
deleted, but that must be an exceptional situation (GDPR and other compliance
requirements, etc.). Even though it's exceptional, you have to provide a clear
and easy way to do so, and that increases the complexity of the solution by a
lot.

Another thing is strongly consistent models: there may be valid requirements in
some problem areas to have a strongly consistent, normalized model and use it
for command validation. This helps especially well when all requirements are
not known upfront and/or the business domain changes very frequently. A small
change in the business may require completely redoing the aggregate roots and
logic if you follow the standard approach, which is very expensive. A better
decision could be to use a normalized SQL database instead of an aggregate
root. Such an approach may be more flexible in certain cases and has its own
benefits as well as costs and drawbacks.

~~~
jwhitlark
Encryption plus lose-the-key policies seem to satisfy the GDPR. So you don't
have to actually delete an event; you can just give up your ability to read its
payload.
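
This "crypto-shredding" pattern is roughly: encrypt each subject's payload with a per-subject key, and delete the key to fulfil an erasure request. A toy sketch (the XOR "cipher" below is a stand-in so the example is self-contained; a real system would use a proper AEAD cipher, and all names here are illustrative):

```python
import secrets
from itertools import cycle
from typing import Optional

keys: dict[str, bytes] = {}                        # per-user key store (deletable)

def _xor(data: bytes, key: bytes) -> bytes:
    # NOT real encryption -- placeholder for an actual cipher.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def store_event(user_id: str, payload: bytes) -> bytes:
    key = keys.setdefault(user_id, secrets.token_bytes(32))
    return _xor(payload, key)                      # this ciphertext goes in the immutable log

def read_event(user_id: str, ciphertext: bytes) -> Optional[bytes]:
    key = keys.get(user_id)
    return _xor(ciphertext, key) if key else None  # unreadable once the key is shredded

def forget_user(user_id: str) -> None:
    keys.pop(user_id, None)                        # the "erasure": drop the key, keep the log
```

The event store itself never changes; only the small, mutable key store does.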

------
dmitriid
Is it just me, or does the article never present a solution to the posited
problem: ”Since each entity is represented by a stream of events there is no
way to reliable query the data. I have yet to meet an application that doesn’t
require some sort of querying.”

The solution is said to be CQRS, and nothing in the article shows how you
solve that.

If you unwind the unnecessarily long-winded, florid poetic waxing, it all comes
down to “dump it to a database, and query that”.

~~~
voiceofunreason
"dump it to a database, and query that"

Yes.

In some cases, "dump" is a fold/reduce, and your database is just an in memory
data structure, and depending on how much latency is permitted by your service
level objectives you might cache the data structure as opposed to regenerating
it every time.

There's no magic.

The pattern is analogous to what you would do if your book of record were an
RDBMS, and you had to run graph queries. "Dumping" the data into a graph
database and running the query there would be a tempting solution, no?
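
The fold/reduce case can be sketched in a few lines (event types and fields here are made up for illustration): the "database" is just the accumulated result of replaying the stream, which you can cache or regenerate per query.

```python
from functools import reduce

def apply_event(state: dict, event: dict) -> dict:
    kind, payload = event["type"], event["data"]
    if kind in ("UserRegistered", "UserRenamed"):
        return {**state, payload["id"]: payload["name"]}
    return state                          # events this view doesn't care about are ignored

def project(events: list) -> dict:
    """'Dump it to a database': fold the stream into an in-memory read model."""
    return reduce(apply_event, events, {})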

------
shady-lady
Hopefully bi-temporal tables get added to Postgres soon.

I think it would cover a few of the use cases that people are turning to event
sourcing for (excluding scale).

------
theptip
I don't see much discussion of event-sourcing simply using a SQL database
(i.e. skipping the CQRS part). This would allow you to keep your CP (strongly-
consistent) semantics.

While this clearly wouldn't work in high-volume cases (i.e. where you
_actually_ need CQRS), it seems like this would be the simplest option for
many systems. I see a lot of articles advocating for immediately jumping into
CQRS, which seems like a big increase in architectural complexity.

Does anyone have opinions/experience on this approach?

~~~
tomerbd
Don't you still need a queue in your example? Do you lock the whole table when
you insert a new record? Can you elaborate on how this solves consistency?

~~~
theptip
I don't think you need a separate queue; if you have an "Events" table then
you can just write everything there.

It solves the consistency problem because you can create your event inside a
transaction, which will rollback if another event touching the same source is
created simultaneously.

E.g. if you have these incompatible events in a ledger:

CreditAccount(account_id=123, amount=100)

DebitAccount(account_id=123, amount=60)

DebitAccount(account_id=123, amount=60)

You'd want one of the debit transactions to fail, assuming you want to
preserve the invariant that the account's balance is always positive. You
could put the `account_id` UUID as an `Event.source` field, which would allow
you to lock the table for rows matching that UUID.
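
A sketch of this in SQLite (the schema and helper are mine, not the poster's): the balance check and the event append share one transaction, and `BEGIN IMMEDIATE` takes the write lock up front so two concurrent debits can't both read the old balance.

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, source TEXT, amount INTEGER)")

def append_event(source: str, amount: int) -> bool:
    db.execute("BEGIN IMMEDIATE")          # serialize writers on this connection
    try:
        (balance,) = db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM events WHERE source = ?",
            (source,)).fetchone()
        if balance + amount < 0:
            db.execute("ROLLBACK")         # would overdraw: reject the event
            return False
        db.execute("INSERT INTO events (source, amount) VALUES (?, ?)",
                   (source, amount))
        db.execute("COMMIT")
        return True
    except Exception:
        db.execute("ROLLBACK")
        raise

append_event("account-123", 100)   # CreditAccount -- succeeds
append_event("account-123", -60)   # DebitAccount  -- succeeds
append_event("account-123", -60)   # second debit fails: balance would go negative
```

Summing the log on every write is obviously only a sketch; a real version would check against a balance row or snapshot held in the same transaction.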

~~~
Yokohiii
If your idea in the example is that the second "debit" is created by another
transaction while your transaction is in progress, then this will not work out.
Firstly, it requires a dirty read, which is not something I would rely on
inside a transaction. Secondly, even if the dirty read works, evaluating the
outcome of several rows is just a read operation: it forces you to roll back on
the client, and still leaves a window for inconsistencies if you decide to
commit. Maybe SELECT .. FOR UPDATE can do the trick here, but that is like
giving up on life.

To round this up: RDBMS are bad for queue-like read semantics. All you can do
is poll, which is even worse if you end up being lock-heavy.

~~~
hvidgaard
No matter how you model things and no matter what technology you use, a race
condition like this needs to be handled one way or another. You either handle
it such that the data is always consistent, or you handle inconsistent data.

You can use an SQL database with transactions and locking to ensure that you
will never debit money that isn't there. Or you can save commands in a queue
that only a single process consumes at any given time (which incidentally
includes the SQL scenario). Or you can use a distributed consensus algorithm
with multi-stage commits. There is no way around it.

------
dustingetz
Strong consistency is possible with this model, consider for example git.

One real database that works like this is Datomic, which is competitive with
SQL for most kinds of read-heavy data modeling loads that SQL and CQRS is used
for.

------
subsubsub
All the issues the author talks about are valid, however, the idea that no one
warns you about them is wrong.

If you do even the most basic research into event sourcing you will find
people warning of these trade-offs (foremost among them Greg Young, the person
who originally set out the idea).

------
likeclockwork
I've never quite been able to get my head all the way around "Eventual
Consistency". I don't understand how actions that could conflict or resources
that could be contended are supposed to work? At some point, something has to
say, given two actions A and B that are in conflict, the later action B must
fail.

So, where does that happen? When getting written into the read model? Then
what, it emits an event saying clearly that the previous action failed? All
while there's a client waiting for results?

I'd love to see a worked example based on discrete resources. Such as two
people, a box, and a ball; where the ball can be held by either person or in
the box, and a person can take the ball from the box but not from another
person.

I find the concept intriguing but I don't really get it and I haven't been
able to identify in any of the writing where "the buck stops".

~~~
paragraft
Conflict resolution is a separate problem that isn't addressed by Eventual
Consistency. If I make a write in an EC system, that write may eventually be
accepted, it may be rejected, or it could be resolved through something
smarter like a CRDT. EC just says that I won't know that immediately; that
different parts of the system can have different views of the truth at the
same time. And for most business systems that's usually okay, if you're
talking about consistency that resolves on a good-enough timescale.

~~~
likeclockwork
Thanks for answering. Is Eventual Consistency a necessary property of a
CQRS/Event Sourcing architecture?

~~~
paragraft
I don't think it is, but, as usual with EC, if you don't want it there's a
performance cost. In a model where writes go into one queue/connection and
reads come out somewhere else, the way you'd avoid it is to block on each write
until it has been acknowledged on the reader side (or to build some sort of
side channel into your write handler so it can tell you when a write has been
accepted, assuming you're doing conflict resolution on the write side; some
systems don't, and push it off to read time).

But at that point you're just simulating something you probably already had
before going to CQRS/ES, so you would only want to do that in very select
cases. Otherwise there's really no point to the architecture...

------
ninjakeyboard
I've worked with event sourced systems quite a bit in production and at scale,
and have written a book on Akka and another on related topics. While I have
been an advocate of the approach, my experiences are guiding me away from
implementing event sourcing in many use cases (especially where the entities
are long lived). While CQRS is more complex, I'm more likely to implement CQRS
without event sourcing, where an entity is responsible for its own persistence
using whatever mechanism we deem suitable, and emits events that can be used
for the view (or the data can be viewed as read only). Ultimately a bounded
context is there, and you send it commands and get out events.

You have to evaluate the recovery and persistence mechanism based on your
needs. Exactly-once delivery is a pipe dream, so no matter how you persist and
deliver the event, if you walk through the code line by line there is some
crash point at which an event can be emitted twice (e.g. if the outbound
projection emits the event but doesn't persist its offset). You need to
deduplicate somewhere to get exactly-once processing semantics. There _are_
simpler approaches to persistence that offer the same delivery guarantees and
work fine in place of event sourcing. Nobody talks about any alternatives
though - everyone defaults to event sourcing. My intuition guides me to look at
other options, having been burned one too many times, watching organizations
collapse around 100% event sourced applications, etc.

I'm still working on event/message driven systems (today with Elixir mostly),
but I've started to make architectural compromises to move away from event
sourcing especially. Event sourcing + CQRS may be prescriptive, but it's very
hard for new developers to pick up and understand the layers of abstraction
underneath, e.g. Akka + the persistence journal. And I'm not sure I can trust
many of the open source libraries outside of Akka, to be honest. I've had to
dig into the depths of Postgres journal implementations and apply windowing to
journal queries, for example, because they weren't burned in at the scale I was
working with (partially because I inherited an application with a single entity
in a context whose journal was many millions of entries long - this highlights
a design error, but hopefully you can see my point).

You don't _need_ to use these patterns but you can still apply DDD and
event/message based abstractions, and publish events. An entity can write its
state to a record and then apply the state in memory as well without using a
journal given you handle exactly once processing semantics correctly. This
means there are knobs and dials. The problem with event sourcing in the
greater picture is that it's descriptive of an approach, and there aren't many
clear alternatives that people are talking about that work in similar system
designs. If you have very long lived entities, or only a few of them, it gets
especially difficult to keep the system alive over time, but for those use
cases it doesn't mean you should stop receiving commands and emitting events.

You always hear about the idea of the approach, never the reality of
maintaining these systems, or the inappropriate use. In one implementation,
there is one entity that receives thousands of events a day, and lives
forever. How do you maintain the journal while changing the code, keep it
alive over time? I've watched event sourcing and CQRS sink projects and teams.
I've watched well paid contractors unable to figure out how to cluster and
scale these systems. The barrier to entry for people to become effective can
be high and you should understand the long view in terms of people required
and cost over time and validate the approach for your use case very carefully.
Again, the fact that everyone talks about event sourcing and no closely
related alternatives makes it seem like the gold standard or the only option
but there are other (simpler) ways to deal with your persistence in an overall
similar architectural approach.

~~~
raarts
I've re-read your post a couple of times, but still find it confusing. It
reads like you're an early-adopter, and got burnt multiple times by design
flaws, and people not being familiar with it.

Suppose ES/CQRS is designed well, and a well-understood mechanism, would you
still move away from it? For which scenarios? I can imagine exactly-once
delivery not being a problem in all scenarios.

And what is the problem with long-lived entities? Solely the fact it takes
longer to reconstruct current state?

~~~
ninjakeyboard
I'm only saying it isn't the only approach, even if you're keeping the source
of truth in memory. It's the most prescriptive approach for sure so is the one
people tend to go for first but there be dragons in managing especially long
lived journals. Migrating the journal over time, figuring out how to truncate
it (especially if there is an initial creation command/event that describes
the rest of the life, or periodic updates to the structure - you have to keep
all of that data forever potentially to know that it's going to wind up being
correct.)

And yes long recovery times are an issue unless you want to keep every entity
in memory forever and ever, or you can tolerate extremely slow turnaround
times. There are knobs and dials here that are subtleties that people won't
think about until after they launch.

If you have a hammer everything looks like a nail kind of thing. There are
other ways to handle problems that are only marginally different in how they
persist and recover state, yet are far more usable for certain use cases. I'm
extremely skeptical when I see someone gung-ho for event sourcing if they've
never used it though. I tend to look at the problem very hard to see what else
we could do for persistence while still maintaining the "source of truth" in
memory.

------
gazarsgo
It's really important to manage versioned schemas for events and to define
rules around schema evolution, ideally with tooling support, e.g. via Avro's
Schema Registry or Protobuf's forward- and backward-compatibility guarantees.

------
sigi45
I still like the idea, but it is more complex to do it right and keep it right.

I would only do this with a good team and a real problem where it would be
useful.

An RDBMS goes very far.

------
chrisweekly
CQRS: "Command Query Responsibility Segregation"

Fowler's post is predictably excellent:
[https://www.martinfowler.com/bliki/CQRS.html](https://www.martinfowler.com/bliki/CQRS.html)

~~~
pc86
This is already cited and linked in the article (along with Fowler's article
on event sourcing).

------
mamcx
CQRS & Event Sourcing are good ideas, but putting yourself into eventual
consistency ON PURPOSE is _nuts_.

Until you are facing Facebook-scale big-data challenges, you don't need the
madness of losing ACID. You can easily fit terabytes of data on a single
server, and even partition by company, customer, or similar in a way that
still lets you keep domain-level consistency.

Now, I put the events after my normal CRUD operations (i.e. I perform the CRUD
write and save the event in a single transaction). It's super easy to operate
and keeps the coding familiar and predictable.

----

I think the ideal ES database engine would work like this:

Have a log table, and an index/subtables for each validation (i.e. to check
uniques, aggregates, counts, etc.).

So, if I have customer-related events, I have:

- Index on: code, name

- SubTable: code, name, isactive

All the other fields are not needed for validation, so they are recovered from
the log.

In ACID:

- POST command

- Validate the data against the index

- Save to the log

- Emit blocking events (events that need to happen at the same time, right
after the save)

- Commit

Eventually:

- Emit lazy events (events that don't need ACID, like filling external
sources)
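
A rough sketch of the single-transaction CRUD + event idea (the schema and names are illustrative): the normalized row, whose unique index does the validation, and the event-log append either both happen or neither does.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (code TEXT PRIMARY KEY, name TEXT, isactive INTEGER);
    CREATE TABLE event_log (id INTEGER PRIMARY KEY, kind TEXT, payload TEXT);
""")

def create_customer(code: str, name: str) -> None:
    with db:  # one transaction: the PRIMARY KEY on `code` does the uniqueness check
        db.execute("INSERT INTO customer (code, name, isactive) VALUES (?, ?, 1)",
                   (code, name))
        db.execute("INSERT INTO event_log (kind, payload) VALUES (?, ?)",
                   ("CustomerCreated", json.dumps({"code": code, "name": name})))
```

The lazy, eventually consistent events can then be published by a separate process tailing `event_log` (the usual outbox pattern), without touching the ACID path.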

------
marknadal
I'm glad somebody else said it. It seems lately a new meme is coming around
(by many people independently) all saying:

Wait, event sourcing / immutable data doesn't scale.

I was doing event sourcing in 2010, and loved it. It was incredible. But by
the time I started hearing people call it "event sourcing" I had already moved
on to:

State-based, graph CRDTs.

They combine the best of event sourcing with the best of distributed state
replication, and are super scalable!

Now even the Internet Archive[1] is running it (in 2014 I implemented it into
a library that they are now using -
[https://github.com/amark/gun](https://github.com/amark/gun) )

[1]
[https://news.ycombinator.com/item?id=17685682](https://news.ycombinator.com/item?id=17685682)

------
ulcica
They didn't even tell me what event sourcing is.

~~~
LandR
As I understand it:

Consider you have a database where you store the account balance.

If you want to update the account balance you might update the row for that
customer, e.g.

tblAccounts
-----------
| AccountHolderId | AccountBalance |

update tblAccounts set AccountBalance = @NewAccountBalance
where AccountHolderId = @AccountHolderId;

In an event sourced database you wouldn't update the AccountBalance column.
You would store something like:

AccountEvents
-------------
| AccountHolderId | AccountBalance + 100.00 |
| AccountHolderId | AccountBalance + 150.00 |
| AccountHolderId | AccountBalance - 80.00 |

Then if you want to get the current balance you can just take the opening
balance, add 100, add 150, and subtract 80.

Periodically you need to collapse these, as querying for the balance could end
up requiring a scan through a large log of events. So you snapshot at some
point in time: assuming an opening balance of zero, we could snapshot the
above to 170.00.

It feels like you get auditing / logging of all changes out of the box. Also,
if you are working in functional programming, or a system where you constrain
side effects / mutability as much as possible, you sort of eliminate your DB
as a giant mutable object. But you also get the downsides this article talked
about.
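
The arithmetic above as a small sketch: the balance is a fold over the events, and a snapshot collapses the already-processed prefix into a new opening balance.

```python
events = [("credit", 100.00), ("credit", 150.00), ("debit", 80.00)]

def balance(opening: float, evts) -> float:
    """Fold the event list into a current balance."""
    for kind, amount in evts:
        opening = opening + amount if kind == "credit" else opening - amount
    return opening

def snapshot(opening: float, evts):
    """Collapse processed events into a new opening balance and an empty log."""
    return balance(opening, evts), []

print(balance(0.0, events))            # 170.0
opening, events = snapshot(0.0, events)
print(balance(opening, events))        # still 170.0, without replaying the log
```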

~~~
beiller
Account balance is a bad example, because it cannot be solved using event
sourcing. A key to event sourcing is eventual consistency. The account balance
cannot be eventually consistent, otherwise it will allow double-spending. It
has to be immediately consistent.

~~~
UK-Al05
Not if the event sourcing aggregate rehydrates on the request to spend, and
sends failure or success back.

------
trhway
Sounds like OLTP transactions and aggregated state only with hip names.

------
jackthebadcat
Guys, just need to say that this article is not mine!

------
ryanmarsh
_If we choose to build a business critical functionality around this eventual
consistency can have dire ramifications. There are use cases that availability
is the needed property of a system but there are also use cases where
consistency also is, where is better to not make a decision rather than making
it based on stale information._

I see this issue raised quite often. If consistency is paramount you can make
commands on certain aggregates synchronous all the way through to updating the
read model. There's nothing that says you MUST have a queue between the event
log for an aggregate and the logic that updates the read model. Use common
sense.

 _Typically shines the most when pinpointing parts of the system that benefit
from it, identifying a specific bounded context in DDD terms, but never on a
whole system._

I feel like Greg Young has taken great pains to make this clear. This should
be taken for granted when attempting CQRS.

 _Also your events will be based on a SomethingCreated or SomethingUpdated
which has no business value at all. If the events are being designing like
this then it is clear you’re not using DDD at all and you’re better of without
event sourcing. Finally, depending on the requirements on how the synchronous
the UI and the flow of the task is the eventual consistency can, and most of
the times will, have a klinky feel to it and deliver a poor user experience._

If the read and write model are being updated asynchronously from the UI
you're gonna have to adopt an optimistic caching scheme on the client. This is
why GraphQL subscriptions are pretty much boilerplate for any client I build
against a CQRS service. The Apollo client seems to handle this rather well.

 _Converting data between two different schemas while continuing the operation
of the system is a challenge when that system is expected to be always
available. Due to the very nature of software development new requirements are
bound to appear that will affect the schema of your events that is
inevitable._

I hereby give you permission to use the Strategy Pattern. Problem solved.
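
One way the Strategy Pattern applies here (a sketch of a common "upcaster" approach, not the author's design; all names are illustrative): register a conversion strategy per event version and upgrade old events as they are read, so the stored history never has to change.

```python
upcasters = {}

def upcaster(event_type: str, version: int):
    """Register a strategy that upgrades `event_type` payloads from `version`."""
    def register(fn):
        upcasters[(event_type, version)] = fn
        return fn
    return register

@upcaster("InvitationAccepted", 1)
def _v1_to_v2(payload: dict) -> dict:
    # Hypothetical change: v2 added an `accepted_via` field; old events get a default.
    return {**payload, "accepted_via": "unknown", "version": 2}

def upcast(event_type: str, payload: dict) -> dict:
    """Chain registered strategies until the payload is at the current version."""
    while (event_type, payload.get("version", 1)) in upcasters:
        payload = upcasters[(event_type, payload.get("version", 1))](payload)
    return payload
```

New schema versions just mean registering one more strategy; readers always see the latest shape.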

 _The events can’t be too small, neither too large they have to be just right.
Having the instinct to get it right requires an extensive knowledge of the
system, business and consumer applications, it’s very easy to choose the wrong
design._

Greg Young and others have talked quite a bit about how to bound aggregates.

 _However the events in a event store are immutable and can’t be deleted, to
undo an action means sending the command with the opposite action._

This is why bookkeeping systems have the idea of "journal entries". I haven't
implemented one for an event sourced system but I can see how this might work.

Overall great post. Really enjoyed that the author took the time to walk us
through all of these issues. Most are non-trivial.

