
Event Sourcing is Hard - goostavos
https://chriskiehl.com/article/event-sourcing-is-hard
======
spricket
I agree with the author completely. I worked on a fairly large system using
event sourcing, it was a never-ending nightmare. Maybe with better tooling
someday it will be usable, but not now.

Events are pretty much a database commit log. This is extremely space
inefficient to keep around, and not nearly as useful as you might think.

Re-runs need to happen pretty often as you change how events are handled. Even
in our local CI environment, it eventually took DAYS to re-run the events. It
was clear that the system would never survive production use for this reason
alone.

Decentralizing your data storage is a bad idea. We ended up not only with a
stupidly huge event log, but multiple copies of data floating around at each
service. Not fun to deal with changes to common objects like User. Sometimes
you would have to update the "projection" in 5-10 different projects.

In practice ES amounted to building our own (terrible) database on top of a
commit log that's actually a message queue. Worst technology I've worked with
in years; eventually the whole project collapsed under its own weight.

Some of these problems are fixable in theory. Perhaps a framework to manage
projection updates, something to prune old events by taking snapshots, a DB
migration-style tool to "fix up" mistakes that are otherwise immortalized in
the event stream. But right now, seriously stay away :)
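
To make the snapshot idea concrete, here's a minimal sketch (hypothetical
names, nothing from a real framework): persist the folded state together with
the index of the last event applied, then rebuild by replaying only the tail:

    # Hypothetical sketch: snapshots + tail replay avoid full re-runs.
    def apply(state, event):
        # One domain-specific fold step; here, a trivial counter.
        state = dict(state)
        state[event["key"]] = state.get(event["key"], 0) + event["delta"]
        return state

    def take_snapshot(events):
        state = {}
        for event in events:
            state = apply(state, event)
        return {"state": state, "last_index": len(events)}

    def rebuild(events, snapshot):
        # Replay only the events appended after the snapshot was taken.
        state = snapshot["state"]
        for event in events[snapshot["last_index"]:]:
            state = apply(state, event)
        return state

    events = [{"key": "orders", "delta": 1}, {"key": "orders", "delta": 2}]
    snap = take_snapshot(events)
    events.append({"key": "orders", "delta": 3})
    assert rebuild(events, snap) == {"orders": 6}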

~~~
bitcharmer
I agree that it's hard, but it's doable and pays off if you know what you're
doing. I worked on 3 successful implementations for the finance sector, and
we could replay a few million messages per second. Have a look at how we
achieved that at LMAX:
[https://martinfowler.com/articles/lmax.html](https://martinfowler.com/articles/lmax.html)

Sorry to say it, but clearly you must have been doing something wrong or
employing event sourcing where it does not belong.

~~~
kilburn
> LMAX's in-memory structures are persistent across input events, so if there
> is an error it's important to not leave that memory in an inconsistent
> state. However there's no automated rollback facility. As a consequence the
> LMAX team puts a lot of attention into ensuring the input events are fully
> valid before doing any mutation of the in-memory persistent state. They have
> found that testing is a key tool in flushing out these kinds of problems
> before going into production.

I'm sorry, but this is saying "catch your bugs before they reach production",
which just isn't feasible in non-critical software development (i.e., most
software development). The important part that is left out here is: what
happens when one such error slips in? How do you deal with it after the fact?

That being said, your system is impressive and I loved being able to read
about it. Please keep up the good work, and especially keep sharing your
findings! :)

~~~
kjeetgill
Not quite! I don't think that's what they're getting at.

The idea is this: say you have a record A with fields f1, f2, f3. When an
event comes in you run a function F with steps s1, s2, s3, each of which may
modify a field of record A.

Here's the issue: if s3 fails (due to "invalid input"), the modifications to A
from s1 and s2 are incorrect and A is now corrupt.

There are a bunch of ways to handle this but the one described here is to
avoid touching data that persists between requests until you're at a stage
where nothing can fail anymore.
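
A sketch of that style in Python (record and field names are just
illustrative): validate and compute everything into locals first, and only
touch the persistent record once no remaining step can fail:

    class InvalidEvent(Exception):
        pass

    def handle(record, event):
        # Phase 1: validate the raw input; nothing persistent is touched.
        if "amount" not in event:
            raise InvalidEvent("missing amount")
        amount = int(event["amount"])  # may raise: still safe here
        if record["f1"] + amount < 0:
            raise InvalidEvent("would go negative")

        # Phase 2: compute the new values into locals (the s1, s2, s3 work).
        f1 = record["f1"] + amount
        f2 = record["f2"] + 1
        f3 = max(record["f3"], amount)

        # Phase 3: commit. No step here can fail, so record A is never
        # left half-mutated by a failure between s1 and s3.
        record.update(f1=f1, f2=f2, f3=f3)

    record_a = {"f1": 10, "f2": 0, "f3": 0}
    handle(record_a, {"amount": "5"})
    assert record_a == {"f1": 15, "f2": 1, "f3": 5}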

~~~
kilburn
> until you're at a stage where nothing can fail anymore.

... and then there's a NullPointerException because you forgot to check
something that could indeed fail (i.e.: you have a bug).

In other words: they advise you to not have bugs in that part of the codebase,
which was precisely my objection.

~~~
kjeetgill
Absolutely. But doing things in this style protects you from large classes of
especially hard-to-reproduce bugs. Nothing's perfect, but it helps a lot!

I'd never heard it articulated before but I personally discovered this style
over the years as well.

------
manigandham
Event Sourcing = everything that happens is an event. Save all the events and
you can always get to the latest state, as well as what things look like at
any time in the past. What an "event" is depends on your business domain and
the granularity of processing. It's very common in enterprise apps with
complex workflows (like payment processing or manufacturing). Good fit for
functional programming techniques and makes business logic easy to reason
about since everything is in reaction to some event, and usually emits another
event.
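
In code, the core is just a left fold over the log; a minimal sketch
(assuming a toy account domain): current state is the fold over all events,
and state at any past time is the same fold over a prefix:

    from functools import reduce

    def apply(balance, event):
        # One fold step; every state transition is a reaction to an event.
        if event["type"] == "deposited":
            return balance + event["amount"]
        if event["type"] == "withdrew":
            return balance - event["amount"]
        return balance

    def state_at(events, as_of):
        # Replaying a prefix of the log gives the state at any past moment.
        return reduce(apply, (e for e in events if e["t"] <= as_of), 0)

    events = [
        {"t": 1, "type": "deposited", "amount": 100},
        {"t": 2, "type": "withdrew", "amount": 30},
        {"t": 3, "type": "deposited", "amount": 5},
    ]
    assert state_at(events, as_of=2) == 70  # historical state
    assert state_at(events, as_of=3) == 75  # latest state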

With global ordering, you can have a single transaction ID that points to a
snapshot of the entire system. The actual communication is usually handled by
"service bus" style messaging like RabbitMQ that can support ordering,
routing, acknowledgements, and retries. Kafka or an RDBMS can also be used,
but these require frameworks on top.

This concept is used pretty much everywhere. Redux is event sourcing front-end
state for React. Most database replication is event sourcing by using a write-
ahead log as the stream of mutations to apply. All that being said, I
completely agree with this article that it's the wrong solution for most
cases and creates far more problems and limitations than it solves.

~~~
jsmeaton
What about when event structures change? Now you’re having to push versions
into your events and keep every version of your serialisation format.

Redux often does not have to keep track of versions, because the event stream
is consistent for that session.

~~~
manigandham
The most common approach is to add versions to the events. The good thing is
that with event sourcing, the exact cutover and lifetimes of these schema
versions can be known (and even recorded as events themselves).

Downstream apps and consumers that don't need to be compatible with the entire
timeline can then migrate code over time and only deal with the latest
version. You have to deal with schemas anytime you have distributed
communications anyway, but event sourcing provides a framework for explicit
communication and this is one area where it can make things easier.
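
One common shape for this, as a sketch (event fields invented for
illustration): a chain of "upcasters" run on read, so consumers only ever
handle the latest schema version:

    # Hypothetical upcaster chain: consumers only ever see version 2.
    def upcast_v1_to_v2(event):
        # Say v2 split "name" into given/family; old events get defaults.
        given, _, family = event["name"].partition(" ")
        return {**event, "version": 2, "given_name": given,
                "family_name": family}

    UPCASTERS = {1: upcast_v1_to_v2}

    def to_latest(event, latest=2):
        while event["version"] < latest:
            event = UPCASTERS[event["version"]](event)
        return event

    old = {"version": 1, "type": "user_registered", "name": "Ada Lovelace"}
    assert to_latest(old)["given_name"] == "Ada"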

~~~
tumetab1
For me an issue, not usually made explicit, is that those benefits seem to
require a pretty stable business and a stable application landscape.

Why? Because a changing business requires changes to the Events. Since a
change in an Event requires all consumers to be updated, immediately or when
the old schema version is deprecated, the cost of change seems to increase
faster than in an application landscape without event sourcing. At the same
time, the cost of knowing who needs to be changed also grows, since "any"
application can consume any Event.

A stable application landscape also seems to be required, because if the
number of consumers of an Event grows quickly, the effort to update and
deprecate schemas grows with the number of Event consumers (all of which
require updates).

~~~
taude
If your org is anything like mine, things (data) are mostly "additive" onto
the existing structure. When you want to deprecate something, you can notify
all the consumers like a third party would if you were going to change
something significantly. But the latter happens much more rarely for us,
though it tends to leave traces of technical debt...

------
fagnerbrack
First we need to agree on what "Event Sourcing" means. In my view you don't
need to implement every single Event Sourcing pattern to have an "Event
Sourced" system. Say you have a TODO List app (yes, pretty cliche but that's
ok). In that TODO list you have a <form> in which you post the state of the
TODO list to the server. That state is stored in the database in the form of
an event "TODO_LIST_SAVED". When you want to "replay" back, just list all the
events from the DB in chronological order while filtering the ones the user
has access and pick the last, then rebuild the HTML using that one event
converted into the view model.

Kaboom, you have an event sourced system that doesn't even use a queue.
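
That design, roughly, as a sketch (SQLite standing in for "the database"):

    import json, sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (seq INTEGER PRIMARY KEY, user_id TEXT,"
               " type TEXT, payload TEXT)")

    def save_list(user_id, todos):
        # The <form> post lands here; the whole state is one event.
        db.execute("INSERT INTO events (user_id, type, payload)"
                   " VALUES (?, ?, ?)",
                   (user_id, "TODO_LIST_SAVED", json.dumps(todos)))

    def current_list(user_id):
        # "Replay": events in chronological order, filtered to this user,
        # then pick the last one and build the view model from it.
        row = db.execute("SELECT payload FROM events WHERE user_id = ?"
                         " AND type = 'TODO_LIST_SAVED'"
                         " ORDER BY seq DESC LIMIT 1", (user_id,)).fetchone()
        return json.loads(row[0]) if row else []

    save_list("u1", ["buy milk"])
    save_list("u1", ["buy milk", "write docs"])
    assert current_list("u1") == ["buy milk", "write docs"]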

The problem with trashing the idea is that people have a bad experience with
it, either by over-engineering and trying to apply all the solutions with
tools or hand-made implementations instead of using a Lean approach, or
storing events in a type of business model that doesn't even require a
database.

"Event Sourcing is Hard" is a statement as true as "web development" is hard,
or "distributed systems" is hard, or "API" is hard, "eventual consistency" is
hard... yet, we build those things every day doing the best we can. In fact,
anything, any practice, any technique, any architecture can be hard because
software is hard. Even harder is to not over-engineer something that can be
very simple.

Simplicity is hard.

~~~
jacques_chester
Kafka-oriented streaming folks talk about stream-table duality; the idea that
one form can be expressed as the other. There is usually a little lip service
paid to this idea before heavy hints are dropped that _actually_, the
_stream_ is the _true_ reality.

My own view is that there are dimensions for any data of interest, expressing
some ability to show an evolution of it. Frequently that dimension is time, or
can be mapped onto time.

But neither the stream nor the table is the truest representation. The truest
representation is whatever representation makes sense for the problem.
Sometimes, I want to clone a git repo. Sometimes I want to see a diff.
Sometimes I want to query a table. Sometimes I want a change data capture
stream. Sometimes I want to upload a file. Sometimes I want a websocket
sending clicks. Sometimes you need a freight truck. Sometimes you need a
conveyor belt. Sometimes a photo. Sometimes a movie.

Sometimes I talk about space vs time using the equations for acceleration, or
for velocity, or for distance. These are all reachable from each other via the
"duality" of calculus, but none of them is the One Truest Formula.

And so it is for data. The representation that makes the most sense for that
domain under those constraints is the one that is "truest".

~~~
justjico
I believe this is where CQRS plays nice with “event sourcing”. You write all
your events in one model, but can read them in multiple ones; that is, if you
can tolerate some read latency... and most systems are usually ok with that.

~~~
bonesss
CQRS also alleviates a lot of the pain people experience around event sourcing
and distributed systems with event evolution and manipulation. There are many
cases where it's inappropriate for consuming services to be aware of the
internal representation of events. Giving them a 'snapshot-centric' view of
the data can be a simplification in both directions.
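
A toy sketch of that split (all names invented): one append-only write
model, plus a read-side projection that catches up at its own pace, which is
where the tolerable read latency above comes from:

    import collections

    event_log = []  # write model: append-only, the source of truth
    comments_by_post = collections.defaultdict(list)  # read model

    def write(event):
        event_log.append(event)  # the only way state ever changes

    def project():
        # The read side rebuilds (or catches up) independently of writes;
        # between write() and project() the view is stale, not wrong.
        comments_by_post.clear()
        for event in event_log:
            if event["type"] == "comment_added":
                comments_by_post[event["post_id"]].append(event["text"])

    write({"type": "comment_added", "post_id": "p1", "text": "first!"})
    project()  # read model now reflects the write
    assert comments_by_post["p1"] == ["first!"]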

------
AndrewKemendo
The great thing about HN is that it consistently shoves into my face how many,
seemingly common, dev tools or frameworks etc... that I've never heard of.

Event sourcing isn't anything I've ever heard of, let alone something for
which broad marketing promises need debunking.

How common is this framework/architecture/product? Google isn't helping me
determine how widely used it is.

~~~
6t6t6t6
Event sourcing is not a framework, but a concept.

The idea is to store not the current state of your app, but the transitions
(events) that produce the current state.

Think about how git stores your source code as a series of commits.

In theory it is a beautiful idea; in the real world, it is hard to implement.

~~~
ploxiln
In fact, git stores a full snapshot of your entire repo with every commit. It
does not store diffs from the previous commit. When you do "git show
<COMMIT_SHA>" it generates a diff from the parent commit on the fly.

There's a huge optimization though: it uses a content-addressed blob store,
where everything is referenced by the sha1 of its contents. So if a file's
contents are exactly the same between two commits, both end up using the same
blob. They don't have to be sequential commits; it could even be two
different file paths in the same commit. Git doesn't care - it's a "dumb
content tracker". If one character of a file is different, git stores a whole
separate copy of the file for it. But every once in a while it packs all blobs
into a single file, and compresses the whole thing at once, and the
compression can take advantage of blobs which are very similar.
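
The content addressing is easy to see from the outside: a blob's id is the
SHA-1 of a short header plus the file contents, so identical contents always
map to the same object:

    import hashlib

    def git_blob_sha(content: bytes) -> str:
        # Git hashes "blob <size>\0" + contents; identical files share a blob.
        header = b"blob %d\x00" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    # Same value as `echo 'hello' | git hash-object --stdin`
    assert git_blob_sha(b"hello\n") == \
        "ce013625030ba8dba906f756967f9e9ca394464a"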

~~~
bonesss
Git's bi-modal nature is a wonderful representation of a sanely architected
Event Sourced system. When needed, it can create a delta-by-delta view of the
world for processing, but for most things, most of the time, it shows a file-
centric view of the world.

IMO a well-factored event sourced system isn't going to feel 'event sourced'
for most integrated components, APIs, and services because it's working
predominantly with snapshots and materialized views. For complex process
logic, or post-facto analysis, the event core keeps all the changes available
for processing.

Done right it should feel like a massive win/win. Done wrong and it's going to
feel hard in all the wrong places :)

------
ilikehurdles
Datomic is, at its core, an event sourced datastore, and it works really well.
I don’t think event sourcing is something that should be solved in application
space — you wouldn't write a database from scratch for your products, yet
implementing an in-house event sourcing system is for some reason more
acceptable.

All of these stories of failure are stories of teams bandaiding event sourcing
on top of other databases not optimized for the usecase.

~~~
willvarfar
I was thinking about Datomic when I read this, and yet came to exactly the
_opposite_ conclusion :)

Datomic may be something like 'event sourcing' internally, but it works so
well because it abstracts and _hides_ the naked overreach that the article
pounds against.

Datomic users think of it as a DB, not a stream of events. So I don't think
that Datomic users are following the 'event sourcing' model.

~~~
vemv
Fair point... but events in the style of "User address changed" are very
repetitive, wasteful, fragile, etc.

So no longer seeing user-address-change as (primarily) an event is a win. One
can always fall back later to perceiving user-address-change as an event.

For "real" events (e.g. "Transaction fraud detected") possibly I wouldn't use
Datomic at all even if it was my primary DB. There are more suitable pub-sub
systems.

And that results in a neat separation of concerns: data changes in one store,
actual business-valuable events in the other.

------
the_duke
Good summary of the drawbacks of ES.

I think one thing cannot be repeated often enough:

Event sourcing can be an incredibly valuable approach, but almost always only
for a very SMALL SUBSET of your system.

Most of the problems with ES materialize from trying to build your whole
architecture around it.

I think many developers fall into this trap because theoretically the concept
sounds so appealing and elegant (immutable, reproducible, modular, ...).

~~~
linkmotif
Event sourcing is REALLY hard to figure out how to do “right.” A lot of
getting it right is modeling knowledge/experience, understanding your domain.

That said, you can succeed at building your entire arch around it and once you
do, it’s glorious. Kafka Streams makes the technical aspects easy once you
figure out how to model correctly.

~~~
the_duke
Out of curiosity, how do you deal with consistency guarantees across
aggregates? (which is much more relevant when your whole architecture is ES)

I realize this is highly domain dependent. Some will be much less affected
than others. But it's another drawback not mentioned, because now you start to
need sagas/managers that coordinate across services with commit/rollback
patterns, conflict resolution, etc.

~~~
linkmotif
As zenpsycho said, all you get is eventual consistency across aggregates, if
you’re talking about projection aggregates. As you say, for domain aggregates
you can wire up transactions by writing your own 2PC on top of Kafka's
exactly-once semantics.

I would recommend it only to people who are really committed to the idea or
know what they are doing and modeling. It took me a long time because I went
from zero knowledge of Java/DDD. First I had to learn Java, then Kafka, then
KS, but still I was lost. Learning the most basic DDD was enough for me and did
the trick though. Writing a 2PC to coordinate across aggregates wasn’t
pleasant but also wasn’t hard with KS. The hard part was learning all the
other stuff and the modeling.

I think a well-done ES framework based on Kafka Streams would maybe be the
first ES framework to have a chance. The primitives in KS seem just
right.

~~~
barbecue_sauce
DDD in this context = Domain Driven Design?

~~~
linkmotif
Yes.

------
fma
I just spent the last 2.5 years replacing a somewhat complex legacy system.
It's a lot of Spring Integration, JMS Queues in between (so technically,
'events') and a traditional relational DB. The system was deployed 6 months
after conception, followed by piecemeal migration and feature additions to
support a full decommissioning of the legacy system.

It runs very well and the business is happy. However, I feel I need to
convert it to Event Sourcing/CQRS not because of any technology constraint or
business requirement... but because if I don't have buzzwords on my resume, I
won't be marketable.

Ideally, systems are created to meet current and future business needs within
time and budget constraints. In reality, I also need to maintain
marketability. JMS queues and 2-phase commits... old school!

~~~
twic
> It's a lot of Spring Integration, JMS Queues in between (so technically,
> 'events') and a traditional relational DB.

Ah, I wonder if you're working on the project I worked on at my previous job
...

> It runs very well and the business is happy.

Apparently not.

------
EGreg
At Qbix, early on we made the decision to do replication via logs of events,
which we call “messages”. This was at a time when Parse and Firebase were all
the rage and collaboration was done by things like Operational Transformations
and various diffing.

We figured that a social activity would be best represented as a linearly
ordered set of messages, interpreted by various actors (and front end
components) depending on the type of activity.

The thing is, you usually don’t want global ordering of events across all
activities. That introduces problems that Google solves with Spanner and that
CockroachDB works hard to solve. Global Byzantine Fault Tolerant Consensus is
even harder to achieve with any scalability, as can be plainly seen from
Bitcoin and Ethereum. You can do it if you trust most of the nodes (like
Ripple) but otherwise it’s infeasible.

But it’s also overkill. You need vector clocks locally for activities only.
Like a chess game or a chat. You don’t NEED to know which message came first
across chats, or unrelated transactions across countries. It’s a bit like
quantum entanglement — only if actors' activities and transactions start being
entangled do you need to start caring about conflicts. And it’s a bit like
relativity — if events happen far enough apart then it doesn’t matter which
happened “first”, it will depend on the location of the observer.

So anyway... the primitive for us is the Group Activity, and they can
reference each other, forming Merkle Trees and DAGs if needed, just like in
Git and so on.

For more info see qbix.com/platform/guide/streams

------
garyclarke27
Interesting - sounds to me like you were reinventing a database. PostgreSQL
has taken over 30 years, has 400 contributors and 1.1 million LOC; no
surprise to me it was a bit tricky! PG has a rock-solid, ACID-enabling
transaction log - the Event Source - and you can access this easily via
logical decoding functions. You can easily replicate this data into tables if
you need to keep it; you can also add system and application timestamps to
get a bi-temporal db, enabling queries that go back in time. You mentioned
the impedance mismatch with the GUI; relational databases are famous for the
same problem, and fantastic tools like PostgREST and recently GraphQL
integration have already been built to address this.
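
For the curious, reading that log really is just a couple of calls; a sketch
using psycopg2 and the built-in test_decoding plugin (assumes
wal_level=logical, replication privileges, and a database named "app"):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # assumed connection string
    conn.autocommit = True
    cur = conn.cursor()

    # Create a logical replication slot backed by test_decoding.
    cur.execute("SELECT pg_create_logical_replication_slot"
                "('es_demo', 'test_decoding')")

    cur.execute("CREATE TABLE IF NOT EXISTS users (id int, name text)")
    cur.execute("INSERT INTO users VALUES (1, 'ada')")

    # Each change arrives as (lsn, xid, text), e.g.
    #   table public.users: INSERT: id[integer]:1 name[text]:'ada'
    cur.execute("SELECT * FROM pg_logical_slot_get_changes"
                "('es_demo', NULL, NULL)")
    for lsn, xid, change in cur.fetchall():
        print(lsn, change)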

~~~
Lapsa
just wanted to say that I don't like your comment. too shallow, wrong
direction - something along those lines

------
antofar
I've worked on a couple of ES-based trading systems (matching engines and
algo trading both) and it was definitely the way to go, especially on the
equity side where you can restart or snapshot the system every day. In my
startup, I've designed the core transactional system this way and it's been
serving us well for 2 years now. It comes with its own challenges and you
need more senior devs than for your average CRUD system, hence you use it
only for parts which are better specified and less subject to change. As with
everything, it's a trade-off.

~~~
healsjnr1
I think this is the kind of pragmatism that is needed when talking about
event sourcing. It is a _very_ advanced architectural pattern. There is no
easy 'Event Sourcing made simple' way to use it.

I think one of the reasons for this is that the systems people often describe
when they talk event sourcing are actually _three_ different, but
interrelated, architectural patterns:

- event sourcing (build your models based on immutable facts)
- event driven (side effects triggered by messages, often delivered by queues)
- workflow / state machine

It takes a long time to get these concepts straight. In our case once it was
untangled, our framework can quite clearly demonstrate how they relate:
messages update the state in a workflow engine, this triggers side effects,
the results are captured as facts and used to build the model, repeat.
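
Roughly, in pseudo-Python (all names invented, not our actual framework):

    def workflow_step(state, message, book_shipment):
        # 1. A message advances the workflow state machine.
        if state == "awaiting_payment" and message == "payment_received":
            # 2. The transition triggers a side effect...
            ref = book_shipment()  # e.g. call the shipping service
            # 3. ...and the outcome is captured as an immutable fact.
            return "awaiting_shipment", {"type": "shipment_booked",
                                         "ref": ref}
        return state, None

    facts = []
    state, fact = workflow_step("awaiting_payment", "payment_received",
                                book_shipment=lambda: "ship-42")
    if fact:
        facts.append(fact)  # the facts are what rebuild the model on replay
    assert state == "awaiting_shipment" and facts[0]["ref"] == "ship-42"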

This worked for our use case, in which our transactions have a strong
workflow, it may not work in other cases.

Finally the one point I'd certainly reinforce from the article is: _don't
reach directly into the event stream_. This causes huge amounts of coupling.
Instead, we ended up using bounded contexts to define our systems, and then
treating key events as our API. It sounds counter to some of the ideals of
event sourcing, but it is absolutely needed once you grow past the toy phase.

~~~
antofar
Very good points. Clearly defined bounded contexts are the way to go, as is
having a clear distinction between internally and externally visible domain
events. Other tricky points to get right are how to distribute events and the
ordering relationship (as in stream ordered, totally ordered, entity ordered)
based on the scalability and "correctness" requirements. For instance, in
algo-trading it is not uncommon to have totally ordered events throughout the
system.

------
nicodjimenez
Event sourcing is still the best paradigm for exposing valuable service data
to an undefined number of services downstream, especially when there’s enough
data to make querying painful for some use cases. Event streams are like a
NoSQL buffer between services and databases. Not always necessary but
sometimes useful.

Event sourcing brings its own complexities (eg Kafka clients) but it’s still
better than having one huge shared database or even RabbitMQ fanout exchanges.
In the best cases, the developer experience of Kafka can be very good and
comfortable. You can write short python scripts that ingest data and dump to
other databases and services; it's very nice. In some cases you want services
writing directly to databases but sometimes you don’t.

~~~
dm3
This has nothing to do with event sourcing. What you're describing is event-
driven integration. The events used to source the domain model should rarely
be the same as the ones used for messaging integration. Unfortunately, due to
the repeated use of the term "event", the two purposes get confused most of
the time. The "Data on the Outside versus Data on the Inside"[0] paper makes a
good distinction even though it doesn't use the same terms.

[0]:
[http://cidrdb.org/cidr2005/papers/P12.pdf](http://cidrdb.org/cidr2005/papers/P12.pdf)

~~~
nicodjimenez
This is a good point (about how messaging integration differs). I think the
choice comes down to whether at-least-once or exactly-once delivery makes
more sense for a specific unit of shared data / functionality. For exactly-
once, APIs and RabbitMQ shine for connecting services.

I’m still wrapping my head around when to do what in SOA. Thanks for the link
btw.

------
preommr
Never heard of event sourcing before, but I've used this pattern myself (as
another comment mentioned, when you need it, it's a fairly obvious way to do
it). Looking it up, I am guessing it's from Martin Fowler's book about
patterns and architecture design? And on a related note, would you recommend
the book to the average developer?

------
jonahx
This is a question more than a comment, as I have only casual knowledge of
event sourcing...

"""

You wouldn't let two separate services reach directly into each other's data
storage when not event sourcing – you'd pump them through a layer of
abstraction to avoid breaking every consumer of your service when it needs to
change its data

"""

Isn't the event itself precisely that layer of abstraction? That is, you're
not publishing the details of your data store. You're publishing an event
which is a thin slice or crafted combination of details that ultimately reside
in that store, but which you are hiding...

Am I misunderstanding the quote?

~~~
voiceofunreason
> Am I misunderstanding the quote?

I don't think you are misunderstanding the quote, I think you are
misunderstanding the nature of the problem.

If you tip your head sideways, you may notice that the persisted
representation of your model is "just" a message, from the past to the future.
It might describe a sequence of patches, or it might be a snapshot of
rows/columns/relations. But it is still a message.

The trick that makes managing changes to this message schema easy is that you
own the schema, the sender, and the receiver. So coordinating changes is
"easy" -- you just need to migrate all of the information that you have into
its new representation.

If the schema is stable, the risk of coupling additional consumers to the
schema is relatively small. Think HTTP -- we've been pushing out new clients
and servers for years, but they are still interoperable, because the schema
has only changed in quite safe ways.

But if the schema _isn't_ stable, then all bets are off.

Because of concerns of scale/speed, we normally can't lock all of our
information at once. Instead, we carve up little islands of information that
can be locked individually. The schema that we use are often implicitly
coupled to our arrangement of these islands, which means that if we need to
change the boundaries later, we often need to change schema, and that ripples.

And all of this is happening in an environment where businesses expect to
change, and there is competitive advantage in being able to change quickly. So
it turns out to be really important that we can easily understand how many
modules are going to need to be modified to respond to the needs of the
business, and to ensure as often as possible that the sizes of the changes to
be made are commensurate with the benefits we hope to accrue.

------
erikpukinskis
I’ve been using event sourcing as an exclusive data store for a variety of toy
modules, which are slowly becoming something large... I am using all bespoke
infrastructure, both for managing the logs and for building UI.

I don’t think what the author did sounds like event sourcing, as I think of
it. His setup sounds more like pubsub. In all honesty I’m probably the one
doing it wrong though.

My event stream isn’t typically consumed or listened to by arbitrary
listeners. (Although it can be, in some rare cases) Each event is namespaced
to a specific module. When you load a log, you provide singletons for each
module. Each message is only received by one singleton. If you want to
broadcast that message to another module you have to do that from the parent
module.

Everything is totally imperative so there’s no ambiguity about what causes
what.

If I need data on a different machine, or in the browser, I just send the log
down as a procedure and run it with fresh singletons on the client.

I also don’t have one mega stream, I only mix modules that actually interact.
If I have two pieces of data that aren’t connected, they each have their own
log.

I don’t know, maybe I’m the one who’s not really doing event sourcing.

Like I said, I’m still somewhat at the toy stage, so maybe I will get to where
OP is eventually. I do plan on doing log rewriting for compression. That might
lead to some pain.

I also haven’t yet had to deal with sharding. I suspect there’s pain there.

My plan is to use the one-singleton/one-module per message namespace rule to
keep things from getting too complex. I figure if the same module that
consumes messages is responsible for rewriting them, maybe it won’t be too
weird. We’ll see. I have a lot more prototyping to do.

If anyone is interested, the core module is “a-wild-universe-appeared” on NPM.
It’s still in the 0.x.0 series so it could break. But the basic API is pretty
simple and hasn’t changed much in a few months even though I’ve been using it
regularly.

[https://www.npmjs.com/package/a-wild-universe-
appeared](https://www.npmjs.com/package/a-wild-universe-appeared)

I agree with the OP that it’s perhaps too soon to throw this kind of thing at
production problems. We need more research on how stores like this integrate
with other layers.

------
bruth
Event sourcing is often described in the context of a rich domain model where
every event has slightly different semantics with respect to the domain. As a
result, there is often a need to adapt these event types over time as the
domain changes. This could involve revising an existing event type (which is
fine for adding new attributes for capturing more information) or creating a
_new_ event type to model something that has changed in the domain. If you simply
incorrectly modeled the events for the domain, then you will have the same
challenges as migrating a poorly designed database schema.

The arguments that Git and Datomic are both event sourced systems are good
examples of successful application of this pattern. However, these are poor
examples when it comes to event sourcing where the events are "domain events".
In both Git and Datomic, the data model of the _event_ is pre-defined. With
Git you have a changeset and with Datomic you have datoms, both of which are
composable by design and where every changeset and datom is equivalent (from a
structural standpoint).

Applying event sourcing to an arbitrary domain model means every event is both
semantically and possibly structurally different. That is, how an event of
type X affects the state compared to an event of type Y is different, whereas
applying two different datoms or changesets in Datomic or Git, respectively,
change the state in the same way.

So I think there are two fundamental challenges faced when using event
sourcing with a domain model. First is that the type and structure of events
need to adapt to the domain (business, organization, etc). Datoms and
changesets never change; they are fixed and therefore don't have that
challenge. Second, and related, is that as new events are introduced or
existing ones are adapted, the code that processes those events is
unique as well (not just the model of the event itself). This challenge is
exacerbated if there are multiple downstream consumers of this event stream
where now there is "coupling" on the event/data side of things and likely
semantic coupling as well.

Again, this is not something you run into with Datomic or Git, simply because
the _events_ in those systems (the fundamental unit being left-folded) have a
fixed data model and semantics.

------
kccqzy
Some contrarian opinion:
[https://vvvvalvalval.github.io/posts/2018-11-12-datomic-
even...](https://vvvvalvalval.github.io/posts/2018-11-12-datomic-event-
sourcing-without-the-hassle.html)

The basic idea is to make the event log dumb. Instead of capturing business
logic, make the entries as dumb as SQL commit logs: this row was inserted at
this time, that row was deleted at that time. Sort of like a glorified WAL
file. Benefits of this scheme? Querying old database states and auditing.

------
agentultra
I've never read or heard that event sourcing was supposed to be all sunshine
and rainbows.

Microsoft wrote about their adventures going down this path [0]. It's well
worth the read if you're considering it for your project.

For a few features in my company's current platform we use event sourcing. We
collect plenty of small data points over time that are aggregated into rows to
provide users summary information of reams of operational data. We tried
aggregating the data in queries and despite our best efforts to optimize our
indices, tables, and queries there wasn't any way to compute it in any
reasonable amount of time.

The pain points for us:

Our UI team wasn't in sync with how data flows in an event-based system.
There's a lot of friction there. We're slowly updating the team on task-
oriented user experiences and breaking down our UI components to transmit
their commands directly. For now a lot of our control plane has to break up
the huge form data we receive into commands and send back partial responses to
the client. As we've moved forward and found better UI patterns, this has improved.

The control plane controls some data models that are not event-sourced and are
mutable. Our users tend to expect to be able to rename and delete objects in
the system that our event-sourced models refer to. It caused some confusion
when certain views in the application that are built from our projections
wouldn't see the updated name of the object they had just renamed. And so we
ended up doing the event-sourcing no-no of emitting CRUD events so that our
projections could appear in the manner our users expected. This is partly because
of the aforementioned problems with the UI team but is also a problem with
event-sourced models referring to data that can mutate over time.

However it hasn't been a hellish experience either. I took the liberty of
developing some models of our event-sourced infrastructure and features in
TLA+. This has been helpful to ensure that certain properties of the system
under development would hold: consistency, availability, etc. You may not be
willing to go down the path of learning TLA+ but the key take-away there was
that a little planning goes a long way with a project like this: simple unit
tests and whiteboard diagrams are not going to cover all of the things that
can go wrong in an event-sourced system. If anything it might convince you to
keep it simple, limited, and constrained as we did.

 _edit: forgot link_

[0] [https://www.microsoft.com/en-
ca/download/details.aspx?id=347...](https://www.microsoft.com/en-
ca/download/details.aspx?id=34774)

------
jpz
It is hard to separate DDD, Event Sourcing/CQRS - they seem to all be joined
up concepts promoted by a small circle of people.

I worked on a blockbuster project financed by a local billionaire in a Gulf
State where the entire shebang was “mandated” by the CTO and a well-known
Scala consultancy. Event sourcing, CQRS, with DDD to define architecture.

Let me describe one simple issue that was almost intractably complex - user
sign up.

We had one service which was authentication, and another which was user
preferences.

The front end sends a create new user _command_ (set up a new user in the auth
system with a password, email, etc), we also needs now to send some kind of
message to initialise the user info records (e.g. home address, telephone,
language preference, etc.)

We need to implement a saga for this, e.g. a process coordinator. Or maybe
the authentication system should, on the first request where the data is not
populated, fill in a blank, default record?

We genuinely achieved organisational paralysis over this, as there was no
clear emergent right way to do this, and a bunch of ways that really smelled.

Additionally, from a code organisation perspective, we had a repo
that had the central commands and events defined as code, which every service
linked to. So we had a huge central dependency which had a high velocity of
change.

What was meant to be a distributed, loosely connected system was in fact the
most coupled system I’ve ever worked on.

Architecture is about clear communication, and the developers were simply
perplexed. We were bringing in 10 developers a month, and communicating the
architecture was impossible. Lofty ideals were espoused.

The first basic rule of delivery in a software engineering project, KISS, was
universally ignored, in favour of using “sexy” architecture.

Fittingly, the CEO was sacked first, after 18 months at the helm; a few
months later went the CTO and 95% of the dev staff. The year-long coding
effort was turfed. Estimates are that $50m-$100m was burnt.

The winner of course was the DDD consultant that mandated everything, whose
word was seen as law, who didn’t actually seem to have very much pragmatic,
practical experience - he was getting paid $2k/day, and found it difficult to
listen to any idea which watered down his architecture in any way - I think he
would have eked out enough cash from this shitshow to purchase a small dwelling.

The biggest losers were the hundreds of staff that had relocated to the region
sometimes with families, that had made genuine plans to be in the region for
years, that would have all had their visas cancelled.

I think the entire thing is a con - my experience was in ideologues and
purists pushing it and making themselves niche consultancy/speaking/writing
careers in it.

Now this is just one anecdote - but it really is an architectural
style that can totally wreck a development effort or even a company - people are
ideological about it - it is not low-risk, and the entire Saga/Process
Coordinator stuff is just a tack-on to try to (unsuccessfully in my opinion)
answer things that simply don’t work properly. The message I mean to
communicate is that the async-everywhere nature of CQRS/ES is super complex
where coordination is required.

If you are going for loosely connected services, I much prefer the
microservices/RPC architecture - such as Netflix's - as the synchronous model
with distributed load balancing is a lot more sound.

~~~
bigbluedots
I might be misinterpreting your example, but wouldn't you have both the auth
service and the preferences service listening for a 'new user' event (which
would contain all the details needed by both systems), and both of them acting
on it? Of course it would get more complex to handle error conditions, e.g.
what do you do if validation on the preferences fails...

~~~
jpz
There are commands and events in CQRS/ES - where you need commands and
success reported from multiple subsystems for a front end interaction, you
now need a transaction coordinator for your “saga”.

Typically the front end needed to get a success message to say user created,
or user creation rejected (e.g. backend checking on valid postcode, valid
username, etc - rejections which would come from two separate services.)

~~~
ouro
I am completely confused by your description ...

I have also not used a 2pc transaction coordinator in well over a decade.

Also, you put the term "saga" in quotes, which is apt, in that what people
are often doing here is not in fact a saga (which actually has a pattern
description); it is usually a process manager, which is a different pattern.

~~~
jpz
If you’re confused, welcome to how everyone felt. The nuances and abstract
definitions made for confusion for the staff. I’ve forgotten what I knew about
it, you may well be right in your definitions, however what was clear was that
we needed to implement asynchronous pub/sub comms on the Kafka queue to get
this work. What might have been simple with an RPC (e.g. a POST/PUT) was
turned into a system which needed to track the state of its asynchronous RPCs,
and listen to the queue for responses.

All of this was for no real purpose, and as I said the proof was in the
pudding - tens of millions of dollars wasted, hundreds of staff hired and then
fired. The domain itself was not complex (e-commerce) but the implementation
was ludicrous.

There were basic delivery problems in the project as there was a complete lack
of keeping things simple, and massive overengineering.

------
pgwhalen
I'm of the mind that the usefulness of event sourcing as an architecture is
directly correlated with how easy it is to determine what an event is.

I'm also of the mind that most challenges with event sourcing are ultimately
tooling problems, and that we will "get there" eventually, for some fairly
pleasurable definition of "there."

------
fetbaffe
It is worrying that a central figure of Event Sourcing & CQRS like Greg Young
reduces the "framework" to a function, a pattern match & a left fold.

Linked in the article
[https://youtu.be/LDW0QWie21s?t=1926](https://youtu.be/LDW0QWie21s?t=1926)
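
For reference, the reduction he makes really is expressible as, in Python
(3.10+), a sketch:

    from functools import reduce

    def evolve(state, event):
        # "a pattern match": each event type has its own transition.
        match event:
            case {"type": "item_added", "sku": sku}:
                return state | {sku: state.get(sku, 0) + 1}
            case {"type": "item_removed", "sku": sku}:
                return state | {sku: state.get(sku, 0) - 1}
            case _:
                return state

    # "a left fold": current state is just the events folded over evolve.
    events = [{"type": "item_added", "sku": "a"},
              {"type": "item_added", "sku": "a"},
              {"type": "item_removed", "sku": "a"}]
    assert reduce(evolve, events, {}) == {"a": 1}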

~~~
detaro
Why do you find that worrying?

I interpreted that section as "the core bits are easy, and the frameworks
people have built don't really help you with the non-easy problems that come
later, so they provide little value", which (assuming the statement about the
frameworks is true) seems like a reasonable position?

~~~
fetbaffe
It is not how I interpreted it. What I heard was a salesman pitching an
expensive product as a bargain.

Even though frameworks have lots of drawbacks, I think they solve one problem
really well: they give you, the team, a direction.

Doing Event Sourcing & CQRS correctly takes years of experience; this can be
concluded by reading articles like this one or watching any video by Greg
Young.

In a sense the origin of this article stems from the notion that no framework
is needed. I think that is a setup for disaster by selling developers the idea
that this is easy, when it isn't.

In my experience, frameworks have often taught me how _not_ to do things.
Frameworks are condensed experience that you don't need to acquire yourself;
someone else has already made the mistakes for you. This is a huge time saver
& gives you, the developer, experience at a lower cost.

With that experience going frameworkless can then be achieved if necessary.

However, my interpretation may be exaggerated, given that it was a short
statement without much context.

------
vikiomega9
I think Event Sourcing is neatly defined as an algebraic group and provides a
nice abstraction to reason with. Perhaps the problem is not recognizing this
notion and not proving the correctness of a system before implementing it?

------
lkrubner
Although terminology differs, storing the canonical source of truth in Kafka
has worked great for many of my clients. If that is Event Sourcing, then it
can be made to work easily. I do get asked many, many questions about this,
often from inexperienced teams. I took their questions, and my answers, and
posted them here:

[http://www.smashcompany.com/technology/one-write-point-
one-r...](http://www.smashcompany.com/technology/one-write-point-one-read-
point-one-log)

------
l8again
If people are indeed directly accessing a stream, they are violating a
fundamental service-oriented principle: teams must communicate with each
other through their service interfaces. It is worth understanding this concept
with clarity. Assuming we go with Kafka, even if we don't need any additional
functionality other than what Kafka provides out of the box, it would still
need to be wrapped in a service and treated as an actual service. Otherwise,
it becomes a shared resource.

~~~
hvidgaard
This is one of the core issues that make many ES systems complex. A service
owns an event in the same sense that it owns internal state. It's internal to
that service, which only exposes what it finds appropriate, in the way it
finds appropriate. That is almost never directly as an event on a public
event bus.

------
acjohnson55
I agree that event sourcing (and also CQRS) are not so simple, in practice.
The coupling the author mentions is something I definitely experienced, and I
think the answer is to separate your internal event representation from both
the input (e.g. command) and output representations. I got seduced by the
simplicity of having them all be the same, but I definitely found I wanted to
be able to vary these things independently.

In the first project where I applied event sourcing, I treated my commands
(system inputs) as events, but ended up regretting it. The problem was that
many commands cause multiple effects and as the number of commands increased,
the complexity of the logic for deriving the state started to accelerate.

If I could do it again, I would have strictly separated the command and event
ontologies, and adopted the concept of a command processor. The command
processor takes a command and an event list and returns feedback on the
validation of the command along with a new event list that is at least as long
as the input. Rejected commands would result in the same event list. Accepted
commands that map 1-to-1 with events would result in an event list one item
longer. Complex/composite commands would result in more events. I probably
would have logged the commands to retain the command-event relationship (for
things like undo), but largely that would be a separate thing.
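
The shape I have in mind, as a sketch (names are mine):

    def process(command, events):
        # Returns (validation feedback, new event list). The new list is
        # always at least as long as the input; rejection adds nothing.
        if command["type"] == "rename_item":
            if not command["name"].strip():
                return "rejected: empty name", events          # same list
            return "ok", events + [{"type": "item_renamed",
                                    "name": command["name"]}]  # one longer
        if command["type"] == "archive_all":
            # A composite command may emit several events.
            new = [{"type": "item_archived", "id": i} for i in (1, 2)]
            return "ok", events + new
        return "rejected: unknown command", events

    feedback, log = process({"type": "rename_item", "name": "Q3 plan"}, [])
    assert feedback == "ok" and len(log) == 1
    feedback, log = process({"type": "archive_all"}, log)
    assert len(log) == 3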

If commands and events are separated, CRUD doesn't require the event sourcing
paradigm to bleed through to the UI. Each operation is just a command.
Validation comes directly out of the command processor and can easily be
mapped back to user-visible feedback.

The other mistake I made was concentrating too much logic in my state
calculator. My calculator produced all derivable facts from the event log, as
well as housed the validation logic (the command processor). In retrospect, I
should have figured out what the most fundamental derived facts were and then
moved higher level facts into their own calculation logic. I think this would
have made maintenance and testability far easier.

While event sourcing does come with its baggage, I find that in projects that
aren't using it, the cost is a bunch of ad hoc solutions to the problems of
"first-class change", which you get for free. It's an extremely helpful
essential technique when modeling stateful workflows.

~~~
adrianratnapala
> If I could do it again, I would have strictly separated the command and
> event ontologies,

Your use of "I" instead of "we" is interesting here.

I find that when there is a separation of two things that naively look
like they can be collapsed into one thing (e.g. because all the simple cases
have 1-1 mappings), then someone will either collapse the two -- or (more
likely) force you to collapse them by building in an assumption about the 1-1
mapping.

Your beautiful separation then becomes useless. Do you have experience about
how to set up teams that don't do this?

~~~
acjohnson55
I say "I" for a couple reasons. First, I'm no longer at the same company, so
that's just my personal reflection. Second, I was the architect of the project
and definitely the person who sold the bill of goods on the benefits of event
sourcing :). It was largely successful, but with those lessons learned.

I'm not opposed to collapsing things that have a 1:1 mapping. It's often a
reversible decision, when/if you find the simplification is no longer actually
simplifying things. The problem is that as these representations cross
boundaries between modules and systems, reversing the decision becomes far
more difficult. This isn't limited to event sourcing at all, though. It's the
fundamental concept of encapsulation and coupling in system design.

I have found it difficult to socialize the benefits of encapsulation in a
team, because the upfront cost is easy to see, but the downstream benefits are
not. Sometimes, I've made the judgment to just step back and let people learn
from their own mistakes. I've learned the hard way that it's actually not the
worst thing in the world.

------
naasking
There's sort of a middle ground between event sourcing and ordinary mutable
entities: versioned entities.

[http://higherlogics.blogspot.com/2015/10/versioning-
domain-e...](http://higherlogics.blogspot.com/2015/10/versioning-domain-
entities.html)

The particular schema described there isn't suitable for highly concurrent
entities, but a more suitable schema could be employed that achieves the same
goals.

~~~
moojd
We did something similar to this for a CRUD app that needed to become append
only, have a full change log, have approve/deny events, and the ability to be
rolled back. We still had an event table, but instead of having event data it
just had a reference to a 'shadowed' (versioned) entity in the entity table.
Once an event is approved, you project the shadowed entity on to the real one.
That way the ID of the real entity never changes. This worked really well for
our very specific use case (simple CRUD events, monolithic app.)

------
ta93754829
One of the problems I _think_ I see with event sourcing is its inability to
scale. You have to guarantee the order of events, right? How do you do that in
a large scale distributed system with eventual consistency, without incurring
an insane synchronization time penalty? I'd genuinely love to hear if you have
a good solution for this, because if you do I have a use case I need it for,
so this ain't a troll comment!

~~~
jdc
[https://stackoverflow.com/questions/41082938/event-
sourcing-...](https://stackoverflow.com/questions/41082938/event-sourcing-
microservices-how-to-manage-timestamp)

~~~
ta93754829
that basically says Vector Clock, which is a good solution where it's usually
applied, but I think this is different. Having everything bottleneck through
your vector clock isn't going to be feasible.
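
For context on the suggestion: a vector clock only orders causally related
events; independent ones stay incomparable, which is why it fits per-activity
ordering but not a single global ordering point (minimal sketch):

    def vc_increment(clock, node):
        # A node ticks its own entry on every local event.
        return {**clock, node: clock.get(node, 0) + 1}

    def vc_merge(a, b):
        # Taken on receipt of a message: pointwise max of the two clocks.
        return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

    def happened_before(a, b):
        # a -> b iff a <= b pointwise and a != b; otherwise concurrent.
        return a != b and all(a.get(n, 0) <= b.get(n, 0) for n in a)

    a = vc_increment({}, "node1")                # {"node1": 1}
    b = vc_increment(vc_merge({}, a), "node2")   # node2 saw a, then ticked
    assert happened_before(a, b)
    c = vc_increment({}, "node3")                # independent event
    assert not happened_before(a, c) and not happened_before(c, a)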

------
vemv
I don't think one can do proper, robust Event Sourcing without having first a
few years experience in hardcore functional programming (particularly of the
variant that strictly segregates side-effects), something most of us lack.

OO and loose FP are fine for a (huge) variety of problems, but the hardest
problems need the next level of correctness and elegance.

Else you end up authoring yet another Rube Goldberg machine.

------
ralusek
Here's my take on Event Sourcing: it's not particularly well defined what an
"event" is. Did the event happen yet? Did it succeed? Which part? I didn't
like having an event ledger say "comment created," where my application is
then meant to consume this, handle validation, potentially fail on the db
operation, etc.

So here is what I do: I basically combine Event Sourcing architecture with
CQRS. Whenever a client makes a write request, that "command" gets written to
a log. Then when things happen as a consequence of that command, for example,
a successful DB write happens, I write another log that references the
command.

So I'll have an event that says:

    id: '1234'
    type: 'command'
    data: {type: 'createComment', text: 'Ayy', userId: '1234', postId: '6543'}

And then that can get picked up and processed by the application, which will
create another event like:

    id: '2345'
    type: 'dbWrite'
    data: {model: 'comments', data: {text: 'Ayy'}}
    commandId: '1234'

There's a lot of redundancy, and you can instead rely on something like your
DB WAL for some of the consequences of the commands, but separating out the
commands from the results like this has made Event Sourcing work quite well
for me.

~~~
ZenPsycho
Having it clear, not just in your own head but in your entire team's head,
what an event is, is quite important, especially keeping in mind what happens
when you "play back" a log. Do transactional emails get sent out again? Do upstream
services record the playback as duplicate events? How are transactions
handled? What external state are you unknowingly depending on for that
playback to produce the same result?

~~~
jacques_chester
> _Do transactional emails get sent out again? Do upstream services record the
> playback as duplicate events? How are transactions handled?_

The narrow problem here is that Event Sourcing is temporal, but not
bitemporal. Or rather, the different kinds of temporality are frequently
muddled.

Event streaming in the Akidau/Google style is bitemporal, but mostly
accidentally, as a side effect of distinguishing "event time" (fact time) and
"processing time" (transaction time / belief time).

(Snodgrass later proposed _tri_ temporal models, including a timeline for when
a fact-belief was viewed, which I find both brilliant and slightly
terrifying).

The problem of evolving data models, which is the third-order problem, is
often hardest when you put a log or stream at the centre of your design.
Databases using SQL, for example, struggle with first-order (current state)
and second-order (historical state) evolution. But they do much better on
third-order evolution, since DDL is built into every relational database.
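
Keeping the two timelines distinct in each record is cheap, and it is what
makes "don't resend emails on playback" expressible at all; a minimal sketch
(field names mine):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class BitemporalEvent:
        kind: str
        event_time: float       # when the fact happened in the world
        processing_time: float  # when this system recorded/believed it

    def side_effects_allowed(event, replay_started_at):
        # On playback, suppress side effects for anything the system had
        # already processed before the replay began.
        return event.processing_time >= replay_started_at

    e = BitemporalEvent("order_placed", event_time=100.0,
                        processing_time=105.0)
    assert not side_effects_allowed(e, replay_started_at=200.0)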

------
sohex
Related to the topic, does anyone have experience with Axon Server?
[https://axoniq.io/product-overview/axon-server](https://axoniq.io/product-
overview/axon-server) It claims to solve a number of the concerns over
adopting an event sourcing model.

~~~
HelloNurse
From what I've seen in demos, it addresses low-level plumbing elegantly, but
I don't see how a relatively unobtrusive framework (I mean that as a major
compliment) can help against making plain old design mistakes: wrong events,
commands etc. that don't fit together and are unable to satisfy requirements.

------
lmilcin
Event Sourcing is not hard (as compared to not event sourcing).

But event sourcing is not for every application. Event sourcing solves some,
otherwise hard, problems at the cost of added complexity. You need to judge
whether it pays off.

Event sourcing assumes a particular size of application. Too small an
application will pay a lot in complexity with no added benefit, because the
problem would otherwise be easily solvable without event sourcing. Too large
an application will pay a lot in complexity because the write path becomes
complex due to throughput requirements.

Event sourcing requires dedication. You can't go half-way, mixing event
sourcing here with direct inserts there, for example. That is going to be a
hell of a complex environment to live in, with the worst of both worlds.
Event sourcing only solves problems if you use it 100%.

------
leowoo91
But I suppose there is no alternative for high-traffic systems? The "event
sourcing for everything" belief might be the issue itself, not selective use.

------
yenwel
If you want to do event sourcing, look at persistent actors from Akka (JVM
and CLR). They solve a lot of the issues discussed here, with replay,
snapshotting, event upgrades, etc.

------
AzzieElbab
Everything in the world works off event sourcing, from your database commit
logs to your React. Hard or not, we are stuck with it.

