
Event Sourcing (2017) - taspeotis
https://arkwright.github.io/event-sourcing.html
======
ed_blackburn
Sadly, user management is a crap domain for demonstrating Event Sourcing.

Equally, building an entire system as Event Sourced is daft. Some aggregates
ought to be Event Sourced; nominally, I’d say models that exhibit temporal
properties, like a business process or workflow, are well suited.

Similarly, the most common “issue” I see with Event Sourcing is conflating
Event Sourcing and Event Driven Architecture. They can be complementary but
aren’t the same thing. This conflation leads to a befuddled mess of
inappropriate tech choices and inappropriate consistency models.

------
ec109685
This article was good.

It didn’t get into read-after-write consistency well enough in my opinion,
which breaks the event sourcing pattern for many use cases. E.g. in the
create-user example, there are states in the system where a user could create
an account, reload their page, and not see the account there if the write
hasn’t propagated to the database used to satisfy reads.

~~~
codebje
That's a distributed systems problem, rather than an event sourcing one. I'm
sure we've all done something like comment on a site like HN and not seen our
comment appear when we reload. The more distributed the system, the more
likely it is we're hitting a stale cache somewhere.

Even the most absurdly reduced system running on a single machine, taking an
HTTP request in and processing it fully to completion in all aspects before
returning any response, is a distributed system - the browser is at the other
end, running asynchronously. The user may tell the browser to reload before
the single server has finished processing. When do both the user and the
server agree that the account has been created?

To "fix" the problem with event sourcing, just don't add distributed
components if you don't need them. Synchronise your "on event" action handlers
with your event creation, and don't return success to the command until the
handlers have completed.

You can even choose to wrap it all in a transaction so the event doesn't write
unless the handlers all succeed, side-stepping the problem of desynchronised
views due to handler bugs.
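A minimal in-process sketch of this synchronous approach (all names are illustrative; in a real system the handlers would share a database transaction so their side effects roll back along with the event):

```python
# "On event" handlers run before the append commits, so the command only
# succeeds once every view is up to date. If a handler raises, the event
# is never recorded and the command fails.

class EventStore:
    def __init__(self):
        self.events = []      # the append-only log
        self.handlers = []    # view-updating "on event" handlers

    def subscribe(self, handler):
        self.handlers.append(handler)

    def append(self, event):
        # Run every handler synchronously before committing the event.
        for handler in self.handlers:
            handler(event)
        self.events.append(event)

users_view = {}  # a read model kept in lock-step with the log

def on_user_created(event):
    if event["type"] == "UserCreated":
        users_view[event["id"]] = event["name"]

store = EventStore()
store.subscribe(on_user_created)
store.append({"type": "UserCreated", "id": 1, "name": "alice"})
# By the time append() returns, a page reload will see the new account.
```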

You still keep the (IMO) main benefit of event sourcing: you can define new
views you didn't have to foresee and build them from the complete history of
the system as if you'd known about them from the start.

~~~
ec109685
With the reload-before-the-server-has-acknowledged situation, you haven’t told
the user that action X has fully completed, so there would be no expectation
that it persisted.

Yes, it is a distributed systems problem, but with a pure event sourcing
approach as advocated in this article, every action is a potential data race.

Compare this to an application that uses a distributed data store like
DynamoDB, where read-after-write consistency is possible while availability is
still quite high. Apps that use it are easy to reason about for user actions,
yet you can still use its event log for asynchronous events like sending mail.

That said, delaying acknowledging the write until you know it has propagated
to all critical data stores is an interesting way to solve the problem.

------
xtagon
Great article. One thing that wasn't pointed out that might be of interest to
someone learning about Event Sourcing is that it introduces some challenges if
you are to be compliant with GDPR and similar laws. For example, if your event
log is immutable, and you use it as an audit log, then by nature you are not
ever deleting data. There are solutions to this (for example, crypto-erasure),
but it can be non-trivial to implement.

~~~
tunesmith
I saw the Akka people talking about this on Twitter once; I think they were
theorizing that encrypting the data in the log would be sufficient, because
then “deleting the key” could be interpreted as the deletion of the record
(even though the useless data still exists in the log). But I'm not sure this
was ever legally validated?

~~~
xtagon
That is what I was referring to as "crypto-erasure" (also known as "crypto-
shredding"). I'm not sure what counts as legally validated, but some have
shared concerns that the encryption you use to do this would need to be
future-proof against, say, advancements in quantum computing cracking the
encryption down the road even after the key has been thrown out.
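The pattern itself is simple to sketch (a toy illustration; the XOR one-time pad here is only for demonstration, and a real system would use an authenticated cipher such as AES-GCM):

```python
import secrets

# Crypto-shredding: personal data in the immutable log is stored encrypted;
# the keys live *outside* the log, so "deleting" a user means throwing away
# their keys. The log itself is never mutated.

keyring = {}    # user_id -> list of per-event pads (mutable, outside the log)
event_log = []  # append-only, never mutated

def xor(pad, data):
    return bytes(p ^ b for p, b in zip(pad, data))

def record(user_id, payload: bytes):
    pad = secrets.token_bytes(len(payload))         # fresh pad per event
    keyring.setdefault(user_id, []).append(pad)
    event_log.append((user_id, xor(pad, payload)))  # only ciphertext is logged

def read(user_id):
    pads = iter(keyring[user_id])
    return [xor(next(pads), data) for uid, data in event_log if uid == user_id]

def shred(user_id):
    del keyring[user_id]  # the log is untouched, but the data is now noise

record("u1", b"alice@example.com")
assert read("u1") == [b"alice@example.com"]
shred("u1")
# The ciphertext remains in event_log, but without the pad it is unreadable.
```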

~~~
rodocite
If the references don't point to actual data, you don't need this.

- Minimum 2 parts: a relay (reference hash) and the cold/true storage portion.
You can break the reference hash up and reassemble it only for secret holders.
Which brings us to:

- Content-based routing

- There are also deterministic vaults for rolling keys

I'm actually not sure in what high-level situation “crypto-erasure” would
work, because being able to re-key a reference means you have complete
control. So why would you need to erase the “bad” key when you can just switch
the reference?

Eth draft for enabling cold storage relay:
https://github.com/ethereum/EIPs/blob/master/EIPS/eip-1077.md

Decentralized ID: https://www.w3.org/TR/did-core/

^ Both generally use the same concept I described above and solve the GDPR
“delete” issue. Actually, it solves GDPR completely if you can just rely on
the DID. “Hard delete” is a separate issue, though -- no other way to get
around that but to fork/version + replay your store and re-reference anyone
who wants a hard delete, if you didn't use reference hashes.

------
jbverschoor
Ahh, event sourcing... this is the hoarding disease which programmers, and
these days “business people”, have. It is also why everybody is doing “big
data”, why we have privacy leaks, etc.

Solution: Think about what you'd want to do with the data, and only then start
collecting it.

Solution 2: A tax on data possession.

------
kureikain
Event Sourcing is nice on paper, but once you have more than a few services
you will feel the pain if you don't do it properly.

We had applications listening to topics on Kafka that could replay and process
messages. It all sounded good. When we started to add more topics, we realized
we no longer knew who owned a topic or who subscribed to it. We no longer felt
safe just dropping a topic and had to grep/search around; however, a lot of
this information is configured in environment variables, and some is pulled
from config management systems such as Vault or K8s config, which makes it
even harder to grep because we first have to export the data out of those
systems.

I think event sourcing is nice and powerful, but hard to do well.

------
lukev
A word of caution for anyone considering an event-sourced architecture. I was
on a large government project where the decision was made to use event
sourcing, and it was disastrous. It ended up being a big contributor to
several years of time and cost overruns.

The reason is that for event sourcing to work, you have to have a pretty good
idea of your application's requirements up front. It's simply not conducive to
agile development, compared with using a traditional DB. The requirements were
constantly shifting, and we were constantly realizing that we had the wrong
semantics or structure for various fields, or that assumptions we had made
about the coupling of different types of data simply didn't hold. This led to
a ton of rewriting and churn on the view code, and required constant decisions
on how to handle existing data in the "old" format.

Some of the requirement churn even had ramifications for fundamental
architectural characteristics such as support for atomic transactions, so
there were several points at which we had to hack locking or other techniques
to ensure consistency on top of the event sourcing approach. I do NOT
recommend this; it turns everything into a huge mess.

The worst part is, ultimately, the data sizes ended up not being that big. We
could have run the whole thing off of append-only tables in a single beefy
Postgres instance.
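For illustration, that alternative can be sketched in a few lines (sqlite3 standing in for Postgres here; the schema and names are hypothetical):

```python
import sqlite3

# An append-only events table: rows are only ever INSERTed, and the current
# state is a query over the history -- no separate event store needed.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE events (
        seq       INTEGER PRIMARY KEY AUTOINCREMENT,
        entity_id TEXT NOT NULL,
        type      TEXT NOT NULL,
        payload   TEXT NOT NULL
    )
""")

db.execute("INSERT INTO events (entity_id, type, payload) VALUES (?, ?, ?)",
           ("user-1", "UserCreated", '{"name": "alice"}'))
db.execute("INSERT INTO events (entity_id, type, payload) VALUES (?, ?, ?)",
           ("user-1", "NameChanged", '{"name": "alicia"}'))

# Latest event per entity gives the current state; the full history stays
# queryable for audit or replay.
row = db.execute("""
    SELECT payload FROM events
    WHERE entity_id = 'user-1'
    ORDER BY seq DESC LIMIT 1
""").fetchone()
```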

Conclusion: if you are designing systems, you should definitely know what
event sourcing is and the benefits it can provide. However, avoid it by
default in favor of simpler, more traditional models unless they are really
infeasible for what you are trying to do. And then, lock down as many key
requirements (at the very least, those around consistency and interop with
other systems) as possible before charging ahead with implementation.

~~~
Yuioup
Thank you for this. I've been warning people off Event Sourcing for a while
now. The architecture is the most convoluted, pretentious, redundant and
downright soul-crushing I've seen. If you see Event Sourcing anywhere, run
away. Run far, far away.

~~~
inopinatus
Some of the world's most useful and powerful data structures are the
projection of an event log: the tables of an RDBMS, the balanced writes of an
SSD, and even the classic, double-entry book-keeping.

Even the data stream of a TCP connection is a projection of events, which is
why (and how) we can reconstruct them by replaying captured segments.

So just because there are some lousy executions of a general architecture
doesn't mean we should recoil from the basic idea.

My takeaway is that successful event-sourced structures are crafted for the
domain they represent. I’ve developed a couple for my own work, for very
specific aspects of an application, and they work well in context.

If your experience has been that a _general-purpose_ ES framework leads to
shitty, hard-to-maintain apps, I’d say that’s evidence for the corollary.

~~~
nvarsj
Just because a tool is powerful doesn't mean it's appropriate. There's a
reason most people should just use a DB and not a raw event log. The issue I
have with ES proponents is that they seem to pretend there is no additional
complexity that comes with it. I think ES is useful, but not always; it
requires weighing the costs and benefits, and we need to be honest about that.

------
bsaul
I don't understand how the uniq service would be able to scale horizontally.

How would you load-balance calls to uniq across different servers and still
coordinate to ensure uniqueness of values?

Either you keep relying on the logs for replica syncing, but then the service
can't answer in a synchronous manner; or you need some synchronous distributed
lock, but then you still have the locking problems described earlier in the
same article.

------
bsaul
I don’t get it: in the microservice approach, you can have services
communicating with each other using event sourcing, but why force every
service to work with event sourcing _internally_? Any service requiring
transactional behavior, or at least transactional functions, could rely on a
relational database to ensure atomicity.

The purpose of microservices, it seems to me, is to be able to have a
different internal architecture for each service. So why go back to shoving a
single one everywhere?

~~~
Autowired
I agree with you, except maybe about the atomicity part. When you use event
sourcing, your source of truth becomes the event log, so transacting against
your local representation of the state does not give you the same guarantees.

------
sriram_iyengar
This is very detailed. I just got out of building systems integrated with
Azure Event Hubs, and your summary is very useful.

------
AzzieElbab
Event sourcing is just a nice add-on for event processing systems. Whether we
like it or not, async systems are all about events. Persisting and replaying
those events is just a matter of convenience.

------
whyineedaccount
There is no way to source events in the right order if all you do is keep
adding to the queue; I think this should be revised.

------
rodocite
I think the article is too scattered and doesn't actually discuss how event
sourcing works. Key concepts are missing. People who actually want to learn it
will end up getting even more confused or misguided-- which seems like the
trend with this thing.

Excuse the list formatting. I don't post here that much. Just scroll if the
list item is cut off.

TL;DR: Use Redux-Saga -- it matches all the bird's-eye-view event sourcing
concepts closely.

What is it?

    
    
      - Structure: In its simplest form, an append only file (aka a log).
      - Events -> { type, data, metadata }, e.g. { type: "UserUpdated", data }
      - Projections -> The read model. Think of them as .reduce() over your log. But because all they need to do is look at the newest event to perform a reduction, they act as “realtime” queries for aggregated data.
      - Separation of read and write models aka CQRS
      - Read Model: Projection
      - Write Model: Event Dispatch
      - You can have CQRS without event sourcing (GraphQL, etc) but you cannot have Event Sourcing without CQRS. Event Sourcing is implicit CQRS.
      - Separation allows you to scale your reads and writes independently.
      - Event sourcing, since it is just a log, allows you to replay your data
      - For communication to a service that’s supposed to perform an action for you:
        - Command (dispatch) -> PresentTenseVerbNoun (UpdateUser)
        - Event (write) -> NounPastTenseVerb (UserUpdated)
    

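The projections-as-reduce idea in the list above can be sketched directly (event shapes are illustrative, borrowing the cart scenario from further down):

```python
from functools import reduce

# A projection is a fold over the event log: a full rebuild reduces the
# whole history, while a live projection only applies the newest event
# to the state it already holds.
events = [
    {"type": "ItemAdded",   "cart": "c1", "item": "XBOX"},
    {"type": "ItemRemoved", "cart": "c1", "item": "XBOX"},
    {"type": "ItemAdded",   "cart": "c1", "item": "PS4"},
]

def apply(state, event):
    items = state.setdefault(event["cart"], [])
    if event["type"] == "ItemAdded":
        items.append(event["item"])
    elif event["type"] == "ItemRemoved":
        items.remove(event["item"])
    return state

# Full rebuild: fold the whole log from scratch...
carts = reduce(apply, events, {})
# ...but the "realtime" query only needs to apply the newest event:
carts = apply(carts, {"type": "ItemAdded", "cart": "c1", "item": "Game"})
```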
The gist of what you do:

    
    
      - You dispatch events to the event store and create contextual "realtime" data via projections that your services read.
    

Why do you want to use it?

    
    
      - It scales and plays well with distributed infrastructure.
      - You are already using bits of the concepts if you are scaling or doing logging.
      - You have uniform communication between services.
      - Event sourcing is extremely good at modeling state in your system. You’re forced to think of state (via events) and how those events “eventually” resolve. If you’re purely on SQL, on the other hand, you need a log or triggers on top of your commits to keep track of state. E.g., a customer goes down the shopping aisle and puts an XBOX in their cart. Then they decide to put it back and put a PS4 in their cart. If you had bound that behavior to events, you would be able to run complex projections on them.
      - ^ On that note, you essentially get free logging and metrics with Event Sourcing (though you need to build out the projections).
      - Event sourcing actually makes writing sequence diagrams to optimize or design your system very easy.
    

Why don’t you want to use it?

    
    
      - If it is overly “complex” for a small-mid sized CRUD project.
    

Misconceptions

    
    
      - Redux / Elm did not “popularize” event sourcing. There was a small snippet in the “Prior Art” section in the old Redux docs that mentioned event sourcing, but no one was thinking “event sourcing, yay!” as they were using Redux w/ Thunks.
      - Redux w/ Redux-Sagas is almost 1:1 the event sourcing model, however. If you want to learn event sourcing, instead of reading the article above, just learn how to use Redux-Sagas. A Saga, in event sourcing terms, is what is more generally known as a “Process Manager”.
      - Event Store DB vs PG DB — No need to go one or the other. Use the best of both worlds. ES for your event sourcing, PG for your read model and fully scoped write models.
      - “Event sourcing is so much more complex than using ORM” — No. The concepts are pretty standard whatever you use when you get into distributed systems modeling. Event sourcing is actually less complex but the tooling and verbiage we are used to is too highly focused on ORM, PG, etc.
      - ACID compliance and eventual consistency are not mutually exclusive. Eventual consistency does not refer to the DB itself; it refers to the infrastructure. If your infrastructure does not split-brain and is always “eventually consistent”, everything will be OK for most applications. There will be eventual consistency issues in any large-scale system.
      - As soon as an event hits an event store, that event is "logged". It won’t be lost.
      - Streams in a proper event store are very cheap to create and not computationally expensive.
    

Things to Note

    
    
      - Blockchains are event sourcing implementations.
      - Read Smart Contracts are essentially “projections”
      - Write Smart Contracts are your write model, obviously
      - Blockchains replay “events”. That’s what syncing is with wallets. You can see that state changes.
      - The big structural difference between blockchains and a regular event store is a blockchain stores the events as a cryptographically verifiable log (the merkle tree).
      - The consensus algorithms in PUBLIC blockchains are also very different from your typical event store. Public blockchains use BFT consensus algorithms that are very slow by design. Hyperledger, with its leader-based consensus model, looks very similar to clustered event stores.
      - Redux-Saga matches the event sourcing flow and implementation so closely that it is probably the best place to start.
      - If you have really good modeling and event sourcing in place, you’ll start to see that Redux begins to disappear from your frontend; especially if you use GQL and caching.
      - Domain-driven Design really helps model an event sourcing infrastructure.

~~~
anaganisk
this must be up

------
_tkzm
99.99% of people new to event sourcing, and 9/10 of those who have been in
event sourcing already, make the huge mistake of taking event store concepts
from other people, who took them from other people, and in the end that is
where everyone fails. Even big names like Greg and his praised "eventstore"
project. If you are new to ES, great: you have no baggage. Do not read any
technicalities about the event store or try to use any existing library for it
(the underlying storage engine does not matter: MySQL, Postgres, Rocks...).
Come up with your own solution and you will have zero ES problems. Why? Well,
the entire concept of event store being floated out there is completely
flawed, and if you implement it, it will cost you a lot of money and time to
unfuck yourself later on.

~~~
zbentley
There are a lot of assertions there about poor quality, "everyone" failing,
and concepts being completely flawed.

Could you explain the rationale for those assertions, and expand on why
rolling your own avoids those pitfalls?

