Event Sourcing (2005) (martinfowler.com)
105 points by gjvc on Aug 11, 2021 | 88 comments



> Reversal is the most straightforward when the event is cast in the form of a difference. An example of this would be "add $10 to Martin's account" as opposed to "set Martin's account to $110". In the former case I can reverse by just subtracting $10, but in the latter case I don't have enough information to recreate the past value of the account.

I've used the event sourcing pattern with great success in the context of an exceptionally complex web application for my day job. The front-end would create an 'event' for every action the user took. This allowed us to seamlessly implement client-side undo/redo functionality with surprisingly little effort, a market-leading feature our users really appreciated.

One key insight we had was the necessity of capturing the state change as a difference, as Martin explains. This turned out to be tremendously powerful. We were able to build a variety of other features on top of it, including detailed analytics of how users were using our app and session replay for training and debugging.


The article also says:

> ... all the capabilities of reversing events can be done instead by reverting to a past snapshot and replaying the event stream. As a result reversal is never absolutely needed for functionality. However it may make a big difference to efficiency...

When I implemented undo/redo, I found that replaying from snapshots was fast enough. I wasn't working on a database-backed app though.

Implementing reversal is extra work and bug surface, so it should be subject to cost/benefit analysis.


I find fuzz testing works great for things like this. Make a function which, given everything you know about a user's state, randomly generates an action that user could take (and the expected result). Then you can run your user model forwards and backwards through randomly chosen actions. If you support undo, check that if you play any change forward then backward, the resulting state is unchanged.

I usually pair that with a "check" function which just verifies that all the invariants I expect hold true. E.g., check that the user has a non-negative balance.

It sounds complex, but you get massive bang for buck from code like this. 100 lines of fuzzing code can happily find a sea of obscure bugs.
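
A minimal sketch of that loop in TypeScript (the account model, action generator and invariants here are made-up illustrations, not from a real codebase):

    // Toy model: a user's state is just a balance.
    type State = { balance: number };
    type Action = { kind: "deposit" | "withdraw"; amount: number };

    const apply = (s: State, a: Action): State =>
      ({ balance: a.kind === "deposit" ? s.balance + a.amount : s.balance - a.amount });

    const undo = (s: State, a: Action): State =>
      ({ balance: a.kind === "deposit" ? s.balance - a.amount : s.balance + a.amount });

    // Randomly generate an action that is valid for the current state.
    function randomAction(s: State): Action {
      if (s.balance === 0 || Math.random() < 0.5) {
        return { kind: "deposit", amount: 1 + Math.floor(Math.random() * 100) };
      }
      return { kind: "withdraw", amount: 1 + Math.floor(Math.random() * s.balance) };
    }

    // The "check" function: verify every invariant we expect to hold.
    function check(s: State): void {
      if (s.balance < 0) throw new Error(`negative balance: ${s.balance}`);
    }

    // The fuzz loop: apply, check, then verify that apply + undo is a no-op.
    let state: State = { balance: 0 };
    for (let i = 0; i < 100_000; i++) {
      const action = randomAction(state);
      const next = apply(state, action);
      check(next);
      if (undo(next, action).balance !== state.balance) {
        throw new Error(`undo mismatch at step ${i}`);
      }
      state = next;
    }
    console.log("fuzz run passed, final balance:", state.balance);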


And if your system isn't using event sourcing (where this sort of testing is relatively simple to add), you can also do model-based testing.

I like this example as a demonstration from FsCheck using F# (for the test program) and C# (for the program under test): https://fscheck.github.io/FsCheck//StatefulTesting.html


If you have any examples of this I'd love to see them. Are you just doing a random pick of X possible actions N times?


Sure - here's a simple fuzzer from a rope (fancy string) library I wrote a few years ago back when I was still learning rust. (Don't judge me!):

https://github.com/josephg/skiplistrs/blob/master/tests/test...

In a loop it simply randomly decides whether to insert, delete or replace some text, and after every action it checks that the state is valid (in this case via a call to check2()).

You can go way deeper with this sort of thing if you want, with more complex models, random item generation and simplifiers to help pare down problems to simple test cases. You can also have more complex state that tracks interactions between multiple items - for example, if your state is a few users, you can have one user pay another and verify that the total balance across all users is unchanged.

But you don't need to go deep for fuzz testing to be worthwhile. Even something as simple as this little loop is a remarkably effective bug finder.


We're using jqwik's stateful testing to achieve this, https://jqwik.net/docs/current/user-guide.html#stateful-test...

It's essentially as you described.


I had the impression idempotency was the way to go in distributed systems and not differences.


Those two concepts are not mutually exclusive. Each change can be made idempotent by recording some identifying information about the state it was trying to update.

    increase balance by 10 (it is currently 100)
    balance is now 110
    increase balance by 10 (it is currently 100) <- duplicate event!
    event has already been applied
    balance is now 110
It can definitely break down at some point (maybe the user really did want to increase by 10 two times, they just clicked twice really fast?). But idempotency and change-based flow can live together.
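
A quick sketch of that combination (the event shape and ids are made up for illustration):

    type BalanceIncreased = { eventId: string; amount: number };
    type Account = { balance: number; appliedEventIds: Set<string> };

    // Applying the same diff twice is a no-op because we remember the event id.
    function applyEvent(account: Account, event: BalanceIncreased): Account {
      if (account.appliedEventIds.has(event.eventId)) {
        return account; // duplicate delivery: already applied
      }
      return {
        balance: account.balance + event.amount,
        appliedEventIds: new Set(account.appliedEventIds).add(event.eventId),
      };
    }

    let account: Account = { balance: 100, appliedEventIds: new Set() };
    const event: BalanceIncreased = { eventId: "evt-1", amount: 10 };
    account = applyEvent(account, event); // balance is now 110
    account = applyEvent(account, event); // duplicate event! still 110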


This is where CQRS comes in. Every unique action has its own command id, so if a user clicks submit twice, the same command id is sent twice; after the first success, all subsequent attempts don't effect any further change, they just return success automatically since that command id already succeeded in the past.

Combine this with versioning, where you assert an event stream version on write, and now you're able to effect change on a known state that cannot have changed (and if it did change, your command would fail and cause no persisted changes server-side).


I'm well aware that there are no absolute standards for this, but I've built a number of CQRS systems, and I didn't ever have idempotency IDs as a core tenet of the system.

What literature, or framework or ecosystem holds this to be a hard-and-fast requirement of CQRS?

Personally I've solved this problem similar to Git, modelling CQRS on top of Merkle trees so your system at least knows if your incoming edit (command|commit) is being applied on HEAD (no changes to the model since the command started) or to HEAD+n (needing to re-run some command validations, and "rebase" the incoming command).


I don't think there is any "standard" that strictly governs the implementation of a CQRS system. We just need to be aware of the nuances of the fault-tolerance aspect in distributed systems:

- Idempotency is ONLY a hard requirement if the events are transmitted via lossy channels such as a network. An ID is one of several ways to ensure idempotency. If you're implementing CQRS for a distributed system then it's needed. Otherwise, if the events aren't lossy and can be processed atomically, you don't need to ensure idempotency - think of implementing CQRS for your user interface, where events are just transmitted between different objects in memory rather than over the network.


That's generating an idempotency ID on the fly, isn't it?


That means commands have to be stored as well (at least for some time) in a non-eventual-consistent way?


If a command is successfully converted/persisted into an event, that event should contain the command's ID. When the command is replayed, the domain aggregate/event stream will contain that command ID, and will therefore reject/ignore the replayed command.


So just to get it right: 1. it means that there is only one instance that processes commands, and 2. this instance checks whether each command id has appeared before (so it keeps track of all ids in an efficient way)?


> it means that there is only one instance that processes commands

Not necessarily. Depends on how you shard the aggregate/streams. Since aggregates _by definition_ maintain their invariants, you can have literally a shard for every aggregate. Overkill, but it illustrates the point. Since there's no state shared between aggregates, you can process each aggregate/stream on its own instance.

> this instance checks all command ids if they appeared before (so keeps track of all ids in an efficient way)

Yep. If you trust the client and don't mind losing old commands, this isn't too hard - use a monotonically increasing number (e.g. ms since the epoch) as part of the commandId. Any command with an id less than the commandId in the aggregate will be rejected as "out of date". Note that this serves as an optimistic concurrency check.

If you don't trust the client (or you don't want to lose old commands issued on a possibly out of date view) and just use GUIDs as commandIds, then yes you'll need to keep track of some ids - but not necessarily all. Do you really need commandIds past 1000? (Depends on your domain.) You'll need to "roll up" the events into some summary view anyway to check/maintain business invariants, so keeping, say, the last 100 commandIds as part of that summary view is simple.
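
A sketch of both variants (the summary shape here is illustrative, not from any particular framework):

    type Command = { commandId: string; issuedAt: number };

    type AggregateSummary = {
      lastAcceptedAt: number;     // variant 1: monotonic check
      recentCommandIds: string[]; // variant 2: bounded dedup window
    };

    function accept(summary: AggregateSummary, cmd: Command): boolean {
      // Variant 1: anything older than the last accepted command is "out of date".
      // This doubles as an optimistic concurrency check.
      if (cmd.issuedAt <= summary.lastAcceptedAt) return false;

      // Variant 2: GUID-style dedup against the last N command ids.
      if (summary.recentCommandIds.includes(cmd.commandId)) return false;

      summary.lastAcceptedAt = cmd.issuedAt;
      summary.recentCommandIds.push(cmd.commandId);
      if (summary.recentCommandIds.length > 100) summary.recentCommandIds.shift();
      return true;
    }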


Thank you, that makes sense and is a lot clearer to me.

Now I only wish there were good alternatives for when you want to have multiple instances per aggregate (even if you may have to give up certain guarantees).


Interesting... just to be clear, an _individual stream of user events_ is an aggregate. E.g. a user Bob Smith's events are independent of another user like Jane Doe. It isn't the case that the entire "User Table" is a single aggregate - each row in it is an aggregate.

If you're trying to do multimaster db storage, like a phone app syncing with some cloud server, where an aggregate resides in two separate locations, this is where event sourcing shines. It's just like git, right, and so you can do all the stuff that git can do - merge event streams, rebase events... it's actually the reason why my pet project switched from an RDBMS to event sourcing. Related video: https://skillsmatter.com/skillscasts/1980-cqrs-not-just-for-...


You can run multiple instances, and use something like distributed locks to avoid contention causing failures.


Exactly. When you are effecting a change on a stream, it has to read the events anyway, so it will be able to tell if the command id is already associated with the stream, since each event has the command id in its metadata.


> It can definitely break down at some point (maybe the user really did want to increase by 10 two times, they just clicked twice really fast?)

Well, that is exactly what OP meant - in the context of a distributed system it is not "at some point", it is just a given.


What if this is concurrent with "decrease balance by 10"? If the decrease occurs between the two increases, then the increases won't be detected as duplicates.


It’s my understanding that these systems don’t use the value; they’ll instead reference some kind of “state id” that refers to the state of the system as the result of some previous, accepted change. This way, even if a decrease races the duplicate increase, the duplicate is still correctly detected.


The two are not necessarily mutually exclusive.

    Balance is 100
    Operation #1 says add 10
    Balance is 110
    Operation #1 says add 10
    Balance is still 110
If you can create unique IDs for your operations, idempotence is really easy to implement.


from 10000ft away that has the feeling of op-based CRDTs


Event sourcing can be represented as a chronological list of changes, replayed so that any list index represents a state.

How CRDTs work differs between implementations. Some CRDTs use a timestamp to order all changes, making them essentially identical to an event-sourced list.

Timestamp based ordering may not actually be the most ideal, especially for low trust environments where the timestamp can be spoofed.

Some apps might not want last-write-wins. In this case, the timestamp largely becomes meaningless because there has to be some other way to resolve conflicts due to multiple clients making changes to the same piece of data. Some apps want one clear winner, other apps might want the two inputs to be merged, and others (like git) explicitly ask the developer to resolve the conflict.

In this way CRDTs can diverge significantly from an ordered list of changes.


More or less.


We debated the precise meaning of Event Sourcing when I was writing my Temporal.io explainer (https://www.swyx.io/why-temporal/)

Basically the source of the debate - to do "proper" event sourcing, do you need to rerun the computations each time you roll back/forward, or is it enough to simply restore state?


I was part of that debate. I remember a rather interesting point of discussion: is the main operation "apply" or "dedup"?

Apply seems to be the common notion of event sourcing: there is a function apply that takes a state and an event and yields a new state. Then, starting from an init state and iteratively applying the entire event history, boom, latest state restored.
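
In code, apply is just a left fold over the history (a generic sketch with a toy account state):

    type Event = { type: "Deposited" | "Withdrew"; amount: number };
    type State = { balance: number };

    // apply: take a state and an event, yield a new state.
    function apply(state: State, event: Event): State {
      return event.type === "Deposited"
        ? { balance: state.balance + event.amount }
        : { balance: state.balance - event.amount };
    }

    // Restoring the latest state = folding apply over the entire event history.
    const history: Event[] = [
      { type: "Deposited", amount: 100 },
      { type: "Withdrew", amount: 30 },
    ];
    const latest = history.reduce(apply, { balance: 0 }); // { balance: 70 }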

Dedup has a lot of charm though: run and rerun your code. If a step of your code is executed for the first time (no corresponding event in the event history), execute the step and store its result as an event in the history. If that step is executed a second or third time (there is a corresponding event in the event history), do not execute the step; return its result from the event in the history. The Haskell Workflow package (https://hackage.haskell.org/package/Workflow) is a good example.
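
A toy sketch of the dedup idea (illustrating the concept only - this is not how Temporal or the Workflow package are actually implemented):

    // The event history doubles as a memo table, keyed by step position.
    type RecordedStep = { result: unknown };

    class Replayer {
      private position = 0;
      constructor(private history: RecordedStep[]) {}

      // Run a step: replay its recorded result if one exists, execute it otherwise.
      step<T>(fn: () => T): T {
        const recorded = this.history[this.position++];
        if (recorded) return recorded.result as T; // 2nd, 3rd execution: dedup
        const result = fn();                       // 1st execution: actually run it
        this.history.push({ result });
        return result;
      }
    }

    const history: RecordedStep[] = [];
    new Replayer(history).step(() => console.log("charging card")); // executes
    new Replayer(history).step(() => console.log("charging card")); // replayed, no side effect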

Temporal follows the second approach, so "proper" Event Sourcing? You be the judge :)


This may be how Temporal works, but I imagine the way you ask "is there a corresponding event in the event history" is that you tag every "derived event" in the system with a hash of both the state and code used to create it? So if you change the code, you (eventually?) invalidate all events derived from it?

This of course would be incredibly reliant on the code not being able to access anything except the state, and have no side effects - so if you want to say "if X, fetch Y" you need to have something looking at derived state from the "if X" part and putting the results of "fetch Y" into the event stream, thus causing the state relevant to the code to change, and invalidating the need for something to "fetch Y" a second time? There's something incredibly clean and enticing about this, but also incredibly scary to implement correctly at scale!


Temporal takes a much simpler approach to updating code while a program is running. For any code update, it keeps both the old and the new version of the code. Then it uses the old code to replay already recorded events, which allows it to reconstruct the program state, and takes the new code path when it is executed for the first time. Eventually the old code is removed once all the executions that used it are completed.


Hello. I love Temporal, it is an amazing solution. I have been pushing it pretty strongly at my workplace, but you guys need to make that web interface better. I lose way too many people at the demo stage.

In Q1 next year I'll be able to push for a full demo again. Hope we get traction this time. It's great tech, and a pleasure to use.


i'm literally working on it as we speak :) hear you loud and clear. my top 2 priorities rn are TypeScript SDK and new Web UI.


Can't remember how I first encountered your online presence, but I'm pretty sure I've followed you on Twitter for a while. Didn't realize you had joined Temporal. I've followed Temporal for a while, although I haven't yet had the chance to use it professionally. Still, I'm fairly convinced it's The Future, especially as it becomes obvious how to combine it with UX and data layer (right now, it's a bit unclear to me how much of UX state and permanent storage should live in Temporal). If I were building a green field business with automation at its core, I'd very strongly consider picking Temporal.


thank you! pls let us know if we can help with those questions when it's time to evaluate us seriously.


That's a technical detail imo, as long as your implementation guarantees the outcomes are identical. The reason some people replay events is that it's a simple and easy way to restore previous state.


Event sourcing is a game changer. Everywhere I have worked has ended up reimplementing it in some form, because it is irresistible to incorporate history into your data model.

My most thorough use of event sourcing was in a real-time art auction system, which combined in-person and online bidding. An operator had to input the actions in the room into the system and bid on behalf of online bidders. Event sourcing allowed us to model what actually happened for each auction lot. It also made it possible to model undos and cancellations in really slick ways.

Command Query Responsibility Segregation (CQRS) is also a powerful concept, and it combines elegantly with event sourcing. I also find it to be a handy way of thinking about raw and derived data, rather than trying to perfect a normalized data model to serve all purposes.


The "reimplementing" part resonates with me 100%. I've also reimplemented this solution many times and every time it has been a pain in the ass.

I built https://batch.sh specifically to address this - not having to reinvent the wheel for storage, search and replay.

For some cases, storing your events in a pg database is probably good enough - but if you're planning on storing billions of complex records AND fetching a particular group of them every now and then - it'll get rough and you need a more sophisticated storage system.

What storage mechanism did you use? And how did you choose which events to replay?


In our case, we used Postgres. Our event volume was quite small, and we needed strict consistency (i.e. all participants in the auction see the same state). So we stored the log of events for an auction lot as a JSON blob. A new command was processed by taking a row-level lock (SELECT FOR UPDATE) on the lot's row, validating the command, and then persisting the events. Then we'd broadcast the new derived state to all interested parties.

All command processing and state queries required us to read and process the whole event log. But this was fairly cheap, because we're talking maybe a couple dozen events per lot. To optimize, we might have considered serializing core aspects of the derived state, to use as a snapshot. But this wasn't necessary.
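
A condensed sketch of that flow with node-postgres (table, column and function names are illustrative, not our actual schema):

    import { Pool } from "pg";

    const pool = new Pool();

    // Stub for the domain logic: validate the command against past events
    // and decide which new events to append.
    function decide(events: object[], command: { type: string }): object[] {
      return [{ type: command.type, at: Date.now() }];
    }

    async function handleCommand(lotId: string, command: { type: string }) {
      const client = await pool.connect();
      try {
        await client.query("BEGIN");

        // Row-level lock: concurrent commands for this lot are serialized.
        const { rows } = await client.query(
          "SELECT events FROM lots WHERE id = $1 FOR UPDATE", [lotId]
        );
        const events: object[] = rows[0].events; // the JSON blob of past events

        const newEvents = decide(events, command);
        await client.query(
          "UPDATE lots SET events = $2 WHERE id = $1",
          [lotId, JSON.stringify([...events, ...newEvents])]
        );

        await client.query("COMMIT");
        return newEvents; // broadcast the new derived state from here
      } catch (err) {
        await client.query("ROLLBACK");
        throw err;
      } finally {
        client.release();
      }
    }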

Batch looks pretty cool! I'll keep that in mind next time I'm considering reinventing the wheel :)


I'm late to this party but I'll chime in anyway - I love event sourcing. So much so that I actually built a company around it (and got into YC S20) - https://batch.sh

Event sourcing is not easy but the benefits are huge - not only from a software architecture perspective but from a systems perspective - you gain a whole lot of additional platform reliability.

There are roughly 3 pillars to event sourcing that have to be built (most of the time) from scratch - storing events (forever), reading stored events, replaying stored events.

Those are the 3 things I've built at several companies and it is always a huge barrier to someone taking up the pattern. Of course there are more things under the hood, but those are foundational pieces that will make the whole experience much better.

Would love to chat with folks who want to nerd out over this stuff :-)


As a techie-turned-PM, I found this pattern really useful when discussing specs with different stakeholders. Business stakeholders understand their business as a series of events (customer walked in, customer bought X, customer checked out, etc.) so the user stories become a lot more relatable to both developers and business folks. I vaguely remember events being discussed in "Domain Modelling Made Functional", too.


Very much agree. I had a memorable conversation with a PM a short time before the pandemic about this very subject. She was enthused about how it would make her work more closely aligned with that of developers.


I’m surprised to see positive experiences with event sourcing in the comments; I had the impression that in older posts about event sourcing there were horror stories, and I think even Martin Fowler said that in most cases he'd seen it didn't work. Or maybe he was referring specifically to CQRS, but isn't that used most of the time along with event sourcing?


Honestly, you should take negative comments about technology on HN (and elsewhere) with a pinch of salt. By reading HN, I have learned that the language I earn a living with is unfit for industry use. I also frequently encounter claims that the stuff I do is impossible, etc.

From what I have seen however, industry is much more technologically diverse than people imagine.

Event sourcing is an amazing tool. The failing projects are usually the ones that want to use it without acknowledging that it offers different tradeoffs from more traditional data modeling/storage solutions, and without adapting to those different ups and downs.

There are also many projects that want to use event sourcing, but actually end up picking up event sourcing, reactive programming and a host of frameworks or even languages to do the latter. That's a lot of technical baggage to learn at once, and people making the switch in a production context with tight deadlines and without deep pockets to hire experienced people usually don't fare so well.


Event sourcing is really good for building occasionally connected applications (e.g. a phone app). It makes conflict resolution so much easier than, say, syncing two relational databases.

Basically it follows the git model - distributed systems are each their own source of truth, which allows for (relatively) simple syncing of state. When two distributed systems conflict, well... you need to write some kind of merge algorithm that may or may not ask for user input. But at least the accidental complexity is significantly reduced.


I often find myself wishing for some sort of small-scale durable log, so that this pattern would be easy to implement in a small app backend without standing up something like Kafka.


I recently refactored a small set of services to use events instead of depending directly on each other. I initially had Kafka in mind, but it would've been absolute overkill. A "simple Kafka" would definitely be nice for these cases.

I ended up using Redis Streams [0], which was good enough for my small-scale need. We already had Redis in our stack too, so it was a very simple integration.

[0] https://redis.io/topics/streams-intro


Hi there, I started a company specifically centered around event sourcing - it's a SaaS platform for capturing all of your events, granularly searching them, and then replaying them to whatever destination you like.

I'd be happy to walk you through the platform - there's no lock-in, since we don't require the use of any SDKs; just run an event-relaying container which will pipe all of your events to our stores.

One big piece is that our platform is message-bus agnostic, meaning we are able to hook into any message bus, be it kafka, rabbit, nats, etc.; the same goes for replays - we can replay into any destination.

Check it out: https://batch.sh

The relayer is open source - https://github.com/batchcorp/plumber - if anything, the relayer can also be used on its own for working with message buses, which could improve your dev workflow for reading/writing messages, etc.


Check out MessageDB https://github.com/message-db/message-db. It's basically Postgres and can be installed into any Postgres DB.


For a small app, you may simply create an `events` table in Postgres, with a JSON field storing the payload.

It works pretty well this way; not everything needs to be streamed.
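
For illustration, a minimal version of that with node-postgres (the schema is just a sketch):

    import { Pool } from "pg";

    const pool = new Pool();

    const ddl = `
      CREATE TABLE IF NOT EXISTS events (
        id         bigserial PRIMARY KEY,
        stream_id  text        NOT NULL,
        type       text        NOT NULL,
        payload    jsonb       NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now()
      )`;

    async function append(streamId: string, type: string, payload: object) {
      await pool.query(
        "INSERT INTO events (stream_id, type, payload) VALUES ($1, $2, $3)",
        [streamId, type, JSON.stringify(payload)]
      );
    }

    async function readStream(streamId: string) {
      const { rows } = await pool.query(
        "SELECT type, payload FROM events WHERE stream_id = $1 ORDER BY id",
        [streamId]
      );
      return rows;
    }

    async function main() {
      await pool.query(ddl);
      await append("user-42", "EmailChanged", { email: "new@example.com" });
      console.log(await readStream("user-42"));
      await pool.end();
    }

    main().catch(console.error);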


Last time I looked into it there weren't that many I could find. There is https://github.com/tikv/tikv, which uses RocksDB with Raft, and there is FASTER: https://github.com/microsoft/FASTER/ .


Styx was built with this kind of use in mind, among other things. If you’re using Go you can even use the storage engine standalone. It’s pretty robust and very fast (millions of fsynced writes/sec).

https://github.com/dataptive/styx


I think this may be exactly what I was hoping for :)


I’ll be more than happy to get your feedback if you test it! Especially potential shortcomings preventing its use in production settings.


What's wrong with EventStore? https://www.eventstore.com/


Nice! Hadn't heard of it, will check it out.


They have a channel in this slack that's pretty good for getting answers to questions:

https://github.com/ddd-cqrs-es/slack-community


Have you considered rabbitmq or nats?


try jetstream


TDD stole much, if not all, of the airtime Event Sourcing should have received.


Interesting to see a lot of positive comments about CQRS + event sourcing. I only had one experience with it and it wasn't good: after about 1.5 years of development it needed to be rewritten, without ever making it to production.

The whole thing was super complex to understand, the event flows that could be triggered made it hard to understand what was going on, and migrations were very difficult to do correctly. Notifying the frontend with "eventual consistency" was pretty terrible too.


The adage “when you have a hammer, everything starts to look like a nail” comes to mind.

Event sourcing is a very powerful concept, but it’s very easy to use inappropriately. If you can’t map your business/domain problem very easily into the paradigm, it is worth pausing and reconsidering. Don’t get stuck maintaining a system you never should have built in the first place.


I recently built https://github.com/siriusastrebe/jsynchronous which uses event sourcing to synchronize javascript variables on the server with connected browsers.

You can also replay states using a special “rewind” mode, a core advantage of event sourcing.


This article[0] by Arkwright continues where Martin Fowler left off, and does a good job of explaining what Event Sourcing is, complete with illustrations and stick-men.

[0] https://arkwright.github.io/event-sourcing.html


I'm experimenting with this for client-server design in game development, to ensure that the clients are in sync with the server: since the only way to change the game state is through events, I (as a developer) am unable to forget to implement some notification for something that happens. This also gets me match replays as an extra.

I came across the Update Monad [1] while researching how to use Event Sourcing in functional languages: it basically has the same guarantees of only changing the state through recorded events. Beware though, as it can be quite slow if implemented naively.

[1]: https://chrispenner.ca/posts/update-monad



Was this time though...


The biggest problem I've encountered in places that adopt Event Sourcing is using it as a hammer. Not every piece of information needs to be event sourced, and neither is it a good fit for every use case.


This is the direction I want to take our product. Running RCA (root cause analysis) on one of our transactions is really difficult because we only ever store the current state of affairs.

I am still struggling with the grain at which I would want to source events. The engineer in me says it's safest to go low-level and source key-value changes, so it doesn't matter if the biz model changes. But this would reduce fidelity on the other side in terms of scanning for specific events of importance (e.g. SetKeyValue vs SetCustomerEmail).


I would argue that there is no such thing as "the business model". There may be a projected rolled up summary state... but that's only one possible view out of infinitely many. For example, one possible "business model" could be "customers who have changed their emails in the past month". You shouldn't favor any particular view as being the favored "business model"... unless you're doing snapshots, in which case everything goes out the window.

Basically, I vote for `SetCustomerEmail`.


Events have already happened. The idiomatic name would be `CustomerEmailSet` or perhaps just `EmailSet` or `EmailChanged`.


From my experience, definitely go with the second approach. Try to spend a lot of time up front defining the events with domain experts. Designing events that map well to the domain in the first place is the hardest part, but also the most important criterion for success with event sourcing. Usually that means very specific event types. A simplified example from my career: an event EquityPurchased { Portfolio, Symbol, Quantity, Price } can be used when the customer says: "Our portfolio should calculate cost FIFO rather than average". Very specific events are also easier to version, because often something in the past version has been implicit. In the example the customer may say: "We are going to purchase some equities on the Oslo Stock Exchange; we didn't really care about currency and currency rates since we only bought US stocks before." So now we can make a new version of EquityPurchased { Portfolio, Symbol, Quantity, Price, Currency = "USD", ExchRate = 1 }.
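
In code, that kind of versioning often ends up as a small "upcast" step when reading old events back - a sketch following the example above:

    // Version 1, from before currency mattered.
    type EquityPurchasedV1 = {
      portfolio: string;
      symbol: string;
      quantity: number;
      price: number;
    };

    // Version 2 makes the previously implicit fields explicit.
    type EquityPurchasedV2 = EquityPurchasedV1 & {
      currency: string;
      exchRate: number;
    };

    // Upcast on read: old events get the defaults that were implicit at the time.
    function upcast(e: EquityPurchasedV1 | EquityPurchasedV2): EquityPurchasedV2 {
      return "currency" in e ? e : { ...e, currency: "USD", exchRate: 1 };
    }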


If your events are well defined, you should be able to understand what happened by looking only at the event names. So SetKeyValue is not good here: "what key? what does it mean?". CustomerEmailSet or CustomerEmailChanged clearly describes what happened with the customer.


> If your events are well defined

This is clearly the key to the whole picture.

Reading through this and other comments this morning, I am starting to develop some concrete notions.

Seems like I simply need to build events for each unique type of fact that could exist in the domain. The verbosity of this seems extreme, but also makes a ton of sense.

The thought in my mind right now - If you combine event sourcing with a model in 6NF, the mapping should be trivial. Every fact table has its own series of events pertaining to it (Create/Update/Delete).

This also seems elegant - If your core data representation is inherently linear, do you care that you have this degree of normalization going on? You have to reconstruct some reasonable in-memory representation from the log anyways, so your live in-memory instance could be ~3NF while it ultimately transacts with a 6NF event log on disk. This also plays nicely with backwards compatibility and data migrations.


Start with very granular events. They can be aggregated later if needed, but the reverse isn't possible.


Isn't it the other way around? If you have an “AB” event, it's fairly trivial to go through the log and split it into “A then B”. If you have “A then C then B” for whatever reason and then realize that you need to aggregate “A” and “B”, that is much harder.


How do people handle versioning events, like adding/removing/changing properties? It seems like your application would always need to be aware of all previous event versions to be able to replay all events.


Where I work (meetdandy.com, digital dental lab), we do a combination of:

For small changes like adding a field, we make the property on the underlying event optional, but require it in the event dispatch function. That allows old events to be missing the field, but type-checking still requires it in the codebase going forward. It does require a special case to handle the missing field, however.

For larger changes to events, we do migrations. We store events in Postgres, so we just use TypeORM migrations. I haven’t had to do one of these myself, but I’ve heard secondhand that it’s not too bad. We don’t have very many of these, regardless.
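
Roughly, the "optional on the stored event, required at dispatch" trick looks like this in TypeScript (names are made up, not our actual code):

    // Stored event: the new field is optional so historical events still parse.
    type OrderShipped = {
      orderId: string;
      carrier?: string; // added later; missing on old events
    };

    // Stub event store for the sketch.
    function appendToStore(event: object): void {
      console.log("append", event);
    }

    // Dispatch function: the field is required for every new event going forward.
    function dispatchOrderShipped(event: Required<OrderShipped>): void {
      appendToStore({ type: "OrderShipped", ...event });
    }

    // Replay still needs a special case for the missing field.
    function applyOrderShipped(event: OrderShipped) {
      const carrier = event.carrier ?? "unknown";
      // ...update the read model using carrier...
    }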


Event Sourcing advocates: how do you deal with cleaning up events, especially with respect to the GDPR?

I inherited an application with 24M+ events and I have no idea how to go about it.


We don't delete events, we just replace sensitive info in events with a specific string marker (by direct update in the DB) and then update projections. In this case, the system remains consistent because all the related aggregates and events are still there. You can safely rebuild all the projections without issues and see these string markers instead of the deleted info.


^ 100% this. Do not delete events. They are your source of truth - you shouldn't really even modify them but stripping out PII is "alright".

Re 24M+ records: create a batch runner that goes through "jobs" to perform stripping/cleaning tasks. To store state (and to organize cleaners), use a distributed store such as etcd - that way you can bookmark where you were at in the cleaning process.


Two paths: encrypt customer data in payloads with a customer-specific key that you can throw away at will; or allow some events to rewrite their history/stream.
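
The first path is often called crypto-shredding. A minimal Node sketch of the idea (key storage and distribution are hand-waved here):

    import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

    // One AES key per customer, kept outside the event store.
    const customerKeys = new Map<string, Buffer>();

    function keyFor(customerId: string): Buffer {
      if (!customerKeys.has(customerId)) customerKeys.set(customerId, randomBytes(32));
      return customerKeys.get(customerId)!;
    }

    function encryptPayload(customerId: string, plaintext: string): string {
      const iv = randomBytes(12);
      const cipher = createCipheriv("aes-256-gcm", keyFor(customerId), iv);
      const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
      return [iv, cipher.getAuthTag(), data].map((b) => b.toString("base64")).join(".");
    }

    function decryptPayload(customerId: string, blob: string): string {
      const key = customerKeys.get(customerId);
      if (!key) throw new Error("key destroyed: payload is unrecoverable");
      const [iv, tag, data] = blob.split(".").map((p) => Buffer.from(p, "base64"));
      const decipher = createDecipheriv("aes-256-gcm", key, iv);
      decipher.setAuthTag(tag);
      return Buffer.concat([decipher.update(data), decipher.final()]).toString("utf8");
    }

    // "Forgetting" a customer = destroying their key. The events stay immutable,
    // but the personal data inside them can never be read again.
    function forgetCustomer(customerId: string): void {
      customerKeys.delete(customerId);
    }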


Not an advocate, but you just run a script to clean the data (as ordinary events).

It's not really different than handling regular backups. You don't wipe them, just guard them well until they expire.


Does it comply when I can't see the data but it is still there?


In Kafka, you can use log-compacted topics, which only keep the last snapshot of a key, and allows deletions via tombstones.
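
For example, with kafkajs a deletion is just a message with a null value (a "tombstone") for the key you want gone - this sketch assumes a topic configured with cleanup.policy=compact:

    import { Kafka } from "kafkajs";

    const kafka = new Kafka({ clientId: "gdpr-cleaner", brokers: ["localhost:9092"] });
    const producer = kafka.producer();

    // Once the topic is compacted, every earlier record with this key is dropped.
    async function forget(userId: string) {
      await producer.connect();
      await producer.send({
        topic: "user-events",
        messages: [{ key: userId, value: null }],
      });
      await producer.disconnect();
    }

    forget("user-123").catch(console.error);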


(2005)


Added. Thanks!



