Event Sourcing made Simple (kickstarter.engineering)
197 points by based2 9 months ago | 40 comments

If you're already using React/Redux, there's one really nice pattern that makes event sourcing easy to implement. I learned it from Google's boardgame.io project[1].

Basically, Redux is more or less already doing event sourcing on the client. You have actions ("events") that tell your global state what the next state should be, and replaying the history of all actions will deterministically get you to your current state. The only thing you need to do is persist that history of actions in your state as well (which is already commonly done for implementing undo and time travel).
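To make the replay idea concrete, here is a minimal sketch in Python rather than JavaScript. The `Action` shape, the to-do reducer, and the `Store` class are all hypothetical and only mimic the Redux pattern; none of it is taken from Redux or boardgame.io.

```python
from dataclasses import dataclass, field
from functools import reduce


# Hypothetical action shape for a to-do list (an assumption for illustration).
@dataclass(frozen=True)
class Action:
    type: str
    payload: str = ""


def reducer(todos: tuple, action: Action) -> tuple:
    """Pure function: (state, action) -> next state."""
    if action.type == "ADD_TODO":
        return todos + (action.payload,)
    if action.type == "REMOVE_TODO":
        return tuple(t for t in todos if t != action.payload)
    return todos  # unknown actions leave the state unchanged


@dataclass
class Store:
    todos: tuple = ()
    history: list = field(default_factory=list)  # the persisted action log

    def dispatch(self, action: Action) -> None:
        self.todos = reducer(self.todos, action)
        self.history.append(action)


def replay(history) -> tuple:
    """Rebuild state from scratch by folding the reducer over the log."""
    return reduce(reducer, history, ())


store = Store()
store.dispatch(Action("ADD_TODO", "write docs"))
store.dispatch(Action("ADD_TODO", "ship"))
store.dispatch(Action("REMOVE_TODO", "write docs"))
```

Replaying `store.history` gives back the same state as `store.todos`, and replaying a prefix of the history gives the state at that earlier point, which is all that undo and time travel need.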

Then all that is left is to make your server use these same events too! This means whenever you save to the server, rather than doing GraphQL/REST-style per-resource updates, you just sync any unsaved Redux actions from the client. The server will save these actions and optionally replay them to get a snapshot of the current state (or not - that's just an optimization. The queue of actions is the first-class citizen now).

Afterwards, realtime collaboration, undo/time travel, history/audit logging, etc. all come for free.

This is a really nice way to keep the design of event shapes consistent between client and server.

[1] https://news.ycombinator.com/item?id=15946425 or https://github.com/google/boardgame.io.

(I am making it sound a little simpler than it is. There are a lot of details to get right with state design, tagging actions as relevant to be persisted, multiplayer conflict resolution, app versioning issues, event queue compaction, server-side permissions validation, etc. But to get a prototype up and running quickly, what I said above will work. Boardgame.io didn't worry about those details, but it's still a nice codebase to study.)

A month ago I was trying to implement something Redux-like on the backend. Check it out [1]. It's a PoC: events are stored in Mongo and a reducer applies them to produce state (a projection). I added another component called a decider, which has access to the state and decides for each event whether to process it or not! It could be part of the reducer, but I thought it was nice to split that out!
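A rough sketch of that decider/reducer split, assuming a hypothetical bank-account domain with in-memory events. The linked repository is Node with Mongo and may well model things differently; this only illustrates the idea of gating the reducer with a decider.

```python
from functools import reduce


def decider(state: dict, event: dict) -> bool:
    """Look at the current state and decide whether the event may be applied."""
    if event["type"] == "withdrawn":
        return state["balance"] >= event["amount"]  # refuse overdrafts
    return event["type"] == "deposited"  # anything unknown is rejected


def reducer(state: dict, event: dict) -> dict:
    """Apply an accepted event to produce the next projection."""
    sign = 1 if event["type"] == "deposited" else -1
    return {"balance": state["balance"] + sign * event["amount"]}


def project(events: list) -> dict:
    """Fold over the events, skipping the ones the decider rejects."""
    def step(state, event):
        return reducer(state, event) if decider(state, event) else state
    return reduce(step, events, {"balance": 0})


events = [
    {"type": "deposited", "amount": 50},
    {"type": "withdrawn", "amount": 80},  # rejected: would overdraw
    {"type": "withdrawn", "amount": 20},
]
state = project(events)
```

Keeping the decider separate means the reducer can stay a dumb, total function over accepted events, while the business rules live in one place.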

The next step for improvement would be to split the event-receiving side from the event-displaying side to get the CQRS pattern.

I didn’t write any docs; it was just playing around! In index.js there are some routes that store events and some that show you a projection of the state.

I really hope event sourcing will get more traction. The idea that you defer building your state model is in my opinion incredibly strong. But there are some open questions that I find hard to answer, like instant feedback to the client, replayability of side effects, ensuring ordering across multiple services, etc.!

[1] https://github.com/MarkusPfundstein/event-sourcing-node

An alternative to redux is mobx-state-tree, which has something along these lines as well. Very cool. https://github.com/mobxjs/mobx-state-tree/blob/master/packag...

Side effects and non-determinism can break the relationship between action ("command") and event. You'd need to take care with things such as random number generation, to use a PRNG whose call sequence is determined only by the actions, and whose seeding is an action itself.
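One way to sketch this, assuming a hypothetical dice game: the PRNG state is stored inside the game state, seeding is itself an action, and every roll advances the stored generator. The action names are made up, and the sketch assumes a `SEED` action always comes before the first `ROLL`.

```python
import random


def reducer(state: dict, action: dict) -> dict:
    """Dice-game reducer; the PRNG state lives inside the game state."""
    if action["type"] == "SEED":
        rng = random.Random(action["seed"])
        return {**state, "rng_state": rng.getstate()}
    if action["type"] == "ROLL":
        rng = random.Random()
        rng.setstate(state["rng_state"])  # resume exactly where the last roll stopped
        roll = rng.randint(1, 6)
        return {"rolls": state["rolls"] + [roll], "rng_state": rng.getstate()}
    return state


def replay(actions: list) -> dict:
    state = {"rolls": [], "rng_state": None}
    for action in actions:
        state = reducer(state, action)
    return state


actions = [{"type": "SEED", "seed": 42}] + [{"type": "ROLL"}] * 3
first = replay(actions)
second = replay(actions)
```

Both replays produce the same rolls because the only source of randomness is the seed carried in the log.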

Side effects caused by action handlers are also something to beware of - don't launch the missiles when you're replaying the past.

There are certain ideas which are virulent that people seem to catch, like an incurable virus. One of them I believe is Event Sourcing.

I was hired into one $100m+ greenfield project (silly money by a silly investor) where the novice technical management wanted everything done "perfect" - no room for compromise.

Project failed, and 600 people that were hired over the 18 months were then all fired.

Every piece of data was to be event-sourced, every data transaction on a global message queue (Kafka), the ultimate pure event sourcing model.

The big expenses with event sourcing are up-front design of protocols, backward compatibility of protocol changes (if you got it wrong), and problematic structuring of coordinated queries (using coordinators.)

Trivial stuff like serialisation starts to become 30%+ of your development cost.

The simplest example we had, for instance, is the creation of a new user on sign up.

But the Authentication module was not the same as the personalisation module.

So we had authentication creation, then the personalisation module needed to learn about the user and create the personalisation defaults, as a coordinated, distributed query.

All with Kafka. The next thing here is you discover you need RPC and not Pub/Sub. So you end up seeing confused developers bending the architecture out of shape, doing RPC over pub/sub - which looks like a dog's dinner.

My general view of event-sourcing is "do it in the small" for specific problems where you need it.

If you think it's the silver bullet, then please read Fred Brooks again, and stop pushing your golden solution to all things on us all, please.

BTW, you don't need a fancy framework to keep time-versioned histories in a relational database (I realise RDBMS doesn't scale for all systems, but it does for many.) You just need to design your schema well, and preferably, make your writes go through stored procedures which transactionally maintain history and current schemas.
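A small sketch of that idea using Python's built-in `sqlite3`. SQLite has no stored procedures, so a plain function wrapping one transaction stands in for one here; the table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_current (id TEXT PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE account_history (
        id TEXT NOT NULL,
        email TEXT NOT NULL,
        version INTEGER NOT NULL,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (id, version)
    );
""")


def set_email(conn: sqlite3.Connection, account_id: str, email: str) -> None:
    """Stand-in for a stored procedure: current and history rows commit together."""
    with conn:  # one transaction; both writes roll back if either fails
        (version,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) + 1 FROM account_history WHERE id = ?",
            (account_id,),
        ).fetchone()
        conn.execute(
            "INSERT OR REPLACE INTO account_current (id, email) VALUES (?, ?)",
            (account_id, email),
        )
        conn.execute(
            "INSERT INTO account_history (id, email, version) VALUES (?, ?, ?)",
            (account_id, email, version),
        )


set_email(conn, "a1", "old@example.com")
set_email(conn, "a1", "new@example.com")

current = conn.execute("SELECT email FROM account_current WHERE id = 'a1'").fetchone()[0]
history = [
    row[0]
    for row in conn.execute(
        "SELECT email FROM account_history WHERE id = 'a1' ORDER BY version"
    )
]
```

Reads of the current state stay a simple primary-key lookup, while the history table answers "what was it before?" without any event replay.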

I've had a similar experience at a smaller scale with kafka and something of a CQRS architecture.

I can vouch that at least 30% of development cost went into things that are otherwise very trivial. The risk of defining the wrong domain boundaries early on has accumulating costs, and outside of specific use cases the benefits do not outweigh the disadvantages.

Even things such as business intelligence need to have knowledge of events and how they reduce to comprehensible state. For a startup this can turn into a disaster of dealing with technology instead of focusing on solving problems.

Lastly, GDPR really changes things when you need to be able to anonymise or delete data. It doesn't play too well with the event sourcing model.

Something I find confusing is the example of a "reactor" here. One example causes a side effect ("Send email") and another creates an event. Is the idea here that you're supposed to completely cut out "reactors" when you play back events? Because, with these two examples, you'd shoot off a stale email, and then create a duplicate "showed promo" event that goes... exactly where when you're replaying?

It almost seems obvious to just cut out reactors when you're replaying, but what if you introduce an almost invisible dependency on a reactor's results that makes the playback of application state NON-deterministic? The old joke "then just don't do that" comes to mind, but it would be nicer if the rules about what's allowed to do what, and more detail on what to avoid, were spelled out here.

It almost seems like allowing reactors to generate new events is a mistake! How do you keep that from becoming an unmanageable mess?

I think I understand your problem. When applying an event, how do you know whether it’s an event that just happened now, or something you’re playing back from months ago?

One solution I implemented was to log all of the ‘reactions’. Then when reacting, I checked whether the reaction had already happened to prevent side effects. For example:

    if !reactionLog.happened(reaction)

Admittedly, this was a system used by only 200 people and performance wasn’t too much of a concern. Also, if ‘sendEmail()’ has side effects, they won’t happen.
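A fuller sketch of the reaction-log idea in Python, under some loud assumptions: the log is an in-memory set (a real one would have to be durable), `send_email` just appends to a list, and the reaction id format is made up.

```python
class ReactionLog:
    """Remembers which reactions already ran (in memory here, for illustration)."""

    def __init__(self):
        self._done = set()

    def happened(self, reaction_id: str) -> bool:
        return reaction_id in self._done

    def record(self, reaction_id: str) -> None:
        self._done.add(reaction_id)


sent_emails = []  # stands in for a real mail service


def send_email(to: str) -> None:
    sent_emails.append(to)


def react(event: dict, log: ReactionLog) -> None:
    # Hypothetical id scheme: one welcome email per event id.
    reaction_id = f"welcome-email:{event['id']}"
    if not log.happened(reaction_id):
        send_email(event["email"])
        log.record(reaction_id)


log = ReactionLog()
events = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]
for event in events:
    react(event, log)
for event in events:  # replaying the same events later
    react(event, log)
```

After the second loop each address has still been emailed exactly once, because the replayed reactions are found in the log and skipped.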

We used to differentiate between projections and event handlers

Projections only create/update/delete views and are idempotent.

Event handlers could do side effecty things like emails or generate another command that goes back into the system.

While playing back events, we just didn't hook up the event handlers at all.

Worked beautifully and it was very liberating to be able to wipe out your entire read side and recreate it either exactly the same way or tweak it to handle evolving requirements.
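Roughly, that split might look like the sketch below. The event names, the `dispatch` helper, and the in-memory view are assumptions for illustration, not the system described above.

```python
import copy


def dispatch(event: dict, projections, handlers=()) -> None:
    """Projections always run; handlers only run if they were hooked up."""
    for project in projections:
        project(event)
    for handle in handlers:
        handle(event)


users_view = {}  # the read side


def project_users(event: dict) -> None:
    """Idempotent projection: applying the same event twice gives the same view."""
    if event["type"] == "user_registered":
        users_view[event["user_id"]] = {"name": event["name"]}
    elif event["type"] == "user_renamed":
        users_view[event["user_id"]]["name"] = event["name"]


sent_emails = []


def welcome_mailer(event: dict) -> None:
    """Side-effecting handler; must not run during replay."""
    if event["type"] == "user_registered":
        sent_emails.append(event["user_id"])


event_log = [
    {"type": "user_registered", "user_id": "u1", "name": "Ann"},
    {"type": "user_renamed", "user_id": "u1", "name": "Anna"},
]

# Live processing: projections and handlers are both hooked up.
for event in event_log:
    dispatch(event, [project_users], [welcome_mailer])
live_view = copy.deepcopy(users_view)

# Wipe the read side and rebuild it from the log, with no handlers attached.
users_view.clear()
for event in event_log:
    dispatch(event, [project_users])
```

The rebuilt view matches the live one and no second email goes out, since the mailer was never wired in for the replay.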

Presumably, for this to work “event handlers” must not modify views, though they could also produce events (reflecting the fact of completion of their own side effects, if that needs to be noted) of their own that could drive projections or other event handlers.

I deal with it in the following way: External calls are only triggered on Commands, never on events.

In order to achieve this, I use a process manager.

Say the event "User Signed Up" is dispatched. A process manager will then execute the "Send Welcome Email" command. The aggregate checks whether the user aggregate root is in a state where a welcome email should be sent (something like a flag `welcome_email_sent`). Depending on the result from your email provider, the event "WelcomeEmailSuccessfullySent" or "WelcomeEmailNotSent" is dispatched.

It is possible to react to the erroneous event with another process manager and retry.
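Here is how I would sketch that flow, with everything in memory. The class and function names are hypothetical, the "email provider" is a fake that fails on its first call, and the retry is unbounded only because the fake is guaranteed to succeed on the second attempt.

```python
from dataclasses import dataclass


@dataclass
class UserAggregate:
    user_id: str
    welcome_email_sent: bool = False

    def handle_send_welcome_email(self, provider) -> list:
        """Command handler: the only place the external call happens."""
        if self.welcome_email_sent:
            return []  # nothing to do, so no external call
        ok = provider(self.user_id)
        event_type = "WelcomeEmailSuccessfullySent" if ok else "WelcomeEmailNotSent"
        return [{"type": event_type, "user_id": self.user_id}]

    def apply(self, event: dict) -> None:
        """Event application: pure state change, safe to replay."""
        if event["type"] == "WelcomeEmailSuccessfullySent":
            self.welcome_email_sent = True


def process_manager(event: dict, aggregates: dict, provider) -> list:
    """Turn events into commands; a failed send triggers the command again."""
    if event["type"] in ("UserSignedUp", "WelcomeEmailNotSent"):
        aggregate = aggregates[event["user_id"]]
        new_events = aggregate.handle_send_welcome_email(provider)
        for new_event in new_events:
            aggregate.apply(new_event)
        return new_events
    return []


provider_calls = []


def flaky_provider(user_id: str) -> bool:
    provider_calls.append(user_id)
    return len(provider_calls) > 1  # first call fails, later calls succeed


aggregates = {"u1": UserAggregate("u1")}
event_log = [{"type": "UserSignedUp", "user_id": "u1"}]
queue = list(event_log)
while queue:
    produced = process_manager(queue.pop(0), aggregates, flaky_provider)
    event_log.extend(produced)
    queue.extend(produced)
```

Rebuilding the aggregate later only needs `apply` over the stored events, so a replay never touches the email provider.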

The event sourcing pattern (or something akin to it) can also be applied on the data structure level to create very resilient and flexible CRDTs: http://archagon.net/blog/2018/03/24/data-laced-with-history/

It's an amazing technique for implementing eventual consistency!

Well now I guess I have even more to read on the train tomorrow.

I was very bullish on event sourcing, based on my experiences with databases that struggled with historical queries. Since then I've recognised that they come with a lot of effort and don't play nicely with existing systems.

I've also learned that, as usual, I didn't know enough and that others had gotten to this problem and thought about it more thoroughly than I ever could.

> These questions could be answered in seconds if we had a full history.

I agree, but that does not necessarily require an event sourcing design. I think the better fit here is bitemporal databases.

A bitemporal database can answer pretty much any query about what is currently true, what was true and what will be true in future; about the story of a sequence or the existence of a sequence or of a point in time; about not only what "is" or "was" true but when you believed it to be true.

So I can not only ask "what was the total value of the cart last thursday", I can ask "what did we believe the total value was, before we applied a correction?"
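A toy illustration of those two time axes, assuming a single cart and a heavily simplified model: each fact carries only the date it became valid and the date it was recorded, whereas a real bitemporal schema would track full intervals on both axes. The dates and amounts are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    cart_total: int     # in cents
    valid_from: date    # when this total became true in the real world
    recorded_on: date   # when the system learned (or corrected) it


facts = [
    Fact(5000, valid_from=date(2018, 5, 10), recorded_on=date(2018, 5, 10)),
    # Correction recorded later: the total on the 10th was really 45.00.
    Fact(4500, valid_from=date(2018, 5, 10), recorded_on=date(2018, 5, 14)),
]


def total_as_of(valid_at: date, known_at: date):
    """What did we believe on `known_at` about the cart total at `valid_at`?"""
    candidates = [
        f for f in facts
        if f.valid_from <= valid_at and f.recorded_on <= known_at
    ]
    if not candidates:
        return None
    latest = max(candidates, key=lambda f: (f.valid_from, f.recorded_on))
    return latest.cart_total


believed_before_correction = total_as_of(date(2018, 5, 10), known_at=date(2018, 5, 12))
believed_after_correction = total_as_of(date(2018, 5, 10), known_at=date(2018, 5, 20))
```

The same valid date yields two different answers depending on when you ask, which is exactly the "what did we believe before the correction" question.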

I've recently given this general area of systems a lot of thought, trying to square away several different schools of thought around data. Principally stream processing in the style of Akidau/Chernyak/Lax, bitemporal databases as described by Snodgrass and dimensional modelling as described by Kimball. Plus some ideas pinched from Concourse and a hasty skimming of Enterprise Integration Patterns by Hohpe, Woolf et al.

As it happens I will be trying to pitch some of this lunatic handwaving to colleagues this coming week. My basic goal is that you should have pluggability without needing to rewrite the universe. Streaming without giving up tables. Tables without having to convert them always to streams. Different views of data treated on their own terms, in their own form, without elevating one or the other to being The One True Way Of Managing Data.

Alternatively, I'm wrong.

> Since then I've recognised that they come with a lot of effort and don't play nicely with existing systems.

I'm curious if you think this is because the primitives don't exist (it seems everyone doing CQRS/ES rolls their own) or if it is an "essential complexity" of doing CQRS/ES?

I'm leaning more towards the former and less towards the latter, although I'm only a month down the road of implementing a CQRS/ES system.

A mix. Accidental complexity plays a role, it's just hard to boil an ancient ocean of data.

But I have also come to think that streaming or eventing systems have come to overstate the argument that the stream is the "true" system. It is and it isn't.

The analogy a lot of folk hit on before I did is to calculus, including streaming folks. But I think it actually proves that in "streams and tables", neither is the truest of them all.

I can take the instantaneous velocity of an object, or I can take it over a span of time, or I can calculate its acceleration, or perhaps the distance that has been traveled. These are all functions that can be reached from each other by differentiation or integration. But none of them is the high lord master formula. They are just different representations that make sense in different cases.
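The discrete version of that analogy fits in a few lines. This is only a sketch with made-up numbers: a stream of balance changes is "integrated" into a running balance, and the running balance is "differentiated" back into the changes.

```python
from itertools import accumulate

# Stream view: individual balance changes.
deltas = [100, -30, 45, -15]

# Table view over time: the running balance ("integration" of the stream).
balances = list(accumulate(deltas))

# Back to the stream: differences between consecutive balances ("differentiation").
recovered = [after - before for before, after in zip([0] + balances, balances)]
```

Given a known starting point (zero here), either representation can be rebuilt from the other, so neither is more "true" than the other.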

Event sourcing has worked very well for my former employer, Jet.com[0]. One of their engineers recently gave a talk about functional programming and event sourcing at Microsoft Build which was pretty good[1].

[0] https://medium.com/@eulerfx/scaling-event-sourcing-at-jet-9c...

[1] https://www.youtube.com/watch?v=dSCzCaiWgLM

I think more and more things will be developed like this instead of just writing CRUD apps. It kind of makes sense since storage is super cheap, lambdas and other event-driven systems are getting popular, and it fits really nicely with start-ups being able to pivot existing data to a new schema/database/analytics.

I've read before this is not recommended by people who did it (besides people who are consulting firms or promote Kafka). Is there a reason for this anyone can chime in with?

Event sourcing and CQRS as a whole leads to horrific complexity quickly, especially if you violate certain constraints.

How do you synchronize multiple events? How do you handle partial system outages? What is the retry strategy for failed events and how do we handle inconsistent data?

Then comes the fact that you've more than doubled your data size since you need your source of truth and a copy on each read view - that gets expensive quickly.

I agree with Fowler - CQRS/ES is probably too complicated and you should avoid it unless you have a well described bounded domain that fits the model.

> Event sourcing and CQRS as a whole leads to horrific complexity quickly, especially if you violate certain constraints.

This has been my lived experience with it, as well. I find that, in retrospect, the complexity was really not worth it.

And now, with regulation like GDPR, how do you handle things like the right to be forgotten in an ES/CQRS architecture (especially if it's a legacy app)? The approach we implemented at work turned out to be horrifically complex, to borrow your words.

"How do you synchronize multiple events? How do you handle partial system outages? What is the retry strategy for failed events and how do we handle inconsistent data?"

Funny thing is that the traditional stateful model handles these scenarios horrendously as well. Probably more so, since you have no history to reconcile with.

In other words, you should avoid using a model aimed at simplifying a domain model if you don't have one? Sounds fair...

The canonical description of the domain model pattern (from Fowler, also) calls for it to be _considered_ in a case where there are "complex and ever-changing business rules". He goes on to say that "if all you have is some sums and null checks, a different transaction processing pattern is more appropriate".

I have seen a lot of event sourced systems, and the vast majority of those that fail fall into the "null checks and trivial sums" case rather than the former.

If you can store events in a sql database and use transactions and use idempotent APIs, a lot of those issues go away.
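As a sketch of what that can look like, using Python's built-in `sqlite3` and an invented schema: each append carries a client-supplied idempotency key, the key column is `UNIQUE`, and the insert runs in a transaction, so a retried request cannot write the event twice.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,   -- total order of events
        idempotency_key TEXT NOT NULL UNIQUE,    -- one row per client request
        type TEXT NOT NULL,
        payload TEXT NOT NULL
    )
""")


def append_event(conn, idempotency_key: str, event_type: str, payload: dict) -> bool:
    """Append once per key; a retry with the same key is a no-op.

    Returns True if the event was stored, False if it was a duplicate.
    """
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO events (idempotency_key, type, payload) "
            "VALUES (?, ?, ?)",
            (idempotency_key, event_type, json.dumps(payload)),
        )
        return cursor.rowcount == 1


first_attempt = append_event(conn, "req-1", "item_added", {"sku": "A"})
retry = append_event(conn, "req-1", "item_added", {"sku": "A"})  # client retried
stored = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The database's own ordering and uniqueness guarantees take care of the "what if the client retries?" question that otherwise needs custom deduplication.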

If you can do those things, you don't need cqrs to begin with.

> How do you synchronize multiple events? How do you handle partial system outages? What is the retry strategy for failed events and how do we handle inconsistent data?

Make an event sourcing system single threaded, and the problems of synchronisation don't exist. Make a "traditional" RDBMS distributed, and the problems of synchronisation exist.

If you weren't building a distributed (or parallelised) system to start with, why introduce it with event sourcing? If you were, how were you going to solve all those problems with a traditional RDBMS approach?

The solutions are more or less the same.

Perhaps "retry strategy for failed events" is new - how would you have handled a replica failing in a distributed RDBMS application?

The data size issue in my experience pales in comparison to the problem of having the business ask questions you can't answer because you threw away data. How high is your transactional rate, anyway? Views shouldn't be a data size concern, as they should only contain the pertinent data. In many cases you should be able to hold views completely transiently in memory. It's relatively rare that a view is expensive to compute from an ordered history.

> I agree with Fowler - CQRS/ES is probably too complicated and you should avoid it unless you have a well described bounded domain that fits the model.

Given the model fits any domain in which things happen, this might not be the best way to determine if ES is appropriate - it's probably more along the lines of how important history is, how important flexibility for future use of data is, and how important time to market is. If we give up security to get a product out sooner, we would pretty quickly also give up a total history of the system, too.

There's also more than one way to skin this cat - you can use transaction scripting and log transactions; you can use database table level auditing; you can have a high level business intent audit log. Probably others. They all have complexities and drawbacks.

I don't personally find CQRS/ES particularly complicated, having worked with it a few times. I don't apply it to every project, the same way I don't apply any of the other history mechanisms to every project, but I'll reach for it over those other mechanisms, because I prefer the costs of ES over the costs of those mechanisms.

I discovered the promises of CQRS/ES through Martin Fowler. I didn’t realize he had discounted it to the extent you’ve said, but I’d like to read his thoughts. Do you have a link to the article, or is it in a book?

They are probably referring to this: https://www.martinfowler.com/bliki/CQRS.html

I worked a lot with CQRS/ES before and looked into it when building the tech stack of a new startup. My takeaway is that the complexity it brings would simply get in the way for a new system that doesn't have a lot of complexity.

You will simply get things done faster without it, and can always introduce it later to progressively phase out components with lots of debt.

I would love to use Event Sourcing everywhere, if storage was cheap. People, including in this thread, usually say storage is cheap: it is not cheap. Look at the prices for S3, storing _all the events every user performs on an app in the entire history_ is too much even for Google and its sampling of Google Analytics events.

I agree that SSD storage is not cheap. I think that you can offload 'old' data to S3 or something.

Event sourcing made simple. It sure is simple when you don't care about who can do what.

> This post has been created by A who is a temp. So it must be approved by an editor first. How do I check those things before validating my command? Let's forget about that, put a link (maybe) to the saga pattern and consider it done.

Some things come for free but a lot of easy things become a lot more complex. But this will be hidden to sell the new silver bullet. What's funny is they often use git as an example: remember you rarely use git alone when you want to do things like code review and handling permissions.

You don’t use event sourcing on its own either. You can use Rails (or Django or Symfony or whatever) to validate the command and add end-user friendly error messages. Then your models should again validate the command as an ‘anti-corruption layer’, this time throwing developer friendly error messages.

That said, I completely agree with your assessment. Event sourcing makes simple things a lot harder. It should be used sparingly, only for those situations when the history is genuinely useful.

This is a really interesting concept, which does make sense. The aggregates become pretty important in this kind of setup. I'm pretty familiar with Splunk, and most of these concepts align with the capabilities of Splunk, although Splunk is not usually used in these sorts of use-cases.

Has anyone here implemented a system like this based on Splunk, and what was your experience while doing this?

I like this concept, except I see Calculators as just a special case of Reactors. Generally, Reactors are all about "when this condition is true, do that"; Calculators are exactly the same, only with a specific definition of that (update some Aggregates).

See cloud event spec https://github.com/cloudevents/spec

Would be nice if the events here were compatible

The CloudEvents spec is, at the moment, basically an envelope format.

Yes, with auth, so you can connect systems in a standard way and reuse cloud emitted events

Don't get me wrong, I want it to be a success, and I think it has a good chance of meeting its goals. I just don't want folks to read into it what isn't there.

"...made simple" might be a bit of an overstatement.
