
Event Sourcing made Simple - based2
https://kickstarter.engineering/event-sourcing-made-simple-4a2625113224
======
arnioxux
If you're already using react/redux, there's one really nice pattern that
makes event sourcing easy to implement. I learned it from Google's
boardgame.io project[1].

Basically redux is more or less already doing event sourcing on the client.
You have actions ("events") that tell your global state what the next state
should be, and replaying the history of all actions will deterministically get
you to your current state. The only thing you need to do is also persist that
history of actions in your state (which is already commonly done for
implementing undo and time travel).

Then all that is left is to make your server use these same events too! This
means that whenever you save to the server, rather than doing GraphQL/REST-style
per-resource updates, you just sync any unsaved redux actions from the client.
The server saves these actions and optionally replays them to get a snapshot
of current state (or doesn't - that's just an optimization; the queue of
actions is the first-class citizen now).
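
Roughly, as a minimal sketch (hypothetical names, plain redux-style TypeScript
rather than boardgame.io's actual API): the reducer folds actions into state,
and the action log is itself part of the state, so any unsaved tail of the log
can be synced to the server.

    // The action log is part of the state, so persisting state persists history.
    type Action = { type: "ADD_ITEM"; item: string } | { type: "CLEAR" };

    interface State {
      items: string[];
      log: Action[]; // full history of actions ("events")
    }

    const initial: State = { items: [], log: [] };

    function reducer(state: State, action: Action): State {
      const next =
        action.type === "ADD_ITEM"
          ? { ...state, items: [...state.items, action.item] }
          : { ...state, items: [] };
      return { ...next, log: [...state.log, action] }; // append to the log
    }

    // Replaying the log deterministically reproduces the current state,
    // on the client or on a server that was sent the same actions.
    const replay = (log: Action[]): State => log.reduce(reducer, initial);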

Afterwards, realtime collaboration, undo/time travel, history/audit logging,
etc. all come for free.

This is a really nice way to keep event design consistent between client and
server.

[1]
[https://news.ycombinator.com/item?id=15946425](https://news.ycombinator.com/item?id=15946425)
or
[https://github.com/google/boardgame.io](https://github.com/google/boardgame.io).

(I am making it sound a little simpler than it is. There are _a lot_ of
details to get right with state design, tagging actions as relevant to be
persisted, multiplayer conflict resolution, app versioning issues, event queue
compaction, server-side permission validation, etc. But to get a prototype up
and running quickly, what I said above will work. Boardgame.io didn't worry
about those details, but it's still a nice codebase to study.)

~~~
codebje
Side effects and non-determinism can break the relationship between action
("command") and event. You'd need to take care with things such as random
number generation: use a PRNG whose call sequence is determined only by the
actions, and whose seeding is itself an action.

Side effects caused by action handlers are also something to beware of -
don't launch the missiles when you're replaying the past.
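
One way to do that, as a sketch (the event names here are made up): the seed
arrives as an event, and the generator state lives in the aggregate, so every
draw is a pure function of the event history.

    // Deterministic randomness: seeding is an event, and the PRNG state is
    // part of the aggregate, so replay reproduces the same rolls.
    type Event = { type: "SEEDED"; seed: number } | { type: "DICE_ROLLED" };

    interface State { rng: number; lastRoll?: number }

    // Tiny linear congruential generator: same state in, same number out.
    const nextRng = (s: number): number => (s * 1664525 + 1013904223) >>> 0;

    function apply(state: State, event: Event): State {
      switch (event.type) {
        case "SEEDED":
          return { rng: event.seed >>> 0 };
        case "DICE_ROLLED": {
          const rng = nextRng(state.rng);
          return { rng, lastRoll: (rng % 6) + 1 };
        }
      }
    }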

~~~
DougBTX
Reminds me of, e.g.:
[https://gafferongames.com/post/deterministic_lockstep/](https://gafferongames.com/post/deterministic_lockstep/)

------
jpz
There are certain virulent ideas that people seem to catch, like an incurable
virus. One of them, I believe, is Event Sourcing.

I was hired into one $100m+ greenfield project (silly money from a silly
investor) where the novice technical management wanted everything done
"perfectly" - no room for compromise.

The project failed, and the 600 people who had been hired over the 18 months
were then all fired.

Every piece of data was to be event-sourced, every data transaction on a
global message queue (Kafka): the ultimate pure event sourcing model.

The big expenses with event sourcing are the up-front design of protocols,
backward compatibility of protocol changes (if you got it wrong), and the
problematic structuring of coordinated queries (using coordinators).

Trivial stuff like serialisation starts to become 30%+ of your development
cost.

The simplest example we had was the creation of a new user on sign-up.

But the Authentication module was not the same as the personalisation module.

So we had the authentication creation; then the personalisation module needed
to learn about the user and create the personalisation defaults, as a
coordinated, distributed query.

All with Kafka. The next thing you discover is that you need RPC and not
pub/sub. So you end up seeing confused developers bending the architecture out
of shape, doing RPC over pub/sub - which looks like a dog's dinner.

My general view of event-sourcing is "do it in the small" for specific
problems where you need it.

If you think it's the silver bullet, then please read Fred Brooks again, and
stop pushing your golden solution to all things on us all, please.

BTW, you don't need a fancy framework to keep time-versioned histories in a
relational database (I realise an RDBMS doesn't scale for all systems, but it
does for many). You just need to design your schema well and, preferably, make
your writes go through stored procedures which transactionally maintain the
history and current schemas.
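
For the flavour of it, a sketch in TypeScript with node-postgres (the schema
is illustrative, and it's driven from application code here rather than a
stored procedure): each write closes the open history row and opens a new one,
in the same transaction that updates the current table.

    import { Pool } from "pg";

    const pool = new Pool();

    // Assumed schema (run once):
    //   CREATE TABLE users_current (id int PRIMARY KEY, email text NOT NULL);
    //   CREATE TABLE users_history (
    //     id int NOT NULL, email text NOT NULL,
    //     valid_from timestamptz NOT NULL, valid_to timestamptz
    //   );

    export async function updateEmail(id: number, email: string): Promise<void> {
      const client = await pool.connect();
      try {
        await client.query("BEGIN");
        // Close the currently-open history row for this user...
        await client.query(
          `UPDATE users_history SET valid_to = now()
             WHERE id = $1 AND valid_to IS NULL`, [id]);
        // ...open a new one...
        await client.query(
          `INSERT INTO users_history (id, email, valid_from, valid_to)
             VALUES ($1, $2, now(), NULL)`, [id, email]);
        // ...and update the current table, all atomically.
        await client.query(
          `UPDATE users_current SET email = $2 WHERE id = $1`, [id, email]);
        await client.query("COMMIT");
      } catch (e) {
        await client.query("ROLLBACK");
        throw e;
      } finally {
        client.release();
      }
    }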

~~~
2color
I've had a similar experience at a smaller scale with Kafka and something of
a CQRS architecture.

I can vouch that at least 30% of development cost went into things that are
otherwise very trivial. Defining the wrong domain boundaries early on has
accumulating costs, and outside of specific use cases the benefits don't
outweigh the disadvantages.

Even things such as business intelligence need to have knowledge of the
events and how they reduce to comprehensible state. For a startup this can
turn into a disaster of dealing with technology instead of focusing on solving
problems.

Lastly, GDPR really changes things when you need to be able to anonymise or
delete data. It doesn't play too well with the event sourcing model.

------
ZenPsycho
Something I find confusing in the examples of a "reactor" here: one causes a
side effect ("send email") and another creates an event. Is the idea that
you're supposed to completely cut out "reactors" when you play back events?
Because with these two examples you'd shoot off a stale email, and then create
a duplicate "showed promo" event that goes... exactly where, when you're
replaying?

It almost seems obvious to just cut out reactors when you're replaying, but
what if you introduce an almost invisible dependency on a reactor's results
that makes the playback of application state non-deterministic? The old joke
"then just don't do that" comes to mind, but it would be a bit nicer if the
rules about what's allowed to do what, and more detail on what to avoid, were
spelled out here.

It almost seems like allowing reactors to generate new events is a mistake!
How do you keep that from becoming an unmanageable mess?

~~~
rraghur
We used to differentiate between projections and event handlers.

Projections only create/update/delete views and are idempotent.

Event handlers could do side-effecty things like sending emails or generating
another command that goes back into the system.

While playing back events, we just didn't hook up the event handlers at all.

It worked beautifully, and it was very liberating to be able to wipe out your
entire read side and recreate it either exactly the same way or tweaked to
handle evolving requirements.
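
In code the split looks roughly like this (hypothetical names): projections
are idempotent view updates, event handlers do the side effects, and the
replay path simply never wires the handlers up.

    type Event = { type: "USER_SIGNED_UP"; userId: string; email: string };

    const views = { users: new Map<string, { email: string }>() };
    const sendWelcomeEmail = (to: string) => console.log(`email -> ${to}`);

    type Projection = (e: Event) => void;   // idempotent: safe to rerun
    type EventHandler = (e: Event) => void; // side-effecty: emails, commands

    const projections: Projection[] = [
      (e) => { views.users.set(e.userId, { email: e.email }); }, // upsert
    ];

    const handlers: EventHandler[] = [
      (e) => sendWelcomeEmail(e.email), // must NOT run during replay
    ];

    function dispatch(e: Event, replaying: boolean): void {
      projections.forEach((p) => p(e));
      if (!replaying) handlers.forEach((h) => h(e)); // replay skips handlers
    }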

~~~
dragonwriter
Presumably, for this to work, “event handlers” must _not_ modify views, though
they could produce events of their own (recording the completion of their side
effects, if that needs to be noted) that could drive projections or other
event handlers.

------
archagon
The event sourcing pattern (or something akin to it) can also be applied on
the data structure level to create very resilient and flexible CRDTs:
[http://archagon.net/blog/2018/03/24/data-laced-with-history/](http://archagon.net/blog/2018/03/24/data-laced-with-history/)

It's an amazing technique for implementing eventual consistency!

~~~
jacques_chester
Well now I guess I have even more to read on the train tomorrow.

------
jacques_chester
I was very bullish on event sourcing, based on my experiences with databases
that struggled with historical queries. Since then I've recognised that
event-sourced systems come with a lot of effort and don't play nicely with
existing systems.

I've also learned that, as usual, I didn't know enough and that others had
gotten to this problem and thought about it more thoroughly than I ever could.

> _These questions could be answered in seconds if we had a full history._

I agree, but that does not necessarily _require_ an event sourcing design. I
think the better fit here is bitemporal databases.

A bitemporal database can answer pretty much any query about what is currently
true, what was true, and what will be true in the future; about the story of a
sequence or the state at a point in time; about not only what "is" or "was"
true but _when you believed it to be true_.

So I can not only ask "what was the total value of the cart last Thursday", I
can also ask "what did we _believe_ the total value was, before we applied a
correction?"

I've recently given this general area of systems a lot of thought, trying to
square away several different schools of thought around data. Principally
stream processing in the style of Akidau/Chernyak/Lax, bitemporal databases as
described by Snodgrass and dimensional modelling as described by Kimball. Plus
some ideas pinched from Concourse and a hasty skimming of _Enterprise
Integration Patterns_ by Hohpe, Woolf _et al_.

As it happens I will be trying to pitch some of this lunatic handwaving to
colleagues this coming week. My basic goal is that you should have
pluggability without needing to rewrite the universe. Streaming without giving
up tables. Tables without always having to convert them to streams. Different
views of data treated on their own terms, in their own form, without elevating
one or the other to being The One True Way Of Managing Data.

Alternatively, I'm wrong.

~~~
ryanmarsh
_Since then I've recognised that they come with a lot of effort and don't
play nicely with existing systems._

I'm curious if you think this is because the primitives don't exist (it seems
everyone doing CQRS/ES rolls their own) or if it is an "essential complexity"
of doing CQRS/ES?

I'm leaning more towards the former and less towards the latter, although I'm
only a month down the road of implementing a CQRS/ES system.

~~~
jacques_chester
A mix. Accidental complexity plays a role; it's just hard to boil an ancient
ocean of data.

But I have also come to think that streaming and eventing systems overstate
the argument that the stream is the "true" system. It is and it isn't.

The analogy a lot of folks hit on before I did, including streaming folks, is
to calculus. But I think it actually shows that in "streams and tables",
neither is the truest of them all.

I can take the instantaneous velocity of an object, or I can take it over a
span of time, or I can calculate its acceleration, or perhaps the distance it
has traveled. These are all functions that can be reached from one another by
differentiating or integrating. But none of them is the high lord master
formula. They are just different representations that make sense in different
cases.
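
In code, the duality looks something like this toy sketch: folding a change
stream gives you the table, and differencing two tables gives you back a
stream; neither form is more fundamental.

    type Delta = { key: string; amount: number };
    type Table = Map<string, number>;

    // "Integration": fold the change stream into a table of current values.
    const integrate = (deltas: Delta[]): Table =>
      deltas.reduce(
        (t, d) => t.set(d.key, (t.get(d.key) ?? 0) + d.amount),
        new Map<string, number>()
      );

    // "Differentiation": recover the changes between two snapshots.
    function differentiate(before: Table, after: Table): Delta[] {
      const deltas: Delta[] = [];
      for (const [key, value] of after) {
        const prev = before.get(key) ?? 0;
        if (value !== prev) deltas.push({ key, amount: value - prev });
      }
      return deltas;
    }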

------
Nelkins
Event sourcing has worked very well for my former employer, Jet.com[0]. One of
their engineers recently gave a talk about functional programming and event
sourcing at Microsoft Build which was pretty good[1].

[0] [https://medium.com/@eulerfx/scaling-event-sourcing-at-jet-9c873cac33b8](https://medium.com/@eulerfx/scaling-event-sourcing-at-jet-9c873cac33b8)

[1]
[https://www.youtube.com/watch?v=dSCzCaiWgLM](https://www.youtube.com/watch?v=dSCzCaiWgLM)

------
0xCMP
I think more and more things will be developed like this instead of just
writing CRUD apps. It kind of makes sense since storage is super cheap,
lambdas and other event-driven systems are getting popular, and it fits really
nicely with start-ups being able to pivot existing data to a new
schema/database/analytics.

I've read before that this is not recommended by people who have done it
(besides people who are consulting firms or who promote Kafka). Is there a
reason for this that anyone can chime in with?

~~~
vorpalhex
Event sourcing and CQRS as a whole leads to horrific complexity quickly,
especially if you violate certain constraints.

How do you synchronize multiple events? How do you handle partial system
outages? What is the retry strategy for failed events and how do we handle
inconsistent data?

Then comes the fact that you've more than doubled your data size, since you
need your source of truth plus a copy in each read view - and that gets
expensive quickly.

I agree with Fowler - CQRS/ES is probably too complicated and you should avoid
it unless you have a well described bounded domain that fits the model.

~~~
joevandyk
If you can store events in a SQL database, use transactions, and use
idempotent APIs, a lot of those issues go away.
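
A sketch of what that can look like with node-postgres (the schema is
illustrative): append the event and update the read model in one transaction,
with a unique event id making retries idempotent.

    import { Pool } from "pg";

    const pool = new Pool();

    // Assumed schema (run once):
    //   CREATE TABLE events (event_id uuid PRIMARY KEY, body jsonb NOT NULL);
    //   CREATE TABLE balances (account text PRIMARY KEY, amount int NOT NULL);

    export async function deposit(eventId: string, account: string, amount: number) {
      const client = await pool.connect();
      try {
        await client.query("BEGIN");
        // A retried event hits the primary key and is skipped (idempotence).
        const res = await client.query(
          `INSERT INTO events (event_id, body) VALUES ($1, $2)
             ON CONFLICT (event_id) DO NOTHING`,
          [eventId, { type: "DEPOSITED", account, amount }]
        );
        if (res.rowCount === 1) {
          // Only apply the effect if the event was newly recorded.
          await client.query(
            `INSERT INTO balances (account, amount) VALUES ($1, $2)
               ON CONFLICT (account) DO UPDATE
               SET amount = balances.amount + EXCLUDED.amount`,
            [account, amount]
          );
        }
        await client.query("COMMIT");
      } catch (e) {
        await client.query("ROLLBACK");
        throw e;
      } finally {
        client.release();
      }
    }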

~~~
vorpalhex
If you can do those things, you don't need CQRS to begin with.

------
fiatjaf
I would love to use Event Sourcing everywhere, if storage were cheap. People,
including in this thread, usually say storage is cheap: it is not. Look at the
prices for S3: storing _all the events every user performs on an app over its
entire history_ is too much even for Google, hence its sampling of Google
Analytics events.

~~~
amirouche
I agree that SSD storage is not cheap. I think that you can offload 'old' data
to S3 or something.

------
arkh
Event sourcing made simple. It sure is simple when you don't care about who
can do what.

> This post has been created by A, who is a temp, so it must be approved by an
> editor first. How do I check those things before validating my command?
> Let's forget about that, drop in a link (maybe) to the saga pattern, and
> consider it done.

Some things come for free, but a lot of easy things become a lot more complex.
And this will be hidden to sell the new silver bullet. What's funny is they
often use git as an example: remember that you rarely use git alone when you
want to do things like code review and permission handling.

~~~
blowski
You don’t use event sourcing on its own either. You can use Rails (or Django
or Symfony or whatever) to validate the command and add end-user-friendly
error messages. Then your models should validate the command again as an
‘anti-corruption layer’, this time throwing developer-friendly error messages.

That said, I completely agree with your assessment. Event sourcing makes
simple things a lot harder. It should be used sparingly, only for those
situations when the history is genuinely useful.

------
technimad
This is a really interesting concept, which does make sense. The aggregates
become pretty important in this kind of setup. I'm pretty familiar with
Splunk, and most of these concepts align with the capabilities of Splunk,
although Splunk is not usually used in these sorts of use cases.

Has anyone here implemented a system like this based on Splunk, and what was
your experience while doing this?

------
BerislavLopac
I like this concept, except I see Calculators as just a special case of
Reactors. Generally, Reactors are all about "when this condition is true, do
that"; Calculators are exactly the same, only with a specific definition of
_that_ (update some Aggregates).
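
As types, the point looks roughly like this (names are made up):

    type Event = { type: string; payload: unknown };
    type Aggregate = Record<string, unknown>;

    // A Reactor is "when this condition is true, do that".
    type Reactor = (e: Event) => void;

    // A Calculator is the same shape, with "that" pinned down to
    // updating aggregates.
    type Calculator = (e: Event, agg: Aggregate) => Aggregate;

    // Any Calculator lifts into a Reactor by fixing where the aggregate
    // lives; the reverse doesn't hold.
    const asReactor =
      (calc: Calculator, store: { agg: Aggregate }): Reactor =>
      (e) => { store.agg = calc(e, store.agg); };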

------
tlarkworthy
See the CloudEvents spec:
[https://github.com/cloudevents/spec](https://github.com/cloudevents/spec)

It would be nice if the events here were compatible.

~~~
jacques_chester
The CloudEvents spec is, at the moment, basically an envelope format.

~~~
tlarkworthy
Yes, with auth, so you can connect systems in a standard way and reuse cloud
emitted events

~~~
jacques_chester
Don't get me wrong, I want it to be a success, and I think it has a good
chance of meeting its goals. I just don't want folks to read into it what
isn't there.

------
bmpafa
"...made simple" might be a bit of an overstatement.

