
Seven Commandments for Event-Driven Architectures - n1bble
https://rjzaworski.com/2019/03/7-commandments-for-event-driven-architecture
======
EngineerBetter
I think this misses the main one - append vector clocks because your events
will inevitably turn up out of sequence at some point.

~~~
stingraycharles
I find vector clocks to introduce a lot of complexity. In our case, we just
use PostgreSQL to handle ordering. When committing an event into PostgreSQL,
you verify that the last committed event for your stream is still what you
expect it to be (i.e. CAS), and you have strong ordering.

Vector clocks I typically want to stay away from as far as possible.
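
A minimal sketch of the compare-and-swap append described above. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere; the table layout and the names `open_store`, `append_event`, `current_version`, and `ConcurrencyError` are assumptions for illustration, not anything from the comment or the article.

```python
import sqlite3


class ConcurrencyError(Exception):
    """The stream changed since the caller last read it."""


def open_store():
    # In-memory SQLite stands in for PostgreSQL in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        " stream_id TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (stream_id, version))"
    )
    return conn


def current_version(conn, stream_id):
    # Version 0 means "no events committed yet".
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM events WHERE stream_id = ?",
        (stream_id,),
    ).fetchone()
    return row[0]


def append_event(conn, stream_id, expected_version, payload):
    """Append only if the last committed version is still expected_version."""
    with conn:  # commits on success, rolls back if an exception escapes
        if current_version(conn, stream_id) != expected_version:
            raise ConcurrencyError(f"{stream_id} is not at version {expected_version}")
        try:
            conn.execute(
                "INSERT INTO events (stream_id, version, payload) VALUES (?, ?, ?)",
                (stream_id, expected_version + 1, payload),
            )
        except sqlite3.IntegrityError as exc:
            # The primary key is the backstop if two writers pass the check together.
            raise ConcurrencyError(f"{stream_id} was written concurrently") from exc
    return expected_version + 1
```

A writer that loses the race gets `ConcurrencyError`, re-reads the stream, and decides whether its event still makes sense. The uniqueness constraint on `(stream_id, version)` is what actually enforces the ordering; the explicit version check mainly prevents gaps.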

~~~
EngineerBetter
What do you do in a network partition?

~~~
tatersolid
They go down. Or go read-only.

What do _you_ do during a network partition? Accept writes that you’ll throw
away eventually?

~~~
EngineerBetter
Yeah, accept those writes and favour availability. Customer was a wealth
management company.

Imagine a customer has £100 in their account. System partitions. Customer
withdraws £70, hitting one DC. Customer then hits the second DC, this time
withdrawing £50. Each DC thinks the transaction is valid, and so serves it.

Later when the partition is restored, events are played back, and divergent
history is detected via the vector clocks - the two withdrawals are not
causally related. Remediative action can then be taken.

Transactions prevent bad things happening, but require CP semantics. Eventual
consistency allows AP, allows bad things to happen, meaning you have to be
able to detect them and clean them up later.
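
A rough sketch of how the divergence in the two-withdrawal example could be detected. Clocks are plain dicts mapping a node name to a counter; the node names `dc1` and `dc2`, the starting counters, and the functions `tick` and `compare` are hypothetical.

```python
def tick(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    advanced = dict(clock)
    advanced[node] = advanced.get(node, 0) + 1
    return advanced


def compare(a, b):
    """Classify clock a relative to clock b."""
    nodes = set(a) | set(b)
    a_behind = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    b_behind = any(b.get(n, 0) < a.get(n, 0) for n in nodes)
    if a_behind and b_behind:
        return "concurrent"  # neither history contains the other
    if a_behind:
        return "before"
    if b_behind:
        return "after"
    return "equal"


# Last state both data centres agreed on before the partition (made-up counters).
shared = {"dc1": 3, "dc2": 2}

withdraw_70 = tick(shared, "dc1")  # served by the first DC
withdraw_50 = tick(shared, "dc2")  # served by the second DC

conflict = compare(withdraw_70, withdraw_50) == "concurrent"
```

Here `conflict` ends up true because each withdrawal has advanced a counter the other has not seen. The clocks only flag the divergence; what remediation follows is a business decision outside this sketch.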

~~~
tatersolid
There’s a long history to this debate, but _in practice_ even GOOG and AMZN
have settled on transactional CP systems for handling the important stuff
like money (Spanner, Aurora). Expecting app developers to roll their own
transactions and conflict resolution at the app layer proved intractable even
with all their resources.

> Remediative action can then be taken

Sounds expensive and error-prone; taking a “read only” outage makes more sense
in many use cases.

------
xtagon
The section on "Minimize state in-flight" [1] has an example of an event with
both an amount withdrawn and the new balance, and the author recommends
sending only the amount instead.

Wouldn't it be useful to send both, since if the expected balance does not
match up when the event gets processed, that must mean something was processed
out of order?

[1]: https://rjzaworski.com/2019/03/7-commandments-for-event-driven-architecture#7-minimize-state-in-flight
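
One way the check proposed in this comment might look, as a sketch only. The `Withdrawal` type, its `balance_after` field, and `apply_withdrawal` are hypothetical names; the article itself recommends sending just the amount.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Withdrawal:
    amount: int
    balance_after: int  # the producer's view of the balance after this event


def apply_withdrawal(balance, event):
    """Return (new_balance, in_sync) for one event.

    in_sync is False when the consumer's result disagrees with what the
    producer expected, which hints at a missing or reordered event.
    """
    new_balance = balance - event.amount
    return new_balance, new_balance == event.balance_after


# Produced in this order against an opening balance of 100.
first = Withdrawal(amount=70, balance_after=30)
second = Withdrawal(amount=20, balance_after=10)

# Delivered out of order: the consumer sees `second` before `first`.
balance, in_sync = apply_withdrawal(100, second)
```

After the out-of-order delivery `in_sync` is false, because the consumer computes 80 where the producer expected 10. The mismatch says something is off but not what: a lost event, a reordering, and a producer working from a stale balance all look the same.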

~~~
aarbor989
Well, when you work with distributed messaging systems it can be difficult (if
not impossible) to guarantee messages arrive or are processed in the same
order in which they were generated. So I think the author is just saying that,
generally speaking, it’s better to track as little state as you can get away
with.

------
mavdi
Excellent list, thank you. I’ve arrived at most of them through experience,
but it’s great to know I’m not full of crap.

------
HeyBillFinn
Related to ordering, I was mildly surprised that the "Minimize state
in-flight" section didn't mention timestamping all events with a created (and
possibly a separate effective) date.

~~~
naasking
A created date might encourage an assumption that the date is a reliable way
to order events.
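
A small illustration of that risk, with made-up numbers. The `Event` type, the `sequence` field (imagined as a per-stream counter assigned at commit time), and the two skewed timestamps are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    sequence: int      # hypothetical per-stream counter assigned at commit
    created_at: float  # wall-clock seconds reported by the producing host


# The deposit was committed first, but its host's clock runs ahead of the
# clock on the host that produced the withdrawal.
events = [
    Event("deposit", sequence=1, created_at=1000.0),
    Event("withdraw", sequence=2, created_at=998.5),
]


def names_sorted_by(events, key):
    return [e.name for e in sorted(events, key=key)]


by_clock = names_sorted_by(events, key=lambda e: e.created_at)
by_sequence = names_sorted_by(events, key=lambda e: e.sequence)
```

Sorting by `created_at` puts the withdrawal before the deposit, while sorting by `sequence` recovers the commit order. A created date is still useful for display and auditing; it just should not be the ordering key.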

------
jcims
Curious about the "minimize state in-flight" point. If the producer has state
on hand, why not include it and just have a policy around when you trust it?
If it’s equal cost for producer or consumer to collect state then maybe, but
how often is that the case?

~~~
pyrale
Consider replacing "event" with "observation". What you really have is a set
of observations, and a state that you infer from them. Observations you make
may be reliable, but you have no guarantee that you can observe every fact in
the problem.

For instance, if you're a payment processor in the EU, and a withdrawal was
done in the US, you may not have received it yet, so you have an incorrect
balance. If a withdrawal is made in the EU, and you add the balance to the
event, you will have mixed a perfectly legit observation with something that
is not an observation but an inference, an artifact computed from your
incomplete set of observations.

Trust is something that changes over time. When your EU platform was
implemented, maybe your US platform didn't exist. But the difference between
observations and inference will remain true.
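
The distinction can be sketched in a few lines. `Withdrawal`, `infer_balance`, and the amounts are hypothetical; the point is only that the events carry what was observed and the balance is recomputed from whichever observations have arrived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Withdrawal:
    region: str
    amount: int  # an observation: this much was withdrawn


def infer_balance(opening_balance, observations):
    """Derive a balance from the observations seen so far."""
    return opening_balance - sum(w.amount for w in observations)


seen_in_eu = [Withdrawal("EU", 30)]
seen_everywhere = seen_in_eu + [Withdrawal("US", 50)]

eu_view = infer_balance(100, seen_in_eu)           # inference from a partial set
full_view = infer_balance(100, seen_everywhere)    # inference once the US event arrives
```

Both withdrawals stay true no matter when they arrive. The EU-only balance was never a fact, so stamping it into the EU event would have frozen an inference that the US withdrawal later invalidates.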

------
2sk21
Reminds me of this famous list from the 90s:
https://en.m.wikipedia.org/wiki/Fallacies_of_distributed_computing

~~~
Nicksil
Non-mobile:

https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing

------
everyone
The title is too broad imo. This is clearly only about Event-Driven
Architectures for the web. Many of those 7 points are specific to the web.

~~~
jayd16
The examples are web based but these points can still be adapted down to any
scale, really. Things like staleness are still a concern in something like an
event-driven UI, i.e. user input disrupts the current action.

~~~
everyone
Wrong. I like using an event driven architecture in games as opposed to an
update driven one. There is no concept of staleness in a single player game
running on one machine. Many of the concepts the author mentions simply do not
exist in that context. In games, event driven = some event happens, all the
logic that needs to execute in response to that happens _immediately_, as
opposed to, for example, being put on a blackboard and dealt with in some other
update loop.

The reason a lot of games have frame-perfect bugs / exploits / glitches is
because they use an update-driven architecture. So it's possible that for one
frame after certain things happen the game is in an inconsistent state.
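
A toy contrast between the two styles described here, under the assumption that "update driven" means events are queued and handled on a later pass. `ImmediateBus`, `DeferredBus`, `kill`, and `on_died` are made-up names, not from any particular engine.

```python
class ImmediateBus:
    """Handlers run inside emit(), before the caller continues."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, name, handler):
        self._handlers.setdefault(name, []).append(handler)

    def emit(self, name, payload=None):
        for handler in self._handlers.get(name, []):
            handler(payload)


class DeferredBus(ImmediateBus):
    """Events wait on a queue until flush() runs, e.g. once per frame."""

    def __init__(self):
        super().__init__()
        self._pending = []

    def emit(self, name, payload=None):
        self._pending.append((name, payload))

    def flush(self):
        pending, self._pending = self._pending, []
        for name, payload in pending:
            ImmediateBus.emit(self, name, payload)


def on_died(character):
    character["alive"] = False


def kill(bus, character):
    character["hp"] = 0
    bus.emit("died", character)
```

With `ImmediateBus`, a character passed to `kill` has `hp == 0` and `alive == False` by the time the call returns. With `DeferredBus`, the same character has `hp == 0` but is still marked alive until `flush()` runs, which is the one-frame window the comment is pointing at.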

~~~
jayd16
You do run into the same issues in something like a game.

Let's say you have a character doing some animation clip and at the end you
want to run some callback.

That character could be killed by the player or the player could quit to the
main menu or any number of things.

Now your callback is stale. You have to deal with cleaning it up, ensuring it
does actually fire if you need it to, and handling any issue with the now
deceased character.

~~~
everyone
Wrong again. You are obviously commenting about an area you are not familiar
with. That is a very weird scenario. You don't want stuff like animation
affecting logic. The game logic should never be affected no matter what the
animations are doing. But if for some bizarre reason you did have that
scenario, a typical game engine approach would be to put that callback at the
end of a coroutine. If the character dies then, no matter what it is, you
_definitely_ don't want that callback to fire. (There shouldn't be logic being
affected by an animation, but if that character is dead then it shouldn't be
affecting _anything_ at all.) Part of your game engine would be that
coroutines do not run on things which have been destroyed.
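
A minimal sketch of the "coroutines do not run on destroyed things" rule, using Python generators in place of engine coroutines. `GameObject`, `Scheduler`, and `play_then_notify` are hypothetical and not modelled on any specific engine's API.

```python
class GameObject:
    def __init__(self, name):
        self.name = name
        self.destroyed = False


class Scheduler:
    """Advances each coroutine one step per tick, skipping destroyed owners."""

    def __init__(self):
        self._running = []  # list of (owner, generator) pairs

    def start(self, owner, coroutine):
        self._running.append((owner, coroutine))

    def tick(self):
        still_running = []
        for owner, coroutine in self._running:
            if owner.destroyed:
                coroutine.close()  # drop it; the trailing callback never runs
                continue
            try:
                next(coroutine)
            except StopIteration:
                continue  # finished normally
            still_running.append((owner, coroutine))
        self._running = still_running


def play_then_notify(frames, on_done):
    """Wait for the given number of frames, then fire the callback."""
    for _ in range(frames):
        yield
    on_done()
```

If an owner is marked `destroyed` between ticks, its coroutine is closed on the next `tick()` and `on_done` is never called, while coroutines belonging to live objects run to completion. The lifetime check lives in one place (the scheduler) instead of in every callback.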

~~~
jayd16
Man, you're the one who said you were using events for game dev, not me.
Coroutines or events, the idea that you need to reconsider your state when
execution returns is still valid advice. Tying a coroutine to a game object's
life cycle _is_ managing the event state. It's clearly not a weird scenario
when entire game engines are based around this principle.

~~~
everyone
Well as I said having some logic executed by an animation would be doing
things backwards...

Most of the points in the article are about using an event driven architecture
in the world of distributed systems, where the messages triggering events are
delayed by a certain amount, can be lost, malformed, or different to what the
receiver expects.

None of that applies to a single program running on one machine. That's why
using an event driven architecture for a game is so great. I _never_ have to
worry about most of the stuff in the article. Also (as opposed to the more
common Update driven model) if I do a good job it's possible for my game's
logic to be flawless; it can be impossible for it to be in an illegal or weird
state. So an event driven architecture in a game is fantastic for that reason,
and the article is quite useless when it comes to event driven architecture in
games. I'm sure it's fine for event driven architecture in distributed systems,
but the title is certainly too broad.

Like with many of the articles posted here, CRUD / web devs forget there are
other fields of programming.

