That’s not my experience. In fact I’d say fat events add coupling because they create an invisible (from the emitter) dependency on the event body, which becomes ossified.
So I’d say the opposite: thin events reduce coupling. Sure, the receiver might call an API and that creates coupling with the API. But receivers are also free to call or not call any other API they want. What if they don’t care about the body of the object?
So I’m on team thin. Every time I’ve been tempted by the other team, I’ve regretted it. It’s also in my experience a lot more difficult to version events than it is to version APIs, so reducing their surface area also solves other problems.
> thin events reduce coupling. Sure, the receiver might call an API and that creates coupling with the API.
You make a statement in the first sentence, and in the next sentence produce evidence ... that the statement is wrong. And, YMMV.
It is my experience that thin events add coupling. If service B receives an event, and wants to process it ASAP (i.e. near real time) and so calls back over http to Service A for the details, then
a) There is additional latency for an http call, and time variance: even if the average latency of an http request round-trip is fine, the P99 might be bad.
b) You're asking for occasional "eventual consistency" trouble when A's state lags or has moved on ahead of the event
c) Worst of all: When service A is down or unreachable, Service B is unable to do work: Service B uptime must be <= Service A uptime. You have coupled their reliability, and if Service B is identified as mission-critical, then you have the choice of either making Service A equally critical, or decoupling them e.g. with "fat events".
I don't believe that it's accurate to say "receivers are also free to call or not call..." - it's not choosing a flavor of ice-cream; you do the calls that the work at hand _needs_.
If you find that you never need to call back to service A then yes, "thin events" would suit your case better. That has not been my experience.
It's fair that event data format versioning is a lot of work with fat events - nothing is without downside. But in your case, do you have "dependency on the event body"? All of it? If a thin event is all that you need, then you depend on a couple of ids in the event body, and not the rest. JSON reading is very forgiving of added / removed fields; you can ignore the parts of a fat event that you don't care about.
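To make points a) to c) concrete, here's a minimal sketch of the thin-event consumer pattern, in Python. The service name, endpoint and event fields are made up for illustration, not anything from a real system:

    import requests  # any http client works; this is just a sketch

    SERVICE_A_URL = "https://service-a.internal"  # hypothetical endpoint on Service A

    def handle_thin_event(event: dict) -> None:
        """Consumer in Service B: the event carries only an id, so we have to
        call back to Service A for the details (points a, b and c above)."""
        order_id = event["order_id"]

        # a) an extra round-trip per event: average latency plus P99 variance
        resp = requests.get(f"{SERVICE_A_URL}/orders/{order_id}", timeout=2)

        # c) if Service A is down or unreachable, this raises and B can't make progress
        resp.raise_for_status()

        # b) this is A's state *now*, which may lag or have moved past the
        #    state that caused the event to be emitted
        process(resp.json())

    def process(order: dict) -> None:
        ...  # Service B's actual work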
> You make a statement in the first sentence, and in the next sentence produce evidence ... that the statement is wrong.
My first sentence was quoting from the article, then I refute the article. Sorry if that wasn’t clear.
Re your point a), yes I agree in this case you’d send the contents in the body, but then I’d tend to call it stream processing rather than event processing - I admit this might seem like splitting hairs, but I do feel that there’s a difference between events and data distribution. And I personally find the data distribution pattern tends to be a lot more specialised.
Re b), it’s just an assumption that the receiver needs the version of data in the message, rather than the latest version. So I don’t think this is a strong argument for fat events.
Re c), again, it’s an assumption that the receiver needs the exact data provided in the event body; but I’ve found that, except in very simple cases, it’s very difficult to efficiently create event bodies that contain everything that all receivers are going to need. Maybe the receiver needs to collate a bunch more data, in which case the problem persists regardless of fat or thin, or maybe it just clears a local cache, in which case the problem is deferred until the data is needed and you probably have other things to worry about then anyway.
> I don't believe that it's accurate to say "receivers are also free to call or not call..." it's not choosing a flavor of ice-cream, you do the calls that the work at hand _needs_.
Sure, and the calls you make depend on the context, and if there is enough data in the event body to avoid making any calls at all. And I’m saying that in my experience that’s not generally the case. What I’ve seen is that the sender composes some event body and sends it, and the receivers end up needing to call APIs anyway.
In which case, the sender may as well have not gone to the trouble, hence my preference for thin events.
> But in your case, do you have "dependency on the event body" ? All of it?
From a maintenance perspective, the sender doesn’t know what the receivers depend on, so even if all your receivers only depend on the IDs, there is no way to find out. Because of this, it’s really easy to add fields to an event message, but really dangerous to remove them, because you can’t easily tell what receivers depend on the thing you’re removing. This is why I said that fat events create more coupling than thin events.
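A toy illustration of that asymmetry (field names are hypothetical): adding a field breaks nobody, but removing one silently breaks whichever consumer happened to read it, and the emitter has no list of those consumers.

    # Producer's fat event today (hypothetical fields)
    event = {"order_id": 123, "status": "PAID", "billing_address": "..."}

    # Somewhere, a consumer the producer doesn't know about reads one field:
    def handle(event: dict) -> None:
        invoice_to = event["billing_address"]   # works today
        ...

    # Producer later "tidies up" the body:
    event = {"order_id": 123, "status": "PAID"}

    # handle(event) now fails with KeyError at runtime, in a service the
    # producer can't easily enumerate - that's the hidden coupling.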
Of course as with most things there are always exceptions. Maybe I should have said, “I’m on team thin by default. But of course some use cases require fat messages, in which case proceed with great care”.
I think it's a straw man to say "we couldn't eliminate all API calls, so fat events are useless" - even removing 1 dependency at a time is a win. In my experience, you generally can do this, and that was the approach taken for reliability improvement.
> it’s very difficult to efficiently create event bodies that contain everything that all receivers are going to need.
"everything that all receivers need" seems like another straw man, a "you won't get it perfect so don't try to improve". I've seen it work well enough to be worthwhile.
> From a maintenance perspective, the sender doesn’t know what the receivers depend on
At a glance, no. But it's not imponderable, assuming a limited number of in-house consumers. The absolute statement about it isn't accurate.
> it’s just an assumption that the receiver needs the version of data in the message, rather than the latest version. So I don’t think this is a strong argument for fat events.
I've seen it cause a severe and hard-to-diagnose failure, when system A lags enough, so I think it is a strong argument.
> Maybe I should have said, “I’m on team thin by default.
Sure. I'm on team "fat events" by default because it can solve more issues than it creates. If it turns out that 90% of the event gets ignored, with no issues or http call-backs, then this might be a case for thin events.
> b) You're asking for occasional "eventual consistency" trouble when A's state lags or has moved on ahead of the event
If you allow A's state to lag behind its own events, then how are you ever going to create a sane system? Surely A has to be either ahead of or at the state that caused the event to be emitted, or events are pointless.
Sure, but in a thin events model someone would "own" the events, since otherwise the subscriber wouldn't know where to query the actual data. What would you even do with an event saying a customer changed address if querying that address then produces the old one?
I'm genuinely curious how such an architecture would work. You don't have to respond directly here, but if you have any reference to further reading, I'd appreciate it.
> I'm genuinely curious how such an architecture would work.
Complex systems are the way that they are because they got that way over time. It is not my goal to defend or even characterise a system that I did not create.
I am here telling you the issue that I saw: one event consumer, at an edge case, ran substantially behind another, and when they attempted to co-ordinate over http, this failed. And how it was successfully resolved: fatter events removed the need for co-ordination between these two altogether. This was IMHO a more elegant design - it avoided the issues of the thin events.
Ah, so A and C were both subscribed to B, but during A's processing of the event it assumed C had already processed it and tried to look up some state. Is that correctly understood?
This sounds more like an architectural deficiency (as you say probably from architectural decay) than a systematic design edge case. I can't quite understand what information A would need to get from C that could be included in the fat event but not simply queried from B.
> Ah, so A and C were both subscribed to B, but during A's processing of the event it assumed C had already processed it and tried to look up some state. Is that correctly understood?
Yes, though you're down the rabbit-hole on this one issue. My point (aside from the fact we actually saw this specific issue and it took a long time to correctly diagnose) is that with thin events followed by an http query call-back, you're asking for occasional "eventual consistency" trouble. Data races will happen occasionally - this is inevitable in the design.
At the tail end of the latency distributions, too fast or too slow, or when service A is having a blip, or you hit the new version just deployed, or whatever, things will go wrong by mis-sequencing in surprising and hard-to-follow ways (like the example you're fixating on) in complex real systems, and it's a win to avoid that chaos entirely with fat events.
> If you allow A's state to lag behind its own events
That's a mischaracterization. A's state is not lagging its emitted events; instead, A's state may have been changed at the time A's event is processed.
The "own events" was the faulty assumption. it's not always the same service that both emits the events, and is the place to go to over http for data. It "seems logical" to also build that store from listening to events, but it can cause issues as mentioned.
> when A's state lags or has moved on ahead of the event
That sounds like it can be EITHER ahead or behind. Specifically, I understood it as: A's state can either lag OR be ahead, not that "lags" is a synonym for "moved ahead".
> b) You're asking for occasional "eventual consistency" trouble when A's state lags or has moved on ahead of the event
It's worth noting that this is the default if B is recovering after an outage.
Personally, I consider events to be insane. "We create an immutable database so that the state of the system is always recoverable." Okay, cool, very functional programming of you. "But then to actually work with the event from the immutable database, you have to query a stateful service." ??? What? And even fat events only go so far to get you out of that. So with a stream of n events, you don't have n states that the application can be in, but n times the product of all possible states of every other service that you query. How does this help?!
The bit you seem to be missing is the events are the source of truth, not the databases.
Lose your database? Roll up all the events. Got a lot of them? Take snapshots and then roll up from the last trusted snapshot.
In true event sourced systems, the databases and stateful systems are artefacts that can be thrown away and rebuilt. The event log is the actual “true” database.
Once you design around that, your objections melt away.
And if you think this is some faddish trend, this is how finance has worked since the invention of bookkeeping, and it's how the databases under your stateful services work under the hood.
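As a bare-bones sketch of "the event log is the true database": state is just a fold over events, optionally starting from a snapshot. The aggregate (an account balance) and the field names are made up for illustration:

    from dataclasses import dataclass

    @dataclass
    class Event:
        seq: int       # position in the log
        kind: str      # "deposited" or "withdrawn"
        amount: int

    def apply(balance: int, e: Event) -> int:
        if e.kind == "deposited":
            return balance + e.amount
        if e.kind == "withdrawn":
            return balance - e.amount
        return balance  # unknown event types are ignored

    def rebuild(log: list[Event], snapshot_balance: int = 0, snapshot_seq: int = 0) -> int:
        """Lose the database? Fold the log (or just the part after the last
        trusted snapshot) and the state comes back."""
        balance = snapshot_balance
        for e in log:
            if e.seq > snapshot_seq:
                balance = apply(balance, e)
        return balance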
This only works if your events are in a single globally ordered stream or all your code is eventually consistent over every stream it consumes. Specifically, you cannot do the "query a service for the aggregate state" thing this article espouses for thin events, ever.
I also disagree with the article - thin events don't always result in more coupling, and I'll add that thin events can remove temporal or state coupling as illustrated below. However, the caveat is: as with many things I think choosing one team or the other has nuance and depends on the specific scenario.
An example: I'm using thin events in a master data application integration scenario to send a 'sync this record' type of command message into a queue. The message body does not have the record details, only the basic information to uniquely identify the record. It also doesn't identify the type of change except for a flag to identify deletes. The 'sync' message is generalized to work for all entities and systems, so routing, logging, and other functions preceding the mapping and target operation have no coupling to any system or entity and can expect a fixed message format that will probably never change. Thus versioning isn't a concern.
Choosing team 'thin event' does result in an extra read of the target system, but that is a feature for this scenario and what I want to enforce. I can't assume a target system is in any particular state, and the operation to be performed will be determined from the target system at whatever point in time a message is processed, which could be more than once. If the message ended up in a dead letter queue, it can be reprocessed later without issue. If one production system's data is cloned down to a lower environment, the integrations continue to work even if the source and target environment data is mismatched. No state is stored or depended upon from either system and the design is idempotent (ignoring a target system's business rules that may constrain valid operations over time).
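For illustration, the 'sync this record' message in a setup like that is roughly this shape (field names here are invented, not the actual format):

    # Generalized thin "sync" command: just enough to identify the record.
    # Everything else is read from the systems at processing time.
    sync_message = {
        "source_system": "crm",       # where the record lives
        "entity_type": "customer",    # which kind of record it is
        "entity_id": "C-00042",       # unique key of the record
        "is_delete": False,           # the only change-type flag carried
    }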
In contrast, other scenarios may benefit from or require a fat event. I've never used event sourcing, but as others mention, if current state can be built from all previous events 'rolled forward' or 'replayed', then each event must be a stand-alone immutable record with all information - thin events cannot be used. Or, if a scenario requires high performance we might need to use a fat event to eliminate the extra read, and then compensate for the other consequences that arise.
Assume the data format changes; it would change in the called API as well. As long as the fat event sends data in the same format that the API would return, you'd have the same level of coupling.
I think fat vs thin is more about how many other services the event has to travel through, because thin events would multiply reads by a fair factor, with the tradeoff being the performance hit for the queue system to store and ship large events.
With an API you can publish a new endpoint (/v1, /v2 etc). It’s normally reasonably easy to maintain an old API even while you add features to the new API, and the runtime penalty is minimal because clients would be expected to call just one version of the API for any given event. (You can also see who’s calling the old API and ask them to change)
But this is not true for events. If you change the body such that you now need to maintain two versions of an event, then you have to publish both events simultaneously, which means double the server side effort, storage etc for each event version. It’s pretty inefficient, and painful. You can work out who subscribes to the old event but there is still a big efficiency hit.
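A sketch of the pain, assuming a hypothetical publish call (none of these names are real): as soon as one subscriber wants v2, every single event has to be built and published twice until the last v1 subscriber is gone.

    def publish_order_updated(order, bus) -> None:
        # v1 body: must keep publishing while any subscriber still reads it
        bus.publish("order-updated.v1", {
            "order_id": order.id,
            "status": order.status,
        })
        # v2 body: must publish as soon as one subscriber wants the new shape
        bus.publish("order-updated.v2", {
            "order_id": order.id,
            "state": {"status": order.status, "updated_at": order.updated_at},
        })
        # Double the serialization, I/O and storage on every event until the
        # migration is finished - unlike /v1 vs /v2 endpoints, where each
        # client only pulls one of them.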
You might be right about many reads per event in a simplistic way; if you have a lot of clients then it could be expensive if you don't have a server-side cache. But there would typically be a lot of temporal locality in such a system, so it seems like an easy problem to solve for most use cases; you don't have to cache for long, but caches are of course tricky if your use case is not very simple. That said, if there is already an HTTP connection open then the additional latency and bandwidth hit caused by thin events is going to be minimal in most cases, and probably drowned out entirely if you need to push multiple versions.
As I said in another thread, I should have said that thin is my default. There are cases when fat makes more sense, but normally I’d start with thin and see if I need to flesh it out. Whenever I’ve started fat I’ve ended up reverting.
This lets you identify the version but it doesn't let old clients read the new messages. (Well, for avro and others they still can if the new fields aren't important or the old fields aren't required - but if you can do that you also don't really have a new incompatible version and you don't need the schema hash to begin with.)
The point is that with a pull-based API, I have a fixed number of requests. As clients migrate from /v1 to /v2, load on /v1 goes down and /v2 goes up, and I can adjust resource allocations accordingly to keep the total requirements relatively constant. I can even reimplement /v1 in terms of /v2 internally in many cases and have ~0 operational overhead.
But for an evented system, as soon as just a single client wants v2 I need to publish that, and as long as any client wants v1 I need to publish that. So my outbound "work" (at the very least i/o but probably also DTO conversions and god help you if it's any kind of storage or business logic) is doubled immediately and remains doubled until everything is migrated.
API versioning is more for external users, not internal. If your API is versioned, your events should be versioned as well though, so we're at square one. You're manufacturing a scenario where one approach is advantageous, and I agree your approach works in that scenario, but that is different from saying that one approach is advantageous a priori.
When I've seen this fat event pattern it's been because different services' responsibilities were not fully separated. And that's tight coupling. Fat events imply tight coupling.
The "thin" pattern described in the article goes like this:
1) service FOO gets an event
2) FOO then has to query BAR (and maybe BAZ and QUUX) to determine the overall state of everything and decide what to do next
And #2 means all of that kind-of-"thin" setup is tightly coupled, too.
I've also personally seen thin events that are not the article's thin strawman.
I sometimes wonder if people understand coupling or design.
When the "state" is large, or changes often, obviously you can't send full state every time - that would be too much for end-nodes to process on every event. Both cpu - deserialization, and bandwidth. Delta is the answer.
Delta though is hard, since there always is an inherent race between getting the first full snapshot, and subscribing to updates.
On the other hand, because doing delta is hard, for simple, small, not-often-updated things, fat events carrying all state might be okay.
There is a linear tradeoff on the "data delivery" component:
- worse latency saves cpu and bandwidth (think: batching updates)
- better latency burns more cpu and bandwidth
Finally, the receiver system always requires some domain-specific API. In some cases passing a delta to the application is fine; in some cases passing a full object is better. For example, sometimes you can save a re-draw by just updating some value; in other cases the receiver will need to redraw everything, so passing the full object is totally fine.
I would like to see a pub/sub messaging system that solves these issues: one where you can "publish" an object, select a latency goal, "subscribe" to the event on the receiver, and let the system choose the correct delivery method. For example, the system might choose pull vs push, or an appropriate delta algorithm. As a programmer, I really just want to get access to the "synchronized" object on multiple systems.
You send the entire state of the entire object that changed. Irrelevant fields and all.
This makes business logic and migrations easier in dependent services. You can easily roll back to earlier points in time without diffing objects to determine what state changed. You don't have to replay an entire history of events to repopulate caches and databases. You can even send "synthetic" events to reset the state of everything that is listening from a central point of control.
I've dealt with all three types of system, and this is by far the easiest one to work with.
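For a concrete (entirely made-up) example of that style: every change to a customer publishes the whole current object, and a "synthetic" event is just the same payload re-emitted from a central point to reset every listener.

    # Full-state event: the entire object as it is now, irrelevant fields and all.
    customer_changed = {
        "event": "customer-changed",
        "customer": {
            "id": "C-00042",
            "name": "Ada Lovelace",
            "email": "ada@example.com",
            "address": "12 Example St",
            "marketing_opt_in": True,
        },
    }
    # A "synthetic" event is this same shape, emitted on demand purely to
    # rebuild caches and read models in every listener.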
since the "fat event" ones are vaguely defined here, they could be arbitrarily close to or far from the "Entire object" cases. How does it differ? Maybe it does not.
Your team decides what parts of the model to expose in its events and it becomes an API in its own right.
You might change the names of fields, move them to places that don't reflect where they live on a nested model, etc. It requires a lot more thought and maintenance.
That isn't to say the choice can't be correct. All of these approaches have pros and cons.
>Delta though is hard, since there always is an inherent race between getting the first full snapshot, and subscribing to updates.
Since the deltas include a version identifier for what they should be applied on top of, you should always be able to safely start by requesting the deltas, then ask for the object. Buffer the deltas till your full copy is received, then discard deltas for previous versions until the stream applies to yours, applying them thereafter to keep it up to date.
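That recipe, as a small Python sketch. The three callables are assumptions about the surrounding system (a subscription that yields deltas tagged with the version they apply on top of, and a snapshot fetch), not a real API:

    def sync(subscribe, fetch_snapshot, apply_delta):
        """Snapshot-plus-deltas without the race: subscribe first, then snapshot,
        drop deltas the snapshot already covers, apply the rest in order."""
        deltas = subscribe()               # 1. subscribe to updates first; deltas
                                           #    arriving during the next step queue up
        state, version = fetch_snapshot()  # 2. then request the full object

        for delta in deltas:
            if delta["applies_to"] < version:
                continue                   # 3. discard deltas older than the snapshot
            state = apply_delta(state, delta)
            version = delta["applies_to"] + 1  # 4. apply the rest, in order
        return state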
This omits the issues with "thin events" - it may be fine most of the time, but as it usually involves a "get more details" call over http or of some other kind, it has more moving parts and is therefore more prone to failures and slowdowns due to the extra coupling. This can kick in when load goes up or some other issue affects reliability, and cause cascading failure.
I'd pick neither and just let the system in possession of the data send with the event only the part of the data it owns (i.e. something in between fat and thin). That saves the API call back, the body doesn't have to be fully deserialized (so no format coupling), and the rest can be fetched from other services on demand (coherent state is not guaranteed, but that's usually not critical with well-designed bounded contexts).
Yes. The owned part of a saga can be as small as an acknowledgment of something that happened. Basically, you do not create the pattern of fat events in your architecture and then stick to it; instead you send fat, thin, and in-between events depending on context.
I'm a fan of fat events, and let the receiver decide if they want to trust the event or not, or go ahead and make a call to the service to get the data.
for example:
if one receiver wants to know if you have read a book, then there is no reason to make a call to the service.
but if a service wants to know the last book you read, and doesn't trust the events to be in order, then it would make sense to just call the service.
> if a service wants to know the last book you read, and doesn't trust the events to be in order, then it would make sense to just call the service.
It would make more sense to me if the events had an increasing sequence number, version number or accurate timestamp, so that if I see "'sithlord' last read 'The Godfather' at event '123456'" I can record that, and ignore any event related to "sithlord last read" with event < 123456.
This is not a new problem, there are existing solutions to it.
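A sketch of that bookkeeping (store and field names are made up): keep the highest sequence applied per user and drop anything older.

    last_seen: dict[str, int] = {}   # user -> highest event sequence applied

    def handle_last_read(event: dict) -> None:
        """Ignore out-of-order or duplicate 'last read' events via a monotonic sequence."""
        user, seq = event["user"], event["seq"]
        if seq <= last_seen.get(user, -1):
            return                        # older or duplicate event: ignore it
        last_seen[user] = seq
        record_last_read(user, event["book"])

    def record_last_read(user: str, book: str) -> None:
        ...  # persist "user last read book"

    # e.g. handle_last_read({"user": "sithlord", "book": "The Godfather", "seq": 123456})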
As always, it depends. Yay engineering and trade-offs.
Hey, just remember: both is always an option if your consumers disagree. Thin stream for the consumers who don't trust the fat data, fat stream for the event log and other consumers that prefer it.
Fat events once overloaded our message broker with OOM under high load and the broker's default behavior was to block all publishers until the queue was emptied (to release memory) - downtime as a result. Another issue was that under high load, if the event queue was too large, handlers would end up processing very stale data resulting in all kinds of broken behavior (from the point of view of the user).
Thin events resulted in DDoS of our service a few times because handlers would call our APIs too frequently to retrieve object state (which was partially mitigated by having separate machines serve incoming traffic and process events).
(A trick we used which worked for both fat and thin events was to add versioning to objects to avoid unnecessary processing).
We also used delta events as well but they had same issues as thin events because handlers usually have to retrieve full object state anyway to have meaningful processing (not always, depends on business logic and the architecture).
There are so many ways to shoot yourself in the foot with all three approaches and I still hesitate a lot when choosing what kind of events to use for the next project.
For me, this depends on the semantics of the system. Is the sender commanding the receiver to carry out the rest of the process, or is the sender broadcasting information to a dynamic set of interested parties? In other words, are you building a pipeline or a pub-sub?
If the former, there is inherently tight coupling between sender and receiver, and the sender should send all necessary context to simplify the system design.
If the latter, then we're talking about a decoupled system, where the sender cannot make assumptions about what info the receiver does or doesn't need to take further action. A thin event is called for to keep the contract simple.
One of my frustrations with the event-driven trend is that people don't always seem to think through what they're designing. It's easy to end up with a much more complex system than a transactional architecture.
Generally, I favor modeling as much of my system as possible as pipelines, and use pub-subs sparingly, as places where you have fan out to parallel pipelines.
Raw events are like GOTOs. They are extremely powerful, but also very difficult to reason about.
Thin events have the benefit of easy retry/resend logic. Depending on your message queue solution you might need to sometimes resend events.
If the event is 'user account changed', receiving it a few times too many causes only performance issues, but not correctness problems.
Sometimes this is the better tradeoff.
It is easier to send an event 'user account changed' than to analyze in detail what exactly changed, which also allows you to decouple the event logic from everything else.
Of course not every system benefits from such solutions, but sometimes simplicity wins.
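A small sketch of why redelivery is harmless here (the account API and the local upsert are hypothetical): the handler re-reads current state and overwrites, so running it n times costs n reads but never corrupts anything.

    import requests

    def handle_user_account_changed(event: dict) -> None:
        """Idempotent thin-event handler: safe to process any number of times."""
        user_id = event["user_id"]
        # Always fetch the latest state instead of trusting an event body.
        resp = requests.get(f"https://accounts.internal/users/{user_id}", timeout=2)
        resp.raise_for_status()
        upsert_local_copy(user_id, resp.json())   # overwrite, don't append

    def upsert_local_copy(user_id: str, account: dict) -> None:
        ...  # last-write-wins update of the local read model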
If you send to multiple receivers, some of the messages may not make it through. Then you're in the position of deciding whether 'the message was sent' or not.
Because the very real downsides of thin events are not simple or obvious. It may be fine most of the time, but as it usually involves a "get more details" call over http or of some other kind, it has more moving parts and is therefore more prone to failures and slowdowns due to the extra coupling. This can kick in when load goes up or some other issue affects reliability, and cause cascading failure.
But it's a lot less prone to data races and has other upsides, so it really is contextual to your needs.
I would also push back on the claim that it has more moving parts. You'll often need to pull information anyway, and getting pushed information as well can duplicate code on both the service side and the client. In practice thin events are easier to get right, despite the extra API requests.
I also think the cases where there's some kind of outage, but it's an outage such that the information in the event is enough, are fairly rare. I would guess it's more rare than outages that also disable event triggering anyhow.
What's the reasoning for that statement? My other comments have detailed cases of data races, caused by using thin events, solved by using fat events, so I'm going to push back on that: In my experience, the idea that thin events are "less prone to data races" has not been true at all, and that data race is inherent in the model of "receive an event, go back and get more data over http about it". How would you qualify this as "a lot" - how much is that lot? Citations very much needed.
> I would also push back that it has more moving parts. You'll often need to pull information
"Often" is not always. You seem to be saying that when you don't get the signature benefit of fat events (not having to pull information) ... there's no benefit. Yes, that's a tautology, and also an encouragement to further refine the design until you do get the benefit.
> I also think the cases where there's some kind of outage but it's an outage such that the information in the event is enough
Are you speaking from experience here? Doesn't seem like it.
I refer you to point c) here https://news.ycombinator.com/item?id=33392655
It helps reliability to shorten the list of services that your service depends upon being 100% up and responsive.
Data races can occur with fat events because fat events don't usually store a snapshot of the entire world. They're often actually delta events, forcing the consumer to rely on caching (a racey affair) or just fall back to hitting the API anyway.
If you're having issues with thin events causing data inconsistency then the answer should be better data representation in your core API. If you had to solve it by cramming perfectly stale but self consistent data into a fat event, a better solution would be to persist such data and make it queryable, no?
But of course any way can be made to work. Any extra bit of data can be appended to the event details. Any number of separate DB and persistent queues can be maintained and backed up like the business depends on it.
In my experience thin events are just less bug prone because they're more simple. You're not worrying about stale data floating around an event queue or a client cache. You're not guessing what data the consumer needs. Having the event consumer pull data is usually a trivial cost difference.
Fat events have their advantages but I put them under premature optimization and YAGNI.
> fat events because fat events don't usually store a snapshot of the entire world. They're often actually delta events, forcing the consumer to rely on caching (a racey affair) or just fall back to hitting the API anyway.
I think you're saying "I don't like fat events because they're not really fat events", well, sorry that's your experience, I don't think that's a valid criticism of the actual thing at all, just of your poor experience of it.
> Data races can occur with fat events.. hitting the API anyway.
That happens with thin events as a matter of course, right? You stated that thin events are "lot less prone to data races" and now you're saying that they're the same? Where's the fat-event-specific issue that you alluded to? Citation not provided.
> Having the event consumer pull data is usually a trivial cost difference.
As I have stated twice before, it's a non-trivial _reliability_ difference, and that's the key.
>That happens with thin events as a matter of course, right?
No. If you send no data, what is the race? You can structure an API where there are no races. If you concede to API access the API can provide any sort of historical data necessary. The issue is with incomplete event data and assumptions around what the state was when the event was sent.
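One way to read "structure an API where there are no races" (purely illustrative, not something anyone here has described in detail) is to have the thin event name the version that triggered it, and have the API serve that specific revision rather than "whatever the state is now":

    import requests

    def handle(event: dict) -> None:
        # Thin event carries an id plus the version/sequence that triggered it.
        order_id, version = event["order_id"], event["version"]
        # Ask for that exact revision, so a source that has lagged or moved on
        # can't hand back surprising state.
        resp = requests.get(
            f"https://service-a.internal/orders/{order_id}",
            params={"as_of_version": version},
            timeout=2,
        )
        resp.raise_for_status()
        process(resp.json())

    def process(order: dict) -> None:
        ...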
>As I have stated twice before, it's a non-trivial _reliability_ difference, and that's the key
You can state it until you're blue but the advantage _is_ usually trivial or there would be no debate. It just doesn't bite that often.
I would say the buggy and prone to rot coupling of baked in event data is a bigger concern for most.
That said, I'm making an argument about what is the best bet for dev time, and it sounds like you're making an argument about what is best given infinite developer resources, unchanging APIs, and full knowledge of what data the consumer needs.
> No. If you send no data, what is the race? You can structure an API where there are no races.
You seem to be saying that the race is a problem when comparing 2 copies of the same data (yes) and that this is an issue for fat events (no, and misses the entire point).
A vague thin event contains at minimum an event type and an item id, e.g. "SomethingHappenedToAnOrder id:123456", which is _ahem_ two pieces of data that are sent. Events containing "no data" are not a thing, don't be absolute.
So there's potential for a race or inconsistency when you correlate that with a http api which might or might not have that order. You can't entirely get away from that.
> You can structure an API where there are no races.
I do not think that you understand "the fallacies of Distributed Systems"
> That said, I'm making an argument about what is the best bet for dev time
Sure, if you want to write as many bugs as possible as fast as possible, go ahead. (Yes, I realise that this is mischaracterising hyperbole, but you did the same by saying "infinite developer resources" etc above.)
It seems rare that every operation in a business is atomic across a single aggregate, so I’ve always been wary of events that are pure CRUD, whatever you choose to call them.
I agree in principle, but I also need action commands when crossing boundaries in and out of the event system (e.g. to the user, or to an external provider). Of course I still phrase them in the past tense:
Of course you need command actions. All I'm asking is that you stop mixing them up with events.
Whether you're mixing or not is hard to tell if you're fudging the wording.
What's the point of the "payment request sent to Stripe" event? Does it trigger a Stripe API request, or is it emitted after an API request to Stripe is made?