Anti-patterns in event-driven architecture (codeopinion.com)
243 points by indentit 6 months ago | 163 comments



I often hear the argument in favor of event-driven architecture that you can work on one part of a system in isolation without having to consider the other parts, and then I get assigned some task which requires me to consider the entire system operation, now with events that are harder to trace than function calls would have been.

Now when people argue “because decoupling,” I hear, “You don’t get as much notification that you just broke a downstream system.”


You need to improve your telemetry to feel the benefits. I can trace all the way through multiple services easily on a simple detailed flame graph in our systems.

https://www.datadoghq.com/knowledge-center/distributed-traci...

Unless you have a single monolith, you’re going to face issues with versioning whether it’s event based or API based. In each case you can usually add new properties to a message, but you can’t remove properties or change their types. If you need that, create a new version.
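
Roughly, in TypeScript terms (made-up field names, just a sketch of the additive rule):

  // v1 consumers keep working because v2 only adds an optional field.
  interface OrderPlacedV1 { orderId: string; amount: number }
  interface OrderPlacedV2 extends OrderPlacedV1 { currency?: string }

  // Removing or retyping a field is breaking, so it gets a new versioned message instead.
  interface OrderPlacedV3 { version: 3; orderId: string; amountMinor: number; currency: string }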

The author does a lot of videos on the event sourcing topic. Event driven I get. It works well in several applications I’ve helped to build over the last 15 years. But event sourcing? I truly don’t get it. Yeah I get it’s nice in terms of auditing to see every change to an entity and who made it, or replay up to change x on y date, but that really is a niche requirement.


> I can trace all the way through multiple services easily on a simple detailed flame graph in our systems

I'm not sure what point is being made here. It's good that you can do that - but are you implying that that's not possible in an API-driven system?


There are many examples of event sourcing but perhaps the most classical one is the bank account.

It's not just about auditing, it's also about transactionality and atomicity.

If you want to withdraw $5 from your account, the traditional approach of locking, updating everything, and unlocking (in other words, wrapping everything in a transaction) doesn't scale as well as simply recording the transaction as an event. Implementation-wise this withdrawal can involve updating two accounts and updating the audit/account transaction logs. We also want this to scale, since our bank has millions of customers all operating more or less concurrently. A distributed log (like Kafka) is easy to scale and easy to reason about: you just insert the transaction record.

Another driver/flavour for something like event sourcing is what some might call state-based or state-oriented programming. That is, instead of modifying state directly you synchronize state via events. This lets you e.g. build state machines around those events, which again leads to code that is easier to reason about (and test).
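
A minimal sketch of both ideas together (made-up names, an array standing in for the distributed log): the ledger of events is the source of truth, and the balance is just a fold over it.

  type AccountEvent =
    | { type: 'Deposited'; amount: number }
    | { type: 'Withdrawn'; amount: number }

  const ledger: AccountEvent[] = []                  // stand-in for Kafka / the transaction log

  const record = (e: AccountEvent) => ledger.push(e) // a "write" is just an append, no balance row to lock

  const balance = () =>                              // current state is derived, not stored
    ledger.reduce((b, e) => e.type === 'Deposited' ? b + e.amount : b - e.amount, 0)

  record({ type: 'Deposited', amount: 100 })
  record({ type: 'Withdrawn', amount: 5 })
  console.log(balance())                             // 95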


As far as I know banks in my country use transactions and IBM mainframes, not distributed systems.


I once spoke with some engineers at Monzo and at the time (2018) financial transactions were handled via a combination of Kafka and Cassandra.


Do you know for a fact that banks use event sourcing for transactions? I thought they were extremely eventually consistent


I'm pretty sure different banks use vastly different architectures. Some run nightly batch jobs on mainframes that are written in COBOL. But alas, I've not worked for any bank, this is just a commonly used example. I am willing to bet the transaction log, or ledger, is indeed a very common approach since that's also common in accounting.

EDIT: Also event sourcing would typically be eventually consistent. I imagine for some banking applications a stronger consistency guarantee might be required, e.g. to prevent you from withdrawing the $100 in your account multiple times.


I’m sceptical, because I can’t find any examples of banks using it in production, just lots of blog posts by consultants and companies selling event sourcing solutions.

I did work somewhere that used a Kafka stack in production. It wasn’t a compelling use case and they spent almost an entire year on infra and productionizing it. It left me extremely sceptical about anything “big data streams” related :)


Fair enough. But I've used event sourcing in production in two companies. One project was a large scale distributed object store and the other was network equipment (like switches) and network management. The banking example is a classical one but I can't tell you if it's actually used in production. What I do know is that accounting software follows the ledger approach which has a similar spirit, recording transactions, and my guess would be regardless of technology banks also are transaction/event based as their source of truth (even in a COBOL mainframe batch processing scenario).


> Implementation-wise this withdrawal can involve updating two accounts

Can't wait to read the incident report when one of the consumers successfully receives and reads the message but the other doesn't.


Depends what domain you work in. Auditability is a key/mandatory requirement in a lot of regulated industries.

There are of course other ways to do auditability.

Event Sourcing + Projections provide a nice way to build multiple models/views from the same dataset. This can provide a lot of simplification for client code.


A niche requirement? There are big accounting firms who organize payrolls, where the changes that you mentioned are an important part of their business.

There are also other companies which, when they start their services, do the typical snapshot and roll up to the current time, because they need the data without having access to the database.


You don't need event sourcing to organize payrolls, even "at scale".


> I can trace all the way through multiple services easily on a simple detailed flame graph in our systems.

That's not exactly an obscure feature exclusive to datadog. From the top of my head, both AWS and Azure support distributed tracing with dedicated support for visualization in their x-ray and application insights services.


I doubt GP was suggesting it is unique to DD.


I think generally a lot of these types of problems were actually had by folks who grew out of single node systems and had a lot of interesting ideas to solve problems that were new in those domains, GIVEN they've already solved the stateful domain problems as well.

When you've never grown out of a single node domain but you do event driven "because scaling" or whatever, you've shot yourself in the foot amazingly hard.


Yes, events, async, eventual consistency, decoupling represent a difficult/complex solution for some hard problems encountered when scaling high.

But people often forget there are trade-offs to everything and if you don't have these hard problems, you're giving yourself only headaches.

My pet-peeve is "decoupling" - it's treated as holy with only benefits and no downsides. But it's actually again a level of complexity - unless you need it, tightly coupled code will be easier to write, read, debug etc.


Like anything it can be abused and sometimes folks go overboard with turning everything into an event. However, when done right, it is really amazing to work with.

As an event producer as long as you follow reasonable backwards-compatibility best practices then you should be pretty safe from breaking things downstream. As a consumer, follow defensive programming and allow for idempotency in case you need to reprocess an event. Pretty straightforward once you get the hang of things.
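
As a rough sketch of the consumer side (hypothetical names), assuming every event carries a unique id: dedupe before acting, so a redelivered or replayed event becomes a no-op.

  interface EmailRequested { eventId: string; to: string }

  const processed = new Set<string>()            // in practice: a table keyed by eventId

  function handle(event: EmailRequested) {
    if (processed.has(event.eventId)) return     // already handled, so replay/redelivery is safe
    console.log(`sending email to ${event.to}`)  // the actual side effect
    processed.add(event.eventId)
  }

  const e = { eventId: 'evt-1', to: 'a@example.com' }
  handle(e)
  handle(e)                                      // duplicate delivery does nothing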


> As an event producer as long as you follow reasonable backwards-compatibility best practices then you should be pretty safe from breaking things downstream.

That can protect you from "downstream can't even read the message anymore" but it doesn't help you with the much more common "downstream isn't doing the right thing with the message anymore" problem. Schema evolution is kinda like schema'd RPC calls vs plain JSON: it will protect you from "oops, we sent eventId instead of event_id" type of errors, but won't prevent you from making logical errors. In a larger org, this can turn into delayed-discovery nightmares.

A synchronous API call could give you back an error response and alert you immediately to something being wrong. The system notifies you directly.

A downstream event consumer may fail in ways entirely off of your team's radar. The downstream team starts getting alerts. Whether or not those alerts make it immediately obvious to them that it's your fault... that depends on a bunch of factors.


Data consumers in unknown teams is a nightmare regardless of the architecture.

Events sent for readership you can’t control are ideally of the type «x changed», and the consumer must then fetch data on the relevant endpoint.

That or the company must have serious versioning policies.


What's the benefit of using an asynchronous event driven system if you can't process any of those events without performing a synchronous query back on the same provider for all of the necessary data?


You get relevant notifications without polling or needing to sub, and you don’t have to be strict about message versioning.


Do you have pointers to such best practices? Gratefully received etc.


Design for things to be easily failable. It should be trivial for failed messages to go to a DLQ and then be reprocessed later on, say after a bug fix.

Only make additive changes, don't change existing fields. For enums it's up to the consumer to ensure they don't fail when a new case is added.

Be very careful with including data (especially time/expiry stuff) in the message too. If you need to reprocess the event several hours later then it may no longer work or be stale. Rather than include the data in the message itself, we would include the database ID and then have the consumer query for that entry.
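
A sketch of that last point (made-up names): the message carries only the id, and the consumer looks up the current row at processing time, so a reprocessed event still sees fresh data.

  interface InvoiceCreated { eventId: string; invoiceId: number }   // no amounts or expiry baked in

  const invoices = new Map([[42, { id: 42, status: 'open', amount: 120 }]])  // stand-in for the DB

  function handle(event: InvoiceCreated) {
    const invoice = invoices.get(event.invoiceId)   // fetch current state, not state at emit time
    if (!invoice || invoice.status !== 'open') return
    console.log(`sending reminder for invoice ${invoice.id}`)
  }

  handle({ eventId: 'evt-1', invoiceId: 42 })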


Rich Hickey's talk about "growth" (as opposed to change) of software systems is a good one for this.

Tldr: ok to add things. Not ok to remove things or change things


> now with events that are harder to trace than function calls would have been

I don't know how this could be true. Events are things - nouns which can be backed-up, replicated, stored, queried, rendered, indexed and searched over.


How is it not true? Instead of tracking data and function calls over a unified stacktrace, you track Things and Messages over databases, queues, and logs, none of which you can trivially attach a debugger to.

I generally like event-driven architecture, but I need to admit that debuggability is sacrificed where it matters most.


There's no "find usages" for events, and it becomes harder to find out why something didn't happen. A function call can't simply "not return" - in the worst case you get an exception, or a stuck thread in the caller that will show up in a stack dump. But downstream event processing can very easily just not happen, for one of many different reasons, and out-of-the-box it's often difficult to investigate.


> A function call can't simply "not return"

Remember "callback hell"? Assumption that a function call returns after running to completion requires rather specific synchronous cascading architecture, which WILL break in multithreaded code. Most of the multithreaded function calls will set a flag in shared memory and return early, expecting caller to poll.

If your API is based on a single entry point `invokeMethod(callee, method)`, it is equally as untraceable as the event entry point `fireEvent(producer, event)`.


> Most of the multithreaded function calls will set a flag in shared memory and return early, expecting caller to poll.

Which is exactly switching from function calls to event-driven architecture, and the problems with that are exactly the problems we're talking about.


The problems you describe are inherent to indirect invocations and are related to event-driven architectures only because typical event dispatching architecture is built on non-blocking calls.

You do not even need return-early (non-blocking) semantics for these problems to manifest. You can implement a giant string-keyed vtable for all methods in your program (or use a language with advanced reflection capabilities) and will have exactly the same problems. Namely there probably won't be tooling to match caller-callee pairs, which is the core issue here.


In JavaScript,

  const
    myEvent = 'myEvent', 
    target = new EventTarget()

  target.addEventListener( myEvent, () => {
    console.log( "It's easy to introspect well-organized code." )
  })

  target.dispatchEvent( new Event( myEvent ))


Yeah, good luck remembering to do that up-front for every event handler. You missed one? Whoops, enjoy the information you wanted silently not being there when you need it.


Remembering to do what? Properly maintain a list of constants and enums to use throughout my application?

That's not something I have to remember or forget, it's a simple habit that is as natural as importing and referencing a function.

As a general rule, numbers and string literals should never be hardcoded. Internalizing this should be a base expectation of any high-performing team member.


What I like about event driven is that you don't even need to know if anyone is listening to or cares about your event.

And as a consumer, many independent tasks can be triggered by the same event.

I'm working on a system right now and because of events, it's very easy for me to write a handler for when a certain type of record is created in the database. My feature depends on knowing that new record was made so we can send some emails and do other things.

The people that wrote the code that creates the record, didn't have to do anything to support the feature.

But I agree that it's not the right solution for every problem. But there are certain problems it solves really well.


> you don't even need to know if anyone is listening to or cares about your event.

Right up until you need to change something about the event because the business logic it represents has changed. Then you suddenly need to track down all the systems that have been relying on it, including that one that nobody knows anything about and always forgets exists because some guy decided to implement the service in erlang and nobody who ever touched it even works at the company anymore.


How is that any different for an API-driven architecture? You'd need to track down all consumers of your API you're wanting to make a breaking change to.


I really dislike this argument, because it puts the duty of managing dependencies and requirements directly on code. This is an organizational issue!

First, if your event (or whatever) changes enough that there are inter-component breakages, it means the engineering requirements must have changed, and tracing dependencies of requirements is an organizational thing.

Second, you either do trunk based development and constantly break downstream, or do leaf based development and have constantly out-of-date core dependencies. In any case, that's release version management, which is again an organizational thing.


And that's why a SAGA describes that flow.

Don't take it into consideration and you're fucked.

Source: previous "seniors" didn't take it into consideration, they left


> now with events that are harder to trace than function calls

Same issue as microservices: there are people who want to use the paradigm but not do the investment in monitoring/tooling.


Amen. Event-driven architecture makes it easier to bury your head in the sand, and harder to implement an actually-working feature.


Integration tests?


...happen too late


I have seen Kafka pulled out by its hairs and replaced with request based architecture.

Event driven architecture, to me is itself an antipattern.

It seems like a replacement for batch processing. Replayable messages are AWESOME. Until you encounter the complexity for a system to actually replay them consistently.

As far as the author's video, while there was some truth in there, it was a little thin compared to the complexity of these architectures. I believe that even though Kafka acts the part of "dumb pipe", it doesn't stay dumb for long, and the n distributions of Kafka logs in your organization could be 1000x more expensive than a monolithic DB and a monolithic API to maintain.

Yes it appears auditable but is it? The big argument for replayability is that unlike an API that falls over there’s no data loss. If you work with Kafka long enough you’ll realize that data loss will become a problem you didn’t think you had. You’ll have to hire people to “look into” data loss problems constantly with Kafka. It’s just too much infrastructure to even care about.

There’s also something ergonomically wrong with event-driven architecture. People don’t like it. And it also turns people into robots who are “not responsible” for their product. There’s so much infrastructure to maintain that people just punt everything back to the “enterprise kafka team”.

The whole point of microservices was to enable flexibility, smart services and dumb pipes, and effective CI/CD and devops.

We are nearing the end of microservices adoption whether it be event or request driven. In mature organizations it seems to me that request driven is winning by a large margin over event driven.

It may be counterintuitive, but the time to market of request driven architecture and cost to maintain is way way lower.


> I believe that even though Kafka acts the part of "dumb pipe", it doesn't stay dumb for long

In my experience programmers are very happy to do everything in the application (something database people often complain about). What kind of problems do you see?

> If you work with Kafka long enough you’ll realize that data loss will become a problem you didn’t think you had. You’ll have to hire people to “look into” data loss problems constantly with Kafka.

Not my experience at all, and I've used Kafka at a wide range of companies, from household-name scale to startups. Kafka is the boring just-works technology that everyone claims they're looking for.

I'm no fan of microservices, but Kafka is absolutely the right datastore most of the time.


> and the n distributions of Kafka logs in your organization could be 1000x more expensive than a monolithic DB and a monolithic API to maintain

Not to mention certain observability vendors bleeding you for all those logs you now need to keep an eye on.

Absolutely agreed on every point


The unseen critical part of the equation


I think the problem here is Kafka and not event driven architecture. I am a strong proponent of not using Kafka for events. It's wrong 90% of the time and for the other 10% you can find better solutions.

Also, people need to understand that "event driven" has nothing to do with "event sourcing". Just don't keep all the events until eternity, because you can (and because some people think you should because "kafka").


I haven't run into weird Kafka data loss issues like you describe - although, I will note, a lot of applications don't actually have much testing to notice something like 1 in 10k messages being dropped if it was happening.[0]

But when I've done that testing, Kafka hasn't been the problem.

The problem I've run into most is that ordering is a giant fucking pain in the ass if you actually want consistent replayability and don't have trivial partitioning needs. Some consumers want things in order by customer ID, other consumers want things in order by sold product ID, others by invoice ID? Uh oh. If you're thinking you could easily replay to debug, the size and scope of the data you have to process for some of those cases just exploded. Or you wrote N times, once for each of those, and then hopefully your multi-write transaction implementation was perfect!

[0] in fairness, a lot of applications also don't guarantee that they never drop requests at all, obviously. 500 and retry and hope that you don't run out of retries very often; if you do, it's just dropped on the ground and it's considered acceptable loss to have some of that for most companies/applications.


I would say that requiring in-order events is a huge anti-pattern. What guarantee do you have that they were actually produced in-order and all the clocks are in-sync enough to know that without a doubt?


What are the causes of data loss?


Jepsen has written a fantastic article on this issue. I'm not sure if it has been fixed since then. https://aphyr.com/posts/293-call-me-maybe-kafka


Not specifically about event-driven, but the most damaging anti-pattern I would say is microservices.

In pretty much all projects I worked with in recent years, people chop up the functionality into small separate services and have the events be serialised, sent over the network and deserialised on the other side.

This typically causes an enormous waste of efficiency and consequently makes applications much more complex than they need to be.

I have many times worked with apps which occupied huge server farms when in reality the business logic would be fine to run on a single node if just structured correctly.

Add to that the amount of technology developers need to learn when they join the project or the amount of complexity they have to grasp to be able to be productive. Or the overhead of introducing a change to a complex project.

And the funniest of all, people spending significant portion of the project resources trying to improve the performance of a collection of slow nanoservices without ever realising that the main culprit is that the event processing spends 99.9% of the time being serialised, deserialised, in various buffers or somewhere in transit which could be easily avoided if the communication was a simple function call.

Now, I am not saying microservices is a useless pattern. But it is so abused that it might just as well be. I think most projects would be happier if people had simply never heard about the concept of microservices and instead spent some time figuring out how to build a correctly modularised monolithic application first, before reaching for something more complex.


Also, the single most nonsensical reason that people give for doing microservices is that "it allows you to scale parts of the application separately". Why the fuck do you need to do that? Do you scale every API endpoint separately based on the load that it gets? No, of course not. You scale until the hot parts have manageable load and the cold parts will just tag along at no cost. The only time this argument makes sense is if one part is a stateless application and the other part is a database or cache cluster.

Microservices make sense when there are very strong organizational boundaries between the parts (you'd have to reinterview to move from one team to the other), or if there are technical reasons why two parts of the code cannot share the same runtime environment (such as being written in different languages), and a few other less common reasons.


Oh, it is even worse.

The MAIN reason for microservices was that you could have multiple teams work on their services independently from each other. Because coordinating work of multiple teams on a single huge monolithic application is a very complex problem and has a lot of overhead.

But, in many companies the development of microservices/agile teams is actually synchronised between multiple teams. They would typically have common release schedule, want to deliver larger features across multitude of services all at the same time, etc.

Effectively making the task way more complex than it would be with a monolithic application.


I've worked with thousands of other employees on a single monolithic codebase, which was delivered continuously. There was no complex overhead.

The process went something like this:

1. write code

2. get code review from my team (and/or the team whose code I was touching)

3. address feedback

4. on sign-off, merge and release code to production

5. monitor logs/alerts for increase in errors

In reality, even with thousands of developers, you don't have thousands of merges per day; it was more like 30-50 PRs being merged per day, and on a multi-million line codebase most PRs were never anywhere near each other.


Regarding monoliths... when there's an issue, everyone who made a PR is subject to forensics to try to identify the cause. I'd rather make a separate app that is infrequently changed, resulting in fewer faults and shorter investigations. Being on the hook to figure out when someone breaks something "related" to my team's code is also a waste of developer time. There is a middle ground for optimizing developer time, but putting everything in the same app is absurd, regardless of how much money it makes.


I'm not sure how you think microservices gets around that (it doesn't!).

We didn't play a blame game though... your team was responsible for your slice of the world and that was it. Anyone could open a PR to your code and you could open a PR to anyone else's code. It was a pretty rare event unless you were working pretty deep in the stack (aka, merging framework upgrades from open source) or needing new API's in someone else's stuff.


> I'm not sure how you think microservices gets around that (it doesn't!).

Microservices get around potential dependency bugs, because of the isolation. Now there's an API orchestration between the services. That can be a point of failure. This is why you want BDD testing for APIs, to provide a higher confidence.

The tradeoff isn't complicated. Slightly more work up front for less maintenance long term; granted this approach doesn't scale forever. There's not any science behind finding the tipping point.


> Microservices get around potential dependency bugs, because of the isolation.

How so? I'd buy that bridge if you could deliver, but you can't. Isolation doesn't protect you from dependency bugs and doesn't protect your dependents from your own bugs. If you start returning "payment successful" when it isn't; lots of people are going to get mad -- whether there is isolation or not.

> Now there's an API orchestration between the services

An API is simply an interface -- whether that is over a socket or in-memory, you don't need a microservice to provide an API.

> This is why you want BDD testing for APIs, to provide a higher confidence.

Testing is possible in all kinds of software architectures, but we didn't need testing just to make sure an API was followed. If you broke the API contract in the monolith, it simply didn't compile. No testing required.

> Slightly more work up front for less maintenance long term

I'm actually not sure which one you are pointing at here... I've worked with both pretty extensively in large projects and I would say the monolith was significantly LESS maintenance for a 20 year old project. The microservice architectures I've worked on have been a bit younger (5-10 years old) but require significantly more work just to keep the lights on, so maybe they hadn't hit that tipping point you refer to, yet.


50 PRs with a thousand developers is definitely not a healthy situation.

It means any developer merges their work very, very rarely (20 days = 4 weeks on average...) and that in my experience means either low productivity (they just produce little) or huge PRs that have lots of conflicts and are a PITA to review.


Heh, PRs were actually quite small (from what I saw), and many teams worked on their own repos and then grafted them into the main repo (via subtrees and automated commits). My team worked in the main repo, mostly on framework-ish code. I also remember quite a bit of other automated commits as well (mostly built caches for things that needed to be served sub-ms but changed very infrequently).

And yes, spending two-to-three weeks on getting 200 lines of code absolutely soul-crushingly perfect, sounds about right for that place but that has nothing to do with it being a monolith.


> Also, the single most nonsensical reason that people give for doing microservices is that "it allows you to scale parts of the application separately". Why the fuck do you need to do that? Do you scale every API endpoint separately based on the load that it gets? No, of course not. You scale until the hot parts have manageable load and the cold parts will just tag along at no cost. The only time this argument makes sense is if one part is a stateless application and the other part is a database or cache cluster.

I think it really matters what sort of application you are building. I do exactly this with my search engine.

If it was a monolith it would take about 10 minutes to cold-start, and it would consume far too much RAM to run a hot stand-by. This makes deploying changes pretty rough.

So the index is partitioned into partitions, each with about a minute start time. Thus, to be able to upgrade the application without long outages, I upgrade one index partition at a time. With 9 partitions, that's a rolling 10%-ish service outage.

The rest of the system is another couple of services that can also restart independently, these have a memory footprint less than 100MB, and have hot standbys.

This wouldn't make much sense for a CRUD app, but in my case I'm loading a ~100GB state into RAM.


> Why the fuck do you need to do that?

Because deploying the whole monolith takes a long time. There are ways to mitigate this, but in $currentjob we have a LARGE part of the monolith that is implemented as a library; so whenever we make changes to it, we have to deploy the entire thing.

If it were a service (which we are moving to), it would be able to be deployed independently, and much, much quicker.

There are other solutions to the problem, but "µs are bad, herr derr" is just a trope at this point. Like anything, they're a tool, and can be used well or badly.


Yes. There are costs to having monoliths. There are also costs to having microservices.

My hypothesis is that in most projects, the problems with monoliths are smaller, better understood and easier to address than the problems with microservices.

There are truly valid cases for microservices. The reality is, however, that most projects are not large enough to qualify to benefit from microservices. They are only large projects because they made a bunch of stupid performance and efficiency mistakes and now they need all this hardware to be able to provide services.

As to your statement that deploying monoliths takes time... that's not really that big of a problem. See, most projects can be engineered to build and deploy quickly. It takes a truly large amount of code to make that a real challenge.

And you still can use devops tools and best practices to manage monolithic applications and deploy them quickly. The only thing that gets large is the compilation process itself and the size of the binary that is being transferred.

But in my experience it is not out of the ordinary for a small microservice with just a couple of lines of code to produce an image that takes gigabytes of space and takes minutes to compile and deliver, so I think the argument is pretty moot.


Also - you give up type safety and refactoring. LoL


Well, technically, you can construct the microservices preserving type safety. You can have an interface with two implementations

- on the service provider, the implementation provides the actual functionality,

- on the client, the implementation of the interface is just a stub connecting to the actual service provider.

Thus you can sort of provide separation of services as an implementation detail.

However in practice very few projects elect to do this.
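
A rough sketch of that shape (all names hypothetical): one shared interface, the real implementation in the provider, and a stub in the client that just forwards over the wire, so callers keep compile-time checking either way.

  interface PaymentService {
    authorize(orderId: string, amountMinor: number): Promise<boolean>
  }

  // Lives in the provider service.
  class LocalPaymentService implements PaymentService {
    async authorize(orderId: string, amountMinor: number) { return amountMinor > 0 }
  }

  // Lives in the client: same interface, but the implementation is a remote call.
  class RemotePaymentService implements PaymentService {
    async authorize(orderId: string, amountMinor: number) {
      const res = await fetch('https://payments.internal/authorize', {
        method: 'POST',
        body: JSON.stringify({ orderId, amountMinor }),
      })
      return res.ok
    }
  }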


Even with this setup in place you need a heightened level of caution relative to a monolith. In a monolith I can refactor function signatures however I desire because the whole service is an atomically deployed unit. Once you have two independently deployed components that goes out the window and you now need to be a lot more mindful when introducing breaking changes to an endpoint’s types


You don't have to. The producer of the microservice also produces an adapter. The adapter looks like a regular local service, but it implements the call as a REST request to another microservice. This way you get type safety. Generally you structure the code as:

  Proj:
  |- proj-api
  |- proj-client
  |- proj-service

Both proj-client and proj-service consume/depend on proj-api, so they stay in sync on what is going on.

Now, you can switch the implementation of the service to gRPC if you wanted with full source compatibility. Or move it locally.


Can someone share some long term event driven success stories? Almost everything you see online is written by consultants or brand new, greenfield implementations, curious how long these systems last.


Chiming in with another “no” here. We adopted a message bus/event-driven architecture when moving a very popular piece of software from the cloud, to directly on the user’s device… it was a disaster IMO.

The core orchestration of the system was done via events on the bus, and nobody had any idea what was happening when a bug occurred. People would pass bugs around, “my code did the right thing given the event it got”, “well, my code did the right thing too”, and nobody understood the full picture because everyone was stuck in their own silo. Event driven architectures encourage this: events decouple systems such that you don’t know or care what happens when you emit a message, until one day it’s emitted with slightly different timing or ordering or different semantics, and things are broken and nobody knows why.

The worst part is that the software is basically “take user input, do process A on it, then do process B on that, then do process C on that.” It could have so easily been a simple imperative function that called C(B(A(input))), but instead we made events for “inputWasEmitted”, “Aoutput”, “Boutput”, etc.

What happens when system C needs one more piece of metadata about the user input? 3 PR’s into 3 repos to plumb the information around. Coordinating the release of 3 libraries. All around just awful to work with.

Oh and this is a very high profile piece of software with a user base in the 9 figure range.

(Wild tangent: holy shit is hard to get iOS to accept “do process” in a sentence. I edited that paragraph at least 30 times, no joke, trying every trick I could to stop it correcting it to “due process”. I almost gave up. I used to defend autocorrect but holy shit that was a nightmare.)


I think the true term for this phenomenon is "decoherence" rather than "decoupling". Your components are still as coupled as they ever were, but the coupling has moved from compile-time (e.g. function calls) to runtime. The component that "handles events" decoheres the entire system because it's now responsible for the entire messaging layer between components, rather than the individual components being responsible for their slice of the system.


That's a great name for that property. I've always cringed when people say 'something something decoupling' because most of the time the end result is actually just as coupled, but ends up indirected or something. Now I have a more specific word for it, thanks!!


> (Wild tangent: holy shit is hard to get iOS to accept “do process” in a sentence. I edited that paragraph at least 30 times, no joke, trying every trick I could to stop it correcting it to “due process”. I almost gave up. I used to defend autocorrect but holy shit that was a nightmare.)

can you not just pick the original spelling in the autocomplete menu above the keyboard?


We use an event driven architecture at work and find it works quite well, however events are for communicating between services across business domains and owned by different teams.

If you have some logic A and B running on user input, I wouldn't be splitting that across different services.


https://www.eventstore.com/case-studies/insureon

I can attest to this case study being 100% true. Our platform has been using EventStore as our primary database for 9 years going strong, and I'm still very happy with it. The key thing is that it needs to be done right from the very beginning; you can't do major architecture reworks later on and you need an architect who really knows what they're doing. Also, you can't half-ass it; event sourcing, CQRS, etc. all had to be embraced the entire time, no shortcuts.

I will say though, the biggest downside is that scaling is difficult since you can't always rely on snapshots of data, sometimes you need to event source the entire model and that can get data heavy. If you're standing up a new projector, you could be going through tens of millions of events before it is caught up which requires planning. It is incredible though being able to have every single state change ever made on the platform available, the data guys love it and it makes troubleshooting way easier since there's no secrets on what happened. The biggest con is that most people don't really understand it intuitively, since it's a very different way of doing things, which is why so many companies end up fucking it up.


Am I dumb or is this basically the binlog of your database but without the tooling to let you do efficient querying?

Like I get the "message bus" architecture when you have a bunch of services emitting events and consumers for differing purposes but I don't think I would feel comfortable using it for state tracking. Especially when it seems really hard to enforce a schema / do migrations. CQRS also makes sense for this but only when it functions as a WAL and isn't meant to be stored forever but persisted by everyone who's interested in it and then eventually discarded.


> Especially when it seems really hard to enforce a schema / do migrations

Enforcing the schema isn't too hard ime. But every migration needs to be bi-directionally compatible. That's likely what they meant with "you need an architect and can't make major changes later on"

It's the same issue you've had with nosql, even though you technically do have a schema


Pretty much. Also all your projectors need to be deterministic.

eg. Your commands have to ALWAYS do the same thing else replaying the event log does not produce the same output and then you’re back to square one.

It’s usually easier / more useful to just use an audit table.


Yes, if version 1.3 of Command handler X was the active version when an event happened, then it needs to be replayed with that version, even if you’re now on v4.5.


> Am I dumb or is this basically the binlog of your database but without the tooling to let you do efficient querying?

Yes, and I honestly think a traditional database that exposed this stuff would be a winner (but I guess it's too hard, and meanwhile event-sourcing frameworks are building better alternatives). Separating the low-level part from the high-level part that does things like indexing and querying has a lot of advantages: you decouple data storage from validation so you can have validated data without having to drop invalid data on the floor, you decouple index updates from data writes so your scaling problems get way simpler, you can get sensible happens-before semantics without needing transactions that can deadlock and all the crazy stuff that traditional databases do (secret MVCC implementations while the database goes out of its way to present an illusion of a single "current" version of the data, snapshotting that you can't access directly, ...).


For events you include a version. When you're only adding properties to the event or removing properties (assuming you defensively write the projectors), no need for a new version, but if you're creating a breaking change in event schema, you'd increment the version for the event and update your projector to handle each version. It's simpler than you'd think.
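
A sketch of such a projector (hypothetical event shapes): switch on the version and map old payloads into the current read model.

  type AddressChanged =
    | { version: 1; customerId: string; address: string }              // original shape
    | { version: 2; customerId: string; street: string; city: string } // breaking change -> new version

  function project(event: AddressChanged, view: Map<string, string>) {
    switch (event.version) {
      case 1: view.set(event.customerId, event.address); break
      case 2: view.set(event.customerId, `${event.street}, ${event.city}`); break
    }
  }

  const view = new Map<string, string>()
  project({ version: 1, customerId: 'c1', address: '1 Old Road' }, view)
  project({ version: 2, customerId: 'c1', street: '2 New Street', city: 'Oslo' }, view)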


Note that event sourced data and event based architecture are different things. You can have one without the other.


It's a type of event driven architecture, since events generated both hydrate models and trigger event listeners/subscribers. For example, a command to update a customer's address might also have a process manager listening for that event that kicks off a process to send an email notification. That same event is still used to event source the customer model which includes an address.

I suppose you could have event sourcing in a purely isolated manner where the events aren't used for anything else, but you'd be severely limiting the advantages that come free with that.


That’s what I mean. I worked on services that have event sourced db model but synchronous REST API. And I’ve worked on services that communicate with events for IPC but use relational sql for data store.

Your example uses the same events for both, so sure, that can be done, but it doesn't have to be. I haven't worked on a system like that personally, so maybe it can be fine.

But honestly I’m a bit skeptical since that removes services’ data sovereignty. Sounds like the recipe for “distributed monolith” architecture. And actually I just remembered another team at my company is ripping out their use of kafka as their data source on a green field project cuz it was a disaster, so skepticism emphasized.


I was the lead developer on one for an insurance company a few years back, and it’s still in active use. Insurance is a heavily regulated domain, where an audit trail is more important than performance. There was a natural pattern for it to follow, as we were mapping a stable industry standard.

I also tried doing it in a property setting, where profit margins were tight. The effort needed wasn’t worth the cost, and clients didn’t really care about the value proposition anyway. We pretty much replaced the whole layer with a more traditional crud system.


What did you mean by traditional CRUD as opposed to event-driven arch? How is it relevant to the subject under discussion?


Event-driven: At runtime, the client tells the system what has happened, the system stores the event and is configured in advance for how to react to it.

CRUD: Imperative. Client tells us to create/update a specific entity with some data.


When I did game dev I often went for an event driven approach or messaging based systems combined with OOP and state machines to prevent eventual consistency locally. It works great in that domain, albeit not being the most performant solution.

In web or business systems it works well for some(!) parts. You just shouldn’t do everything that way - but often people get too excited about a solution and then they tend to overdo it and apply it everywhere, even when not appropriate.

Always choose the golden middle path and apply patterns where they fit well.


Wrote a public transport ticketing system that processes 100-200K+ trips/day with sub-second push notifications to mobiles for trips/payments.

Event driven and CQRS "entities" made logic and processing much easier to create/test/debug.

Primary issues:

1. Making sure you focus on the "Nouns" (entities) not the "Verbs".

2. Kafka requiring polling for consumers sucks if you want to "scale to zero".

3. Sharding of event consumers can be complicated.

4. People have trouble understanding the concepts and keep wanting to write "ProcessX" type functions instead of state machines and event handlers.

5. Retry/replay is complicated, better to reverse/replay. Dealing with side effects in replay is also complicated (does a replay generate the output events which trigger state changes in other entities?)

Been running now for 6 years, minimal downtime except for maintenance/upgrades.

In the process of introducing major new entity and associated changes, most of the system unaffected due to the decoupling.


Can you elaborate on #1, Nouns over Verbs?


A lot of people focus on the process instead of the participating entities.

The focus when designing the system should be on the entities (Customer, Payment, Bill, Order, Inventory) instead of the processes (ordering, billing, fulfillment). I summarize that by saying "Nouns over Verbs".

The state of each of the entities is affected by the processes, but the effect happens from changes in other entities, Customers place an Order. Customers get a Bill for the Order, Customers make a Payment, etc.

The states of each of these entities is independent of the others and reacts/changes only as a result of two things, either an external "Command", or an "Event".

Commands are events that occur outside of the system boundary, usually visible as part of an API (if RESTful) that uses POST/PUT/DELETE or they are imperatives from one entity to another.

Commands are imperatives, Place Order, Pay Bill, Fulfill Order, etc.

Events are records of occurrences in the system, expressed in the past tense and are immutable. Order Placed, Bill Paid, Order Fulfilled.

Customers place an Order by POSTing to /orders (or potentially /customers/uuid/orders).

Events are generated from entities inside the system. (Order being placed generates an order_placed event).

The difference is that by focussing on the entities, and their state, independent of other entities, the entities can be created, tested, installed, evolved independently of other entities in the system.

The thinking about them is simplified and focussed, they are naturally decoupled because they can only find out about other entities by inquiring or affect other entities by generating a Command or an Event.

Any events they generate are processed asynchronously and can have multiple consumers.
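
A bare-bones sketch of that split (made-up names): the entity receives an imperative Command, decides, and records a past-tense Event; anything interested reacts to the event asynchronously without the entity knowing about it.

  // Command: an imperative, validated by the owning entity.
  interface PlaceOrder { customerId: string; items: string[] }
  // Event: an immutable record of what happened, in the past tense.
  interface OrderPlaced { orderId: string; customerId: string; items: string[] }

  const emitted: OrderPlaced[] = []            // stand-in for the event log/broker

  function handlePlaceOrder(cmd: PlaceOrder): void {
    if (cmd.items.length === 0) throw new Error('rejected: empty order')
    emitted.push({ orderId: `ord-${emitted.length + 1}`, ...cmd })  // the entity decides, then records the fact
  }

  handlePlaceOrder({ customerId: 'c1', items: ['sku-1'] })
  // A billing consumer later reads OrderPlaced and creates a Bill, decoupled from the Order entity.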


No. We have a complete fucking disaster on our hands.


How old of a system? Do you feel it’s the implementation, the design, or the concept itself that went wrong? Is your system a good fit?

(No stake in this one way or another, just curious.)


Less than 5 years. Vanity project. Built and maintained by astronaut architects. Entirely unnecessary. Poorly implemented down to the level of wire contracts being inconsistent. Overheads are insane both from engineering and operational POV.


Resume driven development never goes out of style


I call this one FDD: Fuckwit Driven Development. Because if it was resume-driven I'd expect it to be something that they would want to put on their resume. But this is unmentionable.


There's sayings along the lines of "Victory/success has a thousand fathers, but defeat/failure is an orphan."

Chances are that system, and its outcomes are described very differently on a resume


As long as the list of technologies used is impressive sounding you're on to a winner.


Hail, RDD, my favorite development style.


I work on an event-based architecture that I think is successful, but that’s because our core primitives are event-based, so there is no impedance mismatch in the way that there can be if you migrate from a request-response architecture to an evented one. Specifically, we aren’t trying to deal with databases and HTTP (both of which are largely synchronous primitives). Instead, I work on a platform for somewhat arbitrary code execution; and the code we are executing depends on our code rather than vice versa. In general, the code we execute on the platform can run for an indeterminate amount of time, and it generally has control and calls back into our code rather than our code calling into it. So our control flow is naturally callback-based rather than request/response; as a result, our system is fundamentally event-based.


I have been doing this kind of stuff both in ad tech and trust & safety industry, mainly to handle scalability. Something that looks like "Event-carried state transfer" here https://martinfowler.com/articles/201701-event-driven.html

These systems are working fine, but maybe a common ground:

* very few services

* the main throughput is "fact" events (so something that did happen)

* what you get as "Event-carried state transfer" is basically the configuration. One service owns it, with a classical DB and a UI, but then exposes the configuration to the whole system with this kind of event (and all the consumers consume these read-only)

* usually you have to deal with eventual consistency a lot in this kind of setup (so it scales well, but there is a tradeoff)


PostgreSQL.

The WAL is an event log, and when you squint at its internal architecture, you’ll see plenty of overlap with distributed event sourcing.


Likewise with git. There's the "top-level events" that you see (commits). But even when you're doing 'unsafe' operations, you're working with the lower-level reflog events.


Almost every modern software system. Anything running over the Web is event driven.


99.99% of the data we consume on the Web comes out of databases [call it Transactional-SQL-xxx or ColumnBased-yyy or Elastic-SaaS-zzz].


Well, yes. Databases are event driven themselves.

As is any web application, because the web (at least without sockets) is constrained into communicating events only.

Also, most local GUI applications, because people just like events better for it.


Right. The hard part is already done. Which makes it infuriating that it's all "internal". Every serious RDBMS already contains an implementation of an event-sourcing system, but you're not allowed to actually use it.


We've had mistakes that we've been able to course-correct from.

Our users are small-businesses with organisation numbers, and we mostly think of them as unique. But they strictly aren't, so we 'overwrote' some companies with other companies.

Once we detected and fixed the bug, we just replayed the events with the fixed code, and we hadn't lost any data.


AFAIK almost every stock market order processing system is event driven, and they are all usually very old systems that have been successfully running for years. I've seen some implementation in investment banks, what you're usually told is that most exchanges and banks run similar architectures. The reason for this is partially that FIX, the protocol for electronic orders in markets is event based.


It is a very convenient way to move higher latency operations from the realtime path to a near real time path. E.g. you want to send an email when a payment is authorized; you don't want to wait for the whole SMTP transaction, so you just post an event and reply back to the user. Also settlements of captured auths, that sort of thing. Even saving some user pref: start the task, reply back to the user, and if it fails, async send a failure msg.


I've seen successful, but flawed usage.

Every use I've seen sent events after database transactions, with the event not part of the transaction. This means you can get both dropped events, and out of order events.

My current company has analytics driven by a system like that. I'm sure there's some corrupted data as a result.

The main issue being people just don't know how to build and test distributed systems.


I had an interview where I was asked how I would guarantee that an event happened in addition to a database update (transactionally).

It sounded kind of impossible, I said as much, and then proposed a different approach. The interviewer persisted and claimed that it could be done with 'the outbox pattern'.

I disagreed and ended the interview there. Later when I was chatting about it with a former colleague, he said "Oh, they solved the two generals problem?"

> Every use I've seen sent events after database transactions, with the event not part of the transaction.

Maybe this is what they were doing.


I don't quite see what the outbox pattern has to do with the two generals problem.

The point of the outbox pattern is that a durable record of the need to send an event is stored in the DB as part of the DB txn, taking advantage of ACID guarantees.

Once you have that durable record in your DB, you can essentially treat your DB as a queue (there are lots of great articles on how to do this with Postgres, for instance) for some worker processes to later process the queue records, and send the events.

The worker processes in turn can decide if they want to attempt at-least-once or at-most-once delivery of the message. Of course if you choose the latter, then maybe your event is never sent, and perhaps that was the point you were trying to make to the interviewer.

The key takeaway though is that you are no longer reliant on the original process that stores the DB txn to also send the event, which can fail for any number of reasons and may have no path to recovery. In other words, at-least-once delivery is now an option on the table.
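
For concreteness, a toy in-memory version of what's described above (made-up names, Maps standing in for the DB and the broker): the business write and the outbox record succeed or fail together, and a separate relay publishes later, which is where at-least-once delivery comes from.

  const orders = new Map<string, { status: string }>()
  const outbox: { id: number; payload: string; sent: boolean }[] = []

  function placeOrder(orderId: string) {
    // In a real DB these two writes share one ACID transaction.
    orders.set(orderId, { status: 'placed' })
    outbox.push({ id: outbox.length + 1, payload: JSON.stringify({ type: 'OrderPlaced', orderId }), sent: false })
  }

  function relay(publish: (msg: string) => void) {
    for (const row of outbox.filter(r => !r.sent)) {
      publish(row.payload)     // may be retried after a crash, so consumers must tolerate duplicates
      row.sent = true
    }
  }

  placeOrder('o-1')
  relay(msg => console.log('published', msg))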


> I don't quite see what the outbox pattern has to do with the two generals problem.

Well then, hopefully you would have found it an unsatisfactory 'solution' and walked away from that interview too ;)

> Once you have that durable record in your DB, you can essentially treat your DB as a queue (there are lots of great articles on how to do this with Postgres, for instance) for some worker processes to later process the queue records, and send the events.

Yeah but I already have a queue I can treat as a queue. It's called Kafka.


You could two phase commit with XA compliant broker and database.


I'm a bit confused by the story. Why did you disagree?


They asked how I would guarantee that a Postgres update and a Kafka event could both happen transactionally.

  (P, K)
Which sounds like one of those classical impossibility proofs.

Their solution was to introduce another part into the system, "the outbox":

  (P, O) K
P and O can form a transaction, but that still leaves the original problem unanswered, how to include K in the transaction.


The goal of the outbox pattern is at-least-once publishing though, not only-once. You either get P + (eventually) at least one copy of K, or you get no P and no K.

Without the outbox you can get P without K or K without P, which lead to consumers out of sync with the producer.

This requires the consumer to be able to deal with repeated events to some extent, but you usually want that anyway, since an event can be processed twice even if it appears only once in the topic. For instance if the consumer crashes after processing but before updating the offset.


> The goal of the outbox pattern is at-least-once publishing though, not only-once.

Right, which is why it's an unacceptable solution to 'transacting over postgres and Kafka', and why I wouldn't want to work for a company that wants me to believe differently.

And there's a better solution anyway: just K.


I think you're just using "transaction" in a different sense than what the interviewer meant; "guarantee that an event happened in addition to a database update" sounds like at-least-once to me, and it's normally what you would want in this kind of system.


It's been an incredibly useful pattern for me in game development. I have a hard time imagining making a game with any level of complexity without it. You can definitely go overboard with it, but I have a hard time even imagining how some systems like collision detection/a physics engine could even work without it.


That's generally been my experience as well.

However I've seen some frameworks where you can do collision imperatively. For example

if (sprite.collide(tilemap)) {do something}

These are generally on smaller less taxing frameworks (in this case I'm referring to haxeflixel) but they do exist!


I've worked in an embedded Linux system that was a greenfield project. We needed a library that was written in a certain language, but we also wanted Python for the rest because getting the logic right with a client that changed his mind often was top priority and the data crunching was minimal.

So we ended up using protobufs over a local MQTT broker and adopted a macro-service architecture. This suited the project very well because it had a handful of obvious distinct parts, and we took full advantage of Conway's law by having each dev work on the part where their strengths and skills were maximized.

We made a few mistakes along the way but learned from them. Most of them related to inter-service asynchronous programming. This article put words to concepts we learned through trial and error, especially queries disguised as events.


Our system is command driven, and works well, but it is because we explicitly have less rigorous demands on the messages and the messages don't cross team boundaries. My past experience also makes me wary of event driven systems.


I saw it done well in manufacturing.

I think it works well when it's the only thing that can work.


The project I'm working on is about 13 years old (ruby on rails) with over 260 engineers and the product has a very robust event driven system that is at the core of a lot of important features.


Out of curiosity, what is the system? Is this based on Rails Event Store, or something else (custom?)?


Webhooks? Slack automation? GitHub actions?


Yes, it’s a great tool for integration. We have a product suite and it’s our chosen way to connect products.


No. The usual pains are:

- Producer and consumer are decoupled. That's a good thing, right? Good luck finding the consumer when you need to modify the producer (the payload). People usually don't document these things

- Let's use SNS/SQS because why not. Good luck reproducing producers and consumers locally on your machine. Third-party infra in a local env is usually an afterthought

- Observability. Or rather the lack of it. It's never out of the box, and so usually nobody cares about it until an incident happens


> Good luck finding the consumer when you need to modify the producer

It sounds like your alternative is a producer that updates consumers using HTTP calls. That pushes a lot of complexity to the producer and the team that has to sync up with all of the other teams involved.

> Let’s use SNS/SQS because why not. Good luck reproducing producers and consumers locally in your machine

At work we pull localstack from a shared repo and run it in the background. I almost forget that it's there until I need to "git pull" when another team has added a new queue that my service is interested in. Just like using curl to call your HTTP endpoints, you can simply send a message to localstack with the standard aws cli.

https://github.com/localstack/localstack
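For example, from Python with boto3 pointed at localstack's edge port (queue name and payload are made up; the queue is assumed to exist already):

  # Send a test message to an SQS queue running in localstack.
  import boto3

  sqs = boto3.client(
      "sqs",
      endpoint_url="http://localhost:4566",
      region_name="us-east-1",
      aws_access_key_id="test",
      aws_secret_access_key="test",
  )
  queue_url = sqs.get_queue_url(QueueName="orders-queue")["QueueUrl"]
  sqs.send_message(QueueUrl=queue_url, MessageBody='{"orderId": 123}')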

> Observability. Of rather the lack of it. It’s never out of the box, and so usually nobody cares about it until an incident happens

I think it depends on what type of framework you use. At work we use a trace-id field in the header when making HTTP calls or sending a message (SQS), which is propagated automatically downstream. This enables us to easily search logs and see the flow between systems. This was configured once and is added automatically for all HTTP requests and messages that the service produces. We have a shared dependency that all services use that handles logging, monitoring and other "plumbing". Most of it comes out of the box from Spring, and the dependency just needs to configure it. The code imports a generic SNS/HTTP/JDBC producer and doesn't have to think about it.
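In Python terms the idea is roughly this (our setup is Spring, so the real thing is just configuration; the header name and URL below are made up):

  # Rough shape of trace-id propagation across services.
  import logging
  import uuid
  import requests

  log = logging.getLogger(__name__)

  def handle_request(incoming_headers):
      # Reuse the caller's trace id, or start a new trace at the edge.
      trace_id = incoming_headers.get("X-Trace-Id") or str(uuid.uuid4())
      log.info("processing order", extra={"trace_id": trace_id})
      # Forward the same id on every outgoing call (HTTP here; for SQS it
      # goes into a message attribute), so logs across services share it.
      requests.post(
          "http://billing/api/charge",
          json={"orderId": 123},
          headers={"X-Trace-Id": trace_id},
      )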


> - Let’s use SNS/SQS because why not.

The number of times I've come across someone who's inserted SQS into the mix to "speed things up"...


> Good luck finding the consumer when you need to modify the producer (the payload)

I just grep for the event's class name.


> Can someone share some long term event driven success stories?

JavaScript


I think we have very different definitions of success


Banking, 7-ish years. Worked well for us in general. When I start needing increased confidence and truth, the effort level goes way up, but it can be done. Definitely still worth it; it has given us some solid benefits.

When I say increased, I mean we want the best answer but there are some answers the bank can’t know. If someone has transferred money into your account from another bank but we don’t know that yet, optimising for absolute correctness is pointless because the vast majority of wrong answers are baked in to the process. We can send you a message and you might read it a day later. Unless we delete the message from your phone, we can’t guarantee the message you read is fully consistent with our internal state.

Frankly our system is much better than the batch driven junk that is out of sync a second after it has executed. “Hey you have a reward.” “No I used it 2 hours ago you clowns.”

Note this isn’t cope. In some cases we started fully sync but relaxed it where there are tradeoffs that gave us better outcomes and we weren’t giving anything material up.


Or worse “hey you have a reward” but it doesn’t show up in the UI for three minutes. Twitter used to do this to me all the time.


Eventually consistent means just that, I guess. But I’m sure their reasoning was a lot more sophisticated or impacted by scale than most. Cool problem.


Does canbus count?


I've worked in a large company where some variation of event driven architecture was used everywhere and treated as the word of G-d. Fairly successfully. Mostly in applications that ran on a single machine.

I've ended up in a lot of arguments about this while we were building larger distributed systems, because I've come from more request/response oriented message passing architectures, i.e. more synchronous ones. What I've found is that the event driven architecture did tend to lead to fewer abstractions and more leaked internal details. This isn't fundamental (you can treat events like an API) but was related to some details in our implementation (something along the lines of CDC).

Another problem with distributed systems that pass events through persistent queues is that if the consumer falls behind you start developing a lag. Yet another consideration is that the infrastructure to support this tends to have some performance penalties (e.g. going through Kafka with an event ends up being a lot more expensive than an RPC call). Overall it IMO makes for a lot of additional complexity which you may need in some cases, but if you don't then you shouldn't pay the cost.

What I've come to realize is that in many ways those systems are equivalent. You can simulate one over the other. If you have an event based system you can send requests as events and then wait for the response event. If you have a request/response system you can simulate events over that.
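E.g. a toy sketch of request/response simulated on top of events with a correlation id (in-memory and Python just to show the idea):

  # Request/response on top of events: tag the request event with a
  # correlation id and block until the matching response arrives.
  import queue
  import threading
  import uuid

  bus = queue.Queue()   # stand-in for a topic
  pending = {}          # correlation id -> reply queue of the waiting caller

  def consumer():       # the "service" that reacts to request events
      while True:
          event = bus.get()
          if event["type"] == "price.requested":
              pending[event["correlation_id"]].put({"price": 42})

  threading.Thread(target=consumer, daemon=True).start()

  def get_price_blocking(item_id):
      corr = str(uuid.uuid4())
      pending[corr] = reply = queue.Queue()
      bus.put({"type": "price.requested", "item_id": item_id,
               "correlation_id": corr})
      return reply.get(timeout=5)   # the caller waits, just like an RPC

  print(get_price_blocking("sku-1"))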

If we look at things like consensus protocols or distributed/persistent queues then obviously we would need some underlying resources (e.g. you might need a database behind your request/response model). So... Semantics. Don't know if others have a similar experience but when one system is mandated people will invent workarounds that end up looking like the other paradigm, which makes things worse.

There are things that conceptually fit well with an event driven architecture and then there are things that fit well with a request/response model. I'm guessing most large scale complex distributed apps would be best supporting both models.


Every synchronous call is, in fact, asynchronous. We just hide it on the stack, in the return address, in the TCP connection etc. No call is really blocking anymore, the OS or the CPU will run some other thread. People who insist that things are simpler in a synchronous model are just ignoring the actual mechanics. Which is fine, that's just abstraction.


Well, a CALL instruction on your CPU is pretty much synchronous. But it's true that sometimes what looks synchronous is often asynchronous under the hood. The CPU itself is I guess "asynchronous", a bunch of flipflops and signals ("events") ;)

I can recall software where I tried to wrestle a bunch of asynchronous things into looking more synchronous, and then software where I really enjoyed working with a pure asynchronous model (Boost.Asio FTW). Usually the software where I want things to be synchronous is where I mostly want to execute a linear sequence of steps that depend on each other, without being able to use that time for doing other things. The software where I want the asynchronous model is where all things should happen at the same time all the time (e.g. taking in new connections over the network while serving existing ones), and spinning up threads for that is not a good fit, performance- or abstraction-wise.

The locality of the synchronous model makes it easier to grok, as long as you're OK with not being able to do something else while the asynchronous thing is going on. OTOH state machines (or statecharts, to go further), which are an inherently asynchronous view, have many advantages (but are not Turing complete).


> What I've found is that the event driven architecture did tend to lead to less abstractions and more leaked internal details. This isn't fundamental (you can treat events like an API)

I'd put it the other way: event driven architecture makes it safer to expose more internal details for longer, and lets you push back the point where you really need to fully decouple your API. I see that as an advantage; an abstract API is a means not an end.

> Another problem with distributed systems with persistent queues passing events is that if the consumer falls behind you start developing a lag.

Isn't that what you want? Whatever your architecture, fundamentally when you can't keep up either you queue or you start dropping some inputs.

> If you have a request/response system you can simulate events over that.

How? I mean you can implement your own eventing layer on top of a request/response system, but that's going to give you all the problems of both.

> If we look at things like consensus protocols or distributed/persistent queues then obviously we would need some underlying resources (e.g. you might need a database behind your request/response model).

Huh?

> Don't know if others have a similar experience but when one system is mandated people will invent workarounds that end up looking like the other paradigm, which makes things worse.

I agree that building a request/response system on top of an event sourcing system gives you something worse than using a native request/response system. But that's not a good reason to abandon the mandate, because building a true event-sourcing system has real advantages, and most of those advantages disappear once you start mixing the two. What you do need is full buy-in and support at every level rather than a mandate imposed on people who don't want to follow it, but that's true for every development choice.


Everything is a means and not an end but decoupling via an explicit API makes change easier. Spreading state across your system via events (specifically synchronizing data across systems via events relating to how that data changes) creates coupling.

re: Huh. Sorry, I was not clear there. What I meant is you cannot create persistent queue semantics out of a request/response model without being able to make certain kinds of requests that access resources. Maybe that's an obvious statement.

re: mandate. I think I'm saying these sorts of mandates inevitably result in poor design. Even the purest of pure event sourcing systems actually use request/response, simply because that is the fundamental building block of systems. E.g. a Kafka client makes an RPC-style request and waits for a response in order to inject something into a queue. The communication between Kafka nodes is based on messages. The basic building block of any distributed computer system is a packet (request) being sent from one machine to another, and a response being sent back (e.g. TCP control messages). A mandate that says thou shalt build everything on top of event sourcing is sort of silly in this context, since it should be obvious that the building blocks of event sourced systems use request/response. Even without this nitpicking, restricting application developers to build only on top of this abstraction inevitably leads to ugliness, IMO anyway, and I have seen this mandate at work in large organizations. Use the right tool for the job is more or less what I'm saying, or the other famous way of stating it: when all you have is a hammer, everything looks like a nail.

re: isn't that what you want. Well, if it is what you want then it is what you want, but many systems are OK with things just getting lost and not persisted. E.g. an HTTP GET request from a browser, in the absence of a network connection, is just lost; it's not persisted to be replayed later, so there is no way to build a lagging queue of HTTP GET requests that are yet to be processed. Again, maybe an obvious statement.


> decoupling via an explicit API makes change easier. Spreading state across your system via events (specifically synchronizing data across systems via events relating to how that data changes) creates coupling.

An explicit API comes at a cost; the way I'd put it is that the inherently lower coupling of events (because e.g. you can publish the same events in multiple formats, whereas a request-response API generally needs to have a single response format) means that you have more slack to defer that cost for longer (i.e. it takes you longer to reach the point where the overall system coupling is too bad and you need to introduce those API layers).

I'm not sure I follow what you're saying about sharing events relating to how data changes. IMO if you need a shared view of "the current version of the data", the right solution is to publish that as events too.

> E.g. Kafka uses gRPCs from the client and waits for a response in order to inject something into a queue. The communication between Kafka nodes is based on messages. The basic building block of any distributed computer system is a packet (request) being sent from one machine to another, and a response being sent back (e.g. TCP control messages).

I don't know the details of kafka's low-level protocols, but it's certainly possible to build these systems based on one-way messaging all the way down; gRPC has one-way messages, plenty of protocols are built on one-way UDP rather than TCP...

> e.g. an HTTP GET request from a browser, in the absence of a network connection, is just lost, it's not persisted to be played later, and so there is no way to build a lagging queue with HTTP GET requests that are yet to be processed.

Right, because HTTP is a request-response protocol. Whereas Kafka does buffer messages and send them later if you lose your network connection for a short time (of course there is a point at which it will give up and call your error handler).

I don't think the fact that HTTP works that way means it's desirable to just abandon those requests - e.g. in fact these days if you navigate to a page with Chrome when you have no network connection it will make the request when you're back online and send you a notification that it's loaded the page that it couldn't load earlier.


Usually when you're ingesting something into Kafka it's important to know whether that was successful or not, hence the more or less inherent request/response that's part of that. That said it's an interesting thought experiment to see how far you can go without that.
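With confluent-kafka, for example, the "was it successful" part is the delivery callback plus flush (topic name is made up):

  # Knowing whether ingestion into Kafka succeeded.
  from confluent_kafka import Producer

  def on_delivery(err, msg):
      if err is not None:
          print(f"delivery failed: {err}")   # retries/buffering exhausted
      else:
          print(f"delivered to {msg.topic()} [{msg.partition()}]")

  p = Producer({"bootstrap.servers": "localhost:9092"})
  p.produce("orders", b'{"orderId": 123}', callback=on_delivery)
  p.flush()   # block until the broker has acked (or failed) everything queued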

When I think of large scale success stories around the request/response model I think of AWS (where famously Bezos mandated APIs first) and Google. Both now have services that look more event oriented (e.g. SQS or Firebase). And of course the modern web (though the ugly hacks needed to make it look event driven were certainly not fun).

Events related to data changes are about keeping data structures in sync via events. Also known as state-based architecture. Something I worked on in the early 2000's kept a remote client/UI in sync with the server and a database using events like that and was a pretty lean/neat implementation.

Good one on Chrome re-making requests for an active tab once you're back online. That's certainly an interesting use case.

My intuition is that some things are very naturally events. Let's say a packet arriving at your computer. A mouse click. And some things are naturally a request/response. Let's say calculating the sine of an angle. You can replace x = sin(y) with an event, and an event that comes back, but that feels awkward as a human. Maybe not the best example...

It's another variation on the sync vs. async debates I guess. Coroutines or callbacks...



I’m starting to get a sad about event driven stuff.

I’ve used it with a good degree of success in some data pipeline and spark stuff to have stuff automatically kick off, without heinous conditional orchestration logic. I also use evented stuff over channels in a lot of my rust code with great success.

However, echoing the sentiments of some other comments: most articles about event driven stuff seem to be either marketing blogspam or “we tried it and it was awful”. To be honest I look at a lot of those blog posts and about half the time my thoughts are “no wonder that didn’t work out, that’s an insane design” but is that just “you’re-doing-it-wrong-cope”?

Are there success stories out there that just aren't being written? Are there just no success stories? Is the architecture less forgiving of poor design, and does this "higher bar of entry" torpedo a number of projects? Is it more susceptible to "architecture astronauts", which dooms it? Is it actually decent, but requires a somewhat larger mindset change than most people bring to it, leading to half-baked implementations?

I can’t help but feel the underlying design has some kernels of some really good ideas, but the volume of available evidence sort of suggests otherwise.


What are people's thoughts on using event driven architecture in games? Specifically multiplayer games, and massively multiplayer games (MMORPGs). Another comment mentions it was helpful: how specifically, were there any tradeoffs, and do certain types of games work better?


I have not worked on an MMO before, but recently I had the chance to try out my own custom event system on a small multiplayer game (Unity, PUN2). I had the most issues with differentiating which events came from which client. Additionally, I had lots of issues differentiating which events were issued locally. In the end, the code ended up being quite messy. If I were to redo the game, I'd use direct method calls with regular callbacks where possible.

Generally, I found that when using event systems you have to be really careful not to overuse them, even in small/single-player games. It's super hard to debug when everything is an event; if you go this route, you essentially end up in a situation where everything is "global" and can be reached from anywhere (might as well just go full singleton mode at that point). Additionally, I found it difficult to deal with event handlers which raise other events, or worse, async events, as it becomes really hard to ensure the correct order of invocations.

If you plan to use an event system, my advice would be (in Unity):

- Reference and raise events only on root Game Object scripts (e.g., have a root "Actor" script which subscribes/publishes events and communicates with its children via properties/C# events)

- Never subscribe or publish events in regular "child" components

- Use DI/service locator to fetch systems/global things and call them directly when possible from your "Actors"


I always thought it would be an interesting exercise to build an event-based controls system. There's a lot of triggering action X based on event A, and actions based on composite events. I never found anyone who had done it.

Edit: I should say I never saw one in the wild; a quick search found some academic projects https://scholar.google.com/scholar?q=event-driven+control+sy...


My 2 cents: there is no anti-pattern specific to event driven; it is essentially about its asynchronous nature. It means you start by understanding the business needs and SLA. The question that often comes up, in my experience, is "can they wait?", and what's the risk of dirty data or data fetched with a delay (worst case). Event driven is always about the worst case scenario and whether it will still work then.


The main anti-pattern is making the wrong choice: using async when sync fits better.


"Event driven architecture ". Mēh!

There is no avoiding it when dealing with, erm, events.

Events are things that happen where you cannot predict exactly when, where, or what.

The user clicked the mouse

The wind changed direction

Using Events to signal state change from one part of a system to another is a bad idea. Use a function call.

A rule of thumb: if the producer and the consumer are in the same system, then "Event Driven Architecture" is the anti-pattern


What if one event should trigger many actions?


Our system at work is a command driven system. We don't use messages as a source of state, but really just as async instructions. And we store them, which can be useful for retrying, data fixes, and stats.
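The shape of it is roughly this (a sketch, not our actual code; table and column names are made up, assuming Postgres and psycopg2):

  # Commands are rows; a worker claims pending ones and records the outcome,
  # which makes retries, data fixes and stats a plain SQL query.
  import json
  import psycopg2

  conn = psycopg2.connect("dbname=app")

  def enqueue(command_type, payload):
      with conn, conn.cursor() as cur:
          cur.execute(
              "INSERT INTO commands (type, payload, status) VALUES (%s, %s, 'pending')",
              (command_type, json.dumps(payload)),
          )

  def work_one(handlers):
      with conn, conn.cursor() as cur:
          cur.execute("""SELECT id, type, payload FROM commands
                         WHERE status = 'pending'
                         FOR UPDATE SKIP LOCKED LIMIT 1""")
          row = cur.fetchone()
          if row is None:
              return
          cmd_id, cmd_type, payload = row   # payload assumed to be a text column
          try:
              handlers[cmd_type](json.loads(payload))
              cur.execute("UPDATE commands SET status = 'done' WHERE id = %s", (cmd_id,))
          except Exception:
              cur.execute("UPDATE commands SET status = 'failed' WHERE id = %s", (cmd_id,))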

I feel like a lot of teams out there can probably benefit from this simpler approach - it's probably what a lot of people are doing unwittingly.


Event-driven architecture should either be implemented across the complete system (client to backend) or be used in just a single feature, i.e. it needs to be all or the bare minimum, else it's just an absolute mess.


Went in thinking I would find out a few pitfalls for the event-driven app I'm writing...

> Commands only have a single consumer. There must be a single consumer. That’s it. They do not use the publish-subscribe pattern.

...oops.

Now the question is how much (more) time I want to spend on a(nother) rewrite.


[flagged]


But if you pay them you can be in their private discord...


traffic is just events so like


Eh, it's a Wordpress site.



