I am going off on a bit of a tangent here, but I have always wondered: those of you who run absolutely huge event-driven architectures, have you ever got yourself into a loop? I can't help but worry about this, as event systems are fundamentally Turing-complete, and with a complex enough system it doesn't seem too hard to accidentally send an event because of A which, several events later, causes A again.
Is it a common occurrence, and if it happens, is it hard to debug/fix? Do Kafka and other popular event systems have something to defend against it?
Debugging Windows events is challenging. I once created a reproducible deadlock using events. I had a Windows application with a notebook-computer user base, so I had to handle the power save/resume event for a service in the application. This should have been simple: just ensure that the service stopped without getting into a funky state, and restarted cleanly after resume. During testing, I accidentally discovered the power save event was going into a loop or hanging. Fortunately, Microsoft has a tool for this (DebugDiag - C#), and to my amazement it landed right on the problem area. This event (like most) can fire multiple times, and it did, so I had to add some extra locks. Also, the power save/resume code worked 100% on a virtual machine; the issue only manifested on real hardware.
It always bothers me that the systems I've worked on have their data flows mapped out in semi-up-to-date Miro diagrams. If that. There's no overarching machine-readable and verifiable spec.
Regarding the "technical" question: Kafka, or any log-based message broker (or any message queue), will not prevent this. Any service can publish/send and/or subscribe/receive.
As for whether it's a problem or a regular occurrence: no, really not. I have never seen this be a problem; I think that fear is unfounded.
Yes, it happens. A way to deal with it is carrying some counter on the message metadata and incrementing it every time a consumer passes it along, so you can detect recursions. Another is having messages carry a unique id, and consumers record already seen messages.
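A rough sketch of both ideas in Python, in case it helps. Everything here is made up for illustration (`Message`, `Consumer`, `MAX_HOPS`, `LoopDetected`); none of it is a Kafka API or a feature of any broker. It assumes the id is kept unchanged when a consumer passes a message along, so it behaves more like a correlation id than a per-message id.

```python
import uuid
from dataclasses import dataclass, field

MAX_HOPS = 8  # hypothetical limit; pick one that fits your longest legitimate chain


class LoopDetected(Exception):
    """Raised when a message has been passed along too many times."""


@dataclass
class Message:
    payload: str
    # Assumption: the id survives forwarding, so it identifies the whole chain.
    msg_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    hops: int = 0  # incremented each time a consumer passes the message along


class Consumer:
    def __init__(self, max_hops: int = MAX_HOPS):
        self.max_hops = max_hops
        # Unbounded in-memory set; a real system would need expiry or a shared store.
        self.seen: set[str] = set()

    def accept(self, msg: Message) -> bool:
        """Return True if the message should be processed, False if already seen."""
        if msg.hops >= self.max_hops:
            raise LoopDetected(f"message {msg.msg_id} reached {msg.hops} hops")
        if msg.msg_id in self.seen:
            return False  # this consumer has already handled this chain
        self.seen.add(msg.msg_id)
        return True

    def forward(self, msg: Message, payload: str) -> Message:
        """Build the outgoing message: same id, hop counter incremented."""
        return Message(payload=payload, msg_id=msg.msg_id, hops=msg.hops + 1)


if __name__ == "__main__":
    a, b = Consumer(), Consumer()
    first = Message("order created")
    if a.accept(first):
        to_b = a.forward(first, "handled by A")
        if b.accept(to_b):
            back_to_a = b.forward(to_b, "handled by B")
            # A has already seen this id, so the cycle stops here.
            print("A accepts again?", a.accept(back_to_a))
```

The two checks cover different cases: the seen-set stops a cycle as soon as it returns to a consumer that already handled it, while the hop counter is a backstop when ids get regenerated along the way. Keeping the id unchanged also means a legitimate fan-in of two messages from the same chain would be dropped, so that part is a trade-off rather than a free win.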
Do you consider it a requirement for every message?
Like, the problem sounds bad enough to warrant it. If not, how do you choose when to apply it?
Our architects have a habit of ignoring these kinds of issues, and when you suggest making things like this a requirement they accuse you of excessive concern!