There's more than one flavor of this particular arch, but Event Sourcing in general is simply not very useful for most projects. I'm sure there are use cases where it shines, but I have a hard time thinking of any. Versioning events, projection, reporting, maintenance, administration, dealing with failures, debugging, etc etc are all more challenging than with a traditional approach.
Two of the projects I worked on used Event Store. That was one of the least production ready data stores I've encountered (the other being Datomic).
I see a lot of excitement about CQRS/ES every two years or so (since 2010) and I strongly believe it is the wrong choice for just about every application.
I'd love to read more about the difficulties you've faced, and overcome.
For migration of immutable events, there's a good research paper that outlines five strategies available: multiple versions; upcasting; lazy transformation; in-place transformation; copy and transformation. The last approach even allows you to rewrite events into an entirely new store.
The Dark Side of Event Sourcing: Managing Data Conversion
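Of those five strategies, upcasting is the easiest to sketch. Here's a minimal, hypothetical example (the event shapes and field names are invented for illustration): old event versions are lifted to the current schema as they are read, so the stored events themselves stay untouched.

```python
# Hypothetical "upcasting" sketch: v1 events stored one "address" string;
# v2 splits it into street/city. Readers upcast on the fly.

def upcast_v1_to_v2(event):
    street, _, city = event["address"].partition(", ")
    return {"type": "CustomerMoved", "version": 2,
            "street": street, "city": city}

# Registry mapping (type, version) to the function that lifts it one version.
UPCASTERS = {("CustomerMoved", 1): upcast_v1_to_v2}

def read_event(raw):
    """Apply upcasters until the event reaches the current version."""
    while (raw["type"], raw["version"]) in UPCASTERS:
        raw = UPCASTERS[(raw["type"], raw["version"])](raw)
    return raw

old = {"type": "CustomerMoved", "version": 1,
       "address": "1 Main St, Springfield"}
current = read_event(old)
```

The registry approach chains naturally: a v1 event passes through v1→v2, then v2→v3, and so on, which is why upcasting stays manageable as versions accumulate.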
CQRS is a Good Thing(tm); for a real-world example of it in work on a not-so-shabby system processing 10B+ transactions/year - see http://ithare.com/gradual-oltp-db-development-from-zero-to-1... .
ES, however, is more controversial. If speaking about "pure" ES (i.e. not having any mutable state, and reconstructing current state from input events all the time) - versioning and potential synchronization failures (and synchronized access is a prereq for event sourcing) will kill it very quickly (and I didn't even start speaking about performance, which is going to be a very serious challenge).
OTOH, if understanding ES just as an ADDITION to classical mutable-state processing, it can be made very useful. Not only will ES serve as a perfect audit; the duplication of information (once in mutable state and once in input events) will allow such things as regression testing, and fixing data problems caused by bugs, within the DB. BTW, with this model, the latter can be done as a post-factum fix, and this, unlike "pure-ES" fixes, is not confusing to readers who already got and stored the previous state of the DB. With "pure" ES, after the fix, all the history can change, invalidating all the data which might have been stored by third parties, and this is really crazy - imagine if your bank statements changed overnight. With an "ES+mutable" model, bugs can still be identified, and the effects of the bugs can be found too - and then a separate correcting transaction can be issued against the DB, which is a much better match for the vast majority of existing business processes.
Hope it makes sense :-) (it is admittedly very sketchy, but forum is not a good place to elaborate further)
I've been following your projects on Github for a while, good work—I don't necessarily agree with all of the design choices but we've built on the eventstore at work and I'm going to be using it on another project in the near future.
Here's the section of their manual related to this.
Akka Persistence is the other JVM ES framework that I'm familiar with. It supports upcasting as well.
None of these are pure as-originally-outlined-by-Fowler CQRS/ES, but I'm willing to suggest they are paradigm equivalent, real world successful examples:
1. Basically all double-entry accounting/book-keeping/core banking systems.
2. Many RDBMSes. Specifically the replayable log structure of transactions.
3. I sell a service that includes two event-sourced data structures mutated by domain-specific commands. They are used for collaborative decision making in sports management. This aspect works very well indeed: the resulting characteristics are intrinsic to ES structures, are (AFAIK) unique in our market, and represent one of our most customer-retaining capabilities.
The only place I'd recommend it these days are where the business views their state as an event stream, maybe finance/stocks. Not developer-forced-events like "customer address updated" or "user email changed".
Even workflow systems I've dealt with, the business doesn't view their state as an event stream. The state is where it is, how it got there is an interesting footnote.
As for "not production ready" with regards to Event Store specifically, we have had zero reported incidents of data loss which were not the result of catastrophic failure of the hardware (or, usually, the cloud) it was running on.
After the CQRS/ES project failed (I was on cleanup duty) we used a more traditional arch. To handle the audit log we just had a separate table. ("customer" table had "customer_audit" table. Both were written to in transaction. Solved.)
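A minimal sketch of that audit-table approach, using SQLite for brevity (table and column names are illustrative, not from the actual project): both writes happen inside one transaction, so the audit row and the state change commit or roll back together.

```python
import sqlite3

# In-memory DB standing in for the real store; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT)")
conn.execute("""CREATE TABLE customer_audit
                (id INTEGER PRIMARY KEY AUTOINCREMENT,
                 customer_id INTEGER, old_address TEXT, new_address TEXT)""")

def change_address(conn, customer_id, new_address):
    # "with conn" commits on success and rolls back on exception, so the
    # state write and the audit write are atomic together.
    with conn:
        row = conn.execute("SELECT address FROM customer WHERE id = ?",
                           (customer_id,)).fetchone()
        old = row[0] if row else None
        conn.execute("INSERT OR REPLACE INTO customer (id, address) "
                     "VALUES (?, ?)", (customer_id, new_address))
        conn.execute("INSERT INTO customer_audit "
                     "(customer_id, old_address, new_address) VALUES (?, ?, ?)",
                     (customer_id, old, new_address))

change_address(conn, 1, "1 Main St")
change_address(conn, 1, "2 Oak Ave")
```

The appeal of this over full ES is that the current state is always directly queryable, while the audit table still answers "how did it get here".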
The big issues didn't really have that much to do with the persistent store of events. The bigger issue was the fact that as new features get added to your application, your event payloads change. Well, in order for projection (particularly re-projection in the case of an issue) to work, your code needs to know how to read and process all versions of every event. Of course, there are techniques like snapshotting to give you point in time "good states" so you can eventually deprecate some of these events, but thinking about it is challenging.
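The snapshotting technique mentioned above can be sketched in a few lines (the event shapes here are invented for illustration): persist the accumulated state plus the stream position it was taken at, and a rebuild only needs to replay events after that position.

```python
# Hypothetical snapshot-and-replay sketch. apply() is a trivial reducer.

def apply(state, event):
    state = dict(state)
    state[event["field"]] = event["value"]
    return state

events = [{"field": "email", "value": "a@example.com"},
          {"field": "address", "value": "1 Main St"},
          {"field": "address", "value": "2 Oak Ave"}]

# Take a snapshot after the first two events: store (state, position).
snapshot_state, snapshot_pos = {}, 0
for snapshot_pos, e in enumerate(events[:2], start=1):
    snapshot_state = apply(snapshot_state, e)

# A rebuild now only replays events after the snapshot position,
# so events before it can eventually be deprecated.
state = snapshot_state
for e in events[snapshot_pos:]:
    state = apply(state, e)
```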
Additionally, most folks argue that the event store should be immutable. This is great until some kind of bad event gets in there. Now your code needs to know how to read, and discard, this bad event forever (or until a snapshotted point in time).
Finally, projection is not the panacea the evangelists will have you believe. Inevitably there will be syncing issues between the event store and the projected database/Elasticsearch cluster/Mongo instance, whatever. And what do you do then? Re-project! But that is not easy :)
- We use Postgres as the event store and all of our other projections are stored in the same database.
- Our app is not distributed. We use a single database.
- Our event store is not immutable. Instead we will run migrations to rewrite the events, delete events, etc. You have to either deal with the complexity of maintaining two versions of code or the complexity of migrating. I've found the latter is a fixed cost (do it once and move on) vs. a variable cost (continue to deal with two versions of code).
- Our commands aren't async. They are executed inline.
- We don't do any snapshotting.
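That setup can be sketched very compactly (SQLite stands in for Postgres here, and the schema is illustrative): with events and projections in the same database and commands executed inline, a command handler can append the event and update the projection in one transaction, so there is no eventual-consistency gap to manage.

```python
import json
import sqlite3

# Events table and a "customers" projection live in the same database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
             "type TEXT, data TEXT)")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, address TEXT)")

def move_customer(conn, customer_id, to_address):
    """Command handler: append the event and update the projection atomically."""
    with conn:
        conn.execute("INSERT INTO events (type, data) VALUES (?, ?)",
                     ("CustomerMoved",
                      json.dumps({"id": customer_id, "to": to_address})))
        conn.execute("INSERT OR REPLACE INTO customers (id, address) "
                     "VALUES (?, ?)", (customer_id, to_address))

move_customer(conn, 1, "1 Main St")
move_customer(conn, 1, "2 Oak Ave")
```

This trades away the scaling properties of async projections, but for a small single-database app it removes most of the failure modes discussed elsewhere in this thread.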
Granted, we don't have a lot of users on our app (~50 active users) but overall this has been a positive experience.
 The Dark Side of Event Sourcing: Managing Data Conversion http://files.movereem.nl/2017saner-eventsourcing.pdf
Storing data as immutable events implies that all data ever generated by your application becomes available to future versions of your application. Writing an application that can handle all the forms of your data across time is obviously more complex but it's a necessary consideration if you decide to go with event sourcing. Unfortunately, you cannot have all the benefits of available and useful unaltered historical data without also putting in the engineering effort to support it.
It uses Elixir streams to provide composable transforms to rewrite, aggregate, remove, and alter serialization format of events.
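For readers outside Elixir, the same composable-transform idea can be sketched with Python generators (the event shapes and transform names here are invented): each migration step is a lazy transform over the stream, and steps compose by nesting.

```python
# Generator-based analogue of stream-composed event migrations.

def rename_type(events, old, new):
    """Rewrite events of one type to a new type name, lazily."""
    for e in events:
        yield {**e, "type": new} if e["type"] == old else e

def drop_type(events, unwanted):
    """Remove events of a given type from the stream, lazily."""
    return (e for e in events if e["type"] != unwanted)

source = [{"type": "AddressUpdated", "v": 1},
          {"type": "DebugNoise", "v": 1},
          {"type": "AddressUpdated", "v": 2}]

# Compose: rename the CRUD-ish type, then drop the junk events.
migrated = list(drop_type(rename_type(source, "AddressUpdated",
                                      "CustomerMoved"),
                          "DebugNoise"))
```

Because each step is lazy, a pipeline like this can migrate an arbitrarily large store without loading it all into memory.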
The one area we've struggled with in the architecture is the query/projection side. Recently we have decided to completely separate the projection logic into a different application that is deployed independently to mitigate some of the issues we have been having.
The main issue we have been running into is a performance bottleneck after restarting the application. Currently, when the application restarts (i.e. after a deployment) the projections may not update for several minutes. Our hypothesis is that if we can run projections independent of the command side of the application they can run indefinitely (i.e. as Spark jobs), since in theory a projection's code should not need updating. If we need to make changes to a projection then we will deploy another version to run side-by-side with the existing projection until the consumers migrate to the new version.
Even where there are smart, capable, technically adept folks in place, it in no way means that they have full command of the implications and the ways and means of event sourcing.
In my experience, event sourcing is useful in most projects - even, contrary to an earlier comment, for the user and user profile concerns.
I've worked with EventStore. I've wanted to smash EventStore out of existence on a number of occasions. On some of those occasions, I was at fault. And on others, I found the user experience (as an implementer) misleading, ambiguous, and ultimately costly. I'm quite frank with James and Greg about what I think EventStore should be achieving, and have been quite public about it in the past, so I won't rehash that here.
I guess I haven't seen the ebb and flow of excitement, though (since 2007-ish when I started crossing paths with Greg and Udi).
I do see a steady rise in awareness of it since then, and I see an increasing number of successful projects. There's also more people helping other people and shops with implementation guidance and safety.
And there's bound to be more failures - purely as a matter of numbers - just as there are with any platform or tool where the learning was underestimated, or the grasp of the whole was overestimated.
Yep, you can fail with event sourcing. You can fail with Rails. It will take you longer to fail with Rails, though. And for an organization that isn't into the learning as a matter of course, failing over a longer term can be an important and empowering strategy.
I have two sets of customers: those whom I help slow the looming failure of their Rails projects, and those whom I get started on event sourcing. It's not always a good cultural fit. But I have yet to see a domain that's worth the expenditure of a software development team that is somehow naturally inappropriate for event sourcing.
Conceptually it wasn't really a fit with the team or our application either. The query model and direct reading from the storage medium seemed silly - why not just use the underlying storage medium directly for everything? (Besides, now you have to administer both Datomic and Cassandra/Oracle/whatever.) Not worth it for the obtuse query approach and limited value of "never forgetting" imo :)
While I personally would love to use Datomic for just about everything, the fact that I need at least three machines running for even the simplest app (the actual database, the peer, and the transactor), and that those machines can't be the cheapest machines (you need enough RAM to cache datoms on the client, or things will be slow), means Datomic is something I can rarely afford in practice.
Probably one the nicest systems I've ever helped build. We had a very good lead though.
CQRS is a highly understandable, almost always reasonable technique to follow when you strip away all the other things that people tend to associate it with.
This is a pretty decent article on that: https://lostechies.com/jimmybogard/2012/08/22/busting-some-c...
That being said:
- Dealing with failures can be better than traditional systems if done properly. For example, we have services that, if they fail, won't bring down the entire system. However, this does require you to be more explicit about how you handle errors.
- I have found debugging to be easier. When an error occurs, we can trace it back to the exact command and the events it generated. This allows us to see 1. the exact state of the system at the time the error-producing command was generated and 2. the exact command that was executed. From this we can easily reproduce the error.
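That traceability comes from recording, on every event, the id of the command that caused it. A minimal sketch (metadata field names here are illustrative, not from any particular framework):

```python
import uuid

# Every event carries a causation_id: the id of the command that produced it.
event_log = []

def handle(command):
    """Hypothetical handler that emits one event per command."""
    event = {"type": "CustomerMoved",
             "data": command["data"],
             "causation_id": command["id"]}
    event_log.append(event)
    return event

cmd = {"id": str(uuid.uuid4()), "type": "MoveCustomer",
       "data": {"to": "2 Oak Ave"}}
handle(cmd)

# Given a problem event, walk back to the exact command that caused it.
culprit = event_log[-1]["causation_id"]
```

With the originating command in hand, replaying the stream up to that point reconstructs the exact state the system was in when the bug fired, which is what makes reproduction straightforward.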
I have covered "versioning events" in my other comment. Please be more specific about "projection, reporting, maintenance, administration". What exactly were the challenges there?
I understand that ES is not a silver bullet but I would like others to have a clear understanding of the tradeoffs to traditional systems.
For example, say you fire an "address change" event. That gets sent to the event store for eventual projection into a medium you can actually query from (realtime querying of the event store itself is the road to very bad places, I promise). So now your event is sent along, but how do you know when it has been processed/projected? What if there was a conflict with another address change event from somewhere else? How does that bubble back up?
The typical solution is to pass back some kind of "receipt token" so you can poll to see when your event is processed, then do your read from the projected database, or whatever. Of course, this can be made to work, but once you start talking edge cases and the need to support standard UX paradigms, polling for every update and handling error scenarios in this way becomes painful.
This really makes me think you—and the originators of these projects you're lambasting—are working from an incomplete understanding of how to apply CQRS and ES. If you're applying CQRS in a fully async fashion, polling itself is an antipattern. That receipt token? That's what everyone else calls an ID. You know when it's been processed because the subscription you should be watching tells you it's been processed.
You mentioned changing event payloads in another thread. That's another big code smell to me. In a stable, well-understood domain, your payloads don't change much. If you're applying ES to a domain that ISN'T well-understood, you need to do a LOT of discovery ahead of time, or be prepared to iterate on your data until you do. It sounds like the projects you were on failed on those accounts.
Yeah, ES is hard when you keep trying to treat it like CRUD. It's overkill when you apply it to an easy domain, and it's an antipattern when you write CRUD events. So don't do those things.
Yes, some of the systems used subscriptions too, which had their own set of issues.
Additionally, domains are almost never completely understood. Even if they're well understood today, things will change tomorrow. CQRS/ES in your own words is not good when requirements change. Well guess what? That's every system I've ever worked on.
If you've had success building non-trivial CQRS/ES applications I'd love to hear more specifics about how you solved all the other issues I've presented.
> Additionally, domains are almost never completely understood. Even if they're well understood today, things will change tomorrow. CQRS/ES in your own words is not good when requirements change. Well guess what? That's every system I've ever worked on.
It's still possible to design individual components that don't require constant churn of their application state. Most software teams are incapable of this, and event sourcing is not for them, even in domains where it shines (like finance).
In my experience, when teams have solid leadership, you can get your software pretty close to the target the first time you build it. Minor course corrections are straightforward, if sometimes tedious. When the business experiences big pivots, much of what you've built can be reused providing it's modular and does not make assumptions about the overall system. The rest can be discarded.
That's a big departure from the topic at hand, but my point is that if your software isn't modular, event sourcing in particular will amplify the pain you feel.
I recommend that anyone thinking about doing CQRS/ES find someone who is an expert to help guide them or their team.
Well, when you base an argument on a set of known antipatterns, you shouldn't feign surprise when someone points out that you're basing your argument on known antipatterns.
>Additionally, domains are almost never completely understood. Even if they're well understood today, things will change tomorrow. CQRS/ES in your own words is not good when requirements change. Well guess what? That's every system I've ever worked on.
The first point is flat out untrue. There are domains of expertise with literal centuries of knowledge and practice in them. There are many, many more with decades. And many, many more with years. Startups measure knowledge in weeks and months. This is not a suitable playground for ES.
Secondly, I didn't say CQRS/ES was unsuitable when requirements change. I said it required a lot more work when the domain was not well understood—and that the work was primarily in understanding the domain.
I've used some combination of these patterns on nearly every system I've worked on for the last 7 years. That spans medical billing, ticketing, public health, the wedding industry, and for the really esoteric, voting software for college life organizations. Here are the rules I've found:
* Keep it simple. Do not try to apply ES to all areas of your software, if you apply it at all. Use it within small bounded contexts, and guard the data from other BC's. The minute you poke a hole in the BC's data store, you've guaranteed yourself headaches down the road. This means don't try to make your user model something that's ES-based unless you're building an LDAP server or similar.
* CQRS does not require ES. ES does not require CQRS.
* On-demand projections are fine for a lot of purposes; learn to tell when you're going to need a static projection. Key indicators are reporting, background use, and expense of the projection. This is not a complete list of indicators.
* A projection is part of a BC. Don't go querying other BCs at runtime for their data. If it's important to the projection, establish a public contract on the events from the other BC, listen to them, and store the data independently. Yes, it's duplicated; that's fine. YMMV.
* Do not try to back ES into an existing application, unless you're a) rebuilding an entire feature silo from scratch; b) building an entirely new feature from scratch; c) there is no C. It's tempting, I've tried it, but your best value for time is to refactor into something more modular, which is the 80/20 value of it.
* If you're going to go async, go async. Build that expectation into your UI. The pain of dealing with async commands comes from figuring out how to get feedback on them. It's a command; there is no feedback. Once it validates, it's done as far as the sender is concerned. A failure to fulfill the contract is itself an event, like any other that comes over your event bus. If you build in the facilities to treat it as such from the beginning, your life is much easier.
* Use UUIDs for PKs, and originate them with the client whenever possible. This allows for optimistic concurrency and additional commands to be sent before receiving the results of the original command. Also, track command ids/causation ids as part of the metadata for events. It's not always useful to have, but when it is, it's very useful to have.
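A minimal sketch of that last rule (class and field names are illustrative): the client originates the id, and an expected-version check on append gives optimistic concurrency on the stream.

```python
import uuid

class Stream:
    """Toy event stream with an optimistic-concurrency append."""
    def __init__(self):
        self.events = []

    def append(self, event, expected_version):
        # Reject the write if someone else appended since we last read.
        if len(self.events) != expected_version:
            raise RuntimeError("concurrency conflict")
        self.events.append(event)

stream = Stream()
command_id = str(uuid.uuid4())            # originated by the client
stream.append({"type": "CustomerMoved",
               "command_id": command_id,
               "correlation_id": command_id},
              expected_version=0)
```

A second append with `expected_version=0` would now raise, which is exactly the conflict signal a caller needs in order to re-read and retry.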
I'm sure there's more to say, but a lot of these lessons are basically common knowledge if you're well-read on the subject. A few of them are just things I've learned the hard way—I've broken damn near every one of them at some point, with regrets. That said, you do this enough and you learn which rules can be broken and when to break them, as with any other kind of expertise.
But ES has saved my bacon more than once. I've used it to back out of a poorly designed CRUD model, report on BI questions for years past, even restore data once when a network partition created a gap of several hours with high-frequency writes. (Chalk that up as a good reason to keep your event store independent of your transactional data store.) Yes, there are headaches to it—to pretend like CRUD doesn't have different versions of those headaches is disingenuous, or simply inexperience talking.
In terms of conflict resolution, it seems like you'd have to clearly define a scenario where a conflict was possible. Based on the write-up, the state of an address would be based on the aggregate of the events that wrote to it. That seems like it would always lead to the last change winning.
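Last-write-wins falls straight out of folding the stream in order, which can be sketched in a few lines (event shapes are hypothetical):

```python
# Folding events in stream order means the latest address event wins.

def current_address(events):
    address = None
    for e in events:
        if e["type"] == "CustomerMoved":
            address = e["to"]
    return address

events = [{"type": "CustomerMoved", "to": "1 Main St"},
          {"type": "CustomerMoved", "to": "2 Oak Ave"}]
```

Genuine conflicts only arise when two writers append concurrently to the same stream, which is what expected-version checks on append are for.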
From the write-up of the system, I actually can't imagine trying to do this in anything other than Elixir/Erlang. The set of requirements and challenges to pull it off would be really complicated on just about any other platform.
So your commands should not be CRUD-style, like "UpdateCustomerAddress". They should express intent, like "MoveCustomer".
For the address, just take the last one. All the business logic I can think of makes this ok.
CustomerMoved(fromAddress, toAddress) is a domain event.
(Edit: ok, they matter some, in the way names of variables and apis matter.)
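A hypothetical sketch of that contrast (names are illustrative): the handler turns intent (a command) into a business fact (a domain event) rather than recording a bare field write.

```python
# The command says what the user wants; the event records what happened
# in business terms, not which column changed.

def handle_move_customer(state, command):
    """Turn a MoveCustomer command into a CustomerMoved domain event."""
    return {"type": "CustomerMoved",
            "from_address": state["address"],
            "to_address": command["to_address"]}

event = handle_move_customer({"address": "1 Main St"},
                             {"type": "MoveCustomer",
                              "to_address": "2 Oak Ave"})
```

The CRUD-shaped alternative ("address updated" with a new value) carries strictly less meaning: the domain event also captures where the customer moved from, for free.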
My issue with audit logs in CRUD systems is that they're almost always at the row level, which is almost useless when you're trying to make sense of the audit log. An audit log of "operations"—i.e. a command log—is far more useful, and trivial to implement when CQRS is used. I'm guessing that's what this article details...
There have been zero reported incidents of data loss in stable released versions which were not the result of catastrophic failure of the hardware on which it was running (though most people's poor understanding of Azure has caused more problems than everything else put together). Furthermore, this should be no surprise given the testing process that is continually run.
We had no solution for this problem in prod. How would you fix an event that should not have gotten into the store? (Wrong contract, bad data, etc.) I understand it shouldn't happen, but in the real world all kinds of things go wrong.
I can't imagine running a massive system on something like Event Store. Fortunately the projects always failed well before we got into any kind of production with real users.
If you go and edit an event, how do subscribers receive that edit? Let's imagine I have a projection updating a SQL db and you now edit an event; how will this projection receive the edit?
"We had no solution for this problem in prod. How would you fix an event that should not have gotten in to the store? (Wrong contract, bad data etc) I understand it shouldn't happen, but in the real world all kinds of things go wrong."
You should do some more research into event-sourced systems, as there are patterns for handling these exact scenarios. http://files.movereem.nl/2017saner-eventsourcing.pdf discusses some. In your scenario the most common is to read the problem stream out, write it to a new stream (with any changes that you want), then either delete the old stream or leave a last event in the old stream saying it has been migrated to the new stream.
> How would you fix an event that should not have gotten in to the store?
- Replay events and filter/modify the problematic events into a new event store
> Fortunately the projects always failed well before we got in to any kind of production with real users.
1. Manually remove the specific problem event from the store, then re-run all events starting from the previous accumulated state snapshot stored before that event? Then you get the new state snapshot just by re-processing all the events. This of course assumes you have state snapshots, and may not be feasible if this is a common occurrence and there are too many events to process.
2. Create a "reverse" event? This will cancel out the bad change and give you a new, valid state to work from and continue. This is nice since the historical state of the system is still represented, but that may be a disadvantage in some situations.
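Both options can be sketched side by side (the event shapes are invented for illustration): option 1 copies the stream minus the bad event; option 2 leaves history intact and appends a compensating event that cancels the mistake out.

```python
# Hypothetical bad event in a deposit stream.
stream = [{"type": "Deposited", "amount": 100},
          {"type": "Deposited", "amount": 9999},   # the bad event
          {"type": "Deposited", "amount": 50}]

# Option 1: copy-and-filter into a new stream, dropping the bad event.
fixed_stream = [e for e in stream if e["amount"] != 9999]

# Option 2: keep history and append a "reverse" event to cancel it out.
compensated = stream + [{"type": "DepositReversed", "amount": 9999}]

def balance(events):
    """Fold the stream: deposits add, reversals subtract."""
    return sum(e["amount"] if e["type"] == "Deposited" else -e["amount"]
               for e in events)
```

Both yield the same corrected balance; they differ in whether downstream consumers who already saw the bad event get a retraction (option 2) or must re-read a rewritten stream (option 1).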
There's one overarching issue I see early on in the adoption process.
The difference between what is being called "traditional" architecture here and event sourcing is in which things have to be correct from the very start. I.e.: the things that you can change later in traditional vs event-sourced systems are different. If you don't realize that this is the case, and proceed with developing an event-sourced system, you'll end up facing decisions that you'd expected to be able to reverse later which are not reversible.
Architecture at its root is concerned primarily with "reversibility" - understanding which decisions can be reversed and which can't and making sure that you get the irreversible decisions correct up-front, and focus design efforts on them.
If you've only ever built systems under a single paradigm, CRUD for example, then you might not even be aware of the differences in reversibility concerns between paradigms.
I know folks who've succeeded wonderfully with event sourcing and I know folks who've failed. I don't know anyone who has failed with event sourcing who were not themselves to blame for the failure. Some of those have refused to consider the possibility that they are the failure's root cause, and then derided others who point that out, as has been done here in this discussion.
Event sourcing is a big leap. There is a lot of presumed knowledge that isn't easily spelled out in bite-sized chunks like blog posts and tweets. You won't learn it (and wield it safely) if you believe that all things should be as readily-consumed as Meteor over Mongo, or Rails, or pick-your-forms-over-data tool.
If you don't work to become aware of the dominant paradigm that shapes your preconceptions, then it will be next to impossible to leverage another paradigm without polluting it with concerns that are counterproductive.
The things that need the application of your intellect in event sourcing are not the things that need the application of your intellect in "traditional" (or any other) architectural paradigm. Square pegs, round holes, etc. But since our perception of the squareness of the pegs is filtered through the lens of our own predispositions, we may not even know that we're in the process of attempting to force-fit things that need a tight fit, and that will cause project failure without the tight fit.
Event sourcing makes things simple - far simpler than ORM, for example. But it only does so when approaching it from its own predispositions, rather than the predispositions (especially unrecognized) of some foreign and impertinent approach.
I find the prospect of working on "traditional" systems depressing now. I find having to solve problems that shouldn't exist in the first place to be soul-crushing. I have an expectation of productivity that is sustained and sustainable over the long haul, and doesn't decrease even as the size of the system/team/complexity and expectations grow.
But, I've been following event sourcing since its unnamed, prototypical forms in 2006. I started putting it into practice many years later. I didn't expect it to come to me fully formed after watching a handful of conference presentations, and I didn't expect it to come to me as easily as other architectural paradigms and approaches I've used in the past.
I'm still experiencing new realizations about event sourcing. The journey isn't done yet, but I'm far better off than I was.
I've compared notes with colleagues who are also quite deep into the transition and have heard similar observations repeated back to me: I can do an event-sourced system today as fast as I could have built the same system in a rapid-prototyping tool like Rails in years past. However, unlike Rails (for example), my productivity, and my team's productivity remains stable, and changes in scale and complexity of the business and its organization don't induce the panic that it used to.
And my expectation for the quality and the caliber of the implementation has only increased - and dramatically so relative to the implementations I still occasionally see in "traditional" systems.
But had I not had to put some distance between my own mind and its hard-won-yet-entrenched preconceptions I might not have seen the instincts and subconscious mechanisms which surely would have led me to underestimating how and where I needed to focus in order to not build elaborate structures that I would have been crushed by in the end.
If anyone is really and truly interested in digging into event sourcing, and really seeing it through its own lens (by becoming aware of the existing lens we all may have), I'd love to help out. Yes, I make money doing this, and yes, I have a vested interest - but I also spend a lot of volunteer time with devs who are earnestly trying to get to the point where event-sourcing and all of its unfamiliar challenges are just another thing that was learned and facility cultivated over time.
I can also help wean you off a compulsion to adopt event sourcing for a project that you want to do as fast as you can with a paradigm that you do already have a grasp of. Either way.
Ultimately, I find the potential of a future with event sourcing far brighter than one without, but I've had to cross many bridges and dismantle many self-imposed obstructions to get there. I hope that others are open to these kinds of experiences and that we have more profound conversations about event sourcing experience reports from the other side of the chasm.
My 2 cents, anyway.
Could you exemplify what you mean by: `problems that shouldn't exist in the first place`?
I don't have any recommendations for resources. I haven't found any that are as helpful as learning resources as they are helpful for exercising an author's lexicon of superfluous jargon (especially DDD jargon and patterns).
I prefer to teach interactively, through coaching. If I ever find a resource that helps people learn, and that doesn't just load the reader up with distracting vocabulary, it'll be a happy day. We're not there yet.
I have a good deal of sample code at this point, but not yet much in the way of documentation.
But even before trying to get a grasp on this stuff, it has to become clear why reproducing a relational database model in code creates the productivity problems that harm projects in the long term. Until that's a no-brainer, the solutions to this problem as presented by event sourcing won't click. Until the partitioning of an application around "root" objects (or just "roots", if not using OO) is understood, and until the traversal of a web of associations in order to execute queries is understood to be the magnet that draws complexity and obscurity, event sourcing might make no sense.
Until it clicks that ORM is as unnatural an abstraction now as server pages was in the 2000s, event sourcing probably won't matter. And working with event sourcing might create all kinds of problems for not having recognized how to partition a domain. You may just end up with a distributed monolith rather than a service architecture. And at that point, you might just end up blaming event sourcing as a pattern rather than the unconscious importing of anti-patterns from "traditional" development.
For myself, I picked up the necessary precursor ideas over many years and from many disparate sources; integrating and re-integrating new bits of knowledge until I had refined a working understanding.
It was much more involved than learning "traditional" development. You can learn traditional development from blog posts. It's kind of trivial in that way. There are reasons that there aren't three-month boot camps for beginners that are based on event sourcing. But once you grasp it, it'll fundamentally rewrite your conceptualization of applicative development.
The event sourcing community needs to do a better job with resources, but it's not there yet. The work is under way, but it's not complete. It will be at some point, hopefully sooner rather than later, but not in the immediate term.
You missed the whole point of CQRS
Sincerely curious about this statement. I understand that CQRS is an architectural pattern, but couldn't a library implement that architecture and then provide ways for you to implement into that architecture? You don't get to pick the architecture nuances at that point, though.
Using facades to encapsulate multiple complex objects behind a simpler interface may be an important part of your overall design, but even if you use them a lot you probably don't need (or want) to build a "Facade Framework" that lets you instantly define new ones with a few lines of meta-code.
Any code you make for reuse by other people is going to contain something more in order to have value to them. For example, "A Facade library for dealing with various cloud services".
Relating it back to CQRS, compare "CQRS Framework" to "A CQRS framework for command-line applications" or "A CQRS framework for websites". The CQRS-ness is a quality that can't stand just on its own.
One thing that absolutely needs libraries/drivers is persistence of the event store. I say this having built my own as well as contributing to open source ones in the past. It is still a hard and not well solved problem to do this well. Again, it can be simple, but in the real world it isn't usually that simple.
Tools are useful, tools are important. If CQRS/ES wants to make it into mainstream it needs to think about tooling. So far I have not been impressed with the people in the community hand waving about this and not having much to show for it.
I agree with this entirely. CQRS/ES applications really benefit from tailored tooling: tracking chains of events using correlation and causation identifiers; auditing commands; projecting events out to datastores like Elasticsearch, then using Kibana to create dashboards to drill down into the data. I'd like to go into this in more detail in the future: revisit this case study after 6-12 months, describe the pitfalls and countermeasures, and identify tools that allow you to monitor the system proactively.
Would it also be such a hard thing to do if you can delegate the actual persistence to something like a rdbms? What are the typical pitfalls?
The good thing is that you can adapt the event store to your performance requirements and do the simplest thing possible in a huge amount of cases.
I built Commanded as a self-contained, reusable, open-source library. With the goal of demonstrating one approach to implementing the pattern using Elixir. I hope it provides some use. That's why I've written up the case study. Anyone can take the code as is, to bootstrap their own application. Use it as a learning tool, take away the good ideas. Rebuild and improve upon the bad.
I received a pull request only today adding support for Greg Young's event store. That's going to broaden the appeal.