Building a CQRS/ES web application in Elixir using Phoenix (10consulting.com)
181 points by tortilla on Jan 6, 2017 | 76 comments

I have worked on, or cleaned up, 4 different CQRS/ES projects. They have all failed. Each time the people leading the project and championing the architecture were smart, capable, technically adept folks, but they couldn't make it work.

There's more than one flavor of this particular arch, but Event Sourcing in general is simply not very useful for most projects. I'm sure there are use cases where it shines, but I have a hard time thinking of any. Versioning events, projection, reporting, maintenance, administration, dealing with failures, debugging, etc etc are all more challenging than with a traditional approach.

Two of the projects I worked on used Event Store. That was one of the least production ready data stores I've encountered (the other being Datomic).

I see a lot of excitement about CQRS/ES every two years or so (since 2010) and I strongly believe it is the wrong choice for just about every application.

I'm the author of this article. It sounds like you have some valuable, real-world experience with CQRS/ES.

I'd love to read more about the difficulties you've faced, and overcome.

For migration of immutable events, there's a good research paper[1] that outlines five strategies available: multiple versions; upcasting; lazy transformation; in-place transformation; copy and transformation. The last approach even allows you to rewrite events into an entirely new store.

[1] The Dark Side of Event Sourcing: Managing Data Conversion http://files.movereem.nl/2017saner-eventsourcing.pdf

FWIW: IMO, we should separate CQRS and ES.

CQRS is a Good Thing(tm); for a real-world example of it in work on a not-so-shabby system processing 10B+ transactions/year - see http://ithare.com/gradual-oltp-db-development-from-zero-to-1... .

ES, however, is more controversial. If speaking about "pure" ES (i.e. not having any mutable state, and reconstructing current state from input events all the time) - versioning and potential synchronization failures (and synchronized access is a prereq for event sourcing) will kill it very quickly (and I didn't even start speaking about performance, which is going to be a very serious challenge).

OTOH, if ES is understood just as an ADDITION to classical mutable-state processing, it can be made very useful. Not only will ES serve as a perfect audit; the duplication of information (once in mutable state and once in input events) also allows such things as regression testing, and fixing data problems caused by bugs, within the DB. BTW, with this model the latter can be done as a post-factum fix, and this, unlike "pure-ES" fixes, does not confuse readers who have already retrieved and stored a previous state of the DB. With "pure" ES, after the fix, all the history can change, invalidating all the data which might have been stored by third parties, and this is really crazy: imagine if your bank statements changed overnight. With an "ES+mutable" model, bugs can still be identified, and the effects of the bugs can be found too; a separate correcting transaction can then be issued against the DB, which is a much better match for the vast majority of existing business processes.
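A minimal sketch of the "ES as an addition" idea, assuming a toy account with a single balance. The event shapes and function names are hypothetical; the point is only that the input log can be replayed into a fresh state and compared against the live mutable state, as a regression check.

```python
# Mutable state and an append-only log of the inputs that produced it.
state = {"balance": 0}
input_log = []


def process(target, event):
    """Apply one input event to a state dict (toy business logic)."""
    if event["type"] == "deposit":
        target["balance"] += event["amount"]
    elif event["type"] == "withdraw":
        target["balance"] -= event["amount"]


def handle(event):
    """Record the input event, then update the mutable state."""
    input_log.append(event)
    process(state, event)


def replay(log):
    """Rebuild state from scratch; useful for regression testing."""
    fresh = {"balance": 0}
    for event in log:
        process(fresh, event)
    return fresh


handle({"type": "deposit", "amount": 100})
handle({"type": "withdraw", "amount": 30})
replayed = replay(input_log)
```

If `replayed` ever differs from `state` after a code change, the new logic no longer reproduces the recorded history.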

Hope it makes sense :-) (it is admittedly very sketchy, but forum is not a good place to elaborate further)

Good link! Looking forward to reading that.

I've been following your projects on GitHub for a while, good work. I don't necessarily agree with all of the design choices but we've built on the eventstore at work and I'm going to be using it on another project in the near future.

Please do feed back your ideas to improve these open-source projects. I'd be interested to find out how you're using CQRS/ES in Elixir.

We've got a channel (#eventsourcing) on the elixir slack—low volume but frequently interesting discussions, please feel free to join us there as well.

Thanks for the link to the research paper, it's a good read. Are you aware of any event sourcing frameworks/datastores that use one (or more) of these strategies to tackle the problem of schema evolution?

Axon Framework is a very mature solution for ES on the JVM.


Here's the section of their manual related to this. http://www.axonframework.org/docs/2.4/repositories-and-event...

Akka Persistence is the other JVM ES framework that I'm familiar with. It supports upcasting as well.


Wow, this succinctly sums up our experience. Fun to develop against, absolute nightmare to support in production.

The only place I'd recommend it these days are where the business views their state as an event stream, maybe finance/stocks. Not developer-forced-events like "customer address updated" or "user email changed".

Even workflow systems I've dealt with, the business doesn't view their state as an event stream. The state is where it is, how it got there is an interesting footnote.

No shit... if you listen to the actual talks, that is EXACTLY what they recommend. Use the patterns where appropriate, don't use it where it's not appropriate. Also consider _where_ the advice came from - finance, gambling, healthcare and so forth.

As for "not production ready" with regards to Event Store specifically, we have had zero reported incidents of data loss which were not the result of catastrophic failure of the hardware (or, usually, the cloud) it was running on.

One of the projects I worked on was in finance. The fact that we were going to get "a free audit log" from the architecture made everyone so excited.

After the CQRS/ES project failed (I was on cleanup duty) we used a more traditional arch. To handle the audit log we just had a separate table. ("customer" table had "customer_audit" table. Both were written to in transaction. Solved.)
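A minimal sketch of that audit-table approach, using Python's built-in `sqlite3`. The column names and the `change_email` helper are assumptions for illustration; the comment only says that both tables were written in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE customer_audit ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, customer_id INTEGER, "
    "old_email TEXT, new_email TEXT, "
    "changed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)
conn.execute("INSERT INTO customer (id, email) VALUES (1, 'old@example.com')")
conn.commit()


def change_email(conn, customer_id, new_email):
    """Update the row and record the change in one transaction."""
    # Using the connection as a context manager commits on success
    # and rolls back both writes if either statement raises.
    with conn:
        (old_email,) = conn.execute(
            "SELECT email FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        conn.execute(
            "UPDATE customer SET email = ? WHERE id = ?", (new_email, customer_id)
        )
        conn.execute(
            "INSERT INTO customer_audit (customer_id, old_email, new_email) "
            "VALUES (?, ?, ?)",
            (customer_id, old_email, new_email),
        )


change_email(conn, 1, "new@example.com")
audit_rows = conn.execute(
    "SELECT customer_id, old_email, new_email FROM customer_audit"
).fetchall()
```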

Not production ready - the system, not the database. We never had a problem with the event data stores. The bespoke systems our CQRS/ES rescuees built around it, absolutely.

> I strongly believe it is the wrong choice for just about every application.

None of these are pure as-originally-outlined-by-Fowler CQRS/ES, but I'm willing to suggest they are paradigm equivalent, real world successful examples:

1. Basically all double-entry accounting/book-keeping/core banking systems.

2. Many RDBMSes. Specifically the replayable log structure of transactions.

3. I sell a service that includes two event-sourced data structures mutated by domain-specific commands. They are used for collaborative decision making in sports management. This aspect works very well indeed: the resulting characteristics are intrinsic to ES structures, are (AFAIK) unique in our market, and represent one of our most customer-retaining capabilities.

I've had similar, unfortunate gigs.

Even where there are smart, capable, technically-adept folks in-place, it in no way means that they have full command of the implications and the ways and means of event sourcing.

In my experience, event sourcing is useful in most projects - even, contrary to an earlier comment, for the user and user profile concerns.

I've worked with EventStore. I've wanted to smash EventStore out of existence on a number of occasions. On some of those occasions, I was at fault. And on others, I found the user experience (as an implementer) misleading, ambiguous, and ultimately costly. I'm quite frank with James and Greg about what I think EventStore should be achieving, and have been quite public about it in the past, so I won't rehash that here.

I guess I haven't seen the ebb and flow of excitement, though (since 2007-ish when I started crossing paths with Greg and Udi).

I do see a steady rise in awareness of it since then, and I see an increasing number of successful projects. There's also more people helping other people and shops with implementation guidance and safety.

And there's bound to be more failures - just as a matter purely of numbers - just as there are with any platform or tool where the learning was underestimated, or the grasp of the whole was overestimated.

Yep, you can fail with event sourcing. You can fail with Rails. It will take you longer to fail with Rails, though. And for an organization that isn't into the learning as a matter of course, failing over a longer term can be an important and empowering strategy.

I have two sets of customers: those whom I help slow the looming failure of their Rails projects, and those whom I get started on event sourcing. It's not always a good cultural fit. But I have yet to see a domain that's worth the expenditure of a software development team that is somehow naturally inappropriate for event sourcing.

What do you think about the use of technologies like Kafka which enable what is effectively an event sourced architecture without all of the buzzwords? Not everyone uses it that way, obviously, but there is plenty of discussion about it.

One of the projects did use Kafka as the "event log". There were stability issues with the version of Zookeeper that was used. From a writer/reader perspective Kafka was sufficiently performant when it was up. (The Zookeeper issue was eventually fixed as I recall, but by then the damage was done in terms of political capital spent and lost.)

The big issues didn't really have that much to do with the persistent store of events. The bigger issue was the fact that as new features get added to your application, your event payloads change. Well, in order for projection (particularly re-projection in the case of an issue) to work, your code needs to know how to read and process all versions of every event. Of course, there are techniques like snapshotting to give you point in time "good states" so you can eventually deprecate some of these events, but thinking about it is challenging.

Additionally, most folks argue that the event store should be immutable. This is great until some kind of bad event gets in there. Now your code needs to know how to read, and discard, this bad event forever (or until a snapshotted point in time).

Finally, projection is not the panacea the evangelists will have you believe. Inevitably there will be syncing issues between the event store and the projected database/elasticsearch cluster/mongo instance, whatever. And what do you do then? Re-project! But that is not easy :)

We use a flavor of event sourcing in production (Elixir+Postgres) and I have found many "rules" can be broken while still maintaining the core benefits of CQRS/ES

- We use Postgres as the event store and all of our other projections are stored in the same database.

- Our app is not distributed. We use a single database.

- Our event store is not immutable. Instead we will run migrations to rewrite the events, delete events, etc. You have to deal with either the complexity of maintaining two versions of code or the complexity of migrating. I've found the latter is a fixed cost (do it once and move on) vs. a variable cost (continue to deal with two versions of code).

- Our commands aren't async. They are executed inline.

- We don't do any snapshotting.

Granted, we don't have a lot of users on our app (~50 active users) but overall this has been a positive experience.

We use the strategy of versioned events. Event data is still immutable, but older events are upgraded to the latest structure at read time. This works reasonably well but it is not ideal.

Storing data as immutable events implies that all data ever generated by your application becomes available to future versions of your application. Writing an application that can handle all the forms of your data across time is obviously more complex but it's a necessary consideration if you decide to go with event sourcing. Unfortunately, you cannot have all the benefits of available and useful unaltered historical data without also putting in the engineering effort to support it.
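A minimal sketch of upgrading versioned events at read time (upcasting). The event type, the schema change from a single `name` field to `first_name`/`last_name`, and all function names are hypothetical; stored events are left untouched and only the in-memory copy is upgraded.

```python
def upcast_v1_to_v2(payload):
    # Assumed schema change: v1 stored one "name" field, v2 splits it.
    first, _, last = payload["name"].partition(" ")
    return {"first_name": first, "last_name": last}


# Maps version N to the function that produces version N + 1.
UPCASTERS = {1: upcast_v1_to_v2}
LATEST_VERSION = 2


def upcast(event):
    """Return a copy of the event upgraded to the latest payload version."""
    version = event["version"]
    payload = event["payload"]
    while version < LATEST_VERSION:
        payload = UPCASTERS[version](payload)
        version += 1
    return {**event, "version": version, "payload": payload}


stored = {
    "type": "CustomerRegistered",
    "version": 1,
    "payload": {"name": "Ada Lovelace"},
}
current = upcast(stored)
```

Chaining one upcaster per version step means the rest of the application only ever sees the latest structure, at the cost of keeping every upcaster around for as long as old events exist.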

To address the concern with changing "immutable" events, I've just built a migrator [1] tool for my PostgreSQL-based event store. It implements the copy & transform migration strategy. The source event store database is untouched. Transformed events are written out to a separate database.

It uses Elixir streams to provide composable transforms to rewrite, aggregate, remove, and alter serialization format of events.

[1] https://github.com/slashdotdash/eventstore-migrator
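To illustrate the copy & transform idea in general terms, here is a sketch using Python generators as a stand-in for Elixir streams. This is not the migrator's API; the transform helpers, event types, and in-memory "stores" are all made up for illustration.

```python
def rename_type(old, new):
    """Build a transform that renames one event type."""
    def transform(events):
        for event in events:
            yield {**event, "type": new} if event["type"] == old else event
    return transform


def drop_type(unwanted):
    """Build a transform that removes events of one type."""
    def transform(events):
        return (event for event in events if event["type"] != unwanted)
    return transform


def copy_and_transform(source, transforms):
    """Stream events from the source through each transform into a new list."""
    stream = iter(source)
    for transform in transforms:
        stream = transform(stream)
    return list(stream)


source_store = [
    {"id": 1, "type": "AddressUpdated", "data": {"address": "2 New Street"}},
    {"id": 2, "type": "DebugPing", "data": {}},
    {"id": 3, "type": "EmailChanged", "data": {"email": "a@example.com"}},
]
target_store = copy_and_transform(
    source_store,
    [rename_type("AddressUpdated", "CustomerMoved"), drop_type("DebugPing")],
)
```

The source list is never modified; the transformed events exist only in `target_store`.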

I work for a finance startup that uses Akka Persistence to implement an event sourced architecture. It has worked pretty well for us, much better than the original CRUD system. Akka can support a variety of different databases via journal plugins which has allowed us to maintain using a SQL database instead of being forced to adopt an unproven database.

The one area we've struggled with in the architecture is the query/projection side. Recently we have decided to completely separate the projection logic into a different application that is deployed independently to mitigate some of the issues we have been having.


Just curious, what issues did you have that were solved by an independent projection application?

We're currently in the midst of implementing the independent application so we have not definitely solved our projection issues yet.

The main issue we have been running into is a performance bottleneck after restarting the application. Currently, when the application restarts (i.e. after a deployment) the projections may not update for several minutes. Our hypothesis is that if we can run projections independent of the command side of the application they can run indefinitely (i.e. as Spark jobs) since in theory a projection should not be updated. If we need to make changes to a projection then we will deploy another version to run side-by-side with the existing projection until the consumers migrate to the new version.

Do you mind detailing your experience with Datomic? We're looking at ways to store 6-ary tuples (quadstore + temporality of existence and temporality of observation) of facts and build indices on top of them. I'm hesitant to move to a (relatively) obscure data store without a really good idea of where that puts us.

We didn't get terribly far down the road with Datomic. Management/administration of the DB was not for the faint of heart. (At the time the docs were simply terrible, maybe they are better now?) We would see data corruption/loss as well. We weren't doing anything terribly complicated with it, and data loss doing the simplest things was unacceptable. (It's entirely likely we were the cause of the data loss somehow, but we followed the guides and did nothing crazy or weird). So anyways, we just threw it out almost immediately.

Conceptually it wasn't really a fit with the team or our application either. The query model and direct reading from the storage medium seemed silly - why not just use underlying storage medium directly for everything? (Besides, now you have to administer both Datomic and Cassandra/Oracle whatever) Not worth it for the obtuse query approach and limited value of "never forgetting" imo :)

Direct use of the storage medium for everything would negate the advantages Datomic offers. The beautiful part of Datomic is that you can perform an expensive query without having any effect on other peers. Since Datomic also stores datoms in blocks, the n+1 problem is reduced as well. Another cool thing is that for unit-testing, you can just disconnect Datomic from the storage medium, and run everything in memory. Datomic requires writes to be synchronised, which is why you can't go directly through the underlying storage medium for write cases.

While I personally would love to use Datomic for just about everything, the fact that I need at least three machines going for the simplest app (1 actual database, the peer and the transactor), and that those machines can't be the cheapest machines (you need enough RAM to store datoms in a cache on the client, or things will be slow), means Datomic is something I can rarely afford in practice.

I've worked on a successful CQRS project. It didn't use ES though.

Probably one of the nicest systems I've ever helped build. We had a very good lead though.

Absolutely. I use CQRS fairly regularly, but I've never used ES on production software and I can't think of a case where it would have been appropriate. It's a serious solution that is not trivial to implement.

CQRS is a highly understandable, almost always reasonable technique to follow when you strip away all the other things that people tend to associate it with.

This is a pretty decent article on that: https://lostechies.com/jimmybogard/2012/08/22/busting-some-c...

I like thinking about applications in terms of read-side and write-side. I'm not super enamored with async CQRS, but I am a huge fan of APIs and data models where the developers realized that how I consume the data is wildly different than how I update it.

Oh yeah, us too. We use CQRS on nearly all projects we do these days. But event sourcing, totally different animal.

Would love to hear about your experience and why CQRS/ES failed those projects, and what type of technical complexities it introduced.

I detailed some of the problems I saw in another thread, but one other thing that gets challenging is the CQRS side (if you choose to keep everything async).

For example, say you fire an "address change" event. That gets sent to the event store for eventual projection in to a medium you can actually query from (realtime querying of the event store itself is the road to very bad places, I promise). So now your event is sent along, but how do you know when it has been processed/projected? What if there was a conflict with another address change event from somewhere else? How does that bubble back up?

The typical solution is to pass back some kind of "receipt token" so you can poll to see when your event is processed, then do your read from the projected database, or whatever. Of course, this can be made to work, but once you start talking edge cases and the need to support standard UX paradigms, polling for every update and handling error scenarios in this way becomes painful.
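A rough sketch of the receipt-token polling described above, simulated in a single process. The token here is simply the event's position in the log, and the projection "lags" by applying one event per poll; both are assumptions for illustration, as are all names.

```python
event_log = []  # append-only list standing in for the event store


class AddressProjection:
    """Read model that lags behind the event log until it catches up."""

    def __init__(self):
        self.position = 0  # number of events applied so far
        self.addresses = {}

    def catch_up(self, max_events=1):
        # Apply at most `max_events` pending events, simulating async lag.
        for event in event_log[self.position:self.position + max_events]:
            self.addresses[event["customer_id"]] = event["address"]
            self.position += 1


def change_address(customer_id, address):
    """Write side: append the event and return a receipt token."""
    event_log.append({"customer_id": customer_id, "address": address})
    return len(event_log)  # the token is the event's position in the log


def read_after_write(projection, receipt, customer_id, max_polls=5):
    """Poll until the projection has processed the receipt, then read."""
    for _ in range(max_polls):
        if projection.position >= receipt:
            return projection.addresses[customer_id]
        projection.catch_up()  # a real client would sleep and retry here
    raise TimeoutError("projection did not catch up in time")


projection = AddressProjection()
change_address("c1", "1 Old Road")
receipt = change_address("c1", "2 New Street")
address = read_after_write(projection, receipt, "c1")
```

Even in this toy form, the client has to carry the token around, choose a retry budget, and handle the timeout path for every write.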

> Of course, this can be made to work, but once you start talking edge cases and the need to support standard UX paradigms, polling for every update and handling error scenarios in this way becomes painful.

This really makes me think you, and the originators of these projects you're lambasting, are working from an incomplete understanding of how to apply CQRS and ES. If you're applying CQRS in a fully async fashion, polling itself is an antipattern. That receipt token? That's what everyone else calls an ID. You know when it's been processed because the subscription you should be watching tells you it's been processed.

You mentioned changing event payloads in another thread. That's another big code smell to me. In a stable, well-understood domain, your payloads don't change much. If you're applying ES to a domain that ISN'T well-understood, you need to do a LOT of discovery ahead of time, or be prepared to iterate on your data until you do. It sounds like the projects you were on failed on those accounts.

Yeah, ES is hard when you keep trying to treat it like CRUD. It's overkill when you apply it to an easy domain, and it's an antipattern when you write CRUD events. So don't do those things.

I was waiting for the "you're doing it wrong" guy to show up. You win!

Yes, some of the systems used subscriptions too, which had their own set of issues.

Additionally, domains are almost never completely understood. Even if they're well understood today, things will change tomorrow. CQRS/ES in your own words is not good when requirements change. Well guess what? That's every system I've ever worked on.

If you've had success building non-trivial CQRS/ES applications I'd love to hear more specifics about how you solved all the other issues I've presented.

You didn't succeed at building a CQRS/ES system despite several attempts. Why aren't you asking "what am I doing wrong?" instead of presuming that your personal experiences are sufficient to render informed judgement?

> Additionally, domains are almost never completely understood. Even if they're well understood today, things will change tomorrow. CQRS/ES in your own words is not good when requirements change. Well guess what? That's every system I've ever worked on.

It's still possible to design individual components that don't require constant churn of their application state. Most software teams are incapable of this, and event sourcing is not for them, even in domains where it shines (like finance).

In my experience, when teams have solid leadership, you can get your software pretty close to the target the first time you build it. Minor course corrections are straightforward, if sometimes tedious. When the business experiences big pivots, much of what you've built can be reused providing it's modular and does not make assumptions about the overall system. The rest can be discarded.

That's a big departure from the topic at hand, but my point is that if your software isn't modular, event sourcing in particular will amplify the pain you feel.

I recommend that anyone thinking about doing CQRS/ES find someone who is an expert to help guide them or their team.

Maybe they've been asking since 2010 and, not having received a satisfactory solution from the experts in the field, have stripped all the projects of CQRS/ES and gone back to what works well. There comes a point in time where you stop asking and move on, and expecting them to re-ask on HN is a poor presumption on your part, leading to an uninformed judgment.

Well, there's a lot of people who have successfully deployed ES systems at both the large and small ends of scale. So one might ask, after having looked for and received some answers from people who have done this successfully, where did I misapply or misunderstand the advice?

> I was waiting for the "you're doing it wrong" guy to show up. You win!

Well, when you base an argument on a set of known antipatterns, you shouldn't feign surprise when someone points out that you're basing your argument on known antipatterns.

>Additionally, domains are almost never completely understood. Even if they're well understood today, things will change tomorrow. CQRS/ES in your own words is not good when requirements change. Well guess what? That's every system I've ever worked on.

The first point is flat out untrue. There are domains of expertise with literal centuries of knowledge and practice in them. There are many, many more with decades. And many, many more with years. Startups measure knowledge in weeks and months. This is not a suitable playground for ES.

Secondly, I didn't say CQRS/ES was unsuitable when requirements change. I said it required a lot more work when the domain was not well understood, and that the work was primarily in understanding the domain.

I've used some combination of these patterns on nearly every system I've worked on for the last 7 years. That spans medical billing, ticketing, public health, the wedding industry, and for the really esoteric, voting software for college life organizations. Here are the rules I've found:

* Keep it simple. Do not try to apply ES to all areas of your software, if you apply it at all. Use it within small bounded contexts, and guard the data from other BC's. The minute you poke a hole in the BC's data store, you've guaranteed yourself headaches down the road. This means don't try to make your user model something that's ES-based unless you're building an LDAP server or similar.

* CQRS does not require ES. ES does not require CQRS.

* On-demand projections are fine for a lot of purposes; learn to tell when you're going to need a static projection. Key indicators are reporting, background use, and expense of the projection. This is not a complete list of indications.

* A projection is part of a BC. Don't go querying other BC's at runtime for their data. If it's important to the projection, establish a public contract on the events from the other BC, listen to them, and store the data independently. Yes, it's duplicated, that's fine. YMMV.

* Do not try to back ES into an existing application, unless you're a) rebuilding an entire feature silo from scratch; b) building an entirely new feature from scratch; c) there is no C. It's tempting, I've tried it, but your best value for time is to refactor into something more modular, which is the 80/20 value of it.

* If you're going to go async, go async. Build that expectation into your UI. The pain of dealing with async commands comes from figuring out how to get feedback on them. It's a command; there is no feedback. Once it validates, it's done as far as the sender is concerned. A failure to fulfill the contract is itself an event, like any other that comes over your event bus. If you build in the facilities to treat it as such from the beginning, your life is much easier.

* Use UUIDs for PK's, and originate them with the client whenever possible. This allows for optimistic concurrency and additional commands to be sent before receiving the results of the original command. Also, track command ids/causation ids as part of the metadata for events. It's not always useful to have, but when it is, it's very useful to have.
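The last two rules can be sketched roughly as follows: ids are generated on the client, every resulting event carries the originating command id as a causation id, and a failure is published as an ordinary event. The command and event names and the handler are hypothetical.

```python
import uuid


def new_command(command_type, data):
    # The client generates the id, so it can reference this command
    # (or send follow-ups) before any response arrives.
    return {"command_id": str(uuid.uuid4()), "type": command_type, "data": data}


def handle_move_customer(command, known_customers):
    """Return the event produced by the command, success or failure."""
    metadata = {"causation_id": command["command_id"]}
    customer_id = command["data"]["customer_id"]
    if customer_id not in known_customers:
        # The failure is published as an event, not returned as an error.
        event_type, data = "CustomerMoveFailed", {"reason": "unknown customer"}
    else:
        event_type, data = "CustomerMoved", {
            "to_address": command["data"]["to_address"]
        }
    return {
        "event_id": str(uuid.uuid4()),
        "type": event_type,
        "data": {"customer_id": customer_id, **data},
        "metadata": metadata,
    }


customer_id = str(uuid.uuid4())  # primary key originated by the client
ok_cmd = new_command(
    "MoveCustomer", {"customer_id": customer_id, "to_address": "2 New Street"}
)
bad_cmd = new_command(
    "MoveCustomer", {"customer_id": str(uuid.uuid4()), "to_address": "?"}
)
ok_event = handle_move_customer(ok_cmd, known_customers={customer_id})
failed_event = handle_move_customer(bad_cmd, known_customers={customer_id})
```

A client subscribed to the event stream can match either outcome back to its own command through `metadata["causation_id"]`.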

I'm sure there's more to say, but a lot of these lessons are basically common knowledge if you're well-read on the subject. A few of them are just things I've learned the hard way; I've broken damn near every one of them at some point, with regrets. That said, you do this enough and you learn which rules can be broken and when to break them, as with any other kind of expertise.

But ES has saved my bacon more than once. I've used it to back out of a poorly designed CRUD model, report on BI questions for years past, even restore data once when a network partition created a gap of several hours with high-frequency writes. (Chalk that up as a good reason to keep your event store independent of your transactional data store.) Yes, there are headaches to it—to pretend like CRUD doesn't have different versions of those headaches is disingenuous, or simply inexperience talking.

Based on his article it looks like he's using subscriptions internally rather than polling. That's a fairly natural thing to do across an Elixir application/cluster.

In terms of conflict resolution, it seems like you'd have to clearly define a scenario where a conflict was possible. Based on the write-up, the state of an address would be based on the aggregate of the events that wrote to it. That seems like it would always lead to the last change winning.

From the write up of the system, I actually can't imagine trying to do this in anything other than Elixir/Erlang. The set of requirements and challenges to pull it off would be really complicated on just about any other platform.

Pushing read model updates back to the client using a two way communication channel is one technical solution. I want to experiment with using Phoenix channels[1] to solve this. I think that has potential for easing the UI/UX concerns. You post a command from the web front-end and subscribe to receive updates for the read model you're looking at. Domain events can drive the client notification.

[1] http://www.phoenixframework.org/docs/channels

If you can write your read model into Mongo, then you can use Meteor to build a real time interface extremely quickly; it tails the database log and dispatches updated records to subscribers practically instantly over websockets, no need for the event processing code to know about how to map to frontend queries. We use this for our production CRM, albeit for internal users. No doubt Phoenix would be more performant and support more databases, but it's nontrivial to build the record-to-subscriber reverse mapping that Meteor brings out of the box. RethinkDB was going to be the Chosen One for this use case, alas...


The address thing is normally solved by the fact that you organise your commands by things that should only logically change together. So a conflicting message won't revert irrelevant fields back to their old values.

So your commands should not be


They should be

UpdateCustomerAddress UpdateCustomerEmail


For the address, just take the last one. All the business logic I can think of makes this ok.

If your events contain the words "Create", "Update", or "Delete", or any synonym thereof, you're modeling CRUD with events and life is always going to be more complicated than it has to be for you. The names of events are data too—make them representative of the domain.

CustomerMoved(fromAddress, toAddress) is a domain event.
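As a small sketch of the difference, here are the two styles as Python dataclasses. `CustomerMoved` follows the comment above; `CustomerAddressUpdated` and the `describe` helper are made up for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerAddressUpdated:
    """CRUD-style event: records the new value but not why it changed."""
    customer_id: str
    address: str


@dataclass(frozen=True)
class CustomerMoved:
    """Domain event: names what happened in the business."""
    customer_id: str
    from_address: str
    to_address: str


def describe(event):
    """Render an event as a human-readable line for a history view."""
    if isinstance(event, CustomerMoved):
        return (
            f"{event.customer_id} moved from "
            f"{event.from_address} to {event.to_address}"
        )
    return f"{event.customer_id} address set to {event.address}"


moved = CustomerMoved("c1", "1 Old Road", "2 New Street")
summary = describe(moved)
```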

CQRS by itself does not imply using ddd or es.

Yeah, fair enough, but if you're not using ES then the names of messages don't matter a whole lot because you don't have to live with them forever.

(Edit: ok, they matter some, in the way names of variables and apis matter.)

Dino Esposito describes an "historical" crud System in a series in msdn magazine https://msdn.microsoft.com/magazine/mt703431 This is basically ES with crud. Not saying ES with crud is the best example, but for data which requires Audit Trail logic it actually works fairly well.

Haven't read that article, but will check it out, thanks for the link.

My issue with audit logs in crud systems is that they're almost always at the row level, which is almost useless when you're trying to make sense of the audit log. An audit log of "operations"—i.e. command log—is far more useful, and trivial to implement when CQRS is used. I'm guessing that's what this article details...

I would be interested to know why you consider ES "not production ready".

There have been zero reported incidents of data loss in stable released versions which were not as a result of catastrophic failure of the hardware on which it was running (though most people's poor understanding of Azure has caused more problems than everything else put together). Furthermore, this should be no surprise given the testing process[1] continually run.

[1]: https://geteventstore.com/blog/20130708/testing-event-store-...

We had no data loss incidents. My concern was around tooling and maintenance of the database. Once you had to start clearing out bad events in dev environments or doing common administrative chores there were many holes. I never did find out how to remove specific problem events. We ended up just purging the store every time something went wrong in lower environments.

We had no solution for this problem in prod. How would you fix an event that should not have gotten in to the store? (Wrong contract, bad data etc) I understand it shouldn't happen, but in the real world all kinds of things go wrong.

I can't imagine running a massive system on something like event store. Fortunately the projects always failed well before we got in to any kind of production with real users.

I can see why your projects failed. That EventStore does not support you editing an event is not something that makes it "not production ready" it is in fact a feature that keeps you from doing stupid things.

If you go and edit an event, how do subscribers receive that edit? Let's imagine I have a projection updating a SQL DB and you now edit an event. How will this projection receive the edit?

"We had no solution for this problem in prod. How would you fix an event that should not have gotten in to the store? (Wrong contract, bad data etc) I understand it shouldn't happen, but in the real world all kinds of things go wrong."

You should do some more research into event-sourced systems as there are patterns for handling these exact scenarios. http://files.movereem.nl/2017saner-eventsourcing.pdf discusses some. In your scenario the most common is to read the problem stream out, write it to a new stream (with any changes that you want), then either delete the old stream or leave a last event in the old stream saying it has been migrated to the new stream.
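As a rough illustration of the "read out, write to a new stream, leave a marker" approach described above, here is a minimal Python sketch. The `InMemoryStore` class, the `StreamMigrated` event name, and the stream names are all hypothetical stand-ins made up for this example; none of this is Event Store's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical append-only store: stream name -> list of event dicts."""
    streams: dict = field(default_factory=dict)

    def append(self, stream, event):
        self.streams.setdefault(stream, []).append(event)

    def read(self, stream):
        return list(self.streams.get(stream, []))


def migrate_stream(store, old, new, transform):
    """Copy `old` into `new`, passing every event through `transform`.

    `transform` returns a (possibly corrected) event, or None to drop it.
    Existing events are never edited; the old stream only gains a marker.
    """
    for event in store.read(old):
        fixed = transform(event)
        if fixed is not None:
            store.append(new, fixed)
    store.append(old, {"type": "StreamMigrated", "to": new})


def drop_negative_deposits(event):
    # Assumed notion of "bad data" for this example only.
    if event["type"] == "Deposited" and event["amount"] < 0:
        return None
    return event


store = InMemoryStore()
store.append("account-1", {"type": "Deposited", "amount": 100})
store.append("account-1", {"type": "Deposited", "amount": -5})  # bad data
store.append("account-1", {"type": "Withdrawn", "amount": 30})

migrate_stream(store, "account-1", "account-1-v2", drop_negative_deposits)
```

Subscribers and projections would then be pointed at the new stream and rebuilt from it, which is how they "receive" the correction without any event being edited in place.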

    How would you fix an event that should not have gotten in to the store?
- Reverse transaction (like in accounting)

- Replay events and filter/modify the problematic events into a new event store

    Fortunately the projects always failed well before we got in to any kind of production with real users.
That the project failed and the above was still an open question for you explains a lot.

Not saying this is great, and I don't have experience with an ES system in production, but... Could you do one of:

1. Manually remove the specific problem event from the store, then re-run all events starting from the previous accumulated state snapshot stored before that event? Then you get the new state snapshot just by re-processing all the events. This of course assumes you have state snapshots, and may not be feasible if this is a common occurrence and there are too many events to process.


2. Create a "reverse" event? This will cancel out the bad change and give you a new, valid state to work from and continue. This is nice since the historical state of the system is still represented, but that may be a disadvantage in some situations.
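The two options above can be compared side by side in a small Python sketch. The event names (`Deposited`, `Withdrawn`, `DepositReversed`), the `reverses` field, and the snapshot value are assumptions for illustration only.

```python
def apply_event(balance, event):
    """Fold one event into the running balance."""
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] in ("Withdrawn", "DepositReversed"):
        return balance - event["amount"]
    return balance


def replay(events, start=0):
    """Rebuild state by applying events in order on top of a starting state."""
    balance = start
    for event in events:
        balance = apply_event(balance, event)
    return balance


log = [
    {"id": 1, "type": "Deposited", "amount": 100},
    {"id": 2, "type": "Deposited", "amount": 500},  # entered by mistake
    {"id": 3, "type": "Withdrawn", "amount": 30},
]

# Option 1: start from a snapshot taken before the bad event and
# re-run everything after it, leaving the bad event out.
snapshot_after_event_1 = 100
later_events = [e for e in log if e["id"] > 1 and e["id"] != 2]
balance_by_removal = replay(later_events, start=snapshot_after_event_1)

# Option 2: keep history intact and append a compensating event.
reversal = {"id": 4, "type": "DepositReversed", "amount": 500, "reverses": 2}
log_with_reversal = log + [reversal]
balance_by_reversal = replay(log_with_reversal)
```

Both routes end at the same balance; the difference is that option 2 keeps the mistake and its correction visible in the history.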

my answer above ^^ assumes we really want to remove it. compensating actions are generally the preferred option

CQRS/ES requires a different way of thinking and a different set of best practices. This could make it easy to shoot yourself in the foot.

That being said:

- Dealing with failures can be better than traditional systems if done properly. For example, we have services that, if they fail, won't bring down the entire system. However, this does require you to be more explicit on how you handle errors.

- I have found debugging to be easier. When an error occurs, we can trace it back to the exact command and the events it generated. This allows us to see 1. the exact state of the system at the time the error-producing command was generated and 2. the exact command that was executed. From this we can easily reproduce the error.
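To make the debugging point concrete, here is a minimal Python sketch of tracing from a command to the events it produced. The `causation_id` field, the in-memory logs, and the `Withdraw` command are assumptions for this example, not any particular framework's design.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    payload: dict
    command_id: str


@dataclass(frozen=True)
class Event:
    name: str
    payload: dict
    causation_id: str  # id of the command that produced this event


command_log = []
event_log = []


def handle(command):
    """Record the command, then record whatever events it produces."""
    command_log.append(command)
    if command.name == "Withdraw":
        event_log.append(Event("Withdrawn", command.payload, command.command_id))


def trace(command_id):
    """Return the command and every event it caused."""
    command = next(c for c in command_log if c.command_id == command_id)
    events = [e for e in event_log if e.causation_id == command_id]
    return command, events


cmd = Command("Withdraw", {"amount": 30}, str(uuid.uuid4()))
handle(cmd)
traced_command, traced_events = trace(cmd.command_id)
```

Replaying the event log up to the first traced event would then give the state the system was in when the command arrived.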

I have covered "versioning events" in my other comment. Please be more specific about "projection, reporting, maintenance, administration". What exactly were the challenges there?

I understand that ES is not a silver bullet but I would like others to have a clear understanding of the tradeoffs to traditional systems.

I'm curious to hear what experiences you had with Datomic that led you to the conclusion that it wasn't production ready.

One way to understand CQRS is to realize that even the traditional model relies on CQRS underneath, it just so happens that the relational databases hide some of this behind a blocking wait on commit while a background thread reads and flushes the event log (consisting of SQL statements written to a redo log for example). Really what you are gaining by inverting the database like this is a lot more flexibility and simplicity for initial development and subsequent change management (maintenance). The flexibility I speak of manifests in your ability to tweak availability by introducing controllable circuit breakers with availability at the expense of some latency or alternatively putting an upper bound on latency at the expense of some availability (with every tradeoff fully documented and demonstrably adhered to).

Very interesting read, and more so the comments here with all the various experiences. I'm the author of Event Horizon for Go [1] which is currently being developed alongside a project that uses it, in this case a CRM-like system where the audit log is a selling point. We still have a lot of unsolved areas but as some have noted it is a very rewarding developer experience. How it behaves in production is still to be seen, but reading here has given me a few good insights to prepare.

[1] https://github.com/looplab/eventhorizon

If the writer of the blog post is here, the link to the Elm language looks like it is missing an http://

I've done a number of event-sourced projects, and have built a good amount of tooling over time to support the work.

There's one overarching issue I see early on in the adoption process.

The difference between what is being called "traditional" architecture here and event sourcing is in which things have to be correct from the very start. I.e.: The things that you can change later in traditional vs event-sourced are different. If you don't realize that this is the case, and proceed with developing an event-sourced system, you'll end up facing decisions that you'd expected to be able to reverse later which are not reversible.

Architecture at its root is concerned primarily with "reversibility" - understanding which decisions can be reversed and which can't and making sure that you get the irreversible decisions correct up-front, and focus design efforts on them.

If you've only ever built systems under a single paradigm, CRUD for example, then you might not even be aware of the differences in reversibility concerns between paradigms.

I know folks who've succeeded wonderfully with event sourcing and I know folks who've failed. I don't know anyone who has failed with event sourcing who were not themselves to blame for the failure. Some of those have refused to consider the possibility that they are the failure's root cause, and then derided others who point that out, as has been done here in this discussion.

Event sourcing is a big leap. There is a lot of presumed knowledge that isn't easily spelled out in bite-sized chunks like blog posts and tweets. You won't learn it (and wield it safely) if you believe that all things should be as readily-consumed as Meteor over Mongo, or Rails, or pick-your-forms-over-data tool.

If you don't work to become aware of the dominant paradigm that shapes your preconceptions, then it will be next to impossible to leverage another paradigm without polluting it with concerns that are counterproductive.

The things that need the application of your intellect in event sourcing are not the things that need the application of your intellect in "traditional" (or any other) architectural paradigm. Square pegs, round holes, etc. But since our perception of the squareness of the pegs is filtered through the lens of our own predispositions, we may not even know that we're in the process of attempting to force-fit things that need a tight fit, and that will cause project failure without the tight fit.

Event sourcing makes things simple - far simpler than ORM, for example. But it only does so when approaching it from its own predispositions, rather than the predispositions (especially unrecognized) of some foreign and impertinent approach.

I find the prospect of working on "traditional" systems depressing now. I find having to solve problems that shouldn't exist in the first place to be soul-crushing. I have an expectation of productivity that is sustained and sustainable over the long haul, and doesn't decrease even as the size of the system/team/complexity and expectations grow.

But, I've been following event sourcing since its unnamed, prototypical forms in 2006. I started putting it into practice many years later. I didn't expect it to come to me fully formed after watching a handful of conference presentations, and I didn't expect it to come to me as easily as other architectural paradigms and approaches I've used in the past.

I'm still experiencing new realizations about event sourcing. The journey isn't done yet, but I'm far better off than I was.

I've compared notes with colleagues who are also quite deep into the transition and have heard similar observations repeated back to me: I can do an event-sourced system today as fast as I could have built the same system in a rapid-prototyping tool like Rails in years past. However, unlike Rails (for example), my productivity, and my team's productivity remains stable, and changes in scale and complexity of the business and its organization don't induce the panic that it used to.

And my expectation for the quality and the caliber of the implementation has only increased - and dramatically so relative to the implementations I still occasionally see in "traditional" systems.

But had I not had to put some distance between my own mind and its hard-won-yet-entrenched preconceptions I might not have seen the instincts and subconscious mechanisms which surely would have led me to underestimating how and where I needed to focus in order to not build elaborate structures that I would have been crushed by in the end.

If anyone is really and truly interested in digging into event sourcing, and really seeing it through its own lens (by becoming aware of the existing lens we all may have), I'd love to help out. Yes, I make money doing this, and yes, I have a vested interest - but I also spend a lot of volunteer time with devs who are earnestly trying to get to the point where event-sourcing and all of its unfamiliar challenges are just another thing that was learned and facility cultivated over time.

I can also help wean you off a compulsion to adopt event sourcing for a project that you want to do as fast as you can with a paradigm that you do already have a grasp of. Either way.

Ultimately, I find the potential of a future with event sourcing far brighter than one without, but I've had to cross many bridges and dismantle many self-imposed obstructions to get there. I hope that others are open to these kinds of experiences and that we have more profound conversations about event sourcing experience reports from the other side of the chasm.

My 2 cents, anyway.

Do you have recommendations for resources about this topic? I would like to learn more about it but most of what I read until now were superficial introductions, word definitions...

Could you exemplify what you mean by: `problems that shouldn't exist in the first place`?

By "problems that shouldn't exist in the first place", I mean complexity and lack of clarity that is the result of presuming to reproduce a relational database schema in an object model. This is the single biggest problem in application software development today, and unfortunately it's also a default mode of development.

I don't have any recommendations for resources. I haven't found any that are as helpful as learning resources as they are helpful for exercising an author's lexicon of superfluous jargon (especially DDD jargon and patterns).

I prefer to teach interactively, through coaching. If I ever find a resource that helps people learn, and that doesn't just load the reader up with distracting vocabulary, it'll be a happy day. We're not there yet.

I have a good deal of sample code at this point, but not yet much in the way of documentation.

But even before trying to get a grasp on this stuff, it has to become clear why reproducing a relational database model in code creates the productivity problems that harm projects in the long term. Until that's a no-brainer, the solutions to this problem as presented by event sourcing won't click. Until the partitioning of an application around "root" objects (or just "roots", if not using OO) is understood, and until the traversal of a web of associations in order to execute queries is understood to be the magnet that draws complexity and obscurity, event sourcing might make no sense.

Until it clicks that ORM is as unnatural an abstraction now as server pages was in the 2000s, event sourcing probably won't matter. And working with event sourcing might create all kinds of problems for not having recognized how to partition a domain. You may just end up with a distributed monolith rather than a service architecture. And at that point, you might just end up blaming event sourcing as a pattern rather than the unconscious importing of anti-patterns from "traditional" development.

For myself, I picked up the necessary precursor ideas over many years and from many disparate sources; integrating and re-integrating new bits of knowledge until I had refined a working understanding.

It was much more involved than learning "traditional" development. You can learn traditional development from blog posts. It's kind of trivial in that way. There are reasons that there aren't three-month boot camps for beginners that are based on event sourcing. But once you grasp it, it'll fundamentally re-write your conceptualization of applicative development.

The event sourcing community needs to do a better job with resources, but it's not there yet. The work is under way, but it's not complete. It will be at some point, hopefully sooner rather than later, but not in the immediate term.

Thank you for sharing your thoughts!

> CQRS library

You missed the whole point of CQRS

Would you care to expand on what "the whole point" of CQRS is, or how they missed it? You seem to imply that the existence or creation of a library to support the pattern is, in some sense, against the spirit of the pattern, which is a) not at all obvious to me, and b) seems like something you should expand upon if you're criticising the article and/or author, given HN's civil dialogue guidelines.

I think CQRS is more commonly viewed as an architectural pattern, not a code pattern, despite many definitions floating out there on the internet that focus on command/query object patterns. See this post: http://udidahan.com/2009/12/09/clarified-cqrs/. If we view it as an architectural pattern, it becomes folly to think that CQRS can be distilled into a library.

> it becomes folly to think that CQRS can be distilled into a library

Sincerely curious about this statement. I understand that CQRS is an architectural pattern, but couldn't a library implement that architecture and then provide ways for you to implement into that architecture? You don't get to pick the architecture nuances at that point, though.

IMO it's a bit like talking about a "Facade Framework".

Using facades to encapsulate multiple complex objects behind a simpler interface may be an important part of your overall design, but even if you use them a lot you probably don't need (or want) to build a "Facade Framework" that lets you instantly define new ones with a few lines of meta-code.

Any code you make for reuse by other people is going to contain something more in order to have value to them. For example, "A Facade library for dealing with various cloud services".

Relating it back to CQRS, compare "CQRS Framework" to "A CQRS framework for command-line applications" or "A CQRS framework for websites". The CQRS-ness is a quality that can't stand just on its own.

At its most basic level, CQRS is frankly too simple to take on a dependency as a cornerstone of a project. You can write a single base class that handles all of the responsibilities needed. It's NOT an architecture; it's an architectural tool.
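For a sense of how small that can be, here is one possible Python sketch: a single `Bus` class used twice, once for commands and once for queries. All names here (`Bus`, `AddItem`, `CountItems`, the dict-based read model) are made up for illustration, and this is only one reading of "a single base class", not the commenter's code.

```python
from dataclasses import dataclass


class Bus:
    """Route a message to the handler registered for its type."""

    def __init__(self):
        self._handlers = {}

    def register(self, message_type, handler):
        self._handlers[message_type] = handler

    def send(self, message):
        return self._handlers[type(message)](message)


@dataclass
class AddItem:       # command: changes state, returns nothing
    name: str


@dataclass
class CountItems:    # query: returns data, changes nothing
    pass


items = []                   # write model
read_model = {"count": 0}    # read model, kept up to date by the write side


def add_item(command):
    items.append(command.name)
    read_model["count"] = len(items)


commands = Bus()
queries = Bus()
commands.register(AddItem, add_item)
queries.register(CountItems, lambda query: read_model["count"])

commands.send(AddItem("apple"))
count = queries.send(CountItems())
```

The only CQRS-specific part is the convention that command handlers mutate and return nothing while query handlers read and mutate nothing; the class itself is a plain dispatcher.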

Agreed. This is so true. I've never ended up with a "library" for doing this. At my previous employer we had three different CQRS-based systems and two of the three of them even had completely different implementations for 90% of the CQRS infrastructure. They are both successful implementations and there was no need to share that bit of the code, as the two teams doing things came to different conclusions on how they wanted to write their CQRS code. The only thing we shared was the event store and some of the minor things on top of the event store.

This is a guess, but if I were to have a problem with the term CQRS Library, it is that the query and command sides of CQRS really shouldn't be linked in any way. CQRS is a methodology that stresses that decoupling above all else. If the command and query portions of your CQRS tech stack both fall under a single library, that seems to go against the methodology.

I know Greg personally, and I have heard him say things like "you don't need a library" in the past. However despite the fact that the basics are simple there are a lot of repeatable concepts that make sense to centralize around and even standardize. There is nothing wrong with a library, just as it is possible to do it without.

One thing that absolutely needs library/drivers is persistence of the event store. I say this having built my own as well as contributing to open source ones in the past. It is still a hard and not well solved problem to do this well. Again it can be simple, but in the real world it isn't usually that simple.

Tools are useful, tools are important. If CQRS/ES wants to make it into mainstream it needs to think about tooling. So far I have not been impressed with the people in the community hand waving about this and not having much to show for it.

> If CQRS/ES wants to make it into mainstream it needs to think about tooling.

I agree with this entirely. CQRS/ES applications really benefit from tailored tooling. Tracking chains of events using correlation and causation identifiers. Auditing commands. Projecting events out to datastores like Elasticsearch. Then using Kibana[1] to create dashboards to drill down into the data. I'd like to go into this in more detail in the future. Revisit this case study after 6-12 months and describe the pitfalls, and countermeasures. Identify tools that allow you to monitor the system proactively.

[1] https://www.elastic.co/products/kibana

> One thing that absolutely needs library/drivers is persistence of the event store. I say this having built my own as well as contributing to open source ones in the past. It is still a hard and not well solved problem to do this well. Again it can be simple, but in the real world it isn't usually that simple.

Would it also be such a hard thing to do if you can delegate the actual persistence to something like a rdbms? What are the typical pitfalls?

Having built a few RDBMS-based event stores - it's pretty easy to do. There are two parts that I can think of that are not completely trivial - appending events to the stream while respecting "expected version" modifiers with optimal concurrency and allowing for fast and light subscriptions to new events (e.g. LISTEN/NOTIFY in Postgres, ringbuffer on the client).
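A minimal sketch of the "expected version" append, using Python's built-in `sqlite3` so it runs anywhere. The table layout, the `ConcurrencyError` name, and the convention that an empty stream is at version 0 are assumptions for this example. The subscription side (e.g. LISTEN/NOTIFY in Postgres) is not covered here.

```python
import json
import sqlite3


class ConcurrencyError(Exception):
    """Raised when the stream is not at the version the writer expected."""


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events ("
        " stream TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " data TEXT NOT NULL,"
        " PRIMARY KEY (stream, version))"
    )
    return db


def append(db, stream, expected_version, events):
    """Append events only if the stream is currently at expected_version."""
    try:
        with db:  # one transaction: all events are written, or none
            (current,) = db.execute(
                "SELECT COALESCE(MAX(version), 0) FROM events WHERE stream = ?",
                (stream,),
            ).fetchone()
            if current != expected_version:
                raise ConcurrencyError(
                    f"{stream}: expected version {expected_version}, found {current}"
                )
            for offset, event in enumerate(events, start=1):
                db.execute(
                    "INSERT INTO events (stream, version, data) VALUES (?, ?, ?)",
                    (stream, expected_version + offset, json.dumps(event)),
                )
    except sqlite3.IntegrityError as exc:
        # Backstop: the primary key rejects a version another writer already took.
        raise ConcurrencyError(f"{stream}: version conflict") from exc


def read(db, stream):
    rows = db.execute(
        "SELECT data FROM events WHERE stream = ? ORDER BY version", (stream,)
    )
    return [json.loads(data) for (data,) in rows]


db = open_store()
append(db, "cart-1", 0, [{"type": "ItemAdded", "sku": "A"}])
append(db, "cart-1", 1, [{"type": "ItemAdded", "sku": "B"}])
try:
    append(db, "cart-1", 1, [{"type": "ItemRemoved", "sku": "A"}])  # stale writer
    conflict = False
except ConcurrencyError:
    conflict = True
```

The `(stream, version)` primary key is what does the real work under concurrency: two writers racing for the same version cannot both succeed, whatever the explicit check saw.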

The good thing is that you can adapt the event store to your performance requirements and do the simplest thing possible in a huge amount of cases.

I'm the original author. Any application built following the CQRS/ES pattern requires development of the same building blocks: command registration and dispatch; hosting and delegation to aggregate roots; event handling; long running process managers.
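To show roughly how those building blocks fit together, here is a small Python sketch covering command registration and dispatch, an aggregate root, and event handlers. It is not Commanded's API (Commanded is an Elixir library); the `Router`, `Account`, `OpenAccount`, and `AccountOpened` names are invented for illustration, and process managers are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenAccount:       # command
    account_id: str


@dataclass(frozen=True)
class AccountOpened:     # event
    account_id: str


class Account:
    """Aggregate root: decides which events a command produces."""

    def __init__(self):
        self.is_open = False

    def execute(self, command):
        if isinstance(command, OpenAccount):
            if self.is_open:
                raise ValueError("account already open")
            return [AccountOpened(command.account_id)]
        raise TypeError(f"unsupported command: {command!r}")

    def apply(self, event):
        if isinstance(event, AccountOpened):
            self.is_open = True


class Router:
    def __init__(self):
        self._routes = {}                   # command type -> (aggregate class, id field)
        self._streams = defaultdict(list)   # aggregate id -> stored events
        self._handlers = []                 # event handlers / projections

    def register(self, command_type, aggregate_cls, identity):
        self._routes[command_type] = (aggregate_cls, identity)

    def subscribe(self, handler):
        self._handlers.append(handler)

    def dispatch(self, command):
        aggregate_cls, identity = self._routes[type(command)]
        stream = self._streams[getattr(command, identity)]
        aggregate = aggregate_cls()
        for event in stream:                # rebuild state from history
            aggregate.apply(event)
        new_events = aggregate.execute(command)
        stream.extend(new_events)
        for event in new_events:
            for handler in self._handlers:
                handler(event)
        return new_events


router = Router()
router.register(OpenAccount, Account, identity="account_id")
opened = []
router.subscribe(opened.append)

router.dispatch(OpenAccount("acc-1"))
try:
    router.dispatch(OpenAccount("acc-1"))   # rejected by the aggregate
    rejected = False
except ValueError:
    rejected = True
```

A process manager would sit alongside the event handlers: it subscribes to events and dispatches further commands in response.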

I built Commanded as a self-contained, reusable, open-source library, with the goal of demonstrating one approach to implementing the pattern using Elixir. I hope it provides some use. That's why I've written up the case study. Anyone can take the code as is, to bootstrap their own application. Use it as a learning tool, take away the good ideas. Rebuild and improve upon the bad.

I received a pull request only today adding support for Greg Young's event store. That's going to broaden the appeal.
