CQRS and Event Sourcing Intro for Developers (altkomsoftware.pl)
149 points by witek1902 on May 22, 2019 | hide | past | favorite | 52 comments

I've worked on CQRS / ES microservices at scale and have seen how successful they can be in terms of reliability, scalability and performance.

I don't think people are qualified to reject this pattern unless they've spent some serious time working in these ecosystems. It took me a long time with a ton of production experience to evolve my thinking and truly appreciate CQRS/ES.

I've worked in mature CQRS/ES services at scale, I've also helped convert other systems both into proper CQRS, and out of proper CQRS.

Like anything else, it's one tool in the tool bag, but like that giant pipe wrench that you're always looking for a reason to use, it's almost always the wrong tool for the situation. CQRS carries horrific complexity and requires commitment to a handful of golden rules or the entire thing comes crashing down.

> I don't think people are qualified to reject this pattern

I don't think most people are qualified to know what the hell this pattern truly is, much less put it into proper operational use. I've seen plenty of people attempt to use CQRS when they should have stuck to a simple CRUD model. At one point I was adjacent to a team that had built 8+ services to handle one or two trivial business processes.

CQRS isn't a fine wine or stinky cheese. If it smells funny for your case, it probably isn't the right choice.

> requires commitment to a handful of golden rules or the entire thing comes crashing down

What are the golden rules?

Two big ones for me:

- Don't use the read side from the write side.

- Each aggregate should own its data (or more abstractly, the basics of DDD, bounded contexts[0]).

[0]: https://cqrs.nu/Faq
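For illustration, those two rules in a minimal Python sketch (all names invented, not from any framework): the write-side aggregate rebuilds its state only from the events it owns, and its decisions never consult the read-side projection.

```python
class Account:
    """Write-side aggregate: state is rebuilt only from events it owns."""

    def __init__(self, events):
        self.balance = 0
        for event in events:
            self._apply(event)

    def _apply(self, event):
        kind, amount = event
        if kind == "deposited":
            self.balance += amount
        elif kind == "withdrawn":
            self.balance -= amount

    def withdraw(self, amount):
        # The decision uses only the aggregate's own state, never a read model.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        return ("withdrawn", amount)

history = [("deposited", 100), ("withdrawn", 30)]
account = Account(history)          # balance replayed to 70
new_event = account.withdraw(50)    # allowed, since 50 <= 70
```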

Every state change needs to be reflected as an event. Does your function need to save something to the DB? Then it should create an event, and a handler of that event performs the DB save.
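As a hedged sketch of that idea (invented names, no particular framework): the business function only publishes an event, and a subscriber performs the actual DB save.

```python
class EventBus:
    def __init__(self):
        self.handlers = {}
        self.log = []          # the event log is the source of truth

    def subscribe(self, kind, handler):
        self.handlers.setdefault(kind, []).append(handler)

    def publish(self, kind, payload):
        self.log.append((kind, payload))
        for handler in self.handlers.get(kind, []):
            handler(payload)

db = {}                        # stand-in for a real database

def save_user(payload):
    # The "DB save" happens in an event handler, not in the business function.
    db[payload["id"]] = payload

bus = EventBus()
bus.subscribe("user_registered", save_user)

# The business function only records what happened:
bus.publish("user_registered", {"id": 1, "name": "Ada"})
```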

> CQRS carries horrific complexity and requires commitment to a handful of golden rules or the entire thing comes crashing down.

As with all software architecture, you need to adapt a concept to the problem at hand: following the "rules to the letter" is hardly useful. Most successful CQRS systems are those that do not follow every rule (e.g. letting command handlers return response data makes for a much more convenient workflow).

In general yes, but if you, say, start using your read data on your write side, then you're in for a very bad time. Likewise, you are going to have some very serious timing woes if you violate your domain boundaries.

It's easy to read this and go "of course" for a small conceptual system, but in practice even really experienced engineers want to break those rules in production systems - and it's a lot easier to break those rules than fix the system to respect them.

In this specific case I'm talking about e.g. returning errors in case command validation failed. Officially it's supposed to be returned asynchronously through events (and then correlate by the command identifier), but it can be much more pragmatic (and reliable even) to just return an error from the command handler in these cases.
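A sketch of that pragmatic variant (invented names): validation failures come back synchronously from the command handler, while accepted commands still append to the event log.

```python
event_log = []

def handle_place_order(command):
    # Synchronous validation: reject bad commands right here, instead of
    # emitting a rejection event to be correlated by command id later.
    if command.get("quantity", 0) <= 0:
        return {"ok": False, "error": "quantity must be positive"}
    event_log.append(("order_placed", command))
    return {"ok": True, "command_id": command["id"]}

rejected = handle_place_order({"id": "c1", "quantity": 0})
accepted = handle_place_order({"id": "c2", "quantity": 3})
```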

Your writes are always going to return a write model, and that won't account for indexing problems when you don't return an error - so your clients may naively equate a non-error response with success. If you're the one writing the client, maybe that's not a problem. If you have several consumers who don't know the ins and outs of your system, that might be a very serious problem.

This is the problem with CQRS. The answer to so many of these little tweaks depends on a ton of system knowledge and contracts and promises, and it's really really easy to make the wrong choice.

> I don't think people are qualified to reject this pattern unless they've spent some serious time working in these ecosystems.

Fine - but it isn't a matter of rejecting or accepting, but rather of understanding. And if someone isn't "qualified" to reject a certain architectural pattern... then they probably won't be very good at implementing it either.

So on balance I'd rather be in an environment where people are allowed to "reject" what they don't fully understand yet -- rather than (pretend to) go along with it (on the basis of some article they vaguely read about it, or "because X said so.")

I think you are right, though doesn’t your conclusion mean that this pattern, and any pattern that isn’t “simple enough”, is automatically disqualified? I do think that any pattern that is more complex than your business case is always the wrong one.

Like every tool in the toolbox, it has a range of proper applications and comes with its own complexities and downsides.

Having tried and failed to apply it shouldn't automatically generalize the failure to a problem with the tool, but rather to either its inadequacy for the use case or its improper use (and this is useful learning as well).

I'm always excited to see posts about CQRS and Event Sourcing. My current project is entirely ES/CQRS-driven and it's been a revelatory experience. I can't overstate it. As Charity Majors says:

> And while storing raw events may be "expensive" compared to any single optimization, all optimizations can be derived from raw events.

> The reverse is impossible.[0]

But I have to say: the resources that this site links only served to confuse me. Greg Young may have popularized these concepts, but watching his talks left me with little practical implementation guidance, and Udi Dahan was even worse, much, much worse, in terms of leaving me helpless and confused.

What really helped me was "Designing Event Driven Systems"[1], a promotional book by Confluent that nevertheless has many great sections with practical advice for implementing these patterns. Likewise the CQRS/ES FAQ[2].

And while this post says you don't need Kafka for CQRS/ES, Kafka sure does help. Kafka Streams is the ultimate tool for CQRS/ES. It contains all the primitives you need to do CQRS/ES easily. I am in the process of writing a blog post about my experience and am looking forward to sharing it. People love React/Redux, state as a function of reducers, and time travel debugging on the front-end: there is no reason you can't have that all the way down your stack. Kafka Streams makes this possible, and much easier than you'd think.

[0]: https://twitter.com/mipsytipsy/status/1115537408705957888

[1]: https://www.confluent.io/designing-event-driven-systems

[2]: https://cqrs.nu/Faq
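To make the Redux analogy concrete: current state is a left fold (reduce) over the event log. Kafka Streams' aggregate() does essentially this over a partitioned, replayable log; this in-memory Python sketch (all names invented) only shows the shape.

```python
from functools import reduce

def reducer(state, event):
    # A pure reducer, exactly as in Redux: (state, event) -> new state.
    kind, item = event
    if kind == "item_added":
        return {**state, item: state.get(item, 0) + 1}
    if kind == "item_removed":
        return {**state, item: max(state.get(item, 0) - 1, 0)}
    return state

events = [("item_added", "apple"), ("item_added", "apple"),
          ("item_removed", "apple")]

state = reduce(reducer, events, {})                # current state
state_after_two = reduce(reducer, events[:2], {})  # "time travel": replay a prefix
```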

I agree, the Confluent team paint a very clear big picture. You may be interested to take a look at Crux [0], a general purpose document database with bitemporal graph queries, implemented as an ES/CQRS system directly on top of Kafka (or a local KV store).

We created Crux because we found ourselves routinely needing bitemporal functionality when building immutable systems that are capable of ingesting and interpreting out-of-order/late-arriving events [1].

So far we have decided against using Kafka Streams to keep our log-storage options very pluggable, but we pretty much implement the same mechanics.

Disclosure: product manager for Crux :)

[0] https://juxt.pro/crux/index.html

[1] https://juxt.pro/blog/posts/introducing-crux.html

Thank you! I look forward to checking this out. I'd say the picture painted by Confluent is much rosier than reality, but it definitely is the clearest, most actionable picture to date.

To be specific, the picture I am thinking about in particular is described in a slide titled "Local consistency points in the absence of Global Consistency" (from this talk: https://youtu.be/VYOMmwkdSig?t=1817), where it shows how each local consistency point is ideally implemented using ES/CQRS. I also agree that the tools Confluent are currently providing us with today do not make this trivial.

Edit: the other big picture Confluent quote I like is "You are not building microservices, you are building an inside-out database" (Tim Berglund, Confluent, 2018) which is a perfect answer to this other quote: "The hardest part of microservices is your data" (Christian Posta, Red Hat, 2017)

We use CQRS and event sourcing for most of our backend services at my current employer (https://m1finance.com); it's generally been a good experience, but we've learned a lot along the way.

My thoughts on the process are:

1. Being able to use a pure/functional event-driven model for the command model, and a standard relational model for queries, is the big payoff for us; we get the best of both worlds.

2. Our model does not encapsulate command-side and query-side updates in a single transaction, nor does it require that they live in the same database; this gives us a lot of benefits for scalability, but it introduces eventual consistency, and not having "read-your-writes" consistency can make things harder for our front-end devs.

A simpler model that does transactional updates to both sides might be a win for a lot of teams, and I still wonder if we should have gone that way.

3. We use Kafka for bulk dataflow and inter-service messaging, but not as the internal event store; that gives us some leeway for migrations and surgical edits to the event stream where necessary.

We've found that Kafka's immutability and retention properties do not make it a good fit for a primary source of truth; it's way too easy to "poison" a topic with a single bad message.

4. One thing that none of the books/frameworks do a good job talking about is external side effects with vendors/partners/legacy systems. That's definitely been the single hardest part of our implementation, and we're still evolving our patterns here.

Overall, though, it's been a great direction for us, and we're excited to keep pushing our architecture forward.
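Point 2's missing read-your-writes guarantee can be shown with a toy sketch (plain Python, invented names): the write model commits immediately, while the query-side projection only catches up when its updater runs (asynchronously, in a real system).

```python
event_store = []          # write side: committed immediately
read_model = {}           # query side: updated later
projected = 0             # how far the projection has caught up

def handle_command(user_id, name):
    event_store.append(("user_renamed", user_id, name))

def run_projector():
    # In production this runs asynchronously, hence eventual consistency.
    global projected
    for _, user_id, name in event_store[projected:]:
        read_model[user_id] = name
    projected = len(event_store)

handle_command(1, "Ada")
stale = read_model.get(1)     # None: the projection hasn't run yet
run_projector()
fresh = read_model.get(1)     # "Ada" once the projector catches up
```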

I have never understood the effect CQRS seems to have on developers. It's a pattern that should be treated and applied with absolute caution.

A command feeds a data store, without ever returning anything back.

Given a query a system returns some relatively useful model back.

Honestly, why is this so interesting?

Sure, you might want to add some autonomous component(s) that manipulate data before returning them.

A good example of applying CQRS to a small part of a system:

1. Data is constantly being written (commands) into the system, excessively.

2. You want to present a "snapshot" of the data, because trying to do it "realtime" for all your users will demand too many resources, and your system will come to a halt.

3. You create the "snapshot" every 10 seconds, from code in an autonomous component, and then store it as a serialized object in a data store. Like a cache, sort of.

4. When a query asks the system, the system loads up the "snapshot", deserializes it, and returns that.

Here you have two independent read and write systems. That's CQRS for you.
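That snapshot scheme as a runnable toy (Python; the 10-second timer is elided and all names are invented):

```python
import json

live_writes = []          # command side: raw incoming data
snapshot_store = {}       # cheap store for the serialized snapshot

def write(measurement):
    live_writes.append(measurement)

def take_snapshot():
    # Would run e.g. every 10 seconds in an autonomous component.
    summary = {"count": len(live_writes), "total": sum(live_writes)}
    snapshot_store["latest"] = json.dumps(summary)

def query():
    # The read side never touches live_writes, only the snapshot.
    return json.loads(snapshot_store["latest"])

for value in [5, 10, 15]:
    write(value)
take_snapshot()
result = query()
```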

Do not apply this pattern with a loose hand.

We use NestJS/CQRS in our property management app. Here's how we implement it:

1. There are two event handlers for "writes". One handler writes normalized data into a PostgreSQL DB. The other handler writes denormalized data into Firestore.

2. Our frontend uses Firestore, so mutations are reflected in realtime in the frontend. We never found a need for the command to return data. There is also no need for complex queries in Firestore, since our data is denormalized and optimized for reads.

3. The PostgreSQL DB is useful for reporting and complex queries. Our frontend app displays this data only in the reports area.

So far, I don't see how things can get confusing with this pattern.
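A toy rendition of that dual-handler setup, with dicts standing in for the PostgreSQL and Firestore stores (handler and field names invented):

```python
postgres = {"tenants": {}, "leases": {}}      # normalized: separate tables
firestore = {}                                # denormalized: one read document

def normalized_handler(event):
    postgres["tenants"][event["tenant_id"]] = {"name": event["tenant_name"]}
    postgres["leases"][event["lease_id"]] = {"tenant_id": event["tenant_id"],
                                             "unit": event["unit"]}

def denormalized_handler(event):
    # One document shaped exactly how the frontend reads it.
    firestore[event["lease_id"]] = {"tenant": event["tenant_name"],
                                    "unit": event["unit"]}

event = {"lease_id": "L1", "tenant_id": "T1",
         "tenant_name": "Ada", "unit": "4B"}
for handler in (normalized_handler, denormalized_handler):
    handler(event)
```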

Does error handling exist in your application, or did you guys take the "happy path only" approach?

What problem does CQRS solve in this scenario?

This is a good example of what CQRS really is: separating reads and writes in a system.

It's in fact a very simple pattern. But I wouldn't use it system-wide, because not everything has to be eventually consistent.

Aren’t reads and writes already separated?

> A command feeds a data store, without ever returning anything back.

You are describing CQS (Command Query Separation), a la Bertrand Meyer, rather than CQRS. The only thing that CQRS says is that the read and write paths are different. It does not preclude a command from returning a response - or for that matter being handled synchronously.

> Do not apply this pattern with a loose hand.

Why especially this pattern?

I would say you shouldn't apply any pattern with a loose hand...

Because different patterns have different side effects, misapplying them yields different degrees of impact. I can't speak for the parent commenter, but CQRS can have nasty interplay between side effects. Commonly you need to build interfaces that are aware of the eventual consistency between the read/write models, and your data schema is definitely going to be impacted by the design choice. Whereas misapplying an in-code pattern might mean an all-internals refactor to "undo the damage", it would take a lot more to walk back a CQRS system.

In my experience, when CQRS is applied to a system top-down, it fails to be a success.

CQRS is not a system-wide pattern. It should be applied in small contexts and corners of domains.

I don't agree that every design pattern has the same level of impact on a system. By far IMHO.

Using the Kappa architecture it becomes a system-wide pattern, and it should be applied in a large context, because that's the point of the architecture ;)

... caution.... absolute caution

I've seen CQRS pop up over and over, but never seen anyone do a real application aside from some pseudo-bank account code. Are there any bigger OSS projects that use event sourcing?

Also the cached version: https://webcache.googleusercontent.com/search?q=cache:jZXfTY...

Used it. Loved it. It does have its issues though.

I was working on the backend systems for electric car charging. When real-world events happened (start-charging, stop-charging, etc.), we wrote them directly to Kafka. It was up to other services to interpret those events, e.g. "I saw a start then a stop, so I'm writing an event to say user-has-debt". Yet another service says "I see a debt, I'm going to try to fix this by charging a credit card". I guess you'd call the above the 'C' part of CQRS.

But Kafka by itself is not great for relational queries. So we had additional services for, e.g., history. The history service also listened to starts, stops, debts, credits, etc. and built up a more traditional SQL table optimised for relational queries, so a user could quickly see where they had charged before.

The issues we had were:

1) Where's the REST/Kafka boundary? I.e. when should something write to the Kafka log as opposed to POSTing directly to another service? E.g. if a user sets their payment method, do we update some DB immediately, or do we write the fact that they set their payment method onto Kafka, and have another service read it?

2) Services which had to replay from the beginning of time took a while to start up, so we had to find ways to get them not to.

3) You need to be serious about versioning. Since many services read messages from Kafka, you can't just change the implementation of those messages. We explicitly versioned every event in its class name.

Worth it? For our use case, hell yeah.
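Point 3's explicit versioning might look roughly like this (Python with invented event names; the original system versioned Kafka message classes): old events are upcast to the newest shape on read, so consumers only ever see the latest version.

```python
from dataclasses import dataclass

@dataclass
class ChargingStartedV1:
    session_id: str

@dataclass
class ChargingStartedV2:
    session_id: str
    connector_id: str          # new field added in V2

def upcast(event):
    """Consumers only ever see the latest version."""
    if isinstance(event, ChargingStartedV1):
        return ChargingStartedV2(event.session_id, connector_id="unknown")
    return event

old = ChargingStartedV1("s1")
new = upcast(old)
```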

> 1) Where's the REST/Kafka boundary? I.e. when should something write to the Kafka log as opposed to POSTing directly to another service? E.g. if a user sets their payment method, do we update some DB immediately, or do we write the fact that they set their payment method onto Kafka, and have another service read it?

I believe the term that's emerging for this issue is "collapsing CQRS" and how you handle this is application-dependent (are you using plain synchronous HTTP requests? websockets?) In my case, the HTTP server has a producer that writes to a Kafka topic and a consumer that consumes the answers. The HTTP request waits until the answer appears on the answer topic.
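A toy sketch of that request/reply shape, with queue.Queue standing in for the Kafka command and answer topics (all names invented): the HTTP handler publishes a command, then blocks until a reply with the matching correlation id appears.

```python
import queue
import threading
import uuid

commands, answers = queue.Queue(), queue.Queue()

def worker():
    # Stand-in for the downstream service consuming the command topic.
    corr_id, payload = commands.get()
    answers.put((corr_id, {"status": "accepted", "payload": payload}))

def http_handler(payload):
    corr_id = str(uuid.uuid4())
    commands.put((corr_id, payload))
    # Wait for the matching answer (real code would also handle timeouts
    # and interleaved replies from other requests).
    reply_id, reply = answers.get(timeout=1)
    assert reply_id == corr_id
    return reply

threading.Thread(target=worker, daemon=True).start()
response = http_handler({"action": "start-charging"})
```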

> 2) Services which had to replay from the beginning of time took a while to start up, so we had to find ways to get them not to.

Kafka Streams local state makes this fast, unless you need to reprocess.

> 3) You need to be serious about versioning. Since many services read messages from Kafka, you can't just change the implementation of those messages. We explicitly versioned every event in its class name.

Yes, this is tricky. In my case, I either add fields in a backwards compatible manner, or rebase/rewrite event topics and roll them out while unwinding the previous version of the topic that may still be in use. The former is obviously the simpler option.

We've implemented it in an e-learning system used in some Dutch high schools. A few talks we've given at the local Clojure Meetup:

- https://zeekat.nl/news/2015/02/12/event-sourcing-at-studyflo...

- https://speakerdeck.com/helios/2

- https://speakerdeck.com/helios/event-sourcing-cqrs-and-scala...

Right now the system is running with ~800 million events, with roughly 1.5M added per day :)

My current project is totally ES/CQRS from the core to the UI. It's an email service with users, messages, client sessions and other entities. Please see my top-level comment about the joy of implementing it with Kafka Streams. The state of the art is still emerging, so it was a lot of work to figure out what tools to use and how to model things, but once I did, I couldn't go back to updating state in a table again.

Most relational database engines use some form of event-sourcing internally.

Makes you wonder why people want to reinvent everything from scratch then.

I'm not totally sure but Git might be one?

I would love to write my application layer with any kind of declarative CQRS paradigm, but I did some research and concluded that today's database technologies don't give us the expressive power to do it. I wrote up my thoughts here:

[1] https://hackernoon.com/data-denormalization-is-broken-7b6973...

[2] https://hackernoon.com/when-logic-programming-meets-cqrs-113...

Event sourcing and CQRS in Elixir with [Commanded](https://github.com/commanded/commanded) is a treat. If you’re interested in these patterns or Elixir, take a look. The maintainer is remarkably friendly and helpful on any issues as well.

I've never used CQRS or Event Sourcing, but I found it very interesting after watching some of Greg Young's videos.

He also has this: https://leanpub.com/esversioning

This blog by Jay Kreps (LinkedIn) is also a good read: https://engineering.linkedin.com/distributed-systems/log-wha...

We use the concepts of commands and queries, where commands can't return anything and queries cannot mutate state. No separate stores for each, though. I think that's more correctly called CQS, though I remember finding sources that it's just as valid an implementation of CQRS too.

Some advantages:

- Command and query handlers being individual (testable) classes is great for general organisation and neatness

- Beats 'Transaction Script' as a pattern any day

- Can wrap command/query execution with various value-add pipelines: logging, timing, caching (queries)

- Can direct queries to use read replicas
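The "value-add pipeline" bullet can be sketched as handlers being plain (testable) classes whose execution is wrapped by cross-cutting decorators (here, logging and timing), in a hypothetical Python rendition:

```python
import time

class GetUserQuery:
    def __init__(self, user_id):
        self.user_id = user_id

class GetUserHandler:
    def handle(self, query):
        # Stand-in for a real data-store lookup.
        return {"id": query.user_id, "name": "Ada"}

log = []

def with_logging(handler):
    # Wraps any handler with timing + logging without touching its code.
    def execute(query):
        start = time.perf_counter()
        result = handler.handle(query)
        log.append((type(query).__name__, time.perf_counter() - start))
        return result
    return execute

pipeline = with_logging(GetUserHandler())
user = pipeline(GetUserQuery(42))
```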

Some dirty tricks:

- Service layer (API endpoint controllers) does pre-validation, declaratively where possible, to enable OpenAPI/Swagger documentation of failure modes

- We do really care about whether commands finish or not - so they do run synchronously in 99.9% of cases. They can be put to a queue, but usually the caller wants to know it finished

- Also, they can throw exceptions with error results.. it seemed a trade-off that's worked well enough

Recently had an idea to steal some concepts from serverless, leveraging this: we could measure command/query performance and resource use (CPU/RAM) and, based on some heuristic, farm out execution of one or the other to a separate process. Commands, queries and query results are all serialisable.

Event Sourcing and separate stores seemed a bridge too far. It definitely wants loads of expertise/experience and careful design, whereas the above is easy as pie to get going with.

Anyway, it's made for some enjoyable enough development.

Another advantage:

- Compared with a repository pattern or typical fat-fingered ORM, separating command and query models in code can make it easier to let the data store / search engine do the things it has been optimised to do well on the query end (joins, views, etc), rather than over-fetching data and doing too much of the translation in code.

Has anyone tried doing event sourcing with a Redux app?

I'd be pretty interested to see the same "store" working on frontend and backend and every redux event being queued up and (eventually) synced to the server to provide a log of what happened. Obviously this leads to building an analytics system for your product quite quickly.

Redux pretty much is CQRS + event sourcing, as applied using functional paradigms.

A while ago I built a system that would use a React CQRS plugin and provide optimistic concurrency - allowing really fast UIs at the expense of users having to deal with failed commands if a breaking error occurs.


Same for Re-frame in CLJS: https://github.com/Day8/re-frame

I've implemented this pattern a handful of times, and if done in a team I think it manages the complexity of large software projects in a way the traditional MVC pattern simply cannot. The real value of this kind of architecture is not the reliability, scalability, or performance advantages (even though they are significant); it is the elegance with which it manages complex systems and the associated business logic. The cognitive gains here are innumerable, and may even increase the velocity of a team, provided the underlying infrastructure is in place.

Any recommendations on an example driven/practical book on implementing CQRS/Event Sourcing for web apps?

I read part of an early chapter of a book that taught a lot of these concepts in a practical way. The book isn’t out yet, but will be in the next month or so: https://pragprog.com/book/egmicro/practical-microservices

I think the code examples are out of date so don’t worry about downloading those until the book is officially released. This guy has had some great success stories implementing CQRS/Event Sourcing at his place of work in the last year.

Mark Nijhof has a CQRS book via Leanpub [1].

But I would suggest just reading up and watching a little online and exploring on your own, e.g. any talks by Greg Young. The OP article has good further reading links. The pure concepts are simple. Implementations vary.

[1] https://leanpub.com/cqrs

This should not be used on 99.999% of real applications. You just end up with a shit show.

Being able to roll back writes and reconstruct earlier states is pretty key for any sort of high-compliance application.

That being said, I converted a failed application that needed rollback and history from CQRS to standard CRUD with trigger-based operation logging, and achieved the same effect with about 1/10th the code. The architecture was a lot more accessible to lower-level devs too, greatly reducing maintenance costs.
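A sketch of that trigger-based approach, using SQLite for brevity (table and trigger names invented): the database itself mirrors every UPDATE into a history table, giving history/rollback data without event sourcing.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE accounts_log (id INTEGER, old_balance INTEGER,
                               new_balance INTEGER,
                               at TEXT DEFAULT CURRENT_TIMESTAMP);
    -- The DB does the logging; application code stays plain CRUD.
    CREATE TRIGGER log_balance AFTER UPDATE ON accounts
    BEGIN
        INSERT INTO accounts_log (id, old_balance, new_balance)
        VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
""")
db.execute("INSERT INTO accounts VALUES (1, 100)")
db.execute("UPDATE accounts SET balance = 70 WHERE id = 1")
history = db.execute(
    "SELECT old_balance, new_balance FROM accounts_log").fetchall()
```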

You can do that with backups and transaction logs.
