Sadly, user management is a crap domain for demonstrating Event Sourcing.
Equally, building an entire system as Event Sourced is daft. Some aggregates ought to be Event Sourced; nominally I'd say models that exhibit temporal properties, like a business process or workflow, are well suited.
Similarly, the most common "issue" I see with Event Sourcing is conflating Event Sourcing and Event Driven Architecture. They can be complementary but aren't the same thing. This conflation leads to a befuddled mess of inappropriate tech choices and inappropriate consistency models.
It didn't get into read-after-write consistency well enough in my opinion, which breaks the event sourcing pattern for many use cases. E.g. in the create-user example, there are states in the system where a user could create an account, reload their page, and not see the account because the write hasn't yet propagated to the database used to satisfy reads.
While I'm no expert in the subject, shouldn't read-after-write in event sourcing be nonsensical in terms of correctness? Instead you have to convert that problem into asynchronously waiting for an ACK message confirming that your command had its intended effect, and only then ask to read the state?
EDIT: This of course does not cover the optimistic concurrency models in, say, PostgreSQL, where you can effectively begin-write-read-rollback to extract information about a hypothetical state. Sadly I've had to use patterns like that before, and there seems to be no alternative to that hack in event sourcing.
I don't quite understand this comment. Are you looking for a confirmation? Is it good enough to just have the most up-to-date data for the state you want to track? I'm curious what the specific use case was.
If you are waiting for an ACK, you could alternatively create another stream that informs you of whatever write you were waiting for, and therefore pushes you the most up-to-date value or triggers the read. You can use the correlation ID to make sure it's the set of changes bound to the original event you're tracking.
The "correctness" issue will crop up in whatever model you use, since concurrency is the primary means of scaling a system. Out-of-order writes will have "correctness" problems with any DB.
If you need strict ordering, the solution in ES will probably be the same as in PG: separate the streams that need to be ordered and optimize them up front as much as you can, then post-process when you no longer can.
If data corruption is the issue, a changeset validation prior to the write probably works better than a rollback. If the validation fails, catch it and send it to an exception stream you can track. Event sourcing lets you track by event/causation/correlation IDs, so you'll probably have an easier time debugging that.
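To make that concrete, here's a minimal, in-process sketch of the ACK-by-correlation-ID idea (plain Python threads and queues standing in for real streams; all names are made up): the writer dispatches an event tagged with a correlation id, the consumer applies it to the read model and signals completion, and only then does the writer read its own write.

    import queue
    import threading
    import uuid

    events = queue.Queue()   # stand-in for the event stream
    acks = {}                # correlation_id -> threading.Event
    read_model = {}          # whatever view satisfies your reads

    def consumer():
        while True:
            event = events.get()
            read_model[event["data"]["user_id"]] = event["data"]   # project it
            ack = acks.get(event["correlation_id"])
            if ack:
                ack.set()    # tell the waiting writer its change is visible

    def dispatch_and_wait(event_type, data, timeout=5.0):
        correlation_id = str(uuid.uuid4())
        acks[correlation_id] = threading.Event()
        events.put({"type": event_type, "data": data,
                    "correlation_id": correlation_id})
        return acks[correlation_id].wait(timeout)   # only read after this

    threading.Thread(target=consumer, daemon=True).start()
    if dispatch_and_wait("UserCreated", {"user_id": "u1", "name": "alice"}):
        print(read_model["u1"])   # safe: the write has been projected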
The comment is talking about implementing sequential consistency: simplistically, if you have f();g() in a thread, and f() modifies the system's state, those modifications should be visible to g().
That's a distributed systems problem, rather than an event sourcing one. I'm sure we've all done something like comment on a site like HN and not seen our comment appear when we reload. The more distributed the system, the more likely it is we're hitting a stale cache somewhere.
Even the most absurdly reduced system running on a single machine, taking an HTTP request in and processing it fully to completion in all aspects before returning any response, is a distributed system - the browser is at the other end, running asynchronously. The user may tell the browser to reload before the single server has finished processing. When do both the user and the server agree that the account has been created?
To "fix" the problem with event sourcing, just don't add distributed components if you don't need them. Synchronise your "on event" action handlers with your event creation, and don't return success to the command until the handlers have completed.
You can even choose to wrap it all in a transaction so the event doesn't write unless the handlers all succeed, side-stepping the problem of desynchronised views due to handler bugs.
You still keep the (IMO) main benefit of event sourcing: you can define new views you didn't have to foresee and build them from the complete history of the system as if you'd known about them from the start.
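For what it's worth, a minimal sketch of that synchronous, transaction-wrapped approach (sqlite3, with made-up table and event names): the event append and the view-updating handlers commit together or not at all.

    import json
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE events (id INTEGER PRIMARY KEY, type TEXT, data TEXT);
        CREATE TABLE user_view (user_id TEXT PRIMARY KEY, name TEXT);
    """)

    def on_user_created(cur, data):
        cur.execute("INSERT INTO user_view (user_id, name) VALUES (?, ?)",
                    (data["user_id"], data["name"]))

    HANDLERS = {"UserCreated": [on_user_created]}

    def record(event_type, data):
        with db:   # one transaction: event plus all handlers, or nothing
            cur = db.cursor()
            cur.execute("INSERT INTO events (type, data) VALUES (?, ?)",
                        (event_type, json.dumps(data)))
            for handler in HANDLERS.get(event_type, []):
                handler(cur, data)

    record("UserCreated", {"user_id": "u1", "name": "alice"})
    print(db.execute("SELECT * FROM user_view").fetchall())   # [('u1', 'alice')]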
In the reload-before-the-server-has-acknowledged situation, you haven't told the user that action X is fully done, so there would be no expectation that it persisted.
Yes, it is a distributed systems problem, but with a pure event sourcing approach as advocated in this article, every action is a potential data race.
Compare this to an application that uses a distributed data store like DynamoDB, where read-after-write consistency is possible while availability is still quite high. Apps that use it are easy to reason about for user actions, yet you can still use its event log for asynchronous events like sending mail.
That said, delaying acknowledging the write until you know it has propagated to all critical data stores is an interesting way to solve the problem.
Sequential consistency issues can appear in pretty much any system, not just distributed systems. E.g., a consumer thread that processes items from a queue. If you push into the queue and need subsequent read operations to see the processed state, you need to block after push until the item is processed.
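A minimal sketch of that single-process case, using Python's standard queue: the producer blocks on join() after pushing, so the subsequent read is guaranteed to see the processed state.

    import queue
    import threading

    work = queue.Queue()
    state = {}

    def consumer():
        while True:
            item = work.get()
            state[item["key"]] = item["value"]   # "processing" the item
            work.task_done()

    threading.Thread(target=consumer, daemon=True).start()

    work.put({"key": "balance", "value": 42})
    work.join()                # block until the consumer has processed it
    print(state["balance"])    # guaranteed to be visible now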
Great article. One thing that wasn't pointed out that might be of interest to someone learning about Event Sourcing is that it introduces some challenges if you are to be compliant with GDPR and similar laws. For example, if your event log is immutable, and you use it as an audit log, then by nature you are not ever deleting data. There are solutions to this (for example, crypto-erasure), but it can be non-trivial to implement.
I saw the Akka people talking about this on twitter once, I think they were theorizing that encrypting the data in the log would be sufficient, because then "deleting the key" could be interpreted as the deletion of the record (even though the useless data still exists in the log). But I'm not sure this was ever legally validated?
That is what I was referring to as "crypto-erasure" (also known as "crypto-shredding"). I'm not sure what counts as legally validated, but some have shared concerns that the encryption you use to do this would need to be future-proof against, say, advancements in quantum computing cracking the encryption down the road even after the key has been thrown out.
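For illustration, a minimal crypto-shredding sketch (this assumes the third-party cryptography package and made-up storage): each data subject gets their own key, the append-only log only ever holds ciphertext, and deleting the key is the erasure.

    from cryptography.fernet import Fernet

    keys = {}        # subject_id -> key; this store is mutable and deletable
    event_log = []   # append-only; never rewritten

    def append_event(subject_id, payload: bytes):
        key = keys.setdefault(subject_id, Fernet.generate_key())
        event_log.append((subject_id, Fernet(key).encrypt(payload)))

    def read_events(subject_id):
        key = keys.get(subject_id)
        if key is None:
            raise LookupError("key shredded; data is effectively erased")
        return [Fernet(key).decrypt(c) for s, c in event_log if s == subject_id]

    def forget(subject_id):
        keys.pop(subject_id, None)   # "delete the key" on an erasure request

    append_event("user-1", b'{"email": "alice@example.com"}')
    forget("user-1")
    # read_events("user-1") now raises; the ciphertext remains but is useless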
If the references don't point to actual data, you don't need this.
- Minimum 2 parts: a relay (reference hash) and the cold/true storage portion. You can break the reference hash up and reassemble it only for secret holders. Which brings us to:
- Content-based routing
- There are also deterministic vaults for rolling keys
I'm actually not sure in what high-level situation "crypto-erasure" would work, because being able to re-key a reference means you have complete control. So why would you need to erase the "bad" key when you can just switch the reference?
^ Both generally use the same concept I described above and solve the GDPR "delete" issue. Actually, it solves GDPR completely if you can just rely on the DID. "Hard delete" is a separate issue, though -- no other way to get around that but to fork/version + replay your store and re-reference anyone who wants a hard delete, if you didn't use reference hashes.
Oh, why? Holding one key for every human on earth would fit in 8 TB plus read-mostly replicas. You'd delete their key if they made a GDPR removal request, and the keyless, encrypted, immutable entries would be entombed in place.
What about an event like "User A sent Message X to User B"? (I think the answer here is again to keep PII in other repositories, and only refer to it by id in the log.)
Don't stick PII in the events. Write it to a dedicated PII DB and reference it in the event. Easy to get rid of the person's data: just delete it from the PII DB.
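A minimal sketch of that separation (names made up): events carry only an opaque id, the PII lives in its own deletable store, and a GDPR delete never touches the log.

    pii_store = {}   # deletable: person_id -> PII
    event_log = []   # immutable: ids only, no PII

    def register_user(person_id, name, email):
        pii_store[person_id] = {"name": name, "email": email}
        event_log.append({"type": "UserRegistered", "person_id": person_id})

    def forget_user(person_id):
        pii_store.pop(person_id, None)   # the delete happens here only

    def render(event):
        pii = pii_store.get(event["person_id"], {"name": "[deleted]"})
        return f'{event["type"]}: {pii["name"]}'

    register_user("p1", "Alice", "alice@example.com")
    forget_user("p1")
    print(render(event_log[0]))   # UserRegistered: [deleted]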
I'm pretty sure you can "soft-delete" for GDPR compliance. So this concern is sort of a non-issue.
Besides that:
1. If you're using an immutable structure, as long as you use references, you can obfuscate data. Blockchains ran into this problem before GDPR requirements and that's essentially all they do.
2. ^ The "update" strategy for event sourcing is the same as above: essentially a copy of the log or log slice, then a re-indexing to remove/update streams, events, projections, etc. Greg Young talks about the indexing internals (for Event Store) in the 2012 video.
Soft delete in the sense of removing availability but keeping the data shouldn't be GDPR compliant from what I've seen, at least not with regard to the right to be forgotten and the restrictions on keeping PII.
Your event log should IMO be nominally immutable rather than actually immutable.
You should feel free to take actions such as expunging private or sensitive data as appropriate. Keep the events, but rewrite them to contain only the desired data. Trivial to implement, and simple.
I'd only worry about stuff like crypto-erasure if you physically cannot alter the past, such as if you have a requirement for non-repudiation or some such. Doing it just for technical purity isn't worth the cost :-)
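A minimal sketch of that "nominally immutable" approach (made-up event shapes): keep every event, but rewrite the stored payloads to strip the fields you must expunge, so history keeps its shape while the sensitive values go.

    event_log = [
        {"type": "UserRegistered", "user_id": "u1", "email": "alice@example.com"},
        {"type": "UserRenamed", "user_id": "u1", "name": "Alice B."},
    ]

    SENSITIVE_FIELDS = {"email", "name"}

    def expunge(user_id):
        # Types, order and ids stay intact; only the sensitive values change.
        for event in event_log:
            if event.get("user_id") == user_id:
                for field in SENSITIVE_FIELDS & event.keys():
                    event[field] = "[expunged]"

    expunge("u1")
    print(event_log)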
I think cases where you cannot alter the past (or can only do so with difficulty) are fairly common; backups will easily tend to fall in that category.
Ahh, event sourcing... this is the hoarding disease that programmers, and these days "business people", have.
It is also why everybody is doing "big data", why we have privacy leaks, etc.
Solution: think about what you'd want to do with the data, and only then start collecting it.
Event Sourcing is nice on paper, but once you have a few services you will feel pain if you don't do it properly.
We had applications listening to topics on Kafka that could re-play and process the messages. It all sounded good. When we started to add more topics, we realized we no longer knew who owned each topic or who subscribed to it. We no longer felt safe just dropping a topic and had to grep/search around. However, a lot of this information is configured in environment variables, and some is pulled from our config management systems such as Vault/K8s config, which makes it even harder to grep because we have to export data out of those systems first.
I think event sourcing is nice and powerful but hard to do well.
A word of caution for anyone considering an event-sourced architecture. I was on a large government project where the decision was made to use event sourcing, and it was disastrous. It ended up being a big contributor to several years of time and cost overruns.
The reason is that for event sourcing to work, you have to have a pretty good idea of your application's requirements up front. It's simply not conducive to agile development, compared with using a traditional DB. The requirements were constantly shifting, and we were constantly realizing that we had the wrong semantics or structure for various fields, or that assumptions we had made about the coupling of different types of data simply didn't hold. This led to a ton of rewriting and churn on the view code, and required constant decisions on how to handle existing data in the "old" format.
Some of the requirement churn even had ramifications for fundamental architectural characteristics such as support for atomic transactions, so there were several points at which we had to hack locking or other techniques to ensure consistency on top of the event sourcing approach. I do NOT recommend this, it turns everything into a huge mess.
The worst part is, ultimately, the data sizes ended up not being that big. We could have run the whole thing off of append-only tables in a single beefy Postgres instance.
Conclusion: if you are designing systems, you should definitely know what event sourcing is and the benefits it can provide. However, avoid it by default in favor of simpler more traditional models unless they are really infeasible for what you are trying to do. And then, lock down as many key requirements (at the very least, those around consistency and interop with other systems) as possible before charging ahead with implementation.
I mean that conclusion is true of basically every complex system. That's the whole point of hacking together an MVP first, to see what really matters.
However, it's silly IMO to use that criterion as an example of why not to do event-driven architecture. It's well understood in the EDA community that you generally don't start with streaming systems unless you know you need them from the start.
I'm actually quite happy to see that other people are also facing similar problems with event-sourced microservices. The project that I'm on (currently working for a neo-bank) has been trying to use event sourcing from the get-go, and oh god, is it a mess. The project has been going on for a while, and some of the devs thought it a good idea to focus on scalability and all the other metrics that don't matter.
As your data evolves and your schema changes, you'll have a mix of messages in both the old and the new schema in the same topic. You now have to change your consumer services to handle the new schema as well as the old one if the old messages still matter (think re-consuming in Kafka). Your code then gets really ugly, having to handle OldXXXEvent and NewXXXEvent with loads of if-else statements sprinkled through the code base. Either that, or you migrate your data to a new topic, a one-off exercise that takes time and that you'll then have to repeat for each environment.
I'm sure event sourcing has its place, but I'm not entirely convinced the approach is necessarily better than a plain ol' database, which would have saved us time and gotten our product out quicker.
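One common way out of the OldXXXEvent/NewXXXEvent if-else mess is an upcaster: convert older schema versions to the latest at read time, so consumer code only ever handles one shape. A minimal sketch with made-up event fields (not necessarily something the parent's team could have adopted, since it still requires every old version to be mappable to the new one):

    def upcast_v1_to_v2(event):
        # v1 had a single "name" field; v2 splits it into first/last
        first, _, last = event["data"]["name"].partition(" ")
        return {"type": "UserRegistered", "version": 2,
                "data": {"first_name": first, "last_name": last}}

    UPCASTERS = {("UserRegistered", 1): upcast_v1_to_v2}

    def upcast(event):
        while (event["type"], event.get("version", 1)) in UPCASTERS:
            event = UPCASTERS[(event["type"], event.get("version", 1))](event)
        return event

    def handle(event):
        data = upcast(event)["data"]    # consumers only ever see the v2 shape
        print(data["first_name"], data["last_name"])

    handle({"type": "UserRegistered", "version": 1, "data": {"name": "Alice Smith"}})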
This, this and third time this. I have the exact same experience. You should know the business requirements very well before you choose event sourcing and hope they don't start changing drastically.
Your word of caution is important. For some projects it's not initially clear what the right data model and bounded contexts are. And for some projects it's just overhead.
But the converse is also true: for some projects/systems it turns out that being events-first is the _only_ way for it to work. I've encountered that over the last few years with systems in logistics that show an integral view across parties.
The way we approached that is that we started with a standard system and refactored to an event-processing approach once the decomposition into bounded contexts was clear.
So in that sense it's similar to the right way of approaching microservices: start out with a 'monolith' that you split up.
A way to deal with it that I've seen work with Kafka is to make your messages not last very long (say, about a week) and make them explicitly idempotent, so an "old" message being run would not affect the app negatively.
And then you make every producer of messages able to reproduce all the messages it knows about.
So if you have a new service and need historical data, you ask all of your dependencies to resend the data, and existing services should not be affected.
Schema evolution is natural - old messages with old schemas don't last very long, and you can slowly migrate services to new schemas as needed.
If you structure your events so that each one holds the full state of an entity, you get removal of data easily, for free, as any new message overwrites the old state.
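A minimal sketch of that idempotent, full-state style (field names made up): every event carries the entity's whole state plus a version, so replayed or out-of-order old messages are simply ignored.

    entities = {}   # entity_id -> (version, state)

    def handle(event):
        entity_id, version = event["entity_id"], event["version"]
        current = entities.get(entity_id)
        if current is None or version > current[0]:
            entities[entity_id] = (version, event["state"])   # upsert full state
        # else: stale or duplicate delivery, safely ignored

    handle({"entity_id": "u1", "version": 2, "state": {"name": "Alice", "plan": "pro"}})
    handle({"entity_id": "u1", "version": 1, "state": {"name": "Alice"}})  # old replay
    print(entities["u1"])   # still version 2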
The whole thing has its own problems, though, especially when you get into large amounts of messages/data. Keeping copies of those around if you make a lot of changes can be quite expensive.
So the whole event sourcing thing looks to me like it needs a decade or two to mature, so that tools get built and best practices established. It looks to me like basically a way to keep data for a lot of separate services in a fluid, eventually consistent way.
Wonder if Clojure's Datomic is already there. Haven't used it myself but have read promising things about it.
Best practices around this have already been established. Most if not all event stores - which Kafka is not - have a concept called 'position.' You save the position atomically along with whatever you did with the message. Then if you crash, you simply ask for all messages starting from that position. If you have a new service (or a new projection), your position is 0 so you get everything.
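A minimal sketch of that (sqlite3, made-up table names): the projection update and the new position commit in the same transaction, so after a crash you resume from the saved position, and a brand-new projection just starts from 0.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE account_view (account_id TEXT PRIMARY KEY, balance INTEGER);
        CREATE TABLE checkpoint (name TEXT PRIMARY KEY, position INTEGER);
    """)
    db.execute("INSERT INTO checkpoint VALUES ('account_view', 0)")
    db.commit()

    def process(events):
        # `events` are (position, account_id, delta) tuples read from the store,
        # starting just after the saved position.
        for position, account_id, delta in events:
            with db:   # projection update and position advance commit together
                row = db.execute("SELECT balance FROM account_view WHERE account_id = ?",
                                 (account_id,)).fetchone()
                balance = (row[0] if row else 0) + delta
                db.execute("INSERT OR REPLACE INTO account_view VALUES (?, ?)",
                           (account_id, balance))
                db.execute("UPDATE checkpoint SET position = ? WHERE name = 'account_view'",
                           (position,))

    process([(1, "a1", 100), (2, "a1", -30)])
    print(db.execute("SELECT * FROM checkpoint").fetchone())     # ('account_view', 2)
    print(db.execute("SELECT * FROM account_view").fetchone())   # ('a1', 70)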
That is indeed the case, and this is what Kafka calls unlimited-retention topics. It does, however, hamper schema evolution significantly, as you're not allowed to make backwards-incompatible changes. Or rather, if you allow those changes, every service needs to be able to handle every schema throughout time.
If you set the retention to only several days, you can evolve your schema even with breaking changes; consumers only need to support the schemas that are "active right now". If the retention period is long enough that it is safe to assume all the consumers have acted on all the old messages, you can delete those messages. You would still need a mechanism to replay them if that data is ever needed again, but it would just be in the latest schema.
I feel like people are running into these problems because they want to pretend that a message broker is an event store. I could try to shovel a star schema into MongoDB too, but why would I want to?
Keeping data only in the latest schema is dangerous. We have no idea what data the business will find useful years down the line. By only having whatever is in the latest schema, you may have thrown valuable data away.
Datomic is interesting in the context of event sourcing, because it can be thought of as a highly generalized event store system itself, with each database transaction being an event.
Neither. If Richard's statements were facts they would violate his NDAs. In fact, these statements are falsehoods, and seem designed only to cause damage. Also, he's referring to 2016!
The only "problem" is that in 2016 I was physically assaulted at work, Richard had to leave work soon after, and has been denigrating my work since then. For example, here's a post from 2016, where it appears he had a multi-account conversation with himself:
https://news.ycombinator.com/item?id=13129798
Three years later, he's still unable to get over it and move on. This feels like a kind of twisted sexual investment, so I have chosen not to reply.
The Python eventsourcing library is excellent, and event sourcing is a great idea. The library is open source, so if Richard had discovered a bug that caused "corrupt data", he could have raised an issue on GitHub. But there hasn't been such a bug, and no such issue has been raised. Everybody can look through the mere 56 closed issues in the entire history of this successful project to see how many "corrupted data" bugs there have been (zero). https://github.com/johnbywater/eventsourcing/issues
The library is being used successfully in production. Some users have data stores with millions of stored domain events, and the library still works very fast. I didn't hear about anybody having billions of events yet, but I wouldn't expect much difference in performance, so long as the infrastructure has sufficient volume.
In case the wrong impression is created by listening to Richard's rubbish, SQL and event sourcing aren't somehow incompatible. Event sourcing isn't somehow the opposite of "regular databases". The library works very well with SQL databases, through both Django ORM and SQLAlchemy. It also works with NoSQL databases.
If you see posts like this in future, please ignore them, it's just fake news. My understanding is that Richard has diagnosed mental problems. But perhaps there's a kind of nominative determinism from the shortened version of his name? I don't know. At any rate, I've been keeping a log of these occasions, in case I need to call the police again.
Both. The library was slow and corrupted data and they were using regular databases along with the event-sourced stuff. That would never work, the whole system would have needed to be event sourced for it to have any chance of working.
It was essentially a CRUD app with the need for history of changes (for a few tables), SQL was the correct way to do it.
Thank you for this. I've been warning people off Event Sourcing for a while now. The architecture is the most convoluted, pretentious, redundant and downright soul-crushing. If you see Event Sourcing anywhere, run away. Run far, far away.
Some of the world's most useful and powerful data structures are the projection of an event log: the tables of an RDBMS, the balanced writes of an SSD, and even the classic, double-entry bookkeeping.
Even the data stream of a TCP connection is a projection of events, which is why (and how) we can reconstruct them by replaying captured segments.
So just because there are some lousy executions of a general architecture, doesn’t mean we should recoil from the basic idea.
My takeaway is that successful event-sourced structures are crafted for the domain they represent. I’ve developed a couple for my own work, for very specific aspects of an application, and they work well in context.
If your experience has been that a general-purpose ES framework leads to shitty, hard-to-maintain apps, I’d say that’s evidence for the corollary.
Just because a tool is powerful doesn’t mean it’s appropriate. There’s a reason most people should just use a DB and not a raw event log. The issue I have with ES proponents is they seem to all pretend there is no additional complexity that comes with it. I think ES is useful but not always and requires weighing the costs and benefits, and we need to be honest about it.
You could say the same about e.g. lock free algorithms. If you work on them you are usually just playing with some cool technology instead of solving the business requirements.
I've only suggested event sourcing once, and that was for keeping physical warehouse inventory synced with orders. I.e. instead of having a database table with an "available product count", we made a view that calculated "product stock minus pending orders" on the fly at any given timestamp. Very efficient, simple and easy to debug.
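A minimal sketch of that inventory view (made-up event types): available stock is never stored, it's just a fold over stock and order events up to a given timestamp.

    events = [
        {"ts": 1, "type": "StockReceived",  "sku": "tv", "qty": 10},
        {"ts": 2, "type": "OrderPlaced",    "sku": "tv", "qty": 3},
        {"ts": 3, "type": "OrderCancelled", "sku": "tv", "qty": 1},
    ]

    def available(sku, as_of):
        qty = 0
        for e in events:
            if e["sku"] != sku or e["ts"] > as_of:
                continue
            if e["type"] == "StockReceived":
                qty += e["qty"]
            elif e["type"] == "OrderPlaced":
                qty -= e["qty"]
            elif e["type"] == "OrderCancelled":
                qty += e["qty"]
        return qty

    print(available("tv", as_of=2))   # 7
    print(available("tv", as_of=3))   # 8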
But as a general system architectural pattern it does seem to create more issues than it solves. Just look at the MailService example in the article. Dude, just use a queue...
Good approach. Much of software engineering is applying principles like event sourcing judiciously. Why does the whole system need to be in one architecture?
Simple problem: checking users have the rights to do something before you let them. Do you reconstruct the user account every time you want to check their rights (so, for every action they do)?
No, you should not compute the state you need from the event log on every request, this would be absurd. Your authorization service can maintain its own database (a "view" of the current state), or even an in-memory representation computed at startup, and update it whenever a new event pops up. Alternatively, if you are using Kafka, you can use stuff like KTables to do this.
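A minimal sketch of such a view (made-up event types): scopes are kept in memory and updated as grant/revoke events arrive, so a rights check is a plain lookup rather than a replay of the user's history.

    from collections import defaultdict

    scopes = defaultdict(set)   # user_id -> granted scopes; rebuilt from the log at startup

    def apply(event):
        if event["type"] == "ScopeGranted":
            scopes[event["user_id"]].add(event["scope"])
        elif event["type"] == "ScopeRevoked":
            scopes[event["user_id"]].discard(event["scope"])

    def is_allowed(user_id, scope):
        return scope in scopes[user_id]   # no event replay per request

    for e in [{"type": "ScopeGranted", "user_id": "u1", "scope": "billing:read"},
              {"type": "ScopeRevoked", "user_id": "u1", "scope": "billing:read"}]:
        apply(e)
    print(is_allowed("u1", "billing:read"))   # False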
For this specific case, nothing. However, let's say you need to know when the user got a specific scope and from whom. You probably can answer that question in SQL if you prepared your database for it (i.e. a changeset table); in an event sourcing architecture, however, you get this information for free.
However, just because you're using event sourcing doesn't mean you don't have a database with the current state of your entities.
The siblings already said it, but event sourcing requires you (at least for practical purposes) to segregate the read model from the write model using "projections". The good thing is that, whenever you create a new service, you can derive the projections that your service will need from the same source of truth. In this way, you create coupling on the data, but not on the concrete service that owns it.
This is not very different from what a relational database does with redo logs. In fact, in a way, using event sourcing resembles composing a system from the fundamental building blocks of a traditional DBMS, in a distributed way.
That's command-query responsibility segregation. The command stream gives you the nice log of everything that happened, while the read database is easy to query (and to reprogram if necessary).
I don't understand how the uniq service would be able to scale horizontally.
How would you load balance calls to uniq across different servers while still coordinating to ensure uniqueness of values?
Either you keep relying on the logs for replica syncing, but then the service can't answer in a synchronous manner; or you need some synchronous distributed lock, but then you still have the problems associated with locking described previously in the same article.
I don't get it: in the microservice approach, you can have services communicating with each other using event sourcing, but why force every service to work with event sourcing internally? Anything requiring transactional behavior, or at least transactional functions, could rely on a relational database to ensure atomicity.
The purpose of microservices, it seems to me, is to be able to have a different internal architecture for each service. So why go back to shoving a single one everywhere?
I agree with you, except maybe about the atomicity part. When you use event sourcing, your source of truth becomes the event log, so transacting against your local representation of the state does not give you the same guarantees.
Event sourcing is just a nice add-on for event processing systems. Whether we like it or not, async systems are all about events. Persisting and replaying those events is just a matter of convenience.
I think the article is too scattered and doesn't actually discuss how event sourcing works. Key concepts are missing. People who actually want to learn it will end up getting even more confused or misguided-- which seems like the trend with this thing.
Excuse the list formatting. I don't post here that much. Just scroll if the list item is cut off.
TL;DR: Use Redux-Saga -- it closely matches all the bird's-eye-view event sourcing concepts.
What is it?
- Structure: In its simplest form, an append only file (aka a log).
- Events -> { type, metadata } -> {type: "UserUpdated", data}
- Projections -> The read model. Think of them as .reduce() over your log. But because all they need to do is look at the newest event to perform a reduction, they act as “realtime” queries for aggregated data.
- Separation of read and write models aka CQRS
- Read Model: Projection
- Write Model: Event Dispatch
- You can have CQRS without event sourcing (GraphQL, etc) but you cannot have Event Sourcing without CQRS. Event Sourcing is implicit CQRS.
- Separation allows you to scale your reads and writes independently.
- Event sourcing, since it is just a log, allows you to replay your data
- For communication to a service that’s supposed to perform an action for you:
- Command (dispatch) -> PresentTenseVerbNoun (UpdateUser)
- Event (write) -> NounPastTenseVerb (UserUpdated)
The gist of what you do:
- You dispatch events to the event store and create contextual "realtime" data via projections that your services read.
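A minimal sketch of that gist in plain Python (names made up, matching the Command/Event naming above): commands append past-tense events to the log, and a projection is literally a reduce over it.

    from functools import reduce

    log = []   # the append-only event store

    def update_user(user_id, name):   # command: UpdateUser
        log.append({"type": "UserUpdated", "data": {"user_id": user_id, "name": name}})

    def project(state, event):        # the .reduce() step of the read model
        if event["type"] == "UserUpdated":
            state[event["data"]["user_id"]] = event["data"]["name"]
        return state

    update_user("u1", "Alice")
    update_user("u1", "Alice B.")
    users_view = reduce(project, log, {})
    print(users_view)   # {'u1': 'Alice B.'}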
Why do you want to use it?
- It scales and plays well with distributed infrastructure.
- You are already using bits of the concepts if you are scaling or doing logging.
- You have uniform communication between services.
- Event sourcing is extremely good at modeling state in your system. You're forced to think of state (via events) and how those events "eventually" resolve. If you're purely on SQL, on the other hand, you need a log or triggers on top of your commits to keep track of state. E.g. a customer goes down the shopping aisle and puts an XBOX in their cart, then decides to put it back and put a PS4 in their cart. If you had bound that behavior to events, you would be able to run complex projections on them.
- ^ On that note, you essentially get free logging and metrics with Event Sourcing (though you need to build out the projections).
- Event sourcing actually makes writing sequence diagrams to optimize or design your system very easy.
Why don’t you want to use it?
- If it is overly "complex" for a small-to-mid-sized CRUD project.
Misconceptions
- Redux / Elm did not “popularize” event sourcing. There was a small snippet in the “Prior Art” section in the old Redux docs that mentioned event sourcing, but no one was thinking “event sourcing, yay!” as they were using Redux w/ Thunks.
- Redux w/ Redux-Saga is almost 1:1 the event sourcing model, however. If you want to learn event sourcing, instead of reading the article above, just learn how to use Redux-Saga. A Saga, in event sourcing terms, is what is more generally known as a "Process Manager".
- Event Store DB vs PG -- no need to pick one or the other. Use the best of both worlds: ES for your event sourcing, PG for your read model and fully scoped write models.
- "Event sourcing is so much more complex than using an ORM" -- no. The concepts are pretty standard whatever you use once you get into distributed systems modeling. Event sourcing is actually less complex, but the tooling and vocabulary we are used to are too heavily focused on ORMs, PG, etc.
- ACID compliance and eventual consistency are not mutually exclusive. Eventual consistency does not refer to the DB itself; it refers to the infrastructure. If your infrastructure doesn't split-brain and is always "eventually consistent", everything will be OK for most applications. There will be eventual consistency issues in any large-scale system.
- As soon as an event hits an event store, that event is "logged". It won’t be lost.
- Streams in a proper event store are very cheap to create and not computationally expensive.
Things to Note
- Blockchains are event sourcing implementations.
- Read Smart Contracts are essentially “projections”
- Write Smart Contracts are your write model, obviously
- Blockchains replay "events". That's what syncing with wallets is. You can watch the state change.
- The big structural difference between blockchains and a regular event store is that a blockchain stores the events as a cryptographically verifiable log (hash-linked blocks with merkle trees); see the sketch after this list.
- The consensus algorithms in PUBLIC blockchains are also very different from your typical event store. Public blockchains use BFT consensus algorithms that are very slow by design. Hyperledger, with its leader-based consensus model, looks much more similar to clustered event stores.
- Redux-Saga matches the event sourcing flow and implementation so closely that it is probably the best place to start.
- If you have really good modeling and event sourcing in place, you’ll start to see that Redux begins to disappear from your frontend; especially if you use GQL and caching.
- Domain-driven Design really helps model an event sourcing infrastructure.
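A minimal sketch of the "cryptographically verifiable log" idea from the blockchain note above, reduced to a plain hash chain (no merkle trees or consensus): each entry commits to the previous one, so tampering with history is detectable on replay.

    import hashlib
    import json

    chain = []

    def append(event):
        prev_hash = chain[-1]["hash"] if chain else "0" * 64
        body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
        chain.append({"event": event, "prev": prev_hash,
                      "hash": hashlib.sha256(body.encode()).hexdigest()})

    def verify():
        prev_hash = "0" * 64
        for entry in chain:
            body = json.dumps({"event": entry["event"], "prev": prev_hash}, sort_keys=True)
            if entry["prev"] != prev_hash or \
               entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
                return False
            prev_hash = entry["hash"]
        return True

    append({"type": "UserUpdated", "data": {"name": "Alice"}})
    append({"type": "UserUpdated", "data": {"name": "Alice B."}})
    print(verify())   # True
    chain[0]["event"]["data"]["name"] = "Mallory"
    print(verify())   # False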
99.99% of people new to event sourcing, and 9 out of 10 of those who have been doing event sourcing for a while, make the huge mistake of taking event store concepts from other people, who took them from other people, and in the end that is where everyone fails. Even big names like Greg and his praised "eventstore" project. If you are new to ES, great, you have no baggage. Do not read any technicalities about the event store or try to use any existing library for it (the underlying storage engine does not matter: MySQL, Postgres, Rocks...). Come up with your own solution and you will have zero ES problems. Why? Well, the entire concept of the event store that is being floated around out there is completely flawed, and if you implement it, it will cost you a lot of money and time to unfuck yourself later on.