There are MUCH better articles out there about event sourcing, even in Rubyland, where it's still fairly uncommon. This one is just cargo-culting, poorly.
We prefer to have a subscriber that re-runs the projections, which are purely functional, and dumps their state.
This mandates that you have some way of querying relevant events from the underlying store. Sometimes the best store for this is a DBMS; sometimes it's something simpler, like CAS linked lists in Redis or similar (lists of hashes pointing to keys, similar to git's ref/tag model).
Either way, this idea that you subscribe to a stream makes reconnection and place-holding (cursor) semantics difficult. Simply calling your update function as frequently as makes sense is easier to reason about, and it helps highlight performance issues: you can see the perf of frequently called, short-lived processes instead of dealing with the VM bloat of your language runtime managing long-lived subscribers.
Caveat: naturally this is all very subjective, just my 2 ¢
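To make the poll-and-rerun idea above concrete, here's a minimal sketch. The event shapes and function names are my own, purely for illustration; the point is that the projection is a pure fold you can re-run from scratch on whatever schedule makes sense, with no cursor or reconnect state to manage:

```python
def project_balance(events):
    """Pure function: folds the event list into state from scratch each run."""
    balance = 0
    for e in events:
        if e["type"] == "deposit":
            balance += e["amount"]
        elif e["type"] == "withdrawal":
            balance -= e["amount"]
    return balance

def run_projection_once(fetch_events, dump_state):
    """Re-run the projection over all relevant events and dump the result.
    Call this as frequently as makes sense (cron, loop, whatever) instead
    of holding a long-lived subscription with cursor/reconnect logic."""
    events = fetch_events()          # query the underlying store
    state = project_balance(events)  # purely functional fold
    dump_state(state)                # overwrite the previous dump
    return state
```

Because the process is short-lived, a slow run shows up immediately in whatever is scheduling it, rather than hiding inside a long-running subscriber.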
This is true. It's also a problem if you drop a packet - you need to build up state again on the client side.
If you're writing a book about event sourcing, I should think you'll spend a good deal of time covering trading systems and order books since these are an area where event sourcing (AIUI) has been used for decades.
I'm a bit mystified as to why it's called event sourcing, though. Everything sources events. The difference is that this data model has everything consuming events instead of consuming state.
Those are always the best kind. Real, live, messy, production code. You should open source it. We will learn something.
That's a bit like saying Historians know everything, since every known fact is in the past and therefore history :P
I think the point is that the events are kept and can continue to be used as the primary source of truth, rather than used and discarded.
This is an article I have from 10+ years ago, and it's still roughly the same way I do it now. I've never needed to scale it beyond something like Postgres or Redis.
I poke around in the guts of databases, so that's what I meant. The WAL is ES in the context of databases. So I guess ES is the generalization/abstraction(?).
It's not uncommon to see RDBMSs still used to improve read performance (but not as a source of truth, just a cache with SQL).
Wouldn't it be better to store the identifier of the previous state and reconstruct the event tree using CONNECT BYs/WITH RECURSIVEs?
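For what it's worth, the previous-state-pointer idea can be sketched with a recursive CTE. The schema and event names here are illustrative only (SQLite standing in for a real database; CONNECT BY is the Oracle spelling of the same thing):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    prev_id INTEGER REFERENCES events(id),  -- pointer to the previous event/state
    payload TEXT
);
INSERT INTO events VALUES
    (1, NULL, 'created'),
    (2, 1,    'renamed'),
    (3, 2,    'archived');
""")

# Walk the chain backwards from the newest event to the root,
# reconstructing the history via WITH RECURSIVE.
rows = conn.execute("""
WITH RECURSIVE chain(id, prev_id, payload) AS (
    SELECT id, prev_id, payload FROM events WHERE id = 3
    UNION ALL
    SELECT e.id, e.prev_id, e.payload
    FROM events e JOIN chain c ON e.id = c.prev_id
)
SELECT payload FROM chain;
""").fetchall()
```

Branches (two events sharing the same `prev_id`) fall out of this naturally, which is what makes it a tree rather than a flat log.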
A better option is to only include the changes in the event payload, in which case you don't need to know the original state and you can simply project them all in order - and if one makes the next impossible, it's first come, first serve. Again, though, depending on your data model, generating an event payload that only includes the changed elements may also be a pain.
--- edit addition ---
For example, there is an error in a submitted (not yet committed) invoice for green and blue pens: the invoice is for 5 boxes of blue and 5 boxes of green pens, but the courier departed with 11 boxes total, because of a quick call made directly to warehousing. Accounting does not particularly care about pen colour and gets a request to modify the invoice to total 11 boxes. Accountant 1 changes the green pen count to 6; accountant 2 changes the blue. There is now a high chance of signing the invoice with 12 boxes.
I understand that the problem in the example is solvable with checks and full state changesets, but it is a problem nevertheless. That's why I am a proponent of history trees.
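The pen-invoice race can be made concrete. This is a hypothetical sketch (names and shapes are mine) of delta-only payloads projected in order: each delta is individually valid, so a first-come-first-serve rule never rejects either one, and the invoice ends up at 12 boxes instead of 11:

```python
# Invoice starts at 5 green + 5 blue = 10 boxes; the courier left with 11,
# so the invoice needs exactly one more box.
invoice = {"green": 5, "blue": 5}

# Delta-only events: each accountant independently adds one box,
# each believing theirs is the single fix that brings the total to 11.
delta_events = [
    {"field": "green", "change": +1},  # accountant 1
    {"field": "blue",  "change": +1},  # accountant 2
]

state = dict(invoice)
for e in delta_events:
    state[e["field"]] += e["change"]   # both apply cleanly; neither is "impossible"

total = sum(state.values())            # 12 boxes, one too many
```

Neither projection step fails, which is exactly why a validation check (or a history tree that exposes the concurrent edits) is needed on top.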
Indeed. For postgres it's much better to use https://www.postgresql.org/docs/current/static/logicaldecodi... to convert the WAL into a consistently ordered (commit time, with no issues w/ non-discrete or out of order clocks) stream of changes.
If possible event sequences are linear (or even a tree) then something intelligent can be done about it - maybe attempt to reconstruct dropped event, maybe hold or drop current event until the missing ones are in, depends. It's no candy, but doable. I see problems when possible event sequences form a graph. If you don't have metadata like "logically previous event" (which is not the same as "temporally previous event"), then eventually a situation will arise where you have two event sequences that are intermixed and have no way to tell them apart and then there is no way to apply projections and arrive at consistent state.
Maybe it's the paranoid me talking, though.
A lot of redundant plumbing in the code, with classes mapped to other classes. The data structure they chose does not allow inheritance, so there are a lot of classes that look exactly alike where sub-classes would be useful.
This notion of a Command that generates an Event that generates more Commands that can generate more Events asynchronously, with code for eventual consistency. Lovely concept, but database records don't exist. You can't just query a database. You have to build your views by playing back your events. I just find it completely confusing.
Frustrating as well, because certain things that are trivial in an RDBMS end up costing an incredible amount of development time in CQRS + Event Store. I must admit unfamiliarity with the architecture is definitely a factor - except that my predecessors wrote everything from scratch, down to the JSON format of the data structure. One mistake and the microservices crash, leaving the data in an invalid state. In order to fix the data, you need to replay all the events - and there are millions of them. Nothing ever gets deleted, because a 'delete' is a new event.
> including places where it's not really useful like CRUD.
Probably the most common mistake made by architects doing it for the first time.
> You can't just query a database. You have to build your views based on playing back your events.
It's pretty common to have projections that build database tables so you can do exactly that. It sounds more like the original architects violated (or poorly delineated) service boundaries, which just creates a tightly coupled set of microservices, which is worse than the monolithic architecture it seeks to replace.
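A minimal sketch of such a projection, with SQLite standing in for the read-side table (the schema and event names are illustrative, not from the article):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")

def project(event):
    """Fold a single event into the queryable read model."""
    if event["type"] == "AccountOpened":
        conn.execute("INSERT INTO accounts VALUES (?, 0)", (event["id"],))
    elif event["type"] == "MoneyDeposited":
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (event["amount"], event["id"]))

# Replay the stream once; after that, reads are plain SQL queries
# against an ordinary table, not replays.
for e in [{"type": "AccountOpened", "id": "a1"},
          {"type": "MoneyDeposited", "id": "a1", "amount": 50}]:
    project(e)

balance = conn.execute("SELECT balance FROM accounts WHERE id = 'a1'").fetchone()[0]
```

The events stay the source of truth; the table is disposable and can be rebuilt by replaying.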
> In order to fix the data, you need to replay all the events - and there are millions of them. Nothing ever gets deleted because a 'delete' is a new event.
It's also common to archive streams periodically and "declare bankruptcy" with an initial state-setting event, proceeding forward from there. Snapshotting is also a thing.
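Snapshotting, sketched (a hypothetical shape, not any particular library's API): persist the folded state plus the stream position it covers, then replay only the tail:

```python
def fold(state, event):
    # toy reducer: events are just integer deltas
    return state + event

def rebuild(events, snap=None):
    """Start from the latest snapshot instead of event zero;
    only events after the snapshot's version need replaying."""
    state = snap["state"] if snap else 0
    start = snap["version"] if snap else 0
    for event in events[start:]:
        state = fold(state, event)
    return state

def take_snapshot(events):
    """Record the folded state and how far into the stream it reaches."""
    return {"state": rebuild(events), "version": len(events)}
```

With millions of events, the replay cost drops to the tail since the last snapshot. "Declaring bankruptcy" is the same idea taken further: the snapshot becomes an initial-state event and the old stream is archived.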
You might find this helpful in dealing with some of those issues: https://leanpub.com/esversioning/read#leanpub-auto-status
Event sourcing requires really, really detailed analysis of your boundaries (and no small bit of experimentation) to get right, even when you've done it before. It should really only be applied to the areas of your application that are going to see a real benefit from it. My proxy for measuring this is "how much money does this group of features make for me?" If the answer is some version of "none" or "none, but it's a requirement to have it anyway", then I don't try to apply ES to that area of the application. I never use ES on user records anymore, for example.
I'm sidestepping your actual question, but mostly because your use-case is something I consider to be a bit of a canard. To answer it, though, as others noted: "it depends". I model projections with actors, which means all events DO come through a single process handling events sequentially (and this is a very common way to do it). But because I've got tons of these actors (one per actively running stream), there's no real bottleneck; this is also one way you guarantee ordering within a stream.
Higher up the stack, at the event store level, you'll probably have a pool of processes handling writes and routing events to the correct handlers/projections, but this is an implementation detail that can vary widely. The advice not to get bogged down in the details early on is good: the thing I see almost every developer new to ES do is get mired in the basic "how do I implement this" question, and that's really not what's going to kill your project. Not understanding your business and the software boundaries in your system nearly well enough is where most projects go wrong.
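The one-actor-per-stream idea can be sketched with a thread and a queue per stream — a toy stand-in for a real actor runtime, with names of my own invention:

```python
import queue
import threading

class StreamActor:
    """One worker per stream: its events are handled sequentially,
    which guarantees ordering within the stream. Many actors run
    side by side, so no single actor is a global bottleneck."""
    def __init__(self, handler):
        self.inbox = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is None:          # poison pill: shut down
                return
            self._handler(event)

    def stop(self):
        self.inbox.put(None)
        self._thread.join()

def dispatch(actors, event, handler):
    """Route an event to its stream's actor, spawning one on first sight."""
    stream = event["stream"]
    if stream not in actors:
        actors[stream] = StreamActor(handler)
    actors[stream].inbox.put(event)
```

The routing layer above this (the pool of writer processes) can fan events out however it likes, as long as each stream's events land in that stream's single inbox.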
I've heard this rule so often now, but never what exactly it is that you _can_ safely event source. I'm afraid it's something you can only get a feeling for by doing it a few times and making mistakes, but that way of learning is hard to justify in real projects.
>> My proxy for measuring this is "how much money does this group of features make for me?"
That said, I think you're also correct: there's a lot of trial and error in developing your instincts for this architecture. That's also true of monoliths, web MVC, or really any architectural style. Experience, as they say, is that thing you only get AFTER you needed it.
I've also used some of Ben Smith's libraries in Elixir. If you're in .Net, the Orleans framework has some really interesting ideas going on, and in Javaland, Akka and friends are what I'd start looking at.
What it depends on is how much horizontal scale you expect to have. At a small scale, you can use something like Postgres as a centralized broker of sequence. A single Postgres cluster can sequence a lot of records in a hurry, no problem.
At a larger scale you might need to decompose your domain into multiple, independently ordered aggregates. Now you have multiple choke point brokers, each of which can handle a lot of records, but which don't block each other.
Step it up another notch and you can use a consensus algorithm (Paxos or Raft) to get a cluster of machines to agree on the state of the sequence, and perform optimizations such as assigning blocks of sequence by shard and node.
I recommend not over building your event storage architecture until you measure your actual needs. There are cheap ways and huge expensive ways, and building too much is an easy way to make your project fail. Also, one of the nice things about a sequenced set of events is that it is pretty easy to replay into your new, faster event storage later once you've had the happy problem of too much success.
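At the small end, the "centralized broker of sequence" is just a database-assigned monotonic ID. Sketched here with SQLite's AUTOINCREMENT standing in for a Postgres sequence, and one table per aggregate standing in for independently ordered aggregates (all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One log per aggregate: each gets its own monotonically increasing
# sequence, so aggregates order independently and don't block each other.
for aggregate in ("orders", "shipments"):
    conn.execute(f"""CREATE TABLE {aggregate}_events (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL)""")

def append(aggregate, payload):
    """The database brokers the sequence number: concurrent writers
    can't mint the same seq within one aggregate's log. (Aggregate
    names are trusted here; don't interpolate user input into SQL.)"""
    cur = conn.execute(f"INSERT INTO {aggregate}_events (payload) VALUES (?)",
                       (payload,))
    return cur.lastrowid
```

Replaying into a bigger store later is then just reading each log in `seq` order, which is the "happy problem" migration path described above.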
SELECT DISTINCT ON (message_id) *
FROM events  -- table name illustrative; keeps the earliest row per message_id
ORDER BY message_id, created_at;