I used to be really excited about event sourcing but yeah, it's usually just over-engineering. It appeals to the nerd in me because it's a powerful & clean model - events are immutable, your entire database can theoretically be reconstructed at any point in time by just replaying an event log up to time X. In its ideal form it's sort of the highest-fidelity version of data storage, throwing nothing away, supporting all ways of querying as materialized views or dependent DBs built on the stream. It's beautiful.
But also we live in reality. DBs use checkpoints because storing/replaying an event log from time 0 would take ungodly amounts of space and time. You deleted or resized a column to save some space? Lol no, the event log lives forever. You wanted to use SQL, a battle tested language to query your data? Lol no, the database is "inside out" so tough luck buddy, you're building the database now. Sure you might have to rebuild compaction, joins, query languages, concurrency control and the other 100 things a DB gives you, but on the plus side that one audit log that you could have built with some glue and a few INSERT triggers in Postgres is now an elegant map/reduce on your 100TB dataset! Yay!
Mad props to the consultants though: my man wanted a car, you sold him a car factory. Take that money to the bank, get a Lamborghini, travel the world drinking and talking shit at conferences, fuck yeah. Fuel's getting expensive? Throw in some machine learning sauce, get a yacht, all I do is win baby.
For non-trivial amounts of data you should combine event sourcing with snapshots - i.e. a somewhat up-to-date materialized view/DB table - so you don't have to start from 0. At that point you can delete older events or move them to cold storage.
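To make that concrete, a snapshot read can be a single query; here's a Postgres-flavored sketch with invented table/column names (an account ledger where each event carries an amount delta):

    -- Rebuild current state from the latest snapshot plus only the
    -- events recorded after it, instead of replaying from time zero.
    SELECT s.balance + COALESCE(SUM(e.amount), 0) AS current_balance
    FROM account_snapshots s
    LEFT JOIN account_events e
           ON e.account_id = s.account_id
          AND e.recorded_at > s.snapshot_at
    WHERE s.account_id = 42
      AND s.snapshot_at = (SELECT MAX(snapshot_at)
                           FROM account_snapshots
                           WHERE account_id = 42)
    GROUP BY s.balance;

Events older than the oldest snapshot you care about are then safe to archive or delete.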
A valid question then is: "what do you gain over just using the table?" You gain a well-described model of your business domain, with very clear actions about what should happen on which real-world event, and an audit log. Whether that's worth it depends on your use case.
Secondary benefits include easily being able to share events with new apps or use cases (business analysts really like them), and making it less likely that things go wrong due to data being in an inconsistent state.
Totally agree. The field of software engineering would be a lot better off if we stopped using weasel words like "clean" and "elegant", and discussed concrete things like performance or the ability to produce an audit log.
ES is the C in CQRS though. If you're using it for the Q too, then it's no wonder you'll quickly get in a bad way. It's specifically not designed for that.
As I understand it, ES and CQRS are somewhat orthogonal concepts (although they are indeed often used together). It's perfectly legitimate to implement CQRS without using event sourcing at all.
The only thing CQRS dictates is that a system's read models are separate from its write models.
That's very true, but the benefits of ES come on the write side of the equation. Expecting it to solve read-side problems will inevitably lead to disappointment.
Materialized views are what you should be using, not reconstructing the state from the event log every time. That gives you access to SQL and everything to do with it.
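For example, in Postgres (a sketch, assuming a hypothetical account_events table):

    -- Project the event log into a table shape once; reads then use
    -- plain SQL against the projection instead of replaying the log.
    CREATE MATERIALIZED VIEW account_balances AS
    SELECT account_id, SUM(amount) AS balance
    FROM account_events
    GROUP BY account_id;

    SELECT balance FROM account_balances WHERE account_id = 42;

    -- Refresh on whatever cadence your staleness budget allows:
    REFRESH MATERIALIZED VIEW account_balances;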
And in reality, the event log does live forever in the real world outside of your system. Attributes of your aggregates from last year are still valid for events related to last year, even if they're now deprecated or no longer in use.
CQRS/ES is about system design that evolves. It evolves in a much cleaner and easier way if you follow it.
Is it perfect? No. There are some gnarly problems related to, for example, GDPR's "right to be forgotten", but that needs to be solved across database backups as well.
I would like to see answers to the question: why move from CRUD to event sourcing?
Because I have seen at least 5 moderate-sized projects try to integrate ES. They all failed in one sense or another: people no longer understood the code, integration times ballooned, performance suffered, and one project was cancelled outright because everyone walked away.
Was working on a client's stack, and this one critical piece using ES (written by "experts" who had since left the company) was throwing an error every week at 2 AM, triggering alerts and chaos.
After investigation I found nobody understood what it was doing or why, so the safest approach was to delete the ES implementation.
After that everything worked and everyone was happy.
I understand this thread is about dumping on ES... However, I must note that in your case it was an audit log. An ES system wouldn't work at all without access to its events.
Oh yeah, I've run into similar situations so many times. A critical piece of functionality was over-engineered and shipped; only months later did we find out that it had never worked, because nobody actually used it.
We were modeling well dynamics. And to calculate interesting properties of the well we needed sensor data, the previous state of the well, and a model of the well built by a petroleum engineer.
Unfortunately the model of the well wasn't always up to date (pump was replaced, etc..), so we needed to be able to replay all of the sensor data into the well when a model was retroactively changed. Event sourcing was such an easy pick because our events were very simple (15 fields of data) and our replay requirement was a showstopper.
it sounds pretty straightforward to implement with basic CRUD too, though. you'd just need an inserted timestamp field so that you can rerun the analysis from any point in time.
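a sketch of what that could look like (postgres-ish, names invented):

    -- append-only readings with an insertion timestamp
    CREATE TABLE sensor_readings (
        id          bigserial   PRIMARY KEY,
        well_id     bigint      NOT NULL,
        payload     jsonb       NOT NULL,
        inserted_at timestamptz NOT NULL DEFAULT now()
    );

    -- rerun the analysis over everything since the model changed:
    SELECT payload
    FROM sensor_readings
    WHERE well_id = 7
      AND inserted_at >= '2021-06-01'
    ORDER BY inserted_at, id;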
I think GP is mixing up Stream Processing with Event Sourcing [1]. Event Sourcing is usually used in complex business domains, Stream Processing in simple domains but with large amounts of data (e.g. IIoT).
if you use a timeseries database like kdb+/q, TimescaleDB, InfluxDB, QuestDB, etc., you don't even need to order by the insertion timestamp.
Unlike RDBMS/SQL where rows/tuples are unordered, in timeseries databases they're usually ordered by the insertion time, more similar to dataframes than to SQL tables.
--
[1] Martin Kleppmann — Event Sourcing and Stream Processing at Scale
That's bullshit. I've deployed a large CQRS/ES system that processes hundreds of thousands of business events each day.
The entities/aggregates have clearly defined state machines and the commands/events that change their state are clearly documented and simple.
The system is easy to understand, the business people understand it because we used their words to describe it.
We were able to model it and workshop it with them by literally having them each take on the role of one of the aggregates and pass paper "commands" and "events" between each other.
CQRS/ES is actually the way the world works. If you can't imagine a bunch of bureaucrats from Brazil, all sitting in a large room, passing files and paper forms between each other, then you're not understanding the actual work.
Two examples of systems that use a variant of event sourcing:
- relational database systems (their journals and async replication in particular)
- git
Perhaps not so shit. Perhaps a bit practical. It's just a bad fit for some problems and architectologists like to say it's good for everything for obvious reasons.
And it should be pretty obvious that architecture doesn't matter at all if it's implemented poorly.
Here's the key: all the systems in which event sourcing works have clear, immutable definitions of what an event does/means.
e.g. Git ALWAYS applies a commit in a consistent way (and where it doesn't, it's a disaster)
e.g. If you modify the transaction replay code from one version of a DB to the next, it creates a mess (which is why many databases won't replay transactions from previous versions, or even boot at all; see Postgres).
The problem with event sourcing is that the same bad practices that came from the CRUD system are recreated using events, which makes the system inherently worse because now the state of your system is unstable, instead of just the manner in which you got to the current state.
Imagine if every time you rebooted your PostgreSQL cluster it replayed Postgres 7 transaction logs, even though you're running Postgres 13, and generated an objectively different state; except now you're using CI, and every commit from every developer will boot your system into a different state. Also, your database now takes 2 months to start as every transaction gets replayed.
Take rails db migrations for example, the best practice is, every 6 months or so, to just dump the schema from production, create one large initial "migration" and add migrations from there, because eventually they become desynced, and you can't cleanly replay your migrations into a schema that mirrors prod.
If you are extremely careful, and follow a bunch of best practices you can, but in the general case it costs less to just dump schemas.
Git doesn't really store file changes as events, rather, it's a long chain of state snapshots, in something like a persistent data structure. Sure, the history is all there, but so is the current state in its most efficient form.
DB transaction logs might be considered event sourcing by some definition, but their use is very different. It's purely a technical trick. As a consequence, logs are truncated as often as possible/reasonable, and you never rerun the transaction log from time zero. Very different from the event sourcing idea to keep events as long as possible.
Event sourcing with snapshotting and deleting old events - like DB transaction logs - is still event sourcing. In fact I'd say it's the only way ES makes practical sense.
I believe ES allows taking snapshots (they call them projections, perhaps?) and removing all logs before the snapshot, which I think is kinda what dbs do?
ES is practical if your use case supports it. I think you only need to look at Citibank after the 2008 banking crisis to see the successes they've had as a company. When their developers were touring the No Fluff Just Stuff events in 2015-2016, showing their work and how much they were giving back to the projects they were using, it really put a stick in your wheel.
Financial systems can really take advantage of it. Your post is hyperbole.
The question is whether it’s not practical because it hasn’t received the three decades of engineering that relational databases have, or because it’s broken in theory.
I think, for one given moment in time, Event Sourcing is the hottest thing you can have.
Evolution is where the trouble's brewing.
You add a column to a table with maybe a default value, all is good.
You add a property to an event's payload, you suddenly have to deal with versioning.
Event Sourcing with its current ecosystem and frameworks mainly adds technical complexity with very little added clarity about the business.
Modern relational databases are marvels of engineering. But I don't think they would have gotten off the ground if they had needed those three decades to be practical at all.
Prevent lost writes from when two people read-modify-write at the same time. Be able to answer questions like "how did this end up like this" or "what happened to the change I made on xx/yy". Easier sharding, better performance, and avoiding deadlocks, because you're not worrying about database-level transactions any more. Much easier data migrations that you can do in a gradual, rollbackable way. Clear data provenance that lets you know how to resolve inconsistencies. Easier backups. Smoother relationship between live data and analytics/reporting.
Could you elaborate on how you solve transactions, double writes, deadlocks?
I mean the consumers do still process messages in parallel right? Or do you force a serial pipeline to avoid handling messages at the same time?
I thought that ES is more about creating entirely separate entities/codebases which have zero dependencies.
> I mean the consumers do still process messages in parallel right? Or do you force a serial pipeline to avoid handling messages at the same time?
Generally I follow the Kafka approach of, essentially, shards that are each processed serially. So unrelated messages may be processed out of order, but related messages will be processed serially.
If you don't have a good enough key, or you need to join two streams, then you have to use CRDTs or equivalent. This can be hard, and you will have bugs, but since you retain the original events, you can always fix things up and recover the correct data, whereas when you get an SQL datastore into an inconsistent state you're generally SOL.
I worked on a system years ago before event sourcing was even a term. We built the system on top of a relational database but stored every change and had a generic schema. We did this because we specifically had requirements where we needed to be able to quickly generate diffs between states of the system (differences in time and differences in version). So it was a natural solution to this problem and the benefits for us outweighed the costs. Of course we only used this approach for the part of our system that had these requirements.
I have never implemented ES, although it seems to be useful to have the most simple version of it in place: just a queryable log of versioned events and the ability to replay it.
But if you have the luxury to do consistent CRUD with a relational/graph, ACID database, then a temporal data model might give you much more leverage.
Essentially you think of your records as bookkeeping entries and go from there. You retain the ability to do ad-hoc relational queries, while not having to build core database functionality yourself.
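A minimal sketch of that temporal idea (Postgres-ish, names invented): rows are never updated in place; an "update" closes the current row's validity interval and appends a successor.

    CREATE TABLE customer_versions (
        customer_id bigint      NOT NULL,
        name        text        NOT NULL,
        valid_from  timestamptz NOT NULL DEFAULT now(),
        valid_to    timestamptz            -- NULL = current version
    );

    -- "update": close the current row, then insert the new version
    UPDATE customer_versions
       SET valid_to = now()
     WHERE customer_id = 42 AND valid_to IS NULL;

    INSERT INTO customer_versions (customer_id, name)
    VALUES (42, 'New Name');

    -- ad-hoc relational queries still work, including "as of" queries:
    SELECT name FROM customer_versions
    WHERE customer_id = 42
      AND valid_from <= '2020-12-31'
      AND (valid_to > '2020-12-31' OR valid_to IS NULL);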
not sure about event sourcing, but replicating your state immutably on write can save your bacon if there is a bug. I try to skip the D and make the U fork in CRUD for systems where the data is very valuable.
classic event sourcing requires replaying your events in order to calculate state. I think that's a significant source of complexity. Just do normal state computation but write it into a new timestamped row.
This is the only practical book on event sourcing and CQRS I have ever seen.
Despite hearing about it all the time on HN, I had never seen an actual implementation of it. This book uses JS, so it's easy to understand for any software engineer.
I'm somewhat skeptical of Wix's engineering principles in general. Back in 2015 they released an article[0] about using MySQL as a better NoSQL database and talked about how joins are bad for latency, and then their example of a "fast query" used a subquery which was equivalent to joining the two tables. Did they not realise that the queries were equivalent?
I'm skeptical of Wix's engineering principles because I've used their product and it's hot garbage, or at least it was 10 years ago. (I'd forgotten Wix existed.)
My first experience with event sourcing was having a new hire at a previous company try to shill it. It was a smaller company and he had previously built a career consulting on ES. Every problem we had he would come up with an argument for why ES would solve it. I soon learned he had zero technical ability, but was great at sounding credible to management. I liked to imagine him as a "seagull ES" practitioner. He flies in, sh*ts ES on everything, and flies out without having to deal with the mess.
Ironically, I went on to use a lot of ES principles in building a distributed deduplication engine for my previous startup after I left that company. We used a WAL which we were able to rebuild databases/projections from, ship to replica nodes for read replicas, push to S3 for resilient storage, etc. It was an extremely powerful architecture for us. It was also complex and required a special skill set to work on and reason about.
I have to imagine anyone that advocates for "moving from CRUD to Event Sourcing" has recently been sucked into drinking the kool-aid, and has no idea about the world of hurt they've signed up for, completely unnecessarily.
Similar thing happened on a project I joined, except this person not only forced ES onto the business but also their own specific library for it: https://github.com/johnbywater/eventsourcing
The business eventually failed due to this: slow implementation of simple features and many other issues.
I will never use ES because of this project. It's pointless; anything you can do with it, you can do without it.
For the benefit for readers, of course I didn't force event sourcing on anybody.
The above allegation is utter bollocks, which apparently continues a stalking campaign by an ex-colleague who was forced to leave the company by the management.
This also happened FIVE YEARS ago, so the obsessiveness is disturbing.
I had this team of consultants whose sole purpose was to introduce an ES system; it was my predecessor who had decided this. The consultants were "experts" on building this kind of system. Very long story short, it failed miserably: a completely broken system which was hard to test, failed pretty much all use cases, and completely blew the budget.
According to the team there were many reasons why the project failed; one was that the requirements were complex. This also manifested in claims that implementing some of these requirements involved a "close to infinite" number of combinations. I never understood that, but it was one of the explanations. We shut down the project and built a "traditional" solution instead.
My wife was diagnosed with PCOS (poly-cystic ovary syndrome) and we learned that it reduced her chances to get pregnant.
A few weeks later we were sitting across from this famous reproductive endocrinologist who was explaining our options:
- Yeah, going to a fertility clinic is of course an option. However, you should perhaps first consider a more traditional approach.
- What is the traditional approach, some medication?
- People seem to have forgotten how to have sex these days.
However embarrassed we were at the time (BTW he then laughed and saved us from the awkwardness), trying the obvious, easy solution first and taking care of problems as they come became a conscious habit afterwards. I love over-engineering, but asking myself and the other stakeholders "what are we actually trying to accomplish here" is the most engineer-like thing I do nowadays.
I'm not suggesting just winging it all the time, but looking for the easiest solution first, not last. That's all.
I feel bad for anyone who got sucked into event sourcing for general purpose applications. Like... I'm burning a candle for you right now.
The event stream stuff makes sense in some specialized use cases (LMAX?) but for general purpose stuff? Oof, no.
EDIT: Also, don't hate me folks, but any article that references Fowler should be looked at with just a little bit of suspicion. Not hating the guy but he's blown quite a bit of hot air in his time.
Probably one of the largest technical blunders of my career (or at least the largest so far!) was pulling a team down into ES hell on a greenfield project.
Turns out, growing an event sourced application is fantastically hard[0]. Ours eventually became a twisted opaque mess which was terrifying to modify due to (a) the sheer volume of code it took to do so, and (b) eventually losing control over who read which events and where as the team grew.
I'm working with a team that's in the process of decommissioning a previous team's attempt at ES. Even the new team wanted to do ES, whilst staring at the remains of the previous attempt. Luckily they didn't.
Usually arguments go "we need an audit log so ES is a natural fit and we'll be able to iterate quicker because we can just re-project when we get things wrong, oh and testing is really straightforward too because we can just...".
You can have an audit log without ES. You're also going to hate versioning events. And navigating your tangled nest of events that you accidentally accumulated in the name of agile isn't the nirvana you were hoping for.
I believe it's better to have a system that responds to the requirements of here and now, leaving history behind, learning from mistakes going forward and accepting that yesterday is done.
In my experience, in growing software systems you're more likely never to have captured the data/event at all than to have misjudged what you already had. So event logs feel more like a bunch of baggage ready to trip teams up than a wealth of value.
Event sourcing could be a great architecture for general purpose applications - if we had good frameworks and specialized event stores built for this purpose.
Some features that would be needed:
* Automatic, schema-based event and aggregate versioning
* A combined event and aggregate data store with ACID guarantees, schema validation and indexes
* Event migrations (so you don't have to carry around old event versions forever)
* Event stream migrations that allow re-writing streams for purging data, GDPR cleanup, etc.
* Copy-on-write based stream clones that can be used for testing with production data
As it stands, such a solution is nowhere to be seen. So you have to cobble together a working system with lots of custom work, which often ends in a complete mess.
If you declare the business process you want to support - either in code or in BPMN, you're automatically mapping out the events that are relevant to the business.
Having a system that takes care of executing this, and tying the pieces of this workflow together is one example where ES can shine IMHO.
I second this wholeheartedly because this happened to me.
I'm the kind of person who learns by my making my own mistakes (as opposed to learning from others', which is what smart people do)—and boy, did I learn a lot from the mistake of trying to build a general purpose application with event sourcing.
Theoretically, event sourcing is the way all applications should be built. I was first taken by this methodology by Martin Kleppmann's talk "Turning the Database Inside Out" (2014)[0]. Looking back, I think I had always subconsciously felt like there was something wrong with mutating state in place. You lose information. You lose accountability. After all, "accountants don't use erasers," and Martin's talk validated many of my obsessive-compulsive insecurities about data. As Charity Majors points out on Twitter [1]:
> ...while storing raw events may be "expensive" compared to any single optimization, all optimizations can be derived from raw events. The reverse is impossible.
And that's true. The table-stream duality [2][3] means that if you store events, you can later project those events into the exact right table for any use case. But if you transform those events into a projection and store only that projection, then whatever table you store is all the data you have. You wind up with just an aggregate.
Unfortunately, the technical costs you incur when you attempt to store events instead of tabular data are tremendous. The problems and questions that arise are basically endless. Tons of things you wouldn't even think about with a relational database become complicated design decisions [4]. If there were tooling and frameworks that made this easy, it would be one thing. But, there are not!
Kafka Streams is very cool. If you already have a running business with well-developed models and a large team, event sourcing with Kafka Streams could possibly go quite well.
But if you need to get an MVP off the ground, it's just not what you need. What you need is PostgreSQL, perhaps with something like Hasura or postgraphile in front of it. What you need is a lightning-fast way to iterate and flesh out your idea. If that's you, and you're reading this, and you're smart: learn from me—event sourcing: don't do it!
I believe ES has its uses in discrete systems doing their thing for you.
For me, ES makes perfect sense in systems handling a workflow for you. Products like Zeebe[1] and Uber's Temporal[2] come to mind.
These systems basically become your business transaction log, and as such ES is just a great fit.
(I have no idea whether ES is used in temporal though… never looked at the code)
hi! I work at Temporal. ftr our founders left Uber 2 years ago and Temporal is an independent startup (that is very much hiring) now.
Short answer is yes, we do, but it's abstracted away so you get the benefits (complete observability, retry/resume from failure) without the downsides (handwriting event sourcing logic and storage).
Event sourcing is an important implementation detail for us, but it is not something we push onto our users; still, they benefit from it anyway. Very much agree with your statement. Happy to take follow-up questions :)
> learning from others', which is what smart people do
my grandfather had the same words :)
> What you need is PostgreSQL, perhaps with something like Hasura or postgraphile in front of it. What you need is a lightning-fast way to iterate and flesh out your idea.
optimizing my team's incoming stream stack atm, event sourcing seems so enticing on paper...
> any article that references Fowler should be looked at with just a little bit of suspicion
Indeed. I don't doubt he's a very clever guy, but all of my most miserable gigs involved phalanxes of architecture astronauts endlessly bike-shedding every design session with the platitudes of Fowler et al.
> any article that references Fowler should be looked at with just a little bit of suspicion. Not hating the guy but he's blown quite a bit of hot air in his time.
I've just started reading some Fowler stuff. What hot air are you referring to? I would like to keep that in mind.
People over-apply his patterns. Some patterns that are useful in niche edge cases suddenly become a ‘best practice standard’, sprayed anywhere and everywhere. My rule of thumb is that you should only use his patterns if you could afford to hire him to code it for you.
One of those books has a rule buried in the middle somewhere that I wish was in 60-point text on the first page of chapter one. I've taken to calling it Rule Zero: don't implement a pattern unless it actually solves a problem.
Not OP, but the number of variants in the events makes the cardinality of the state of the system extremely large. Probably too large to be able to reason about fully. It makes weird bugs in the event schema likely impossible to debug should they occur.
This is exactly why 2-way data binding fell out of favor in modern UI development. Describing the problem as "the cardinality of the state of the system" becoming extremely large is a succinct way of putting it. The non-linear increase in possible states quickly makes testing an uphill battle. And once you can't have a comprehensive testing strategy, forget about it.
Eh, two-way binding is still alive and well it’s just implemented explicitly on top of one-way bindings. It seems most developers find this easier to reason about but the relationship is still there.
Like if all you’re doing is using the event to take the value from the UI and shove it into the model with setState and re-render then you made a two-way binding.
Wouldn’t it be helpful if you could tell React to just do this for you and save you the boilerplate? Boom you invented ng-model.
Can this be mitigated by ensuring that historical/legacy events are all transformed to more modern equivalents? Similar in theory to https://stripe.com/blog/api-versioning ? Then you at least have a chain of testable transformations and a reduced cardinality for the actual interactions that reflects the cardinality of all the ways your business users actually use your product today.
Of course, this can be lossy, but is it any more lossy than the non-event-sourced CRUD system which is essentially transforming historical events into "overwrite these fields" events that can only be used for their original purpose?
To me, the reason not to use event sourcing is more that the tooling and developer experience is just really poor - and so decisions like the above, while valuable to think through directly, rapidly become overwhelming. Visualizations of system state are only as good as the code you bring yourself. I'm hopeful that people building tools on top of things like https://materialize.com will start to change this though!
The first rule of data modelling is that you should model only what you care about. A good model allows you to store and retrieve what you need to meet your requirements.
Maybe you care about all the events that have ever happened on your incredible journey to reach a list of the customers you have today.
Or maybe it would be sufficient to have been mutating a customers table all along. If that is sufficient, then it would be faster and more straightforward to have done so.
This is why I always like to have a `status_history` or `audit` table. You don't have to have event sourcing to still want a trail of who changed what in the system when.
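A minimal version of such a table (sketch, names invented):

    CREATE TABLE order_status_history (
        order_id   bigint      NOT NULL,
        old_status text,
        new_status text        NOT NULL,
        changed_by text        NOT NULL,
        changed_at timestamptz NOT NULL DEFAULT now()
    );

    -- one row per transition, written alongside the UPDATE itself:
    INSERT INTO order_status_history (order_id, old_status, new_status, changed_by)
    VALUES (42, 'pending', 'shipped', 'alice');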
Event sourced systems are actually great, as long as they aren’t branded as such. Most of the time it’s totally stupid and the state of the system isn’t reproducible anyway unless the event code is never modified.
It is striking that this thread attracts so much negative commentary, while a similar thread 49 days ago is the exact opposite. It may be interesting to have a look and compare:
Well, this one is about moving from CRUD, the most mainstream paradigm in software development, to what is a solution to a very niche problem, while the other is just about the solution.
The only reason this article isn't pushing people into an over-engineered and disastrous situation is that it doesn't say everybody should follow it. But it's a really bad article for everybody to follow, and people are reacting as if it were being recommended to them.
I've done a few tech DDs on companies who have very CRUD-like operations and oddly chose CQRS. Mind you, these are software companies doing about $3~$10M in revenue after 10 or so years (i.e. not rapid growth, but still valuable).
Honestly, my big question is...why? I can find thousands (millions?) of CRUD and MVC developers in pretty much any major language, but how many developers can I find to write CQRS based code?
The answer is in your question. Because event sourcing is a niche skill, devs see it like playing on a higher level of a computer game.
But where it’s really needed, nobody would choose a novice. So novices learn to do event sourcing on systems that don’t need it, and proceed to make a huge mess. Most companies don’t have any kind of architecture review to prevent this happening - they have managers who naively trust the developers.
Event sourcing (and CQRS in general) is often compared (as comments here are doing) to "simple" CRUD, with the question of why add the complexity.
The difference is that event sourcing, aggregates and CQRS force you to actually identify the entities and the external and internal events that cause them to change.
It forces you to do the analysis that you should do anyway, but is often lost or forgotten when it devolves to "simple" CRUD. You end up adding change logging, histories, materialized view updating etc etc, all of which come "for free" if you do the work to establish what you're trying to build.
WALs are not the same as a business level entity/event stream, WALs are event streams for the database, not the business logic you're using the database for.
Yes, it is more complex, but, in combination with techniques like domain driven design and creating ubiquitous languages, your business logic will end up actually being simpler and closer to reality.
Derek Comartin does a great job explaining event sourcing on his site. I really like him for being able to easily explain fairly technical ideas like this.
However, like others have said, this should be a tool reserved strategically for ideal situations, not a jackhammer for every task.
As a proponent of Radical Simplicity, I'm always sceptical about event sourcing.
There are some benefits (I found it useful to replay events, for example), but where I have seen it at work it makes things much more complex; understanding the system and developing new features takes much longer.
This already starts with a command framework instead of method calls.
A competent dev experienced in event sourcing can build a system that leverages this without taking on so much cost. But… they’re expensive and hard to find. So most experiences of event sourcing look exactly like you describe.
Correct, and many applications are using WAL via CDC (Change Data Capture) to gain some of the benefits of the CQRS/ES.
The problem is that CRUD only records "what" change/mutation was done, but not "why", by "whom", and "when" it was done. With tailing WAL / CDC you can also capture the "when".
This can be partially mitigated by adding audit fields like the following (a rough DDL sketch follows the list):
Why (reason for change):
- created_bc ("because") / created_reason (a person opened an a/c, a new baby was born, or a record was migrated from another datastore)
- updated_bc ("because") / updated_reason (fixed a typo in the address, or moved to a new address)
- deleted_bc ("because") / deleted_reason (closed an a/c, an end-user died, GDPR request, soft-deleted, removed by moderator, etc.)
By whom (i.e. user_id, support person, decision maker, etc.):
- created_by
- updated_by
- deleted_by
When (event time) - with CDC you can capture all the intermediate updated_at events:
- created_at
- updated_at
- deleted_at
This captures much more information than typical CRUD, but it only saves the last updated_* values, so it still loses a lot of information.
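As a rough DDL sketch of those columns on an ordinary CRUD table (names invented):

    CREATE TABLE accounts (
        id             bigserial   PRIMARY KEY,
        -- ... domain columns ...
        created_at     timestamptz NOT NULL DEFAULT now(),
        created_by     text        NOT NULL,
        created_reason text,
        updated_at     timestamptz,
        updated_by     text,
        updated_reason text,
        deleted_at     timestamptz,   -- soft delete
        deleted_by     text,
        deleted_reason text
    );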
If you use log-based CDC, you typically don't need those timestamps, the WAL will contain and expose that information. It's a different story for more semantical properties like a "user" or a "use case" associated to some change. One way for capturing these intents is to log that information transaction-scoped in a separate table and then use downstream stream-processing to enrich actual change events (which contain the TX id themselves) with that metadata. I've blogged about one way for implementing this using Debezium and Kafka Streams a while ago: https://debezium.io/blog/2019/10/01/audit-logs-with-change-d....
Yup, a few fields get you a long way, though I would recommend an audit log updated by triggers rather than audit fields.
For example, on my CRUD Rails app we use audit triggers in combination with setting a Postgres local config variable with the username; the audit triggers pull the user info from the variable and record it into the DB. With a little more work I could probably also pull a backtrace of what line of code caused the trigger. Best part is, because it's all in the DB, if we roll back a transaction the audit entries don't commit either.
Because this stuff gets set in the application controller as an around filter, it's all completely transparent to the devs. They just make ActiveRecord calls and the database records the queries, user, etc. that made the change, as well as storing all the changed fields into the audit log. Even if it's one of those update_all calls.
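For readers who haven't seen this pattern: a rough sketch of the Postgres side (table/function names invented; the Rails around filter just issues the SET LOCAL):

    CREATE TABLE audit_log (
        id         bigserial   PRIMARY KEY,
        table_name text        NOT NULL,
        changed_by text,
        row_data   jsonb       NOT NULL,
        changed_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE FUNCTION audit_row() RETURNS trigger AS $$
    BEGIN
        INSERT INTO audit_log (table_name, changed_by, row_data)
        VALUES (TG_TABLE_NAME,
                current_setting('app.current_user', true),  -- set per request
                to_jsonb(NEW));
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_audit
    AFTER INSERT OR UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION audit_row();

    -- in the app, inside the transaction (via the around filter):
    --   SET LOCAL app.current_user = 'alice';

Because SET LOCAL is scoped to the transaction, a rollback discards the audit rows along with the change, exactly as described above.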
I would recommend renaming "Event Sourcing" to "Evil Sourcing". You guys don't even know how many companies went bankrupt just because they tried to introduce this evil technology that makes your app 100x more complex.
This article doesn’t really explain event sourcing and some of the trade offs with event driven architecture in general (additional complexity).
I’ve heard Git is an example of event sourcing. It has all the info for you to be able to determine the current state from scratch if you needed to. I’m not sure why you’d see event sourcing as the next stage of evolution for a CRUD system.
I started my career before the industry consolidated around CRUD, and even before RDBMS/SQL became the go-to tool. I only had SQL on large machines (VAX-11), not on PCs.
The industry moves chaotically and sometimes gets stuck in a local minimum for a decade. Innovation follows a diffusion model: it slowly builds momentum until the tipping point is reached.
So I do agree, that CQRS/ES might be the next informal industry standard, but we do need tooling, best practices, use cases and success stories.
Maybe we need a special programming language, a DSL, or a framework for it, or a special DBMS.
Another point: it's not that easy to build a correct, scalable and maintainable moderate-complexity CRUD app. Anything beyond a simple blog engine or TODO-list tutorial can quickly become a mess. The tutorials omit error handling, and that's where the juniors are learning from.
Last anecdotal datapoint: when building a moderate-complexity CRUD app and trying to handle all the edge cases/error handling/correctness, I realized that I was building a poor man's CQRS, so I might as well do the real thing.
I could see this being really powerful _for certain parts of a product or system_ but not the whole thing. Seems to be the overall understanding I get reading some of these horror stories.
It seems the benefits of event sourcing can be had from using an immutable database: auditing, backing up state while the DB is running, previous states and diffs.
The loop will close as we once again realise that even though the domain is the king of software, it is too different a beast from what the enlightened among us (not me) call "infrastructure", the serf that actually grows the crops. Now of course, you can make the serf bend over backwards to serve the king, and use forks to work the land, because that's how kings roll; but is it really the best way to work the land?
If you want proper engineering, you should do proper engineering, instead of being a business analyst documenting the domain in code. But that is, like, only my opinion, man.