Martin Kleppmann has some great blog posts about this as well.
On audit logs in particular, we have a post on our blog which discusses how to use CDC for that, focusing on enriching change events via stream processing with additional metadata, such as the business user performing a given change.
One advantage of log-based CDC over trigger-based approaches is that it doesn't impact transaction latency: it runs fully asynchronously from the writing transactions, reading changes from the database's WAL.
Disclaimer: I work on Debezium
I've been thinking about your "transaction" pattern more, and I was wondering: why don't you stream the transaction metadata directly to a Kafka "transaction_context_data" topic? Would writing some of the data to the DB while writing some of the data directly to Kafka make it less consistent during faults?
The reason I ask: I'm curious what it would look like to use this pattern for an entire application. I think it could be a very powerful way to do "event sourcing lite" while still working with traditional ORMs. Would writing an additional transaction_metadata row for most of the application's inserts/updates slow things down? Too many writes to that table?
Yes, this kind of "dual write" is prone to inconsistencies. If either the DB transaction rolls back or the Kafka write fails, you'll end up with inconsistent data. This is discussed in a larger context in the outbox pattern post. This sort of issue is avoided by writing the metadata to a separate table as part of the DB transaction, which will be committed or rolled back as one atomic unit.
> Would writing an additional transaction_metadata row for most of the application insert/updates slow things down?
You'd just do one insert into the metadata table per transaction. As change events themselves contain the transaction id, and that metadata table is keyed by transaction id, you can correlate the events downstream. So the overhead depends on how many operations your transactions contain already. Assuming you don't do just a single insert or update, but rather some select(s) and then some writes, the additional insert into the transaction metadata table typically shouldn't make a substantial difference.
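For illustration, a minimal version of that metadata table and the per-transaction insert could look like this (table and column names are made up; I'm assuming the id you store lines up with the txId Debezium reports in the source block of its change events):

create table transaction_metadata (
  tx_id       bigint primary key,          -- used to correlate with the txId on change events
  app_user    text,
  reason      text,
  recorded_at timestamptz not null default now()
);

begin;
-- one extra insert per transaction, then the actual business writes
insert into transaction_metadata (tx_id, app_user, reason)
values (txid_current(), 'jdoe', 'address correction');
update customers set street = '1 Main St' where id = 42;
commit;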
Another option, exclusive to Postgres, would be to write an event for the transaction metadata solely to the WAL using pg_logical_emit_message(), i.e. it won't be materialized in any table. It can still be picked up via logical decoding; we still need to add support for that record type to Debezium though (contributions welcome :).
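For reference, such a call could look like this (the prefix and payload here are made up):

select pg_logical_emit_message(
  true,           -- transactional: only emitted to the WAL if the surrounding tx commits
  'tx-metadata',  -- prefix that logical decoding consumers can filter on
  '{"app_user": "jdoe", "reason": "address correction"}'
);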
Except you already have Postgres, why add another thing to it?
If you have some user attribute table, there are often a couple of downstream consumers (teams!) that also want this data. So you hook the table up to Kafka with Debezium and write all changes to a Kafka log.
A team of data scientists can stream these straight into their model, a team of data engineers can dump the changes into an analytics store in real time for business intelligence, or another team that needs access to this data can create their own denormalized table of user attributes.
Once the producing team gets their data into Kafka, basically any team can consume it, which is a lot easier for the producing team to maintain and doesn't stress the database, so it's a very nice pattern for bigger orgs.
For the most part, keeping the data in a single postgres instance as long as possible would be my recommended solution. But there is a point where getting the data to consumers of that data becomes tedious enough that this additional infrastructure is worth it.
- Create a parent table like items with a valid time range column as a tstzrange type. This table won't store data.
- Create two child tables using Postgres inheritance, item_past and item_current that will store data.
- Use check constraints to enforce that all active rows are in the current table (by checking that the upper bound of the tstzrange is infinite). Postgres can use check constraints as part of query planning to prune to either the past or current table.
- Use triggers to copy from the current table into the past table on change and set the time range appropriately.
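Roughly, the setup can be sketched like this (column names are only illustrative, and a similar trigger would be needed for deletes):

-- parent: defines the shared schema, holds no rows itself
create table items (
  id         bigint not null,
  data       jsonb,
  valid_time tstzrange not null
);

-- current rows: upper bound of the range must be infinite
create table item_current (
  check (upper(valid_time) = 'infinity')
) inherits (items);

-- past rows: upper bound must be finite
create table item_past (
  check (upper(valid_time) < 'infinity')
) inherits (items);

-- on update, archive the old version and restart the current row's validity
create or replace function item_archive() returns trigger as $$
begin
  insert into item_past
    select old.id, old.data, tstzrange(lower(old.valid_time), now());
  new.valid_time := tstzrange(now(), 'infinity');
  return new;
end;
$$ language plpgsql;

create trigger item_current_archive
  before update on item_current
  for each row execute function item_archive();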
The benefits of this kind of uni-temporal table over audit tables are:
- The schema for the current and past tables is the same and will remain the same, since DDL updates on the parent table propagate to the children. I view this as the most substantial benefit, since it avoids the information loss you get when dumping old rows into hstore or jsonb.
- You can query across all versions of data by querying the parent item table instead of item_current or item_past.
The downsides of temporal tables:
- Foreign keys are much harder on the past table since a row might overlap with multiple rows on the foreign table with different times. I limit my use of foreign keys to only the current table.
edit: it contains many database patterns that are useful even if you aren't building time-oriented applications. It will make you a better developer.
edit 2: basically, once you know some things from this book, every time you encounter a new database engine or programming language, the first thing you will want to look at is how it handles dates & times - suddenly this kind of field becomes way more interesting. So the book might change how you view dates/times permanently; just be careful if you tend to over-engineer or are a perfectionist, because it makes it easy to do so. Apply the patterns when it makes pragmatic sense.
Managing Time in Relational Databases: How to Design, Update and Query Temporal Data
> be careful if you tend to over-engineer or are perfectionist
One of the downsides of these books is that I see bitemporal data everywhere now, but it's quite difficult to write bitemporal queries in Postgres.
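Even a simple "as of" lookup over a hypothetical table with both a valid-time and a system-time range gets wordy, and it only gets worse once you join several such tables:

select *
from policy_versions
where valid_time  @> timestamptz '2023-01-01'   -- what was true on that date
  and system_time @> timestamptz '2023-06-01';  -- as the database knew it on that date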
We use open source databases (both MySQL and Postgres), but I keep missing MSSQL features and have to replace them with ad-hoc stuff that often does not work on AWS Aurora as-is.
I find this lets me punt on a lot of fine-grained permissions for different roles within an organisation. With the audit log, it becomes reasonable to say “just ask them not to do that. If they do, you’ll easily be able to see what happened and we can reverse it, then tell them to knock it off”
I’ve also gotten pretty wild using it to reconstruct state in migrations. Like adding a “created_at” field to a table, then mining the audit log for creation events to set the value on existing rows. Or changing a table to a soft delete instead of an actual delete, then repopulating past entries from the audit log.
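Assuming an audit table shaped roughly like audit_log(table_name, row_id, op, changed_at, ...), that kind of backfill can be a single statement (the documents table is just an example):

alter table documents add column created_at timestamptz;

update documents d
set created_at = a.changed_at
from audit_log a
where a.table_name = 'documents'
  and a.row_id = d.id
  and a.op = 'INSERT';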
It does _feel_ like the kind of thing that's likely to build an unreasonable amount of cruft around the place, but it just hasn't happened over the last few years. The way I run it, all the audit logs are actually in a single table. Querying that table isn't exactly quick, but it doesn't need to be. It all just trucks along.
Thank you for sharing, will play around with putting it all in one table.
If you have any "hot" tables, it is worthwhile to partition those audit tables; again, automate those parts.
I've only had a single case where we had to switch it off (the auditing) because we started getting deadlocks (main table was read & write heavy + our reporting engine was fed off of it at the time).
So for most cases it works perfectly well. Just keep an eye on your hot tables, memory consumption and how fast your main table grows, as the audit tables can sometimes take up more disk space than your main table (had a 20 GB table with a 50 GB audit table...).
If you are on Postgres, pg_partman + pg_cron are your friends. You can also automate it from your application side (which is what I mostly do; it makes permission issues easier to manage and I have strong typing in my language - I run this code automatically after deployments/migrations are done - make sure it is idempotent so you don't accidentally wipe your audit tables).
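A rough sketch of the extension-based route (pg_partman 5.x style; argument names and maintenance scheduling differ a bit between versions):

-- audit table declared as natively partitioned (schema abbreviated)
create table users_audit (
  id         bigint,
  op         text,
  changed_at timestamptz not null,
  row_data   jsonb
) partition by range (changed_at);

-- let pg_partman manage monthly partitions
select partman.create_parent(
  p_parent_table => 'public.users_audit',
  p_control      => 'changed_at',
  p_interval     => '1 month'
);

-- and let pg_cron run the partition maintenance in the background
select cron.schedule('5 * * * *', $$call partman.run_maintenance_proc()$$);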
I'm sure plenty of people here will have thoughts on this point.
I appreciated the clear write-up of the symptoms of the problem. Causes of the problem not in the write-up: failing to shard on the ID of the paying customer, a highly questionable choice of scripting language to run DDL, lack of testing, allowing non-DBAs to run DDL, and incompetent management.
On the bright side, the Node hackers can keep patting themselves on the back for recovering from their self-inflicted wounds.
I now use it with every project and it has saved me many times.
I see this same "shadow table"+trigger pattern with other databases too.
Surely the number of DDL statements would be trivial compared to the DML ones?
with affected_rows as (
  insert/update/delete into/from ...   -- your actual DML statement here
  returning *
)
insert into my_audit_table
select @op, current_timestamp, @userinfo, * from affected_rows;
The nice thing there was that the entire data subsystem was designed to be lossless through the lifetime of the system under normal use. So as you changed the data model, no data would actually be removed. For the target use-cases (small-scale departmental systems) this is extremely powerful.
Kind of miss working there, I still have a huge list of really cool awesome things that you could do which would pull our industry forward kicking and screaming.
In my example, say a user edits their document and saves it (to a row in the DB). I want to be able to tell what their previous row was, what the new one is, and basically git diff them.
, HSTORE(oldData.*) - HSTORE(newData.*)
, HSTORE(newData.*) - HSTORE(oldData.*)
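Pieced together, a trigger that stores those diffs could look roughly like this (requires the hstore extension; the documents table, the document_audit table and its columns are made up):

create extension if not exists hstore;

create or replace function log_document_diff() returns trigger as $$
begin
  insert into document_audit (doc_id, changed_at, removed, added)
  values (
    old.id,
    now(),
    hstore(old.*) - hstore(new.*),  -- old values of the columns that changed
    hstore(new.*) - hstore(old.*)   -- new values of those columns
  );
  return new;
end;
$$ language plpgsql;

create trigger document_diff
  after update on documents
  for each row execute function log_document_diff();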
>I have used it in at least one application before.
Good to know that other folks are using this technique!
>One of my favorite things about it is that you can use it even when you don't own the application writing to the database
Great point! Hadn't thought of this.
Harder to set up, and prone to losing writes if something crashes, but it works on non-database endpoints too.
Thus it requires a "replica set", but it's fine to run a single-node replica set.
For auditing, rewind, recovery, change history and other perks, Event Sourcing would be closer to the solution - but you shouldn't really be choosing ES just for those perks. ES should mostly only be considered in situations where complex data and business rules are being interacted with by simultaneous actors (financial systems might be one example).
I'd caution against going with ES just for the sake of auditing as it is quite the paradigm shift in both client and server architecture and you will need to be comfortable with eventually consistent systems and higher overall system complexity.