Postgres audit tables saved us from taking down production (heap.io)
174 points by kmdupree on Oct 26, 2021 | hide | past | favorite | 57 comments

This is essentially the idea behind CDC (Change Data Capture) [0].

Martin Kleppmann has some great blogs about this as well [1].

[0] https://en.wikipedia.org/wiki/Change_data_capture

[1] https://www.confluent.io/blog/turning-the-database-inside-ou...

Thanks for sharing these links!

If you look for a ready-to-use open-source implementation of CDC for Postgres (and other databases), take a look at Debezium [1].

On audit logs in particular, we have a post on our blog which discusses how to use CDC for that, focusing in particular on enriching change events with additional metadata, like the business user performing a given change, by means of stream processing [2].

One advantage of log-based CDC over trigger-based approaches is that it doesn't impact latency of transactions, as it runs fully asynchronously from writing transactions, reading changes from the WAL of the database.

Disclaimer: I work on Debezium

[1] debezium.io [2] https://debezium.io/blog/2019/10/01/audit-logs-with-change-d...

Debezium is awesome, thanks for the great work! It's in my toolbox for when embulk [1] batch processing doesn't cut it (or even in combination with).


Thank you so much, it's always awesome to hear that kind of feedback!


I've been thinking about your "transaction" pattern more, and I was wondering: why not stream the transaction metadata directly to a Kafka "transaction_context_data" topic? Would writing some of the data to the DB while writing some of the data directly to Kafka make it less consistent during faults?

The reason I ask: I'm curious what it would look like to use this pattern for an entire application. I think it could be a very powerful way to do "event sourcing lite" while still working with traditional ORMs. Would writing an additional transaction_metadata row for most of the application's inserts/updates slow things down? Too many writes to that table?

> Would writing some of the data to the db while writing some of the data directly to kafka make it less consistent during faults?

Yes, this kind of "dual write" is prone to inconsistencies. If either the DB transaction rolls back or the Kafka write fails, you'll end up with inconsistent data. We discuss this in a larger context in the outbox pattern post [1]. This sort of issue is avoided when writing the metadata to a separate table as part of the DB transaction, which will either be committed or rolled back as one atomic unit.

> Would writing an additional transaction_metadata row for most of the application insert/updates slow things down?

You'd just do one insert into the metadata table per transaction. As change events themselves contain the transaction id, and that metadata table is keyed by transaction id, you can correlate the events downstream. So the overhead depends on how many operations you already have in your transactions. Assuming you don't do just a single insert or update, but rather some select(s) and then some writes, the additional insert into the transaction metadata table typically shouldn't make a substantial difference.
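Assuming a metadata table keyed by transaction id as described, the write might be sketched roughly like this (table and column names are illustrative, not from any actual schema):

```sql
-- The metadata row commits or rolls back atomically
-- with the business writes in the same transaction.
begin;

insert into purchase (id, customer_id, amount)
values (1001, 7, 49.99);

insert into transaction_context_data (transaction_id, app_user, use_case)
values (txid_current(), 'jdoe', 'place-order');

commit;
```

Downstream, change events carrying the same transaction id can be joined against the transaction_context_data stream.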

Another option, exclusive to Postgres, would be to write an event for the transaction metadata solely to the WAL using pg_logical_emit_message(), i.e. it won't be materialized in any table. It can still be picked up via logical decoding; we still need to add support for that record type in Debezium though (contributions welcome :).
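A rough sketch of that WAL-only approach (the payload shape and prefix are just examples):

```sql
-- pg_logical_emit_message(transactional, prefix, content):
-- with transactional = true, the message is only emitted on commit.
begin;

update account set balance = balance - 100 where id = 1;

select pg_logical_emit_message(
    true,
    'audit',
    '{"app_user": "jdoe", "use_case": "refund"}'
);

commit;
```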

[1] https://debezium.io/blog/2019/02/19/reliable-microservices-d...

> kafka

If you already have Postgres, why add another thing to it?

A benefit of Kafka is that it is relatively trivial to spin up multiple consumers from a single CDC stream.

Say you have some user attribute table, and a couple of downstream consumers (teams!) also want this data. You hook the table up to Kafka with Debezium and write all changes to a Kafka log.

A team of data scientists can stream these straight into their model, or a team of data engineers can dump these changes real time into an analytics store for business intelligence, or another team that needs access to this data can also create their own denormalized table of user attributes as well.

For the data-producing team, once they get their data into Kafka, basically any team can consume it. That is a lot easier for the producing team to maintain and does not stress the database, so it's a very nice pattern for bigger orgs.

For the most part, keeping the data in a single postgres instance as long as possible would be my recommended solution. But there is a point where getting the data to consumers of that data becomes tedious enough that this additional infrastructure is worth it.

That is very cool.

If you're tempted to use audit tables, you might also consider making the jump to temporal tables. I rolled my own light-weight version using inheritance-based partitioning in Postgres. The basic idea is:

- Create a parent table like items with a valid time range column as a tstzrange type. This table won't store data.

- Create two child tables using Postgres inheritance, item_past and item_current that will store data.

- Use check constraints to enforce that all active rows are in the current table (by checking that the upper bound of the tstzrange is infinite). Postgres can use check constraints as part of query planning to prune to either the past or current table.

- Use triggers to copy from the current table into the past table on change and set the time range appropriately.
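A minimal sketch of the table setup described above (column names are illustrative; the trigger logic is omitted):

```sql
-- Parent table: holds no data itself.
create table item (
    id    bigint not null,
    name  text,
    valid tstzrange not null
);

-- Current rows: upper bound of the validity range is infinite.
create table item_current (
    check (upper(valid) = 'infinity')
) inherits (item);

-- Past rows: upper bound is finite.
create table item_past (
    check (upper(valid) <> 'infinity')
) inherits (item);
```

With those check constraints in place, a query against item with a predicate on upper(valid) lets the planner prune to just one child table.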

The benefits of this kind of uni-temporal table over audit tables are:

- The schema for the current and past is the same and will remain the same since DDL updates on the parent table propagate to children. I view this as the most substantial benefit since it avoids information loss with hstore or jsonb.

- You can query across all versions of data by querying the parent item table instead of item_current or item_past.

The downsides of temporal tables:

- Foreign keys are much harder on the past table since a row might overlap with multiple rows on the foreign table with different times. I limit my use of foreign keys to only the current table.

If anyone feels interested after reading this comment, check out the book "Developing Time-Oriented Database Applications in SQL".

edit: it contains many database patterns that are useful even if you aren't building time-oriented applications. It will make you a better developer.

edit 2: basically, once you know some things from this book, every time you encounter a new database engine or programming language, the first thing you will want to look at is how it handles dates & times; suddenly these kinds of fields become way more interesting. So the book might change how you view dates/times permanently. Just be careful if you tend to over-engineer or are a perfectionist, because it makes that easy to do. Apply the patterns when it makes pragmatic sense.

Good recommendation; I also got a lot of mileage out of:

Managing Time in Relational Databases: How to Design, Update and Query Temporal Data

> be careful if you tend to over-engineer or are perfectionist

One of the downsides of these books is that I see bitemporal data everywhere now, but it's quite difficult to write bitemporal queries in Postgres.

That's exactly what I mean. It's a new lens and feels like it should be applied to everything, when it really should not. But then you start overthinking, that ~maybe~ you will regret not doing it, etc. Silly, I know.

MSSQL comes with (very nice) temporal table support out of the box, if you can stomach the license fees.

MariaDB supports temporal tables, if you can stomach MySQL. Works fine in my experience, although the largest database I've used it with is no more than 30 GBs.


Yes, I've wondered why open source DBs don't have that out of the box (although it seems MySQL has it, per the other answer to you, so I will check), as I use them extensively: both audit tables and temporal tables really make my work in banking far easier (and audit tables are often mandatory anyway).

We use open source (both mysql and postgres) but I keep missing mssql features and have to replace them with adhoc stuff that often does not work on AWS Aurora as-is.

I make heavy use of this technique. I use it to display an audit log to users that tells them what’s changed on the entity and who did it. I set the application_name to the currently logged in user id so I can display it back.

I find this lets me punt on a lot of fine-grained permissions for different roles within an organisation. With the audit log, it becomes reasonable to say “just ask them not to do that. If they do, you’ll easily be able to see what happened and we can reverse it, then tell them to knock it off”

I’ve also gotten pretty wild using it to reconstruct state in migrations. Like adding a “created_at” field to a table, then mining the audit log for creation events to set the value on existing rows. Or changing a table to a soft delete instead of an actual delete, then repopulating past entries from the audit log.

Do you add audit tables for basically everything, or just the most critical tables? I’m wondering whether the DB eventually becomes dominated by audit entries, though I guess you could prune them periodically if that is a problem.

In my case I've added them everywhere, all tables. The critical tables are also the highly trafficked ones in my case. So leaving the audit off the less-critical ones wouldn't really buy me anything.

It does _feel_ like the kind of thing that's likely to build an unreasonable amount of cruft around the place, but it just hasn't happened over the last few years. The way I run it all the audit logs are actually in a single table. Querying that table isn't exactly quick but it doesn't need to be. It all just trucks along.

I have all mine in a separate schema, with prefixes: "audit_schemaname" with "audit_customer", "audit_invoice" etc. And then I partition them based on the hot ones.

Thank you for sharing, will play around with putting it all in one table.

I add them to everything now, but automate it.

If you have any "hot" tables, it is worthwhile to partition those audit tables, again, automate those parts.

I've only had a single case where we had to switch it off (the auditing) because we started getting deadlocks (main table was read & write heavy + our reporting engine was fed off of it at the time).

So for most cases it works perfectly well. Just keep an eye on your hot tables, memory consumption, and how fast your main table grows, as the audit tables can sometimes take up more disk space than your main table (we had a 20 GB table with a 50 GB audit table...).

If you are on Postgres, pg_partman + pg_cron are your friends. You can also automate it from your application side, which is what I mostly do; it makes permission issues easier to manage and I get strong typing in my language. I run this code automatically after deployments/migrations are done; make sure it is idempotent so you don't accidentally wipe your audit tables.

> Unfortunately, the Node code we wrote to do this had a bug: it treated a string as if it was a number.

I'm sure plenty of people here will have thoughts on this point.

I copied that text from the article, and found you already posted it!

I appreciated the clear write-up of the symptoms of the problem. Causes of the problem not in the write-up: failing to shard on the ID of the paying customer, a highly questionable scripting language to run DDL, lack of testing, allowing non-DBAs to run DDL, and incompetent management.

On the bright side, the Node hackers can keep patting themselves on the back for recovering from their self-inflicted wounds.

I have no idea how it's still online, but I have a more robust implementation borrowed from the Postgres wiki: https://eager.io/blog/audit-postgres/

I now use it with every project and it has saved me many times.
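The core of the general pattern is small; a minimal sketch in Postgres (not the full implementation from the link; names are illustrative, and the syntax assumes PG 11+):

```sql
-- Shadow table capturing every change as jsonb.
create table account_audit (
    audit_id   bigserial primary key,
    changed_at timestamptz not null default now(),
    op         text not null,
    row_data   jsonb not null
);

create function account_audit_fn() returns trigger as $$
begin
    -- For DELETE, NEW is null, so fall back to OLD.
    insert into account_audit (op, row_data)
    values (tg_op, to_jsonb(coalesce(new, old)));
    return coalesce(new, old);
end;
$$ language plpgsql;

create trigger account_audit_trg
after insert or update or delete on account
for each row execute function account_audit_fn();
```

The linked version adds things like session user capture and statement-level handling on top of this basic shape.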

For a moment I thought Postgres had a mechanism to simplify doing this.

I see this same "shadow table"+trigger pattern with other databases too.

You can use pgaudit, an extension that lets you audit DDL/DML statements. It's a great auditing mechanism. I use it on all our prod Postgres instances, but have only "DDL" enabled, because of the potential performance overhead.


> but have only "DML" enabled, because of the potential performance overhead

Surely the amount of DDL statements would be trivial compared to the DML ones?

Good catch, I meant "DDL", corrected my comment.

This extension simplifies things a bit: https://github.com/arkhipov/temporal_tables

Indeed. Better temporal support was added in the SQL:2011 standard [0], and it is supported by a number of databases, some (like Postgres) via extensions and others (like SQL Server) natively [1].

[0] https://en.wikipedia.org/wiki/SQL:2011

[1] https://docs.microsoft.com/en-us/sql/relational-databases/ta...

Ah. Sorry to disappoint. The technique was just new to me, so I thought it was worth sharing. :)

Audit tables are awesome, this technique is great. We had one additional requirement that got us away from using a trigger for this: we wanted to know what user (in the application) caused the change. So we moved the logic of "insert into audit_table" into the application code itself using a CTE, roughly translated as:

    with affected_rows as (
        insert/update/delete into/from ...
        returning *
    )
    insert into my_audit_table
    select @op, current_timestamp, @userinfo, * from affected_rows
where @op is bound to one of "insert", "update" or "delete", and @userinfo is the user(name|id|various|etc) of the user that caused the change.

RETURNING * is a clever way to do this. What is this * from affected_rows syntax you're using here? Any chance you have a link to the docs that describes this?

The affected_rows identifier is the name of the common table expression, they're just doing:

    select *
      from affected_rows
With some additional columns that happen to be provided by postgres and/or a parameter to the SQL statement.

We wanted to do the same thing, but still keep it in the DB as triggers. For every change we store both the DB user making the change and the application user. In Postgres (and MySQL) you can set variables in your transaction and then reference them from your trigger. This way we can capture when things change via the application, and also on the rare occasion where we need to make manual updates.
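In Postgres, that transaction-variable approach can be sketched with a custom setting (the setting name here is illustrative):

```sql
begin;
-- Transaction-local custom setting, visible to triggers in this transaction.
set local my_app.user_id = '42';
update invoice set status = 'paid' where id = 7;
commit;

-- Inside the trigger function, read it back with:
--   current_setting('my_app.user_id', true)
-- The second argument makes it return null instead of erroring when unset,
-- which covers manual updates that never set the variable.
```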

Around 7-8 years ago I built something very similar into the data subsystem of the Low-Code platform I was working on (triggre.com). All fully generated, and fairly performant.

The nice thing there was that the entire data subsystem was designed to be lossless through the lifetime of the system under normal use. So as you changed the data model, no data would actually be removed. For the target use cases (small-scale departmental systems) this is extremely powerful.

Kind of miss working there, I still have a huge list of really cool awesome things that you could do which would pull our industry forward kicking and screaming.

I'm interested in this list or even a part of it

Audit tables are a good first step, but I am wondering if there was already work being done toward making foreign keys "version aware". Say I linked an order to a discount; it would be nice to know what that linked discount looked like at the time of linking vs. what it looks like now, in case a merchant decided to change it (but still be able to group by id for the given discount).

This is solvable with vanilla table design. Add a version column to the discounts table, make your foreign key point to (discount_id, version) instead of just discount_id, keep old discounts around. Don't mutate discounts, just add a new row with a new version. You can still group by discount_id and exclude the latest version for each discount if you want.
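Sketched out, that design might look like this (table and column names are illustrative):

```sql
-- Discounts are immutable: a change adds a row with a new version.
create table discount (
    id      bigint  not null,
    version int     not null,
    percent numeric not null,
    primary key (id, version)
);

-- Orders reference a specific version of a discount.
create table "order" (
    id               bigint primary key,
    discount_id      bigint not null,
    discount_version int    not null,
    foreign key (discount_id, discount_version)
        references discount (id, version)
);
```

An order always sees the discount as it was when linked, while grouping by discount_id still aggregates across all versions.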

Also there is the point-in-time architecture https://www.red-gate.com/simple-talk/databases/sql-server/da...

So we do something similar, using HSTORE to diff the old and new records, which then allows us to easily tell what was changed in the case of an update. Useful for many kinds of things, but especially forensics.

I have a use case where I could use a diff of records changed, and of what was changed. I don't see how HSTORE solves this for you. Could you explain more?

In my example, say a user edits their document and saves it (to a row in the DB). I want to be able to tell what their previous row was, what the new one is, and basically git diff them.

I replied to a sister comment with an example. HSTORE can take a whole row and turn it into an HSTORE value, plus HSTORE has an operator of `-` that will diff two HSTORE values. So combine that with the OLD and NEW that you get in the trigger and you can effectively diff the records.

clever! hadn't thought of combining this with hstore.

    , SUBSTRING(TG_OP, 1, 1) -- this gets you I, D, U, or T for truncate
    , HSTORE(oldData.*) - HSTORE(newData.*)
    , HSTORE(newData.*) - HSTORE(oldData.*)
This is basically what we do, getting the old and new of only what changed.

This is a useful technique. I have used it in at least one application before. One of my favorite things about it is that you can use it even when you don't own the application writing to the database!

Author here. :)

>I have used it in at least one application before.

Good to know that other folks are using this technique!

>One of my favorite things about it is that you can use it even when you don't own the application writing to the database

Great point! Hadn't thought of this.

Oracle RDBMS probably counts as cursing here :) but there's a nice (old) post that works through the considerations around auditing records https://blogs.oracle.com/oraclemagazine/post/a-fresh-look-at...

Please note that this is not about some Postgres feature, but about using triggers to keep some CRUD history.

Interesting, I hadn't considered doing this with triggers. I made my current stack save the user, time, and a diff of all changes to an event log (along with all the trace IDs.)

Harder to set up, and error-prone (changes may not be saved if something crashes), but it works on non-database endpoints too.

Any sufficiently large project's data architecture contains an ad-hoc, informally-specified, bug-ridden implementation of half of event sourcing.

Let’s say you have a friend (def not me) who chose to use mongo instead of Postgres and is now stuck with mongo. Does mongo have an equivalent?

Yes (sort of). This is just a bunch of triggers that populate audit tables. In Mongo you'd use change streams (https://docs.mongodb.com/manual/changeStreams/) which are based off the oplog.

Thus it requires a "replica set", but it's fine to run a single-node replica set.

Anyone here more experienced than me in CQRS care to weigh the pros and cons between these two patterns?

CQRS on its own isn't really going to get you what you want here as it's more about separating commands from queries.

For auditing, rewind, recovery, change history and other perks, Event Sourcing would be closer to the solution; but you shouldn't really be choosing ES just for those perks. ES should mostly only be considered in situations where complex data and business rules are being interacted with by simultaneous actors (financial systems might be one example).

I'd caution against going with ES just for the sake of auditing as it is quite the paradigm shift in both client and server architecture and you will need to be comfortable with eventually consistent systems and higher overall system complexity.

Be careful with audit tables and GDPR requirements, though - if a user submits a valid deletion request, you're obliged to delete their data from your systems, wherever it resides. This includes backups and audit tables.
