
Are you at all worried about the size of the audit table? Keeping a JSON copy of both the old and the new data is going to add up pretty quickly for fast-changing tables.

I've built audit systems in the past which track just the new values, with the idea being that you can replay the audit log for a particular record from the beginning to reconstruct its state. I'm not convinced that's a better approach than yours here though, just interested in your thinking.
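Roughly this shape, as a sketch (the table and column names are just illustrative):

    -- each change stores the full NEW row as jsonb; nothing else
    create table audit_log (
        id         bigserial   primary key,
        table_name text        not null,
        record_id  uuid        not null,
        new_record jsonb       not null,
        changed_at timestamptz not null default now()
    );

    -- "replaying" a record then amounts to taking the most recent
    -- entry at or before the point in time you care about
    select new_record
      from audit_log
     where table_name = 'account'
       and record_id  = $1
       and changed_at <= $2
     order by changed_at desc
     limit 1;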




> Are you at all worried about the size of the audit table? Keeping a JSON copy of both the old and the new data is going to add up pretty quickly for fast-changing tables.

That's definitely a valid concern if you're auditing a high-frequency insert/update table, or have your entire database under audit. If you do have those needs, a system that logs outside of Postgres (like pgaudit) would be a better fit.

In my experience, most startups and mid-size enterprises sprinkle in auditing around the sensitive parts of the DB, like `account` or `access_control` tables, where write load isn't much of a concern.

> I've built audit systems in the past which track just the new values

Yeah, that solution works great! The one place it breaks down is when you apply auditing to an existing table and need to be able to recover what records looked like at the time auditing was enabled (the initial state is never snapshotted).

We toyed with another approach to avoid having to track old_record: first check whether the `record_id` already exists in the audit table, and insert a row with the OLD values and a `SNAPSHOT` `operation` if it does not.

Even though that query/check was operating on an index, the performance overhead was higher than we were comfortable with.
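For the curious, the check was roughly this shape (a simplified sketch with placeholder names, not the exact function; it assumes a single `id` primary key and a `record_audit` table with `record_id`, `operation`, and `record` columns):

    -- simplified sketch of the "snapshot on first sight" idea
    -- (covers insert/update only)
    create or replace function audit_changes()
    returns trigger
    language plpgsql
    as $$
    begin
        if tg_op = 'UPDATE' then
            -- never seen this record before? snapshot its OLD state first
            if not exists (
                select 1 from record_audit where record_id = old.id
            ) then
                insert into record_audit (record_id, operation, record)
                values (old.id, 'SNAPSHOT', to_jsonb(old));
            end if;
        end if;

        -- then log the change itself, NEW values only
        insert into record_audit (record_id, operation, record)
        values (new.id, tg_op, to_jsonb(new));
        return new;
    end;
    $$;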

Storage is pretty cheap these days, so we opted to optimize for write throughput and reduced memory usage rather than for disk.


> We toyed with another approach to avoid having to track old_record: first check whether the `record_id` already exists in the audit table, and insert a row with the OLD values and a `SNAPSHOT` `operation` if it does not.

> Even though that query/check was operating on an index, the performance overhead was higher than we were comfortable with.

Was this done inside the trigger function? Couldn't you just insert the SNAPSHOT values first before applying the trigger? Seems wasteful to me to track both NEW and OLD for all subsequent operations just to ensure the initial value is covered.
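i.e., something like the following when auditing is first enabled on an existing table (a sketch reusing the placeholder names from the snapshot example above, with `account` standing in for the audited table):

    -- one-time backfill at the moment auditing is enabled:
    -- snapshot every existing row, then attach the trigger
    insert into record_audit (record_id, operation, record)
    select id, 'SNAPSHOT', to_jsonb(account)
      from account;

    -- the trigger itself can then skip the "already seen?" check
    create trigger account_audit
        after insert or update on account
        for each row execute function audit_changes();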


I've also used an update trigger to track only changed columns, inspired by this post[0], which wraps a comparison function in a new subtraction operator for JSONB, similar to the built-in subtraction operator for HSTORE that removes any key pairs that haven't changed.

[0] http://8kb.co.uk/blog/2015/01/19/copying-pavel-stehules-simp...
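The core of it is a jsonb "minus" that keeps only the key/value pairs that changed, plus an update trigger that stores the result. Roughly (a sketch of the idea, not the linked post's code verbatim; the function and table names are placeholders):

    -- keep only the pairs in "l" that are absent from or different in "r"
    create or replace function jsonb_minus(l jsonb, r jsonb)
    returns jsonb
    language sql immutable
    as $$
        select coalesce(jsonb_object_agg(key, value), '{}'::jsonb)
          from jsonb_each(l)
         where not (r @> jsonb_build_object(key, value));
    $$;

    -- mirror hstore's "-" by wrapping the function in an operator
    create operator - (leftarg = jsonb, rightarg = jsonb, function = jsonb_minus);

    -- an update trigger can then store just the changed columns
    create or replace function audit_changed_columns()
    returns trigger
    language plpgsql
    as $$
    begin
        insert into record_audit (record_id, operation, record)
        values (new.id, tg_op, to_jsonb(new) - to_jsonb(old));
        return new;
    end;
    $$;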


That's a cool solution for reducing storage. Nice!


Building a bit off Simon's question above: I'm curious if any thought was given to using JSON over JSONB. The trade-off of very fast ingestion and storage seems like it could make plain JSON the better fit, since JSONB has to do some conversion on write, and any indexes on the JSONB columns could dramatically slow down throughput.


> I'm curious if any thought was given to using JSON over JSONB

For sure! There is a conversion penalty, but it's paid incrementally (at write time). In exchange, jsonb gets you reduced storage size and dramatically faster column value extraction with `->` or `->>` if you end up having to filter the audit table by something other than the `record_id`.
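For example, the kind of query where that pays off (illustrative table/column names, not our actual schema):

    -- filter the audit table by a field inside the stored document;
    -- ->> extraction is cheap on jsonb and the expression can be indexed
    select *
      from record_audit
     where record ->> 'email' = 'person@example.com';

    create index record_audit_record_email_idx
        on record_audit ((record ->> 'email'));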

It is a tradeoff though and depending on the specifics of your use-case JSON could be a better choice.



