Easy, alternative soft deletion: `deleted_record_insert` (brandur.org)
275 points by craigkerstiens on Jan 1, 2023 | 102 comments



This is another solution to such a common problem that one might be surprised there is no solution baked into the SQL standard. Instead, we have some vendor-specific features like automatic audit tables and time travel, and a huge array of bespoke techniques like the one in the article: everything from adding a deleted_at column through to re-architecting your system around event-sourcing.

Why such diversity of solutions? I believe this is because the problem of adding time dimensions to your data model is heavily dependent on exactly what you intend to do with your data. Enabling theoretical audits is a very different requirement than enabling admins to rollback changes to data, which is a very different requirement to enabling users to undelete their data - and then there’s a performance angle to layer on top: your design will depend on whether you need your INSERTs/UPDATEs to be fast or your SELECTs.

It’s definitely not a case of one of these approaches being definitively better than another. You can’t shortcut talking to your users/clients/stakeholders about how the whole system is intended to work.


The SQL:2011 standard does describe a mechanism for solving this called System Versioned Tables, but the only database I've encountered that implements it so far is MariaDB: https://mariadb.com/kb/en/system-versioned-tables/

https://en.wikipedia.org/wiki/Temporal_database lists a few more - apparently there are versions of this in Oracle, DB2 and SQL Server now.


I happened to fall down this rabbit hole a few days ago because of another HN thread. In it, a commenter mentioned [1] a talk by Markus Winand where he goes over some of the more interesting additions to the SQL standard since 92, including versioned tables [2] (I've linked to the timestamp, but the whole talk is good).

I got real excited about that feature because I could think of a few tables at work that could use it. Sadly, PostgreSQL doesn't implement it [3] (I can't permalink to the feature but you can search for "T180", "System-versioned tables", "T181", or "Application-time period tables").

[1] https://news.ycombinator.com/item?id=34182433

[2] https://www.youtube.com/watch?v=xEDZQtAHpX8&t=2276s

[3] https://www.postgresql.org/docs/current/unsupported-features...


As many commenters pointed out, the ability to revert deletions is a highly desired feature. I cannot remember how many times our customers have deleted records by mistake, and while the deleted_at solution makes a revert trivial (update .. set deleted_at = null where undo_condition = true), it's much more complicated with the proposed solution. Why?

In any living system the db schema is something that continuously evolves: new foreign keys, new columns, dropped foreign keys, dropped columns, new indexes. If soft-deleted data remains in the tables it evolves with the rest of the system, but that's not the case if it all went into a json blob. Are you sure you want to remember every change made to your schema while trying to restore the data?

The problem with queries is real, but I wonder about the scale of the problem. In my experience, tables do not switch from hard deletion to soft deletion often, so it's more a matter of habit: check the structure of a table you see for the first time and take deleted_at into account if it's there.

As the author says, it's all about trade-offs. I would use an audit log for debug purposes (deleted_at does not answer who did it, and adding deleted_by to every table adds a risk of split brain - what if deleted_at is null, but deleted_by is not?) and deleted_at to enable quick reverts to accommodate the mistakes users and developers make.
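(For what it's worth, the split-brain risk itself can be fenced off at the schema level. A minimal sketch, assuming a hypothetical users table carrying both columns:)

    ALTER TABLE users
        ADD CONSTRAINT deleted_fields_consistent
        -- both NULL (live row) or both set (soft-deleted row), never half and half
        CHECK ((deleted_at IS NULL) = (deleted_by IS NULL));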

A more problematic pattern for me is tracing the changes to individual columns. Let's say you tag your users as superheroes. The moment you introduce this prop, analysts will immediately ask you who changed this field, why, how many times, and what its value was at time X. One can say that it's not necessary for all fields, but I observe that I have much less trouble in development if I accommodate for it from the start rather than add hacks to support it later.


I’ve been using this exact approach for years and strongly advocate for it.

The deleted_at approach is rife with problems since you have to ensure it’s checked in every single query, including joins where it may be referenced. Getting the record out of the table is critical.


You can get around the "check in every transaction" problem with an ORM, but now you're (more) coupled to your ORM, which you will occasionally, inevitably, need to circumvent for something or other. And now you've made it (more of a) leaky abstraction.


or you can create a view for every table that supports soft deletion and ensure all of your read-only queries use those views

    CREATE VIEW current_customers AS
        SELECT * FROM customers where deleted_at is null;

    SELECT * FROM current_customers JOIN ...
Of course, this comes with its downsides e.g. views need to be recreated in every migration and there might be some complex join operations that might not work.


It may be more work sometime in the future, but having to prefix every query with a null check sounds bonkers. A view can be optimised to do all that for you. I made a what-the-flip expression reading that they prefixed everything.


It can be the other way around: the table has the prefix/suffix, and the view doesn't. Alternatively the view can be created with the same name, but in a different schema, which is set with higher priority for the user (e.g. via search_path in PG)
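A minimal sketch of the schema/search_path variant, assuming a hypothetical customers table, an "app" schema, and an app_user role:

    -- the real table lives in the default schema
    CREATE TABLE public.customers (
        id         bigint PRIMARY KEY,
        name       text NOT NULL,
        deleted_at timestamptz
    );

    -- a same-named view in a separate schema hides soft-deleted rows
    CREATE SCHEMA app;
    CREATE VIEW app.customers AS
        SELECT * FROM public.customers WHERE deleted_at IS NULL;

    -- the application role resolves app.customers before public.customers
    ALTER ROLE app_user SET search_path = app, public;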


Better yet, encode the logic in a view and run your queries against the view. DRY


In postgres you cannot make a foreign-key reference to a view, which effectively means you can't prevent another table from referencing a soft-deleted record.


I wish Postgres had something like a "CHECK constraint" on a foreign key.

There is a hack of sorts. You create a duplicate of the primary key column, named e.g. id_active. You create a CHECK constraint which says something like "(status = 'Deleted' AND id_Active IS NULL) OR (status <> 'Deleted' AND id_active = id)". You create a unique index on "id_active", and point your foreign key to that. When you create a record, populate both id and id_active to same value; when you soft-delete it, set id_active to NULL. Actually, maybe a simpler solution is to make id_active a "GENERATED ALWAYS AS ... STORED" column–although I'm not sure if Postgres supports them for foreign keys? That's a relatively new Postgres feature and I haven't done much yet with the more recent versions in which that feature was added.
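A minimal sketch of the id_active trick, with hypothetical accounts/orders tables:

    CREATE TABLE accounts (
        id        bigint PRIMARY KEY,
        status    text NOT NULL DEFAULT 'Active',
        id_active bigint UNIQUE,
        -- id_active mirrors id while the row is live, and is NULL once soft-deleted
        CHECK ((status = 'Deleted' AND id_active IS NULL)
            OR (status <> 'Deleted' AND id_active = id))
    );

    -- children point at the mirror column, so they can only reference live rows
    CREATE TABLE orders (
        id         bigint PRIMARY KEY,
        account_id bigint NOT NULL REFERENCES accounts (id_active)
    );

    -- soft delete; rejected while any orders still reference the account
    UPDATE accounts SET status = 'Deleted', id_active = NULL WHERE id = 42;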


I’m also not sure about the foreign key restrictions on generated columns (and glancing at the docs I don’t see anything about it on there) but for all intents and purposes they are real columns so I’d imagine it probably works. Apparently they run after the before triggers, I’m not totally sure where foreign keys are checked but probably after that?

As an aside, they’re a great feature. We’re using them to generate columns that we can index for efficient joins between tables and also for creating text strings for searching over using trigram indexes. The whole thing is really seamless.


I haven't tested it but it might be the case that it'd need to be a stored generated column to be referenced like that but that shouldn't be a big deal.


Might be mistaken but I think Postgres only has stored generated columns at this stage.


You can choose when FKs apply in postgres, including deferring until the very end of the transaction.


I'm not trying to shill for the company I work at, but in Hasura, you can make FK's and relationships between views and treat them like regular insertable/updatable tables.

It was one of the many things that impressed me so much when I was a user that made me want to hack on the tool for a living.

So, this problem of soft-deletes becomes trivial


Sure, but you need an "undeleted" view per table now - I haven't seen an ORM that offers maintaining those views out of the box - any to look at?


Don’t forget versions of the schema, and data that is not migrated to the new schema


Setting up row level security policies to exclude rows `where deleted_at is not null` solves most of the issues with the discarded solution. Of course it would be crazy to have a system where you need extra where clauses for the default queries being made. You can even make a simple function that sets a statement-level variable checked in the RLS policy, so that deleted rows can be included or only deleted rows be returned. The only negative thing I see about using a deleted_at column is that cancelling the delete in the before delete trigger changes the resulting "deleted rows" count to be 0 instead of the expected number.
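A minimal sketch of the RLS version (the app.show_deleted setting name is made up):

    ALTER TABLE customers ENABLE ROW LEVEL SECURITY;

    -- by default only live rows are visible
    CREATE POLICY hide_deleted ON customers
        USING (deleted_at IS NULL
            OR current_setting('app.show_deleted', true) = 'on');

    -- opt in to seeing deleted rows, for the current transaction only
    SELECT set_config('app.show_deleted', 'on', true);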


Excluding the rows still doesn't solve problems with foreign keys (you can't DELETE CASCADE and instead have to iterate all relationships manually). It also means you still need to remember to consider deleted_at when doing things like setting up unique indexes.


> It also means you still need to remember to consider deleted_at when doing things like setting up unique indexes.

Considering soft-deleted rows for uniqueness can be a feature, especially if one has a restore feature. Though I agree it can be easily overlooked if you want to exclude them, such as forgetting to use COALESCE with some reserved value. (Otherwise null will make the whole constraint always unique.)
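For example, with hypothetical users/email names, either form works depending on whether deleted rows should be excluded or included:

    -- uniqueness only among live rows: a partial index sidesteps the problem
    CREATE UNIQUE INDEX users_email_live_uq
        ON users (email)
        WHERE deleted_at IS NULL;

    -- including deleted_at instead: COALESCE the NULL to a sentinel, otherwise
    -- NULL makes every live row distinct and the constraint never fires
    CREATE UNIQUE INDEX users_email_any_uq
        ON users (email, COALESCE(deleted_at, 'epoch'::timestamptz));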


In my experience, you only want machine-generated values to stay reserved after a soft delete. Operationally, too many users want to delete stuff so that they can free up a user-generated value. In my service, users frequently delete their sites so that they can reuse the URL/subdomain for something else.

Especially when you consider foreign keys, restoring deleted data is essentially never just setting the deleted field to null. You need a more robust system to fill in missing details or reconcile the deleted data with new data.


What I like about this construct is that it uses the looser JSON semantics for the typically ad-hoc nature of recovering deleted entries - which does not necessitate the full power of referential consistency.

What I'm not sure about is how it will behave with blob fields and other data types with "problematic" serializations.


In some cases a soft delete is the only option. Say I run a chain of stores and wish to close a particular store mid year. The existing sales data still needs to reference the now closed store, but any other attempt to interact with the store should be prevented.

The application and its use of data must take this into account. I don't believe there is some magic bullet here. However, audit tables are invaluable for tracking changes and culprits.

Eventually, the store and all its data can be deleted using cascade delete to maintain referential integrity.


I’ve found similar and settled on three distinct solutions that tackle different aspects of the problem that soft deleting covers.

The first is analytics/reporting, which can be used to aggregate or otherwise store data without being coupled to the main application (therefore, not a problem to delete data if it’s been processed already).

The second is auditing to maintain a paper trail.

The third is a soft delete backed by a TTL, basically so there’s a window of time to undo the operation, but the data doesn’t remain there forever (which might be a problem with GDPR and the like - depending on what your data is.)

Main reason I had for trying to break it down this way was because soft deletes in a relational database, with foreign key constraints, becomes unintuitive pretty fast. You need escape hatches to properly delete records, you need to remember to filter deleted ones from your results, and then there’s even more complication when trying to handle this across associations. So the final piece is only to soft delete things that require it, rather than enforcing it across all tables.


I wrote a specific library to do this (automatically generates a mixin for SQLAlchemy and installs a hook which rewrites all queries, removing soft-deleted items from queries and also relationships) so that the soft-delete problem becomes a non issue and you still have the data there if you want to revert/activate something

https://github.com/flipbit03/sqlalchemy-easy-softdelete


With a procedural language and SQL introspection features, you can programmatically generate triggers, functions and backing tables for these kinds of things. For Postgres, this SQL can generate temporal tables backed by JSONB for any table:

https://github.com/solidsnack/macaroon/blob/master/temporal....

It would be pretty easy to change this to perform deletion-with-interning like mentioned in the article.

In an older version, the code actually replicated the schema of the source table in the log table; but it can lead to problems during migrations.
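For reference, a bare-bones version of the article's deletion-with-interning looks roughly like this (the column names follow the examples used elsewhere in this thread; the article's actual schema may differ, and "messages" is just an illustrative table):

    CREATE TABLE deleted_record (
        id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        deleted_at timestamptz NOT NULL DEFAULT now(),
        table_name text NOT NULL,
        data       jsonb NOT NULL
    );

    CREATE FUNCTION deleted_record_insert() RETURNS trigger
        LANGUAGE plpgsql AS $$
    BEGIN
        -- intern the whole row as jsonb before it disappears
        INSERT INTO deleted_record (table_name, data)
             VALUES (TG_TABLE_NAME, to_jsonb(OLD));
        RETURN OLD;
    END;
    $$;

    -- attach the same function to every table that should intern deletions
    CREATE TRIGGER messages_deleted_record_insert
        AFTER DELETE ON messages
        FOR EACH ROW EXECUTE FUNCTION deleted_record_insert();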


There's at least one factor which I don't see consideration for here.

Under GDPR, CCPA/CCRA, and a number of other privacy laws, if you're going to retain data related to people you need to provide those people some way of retrieving that data or requesting that it be erased. Putting the data into a "deleted_record" table doesn't remove that obligation.

So, if you're going to have a bunch of "deleted" records hanging around, you need some way to figure out who they belong to. And I don't see any way to do that with this schema, short of rehydrating all the rows and following the original foreign key relations.


Can't you use json operators [1] for this?

    -- Permanently delete messages sent by jon
    DELETE FROM deleted_record
     WHERE table_name = 'messages' AND data->>'sender' = 'jon';

[1] https://www.postgresql.org/docs/15/functions-json.html


That assumes that all of a user's data is a single foreign key relationship away from the user, which often isn't the case.

It also requires you to scan the entire table and decode JSON for every row you've ever deleted, which may become prohibitively expensive for large databases.


PostgreSQL can create indexes on the results of expressions, so it's possible to accelerate queries that select against a nested value inside a JSON blob.
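For instance (assuming the deleted_record/data naming used above):

    -- expression index to accelerate lookups on one key
    CREATE INDEX deleted_record_sender_idx
        ON deleted_record ((data->>'sender'));

    -- or a GIN index on the whole document for containment queries (@>)
    CREATE INDEX deleted_record_data_idx
        ON deleted_record USING gin (data);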


That's true.

To give an example for your 1st point: If you have the schema users->messages->message_attachments and message_attachments does not store the user_id, you'll have trouble finding all attachments from a specific user.


The deletion table is schemaless, so it may become hard to anonymize the data


This is an area where technologists and lawyers will end up disagreeing and fighting about boundaries etc.

Does it still count as your data if there is no normal way to retrieve/access it in the software?

If you say "yes", here's what this implies: if you have deleted the data, but it's still on the disk because the drive heads haven't wiped it yet (it's just been deallocated), then it's still accessible.

So, whenever there's a GDPR request, you should run disk recovery software? (The answer is no; you'd have to butt up against pretty thick lawyers and judges to be fined for this)

If you have an audit table that is automatically deleted after a while, and the audit table cannot be accessed as part of normal operations, then IMO you will be able to argue that it's not part of data that should be "reasonably accessed" via GDPR.

When you look at the spirit of the law, it also does make sense (disclaimer: I am a HUGE proponent of GDPR). What matters is that users have access to the data that the company has access to, and is able to correct and delete it. If the data is not normally accessible, and will soon be deleted, then it doesn't matter.


Here's a simple test: would the data be turned up in legal discovery? Nobody is doing a sector level disk scan but you would be expected to turn over relevant audit data if you had retained it. So it needs to be accurate and GDPR applies.


Not really. You actually may have other legal requirements to keep the data but you shouldn't use it for daily business because of GDPR. Imagine banks and long closed accounts.

Moreover, technical backup solutions, where only a very limited set of people have access, are fine. If you store DB backups, you don't have to rewrite or delete them because a customer that used your service for a week decided to ask for deletion under GDPR.


That's a different part of GDPR: purpose limits. Deleted or not, you can only use data for an approved consented purpose that is mapped to that data item.


Something I’ve been meaning to play with is setting a “PII” comment on a table/field, then scanning over the schema if I need to find places to check for encryption etc

Perhaps something here that could mark it with a uuid so if it needed to be fully deleted, it could be found easily and removed or overwritten
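A rough sketch of what the comment-tagging idea could look like in Postgres (table and column names are made up):

    -- tag columns that hold personal data
    COMMENT ON COLUMN users.email IS 'PII';
    COMMENT ON COLUMN orders.shipping_address IS 'PII';

    -- later, scan the catalog for everything tagged that way
    SELECT c.relname AS table_name,
           a.attname AS column_name
      FROM pg_description d
      JOIN pg_class     c ON c.oid = d.objoid AND d.classoid = 'pg_class'::regclass
      JOIN pg_attribute a ON a.attrelid = d.objoid AND a.attnum = d.objsubid
     WHERE d.description ILIKE '%PII%';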


I’ve recently put together an awesome list about temporality, including: soft delete, time travel, slowly changing dimensions, and bitemporality. https://github.com/daefresh/awesome-data-temporality


I did something similar to this many years ago but didn’t limit to deleted records and instead created an audit log with a very similar approach. It worked nicely and provided a view into how data was changing by our users.


I considered this for an audit log as well, but ran into a roadblock in terms of associating the operation with the logged-in user who triggered the change.

How did you solve that particular issue? Or was it not a requirement in your case?


We did something like this, by using postgresql's `set_config()` before every transaction. The trigger functions can read that value using `current_setting()`. It works for us, but for a relatively low-traffic internal application.
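Roughly like this (the app.current_user_id setting name and the audit_log table are made up):

    -- the application runs this at the start of every transaction
    SELECT set_config('app.current_user_id', '42', true);  -- true = local to the transaction

    -- trigger functions can then read it back
    CREATE FUNCTION audit_delete() RETURNS trigger
        LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO audit_log (user_id, table_name, data)
             VALUES (current_setting('app.current_user_id', true)::bigint,
                     TG_TABLE_NAME, to_jsonb(OLD));
        RETURN OLD;
    END;
    $$;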


I have experimented with mapping application users 1:1 to individual database users, and using SET ROLE at the start of every transaction.

I got it working in a PoC, and with the right configuration of roles, this pattern would give you user identity and audit right down to the database itself.

Sadly I haven’t pulled it off in production environment yet, but if I ever get the chance to work at this level in an enterprise app again, I’ll definitely do it.
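A sketch of what the PoC looked like conceptually (role names are illustrative; the pooled login role has to be granted membership in each user role before it can switch to it):

    -- one NOLOGIN role per application user
    CREATE ROLE "alice@example.com" NOLOGIN;
    GRANT "alice@example.com" TO app_login;

    -- at the start of each transaction, assume that user's identity
    BEGIN;
    SET LOCAL ROLE "alice@example.com";
    -- queries here run with current_user = alice@example.com,
    -- so triggers and audit logging can record who acted
    COMMIT;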


I implemented it in the app layer, so I had the user available when writing the new audit record. Doing it in the DB layer would impose a challenge. The few ways to solve that that I'm aware of would require a) your app to kick off a function with the user id of the mutation, or b) pushing user auth into the DB layer and principal-switching your DB connection. Both have downsides IMO.

I don't typically favor DB-level solutions because bleeding app logic beyond basic schema and integrity means you lose some portability, and/or your architecture muddies separation of concerns pretty quickly. Maybe that means you don't get the full benefit of the DB, but it's a trade I usually make.


We do this in SQL Server by running a SET CONTEXT_INFO command every time we open a connection. This makes the current user ID available to any SQL that runs.


This is reinventing standard database features like audit tables; SQL Server has these out of the box, as do most major databases.


Would “AS OF SYSTEM TIME” from SQL:2011[0] standard do the trick?

[0] https://en.wikipedia.org/wiki/SQL:2011


> I’ve spent the time migrating our code away from deleted_at, and we’re now at the point where it’s only left on a couple core tables where we want to retain deleted records for an exceptionally long time for debugging purposes.

Sounds like the problem is not soft deleting, but applying soft deleting to _everything_ without thought.

Then he goes on to suggest an alternative that is even more complicated.

Just include a where deleted_at is null check. Hide it behind some interface in your ORM if you don't want to think about it.


It's a set-and-forget solution, where you only have to reason about it locally to the schema. As they point out, `deleted_at` is viral, you have to incorporate it into every query that touches a table with that column type, otherwise you might get into weird behavior or possibly vulnerabilities/disclosures.

A set and forget solution at the ORM level is begging trouble IMHO. I don't like ORMs, but I really don't like the idea of my ORM being even more magical than it already was.


I'd rather have the magic in an ORM than the database. Sometimes this desire to keep people as far as possible from understanding the database schema just makes everything much more complicated.


It's funny, that's exactly how I feel about ORMs! (That they're a way to create distance between the programmer and the database, when what you should have is understanding.)

Understanding the schema is important, I agree - a great way to achieve that is to make the schema simpler to understand and to eliminate the need for special-case logic as much as possible.

In my view triggers like this are great: they're keeping the integrity rules you've established for your data ("thou shalt not delete a record without creating a copy for observability and manual recovery by incident responders") out of your application code (where shifting requirements and a much higher change rate put them in jeopardy) and enshrining them in the schema (where they can stay for decades, through rewrites etc.).


This is addressed in the previous article:

> Some ORMs or ORM plugins make this easier by automatically chaining the extra deleted_at clause onto every query (see acts_as_paranoid for example), but just because it’s hidden doesn’t necessarily make things better. If an operator ever queries the database directly they’re even more likely to forget deleted_at because normally the ORM does the work for them.

I ran into this exact issue last month. Very common for places to have a general "soft delete unless you have a good reason not to" policy, and very common for people to forget about the deletion flag when writing joins by hand or doing reporting.


True, but in that case I wonder why you or anyone else has such direct access to that data. ETL would eliminate such things for a data lake before anyone runs ad-hoc queries that could (and likely do) contain PII and other privileged data.


Well, for one thing, there are countless software systems worked on by hundreds of thousands of developers which will never include a data lake or any ETL more complex than "a programmer writes a script to generate a CSV and upload it/email it somewhere".


And likely all breaking GDPR laws


Or not doing business in the EU.


Or America, China, or India where similar laws exist.


Actually this is what they are defending against: "dozens of bugs and countless hours of debugging time as people accidentally omit deleted_at IS NULL from production and analytical queries."

Those queries could be raw SQL or from different ORMs and applications, maybe written without a full understanding of the database.

However the claim is not substantiated: how many bugs of that type did they have before?


There is no reason to take on the tech debt of an ORM when you can use RLS to solve this by defining a policy that excludes records where deleted_at is not null.


What's the actual use case for soft deletes? Audits? Have an audit log then. There's little reason to have deleted-but-not-really-deleted items in your database.


‘Recycle bin’ type functionality for data is usually highly desirable for customers.

They want the ability to undelete stuff they accidentally delete, since accidental deletes happen all the time.

True for consumer users. Super super true for enterprise users.


What's a use case for not even allowing soft deletes?


Normal record keeping activities


In most cases, I just see deletion as changing status from `active` to `deleted`. You should never remove the actual record from the database.


That's illegal in Europe, and unethical everywhere.


That depends on the industry.

In finance or medicine, it's often illegal[0] to delete customer information before the end of an expiration period (minimum several years).

There are ways to comply without soft-deletion, but in practice they are much more expensive to implement and never used.

0: "Illegal" is the wrong word, really "non-compliant with regulations, resulting in significant fines but except in case of gross negligence or fraud, probably not criminal charges".


> In finance or medicine, it's often illegal[0] to delete customer information before the end of an expiration period (minimum several years).

There are also laws that require you to delete customer information. For instance patient data in situations where consent is withdrawn. So either way you can’t rely on “it’s often”. And on the flip side of things the legal requirements you are referring to are typically the audit requirements, which are still better served using audit tables, since it’s not enough to have a “current data” and “data that is currently not active”, since you need logs of who changed the data when, and you can easily run into situations where something is soft deleted, then restored, then soft deleted again.


This is true, and in practice the solution frequently ends up being a combination of soft deletion and audit logs.

Just pointing out that in some common cases, the goals of consumer data control are impossible to implement, due to conflicting requirements for record retention.


Often other tables have foreign keys to the record to be deleted. What’s a good pattern for treating those related tables for every time a relationship row gets deleted/removed as per the OP blog post?


Really depends on the related data, but there‘s the option to use ON DELETE CASCADE|RESTRICT and set null or set default.

Cascade can be a bit of a footgun, as this can trigger a waterfall of deletes.
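A sketch, with hypothetical tables, of choosing the action per relationship:

    CREATE TABLE orders (
        id          bigint PRIMARY KEY,
        customer_id bigint NOT NULL REFERENCES customers (id) ON DELETE RESTRICT
    );

    CREATE TABLE order_notes (
        id        bigint PRIMARY KEY,
        order_id  bigint NOT NULL REFERENCES orders (id) ON DELETE CASCADE, -- dies with the order
        author_id bigint          REFERENCES users (id)  ON DELETE SET NULL -- note survives, author cleared
    );

Worth noting for the article's approach: AFTER DELETE row triggers still fire for rows removed via ON DELETE CASCADE, so if each child table has the deleted_record trigger attached, cascaded rows get interned too.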


Cascades can also lead to deadlocks as I found to my cost the one and so far only time I tried using hard deletes.


https://github.com/xocolatl/periods periods does this for PostgreSQL.


Great solution. Unfortunately only works on installations that support extensions, which excludes most managed database services like AWS RDS.

I like how you tried to track the standard as closely as possible. I've seen (and written) ad hoc solutions that hard-code too much or mandate certain columns to be present.

That said, unlike the standard and most other RDBMSs, Postgres supports range types. Seems a shame to rely simply on two timestamptz columns when one tstzrange should suffice.
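For illustration, a range-typed history table sketch (names are made up; btree_gist is needed for the exclusion constraint):

    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE price_history (
        product_id bigint  NOT NULL,
        price      numeric NOT NULL,
        validity   tstzrange NOT NULL DEFAULT tstzrange(now(), NULL),
        -- two versions of the same product may never overlap in time
        EXCLUDE USING gist (product_id WITH =, validity WITH &&)
    );

    -- "as of" queries become a simple containment test
    SELECT price
      FROM price_history
     WHERE product_id = 1
       AND validity @> '2023-01-01 00:00:00+00'::timestamptz;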


JSONB seems like something I need to put a few nights into understanding, but my time is short and filled with diapers and bottles.


Unless you’re really into the implementation details of individual postgres column types that seems of little interest.

Jsonb just means “parsed json” (lit. “json, binary”), meaning postgres parses the data to a binary representation upfront which allows for more efficient json operations.

However that comes at increased insert and storage costs. It also means Postgres normalises the stored document, so values don't round-trip textually (which can surprise).


jsonb is a viable alternative to MongoDB. I'm not claiming blow-by-blow feature parity, but for the 90th-percentile use case you can replace MongoDB with jsonb.


Good to see you have your priorities sorted.

Sincerely, a dad with 5 kids.


Got my first one nearly 2 months ago. How are you still alive? I'm joking of course... But really, how?


oh nice.

how do you recover deleted records?


It seems you should be able to do it fairly straightforwardly with dynamic SQL given this structure, but I don't know why you wouldn't just use proper history tables rather than storing only deleted records and lumping them all into one table.

I’ve never encountered a database where I needed to know about non-current records but only ever the last-before-deletion state of deleted records, whether for data recovery or any other purpose.


No dynamic SQL necessary.

jsonb_populate_record(…) was made precisely for this kind of scenario. Combine with an INSERT…ON CONFLICT DO UPDATE statement and Bob's your uncle.

https://www.postgresql.org/docs/current/functions-json.html



Exactly. The whole point of soft delete is data recovery, as it’s not sufficient for anything else (like auditing).


I think the author is suggesting that you often can't do it with soft deletes anyway.

https://brandur.org/soft-deletion#undelete


>and without the expectation that it’ll be used to undelete data (which probably wouldn’t work for deleted_at anyway).


My question when people implement a generic feature like this is: why doesn't the database do it? And many times, as is the case with soft delete, it does. For example, Redshift tombstones records and you can choose when to "vacuum" them up (actually delete them).

Usually if you’re changing the way a primary function of the database works, like delete, it’s probably not a good move.


To take a stab at it:

This is one of those things where business requirements trump the technical implementation details. Prevailing theory is that actual deletes are bad because you can’t do historical analysis, recovery etc on the data. Say a customer stops using a service for a year but comes back: it’s a big win if you can (at least optionally) restore their data, so the theory goes.

That’s why tricks like this exist.

The alternative is to write the data to a separate database and/or table that's meant for archive purposes. I think it's better than these soft delete tricks, but it's got complexity too. More resilient IMO, and it lets production systems run leaner over time.


I think there's still a spot for deleted_at or deleted_at'like functionality.

It's around historical data, especially in a work scenario.

For example a worker might create a thread and then 38 other workers reply to it. There could be a lot of great information in this thread. It could also be referenced in 5 other threads and external sources (docs, etc.).

If the worker leaves the company, should you really delete them in such a way where all of the threads they've created get deleted in a cascading fashion? I'm all for privacy and I would want to see that happen in most public communities but for private work, I don't know. That changes everything.

I've seen a number of tools keep the user around and visibly label them as "Jane Smith (Deactivated)". I think that's a lot cleaner than having a special reserved "Deactivated" user and then you change all of the user_id FKs to that user before deleting the person who left. If you do it that way you lose the context of who posted the original thing which has a lot of value in a long running project.

But all of the above hints at using deleted_at most likely. It keeps everything intact at the DB level and then the app layer chooses what to do for each resource type. What other options do you really have if you want to keep things working exactly the same after deleting someone, except for maybe showing an indicator that they're not around anymore?


If you’re separating your deletions from everything else then I imagine archiving a user would not be the same as deleting one, in which case why not just mark the user as archived?

It prevents overloading what deletion means and preserves proper separation of concepts too


Wouldn't this be better handled by marking the user entry as inactive and then reading that value from the join rather than setting every conversation thread as is_deleted/is_deactivated?

I feel like you are solving a different problem than the one presented in the article.


It was more about the comment I was replying to around a use case where historical data is important to keep around.

Marking it as inactive with a boolean makes you lose the context on when the inactive toggle happened.

Chances are you want to use deleted_at on the user in my use case because that provides you the timestamp of when it happened and also lets you have application logic which prevents that user from being able to login and do things, but still lets you show their previous information to everyone else as if they didn't leave and all you have to do is have a tiny bit of template logic that appends " (Deactivated)" to their name if `deleted_at` is not NULL.


Another is the support case “Help! I accidentally deleted the wrong thing!” where it saves a huge amount of time compared to loading up a full DB backup.


This is precisely the situation the article solves but without repeating

  AND NOT is_deleted
after every DB query in every app accessing the database. No full DB backup/restore needed.

    INSERT INTO mytable
         SELECT recovered.*
           FROM myaudittable audit
              , jsonb_populate_record(
                  NULL::mytable          -- row type to populate
                , audit.jsoncolumn       -- the interned jsonb data
                ) recovered
          WHERE audit.id = 8675309
    -- optionally merge if new data was added in the meantime
    -- (assumes mytable's primary key column is "id")
    ON CONFLICT (id) DO UPDATE
            SET field1 = EXCLUDED.field1
              , field2 = …etc…
Postgres has a lot of great functionality that makes jsonb manipulation relatively simple and easy. Is it more complicated than a simple UPDATE? Yes, but you only have that complexity once rather than in every query on the table from every app and ORM, and the recovery can be more nuanced, since not every restore strategy is equally valid in every situation.


> Usually if you’re changing the way a primary function of the database works, like delete, it’s probably not a good move.

This isn't a soft delete -- you can't use the data for undeletion and it's not visible to an application that can see that table (as it would be in the case of a `deleted_at` column). It's only for analytical purposes, as the author says.


re: why doesn't the database do it

They do - almost all databases will offer change data capture integration either via making the write ahead log readable or by suggesting triggers, like the article.

If you wanted to read the deletion journal in another system (maybe because of transaction rate), you might prefer CDC to Triggers. For example: https://www.postgresql.org/docs/current/logical-replication-...


Sounds like a DSGVO/GDPR nightmare. And I really don't want to be the one explaining why deleted data was part of some stolen data blob that got sold. There might be some technical use cases, but in general it has many pitfalls.


Compliance there feels to me like it would be quite straightforward to implement: if there is a legal requirement for the deleted data to be fully disposed of there's a very obvious single place to delete it from.


If this is well documented, I don't see how this would be a problem for GDPR. Can you elaborate?


If you want to delete something, delete it.

If you want to restore something, get it from a backup.

If you want to delete something, but you fear that it will ruin something in your db because the architecture is a mess and you are not really sure what references what and what will break, then soft-delete it.

But what is the point of this?


Soft deletes clutter up the table in question. They lead to bugs in code when people forget the WHERE clause that excludes soft-deleted rows. They can also complicate patterns: a user deletes their account, a month later the same user wants to create a new account, but the soft-deleted row prevents creation due to a duplicate email address.

This seems like a clever solution that simply deletes rows but provides a (somewhat cumbersome) mechanism to see the history of deleted data if they need it.


Not too terribly cumbersome when you have jsonb_populate_record in Postgres.

https://www.postgresql.org/docs/current/functions-json.html



