

Ask HN: Best practices – Should I ever actually delete data? - MalcolmDiggs

Imagine you're building a simple CRUD web application. Users can create stuff, mess with stuff, delete stuff, etc.

Is it ever a good idea to *actually* delete something in response to user behavior? Lately I've been defaulting to adding an "is_active" attribute/column/property to the data I store. When a user tries to delete something, I just switch that boolean from true to false.

The upside is that I never actually delete anything (and can retrieve it if a customer freaks out because they deleted something by accident). The downside is that as well: growing datasets that aren't doing any good, except for saving the day in very rare customer-support situations. There's also the downside of having to include "is_active == true" in most of the queries I send to retrieve data.

How do you guys handle this?

I was thinking of creating worker scripts to regularly purge the database of inactive data (dump it to a flat file and send it all to Amazon Glacier or something).
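A minimal sketch of the is_active approach, using Python's built-in sqlite3 module and an in-memory database. The `items` table and the helpers `soft_delete`, `restore` and `list_items` are hypothetical names for illustration; a real app would use its own schema or ORM.

```python
import sqlite3

# In-memory database; the "items" table is a hypothetical example schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 1  -- 1 = visible, 0 = "deleted"
    )
    """
)

def create_item(title):
    cur = conn.execute("INSERT INTO items (title) VALUES (?)", (title,))
    return cur.lastrowid

def soft_delete(item_id):
    # Flip the flag instead of removing the row.
    conn.execute("UPDATE items SET is_active = 0 WHERE id = ?", (item_id,))

def restore(item_id):
    # Undo an accidental delete, e.g. for a customer-support request.
    conn.execute("UPDATE items SET is_active = 1 WHERE id = ?", (item_id,))

def list_items():
    # Every read path has to remember the is_active filter.
    return conn.execute(
        "SELECT id, title FROM items WHERE is_active = 1 ORDER BY id"
    ).fetchall()

a = create_item("groceries")
b = create_item("taxes")
soft_delete(b)
print(list_items())  # [(1, 'groceries')]
restore(b)
print(list_items())  # [(1, 'groceries'), (2, 'taxes')]
```

The cost mentioned in the post shows up in `list_items`: any query that forgets the `is_active = 1` condition will leak "deleted" rows back to the user.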
======
hansgill
I'd say it depends largely on how your app/platform is used and what its use
cases are.

If you're Snapchat and you promise people that you delete their images after
15 seconds, then you sure as hell better delete them.

However, if you're Trello and you delete a task, I can see a reason for doing
a "soft delete" and allowing a customer to recover their task in case it was
an accident. They actually offer an "archive" option before a hard delete. I
can even imagine that after a hard delete, Trello still doesn't really delete
the task, so as not to jeopardize their business.

Whether I keep the data would depend on figuring out a few things, such as:

1) How likely is it that the user comes back for that data?
2) Are there analytics I can run on this data?
3) Is the cost of storing this data greater than the value lost if it's
deleted?

~~~
MalcolmDiggs
That's good advice, thank you for the tips.

------
labpdx
Typically I don't delete the record; instead I set an Inactive or Deleted bit
on it and exclude those records from queries, unless the data is sensitive in
nature.
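One way to avoid repeating the "exclude deleted rows" condition in every query is a database view over the live rows. This is a sketch, assuming SQLite via Python's sqlite3 module; the `notes` table, its `deleted` bit and the `active_notes` view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table with a Deleted bit (0 = live, 1 = deleted).
    CREATE TABLE notes (
        id INTEGER PRIMARY KEY,
        body TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0
    );
    -- The view hides soft-deleted rows, so application code can
    -- select from active_notes without repeating the filter.
    CREATE VIEW active_notes AS
        SELECT id, body FROM notes WHERE deleted = 0;
    """
)

conn.executemany(
    "INSERT INTO notes (body) VALUES (?)",
    [("first",), ("second",), ("third",)],
)
conn.execute("UPDATE notes SET deleted = 1 WHERE id = 2")  # soft delete

live = conn.execute("SELECT id, body FROM active_notes ORDER BY id").fetchall()
everything = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
print(live)        # [(1, 'first'), (3, 'third')]
print(everything)  # 3
```

Reads go through the view, while support staff (or a recovery script) can still query the underlying table to find and undelete rows.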

I do this because of years of experience with users wanting to retrieve deleted
data, sometimes years after the record was originally deleted.

On larger datasets, I'll then have a scheduled task periodically go through and
clean up (delete) all of the Inactive records.
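A sketch of such a cleanup task, combined with the original post's idea of dumping purged data to a flat file first. It assumes a hypothetical `deleted_at` column (a boolean alone doesn't record *when* a row was deactivated), an arbitrary 90-day retention window, and JSON Lines as the archive format; shipping the file to Amazon Glacier is left out. It could be run from cron or any job scheduler.

```python
import json
import sqlite3
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path

RETENTION = timedelta(days=90)  # hypothetical retention window

def purge_inactive(conn, archive_path, now):
    """Archive rows soft-deleted before now - RETENTION, then hard-delete them."""
    # Timestamps are stored as UTC ISO strings with second precision,
    # so plain string comparison orders them correctly.
    cutoff = (now - RETENTION).isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back if anything below raises
        rows = conn.execute(
            "SELECT id, title, deleted_at FROM items "
            "WHERE is_active = 0 AND deleted_at < ?",
            (cutoff,),
        ).fetchall()
        # Write every row to the flat file before deleting anything.
        with open(archive_path, "a", encoding="utf-8") as f:
            for id_, title, deleted_at in rows:
                record = {"id": id_, "title": title, "deleted_at": deleted_at}
                f.write(json.dumps(record) + "\n")
        conn.executemany("DELETE FROM items WHERE id = ?", [(r[0],) for r in rows])
    return len(rows)

# Demo data: one live row, one recent soft delete, one old soft delete.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, "
    "is_active INTEGER NOT NULL DEFAULT 1, deleted_at TEXT)"
)
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

def ts(days_ago):
    return (now - timedelta(days=days_ago)).isoformat(timespec="seconds")

conn.executemany(
    "INSERT INTO items (title, is_active, deleted_at) VALUES (?, ?, ?)",
    [("live", 1, None), ("recently deleted", 0, ts(10)), ("long gone", 0, ts(200))],
)
archive = Path(tempfile.mkdtemp()) / "purged.jsonl"
purged = purge_inactive(conn, archive, now)
print(purged)  # 1
print([r[0] for r in conn.execute("SELECT title FROM items ORDER BY id")])
# ['live', 'recently deleted']
```

Recently deleted rows stay recoverable inside the retention window; only older ones are moved out of the database, which keeps the table from growing without bound.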

~~~
MalcolmDiggs
Interesting, very close to where I was heading. Thank you for the insight.

