
Kudos to GDPR obliging companies to allow users to delete their personal data.



People say that but my paranoid side is convinced that no real deletion happens :-/


My suspicion is that it's likely not deleted from backups. I don't know what policies companies have regarding backups.

Also, some companies have had the option for years.

One good test might be to create Facebook and Instagram accounts, upload images, and save direct links to those images. Then delete the accounts and see. If the links still work after clearing the cache, or after a few days, weeks, or months, then yeah, they just keep your data but detach it from your friends and your email / password.


Indeed, it isn't deleted from backups. And according to [1] it doesn't have to be. At the company I work for, we handle it by keeping a list of subjects (their IDs in the database) who requested deletion; after restoring any backup, the data of the subjects on that list is deleted again.

[1]: https://www.itgovernance.eu/blog/en/the-gdpr-how-the-right-t...
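A minimal sketch of that "delete again after restore" approach, assuming the list of erasure requests lives in its own database outside the backup set. The table names `erasure_requests` and `users` and the function names are hypothetical, chosen for illustration only:

```python
import sqlite3


def init_ledger(ledger):
    """Create the erasure ledger table (hypothetical schema)."""
    ledger.execute(
        "CREATE TABLE IF NOT EXISTS erasure_requests (subject_id INTEGER PRIMARY KEY)"
    )
    ledger.commit()


def record_erasure_request(ledger, subject_id):
    """Remember that a subject asked for deletion; duplicates are ignored."""
    ledger.execute(
        "INSERT OR IGNORE INTO erasure_requests (subject_id) VALUES (?)",
        (subject_id,),
    )
    ledger.commit()


def reapply_erasures(ledger, restored):
    """Delete every listed subject from a freshly restored database.

    Returns the number of rows removed from the restored copy.
    """
    subject_ids = [
        row[0] for row in ledger.execute("SELECT subject_id FROM erasure_requests")
    ]
    deleted = 0
    for subject_id in subject_ids:
        cur = restored.execute("DELETE FROM users WHERE id = ?", (subject_id,))
        deleted += cur.rowcount
    restored.commit()
    return deleted
```

The step is idempotent, so it can run on every restore; the ledger itself holds only IDs, not the personal data.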


So do those records get deleted eventually? Or do they live on forever like some kind of ghost?


I think I am okay with this. So long as nobody is doing analysis on the data, it should be ok.


Direct links probably end up in their caches. If they stop being visited then you're fine and they'll be evicted, but intentionally evicting data that's been deleted is one of the hardest parts of implementing full deletion.


GDPR lawyers told me it should be deleted from backups if it is doable without breaking the integrity of the backup copy. If it could break the integrity or is technically impossible, then the company should have a list of all records to be deleted after restoring a backup and ensure that this list will be processed on each backup restore.


Subpoena their records for a lawsuit and see what they really have. My prediction is that at major tech companies (Google, Facebook, Amazon, etc) your data is actually deleted when you say "delete" while startups tend to start with soft deletions (less worried about being sued).


It could be that they store your data all encrypted, and when you want it deleted they just delete the encryption key from a few well defined places. That way there is less need to mess with backups, etc.
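This idea is often called crypto-shredding. A rough sketch of how it could work, under loud assumptions: `KeyStore`, `encrypt`, and `decrypt` are hypothetical names, and the cipher here is a toy SHA-256 keystream used only to keep the example dependency-free. A real system would use a vetted authenticated cipher from a maintained library.

```python
import hashlib
import secrets


def _keystream(key, nonce, length):
    """Toy keystream: SHA-256 over key, nonce, and a counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key, plaintext):
    """Return nonce + ciphertext; the nonce is fresh for every call."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key, blob):
    """Reverse encrypt(); needs the same per-user key."""
    nonce, ciphertext = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


class KeyStore:
    """Hypothetical per-user key store: the one place that must truly delete."""

    def __init__(self):
        self._keys = {}

    def create(self, user_id):
        key = secrets.token_bytes(32)
        self._keys[user_id] = key
        return key

    def get(self, user_id):
        return self._keys[user_id]  # raises KeyError once the key is shredded

    def shred(self, user_id):
        self._keys.pop(user_id, None)
```

After `shred`, the encrypted blobs in backups and caches still exist but can no longer be decrypted, provided the key store itself is not backed up in a way that keeps old keys around.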


    update account set date_deleted = now() where account_id = 123
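To make the point of that one-liner concrete, here is a small sketch using an in-memory SQLite database. The `email` column is an assumption added for illustration, and SQLite's `datetime('now')` stands in for `now()`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_id INTEGER PRIMARY KEY, email TEXT, date_deleted TEXT)"
)
conn.execute("INSERT INTO account (account_id, email) VALUES (123, 'user@example.com')")

# Soft delete: the row is only flagged, nothing is removed.
conn.execute("UPDATE account SET date_deleted = datetime('now') WHERE account_id = 123")

# The application hides flagged rows from normal queries...
visible = conn.execute(
    "SELECT COUNT(*) FROM account WHERE date_deleted IS NULL"
).fetchone()[0]

# ...but the personal data is still sitting in the table.
stored_email = conn.execute(
    "SELECT email FROM account WHERE account_id = 123"
).fetchone()[0]

# Hard delete: the row is actually gone from the live table.
conn.execute("DELETE FROM account WHERE account_id = 123")
remaining = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```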


https://joindeleteme.com/help/diy-free-opt-out-guide/

Why is a background check company the right sponsor for a deletion directory?


I'm not that worried about the technical side, though; it's the data-hoarding tendencies most sites have that I'm worried about.


Some years ago, on a large though not especially well-known social network, the task of deleting certain image files that had proved problematic to possess fell in my lap.

The list had been curated by ... some process not fully explained to me. A small number of spot checks convinced me that I didn't want to run any further validations myself, and I've rarely shredded any files harder.

The total set of images numbered in the millions, with each source image resulting in numerous thumbnail and preview sizes, as well as differing versions of the service app resulting in different naming patterns, paths, and locations. All of which were fronted by a CDN that had its own deletion mechanisms which I had to learn and adapt. The project involved conferences with the CDN's engineers.

I rapidly got the sense that large-scale bulk deletes weren't a frequently-encountered use case, as the default was to use a web form. That would have taken centuries to complete.

Some simple shell and awk could generate all the potential patterns, and batch the deletions (about 200 per request, with a return code indicating whether or not the request was accepted or the queue was full).
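The original was shell and awk; the same idea sketched in Python might look like the following. Everything specific here is an assumption for illustration: the size names, the two path layouts, and the `submit` callable, which stands in for the CDN's purge API and returns `True` when a batch is accepted and `False` when the queue is full.

```python
import time
from itertools import islice, product

# Hypothetical rendition sizes and historical path layouts.
SIZES = ("orig", "thumb", "preview")
LAYOUTS = ("/img/{id}_{size}.jpg", "/media/{size}/{id}.jpg")


def candidate_paths(image_ids):
    """Yield every path an image might live under, across layouts and sizes."""
    for image_id in image_ids:
        for layout, size in product(LAYOUTS, SIZES):
            yield layout.format(id=image_id, size=size)


def batched(items, size=200):
    """Group an iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def purge_all(paths, submit, batch_size=200, max_attempts=5, pause=1.0):
    """Send paths in batches, retrying a batch while the queue reports full."""
    sent = 0
    for batch in batched(paths, batch_size):
        for _ in range(max_attempts):
            if submit(batch):
                sent += len(batch)
                break
            time.sleep(pause)
        else:
            raise RuntimeError("purge queue stayed full; giving up on this batch")
    return sent
```

Generating the paths lazily keeps memory flat even with millions of source images, and the same generator can be reused later for verification.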

Documentation and initial tests suggested that it might take weeks, possibly months, to complete the deletions from the CDN. Residency on the CDN in any event was ~9 to 18 months, though with no clear guarantees of deletion.

In practice, I kicked off the job on a Friday afternoon, and it completed over the weekend. The same initial request-generating code could be used to spot-check (random sampling), and eventually exhaustively search the space to confirm that all deleted content was now 404.
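A sketch of that verification step, assuming a `fetch_status` callable that returns the HTTP status for a path or URL. `http_status` below is one possible implementation using the standard library; the function names and sample size are illustrative, not what was actually used.

```python
import random
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, including error statuses."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def spot_check(paths, fetch_status, sample_size=100, seed=None):
    """Check a random sample; return the sampled paths that are not 404."""
    paths = list(paths)
    rng = random.Random(seed)
    sample = rng.sample(paths, min(sample_size, len(paths)))
    return [p for p in sample if fetch_status(p) != 404]


def exhaustive_check(paths, fetch_status):
    """Check every path; return all that are still being served."""
    return [p for p in paths if fetch_status(p) != 404]
```

An empty list from `exhaustive_check` is the "everything is now 404" confirmation; the spot check is just a cheap early signal while the purge is still running.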

This was well before GDPR, and though the network userbase numbered in the tens of millions, the engineering staff was small (technology is an interesting multiplier lever, useful when deploying, problematic when dealing with issues at scale).

Upshot: deletion can be complicated. It's generally possible, however.

(A full scrub would have involved backups. I believe that the technical solution to that problem was not having any in the first place. Largely confirmed when the service fell over completely a few years later. Another warning regarding online SaaS.)



