Ask HN: For Facebook, what's the most efficient way to erase data from backups?
4 points by ggregoire on April 11, 2018 | 1 comment
Mark Zuckerberg just told Congress that Facebook erases a user's data from its backups when that user deletes their account.

For one small database, I'd restore the dump, delete the data, and dump it again without the data. But at Facebook's scale, what's the most efficient way to achieve it?

I don't know how many MySQL instances my data is spread across, or what the backup frequency and retention time are. But let's say my data is in one MySQL instance, backed up every day, with backups kept for one week. That makes 7 dumps of several petabytes each. Let's say I also have some data in other DBs (Cassandra, Redis, etc.). And let's say 10,000 users delete their accounts every day. How does Facebook do it?




If you interpret "deleting" data as making the data inaccessible, would discarding the encryption key of encrypted data count as it being deleted?

For example, let's say I encrypt each user's data with a different, unique encryption key. To access the data, I fetch that user's encrypted contents and decrypt them with that user's key. The data can be regularly backed up and archived. If I ever need to delete a particular user's data, I could simply discard that user's key. I'd still have ciphertext that could in principle be transformed back into the user's data, but I'd no longer be able to read it (assuming nobody can easily crack the encryption).
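To make the idea concrete, here's a rough Python sketch of that scheme (sometimes called crypto-shredding). Everything in it is hypothetical: `UserVault`, `forget`, and `read_from_backup` are names made up for illustration, and the HMAC-SHA256 keystream is only a stdlib stand-in for a real authenticated cipher such as AES-GCM. It also assumes the key store is kept out of the long-lived backups, otherwise discarding the live key wouldn't achieve anything. No claim that Facebook does it this way.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA256 in counter mode. Illustration only;
    # a real system would use a vetted AEAD cipher instead.
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh nonce for every record
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


class UserVault:
    """Hypothetical store: one key per user, ciphertext-only backups."""

    def __init__(self):
        self.keys = {}     # live key store, assumed NOT part of backups
        self.records = {}  # ciphertext, this is what gets backed up

    def put(self, user_id: str, data: bytes) -> None:
        key = self.keys.setdefault(user_id, secrets.token_bytes(32))
        self.records[user_id] = encrypt(key, data)

    def backup(self) -> dict:
        # A backup is a copy of the ciphertext only, never the keys.
        return dict(self.records)

    def forget(self, user_id: str) -> None:
        # "Deleting" the account: drop the key and the live record.
        # Old backups are left untouched.
        self.keys.pop(user_id, None)
        self.records.pop(user_id, None)

    def read_from_backup(self, backup: dict, user_id: str):
        key = self.keys.get(user_id)
        if key is None:
            return None  # key is gone, the backup copy is unreadable
        return decrypt(key, backup[user_id])


vault = UserVault()
vault.put("alice", b"private posts")
vault.put("bob", b"bob's photos")
snapshot = vault.backup()

vault.forget("alice")
print(vault.read_from_backup(snapshot, "alice"))
print(vault.read_from_backup(snapshot, "bob"))
```

The appeal at scale is that deleting an account touches one small key record instead of rewriting every multi-petabyte dump. The catch is that the key store then needs its own backup policy with a short retention window, since any surviving copy of a key brings the "deleted" data back.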



