
Instagram kept deleted photos and messages on its servers for more than a year - jatins
https://www.theverge.com/2020/8/14/21368602/instagram-kept-deleted-photos-messages-on-servers-year-bug-fixed
======
holidayacct
Whenever someone publishes an article like this, I want them to find out what
it's like when you reach the threshold of 100,000 individual delete requests
per second, which ends up being five million actual deletes when you factor in
all of the associated references to the item being deleted and its metadata.
Then I want them to find out what happens when you have to propagate those
deletes across geographically distributed data centers, clear a geographically
distributed cache and do it in a way that minimizes user-facing errors and
guarantees consistency. Finally, you also have to ensure that deletions don't
affect any business-facing applications, since ad revenue and metrics are all
generated from certain types of data.

These people imagine you just rm -rf a file and run a few SQL queries. You
literally can't just delete data. I've worked at the scale of Instagram
before. Literally nothing works like that at that scale. The person who wrote
that article should have called up someone from Instagram and had them explain
at a really high level how complicated deleting data is at large scale.

~~~
ctvo
Hi. I also work at this scale. It doesn't take a year to delete the data. Be
upfront with your users:

'Your delete request is being processed. No one will be able to see this item
though it may take up to three days for the data to be completely removed from
our servers'

This is not a technical challenge, it's an issue of priorities and Facebook
doesn't prioritize privacy. They were able to release a TikTok clone in a few
months, but can't solve deleting data?

~~~
parliament32
>It doesn't take a year to delete the data

It does if you have cold backups that take a year to cycle out. They're often
offsite, compressed, and incrementally hashed so finding individual items and
removing them is really hard -- you're much better off just waiting for them
to expire, and a year isn't an unreasonable amount of time.

~~~
slg
The article specifically states this data was downloaded using the "Download
Your Information tool on Instagram". The undeleted data wasn't from a cold
backup or any backup, it was still on production systems.

~~~
parliament32
If this is a GDPR "right to access" request, which most "download your
information" tools are, then backups are included. See
[https://law.stackexchange.com/questions/27625/gdpr-
complianc...](https://law.stackexchange.com/questions/27625/gdpr-compliance-
does-it-extend-to-database-backups-and-archived-records)

~~~
slg
Doesn't that link show that how backups relate to GDPR is still up for
debate? The accepted answer that says they are included has 3 upvotes and
there is an answer that says no they aren't included that has 2 upvotes.
Either way, the legal requirement is a separate issue from what Instagram is
actually doing. If that download tool is automated, I am rather confident in
saying that it isn't combing through year old backups to get data.

~~~
parliament32
Ever wonder why "download my data" usually takes a few hours/days on most
services before you receive the download link in your email? I promise you
compiling the production data doesn't take that long.

Backups are covered under GDPR, although when a user requests erasure you can
say "your data will be rotated out in X months/years". Not sure how this
applies to access, but I assume it's similar: [https://ico.org.uk/for-
organisations/guide-to-data-protectio...](https://ico.org.uk/for-
organisations/guide-to-data-protection/guide-to-the-general-data-protection-
regulation-gdpr/individual-rights/right-to-erasure/#ib5)

------
tgsovlerkhgsel
> "The researcher reported an issue where someone’s deleted Instagram images
> and messages would be included in a copy of their information (...) We’ve
> fixed the issue"

This makes it sound like they consider "you could see it" the issue, not "we
were still keeping it". In other words, the fix was to hide it, not to delete
it.

If I were the Irish DPA (and actually wanted to do my job and had the
resources to, instead of being intentionally lazy/crippled to attract tech
firm headquarters), I'd definitely be asking for retention plans and evidence
that the data is now being removed in a timely manner, and start issuing fines
(small ones for past transgressions, big ones if they keep doing it or don't
have a decent plan how to make sure to get rid of data they shouldn't be
having).

For comparison: Deutsche Wohnen (large real estate) got slapped [1] with a
14.5 million EUR fine for over-retaining sensitive tenant data and not having
an automated system to delete it.

[1] [https://www.dataprotectionreport.com/2019/11/first-multi-
mil...](https://www.dataprotectionreport.com/2019/11/first-multi-million-gdpr-
fine-in-germany-e14-5-million-for-not-having-a-proper-data-retention-schedule-
in-place/)

~~~
davidhyde
Why small fines initially? I’d like to see privacy fines being used to make
lots of money like traffic fines are used today. It’s a way to tax tech
companies in your jurisdiction with the added benefit of improving privacy.

~~~
jacquesm
Because the goal is compliance, not to put companies out of business. When the
laws were first enacted everybody was screaming that it was just to put
companies out of business. Now they are wondering why the small initial fines.

It's simple: change your ways and use the initial fines as a wake up call. If
you then do not wake up and persist the fines will get heavier and heavier
until you _will_ pay attention.

A Dutch hospital managed to get to the third round of fines and they weren't
all that happy afterwards. 460K Euro fine for a _single_ instance of ignoring
the regulators on a _single_ individual.

Believe me when I tell you they have understood now.

The initial fine was zero, just a warning to improve.

The case revolved around a very minor Dutch celebrity whose data was reviewed
by hospital employees that should not have had that access.

~~~
luckylion
> Because the goal is compliance, not to put companies out of business.

There's middle ground between "we take 100% of your revenue" and "we take
0.001% of your revenue". Given that we're not this lenient with private
citizens and small companies, why should we be with international
corporations?

~~~
foepys
It's not 100%. The fine can get up to 4% global revenue or 20 million Euro,
whichever is higher.
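
As a rough sketch of that cap, using only the two figures quoted above (4% of
global revenue or 20 million Euro, whichever is higher). The function name and
the revenue numbers are made up for illustration, and this is the upper bound,
not what a regulator would actually impose:

```python
# Hypothetical helper: upper bound on a fine under the "4% or 20 million Euro,
# whichever is higher" rule quoted in the comment above.
FLAT_CAP_EUR = 20_000_000
REVENUE_SHARE_PERCENT = 4


def max_fine_eur(global_annual_revenue_eur: int) -> int:
    """Return the maximum possible fine in whole Euro for a given revenue."""
    # Integer arithmetic avoids floating-point rounding on large amounts.
    revenue_based_cap = global_annual_revenue_eur * REVENUE_SHARE_PERCENT // 100
    return max(revenue_based_cap, FLAT_CAP_EUR)


if __name__ == "__main__":
    # Illustrative revenue figures only, not any real company's accounts.
    for revenue in (100_000_000, 500_000_000, 70_000_000_000):
        print(f"revenue {revenue:>14,} EUR -> cap {max_fine_eur(revenue):>13,} EUR")
```

Below 500 million Euro of revenue the flat 20 million cap dominates; above it
the 4% share takes over, which is why the ceiling scales with company size.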

~~~
jacquesm
There is still some unclarity as to whether or not multiple fines can be
issued for different transgressions, there hasn't been such a case yet and
nobody has gotten close to the limit so for now this is still grey. But I
think that once fined at that level no sane CEO is going to risk getting a
second such fine in the same year or even at all.

------
jacquesm
This sort of practice is not limited to just Instagram. Plenty of places that
do soft deletes when they should be doing hard deletes. Data life-cycles are
about the poorest understood subject in startup land. Ingestion is usually top
notch, friction free and heavily automated. Deletion - assuming it even exists
- is semi automatic or even manual, full of friction and usually incomplete or
broken.

You see a similar pattern with respect to signups vs account cancellation.

The weird thing to me is that it is usually the _marketing_ department and not
the legal or the compliance department that has the upper hand in these data
retention discussions. Fortunately thanks to the GDPR this is now changing and
slowly companies are coming around on this.

~~~
kevincox
I think just about everything should be a soft delete, however you need a time
limit where you sweep those. Ideally you would even give the user an option to
accelerate that (as much as technically possible) if they really want
something gone.
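
A minimal in-memory sketch of that idea: deletes only set a timestamp, a
periodic sweep hard-deletes anything flagged longer than a grace period, and
the user can ask to skip the wait. The class names and the 30-day window are
assumptions for illustration, not how Instagram or anyone else actually does
it:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

RETENTION = timedelta(days=30)  # assumed grace period, purely illustrative


@dataclass
class Item:
    payload: bytes
    deleted_at: Optional[datetime] = None  # None means the item is still live


class Store:
    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}

    def put(self, item_id: str, payload: bytes) -> None:
        self.items[item_id] = Item(payload)

    def soft_delete(self, item_id: str, now: datetime, immediate: bool = False) -> None:
        """Flag an item as deleted; it is hidden at once but kept until swept."""
        item = self.items[item_id]
        # An "immediate" request back-dates the flag so the next sweep removes it.
        item.deleted_at = now - RETENTION if immediate else now

    def visible(self, item_id: str) -> bool:
        item = self.items.get(item_id)
        return item is not None and item.deleted_at is None

    def sweep(self, now: datetime) -> int:
        """Hard-delete every item flagged for at least RETENTION; return the count."""
        expired = [
            item_id
            for item_id, item in self.items.items()
            if item.deleted_at is not None and now - item.deleted_at >= RETENTION
        ]
        for item_id in expired:
            del self.items[item_id]
        return len(expired)
```

The part that tends to go missing in practice is the sweep: without something
calling `sweep` on a schedule, the flag is all that ever happens.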

~~~
totalZero
"If we tell people that we're going to delete their data then we need to do
that." -- Chairman Zuckerberg

I understand the user utility of a brief soft-delete period, but that hard-
delete sweep should be performed on a fairly tight delay.

------
FridgeSeal
> When you delete something from Instagram you expect it to be gone for good

Ohh do I have some bad news for you.

Seemingly everyone in this industry can’t or won’t design databases and
applications to actually allow for data to be deleted.

~~~
ben509
Deletion is effectively a failure mode, and engineering tends to focus on the
happy path.

~~~
FridgeSeal
Only if you treat it like one.

Arguably a well designed system should be built to handle having certain user
data deleted without interrupting or breaking anything.

------
tdons
We know that HN is visited by a fair share of Facebook employees.

Can some of you weigh in (anonymously?) on this topic? Do you guys do hard
deletes of user data instead of just soft deletes? If so, are logs or backups
kept? For how long?

In other words: if I'm a user of $POPULAR_SERVICE and I delete my account at
time t0, is there a t1 > t0 after which every trace of my data is gone from
the platform?

My (cynical) guess is no, but I hope I'm wrong :)

~~~
lend000
DELETED=TRUE

~~~
robjan
This is how most services work at scale. It's much cheaper to set a flag than
actually delete an entry in a database. The data can then be scrubbed by some
periodic maintenance process.

~~~
jacquesm
Only that doesn't typically happen. It just sits there, for _years_ or until
the company goes bust.

The typical reasoning is that marketing wants to hold on to the data, they
will never ever say 'ok, enough, you may delete it' because there is this
infinitely small chance that they can re-activate an account, market to it for
some other product (no matter that that is against the GDPR) or to sell the
data to some third party if there ever is a cash crunch or panic. They see
data as having positive value no matter what, whereas data that you shouldn't
be holding on to is actually a liability.

------
mtnGoat
Based on the conversation here, it would seem like deleting data is an
unsolved problem in CS, or some of you just make it overly complex. Every
system I have built had deletes, because it is the right thing to do... No
team I've been on got away with not doing the right thing because it was hard.
We put our heads down and got it done. Smh.

Some of the engineers at these companies forgot their morals as soon as
something got hard I guess.

------
ffpip
Oh they fully delete those? I didn't actually expect messages to be erased
when we un-send it.

~~~
shahsyed
I genuinely laughed at this - I thought the same thing!

If you ever downloaded your own account data, I think OP and other concerned
posters would understand how much data companies retain and this wouldn't come
off as a surprise.

Snapchat data for example, has chat logs, snap history, what accounts you've
added/requested as a friend, and friends that have added you all retained. I
bring this up as an example because the idea behind this application was to
send a message that would disappear after a variable amount of time :)

~~~
ffpip
I did download it. Before deleting my account. Just put it in a hard drive
since it had all messages, photos.

Time to dig into it.

------
searchableguy
> Pokharel discovered the bug in October last year and says it was fixed
> earlier this month (August 2020)

Shouldn't there be fines for this?

Took almost a year _after_ being reported.

~~~
rco8786
A fine for what? Not fixing a minor bug in a timely fashion? That’s not a road
we want to travel down.

~~~
jacquesm
This is not a 'minor bug'. This is a major oversight and possibly a regulatory
issue. The user owns the data. The user has rights.

~~~
ffpip
Instagram doesn't use the word 'delete'. It uses 'unsent' for messages.

So it's removed from the other person's chat, but still associated with your
account.

I have my account data from last year. Time to dig into it and sue instagram.

Oh wait. I'm not in the EU. I'm in India, where they can even sell my info. So
they definitely didn't delete it

~~~
jacquesm
Pulling linguistic tricks typically does not amuse regulators.

~~~
rco8786
I'm curious what regulatory issue you think is in play here?

~~~
pirocks
GDPR has a right to deletion within 30 days. Companies have been fined for
failing to do so.

------
yalogin
I assume all companies except maybe Apple and Google never delete the data
even if they provide the delete option. For example, I still keep getting
emails from mint saying that my credit score has changed even though I deleted
all my info from my account.

------
rco8786
Some real (understandable) ignorance about how the tech industry works in this
article. There should be _no_ expectation that Instagram ever deletes anything
permanently from their servers.

The bug was showing users photos that were internally marked as deleted. Not
that the photos were not in fact removed from Instagram's servers.

~~~
victords
> There should be no expectation that Instagram ever deletes anything
> permanently from their servers.

No, there should be this expectation. If I have a photo of myself that I'm not
comfortable being saved somewhere, there should be an expectation that when I
delete this from a service, that service will actually delete it.

The expectation should be that the service will do what it said, not that it
hid everything very well.

~~~
dboreham
Unfortunately this isn't how computer storage works. This is why the military
have acid baths for their disk drives.

~~~
have_faith
In this particular case they are actively not attempting to delete the image
at all, a big difference.

------
anupamchugh
And then run facial recognition on them? Nice way to capture our biometric
data on the cloud :D

------
Dirlewanger
What is not revealed is that the actual bug was that an end user found the
deleted data, not that the data wasn't deleted. Should be no surprise that
Instagram would have similar data retention policies as its parent company.

------
thefounder
Maybe you shouldn't share that info in the first place? Even if the provider
(Instagram in this case) acts in good faith and deletes the data, 3rd parties
would still own it (e.g. US security agencies, various crawlers etc.)

------
AnonHP
And here I am dreaming of a world where companies do real deletions of data
when a user requests (and also deleting older transactional data that has
outlived its utility, including regulatory requirements) and storage prices
being lower with a slightly smaller (steady) market for it from the major
companies.

On the other hand, storage prices seem to be low enough for all these
companies with bulk, long term contracts that developers wouldn’t bother doing
real deletes of data.

~~~
toast0
> storage prices being lower with a slightly smaller (steady) market for it
> from the major companies.

I suspect that big companies in the market help justify manufacturer R&D into
larger and faster drives and help justify production capacity. I think we'd have
smaller, slower, and slightly more expensive drives without their demand. But,
speculation only.

------
j45
I'm surprised Social Media networks don't convey when your data is actually
'deleted'. This approach seems a little more evident in the "archive" status
on Instagram.

A flag to mark data for deletion makes sense at scale. But given the number of
other automated processes that run more often than a few times a year, the
user should be in control of their information and intent.

------
Hitton
A few months ago I requested my data from Discord. Interestingly enough, it
didn't include my messages from a server that was deleted some time before that.

~~~
scrollaway
Internal sources at Discord confirmed to me (a few months ago, so may be
outdated) that deleted messages are fully gone from the servers within hours
of deletion.

Deleted files hang out longer but are gone within a month.

------
imgabe
I don't know how many times this is going to have to happen before people
understand this: when you put something online, assume it is essentially
public. Forever. If you don't want it to be public forever, don't put it
online.

~~~
gilrain
This is victim blaming and only reinforces the status quo. The way things are
isn't the way things have to be. We can change the rules if we work together.
Or we can give up and blame the victims.

~~~
jccooper
You can have Instagram delete that photo when you change your mind, but how
about the friend who saw it and saved it, or the already-illegal bot that
crawled the site and archived it?

"Changing the rules" can reduce data availability, probably well enough for
most purposes, and that's good. But it's simply strictly true that once you
publish something, you cannot assure it's unpublished. And everyone should
know this and act that way.

------
jermier
I want to know if the photos are securely deleted. It's not enough that the
mere reference to a file is gone. I want everything overwritten with zeroes,
and the photo made properly irrecoverable.

~~~
smcameron
Always encrypt the data at rest, and delete by deleting the key is likely how
this would be done. This way you can also delete e.g. tape backups without
actually loading the tape and re-writing the whole thing with certain portions
deleted, which is not really practical.

~~~
jermier
Yes, and you could also _queue_ files for deletion at a later stage by
throwing away the encryption key for a large batch of files which have been
queued for deletion.
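
A toy sketch of that delete-by-deleting-the-key approach, with one key per
batch of files. Everything here is an assumption for illustration: the class
and method names are invented, and the SHA-256 keystream is only a stdlib
stand-in so the example runs anywhere. A real system would use a vetted
authenticated cipher and a proper key management service:

```python
import hashlib
import secrets
from typing import Dict, Optional, Tuple


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT for production): XOR with SHA-256 counter blocks."""
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


class ShreddableStore:
    def __init__(self) -> None:
        # Small, mutable key store: the only thing that must be truly erasable.
        self.keys: Dict[str, bytes] = {}
        # file_id -> (key_id, nonce, ciphertext); stands in for disks and tapes.
        self.blobs: Dict[str, Tuple[str, bytes, bytes]] = {}

    def put(self, file_id: str, key_id: str, plaintext: bytes) -> None:
        if key_id not in self.keys:
            self.keys[key_id] = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)  # fresh per file so keystreams differ
        ciphertext = _xor_keystream(self.keys[key_id], nonce, plaintext)
        self.blobs[file_id] = (key_id, nonce, ciphertext)

    def get(self, file_id: str) -> Optional[bytes]:
        key_id, nonce, ciphertext = self.blobs[file_id]
        key = self.keys.get(key_id)
        if key is None:
            return None  # key was shredded; the ciphertext is unreadable
        return _xor_keystream(key, nonce, ciphertext)

    def shred(self, key_id: str) -> None:
        """Drop one key, making every file encrypted under it unrecoverable."""
        self.keys.pop(key_id, None)
```

Note that `shred` never touches `blobs`: the encrypted copies, including any
on tape, can stay where they are until they rotate out, which is the point of
the technique.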

------
aspenmayer
Ok. I need help recovering photos on Instagram

------
alkonaut
Data that I wanted to delete isn't necessary for the functioning of the
service, so doesn't that mean the GDPR requires it to be hard-deleted within a
reasonable time?

------
ReptileMan
And if they delete it faster another vocal faction will start to whine that
they enable racists, sexists etc to abuse.

Tomorrow at Slate: "How Instagram permanent delete perpetuates white supremacy"

There is no policy that could satisfy all.

