I got used to the looks of disbelief, to people thinking I was some sort of hermit, some antisocial recluse.
I also got tired of answering the frequent "Why don't you have Facebook?" questions.
I remember the last time I had this conversation with someone, last year (2017) around August. I had found a new partner, and after long intimate talks on the phone, they requested the usual "intimate pictures", not necessarily sexual but certainly sexy. While I have no taboos regarding my sexuality, I understand how the Internet works, so I have always refused to send that type of image/video/audio, and I have always tried to be patient in explaining my constant refusals. Unfortunately, expecting a non-tech-savvy person to understand how data moves around the Internet is mostly wishful thinking, and even if they do understand, they ultimately don't care, because the result doesn't change: you don't get to share something with them, and that affects the relationship.
I am sure that the deletion of media files in services like Facebook was never meant to be absolute. Many of my colleagues believe the same thing I do: Facebook and other services do not actually delete data; they just mark it as "deleted" and purge it only if they need the space. It's the same way a hard drive works: you don't really delete a picture when you hit the "delete" key, or even when you empty the "trash" folder. The data is still there, where it was; it just loses its links to the metadata.
It is sad how this information becomes news only when bad things happen.
No need for belief. You can read about the storage architecture used for photos in a post from 2009 here: https://code.facebook.com/posts/685565858139515/needle-in-a-.... Obviously that might have changed since, and probably has, but at least at some point this was exactly true.
"The delete operation is simple – it marks the needle in the haystack store as deleted by setting a “deleted” bit in the flags field of the needle. However, the associated index record is not modified in any way so an application could end up referencing a deleted needle. A read operation for such a needle will see the “deleted” flag and fail the operation with an appropriate error. The space of a deleted needle is not reclaimed in any way. The only way to reclaim space from deleted needles is to compact the haystack (see below)."
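The quoted design can be sketched in a few lines. This is purely illustrative Python, not Facebook's actual Haystack code; the class and field names are my own, but the mechanics match the description: delete sets a flag, the index is untouched, reads fail on the flag, and space is only reclaimed by compaction.

```python
DELETED = 0x1  # hypothetical "deleted" bit in the needle's flags field

class Needle:
    def __init__(self, key, data):
        self.key = key
        self.data = data
        self.flags = 0

class Haystack:
    def __init__(self):
        self._store = []   # append-only needle log
        self._index = {}   # key -> position in the log

    def write(self, key, data):
        self._index[key] = len(self._store)
        self._store.append(Needle(key, data))

    def delete(self, key):
        # Only the flag changes; the index entry and the bytes remain.
        self._store[self._index[key]].flags |= DELETED

    def read(self, key):
        needle = self._store[self._index[key]]
        if needle.flags & DELETED:
            # A read of a deleted needle sees the flag and fails.
            raise KeyError(f"{key} is deleted")
        return needle.data

    def compact(self):
        # The only way space is actually reclaimed.
        self._store = [n for n in self._store if not n.flags & DELETED]
        self._index = {n.key: i for i, n in enumerate(self._store)}
```

Note that between `delete` and `compact`, the "deleted" bytes are still sitting in the store, which is exactly the window this thread is about.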
Who doesn't do something like this?
Not to alleviate facebook of blame, but who's to say data on almost every other social media service isn't also just flagged for deletion?
Having said that, you'd be amazed how often folks ask for things to be undeleted (despite a big warning dialog).
Clearly developers pervasively believe soft deletes are occurring everywhere.
If you use asymmetric encryption, you can keep the group of people who can recover "deleted" data small. You could even have an independent party generate your encryption key pair, give you the encryption key, and give your customer, on request, the decryption key (I think there is a business model for a non-profit here).
> generate a new encryption key every day for “data deleted today”,
The question is not whether we can encrypt at rest. We're talking about encryption as a soft-deletion method, which means we need to know everywhere the data is stored at deletion time, whether we then delete it or re-encrypt it with this new "deletion" key.
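The scheme being discussed is often called "crypto-shredding": encrypt each record under its own key, and "delete" by destroying the key, so stray copies in caches and backups become unreadable. A minimal stdlib sketch, where a one-time-pad XOR stands in for a real cipher (the `Store` class and its field names are illustrative, not any product's API):

```python
import secrets

class Store:
    def __init__(self):
        self.ciphertexts = {}   # replicated widely: disks, caches, backups
        self.keys = {}          # small, centrally controlled keyring

    def put(self, record_id, plaintext: bytes):
        # One random key per record; XOR here is a stand-in for AES etc.
        key = secrets.token_bytes(len(plaintext))
        self.keys[record_id] = key
        self.ciphertexts[record_id] = bytes(a ^ b for a, b in zip(plaintext, key))

    def get(self, record_id) -> bytes:
        key = self.keys[record_id]          # raises KeyError once shredded
        ct = self.ciphertexts[record_id]
        return bytes(a ^ b for a, b in zip(ct, key))

    def delete(self, record_id):
        # The ciphertext may linger everywhere, but without the key
        # it is unrecoverable. Only the keyring needs reliable deletion.
        del self.keys[record_id]
```

The appeal is that you only have to reliably delete from one small place (the keyring) instead of tracking down every replica, which answers the "we need to know everywhere the data is stored" objection for reads, though the ciphertext still physically exists.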
Ultimately, it's the trust that is the problem, and that is what needs to be removed, either through new technology or legislation or both.
If they upload a private key and then delete it because they "don't want a third party to have it", do you also guarantee it wasn't seen or cached anywhere else? I don't know the details of that product, but I usually treat anything uploaded even once as compromised from that point on.
1. Deletions from backups
2. Deleting material that was deleted prior to the restoration of the backup?
The word "delete" has a pretty clear definition to most users. Facebook is one of the most used pieces of software in the world. If FB is allowed to lie to its users, it would indeed give a pass to just about every social media service out there.
The reason Facebook is special, and deserves special scrutiny, is because of its power. If FB establishes a bad behavior, it will become the norm.
These are incredibly important questions. A related field would be the credit bureaus, such as Equifax: global companies that store social security numbers and all sorts of other information. We need a national set of rules for these companies to follow.
Not keeping my hopes up, given our Congress is so dysfunctional these days.
So with this type of system nothing is ever "deleted"; there is just an event recording that something was deleted.
This is a common and very scalable system. You don't deal with models, you deal with events (and a model is a snapshot of events).
This isn't even specific to event sourcing. Companies with traditional model architectures have backups too. You ask for something to be deleted and they might actually delete it, but what about last week's backup? It's not deleted there.
It's very much against the rules in event sourced systems to change history. But maybe that just doesn't matter. If it means you can never meet a user expectation about privacy, I guess you could tell the user that everything persists indefinitely... or when something is deleted, go back to the upload event and remove it, rebuilding history with any event related to that uploaded item ignored. Putting the user above the "purity" of the software and creating potential problems elsewhere.
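The "go back and rebuild history" option can be sketched concretely. This is a toy event-sourced store, with made-up event names, showing the difference between the usual soft delete (a `deleted` event that hides the item from the snapshot but leaves its bytes in the log) and a privacy-respecting purge that rewrites the log:

```python
def project(events):
    """Rebuild current state (a snapshot) by folding over the event log."""
    state = {}
    for ev in events:
        if ev["type"] == "uploaded":
            state[ev["item"]] = ev["data"]
        elif ev["type"] == "deleted":
            # Soft delete: the item leaves the snapshot,
            # but its upload event (and data) stays in the log.
            state.pop(ev["item"], None)
    return state

def purge(events, item):
    """Hard delete: rebuild history with every event for `item` removed."""
    return [ev for ev in events if ev.get("item") != item]

log = [
    {"type": "uploaded", "item": "photo1", "data": b"..."},
    {"type": "uploaded", "item": "photo2", "data": b"..."},
    {"type": "deleted", "item": "photo1"},
]
```

After `project(log)`, photo1 is gone from the snapshot yet its data is still in the log; only `purge(log, "photo1")` actually removes it, at the cost of violating the append-only "purity" described above.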
Even on backups in long term storage, there could be some process of creating new copies of the backups with any needed modifications on some kind of schedule, so deletions can propagate over time.
Ultimately the challenges here are financial. We could delete things thoroughly if we were willing to pay for the developer time and other resources needed to make it work.
Do you mean the delete button is a lie? Why would it be a lie? Can you or someone else access the deleted video from Facebook.com? Or in another public way? Isn't it deleted from this point of view?
I am not defending Facebook in any way. I just don't understand why everybody is surprised about these things. Do you think that if you click delete on a YouTube video, it is physically deleted from all of their servers?
Snapchat just changed their messaging to quell user concerns. Once they have a critical mass of users, they'll be immune to the fallout of disclosing that Snapchat messages are not truly ephemeral.
>Snapchat servers are designed to automatically delete all Snaps after they’ve been viewed by all recipients
>Snapchat servers are designed to automatically delete all unopened Snaps after 30 days
I might just "help" them by uploading more data I guess
- disk space is cheap
- deletes are expensive (time) and slow
- deletes are harder to scale
- can't revert a real delete
- deletes don't fit into an event-sourcing architecture
- append only data is better, more durable
I could go on.
Consider companies that take daily backups. Say a user asks to delete something; do they now comb through their backups (which might even be offsite or in cold storage) and delete it there too? It's essentially the same thing.
So you're saying that every company with a backup system that doesn't regularly go through its backups and remove individual files at users' request is showing a lack of respect? So companies with offsite backups should, according to you, have policies in place for removing user data from those backups as well?
For example: a soft delete may be just a stronger version of public vs private settings. The whole software infrastructure still assumes a link exists and doesn’t need to cover cases where it really isn’t there. I could see how that makes maintaining indexes etc easier.
Flipping a flag and then filtering out results down the line based on the delete setting is probably much easier than actively removing them from an index.
And if deleting is rare (it probably is), then the performance and resource impact should be minimal.
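The "flip a flag, filter at read time" approach described above can be shown in a few lines. This is a toy inverted index with invented names, but it illustrates why soft delete is so tempting: the delete is O(1) and no index structure is ever touched.

```python
# term -> list of photo ids; left completely untouched on delete
index = {"beach": ["photo1", "photo2"], "party": ["photo2"]}
deleted = set()

def delete(photo_id):
    # O(1): just flip the flag. No index entries are rewritten.
    deleted.add(photo_id)

def search(term):
    # Every read path must remember to filter deleted items out.
    return [pid for pid in index.get(term, []) if pid not in deleted]
```

The cost is exactly the one this thread is about: the "deleted" ids (and whatever they point to) remain in the index forever unless something actively compacts it.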
Hehe, you mean, like... Facebook? They respect advertisers with money, not users.
At this time, what's the loss in being banned?
Even now, as facebook is burning, statements of how one has quit or will be quitting facebook get swept into the pile of incendiary indignation, with encouragement from all sides.
But having never used facebook, even at significant personal effort as you indicate, one is merely reclassified from "elitist" before to "smug" now.
One day in the future a recruiter will ask why there's nothing about you on the Internet, and you will proudly be able to say: "Because I know the Internet and its dynamics that well" and they will hire you, in awe of your analytical foresight.
That's the dream anyway, because you're more likely to be reported for being suspicious. After facebook there will be another facebook, and another, and people will flock to them just the same, and you get to experience being an antisocial hermit all over again.
Now I made myself sad. "Social Media: even more depressing when you're not on them!"
Except not having your racy pictures in Facebook's media archive.
So I'm wondering if these services actually have some sort of archiving requirement for law-enforcement purposes? Maybe they have to save your data for a certain number of years, or something like that?
If there's anyone here who is familiar with the legal obligations of these services vis-a-vis data archiving, I'd be really interested in hearing more about what we should reasonably expect from them in terms of deletion.
Apart from a handful of specific cases like financial data, the US has no general data-retention laws. You can delete stuff aggressively as long as it's based on a consistent archival policy, not one-off deletions where you risk looking like you chose a particular thing to delete to hide evidence.
You can tell this is possible in practice by looking at how common it is to have aggressive permanent-deletion policies in corporate email, at least outside of tech. A number of big US companies automatically delete read emails in employees' inboxes after N days (with N ranging from 7 (!) to 365), unless the employee specifically takes action to refile the email into a project folder with a different per-project retention policy. The goal of those policies is to reduce companies' exposure to fishing expeditions in future lawsuits by just keeping less email around. To make that effective, the policies really do delete the emails, including from any backup systems.
Given that they have figured out how to perma-delete their own old email, I believe companies could really delete user-deleted content, perhaps after some specified period of time, if they wanted to. But unlike with their own internal emails, they don't have the same incentives to be aggressive about purging that stuff from their servers. If anything, they have the opposite incentive, to keep as much user data around indefinitely as possible.
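The N-day retention sweep described above is a simple policy to express. A sketch, with illustrative field names and a 30-day default (real systems would also apply it to backups, as noted):

```python
from datetime import datetime, timedelta

def retention_sweep(messages, now, n_days=30):
    """Return the messages that survive the sweep: unread, recently
    received, or refiled into a project folder with its own policy."""
    cutoff = now - timedelta(days=n_days)
    return [m for m in messages
            if not m["read"]
            or m["received"] > cutoff
            or m.get("folder") is not None]
```

Everything read, older than N days, and not refiled is purged, which is what makes "we simply don't have that email" a defensible answer in discovery.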
My understanding is that an image is by itself PII, regardless of whether or not it has any additional information associated with it. I don’t think there’s a way to retain images without contravening GDPR.
When looking at a single datum by itself, this seems to rule out anything except PII i.e. data that identifies or can be used to identify an individual.
What that says is that, if (A,B,C) identifies a person, each of A, B, and C, in isolation, is personal data, not that you will be allowed to keep the pair (A,B) if it doesn’t.
One can mathematically cut each bit of information into units of arbitrarily small entropy. So, if taken to the letter, "this user is not Mark Zuckerberg" would be personal data. I doubt jurisprudence will go that far, but we'll see.
Whether information that can only be used to identify someone but doesn't tell you anything useful about them is still personal data is unclear to me.
Not in this case, because if the photos or videos contain recognisable people then they are themselves personal data.
How far the new data-subject rights to deletion will go in practice is one of the biggest unknowns with the GDPR. Clearly, from a technical point of view, deleting a key isn't the same as deleting data from a disk, and often the same is true of deleting a file in a filesystem if the underlying storage isn't robustly wiped as well. Throw in the kinds of distributed architectures, redundancies and backup systems that many organisations use, particularly in the era of cloud-based hosting and off-site backup services, and you have an unfortunate conflict. On one side, not truly deleting data leaves some risk that it will leak even if it's intended to be beyond use, contrary to the spirit and possibly the letter of the new regulations. On the other, ensuring robust deletion of all copies of personal data when a suitable request is received may carry high or even prohibitive implementation costs.
The hard part is actually enforcing it, and assessing compliance.
You may be correct, but that doesn't explain why Facebook decided to include so-called deleted files in a download of user data. Clearly these deleted files are still a part of Facebook user profiles and accessible to company data mining software. Facebook has exposed their own duplicity.
But I wouldn't discount your hypothesis.
Such an ethic creates a moral reasoning to not comply with an individual's wishes in the immediate deletion of data.
(FWIW I'm not defending this position nor suggesting it's the case here, just you said there's no moral reason that can support it, which seems wrong; different ethical systems can provide different reasoned moral outcomes.)
Or possibly they just screwed up. Perhaps the "soft delete" was originally intended to allow "undelete" by the user with delayed purge, and/or single-instance storage with reference counting that they never quite got around to finishing.
This happened because the person tasked with writing the code to build the archive forgot to include the filter for "deleted" records somewhere in the code.
I.e., they forgot the "where is_deleted = false" part in one or more DB queries like the one below:
select * from table where is_deleted = false;
This is the biggest problem with the "soft delete flag in database" method of deletion. Every single query writer, everywhere, forever, must always remember to include the "is_deleted" filter in their queries. And when they don't, what was deleted reappears as if it had never been deleted at all.
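One common mitigation is to stop relying on every query writer's memory and bake the filter in once, e.g. behind a database view (or an ORM "default scope"). A minimal sqlite3 sketch with illustrative table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (id INTEGER PRIMARY KEY, name TEXT,
                         is_deleted INTEGER NOT NULL DEFAULT 0);
    -- Application code queries the view, never the raw table,
    -- so the is_deleted filter cannot be forgotten.
    CREATE VIEW live_photos AS
        SELECT id, name FROM photos WHERE is_deleted = 0;
""")
conn.execute("INSERT INTO photos (name) VALUES ('beach.jpg'), ('party.jpg')")
conn.execute("UPDATE photos SET is_deleted = 1 WHERE name = 'party.jpg'")

rows = conn.execute("SELECT name FROM live_photos").fetchall()
# rows now contains only the non-deleted photos
```

This doesn't make soft delete any more of a real delete, but it does close the "someone forgot the WHERE clause" failure mode that apparently produced the archive bug.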
Agreement is better than disagreement. Would I prefer we had agreement earlier? Yes. Is agreement today better than agreement tomorrow? Absolutely.
Now that we have a constituency, the important thing is to mobilise. The past is in the past. Our job, in the present, is to protect the future.
Antivirus vendors and shitty vpn services will be all over that.
Who even supports FB?
The media? No, they hate them because they took all their ad revenue.
Republicans? No, they hate them for the censorship controversy a year or so ago, and because it is full of ultra-left-wing Silicon Valley types.
Democrats? No, they hate them for the whole Cambridge analytica, Russian data thing.
Facebook has made a lot of enemies, and there really isn't any sort of constituency that SUPPORTS them.
So even if a business currently gets a lot of revenue from Facebook, as long as it thinks its competitors are more dependent on Facebook than it is, it should be fine with Facebook declining.
2. The fact that they are the only source of information that a good percentage of the world uses.
Money can be used to buy power, and they already have a not-insignificant level of control over the flow of information (which is power itself).
It is more effective to organise for a cause than against a politician. Presidents are intentionally difficult to remove. The bar for promoting action against Facebook is lower than for prompting action against the President.
What bad things? I feel that's the part missing from the argument. People have yet to see or hear what the negative consequences are of all that data being kept, or even leaked or resold.
The only one they've started to know about is the potential impact on elections, which is pretty hypothetical and weak to most people I feel. Or maybe identity theft, but that's more related to the Equifax leak.
I think it's important to reason about the real consequences of our data no longer being private. Is it really dangerous? What's the worst that could happen? What are the chances of it happening, etc.?
Isn't it still trivial to self-host stuff?
Just send a link to the picture (or document, or whatever confidential information you want to share) as a password-protected resource on your own server (or even a laptop or desktop machine, if you have a globally routable IP address there). Facebook automation is not smart enough to grab the password from the very same conversation, and even if it could, I'm sure they wouldn't do it, knowing you'd catch them in your access logs and press charges for unauthorized access.
I doubt many would object and insist on sending via a very specific medium (i.e. strictly require pics in a FB Messenger). Some, of course, may find this inconvenient.
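For what it's worth, the password-protected resource can be done with nothing but the standard library. A minimal HTTP Basic Auth handler (a sketch only: the username, password, and file contents are placeholders, and in practice you would put this behind HTTPS, e.g. via a reverse proxy):

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer

USERNAME, PASSWORD = "me", "s3cret"          # placeholders
FILES = {"/pic.jpg": b"fake jpeg bytes"}     # path -> content to share

class AuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        expected = "Basic " + base64.b64encode(
            f"{USERNAME}:{PASSWORD}".encode()).decode()
        if self.headers.get("Authorization") != expected:
            # No or wrong credentials: challenge the client.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="private"')
            self.end_headers()
            return
        body = FILES.get(self.path)
        if body is None:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet
```

Run it with `HTTPServer(("0.0.0.0", 8443), AuthHandler).serve_forever()` and send the recipient the URL plus the password over a separate channel.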
I really do wish self-hosting were more trivial, it would be a better world.
Then they often charge 5x or more their normal price to let you host things, with lots of exceptions. For example, all providers put in the contract that they can immediately cancel your subscription if they detect you hosting anything IRC-related, no matter whether it is an IRC server, an IRC bot, or a server component for an open-source IRC client...
Once someone understands public and private keys, and webs of trust, there really isn't much left to learn. For someone who understands keypairs, the limitations of Facebook/Twitter/etc., of DRM, and so on are obvious.
It seems most of us are afraid our non-tech-savvy friends and family won't be able to wrap their heads around security, but not understanding it has gotten us into a pretty bad situation. We should really stress the importance of learning about it.
Nobody in the general public wants this.
Especially if their tech-savvy friends are confident they can learn about it - because it really isn't that complex - and if they understand that keypairs and trust are the basis for literally all digital security.
Have you ever actually successfully done this? More than once? And they continue to use it?
It's a usability nightmare. https://moxie.org/blog/gpg-and-me/
But keypairs. Everyone should understand keypairs. They are the basis for all of digital security, and they are really not that difficult.
I know that look.
>I also got tired of answering the frequent "Why don't you have Facebook?" questions.
I solved it by stating flatly, "For the same reasons I don't have Twitter.", somehow making the period final. People still believe I'm some kind of weirdo, but they stop asking...
Explain how data can be made unreadable while it moves. Teach them to use secure communication options. You don't need to be an electrical engineer to use a TV remote control.
And heck, there are people who can't use TV remote controls.
The only thing that I'd consider "easy" is encrypted chat (signal). The "issue" there is market fragmentation (arguably a good thing).
With a few exceptions, anonymity online is ephemeral at best, subject to the motivation of the person/org trying to deanonymize you.
How did you arrive at that conclusion? I assume Twitter retains everything as well (even "deleted" tweets) and it's all associated with an email address. Or did you mean it in the sense that far fewer people have a Twitter account?
These are tar files that contain bz2-compressed, newline-separated Twitter events as JSON. These include deletion events as well, so you can, for instance, easily estimate the interval an auto-deleter is set to.
Yes, they're huge archives, but you could still probably process a year of these for particular targets for under $10 on EC2.
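Scanning such an archive for a target's deletions is a short script. A sketch, assuming the layout described above (a tar of `.bz2` members, each holding newline-delimited JSON) and the streaming-API shape of delete events, where a `delete` object carries the deleted status id and user id:

```python
import bz2
import json
import tarfile

def deletion_events(tar_path):
    """Yield (user_id, status_id) for every delete event in the archive."""
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.name.endswith(".bz2"):
                continue
            raw = bz2.decompress(tar.extractfile(member).read())
            for line in raw.splitlines():
                if not line.strip():
                    continue
                event = json.loads(line)
                if "delete" in event:
                    status = event["delete"]["status"]
                    yield status["user_id"], status["id"]
```

Filter the yielded pairs by the target's user id and diff the timestamps of their normal tweets against the deletions to estimate the auto-deleter's interval.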
Whilst I'm impressed with archive team's efforts, I would be surprised if there aren't some commercial twitter stream consumers that absolutely dwarf this.
Treat everything you put on twitter as public forever and you won't go too far wrong.
Because of the Twitter stream APIs, it's not. But there does seem to be a strange presumption among users that deleted tweets are gone from public view and cannot resurface. People use tweets in all manner of ways they really weren't designed for, some of which involve deleting them after a few minutes.
Many a public figure uses these tweet deletion apps. Some do it for more honest reasons (status count limits -- do they still exist?), others do it to limit their exposure.
In the UK at least, there have been libel cases where either the claimant or the defendant relied on Twitter, and in at least one of these the court acknowledged that the claimant had an unfair advantage from having forgotten about a tweet-deletion app attached to their account. The case proceeded and the claimant won despite the acknowledged advantage. To some, this may read as a clear message that, in the eyes of the judiciary, it's okay to delete tweets (evidence) as long as it was done through an auto-deletion app whose existence the individual concerned forgot about.
I would not be at all surprised if some lawyers to the rich quietly suggest installing a tweet-deletion app as general advice upon instruction.
Twitter probably has less data on you, but I doubt it couldn't be linked directly to you by a TLA, say.
Why the fuck are these a thing? Couples don't meet in real life much anymore? And how "usual" are they?
Then we - the people that do have the necessary technical knowledge - have a duty to teach them what they need to know. This isn't necessarily "how data moves on the internet". Yes, this can be difficult and tedious, but understanding the risk profile for data/networks is increasingly important as networks become involved in everything.
> they ultimately don't care
Again, it's our duty to teach them why they need to care. This probably shouldn't involve a lecture on networking or data analysis, but instead tailoring an explanation to their personal situation and knowledge.
Think about the last time you've tried tinkering with something you're a noob at. Maybe it's deciding that you would try fixing your car engine yourself even though you never were a mechanic. Maybe you decided to make a complicated cake and halfway through you realize that you overestimated your pastry skills. Try to remember the feeling of helplessness you felt at that moment, the "I have no idea what I'm doing and I wish I never had started that in the first place". In my experience that's how 90% of people feel like when trying to do something technical with a computer.
A few weeks ago a colleague from HR asked me if I could make a backup of a computer because it contained some critical stuff and she wanted to be able to restore it later if necessary. I said okay, booted up a Debian live USB stick I had lying around, and started dd'ing the drive to external storage. When I told her the copy was in progress, she said, "But I didn't give you the password?". She was amazed when I told her that I didn't need the Windows session password to access the data on the disk. I swear I'm not making it up when I say that she asked me if I was a "hacker".
That made me realize that there are probably many people out there who think their files are safe as long as their Windows password isn't compromised, even if the disk is not encrypted. After all, they can't access the files, so surely nobody else can? If Facebook says my photo is deleted, then surely it must be? Why wouldn't it be?
I don't think it's fair to blame these people, we've designed so many strange patterns over the past decades in software that it's difficult to keep track. Maybe having "delete" not actually delete should be considered a dark pattern. Maybe it should even be illegal.
Of course they assume it. Partly also because Windows tells you that if you lose your password, you can no longer access your account, which is BS, and they know it; they tell you this only for "felt security".
And encryption ... What is that?
There have been horror stories over the years about identity theft, even before the emergence of social media. Has this stopped anyone outside our community from posting details about their lives online? I hardly think this whole situation with FB will change anything in the end.
I don't feel I have any obligation/duty towards anyone. If they want my opinion or ask me about an issue I'll gladly inform them. But I won't start a crusade for a better informed society. Internet was supposed to do that and we ended up with videos of cats and wannabe celebrities posing seminude pics on Instagram. Fuck that shit.
It reminds me of how, in Zen Buddhism, there are those who become enlightened and go off to do their own thing, and those who become enlightened and stay in the world with the rest of the ordinary unenlightened people. In the words of Alan Watts:
The understanding of Zen, the understanding of awakening, the understanding of– Well, we’ll call it mystical experiences, one of the most dangerous things in the world. And for a person who cannot contain it, it’s like putting a million volts through your electric shaver. You blow your mind and it stays blown. Now, if you go off in that way, that is what would be called in Buddhism a pratyeka- buddha—“private buddha”. He is one who goes off into the transcendental world and is never seen again. And he’s made a mistake from the standpoint of Buddhism, because from the standpoint of Buddhism, there is no fundamental difference between the transcendental world and this everyday world. The bodhisattva, you see, who doesn’t go off into a nirvana and stay there forever and ever, but comes back and lives ordinary everyday life to help other beings to see through it, too, he doesn’t come back because he feels he has some solemn duty to help mankind and all that kind of pious cant. He comes back because he sees the two worlds are the same. He sees all other beings as buddhas. He sees them, to use a phrase of G.K. Chesterton’s, “but now a great thing in the street, seems any human nod, where move in strange democracies a million masks of god.”
— Alan Watts, Lecture on Zen
I’ve used this with success several times. Though you generally have to know the person well enough to know their “secrets”.
This is a dumb conspiracy theory. Facebook has made plenty of public statements that say otherwise, and there's a whole team that works on the system that ensures every trace is erased from disks, logs, cold storage and backups when deleting content.
"remove or obliterate (written or printed matter), especially by drawing a line through it or marking it with a delete sign."
"synonyms: remove, cut out, take out, edit out, expunge, excise, eradicate, cancel"
All of these seem clearly "absolute" to me. "Delete" means it's gone.
I think Facebook has its own special linguistic distortion field. It requires no "dumb conspiracy theory" to realize that Facebook cannot be trusted.
Some mail programs have long had a soft delete that requires an expunge step for complete removal.
In an IT setting you can delete a blob from a db, but it might still be on disk, and it will still be in caches, on user machines, and in backups/archives.
Can you support your assertion? The infrequent cases where someone manages to extract or recover supposedly deleted data cast a lot of doubt on your claims.
In any case, even if it's not Facebook specifically, it seems overwhelmingly likely that the majority of companies do not actually delete your data.
I can give you plenty of statements about how I'm Santa Claus though.