"View As" was a vital tool to assess what data is shared with whom, and guarding one's privacy is harder without it.
I've seen advisory after advisory just on user impersonation. Just no. Don't implement.
If you want to impersonate, use a private session or have users you control do the testing.
The sheer number of records per user is probably astounding, and Facebook has to figure out exactly how deleting thousands of records per user will affect its models.
Thought experiment: imagine all of the Hacker News community and other privacy-minded people go right into the tool and delete everything. That is an entire community and classification of people whose lost data Facebook has to reconcile, and whose absence it has to fix in its machine learning models. The difficult part is that since so much current AI logic is based on historical data, a user's deleted data will be nearly impossible to untrain from the model.
I know there are more nuances here, but I think Facebook faces real challenges in implementing this, and business-wise it's a drain on resources that will hurt them. It's easy to think it would be so simple for FB to just call user.delete.all.data(), but the reality is that the loss of data could seriously mess up their algos.
My answer to that is 'who cares?'
It's not my problem, let me delete all my data.
In fact it's required by law in some places.
This is not a science problem and it is not hard to solve: a scientist would delete the data and re-derive their models because they want the best data possible. This is how you "preserve the usefulness of those models," by ensuring that the data has integrity.
> Facebook has to figure out a way to understand exactly how thousands of records per user, being deleted, will affect its models.
This is looking through the telescope backwards. You create your models with the data that you have, and if some data is removed then you regenerate.
You can bet your ass they groom their data after detecting stuff like new botnets and fake clicks, and individual user data does not have to be different.
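The regenerate-after-deletion point is easy to illustrate. A toy sketch in Python (the "model" here is just a per-category average; real training pipelines are vastly more complex, but the principle of dropping the records and re-deriving is the same):

```python
# Toy illustration of "delete, then re-derive": the "model" is a
# per-category mean of engagement scores over hypothetical records.

def train(records):
    """Aggregate (user, category, score) records into per-category means."""
    totals = {}
    for user, category, score in records:
        s, n = totals.get(category, (0.0, 0))
        totals[category] = (s + score, n + 1)
    return {cat: s / n for cat, (s, n) in totals.items()}

def delete_user(records, user_id):
    """GDPR-style erasure: remove every record belonging to the user."""
    return [r for r in records if r[0] != user_id]

records = [
    ("alice", "sports", 1.0),
    ("bob",   "sports", 0.0),
    ("alice", "news",   1.0),
]

model_before = train(records)
model_after = train(delete_user(records, "alice"))

print(model_before["sports"])  # 0.5
print(model_after["sports"])   # 0.0 (only bob's record remains)
print("news" in model_after)   # False (alice was the only signal)
```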
> That is an entire community and classification of people that Facebook has to figure out how to reconcile the loss of data for, and fix its machine learning models.
Again, they already know how to do this, and they do it all the time. There are no nuances to it outside of roadblocks installed by FB themselves.
It's 110% a business problem, and 110% Zuckerberg's choice to make it difficult, and to invent difficulties in bringing it about.
I for one would be OK with a machine learning model that's been trained using my data. I wouldn't demand that it be "re-trained" just because I asked for my history to be deleted. I think most people, given an explanation of what a machine learning model is, would be fine with this.
The clear history thing is about personally identifiable information, though. It's about not having your information dangling around after you leave Facebook. It's about having the option to delete your shadow profile.
Throughout all the hearings, FB execs have been very wishy-washy about how they handle your browsing history, particularly what they collect outside of FB properties.
People should continue to call them out on this until they get their act together.
2. That's it. Problem solved.
Question for extra credit:
These machine learning models you speak of, do they mostly do stuff that...
a) is in the interest of the users
b) goes against the interest of the users
Society does not have an obligation to tolerate negative externalities just to protect someone's business model. That is precisely what regulation is for.
Also, worth noting that FB has 2 primary users: you and advertisers. The second class of users is very interested in having your data. Unfortunately, FB seems to favor them because they are the user who's paying them.
99.9% of the people are not even aware of the distinction. This is an attempt to follow the letter of the law to create an excuse not to follow the spirit of the law. What people want is personal privacy and cognitive freedom. They don't care about implementation details.
> Also, worth noting that FB has 2 primary users: you and advertisers. The second class of users is very interested in having your data. Unfortunately, FB seems to favor them because they are the user who's paying them.
You are right. Sorry for the typo. I did not mean "users", I meant the "used".
Supermarkets have two types of users: customers and other businesses that use supermarkets to distribute their products. It would be in the interest of those other businesses if, for example, food safety standards were more lax. But not in the interest of the customers. The food industry has a lot more money and power than the customers, so we solve this with regulation. It's kind of Civilization 101, really.
In our world of Big Data, where it's easy to combine many disparate databases, effectively anonymizing data is also a very hard problem to solve.
Indeed, that sounds like a hard problem.
So the question becomes "how much of the hundreds of millions in R&D is being allocated to solving the record-deletion problem?" I'd like to see spending reports and new research publications from Facebook (e.g. retroactive record deletion, attempts to build models that are robust to deletion, attempts to build k-anonymity into models from the start).
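For reference, the k-anonymity property mentioned there has a simple definition: every combination of quasi-identifiers in the released data must be shared by at least k records. A minimal checker over hypothetical records:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs >= k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(n >= k for n in counts.values())

# Hypothetical partially-generalized records.
rows = [
    {"zip": "481**", "age": "20-29", "likes": 310},
    {"zip": "481**", "age": "20-29", "likes": 12},
    {"zip": "481**", "age": "30-39", "likes": 77},
]

print(is_k_anonymous(rows, ["zip", "age"], 2))  # False: the 30-39 bucket is unique
print(is_k_anonymous(rows, ["zip"], 3))         # True: all three share one zip bucket
```

Making a dataset k-anonymous in practice means generalizing or suppressing values until this check passes, which is exactly the kind of work that costs model accuracy.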
> I know there are more nuances here, but I think there are some challenges Facebook has in implementing this, and business wise it's a drain on resources that will hurt them.
I can give Facebook the benefit of the doubt and suppose that they're trying to figure out how to transition smoothly without destroying their models. But I'd also like to see a hard timeline when they promise to roll out user.delete.all.data() anyways (business value be damned).
The real test of whether a company really "takes your privacy seriously" is whether it ever reaches a point where privacy becomes a priority, i.e. whether it decides to do the right thing even when it could be bad for the bottom line (under the current business model).
The easiest explanation is that Zuckerberg was lying. We could give him the benefit of the doubt but since he’s always lying it makes more sense to save time and assume he’s lying about this as well.
That may not be as simple as you think. FB probably has thousands of data types and each one needs to define how a deletion should affect everything else. E.g. if I delete my comment, what happens to your reply? That one is obviously solved already since it's a core part of the product, but there are probably lots of pieces of data that were never designed to be deleted so will need effort to support.
Given they solved that insanely hard problem pretty well, I’d suggest they could also solve this if they had sufficient motivation.
Rather than trying to block them, it seems more sensible to feed them loads of bad data, ruining their business model.
For instance, you could pick which set of noise you wanted to feed each service: convince Google you're a single mother from Lithuania, Facebook that you're a 63-year-old man from Italy, or Instagram that you're a teenager from Korea.
Every time a tracker or data point is sent to them to "know" you, it could be a lie. This would lead you out of your search bubble too, so political ads bought by Russians to undermine Hillary's black vote might become clearer to everyone.
Trusting Facebook or Google to do the right thing against their financial interest won't work, will it?
I've used a similar add-on for Google search (the add-on made some human-like Google searches in the background), but I can't remember its name now.
Facebook would be a pointless exercise, since the <div> names are automatically generated. You can't even hide the sidebars you never use with CSS, because their <div> IDs change on every refresh. That, plus the murkiness of what exactly counts as an ad on Facebook, makes even the most thorough adblockers struggle to work properly, even in an otherwise ideal situation (a desktop browser with an adblocker).
> all ads are supposed to contain the word “sponsored” as part of a mandatory disclosure, so users can distinguish between ads and their friends’ posts. Our tool recognized ads by searching for that word. Last year, Facebook added invisible letters to the HTML code of the site. So, to a computer, the word registered as “SpSonSsoSredS.” Later, it also added an invisible “Sponsored” disclosure to posts from your friends.
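In principle, the obfuscation described in that quote can be undone by honoring visibility rather than raw text. A rough sketch using Python's stdlib HTML parser (deliberately simplified: it only treats inline display:none as hidden and assumes properly nested tags, whereas Facebook's real markup hides letters in far more varied ways):

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text, skipping subtrees hidden with inline display:none."""
    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # >0 while inside a hidden subtree
        self.parts = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        if self.hidden_depth or "display:none" in style.replace(" ", ""):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)

# Visible letters of "Sponsored" interleaved with hidden "S" characters,
# mimicking the "SpSonSsoSredS" trick described in the quote above.
html = (
    '<span>Sp</span><span style="display:none">S</span>'
    '<span>on</span><span style="display:none">S</span>'
    '<span>so</span><span style="display:none">S</span>'
    '<span>red</span><span style="display:none">S</span>'
)
p = VisibleText()
p.feed(html)
print("".join(p.parts))  # Sponsored
```

The naive approach of searching the raw HTML sees "SpSonSsoSredS"; rendering-aware extraction recovers the word, which is presumably why the cat-and-mouse continues.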
A Facebook page that I'm following paying money to reach me is an ad according to Facebook. To me, that's extortion.
On the other side, we have notifications about friend suggestions. You can't turn them off in the notification settings. That, to me, is an "ad" for Facebook itself, even if nobody paid them money. It's undesired content that pulls my attention away from what I came there to do in order to promote a product.
How would you fix that? The only thing I can think of would be a chronological feed where everybody sees every post.
The model they currently use serves two goals: to extort money from fan pages, and to hook users via intermittent reinforcement.
They probably already record every item displayed on screen (and for how long, whether my cursor hovered over it, etc.), so a list of items I saw (on any device), in the chronological order I saw them, should be just a simple DB lookup away.
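If impressions really are logged per item as the parent suggests, the chronological view is indeed a single query. A sketch with sqlite3 and a hypothetical schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE impressions (
        user_id TEXT,
        item_id TEXT,
        seen_at INTEGER,   -- unix timestamp of the impression
        device  TEXT
    )
""")
db.executemany(
    "INSERT INTO impressions VALUES (?, ?, ?, ?)",
    [
        ("me", "post-3", 1003, "phone"),
        ("me", "post-1", 1001, "desktop"),
        ("me", "post-2", 1002, "phone"),
    ],
)

# "Items I saw, in the order I saw them, across devices": one query.
rows = db.execute(
    "SELECT item_id FROM impressions WHERE user_id = ? ORDER BY seen_at",
    ("me",),
).fetchall()
print([r[0] for r in rows])  # ['post-1', 'post-2', 'post-3']
```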
If you build a cockpit by measuring everyone's dimensions and averaging them, you'll end up with a cockpit that fits nobody. I believe the same is true for feeds as well.
There's an amount of information that works for the average Facebook user. However, none of us are precisely on that average. By building algorithms whose mission is max consumption by the average user, they've made their product useless to everyone.
Honestly, everything this company does is shady beyond all shame.
Twitter keeps fucking around with their timeline feed, but they've preserved a "chronological, show me everything" option through all of their other bad ideas.
After a few refreshes of my Twitter homepage in a short time span, I did a little research: I edited the CSS to make "liked" tweets more obvious. Over 50% of the "fresh" content turned out to be likes by people I follow, in no way chronological.
You can fight it... for now, but I assume not for long. I've stopped relying on Twitter for that same reason I've stopped relying on Facebook.
Is that the name for the "recent tweet from X" notifications? Those have made Twitter notifications completely useless for me. :(
As far as I know, there's no universal term for them. Since we already associate darkness with bad UX patterns, "shadow notifications" as a subset of "dark patterns" has a nice ring to it.
We've long had the ability to set a minimum font size, but I suspect browser vendors have viewed that purely as an accessibility feature.
(Tangent: I've also been increasingly disappointed that browsers let sites make text unselectable. I understand that there was originally a use case, but I only ever encounter it as an anti-feature these days.)
Breaking up a word into multiple tags is very different from inserting invisible extraneous content into the middle of a word. The latter is what I'm questioning the usefulness of; the former I can see as occasionally useful, but should be avoided when it breaks screen readers and search engines.
Take a modal for example - the text is not visible until the modal is opened. Let's think about menus that display on hover - they wouldn't work. What about tooltips or other sorts of floating label type things? What about other text that only appears on hover, or when some action is performed?
Sure, you could wait to insert these things into the page until the last second, but it would be less performant and complexity would be higher.
If you saw what the web looked like without display: none and visibility: hidden, I think you'd change your mind. There are infinite applications.
As an aside, this reminds me of the brief period when "hacking" SEO by pasting huge volumes of keywords in text the same color as the background (and/or hidden by other page elements) was a relatively common trick. Search engines simply switched to more sophisticated page-ranking methods. I wonder what change could remedy this while being similarly non-disruptive to everyone but the bad actors.
But can anyone honestly claim that putting an invisible 'sponsored' on non-sponsored posts is ethically OK? Can't we all agree that ads should be readily identifiable as such?
By human users for sure. Scripts are a whole other question.
BTW. to the extent clearly marking ads/sponsored content is required by law, I wonder whether Facebook could be sued for making this difficult for users with disabilities - after all, accessibility tools like screen readers are scripts and rely on machine-readable metadata.
I disagree entirely.
(By 3rd party ads I mean all ads that are advertising anything not directly sold or provided by the owner of the service showing that ad. That's to resolve the immediate objection of whether telling people about new features is an ad or not.)
Man can dream.
EDIT: But this man would also vote for the person pushing for such legislation.
Personally I'm stuck with voting for "the party not dismantling the NHS who stand a chance of getting in".
Maybe in my kids lifetime we'll get proper PR at the national level (UK).
...also brexit ;-)
In a world where Firefox and Safari were the major internet browsers, this wouldn't actually be that hard, since it could be built into those browsers directly and would already align pretty nicely with their respective parent companies' ethos.
Then users will blame the browser for being trash rather than FB.
What I don't know is whether that means the browser or the app. It's not effective at all if most people use the app.
One issue is that there's data about you that's not generated by you, but by people you know (e.g. John adds me as a contact in his phone along with my birthday and address; Jane friends me on VK and keeps tagging me in pictures that have location data; etc.).
I suspect that right now, companies give higher importance to data that you've entered about yourself, but once they realize you're feeding them noise, they can adjust their algorithms to trust other people's data about you (if they're not already doing that), which would significantly decrease the effectiveness of feeding them noise about yourself.
It will still work for all data points about you that can't be collected from other people (like browsing habits, your phone's location, etc)
Maybe some people can inject noise about their friends, instead.
1. install the FB app on a phone with an address-book full of garbage and share it with the app,
2. inject fake geotag data in the photos you tag your friends in,
3. put the actual tag on strangers in the background,
4. tag your friends in stock photos, etc.
Some of these you obviously can't do with friends who don't consent, since they may get annoyed by all the fake tags, but others would be transparent to most social media users.
That said, it's pretty hard. It's not entirely clear which data points Facebook uses, but it's very likely you can't influence all of them: for example, friends adding you as a contact, or linking your profile to your phone number. If you want to continue using it as normal, your usage will still reveal a lot about what you view, what you interact with, and when you use it, which might stand out from the random stuff. And if you don't want to continue using it as usual, the extension will have to visit the website by itself, without interfering with your regular browsing activity, and without relying on your computer being on.
I also think that if something like this got big enough to actually impact the quality of their algorithms, it'd be pretty easy for Facebook to detect the abnormal activity.
(And of course, it's not even that much of a problem if the algorithm screws up more often, as long as advertisers believe it's accurate.)
Edit for extra nostalgia: https://en.wikipedia.org/wiki/Need_to_Know_(newsletter)
It's been done :)
Also, if your friend gives someone else information about you, do you expect that you should be able to control it? I'm pretty sure many of the details of my life appear in other people's emails, voicemails, journals, etc. that will never have a "forget me" button.
I'm mainly suggesting that there are lots of ways to improve this situation, from market forces to legislation, and it's sad to see people wasting time on things that won't work.
And since they're buying data from non-internet data brokers such as credit card companies, retailers that track you in their stores, etc., blocking the trackers and avoiding the websites doesn't actually stop the spying.
To stop them from spying on you pretty much requires that you stop engaging in large swaths of real-world society.
You can't really control what your friends are telling them about you though.
Personally, I've just about opted out of the entire internet except for work, email, and a few web sites that I read regularly. The flotsam is just too thick to be worth the bother.
Words are powerful. Normalizing terms like "data vandalism" is the road to being legally required to use FAANG in such a way that maximizes their profitability. Probably not a great idea.
I don't use Facebook/Google/etc., but they collect data about me without my consent anyway, and there is no way to delete that data or to make them stop.
That's what makes me so furious with them and all other ad companies. Since they're never going to change, I wish that they'd go out of business pretty much every day.
Edit: Found a copy of the audio: https://www.youtube.com/watch?v=XU5xC00m-gA
This worked best on mainstream P2P clients like Limewire, Kazaa, Ares (all based on Gnutella I think), because the fake MP3s would propagate more widely.
It didn't work as well (or at all), on relatively obscure, community-based networks like Slsk, where people were more likely to share entire albums, and not just the singles.
I don't know if I ran into that issue, but I wasn't really downloading a lot of mainstream stuff compared to him either.
The tool would need to be something like a color picker that has a hidden command to run headless searches in a tab of your browser.
This (sort of) worked for teenagers hiding pictures in fake calculator apps on smartphones years ago, but your mom won't try to pry at your Facebook obfuscation.
Just a thought
So I think they identify this as a tough-to-protect weakness as well.
TrackMeNot runs as a low-priority background process that periodically issues randomized search-queries to popular search engines, e.g., AOL, Yahoo!, Google, and Bing. It hides users' actual search trails in a cloud of 'ghost' queries, significantly increasing the difficulty of aggregating such data into accurate or identifying user profiles.
Privacy Badger is a browser add-on that stops advertisers and other third-party trackers from secretly tracking where you go and what pages you look at on the web. If an advertiser seems to be tracking you across multiple websites without your permission, Privacy Badger automatically blocks that advertiser from loading any more content in your browser. To the advertiser, it's like you suddenly disappeared.
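The TrackMeNot idea boils down to periodically emitting plausible random queries. A minimal sketch of the query-generation half (the seed vocabulary is hypothetical; it only builds decoy search URLs, whereas a real add-on would also have to mimic human timing and click-through behavior to avoid detection):

```python
import random
import urllib.parse

# Hypothetical seed vocabulary; TrackMeNot derives its terms from sources
# like RSS feeds and popular-query lists so the decoys look plausible.
TERMS = ["weather", "recipes", "flight deals", "python tutorial",
         "news today", "used cars", "hiking trails", "guitar chords"]

ENGINES = {
    "google": "https://www.google.com/search?q=",
    "bing":   "https://www.bing.com/search?q=",
}

def ghost_query(rng):
    """Build one randomized decoy search URL."""
    engine = rng.choice(sorted(ENGINES))
    query = " ".join(rng.sample(TERMS, k=rng.randint(1, 3)))
    return ENGINES[engine] + urllib.parse.quote_plus(query)

rng = random.Random(42)
for _ in range(3):
    print(ghost_query(rng))
```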
* They're still working on Clear History, and think it is worth implementing (Zuckerberg is the one who introduces the topic–the interviewer hasn't asked him about it). It's important that users have a tool like this, even if it makes their experience "worse" (i.e. less personalized).
* Clear History is nontrivial to implement, because Facebook was not built with this functionality in mind. And they want a user interface that is more capable than just a single "delete everything" button, because their research has shown that users frequently want to clear only certain aspects of their data (e.g. data shared to a specific service or app via Facebook), so this adds complexity.
* (In the context of a Facebook subscription service, which they were discussing before Zuckerberg mentioned the Clear History tool)–having Clear History is a prereq for any sort of subscription service, but they want Clear History (and all privacy controls) accessible to all users, not as a privilege for certain users.
And to be clear, I'm not saying that Facebook has necessarily been malicious or lied. It's just that a product manager faced with the choice to prioritize this vs. some new engagement enhancing widget is going to pick the widget every time unless there's constant pressure from above.
Edit: This is why legislation is required. So privacy issues can't be ignored simply because they don't contribute to the bottom line.
> There’s a reason that Clear History isn’t called “Delete History”: Using the feature will disassociate browsing data that Facebook collects from your specific account but it won’t be erased from Facebook’s servers completely, Baser said. Instead it’s just “de-identified,” which means it’s stored by Facebook but no longer tied to the user who created it.
We promised a history tool and have not delivered it. We are deeply sorry. We founded Facebook to connect the world and facilitate relationships. We remain committed to doing things.
addendum: If this bothers you, I would strongly recommend just jumping ship now. There's almost no incentive for them to change and no sign of one on the horizon. If you're not getting enough value from them right now to accept the data they keep on you, you'll be happier just giving up and moving on early.
Even though the BP one is substantially more detailed, at least I've read it completely. I skimmed over several paragraphs of Zuckerberg's statement because they essentially say nothing.
Some of that can be blamed on the medium itself, since Zuckerberg couldn't post links, subsections, or visually distinguishable lists in a fucking Facebook status.
"Well yes, but actually no"
I hate to say it, but it's probably too late anyway. FB has billions in the bank and are likely working on much bigger-picture projects with their surveillance tech. Why bother with a B2C product that brings tons of PR grief when you can enter billion-dollar contracts with militaries and governments where secrecy is the default MO?
It's a European account, and I'm wondering if only marking the content as deleted, instead of actually deleting it, violates the GDPR.
Deleting a record from a table can cause a re-index, which is very intensive. It's much easier to flag a column in a record as "deleted" or whatever, and then run a cleanup during off-peak hours.
I'm sure there are clever ways around that with proper knowledgeable DBAs on your team, but as I'm a web dev for smaller audience projects, I don't touch solutions that require those types of optimizations that I'm sure Facebook has implemented.
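The flag-then-cleanup pattern described above, sketched with sqlite3 (the schema is made up; at Facebook's scale this would be sharded and far more involved):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER DEFAULT 0)"
)
db.executemany("INSERT INTO posts (body) VALUES (?)", [("hello",), ("world",)])

# Online path: a cheap soft delete -- flip a flag instead of removing the row.
db.execute("UPDATE posts SET deleted = 1 WHERE id = ?", (1,))

# Reads simply filter on the flag, so the row is invisible immediately.
visible = db.execute("SELECT body FROM posts WHERE deleted = 0").fetchall()
print(visible)  # [('world',)]

# Off-peak path: physically purge the flagged rows in a batch job.
db.execute("DELETE FROM posts WHERE deleted = 1")
remaining = db.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(remaining)  # 1
```

Whether the soft-deleted-but-not-yet-purged window satisfies a legal "erase my data" request is exactly the GDPR question raised above.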
Adding new data will cause re-indexing too, and FB loves adding to their profiling DB.
This is not an indexing issue but a "we love adding data but hate removing it" issue.
Disclaimer: Have no experience with databases at that scale so maybe this isn't entirely unreasonable.
In Facebook's case, we know they're seeing huge hits in active, engaged users, but their whole metrics ecosystem is built to optimize micro-engagement. Taken as a whole, those micro-optimizations are killing goodwill and turning off users, but it's much harder to measure that.
So many decisions are made because variant b "won", without any discussion of the layers upon layers of factors that aren't easy to measure.
I know most of them are likely working on backend stability and performance... but come on.
Probably somewhere in the backlog :-)
I would love more transparency here, but we're not getting that without government intervention, and that's unlikely to come anytime soon (and what gets passed may be worse than the current situation).
The GDPR (General Data Protection Regulation -- an EU regulation that came into force last year) provides for this and has potentially very large fines to back it up. Obviously it only applies to EU residents (or businesses), but I've heard that California is getting its own version of the GDPR -- though I haven't looked into it. Facebook implemented the "delete your account" option that actually works in response to this, as well as explicit, opt-in consent for whatever tracking they use.
If you want to "clear" history to delude yourself that relevant Facebook information is gone (or at least make it difficult to obtain from your own account), there are plenty of options for that.
Not so much, really. Outside of techie types, my experience is that very few people are aware of this stuff.
No, the public who's aware isn't outraged because they don't think there's anything they can do to stop this. Resignation is very different from thinking it's "[not] a big deal."
Me too. And, although it may be unfair, I have to admit that I view engineers who are still willing to work for them as suspect.
That will tell you what data they still have about you.
You can't _know_ they've deleted everything if the subject access request comes back empty, but it puts them in the position of having to actively break the law to lie about whether they've deleted your data.
True, but I don't think Facebook is too bothered by that sort of thing.
Is anonymity in the eye of the beholder?
That depends on how you define "event" and "PII" data. The normal industry and legal definitions of PII leave out huge swaths of actual PII, after all.
Personally, I'm not even at the point where I can consider what is or is not acceptable for companies like Facebook to keep. I'm still stuck at the point of stopping them from collecting data about me without my permission in the first place.
Take that guy who managed to get a dump of his Spotify data thanks to the GDPR, for instance: https://twitter.com/steipete/status/1025024813889478656/
I'm sure we all knew Spotify tracked us but seeing how much they track is pretty crazy to me even though I'm pretty sensitized to these privacy issues.
I expect that a full dump from Facebook would be at least as verbose if not more. I don't think they want the users to see how the sausage is made.
Facebook uses its pixels to track purchase and browser history. This purchase and browser history is a crucial part of their lookalike audience technology, which in turn is one of the most critical components of their advertiser tool suite. Without this data, advertisers will be less effective at targeting their ads, which means lost ad revenue.
It's not surprising to me that Facebook is backtracking on this and vying for other privacy improvements that don't hurt their fundamental business as much (like letting users view how advertisers got their contact info, or letting users view page ads).
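Lookalike targeting is, at its core, a similarity search: start from a seed audience (say, past purchasers) and rank other users by how close their feature vectors are to the seed's. A deliberately tiny cosine-similarity sketch with hypothetical features:

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical per-user features, e.g. [sports clicks, fashion clicks, news clicks],
# the kind of signal the tracking pixels feed.
users = {
    "u1": [9.0, 1.0, 0.0],   # seed: a known purchaser
    "u2": [8.0, 2.0, 0.0],   # similar browsing profile
    "u3": [0.0, 1.0, 9.0],   # very different profile
}
seed = users["u1"]

# Rank the rest of the user base by similarity to the seed.
candidates = sorted(
    (u for u in users if u != "u1"),
    key=lambda u: cosine(users[u], seed),
    reverse=True,
)
print(candidates)  # ['u2', 'u3']
```

Delete the browsing history that fills those feature vectors, and the ranking degrades, which is the revenue incentive the parent describes.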
Perhaps their “business” would need to be rescaled if they were honest about serving ads to their users.
I really don't want to read a single word about the "beating" they're taking. It helps Facebook by exaggerating the harm done to them, and it pats the press on the back for its supposed effectiveness. Facebook keeps growing and doing the same shit, that's how effective it is. "Maybe if we just had more facts!" says the well-meaning fool who then loses to a Zuckerberg or a Trump. Dream on, the world doesn't run on facts and virtue. By way of contrast, if Facebook executives had to suffer an actual beating, Rodney King style, things would be fixed in a hurry. I confess I would love to be holding one of the night sticks too.
FB get the appearance of pain and attrition. Press get more views and plaudits. Nothing changes.
They need actual pain to their organization (not violence to individuals, of course) before proper change will come. Think Microsoft's pain in the '90s. It seems Europe will again lead the charge on this.