Hacker News
Facebook Defends Getting Data From Logged-Out Users (wsj.com)
86 points by goldensaucer on Sept 26, 2011 | 74 comments

Their defence doesn't hold much water. But then, I can't imagine any excuse that would satisfy me.

“The onus is on us to take all the data and scrub it,” said Arturo Bejar, a Facebook director of engineering. “What really matters is what we say as a company and back it up.” Except their track record on that matter isn't exactly stellar.

We know they don't actually delete messages or things you delete on FB, they just mark them "deleted". With that attitude to "deleting" things, what does it even matter?

And I don't care if they promise the data is not used for targeting ads, that is just one of the many ways this type of data can be abused.

The argument that they use it to prevent "spam and phishing attacks" also seems dubious to me. How does that work? And the cookie that's kept contains just your Facebook ID, so wouldn't that be trivial for spammers and phishers to work around?

And the most important thing is, they might act all innocent about it now, claiming they did it with the best intentions and not to keep tracking people after they log out. Let's believe that, and let's assume this behaviour has no other privacy implications. Facebook is by now well known for its feature creep; if we hadn't caught them red-handed now, what's to say they wouldn't be using this data a few months from now?

Sorry, but it's all bullshit. Facebook doesn't care one bit about its users' privacy; they've made that perfectly clear by now, and their pretending otherwise in this article is absolutely laughable.

> We know they don't actually delete messages or things you delete on FB, they just mark them "deleted". With that attitude to "deleting" things, what does it even matter?

I've never written a web app that actually deletes data.

> The argument that they use it to prevent "spam and phishing attacks" also seems dubious to me. How does that work? And the cookie that's kept contains just your Facebook ID, so wouldn't that be trivial for spammers and phishers to work around?

Actually, it's an attempt to make life easier for users. When you log in from another machine they sometimes use enhanced measures to confirm your identity. By keeping the cookie they get more confirmation that you are you.

I'm not justifying it. There are ways to prevent this that weren't taken. But I can see what they're trying to do.

> I've never written a web app that actually deletes data.

Sure, but that's just a business decision, right?

The big web app I'm working on moved from deleting data to adding delete flags over the 7 years of its existence. There are two reasons for this, and neither involves tracking users.

For one, a lot of the data is synchronized to offline applications.

If you just delete the data on the server, it's gone, and it becomes impossible to tell clients that they have to remove their copy. In this case I could, of course, keep a second list of deleted items around and sync only that, but that would mean additional work, and it wouldn't help with the other case:

Many times, end users wanted us to restore data they had accidentally deleted. Back in the day that meant restoring the backup and merging it with the current live data: a risky, complicated, and thus expensive process.

Nowadays, I just set the delete flag to false and the problem is solved.

On the other hand, the data we are dealing with isn't nearly as sensitive as Facebook's, and it's never shared between users.
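A minimal sketch of the delete-flag approach described above, assuming an in-memory store; the `Record`/`Store` names, the `deleted` field, and the logical clock are hypothetical, not the poster's actual schema. Deleting only flips a flag, offline clients ask which items were deleted since their last sync, and an accidental delete is undone by clearing the flag.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Record:
    id: int
    body: str
    deleted: bool = False
    changed_at: int = 0  # logical clock value of the last change


class Store:
    """Hypothetical soft-delete store: rows are flagged, never removed."""

    def __init__(self) -> None:
        self.rows: dict[int, Record] = {}
        self.clock = 0

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def put(self, rec_id: int, body: str) -> None:
        self.rows[rec_id] = Record(rec_id, body, changed_at=self._tick())

    def delete(self, rec_id: int) -> None:
        # "Delete" only sets the flag; the row stays in the table.
        rec = self.rows[rec_id]
        rec.deleted = True
        rec.changed_at = self._tick()

    def restore(self, rec_id: int) -> None:
        # Undoing an accidental delete is just clearing the flag.
        rec = self.rows[rec_id]
        rec.deleted = False
        rec.changed_at = self._tick()

    def visible(self) -> list[Record]:
        return [r for r in self.rows.values() if not r.deleted]

    def deletions_since(self, last_sync: int) -> list[int]:
        # Offline clients call this to learn which local copies to drop.
        return [r.id for r in self.rows.values()
                if r.deleted and r.changed_at > last_sync]
```

The price of this convenience is exactly what the thread is arguing about: the "deleted" row is still sitting in the table.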

It's often a performance, scalability, and safety decision as well. The optimal way for a web app to truly delete data is during an asynchronous garbage collection process. It's a lot easier to just mark the object as deleted.
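To illustrate the mark-then-collect split, here is a minimal sketch assuming rows carry a `deleted_at` timestamp; the one-hour `GRACE_SECONDS` window and the `mark_deleted`/`purge_deleted` names are hypothetical. The request path only marks a row, and a background job later sweeps marked rows for good.

```python
from __future__ import annotations

import time

GRACE_SECONDS = 3600  # hypothetical window before a marked row is physically removed


def mark_deleted(rows: dict[int, dict], row_id: int, now: float | None = None) -> None:
    """Fast path run inside the request: just record when the row was deleted."""
    rows[row_id]["deleted_at"] = time.time() if now is None else now


def purge_deleted(rows: dict[int, dict], now: float | None = None) -> list[int]:
    """Background sweep: remove rows that were marked at least GRACE_SECONDS ago."""
    now = time.time() if now is None else now
    expired = [rid for rid, row in rows.items()
               if row.get("deleted_at") is not None
               and now - row["deleted_at"] >= GRACE_SECONDS]
    for rid in expired:
        del rows[rid]
    return expired
```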

I don't think the poster was allowing for GC, either.

What if you delete your account entirely? Do they delete your data then?

Facebook does delete all data associated with an account after the account is deleted. An account is deleted after you indicate that you want to delete it (via a form in your account settings) and two weeks pass without you trying to reactivate it (by logging in). And yes, I do mean the permanent, irreversible kind of deleting. (I work at Facebook.)
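The two-week flow described above can be modeled as a pending-deletion state. This is a hypothetical sketch for illustration, not Facebook's code; the `Account` fields and function names are assumptions. Requesting deletion records a timestamp, logging in during the window cancels it, and only requests at least 14 days old become due for purge.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

GRACE = timedelta(days=14)  # the two-week window described in the comment


@dataclass
class Account:
    user_id: int
    deletion_requested_at: datetime | None = None  # hypothetical field


def request_deletion(acct: Account, now: datetime) -> None:
    acct.deletion_requested_at = now


def on_login(acct: Account) -> None:
    # Logging in during the grace period cancels the pending deletion.
    acct.deletion_requested_at = None


def due_for_purge(accounts: list[Account], now: datetime) -> list[int]:
    """IDs whose deletion request is at least 14 days old and was never cancelled."""
    return [a.user_id for a in accounts
            if a.deletion_requested_at is not None
            and now - a.deletion_requested_at >= GRACE]
```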

I'd like to believe you. I really would, but I'm sorry to say that I can't.

About a year ago I deleted my Facebook account permanently. I even got a confirmation email after 14 days telling me it had been deleted. However, three or four months later I was forced to sign up for an account again[1]. After I logged in, Facebook showed me a list of "suggested friends". Note that I had zero friends at this point. Guess what: every single person I had added as a friend in my previous account was in that suggested friends list. How is that possible if Facebook is not retaining information about me? You guys are obviously associating something with my name and email address. That, or you're telepathic.

So no, I don't believe you. I don't believe Facebook deletes any information at all.


[1] The info for every event I wanted to attend was on FB. Classmates talked about college and swapped notes on FB. People planned meetups and reunions on FB. It's scary how much happens on FB instead of face-to-face/phone/email now.

Facebook could have stored your email address as part of your friend's account, e.g. "an email address this person is friendly with". Your account, posts, friends and all, are gone, but you leave traces of yourself with your friends. These traces could be reconstructed.

Without knowing the exact details of your case, this sounds like the correct explanation. The friends you saw were probably ones who have used Facebook's contact importer, and so their accounts had a record that your email address was a known contact. All of your wall posts, photos, friend lists, and other activity were actually dropped from Facebook's databases.
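A minimal sketch of that explanation, assuming each account stores the email addresses its owner uploaded through a contact importer; the data layout and the `suggest_friends` name are hypothetical. Deleting an account removes its own data, but other users' imported contacts still mention the address, so a new signup with the same email can be matched back to them.

```python
from __future__ import annotations


def suggest_friends(new_email: str, imported_contacts: dict[str, set[str]]) -> list[str]:
    """Return users whose imported contact lists include the new signup's email.

    imported_contacts maps a user name to the set of email addresses that user
    uploaded via a contact importer (hypothetical layout).
    """
    needle = new_email.strip().lower()
    return sorted(user for user, emails in imported_contacts.items()
                  if needle in {e.lower() for e in emails})
```

No data from the deleted account itself is needed for this to work; the match comes entirely from other people's records.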

I've had similar experiences as well. I don't believe a word of Facebook's stated policies. Their employees defending them here is even more laughable. We've sold our souls to the devil. How did we ever get in this mess?

Maybe that's because your interactions with this account weren't deleted. Maybe you were still mentioned on walls, or people you chatted with still had the message history. That's just a guess, I don't work at Facebook, but I think even if they delete everything, they won't delete every piece of data you may have left on other people's profiles.

I honestly don't believe that. (How could you or Facebook prove that?)

I deleted my account a few months ago. However, I have no way of verifying whether all that data is gone for good, overwritten with some new person's data to sell to marketers.

But the reason I deleted my Facebook account is because I just don't trust Facebook.

One way to 'prove' that would be if Facebook supported such claims in legal documents or its terms and conditions. Technically that's not really proof, but I would accept it.

Maybe it's time to institute outside audits on these apps. That would be a useful direction for something like the BBB to move into if they want to get out of the extortion business.

So removing individual status updates, comments or messages does not really delete them from the database, only marks them as removed?

This is what really matters to me. If they don't delete messages or posts while I still have an account, then who cares? But if I decide to close my account permanently, which I did yesterday, why should they keep all the data they have on me? It's no help to me and not convenient in any way. It seems like the only reason they would keep it is for their own use.

There is another reason that is stated and restated, including elsewhere in this thread: it's not trivial to (actually) erase anything.

Wow, has anyone here ever set multiple cookies? People are blowing this up big time. Facebook sets multiple cookies: one for an active user session, and another token that verifies the user has previously logged into Facebook, so they don't need to answer extra security questions.

Who else does this? Major banks, forum software, etc. It's a common technique. All that matters is what Facebook actually does with the data and what their privacy policy says, just as the engineer stated.

If you're paranoid, either don't use Facebook or clear your cookies after you log out. Don't you just love simple solutions?
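The two-cookie arrangement described above might look roughly like this. The cookie names `session` and `device` and both handler functions are hypothetical, not Facebook's actual cookies. Logging out drops the session cookie, while the long-lived device token survives so the next login from the same browser can skip the extra identity checks.

```python
from __future__ import annotations

import secrets


def login(jar: dict[str, str], known_devices: set[str]) -> bool:
    """Log in and return True if extra verification would be needed."""
    device = jar.get("device")  # hypothetical long-lived device token
    needs_check = device is None or device not in known_devices
    if needs_check:
        # First login from this browser: issue a new device token.
        device = secrets.token_hex(16)
        known_devices.add(device)
        jar["device"] = device
    jar["session"] = secrets.token_hex(16)  # hypothetical short-lived session cookie
    return needs_check


def logout(jar: dict[str, str]) -> None:
    # Only the session is cleared; the device token stays behind.
    jar.pop("session", None)
```

The catch discussed in this thread is that the surviving `device` value is still an identifier, and the browser sends it with every request to the issuing domain.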

It's different because banks don't have "like" buttons that track your browsing across the web.

The cookies are something of a red herring when you consider how insignificant they are compared to other methods of tracking.

They don't need a cookie in place to receive the IP of whoever loads a page with a Facebook 'like' button on it.

They're a big enough company with smart enough people to develop algorithms that can associate an IP address with a user account with at least 95% confidence. They've got everything you type into your profile and everything you've shared to aid that, and the more you use your account the better they can predict.

To that end I'd be surprised if they don't continue to track 'deactivated' Facebook accounts. Not in anticipation of you going back to it, of course.

Beyond IP tracking, the EFF's Panopticlick website demonstrates how much uniquely identifying information is exposed by a browser's User-Agent and by system configuration values accessible from JavaScript and Flash (such as screen size, locale, and installed fonts). For example, my browser's fingerprint is unique among the 1.7M browsers the EFF has tested to date.
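To make the fingerprinting point concrete: a fingerprint is just a stable hash over attributes the browser readily exposes, and Panopticlick reports identifiability in bits, where being unique among N tested browsers corresponds to about log2(N) bits. The attribute names below are illustrative and the 1.7M figure is taken from the comment; this is a sketch, not EFF's code.

```python
import hashlib
import math


def fingerprint(attrs: dict) -> str:
    """Hash sorted browser attributes into a stable identifier (illustrative)."""
    canon = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


def surprisal_bits(matching: int, total: int) -> float:
    """Bits of identifying information when `matching` of `total` browsers share a fingerprint."""
    return math.log2(total / matching)


# Hypothetical attribute values of the kind Panopticlick collects.
example = {"user_agent": "Mozilla/5.0 (X11; Linux x86_64)", "screen": "1920x1080",
           "timezone": "UTC+1", "fonts": "DejaVu Sans,Liberation Serif"}
```

Being unique among 1.7 million browsers works out to roughly 20.7 bits, enough to pick that one browser out of the whole tested population without any cookie.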


Tracking by IP is a ridiculous idea. My mobile phone provider uses transparent proxying for its mobile Internet, so I must share the same external IP with thousands of other people when I browse the web on my phone. Not to mention that households using NAT will have three or more accounts behind one IP, let alone businesses with hundreds.

Internet-facing IP simply isn't unique enough for these purposes.

My inclination is to agree with you; the IP is hardly a unique identifier. But they don't need perfection. Think about it: most people, most of the time, will send requests to FB from just a few IPs and maybe one ISP proxy network (which FB can recognize as a proxy.) They know that your account is associated with these IPs based on tracking cookies. So, when they see a request from one of these IPs without the cookie, they can do a reverse lookup to get a list of possible accounts. That narrows the field. Next they can do a semantic analysis of the page that had the Like button which sent the request, and compare that to pages previously associated with the possible accounts. If one of them stands out as a likely match, they can be pretty sure who sent the request.

The more data they gather, and the more relationships they can record between you, your friends, and the pages you visit, the better they will get at tracking you without the cookies.
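The inference described above can be sketched in a few lines: look up which accounts have previously been seen on the request's IP, then score each candidate by topic overlap between the current page and pages already tied to that account. Everything here is hypothetical (the data layout, Jaccard similarity standing in for "semantic analysis", and the 0.5 threshold); it illustrates the commenter's speculation, not anything Facebook is known to run.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a or b else 0.0


def guess_account(ip: str, page_topics: set[str],
                  ip_history: dict[str, set[str]],
                  account_topics: dict[str, set[str]],
                  threshold: float = 0.5) -> str | None:
    """Guess which known account sent a cookieless request, or None if unsure."""
    candidates = ip_history.get(ip, set())  # accounts previously seen on this IP
    scored = sorted(((jaccard(page_topics, account_topics.get(acct, set())), acct)
                     for acct in candidates), reverse=True)
    if not scored or scored[0][0] < threshold:
        return None
    # Require a clear winner: a tie with the runner-up means no guess.
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None
    return scored[0][1]
```

The NAT and mobile-proxy objections elsewhere in the thread show up here as large candidate sets, which make a confident single match less likely.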

It's an interesting idea in theory, but I honestly think that the number of people who care enough about privacy to want to log out (or otherwise stop the cookies from being sent to Facebook) would be so low that it wouldn't be cost-effective. My guess is that it would probably be confined to HN's demographic.

The sort of zeroing-in on individuals based on traits/information, however, does kind of remind me of this: http://adage.com/article/digitalnext/target-a-facebook-ad-a-... - not really relevant, but still kind of cool.

Tracking by IP is pretty useless with so many people on phones, AOL, etc. Plus, with multiple accounts per workplace it just doesn't work...

> Bejar said Facebook is looking at ways to avoid sending the data altogether but that it will “take a while.”

Maybe I'm naive, but why would turning off the gathering of information take a while? This reminds me of unsubscribing from email newsletters, where the final goodbye says something like "you should stop receiving our emails within 6-8 weeks."

Any code changes take a non-trivial amount of time. It sounds like the solution is to delete more of the cookies on logout, but there may be other Facebook services that use them and need to be transitioned away.

>Any code changes take a non-trivial amount of time

That's an awfully cautious attitude, and it smells like a huge cop-out given the well-known fly-by-the-seat-of-your-pants, commit-to-live strategy that Facebook has.

I don't work there, but where I work we deploy 10-20 times a day and if somebody asked me to change the way we store data in cookies, it would probably take a bit of time to roll out.

I'm only defending them because it annoys me when people who aren't familiar with the software internals tell me "this is a minor change, it should take you less than an hour".

To be fair, though, not doing something is a lot easier to implement than adding new functionality. As a minimal fix they could err on the safe side and stop tracking everybody for a bit until they've corrected their error.

Whatever you've read, Facebook likely has a non-trivial push strategy, just like everyone else. Nobody at their HQ is committing directly to the live site.

Facebook's Release Engineering blog says a code change can go from commit to live in less than 60 minutes. Admittedly, they don't say how often they deploy.


Code changes are easy at facebook. Messing with domains/cookies/security/static-resources/etc is more than a code change.

Well first you have to create the other company to spin off the tracking to ...

Clearly, "Move fast and break things" doesn't apply when it benefits anyone other than the corporation.

I wonder if that involves something like collecting the data and storing it locally on your computer, then only sending the data once you log into facebook...

That's not how cookies work. Visit any page that loads the Facebook Like widget (iframe/img/script) -> your browser makes a request to Facebook with your cookie.

I was thinking more along the lines of a local store, but then you'd need a little script embedded into every page to handle the storage.

Essentially, instead of FB like widget -> request to facebook I would think FB like widget -> add to local datastore.

Then FB could run an optimised/aggregated query on the local database. The only drawback would be the large latency it introduces into the resulting data if it's sent back only on FB login.
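A sketch of the buffering idea, assuming a hypothetical widget script with access to some local store: impressions are counted per URL locally and sent as one aggregated batch at the next login. The `LocalBuffer` class is illustrative only; nothing in the thread suggests Facebook works this way, and the trade-off is the delay until the next login.

```python
from __future__ import annotations

from collections import Counter


class LocalBuffer:
    """Hypothetical client-side store for Like-widget impressions."""

    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()

    def record(self, page_url: str) -> None:
        # Instead of a network request per widget load, bump a local counter.
        self.counts[page_url] += 1

    def flush_on_login(self) -> dict[str, int]:
        # One aggregated payload, sent only when the user next logs in.
        batch = dict(self.counts)
        self.counts.clear()
        return batch
```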

That's a micro/premature optimization at facebook's scale.

Well, he didn't say that turning it off would take a while. He said looking at ways to do it would take a while. Given such weaselese, I'm not entirely sure they would ever get close to actually turning it off.

> The company says the data is sent because of the way the “Like” button system is set up; any cookies that are associated with Facebook.com will automatically get sent when you view a “Like” button.

They have a point. This is going to be the same for any site that has static content served elsewhere with cookies attached to the domain. Hot link to an image on my blog you commented on? OFFLINE DATA GATHERING ZOMG.

They don't really have a point; cookies are tied to a specific domain or subdomain. If they really wanted to, they could easily serve the Like button from a separate Facebook subdomain when the user isn't logged in, so that the cookies associated with the user's login don't get sent.

They don't really want to.
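The scoping rule this argument relies on is the cookie domain-match check from RFC 6265: a cookie set without a Domain attribute is host-only and goes back only to the exact host that set it, while one with Domain=facebook.com goes to every subdomain too. Below is a simplified sketch of that check (it ignores paths, the Secure flag, IP addresses, and public-suffix handling); the host names are just examples.

```python
from __future__ import annotations


def domain_match(host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match (no IP-address or public-suffix checks)."""
    host, cookie_domain = host.lower(), cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


def cookie_sent(request_host: str, set_by_host: str, domain_attr: str | None) -> bool:
    """Would the browser attach this cookie to a request for request_host?"""
    if domain_attr is None:
        # Host-only cookie: sent back only to the exact host that set it.
        return request_host.lower() == set_by_host.lower()
    return domain_match(request_host, domain_attr)
```

So a login cookie set host-only on www.facebook.com would not accompany requests to a separate widget host, which is the separation the comment proposes, whereas a cookie scoped to the whole facebook.com domain would. Note that the request goes out whatever the resource type (image, iframe, or script).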

How would whatever system that does this discover that the user is or is not logged into Facebook? The javascript portion doesn't have access to cross-domain cookies, so that won't work. Anything else requires connecting to a domain such that cookies are passed on so that it could discover whether the user is logged in or out before passing it to a subdomain.

(I work at Facebook, but not on this.)

Hmm? Without pretty specific knowledge of the problems Facebook is trying to solve with its current code, I clearly can't offer a solution that resolves them all.

However, if one of the problems they wanted to solve was 'we don't want to track user data unless they are logged in', they would have solved it by now.

The fact that they haven't means either (a) they just haven't thought about it or (b) they have thought about it, but do not want to solve it.

The purpose of the social plugins is to provide social context - telling you which of your friends has liked something, or that you are the first.

To do this, it needs to know who you are if you are a Facebook user that has not logged out. To do that, it needs to check the cookie that the Facebook web site sets when you are logged in.

Unfortunately, the web as it stands doesn't allow this interaction without divulging some information (time/date, browser, IP address, &c.) when the only interesting thing is who you are if you happen to be logged in.

This is the same problem that web analytics, certain comment systems, other social buttons, and other embedded functionality systems face.

About the best that can be done is explain what happens with that data when it is received - and that is explained at https://www.facebook.com/help/?faq=186325668085084

Dude, if I am a Facebook user who has not logged out, they can send cookies as much as they like.

The browser manages this: if they are logged in, set a cookie that will be sent to the hypothetical 'like' subdomain of Facebook; if they are logged out, remove the cookie.

This kind of functionality is really not rocket science, there are dozens of ways to implement it and I feel kind of stupid talking about it.

There are reasons for facebook not doing this, but they are not technical ones.
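A minimal sketch of that proposal using Python's standard `http.cookies` module: responses from a dedicated widget host set a marker cookie at login and overwrite it with an expired one at logout, so the browser drops it. The `like_uid` cookie name and the idea that the login flow makes a request to the widget host are hypothetical assumptions.

```python
from __future__ import annotations

from http.cookies import SimpleCookie


def login_headers(user_id: str) -> list[str]:
    """Set-Cookie headers a response from the (hypothetical) widget host sends at login."""
    c = SimpleCookie()
    c["like_uid"] = user_id
    c["like_uid"]["path"] = "/"
    c["like_uid"]["secure"] = True
    c["like_uid"]["httponly"] = True
    # No Domain attribute, so the cookie is host-only for the widget host.
    return [m.OutputString() for m in c.values()]


def logout_headers() -> list[str]:
    """Set-Cookie header that makes the browser discard the cookie immediately."""
    c = SimpleCookie()
    c["like_uid"] = ""
    c["like_uid"]["path"] = "/"
    c["like_uid"]["max-age"] = 0
    return [m.OutputString() for m in c.values()]
```

Because the cookie carries no Domain attribute, it never rides along with requests to the main site, and after logout the widget host receives no `like_uid` at all. One wrinkle: under RFC 6265 a response from www cannot set a cookie for a sibling host, so the login and logout flows would have to hit the widget host itself.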

If they deleted the relevant cookies on logout then the problem would go away - I believe that's the crux of the issue, certainly for me anyway. Besides which, your hot-linking analogy, while technically correct, falls down when you consider just how many sites have a 'Like' button on them, compared with how many sites are hot-linking you - and more importantly how many visitors those sites get. It's hardly an equivalent scenario.

Hotlink an image? That's not how the Like button works. It's more like linking to an IFRAME with its own JavaScript. Slight difference.

Hardly. Hotlinking an image can send an FB cookie too, and that's all you need.

The JS can't break out of its frame, so that doesn't really matter. The cookie comes with the request (image, HTML, or JS, it doesn't matter).

Gotcha. Interesting.

Thanks to both of you!

The difference is that that's not linked to your FB account and friends network / social graph.

Not that third-party cookies aren't a big privacy issue, but this goes one step further.

Another good reason to run something like ShareMeNot [1] - it blocks Facebook from receiving anything unless you specifically click on a 'Like' button.

[1] http://sharemenot.cs.washington.edu/

"And earlier this year, Facebook discontinued the practice of obtaining browsing data about Internet users who had never visited Facebook.com, after it was disclosed by Dutch researcher Arnold Roosendaal."

I'm going to trust my gut on this one. I just get an uneasy feeling from their track record of 'mishaps' and the excuses that follow. There are a lot of stories that don't get enough attention or make enough people think...

Facebook might be called BigBrotherBlue when people look back one day. BigBrotherBlue is always watching.

How anyone from Facebook could make those statements with a straight face is beyond me. In my opinion, Facebook has a serious credibility problem.

How about if browsers implemented this cookie system: Each time a cookie is set, you could have the ability to mandate when that cookie is sent out. For example with a Facebook cookie you could tell the browser to only send that cookie when your address bar reads facebook.com. Problem solved?

They do... it's called disabling third-party cookies.

No, that's for setting cookies, so that website A can't set cookies on your machine while you're visiting website B.

What I'm talking about is the ability to limit when cookies are sent out with requests. Privacy-wary users could set their browser so that, for example, Facebook cookies are not sent to Facebook just because you're visiting a website that has code from Facebook on it, but only when you're actually browsing Facebook.
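The proposed rule amounts to a per-cookie flag checked against the page in the address bar. A rough sketch follows; the `first_party_only` flag, the `fb_session` cookie name, and the `should_attach` helper are hypothetical, and "same site" is simplified to "same last two labels", which is wrong for domains like example.co.uk. Browsers later added a control similar in spirit, the SameSite cookie attribute.

```python
from __future__ import annotations

from dataclasses import dataclass


def site_of(host: str) -> str:
    # Naive "site": last two labels. Real browsers use the Public Suffix List.
    return ".".join(host.lower().split(".")[-2:])


@dataclass
class Cookie:
    name: str
    domain: str
    first_party_only: bool = False  # hypothetical user-controlled flag


def should_attach(cookie: Cookie, request_host: str, address_bar_host: str) -> bool:
    """Attach only if the domain matches and, when flagged, the top-level page is the same site."""
    host = request_host.lower()
    dom = cookie.domain.lower().lstrip(".")
    if not (host == dom or host.endswith("." + dom)):
        return False
    if cookie.first_party_only:
        return site_of(address_bar_host) == site_of(request_host)
    return True
```

With the flag set, an embedded Like button on a news site triggers a request to Facebook without the cookie, while browsing facebook.com itself still sends it.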

This is a great example of the inherent conflict of interest when your users are not your customers, in fact they're your product.


Please don't take this personally, but the whole "you're the product" meme, while it has a shred of truth in it, has been so re-hashed on the net that it's no longer pithy or informative. Just google for "you're the product" and you'll see what I mean.

I don't take it personally at all. The meme is common among people familiar with the internal workings of consumer web business models. My concern is that it's not well understood outside that group. I also think it's an interesting way to view the rollout of Facebook's new features and the public reaction to them.

Isn't there a way to run specific web applications, like Facebook, in a virtual sandbox? I.e. storing its cookies separately from other apps, launching new unrelated browser instance if you browse facebook from/to some other site etc.?

If you want to stop pushing tracking data to Facebook from your machine, just add a local redirect in your hosts file for facebook.com to map to and comment it out when you want to use the Facebook site :)

The real winner here is Google. Facebook makes Google look good. And that's pretty sad. When your users are logged out you have zero business tracking them or trying to do so.

I'd be interested to see how many competing social networks exhibit the same behavior. Specifically, Twitter and Google+ have similar social buttons.

Imagine I wanted to do this but not get caught. What would you improve? Clearly the cookies will need to look different before and after logout, but how different?

Why do the cookies need to exist? If I log out from your service, why do you need to keep a cookie on my computer?

Hell, Hacker News leaves a cookie on your computer after you log out with some opaque blob holding who-knows-what. Users like to complain about cookies when you bring them up, but generally can't seem to bother. Including the two of us.

Hacker News doesn't have like buttons or other widgets all over the Internet...

As stated in the article, it's so that when you log in again from the same computer, they don't have to do the whole two-factor "I've never seen this computer before" text message handshake with you.

I would remain suspicious if there was any identifying or unique information in cookies after logout. Ideally, logout should delete all cookies.

I already pointed out that HN leaves a cookie behind in another comment, so here's a different tack: is there a site on the first page of http://www.alexa.com/topsites that actually leaves no cookies behind when you logout?

A major faux pas like leaving your uid in the clear in the cookie after logout certainly seems to bother us, but I don't think users (even savvy users) care about leaving some cookies behind. For the record, I've installed various opt-out browser extensions in the past (only to switch computers/browsers and forget to bring them along)--I don't think my views are pro-cookie or even moderate.

> I don't think users (even savvy users) care about leaving some cookies behind.

In most contexts, that is true. A Slashdot cookie is just a line in a text file until you visit Slashdot. But a Facebook cookie is sent home every time you visit a page with any FB spam on it.

The mysql.com malware is trivial. Hitting Facebook would get most everyone, users and not.
