I work at Facebook, but this is not an official media statement.
This doesn't appear to be a Facebook bug that leaks anyone's private email address. It appears that all the examples indexed in Google exist in Google because they were already published publicly on other Internet sites. We're committing a fix right now to stop indexing this page in Google, but even that wouldn't prevent email addresses from being published because it appears that users are already republishing their addresses in other non-Facebook venues.
Although his email opt-out link has been edited out of that email (see "please clic to unsubscribe." at the bottom), it appears that the email was originally published to Posterous in its entirety via his iPhone and then edited later, after it was picked up by Google.
(Update: The author of the original blog post acknowledges this below.)
Good point. That blog post has a link "to sign up for facebook", and when you click on it, it reveals the gmail address of the user (which is something @gmail.com). At least that answers the question of how Google found it.
Thanks for following up. My personal email address is indexed, but is nowhere else on the internet. Shoot me an email at hi@corywatilo.com and I'll give you the address.
It wouldn't have mattered. The user republished a Facebook email on a non-Facebook venue (such as a public mailing list archive), thus exposing their email address. See, for instance, http://groups.yahoo.com/group/salemwhitemagic/message/89
There is no way for Google to find those pages unless those pages are linked from somewhere else or Google is getting URLs from users' browsing activity. The first scenario makes this relatively harmless. The second makes it a privacy issue on Google's end.
That page has to be publicly accessible. It is sent to users without Facebook accounts. It doesn't need to be indexed, but it's not the huge bug people are making it out to be. It's a tiny bug.
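For what it's worth, the usual way to keep a page public but out of search results is a `noindex` directive, sent as an `X-Robots-Tag` response header (or an equivalent robots meta tag). A minimal sketch, assuming a made-up handler named `optout_response`; this is not Facebook's actual fix, just an illustration of the general technique:

```python
from typing import Dict, Tuple


def optout_response(body_html: str) -> Tuple[int, Dict[str, str], str]:
    """Build a (status, headers, body) triple for a public opt-out page.

    Hypothetical handler: the page stays reachable by anyone who holds the
    link, but it asks crawlers not to index it.
    """
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # Asks compliant crawlers not to index the page or keep a cached copy.
        "X-Robots-Tag": "noindex, noarchive",
    }
    return 200, headers, body_html


status, headers, body = optout_response("<p>Opt out of emails</p>")
print(status, headers["X-Robots-Tag"])
```

Note that a crawler has to be able to fetch the page to see the directive, so this only works if the page is not also blocked from crawling.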
I understand that the page has to be public, but why include the email address there? The person who received the link knows what email address it applies to. Every opt out/unsubscribe page I've used didn't display my email.
> The person who received the link knows what email address it applies to.
No they don't. Plenty of people have multiple email addresses forward to one account. Showing the email address helps, and it's harmless unless people post the link somewhere else on the internet.
Google is indexing Facebook's "Opt out of emails from Facebook" page for email addresses that were submitted using the "Find a friend" feature.
I checked out the Google site and saw a few addresses in the format name.secret@blogger.com, which indicates these are the SECRET email addresses people use to post to their blogger sites. Pretty bad.
I can debunk the misconception that Google somehow crawled "private" or Gmail content to discover these links. How can I prove it? Because Yahoo crawled these pages too. Here's a screenshot I took of Yahoo returning similar pages: http://www.mattcutts.com/images/yahoo-facebook-leak.png including a Gmail address. Yahoo clearly didn't discover that content via Gmail--it found it via public links on the public web. That's how Google found these pages too.
I wonder why we are seeing more reports of Facebook privacy issues recently. Is it because the coverage has made people start questioning Facebook's policies and looking into things that had not been researched, or is Facebook becoming more lackadaisical about privacy as time passes? Or have these things been said a lot in the past, and we are just now noticing the plethora of complaints?
It's a self-reinforcing problem: I imagine most startups that grow as fast as Facebook don't have a perfectly engineered architecture, simply because they're desperately optimizing their products for time-to-market. When the "Facebook has security and privacy issues, and might be evil" meme gets around, people start looking for more weaknesses and quickly find them. From my limited experience in startups, none would be immune to such scrutiny.
Facebook is at a very challenging point right now: they need strong short-term PR, very fast patching up of issues as they arise, cleaning out the tech debt they acquired in their youth, and most importantly, battling off the fierce competitors of all stripes and colors: from Google to Twitter to foursquare. Will they be able to do it? Based mostly on their ability to capture as large a market as they did and their fast attention to issues as they arise, I think so: as long as they don't piss off their users enough for them to leave en masse (and they haven't - most of their users don't care about the privacy issues we go crazy over), they just need to hold on and keep doing more or less what they've been doing.
> I imagine most startups that grow as fast as Facebook don't have a perfectly engineered architecture, simply because they're desperately optimizing their products for time-to-market. When the "Facebook has security and privacy issues, and might be evil" meme gets around, people start looking for more weaknesses and quickly find them.
This sounds like by far the most likely explanation for most of the Facebook problems. The Facebook non-system has become sufficiently internally complicated that no one person understands it anymore, which makes it easy to overlook security weaknesses. The systematic attempts to "share" information on the part of Facebook's leadership have been quite annoying, and have made me MUCH more circumspect in my Facebook behavior, but all the rest of this is just sheer inadvertence.
Because there have been a lot of legitimate Facebook privacy issues recently?
When the world's second largest site allows you to monitor the private chats of 400 million people - when it leaks your IP address to other people via e-mail for no reason - and now when it's leaking people's e-mail, then it's a big deal.
That's not even taking into consideration the privacy issues that were not bugs.
I still have these mails in my inbox. People who I've never met nor conversed with online in my life.
And whether it's sensitive depends on the context. Revealing one's IP reveals their location, and I can think of many cases in which this is sensitive information.
All those things are communication. Comment threads leaking IP addresses is slightly different from email in that you can only reply all, but it's not that huge of a leap.
Unless someone knows exactly how the internet works, getting their IP address via social engineering is trivial. Facebook's behavior was not a big deal, though fixing it was a positive thing.
Isn't it a bit more concerning that we can modify these people's settings by clicking the links? (At least it appears we can.)
I have not checked, but there may be a way to modify the URL string to view anyone's email (sample URL taken from that guy's post below).
Or perhaps a way to take the authentication hash and mid hashes from above to perform another function on someone else's account (like changing the email or the privacy settings).
It would probably be near impossible to view an email for a specific account or any random account. The mid value can be removed entirely (it seems) and I'm guessing u is a userid(?), but you would have to figure out the value for k that matches the id.
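To illustrate why guessing `k` would be hard: one plausible design is that `k` is a keyed hash of the user id. That is purely an assumption, since nothing in this thread confirms how Facebook derives it. Under that assumption, a rough sketch (the names `SECRET`, `make_token`, and `verify_token` are made up) looks like this:

```python
import hashlib
import hmac

# Hypothetical server-side secret; it is never sent to clients.
SECRET = b"example-secret-not-real"


def make_token(user_id: str) -> str:
    """Derive the opt-out token for a user id (assumed scheme)."""
    return hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()


def verify_token(user_id: str, token: str) -> bool:
    """Accept the link only if the token matches the id it arrived with."""
    return hmac.compare_digest(make_token(user_id), token)


k = make_token("12345")
print(verify_token("12345", k))  # True: the id the link was issued for
print(verify_token("12346", k))  # False: a different id with the same token
```

With a scheme like this, changing the id in the URL without knowing the secret just produces a link the server rejects.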
You are on to something here: a link that automatically logs you out. Can we embed this link in a post on Facebook, thereby logging people out just by visiting their wall? It would be the ultimate annoyance.
The worst part is that if you just try to log in again at the prompt (instead of going to facebook.com), you get redirected back to the post and logged out again in a loop.
So, how did Google become aware of the existence of these URLs in the first place? I seriously doubt they're linked from another Facebook page.
Is Google harvesting links from secure pages using their toolbar or something? Are people's personal mails leaking through other means?
I noticed that a few of them are indexed by Google because someone decided to reprint the email, URLs and all, on their blog. But that's a rare exception, and obviously not the case with the author of the article.
Google has multiple ways of indexing, and following links is only one of them.
I suspect Google found these pages by scanning the directory structure of Facebook, indexing all pages in these subdirectories, and finding more subdirectories to drill into.
And how do they do that (scan a web site directory tree) without following links? You are describing the same "following links" procedure I think.
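A toy model of the point being argued here: a crawler that only follows links can never reach a page that nothing links to. The link graph below is made up for illustration, and real crawlers are far more involved:

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "/": ["/about", "/help"],
    "/about": ["/"],
    "/help": ["/help/privacy"],
    "/help/privacy": [],
    "/o.php?k=secret": [],  # exists on the server, but nothing links to it
}


def crawl(start: str) -> set:
    """Breadth-first crawl that discovers pages only by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = crawl("/")
print(sorted(found))
print("/o.php?k=secret" in found)  # False: no inbound link, so never found
```

So for a link-following crawler to find an opt-out URL, that URL has to appear on some page the crawler can already reach.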
Maybe somebody accessed those pages using the Google toolbar; in that case, Google could have known about those pages even if they were not linked from any other pages elsewhere.
If Google is using their toolbar and/or Chrome to report private URLs back to Google for indexing, then that's a much more serious privacy violation than what's being reported here.
It's a pretty bad practice. I guess for semi-private material it's okay, but otherwise that's liable to leak. The biggest hole in it is referrers leaking to other sites, but I can think of other methods (eg probing the CSS history exposure bug).
The Google toolbar reports URLs back to Google if you have PageRank or a few other features turned on. Their privacy policy says that they store the URL and provide you with whatever feature you requested, but it doesn't say whether they do anything further with the data.
Relevant bits:
URLs and embedded information
Some of our services, including Google Toolbar and Google Web Accelerator, send the uniform resource locators (“URLs”) of web pages that you request to Google. When you use these services, Google will receive and store the URL sent by the web sites you visit, including any personal information inserted into those URLs by the web site operator. Some Google services (such as Google Toolbar) enable you to opt-in or opt-out of sending URLs to Google, while for others (such as Google Web Accelerator) the sending of URLs to Google is intrinsic to the service. When you sign up for any such service, you will be informed clearly that the service sends URLs to Google, and whether and how you can opt-in or opt-out.
For example, when you submit information to a web page (such as a user login ID or registration information), the operator of that web site may “embed” that information – including personal information – into its URL (typically, after a question mark (“?”) in the URL). When the URL is transmitted to Google, our servers automatically store the URL, including any personal information that has been embedded after the question mark. Google does not exercise any control over these web sites or whether they embed personal information into URLs.
We process your requests in order to operate and improve the Google Toolbar and other Google services. For example, by knowing which web page you are viewing, the PageRank feature of Google Toolbar can show you Google's ranking of that web page. And the Sidewiki feature can tell you if others have written Sidewiki entries on a given page. Likewise, by processing the text on a web page, SpellCheck can offer spelling suggestions and AutoLink can provide useful links to information.
If you look at http://facebook.com/robots.txt there's actually a sitemap. It's protected, and may well not contain these pages, but I have heard that Google and other privileged partners have access to it.
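For anyone who wants to inspect a robots.txt programmatically, Python's standard `urllib.robotparser` can list the declared sitemaps and answer "may this bot fetch that URL?". A small sketch; the file contents, the `ExampleBot` name, and the example.com URLs are made up and are not Facebook's actual rules (`site_maps()` needs Python 3.8 or later):

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt contents for illustration; not Facebook's actual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /o.php
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap URLs declared in the file.
print(parser.site_maps())
# A path under the Disallow rule, then a path no rule covers.
print(parser.can_fetch("ExampleBot", "https://example.com/o.php?k=abc"))
print(parser.can_fetch("ExampleBot", "https://example.com/help"))
```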
In what situation would the HTTP referrer header report Google when clicking away from a Facebook link? Not only is there no Google search field on Facebook, but Facebook also reroutes links through an intermediary page in order to hide the actual referring page.
I had assumed that Referer URLs were sent even if it wasn't from a link that was clicked. That's not true. (And yes, "Referer" is misspelt in the standard...)
> The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard.
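In code terms, the rule quoted above comes down to something like the sketch below. It is a simplified model of a client, not any real browser's logic, and the function name and URLs are made up:

```python
from typing import Dict, Optional


def navigation_headers(source_url: Optional[str]) -> Dict[str, str]:
    """Headers a simplified client might send when navigating to a page.

    source_url is the page containing the clicked link, or None when the
    address was typed, pasted, or opened from a bookmark.
    """
    headers = {"User-Agent": "ExampleClient/0.1"}
    if source_url is not None:
        # The fragment is not part of what gets sent in Referer.
        headers["Referer"] = source_url.split("#", 1)[0]
    return headers


# Clicking a link on a page with a secret URL exposes that URL to the target.
print(navigation_headers("https://example.com/o.php?k=abc"))
# Typing the address sends no Referer at all.
print(navigation_headers(None))
```

This is why a "private" URL can leak as soon as the page it points to contains an outbound link that someone clicks.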
I imagine some people are downvoting this story too (if that is even possible). However, a new instance of privacy violation is a lot more important than an HN reader's personal decision to cancel his Facebook account. The quickness with which people tire of one type of information versus the other should scale accordingly.
If your information is PUBLIC, what the HELL do you expect? There are so many good reasons to be irritated with Facebook's privacy debacle. This is not one of them.
I work at Facebook, but this is not an official media statement.
This doesn't appear to be a Facebook bug that leaks anyone's private email address. It appears that all the examples indexed in Google exist in Google because they were already published publicly on other Internet sites. We're committing a fix right now to stop indexing this page in Google, but even that wouldn't prevent email addresses from being published because it appears that users are already republishing their addresses in other non-Facebook venues.
For example, if you see http://www.facebook.com/o.php?k=afc4a7&u=1018862530&... in isolation, it appears to be divulging private information.
However, the original Facebook email that links to that page and contains the user's email address was republished publicly on a mailing list archive by the owner of the email address: http://games.dir.groups.yahoo.com/group/Living_Greyhawk/mess...
Does anyone see an example where this is not the case that constitutes a privacy leak?
Blake Ross