I work at Facebook, but this is not an official media statement.
This doesn't appear to be a Facebook bug that leaks anyone's private email address. It appears that all the examples indexed in Google exist in Google because they were already published publicly on other Internet sites. We're committing a fix right now to stop indexing this page in Google, but even that wouldn't prevent email addresses from being published because it appears that users are already republishing their addresses in other non-Facebook venues.
Although his email opt-out link has been edited out of that email (see "please clic to unsubscribe." at the bottom), it appears that the email was originally published to Posterous in its entirety via his iPhone and then edited later, after it was picked up by Google.
(Update: The author of the original blog post acknowledges this below.)
Good point. That blog post has a link "to sign up for facebook", and when you click on it, it reveals the gmail address of the user (which is something @gmail.com). At least that answers the question of how Google found it.
There is no way for Google to find those pages unless those pages are linked from somewhere else or Google is getting URLs from users' browsing activity. The first scenario makes this relatively harmless. The second makes it a privacy issue on Google's end.
I understand that the page has to be public, but why include the email address there? The person who received the link knows which email address it applies to. None of the opt-out/unsubscribe pages I've used displayed my email.
I can debunk the misconception that Google somehow crawled "private" or Gmail content to discover these links. How can I prove it? Because Yahoo crawled these pages too. Here's a screenshot I took of Yahoo returning similar pages, including a Gmail address: http://www.mattcutts.com/images/yahoo-facebook-leak.png. Yahoo clearly didn't discover that content via Gmail; it found it via public links on the public web. That's how Google found these pages too.
Isn't it a bit more concerning that we can modify these people's settings by clicking the links? (At least it appears we can.)
I haven't checked, but there may be a way to modify the URL string to view anyone's email (sample URL taken from that guy's post below).
It would probably be nearly impossible to view the email for a specific account or any random account. The mid value can be removed entirely (it seems), and I'm guessing u is a user ID(?), but you would have to figure out the value of k that matches the ID.
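To illustrate why guessing k could be hard: a minimal sketch, assuming (hypothetically) that k is a keyed hash (HMAC) of the user ID computed with a secret only the server knows. This is not Facebook's actual scheme, which isn't known from this thread; the secret, the example.com URL and the function names are made up for illustration, and mid is left out since it apparently isn't needed.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical server-side secret; a real service would keep this private.
SECRET_KEY = b"example-secret-not-facebook"


def sign_user_id(user_id: str) -> str:
    """Derive a 'k' value as a truncated HMAC-SHA256 of the user ID (assumed scheme)."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]


def make_opt_out_url(user_id: str) -> str:
    """Build a hypothetical opt-out link carrying u (user ID) and k (signature)."""
    query = urlencode({"u": user_id, "k": sign_user_id(user_id)})
    return "https://example.com/optout?" + query


def verify_opt_out_url(url: str) -> bool:
    """Accept the link only if k matches the HMAC of u under the server's secret."""
    params = parse_qs(urlparse(url).query)
    user_id = params.get("u", [""])[0]
    key = params.get("k", [""])[0]
    # Constant-time comparison on bytes avoids leaking how many characters matched.
    return hmac.compare_digest(key.encode(), sign_user_id(user_id).encode())


url = make_opt_out_url("12345")
print(verify_opt_out_url(url))                                  # genuine link
print(verify_opt_out_url(url.replace("u=12345", "u=12346")))    # edited user ID
```

Under this assumption, editing u to another ID leaves a k that no longer matches, and producing a matching k requires the secret. If k were instead something guessable (say, a plain unkeyed hash of u), the attack speculated about above would work.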
You're on to something here: a link that automatically logs you out. Could we embed this link in a post on Facebook, so that anyone who visits the wall gets logged out? It would be the ultimate annoyance.
So, how did Google become aware of the existence of these URLs in the first place? I seriously doubt they're linked from another Facebook page.
Is Google harvesting links from secure pages using their toolbar or something? Are people's personal mails leaking through other means?
I noticed that a few of them are indexed by Google because someone decided to reprint the email (URLs and all) on their blog. But that's a rare exception, and obviously not the case with the author of the article.
It's a pretty bad practice. I guess for semi-private material it's okay, but otherwise that's liable to leak. The biggest hole in it is referrers leaking to other sites, but I can think of other methods (e.g. probing the CSS history exposure bug).
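A minimal sketch of the referrer leak, assuming a hypothetical opt-out URL that carries u, k and mid in its query string: when a user followed an outbound link from such a page, browsers of that era sent the full URL in the Referer header, so the receiving site could read the parameters straight out of its request logs. (Current browsers default to a Referrer-Policy of strict-origin-when-cross-origin, which trims cross-origin referrers to the origin.) The host names and parameter values below are made up.

```python
from urllib.parse import parse_qs, urlparse


def params_leaked_via_referer(headers: dict) -> dict:
    """Return the query parameters a third-party site can read from the Referer header."""
    referer = headers.get("Referer", "")
    return {name: values[0] for name, values in parse_qs(urlparse(referer).query).items()}


# Hypothetical request a third-party server might receive after a user clicks
# an outbound link on an opt-out page whose URL carries u/k/mid parameters.
incoming = {
    "Host": "third-party.example",
    "Referer": "https://social.example/optout?u=12345&k=abc123&mid=xyz",
}
print(params_leaked_via_referer(incoming))
# prints {'u': '12345', 'k': 'abc123', 'mid': 'xyz'}
```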
URLs and embedded information
Some of our services, including Google Toolbar and Google Web Accelerator, send the uniform resource locators (“URLs”) of web pages that you request to Google. When you use these services, Google will receive and store the URL sent by the web sites you visit, including any personal information inserted into those URLs by the web site operator. Some Google services (such as Google Toolbar) enable you to opt-in or opt-out of sending URLs to Google, while for others (such as Google Web Accelerator) the sending of URLs to Google is intrinsic to the service. When you sign up for any such service, you will be informed clearly that the service sends URLs to Google, and whether and how you can opt-in or opt-out.
For example, when you submit information to a web page (such as a user login ID or registration information), the operator of that web site may “embed” that information – including personal information – into its URL (typically, after a question mark (“?”) in the URL). When the URL is transmitted to Google, our servers automatically store the URL, including any personal information that has been embedded after the question mark. Google does not exercise any control over these web sites or whether they embed personal information into URLs.
We process your requests in order to operate and improve the Google Toolbar and other Google services. For example, by knowing which web page you are viewing, the PageRank feature of Google Toolbar can show you Google's ranking of that web page. And the Sidewiki feature can tell you if others have written Sidewiki entries on a given page. Likewise, by processing the text on a web page, SpellCheck can offer spelling suggestions and AutoLink can provide useful links to information.
If you look at http://facebook.com/robots.txt there's actually a sitemap. It's protected, and may well not contain these pages, but I have heard that Google and other privileged partners have access to it.
In what situation would the HTTP referrer header report Google when clicking away from a Facebook link? Not only is there no Google search field on Facebook, but Facebook also reroutes links through an intermediary page in order to hide the actual referring page.
I wonder why we're seeing more stories about Facebook privacy issues recently. Is it because the coverage has made people start questioning Facebook's policies and looking into things that hadn't been researched before, or is Facebook becoming more lackadaisical about privacy as time passes? Or have these things been said a lot in the past, and we're only now noticing the plethora of complaints?
It's a self-reinforcing problem: I imagine most startups that grow as fast as Facebook don't have a perfectly engineered architecture, simply because they're desperately optimizing their products for time-to-market. When the "Facebook has security and privacy issues, and might be evil" meme gets around, people start looking for more weaknesses and quickly find them. From my limited experience in startups, none would be immune to such scrutiny.
Facebook is at a very challenging point right now: they need strong short-term PR, very fast patching up of issues as they arise, cleaning out the tech debt they acquired in their youth, and most importantly, battling off the fierce competitors of all stripes and colors: from Google to Twitter to foursquare. Will they be able to do it? Based mostly on their ability to capture as large a market as they did and their fast attention to issues as they arise, I think so: as long as they don't piss off their users enough for them to leave en masse (and they haven't - most of their users don't care about the privacy issues we go crazy over), they just need to hold on and keep doing more or less what they've been doing.
I imagine most startups that grow as fast as Facebook don't have a perfectly engineered architecture, simply because they're desperately optimizing their products for time-to-market. When the "Facebook has security and privacy issues, and might be evil" meme gets around, people start looking for more weaknesses and quickly find them.
This sounds like by far the most likely explanation for most of the Facebook problems. The Facebook non-system has become sufficiently internally complicated that no one person understands it anymore, which makes it easy to overlook security weaknesses. The systematic attempts to "share" information on the part of Facebook's leadership have been quite annoying, and have made me MUCH more circumspect in my Facebook behavior, but all the rest of this is just sheer inadvertence.
Because there have been a lot of legitimate Facebook privacy issues recently?
When the world's second largest site allows you to monitor the private chats of 400 million people - when it leaks your IP address to other people via e-mail for no reason - and now when it's leaking people's e-mail, then it's a big deal.
That's not even taking into consideration the privacy issues that were not bugs.
I imagine some people are downvoting this story too (if that is even possible). However, a new instance of privacy violation is a lot more important than an HN reader's personal decision to cancel his Facebook account. The quickness with which people tire of one type of information versus the other should scale accordingly.