Facebook privacy issue: Google search reveals email addresses in Facebook (corywatilo.com)
141 points by a4agarwal on June 3, 2010 | 65 comments



Hey all,

I work at Facebook, but this is not an official media statement.

This doesn't appear to be a Facebook bug that leaks anyone's private email address. It appears that all the examples indexed in Google exist in Google because they were already published publicly on other Internet sites. We're committing a fix right now to stop indexing this page in Google, but even that wouldn't prevent email addresses from being published because it appears that users are already republishing their addresses in other non-Facebook venues.

For example, if you see http://www.facebook.com/o.php?k=afc4a7&u=1018862530&... in isolation, it appears to be divulging private information.

However, the original Facebook email that links to that page and contains the user's email address was republished publicly on a mailing list archive by the owner of the email address: http://games.dir.groups.yahoo.com/group/Living_Greyhawk/mess...

Does anyone see an example where this is not the case that constitutes a privacy leak?

Blake Ross


Furthermore, the author of the blog post in question republished his own Facebook email on http://corywatilo.com/reminder-cory-invited-you-to-join-face...

Although his email opt-out link has been edited out of that email (see "please clic to unsubscribe." at the bottom), it appears that the email was originally published to Posterous in its entirety via his iPhone and then edited later, after it was picked up by Google.

(Update: The author of the original blog post acknowledges this below.)


Good point. That blog post has a link "to sign up for facebook", and when you click on it, it reveals the gmail address of the user (which is something @gmail.com). At least that answers the question of how Google found it.


This is possible. +1 Blake

Would be nice for FB to exclude that page from getting indexed though.


Thanks for the acknowledgment Cory. FYI, we collaborated with Google and these pages are no longer indexed as an extra precaution:

http://www.google.com/search?q=site%3Afacebook.com%2Fo.php


Blake,

Thanks for following up. My personal email address is indexed, but is nowhere else on the internet. Shoot me an email at hi@corywatilo.com and I'll give you the address.


Hi Cory -

Can you see my response below regarding Posterous?

Also, it does appear that your email address is available on the Internet, e.g. http://groups.google.com/groups/profile?enc_user=nBw3lBQAAAA...

Thanks, Blake


Wow, I didn't even know that was out there. But, it doesn't look like Google indexes those.


Google doesn't index ... Google Groups?


The mail address is truncated, and the real information is hidden behind a captcha.


Listen, this isn't the point.

The point is that Facebook should be responsible for protecting their own indexing vectors.

Plain and simple.


Sorry, but it is a Facebook bug. Facebook could've easily instructed Google not to index these pages.


It wouldn't have mattered. The user republished a Facebook email on a non-Facebook venue (such as a public mailing list archive), thus exposing their email address. See, for instance, http://groups.yahoo.com/group/salemwhitemagic/message/89


That's a lame excuse and as corywatilo has pointed out is untrue (at least for his email... and probably others).


There is no way for Google to find those pages unless those pages are linked from somewhere else or Google is getting URLs from users' browsing activity. The first scenario makes this relatively harmless. The second makes it a privacy issue on Google's end.


Even so, why put private details on a publicly accessible page?


That page has to be publicly accessible. It is sent to users without Facebook accounts. It doesn't need to be indexed, but it's not the huge bug people are making it out to be. It's a tiny bug.


I understand that the page has to be public, but why include the email address there? The person who received the link knows what email address it applies to. Every opt out/unsubscribe page I've used didn't display my email.


> The person who received the link knows what email address it applies to.

No they don't. Plenty of people have multiple email addresses forward to one account. Showing the email address helps, and it's harmless unless people post the link somewhere else on the internet.


Maybe a better term would be oversight.


Google is indexing Facebook's "Opt out of emails from Facebook" page for email addresses that were submitted using the "Find a friend" feature.

I checked out the Google site and saw a few addresses in the format name.secret@blogger.com, which indicates these are the SECRET email addresses people use to post to their blogger sites. Pretty bad.


I can debunk the misconception that Google somehow crawled "private" or Gmail content to discover these links. How can I prove it? Because Yahoo crawled these pages too. Here's a screenshot I took of Yahoo returning similar pages: http://www.mattcutts.com/images/yahoo-facebook-leak.png including a Gmail address. Yahoo clearly didn't discover that content via Gmail--it found it via public links on the public web. That's how Google found these pages too.


With Google and Facebook working together as a 1-2 punch, I'm confident we can squash the last vestiges of privacy very soon.


Just wait until Obama buys them. Or just decides to take them over.


I wonder why we are seeing more stories about Facebook privacy issues recently. Is it because the coverage has made people start questioning Facebook's policies and looking into things that had not been researched, or is Facebook becoming more lackadaisical about privacy as time passes? Or have these things been said a lot in the past, and we are just now noticing the plethora of complaints?


It's a self-reinforcing problem: I imagine most startups that grow as fast as Facebook don't have a perfectly engineered architecture, simply because they're desperately optimizing their products for time-to-market. When the "Facebook has security and privacy issues, and might be evil" meme gets around, people start looking for more weaknesses and quickly find them. From my limited experience in startups, none would be immune to such scrutiny.

Facebook is at a very challenging point right now: they need strong short-term PR, very fast patching up of issues as they arise, cleaning out the tech debt they acquired in their youth, and most importantly, battling off the fierce competitors of all stripes and colors: from Google to Twitter to foursquare. Will they be able to do it? Based mostly on their ability to capture as large a market as they did and their fast attention to issues as they arise, I think so: as long as they don't piss off their users enough for them to leave en masse (and they haven't - most of their users don't care about the privacy issues we go crazy over), they just need to hold on and keep doing more or less what they've been doing.


> I imagine most startups that grow as fast as Facebook don't have a perfectly engineered architecture, simply because they're desperately optimizing their products for time-to-market. When the "Facebook has security and privacy issues, and might be evil" meme gets around, people start looking for more weaknesses and quickly find them.

This sounds like by far the most likely explanation for most of the Facebook problems. The Facebook non-system has become sufficiently internally complicated that no one person understands it anymore, which makes it easy to overlook security weaknesses. The systematic attempts to "share" information on the part of Facebook's leadership have been quite annoying, and have made me MUCH more circumspect in my Facebook behavior, but all the rest of this is just sheer inadvertence.


Because there have been a lot of legitimate Facebook privacy issues recently?

When the world's second largest site allows you to monitor the private chats of 400 million people - when it leaks your IP address to other people via e-mail for no reason - and now when it's leaking people's e-mail, then it's a big deal.

That's not even taking into consideration the privacy issues that were not bugs.


Facebook leaked your IP to people you communicated with, not that IPs are that sensitive in the first place. It wasn't a big deal.


No, they leaked IP addresses of people I never communicated with. See my comment here: http://news.ycombinator.com/item?id=1330041

I still have these mails in my inbox. People who I've never met nor conversed with online in my life.

And whether it's sensitive depends on the context. Revealing one's IP reveals their location, and I can think of many cases in which this is sensitive information.


All those things are communication. Comment threads leaking IP addresses is slightly different from email in that you can only reply all, but it's not that huge of a leap.

Unless someone knows exactly how the internet works, getting their IP address via social engineering is trivial. Facebook's behavior was not a big deal, though fixing it was a positive thing.


Most of the recent stories are people whining about Facebook's policies and complexity of privacy settings.

This is an actual privacy bug.


Did you report it to Facebook before publicizing it?


Isn't it a bit more concerning that we can modify these people's settings by clicking the links? (At least it appears we can.) I have not checked, but there may be a way to modify the URL string to view anyone's email (sample URL taken from that guy's post below):

http://www.facebook.com/o.php?k=16531b&u=100001103986041... (warning: clicking this link will log you out of Facebook with no prompt)

Or perhaps there is a way to take the authentication hash and mid hashes from above to perform another function on someone else's account (like changing the email or changing privacy settings).


It would probably be near impossible to view an email for a specific account or any random account. The mid value can be removed entirely (it seems) and I'm guessing u is a userid(?), but you would have to figure out the value for k that matches the id.
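To put a rough number on "figure out the value for k", here is a minimal sketch. It assumes k is always six hexadecimal digits, as in the sample URLs in this thread (e.g. k=16531b); the real format is not confirmed, and the function names are made up for illustration.

```python
# Assumption: k is exactly 6 hex digits, as in the sample URLs in this thread.
K_HEX_DIGITS = 6


def keyspace_size(hex_digits: int) -> int:
    """Number of distinct values a hex string of the given length can take."""
    return 16 ** hex_digits


def expected_guesses(hex_digits: int) -> float:
    """Average number of sequential guesses needed to hit one specific value."""
    return (keyspace_size(hex_digits) + 1) / 2


if __name__ == "__main__":
    print(keyspace_size(K_HEX_DIGITS))     # 16777216
    print(expected_guesses(K_HEX_DIGITS))  # 8388608.5
```

Under that assumption there are about 16.7 million candidates per user ID, so how hard guessing is in practice would depend largely on whether requests are rate limited, which nobody here has checked.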


You are on to something here. A link that automatically logs you out. Can we embed this link in a post on Facebook, thereby logging people out just by having them visit their wall? It would be the ultimate annoyance.


I just did this and it works.

If a user has an application such as advanced wall or super wall you can use the following to log people out.

<object data="http://www.facebook.com/o.php?k=16531b&u=100001103986041... width="0" height="0"> <embed src="http://www.facebook.com/o.php?k=16531b&u=100001103986041... width="0" height="0"> </embed></object>

The worst part is that if you just try to log in again at the prompt (instead of going to facebook.com), you get redirected back to the post and logged out again in a loop.


So, how did Google become aware of the existence of these URLs in the first place? I seriously doubt they're linked from another Facebook page.

Is Google harvesting links from secure pages using their toolbar or something? Are people's personal mails leaking through other means?

I noticed that a few of them are indexed by Google because someone decided to reprint the email, URLs and all, to their blog. But that's a rare exception, and obviously not the case with the author of the article.


Google has multiple ways of indexing, and following links is only one of them.

I suspect Google found these pages by scanning the directory structure of Facebook, indexing all pages in these subdirectories, and finding more subdirectories to drill into.


And how do they do that (scan a web site directory tree) without following links? You are describing the same "following links" procedure I think.

Maybe somebody accessed those pages using the Google toolbar; in that case Google could have known about those pages even if they were not linked from any other pages elsewhere.


I think (though I'm too lazy to confirm) that the Google Toolbar browser plugin also reports URLs back to Google.


If Google is using their toolbar and/or Chrome to report private URLs back to Google for indexing, then that's a much more serious privacy violation than what's being reported here.


A "private" URL should surely require a login or at least have a robots.txt, no?


Plenty of sites use URLs like http://example.com/sd8n9g9g rather than require logins for private / semi-private material. What's wrong with that?


It's a pretty bad practice. I guess for semi-private material it's okay, but otherwise that's liable to leak. The biggest hole in it is referrers leaking to other sites, but I can think of other methods (e.g. probing the CSS history exposure bug).


Or a noindex <meta> tag
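
For anyone who wants to see what the robots.txt option would look like, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule below is hypothetical (it is not Facebook's actual robots.txt), the URLs use example.com, and `is_crawlable` is a made-up helper name. The meta tag alternative is `<meta name="robots" content="noindex">` in the page's head.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule for illustration only; not Facebook's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /o.php
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The opt-out page is disallowed, other pages are not.
    print(is_crawlable(ROBOTS_TXT, "http://example.com/o.php?u=1&k=abc", "Googlebot"))
    print(is_crawlable(ROBOTS_TXT, "http://example.com/home.php", "Googlebot"))
```

Note that robots.txt only asks well-behaved crawlers not to fetch the page; the page itself stays publicly reachable for anyone who has the link.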


Which is why you aren't allowed to have Google Toolbar or Google Desktop installed on US Govt computers.


The Google toolbar reports URLs back to Google if you have PageRank or a few other features turned on. Their privacy policy says that they store the URL and provide you with whatever feature you requested, but it doesn't say whether they do anything further with the data.

Relevant bits:

URLs and embedded information

Some of our services, including Google Toolbar and Google Web Accelerator, send the uniform resource locators (“URLs”) of web pages that you request to Google. When you use these services, Google will receive and store the URL sent by the web sites you visit, including any personal information inserted into those URLs by the web site operator. Some Google services (such as Google Toolbar) enable you to opt-in or opt-out of sending URLs to Google, while for others (such as Google Web Accelerator) the sending of URLs to Google is intrinsic to the service. When you sign up for any such service, you will be informed clearly that the service sends URLs to Google, and whether and how you can opt-in or opt-out.

For example, when you submit information to a web page (such as a user login ID or registration information), the operator of that web site may “embed” that information – including personal information – into its URL (typically, after a question mark (“?”) in the URL). When the URL is transmitted to Google, our servers automatically store the URL, including any personal information that has been embedded after the question mark. Google does not exercise any control over these web sites or whether they embed personal information into URLs.

http://www.google.com/privacy_faq.html#toc-terms-urls

Uses

We process your requests in order to operate and improve the Google Toolbar and other Google services. For example, by knowing which web page you are viewing, the PageRank feature of Google Toolbar can show you Google's ranking of that web page. And the Sidewiki feature can tell you if others have written Sidewiki entries on a given page. Likewise, by processing the text on a web page, SpellCheck can offer spelling suggestions and AutoLink can provide useful links to information.

http://www.google.com/support/toolbar/bin/answer.py?hl=en...


If you look at http://facebook.com/robots.txt there's actually a sitemap. It's protected, and may well not contain these pages, but I have heard that Google and other privileged partners have access to it.


It's simple: Google has access to the referrer URL when you go onto their website. It's well known that they index these URLs.

If you go from that Facebook page straight to Google, it'll get indexed.


In what situation would the HTTP referrer header report Google when clicking away from a Facebook link? Not only is there no Google search field on Facebook, but Facebook also reroutes links through an intermediary page in order to hide the actual referring page.


I had assumed that Referer URLs were sent even if it wasn't from a link that was clicked. That's not true. (And yes, "Referer" is misspelt in the standard...)

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

> The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard.


Scribd! I am seeing my email ID being published as the title for my scribd account page.


Umm, may I know why I am being downvoted? Just curious.


This was the last straw for me. I just requested that FB delete my account.


And the lovely newbies here at HN downvote you for that. Sheesh - I don't get what's happened here over the past 6-9 months.


After you read enough "this is the last straw" comments about Facebook, it gets old.


Everyone has their own "last straw"... nothing wrong with a comment like his. I see more to complain about with yours.


Isn't that true about Facebook privacy stories too?


I imagine some people are downvoting this story too (if that is even possible). However, a new instance of privacy violation is a lot more important than an HN reader's personal decision to cancel his Facebook account. The quickness with which people tire of one type of information versus the other should scale accordingly.


Of all the articles so far this appears the best - hard evidence and a specific problem/bug.

If there were no privacy debate about Facebook going on right now, I expect this would have stood up on its own anyway.


It gets funnier... This poor sob just got their email revealed when I searched for

"Email Opt-Out | Facebook"

I can also disable Facebook emails for them:

http://www.facebook.com/o.php?u=1187719938&k=5fcf21


Oh that's just crazy. I just clicked that link but didn't click Confirm because ?u= is someone else's user ID.

What's sad is that because it's numeric, you can run down a whole list of IDs, opting people out or in.

So what's k stand for, crc32() or something like that on the u parameter?
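
For what it's worth, here is a quick sanity check on the crc32() guess, as a sketch only. A CRC-32 is 32 bits, so its usual zero-padded hex rendering is 8 digits, while the sample k values in this thread (e.g. 5fcf21) are 6. So k is probably not a plain, untruncated CRC-32 of u, though a truncated or keyed hash can't be ruled out from the outside. The helper name `crc32_hex` is made up for illustration.

```python
import zlib


def crc32_hex(value: str) -> str:
    """Zero-padded hex rendering of the CRC-32 of a string (always 8 digits)."""
    return format(zlib.crc32(value.encode("ascii")), "08x")


sample_u = "1187719938"  # the u parameter from the URL quoted above
sample_k = "5fcf21"      # the k parameter from the same URL

if __name__ == "__main__":
    # Compare lengths only: 8 hex digits for a full CRC-32 versus 6 for k.
    print(len(crc32_hex(sample_u)), len(sample_k))
```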


You should never expose internal incremental user IDs in URLs like these. Use GUIDs (or similar opaque values) that map back to the user ID in your database.
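
A minimal sketch of that idea, assuming an in-memory dictionary as a stand-in for the database table; the class and method names are hypothetical, and this uses random URL-safe tokens from Python's `secrets` module rather than GUIDs proper.

```python
import secrets
from typing import Dict, Optional


class OptOutTokens:
    """In-memory stand-in for a database table mapping tokens to user IDs."""

    def __init__(self) -> None:
        self._user_by_token: Dict[str, int] = {}

    def issue(self, user_id: int) -> str:
        """Create an unguessable token for a user and remember the mapping."""
        token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
        self._user_by_token[token] = user_id
        return token

    def resolve(self, token: str) -> Optional[int]:
        """Return the user ID for a token, or None if the token is unknown."""
        return self._user_by_token.get(token)


if __name__ == "__main__":
    tokens = OptOutTokens()
    token = tokens.issue(1187719938)
    # The link carries only the token; the numeric ID never appears in the URL.
    print("http://example.com/optout?t=" + token)
    print(tokens.resolve(token))
```

With this shape, the URL reveals nothing sequential, so there is no list of IDs to run down; an attacker would have to guess a whole token.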


If your information is PUBLIC, what the HELL do you expect? There are so many good reasons to be irritated with Facebook's privacy debacle. This is not one of them.



