My name is Matt Jones, and I work on the Facebook security team that looked into this tonight. We only send these URLs to the email address of the account owner for their ease of use and never make them publicly available. Even then, we put protections in place to reduce the likelihood that anyone else could click through to the account.
For a search engine to come across these links, the content of the emails would need to have been posted online (e.g. via throwaway email sites, as someone pointed out - or people whose email addresses go to email lists with online archives).
As jpadvo surmised, the nonces expire after a period of time. They also only work for certain users, and even then we run additional security checks to make sure it looks like the account owner who's logging in. Regardless, due to some of these links being disclosed, we've turned the feature off until we can better ensure its security for users whose email contents are publicly visible. We are also securing the accounts of anyone who recently logged in through this flow.
In the future if you run into something that looks like a security problem with Facebook, feel free to disclose it responsibly through our whitehat program: https://www.facebook.com/whitehat. That way, in addition to making some money, you can avoid a bunch of script kiddies exploiting whatever the issue is that you've found.
It shouldn't take you more than one Google query to find the place to report Facebook security problems.
I don't think it's a good idea to link it from the general support section -- you don't want the security team that is hopefully carefully monitoring this stuff to have to wade through thousands of regular customer service complaints.
It shouldn't... but it could be easier. I've been in the situation before where I wanted to report malware on facebook and I couldn't figure out where to report it.
I agree that you don't want reporting a security issue to supersede the general case of problems, but as things stand it is hard to figure out how to report a real security issue if you don't know about that magic whitehat url.
This one has a "Report Something" link... but that doesn't give you options for reporting a security issue, just TOS violations or copyright infringement.
Just to recap, in order to find how to submit a security bug report, it took me 15 minutes and I still only found it because I knew the term to look for was "white hat" and not "security".
Perhaps you're right. But "Facebook report a vulnerability" works just fine and that's what I would have tried if I were trying to report a vulnerability.
And at the time of writing you'll find 250,000 more results where the "wants to be friends" email with the auto-login link is posted on blogs. Many of these blogs have also been hacked, in that they redirect you to Russian dating sites if you visit the homepage.
I must have made a typo at "don%27t". I corrected the first query and it now returns 238,000 results for me again.
Perhaps some Blogspot sites got hacked or their users phished (I noticed suspicious posting activity dating back to November 2011), which would explain how they got access to the emails. Or these accounts are all fake (selling likes) and they use Blogspot to create online personas and manage their accounts.
- Information disclosure vulnerability: you'll see the email of the Twitter user [username]
- DoS vulnerability: you can click on the "I did not sign up for this account" button. After that, the email of the Twitter user [username] will be removed from the [username] account.
The URLs don't need to be posted online. Some browsers (Chrome, possibly Firefox with Safe Browsing mode, very likely any browser with a Google Toolbar installed) send visited URLs to Google and they will be indexed. I don't know if this is officially documented by Google, but several people have reported seeing this while testing new/beta websites that weren't published or linked anywhere.
I noticed a new twist in your post though: you're saying that because of Safe Browsing (which checks for e.g. malware as users surf the web), those urls are sent to Google. The way that Chrome and Firefox actually do Safe Browsing is that they download an encrypted blob which allows the browser to do a lookup for dangerous urls on the client side--not by sending any urls to Google. I believe that if there's a match in the client-side encrypted table, only then does the browser send the now-suspect url to Google for checking.
Here's more info: https://developers.google.com/safe-browsing/ I believe the correct mental model of the Safe Browsing API in browsers is "Download a hash table of believed-to-be-dangerous urls. As you surf, check against that local hash table. If you find a match/collision, then the user might be about to land on a bad url, so check for more info at that point."
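That mental model can be sketched in a few lines of Python. This is a toy illustration only: the URLs and the local prefix table are made up, and the real protocol canonicalizes each URL into several host/path expressions before hashing. But it shows why ordinary browsing sends nothing to Google:

```python
import hashlib

# Toy local database of 4-byte hash prefixes of known-bad URLs.
# (Real Safe Browsing hashes canonicalized host/path expressions,
# not the raw URL string used here.)
def prefix(url: str) -> bytes:
    return hashlib.sha256(url.encode()).digest()[:4]

local_prefixes = {prefix("http://malware.example/bad")}

def full_hash_lookup(url: str) -> bool:
    # Stand-in for the network round-trip to the Safe Browsing servers;
    # it is only ever reached when the local prefix table matches.
    return url == "http://malware.example/bad"

def is_dangerous(url: str) -> bool:
    if prefix(url) not in local_prefixes:
        return False               # common case: no URL leaves the machine
    return full_hash_lookup(url)   # rare prefix match: verify remotely

print(is_dangerous("http://safe.example/page"))    # False, nothing sent
print(is_dangerous("http://malware.example/bad"))  # True, after remote check
```

The key property is that the remote lookup is only triggered by a local prefix match, so the browser never streams its full browsing history to Google.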
Sorry, but I don't believe you about the Google Toolbar. I had a private page with no links in or out, and yet it appeared in Google search. It was not guessable and there was no chance of a referrer link. The page was never shared with friends nor accessed outside my own computers.
I only found out when a friend searched for his name and the page appeared -- it was my phone list.
The most common way such "secret" pages get crawled is that someone visits the secret page with referrers on and then goes to another page. For example, are you 100% positive that every person who ever visited that page had referrers turned off on every single browser (including mobile phones) they used to access it?
Are you sure that it is the referrer headers? The PP clearly stated there were no outgoing links on the secret page. I think there's a much more mundane explanation: JavaScript stuff downloaded from Google's CDN.
People nowadays are so used to just plopping jQuery etc. into their web pages that they forget that this stuff has to come from somewhere. If it's from Google, I'm quite certain that their CDN loader phones home right before it gives up any of the good stuff.
EDIT: Confirmed, though I was wrong in that there's no loader, requesting jQuery from ajax.googleapis.com gives them a nice fresh Referer header pointing at your secret site for their spiders to crawl. Be mindful!
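The leak can be reproduced without touching a real CDN. The sketch below (all URLs hypothetical) stands up a local stub in place of ajax.googleapis.com and issues the kind of subresource request a browser makes when a secret page pulls in jQuery; the stub's log ends up holding the secret page's URL:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_referers = []

class CdnStub(BaseHTTPRequestHandler):
    """Stand-in for a third-party CDN such as ajax.googleapis.com."""
    def do_GET(self):
        seen_referers.append(self.headers.get("Referer"))
        self.send_response(200)
        self.send_header("Content-Length", "15")
        self.end_headers()
        self.wfile.write(b"/* jquery.js */")

    def log_message(self, fmt, *args):
        pass  # keep request logging quiet

server = HTTPServer(("127.0.0.1", 0), CdnStub)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A browser rendering the (hypothetical) secret page and hitting its
# <script src=...> tag sends the page URL in the Referer header:
req = urllib.request.Request(
    "http://127.0.0.1:%d/jquery.js" % server.server_port,
    headers={"Referer": "http://example.com/secret-page.html"},
)
urllib.request.urlopen(req).read()
server.shutdown()

print(seen_referers)  # the secret URL now sits in the CDN's access log
```

Whether the CDN operator then feeds those logs to a crawler is a separate question, but the URL has unquestionably left your server.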
an old meme, and my usual recommendation: just test it. create a page that is not linked from anywhere. visit it with the browsers mentioned above. watch the logfiles. wait for it. nope, no googlebot request. it is unbelievably easy to test -- i have done so on various occasions in the past -- so there is no need for you to spread a "several people have reported" rumor. just ... test ... it.
as for the old stories, that google does this kind of thing: people, especially SEOs or people who think they know SEO, always blame google. oh, my beta.site has been indexed, it must be because of ... google is evil.
most of the time, the cases i have seen where googlebot found a not-yet-published site came down to one of these (just some examples, not a complete list):
* turned on error reporting (most of the PHP sites)
* the URLs were already used in some javascript
* server side analytics software, open to the public
* apache shows the file/folder structure (directory listings enabled)
* indexable logfiles
* people linked to the site
* somebody tweeted about it
* site was covered on techcrunch (yes, really)
* all visited URLs in the network were tracked by a firewall, the firewall published a log on an internal server, the internal server was reachable from the outside
* internal wiki is indexable
* intranet is indexable
* concept paper is indexable
your hypothesis -- "chrome/google toolbar/... push URLs into the googlebot discovery queue, which leads to googlebot visits" -- is easy to test. no need to spread rumors. setup for testing this: make an html page (30 seconds max: basically ssh to your server, create a file, write some html), tail & grep logfiles (30 sec max), wait (forever)
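The log-watching half of that setup could look something like this (the secret path and the access-log lines here are hypothetical, in common Apache/nginx combined-log shape):

```python
import re

# Scan an access log for any Googlebot request to the unlinked secret page.
# If the "toolbar feeds the crawler" theory were true, a hit would show up
# here after visiting the page in Chrome / with the Toolbar installed.
SECRET_PATH = "/secret-test-8f3a.html"  # hypothetical unguessable name

def googlebot_hits(log_lines):
    pattern = re.compile(r'GET %s .*Googlebot' % re.escape(SECRET_PATH))
    return [line for line in log_lines if pattern.search(line)]

access_log = [
    '1.2.3.4 - - "GET /index.html HTTP/1.1" 200 "-" "Mozilla/5.0"',
    '66.249.66.1 - - "GET /public.html HTTP/1.1" 200 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
print(googlebot_hits(access_log))  # [] -- no crawler ever found the page
```

Run it over your real logfile for as long as you like; an empty result is the point.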
"When you add the +1 button to a page, Google assumes that you want that page to be publicly available and visible in Google Search results. As a result, we may fetch and show that page even if it is disallowed in robots.txt."
I can understand adding a +1 button to a dev site, and then not understanding why it shows up in the index.
Don't forget people who may have:
* installed UserScripts / GreaseMonkey scripts
* browser plugins other than Google Toolbar which may send stuff to the big G
* (self-)modded browsers which send out stuff to wherever
...the list goes on and on indeed.
Best thing to do to keep a site secret:
* Don't host it on the internet (d'uh)
* Hide behind a portal page and have that and your server weed out misconfigured / hijacked browsers before any can proceed to your real secret site (also see web cloaking).
I'm not sure either, but I doubt that Chrome or any of the badware-stopping features that are built in to it cause the URLs they're checking to be indexed. I'd be even more surprised if Firefox did this.
If you've got the toolbar installed though, I'd be less surprised if they tried crawling or indexing URLs you go to.
At least in terms of malware detection, Chrome utilises a bloom filter in the first instance to identify the probability of a URL being malicious before making remote calls. If it is found to be positive, only then does it submit it to Google for more precise verification.
> EDIT: It looks like they've explicitly said the toolbar does not cause things to appear in search results
I read this too after posting, but I'm skeptical. It wouldn't be the first time they claimed to not do things they later admitted doing ... The rationale being that search engines need a way to discover new URLs quickly and keep ahead of the competition (indexing speed and breadth).
I'd also like to know what exactly Google Desktop Search does with URLs it finds.
Robots.txt is about fetching content; it has nothing to do with indexing URLs, or with anything that is part of the content at locations not restricted by robots.txt.
Whether Google sends the URLs to itself or not can easily be determined by using an HTTP monitoring tool like Fiddler; with a hosts filter you can narrow the traffic down to google.com.
Leave it running for a few days and you will see for yourself.
A robots.txt file disallowing crawling on the sites that display the contents of user email would help fix this.
However, as some of the discussion below points out, I don't believe that disallowing crawling of these URLs in our robots.txt would keep them from the index if a search engine finds reference to them elsewhere; I think it simply keeps them from being crawled.
(Regardless of whether one has a Facebook account or not) If your theory is correct, this seems like a good reason to not use Chrome or any browser with a Google Toolbar :)
My name is Jared Null, and I first reported this as a vulnerability to the bug bounty program back in March. I've posted one conversation here: http://news.cnet.com/8301-1023_3-57544933-93/facebook-passwo.... I'm confused: you say that it's not a vulnerability, yet Facebook had to take action. I guess seeing is believing, and it only took a public disclosure to see the light. The sad thing is I reported both the recover-password link and the checkpoint link "https://www.facebook.com/checkpoint/checkpointme?u=" (which, by the way, is still vulnerable); the checkpoint links are reusable, but the recover-password links were one-time use.
You mention that the nonces expire after a period of time.
If you don't plan on cutting the feature for ever, perhaps you could consider an alternative approach of limiting the validity of the URLs to the first visit and also removing the email-id (and other PII data) of the user from the URL.
The feature is absolutely too dangerous to ever have existed!
It turns out that Facebook implemented plain links that are more powerful than the password-reset procedures, considering how easy they make it to take over another user's account.
Having the actual user id in the link is just a small topping on that cake, not even worth discussing as long as the "no login, just click the link" possibility remains.
When did the term "nonce" start being used in web application development to refer to a token that expires after a period of time instead of being a true one-time use number/token?
Hi, my account was hacked at the weekend, and although it is locked the person still keeps changing my password and I am not able to get into it. It won't let me reset my password, as it keeps coming up with an error message. I need this sorted and have had no help from FB even after reporting it numerous times.
Would Facebook ever consider having the option of two factor authentication (something similar, if not compatible with Google Authentication/TOTP/MOTP apps)?
I use it too, but I must admit to wishing they would make it compatible with Google Authenticator or some other OATH implementation. SMSed text codes take way too long to be a good second step when you're having to log in every day like I do (not to mention logging in from work, where my reception is almost nil).
Facebook's privacy settings have a ton of bugs. Here's another one:
1. Make a stupid status update post.
2. It appears in all your friends' newsfeeds.
3. You realize you said something stupid and private.
4. Panic. Delete the post.
5. Breathe a sigh of relief that it is no longer showing up in your profile.
6. But wait a minute! It still keeps showing up in all your friends' newsfeeds.
7. Now that you deleted the post, you can't even modify its visibility settings. Heck, you can't even get to the URL. But all your friends continue to see the post in their newsfeeds.
So the way Facebook implements a delete of any activity (status post/like/comment) is that the owner stops seeing it but everyone else keeps seeing it. That is simply the most broken delete implementation ever!
This isn't a case of eventual consistency. They don't bother to update the cache (if that is the reason) or update the database used for the newsfeed. The deleted posts persist in the newsfeed database many hours (maybe forever) after the delete event.
Huh? Infrastructure is totally and utterly irrelevant to the problem. Common sense is enough to make such a claim.
Just send a new message, exactly as you would post an original item/comment/etc, but have some special text/field in there that says "please ignore the previous message". The UI would then hide the previous message.
eg
COMMENT: {id:9374758, from:"mibbitier", data:"I hate you all!"}
COMMENT: {id:9374759, from:"mibbitier", data:"*IGNORE_MESSAGE_IN_UI* 9374758"}
Nothing whatsoever to do with infrastructure. Nothing to do with caches. Purely to do with the UI. Not rocket science.
Granted, it's a poor way to do it, but it's better than nothing, and easier than trying to invalidate caches etc
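As a sketch of how a client could apply those tombstone records at render time (field names borrowed from the COMMENT example above; everything else is hypothetical):

```python
# Tombstone idea from the comment above: a "delete" is just a new feed
# item, and the UI hides whatever that item points at. No cache
# invalidation needed -- the tombstone propagates like any other post.
IGNORE_PREFIX = "*IGNORE_MESSAGE_IN_UI* "

feed = [
    {"id": 9374758, "from": "mibbitier", "data": "I hate you all!"},
    {"id": 9374759, "from": "mibbitier", "data": "*IGNORE_MESSAGE_IN_UI* 9374758"},
    {"id": 9374760, "from": "mibbitier", "data": "Sorry, bad day."},
]

def visible_items(items):
    # Collect the ids named by tombstone records...
    tombstoned = {
        int(item["data"][len(IGNORE_PREFIX):])
        for item in items
        if item["data"].startswith(IGNORE_PREFIX)
    }
    # ...then hide both the tombstoned items and the tombstones themselves.
    return [
        item for item in items
        if item["id"] not in tombstoned
        and not item["data"].startswith(IGNORE_PREFIX)
    ]

print(visible_items(feed))  # only the "Sorry, bad day." item survives
```

The obvious caveat, as the thread notes, is that the "deleted" content still exists on every client that received it; this only hides it.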
Here's the thing that customers, managers, and less experienced developers all have in common: they understand that no one thing is difficult. But they don't take into account that managing the complexity between a thousand, or a hundred thousand, or a million rules is very, very difficult.
That's why you hire more experienced developers: they're more experienced, not at things like cache invalidation (sure, just nuke your entire cache anytime anything changes! easy!), but at managing complexity.
Which is difficult.
That's why I try to keep my mouth shut about how somebody should "just do this, it'd be so easy, why are they dumb?"
Weird. I clicked on one of the links and it asked me if I was that user, and, if so, that I should click the login button. When I did, it logged me in as that user.
Edit: This happens for multiple users.
Edit2: It looks like if you click on the link, it automatically expires. bCODE is "an identifier that can be sent to a mobile phone/device and used as a ticket/voucher/identification or other type of token." I'm guessing somehow these tokens (the ones that auto log you in) never got used, plus the old ones were saved and contain email info. Not sure how Google could have gotten them though. Probably just got accidentally listed, despite robots.txt.
Here's one theory and analysis of what might have happened. Some people's emails got out into the public internet, and were indexed. Some of these emails were from Facebook, and included links to resources that require login. These links pre-populated the username field for convenience, or in some cases auto-login the user. Facebook's engineers probably did not anticipate email notifications to users being crawled by Google. Live and learn, eh?
But could Facebook have done something to prevent or minimize the damage caused by these leaked emails?
1. Let's start with the auto-login links, as those are the scariest. Do those links use one-time-use tokens, and do the tokens expire? If either or both of those steps were skipped, it makes this leak much more serious and speaks to negligence or disrespect for user security. If Facebook has both of those security measures in place, though, they did all they realistically could. If somebody lets their private email get indexed by Google (seriously, though, how does that even happen??), that's their own problem.
2. The other class of leaked URLs links email addresses to Facebook profiles. This isn't as immediately scary, and for a lot of people it wouldn't even matter. But it is easy to imagine scenarios where this kind of privacy would be important to someone, and this kind of leak would be just as scary as someone being able to log in as them. Frankly, I never would have thought of securing this, and I doubt Facebook did anything to secure it. Going forward, though, it would probably be worth it for them to run the auto-username-populating links through one-time-use, expiring tokens as well.
So, it looks like Facebook probably got hit with a bizarre edge case privacy / security issue. There are likely things they could do to make their system more resistant to this kind of thing, but at the same time they probably didn't do as badly as this might make them look at first glance.
Again, this is speculation, any confirmation or disconfirmation would be great.
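For what it's worth, the one-time-use-plus-expiry scheme asked about in point 1 is small to sketch. This is a hypothetical in-memory illustration (names and TTL made up), not a claim about Facebook's actual implementation:

```python
import os
import time

# Login tokens that are single-use AND expiring, with no PII in the URL.
TOKEN_TTL = 24 * 3600  # e.g. valid for one day

_tokens = {}  # token -> (user_id, issued_at); a real system would persist this

def issue_login_token(user_id: str) -> str:
    token = os.urandom(16).hex()       # random nonce; no email address embedded
    _tokens[token] = (user_id, time.time())
    return token

def redeem_login_token(token: str):
    entry = _tokens.pop(token, None)   # pop => strictly one-time use
    if entry is None:
        return None                    # unknown, or already used
    user_id, issued_at = entry
    if time.time() - issued_at > TOKEN_TTL:
        return None                    # expired
    return user_id

t = issue_login_token("user-123")
print(redeem_login_token(t))  # 'user-123' on first use
print(redeem_login_token(t))  # None on any later use -- a leaked link is dead
```

With both properties in place, a URL that leaks after its first (legitimate) click is worthless to whoever finds it in a search index.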
Seems like emails on these domains are much more easily viewable/leakable/indexable than normal personal email addresses?
EDIT: Googling one of the discovered Gmail addresses revealed a Facebook email (with 'bcode') being auto-blogged at weight-loss-information-123.blogspot.com https://encrypted.google.com/search?hl=en&q=danielsams20... - some kind of malware maybe?
Going through my own inbox for Facebook emails, it attaches my email address in the n_m parameter, the bcode parameter, and a mid parameter to all the links it gives me. This includes links to my friends' profiles, events, group posts, etc.
As far as an expiration on the auto-login, I rarely click on the links Facebook provides in my email. (I like to get the notification to remind me to go on Facebook later.) The last one I got was about 25 hours ago. I didn't use the link before and it did not log me in when I clicked it just now.
I clicked on some profiles, and I noticed that many of the e-mail addresses populated were @asdasd.ru — the domain of a Russian mailinator-type service. Something like that might be indexed.
"Some people's emails got out into the public internet, and were indexed. Some of these emails were from Facebook, and included links..."
Doesn't Google's toolbar phone home with the URLs you click on? That could be a way to get supposedly-private URLs into Google's list of URLs to be visited.
Matt Cutts has publicly stated here and in other forums that the Google Toolbar does collect click data but does not use the data to insert URLs into Google's index. Here's a recent(ish) post on the matter:
That's an interesting idea, but as someone noted, most of the emails involved here come from a small set of domains, such as blogger or anonymous mailinator type emails (emails which are possibly crawled by google often!)
I think if it were google's toolbars picking up urls in emails, that there would be many more email domains here.
Okay, I've been through all the comments and I'm going to try to summarize:
- It looks like in some situations, Facebook will send an email that has a link. That link expires after a certain amount of time, but in the meantime, clicking that link lets people access that Facebook account.
- A large number of services can be set up to automatically post any email received onto the web. One major category is disposable email services such as asdasd.ru. Any email to a throwaway account on asdasd.ru gets put up on the web. Here's an example Facebook recovery email that got turned into a web page: http://asdasd.ru/read/414831
- Once these emails are just webpages, it's no surprise that search engines discover those URLs. Note that this is not a Google-specific issue. When I search on Bing for the query [site:facebook.com bcode n_m mid], the first result is also one of these urls that has an email address embedded in it. For a debunk of the misconception that this is related to the Google Toolbar or Chrome, see my post elsewhere in this discussion at http://news.ycombinator.com/item?id=4733276
So: an email gets sent to someone. That email gets put up on the web as a webpage. Search engines (including both Google and Bing) find that webpage as they follow links on the web.
When I try on Google to find the email bodies, I get 250k results, of which the large majority are on blogspot.com sites.
While mail bodies can be found on a few other sites, like the asdasd.ru example, and other search engines have found these links too, the main issue still seems to be with blogspot.com -- These aren't throwaway accounts with public inboxes, but likely some virus that is intercepting certain mails (Facebook, Twitter, Youtube, Twoo) and reposting them as a blogpost for everyone to see.
As Blogspot is Google-owned, this does seem to me a predominantly Google-specific issue.
If you look at the bottom of that Blogger post, it says "This message was sent to <a gmail address>." So an email from Facebook got posted as a web page to this blog.
There's no need to suspect some virus that's intercepting emails. Plenty of people have set up their systems such that email messages get turned into web pages.
You are probably right, and I apologize for any misinformation. To me it seemed strange that the blogs first started spamming, then published only certain emails. Wouldn't it make more sense if all emails were published, not only those from certain webservices? Why would a user want to publish their private Facebook emails in the first place? None of these accounts post normal updates; they act compromised.
Automatically posting any email received onto the web can be a security issue. As you said, Blogspot is indiscriminate about which e-mail it publishes, so there is no need to suggest a virus is targeting Facebook or Twitter mails -- I was confused on that point.
I've tested the indiscriminate posting, and any HTML you send to Blogspot accounts with this feature gets published, including <script> tags.
An e-mail client isn't supposed to execute <script> tags; I feel that a service which republishes an email online should strip out the <script> tags too.
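A minimal illustration of that stripping, using Python's stdlib HTML parser. This is a toy: a real email-to-web gateway should use a proper allowlist-based sanitizer, since <script> is far from the only injection vector (event-handler attributes, CSS, etc. would sail straight through this):

```python
from html.parser import HTMLParser

class ScriptStripper(HTMLParser):
    """Rebuilds HTML while dropping <script>...</script> blocks entirely."""
    def __init__(self):
        super().__init__()
        self.out = []
        self.in_script = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script += 1
            return
        if not self.in_script:
            attr_text = "".join(' %s="%s"' % (k, v) for k, v in attrs)
            self.out.append("<%s%s>" % (tag, attr_text))

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = max(0, self.in_script - 1)
            return
        if not self.in_script:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.in_script:  # text inside <script> is discarded
            self.out.append(data)

def strip_scripts(html: str) -> str:
    p = ScriptStripper()
    p.feed(html)
    return "".join(p.out)

print(strip_scripts('<p>Hi</p><script>alert(1)</script>'))  # <p>Hi</p>
```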
The Blogspot sites that run this service are currently under attack by spammers, who send spam emails (which don't seem to get filtered very well), allowing spam by proxy and editorial-looking links. Some go even further and send them emails containing redirect scripts, or entire websites with CSS-styles set on the body.
Custom CSS and custom scripts allow for attack vectors such as these, and spam doesn't seem to get filtered very well. This is something Blogspot can protect their users and visitors against, no? And did the users of this feature understand the privacy ramifications of turning their inbox into a public mailing list?
Worse than redirects, thinking like a wicked spammer:
1. User turns on feature inbox-to-webpage
2. Spammer finds these users by scanning the index
3. Spammer sends such users (or includes with every spam mail) a malicious JavaScript file
4. A JavaScript pop-up appears: "Re-enter your credentials"
5. Change password and steal blog
6. Check if blogspot account is connected to a Gmail account.
Thank you for exposing this. Much appreciated. Here's one more - The last time I checked, Facebook revealed 'what you liked' to search engines like Google. For example, if you search for your name inside double quotes like this - "Your Name" you will see your name listed virtually on every single page you liked, for example, If you had liked Sony's Facebook fan page, then your name would appear in the search results something like this - "[Your name] and 8 others like this"
That's strange because I did tell Facebook under my account settings NOT to list my profile or my name on Search engines.
To summarize - So be careful with what you 'like', because it really just takes a Google search to find out your interests. This could (potentially) be a problem if you are actively seeking employment (and if you had 'liked' some crazy stuff) or if you have a crazy girlfriend.
A common misinterpretation of how Google handles `Disallow` in robots.txt:
Q. If I block Google from crawling a page using a robots.txt disallow directive, will it disappear from search results? [1]
robots.txt Disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant. If you wish to explicitly block a page from being indexed, you should instead use the noindex robots meta tag or X-Robots-Tag HTTP header. In this case, you should not disallow the page in robots.txt, because the page must be crawled in order for the tag to be seen and obeyed.
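Put concretely (paths here are hypothetical):

```text
# robots.txt: blocks crawling only -- the URL can still be indexed
# from external links, just without a snippet.
User-agent: *
Disallow: /private/

# To keep a crawlable page out of the index, use one of these instead
# (and do NOT also Disallow it, or the directive will never be seen):
#   <meta name="robots" content="noindex">    in the page's <head>
#   X-Robots-Tag: noindex                     as an HTTP response header
```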
We develop and host a bunch of extranets, which without login consist of your typical authentication page. We put a robots.txt file there, and the only sites that link there are our customers' company home sites.
Google still indexes them. The definition of "relevant" here defies my wildest imagination.
You can do access control on the contents of HTTP_REFERER: if the browser visits a page in your robots.txt by following a Google link, serve them up a 403 forbidden. (In Apache 2.4, this can all be done using mod_authz_core.)
You could maybe say in your 403 forbidden message that Google has been forbidden from indexing the page (use ErrorDocument). If enough sites did that, Google might change their policy.
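An untested sketch of that idea for Apache 2.4, using mod_authz_core expressions plus ErrorDocument; the directory path and the referrer regex are assumptions:

```apache
# Deny requests that arrive by following a Google search result link,
# and explain why in the 403 body.
<Directory "/var/www/extranet">
    <If "%{HTTP_REFERER} =~ /google\./">
        Require all denied
    </If>
    ErrorDocument 403 "This page is not available via search-engine links."
</Directory>
```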
Google's default for logged in users is to use https and strip searched phrases when leaving SERP, so HTTP_REFERER will be empty. A lot of security software also cuts HTTP_REFERER. Being behind proxy may cause it to be empty, too.
In general, I don't think you can rely on headers sent by the browser. You don't know if they are real or forged.
A large number of the login emails seem to be from the asdasd.ru domain. Googling one of these emails, I find a site that resembles a public inbox with emails from Facebook in it, like this one: http://asdasd.ru/read/414831.
You are able to post on Blogger via email. If you register with this Blogger email address on Facebook, all Facebook notifications are published as Blogger posts and indexed by Google.
Actually, this might be used to circumvent a firewall preventing you from using Facebook.
You can search for the leaked email addresses on Google and probably find Blogger blogs with Facebook notifications posted.
If that's truly how people use it, it is very strange.
In actual roman numerals MM = 2000. Using 'M' as a roman numeral but then multiplying digits makes no sense at all (you'd need numerals for all the prime numbers to represent arbitrary numbers...).
And in SI, the prefix 'M' (mega) already means 1 million, so to me it seems MM is the notation that maximizes confusion.
I totally agree - there's no Roman-numeral justification for it at all, and it's very confusing in normal situations.
My understanding is that it comes from financial (specifically trader) jargon and I suspect it probably originated to differentiate it from some other use of "m", but don't know for sure... Maybe someone else knows why it arose?
I always thought it was short for "million monthly" so when you see something like "we have 10MM users" it would be 10 million monthly active users (i.e. 10 million users who have been active in the past month). I have no idea what it means here.
OK, so after a bit of searching, it seems that it comes from the Latin word for "thousand", millia[1]. And it's apparently common in financial contexts.
So 1M is 1,000. 1MM is 1,000,000. 1MMM is 1,000,000,000 (though the former and latter are not as common). Still seems like a confusing way to abbreviate to me.
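The convention as described is mechanical enough to put in code (a toy helper, not any standard library function):

```python
# Financial "M" suffix convention from the comment above:
# each trailing M multiplies by a thousand, so 1M = 1,000 and 1MM = 1,000,000.
def parse_m_suffix(amount: str) -> int:
    digits = amount.rstrip("M")
    thousands = len(amount) - len(digits)  # how many M's were stripped
    return int(digits) * 1000 ** thousands

print(parse_m_suffix("1M"))    # 1000
print(parse_m_suffix("1MM"))   # 1000000
print(parse_m_suffix("10MM"))  # 10000000
```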
Good point. So it turns out that MM is supposed to mean "thousand thousand" in the world of finance, but it is indeed not a correct roman numeral. Old school.
A large number of the emails are from 'blogger.com'. Aren't Google and Blogger one and the same? Are the URLs being crawled because Google is reading its emails and crawling the contained URLs?
I'm sure lots of people have had unwanted encounters with Google's crawlers, but here's mine:
I used to have a subdomain pointing to my home IP which was protected using Apache htpasswd. I naively had all of my clients' credentials stored in text files (conveniently named credentials.txt). Somehow I accidentally removed the htpasswd authentication and it was publicly exposed for a day or two. Of course Google indexed it and you could view everything in Google's cache.
There was a process for removing content from Google, but it took a few months to get completed. I never told anyone and I'm pretty sure all that info is now purged (I've tried to find it multiple times and it doesn't seem to exist anywhere).
I also downloaded a WoW guide that I had temporarily thrown up on one of my servers and forgot to take down. Like a year later I randomly was running a Google image search for 'Northrend Map' and happened to notice my site was the THIRD image. At first I thought it was a personalized search result, but I checked from multiple other places and it was still there even though there were zero inbound links.
They indicate doing a search. If I say do a search for [flowers] then someone should type the actual word flowers into Google. We use the brackets to make it clear what the literal text of a search query is.
I think it's just meant to mean "million" but it's used a lot in the VC/startup community in valuations ("CompanyX receives $10MM in angel funding" etc). I think it has come from finance originally, but either way I think it's unnecessary.
Can someone write a summary of the exact issue? Was it that people's FB accounts were accessible through an auto-login link? Also, I only see one result returned, not 1 million.
Goes to some Obama site and redirects to another page. However, using it a second time results in Firefox reporting that it doesn't understand the URL.
What exactly was exposed here? It looks like it's been blocked now...
Just stealing from another bit in this thread: somehow these URLs got onto the internet even though they shouldn't have. They are pre-authed URLs that auto-login and then expire.