Hacker News
More than 1MM Facebook accounts exposed (google.com)
185 points by nico-roddz on Nov 2, 2012 | 166 comments

My name is Matt Jones, and I work on the Facebook security team that looked into this tonight. We only send these URLs to the email address of the account owner for their ease of use and never make them publicly available. Even then we put protection in place to reduce the likelihood that anyone else could click through to the account.

For a search engine to come across these links, the content of the emails would need to have been posted online (e.g. via throwaway email sites, as someone pointed out - or people whose email addresses go to email lists with online archives).

As jpadvo surmised, the nonces expire after a period of time. They also only work for certain users, and even then we run additional security checks to make sure it looks like the account owner who's logging in. Regardless, due to some of these links being disclosed, we've turned the feature off until we can better ensure its security for users whose email contents are publicly visible. We are also securing the accounts of anyone who recently logged in through this flow.
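The scheme Matt describes — nonces that expire and are tied to certain users — can be approximated as single-use tokens with a time-to-live. A minimal sketch in Python (all names are invented for illustration; nothing here reflects Facebook's actual implementation):

```python
import os
import time

# In-memory nonce store: token -> (user_id, expiry timestamp).
# A real system would persist this and expire/burn entries atomically.
_nonces = {}

def issue_nonce(user_id, ttl_seconds=3600):
    token = os.urandom(16).hex()              # unguessable random token
    _nonces[token] = (user_id, time.time() + ttl_seconds)
    return token

def redeem_nonce(token):
    """Return the user_id if the token is valid; the token is burned either way."""
    entry = _nonces.pop(token, None)          # pop => single use
    if entry is None:
        return None
    user_id, expires_at = entry
    return user_id if time.time() <= expires_at else None
```

Burning the token on first use (valid or not) is what keeps a leaked, already-clicked URL from being replayed later.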

In the future if you run into something that looks like a security problem with Facebook, feel free to disclose it responsibly through our whitehat program: https://www.facebook.com/whitehat. That way, in addition to making some money, you can avoid a bunch of script kiddies exploiting whatever the issue is that you've found.

Thanks, Matt.

My only concern is my account security (not money).

I found this issue with almost no technical knowledge, so the crazy thing is:

How many more back doors are out there, ready to be exploited by spammers?

BTW, a big "report security issue" button on https://www.facebook.com/help/ would certainly help next time.

Thanks again,


It shouldn't take you more than one Google query to find the place to report Facebook security problems.

I don't think it's a good idea to link it from the general support section -- you don't want the security team that is hopefully carefully monitoring this stuff to have to wade through thousands of regular customer service complaints.

It shouldn't... but it could be easier. I've been in the situation before where I wanted to report malware on facebook and I couldn't figure out where to report it.

I agree that you don't want reporting a security issue to supersede the general case of problems, but as things stand it is hard to figure out how to report a real security issue if you don't know about that magic whitehat url.

Googling "facebook security" brings up:

#1 result: https://www.facebook.com/security

no information on reporting problems there

#2 result: https://www.facebook.com/help/security

this one has a Report Something link... but that doesn't give you options for reporting a security issue, just TOS violations or copyright infringement.

#3 result: https://www.facebook.com/security/app_10442206389

This looks better than the other two, but there is still nothing here about how to report a security issue.

Knowing what to look for, there's a hidden "Take Action >> White Hats" link that will eventually take you to the correct page: https://www.facebook.com/security/app_6009294086

So you click that link... and are presented with a huge page of names and still no obvious call to action: https://www.facebook.com/whitehat

Oh, it's the Report Vulnerability link in that sidebar that we've been conditioned to ignore in the normal Facebook UI.



Just to recap, in order to find how to submit a security bug report, it took me 15 minutes and I still only found it because I knew the term to look for was "white hat" and not "security".


Perhaps you're right. But "Facebook report a vulnerability" works just fine and that's what I would have tried if I were trying to report a vulnerability.

That's still a few down in the search results.

It looks like the magic search term that brings you right to the report page is: "Facebook vulnerability"


Since this is already out there as a known issue, and concerns Google too, check out:


And you'll find, at the time of writing, 250,000 more results where the "wants to be friends" email with the auto-login link is posted on blogs. Many of these blogs are also hacked, in that they redirect you to Russian dating sites if you visit the homepage.

An example of such a blog with password reset email is: http://papajimummyji.blogspot.com/

An example of a spam-redirecting blog is: http://demiansyahhh.blogspot.com/ (possibly unsafe)

For some more Facebook reset emails see:


EDIT: Twitter emails are also exposed: https://www.google.com/search?q=%22Forgot+your+Twitter+passw...

Youtube emails: https://www.google.com/search?q=%22YouTube+sends+email+summa...

Twoo emails: https://www.google.nl/search?q=%22Massive+Media+NV%2C+Emile+...

And likely more web services.

It's 47 minutes later and for your searches, I'm seeing 5 results and 309 results. Spooky.

Somebody at the Google is certainly watching this thread and cleaning house.

Yeah, that was weird. I found another that still returns 233,000 results for me:


I must have made a typo in "don%27t". I corrected the first query and it now returns 238,000 results for me again.

Perhaps some Blogspot sites got hacked (or their users phished; I noticed suspicious posting activity dating back to November 2011), which would explain how they got access to the emails. Or these accounts are all fake (selling likes) and they use Blogspot to create online personas and manage their accounts.

It's something like this (I don't know if you already knew):

1. Try searching this http://goo.gl/dHHsU on Google. You'll find (at the time of writing) 90,300 results.

2. Find a URL like this https://twitter.com/account/confirm_email/[username]/[XXXXX-...

3. Change the URL like this https://twitter.com/account/not_my_account/[username]/[XXXXX...

Twitter "not_my_account" vulnerability:

- Information disclosure vulnerability: you'll see the email of the Twitter user [username]

- DoS vulnerability: you can click on the "I did not sign up for this account" button. After that, the email will be removed from the [username] account.

The URLs don't need to be posted online. Some browsers (Chrome, possibly Firefox with Safe Browsing mode, very likely any browser with a Google Toolbar installed) send visited URLs to Google and they will be indexed. I don't know if this is officially documented by Google, but several people have reported seeing this while testing new/beta websites that weren't published or linked anywhere.

Hi there, allow me to correct this misconception. I've debunked that idea often enough that I wrote a blog post about this four years ago: http://www.mattcutts.com/blog/toolbar-indexing-debunk-post/ I wrote an earlier debunk post in 2006 too: http://www.mattcutts.com/blog/debunking-toolbar-doesnt-lead-...

I noticed a new twist in your post though: you're saying that because of Safe Browsing (which checks for e.g. malware as users surf the web), those urls are sent to Google. The way that Chrome and Firefox actually do Safe Browsing is that they download an encrypted blob which allows the browser to do a lookup for dangerous urls on the client side--not by sending any urls to Google. I believe that if there's a match in the client-side encrypted table, only then does the browser send the now-suspect url to Google for checking.

Here's more info: https://developers.google.com/safe-browsing/ I believe the correct mental model of the Safe Browsing API in browsers is "Download a hash table of believed-to-be-dangerous urls. As you surf, check against that local hash table. If you find a match/collision, then the user might be about to land on a bad url, so check for more info at that point."
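That mental model — check a local table of hash prefixes first, and only make a network lookup on a match — can be sketched in a few lines. Illustrative Python only; the real Safe Browsing protocol distributes compressed lists of SHA-256 prefixes rather than a plain set:

```python
import hashlib

def sha256(s: str) -> bytes:
    return hashlib.sha256(s.encode()).digest()

# Local table of 4-byte hash prefixes of known-bad URLs,
# downloaded in bulk ahead of time (example entry is made up).
bad_prefixes = {sha256("http://evil.example/")[:4]}

def needs_server_check(url: str) -> bool:
    """Client-side step: only a prefix match triggers a network lookup.
    Prefix collisions are possible, so a match means 'maybe bad',
    not a verdict -- the server resolves the full hash."""
    return sha256(url)[:4] in bad_prefixes
```

No URL ever leaves the client unless its prefix collides with the local table, which is the privacy property Matt describes.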

Hope that helps. Further down in the discussion, someone posted this helpful link with more explanation: http://blog.alexyakunin.com/2010/03/nice-bloom-filter-applic...

Sorry, but I don't believe you about the Google Toolbar. I had a private page with no links in or out and yet it appeared in Google search. It was not guessable and there was no chance for a referrer link. The page was never shared with friends nor accessed outside my own computers.

I only found out when a friend searched for his name and the page appeared; it was my phone list.

Multiple people have run controlled experiments like I described in http://www.mattcutts.com/blog/debunking-toolbar-doesnt-lead-...

The most common way such "secret" pages get crawled is that someone visited that secret page with their referrers on and then went to another page. For example, are you 100% positive that every person who ever visited that page had referrers turned off on every single browser (including mobile phones) they used to access that page?

Are you sure that it is the referrer headers? PP clearly stated there were no outgoing links on the secret page. I think there's a much more mundane explanation: javascript stuff downloaded from Google's CDN. People nowadays are so used to just plopping jQuery etc. into their web pages that they forget that this stuff has to come from somewhere. If it's from Google, I'm quite certain that their CDN loader phones home right before it gives up any of the good stuff.

EDIT: Confirmed, though I was wrong about there being a loader: simply requesting jQuery from ajax.googleapis.com gives them a nice fresh Referer header pointing at your secret site for their spiders to crawl. Be mindful!

I'm 100% sure. That page was for me and me alone. It was never accessed by anyone but me. I never shared the URL with anyone.

Referrers only get shared through links. There were no links to or from that page. Going to a page and typing in a new URL does not provide a referrer.

an old meme, and my usual recommendation: just test it: create a page that is not linked from anywhere. visit it with the browsers mentioned above. watch the logfiles. wait for it. nope, no googlebot request. it is unbelievably easy to test, i have done so on various occasions in the past, so there is no need for you to spread a "several people have reported" rumor. just ... test ... it.

as for the old stories, that google does this kind of thing: people, especially SEOs or people who think they know SEO, always blame google. oh, my beta.site has been indexed, it must be because of ... google is evil.

most of the times i have seen cases where googlebot found a not-yet-published site, it was because of (just some examples, not a complete list):

* turned on error reporting (most of the PHP sites)
* the URLs were already used in some javascript
* server side analytics software, open to the public
* apache shows file/order structure
* indexable logfiles
* people linked to the site
* somebody tweeted about it
* site was covered on techcrunch (yes, really)
* all visited URLs in the network were tracked by a firewall, the firewall published a log on an internal server, the internal server was reachable from the outside
* internal wiki is indexable
* intranet is indexable
* concept paper is indexable

testing your hypothesis "chrome/google toolbar/... push URLs into the googlebot discovery queue, which leads to googlebot visits" is easy. no need to spread rumors. setup for testing this: make an html-page (30 seconds max, basically ssh to your server, create a file, write some html), tail & grep logfiles (30 sec max), wait (forever)
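The "watch the logfiles" step of that experiment could be automated with a few lines. A sketch only; the page path and the Apache-style log location are assumptions you would substitute with your own:

```python
SECRET_PATH = "/secret-test-page.html"   # hypothetical page created for the test

def googlebot_hits(log_lines):
    """Return access-log lines where Googlebot requested the secret page.
    Matching on the user-agent string is crude but fine for this test."""
    return [line for line in log_lines
            if SECRET_PATH in line and "Googlebot" in line]

# e.g.: googlebot_hits(open("/var/log/apache2/access.log"))
```

If the hypothesis were true, this list would eventually be non-empty after visiting the page in Chrome or with the Toolbar installed; per the commenters' experiments, it stays empty.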

It is a myth that is hard to get rid of. No one wants to admit they tweeted out a link to the dev website.

Though I recently found this on the Google+ FAQ: http://support.google.com/webmasters/bin/answer.py?hl=en&...

  When you add the +1 button to a page, Google assumes that
  you want that page to be publicly available and visible in
  Google Search results. As a result, we may fetch and show
  that page even if it is disallowed in robots.txt.
I can understand adding a +1 button to a dev site, and then not understanding why it shows up in the index.

Don't forget people who may have:

* installed UserScripts / GreaseMonkey scripts
* browser plugins other than Google Toolbar which may send stuff to the big G
* (self-)modded browsers which send out stuff to wherever...

the list goes on and on indeed.

Best thing to do to keep a site secret:

* Don't host it on the internet (d'uh)
* Hide behind a portal page and have that and your server weed out misconfigured / hijacked browsers before any can proceed to your real secret site (also see web cloaking).

I'm not sure either, but I doubt that Chrome or any of the badware-stopping features that are built in to it cause the URLs they're checking to be indexed. I'd be even more surprised if Firefox did this.

If you've got the toolbar installed though, I'd be less surprised if they tried crawling or indexing URLs you go to.

EDIT: It looks like they've explicitly said the toolbar does not cause things to appear in search results: http://www.seroundtable.com/google-toolbar-indexing-12894.ht....

At least in terms of malware detection, Chrome utilises a bloom filter in the first instance to identify the probability of a URL being malicious before making remote calls. If it is found to be positive, only then does it submit it to Google for more precise verification.

1. http://blog.alexyakunin.com/2010/03/nice-bloom-filter-applic...
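A Bloom filter answers "definitely not in the set" or "maybe in the set" from a fixed-size bit array, so the cheap local check never misses a bad URL and only a "maybe" costs a network round-trip. A toy version (the sizing parameters are arbitrary, not Chrome's):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: membership tests can yield false positives but
    never false negatives -- exactly what you want for a cheap local
    pre-check before an expensive remote verification."""
    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k independent bit positions from salted hashes.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

A "maybe" (false positive) is resolved by the precise remote lookup; a "no" is final, which is why most page loads never contact the server at all.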

> EDIT: It looks like they've explicitly said the toolbar does not cause things to appear in search results

I read this too after posting, but I'm skeptical. It wouldn't be the first time they claimed to not do things they later admitted doing ... The rationale being that search engines need a way to discover new URLs quickly and keep ahead of the competition (indexing speed and breadth).

I'd also like to know what exactly Google Desktop Search does with URLs it finds.

You could make a good bit of easy money if you can prove your suspicions. But since you haven't...

Google indexes URLs despite measures such as robots.txt when these URLs are discovered by Google software including Chrome and their Toolbar.

Robots.txt is about fetching content; it has nothing to do with indexing URLs or anything that is part of the content at non-robots.txt-restricted locations.

Safe browsing in Firefox is implemented client side, it doesn't share urls with Google.

When you like or share a post in your newsfeed, you're sending a linkback to the original post.

So, if your newsfeed is public ("to everyone"), Google is able to crawl and index the content on it, disregarding the original post's privacy settings.

Whether Google sends the URLs home or not can easily be determined using an HTTP monitoring tool like Fiddler; with a hosts filter we can narrow the traffic down to google.com.

Leave it running for a few days and you will see for yourself.

Wouldn't a proper robots.txt rule fix this?

A robots.txt file disallowing crawling on the sites that display the contents of user email would help fix this.

However, as some of the discussion below points out, I don't believe that disallowing crawling of these URLs in our robots.txt would keep them from the index if a search engine finds reference to them elsewhere; I think it simply keeps them from being crawled.
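Concretely: robots.txt controls fetching, while keeping a page out of the index requires a noindex signal that the crawler must be allowed to fetch in order to see. A sketch (the `/read/` path mirrors the asdasd.ru-style URLs discussed here and is illustrative):

```
# robots.txt -- stops compliant crawlers from fetching /read/ pages,
# but a URL discovered via external links can still appear in results
# (as a bare URL, without a snippet):
User-agent: *
Disallow: /read/

# To actually keep a fetched page out of the index, let it be crawled
# and send a noindex signal instead, e.g. an HTTP response header:
#   X-Robots-Tag: noindex
```

This is why blocking crawling alone would not have removed the leaked-email pages from search results.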

(Regardless of whether one has a Facebook account or not) If your theory is correct, this seems like a good reason to not use Chrome or any browser with a Google Toolbar :)

Another good reason.

Sorry, but, where does it mention anything about making money for reporting security vulnerabilities on https://www.facebook.com/whitehat?

Yep as chucknthem points out, try clicking the "bounty" on the left side: https://www.facebook.com/whitehat/bounty/. Sorry I didn't make that clearer!

My name is Jared Null, and I first reported this as a vulnerability back in March to the bug bounty program. I've posted one conversation here: http://news.cnet.com/8301-1023_3-57544933-93/facebook-passwo.... I'm confused: you say that it's not a vulnerability, yet Facebook had to take action. I guess seeing is believing, and it only took a public disclosure to see the light. The sad thing is I reported both the recover-password link and the checkpoint link https://www.facebook.com/checkpoint/checkpointme?u= (which, by the way, is still vulnerable); the checkpoint links are reusable, but the recover-password links were one-time use.

Jared Null WhiteHat Security

You mention that the nonces expire after a period of time.

If you don't plan on cutting the feature forever, perhaps you could consider an alternative approach of limiting the validity of the URLs to the first visit and also removing the email address (and other PII) of the user from the URL.

The feature is absolutely too dangerous to ever have existed!

It turns out that Facebook implemented plain links that are more powerful than the password-reset procedure, considering how easily they allow taking over another user's account.

Having the actual user id in the link is just a small topping on that cake, not even worth discussing as long as the "no login, just click the link" possibility remains.

When did the term "nonce" start being used in web application development to refer to a token that expires after a period of time instead of being a true one-time use number/token?


They could both be one-time-use nonces and additionally have an expiration date. That was how I read the statement, but maybe that was generous.

Nonces are one-time use in webapps, unless there's a bad bug.

WordPress uses them in a similar way to how it sounds like Facebook is using them. I wonder how many others are misusing the term.

Hi, my account was hacked at the weekend and although it is locked, the person still keeps changing my password and I am not able to get into it. It won't let me reset my password as it keeps coming up with an error message. I need this sorted and have had no help from FB even after reporting it numerous times.

I discovered this a long time ago, and I reported it in this post: http://gonzac-studios.blogspot.com.ar/2012/02/hack-la-vulner...

Would Facebook ever consider having the option of two factor authentication (something similar, if not compatible with Google Authentication/TOTP/MOTP apps)?

I use it too, but I must admit to wishing they would make it compatible with Google Authenticator or some other OATH implementation. SMS'ed text codes take way too long to be a good second step when you're having to log in every day like I do (not to mention logging in from work, where my reception is almost nil).

We're working on improving this flow.

However, if you tell us to trust a given computer when you log in, you shouldn't have to enter the code more than once.

I can't speak for the iOS app but the Android app will generate tokens.

It does, and I'm using it.

That page doesn't say anything about money. It says Facebook might decide not to sue you if you submit it.

https://www.facebook.com/whitehat/bounty/ describes our bug bounty program, linked to from the "bounty" tab on the left of https://www.facebook.com/whitehat/.

Facebook's privacy settings have a ton of bugs. Here's another one:

1. Make a stupid status update post.
2. It appears in all your friends' newsfeeds.
3. You realize you said something stupid and private.
4. Panic. Delete post.
5. Breathe a sigh of relief that it is no longer showing up in your profile.
6. But wait a minute! It still keeps showing up in all your friends' newsfeeds.
7. Now that you deleted the post, you can't even modify its visibility settings. Heck, you can't even get to the URL. But all your friends continue to see the post in their newsfeeds.

So the way Facebook implements a delete of any activity (status post/like/comment) is that the owner stops seeing it but everyone else keeps seeing it. That is simply the most broken delete implementation ever!

Mumble mumble mumble eventual consistency mumble mumble sharded MySQL mumble mumble

This isn't a case of eventual consistency. They don't bother to update the cache (if that is the reason) or the database used in the newsfeed. The deleted posts persist in the newsfeed database many hours (maybe forever) after the delete event.

Caching is a conscious decision to take advantage of eventual consistency.

to be fair,

cache invalidation is hard.

And naming things.

Together with off by one errors, they're the two hardest things in computer science...

Yes, the three hardest things in Comp Sci are cache invalidation and off by one errors.

Don't you mean the 10 hardest things?

I think he means the 11 hardest things actually.

10 in ternary (which I think was part of his joke)

All they need to do is add a new "message" to say "y'know that message number 4029375 in the cache, yeah that ones deleted, ignore it".

This is not hard.

This arrogance is baffling. What makes you think you know enough about Facebook's infrastructure to make such a claim?

huh? Infrastructure is totally and utterly irrelevant to the problem. I know enough about common sense to make such a claim.

Just send a new message, exactly as you would post an original item/comment/etc, but have some special text/field in there that says "please ignore the previous message". The UI would then hide the previous message.


  COMMENT: {id:9374758, from:"mibbitier", data:"I hate you all!"}
  COMMENT: {id:9374759, from:"mibbitier", data:"*IGNORE_MESSAGE_IN_UI* 9374758"}
Nothing whatsoever to do with infrastructure. Nothing to do with caches. Purely to do with the UI. Not rocket science.

Granted, it's a poor way to do it, but it's better than nothing, and easier than trying to invalidate caches etc
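Applied client-side, the proposed tombstone scheme might look like this. This is a sketch of the commenter's idea, not Facebook's actual design, and the field names (`id`, `hides`) are invented:

```python
def visible_items(feed):
    """Apply 'tombstone' messages: a delete is just another message in the
    feed that tells the UI to hide an earlier one, so no cache needs to be
    invalidated -- the hiding happens at render time."""
    hidden = {item["hides"] for item in feed if "hides" in item}
    return [item for item in feed
            if "hides" not in item and item["id"] not in hidden]

feed = [
    {"id": 9374758, "from": "mibbitier", "data": "I hate you all!"},
    {"id": 9374759, "from": "mibbitier", "hides": 9374758},
]
```

As the surrounding replies point out, this only hides the item in one rendering path; the data still reaches every client, so it is a UI patch, not a real delete.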

Here's the thing that customers, managers, and less experienced developers all have in common: they understand that no one thing is difficult. But they don't take into account that managing the complexity between a thousand, or a hundred thousand, or a million rules is very, very difficult.

That's why you hire more experienced developers: they're more experienced, not at things like cache invalidation (sure, just nuke your entire cache anytime anything changes! easy!), but at managing complexity.

Which is difficult.

That's why I try to keep my mouth shut about how somebody should "just do this, it'd be so easy, why are they dumb?"

I'm sure it's already rendered to a static presentation-level (HTML/template language du jour) form at that point. That wouldn't work.

Just put some more javascript in there to deal with it. I'm sure it's not the hardest problem in the world.

Great, you've hidden the message with javascript, but now it's still in view-source and in search results for bots.

It's also still visible if someone is using noscript (actually, I have no idea if Facebook works with noscript, probably not).

The solution you are suggesting doesn't solve the problem and injects more corner cases.

I suppose they made it work the same as email. Once you sent something you can't take it back, but you can delete your copy if you want.

Weird. I clicked on one of the links and it asked me if I was that user, and, if so, that I should click the login button. When I did, it logged me in as that user.

Edit: This happens for multiple users.

Edit2: It looks like if you click on the link, it automatically expires. bCODE is "an identifier that can be sent to a mobile phone/device and used as a ticket/voucher/identification or other type of token." I'm guessing somehow these tokens (the ones that auto log you in) never got used, plus the old ones were saved and contain email info. Not sure how Google could have gotten them though. Probably just got accidentally listed, despite robots.txt.

One of the links redirected me to:


Which just says "Please try again later." but is probably part of the auto login path you discovered.

It's not an accident. You can even get fake urls indexed


It's really weird.

how? I am unable to replicate this bug

Here's one theory and analysis of what might have happened. Some people's emails got out into the public internet, and were indexed. Some of these emails were from Facebook, and included links to resources that require login. These links pre-populated the username field for convenience, or in some cases auto-login the user. Facebook's engineers probably did not anticipate email notifications to users being crawled by Google. Live and learn, eh?

But could Facebook have done something to prevent or minimize the damage caused by these leaked emails?

1. Let's start with the auto-login links, as those are the scariest. Do those links use one-time-use tokens, and do the tokens expire? If either or both of those measures were skipped, it makes this leak much more serious and speaks to negligence or disrespect for user security. If Facebook has both of those security measures in place, though, they did all they realistically could. If somebody lets their private email get indexed by Google (seriously, though, how does that even happen??), that's their own problem.

2. The other class of leaked urls link email addresses to Facebook profiles. This isn't as immediately scary, and for a lot of people it wouldn't even matter. But it is easy to imagine scenarios where this kind of privacy would be important to someone, and this kind of leak would be just as scary as someone being able to log in as them. Frankly, I never would have thought of securing this, and I doubt Facebook did anything to secure it. Going forward, though, it would probably be worth it for them to link auto-username-populating through one-time-use, expiring tokens as well.

So, it looks like Facebook probably got hit with a bizarre edge case privacy / security issue. There are likely things they could do to make their system more resistant to this kind of thing, but at the same time they probably didn't do as badly as this might make them look at first glance.

Again, this is speculation, any confirmation or disconfirmation would be great.

This is how everything started:

A friend forwarded me an email from a FB group notification.

Something like:

http://www.facebook.com/n/?groups%[id here]%2Fpermalink%[id here]%2F&mid=[id here]&bcode=[id here]-mjoi&n_m=[email adress here]

When I clicked the url I got automatically logged into my friend's account.

So it is definitely a Facebook security issue.

Then I tried some google searches to see if I could find some urls containing the parameters:

bcode= &email= n_m= mid=

Not a big deal, really.

Thanks for catching this nico-- looks like it's been removed from Google.

You're welcome!

I suspect this was caused by Google software, most likely Chrome or Google Toolbar, sending these private URLs to Google to be indexed.

See elsewhere on this discussion where I debunked your theory.

This theory seems partially supported by the fact that a lot of email addresses here are on the same domains:

Seems like emails on these domains are much more easily viewable/leakable/indexable than normal personal email addresses?

EDIT: Googling one of the discovered Gmail addresses revealed a Facebook email (with 'bcode') being auto-blogged at weight-loss-information-123.blogspot.com https://encrypted.google.com/search?hl=en&q=danielsams20... - some kind of malware maybe?

Going through my own inbox for Facebook emails, it attaches my email address in the n_m parameter, the bcode parameter, and a mid parameter to all the links it gives me. This includes links to my friends' profiles, events, group posts, etc.

As far as an expiration on the auto-login, I rarely click on the links Facebook provides in my email. (I like to get the notification to remind me to go on Facebook later.) The last one I got was about 25 hours ago. I didn't use the link before and it did not log me in when I clicked it just now.

I clicked on some profiles, and I noticed that many of the e-mail addresses populated were @asdasd.ru — the domain of a Russian mailinator-type service. Something like that might be indexed.

Huh, I wonder now if account hijacking is the actual design purpose of mailinators.

"Some people's emails got out into the public internet, and were indexed. Some of these emails were from Facebook, and included links..."

Doesn't Google's toolbar phone home with the URLs you click on? That could be a way to get supposedly-private URLs into Google's list of URLs to be visited.

Matt Cutts has publicly stated here and in other forums that the Google Toolbar does collect click data but does not use the data to insert URLs into Google's index. Here's a recent(ish) post on the matter:


That's an interesting idea, but as someone noted, most of the emails involved here come from a small set of domains, such as blogger or anonymous mailinator type emails (emails which are possibly crawled by google often!)

I think if it were google's toolbars picking up urls in emails, that there would be many more email domains here.

Okay, I've been through all the comments and I'm going to try to summarize:

- It looks like in some situations, Facebook will send an email that has a link. That link expires after a certain amount of time, but in the meantime, clicking that link lets people access that Facebook account.

- A large number of services can be set up to automatically post any email received onto the web. One major category is disposable email services such as asdasd.ru. Any email to a throwaway account on asdasd.ru gets put up on the web. Here's an example Facebook recovery email that got turned into a web page: http://asdasd.ru/read/414831

- Once these emails are just webpages, it's no surprise that search engines discover those URLs. Note that this is not a Google-specific issue. When I search on Bing for the query [site:facebook.com bcode n_m mid], the first result is also one of these urls that has an email address embedded in it. For a debunk of the misconception that this is related to the Google Toolbar or Chrome, see my post elsewhere in this discussion at http://news.ycombinator.com/item?id=4733276

So: an email gets sent to someone. That email gets put up on the web as a webpage. Search engines (including both Google and Bing) find that webpage as they follow links on the web.

I tried Bing and Yandex to find the email bodies. They didn't return many results (but they do return results).


When I try on Google to find the email bodies, I get 250k results, of which the large majority are on blogspot.com sites.

While mail bodies can be found on a few other sites, like the asdasd.ru example, and other search engines have found these links too, the main issue still seems to be with blogspot.com -- these aren't throwaway accounts with public inboxes, but likely there is some virus intercepting certain mails (Facebook, Twitter, YouTube, Twoo) and reposting them as blog posts for everyone to see.

As Blogspot is Google-owned, this does seem to me a predominantly Google-specific issue.

No, Blogger also has a feature that will automatically post messages sent to an email address. Here's an example email from Facebook that was posted to a blogspot.com url: http://weight-loss-information-123.blogspot.com/2012/08/misb...

If you look at the bottom of that Blogger post, it says "This message was sent to <a gmail address>." So an email from Facebook got posted as a web page to this blog.

There's no need to suspect some virus that's intercepting emails. Plenty of people have set up their systems such that email messages get turned into web pages.

You are probably right, and I apologize for any misinformation. To me it seemed strange that the blogs first started spamming, followed by publishing only certain emails. Wouldn't it make more sense if all emails were published, not only those from certain web services? Why would a user want to publish their private Facebook emails in the first place? None of these accounts post normal updates; they act compromised.

Automatically posting any email received onto the web can be a security issue. As you said, Blogspot is indiscriminate about which e-mails it publishes, so there is no need to suggest a virus targeting Facebook or Twitter mails -- I was confused on that point.

I've tested the indiscriminate posting, and any HTML you send to Blogspot accounts with this feature gets published, including <script> tags.

An e-mail client isn't supposed to execute <script> tags; I feel that if you republish an email online, you should strip out the <script> tags too.
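A sketch of such stripping with Python's standard html.parser (an illustration of the idea, not what Blogger actually does; it drops valueless attributes, which is fine for a demo):

```python
from html.parser import HTMLParser

class ScriptStripper(HTMLParser):
    """Rebuilds HTML while dropping <script>...</script> blocks entirely."""
    def __init__(self):
        super().__init__()
        self.out = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            return
        # Re-emit the tag; valueless (boolean) attributes are dropped.
        attr_text = "".join(
            f' {name}="{value}"' for name, value in attrs if value is not None)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside a <script> element is the script body; skip it.
        if not self.in_script:
            self.out.append(data)

def strip_scripts(html):
    parser = ScriptStripper()
    parser.feed(html)
    return "".join(parser.out)

print(strip_scripts('<p>hi</p><script>alert(1)</script><b>ok</b>'))
# -> <p>hi</p><b>ok</b>
```

Real sanitizers also have to handle CSS, event-handler attributes, and malformed markup, so a proper allowlist-based library would be the safer choice in practice.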

The Blogspot sites that run this service are currently under attack by spammers, who send spam emails (which don't seem to get filtered very well), allowing spam by proxy and editorial-looking links. Some go even further and send emails containing redirect scripts, or entire websites with CSS styles set on the body.

contains such an email-to-webpage post as an example in the source.

  <div class='post-body entry-content' id='post-body-3218874062265356726' itemprop='description articleBody'>
  <style type="text/css">
  h1 a:hover {background-color:#888;color:#fff ! important;}
  <div xmlns="http://www.w3.org/1999/xhtml" id="emailbody"
  style="margin:0 2em;
  <table style="border:0;padding:0;margin:0;width:100%">
  <br /> <br /> <script language="javascript"
  type="text/javascript"></script> <img
  src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTcpQG9IOqXoDrlzTdytRpeTN7sqIocaNZBAwxXxGEGUNrD4iwE" />
  <br /> <br />
  To stop receiving these emails, you may <a href="http://feedburner.google.com/fb/a/mailunsubscribe?k=uVg9TnxQ6-Owt_QRoJn279y21hs">unsubscribe now</a>.
  </td> <td style="font-family:Helvetica,Arial,Sans-Serif;font-size:11px;margin:0 6px 1.2em 0;color:#333;text-align:right;vertical-align:top">Email delivery powered by Google</td> </tr> <tr>
Sending such an e-mail to Blogspot users with this feature will redirect all their visitors to

Custom CSS and custom scripts allow for attack vectors such as these. Spam doesn't seem to be filtered very well. This is something Blogspot could protect their users and visitors against, no? And did the users of this feature understand the privacy ramifications of turning their inbox into a public mailing list?

Worse than redirects, thinking like a wicked spammer:

  1. User turns on the inbox-to-webpage feature
  2. Spammer finds these users by scanning the index
  3. Spammer sends such users (or includes in every spam mail) a malicious javascript file
  4. Javascript pop-up: "Re-enter your credentials"
  5. Change password and steal the blog
  6. Check if the Blogspot account is connected to a Gmail account

Thank you for exposing this. Much appreciated. Here's one more: the last time I checked, Facebook revealed what you 'liked' to search engines like Google. For example, if you search for your name in double quotes ("Your Name"), you will see your name listed on virtually every page you liked. If you had liked Sony's Facebook fan page, your name would appear in the search results as something like "[Your name] and 8 others like this".

That's strange, because I did tell Facebook in my account settings NOT to list my profile or my name on search engines.

To summarize: be careful what you 'like', because it really just takes a Google search to find out your interests. This could (potentially) be a problem if you are actively seeking employment (and had 'liked' some crazy stuff) or if you have a crazy girlfriend.

A common misinterpretation of how Google handles `Disallow` in robots.txt:

Q. If I block Google from crawling a page using a robots.txt disallow directive, will it disappear from search results? [1]

robots.txt Disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant. If you wish to explicitly block a page from being indexed, you should instead use the noindex robots meta tag or X-Robots-Tag HTTP header. In this case, you should not disallow the page in robots.txt, because the page must be crawled in order for the tag to be seen and obeyed.

[1] https://developers.google.com/webmasters/control-crawl-index...
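For illustration, here's a minimal sketch of the noindex route using Python's standard http.server (the page content is made up; real sites would usually set the header in their web server config instead):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoIndexHandler(BaseHTTPRequestHandler):
    """Answers every GET with an X-Robots-Tag: noindex header.

    For the header to be seen and obeyed, the crawler must be allowed
    to fetch the page, so the URL must NOT also be disallowed in
    robots.txt.
    """
    def do_GET(self):
        body = b"<html><body>private login page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("X-Robots-Tag", "noindex")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence request logging for the demo

# To serve: HTTPServer(("127.0.0.1", 8000), NoIndexHandler).serve_forever()
```

The equivalent in-page approach is `<meta name="robots" content="noindex">` in the HTML head; either way, the page has to be crawlable for the directive to take effect.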

We develop and host a bunch of extranets, which without a login show your typical authentication page. We put a robots.txt file there, and the only sites that link there are our customers' company home sites.

Google still indexes them. The definition of "relevant" here defies my wildest imagination.

robots.txt is not about indexing. It's about crawling.

"In this case, you should not disallow the page in robots.txt"

But don't worry, we'll ignore the information in robots.txt anyway, so maybe it's better to have both pieces of information there.

And maybe if it's relevant they'll ignore the X-Robots-Tag as well.

" Common misinterpretation on how Google handle `Disallow` in robots.txt"

Here is why I think this happened: http://www.facebook.com/humans.txt


Here's a video that explains how and why we handle robots.txt that way: http://www.mattcutts.com/blog/robots-txt-remove-url/

You can do access control on the contents of HTTP_REFERER: if the browser visits a page in your robots.txt by following a Google link, serve them up a 403 forbidden. (In Apache 2.4, this can all be done using mod_authz_core.)

You could maybe say in your 403 forbidden message that Google has been forbidden from indexing the page (use ErrorDocument). If enough sites did that, Google might change their policy.
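A rough sketch of that idea as Python WSGI middleware (the Apache mod_authz_core route would be equivalent; the domain check here is deliberately crude, and the login app is a hypothetical stand-in):

```python
def block_google_referrals(app):
    """WSGI middleware: return 403 when the Referer is a Google page."""
    def wrapped(environ, start_response):
        referer = environ.get("HTTP_REFERER", "")
        if "google." in referer:  # crude; a real check would parse the host
            start_response("403 Forbidden",
                           [("Content-Type", "text/plain")])
            return [b"Google has been forbidden from indexing this page."]
        return app(environ, start_response)
    return wrapped

def extranet_login(environ, start_response):
    # Hypothetical stand-in for the real authentication page.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>login form</body></html>"]

app = block_google_referrals(extranet_login)
```

Of course this only works when the browser actually sends a Referer header, which it often doesn't.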

Google's default for logged-in users is to use https and strip the search phrase when leaving the SERP, so HTTP_REFERER will be empty. A lot of security software also cuts HTTP_REFERER. Being behind a proxy may cause it to be empty, too. In general, I don't think you can rely on headers sent by the browser. You don't know if they are real or forged.

A large number of the login emails seem to be from the asdasd.ru domain. Googling one of these emails, I found a site that resembles a public inbox with emails from Facebook in it, like this one: http://asdasd.ru/read/414831.

You are able to post on Blogger via email. If you register this Blogger email address on Facebook, all Facebook notifications are published as Blogger posts and indexed by Google. This might actually be used to circumvent a firewall that prevents you from using Facebook. You can search for the leaked email addresses on Google and probably find Blogger blogs with Facebook notifications posted.

I'm somewhat curious, why MM for million/mega and not M? Or does the second M stand for some unit?

The Roman numeral M (mille) means 1000. M^2 therefore equals one million.

If that's truly how people use it, it is very strange. In actual roman numerals MM = 2000. Using 'M' as a roman numeral but then multiplying digits makes no sense at all (you'd need numerals for all the prime numbers to represent arbitrary numbers...).

And in SI, the prefix 'M' (mega) already means 1 million, so to me it seems MM is the notation that maximizes confusion.

I totally agree - there's no roman numeral justification for it at all and it's very confusing in normal situations. My understanding is that it comes from financial (specifically trader) jargon, and I suspect it originated to differentiate it from some other use of "m", but I don't know for sure... Maybe someone else knows why it arose?

I always thought it was short for "million monthly" so when you see something like "we have 10MM users" it would be 10 million monthly active users (i.e. 10 million users who have been active in the past month). I have no idea what it means here.

OK, so after a bit of searching, it seems that it comes from the Latin word for "thousand", millia[1]. And it's apparently common in financial contexts.

So 1M is 1,000. 1MM is 1,000,000. 1MMM is 1,000,000,000 (though the first and last are less common). Still seems like a confusing way to abbreviate to me.

[1]: http://en.wikipedia.org/wiki/Mile

"Mil" means "thousand" in Spanish. It makes sense for me.

More than 1 millimeter Facebook accounts exposed?

M the roman numeral = 1000, MM = 1000 * 1000 = 1,000,000

Millimetre is mm, not MM which would be "meter meter" which is nonsense. MM is actually the roman numeral for 1 million.

Capital M would be 'mega' as a prefix, but I think it does not exist as a unit. If you're willing to read MM as Mm, then it would be Megameter.

You really should revisit roman numerals. MM = 2000; you have to add them, not multiply.

Good point. So it turns out that MM is supposed to mean "thousand thousand" in the world of finance, but it is indeed not a correct roman numeral. Old school.

Actually, that's more accurate.

A large number of the emails are from 'blogger.com'. Aren't Google and Blogger one and the same? Are the URLs being crawled because Google is reading its own emails and crawling the contained URLs?

You think this is bad? Try doing the following Google search:

"password" filetype:csv

I'm sure lots of people have had unwanted encounters with Google's crawlers, but here's mine: I used to have a subdomain pointing to my home IP which was protected using Apache htpasswd. I naively had all of my clients' credentials stored in text files (conveniently named credentials.txt). Somehow I accidentally removed the htpasswd authentication and it was publicly exposed for a day or two. Of course Google indexed it and you could view everything in Google's cache.

There was a process for removing content from Google, but it took a few months to get completed. I never told anyone and I'm pretty sure all that info is now purged (I've tried to find it multiple times and it doesn't seem to exist anywhere).

I also downloaded a WoW guide that I had temporarily thrown up on one of my servers and forgot to take down. Like a year later I randomly was running a Google image search for 'Northrend Map' and happened to notice my site was the THIRD image. At first I thought it was a personalized search result, but I checked from multiple other places and it was still there even though there were zero inbound links.

This is an entire category of attack, and it is really useful. There have been a number of worms that use Google queries to find targets to propagate.

There are also a number of info-gathering tools that do something similar.

Over 3,000 Google queries are categorized in exploit-db:


But it is not about Facebook users.

What is the meaning of the square brackets in the Google query syntax? I could not find any official documentation.

They don't seem to do anything. Searching without the brackets seems to give the same results for me.


They indicate doing a search. If I say do a search for [flowers] then someone should type the actual word flowers into Google. We use the brackets to make it clear what the literal text of a search query is.

Looks fixed.

Not fixed here; I can access (private?) info from people I don't even know. Scary.

Yeah? Proof?

I'm only getting one result at the moment... what did I miss? Did Google censor the results?

I don't understand how these pages could have been crawled - could someone enlighten?

It seems that Facebook uses robots.txt to block these pages


But, depending on the number of inbound links, Google may index the URLs anyway.

It's a common issue.

Google ignores robots.txt if the number of inbound links is > N?

Also - any speculation as to how so many sites were linking to people's login pages?

For the curious, here are the Google results before the patch:




What is the MM quantifier?

I think it's just meant to mean "million" but it's used a lot in the VC/startup community in valuations ("CompanyX receives $10MM in angel funding" etc). I think it has come from finance originally, but either way I think it's unnecessary.

Can someone write a summary of the exact issue? Was it that people's FB accounts were accessible through an auto-login link? Also, I only see one result returned, not 1 million.

This is insane. That's why I use the https://mypermissions.com plugin, so these types of things don't happen to my Facebook...

Smart! You can extract the emails from result URLs :) param = n_m
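A sketch of that extraction with the standard library (the example URL below is invented, mirroring the bcode/n_m parameters discussed in this thread):

```python
from urllib.parse import urlparse, parse_qs

def extract_email(result_url):
    """Return the n_m query parameter (an email address), if present."""
    params = parse_qs(urlparse(result_url).query)
    return params.get("n_m", [None])[0]

# Hypothetical URL with the same shape as the leaked ones:
print(extract_email(
    "https://www.facebook.com/login.php?bcode=abc123&n_m=user%40example.com"))
# -> user@example.com
```

parse_qs percent-decodes the value, so the %40 comes back as @. Run over a list of scraped result URLs, this would yield exactly the harvest people are worried about.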

I wonder if this is the source of the Facebook data leak ("I just bought more than 1 million Facebook data entries") last week?

If you read that original article, they were harvested from a 3rd party Facebook App. http://talkweb.eu/openweb/1819

Same thought. It must be really easy if you use a scraper and decode the URLs.

Google patched this; search results with emails are no longer available.

Seems like Google has already pulled these results. That was fast.

Saw a couple of email accounts, though clicking on the links logged me out.

It seems to be more about what Google crawled and stored; another possibility could be leakage via cookies for ads or analytics.

I don't see anything except results for "bcode"

When I click on a link I see another person's email address in the login box. I assume it's a facebook user's email.

See these two examples :




Strange it doesn't work for me. I get a "Page not found".

Use this Google query:

inurl:bcode=[]+n_m=[] site:facebook.com


inurl:bcode=[]+n_m=[] site:facebook.com

Goes to some Obama site and redirects to another page. However, using it a second time results in Firefox reporting that it doesn't understand the URL.

"The page you requested was not found"

That's really odd, I am able to replicate the situation described above.

Can someone explain this to me?

Yeah, me too. I think I got to this too late to understand what's going on here :)

I think it was really great work by the Facebook team.

What exactly was exposed here? It looks like it's been blocked now...

Just stealing from another bit in this thread: somehow these URLs got onto the Internet even though they shouldn't have. They are pre-authed URLs that auto-login and then expire.
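Matt's comment above describes expiring nonces plus extra checks; here's a generic sketch of how such an expiring, signed login link could work (not Facebook's actual scheme; the secret, URL, and token format are invented):

```python
import base64, hashlib, hmac, time

SECRET = b"server-side secret"  # hypothetical; never leaves the server

def make_login_url(user_id, ttl=3600):
    """Build an auto-login URL that expires after ttl seconds."""
    expires = int(time.time()) + ttl
    payload = f"{user_id}:{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    token = base64.urlsafe_b64encode(payload).decode()
    return f"https://example.com/login?bcode={token}&sig={sig}"

def check_token(token, sig):
    """Return the user id if the token is genuine and unexpired, else None."""
    payload = base64.urlsafe_b64decode(token)
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return None  # forged or tampered
    # rsplit so a user id containing ":" wouldn't break parsing
    user_id, expires = payload.decode().rsplit(":", 1)
    if int(expires) < time.time():
        return None  # expired
    return user_id
```

Until the expiry passes, leaking such a URL leaks a live credential, which is why the emails being republished as web pages is the real problem here.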

Seconding this... I see just ordinary account numbers from here.

Still exposed here - when you click a link, it pre-fills the login box with a user's email. And I guess some of the links include auto-login tokens.

Oh, OK. I can see the mails too, I just didn't think it's such a security risk.

This whole Facebook (lack of) privacy thing just keeps getting better and better, doesn't it?

1 million million? You mean a trillion?

I'm interested to speculate on how to best mitigate:

Delete all bcodes? Ask Google for a full list of results, then regex and a delete statement? Disable the bcode login and then re-ask the question?
