Read the whole event log.
If you were behind Cloudflare and it was proxying sensitive data (the contents of HTTP POSTs, &c), they've potentially been spraying it into caches all across the Internet; it was so bad that Tavis found it by accident just looking through Google search results.
The crazy thing here is that the Project Zero people were joking last night about a disclosure that was going to keep everyone at work late today. And, this morning, Google announced the SHA-1 collision, which everyone (including the insiders who leaked that the SHA-1 collision was coming) thought was the big announcement.
Nope. A SHA-1 collision, it turns out, is the minor security news of the day.
This is approximately as bad as it ever gets. A significant number of companies probably need to compose customer notifications; it's, at this point, very difficult to rule out unauthorized disclosure of anything that traversed Cloudflare.
Yes, apparently the allocation patterns inside Cloudflare mean TLS keys aren't exposed to this vulnerability.
But Heartbleed happened at the TLS layer. To get secrets from Heartbleed, you had to make a particular TLS request that nobody normally makes.
Cloudbleed is a bug in Cloudflare's HTML parser, and the secrets it discloses are mixed in with, apparently, HTTP response data. The modern web is designed to cache HTTP responses aggressively, so whatever secrets Cloudflare revealed could be saved in random caches indefinitely.
You really want to see Cloudflare spend more time discussing how they've quantified the leak here.
What would you like to see? The SAFE_CHAR logging allowed us to get data on the rate which is how I got the % of requests figure.
But they're probably aware of this issue and know enough to go looking to purge their caches.
Do you care to share or elaborate on this?
But I guess there is really no incentive for anyone in particular to do anything about this, because it provides a kind of perverted safety in numbers. "It's not just our website that had this issue, it's, like, everyone's shared problem." The same principle applies to uber-hosting providers like AWS and Azure, as well as those creepy worldwide CDNs.
Interestingly, it seems this is one of the cases where using a smaller provider with the same issue would really make you better off (relatively speaking) because there would be fewer servers leaking your data.
The web simply doesn't scale. The only way to fix DDoS reliably is peer-to-peer protocols. Which hardly ever happens because our moronic ISPs believed nobody needed upload. Or even a public IP address.
you can argue "if everything was symmetric, then traffic patterns would be different" and you might be right, but that's not how the market went or how the "internet" started.
the client-server paradigm drove traffic patterns, and there was never any market demand or advantage by ignoring it.
Yes, traffic patterns at the time were heavily slanted towards downloads. I know about copper wires and how download and upload limit each other. Still, setting that situation in stone was very limiting. It's a self-fulfilling prophecy.
You don't want to host your server at home because you don't have upload. The ISP sees nobody has servers at home so they conclude nobody needs upload. Peer-to-peer file sharing and distribution is slower than YouTube because nobody has any upload. Therefore everybody uses YouTube, and the ISP concludes nobody uses peer-to-peer distribution networks.
And so on and so forth. It's the same trend that effectively forbade people from sending e-mail from home (they have to ask a big shot provider such as Gmail to do it for them, with MITM spying and advertisement), and that led to the rise of ISP-level NAT, instead of giving everyone a public IPv6 address like they all deserve (including on mobile).
There is a point where you have to realise the internet is increasingly centralised at every level because powerful special interests want it to be that way.
Regulation is what we need. Net neutrality is a start. Next in line should be mandated symmetric bandwidth, no ISP-wide firewall (the local router can have safe default settings), public IP (v4 or v6) for everyone, and no restriction on usage patterns (the ISP should not be allowed to forbid servers). Ultimately, our freedom of expression and freedom of information depends on this. They are messing with human rights.
And because IP multicast doesn't work over the internet. If it did, even if merely to some limited extent, some asymmetries would be far easier to stomach.
It may not have been how the market went but it definitely was how the internet got started.
The traffic patterns for higher upstream aren't there because they can't be there.
Sharing here in case anyone else finds it useful
(warning - it's 1.1MiB gzipped / 2.4MiB uncompressed)
any domains outside the top 1 million are omitted
Potentially huge amounts of stuff might be exposed, but I have some assurances that "the practical impact is low" from someone I trust, so I think it's just a lot of random data. I'd still rotate all credentials which passed through Cloudflare in the past N months (and if I were a big consumer site NOT on Cloudflare, I might change end user passwords anyway, due to re-use), but I don't think it will be the end of the world.
In terms of the caching, knowing the broken sites tells you where to look in the caches after the fact, but do you have any idea of whose data was leaked? Presumably 2 consecutive requests to the same malformed page could/would leak different data.
Wouldn't the second request be served from the CDN cache? Since for Cloudflare that particular page is a valid cached page, it would send you that same page on the second request.
1. every tag pattern that triggers the bug(s)
2. which broken pages with that pattern were requested at an abnormally high frequency or had an unusually short TTL (or some other useful heuristic)
3. on which servers, and at what time, in order to tell
4. whose data lived on the same servers at the same time as those broken pages
to even begin to estimate the scope of the leak. and that doesn't even help you find who planted the bad seeds.
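Steps 3 and 4 are basically a join on server and time. A minimal Python sketch of that join, under loud assumptions: the record shapes, the function name `possibly_exposed`, and the 60-second window are all invented for illustration, and nothing here reflects what logs Cloudflare actually keeps.

```python
from collections import defaultdict

def possibly_exposed(trigger_hits, traffic, window=60):
    """Customers whose traffic shared a server with a triggering request
    within `window` seconds (the window is an arbitrary assumption).

    trigger_hits: iterable of (server, timestamp) for requests to broken pages
    traffic:      iterable of (customer, server, timestamp) for everything else
    """
    # Index the triggering requests by the server that handled them.
    by_server = defaultdict(list)
    for server, ts in trigger_hits:
        by_server[server].append(ts)

    exposed = set()
    for customer, server, ts in traffic:
        # A customer is a candidate if any trigger hit the same server
        # close enough in time.
        if any(abs(ts - hit) <= window for hit in by_server.get(server, ())):
            exposed.add(customer)
    return exposed
```

Even with perfect logs this only gives an upper bound on who might have been in memory, not what was actually sent out.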
Exactly which search engines and cache providers did you work with to scrub leaked data?
ex: Right now there is an easily found Google cached page with OAuth tokens for a very popular fitness wearable's Android API endpoints
So: all services that have one or more domains served through Cloudflare may be affected.
The consensus seems to be that no one discovered this before now, and no bad guys have been scraping this leak for valuable data (passwords, OAuth tokens, PII, other secrets). But the data still was saved all over the world in web caches. So the bad guys are now probably after those. Though I don't know how much 'useful' data they would be able to extract, and what the risks for an average internet user are.
This is literally as bad as it gets; anyone trying to palliate the situation has something to sell you. You'd have to be an idiot to think that $organization (public, private, or shadow) doesn't have automated systems to check for something as stupid simple as this by querying resources at random intervals and searching for artifacts.
Someone found it. Probably more than one someone. Denial won't help.
And of course all other sites that are not in alexa 10k are not in this list (if they are not on some other lists used, you can see the source of lists in the README of the Github repo).
Cloudflare has a lot of customers who only use the free DNS service, for example.
You're not exposed if you never sent traffic through their proxies; for instance, if you somehow only used them for DNS.
The DNS service is essentially free. It's an upgrade from most registrars' built-in DNS. It's a pretty robust solution, really -- global footprint, DNSSEC, fully working IPv6, etc.
My point is, the actual number of impacted customers was much smaller than the entire set of Cloudflare customers. There are lists in this thread that still reference hundreds of thousands (millions?) of sites, and that's just wrong.
(I agree on your first point though; I was confused about the nature of the proxy bug at first).
I guess there is a long tail of pages on the internet whose primary purpose is to be crawled by google and serve as search landing pages - but again, if I had a bug in the HTML in one of my SEO pages that caused googlebot to see it as full of nonsense, I'd see that in my analytics because a page full of uninitialized nginx memory is not going to be an effective pagerank booster.
Or is the problem that one domain can trigger the memory leak, and another (unpredictable) domain is the "victim" that has its data dumped from memory?
I prefer CloudScare to Cloudbleed :)
Look at this, click on the downward arrow, "Cached": https://www.google.com/search?q="CF-Host-Origin-IP:"+"author...
(And then, in Google Cache, "view source", search for "authorization".)
(Various combinations of HTTP headers to search for yield more results.)
So I tried it too, and there's still data cached there.
Am I misunderstanding something - that above statement must be wrong, surely?
They can't have found everything even in the big search engines if it's still showing up in Google's cache, let alone the infinity other caches around the place.
EDIT: If the Cloudflare team sees this: I see leaked credentials for these domains:
Thanks to Uber now requiring location services on Always instead of just when hailing a car, my and others' personal location history even outside of Uber usage could have been compromised. Sweet.
That doesn't make me a fool, it makes me human. Don't be a jerk. It's a dark pattern for a reason.
Someone pointed me to MALLOC_PERTURB_ and I've just run a few test programs with it set - including a stage1 GCC compile, which granted may not be the best test - and it really doesn't dent performance by much. (edit: noticeably, at all, in fact)
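For anyone who wants to repeat that kind of test, a minimal Python sketch of running a command with MALLOC_PERTURB_ set. It assumes a glibc system, where a non-zero value makes malloc fill memory with a byte pattern on allocation and on free; the helper names and the value 165 are arbitrary choices for illustration.

```python
import os
import subprocess

def perturbed_env(value=165):
    """Copy the current environment and add MALLOC_PERTURB_ (glibc only)."""
    if not 0 < value <= 255:
        raise ValueError("MALLOC_PERTURB_ should be in 1..255")
    env = dict(os.environ)
    env["MALLOC_PERTURB_"] = str(value)
    return env

def run_perturbed(cmd, value=165):
    """Run cmd (a list of arguments) with perturbed malloc in the child only."""
    return subprocess.run(cmd, env=perturbed_env(value),
                          capture_output=True, text=True, check=False)
```

Something like `run_perturbed(["./my_test_program"])` (a hypothetical binary) would then run it with the pattern fill on, without touching the parent's environment.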
People who prefer extreme performance over prudent security should be the ones forced to mess about with extra settings, anyway.
What changed is the paged memory model: modern systems don't actually tie an address to a page of physical RAM until the first time you try to use it (or something else on that page). Initializing the memory on malloc() would "waste" memory in some cases, where the allocation spans multiple pages and you don't end up using the whole thing. Some software assumes this, and would use quite a bit of extra RAM if malloc() automatically wiped memory. It would also tend to chew through your CPU cache, which mattered less in the past because any nontrivial operation already did that.
I personally don't think this is a good enough reason, but it is a little more than just a minor performance issue.
That all being said, while it would likely have helped slightly in this case, it would not solve the problem: active allocations would still be revealed.
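To make that distinction concrete, here is a toy Python model of a heap. It is not how nginx or any real allocator works, and `ToyHeap` and its methods are made up for illustration. An over-read past the end of a buffer sees both stale freed data and neighbouring live data; wiping on free removes only the former.

```python
class ToyHeap:
    """A flat bytearray with a bump allocator; purely illustrative."""

    def __init__(self, size=64, zero_on_free=False):
        self.mem = bytearray(size)
        self.next = 0
        self.zero_on_free = zero_on_free

    def alloc(self, data):
        """Store data at the next free offset and return (offset, length)."""
        start = self.next
        self.mem[start:start + len(data)] = data
        self.next += len(data)
        return start, len(data)

    def free(self, block):
        """Release a block; optionally wipe its contents first."""
        start, length = block
        if self.zero_on_free:
            self.mem[start:start + length] = bytes(length)

    def overread(self, block, extra):
        """Read `extra` bytes past the end of a block, like a parser bug would."""
        start, length = block
        return bytes(self.mem[start + length:start + length + extra])


def demo(zero_on_free):
    heap = ToyHeap(zero_on_free=zero_on_free)
    page = heap.alloc(b"<html>")       # buffer the buggy parser reads
    old = heap.alloc(b"old-cookie")    # another request, freed below
    heap.alloc(b"live-token")          # another request, still in flight
    heap.free(old)
    return heap.overread(page, 20)
```

With `demo(False)` the over-read returns both the stale cookie and the live token; with `demo(True)` the cookie bytes come back as zeros but the live token is still there.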
On BSDs, malloc.conf can still be configured to do that: on OpenBSD, junking (fills allocations with 0xdb and deallocations with 0xdf) is enabled by default on small allocations, "J" will enable it for all allocations. On FreeBSD, "J" will initialise all allocations with 0xa5 and deallocations with 0x5a.
Maybe an alternative approach is to simply mark the pages to be lazily zeroed out when attached, in the Page Table Entries of the MMU. They wouldn't be zeroed out at the time of the call malloc(), but only when they are attached to a physical memory location (the first time you use it).
The stuff about eagerly allocating pages is spot on though.
There is calloc which allocates and zeroes memory, but people don't use it as often as they should.
In terms of "wasting" memory, perhaps the kernel could detect that you are writing 0s to a COW 0 page and still not actually tie the page to physical RAM. (If you're overwriting non-0 data, well it's already in a physical page.)
I don't quite follow the details of the CPU cache issue and why that is more-than-minor.
I do think in this day and age we should be re-visiting this question seriously in our C standard libraries. If the performance issues are actually major problems for specific systems, the old behaviour could be kept, but after benchmarking to show that it really is a performance problem.
Writing to your COW zero page causes a page fault. Now, in theory you could disassemble the executing instruction and if it's some kind of zero write, just bump the instruction pointer and go back to userspace - but then the very next instruction in your loop that zeroes the next 8 bytes will cause the same page fault. And the next. And the next...
Taking a page fault for every 8 bytes in your allocation is completely infeasible. You'd be better off taking the hit of the additional memory usage.
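A quick back-of-the-envelope version of that in Python, assuming 4 KiB pages and 8-byte stores (typical for x86-64, but assumptions here) and a made-up per-fault cost just to get an order of magnitude:

```python
PAGE_SIZE = 4096        # bytes per page (assumed)
STORE_SIZE = 8          # bytes zeroed per instruction (assumed)
FAULT_COST_NS = 1000    # rough guess at the cost of one page fault round trip

def faults_to_zero(alloc_bytes, store_size=STORE_SIZE):
    """Faults taken if every zeroing store to the shared zero page traps."""
    return -(-alloc_bytes // store_size)   # ceiling division

def pages_touched(alloc_bytes, page_size=PAGE_SIZE):
    """Faults taken under ordinary demand paging: one per page."""
    return -(-alloc_bytes // page_size)

if __name__ == "__main__":
    size = 1 << 20  # a 1 MiB allocation
    per_store = faults_to_zero(size)
    per_page = pages_touched(size)
    print(per_store, per_page, per_store // per_page)
    print(per_store * FAULT_COST_NS / 1e6, "ms spent faulting (very roughly)")
```

Under those assumptions, zeroing 1 MiB takes 131072 faults instead of 256, i.e. 512 times as many.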
However, zeroing on free is generally a useful defense-in-depth measure because it can minimize the risk of some types of information disclosure vulnerabilities. If you use grsecurity, this feature is provided by grsecurity's PAX_MEMORY_SANITIZE.
The computational cost of doing so, I suspect.
malloc/free were designed around 1972. That was a time when performance was much more important and security concerns didn't really exist.
Modern systems, like Go, do zero-out newly allocated memory because they do consider a bit more security to be more important than a bit more performance.
But changing the defaults of malloc/free is not really an option and it would probably break stuff.
Especially on Linux, where, I believe, malloc returns uncommitted pages, which increases the perf advantage in some cases.
Security conscious programmers can use calloc() or write their own wrappers over malloc/free.
language spec should probably now default to zeroing memory unless you specifically ask it not to....and maybe that should be a verbose option :)
Either they believe it's right, which means they're not competent enough to really assess the scope of the leak; or they don't believe it, but they went "fuck it, that's the best we can do".
In either case, it doesn't really inspire trust in their service.
I agree it's troubling that Google is taking so long. We were working with them to coordinate disclosure after their caches were cleared. While I am thankful to the Project Zero team for their informing us of the issue quickly, I'm troubled that they went ahead with disclosure before Google crawl team could complete the refresh of their own cache. We have continued to escalate this within Google to get the crawl team to prioritize the clearing of their caches as that is the highest priority remaining remediation step.
Not as simple as you thought?
Thousands of years from now, when biological life on this planet is all but extinct and superintelligent AI evolving at incomprehensible rates roam the planet, taviso will still be finding 0-days impacting billions of machines on an hourly basis.
Be glad that Google is employing him and not some random intelligence agency.
However, I am always wondering: are they really globally unique in their work and skill? So that they are really the ones finding all the security holes before anyone else does because they are just so much better (and/or with better infrastructure) than anyone else? Or is it more likely that on a global scale there are other teams who at least come close regarding skill and resources, but who are employed by actors less willing to share what they found?
I really do hope Tavis is a once-in-a-lifetime genius when it comes to vulnerability research!
If I were just casually googling two weeks ago and came across a leaked cloudflare session in the middle of my search results I think I would have vomited all over my desk immediately. Dude must have been sweating bullets and trembling as he reached out on twitter for a contact, not knowing yet how bad this was or for just how long it's been going on.
I know the search I performed now on Yahoo states "Powered by Bing™" at the bottom.
<!-- fe072.syc.search.gq1.yahoo.com Sat Feb 25 03:58:27 UTC 2017 -->
Given they are identical results it's pretty clear it must be a shared index I suppose, that or the leaked memory was cached.
Yahoo was never really a search company (even at its founding, it was a "directory", not a "search"). Sure, they pretended fairly well from 2004ish (following their move off Google results) to 2009 (when they did the Bing deal), but the company never really nailed search or, more importantly, search monetization, despite acquiring one of the first great search engines (Altavista) and the actual inventor of the tech Google stole for its cash cow Adwords (Overture).
And some sketchy internal variables: `log_only_china`, `http_not_in_china`, `baidu_dns_test`, and `better_tor`.
And monoculture is the elephant in the room most pretend not to see. The current engineering ideology (it is ideology, not technology) of sycophancy towards big and rich companies, and popular software stacks, is sickening.
I've never seen anyone suggest it; I suppose it cannot or should not be done for some reason?
The way I see it, time given by GZero was sufficient to close the loophole, it was not meant to give them chance to clear caches world-wide. They have a PR disaster on their hands, but blaming Google won't help with it.
20 hours since this post and these entries are still up ...
You can also click on "parent", and repeat as necessary.
He discovered one of the worst private information leaks in the history of the internet, and for that, he won the highest reward in their bug bounty: a Cloudflare t-shirt.
They also tried to delay disclosure and wouldn't send him drafts of their disclosure blog post, which, when finally published, significantly downplayed the impact of the leak.
Now, here's the CEO of Cloudflare making it sound like Google was somehow being uncooperative, and also claiming that there's no more leaked private information in the Bing caches.
Wrong and wrong. I'd be annoyed, too.
Read the full timeline here: https://bugs.chromium.org/p/project-zero/issues/detail?id=11...
I can see a whole team at Cloudflare panicking, trying to solve the issue, trying to communicate with big crawlers trying to evict all of the bad cache they have while trying to craft a blogpost that would save them from a PR catastrophe.
All the while Taviso is just becoming more and more aggressive to get the story out there. 6 freaking days.
Short timelines for disclosure are not fun.
Two questions came to mind: "how do we clean up search engine caches?" (Tavis helped with Google), and "has anyone actively exploited this in the past?"
Internally, I prioritized clean up because we knew that this would become public at some point and I felt we had a duty of care to clean up the mess to protect people.
Has this question been answered yet?
Wouldn't your team now even have to decide how to deal with this even after some specific well known caches have been cleared? I mean there's no guarantee that someone may not have collected all this data and use it to target those cloudflare customer sites. Are you planning to ask all your customers to reset all their access credentials and other secrets?
There are very good reasons to enforce clear rules like this.
Cloudbleed obviously falls into the second category.
Legally, there's nothing stopping researchers from simply publishing a vulnerability as soon as they find it. The fact that they give the vendor a heads-up at all is a courtesy to the vendor and to their clients.
It is the norm, and it is called responsible disclosure. You're trying to do the least harm, and the least harm is a balance between giving the developers some time to develop a fix and getting the news out there so that customers, and customers of customers, are aware of the issue.
I would also advise you notify your cloud-based services' customers how they might be affected (yes really), trust erosion tends to be contagious.
I think you have misunderstood the issue. Just because YOU did not use those services does not mean your data was not leaked. It means that other people's data was not leaked on YOUR site, but YOUR data could be leaked on other sites that were using these services.
If this part is true, they're not vulnerable. Only data that was sent to CloudFlare's nginx proxy could have leaked, so if they only proxy their static content, then that's the only content that would leak.
The rest of their comment gives the wrong impression though, yeah.
The way it worked, the bug also leaked data sent by the visitors of these "static sites": IP addresses, cookies, visited pages etc.
Don't use CF, and after seeing behavior like this, don't think I will.
Before Let's Encrypt was available for public use (beta), CF provided "MITM" https for everyone: just use CF and they can issue you a certificate and serve https for you. So I tried that with my personal website.
But then I found out that they replace a lot of my HTML, resulting in mixed content on the https version they served. This is the support ticket I filed with them:
On wang.yuxuan.org, the css file is served as:
<link rel="stylesheet" title="Default" href="inc/style.css" type="text/css" />
Via cloudflare, it becomes:
<link rel="stylesheet" title="Default" href="http://wang.yuxuan.org/inc/A.style.css.pagespeed.cf.5Dzr782jVo.css" type="text/css"/>
This won't work with your free https, as it's mixed content.
Please change it from http:// to //. Thanks.
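What the ticket asks for amounts to something like the following Python sketch. The function name is made up, and this is a guess at the shape of the fix, not how Cloudflare's rewriter works: drop the scheme from rewritten absolute URLs so the browser reuses whatever scheme the page was loaded over.

```python
import re

# Matches an explicit http: or https: scheme at the start of an absolute URL.
_SCHEME = re.compile(r"^https?:(?=//)", re.IGNORECASE)

def to_protocol_relative(url):
    """Turn http://host/path into //host/path; leave relative URLs alone."""
    return _SCHEME.sub("", url, count=1)
```

Relative references such as `inc/style.css` pass through unchanged, which is why the original markup never had the problem.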
There should be more similar cases.
Luckily I have Let's Encrypt now and no longer need them.
This led to Cloudflare refusing to implement support for Google Authenticator for 4 years.
Also, the notion that the CEO of an internet company would have a "beef with Google" is pretty funny.
Bugs happen to us all; how you deal with this is what counts, and wilful, blatant lying in a transparent attempt to deflect blame from where it belongs (Cloudflare) onto the team that saved your bacon?
I've recommended Cloudflare in the past, and I was planning, with some reservations, to continue to do so even after disclosure of this issue. But seeing this comment? I don't see how I can continue.
(For the sake of maximum clarity: I take issue: 1) with the attempt at suggesting the main issue is in clearing caches, not on the leak itself. It doesn't matter how fast you close the barn door after the horse is gone and the barn has burned down. 2) With the blatantly false claim that non-Google caches have been cleared, or were faster to clear than Google's. Cloudflare should know, better than anyone, the massive scope of this leak, and the fact that NO search engine's cache has or could be cleared of this leak. If you find yourself in a situation so bad you feel like you need to misdirect attention to someone else, and it turns out no one else is actually doing anything so you have to lie about that...maybe you should just shut up and stop digging?)
Google has absolutely no obligation to clean up after your mess.
You should be grateful for any help they and other search engines give you.
But I still find it troubling. Is it their mess? No. Does it affect a lot of people negatively? Yes. I expect Google to clean this up because they're decent human beings. It's troubling because it's not just Cloudflare's mess at this point.
It reminds me of the humorous response to "Am I my brother's keeper?", which is "You're your brother's brother"
I view leaving up the cached copy of leaked data as being a jerk move - not towards Cloudflare, but to anyone whose data was leaked.
This is an opportunity for Google to show what they do with rather sensitive data leaks - do they leave them up or scrub them?
Had damage from the leak been already done (to those whose data it was)? Probably. Even taking that into account, I think the Google search comes off as a jerk in this situation.
This is not the case; it is not obvious, trivial, or easy to delete the leaked data. It is not simple to find it all. This is not like they are being given a URL and being asked to clear the cached version of it; they are being asked to search through millions of pages for possibly leaked content.
If you are taking the same attitude with their team as you take in this comment, I'm pretty sure they will be thrilled to set aside all their regular work and help you clean up an enormous mess created by a bug in your service.
I will be migrating away from your service first thing Monday. I will not use your services again and will ensure that my clients and colleagues are informed of your horrific business practices now and in the future.
I'm no longer using CF for my own projects, but you've just cemented my decision that none of my clients will either.
It sounded like they (cf) were under a lot of pressure to disclose ASAP from project zero and their 7 day requirement...
Internal Upstream Server Certificate
/C=US/ST=California/L=San Francisco/O=Cloudflare Inc./OU=Cloudflare Services - nginx-cache/CN=Internal Upstream Server Certificate
EDIT: but there's still plenty of fish: http://webcache.googleusercontent.com/search?q=cache:lw4K9G2...
This will take weeks to clean, and that's just for Google.
EDIT2: found other oauth tokens, lots of fitbit calls... And this just by searching for typical CF internal headers on Google and Bing. There is no way to know what else is out there. What a mess.
> authorization: OAuth oauth_consumer_key ...
what a shit show. I'm sorry but at that point there must be consequences for incompetence. Some might argue "But nobody can't do anything" ...
I'm sorry, CF has the money to ditch C entirely and rewrite everything from the ground up with a safer language, I don't care what it is, Go, Rust, whatever.
At that point people using C directly are playing with fire. C isn't a language for highly distributed applications, it will only distribute memory leaks ... With all the wealth there is in the whole Silicon Valley, trillions of dollars, there is absolutely 0 effort to come up with an acceptable solution? All these startups can't come together and say: "OK, we're going to design or choose a real safe language and stick to that"? Where does all that money go then? Because this bug is going to cost A LOT OF MONEY to A LOT OF PEOPLE.
OAuth2 "simplified" things and just sends the secret over the wire, trusting SSL to keep things safe.
Perhaps the largest MITM ever eh?
The short waiting period balances the vendor's interest in coordinating the smoothest fix to the problem with the public's interest in knowing its exposure and maximizing its options for reacting to the exposure.
The fixed waiting period keeps the process sane. Every vendor you'll ever disclose a serious vulnerability to will try to delay disclosure, usually repeatedly. If you set a precedent of making arbitrary exceptions, you'll never be able to stare anyone down.
Again: as the reporters, you're trying to balance the vendor's interests with those of the public. Your credibility in these situations is pretty important, not just for this vulnerability, but for the next ones. With P0, we all know there will be a long series of "next ones" to be concerned about.
I feel like adding even just another day or two would've allowed them to purge more of these search results. I think that would greatly outweigh the increased risk of letting it remain undisclosed for slightly longer.
"Internal Upstream Server Certificate0"
And yet, I occasionally see working cache links on relevant unaffected pages.
Really, really awesome to see this kind of response. It's an obvious course of action (also considering corporate liability that you're publicly holding/offering this data) but it's really cool to see everyone work to fix this en masse so quickly.
I think a lot of people would enjoy hearing campfire battle stories of the past ~week once this is all over.
Couldn't Google just purge all cached documents which match any Cloudflare header? This will probably purge a lot of false positives, but it's just cached data, so would that loss really matter? My guess is that this approach should not take more than a few hours on Google's infrastructure.
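Roughly what that would look like, as a Python sketch. The marker strings are the ones people have been searching for in this thread; the cache layout (a dict of URL to cached body) and the function name are invented for illustration and say nothing about how Google's cache actually works.

```python
# Strings quoted elsewhere in this thread as showing up in leaked memory.
MARKERS = (
    "cf-host-origin-ip:",
    "internal upstream server certificate",
)

def urls_to_purge(cache, markers=MARKERS):
    """Return URLs whose cached body contains any marker (case-insensitive)."""
    hits = []
    for url, body in cache.items():
        lowered = body.lower()
        if any(marker in lowered for marker in markers):
            hits.append(url)
    return hits
```

The false positives are easy to picture: any page that merely quotes one of these strings, this thread included, would be purged too. And leaked fragments that happen not to contain a marker would be missed.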
Of course, this leaves the problem of all the other non-Google caches out there.
OAuth2 does send the secret, typically in an "Authorization: Bearer ..." header.
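For anyone grepping their own logs or saved pages for this, a small Python sketch that finds and redacts Authorization header values. The regex is a loose assumption about what such lines look like in a dump, not a full HTTP parser, and `redact_authorization` is a made-up name.

```python
import re

# Header name, then a scheme word (Bearer, OAuth, Basic, ...), then the
# credential up to the end of the line.
_AUTH = re.compile(r"(?im)^([ \t]*authorization:[ \t]*)(\w+)[ \t]+(.+)$")

def redact_authorization(text):
    """Replace credentials in Authorization headers; return (text, count)."""
    return _AUTH.subn(lambda m: f"{m.group(1)}{m.group(2)} [REDACTED]", text)
```

The count is handy on its own: a non-zero result on something that was publicly cached is a sign the token should be rotated, whatever else you do with the text.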
The uber stuff that somebody else linked to looks like a home-grown auth scheme and it appears that "x-uber-token" is a secret, but hard to know for sure.
This is an ongoing disaster, wasn't this disclosed too soon?
edit: Uber also seems to be affected.
So the issue wasn't fully fixed on Feb 19, or Google's cache date isn't accurate?
I don't know, this just seems catastrophic.
.... uhm is that what I think I'm seeing???
You have to wonder whether something like this is implicated.
Apps that consume APIs would be more sensitive to unexpected junk than browsers.
And it's just a speculation. Shrug.
I don't know how it works in the back so this is all speculation of course.
If someone knew about this exploit they're not going to be messing with people's Uber rides for lulz.
Cloudflare certainly does; I founded a health tech company, and Cloudflare was the recommended go-to for health tech startups who needed a CDN while serving PHI.
And this is definitely a reportable breach. Technically any breach is supposed to be reported to HHS, but in reality, a lot of covered entities (e.g. insurers) fail to report smaller breaches (which, as a patient, should terrify you). The big ones, though, are really, really bad, and when reported, the consequences can be very serious and potentially even include serving time, depending on the circumstances.
The reason I can be so confident that this is a reportable breach is that the definition of PHI is so broad that even revealing the existence of information between two known entities can be considered protected information. Anything more specific, like a phone number or DOB, or time of an appointment (even if you don't know who the appointment corresponds to) - that's always protected. And Cloudflare certainly has many of those.
Just think about the HIPAA document describing a single endpoint of dozens of sensitive datastreams, decrypting and then encrypting them all on the same machine, a machine that does some random HTML parsing for snippet caching on the side.
I don't see that passing review, but perhaps I'm naive.
"Because Cloudflare operates a large, shared infrastructure an HTTP request to a Cloudflare web site that was vulnerable to this problem could reveal information about an unrelated other Cloudflare site."
You don't need to be using this feature, or to be sending malformed HTML yourself - just to be in memory for this Cloudflare process.
So we should see very quickly that Cloudflare knows what to do when stuff goes wrong.
The memory leaked by this bug includes that pre-encryption data, which is what we're seeing here.
(At least that's my interpretation, computer security isn't quite my wheelhouse)
We've been reaching out to a couple of vendors that do use the proxy functionality (given that the data spill could impact our clients as well). Hoping to resolve the BAA uncertainty in the process too.
I feel for folks who lost API keys -- really -- but everyone regulated should be in full-on disaster recovery mode right now.
Some have suggested that Cloudflare might not be a business associate because of an exception to the definition of business associate known as the "conduit" exception.
Cloudflare is almost certainly not a conduit. HHS's recent guidance on cloud computing takes a very narrow view:
"The conduit exception applies where the only services provided to a covered entity or business associate customer are for transmission of ePHI that do not involve any storage of the information other than on a temporary basis incident to the transmission service."
OCR hasn't clarified what "temporary" means or whether a CDN would qualify, but again, almost certainly not. ISPs qualify, but your data just sits on the CDN indefinitely.
p.s. Hi Patrick and Aditya!
I do hate for CloudFlare to be the example for companies playing fast and loose with the rules, but I am hoping we'll have an opportunity in this to clarify the conduit definition a bit more.
Would like to mention that I don't think this declaration applies to every scenario. CloudFlare isn't just one service. I don't see an immediate issue using CloudFlare for DNS on a healthcare app. Neither do I see an issue using CloudFlare as the CDN for static assets. Both of these cases should be evaluated in a risk analysis, but they don't necessitate the level of shared responsibility a BAA entails.
I'm sorry but when the reward for breaking into you is basically a massive pinata of personal information...that simply is a bad joke. Security flaws are going to happen and if you aren't going to even offer a reasonable financial reward to report them to you, well, that is just begging to be exploited with a pinata that size.
That said, I disagree that bug bounties don't work for CDNs. You can scale a bug bounty up, it just requires resources. Cloudflare has those resources, and part of it is a function of the reward tiers you offer.
More than that, access to the service is actually the limiting factor for good bug bounty results. Cloudflare's bug bounty, we might surmise, works as well as it does because anyone can sign up for a Cloudflare account for free. For an enterprise CDN, who won't talk to a potential customer without the prospect of an $x0,000+/year contract, everyone who has enough access to the service to, in the general course of business, find and submit meaningful reports is employed by a customer, and likely prohibited from accepting substantial rewards. Everyone else either doesn't have enough access to submit meaningful reports, or the bug is so bad (like this one) that they'll report it regardless.
Arguably this shows that Cloudflare and other CDNs are right in their calculations: Tavis disclosed this bug to Cloudflare without promise of a payout, or even a T-shirt. Might some good Samaritan on the Internet have noticed the bug and reported it earlier if the bounty was more substantial? Perhaps. But in responding to a vulnerability of this magnitude, you want to work with someone of Tavis's caliber, who has the good of all the stakeholders in mind, not a profit-motivated rando.
We've got about 2500 tickets in our ticketing queue that have been filed over the past 8 months (excluding spam). Out of those 2500 tickets, only five are valid issues, and only one came with an actual write up.
The signal to noise ratio is absolutely awful - and it's not uncommon for people with invalid issues to demand that you pay them regardless.
I've been meaning to try a formal bounty program, as our software is a high value target (administrative tool running on over a million systems), but we're Open Source and don't have a lot of budget for bounties or anything else. If it produced hundreds of reports for every valid issue, it'd be counter-productive, for sure.
Hacker One should rename itself The Institute For Advanced Redirect Studies. I'm only partly kidding: bug bounty submitters are good at redirecting. Way better than I was before I started handling bounties. There's an interesting epistemological discussion to have about the low-value-yet-severity:critical bugs people file on bounty programs, because the level of cleverness required to exploit URL parsing differences between platforms is no less than what it takes to get an XSS bug.
There's a form listed under "How to apply", and an email address nearby.
It appears that projects are only documented once audited, FWIW.
Yes, running a real bug bounty system requires professional security engineers and a professional security posture to sort through the noise. However, when the sole product you are selling is security (i.e. Cloudflare) you kind of have to admit it should be expected that they do so.
It isn't "too high", it simply requires a serious financial commitment to security in the terms of salaried security engineers.
As to your other point, no one works for free. Project Zero is paid for by Google. Security engineers are going to prioritize the purposes that make them real, hard cash.
Parent's claim, as I read it, is that it's a better use of an enterprise CDN's money to hire security engineers to find bugs than to administer a bounty. Seems plausible to me. Where's that line?
Depends on the company, but tbpfh, most security engineers in a group tend to have a culture and that culture creates common blindspots. The fact they weren't testing for this sort of issue (i.e. parser memory leaks) is an example of something that seems obvious to some people that others ignore.
Maybe that is just my experience tho.
The award may still not be all that much, but let's not make things up about them.
I mean I guess it's good if you're already on Pro and could do with the freebie year but it's not really much to get the whitehats auditing your systems for free*
* free unless they find something
I've never put any of my sites behind Cloudflare precisely because I never had faith their WAF would always be bug free and I'm not comfortable with their MitM position.
Getting me to use your service on a time-limited basis falls more under the category of a "try-it-so-you-buy-it" marketing ploy than a real bonus to me. It benefits Cloudflare more than the researcher for that reason: if they use it, they'll be invested in continuing to "help" Cloudflare, since they'll be dependent on it.
I'm sorry, I just don't buy that this is anything but a marketing ploy wrapped up as a bonus.
Then for anything like this, publicly give a bonus gift that makes it worth people reporting to them rather than selling it on the black market, once it's gone through the legal dept. and so on.
Then they can be very quick with handing out tshirts and so on to any and every microissue report, without the people running triage having to care about amounts or tax or whatever.
Having any kind of publicly offered payment for service (beyond a tshirt bounty or services in kind) is just begging for legal issues, right?
https://hackerone.com/coinbase ($500-$10k) or https://hackerone.com/uber ($500-$10k) or https://hackerone.com/facebook ($500-$10k) or dozens of others have no trouble with it.
For instance, what does "sprayed into caches" mean? What cache? DNS cache? Browser cache? If the latter, does it mean you are safe if the person who owns that cache is an innocent non-technical user?
The best way to understand the bug is this: if a particular HTTP response happened to be generated in response to a request, the response would be intermingled with random memory contents from Cloudflare's proxies. If that request/response happened through someone else's HTTP proxy --- for instance, because it was initiated by someone at a big company that routes all its traffic through a Bluecoat appliance --- then that appliance might still have that improperly disclosed memory saved.
* Browser caches.
* Sites like wayback machine or search engines that make copies of webpages and save them.
* Tools that store data downloaded from the web, e.g. RSS readers.
* Caching proxies.
* the list goes on and on.
I think what tptacek wanted to say: It's just so common that people download things from the web and store them without even thinking much about it. And all those places where this happens now potentially can contain sensitive data.
Many of these caches are available online, to anyone who wants to look at them.
This bug meant that any time a page was sent through Cloudflare, the requester might receive the page plus some sensitive personal information, or credentials that could be used to log in to a stranger's account. Some of these credentials might let a bad actor pretend to be a service like Uber or Fitbit.
This very sensitive information might end up saved in a public cache, where anyone could find it and use it to do harm.
What are the odds I had a credential stored?
We know the impact, but what are the odds for a provider and for a possible exposee?
When it had bugs and delivered up cached files, the typical symptom was that everyone in the company got unwanted porn.
Because the biggest user (by far) of the 'net was the person into porn and so 90% of the Squid cache was porn.
How am I going to explain this to my wife?
Actually a serious question. How do we communicate something like this to the general public?
Or used as confetti for a parade: http://www.npr.org/2012/11/27/166023474/social-security-numb...
As a one-man company who has never done this before (and to the best of my knowledge never needed to): Any guides/examples to writing a customer notification for security ups like this? Or just recommendations? Thanks.
Advise them to change passwords for other services too, list sites possibly affected: https://github.com/pirate/sites-using-cloudflare/blob/master...
On the plus side, all those booter services hiding behind the Cloudflare are probably being probed and classified/identified/disabled by competitors and probably FBI. That is good.
*as bad as it has ever gotten so far.
Curious whether there could be some automated way of preventing such a widespread cache poisoning in the future. Some ML trained on valid pages from a given domain?
Is it even possible to recover the original content of the documents or was the data randomly inserted into different parts?
Step 2) leak cleartext from said MITM'd connections to the entire Internet
I recently noted that in some ways Cloudflare are probably the only entity to have ever managed to cause more damage to popular cryptography since the 2008 Debian OpenSSL bug (thanks to their "flexible" ""SSL"" """feature"""), but now I'm certain of it.
"Trust us" doesn't fly any more, this simply isn't good enough. Sorry, you lost my vote. Not even once
edit: why the revulsion? This bug would have been caught with valgrind, and by the sounds of it, using nothing more complex than feeding their httpd a random sampling of live inputs for an hour or two
I'd guess it's because of the crude and reductive way you describe the service cloudflare provides. I don't know what type of programming you do, but many small services don't have the infrastructure to mitigate the kind of attacks cloudflare deals with and they wouldn't be around without services like this.
I don't like the internet becoming centralized into a few small places that mitigate DDOS attacks like this, but I like the alternative (being held ransom by anyone with access to a botnet) even less.
I'm going to take a more even handed approach than what you're suggesting. Any time you work with a service like this you risk these kinds of things - it's part of the implicit cost/benefit analysis humans do every day. I'm not ready to throw out the baby with the bathwater because of one issue. I'm not sure what alternative you're suggesting (I didn't see any suggestions, just a lot of ranting, which might also contribute to the 'revulsion') but it doesn't sound any better than what we have.
Using services like Cloudflare as a 'fix' is wrecking the decentralized principles of the Internet. At that point we might as well just write all apps as Facebook widgets.
That is a separate step. First you either take cover or help.
Do you see a problem with that?
Everyone in the "cloud" is able to do the migration even without having prepared a disaster recovery plan ahead of time.
Extreme centralization of the Internet is not a "baby", except maybe in the sense of a cuckoo's egg.
But I'm willing to bet the mentality of this comment is highly representative of many web developers and service providers. They will not seek to fix anything, because they don't see this state of things as a problem in the first place.
Cloud means extreme centralization.
It means giving your data to a third party you don't control.
Why does our networked software have to assume a centralized topology?
In the days when developed countries had dialup, protocols (IRC, Email, etc.) were all decentralized. Today, all the famous developers live with fancy broadband internet connections and forgot what it's like to have to think about netsplits.
The result... all the software is either "online" or broken.
There shouldn't be an "online" or "offline". There should be "do I have access to server X currently?"
Why do we need Google Docs to collaborate on a document if we are all in the same classroom?
Why do we need centralized facebook server farms whose engineers post on highscalability how they enable us all to post petabytes of photos and comment to our friends?
Why do we need centralized sites to comment at all? Each thread is local to its parent.
Why does India need internet.org from facebook?
If communities could have a network that survives without an uplink to the outside world then DDOS from the global internet would just cut off that network's hosting of documents to outsiders. They'd still be able to do EVERYTHING locally - plan dinners, book a local appointment, send an email etc. and even post things out to the greater internet.
This is a future I want to see.
We already have mesh networks. We need more web based software to run these things.
That's what we are building at qbix.com btw.
Tim Berners-Lee, the "father" of World Wide Web, is currently advocating for exactly what you are asking for.
(Now I'm trawling Crunchbase to see if I can work out which investors are NSA front companies, then I'm gonna look to see what _else_ them and their partners have invested in...)
I don't actually believe that, but it isn't an unreasonable theory.
I once came up with that exact concept for a nation-state subversion. It would even pay for itself over time. I kept thinking back to it seeing the rise of the CDNs and the security approaches that trust them.
After the Snowden leaks it really seems nonsensical to give Cloudflare the benefit of the doubt and assume that they aren't compromised.
Or prevented by using abstractions that do bounds checking. Or they could even have just used Ragel with a memory-safe language and prevented all issues like that from ever happening. It probably would have been less work, even with the reimplementation of an HTTP proxy from scratch.
drastically reduced, but not quite ever.
For instance, use a GC language, especially in this domain, you might do some data pooling to reduce GC overhead. Maybe you forget to clear data in the pool. Same kind of error can result.
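To make the pooling point concrete, here is a minimal sketch in Python. It is not Cloudflare's code; `BufferPool`, `render`, and the `reported_len` parameter are all made up for illustration. It only shows that a recycled buffer that is never cleared, combined with a length that is too large, hands one user's bytes to the next even in a memory-safe language.

```python
class BufferPool:
    """Toy pool that recycles bytearrays instead of allocating new ones."""

    def __init__(self, size):
        self.size = size
        self._free = []

    def acquire(self):
        # A recycled buffer still holds whatever its previous user wrote.
        return self._free.pop() if self._free else bytearray(self.size)

    def release(self, buf, clear=False):
        if clear:
            buf[:] = bytes(len(buf))  # zero the buffer before recycling it
        self._free.append(buf)


def render(pool, body, reported_len, clear=False):
    """Copy body into a pooled buffer and return reported_len bytes of it.

    The illustrated mistake: reported_len may exceed len(body), so the
    caller also receives stale bytes left behind by an earlier request.
    """
    buf = pool.acquire()
    buf[:len(body)] = body
    out = bytes(buf[:reported_len])
    pool.release(buf, clear=clear)
    return out


pool = BufferPool(32)
render(pool, b"session=alice-secret-token", 26)
leaked = render(pool, b"hello", 26)  # asks for more bytes than it wrote
print(leaked)
```

Passing `clear=True` on release is the "remember to clear data in the pool" fix; the second response then contains only zero padding after `hello`.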
But yes, I feel like security sensitive stuff like this shouldn't be done in C / C++ any more.
I think you are overestimating the number of people doing their regular browsing through Tor.
I think the decision that goes on in the minds of most site operators is "fuck convenience and sleazy Tor users, I want my site to be as safe as they can make it".
It's worth noting that other reverse proxy providers I worked with when freelancing expose the very same controls to site owners. Based on anecdotal knowledge, I'd say anonymized users accessing a site behind CF are subject to less hassle than those accessing a site behind something like X4B with comparable settings.
Sure, requests passing through Tor are more likely to be malicious, but given the bandwidth constraints the adversary seems limited.
The costs aren't only the lost business from people like you, but people who should use Tor giving in. There's some wisdom to people even researching something as mundane as what their dog ingested using anonymized services, much less other medical questions.
Over in App Engine land, someone bypassed their JVM sandbox and managed to extract a copy of their JVM image, which included much of their revered base system statically linked into something like a 500mb binary.
Sorry, I'd have to go digging to find references to either of these incidents. At least in either case customer data wasn't leaking, but suffice to say it's a little bit of the pot calling the kettle black
And finally let's not forget the China incident, which rumour has it, resulted in a system compromise at Google right to the heart of their engineering organization. Of course they didn't get roasted like Yahoo recently did over their password leak
A site using Flexible SSL is no less secure than one using http://, and in fact is more secure, because nobody can MitM the connection between CloudFlare and the end user. The only thing vulnerable is the connection between the website and CloudFlare (~~and only to MitM, not to passive sniffing~~ EDIT: this isn't true, see ), but that's a much smaller and much better-protected surface area.
Now it's quite obvious that the alternative SSL options are much better because they secure the data properly the whole way. But claiming that Flexible SSL is somehow undermining the security of the web is extremely hyperbolic.
: The connection between the origin server and CloudFlare can in fact be passively sniffed. I thought Flexible SSL was the option to use an arbitrary self-signed cert, but it actually means no encryption.
Edit: Dear downvoters, can you please explain why you disagree? What I wrote really shouldn't be controversial in the least, so I don't understand the drive-by downvotes.
No company is likely to handle your payment details completely securely. You're relying on it working out on sheer luck most of the time and chargebacks on the rest.
Then there's the whole lone-auditor thing where a very large data-center or three are being audited by a single person over the course of two weeks, or less. That person is absolutely bombarded with information about an environment that is foreign to them. The end result I think is that so far companies have had it very easy to get by. They only have to pay for a week, or two at most, and whatever limited findings they get are fixed and they move on to the next year.
If companies actually had to live with a slower and more methodical audit, there would be many more findings and a lot more money spent, both on the auditing process and the resulting cleanup. The upshot is this would drive actual innovation in the space of having proper logging, file integrity, encryption, access controls, etc.
The whole audit industry is just.. icky. It needs a massive overhaul and the financials need to be forced to pay for it.
This is true, but conversely there is no legitimate use case for Flexible SSL. Having a datastore like Redis or MongoDB that by default listens insecurely on any address is almost as bad, and such things often compromise the security of a site if it e.g. sends your data across the internet to one of those, but at least there's a more-or-less legitimate use case for that default if it's used on a secured network - it's at least possible that someone using that default isn't deceiving their users. Whereas anyone using Flexible SSL is necessarily deceiving their users (I mean you can argue users might genuinely think "I don't trust my local cafe operator but I do trust the completely public, unsecured internet", but I don't think that's a coherent position for anyone to take).
That said, now that we have Let's Encrypt, and as more tooling gains support for automatically handling that, the value of Flexible SSL is going down, and I do hope they retire it eventually.
That's putting the cart before the horse. "Every website should offer" authentication and confidentiality, that's why we want every website to use HTTPS; having a URL that starts with https:// is not a goal in itself.
Security is not binary, but you keep treating it like it is. Security is a continuum, and any progress you make towards perfect security is good.
I would strongly dispute the "much". If anything the local network is more likely to be trustworthy than the remote network - people keep talking about cafe wifi, but the user likely knows who's running the cafe wifi and can complain if they start injecting ads etc. Whereas the user has literally no idea who might be on the connection path between cloudflare and the website and listening in, MitMing or anything.
http:// versus https:// is inherently binary; there's no way to display a connection as http⸵:// . If it doesn't mean "encrypted while transiting the public Internet" at least then what does it mean?
Indeed - so we should be applying all of those against CloudFlare, and any other organization that offers or uses a "Flexible SSL"-like product, as firmly as we can.
If the company is handling sensitive data, such as credit card information or medical information, there's already regulations to handle that. There's literally no point in trying to add regulations around Flexible SSL specifically, since the usage of Flexible SSL likely already contravenes the regulations for that sensitive data and therefore companies handling that data shouldn't be using it.
If the company isn't handling sensitive data, then again there's no point in adding regulations around Flexible SSL, because what possible benefit would that serve?
Flexible SSL is simply one tool that websites can use. It's intended to be used by sites that would otherwise just be using http://. Sites that do protect more sensitive information certainly could use it, but that would be a bad decision on their part. And we don't need regulations around it specifically, because there's also a million other bad decisions that company could make that would expose that data, and there's really nothing special about Flexible SSL that makes it in particular need of regulation.
I think serving a site over https:// amounts to advertising that information sent to/from that site will not be sent unencrypted over the public internet, and users will use that when deciding what things are or aren't safe to enter into that site. Surely there are regulations that already apply to that? And in any case regulations are only one of the options you mentioned; we should be applying a lot more shame to CloudFlare and anyone who uses "Flexible SSL".
In their defense, this is a flaw of the whole SSL/TLS security model. I think even Google did that before Snowden: presented you with https:// URLs but proxied everything in clear text (they claim they don't do it now). Still, you can be pretty sure that many https websites pass traffic in clear text to their backends and do not necessarily take security even a little bit seriously.
EDIT: Original comment said he could pull content off Google results. To respond to the new one:
No, they're not worlds apart when you're on the backbone. They still go through other people's datacenters and that's what causes the problem - we're not talking about stuff that goes over wifi or corporate networks here - we're talking generally just big ISPs in both cases.
It can be, in several ways. Most critically, it stops browsers from detecting the connection as insecure and applying mitigations.
Browsers also prevent HTTPS sites from embedding active content from HTTP sites.
The reality is, you're much more likely to get sniffed on public wifi or even your school or workplace network than someone running the server in a datacenter is. Generally speaking, if someone can sniff them at a DC, they can do much more already. So it's still a respectably huge security gain for users.
And they do offer a good way to secure this connection too where you can do full SSL and use a certificate signed by them.
Would you be more comfortable if they offered another way to represent this to the browser? An X-Endpoint-Insecure header or something like that?
Yes, definitely, _Cloudflare_ should own this and push it through. You know they won't though because that would inconvenience their customers.
So no, it's not 100% secure, but it's far far better than having an unsecured http:// connection.
As for the green lock, you can blame that on Chrome. I have no idea why they insist on using a green lock and green "Secure" text for DV certs. Safari only uses a green lock / green text for EV certs, which is a lot better (and I don't know offhand what Firefox or Edge do). Of course, you could have an EV cert and still use Flexible SSL, but anyone who cares enough to get an EV cert should know better than to use Flexible SSL anyway, and there's a great many ways to make your server insecure, using Flexible SSL is very far from the worst way.
All that said, it would be great if CloudFlare would just stop offering Flexible SSL in favor of the self-signed CSR approach. Any CloudFlare customer who can create their own cert to talk to CloudFlare can also create a CSR to get a cert from CloudFlare just as easily, so it's not clear to me why they still even offer Flexible SSL.
: I thought Flexible SSL was the option to use an arbitrary self-signed cert on the origin server. gkop pointed out that, no, Flexible SSL means no encryption at all.
How is it secure? CloudFlare allows you to send this traffic in the clear. If they required this traffic be HTTPS, that would be far better for web security.
When observing non-technical users, I still see people clicking through blatant full page cert errors after connecting to WiFi because they've been implicitly trained that it's the captive portal making them sign in.
Where would you even start to address this? Everything you've been serving is potentially compromised, API keys, sessions, personal information, user passwords, the works.
You've got no idea what has been leaked. Should you reset all your user passwords, cycle all of your keys, notify all your customers that their data may have been stolen?
My second thought after relief was the realization that even as a consumer I'm affected by this. My password manager has > 100 entries; what percentage of them are using CloudFlare? Should I change all my passwords?
What an epic mess. This is the problem with centralization, the system is broken.
You can start by cross referencing your password manager with this list, and working your way out from there.
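A rough sketch of that cross-referencing step, assuming you can export your vault as a list of URLs or bare domains and have the published list as one domain per line. The function names and the example domains are hypothetical, and real password manager exports will need their own parsing.

```python
from urllib.parse import urlsplit


def host_of(entry):
    """Return the lowercase hostname of a vault entry (URL or bare domain)."""
    entry = entry.strip()
    if "//" not in entry:
        entry = "//" + entry  # lets urlsplit treat a bare domain as the host
    return (urlsplit(entry).hostname or "").lower()


def affected_entries(vault_entries, listed_domains):
    """Return vault entries whose host is, or is a subdomain of, a listed domain."""
    listed = {d.strip().lower().rstrip(".") for d in listed_domains if d.strip()}
    hits = []
    for entry in vault_entries:
        labels = host_of(entry).split(".")
        # The host itself plus each parent: a.b.example.com, b.example.com, example.com, com
        candidates = {".".join(labels[i:]) for i in range(len(labels))}
        if candidates & listed:
            hits.append(entry)
    return hits


vault = ["https://www.example.com/login", "forum.example.org", "https://bank.test/"]
listed = ["example.com", "example.org"]
print(affected_entries(vault, listed))
```

A match only means the domain appears on the list, not that your data actually leaked, so treat the result as a priority order for password changes.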
I find it really interesting that they registered that particular misspelling and they both point to the same servers. I can see doing this for some obvious domains like gogle.com, but the distinction there is simply that r+n looks like m.
Probably a really obvious answer here, but my guess is that they are trying to help people throw off the scent of someone browsing a history.
Yes. Right now. Don't wait for the vendor to notify you.
> What an epic mess. This is the problem with centralization, the system is broken.
If it only took 60 seconds per site, it would still take eight hours to change them all.
Might change a few key passwords, though. Couldn't hurt. I only have a couple of bank/financial passwords at this point. And my various hosting service access passwords.
Anything else is not worth the hassle -- and mostly would have 2FA anyway.
The decision to wear a seatbelt isn't driven by the probability of needing it; the decision is driven by the magnitude of exposure to an event where you would need it.
You misunderstand. My argument is explicitly around "What is the potential effect?" That's why changing financial passwords is on my list of things that I might do. (Though see below for why I won't.)
If I only change passwords where someone can do real damage (my primary social media accounts, my accounts that have a current, saved credit card, and any hosting-related accounts) then I've already hit the 98th percentile in damage avoidance. And as I pointed out above, most (all?) of those accounts are unaffected because they don't use CloudFlare at all.
If someone has stolen my password to the Woodworking Forums, and they ... what, post rabid alt-right spam in my name and get me banned? Oh well, either tell them that it was hacked, or if they don't believe me, let that account die and create a new one, if I ever decide to go back and post something again. No big deal. I haven't used it in years anyway, and I can create unlimited new (wildcard-based) email addresses on any of several domains I own.
Aside from the top 10-15 sites I use, I rarely have logins that are that important, anyway. So I'm totally basing this on worst-case damage assessment, not on "how likely it is I'm attacked."
AND...I just looked through all of the top sites I use, and according to the HTTP header, none of them is served using CloudFlare at all (I only checked the index page of each, but none have the telltale CF-Cache-Status headers). No financial sites, no shopping sites that have my credit card, no social media sites. So where's the fire exactly?
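For anyone who wants to script that header check, a minimal sketch follows. `CF-Cache-Status` is the header mentioned above; treating `CF-RAY` and a `Server` value containing "cloudflare" as further signals is my own assumption, and `check_url` is a hypothetical helper. As noted, checking one page per site is a heuristic, not proof either way.

```python
import urllib.request

# Header names taken as Cloudflare indicators (compared case-insensitively).
CLOUDFLARE_HEADER_NAMES = {"cf-ray", "cf-cache-status"}


def looks_like_cloudflare(headers):
    """Guess from a mapping of response headers whether Cloudflare served it."""
    normalized = {name.lower(): str(value).lower() for name, value in headers.items()}
    if normalized.keys() & CLOUDFLARE_HEADER_NAMES:
        return True
    return "cloudflare" in normalized.get("server", "")


def check_url(url):
    """Send a HEAD request to url and apply the heuristic (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return looks_like_cloudflare(dict(response.headers.items()))


print(looks_like_cloudflare({"Server": "nginx", "CF-Cache-Status": "HIT"}))
```

A site can also sit behind Cloudflare on some subdomains (an API host, say) and not on the index page, so a negative result here is weaker than a positive one.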
Which one is it? Hacker News.
The same isn't quite true for my blogger account.
The cost of your life is much higher than your blogger account, but it's not literally infinite, even from your own perspective.
If it were truly infinite, then it would be irrational for you ever to take any action that were not 100% motivated by the desire to protect your life. (Not just "never take any risks", but literally irrational not to actively spend every waking second solely on that goal).
Instead I'm using KeePass. KeePass is open source and has its "full stack" of encryption available for review. For LastPass I need to trust they're doing everything right, and that a government actor hasn't asked for some kind of backdoor. It's so easy to screw up security that I'm more comfortable trusting two levels of security: That KeePass has its encryption done right, and that Google Drive keeps my KeePass file out of the hands of bad-guys.
LastPass would become a single point of failure compared to what I'm doing: They just need to make one mistake and suddenly any bad guy gets all of my passwords.
Nice feature for LastPass, though.
So LastPass isn't the password manager mentioned in the post.
When I log into the Woodworking Forums, I have to use a password. If someone steals my Woodworking Forums authentication and posts as me there, um....Oh well. Sucks, and I'll clean up the mess.
Glancing through my password vault (kept in KeePass, for those wondering) I have some in there that I literally haven't used since before Cloudflare was founded, like the Creative Labs developer site.
(Where I mean some other sites that are not at all HN, but might plausibly exist.)
I'm pretty sure kogir came up with that one and he's been off working on his contrarian bug tracker for ages.
$ curl -I okcupid.com
In other words: Just assume that everything has been compromised. With how much of the web CloudFlare controls nowadays, you're not going to be far off anyway.
digitalocean.com name server walt.ns.cloudflare.com.
digitalocean.com name server kim.ns.cloudflare.com.
$ host -t NS okcupid.com
okcupid.com name server nameserver2.okcupid.com.
okcupid.com name server nameserver1.okcupid.com.
to get the A Record, then
$ whois 126.96.36.199|grep Cloudflare
Not 100% reliable, but should do the job.
if whois "$(dig +short yoursitehere.com | tail -n1)" | grep -qi 'cloudflare'; then echo 'Found CloudFlare'; else echo "Didn't find CloudFlare"; fi
Like you said, not 100% reliable though. For example, I'm pretty sure Reddit uses CloudFlare, but their whois mentions Fastly, which is a competitor.
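The nameserver check from the `host -t NS` output earlier can be scripted too. This sketch only parses text in the format shown above; `nameservers` and `uses_cloudflare_dns` are names I made up. Note the caveat: Cloudflare nameservers mean the domain uses Cloudflare DNS, which does not by itself prove traffic was proxied, and a site can be proxied by Cloudflare while using other nameservers.

```python
def nameservers(host_output):
    """Extract nameserver hostnames from `host -t NS`-style output lines."""
    marker = " name server "
    servers = []
    for line in host_output.splitlines():
        if marker in line:
            name = line.split(marker, 1)[1].strip().rstrip(".").lower()
            servers.append(name)
    return servers


def uses_cloudflare_dns(host_output):
    """True if any listed nameserver is under ns.cloudflare.com."""
    return any(ns.endswith(".ns.cloudflare.com") for ns in nameservers(host_output))


sample = (
    "digitalocean.com name server walt.ns.cloudflare.com.\n"
    "digitalocean.com name server kim.ns.cloudflare.com.\n"
)
print(uses_cloudflare_dns(sample))
```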
Append /cdn-cgi/trace to the URL and you will see some debug info.
...but since this bug has been out in the wild since perhaps 2016-09-22, now is indeed the time to go and reset your active sessions and change all your passwords.
If your site is served through Cloudflare, assume it's all out there because it might be. Standard Big Red Button(tm) procedure.
I don't run any particularly impressive sites but I'll be resetting passwords today. Also cycling things I use behind Cloudflare like DigitalOcean passwords/API keys.
It's supposed to be read-only Friday, Cloudflare :(
In my opinion, if my accounts get compromised because the provider uses Cloudflare and leaks my data all over, it's their fault, not mine... It's not my job to guess which services are using Cloudflare, which ones were affected... and further, if my account gets compromised, others presumably will.
(PS: Of course you may need to change passwords if you reuse passwords from one service to the other, but obviously you shouldn't be doing that in the first place.)
While this event is orders of magnitude less severe than my example, depending on the service that could be compromised there can be sufficient repercussions that you could not be made whole or avoid on-going inconvenience through the legal system or other acts of the genuinely responsible party.
I absolutely get and sympathize with where you're coming from... but you may want to check a few of your more important accounts none-the-less :-)
> The examples we're finding are so bad, I cancelled some weekend plans to go into the office on Sunday to help build some tools to cleanup. I've informed cloudflare what I'm working on. I'm finding private messages from major dating sites, full messages from a well-known chat service, online password manager data, frames from adult video sites, hotel bookings. We're talking full https requests, client IP addresses, full responses, cookies, passwords, keys, data, everything.
This is huge.
I mean, seriously, this is REALLY HUGE.
You have a function that strips all colons from your input. For some reason, in certain cases, your code misbehaves: instead of replacing each colon with nothing, it accidentally replaces it with other data you have in memory. So now all the colons in your input have been replaced with data that you shouldn't have touched, and whoever sent you the input gets it back plus data they shouldn't be able to see.
And Google in this case caches those output strings.
Imagine I'm having a chat on some website X, which uses Cloudflare. Cloudflare acts as a man in the middle, meaning my request, and the response, likely pass through its memory at some point to allow me to communicate with X.
Later, a Google bot comes along and requests a page from site Y. Because of this bug, random bits of memory that were left around on the Cloudflare server get inserted into the response to the bot's request. Those bits of memory could be from anything that's gone through that server in the past, including my conversations on website X. The bot then assumes that the content that Cloudflare spits out for website Y is an accurate representation of website Y's contents, and it caches those contents. In this way, my data from website X ends up in Google's cached version of website Y.
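To make the "random bits of memory get inserted into the response" part concrete, here is a toy model in bash. It is an illustration only and not Cloudflare's code: the `memory` string, the `render` function, the skip-two-bytes rule, and the sample secret are all invented for the sketch. The page from site Y sits in "memory" right before leftover data from someone else's request, and the buggy end-of-buffer check only stops when the cursor lands exactly on the end of the page.

```shell
#!/usr/bin/env bash
# Toy model of a buffer over-read. All names and data are made up.
page='<a href='                    # malformed page: ends mid-attribute
leftover=' secret-of-user-B'       # stale data from another request
memory="${page}${leftover}"        # the two sit next to each other
end=${#page}                       # index one past the end of the page

# render MODE: copy the page out of memory one byte at a time.
#   buggy: stops only when the cursor is exactly at `end`
#   fixed: stops when the cursor is at or past `end`
render() {
    local mode=$1 p=0 out='' c
    while (( p < ${#memory} )); do     # hard stop so the toy terminates
        if [[ $mode == buggy ]]; then
            (( p == end )) && break
        else
            (( p >= end )) && break
        fi
        c=${memory:p:1}
        out+=$c
        if [[ $c == '=' ]]; then
            (( p += 2 ))               # skip the quote expected after '='
        else
            (( p += 1 ))
        fi
    done
    printf '%s\n' "$out"
}

render fixed    # prints: <a href=
render buggy    # prints: <a href=secret-of-user-B
```

The trailing `=` makes the cursor jump from 7 to 9, stepping over `end` (8). The exact-match check never fires, so the loop keeps copying whatever comes next in memory into the response, which is the part a crawler would then cache.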
Then Google accesses the website as the crawler (user B), and their headers and data are saved in M2. However, Google triggered a bug and now has access to M1 as well. So now Google sees their own headers + my data + other garbage.
Google gets this HTML and caches it and that's how it ends up there.
"We leaked information from Customer A to Customer B by accident" is the first order problem.
But the existence of web caches means that all that private information of customer A is potentially fucking everywhere now.
How do you even clean this up? How do you even start?
So a request sent to Cloudflare customer A's site could return data from Cloudflare customer B, including data that B thought was only being served via https to authenticated users of B.
Apparently 7xx sites had this enabled, but that affected 4000ish other sites that happened to be on the same infrastructure.
For certain other sites with malformed HTML, there is a bug that caused it to grab random data (headers and body) from memory and include it in the body of the response HTML. (Some HTML rewriting product that Cloudflare offered was broken and it ran on the same servers.)
This stuff got sent to people's browsers and also to web indexers like Google or Bing.
Google lets you search for stuff and will also show you the original page that it scraped, making it easy to find this data.
Edit: Also you may be seeing more headers in examples because headers are easier to search for.
Since anyone can put a broken page behind Cloudflare, all you need to do is request your own broken page through Cloudflare, and start collecting the random "secure" data that comes back.