Cloudflare Reverse Proxies Are Dumping Uninitialized Memory

tptacek · on Feb 23, 2017

Oh, my god.

Read the whole event log.

If you were behind Cloudflare and it was proxying sensitive data (the contents of HTTP POSTs, &c), they've potentially been spraying it into caches all across the Internet; it was so bad that Tavis found it by accident just looking through Google search results.

The crazy thing here is that the Project Zero people were joking last night about a disclosure that was going to keep everyone at work late today. And, this morning, Google announced the SHA-1 collision, which everyone (including the insiders who leaked that the SHA-1 collision was coming) thought was the big announcement.

Nope. A SHA-1 collision, it turns out, is the minor security news of the day.

This is approximately as bad as it ever gets. A significant number of companies probably need to compose customer notifications; it's, at this point, very difficult to rule out unauthorized disclosure of anything that traversed Cloudflare.

tptacek · on Feb 24, 2017

In case you're wondering how this could be worse than Heartbleed:

Yes, apparently the allocation patterns inside Cloudflare mean TLS keys aren't exposed to this vulnerability.

But Heartbleed happened at the TLS layer. To get secrets from Heartbleed, you had to make a particular TLS request that nobody normally makes.

Cloudbleed is a bug in Cloudflare's HTML parser, and the secrets it discloses are mixed in with, apparently, HTTP response data. The modern web is designed to cache HTTP responses aggressively, so whatever secrets Cloudflare revealed could be saved in random caches indefinitely.

You really want to see Cloudflare spend more time discussing how they've quantified the leak here.

jgrahamc · on Feb 24, 2017

> You really want to see Cloudflare spend more time discussing how they've quantified the leak here.

What would you like to see? The SAFE_CHAR logging allowed us to get data on the rate which is how I got the % of requests figure.

tptacek · on Feb 24, 2017

How many different sites? Your team sent a list to Tavis's team. How many entries were on the list?

jgrahamc · on Feb 24, 2017

We identified 3,438 unique domains. I'm not sure if those were all sent to Tavis because we were only sending him things that we wanted purged.

rdl · on Feb 24, 2017

3438 domains which someone could have queried, but potentially data from any site which had "recently" passed through Cloudflare would be exposed in response, right? Purging those results helps with search engines, but a hypothetical malicious secret crawler would still potentially have any data from any site.

dsp1234 · on Feb 24, 2017

It doesn't have to be a secret crawler. Just one that wasn't contacted by cloudflare (I didn't see any non-US search providers mentioned).

kevcampb · on Feb 24, 2017

In other words, Baidu are currently sitting on a treasure trove of keys and passwords.

Kalium · on Feb 24, 2017

Possibly not, Baidu and CloudFlare have a well-documented long-term partnership.

snaky · on Feb 24, 2017

Maybe there's much more to worry about in Baidu's not-so-well-documented, but longer-term, partnerships.

Kalium · on Feb 24, 2017

Oh, absolutely. Baidu's relationship with their host nation should be a source of concern for us all. I've heard some interesting and unusual stories.

But they're probably aware of this issue and know enough to go looking to purge their caches.

seesomesense · on Feb 24, 2017

Or Baidu know enough to not purge their caches. Think of the amount of tangible gratitude that their host nation would show them for access to some potentially tasty information....

etherealG · on Feb 24, 2017

Swap baidu for google or microsoft in that sentence and it still has the same problems. Every government 3 letter agency has a vested interest in the secrets.

AndrewKemendo · on Feb 25, 2017

Whether you believe it or not, there is actually a tangible difference between the relationships US corporations have with the USG vs other nations and their corporate entities.

gonzo · on Feb 24, 2017

They're not all 3 letters. (e.g. GCHQ, ASIO, CSIS, DGSI, etc.)

deftturtle · on Feb 26, 2017

It's an expression

lmm · on Feb 24, 2017

Well, purge their public cache, after taking a private dump and supplying it to those who would find value in such a thing.

bogomipz · on Feb 24, 2017

>"I've heard some interesting and unusual stories."

Do you care to share or elaborate on this?

Kalium · on Feb 24, 2017

They're not my stories to share, I'm afraid.

homakov · on Feb 24, 2017

+Yandex

foodstances · on Feb 24, 2017

I wonder if archive.org or archive.is have anything cached...

doesXcloudflare · on Feb 24, 2017

archive.is was red, meaning it uses Cloudflare....

www.doesitusecloudflare.com

EvanAnderson · on Feb 24, 2017

The concern isn't that they use Cloudflare. The concern is that they're spidering the Internet, and therefore might be storing cached data that Cloudflare leaked.

tomjakubowski · on Feb 24, 2017

while the internet archive / wayback machine do spider, I think archive.is only archives a site "on demand"

eriknstr · on Feb 24, 2017

Yes but with all the people and even automated 3rd-party scripts making use of archive.is, it is practically a spider.

teh_klev · on Feb 24, 2017

No TLS on this site?

taviso · on Feb 24, 2017

correct

Trundle · on Feb 24, 2017

Have you asked them for an eta on your shirt?

aioprisan · on Feb 24, 2017

You know a company isn't serious about security when their top security bounty is a t-shirt. Instagram has a better policy, for God's sake.

electrum · on Feb 24, 2017

Instagram has been part of Facebook for over four years, so they are covered by the Facebook Bug Bounty: https://www.facebook.com/whitehat

mikey_p · on Feb 24, 2017

I'd love to see some evidence that big bounties correspond to more exploits being found. In my experience, they tend to result in an increasing amount of crap for your security team to sort through.

lclarkmichalek · on Feb 24, 2017

Plenty of companies that are serious about security don't do bounties. They're a real pain to administer apparently

aioprisan · on Feb 24, 2017

I'd expect a company that can MITM a good chunk of the Internet to incur that pain in exchange for all the money customers pay them.

rdl · on Feb 24, 2017

fuck :(

mcphilip · on Feb 24, 2017

Indeed, this is the point in the comment thread where you get the feeling the internet is broken.

gambler · on Feb 24, 2017

What I'm wondering: how many fuckups like this need to happen for website owners to realize that uber-centralization of vital online infrastructure is a bad idea?

But I guess there is really no incentive for anyone in particular to do anything about this, because it provides a kind of perverted safety in numbers. "It's not just our website that had this issue, it's, like, everyone's shared problem." The same principle applies to uber-hosting providers like AWS and Azure, as well as those creepy worldwide CDNs.

Interestingly, it seems this is one of the cases where using a smaller provider with the same issue would really make you better off (relatively speaking) because there would be fewer servers leaking your data.

whitepoplar · on Feb 24, 2017

Cheaply fix DDoS attacks as Cloudflare does and people will move away. It's a big problem and the general consensus is, "just use Cloudflare to fix your DDoS problem!"

loup-vaillant · on Feb 24, 2017

You might as well scrap http entirely, with or without the "s".

The web simply doesn't scale. The only way to fix DDoS reliably is peer-to-peer protocols. Which hardly ever happens because our moronic ISPs believed nobody needed upload. Or even a public IP address.

pyvpx · on Feb 24, 2017

as someone who has been involved in a number of moronic ISP designs, operations, and build outs --- asymmetric access networks are designed that way due to actual traffic patterns and physical medium constraints.

you can argue "if everything was symmetric, then traffic patterns would be different" and you might be right, but that's not how the market went or how the "internet" started.

the client-server paradigm drove traffic patterns, and there was never any market demand or advantage by ignoring it.

loup-vaillant · on Feb 24, 2017

That's not how the market went because the market is often moronic. Case in point: QWERTY. (Why QWERTY is actually the best layout ever is left as an exercise to the occasional extremist libertarian)

Yes, traffic patterns at the time were heavily slanted towards downloads. I know about copper wires and how download and upload limit each other. Still, setting that situation in stone was very limiting. It's a self-fulfilling prophecy.

You don't want to host your server at home because you don't have upload. The ISP sees nobody has servers at home so they conclude nobody needs upload. Peer-to-peer file sharing and distribution is slower than YouTube because nobody has any upload. Therefore everybody uses YouTube, and the ISP concludes nobody uses peer-to-peer distribution networks.

And so on and so forth. It's the same trend that effectively forbade people from sending e-mail from home (they have to ask a big shot provider such as Gmail to do it for them, with MITM spying and advertisement), or that led to the rise of ISP-level NAT, instead of giving everyone a public IPv6 address like they all deserve (including on mobile).

There is a point where you have to realise the internet is increasingly centralised at every level because powerful special interests want it to be that way.

Regulation is what we need. Net neutrality is a start. Next in line should be mandated symmetric bandwidth, no ISP-wide firewall (the local router can have safe default settings), public IP (v4 or v6) for everyone, and no restriction on usage patterns (the ISP should not be allowed to forbid servers). Ultimately, our freedom of expression and freedom of information depends on this. They are messing with human rights.

the8472 · on Feb 25, 2017

> Peer-to-peer file sharing and distribution is slower than YouTube because nobody has any upload.

And because IP multicast doesn't work over the internet. If it did, even if merely to some limited extent, some asymmetries would be far easier to stomach.

jacquesm · on Feb 24, 2017

> you can argue "if everything was symmetric, then traffic patterns would be different" and you might be right, but that's not how the market went or how the "internet" started.

It may not have been how the market went but it definitely was how the internet got started.

evilDagmar · on Feb 25, 2017

You say this as I look at my positively anemic upstream that makes browsing even simple Nagios pages painfully slow, and my ISP that doesn't offer anything substantively better without a massive increase in monthly costs.

The traffic patterns for higher upstream aren't there because they can't be there.

regularfry · on Feb 24, 2017

Decentralisation doesn't do a whole lot better. Just think about MTA or DNS vulnerabilities, for a start.

creshal · on Feb 24, 2017

Or look at how many websites are still vulnerable to Heartbleed.

kordless · on Feb 24, 2017

The Internet will remain periodically broken until we put a cost metric on the breaking (and working) times.

genericpseudo · on Feb 24, 2017

Which means any user who has used any service which uses CloudFlare, right? At least in theory.

biafra · on Feb 24, 2017

How can I find out which services I have accounts with are using cloudflare? Or better have been using cloudflare in recent months? Assume I have a list of domains, where I have accounts.

nikisweeting · on Feb 24, 2017

We're compiling a list of affected domains using several scrapers here:

https://github.com/pirate/sites-using-cloudflare

gabemart · on Feb 24, 2017

I ranked your list of Cloudflare-using domains by their Alexa rank.

Sharing here in case anyone else finds it useful

(warning - it's 1.1MiB gzipped / 2.4MiB uncompressed)

https://polarisedlight.com/tmp/cf_ranked.txt

any domains outside the top 1 million are omitted

dustyfresh · on Feb 24, 2017

Hacked this together to determine which ones out of the list are potentially using cloudflare reverse proxies. You could also send an HTTP request to them and look for the cloudflare-nginx Server header.

https://gist.github.com/dustyfresh/4d8d364ca4c6da465cfc7d817...
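
A minimal sketch of the Server header check mentioned above, in Python. It is not the linked gist. `looks_like_cloudflare` and `fetch_headers` are hypothetical names, and matching on the "cloudflare" prefix rather than the exact "cloudflare-nginx" value is an assumption. As noted elsewhere in the thread, a negative result for one hostname says nothing about a site's other subdomains or API endpoints.

```python
import urllib.request


def looks_like_cloudflare(headers):
    """Return True if response headers suggest a Cloudflare reverse proxy."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    server = lowered.get("server", "").lower()
    # "cloudflare-nginx" is the value mentioned in the thread; accepting any
    # value starting with "cloudflare" is an assumption to tolerate variants.
    return server.startswith("cloudflare")


def fetch_headers(url, timeout=10):
    """Fetch response headers with a HEAD request (hypothetical helper)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())
```

Usage would be something like `looks_like_cloudflare(fetch_headers("https://example.com/"))` for each domain on your list.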

teraflop · on Feb 24, 2017

You can check IP whois records, but it'll be very hard to be 100% sure about any of them. For example, one of the examples from the bug report is Uber, which doesn't use Cloudflare for its home page but apparently does for one of its internal API endpoints.

revi · on Feb 24, 2017

There is a Chrome extension named "claire"[1] which tells you if a site uses CloudFlare or not, but I'm not sure about other browsers (Firefox or otherwise).

[1]: https://chrome.google.com/webstore/detail/claire/fgbpcgddpmj...

photon-torpedo · on Feb 24, 2017

For Firefox, I just made this: https://github.com/traktofon/cf-detect

eropple · on Feb 24, 2017

At this point, I would just start rolling everything. (And I have.)

user5994461 · on Feb 24, 2017

[edit: correction]

rdl · on Feb 24, 2017

No. 3438 domains were configured to expose this, and were potentially queried and logged by a far greater number of people. And yet other data (anything in cloudflare for months) could be exposed.

Potentially huge amounts of stuff might be exposed, but I have some assurances that "the practical impact is low" from someone I trust, so I think it's just a lot of random data. I'd still rotate all credentials which passed through Cloudflare in the past N months (and if I were a big consumer site NOT on Cloudflare, I might change end user passwords anyway, due to re-use), but I don't think it will be the end of the world.

georgemcbay · on Feb 24, 2017

It may seem like a nightmare Internet data security scenario, but it looks like Tavis is going to get a free t-shirt out of the deal, so let's just call it a wash.

tptacek · on Feb 24, 2017

What anomalies would be apparent in your logs if someone malicious had discovered this flaw and used it to generate a large corpus of leaked HTTP content?

aidos · on Feb 24, 2017

That's also what I'm interested in. There's a lot of talk about the sites that had the features enabled that allowed the data to escape, but it's the sites that were co-existing with those that were in danger.

In terms of the caching, knowing the broken sites tells you where to look in the caches after the fact, but do you have any idea of whose data was leaked? Presumably 2 consecutive requests to the same malformed page could/would leak different data.

vmarsy · on Feb 24, 2017

> Presumably 2 consecutive requests to the same malformed page could/would leak different data.

Wouldn't the second request be served from the CDN cache? Since for Cloudflare that particular page is a valid cached page, it would send you that same page on the second request.

artursapek · on Feb 24, 2017

Only if the leaked memory is in the response before the response is cached.

aidos · on Feb 24, 2017

I don't know enough about the layers in the cloudflare system to say. Does it only apply to cached pages? What about https? They would have the ssl termination first and then these errant servers behind that - none of those pages would be cached, right?

markonen · on Feb 24, 2017

Cloudflare doesn't cache HTML pages by default.

beachstartup · on Feb 24, 2017

it seems to me you'd have to know at a minimum:

1. every tag pattern that triggers the bug(s)

2. which broken pages with that pattern were requested at an abnormally high frequency or had an unusually short TTL (or some other useful heuristic)

3. on which servers, and at what time, in order to tell

4. whose data lived on the same servers at the same time as those broken pages

to even begin to estimate the scope of the leak. and that doesn't even help you find who planted the bad seeds.

tptacek · on Feb 24, 2017

Here's a question your blog post doesn't answer but should, right now:

Exactly which search engines and cache providers did you work with to scrub leaked data?

dsp1234 · on Feb 24, 2017

Also, have you worked with any search engine to notify affected customers?

ex: Right now there is an easily found Google cached page with OAuth tokens for a very popular fitness wearable's Android API endpoints

wilde · on Feb 24, 2017

Are you guys planning to release the list so we can all change our passwords on affected services? Or are you planning on letting those services handle the communication?

pepve · on Feb 24, 2017

That list contains domains where the bug was triggered. The information exposed through the bug though can be from any domain that uses Cloudflare.

So: all services that have one or more domains served through Cloudflare may be affected.

The consensus seems to be that no one discovered this before now, and no bad guys have been scraping this leak for valuable data (passwords, OAuth tokens, PII, other secrets). But the data still was saved all over the world in web caches. So the bad guys are now probably after those. Though I don't know how much 'useful' data they would be able to extract, and what the risks for an average internet user are.

ComputerGuru · on Feb 24, 2017

> The consensus seems to be that no one discovered this before now, and no bad guys have been scraping this leak for valuable data (passwords, OAuth tokens, PII, other secrets).

This is literally as bad as it gets; anyone trying to palliate the situation has something to sell you. You'd have to be an idiot to think that $organization (public, private, or shadow) doesn't have automated systems to check for something as stupid simple as this by querying resources at random intervals and searching for artifacts.

Someone found it. Probably more than one someone. Denial won't help.

wilde · on Feb 24, 2017

Ah, gotcha. Thanks for explaining!

jimmaswell · on Feb 24, 2017

Myself and 4 other people I know all happened to get their reddit accounts temporarily locked due to a "possible compromise" in the past week or so, which has never happened to any of us before. Anyone else?

gooeyblob · on Feb 24, 2017

That would be unrelated to this. We haven't taken any action on any accounts because of this issue and have no plans to, as we (reddit.com) were unaffected.

rand77763 · on Feb 24, 2017

Happened to me as well. If it's not related to CloudBleed, can you tell us specifically what happened? It's making me not trust Reddit.

gooeyblob · on Feb 24, 2017

If anything, it should make you trust reddit more! I don't know the exact details as to why your account may have been locked, but generally it will be because we're being proactive and have some signal that your account is using a weak or reused password.

jimmaswell · on Feb 24, 2017

Why was reddit on the list of affected sites, and how do you know reddit wasn't affected?

mirimir · on Feb 24, 2017

My reddit password failed a week ago, and I had to do an email reset. And I use a password manager.

jimmaswell · on Feb 24, 2017

In that case I'm even more inclined to think it might be because of Cloudbleed.

nikisweeting · on Feb 24, 2017

I've compiled a list of 7,385,121 domains that use Cloudflare here: https://github.com/pirate/sites-using-cloudflare

dbmnt · on Feb 24, 2017

This list is misguided. It's just a dump of sites using Cloudflare's DNS, a hugely popular and (mostly) free service. The vulnerability only affected customers using Cloudflare's paid SSL proxy (CDN) service. The latter is a much smaller subset. Even then, only a subset of the SSL proxy users, those with certain options enabled that caused traffic to go through a vulnerable parser, were really impacted. I'm not sure a list as broad as this is helpful.

aidos · on Feb 24, 2017

At least some of this is incorrect. The issue is NOT the pages running through the parser — the issue is the traffic running through the same nginx instance as vulnerable pages.

asdfaoeu · on Feb 25, 2017

You are right in that other sites are affected but only the sites running through the parser would have leaked content in their cached pages.

ThePhysicist · on Feb 24, 2017

This is not correct in my understanding: The sites with certain options enabled produced the erroneous behavior, but the data that would get leaked through this behavior could be from any site that uses Cloudflare SSL (as this requires Cloudflare to tunnel SSL traffic through their servers, decrypt it and re-encrypt it with their wildcard certificate). So if I understand correctly anyone using the (free) Cloudflare SSL service in combination with their DNS is affected.

dbmnt · on Feb 27, 2017

I was wrong about the nature of the proxy issue, but right about DNS-only customers. Customers using only the free DNS service were not impacted by this at all, because traffic never flowed through the proxies.

ThePhysicist · on Feb 27, 2017

Ah yes, sure if you only use DNS then your data never touches a CloudFlare server. Lucky you ;)

nikisweeting · on Feb 24, 2017

(whoops forgot to remove dupes, it's only 4,287,625) https://github.com/pirate/sites-using-cloudflare/raw/master/...

tlrobinson · on Feb 24, 2017

If I'm understanding correctly, that list would include not only the 3,438 domains with content that triggered the bug, but every Cloudflare customer between 2016-09-22 and 2017-02-18.

Xorlev · on Feb 24, 2017

Can we trust it was only those domains?

gog · on Feb 24, 2017

Not really. If a site is using Cloudflare protection for only some of their subdomains they do not show on this list even if the site itself is in the alexa top 10k sites.

And of course all other sites that are not in alexa 10k are not in this list (if they are not on some other lists used, you can see the source of lists in the README of the Github repo).

dbmnt · on Feb 24, 2017

No. Only Cloudflare customers using a subset of features of the SSL proxy service are impacted.

Cloudflare has a lot of customers who only use the free DNS service, for example.

tptacek · on Feb 24, 2017

Careful. It appears that any Cloudflare client who was sending HTTP/S traffic through their proxies is affected. A small subset of their customers had the specific problem that triggered the bug, but once triggered, the bug disclosed secrets from all their web customers.

You're not exposed if you never sent traffic through their proxies; for instance, if you somehow only used them for DNS.

dbmnt · on Feb 27, 2017

I suspect there are a large number of Cloudflare customers that only use their DNS. I have a couple of domains in this category.

The DNS service is essentially free. It's an upgrade from most registrars' built-in DNS. It's a pretty robust solution, really -- global footprint, DNSSEC, fully working IPv6, etc.

My point is, the actual number of impacted customers was much smaller than the entire set of Cloudflare customers. There are lists in this thread that still reference hundreds of thousands (millions?) of sites, and that's just wrong.

(I agree on your first point though; I was confused about the nature of the proxy bug at first).

jameshart · on Feb 24, 2017

What I find remarkable is that the owners of those sites weren't ever aware of this issue. If customers were receiving random chunks of raw nginx memory embedded in pages on my site, I'd probably have heard about it from someone sooner, surely?

I guess there is a long tail of pages on the internet whose primary purpose is to be crawled by google and serve as search landing pages - but again, if I had a bug in the HTML in one of my SEO pages that caused googlebot to see it as full of nonsense, I'd see that in my analytics because a page full of uninitialized nginx memory is not going to be an effective pagerank booster.

chatmasta · on Feb 24, 2017

Perhaps as a follow up to this bug, you can write a temporary rule to log the domain of any http responses with malformed HTML that would have triggered a memory leak. That way you can patch the bug immediately, and observe future traffic to find the domains that were most likely affected by the bug when it was running.

Or is the problem that one domain can trigger the memory leak, and another (unpredictable) domain is the "victim" that has its data dumped from memory?

aidos · on Feb 24, 2017

I believe that's the real issue. Any data from any Cloudflare site may have been leaked. Those domains allow Google etc. to know which pages in their cache may contain leaked info; unfortunately, the info itself could be from any request that's travelled through Cloudflare's servers.

lnanek2 · on Feb 24, 2017

Yes, the victim can be a different site. Cloudflare's post mentions this: "Because Cloudflare operates a large, shared infrastructure an HTTP request to a Cloudflare web site that was vulnerable to this problem could reveal information about an unrelated other Cloudflare site." https://blog.cloudflare.com/incident-report-on-memory-leak-c...

_wmd · on Feb 24, 2017

It shouldn't be too difficult to feed an instrumented copy of the parser some fraction of their cached pages (after all, that's what they're for.. right?) and calculate a percentage of how many triggered e.g. valgrind, or just some magic string tacked on the end of the input appearing in the output or similar

I prefer CloudScare to Cloudbleed :)
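
A rough sketch of the "magic string tacked on the end of the input" idea, in Python. Everything here is hypothetical: `toy_parser` is a deliberately buggy stand-in and not Cloudflare's parser, and the canary value and function names are made up for illustration. The harness places a canary right after the legal input and reports whether it turns up in the output.

```python
CANARY = b"##CANARY-7f3a##"  # arbitrary marker, assumed absent from real pages


def toy_parser(memory, length, bounded=False):
    """Copy the first `length` bytes of `memory`, tag by tag.

    With bounded=False, an unterminated tag is scanned past `length`,
    which is the over-read this harness is meant to detect.
    """
    limit = length if bounded else len(memory)
    out = bytearray()
    i = 0
    while i < length:
        if memory[i:i + 1] == b"<":
            end = memory.find(b">", i, limit)
            if end == -1:
                end = limit - 1  # no closing '>': consume up to the limit
            out += memory[i:end + 1]
            i = end + 1
        else:
            out.append(memory[i])
            i += 1
    return bytes(out)


def leaks_canary(parser, page, canary=CANARY):
    """Return True if the parser's output contains bytes from past the input."""
    memory = page + canary  # canary sits just beyond the legal input
    return canary in parser(memory, len(page))


def leak_rate(parser, pages):
    """Fraction of pages that make the parser read past its input."""
    if not pages:
        return 0.0
    return sum(leaks_canary(parser, page) for page in pages) / len(pages)
```

Run over a sample of cached pages, `leak_rate` would give the percentage estimate described above. It only catches over-reads that end up copied into the output, so it complements a tool like valgrind and does not replace it.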

Natanael_L · on Feb 24, 2017

Downpour is my preference right now. The clouds are dumping everything they got

killing_time · on Feb 24, 2017

How about Cloudburst?

tiger3 · on Feb 24, 2017

CloudBust

Romanulus · on Feb 24, 2017

If only CloudShare wasn't a thing already. :)

JoshTriplett · on Feb 24, 2017

I'd suggest "FlareOut".

Hemospectrum · on Feb 24, 2017

Cloudflush.

midgetjones · on Feb 24, 2017

ShitFest

espadrine · on Feb 24, 2017

It is far from over, too! Google Cache still has loads of sensitive information, a link away!

Look at this, click on the downward arrow, "Cached": https://www.google.com/search?q="CF-Host-Origin-IP:"+"author...

(And then, in Google Cache, "view source", search for "authorization".)

(Various combinations of HTTP headers to search for yield more results.)

PuffinBlue · on Feb 24, 2017

> The infosec team worked to identify URIs in search engine caches that had leaked memory and get them purged. With the help of Google, Yahoo, Bing and others, we found 770 unique URIs that had been cached and which contained leaked memory. Those 770 unique URIs covered 161 unique domains. The leaked memory has been purged with the help of the search engines.

So I tried it too, and there's still data cached there.

Am I misunderstanding something - that above statement must be wrong, surely?

They can't have found everything even in the big search engines if it's still showing up in Google's cache, let alone the infinity other caches around the place.

EDIT: If the Cloudflare team sees this: I see leaked credentials for these domains:

android-cdn-api.fitbit.com

iphone-cdn-client.fitbit.com

api-v2launch.trakt.tv

vengefulduck · on Feb 24, 2017

I'm also seeing a ton from cn-dc1.uber.com with oauth, cookies and even geolocation info. https://webcache.googleusercontent.com/search?q=cache:VlVylT...

sneak · on Feb 24, 2017

That's terrifying.

Thanks to Uber now requiring location services on Always instead of just when hailing a car, my and others' personal location history even outside of Uber usage could have been compromised. Sweet.

jedberg · on Feb 24, 2017

To be fair, you were kind of a fool if you actually let Uber have your location at all times. As soon as they announced that I blocked Uber from my location. I only allow it when I take an Uber (which is almost never now).

sneak · on Feb 24, 2017

Sometimes I'm in a rush and forget to turn it back to Never.

That doesn't make me a fool, it makes me human. Don't be a jerk. It's a dark pattern for a reason.

Dylan16807 · on Feb 25, 2017

If you only sometimes forget, then that's not letting them have your location at all times, and you weren't called a fool.

ejanus · on Feb 24, 2017

Not a fool but ...

Animats · on Feb 24, 2017

At least the location isn't embarrassing.[1]

[1] https://goo.gl/maps/FjQVttcZCpH2

RandomBK · on Feb 24, 2017

Oh my gosh, that's the Ivey Business School, where I graduated from last year. I didn't expect this to hit so close to home...

mattdeboard · on Feb 24, 2017

so sorry for your loss

kmfrk · on Feb 24, 2017

What did it show before it was taken down? In vague terms, of course.

infinity0 · on Feb 24, 2017

Could someone enlighten me on why malloc and free don't automatically zero memory by default?

Someone pointed me to MALLOC_PERTURB_ and I've just run a few test programs with it set - including a stage1 GCC compile, which granted may not be the best test - and it really doesn't dent performance by much. (edit: noticeably, at all, in fact)

People who prefer extreme performance over prudent security should be the ones forced to mess about with extra settings, anyway.

amalcon · on Feb 24, 2017

Some old IBM environments initialized fresh allocations to 0xDEADBEEF, which had the advantage that the result you got from using such memory would (usually) be obviously incorrect. The fact that it was done decades ago is pretty good evidence that it's not about the actual initialization cost: these things cost a lot more back then.

What changed is the paged memory model: modern systems don't actually tie an address to a page of physical RAM until the first time you try to use it (or something else on that page). Initializing the memory on malloc() would "waste" memory in some cases, where the allocation spans multiple pages and you don't end up using the whole thing. Some software assumes this, and would use quite a bit of extra RAM if malloc() automatically wiped memory. It would also tend to chew through your CPU cache, which mattered less in the past because any nontrivial operation already did that.

I personally don't think this is a good enough reason, but it is a little more than just a minor performance issue.

That all being said, while it would likely have helped slightly in this case, it would not solve the problem: active allocations would still be revealed.

masklinn · on Feb 24, 2017

> Some old IBM environments initialized fresh allocations to 0xDEADBEEF, which had the advantage that the result you got from using such memory would (usually) be obviously incorrect.

On BSDs, malloc.conf can still be configured to do that: on OpenBSD, junking (fills allocations with 0xdb and deallocations with 0xdf) is enabled by default on small allocations, "J" will enable it for all allocations. On FreeBSD, "J" will initialise all allocations with 0xa5 and deallocations with 0x5a.
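
To illustrate why junking on free matters, here is a toy allocator in Python. It is a simulation only and not how any real malloc is implemented. `ToyHeap` and `reuse_demo` are hypothetical names, and the 0xdf fill byte is borrowed from the OpenBSD value quoted above.

```python
class ToyHeap:
    """A tiny fixed-size-chunk allocator over one bytearray (illustrative)."""

    def __init__(self, chunks=4, chunk_size=16, junk_on_free=None):
        self.chunk_size = chunk_size
        self.memory = bytearray(chunks * chunk_size)
        self.free_list = list(range(chunks))  # indices of free chunks
        self.junk_on_free = junk_on_free      # fill byte, or None to skip

    def malloc(self):
        index = self.free_list.pop()          # most recently freed chunk first
        return index * self.chunk_size        # "pointer" = offset into memory

    def free(self, ptr):
        if self.junk_on_free is not None:
            fill = bytes([self.junk_on_free]) * self.chunk_size
            self.memory[ptr:ptr + self.chunk_size] = fill
        self.free_list.append(ptr // self.chunk_size)

    def write(self, ptr, data):
        if len(data) > self.chunk_size:
            raise ValueError("data does not fit in one chunk")
        self.memory[ptr:ptr + len(data)] = data

    def read(self, ptr):
        return bytes(self.memory[ptr:ptr + self.chunk_size])


def reuse_demo(junk_on_free=None):
    """Free a chunk holding a secret, reallocate it, and read it back."""
    heap = ToyHeap(junk_on_free=junk_on_free)
    secret = heap.malloc()
    heap.write(secret, b"password=hunter2")
    heap.free(secret)
    fresh = heap.malloc()  # the same chunk comes straight back
    return heap.read(fresh)
```

Without junking, `reuse_demo()` hands the stale secret to the next caller. With `junk_on_free=0xDF` the reused chunk contains only the fill byte. Junking does nothing for allocations that are still live, which is the caveat raised earlier in this subthread.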

magnetic · on Feb 24, 2017

> What changed is the paged memory model: modern systems don't actually tie an address to a page of physical RAM until the first time you try to use it (or something else on that page). Initializing the memory on malloc() would "waste" memory in some cases, where the allocation spans multiple pages and you don't end up using the whole thing. Some software assumes this, and would use quite a bit of extra RAM if malloc() automatically wiped memory. It would also tend to chew through your CPU cache, which mattered less in the past because any nontrivial operation already did that.

Maybe an alternative approach is to simply mark the pages to be lazily zeroed out when attached, in the Page Table Entries of the MMU. They wouldn't be zeroed out at the time of the malloc() call, but only when they are attached to a physical memory location (the first time you use them).

magnetic · on Feb 24, 2017

And it seems to me the OS should ensure the pages are zero'd out rather than user space (via malloc()) doing it, because it's still a security hole to let a process read data that it's not supposed to have access to (whether it's from another process or the kernel - it doesn't matter).

EdiX · on Feb 24, 2017

The OS already zeroes out pages, obviously. But malloc doesn't usually request memory from the OS; it takes a chunk from the already allocated heap.

Gibbon1 · on Feb 24, 2017

Unsure, not my job. But I read stuff along those lines. A modern OS plays all sorts of games to delay doing work. Allocate a couple of megs of memory and the OS sets up some pointers in a page table. And yes it'll keep already zero'd pages handy. And mark pages as dirty to be scraped clean later.

slashdev · on Feb 24, 2017

It doesn't need to affect your CPU cache, because x64 processors have non-temporal writes (streaming stores) that bypass the cache.

The stuff about eagerly allocating pages is spot on though.

There is calloc which allocates and zeroes memory, but people don't use it as often as they should.
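As a rough illustration of the difference, the sketch below allocates a zeroed array both ways. The function names are made up for the example; the point is only that calloc does the zeroing for you, whereas with malloc you have to add the overflow check and the memset yourself.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate n ints, zero-initialized, in one call. */
int *zeroed_ints_calloc(size_t n)
{
    return calloc(n, sizeof(int));
}

/* The malloc equivalent needs an explicit overflow check and a memset. */
int *zeroed_ints_malloc(size_t n)
{
    if (n > (size_t)-1 / sizeof(int))
        return NULL;                  /* n * sizeof(int) would overflow */
    int *p = malloc(n * sizeof(int));
    if (p != NULL)
        memset(p, 0, n * sizeof(int));
    return p;
}
```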

infinity0 · on Feb 24, 2017

Parsers don't usually need to hold onto what they're parsing for a very long time, so unless they were running this in parallel on a machine with 4k cores, I'd imagine it would be much more likely that a buffer overrun hits the middle of an already-freed allocation rather than going into an active one.

In terms of "wasting" memory, perhaps the kernel could detect that you are writing 0s to a COW 0 page and still not actually tie the page to physical RAM. (If you're overwriting non-0 data, well it's already in a physical page.)

I don't quite follow the details of the CPU cache issue and why that is more-than-minor.

I do think in this day and age we should be re-visiting this question seriously in our C standard libraries. If the performance issues are actually major problems for specific systems, the old behaviour could be kept, but after benchmarking to show that it really is a performance problem.

caf · on Feb 24, 2017

> In terms of "wasting" memory, perhaps the kernel could detect that you are writing 0s to a COW 0 page and still not actually tie the page to physical RAM.

Writing to your COW zero page causes a page fault. Now, in theory you could disassemble the executing instruction and if it's some kind of zero write, just bump the instruction pointer and go back to userspace - but then the very next instruction in your loop that zeroes the next 8 bytes will cause the same page fault. And the next. And the next...

Taking a page fault for every 8 bytes in your allocation is completely infeasible. You'd be better off taking the hit of the additional memory usage.

wbl · on Feb 24, 2017

How about this idea: free() zeros or unmaps all memory it allocated. This shouldn't fault. The OS zeros pages when mapping them into the process space (which it should do anyway). I think that solves the problem.

twright0 · on Feb 24, 2017

free() doesn't know what portion of the memory you allocated actually got written to. So for the model where a large, page-spanning buffer is allocated and only a small portion used, this approach causes many unnecessary page faults at free() time as it tries to zero out lots of memory that was never used or paged in at all.

wbl · on Feb 25, 2017

Large buffers just get unmapped, so the OS can fix that problem.

awirth · on Feb 24, 2017

An invariant you get from most kernels is that all new memory pages are zeroed when mapped into processes (normally through mmap or sbrk), so you only have the paging problem when initializing with a value other than zero.
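A small sketch of that invariant, assuming a POSIX-like system where `mmap` supports `MAP_ANONYMOUS` (Linux, the BSDs, macOS). The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Map `len` bytes of fresh anonymous memory; returns NULL on failure. */
unsigned char *map_fresh(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Count the nonzero bytes in a region. */
size_t count_nonzero(const unsigned char *p, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            n++;
    return n;
}
```

On such systems `count_nonzero` should report zero for a region straight out of `map_fresh`. The same is not guaranteed for memory that malloc recycles from its own free lists, which is the distinction made elsewhere in this thread.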

garrettr_ · on Feb 24, 2017

Zeroing on malloc and/or free would not have prevented this type of error, since the information disclosure was due to an overflow into an adjacent allocated buffer.

However, zeroing on free is generally a useful defense-in-depth measure because it can minimize the risk of some types of information disclosure vulnerabilities. If you use grsecurity, this feature is provided by grsecurity's PAX_MEMORY_SANITIZE [0].

[0]: https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity...

roca · on Feb 24, 2017

Zeroing on alloc/free probably wouldn't have helped much with this bug. Data in live allocations would still be leaked.
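To make that concrete, here is a toy model, not Cloudflare's code: a tiny bump allocator over a single array, so the over-read stays inside one object and is well-defined C. All names (`struct arena`, `arena_alloc`, `leaky_copy`) are hypothetical. The arena is zeroed up front, yet a copy that trusts an oversized length still returns the live neighbour's bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SLOT = 16, ARENA = 64 };

/* Toy arena: fixed-size slots handed out back to back. */
struct arena {
    unsigned char bytes[ARENA];
    size_t used;
};

/* Zero the whole arena, standing in for "zero on alloc". */
void arena_init(struct arena *a) { memset(a, 0, sizeof *a); }

/* Hand out the next slot, or NULL when the arena is full. */
unsigned char *arena_alloc(struct arena *a)
{
    if (a->used + SLOT > ARENA)
        return NULL;
    unsigned char *p = a->bytes + a->used;
    a->used += SLOT;
    return p;
}

/* Buggy copy: trusts a length `n` that may exceed the slot `src` lives in.
   It is clamped only to the arena, so it can cross into the next slot. */
size_t leaky_copy(const struct arena *a, const unsigned char *src,
                  size_t n, unsigned char *out)
{
    size_t off = (size_t)(src - a->bytes);
    assert(off <= ARENA);
    if (n > ARENA - off)
        n = ARENA - off;
    memcpy(out, src, n);
    return n;
}
```

If one slot holds a response body and the next holds another request's credentials, asking `leaky_copy` for `2 * SLOT` bytes from the first slot returns both, no matter how thoroughly freed memory was wiped.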

Kalium · on Feb 24, 2017

> Could someone enlighten me on why malloc and free don't automatically zero memory by default?

The computational cost of doing so, I suspect.

Terr_ · on Feb 24, 2017

Just like why most filesystems don't zero deleted files.

infinity0 · on Feb 24, 2017

Neither of these are good reasons: I already talked about MALLOC_PERTURB_ (man mallopt) in my post and my naive performance tests, and we rarely get bad security holes based on data from deleted files left on filesystems.

pcwalton · on Feb 24, 2017

Unfortunately, people write microbenchmarks of malloc and free a lot (and not completely without reason: they do quite often show up high in profiles).

For example, binary-trees on the Benchmarks Game is basically malloc/free bound (or at least is supposed to be as Hans Boehm originally designed it). Likewise, most JavaScript benchmarks (V8 splay, for example) are heavily influenced by raw allocation performance. Many people choose browsers and programming languages based on relatively small differences in these results. All of the incentives align in favor of performance, not security, because performance is easy to measure and security is not.

kjksf · on Feb 24, 2017

You asked for a reason, not for a good reason.

malloc/free were designed around 1972. That was a time when performance was much more important and security concerns didn't really exist.

Modern systems, like Go, do zero-out newly allocated memory because they do consider a bit more security to be more important than a bit more performance.

But changing the defaults of malloc/free is not really an option and it would probably break stuff.

Especially on Linux, where, I believe, malloc returns uncommitted pages, which increases the perf advantage in some cases.

Security conscious programmers can use calloc() or write their own wrappers over malloc/free.
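One possible shape for such a wrapper is sketched below. The names `zmalloc`, `zfree`, `zsize`, and `zwipe` are invented for the example, and it assumes a C11 compiler (for `max_align_t`). It zeroes on allocation via calloc and wipes the payload before handing the block back to free.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of each block so the free side knows the size.
   The union pads the header to the maximum alignment, which keeps the
   pointer handed to the caller suitably aligned. */
union zhdr {
    size_t size;
    max_align_t align;
};

/* Allocate `size` zeroed bytes, remembering the size in a hidden header. */
void *zmalloc(size_t size)
{
    if (size > (size_t)-1 - sizeof(union zhdr))
        return NULL;                       /* header + size would overflow */
    union zhdr *h = calloc(1, sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;
}

/* Payload size of a block returned by zmalloc. */
size_t zsize(const void *p)
{
    return ((const union zhdr *)p - 1)->size;
}

/* Zero the payload. Writing through a volatile pointer discourages the
   compiler from dropping the stores as dead code right before free(). */
void zwipe(void *p)
{
    volatile unsigned char *b = p;
    size_t n = zsize(p);
    for (size_t i = 0; i < n; i++)
        b[i] = 0;
}

/* Wipe, then release. Only valid for pointers obtained from zmalloc. */
void zfree(void *p)
{
    if (p == NULL)
        return;
    zwipe(p);
    free((union zhdr *)p - 1);
}
```

The cost is one header per allocation plus a pass over the payload on each free. As others note in this thread, this only scrubs dead data; it does nothing for an over-read into a live allocation.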

jackmott · on Feb 24, 2017

they aren't good reasons now. They were good reasons ~20 years ago.

language spec should probably now default to zeroing memory unless you specifically ask it not to....and maybe that should be a verbose option :)

nerdponx · on Feb 24, 2017

Are these results hardware independent? Maybe it makes a difference on older machines, or different architectures.

Lxr · on Feb 24, 2017

I imagine clearing memory on free is more relevant than MALLOC_PERTURB_?

ppoint · on Feb 24, 2017

calloc zeroes memory on allocation.

cjbprime · on Feb 24, 2017

Yes, I think the question was something like "why doesn't malloc call calloc?".

ppoint · on Feb 24, 2017

Always nice to have options. Not zeroing memory on allocation might save a few cpu cycles.

earthboundkid · on Feb 24, 2017

It's pretty much the definition of false economy. Would you rather save a few cycles or suffer debilitating security bugs at random intervals? Always use calloc unless a) there's a proven performance problem and b) you know for a fact that due to careful inspection/static analysis/black magic malloc is safe. Then use calloc anyway because why risk it?

ppoint · on Feb 24, 2017

It depends on the size of the chunk of allocated memory. If it is quite large, time spent zeroing it can be substantial. Then again, if you're allocating in performance critical path, you're doing it wrong anyways.

_wldu · on Feb 24, 2017

It takes time to do that.

toyg · on Feb 24, 2017

> that above statement must be wrong, surely?

Either they believe it's right, which means they're not competent enough to really assess the scope of the leak; or they don't believe it, but they went "fuck it, that's the best we can do".

In either case, it doesn't really inspire trust in their service.

blibble · on Feb 25, 2017

you missed one possibility: that they're deliberately attempting to downplay the severity to make themselves look less incompetent

sikhnerd · on Feb 24, 2017

jgrahamc: can you list which public caches you worked with to attempt to address this? It does not inspire confidence when even google is still showing obvious results

eastdakota · on Feb 24, 2017

Google, Microsoft Bing, Yahoo, DDG, Baidu, Yandex, and more. The caches other than Google were quick to clear and we've not been able to find active data on them any longer. We have a team that is continuing to search these and other potential caches online and our support team has been briefed to forward any reports immediately to this team.

I agree it's troubling that Google is taking so long. We were working with them to coordinate disclosure after their caches were cleared. While I am thankful to the Project Zero team for their informing us of the issue quickly, I'm troubled that they went ahead with disclosure before Google crawl team could complete the refresh of their own cache. We have continued to escalate this within Google to get the crawl team to prioritize the clearing of their caches as that is the highest priority remaining remediation step.

taviso · on Feb 24, 2017

Matthew, with all due respect, you don't know what you're talking about.

view-source:http://cc.bingj.com/cache.aspx?q=&d=4857656909960944&w=rj9cg...

view-source:http://cc.bingj.com/cache.aspx?q=&d=4901023173710126&w=n3mEZ...

view-source:http://cc.bingj.com/cache.aspx?q=&d=4558611265887320&w=urwoW...

view-source:http://cc.bingj.com/cache.aspx?q=&d=4592983872701813&w=Ghwdd...

view-source:http://cc.bingj.com/cache.aspx?q=&d=4997243316273666&w=wdpFH...

Not as simple as you thought?

phaed · on Feb 25, 2017

Thousands of years from now, when biological life on this planet is all but extinct and superintelligent AI evolving at incomprehensible rates roam the planet, new pieces of the great PII pollution incident that CloudFlare vomited across the internet are still going to be discovered on a daily basis.

mrich · on Feb 25, 2017

I was expecting this:

Thousands of years from now, when biological life on this planet is all but extinct and superintelligent AI evolving at incomprehensible rates roam the planet, taviso will still be finding 0-days impacting billions of machines on an hourly basis.

Be glad that Google is employing him and not some random intelligence agency.

ar0 · on Feb 25, 2017

I have huge respect for taviso and his team. Their track record in security work is so impressive. They are without a doubt extremely capable.

However, I am always wondering: are they really globally unique in their work and skill? So that they are really the ones finding all the security holes before anyone else does because they are just so much better (and/or with better infrastructure) than anyone else? Or is it more likely that on a global scale there are other teams who at least come close regarding skill and resources, but who are employed by actors less willing to share what they found?

I really do hope Tavis is a once-in-a-lifetime genius when it comes to vulnerability research!

djsumdog · on Feb 25, 2017

One of the big controversies in the infosec world is people who sell 0-day exploits to "security companies." Some go for tens of thousands of dollars. Ranty Ben talked about how some people live off this type of income, when it came up in a panel discussion at Ruxcon 2012.

tuyguntn · on Feb 25, 2017

No he is definitely not alone, some of them work for other security companies, for antivirus companies, some of them are selling found vulnerabilities

kakarot · on Feb 25, 2017

What's funny is he kinda just stumbled upon this bug accidentally while making queries.

If I were just casually googling two weeks ago and came across a leaked cloudflare session in the middle of my search results I think I would have vomited all over my desk immediately. Dude must have been sweating bullets and trembling as he reached out on twitter for a contact, not knowing yet how bad this was or for just how long it's been going on.

patcheudor · on Feb 25, 2017

Also still in Yahoo.

https://search.yahoo.com/search;_ylc=X3oDMTFiN25laTRvBF9TAzI...

http://208.71.46.190/search/srpcache?p=2001%3A56a%3Af651%3A6...

patcheudor · on Feb 25, 2017

Bing and Yahoo with the same cached content. Interesting:

http://208.71.46.190/search/srpcache?p=yOGHqpbGWiXrRAIqLM87w...

http://cc.bingj.com/cache.aspx?q=%22X-SSL-Server-IP+104.16.5...

npongratz · on Feb 25, 2017

I believe the 2009 Yahoo-Bing agreement is still in force, where Bing provides search results on Yahoo.com:

http://news.bbc.co.uk/2/hi/business/8174763.stm

I know the search I performed now on Yahoo states "Powered by Bing™" at the bottom.

patcheudor · on Feb 25, 2017

Yeah, I thought that could be it as well but was at the bottom of the Yahoo result:

Given they are identical results it's pretty clear it must be a shared index I suppose, that or the leaked memory was cached.

jsjohnst · on Feb 25, 2017

Yahoo provides a front end to the search results, Bing provides the crawl/search/archives.

kakarot · on Feb 25, 2017

What the hell does Yahoo even do anymore? Just email? Or is that just a proxy to hotmail?

jsjohnst · on Feb 25, 2017

Finance, News, Mail, Fantasy Sports, etc to name a few where they are still in the top three of the category.

Yahoo was never really a search company (even its founding, it was a "directory", not a "search"). Sure, they pretended fairly well from 2004ish (following their move off Google results) to 2009 (when they did the Bing deal), but the company never really nailed search or more importantly search monetization despite acquiring one of the first great search engines (Altavista) and the actual inventor of the tech Google stole for its cash cow Adwords (Overture).

detaro · on Feb 25, 2017

Isn't Yahoo search just a frontend to bing nowadays?

dorianm · on Feb 25, 2017

Some IPv6 internal connections, some websocket connections to gateway.discord.gg, rewrite rules for fruityfifty.com's AMP pages, and some internal domain `prox96.39.187.9cf-connecting-ip.com`.

And some sketchy internal variables: `log_only_china`, `http_not_in_china`, `baidu_dns_test`, and `better_tor`.

acqq · on Feb 24, 2017

Exactly, it looks like the people doing the cleanup have so far only looked for the most obvious matches (just searching for the Cloudflare unique strings). There's surely more where "only" the user data are leaked and are still in the caches.

tempz · on Feb 25, 2017

The event where one line of buggy code ('==' instead of '>=') creates global consequences, affecting millions, is a great illustration of the perils of monoculture.
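For anyone wondering how a single comparison operator can matter this much, here is a toy scanner, emphatically not Cloudflare's actual parser. It models a cursor that sometimes advances by two, so an equality test for end-of-buffer can be stepped over while an ordered comparison cannot. The buffer lives inside a larger backing array purely so the overrun can be counted without undefined behaviour; `scan` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Walk `buf`, whose logical length is `len` but which sits inside a
   backing array of `cap` bytes. A '<' consumes two bytes, every other
   byte consumes one. Returns how many bytes beyond `len` were visited. */
size_t scan(const char *buf, size_t len, size_t cap, int use_range_check)
{
    size_t i = 0, overrun = 0;
    while (i < cap) {
        /* The one-operator difference: equality vs ordered comparison. */
        int at_end = use_range_check ? (i >= len) : (i == len);
        if (at_end)
            break;
        if (i >= len)
            overrun++;                /* reading data that is not ours */
        i += (buf[i] == '<') ? 2 : 1;
    }
    return overrun;
}
```

With a backing array of "ab<SECRET" and a logical length of 3, the equality version jumps from index 2 to index 4, never sees `i == 3`, and keeps reading into the neighbouring bytes. The ordered version stops immediately.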

And monoculture is the elephant in the room most pretend not to see. The current engineering ideology (it is ideology, not technology) of sycophancy towards big and rich companies, and popular software stacks, is sickening.

loup-vaillant · on Feb 25, 2017

How about clearing all the cache? (Or at least everything created the last few months.)

I've never seen anyone suggest it, so I suppose it cannot or should not be done for some reason?

tuyguntn · on Feb 25, 2017

You are asking for deleting petabytes of data. Some sides are interested in owning such data.

mroi · on Feb 26, 2017

The real problem is going to be where history matters and you can't delete - for example archive.org and httparchive.org. There is no way to reproduce the content in the archive obviously, so no one will be deleting it. The only way is to start a massive (and I mean MASSIVE) sanitization project...

baby · on Feb 25, 2017

or clearing all the cache of Cloudflare's websites. I think that's doable.

tuyguntn · on Feb 25, 2017

At this moment the problem is not on Cloudflare's side. Search engines crawled tons of data with leaked information, so even if Cloudflare drops their caches, the data is already on 3rd party servers (search engines, crawlers, agencies).

fantyoon · on Feb 25, 2017

That's why he asked that the caches of all Cloudflare sites are dropped, not by Cloudflare but by these 3rd parties.

jamaicahest · on Feb 25, 2017

That might work. If said 3rd parties were interested in helping. Most of them might be but it just takes one party refusing to help and then you've still got the data out there.

baby · on Feb 25, 2017

No, I meant: get a list of all domains using Cloudflare, and get those removed from the crawlers' caches.

janwillemb · on Feb 25, 2017

Offtopic: "with all due respect" is often followed by words void of respect.

Symbiote · on Feb 25, 2017

He is British. "With all due respect" means no respect is due. I don't think it's possible to show less respect while appearing polite. In other words, them's fighting words.

http://todayilearned.co.uk/2012/12/04/what-the-british-say-v...

LeoPanthera · on Feb 25, 2017

This is perfectly fine if the amount of respect due is sufficiently low.

77pt77 · on Feb 25, 2017

Given the answers that Cloudflare is giving, I'd say it's quickly approaching zero.

janwillemb · on Feb 25, 2017

Ha! Excellent point!

amenod · on Feb 25, 2017

Incredible. Are they really trying to pin it on Google? Yes, clearing cache would probably remove some part of the information from public sources. But you can never clear all cache world-wide. Nor can you rely that the part that was removed was really removed before being copied elsewhere.

The way I see it, the time given by Project Zero was sufficient to close the loophole; it was not meant to give them a chance to clear caches world-wide. They have a PR disaster on their hands, but blaming Google won't help with it.

Bino · on Feb 25, 2017

You really have to see this to really grasp the severity of the bug.

sah2ed · on Feb 25, 2017

The scope of this is unreal on so many levels.

20 hours since this post and these entries are still up ...

sidcool · on Feb 25, 2017

Can anyone provide some context please ?

Gigablah · on Feb 25, 2017

For anyone being linked directly to the post: the link back to the parent page is right on top: https://news.ycombinator.com/item?id=13718752

You can also click on "parent", and repeat as necessary.

asdfaoeu · on Feb 25, 2017

The bottom of the file has contents from another connection. Notably

    HTTP/1.1
    Host gateway.discord.gg

myth_buster · on Feb 25, 2017

Great(x3) parent https://news.ycombinator.com/item?id=13718752

Gigablah · on Feb 25, 2017

After 16 hours, those cached pages are still up...

nodesocket · on Feb 25, 2017

While it is good that you discovered leaked content is still out in the wild, your tone is somewhat condescending and rude. No need for it.

dcposch · on Feb 25, 2017

You might not know the history here. Tavis works at Google and discovered the bug. He was extremely helpful and has gone out of his way to help Cloudflare do disaster mitigation, working long hours throughout last weekend and this week.

He discovered one of the worst private information leaks in the history of the internet, and for that, he won the highest reward in their bug bounty: a Cloudflare t-shirt.

They also tried to delay disclosure and wouldn't send him drafts of their disclosure blog post, which, when finally published, significantly downplayed the impact of the leak.

Now, here's the CEO of Cloudflare making it sound like Google was somehow being uncooperative, and also claiming that there's no more leaked private information in the Bing caches.

Wrong and wrong. I'd be annoyed, too.

--

Read the full timeline here: https://bugs.chromium.org/p/project-zero/issues/detail?id=11...

baby · on Feb 25, 2017

I think this is a one-sided view of what really happened.

I can see a whole team at Cloudflare panicking, trying to solve the issue, trying to communicate with big crawlers trying to evict all of the bad cache they have while trying to craft a blogpost that would save them from a PR catastrophe.

All the while Taviso is just becoming more and more aggressive to get the story out there. 6 freaking days.

Short timelines for disclosure are not fun.

jgrahamc · on Feb 25, 2017

There was no panic. I was woken at 0126 UTC the day Tavis got in contact. The immediate priority was to shut off the leak, but the larger impact was obvious.

Two questions came to mind: "how do we clean up search engine caches?" (Tavis helped with Google), and "has anyone actively exploited this in the past?"

Internally, I prioritized clean up because we knew that this would become public at some point and I felt we had a duty of care to clean up the mess to protect people.

tlrobinson · on Feb 25, 2017

> "has anyone actively exploited this in the past?"

Has this question been answered yet?

jgrahamc · on Feb 25, 2017

We're continuing to look for any evidence of exploitation. So far I've seen nothing to indicate exploitation.

winteriscoming · on Feb 27, 2017

>> "has anyone actively exploited this in the past?"

Wouldn't your team still have to decide how to deal with this even after some specific well-known caches have been cleared? I mean, there's no guarantee that someone hasn't already collected all this data and won't use it to target those Cloudflare customer sites. Are you planning to ask all your customers to reset all their access credentials and other secrets?

dcposch · on Feb 26, 2017

Google Project Zero has two standard disclosure deadlines: 90 days for normal 0days, and 7 days for vulnerabilities that are actively being exploited or otherwise already victimizing people.

There are very good reasons to enforce clear rules like this.

Cloudbleed obviously falls into the second category.

Legally, there's nothing stopping researchers from simply publishing a vulnerability as soon as they find it. The fact that they give the vendor a heads-up at all is a courtesy to the vendor and to their clients.

baby · on Feb 26, 2017

> The fact that they give the vendor a heads-up at all is a courtesy to the vendor and to their clients.

It is the norm, and it is called responsible disclosure. You're trying to do the least harm, and the least harm is a combination of giving the developers some time to develop a fix and getting the news out there so that customers, and customers of customers, are aware of the issue.

empath75 · on Feb 25, 2017

With all due respect, they should suffer a pr catastrophe.

tedivm · on Feb 25, 2017

In this case I feel your comment is misdirected. Cloudflare was condescending in their own post above, the one he was replying to: "I agree it's troubling that Google is taking so long" is a slap in the face to a team that has had to spend a week cleaning up a mess they didn't make. It is absolutely ridiculous that they are shitting on the team that discovered this bug in the first place, and to top it all off they're shitting all over the community as a whole while they downplay and walk the line between blatantly lying and just plain old misleading people.