
You really want to see Cloudflare spend more time discussing how they've quantified the leak here.

What would you like to see? The SAFE_CHAR logging allowed us to get data on the rate, which is how I got the % of requests figure.




How many different sites? Your team sent a list to Tavis's team. How many entries were on the list?


We identified 3,438 unique domains. I'm not sure if those were all sent to Tavis because we were only sending him things that we wanted purged.


3,438 domains which someone could have queried, but data from any site that had "recently" passed through Cloudflare could potentially be exposed in the response, right? Purging those results helps with search engines, but a hypothetical malicious secret crawler would still potentially have data from any site.


It doesn't have to be a secret crawler. Just one that wasn't contacted by Cloudflare (I didn't see any non-US search providers mentioned).


In other words, Baidu are currently sitting on a treasure trove of keys and passwords.


Possibly not; Baidu and Cloudflare have a well-documented, long-term partnership.


Maybe there's much more to worry about in Baidu's particular not-so-well-documented but longer-term partnerships.


Oh, absolutely. Baidu's relationship with their host nation should be a source of concern for us all. I've heard some interesting and unusual stories.

But they're probably aware of this issue and know enough to go looking to purge their caches.


Or Baidu know enough to not purge their caches. Think of the amount of tangible gratitude that their host nation would show them for access to some potentially tasty information....


Swap Baidu for Google or Microsoft in that sentence and it still has the same problems. Every government three-letter agency has a vested interest in the secrets.


Whether you believe it or not, there is actually a tangible difference between the relationships US corporations have with the USG vs other nations and their corporate entities.


They're not all three letters (e.g. GCHQ, ASIO, CSIS, DGSI, etc.).


It's an expression


Well, purge their public cache, after taking a private dump and supplying it to those who would find value in such a thing.


>"I've heard some interesting and unusual stories."

Do you care to share or elaborate on this?


They're not my stories to share, I'm afraid.


+Yandex


I wonder if archive.org or archive.is have anything cached...


archive.is was red, meaning it uses Cloudflare....

www.doesitusecloudflare.com


The concern isn't that they use Cloudflare. The concern is that they're spidering the Internet, and therefore might be storing cached data that Cloudflare leaked.


While the Internet Archive / Wayback Machine does spider, I think archive.is only archives a site "on demand".


Yes but with all the people and even automated 3rd-party scripts making use of archive.is, it is practically a spider.


No TLS on this site?


Correct.


Have you asked them for an ETA on your shirt?


You know a company isn't serious about security when their top security bounty is a t-shirt. Instagram has a better policy, for God's sake.


Instagram has been part of Facebook for over four years, so they are covered by the Facebook Bug Bounty: https://www.facebook.com/whitehat


I'd love to see some evidence that big bounties correspond to more exploits being found. In my experience, they tend to result in an increasing amount of crap for your security team to sort through.


Plenty of companies that are serious about security don't do bounties. They're a real pain to administer, apparently.


I'd expect a company that can MITM a good chunk of the Internet to incur that pain in exchange for all the money customers pay them.


fuck :(


Indeed, this is the point in the comment thread where you get the feeling the internet is broken.


What I'm wondering: how many fuckups like this need to happen for website owners to realize that uber-centralization of vital online infrastructure is a bad idea?

But I guess there is really no incentive for anyone in particular to do anything about this, because it provides a kind of perverted safety in numbers. "It's not just our website that had this issue, it's, like, everyone's shared problem." The same principle applies to uber-hosting providers like AWS and Azure, as well as those creepy worldwide CDNs.

Interestingly, it seems this is one of the cases where using a smaller provider with the same issue would really make you better off (relatively speaking) because there would be fewer servers leaking your data.


Fix DDoS attacks as cheaply as Cloudflare does and people will move away. It's a big problem, and the general consensus is, "just use Cloudflare to fix your DDoS problem!"


You might as well scrap http entirely, with or without the "s".

The web simply doesn't scale. The only way to fix DDoS reliably is peer-to-peer protocols, which hardly ever happens because our moronic ISPs believed nobody needed upload. Or even a public IP address.


As someone who has been involved in a number of moronic ISP designs, operations, and build-outs: asymmetric access networks are designed that way due to actual traffic patterns and physical-medium constraints.

You can argue "if everything was symmetric, then traffic patterns would be different" and you might be right, but that's not how the market went or how the "internet" started.

The client-server paradigm drove traffic patterns, and there was never any market demand for, or advantage in, ignoring it.


That's not how the market went because the market is often moronic. Case in point: QWERTY. (Why QWERTY is actually the best layout ever is left as an exercise to the occasional extremist libertarian)

Yes, traffic patterns at the time were heavily slanted towards downloads. I know about copper wires and how download and upload limit each other. Still, setting that situation in stone was very limiting. It's a self-fulfilling prophecy.

You don't want to host your server at home because you don't have upload. The ISP sees nobody has servers at home so they conclude nobody needs upload. Peer-to-peer file sharing and distribution is slower than YouTube because nobody has any upload. Therefore everybody uses YouTube, and the ISP concludes nobody uses peer-to-peer distribution networks.

And so on and so forth. It's the same trend that effectively forbade people from sending e-mail from home (they have to ask a big-shot provider such as Gmail to do it for them, with MITM spying and advertising), and the same trend behind the rise of ISP-level NAT instead of giving everyone the public IPv6 address they all deserve (including on mobile).

There is a point where you have to realise the internet is increasingly centralised at every level because powerful special interests want it to be that way.

Regulation is what we need. Net neutrality is a start. Next in line should be mandated symmetric bandwidth, no ISP-wide firewall (the local router can have safe default settings), public IP (v4 or v6) for everyone, and no restriction on usage patterns (the ISP should not be allowed to forbid servers). Ultimately, our freedom of expression and freedom of information depends on this. They are messing with human rights.


> Peer-to-peer file sharing and distribution is slower than YouTube because nobody has any upload.

And because IP multicast doesn't work over the internet. If it did, even if merely to some limited extent, some asymmetries would be far easier to stomach.


> you can argue "if everything was symmetric, then traffic patterns would be different" and you might be right, but that's not how the market went or how the "internet" started.

It may not have been how the market went but it definitely was how the internet got started.


You say this as I look at my positively anemic upstream that makes browsing even simple Nagios pages painfully slow, and my ISP that doesn't offer anything substantively better without a massive increase in monthly costs.

The traffic patterns for higher upstream aren't there because they can't be there.


Decentralisation doesn't do a whole lot better. Just think about MTA or DNS vulnerabilities, for a start.


Or look at how many websites are still vulnerable to Heartbleed.


The Internet will remain periodically broken until we put a cost metric on the breaking (and working) times.


Which means any user who has used any service which uses CloudFlare, right? At least in theory.


How can I find out which of the services I have accounts with are using Cloudflare? Or better, have been using Cloudflare in recent months? Assume I have a list of domains where I have accounts.


We're compiling a list of affected domains using several scrapers here:

https://github.com/pirate/sites-using-cloudflare


I ranked your list of Cloudflare-using domains by their Alexa rank.

Sharing here in case anyone else finds it useful.

(warning - it's 1.1MiB gzipped / 2.4MiB uncompressed)

https://polarisedlight.com/tmp/cf_ranked.txt

Any domains outside the top 1 million are omitted.
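
For anyone who wants to reproduce that ranking, here's a minimal sketch, assuming the historical Alexa top-1m.csv (rank,domain pairs) and the raw domain list saved as plain text (both file names are placeholders):

    import csv

    # Load Alexa's top-1m.csv, one "rank,domain" pair per line.
    rank = {}
    with open("top-1m.csv", newline="") as f:
        for r, domain in csv.reader(f):
            rank[domain] = int(r)

    # Keep only domains inside the top million, ordered by rank.
    with open("sites-using-cloudflare.txt") as f:
        domains = [line.strip() for line in f if line.strip()]

    for r, d in sorted((rank[d], d) for d in set(domains) if d in rank):
        print(r, d)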


I hacked this together to determine which ones on the list are potentially using Cloudflare reverse proxies. You could also send an HTTP request to them and look for the cloudflare-nginx Server header.

https://gist.github.com/dustyfresh/4d8d364ca4c6da465cfc7d817...
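
For anyone who'd rather do the header check directly, a minimal sketch in Python (the cloudflare-nginx value is the one mentioned above; requests is assumed to be installed):

    import requests

    def served_by_cloudflare(domain):
        # Cloudflare's reverse proxies identified themselves with
        # "Server: cloudflare-nginx" at the time of this bug.
        try:
            resp = requests.head("http://" + domain, timeout=5,
                                 allow_redirects=True)
        except requests.RequestException:
            return None  # unreachable, so we can't tell
        return resp.headers.get("Server", "").lower() == "cloudflare-nginx"

Note this only checks whatever hostname you hand it; a site can proxy some subdomains and not others.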


You can check IP whois records, but it'll be very hard to be 100% sure about any of them. For example, one of the examples from the bug report is Uber, which doesn't use Cloudflare for its home page but apparently does for one of its internal API endpoints.
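
Cloudflare does publish its proxy IP ranges (a plain-text CIDR list at https://www.cloudflare.com/ips-v4), so a rough alternative is to resolve each host and test membership. A sketch, with the same caveat about partially proxied sites:

    import ipaddress, socket, urllib.request

    # Cloudflare's published IPv4 proxy ranges, one CIDR per line.
    CIDRS = [ipaddress.ip_network(line) for line in urllib.request.urlopen(
        "https://www.cloudflare.com/ips-v4").read().decode().split()]

    def behind_cloudflare(host):
        # False negatives are possible: only the hostname you resolve is
        # checked, and (as with Uber) only some endpoints may be proxied.
        ip = ipaddress.ip_address(socket.gethostbyname(host))
        return any(ip in net for net in CIDRS)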


There is a Chrome extension named "claire"[1] which tells you if a site uses Cloudflare or not, but I'm not sure about other browsers (Firefox or otherwise).

[1]: https://chrome.google.com/webstore/detail/claire/fgbpcgddpmj...


For Firefox, I just made this: https://github.com/traktofon/cf-detect


At this point, I would just start rolling everything. (And I have.)


[edit: correction]


No. 3,438 domains were configured in a way that exposed this, and were potentially queried and logged by a far greater number of people. And yet other data (anything in Cloudflare for months) could be exposed.

Potentially huge amounts of stuff might be exposed, but I have some assurances that "the practical impact is low" from someone I trust, so I think it's just a lot of random data. I'd still rotate all credentials which passed through Cloudflare in the past N months (and if I were a big consumer site NOT on Cloudflare, I might change end user passwords anyway, due to re-use), but I don't think it will be the end of the world.


It may seem like a nightmare Internet data security scenario, but it looks like Tavis is going to get a free t-shirt out of the deal, so let's just call it a wash.


What anomalies would be apparent in your logs if someone malicious had discovered this flaw and used it to generate a large corpus of leaked HTTP content?


That's also what I'm interested in. There's a lot of talk about the sites that had the features enabled that allowed the data to escape, but it's the sites that were co-existing with those that were in danger.

In terms of the caching, knowing the broken sites tells you where to look in the caches after the fact, but do you have any idea whose data was leaked? Presumably 2 consecutive requests to the same malformed page could/would leak different data.


> Presumably 2 consecutive requests to the same malformed page could/would leak different data.

Wouldn't the second request be served from the CDN cache? Since for Cloudflare that particular page is a valid cached page, it would send you that same page on the second request.


Only if the leaked memory is in the response before the response is cached.


I don't know enough about the layers in the Cloudflare system to say. Does it only apply to cached pages? What about HTTPS? They would have the SSL termination first and then these errant servers behind that - none of those pages would be cached, right?


Cloudflare doesn't cache HTML pages by default.
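
One way to spot-check what Cloudflare's cache did with a particular URL, assuming the CF-Cache-Status response header behaves as it did at the time (HIT/MISS/EXPIRED for cacheable assets, absent for pass-through content such as default-settings HTML):

    import requests

    def cf_cache_status(url):
        # Returns Cloudflare's cache verdict for this URL, or a marker
        # if the header is missing (i.e. the content wasn't cached).
        resp = requests.get(url, timeout=5)
        return resp.headers.get("CF-Cache-Status", "(not cached)")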


It seems to me you'd have to know, at a minimum:

1. every tag pattern that triggers the bug(s)

2. which broken pages with that pattern were requested at an abnormally high frequency or had an unusually short TTL (or some other useful heuristic)

3. on which servers, and at what time, in order to tell

4. whose data lived on the same servers at the same time as those broken pages

to even begin to estimate the scope of the leak. And that doesn't even help you find who planted the bad seeds.


Here's a question your blog post doesn't answer but should, right now:

Exactly which search engines and cache providers did you work with to scrub leaked data?


Also, have you worked with any search engine to notify affected customers?

E.g., right now there is an easily found Google-cached page with OAuth tokens for a very popular fitness wearable's Android API endpoints.


Are you guys planning to release the list so we can all change our passwords on affected services? Or are you planning on letting those services handle the communication?


That list contains domains where the bug was triggered. The information exposed through the bug, though, can be from any domain that uses Cloudflare.

So: all services that have one or more domains served through Cloudflare may be affected.

The consensus seems to be that no one discovered this before now, and no bad guys have been scraping this leak for valuable data (passwords, OAuth tokens, PII, other secrets). But the data was still saved all over the world in web caches. So the bad guys are now probably after those. Though I don't know how much 'useful' data they would be able to extract, and what the risks for an average internet user are.


> The consensus seems to be that no one discovered this before now, and no bad guys have been scraping this leak for valuable data (passwords, OAuth tokens, PII, other secrets).

This is literally as bad as it gets, and anyone trying to play down the situation has something to sell you. You'd have to be an idiot to think that $organization (public, private, or shadow) doesn't have automated systems to check for something as stupid simple as this by querying resources at random intervals and searching for artifacts.

Someone found it. Probably more than one someone. Denial won't help.


Ah, gotcha. Thanks for explaining!


Four other people I know and I all happened to get our reddit accounts temporarily locked due to a "possible compromise" in the past week or so, which has never happened to any of us before. Anyone else?


That would be unrelated to this. We haven't taken any action on any accounts because of this issue and have no plans to, as we (reddit.com) were unaffected.


Happened to me as well. If it's not related to CloudBleed, can you tell us specifically what happened? It's making me not trust Reddit.


If anything, it should make you trust reddit more! I don't know the exact details as to why your account may have been locked, but generally it will be because we're being proactive and have some signal that your account is using a weak or reused password.


Why was reddit on the list of affected sites, and how do you know reddit wasn't affected?


My reddit password failed a week ago, and I had to do an email reset. And I use a password manager.


In that case I'm even more inclined to think it might be because of Cloudbleed.


I've compiled a list of 7,385,121 domains that use Cloudflare here: https://github.com/pirate/sites-using-cloudflare


This list is misguided. It's just a dump of sites using Cloudflare's DNS, a hugely popular and (mostly) free service. The vulnerability only affected customers using Cloudflare's paid SSL proxy (CDN) service. The latter is a much smaller subset. Even then, only a subset of the SSL proxy users, those with certain options enabled that caused traffic to go through a vulnerable parser, were really impacted. I'm not sure a list as broad as this is helpful.


At least some of this is incorrect. The issue is NOT the pages running through the parser — the issue is the traffic running through the same nginx instance as vulnerable pages.


You are right in that other sites are affected, but only the sites running through the parser would have leaked content in their cached pages.


This is not correct in my understanding: the sites with certain options enabled produced the erroneous behavior, but the data that got leaked through this behavior could be from any site that uses Cloudflare SSL (as this requires Cloudflare to tunnel SSL traffic through their servers, decrypt it, and re-encrypt it with their wildcard certificate). So if I understand correctly, anyone using the (free) Cloudflare SSL service in combination with their DNS is affected.
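
One visible symptom of this arrangement, if I have the Universal SSL setup right: free-tier customers are historically served a shared certificate with a cloudflaressl.com common name and many unrelated customer domains packed into the SAN list. A sketch using Python's standard library to pull the leaf certificate:

    import socket, ssl

    def leaf_cert(host):
        ctx = ssl.create_default_context()
        with ctx.wrap_socket(socket.socket(), server_hostname=host) as s:
            s.settimeout(5)
            s.connect((host, 443))
            cert = s.getpeercert()
        # Shared Universal SSL certs show a sni*.cloudflaressl.com common
        # name and other customers' domains in subjectAltName.
        subject = dict(pair[0] for pair in cert["subject"])
        return subject.get("commonName"), cert.get("subjectAltName", ())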


I was wrong about the nature of the proxy issue, but right about DNS-only customers. Customers using only the free DNS service were not impacted by this at all, because traffic never flowed through the proxies.


Ah yes, sure if you only use DNS then your data never touches a CloudFlare server. Lucky you ;)


(whoops forgot to remove dupes, it's only 4,287,625) https://github.com/pirate/sites-using-cloudflare/raw/master/...
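
If anyone is rebuilding this kind of list, deduplication is cheap to do up front; a throwaway sketch (the file name is a placeholder):

    # Read the domain list, drop duplicates, and write it back sorted.
    with open("sites-using-cloudflare.txt") as f:
        unique = sorted({line.strip() for line in f if line.strip()})
    with open("sites-using-cloudflare.txt", "w") as f:
        f.write("\n".join(unique) + "\n")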


If I'm understanding correctly, that list would include not only the 3,438 domains with content that triggered the bug, but every Cloudflare customer between 2016-09-22 and 2017-02-18.


Can we trust it was only those domains?


Not really. If a site is using Cloudflare protection for only some of its subdomains, it does not show on this list even if the site itself is in the Alexa top 10k.

And of course all other sites that are not in the Alexa top 10k are not in this list (unless they appear on one of the other lists used; you can see the sources in the README of the GitHub repo).


No. Only Cloudflare customers using a subset of features of the SSL proxy service are impacted.

Cloudflare has a lot of customers who only use the free DNS service, for example.


Careful. It appears that any Cloudflare client who was sending HTTP/S traffic through their proxies is affected. A small subset of their customers had the specific problem that triggered the bug, but once triggered, the bug disclosed secrets from all their web customers.

You're not exposed if you never sent traffic through their proxies; for instance, if you somehow only used them for DNS.


I suspect there are a large number of Cloudflare customers that only use their DNS. I have a couple of domains in this category.

The DNS service is essentially free. It's an upgrade from most registrars' built-in DNS. It's a pretty robust solution, really -- global footprint, DNSSEC, fully working IPv6, etc.

My point is, the actual number of impacted customers was much smaller than the entire set of Cloudflare customers. There are lists in this thread that still reference hundreds of thousands (millions?) of sites, and that's just wrong.

(I agree on your first point though; I was confused about the nature of the proxy bug at first).


What I find remarkable is that the owners of those sites weren't ever aware of this issue. If customers were receiving random chunks of raw nginx memory embedded in pages on my site, I'd probably have heard about it from someone sooner, surely?

I guess there is a long tail of pages on the internet whose primary purpose is to be crawled by Google and serve as search landing pages - but again, if I had a bug in the HTML in one of my SEO pages that caused Googlebot to see it as full of nonsense, I'd see that in my analytics, because a page full of uninitialized nginx memory is not going to be an effective PageRank booster.


Perhaps as a follow-up to this bug, you can write a temporary rule to log the domain of any HTTP responses with malformed HTML that would have triggered a memory leak. That way you can patch the bug immediately and observe future traffic to find the domains that were most likely affected by the bug when it was live. A rough sketch of what such a rule might look for is below.
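
A minimal sketch, assuming the trigger described in Cloudflare's write-up (a tag whose attribute is left unterminated at the very end of the response body); the regex is only a rough heuristic:

    import re

    # Rough heuristic: the body ends inside a tag whose attribute value
    # never closes, e.g. a trailing '<script type=' with nothing after it.
    UNTERMINATED = re.compile(rb'<[a-zA-Z][^>]*=\s*["\']?\s*$')

    def looks_like_trigger(body: bytes) -> bool:
        # Only the tail matters; the bug fired when the parser ran off
        # the end of the buffer.
        return bool(UNTERMINATED.search(body[-512:]))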

Or is the problem that one domain can trigger the memory leak, and another (unpredictable) domain is the "victim" that has its data dumped from memory?


I believe that's the real issue. Any data from any Cloudflare site may have been leaked. Those domains allow Google etc. to know which pages in their cache may contain leaked info; unfortunately, the info itself could be from any request that travelled through Cloudflare's servers.


Yes, the victim can be a different site. Cloudflare's post mentions this: "Because Cloudflare operates a large, shared infrastructure an HTTP request to a Cloudflare web site that was vulnerable to this problem could reveal information about an unrelated other Cloudflare site." https://blog.cloudflare.com/incident-report-on-memory-leak-c...



