Cloudflare data still in Bing caches (ycombinator.com)
649 points by neonate on Feb 25, 2017 | 246 comments



From the parent thread:

  The caches other than Google were quick to clear and we've not been able to find active data on them any longer.
  ...
  I agree it's troubling that Google is taking so long.
That's really the core issue here - the Cloudflare CEO singled out Google as almost being complicit in making their problem worse whilst that exact issue is prevalent amongst other indexes too.

The leaked information is hard to pinpoint in general, let alone amongst indexes containing billions of pages.

I can understand the frustration - this is a major issue for Cloudflare and it's in everyone's best interests for the cached data to disappear - but it's not easy, and they shouldn't suggest otherwise (or incorrectly claim that "The leaked memory has been purged with the help of the search engines" in their blog post).

This is a burden that Cloudflare has placed on the internet community. Each of those indexes - Google, Microsoft Bing, Yahoo, DDG, Baidu, Yandex, ... - has to fix a complicated problem not of their creation. They don't really have a choice either, given that the leak contains personally identifiable information - it really is a special sort of hell they've unleashed.

Having previously been part of Common Crawl and knowing many people at Internet Archive, I feel personally slighted. I'm sure it's hellish for the commercial indexes above to properly handle this, let alone for non-profits with limited resources.

Flushing everything from a domain isn't a solution - that'd mean deleting history. For Common Crawl or Internet Archive, that's directly against their fundamental purpose.


If Google hadn't noticed and saved them from this bug, who can say how long they might have continued spraying private data into the world's caches? Apparently, this had been going on for months prior. Heartbleed was exposed for years. This could have been too. Worse: malicious attackers could have discovered and quietly exploited it. For years.

The very last people in the universe Cloudflare should be criticizing right now are the Google security team.


Strongly agreed - Google's Project Zero helped them immensely and without them it would have continued to grow worse.

The length of time this went on for was already disastrous. From the Cloudflare blog post, "The earliest date memory could have leaked is 2016-09-22".

As an example of how destructive it might have been beyond leaked information, note that Internet Archive have spent considerable time archiving the election and its aftermath.

donaldjtrump.com is served via Cloudflare.

Chances are it wasn't a domain that was leaking information - though we don't know as there's no publicly accessible list and no way to tell if they had the buffer overrun features active.

If it was, however, the Internet Archive now has a horrible choice on its hands - wipe data from a domain that will be of historical interest for posterity, or try to sanitize the data by removing leaked Cloudflare PII details.

History will already look back at the early digital age with despair - most content will be lost or locked in arcane digital formats. Imagine having to explain that historical context was made worse as humanity deleted it after accidentally mixing PII into random web pages on the internet -_-


One of us misunderstands what happened.

> Chances are it wasn't a domain that was leaking information - though we don't know as there's no publicly accessible list and no way to tell if they had the buffer overrun features active.

As far as my understanding goes, domains that had the features on were leaking the data of other domains. So it's near impossible to tell who was affected.


You understand it right, but I think you get the conclusions wrong. If a site didn't have said features active, information from it could have leaked via other sites that had them on, and could be in archives of those other sites (= so it still should rotate secrets where necessary). But the archive of the page itself can be kept safely, since without the buggy features no leaked data from other sites has been included in it.


Ah, I get it now, thanks for explaining.


The key bit is that your page needs to have been in memory in that process for it to leak. So malformed pages that use the affected services would leak data from other pages that use those services.


Wouldn't this be hard to exploit from an attacker's point of view? I mean there is no way to know what data is currently in RAM; it is at best a blind attack.


If I understand correctly, you could literally keep smashing f5 on an affected page and get a different chunk of memory every time. An attacker could have potentially collected a lot very quickly with a simple script.
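As a rough illustration of how simple that script could be (a hedged sketch only, now that the bug is fixed: the URL is a placeholder and the Python requests library is assumed):

    # Sketch: repeatedly fetch a page known to trigger the bug and keep every
    # distinct response, since each one may carry a different slice of leaked
    # proxy memory. Placeholder URL; assumes the third-party requests library.
    import hashlib
    import requests

    AFFECTED_URL = "https://example.com/page-with-broken-html"  # placeholder

    seen = set()
    for _ in range(1000):
        body = requests.get(AFFECTED_URL).content
        digest = hashlib.sha256(body).hexdigest()
        if digest not in seen:  # a new body may contain new leaked memory
            seen.add(digest)
            with open("capture-" + digest[:12] + ".bin", "wb") as f:
                f.write(body)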


That's, from what I understand, completely correct. Not only that, but because of the nature of the flaw, it's not clear to me that the attackers would be generating any real anomalies by doing so. They're just fetching a particular pattern of otherwise non-notable web pages.


But it's a blind attack that is much more likely to hit big players.

And seeing as cloudflare has some of the biggest pipes on the net, it would be easy to saturate a pipe to gather as much data as possible from any sites you can find that are affected.


Can't they archive the content (but not expose), and then figure out how to filter the exfiltrated data? No history need be lost.


To some degree that's what happens. Internet Archive and Common Crawl use WARC files and deleting an individual entry or set of entries from them can be markedly difficult. It's easiest to mark it with some manner of tombstone for later handling.
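To make the tombstoning idea concrete, a minimal sketch (assuming the warcio library; the leak signatures are illustrative, not an official list):

    # Sketch: scan a WARC file and record the offsets of responses that look
    # contaminated, so they can be tombstoned later instead of rewriting the
    # archive. Assumes the third-party warcio library; signatures are examples.
    from warcio.archiveiterator import ArchiveIterator

    LEAK_SIGNATURES = [b"CF-Int-Brand-ID", b"X-Uber-"]  # illustrative markers

    def find_tombstone_candidates(warc_path):
        candidates = []
        with open(warc_path, "rb") as stream:
            it = ArchiveIterator(stream)
            for record in it:
                if record.rec_type != "response":
                    continue
                body = record.content_stream().read()
                if any(sig in body for sig in LEAK_SIGNATURES):
                    candidates.append({
                        "uri": record.rec_headers.get_header("WARC-Target-URI"),
                        "offset": it.get_record_offset(),  # byte offset of the record
                    })
        return candidates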

The complication comes with the edge cases they'd need to face and @eastdakota's call to "get the crawl team to prioritize the clearing of their caches"[1]. This is also work they now need to do due to Cloudflare's blunder.

For Internet Archive and Common Crawl, these aren't caches, they're historical information. You can't just blow that away - but you also can't serve it if it has PII in it. Either they need to find all the needles and filter/tombstone them - which we'd expect to be very difficult given leaked information is still sitting in Google and Bing's cache now - or wipe/prevent access to the affected domains.

Wiping donaldjtrump.com would be historically painful, but even temporarily blocking access to the domain would be problematic.

Finally, and most importantly, the fact that non-profit projects need to worry about how to excise such information is ridiculous anyway. Excising the information is non-trivial as well and may well be destructive with even a minor bug in processing. Having looked at much of the web, it can be hard to tell what's rubbish and what isn't :)

(Fun example: I had an otherwise high quality Java library for HTML processing crash during a MapReduce job because someone used a phone number as a port (i.e. url:port). The internet is weird. The fact it works at all is a mystery ;))

[1]: https://news.ycombinator.com/item?id=13721644


One way cloudflare could mitigate the damage they caused is by coughing up some money to facilitate non-profits in their cleanup activities on behalf of cloudflare.

Polluter pays.


It seems like google and bing should be able to sue cloudflare for the hours they're spending cleaning up their mess.


They decided to cache the internet; as soon as they decided to copy it to their servers, it became their problem, imho.


Cloudflare seems to be of the opinion they deserve priority treatment. Personally I think Google, Bing and others are mainly doing this to protect the users of Cloudflare powered websites, not Cloudflare.


Well yeah, nothing to do with helping out Cloudflare, but however it got there, random creds are being served to the world from their product. No specific criticism of their efforts to fix it, just that they bear some responsibility to make it inaccessible from their servers as fast as possible, even if it's still cached in a million other spots. Shitty task, but it comes with the territory.


robots.txt


it's not my fault, robots.txt made me do it!


To be fair, access to the archives of donaldjtrump.com could be blocked anyway if the owners at some point decide to add a robots.txt blocking the Wayback Machine.


>>While I am thankful to the Project Zero team for their informing us of the issue quickly, I'm troubled that they went ahead with disclosure before Google crawl team could complete the refresh of their own cache.

Why the help was not 100% appreciated...


Cloudflare is physically incapable of posting a blog that doesn't make them out to be saviors of the internet; the only way they could spin this to make them seem like the hero here is to throw Google under the bus.


Cloudflare: "We've just saved you from Google's incompetence. You're welcome."


After seeing how Cloudflare have handled this, tried to downplay it, and even implied Google is the problem here, I suspect that if this bug hadn't been discovered by a third party and they had found it themselves at some point, they pretty much would have buried the issue as just another "memory leak".


I still can't believe they tried to call this a memory leak. They're obviously trying to downplay the severity of the bug by referring to it that way, but anyone who would know what a memory leak is would also know this isn't one.


Sort of funny that one of CloudFlare's products is a cloud Web Application Firewall. This 'enterprise-class' solution somehow failed for months to detect all sorts of invalid and suspicious content being returned in index.htmls. It even has detection of 'Sensitive data exposure' listed as a feature. https://www.cloudflare.com/waf/


The way it's described makes it sound like the WAF looks solely at the request payload, and not the response.

They mention using the default mod_security rules. I know mod_security can look at the response body, but it is not a default setting.


Funny but totally expected; this kind of feature is always snake oil with dead-simple heuristics.


Sure. But remember when CloudFlare breached an embargo to sell their services on Heartbleed?

They seem to care a lot more about marketing than anything else.

(On edit: which also means they have a fucking cheek whining about disclosure timings.)


I don't remember. Can you point me to an article?


That is utter crap. No such thing happened.


The comment from eastdakota seems to be political in nature; some on here have suggested he has an axe to grind with Google.

Why do humans politicize so much? That's one thing I'll never understand, and one of the reasons why I refuse to become a manager.


Any talk that is about Google's systemic influence or role is political by definition, nothing wrong with political. People and companies have political ideologies and it's good to discuss them.


Discuss them, sure.

But what does it really matter?


Well, it's easy to ignore politics until you're personally affected by it.


Too bad for those who have to suffer under the unjust policies of others that you must surely know exist.


I summarized The Stranger a long time ago, with a remark I admit was highly paradoxical: 'In our society any man who does not weep at his mother's funeral runs the risk of being sentenced to death.' I only meant that the hero of my book is condemned because he does not play the game.


Indeed. The life I live.


'politicize' is just a fancy word for 'dispute'. humans have emotions and we still have mostly reptilian brains.


I think that's a lazy argument. Just because Google does one good thing (finding the vulnerability) doesn't make all their actions good. The affected party (in this case Cloudflare) has an incentive to handle the situation in a way that limits the damage to their customers and their customers' customers (which are the ones affected). The 'security researchers' (in this case Google) don't have that incentive. In many ways they even have the opposite incentive: to disclose while it's still active so they can get larger recognition. The release rules aren't made to limit harm to the world, but to limit harm to Google.

Tavis Ormandy linking to leaked data to settle some personal grudge is a pretty good example of how much they really care about data leaks. I guess when you've already leaked all your customers' data to intelligence agencies around the world while moving it between data centers, it's hard to keep any form of standards.

If you want to actually argue your idea against any sort of competition, I suggest going outside HN.


> The affected party (in this case Cloudflare) has an incentive to handle the situation in a way that limits the damage to their customers and their customers' customers (which are the ones affected).

Cloudflare has a much stronger incentive to handle the situation in a way that limits the damage to their reputation. As you note, researchers also have an incentive to maximize the benefit to their reputation. But the incentives for P0 are much more closely aligned to the interests of the public here than are the incentives for Cloudflare. The information revealed by the bug is indeed really bad, and there's likely no possible way to tell which of Cloudflare's customers were affected, much less those customers' users. No company wants to acknowledge "yeah, a bunch of our customers' data was compromised, and we have no possible way to tell whether your data was among the data compromised". Their incentives push them to downplay the impact; to accurately describe the potential impact to their customers would probably be disastrous for them as a business. In contrast, downplaying the impact would be contrary to P0's incentive to benefit their own reputation.


I'd say it shouldn't be their responsibility to fix it at all - this is on CloudFlare. CloudFlare shouldn't be downplaying the issue to their customers.

They should be telling them to reset passwords and get users to do similar. And I say this as a generally very happy CloudFlare customer.

A few Bitcoin exchanges have realized the seriousness and have already contacted me and told me to just go ahead and enable 2fa, which I did even on empty accounts. One of which is going so far as to revoke and change their SSL certificates even though there's no reason to believe they were at risk.


> They should be telling them to reset passwords and get users to do similar.

Someone correct me if I'm wrong, but:

This isn't just about passwords. Any memory from those proxy servers could have been dumped into these responses, meaning plaintext content of all kinds. The data needs to be purged in addition to telling customers that everything they did in these months has a chance of being out there in plaintext in the hands of random people.


Yes - definitely.

I was just saying passwords because they're something that could have leaked which could be used to actively compromise accounts and result in further attacks now. And probably the only thing most sites can do anything about at this point.

Of course if you were sending private keys, api keys, etc over an SSL connection - those all now need to be treated as exposed.

Some sites may also have special concerns - btc-e for example, a fairly large bitcoin exchange allows users to withdraw to a "btc-e code" which can be deposited by another user of the site. It'd be a serious mess if any of that kind of thing leaked.


Cloudflare f*cked up big time. It's not the responsibility of other third parties to clean up the mess. Especially since it won't happen 100% worldwide anyway; there are too many parties who have indexes (public or private).

In the end Cloudflare should be held responsible. It also puts the whole CDN service business in a bad light.


I never liked CDNs because I don't like monocultures, and the whole 'we decrypt your traffic but we can be trusted' bit defeated the purpose of HTTPS to begin with.


It's convenient for content publishers and reduces infrastructure costs a lot. Just have the authenticated/editing part served directly and the published content through the CDN.


So, retrospectively speaking: Was it worth it?


to us, yes. a large customer signed up and we had a 300x traffic spike overnight, all eaten up by the cdn. the first impression for a startup matters, and them testing the service unannounced could have gone wrong in so many ways.. cloudflare literally saved the day.

the publishing part needs to be designed to leverage caching properly tho. however it's much easier to setup cloudflare+custom rules than a cdn push-on-change service


Except it is the responsibility of the third parties to clean up the mess. Cloudflare has no control over their infrastructure.

Forget the blame game for a minute: there is personally identifiable information of the search engines users sprayed all over their cache. At this point it doesn't matter who put it there, there's a moral responsibility to clean this up, for the greater good.


Yes. And once it is cleaned up, send Cloudflare the bill. And once they go bankrupt, send the executive officers and the shareholders the bill. And once they go bankrupt, make sure they are still held responsible for the damage inflicted and that nobody that incompetent tries to run a major business damaging millions of people’s lives ever again.


And once you have done all of that you will have to nationalize all but the smallest businesses because no one in their right mind will invest in anything any more.

As a result, tax payers will have to foot the bill for all further mistakes.


I guess we will have to agree to disagree on this one. If nginx or Apache did the same, would you send them a bill also? This is a really really shitty mistake. CF hasn't handled it well. But it could have happened to anyone.


You are of course responsible for the tools you use, so if I use nginx and it has a bug, I won’t send them the bill.

A better analogy would be to crash your client’s car into the building of a competitor and then complain if the competitor doesn’t repair that car immediately because it makes you look bad and it is for the greater good for your client to have a working car. This is what happened! Cloudflare took their clients’ cars (their data), crashed them into random buildings all over town (Google, Bing etc.) and now complains that Google and Bing don’t get their act together to repair those cars (remove the private data from the public internet).

If Amazon Prime took your car to do your grocery shopping and then crashed it into Walmart, would you really complain that Walmart can’t send the bill to Amazon Prime for the clean-up?


>Cloudflare has no control over their infrastructure.

It has via robots.txt.


I agree. Also, I have to admit to liking this because of the way Cloudflare has treated Tor users. Feels like karma.


I suspect Google (and others) must have some mechanism to comply with EU 'right to be forgotten' law.

https://www.theguardian.com/technology/2016/feb/11/google-ex...


And it doesn't stop there. I bet the bad guys (or governments) are already crawling and storing the leaked data to their own databases for later analysis.


I bet NSA didn't even have to make any network requests on this one. They either got backdoors in most cache servers or have already duplicated their contents.


I would be shocked if the NSA hasn't been running their own crawlers for years. Because, you know, terrorists may be chatting on pages not allowed by robots.txt. And just because other crawlers respect it, doesn't mean we need to...


Also they don't seem to realize how many people have caches.

Crawling the web is something that a lot of people do these days for everything from SEO (Seomoz and every other major SEO tool probably has a crawl) to web data companies (Email Hunter, SimilarWeb, etc.) to national security agencies.

I'd be surprised if there weren't thousands of cached crawls out there that managed to grab some of this data.


Pretty obnoxious of Cloudflare to act this way in the wake of their own mistake.


This is a burden that Cloudflare has placed on the internet community.

Nitpick: you're referring to the web (i.e. HTTP(S)) rather than the internet (UDP, TCP, IPv4, IPv6).


Is not the web part of the internet?

If I place a brick on the cushion on my couch it is correct to say I have placed a brick on my couch.


I really enjoyed this analogy. Is this a common phrase or did you just come up with it? Why a brick?


It's just the first random object that came into my mind.

It just came to me. I'm glad you liked it. :-)


A spoon, or a book, would work just as well IMO


I've had a fairly high opinion of CF, apart from their Tor handling and bad defaults (Trump's website requires a captcha to view static content.) Yeah I'm uncomfortable with them having so much power, but they seemed like a decent company.

But their response here is embarrassingly bad. They're blaming Google? And totally downplaying the issue. I really didn't expect this from them. Zero self-awareness - or they believe they can just pretend it's not real and it'll go away.


I'll just use their own logic they use against us when we ask them to take down the DDoS attack-for-hire sites they host while they are attacking our servers:

Why does Cloudflare think Google needs to do anything here? It's not illegal. Google shouldn't do anything unless they receive a court order to do so! Why does everyone expect Google to do enforcement here? Google has the right to post this information, it's not illegal to do so, therefore they shouldn't do anything at all about this. Don't you care about freedom of speech? If Google removes this, it creates a slippery slope that will lead to the entire internet being censored.


That's a childish and ignorant argument.

Google removes info from its index and caches all the time. It's not unreasonable for CloudFlare to expect them to remove this data. It's just a matter of scale and difficulty.


I'm not sure if you're playing along or if you entirely missed his point. Of course it's a bad argument, just like when it is used by Cloudflare to protect the DDoS sites they are hosting while they often remove other stuff. That's the hypocrisy he is pointing out.


But it's not hypocritical in the least. CloudFlare is not in the business of removing customers based on site content. Google is. It's entirely reasonable for CloudFlare to expect Google to clean the caches without having to consent to being internet cops.


CloudFlare actually is in that business.

They actively crawl for piracy sites and isolate them all on one IP, they crawl for child porn sites and straight up remove them.

Why not also remove DDoS sites?


Child pornography sites are easy to isolate and remove. There's a database of md5 hashes for images that are considered illegal; if you're a CDN you are likely already calculating the md5 hash of all images passing through your system as part of your caching process.

If you find any site has a large number of illegal md5 hashed images going through it; then just remove the site.
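At its crudest, that screening amounts to something like the following (a sketch only; the banned-hash set is a placeholder, and as noted further down exact hashes are trivially evaded, which is why perceptual hashing is used in practice):

    # Sketch: hash each cached object and compare it against a set of known
    # banned hashes; flag a site once enough of its objects match.
    # The banned set below is a placeholder, not real data.
    import hashlib

    BANNED_MD5S = {"d41d8cd98f00b204e9800998ecf8427e"}  # placeholder entry

    def is_banned(image_bytes: bytes) -> bool:
        return hashlib.md5(image_bytes).hexdigest() in BANNED_MD5S

    def site_has_banned_content(cached_objects, threshold=1) -> bool:
        hits = sum(1 for obj in cached_objects if is_banned(obj))
        return hits >= threshold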

Piracy sites can be isolated by checking for keyword clusters or seeing if they're directly serving torrents, or banned hashed content.

We do something similar at work to sanitise image data---by policy no one actually looks at the content, but if you match against previously banned content for DMCA reasons, we drop your data.

DDoS sites though? How can you tell some site individually is part of a network to DDoS another?


These sites aren't performing DDoS, they're advertising DDoS services/tools. This makes them prime candidates for targeting by other DDoS services, hence the importance of being behind CloudFlare. If you can use keyword clusters to find piracy, DDoS advertisements aren't too far away.

Or they could just remove them when someone points them out to them, which is even harder to explain when you're already working to suppress child pornography and piracy sites using your services.


MD5 hash detection is easily avoided by changing the files by one bit. But if they're using PhotoDNA that's actually quite plausible, and they have my full support (err, I mean, censorship! Slippery slope! Where's the court order?)

Keyword clusters would work just fine for flagging DDoS attack-for-hire content:

https://www.google.com/search?q=ddos+booter

It's extremely obvious what these sites are up to.


They say they are 'legal' and perform 'stress tests', and 'distributed performance analysis' or 'real world testing'.

Granted I'd never use something as shady sounding as ddos.xyz, but there are plenty of legit companies that do the exact same things.

You'd need a bunch of manual review, and even then it'd become a "we think they're shady" instead of a "they're objectively sharing known illegal content" like it is with illegal pornography or copyright content.

Cloudflare understandably doesn't want to get into the business of being a company that manually reviews the internet (in how many languages?), and boots people who don't meet its tests.


> Why not also remove DDoS sites?

Because that hurts their business model. They offer commercial protection against DDoS and as such it is not in their business interest.

(Pretty sure you already knew that, just spelling out the obvious)


Next you'll be arguing that it's in the interest of police departments to secretly fund criminal gangs.

Your argument is really easy to make. It's also inane and pretty damn insulting to CloudFlare.


Neither is Google in the business of removing content; they have to remove it if they get a legal request, or they remove it based on some internal rules.


Agree that it's a shame that it doesn't really feel like they're owning up to how bad it was.

But I wonder if it will just mostly go away. Luckily for cloudflare this is a pretty random sampling of people around the country and world. Unless someone has put together a big data set from the caches and decides to leak it or inform the victims, it seems like most people whose accounts do get taken over from this will have no way to trace it back to this bug.


For sure, there are assholes compiling cache data :(


> The carder forum CVV2Finder claims to have more than 150 million logins from several popular services, including Netflix and Uber.

http://securityaffairs.co/wordpress/56650/data-breach/cloudb...


...no way to trace it back to this bug.

This is a strong argument for simpler systems without multiple third parties as links in the chain.


> Agree that it's a shame that it doesn't really feel like they're owning up to how bad it was.

Do you expect them to close house or ask their clients to leave?


It's not surprising though & it's probably going to keep happening going fwd, and not just at CF. There are only ~10 megacap companies that can afford to hire & retain dedicated hardcore, top-shelf netsec teams to fastidiously audit every production SW module for problems like this one, and proactively rewrite things that look sketchy even if no specific bug has been encountered yet. At most other firms, security teams are still largely reactionary.


I've had trouble finding a competitor that offers the same service with DDoS mitigation, WAF, and CDN for a flat fee. Every other service charges per request and/or by bandwidth. Do you know of any comparable alternatives?


Maybe OVH?

https://www.ovh.ie/cdn/infrastructure/

I know their ddos is pretty good, and it is tiered flat fee, and fairly cheap.


By tiered flat fee, does that mean that if my little website was DDoSed, they'd stop serving traffic once the amount of data I've paid for is used up? I'd be fine with that. Being billed for more than the data I wished to pay for would mean doom.


You can also get a dedicated server from them. I was hit with a DDoS of about 40 Gbps; the filtering kicked in and users were still able to use the website.


NearlyFreeSpeech seems to fit your needs:

https://www.nearlyfreespeech.net/


NFSN, while neat, is not a CDN.


Yes, if you exceed the bandwidth, they revert the DNS back to whatever the defined back end is.


Is there an option to not revert DNS but instead to just temporarily remove the DNS records or something in case one doesn't want the IP addresses of the origin servers revealed?


Not that I can see in the documents, but I imagine you could handle that yourself by not serving traffic if the request doesn't have the appropriate proxy headers.
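A minimal sketch of that check (the header name and secret are assumptions for illustration; note that headers alone are spoofable, so the IP allowlisting mentioned in the reply below is stronger):

    # Sketch: refuse to serve requests that lack a header the proxy/CDN is
    # configured to inject. Header name and secret value are hypothetical.
    from flask import Flask, request, abort

    app = Flask(__name__)
    EXPECTED_PROXY_SECRET = "change-me"  # value the proxy adds to each request

    @app.before_request
    def require_proxy_header():
        if request.headers.get("X-Proxy-Secret") != EXPECTED_PROXY_SECRET:
            abort(403)  # didn't come through the proxy, don't serve it

    @app.route("/")
    def index():
        return "hello"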


I have my origin servers firewalled to allow only traffic from CloudFlare servers and would do the same in case I switched to OVH, but even so it would cause a lot of trouble if the origin server IP addresses were revealed since this would let the attackers target the network I'm on directly.


That link goes to their CDN offering for me. Do you have more info about their ddos protection?


They don't have documentation on how their DDOS protection specifically works with the CDN, but there is this:

https://www.ovh.com/us/anti-ddos/


Regardless of this bug or their business practices, this is why Cloudflare has gotten so big. They have a much better pricing model compared to other CDNs.


Try Incapsula?

Never used them. Had a quote sitting on the table at one point.


> Trump's website requires a captcha to view static content.

I'm not particularly adept at security. Why would this occur?


CloudFlare typically shows a CAPTCHA when a site is accessed by an IP with a bad reputation. This is mainly to block access to spammers and evil crawlers, but the IP addresses used by Tor exit nodes often have bad reputations, as they are used for all manner of things.

For people who do most of their browsing via Tor, it can get annoying to be repeatedly presented with a CAPTCHA.


For jet.com and Trump's site, this would happen on residential connections in Guatemala. Jet.com fixed it after I told them. Most likely they just left CF's bad defaults in place.


It's been pretty entertaining watching taviso's attitude towards CF go from "we trust them" to "dude, you're a tool".

I kind of understand what CF is doing here: they've screwed up, there's no way for them to clean it up, so all they can do now is deflect attention from the magnitude of their screw up by blaming others for not working fast enough in the hope that their fake paper multibillion dollar valuation doesn't take too big a hit.

Still a dick move though. Maybe next time don't use a language without memory safety to parse untrusted input.


> Maybe next time don't use a language without memory safety to parse untrusted input.

Untrusted input is safely parsed by programs written in languages without memory safety all the time. In fact, most language runtimes with memory safety are implemented in languages _without_ memory safety.

What's to criticize here is parsing untrusted input in the same memory space as sensitive information.


How would you rewrite websites to optimize them without parsing untrusted input in the same memory space as sensitive information? The thing you're trying to change (the HTML) can have PII or be otherwise sensitive.

(I used to work on Google's PageSpeed Service, and if it had had the same bug I think we would have been in the same situation as CF is now.)


> The thing you're trying to change (the HTML) can have PII or be otherwise sensitive.

Sure it can. But leaking PII from the thing you're parsing and leaking PII from any other random request isn't the same thing.

I understand the performance implications (and the added effort) of sandboxing the parser, but I'm arguing for it anyway. The mere presence of a man-in-the-middle decrypting HTTPS and pooling cleartext data from many disparate services in a single memory space is already questionable (something for Cloudflare customers - and not Cloudflare itself - to think about) but adding risky features into the mix shouldn't be done without as much isolation as possible.

Let's face it: parsers are about the most likely place for this sort of leakage to happen...


Actually, thinking more, we designed PSS to run in a sandbox specifically because we were parsing untrusted input. But leaking content from one site into responses from other sites would still be possible, because I think we didn't reinitialize the sandbox on every request (way expensive) and each server process handled many sites. Fix that, and then there's still the risk of leaking things between sites via the cache.

It's definitely possible to fix this (new sandbox per request, cache is fragmented by something outside of sandbox control) but I'm not sure the service would make sense economically.


The obvious solution that occurs to me would be to isolate the "email protection" feature to a second process, thus limiting the scope of the catastrophe to only those sites using the feature and not everyone.


But other than that, they didn't even do the most basic testing. I mean, they wrote an HTML parser and didn't test it with malformed input?! It's all over the place on the Internet.


So much this! It's 2017. I get that maybe, _maybe_ at scale you need the fastest code you can get.

But you'd damn well better fuzz-test the crap out of it before it goes anywhere near a network connection.


> Maybe next time don't use a language without memory safety to parse untrusted input

They use Go more than most other companies, so your sentence seems a bit odd.


But the code responsible for the bug was generated C. That they use Go in other applications is irrelevant.


Any huge codebase will have some C at some point. You use Linux? It's in C. Plus they mentioned that they were trying to replace the vulnerable code before that bug happened.


The bug happened in September of last year. Since then we can assume they've been leaking 100k-120k requests per day, unless CloudFlare makes a statement with hard evidence to the contrary. And that assumes no attacker noticed the bug and mined them for private data.


Why is Cloudflare underplaying this issue? All data that transited through Cloudflare from 2016-09-22 to 2017-02-18 should be considered compromised and companies should act accordingly.


>Why is Cloudflare underplaying the issue?

I suspect the random nature of the overflow plaintext spewed out into caches will be difficult to leverage into a statistically significant attack against any particular customer. If CF's bottom line is unlikely to be impacted, why not downplay the issue and refer to it in the past tense?

There may be significant long lasting damage to CF's reputation amongst vulnerability researchers, but that's a tiny subset of the population and statistically insignificant to a company that ~10% of internet traffic flows through.


This all is assuming that no-one has found the vulnerability before. My understanding is that, once you figure out what kind of request causes the overflow, you can pretty much just spam CF with it, getting new garbage data every time. If someone was deliberately doing that, they could have more data than all the indexes combined. And the worst part is that we'll never know.


Does CF redirect all non-HTTPS traffic to HTTPS? If not, the NSA could have passively intercepted tons of leaked data, and all it would take is for one of their analysts - people paid to find stuff like this - to notice a single out-of-place leaked secret and trace it back to CF.


The way I see it, it's not "chances that this individual is compromised are too low to worry about", it's "chances are this individual is compromised, because even though not everybody was compromised, anybody could have been, and we don't know who or to what extent".

We are all familiar with "better safe than sorry", but another no-brainer when it comes to security is "remove all uncertainties".

Not only do Cloudflare's CEO and CTO not seem to operate by these golden rules, but they are spreading misinformation to others about the importance of respecting these basic tenets of security. That shows that they place their profit margin over the customers who give them that margin in the first place.

They do not consider routine corporate security, potential legal backlash to their customers, or the safety of their customers' customers to be the most important thing and that is unacceptable for a company that is basically trying to MITM the internet.


On the other hand, the random nature of the overflow plaintext also means it's perfectly possible for one or several keys to various kingdoms to have fallen into unsuspecting laps. Whether that happened, or how much of it will be discovered by bad actors and to what effect, we cannot really know for sure, but it already says not so great things about those who downplay it.


> There may be significant long lasting damage to CF's reputation amongst vulnerability researchers, but that's a tiny subset of the population and statistically insignificant to a company that ~10% of internet traffic flows through.

Yes, security researchers boycotting CF isn't going to hit their user metrics directly. But what happens when security researchers advise people not to use CF because of their poor security/handling of security?


Their whole business model is basically "funnel all your secure stuff through our servers. We won't compromise it, we promise". It'd be pretty surprising if they didn't try to underplay it.


Their business model also seems to be "get DDOS protection from us by hiding your real IP behind our proxy (while we also hide the IP of the DDOS services to protect against in the first place)".

Aren't they held liable for taking money from those conducting DDOS attacks in the US?


A closely held company with a multibillion dollar valuation underplays a serious threat to their business... Hard to believe!


What do you expect them to do? They're essentially a parasite of the Internet, and always have been.


Rule #1 of breaches: you can't unbreach

At this point if you don't consider all data that was sent or received by CloudFlare during the "weaponized" window compromised, you're lying to yourself.


I briefly touched base with Cloudflare's Product Management and my impression was that they were overconfident and snobbish in every aspect, which is kind of the opposite of what I'd expect from a company like this. Being humble never hurts.


Does Cloudflare have complete logs to rule out that someone noticed this before taviso and used it to massively exfiltrate data by visiting one of the vulnerable sites repeatedly?

If they can't tell, someone may now be sitting on a lot of very juicy data, far beyond what may be left in these caches.


> by visiting one of the vulnerable sites repeatedly

I mean, how could CloudFlare, or anyone, possibly differentiate this from normal scraping/polling/ manual F5 refresh behavior? This sounds like a PhD thesis.

I guess you are asking CloudFlare to quantify the amount of distinct bytes of unauthorized data sent to any particular user agent? But then, any sophisticated attacker would rotate IPs, UA identifiers, and probably even between vulnerable websites, if they had known about this vulnerability.

I don't think it's reasonably possible to rule this out, even with a massive dedication of investigative resources. Like the other commenter said, it's wisest to assume it happened.


It should be possible to show statistical significance in access pattern changes from before, during, and after the window to the sites that were leaking data.
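A very rough sketch of what that test might look like (made-up counts; the 3-sigma threshold is arbitrary, and real forensics would need far more care):

    # Sketch: flag days inside the vulnerability window whose request counts to
    # known-vulnerable resources sit far above the pre-window baseline.
    from statistics import mean, stdev

    def suspicious_days(baseline_counts, window_counts, sigmas=3.0):
        mu, sd = mean(baseline_counts), stdev(baseline_counts)
        return [(day, count) for day, count in window_counts.items()
                if count > mu + sigmas * sd]

    # Example with made-up numbers:
    baseline = [10200, 9800, 10050, 10400, 9900]         # daily hits before the window
    window = {"2017-01-14": 10100, "2017-01-15": 57300}  # daily hits during the window
    print(suspicious_days(baseline, window))             # -> [('2017-01-15', 57300)]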


Yeah, particularly because the specific HTML that causes the problem is known.

If you have perfect information about what resources were requested when, you can look for a spike in queries for vulnerable resources. Once you see that, you know there was an intentional exploit and can start to look at who drove that spike, what was leaked, etc.

The problem is that we're talking about huge amounts of data. I'm skeptical that CF has logs of sufficient length and detail to conduct this analysis, but I have no real knowledge about their forensic capabilities.


But the specific HTML that causes the problem is a common error that can be seen on plenty of pages, and the window which the vulnerability was active for was huge. How could you know that someone is intentionally using that erroring page to exploit the vulnerability?


We should just assume that all entities with both the resources and motivation to discover and exploit massive security vulnerabilities now have a database of every object in their NGINX modules' memory since the leak sprung roughly 4 months ago.

Right now they'll be querying for political blackmail and insider material, then they will turn to Joe Schmoe who only gets his news from the television, where no doubt this will never be discussed. They will effortlessly and listlessly categorize the data largely as an automated process.

If we expect anything less, it's possible we are underestimating our adversaries. Better to overestimate them and feel embarrassed later when you later find out you were just being paranoid.


I haven't seen an attempt by Cloudflare at claiming that this definitely didn't happen. They may still be working on it. It's possible that the question is basically unanswerable even with logs.

As you say, in the presence of uncertainty it's most prudent to assume that this actually happened.


They seem to be presenting some dubious calculations made to imply that it was highly unlikely to happen.

The reason why I consider them dubious is that anyone simply searching for the name of some HTTP headers in Google et al. could have stumbled into this. I don't find it at all unlikely to happen in a timespan of 5 months.


The odds that Google had the first team of researchers to trip over the bug are low. But we know that they were the first team to disclose the vulnerability, and the only reason not to disclose it is if you wanted to exploit it.

So the key question really isn't "how likely was someone to find this", but "how likely is it that Project Zero was the first". I think it's hard to estimate odds, but I'd be surprised if it was even as high as 50%; there's too many teams, individuals, freelancers, state actors, etc. actively engaged in looking for this kind of thing.


Many people probably tripped over the bug but didn't know what it was.

The data it reveals isn't guaranteed to be obviously private and exploitable. It can just look like a valid but useless response, or an invalid and corrupted response, depending on what you were looking for in the first place.


At this point we can mostly disregard what they claim did or did not happen, considering that they also claimed that all leaked memory had been purged from search engine caches.


I really hope people don't lose sight of how helpful Project Zero has been in finding ongoing vulnerabilities and making the Internet a better place.

There is a bit of tension between cloudflare and taviso over the timing of notification, but that is vanishingly insignificant overall.


Just please tell me the people who found the issue got their free t-shirts.


I was hoping someone would sell a neat commemorative T-shirt like they had for Heartbleed...


T-shirts. He only got one.


...and it had half of a customer logo on it, along the bottom edge.


Cloudflare's email to customers has been calling this a "memory leak", which means something entirely different than a "secret data disclosure".

One causes swapping. The other causes a month of extra work.


I'm compiling a list of affected domains (with data found in the wild): http://doma.io/2017/02/24/list-of-affected-cloudbleed-domain...

If you find some samples with domain names / unique identifiers of domains (e.g. X-Uber-...) you are welcome to contribute to the list: https://github.com/Dorian/doma/blob/master/_data/cloudbleed....



A list of Cloudflare DNS customers is almost completely unrelated to the list of sites affected by this bug.

Being a DNS customer doesn't mean you're using CF SSL proxy and using the proxy service doesn't mean you're using DNS


That's beyond flawed because he assumes any site that uses cloudflare's NS is using the proxied services. Which is 100% utterly _wrong_.


But he explained this at the very top of the readme?

>This list contains all domains that use Cloudflare DNS, not just the Cloudflare proxy (the affected service that leaked data). It's a broad sweeping list that includes everything. Just because a domain is on the list does not mean the site is compromised, and sites may be compromised that do not appear on this list.


Originally that wasn't the case, iirc, a bunch of angry people complained to get it added.


What is the correct way?

Honest question...

Out of hundreds of passwords I potentially need to reset, I'd like to prioritize.


Passwords and domain lists aren't a good answer. Explained here: https://gist.github.com/raggi/0d22757fee6eff4bb93a5731215060...


You have to resolve the DNS and see if it points to one of Cloudflare's reverse proxies.
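Roughly like this (a sketch using only the standard library; the CIDR list is a trimmed, illustrative subset of https://www.cloudflare.com/ips/ and should be fetched fresh in practice):

    # Sketch: resolve a domain and check whether its addresses fall inside
    # Cloudflare's published proxy ranges. Trimmed IPv4 range list for brevity.
    import ipaddress
    import socket

    CLOUDFLARE_RANGES = [ipaddress.ip_network(c) for c in (
        "173.245.48.0/20", "103.21.244.0/22", "141.101.64.0/18",
        "108.162.192.0/18", "104.16.0.0/12", "162.158.0.0/15",
    )]

    def behind_cloudflare_proxy(domain):
        infos = socket.getaddrinfo(domain, 443, proto=socket.IPPROTO_TCP)
        addrs = {info[4][0] for info in infos}
        return any(ipaddress.ip_address(a) in net
                   for a in addrs if ":" not in a  # IPv4 ranges only in this sketch
                   for net in CLOUDFLARE_RANGES)

    print(behind_cloudflare_proxy("example.com"))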


And even then, no guarantee. One could host a static shopwindow site on his own, and use CF for the actual backend of a mobile app under a different domain that nobody knows about.

There is no real way to know what has leaked and from whom. The only ones with real info are CF, and it's clear from the number of sites they've missed in their purge requests that even they don't really know.


Yup, I started it because he didn't merge my pull request :D https://github.com/pirate/sites-using-cloudflare/pull/55


It seems that, due to Cloudflare's confusing disclosure, it's still not clear what leaked and how. What I personally observed, just by following the discussion and the links to some examples:

- there is a smaller number of sites that used some of the special features of Cloudflare that allowed leakage for some months, according to what Cloudflare said.

- it seems the number of the sites was much bigger for some days, according to what Cloudflare said.

- the data leaked is the data passed through Cloudflare's TLS man-in-the-middle servers -- specifically not only the data from the companies, but the data from the users, and not only the data related to the sites through which the leak happened, but also other sites that just happened to pass through these servers. Again, the visitors' data too; both directions leaked. From the visitors: their location data, their login data etc. As an example: if you imagine a bank which used Cloudflare TLS, the caches could contain both the reports of the money in the accounts (sent from the bank to the customers) and the login data of the customers (sent by the customers to the bank), even if the bank site hadn't had the "special features" turned on. That's what I was able to see myself in the caches (not for any bank, at least, but the equivalent traffic).


This is a good reading of it.

To be clear, the SSL and caches are isolated from the process that handles transformations of web pages and neither of those leaked anything.

All traffic that is "orange clouded" passes through the transformation layer and may have leaked by any of the pages on the sites that had this unique set of features enabled (the cause) and also had broken HTML (the trigger) if they happened to be in the memory immediately after the broken HTML.

Which means that a small number of sites (3,438 domains - cite jgc) were able to leak the first bit of memory for requests located in memory after the page request of a broken page on one of those small number of sites... and this other memory could have been any other page that is proxied by Cloudflare.

Is it huge? Absolutely, because the leaked pages could have contained anything, especially in the headers which will have been included.

Is it a lot of pages? The scale of Cloudflare means no matter how small the fraction affected it adds up, so yes. The sum of pages that will have leaked data is horrifying because even a single page is a page too many to leak.

Are you a customer, are you paranoid and want to know what to do? OK, then change your origin server IP addresses, and expire your user sessions/cookies. Beyond this, you will need to look at your own web application to determine whether in the first bit of a response from your origin servers you include sensitive data, and from that what you feel is an appropriate action.

The only thing I'm doing to my sites is working through an expiry of user sessions. Even then, I think the chances that I was affected remain vanishingly small but expiring sessions is the responsible thing for me to do.

Note: I work at Cloudflare but wasn't involved in this security incident beyond helping to find data in caches. Additionally I run 300+ websites that are all behind Cloudflare web proxy so I understand that perspective extremely well.


> were able to leak the first bit of memory for requests

It's some kilobytes of browser requests or server responses that are leaked in the samples I have seen, if I remember correctly. Much more than "the first bit."

Just to be clear.


Yes, apologies for my phrasing... by "the first bit" I didn't mean a computer bit, I meant the human description of the first part of a web response (headers and body).

To be very precise, I think jgc mentioned that up to 4KB from the bounds of the initial request could have been leaked, where a good section of that was the internal server-to-server communication certs, the raw headers as visible during the internal processing of the request, and then part of the response body that follows... this may have been encrypted or compressed and could appear as garbage.

The focus for site owners on Cloudflare should be on "What do I put in headers that may be sensitive?" or "What URLs do I regard as being secret/unadvertised?".

Typically that will be session cookies and access_tokens. Hence my advice, expire and roll all sessions.

Headers include the Cloudflare internal headers, and so includes origin IP addresses too, so if those are secret for you (i.e. you have previously been the target of a DoS and are using Cloudflare to hide those IPs) then you'll want to change your origin IP addresses too. Though if you have been the target of a DoS then you probably should use iptables to only allow web traffic from Cloudflare IP addresses.


Millions of domains are on Cloudflare. We can't tell how many of them were affected.

Either we can search for obvious strings like X-Uber-* and try to scrub them one by one, or we can just nuke the caches for all the domains that turned on the problematic features (Scrape Shield, etc.) anytime between last September and last weekend. Cloudflare should supply the full list to all the known search engines including the Internet Archive. Anything less than that is gross negligence.

If Cloudflare doesn't want to (or cannot) supply the full list of affected domains, an alternative would be to nuke the caches for all the domains that resolved to a Cloudflare IP [1] anytime between last September and last weekend. I'm pretty sure that Google and Bing can compile this information from their records. They might also be able to tell, even without Cloudflare's cooperation, which of those websites used the problematic features.

[1] https://www.cloudflare.com/ips/


Nuking the caches is one thing, but what about services like the Internet Archive whose job it is to hang on to these pages? Pages with leaked data are clearly difficult to identify; removing the leaked data without nuking the document may be impossible, at least in an automated fashion. Are we supposed to erase five months of history from the affected domains?

This CloudFlare breach seems to have put a lot of people in a tough spot, but it feels like it's put archivists in an impossible position.


I agree that nuking entire domains would be bad for the Internet Archive. But I don't think it would be overwhelmingly difficult, nor controversial, to identify and remove the vast majority of "contaminated" documents. This applies to the Internet Archive as well as major search engines.

First, we're talking about raw memory pages, not merely malformed HTML. Those memory pages might contain valid HTML, but most of the sensitive information is in the headers, not HTML markup. It won't be very difficult to write a script to identify documents where random headers and POST data have been inserted where they don't belong, or where the markup is so obviously invalid (even compared to similar documents from the same site) that there is a high probability of contamination. Having a full list of contaminated domains would obviously help a lot, because we'll only have to deal with thousands of domains instead of millions.
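For what it's worth, a sketch of that kind of heuristic scan (the patterns are illustrative, not an exhaustive or authoritative list):

    # Sketch: flag cached documents that contain raw proxy headers or request
    # internals where page markup should be. Patterns are examples only.
    import re

    LEAK_PATTERNS = [
        re.compile(rb"\bCF-[A-Za-z0-9-]+:", re.IGNORECASE),       # internal CF-* headers
        re.compile(rb"\b(Cookie|Authorization|Set-Cookie):", re.IGNORECASE),
        re.compile(rb"\b(GET|POST) /[^ ]* HTTP/1\.[01]"),         # raw request lines in a body
    ]

    def looks_contaminated(cached_html: bytes) -> bool:
        return any(p.search(cached_html) for p in LEAK_PATTERNS)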

Second, contaminated documents by definition contain information that is NOT what the publisher intended to be crawled, indexed, or archived. So there should be less resistance to removing them.

Finally, most of the contaminated domains used features such as Scrape Shield that were intended to deter archival. It's as if the domain had a robots.txt that said "User-agent:* Disallow:/". I'm not sure whether it's even possible for the Internet Archive to archive such domains. If they can, maybe they've been doing it against the publisher's wishes. If they can't, well, there's no problem to begin with.


Archives don't delete stuff, nor do they have much capacity to do much computation on their archived data. Whereas if blekko still existed as a search engine, I'd just push code to refuse to show cached pages or snippets containing text that likely means the CloudFlare problem. 15 minutes work, and the underlying data would expire fully in a couple of months.

So I would completely disagree with your speculation about what's easy or hard. (Note that I've worked at a search engine and an archive.)


At the very least that'd cost Internet Archive a lot of human resources. I think it's reasonable to assume that if Cloudflare wants them to fix the problems they themselves created, they should cover their costs of such operation.


My understanding is that scrape shield protects against certain abusive patterns of robot access, but not all URLs on the domain would ban robots. So archive may have many contaminated pages that scrape shield was enabled for but did not ban all robots from crawling based on various criteria.


Good time to shoehorn in some forced DMCA content removal


Yeah, abusing the DMCA because it's the only tool you can think of, that's a great idea.


Come to think of it, our society grants search engines the privilege of keeping copies of copyrighted material in exchange for the services they provide.

It might not be unreasonable to say that this privilege comes with a certain responsibility to ensure that those copies do not cause excessive harm to others.

So although Cloudflare is the one that fucked up, Google et al. also have a responsibility to do whatever they can to protect the public. They should do what they can, with or without Cloudflare's cooperation.


I don't believe so. Cloudflare had this responsibility and they messed up. They are the only ones liable.


Cloudflare cannot delete documents from other company's caches. The actual deletion must be performed by Google and Bing, who can then sue Cloudflare for cleanup costs if the latter is unwilling to cooperate.

When there's an oil spill, we don't wait for the oil company to come and clean up their own mess. Others clean it up a.s.a.p. and (ideally) then make the oil company pay the fines and damages. CloudBleed is a virtual oil spill. They literally sprayed other people's private data all over the internet.


Yep, it's a Section 230 for the search engine, probably -- the unfortunate data came from Cloudflare.


Why can't they just search for pages with data after the closing HTML tag?


That isn't uncommon enough. Searching for one of the CF-* HTTP headers in the source, however, would work.


we can just nuke the caches for all the domains that turned on the problematic features (Scrape Shield, etc.)

No, IIUC it's MUCH worse than that: a single nginx process would serve many different domains, and you would only need one of them with the problematic features turned on to trigger the bug, and potentially leak data from all other domains served by that same process (whether those other domains had the feature enabled or not)

I'm not familiar enough with CF's infrastructure to make an educated guess about the way they shard domains across their nginx fleet (assuming this is what they do), but in order to be able to produce a list of all "tainted" domains that were being served together on the same nginx process, at any point in time since September 2016, well their nginx config generation process had better be perfectly deterministic (and any changes thoroughly logged over time). Also, the bigger the shards, the more neighboring domains are affected by a single "poisoned" one, and the more widespread the problem is.

My guess is that if they had had a way to reliably piece together that information, they would have released a list by now.
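
If such records did exist, the computation itself would be trivial; the hard part is having a trustworthy log of which domains shared a process and when. A sketch over an entirely invented record format (nothing here reflects Cloudflare's actual config data):

    from collections import defaultdict

    # Hypothetical records: (shard_id, domain, buggy_feature_enabled),
    # covering every configuration state since 2016-09-22.
    def tainted_domains(records):
        domains_by_shard = defaultdict(set)
        poisoned_shards = set()
        for shard, domain, buggy in records:
            domains_by_shard[shard].add(domain)
            if buggy:
                poisoned_shards.add(shard)
        # Any domain that ever shared a shard with a "poisoned" domain could
        # have had its traffic leaked into someone else's pages.
        tainted = set()
        for shard in poisoned_shards:
            tainted |= domains_by_shard[shard]
        return tainted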


> My guess is that if they had had a way to reliably piece together that information, they would have released a list by now.

Or they've got that list - and it's so vast and scary that legal and/or the board refuse to even acknowledge its existence…


After reading this, I'm considering switching away from Cloudflare for my DNS. Can anyone recommend a similar free service?


https://dns.he.net/ works for me, though your requirements may be different.


Highly recommended by me. Free (up to 50 domains) and ns[2-5] are anycasted.


DNS-only customers were totally unaffected by this web proxy bug.


Sure, but maybe he doesn't want to do business with Cloudflare anymore?


Right. The bug doesn't affect me, but I don't like how they responded to this.


Actually I just realized that although I only really wanted Cloudflare for DNS, the Cloudflare HTTP proxy is on by default. I had turned it off on some of my sites but not all.


I didn't look too hard, but I didn't see any similar free services with full DNS management and an API for cache invalidation. Would love a suggestion here too!


Why not just pay for something like Route53? It's very cheap.


Isn't that an AWS thing?

AWS? Cheap? Really?


Route53 might as well be free. Yes it's AWS, but it's costing me less than a dollar a month on a domain that has trended here several times.


AWS isn't cheap or expensive. It's a bundle of services and you're free to use any combination of them. Route53 can be used independently and yes it's very cheap.


I use Route 53 myself and it's under $3 for several domains. Google Cloud DNS is even cheaper.
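
For a rough sense of scale, at what I believe were the published rates at the time (around $0.50 per hosted zone per month plus $0.40 per million standard queries): five low-traffic zones come to about 5 × $0.50 plus maybe $0.10 in queries, so roughly $2.60/month. Check current pricing before relying on those numbers.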


It's NOT free; you are paying with your customers'/visitors' privacy.


IANAL --- what, if any, legal precedent or structure is there for what happens to CF if, say, 1.5 billion users are hacked and money shifts dramatically as a result, or some other reasonably thinkable "hypothetical" that we, the Internet-at-Large, at this point have no way of knowing has or hasn't already happened? I'm saying, there's got to be negligence charges or something if money is lost... that's how capitalism in America works... but this is a global problem.

If this is how 2017 is pacing, we've got a long year ahead. This is an insanely interesting time to be alive, let alone at the forefront of the INTERNET.

Fellow Hackers, I wish you all the best 2017 possible.


I lost all respect for Cloudflare


eastdakota 19 hours ago [-] (Cloudflare CEO)

>Google, Microsoft Bing, Yahoo, DDG, Baidu, Yandex, and more. The caches other than Google were quick to clear and we've not been able to find active data on them any longer. We have a team that is continuing to search these and other potential caches online and our support team has been briefed to forward any reports immediately to this team.

>I agree it's troubling that Google is taking so long. We were working with them to coordinate disclosure after their caches were cleared. While I am thankful to the Project Zero team for their informing us of the issue quickly, I'm troubled that they went ahead with disclosure before Google crawl team could complete the refresh of their own cache. We have continued to escalate this within Google to get the crawl team to prioritize the clearing of their caches as that is the highest priority remaining remediation step.

taviso 6 hours ago [-] Tavis Ormandy

>Matthew, with all due respect, you don't know what you're talking about.

>[Bunch of Bing Links]

>Not as simple as you thought?


Cloudflare's judgement of the situation is obviously compromised. I'm not saying this much thought went into Google's disclosure, but they were 100% on point in disclosing. There is no way to purge this data from the entire net; it was important for everyone to know what happened as soon as the leak was plugged.


If anyone wants to, they can access (cached/archived) pages from any number of services listed here: https://en.wikipedia.org/wiki/List_of_Web_archiving_initiati...

My personal favorites are:

- https://archive.fo

- https://archive.org/web/web.php

- https://historio.us

- https://timetravel.mementoweb.org


I have a question which might be stupid.

For sites using Full SSL (one certificate between Cloudflare and the user and another between Cloudflare and the server), could any information from SSL pages have been leaked?


My understanding is yes - if the HTTPS page was in CF memory (for decrypting from the server and re-encrypting for the user), its contents could have been dumped into responses from one of the affected sites and end up in a cache.


To be even more clear: if the CF servers had any plaintext in memory, that plaintext might have been compromised. Plaintext could be anything from HTML pages to GET and POST request data (containing auth keys, passwords, JSON blobs, etc.)

If CF decrypts information you send to it before encrypting it to your users, that's a step where plaintext might have been present in their servers' memory.

(Please correct me if I'm wrong.)


According to CF, certificates and their private keys are handled separately from the proxy processes that leaked, so the certificates themselves were not exposed.


Right. Clear text data was exposed, not the certificate itself. But who needs a certificate when you have the cleartext?


Yes. Encrypted traffic is unique to every TLS session so when data is received from the origin, it must be decrypted (then stored and cached by the server) and then re-encrypted for every user that CF sends the data to. This is the whole reason why all CDNs are a MITM for secure connections.

In this case, certain CF features when enabled would transform the HTML that was sent to users, and sometimes it would include random bits of the server's memory in the output. These bits of memory could include anything like private data from other requests and responses being handled by the server.

This is very similar to the openssl heartbleed issue which also exposed random bits of the server's memory under certain conditions.
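
The reported root cause was an end-of-buffer check in the generated parser code that used equality, so a malformed page could push the cursor past the end of its buffer and keep copying whatever sat next to it in memory. Below is a toy Python model of that class of bug; it's a simulation for illustration only (the real code was generated C inside nginx), and every name in it is made up:

    # One process's memory, holding several requests' plaintext back to back.
    arena = bytearray()

    def add(data: bytes):
        start = len(arena)
        arena.extend(data)
        return start, len(arena)             # (begin, end) of this buffer

    page_begin, page_end = add(b"<html><p>hello</p><img src=")  # malformed: final tag never closed
    req_begin, req_end = add(
        b"GET /inbox?msg=<42> HTTP/1.1\r\nCookie: session=SECRET\r\n")

    def rewrite(begin, end):
        """Toy 'HTML rewriter': copies text, skipping over tags. The
        end-of-buffer test is an equality check (p != end) rather than a
        bound check (p >= end), mimicking the reported bug."""
        out, p = bytearray(), begin
        while p != end and p < len(arena):    # len() guard only so the demo halts
            if arena[p] == ord("<"):
                close = arena.find(b">", p)   # an unterminated tag at the end of
                p = close + 1 if close != -1 else p + 1  # the buffer jumps past `end`
                continue
            out.append(arena[p])
            p += 1
        return bytes(out)

    print(rewrite(page_begin, page_end))
    # b'hello HTTP/1.1\r\nCookie: session=SECRET\r\n' -- the neighbouring
    # request's cookie leaks into the rewritten page.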


Full SSL just means there is SSL between your server and Cloudflare. CF still terminates the user's SSL at their edge.


Hence the joke that they are MITMing half the internet... or perhaps it's not a joke :)


Also still in Yahoo caches with the same leaks found in both Yahoo and Bing. I posted the URLs to the linked thread.


Can someone explain why Cloudflare parses the HTML in the first place?

Is there some sort of information extraction feature service or something they offer? I don't get it.


HTML, CSS, and JS optimization services. Also email obfuscation and Scrape Shield.


It's an optional thing. Unfortunately when enough of those features were active, they started leaking uninitialized memory which included lots of other pages too.


As much as CF would like people to believe otherwise (oh, and look at our awesome response time and automation!), this cat can't go back in the bag. They should step away from the mic and contact a PR firm that specializes in salvage jobs.

If I were Google I would hit back hard. They probably won't, but I also wouldn't bother trying to clean up the data unless under legal pressure. It's out there; it's too late.


And the "irony" is that some of the data may have leaked only to "bad bots" and to visitors whose "IP has a poor reputation (i.e. it does not work for most visitors)."

From their blog: https://blog.cloudflare.com/incident-report-on-memory-leak-c...


Any word on the possibility of credit card numbers having been exposed?


Yes, it's possible. Any data that passed through Cloudflare was possibly exposed.


How are people finding this info?

Is it possible to find if anything leaked from my site behind Cloudflare is in the caches?


To help with this, I made https://bleed.cloud/index.html

It lets you check domains quickly without downloading and grepping.


Folks, can we please stop downvoting the parent of the linked comment? It's of no use when it disappears from HN.


For those who don't know, to read a greyed-out comment you can click on its timestamp to go to its page.


[dead]


Sorry, but something feels decidedly off about your posting this. It's not fair to pile on like that, reposting a comment from years ago is kind of weird, and making it a personal attack is not ok. I'm going to kill this comment now. (Anyone who wants to read killed comments on HN can always do so by setting 'showdead' to 'yes' in their profile.)

Please don't post like this to HN.


This is already a comment on the site, in the relevant thread. Seems a little meta as a post.


For what it's worth, in the general case, I think you're right. The only thing that makes this case different to me is that the one comment CF's CEO has decided to make on HN takes a potshot at Google, which is newsworthy --- but if we took every notable comment on HN and put it on the front page, that's all the front page would be.


Right, I'm mostly whining about the method - link to grumpy mid-thread comment, title that misleads about the intent of the post, etc.

The thing itself seems somewhat newsworthy but personally, I'd rather hear the CF people defend the TLS-MITM-as-service idea itself and get yelled at for that. The fact they're being weaselly and insufficiently contrite seems secondary.


That makes sense. I'm much less interested in the phenomenon of CF-as-global-TLS-MITM than I am in how responsible they are being with the position they've accrued, and handling vulnerabilities like this is a big part of that. So the fact that CF is in a spat with Google is a big deal to me, but I understand that's not the case for everyone.


It's sort of an incredible exchange. Seems worth seeing on its own.


It might be but so are videos of cars crashing and they're not good HN material. This is essentially drama-fishing in stuff that's already on the site and pushing a past (but barely day-old) covered story to the front page again.


I think the content is very relevant to HN. For better or worse, CF has essentially become a significant part of the global internet infrastructure. We should be aware that the CloudFlare CEO is trying to shift blame for their mess. A mess that impacts a staggering portion of the internet.

With regards to the fact that it's part of another thread, I don't think that matters too much. If it were an update presented as a separate document, it would pass muster. There have been several open letters, responses to them, further responses again, etc. posted to HN. Furthermore, there's no guarantee that someone reading the original thread will see this exchange; I read the thread after the exchange happened and missed it.

This is significant enough to be posted as its own item, that it's part of another, ongoing thread shouldn't take precedence over this significance.


Yeah, I don't think this is the way to do it, see my other comment.


What's happening is CF playing "never let a good crisis go to waste" against Google. Calling them out on it is not only appropriate but crucial, given their past holier-than-thou attitude.


This isn't 'calling them out'. It's re-posting a single comment for rage-views. If you want to call them out, write something about it, quote Tavis Ormandy if you want, add whatever commentary you think is appropriate and post that. There is nothing in this that 'gratifies intellectual curiosity', it's just cheap incitement. I don't want to call it reddit-like since most sane subreddits sensibly discourage it as well.


Strange, it used to be not possible to submit HN comments. When I asked him about it, dang specifically said that he doesn't want those.


Speaking of that, as of now on the front page:

    23. 	Announcing the first SHA-1 collision (googleblog.com)
        	2882 points by pfg 1 day ago | flag | hide | 488 comments
second page:

    37. 	Cloudflare Reverse Proxies Are Dumping Uninitialized Memory (chromium.org)
        	3168 points by tptacek 1 day ago | flag | hide | 979 comments
It says "1 day ago" for both now, but the second story came later. So how come it ranks so much lower, having even more votes? Is that HN penalizing comment count, or users flagging it? If it got flagged, would anyone have a problem with removing those flags and removing flagging privileges from said users?


Company engaging in practices that undermine internet security and MITM their users found to be doing stupid shit.

Not exactly breaking news. At some point, maybe people will realise that CF is actively making the internet worse and less secure, and that it should be treated as nothing more than a wart to be removed.


Every CDN that handles TLS traffic is a MITM.


What other CDN makes it easy to use plaintext between the CDN and origin, yet use a secure connection between the CDN and the end user, and has the nerve to market this as a feature called "Flexible SSL"?

Edit: I wasn't very clear. The GP is wrong to say MITM is bad for its own sake; I think Cloudflare is harmful for other reasons, though.


That does seem to be an accurate name, it is a feature that's offered and up to the site operator to enable, and yes it's unfortunate that it potentially gives a false sense of security to end users. However in almost every case, it's still better than both sides of the connection being unsecured.

I'm not sure what this has to do with all CDNs being MITM operators when caching secure content.


At the very least, Cloudflare should do a better job of discouraging use of Flexible SSL. People that opt in to Flexible SSL should know what they are doing.

I edited my comment.


This incident affects Full SSL sites too.


Yes. I should have been more clear that Flexible SSL has nothing to do with the incident, it's just another sign that Cloudflare is dangerous to the web.


Wouldn't full SSL traffic be encrypted if leaked, and therefore kind of irrelevant?


"Full SSL" in this case just means that the traffic is always encrypted on the network, Cloudflare still has to decrypt it locally to do what a CDN does.


Oh shit. Got it. Also Jesus fuck


Which is precisely why you should not use an MITM network.


What precisely? I don't see a counter-argument.

If you're saying not to use any CDNs then that's not very reasonable considering the service that CDNs provide.


The benefits of CDNs are increasingly questionable, but you can still realize them by managing one yourself. What I'm saying is to never use a third-party network that subverts your own security. I would try harder to nail that point in, but I don't think Cloudflare's coffin has room for any more.


Questionable? There is no shortcut for the speed of light, regardless of how optimized the HTTP and TLS protocol gets. Building your own CDN is no easy feat if you want the performance, security, scale and reliability that you get from a focused vendor.

Everything on the internet is a product of thousands of vendors, hardware equipment and software components working together. There are millions of factors that can and may be compromised so the only realistic approach is risk management.

It's far better to rely on a well funded, staffed and capable vendor rather than building your own version. This is solid advice for everything outside of the expertise of your business so I'm not sure why a CDN is anything different. Assess the risk and do what works for you.


Cloudflare is not a shortcut for the speed of light in this case. You load static assets/video streams/whatever from CDNs. Things that contain sensitive content, like account pages and messages, should go directly to the server, since that is exactly where Cloudflare forwards them anyway.


CDNs still provide a better experience by having faster open connections to the origin, local TLS termination, security/DDOS/WAF protections, and more.


I just wonder when we can stop beating this dead horse here...



