Hacker News new | past | comments | ask | show | jobs | submit login

TL;DR for the lazy ones:

> The examples we're finding are so bad, I cancelled some weekend plans to go into the office on Sunday to help build some tools to cleanup. I've informed cloudflare what I'm working on. I'm finding private messages from major dating sites, full messages from a well-known chat service, online password manager data, frames from adult video sites, hotel bookings. We're talking full https requests, client IP addresses, full responses, cookies, passwords, keys, data, everything.

This is huge.

I mean, seriously, this is REALLY HUGE.

I don't get it. How is this info leaked? From the blog posts, it seems that "only" the HTTP Headers are being leaked and somehow being crawled by Google? But since when does Google store HTTP request info? Can someone explain?

Headers (among other sensitive stuff) were being leaked inside document bodies.

So just to clarify: some bug makes Cloudflare leak the HTTP Headers into the HTML being served and those HTML pages containing sensitive Info got cached by Google (and others)?

Yes. Think of it this way.

You have a function that strips all colons from your input. For some reason - in certain cases - your code misbehaves and when you are replacing the colons with an empty character you accidentally replace that colon with other data you have in the memory. So now all the colons in your input have been replaced with data that you shouldn't have touched. So now whoever sent you an input, gets back that input + more data they shouldn't be able to see.

And Google in this case caches those output strings.

@homero (since I can't nest a reply any further), it's not the contents of the crawler's request that gets randomly injected into the page that the crawler requests, but rather the contents of other requests to the same Cloudflare server.

Imagine I'm having a chat on some website X, which uses Cloudflare. Cloudflare acts as a man in the middle, meaning my request, and the response, likely pass through its memory at some point to allow me to communicate with X.

Later, a Google bot comes along and requests a page from site Y. Because of this bug, random bits of memory that were left around on the Cloudflare server get inserted into the response to the bot's request. Those bits of memory could be from anything that's gone through that server in the past, including my conversations on website X. The bot then assumes that the content that Cloudflare spits out for website Y is an accurate representation of website Y's contents, and it caches those contents. In this way, my data from website X ends up in Google's cached version of website Y.

But how is Google getting headers from the users of the sites, it should be from their crawler

If I (user A) access upwork.com (I just saw this on the list of affected websites, so it's not meant to be an ad), I am sending them my headers. Let's say my headers and other data are saved in M1 (memory register 1).

Then Google accesses the website as the crawler (user B), and their header and data is saved in M2. However, Google triggered a bug and now has access to M1 as well. So now Google sees their own headers + my data + other garbage.

Imagine this—Google sends a request to get data from malformedhtml.com for crawling purposes. This site's html happens to have that weird incomplete tag problem they mentioned. This site is served by Cloudflare, wherein a buggy script manages to insert some data from the server's memory into the HTML that it returns to Google. Now this data in the memory contains HTTP request headers etc. of _completely unrelated websites_ that are also behind CF.

Google gets this HTML and caches it and that's how it ends up there.


"We leaked information from Customer A to Customer B by accident" is the first order problem.

But the existence of web caches means that all that private information of customer A is potentially fucking everywhere now.

How do you even clean this up? How do you even start?

They leak uninitialized memory contents into the HTML being served; that memory could (and did) contain data from any other traffic that passed through their hands.

So a request sent to Cloudflare customer A's site could return data from Cloudflare customer B, including data that B thought was only being served via https to authenticated users of B.

Not just headers, basically random memory dumps that could contain anything that Cloudflare saw (which is almost everything). Passwords, certificates, you name it.

Essentially. Any headers from any site routing through cloudflare could get injected into the body of a second site's page if that second site was using the obfuscation feature. Those "mis-stuffed" pages could (and were) then cached by, among other things, crawlers like those operated Google and Bing.

Apparently 7xx sites had this enabled, but that affected 4000ish other sites that happened to be on the same infrastructure.

Near as I can tell, the HTTP Headers from one site are being included in HTML of other sites...

Cloudflare handles SSL for a lot of sites. It decrypts everything and passes it along.

For certain other sites, with malformed html, there is a bug that caused it to grab random data (headers and body) from memory and include it in the body of the response HTML. (Some html rewriting product that cloudflare offered was broken and it ran on the same servers.)

This stuff got sent to peoples browsers and also to web indexers like Google or Bing.

Google lets you search for stuff and will also show you the original page that it scraped, making it easy to find this data.

Edit: Also you may be seeing more headers in examples because headers are easier to search for.

Everything is leaked, headers are common but full plaintext is leaked in some instances.

HTTP Headers were being including the http response bodies of other random websites. Those websites were being crawled and cached.

requesting a page with a specific combination of broken tags, when done through cloudflare, will cause neighboring memory to be dumped into the response. op suspects this is due to a bounds checking bug on a read or copy. one can imagine this can be potentially kilobytes of data in one go.

since anyone can put a broken page behind cloudflare, all you need to do is request your own broken page through cloudflare, and start collecting the random "secure" data that comes back.

Just deleted my LastPass account - have been converted to KeePass for over a month.

Did Lastpass use Cloudflare? That would be a disaster.

It does not appear that LP was using Cloudflare. Even if they were, at most, your master password is all that leaked. All of their encryption and decryption is done on the client.

Only my master password? That seems kind of important.

I think he got it mixed up. Only the database would be leaked, but not the master password. However I understand the the clientside decryption is done when using their addon, but I'm not sure how it works if you go via the webportal.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact