CloudFlare Is Now a Google Cloud Platform Technology Partner (cloudflare.com)
158 points by jgrahamc on April 13, 2015 | 83 comments



Recently I have been getting cautious about Cloudflare. I do use them and like them a lot, and I enjoy reading their technical blog posts. However, from a privacy standpoint it makes me feel uneasy. Cloudflare is just everywhere now: HN, Stack Overflow, Reddit, and countless other sites. You can block a cookie or a connection to a third-party script, but how do you block an internal proxy? All your cookies, credentials, heck even every HTTP request and response go through them. Also, why is there a Cloudflare-specific cookie (__cfduid) on sites that may not want to track users (e.g. HN)?

Maybe I am just being paranoid...



Thank you for the link. It was simple and yet detailed. I wish more companies took the time to explain parts of their privacy policy.

However, my concern is more to do with thinking as an outsider. I have toyed with the idea of a company that requires establishing trust among users. Although it may seem as simple as "do no evil" on the surface, it is an extremely hard undertaking. The thing with Cloudflare is that its niche is its own problem: "man in the middle". Even if I were to place trust in the privacy policy, there are other things I need to worry about. What if there is a break-in and a hacker places a sniffer on the communication between Cloudflare and the client sites? Sure, the client followed best practices such as one-way hashing, strong crypto (e.g. bcrypt), 1000+ iterations, database encryption, HTTPS, etc., but all of it is pointless or redundant now because the browser ends up sending the password in plaintext through the proxy. Now imagine if instead of a hacker it's a government agency, and one with a gag order at that. See my concern?



However, there is no way to verify that is all they log.

CloudFlare gets to see the cleartext of all traffic they serve as they MITM HTTPS connections.


From an engineering feasibility/cost standpoint: there is no scenario in which they could log (as in packet capture) and dedupe all traffic without a nation-state-like (alphabet orgs, interested companies a la Google) budget.

CloudFlare's (non-enterprise) prices simply aren't even in the required order of magnitude.

Now: whether or not metadata, request bodies, etc. are logged, and to what scale, is another story/discussion of possibility.

At some small, targeted scale, it's safe to say that total duplication (certainly request bodies, etc.) is possible, if they were so interested.


No, you're not being paranoid. The internet started off as a decentralized system, and now we are seeing the emergence of more and more silos. Cloudflare is another one of these (albeit a special one, run by people I would trust more than those running some of the other silos).


This is sorta an internet architecture question for those in the know. Assuming there's no issue with client reachability/latency, what's stopping CloudFlare from having a single IP?

Suppose the IP was behind a fat enough pipe: why not load balance behind it instead of DNS load-balancing in front of it (and additionally behind each IP, as I presume happens now)? Also, if that IP was anycast then you could ignore the issue of client latency as well, assuming you have the necessary private network behind the endpoints to manage state.

If you don't like, or can't solve, the problem at the level of IP anycast, why not leverage a third-party anycast DNS and just have a few fixed IPs for specific geographic locales, again with fat enough pipes and load balancing behind them?

I guess what I'm saying is that there's no reason for an organization, a monolithic entity, to have more than a handful of IP addresses at most.


My understanding is that they basically "fast flux" IPs to funnel traffic for a targeted attack to a specific data center. So, while you normally may be sharing IPs, if an enterprise customer's website example.com starts getting attacked they will put it on dedicated IPs, then broadcast those IPs from one or two data centers. They will then reroute all other enterprise traffic away from those data centers, thus minimizing the attack's effect on other customers. If these websites were all on the same IP, it would be impossible to distribute traffic selectively between data centers like this.

Another thing they can do is use anycast to load balance across data centers. So, if a data center rather than a website is a target - the attackers will need to know which IPs to attack. They can start flooding the broadcasted IPs from a particular route. However, if this happens then hypothetically Cloudflare could just stop broadcasting the IPs at this particular data center, re-broadcast them at all the surrounding data centers, and basically spread out the attack load across multiple sites. If the attackers change the IPs that they target based on new routes, then Cloudflare can continue fast-fluxing the IPs every 5 minutes and mitigate the attack.

It's a pretty cool use of BGP and anycast, but being able to change the IPs of a website, and where they are broadcasted, in real time is core to Cloudflare's security.
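
A toy sketch of the routing dance described above, in Python rather than actual BGP config (the PoP names, prefixes, and sites are entirely made up for illustration): the attacked customer's prefixes get pinned to a sacrificial PoP or two, and everything else is withdrawn from those PoPs.

    pops = {"SJC", "IAD", "AMS", "SIN"}              # hypothetical PoPs

    # which (made-up) prefixes each site resolves to
    site_prefixes = {
        "attacked.example": ["104.16.1.0/24"],
        "other.example":    ["104.16.2.0/24"],
    }

    # where each prefix is currently announced (initially: everywhere)
    announcements = {p: set(pops)
                     for prefixes in site_prefixes.values()
                     for p in prefixes}

    def isolate_attack(site, sacrificial_pops):
        """Pin the attacked site's prefixes to a small set of PoPs and
        withdraw everything else from those PoPs, so the attack can't
        spill over onto other customers."""
        attacked = set(site_prefixes[site])
        for prefix in announcements:
            if prefix in attacked:
                announcements[prefix] = set(sacrificial_pops)
            else:
                announcements[prefix] -= set(sacrificial_pops)

    isolate_attack("attacked.example", {"SJC"})
    print(announcements)
    # e.g. {'104.16.1.0/24': {'SJC'}, '104.16.2.0/24': {'IAD', 'AMS', 'SIN'}}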


Thanks for this comment. I guess, along with jgrahamc's sibling comment, you have to make a routing decision based on (source, port) at most if you have a fixed IP, since HTTPS ports are stupidly fixed. That is 32+16 bits of info at most, so an ethernet MAC's worth. So now I can clarify my question as follows: with X bits of data, what is the present state-of-the-art latency for routing T Gbps of traffic? And it's not just that; you also have to have good latency for updating that routing table.

Any research on the real entropy of (source, port) on the Internet? There are also real issues, like the distribution of (source, port) being hardly uniform, and especially nasty when under attack, i.e. you want to manage latency based on both the distribution and the authenticity of traffic.

This is a very interesting mathematical problem. I have to work on expressing it a bit better before I can hope to formulate a solution, but yes, I can totally see now how BGP, anycast, and DNS TTLs are all knobs for solving this problem heuristically, instead of some crazy genius use of router TCAM silicon.


As a further observation, it makes the GitHub attack an interesting case study. You now have to further route on the GET target, and if traffic is encrypted, the routing decision is moved to a later stage.

In order to protect latency to other GET targets, you're going to have to start doing interesting things.

One future solution I can see is using multipath TCP to shift the anomalous traffic and then closing the original connection. But at that point you have to re-filter genuine vs malicious traffic, and then there's the encrypted state you have to share for a proper stream handover. Ooof... what a nightmare.

At least it's an interesting one. :)


Keep in mind you can generally only anycast a /24's worth of IP addresses, so it's very unlikely they're doing this with single IPs.


CloudFlare has an IPv4 /12 to play with: http://bgp.he.net/net/104.16.0.0/12#_whois


Here is their complete IP list:

https://www.cloudflare.com/ips


1. Non-SNI based SSL means you need an IP per host.

2. People attack IP addresses. Handy to be able to change the IP address of a web site.

3. Countries block sites based on IP addresses. Handy to be able to move sites around to prevent collateral damage.
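
To make point 1 concrete, here is a minimal Python sketch of what SNI buys you: the client names the host it wants inside the TLS handshake, so one IP can present certificates for many sites. The edge IP and hostnames below are placeholders, and verification will fail for any name the server can't actually present a certificate for; without SNI the server has to pick a certificate before it knows which host you want, hence one IP per host.

    import socket, ssl

    EDGE_IP = "198.41.191.47"                        # placeholder shared edge IP
    HOSTS = ["news.ycombinator.com", "example.com"]  # placeholder co-hosted names

    ctx = ssl.create_default_context()
    for host in HOSTS:
        with socket.create_connection((EDGE_IP, 443), timeout=5) as sock:
            # server_hostname is the SNI value sent in the handshake
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                print(host, "->", cert.get("subjectAltName", ())[:3])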


In my defense, I was assuming SNI (aka the modern internet), and that the IP was reachable by those you care for it to be reachable by. Ignoring these issues, is there an "engineering" reason why a single IP won't work, in terms of, for example, hardware being unable to demultiplex the aggregate ingress volume of CloudFlare and handle DoS mitigation?

I guess I'm asking this because of how woeful the "load-balancing" solutions from the major cloud providers look. I feel that, the way they're externally documented and the way their APIs are specified, hitting them with more than a 40Gbps fat-server's load of traffic will cause issues, regardless of how many hosts you have serving that load.

I'd appreciate some insight from those who handle such crazy amounts of traffic.


I work for a major CDN that uses anycast, and there are a number of reasons. I won't go into too many of them, but quickly:

1) Anycast doesn't give you fine-grained control. Once we announce our anycast routes, what traffic actually gets sent where is out of our control; it is based on the peering arrangements of our transit providers. If we need to balance traffic between our PoPs, we need finer-grained control than a single anycast IP.

2) IP addresses get blocked for all sorts of reasons (looking at you China!) If all customers were on one IP address, as soon as China decides to block one customer, they are all blocked.

3) Anycast sometimes has weird behavior. For example, traffic might be sent to a datacenter that might be close in terms of peer links, but far in terms of physical distance and latency. Using DNS, we can route around these issues.

I am not sure what you mean about the "40gbps fat-server's load of traffic" causing issues. We handle many customers that push more than that.


Just disregard my "fat-server" comment. It's more from being disillusioned with all load-balancing solutions being tied to the service provider. I'd like something that was cloud agnostic, that was peered at multiple points with the major providers.

I guess this is step 1 in the same effort from CloudFlare, before they add AWS and Azure. But their interface is over-simple, which is understandable considering the technical proficiency of their average customer.

CloudFlare is too one-size fits all, but from a business perspective it's totally understandable.

I know it's a pipe dream, but I wish we could defragment the IP space and clean up the BGP tables. It would at least make anycast more reliable without resorting to DNS tricks like edns-client-subnet.
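
For what it's worth, a rough sketch of the edns-client-subnet mechanism using the dnspython package (assumes dnspython is installed and an upstream resolver that honors ECS, such as Google Public DNS; the subnet is a documentation prefix used purely as a placeholder): the query carries a truncated client subnet so the authoritative server can tailor its answer to the client's location.

    import dns.edns
    import dns.message
    import dns.query

    query = dns.message.make_query("www.cloudflare.com", "A")
    # attach a (placeholder) client subnet so the authoritative server can
    # answer as if the query came from that network
    query.use_edns(edns=0, options=[dns.edns.ECSOption("203.0.113.0", 24)])

    response = dns.query.udp(query, "8.8.8.8", timeout=5)
    for rrset in response.answer:
        print(rrset)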

As for IP blocking, if undesirable sites are behind the same IP as publicly demanded ones, it could make blocking actions harder to get the populace to support. But worrying about authoritarian regimes is not my concern. After all, why make a service accessible if you cannot monetize the user base sufficiently.

Yes, I'm a little jaded.


> 2) IP addresses get blocked for all sorts of reasons (looking at you China!) If all customers were on one IP address, as soon as China decides to block one customer, they are all blocked.

Feature, not a bug.


If you think of a connection as defined by the tuple (source_ip, source_port, destination_ip, destination_port), then you might run into problems if destination_ip was a single value, just because whatever hashing/table lookups you are using for connection management, DoS protection, etc. might have problems with the sheer size. We are doing a huge amount of traffic and I can imagine having to engineer around some things related to that.

But the real issues are the ones outlined above.
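
A toy version of that lookup, just to show where the bits come from (the backend names and the use of SHA-256 are made up for illustration; real load balancers use purpose-built hashing): with the destination IP and port fixed, only the source address and port vary, which is the 32+16 = 48 bits mentioned upthread.

    import hashlib

    BACKENDS = ["edge-1", "edge-2", "edge-3", "edge-4"]   # hypothetical

    def pick_backend(src_ip, src_port, dst_ip="104.16.0.1", dst_port=443):
        # key the decision on the full 4-tuple; with dst_ip/dst_port fixed,
        # only src_ip + src_port (about 48 bits) actually vary
        key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
        digest = hashlib.sha256(key).digest()
        return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]

    print(pick_backend("203.0.113.7", 51324))   # deterministic per connection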


I wonder if this is one of those strategic deals that would lead to an acquisition. With the push surrounding cloud and Google actively competing hard in this space, it would make a lot of sense.


Oh wow, Google is already on every site with Analytics, imagine if they were also the SSL host/WAF/CDN/DNS host for every major property?

It would fit in well with their silent yet never-ending reach across the internet.


And now we start to wonder even more - is Google just an unexpectedly profitable NSA front?


Problem is that (besides the brand) Cloudflare really has nothing to offer Google. Google has spent the last 20 years solving the same problems CF is aiming to solve; they've even got a competing service, Google PageSpeed, that does exactly what CF does, except better (in my personal experience).


> they've even got a competing service Google PageSpeed that does exactly what CF does

You mean this service? https://developers.google.com/speed/pagespeed/service

> PageSpeed Service is in a limited field trial, and is not currently accepting new signups


Oh, I didn't know they closed signups. I've been using it for some personal projects for a good while (Since 2011 or so?) now.


Yes, announced in 2011. Looks like it was launched with limited signups: http://googlewebmastercentral.blogspot.com/2011/07/page-spee...

If Google still doesn't think it has a marketable product after 4 years... maybe a CloudFlare acquisition isn't too far-fetched.


it was never really open.


Google sucks at productizing their infrastructure expertise. The underlying tech at Google is indeed better than CloudFlare's, but CloudFlare understands marketing, ease of use, product simplicity, all the stuff that is necessary to get people to actually use your offering.


I was pleasantly surprised by Google Compute Engine. I've never used such a simple IaaS provider. Especially compared to the direction Azure's new portal is taking, GCE is refreshingly simple. EC2 is alright but still feels a lot more complicated than GCE. The UI is simpler, too.


I don't think they need CloudFlare for that though, they just need marketing people to sell it. I can't see CloudFlare acquisition being worth it for Google just for their marketing expertise.


How is this different from before they were a GCP partner?


It sounds like they are now peering directly. Google could also be operating Cloudflare's [Railgun](https://www.cloudflare.com/railgun) software at the edge of their network to reduce content transfer times.


What does this add? Before the partnership, could GCE users not use CloudFlare? Does the peering agreement result in lower transit costs on my GCE bill?


Did you read the post, specifically the benefits section? Or the Google page they linked: https://www.cloudflare.com/google

It sounds like they now have a peering agreement so Google can directly communicate with CloudFlare's network, resulting in 2x faster performance. It looks like that's the primary benefit (other than the regular benefits of CloudFlare).


They never actually say that the peering agreement results in 2x faster performance, just that they use SPDY for 2x speed (which is something they've been doing for a while now).

>2x Web Performance Speed - CloudFlare uses advanced caching and the SPDY protocol to double web content transfer speeds, making web content transfer times significantly faster.
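
If you're curious what a given CloudFlare-fronted site actually negotiates, here's a quick sketch (assumes Python 3.5+ for ALPN support; in 2015 SPDY was often negotiated via NPN rather than ALPN, so a None result here isn't conclusive):

    import socket, ssl

    host = "www.cloudflare.com"          # any CloudFlare-fronted site
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["spdy/3.1", "h2", "http/1.1"])

    with socket.create_connection((host, 443), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            print("negotiated:", tls.selected_alpn_protocol())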



Direct Peering Costs: NA $0.04/GB, EU $0.05/GB, APAC $0.06/GB


"double web content transfer times"

That should be speeds.


Maybe they're gonna be twice as slow now?


I misread the title and thought that Google had acquired CloudFlare.

And that made me a little uneasy.


Didn't Google announce their own DDoS protection service some months ago?


You mean Project Shield[1]?

[1]: https://projectshield.withgoogle.com/en/


So is this basically GCP and Cloudflare peering with each other?


Cloudways also became a Google Cloud Platform Technology partner, and Ben Kepes, an investor in more than 10 cloud companies, wrote about it on Forbes: http://www.forbes.com/sites/benkepes/2015/02/04/cloudways-ad...

And that's what they have built using Google Cloud Platform: http://www.cloudways.com/en/managed-google-compute-engine.ph...


Is it going to be beta or alpha, like most Google Cloud services?


Google Product Manager here.

Not sure why you think most Google Cloud Services are in beta.

The Google Cloud products page [1] lists 17 main products. Two are in alpha (Container Engine, Deployment Manager), one is in beta (Pub/Sub).

The rest are fully supported. There are some beta features here and there...but saying "most" are in beta is certainly not correct.

[1] https://cloud.google.com/products/


I think it's beta because it's still the case that almost nothing works. Try setting up TLS on App Engine for anything other than your primary Google Apps domain. This has been broken for years, despite Google pushing TLS through Chrome policy changes.


> The Google Cloud products page [1] lists 17 main products. Two are in alpha (Container Engine, Deployment Manager), one is in beta (Pub/Sub). The rest are fully supported.

Wait a second. I just clicked through the items on that page, and the following are listed as alpha:

  Container Engine
  Cloud Dataflow
  Cloud Deployment Manager
The following are in Beta:

  HTTP/HTTPS Load Balancing
  Virtual Private Network
  Cloud Pub/Sub
  Cloud Monitoring
  Cloud Logging
If these aren't in beta/alpha, then maybe you should update your docs.


Actually, I just filed a bug internally to fix docs. You're right -- alpha/beta is not clearly shown consistently enough (I missed at least one in going through that list).

In some cases, there are GA products with alpha/beta features & languages, so it may be difficult to figure out the best way to communicate at the top level (e.g., the examples pointed out for a given feature in a given language, or HTTP load balancing vs Network Load balancing).

But in cases where it's clearly a beta product, it should be clear.


I recently found out (from the support team) that PHP is in beta. That wasn't obvious to me when looking through the website / documentation.


Example: Dataflow is in alpha & Logging is in beta.


Go on App Engine has been in beta for years


Sure, a GA product like App Engine often has features that are in alpha/beta/experimental states. That's different from the major product being in a pre-release state.


Why? The point is, Google has a habit of releasing tons of shiny new Cloud Platform services (whether they're considered "separate products", part of an existing product, or whatever), but keeps them in alpha/beta for a long time. Go support has been there for years, and it's still "experimental". Some of the new products released last year at I/O 2014 are still in alpha, and we're already approaching I/O 2015. Pedantry aside, this deserves discussion.


And... so what? Why does it deserve discussion? Sure, Google has always openly favored a practice of releasing products/features very early to validate them and evolve them through real-world use with willing early adopters who accept the risks associated with potentially unstable feature sets of not-yet-stable products.

What's the issue, here?


Managed VMs, PHP, and Go on App Engine are all in beta as well.

Autoscaler and Instance Groups on Compute Engine are in beta as well.


What percentage of the web traffic flows through CloudFlare now?



Does this mean it will be even harder to DDoS sites protected by Cloudflare now?


It's always been hard to DDoS sites protected by Cloudflare. Their business model is to promise to absorb any DDoS attack against you - and I think they've delivered so far.


Do we have to do anything special to make this work? We've already been using CloudFlare with our App Engine application, using a CNAME in CloudFlare DNS.


Wait, isn't Google App Engine already using SPDY?



CloudFlare hosts reddit, is that correct?


Yes. The NS records list reddit's nameservers (usually you need to use CF nameservers to use their service; using your own nameservers requires more config), but the A records list CF IPs (free users just get two IPs; reddit has quite a lot):

    reddit.com.		22	IN	A	198.41.209.143
    reddit.com.		22	IN	A	198.41.208.141
    reddit.com.		22	IN	A	198.41.209.137
    reddit.com.		22	IN	A	198.41.208.139
    reddit.com.		22	IN	A	198.41.208.143
    reddit.com.		22	IN	A	198.41.208.142
    reddit.com.		22	IN	A	198.41.209.139
    reddit.com.		22	IN	A	198.41.209.141
    reddit.com.		22	IN	A	198.41.209.138
    reddit.com.		22	IN	A	198.41.209.140
    reddit.com.		22	IN	A	198.41.208.138
    reddit.com.		22	IN	A	198.41.208.137
    reddit.com.		22	IN	A	198.41.209.142
    reddit.com.		22	IN	A	198.41.208.140
    reddit.com.		22	IN	A	198.41.209.136


So is that like load balancing out from reddit.com to all those IPs? Is that the purpose of having so many?


Huh, interesting. I didn't even know they'd allow you to do that. I assumed CF required full DNS control to allow quickly switching IPs in case of DoS and such.


HN does it via CNAMEs (this doesn't require manual editing of the IPs and CF can do it when they need to):

    news.ycombinator.com.   76  IN  CNAME   news.ycombinator.com.cdn.cloudflare.net.
    news.ycombinator.com.cdn.cloudflare.net. 124 IN A 198.41.191.47
    news.ycombinator.com.cdn.cloudflare.net. 124 IN A 198.41.190.47


They reverse-proxy reddit (and hacker news) - but they don't actually host the website (i.e. the application, databases, etc).


Does that mean when I make a request to reddit or HN, CF fetches the data and I get it through their proxy? Or am I missing something ...


Yes.


And to tack on - this is how they act like a CDN. In general they don't pull CSS, images, etc every time from your server - they pull it once, then cache it at edge locations.
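
You can see this from the outside: CloudFlare adds a CF-Cache-Status response header (HIT, MISS, EXPIRED, ...) to traffic it proxies. A quick sketch, assuming the URL points at a static asset on a CloudFlare-fronted site (the URL below is just a placeholder):

    import urllib.request

    URL = "https://news.ycombinator.com/y18.gif"   # placeholder static asset
    req = urllib.request.Request(URL, method="HEAD")
    with urllib.request.urlopen(req, timeout=5) as resp:
        print("CF-Cache-Status:", resp.headers.get("CF-Cache-Status"))
        print("Server:", resp.headers.get("Server"))   # typically "cloudflare"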


Got it, cool.


They also capture all your content and communications.


More specifically - they terminate your SSL connections, thereby having the cleartext of your traffic - and doing whatever their NSL/gag order requires them to do before fetching/forwarding your request to the appropriate cache/endpoint.


Which, more specifically, could happen at AWS, GCE, Azure, or your local colo- wherever you happen to terminate SSL.

Save for the scenario of expensive, relatively difficult-to-implement pieces of crypto hardware (and even then, a nation-state could probably defeat it), your traffic is likely vulnerable to determined aggressors.

It's one thing to possess such high-end, esoteric security technology, it's another thing entirely to implement it (and protection for other far more realistic attack vectors) at a CloudFlare-number (or other CDN) of global locations.


Sure.

I guess the bit that really grates on me is the "just give us your private keys, and trust us!" approach, especially with someone who's then routing not insignificant percentages of the total web traffic through their infrastructure.

Snowden showed us the NSA can and does target "high volume" opportunities for mass surveillance - if you look at the PRISM slides and estimate what percentage of global email their "top ten" targets represents, how much would you bet against them already having a similar program in place backdooring Cloudflare (and Akamai and all other significant players in the SSL CDN market)?

It's probably a false hope, but I feel my own SSL cert on a VM on a more Lavabit-scale "local colo" is - while no safer from a targeted NSA/GCHQ/DSD probe looking for _me_ specifically - still significantly less likely to get caught up in a firehose-scale "collect all the things" program.

Although, it's probably just as likely a "red flag" that marks me as a "potential terrorist" at least as accurately as having a public PGP key or a secure messaging app… :-/


To clarify terminology -- CloudFlare does not host websites. They provide DNS service and performance/security services.


Right ok.


You're right!


:)


Good time to stop using CloudFlare now :-) Thanks for the heads-up, OP.





