
CloudFlare outage (System Status, back online now) - wahnfrieden
https://www.cloudflare.com/system-status
======
Nyr
CloudFlare promotes their service as a highly redundant CDN but the truth is
that this isn't the first outage and they fail at very simple things.

I was using them until one day they had routing problems with their DNS
servers to some parts of the world (about 20% IIRC). This shouldn't have been
an issue, except that all the name servers they provided were routing to the
same network, which made all my services unavailable for more than an hour.
Yes, they have anycast and all those cool things, but if they fail to provide
real redundancy for DNS, I can't be their customer anymore.
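One crude way to check for the problem Nyr describes is to see whether a domain's name server addresses at least sit in different network prefixes. The Python sketch below assumes you have already looked up those IPv4 addresses. The helper names and the /24 cutoff are illustrative choices, not anything CloudFlare publishes. Sharing a prefix strongly suggests shared routing, but distinct prefixes prove little: anycast addresses in different blocks can still depend on the same upstream, which only routing (BGP/AS path) data would reveal.

```python
import ipaddress
from typing import Iterable, Set


def distinct_prefixes(addresses: Iterable[str], prefix_len: int = 24) -> Set[ipaddress.IPv4Network]:
    """Map each IPv4 address to its enclosing /prefix_len network (hypothetical helper)."""
    return {
        # strict=False lets us pass a host address and get its containing network
        ipaddress.IPv4Network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }


def looks_diverse(addresses: Iterable[str], prefix_len: int = 24, minimum: int = 2) -> bool:
    """Heuristic only: True if the name servers span at least `minimum` prefixes."""
    return len(distinct_prefixes(addresses, prefix_len)) >= minimum


if __name__ == "__main__":
    # Documentation-range addresses, not real CloudFlare name servers.
    same_net = ["192.0.2.10", "192.0.2.53"]
    split_net = ["192.0.2.10", "198.51.100.53"]
    print(looks_diverse(same_net))   # False
    print(looks_diverse(split_net))  # True
```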

~~~
true_religion
I think CloudFlare isn't bad for a small-to-midsize site.

They're new in the industry and still learning.

If you have a large mature site, then it's likely your uptime will be better
than theirs and you'll lose serious money for every minute of downtime. In
that case, don't use CloudFlare because they're still learning from their
mistakes.

~~~
benatkin
> They're new in the industry and still learning.

They aren't that new, and they would have learned faster if they had better
priorities. Not only do they seem to prioritize the script minification and
code insertion over performance, they seem to prioritize popularity over
customer service. It's easy to find reviews of CloudFlare where someone
running a small-to-midsize site gave them a fair shot and was disappointed.

------
jgrahamc
Perhaps change the title to "CloudFlare was down briefly" because it's not
down any more. We were aware immediately that a problem was occurring and
fixed it. To quote the people who watch the CloudFlare network night and day:
"While tuning Asia performance, an improper router config was pushed, causing
our upstream provider to misroute. Pinpointed issue and fixed."

Lots of red faces around the office and people apologizing.

~~~
qeorge
This explains a lot, thanks. We noticed that the traceroute for one of our
domains was running through Asia (it usually goes through DFW) and found it
odd. It also changed back at about the time the sites came back online.

Glad to know it was just a mistake though, as opposed to an attack or failure.
Much less worried about recurrence.

------
larrys
When I first considered cloudflare the thing that kept me away (and this was
quite some time ago) was the "quality" of the customers. In looking at the
domains they hosted, I rarely saw anything but small and spammy type sites. I
also noticed that they had constant churn. They would bring on customers, but
many customers were also leaving, in significant numbers, on any day I checked.
This doesn't appear to have changed.

For the link below, simply change the date in the URL to any day and you will
see the domains that are added and transferred out of cloudflare on a constant
basis:

<http://www.dailychanges.com/cloudflare.com/2012-05-02/>

Added: Given what cloudflare is, it doesn't make sense to me that these domains
are transferred out as frequently as they are, other than for service-related
reasons. In fact, we had suggested cf to a customer, and they lasted about a
week on the service and had issues.

~~~
eli
What are you comparing them to? Based on your logic they don't seem any worse
than any other DNS provider I plugged in. E.g.
<http://www.dailychanges.com/dnsmadeeasy.com/2012-05-02/>

------
Udo
To their credit, they turned this issue around very quickly. Overall I have to
say that Cloudflare is an excellent service.

This event does remind me of the predictable and somewhat obvious question
though: do the benefits of a service like Cloudflare outweigh the downsides
that inevitably come along with introducing another single point of failure to
a website?

~~~
jyap
Well, theoretically Cloudflare is designed with a decentralized approach, which
means it is not a single point of failure.

I've noticed that large scale issues like this are usually down to botched
router configurations or related networking changes.

~~~
Udo
> it is not a single point of failure.

Obviously today Cloudflare _did_ become a single point of failure for many
sites, so I'm not quite sure I understand your point. I also don't believe you
can design any service with 100% uptime. Things will go wrong.

~~~
jyap
The operative word here is "theoretically". Theoretically, the concept of CDNs
(a broad term to describe the main aspect of Cloudflare's service) gives you
greater replication and redundancy of data. Now this all depends on their
overall design (not all CDNs are created equal). You can eliminate SPOFs
through replication and redundancy.

It's like saying I have 2 cars I can use to get to work. Then you say "But
what if both break down?" Uh, 2 cars breaking down is not considered a single
point of failure.

Take the example of serving a single image on the internet.

It starts with DNS. You can have multiple DNS servers for your domain (no SPOF
for DNS lookups). You have multiple web servers in different countries. Your
web servers point to CDNs to serve the image. Your CDN has multiple DNS
servers for their domain. They have multiple servers in different countries to
serve up your image.

Tell me where the 100% uptime of the single image fails in that scenario.

To answer your original question, if Value derived (can be high say for a news
site) > Risk involved (can be low depending on provider), then that is when
the benefits outweigh the downsides. In most cases, a CDN is meant to give you
better uptime as well as provide you benefits such as geographically delivered
content and the ability to serve your content to more people (eg. videos and
other large media content).
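jyap's redundancy argument can be put in numbers with the usual textbook formula for components in parallel: the service is down only if every replica is down at once. The sketch assumes failures are independent, and the 99% figures are made up for illustration.

```python
from math import prod
from typing import Iterable


def parallel_availability(availabilities: Iterable[float]) -> float:
    """Availability when any one replica suffices (e.g. several name servers).

    Assumes independent failures: P(all down) is the product of each P(down).
    """
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    # Three hypothetical replicas, each up 99% of the time.
    print(parallel_availability([0.99, 0.99, 0.99]))  # about 0.999999
```

The independence assumption is exactly what an incident like this one breaks: a single bad router config pushed across the network can take every replica down together, so real-world figures will be worse than the formula suggests.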

~~~
Udo

      Then you say "But what if both break down?" Uh, 2 cars 
      breaking down is not considered a single point of failure.
    

I'm afraid you might have spectacularly misunderstood my point.

The way Cloudflare operates is more closely related to a scenario where either
one of the two cars failing brings down the site: that would be either the
webserver or the CF infrastructure. Mathematically, the combined downtime of
both _must_ be greater than that of either one alone.
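Udo's point is the standard result for components in series: when the site needs both the origin server and the CDN in front of it, their availabilities multiply, so the result can never exceed the weaker of the two. A minimal Python sketch, assuming independent failures and made-up 99.9% figures:

```python
from math import prod
from typing import Iterable

MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200


def series_availability(availabilities: Iterable[float]) -> float:
    """Availability when every component must be up (e.g. origin AND CDN).

    Assumes independent failures: P(all up) is the product of each P(up).
    """
    return prod(availabilities)


def downtime_minutes(availability: float, period: int = MINUTES_PER_30_DAYS) -> float:
    """Expected minutes of downtime over `period` minutes."""
    return (1 - availability) * period


if __name__ == "__main__":
    origin, cdn = 0.999, 0.999  # hypothetical figures
    combined = series_availability([origin, cdn])
    print(round(combined, 6))                    # 0.998001
    print(round(downtime_minutes(origin), 1))    # 43.2
    print(round(downtime_minutes(combined), 1))  # 86.4
```

With small downtimes the combined figure is roughly the sum of the two individual ones, which is the sense in which each in-path service is a single point of failure.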

    
    
      Tell me where the 100% uptime of the single image fails in that scenario.
    

There is no question that a CDN is generally designed to add robustness and
speed to content delivery. But as you said, not all CDNs are created equal. I
say this as a (satisfied) Cloudflare user myself:

In its default configuration, a basic Cloudflare plan has (almost) no
settings. It's not like you make a choice e.g. to host only images there.
Using the standard CF plan comes with basically two states your site can be
in: either CF is on or it's off. When it's on, all of that site's traffic goes
through Cloudflare and they become your site's front-facing servers.
There are some huge advantages to this, for example they block a lot of
malicious traffic that way.

Coming back to your image example: individual components of the service may be
designed for redundancy, but there is still a lot of stuff that can (and does)
go wrong with global repercussions, if only for the simple reason that the CDN
service as a whole must be centrally controlled.

If your webserver is up but CF is down, your site is down. If CF is up but
your webserver is down, your site is down (actually it becomes a mirror of
some static content for a few minutes before it goes offline completely). This
is what I meant by each one of those services being a single point of failure;
there is really no way of getting around that fact.

I remember the same discussion about CDN-hosted JavaScript libraries. People
argued that linking to a 3rd-party server made their site _more robust_ simply
because CDNs normally have a higher uptime than a standard web hosting server.
This was of course completely beside the point, because (again) either one of
the two failing meant the site would break. That's why it has become customary
to have a local fallback for CDN-hosted JS libraries now.

~~~
larrys
"In its default configuration, a basic Cloudflare plan has (almost) no
settings. It's not like you make a choice e.g. to host only images there. "

Correct, but you could use a completely separate domain for all the images and
only enable cloudflare for that domain. (I'm not a cloudflare customer; I'm
just pointing this strategy out.)

------
codexon
They seem to have random unexplained outages every other week that aren't even
mentioned on Twitter.

------
jaytaylor
Twitter search is nice for staying up to date on this:
<https://twitter.com/#!/search/cloudflare>

------
eli
Seems to be back now.

[https://twitter.com/#!/CloudFlareSys/status/1977854413600235...](https://twitter.com/#!/CloudFlareSys/status/197785441360023552)
blames an upstream network issue. But I dunno, cloudflare.com was giving me a
502 error from nginx, which indicates a cloudflare-backend problem.

~~~
jjoe
Just thought I'd expand on this part of your comment: "cloudflare.com was
giving me a 502 error from nginx, which indicates a cloudflare-backend
problem"

Everything is a backend here, including your server: your CF-hosted website is
a backend to their front end (Nginx). Except that this "backend" is located in
the remote network where your actual server is hosted. So it could very well be
a routing issue between their network (the Nginx nodes) and your server. Hence
the 502 (backend unreachable).

Regards

~~~
micro-ram
Yes, but CloudFlare is supposed to show a cached copy of my site when my
server (i.e. backend) is unreachable.
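The behaviour jjoe and micro-ram describe, a front-end proxy answering 502 when it cannot reach the origin unless it has a copy to fall back on, can be sketched as a toy edge node in Python. This is not how CloudFlare or nginx actually implement it; `EdgeProxy`, `OriginUnreachable` and the injected `fetch_origin` callable are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class OriginUnreachable(Exception):
    """Raised by the (hypothetical) origin fetcher when the backend can't be reached."""


@dataclass
class Response:
    status: int
    body: str
    stale: bool = False


class EdgeProxy:
    """Toy edge node: forwards requests to the origin and keeps the last good copy.

    If the origin is unreachable it serves the cached copy (marked stale) when one
    exists; otherwise it answers 502 Bad Gateway, the status nginx typically returns
    when it cannot get a valid response from its upstream.
    """

    def __init__(self, fetch_origin: Callable[[str], str]):
        self.fetch_origin = fetch_origin
        self.cache: Dict[str, str] = {}

    def handle(self, path: str) -> Response:
        try:
            body = self.fetch_origin(path)
        except OriginUnreachable:
            cached = self.cache.get(path)
            if cached is not None:
                return Response(200, cached, stale=True)  # serve stale on error
            return Response(502, "Bad Gateway")           # nothing to fall back on
        self.cache[path] = body  # remember the last good copy
        return Response(200, body)


if __name__ == "__main__":
    origin_up = {"value": True}

    def fake_origin(path: str) -> str:
        if not origin_up["value"]:
            raise OriginUnreachable(path)
        return f"<html>{path}</html>"

    edge = EdgeProxy(fake_origin)
    print(edge.handle("/"))          # fresh 200, now cached
    origin_up["value"] = False
    print(edge.handle("/"))          # stale 200 from cache
    print(edge.handle("/uncached"))  # 502, no cached copy
```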

------
wahnfrieden
Their status page is currently 500ing, but they have a Twitter account for it
too:

<https://twitter.com/#!/cloudflaresys>

"Investigating upstream network issues in EU." -- even though the outage
appears to be global...

------
pwenzel
My Pingdom monitor started making notes a few minutes ago. Having the same
downtime problems on one of my sites.

