
Stress Test the Cloud: Alibaba Cloud, AWS, Azure, GCP - ceohockey60
https://interconnected.blog/stress-test-public-cloud-alibaba-aws-azure-gcp/
======
shred45
I think this is an interesting analysis, but I'm not sure I can agree with
some of the logic here.

For example, a large component of "reliability" comes down to your
implementation (multi-regional, graceful fail-over, etc.). It is entirely
possible that the statistical reliability of individual components of one of
these clouds is worse, but they have engineered their architecture on top of
it to handle these failures gracefully. It is very difficult to estimate the
probability of an unforeseen multi-regional black swan event that would
actually threaten these businesses.
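
A rough sketch of that point, in Python. It assumes regional failures are independent, plus one shared outage that takes every region down at once. The function name, parameters, and numbers are all illustrative assumptions, not measurements from any of these clouds.

```python
def composite_availability(region_availability: float,
                           regions: int,
                           correlated_outage: float = 0.0) -> float:
    """Availability of a service that is up while at least one region is up.

    Assumptions (hypothetical model, not any provider's real figures):
    - each region is up with probability `region_availability`
    - regions fail independently of each other
    - a shared "black swan" event takes down all regions at once with
      probability `correlated_outage`
    """
    if not 0.0 <= region_availability <= 1.0:
        raise ValueError("region_availability must be between 0 and 1")
    if not 0.0 <= correlated_outage <= 1.0:
        raise ValueError("correlated_outage must be between 0 and 1")
    if regions < 1:
        raise ValueError("regions must be at least 1")

    # Probability that every region is down at the same time, by independence.
    all_regions_down = (1.0 - region_availability) ** regions

    # The service is up only if there is no shared outage and at least one
    # region survives.
    return (1.0 - correlated_outage) * (1.0 - all_regions_down)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, composite_availability(0.99, n))
    print("with shared outage:", composite_availability(0.99, 3, 0.001))
```

Under these assumptions, three mediocre 99% regions come out near 99.9999%, while even a 0.1% chance of a shared outage caps the result just below 99.9%. The hard-to-estimate correlated term ends up dominating, not the per-component numbers.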

It is also worth noting that these businesses have a choice between their
original on-prem data center, which was likely built under a similar
engineering and management culture (and therefore similar quality), and their
cloud offering, which is probably equal to or better than on-prem (public
products are generally more polished, better documented, more consistent,
etc.). Factor in the waste of maintaining extra data centers and capacity when
their cloud is likely not 100% utilized by customers and the decision seems
obvious. Cloud and on-prem could both be terrible and they would still likely
migrate to their own cloud for sheer efficiency reasons. They don't really
have the option of using a different cloud.

The author also tries to reason about the complexity of the requests being
handled by each business (e-commerce vs. email) and I think it is very
difficult to do this meaningfully. All of these companies have to integrate
systems, developed by separate teams, which contain lots of unique
functionality and data. I'm not sure I agree that e-commerce is necessarily
more complex than Google Maps or Docs. I would have speculated that Google's
architecture is generally more complex than Amazon's based on my subjective
interactions with the platforms and experience with the two engineering
cultures. Even if there are large differences in complexity, this strikes me
as using layer 7 information to reason about layers 1-4. As we move to
serverless and managed offerings this might make more sense, but I think for
most businesses, clouds are very much just compute, storage, and network right
now, which doesn't have much bearing on how complex an architecture you can
engineer on top of them.

Another thing I don't see discussed a lot is documentation and API
stability. I have never had an unpredictable API response from AWS or GCP.
Their docs may not be perfect but they are generally accurate. Alicloud
definitely gets respect for handling Singles' Day; however, I have run into
very unreliable documentation and API behavior. It is always a bit of an
adventure re-applying a Terraform template that I made only a few months ago.
Perhaps their internal team gets more of a heads-up when things change, but I
consider DevOps reliability to be a real issue there. I have also noticed
occasional issues with single-zone network latency, managed Hadoop version
compatibility, and other general fit-and-finish things.

In general I consider all of these clouds to be equivalent enough in
reliability that it just isn't a deciding factor. The location of clients and
APIs that I'm working with is a much bigger factor (if they are already in a
cloud zone somewhere). I subjectively prefer AWS because I have worked with
them longer and I am more familiar, but I have had projects where it made
sense to use each of the four clouds and I think being able to accommodate those
requirements with equal degrees of reliability is exactly what good
engineering is about.

~~~
ceohockey60
Author of the post here. Thank you so much for reading and for your comment! I
learned a lot.

100% agree with you that documentation and API stability is a topic worth
discussing more, so I might write a separate post on that in the future. I'm
personally a bit obsessed with good documentation (most vendors big and small
don't invest enough time and resources in documentation). It bears on both
the reliability and usability of the cloud.

~~~
shred45
Haha, I would also consider myself obsessed with good documentation. That
would make for a very interesting post. Cheers.

