
AWS, Azure, and GCP respond to cloud report - awoods187
https://www.cockroachlabs.com/blog/aws-azure-gcp-respond-to-the-2020-cloud-report/
======
sho
What a great marketing strategy - publishing interesting, valuable research in
an adjacent area - thereby gaining genuine, valuable attention from exactly
the type of person who'd also be interested in their product, whilst also
establishing a kind of thought leadership in the "cloud performance" category.
I'm not even being cynical - great work.

~~~
eadan
I actually really appreciate companies that produce research and write
articles on technologies adjacent to their business. It's a great way of
building trust and a community around the business' core product. DigitalOcean
is a good example. It reminds me of Patrick Collison mentioning on a
podcast [0] that one of the biggest drivers of new customers to Stripe in
their early days was a blog post they wrote on using the Python debugger.

[0] [https://tim.blog/2018/12/20/patrick-collison/](https://tim.blog/2018/12/20/patrick-collison/)

------
pwarner
> Azure offers a large number of configuration options that can be tricky to
> get right

That's basically the summary of Azure. I think all that complexity ends up
hurting more than it helps, not just on the end user side, but in terms of
lower reliability from the provider side.

~~~
api
Windows is the same way: absurd numbers of options. I would not be surprised
if Windows 10 had over a million configuration settings between regular
panels, policies, and the registry.

~~~
cortesoft
You think Windows has more configuration options than Linux?

~~~
chrisandchris
Not that, no. But Microsoft/Windows usually restricts some arbitrary things
when you configure something else. The report reads a lot like this (at least
the Azure section): "if you configure option A, option B will be enabled
because option C will be unavailable. Therefore we provide 10 different
variants each of A, B, and C that you can combine however you want."

------
dang
Previous thread:
[https://news.ycombinator.com/item?id=21804939](https://news.ycombinator.com/item?id=21804939)

~~~
biomcgary
This is a new post that updates the previous post with cloud vendor responses.

~~~
dang
That's clear! Adding links to previous threads is just a service to the
curious. Not intended to imply dupiness. If it were a dupe we'd have
downweighted it and probably marked it as [dupe] in the title.

[https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...](https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=by%3Adang%20links%20curious&sort=byDate&type=comment)

~~~
biomcgary
Your link to the previous discussion was valuable, but I originally
misunderstood the meaning. I've been reading HN long enough that I should know
better.

However, for newbies (or the scatterbrained), perhaps you could prefix the
link with something like: "For context, see discussion of earlier, related
post: https://..."

~~~
dang
I do that sometimes (see the search link upthread) but still haven't figured
out the best way to word it. It would be nice to avoid being so repetitive.

~~~
biomcgary
Not sure there is a general solution. Thanks for your moderation and input!

------
sergiotapia
Thanks Cockroach team!

I'm very excited about the possibility of moving to you guys, especially since
it's a drop-in replacement for Postgres. The one thing holding us back today
is a PostGIS replacement. We rely heavily on it for location calculations and
route path saving.

~~~
andreimatei1
Working on it.

------
daxfohl
Azure also offers ephemeral OS disks that can speed up OS perf. This probably
won't make any difference in a database test (unless they store their data
ephemerally too, which...). But still, another thing to consider.

------
longtermd
For me as a developer and CTO, it's still a huge pain figuring out how to
properly configure and scale AWS, Azure, or GCP. In-depth reports on "How to
configure your AWS EC2 for max. performance" would be great, one report each
for a specific use case, e.g. realtime chat app, basic Apache web server,
Node.js web app, ... If you think about it, there are only a handful of common
use cases.

~~~
fivre
As someone who gets questions about this often, I wonder why nobody seems to
know how to answer these questions for themselves, or even how they'd begin to
research them. There seems to be a dearth of developers who understand what
bottlenecks their applications are likely to encounter, what options exist for
profiling and analyzing their performance, or that they may need to read
something and learn something new about their systems and the systems they
interact with.

Everyone wants an easy button, and that might exist for something that's very
well-established with a large community of users, at which point it probably
just exists as an AWS service. For many things, you have to do research.

~~~
je42
I did this kind of research in the last two weeks. I have a huge spreadsheet
with goals and various options for fulfilling these goals with the cloud
offerings of just AWS and GCP.

So the fun part here is that several options are viable until you find the
one issue where it doesn't work anymore. Then you have to backtrack, or find a
workaround and check whether the workaround is acceptable or not.

And some of these limitations are often not that straightforward to spot, or
to find documented.

However, there are some resources like
[https://github.com/ahmetb/cloud-run-faq](https://github.com/ahmetb/cloud-run-faq)
which are very good and helpful. Sometimes the official documentation doesn't
really cover the questions that are important for your product.

Also, what I found is that without multiple PoCs testing connectivity and the
basic building blocks, it is difficult to find all the gotchas that might turn
into blockers for a particular solution.

------
plasma
Is there a reliability/availability report that covers multiple months (VM
uptime, PaaS services, etc)?

I've personally seen reliability differences between services in Azure and
AWS, for example.

~~~
scarface74
If you’re concerned about reliability and your major concern is “VM uptime”,
you’ve got bigger issues. You need redundancy and automatic failover at every
level based on your cost benefit analysis.

------
alpb
Since you're one of the authors: TPC_RR is a typo.

~~~
boulos
Disclosure: I work on Google Cloud (and also commented on Cockroach’s original
draft post, that this is a follow up to).

Edit: I am a chump, I misread TPC_RR as TCP_RR and responded. Leaving this
here for shame. Thanks to jsolson for pointing out my mistake.

I’m not sure if the post was edited, but TCP_RR is definitely not a typo. You
may be more used to seeing TCP_CRR which opens a new connection and then a
round trip, but for raw network latency netperf’s TCP_RR benchmark is probably
the best tool available.
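To make the TCP_RR vs. TCP_CRR distinction concrete, here is a rough Python sketch. It is not netperf and is not how netperf is implemented; it only mimics the idea of each test over loopback. The function names `request_response_rate` and `_echo_server` are hypothetical. "RR" mode keeps one connection open and counts 1-byte request/response round trips; "CRR" mode pays a fresh TCP handshake (and close) for every transaction. Loopback numbers say nothing about cloud network latency; the sketch only shows what each mode counts.

```python
import socket
import threading
import time


def _echo_server(listener: socket.socket, connections: int) -> None:
    """Serve `connections` clients in turn, echoing each 1-byte request back."""
    for _ in range(connections):
        conn, _addr = listener.accept()
        with conn:
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            while True:
                data = conn.recv(1)
                if not data:  # client closed the connection
                    break
                conn.sendall(data)


def request_response_rate(transactions: int, new_conn_each_time: bool) -> float:
    """Return 1-byte request/response transactions per second over loopback.

    new_conn_each_time=False mimics the idea of TCP_RR (one long-lived
    connection); True mimics TCP_CRR (connect, request, response, close).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    addr = listener.getsockname()
    expected = transactions if new_conn_each_time else 1
    server = threading.Thread(target=_echo_server, args=(listener, expected),
                              daemon=True)
    server.start()

    def connect() -> socket.socket:
        s = socket.create_connection(addr)
        # Disable Nagle's algorithm so each tiny request is sent immediately.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return s

    start = time.perf_counter()
    if new_conn_each_time:
        for _ in range(transactions):
            with connect() as s:  # new handshake for every transaction
                s.sendall(b"x")
                s.recv(1)
    else:
        with connect() as s:  # one handshake, then many round trips
            for _ in range(transactions):
                s.sendall(b"x")
                s.recv(1)
    elapsed = time.perf_counter() - start

    server.join()  # server exits after serving the expected connections
    listener.close()
    return transactions / elapsed


if __name__ == "__main__":
    print(f"RR-style:  {request_response_rate(2000, False):,.0f} trans/s")
    print(f"CRR-style: {request_response_rate(2000, True):,.0f} trans/s")
```

To measure an actual network path, the two ends have to run on separate machines, which is what netperf's client/server setup is for.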

~~~
alpb
There is still a TPC_RR occurrence in the post.

~~~
orangechairs
(editor here) we fixed it. I'm so used to seeing TPC-C that my eyes missed
the "TCP".

------
timc3
What a load of nonsense. So cloud provider X doesn’t like the test because the
defaults of their services do not provide the best performance, and on top of
that the benchmark used doesn’t produce results that show them in the best
possible light.

It’s like listening to failed beauty pageant contenders.

~~~
cthalupa
The idea that you can provide a default option that is the best tuned for
every workload is a very naive one.

There are many changes I would make to increase performance for one workload
that would have a deleterious effect on another. An easy example is the dirty
ratio in Linux (vm.dirty_ratio). Depending on the speed of your local storage,
the size of your working set in memory, and how frequently the data in that
working set changes, keeping the same setting across workloads and systems
could be disastrous: it could result in extended periods where you are stuck
in a synchronous flush to disk, blocking all other IO. That same setting on
another server might be perfectly acceptable and avoid having to go to slower
block storage devices sooner than necessary, increasing overall performance.
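A back-of-the-envelope sketch of why one dirty-ratio value can't suit every machine. vm.dirty_ratio is a percentage of dirtyable memory (free plus reclaimable pages, not total RAM); once that much page cache is dirty, processes that write are made to wait on writeback. The function `dirty_flush_estimate`, the 64 GiB figure, and the storage throughputs below are hypothetical illustrations. The sketch ignores background writeback (vm.dirty_background_ratio) and the kernel's gradual throttling, so it gives a worst-case backlog, not a model of kernel behaviour.

```python
from __future__ import annotations


def dirty_flush_estimate(dirtyable_mem_gib: float, dirty_ratio_pct: float,
                         disk_write_mib_s: float) -> tuple[float, float]:
    """Rough worst case for a given vm.dirty_ratio (hypothetical helper).

    Returns (threshold_gib, drain_seconds): how much dirty page cache can
    accumulate before writers are forced to wait, and how long writing all
    of it out would take at a sustained disk throughput in MiB/s.
    """
    threshold_gib = dirtyable_mem_gib * dirty_ratio_pct / 100
    drain_seconds = threshold_gib * 1024 / disk_write_mib_s  # GiB -> MiB
    return threshold_gib, drain_seconds


if __name__ == "__main__":
    # Same setting (20%) and memory (64 GiB), three hypothetical storage speeds.
    for label, mib_s in [("local NVMe", 2000), ("local SSD", 500),
                         ("network block storage", 100)]:
        gib, secs = dirty_flush_estimate(64, 20, mib_s)
        print(f"{label:>22}: {gib:.1f} GiB dirty -> ~{secs:.0f} s to flush")
```

With these assumed numbers, the same 12.8 GiB backlog drains in a few seconds on fast local storage but takes over two minutes on slow block storage, which is the kind of stall described above.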

It's the same with how you configure your servers - you can throw more
spindles at sequential workloads and have great performance, but a random
workload really should be using flash storage, etc. etc. etc.

Most people strive to provide sane defaults that strike a good balance for the
majority of workloads. This is going to be beneficial to the largest number of
people. It's totally fair for someone to provide feedback on this sort of
thing, and give details on how things could be further optimized to fit a
specific workload. Defaults are not best practices - they are a starting
point.

