
A bunch of rants about cloud-induced damage - lwhsiao
https://rachelbythebay.com/w/2020/05/06/scale/
======
ciprian_craciun
Interesting article, although the author mixes two different problems:

(1) the fact that "cloud" (or in fact any virtualized system) is hard to
debug, especially when it comes to performance problems;

(2) the fact that most often "cloud" systems come packaged with "elasticity"
solutions;

Unfortunately I think both issues described are real; however, neither of
them is a deal-breaker. Uncertainty when it comes to performance can be taken
into account, as it arises at many layers, not only the OS / VM, including the
programming language level due to garbage collection.

Now regarding the auto-scaling based on CPU, this is unfortunately too true...
All cloud providers (I have experience with AWS) are quick to sell you CPU-
based auto-scaling. However CPU usage doesn't always translate to a true load
metric, especially for applications with non-uniform request patterns.

This is why I have taken another approach: identify the number of requests a
single VM can handle in parallel, and once that threshold is reached, start
creating new VMs; meanwhile the server queues the excess requests and serves
them in a FIFO manner.

(With AWS auto-scaling group rules it is a little bit more complex, but that
is the gist of it.)
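
A minimal sketch of that request-count rule, not the commenter's actual
setup. The function name, the `min_vms` / `max_vms` bounds, and the example
capacity of 50 parallel requests per VM are all assumptions made for
illustration; in-flight requests are taken to include the ones waiting in the
queue.

```python
import math


def desired_vm_count(in_flight_requests: int, per_vm_capacity: int,
                     min_vms: int = 1, max_vms: int = 20) -> int:
    """Return how many VMs keep every VM at or below its parallel capacity.

    in_flight_requests: requests being served plus requests still queued.
    per_vm_capacity: measured number of requests one VM handles in parallel
                     (hypothetical value, found by load testing).
    """
    if per_vm_capacity <= 0:
        raise ValueError("per_vm_capacity must be positive")
    # Round up: a partially loaded VM is still a whole VM.
    needed = math.ceil(in_flight_requests / per_vm_capacity)
    # Clamp to the fleet bounds (assumed limits, not from the thread).
    return max(min_vms, min(max_vms, needed))
```

With a capacity of 50, 120 in-flight requests would call for 3 VMs, regardless
of what CPU usage happens to look like at that moment.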

~~~
0xcoffee
There is also so much more to the 'cloud' than just CPU usage. If your focus
is that devs are using the CPU inefficiently, I see this problem as
independent of the cloud. People with on-prem servers are just as guilty of
building beefier servers when code optimizations are possible.

The way I see it, there is a kind of equation for this:

cost of hiring someone who can manage servers vs 'cloud markup'.

Let's say it would cost 60k to hire this person; that is a lot of
inefficiencies you can get away with. What you get 'for free' is your services
automatically running in multiple availability zones and really great uptimes
with minimum effort.
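
As a rough sketch of that equation: the 60k salary is the figure above, while
the 20k yearly self-hosting bill and the function name are assumptions made
purely for illustration.

```python
def break_even_markup(ops_salary: float, self_hosted_cost: float) -> float:
    """Return the cloud price multiple at which both options cost the same.

    Self-hosting costs the hosting bill plus the salary of the person who
    manages the servers; cloud costs the same workload times a markup.
    """
    if self_hosted_cost <= 0:
        raise ValueError("self_hosted_cost must be positive")
    return (ops_salary + self_hosted_cost) / self_hosted_cost


# Hypothetical numbers: 60k salary, 20k/year of self-hosted servers.
markup = break_even_markup(60_000, 20_000)
```

Under those assumed numbers the cloud bill could be up to 4x the self-hosted
one before hiring the extra person becomes the cheaper option.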

My company would rather piss away lots of cash than have an hour of downtime.
Devs and business people really have different perspectives on these things.

This is why we migrated from Hetzner to Azure, even though Azure is wayyy more
expensive. Got bitten once by Hetzner downtime and never went back.

The second half of their rant is directed at auto-scaling solutions (k8s?). I
have my own share of problems with k8s, but I guess the problem I have with
this kind of abstract rant is that it doesn't really offer any interesting
insight or solutions.

------
quezzle
FYI the author sells server hardware.

Seems somewhat disingenuous not to mention that up front.

~~~
rachelbythebay
I do? Holy crap when do I start getting those checks?

~~~
inemesitaffia
Your bucket of gold is on the way. Just wait for the next rainbow.

