
An engineer’s guide to cloud capacity planning - gk1
https://increment.com/cloud/an-engineers-guide-to-cloud-capacity-planning/
======
cridenour
> Perhaps surprisingly for engineers who work in mission-critical business
> applications, occasional spikes of 90%+ of our users being entirely unable
> to use the sole application of our company was an entirely acceptable
> engineering tradeoff versus sizing our capacity against our peak loads.

I think we found the Pokemon Go engineering team.

~~~
samstave
Acceptable engineering trade off and acceptable business decision trade off
are not always the same thing.

That paragraph reads to me as “screw the users!”, and for them to say that
about “mission critical” applications suggests they don’t understand the critical mission...

~~~
lostcolony
The critical mission is to deliver as much business functionality as possible
within the business constraints.

The business constraints often include cost. They also may be arbitrarily far
from 100% uptime.
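A minimal sketch of that cost tradeoff, assuming hypothetical hourly demand samples and a fixed per-server throughput: size the fleet for the peak, or for a percentile of demand and accept that some hours exceed capacity. `plan_capacity` and the numbers are illustrative, not from the article.

```python
import math


def plan_capacity(hourly_demand, per_server_capacity, percentile):
    """Size a fleet to cover `percentile` of hourly demand samples.

    Returns (servers, fraction_of_hours_over_capacity).
    """
    ordered = sorted(hourly_demand)
    # Nearest-rank percentile: smallest sample covering `percentile` % of hours.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    target = ordered[rank - 1]
    servers = math.ceil(target / per_server_capacity)
    capacity = servers * per_server_capacity
    hours_over = sum(1 for d in hourly_demand if d > capacity)
    return servers, hours_over / len(hourly_demand)


# Hypothetical: 95 quiet hours at 100 req/s, 5 spike hours at 1000 req/s,
# with each server handling 50 req/s.
demand = [100] * 95 + [1000] * 5
print(plan_capacity(demand, 50, 100))  # (20, 0.0): sized for the peak
print(plan_capacity(demand, 50, 95))   # (2, 0.05): 10x fewer servers, 5% of hours overloaded
```

Whether the second line is "acceptable" is exactly the business call being argued about here; the code only makes the price of each choice visible.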

~~~
samstave
Yeah, agreed - but that paragraph was worded in such a way as to convey "screw
the users for spiking use, such that the application is now unavailable, and
now ALL business halts..." -- Or did I misinterpret that?

~~~
adrianratnapala
The thinking is probably that if usage spikes and some people can't get access
to the game _and they know that this is because of spiking usage_, then this
just builds buzz and demand -- like hip restaurants with visible queues.

So in this case the business demand on the engineers was presumably to make
sure that 10% of users get good service during the spike while the others are
gracefully shown a "Sorry, we are too busy right now" message.
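The "some get in, the rest get a polite busy message" behaviour is a form of load shedding (admission control). A minimal sketch, assuming a hypothetical fixed cap on concurrent sessions; `AdmissionGate` and `handle_login` are illustrative names, not anything the article describes.

```python
import threading

BUSY_MESSAGE = "Sorry, we are too busy right now"


class AdmissionGate:
    """Admit up to `max_sessions` concurrent users and shed the rest."""

    def __init__(self, max_sessions):
        self._slots = threading.BoundedSemaphore(max_sessions)

    def try_enter(self):
        # Non-blocking: take a slot now, or fail fast instead of queueing.
        return self._slots.acquire(blocking=False)

    def leave(self):
        # Free a slot when a session ends.
        self._slots.release()


def handle_login(gate, user):
    if gate.try_enter():
        return f"welcome {user}"
    # Over capacity: answer cheaply rather than degrading everyone's session.
    return BUSY_MESSAGE


gate = AdmissionGate(max_sessions=1)
print(handle_login(gate, "alice"))  # welcome alice
print(handle_login(gate, "bob"))    # Sorry, we are too busy right now
```

Failing fast keeps the admitted users' experience intact; the cap would be set from whatever capacity was actually provisioned.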

Whether or not this is evil depends on the pricing model and other factors
that determine whether those bounced users had a reasonable expectation of
service.

~~~
samstave
These engineers and C-levels of this company should be punched in the face.

------
qaq
If you are a small startup and do not have strict regulatory or other special
requirements, run on something simple like DO, Vultr, etc. and save yourself pain and $.

~~~
cakeface
How do you work on Digital Ocean or some other cloud without a VPC? I haven't
been able to figure this out. Sure, if I just have 1 or maybe 2 servers it's
fine. With more than that, what are you doing? Mesh networking like ZeroTier?
iptables on every machine? It just seems like a lot more work than setting up
a VPC.
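For the "iptables on every machine" option, a minimal sketch that generates per-host rules opening a port only to a known list of peers. The peer IPs, the port, and `iptables_rules` are hypothetical; a real setup would also need rules for loopback and established connections, and some way to keep the peer list in sync as servers come and go, which is the extra work being described.

```python
def iptables_rules(peer_ips, ports):
    """Build iptables commands that open `ports` (TCP) only to `peer_ips`.

    Meant to be run on every host; the peer list comes from your own inventory.
    """
    rules = []
    for port in ports:
        for ip in peer_ips:
            # iptables matches the first rule that fits, so ACCEPTs go first.
            rules.append(f"iptables -A INPUT -p tcp -s {ip} --dport {port} -j ACCEPT")
        # Any other source hitting this port falls through to the DROP.
        rules.append(f"iptables -A INPUT -p tcp --dport {port} -j DROP")
    return rules


for rule in iptables_rules(["10.0.0.2", "10.0.0.3"], [6379]):
    print(rule)
```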

~~~
tyingq
Not to say it's as robust a strategy as a VPC, but DO does have something
called a cloud firewall. It supports the idea of tags that apply to groups of
servers.

[https://www.digitalocean.com/community/tutorials/an-introduc...](https://www.digitalocean.com/community/tutorials/an-introduction-to-digitalocean-cloud-firewalls)

~~~
beberlei
The problem on DO is not outside traffic but inside. When you enable private
networking, everyone in the same DC network can access open ports on your
machines. That Elasticsearch or Redis you use needs SSL and password protection,
otherwise you have a big problem.
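One way to check this from a second machine in the same datacenter is to try plain TCP connections to the ports such services listen on by default (6379 for Redis, 9200 for Elasticsearch). A minimal sketch; the target host is whatever you point it at, and a port showing up here is reachable from that vantage point unless a firewall is added.

```python
import socket


def open_ports(host, ports, timeout=0.5):
    """Return the subset of `ports` on `host` that accept a TCP connection."""
    reachable = []
    for port in ports:
        try:
            # Succeeds only if something is listening and nothing filters it.
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            pass  # refused, timed out (filtered), or unreachable
    return reachable


# e.g. open_ports("10.0.0.5", [22, 6379, 9200])  # hypothetical private IP
```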

~~~
closeparen
You can configure the cloud firewall so that only VMs in your account with a
certain tag can communicate, even over the internal network.

Public network + firewall feels dirtier than private network, but in reality
they are the same thing: a network that’s only accessible to your VMs, to the
extent that you trust the provider’s software to enforce that boundary.
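A toy model of the tag-based setup described above, useful for reasoning about which machines can reach which ports. `Droplet`, `InboundRule` and `allowed` are hypothetical names, not DigitalOcean's API, and the default-deny behaviour is an assumption of this model.

```python
from dataclasses import dataclass, field


@dataclass
class Droplet:
    name: str
    tags: set = field(default_factory=set)


@dataclass
class InboundRule:
    port: int
    source_tags: set  # only machines carrying one of these tags may connect


def allowed(rules, source, dest_port):
    """Default-deny: pass only if a rule matches the port and a source tag."""
    return any(
        rule.port == dest_port and not rule.source_tags.isdisjoint(source.tags)
        for rule in rules
    )


rules = [InboundRule(port=6379, source_tags={"app"})]
print(allowed(rules, Droplet("web-1", {"app"}), 6379))        # True
print(allowed(rules, Droplet("someone-else", set()), 6379))   # False
```

The point of tags over IP lists is that a new `app` machine is covered as soon as it carries the tag, with no per-host rule edits.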

------
iamtew
This seems more about capacity planning for your application when it runs in
the cloud.

I was hoping for some insight into hosting your own private cloud, which
seems quite a bit trickier. It's easy to prepare to scale up/down your
deployment when all the hardware is already there, but when you're the one
running the actual cloud and need to do the same, you can't just do an API
call and suddenly have another 100 U of servers and switches racked, cabled
and provisioned.

Still a nice article though!
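Because racked hardware has a lead time, private-cloud capacity planning often becomes "by when must we order?". A minimal sketch, assuming linear load growth and a utilization ceiling; the function name, units and numbers are hypothetical.

```python
import math


def weeks_until_order(current_load, weekly_growth, capacity,
                      lead_time_weeks, ceiling=0.8):
    """Weeks left before new hardware must be ordered (negative means late).

    Assumes load grows linearly by `weekly_growth` per week and that you want
    to stay below `ceiling` (a fraction) of installed `capacity`.
    """
    if weekly_growth <= 0:
        return math.inf  # no growth: the ceiling is never reached
    limit = capacity * ceiling
    weeks_until_limit = (limit - current_load) / weekly_growth
    # Hardware needs `lead_time_weeks` to arrive, get racked and provisioned.
    return weeks_until_limit - lead_time_weeks


# Hypothetical: 500 of 1000 units used, +20 units/week, 8-week lead time.
print(weeks_until_order(500, 20, 1000, lead_time_weeks=8))  # 7.0
```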

~~~
qaq
Well, if you want to serve huge spikes, you can run a hybrid environment, with
on-prem handling the base load and the cloud absorbing spikes; otherwise you
have to bite the bullet and overprovision on-prem/private cloud based on your
specific business requirements.
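The hybrid split reduces to simple arithmetic: fill fixed on-prem capacity first and burst the remainder to the cloud. A minimal sketch with hypothetical hourly demand and capacity figures.

```python
def split_load(demand, on_prem_capacity):
    """Serve demand on-prem first; send whatever is left over to the cloud."""
    on_prem = min(demand, on_prem_capacity)
    cloud = demand - on_prem
    return on_prem, cloud


hourly_demand = [300, 400, 1200, 350]  # hypothetical units of load per hour
print([split_load(d, 500) for d in hourly_demand])
# [(300, 0), (400, 0), (500, 700), (350, 0)]
```

Sizing on-prem near the base load keeps owned hardware busy, while the cloud is paid for only in the spike hour.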

------
ryanmarsh
I don’t mean to sound flippant here but going Serverless would obviate the
need for much of this guide.

~~~
johnpython
Going serverless would increase your costs significantly. Have you seen how
much AWS charges for Lambda use? Where did this myth of serverless saving you
money come from?
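Which side of this argument holds depends mostly on sustained traffic: pay-per-use billing beats an always-on server at low or bursty volume and loses at high steady volume. A minimal sketch of that break-even; the rates below are placeholders, not current AWS pricing, and the model ignores operational effort, data transfer and other line items.

```python
def monthly_serverless_cost(requests, avg_ms, memory_gb,
                            price_per_gb_second, price_per_million_requests):
    """Pay-per-use bill: compute time (GB-seconds) plus a per-request fee."""
    gb_seconds = requests * (avg_ms / 1000) * memory_gb
    return (gb_seconds * price_per_gb_second
            + requests / 1_000_000 * price_per_million_requests)


def cheaper_option(requests, server_monthly_cost, **rates):
    """Compare against a flat monthly cost for always-on servers."""
    serverless = monthly_serverless_cost(requests, **rates)
    return "serverless" if serverless < server_monthly_cost else "server"


# Placeholder rates for illustration only; check your provider's price list.
RATES = dict(avg_ms=100, memory_gb=1.0,
             price_per_gb_second=0.00002, price_per_million_requests=0.20)

print(cheaper_option(1_000_000, 40.0, **RATES))    # serverless (about 2.2/month)
print(cheaper_option(100_000_000, 40.0, **RATES))  # server (about 220/month)
```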

~~~
ryanmarsh
I keep hearing this from people who don’t mention having built out Serverless
stacks. But I keep hearing from people who switched that they’ve been able to
cut costs. So in my limited experience, the claim that it costs more has not
held up in practice.

