
Docker operations slowing down on AWS - pradeepchhetri
https://jeremyeder.com/2017/07/25/docker-operations-slowing-down-on-aws-this-time-its-not-dns/
======
chx
And then people consider me a dinosaur when I say, no cloud, just rent a
server or two (not colo! just dedicated servers). Your average web service
does not need to scale near infinitely; for the same amount of money you pay
to Amazon you can overprovision 3-5-10 times and that'll handle your spikes.
No surprises. Same amount of work: EC2 and bare metal both give you a root
prompt, go from there. These days you can get things provisioned within 24
hours -- some providers will do it in minutes. Of course, Amazon provides a
lot of services beyond basic EC2 instances, but if you use them you have very
ugly vendor lock-in, and heaven forbid you want to do something else in the
future...

This is like a double trap many try to sell to startups: a) you need to scale
across many machines and b) the way to scale is the cloud. My take: a single
machine (or two for HA) will be enough, if you really want to go big separate
the web server from the database but that's it. And yes, I am in the website
performance business, I worked on the video purchase platform of one of the
largest British television stations and even that didn't require more than a
single database server and a single Redis server for the caching layer. Harken to
[https://stackoverflow.com/questions/5131266/increase-postgre...](https://stackoverflow.com/questions/5131266/increase-postgresql-write-speed-at-the-cost-of-likely-data-loss)
this question from 2011, which discusses speeding up from around a thousand
inserts per second at the cost of data loss -- today you would write to an SSD
and wouldn't need to risk data loss.
Does your web site / app really get a thousand writes _every second_? I
thought not. Does it even get a thousand _reads_? If not, then why are you
building a complex database cluster...?
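
If you want to put a number on it, a quick pgbench run on a single unremarkable
Postgres box is enough of a sanity check; a rough sketch (the database name and
scale factor here are placeholders):

```shell
#!/bin/sh
# Rough single-node Postgres write-throughput check with pgbench.
# "bench" is a placeholder database name; scale 50 is ~5M account rows.

bench_writes() {
  db="${1:-bench}"
  pgbench -i -s 50 "$db"    # initialize the pgbench tables
  # 8 clients for 30s of TPC-B-like transactions (each one writes);
  # compare the reported tps against that "thousand writes a second".
  pgbench -c 8 -j 4 -T 30 "$db"
}

# Only run when a database name is given explicitly:
if [ "${1-}" ]; then
  bench_writes "$1"
fi
```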

The other day I saw quad E7-4870 (yeah won't win any single thread contest but
has 40 cores and 80 threads) 512GB RAM servers for $299 a month, with 1TB RAM
for $499. It had a low-end 2TB SSD for boot, and you could add 8x1TB HDDs
with HW RAID for $40...

~~~
anonacct37
Yes. Look at [https://www.packet.net/bare-metal/](https://www.packet.net/bare-metal/)
for bare metal boxes provisioned in less time than an ec2 instance.

Or if you want to go old school, cheaper, and less sexy/api driven:
[https://www.delimiter.com/](https://www.delimiter.com/)

If you shop around, for $30-40 a month you can get 16 cores, 32GB RAM, and a
120GB SSD or a 1-2TB spinny disk.

~~~
jopsen
also scaleway.com (they have an API, but no user-data).

IMO, docker and container orchestration spells a bright future for bare-metal
boxes like these, as you won't need cloudformation, etc..

But I still see few alternatives to S3: many vendors offer block devices, but
only the big clouds offer blob storage. Backup and restore from blob storage
makes recovery from a crash pretty easy.
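
The backup/restore flow is simple enough that `aws s3 sync` covers it even from
a bare-metal box; a minimal sketch (the bucket name and paths are made up):

```shell
#!/bin/sh
# Blob-storage backup/restore sketch with the aws CLI. The bucket name
# and data directory are placeholders.

backup()  { aws s3 sync /var/lib/myapp "s3://my-backup-bucket/myapp/"; }
restore() { aws s3 sync "s3://my-backup-bucket/myapp/" /var/lib/myapp; }

case "${1-}" in
  backup)  backup ;;
  restore) restore ;;
esac
```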

~~~
chrisan
How does docker replace cloudformation? Don't you still need something that
says "hey you are running out of capacity soon, time to add more hardware for
your software to run on"

There has to be something that adds more bare-metal for your docker containers
to run on when the existing bare-metal reaches capacity, right?

~~~
Diederich
A fully containerized setup allows one to use something like Kubernetes to do
such orchestration.

K8s is getting pretty easy to set up these days.

~~~
chrisan
I was asking about once your bare metal capacity reaches its limit. At some
point you need to provision more bare metal and expand the total resources
kubernetes can consume with docker.

Cloudformation, to me at least, is the power to expand resources for large
traffic events. Most of the time you can get by with a small number of
instances, but it's nice when it scales up to hundreds of instances in minutes.

The bare metal equivalent would be to buy something that can handle the peak
load from the start, right?

~~~
jzelinskie
>Cloudformation, to me at least, is the power to expand resources for large
traffic events. Most of the time you can get by with a small number of
instances, but it's nice when it scales up to hundreds of instances in minutes.

This is not what CloudFormation does. CloudFormation allows a declarative way
to express a group of AWS resources to be created and coupled together.
There's nothing that's exactly the same as CloudFormation, but stock
Kubernetes is quite close, since you effectively describe the resources you
want in individual declarative YAML/JSON files. However, there's nothing
standard in Kubernetes for coupling them together into one thing equivalent to
a Stack in CloudFormation.
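
Concretely, the closest stock-Kubernetes idiom is keeping all of an app's
manifests in one directory and applying or deleting them as a unit; a sketch
(the directory layout and label convention are hypothetical):

```shell
#!/bin/sh
# Poor man's "stack": one directory of YAML manifests per app, applied
# as a unit; a shared app= label stands in for stack membership.

deploy_stack()   { kubectl apply -f "manifests/$1/"; }
teardown_stack() { kubectl delete deployment,service,configmap -l app="$1"; }

if [ "${1-}" ]; then
  deploy_stack "$1"
fi
```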

~~~
chrisan
Maybe we are talking about different things, but I use CloudFormation to
autoscale ec2 instances based on avg CPU load over a period of time

[http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuid...](http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-group.html)
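
For what it's worth, that kind of CPU-based autoscaling doesn't strictly
require CloudFormation; the same policy can be attached to an existing Auto
Scaling group from the CLI. A sketch (the group name and the 60% target are
placeholders):

```shell
#!/bin/sh
# Attach a target-tracking policy that scales an Auto Scaling group on
# average CPU. The group name and 60% target are placeholders.

set_cpu_target() {
  asg="$1"; target="${2:-60}"
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "$asg" \
    --policy-name cpu-target \
    --policy-type TargetTrackingScaling \
    --target-tracking-configuration '{
      "PredefinedMetricSpecification":
        {"PredefinedMetricType": "ASGAverageCPUUtilization"},
      "TargetValue": '"$target"'}'
}

if [ "${1-}" ]; then
  set_cpu_target "$@"
fi
```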

------
sudhirj
While the article is factually correct, the tone strikes me as being
disingenuous. The problem seems to be that the servers were running on gp2
disks, which offer a performance baseline with free short term bursts based on
credits collected. The author has just realised that for consistent
throughput, they would have to choose provisioned throughput and pay
accordingly.

This isn’t some conspiracy by AWS, though. It’s all in the documentation and
isn’t even hard to find. If you want X ops per second baseline with occasional
bursts you pick option A, or if you want consistent Y ops per second you
provision and pay for Y. Read the manual - not having read the docs or
explored the console is not an excuse to claim that a service provider is
being shady or rent-seeking.

~~~
znep
Exactly. I'd go further and say the burst capacity is actually a super useful
and powerful feature that is very hard to get on your own hardware. Definitely
can catch you unaware if you aren't on top of it and can get expensive for
some needs, but as you say not hidden at all.

I have a hacky shell script I sometimes use for moderately sized environments
that don't have better monitoring set up. It reports the minimum percent of
burst balance remaining over the past day, and requires the aws cli and GNU
parallel:

    
    
      #!/bin/bash
      # Report the minimum BurstBalance (%) per EBS volume over the past
      # day, worst first. Requires the aws CLI and GNU parallel.
      while getopts "p:" opt; do
        case $opt in
          p)
            export AWS_PROFILE=$OPTARG
            ;;
          \?)
            echo "Invalid option: -$OPTARG" >&2
            ;;
        esac
      done
    
      if [ ! "$AWS_PROFILE" ]
      then
        echo "-p <aws profile> or AWS_PROFILE env var required"
        exit 1
      fi
    
      # BSD date syntax; on GNU/Linux use: date -d '1 day ago' +%Y-%m-%dT%H:%M:%S
      start=$(date -v-1d +%Y-%m-%dT%H:%M:%S)
      end=$(date +%Y-%m-%dT%H:%M:%S)
    
      # One CloudWatch query per volume, 20 at a time; sort worst-first.
      aws ec2 describe-volumes --output text \
          --query 'join(`"\n"`, Volumes[*].VolumeId)' |
        parallel -j 20 "echo -ne {}\\\t && echo \$(aws cloudwatch get-metric-statistics \
          --start-time $start --end-time $end --period 86400 \
          --namespace AWS/EBS --statistics Minimum --metric-name BurstBalance \
          --dimensions Name=VolumeId,Value={} \
          --output text --query 'Datapoints[*].Minimum')" |
        sort -n -k 2

~~~
kangman
thanks for this script, pretty helpful!

------
BurritoAlPastor
The author's takeaways include moving disks to io1. This is a bad bargain in
most cases, and particularly bad in the ~500 IOPS range (which is what I'm
seeing in the Grafana screenshot there).

gp2 disks get 3 IOPS per GB "free", bursting up to 3k. (They don't burst
after 1TB, because your baseline performance is higher than the burst rate.)
io1 is 25% more expensive per GB, and you pay by the IOPS on top of that.

A 175GB gp2 disk will give you 525 IOPS baseline, at $17.50 a month. I'm
guessing his volume is about 40GB, doing the math backwards from his
bottlenecked IOPS; a 40GB io1 with 500 IOPS will cost you $83.75 a month! And
on top of that, AWS will cap you _hard_ at that 500 IOPS; the gp2 can still
burst if needed.

I know of two general cases where io1 disks are worthwhile: if you need more
than 10k IOPS, or if you have very high IO requirements but very stable disk
space usage (e.g. high-performance RRDs). Crack open Excel and do the math;
it's worth the five minutes to check.
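
You don't even need Excel; the arithmetic fits in a few shell functions. A
sketch using assumed, roughly us-east-1 list prices (gp2 $0.10/GB-month, io1
$0.125/GB-month plus $0.065 per provisioned IOPS -- plug in your region's
current rates):

```shell
#!/bin/sh
# Back-of-the-envelope gp2 vs io1 monthly cost. Prices are assumptions
# (roughly us-east-1 list prices); substitute your region's rates.

gp2_cost() { awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 0.10 }'; }
io1_cost() { awk -v gb="$1" -v iops="$2" \
  'BEGIN { printf "%.2f\n", gb * 0.125 + iops * 0.065 }'; }

# gp2 baseline IOPS: 3 per GB, floor of 100, ceiling of 10000.
gp2_iops() { awk -v gb="$1" 'BEGIN {
  i = gb * 3; if (i < 100) i = 100; if (i > 10000) i = 10000; print i }'; }
```

For example, `gp2_cost 175` and `gp2_iops 175` give the $17.50 / 525 IOPS
figures above; the io1 side of the comparison depends on your region's
per-IOPS rate.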

(Also, your burst balance is available as a Cloudwatch metric, as I recall!
Set alarms on that shit!)
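
It is (the metric is `BurstBalance` in the `AWS/EBS` namespace), and the alarm
itself is a one-liner with the CLI; a sketch with the volume ID, threshold, and
SNS topic as placeholders:

```shell
#!/bin/sh
# Alarm when a gp2 volume's BurstBalance stays below 20% for 5 minutes.
# The volume ID and SNS topic ARN passed in are placeholders.

alarm_on_burst_balance() {
  vol="$1"; topic="$2"
  aws cloudwatch put-metric-alarm \
    --alarm-name "burst-balance-$vol" \
    --namespace AWS/EBS \
    --metric-name BurstBalance \
    --dimensions Name=VolumeId,Value="$vol" \
    --statistic Minimum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 20 \
    --comparison-operator LessThanThreshold \
    --alarm-actions "$topic"
}

if [ "${1-}" ]; then
  alarm_on_burst_balance "$@"
fi
```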

~~~
nvivo
Agreed. After a lot of struggles trying to find a good average IOPS on AWS for
our database, we just increased disk to 1tb with gp2 and got rid of the
problem. It gets 3000 IOPS all the time and you never have these problems.
It's still much cheaper than io1.

It's a trade-off you must account for when choosing cloud providers; nothing
is free. For now it's still cheaper to pay for servers than to spend time
setting things up ourselves, but we're always measuring costs vs. benefits and
considering whether we need to leave AWS. The time is coming, but for now it's
still working for us.

~~~
jdc0589
> we just increased disk to 1tb with gp2 and got rid of the problem. It gets
> 3000 IOPS all the time and you never have these problems. It's still
> much cheaper than io1.

shockingly, tons of people don't know about this. Sure, there are use cases
where it probably isn't cost effective to do this, but my default policy
towards using gp2 is "will this server write to disk even a moderate amount?
If so, it gets a 1TB volume." Especially for something like a node in a k8s
cluster, where your workload is non-deterministic, this should be the default.

------
user5994461
Performance management 101 on AWS volumes:

1) The IO of a disk is proportional to the size of the volume. You need to
get bigger volumes to get more performance: 3 IOPS/GB.

2) The high performance volumes (io1/PIOPS) are extortion. It's cheaper to pay
for a bigger regular volume (gp2) that comes with a higher IO quota than to
pay for the special high performance volume.

3) Each instance type has a disk performance cap. These caps are lower than
you think.

4) Don't use t2 instances for anything that requires non-negligible sustained
IO.
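
A quick way to audit where you stand against points 1-3, assuming the aws CLI
is configured:

```shell
#!/bin/sh
# List each volume's type, size, and provisioned IOPS so undersized gp2
# volumes (baseline = 3 IOPS/GB) stand out.

list_volume_io() {
  aws ec2 describe-volumes --output table \
    --query 'Volumes[*].[VolumeId,VolumeType,Size,Iops]'
}

if [ "${1-}" ]; then
  list_volume_io
fi
```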

P.S. Clearly the author is just discovering AWS.

------
shusson
tldr: AWS EC2 has the concept of I/O credits for storage. If your instance
runs out of credits, bad things, which may seem completely unrelated, will
happen.

I was having similar issues last week and did not consider I/O credits. I
think AWS could do better at notifying you if your EC2 instance gets into this
state (without having to set up a cloud watch alarm).

~~~
inferiorhuman
Rather, EBS performance is non-deterministic. This is definitely not new, but
the article is an informative deep dive.

The other big gotcha is that EBS volumes are lazy loaded. Not necessarily
something that's going to bite you in a production environment regularly, but
it's something that could very easily throw off your benchmarks and
performance testing.
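
Before benchmarking a volume restored from a snapshot, it's worth reading the
whole device once to force that lazy load; a sketch (the device name is a
placeholder, and fio works as well as dd here):

```shell
#!/bin/sh
# Read every block of a snapshot-restored EBS volume so lazy loading
# doesn't skew benchmarks. /dev/xvdf is a placeholder device name.

initialize_volume() {
  dev="${1:-/dev/xvdf}"
  # A full sequential read pulls all blocks down before you measure.
  sudo dd if="$dev" of=/dev/null bs=1M
}

if [ "${1-}" ]; then
  initialize_volume "$1"
fi
```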

~~~
m00x
Should you ever assume IO to be deterministic? Hard drives can crash, and they
can be slow (especially magnetic). Networks can disconnect, become congested,
or suffer random interference.

This is amplified on the cloud, but it should also be well known by anyone who
works with the cloud. These things are well documented, although I can see why
you don't really look for them until you have issues.

~~~
falcolas
I think the problem is that, from your average application developer's point
of view, there's nothing they can really do that they aren't already doing
(unless they're blocking the main thread on disk IO, in which case they can do
a _little_ bit more).

------
x7467
We hit the IOPS limit on AWS many times, both on VMs and SQL instances. The
solution was always to artificially inflate the size of the underlying
storage, as you get 3 IOPS per GB on persistent disks (the other option was to
buy IOPS, but this somehow always turned out to be way more expensive).

This issue was in the "pros" column when we decided to move our operations to
GCP, where you get 30 IOPS per GB on persistent storage, i.e. 10x more than on
AWS. One way or another, if you really need _a lot_ of IOPS, you'd better
stick with local (ephemeral) SSD storage – just bear in mind it will vanish
along with your VMs.

------
rixed
In other words, if you do not rate limit yourself then others will rate limit
you.

~~~
morecoffee
It's called out in the post: most applications are not written to deal with
reads/writes suddenly dropping in throughput. How are you supposed to rate
limit the IOPS of a program you don't control? Can you make `docker` do
exponential backoff if it notices some operations are taking too long?

In a network scenario, the remote service could say something like "quota
exceeded, please try again later". The read and write syscalls don't really
work like that. They instead remain in uninterruptible sleep, which means they
can't be woken up, killed, or stopped.
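
You can at least see the symptom: tasks blocked this way show up in state `D`
(uninterruptible sleep) in ps. A quick Linux check:

```shell
#!/bin/sh
# Show processes currently in uninterruptible sleep (state D) -- where
# tasks blocked on stalled disk I/O sit on Linux.

show_blocked() {
  ps -eo state,pid,comm | awk '$1 ~ /^D/ { print }'
}

if [ "${1-}" ]; then
  show_blocked
fi
```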

------
notacoward
The debugging story was interesting, but what really sticks out for me is that
Amazon has pretty tight QoS working for distributed storage. That's actually a
really hard problem - much harder than its better known networking equivalent.
As much as I might curse the Amazon business folks for using it to screw
customers, I also have to give kudos to the engineers for implementing it.

------
eikenberry
Why not use ephemeral volumes for the docker data? This is a CI system, so
the docker images are all transient anyway. Seems like an easy way to avoid
the I/O credits issue.

~~~
Dunedan
The most recent "general purpose" instance families (C4, G3, M4, P2, R4)
don't offer ephemeral storage anymore. My guess is that AWS will only offer
ephemeral storage for a few selected instance families in the future.

So you have to decide: older instance types (which will become more expensive,
as they usually don't benefit from new price reductions) or no ephemeral
storage.

------
chucky_z
A 1TB gp2 volume is cheaper than a 1TB, 3000 IOPS io1 volume, and provides
nearly identical characteristics.

Only use io1 if you have a _latency_-sensitive application or need more than
10,000 IOPS. Even then you can RAID10 some gp2 volumes together and, with
enhanced networking, get (I believe) 30k IOPS out of one instance.
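
The striping itself is plain mdadm over the attached volumes; a sketch with
placeholder device names (and note that EBS volumes are already replicated, so
some people use RAID0 rather than RAID10 here):

```shell
#!/bin/sh
# Combine four attached gp2 volumes into one md RAID10 device for more
# aggregate IOPS. Device names are placeholders; mkfs/mount omitted.

make_array() {
  sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
}

if [ "${1-}" ]; then
  make_array
fi
```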

~~~
KaiserPro
Even then, if you really are worried about latency, AWS isn't the place to
be...

------
jgrant27
I think after a decade of "Cloud" hype, many customers are finally realizing
that the cost/benefit calculus of using a provider is more complicated, and
more expensive, than they expected for most of their needs.

------
fapjacks
For me, this style of writing really detracts from whatever the article might
actually say.

~~~
meesterdude
I actually found it enjoyable and refreshing. Technical, but fun.

------
nvivo
EBS-optimized VMs look nice on paper: you can choose the size and pay only
for what you need. But once you start to use the disk in production, you see
the problems.

In short, if you want to use the disk a lot, you need to pay a lot. If your
app is slow on AWS and uses EBS as storage, increase the disk size, not the VM
instance type. This is true mostly for database performance, which, once the
RAM is filled, relies heavily on IO to get new pages from disk.

~~~
philipodonnell
This is the kind of advice I wish was easier to get. The AWS docs are very
lacking, in my opinion, for "if you're exhibiting this specific performance
problem, the cheapest option is to increase [x]", where [x] is instance size,
disk size, provisioned performance, a read replica, etc...

------
qaq
If you don't need to deal with HIPAA, PCI DSS, etc., going with DO, Vultr,
and the like would save many startups considerable $ compared to AWS.

------
nailer
If you're interested in IO performance, maybe don't run Docker - whose main
advantage over VMs is fast IO - on top of VMs unnecessarily?

Triton and OpenShift add proper isolation to Docker and hence provide fast IO,
since you're not adding a layer of Xen.

~~~
wmf
I love native containers, but none of the major cloud providers support them
so that isn't really actionable advice for most people.

------
whatnotests
+100 points to the excellent debugging skill demonstrated by the author.

It's great to see someone in top form.

------
vacri
How do you flip a volume on the fly? I thought you had to do the "snapshot >
make new volume > reattach" route

edit: thanks for the info, znep (hit my comment limit, hence the edit... )

~~~
znep
Not any more; as of some months ago you can grow volumes and change volume
type on the fly, especially if it isn't a boot volume.

[http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expan...](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html)

Read the limitations very carefully:
[http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/considera...](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/considerations.html)
In particular, if the volume was attached before Nov. 1, 2016, you need to do
a one-time stop and start of the instance or detach/reattach the volume, and
there are limits on the instance types supported (but the definition of
"current generation" is broader than you might assume at first).

This is quite an awesome enhancement, we were able to transparently convert a
bunch of 15TB volumes from gp2 to st1 without downtime or impact to the app,
and save a bunch of money.
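
For reference, the on-the-fly change is a single API call; a sketch with the
aws CLI (the volume ID argument is a placeholder):

```shell
#!/bin/sh
# Change an attached volume's type in place, then watch the progress of
# the modification. The volume ID argument is a placeholder.

convert_volume() {
  vol="$1"
  aws ec2 modify-volume --volume-id "$vol" --volume-type st1
  aws ec2 describe-volumes-modifications --volume-ids "$vol" \
    --query 'VolumesModifications[0].[ModificationState,Progress]' \
    --output text
}

if [ "${1-}" ]; then
  convert_volume "$1"
fi
```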

