
EC2 Instance Update – C5 Instances with Local NVMe Storage - jeffbarr
https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-instances-with-local-nvme-storage-c5d/
======
bostik
Depending on which kernel version you are running, C5 (and M5) instances can be
a real source of pain.

The disk is exposed as a /dev/nvme* block device, and as such I/O goes through
a separate driver. Earlier versions of the driver had a hard cap of 255
seconds before an I/O operation times out. [0,1,2]

When the timeout triggers, it is treated as a _hard failure_ and the
filesystem gets remounted read-only. Meaning: if you have anything that writes
intensively to an attached volume, C5/M5 instances are dangerous. We
experimented with them for our early Prometheus nodes. Not a good idea. Having
the alerts for an entire fleet start flapping due to a seemingly nonsensical
"out of disk, write error" failure on the monitoring node is not fun.[ß]

If you run stateless, in-memory only applications on them (preferably even
without local logging), then you should be fine.
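
If you want to check whether a given instance is exposed to this, here's a
minimal sketch (the sysfs path and the details of the 255-second cap are my
reading of the bug reports below, not anything from the article):

    from pathlib import Path

    # Kernel module parameter that controls the NVMe I/O timeout (seconds).
    TIMEOUT_PARAM = Path("/sys/module/nvme_core/parameters/io_timeout")

    def check_nvme_timeout() -> None:
        if not TIMEOUT_PARAM.exists():
            print("nvme_core not loaded; probably no NVMe devices here")
            return
        timeout = int(TIMEOUT_PARAM.read_text().strip())
        print(f"nvme_core.io_timeout = {timeout}s")
        if timeout <= 255:
            # Older kernels stored this value in a single byte, hence the
            # 255-second ceiling. Raising it (e.g. via nvme_core.io_timeout
            # on the kernel command line) keeps a slow request from being
            # escalated into a hard failure and a read-only remount.
            print("timeout is at the old 255s cap; consider raising it")

    if __name__ == "__main__":
        check_nvme_timeout()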

0:
[https://bugs.launchpad.net/ubuntu/bionic/+source/linux/+bug/...](https://bugs.launchpad.net/ubuntu/bionic/+source/linux/+bug/1758466)

1:
[https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729119](https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729119)

2:
[https://www.reddit.com/r/aws/comments/7s5gui/c5_instances_nv...](https://www.reddit.com/r/aws/comments/7s5gui/c5_instances_nvme_storage_stucks/)

ß: We handle nodes dying. The byzantine failure mode of nodes suddenly spewing
wrong data is harder to deal with.

~~~
ohre1Eda
I am curious: what kind of write characteristics can manage to hit a 255s
timeout on a storage device that does 10k+ IOPS and gigabytes per second of
throughput? Normally, writes slowing down leads to backpressure, because the
syscalls issuing them take longer to return.

I can imagine some crazy random-access, small-record, vectored IO from large
thread pools. But that's not exactly common, because most software that is
IO-heavy tries really hard to avoid these things.

~~~
lathiat
According to the bug here:
[https://bugs.launchpad.net/ubuntu/bionic/+source/linux/+bug/...](https://bugs.launchpad.net/ubuntu/bionic/+source/linux/+bug/1758466)

It was actually filed about EBS block devices using NVMe (on these new
instances, there is a hardware card that presents network EBS volumes as a
PCIe NVMe device). In certain failure cases, since this is a network block
store, they can fail for a period of time exceeding this timeout.

The idea of this change is to ensure that once they come back, the machine is
left in a usable state.

Of course, I would not expect to see this on the local NVMe disks, which is
what they announced here - that you can now get such instances with local disk
as well as EBS.
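
If you want to check which of the two a given device is, something like this
should work (the sysfs layout and the exact model strings are assumptions on
my part, so verify before relying on it):

    from pathlib import Path

    def list_nvme_controllers() -> None:
        # Each NVMe controller shows up under /sys/class/nvme with a "model"
        # attribute; EBS-backed and local instance-store devices report
        # different model strings.
        for ctrl in sorted(Path("/sys/class/nvme").glob("nvme*")):
            model = (ctrl / "model").read_text().strip()
            if "Elastic Block Store" in model:
                kind = "EBS over NVMe (network-backed, the case in the bug)"
            else:
                kind = "local instance-store NVMe"
            print(f"{ctrl.name}: {model!r} -> {kind}")

    if __name__ == "__main__":
        list_nvme_controllers()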

~~~
ohre1Eda
Ah yeah, the article was about local NVMe, so the concern is probably not
relevant here.

~~~
lathiat
Unrelated: love your pwgen username

------
jchw
This is probably a big deal, if it's anything like the local SSD storage on
GCE. The performance of local SSDs on Google Cloud is borderline absurd
compared to anything else you can find right now.

That being said, I think one of the less compelling parts of this is that
it'll probably vary quite a bit per instance type, being limited to C5 to
start, so if you have a workload needing way better disk I/O than CPU
performance, you might have to waste capacity. That's one thing GCP really
does have on AWS: better granularity.

~~~
runesoerensen
> being limited to C5 to start

I3 and F1 instances also have NVMe SSD instance store volumes
[https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-inst...](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#nvme-ssd-volumes)

Here's the full list of instances with instance storage:
[https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Instance...](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#instance-store-volumes)

~~~
jchw
Nice, didn't know that.

------
Elect2
If I remember correctly, all old EC2 instance types supported local storage.
Then, as EBS grew, they disabled local storage. Now it's back again..

~~~
dwyerm
It was less that local storage was disabled, and more that the newer hardware
types didn't have any local storage to share.

------
tbrock
Amazing! I hope they adopt this for RDS.

EBS volumes are great and all, but not for databases where the dataset is many
multiples of the working set.

~~~
pjungwir
I've been intrigued by the idea of running databases on EC2 instance storage
for a long time. (You couldn't use RDS though, at least not today.) Putting
your db on something also called "ephemeral storage" seems risky, but maybe
not much riskier than putting it on plain HDDs. The big issue to me is that
most instances don't come with much space. If you need more you have to scale
up the whole instance (not a separate dimension like with EBS), and if you're
already on the biggest instance type you're just out of luck. I guess it could
be worthwhile to use separate tablespaces so you could have some data on
instance storage and some on EBS. But so far I've gotten acceptable perf by
RAIDing over gp2 EBS volumes (12 TB in my case), following the approach here:
[https://news.ycombinator.com/item?id=13842044](https://news.ycombinator.com/item?id=13842044)
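
For anyone weighing the same trade-off, here's a back-of-the-envelope sketch
of what striping gp2 buys you (the 3 IOPS/GiB baseline and ~10K per-volume cap
are my recollection of the limits at the time, so double-check them):

    def gp2_baseline_iops(size_gib: int) -> int:
        # gp2 baseline: ~3 IOPS per GiB, floor of 100, per-volume cap ~10,000.
        return min(max(3 * size_gib, 100), 10_000)

    def raid0_baseline_iops(volume_size_gib: int, num_volumes: int) -> int:
        # RAID 0 adds up the per-volume baselines. The instance's own EBS
        # bandwidth cap is often the real ceiling, so this is optimistic.
        return num_volumes * gp2_baseline_iops(volume_size_gib)

    if __name__ == "__main__":
        # e.g. ~12 TB as 6 x 2 TiB gp2 volumes
        print(raid0_baseline_iops(2048, 6))  # 36864 baseline IOPS on paper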

~~~
jedberg
All of Reddit was Postgres on raided EBS up till I left in 2011 and I think
still is today but I kinda hope not.

It’s totally safe to use local storage if you build it right. But those raided
EBSs caused a lot of problems. In short, when one gets slow the whole volume
gets slow because software raid isn’t hardware raid.

The main advantage of RDS is that they take care of the mundane redundancy for
you.

~~~
tbrock
Hey Jeremy, I’m saying they should take care of it for me.

As a database operator I treat safety on i3 similarly: I have multiple
hot replicas of my data so that if any fails, I'm good to go. Additionally,
there isn't any reason you couldn't have an EBS replica of an ephemeral node.

What we typically do with i3 is mirror the data locally, replicate it, keep an
EBS replica, and take backups. This is probably overkill, but the data needs to
be both quickly accessible and secure, so that's where we're at.

~~~
kakwa_
I'm wondering, how do you handle failovers?

Is it automatic or manual?

On infrastructure I handled from top to bottom, I used VIPs with keepalived
(only the VRRP part, with a weight linked to the success/failure of a check
script).

But in AWS, I'm wondering how to do it properly, maybe DNS records with low
TTL (like 1 second).
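
In case it helps, the DNS route is roughly this with boto3 (the zone ID,
record name, and IP below are made-up placeholders; Route 53 also has health
checks and failover routing policies if you want it automated):

    import boto3

    route53 = boto3.client("route53")

    def promote_standby(zone_id: str, record_name: str, standby_ip: str,
                        ttl: int = 1) -> None:
        # Point the service record at the standby's IP with a short TTL;
        # this plays the role the keepalived VIP played on-prem.
        route53.change_resource_record_sets(
            HostedZoneId=zone_id,
            ChangeBatch={
                "Comment": "fail over to standby",
                "Changes": [{
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": record_name,
                        "Type": "A",
                        "TTL": ttl,
                        "ResourceRecords": [{"Value": standby_ip}],
                    },
                }],
            },
        )

    if __name__ == "__main__":
        promote_standby("Z0000EXAMPLE", "db.internal.example.com.", "10.0.1.42")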

------
debunn
I noticed the following comment in the article:

> Encryption – Each local NVMe device is hardware encrypted using the XTS-
> AES-256 block cipher and a unique key. Each key is destroyed when the
> instance is stopped or terminated.

Does anyone know if the existing i3 EC2 instances NVMe drives are also
encrypted in this fashion? I can't find any articles stating this...

Thanks!

~~~
iconara
i3 and f1 also have encrypted disks. I have found some references on blogs for
this, but the only place I've seen it mentioned by AWS directly is in this
presentation from re:Invent 2017:
[https://youtu.be/o9_4uGvbvnk?t=30m20s](https://youtu.be/o9_4uGvbvnk?t=30m20s)
(at 30:25 the presenter mentions that the nitro cards "protect the underlying
flash device and customer data through encryption").

~~~
_msw_
Hi, it's the presenter in the video here.

The documentation at
[https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-inst...](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#nvme-ssd-volumes)
will be updated soon with the same information.

~~~
iconara
Excellent, thank you. It would be very good to have that information in the
official documentation. I've wanted to refer to something like that for
compliance reasons, for example.

------
plasma
We use i3 instances for some workloads, which is great; I just wish the CPU
were as powerful as on other instance types.

~~~
tbrock
That makes this a perfect fit

~~~
plasma
I'm keen for 2 vCPU (but faster) with 450GB+ SSD; these new ones are a bit too
expensive for that much SSD without going up in vCPU, but a good start.

------
cthalupa
It looks like i3.metal is also available - seeing them in the console

------
dis-sys
Great news! I think Amazon should provide more specs on those NVMe drives.
What kind of R/W latency, max bandwidth, and low/high queue-depth throughput
can I expect?

------
soccerdave
This is really exciting to see! I had assumed they would be launching most new
instance types with only EBS storage, so it's awesome that this looks like it
will be coming to even more instance types too.

The bottom of the article mentions "PS – We will be adding local NVMe storage
to other EC2 instance types in the months to come, so stay tuned!"

------
pishpash
So back to instance storage we go?

------
pwarner
Now they just need R5 instances for our RAM-hungry apps. Keep the local NVMe,
please.

------
Manozco
I'm not seeing the use case here. What would need (very, very) performant
temporary storage that you could not get from io1 volumes?

~~~
Johnny555
io1 volumes are expensive and not nearly as performant as local NVMe SSDs.

io1 volumes top out at 32K provisioned IOPS per volume, and a 32K PIOPS 225GB
volume would cost around $2,100/month.

But an entire c5d.2xlarge with 225GB of local SSD will only cost around
$283/month and (based on i3 performance) will give you around 180K write
IOPS.
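
Roughly how that pencils out (the per-GB and per-PIOPS rates below are my
assumptions for us-east-1 on-demand pricing at the time):

    IO1_PER_GB_MONTH = 0.125     # $ per GB-month of provisioned storage
    IO1_PER_PIOPS_MONTH = 0.065  # $ per provisioned IOPS-month
    C5D_2XLARGE_HOURLY = 0.384   # $ per hour, on-demand
    HOURS_PER_MONTH = 730

    io1_cost = 225 * IO1_PER_GB_MONTH + 32_000 * IO1_PER_PIOPS_MONTH
    c5d_cost = C5D_2XLARGE_HOURLY * HOURS_PER_MONTH

    print(f"io1, 225GB @ 32K PIOPS: ${io1_cost:,.0f}/month")  # ~$2,108
    print(f"c5d.2xlarge:            ${c5d_cost:,.0f}/month")  # ~$280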

