
Reinventing virtualization with the AWS Nitro System - manigandham
https://www.allthingsdistributed.com/2020/09/reinventing-virtualization-with-aws-nitro.html
======
khuey
Moving the hypervisor off the main CPU has made Amazon more comfortable (from
a security perspective) enabling "bare metal" features like hardware
performance counters. Brendan Gregg has written quite a bit about how these
are used at Netflix. We use them in rr[0] to enable fast recording of
asynchronous events like signals and context switches.
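
For concreteness, a minimal sketch of the kind of counter involved (not rr's actual code, which programs a raw "retired conditional branches" event): open a hardware counter with perf_event_open and read it back. On instance types where the PMU isn't exposed, the open fails or the count never advances.

    /* Count retired branch instructions for the current thread. */
    #include <linux/perf_event.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void) {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        /* pid = 0 (this thread), cpu = -1 (any CPU), no group, no flags. */
        int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        for (volatile int i = 0; i < 1000000; i++) {}  /* some work to count */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t count = 0;
        read(fd, &count, sizeof(count));
        printf("retired branches: %llu\n", (unsigned long long)count);
        close(fd);
        return 0;
    }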

Speaking of which, if anybody who works on EC2 is reading this it would be
great if you could continue exposing more MSRs and bare metal features. rr in
particular would like MSR_INTEL_MISC_FEATURES_ENABLES to be available[1],
which would enable trace portability for traces recorded on EC2.

[0] [https://github.com/mozilla/rr](https://github.com/mozilla/rr) [1]
[https://github.com/mozilla/rr/issues/2667](https://github.com/mozilla/rr/issues/2667)
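
For anyone wondering what "exposing an MSR" means in practice: from userspace on Linux you can poke at it through the msr module. A hedged sketch; the 0x140 address is what Linux's msr-index.h lists for MSR_MISC_FEATURES_ENABLES (rr's MSR_INTEL_MISC_FEATURES_ENABLES), so treat the constant as an assumption.

    /* Read MSR 0x140 on CPU 0 via the msr module (needs root and `modprobe msr`). */
    #include <fcntl.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/dev/cpu/0/msr", O_RDONLY);
        if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }
        uint64_t val = 0;
        /* The file offset selects which MSR to read. */
        if (pread(fd, &val, sizeof(val), 0x140) != sizeof(val)) {
            perror("read MSR 0x140");  /* EIO typically means it isn't exposed */
            close(fd);
            return 1;
        }
        printf("MSR_MISC_FEATURES_ENABLES = 0x%" PRIx64 "\n", val);
        close(fd);
        return 0;
    }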

~~~
justicezyx
Seems the hypervisor is still on the main CPU: see figure 2 (2017 Nitro architecture)

~~~
_msw_
Disclosure: I work at Amazon on building infrastructure for AWS

For the bare metal instance configurations (where the instance type ends in
".metal"), there is no hypervisor running on the processor.

~~~
justicezyx
I cannot find docs stating that Nitro has any general-purpose chip to run a
hypervisor to manage the VMs. How do customer VMs get launched and managed
from outside the physical host?

~~~
_msw_
The management of the physical host itself (including launching and
terminating customer instances when running in bare metal mode) is the
responsibility of a component of our server called the Nitro controller.

------
KMag
Very interesting, but unless I'm missing something, this analysis measures
virtualization overhead only in terms of x86 CPU cycles, not in terms of, say,
Watt-weighted processor cycles.

Approximately 30% of the x86 CPU cycles went into running the hypervisor, then
they designed some new hardware, moved the virtualization functionality over
to the new hardware, and apparently (in this telling) stopped accounting for
the power / cycles consumed by that entirely new hardware they added to the
system.

Maybe this makes the accounting cleaner, and it's great that it safely enables
bare metal instances. However, the article hints that under Nitro a higher
percentage of the cost / environmental cost goes into running the user's
workload, yet it never accounts for the monetary or environmental cost of the
added hardware. (If there's no cost savings to the customer or environmental
savings for us all, why should the reader care about lower virtualization
overhead? The article isn't phrased as satisfying pure intellectual
curiosity.)

Taken to an extreme, they could have used dynamic translation to run all of
the user workloads also on custom hardware and claimed negative infinity
virtualization overhead: billions of x86-CPU cycles-worth of work done, zero
x86 CPU cycles spent.

Am I misunderstanding this analysis? I don't think it's all accounting tricks,
and wouldn't be surprised if dedicated virtualization hardware/firmware is
more efficient at virtualization than general-purpose CPUs (and likely more
cost-efficient, based on Intel / AMD's markup on CPUs). I also understand
their internal hardware costs and energy costs for this new hardware might be
sensitive information, but it would be nice to see some kind of accounting for
this new hardware instead of apparently treating it as zero CapEx / zero OpEx
/ zero wattage.

~~~
jonstewart
I’m not sure what you mean when you reference CapEx—the CapEx required to
develop Nitro (definitely nonzero)? To buy the hardware and deploy it in their
data centers (also assuredly nonzero)? Or on a unit basis, where it would have
to be balanced against what it was replacing?

As far as development and deployment goes, this is a great example of how
large companies should use CapEx to build competitive barriers. They can do
something hard and expensive—5 years of engineering and then the expense of
custom fab—then use that to make their unit economics better (significantly
more utilization of the server by customer apps) and differentiate their
products. Vogels says that development began in 2012; left unsaid is that AWS
knew by then that competitors were gunning for them, so how do you protect
your lead? Nitro seems like one answer.

One thing to consider in the balance is the potential savings from not relying
so much on Xen. There have been Xen vulnerabilities; those haven’t been fun
for AWS. It’s a complex piece of software and AWS’s version was likely heavily
patched, requiring a dedicated team of senior devs. The reduction in
operational complexity from using Nitro plays a role, too.

Finally, you have to consider one of AWS’s big market pushes, getting Big
Enterprise to transition internal business applications away from in-house
hardware/data centers and onto AWS. Few in-house IT teams could likely match
the performance of EC2 with Nitro; killing off managing data centers makes the
CFO happy, and having improved performance on AWS makes users happy.

~~~
KMag
> I’m not sure what you mean when you reference CapEx ...

When I talk about CapEx, I'm talking about the portion of the whole server's
CapEx represented by the Nitro hardware cost, including amortized Nitro
development cost. CapEx + OpEx is the normal way to account for "total cost"
of the new hardware, which can be compared against CapEx + OpEx fraction of
the server's cost that can be attributed to the Intel / AMD CPUs.

Likewise for OpEx, I'm talking about the portion of the whole server's OpEx
represented by the Nitro hardware operating cost.

They've offloaded 30% of the work to this new hardware, and there are two
obvious ways to ask whether this was a good idea: (1, evaluating monetary
savings) is the cost (CapEx + OpEx) of this Nitro hardware less than that of
the 30% of the box's CPU resources freed up? (2, evaluating environmental
savings) is the wattage used by this Nitro hardware less than the 30% of the
box's CPU resources freed up?

------
transpute
A public hardware predecessor to Nitro can be seen in the "SmartNIC" academic
research in several generations of [https://NetFPGA.org](https://NetFPGA.org)
hardware. Today, hardware co-processors can be prototyped with an FPGA and
open toolchains, see Ulf Frisk's work on DMA-based attacks,
[https://youtube.com/watch?v=5DbQr3Zo-XY](https://youtube.com/watch?v=5DbQr3Zo-XY)

Meanwhile, AMD uses a closed-source Arm coprocessor (PSP) for SEV features
like VM memory encryption, inside their x86 CPUs. Intel has upcoming hardware
with dedicated x86 silicon to run an Intel-signed TDX (Trust Domain
Extensions) hypervisor for VM security features,
[https://www.phoronix.com/scan.php?page=news_item&px=Intel-TDX-Better-VM-Secure](https://www.phoronix.com/scan.php?page=news_item&px=Intel-TDX-Better-VM-Secure)

Kudos to Annapurna for blazing the Nitro trail. Their founder has since
pioneered NVMe-over-TCP storage virtualization, with optional FPGA
acceleration from Lightbits Labs, code upstreamed to Linux. Hopefully the next
few years will bring more open-hardware interposers for storage & network
paths, for academic research and commercial prototyping.

------
artjomb
> In contrast, with the Nitro System, the only interface for operators is a
> restricted API, making it impossible to access customer data or mutate the
> system in unapproved ways.

That's great, but what are the approved ways? This does not by itself prevent
access to customer data. Is there any built-in audit functionality to see
which accesses were approved and performed/attempted? That would also need to
be implemented at every level of the stack.

This basically means that AWS closed a compliance issue through technical
control at the lowest level.

------
daxfohl
Does this mean that each node can only run a single VM size? Like one XXL, or
two xls, or four larges, etc., but no mixing and matching? (I got that
impression looking at Outposts pricing too; it appears you have to decide your
VM sizes up front, unless I'm reading it wrong.)

~~~
EwanToo
I don't know if this is a hard constraint, but I believe this is how AWS
organise the instances - you'll be sharing hardware with other instances of
the same size.

~~~
core-questions
I think it's more that it divides up. One physical machine may serve 8
c5.xlarge, 4 c5.2xl, 2 c5.4xl, or 1 c5.8xl (so to speak, not necessarily this
exact instance).

Would make sense to fit 1 c5.4xl alongside 2 c5.2xl on the same box, for example.

------
romantomjak
This was quite eye-opening. I never thought they'd go to such great lengths to
optimise their virtual (and physical) machines.

~~~
ignoramous
I want to know more about their TCP replacement: the Scalable Reliable
Datagram (SRD) [0].

Seems like _Nitro_ is radically changing AWS' hardware story.

[0]
[https://twitter.com/ogawa_tter/status/1108767124476981248/ph...](https://twitter.com/ogawa_tter/status/1108767124476981248/photo/3)

Edit: Here's a paper on it
[https://ieeexplore.ieee.org/document/9167399](https://ieeexplore.ieee.org/document/9167399)
and a mini-discussion on twitter:
[https://twitter.com/_msw_/status/1297223835519815681](https://twitter.com/_msw_/status/1297223835519815681)

------
Thaxll
What is the overhead % of modern virtualization nowadays? Around 5% or so?

~~~
dastbe
(i work at aws)

it depends on what you’re talking about. in the context of this article,
brendan gregg measured the overhead on nitro as less than 1%.

[http://www.brendangregg.com/blog/2017-11-29/aws-ec2-virtualization-2017.html](http://www.brendangregg.com/blog/2017-11-29/aws-ec2-virtualization-2017.html)

~~~
redstripe
Can you explain the 5x difference between physical and virtualized benchmarks
here:

[https://www.techempower.com/benchmarks/#section=data-r19&hw=...](https://www.techempower.com/benchmarks/#section=data-r19&hw=cl&test=fortune)

~~~
dastbe
As the other commenter mentioned, the machines have completely different
setups so I don't think you can make any comparative judgment.

I also don't work on Azure which is used in that benchmark, and the link I
provided was specifically about benchmarking AWS.

------
tyingq
Easier to read on mobile:
[https://outline.com/3D4R5b](https://outline.com/3D4R5b)

------
lazyant
Isn't AWS moving from Xen to KVM (actually, I see my instances run on KVM)?
How does that factor in here (KVM isn't even mentioned)?

~~~
wmf
Yes, Nitro is built on KVM, but KVM is a very small component. (What most
people call "KVM" is really 90% QEMU and 10% KVM and Nitro does not use QEMU.)

~~~
lazyant
great thanks!

------
dmarinus
This all sounds like marketing talk to me. At re:Invent they talked about
replacing Xen (ring 0) with Nitro, but I don't see how you could replace that
with hardware.

~~~
teilo
Indeed, you don't see.

As the article makes plain: It's an entire hardware stack specifically
designed for virtualization from the ground up, and a corresponding software
stack to utilize and manage it. There is no traditional HAL in Nitro because
the hardware itself handles virtualization and resource isolation. You can't
do that with traditional PC server hardware where a NIC is just a NIC and a
SAS card is just a SAS card.

Xen, on the other hand, is designed to use standard PC hardware, with all the
virtualization handled in software. The only VM-specific hardware support in
Xen is the hypervisor support built in to modern CPUs, and occasionally some
helper functions in NIC firmware. But you still have a 100% software-based HAL
to provide the isolation and bare-metal simulation.

Imagine a NIC, for example, that instead of presenting itself to the OS as a
single card with however many ports, can generate new virtual NICs, in
hardware, and present them to the OS for assignment to a VM. The card itself
manages bandwidth allocation, aggregation, encryption, and communication with
cards in other hosts to create and manage VPCs.
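
The closest commodity analogue to that kind of NIC is SR-IOV, where the card spawns hardware virtual functions that can be handed straight to guests. A rough sketch of requesting VFs on Linux; the "eth0" name and the count of 4 are just placeholders:

    /* Ask an SR-IOV capable NIC to expose 4 virtual functions via sysfs. */
    #include <stdio.h>

    int main(void) {
        const char *path = "/sys/class/net/eth0/device/sriov_numvfs";
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return 1; }
        fprintf(f, "4\n");  /* each VF then appears as its own PCI device */
        if (fclose(f) != 0) { perror("fclose"); return 1; }
        /* The VFs can be passed through to VMs (e.g. with VFIO),
           bypassing the host's software switch entirely. */
        return 0;
    }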

