
The AWS EC2 Windows Secret Sauce - maishsk
https://technodrone.blogspot.com/2019/03/the-aws-ec2-windows-secret-sauce.html
======
scarface74
I’ve been a Microsoft developer for over two decades. But once I started
architecting solutions on AWS and seeing the Windows Tax firsthand - in terms
of resource requirements and licensing costs - I started trying to avoid
Windows like the plague.

Also the true costs of infrastructure became my problem - accounting can see
exactly what a solution costs - instead of some amorphous cost in the IT
budget.

~~~
briffle
I really like the better visibility into the true costs. Too many times in the
past, a team had a budget for the software they needed to deploy a project,
but the virtualization software, storage array, etc. were all out of the IT
budget. That makes the IT budget look bloated, hard to explain, and ripe for
being seen as a 'cost center'.

~~~
mc32
That’s too bad, and unfortunate for companies that see IT as a black hole.
Imagine having an R&D dept which could incur lots of costs with little result
individually, but as a group produce good licensable IP. Sometimes you have to
realize some things are a cost of doing business.

~~~
scarface74
It's not that they don't realize it. The R&D department doesn't have to be
accountable for costs, and it isn't incentivized to control them because they
aren't our problem. Proper tagging of resources shines a light on the R&D
department, and the business knows exactly what our projects' infrastructure
costs.

------
Twirrim
> Seriously though - Windows images are big - absolutely massive compared to a
> Linux image - we are talking 30 times larger (on the best of days) so
> copying these large images to the hypervisor nodes takes time.

No... just no. Disks aren't local to hypervisors, so there's no copying taking
place. EC2 instances are provisioned using EBS volumes, which aren't local to
the instance itself. EBS is likely doing a disk clone operation, and those are
relatively cheap in standard filer operations. Even for large images, you're
talking about a drop in the ocean in terms of the overall time.

The main issues with Windows in a cloud environment come from that first-boot
scenario. You could get past some of that by keeping a pool of warm instances
around, but it'd require a lot of work on the Windows side to handle the
provisioning use case.

On Linux the instance boots, init processes kick off, and right at the end
cloud-init creates and configures accounts with SSH keys etc., and away you
go. Boot time is typically anywhere between 30 and 60 seconds depending on the
distribution.
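For the curious, that last cloud-init step is typically driven by user data
along these lines (a minimal sketch using standard cloud-config keys; the
user name and public key below are placeholders):

```yaml
#cloud-config
# Runs near the end of first boot: create an account and install an SSH key.
users:
  - name: deploy              # placeholder user name
    groups: sudo
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-ed25519 AAAA...   # placeholder public key
```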

Windows isn't that accommodating. Images that are used in cloud environments
have to be "generalized": you install Windows on specific hardware, then tell
it to stop caring about hardware specifics while making sure certain drivers
are baked in, and to clean up after itself while it's at it.

First boot happens, and Windows goes through a mandatory process whereby it
identifies hardware, installs and configures drivers etc. etc. etc. You can
inject a script to carry out the process of things like setting up a user, and
generating a password, but that's fairly minor on the scale of things. This
first process _requires_ a reboot. There's no escaping it.

That's really the reason why Windows provisioning takes so long.

There are unavoidable reboots, and unavoidable Windows driver installation and
configuration, whereas the Linux distributions' approach to the kernel makes
life easier (every driver you're likely to need is a module and available on
boot... unless you've got dracut running in the default host-only mode and it
has made you a totally stripped-down initramfs).

Add on that Windows booting takes longer than Linux even under optimal
conditions and you end up with a slower launch.

~~~
pjc50
> Windows goes through a mandatory process whereby it identifies hardware,
> installs and configures drivers etc. etc. etc. You can inject a script to
> carry out the process of things like setting up a user, and generating a
> password, but that's fairly minor on the scale of things. This first process
> requires a reboot. There's no escaping it.

This is the "OOBE" (out of box experience) phase. I'd have thought they'd just
skip it and provision you a pre-warmed image... but I guess because there's a
license/activation dependency they can't do that?

I wonder how many MWh could be saved by Microsoft adding an "acquire cloud
volume license at boot" mode. Or shoving the licensing/uniquification
requirements into the platform TPM.

~~~
nonce_account
Disclosure and claim to authority: I work on the Windows team at Microsoft,
sometimes on performance and OS installation stuff.

There's definitely some cruft that chews up time on first boot. But it's not
everybody's favorite punching-bag, licensing. That stuff doesn't happen in the
critical boot path.

It might be installation of device drivers, but that too is unlikely. If you
generalize Windows in a VM, you can use the `sysprep.exe /mode:vm` flag, which
essentially tells sysprep to retain most of the device tree, since you expect
to run the thing on similar hardware. I would assume that AWS is clever enough
to have found that flag; certainly we have Azure use it. When the flag is
used, there's very little device- and driver-related work to do on first boot
after generalization.

The reality is that software is complicated and hard, and anything punchy
enough to fit into a comment on a website is going to be a vast simplification
of reality. So let the simplification begin :)

One reason first boot is slow is the component that orchestrates startup of
usermode services, which on Windows is called SCM. SCM is very old. At the
time SCM was created, it was _much_ better than the SysV-style init scripts of
other OSes. But since then, other OSes leapfrogged Windows with
systemd/launchd, which are a generation ahead of SCM. SCM starts services in
serial, while systemd maximizes parallelization. SCM has a "push" model: it
basically starts all the services that it can find, while systemd has a "pull"
model: it starts just the dependency cone you need to get the system you want.
(This is a simplification.)
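To make the push-vs-pull distinction concrete, here is a toy sketch in Python
(illustrative only - not actual SCM or systemd code, and the service names and
dependencies are made up):

```python
# Toy model of two service-startup strategies.
# "Push" (SCM-style): start every registered service, one after another.
# "Pull" (systemd-style): start only the dependency cone of a target.

services = {
    "network":  [],
    "syslog":   [],
    "sshd":     ["network"],
    "printing": [],          # unrelated service nobody asked for
    "webapp":   ["network", "syslog"],
}

def push_start(services):
    """SCM-style: start everything that is registered, serially."""
    return list(services)    # all services, in registration order

def pull_start(services, target, started=None):
    """systemd-style: start only what `target` transitively needs."""
    if started is None:
        started = []
    for dep in services[target]:
        pull_start(services, dep, started)
    if target not in started:
        started.append(target)
    return started

print(push_start(services))            # every service, even "printing"
print(pull_start(services, "webapp"))  # only network, syslog, webapp
```

In the real systems there is also parallelism: systemd can start `network`
and `syslog` concurrently because neither depends on the other, while SCM
walks its list one service at a time.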

Another performance issue is that Windows doesn't have a way to notify code
that the hostname has changed. Obviously it'd be easy to add one, but then the
hard part would be updating the whole OS to do something reasonable with that
notification. So instead, Windows requires a reboot to change the hostname.
Except first boot: to avoid a reboot as soon as you power on your shiny new
computer, there's a clumsy dance where the OS holds back most usermode
processes until the hostname is set, then it sort of tries booting usermode
again. (Huge simplification!)

Thirdly, the footprint of Windows is just bigger than that of an expertly
hand-tuned Linux installation. Much of this problem was solved with Nano
Server... but are you actually using Nano Server? It turns out that people
like Windows because Windows runs Windows programs. Take away compatibility
with many Windows programs, like Nano Server did, and you get a much faster
and more secure OS that nobody's heard of.

We take both perf and cloud hosting seriously, and we're working on problems
in this space. You should expect Windows to get better with each release. But
to close this off, I don't want to hog all the blame. It's always possible
that AWS is doing something silly in their guest agent or paravirtualization
stack that measurably degrades boot perf. We've previously caught Azure doing
silly things -- now fixed -- that seriously delayed the point at which the
guest reported itself as ready. If you want to see Windows hosting done well,
try Azure.

~~~
Twirrim
> you can use the `sysprep.exe /mode:vm` flag, which essentially tells sysprep
> to retain most of the device tree, since you expect to run the thing on
> similar hardware

In my experience that only really works with fully paravirtualized
environments. If you start mixing in SR-IOV, things can get a little messy.
Given that customers crave high-performing networks, that then presents you
with a choice:

1) Make two images: one for fully paravirtualized environments, one for
SR-IOV.

2) Ignore it and let Windows first boot take longer.

Of course most clouds already end up with what are effectively multiple images
for the same setup/configuration of Windows, one per hardware type, anyway,
because even with full PV things can get a bit strange, and you often end up
with blue screens during first boot.

They're not going to want to _double_ that number. Even with full automation,
that's a bunch more things that can go wrong, more operational burden, etc.
Where's the value proposition? Windows provisions a little faster?

Linux images rarely need to be produced for different hardware types /
environments. They just spin up and away you go. As to your systemd/SysV
comment... even SysV-based instances have a time-to-login on first boot of
under a minute.

The systemd developers were obsessed with the idea that parallelism would
speed up the boot process, but it doesn't make as significant a performance
impact as they'd have you believe, especially when you're talking about cloud
images that rarely have many services running on first boot. Even if you go
trawling through the systemd boot time reporting, you'll see that most
components start in fractions of a second, and the same was true under SysV
too.
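That claim can be sanity-checked with a back-of-the-envelope sketch (the
per-service timings below are invented for illustration, not measurements from
any real system):

```python
# Rough model: when most units start in fractions of a second, the
# absolute gap between serial and ideally-parallel startup is small.

unit_times = {          # hypothetical per-service start times, in seconds
    "network": 0.30,
    "syslog":  0.05,
    "cron":    0.04,
    "sshd":    0.10,
    "agent":   0.08,
}

serial_total = sum(unit_times.values())    # SysV-style: one at a time
parallel_total = max(unit_times.values())  # ideal: everything at once

print(f"serial:   {serial_total:.2f}s")
print(f"parallel: {parallel_total:.2f}s")
print(f"saved:    {serial_total - parallel_total:.2f}s")
```

Even in the best case, parallelism here saves well under a second, which is
noise next to an unavoidable Windows reboot.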

------
sudovancity
> Now that I have got your attention with a catchy title - let me share with
> some of my thoughts regarding how AWS shines and how much your experience as
> a customer matters.

This is the one guaranteed way to turn me off to whatever you're going to talk
about in the article.

------
londons_explore
I don't think they have a pool of instances at all.

It's a generalized image which they boot up for you. Cloning the image, even
though it is many gigabytes, takes milliseconds since the underlying storage
(EBS) will be some log-based storage.

If they really wanted to optimize boot time, they would freeze a fully booted
machine (keeping all the RAM contents) and then clone the frozen instance.
That should be able to get running in just ~10 seconds (enough time to copy
enough of the RAM contents to be able to log you in). They probably won't do
that because having every user running from a fork of the same image could
have some weird repercussions - for example, the kASLR layout would be the
same for all machines, making exploit design much easier.

~~~
cududa
Windows' licensing mechanism prevents this scenario.

~~~
londons_explore
Amazon is big enough that they could get Microsoft to rework the technical
details of how licensing works.

------
Zombiethrowaway
I used Windows instances a few years ago. Beyond the slow start, once started,
frequently the CPU would stay stuck at very low %, and my tasks would run very
slowly.

Eventually I would get to 100%, but it could often take 10 minutes.

What I learned from those pains is how to use Linux in the Cloud.

~~~
vegardx
Seems like you were using T2 instances which have a low baseline performance
and burst credits. I would imagine that you quickly run out of credits on some
of the smaller instance types after creation, given how lengthy and costly (in
terms of CPU usage) the instance creation and boot process is.

~~~
Zombiethrowaway
I was typically using c3.xlarge for CPU-intensive tasks (video processing).

Boot time was OK. I would log in to the machine with RDP, because sometimes my
processes were almost frozen for a while. It felt like my neighbours were
stealing my CPU, but I didn't know how to prove it.

Once I moved to Ubuntu, same instance type, I never experienced this.

------
rebelde
AWS might as well apply any Windows Update changes so that they deliver a
fully patched and secure instance, instead of an insecure image from a few
months ago. It isn't just AWS; all Windows cloud servers seem to be delivered
unpatched.

~~~
gtsteve
Customers probably wouldn't appreciate getting a Windows instance with an
inconsistent patch level; you want to know that when you start a specific AMI
ID you tested in your staging environment, it will be the same AMI you launch
in production.

Also, they ship a new Windows image approximately every 3-4 weeks with the
most recent updates merged in. They don't make a big enough thing of it, but
you can subscribe to update announcements.

------
gtsteve
This is an interesting thought; I use Windows instances myself but I use a
custom AMI built using Packer on our CI server. Presumably Amazon doesn't have
a pool of my custom AMI images lying around.

So I wonder if the custom AMI is actually stored as a layer on top of the
source Windows AMI and applied to the instance from the pool before it is made
available to me.

Alternatively, it means I'm missing out on an optimisation and I could get a
faster start-up time by using a vanilla Windows AMI and installing
dependencies in the user data.

Does anyone know or have an educated guess?

~~~
maxaf
A couple of jobs ago I spent plenty of time optimizing a Packer-based AMI
pipeline similar in spirit to the one you've described. It was all quite
frustrating, but then the job got yanked out from underneath me, so I never
finished solving the problem.

The article contains a screenshot[1] of a response from AWS support indicating
that only EC2's own AMIs benefit from this pooling optimization. It follows
that custom AMIs must take the startup time hit all the way.

[1]: https://maishsk.com/blog/images/20190307_aws_ec2_secret/forum_post_slow.png

~~~
gtsteve
Ah, I had missed that! Thank you, that's very helpful to know. I always learn
something new every day here.

------
0x0
Does AWS have to pay a license fee to maintain a pool of Windows instances?
Who foots the cost of the license if a prepared/pooled instance is never
allocated to a customer?

~~~
jbigelow76
There has to be a custom deal in place, but it could be similar in spirit to a
hot standby MS SQL Server instance where you only pay for one license but have
a mirror of the machine ready to go if the first one goes down (my knowledge
of licensing was only tangential to my dev work and could be several years out
of date now).

------
altcognito
It would make sense to apply these optimizations to any kind of startup since
presumably anticipating resource allocation ends up as a cost savings and
improvement in user experience regardless of the base OS. There are plenty of
Linux AMI images that are bloated and slow to start.

~~~
simonh
It would only be practical to do this for very commonly used AMIs that AWS
creates and provisions itself. Commonly used, because you don't want unused
instances languishing in the pool long term. Only their own, because these
AMIs are heavily manipulated and then customised for the user, so you have to
be positive the AMI is compatible with the manipulations.

I wonder if Azure does something similar.

~~~
vxNsr
From my limited use of Azure I'd guess yes, because instances there were also
hot-n'-ready in under 5 minutes any time I needed a new one.

------
lfx
I'm pretty sure GCE does the same; just last week I needed to test something,
and a 1 vCPU instance was ready in about 3-4 mins. Next time I will check the
logs to confirm it. But it seems the reasonable thing to do.

------
osullivj
Also note AWS Windows is not plain vanilla. It comes with an x64 Python 2.7
build preinstalled, and the x64 python27.dll is on the standard path. Bit of a
gotcha if you're deploying x86 Python 2.7 to AWS Windows!

------
drinane
I'm curious why the author does not think AWS will confirm his analysis. He
seems to have hard evidence of how the system works and communications from
staff.

~~~
bastawhiz
I could see them declining to comment to avoid creating the expectation that
the system works in a particular way. Being a black box means they can make
changes without breaking expectations. Documenting internals, even informally,
means folks may come to expect that behavior.

------
np_tedious
Safe to say this is only possible with default/official AMIs?

------
brian_herman__
Very interesting. Now that 4-minute wait time makes sense.

------
jak92
Shared windows instances? What could go wrong?

~~~
camtarn
My assumption is that they're not shared. Once you request an instance from
the pool, if the desired number of pooled instances is still the same, another
one would immediately begin the imaging process to take its place.

