
AMD Zen Microarchitecture: Dual Schedulers, Micro-Op Cache and Memory Hierarchy - dineshp2
http://www.anandtech.com/show/10578/amd-zen-microarchitecture-dual-schedulers-micro-op-cache-memory-hierarchy-revealed
======
geertj
SEV (Secure Encrypted Virtualization, [1]) is a hugely interesting feature
that will be available with Zen. Once it's mature and perfected, it would
allow you to securely run a VM in the cloud that is protected against someone
who controls the hypervisor. And you'd also be able to attest that indeed
you're running in such a protected VM.

How do you protect against someone controlling the hypervisor? Read the paper.
But the high level is to encrypt memory using keys that cannot leave the
processor and are only available to a specific VM ASID (Address Space
Identifier), assisted by secure firmware similar to Apple's Secure Enclave.
Attestation uses an on-chip certificate signed by an AMD master key during
fabrication.
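To make the flow concrete, here is a toy Python model of the handshake the whitepaper describes. All names are illustrative, not the real SEV API, and an HMAC key stands in for the AMD-rooted signing certificate:

```python
import hashlib
import hmac
import os

# Illustrative stand-in: real SEV attestation uses an on-chip certificate
# chained to an AMD master key. Here an HMAC key models "a secret that
# never leaves the processor".
PSP_KEY = os.urandom(32)

def psp_attest(vm_image: bytes):
    """Firmware measures the launched guest and signs the measurement."""
    measurement = hashlib.sha256(vm_image).digest()
    signature = hmac.new(PSP_KEY, measurement, hashlib.sha256).digest()
    return measurement, signature

def release_disk_key(measurement, signature, expected_image, disk_key, verify_key):
    """The VM owner hands over the disk key only if the attestation
    signature checks out AND the measurement matches the image they deployed."""
    expected_sig = hmac.new(verify_key, measurement, hashlib.sha256).digest()
    if not hmac.compare_digest(expected_sig, signature):
        raise ValueError("attestation signature invalid")
    if measurement != hashlib.sha256(expected_image).digest():
        raise ValueError("guest image was tampered with")
    return disk_key

# A tenant deploys an image, the PSP attests it, and only then is the
# disk encryption key released to the guest.
image = b"guest kernel + initrd"
m, s = psp_attest(image)
key = release_disk_key(m, s, image, b"disk-key", PSP_KEY)
```

The point of the model: the hypervisor never holds the verification secret, so it can neither forge an attestation nor learn the disk key in transit.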

There were some discussions on this on the linux-kernel mailing list [2]. As I
understand it, the current generation of SEV is still somewhat leaky, but
there's no fundamental reason why those leaks cannot be closed.

[1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

[2] http://www.mail-archive.com/linux-doc@vger.kernel.org/msg02578.html

~~~
spaceheeder
In addition to cloud VMs, I wonder what applications this might have on local
systems. Systems that boot from encrypted partitions and can't have the keys
recovered by cold boot attacks? Secure graphics acceleration of different
guests in a Qubes system? etc.

~~~
fulafel
DRM and keeping the user from rooting their PC. This is like MS/Intel Trusted
Computing on steroids.

~~~
snuxoll
AMD has had an on-die management CPU (dubbed the PSP - Platform Security
Processor) for years now, similar to the Intel ME; this is something
completely different. Memory encryption doesn't prevent you from doing
anything with your device - as far as userland is aware, it still has access
to the full address space with no knowledge that encryption is happening. The
only difference is that memory _IS_ encrypted, so even if you manage to freeze
memory to preserve the charge, it will be useless as soon as it is removed
from the host machine or the machine is rebooted.

~~~
comex
> Memory encryption doesn't prevent you from doing anything with your device,
> as far as userland is aware they still have access to the full address space
> with no knowledge encryption is happening

It depends what you mean by "userland". The purpose of SEV is to allow a guest
VM (using hardware virtualization) to run without trusting the host, including
remote attestation. Traditionally, hardware virtualization is used to run a
full operating system installed at the behest of the user, but
there is no rule that it can only be used for that. If this feature is enabled
on desktop parts, it's equally possible for black box DRM software running,
say, on a non-virtualized Windows system, to include a small unikernel and
automatically set it up to run in SEV mode. The whitepaper proposes that
people running VMs in the cloud use remote attestation to upload disk
encryption keys such that the VM can only decrypt the disk if it hasn't been
tampered with, but the 'cloudiness' could just as well go the other way: cloud
DRM servers sending decryption keys, for both video and perhaps the code
itself, to enclaves on desktop PCs.

Using SEV alone for DRM would have a significant limitation compared to using
the PSP: since all interaction with the outside world is still through the
host, it would be hard to prevent the host from grabbing the raw decrypted
video data as it leaves. But this still prevents recovering the original
bitstream; allows 'perfect' obfuscation of many facets of how exactly the code
works; and could probably be used in combination with the PSP in some manner.
And in some DRM applications, the ability to grab the output may not matter.
Imagine a video game where the bulk of the game was inside an enclave,
preventing piracy but also all reverse engineering and modding.

Of course, a video service or video game that only runs on AMD CPUs won't get
very popular... but conveniently, Intel is coming out with their own feature,
SGX, that provides similar capabilities, though with a different design (it's
designed more directly for the DRM use case). One might imagine that
eventually most systems will have CPUs that support one or the other.

~~~
fulafel
> since all interaction with the outside world is still through the host, it
> would be hard to prevent the host from grabbing the raw decrypted video data
> as it leaves

Wasn't this path already paved earlier by Microsoft, when Hollywood wanted a
guarantee that no unencrypted HD video leaves the PC? It might have
weaknesses, but the principle is already established.

A secure crypto path from black box VMs to smart TVs also leaves the door open
for all kinds of nasty scenarios involving TV pwnage. You will also have no
way of decrypting the data that the VM exfiltrates from your PC.

------
arcanus
I've been recently reading reports from some of my banking friends (and
actually chatted with some folks I know at AMD) because I'm curious about
AMD's turnaround. Even just last year AMD looked to be in very dire straits,
and they are still operating at a loss.

However, they seem to have a strong technical pipeline and they have
historically punched above their weight class. Does it look like they are
going to make it?

~~~
snovv_crash
The issue is the long lag time between new ideas being implemented at the
design level and the many iterations of fabbing and tweaking that need to take
place before a chip can actually be sold. The majority of people in tech are
too used to something being written in the morning and deployed in the
afternoon to understand what it is like to have a 3-month lag in your testing
cycle, and a minimum of twice that until release.

Just like Intel had the P4 hole that it had to drag its way out of, so now AMD
has had Bulldozer. Notice how Intel has been quite conservative with each
individual tick/tock, trying to keep their pipeline full. Doing crazy changes
risks causing a pipeline stall which could last years. Each new architecture
is risky, and AMD screwed up with Bulldozer. From early signs it looks like
Zen is a winner, hopefully AMD can stick with it for a while.

~~~
tormeh
I think Bulldozer was just ahead of its time. I am fortunate to have one, and
it has aged a lot better than the same-price alternatives from Intel due to
the more multithreaded nature of e.g. games nowadays. That's not much comfort
for AMD, since being ahead of your time is just another way to fail, but at
least AMD had vision.

Mankind Divided's recommended specs are FX-8350 or i7 3770. The price
difference between the two in their heyday was $100 in AMD's favor.

~~~
merb
Price was always in AMD's favor. The problem is more and more the energy
consumption. Also, the i-series from Intel was really, really solid and good,
at least up to Broadwell (I haven't seen too much of Skylake yet). That
especially lost AMD a lot of ground in the server space, and a lot of "high
end" gamers favored the Intel i7 series as well, even when it wasn't as cheap
as AMD.

~~~
tormeh
The i7 was a _lot_ better for gaming than the 8350 was at the time. Hell, an
i3 was better at the time. The 8350 was priced accordingly. There was no price
advantage. What I'm saying is that AMD's design has aged a lot better.

~~~
merb
AMD was cheaper (always), not better. You can be cheaper and worse, and people
will still buy the worse product because it's cheaper, even if the
price/quality ratio on Intel would've been better.

~~~
tormeh
You obviously haven't actually checked the prices.

------
snuxoll
Not having a unified L3 cache is an interesting choice. I can see how it would
significantly reduce the cost of the chip, and considering many multi-threaded
workloads operate on separate chunks of data, chances are it shouldn't incur a
noticeable performance penalty (especially in virtualization workloads - I'm
interested to see what their 32-core server chip ends up looking like).

~~~
gpderetta
On the other hand, a unified (inclusive) L3 cache helps with maintaining cache
coherency, which needs to be handled explicitly in a non-unified design.

I guess a big benefit of the separate caches is that if only half the cores
are in use, you can power half the cache down, saving power and TDP.

~~~
sliverstorm
A unified L3 is expensive in a number of ways. It is large, which means it is
physically distant from the cores, as well as slow (for caches, big == slow).
This adds a lot of access latency.

It also has a bandwidth problem. If 64 threads are vying for access, you
either build it with few access ports and it gets choked, or you build it with
many access ports which is costly in area, power, & speed.

Two separate peer caches automatically have twice the bandwidth of one similar
double-size cache, for the price of NUMA & cache coherency challenges.

There is no one right answer here. Bandwidth is far more important and
coherency much easier in a small L1; as you go down the hierarchy, bandwidth
needs shrink and coherency is more expensive.
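A toy model of the port-contention point above (the numbers are made up for illustration, not Zen's actual figures):

```python
def cache_bandwidth(ports: int, per_port_gbs: float, threads: int) -> float:
    """Aggregate bandwidth a cache can deliver: each port serves one
    request stream at a time; threads beyond the port count just queue."""
    return min(ports, threads) * per_port_gbs

# One big shared L3 with 4 access ports, 64 threads hammering it:
shared = cache_bandwidth(ports=4, per_port_gbs=32.0, threads=64)

# Two peer L3s, each with its own 4 ports serving half the threads:
split = 2 * cache_bandwidth(ports=4, per_port_gbs=32.0, threads=32)

print(shared, split)  # the split design delivers twice the aggregate bandwidth
```

Adding ports to the single shared cache closes the gap, but as the comment notes, each extra port costs area, power, and cycle time, which is why splitting the cache is attractive despite the coherency headache.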

~~~
BlackMonday
I remember a rumour about an HPC APU from AMD which would combine 16 Zen cores
with a Vega GPU and HBM (High Bandwidth Memory) as an L4 cache. I know an L4
cache would be much slower than an L3 cache, but I'm curious: could HBM as an
L4 cache be one of the reasons why they didn't use a unified L3 cache?

Disclaimer: I don't know sh*t about hardware design as you can probably guess
from my posting. ;o)

~~~
snuxoll
L4 cache is used more as embedded memory for the on-die GPU. Last I checked,
Intel only included their eDRAM L4 cache on Iris Pro equipped models, as any
on-die GPU worth its salt is going to be bandwidth constrained even with a
relatively low number of GPU cores.

Same situation with Zen: if they're going to include even a Polaris-class GPU,
it would be highly memory constrained if it had to hit system RAM all the
time, so another fat chunk of memory on-die will be necessary to avoid
starving it and to keep latency down (as it stands, the RX 480 can pump
256GB/s).

~~~
BlackMonday
Yeah, the bandwidth problem is already noticeable with AMD's current APUs even
though they use small GPU cores compared to discrete graphics cards. Faster
DDR3/4 memory brings noticeable FPS improvements. If they already had HBM they
would run circles around Intel (which they probably already do unless the
competitor is an Iris Pro with eDRAM).

Could the CPU also profit from the HBM memory? The bandwidth is much better
than with DDR4 main memory (even with 2 or 4 channels), and I would guess the
latency as well, because it would be on the same die?

~~~
snuxoll
HBM won't be on-die, but it will be on-package - HBM relies on chip stacking
to get the desired throughput in a small surface area. Regardless, the latency
and throughput would stomp system DRAM something awful, and if it's a proper
L4 cache then the CPU would benefit as well.

IBM does something similar (though not for graphics) in recent POWER CPUs with
the Centaur memory controller(s): they are off-chip memory controllers with a
bunch of eDRAM to act as an L4 cache (though the difference there is that each
system has multiple Centaur controllers handling different DIMM slots).
They're able to burst to ~96GB/sec to _system_ memory using this, so having a
good amount of on-package HBM would probably yield similar gains :)

------
tcoppi
The timeframe is slightly disappointing since I think a lot of people were
expecting Q3/Q4 2016.

The architecture itself sounds pretty much like what everyone was expecting, a
traditional fat and wide core. Their power management and foundry process will
probably make the difference as to whether final performance is impressive or
not, and may also be the cause of the delay.

~~~
BlackMonday
AMD has stated for a while (since March or so) that they may have small
shipments in December, but the bulk of shipments will really only start in
Q1 2017.

Anyway, the first benchmark is promising, and I hope Zen can also keep up with
Broadwell performance in other benchmarks/workloads, as well as in power
efficiency.

------
maerek
As someone who is fascinated by articles like this one, but doesn't have a
background in CE/EE, any recommendations for literature/classes I could take
so that I can better understand the topics being discussed?

~~~
tcoppi
A computer architecture class. For books, [1] is what you will probably use in
any decent computer architecture class, and [2] is a good read from a more
general audience perspective, if a bit dated.

1. https://www.amazon.com/Computer-Architecture-Fifth-Quantitative-Approach/dp/012383872X/ref=sr_1_1?ie=UTF8&qid=1471536721&sr=8-1&keywords=hennessy+and+patterson

2. https://www.amazon.com/Inside-Machine-Introduction-Microprocessors-Architecture/dp/1593276680/ref=sr_1_1?ie=UTF8&qid=1471536789&sr=8-1&keywords=inside+the+machine

~~~
deaddodo
I would argue that this is the best:
https://www.amazon.com/dp/B00HCLUL5O/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1

Albeit slightly older and _very_ technical.

------
mark-r
I've been a big booster of AMD for a long time, but recently the
performance/power is so much in Intel's favor that I've been forced to use
Intel for my last couple of PCs. I hope Zen makes them competitive again.

------
Zardoz84
The most important thing for me: do Zen cores have the AMD equivalent of Intel
AMT? (I don't remember the name.)

If it has it, I would avoid it like the plague, and get an FX-8370 or 8350 to
replace my now aging FX-4100. The last thing that I want on my computer is a
hidden, uncontrollable CPU doing things that could affect my privacy.

~~~
milcron
Unfortunately, it seems impossible to acquire a modern x86/x64 chip without
such hidden firmware. The last Intel CPUs without it are from 2008, and the
last AMD CPUs without it are from 2013.

If you can tolerate using a different CPU architecture, Raptor Engineering's
Talos Secure Workstation looks very intriguing.
[https://www.raptorengineering.com/TALOS/prerelease.php](https://www.raptorengineering.com/TALOS/prerelease.php)

~~~
rasz_pl
>last Intel CPUs without it are from 2008

and those have a CPU-wide ring 0 escalation bug:
https://www.blackhat.com/docs/us-15/materials/us-15-Domas-The-Memory-Sinkhole-Unleashing-An-x86-Design-Flaw-Allowing-Universal-Privilege-Escalation.pdf

------
bitL
I just wish AMD made drivers for Win 7 as well - then I could switch from
4-core 4790k/32GB to 8-core ZEN/64GB ECC and keep using all the Adobe video
editing stuff.

~~~
clevernickname
Citation needed. There would have been a huge stink if AMD had stopped
supporting Windows 7 already.

~~~
bitL
http://techreport.com/news/29611/win10-will-be-the-only-windows-supported-on-next-gen-hardware

~~~
gruez
The article states that it only applies to new CPUs/APUs. I also vaguely
remember that Intel is doing the same.

~~~
bitL
> only applies to new CPUs/APUs

i.e. Zen

------
akerro
Should I wait for Zen or buy i7 now?

~~~
theandrewbailey
Depends on what you have and how urgently you need the greater speed.

My main system is still running an i7-2600 from over 5 years ago. That GTX 680
I have in there is still plenty fast. The upgrade question is: how pretty do I
want Star Citizen to be?

~~~
akerro
I would like to build a PC that would compile stuff quickly... Android, Java,
Spring/Hibernate, some Rust and JS recently. Currently it takes a few minutes
to build any of my projects on the laptop I have. I think more physical cores
would help most?

~~~
mutagen
First question: does that laptop have an SSD? Most developer machines do these
days; that's the single biggest improvement to build times you can make. Then
look at CPU, IO, and memory utilization during builds to see where
improvements can be had.

~~~
akerro
Yes, it comes with a top-of-the-line SSD and fast RAM.

~~~
theandrewbailey
Is that an NVMe SSD or a SATA SSD? NVMe is about 4 times faster than SATA in
terms of pure transfer bandwidth, though I'm not sure about random access
speeds.

~~~
bitL
NVMe makes sense only if you are either pushing bandwidth limits (e.g.
processing large RAW 14-bit 4K/8K video on a scratch drive) or have hundreds
of threads doing concurrent I/O operations. In the real world, you are barely
going to notice any difference between SATA2 and SATA3 SSDs, not to mention
M.2 PCIe ones.

~~~
imtringued
With SATA2 you might as well just use an HDD. Of course, I'm assuming you
actually optimized how your data is laid out on the storage medium to take
advantage of the sequential read speed of your HDD, SSD or even RAM. Even my
HDD usually reaches 160MB/s, so a SATA2-connected SSD is only twice as fast at
four to six times the cost. Yes, SSDs are better at IOPS, but an application
that heavily depends on IOPS is often a result of poor design.

http://media.bestofmicro.com/Q/0/378072/original/AS-SSD_Sequential.png

~~~
bitL
For real-world use, reaching speeds of 250MB/s while having significantly
reduced latency compared to an HDD, as well as read/write IOPS >60k, is what
gives you the snappy feel of SSDs. If you are unlucky and have e.g. SanDisk
G25 SSDs with write IOPS in the 10k range, you'd barely notice any difference
from a fast HDD. But if you get even an SSD in a USB stick like the SanDisk
Extreme 64GB with reasonable IOPS, you can install OS X/Linux/Windows there
and it would give you that snappy feel. Bandwidth beyond a certain threshold
is not what gives you the snappiness. If you take NVMe that reaches 3000MB/s
compared to SATA3 with 550MB/s, boot time is reduced by maybe 1s - would you
really want to pay 2x the price when that is the only benefit you'd notice? Or
so that starting your app takes 0.63s instead of 0.67s?

Seriously, invest in NVMe if you are a video producer (I can't imagine
processing my 4K movies on a SATA SSD or HDD; even 24fps playback on a SATA
SSD can't happen in RAW format as it needs >1MB/s) or you do some heavy I/O
server stuff. If you don't do any of the above, invest your $ into capacity
instead, i.e. given a 512GB NVMe vs a 1TB M.2 SATA drive, I'd go with the 1TB
one.

~~~
majewsky
> [4K movie] 24fps playback [...] in RAW format [...] needs > 1 MB/s

Actually, much more than that: 3840 * 2160 * 3 bytes * 24 Hz = 569.53125
MiB/s.

~~~
bitL
4K CINE 14-bit 24fps RAW: 4096 * 2160 * 5.25 bytes * 24 Hz = 1063.125 MiB/s
;-) I should have written >1GB/s in the previous comment :-D
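Both back-of-envelope figures check out; in Python:

```python
MiB = 1024 ** 2

def raw_video_bw(width: int, height: int, bytes_per_pixel: float, fps: int) -> float:
    """Uncompressed video bandwidth in MiB/s."""
    return width * height * bytes_per_pixel * fps / MiB

# 8-bit UHD (3 bytes/pixel), as in the parent comment:
uhd = raw_video_bw(3840, 2160, 3, 24)

# 14-bit 4K CINE: 3 channels * 14 bits = 42 bits = 5.25 bytes/pixel:
cine = raw_video_bw(4096, 2160, 5.25, 24)

print(uhd, cine)  # 569.53125 1063.125
```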

------
bitL
Am I the only person the Zen logo makes cringe? They shouldn't copy the Intel
Inside logo there. Really bad taste...

~~~
evanriley
It's not copying the Intel Inside logo. Although I'll agree they look
_similar_, it's most likely based on the Ensō [0], which is part of _Zen_
Buddhism.

[0]
[https://en.wikipedia.org/wiki/Ens%C5%8D](https://en.wikipedia.org/wiki/Ens%C5%8D)

