
AMD Tackles Coming “Chiplet” Revolution With New Chip Network Scheme - vezycash
https://spectrum.ieee.org/tech-talk/semiconductors/design/amd-tackles-coming-chiplet-revolution-with-new-chip-network-scheme
======
dragontamer
The main issue AMD seems to be solving here is the yields at 7nm and lower
processes.

Smaller chips mean better yield in the presence of defects. Intel
builds relatively large ~600mm^2 chips (like the XCC, aka the 28-core Xeon),
but AMD thinks the future is networks of ~200mm^2 chips, like what
they've done with Zen / Threadripper / EPYC.
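
Back-of-envelope, the yield argument can be sketched with a simple Poisson defect model. The defect density below is a made-up illustrative number, not a real TSMC or Intel figure:

```python
import math

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects under a simple Poisson defect model."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)

D0 = 0.2  # hypothetical defect density (defects/cm^2) -- illustrative only

big   = poisson_yield(600, D0)  # large monolithic die (Intel XCC class)
small = poisson_yield(200, D0)  # small chiplet-class die (Zeppelin class)

print(f"600 mm^2 die yield: {big:.0%}")    # ~30%
print(f"200 mm^2 die yield: {small:.0%}")  # ~67%
```

With the same total silicon area, more than twice as large a fraction of the wafer comes out usable when it's cut into 200mm^2 dies. That is the whole chiplet bet.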

The advantage for AMD is that they've built a single design: the Zeppelin die.
RyZen is simply one Zeppelin. Threadripper is two Zeppelins. And EPYC is four
Zeppelins.

That's it. One singular chip design, mass produced over and over again, to
handle AMD's entire consumer and high-end line. Keep this one design small to
help yields and maybe AMD can get a process advantage over Intel's larger
designs.

AMD's "mobile" or "APU" line is Raven Ridge (a 2nd design at 193mm^2) that
doesn't use this.

\----------------

The above is the current status quo. The "active interposer" that AMD is
developing in the article would go well beyond it in terms of integration.

Note that HBM2 (next-generation high-bandwidth RAM) requires an interposer; a
PCB is not good enough for HBM's signalling. Ditto for Hybrid Memory Cube (a
competing standard). So it seems like the future of computer parts will be the
interposer.

The interposer isn't necessary for AMD's CPU strategy, however, so the roadmap
for this network probably won't land until 2020 or later (for CPUs). Unless
AMD is building this network for their GPU line? (But that roadmap is also
past 2020.) I bet this is all research-and-development, and may never come out
as a commercial product.

~~~
paulmd
> The advantage for AMD is that they've built a single design: the Zeppelin
> die. RyZen is simply one Zeppelin. Threadripper is two Zeppelins. And EPYC
> is four Zeppelins.

While a popular meme, this is not actually true. Epyc is actually a totally
different die, stepping B2 vs the B1 die used in Ryzen+TR.

[https://en.wikichip.org/wiki/amd/ryzen_7/1800x](https://en.wikichip.org/wiki/amd/ryzen_7/1800x)

[https://en.wikichip.org/wiki/amd/epyc/7601](https://en.wikichip.org/wiki/amd/epyc/7601)

The 2700X is actually on a different die as well, and Raven Ridge on another.
There will probably be another die for Banded Kestrel, if AMD ever gets around
to releasing that. Presumably, their embedded SOC products are their own die
as well.

So, about 5 dies per generation, across AMD's lineup (Epyc, Ryzen, APU, Atom,
and embedded SOC). They're using about half as many dies as Intel is - still a
significant difference, but far from the "one die for the whole lineup!" meme.

The big difference is that they're serving the whole server market with one
small die, vs the three that Intel uses.

Of course, a small die isn't all roses either - both mfrs limit you to 8 dies
per system, so right now with an 8-core die AMD systems are limited to 64-core
systems (dual-socket Epyc) versus the 224-core systems that you can do with
octo-socket 28-core Xeons. But, not everybody needs a million-dollar octo-
socket system either.
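
The core-count ceilings above, spelled out:

```python
# Max cores per system under each vendor's 8-die limit, as described above.
# Both cap out at 8 dies, but Intel's dies are monolithic 28-core parts.
epyc_max = 2 * 4 * 8    # 2 sockets x 4 dies/socket x 8 cores/die
xeon_max = 8 * 1 * 28   # 8 sockets x 1 die/socket  x 28 cores/die

print(epyc_max, xeon_max)  # 64 224
```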

Once AMD takes a node advantage, that core-count gap will be diminished
somewhat, but Intel's 10nm woes are a whole different story ;)

~~~
pocak
Are newer Ryzen 1000 series CPUs still made with the B1 die?

I thought steppings were for fixing errata, and once a new revision is
qualified, the old one is no longer manufactured.

~~~
paulmd
All the Ryzen 1000 processors (including TR) were B1, at least as far as AMD
told the USB-IF (and AFAIK nobody ever observed anything else in the wild).
Epyc is on B2.

At the time, there was some speculation that B2 might be the "mirror image"
die that Epyc uses 2 of.

[http://www.usb.org/kcompliance/view/catalog_search/results_b...](http://www.usb.org/kcompliance/view/catalog_search/results_browse/process?step%3Aint=2&keywords=ryzen&submit=Go)

[http://www.usb.org/kcompliance/view/view_item?item_key=88142...](http://www.usb.org/kcompliance/view/view_item?item_key=88142d65cbedb30585943f2aa430c9630d7e9da0&referring_url=/kcompliance)

However, they now report Pinnacle Ridge (Ryzen 2000) as being on the B2
stepping.

Ryzen 2000 is on a slightly updated 12nm process but does not incorporate any
library changes. I'm not sure if you could just drop the existing die onto the
new process (given that 12nm is really a 14+), or if you could un-flip the die
to the proper pinout using the substrate, but it seems like that might be
where they moved to the B2 stepping.

But yeah, AMD produced the B1 stepping for an uncommonly long time: at least
through the end of 2017, even though the first B2 steppings showed up around
June 2017.

------
BooneJS
In the beginning, a sea of discrete components made up a system. Investment in
fab technology caused process nodes to shrink every 18 months, so these discrete
components gave way to the System-On-Chip where the board of chips was
replaced by a single chip.

Now physics is harder to overcome, the cost of development at the bleeding
edge of technology is higher than ever, and the continued desire for larger
and larger systems caused the SoC to break apart again. It’ll be interesting
to see if this is what the future looks like for silicon-based chips, or if
this is a temporary shortcut.

~~~
dragontamer
I'm not sure the SoC is actually breaking apart. I think chipmakers are
figuring out that it's more efficient to combine some dies together at the
package level.

Consider that EPYC is basically a miniaturized multi-socket design. Infinity
Fabric is really AMD's new protocol built on top of HyperTransport (their
multi-socket protocol from the past). AMD used to support 8 sockets; today,
AMD stitches 4 chips together and supports only 2 sockets.

From a software perspective (i.e. NUMA), a 2-socket EPYC looks like an
8-socket system of old. In effect, AMD has miniaturized the 4-socket setup in
the form of EPYC, and the 2-socket setup in the form of Threadripper.

\----------------------

These Threadripper / EPYC chips have the same downsides as the old 2x, 4x, and
8x NUMA designs of the past: high latency and poor communication between cores
on different dies.

The thing is: the modern environment is a highly virtualized, highly
independent set of systems. Running 8x NUMA efficiently today is as simple as
spinning up 8x VMs, one for each NUMA node.
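
A sketch of that idea in Python on Linux, using `os.sched_setaffinity`. The node-to-CPU mapping here is hypothetical; real layouts come from `numactl --hardware` or `/sys/devices/system/node`:

```python
import os

# Hypothetical topology: 8 NUMA nodes x 8 logical CPUs each, roughly what a
# 2-socket first-gen Epyc looks like with SMT off. Illustrative only.
NODES = {node: set(range(node * 8, (node + 1) * 8)) for node in range(8)}

def pin_to_node(node):
    """Pin the calling process to one node's CPUs (Linux only). With the
    kernel's first-touch policy, its memory then lands in local DRAM."""
    os.sched_setaffinity(0, NODES[node])

# Each VM or worker process would call pin_to_node(n) for its own n at
# startup, keeping its traffic off the cross-die Infinity Fabric links.
```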

IIRC, people are finding that Intel's 28-core design is far more effective in,
say, unified database performance. Intel's design has a true L3 cache shared
by all 28 cores, while AMD's L3 is split between the dies (and the CCXs within
them). A collection of separate 8MB cache slices cannot function as a singular
cache in a large-scale database application.

But there are enough situations (e.g. VMs, multitasking, render farms) where
AMD's NUMA + Infinity Fabric is good enough. And at anywhere from 1/4 to 1/2
the price of Intel, AMD's chips these days are certainly worth considering.

------
bokchoi
It reminds me of the GreenArrays chip created by Chuck Moore. There was some
hubbub about it here on HN a few years ago; whatever happened to it?

[http://www.greenarraychips.com/](http://www.greenarraychips.com/)

~~~
corysama
It certainly does seem like chip design is marching grudgingly from Core i7 to
Cell BE and eventually to Connection Machine. Physics doesn’t really care
about ease of programming.

[https://en.m.wikipedia.org/wiki/Cell_(microprocessor)](https://en.m.wikipedia.org/wiki/Cell_\(microprocessor\))

[https://en.m.wikipedia.org/wiki/Connection_Machine](https://en.m.wikipedia.org/wiki/Connection_Machine)

~~~
pjmlp
The big difference is that the Connection Machine enjoyed being developed in
programming languages that were better suited to distributed computing,
whereas with current chip designs we still need to drag systems developers
away from C.

------
Quequau
I think it would be interesting to see someone like Mellanox make a chiplet
with their tech which could be fully integrated into an AMD SoC or APU or
whatever they're calling them now.

~~~
greglindahl
Check out Intel Omni-Path, it's a separate chip on-package for Intel's high
end cpus. 100 gigabit network.

~~~
Quequau
I can't afford Intel's high-end stuff. That's why I'm looking at AMD.

~~~
frozenport
Really, the network is the least expensive part. Getting I/O that can saturate
a 10G link costs around $20k, while the InfiniBand card sets you back less
than $4k. Now you're talking about 100G, which goes even faster; you could
easily be looking at a $500k box of SSDs.

~~~
davrosthedalek
Wait, what? 10G is 1.25 GB/s; you can get that from a single SSD easily.
100G is 12.5 GB/s, so five consumer-level SSDs. Squeezing 12.5 GB/s over the
buses twice might be tricky, but certainly not a $500k problem.

~~~
dragontamer
>To get IO that can saturate a 10G costs around 20k

Assuming 10G Ethernet is 8b/10b like a lot of other protocols, that's 1GB/s
over 10G Ethernet.

Here's a $130 SSD with 500GB of storage: [https://www.amazon.com/Mushkin-
PILOT-500GB-Internal-MKNSSDPL...](https://www.amazon.com/Mushkin-PILOT-500GB-
Internal-MKNSSDPL500GB-D8/dp/B07CYJ4GS3)

That's 2600 MB/s read speeds. Or more than double your 10G Ethernet.

\------------

RAID0 eight of them together with a pair of ASRock's Ultra Quad M.2 cards, and
you've got $1040 of SSDs + $200 for the two cards: roughly $1250 for 20GB/s
read/write speeds. More than enough to saturate any network I'm aware of.
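
Back-of-envelope, assuming ideal striping and ignoring PCIe lane limits and filesystem overhead:

```python
per_drive_gbs = 2.6        # sequential read of the drive cited above, GB/s
drives = 8
raid0_gbs = per_drive_gbs * drives   # ideal 8-way striping, no bus limits

ten_gbe_gbs     = 10 / 8   # 1.25 GB/s of payload
hundred_gbe_gbs = 100 / 8  # 12.5 GB/s of payload

print(f"8-drive RAID0: {raid0_gbs:.1f} GB/s")              # 20.8 GB/s
print(f"saturates 100GbE? {raid0_gbs > hundred_gbe_gbs}")  # True
```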

In fact, someone has already done this: [http://www.guru3d.com/news-
story/eight-nvme-m2-ssds-in-raid-...](http://www.guru3d.com/news-story/eight-
nvme-m2-ssds-in-raid-on-x399-threadripper-reach-28-gbs.html)

They used a higher-end NVMe SSD and measured 28GB/s (that's capital B,
gigaBYTES) on the Threadripper + X399 motherboard.

~~~
greglindahl
10 gigabit Ethernet is 10 gigabits of data per second. It's mainly InfiniBand
that used the horrible marketing tactic of quoting the signal rate instead of
the data rate.

~~~
dragontamer
PCIe 2.0, USB 3.0, SATA/AHCI, and more protocols are 8b/10b, so all of these
protocols carry 10 bits per byte on the wire.

Modern protocols tend to be 64b/66b or better. That's why I said "assuming
8b/10b"; it's hard to remember which protocols are which.

Apparently I'm wrong: 10G Ethernet is a more modern 64b/66b in any case.
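
The difference between the two encoding generations, assuming a nominal 10 Gb/s line rate:

```python
line_gbps = 10.0  # raw signalling rate on the wire

payload_8b10b  = line_gbps * 8 / 10   # 8b/10b:  20% overhead
payload_64b66b = line_gbps * 64 / 66  # 64b/66b: ~3% overhead

print(f"8b/10b : {payload_8b10b:.2f} Gb/s = {payload_8b10b / 8:.3f} GB/s")
print(f"64b/66b: {payload_64b66b:.2f} Gb/s = {payload_64b66b / 8:.3f} GB/s")
```

(In fact 10GBASE-R signals at 10.3125 GBd precisely so that the post-encoding data rate is a full 10 Gb/s, i.e. 1.25 GB/s; Ethernet quotes the data rate, not the line rate.)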

~~~
greglindahl
10 gigabit ethernet is not 1 GB/s. It's 10 gigabits of data per second, or
1.25 gigabytes per second. The encoding is not an issue with these data rate
numbers because Ethernet quotes their data rate as a data rate.

------
taneq
Sounds like it's just the next step in the chain from discrete components ->
ICs on a circuit board -> this. The active interposer is filling the role that
the circuit board currently fills, with devices etched into the interposer
filling the role of discrete components, making everything more compact.

------
gigatexal
Any hardware gurus out there care to talk about how this helps? I guess having
a flat pool of heterogeneous resources is nice. As long as there’s a decent
SDK that abstracts the hard stuff away I’m all for it.

~~~
tzahola
>As long as there’s a decent SDK that abstracts the hard stuff away

I'd be very skeptical of that. See: Cell Broadband Engine.

~~~
garmaine
Pointing to one example of someone failing to do something in the past is not
strong evidence that it won’t happen this time. If anything, there were
lessons learned from that failure.

~~~
gigatexal
That’s very true, too.

------
berbec
Is the Big Deal about this having what used to be different cards/chips
sharing a CPU-style insanely fast bus, instead of trickling stuff over PCIe or
DRAM channels? If that is the case, the advantage will depend on the amount of
bus saturation to be eliminated. Should be interesting. Everything works on
Infinity Fabric!

~~~
nrp
That’s the advantage over having multiple packaged chips on a traditional PCB.
The advantage over an SoC is that you can have different subsystems on
decoupled development schedules and different process nodes all come together
on the same “chip.”

------
sixdimensional
I wonder if FPGAs could be a node too. That would allow us to mix
programmable, highly parallel analog modeled acceleration onto the same high
speed / direct connect bus as all the other fixed, traditional computing
components. I really like the approach AMD is suggesting here, treating it
like nodes on a network.

~~~
greglindahl
You've been able to buy FPGAs tightly coupled with AMD CPUs since 2006 or so.
The tech back then was to either plug them into an HTX Hypertransport slot, or
in a cpu socket. Very few customers actually wanted to buy these things and I
think all of the makers lost money.

~~~
daveguy
They're definitely missing an application, one that probably won't be coming
until specialized chip designs are a commodity (e.g. agi+). Right now you can
get specialized chips for much less than the cost of a programmable chip +
custom design. Once a company identifies a kick-ass FPGA design (and it has
any decent market), they move to an ASIC to drive down costs. I guess if FPGA
costs were as low as an ASIC's it would be viable, since you could change the
application depending on current needs. But currently FPGAs are 10-100x the
cost of ASICs.

------
tormeh
What's the difference between chiplets and SoCs?

~~~
taneq
The chiplets are separate pieces of silicon linked together by a larger chip,
whereas SOCs are etched onto one large piece of silicon?

~~~
digi_owl
Sounds somewhat similar to what Intel started offering recently in
collaboration with AMD: basically an Intel CPU and an AMD GPU in a single
package, for use in laptops that need something beefier than Intel's
integrated GPU but don't have the volume allowance for a full GPU card.

~~~
zitterbewegung
They also have a budget Ryzen that does the same thing but with their own Vega
graphics. I am using one now for web development and distributed ledger
development.

~~~
kdmytro
> ... and distributed ledger development.

You are careful not to say "blockchain" :)

~~~
jacoblambda
Just because something is a distributed ledger does not necessarily mean it is
a blockchain.

Yes, they are a dev in the cryptocurrency/blockchain space but still, who
knows?

------
etaioinshrdlu
I remember Intel being slightly mocked when they put 2 dual core dies on a
package and called it a Core 2 Quad.

------
shmerl
Will this still allow modular builds, or will it require buying a single board
with everything soldered to it?

~~~
planteen
My guess is modular will still be an option, if not the only option. x86 CPUs
have really been little PCBs for over 20 years. You're now starting to see
SoMs take off in the embedded world; they combine the CPU and RAM on one
module to simplify board layout. In the past, these would have been separate
components that customers laid down themselves on their custom PCB.

------
faragon
"The AMD team found that deadlocks on active interposers basically disappear
if you follow a few simple rules when designing on-chip networks"

I would like to know how they solved that problem. Is there any public paper
or patent explaining that?
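
The article doesn't spell the rules out, but a classic example of the genre is dimension-ordered (XY) routing on a 2D mesh: always finish routing in the X dimension before turning into Y. That restriction forbids half of the possible turns, so no cyclic channel dependency (and hence no deadlock) can form. A minimal sketch, purely illustrative and not AMD's actual scheme:

```python
def xy_route(src, dst):
    """Hops from src to dst on a 2D mesh using dimension-ordered routing:
    travel fully along X first, then along Y. Forbidding Y->X turns breaks
    every potential cycle of channel dependencies, so no set of packets can
    ever deadlock waiting on each other's buffers."""
    x, y = src
    dx, dy = dst
    path = []
    while x != dx:                 # phase 1: X dimension only
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # phase 2: then Y dimension only
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))  # [(1, 0), (2, 0), (2, 1)]
```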

------
m3kw9
Similar to project ara?

------
KenanSulayman
Reminds me of this video from the Cisco TechWise Fundamental series:
[https://youtube.com/watch?v=l75B6D9xyMQL](https://youtube.com/watch?v=l75B6D9xyMQL)
(“Fundamentals of Software-Defined Networking”)

That’s obviously not a hardware development, but I feel like the motivation
may be similar: make components more modular; stabilize, standardize and align
their interfaces.

By making these components “plug and play” the distance between a logical flow
chart and the actual implementation is somewhat reduced, making the
development of custom components more efficient and agile.

