

AMD reveals Steamroller CPU architecture details - geoffgasior
http://techreport.com/articles.x/23485
At the Hot Chips conference today, AMD CTO Mark Papermaster revealed details about the company's next-gen Steamroller CPU architecture.  Steamroller promises to address Bulldozer's shortcomings, and AMD expects a 30% increase in instructions per clock.
======
incision
I'd really like to know what happened at AMD between the great success of the
original Athlon [1] and their recent struggles [2].

1: <http://en.wikipedia.org/wiki/Athlon#Athlon_.22Classic.22>

2: <http://semiaccurate.com/2011/10/17/why-did-bulldozer-underwhelm/>

~~~
reitzensteinm
A lot of it is down to the Pentium 4 turning out to be a dead end. Intel
designed it to scale to 10 GHz. That turned out not to be possible, so all
the design sacrifices that were made (primarily a ridiculously long pipeline)
turned out to be bad bets.

When Intel introduced Core 2 Duo, performance per clock in many cases
_doubled_, on the _same socket and process node_. I'm unaware of a precedent
for that, at least in recent history.

Then Intel a couple of years later rolled out Nehalem, with an integrated
memory controller and hyperthreading, cementing their advantage in the server
market. AMD has been playing catch up ever since.

If Intel's chips were half the performance today, AMD would be winning; though
not by quite as much.

Core 2 Duo review (compare it to the Pentium D, which is dual core):
<http://www.anandtech.com/show/2045/11>

i7 3770k review (the FX 8150 is AMD's flagship 8 core Bulldozer CPU):
<http://www.anandtech.com/show/5771/the-intel-ivy-bridge-core-i7-3770k-review/6>

~~~
lsc
>Then Intel a couple of years later rolled out Nehalem, with an integrated
memory controller and hyperthreading, cementing their advantage in the server
market. AMD has been playing catch up ever since.

Before Intel essentially copied HyperTransport (HT) from AMD with their QPI
(I believe the Nehalems were the first QPI Xeons), AMD servers nearly always
drew dramatically less power than Xeon-based servers using FB-DIMMs.

Also note, in the early days of hyperthreading, it was a great way to run your
two active processes on one core while your second core sat idle. My
understanding is that even now, in the best case, it's not a particularly huge
boost.

I mean, yeah; between the release of the QPI Xeons and now, for most things,
Intel has had the superior chip. But before QPI? Man, if you paid for your own
power, AMD was dramatically superior for high-RAM applications.

~~~
justincormack
Hyperthreading is better now. They even had to add another instruction to fix
some of the issues. Plus a lot of work on the scheduler. Not worth disabling
now!

------
tmurray
Looks like too little too late. If Steamroller is still a year or more out,
that will put it in competition with Haswell (DDR4 in the server part, AVX2
with FMA ops, etc.). Considering how much Ivy Bridge is already dominating and
those kinds of memory bandwidth and FP throughput increases, Haswell looks
like it will be a monster.

------
programminggeek
I don't follow AMD's stuff much beyond knowing that Bulldozer was not very
competitive beyond the $100-120 price point. Is Steamroller set to change that
or is AMD dying a long, slow death?

Also, is AMD going to do anything in mobile or is that all ARM moving forward?

~~~
gauravk92
ARM is introducing 64-bit chipsets in less than a year or so. The power
savings will be worth the switch for cloud hosts. Most devices sold already
run ARM.

~~~
tmurray
Why do you assume ARM-based processors will necessarily have a higher perf/W
than x86? This has been a common claim (usually because of the perceived size
of the x86 instruction decoder), but Medfield has proven that to be false:

<http://www.anandtech.com/show/5770/lava-xolo-x900-review-the-first-intel-medfield-phone/>

~~~
ajross
Benchmarks and lies, yada yada. Atom wins on some things (in particular it
tends to kick the A9's butt on JavaScript benchmarks, which rely on
single-threaded dispatch and high clock speeds) and loses on others (it's a
single core with hyperthreading, where most ARM SoCs are dual core).

Actually depending on the benchmark, the low-clocked Ivy Bridge CPUs tend to
do quite well in "performance per watt" vs. ARM SoCs too. They lose big in
idle power, but under load those enormous L3 caches and the uOp cache can give
them 2-3x the performance per clock of the in-order A9 (and they run about 2x
as fast, and draw about 4-10x as much power at peak, so it actually comes out
very (!) roughly even).
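
To make the back-of-the-envelope arithmetic explicit, here is a minimal sketch that just multiplies out the rough ratios quoted above. The function name and the choice to take the extremes of each range are assumptions for illustration; the inputs are the ballpark figures from this comment, not measurements.

```python
def perf_per_watt_ratio(per_clock_ratio, clock_ratio, power_ratio):
    """Relative performance per watt of one chip versus another.

    All three arguments are ratios (chip X divided by chip Y).
    """
    return per_clock_ratio * clock_ratio / power_ratio


# Ballpark ratios from the comment (Ivy Bridge relative to A9), not measured data.
per_clock_range = (2.0, 3.0)   # 2-3x the performance per clock
clock_ratio = 2.0              # runs about 2x as fast
power_range = (4.0, 10.0)      # draws about 4-10x as much power at peak

# Pessimistic case: lowest per-clock gain, highest power draw.
worst = perf_per_watt_ratio(per_clock_range[0], clock_ratio, power_range[1])
# Optimistic case: highest per-clock gain, lowest power draw.
best = perf_per_watt_ratio(per_clock_range[1], clock_ratio, power_range[0])

print(f"perf/W ratio: {worst:.2f}x to {best:.2f}x")
```

With these inputs the ratio spans roughly 0.4x to 1.5x, which brackets 1.0 and is consistent with "very roughly even".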

ARM has a long way to go before they are legitimately competitive in the
server space. But Intel still isn't anything more than "broadly competitive
with 2-year-old devices" in the mobile world. Over time I'd expect the
architectures to converge from both directions, but I don't feel lucky enough
to guess at which one will "win".

~~~
FrankBooth
A9 is out-of-order.

------
dkhenry
It doesn't matter much if they can keep making incremental improvements at
this point. They really need to increase their iteration speed. This update
should have been out last year.

