

The Pentium 4 and the G4e: An Architectural Comparison (2001) - CoolGuySteve
http://arstechnica.com/features/2001/05/p4andg4e/

======
frozenport
It was hard to tell who would win.

History tells us that the x86 architecture won, but the P4 lost out to AMD and
later the Intel Core line. Furthermore, PowerPC is still used. From the
article, I can't tell which of these architectures would have trouble scaling
clock speed and power consumption. AMD getting almost 50% of the market
share was equally shocking.

Lastly, there is the lingering thought that Windows market dominance
facilitated investment that solved the shortcomings of x86, while G4 lingered.

I think the winners and losers were decided by factors other than
architecture. x86 beat Power because of Windows, and Intel floated on their
past success for a few years until they made a better architecture.

~~~
userbinator
PowerPC at the high end still exists in the form of POWER8 for servers, but
they're very expensive and don't perform all that well, either in absolute
terms or in energy efficiency - it looks like they've gone down the same path
as the P4 for the POWER series, with emphasis on high clock frequency and
(extreme) hyperthreading - resulting in ~200W TDP and frequencies in the
4-5GHz range.

~~~
wmf
4-5 GHz is actually a low frequency today; a NetBurst/Power6-like processor
would be running at 8-10 GHz today while consuming 500 W or more (if it were
possible).

~~~
xchaotic
Given that we're speaking hypothetically (since physics gets in the way), I'm
not sure why you're saying Netburst would be consuming 500W - as the
manufacturing processes shrink, so does the power consumption.

~~~
Retric
Last few die shrinks have done little for power consumption.

~~~
tadfisher
It's because they keep shoving those pesky transistors in there.

~~~
Retric
There is some advantage: going from 32 nm Sandy Bridge to 22 nm Ivy Bridge,
they took the 2700K-class chip from 95 W to 77 W.

That seems like OK progress, but at the high end:

Sandy Bridge: 32 nm Core i7 3820 (4 cores 10 MB cache) @ 3.6 GHz = 130W

Ivy Bridge: 22 nm Core i7 4820K (4 cores 10 MB cache) @ 3.7 GHz = 130W

So you gained 0.1 GHz. Granted, they're not identical, but they have very
similar transistor counts.
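
As a rough check on the figures quoted above, the arithmetic can be sketched in a few lines. The wattages and clock speeds are the ones given in this comment; the helper name `percent_reduction` is made up for illustration.

```python
# TDP and clock figures as quoted in the comment above.
def percent_reduction(before_w: float, after_w: float) -> float:
    """Return the percentage drop going from before_w to after_w."""
    return (before_w - after_w) / before_w * 100.0

mainstream = percent_reduction(95, 77)    # 2700K-class part, 32 nm -> 22 nm
high_end = percent_reduction(130, 130)    # i7 3820 -> i7 4820K
clock_gain = (3.7 - 3.6) / 3.6 * 100.0    # clock increase at the same 130 W

print(f"mainstream TDP drop: {mainstream:.1f}%")
print(f"high-end TDP drop:   {high_end:.1f}%")
print(f"high-end clock gain: {clock_gain:.1f}%")
```

So the mainstream part drops a little under 19% in rated power, while the high-end part keeps the same rating and gains under 3% in clock speed.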

------
duskwuff
For what it's worth, while Intel did win the architecture wars here, the "deep
and narrow" pipeline being described here died out with Netburst. The Core
microarchitecture that replaced it (which is the predecessor of today's Intel
CPUs) used a much more "wide and shallow" pipeline, and benefited greatly from
it.

~~~
userbinator
Indeed, the P4 was a very odd CPU to program and optimise for - in some ways,
it's the "most RISC-like" microarchitecture Intel has attempted. It was far
more sensitive to things like instruction alignment and branch prediction than
its successors and predecessors, and likely in the pursuit of higher clock
frequencies, some instructions (e.g. shifts/rotates) were made several times
slower. This meant near-optimal code sequences for the P6 family, and often
for earlier CPUs too, would perform horribly on the P4, and vice-versa. It
could beat the PIII in "straight-line" execution of simple integer
instructions with no branches, but the PIII was faster (even at a lower clock
frequency) with more complex and branch-heavy code. It's probably the only x86
where a
significant speed advantage can be obtained by extreme loop unrolling, a
practice that is mostly counterproductive for post-Nehalem.
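
For anyone unfamiliar with the term, here is a minimal sketch of what loop unrolling looks like as a code transformation. Python is used only for readability; the speed effect described above applies to compiled machine code, so this shows the shape of the technique and nothing about P4 timing. The function names and the unroll factor of 4 are arbitrary choices for illustration.

```python
def sum_rolled(values):
    """Plain loop: one addition and one loop test per element."""
    total = 0
    for v in values:
        total += v
    return total

def sum_unrolled4(values):
    """Same sum with the loop body unrolled four times."""
    total = 0
    n = len(values)
    limit = n - n % 4          # largest multiple of 4 not exceeding n
    i = 0
    while i < limit:
        # Four elements handled per loop test and branch.
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    while i < n:               # handle the 0-3 leftover elements
        total += values[i]
        i += 1
    return total
```

The unrolled version executes fewer loop branches per element, which is where a branch-sensitive pipeline would see the benefit.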

One of the more amusing P4 oddities is that certain 32-bit add/sub
instructions will have a very slightly higher latency if there is a
carry/borrow between the two 16-bit halves - it's very difficult to detect (I
believe it's ~0.5 cycle), but it's there. This is probably a consequence of
pipelining in the ALU itself.
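
To make the condition concrete, here is a small sketch that models which inputs produce a carry out of the low 16-bit half of a 32-bit add. It only identifies the affected operand pairs; it does not measure or reproduce any timing, and the function name is hypothetical.

```python
from typing import Tuple

MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def add32_with_half_carry(a: int, b: int) -> Tuple[int, bool]:
    """Add two 32-bit values; also report a carry out of the low 16 bits."""
    low = (a & MASK16) + (b & MASK16)
    carry_into_high = low > MASK16       # low halves overflowed 16 bits
    result = (a + b) & MASK32            # wrap to 32 bits like the hardware
    return result, carry_into_high

print(add32_with_half_carry(0x0000FFFF, 1))  # low half overflows
print(add32_with_half_carry(0x00010000, 1))  # no carry between halves
```

Only the first call crosses the 16-bit boundary, so by the description above it is the kind of add that would take the slightly longer path.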

~~~
duskwuff
> One of the more amusing P4 oddities is that certain 32-bit add/sub
> instructions will have a very slightly higher latency if there is a
> carry/borrow between the two 16-bit halves

This kind of data-dependent delay gives crypto people hives, for what it's
worth. It's the sort of thing that can make timing attacks possible.
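
A software-level analogue of the same problem, as a hedged sketch: an early-exit comparison whose running time depends on the data, next to Python's standard-library `hmac.compare_digest`, which is designed to avoid that. The P4 quirk is in hardware and could not be fixed this way; `leaky_equal` and `constant_time_equal` are names invented for this example.

```python
import hmac

def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: time depends on where the inputs first differ."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False   # returns sooner the earlier the mismatch is
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison intended not to reveal the position of the first mismatch."""
    return hmac.compare_digest(a, b)
```

Both functions return the same answers; the difference is only in how much their timing tells an attacker about the secret.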

------
tambourine_man
I miss Jon Stokes's articles. They were the best thing about Ars, along with
Siracusa's writings.

~~~
CoolGuySteve
I must admit that while the article is fantastic (and inspired me to get into
low level development), I had an ulterior motive in posting it.

I wish Ars would bring back these in-depth architecture overviews. Maybe by
bringing traffic to them, Ars will notice there is still demand.

~~~
wmf
There's always [http://www.realworldtech.com/](http://www.realworldtech.com/)

~~~
fulafel
Unfortunately RWT's David Kanter was recently hired by MPR and announced a
near-hiatus from writing RWT in-depth articles.

------
danbruc
"For a look at two instructions as they travel through the G4e, check out this
animated GIF. Modem users should beware, though, because the GIF weighs in at
355K."

~~~
johnpowell
Funny thing is the CSS file for Ars weighs in at 381KB.

~~~
leedo__
That is because we embed fonts in the CSS file to cut down on HTTP requests.
It's about 100k without the fonts. Sure, it's still much larger than what
existed 10 years ago, but it's pretty standard these days.

Also note that we're gzipping, so the transmitted size is much smaller. And we
also correctly return 304 responses after the first request.
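
For readers who haven't dealt with either mechanism, here is a minimal sketch of both ideas: gzip shrinking a repetitive stylesheet, and a conditional request answered with 304 when the client's cached copy is still current. This is an illustration only, not the Ars server configuration; the function names, the sample CSS, and the use of a SHA-256 hash as the ETag are all assumptions.

```python
import gzip
import hashlib
from typing import Optional

def transmitted_size(body: bytes) -> int:
    """Size in bytes of the body after gzip compression."""
    return len(gzip.compress(body))

def make_etag(body: bytes) -> str:
    """Derive a validator from the content (hypothetical scheme)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'

def status_for(body: bytes, if_none_match: Optional[str]) -> int:
    """Return 304 if the client's cached ETag matches, else 200."""
    if if_none_match == make_etag(body):
        return 304   # client copy is current; no body needs to be sent
    return 200       # send the full (gzipped) body

css = b".headline{font-family:serif;color:#333}\n" * 1000
print(len(css), transmitted_size(css))
print(status_for(css, None), status_for(css, make_etag(css)))
```

The first request pays for the compressed body; repeat requests that send back the matching validator get the 304 and skip the body entirely.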

