

Intel's Ivy Bridge Architecture Exposed - yread
http://www.anandtech.com/show/4830/intels-ivy-bridge-architecture-exposed

======
Retric
I kept reading this and wondering why there was going to be so little
improvement when dropping down to 22nm. Turns out they are mostly just bumping
the GPU, which is useless to anyone with a discrete graphics card.

"Intel isn't disclosing the die split but there are more execution units this
round (16 up from 12 in SNB) so it would appear as if the GPU occupies a
greater percentage of the die than it did last generation. It's not near a
50/50 split yet, but it's continued indication that Intel is taking GPU
performance seriously."

~~~
eliben
What percentage of users _use_ a discrete graphics card, though? Especially on
ultra-light laptops?

~~~
mappu
And what percentage of Ivy Bridge users are going to be using ultra-light
laptops? It's a desktop part as well.

~~~
eliben
It is, but I would guess the majority of desktop users are in offices and will
be just fine with the on-die GPU (keep in mind it is pretty powerful - the
current ones are on par with medium-low range discrete cards, and Ivy Bridge's
are likely to be much better).

What is the real percentage of gamers and video editors who need high-end
GPUs?

~~~
zokier
Saying that current integrated GPUs are on par with medium-low range discrete
cards is still generous. Based on the article, the Ivy Bridge GPU is approx 60%
faster than Sandy Bridge's. I'd estimate that would put it near the performance
of a Radeon 6450, which is the weakest discrete card in the series.

That said, I agree that with Ivy Bridge, the GPU performance will be good
enough for a lot of uses.

------
hga
The thing I found most interesting was "supervisory mode execution protection
(SMEP)", AKA rings, available in 64-bit as well as 32-bit modes. This is
something that AMD punted on in their 64-bit macroarchitecture (which Intel was
then forced to copy) and is sorely missed for various low-level stuff.
Unfortunately the long 64-bit interregnum without rings means it will be a
long time before you can depend on others being able to run software that
depends on it.
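
For reference, a minimal sketch (my own, not from the article) of probing for
the feature from user space, assuming a reasonably recent GCC or Clang on x86:
SMEP is advertised in CPUID leaf 7, sub-leaf 0, EBX bit 7.

    #include <stdio.h>
    #include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */

    int main(void) {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 7, sub-leaf 0: structured extended feature flags.
           EBX bit 7 advertises SMEP.  __get_cpuid_count() returns 0 if
           the CPU doesn't implement leaf 7 at all. */
        if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)
            && (ebx & (1u << 7)))
            puts("SMEP supported");
        else
            puts("SMEP not supported (or CPUID leaf 7 absent)");
        return 0;
    }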

~~~
marshray
What's an example of something that depends on this?

~~~
hga
I remember that some things depended upon it before AMD came out with their
64-bit architecture, but I don't remember what they were.

One thing I do know it's useful for is allowing your GC to run at an
intermediate level of privilege between user and supervisor code. Also, if
you do it right (e.g. Multics), a system call can be much cheaper because the
supervisor isn't running in its own address space etc. Instead, your
user-level code calls a carefully vetted bit of system gateway code that has
a foot in both rings.

~~~
bdonlan
At least on Linux, there is no address space change when you make a syscall -
entering ring 0 _grants access_ to the top half of the address space, where
kernel code and data live, but there's no change of page tables (with all the
overhead that would imply).

As for code having a foot in both rings, Linux has its vDSO (and Windows has
ntdll) running at user privilege level; I don't really see how having an
additional intermediate level would help much.
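
For the curious, the vDSO mapping is visible to ordinary user code via the
auxiliary vector. A small sketch, assuming Linux with glibc 2.16+ (for
getauxval); on older glibc you may also need -lrt for clock_gettime:

    #include <stdio.h>
    #include <sys/auxv.h>   /* getauxval, AT_SYSINFO_EHDR */
    #include <time.h>

    int main(void) {
        /* Address at which the kernel mapped the vDSO into this process. */
        unsigned long vdso = getauxval(AT_SYSINFO_EHDR);
        printf("vDSO mapped at %#lx\n", vdso);

        /* On typical x86-64 Linux systems this call is satisfied entirely
           in user space by vDSO code, i.e. no ring transition at all. */
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        printf("monotonic: %ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
        return 0;
    }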

~~~
marshray
I'm not the world's expert on this kind of thing, but I don't think I've seen
anything much more complicated than the basic 'supervisor mode bit' get much
use in practice.

Even the venerable 286 (yes, 2, not 386) had this four-ring model and 'tasks',
which continue to exist as vestigial organs of the architecture. They're not
used by anything I'm aware of.

~~~
hga
Well, I think it's fair to mention that more than a few of us are appalled by
how ... conservative the systems development community is. E.g. people persist
in building systems that are less advanced than e.g. Multics and ITS (notice
some of the comments on BeOS and I/O). E.g. the biggest action here other than
hypervisors (which are a pretty old concept; it's been a long time since an
IBM OS could run on bare metal) and perhaps smartphones (an area I'm just not
familiar with) is Linux, which ... foundationally isn't very advanced. The
base concepts are straight from the early '60s (pre-Multics).

This was true for languages as well; at least at the beginning of the last
decade I noted that nothing seriously popular was based on concepts that had
been developed any later than the '60s or maybe early '70s, depending on how
you scored OOP (Simula vs. say Smalltalk). I don't know if this applied to
Ruby, though, or really even Python, languages I don't know well. Fortunately
that Dark Age didn't last long (I suspect because the dot-com crash forced
companies to "work smarter, not harder" because of resource constraints).

As for the 286, using it as anything more than a fast 8086 was so painful, so
crippled and so slow that few bothered (OS/2 is the famous and famously
unsuccessful exception). Remember the hack with the keyboard controller to get
back to real mode? Even if you didn't need to do that (mostly a device driver
issue), it was still _extremely_ expensive to switch segments in protected
mode, and segments were still limited to 64KB.

~~~
marshray
_people persist in building systems that are less advanced than e.g. Multics
and ITS [and] hypervisors_

I find it interesting that the VMware-style hypervisors were originally
implemented not using, but _in spite of_, all the virtualization features of
the CPU. Yet today VM virtualization is considered the most reliable security
boundary on shared hardware. No large security-conscious company would share
user accounts on a Windows instance with untrusted parties, yet they will
share virtual private servers.

I think what it says is that chip designers are lousy at developing what OS
developers want and OS designers are lousy at developing what customers want.

 _286_

Remember, this chip was designed before anyone realized that DOS (and DOS-
based device drivers) was going to take over the world. They never imagined
anyone would want to switch from protected mode back to the obviously-inferior
real mode. :-)

They were darn lucky IBM thought to put that auxiliary keyboard controller
there to do a reset on the main CPU.
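
The mechanism itself is tiny: ask the 8042 keyboard controller to pulse the
CPU's reset line. A rough sketch of poking it from user space on a modern
Linux box (hypothetical, needs root, and it really will reset the machine;
Linux still keeps this around as one of its reboot fallbacks):

    #include <sys/io.h>   /* ioperm, inb, outb (x86 Linux, needs root) */

    int main(void) {
        /* Gain access to the 8042 keyboard controller ports 0x60-0x64. */
        if (ioperm(0x60, 5, 1) != 0)
            return 1;

        /* Wait until the controller's input buffer is empty (status bit 1),
           then send command 0xFE: pulse the CPU reset line. */
        while (inb(0x64) & 0x02)
            ;
        outb(0xFE, 0x64);   /* machine resets here */
        return 0;
    }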

~~~
hga
" _I find it interesting that the VMware-style hypervisors were originally
implemented not using, but_ in spite _of all the virtualizaton features of the
CPU._ "

Errr, I've not directly studied this, but I've read that the x86_32
architecture is particularly hard to virtualize, and that VMware does it by
rewriting the binaries it runs, while Xen obviously started out by
paravirtualizing the hard stuff.

Your second point speaks more to how horrible Windows security is than
anything else, I'd say. The security conscious are more willing to do that
sort of sharing in the UNIX(TM) world, but of course it started out as a
classic time-sharing system. And then Linux at least got seriously hardened by
a whole bunch of people, most notably the part of the NSA that does this sort
of thing (it's no accident that a lot of SELinux is _very_ familiar to someone
who knows and/or has studied Multics).

But all that said, separate VMs raise very high walls between parts of a
system that must be protected from each other. A system with very serious
real-world security requirements that I'm providing some advice on right now
is using XCP and separate VMs to help achieve that. Of course it helps that
nowadays we have CPU to burn (and caches bigger than the main memory of any
machine I used in the very early '80s) and that e.g. two 4 GB sticks of fast
DRAM (DDR3-1333) cost ~$80. So I suppose this is in part an "if you have it,
why not use it?" situation.

It's certainly the case " _that chip designers are lousy at developing what OS
developers want_ " (see the 286, AMD's dropping of rings, the 68000's
inability to safely page ... although all of these examples at least have been
or will be fixed). As for " _OS designers are lousy at developing what
customers want_ " ... very possibly. It's certainly a problem that developing
a serious OS is, for almost everyone, a once-in-a-lifetime effort (David
Cutler was famous for doing 3 or so). There's also the curse of backwards
compatibility, which has also cursed the CPU designers.

But it's worse than that. It's hard to keep the original vision going, and
conversely sometimes part of that vision is wrong, or rather becomes wrong as
things change. E.g. we all thought the Windows registry was a _great_ idea ...
and then it got dreadfully abused (my favorite: using autogenerated 8.3 file
names as values, making restores problematic). There are some great and, I
gather, correct screeds about Linus refusing to commit to a stable driver ABI,
so keeping existing device drivers from regressing is ... a very big problem.
(Of course, that's a _big_ competitive advantage for Linux as well, but ...
not a very nice one).

------
onan_barbarian
Whoa: mov is a rename and _doesn't take a uop_. In the lead-up to Sandy Bridge
this seemed hinted at (the SB renamer handles 'zeroing' registers if you use
the right idiom), but SB still didn't do movs as renames. The fact that Ivy
Bridge does is pretty cool, and more applicable than a shift to 3-operand
forms, as it won't need a recompile to run older code more quickly.

This is pretty cool, at least for those of us who care about this sort of
nonsense. I think I have a performance-critical 105-operation loop somewhere
that might shed about 8-10 pointless execution slots burned on movs...
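
If anyone wants a feel for it, here's a crude sketch (mine, not from the
article) of the kind of loop that should benefit without a recompile: the
dependent register-to-register copies in the body typically show up as plain
movs in the compiled code at -O2, and on Ivy Bridge those get handled at
rename with zero latency instead of occupying execution ports.

    #include <stddef.h>
    #include <stdint.h>

    /* Shift-register style recurrence: c = b; b = a; a = v[i] + c;
       The copies usually compile to reg-to-reg movs on x86-64 (barring
       aggressive unrolling), which mov elimination can absorb. */
    uint64_t shuffle_sum(const uint64_t *v, size_t n) {
        uint64_t a = 0, b = 0, c = 0;
        for (size_t i = 0; i < n; i++) {
            c = b;
            b = a;
            a = v[i] + c;
        }
        return a + b + c;
    }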

------
ricw
Interesting how the focus has shifted from performance / frequency to power
consumption. I guess Intel is starting to get a bit of a scare from ARM's
impending Cortex-A15 attack on the laptop space.

~~~
nextparadigms
I hope that once we have quad-core 2.5 GHz Cortex-A15 chips (probably next
year), ARM will also start focusing on lowering power consumption, or at
least split the product line in two: one line that continues to double
performance every year while maintaining the same TDP, like they do now, and
one that maintains performance but cuts power consumption in half every year.

That kind of performance should be more than enough for smartphones, and
probably even tablets, though I could see how we might need more on
clamshells. Having a product line that focuses on lowering power consumption
every year while maintaining performance would also ensure Intel never
catches up with them in chips with extremely low energy consumption, and at
the same time our smartphones would start lasting longer and longer.

~~~
wtallis
ARM already has that strategy in place. They don't stop licensing the A8 core
when the A9 MPCore hits the market. They're perfectly willing to let you
produce an A8 on 22nm silicon if that's what provides the power/performance
balance you want. You should take a look at the full list of cores offered by
ARM; there are a lot you won't have heard of if you only pay attention to the
flagship phones and tablets.

------
cletus
The evolution of technology has certainly been interesting. I remember a
lecturer once told me that everything old is new again. Basically we go in
cycles.

We started with mainframes, then mini-computers and then micro-computers
(PCs/Macs). What happened in the last 10 years? Through the "cloud" we
basically went back to large computers and timesharing again.

Another turning point in the last 10 years is that basically any PC made since
2000 will probably be sufficient for what most normal users want. We get
increasingly powerful CPUs that most people just don't need.

This is part of the reason for the move to lower power consumption. A smaller
form factor and lighter computer is something most people care about. Having 6
cores instead of 4 just isn't.

Many pundits have predicted that Web applications will take over. 5-10 years
ago there was a reasonable basis for this opinion: computers would get
increasingly powerful, and that headroom would make the otherwise horribly
inefficient JavaScript medium (compared to compiled languages like C/C++ or
even bytecode languages like Java/C#) dominant as the inefficiencies became
irrelevant.

The rise of native apps on mobile is another example of this cycle. Part of
this is that native apps have access to libraries that Web pages simply don't,
but part of it really is performance and the fact that performance once again
matters.

Personally I find the manufacturing of chips at 22nm to be simply amazing.
When I started paying attention to this stuff IIRC the 386/486 were done on
500nm+ lithography.

I do wonder what the future holds because that number just can't physically
get much smaller (with current lithographic techniques).

It's amazing how much power a small chip can get now. I have one of the latest
Macbook Airs and it can decode 10 video streams without the fan coming on. A
friend has the Core 2 Duo MBA (last year's) and his machine is dying under the
same load.

~~~
rnemo
"Another turning point in the last 10 years is that basically any PC made
since 2000 will probably be sufficient for what most normal users want. We get
increasingly powerful CPUs that most people just don't need."

That's being quite generous to the Pentium 4. Really, the last 4-5 years are
when acceptably good CPU power for normal users in any circumstance became
the norm. The Core series of processors is what first gave us the sort of
power that we've come to expect, and Nehalem, Westmere, Sandy Bridge and now
Ivy Bridge are all just improvements on that.

Also, according to CPU-World, the original 80386 was done on 1.5µm
lithography.

~~~
acdha
Depends on your software stack, too: the Athlon system I bought in 2000 had no
trouble running a web browser with multiple windows while simultaneously
compiling, transcoding DV, having email & IM open, etc. on BeOS.

We've gained in many areas, but I think it took the Hail Mary SSD migration to
dodge the usability hit from poor I/O scheduling on Windows, OS X, Linux, etc.
This is far from saying BeOS was perfect (e.g. networking was wretched, and it
had neither pervasive color management nor the visual quality that came with
the switch to GPU compositing), but rather that some of our memories are
colored more by the software than by the underlying hardware.

