

It's all about the cores. Why AMD doesn't use HyperThreading - ssp
http://blogs.amd.com/work/2010/01/21/it%E2%80%99s-all-about-the-cores/

======
yungchin
I'm a bit disappointed by the marketing spin. They come up with a few examples
where HT falls down, and use that to dismiss the whole idea?

I wouldn't mind if they had put it in clear and honest terms: a 20%
improvement in performance is not worth the unpredictable chances of
performance degradation. That's a fair engineering decision. But hiding behind
what some Cognos consultant and some Technet curator have written online,
really?

And I'm still trying to decipher the technical motivation from this answer:
"Of course there are those that can say “well, things like SMT can be
implemented inexpensively and don’t consume that much power.” To those, I ask
you, historically hasn’t AMD been the one committed to deliver better value
and lower power? Why would we stray from our core principles?"

~~~
ars
And those examples are terrible.

The first example says "unstable", which is clearly false, so we can dismiss him as uninformed.

The next two are both on Windows, which until Windows 7 was unable to schedule hyperthreaded cores correctly.

In the last example, with Linpack, you are already maxing out the core, so obviously hyperthreading won't help. Hyperthreading helps when no single task is maxing out the core. A drop is not great, but for regular desktop apps I think you'll see an overall net gain.

The OS task scheduler can switch tasks if one stalls on IO. A CPU-based task scheduler can switch tasks on a cache miss, or on availability of individual CPU resources (like an adder).

Next he'll say not to run a multitasking OS, and instead to run each task on its own CPU: "and don't worry, we are making more CPUs soon".

~~~
andreyf
Could you link to further reading about how a CPU-based scheduler works? The
wikipedia article [1] only mentions OS-level scheduling, and I was always
under the impression that the CPU knew nothing of threads or processes.

 _CPU based task scheduler can task switch on cache miss, or availability of
single pieces of a CPU (like an adder)._

Can't the CPU just signal an interrupt for those, and let the OS scheduler
handle it?

1\. <http://en.wikipedia.org/wiki/Scheduling_(computing)>

~~~
wmf
Multithreaded processors have multiple hardware threads. An SMT processor
schedules instructions, mostly ignoring which thread they come from. An SoEMT
processor schedules hardware threads, usually switching on a cache miss or
similar stall event.

[http://en.wikipedia.org/wiki/Multithreading_(computer_hardwa...](http://en.wikipedia.org/wiki/Multithreading_\(computer_hardware\))

 _Can't the CPU just signal an interrupt for those, and let the OS scheduler
handle it?_

This is usually not efficient because handling an interrupt takes nearly as
long as a cache miss (and let us not consider the case where the interrupt
handler itself triggers a cache miss), but Intel did a prototype on the
Itanium that worked this way.
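
The distinction can be illustrated with a toy software analogy (purely illustrative Python, not how real hardware works): a switch-on-event core runs one hardware thread until it reports a stall, then rotates to the next ready thread instead of idling.

```python
from collections import deque

def hw_thread(name, events):
    # events: "ok" = instruction retires; "miss" = cache miss (a stall)
    for ev in events:
        yield name, ev

def soemt_run(threads):
    """Run one thread until it stalls on a 'miss', then switch (SoEMT-style)."""
    ready = deque(threads)
    trace = []                        # which thread executed at each step
    while ready:
        t = ready.popleft()
        for name, ev in t:
            trace.append(name)
            if ev == "miss":          # stall event: rotate to another thread
                ready.append(t)
                break
        # falling out of the loop without a break means the thread finished
    return trace

a = hw_thread("A", ["ok", "miss", "ok"])
b = hw_thread("B", ["ok", "ok"])
print(soemt_run([a, b]))              # → ['A', 'A', 'B', 'B', 'A']
```

The point of the analogy: thread A keeps the "core" busy until its miss, B fills in the stall, and A resumes later, with no OS interrupt involved.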

------
jrockway
This article is worthless. There are no technical details at all, and the FUD
is mostly wrong.

The idea of Hyperthreading is to keep the processor utilized at all times. Modern processors have many different execution units in each core, and not every thread of execution uses all of them at once. Enabling Hyperthreading lets the OS supply the processor with more work, potentially reducing the fraction of the processor that sits idle at any given time.

The article is right about reducing performance. If you have an 8 core machine, and you have one app that only runs on one core (hello, kcryptd...), it will run slower than "normal" when you have 7 other jobs using the CPU. But if you have a job that scales linearly across cores, then you will almost always see a speed increase. (I tested this on my machine with a "make" on the Linux kernel. The CPU time used by the whole build is the same from -j1 to -j4, while the wall-clock time decreases linearly. When you start adding "hyper" threads, the wall-clock time decreases more slowly, but the CPU time _increases_. This, I think, is what scares people: the processor becomes slower as you add threads beyond the number of actual cores, but its overall throughput increases.)
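
The shape of that experiment can be approximated with a toy sketch (illustrative only, not the actual kernel-build test; absolute timings vary by machine): hand a fixed pile of CPU-bound tasks to a worker pool and watch wall-clock time fall as workers are added, while the total amount of work stays the same.

```python
import time
from multiprocessing import get_context

def burn(n):
    # pure CPU work standing in for one compilation unit
    s = 0
    for i in range(n):
        s += i * i
    return s

def run(workers, tasks=8, n=300_000):
    # "fork" keeps this sketch self-contained as a plain Unix script
    start = time.monotonic()
    with get_context("fork").Pool(workers) as pool:
        results = pool.map(burn, [n] * tasks)
    return time.monotonic() - start, results

for w in (1, 2, 4):
    wall, _ = run(w)
    print(f"{w} workers: wall-clock {wall:.2f}s")
```

On a multi-core box the wall-clock figure shrinks with more workers up to the number of real cores, which is the benign half of jrockway's observation; the CPU-time increase beyond real cores is the hyperthreading half.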

I can't think of any workloads where this is a bad thing. I don't know of any
server workload that leans heavily on one thread. (Databases, maybe.) On a
workstation, you are only going to "starve" normal processes when you are
intentionally doing something intensive, like encoding video. For normal
workloads, you will never use even the 4 real cores completely.

(As I type this post, the only things that could possibly want to run are
kcryptd, my music player, the X server, and the web browser. Hyperthreading is
enabled, but there is no way I can possibly get the processor into the state
where it starts slowing down. But when I run "make -j8", I really enjoy the
speed benefit at the expense of making firefox 3% slower.)

Anyway, Google around for various benchmarks showing the results of
Hyperthreading enabled and disabled, and I think you'll want to enable it. I
did my own, and I'm glad I did -- it even makes my eeepc slightly faster!

~~~
jrockway
Doh, I meant a 4 core machine. 4 real cores, 8 virtual cores. Sorry for not noticing within the editing window.

------
jacquesm
> Running more threads increases throughput for applications as long as you
> have available cores.

As long as you are CPU bound.

Once you're IO bound you can have as many cores as you want but it isn't going
to move any faster.

The only machines where I manage to approach 100% cpu usage are video servers,
all the other boxes are sooner or later IO bound. The only cure for that is
gobs of memory.

In my experience it's pretty rare for a webserver with more than 4 cores to be CPU bound. Most of the time the disk or the network card is the bottleneck.

Unless you code in a very inefficient way of course, then it can be that you
need more CPU before you hit that wall.

AMD has so far more or less held the moral high ground against Intel in the 'spin wars'; they can do better than this.

It's simply a marketing piece in the guise of a blog post.

At the bottom of the entry it reads:

> John Fruehe is the Director of Product Marketing for Server/Workstation
> products at AMD. His postings are his own opinions and may not represent
> AMD’s positions, strategies or opinions.

So they effectively distance themselves from this posting, but at the same time it sits on amd.com.

~~~
JF-AMD
Ah, welcome to the world of lawyers. Those disclaimers are everywhere, like the labels on lawnmowers that say "don't pick this up while it is running." They aren't distancing themselves from me; it's just standard legal stuff.

Yes, it is marketing, but my job is marketing.

You are definitely right that CPU is not always the problem, some systems can
be I/O or memory-bound.

Our 4 channels of memory on the new products will help with the latter; to fix the former, you need OEMs willing to put down multiple chipsets. Typically the cost drives people away from those designs, unfortunately.

------
Andys
Oops: AMD is going to support SMT: it's in their 2011-2012 roadmap! However, they don't call it SMT; they think they'll get away with calling them cores. It is two small integer cores sharing a fetcher, decoder, L2 cache, and a single FP unit.

<http://www.anandtech.com/printarticle.aspx?i=3683>

~~~
wmf
I think it would be more accurate to say that Bulldozer has conjoined cores,
not SMT.

<http://dx.doi.org/10.1109/MICRO.2004.12>

------
joanou
It's just my experience, I have no formal studies to back it up...

In my multithreaded Java application, turning off HT improved performance by 100%. Granted, it was one of the early HT implementations with a single core, but I still always turn it off in my BIOS.

I tend to prefer AMD over Intel because you get more MIPS/$ and less heat.

~~~
albertcardona
My Java experience is in the opposite direction: using HT enhances performance (though by no means by 100%).

That's with an Intel 5500 (2 quad-cores, "16" cores total when using hyperthreading).

~~~
JF-AMD
I think this helps underscore my point. Sometimes you get performance,
sometimes you don't.

I used to work for a major OEM that sold Intel systems. One of the biggest
problems was that SQL Server, for instance, ran slower with HT when you had
more than 4 threads.

But customers never thought to turn it off because they assumed it was a
"performance feature." If it was pitched as a "sometimes" performance feature,
then people might be inclined to turn it off to check performance.

If I had a dollar for every customer I had to have that discussion with, I
could have bought myself a server.

------
stcredzero
I just had an insight about Python's GIL. Python depends on the OS scheduler to allocate computing resources to tasks, but if multiple Python threads are contending for the GIL, to the OS scheduler this is Python doing "work".

HLLs could benefit from better OS support for scheduling and dealing with multiple cores.
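
The effect can be sketched in a few lines (illustrative; exact timings vary, and this assumes a GIL-based CPython build): two CPU-bound threads trade the GIL back and forth, so the OS sees busy threads while throughput is no better than running the same work serially.

```python
import threading
import time

def count(n):
    # pure CPU work; holds the GIL except at periodic check intervals
    while n:
        n -= 1

N = 5_000_000

start = time.monotonic()
count(N)
count(N)
serial = time.monotonic() - start

start = time.monotonic()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.monotonic() - start

# to the OS both runs looked "busy"; the threaded one is typically no faster
print(f"serial={serial:.2f}s threaded={threaded:.2f}s")
```

Despite two runnable threads and (possibly) two free cores, the threaded run gains nothing on a GIL interpreter, which is exactly the "work" the OS scheduler can't see through.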

~~~
wmf
That's why you're not supposed to use spinlocks in userspace. At least you
should spin for a short time and then block.
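
A rough sketch of that spin-then-block idea (names here are illustrative, not a standard API): try a bounded number of cheap non-blocking attempts first, then fall back to a blocking wait so the OS can deschedule the thread.

```python
import threading

def spin_then_block_acquire(lock, spins=100):
    # fast path: spin briefly, hoping the holder releases the lock soon,
    # so we avoid the cost of a sleep/wake round trip through the kernel
    for _ in range(spins):
        if lock.acquire(blocking=False):
            return
    # slow path: stop burning CPU and let the OS put us to sleep
    lock.acquire()

lock = threading.Lock()
spin_then_block_acquire(lock)   # uncontended: the fast path succeeds
lock.release()
```

In real code the spin count is tuned against the cost of a context switch, and a native spinlock would also issue a pause/yield hint between attempts; the sketch only shows the two-phase structure.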

------
scdlbx
What is the "Fusion-based" computing mentioned in the first paragraph? A quick
search doesn't yield much.

~~~
Gmo
I guess it's the fusion between the CPU and the GPU ...

~~~
sparky
Yep. <http://fusion.amd.com>

------
andreyf
Are "symmetric multithreading" and "simultaneous multithreading" the same
thing?

~~~
wmf
"symmetric multithreading" isn't even a real term; it's just a mistake in the
article.

~~~
JF-AMD
I will get that fixed. Thanks.

