I think they are talking in the hardware sense: hyperthreading/SMT.

duaneb · on Aug 15, 2012

How much does that actually help? In my extremely fuzzy memory, it only worked out to around a 30% increase in ideal situations. I'd rather see them work on features that can be exploited with less voodoo... like hardware 64-bit support, or SIMD support, or HTM, or hell, clock rate.

sounds · on Aug 15, 2012

Intel HT [1] originally was like that (if your code runs in 1.0s single-threaded, ideally it will run in ~0.77s multi-threaded).
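For anyone checking the arithmetic, here is a minimal sketch of how a ~30% throughput gain maps to ~0.77s. The function name is made up for illustration, and it assumes the gain applies uniformly to the whole run, which is the ideal case only.

```python
def ideal_runtime(baseline_seconds: float, throughput_gain: float) -> float:
    """Ideal runtime when throughput rises by throughput_gain (0.30 means +30%).

    Assumes the gain applies to the entire workload (hypothetical best case).
    """
    return baseline_seconds / (1.0 + throughput_gain)


# A 1.0 s single-threaded run with a 30% throughput gain:
print(round(ideal_runtime(1.0, 0.30), 2))  # 0.77
```

Note that a 30% gain in throughput is a 23% cut in runtime, not 30%, since runtime scales as 1 / 1.3.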

The main problem with hyperthreading is that each CPU generation has behaved so differently that the only decision left to software is whether to bind threads to distinct physical cores and hope the performance is better. AMD's Bulldozer hasn't helped either.

On the other hand, most of Intel's big markets tend to run pretty inefficient code (very low IPC), and that's where HT makes a lot of sense. ARM cores are typically running a pretty tight ship. So it makes me laugh when I see Atom includes HT.

Intel, clearly, would dispute my claims.

[1] http://en.wikipedia.org/wiki/Hyper-threading

brigade · on Aug 15, 2012

I figured Atom had hyperthreading because it was Intel's first in-order x86 core in over a decade: compilers had forgotten how to schedule x86 code, so there were lots of stalls in the ALUs that a second thread could make good use of. Plus, scheduling for Atom is pretty hard, in part due to the lack of registers in x86.

Additionally, Ars argues [1] that from a performance-per-watt perspective, hyperthreading makes more sense with x86 and two cores make more sense with ARM.

[1] http://arstechnica.com/gadgets/2008/05/risc-vs-cisc-mobile-e...

duaneb · on Aug 15, 2012

> Intel, clearly, would dispute my claims.

Remember that it's people's perception of products, not reality, which makes money.

This is all very interesting. I'm going to have to break out my Hennessy & Patterson and get back into hardware.

timsally · on Aug 15, 2012

You want to read Agner Fog's article "How good is hyperthreading?": http://www.agner.org/optimize/blog/read.php?i=6. As an aside, Agner is one of those rare people who only write when they have extremely valuable things to say. His entire website is worth a read.

tedunangst · on Aug 15, 2012

Does he actually answer that question? I read the main post, which seemed to conclude that if it's good, it's good, and if it's bad, it's bad. Then there are some replies, and finally a single (negative) number presented for the Rybka chess engine. What about programs that aren't chess engines?

tedunangst · on Aug 15, 2012

But it's 30% you get for basically free. I kind of thought HT was mostly a gimmick (look, now with 256 virtual CPUs), but changed my mind since it doesn't cost anything (in terms of die space) to add it to a chip. 30% more performance for 1% more cost is a better deal than 100% more performance for 100% more cost, assuming you can live with only 30% more performance.
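To put rough numbers on that trade-off, here is a small sketch comparing performance per unit cost. The 30%/1% and 100%/100% figures are the ones from this comment, not measurements, and the function name is just for illustration.

```python
def perf_per_cost(perf_gain: float, cost_gain: float) -> float:
    """Relative performance per unit cost versus the baseline chip (baseline = 1.0)."""
    return (1.0 + perf_gain) / (1.0 + cost_gain)


smt = perf_per_cost(0.30, 0.01)          # add SMT: +30% perf for +1% cost
second_core = perf_per_cost(1.00, 1.00)  # add a core: +100% perf for +100% cost

print(round(smt, 2))   # 1.29
print(second_core)     # 1.0
```

Under those assumed figures SMT improves performance per cost by roughly 29%, while a second core leaves it unchanged; the second core still wins on absolute performance.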

I should add I think what AMD is doing with Bulldozer (claiming two virtual cores are actually full cores) is bullshit.

duaneb · on Aug 15, 2012

> I should add I think what AMD is doing with Bulldozer (claiming two virtual cores are actually full cores) is bullshit.

I think AMD is doing whatever it can to get people to buy its CPUs. If it weren't for their ATI purchase, I think they'd be basically dead by now. It still amazes me how far they've fallen: I built my first computer with an AMD X2 when I was 15 (6 years ago now) - they looked like they were going to unseat Intel as the one deciding the future of x86 chips. They did for a while - we got a sane 64-bit architecture out of it. I'm not sure where they went wrong: was it marketing, was it manufacturing tech, was it profit margins, was it Apple? I don't even know if their current processors are competitive or not in the performance market - things like "Bulldozer" make me think not.

Anyway, could SMT be implemented on top of ARM v8? My knowledge of hardware doesn't include multithreading. However, from my limited understanding of it, I don't see SMT making much difference in tight RISC code, which is designed to have a high instruction throughput per cycle, leaving little for instruction reordering to optimize.

tedunangst · on Aug 15, 2012

One way to think of SMT is context switches for free, and lots of them. What happens when you run two processes on one core? Every 10ms the kernel copies out all the registers from one process to memory, copies in the regs for the other, and switches. What happens when you use SMT? Every "2" instructions the CPU switches from one process to the other, transparently, without hitting memory. After 20ms, the same amount of work is done, possibly a little more, and if process two only had 1ms of work to do, it doesn't have to wait the full 10ms timeslice of process one.
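A toy simulation of that comparison, as a rough sketch only: it models SMT as round-robin with a tiny quantum on one core, ignores context-switch cost, and assumes both threads get equal shares. Real SMT issues instructions from both threads in the same cycle, so this captures the latency argument, not the throughput gain. Times are in microseconds, and all names here are made up for illustration.

```python
def completion_times(work, quantum):
    """Round-robin one core among tasks and return each task's finish time.

    work: work per task, in integer time units (microseconds here).
    quantum: how long a task runs before the core switches to the next task.
    Switching is treated as free (an assumption of this sketch).
    """
    remaining = list(work)
    finished = [None] * len(work)
    clock = 0
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r == 0:
                continue  # task already done (or had no work)
            ran = min(quantum, r)
            clock += ran
            remaining[i] = r - ran
            if remaining[i] == 0:
                finished[i] = clock
    return finished


work = [10_000, 1_000]  # process one needs 10 ms, process two needs 1 ms

print(completion_times(work, quantum=10_000))  # [10000, 11000]  OS timeslices
print(completion_times(work, quantum=1))       # [11000, 2000]   fine-grained switching
```

In both cases all the work is done at 11 ms, but with fine-grained switching the short process finishes at about 2 ms instead of waiting out the long one's full timeslice.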

SMT is not about instruction reordering at all (within one process). Just like the OS switches between processes whenever you wait for disk, now the CPU switches processes whenever you wait for memory. It just happens that virtual cores are the way the OS programs the CPU scheduler.
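The "switch whenever you wait for memory" idea can be sketched with a very simple utilization model. This is a back-of-the-envelope model under stated assumptions, not how any particular CPU behaves: each thread alternates a fixed number of compute cycles with a fixed number of stall cycles, switching is free, and the cycle counts below are invented for illustration.

```python
def core_utilization(compute_cycles, stall_cycles, hardware_threads):
    """Fraction of cycles doing useful work in a simple stall-hiding model.

    Each thread computes for compute_cycles, then waits stall_cycles on memory.
    While one thread waits, another hardware thread may use the core.
    Utilization is capped at 1.0 because the core cannot be more than fully busy.
    """
    busy_fraction = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, hardware_threads * busy_fraction)


# Stall-heavy code (hypothetical: 20 cycles of work per 80 cycles of waiting)
print(core_utilization(20, 80, hardware_threads=1))  # 0.2
print(core_utilization(20, 80, hardware_threads=2))  # 0.4

# Tight code (hypothetical: 90 cycles of work per 10 cycles of waiting)
print(core_utilization(90, 10, hardware_threads=1))  # 0.9
print(core_utilization(90, 10, hardware_threads=2))  # 1.0
```

In this model a second hardware thread doubles useful work for the stall-heavy case but adds only about 11% for the tight case, which lines up with the intuition in this thread that SMT pays off mostly on low-IPC code.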

wmf · on Aug 15, 2012

SMT generally doesn't require any ISA support, which is why it's confusing that Kanter would mention it.

Also, a lot of code (think pointer-chasing) can never be made "tight".

caf · on Aug 15, 2012

It's true that AMD is being disingenuous with Bulldozer, but on the other hand their SMT threads share fewer resources than other implementations (they have separate integer execution units, for example, which makes them much closer to "full cores").

mtgx · on Aug 15, 2012

The thing is, I don't think it adds 30%; it's more like 10%.

mtgx · on Aug 15, 2012

The tests I've seen only showed a 10% improvement on average with hyperthreading. I don't think it was worth it for ARM.