
Is anyone able to explain Figure 1? I don't understand what the levels are, or whether level 1 is the coarsest or the finest. The caption doesn't seem to make sense either way. Also, Algorithm 1 is in terms of CPUs, but the description mentions 'nodes' and 'cores'. Is a CPU a core or a node? Neither?

The scheduler, looking at an idle core, decides whether to steal work from an overloaded neighbor. It only compares load across the interconnects shown in the figure (between domains).

E.g. the two cores in the dark grey box can steal work from each other, but they only see the load averages of the neighbouring domain. In certain cases the current scheduler calculates the load figures rather oddly, so the idle core decides that a neighboring overloaded 'scheduling domain' is not overloaded.
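To make that concrete, here's a toy sketch of how comparing group averages can hide an overloaded neighbor. This is not the kernel's code: the function names, the load numbers, and the "compare minimums" alternative are all made up for illustration.

```python
from statistics import mean

def should_steal_by_average(local_group, remote_group):
    """Toy rule: steal only if the remote group's average load exceeds ours."""
    return mean(remote_group) > mean(local_group)

def should_steal_by_min(local_group, remote_group):
    """Hypothetical alternative: compare the least-loaded core of each group."""
    return min(remote_group) > min(local_group)

# Hypothetical per-core load figures (arbitrary units).
# Local group: one idle core next to a core running one very heavy thread.
local = [0, 1000]
# Remote group: every core is busy, but none is individually heavy.
remote = [200, 200]

print(should_steal_by_average(local, remote))  # False: the idle core stays idle
print(should_steal_by_min(local, remote))      # True: the idle core is noticed
```

The heavy thread inflates the local average, so the idle core concludes its neighbors have nothing to spare, even though every remote core has work queued.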

Unfortunately, the authors are using AMD machines, which behave differently from most others because pairs of cores are conjoined into "modules" that share resources. In AMD processors, a CPU is a core. In almost all other processors, a CPU is an SMT thread (aka hyperthread).

In figure 1 the levels/shades represent distance from node/socket 1, darker being closer. So node 1 is distance 0 from itself, two other nodes are distance 1, and one node is distance 2.
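A quick sketch of that "distance in hops" reading, assuming a hypothetical four-node interconnect that matches the description above (the `links` layout is my guess for illustration, not taken from the paper):

```python
from collections import deque

def hop_distances(links, start):
    """Breadth-first search: number of interconnect hops from `start` to each node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical topology: node 1 links directly to nodes 2 and 3,
# and node 4 is only reachable through one of them.
links = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}

print(hop_distances(links, 1))  # {1: 0, 2: 1, 3: 1, 4: 2}
```

Each distinct distance value would correspond to one shade in the figure, with 0 being the darkest.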

The only thing shared in a Bulldozer module is the FPU (and perhaps some cache, not sure). For all other purposes, a Bulldozer module contains two full CPU cores.

Also, what CPUs besides Intel's and IBM's use SMT?

Sun^H^H^HOracle has the UltraSparc T* CPUs which, IIRC, use SMT heavily.

Itanium also supports some form of SMT, I think, although I am not sure if anyone actually uses those.

In 'Dozer, the arithmetic and memory units were pretty much the only things that were separate. Each core pair shared a scheduler, a dispatcher, a branch predictor, cache, etc. It really was an oddly thought-out design.

It's probably even more complicated than that, since they separated more parts per core as 'Dozer evolved, like the decoder, which was a single unified unit at first and became two separate ones later.

Yeah, an oddly thought out design for sure. The idea seems to make sense to me but execution wasn't good enough I guess.

From what I understand of this presentation, the 'scheduling domain' abstraction is reused at different layers of the hierarchy. So, for example, the two hyperthreads on one physical core are also modeled as a 'scheduling domain'.
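Roughly, I picture it like the sketch below: each CPU belongs to a stack of nested domains, from its SMT siblings up to the whole machine. The level names, the machine shape (2 threads per core, 2 cores per node, 2 nodes), and the contiguous CPU numbering are assumptions for illustration, not how the kernel actually builds them.

```python
def domains_for(cpu, threads_per_core=2, cores_per_node=2, nodes=2):
    """Return the nested (level name, CPU list) domains containing `cpu`,
    innermost first. Assumes CPUs are numbered contiguously within each level."""
    smt_size = threads_per_core
    node_size = smt_size * cores_per_node
    machine_size = node_size * nodes
    levels = [("SMT", smt_size), ("node", node_size), ("machine", machine_size)]

    result = []
    for name, size in levels:
        first = (cpu // size) * size  # first CPU of the block containing `cpu`
        result.append((name, list(range(first, first + size))))
    return result

for name, cpus in domains_for(5):
    print(name, cpus)
# SMT [4, 5]
# node [4, 5, 6, 7]
# machine [0, 1, 2, 3, 4, 5, 6, 7]
```

The same "balance within this set of CPUs" logic can then be applied at every level, just with a bigger set each time.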


ARM doesn't do SMT either.

I agree.
