
Can we say that without speculation and caching, just throwing more and more CPUs at the work, we would get slower single-app performance but better parallel execution?

Sure. For a good example, GPUs go partway there by not speculating (they still have cache hierarchies, though). It works because GPU workloads have massive data parallelism, so while one group of threads (a "warp") is stalled waiting for data, the cores can just execute other threads. Sun/Oracle built a number of SPARC chips along this line too, e.g. Niagara (Sun UltraSPARC T1) tolerates memory latency by having a bunch of hardware threads per core (4 on the T1, 8 on the later T2) rather than out-of-order (OoO) scheduling.
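To put rough numbers on the latency-hiding idea, here is a back-of-the-envelope sketch. It assumes a deliberately simple model: each thread computes for a fixed number of cycles, then stalls on memory for a fixed number of cycles, and the core switches to another ready thread for free. The function name and the cycle counts are made up for illustration; they are not measurements of any GPU or SPARC part.

```python
def core_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles the core does useful work in a toy multithreading model.

    Assumption: every thread alternates `compute_cycles` of work with
    `stall_cycles` of waiting on memory, and switching threads costs nothing.
    """
    # One thread keeps the core busy compute/(compute+stall) of the time;
    # extra threads fill the stall slots until the core is saturated.
    busy = threads * compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, busy)


if __name__ == "__main__":
    # Hypothetical workload: 20 cycles of work per 100-cycle memory stall.
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): {core_utilization(n, 20, 100):.0%} busy")
```

In this toy model a single thread leaves the core idle most of the time, while enough threads saturate it, even though no individual thread runs any faster. That is the throughput-for-latency trade in the question.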

The problem is that single-thread performance is really important for a lot of workloads, because (i) parallelization is hard, (ii) even for parallelized workloads, serial bottlenecks (critical sections, etc.) still exist, and (iii) latency is often important too (one web request on one core of a server, or compiling one straggler extra-large file in a parallel build, for example).
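Point (ii) can be made concrete with Amdahl's law, which is the standard textbook model for serial bottlenecks (it is not named above, so treat it as one way of illustrating the point). A minimal sketch, assuming a fixed serial fraction and perfect scaling of the parallel part; the 5% figure is a hypothetical input:

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Speedup over one core under Amdahl's law.

    Assumption: `serial_fraction` of the runtime cannot be parallelized,
    and the remainder scales perfectly across `cores`.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Hypothetical workload with 5% serial work.
    for cores in (1, 8, 64, 1024):
        print(f"{cores} cores: {amdahl_speedup(0.05, cores):.1f}x")
```

With 5% serial work the speedup can never reach 20x no matter how many cores are added, so the only way past that ceiling is to make the serial part itself faster, which is exactly what single-thread performance buys.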
