The Wide Finder 2 implies lots of IO activity, which proved to be relatively hard to optimize on the T2K, because you can easily saturate a core with mere IO (i.e., a single core is barely able to cope with the sustained read rate of the disk).
This might be the most thought-provoking sentence I've read in weeks. It certainly calls into question the credibility of people who say "the future will be hundreds of cores per CPU".
Maybe this is fixable in OCaml somehow, but we might be seeing the limits of our multi-core panacea future sooner than we think because of realities like this.
edit: I haven't looked, but I wonder if the IO thread is just busy-waiting on the disk, in which case this perhaps isn't the real limit. The fundamental question stays the same, I guess.
IO was only (barely) disk-bound when you gave a full core to the reader (i.e.,
you have to be careful not to use HW threads on the same core). This is not a
problem specific to OCaml --- I reproduced it with a standalone program
written in C that simply read the file: as soon as you have more stuff running
on the same core (in different hardware threads), the IO performance drops.
See
http://groups.google.com/group/wide-finder/browse_thread/thr...
Also, and this came as quite a surprise, it turns out that mmap is slower than
read(2) on the T2K.
Wow, interesting. Thanks for the info and the link. This is a surprising and disappointing aspect of this CPU.
Do you know if there's something about this (seemingly trivial) workload that is pathologically bad for this processor, or do you believe that the CPU is just wimpy? I've only ever had a glossy spec-sheet-level introduction to these at work. Given this load I don't see the value of "4 threads per core" that they proclaim on http://www.sun.com/processors/UltraSPARC-T1/specs.xml
It's just that the hardware threads are slow, I think --- compiling stuff on the T2K also took forever. It also seems to me that there's little value in having 4 threads per core: it forces you to parallelize programs that ran fine on normal cores just to match the performance you'd get without hardware threads...
I don't think it affects the credibility of hundreds of cores per CPU at all. I think 32 cores is about as high as we'll go in the first phase, though. Disk I/O-bound applications can't be made faster by throwing more cores at the problem. Memory-bound applications will flatten out around the 32-core limit, and then we'll have to split up the memory as well into several "banks". Once that has happened, we can again scale up to some hundreds of cores.
There is no reason to "fear" having more cores, and I don't think it will affect us as much as people are saying it will.