Heh. Well the current threading models are much worse from the perspective of libraries and hierarchical memory. I work with some of the MPICH developers and members of the MPI Forum. Nobody is satisfied with the status quo, but we need a viable threading system. Some people think we'll end up with MPI+CUDA/OpenCL, others (myself included) would generally prefer a system with in-node communicators and collectives, with a concept of compatible memory allocation. The system-level tools are basically available in pthreads and libnuma (unfortunately Linux-only), but we're working on formulating better APIs because the annotation-based approach of OpenMP isn't very good for long-lived threads or heterogeneous task distributions and systems like TBB are too intrusive and still don't really address NUMA.