To dispel one misconception, the reason it is simple to write massively parallel code in C++ is that the basic model is a single process locked to a single core with no context switching at all. It is all non-blocking, coroutine-esque, cooperative scheduling. Writing this type of code is exactly like old school single-process, interrupt-driven UNIX code from back in the day when no one did threading. Simple to write, simple to analyze. Analogues of this work well even on fine-grained systems.
The real parallelization challenge is often completely overlooked by people offering nominal solutions. Your application may be composed of only two algorithms, each independently massively parallel, but that does not imply the two algorithms operate on the same representation of the underlying data model, an assumption that usually goes unstated. Representation transforms between algorithm steps inherently parallelize poorly.
A canonical example of this is the hash join algorithm. It is a two-stage algorithm, and both stages are obviously parallelizable, yet ad hoc hash joins parallelize very poorly in practice because the hand-off between the stages requires a representation transform.
In a sense, the real parallelism problem is a data structure problem. How do you design a single data structure such that all operators required by your algorithms and data models are both efficient and massively parallel? Functional programming languages do not address that question, and it is the fundamental challenge regardless of the hardware.
I think many problems are not well suited to C++. Google allegedly prefers Python for the first attempt at a problem because of the ease of prototyping.
I think Chuck Moore (of Forth fame) frequently propounds an important idea when it comes to improving HPC performance. Of course, as far as I know, Chuck Moore doesn't do HPC optimization. But he does talk a lot about thinking about the whole problem and avoiding premature optimization. As such, it sounds like the hash join algorithm is not well suited to some parallel problems - so what? Pick the right tool for the job. Picking a hash table could be premature optimization if the problem demands massive parallel scalability.
It seems flowlang.net is right to say that massive parallel scalability will rapidly become a must-have at most companies.
Are you joking?
Most of the knocks against C++ are micro-optimizations that are not material to actually parallelizing the code. Parallelization is done by carefully selecting and designing the distributed data structures, which is simple to do in C++. For parallel systems these are, in theory, required to be space decomposition structures, which happen to be very elegant to implement in C. Communication between cells in the space decomposition is conventional message passing.
The usual compiler-based tricks such as loop parallelization are not all that helpful even though supercomputing compilers support them. By the time you get to that level of granularity, you are in a single process locked to a single core, so there is not much hardware parallelism left to exploit. And even then, you have to manually tune the behavior with pragmas if you want optimal performance.
Most compiler-based approaches to parallelism attack the problem at a level of granularity that is not that useful for non-trivial parallelism. The lack of multithreading in modern low-latency, high-performance software architectures makes the many deficiencies of C/C++ in heavily threaded environments largely irrelevant.