I think C++ is the right tool for some problems: a single-threaded algorithm pinned to one core, with no context switching, sounds like a good fit.
I think many problems are not well suited to C++. Google allegedly prefers Python for the first attempt at a problem, because it is easier to prototype in.
I think Chuck Moore (of Forth fame) frequently propounds an idea that matters for improving HPC performance. As far as I know he doesn't do HPC optimization himself, but he talks a lot about thinking through the whole problem and avoiding premature optimization. So if it turns out the hash join algorithm is not well suited to some parallel problems, so what? Pick the right tool for the job. Reaching for a hash table first could itself be premature optimization if the problem demands massive parallel scalability.
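For anyone who hasn't met it, here is a minimal single-threaded sketch of a hash join in Python. The function name, the row layout (lists of dicts) and the sample data are my own assumptions for illustration, not anything from a particular database. It only shows the two phases: build one hash table from one input, then probe it with the other.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Join two lists of dicts on `key` (hypothetical helper, illustration only)."""
    # Build phase: index one input (ideally the smaller one) by the join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)

    # Probe phase: look up every row of the other input in that table.
    joined = []
    for row in probe_rows:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Made-up sample data.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [
    {"id": 1, "item": "book"},
    {"id": 1, "item": "pen"},
    {"id": 3, "item": "cup"},
]
result = hash_join(users, orders, "id")
```

Every probe depends on the one table built in the first phase, so spreading this over many workers generally means either sharing that table or partitioning both inputs by key first. Whether that cost matters depends on the problem, which is the point about picking the tool after looking at the whole job.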
It seems flowlang.net is right to say that massive parallel scalability will rapidly become a must-have at most companies.