Most people using C for high-performance computing are just using it to glue together the high-performance libraries, anyway.
Remember, most people are writing big applications, not tiny procedures. If you just want to multiply a few numbers together, sure, C is going to be fast. If you want to build a complicated application, then C's advantages are going to be almost unnoticeable and its disadvantages are severe.
Tactically, even code that simply glues libraries together is still maintaining state, and there's clearly a benefit to having bit-level control over how that state is laid out and looked up. Lack of bookkeeping and overhead, cache-cognizant data structures, and simple compactness of representation all add up. Not to mention that the top of the profile for lots of programs is malloc(), and C programs can swap out allocators --- I've gotten 500% (five hundred percent) speedups from retrofitting arena allocators into other people's code.
Strategically, well, there's a term for the strategy you're advocating, and it's called "Sufficiently Smart Compiler", and it's a punchline for a reason.
Are there any in existence that do this particularly well (i.e., at least as well as a reasonably experienced C programmer would)?
(Also, take a look at some of the GHC-on-LLVM benchmarks; it's very competitive with C, and doesn't require you to jump through any hoops. I'd link you, but Google is blocked at work due to incompetent firewall rules. Sigh.)