Most people using C for high-performance computing are just using it to glue together the high-performance libraries, anyway.
Remember, most people are writing big applications, not tiny procedures. If you just want to multiply a few numbers together, sure, C is going to be fast. If you want to build a complicated application, then C's advantages are going to be almost unnoticeable and its disadvantages are severe.
Tactically, even code that simply glues libraries together is still maintaining state, and there's clearly a benefit to being able to efficiently manage bit-level control over how that state is laid out and looked up. Lack of bookkeeping and overhead, cache-cognizant data structures, and simple compactness of representation all add up. Not to mention the fact that the top of the profile for lots of programs is malloc(), and C programs can swap out allocators --- I've gotten 500% (five hundred percent) speedups from retrofitting arena allocators into other people's code.
Strategically, well, there's a term for the strategy you're advocating, and it's called "Sufficiently Smart Compiler", and it's a punchline for a reason.
Are there any in existence that do this particularly well (ie, at least as good or better than a reasonably experienced C programmer might)?
(Also, take a look at some of the ghc on llvm benchmarks, it's very competitive with C, and doesn't require you to jump through any hoops. I'd link you, but Google is blocked at work due to incompetent firewall rules. Sigh.)
OO-itis is something which definitely will sap performance if left unchecked, but it's true whether you're using C or Smalltalk. Much like denormalizing relational databases, careful placement of data members can reduce cache misses, and avoiding long chains of dereference for data commonly accessed together is important.
The idea that C# (or for that matter any GC'd language) is going to beat an allocator in which alloc is amortized to a single load, constant add, and store seems... I don't know, fill in the blank.
You seem to think I'm advocating C over GC'd languages. I'm not. I write in Ruby by default, a language that is not only GC'd but badly GC'd. I just do so with my eyes wide open on performance.
I'm not saying you're advocating C over GC'd languages. I'm specifically disagreeing with the idea that in practice, in large applications written in C, that the application author actually has discretion over allocation policy. I'm saying that you couldn't in practice use a pool allocator much of the time, even if you wanted to, unless your application is very self-contained.
The Delphi compiler I work on is written in C and uses pool allocators to great effect. There's a heap allocator which can be marked and shrunk, there's a pool for every unit, there's a pool for expressions, there's a pool for constants, etc. You've got to keep track of what got allocated where, make sure you don't pollute one pool with references to another, and sometimes do a bunch of copying to ensure that. Pooled allocation isn't a panacea, even for something as self-contained as a compiler, albeit a compiler which can be hosted by the IDE, so it needs to be long-lived, manage memory across debugging sessions, multiple compiles, etc.
Real library code either owns object lifetimes (and can use pools internally because the library's own _release() function is the only thing that can free its state), or keeps its hands completely off allocation. The few counterexamples, where for instance a library malloc()'s something and expects you to free it, tend to be notorious examples of error-prone and evil interfaces.
Meanwhile, just because everything isn't amenable to pool allocation (or, even better, arena allocation, where there is zero memory management overhead at all ever) doesn't mean you don't win huge on the places that are amenable.
You are raising the boogeyman of a hypothetical library that is going to take a pointer that I pool allocated and call free() on it, blowing up the program. I assert that any real example of such a library is going to be easy to shoot down as "an incredibly crappy library".
The primary advantage of a pooled allocator isn't in allocation - though that's nice - it's that you don't have the cost of iterating through each object to free it. But if you have external libraries, they'll abstract their allocations into handles (say), and now you have the problem of running what amount to destructors.
I think people also overestimate the extent to which C programs depend on third-party libraries for their own statekeeping.
And again... what are we talking about here? I'm not advocating writing things in C. I'm saying, it's bogus to say that since code tends to be I/O bound, C isn't a performance win for most programs. That is simply a bogus argument. That the level of performance you can get out of C is usually not worth the investment is neither here nor there. Again: Ruby programmer here.
Where I think performance advantages that come out of writing things in C come from is being rarely being able to take shortcuts by relying on provided libraries and primitives which turn out to be not quite tweaked for the problem at hand. That is, C forces you to do so much yourself - largely because it has such poor abstraction tools - that you end up with a more specialized codebase. That specialization can include cache-oriented optimization, but I don't think it's the most important aspect or unique to C such that you can't get 95% of it - to the point where it's no longer a meaningful advantage - in a GC'd language.
In other words, I'm curious what are the underlying malloc() implementations that pool allocation so outperforms.
I think your real observation is that C level performance just isn't that relevant in scale-horizontal network-bound programs, and I agree with you.
(and no programs are high-performance if they have a pointer bug that makes the program crash).
"The difference between theory and practice is small in theory and large in practice..."
Performant C code doesn't use linked lists. Linked lists shred caches and allocators. Your go-to data structure for storing resizeable arrays is... wait for it... the resizeable array. I call mine "vector.h".
(That performant code often doesn't want a hash table is also why I backported STLport's red-black tree to C specialized on void* in my second-to-last major C project, but I'm just saying that because I'm proud of that hack.)
This is how the Linux kernel maintains just about everything. It's also how I've implemented lists inside of an allocator: http://github.com/scotts/streamflow/blob/master/streamflow.h...
(Felt I should clarify that it's nothing about hooked-in lists that prevent you from navigating the whole list. It's just the algorithms you end up using to solve the problems that pop-up at that level of systems programming.)
I think the original article should add, "performance tends to depend on the programmer, not the language".
No, because you could be learning that today and get a head start on your own future that way. It's not that hard to understand and once you've seen how the clock ticks from the inside it becomes a lot easier to build other clocks.
Lists vs resize-able arrays is a very old debate, both have their places, it all depends on the problem you're trying to solve which one is the most appropriate. If you code your stuff in a not too stupid way (say using a macro for your accesses) it's trivial to replace one approach with the other in case you're not sure which one will perform better.
Lists work well when they're short and you use a an array of pointers to the beginnings of your lists and maintain a 'tail' pointer for fast insertion in case inserts are frequent. If you use resize-able arrays you probably want to allocate a bunch of entries in one go, roughly doubling the size of your array every time you need to increase the size. That seems to waste a lot of memory, but since it is heap memory (allocated using brk) it (usually) won't be mapped to real memory until you use it, so the overhead is smaller than it seems.
I hope that's correct and clear :)
shouldn't the paradigm be "least amounts of bottle necks possible"?
Also the points mentioned by other comments are also valid (battery, VM behaviours)
However, his final point, or at least the crux of it, still stands: "Focus on stability and features first, scalability and manageability second, per-unit performance last of all, because if you don’t take care of the first two nobody will care about the third."
Incidentally on the final point, there was an interesting test that was run many years ago. Multiple teams were given the same spec, but were each told to optimize for a different characteristics (speed of development, speed of execution, maintainability, memory use, etc). Most of the teams managed to come in #1 on what they were trying to optimize for. The team that optimized for maintainability came in #2 on most other characteristics. The team that optimized for speed of execution came in last on most other characteristics.
The lesson from this is that a consistent focus on maintainable code results in more of everything else you need. Yes, there really are times that you're writing throw-away code and can forget all that. But by default, code well and let the other details take care of themselves.
Source? I'd really like to read about it.
There are unlimited number of potential future requirements that a program might have to meet. You should only code for these if you have a concrete basis for believing these will become real at some point.
If there's a serious chance your sedan will do stock car racing, then you outfit it accordingly. Otherwise, that super-muffler's just an unnecessary expense and something more to break.
Edit: You'll notice that in average car, every part has about the same quality, power and durability. In a sense, engineering is actually about achieving the least cost and the largest number of bottle necks, since any unneeded quality is wasted time and money.
When writing software, there's no direct analogy because usually the components we build our programs out of have hardly any per-unit cost. We don't save any money by using a crappy third-party library over a high-quality one. Using a library with lots of unnecessary features may be cheaper both for prototyping and for the final product.