Most people using C for high-performance computing are just using it to glue together the high-performance libraries, anyway.
Remember, most people are writing big applications, not tiny procedures. If you just want to multiply a few numbers together, sure, C is going to be fast. If you want to build a complicated application, then C's advantages are going to be almost unnoticeable and its disadvantages are severe.
Tactically, even code that simply glues libraries together is still maintaining state, and there's clearly a benefit to having efficient, bit-level control over how that state is laid out and looked up. Lack of bookkeeping and overhead, cache-cognizant data structures, and simple compactness of representation all add up. Not to mention the fact that the top of the profile for lots of programs is malloc(), and C programs can swap out allocators --- I've gotten 500% (five hundred percent) speedups from retrofitting arena allocators into other people's code.
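To make that concrete: an arena is nothing exotic. Here's a minimal sketch (the names and the fixed capacity are mine, not from any particular codebase; real code would chain blocks instead of failing when full):

    #include <stdlib.h>
    #include <stddef.h>

    /* A dead-simple arena: grab one big block up front, bump a pointer
       to allocate, free the whole thing at once. Allocation amortizes to
       a load, an add, and a store -- no headers, no free lists. */
    typedef struct {
        char  *base;
        size_t used;
        size_t cap;
    } arena_t;

    static int arena_init(arena_t *a, size_t cap) {
        a->base = malloc(cap);
        a->used = 0;
        a->cap  = cap;
        return a->base ? 0 : -1;
    }

    static void *arena_alloc(arena_t *a, size_t n) {
        n = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1); /* align */
        if (a->used + n > a->cap)
            return NULL;              /* real code would grow here */
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    static void arena_free_all(arena_t *a) {
        free(a->base);
        a->base = NULL;
        a->used = a->cap = 0;
    }

Retrofitting mostly means finding a phase whose allocations all die together and pointing its mallocs at something like that.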
Strategically, well, there's a term for the strategy you're advocating, and it's called "Sufficiently Smart Compiler", and it's a punchline for a reason.
Are there any in existence that do this particularly well (i.e., at least as well as, or better than, a reasonably experienced C programmer might)?
(Also, take a look at some of the GHC-on-LLVM benchmarks; it's very competitive with C and doesn't require you to jump through any hoops. I'd link you, but Google is blocked at work due to incompetent firewall rules. Sigh.)
OO-itis is something which will definitely sap performance if left unchecked, but that's true whether you're using C or Smalltalk. Much as with denormalizing relational databases, careful placement of data members can reduce cache misses, and avoiding long chains of dereference for data commonly accessed together is important.
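The standard trick for the data-member-placement point is hot/cold splitting. A hypothetical example (the structs are made up for illustration):

    /* Before: the cold fields pad out every cache line the hot loop
       touches. */
    struct particle_fat {
        double x, y, z;      /* hot: read on every pass */
        char   name[64];     /* cold: read almost never */
        void  *debug_info;   /* cold */
    };

    /* After: hot fields packed contiguously; cold fields behind one
       pointer, dereferenced only on the rare paths that need them. */
    struct particle_cold {
        char  name[64];
        void *debug_info;
    };

    struct particle {
        double x, y, z;
        struct particle_cold *cold;
    };

You fit far more hot elements per cache line, at the cost of one extra dereference on the cold paths.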
The idea that C# (or for that matter any GC'd language) is going to beat an allocator in which alloc is amortized to a single load, constant add, and store seems... I don't know, fill in the blank.
You seem to think I'm advocating C over GC'd languages. I'm not. I write in Ruby by default, a language that is not only GC'd but badly GC'd. I just do so with my eyes wide open on performance.
I'm not saying you're advocating C over GC'd languages. I'm specifically disagreeing with the idea that in practice, in large applications written in C, the application author actually has discretion over allocation policy. I'm saying that you couldn't in practice use a pool allocator much of the time, even if you wanted to, unless your application is very self-contained.
The Delphi compiler I work on is written in C and uses pool allocators to great effect. There's a heap allocator which can be marked and shrunk, there's a pool for every unit, there's a pool for expressions, there's a pool for constants, etc. You've got to keep track of what got allocated where, make sure you don't pollute one pool with references to another, and sometimes do a bunch of copying to ensure that. Pooled allocation isn't a panacea, even for something as self-contained as a compiler, albeit a compiler which can be hosted by the IDE, so it needs to be long-lived, manage memory across debugging sessions, multiple compiles, etc.
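For anyone who hasn't seen the mark-and-shrink discipline, here's the shape of it; this is a toy sketch of the idea, not the actual compiler's code:

    #include <stddef.h>

    typedef struct { char *base; size_t used, cap; } pool_t;
    typedef size_t pool_mark_t;

    /* Remember the pool's high-water mark before starting a phase... */
    static pool_mark_t pool_mark(const pool_t *p) { return p->used; }

    /* ...then throw away everything that phase allocated, in O(1).
       This is exactly why references must not leak from one pool into
       another: anything allocated after the mark is gone after the
       release. */
    static void pool_release(pool_t *p, pool_mark_t m) { p->used = m; }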
Real library code either owns object lifetimes (and can use pools internally because the library's own _release() function is the only thing that can free its state), or keeps its hands completely off allocation. The few counterexamples, where for instance a library malloc()'s something and expects you to free it, tend to be notorious examples of error-prone and evil interfaces.
Meanwhile, just because everything isn't amenable to pool allocation (or, even better, arena allocation, where there is zero memory management overhead at all ever) doesn't mean you don't win huge on the places that are amenable.
You are raising the boogeyman of a hypothetical library that is going to take a pointer that I pool allocated and call free() on it, blowing up the program. I assert that any real example of such a library is going to be easy to shoot down as "an incredibly crappy library".
The primary advantage of a pooled allocator isn't in allocation - though that's nice - it's that you don't have the cost of iterating through each object to free it. But if you have external libraries, they'll abstract their allocations into handles (say), and now you have the problem of running what amount to destructors.
I think people also overestimate the extent to which C programs depend on third-party libraries for their own statekeeping.
And again... what are we talking about here? I'm not advocating writing things in C. I'm saying, it's bogus to say that since code tends to be I/O bound, C isn't a performance win for most programs. That is simply a bogus argument. That the level of performance you can get out of C is usually not worth the investment is neither here nor there. Again: Ruby programmer here.
Where I think the performance advantages of writing things in C come from is rarely being able to take shortcuts by relying on provided libraries and primitives, which turn out to be not quite tweaked for the problem at hand. That is, C forces you to do so much yourself - largely because it has such poor abstraction tools - that you end up with a more specialized codebase. That specialization can include cache-oriented optimization, but I don't think it's the most important aspect, or so unique to C that you can't get 95% of it - to the point where it's no longer a meaningful advantage - in a GC'd language.
In other words, I'm curious which underlying malloc() implementations pool allocation so outperforms.
I think your real observation is that C level performance just isn't that relevant in scale-horizontal network-bound programs, and I agree with you.
(and no program is high-performance if it has a pointer bug that makes it crash).
"The difference between theory and practice is small in theory and large in practice..."
Performant C code doesn't use linked lists. Linked lists shred caches and allocators. Your go-to data structure for storing lists of things is... wait for it... the resizeable array. I call mine "vector.h".
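A cut-down sketch of the idea (not my actual vector.h, but the same shape):

    #include <stdlib.h>
    #include <string.h>

    /* The usual doubling growable array: amortized O(1) push, and the
       elements stay contiguous, so scans are prefetcher-friendly. */
    typedef struct {
        void  *data;
        size_t len, cap, elem;   /* elem = sizeof one element */
    } vec_t;

    #define VEC_INIT(type) { NULL, 0, 0, sizeof(type) }

    static int vec_push(vec_t *v, const void *item) {
        if (v->len == v->cap) {
            size_t ncap = v->cap ? v->cap * 2 : 8;
            void  *nd   = realloc(v->data, ncap * v->elem);
            if (!nd) return -1;
            v->data = nd;
            v->cap  = ncap;
        }
        memcpy((char *)v->data + v->len * v->elem, item, v->elem);
        v->len++;
        return 0;
    }

Usage is just vec_t v = VEC_INIT(int); vec_push(&v, &x);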
(That performant code often doesn't want a hash table is also why I backported STLport's red-black tree to C specialized on void* in my second-to-last major C project, but I'm just saying that because I'm proud of that hack.)
This is how the Linux kernel maintains just about everything. It's also how I've implemented lists inside of an allocator: http://github.com/scotts/streamflow/blob/master/streamflow.h...
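For those who haven't seen the pattern: the links are embedded in the object itself, so there's no separate node allocation and no extra dereference to reach the payload. A minimal sketch in the spirit of the kernel's list_head (my paraphrase, not the kernel's code):

    #include <stddef.h>

    struct list_node {
        struct list_node *prev, *next;
    };

    static void list_init(struct list_node *head) {
        head->prev = head->next = head;
    }

    static void list_add(struct list_node *head, struct list_node *n) {
        n->next = head->next;
        n->prev = head;
        head->next->prev = n;
        head->next = n;
    }

    /* Recover the enclosing object from its embedded node. */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct task {
        int id;
        struct list_node link;   /* the list lives inside the object */
    };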
(Felt I should clarify that it's nothing about hooked-in lists that prevents you from navigating the whole list. It's just the algorithms you end up using to solve the problems that pop up at that level of systems programming.)
I think the original article should add, "performance tends to depend on the programmer, not the language".
No, because you could be learning that today and get a head start on your own future that way. It's not that hard to understand and once you've seen how the clock ticks from the inside it becomes a lot easier to build other clocks.
Lists vs. resizeable arrays is a very old debate; both have their place, and which one is most appropriate depends on the problem you're trying to solve. If you code your stuff in a not-too-stupid way (say, using a macro for your accesses), it's trivial to replace one approach with the other in case you're not sure which one will perform better.
Lists work well when they're short and you use an array of pointers to the beginnings of your lists, maintaining a 'tail' pointer for fast insertion in case inserts are frequent. If you use resizeable arrays, you probably want to allocate a bunch of entries in one go, roughly doubling the size of your array every time you need to increase it. That seems to waste a lot of memory, but since it is heap memory (allocated using brk) it (usually) won't be mapped to real memory until you use it, so the overhead is smaller than it seems.
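Something like this is what I mean by short lists with tail pointers (the names are just for illustration):

    #include <stddef.h>

    struct node {
        struct node *next;
        int          value;
    };

    /* One of many short lists. The tail pointer makes appends O(1)
       even though each list is only singly linked. */
    struct bucket {
        struct node *head;
        struct node *tail;
    };

    static void bucket_append(struct bucket *b, struct node *n) {
        n->next = NULL;
        if (b->tail) b->tail->next = n;
        else         b->head       = n;
        b->tail = n;
    }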
I hope that's correct and clear :)
Shouldn't the paradigm be "the fewest bottlenecks possible"?
The points mentioned in other comments are also valid (battery, VM behaviours).
However, his final point, or at least the crux of it, still stands: "Focus on stability and features first, scalability and manageability second, per-unit performance last of all, because if you don’t take care of the first two nobody will care about the third."
Incidentally on the final point, there was an interesting test that was run many years ago. Multiple teams were given the same spec, but were each told to optimize for a different characteristics (speed of development, speed of execution, maintainability, memory use, etc). Most of the teams managed to come in #1 on what they were trying to optimize for. The team that optimized for maintainability came in #2 on most other characteristics. The team that optimized for speed of execution came in last on most other characteristics.
The lesson from this is that a consistent focus on maintainable code results in more of everything else you need. Yes, there really are times that you're writing throw-away code and can forget all that. But by default, code well and let the other details take care of themselves.
Source? I'd really like to read about it.
There are an unlimited number of potential future requirements that a program might have to meet. You should only code for them if you have a concrete basis for believing they will become real at some point.
If there's a serious chance your sedan will do stock car racing, then you outfit it accordingly. Otherwise, that super-muffler's just an unnecessary expense and something more to break.
Edit: You'll notice that in an average car, every part has about the same quality, power, and durability. In a sense, engineering is actually about achieving the least cost and the largest number of bottlenecks, since any unneeded quality is wasted time and money.
When writing software, there's no direct analogy because usually the components we build our programs out of have hardly any per-unit cost. We don't save any money by using a crappy third-party library over a high-quality one. Using a library with lots of unnecessary features may be cheaper both for prototyping and for the final product.
I've only worked in I/O-bound, memory-bound, and CPU-bound code before (but never all at the same time). My hat's off to anyone or any group that has to work in all of these situations at once. Guess that's why I'm not a kernel-level developer.
I've had to sort out a lot of performance problems. Almost always they were algorithm problems, architecture problems, or some simple bottleneck. Only once have I encountered a performance problem which was best solved by writing in a lower level language. More than that, my experience says that people who brag about how they've designed for scalability have generally made stupid design mistakes that cost them huge amounts of performance.
In fact, here is a performance tip. If you want performance, make sure you have verbose logging options built in. Because when you hit performance problems, it is incredibly valuable to flip logging on, take the logs, study them, and identify your performance problems that way. Try it. For most applications that will matter a lot more than what language you write it in.
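As a sketch of what "built in" can look like in C (the environment variable name here is made up):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Switchable verbose logging: one cheap branch when it's off, and
       you can flip it on in production without rebuilding. */
    static int log_verbose = -1;

    #define VLOG(...)                                            \
        do {                                                     \
            if (log_verbose < 0)                                 \
                log_verbose = (getenv("MYAPP_VERBOSE") != NULL); \
            if (log_verbose) {                                   \
                fprintf(stderr, "[%ld] ", (long)time(NULL));     \
                fprintf(stderr, __VA_ARGS__);                    \
                fputc('\n', stderr);                             \
            }                                                    \
        } while (0)

Then sprinkle VLOG("query took %d ms", ms); anywhere it might matter.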
For instance a tool I incorporated into one system would see a parameter to a web request and would issue a command to Oracle telling it to log everything that happened on the database for that connection, and then would turn it off afterwards. So, for instance, we could take a slow web page, add a parameter, and a minute later be studying a log generated by Oracle telling us exactly what that web request did, and where it spent its time.
Having the ability to selectively do this on the live production system against live data with a problematic request while it was being problematic was huge. We were tracking down problems that only showed up in production, under production load, so no amount of profiling in development would have helped. Using the same idea, every day we would just take one random database handle, turn logging on for half an hour, and use it as a canary to look for potential problems. We found a lot of things that way.
Addendum (added later): It is also worth noting that in many horizontally scaled systems you can trivially have a fair amount of logging, even in production, if you're willing to accept a constant factor overhead in inefficiency. This can be utterly invaluable in tracking down latency, bottlenecks, and other larger scalability problems. Every large system that I've seen that was well-run did this to some extent.
But that doesn't change the fact that most of the places that I have seen worry about it have not been among them, even if they thought they were. (I admittedly mostly work in the web world.)
There are lots of domains -- simulation, computer vision, robotic control, machine learning -- which are CPU bound and will remain so for the next decade. In fact, one of the cutting edges of these domains is always CPU bound, practically by definition (sensor densities also increase quickly with time).
But things may have changed since then.
Slightly offtopic, but one of the things I love about software is that it's trained me to recognize patterns in everything. Abstract thinking is really useful.
And I've recognized the same tendency you mention in myself, and across a few different topics.
It makes me wonder when I wholeheartedly agree or disagree with something... maybe I'm just still on that journey.
First you get your Bachelor's Degree, and you think you know everything. Then you get your Master's Degree, and you realize you don't know anything. Finally, you get your Doctorate, and you realize that nobody knows anything.
Finding those patterns between disciplines is always a delight--there is a surprising amount of crossover between many fields of study and human behavior. Programming was where I first found that humility is very positively correlated with competence, and the same principle shows up in a lot of other places.
One thing the article doesn't mention is that Java once had a slow interpreter and now has a potentially much faster runtime. When Java had a slow interpreter, it was inherently slow for a larger spectrum of problems - but still not all of them.
I've seen the most ridiculous stack traces only in Java.
Wake up, this is reality.
That said, there are probably very few applications that have been this heavily hand-optimised, and probably equally few where you actually need it. Where C really stomps Java is around very low level memory management, I think. With modern processors, code can benefit greatly from colocation of related data that can be very difficult to achieve in an idiomatic way with Java.
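Concretely, colocation means something like the following sketch (struct made up for illustration), which you can't express with an array of Java objects, since there each element is a separate heap allocation behind a reference:

    #include <stddef.h>

    /* One contiguous block: element i+1 sits immediately after element
       i, so a scan touches the minimum number of cache lines. */
    struct sample {
        float x, y, z;
        int   flags;
    };

    static float sum_x(const struct sample *s, size_t n) {
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++)
            acc += s[i].x;   /* sequential access; prefetcher-friendly */
        return acc;
    }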
Edit: link layout
It's also interesting to note that there are usable HLLs that suffer from few of the problems noted in those posts: both D and C# (though the latter required a second version to get some of it right) provide a garbage-collected, object-oriented environment like Java, but also provide a lot of the primitives needed for the kind of optimization discussed: pointers, structures, unboxed types, reinterpretation, and unsigned values.
I think this suggests that the problem is less "high-level" languages and more "immature" ones: C is a very mature language, descended from other mature languages, while Java was one of the first mainstream languages to make many of its decisions, and as a result even now some of the larger mistakes have yet to be corrected (lack of function types and checked exceptions being two examples). Younger languages like D get to benefit from those lessons in the same way that C/C++ benefited from the mistakes of their predecessors.
1. A critical process slowed down drastically on certain days and at certain times. Narrowed down to Oracle. Turned out another group went behind our back and ran expensive reports on our database server. Solution: politics; spent half a year getting them kicked out.
2. Some distributed processing slowed down steadily over time. Narrowed down to bandwidth throttling on the cross-data-center fiber optic link. Solution: scheduled emergency migration of processes to the same data center.
3. Site-wide page serving time slowed down. Narrowed down to the regex and XML parsing on pages; yes, this was CPU bound. Solution: faster libraries, pre-computation, caching results.
4. Lucene indexing took longer as data volume grew. Narrowed down to database bottleneck. Solution: revamp indexing architecture to use DFS and Hadoop.
5. Linux process spawning drastically slowed down on 64-bit machines. Narrowed down to OS page-table copy-on-write overhead. Solution: worked around the spawning requirement.
6. File system driver slowed down with more cache. Narrowed down to inefficient sorting algorithm. Solution: replaced bubble sort with heap sort.
In all these cases, language is never the issue.
Have also experienced this firsthand. Blame is automatically pinned on us, and it's never our fault.
By this argument, isn't it reasonable to assume that a project Foo written in C or C++ is faster than an equivalent written in Java, simply because an author who chooses C/C++ in the first place likely understands performance? (I am not saying anything about the performance of a certain language implementation.)
The author also argues from a performance-critical application perspective. What about desktop applications, where perceived performance acts more like a quality attribute? I know many people who shy away from desktop Java and even .NET applications simply because they feel sluggish and waste memory. I don't care if the Java application is just as fast in pure algorithmic performance.
If I can choose between using two equivalent C/C++ or Java/.NET applications I will choose the C/C++ application. I still think this is a good assumption.
No, not at all. First of all, don't assume that someone knows what they're doing just by choosing C or C++ over Java. There are plenty of dumb C/C++ programmers out there, and a well-written Java program is always going to outperform a poorly written C/C++ one.
Secondly, remember that Java programs may actually be faster than C/C++ programs. Programs written in C/C++ require more time and knowledge to performance tune. Writing something in Java (or other high-level language) allows the author to spend more time focusing on the big picture issues rather than having to deal with a lot of lower-level issues.
I'm not against Java and I'll even admit that theoretically I could imagine a situation where a Java program ended up being faster, but in reality, that never happens.
In reality, we always end up in situations like uTorrent vs. Azureus (for those who don't know, uTorrent is written in C++ and is pretty much better than Azureus in every way). In fact, I can't really think of one instance where a piece of Java software is better than an equivalent written in C or C++ (outside of developer tools, because those aren't really directly comparable anyway).
The Azureus/Vuze DHT is a lot nicer than the Mainline DHT (which uTorrent supports), it's just not documented, there are no other implementations, and this statement probably does not apply to code quality.
No. C++ may permit more extensive performance tuning, but the same level of tuning shouldn't take any longer in C++ than in Java. And really I'd say C++ as a language is at least as high-level as Java (especially considering templates), just more of the libraries you'll want to build on are shipped separately.
It's used to build kernels because it doesn't require multiples of the needed RAM in order for the memory management to be timely.
I think that's kind of a stretch.
The point is that someone who is able to choose C/C++ is probably much smarter than many existing Java programmers.
Sad but true.
Plus, a C/C++ programmer can refactor and optimize their code at any level, even the lowest, limited only by their time and skill, while in high-level languages like Java/.NET the depth of optimization is capped by the VM.
Until you explain what factor(s) you're optimizing for, "It's faster because it's written in (whatever)" is a canard.
You can take that discussion in most any direction.
Budget? Even free coders and open source have their costs.
Raw speed? Custom hardware? Hand-tweaked assembler? FPGA?
Speed, but without the budget for bumming instructions? Architecture- or machine-dependent C code?
Staffing? Enterprise plug-compatible Java.
Maintainability? Not everybody can hack source code in Bliss or some other obscure or domain-specific language.
I/O? Does removing the rotating rust from the design help?
Memory footprint or ROM space, the available languages, the stinky compiler that's available on (expurgated), or whatever other factors are key to your goals...
To paraphrase that ancient Microsoft slogan, what are you optimizing for today?
Java might make it easier, and granted, the extra object allocations around Longs and Integers will make it hit scaling problems sooner, but bad (or compromised) design and poor use of data structures are always going to lead to problems of one kind or another.
(^) Yes, I know about STL et al. Original programmer clearly did not.
I agree with the author: the language has no intrinsic slowness. It's the tendency to use a triply-nested abstraction for every trivial purpose (a hash table of objects containing references to a database API...) instead of, hey, a pointer, that leads the app programmer down the primrose path.
C often leads to bad algorithms, though, for the same reason it often leads to lean code tuned to the specific application at hand. Absent many general library functions, the C world is littered with lots of custom reimplementations of data structures and algorithms, not all of which are the best (and a lot of which are actually buggy). Even when they're good, they tend to have short shelf-lives: much hand-optimized 90s-era C code is now slower than more naive implementations, because the optimizations used to save some instructions often actively harm cache performance.
On the same hardware? That seems unlikely - do you have a specific example in mind?
To give one of several likely causes: CPU pipelines have grown much longer, so it is more important to avoid stalls these days. A modern compiler knows about the importance of this, even when compiling naive code. For instance, the compiler knows it can avoid a stall in certain cases by making sure that a read from memory happening soon after a write obeys the store-to-load forwarding restrictions. Doing that can mean extra code which would have been slower on an old computer, but is faster on a modern one.
At any rate, the same is true for all languages, and _delirium's point is spot on: it's not the language that matters, it's the fact that bad (or slow, or inefficient, call it what you will) code is encountered regardless. It's time we stopped language wars, don't you think?
As long as those facts remain true, it is fair to complain about this tendency in C code in the wild. Even though the problem clearly lies with some of the programmers the language attracts rather than with the language.
You can't imply "fact" and use "generally" in the same sentence, sorry.
If you don't know what to look for you could end up looking for a long time.
Minimum String memory usage (bytes) = 8 * (int) ((((no chars) * 2) + 45) / 8)
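Plugging numbers into that formula: a 10-character string comes to 8 * (int)((10 * 2 + 45) / 8) = 8 * 8 = 64 bytes, of which only 20 bytes are the characters themselves; the rest is object and array overhead.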
Objects in general also have significant overhead.
I'd say that C programs are generally faster because the C world has a near-zero number of mediocre, copy/paste programmers, so coders just know what they're doing.
Live in the industry for 15 years or so and observe for yourself. Carefully.
"Categorically": that's what you said. Just observe, instead of trying to apriori-tize the world.
However, writing it in C will probably take more time. The question to ask yourself is: does it matter whether my program takes a few more cycles to finish or not?
Most of the time it IS faster in C, but the difference is insignificant (e.g., C takes 0.0001s, Java/Python takes 0.0002s. Who cares at that point? Very few).
There is one exception, though: startup speed. It comes from the simple fact that your program is written in C, the same language the OS is written in, which means the majority of the dynamic libraries you depend on are already loaded; that's what makes piping through simple programs like "wc" practical.
The JVM is a mere C++ program. Period.
If you "use C for control", then you must have written your own compiler.