The "C is Efficient" Language Fallacy (2006)

nickolai · on Jan 9, 2012

Umm, yes C/C++ will not parallelize your code for you, whereas Language XYZ will. Im not sure how this is an argument against C/C++ being effective. C/C++ will not parallelize your code by design. It's was never meant to. I'd never blame my stick-shift car for not changing gears by itself - thats the first reason I didnt buy an automatic in the first place!

If your matrix manupuilation code performance becomes an issue, you, the C/C++ coder will have to parallelize it and take responsibility for whatever assumptions have to be made.

Im sure that the limit of what one considers as acceptable code alteration by the compiler/interpreter depends on the person. Personally I'd be very wary of a system trying to guess-parallelize my code - especially if I can do it myself whenever I see fit by adding a dozen boilerplate code lines. I'm sure most academics needing a numerical calculus system would be glad to have the system abstract away every possible optimization, so that they can stay as much as possible in the 'abstract math space'.

I am also kind of wary of the "look, I wrote the program in several languages and here is the perf comparison". Our skills with each language vary. What I liked was (i cant seem to find the link) a teacher who asked his class to write a program doing some text manipulation/indexing (iirc) in whatever language they wanted. The fastest code was in C, yet the worst C implementation was significantly slower than an average java code. To sum all this up - the speed depends on the skill of the person in the particular language, much more than on the language itself.

jrockway · on Jan 9, 2012

You're missing the point. The target demographic for this article is someone who is not a computer scientist. He's someone who doesn't have time to deal with thread libraries and low-level matrix operations. He just needs to crunch his data quickly so that he can publish his valuable research. This person reads the same blog articles that computer scientists do, and comes to the conclusion that "I should learn C++ for my number crunching". That's simply not the case; you can use C++ to get good performance, but the amount of knowledge you need to acquire to do so is an entire field. You aren't going to be the world's best human genome researcher and C++ programmer; the time spent learning C++ could have been used to do research.

The idea is: if you use a tool that lets you describe your problem at a high level and let the computer worry about how to do the calculation efficiently, you'll beat your hand-rolled C++ every time. And, there is now the possibility of hiring a "Real Programmer" to hack your math library (Octave, Mathematica, Python + numpy, whatever), allowing you to delegate work. If every calculation was a brand new C++ program, you'd only be able to improve your application's performance by hacking on your application. Separating the problem description and the solution description allows the solution-finding code to be optimized without knowledge of the specific problem.

I think this approach scales to the work that practicing programmers, too. We don't see it a lot because writing tools is time consuming and we all have deadlines; so we pic k the "worse is better" solution and hard-code our applications in what amounts to glorified machine code.

Just because a practice is common doesn't mean it's a good idea.

squidfood · on Jan 9, 2012

Exactly. I'm a numerical scientist (does a lot of oceanography) who started out around 1989 in c "for speed". After 15 years of wrestling with hand rolling up, unrolling, threading, rolling for the simplest matrix multiplication, someone said "you should try modern Fortran, it's not like F77 was when you started."

my GOD what a breath of fresh air. Natural array expressions, operations slicing, sizing, etc. (talking about the core language not libraries) and incredible, incredible speed. Revolutionary, and makes me deeply regret those years I recited the "c is fast" mantra.

Mind you, the compiler that handles the optimizations is written in c. But you know - I don't need to know about that;in c, I always felt like I was half writing-the-compiler anyway. Thankfully, I'm not anymore.

Now the thing is, I tell people this and they look at me like I'm crazy ("fortran? no way!") This article... I love a little validation.

JoeAltmaier · on Jan 9, 2012

C/C++ are not the language they were when you did that. Now there are pragmas to control these things, and libraries to help.

squidfood · on Jan 9, 2012

Believe me, I'm current with c/c++ fixes, tricks, and libraries up to about 2007 at least. About every two years during this long period of self-abuse I'd go library-hunting and find some piece of c++ that marginally improved things; the sheer brainpower (mine and the library-writers) wasted on these workarounds is staggering.

Libraries are very nice, but better to start with a suited base-language.

Edit: not to say that my years in that landscape were wasted; other fortran-only programmers look at me as crazy when I create the simplest "object" using the fortran-equivalent of struct; to them, life is nothing but wild unencapsulated matrices deserving to be free.

m_for_monkey · on Jan 9, 2012

I'm interested. Which Fortran compiler would you recommend for a beginner for using on Linux?

squidfood · on Jan 9, 2012

gfortran is just fine! Intel fortran is a step up on various optimizations (especially parallel architectures) but gfortran is a good baseline.

danieldk · on Jan 9, 2012

Except that there are fine high-level matrix libraries, optimization libraries, etc. for C/C++. I needed an parameter estimator for maximum entropy models that was efficient for training rankers (e.g. for parse disambiguation, fluency ranking, etc.), that also had good support for feature selection. I just used an off-the-shelf optimizer (liblbfgs), implemented calculation of the objective and gradients, plus various feature selection methods.

Within no-time, we had an extremely fast estimator, that could also help answering my research question (what are the most discriminative features in our models).

Let me push this point somewhat further: C is a very simple language. It's something that someone with a certain level of intelligence can fairly quickly pick up. There is also a wealth of excellent libraries available. The problem with 'use high-level, the compiler/interpreter will take care of it' is that those languages are either a lot more complex (Haskell, OCaml) or a lot slower (Python, Perl, Ruby). In the latter case, you will end up implementing modules in C anyway.

yummyfajitas · on Jan 9, 2012

Except that there are fine high-level matrix libraries, optimization libraries, etc. for C/C++.

You'd be surprised how much of it is actually fortran at the bottom. Lapack and FFTpack are two of the more important examples.

jbooth · on Jan 9, 2012

True, but the reasons for that are historical, not performance-based.

Lapack depends on a BLAS implementation for the primitive matrix and vector operations (the actual number crunching), and while the reference impl is in Fortran, the high-performance ATLAS implementation is written in C, and I believe Intel MKL is as well. These implementations parallelize the matrix and vector ops, leaving the single-threaded fortran code in more of a "controller" role.

There's also work being done to parallelize the LAPACK operations at a higher level than delegating to parallel BLAS, it's shown promise but isn't done yet, see the PLASMA package released by some oak ridge national labs people.

ArbitraryLimits · on Jan 9, 2012

Until C changes so that the compiler can tell that a function's pointer arguments are actually arrays (a drastic change, so it will never happen), Fortran will always have a place in numeric work..

jbooth · on Jan 9, 2012

I'm not enough of a compiler buff to be able to say for sure, but I think for problems like BLAS or LAPACK where you have a very well specified problem and the motivation to dedicate absurd amounts of time to micro-optimization, C's got an edge there.

Now, if you're writing from scratch and if the Fortran compiler is capable of the right optimizations, then yeah, it's probably more worth it to use fortran. Especially if you can code directly to your problem domain. But I'm pretty sure ATLAS and MKL beat the standard compiled fortran, otherwise they wouldn't exist.

In short, just because the C compiler can't parallelize your code for you doesn't mean that you can't do that yourself. It's just a lot of work.

ArbitraryLimits · on Jan 9, 2012

ATLAS is fast because its C code is generated beforehand by a more intelligent program. ATLAS is essentially the output of a compiler that has access to the information that would have been thrown away had your project been written directly in C. All the latest numeric libraries use this approach now, e.g. FFTW's backend is written in ML. The ATLAS overview paper (http://www.cs.utsa.edu/~whaley/papers/atlas_siam.pdf) explicitly states, "ATLAS does not require an excellent compiler, since it uses code generation to perform many optimizations typically done by compilers."

The point is that neither C nor the C compiler used to build ATLAS's output is what makes it fast. What makes it fast is essentially a different compiler which operates upstream of the C compiler.

jbooth · on Jan 9, 2012

Well ok, but someone sufficiently skilled could still hand-write code in C, using pthreads and SSE instructions, that will beat the fortran code every single time, because those constructs just aren't available in fortran and it's hard to claim that the compiler's always going to be better (as a Java guy most of the time, trust me, I'd love to believe it).

ArbitraryLimits · on Jan 9, 2012

Why would pthreads and SSE instructions not be available in Fortran? Pthreads are just another library which Fortran programs can link to with no problem, and a modern Fortran compiler is perfectly capable of handling inline assembler.

jbooth · on Jan 9, 2012

I saw this link claiming that it was difficult/impossible to use threads in fortran due to surprisingly global variables: http://math.arizona.edu/~swig/documentation/pthreads/#fortra...

Maybe it's wrong, I'm not knowledgeable about Fortran and certainly not horribly invested in this, go ahead and write a world-conquering BLAS library in Fortran if you want.

sampo · on Jan 10, 2012

The text in your link claims: "it is illegal for a FORTRAN subroutine to call itself recursively, either directly or indirectly". That is not true anymore (for almost 20 years), since Fortran 90 has 'recursive' keyword with which you can declare that a subroutine can be used recursively.

(Also the all-caps spelling, FORTRAN, strongly hints that the link only talks about Fortran 77.)

ArbitraryLimits · on Jan 9, 2012

FORTRAN's problems with rentrancy don't mean that FORTRAN programs can't be multi-threaded, it just means that each function has to be run from the same thread every time it's called.

I'm not saying FORTRAN is great - clearly it has a bunch of problems which C mostly doesn't have, which is why people mostly use C instead. But this one single advantage (allowing the compiler to do optimizations on array work) is so important to some people that it will always be around.

squidfood · on Jan 9, 2012

Yes, it is in fact so "simple" that you have to bang it on the head and shout at it to get it to actually do anything.

For the audience of this article, a good test is to compare a copy of Numerical Recipes in c versus the fortran edition, where the whole purpose of the book is to present numerical code to scientists. c (and c++) have to spend ages developing up notation and workarounds for the simplest matrix to correct c's deficiencies; the pointer manipulations (array pointers, function pointers, pointer pointers) make the code nigh-unreadable compared to the straightforward and understandable fortran.

Edit: I agree with your point with respect to interpreted languages, especially those with weak typing; I don't need to control where my number lives in memory, but I do need to control the bits of precision from the beginning.

danieldk · on Jan 9, 2012

Your definition of unreadable must then differ from mine:

http://eigen.tuxfamily.org/dox/QuickRefPage.html

archgoon · on Jan 9, 2012

And working code breaks when moving to new computers or upgrading the operating system, meaning you have to dive into the template library to figure out how the array alignment is working, and how Mac OS X function calls affect the stack.

Been there. Not pleasant.

danieldk · on Jan 9, 2012

Well, arguably, Apple upgraded Xcode, sorry your favorite interpreter/compiler broke, is even less fun.

derleth · on Jan 9, 2012

> interpreted languages, especially those with weak typing

Dynamic and weak are not the same thing.

Weak typing means you're allowed to break abstractions; C has weak typing, C++ is marginally less weak, and Python has strong typing for its built-in types.

MichaelSalib · on Jan 9, 2012

C is a very simple language.

It is? Out of curiosity, did you know about all the undefined behaviors described at http://blog.llvm.org/2011/05/what-every-c-programmer-should-... and did you understand how your C compiler takes advantage of them before you wrote that statement?

rcfox · on Jan 9, 2012

Yes, that's what makes C simple. If all of those edge cases were covered, the language would be much more complex.

Simple != Easy to use

MichaelSalib · on Jan 9, 2012

So when people say that "C is simple", you think they really mean "C is simple to implement" or maybe "C is simple to specify"? I don't think the grandparent meant either of those things and I don't think either of them are true anyway.

danieldk · on Jan 9, 2012

Simple in the sense that it is a small language. There are not a lot of constructs, abstractions, etc. C lets you allocate blocks of memory, and perform operations on those blocks of memory. That's mostly it. In my experience, this very much maps to the problem domain (number crunching).

If you want to do performant number crunching with most other languages, you have to not only grasp the language, but also the underlying virtual machine or compiler. The more layers you pile on top, the harder it becomes to identify bottlenecks.

Sure, you can use fast implementations of common algorithms in a high-level language. But they are often written in C or C++. So, if you want to modify or extend such algorithms (which is likely, or you could just use an off-the-shelf program), you will end up in C-land anyway.

AngryParsley · on Jan 9, 2012

>Simple in the sense that it is a small language. There are not a lot of constructs, abstractions, etc. C lets you allocate blocks of memory, and perform operations on those blocks of memory. That's mostly it.

You forgot "to free those blocks of memory." It's OK. We C programmers often forget this. :)

qdog · on Jan 9, 2012

Not to pick too many nits, but I didn't use free() or malloc() for about 7 years of my career in programming C.

Embedded systems sometimes still frown upon willy-nilly memory allocation ;)

Of course, days when your build breaks and it's because someone, somewhere defined a piece of data as 8 bits, but your latest build has shifted some stuff around and suddenly two memory definitions have become one because of memory alignment...was probably more painful to track down than free() issues.

loup-vaillant · on Jan 9, 2012

> In my experience, this very much maps to the problem domain (number crunching).

I find that… odd. Surely we all know here that number crunching wasn't C's domain to begin with. It was writing an OS (UNIX) on 2 slightly different machines.

But even more disturbing, are you seriously suggesting that being able to manage the freaking memory makes you closer to number crunching? Sorry for the emphasis, but I am astonished. Manual memory management (and unrestricted pointers for that matter) are about the hardware. They are about manual tweaking of implementations. They are definitely not about number crunching.

And even if they were, I take one goal of number crunching is to milk every single cycle out of your CPU farm. As far as I know, C loses that match to Fortran.

> If you want to do performant number crunching with most other languages, you have to not only grasp the language, but also the underlying virtual machine or compiler.

This is already the case with C. It has been a few years (decades?) since C code and assembly no longer match neatly at all. GCC optimized assembly code is such a mess that I see it as hermetic magic. Heck, even the performance model of modern CPU is half magic to me.

Now you are correct. I'm just saying that it applies to C as well.

But there is hope. The Viewpoint Research Institute, with their STEPS project, managed to write a complete compilation suite in about two thousands lines of code. It's not optimized for runtime speed, but it does suggest that it might eventually be manageable. Here is their last report: http://www.vpri.org/pdf/tr2011004_steps11.pdf

> Sure, you can use fast implementations of common algorithms in a high-level language. But they are often written in C or C++.

Of course. But the the OP did quite clearly talk about implementing those algorithms completely in those high-level languages. Either he was being dishonest, or your argument doesn't apply.

wvenable · on Jan 9, 2012

> C lets you allocate blocks of memory

It doesn't even let you do that -- those things are external library calls.

danieldk · on Jan 9, 2012

The C99 standard specifies that the stdlib.h header should be provided with amongst other things malloc()/free(). So, C99 provides heap allocation functions.

Also, some people will argue that 'asking' an array on the stack is also allocation ;).

derleth · on Jan 9, 2012

They're standardized, so they are part of the language.

They don't have special syntax, but, then, neither does practically anything you do in Common Lisp, and that's standardized as well.

stcredzero · on Jan 10, 2012

They're standardized, so they are part of the language.

"Language" has slightly different meanings in different contexts. Perhaps he's talking about something in a theoretic context, as opposed to practice? Smalltalk has no special syntax for allocation (creating new objects) either.

derleth · on Jan 10, 2012

> Perhaps he's talking about something in a theoretic context, as opposed to practice?

Then he's still wrong. A standardized part of the language is a part of the language.

jerf · on Jan 9, 2012

It may not be what was meant, but it is the correct way in which C is simple. C is simple because, like much of UNIX (especially back in the very beginning), anywhere there was a sharp pointy bit that was difficult to handle in the compiler, it was simply relayed up to the user to handle.

I mean this descriptively, not as a criticism. It is critical to understanding C and UNIX. It is also worth pointing out that while we might all prefer "the perfect compiler that hides all problems", that "letting the pointy bits stick the user but ensuring they can handle it" still beats "a compiler that badly hides the pointy bits, still lets them stick you, and doesn't give you any way to deal with it because the assumption that you couldn't be stuck was built in too deeply".

MichaelSalib · on Jan 9, 2012

You mean "simple" in the 'Worse is better' sense. That's fine. It means that the language designers made their lives simple at the expense of their users. But from a user's perspective, C is not simple.

aaronblohowiak · on Jan 10, 2012

I'd argue it didn't go far enough in this direction w.r.t. automatic datatype promotion rules (particularly where signedness comes into play.)

ArbitraryLimits · on Jan 9, 2012

You're confusing simplicity of interface and simplicity of implementation.

danieldk · on Jan 9, 2012

It is? Out of curiosity, did you know about all the undefined behaviors described at

Yes, those are familiar and described in every decent C introduction. Of course, that doesn't mean people do not make those mistakes. Let the compiler be pedantic.

did you understand how your C compiler takes advantage of them before you wrote that statement?

At the very least, is fairly easy to understand how the compiler optimizes correct code. In say, Haskell (which I do like a lot), you often have to resort to reading ghc's Cmm output to see why something is not optimized.

jemfinch · on Jan 10, 2012

It's not the strings that aren't in the language that make it simple; it's the strings that are in the language. "Undefined behavior" is caused by strings that aren't in the language, and is irrelevant to the complexity of the actual language.

hyperbovine · on Jan 9, 2012

For the record, you should never be paying someone to tinker with and/or parallelize low-level matrix routines. The numerical linear algebra folks spend their entire careers thinking about this stuff and writing (FOSS) software to wring every last bit of performance out of the hardware. There are specialized suites for multicore, distributed and even GPU setups.

groby_b · on Jan 12, 2012

... unless you have a very specialized use cases. The linalg guys are great, but they write for the generalized case. They have to.

And so, generations of game developers write matrix multiplication code. With quite nice performance results.

It'd be nice to see if some of that performance intensive code would benefit from being written in fortran. Anybody up for porting box2d as a small test case? ;)

feralchimp · on Jan 9, 2012

There is a very small set of humans who should be implementing their own performance-intensive numerical computing routines, in any language.

Should I happen to wander into a problem space that needs this stuff, my choice will be how many minutes to spend Googling for a decent free library, or whether to license a commercial one.

On the shoulders of giants...

MichaelSalib · on Jan 9, 2012

I think you're getting stuck on a side point that Chu-Carrol overemphasized. The aliasing issue prevents a bunch of optimizations; auto-vectorization is only one of them.

MichaelSalib · on Jan 9, 2012

Putting aside his general point C efficiency, I'm curious about his specific claim that Fortran compilers outperform C specifically because aliasing conceals optimization opportunities. John Reghr points [1] to a really interesting paper from 2004 that used a special analysis tool to mark every single pointer as restrict as it safely could in the SPEC benchmark. The result was a 1% performance improvement.

That suggests that at least circa 2004, if aliasing were a serious problem for C compilers, restrict annotations were not the solution, which calls Chu-Carroll's claim into question. But there might be other explanations. Any thoughts?

[1] http://blog.regehr.org/archives/537

ETA: Just to clarify what the 2004 paper involved: the researcher took the SPEC code and ran it through a dynamic analysis tool that identified every single use of non-restricted pointers that did not alias any other in-scope pointer at the time. He then took all of those pointers and marked them as restricted in the source text. The result was a program with every possible pointer marked as restricted that could be so marked. That's way more than a human annotater will ever do. And all that got him a 1% performance improvement.

lukesandberg · on Jan 9, 2012

Well maybe the reason performance improved only slightly is because not a lot of effort has gone into optimizing situations involving restrict pointers because it is so rare. If it is common in fortran it would make sense that a lot of effort has gone into special optimizations for that case, the same effort would not have been spent in c compilers because it is so rare.

MichaelSalib · on Jan 9, 2012

That might be true, but if so, then everyone below who is claiming 'ZOMG! The article is totally wrong because C99 added restricted pointers!' is wrong.

Also, I find it weird that the C99 committee, which was full of compiler vendors, got so excited about adding a fairly dangerous (and IMHO hard to use correctly) feature to the language without bothering to update their optimisers to make use of it. The only benefit that restricted pointers offer is better optimization; to add support for them without adding better optimization is a complete waste of everyone's time.

AshleysBrain · on Jan 9, 2012

I'm not sure it's as rare as you make it out to be - AFAIK most compilers that support restrict declare malloc() and new to return restrict pointers, so

    int *x = malloc(100), *y = malloc(100);

followed by some loop should be able to spot that x and y are not aliased. It should also be able to propagate the restrict as pointers are passed around, at least to some extent.

lukesandberg · on Jan 10, 2012

would the c compiler actually pull that restrict annotation forward without declaring the locals as restrict?

cliffbean · on Jan 9, 2012

Both compilers in that 2004 paper were hardware vendor compilers, which are often heavily tuned for SPEC CPU2000. Also, both were tested with cross-translation-unit optimizations enabled; inlining in key places can make many pointer aliasing problems disappear. It's likely that these compilers already know how to do whatever is needed to get good SPEC scores even without restrict.

MichaelSalib · on Jan 9, 2012

That's a really good point. I'd love to see the same analysis done with gcc/clang/icc.

unwind · on Jan 9, 2012

This is old (2006), and seems to base its argument around the problems with unrestricted pointers in C, that cause aliasing.

As of C99, of course, C has the "restrict" keyword which allows pointers to explicitly be declared to not alias, thus enabling all these optimizations in C, too.

buff-a · on Jan 9, 2012

And then there are SIMD compiler intrinsics. Not technically part of the C standard, but if you want them, C/C++ is where you find'em. Failing that there's inline assembly.

kabdib · on Jan 9, 2012

Ding. It's amazing what you can do with a decent set of vector operations.

Honestly, if you want balls-out performance, you probably need specialized hardware. In days of yore that meant you bought a vector box for your computer, or a Cray. These days you can just load up a PC with a bunch of high-end GPUs. A cow-orker of mine down the hall has a machine with six of them installed -- he's so giddy, it's kind of irritating :-)

aaronblohowiak · on Jan 10, 2012

if you use the Accelerate Framework on OSX or iOS, you have a very easy-to-use interface to vector ops (pass in pointers to your array..)

rbehrends · on Jan 9, 2012

The problem with the C99 restrict is that its correct use is not (and cannot) be enforced by the compiler. For example, the following is legal C (in the sense that no compiler that I know of will issue as much as a warning):

void f(char * restrict p, char * restrict q); char * h() { static char s[10]; return s; } void g() { f(h(), h()); }

(Correctness of a program using restrict is, in general, not decidable.)

The tradeoff here is that while restrict can be used to allow for further aliasing, it is very easy to write code with undefined behavior as a result. Dennis Ritchie argued as much when he (successfully) kept "noalias", the precursor of "restrict", out of the C89 standard: http://www.lysator.liu.se/c/dmr-on-noalias.html (while "restrict" is not quite as dangerous as "noalias", it still has to be used with care).

jmilloy · on Jan 9, 2012

I don't know the first thing about FORTRAN. How does it enforce this?

rbehrends · on Jan 9, 2012

This is independent of what FORTRAN does. The problem with the "restrict" keyword in C99 is that introducing it can make otherwise correct code incorrect and the compiler can't tell you.

As to program analysis, programming languages without pointer arithmetic have it a lot easier than languages with. The pair (array, index) holds more information than the address of array[index]. For example, it is trival to perform array bound checks if you have the former information, but very hard if you deal with arbitrary pointers. Pointer arithmetic loses information.

stonemetal · on Jan 9, 2012

In C restrict says in this function assume pointers don't alias. Which may or may not be true. This gives the library author a hard choice to make: limit usefulness or performance.

Fortran for the longest time(introduced in 90) didn't have pointers so there was no aliasing possible. With pointers in Fortran you have to specify what they may alias so the problem is much easier.(note: I have read about it but never actually worked in Fortran so I may be wrong.)

zvrba · on Jan 9, 2012

If "undefined behavior" were an argument for keeping features out of the language, the language we would have ended up with would not be C.

CountHackulus · on Jan 9, 2012

In addition to that, if you're using the Intel compiler, then you can say

   #pragma ivdep

Which tells the compiler that the loop you're writing doesn't have any vector dependencies. Or -fno-fnalias to say that none of the arguments you're passing to a function are aliased.

Seems like pretty reasonable ways around the problem. Not to mention things like OpenMP.

tlb · on Jan 10, 2012

Pragmas suck. What I want is to explicitly assert the condition:

  assert(a+n < b || b+m < a);

(saying that a[0..n] and b[0..m] don't alias) and have the compiler generate code that first tests the condition and generates optimized code given the assumption.

gillianseed · on Jan 9, 2012

iirc the GCC equivalent of ICC's '-fno-fnalias' is '-fargument-noalias'. A pragma would be nice though.

AlisdairO · on Jan 9, 2012

Thank you. I see some variant of this argument somewhere on the web every few months, and it makes me want to bang my head on the table.

wickedchicken · on Jan 9, 2012

Yeah, but you have to manually do that. Maybe I'm a purist, but I hate manual features that the compiler should just 'figure out.' At best it's an unneeded annoyance, at worst you can actively slow your program down or even cause it to be incorrect. Aliasing can be tricky even to experienced programmers, and it's nicer to just have a compiler figure it out for you. It will probably do a better job anyway.

AlisdairO · on Jan 9, 2012

I really don't mind seeing people say "it's easier to do this and make it run fast in another language" - that's a completely fair argument. It just really annoys me when I see "C can't do this", when it can (and has been able to do so for a long time). It makes the author appear completely ignorant.

kstenerud · on Jan 9, 2012

While that would be nice, it's unfortunately not the reality. The computer falls into failed optimization traps all the time. The only difference is that with a human, you can detect it with a profiler and correct the problem by reordering a few things. With automatic optimization, you can't.

sharpneli · on Jan 9, 2012

Exactly. The whole article seems to based on the writers ignorance about "restrict" keyword.

However the article had been valid if it were published before C99.

m_for_monkey · on Jan 9, 2012

At first I didn't include the date in the title, because I don't think it matters, the author's position seems even stronger now with many popular alternative languages, fast VMs and effective JIT compilers. Anyway, C99 was, of course, already 6-7 years old in 2006.

exDM69 · on Jan 9, 2012

And restricted pointers were probably available through compiler-specific language extensions way before that. You can still use them if you have legacy code that won't compile with c99.

chalst · on Jan 9, 2012

C and C++ suck rocks as languages for numerical computing. They are not the fastest, not by a longshot. In fact, the fundamental design of them makes it pretty much impossible to make really good, efficient code in C/C++.

I dare say this is more or less true of C, but following big improvements in the quality of C++ compilers (changes that happened well before 2006), C++ has proven itself as a language for high-performance scientific computing. Todd Veldhuizen provided a survey back in 1997 of the changing case in favour of C++ to accompany his Blitz C++ library: http://www.autistici.org/sens/inf/scientificcomputingcfortra...

Fortran still has considerable advantages, but it's been a long time since Fortran programmers could regard C++ as offering unserious performance.

I would also hesitate to say that C++ is lower level than Fortran. With a suitable coding style, C++ is a quite high-level language. In fact, it is precisely the abstractions that C++ offered (templates) that allow the optimisations to take place that have delivered these improvements in compiler performance. Was the author not aware that templates can be used in this way? The discussion of alias detection suggests so.

The high-level point is right, namely that abstractions make for safer languages and give compilers freedom to make optimisations that apparently more efficient, less safe languages cannot, and so deliver better performance. But it would have been a better article if it had not mentioned C++.

Another conclusion to draw is that benchmarks produced by people who are out to make a point are worthless.

lutorm · on Jan 9, 2012

I was going to post something similar, with one addition:

One performance problem with Blitz was that the expression templates obfuscated the code enough that the compiler bailed on doing SIMD vectorization of the evaluation loops. That was fixed this year at least for the Intel compiler (gcc doesn't have pragmas to control vectorization), so now performance can even exceed Fortran for large-ish arrays.

I made some plots while fiddling with this if you are interested: http://governator.ucsc.edu/filer/blitzbench_r1845/blitzcomp....

attractivechaos · on Jan 9, 2012

The longest common substring problem (LCS) can be solved in O(n^2) time using dynamic programming, not O(n^3) as is stated by the author of that post. If the author is unable to get the basic fact right, I can hardly trust his benchmark. Also, I question the author's skill in C/C++: in my experiences, C is consistently faster than Java for such tasks and C++ is nearly as fast as C as long as we use it the right way. If we look at the computer benchmark games, OCaml never beats C in terms of speed. I doubt the conclusion was much different in 2006. The author should released the source code; otherwise the benchmark tells us nothing but his incapability in programming.

EDIT: in his comments to another commenter, the author was saying this: "[The OCaml compiler] could do some dramatic code rewriting that made it possible to merge loops, and hoist some local constants out of the restructured merged loop." A good C programmer should be able to do all the above simply by instinct. The author was not good enough. I buy the argument that being really good at C/C++ is more difficult than at other languages, but this is not the same thing as arguing C is inefficient.

pandaman · on Jan 9, 2012

Judging by the difference in his numbers between C and C++ he did not have a slightest clue of what he was doing. I could imagine a slight difference is possible if you use default compiler switches due to exception handling and difference aliasing rules, not 3x as he has.

chengl · on Jan 9, 2012

What the author mentioned is longest common SUBSEQUENCE (not substring). And it's true that LCS requires O(n^2) if only need to find ONE LCS, using dynamic programming. But if it's required to find ALL longest common subsequence, it definitely requires higher order. http://en.wikipedia.org/wiki/Longest_common_subsequence_prob...

I agree that the author should post the code he used to benchmark different languages. Otherwise, it's not convincing

attractivechaos · on Jan 9, 2012

Can you find the proof that finding all LCS is O(n^3)? The author was talking about "the standard algorithm for computing LCS". I was assuming the standard algorithm finds one LCS only.

cperciva · on Jan 9, 2012

Indeed, finding all LCSs requires exponential time: The strings "abcdefghijkl..." and "badcfehgjilk..." have the 2^(N/2) LCSs "[ab][cd][ef][gh][ij][kl]..." and no algorithm can ever run faster than the amount of time it takes to print its output.

zvrba · on Jan 9, 2012

C and C++ are efficient for general-purpose programming, if you know how to use them. C is here to stay because it is lingua franca of the computing world: OS APIs are defined in terms of C functions, and I know of no libraries in wide-spread use that do not offer a C or C++ interface.

People otherwise rightfully challenge his conclusions.

There's a funny comment there about matlab: "MATLAB struck me as being the wrong tool for every problem."

Peaker · on Jan 9, 2012

Matlab may be a pretty bad language, but it gives access to a huge (and growing) body of existing code.

wickedchicken · on Jan 9, 2012

If you're an engineer (like an engineer engineer, not a software engineer) 90% of the time matlab has some module built-in or for sale that simply solves the problem you have. For example: http://www.mathworks.de/products/dsp-system/demos.html?file=...

The code to construct an LMS filter (http://en.wikipedia.org/wiki/Least_mean_squares_filter#LMS_a...) is effectively one line:

   h = adaptfilt.filtxlms(L,muW,1,Hhat);

KingMob · on Jan 9, 2012

You've hit the nail on the head. This is exactly why Matlab is so pervasive in science. Not because it's actually good, but because it has so many handy libraries. The Mathworks is the Microsoft of science.

I had to use Matlab for seven years in neuroscience, and it's a terrible language for everything other than matrix math, data plotting, and (if you cough up for the Parallel Computing Toolbox) easy parallelization.

rcfox · on Jan 9, 2012

Somewhat off-topic: Have you heard of the Neural Engineering Framework? http://ctn.uwaterloo.ca/~cnrglab/?q=node/10 It's got a Python (well, Jython) scripting interface and can also interact with Matlab.

KingMob · on Jan 9, 2012

No, I hadn't heard of it, but I was operating slightly higher, in cognitive neuroscience, and doing a lot of fMRI and intracranial EEG work. For those domains, the biggest packages are SPM and Fieldtrip, both of which run on Matlab.

I do, however, hope that Python will push out Matlab one day.

fauigerzigerk · on Jan 9, 2012

There are many widely used Java libraries.

haberman · on Jan 9, 2012

As someone who doesn't know Fortran, how does Fortran solve the aliasing problem? Even if pointers and arrays are different, how can you ensure two arrays don't alias each other? The only way I can think of to do this is to always copy arrays when they are passed to functions, but this seems expensive. Otherwise I don't see how you can avoid this pseudocode:

  void f(array1, array2) { /* somehow guaranteed not to alias? */ }
  void g() {
    array my_array[50];
    f(my_array, my_array);
  }

scott_s · on Jan 9, 2012

I thought about it, and I realized I didn't know. This is the best discussion I found: http://gcc.gnu.org/onlinedocs/gcc-3.4.6/g77/Aliasing-Assumed...

Short answer: they don't. What you wrote is an undefined behavior, just like dereferencing a null pointer in C. If the compiler can't tell statically, then it just assumes they're not aliased.

I'm basing this conclusion mainly on this statement from that piece on aliasing: Essentially, compilers are promised (by the standard and, therefore, by programmers who write code they claim to be standard-conforming) that if they cannot detect aliasing via static analysis of a single program unit's EQUIVALENCE and COMMON statements, no such aliasing exists. In such cases, compilers are free to assume that an assignment to one variable will not change the value of another variable, allowing it to avoid generating code to re-read the value of the other variable, to re-schedule reads and writes, and so on, to produce a faster executable

haberman · on Jan 9, 2012

Interesting, that sounds equivalent to just assuming "restrict" on all function parameters. If that's true, then C and Fortran aren't fundamentally that different in this respect except that C assumes things can be aliased by default and Fortran assumes they can't.

stevecooperorg · on Jan 9, 2012

(I don't know if I have this right -- I've not done Fortran in 15 years, but I feel like I've got about 1/3 of an answer and someone more knowledgeable may be able to finish it off. Any code samples guaranteed to be wrong. ;) )

Fortran 77 doesn't allow recursion, and all arrays are fixed in size at compilation time. That means every array in the source code is allocated memory when the program starts; you can never allocate a new array during program run; either by allocating the memory dynamically, or by making a recursive call that has an array local. I don't believe there's a stack -- not the way C/C++ has a stack for locals and return values. So if we declare a main program and a subroutine, like so;

    01  PROGRAM TEST1
    02  REAL  D(10)
    03  REAL  E(10)
    04  CALL  A(D,E)
    05  CALL  A(D,D)
    06  END
  
    07  SUBROUTINE A(F,G)
    08  REAL  F(*)
    09  REAL  G(*)
  C 10 --- Do something with F and G
    11  RETURN
    12  END

Then this program has exactly two arrays -- the ones declared on line 02 and 03. The ones on line 08 and 09 declare the type of the parameter passed on 03 and 04, but it doesn't allocates any memory. This idea is true for all F77 programs -- you can look at a program and say 'This program has exactly nineteen arrays'.

So -- this limitation may explain the aliasing problem. A call which passes an array (line 04, line 05) always calls a particular array -- it's not just 'pass a pointer to an array' but 'pass a pointer to memory location 85349'.

So in

    04 CALL A(D,E)

then we know they are separate arrays, and when we write

    05 CALL A(D,D)

we know they are the same array. We're never confused about whether we are, or are not, aliasing.

There are some notes on Fortram memory allocation and arrays here (http://www.ibiblio.org/pub/languages/fortran/ch2-4.html):

" When the array is declared in the 'outermost' procedure, the compiler allocates memory for it. When you pass the array with a CALL statement, the compiler actually passes the base-address of the same array to the called procedure. When the called procedure operates on the array it works on the same array - uses the same memory storage area, the array is not 'copied' to another memory area (But, it might be in some cases on some Fortran systems)."

jpdoctor · on Jan 9, 2012

> Modern architectures have reached the point where people can't code effectively in assembler anymore

Someone should inform the guys over at ffmpeg that the jig is up.

ternaryoperator · on Jan 10, 2012

The statement is overbroad but not wholly incorrect. The biggest challenge to writing good (I don't know about "effective") assembly language is indeed the complexity of processors and architectures which can greatly magnify the effects of imperfect instruction choices or instruction sequencing. Obviously, some people can code effectively within those constraints, but it's a very much more specialized skill than it used to be.

dextorious · on Jan 10, 2012

Someone should inform you about generic statements and exceptions that prove a rule.

Locke1689 · on Jan 9, 2012

Arrays and pointers are most certainly not the same thing in C. Check yourself here:

    char x[100];
    char* y = malloc(100*sizeof(char));
    printf("Array: %ld\nPointer: %ld\n", sizeof(x), sizeof(y));

emillon · on Jan 9, 2012

> In C and C++, there's no such thing as an array

Yes there is. `int a[10]` allocates 10 consecutive `int`s on the stack, and it's the only language construct to express that (with `alloca` but it's a builtin function).

exDM69 · on Jan 9, 2012

The OP wants to point out that C hasn't got first class arrays the same way that Fortran does. The C kind of arrays get passed to and from functions using naked pointers and that makes lots of compiler optimizations more difficult because of aliasing issues while a Fortran compiler knows that two arrays cannot alias (use overlapping memory areas). By adding a "restricted" declaration to your pointer types in function signatures, the compiler assumes no aliasing occurs and goes on with the optimizations. It's the coders' responsibility to make sure no aliasing happens, but if it does, all hell can break loose.

jrockway · on Jan 9, 2012

Sort of true. In C99, you can:

  void f(int len){
    int array[len];
    printf("sizeof len: %zu\n", sizeof(array));
  }

Yes, I was weirded out when I saw this for the first time. But C does in fact have arrays; they're just not very good arrays.

pjmlp · on Jan 9, 2012

If len is big enough your f() call will crash with a stack overflow.

repsilat · on Jan 9, 2012

I don't understand why stacks are still so small. On nice operating systems memory is only needed when it's touched, not when it's asked for, so making the stack large doesn't cost anything unless you need a large stack. On 64-bit systems you could make the stack a billion gigabytes and still have 95% of your process' virtual address space available for the heap.

On Windows the stack is one megabyte by default. We live in the future and we're still afraid of recursion without tail-call optimisation and arrays on the stack. Ridiculous.

jemfinch · on Jan 10, 2012

Stacks are small because every thread in the system has to have one, it's that simple.

quotemstr · on Jan 10, 2012

Who said stacks had to be contiguous and allocated ahead of time?

emillon · on Jan 10, 2012

How would you manage non-contiguous stacks ? ie, how would you know (upon return) where is the caller's stack frame is ?

exDM69 · on Jan 11, 2012

The stack is only contiguous in the virtual address space. The physical address space is a different beast. The operating system allocates a big chunk of virtual address space for the stack but doesn't allocate the physical memory to fill that. When an app wants to read or write from the unallocated part of the stack, a page fault interrupt is generated and the operating system allocates physical memory frames and assigns them to the virtual memory addresses of the stack. When the thread/process is interrupted for servicing, the operating system checks the value of the stack pointer register (rsp in x86_64) and frees the physical memory corresponding to the virtual addresses above that in the stack.

So having a large stack will only consume virtual address space which we have plenty of (2^48 bytes in most x86_64's) but will not consume physical memory, which is a more limited resource.

stephencanon · on Jan 10, 2012

You would keep a frame pointer, which would tell you where to find the caller's stack frame. Most systems already do this, to aid debugging and tracing tools, to support variable-length stack allocations, or for any number of other reasons.

cks · on Jan 9, 2012

I was actually under the impression that Fortran was used simply because the experts (in this case in fluid dynamics) was familiar with Fortran. It was the language they learned and used while back at the university. At least this is the impression I got from working in the field.

I never heard of anyone suggesting we should use Fortfran for performance reasons, instead there was an ongoing movement to evolve the code, moving it to use the OpenFOAM solver that's written in C++.

jmaygarden · on Jan 9, 2012

Matrix operations can be faster in FORTRAN than C because of differences in the way arrays are defined/implemented in each language.

stephencanon · on Jan 9, 2012

This is just totally false. You can lay out your C arrays in exactly the same way that Fortran does, if you choose to. Or you can do something else, if that will give better performance. Tools are just tools; they don't determine what you can do with them. (I write matrix operations for a living, generally in C or Assembly; much of what I write is provably as fast as possible on the target hardware, so it could not possibly be faster if written in Fortran).

jheriko · on Jan 9, 2012

Okay, so there are reasons why C is /difficult/ to make very efficient for numerical computations. Aliasing is not one of them since C99 for a sufficiently knowledgable programmer thanks to the restrict keyword.

The library is the real problem.

code.google.com/p/fridgescript - faster than C in some cases, only because it doesn't use the math library, but uses hardware without indirect calls and without caring for obscure edge cases.

stephencanon · on Jan 9, 2012

I write high-performance numerical software for a living. There are a lot of baseless claims in this post. You can write high performance software in C, C++, Fortran, Assembly, or a whole host of other languages. There are syntactic reasons to prefer one or another, but you should not choose among them for performance reasons.

I choose to write in C and Assembly, for example, and much of the code I write is provably as fast as possible on the targeted architecture. It is literally impossible that it would go faster if I wrote it in Fortran instead. All of these languages are just tools, and if you know your tool, you can do great things with it. The specifics of which tool you choose are often unimportant.

There are some syntactic niceties in fortran which make it more comfortable for people who don't want to think about certain low-level details. However, you cannot write software that runs as fast as possible without considering those details, so a programmer with that goal is forced to think about them no matter what tool he or she chooses.

Fortran does have a (slightly) more relaxed numerics model than standard C, which allows a compiler to make some optimizations that a C or C++ program would need to explicitly license. However, these optimizations are disallowed in standard C and C++ because they are unsafe. The fact that Fortran enables them does not make Fortran a better language for numerical computation (from my perspective as low-level library writer, they make it worse). Performance without correctness is absolutely meaningless.

Write software in the language that is comfortable for you. Use libraries written by experts for performance critical operations. Use a profiler to identify operations that are hotspots in your code. Don't complain about your (or someone else's) tools.

yummyfajitas · on Jan 9, 2012

The author's point is well taken, but his example leaves a lot to be desired:

If you look at that loop, it can be parallelized or vectorized without any problem if and only if the array pointed to by x and the array pointed to by y are completely distinct with no overlap. But there's no way to write code in C or C++ that guarantees that.

    double* doMath(double** y) {
        double** x = allocateNew2DArray(20000, 20000);
        for (int i=0; i < 20000) {
            for (int j=0; j < 20000) {
                x[i][j] = y[i-2][j+1] * y[i+1][j-2];
            }
        }
        return x;
    }

All you need to do for this example is replace the fortran coding style doMath(double* x , double* y) with the c coding style doMath(double* y).

I don't think any C compilers actually do parallelize code like this, or at least they don't do much beyond using SIMD. But in principle they could.

mturmon · on Jan 9, 2012

The use in the blog post of this example, with 2d arrays in C implemented through pointers-to-pointers, perplexed me.

If you care about efficiency, you don't use pointers-to-pointers for rectangular matrices. You use 1D vectors and strides. For instance, this is how numerical linear algebra codes typically represent matrices. This approach also generalizes well to N-dimensional problems.

The pointers-to-pointers idiom is picked up by some people due to its use in Numerical Recipes. Of course, NR used it to make C look like Fortran, because they were comfortable with Fortran. Another NR artifact is the gyrations used to make C arrays appear to start at 1 rather than 0. Which is a hoot.

I agree with some comments above that the Fortran 90 matrix constructs are an improvement on C's capabilities.

nitrogen · on Jan 9, 2012

I think something may have been lost when you typed your code. y is a pointer to double, so can't accept two array subscripts (if it were "double * * y" it could, or you could keep it as a one-dimensional array and use something like y[(i - 2) * 20000 + j + 1]), and i and j don't appear to be incremented anywhere.

yummyfajitas · on Jan 9, 2012

Thanks, fixed the "double * * y". The i and j not being incremented is actually an error in the original blog post - I'll leave it here for consistency.

nitrogen · on Jan 9, 2012

The i and j not being incremented is actually an error in the original blog post - I'll leave it here for consistency.

Ah. In that case, I wonder how the author managed to get an infinite loop to complete in 0.8 seconds ;).

antirez · on Jan 9, 2012

C does not offer parallelization automatically, but you can pick a model, and a library, that is a good fit for turning your app into a parallel one. In this regard the following slides are interesting: http://swtch.com/~rsc/talks/threads07/ The model proposed here may not be the right one for your application, so you can design your own (including multiple processes exchanging messages for instance, or the usual threading with locks, and so forth, it's up to you).

This requires efforts but to be honest, there is currently no language that is a good fit for system programming and that is able to parallelize your code magically and automatically. Such a language would give a strong competitive advantage to programmers using it, as C did in the past over other languages, so would become mainstream soon or later, or its ideas would incarnate in some other "better C" language. If this is not happening IMHO there is something wrong in languages that currently are able to do more than C in this regard.

In programming ideas tend to take years to be accepted, but there is a very clear trend over decades: something that is really better (as in code that is faster, or simpler to write (very useful abstraction XYZ), or more easy to debug, or with higher quality libraries, or even much simpler to deploy (PHP I'm looking at you)) eventually becomes mainstream.

gsg · on Jan 9, 2012

No mention of representation issues? Memory is a big optimisation target these days, and a weak point of languages like Java and OCaml.

dwc · on Jan 9, 2012

The missing meta lessons: don't fall for overly simple characterizations, and don't be a language bigot.

"C is Efficient" is largely true, but it's certainly not always true, and there are problem domains where it's seldom true. If you don't know that this is the case with any language then you're not qualified to be picking an implementation language anyway.

growingconcern · on Jan 10, 2012

What a dolt. It's possible to tell the compiler that two pointers aren't aliases: the "restrict" keyword. Unless I'm mistaken that is really his sole argument against "pointer-based languages".

duaneb · on Jan 10, 2012

Well argued... until I realized that his argument hinges on C/++ compilers not being able to guarantee non-aliasing. That's why they introduced the `restrict` keyword in C99....

akg · on Jan 9, 2012

Every language has it's pitfalls. There isn't one single greatest language for everything. Within an application domain one needs to consider the tradeoff between machine-time and human development-time and determine what the best tool for the job is.

A 5minute run-time in Python might be acceptable if it takes you 1 hour to write it and are only going to use it once; whereas you may not even know how to program in OCaml even though it offers the best run-time.

jensnockert · on Jan 9, 2012

While a lot of Scientific computing is done in FORTRAN, much of the lower-level `plumbing' is written in assembly, C or C++.

C isn't an efficient language, C is just thin layer on top of assembly. You can write shit code in C, I know that from experience, but you can also write really fast code in C.

exDM69 · on Jan 9, 2012

> C isn't an efficient language, C is just thin layer on top of assembly.

This is a pretty common fallacy. These days a C compiler does so much magic that calling it sugared assembly doesn't describe it accurately.

With a 1970's single pass compiler your argument may have been more true.

jrockway · on Jan 9, 2012

Yes, but it would be just as easy to write an "assembly compiler" that did the same optimizations on your program text. C provides a layer above the machine that acts as we imagine a machine to work; the compiler adjusts our expectation to reality.

In the end, C has branching and memory access, and that's basically what our computers have too. The rest is details.

adgar · on Jan 9, 2012

> Yes, but it would be just as easy to write an "assembly compiler" that did the same optimizations on your program text.

While there are ways to optimize machine code without extra debug information, no, what you've said is not true. It is not nearly as easy to optimize machine code as it is to optimize C.

gizzlon · on Jan 9, 2012

For me, the blinking and moving ads completely invalidate this site..

z0r · on Jan 9, 2012

Because I've read this blog before, I can point you to this:

http://scienceblogs.com/goodmath/2010/07/seed_conflicts_of_i...

Don't blame the author

stonemetal · on Jan 9, 2012

Given that you might like his new site better.

http://scientopia.org/blogs/goodmath/

ww520 · on Jan 9, 2012

I didn't notice them. Was it because AdBlocker?