
The "C is Efficient" Language Fallacy - silentbicycle
http://scienceblogs.com/goodmath/2006/11/the_c_is_efficient_language_fa.php
======
tptacek
This argument is as old as the hills. It's probably true in a lot of niche
situations. But the fact is, most C code is faster than higher-level language
code, because:

* C programmers have more freedom to arrange data in memory to exploit locality

* C data structures need less bookkeeping

* C programs manage memory manually, and so lack GC overhead

* C programs can easily swap in different allocators for different work sets

* C function calls are usually wired into the code, not indirected through (several layers of) tables.

That's to say nothing of bytecode interpretation overhead, which is probably a
straw-man argument.

I buy the instruction scheduling argument for inner-loop numerical and vector
scenarios, but even in performant code, that's usually less than 10% of the
total, and from what I've seen, both C and HLL code tends to delegate that to
machine-specific assembly libraries.

~~~
silentbicycle
While I don't have enough experience in _really_ low-level tuning, I know that
sometimes people get so focused on optimizing their current implementation
that they fail to see that their overall design has trapped them in a local
maximum. Problems like inappropriate algorithms are usually more obvious in
higher level languages, because there's less detail obscuring them. That stuff
needs to be right before getting to fine tuning. (Prototyping in another
language first helps.)

C and C++ are quite fast locally, but sometimes that works against being fast
overall.

~~~
10ren
The standard approach I heard was to code in a very high level language (like
python) to get the algorithms right, then rewrite in C for performance. Python
then becomes a sort of drawing board as part of the design stage, instead of
"coding".

~~~
russell
Rewrite only those pieces that are consuming a lot of time or memory, leave
the rest in Python, which is particularly good at integration with C/C++. Many
programs are a lot of setup, error handling, and edge cases where performance
is not an issue, but LOC is.

~~~
10ren
I thought about including the idea of only rewriting the hotspots, but left it
out as detracting from the main point, and it seemed tedious to iterate the
details. But then there were 3 replies pointing this out, which seem to be
more valued than mine by the community (according to their votes).

Ask HN: Does this mean that HN would prefer I wrote comments that do cover all
the cases, instead of just sticking to the key point? Or is it just that there
seemed to be a gap in my comment, which people naturally wanted to cover?

Note: the replies add more than just "rewrite only hotspots" (like the above
one detailing setup, errors, edge cases), and I certainly appreciate extra
details being filled in. Is the valuable extra detail the reason for the extra
votes? I just feel kind of annoyed that the replies seem to suggest I was
stupid in not mentioning the hotspot idea. This has happened to me a few times
now. Am I taking it too personally? Is it just a question of different
opinions on what is important? Or is it just that I failed to communicate my
decision, and then get annoyed when people point out the other branch of the
decision?

There is an issue of preference here: I prefer to keep all the code in one
language. One factor in this is that my projects are small, and for these, the
overhead of switching languages and managing the different source isn't worth
it. I'm not running up against efficiency problems either. I'm sure it's
different for larger projects, especially multi-person ones, and particularly
if the project covers different kinds of activities (for which different
languages are suitable), and even more so if there's need to integrate with or
reuse existing assets in different languages.

Thanks for any clarifications you may have. :-)

~~~
silentbicycle
> I just feel kind of annoyed that the replies seem to suggest I was stupid in
> not mentioning the hotspot idea.

I didn't mean anything personal, and I don't read any of the other comments
that way.

The other comments may have just been voted up by people who also like Lua or
Mercurial, or something. I thought it was worth noting that Lua was explicitly
designed for that style of development. (Lua's also my favorite language.)

~~~
10ren
Thanks. Yes, that's true. I appreciate the information about Lua and the
Mercurial example - it was just the hotspot part that bothered me. I guess the
fairest guess is that a comment is voted up as a whole, based on _all_ the
things it adds, not just the part that I happen to be concerned about. It
seems a bit silly of
me now, but I much appreciate your reply.

------
chancho
Fortran's (alleged) dominance in scientific computing is probably attributable to
tradition (they still teach it to undergrads in non-CS departments) but the
native multidimensional arrays have a much bigger impact than aliasing. In
Fortran you just index your array like A(i,j,k) and the compiler will compute
(and optimize) the addressing for you. In C, a typical (non-computer)
scientist who doesn't really focus on mundane shit like this will end up
writing something like

    
    
        for ( int i=0; i<ni; ++i )
        for ( int j=0; j<nj; ++j )
        for ( int k=0; k<nk; ++k ) {
           a[ k*ni*nj + j*ni + i ] = ...
        }
    

which sucks. Optimizing multidimensional array access (which characterizes
most of scientific computing) is much easier for a Fortran compiler.

~~~
Aron
This looks like a lot of calcs for the inner loop but is quite easily
optimized. The compiler knows that (j * ni + i) is constant in the k loop and
that ni * nj is constant. Check the assembly output. But it is probably better
to reorder the loops and traverse linearly through memory so that each cache
line brought down is fully consumed in order.

~~~
scott_s
The point of that code example is not the calculations, but accessing memory
in a cache-friendly way.

~~~
chancho
Well, yeah, but since Fortran stores things in column-major that's not really
an issue. My broader point was that all of these details are hidden from
programmers, so there's less for the programmer to screw up and more room for
the compiler to work within.

------
10ren
_Java: 1 minute 20 seconds._

 _About a year later, testing a new JIT for Java, the Java time was down to
0.7 seconds_

I've been surprised at the speed of Java recently. I wonder how much
improvement is left in dynamic compilation.

 _The HP project Dynamo was an experimental JIT compiler where the bytecode
format and the machine code format were of the same type; the system turned
HPA-8000 machine code into HPA-8000 machine code. Counterintuitively, this
resulted in speed ups, in some cases of 30% since doing this permitted
optimisations at the machine code level. For example inlining code for better
cache usage and optimizations of calls to dynamic libraries and many other
run-time optimizations which conventional compilers are not able to attempt._
<http://en.wikipedia.org/wiki/Just-in-time_compilation>

~~~
alecco
Time-space trade-off. Meaning Java uses a lot of memory for almost anything.
Speed isn't everything.

~~~
michaelneale
Sometimes memory can cost speed as well.

I find with java essentially you are amortising gains which are paid back
with GC at a later date (in many cases I guess it is worth it).

It's not one-size-fits-all!

~~~
pohl
This sounds like it had less to do with Java and more to do with a large
number of short-lived allocations in your design. A garbage collector can be
abused in any language that has one.

------
kmavm
I worked on VMware's virtual machine monitor from 2000 until June. Overcoming
customers' performance fears was a major impediment in the early years. Every
fraction of a percent we gave up relative to native ruled out whole classes of
applications. Nothing other than C would have been conceivable. Even C++ would
have been wildly inappropriate, as it is for most kernels, because so much
invisible code can hide behind a close-brace.

The mapping from source to machine representation in C is relatively trivial,
which is the source of all C's ups and downs. If you ever plan to count
microseconds, cache misses, TLB misses, mispredicted branches, etc., you had
better start with a toolchain whose machine-level output is grokable from the
source.

------
nickpp
Read the comments: turns out the author was ignorant of C++ templates
(including Blitz++ scientific computing library) and he was lumping C and C++
together in his "benchmarks".

It always annoys me when clueless people judge a language they don't even
understand.

Repeat after me: there is no such thing as C/C++.

~~~
sundarurfriend
> turns out the author was ignorant of C++ templates

I believe you actually meant 'ignorant of C++ template metaprogramming
techniques'. The author seems well aware of C++ templates and even says:

>> the thing I coded immediately before the Stellation experiments was a very
hairy template analysis for a C++ compiler

>he was lumping C and C++ together in his "benchmarks".

I couldn't see where, could you point out which comment leads to that
inference?

------
makecheck
Always remember that the total time between you having a problem, and you
achieving results, includes the coding (and recoding) time, _and_ the run
time. There's a tendency for people to ignore "slow" languages because they
focus only on the runtime.

I am well aware that there are good reasons to optimize things in languages
like C (and I use them), but consider...

If I take several extra _weeks_ to code, debug and test a C solution, and I
could have had a script done much sooner, then my results were not faster
overall. Why? Well, the script could be slow as dirt, but if it has a few
extra weeks to churn through data and produce results, it may be done before
the C program is even ready.

It's also important to remember that not all bugs are in software. Suppose I
was looking at an entire problem in the wrong way, and this wasn't apparent
until I started seeing results? In that case, my earlier start with a "slow"
program meant that this mistake was found much sooner, so the script can be
thrown out and redone, producing _correct_ results with not much of a time
penalty.

~~~
boryas
Also, thought I would mention the fact that if you take a day to write a slow
program that takes a week to run, that's cheaper than spending a week writing
a fast program that finishes in a day. After all, your time is much more
expensive than the computer's!

~~~
fauigerzigerk
You're both making a pretty odd assumption here, which is that a program
typically runs exactly once and that I am the only user of my own program.

Users' time is also much more expensive than the computer's. That's why we
write software in the first place.

~~~
makecheck
No, I expect my scripts to run for a long time with many users. This doesn't
preclude optimization; some of it is automatic (new hardware, interpreter
library improvements), some of it is well established (using SWIG and C to
replace only a _tiny piece_ of the program that must be faster).

In some respects, having long-lived software with lots of users makes speed
the least of my concerns, because they're always asking for new features, and
those are relatively easy to add to scripts.

And the relationship between software speed and productivity isn't linear,
because people multitask. If a program takes 10 seconds to run, I might sit
and wait for it to complete, without doing anything else. Whereas, if the
program takes a minute, I may decide to switch to another quick task, and then
return to see results. In this case, both tasks needed to be done, one took
longer _but_ it ate up the "slow" runtime of the program, and was only
parallelized because of that long runtime.

~~~
fauigerzigerk
I was referring to this statement: "Well, the script could be slow as dirt,
but if it has a few extra weeks to churn through data and produce results, it
may be done before the C program is even ready."

The comparison of development times and running times simply makes no sense if
you assume the script is going to run a thousand times. I agree that this
relationship isn't linear. That's exactly why it's pointless to compare the
two numbers as if it were. The only number that is comparable is probably the
profit you make in each case.

~~~
boryas
Yes, good point. I guess he was referring more to the scientific end of
things, where you write a program that runs for days on massive data, and
there really isn't a user.

------
AlisdairO
Much as I take his overall point - that a close-to-the-metal language is not
always best for performance - the author would be well advised to take a look
at the C99 restrict keyword.

~~~
setjmp
Just looked it up... looks like you're right. The whole blog post is nonsense.

~~~
habibur
Just asking, do currently available compilers support restrict (or the C99
standard)?

I use msvc mostly and gcc sometimes.

~~~
AlisdairO
gcc certainly does (use -std=c99). If you don't want to use C99, use
__restrict__ to use it as a gcc extension. This is C-specific - I don't
believe a similar standard has worked its way into C++ yet, although compilers
may have extensions that support it. I don't know about msvc, I'm afraid.

GCC's C99 implementation is mostly complete - you can find out more here:
<http://gcc.gnu.org/c99status.html>

------
aplusbi
I think a large part of the "C is Efficient" fallacy/truth is that C is more
predictable with its assembly output than HLLs. Since ultimately all programs
are converted to machine instructions at some point, you will always be able
to write machine instructions that are at least as efficient.

The test lies in the ability of the programmer to do so.

------
madair
Reminds me of The Practice Of Programming, by Kernighan and Pike, a truly
excellent book. They cover some of the bum steers of C efficiency quite well
there. A classic.

------
likpok
One issue that I see is that they only (seem to) compare with gcc, which is
not particularly good. It would be better to compare against something like
icc, which has better register coloring, SIMD support, etc.

This might redeem C a little.

However, the real reason C will not go away any time soon is that there is no
replacement for low-level software yet. Nothing else has quite the same
minimal dependencies.

~~~
chancho
GCC C vs Intel Fortran :
[http://shootout.alioth.debian.org/u64/benchmark.php?test=all...](http://shootout.alioth.debian.org/u64/benchmark.php?test=all&lang=gcc&lang2=ifc&box=1)

Intel C vs Intel Fortran :
[http://shootout.alioth.debian.org/gp4/benchmark.php?test=all...](http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=icc&lang2=ifc&box=1)

(I'm not even going to link to the GCC Fortran benchmarks. They're
embarrassing.)

C is no slower than Fortran on any of those benchmarks, and on some it cleans
Fortran's clock. The aliasing issue is the only thing Fortran has going in its
favor, but clearly it's not ubiquitous. The n-body benchmark, for instance, is
fairly typical numerical code. You might even think, since it's simultaneously
reading and writing through multiple pointers of the same type, that aliasing
is an issue, but it's not. And in the rare case that it becomes an issue,
there's compiler hints (e.g. C99 restrict) for that.

Picking Fortran over C solely because of aliasing worries is premature
optimization of the worst kind.

~~~
likpok
This matches my experience with ICC doing kernel development. It would
vectorize loops and break out the SIMD instructions where gcc would not.

It was quite strange the first time looking through the objdump output,
seeing things like punpcklwd and xmm registers.

And then discovering what -fast would do to things (it makes icc look at your
whole program to optimize, so it does things like ignore CDECL and use
whatever registers it can).

------
Oxryly
For the given example code the answer is to use the underlying SIMD types and
intrinsic instructions. If you're on Intel or PowerPC then the SSE or Altivec
registers and instructions are available. Use them and you'll beat any
compiler optimization every time. And, most importantly, the chances that
up-to-date SIMD types and intrinsics will be available in any language _but_
C/C++ are vanishingly small. Java, Ocaml, Haskell... you name it, you can't
properly use SIMD (AFAIK... please let me know if there are exceptions).

And if your application doesn't fit the native SIMD properly, there's no
chance the compiler can really do anything meaningful with it anyway.

~~~
jrockway
Some quick googling suggests you are wrong:

[http://www.cas.mcmaster.ca/~kahl/Publications/TR/Anand-
Kahl-...](http://www.cas.mcmaster.ca/~kahl/Publications/TR/Anand-
Kahl-2007a_DSL/)

[http://wwwlasmea.univ-
bpclermont.fr/Personnel/Jocelyn.Serot/...](http://wwwlasmea.univ-
bpclermont.fr/Personnel/Jocelyn.Serot/camlg4.html)

<http://tirania.org/blog/archive/2008/Nov-03.html>

etc.

~~~
Oxryly
I would suggest taking a closer look at those links. They bring up another
point in that you'll only find current, usable, robust support for SIMD in
C++:

The first link is just a paper; the second is a 1.0 release that is seven
years old, it only supports SSE, and it's all in French; the third is only for
Mono, it only supports SSE and its SSE support is old and incomplete.

On the other hand, if you try to use SIMD types and intrinsics in C++ you'll
find current and comprehensive support from the major compilers on all SIMD
platforms.

(I'd love to use a current and comprehensive version of Haskell SIMD, but it's
just not ready for prime time.)

~~~
jrockway
_(I'd love to use a current and comprehensive version of Haskell SIMD, but
it's just not ready for prime time.)_

Since you're obviously an expert in the area, why aren't you helping make it
ready for prime time?

~~~
scott_s
That's really not fair. I know of many things in and outside of my area of
research that need improvement, as does any researcher. But we can only work
on one thing at a time.

------
dryicerx
I disagree with some of the author's points.

The language is a tool; its efficiency really depends on the programmer who
designs and implements the program. You can have a horrible coder write
something in C that is very slow and inefficient, give the same problem to a
good programmer, and arrive at a faster and more efficient result using a bash
script.

The reason why a lot of people (including myself) believe that C/C++ is a
high performance language is not that it is fast for all applications, but
that it gives the programmer more control instead of leaving all the fine
details for the compiler to second-guess. (@tptacek's reasons are perfect for
this.)

~~~
AlisdairO
...except that in this case, when you take full control of the memory
management, you slow things down. The problem with using pointers (without
using the restrict keyword) is that it can seriously impede out of order
processing on modern CPUs. Now, the author has missed the existence of the
restrict keyword in C99, but in the absence of that his point is good.

------
wicknicks
The arguments about language efficiency on a single processor machine are
probably outdated today. Most machines have multiple cores. We need good tools
which can exploit this CPU architecture. Languages like C/C++ place a large
amount of responsibility on the shoulders of a programmer. Effectively, you
are writing two programs - one for the task at hand, and the other is memory
allocation for it. Control does not always give a speedup, and it is not
always a boon: it implies managing the different resources on your own. In a
large-scale application, this is not a great idea. GC is a good alternative to
malloc'ing and free'ing when the application gets huge.

Also, the scalability of a single system is limited. If you really need extra
speed, I think going parallel is the key. Most numerical methods are
parallelizable. And C/C++ were not created keeping this mind. It can be a
programmer's nightmare to debug multiple threads with memory leaks. Languages
like Java do make this task easier using programming paradigms like
Map/Reduce.

~~~
lacker
It doesn't make sense to use mapreduce as an example of Java making
parallelism easier, given that the original Google MapReduce is a C++
framework.

------
jamii
The debate seems to be between trusting your compiler to generate good code or
writing close-to-metal code yourself.

There might be a third option though - code generation in a HLL. Coconut is a
nice example of this:

[http://www.youtube.com/watch?gl=GB&hl=en-
GB&v=yHd0u6...](http://www.youtube.com/watch?gl=GB&hl=en-GB&v=yHd0u6zuWdw)

Instead of the compiler being a black box, Coconut is structured as a set of
libraries for code generation, analysis and optimisation. They report
outperforming C SIMD code by up to 4x on the Cell architecture.

------
bradgessler
Where is the "Assembly is Efficient" Language Fallacy article?

------
ddg
"In C and C++, there's no such thing as an array - there's just pointers,
which you can subscript, and a shorthand for pointer arithmetic and
indirection (x[n] in C/C++ is the same thing as *(x+n).)"

Please remember that "char x[N];" and "char *x = malloc(N);" are NOT the same.
(Not sure if this is news to anyone, but when I was learning C reading that
would have made me think otherwise).

------
wlievens
He's absolutely right about pointer aliasing. I work on a DSP compiler for
architectures strongly geared towards instruction-level parallelism, and when
the compiler encounters pointers that may alias (but probably don't, though we
can't tell because of the language's liberal use of pointers), many
optimizations have to be a lot more conservative.

------
ilitirit
C is one of the fastest and most space-efficient languages out of the box.
Other languages get their speed by trading resources (memory, space, time).
You could arguably gain these sorts of speed-ups in C if you invested enough
time in writing a VM or a JIT/dynamic compiler to host your C program.

------
illumen
C works in CUDA and OpenCL - which for scientific programming is the fastest
thing there is.

The GPU is used with C.

C isn't faster than hand-coded assembler on CPUs... but it is pretty damn quick.

There are specialist compilers for C, like vector C etc.

anyway... whatever. back to typing text into a file now.

~~~
malkia
CUDA and OpenCL have C syntax, but they are not C.

------
oomkiller
This is an interesting article, but I believe that the benchmarks should be
updated since a ton of progress has been made in compilers and how they handle
aliasing.

------
jgrant27
<http://news.ycombinator.com/item?id=591897>

------
ramoq
I remember reading this article years ago :)

------
Allocator2008
Regarding the discussion of "real" arrays in Fortran vs. "pointer arrays" in
C, it seems in the real world we often need resizable arrays, which means the
only way to do that is to have pointer arrays. A Fortran array cannot be
resized. What if I want to read in a tab delimited file that has ints, and read
that into a 2-D matrix, where the matrix rows correspond to the file rows, and
the matrix columns correspond to the tab delimited columns in the file. And I
have say no idea how big the file is. So the only way I can do this is to have
a resizable array. I can "guess" that say my rows/columns won't be more than
some arbitrarily large number, but then I likely end up wasting a lot of
space. So out of curiosity how would I even do that with "real arrays"? Seems
like if I ever need to resize an array, even in just as simple an example as
reading in a file, then I need a pointer array, since "real" (fortran-like)
arrays will not resize. Am I missing something here?

------
c00p3r
In most cases home-cooked food is much cheaper and healthier than what you
buy, if you can cook. The same is true of C/C++ programming in terms of
execution speed and resource usage, if you have the knowledge and experience.

btw, the JVM itself is merely C++ code.

