
It’s Faster Because It’s C - fogus
http://pl.atyp.us/wordpress/?p=2947
======
tptacek
No. C programs have fine-grained control over the memory layout of all their
data, and thus are far better positioned to exploit caching in general and
optimize for locality in particular. It's an attractive fallacy to suggest
that most programs are I/O bound and thus are equally performant in Java and
in C; while that statement does bode well for async code and poorly for
threads, it's not as relevant for language comparisons.

~~~
jrockway
But higher-level languages can write the machine code that "exploits caching
in general and optimizes for locality in particular" _for you_. You teach the
computer how to do that once, and then you get it for free from then on. You
can program your problem domain instead of the solution domain.

Most people using C for high-performance computing are just using it to glue
together the high-performance libraries, anyway.

Remember, most people are writing big applications, not tiny procedures. If
you just want to multiply a few numbers together, sure, C is going to be fast.
If you want to build a complicated application, then C's advantages are going
to be almost unnoticeable and its disadvantages are severe.

~~~
statictype
_But higher-level languages can write the machine code that "exploits caching
in general and optimizes for locality in particular" for you_

Are there any in existence that do this particularly well (ie, at least as
good or better than a reasonably experienced C programmer might)?

~~~
jrockway
Sure, lots. They use the same C libraries that the reasonably experienced C
programmer would.

(Also, take a look at some of the ghc on llvm benchmarks, it's very
competitive with C, and doesn't require you to jump through any hoops. I'd
link you, but Google is blocked at work due to incompetent firewall rules.
Sigh.)

------
wheaties
When I was younger I would have agreed wholeheartedly with this article
because he seems more knowledgeable. After a few years experience I would have
disagreed with him. Now I'm experienced enough to realize I have no idea if
he's right or wrong but he seems to make reasonable points.

I've only worked in I/O bound, memory bound, and CPU bound code before (but
never at the same time). My hat's off to any person or group that has to work in
these types of situations. Guess that's why I'm not a kernel level developer.

~~~
redcheetah
I wonder if there are any startups out there who hire folks with "kernel
developer" mentality/experience. I love the dynamism and excitement of startup
life but unfortunately it often comes with Ruby/JavaScript, which is fine but
I prefer lower-level hacking: hardware interrupts, malloc-free environment,
etc. I do believe startups who need such skills exist, they just seem to be
quieter for some reason. :-(

~~~
paulbaumgart
One I came across:

<http://fastsoft.com/kernel-developer/>

~~~
onecreativenerd
mouseover the image

------
lemming
There are a couple of excellent mailing list posts discussing the relative
speed of JGit and Git, which are excellent reading:

<http://article.gmane.org/gmane.comp.version-control.git/118034>

<http://article.gmane.org/gmane.comp.version-control.git/118035>

That said, there are probably very few applications that have been this
heavily hand-optimised, and probably equally few where you actually need it.
Where C really stomps Java is around very low level memory management, I
think. With modern processors, code can benefit greatly from colocation of
related data that can be very difficult to achieve in an idiomatic way with
Java.

Edit: link layout

~~~
kevingadd
I find those posts particularly interesting because they are not the typical
criticisms levelled at high level languages; for the most part those posts
read as a detailed list of design mistakes in Java: no unsigned types, no
'struct'-style types, reliance on boxing for generic containers, no way to
'reinterpret' blocks of memory C-style, etc. When people complain about using
java to write software you often hear these individual design decisions come
up.

It's also interesting to note that there are usable HLLs that suffer from few
of the problems noted in those posts: both D and C# (though the latter
required a second version to get some of it right) provide a garbage
collected, object oriented environment like Java, but also provide a lot of
the primitives needed for the kind of optimization discussed: pointers,
structures, unboxed types, reinterpretation, and unsigned values.

I think this suggests that the problem is less 'high level' languages and more
'immature' ones: C is a very mature language, descended from other mature
languages, while Java was one of the first mainstream languages to make many
of its decisions and as a result even now some of the larger mistakes have yet
to be corrected (the lack of function types and the presence of checked
exceptions being two examples). Younger languages like D get to benefit from those lessons in the
same way that C/C++ benefited from the mistakes of their predecessors.

~~~
lemming
I agree that the criticisms are not typical, but I don't agree that they're
necessarily design flaws - just things that make Java less suitable for
certain tasks that require fine memory management or low-level bit twiddling.
I write Java in some fairly high-performance scenarios and this is rarely a
real problem (except maybe memory layout - having _every_ object reference be
a pointer does kill us occasionally, but again in specific circumstances). For
the vast majority of applications it's a non-issue.

------
ww520
Let me see. Here are some of the performance related cases I encountered.

1\. A critical process slowed down drastically on certain days and certain
times. Narrowed down to Oracle. Turned out another group went behind our back
and ran expensive reports on our database server. Solution: Politics, spent
half of a year to kick them out.

2\. Some distributed processing slowed down steadily over time. Narrowed down
to bandwidth throttling on the cross data center fiber optic. Solution:
Scheduled emergency migration of processes to the same data center.

3\. Site-wide page serving time slowed down. Narrowed down to the regex and XML
parsing on pages; yes, this was CPU bound. Solution: Faster libraries,
precomputation, caching results.

4\. Lucene indexing took longer as data volume grew. Narrowed down to database
bottleneck. Solution: revamp indexing architecture to use DFS and Hadoop.

5\. Linux process spawning drastically slowed down on a 64-bit machine. Narrowed
down to OS page table copy-on-write overhead. Solution: work around the
spawning requirement.

6\. File system driver slowed down with more cache. Narrowed down to
inefficient sorting algorithm. Solution: replaced bubble sort with heap sort.

In all these cases, language is never the issue.
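
For case 6, here is a minimal sketch in C of the kind of replacement described: an in-place heap sort, O(n log n) in the worst case, where bubble sort is O(n^2). This is not the driver's actual code; the function names and the `int` element type are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Restore the max-heap property for the subtree rooted at `root`,
 * looking only at elements in a[0 .. end-1]. */
static void sift_down(int *a, size_t root, size_t end)
{
    for (;;) {
        size_t child = 2 * root + 1;
        if (child >= end)
            return;
        /* Pick the larger of the two children. */
        if (child + 1 < end && a[child] < a[child + 1])
            child++;
        if (a[root] >= a[child])
            return;
        int tmp = a[root];
        a[root] = a[child];
        a[child] = tmp;
        root = child;
    }
}

/* Sort a[0 .. n-1] in ascending order, in place. */
void heap_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return;
    /* Build a max-heap bottom-up. */
    for (size_t i = n / 2; i-- > 0; )
        sift_down(a, i, n);
    /* Repeatedly move the current maximum to the end of the array. */
    for (size_t end = n - 1; end > 0; end--) {
        int tmp = a[0];
        a[0] = a[end];
        a[end] = tmp;
        sift_down(a, 0, end);
    }
}
```

The fix is an algorithm change, not a language change, which is the point of the list above.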

~~~
stcredzero
_1\. A critical process slowed down drastically on certain days and certain
times. Narrowed down to Oracle. Turned out another group went behind our back
and ran expensive reports on our database server. Solution: Politics, spent
half of a year to kick them out._

Have also experienced this firsthand. Blame is automatically pinned on us, and
it's never our fault.

~~~
ww520
Oh yeah, it's pure politics. People would demand benchmarks and measurements to
prove it's their reports causing the problem, and point fingers your way.

------
aufreak3
Even when IO bound, you might want to spend less of that precious battery when
you're not waiting. So even if your perceived speed doesn't change, the
battery can tell the difference.

~~~
preview
This is a great point. Power usage is growing in importance. I think power
management APIs will continue to evolve. In the not too distant future, power
will be another axis of optimization.

~~~
redrobot5050
At the hardware level, it already is. And it seems in the Mobile Space, we are
seeing that on the software side.

------
cks
"I’d even argue that the main reason kernel code tends to be efficient is not
because it’s written in C but because it’s written with parallelism and
reentrancy in mind, by people who understand those issues."

With this arguing, isn't it reasonable to assume that a project Foo written in
C or C++ is faster than an equivalent written in Java simply because the
author writing project Foo in C/C++ likely understands performance by choosing
C/C++ in the first place? (I am not saying anything about the performance of a
certain language implementation)

The author also argues from a performance critical application perspective.
What about desktop applications where perceived performance rather acts like a
quality property? I know many people that shy away from using desktop Java and
even .NET applications simply because they feel sluggish and waste memory. I
don't care if the Java application is as fast in pure algorithmic performance.

If I can choose between using two equivalent C/C++ or Java/.NET applications I
will choose the C/C++ application. I still think this is a good assumption.

~~~
j_baker
"With this arguing, isn't it reasonable to assume that a project Foo written
in C or C++ is faster than an equivalent written in Java simply because the
author writing project Foo in C/C++ likely understands performance by choosing
C/C++ in the first place?"

No, not at all. First of all, don't assume that someone knows what they're
doing just by choosing C or C++ over Java. There are plenty of dumb C/C++
programmers out there, and a well-written Java program is always going to
outperform a poorly written C/C++ one.

Secondly, remember that Java programs may actually be faster than C/C++
programs. Programs written in C/C++ require more time and knowledge to
performance tune. Writing something in Java (or other high-level language)
allows the author to spend more time focusing on the big picture issues rather
than having to deal with a lot of lower-level issues.

~~~
btmorex
"Secondly, remember that Java programs may actually be faster than C/C++
programs. Programs written in C/C++ require more time and knowledge to
performance tune. Writing something in Java (or other high-level language)
allows the author to spend more time focusing on the big picture issues rather
than having to deal with a lot of lower-level issues."

I'm not against Java and I'll even admit that theoretically I could imagine a
situation where a Java program ended up being faster, but in reality, that
_never_ happens.

In reality, we always end up in situations like utorrent vs. azureus (for
those that don't know, utorrent is written in c++ and pretty much better than
azureus in every way). In fact, I can't really think of one instance where a
piece of java software is better than an equivalent written in c or c++
(outside of developer tools, because those aren't really directly comparable
anyway)

~~~
swolchok
> for those that don't know, utorrent is written in c++ and pretty much better
> than azureus in every way

The Azureus/Vuze DHT is a lot nicer than the Mainline DHT (which uTorrent
supports), it's just not documented, there are no other implementations, and
this statement probably does not apply to code quality.

------
Hoff
This is the standard performance-tuning discussion, in a different guise.

Until you explain what factor(s) you're optimizing for, "It's faster because
it's written in (whatever)" is a canard.

You can take that discussion in most any direction.

Budget. Even free coders and open-source have their costs.

Raw speed? Custom hardware? Hand-tweaked assembler? FPGA?

Speed, but without the budget for bumming instructions? Architecture- or
machine-dependent C code?

Staffing? Enterprise plug-compatible Java.

Maintainability? Not everybody can hack source code in Bliss or some other
obscure or domain-specific language.

I/O? Does removing the rotating rust from the design help?

Memory footprint or ROM space, the available languages, the stinky compiler
that's available on (expurgated), or whatever other factors are key to your
goals...

To paraphrase that ancient Microsoft slogan, what are you optimizing for
today?

~~~
eru
Perhaps they are optimizing for getting coders that would rather code C than Java?

------
shin_lao
C(++) has got a lower memory footprint than Java/C# which is also quite
important.

~~~
brazzy
True; Java makes it very easy to become memory-bound needlessly. No JIT in the
world can save you if your primary data structure is a TreeMap<Long,Integer>
with a billion entries.

~~~
angusgr
C++ also makes it perfectly possible to build this kind of mess. I know
because I've worked on the exact same situation you describe, except in that
case the TreeMaps of pointers were hand-rolled, "tested in production", and
yet another problem to maintain.(^)

Java might make it easier, and granted the extra object allocations around
Longs and Integers will make it scale poorly more rapidly, but bad (or
compromised) design and poor use of data structures is always going to lead to
problems of some kind or another.

(^) Yes, I know about STL et al. Original programmer clearly did not.

------
JoeAltmaier
Java leads the programmer into using bloated libraries, which absolutely
litter the Java landscape. It's very hard to measure or even predict what
effect a Java interface will have on your solution.

I agree with the author, the language has no intrinsic slowness; it's the
tendency to use a triply-nested abstraction for every trivial purpose (a hash
table of objects containing references to a database API...) instead of Hey! a
pointer, that leads the app programmer down the primrose path.

~~~
_delirium
Compared to C that's true, but bloated libraries aren't exactly absent in the
C++ world (one of many reasons that "C/C++" is usually a weird
generalization). Nested templates of templates are all over the place, and as
for pointers, it's common to wrap those too using one of the various smart-
pointer classes.

C often leads to bad algorithms, though, for the same reason it often leads to
lean code tuned to the specific application at hand. Absent many general
library functions, the C world is littered with lots of custom
reimplementations of data structures and algorithms, not all of which are the
best (and a lot of which are actually buggy). Even when they're good, they
tend to have short shelf-lives: much hand-optimized 90s-era C code is now
slower than more naive implementations, because the optimizations used to save
some instructions often actively harm cache performance.

~~~
mfukar
" _much hand-optimized 90s-era C code is now slower than more naive
implementations, because the optimizations used to save some instructions
often actively harm cache performance._ "

On the same hardware? That seems unlikely - do you have a specific example in
mind?

~~~
btilly
Yes. On the same hardware. And I'm sure that a specific example was thought
of.

To give one of several likely causes, CPU pipelines have grown much longer. As
a result it is more important to avoid stalls these days. Naive code compiled
with a modern compiler knows about the importance of this. For instance the
compiler will know it can avoid a stall in certain cases by making sure that a
read from memory that happens soon after a write obeys something called store
to load forwarding restrictions. Doing that can mean extra code which would be
slower on an old computer, but it is faster on a modern one.

~~~
mfukar
Well, if we're talking about different hardware, this whole discussion is moot
- CPU (not C-entral anymore..) architecture changes render "old" optimization
techniques only situational today.

At any rate, the same is true for all languages, and _delirium's point is spot
on: it's not the language that matters, it's the fact that bad (or slow, or
inefficient, call it what you will) code is encountered regardless. It's time
we stopped language wars, don't you think?

~~~
btilly
The discussion is only moot to the extent that the complaint is inaccurate. It
is true that code, once optimized, is frequently hard to unoptimize. It is
further true that what you optimize for at one point does not match what you
optimize for at another. It is also true that there is a lot of C that is now
optimized for the wrong thing. And finally it is true that people who write C
because they are trying to squeeze performance are more generally prone to
create more of it.

As long as those facts remain true, it is fair to complain about this tendency
in C code in the wild. Even though the problem clearly lies with some of the
programmers the language attracts rather than with the language.

~~~
mfukar
_And finally it is true that people who write C because they are trying to
squeeze performance are more generally prone to create more of it._

You can't imply "fact" and use "generally" in the same sentence, sorry.

------
IgorPartola
Faster is not always what we are after though. I recently wrote a piece of
code to run in tightly constrained (but not embedded) environments, and C was
the natural choice. The Python or PHP or JavaScript interpreter and VM just
wouldn't fit in the RAM.

~~~
gryan
Sometimes you don't want to suck up all of the RAM on a non-embedded machine,
either. Also faster startup times and no collector pauses. A lot of times
those are required.

~~~
ramy_d
Collector pauses bring up some discussions I've had with some friends - in
essence, it's not something you want your platform to do when you're
programming something like a game where you're pushing 60 frames a second +
audio.

If you don't know what to look for you could end up looking for a long time.

~~~
moomba
Yea, Strings in Java take up much more memory than one would expect.

Minimum String memory usage (bytes) = 8 * (int) ((((no. of chars) * 2) + 45) / 8)

<http://www.javamex.com/tutorials/memory/string_memory_usage.shtml>

Objects in general also have significant overhead.
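
To put numbers on the formula quoted above, here is a small sketch in C that evaluates it and compares the result with a plain NUL-terminated C string of one-byte chars. The function names are made up for illustration, the formula is taken at face value from the linked page, and actual sizes depend on the JVM version and settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum bytes for a Java String of n_chars characters, per the
 * formula quoted above: 8 * (int)((n_chars * 2 + 45) / 8).
 * Integer division performs the truncation the (int) cast implies. */
size_t java_string_min_bytes(size_t n_chars)
{
    assert(n_chars <= (SIZE_MAX - 45) / 2); /* guard against overflow */
    return 8 * ((n_chars * 2 + 45) / 8);
}

/* The same text as a NUL-terminated C string of one-byte chars,
 * ignoring any allocator bookkeeping. */
size_t c_string_bytes(size_t n_chars)
{
    return n_chars + 1;
}
```

By this formula an empty String costs 40 bytes and a 10-character String costs 64 bytes, against 11 bytes for the equivalent C string before allocator overhead.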

------
kunley
" I’d even argue that the main reason kernel code tends to be efficient is not
because it’s written in C but because it’s written with parallelism and
reentrancy in mind, by people who understand those issues. A lot of code is
faster not because it’s written in C but for the same reasons that it’s
written in C. "

Brilliant.

I'd say that C programs are generally faster because C world has near-zero
amount of mediocre and copy/paste programmers, so coders just know what
they're doing.

~~~
a-priori
_C world has near-zero amount of mediocre and copy/paste programmers, so
coders just know what they're doing._

[Citation needed]

~~~
kunley
Citation is overrated. Today anyone can publish anything and be cited by
another anyone.

Live in the industry for 15 years or so and observe for yourself. Carefully.

~~~
a-priori
What I meant was that you made a very bold claim without any justification.
Are programmers who code in C categorically better than programmers who don't?
I doubt it.

~~~
kunley
Yes I know my claim, and I showed you a way to grasp it.

"Categorically": that's what you said. Just observe, instead of trying to
apriori-tize the world.

------
kqueue
It takes more cpu cycles to run a java program than a C program given they are
both well written and optimized.

However, writing it in C will probably take more time. The question to ask
yourself: does it matter if my program takes a few more cycles to finish or
not?

Most of the time it IS faster in C, but the difference is insignificant (e.g. C
takes 0.0001, Java/Python takes 0.0002. Who cares at this point? Very few).

------
tszming
A better title (IMHO) to the article would be "Compare results, not
approaches.", as stated in the last sentence.

------
limaya
Great writeup!

There is one exception though: startup speed. It comes just from the fact that
the program is written in C, the same language the OS is written in, which means
that the majority of the dynamic libraries you depend on are already loaded.
That's what makes piping simple programs like "wc" possible.

------
wfjackson3
Excellent point. You could toss in assembly as yet another choice for the very
highly constrained environment (very cheap microcontrollers, although many now
have abstractions so you don't actually have to code in assembly directly, it
is sometimes necessary).

------
c00p3r
It is funny how people who are bound to an artificial sandbox are trying to
view themselves as equal to (or even better than) those who created this sandbox.
^_^

JVM is a mere C++ program. Period.

------
jrockway
C is a language, not an implementation. You can JIT C just like any other
programming language.

~~~
cloudhead
Irrelevant, we all know what he's talking about, and whether it's gcc, llvm or
icc, his point stands.

~~~
jrockway
I don't see the distinction. C is a language for describing what operations
the program will perform. The compiler (gcc) / runtime (llvm) then turns that
description of the solution to the problem into something the computer can
actually execute. Sometimes it uses JIT compilation... other times, perhaps
not.

If you "use C for control", then you must have written your own compiler.

