
Why is C faster than Java: git vs JGit - acqq
http://marc.info/?l=git&m=124111702609723&w=2
======
snprbob86
I almost skipped this link; I assumed it was typical boring blog noise. It's
not.

This is an insightful post from the git mailing list which shows some of the
real limitations that a top tier developer hits when trying to write Java code
as fast as neatly optimized C code. Definitely worth reading.

~~~
jaylevitt
Yep. The usual "Program X is faster in C than Java" gets a barrage of "That's
because you know C better". Shawn is a performance-obsessed Java expert,
Eclipse committer and longtime Google coder who works on JGit. If he says Java
is slower than C at this, then Java is slower than C at this.

EDIT: but as wcoenen points out, this was written in 2009 and Java 1.7 does a
better job with some of this.

~~~
huherto
> "If he says Java is slower than C at this, then Java is slower than C at
> this."

Yes. I like how you qualify the statement. Furthermore, on the C side you have
Linus and other C gurus that really, really know how to exploit the strengths
of the C language.

~~~
yassim
I would also expect them to know how it works _all_ the way down through the
OS, which can also have an impact on performance. I.e., they know how the
supporting systems operate and thus can make further
assumptions/optimisations.

------
ot
All the points are valid but they are peculiar to Java, not to all managed
high-level languages. C#/.NET, for example, have unsigned types, value-type
arrays and structs, memory mapped files and specialized collections.

As an example, the C# port of SQLite is sometimes faster than the C version on
queries, although updates are slower, even though SQLite is a highly optimized
C library.

EDIT: link <http://code.google.com/p/csharp-sqlite/wiki/Benchmarks>

~~~
kevingadd
It's also worth pointing out that the C# port of SQLite omits certain C
mechanisms (like pointers) in favor of passing copies of byte arrays around.
You'd expect this to make it slower, but in many cases it doesn't! (C#
supports pointers, but the port doesn't use them so that it'll work in limited
environments like Silverlight.)

~~~
alexchamberlain
This is slightly off topic, but a good read... [http://cpp-
next.com/archive/2009/08/want-speed-pass-by-value...](http://cpp-
next.com/archive/2009/08/want-speed-pass-by-value/)

------
zxypoo
This is an old email... there have been many improvements to JGit, to
Java/the JVM, and in other areas of interest.

Shawn and I gave a presentation at the Googleplex not so long ago about JGit
[1]. In particular, you may be interested in the 'JGit at Google' section.

There are some cases where JGit is faster than CGit, but the main benefit of
JGit is that it's easy to embed. Projects like Gitblit and various IDEs use
the library. On top of that, you have crazy folks like NGit [2] who
cross-compile the library using Sharpen so it can be used by the .NET
community...

[1] -
[https://docs.google.com/present/edit?id=0ATM14GNiXaXfZGZkeHp...](https://docs.google.com/present/edit?id=0ATM14GNiXaXfZGZkeHpiNjVfMjdzNmhydGQ0bg)
[2] - <https://github.com/slluis/ngit>

~~~
gelliott
That's really interesting - according to this presentation JGit clone is
significantly faster than native git clone (2.3x in the example).

I'd love to hear more about any code changes that led to this result.

~~~
chubot
Total speculation... but maybe because C git clone always reads from local
disk (?). JGit clone appears to read from Bigtable/GFS, and those systems have
in-memory caches, or columns can reside totally in memory. Also, you could
probably exploit I/O parallelism with a cluster of servers, whereas with a
local disk you are probably limited by there being a single disk head that has
to move around.

So I doubt it has anything to do with Java; it's the underlying storage. If
I'm wrong I'd also like to hear about it!

------
wcoenen
This was posted in 2009. I think that some of the arguments are no longer
valid, e.g. Java 1.7 now uses escape analysis to eliminate heap allocations
where possible:
[http://weblogs.java.net/blog/forax/archive/2009/10/06/jdk7-d...](http://weblogs.java.net/blog/forax/archive/2009/10/06/jdk7-do-
escape-analysis-default)

~~~
jshen
"I think that some of the arguments are no longer valid"

Has anyone measured it?

------
cookiecaper
Note the date: 2009-04-30 18:43:19

Many of us have already read this and it's been submitted to HN several times
before. This, of course, does not mean that it's not worth reposting, but
interested parties may want to dig up some of the past discussions.

~~~
njs12345
Here's one: <http://news.ycombinator.com/item?id=1026909>

------
njs12345
I find it kind of interesting that in Haskell, which is arguably even higher
level than Java, most of these optimisations are eminently possible..

EDIT: This obviously came across a bit as language fanboyism, so I guess I
should mention that the language features that let you do many of them let you
shoot yourself in the foot just as easily as you can in C, and you can
certainly argue that with a strong FFI you might as well just call into C if
you really need that kind of low level performance..

~~~
kmm
I've heard the argument before that, if needed, one can use an FFI to
optimize bottlenecks in high-level code, but I've never understood it.

Won't using a high-level language incur an omnipresent speed penalty? And even
if a bottleneck exists, how would using an FFI remedy fundamental problems in
the language, like the absence of unsigned types or the fact that all types
are boxed? The types will have to be unboxed anyway, so whether that happens
in foreign code or in the interpreter/JIT code won't matter.

~~~
ced
You're right that the FFI can create significant friction, but once you're in
C-land, you get C-level performance. So you need to move whole algorithms into
C. In an O(n²) algorithm, the O(n) FFI friction will be negligible for a large
enough value of n.

 _like the absence of unsigned types or that all types are boxed_

FFIs often provide access to C arrays.
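
ced's amortization argument can be sketched with a toy cost model (the
constants below are made-up assumptions chosen only to show the trend, not
measurements): if crossing the FFI boundary costs a fixed amount per element
marshalled, and the algorithm itself does O(n²) work, the boundary overhead's
share of total time shrinks roughly as 1/n.

```java
// Toy cost model for FFI overhead amortization. All constants are
// arbitrary assumptions; only the trend matters.
public class FfiAmortization {
    static final double CROSS_COST_PER_ELEM = 100.0; // assumed FFI cost per element
    static final double WORK_COST_PER_PAIR  = 1.0;   // assumed cost per inner-loop step

    // Fraction of total time spent crossing the FFI boundary for an
    // O(n^2) algorithm whose n inputs are marshalled once.
    static double overheadFraction(long n) {
        double crossing = CROSS_COST_PER_ELEM * n;    // O(n) marshalling
        double work     = WORK_COST_PER_PAIR * n * n; // O(n^2) real work
        return crossing / (crossing + work);
    }

    public static void main(String[] args) {
        for (long n : new long[] {10, 100, 1000, 10000}) {
            System.out.printf("n=%-6d overhead share = %.4f%n",
                              n, overheadFraction(n));
        }
        // The share falls roughly as 1/n: negligible for large enough n,
        // which is why whole algorithms (not single hot calls) get moved to C.
    }
}
```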

~~~
dkarl
It isn't always that straightforward. With Java, if you move your code into C
you may also need to keep all of your data in C-land to avoid the overhead of
copying it back and forth. Then the data is harder to access from Java, plus
you can't rely on garbage collection to free that memory when you're done with
it.

------
SeanLuke
I build fairly high-performance Java code, and I get hit by three major
gotchas that prevent it from approaching the speed of C code.

\- There's no way to do array access without null pointer and index checks
each and every time.

\- Generics with basic types, and their unfortunate embedding into syntax
(like the new for() syntax), are awful. Boxing and unboxing incur a
ludicrously high penalty, and generics push coders away from using arrays.
Unlike in C++, generics have been the enemy of performance.

\- Poor quality collections classes (ArrayList and HashMap are notoriously
bad)

Sure, there are a few other things, like pointer walking in C and Java's poor
floating point, but the big three above are the killers.
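
The boxing point can be seen in miniature in a sketch like this (class and
method names are mine, invented for illustration): a `List<Integer>` stores a
heap object per element and unboxes on every read, while an `int[]` is a flat
block of primitives. The assertion here is only that both shapes compute the
same sum; the relative cost varies by JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Contrast between a boxed generic collection and a primitive array.
public class BoxingSketch {
    static long sumBoxed(List<Integer> xs) {
        long total = 0;
        for (Integer x : xs) {   // each iteration unboxes an Integer object
            total += x;
        }
        return total;
    }

    static long sumPrimitive(int[] xs) {
        long total = 0;
        for (int x : xs) {       // plain loads from a contiguous array
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] prim = new int[n];
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            prim[i] = i;
            boxed.add(i);        // autoboxing: allocates an Integer for i >= 128
        }
        System.out.println(sumPrimitive(prim) == sumBoxed(boxed)); // true
    }
}
```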

~~~
killedbydeath
I would also add the inability to create objects on the stack, if you are
doing anything recursive. The overhead of heap object creation is quite
visible. So I had to either reuse objects, essentially creating my own
memory-management layer, or try to stick data into primitive types, which
obfuscated the code logic quite a bit.
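
A hedged sketch of the reuse workaround described above (names invented for
illustration): instead of returning a freshly allocated result object from
every level of a recursion, thread one preallocated mutable scratch object
through the calls.

```java
// Reusing one scratch object across a whole recursion instead of
// allocating a result pair at every level.
public class ScratchReuse {
    static final class MinMax {   // mutable scratch holder, allocated once
        long min, max;
    }

    // Recursive min/max over a[lo, hi); results land in 'out'.
    static void minMax(long[] a, int lo, int hi, MinMax out) {
        if (hi - lo == 1) {
            out.min = a[lo];
            out.max = a[lo];
            return;
        }
        int mid = (lo + hi) >>> 1;
        minMax(a, lo, mid, out);
        long leftMin = out.min, leftMax = out.max;  // save left half's result
        minMax(a, mid, hi, out);                    // reuse the same scratch object
        out.min = Math.min(leftMin, out.min);
        out.max = Math.max(leftMax, out.max);
    }

    public static void main(String[] args) {
        long[] a = {5, -3, 9, 0, 7};
        MinMax scratch = new MinMax();  // one allocation for the whole recursion
        minMax(a, 0, a.length, scratch);
        System.out.println(scratch.min + " " + scratch.max); // -3 9
    }
}
```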

~~~
elehack
Java 7 uses escape analysis to do stack allocation by default. So, if you play
along and write code for which the escape analyzer can activate stack
allocation (I don't know what the rules are for this), you can get those
benefits.
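
For what it's worth (strictly, HotSpot does scalar replacement rather than
true stack allocation, and whether it kicks in depends on the JIT and on
inlining), the shape the analyzer can handle is a temporary object that never
leaves the method:

```java
// Pattern amenable to escape analysis: the temporary Vec2 is never
// stored, returned, or passed out, so the JIT may scalar-replace it,
// keeping its fields in registers instead of allocating on the heap.
public class EscapeSketch {
    static final class Vec2 {
        final double x, y;
        Vec2(double x, double y) { this.x = x; this.y = y; }
    }

    static double lengthSquared(double x, double y) {
        Vec2 v = new Vec2(x, y);      // candidate for scalar replacement
        return v.x * v.x + v.y * v.y; // 'v' does not escape this method
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3, 4)); // 25.0
    }
}
```

The result is identical either way; the optimization only removes the heap
allocation, which is why it is hard to observe except through allocation
profiling.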

------
Yrlec
I had a similar experience when I was doing some Galois Field arithmetic in
Java. You pay a huge penalty because of the absence of unsigned types. In our
case we had to use long instead of int, which is extra costly, since many
basic operations in Java return int by default.
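
The standard workarounds for the missing unsigned types, with exactly the cost
Yrlec describes: widen to long and mask, or (in Java 8 and later, which
postdates this thread) use the unsigned helper methods on Integer. A small
sketch:

```java
// Emulating unsigned 32-bit arithmetic in Java.
public class UnsignedSketch {
    // Widen-and-mask: reinterpret an int's bit pattern as an unsigned value.
    static long toUnsigned(int x) {
        return x & 0xFFFFFFFFL;  // mask keeps the low 32 bits, zero-extended
    }

    public static void main(String[] args) {
        int allOnes = -1;                        // bit pattern 0xFFFFFFFF
        System.out.println(toUnsigned(allOnes)); // 4294967295

        // Java 8 added helpers that avoid hand-rolled masking:
        System.out.println(Integer.toUnsignedLong(allOnes));    // 4294967295
        System.out.println(Integer.compareUnsigned(-1, 1) > 0); // true
        System.out.println(Integer.divideUnsigned(-1, 2));      // 2147483647
    }
}
```

The widening trick is why the long-instead-of-int cost is hard to avoid: every
unsigned comparison, divide, or shift-right has to round-trip through 64 bits
or through a helper call.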

~~~
adobriyan
Why does signedness matter?

Addition is XOR which is sign-agnostic. Multiplication has to be done via
table lookups to be fast which also makes it sign agnostic.

Well, at least for p=2.
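
The table-lookup multiplication adobriyan mentions, sketched for GF(2^8) with
the AES reduction polynomial 0x11B (a field small enough that full exp/log
tables fit trivially; addition is plain XOR and indeed sign-agnostic):

```java
// GF(2^8) arithmetic via log/exp tables (AES polynomial x^8+x^4+x^3+x+1,
// i.e. 0x11B, generator 0x03). Addition is XOR; multiplication becomes
// two table lookups plus an addition mod 255.
public class Gf256 {
    static final int[] EXP = new int[255];
    static final int[] LOG = new int[256];

    static {
        int x = 1;                                    // generator^0
        for (int i = 0; i < 255; i++) {
            EXP[i] = x;
            LOG[x] = i;
            int twice = x << 1;                       // multiply by 0x02
            if ((twice & 0x100) != 0) twice ^= 0x11B; // reduce mod the polynomial
            x = twice ^ x;                            // times 0x03 = times 0x02, plus x
        }
    }

    static int add(int a, int b) { return a ^ b; }    // sign-agnostic XOR

    static int mul(int a, int b) {
        if (a == 0 || b == 0) return 0;
        return EXP[(LOG[a] + LOG[b]) % 255];
    }

    public static void main(String[] args) {
        System.out.println(add(0x05, 0x03)); // 6
        System.out.println(mul(0x02, 0x03)); // 6: x * (x+1) = x^2 + x
        System.out.println(mul(0x53, 0xCA)); // 1: known inverse pair in the AES field
    }
}
```

For Yrlec's GF(2^32-5), a prime field rather than GF(2^n), no such table fits,
and the modular arithmetic is where the missing unsigned 32-bit type bites.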

~~~
Yrlec
I was doing it in GF(2^32-5). Your statement is true for GF(2^n) where n is
small enough to keep the entire multiplication table in memory (usually n <=
8). When it's bigger, you keep log tables in memory, and then sign matters.
However, when n=16 you get lucky and can use char as an unsigned 16-bit int.

------
buff-a
More specifically, the problem of trying to write a _binary compatible_ java
implementation of a neatly optimized solution written in C. So, the program in
question is executed, reads a whole bunch of binary data from a whole bunch of
different files, does some calculations on that data and then exits.

The questions are, if you had to develop a distributed version control system
in java: a) would you solve it the same way, b) would your solution be faster
or slower, and c) would it take more or less time to write it and be easier to
maintain?

Clearly you would not solve it the same way; for example, it might stick
around in memory as you worked. Could it then appear faster, from a user's
perspective? Possibly. Might it be easier to maintain? Also possible.

Pretty much by definition, if a solution was written to be fast in C, a
_binary compatible_ port is not going to run as fast in Java.

I don't think that is a conclusion that has much value.

------
willvarfar
So why do they write and use jgit at google instead of just git?

~~~
durin42
Because cgit is a bunch of binaries that expect to call each other. That makes
it harder to abstract out the storage layer, and we don't use vanilla
repositories sitting on a filesystem. Things are backed by some other storage
abstraction, which isn't always very posix-filesystem like.

~~~
wh-uws
Can you elaborate on the storage abstraction and the repository setup?

Just curious about what advantages there are to make you sacrifice the
performance of the cgit binaries. Mostly out of ignorance on the subject.

~~~
axlelonghorn
There was a Google talk on this posted to HN recently, but I can't find it. In
it, one of the directors of the build / testing / code review system at Google
was talking about how they get things working at scale. Since everyone works
out of the HEAD of one Perforce repo, they end up using the map-reduce
infrastructure to run tests in the cloud for each checkout. In line with this,
there are too many files that update too often for every developer to be
checking out of the repo, so they use a custom FUSE filesystem to lazily give
access to files only when they're needed.

~~~
adpowers
That sounds like some of the posts on this blog: <http://google-
engtools.blogspot.com/>

Also, related to source control but not Git, a few years ago Google had a tech
talk about writing a Mercurial storage system on top of BigTable:
[http://www.google.com/events/io/2009/sessions/MercurialBigTa...](http://www.google.com/events/io/2009/sessions/MercurialBigTable.html)

------
rayiner
A lot of this is poor API design, and the product of Java's baggage as
something that needs to have well-defined safety semantics for internet
applications. It is not a necessary constraint of high-level languages that
they don't offer the ability to get down to the metal. SBCL, for example,
offers a lot of mechanisms for unboxed primitive arrays, unsafe declarations,
and these days even SSE intrinsics.

~~~
erichocean
So does Factor: <http://factorcode.org>

------
babebridou
I've had this issue using maps with primitive keys. I solved it by isolating
the performance-critical functionality and not using the Collections framework
there, instead writing my own data structure for it (with heavy influence from
the HashMap one).

This tends to be my general philosophy, by the way. Reuse code to get
something working fast, isolate what really causes bad performance, then solve
only those problems by going under the hood. If the performance issues remain,
cheat by pretending they don't exist: make sure you're never in the worst-case
scenario, and handle the worst case differently.

In my "IntHashMap" case, the worst-case scenario was gathering the keySet. I
made sure that I'd only call it when I really, really needed it. The rest was
"fast enough" once I had removed the underlying Integer object behind the key.
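
A minimal sketch in the spirit of the "IntHashMap" described above (the class
and method names are invented, and a production version would also need
resizing and removal): open addressing with linear probing over parallel int[]
arrays, so keys and values are never boxed.

```java
// Primitive-keyed hash map sketch: no Integer boxing, no node objects.
public class IntIntMap {
    private final int[] keys;
    private final int[] vals;
    private final boolean[] used;  // tracks occupied slots (key 0 is a valid key)
    private final int mask;

    // Capacity must be a power of two so (hash & mask) indexes the table.
    public IntIntMap(int capacityPow2) {
        keys = new int[capacityPow2];
        vals = new int[capacityPow2];
        used = new boolean[capacityPow2];
        mask = capacityPow2 - 1;
    }

    public void put(int key, int value) {  // no resizing: don't fill the table
        int i = mix(key) & mask;
        while (used[i] && keys[i] != key) i = (i + 1) & mask;  // linear probe
        used[i] = true;
        keys[i] = key;
        vals[i] = value;
    }

    // Returns 'missing' when the key is absent, avoiding a boxed null.
    public int get(int key, int missing) {
        int i = mix(key) & mask;
        while (used[i]) {
            if (keys[i] == key) return vals[i];
            i = (i + 1) & mask;
        }
        return missing;
    }

    private static int mix(int h) {  // cheap scrambler to spread clustered keys
        h ^= h >>> 16;
        return h * 0x45d9f3b;
    }
}
```

Note that gathering the keySet from such a structure means scanning the whole
table and boxing each key, which matches the worst case described above.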

------
bajsejohannes
> when you do use Java NIO MappedByteBuffer, we still have to copy to a
> temporary byte[] in order to do any real processing

Does anyone know why this is the case?

~~~
nvarsj
Well, with a MappedByteBuffer (or any DirectByteBuffer), if you want to
manipulate the data as a Java type (e.g. byte[]) you have to copy the data
into the heap. byte[] cannot exist outside of the heap.

Still, I wonder why they're using a MappedByteBuffer in the first place if
they're working with the data in the Java heap.
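
A minimal, self-contained sketch of what that copy looks like (the file name
and contents are arbitrary): the mapped region lives outside the Java heap, so
any API that needs a byte[] forces a bulk ByteBuffer.get copy onto the heap.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Demonstrates the off-heap -> heap copy a MappedByteBuffer requires
// before the data can be handled as a plain byte[].
public class MappedCopy {
    static String readAllMapped(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            MappedByteBuffer map =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] heapCopy = new byte[map.remaining()];
            map.get(heapCopy);  // bulk copy: off-heap mapping -> heap array
            return new String(heapCopy, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("mapped-demo", ".bin");
        try {
            Files.write(p, "hello mapped world".getBytes(StandardCharsets.US_ASCII));
            System.out.println(readAllMapped(p)); // hello mapped world
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

Code that stays inside the ByteBuffer API (map.get(i), map.getInt(), etc.) can
avoid the copy; it is only the byte[]-shaped processing that forces it.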

------
cube13
>So. Yes, its practical to build Git in a higher level language, but you just
can't get the same performance, or tight memory utilization, that C Git gets.
That's what that higher level language abstraction costs you. But, JGit
performs reasonably well; well enough that we use internally at Google as a
git server.

I think that this is the key takeaway for the entire post.

One of the reasons I generally dislike "X IS BETTER THAN Y" bakeoffs is that
performance is now so implementation-dependent that these comparisons are
pretty much moot. Given that basically any non-trivial implementation can be
improved, it's difficult to say that anything is faster, especially when one
considers developer skill.

Developers should not be chasing the abstract, absolute best performance.
Instead, the language used should be the one that delivers performance good
enough for their client's needs. If they can get it with something they're
familiar with, that's great. If they need to learn a new tool, that's also
good. But it doesn't make much sense to throw away all the knowledge a
developer has of a certain language to chase "better performance" with a
different one. Most likely, first-effort implementations in a new language
won't be nearly as good as implementations in the more familiar one.

It's generally true that optimized Java won't ever be as fast as optimized C.
But for the vast majority of cases, it doesn't need to be. Java's speed is
enough for those cases. And in the small minority where it's not sufficient, C
is still around.

------
malkia
I wonder whether there would be any speedup if Mercurial were rewritten in C.

~~~
dochtman
I'm sure there would be some speedup; the question is whether it would be
worth it (and I suppose that can only be adequately assessed by the
developers, who would now have to maintain C code instead of Python).

But for some perspective from a former Mercurial developer: lots of the more
performance-sensitive code has already been rewritten in C. Rewriting the rest
of it would simply be a question of diminishing returns. One thing that would
improve is hg's startup time; starting up Python just takes a while, which
kind of sucks for command-line programs like VCS clients that tend to have
many short-running invocations.

~~~
azakai
> One thing that would improve is hg's startup time; starting up Python just
> takes a while

Python starts up very fast for a language runtime (much, much faster than
Java). But yes, if you run a large number of extremely short tasks, the
startup might become significant, I guess.

------
alpb
I wonder where Shawn works at Google and in which product they use JGit.

~~~
zxypoo
You can look at this presentation, which talks about JGit at Google, if you
skip the first few slides...

[https://docs.google.com/present/edit?id=0ATM14GNiXaXfZGZkeHp...](https://docs.google.com/present/edit?id=0ATM14GNiXaXfZGZkeHpiNjVfMjdzNmhydGQ0bg)

~~~
alpb
I don't have access to this doc somehow.

------
itmag
I remember a post on here recently which said that sometimes a high-level
language can be faster than C, because you can convey more of your algorithmic
intent and thus the compiler can optimize better for you.

It gave an example where the compiler's knowledge that something is an
immutable array enables better optimization, which you can't express in C.

------
m0shen
Mirror ( Google Cache ) :
[https://webcache.googleusercontent.com/search?q=cache:marc.i...](https://webcache.googleusercontent.com/search?q=cache:marc.info%2F%3Fl%3Dgit%26m%3D124111702609723%26w%3D2)

------
tomandersen
Ahh, getting to the metal. Only in C do I get that feeling. For some reason
even C++ just fails on that 'fresh metallic taste' test.

Most code should not be C.

------
ExpiredLink
Amateurs - they should have chosen Fortran
<http://news.ycombinator.com/item?id=3455883>

------
verroq
Page is down?

~~~
rsneekes
[http://viewtext.org/article?url=http%3A%2F%2Fmarc.info%2F%3F...](http://viewtext.org/article?url=http%3A%2F%2Fmarc.info%2F%3Fl%3Dgit%26m%3D124111702609723%26w%3D2&format=)

~~~
verroq
Appreciate it.

------
nknight
In cases where performance actually matters, just avoid bit-twiddling in
high-level languages; it sucks too much. You'll probably waste less time on
optimization by offloading the biggest bottlenecks to C/C++ through the
native/extension interfaces of your high-level language of choice.

Be careful and stay standards-compliant and you can keep most of the
portability and maintenance advantages while picking up some significant
speed.

~~~
mbell
>You'll probably waste less time on optimizations by offloading the biggest
bottlenecks to C/C++ with the native/extension interfaces in your high-level
language of choice.

In reality this often requires a heavy refactor to actually work. In Java with
JNI, for instance, the overhead of calling native methods is actually rather
high, over 200 CPU cycles in many cases. The stack often has to be rearranged,
a CPU stall is usually caused, and most data types passed to the native
function have to be copied (last I knew, java.nio buffers were the only types
that weren't copied).

Point is, just moving your "hot function" to C / C++ and calling with JNI
doesn't work unless that function is rarely called and does a lot of work
internally. More often the "hot function" is something that is called
thousands of times and moving something like that to JNI is just as likely to
kill performance as help it. You'd have to abstract away an entire module of
work and minimize its call surface to JNI to achieve your goal.

