"We find that in regards to performance, C++ wins out by
a large margin. However, it also required the most extensive
tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer.
Scala concise notation and powerful language features allowed for the best optimization of code complexity.
The Java version was probably the simplest to implement,
but the hardest to analyze for performance. Specifically the
effects around garbage collection were complicated and very
hard to tune. Since Scala runs on the JVM, it has the same
issues.
Go offers interesting language features, which also allow
for a concise and standardized notation. The compilers for this language are still immature, which reflects in both performance and binary sizes."
Untuned C++ will probably be slower then even tuned python, I can tune any language, but C++ (although less so then something like C) requires more tuning to get good performance. The performance is what we care about here.
Could you give an example of where this would be the case? I write both Python and C++ code all day and I can't think of a single situation where I'd expect better performance from Python than a similar C++ implementation.
Here's one that bites every c++ newbie at least once (please forgive minor syntactic mistakes, it's been a while):
std::vector<LargeStruct> things = GetABunchOfThings(); // Why does this take so long?
// Oh well, time to sort
std::sort(things.begin(), things.end(), LargeStructComparator());
...and then your program takes forever and your profiler tells you that 99.9% of your time is spent in LargeStruct's copy constructor. And then you refactor to use pointers and teach the comparator to dereference them, whatever, it's fine, but the naive Python version will just work.
Of course, after c++0x becomes widespread this problem will go away. Yay move constructors.
Basically correct, except std::sort uses assignment, not copy construction.
It's also likely that the optimizer will have GetABunchOfThings construct its result once in the caller's "things". Lambdas are C++0x, so maybe move constructors (another C++0x feature) will be employed instead.
Python might be a bad example, as the interpreter really isn't terribly fast. But with Javascript, I wouldn't be surprised at all if the same naive implementation of a typical web-ish task (lots of strings, parsing, lots of heap churn) ran faster on V8 than it did compiled from C/C++.
By 'untuned' I'm also including using the wrong algorithm. For example, the using the usual recursive definition for the Fibonacci sequence in C++ as opposed to the loop or n^2 version of recursive Fibonacci in python. The python version will be faster in this case. It is probably true however that any decent programmer will churn out C++ that is faster than python for cpu intensive tasks most of the time.
The use of the algorithm is mostly orthogonal to the language being used. It's like saying 'walking from Amsterdam to Rotterdam can be faster than going by bike, if you take a detour to Beijing when using a bike'.
Personally, I often do an initial write of a program first in Python and rewrite in C/C++ if necessary. Usually, the first unoptimized C++ implementation is 10-100x faster than an optimized Python implementation.
> The use of the algorithm is mostly orthogonal to the language being used.
I don't find that to be true. I use more sophisticated algorithms in "higher level" languages because I can implement them in about the same time it takes me to implement less sophisticated algorithms in "lower level" languages.
> Personally, I often do an initial write of a program first in Python and rewrite in C/C++ if necessary. Usually, the first unoptimized C++ implementation is 10-100x faster than an optimized Python implementation.
Yes, but the c/c++ rewrite gets to take advantage of what you learned in the python AND you only do the c rewrite when you need the speed and are willing to pay the time penalty to get it. (There's some extra cost to the c version otherwise you'd always do c and never do python.)
I use more sophisticated algorithms in "higher level" languages because I can implement them in about the same time it takes me to implement less sophisticated algorithms in "lower level" languages.
These are factors not related directly to the language, but level of understanding, time, and money. An experienced C++ programmer can probably implement the 'better' algorithm equally quick in C++.
There's some extra cost to the c version otherwise you'd always do c and never do python.
Not necessarily. In an existing project in a non-C language, the C implementation will require you to write and maintain a binding as well. That could be a good reason to use Python/Ruby/Prolog/whatever by default, unless it's not fast enough.
> These are factors not related directly to the language, but level of understanding, time, and money. An experienced C++ programmer can probably implement the 'better' algorithm equally quick in C++.
There are programmer performance factors directly related to language differences.
The mainstream example is efficiency of DSLs. Try writing a complex regexp in a standard regexp DSL that originates from Perl and in plain C/C++.
The less known example is writing complex business logic in Clojure vs Java. The very fact that Clojure version lets you examine much more business logic in one screen without scrolling puts much less pressure on programmer's concentration and memory.
Thus the very ability to write complex business logic without lots of bugs in tight time constraints depends on a language.
> An experienced C++ programmer can probably implement the 'better' algorithm equally quick in C++.
While there are programmers who can implement a given algorithm faster in C++ than Python, I suspect that they're the exception (among folks who know both).
> In an existing project in a non-C language, the C implementation will require you to write and maintain a binding as well.
If there's no c penalty, other languages wouldn't have been used as often for said existing projects to begin with.
My view is that each language has both strengths and weaknesses, so different languages are better suited to different tasks depending on factors like available personnel, existing sw, ease of implementation (to reach an acceptable level of performance).
In the average case for large datasets. Quicksort is O(n^2) worst-case (choosing the worst pivots possible), and Bubblesort is O(n) in the best case (e.g. an already-sorted array).
The C++ Bubblesort will have better worst-case and best-case performance, but worse average-case performance, for large inputs.
But this is all irrelevant, because in Python, you'll be using Timsort[1], which I'm pretty sure is implemented in C anyway. In C++, you'll be using std::sort unless you have a template allergy. If you use the Gnu stuff, that will be Introsort[2]; if MSVC, Quicksort.
Python doesn't even use quicksort. It uses timsort (http://en.wikipedia.org/wiki/Timsort), which is a hybrid merge/insertion sort. It's also used to sort arrays in two different versions of Java. In other words, it's one of the fastest-known sorting algorithms.
Down voted because people think Quicksort always wins in Python vs. a Bubble sort in C++? Wish I had time to find a case where that's not true. I'll point out that in the worst case, Quicksort is O(n^2).
when turning map into hash_map gives you 30.4% increase in performance, you can't call this an 'extensive' tuning effort... The 'extensive' tuning in the paper (structure peeling, ...), accounts for minor improvements (< 10% total).
The initial code was supposed to be written in a straightforward way. That said, I was playing around with the code, and using tcmalloc with the original C++ code gives a better speedup (something close to 40% with the provided example.)
"We find that in regards to performance, C++ wins out by a large margin. However, it also required the most extensive tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer.
Scala concise notation and powerful language features allowed for the best optimization of code complexity. The Java version was probably the simplest to implement, but the hardest to analyze for performance. Specifically the effects around garbage collection were complicated and very hard to tune. Since Scala runs on the JVM, it has the same issues.
Go offers interesting language features, which also allow for a concise and standardized notation. The compilers for this language are still immature, which reflects in both performance and binary sizes."