Their conclusions: "We ﬁnd that in regards to performance, C++ wins out by a lar...

gaius · on June 3, 2011

It's not that you have to tune C++ it's that you can.

keeperofdakeys · on June 3, 2011

Untuned C++ will probably be slower then even tuned python, I can tune any language, but C++ (although less so then something like C) requires more tuning to get good performance. The performance is what we care about here.

MrScruff · on June 3, 2011

Could you give an example of where this would be the case? I write both Python and C++ code all day and I can't think of a single situation where I'd expect better performance from Python than a similar C++ implementation.

saucetenuto · on June 3, 2011

Here's one that bites every c++ newbie at least once (please forgive minor syntactic mistakes, it's been a while):

    std::vector<LargeStruct> things = GetABunchOfThings(); // Why does this take so long?

    // Oh well, time to sort
    std::sort(things.begin(), things.end(), LargeStructComparator());

...and then your program takes forever and your profiler tells you that 99.9% of your time is spent in LargeStruct's copy constructor. And then you refactor to use pointers and teach the comparator to dereference them, whatever, it's fine, but the naive Python version will just work.

Of course, after c++0x becomes widespread this problem will go away. Yay move constructors.

jongraehl · on June 4, 2011

Basically correct, except std::sort uses assignment, not copy construction.

It's also likely that the optimizer will have GetABunchOfThings construct its result once in the caller's "things". Lambdas are C++0x, so maybe move constructors (another C++0x feature) will be employed instead.

ajross · on June 3, 2011

Python might be a bad example, as the interpreter really isn't terribly fast. But with Javascript, I wouldn't be surprised at all if the same naive implementation of a typical web-ish task (lots of strings, parsing, lots of heap churn) ran faster on V8 than it did compiled from C/C++.

Raphael_Amiard · on June 3, 2011

> Untuned C++ will probably be slower then even tuned python

[citation needed]

keeperofdakeys · on June 3, 2011

By 'untuned' I'm also including using the wrong algorithm. For example, the using the usual recursive definition for the Fibonacci sequence in C++ as opposed to the loop or n^2 version of recursive Fibonacci in python. The python version will be faster in this case. It is probably true however that any decent programmer will churn out C++ that is faster than python for cpu intensive tasks most of the time.

microtonal · on June 3, 2011

The use of the algorithm is mostly orthogonal to the language being used. It's like saying 'walking from Amsterdam to Rotterdam can be faster than going by bike, if you take a detour to Beijing when using a bike'.

Personally, I often do an initial write of a program first in Python and rewrite in C/C++ if necessary. Usually, the first unoptimized C++ implementation is 10-100x faster than an optimized Python implementation.

anamax · on June 3, 2011

> The use of the algorithm is mostly orthogonal to the language being used.

I don't find that to be true. I use more sophisticated algorithms in "higher level" languages because I can implement them in about the same time it takes me to implement less sophisticated algorithms in "lower level" languages.

> Personally, I often do an initial write of a program first in Python and rewrite in C/C++ if necessary. Usually, the first unoptimized C++ implementation is 10-100x faster than an optimized Python implementation.

Yes, but the c/c++ rewrite gets to take advantage of what you learned in the python AND you only do the c rewrite when you need the speed and are willing to pay the time penalty to get it. (There's some extra cost to the c version otherwise you'd always do c and never do python.)

microtonal · on June 3, 2011

I use more sophisticated algorithms in "higher level" languages because I can implement them in about the same time it takes me to implement less sophisticated algorithms in "lower level" languages.

These are factors not related directly to the language, but level of understanding, time, and money. An experienced C++ programmer can probably implement the 'better' algorithm equally quick in C++.

There's some extra cost to the c version otherwise you'd always do c and never do python.

Not necessarily. In an existing project in a non-C language, the C implementation will require you to write and maintain a binding as well. That could be a good reason to use Python/Ruby/Prolog/whatever by default, unless it's not fast enough.

SergeyHack · on June 6, 2011

> These are factors not related directly to the language, but level of understanding, time, and money. An experienced C++ programmer can probably implement the 'better' algorithm equally quick in C++.

There are programmer performance factors directly related to language differences.

The mainstream example is efficiency of DSLs. Try writing a complex regexp in a standard regexp DSL that originates from Perl and in plain C/C++.

The less known example is writing complex business logic in Clojure vs Java. The very fact that Clojure version lets you examine much more business logic in one screen without scrolling puts much less pressure on programmer's concentration and memory.

Thus the very ability to write complex business logic without lots of bugs in tight time constraints depends on a language.

anamax · on June 3, 2011

> An experienced C++ programmer can probably implement the 'better' algorithm equally quick in C++.

While there are programmers who can implement a given algorithm faster in C++ than Python, I suspect that they're the exception (among folks who know both).

> In an existing project in a non-C language, the C implementation will require you to write and maintain a binding as well.

If there's no c penalty, other languages wouldn't have been used as often for said existing projects to begin with.

My view is that each language has both strengths and weaknesses, so different languages are better suited to different tasks depending on factors like available personnel, existing sw, ease of implementation (to reach an acceptable level of performance).

brown9-2 · on June 3, 2011

By 'untuned' I'm also including using the wrong algorithm

I believe this is a different definition of "tuned" than most people use.

swah · on June 3, 2011

Quicksort Python sorts faster than Bubblesort C++.

amalcon · on June 3, 2011

In the average case for large datasets. Quicksort is O(n^2) worst-case (choosing the worst pivots possible), and Bubblesort is O(n) in the best case (e.g. an already-sorted array).

The C++ Bubblesort will have better worst-case and best-case performance, but worse average-case performance, for large inputs.

But this is all irrelevant, because in Python, you'll be using Timsort[1], which I'm pretty sure is implemented in C anyway. In C++, you'll be using std::sort unless you have a template allergy. If you use the Gnu stuff, that will be Introsort[2]; if MSVC, Quicksort.

[1]http://en.wikipedia.org/wiki/Timsort [2]http://en.wikipedia.org/wiki/Introsort

BrandonM · on June 3, 2011

Python doesn't even use quicksort. It uses timsort (http://en.wikipedia.org/wiki/Timsort), which is a hybrid merge/insertion sort. It's also used to sort arrays in two different versions of Java. In other words, it's one of the fastest-known sorting algorithms.

melling · on June 3, 2011

Please provide the code and set of data. If C++ is 100x faster than Python this might not always be the case.

melling · on June 3, 2011

Down voted because people think Quicksort always wins in Python vs. a Bubble sort in C++? Wish I had time to find a case where that's not true. I'll point out that in the worst case, Quicksort is O(n^2).

http://en.wikipedia.org/wiki/Sorting_algorithm#Quicksort

pilooch · on June 3, 2011

when turning map into hash_map gives you 30.4% increase in performance, you can't call this an 'extensive' tuning effort... The 'extensive' tuning in the paper (structure peeling, ...), accounts for minor improvements (< 10% total).

copper · on June 3, 2011

The initial code was supposed to be written in a straightforward way. That said, I was playing around with the code, and using tcmalloc with the original C++ code gives a better speedup (something close to 40% with the provided example.)

chrisjsmith · on June 4, 2011

That's what I noticed. I think that's called "knowing what you are doing" personally.