
Ask HN: Is C/C++ really faster than Python? - sunilkumarc
I have seen many people say they choose C/C++ over Python in programming challenges because Python's performance is worse than that of C/C++. How correct is this?

If Python is actually slower than C/C++, how much slower is it when the same algorithm is implemented in both?

Is it really necessary to use only C/C++ and not Python when solving questions that involve a lot of computation?
======
overgard
Yes. It's much faster. The reason C++ is fast is because you have explicit
control over allocations and it compiles to native code (among other reasons).
There are also a lot of subtle things that you can control in C++ that you
can't in python, like memory layout. Memory layout is extremely important for
CPU cache, and CPU cache is extremely important for performance. (Reading data
from RAM is generally orders of magnitude slower than reading from an L1 or
L2 cache. For some reason this never gets mentioned in university CS courses,
but there are a lot of data structures that are "theoretically" fast for
certain things but in practice very unfriendly to hardware. For instance,
linked lists. Sure insert/delete is O(1), but since linked lists are quite
possibly the least cache friendly data structures in existence, you're better
off with a vector/array even if you're inserting all the time. I'm not just
making this up, you can read talks by people like Bjarne Stroustrup on this
topic)
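
A rough Python-side illustration of the memory layout point, assuming CPython on a 64-bit platform: a plain `list` holds pointers to separately allocated float objects, while the standard library's `array.array` packs raw doubles into one contiguous buffer. The names `boxed` and `packed` are made up for this sketch, and the numbers are only what `sys.getsizeof` reports, not a cache measurement.

```python
import sys
from array import array

N = 100_000

# A list stores pointers; every float is its own heap object.
boxed = [float(i) for i in range(N)]

# array('d') stores the raw C doubles back to back in one buffer.
packed = array("d", boxed)

# Footprint of the list plus all the float objects it points to.
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
# Footprint of the single packed buffer.
packed_bytes = sys.getsizeof(packed)

print(f"list of floats : {boxed_bytes:>10,} bytes")
print(f"array('d')     : {packed_bytes:>10,} bytes")
print(f"bytes per item in the array: {packed.itemsize}")
```

Even the packed version is still driven by the interpreter, so this only hints at the kind of layout control C++ gives you by default.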

As to whether it matters for the algorithm, it depends on the algorithm. It's
almost always going to be much faster in C, but, how fast do you really need
it to be? Python is very good at saving programmer time, so if performance is
not particularly important, I just wouldn't worry about it.

~~~
CyberDildonics
While this is true, and linked lists are somewhere between exotic and
obsolete, the presentation by Bjarne Stroustrup had buried in it him saying
'the time to find the element takes so long a vector is always faster'. But
of course, if you include a search, linked lists are going to be way slower.

Also I wouldn't say that the same algorithm would be faster in C, there is no
reason why C++ can't be as fast as C, even idiomatic modern C++.

~~~
floatboth
Exotic? Obsolete? Really? Linked lists are used in pretty much all functional
programming languages (because head/tail splitting is extremely useful)

~~~
overgard
I should preface this by saying I love functional programming, I think it's a
great way to write stable and testable code, it's likely the future, but...

Functional programming is much more popular in academic circles than in
industry. I think academics tend to hand wave away practical concerns like
cache locality as "an implementation detail" or as an "exercise for the
reader." Big O times are often considered, but rarely do researchers make a
career off of "how well does this fit into an X86 L3 cache"

So yes, functional programming tends to depend on linked lists, and that's
probably why you don't see it in things like the game industry. This might
change in the future, for instance, I could see it being the case where having
provably immutable memory could allow processors to manage more memory
efficiently, but I don't think you can have a fast language that considers the
CPU as an abstract entity; you have to design these things in.

~~~
jacquesm
Erlang is used quite a bit in the gaming industry (backend stuff for online
games) and depends quite strongly on the linked list concept.

~~~
CyberDildonics
He is obviously talking about engine programming. For tools and back ends game
studios will use whatever.

------
dekhn
It's basically 100% true. Any program written in python can be translated to
an equivalently functioning C++ program which requires fewer expensive
operations to run.

The Python VM is doing a fair bit more work per instruction it executes; for
example, incrementing an int isn't going to be a simple register add, it's
going to involve a pointer dereference, some comparisons and branches,
execution of several C functions, and overhead for interpreting the VM
bytecode.
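
One way to see that per-instruction work for yourself, assuming CPython, is to disassemble a trivial function with the standard `dis` module. The function name `increment` is just for illustration, and the exact opcode names differ between CPython versions, so no particular output is claimed here.

```python
import dis


def increment(x):
    return x + 1


# Each entry is one bytecode instruction that the interpreter loop has to
# fetch, decode, and dispatch. In C++ the same function would typically
# compile down to a single add on a register.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)

# Full human-readable listing, including operands.
dis.dis(increment)
```

And each of those instructions operates on heap-allocated objects rather than raw machine integers, which is where the dereferences and branches come from.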

That said, if you understand what Python is doing under the hood you can
greatly speed up your programs and approach C++ performance in some areas.

Even so, I strongly recommend writing all performance-sensitive code in
C++ with wrappers to drive it from Python.

~~~
ericfrederich
It's true, you can translate any Python program to C++. When you do this
you'll have the benefit of having debugged it and getting the logic correct in
a safe language. Just have to make sure all of that translates correctly. It's
nicer to develop an algorithm in Python first where your off by one errors
will not segfault or worse silently fail.

------
dalke
See "The Computer Language Benchmarks Game" at
[http://benchmarksgame.alioth.debian.org/](http://benchmarksgame.alioth.debian.org/)
for a head-to-head comparison of various small programs as measured across a
wide number of language implementations. Do be careful that a "language" has
no performance, only implementations of the language. For example, different
implementations of Python may differ by an order of magnitude on a given task,
as might different C compilers. Java and JavaScript used to be considered slow
languages, but what is effectively billions of dollars of R&D improved their
implementations.

The overall solution space is complex. For example, development time on Python
is about 1/2 that of C++ (the classic paper is
[http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprt_computer...](http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprt_computer2000.pdf) \- is there something more
recent?). Since algorithms can trump raw performance, it may be that in 8
hours it's possible to develop a more sophisticated implementation in Python
which outperforms the naive implementation in C++.

In the late 1990s, when people started using Python for traditional
supercomputing problems, the phrase was to use Python to "steer" the low-level
code written in a faster language. For example, NumPy is a Python API which
uses C/ Fortran/ assembly code at lower levels for doing array calculations.
Thus, only some people need to be skilled in both levels, while most can focus
on writing Python code. The actual code has parts written in multiple
languages, and not "only C/C++".

~~~
orf
Didn't the PyPy folks have some issue with the language shootout, which led
them to make their own?[1][2]. IIRC a lot of the Python code wasn't that
optimized at all compared to the C/C++ programs.

1\. [http://speed.pypy.org/](http://speed.pypy.org/)

2\. [https://alexgaynor.net/2011/apr/03/my-experience-computer-la...](https://alexgaynor.net/2011/apr/03/my-experience-computer-language-shootout/)

~~~
igouy
[2] The blog post tells us that Alex Gaynor confirmed "with some CPython core
developers" that his program didn't work because of a bug in CPython.

But the blog post doesn't tell us that _Alex Gaynor never said there was any
problem with a CPython bug._

Alex Gaynor's blog post tells us that "It's also not possible to send any
messages once your ticket has been marked as closed, meaning to dispute a
decision you basically need to pray the maintainer reopens it for some
reason."

_But that's completely untrue!_ You can send messages when the ticket is
marked closed! And you can open topics in the public forum! And you can click
on a username and send email in 2 clicks.

There just wouldn't be any story to blog about, if Alex Gaynor admitted that
he could easily have told me -- _the bug is in CPython not in my program, so
show my program_ \-- but chose to say nothing.

------
vardump
Programming languages aren't slower or faster. The implementations of them
are.

Provided no FFI or extensions written in other languages are used, non-I/O-bound
interpreted CPython is 50-100x slower than C/C++. JIT-compiled Python (PyPy, etc.)
might be in the 2-10x slower bracket.

If extensions written in C/C++ are used, then you might be effectively
comparing those extensions against another C/C++ implementation.

So _if_ your code is going to be I/O bound and/or can effectively offload all
computation to modules written in C/C++ (like NumPy), then Python might be
practically as fast as C/C++.

If you use pure interpreted Python and implement a compute-limited algorithm in
it, your code is going to be 50-100x slower than C/C++.

Real life scenarios are probably somewhere in between those two extreme cases.

~~~
overgard
> Programming languages aren't slower or faster. The implementations of them
> are.

People only say this if they haven't implemented a language.

Look at something like PyPy. The work is absolutely brilliant, written by
insanely smart people... and it's still vastly slower than C for reasons that
are implicit in the language. There are just aspects of the language that are
hard to deal with (a lot of hidden allocations on the heap, etc.)

Here's a good talk on the subject:
[https://speakerdeck.com/alex/why-python-ruby-and-javascript-...](https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow)

~~~
spoiler
PyPy is still just a Python VM--it's not compiled, like C is.

You could write a Python compiler, but it would be difficult, and you'd need
to sacrifice certain parts of the language (and/or extend it, even).

Take a look at Crystal[1]. Most Ruby code will compile without modification.
Sure, there are exceptions: cases where type deduction is not possible (like
`[]`), certain cases of metaprogramming (like dynamic class creation at
run-time), and accidentally calling a method on Nil (rejecting that is
actually a feature to prevent runtime NPEs[2]).

[1]: [http://crystal-lang.org/](http://crystal-lang.org/)

[2]: Null Pointer Exception, in case the acronym was ambiguous.

~~~
overgard
Well, there is a python compiler. And the compiled code is fast (although, the
compile times are absurdly long). Specifically, RPython is a compiled
language, and PyPy is written in RPython. So yeah, it's possible, and it has
been done, but on the other hand... RPython is only superficially like Python,
the semantics are vastly different. It's technically a subset, but the way you
approach writing it is way different. It tends to feel like C with some
syntactic sugar. So I don't think that invalidates what I'm saying, I think it
actually demonstrates it: if you want efficient code the language needs to
make efficiency a priority.

~~~
spoiler
I agree, although I wouldn't say efficiency needs to be a priority. The
language should simply be aware of the machine model instead of abstracting
it away. It should know there's a processor, and that there's memory. Crystal
does a pretty good job at being efficient without sacrificing much!

------
bane
I find Python's speed is _highly_ uneven and dependent on what you are doing
with it.

Are you doing something that has some kind of underlying C implementation
(file I/O usually)? Then it should be pretty quick.

Are you doing something that is all handled by the Python interpreter? Expect
it to be slower than you'd like. e.g. for loops in Python are usually slower
than list comprehensions.
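
If you want to check the loop versus comprehension claim on your own machine, here is a minimal timing sketch using the standard `timeit` module. The function names and the problem size are arbitrary choices for illustration, and the timings will vary by interpreter and version, so treat the result as a local measurement only.

```python
import timeit
from functools import partial


def squares_loop(n):
    # Explicit for loop with repeated method lookup and call of append.
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def squares_comprehension(n):
    # Same result, built by the comprehension machinery.
    return [i * i for i in range(n)]


n = 10_000
loop_time = timeit.timeit(partial(squares_loop, n), number=50)
comp_time = timeit.timeit(partial(squares_comprehension, n), number=50)

print(f"for loop          : {loop_time:.4f} s")
print(f"list comprehension: {comp_time:.4f} s")
```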

It can be kind of frustrating to write a bunch of code that runs lightning
quick and then suddenly see your runtimes explode when you add a few more
fairly trivial lines to finish off some code. I'd almost rather Python had a
more consistent performance profile even if it was a bit slower on average
than fastest possible, because it would make it easier to reason about the
performance of your code.

So far I mainly just use the default Python implementation, but if you're
really struggling with it, it's probably worth it to look into using one of
the JIT or JVM implementations for more consistent speed profiles.

BTW, there's also all sorts of weird performance tweaks in the
community...mostly of the form of syntax A vs. syntax B, and I've found that
surprisingly few of them are "real" or offer more than trivial speedups.

In other cases, in other languages, you _know_ what you need to do to speed
things up, but Python either doesn't offer that mechanism in the language or
buries it under somewhere else (try finding out how to preallocate lots of
entries in a dictionary instead of incrementally growing it, it exists, but
it's almost impossible to find using search words like "preallocate")

------
koonsolo
If you consider the raw languages, C/C++ is indeed faster. But unfortunately
it's not that easy.

It's easier and faster to write a program in python than in C++. So given
limited programmer time, it might be possible to write a faster solution in
python than in C/C++ because you have more time to optimize your algorithm
instead of fighting with the compiler. If you have a big project, you can
spend the extra programmer time you gain by using python by moving crucial
code from python to C. The crucial code can be found by using a profiler. (Saw
an article about this some time ago, but don't remember which company actually
did this)
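
For the "find the crucial code with a profiler" step, a minimal sketch with the standard library's `cProfile` and `pstats` might look like the following. The functions `slow_part`, `fast_part`, and `main` are hypothetical stand-ins for a real program.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Deliberately heavy: the hot spot we expect the profiler to surface.
    return sum(i * i for i in range(n))


def fast_part(n):
    return n * 2


def main():
    return slow_part(200_000) + fast_part(10)


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Collect the report into a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Whatever sits at the top of that report is the first candidate for moving to C.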

Finally, python has a lot of math libraries that offer good speed for
computation intensive operations, see
[http://www.scipy.org/](http://www.scipy.org/).

So yes, it's not that easy. I would suggest using Python because it's easier
and faster to program in, and if you really run into a wall, you can always
port parts to C.

------
tehwalrus
The same code is faster (and easier) to write, correctly, in Python than in
either C or C++. (this depends a little on what you're writing, but not much:
C/C++ make you specify much more about your code than python does.)

The same code, however, will run faster on C/C++. This is because python is
doing some of the work for you, and you did that work upfront when writing the
C/C++.

If you are doing something which involves a lot of computation, which all fits
in RAM (using numpy in python will make that a fairer test between the two
choices), then C/C++ will be faster.

Remember that a good pattern is to write your high-level logic in Python, and
write a python implementation of your app, and then profile it to work out the
slow parts of the code and speed them up using Cython[1], which will allow you
to convert python-like code to efficient C (you can also straight up borrow
C/C++ implementations from other people, too). This pattern is designed to
balance minimising both programmer-time (since you will only have to code the
important bits in C) and run time (since the performance-critical bits can be
optimised.)

[1] [http://cython.org/](http://cython.org/)

(for pedants: I am referring to the CPython implementation, which is the only
one which supports C/C++ extensions.)

------
meir_yanovich
Yes, and there is no point in asking questions like "is scripting language X
faster than C++?". Any software that is run by other software will be slower
than native code.

------
aikah
I wish people would stop saying "C/C++" , these are 2 different languages.

Python* itself is written in C, not in Python. Which should give you clues to
meditate and find your own answers.

*: Python, the canonical implementation.

~~~
omni
> Python itself is written in C, not in Python. Which should give you clues to
> meditate and find your own answers.

PyPy is written in Python and is much faster than CPython in a lot of cases,
so I'm not sure the argument you're making here is valid.
[http://speed.pypy.org/](http://speed.pypy.org/)

~~~
paramsingh
This might be a silly question as I don't know much about compilers or
interpreters, but how is that possible? Also, why hasn't the Python community
tried to make PyPy the default then?

~~~
empyrical
> how is that possible

PyPy is written in a restricted subset of Python called RPython, which gets
translated into C

[https://rpython.readthedocs.org/en/latest/faq.html](https://rpython.readthedocs.org/en/latest/faq.html)

------
koenigdavidmj
CPython is built more naively for lots of operations. A global lookup is a
dictionary lookup. Getting a symbol out of an object is a dictionary lookup.
An object creation is always an allocation; variables don't go on the stack.
Everything is reference counted. The old 'range' builtin would create an
actual list of that length in memory, rather than just keep track of the start
and end points like a C-style for loop.
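
A small sketch of what "it's a dictionary lookup" means in practice, assuming CPython and Python 3. The names `Point`, `read_limit`, and `limit` are invented for the example. Note that Python 3's `range` is already the lazy kind, unlike the old Python 2 builtin described above.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def read_limit():
    # `limit` is not a local, so it is looked up in the globals dict
    # every time the function runs.
    return limit


# The function's global namespace is an ordinary dict we can write into.
read_limit.__globals__["limit"] = 10
print(read_limit())  # 10

p = Point(1, 2)
# Instance attributes live in a per-object dict as well.
p.__dict__["x"] = 99
print(p.x)  # 99

# Python 3's range keeps only start, stop, and step; no list is built.
big = range(10**9)
print(len(big), big[5])  # 1000000000 5
```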

Other implementations could theoretically do better. PyPy can detect after a
while that a variable can safely be declared on the stack, and so it will
start to do that. If the object type is known, then a field lookup becomes
just adding an offset to the pointer. It can even use different
implementations for the same Python-visible classes (to store a range result
in constant space, but still appear to be the full list, or use a different
type of dictionary for objects and {}, or other things like that).

------
mangeletti
The "Python is the language, not the implementation" answer has already been
given by @dalke, so I'll answer whether CPython, the implementation of Python
that Python developers commonly use, is slower than C/C++:

CPython is necessarily slower and uses more memory than C for most tasks, and
that's because CPython is implemented in C. You couldn't expect a new
programming language that was implemented in Python to be faster than Python,
either.

This is less of a concern for scientists and others with very tough
performance requirements than it otherwise would be, in part, because CPython
code can delegate to C and C++ code[0]. This means portions of your program that
need to be highly optimized can be implemented in C or C++, while the rest of
your program can continue to be written in Python.

[0]
[https://docs.python.org/3/extending/extending.html](https://docs.python.org/3/extending/extending.html)

~~~
fauigerzigerk
_> CPython is necessarily slower and uses more memory than C for most tasks,
and that's because CPython is implemented in C. You couldn't expect a new
programming language that was implemented in Python to be faster than Python,
either._

You are probably thinking of purely interpreted languages because for compiled
languages this is not true. You could write a compiler for a new statically
typed language in Python and that language could be orders of magnitude faster
than any Python implementation.

~~~
mangeletti
Ah, yes. Good point.

------
davidkhess
My perspective having developed with Python for over 20 years now: it depends
on the nature and context of the problem being solved.

In my experience, you usually need a CPU intensive problem for the
implementation language to be a speed concern. Many practical everyday
problems solved with Python (outside of contests and specialized fields) tend
to be I/O bound.

Secondly, even when you do have a CPU bound problem, many Python modules you
might use to solve it are wrappers for C/C++ implementations.

As a result, I've come to think of Python as a control and integration
language and less as a platform for base algorithm implementation.

Viewed that way, I've found Python to be unbeatable for speed of
implementation and long-term maintainability (two of my top metrics for
valuing code).

------
Jdam
I implemented an algorithm in Python and C++. Heavily IO bound task, read a
float, multiply it with a factor, save to another file. Input file was several
Gigabytes. I didn't do any optimization and I'm a C++ noob. Python was 30
times slower.

------
fauigerzigerk
It is generally true. For the same algorithm, C++ can be hundreds of times
faster, but sometimes you can push computations down to the C code in which
the most popular Python implementation is written.

A very general approach to optimizing Python code is to look at all your loops
and ask yourself which ones you can replace with library calls.

[https://wiki.python.org/moin/PythonSpeed/PerformanceTips](https://wiki.python.org/moin/PythonSpeed/PerformanceTips)
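
As a sketch of that "replace the loop with a library call" idea: the hand-written loop below executes bytecode for every element, while the builtin `sum` does its iteration inside the interpreter's C code (in CPython). The function names and data size are arbitrary, and the timings are only a local measurement.

```python
import timeit
from functools import partial


def total_loop(values):
    # Every iteration goes through the bytecode interpreter.
    total = 0
    for v in values:
        total += v
    return total


def total_builtin(values):
    # The loop happens inside the builtin instead.
    return sum(values)


data = list(range(100_000))
loop_time = timeit.timeit(partial(total_loop, data), number=20)
builtin_time = timeit.timeit(partial(total_builtin, data), number=20)

print(f"explicit loop: {loop_time:.4f} s")
print(f"sum() builtin: {builtin_time:.4f} s")
```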

------
Bootvis
What would be a good book to start to learn high performance C++ code?

~~~
vinay427
The reference manual is Stroustrup's _The C++ Programming Language_; however,
it's not a great read for newcomers to the language. If you are new, I would
start with university lectures (many esteemed schools teach C++ intro classes)
and move on to the Stroustrup book, which will get more into the implementations
of STL algorithms and data structures. Those implementations will give you a
good understanding of which to use when, i.e. high performance.

~~~
Bootvis
Thanks, will do. And if I want to go even further and also learn about how the
code interacts with the hardware on the lowest level? So that I for example
know what code will have lots of cache misses?

------
cia48621793
What about PyPy? It generates native (JIT) code and runs as fast as C.

~~~
overgard
PyPy is an amazing project, but if you're really writing performance sensitive
code then relying on the heuristics of a JIT is a dangerous game. There are
also a lot of things that PyPy's JIT just can't realistically optimize.

------
uxcn
It's generally difficult to say one language is _faster_ than another, but
there are some significant differences between the languages and
implementations of Python and C/C++ that have very real performance
implications.

Python is a dynamically typed language. This means that the language will
handle conversions between types depending on the context. For example, one
context can yield a 64 bit integer, another an arbitrary precision integer,
another a string. Keeping track of when to use one versus another, and
converting between them, costs performance. The different data representations for
each type have different performance implications as well. For example, a 64
bit integer can fit into a register, and it's generally only one cycle to
perform any math on it. Arbitrary precision requires math and carry be
performed for every digit in the value. With C++, types are explicit, which
means there is only one type for a value, and the representation is generally
the most efficient (eg 64 bit integer).
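
To make the integer representation point concrete, here is a small sketch assuming CPython 3, where there is a single `int` type that is always arbitrary precision. The sizes printed are whatever `sys.getsizeof` reports on your build, so no specific byte counts are claimed.

```python
import sys

largest_64 = 2**63 - 1   # largest value of a signed 64-bit integer
beyond = largest_64 + 1  # no longer fits in 64 bits; Python just grows the int

print(beyond)
print(type(beyond))

# The object gets bigger as the value needs more internal digits.
for value in (1, beyond, 2**200):
    print(value.bit_length(), "bits ->", sys.getsizeof(value), "bytes")
```

In C++ the type is fixed at compile time, so the compiler can keep the value in a register and never has to check how many digits it has.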

Python is also generally interpreted. This means that code can't be directly
executed on the CPU. An intermediate layer, called the interpreter takes an
intermediate representation of the Python code, and translates it into
instructions that run on the CPU. This happens at runtime, which adds overhead
for each intermediate instruction. The more instructions you need to do
something, the more overhead you have. C/C++ is compiled directly into code
that executes on the CPU which is typically faster. This also allows for
higher levels of optimization in code, since there's an understanding of what
a piece of C/C++ code will look like when it executes on the CPU. The compiler
can also optimize the code further, since it knows the code (generally) won't
change. It's also a lot easier to use native features of the CPU like stack
allocation, which is drastically faster than heap allocation.

One of the other differences is that Python tends to provide higher levels of
abstraction than C/C++ code. So, doing something like a set intersection using
_x.intersection(y)_ may not necessarily be the fastest way. Python may do a
hashed compare, but sorting and using _memcmp_ may be faster. It's a lot more
code to write though. This is generally the tradeoff for Python vs. C/C++
(natively compiled code) in general.
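
To show the shape of the alternative, here is a sketch of a sort-based intersection next to the built-in hashed one. This is a plain two-pointer merge over sorted inputs, not the `memcmp` approach mentioned above, and `sorted_intersection` is a name made up for this example. Written in pure Python it will not beat the built-in; it only illustrates the extra code involved.

```python
def sorted_intersection(a, b):
    """Intersect two iterables by sorting both and walking them in step."""
    xs, ys = sorted(a), sorted(b)
    i = j = 0
    out = []
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            i += 1
        elif xs[i] > ys[j]:
            j += 1
        else:
            # Emit each common value once, even if the inputs repeat it.
            if not out or out[-1] != xs[i]:
                out.append(xs[i])
            i += 1
            j += 1
    return out


x = {3, 1, 4, 5, 9, 2, 6}
y = {2, 7, 1, 8, 4}

print(sorted(x.intersection(y)))  # [1, 2, 4]
print(sorted_intersection(x, y))  # [1, 2, 4]
```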

If you're concerned with performance, but you like programming in Python, you
may want to check out Haskell. It provides a lot of the same higher level
abstractions Python provides, but the language is strongly typed and it can be
compiled to native code. A simple example...
[http://uxcn.blogspot.com/2011/11/algorithm-complexity.html](http://uxcn.blogspot.com/2011/11/algorithm-complexity.html).

~~~
pfultz2
> One of the other differences is that Python tends to provide higher levels
> of abstraction than C/C++ code. So, doing something like a set intersection
> using x.intersection(y) may not necessarily be the fastest way.

C++ provides `set_intersection` as well:
[http://en.cppreference.com/w/cpp/algorithm/set_intersection](http://en.cppreference.com/w/cpp/algorithm/set_intersection)

~~~
uxcn
And the algorithm version may or may not actually be faster than Python.
Essentially, the more abstract and generic things are (like Python), the more
you can use them to easily do complex things across different domains quickly.
However, the actual concrete implementation is ultimately what matters as far
as performance.

So, for example, Python may have a better algorithm, but C++ may use the
hardware more efficiently.

~~~
pfultz2
> Essentially, the more abstract and generic things are (like Python)

And C++ provides the same high-level expressiveness as Python.

> the more you can use them to easily do complex things across different
> domains quickly.

But the advantage of C++ is that those abstractions have almost no overhead.

> So, for example, Python may have a better algorithm

Actually, C++ has a complexity guarantee for all of its algorithms.

> but C++ may use the hardware more efficiently.

Well, the brilliance of what Stepanov did was to build general purpose
algorithms without taking away random access capabilities, which is beneficial
for hardware efficiency (even more so now with modern CPUs).

------
yoklov
Yes. Does it matter? It depends.

If you're writing a game in python, you're going to have a bad time. Yes, I
know CCP does it, but from what I've heard their engine is C++, and even then
it's been a long, uphill battle. You're just going to have a very hard time
meeting soft-realtime performance constraints (e.g. 30ms or 15ms frames) if
very much of your code is in python.

If you're doing a programming challenge, you'll probably be fine? I've never
done a programming challenge, but my assumption is that getting a result is
the goal, and the performance of the code that gets it isn't that big of a
deal, so long as it runs in a reasonable amount of time.

Really, the point is that you aren't going to see algorithmic-level speed
differences between C/C++ and python most of the time. There are cases where
this is untrue (e.g. where the algorithm relies on low level memory control),
but it's true enough. You'll see a constant factor speedup, ranging somewhere
between small and massive. I've heard 100x is typical for code that runs
mostly in python and isn't just calling out to C, C++, or Fortran.

That said, the 100x number could be way off. For compute intensive tasks, if
you're using numpy, you'll see a much smaller speedup. If you're not using
numpy, but are great at writing multicore and SIMD code, you could see a
100 * (num_vector_lanes) * (num_cores) speedup, so on a recent quad-core Intel chip, you
might be able to get a 100 * 8 * 4 = 3200x speedup. This would take a long time
to write, only applies to problems that are both compute-bound and easily
parallelizable, and could be overly optimistic, but you get the idea (I don't
know where the 100x number comes from, and how well the C++ programs it was
compared with were written -- if it's from Alioth, last I checked they tended
to leave a lot of perf on the table, but are not too bad).

Also, cache friendliness can make a massive difference. I've seen 10x speedups
in C++ programs by utilizing the cache better, and I've heard of people
getting even more significant speedups. Python's implementation is extremely
cache unfriendly, _but_ writing cache friendly C++ programs is non-obvious to
a lot of programmers (or they don't consider it, or something), so you might
lose out here. Along these lines, if your memory allocation patterns in C++
are naive, your code won't perform very well, and a lot of C++ programmers
fall victim to this.

So who knows. If you're great at C/C++, know your architecture, are excellent
at parallelization and SIMD, and have the right problem, the difference could be
massive. Barring that, I'd guess you'd get something like a 10x-50x speed up
for average code, which is a lot, but maybe not that significant.

Sort of rambled there, hope that made sense.

P.S. A lot of this is only true for CPython. I hear PyPy is faster (still no
simd, multicore, or memory allocation or layout control though, so I suspect
you can still beat it a lot of the time).

------
known
If we load the entire data set into memory, the difference is negligible.

~~~
mikeash
This is actually likely to be the scenario in which the differences are most
visible. If your data fits in memory then you're more likely to be compute-
bound, at which point C or C++ will completely flatten Python.

If your data is too big to fit into memory, then you're likely to be I/O
bound, at which point the cost of I/O operations is likely to erase
performance differences between the languages.

