
Why Python Is Slow: Looking Under the Hood - karlheinz_py
https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
======
tjradcliffe
A related question is: why is Perl so fast?

Quite a few years ago I wrote a little Runge-Kutta solver in Perl for some
simulation work. It seemed like a good idea at the time. The equations of
motion had to be integrated over a very long time, and it could take hours for
a single run (still much faster than the Monte Carlo it was being used to do a
sanity-check on). I re-wrote everything in C++, and picked up less than a
factor of two in speed.

So it isn't that "Python is interpreted" that is the problem, because "Perl is
interpreted" in exactly the same way. It really does seem to come down to the
Python object model. Perl's scalar types have vastly less overhead, so much so
that you can actually do reasonably efficient numerical computation in it.

I abandoned Perl for Python shortly thereafter because once I got over the "oh
my god it's full of whitespace" thing Python was just more fun to code in, but
the speed that Perl provided is something I've definitely missed, and it was a
real awakening to the notion that interpreted languages don't have to be slow.
The striking thing was that unlike Java (say) where it _can_ be fast but you
generally have to think about it, I was getting fast Perl without even really
trying.
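
For context, a fixed-step Runge-Kutta integrator of the kind described is just a handful of float operations per step, which is exactly where per-scalar interpreter overhead dominates. A minimal Python sketch (an illustration, not the original Perl code):

```python
# Minimal fixed-step RK4 integrator in pure Python -- a sketch of the
# kind of tight numeric loop described above.
def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Integrate y' = y from t = 0 to t = 1; the exact answer is e.
y, t, h = 1.0, 0.0, 0.001
for _ in range(1000):
    y = rk4_step(lambda t, y: y, t, y, h)
    t += h
print(y)  # ~2.718281828
```

Every pass through that loop allocates and refcounts a dozen boxed floats, which is precisely the per-object overhead at issue.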

~~~
haberman
I'm not sure your results are typical.

In microbenchmarks, Perl is 2-125x slower than C++:
[http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...](http://benchmarksgame.alioth.debian.org/u32/benchmark.php?test=all&lang=perl&lang2=gpp&data=u32)

And Java is quite a bit faster than Perl too:
[http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...](http://benchmarksgame.alioth.debian.org/u32/benchmark.php?test=all&lang=perl&lang2=java&data=u32)

Perl isn't really that fast. It's faster than Python in most cases, but gets
beat by Lua pretty consistently:
[http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...](http://benchmarksgame.alioth.debian.org/u32/benchmark.php?test=all&lang=perl&lang2=lua&data=u32)

~~~
cleaver
For a general-purpose problem, Perl won't be particularly fast. For a Perl-
type problem (scanning and parsing big files), Perl is very fast.

Doing a Perl-type problem in a general-purpose language would be considerably
slower. However, Python or others will perform much better in the "can I read
my own code six months later" benchmark.

~~~
haberman
> For a Perl-type problem (scanning and parsing big files), Perl is very fast.

I think it's a matter of what you're comparing it to.

Compared to using Perl for a general-purpose problem, Perl for
scanning/parsing is fast.

Compared to scanning/parsing with C, Perl is not fast.

    
    
        $ ruby -e '1.upto(1000000) { |n| puts "This is line number #{n}" }' > file
        $ time perl -ne 'print if /number 12345/' < file
        [...]
        real    0m0.193s
        user    0m0.189s
        sys     0m0.004s
        $ time grep "number 12345" file
        [...]
        real    0m0.023s
        user    0m0.019s
        sys     0m0.005s
    

I gave Perl every possible advantage here. I didn't actually even write any
Perl except a regular expression, which is delegated immediately to C. I
didn't even write the loop in Perl; I let the Perl main loop handle that. And
still the C program is almost 10x faster.

Note: these test runs are from Linux. On OS X the Perl results were almost the
same, but "grep" was inexplicably much slower. It seems to hang after it's
already dumped all of its output. Basically grep on OS X appears to be badly
broken somehow.

~~~
kbenson
What you are seeing is different regex engines and capabilities, and grep's
focus on pure speed and optimization of a common case and Perl's focus on
versatility.

I see very similar results between Perl and grep, and you can see this by also
including egrep, which allows slightly more complex expressions:

    
    
        [root@stats ~]# time perl -ne 'print if /number 123456/' < /tmp/file
        [...]
        real    0m1.990s
        user    0m1.937s
        sys     0m0.049s
        [root@stats ~]# time grep "number 123456" /tmp/file
        [...]
        real    0m0.158s
        user    0m0.115s
        sys     0m0.035s
        [root@stats ~]# time egrep "number 123456" /tmp/file
        [...]
        real    0m0.150s
        user    0m0.127s
        sys     0m0.023s
    

But what happens if we use a slightly more complex expression?

    
    
        [root@stats ~]# time perl -ne 'print if /number [1]23456/' < /tmp/file
        [...]
        real    0m1.989s
        user    0m1.925s
        sys     0m0.047s
        [root@stats ~]# time grep "number [1]23456" /tmp/file
        [...]
        real    0m1.402s
        user    0m1.366s
        sys     0m0.022s
        [root@stats ~]# time egrep "number [1]23456" /tmp/file
        [...]
        real    0m1.414s
        user    0m1.382s
        sys     0m0.031s
    

The difference becomes much less pronounced. What if we make the expression
just a bit more complex?

    
    
        [root@stats ~]# time perl -ne 'print if /number [1]23456[0-9]*/' < /tmp/file
        [...]
        real    0m1.950s
        user    0m1.910s
        sys     0m0.039s
        [root@stats ~]# time grep "number [1]23456[0-9]*" /tmp/file
        [...]
        real    0m9.353s
        user    0m9.307s
        sys     0m0.037s
        [root@stats ~]# time egrep "number [1]23456[0-9]*" /tmp/file
        [...]
        real    0m9.539s
        user    0m9.483s
        sys     0m0.045s
    

So, now we have the Perl regex engine fairly static across the extra
complexity, while grep and egrep see order-of-magnitude time increases and are
_much_ slower than Perl at this point. I suspect your first benchmark was the
result of a specific optimization grep has that Perl doesn't, or it may be
that grep was able to switch to a DFA regex for that first case, while Perl
doesn't bother with a completely different regex implementation for special
cases like that.

Anecdata: I needed to process a large amount of XML a while back, to the point
where a week spent testing and optimizing XML parsing libraries in Perl was
worth it, because it could shave weeks or months off the processing time. The
winner? A regex that captured attributes and content and assigned name/value
pairs directly out to a hash. This was only possible because the XML was
highly normalized, but it was actually _over 10 times faster_ than the closest
competitor for XML parsing I could find, and I checked all the libXML, libXML2,
and SAX libraries I could get my hands on.

In the end, it was something as simple as the following approximation:

    
    
        while (my ($doc) = $xml =~ /$get_record_xml_re/) {
            my %hash = $doc =~ /$record_begin_re$capture_name_and_value_pairs_re$record_end_re/;
            process_record( \%hash );
        }
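
For readers who don't speak Perl, here is a rough Python analogue of the trick (the record format below is hypothetical; the real XML shape and regexes aren't shown in the comment):

```python
import re

# Hypothetical, highly normalized record shape standing in for the real
# data; the point is capturing name/value pairs straight into a dict
# without a full XML parser.
xml = '<record><f name="id" value="42"/><f name="user" value="ada"/></record>'
pairs = re.findall(r'name="([^"]*)"\s+value="([^"]*)"', xml)
record = dict(pairs)
print(record)  # {'id': '42', 'user': 'ada'}
```

This only works because the input is guaranteed normalized, which is the caveat stated above.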

~~~
haberman
Your results are interesting and I'd be curious to know why grep degrades
so badly on that last regex.

But the original benchmark was ridiculously biased in favor of Perl by not
actually doing anything in Perl.

If Perl is actually being competitive in the unfair benchmark, the benchmark
should be made more fair by actually putting some logic in Perl, and writing
the equivalent logic in C. At that point, you would start to see C win again
(modulo any inherent inefficiencies in grep's regex engine).

> This was only possible because the XML was highly normalized

Another way of putting this is: your regex wasn't actually an XML parser.
Things that are actually XML parsers were slower. This is not too surprising.

A 10x slowdown _does_ surprise me somewhat. It doesn't surprise me that you
beat libXML or any library that builds the XML structure into a complete DOM
before you can process the first record. It does surprise me that you beat SAX
by 10x. SAX does have some inefficiency built in, like how it turns attributes
into a dictionary internally before giving them to the application. That would
probably mean that SAX bindings for Perl to a C parser would have to take a
full SAX attribute hash and turn it into a Perl attribute hash. Still, 10x is
pretty bad.

~~~
kbenson
> Your results are interesting and I'd be curious to know why the grep
> degrades so badly on that last regex.

I really do think it has to do with grep swapping out regex implementations
based on features needed. The last regex matches a variable length string, so
it may trigger a much more complex and/or cpu-intensive regex engine to be
used.

> If Perl is actually being competitive in the unfair benchmark, the benchmark
> should be made more fair by actually putting some logic in Perl, and writing
> the equivalent logic in C. At that point, you would start to see C win again
> (modulo any inherent inefficiencies in grep's regex engine).

I see your point, but I think it's less relevant than you suppose. Regular
expressions are first class citizens in Perl, just as much as Arrays and
Hashes. This doesn't just mean that the syntax has some niceties, but you can
actually call Perl code within the regex itself[1], and even use this feature
to build a more complex regular expression as you parse[2]. Complaining that
Perl uses a regex and it isn't Perl is sort of like complaining Perl is using
hashes, and any fair benchmark between C and Perl should just stick to Arrays.

> Another way of putting this is: your regex wasn't actually an XML parser.
> Things that are actually XML parsers were slower. This is not too
> surprising.

Yes. I didn't want to give the impression I wrote a general-purpose XML parser
that beat all the C implementations I could find. I still think it's
interesting that well-formed regular expressions are performant enough in this
circumstance to make them a preferred alternative among the many options. I could
have written a simple parser in C that would have been faster, but the
solution I ended up with is quite fast, robust, and very, _very_ easy to
debug.

> That would probably mean that SAX bindings for Perl to a C parser would have
> to take a full SAX attribute hash and turn it into a Perl attribute hash.
> Still, 10x is pretty bad.

I think it's more related to the fact that a sufficiently optimized regex
implementation is, in its actions, very close to C code that steps through a
char array looking for the record beginning and ending indicators and for the
content between them, and then steps through the record looking for data
items, saving the key and value of each. The main benefit of using a regex in
Perl is that I get what is probably a fairly close approximation (all things
considered) of a C implementation of that without having to write any C, and
without having to marshal data back and forth to a library to accomplish it,
with very simple and concise code.

There are other tasks which obviously aren't going to be nearly as efficient
in Perl, but this exchange was spurred by someone talking about Perl being
very fast for a Perl-type problem, which this definitely is.

1:
[http://perldoc.perl.org/perlre.html#(%3f%7b-code-%7d)](http://perldoc.perl.org/perlre.html#\(%3f%7b-code-%7d\))

2:
[http://perldoc.perl.org/perlre.html#(%3f%3f%7b-code-%7d)](http://perldoc.perl.org/perlre.html#\(%3f%3f%7b-code-%7d\))

~~~
haberman
> I really do think it has to do with grep swapping out regex implementations
> based on features needed.

That wouldn't explain why one regex engine is 5x faster than another. Only
looking at the regex engines themselves would tell you that.

> Complaining that Perl uses a regex and it isn't Perl

I'm not complaining that it uses a regex, I'm complaining that it doesn't do
anything else.

A representative Perl program would use regexes _and_ contain some logic that
processes the results of those regexes.

> I think it's more related to the fact that the actions of the regex parsing
> implementation when optimized sufficiently is very close in implementation
> to C code that steps through a char array

I used to think that, but it is really not true unless your regex engine
contains a JIT compiler.

Specialized machine code for a text parser (which is what you would get from
writing C) is significantly faster than generic NFA/DFA code. In these tests,
an average of 65% of runtime was saved when the regex engine included a JIT
(ie. the specialized code was over twice as fast):
[http://sljit.sourceforge.net/pcre.html](http://sljit.sourceforge.net/pcre.html)

~~~
kbenson
> That wouldn't explain why one regex engine is 5x faster than another. Only
> looking at the regex engines themselves would tell you that.

It could definitely explain it, but it may not be the best explanation given
the facts. I'll definitely concede that it's pure conjecture.

> A representative Perl program would use regexes and contain some logic that
> processes the results of those regexes.

Sure, depending on what you want to show. Nobody is trying to say Perl is as
fast or faster than C, just that _relatively_ , it's fast for the development
cost it requires.

>> I think it's more related to the fact that the actions of the regex
>> parsing implementation when optimized sufficiently is very close in
>> implementation to C code that steps through a char array

> I used to think that, but it is really not true unless your regex engine
> contains a JIT compiler.

I think we're referring to different things, which is mostly my fault for
being loose with my terminology. I really only meant close to C in a
conceptual manner, which yields some performance benefit by keeping a large
chunk of the looping, and of the work of storing specific chunks of text, low
level and inside the interpreter. I wasn't trying to imply the regex engine's
cost was negligible or that the actual machine operations were comparable in
any meaningful way.

------
rurban
Unfortunately this blog post is misleading and uninformed. Python and the
other slow dynamic languages (Perl, PHP, and Ruby) are slow not because
dynamic type checks, branches, and indirections are inherently costly, but
because their interpreters are badly written. They are simply inefficient,
poor interpreters, with bad ops and data structures. In contrast, efficient
dynamic languages like Lisp, Scheme, Lua, or JavaScript, and many others not
so well known, do exist. Mine, potion, is also in the Lua category, and not
in the slow PPPR (Python, Perl, PHP, Ruby) category.

The seminal paper about efficient dynamic interpreters is Ertl's "The
Structure and Performance of Efficient Interpreters"
[http://www.complang.tuwien.ac.at/papers/ertl%26gregg03jilp.p...](http://www.complang.tuwien.ac.at/papers/ertl%26gregg03jilp.ps.gz)

The abstract is: "Interpreters designed for high general-purpose performance
typically perform a large number of indirect branches (3.2%–13% of all
executed instructions in our benchmarks). These branches consume more than
half of the run-time in a number of configurations we simulated. We evaluate
how accurate various existing and proposed branch prediction schemes are on a
number of interpreters, how the mispredictions affect the performance of the
interpreters and how two different interpreter implementation techniques
perform with various branch predictors. We also suggest various ways in which
hardware designers, C compiler writers, and interpreter writers can improve
the performance of interpreters."

My own input to this is that the GC, the JIT, the calling convention, and
especially the size of the data structures and ops matter more than
eliminating the type checks.

~~~
lispm
I think the paper is even worse, in that it spreads confusion about what an
Interpreter for a programming language actually is, what it does and what it
provides.

The paper fails to define the concept of what to call an Interpreter. A
bytecode interpreter is entirely different from a Lisp interpreter, which
actually runs from source structures. Mixing VMs into the picture makes it
only worse.

Part of the reason may be that they don't understand what an Interpreter
actually provides and why it is used.

------
btown
Why, oh, why does CPython bother keeping refcounts for small integers? Sure,
it lets you make pretty graphs with [sys.getrefcount(i) for i in
range(1000)]... but that's an extra memory read and write on every instruction
that uses an integer. I can only imagine that not only are these extra
instructions, but they're extra instructions that kill pipelining, if the
interpreter needed to do "a = 1; b = 1; c = 1" for instance. Maybe they found
that the branch needed to not bother refcounting for small integers was even
slower? Does anyone know if that was ever tried?
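
The premise is easy to check directly: CPython caches small integers as shared singletons, so their refcount fields really are touched constantly. A quick sketch:

```python
import sys

# CPython interns small integers (roughly -5..256) as shared singletons,
# so every "a = 1" anywhere in the process touches the same object's
# refcount field.
a = 1
b = 1
print(a is b)                  # True: one cached object
print(sys.getrefcount(1) > 2)  # True: many live references interpreter-wide
```

(Note that in very recent CPython versions small integers are "immortal" and their refcounts are no longer meaningfully updated, which is essentially the optimization being asked about.)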

~~~
dalke
What you propose sounds like it would be a pure headache for all code which
otherwise expects a uniform memory API.

Consider a C extension which takes an object and appends it to a list. If
small integers did not have a refcount then that extension would have to have
special code, like "if object is not a small integer, then increment the
reference count".

~~~
kazinator
I have experience turning a Lisp dialect with refcounted integers into
supporting non-refcounted integers, identified by a type tag field in the
"value cell" type.

> _If small integers did not have a refcount then that extension would have to
> have special code, like "if object is not a small integer, then increment
> the reference count"._

Easily implemented in one place in the "increment_refcount(obj)" inline
function:

    
    
        // roughly speaking
        if (is_heap_pointer(obj))
          obj->refcount++;
    

where "is_heap_pointer(obj)" is an inlined bitmask check like

    
    
        (((unsigned int) obj) & TAG_MASK) == TAG_PTR)
    

If TAG_PTR is zero bits, then a value which satisfies is_heap_pointer can be
dereferenced directly.

In a garbage collection implementation, you don't have refcounts, but the
garbage collector's "mark object" function does the same check:

    
    
       if (!is_heap_pointer(obj))
         return;  // don't try to mark non-heap things
    
       switch (type(obj)) {
       case TYPE_CONS_CELL
          mark_obj(obj->cons.car);
          mark_obj(obj->cons.cdr);
          break;
       // ...
       }
    

Another thing is that you provide an API to the extension writers, which
abstracts the use of objects. For instance, you can give them a function that
can be called like this:

    
    
       value n = number(42);
       value z = number(INT_MAX);
    

The first case might construct an unboxed value because the integer is small
enough. But perhaps the second returns a bignum because INT_MAX requires 32
bits, whereas unboxed integers only go up to 30 bits.

So in the first case you get an object with no refcount, whereas in the second
you get an object with a refcount of 1. The extension code is written such
that it doesn't care.

~~~
dalke
Those are excellent points. In the context of CPython, Guido van Rossum made
the design decision not to use tagged objects, based on experience with the
ABC implementation, which did.

See [https://mail.python.org/pipermail/python-
dev/2004-July/04614...](https://mail.python.org/pipermail/python-
dev/2004-July/046147.html) .

I just realized though that since the patch to support tagged integers is
small, a closed system like the CCP Games distribution of Python, which has no
extensions they don't control, might be able to use this idea.

------
shanemhansen
CPython does a lot of work for just about any piece of code. Just a small
handful of things:

1\. everything's a hash (object data, variable lookup, etc). jmoiron wrote a
great article on the topic [http://jmoiron.net/blog/whats-going-
on/](http://jmoiron.net/blog/whats-going-on/)

2\. refcounting introduces overhead for every variable access

3\. loops are making all kinds of function calls to __iter__, next(), and
catching StopIteration, which makes it hard to have a tight loop.

4\. There's the GIL, builtin locking isn't great.

5\. lots of indirection (lack of value types)
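
Point 3 is easy to make concrete: a for-loop desugars into dynamic method dispatch plus exception handling, roughly like this (a sketch, not CPython's actual bytecode):

```python
# What "for x in seq: total += x" expands to, approximately. Each
# iteration is a dynamically dispatched call plus exception machinery.
seq = [10, 20, 30]
it = iter(seq)             # calls type(seq).__iter__(seq)
total = 0
while True:
    try:
        x = next(it)       # calls type(it).__next__(it)
    except StopIteration:  # the loop's normal exit path
        break
    total += x
print(total)  # 60
```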

~~~
angersock
Well, it's hard to write an interpreter with the performance of, say, MRI
Ruby.

~~~
masklinn
I'm getting poe'd so I'll check: you're being sarcastic, right?

~~~
angersock
:3

(do a little research...you might be pleasantly surprised)

------
madengr
Please forgive my ignorance, but why can't Python just be compiled into
assembly like C? The compiler should be able to sift through the code and see
that "x=1" is an integer, or that it converts to a float when "x=x+1.0".

If I'm doing something like counting from 1 to 10 and summing the count, why
can't this be compiled to run the same speed as C?

Obviously it's interpreted during development, but when I'm done, why can't I
just run it through a compiler to get fast assembly code?

Way back when, you could get a BASIC compiler to turn your interpreted BASIC
code into assembly.

What fundamental am I missing?
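
One piece of the answer: in Python, the machine representation of a variable can depend on data known only at runtime, so the compiler can't pick a single assembly sequence for it. A minimal sketch:

```python
# The static type of x is undecidable at compile time: it depends on
# which branch runs, which depends on runtime data.
def double(flag):
    x = 1            # int here
    if flag:
        x = x + 1.0  # float here -- same name, different machine type
    return x * 2

print(double(False))  # 2   (int arithmetic)
print(double(True))   # 4.0 (float arithmetic)
```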

~~~
plikan13
Why can't a C++ compiler optimize away virtual function tables? If a C++
compiler could deduce the exact derived type of all objects at compile time,
it could call the correct virtual function statically instead of going through
the extra indirection at runtime.

~~~
zanny
If you declare a C++ virtual function final the compiler can avoid the virtual
call table for any invocations of that function in the future on that object
or any of its derivatives. Of course, you then cannot override it in child
classes anymore.

If you declare a virtual function inline, then when it is invoked on a
concretely typed object (i.e., not through a reference or pointer) you get a
similar effect while still being able to override, except the code is not a
static call but is simply inlined with that one implementation of the function.

------
Skunkleton
Another reason python is slower than other languages is that abstraction adds
significant overhead which most implementations cannot optimize away. See
here:

[http://blog.reverberate.org/2014/10/the-overhead-of-
abstract...](http://blog.reverberate.org/2014/10/the-overhead-of-abstraction-
in-cc-vs.html)

~~~
coolsunglasses
The overhead of abstraction...unless you're using Haskell where typeclass
instances get specialized, so you can abstract without slowing things down.

~~~
tome
Let's not overstate the case. There's still plenty of abstraction which is
tricky or impossible for GHC to optimize. But yes, with static analysis comes
the possibility of nice optimizations.

~~~
coolsunglasses
Indeed.

I like the tricky one they're kicking around that would make `lens` faster.
Pretty intriguing. I'm hopeful they'll be able to make it work in STG.

------
mikkom
Almost all these points apply to javascript as well. Javascript, however, is
not really slow anymore.

[http://benchmarksgame.alioth.debian.org/u64/benchmark.php?te...](http://benchmarksgame.alioth.debian.org/u64/benchmark.php?test=all&lang=v8&lang2=python3&data=u64)

~~~
jlebar
> Almost all these points apply to javascript as well.

Sort of.

The article talks about why CPython, an implementation of the Python language,
is slow. You're right that a naive implementation of Javascript would have
many of the same problems, and indeed many of the first implementations of JS
did have these problems. But due to heated competition, Mozilla, Google,
Apple, and Microsoft have invested a huge amount of top-quality engineer-years
into optimizing their implementations of JS.

Implementations of Python, in contrast -- and CPython in particular -- have
not received the same sort of love.

I largely agree with the notion that the speed of modern JS engines serves as
an existence proof that Python could be similarly fast.

~~~
anentropic
that's what PyPy is [http://speed.pypy.org/](http://speed.pypy.org/)

------
richardwhiuk
Previously:
[https://news.ycombinator.com/item?id=7721096](https://news.ycombinator.com/item?id=7721096)

This is at least 6 months old.

~~~
veneratio
Ah I thought I had seen this before.

------
ggchappell
> 2\. Python is interpreted rather than compiled.

Can we stop saying things like this? Virtually all Python is compiled, as part
of the interpretation process.

There is a valid point here, of course. The point is that the top Python
compilers do not compile to native code. So say that.

Words have meanings. Use them correctly. <grumble, grumble>

~~~
jeffreyrogers
There are three primary methods of running code: interpretation, compilation
to object code, and running on a VM/JIT. Obviously Python is interpreted by
any meaningful definition of interpretation, and thus it is not compiled in
the typical sense of the word (at least not in the implementation everyone
uses).

The reason human language is so expressive is because we can leave out a lot
of context and formalism that is required in mathematics and programming
languages because the listener/reader will be able to infer it. It's pedantic
to expect a writer to hedge against every possible interpretation, when their
focus should be on communicating clearly in the first place.

~~~
andreyf
FWIW, wikipedia says "The main Python implementation, named CPython, [...]
compiles Python programs into intermediate bytecode, which is executed by the
virtual machine." I'm not sure why people call Python's VM an interpreter, but
it's definitely interpreting byte code, not the source directly.

This is very different from Perl, where a line of code isn't parseable until
you know the values of the variables, e.g.

    
    
        whatever / 25 ; # / ; die "this dies!";

~~~
haberman
> This is very different from Perl, where a line of code isn't parseable until
> you know the values of the variables

Perl is crazy, but I don't think it's that crazy.

From
perlcompile([http://perldoc.perl.org/5.8.9/perlcompile.html](http://perldoc.perl.org/5.8.9/perlcompile.html))

    
    
        Perl has always had a compiler: your source is compiled
        into an internal form (a parse tree) which is then optimized
        before being run.
    

If what you say is true, it would not be possible to produce a parse tree
prior to running the program.

I could not get your code sample to die based on the type/value of "whatever"
\-- can you?

~~~
scott_s
Apparently, Perl is that crazy. See
[https://news.ycombinator.com/item?id=5770531](https://news.ycombinator.com/item?id=5770531)

Basically, you can't always parse a Perl program without running it. There is
a subset that you can, but not everything.

~~~
haberman
> Basically, you can't always parse a Perl program without running it.

That is true, but what the grandparent said is not:
[https://news.ycombinator.com/item?id=8626454](https://news.ycombinator.com/item?id=8626454)

To expand on this, I think a succinct way of summarizing the issue is: you
might have to run BEGIN blocks in Perl to parse the non-BEGIN-block part of
the program. The canonical example of this is:

    
    
        BEGIN {
            if(arbitrary_function()) {
                eval "sub whatever() { }; 1" or die $@;
            } else {
                eval "sub whatever { }; 1" or die $@;
            }
        }
    
        # The parsing of this line depends on the result of arbitrary_function()
        whatever  / 25 ; # / ; die "this dies!";
    

arbitrary_function() can be anything, so the only way to parse the rest of the
program is to actually run arbitrary_function().

------
slantedview
"it's slow"

"just write a C extension"

"but then why use Python?"

~~~
bch
Because beautiful control structures and a REPL ?

~~~
pjmlp
There are many languages that offer that alongside an AOT compiler to native
code.

~~~
bch
Well sure -- but I guess considering Python at all suggests you want or need
those things. Python's control structures and the REPL are _fine_ reasons to
use it. To be clear though, I'm a Tcl-er, but I use it in exactly this
context. I meant my comment to stand alongside the other sibling comments.

------
TheLoneWolfling
Is it just me who sees a large chunk of these as flaws in the Python
_implementation_ as opposed to the Python _language_?

Dynamic typing can often be optimized at the compilation stage - and yes,
Python has a compilation stage - this particular example is basic type
inference, for example. Even more complex examples can be optimized by
emitting specialized versions for the types that the compiler can see it will
be called with. Effectively emitting specialized versions of specific nodes in
the control flow graph.

~~~
dagw
_Is it just me who sees a large chunk of these as flaws in the Python
implementation as opposed to the Python language?_

There are a number of design decision in python language that makes it
inherently hard to write a fast python implementation.

~~~
Animats
Ah, the right answer!

There are lots of languages where variables are dynamically typed that can be
compiled and optimized with a JIT compiler. In Python, though, any code can
mess with any data, using "setattr". You can find all the variables in another
module and mess with them at run time, using their names as strings. At
compile time, the compiler can't detect that's going to happen.
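
The hazard is trivially demonstrated; a module attribute like `math.pi` is rebindable like any other name:

```python
# Any code, anywhere, may rebind another module's attributes at runtime,
# so a compiler cannot treat them as constants.
import math

def circumference(r):
    return 2 * math.pi * r   # "math.pi" must be looked up on every call

before = circumference(1)    # ~6.283
setattr(math, "pi", 3)       # perfectly legal in Python
after = circumference(1)     # 6 -- the "constant" changed under us
print(before, after)
```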

This is sometimes called the Guido van Rossum Memorial Boat Anchor.

So Python implementations usually have to assume the worst case. Google's
attempt at a faster, compatible Python, "Unladen Swallow", was an embarrassing
failure. PyPy manages to get past that, but at a huge cost in JIT compiler
complexity. After 12 years of work, PyPy is finally shipping stable versions
and starting to get some use. It's been all uphill for the PyPy developers,
though.

It's kind of sad. Python never achieved its full potential because of the
speed problem. Google ended up developing Go because Python was too slow.

~~~
adamtj
How is python substantially different from javascript in this case? Google has
a crazy fast javascript JIT compiler, even though javascript has something
equivalent to python's setattr().

~~~
masklinn
> even though javascript has something equivalent to python's setattr().

Python provides significantly more hooks into the behaviour of arbitrary
objects, and arguably provides more flexibility than JS.

Here are a few notes from Mike Pall (LuaJIT) on Python features making a
straight interpreter slower, and the language harder to optimise (JIT
tightly): [http://lua-users.org/lists/lua-l/2004-11/msg00083.html](http://lua-
users.org/lists/lua-l/2004-11/msg00083.html)

Animats is full of shit; the ability to add arbitrary "static" attributes to
existing objects is really not the main problem.

------
philippeback
Pharo 3's results (JIT-enabled VM):

    [(1 to: 100000000) sum] timeToRun  => 0:00:00:07.335

[http://pharo.org](http://pharo.org)

Version 4 with new VM due in 2015.

It will perform much, much better: with the range used, the 32-bit VM must
promote to LargeInteger objects, whereas with the 64-bit VM everything fits in
immediate integers.

~~~
masklinn

        > pypy -mtimeit 'sum(xrange(1, 100000001))'
        10 loops, best of 3: 153 msec per loop
    

however the context of TFA is scientific computing, hence pypy being
ignored/dismissed

~~~
philippeback
We can do it this way if needed.

[http://clementbera.wordpress.com/2013/06/19/optimizing-
pharo...](http://clementbera.wordpress.com/2013/06/19/optimizing-pharo-to-c-
speed-with-nativeboost-ffi/)

~~~
masklinn
Having an FFI is not relevant, Pypy has an excellent FFI already[0].

You can't interface with the existing SciPy ecosystem which expects the
CPython API, which is the reason why pypy doesn't matter in scientific
computing.

[0] inspired by LuaJIT and usable from both CPython and PyPy:
[https://cffi.readthedocs.org/en/release-0.8/](https://cffi.readthedocs.org/en/release-0.8/)

------
rhgraysonii
One of my favorite topics lately is Python optimization. A few weeks ago FB
put out a release on how they had done fast randomized SVD, and in the
implementation section they mentioned using Intel's Math Kernel Library for
the BLAS routines. Largely as a result, many more arrays are F-contiguous
rather than C-contiguous (in linear algebra, Fortran obviously wins here). I
had done something similar on a much smaller scale about 4 months ago.
[https://medium.com/@_devbob/from-0-to-warp-
speed-b780a2bc36c...](https://medium.com/@_devbob/from-0-to-warp-
speed-b780a2bc36ce)

Does anyone else have some good bits or blog entries on these sorts of inner
workings and optimizations?

------
scott_s
Even if you're already familiar with all of the main reasons that answer the
question (that is, you're already familiar with how a VM works), read the
_Just for fun: a few "never use these" hacks_. I mean, _wow_. That's the kind
of nastiness I normally associate with C's free-for-all memory model.

~~~
slashnull
With the nuance that C _forces_ you to deal with those _never use these hacks_
in order to do anything non-trivial.

I can't see why anyone would go and mess with CPython's internals like that in
production code. Other than for nefarious purposes, I guess.

------
sgt101
In simple language -> because it's difficult to provide the machine with the
information it needs to make decisions about efficiency.

------
hackersf
Python's GC is a conservative collector as opposed to Java's precise collector
=> terrible GC compaction.

------
slashnull
Short answer: because it doesn't JIT

~~~
ludamad
That's a lossy short answer. LuaJIT with JIT turned off still runs circles
around CPython.

~~~
trynumber9
Even normal Lua is fast compared to Python, if the shootout games are any bit
accurate.

[http://benchmarksgame.alioth.debian.org/u64/benchmark.php?te...](http://benchmarksgame.alioth.debian.org/u64/benchmark.php?test=all&lang=lua&lang2=python3&data=u64)

~~~
igouy
You are told how the measurements were made -- Are the benchmarks game
measurements _accurate_?

I think these questions are more to the point: Were the Python programmers as
skilled as the Lua programmers? Do the programs do anything like the things
your programs do?

------
mahouse
I don't understand why it is compared to C, when it could be compared to Perl
or PHP, which are also dynamically typed, interpreted, etc., but much faster.

~~~
masklinn
1\. Because TFA is about scientific computing

2\. And in that context (and most others), the difference between Perl, PHP
and CPython is basically non-existent, you might get 2x on one bench, half on
the other

------
pnathan
Heh, Python is slow: in order to write fast Python, you have to write in C and
conform to the FFI.

I vastly prefer to use tools which don't come misshapen out of the box,
personally.

~~~
the_real_bto
Different tools for different jobs. Being able to develop code quickly is a
huge win that Python delivers. Sometimes that makes Python the right tool,
e.g. Youtube. Other times slow execution speed makes it the wrong choice.

[0] [http://www.gooli.org/blog/youtube-runs-on-
python/](http://www.gooli.org/blog/youtube-runs-on-python/)

~~~
pnathan
Other tools also deliver code quickly; that's an evasion of the criticism.

