Why Python Is Slow: Looking Under the Hood (jakevdp.github.io)
220 points by karlheinz_py on Nov 18, 2014 | 148 comments



A related question is: why is Perl so fast?

Quite a few years ago I wrote a little Runge-Kutta solver in Perl for some simulation work. It seemed like a good idea at the time. The equations of motion had to be integrated over a very long time, and it could take hours for a single run (still much faster than the Monte Carlo it was being used to do a sanity-check on). I re-wrote everything in C++, and picked up less than a factor of two in speed.

So it isn't that "Python is interpreted" that is the problem, because "Perl is interpreted" in exactly the same way. It really does seem to come down to the Python object model. Perl's scalar types have vastly less overhead, so much so that you can actually do reasonably efficient numerical computation in it.

I abandoned Perl for Python shortly thereafter because once I got over the "oh my god it's full of whitespace" thing Python was just more fun to code in, but the speed that Perl provided is something I've definitely missed, and it was a real awakening to the notion that interpreted languages don't have to be slow. The striking thing was that unlike Java (say) where it can be fast but you generally have to think about it, I was getting fast Perl without even really trying.


I'm not sure your results are typical.

In microbenchmarks, Perl is 2-125x slower than C++: http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...

And Java is quite a bit faster than Perl too: http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...

Perl isn't really that fast. It's faster than Python in most cases, but gets beat by Lua pretty consistently: http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...


For a general-purpose problem, Perl won't be particularly fast. For a Perl-type problem (scanning and parsing big files), Perl is very fast.

Doing a Perl-type problem in a general-purpose language would be considerably slower. However, Python or others will perform much better in the "can I read my own code six months later" benchmark.


I don't know why I always have to chip in on "perl is unreadable lol" comments, but over the last 8 years, apart from a steady trickle of C coding here and there, the bulk of my dayjob has moved around from C/Verilog, to Ruby when I discovered it, then Perl/R/Python, to full-time Python now.

It's true that unsupervised, weak coders using perl turn out worse code than they would in other languages. But it really doesn't take much to produce good code if you have a tiny bit of supervision/discipline (the problem stems mostly from the fact that 80% of perl tutorials on the web teach 1997-era perl anti-patterns). And/or if you happen to be a stronger programmer in something else, hopefully you stumble across perlcritic and modern perl patterns more quickly.

Well-written Moose code involves less boilerplate; the declarative nature and composability of classes, types and data members, with free type/value validation, is a delight to maintain and results in more robust code with far fewer silent failures (or objects happily chugging along with invalid state) than typical Python classes.

Of course, now that I can't use Moose, and the Python community seems to actively discourage the very thought of relying on any superset of core Python OO features (like Enthought's Traits package), I really want to revisit statically/more strongly typed languages for large projects. So it feels like I've come full circle in my programming career...


Check out Nim.

    import rdstdin, strutils, sequtils  # sequtils provides map
    
    let
      time24 = readLineFromStdin("Enter a 24-hour time: ").split(':').map(parseInt)
      hours24 = time24[0]
      minutes24 = time24[1]
      flights: array[8, tuple[since: int,
                              depart: string,
                              arrive: string]] = [(480, "8:00 a.m.", "10:16 a.m."),
                                                  (583, "9:43 a.m.", "11:52 a.m."),
                                                  (679, "11:19 a.m.", "1:31 p.m."),
                                                  (767, "12:47 p.m.", "3:00 p.m."),
                                                  (840, "2:00 p.m.", "4:08 p.m."),
                                                  (945, "3:45 p.m.", "5:55 p.m."),
                                                  (1140, "7:00 p.m.", "9:20 p.m."),
                                                  (1305, "9:45 p.m.", "11:58 p.m.")]
    
    proc minutesSinceMidnight(hours: int = hours24, minutes: int = minutes24): int =
      hours * 60 + minutes
    
    proc cmpFlights(m = minutesSinceMidnight()): seq[int] =
      result = newSeq[int](flights.len)
      for i in 0 .. <flights.len:
        result[i] = abs(m - flights[i].since)
    
    proc getClosest(): int =
      for k,v in cmpFlights():
        if v == cmpFlights().min: return k
    
    echo "Closest departure time is ", flights[getClosest()].depart,
      ", arriving at ", flights[getClosest()].arrive
https://github.com/Araq/Nimrod/wiki/Nimrod-for-C-programmers


If I had to guess, speed is pretty close to C, right?

Nim is pretty awesome and has been around for a while. I wish one of the big entities out there (Google, Mozilla, Microsoft, Apple) had adopted Nim and run with it instead of inventing their own languages.


At the time when Go and Rust were conceived Nim would have certainly not been on their radar. Both were announced in 2009, at a time when the Nimrod repository had barely even got off the ground[0]. And Nim doesn't really address one of Apple's chief concerns: smoothly transitioning away from Objective-C.

[0]: https://github.com/Araq/Nimrod/graphs/contributors



Those are only checks, and don't make your code faster (in fact, slower, when checks are enabled). To get efficiency benefits you need a language/compiler designed around static typing. Cython offers a superset of Python that can be statically typed.


Last time I used Perl it went badly, but it was a typical hacked up mess.

For someone mainly in the Python/Java/C++ world is Moose something worth looking at as a mind expansion exercise? Your description makes it sound appealing.

And, dare I ask, what about Perl 6?


> And, dare I ask, what about Perl 6?

As an ardent Perl lover, I asked this question myself. Shameless self-promotion yada yada :

http://simula67.wordpress.com/2014/03/20/perl-6-evaluation/


I want to write something concise and coherent, but it's not happening today :) Instead, assuming you've read the teasers in the Moose manual [1] I'd recommend this [2] interesting comparison of how horizontal code re-use can be achieved in Java, Ruby, PHP and Perl+Moose. There's more philosophical/winding essays on Moose from Chromatic, for example at [3].

Roles/traits/method modifiers/MOP/composability etc. are all great things for getting more reusability... however (and this might sound stupid) the thing that saddens me most when writing Python code is when I find myself adding a bunch of asserts or adding program logic "manually" in situations where I'd normally specify that sort of thing declaratively in a Moose class definition or by referring to a centrally managed Type or delegate stuff through attribute features.

On the other hand, I'm barely into year 2 of full-time Python dev, so perhaps I've yet to find the idiomatic way of doing Python things I used to take for granted in Moose.

Regarding Perl 6, I don't know much about it, except that the original authors of Moose had some inspiration from it.

Is picking up Perl+Moose mind-expanding? It was for me, but I feel that what Moose gave me in Perl was a bit of a band-aid over the fact that it's such a malleable open-ended dynamic language. So as a C++/Java programmer this aspect might not be so enlightening to you, except to see how Moose achieves it in a pretty painless way that I think is very nice and idiomatic for a dynamic language (with the caveats that brings). To put it another way: it gives Perl some of the great benefits of properly declaring things up-front, without the boilerplate/inflexibility pain that I perceive the Java ecosystem's bureaucracy to be (I haven't touched Java for 10 years, so take that with a grain of salt).

If you want to explore some Moose-ish kinds of things within Python, check out [4] (there's another Moose clone in Python that's similarly inactive, sadly) and [5] (Enthought's stuff is perhaps a bit too heavy and incomplete to be the "Moose of the python world", but it gives you a good idea of some of the nice patterns possible when you think outside of the core Python OO featureset).

[1] http://search.cpan.org/dist/Moose/lib/Moose/Manual.pod [2] http://radar.oreilly.com/2014/01/horizontal-reuse-an-alterna... [3] http://modernperlbooks.com/mt/2009/05/perl-roles-versus-inte... [4] https://github.com/frasertweedale/elk [5] http://code.enthought.com/projects/traits/
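To make that concrete, here's a minimal, hypothetical sketch of the declarative style using Enthought's Traits [5]. The Flight class is invented for illustration, but HasTraits, Int and Str come from traits.api, and a bad assignment raises a TraitError on the spot instead of leaving an object chugging along with invalid state:

    from traits.api import HasTraits, Int, Str

    class Flight(HasTraits):
        # Declarative attributes with free type validation,
        # roughly analogous to Moose's "has" declarations.
        number = Int()
        depart = Str()
        arrive = Str()

    f = Flight(number=679, depart="11:19 a.m.", arrive="1:31 p.m.")
    f.number = "lots"   # raises TraitError: expected an integer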


One of those times I wish I had more than one upvote - thanks!

I definitely have a preference for a more declarative approach, and not in the J2EE giant piles of XML way. Will give these a look for inspiration - thanks again!


Well said :). I like Perl, but when I write it I'm aiming above all for readability.


> For a Perl-type problem (scanning and parsing big files), Perl is very fast.

I think it's a matter of what you're comparing it to.

Compared to using Perl for a general-purpose problem, Perl for scanning/parsing is fast.

Compared to scanning/parsing with C, Perl is not fast.

    $ ruby -e '1.upto(1000000) { |n| puts "This is line number #{n}" }' > file
    $ time perl -ne 'print if /number 12345/' < file
    [...]
    real    0m0.193s
    user    0m0.189s
    sys     0m0.004s
    $ time grep "number 12345" file
    [...]
    real    0m0.023s
    user    0m0.019s
    sys     0m0.005s
I gave Perl every possible advantage here. I didn't actually even write any Perl except a regular expression, which is delegated immediately to C. I didn't even write the loop in Perl; I let the Perl main() function handle that. And still the C program is almost 10x faster.

Note: these test runs are from Linux. On OS X the Perl results were almost the same, but "grep" was inexplicably way slower. It seems to hang after it has already dumped all of its output. Basically, grep on OS X appears to be badly broken somehow.


I'm always wary of these kinds of sub-second benchmarks because more often than not you've only accidentally measured just the compilation and startup times.

I might have some bias, though, from speeding up crusty old Perl CGI web apps with multi-second request times down to less than 100ms simply by keeping the Perl processes persistent with mod_fcgid or whatever.


> I'm always wary of these kinds of sub-second benchmarks because more often than not you've only accidentally measured just the compilation and startup times.

I increased the iteration count by 10x and observed exactly the same pattern:

    $ time perl -ne 'print if /number 12345/' < file
    [...]
    real    0m1.661s
    user    0m1.586s
    sys     0m0.076s
    $ time grep "number 12345" file
    [...]
    real    0m0.188s
    user    0m0.132s
    sys     0m0.056s


Cool, thanks for entertaining my superstitions :)


I recall anecdotal reports that Perl was faster than egrep in some cases. I never tested it myself and that was a long time ago—could be a bug that is long since fixed.


I think the takeaway is that perl is "fast enough" for perl type problems. Writing a complicated file parser in C would be a nightmare.


When I have a one-time computation job that takes an hour to write and two hours to run in Perl, but in C takes 10 hours to write and half an hour to run, then Perl is faster than C.

And these one-time/rare/short jobs are much more frequent than intense, high-throughput C code like the nginx web server or the node javascript interpreter.


I see your point; C is definitely the choice for long-running jobs or jobs that will be run more than a handful of times. In my experience, however, I'm writing a lot of one-off scripts that take 30 seconds tops to run, so Perl wins out pretty hard over C.


What you are seeing is different regex engines and capabilities: grep's focus on pure speed and optimization of the common case versus Perl's focus on versatility.

I see very similar results between Perl and grep, and you can see this by also including egrep, which allows slightly more complex expressions:

    [root@stats ~]# time perl -ne 'print if /number 123456/' < /tmp/file
    [...]
    real    0m1.990s
    user    0m1.937s
    sys     0m0.049s
    [root@stats ~]# time grep "number 123456" /tmp/file
    [...]
    real    0m0.158s
    user    0m0.115s
    sys     0m0.035s
    [root@stats ~]# time egrep "number 123456" /tmp/file
    [...]
    real    0m0.150s
    user    0m0.127s
    sys     0m0.023s
But what happens if we use a slightly more complex expression?

    [root@stats ~]# time perl -ne 'print if /number [1]23456/' < /tmp/file
    [...]
    real    0m1.989s
    user    0m1.925s
    sys     0m0.047s
    [root@stats ~]# time grep "number [1]23456" /tmp/file
    [...]
    real    0m1.402s
    user    0m1.366s
    sys     0m0.022s
    [root@stats ~]# time egrep "number [1]23456" /tmp/file
    [...]
    real    0m1.414s
    user    0m1.382s
    sys     0m0.031s
The difference becomes much less pronounced. What if we make the expression just a bit more complex?

    [root@stats ~]# time perl -ne 'print if /number [1]23456[0-9]*/' < /tmp/file
    [...]
    real    0m1.950s
    user    0m1.910s
    sys     0m0.039s
    [root@stats ~]# time grep "number [1]23456[0-9]*" /tmp/file
    [...]
    real    0m9.353s
    user    0m9.307s
    sys     0m0.037s
    [root@stats ~]# time egrep "number [1]23456[0-9]*" /tmp/file
    [...]
    real    0m9.539s
    user    0m9.483s
    sys     0m0.045s
So, now we have the Perl regex engine fairly static across extra complexity while grep and egrep are seeing order of magnitude time increases, and are much slower than Perl at this point. I suspect your first benchmark was the result of a specific optimization grep has that Perl doesn't, or it may be that grep was able to switch to using a DFA regex for that first case, while Perl doesn't bother with a completely different regex implementation for special cases like that.

Anecdata: I needed to process a large amount of XML a while back, to the point where a week spent testing and optimizing XML parsing libraries in Perl was worth it, because it could shave weeks or months off the processing time. The winner? A regex that captured attributes and content and assigned name/value pairs directly out to a hash. This was only possible because the XML was highly normalized, but it was actually over 10 times faster than the closest competitor for XML parsing I could find, and I checked all the libXML, libXML2, and SAX libraries I could get my hands on.

In the end, it was something as simple as the following approximation:

    # roughly: iterate over records, pulling name/value pairs into a hash
    while ($xml =~ /$get_record_xml_re/g) {
        my ($doc) = $1;
        my %hash = $doc =~ /$record_begin_re$capture_name_and_value_pairs_re$record_end_re/g;
        process_record( \%hash );
    }


Your results are interesting and I'd be curious to know why the grep degrades so badly on that last regex.

But the original benchmark was ridiculously biased in favor of Perl by not actually doing anything in Perl.

If Perl is actually being competitive in the unfair benchmark, the benchmark should be made more fair by actually putting some logic in Perl, and writing the equivalent logic in C. At that point, you would start to see C win again (modulo any inherent inefficiencies in grep's regex engine).

> This was only possible because the XML was highly normalized

Another way of putting this is: your regex wasn't actually an XML parser. Things that are actually XML parsers were slower. This is not too surprising.

A 10x slowdown does surprise me somewhat. It doesn't surprise me that you beat libXML or any library that builds the XML structure into a complete DOM before you can process the first record. It does surprise me that you beat SAX by 10x. SAX does have some inefficiency built in, like how it turns attributes into a dictionary internally before giving them to the application. That would probably mean that SAX bindings for Perl to a C parser would have to take a full SAX attribute hash and turn it into a Perl attribute hash. Still, 10x is pretty bad.


> Your results are interesting and I'd be curious to know why the grep degrades so badly on that last regex.

I really do think it has to do with grep swapping out regex implementations based on features needed. The last regex matches a variable length string, so it may trigger a much more complex and/or cpu-intensive regex engine to be used.

> If Perl is actually being competitive in the unfair benchmark, the benchmark should be made more fair by actually putting some logic in Perl, and writing the equivalent logic in C. At that point, you would start to see C win again (modulo any inherent inefficiencies in grep's regex engine).

I see your point, but I think it's less relevant than you suppose. Regular expressions are first class citizens in Perl, just as much as Arrays and Hashes. This doesn't just mean that the syntax has some niceties, but you can actually call Perl code within the regex itself[1], and even use this feature to build a more complex regular expression as you parse[2]. Complaining that Perl uses a regex and it isn't Perl is sort of like complaining Perl is using hashes, and any fair benchmark between C and Perl should just stick to Arrays.

> Another way of putting this is: your regex wasn't actually an XML parser. Things that are actually XML parsers were slower. This is not too surprising.

Yes. I didn't want to give the impression I wrote a general purpose XML parser that beat all the C implementations I could find. I still think it's interesting that well formed regular expressions are performant enough in this circumstance to make them a preferred alternative among the many options. I could have written a simple parser in C that would have been faster, but the solution I ended up with is quite fast, robust, and very, very easy to debug.

> That would probably mean that SAX bindings for Perl to a C parser would have to take a full SAX attribute hash and turn it into a Perl attribute hash. Still, 10x is pretty bad.

I think it's more related to the fact that the actions of the regex parsing implementation when optimized sufficiently is very close in implementation to C code that steps through a char array looking for the record beginning and ending indicators, and for the content between them, and then steps through the record looking for data items and saving the key and value for each. The main benefit of using a regex in Perl is that I get what is probably a fairly close approximation (all things considered) of a C implementation of that without having to write any C, and without having to marshal data back and forth to a library to accomplish it, with very simple and concise code.

There are other tasks which obviously aren't going to be nearly as efficient in Perl, but this exchange was spurred by someone talking about Perl being very fast for a Perl-type problem, which this definitely is.

1: http://perldoc.perl.org/perlre.html#(%3f%7b-code-%7d)

2: http://perldoc.perl.org/perlre.html#(%3f%3f%7b-code-%7d)


> I really do think it has to do with grep swapping out regex implementations based on features needed.

That wouldn't explain why one regex engine is 5x faster than another. Only looking at the regex engines themselves would tell you that.

> Complaining that Perl uses a regex and it isn't Perl

I'm not complaining that it uses a regex, I'm complaining that it doesn't do anything else.

A representative Perl program would use regexes and contain some logic that processes the results of those regexes.

> I think it's more related to the fact that the actions of the regex parsing implementation when optimized sufficiently is very close in implementation to C code that steps through a char array

I used to think that, but it is really not true unless your regex engine contains a JIT compiler.

Specialized machine code for a text parser (which is what you would get from writing C) is significantly faster than generic NFA/DFA code. In these tests, an average of 65% of runtime was saved when the regex engine included a JIT (ie. the specialized code was over twice as fast): http://sljit.sourceforge.net/pcre.html


> That wouldn't explain why one regex engine is 5x faster than another. Only looking at the regex engines themselves would tell you that.

It could definitely explain it, but it may not be the best explanation given the facts. I'll definitely concede that it's pure conjecture.

> A representative Perl program would use regexes and contain some logic that processes the results of those regexes.

Sure, depending on what you want to show. Nobody is trying to say Perl is as fast or faster than C, just that relatively, it's fast for the development cost it requires.

> > I think it's more related to the fact that the actions of the regex parsing implementation when optimized sufficiently is very close in implementation to C code that steps through a char array

> I used to think that, but it is really not true unless your regex engine contains a JIT compiler.

I think we're referring to different things, which is mostly my fault for being loose with my terminology. I really only meant close to C in a conceptual manner, which yields some performance benefit by keeping a large chunk of the looping and the work of storing specific chunks of text low level and in the interpreter. I wasn't trying to imply the regex engine's cost was negligible or that the actual machine operations were comparable in a large way.


> > regex parsing implementation when optimized sufficiently is very close in implementation to C code

> I used to think that, but it is really not true unless your regex engine contains a JIT compiler.

The P6 Rules engine is written in NQP so it gets JIT'd on the JVM and MoarVM backends.

Of course it'll be many years before the engine is seriously optimized but it's a good start.

It'll be interesting to see if the code gen of this next gen regexen engine gets good enough in 2015 to make its advantages (most notably the grapheme-by-default design) actually pay dividends.


I'm not sure microbenchmarks are typical.


I think they are a pretty good representation of the performance of doing something directly in the language (as opposed to just calling into lower-level libraries written in a different language).


It's sad that this question hasn't actually even been answered yet.

Some things perl does under the hood to be fast: integers are (mostly) actually integers, and arrays are actually arrays (a fact that makes perl's DBI very fast). More importantly, it has had decades of people trying to make it faster without changing the (sometimes crazy) semantics of the language.

But of course languages with JITs have massively overtaken it in the performance stakes (PyPy, LuaJIT, and JavaScript). Mostly I think this is a result of a lack of funding and huge-company spending. Just look at what Facebook have managed to do with PHP.


Also had the same experiences. Perl is relatively fast and memory compact compared to other modern scripting languages.

Perl also has PDL (http://pdl.perl.org/), which is very similar in scope to numpy, but predates it by several years.


> predates it by several years

If you want to get technical, Numpy is a continuation of Numeric which in turn pre-dates pdl.


If you're doing numerical computation Numpy will be extremely fast. Most of it is a thin layer over C code so you get the best of both worlds (unless you're the Numpy maintainers).


> related question is: why is Perl so fast?

Easy answer: it's not. On most benchmarks it's on par with Python, PHP, Ruby, etc.; some are faster on one test, others on another.

> once I got over the "oh my god it's full of whitespace" thing

What full of whitespace? Python has as much (or as little) whitespace as normally indented Perl, Ruby, C, etc.

What it has that they don't have is significant whitespace.


"why is Perl so fast?"

It's not. We tested our own Perl code for doing some text processing at our company.

The result was 89 times slower than our C code, which is not so slow compared with other options, but it is slow.

We use a lot of C++ and Python, but always for managing encapsulated low-level code.

With anything that has to compute intensely, like audio, video or 3D work, the difference is way over 3000 times. That is: the computer working for a second versus an hour.

Java is not fast; it never has been. It is not its "forte". You could decide to use Java or any high-level language based on its merits, but being fast is not one of them.

It is a good idea to learn assembly and disassembly with the debugger before using high level languages. It gives you the knowledge of what computers are really doing under the hood.


"oh my god it's full of whitespace"

When I write in python, this always triggers my mental slogan "python, the language which makes 'nothing' important".

This is said with a trace of irony, as my most fluent language is bash scripting :)


If you only got a 2x speedup there is something wrong. A more usual number for Perl is about 100x slower than C++ for numerical code. Perl isn't very different than Python in this regard. e.g. on the shootout on the n-body benchmark, Perl is about as fast as Python, and 125x slower than C++.


> I re-wrote everything in C++, and picked up less than a factor of two in speed.

I don't think it has been mentioned in the other comments, but this could also be a performance problem with your C++ code, possibly in combination with doing something Perl is good at.


Because it was written to do (certain) things very fast.


Unfortunately this blog post is misleading and uninformed. Python and the other related slow dynamic languages (perl, php, and ruby) are slow not because dynamic type checks, branches and indirections are costly, but because they are badly written. They are just inefficient, poor interpreters, with bad ops and data structures. In comparison, efficient dynamic languages like lisp, scheme, lua or javascript and many others not so well known do exist. Mine, potion, is also in the lua category, and not in the slow PPPR (python, perl, php, ruby) category.

The seminal paper about efficient dynamic interpreters is Ertl's "The Structure and Performance of Efficient Interpreters" http://www.complang.tuwien.ac.at/papers/ertl%26gregg03jilp.p...

The abstract is: "Interpreters designed for high general-purpose performance typically perform a large number of indirect branches (3.2%–13% of all executed instructions in our benchmarks). These branches consume more than half of the run-time in a number of configurations we simulated. We evaluate how accurate various existing and proposed branch prediction schemes are on a number of interpreters, how the mispredictions affect the performance of the interpreters and how two different interpreter implementation techniques perform with various branch predictors. We also suggest various ways in which hardware designers, C compiler writers, and interpreter writers can improve the performance of interpreters."

My own input to this is that the GC, the JIT, the calling convention and especially the size of the data structures and ops matter more than eliminating the type checks.


I think the paper is even worse, in that it spreads confusion about what an Interpreter for a programming language actually is, what it does and what it provides.

The paper fails to define the concept of what to call an Interpreter. A bytecode interpreter is entirely different from a Lisp interpreter, which actually runs from source structures. Mixing VMs into the picture makes it only worse.

Part of the reason may be that they don't understand what an Interpreter actually provides and why it is used.


It seems learning about compiler design is not as common as it used to be.

Better learn about the JavaScript framework of the day.


Is there a sample I can follow that is written in an imperative language? I'm fluent in Python, Pascal, C#, Obj-C, but can survive C if the code is clear...


The numerous Common Lisp implementations out there would agree with you.


Why, oh, why does CPython bother keeping refcounts for small integers? Sure, it lets you make pretty graphs with [sys.getrefcount(i) for i in range(1000)]... but that's an extra memory read and write on every instruction that uses an integer. I can only imagine that not only are these extra instructions, but they're extra instructions that kill pipelining, if the interpreter needed to do "a = 1; b = 1; c = 1" for instance. Maybe they found that the branch needed to not bother refcounting for small integers was even slower? Does anyone know if that was ever tried?


There is no such thing as "small integers" in CPython, only PyObjects, of which PyInt_Type and PyLong_Type are implementations. You can't special case integers without changing every C interface that accepts an object, and special casing every piece of code too (i.e. several hundred million lines of 3rd party extension modules, NumPy, lxml, ...). The trade-off is 1 extra machine word per integer (adding to the 2 + heap/alignment slack already present), or 1 conditional branch per object access everywhere.

This isn't so bad on memory either, since Python keeps a static cache of integers that are reused on every int constructor call. Also allowing for slack, I wouldn't be surprised if the machine word came for free.


When the GP talks about "small integers", I'm assuming he's talking about numbers between -5 and 256, which are cached ahead of time. They're never going to be garbage collected, so refcounting is largely unnecessary.
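You can see the cache from an interactive CPython prompt; a minimal sketch, with statements entered one at a time (inside a single script the compiler may share constants, so the 257 case can differ):

    import sys

    a = 256
    b = 256
    a is b                 # True: both names refer to the one cached 256 object

    c = 257
    d = 257
    c is d                 # False: 257 is outside the cache, so two separate objects

    sys.getrefcount(1)     # a large number: every live reference to the cached 1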


Once again, you can't implement this without specializing every call site that invokes Py_INCREF or Py_DECREF, or in other words, "1 conditional branch per object access everywhere."


You are right, but the cost is probably very low. The conditionality of the branch doesn't matter if it's correctly predicted. On a modern processor, a correctly predicted branch-not-taken is usually indistinguishable from free unless you're already front-end constrained, which is rare for an interpreter. A correctly predicted branch-taken costs the same as an unconditional branch.

This is essentially why a JIT can be almost as fast as compiled code even when including the necessary guards. It's only when the branches become unpredictable that the costs become real. The trick with fast interpreted code is threading the execution so that the hardware is able to predict the branches accurately, but this is largely independent of the number of conditional branches.


Right, so the question becomes whether a branch mispredict is cheaper than simply refcounting an object that already lives in L1 cache (by virtue of having any of its words accessed). For Intel Sandy Bridge, this is 14-18 cycles for the mispredict, or ~4+4 for the L1 word read/write. Now the question becomes how often that mispredict will occur, and there's no real way to measure this short of an implementation.


What you propose sounds like it would be a pure headache for all code which otherwise expects a uniform memory API.

Consider a C extension which takes an object and appends it to a list. If small integers did not have a refcount then that extension would have to have special code, like "if object is not a small integer, then increment the reference count".


I have experience turning a Lisp dialect with refcounted integers into supporting non-refcounted integers, identified by a type tag field in the "value cell" type.

> If small integers did not have a refcount then that extension would have to have special code, like "if object is not a small integer, then increment the reference count".

Easily implemented in one place in the "increment_refcount(obj)" inline function:

    // roughly speaking
    if (is_heap_pointer(obj))
      obj->refcount++;
where "is_heap_pointer(obj)" is an inlined bitmask check like

    (((uintptr_t) obj) & TAG_MASK) == TAG_PTR
If TAG_PTR is zero bits, then a value which satisfies is_heap_pointer can be dereferenced straight.

In a garbage collection implementation, you don't have refcounts, but the garbage collector's "mark object" function does the same check:

   if (!is_heap_pointer(obj))
     return;  // don't try to mark non-heap things

   switch (type(obj)) {
   case TYPE_CONS_CELL:
      mark_obj(obj->cons.car);
      mark_obj(obj->cons.cdr);
      break;
   // ...
   }
Another thing is that you provide an API to the extension writers, which abstracts the use of objects. For instance, you can give them a function that can be called like this:

   value n = number(42);
   value z = number(INT_MAX);
The first case might construct an unboxed value because the integer is small enough. But perhaps the second returns a bignum because INT_MAX requires 32 bits, whereas unboxed integers only go up to 30 bits.

So in the first case you get an object with no refcount, whereas in the second you get an object with a refcount of 1. The extension code is written such that it doesn't care.


Those are excellent points. In the context of CPython, Guido van Rossum made the design decision not to use tagged objects, based on experience with the ABC implementation, which did.

See https://mail.python.org/pipermail/python-dev/2004-July/04614... .

I just realized though that since the patch to support tagged integers is small, a closed system like the CCP Games distribution of Python, which has no extensions they don't control, might be able to use this idea.


I hadn't considered that extensions which needed direct memory access would need access to these objects. But it's a solvable problem. The proposal might read as follows:

"Objects with IDs between 0 and 2055, inclusive, are considered unreleasable. The Python runtime shall initialize the refcount of these objects to 1 when they are first used, and shall make no guarantee that the refcount on such objects accurately reflects the true number of references to that object. Native extensions may increase and decrease the refcount on these objects as if they were normal objects; if those extensions are well-behaved, then the refcount should never decrease below 1. However, native extensions that can statically reason about whether an object is likely to be unreleasable, may benefit in performance by checking for whether objects passed to their functions are indeed unreleasable, and refraining from modifying or checking their refcount in such cases."


Of course it's solvable. My point is that solving it adds a layer of complexity. Now every decref needs a check for "id". As the id is implemented as a pointer, all of these special objects need to be allocated in a contiguous block of memory for this check to be fast, and with two extra if-conditionals. Also, this range can't change without recompiling all extensions ... or that range must be resolved dynamically, adding more overhead for every single decref.

Nor is there much advantage. We know that more advanced implementations can look at program flow and determine that, say, certain operations are only integer based, and therefore unboxable. Or tracing systems can assume that certain types are constant over multiple calls to the same code, and write optimized versions for those types.

These give much better optimizations (e.g., >5x according to PyPy) than the error-prone change that you believe might be faster.


Python also keeps refcounts for other objects that shouldn't be garbage collected such as None, True, and False. The reason being to prevent having certain objects that need to be refcounted and certain objects that don't. Otherwise, C code would be littered with if statements like the below:

    if (number < -5 || number > 256) {
      decrement_refcount(number);
    }


To be fair, you'd expect that to live in decrement_refcount.


Does that also imply that a small integer is created on the heap? I would think the heap allocation would be the huge performance killer.


Python is not going to call malloc(..) for every single object that you create. It will allocate a block of same-sized objects and maintain a linked list of free slots.
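A rough way to glimpse that reuse (a minimal sketch; the address reuse is a CPython implementation detail and not guaranteed):

    x = float("3.14159")     # a new heap float from CPython's pooled allocator
    freed = id(x)
    del x                    # its slot goes back on the float free list
    y = float("2.71828")     # the next allocation typically reuses that slot
    print(id(y) == freed)    # often True on CPython; not guaranteed behaviour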


CPython does a lot of work for just about any piece of code. Just a small handful of things:

1. everything's a hash (object data, variable lookup, etc). jmoiron wrote a great article on the topic http://jmoiron.net/blog/whats-going-on/

2. refcounting introduces overhead for every variable access

3. loops make all kinds of function calls to __iter__(), next(), and catch StopIteration, which makes it hard to have a tight loop (see the sketch after this list).

4. There's the GIL, builtin locking isn't great.

5. lots of indirection (lack of value types)
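To illustrate point 3, a plain for loop behaves roughly like the sketch below (the iteration protocol expressed in Python, not CPython's actual C code), with a method call and an exception check on every pass:

    def run_loop(seq, body):
        # what "for x in seq: body(x)" does under the hood
        it = iter(seq)               # calls seq.__iter__()
        while True:
            try:
                x = next(it)         # calls it.__next__()
            except StopIteration:    # iterators signal exhaustion with an exception
                break
            body(x)

    run_loop([1, 2, 3], print)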


Well, it's hard to write an interpreter with the performance of, say, MRI Ruby.


I'm getting poe'd so I'll check: you're being sarcastic, right?


:3

(do a little research...you might be pleasantly surprised)


Please forgive my ignorance, but why can't Python just be compiled into assembly like C? The compiler should be able to sift through the code and see that "x=1" is an integer, or converts to a float when "x=x+1.0".

If I'm doing something like counting from 1 to 10 and summing the count, why can't this be compiled to run the same speed as C?

Obviously since it's interpreted, but when I'm done with development, why can't I just run it through a compiler to get fast assembly code?

Way back when, you could get a BASIC compiler to turn your interpreted BASIC code into assembly.

What fundamental am I missing?


What you describe is pretty much what functional languages with strong static typing, like Haskell, Scala, and Ocaml do. Works very well for examples like you gave: "x = 1", "x = x + 1.0". For expressions similar to these, inferring the types (int, float) works very well.

But things get much more complex very fast, and then the compiler requires the programmer to tell it what types things are.

Then there is the approach taken by JavaScript engines like V8. Run the code for a while, see what the types of variables are, then compile a customized, much more efficient version of the code to speed things up. Still need a fallback, though, in case a totally different kind of value gets stuck in a variable than the type at the time the optimized code was generated. Nothing in JavaScript (or Python) prevents a programmer from assigning any kind of value into a variable, though, so the engine must be prepared for the types of variables to change in unexpected ways.

(The reason the state of the art for these techniques is in the JavaScript engines is because the organizations behind the major browsers pour so many resources into making them fast.)

Now, if you can figure out a way to always predict the types of every variable in a JavaScript or Python program without actually running it, pretty much any CS program would be willing to give you a tenured position on their faculty on the spot. :)


If Python had the same amount of time and energy put into it that JavaScript gets from Google and Mozilla, I'd be willing to bet it'd be a lot faster.


Like PyPy?


Python is similar in terms of dynamic features to Ruby. I'm actually working on an ahead-of-time, "as static as possible" Ruby compiler.

You certainly can compile languages this dynamic, the problem is that the "simple" way of compiling it doesn't result in very fast code, because you essentially end up mimicking what the interpreters/vms do very closely, and a lot of what ends up being executed will be things like method lookups which are reasonably well optimized in the interpreters.

The problem boils down to the ability to "change the world" at almost any point.

E.g. in Ruby (don't know about Python), you can come back from an eval() call (or load or require) and find that the expression 1 + 1.0 results in the string "two", or will send you an insulting e-mail, because you can even override Fixnum#+.

Now, there's a lot that can be done to infer typing for these types of languages well enough that you can get away with slow-path fallbacks for the rare cases where a user does something stupid (like override Fixnum#+) and run faster most of the time, but it's a lot of extra effort.


> The compiler should be able sift through the code and see that "x=1" is an integer, or converts to a float when "x=x+1.0".

The short version is "no, it shouldn't".

The longer version is that, viewing Python's runtime class memberships as "types", the type system it would represent is not one for which inference without explicit declarations is generally decidable, so an AOT compiler for unmodified, unrestricted Python that is expected to infer runtime types ahead of time simply would not be possible.

You could AOT compile it into C code with a lot of runtime metadata and indirections -- but that would probably have worse performance than the existing interpreter.

> Way back when, you could get a BASIC compiler to turn your interpreted BASIC code into assembly.

Well, yeah, BASIC is actually structurally quite close to assembly. It's not a dynamically-typed multiparadigm language like Python.


It's hard to be worse than an interpreter. 100X slower than compiled code is typical. Whatever 'metadata and indirection' your C target has, it seems likely to be 10X better than interpreted, easily.


Cython is a very mature Python-to-C translator, and in my experience it offers 0-40% speed improvements on straight Python. To get any significant speedups you have to add type declarations, replace Python functions with equivalent C functions from the standard library, turn off bounds checking, and a bunch of similar things. But if you do that, it does produce some pretty fast code.
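For illustration, this is the kind of tight numeric loop (a hypothetical example, shown as plain Python) that Cython barely improves until you add type declarations; otherwise every +, - and index below still goes through generic object dispatch:

    def pairwise_distance_sum(xs):
        # In Cython, declaring e.g. "cdef double total" and "cdef int i, j"
        # (and typing xs as a memoryview) is what unlocks C-like speed here.
        total = 0.0
        n = len(xs)
        for i in range(n):
            for j in range(i + 1, n):
                total += abs(xs[i] - xs[j])
        return total

    print(pairwise_distance_sum([1.0, 4.0, 9.0]))   # 16.0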


> It's hard to be worse than an interpreter.

It's actually quite easy to be worse than an interpreter that is actually a combination of a bytecode compiler and a VM built around the execution model of the language it is for, which is what the main Python interpreter is.


Interpreters have to be implemented somehow; that's a way. It's not fast. It's orders of magnitude slower than native code. Even bad native code.


I believe that's essentially what Cython allows you to do, by way of C. Since the Cython compiler isn't especially advanced, it requires some hints to get good performance, which is why Cython code is usually typed, structured more like C, etc. But still, it executes on your basic idea.


Runtime metaprogramming. In Python you can create new classes and add methods to existing classes at runtime. If you AOT-compile everything to assembly, you have to take away those runtime features, and then it's not Python anymore.
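A tiny sketch of the kind of runtime class mutation an AOT compiler would still have to support:

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    p = Point(3, 4)

    # Add a method to an existing class at runtime; every later attribute
    # lookup has to be prepared for the class to have changed like this.
    Point.norm = lambda self: (self.x ** 2 + self.y ** 2) ** 0.5
    print(p.norm())   # 5.0 -- even instances created earlier pick it up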


No, you don't. You "just" need slower fallbacks that do metadata lookup for things that are not statically decidable.

Consider how many people end up implementing their own introspectable C object models.


I think the very real risk here is that you think you're adding those slower fallback features just for edge cases, but in reality they get used in the majority of cases.

I believe that some code paths critical for performance in Rails use metaprogramming, for example.

I did an experiment in this paper http://dl.acm.org/citation.cfm?id=2647517&dl=ACM&coll=DL&CFI... where I showed that just by rearranging some code you can confuse the Rubinius object allocator into using a slow path for instance variables. That gives you both slow performance and, worse than that, unpredictable performance - the programmer doesn't understand what they've done wrong to trigger that fallback performance.


Why can't a C++ compiler optimize away virtual function tables? If a C++ compiler could deduce the exact derived type of all objects at compile time, it could call the correct virtual function statically instead of going through the extra indirection at runtime.


If you declare a C++ virtual function final the compiler can avoid the virtual call table for any invocations of that function in the future on that object or any of its derivatives. Of course, you then cannot override it in child classes anymore.

If you declare a virtual function inline and it is invoked on a well-defined object (i.e., not through a reference or pointer), you also get a similar effect while still being able to override it, except the code is not a static call but is just inlined with that one implementation of the function.


It can, but not across dynamic libraries.

You just need to use profile guided optimisation.


The problem is that an average interpreter/compiler is going to be "dumb" in that it assumes, when it starts parsing a Python file, that all of the language's features are going to be used.

It provides facilities for all of Python's abilities regardless of what the program is going to do.

Now, there are subsets of Python that people have developed which can be compiled, and all sorts of other stuff.

Beyond that, I'd just say "all this stuff sounds easy but gets much harder when you actually have to program it".

"Why can't it just know!" is the programmer's lament.


Cython, numba, and other projects do this for a subset of the language, but you lose the metaprogramming that makes things like SQLAlchemy possible. You also have to treat most user objects as hashmaps and constantly look up their fields since members can be added or redefined at any time.


Write a function. It accepts two variables. Add them both together and return the result. What type is returned?

Read input from a file or stdin. Eval that input. What type does it return?

Etc.
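A minimal example of the first point - nothing in the function constrains the result type, so an ahead-of-time compiler can't pick one machine representation for it:

    def add(a, b):
        return a + b           # int? float? str? list? depends entirely on the callers

    print(add(1, 2))           # 3 (int)
    print(add(1.5, 2))         # 3.5 (float)
    print(add("foo", "bar"))   # 'foobar' (str)
    print(add([1], [2]))       # [1, 2] (list)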


I don't know the full story either, but I expect it's because lots of things change during runtime. It's too hard to lock things down.


Another reason python is slower than other languages is that abstraction adds significant overhead which most implementations cannot optimize away. See here:

http://blog.reverberate.org/2014/10/the-overhead-of-abstract...


The overhead of abstraction...unless you're using Haskell where typeclass instances get specialized, so you can abstract without slowing things down.


Let's not overstate the case. There's still plenty of abstraction which is tricky or impossible for GHC to optimize. But yes, with static analysis comes the possibility of nice optimizations.


Indeed.

I like the tricky one they're kicking around that would make `lens` faster. Pretty intriguing. I'm hopeful they'll be able to make it work in STG.


The article is not suggesting that abstraction inherently has overhead in all languages. Rather it is exploring the difference between languages where abstractions can be essentially free (in terms of runtime overhead), and languages/implementations where abstraction has a noticeable cost.


Almost all these points apply to javascript as well. Javascript, however, is not really slow anymore.

http://benchmarksgame.alioth.debian.org/u64/benchmark.php?te...


"Javascript, however, is not really slow anymore."

What's easy to lose track of under the steady stream of headlines about this microbenchmark going 5x faster and that microbenchmark going 10x faster and this one microbenchmark being "faster than C" is that Javascript is still a fairly slow language, though. Flick that to compare Javascript vs. C: http://benchmarksgame.alioth.debian.org/u64/benchmark.php?te...

It has come a long way. It is, of course, still usable for many things. But it is still an interpreted language, and my guess is that we've basically plateaued on performance for Javascript, and it's important to remember that it isn't as "fast as C" in any useful sense of the term. And for that matter, to get even that performance out of JS seems to require you to write a very specialized subset of JS that will tickle the optimizers correctly. In practice I suspect you will get even worse deltas over C, which, for all of its many failure modes, does make it often easier to do the fast thing than the slow thing, usually regardless of the correctness of either of them.

(That's why we have asm.js, after all. You can't have both of "Javascript is as fast as C" and "asm.js is several times faster than Javascript", because the composition of both statements is absurd.)


> Almost all these points apply to javascript as well.

Sort of.

The article talks about why CPython, an implementation of the Python language, is slow. You're right that a naive implementation of Javascript would have many of the same problems, and indeed many of the first implementations of JS did have these problems. But due to heated competition, Mozilla, Google, Apple, and Microsoft have invested a huge amount of top-quality engineer-years into optimizing their implementations of JS.

Implementations of Python, in contrast -- and CPython in particular -- have not received the same sort of love.

I largely agree with the notion that the speed of modern JS engines serves as an existence proof that Python could be similarly fast.


that's what PyPy is http://speed.pypy.org/


Previously: https://news.ycombinator.com/item?id=7721096

This is at least 6 months old.


Ah I thought I had seen this before.


> 2. Python is interpreted rather than compiled.

Can we stop saying things like this? Virtually all Python is compiled, as part of the interpretation process.

There is a valid point here, of course. The point is that the top Python compilers do not compile to native code. So say that.

Words have meanings. Use them correctly. <grumble, grumble>


There are three primary methods of running code: interpretation, compilation to object code, and running on a VM/JIT. Obviously Python is interpreted by any meaningful definition of interpretation and thus it is not compiled in the typical sense of the word (at least not in the implementation everyone uses).

The reason human language is so expressive is because we can leave out a lot of context and formalism that is required in mathematics and programming languages because the listener/reader will be able to infer it. It's pedantic to expect a writer to hedge against every possible interpretation, when their focus should be on communicating clearly in the first place.


> It's pedantic to expect a writer to hedge against every possible interpretation, when their focus should be on communicating clearly in the first place.

Our discipline (computer programming) is, perhaps more than any other, one in which precision of meaning is of paramount importance. I expect people to hedge. Especially when it's easy: "Python is not usually compiled to native code."

In any case, since Python is virtually always compiled, saying it's not is simply false. That's hardly clear communication, I think.

> Obviously python is interpreted by any meaningful definition of interpretation ....

Now, here I'm not sure what you mean. You've already distinguished interpretation from Just-In-Time compilation to Virtual Machine code. Since the latter is pretty much always what happens with Python, would you then say that Python is not interpreted? (Serious question!)

Me, I don't have a problem with these concepts overlapping (and I don't see why many people do). Python is interpreted. The usual mechanism for this is JIT compilation to virtual-machine code, which is then -- in the case of CPython -- interpreted as-is. In contrast, something like "C" is traditionally Ahead-Of-Time (AOT) compiled to native code. Interpreted: no. Compiled: yes. And then there are languages whose interpreters do not involve a compilation step -- although they are becoming rare.


FWIW, wikipedia says "The main Python implementation, named CPython, [...] compiles Python programs into intermediate bytecode, which is executed by the virtual machine." I'm not sure why people call Python's VM an interpreter, but it's definitely interpreting byte code, not the source directly.
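You can watch that compilation step with the standard library's dis module; a minimal sketch (the exact opcodes vary between CPython versions):

    import dis

    def add_one(x):
        return x + 1

    # Dump the bytecode CPython compiled the function into; the VM loop
    # then interprets these instructions one at a time.
    dis.dis(add_one)
    # e.g. LOAD_FAST x / LOAD_CONST 1 / BINARY_ADD / RETURN_VALUE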

This is very different from Perl, where a line of code isn't parseable until you know the values of the variables, e.g.

    whatever / 25 ; # / ; die "this dies!";


> This is very different from Perl, where a line of code isn't parseable until you know the values of the variables

Perl is crazy, but I don't think it's that crazy.

From perlcompile(http://perldoc.perl.org/5.8.9/perlcompile.html)

    Perl has always had a compiler: your source is compiled
    into an internal form (a parse tree) which is then optimized
    before being run.
If what you say is true, it would not be possible to produce a parse tree prior to running the program.

I could not get your code sample to die based on the type/value of "whatever" -- can you?


Apparently, Perl is that crazy. See https://news.ycombinator.com/item?id=5770531

Basically, you can't always parse a Perl program without running it. There is a subset that you can, but not everything.


> Basically, you can't always parse a Perl program without running it.

That is true, but what the grandparent said is not: https://news.ycombinator.com/item?id=8626454

To expand on this, I think a succinct way of summarizing the issue is: you might have to run BEGIN blocks in Perl to parse the non-BEGIN-block part of the program. The canonical example of this is:

    BEGIN {
        if(arbitrary_function()) {
            eval "sub whatever() { }; 1" or die $@;
        } else {
            eval "sub whatever { }; 1" or die $@;
        }
    }

    # The parsing of this line depends on the result of arbitrary_function()
    whatever  / 25 ; # / ; die "this dies!";
arbitrary_function() can be anything, so the only way to parse the rest of the program is to actually run arbitrary_function().


[deleted]


I am aware of this result. However, that differs from what GP said in one important way.

Grandparent said "This is very different from Perl, where a line of code isn't parseable until you know the values of the variables."

If that were true, it would imply that you could have a loop whose body is parsed differently on every iteration. It would imply that the parser runs inline with the interpreter. Neither of those things are true.

What your link says is that the parsing can vary based on whether or not a subroutine prototype with appropriate arity was previously defined. That is a very different thing. A statement in a loop could be parsed differently depending on what was defined before it, but that parse doesn't change just because a subroutine is defined later, I'm pretty sure.

Yes, this means Perl's parsing is undecidable, because you can conditionally define a subroutine (this might be limited to BEGIN blocks; I'm not sure). But that's not the same as saying the parse varies based on the "values of the variables."


Did you try running that?

  [13:06:36] ~$perl -e 'whatever / 25 ; die "this dies!";'
  this dies! at -e line 1.
  [13:06:36] ~$

  [13:06:38] ~$perl -e "use strict; use warnings; whatever / 25 ; die "this dies!";"
  Useless use of division (/) in void context at -e line 1.
  Bareword "whatever" not allowed while "strict subs" in use at -e line 1.
  Execution of -e aborted due to compilation errors.
  [13:06:38] ~$
After removing the comment the former parses just fine and died with "this dies!". The latter parsed and found a syntax error.


> FWIW, wikipedia says "The main Python implementation, named CPython, [...] compiles Python programs into intermediate bytecode, which is executed by the virtual machine." I'm not sure why people call Python's VM an interpreter...

Because CPython implements its virtual machine with an interpreter, without the option of, say, a JIT compiler.


Might want to start by asking the core Python community. Check out the first sentence on wiki.python.org.

Nowadays it's the norm for interpreted languages to be JIT-compiled. They still universally call themselves interpreted languages because the jitter is an implementation detail, and the use of one does nothing to negate the characteristics they're trying to advertise when describing themselves as interpreted languages.

Think of it this way: Can I take a text file containing source code in the language and execute it anywhere that has a special program for executing files written in that language (commonly known as an interpreter)? Or do I have to manually perform an intermediate step of converting the program to some binary representation using a special program for doing that (commonly known as a compiler), and then execute the output that program produces?


There are several compiler-only Lisp implementations which can execute source from text. Nobody there would call them interpreters because of that. In Lisp we call an implementation an interpreter when it traverses the source code during execution. If it compiles the source to byte code or machine code, we call it a compiler.

The incremental nature of a compiler does not mean we would call it an interpreter. We would call it an incremental compiler: a compiler which can compile any expression, regardless of the size of the expression, not just 'whole programs'.


> There are several compiler-only Lisp implementations which can execute source from text.

Why not? I would.


Oops. My "Why not?" above is in reference to this:

> Nobody there would call them interpreters because of that.


Why would you call a 'compiler' an 'interpreter'?

The following is Clozure Common Lisp. Every single input is compiled directly to machine code. Feed it source text and everything is compiled. Each and every expression.

Why should I call a direct native code compiler an 'Interpreter'? Makes no sense.

    ? (let ((f (lambda (a)
                 (+ 1 a))))
        (disassemble f)
        (funcall f 2))


    ;;; (lambda (a) (+ 1 a))
    L0
        (leaq (@ (:^ L0) (% rip)) (% fn))       ;     [0]
        (cmpl ($ 8) (% nargs))                  ;     [7]
        (jne L57)                               ;    [10]
        (pushq (% rbp))                         ;    [12]
        (movq (% rsp) (% rbp))                  ;    [13]
        (pushq (% arg_z))                       ;    [16]

    ;;; (+ 1 a)
        (leaveq)                                ;    [17]
        (testb ($ 7) (% arg_z.b))               ;    [18]
        (jne L39)                               ;    [22]
        (addq ($ 8) (% arg_z))                  ;    [24]
        (jo L32)                                ;    [28]
        (repz)
        (retq)                                  ;    [30]
    L32
        (jmpq (@ .SPFIX-OVERFLOW))              ;    [32]
    L39
        (movl ($ 8) (% arg_y.l))                ;    [39]
        (jmpq (@ .SPBUILTIN-PLUS))              ;    [44]

    ;;; #<no source text>
    L57
        (uuo-error-wrong-number-of-args)        ;    [57]

    3 ; result


An interpreter is a box that takes code in some programming language and executes it. A compiler is a box that takes code in some programming language and translates it into another programming language.

Traditionally, the twain ne'er did meet. However, many (most?) modern interpreters include a compiler as part of their implementation. Using a compiler in this manner is "Just in Time" (JIT) compilation.

So a box that takes Clozure Common Lisp and converts it to machine code is a compiler.

A box that takes Clozure Common Lisp, feeds it into this compiler to obtain machine code, and executes this machine code immediately, is an interpreter, using JIT compilation to do its interpretation.


> An interpreter is a box that takes code in some programming language and executes it.

The interpreter executes that language, not another.

> A box that takes Clozure Common Lisp, feeds it into this compiler to obtain machine code, and executes this machine code immediately, is an interpreter, using JIT compilation to do its interpretation.

It's neither an interpreter nor is it using JIT compilation. It's an AOT compiler; it's just an incremental compiler.


"it's slow"

"just write a C extension"

"but then why use Python?"


    "it's slow"
    "just write a C extension"
    "but then why use Python?"

Because you typically need to write a C extension for only about 2% of your code, while you can keep the remaining 98% in an easier-to-write (and more compact) language.


Because the C extension you need has probably already been written.

This is the same kind of network effect that drove Perl usage for so many years. CPAN had it. Even if I didn't want to use Perl, CPAN was overwhelmingly compelling.

Python has since taken on that mantle.


Because 90% of the runtime is spent in 10% of the code.


Because of beautiful control structures and a REPL?


I'm not sure that these are compelling reasons to use Python, given the alternatives.

Squeezing performance out of Python often involves dropping down to C. This isn't the case with the competition.


There are many languages that offer that alongside an AOT compiler to native code.


Well sure -- but I guess considering Python at all suggests you want or need to. Python's control structures and the REPL are _fine_ reasons to use it. To be clear though, I'm a Tcl-er, but I use it in exactly this context. I meant my comment to stand alongside the other sibling comments.


Is it just me who sees a large chunk of these as flaws in the Python implementation as opposed to the Python language?

Dynamic typing can often be optimized at the compilation stage (and yes, Python has a compilation stage); this particular example is basic type inference. Even more complex cases can be optimized by emitting specialized versions for the types the compiler can see the code will be called with, effectively emitting specialized versions of specific nodes in the control flow graph.
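
A toy illustration of what "emit a specialized version guarded by a type check" means (purely conceptual, not CPython internals; in a real JIT the fast path would be unboxed machine code rather than more Python):

    def add_generic(a, b):
        return a + b                        # full dynamic dispatch via __add__

    def add_int_fast(a, b):
        # stand-in for the unboxed machine-integer add a compiler could emit
        return a + b

    def add(a, b):
        if type(a) is int and type(b) is int:   # the guard
            return add_int_fast(a, b)
        return add_generic(a, b)

    print(add(2, 3))        # takes the specialized path
    print(add("a", "b"))    # falls back to the generic path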


Is it just me who sees a large chunk of these as flaws in the Python implementation as opposed to the Python language?

There are a number of design decisions in the Python language that make it inherently hard to write a fast Python implementation.


Ah, the right answer!

There are lots of languages where variables are dynamically typed that can be compiled and optimized with a JIT compiler. In Python, though, any code can mess with any data, using "setattr". You can find all the variables in another module and mess with them at run time, using their names as strings. At compile time, the compiler can't detect that's going to happen.
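
A minimal sketch of the kind of run-time rewiring the compiler can't rule out (using the standard math module purely for illustration):

    import math

    original_sqrt = math.sqrt

    def chatty_sqrt(x):
        print("sqrt called with", x)
        return original_sqrt(x)

    # Rebind a name inside another module at run time, using a string name:
    setattr(math, "sqrt", chatty_sqrt)

    print(math.sqrt(2.0))   # now goes through the replacement
    print(dir(math)[:5])    # and you can enumerate everything the module defines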

This is sometimes called the Guido van Rossum Memorial Boat Anchor.

So Python implementations usually have to assume the worst case. Google's attempt at a faster, compatible Python, "Unladen Swallow", was an embarrassing failure. PyPy manages to get past that, but at a huge cost in JIT compiler complexity. After 12 years of work, PyPy is finally shipping stable versions and starting to get some use. It's been all uphill for the PyPy developers, though.

It's kind of sad. Python never achieved its full potential because of the speed problem. Google ended up developing Go because Python was too slow.


Google ended up developing Go because C++ was too hard to use, not because Python was too slow. The initial target market for Go was C++ programmers - it's only once it was exposed to the Internet that they found out there were more Python programmers interested in a faster Python than C++ programmers interested in an easier C++.


How is python substantially different from javascript in this case? Google has a crazy fast javascript JIT compiler, even though javascript has something equivalent to python's setattr().


> even though javascript has something equivalent to python's setattr().

Python provides significantly more hooks into the behaviour of arbitrary objects, and arguably more flexibility than JS.

Here are a few notes from Mike Pall (LuaJIT) on Python features making a straight interpreter slower, and the language harder to optimise (JIT tightly): http://lua-users.org/lists/lua-l/2004-11/msg00083.html
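
As a short sketch of two of those hooks: __getattr__ and __setattr__ can run arbitrary code on any attribute read or write, so a JIT can assume very little about what obj.x means:

    class Flexible:
        def __getattr__(self, name):
            # called for any attribute that isn't found the normal way
            return "computed:" + name

        def __setattr__(self, name, value):
            # intercepts every attribute write
            object.__setattr__(self, name.upper(), value)

    obj = Flexible()
    print(obj.anything_at_all)   # "computed:anything_at_all"
    obj.speed = 10
    print(obj.SPEED)             # 10, because __setattr__ rewrote the name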

Animats is full of shit; the ability to add arbitrary "static" attributes to existing objects is really not the main problem.


He said "a large number of" though, not "all".


You are technically correct. A Sufficiently Smart Compiler(TM) could theoretically optimize Python code.

However, certain language features can be inherently difficult to optimize. Most of it is over my head, but headius has written some great articles about his experience implementing Ruby on the JVM. http://blog.headius.com/2012/09/avoiding-hash-lookups-in-rub...


Pharo 3's result (JIT-enabled VM):

    [(1 to: 100000000) sum] timeToRun   "=> 0:00:00:07.335"

http://pharo.org

Version 4, with a new VM, is due in 2015.

It will perform much better: with the range used here, the 32-bit VM must promote to LargeInteger objects, while on the 64-bit VM everything fits in SmallIntegers.


    > pypy -mtimeit 'sum(xrange(1, 100000001))'
    10 loops, best of 3: 153 msec per loop

However, the context of TFA is scientific computing, hence PyPy being ignored/dismissed.



Having an FFI is not relevant; PyPy already has an excellent FFI [0].

You can't interface with the existing SciPy ecosystem, which expects the CPython C API, and that is why PyPy doesn't matter in scientific computing.

[0] inspired by LuaJIT and usable from both CPython and PyPy: https://cffi.readthedocs.org/en/release-0.8/
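
For reference, a minimal sketch of cffi's ABI-level API (should work on both CPython and PyPy; assumes a Unix-like system where dlopen(None) exposes the standard C library):

    from cffi import FFI

    ffi = FFI()
    ffi.cdef("size_t strlen(const char *s);")   # declare the C signature we want
    C = ffi.dlopen(None)                        # load the standard C library

    print(C.strlen(b"hello pypy"))              # -> 10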


One of my favorite topics lately is Python optimization. A few weeks ago FB put out a release on how they had done fast randomized SVD, and in the implementation section they mentioned using Intel's Math Kernel Library (MKL) for BLAS. Largely as a result, many more functions are F-contiguous rather than C-contiguous (in linear algebra, Fortran obviously wins here). I had done something similar on a much smaller scale about 4 months ago. https://medium.com/@_devbob/from-0-to-warp-speed-b780a2bc36c...
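
A quick illustration of the layout difference (NumPy defaults to row-major storage, while Fortran-based BLAS/LAPACK routines prefer column-major):

    import numpy as np

    a = np.ones((1000, 1000))          # default layout: C-contiguous (row-major)
    f = np.asfortranarray(a)           # copy into Fortran (column-major) order

    print(a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'])   # True False
    print(f.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])   # False True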

Does anyone else have some good bits or blog entries on these sorts of inner workings and optimizations?


Even if you're already familiar with all of the main reasons that answer the question (that is, you're already familiar with how a VM works), read the "Just for fun: a few 'never use these' hacks" section. I mean, wow. That's the kind of nastiness I normally associate with C's free-for-all memory model.


With the nuance that C forces you to deal with that kind of "never use these" hackery in order to do anything non-trivial.

I can't see why anyone would go and mess with CPython's internals like that in production code. Other than for nefarious purposes, I guess.


In simple language -> because it's difficult to provide the machine with the information it needs to make decisions about efficiency.


Python's memory management is reference counting plus a non-moving cycle collector, as opposed to Java's precise moving collector => no GC compaction at all.


Short answer: because it doesn't JIT


That's a lossy short answer. LuaJIT with JIT turned off still runs circles around CPython.


Even normal Lua is fast compared to Python, if the shootout benchmarks are at all accurate.

http://benchmarksgame.alioth.debian.org/u64/benchmark.php?te...


You are told how the measurements were made -- Are the benchmarks game measurements accurate?

I think these questions are more to the point: Were the Python programmers as skilled as the Lua programmers? Do the programs do anything like the things your programs do?


True.

So, with reason 1 ruled out (Lua is also dynamic) and reason 2 ruled out by your evidence, what remains is:

3. Python's object model can lead to inefficient memory access
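
A rough way to see point 3 in action (illustrative only; small ints are cached by CPython, so the sum below is just an approximation of per-element overhead):

    import sys
    import numpy as np

    print(sys.getsizeof(12345))             # ~28 bytes for one int on 64-bit CPython

    xs = list(range(1000))                  # a list of pointers to boxed int objects
    arr = np.arange(1000, dtype=np.int64)   # one contiguous buffer of raw 8-byte ints

    print(sys.getsizeof(xs) + sum(sys.getsizeof(x) for x in xs))
    print(arr.nbytes)                       # 8000 bytes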


I don't understand why it is compared to C, when it could be compared to Perl or PHP, which are also dynamically typed, interpreted, etc., but much faster.


1. Because TFA is about scientific computing

2. And in that context (and most others), the difference between Perl, PHP and CPython is basically non-existent: you might get 2x on one benchmark and half on another


Heh, Python is slow: in order to write fast Python, you have to write in C and conform to the FFI.

I vastly prefer to use tools which don't come misshapen out of the box, personally.


Different tools for different jobs. Being able to develop code quickly is a huge win that Python delivers. Sometimes that makes Python the right tool, e.g. YouTube [0]. Other times slow execution speed makes it the wrong choice.

[0] http://www.gooli.org/blog/youtube-runs-on-python/


Other tools also deliver code quickly; that's an evasion of the criticism.



