

Python: New patch may give speedups of up to 20% - ks
http://svn.python.org/view?rev=68924&view=rev

======
lunchbox
Although this is a potentially exciting development, I dislike when people use
phrases like "speedups of _up to_ 20%." Quoting only the best-case scenario is
misleading, since it says little about the average case. What if that 20%
speedup only occurs 2% of the time, and the other 98% of the time, you have a
<2% improvement? I understand that people want to promote and be recognized
for their work by referencing the best cases, but it would be more helpful if
they quoted typical results, like the range of speedups achieved on the
benchmarks.

~~~
dfox
Interpreter speedups of this kind (and probably almost any kind) tend not to
have a significant impact on most real-world workloads, but they are
nonetheless very important in the long run. It is thus counterproductive to
measure the impact of these changes on any realistic benchmark, because it
will be lost in the noise. The phrase "on various benchmarks" seems adequate
for a commit message.

~~~
lunchbox
I think this speedup (and any optimization) is important in the long run; I
just dislike when numbers are used without being put in context. I see this in
the media all the time. Like when they say that taking [X
supplement/drug/food] can reduce your risk of cancer by up to 20%, and then
you find out in the fine print that it's only if you're over 65, female, over
200 pounds, have a specific rare genetic mutation...

~~~
dfox
Numbers without context are certainly misleading, but my opinion is that a
commit message is not the right place to specify when such a speedup applies,
because for the expected readers that is either obvious or irrelevant (or both
at once).

Another question is whether this patch is really so interesting to HN. It is
pretty much insignificant for real applications, and I'd consider the computed
goto trick well known in the circles where it might be useful for something
(which means interpretive VMs and not much else).

~~~
lunchbox
Agreed on both points.

------
jd
When you look at the bottlenecks in opcode throughput for virtual machines,
it's almost always lousy branch prediction. Using a branch table instead of a
switch block is not going to make much of a difference in any realistic
program. The CPU has no idea which function is going to get called next, so it
can never fully use all its advanced look-ahead mechanisms.
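For readers unfamiliar with the two dispatch styles being compared, here is a minimal sketch. This is not CPython's actual code; the three-opcode stack machine is invented for illustration, and the computed-goto version relies on the GCC/Clang "labels as values" extension (`&&label`), which is what the patch uses:

```c
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Computed-goto dispatch: each handler ends with its own indirect
 * jump, so the CPU gets one branch site per opcode to predict. */
static int run_goto(const int *code)
{
    static void *dispatch[] = { &&op_push, &&op_add, &&op_halt };
    int stack[16];
    int sp = 0;

    goto *dispatch[*code++];

op_push:
    stack[sp++] = *code++;            /* operand follows the opcode */
    goto *dispatch[*code++];
op_add:
    sp--;
    stack[sp - 1] += stack[sp];
    goto *dispatch[*code++];
op_halt:
    return stack[sp - 1];
}

/* Switch-based dispatch: every opcode funnels through the single
 * indirect branch at the top of the loop. */
static int run_switch(const int *code)
{
    int stack[16];
    int sp = 0;

    for (;;) {
        switch (*code++) {
        case OP_PUSH: stack[sp++] = *code++; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        }
    }
}
```

Both loops compute the same results; the argument in this thread is only about whether spreading the indirect jump across many sites helps the branch predictor in practice.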

For those interested, look at all the performance improvements made in the
OCaml, Perl, and CLISP interpreters. There is a lot of low-hanging performance
fruit left in the Python interpreter; the question is whether it's worth
plucking. For instance, Python could get a dedicated accumulator (I don't
think it has one now), and you can introduce new opcodes that combine the
functionality of frequently occurring opcode sequences. For example, Python
now has LOAD_CONST and BINARY_ADD instructions, but not a LOAD_CONST&ADD
instruction like CLISP.

By combining the functionality of two instructions into one, you're
essentially saving an opcode dispatch. Saving a jump is going to make a bigger
difference than making a jump faster. So that's where I think the Python guys
should focus if they really want to improve performance.
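A sketch of that "superinstruction" idea, on a toy stack machine invented for illustration (the opcode names echo CPython's, but this is not CPython code): the frequent pair LOAD_CONST; BINARY_ADD is also available as one fused opcode, which saves one trip through the dispatch loop every time it occurs.

```c
enum { LOAD_CONST, BINARY_ADD, LOAD_CONST_ADD, HALT };

static int execute(const int *code)
{
    int stack[16];
    int sp = 0;

    for (;;) {
        switch (*code++) {
        case LOAD_CONST:
            stack[sp++] = *code++;        /* push the inline operand */
            break;
        case BINARY_ADD:
            sp--;
            stack[sp - 1] += stack[sp];   /* pop two, push the sum */
            break;
        case LOAD_CONST_ADD:
            /* Fused: add the inline operand to the top of the stack,
             * doing the work of the two opcodes above in one dispatch. */
            stack[sp - 1] += *code++;
            break;
        case HALT:
            return stack[sp - 1];
        }
    }
}
```

A peephole pass over the bytecode would rewrite each LOAD_CONST/BINARY_ADD pair into the fused opcode; both encodings compute the same value.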

~~~
kragen
Last I remember, this patch was purported to make a big difference precisely
because it improves branch prediction, by giving the CPU a bunch of different
possible sites to jump from.

------
adamsmith
This surprises me because it's the second of its kind in the past couple of
months. (There was another patch that helped CPUs exercise their
branch-prediction capabilities.)

I would have guessed that this phase in a language's evolution would come much
earlier in the adoption curve. ...like, five years ago for Python.
Interesting!

Here's to further optimizations!

~~~
gojomo
This appears to be the commit of that same patch; see msg80515 at:

<http://bugs.python.org/issue4753>

