
If PyPy is 6.3 times faster than CPython, why not just use it? - neokya
http://stackoverflow.com/questions/18946662/if-pypy-is-6-3-times-faster-than-cpython-why-not-just-use-faster-interpreter
======
seiji
[actual real code story example]

I wrote two approaches to the same problem.

The first approach uses simple python data structures and greedy evaluation.
It runs under CPython in 0.15 seconds. Running under pypy takes 1.2 seconds.
pypy is 8x slower.

The second approach (using the same data) builds a big graph and visits nodes
v^3 times. Running under CPython takes 4.5 seconds. Running under pypy takes
1.6 seconds. pypy is almost 3x faster.

So... that's why. "It depends." But—it's great we have two implementations of
one language where one jits repetitive operations and the other evaluates
straight-through code faster.
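seiji doesn't show the code, but a pass that visits nodes v^3 times has roughly the shape of Floyd-Warshall all-pairs shortest paths; the sketch below is a hypothetical stand-in, not the original program. Tight nested loops over plain integers like this are exactly the kind of repetitive work a tracing JIT rewards.

```python
# Hypothetical stand-in for the "visit nodes v^3 times" workload:
# Floyd-Warshall all-pairs shortest paths -- three nested loops over
# plain integers, the kind of hot code a tracing JIT speeds up.
INF = float("inf")

def floyd_warshall(n, edges):
    """edges: iterable of (u, v, weight) tuples; returns the distance matrix."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in range(n):          # v^3 node visits in total
        for i in range(n):
            for j in range(n):
                d = dist[i][k] + dist[k][j]
                if d < dist[i][j]:
                    dist[i][j] = d
    return dist

if __name__ == "__main__":
    d = floyd_warshall(3, [(0, 1, 5), (1, 2, 2), (0, 2, 9)])
    print(d[0][2])  # 7: going 0 -> 1 -> 2 beats the direct edge
```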

~~~
burntsushi
I have to echo this sentiment here. Every time I see a post about PyPy being
fast, I think, "Hmm, perhaps I should try out this package I'm working on and
see if it performs better." After getting a PyPy environment working---
sometimes by installing forks that are PyPy compatible---I almost always end
up with real-world uses that are noticeably slower with PyPy than with
regular ol' CPython.

I may not be coding to PyPy's strengths, but I've gone through this process on
several different packages that I've released and I tend to see similar
results each time. I want to use PyPy to make my code faster, but it just
doesn't seem to help with the real code I'm running.
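One way to see why short or one-pass real-world runs can come out slower under PyPy is to watch warm-up inside a single process. Below is a minimal harness with a made-up workload (`work` is hypothetical, standing in for real package code):

```python
# A minimal harness for spotting JIT warm-up: run the same function many
# times in one process and compare early iterations with late ones.
# Under PyPy the first iterations also pay for tracing and compilation;
# under CPython the times stay flat.
import time

def work(n=1000):
    # Hypothetical workload standing in for real package code.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timings(reps=50):
    out = []
    for _ in range(reps):
        t0 = time.perf_counter()
        work()
        out.append(time.perf_counter() - t0)
    return out

if __name__ == "__main__":
    ts = timings()
    print("first 5:", ["%.2e" % t for t in ts[:5]])
    print("last 5: ", ["%.2e" % t for t in ts[-5:]])
```

Run it under both interpreters: under PyPy the first few iterations typically carry the tracing and compilation cost before the later ones drop; under CPython the numbers stay roughly flat.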

~~~
kingkilr
Please file bugs. We can't fix issues we don't know exist.

------
haberman
> Because PyPy is a JIT compiler it's main advantages come from long run times
> and simple types (such as numbers).

It is not _inherent_ to JIT compilers that they need long running times or
simple types to show benefit. LuaJIT demonstrates this. Consider this simple
program that runs in under a second and operates only on strings:

    
    
      vals = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o"}
    
      for _, v in ipairs(vals) do
        for _, w in ipairs(vals) do
          for _, x in ipairs(vals) do
            for _, y in ipairs(vals) do
              for _, z in ipairs(vals) do
                if v .. w .. x .. y .. z == "abcde" then
                  print(".")
                end
              end
            end
          end
        end
      end
    
      $ lua -v
      Lua 5.2.1  Copyright (C) 1994-2012 Lua.org, PUC-Rio
      $ time lua ../test.lua 
      .
      
      real	0m0.606s
      user	0m0.599s
      sys	0m0.004s
      $ luajit -v
      LuaJIT 2.0.2 -- Copyright (C) 2005-2013 Mike Pall. http://luajit.org/
      $ time ./luajit ../test.lua 
      .
      
      real	0m0.239s
      user	0m0.231s
      sys	0m0.003s
    

LuaJIT is over twice the speed of the (already fast) Lua interpreter here for
a program that runs in under a second.

People shouldn't take the heavyweight architectures of the JVM, PyPy, etc. as
evidence that JITs are _inherently_ heavy. It's just not true. JITs can be
lightweight and fast even for short-running programs.

EDIT: it occurred to me that this might not be a great example because LuaJIT
isn't actually generating assembly here and is probably winning just because
its platform-specific interpreter is faster. _However_ it is still the case
that it is instrumenting the code's execution and paying the execution costs
associated with attempting to find traces to compile. So even with these JIT-
compiler overheads it is still beating the plain interpreter which is only
interpreting.
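To illustrate where that instrumentation cost lives, here is a toy sketch of hot-code detection in Python: a counter that promotes a function to a cached "fast path" once it crosses a threshold. It is only a cartoon of the idea; real tracing JITs like LuaJIT record the operations executed in a hot loop and emit machine code for them.

```python
# Toy sketch of hot-loop detection: count how often a function runs and
# swap in a cached "fast path" once it crosses a threshold. The counting
# on every call is the instrumentation overhead a tracing JIT pays even
# before it compiles anything.
HOT_THRESHOLD = 10

def hot_counting(fn):
    state = {"count": 0, "compiled": None}
    def wrapper(*args):
        if state["compiled"] is not None:
            return state["compiled"](*args)      # "compiled" fast path
        state["count"] += 1                      # instrumentation cost
        if state["count"] >= HOT_THRESHOLD:
            state["compiled"] = fn               # stand-in for codegen
        return fn(*args)
    return wrapper

@hot_counting
def square(x):
    return x * x
```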

~~~
kingkilr
PyPy also manages to speed this program up (or at least, what I understand
this program to be):

    
    
        Alexanders-MacBook-Pro:tmp alex_gaynor$ time python t.py
        .
    
        real    0m0.202s
        user    0m0.194s
        sys 0m0.007s
        Alexanders-MacBook-Pro:tmp alex_gaynor$ time python t.py
        .
    
        real    0m0.192s
        user    0m0.184s
        sys 0m0.008s
        Alexanders-MacBook-Pro:tmp alex_gaynor$ time python t.py
        .
    
        real    0m0.198s
        user    0m0.190s
        sys 0m0.007s
        Alexanders-MacBook-Pro:tmp alex_gaynor$ time pypy t.py
        .
    
        real    0m0.083s
        user    0m0.068s
        sys 0m0.013s
        Alexanders-MacBook-Pro:tmp alex_gaynor$ time pypy t.py
        .
    
        real    0m0.083s
        user    0m0.068s
        sys 0m0.013s
        Alexanders-MacBook-Pro:tmp alex_gaynor$ time pypy t.py
        .
    
        real    0m0.082s
        user    0m0.067s
        sys 0m0.013s
        Alexanders-MacBook-Pro:tmp alex_gaynor$ cat t.py
        def main():
            vals = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o"}
    
            for v in vals:
                for w in vals:
                    for x in vals:
                        for y in vals:
                            for z in vals:
                                if v + w + x + y + z == "abcde":
                                    print(".")
    
        main()

~~~
hnriot
wow, so much faster than lua

~~~
kingkilr
You can't compare measurements taken on different computers; for all you
know the OP has a potato and I have a speed-demon toaster.

~~~
mistercow
Don't knock potatoes. I do all of my hardcore data analysis on my Very Large
Potato Array.

------
kbuck
I actually experimented a while ago by running a long-running Twisted-based
daemon on top of PyPy to see if I could squeeze more speed out. PyPy did
indeed vastly increase the speed versus the plain Python version, but once I
discovered that Twisted was using select/poll by default and switched it to
epoll, my performance issues with the original CPython version were gone (and
PyPy couldn't use Twisted's epoll at the time).
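The select-vs-epoll gap described here is visible from the stdlib alone: `select()` scans every watched descriptor on each call, while epoll/kqueue have the kernel hand back only the ready ones. Python's `selectors` module picks the best mechanism for the platform (e.g. `EpollSelector` on Linux), which is the switch kbuck had to make by hand in Twisted. A small self-contained sketch:

```python
# selectors.DefaultSelector() chooses the best readiness API available
# (epoll on Linux, kqueue on BSD/macOS, falling back to select),
# avoiding select()'s linear scan over every watched descriptor.
import selectors
import socket

def wait_for_ping():
    """Register one end of a socketpair, write to the other, and collect
    what the selector reports as ready."""
    sel = selectors.DefaultSelector()   # e.g. EpollSelector on Linux
    a, b = socket.socketpair()
    sel.register(a, selectors.EVENT_READ)
    b.send(b"ping")                     # make `a` readable
    events = sel.select(timeout=1.0)
    data = [key.fileobj.recv(4) for key, _mask in events]
    sel.close()
    a.close()
    b.close()
    return data
```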

Another major issue was that running the daemon under PyPy used about 5 times
the memory that the CPython version did. This was a really old version of
PyPy, though, so they have probably fixed some of this memory greediness.

~~~
alexk
What version of Twisted and which OS were you using? I'm asking because all
recent Twisted releases use epoll by default.

~~~
kingkilr
It's worth noting that PyPy also supports epoll (and kqueue), and has for a
few versions.

~~~
kbuck
I remember looking at that, but Twisted's epoll reactor was a C extension at
the time. It looks like Twisted 12.1.0 switched to using the epoll provided by
the Python standard library, but that was released about a year after I was
originally installing this daemon (and I was installing everything from apt,
so add another year to the age of the packages I got).

------
RamiK
Because CPython came first.

Because Python isn't about performance.

Because it's not really 6.3 times faster for most (any?) use cases.

Because VMs are misunderstood as superfluous abstractions where a good
interpreter should be instead.

Because VMs are understood as superfluous abstractions where a good OS should
be instead.

...

And most of all, because better hardware costs less than the extra man-hours
involved in the transition.

~~~
sanxiyn
Re: any? use cases. While dramatic speedup is not too common, MyHDL, a
hardware description and verification language written in Python, is known to
run 10 times faster on PyPy.

[http://www.myhdl.org/doku.php/performance](http://www.myhdl.org/doku.php/performance)

I also remark that MyHDL's simulation is competitive (on PyPy) with the open
source Icarus Verilog, in case you wonder why anyone would write HDL in
Python.

------
dmk23
PyPy has LOTS of problems with 3rd-party libraries. If you want to deploy it
in production you'll have to check that each one of them does exactly what
you need it to, and oftentimes you'll be surprised to find how broken things
are.

We are using PyPy for some of our services (where it runs about 3x faster
than CPython), while for some others (Django UI, at least the way we are
using it) we found that PyPy is actually slower, so we are sticking with
CPython.

Unfortunately the PyPy team has not made it a priority to test PyPy with
Django. It is one thing to have a cookie-cutter test suite that measures
simple use cases; it is an entirely different matter to test how well it can
run a whole stack of apps on top of it.

~~~
easytiger
Why don't you help out then, instead of complaining?

~~~
louhike
I do not understand this kind of comment. Even if Open Source is a great
thing, some people just want to use the tool. They are not interested or do
not have the time to participate in the project.

~~~
easytiger
Then you have no right to complain.

Much open source exists because people, in the course of scratching their
own itch, happen to release something that might be of use to others.

When others contribute it helps make the solution more generic/robust as it is
guided towards meeting multiple requirement sets.

Do you have any idea how shit it was writing software in the 1990s without the
breadth of tools we have today that are open source and permissively licensed?

PyPy is a smallish project with very limited funding. If you try it out and
it doesn't help you meet your goals, find another way. That might be making
it better somehow; if you don't have the resources for that, find some other
way to meet your business goals.

~~~
kingkilr
Please stop replying to people like this. It's extremely discouraging to
people, and not helpful to the PyPy project (of which I'm one of the
developers).

People have a right to have a problem with our software without trying to fix
it themselves.

------
Elv13
I would like to point out that while CPython's reliance on C seems to be the
problem here, that problem is not inherent to using C either. Again, Lua vs.
LuaJIT proves that a JIT implementation can be a drop-in replacement for
another (non-JIT) one.

On Gentoo, I tend to force applications to link against LuaJIT and it works
just fine.

This message was written in Luakit under AwesomeWM, and my alt-tab shows VLC,
Wireshark and MySQL Workbench all running on LuaJIT with some level of
success (most are flawless). None of those applications (AFAIK) officially
supports LuaJIT.

~~~
snogglethorpe
LuaJIT isn't a perfect drop-in, however, as it has various limitations that
base Lua doesn't (in addition to the obvious ones if you're using Lua 5.2
features, which LuaJIT doesn't support).

In my case it's because LuaJIT has address-space limitations that standard
Lua does not, due to its use of NaN-encoding for pointers. There are some
inputs where LuaJIT simply runs out of memory (or rather, address space) that
work fine when run under standard Lua.

[For my app the speedup from LuaJIT isn't so great anyway, so it's just a
minor annoyance.]
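For anyone unfamiliar with the NaN-encoding mentioned above: a 64-bit IEEE-754 NaN leaves 51 mantissa bits free, so a VM can smuggle a pointer (plus a type tag) into a value that is otherwise an ordinary double, which is why the pointer must fit below 2^47 and the usable address space is capped. A Python sketch of the trick (the bit layout here is illustrative, not LuaJIT's actual one):

```python
# Sketch of NaN-tagging: hide a small "pointer" in the payload bits of a
# quiet NaN and recover it by reinterpreting the float's raw bits. This
# is why NaN-boxing VMs require pointers to fit in the low 2^47 bytes of
# the address space. Illustrative layout only, not LuaJIT's real one.
import struct

QNAN = 0x7FF8_0000_0000_0000          # quiet-NaN bit pattern
PTR_MASK = (1 << 47) - 1              # payload room for a 47-bit pointer

def box_pointer(ptr):
    if ptr > PTR_MASK:
        raise ValueError("pointer does not fit in the NaN payload")
    bits = QNAN | ptr
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def unbox_pointer(value):
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & PTR_MASK

boxed = box_pointer(0x1234_5678)
assert boxed != boxed                 # the boxed value really is a NaN
assert unbox_pointer(boxed) == 0x1234_5678
```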

~~~
dmpk2k
_Lua 5.2 features, which LuaJIT doesn't support_

Just being pedantic: LuaJIT supports some 5.2 features. Search for "5.2" on
this page:
[http://luajit.org/extensions.html](http://luajit.org/extensions.html)

------
tomrod
It's a great question. I'd say for myself I've been hesitant to use CPython
or PyPy simply because their documentation seems aimed at the extremely
technical, rather than at someone just trying it out for the first time.

I know Python, and I know C. But I'm worried about ending up down rabbit
holes in PyPy and its competitors. I've not been able to find a really solid
tutorial or parse the docs very well.

Perhaps it's just me, though. That's always possible. I just see a large
barrier to adoption.

~~~
OseOse
When you say CPython, do you mean Cython? I've only recently learned what
"Python" really is myself, and it's easy to miss the difference between these
two:

[http://en.wikipedia.org/wiki/CPython](http://en.wikipedia.org/wiki/CPython)

[http://en.wikipedia.org/wiki/Cython](http://en.wikipedia.org/wiki/Cython)

~~~
tomrod
I meant Cython, yes.

------
robert-zaremba
I'm successfully using PyPy in production for data processing. The most
important dependencies: the redis driver and Beautiful Soup.

    
    
             PyPy   CPython
      jobs/sec    ~60      ~8
      mem usage   1.5G     2G
    

When using lxml on CPython, jobs/sec increased to 10 (at that time lxml
wasn't supported by PyPy; now it is). I really encourage you to give PyPy a
try.
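For a concrete picture of what one such job might look like (hypothetical; the actual pipeline used BeautifulSoup and a redis driver), here is a dependency-free stand-in using the stdlib's `html.parser` to pull link targets out of a document. Pure-Python parsing loops like this are where PyPy's JIT tends to pay off.

```python
# Hypothetical stand-in for one "job" in an HTML-processing pipeline:
# parse a document and collect the href of every <a> tag, using only
# the stdlib so the shape of the work is visible without BeautifulSoup.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```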

------
z3phyr
Will CPython always remain the reference implementation?

