
Nuitka — A Python Compiler - lehmannro
http://kayhayen24x7.homelinux.org/blog/nuitka-a-python-compiler/what-is-nuitka/
======
zachbeane
I sometimes see people ask about translating a language like Python to Common
Lisp (or another language that can be compiled) as a kind of optimization.

The problem, in general, isn't that Python and languages like it don't have a
compiler, it's that the semantics of the language are hostile to good
performance by traditional means of compilation. To do what the programmer
requests requires doing things at runtime that are hard to make fast. That's
why things like tracing JITs are being used for things like JavaScript.
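
A minimal sketch of that point (not from the thread): even a plain `a + b` in Python dispatches dynamically and can change meaning while the program runs, so an ahead-of-time compiler cannot safely specialize it.

```python
# Illustration of runtime semantics that resist static compilation:
# the meaning of `a + b` can be changed after the code is compiled.

class Num(object):
    def __init__(self, n):
        self.n = n

    def __add__(self, other):
        return Num(self.n + other.n)

a, b = Num(1), Num(2)
print((a + b).n)  # addition: 3

# Monkey-patch the class at runtime; every existing `a + b` site
# now means multiplication instead.
Num.__add__ = lambda self, other: Num(self.n * other.n)
print((a + b).n)  # same expression, now 2
```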

The speedup you get from actually compiling Python programs is because the
CPython interpreter is pretty awful, not because compilation is a magic
solution to performance problems. The IronPython guy gave a nice explanation
of this at OOPSLA 2007's Dynamic Languages Symposium; maybe things have
changed in CPython since then.

~~~
Goladus
Compilation also helps distribution, though. Distributing a single compiled
binary for a particular platform is a lot easier than telling people they need
to have some particular version of Python and libraries installed.

~~~
zachbeane
I hear that idea put forth sometimes. When was the last time you downloaded a
single compiled binary from someone? I don't think I've ever done that, except
maybe for darcs.

~~~
petsos
Downloading compiled binaries from someone is what you usually do on Mac and
Windows.

~~~
zachbeane
I've never done that; I always see either dmgs (on Mac) or installers of some
sort (on Mac and Windows). The thing you download is a single file, but they
expand into a lot of files. The Mac makes it look nicer with the .app idea,
but you're still getting a bunch of files in there, one of which might be an
interpreter for some of the other files.

~~~
endtime
You think an archive doesn't contain compiled binaries simply because it
contains more than one file?

~~~
zachbeane
I think compilation to a single binary is not a big advantage when
distributing to others. I think that because I've rarely seen anyone
distribute software that way. If it's a widespread practice in some circles,
I'd like to know more about the circumstances.

------
truiu
The licence (GPLv3) limits its use a bit - at least for people who prefer
other licences like BSD or MIT.

The generated C++ source contains the following comment:

    // This code is in part copyright Kay Hayen, license GPLv3. This has the
    // consequence that your must either obtain a commercial license or also
    // publish your original source code under the same license unless you
    // don't distribute this source or its binary.

~~~
mycroftiv
I'm confused by this. I thought the license used by a compiler had no effect
on the licenses that could be used for programs compiled by it. If the author
of Nuitka is claiming that software compiled by Nuitka is in fact a derivative
work of Nuitka, that is indeed very problematic.

~~~
reitzensteinm
That was my immediate assumption too, but part of the compiler's output is
going to be some kind of runtime library. If that library is itself GPLv3 and
the code generated by the compiler statically links to it, then I'm pretty
sure a case could be made that the compiler's output is a derivative work.
Kind of sneaky and non-intuitive, though.

This is all way out of my area of expertise, so take it with a grain of salt.

~~~
swolchok
This issue is why bison's license contains a special exception for the part(s)
of itself it includes in its output.
[http://www.gnu.org/software/bison/manual/html_node/Condition...](http://www.gnu.org/software/bison/manual/html_node/Conditions.html)

------
stygianguest
Personally I have more faith in JITs for dynamic languages such as Python. It
just seems a more natural match. That said, I'm sure there are many Python
programs out there that are essentially static.

Did anybody else notice the large number of compilers/interpreters/tools built
for Python in comparison to many other languages out there? I think it might
partly be the advantage of having an easy-to-parse language with well-defined
semantics.
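
For what it's worth, part of that low barrier is that CPython ships its own parser in the standard library, so a tool author can get a full syntax tree in a couple of lines (a small illustration, not from the thread; Python 3 syntax):

```python
# Python exposes its own parser via the stdlib `ast` module, which is
# one reason building compilers and analysis tools for it is easy.
import ast

tree = ast.parse("num_primes += 1")

# The program is just a tree of node objects that any tool can walk.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```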

~~~
vanschelven
"I think it might partly be the advantage of having an easy-to-parse language
with well-defined semantics."

Either that, or the combination of a popular language and poor performance.

~~~
pbiggar
I think you're right about the popular-language part. But it's also important
that Python is popular amongst really talented hackers. By contrast, PHP is a
million times more popular than Python, but has almost nobody building tools
for it. The overlap between the people who like PHP and those who have the
ability and desire to hack on tools for it is very, very small.

------
scg
Here's a simple test for the curious. It's not a benchmark.

    
    
      import math
      num_primes = 0
      for i in xrange(2, 500000):
        if all(i % j for j in xrange(2, int(math.sqrt(i)) + 1)):
          num_primes += 1
      print num_primes
    

Here's the code above translated to C++ by Nuitka:
<http://pastebin.com/41ueyTEB>

    
    
      # CPython 2.6.6
      $ time python hello.py 
      41538
      real	0m6.377s
      user	0m6.350s
      sys	0m0.020s
    
      # Nuitka & g++-4.5
      $ time ./hello.exe
      41538
      real	0m4.573s
      user	0m4.270s
      sys	0m0.300s

~~~
DisposaBoy
I ran each test 3 times and picked the fastest. I shut down all servers
(MySQL, Apache) and closed my music player to minimize the system's effect on
the test.

Python:

    
    
        real    0m12.775s
        user    0m12.636s
        sys    0m0.037s
    

Nuitka:

    
    
        real	0m7.096s
        user	0m6.930s
        sys	0m0.093s
    

Lua:

    
    
        real	0m2.641s
        user	0m2.410s
        sys	0m0.010s
    

LuaJit:

    
    
        real	0m0.613s
        user	0m0.600s
        sys	0m0.000s
    

From experience experimenting with a toy scripting language that I tried to
make as minimal as possible: essentially every operation was a function call,
so the interpreter just figured out what the right function was and called the
corresponding function directly via a C++ function pointer. In the end it was
slightly faster than LuaJIT at doing some math 100,000 times. (It was a file
with the same operation pasted 100,000 times, which really tested parsing
speed... anyway.)

TL;DR: If you want to know why Python and Nuitka are so much slower, run the
test through callgrind or something else that reports the number of function
calls being made. You will find Python (and possibly Nuitka as well) making
billions of function calls and allocations, while Lua's count is maybe a
couple hundred million at most.
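
In that spirit, here is a rough way to see those call counts without callgrind, using CPython's own profiler (a sketch, not the commenter's actual setup; Python 3 syntax, and a smaller range so it finishes quickly):

```python
# Profile the prime-counting test from upthread and report the total
# number of function calls CPython makes while running it.
import cProfile
import math
import pstats

def count_primes(limit):
    # Same algorithm as the test above: trial division up to sqrt(i).
    num_primes = 0
    for i in range(2, limit):
        if all(i % j for j in range(2, int(math.sqrt(i)) + 1)):
            num_primes += 1
    return num_primes

prof = cProfile.Profile()
prof.enable()
result = count_primes(50000)
prof.disable()

# Every sqrt(), all(), and generator-expression step shows up as a call.
total_calls = pstats.Stats(prof).total_calls
print(result, total_calls)
```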

Also, I tested my Lua code converted to Python, but it only shaved less than 1
second off the fastest time, so there was no real difference.

test.lua:

    
    
        local sqrt = math.sqrt
        num_primes = 0
        for i = 2, 500000 do
            n = 1
            for j = 2, sqrt(i) do
                if (i % j) == 0 then
                    n = 0
                    break
                end
            end
            num_primes = num_primes + n
        end
        print (num_primes)
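
For reference, the Lua-to-Python conversion the comment mentions would look roughly like this (a reconstruction, not the commenter's actual file; written with Python 3's range/print, while the thread's snippets are Python 2):

```python
# test.lua translated to Python: same break-early trial division,
# same bounds, so it should print the same count as the thread (41538).
from math import sqrt

num_primes = 0
for i in range(2, 500000):
    n = 1
    for j in range(2, int(sqrt(i)) + 1):
        if i % j == 0:
            n = 0
            break
    num_primes += n
print(num_primes)
```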

------
ableal
In this Python-to-C++ vein, there's also Shed Skin
( <http://shedskin.blogspot.com/> ), which has been at it for a few years.

------
codedivine
I was developing a compiler called unPython for a while, but I have not yet
released it openly. I plan to do so "soon". It is a compiler from an annotated
subset of Python (particularly NumPy; the rest of the language is either very
slow or not supported) to a C++ Python module. I will post here once I release
it.

~~~
hogu
Why wouldn't you just use Cython? It has good NumPy integration, as well as
being quite good for everything else.

------
beambot
You should also check out Psyco: <http://psyco.sourceforge.net/>

"Psyco is a Python extension module which can greatly speed up the execution
of any Python code."

~~~
jaen
Psyco is unmaintained; its developer (Armin Rigo) is working on PyPy (a Python
implementation written in Python) instead:
<http://codespeak.net/pypy/dist/pypy/doc/>

They are already seeing quite nice results for computation-heavy benchmarks
with the (tracing) JIT: <http://speed.pypy.org/comparison/>

------
njharman
A 50% speedup, or even 2x or 3x, matters to a few niches and users. But for
the vast majority it's not significant enough to switch, accept limitations
(no 2.7/3.1 support), or accept risks (is this as tested and supported as
CPython?). We'll just wait for CPython's regular speed improvements and/or for
effective processing power to increase by another order of magnitude.

Research like this is very important. I just don't think it's wise to view it
as a silver bullet for use in production.

------
pbiggar
Ah, it's phc (<http://phpcompiler.org>) for Python. Excellent!

