
Why is Python so slow? - rasputinmachine
https://hackernoon.com/why-is-python-so-slow-e5074b6fe55b
======
wahern
Lua is just as dynamic as Python, yet PUC Lua is significantly faster. LuaJIT
is faster still, even before it JITs the code, and doesn't have a startup
penalty. And LuaJIT doesn't sacrifice anything--it's 100% compatible with Lua,
including Lua's C API.

The real difference is that PUC Lua is developed by a team of 2-3 people (1-2
for the VM, 1-2 for the compiler). LuaJIT was written by a single individual
who relieved himself of the necessity to design and temptation to change the
language or interface semantics. Engineering complex, deeply interdependent
systems doesn't scale well. While Lua is nearly as old as Python (1993 vs
1991), the Lua VM has been substantially refactored multiple times; LuaJIT at
least 1.5 times. The Lua team doesn't accept outside contributions, ensuring
that implementation complexity doesn't outgrow their capacity to refactor the
internals as they see fit. LuaJIT is almost impervious to outside
contributions. (It's not that LuaJIT can't survive without Mike Pall, it's
that it can't survive without a Mike Pall-like maintainer.)

Also, the Lua C API and control flow semantics were _very_ deliberately
designed. Some people find the stack-based API to be verbose, but it's crucial
in keeping the number and complexity of public interfaces low, while ensuring
that C modules can fully and seamlessly leverage constructs like closures and
coroutines. CPython is much more difficult to refactor because the module API
and semantics are too leaky.

But ultimately it comes back to having a small, dedicated team. Larger teams
tend to rely upon too many internal abstractions. It's a consequence of having
to orchestrate so many cooks in the kitchen. It's not that abstractions are
bad. It's that these are the wrong kind of abstractions in the wrong place in
the pursuit of the wrong goals. Both PUC Lua and LuaJIT actually have
beautifully consistent code with well-designed internal interfaces. Crucially,
the interfaces aren't designed to avoid maintainer conflict but rather in
service of the overall design and performance of the engines.

~~~
rasputinmachine
>PUC Lua is significantly faster [than Python]

[https://benchmarksgame-team.pages.debian.net/benchmarksgame/...](https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/lua-python3.html)

Damn, where are you getting your numbers from?

EDIT: If I've got the wrong version of Lua, you may proceed to call me an
idiot.

~~~
wahern
In the benchmarks where Python wins, the Python program is multithreaded. And
even with 4 CPUs against 1 CPU, Lua still hangs on.

EDIT: Binary trees is peculiar in that Lua loses by 10x. There's something
going on with memory consumption and possibly GC behavior, but I've already
wasted enough time and don't care to dig in. There's a reason the Computer
Language Benchmark Game has limited utility. It's pretty clear the core Lua VM
(e.g. bytecode execution, function dispatch, etc) is substantially faster.
Once you move past the core interpreter loop it becomes difficult to compare
apples-to-apples, if at all.

EDIT2: Ok, it was bugging me. Turns out that the Python binary tree program
has an interesting (clever?) bug. make_tree isn't building a binary tree, it's
building... well, I'm not sure what to call it... but has substantially fewer
nodes. It _walks_ like a binary tree, though, and so the output of the program
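is the same. You can do the same thing in Lua with

    local function BottomUpTree(depth)
      local t = {}
      for i=1,depth do
        -- both children point at the same subtree, so each level
        -- allocates one new table instead of doubling the node count
        t = { t, t }
      end
      return t
    end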

which roughly quadruples performance. That means that apples-to-apples (1 CPU
Lua vs 4 CPU Python), Lua is almost _twice_ as fast.
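
To make the shape of that structure concrete, here is a rough Python sketch of the idea. It is not the benchmark's actual code: `make_real_tree`, `make_shared_tree`, `count_nodes` and `count_distinct` are hypothetical names, and the sketch assumes the check is a plain recursive node count. A walk sees the same number of nodes either way, but the shared version only allocates one object per level.

```python
def make_real_tree(depth):
    """Full binary tree: every node is a distinct list object."""
    if depth == 0:
        return [None, None]
    return [make_real_tree(depth - 1), make_real_tree(depth - 1)]


def make_shared_tree(depth):
    """Same shape when walked, but both children are the same object."""
    t = [None, None]
    for _ in range(depth):
        t = [t, t]
    return t


def count_nodes(node):
    """Recursive walk, as a tree checker would do it (assumed)."""
    left, right = node
    if left is None:
        return 1
    return 1 + count_nodes(left) + count_nodes(right)


def count_distinct(node):
    """Count the objects actually allocated, by identity."""
    seen = set()
    stack = [node]
    while stack:
        n = stack.pop()
        if n is None or id(n) in seen:
            continue
        seen.add(id(n))
        stack.extend(n)
    return len(seen)
```

For depth 4, `count_nodes` reports 31 for both versions, while `count_distinct` reports 31 for the real tree and 5 for the shared one, which is where the allocation and GC savings would come from.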

~~~
igouy
> In the benchmarks where Python wins it's multithreaded

For reverse-complement and binary-trees, the Python program uses less cpu time
— it doesn't matter that it's multithreaded.

~~~
wahern
For binary-trees it's because of the issue I mentioned. For reverse-complement
I believe it's partly because the low-level transformation is happening through
a Python C module: see reverse_translation = bytes.maketrans(...).
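
As a minimal sketch of what that looks like (simplified to the four plain bases, not the benchmark's full table; `COMPLEMENT`, `PAIRS` and both function names are made up for illustration): in CPython, `bytes.translate` does the per-byte mapping in C, whereas the loop version does the same work one byte at a time in the interpreter.

```python
# Translation table built once; the lookup itself runs in C inside bytes.translate.
COMPLEMENT = bytes.maketrans(b"ACGT", b"TGCA")


def reverse_complement(seq: bytes) -> bytes:
    """Complement every base via the C-level table, then reverse."""
    return seq.translate(COMPLEMENT)[::-1]


# The same mapping as a dict of byte values, for the pure-Python version.
PAIRS = dict(zip(b"ACGT", b"TGCA"))


def reverse_complement_loop(seq: bytes) -> bytes:
    """Same result, but each byte goes through the bytecode interpreter."""
    return bytes(PAIRS.get(b, b) for b in reversed(seq))
```

Both return `b"CGTT"` for `b"AACG"`; only the second one really exercises the core interpreter loop.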

Lua is notoriously batteries _NOT_ included. I suspect only a fraction of
Python programmers have written a Python C module, at least for production
use, because they rarely need to. By contrast, almost all PUC Lua programmers
I have met have written Lua C modules, as you can't even do simple things like
directory manipulation or sockets without third-party modules (few of which
are "standard") or by writing a custom module. Lua is somewhat handicapped in
these benchmark games as the language is fundamentally designed to be extended
with C code. It doesn't come with any real modules of its own beyond a simple
(non-regex) string matcher and bindings to the ISO C library. The whole
_point_ of Lua is that it's super trivial to write C modules; and as compared
to writing Python or Perl C modules it's incomparably more straightforward,
even for inexperienced C programmers. (The use of binding generators is
usually discouraged as 9 times out of 10 manually written bindings are both
easier to write and maintain.) But it obviously wouldn't be any more fair to
permit custom-written Lua C modules in the benchmarks.

If you look at all those benchmarks in just a little depth, I think it should
be fairly clear that the core Lua VM is significantly faster--roughly 2x.
OTOH, that's often irrelevant, especially if Python's core module ecosystem
directly or indirectly helps you solve an issue. It's a fundamental limitation
of the benchmark game, _particularly_ regarding Lua. Lua isn't just a
scripting language; it's thoroughly designed to be embedded in or extended by
C. See the paper "Passing a Language through the Eye of a Needle: How the
embeddability of Lua impacted its design",
[https://queue.acm.org/detail.cfm?id=1983083](https://queue.acm.org/detail.cfm?id=1983083)

~~~
igouy
> It's a fundamental limitation of the benchmark game, particularly regarding
> Lua.

It's a fundamental limitation of language vs language comparison.

See "Apples and Oranges" —

[https://benchmarksgame-team.pages.debian.net/benchmarksgame/...](https://benchmarksgame-team.pages.debian.net/benchmarksgame/dont-jump-to-conclusions.html#apples-and-oranges)

