
Python startup time: milliseconds matter - vanni
https://mail.python.org/pipermail/python-dev/2018-May/153296.html
======
quotemstr
I've always been disappointed by how large software projects, both FOSS and
commercial, lose their "can do" spirit with age. Long-time contributors become
very quick with a "no". They dismiss longstanding problems as illegitimate use
cases and reject patches with vague and impervious arguments about
"maintainability" or "complexity". Maybe in some specific cases these concerns
might be justified, but when everything garners this reaction, the overall
effect is that progress stalls, crystallized at the moment the last bit of
technical boldness flowed away.

You can see this attitude of "no" on this very HN thread. Read the comments!
Instead of talking about ways we can make Python startup faster, we're seeing
arguments that Python shouldn't be fast, we shouldn't try to make it faster,
and that programs (and, by implication, programmers) who want Python startup
to be fast are somehow illegitimate. It's a dismal perspective. We should be
exercising our creativity as a way to solve problems, not finding creative
ways to convince ourselves to accept mediocrity.

~~~
geofft
This isn't an attitude of "no" - it's an attitude of "yes" to other things.
The arguments are that making Python startup fast makes other things worse,
and we care about those other things.

Here are some other things we can say "yes" to:

- Rewrite as much of Mercurial in Rust as possible, which will provide
performance improvements well beyond what Python can possibly offer.
[https://www.mercurial-scm.org/wiki/OxidationPlan](https://www.mercurial-scm.org/wiki/OxidationPlan)

- Spend resources on developing PyPy, which (being a JIT) has relatively slow
startup but much faster execution in general, for people who want runtime
performance.

- Write compilers from well-typed Python to native code.

- Keep CPython easy to hack on, so that more people with a "can do" spirit
can successfully contribute to CPython instead of it being a mess of special
cases in Guido's head.

Will you join me in saying "yes" to these things and not convincing ourselves
to accept mediocrity?

~~~
turbinerneiter
I'm so longing for a Python(like) compiler.

MicroPython put together a Python in 250kb. Why the hell can't we make an LLVM
frontend for Python that can use type hints for optimization? Sure, you lose
some dynamic features as you optimize for speed, but that's the dream: quickly
write a prototype without caring about types, then optimize later by adding
types and removing dynamism.

I'm currently learning Racket and LLVM and I have about 70 more years to live.
I'm gonna try to make Python fast on slow weekends 'til I die.

~~~
quotemstr
Unless your Python compiler can use cpython modules without a massive
performance penalty, it's going to see very limited adoption. The ecosystem
matters.

~~~
sitkack
We had a chance with ctypes and now CFFI to move away from the
platform-calcifying CPython module interface that is overly coupled to the
CPython runtime. I am very disappointed in the lack of affordances that
CPython gives to alternative Pythons to support their work. The stdlib is a
crufty mess that is overly coupled to CPython as well. The batteries are
corroded and need to be swapped out for a modular pack.

------
rossdavidh
I have to say that my first reaction was: "maybe you shouldn't use python for
this, then". If you are using a language in a way that it gets worse in
subsequent versions, that's a good sign that they're optimizing for something
other than what you care about.

The programming language R does not, as I understand it, optimize for speed,
because they are optimizing for ease of exploratory data analysis. R is
growing quite rapidly. So is python, actually. It doesn't mean that either one
is good at everything, and it's probably the case that both are growing
because they don't try to be good at everything. A good toolbox is better than
a multi-tool.

~~~
indygreg2
(I authored the linked post)

While the "maybe you shouldn't use Python" comment could be construed by some
as trolling, there is definite truth to your line of reasoning and I agree
with the comment.

I absolutely love Python as a programming language for the space it is in. But
as someone who needs to think long term about maintaining large projects with
lifetimes measured in potentially decades, Python has a few key weaknesses
that make it really difficult for me to continue to justify using it for such
projects. Startup time is one. The GIL is the other large one (not being able
to achieve linear speedups on CPU-bound code in 2018 with Moore's Law dead is
unacceptable). General performance disadvantages can be adequately addressed
with PyPy, JITs, Cython, etc. Problems scaling large code bases using a
dynamic language can be mitigated with typing and better tools.

Python can be _very_ competitive against typed systems languages. But if it
fails to address its shortcomings, I think more and more people will choose
Rust, Go, Java, C/C++, etc for large scale, long time horizon projects. This
will [further] relegate Python to be viewed as a "toy" language by more
serious developers, which is obviously not good for the Python ecosystem. So I
think "maybe you shouldn't use Python for this, then" is a very accurate
statement/critique.

~~~
qaq
If one needs Rust or C/C++ levels of performance, I doubt there is much Python
can do, and one can wonder whether Python was ever the right tool for such a
project.

~~~
mixmastamyk
It’s a great tool for prototyping.

~~~
rrcaptain
If you expect to need the performance of a statically typed, compiled language
I don't see why you'd prototype in a dynamically typed, interpreted language.

~~~
pas
That's why build systems still look like black magic infused with even darker
sh, and a bit of perl sprinkled all over, presumably because the previous
maintainers were all out of goat blood.

------
deaps
I totally understand that milliseconds matter in the use case described in the
article.

For me, personally, I use python to automate tasks - or to quickly parse
through loads and loads of data. To me, startup speed is _somewhat_
irrelevant.

I built a micro-framework that is completely unorthodox in nature, but very
effective for what I needed - that being a suite of tools available from an
'internet' server, available to me (and my coworkers) over port 80 or 443.

My internet server, which runs python on the backend (and uses apache to
actually serve the GET / POST) literally spits out pages in 0.012 seconds.
Some of the 'tools' run processes on the system, reach out to other resources,
and spit the results out in under 0.03 seconds (much of that being network /
internet RTT). To me, that's _good enough_ - adding 30 or even 300
milliseconds to any of that just wouldn't matter.

I totally get that if Python wants to be a big (read bigger?) player then
startup time matters more...but for my personal use cases, I'm not concerned
with the current startup time one bit.

~~~
cjhanks
As expected, language start up time only matters to _some_ people. Often in my
case, Python is used to build command line tools (similar to the case of
Mercurial).

In such an event, the start-up time of the program might _dominate_ the total
run time of the application. And on my laptop or desktop with a fast SSD with
good caching and a reasonably fast CPU... that still ends up being 'okay'.

But once I put that on an ARM chip with a mediocre hard drive - some python
scripts spend so long initializing that they are practically unusable. Whereas
the comparable Perl/BASH script runs almost instantaneously.

Often, to make Python even practically usable for such systems, I have to
implement my own lazily loaded module system. I wish _some_ language allowed
me to say...

    import(eager) some_module
    import(lazy) another_module

...which would trigger the import process only when that module becomes
necessary (if ever).

~~~
mywittyname
Have you tried moving import statements into the functions where they are
invoked? My understanding is this is effectively the same as lazy loading the
module[1].

[1] [https://stackoverflow.com/questions/3095071/in-python-what-h...](https://stackoverflow.com/questions/3095071/in-python-what-happens-when-you-import-inside-of-a-function)
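
For what it's worth, the pattern is just this (a minimal sketch; `sha1_hex` is
an invented example name):

```python
def sha1_hex(data: bytes) -> str:
    # hashlib is loaded the first time this function runs, not at
    # program startup; later calls hit the sys.modules cache, so the
    # per-call overhead after the first import is only a dict lookup.
    import hashlib
    return hashlib.sha1(data).hexdigest()
```

A script that never calls `sha1_hex()` never pays for loading `hashlib` at
all.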

~~~
cjhanks
I have, and it actually works well (performance wise). The maintenance burden
is a little higher.

~~~
sli
A little Python preprocessor that lets you annotate your lazy modules sounds
like a fun little toy project, actually. Not something I'd use for real, but
it would be fun to build.

~~~
jsmeaton
3.7 makes it easier to use lazy imports. [https://snarky.ca/lazy-importing-in-python-3-7/](https://snarky.ca/lazy-importing-in-python-3-7/)
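
The standard-library half of that is `importlib.util.LazyLoader`, which defers
executing a module's body until first attribute access; a minimal sketch based
on the documented recipe:

```python
import importlib.util
import sys

def lazy_import(name):
    # Wrap the real loader in LazyLoader so the module body only
    # executes on first attribute access, not at import time.
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

json = lazy_import("json")       # no module code has run yet
print(json.dumps({"ok": True}))  # first attribute access triggers the load
```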

------
stinos
Sort of related story: we needed a scripting language able to run on an x86
RTOS type of architecture compiled with MSVC, and looked into CPython because,
well, Python is after all quite a nice language. After spending a considerable
amount of time getting it to compile (sorry, I don't recall all the issues,
but the main one was that the source code assumed msvc == windows, which I
know is true in 99% of cases but didn't expect a huge project like CPython to
trip over), it would segfault at startup.

During step-by-step debugging it was astonishing how much code got executed
before doing any actual interpreting/REPL work. Now I get that there might not
be a way around some initialization, but it still simply looked like too much
to me, and perhaps not overly clean either. Moreover it included a bunch of
registry access (again, because it saw MSVC being used) which the RTOS didn't
fully support, hence the segfault.

Anyway, we looked further and thankfully found MicroPython, which took less
time to port than the time spent getting CPython even compiling. While not a
complete Python implementation, it does the job for us, and it gets away with
startup/init code of just something like 100 LOC (including argument parsing
etc). Yes, I know it's not a fair comparison, but still, the difference is big
enough to indicate, at least to me, that CPython might just be doing too much
at startup and/or possibly spends time on features which aren't used by many
users and/or possibly drags along some old cruft. Not sure, just guessing.

~~~
airstrike
[http://boo-lang.org/](http://boo-lang.org/)

~~~
tecleandor
Context?

------
faho
Mercurial's startup time is the reason why, for fish, I've implemented code to
figure out if something might be a hg repo myself.

Just calling `hg root` takes 200ms with a hot cache. The equivalent code in
fish-script takes about 3ms, which enables us to turn on hg integration in the
prompt by default.

The equivalent `git rev-parse` call takes about 8ms.
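
The check itself presumably amounts to walking up the directory tree looking
for a `.hg` directory, something like this Python sketch (`find_hg_root` is a
made-up name):

```python
import os

def find_hg_root(path="."):
    # Walk up the directory tree looking for a .hg directory,
    # mirroring what `hg root` reports without spawning a full
    # interpreter; the cost is a handful of stat() calls.
    path = os.path.abspath(path)
    while True:
        if os.path.isdir(os.path.join(path, ".hg")):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return None
        path = parent
```

A few cached stat() calls versus a full interpreter launch is what the
200ms-vs-3ms gap comes down to.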

~~~
majewsky
Wow, that's quite a difference.

But 8ms is still too slow for me. :) I implemented the Git recognition code
myself in my own prompt using the minimal amount of FS operations [1], and it
renders in 5 ms from start to finish, including a "git:branch-name/47d72fe825"
display.

[1]
[https://github.com/majewsky/gofu/blob/master/pkg/prompt/git....](https://github.com/majewsky/gofu/blob/master/pkg/prompt/git.go)

~~~
avar
(I work on Git in my copious free time)

One of the reasons git-rev-parse takes slightly longer than your
implementation is that you just unconditionally truncate the SHA-1 to 10
characters. E.g. run this on linux.git:

    git log --oneline --abbrev=10 --pretty=format:%h |
    grep -E -v '^.{10}$' |
    perl -pe 's/^(.{10}).*/$1/'

You'll get 4 SHA-1s that are ambiguous at 10 characters; this problem will get
a lot worse on bigger repositories.

Which is not to say that there isn't a lot of room for improvement. The scope
creep of initialization time is one of the things that tends to get worse over
time without being noticed, but Git unlike (apparently) Python makes huge use
of re-invoking itself as part of its own test suite (tens of thousands of
times), so it's naturally kept in check somewhat.

If you have this use-case I'd encourage you to start a thread on the Git
mailing list about it.

------
std_throwaway
This is truly a problem. Even more so if you host your application on a
network directory. Loading all the small files takes ages. I really wish there
were a good way to compile the whole application with all its modules into
one package once you're ready to release. I really wish the creators of Python
had given such use cases more consideration.

Edit: I'm aware that there are solutions that put everything a program touches
into a kind of executable archive. A single file several hundred Megabytes in
size. I've tested it. It doesn't really pre-compile the modules. The startup
time was exactly the same.

~~~
sametmax
Nuitka ([http://nuitka.net/](http://nuitka.net/)) already does that and much
more:

- it compiles your program and makes it standalone, so you can distribute just
the exe

- it makes it start faster

- it makes it run faster

- it's fully independent of the system Python. Actually, your system doesn't
even need a Python at all

I don't get why it's not used more; it's very robust, compatible with 3.6, and
on some of my scripts I get about a 4x speedup on startup alone.

~~~
scrollaway
First time I hear about this, and I've looked for alternatives to cxfreeze and
its cousins in the past.

Any time I see something like this, I feel like I'm hearing about some
homeopathic cancer cure. If Nuitka actually does what it says it does, it's
solving a big recurrent problem for the Python community, so why is nobody
talking about it?

~~~
sametmax
That's the question I'm asking.

Not only is it a beautiful tool, but the author has been quietly and steadily
working on it for 8 years. Compatibility is the number one goal.

The guy has a lot of rigor and humility, so maybe communication suffered?

~~~
merb
hg allows loading modules at runtime. Maybe that's a problem.

~~~
sametmax
I doubt it, since you can manually pass a list of all the modules you want
Nuitka to embed with --recurse-plugins=MODULE/PACKAGE

~~~
thedufer
But the problem is that you can't know the list you need at embed time for hg,
because extensions are arbitrary python files discovered at runtime.

------
marshray
Here's what has worked for me:

1. Don't do that. Either write the driving app in Python or write the
subprocesses in an ahead-of-time compiled language. Python's a great language
but it's not the right tool for everything.

2. Be parsimonious with the modules you import. During development, measure
the performance after adding new imports. E.g., one graph library I tried had
all its many graph algorithm implementations separated into modules, and it
loaded every single one of them even if all you wanted to do was create a
data structure and do some simple operations on it. We just wrote our own
minimal class.
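
One crude way to follow that advice is to time each top-level import directly
(for a proper breakdown, CPython 3.7 added `python -X importtime`);
`timed_import` here is just an illustrative helper:

```python
import importlib
import time

def timed_import(name):
    # Only meaningful the first time a module is imported in a
    # process; subsequent imports are served from sys.modules
    # and will measure as near-zero.
    start = time.perf_counter()
    module = importlib.import_module(name)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"import {name}: {elapsed_ms:.2f} ms")
    return module

timed_import("decimal")  # a moderately heavy stdlib module
```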

~~~
JoshTriplett
> Don't do that. Either write the driving app in Python

Even if you write the driver in Python, you don't necessarily want to call the
program you're testing in the same process. You might want independent
launches of a command-line tool, so that you test the same behavior people get
when they run the tool. Otherwise, your test suite might trip over some
internal state that gets preserved from run to run in ways that command-line
invocation wouldn't.

~~~
marshray
Good point, but I didn't mean to sound specific to testing apps. I just meant,
in general, write big apps using Python top-down and something precompiled if
you must spawn lots of external processes.

------
the_mitsuhiko
The slow startup, combined with the Python ecosystem's general lack of
interest in finding a solution for distributing self-contained applications,
was the biggest reason we ended up writing our CLI tool in something else,
even though we are a Python shop.

I'm really curious why there hasn't been much of a desire to change this; it
has even gotten worse as time progressed, which is odd.

~~~
chubot
This is disappointing to me too, but I think there are some problems baked in
to the language that make it hard.

- Imports can't be parsed statically.

- Startup time has two major components: crawling the file system for
imports, and running the top-level initialization code of every module, which
happens before you get to main(). The first is only fixable through breaking
changes, and the second is hard to fix without drastically changing the
language.

The import code in CPython was a mess, which was apparently cleaned up by
importlib in Python 3, through tremendous effort. But unfortunately I think
importlib made things slower?

I recall a PyCon talk showing that as of 3.6, essentially everything about
Python 3 is now faster than Python 2, EXCEPT startup time!

This is a shame, because I would have switched to Python 3 for startup time
ALONE. (As of now, most of my code and that of my former employer is Python
2.) That would have been the perfect time to address startup time, because
getting a 2x-10x improvement (which is what's needed) requires breaking
changes.

I don't think there's a lack of interest in the broader Python community, but
there might be a lack of interest/manpower in the core team, which leads to
the situation wonderfully summarized in the recent xkcd:

[https://xkcd.com/1987/](https://xkcd.com/1987/)

FWIW I was the one who sent a patch to let Python run a .zip file back in 2007
or so, for Python 2.6 I think. This was roughly based on what we did at Google
for self-contained applications. A core team member did a cleaner version of
my patch, although this meant it was undocumented until Python 3.5 or so:

[https://docs.python.org/3/library/zipapp.html](https://docs.python.org/3/library/zipapp.html)
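
For the curious, the `zipapp` workflow is small enough to show end to end;
everything below (the `myapp` directory, the `cli:main` entry point) is
invented for the demo:

```python
import os
import subprocess
import sys
import tempfile
import zipapp

# Create a throwaway package with a single entry-point function.
workdir = tempfile.mkdtemp()
pkg = os.path.join(workdir, "myapp")
os.makedirs(pkg)
with open(os.path.join(pkg, "cli.py"), "w") as f:
    f.write("def main():\n    print('hello from a zipapp')\n")

# Bundle the directory into one runnable .pyz archive; zipapp
# generates the __main__.py that calls cli.main().
target = os.path.join(workdir, "myapp.pyz")
zipapp.create_archive(pkg, target=target, main="cli:main")

# The archive runs as a single file; imports resolve from inside the zip.
out = subprocess.run([sys.executable, target],
                     capture_output=True, text=True)
print(out.stdout.strip())
```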

The .zip support at runtime was a start, but it's really the tooling that's a
problem. And it's really the language that inhibits tooling.

Also, even if you distributed self-contained applications, the startup time is
not great. It's improved a bit because you're "statting" a zip file rather
than making syscalls, but it's still not great.

In other words, I have wondered about this "failure" for over a decade myself,
and even tried to do something about it. I think the problem is that there are
multiple parts to the solution, and the responsibility for those parts is
distributed. I hate to throw everything on the core team, but module systems
and packaging are definitely a case where "distributed innovation" doesn't
work. There has to be a central team setting standards that everyone else
follows.

Also, it's not a trivial problem. Go is a static language and is doing better
in this regard, but still people complain about packaging. (vgo is coming out
after nearly a decade, etc.)

I should also add that while I think Python packaging is in the category of
"barely works", I would say the same is true of Debian. And Debian is arguably
the most popular Linux package manager. They're cases of "failure by success".

~~~
blattimwind
> The import code in CPython was a mess, which was apparently cleaned up by
> importlib in Python 3, through tremendous effort. But unfortunately I think
> importlib made things slower?

AFAIK importlib is entirely written in Python and kinda portable across Python
implementations, while previously most was C code. It's not surprising
something gets slower when written in Python.

> Also, even if you distributed self-contained applications, the startup time
> is not great. It's improved a bit because you're "statting" a zip file
> rather than making syscalls, but it's still not great.

PyQt applications on Windows typically take two or more seconds before they
can do _anything_ , including Enterprise's favourite start-up pastime,
splashscreens. Except maybe if you rolled your own .exe wrapper that displayed
the splash before invoking any of the Python loading.

That's really, really poor in the age of 4 GHz CPUs from the factory, RAM big
enough to fit multiple copies of all binaries on a PC and SSDs with at the
very least tens of thousands of IOPS.

~~~
chubot
Yeah, the time it takes is really mind-boggling if you think about it. I
recently had occasion to run Windows XP in VirtualBox on a fairly underpowered
MacBook Air.

It not only installed really fast, but at runtime it was fast and responsive!
And so were the apps! Virtualbox recommends 192 MB of RAM for Windows XP, and
it works fine. Amazing. Remember when everyone said Windows was slow and
bloated?

On the other hand, I tried compiling Python 2.7 on a Raspberry Pi Zero, which
is probably about as fast as the machines of the XP era (maybe a little
slower). This was not a fun experience!

Actually, I just looked it up, and the Pi Zero has 512 MB of RAM. So in that
respect it has more power. Not sure about the CPU though... I think I ran
Windows XP on 300 MHz computers, but I don't remember. The Pi Zero is 700 MHz,
but you can't compare clock rates across architectures. I think they're
probably similar though.

---

FWIW I think importing is heavily bottlenecked by I/O, in particular stat() of
tons of "useless" files. In theory the C to Python change shouldn't have
affected it much. But I haven't looked into it more deeply than that.

~~~
blattimwind
IIRC the foundation originally compared the RPi's CPU to a Pentium II running
at 266 MHz, which seems about right to me.

IME startup is almost always CPU bound (to a single CPU thread, of course).
Note that the Linux kernel also caches negative dentry lookups, so these "is
there something here?" stat()s will stay in the dentry cache.

------
avar
Best out of 5 times on my Debian testing laptop for a "hello world", in order
of worst to best:

    ruby2.5:     83ms (-e 'puts "hi"')
    python3.6:   35ms (-c 'print("hi")')
    python2.7:   24ms (-c 'print("hi")')
    perl5.26.2:  8ms  (-e 'print "hi"')
    C (GCC 7.3): 2ms  (int main(void) { puts("hi"); })

~~~
henry_flower

      $ time ruby --disable-gems -e 'puts "hi"'
      hi
    
      real    0m0.009s
      user    0m0.008s
      sys     0m0.000s

~~~
akx
Sure, two can play that game. Let's add `-S`, which disables the site module,
to the Python invocations.

    perl ........... 0m0.012s
    siteless py27 .. 0m0.018s
    gemless ruby ... 0m0.021s
    siteless py36 .. 0m0.025s
    siteful py27 ... 0m0.034s
    siteful py36 ... 0m0.049s
    gemful ruby .... 0m0.089s

------
oneweekwonder

      in the temple of tmux
      for the cult of vi
      we sit and wait 
      for venv to activate

------
fpoling
Given how slow Python is known to be at starting up, I am puzzled why Mozilla
continues to use it in build scripts. Perl is just as portable but starts up
something like 10 times faster.

~~~
indygreg2
I wrote the linked post and maintain the Firefox build system. The reason is
that in 2018 (and honestly for the past 10 years) it is far easier to find
people who know Python than Perl. Python is essentially the lingua franca in
Firefox land for systems-level tasks that don't warrant a compiled language.
As I said in the post, Rust will likely infringe on Python over time due to
performance, lower defect rate, long-term maintenance advantages, etc.

~~~
fnord123
>As I said in the post, Rust will likely infringe on Python over time due to
performance, lower defect rate, long-term maintenance advantages, etc.

Indeed. hg is moving to use more and more Rust:

[https://www.mercurial-scm.org/wiki/OxidationPlan](https://www.mercurial-scm.org/wiki/OxidationPlan)

~~~
cpeterso
indygreg knows because he is the author of that wiki page. :)

[https://www.mercurial-scm.org/wiki/OxidationPlan?action=info](https://www.mercurial-scm.org/wiki/OxidationPlan?action=info)

~~~
fnord123
I didn't realize that, so thanks for pointing it out. But sometimes I also
comment for third parties reading along who might pick up some info as well.

------
falcolas
Naive question: If the startup time matters because you're imposing that
startup time hundreds or thousands of times - why not remove the startup time?

I'm saying, use the emacs model. Start hg with a flag so it simply keeps
running in the background while listening on a port. Run a bare-bones nc
script to pipe commands to hg over a port and have it execute your commands.

This isn't a new problem, nor is it even a new solution. No complete re-write
of the interpreter or the tool required.

Anyways, that's my 2¢

~~~
price
There's a paragraph in the OP about how they've actually done this:

> Mercurial provides a `chg` program that essentially spins up a daemon `hg`
> process running a "command server" so the `chg` program [written in C - no
> startup overhead] can dispatch commands to an already-running Python/`hg`
> process and avoid paying the startup overhead cost. When you run Mercurial's
> test suite using `chg`, it completes _minutes_ faster. `chg` exists mainly
> as a workaround for slow startup overhead.

Just like this isn't what the usual `emacs` command does (it's `emacsclient`),
it isn't what the usual `hg` command does either. There are some disadvantages
to this solution and some assumptions it makes, which have apparently led the
Mercurial maintainers to conclude, like the Emacs maintainers, that it won't
work as the default. Hence the desire for solutions that will.

------
agumonkey
I hate to admit it, but it's partly why I don't use Clojure more (pardon the
side topic). I can't bear the boot process and the overall cost.

Python is free to tinker, and all similar interpreters are joyful to use.
Anything else is probably better for heavy duty jobs environments.

~~~
john2x
I feel the same way about Clojure. For a Lisp, where interactive development
via the REPL is supposed to be one of the value-adds of the language, it falls
completely short in that respect. They even have entire libraries and design
patterns (Component, etc.) to work around the issue, but I find it ridiculous
that your entire program structure is dictated by the fact that the REPL boot
time is too damn slow.

------
bayesian_horse
Knock, Knock, who's there? ---- Long Pause --- Java!

------
zwieback
Python is great for prototyping, or even real apps if performance isn't
critical. However, more than once I've found myself in the situation where I
wrote a bunch of Python code and then ended up starting that code from another
app, just like the thread discusses, and I immediately feel like this is an
anti-pattern.

What's even more annoying is that my Python code usually calls a whole lot of
C libraries (OpenCV, numpy, etc.) So it's like this: app->OS process->python
interpreter->my python code->C libraries. That just really feels wrong so I'd
like two things:

1) better/easier path to embed python scripts into my app e.g. resident
interpreter

2) some way of passing scripts to python without restarting a new process,
this may exist and I'm unaware

------
SZJX
Startup time has also been the biggest gripe I have with Julia so far.
Otherwise it's a truly fantastic language to work in. I wasn't able to put the
`__precompile__()` function to good use, it seems - the time it takes to
execute my program didn't change at all for some reason. Or maybe it's not
actually the startup time that caused the problem, but the time it took to
perform file IO. Anyway, my program now takes much longer to start up than the
Python equivalent (though it runs much faster once started), which is a real
disappointment.

~~~
ChrisRackauckas
precompile doesn't store native compiled code. Though I know from talking to
the compiler developers that this is high on the 1.x list. It's an annoyance
but at least it has a clear solution in sight.

------
area_man
Truly solving this problem is difficult, but you can hack around it with a
zygote process to remove a substantial amount of overhead, in exchange for
RAM. While this is generally more of a win for server processes, you can see
it applied to a CLI proof of concept:

[https://github.com/msolo/pyzy](https://github.com/msolo/pyzy)

------
NelsonMinar
I agree Python's startup time is too slow. But one trick you can use to
improve it somewhat is the "-S" flag, which skips site-specific
customizations. On my Ubuntu system it brings Python 3.6 startup time down
from 36ms to 18ms; still not great, but it helps.

The drawback is this may screw up your Python environment, not sure how easy
it is to work around it if it does.
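
A quick way to reproduce that comparison yourself (numbers vary by machine;
`startup_ms` is just an invented helper):

```python
import subprocess
import sys
import time

def startup_ms(args, runs=5):
    # Best-of-N wall-clock time for a full interpreter start and exit,
    # the same methodology as the "best out of 5" numbers elsewhere
    # in the thread.
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(args, check=True)
        best = min(best, time.perf_counter() - start)
    return best * 1000

default = startup_ms([sys.executable, "-c", "pass"])
no_site = startup_ms([sys.executable, "-S", "-c", "pass"])
print(f"default: {default:.1f} ms   with -S: {no_site:.1f} ms")
```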

------
pjc50
Proposed solution: steal undump from emacs.
[https://news.ycombinator.com/item?id=13073566](https://news.ycombinator.com/item?id=13073566)

Perhaps it would be possible to read in the source files, compile them, and
preserve an image of the state immediately before reading input or command
line.

~~~
hobofan
I'm pretty sure Python 3 already does this, and that's what the __pycache__
directories it creates when running a command are for.

~~~
lmm
Those are only bytecode. It helps a little but you still have to load the file
from the filesystem and run it on import.

------
makecheck
I was kind of amazed at how penalized a script could be by collecting all its
“import” statements at the top. Once, somebody’s command couldn’t even print
“--help” output in under 2 seconds; after measuring the script, I told them
to move all their imports later and the docs appeared instantly.

------
kelvin0
I'm a long time python user, but never really peeked under the hood. However,
I have a few ideas.

Optimized module loading: maybe loading one larger 'super' module would be
faster than several smaller ones? For example, a Python program could be
analyzed to find its dependent modules, and then all of these could be packed
into a 'super' module.

Once the python program executes, it would load the single 'super' module and
hopefully bypass all the dynamic code which each module runs when imported to
load up.

As mentioned previously, this is just off the top of my head and would
certainly warrant more investigation/profiling to confirm my hypothesis.

------
bgongfu
I'm pretty sure it's too late by now for Python, but I've had some success
with compiling programs for C-based interpreters [0] to C; that is, generating
the actual C code that the interpreter would execute to run the program. That
way you can reuse much of the interpreter, keep the dynamic behavior and still
get nimble native executables.

[0] [https://github.com/basic-gongfu/cixl#compiling](https://github.com/basic-gongfu/cixl#compiling)

------
crb002
You should be able to hot-boot the VM with the right tooling. You can reuse
HPC "checkpoint" code from supercomputing environments as a generic hammer for
Python/Ruby/JVM. Some Russians figured out how to do it in userspace without a
kernel mod: [https://criu.org/Main_Page](https://criu.org/Main_Page)

------
beiller
People here comment about how Python is slow, but even "fast" and "slow" are
ill-defined, in my opinion. You don't see people (generally) hacking
TensorFlow in native languages to speed it up; they just enable CUDA. I
imagine the definition of "fast" here is limited to massively parallel server
workloads with IO.

------
est
Reminds me of buildout. It's an awful piece of software. We used it in a
previous Flask project, and a simple flask shell took 3 minutes to start. If
you typed `import` in the CPython shell it would literally freeze for a few
seconds, because it injects one sys.path entry for each package specified!!!

------
Murrawhip
I'm just curious why more people don't make use of chg to avoid the mercurial
startup time. It seemed to solve it for me - are there drawbacks?

~~~
im3w1l
At a guess: they didn't hear about it (keeping your ears open is a cost not
everyone wants to pay), they don't want to bother with setting it up, or they
don't want to bother with maintaining it (even if that's as simple as
reinstalling every time you get a new computer).

~~~
Murrawhip
That's fair. My experience with it so far has literally just been aliasing hg
to chg. It performed all the magic in the background for me.

------
YesThatTom2
A recent article in ACM Queue included an off-hand remark that Go's compile
time is often faster than Python's startup time. Just sayin'
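The comparison is easy to sanity-check locally; a minimal sketch of measuring
bare interpreter startup (numbers vary by machine and Python build):

```python
import subprocess
import sys
import time


def startup_seconds(runs=5):
    """Best-of-N wall time for `python -c pass`: the fixed cost any
    Python command-line tool pays before its own code even runs."""
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - t0)
    return best


print("interpreter startup: %.1f ms" % (startup_seconds() * 1000))
```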

------
2RTZZSro
Would it be feasible to keep a set of Python interpreters around at all times,
feed commands to each already-running interpreter in round-robin fashion, and
then perform the interpreter environment cleanup out-of-band after a task
completes?

~~~
tlrobinson
Or just use the operating system's `fork` system call?

There's also nailgun for Java which sounds like it works a little differently:
[http://martiansoftware.com/nailgun/](http://martiansoftware.com/nailgun/)
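A rough sketch of why fork() helps (Unix only; the imported modules below are
just stand-ins for a tool's heavy imports, and timings are machine-dependent):
the parent pays the import cost once, and each forked child inherits the warm
interpreter.

```python
import os
import subprocess
import sys
import time

import json, re, email  # stand-ins for a tool's heavy imports, paid once


def forked_child_cost():
    """Cost of running a child that inherits the already-warm interpreter."""
    t0 = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child starts with all imports already loaded
    os.waitpid(pid, 0)
    return time.perf_counter() - t0


def fresh_start_cost():
    """Cost of starting a new interpreter and redoing the imports."""
    t0 = time.perf_counter()
    subprocess.run([sys.executable, "-c", "import json, re, email"],
                   check=True)
    return time.perf_counter() - t0


print("fork:        %.1f ms" % (forked_child_cost() * 1000))
print("fresh start: %.1f ms" % (fresh_start_cost() * 1000))
```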

~~~
amelius
I guess a fork()'ed process triggers copy-on-write behavior in the kernel once
the process starts running. So that's latency (the copying) you could still
optimize away.

~~~
greglindahl
You might want to measure it before you optimize it! Oftentimes I find that
forks where I don't write much are quite inexpensive, with little COW action.
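One way to measure it (a sketch assuming Linux-style copy-on-write; the buffer
size is arbitrary and numbers vary): fork after allocating a large buffer, and
compare a child that exits immediately with one that dirties every inherited
page.

```python
import os
import time

PAGE = 4096
buf = bytearray(64 * 1024 * 1024)  # 64 MiB, shared with children on fork


def child_cost(write_pages):
    """Fork a child; optionally write to every page of the inherited buffer,
    forcing the kernel to copy each page (one COW fault per page)."""
    t0 = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        if write_pages:
            for i in range(0, len(buf), PAGE):
                buf[i] = 1  # touching the page triggers the copy
        os._exit(0)
    os.waitpid(pid, 0)
    return time.perf_counter() - t0


print("fork, no writes:    %.1f ms" % (child_cost(False) * 1000))
print("fork, dirty 64 MiB: %.1f ms" % (child_cost(True) * 1000))
```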

~~~
amelius
You could be right.

Also, could it be that Intel's cache hierarchy plays some really smart tricks
behind the scenes, to make this fast?

------
jahvo
Slowness is the elephant in the room in Python land. It's like everybody has
decided to cover their eyes in front of this massive pachyderm. A massive
delusion.

~~~
solarkraft
Delusion? I don't think many cover their eyes. More likely they've come to
accept that for their use cases the performance is good enough and the
convenience gain well worth it.

------
dingo_bat
It's weird to see someone make this pitch when C systems software development
regularly requires us to try and shave off microseconds. Millisecond delays
mean you've already fucked up.

------
peterkelly
For use cases where performance is important, using an interpreted
(implementation of a) language is a bad idea.

There are many great reasons to use Python, but execution speed is not one of
them.

~~~
std_throwaway
But what else do you switch to?

Imagine you import tons of modules which are often only available in Python.
This gets you going really quickly with your project, and it runs very
smoothly. Porting it to C++ would probably take so long that you'd run out of
funding before finding out whether it was worth it.

I have hopes that Rust, or some descendant of Rust, will get us there in maybe
10 years, but in the meantime it would be better to speed Python up as much as
possible.

~~~
julvo
For some use cases, Go might be a good alternative to Python. It's performant,
yet simple and readable and it has a great ecosystem.

~~~
no_wizard
I find laying out a Go project with dependencies to be miserable, though. I
write a lot of Go and Python code, and why on God's green earth Google decided
to handle dependencies by having you effectively git clone a library into
these huge dependency subdirectory trees, I don't know. It's a mess. I vastly
prefer either having one canonically designated folder to install dependencies
to, where each dependency is a top-level directory that can be scanned, or
having them all stored in a project folder relative to the root of the
workspace, similar to node_modules, rather than the current mess.

Drives me nuts. Look at this layout if you don’t know what I’m referencing:

[https://golang.org/doc/code.html#remote](https://golang.org/doc/code.html#remote)

Yes, I have used dep.

And yes, I have all kinds of shortcut commands for navigating my Go workspace.

But look at the canonical example from the docs and it's easy to see this is a
giant mess that is utterly unnecessary. No other language I've used has had
such a gross problem with dependency layout. It also leads to gross import
strings.

It’s one of my biggest criticisms of Go to be honest.

That, and its reliance on environment variables that have to be set perfectly
in order to actually do anything (thank god for direnv
[https://direnv.net/](https://direnv.net/)).

~~~
julvo
Isn't a dependency folder like node_modules pretty much what the vendor folder
is to Go? Have you tried using a dep management tool like glide before?

~~~
no_wizard
That is one I haven't tried. I tried a few others, whose names I forget
because most alternative managers seemingly stopped development, but this one
looks active. Thank you!

I'd now shift my argument to saying that Google should just adopt this as
their standard, if it works as advertised.

Link for those who haven’t seen it:

[https://glide.sh](https://glide.sh)

------
strkek
IIRC CPython devs reject performance-related patches if they cause the code to
become "less readable".

>> I believe Mercurial is, finally, slowly porting to Python 3.

I just gave up on Mercurial, since it didn't let me push to Bitbucket or to an
Ubuntu VPS via SSH.

For better or worse, Git just works.

~~~
michaelhoffman
I'm confused, since my daily workflow is pushing to Bitbucket via hg and ssh.

~~~
Noumenon72
My work is considering switching to Git mostly because we think adopting
Bitbucket will force us to. Is that not true? I'd love some reasons to stay
with hg...

~~~
sicariusnoctis
Bitbucket was originally for hg though. Why would switching to git be better?

