
Making Python Programs Blazingly Fast - gilad
https://martinheinz.dev/blog/13
======
andrepd
This is a terrible article. How this has gotten this kind of traction is
inexplicable to me.

> _This is the program I will be using for demonstration purposes_

>Never comes up again for the rest of the post.

???

>Let's show you how to profile code.

>Also, here's a bunch of unprofiled suggestions with such precise and helpful
comments as "slow" and "fast".

?????

> _Python haters always say, that one of reasons they don't want to use it,
> is that it's slow. Well, whether specific program - regardless of
> programming language used - is fast or slow is very much dependant on
> developer who wrote it and their skill and ability to write optimized and
> fast programs._

This is so ridiculous it's honestly laughable. It's such an obvious falsehood
that the only explanation is that either the person is truly this clueless, or
else they are wilfully spewing bullshit. A bare-metal language like C/C++ will
_of course_ let you do things faster than a heavy dynamic language like
Python.

The mental gymnastics people do to justify not learning another tool. You know
what they say, if all you have is a hammer, everything looks like a nail.

> _First rule of optimization is to not do it._

If this person is representative, this explains why computers are hundreds of
times faster but most software feels slower than in 1999.

~~~
Accujack
>If this person is representative, this explains why computers are hundreds of
times faster but most software feels slower than in 1999.

I think they are representative of a lot of developers. With the continued
pace of chip development for the last 35 years, there hasn't been a continuing
need to program for performance - in general, unless programmers did something
very dumb or were dealing with large amounts of data, they could just write
the way they wanted and let the hardware handle making their program fast.

Contrast this with early computer games - to get the best performance some
games would actually boot your computer without an OS, sacrificing some
convenience to get the last few percent of speed needed out of the system
because it was the only way to outperform the competition.

One reason there's such opportunity in the present state of CPU technology
(clock speeds have halted at about 4 GHz in favor of more cores) is that few
people remember how to program for performance, and those that do are
handicapped by a bloated OS built for profit rather than value.

~~~
ddalex
The real reason is in the numbers - how many people who could program a PC
starting from bootstrap in assembly are out there, and again how many
programmers who can paste Java(EE|script)? code together to make something
work are out there?

The world needs tons of software, and the vast majority of that software just
needs to do some things right some of the time, and an average Java EE
developer toiling away in a cubicle is good enough to deliver it.

Writing efficient software is a HARD problem, and it doesn't make economic
sense to actually write efficient software; it makes sense to write just good
enough software and throw hardware at it. For the price of a developer-year
you can provision hundreds of machines to run that piece of code.

------
nayuki
The title is bad; the article doesn't deliver on the promise of making Python
programs "blazingly" fast.

The first example given (the exponential function) is basically the worst
scenario, because it's a purely numerical computation expressed in pure Python
code. Whereas Python's performance is okay-ish for I/O or calling C modules.

From doing Project Euler solutions, I have ample evidence that for pure
numerics (e.g. int, float, array), Java is anywhere from 10× to 30× faster
than pure Python code executed in CPython.
[https://www.nayuki.io/page/project-euler-
solutions#benchmark...](https://www.nayuki.io/page/project-euler-
solutions#benchmark-timings)

I believe it is basically impossible for Python to win back all that
performance loss without adopting radical and jarring features like static
typing, machine-sized integers, and no more "every number is a full-fledged
object".

~~~
bjoli
Chez Scheme is just as dynamic as Python and is about as fast as C# running on
Mono, IIRC. My Scheme was probably worse than my C# was when I did Project
Euler, though.

Chez does unboxed integer arithmetic (but not floats) and does not have to do
any OO-like dispatch, and is also probably one of the best language
implementations there are.

~~~
jerf
"Chez scheme is just as dynamic as python"

Is it? Python is _really, really_ dynamic, which contributes to its slowness.
You can directly change an instance's __class__ attribute. You can add
properties to classes dynamically, changing the fundamentals of how attributes
get looked up at run time. You can write a new class, using a new metaclass,
and then set an existing instance to the new class.
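
A minimal sketch of that kind of runtime mutation (hypothetical classes A and B):

```python
class A:
    def greet(self):
        return "A"

class B:
    def greet(self):
        return "B"

obj = A()
assert obj.greet() == "A"
obj.__class__ = B              # reassign the instance's class at runtime
assert obj.greet() == "B"

# add a property to an existing class, changing how lookups on it resolve
A.answer = property(lambda self: 42)
assert A().answer == 42
```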

A great deal of why Python is so slow is that it is really _too_ dynamic. A
language doesn't really want to be "as dynamic as Python".

~~~
bjoli
Chez lacks a built in OOP system, but there is nothing prohibiting you from
adding something like CLOS, which does all Python does and more (much faster
than Python).

Most modern lisp compilers do a lot of different things to make CLOS fast,
though, prefilling caches and all that for you. Not only that, you can connect
to a running program and redefine it while it is running.

~~~
jerf
"Chez lacks a built in OOP system,"

In that case, the answer is actually _no_ to my question. Yes, of course you
_could_ program in that level of dynamicness, because you could in any
language, but it will then slow you down. No sensible CLOS would be as dynamic
as Python.

Like I said, in a lot of ways, you don't _want_ to be as dynamic as Python,
and I advise against language advocates seeing the phrase "Python is more
dynamic than your language" as a cue to jump up and start insisting that they
are just as dynamic as Python. Even in hindsight, I'd say the level of
dynamicness in Python was a mistake. You don't need it to have a nice, usable,
dynamic language, but it has been a ball & chain around its legs in terms of
performance for decades.

To be clear, this isn't a criticism of dynamic languages as a concept. I have
criticisms, but these aren't it. This is a criticism of Python specifically. A
dynamic language can be pretty nice with, let's say, two or three layers of
dynamicness, but Python has four or five. If you follow the _full_ process
that Python has to go through to resolve "x.y", including all possible points
where you might have done something to affect the result, it's crazy overkill.
In Guido's defense, when he was writing it way back when, that wasn't clear.
There wasn't a lot of highly-relevant prior art to look at for that style
language.
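
A simplified sketch of some of those layers in the resolution of "x.y" (descriptors and metaclasses add more on top):

```python
class C:
    y = "class attribute"

    @property
    def p(self):
        return "data descriptor"

    def __getattr__(self, name):
        return "__getattr__ fallback"

c = C()
assert c.y == "class attribute"          # found by walking type(c).__mro__
c.__dict__["y"] = "instance dict"
assert c.y == "instance dict"            # instance dict shadows plain class attributes
c.__dict__["p"] = "shadowed"
assert c.p == "data descriptor"          # but data descriptors beat the instance dict
assert c.nope == "__getattr__ fallback"  # last resort after everything else misses
```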

~~~
bjoli
CLOS is just as dynamic as Python _and_ fast, at least in SBCL and LispWorks.
CLOS is probably the most expressive object system you can find, and if you
wanted it fast you would restrict it somewhat to allow for at least some of
the dispatch to happen at compile time :)

------
zmmmmm
> So, let's prove some people wrong and let's see how we can improve
> performance of our Python programs and make them really fast!

I have to say, the desperate lengths Python programmers will go to to use it
for things it was not meant for rather than learn or use other languages is
one of the aspects I most dislike about it. _However fast you make it, the
same effort would have made it that much faster again in a performant
language_.

~~~
Waterluvian
I love Python as a glue language. So much heavy lifting done in numpy or
opencv or whatnot. But Python as the interface makes it trivial to explore,
experiment, and glue together a workflow, especially when the solution is
unclear.

Then at some point if Python isn't needed because you know exactly what you
want your software to do, rewrite it in C++ or whatever.

Also with CFFI and other interoperable libraries, it's really quite easy to
write some heavy work in a more appropriate language and call into it.
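
For instance, the standard library's ctypes alone can call into C with no build step; a minimal sketch, assuming a Unix-like system where find_library can locate libm:

```python
import ctypes
import ctypes.util

# Load the C math library and declare the signature: double cos(double)
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

assert libm.cos(0.0) == 1.0
```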

~~~
socialdemocrat
For that kind of workflow you would be far better off with e.g. Julia. You get
the same advantages as Python as having a language you can experiment with
until you find a solution. Only difference is the optimization step later does
not involve having to rewrite in another language.

If you already know Python, and Python packages already do all you will ever
need then sure stick with that. But I don't get why people would go to such
lengths to avoid using a new language. Being proficient in Julia is a lot less
work than maintaining proficiency in Python and C++.

~~~
leotaku
The last time I checked, using Julia was clunky at best, with ridiculously high
JIT compile times, packages that refused to build on my machine, etc. What is
more, many of the "best" Julia libraries were seemingly just linked-in Python
code.

I don't mean to discredit the advantages Julia clearly has over Python, but
these are just the kinds of problems that make people like me stick with tried
and tested last-gen languages like Python.

~~~
socialdemocrat
Did you ever check after 1.0 was released? In the earlier days there were a lot
of problems with packages. Totally agree. JIT compile times are much better now.

A lot of the issues are simply that people have not learned a sensible
workflow with Julia. Python guys have a lot of habits that don't translate
well to Julia. I know because I work daily with two hardcore Python guys. I
notice all the time how we approach problems in very different ways.

Python guys seem to love making lots of separate little programs they launch
from the shell. Or they just relaunch whole programs all the time.

In Julia, in contrast, you focus on packages from the get-go and you work
primarily inside the Julia REPL. You run the Revise.jl package, which picks up
all the changes you make to your Julia package.

I guess it just depends on the workflows you are used to. For me it is the
opposite. Whenever I have to jump into our Python code base I absolutely hate
it. It is very unnatural for me to work in the Python way. I also find Python
code kind of hard to read compared to Julia code.

But I know Python coders have the opposite problem. Basically Python guys look
a lot at module names when reading code. Julia developers look more at types.
The difference makes some sense since you don't really write types in Python
code.

I found that the new Python type annotation system helped me feel at home in
Python.

------
kaslai
> First rule of optimization is to not do it.

This is an unfortunately common misunderstanding of the phrase: "premature
optimization is the root of all evil."

Optimization is a crucial part of developing successful software. It can be
harmful to get overzealous with certain types of optimization, but basic wins
like using string builder primitives or formatted strings from the outset are
hardly premature. Some optimizations can only be realized at the early
conceptual stages, too; going for those early on isn't always premature.
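
A classic example of such a basic win is the "string builder" idiom in Python, str.join instead of += in a loop (which may copy everything built so far on each iteration):

```python
def build_slow(items):
    s = ""
    for item in items:    # each += may recopy the accumulated string
        s += str(item)
    return s

def build_fast(items):
    # join materializes the result in one pass
    return "".join(str(item) for item in items)

assert build_slow(range(5)) == build_fast(range(5)) == "01234"
```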

~~~
dahart
Yeah, I see that a lot. People always leave off the last part of the quote,
which was the actual point.

“We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. _Yet we should not pass up our
opportunities in that critical 3%_.”

~~~
socialdemocrat
So annoying. I would get people telling me not to make an obvious performance
improvement that adds no complexity to the code. Yet some basically insist on
using the least performant solution possible as somehow being good software
engineering. It is insane how rule-bound people can get. No wonder religion
exists. People just love inventing rules and forcing others to follow them.

~~~
m_mueller
there's tons of dogma in programming. likely more than in most professions
because it's not really a scientifically driven field. e.g. I'd love to just
put all Emacs and VI zealots together in a room and show them Engelbart's 1968
demo, Clockwork Orange style.

~~~
socialdemocrat
Hahaha, that is a good one. I remember as a C++ programmer there were several
occasions where I saw that a goto statement would have given the cleanest and
most maintainable code (typically exiting deeper loops). Yet I always picked
more convoluted solutions because I knew what an immense shit-storm I would
have caused if I had checked in code with just a single goto statement.

It would not have mattered that I could have provided a rational explanation
for why that was a rational choice in that instance. They would have just kept
reciting scripture and called me a heretic.

Meanwhile, people will let you commit the worst, most unmaintainable code, as
long as it doesn't break any of the 10 commandments of coding or whatever the
equivalent would be.

~~~
m_mueller
I actually almost wanted to mention goto as an example of this kind of dogma.

------
the_jeremy
None of the performance tuning suggestions are benchmarked, and I find it hard
to believe these would ever make a substantial difference. They could make a
statistically significant difference, maybe, but local variables vs class
attributes? You should show how much of a time saver this is, because I can't
envision a realistic scenario where this is worth the developer time.

~~~
kragen
The runtime cost of instance attribute access rather than local variable
access can account for a quarter of a program’s run time; I just tried it on
my phone:

    
    
        Python 3.7.4 (default, Jul 28 2019, 22:33:35)           
        Type 'copyright', 'credits' or 'license' for more information
        IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.                                                                                                     
        In [1]: class X:                                           
           ...:     def y(z):                                      
           ...:         return z.a + z.a + z.a + z.a + z.a
           ...:     def w(z):     
           ...:         a = z.a                                    
           ...:         return a+a+a+a+a
           ...:                                             
    
        In [2]: x = X()
    
        In [3]: x.a = 3                                     
    
        In [4]: x.y()                                           
        Out[4]: 15                                          
    
        In [5]: x.w()                                  
        Out[5]: 15
                                                                
        In [6]: %timeit x.y()                               
        1.7 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)                                                                                              
    
        In [7]: %timeit x.w()                                  
        1.11 µs ± 5.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    

This is often surprising to novices in Python, but attribute access involves a
hash table lookup.

Note that here we are comparing two instance attribute accesses against seven,
not zero against five. Evidently each of them cost about 118 ns, so if we
could reduce them to zero, the method call and return and four additions would
cost only 870 ns, which is closer to half the runtime than ¾.

Moral: benchmark before pooh-poohing a hotspot.

Also though note that several thousand instructions is a pretty heavy price to
pay for four integer additions.

~~~
maksimum
Compared to __slots__ (also Python 3.7.4)

Using your definition of class X

    
    
      %timeit x.w()
      313 ns ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    

Add __slots__

    
    
      class X:
        __slots__ = ('a')
        def w(z):
          a = z.a
          return a+a+a+a+a
    
      %timeit x.w()
      271 ns ± 7.13 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    

About 14% less time.

~~~
kragen
Your computer is evidently faster than my phone; how fast was y() for you?

Also you are missing a comma in your would-be tuple.

~~~
Dylan16807
For what it's worth I got about 196 and 131 for the original y and w, and
after adding __slots__ (with comma) I got 186 and 123.

~~~
kragen
The fact that the difference is 10 ns and 8 ns respectively suggests that the
speedup of attribute access isn't what's showing up in your measurements. In
one case we access the slot "a" once; in the other case we access it five
times. How can that be a 20% difference?

------
benfrederickson
Interesting article. While I definitely think you should be profiling your
code to figure out the hot spots, cProfile has some limitations for profiling:
cProfile doesn't give you line numbers, doesn’t work with threads, and
significantly slows your program down.

I wrote a tool, py-spy ([https://github.com/benfred/py-
spy](https://github.com/benfred/py-spy)), that is worth checking out if you're
interested in profiling Python programs. Not only does it solve those problems
with cProfile - py-spy also lets you generate a flamegraph and profile running
programs in production, works with multiprocess Python applications, can
profile native Python extensions, etc.

~~~
Jugurtha
Have you looked at Yappi[0]? I use it in combination with kcachegrind[1] (call
graph viewer) and the combination has been extremely useful in eliminating
bottlenecks across entire programs.

Side note: I also used pyreverse, now part of pylint, to diagram entire
projects and get a class hierarchy. It helped tremendously in refactoring and
decoupling code through whole projects, finding redundancies, and have a
better architecture.

I'll have a look at py-spy. Thanks for that.

[0]: [https://pypi.org/project/yappi/](https://pypi.org/project/yappi/)

[1]:
[https://kcachegrind.github.io/html/Home.html](https://kcachegrind.github.io/html/Home.html)

------
mbeex
> Python haters always say

Stopped here immediately. I have been writing software for more than 20 years,
mainly in C++ and Python. No professional would start this kind of discussion
with this childish attitude (apart from the fact that, content-wise, the
problem has been beaten to death for decades).

~~~
socialdemocrat
There kind of are haters of all categories, though. But then again you also
have Python fanboys who will do just about anything to avoid using something
that isn't Python.

As a Julia developer I see this a lot. You point out Julia's advantages and the
Python guy will respond with: oh, I can do that in Python too if I use package
X, Y, Z combined with feature A, B, C. Basically their response to a simple,
well-engineered feature is a complete mess of a solution. But hey, they prefer
that because they can still stick the label Python on top of it.

I admit I also get set in my ways, but at least I like to think that when I
dismiss another language it is not for purely silly reasons.

~~~
toyg
This happens with every language that reaches popularity. That’s because it’s
typically easier for individuals to engineer solutions with tools they know
well, even if suboptimal, than it is to become proficient with new ones that
might or might not deliver better results in the end. No community is immune
to this, even outside IT.

I’m pretty sure you’ll also occasionally bang nails for which Julia is a poor
hammer, you just don’t realize it.

~~~
socialdemocrat
Oh I definitely know there are things Julia is not good at. It is just that
Julia does not get in my way as frequently as many other languages.

But I kind of keep a collection of favorite languages under my belt which
cover different areas. My favorites are probably Julia, Go, Swift, Python, Lua
and LISP in that order.

If I need more low-level style coding I would go with Go (pun not intended).
Swift is nice if you actually want to make GUI applications and something that
is quite robust. The type system in Swift is quite good at catching many
problems.

------
d--b
Ugh, the whole “python is slow - but it’s great for piping C libraries” trade
off has been discussed a gazillion times before.

This article is written by someone who obviously doesn’t know much about CS.

Please HN community, try to not upvote these, it’s a waste of time for all of
us.

------
adrianN
The only Python programs that can be called "blazingly" fast compared to
equivalent programs in performant languages are either spending all their time
in I/O or spending all their time in C. Python is a nice language, and with
some tricks you might speed it up by a factor of 2-10, but writing the same
program in, say, Java will often be 50-100x faster.

~~~
fctorial
> ...or all spending all their time in C...

Python's performance varies the most in the benchmarks game.

[https://benchmarksgame-
team.pages.debian.net/benchmarksgame/...](https://benchmarksgame-
team.pages.debian.net/benchmarksgame/which-programs-are-fastest.html#chart-
fastest-more)

~~~
adrianN
Yeah and if you look at the Python program with the best performance compared
to C you'll see that it spends all its time in gmpy2, which is exactly the
same library C uses. Python still manages to be 2x slower.

------
kragen
The article has some embarrassing errors, and its advice is not going to make
your Python programs blazingly fast, but it's a good start.

Resuming a generator in CPython is a lot faster than creating a whole new
function call, and especially a whole new method call, contrary to what the
article said. But often enough it's faster to just eagerly materialize a list
result.

Some other good tips: %timeit, ^C, sort -nk3, Numpy, Pandas, _sre, PyPy,
native code. In more detail:

• For benchmarking, use %timeit in IPython. It's much easier and much more
precise than time(1). For super lazy benchmarking use %%time instead.

• The laziest profiler is to interrupt your program with ^C. If you do this
twice and get the same stack trace, it's a good bet that's where your hotspot
is. cProfile is better, at least for single-threaded programs. Others here
suggest line_profiler.

• If you have output from the profile or cProfile module saved in a file, you
can use the pstats module to re-sort it by different fields. But you probably
don't, you have some text it output. The shell command `sort -nk3` will re-
sort it numerically by column 3, which is close enough. In Vim you can
highlight the output and type !sort -nk3, while in Emacs it's M-| sort -nk3.
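
If you still have the profile data as a Profile object (rather than its printed text), the pstats route looks roughly like this:

```python
import cProfile
import io
import pstats

def hot():
    return sum(i * i for i in range(10_000))

# collect a profile of one call
pr = cProfile.Profile()
pr.enable()
hot()
pr.disable()

# re-sort and print the top entries by cumulative time
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```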

• You can probably speed up a pure Python program by a factor of 10 with Numpy
or Pandas. If it's not a numerical algorithm, it may not be obvious how, but
it's usually feasible. It requires sort of turning the whole problem sideways
in your mind. You may not appreciate the effort when you are attempting to
modify the code.
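
A toy illustration of the idea, assuming numpy is installed: the element-wise loop moves from interpreted bytecode into compiled C over a contiguous array.

```python
import numpy as np

# pure Python: one interpreted loop iteration per element
total_py = sum(x * x for x in range(1_000_000))

# numpy: the same arithmetic runs vectorized in C
arr = np.arange(1_000_000, dtype=np.int64)
total_np = int((arr * arr).sum())

assert total_py == total_np
```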

• The _sre module is blazingly fast for finite state machines over Unicode
character streams. It can be worth it to transmogrify your problem into a
regular expression if you can.
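
A toy example of that transmogrification, counting vowels via the regex engine instead of a per-character Python loop:

```python
import re

text = "the quick brown fox jumps over the lazy dog" * 1000

count_py = sum(c in "aeiou" for c in text)    # interpreted, per character
count_re = len(re.findall(r"[aeiou]", text))  # scan loop runs inside _sre

assert count_py == count_re
```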

• PyPy is probably faster. Use it if you can.

• The standard advice is to rewrite your hotspots in C once you've found them.
Maybe this should be updated; Cython, Rust, and C++ are all reasonable
alternatives, and for invoking the C etc., you have available cffi and ctypes
now. In Jython this is all much simpler because you can easily invoke code in
Java, Kotlin, or Clojure from Jython. An underappreciated aspect of this is
that using native code can save you a lot of memory as well as instructions,
and that may be more important. Consider trying __slots__ first if you suspect
this may be the case.

~~~
vbarrielle
> The laziest profiler is to interrupt your program with ^C. If you do this
> twice and get the same stack trace, it's a good bet that's where your
> hotspot is.

I do that sometimes, but it has some pitfalls. If most of the time is spent
inside a C module (for instance in numpy), then the interrupt won't be caught
before the C module is exited, which can lead to a wrong stacktrace.

~~~
kragen
Excellent point!

------
j88439h84
These are all trivial micro-optimizations.

“If you want your code to run faster, you should probably just use PyPy.” —
Guido van Rossum

[https://pypy.org/](https://pypy.org/)

~~~
edgyquant
Until you need a module that requires the C API, at which point PyPy becomes
useless.

~~~
j88439h84
They're developing a replacement API to address those issues.

[https://morepypy.blogspot.com/2019/12/hpy-kick-off-sprint-
re...](https://morepypy.blogspot.com/2019/12/hpy-kick-off-sprint-report.html)

------
eesmith
Fourth time this has been posted in 12 days. My comment from 12 days ago is at
[https://news.ycombinator.com/item?id=21930569](https://news.ycombinator.com/item?id=21930569)
. I pointed out that kernprof profiling shows that 99+% of the time is spent
in

    
    
        s += num / fact
    

so none of the techniques described gives a blinding speedup. I also suggested
pre-compiling the regex.
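
The pre-compiled version just hoists the compile out of the hot path:

```python
import re

DIGITS = re.compile(r"\d+")   # compiled once, at import time

def extract(lines):
    # each call reuses the compiled pattern; no cache lookup per call
    return [DIGITS.findall(line) for line in lines]

assert extract(["a1", "b22", "c333"]) == [["1"], ["22"], ["333"]]
```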

~~~
tcbasche
Pre-compiling a regex doesn't actually give you any real performance benefit,
as Python 3 caches it internally anyway. See
[https://docs.python.org/3/library/re.html#re.compile](https://docs.python.org/3/library/re.html#re.compile)

~~~
eesmith
Sure. My original, linked-to comment ended:

> Now, re.findall() does cache the last 100 or so regexps, so it probably
> won't re-evaluate the regex each time. But really, pre-compute that regex
> with "_my_pattern = re.compile(regex) ... _my_pattern.findall()" and avoid
> even that cache lookup.

cpburns2009 says it's 512 these days, which doesn't change the essence of my
comment.

------
jonstewart
I regret reading this article and I think the title is clickbait. I was hoping
for something like PyPy or Unladen Swallow, etc. The equivalent programs in
TFA will be blazingly faster if ported simply to other languages.

~~~
j88439h84
No need to port, just run them with PyPy to make them multiple times faster.
As usual.

------
drdaeman
> Don't Access Attributes (example `import re; re.findall(...)` vs `from re
> import findall; findall(...)`)

I find it a good habit to always import modules and almost never (sane
exclusions apply) import individual functions from them. If I use something
frequently, I'd alias it for clarity (`import sqlalchemy as sa`)

The reason is that otherwise, patching with mocks becomes somewhat tricky, as
you'll have to patch functions in each individual importer module separately.
Here's an example:
[https://stackoverflow.com/a/16134754/116546](https://stackoverflow.com/a/16134754/116546)

Maybe that's wrong but my idea is that I don't want to assume which module
calls some specific function but just mock the thing (e.g. make sure Stripe
API returns a mock subscription - no matter where exactly it's called from).
Then, if I refactor things and move a piece of code around (e.g. extract
working with Stripe to a helper module), my unit tests just continue to work.
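
A standard-library illustration of the difference, using json as a stand-in for a third-party client:

```python
import json
from unittest import mock

# Callers that do `import json` all reach json.dumps through the module
# object, so a single patch of "json.dumps" covers every one of them.
with mock.patch("json.dumps", return_value="mocked"):
    assert json.dumps({"a": 1}) == "mocked"

# With `from json import dumps`, each importing module holds its own
# reference and would need its own mock.patch("that_module.dumps").
assert json.dumps({"a": 1}) == '{"a": 1}'
```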

\---

> Based on recent tweet from Raymond Hettinger, the only thing we should be
> using is f-string, it's most readable, concise AND the fastest method.

I love f-strings, but to the best of my knowledge, one can't use f-strings for
i18n/l10n, so all end-user-facing texts still have to use `%` or `format`.
E.g. `_("Hello, {name}").format(name=name)`.
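
The speed claim is easy to sanity-check with timeit; f-strings skip the attribute lookup and method call that .format pays for:

```python
import timeit

name = "world"
t_f   = timeit.timeit(lambda: f"Hello, {name}!", number=500_000)
t_fmt = timeit.timeit(lambda: "Hello, {}!".format(name), number=500_000)
t_pct = timeit.timeit(lambda: "Hello, %s!" % name, number=500_000)
print(t_f, t_fmt, t_pct)  # the f-string is typically the fastest of the three
```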

~~~
uranusjr
FWIW I came to the same conclusion as yours for the exact same reason
(mocking). So there are at least two of us :p

------
smabie
A 30% speed-up in Python is still dog-slow. This is a terrible article; he
doesn't even talk about his "example." It's like he gave up 1/10th of the way
through the post.

------
kashug
The article does not seem to work for me. I only get "undefined" as content.
Looking at the network debugger in Firefox, the call to load the article seems
to be blocked due to CORS. (It tries to do a call on port 1234 for some reason.)

------
luord
Just read the three top comments and their threads. There was absolutely no
meaningful discussion or worthwhile contribution in any of them, just fans of
less popular languages mostly venting their resentment.

The weirdest thing is that they aren't even using Python, nor does it seem
that they're being forced to use it currently, making all this... ranting
(there's literally no other word for it) all the more inexplicable.

I don't understand it; I've been using Go for a year now at work. I hate
pretty much everything about it, yet I haven't ranted about it in an article
about the language for about that time. There's just no point to it.

~~~
rednafi
This pretty much sums up the whole thread. Lots of new people are doubting
Python in cases where Python is being heavily used in megacorps. Python is
special for its community, libraries and the huge amount of work that it's
built on top of. And wake me up when some of these obscure languages that are
being mentioned here take over Python.

But Python zealots can be annoying. That's true for any language. Personally I
don't like python's asynchronous programming paradigm. Objectively Go does it
better than Python.

------
ezzzzz
Anybody with experience able to chime in on a question? So, at a high-level, I
am looking at using Python at my workplace. We are a weird amalgamation of a
Java and Microsoft shop, using Java and Kotlin for 'critical' systems, while
heavily relying on SQL Server/SSIS/SSRS for all our back-office processing
(batch jobs, reporting, ETL etc). This is the stuff my team is responsible
for, and we are constantly hitting the limitations of this stack. My feeling
is that Python brings enough to the table as a general purpose language to be
a good fit for our use-cases: simple automation of file I/O, analytics and
reporting, small-footprint web frameworks (Flask), big-data tools like Spark,
libraries like Pandas, PyTorch, etc. Also, I don't have time to learn idiomatic
Scala. It's not about laziness; it's just that I feel Python brings enough to
the table to be useful, while still being productive and readable. Then I read
threads like this and start second-guessing myself. I see some red-flags for
sure, but I'm just looking for some validation here. Basically, we have a lot
that needs fixing, we need to do it quickly, and I'm wondering if Python can
work. We are certainly in the realm of 'big-data', and are currently handling
everything with procedural SQL, some Java apps that need refactoring, Perl
scripts and scheduled tasks on Win Server, and a bloated, poorly implemented
Java Web App to provide a front-end to our poorly maintained, non-normalized
database.

~~~
pjmlp
Back in my TCL days, I learned never to rely again on a language without a
JIT/AOT toolchain.

So unless you are into adopting PyPy, you will be better off with JVM and .NET
stacks.

Plenty of languages to choose from, while benefiting from their performance
and tooling.

~~~
ezzzzz
I should note, I'm not particularly concerned with performance. We already
have fairly optimized DB code: views, sprocs, indexes, etc. This layer is
currently sufficient for our needs, so ideally we would continue to leverage
SQL Server. What we need is to extract business logic from the DB into
application code that is testable. All of this processing is 'batch', and we
also have options for deploying (Azure, PCF) which can handle issues of scale.
I'm more concerned with getting it right than making it fast.

I'm not very experienced with C#, but I do have experience with Java/Spring
web development, and have yet to find any frameworks that allow for rapid
development akin to Flask or Rails. Java/Kotlin is great for back-end dev with
Spring Boot, but full-stack... not so much. Also, I don't want to manage the
complexities of any front-end JS framework-du-jour. I know React, Angular and
some Vue. I'm very much of the YAGNI philosophy when it comes to front-end (at
least for Enterprise apps). PyPy is a viable option, as I don't see any
immediate need to call into C (although this assumption is likely to come back
to bite me).

~~~
pjmlp
Grails, although it has gone out of fashion.

~~~
ezzzzz
meh. I’m not trying to sound cultish, but if you’re not at least familiar with
some of the packages I mentioned... Python is different from TCL. Python isn’t
growing in popularity for nothing. At the end of the day, I just want tools
that get out of my way, while keeping the LOC I’m responsible for maintaining
small and easy to grok.

~~~
pjmlp
Python is growing in fashion because those Fortran and C++ GPGPU libraries
happen to have Python bindings out of the box, whereas other languages are
only getting them now.

That, and it has replaced Java in many introduction-to-programming courses.

Which is good; when learning to program, performance isn't a concern as such.

I have known Python since Zope was the only reason to use it, so around Python
1.5 or something.

Other than replacing what I used Perl for, regarding UNIX shell scripting, I
never used Python in any scenario where performance might come into play.

There are plenty of options that beat Python's LOC, while providing an AOT/JIT
toolchain out of the box.

------
imtringued
I personally dislike the use of caching to increase performance. It is very
easy to slap on caching, and then the benchmarks say the problem is fixed, but
you end up with unpredictability: you no longer know how much memory your
program is using, and you can't tell whether a given function call is the
source of a bottleneck. Your profiler will show a single hot function when the
cache is empty, but all the other calls that happen after the cache warms up
become invisible.
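
That trade-off is easy to see with `functools.lru_cache` (my example, not the
article's): the first call does the work and shows up in a profile, later
calls are near-free and invisible, and an unbounded cache holds its entries
indefinitely. A minimal sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded: entries are never evicted
def expensive(n):
    # stand-in for real work; only runs on a cache miss
    return sum(i * i for i in range(n))

expensive(10_000)  # miss: does the work, visible to a profiler
expensive(10_000)  # hit: near-instant, invisible to a profiler
info = expensive.cache_info()
print(info.hits, info.misses, info.currsize)  # 1 1 1
```

At least `cache_info()` lets you observe hits, misses, and entry count, and
`maxsize=128` (the default) bounds the memory, which mitigates some of the
unpredictability.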

------
Alex3917
There are some interesting things in here I wasn't aware of. That being said,
you should really be timing individual lines by using line_profiler, otherwise
even if you find a slow function you won't have any idea what part is making
it slow. Often it's extremely counterintuitive. E.g. compiling regular
expressions can be hundreds of times slower than executing them.
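
The regex point is easy to check with `timeit` (a quick sketch of mine; note
that `re` caches compiled patterns internally, so `re.purge()` is needed to
expose the repeated-compile cost):

```python
import re
import timeit

pattern = r"\d{3}-\d{4}"
compiled = re.compile(pattern)

# Reuse a precompiled pattern on every call
t_exec = timeit.timeit(lambda: compiled.search("call 555-1234"), number=10_000)

# Force a fresh compile on every call (purge defeats re's internal cache)
def compile_each_time():
    re.purge()
    re.search(pattern, "call 555-1234")

t_compile = timeit.timeit(compile_each_time, number=10_000)

print(f"precompiled: {t_exec:.3f}s, compile-per-call: {t_compile:.3f}s")
```

The exact ratio varies by pattern and Python version, but compiling per call
is reliably far slower than reusing the compiled object.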

------
sbr464
I’m currently working on a lib that allows choosing the best implementation of
a method based on the current browser/os.

Performance varies wildly for basic coding decisions across platforms.
Especially diff combinations of browser + os.

I'm deciding on a name still; I was thinking of concepts like ‘popular’ from
the song by Nada Surf, or photo finish (horse racing), or something like
unfortunate/wheel of unfortune, poking fun at the need to have this lib.

Here's a messy example that shows this issue (try it in diff browsers).

[http://jsben.ch/Uzj2Q](http://jsben.ch/Uzj2Q)

------
daenz
>Generators are not inherently faster as they were made to allow for lazy
computation, which saves memory rather than time. However, the saved memory
can be cause for your program to actually run faster. How? Well, if you have
large dataset and you don't use generators (iterators), then the data might
overflow CPUs L1 cache, which will slow down lookup of values in memory
significantly.

Can someone chime in about the L1 cache? The claim is made without
measurements, so I am skeptical.
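
The memory half of the quoted claim, at least, is easy to measure; whether the
L1-cache effect actually dominates would need a benchmark on a concrete
workload. A small sketch (mine, not the article's):

```python
import sys

n = 1_000_000
as_list = [i * i for i in range(n)]  # materializes every element up front
as_gen = (i * i for i in range(n))   # produces one element at a time

print(sys.getsizeof(as_list))  # megabytes of pointer storage
print(sys.getsizeof(as_gen))   # a couple hundred bytes, regardless of n

# Both yield the same total, but the generator never holds all values at once
assert sum(as_gen) == sum(as_list)
```

Note that `getsizeof` only counts the container (the list's pointer array, the
generator's frame), not the element objects themselves, so the real gap is
even larger.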

------
im3w1l
I think you'd be better of offloading the hot part to C++.

------
ascotan
[https://dev.to/martinheinz/making-python-programs-blazing-
fa...](https://dev.to/martinheinz/making-python-programs-blazing-fast-4knl)

Honestly, the quality bar for most things written in Python is pretty low, so
anything that can help people improve is fine. So kudos to the author.

------
gridlockd
The only way to make Python programs "blazingly" fast is to not use the Python
interpreter at all in the hot path.

Almost everything the Python interpreter does is ridiculously slow, even for
an interpreted language. The language design[1] prevents fast
implementations[2].

[1] Restricted subsets of Python do not count

[2] No, PyPy is not fast. It is slow, even for a JIT.
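
In that spirit, one way to keep interpreted bytecode out of the hot path
without leaving CPython is to hand the per-element loop to C, via builtins or
C-backed libraries such as NumPy. A stdlib-only sketch of the difference
(my example, not the commenter's):

```python
import timeit

n = 1_000_000

def python_loop():
    # every iteration executes interpreted bytecode
    total = 0
    for i in range(n):
        total += i
    return total

def c_loop():
    # the builtin sum() iterates in C, not in the interpreter
    return sum(range(n))

assert python_loop() == c_loop()

t_py = timeit.timeit(python_loop, number=10)
t_c = timeit.timeit(c_loop, number=10)
print(f"interpreter loop: {t_py:.3f}s, C-level sum: {t_c:.3f}s")
```

The gap widens further with vectorized libraries, where entire array
operations happen in compiled code.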

~~~
pjmlp
> The language design prevents fast implementations.

Apparently the fact that the complete world may change at any given moment,
and every single operation requires method calls, doesn't prevent the
existence of reasonably good JIT compilers for Smalltalk; in fact, they are
the genesis of Java JITs.

------
JelteF
Interesting post, but the code examples are completely unreadable on firefox +
windows, because of the CSS color: #333 on .hljs class.

[https://imgur.com/tUnlwWa](https://imgur.com/tUnlwWa)

------
qwerty456127
> Use Functions. This might seem counter intuitive, as calling function will
> put more stuff onto stack and create overhead from function returns, but it
> relates to previous point. If you just put your whole code into one file
> without putting it into function, it will be much slower because of global
> variables. Therefore you can speed up your code just by wrapping whole code
> in main function and calling it once, like so.

Wow, this is the one I didn't expect. I always wrap scripts in a main function
out of pure perfectionism (or perhaps that's OCD), but the fact that a script
without it is going to run slower seems counter-intuitive and should really be
among the first things taught.

~~~
hoiuyoi9087
> _should really be among the first things taught._

No, it shouldn't. You don't teach a language by discussing micro-
optimizations, especially when you're talking about Python.

------
uncle_j
Some of these optimisations are very similar to what you used to do in
JavaScript with slower JS engines, e.g. caching a value in a variable rather
than constantly accessing a property.

------
armitron
Blazingly fast? Not the words I would use..

------
commandersaki
Write CPython with an emphasis on C. Then get the speed gains you need.

------
jokoon
Aren't there some rules of thumb for writing fast Python code?

------
Jaxkr
Surprised PyPy was not mentioned.

------
AzzieElbab
Never got past the spinner. Is this an inside joke? The spinner was pretty
fast.

