This is a terrible article. How this has gotten this kind of traction is unexplainable to me.
>This is the program I will be using for demonstration purposes
>Never comes up again for the rest of the post.
???
>Let's show you how to profile code.
>Also, here's a bunch of unprofiled suggestions with such precise and helpful comments as "slow" and "fast".
?????
>Python haters always say, that one of reasons they don't want to use it, is that it's slow. Well, whether specific program - regardless of programming language used - is fast or slow is very much dependant on developer who wrote it and their skill and ability to write optimized and fast programs.
This is so ridiculous it's honestly laughable.
It's such an obvious falsehood that the only explanation is that either the person is truly this clueless, or else they are wilfully spewing bullshit. A bare metal language like C/C++ will of course let you do things faster than a heavy dynamic language like Python.
The mental gymnastics people do to justify not learning another tool.
You know what they say, if all you have is a hammer, everything looks like a nail.
>First rule of optimization is to not do it.
If this person is representative, this explains why computers are hundreds of times faster but most software feels slower than in 1999.
>If this person is representative, this explains why computers are hundreds of times faster but most software feels slower than in 1999.
I think they are representative of a lot of developers. With the continued pace of chip development for the last 35 years, there hasn't been a continuing need to program for performance - in general, unless programmers did something very dumb or were dealing with large amounts of data, they could just write the way they wanted and let the hardware handle making their program fast.
Contrast this with early computer games - to get the best performance, some games would actually boot your computer without an OS, sacrificing some convenience to get the last few percent of speed out of the system, because it was the only way to outperform the competition.
One reason there's such opportunity in the present state of CPU technology (clock speeds have halted at about 4 Ghz in favor of more cores) is that few people remember how to program for performance, and those that do are handicapped by a bloated OS built for profit rather than value.
The real reason is in numbers - how many people that could program a PC starting from bootstrap in assembly are out there, and again how many programmers that can paste Java(EE|script)? code together to make something work are out there?
The world needs tons of software, and the vast majority of that software just needs to do some things right some of the time, and an average Java EE developer toiling away in a cubicle is good enough to deliver it.
Writing efficient software is a HARD problem, and it doesn't make economical sense to actually write efficient software, it makes sense to write just good enough software and throw hardware at it. For the price of a developer-year you can provision hundreds of machines to run that piece of code.
I was rubbing my hands together thinking "oh boy, I already know about speeding up with PyPy, I wonder what other tricks I don't know?" The article was sorely disappointing.
> So, let's prove some people wrong and let's see how we can improve performance of our Python programs and make them really fast!
> [sets the stage with a program that takes 11 seconds to run]
> This is more about general ideas and strategies, which when used, can make a huge impact on performance, in some cases up to 30% speed-up.
...that's it? up to 30% speed-up is "blazingly fast"? On a program that takes 11 seconds to run, that still takes 8 seconds... I was expecting speed-ups that took execution to milliseconds or microseconds.
Seeing as the interpreter takes about 60-70 ms to just start up and do nothing, I don't think we're going to be seeing any benchmarks in the microseconds.
$ time python -c pass
real 0m0.062s
user 0m0.011s
sys 0m0.042s
That was the quickest of three runs on one of my servers.
FWIW, I upvote things like this on occasion because I often learn more from comments/debate on articles I disagree with than from those that I solidly agree with already. Sadly HN's ability to have productive debate seems to be in decline, but that's another topic.
I don't know why but Python attracts some real zealots. I think everyone has had this conversation about Python's speed with someone (I have, unfortunately, had it many times)...if your program is slow, it is always your fault, Python is perfectly optimised and there is no reason to use any other language...I don't get it. Why does Python need to be all things to all people (srs, I want to understand)?
Perhaps Python doesn't need to be all things to all people, but it does need to be performant because people really shouldn't start projects with Python unless they are sure their project will never have performance requirements that Python can't deliver on, and they understand the limits of the "just rewrite it in C!" or "just use multiprocessing!" advice. Since it's generally unpredictable whether a project's performance requirements will exceed Python's ability or whether the eventual bottlenecks will be amenable to the aforementioned advice, using Python is generally a bad idea; however, many people don't understand the significant caveats to C/multiprocessing and therefore continue to happily start projects in Python (and paint themselves into corners that are expensive to back out of).
> The mental gymnastics people do to justify not learning another tool. You know what they say, if all you have is a hammer, everything looks like a nail.
This is so important to listen to.
Some people just can't accept that there is a right tool for a job. And filing that hammer down into a needle is not something to be proud of.
The title is bad; the article doesn't deliver on the promise of making Python programs "blazingly" fast.
The first example given (the exponential function) is basically the worst scenario, because it's a purely numerical computation expressed in pure Python code. Whereas Python's performance is okay-ish for I/O or calling C modules.
From doing Project Euler solutions, I have ample evidence that for pure numerics (e.g. int, float, array), Java is anywhere from 10× to 30× faster than pure Python code executed in CPython. https://www.nayuki.io/page/project-euler-solutions#benchmark...
I believe it is basically impossible for Python to win back all that performance loss without adopting radical and jarring features like static typing, machine-sized integers, and no more "every number is a full-fledged object".
That's because PyPy has poor support for C-API extensions, which is what all the performance-oriented Python packages rely on.
Most people doing serious numeric work don't care about the speed of the Python interpreter because all the heavy lifting is done by optimized libraries like Numpy and TensorFlow.
Then maybe the statement should have been "I believe it is basically impossible for Python to win back all that performance loss without adopting radical and jarring features like dropping the legacy C interface to Python objects."
The fastest dynamic languages I see are those which have no C interface at all. How many of the performance optimizations in a modern JavaScript engine would be possible if it had to support accessing every property of every object as a C string at any point in a program? JS has exactly none of the features that comment claims would be necessary for performance.
> dropping the legacy C interface to Python objects.
This is nonsense. No language ever will have linear algebra or numeric implementations faster than Fortran/C implementations of BLAS, LAPACK etc. Not being able to make ffi calls makes python essentially unusable for most of its niche uses.
> JS has exactly none of the features that comment claims would be necessary for performance.
I don't see people doing ML, scientific computing etc in JS.
You're confusing "no FFI whatsoever" with "a sane FFI". Python's legacy interface is insane--it exposes virtually all details of the CPython interpreter to C extensions such that any deviation from CPython's implementation is effectively a breaking change.
Because of the web, Google and other companies have poured tons of resources into improving the speed of JavaScript. I wonder where Python would be if the same level of resources had been devoted to it.
Interestingly you didn't mention Lua. The best Common Lisp, Smalltalk, and JS systems are always about a factor of 2 to 10 slower than C. This is not very good evidence for your assertion. (I suspect that's true of Dylan too, but haven't tried it.) By contrast, LuaJIT often beats GCC on performance. This is excellent evidence for your assertion.
Typically I measure CPython at about 40× slower than C, and SBCL, V8, and the like at about 2–10× slower than C. What cases are you seeing where those runtimes are hundreds of times faster than CPython?
People don't normally update to new versions of Lua, because they're not backwards compatible; it's not like Python or JS. WoW is still using Lua 5.1, as is MediaWiki. It's unlikely that this will ever change.
Probably at some point someone will continue LuaJIT development. It seems as likely that they will diverge in a different direction as follow PUC.
Right! When your hot code is in C you get a performance mix. Still, only about 3% of your logic has to be in interpreted Python for it to dominate your runtime.
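Back-of-the-envelope, assuming the ~40x pure-Python-vs-C slowdown mentioned elsewhere in this thread (and measuring that 3% by work done, not by lines of code):

c_fraction, py_fraction, slowdown = 0.97, 0.03, 40
total = c_fraction * 1 + py_fraction * slowdown    # ~2.17x the all-C runtime
print(py_fraction * slowdown / total)              # ~0.55: the 3% left in Python now eats over half the runtime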
Chez scheme is just as dynamic as python and is about as fast as C# running on mono IIRC. My scheme was probably worse than my C# was when I did project Euler though.
Chez does unboxed integer arithmetic (but not floats) and does not have to do any OO-like dispatch, and is also probably one of the best language implementations there are.
Is it? Python is really, really dynamic, which contributes to its slowness. You can directly change an instance's __class__ attribute. You can add properties to classes dynamically, changing the fundamentals of how attributes get looked up at run time. You can write a new class, using a new metaclass, and then set an existing instance to the new class.
A great deal of why Python is so slow is that it is really too dynamic. A language doesn't really want to be "as dynamic as Python".
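To make that concrete, a tiny sketch (hypothetical names) of the kind of dynamism I mean:

class A:
    pass

class B:
    def hello(self):
        return "I am a B now"

obj = A()
obj.__class__ = B      # retroactively change the instance's class
print(obj.hello())     # prints "I am a B now"

# add a property to an existing class, changing how attribute lookup behaves
A.answer = property(lambda self: 42)
print(A().answer)      # prints 42

Every attribute access has to allow for all of this having happened at any point.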
Chez lacks a built in OOP system, but there is nothing prohibiting you from adding something like CLOS which does all python does and more (much faster than python).
Most modern lisp compilers do a lot of different things to make CLOS fast, though, prefilling caches and all that for you. Not only that, you can connect to a running program and redefine it while it is running.
In that case, the answer is actually no to my question. Yes, of course you could program in that level of dynamicness, because you could in any language, but it will then slow you down. No sensible CLOS would be as dynamic as Python.
Like I said, in a lot of ways, you don't want to be as dynamic as Python, and I advise against language advocates seeing the phrase "Python is more dynamic than your language" as a cue to jump up and start insisting that they are just as dynamic as Python. Even in hindsight, I'd say the level of dynamicness in Python was a mistake. You don't need it to have a nice, usable, dynamic language, but it has been a ball & chain around its legs in terms of performance for decades.
To be clear, this isn't a criticism of dynamic languages as a concept. I have criticisms, but these aren't it. This is a criticism of Python specifically. A dynamic language can be pretty nice with, let's say, two or three layers of dynamicness, but Python has four or five. If you follow the full process that Python has to go through to resolve "x.y", including all possible points where you might have done something to affect the result, it's crazy overkill. In Guido's defense, when he was writing it way back when, that wasn't clear. There wasn't a lot of highly-relevant prior art to look at for that style language.
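For anyone who hasn't traced it, a rough sketch (simplified, made-up names) of some of the layers a plain x.y has to consider:

class Descr:
    # data descriptor on the class: wins even over the instance __dict__
    def __get__(self, obj, owner):
        return "from data descriptor"
    def __set__(self, obj, value):
        pass

class X:
    y = Descr()
    def __getattr__(self, name):
        # last resort, only reached when everything else missed
        return "from __getattr__"

x = X()
x.__dict__["y"] = "from instance dict"
print(x.y)   # "from data descriptor" -- the instance dict is checked, but loses
print(x.z)   # "from __getattr__"

And __getattribute__ on the class (or on the metaclass, for X.y itself) can override all of the above.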
CLOS is just as dynamic as Python _and_ fast, at least in SBCL and LispWorks. CLOS is probably the most expressive object system you can find, and if you wanted it fast you would restrict it somewhat to allow for at least some of the dispatch to happen at compile time :)
It’s hard to find a Scheme implementation that isn’t significantly faster than Python... all while being at least as dynamic and more expressive.
I think a lot of CS students had their brains melted by tough classes where Scheme was the vehicle. Thus, you have a population of users that either are anti-evangelists because they have PTSD, or you have evangelists who exist on a plane above the lumpenproletariat like me, which can contribute to the false notion that Scheme is intrinsically esoteric.
I never got a formal CS education, so I don’t have PTSD from Scheme. Also, I am not a super brain, so I guess I haven’t felt compelled to become expert at the hard concepts that Scheme enables.
People fixate on the S-expression syntax (“all those parentheses!”; counter argument “way fewer commas!”). But I think the real issue for Scheme is the lack of libraries that do hard things for “normals” like me.
If I’m strictly honest, I’m more productive in Python than Scheme. This is not because Python is easier. It’s because the Python community has attracted the CS grads who grokked enough of the hard stuff to make libraries that abstract away stuff.
There’s no reason people can’t write Scheme like they write Python. That is, people don’t need to do all the possible stuff in Scheme all the time. Truthfully, Scheme is at least as easy as Python.
Scheme just needs more smart normals writing libraries for mediocre normals like me for Scheme to become popular. Maybe take a domain approach. I feel like adapting R’s tidyverse to Chez is an easy target. Scheme could be the data scientist’s goto. Maybe show people how easy it is to build self-contained serverless apps in the cloud.
If there were a Scheme community that earnestly tackled any domain with the idea of making it accessible to practitioners within the domain, I think it could get real traction.
A nice show of Chez Scheme speed is Idris 2. Edwin is very impressed with the performance; it is very fast. And imho it is a lovely language with, as you rightfully say, one of the best implementations out there. Extremely portable (cpu/OS). And quite readable too.
Scheme is relentlessly monomorphic, and most implementations don't provide much reflection, preferring compile-time macros. (I don't know what Chez’s reflection facilities are like; can you do the equivalent of def __getitem__ or x.__setattr__ = y?) These attributes make Scheme much easier than Python to implement efficiently.
My experience is that simple programs usually take about twice as much code in Scheme. This is related to the polymorphism thing, but also I think Python supports imperative programming better. Python is built around autogrowing hash tables (“dicts”) and autogrowing arrays (“lists”); standard Scheme provides neither, preferring alists and cons lists.
in chez I would probably use chez soop, which is hideously underdocumented and probably bitrotten by now, or any of the other object systems available.
It would of course slow things down, since I doubt Chez optimizes that, so you get runtime dispatch. SBCL however has an amazing object system that does all of Python's OOP with pretty good speed.
You are right that Python is better at imperative programming, but that is also very much a matter of taste. I always get a sour taste in my mouth when using Python because it is almost exclusively about mutability.
(hashtables are a part of r6rs, btw. Not very pleasant to use since: (hashtable-ref h key default-value))
That story of mine is old, but I have some newer anecdata: porting all my old Project Euler solutions to SBCL, and getting help from some better Lispers than me to optimize them, made SBCL come out close, but almost universally slower than Chez. If someone really wants, I can see if I can find the code on any of my external HDDs.
It is hard to claim, because there are hardly any benchmarks. I just had the opportunity to move between the implementations and found chez to be faster for the kind of things I was doing, which was numerical stuff and loops.
SBCL is also an amazing project, and lately the GC story has gotten better IIRC.
Probably not, but the code is a hell of a lot cleaner! I looked through it and thought that even I could work on it. The new macro expander is even possible to follow :)
You could just use Julia which basically gives you a Python like language but with C like performance. Julia will kick Java to the curb on numerics because it was actually built with that in mind.
And you get all of this WITHOUT static typing and yes in Julia everything is still a full-fledged object. There is no difference between numbers and other objects in Julia. But numerics is still fast.
I tried really hard to like Julia and just couldn't get behind it. We used it for a research paper in the area of optimization and by the end of the project I really regretted it. Performance and built in numerics libraries were great, but I feel the language ergonomics are pretty bad. I hate Python just as much as the next developer, but I do find it much easier to use than Julia for research type work.
My biggest qualm with Julia (and maybe this speaks to my inexperience with the language) is that it isn't always obvious when Julia is going to make a copy. We spent about an hour working through some code that was very slow (props to Julia's profiling tools) but couldn't figure out _why_ it was slow. It turned out that despite our best efforts, Julia was still copying a vector despite us using pre-allocated scratch space for the work.
From my point of view, if I am comparing algorithms then Python's performance doesn't really matter and its ergonomics win. If performance matters I'd just use C++ or Rust.
Interesting how we can have such vastly different experiences.
When Julia makes a copy is pretty straightforward and natural IMHO. I would have been curious to see an example of the code you used where a copy was made without you knowing.
I started with Python, but I find Julia better in almost every single way I can think of. Even if Julia were slower than Python I would have picked it, because I find it so much nicer to use.
I wrote an article here about some of the observations I had about using Python after coming back to it from Julia:
> My biggest qualm with Julia (and maybe this speaks to my inexperience with the language) is that it isn't always obvious when Julia is going to make a copy.
If you're talking about slices of an array, those always create a copy unless created with the @views macro (or the equivalent function call).
This is serious nonsense! Julia is way nicer than Python. If Python was faster than Julia, I would still have picked Julia, unless I really needed the performance.
With Julia I get first class meta programming. I get awesome multiple dispatch. I get environments and package system really well integrated. I get awesome integration with the shell. Better module system. More natural syntax for arrays. Much better system for closures. Better named functions.
REPL programming in Julia is just light years ahead of anything in Python. The OOP design of Python really kills the REPL experience.
Unless you are a very skilled C++ programmer, Julia is going to outperform you as the program gets larger. C++ programmers are going to get themselves tangled up when trying to run multi-threaded code, running on multiple machines on GPUs and specialized hardware. Julia does this effortlessly.
C++ cannot do JIT, hence as soon as you deal with complicated machine learning algorithms with custom kernels, C++ is going to tie itself into a knot.
Why do you think large Astronomy projects like Celeste and the next major climate models are built in Julia and not C++? Because developers realized that when you need to run massive calculations on super computers on hundreds of thousands of cores, C++ is going to get in the way.
As for libraries and tools. All the Python tools I have tried to match my Julia tools have just sucked. Julia tools often excel over much older Python tools.
Library development moves much faster in Julia than Python. It is not hamstrung by relying on complicated C++ code bases. Also Julia libraries integrate very well, while Python libraries are often their own deserted islands. That means a few Julia libraries can do what must be accomplished with dozens of Python libraries.
I don't think it's dead. You are right on the other points. I think that Julia could have its day, but it's having a lot of trouble getting traction. I mean, I remember 20 years ago being laughed at for using Python for a web app, but now look at Python.
No. But the quality of the libraries is often much higher, and can do things that are impossible in Python and in most other languages (owing to the high degree of Julia's polymorphism and homoiconicity).
It's somewhat the other direction. In the area that I work in, scientific machine learning and differential equation modeling, Python does not really have a well-developed ecosystem while Julia has all of the tools: high performance methods with stiffness handling, automatic detection of Jacobian sparsity from a Julia program, methods for stochastic/delay/differential-algebraic equations, and the ability to embed neural networks into arbitrary differential equations and train them in a scientific context. Python is very far behind even MATLAB or Mathematica in this domain.
That's interesting. How do you think R fares in this respect?
(The problem that I'm having with Julia isn't the math/computational aspect, it's Julia's use as a more general purpose programming language in addition to math.)
Typically straightforward Numpy gets me from 40× slower than straightforward C to 5× slower. Tricky Numpy (output arguments, conversions to lower precision, weird SciPy functions) gets me another factor of 2. C SIMD (intrinsics or GCC’s portable vector types) gets me a factor of 4 faster than straightforward C. Presumably CUDA would buy me another factor of 100 but I haven't tried it and I haven't tried Numba either.
You're probably expecting a bit much from CUDA. If you have heavily optimized CPU code running on a high-core count Xeon, it's probably more like 2-3x. The reason why CUDA is so popular is because it makes that comparatively easy to achieve. Optimizing x86 to the last 10% is a dark art only very few programmers are capable of, while writing decent GPU code is IMO an order of magnitude easier, i.e. just a craft.
Main difference being: High memory bandwidth vs. heavy usage of cache, unified programming model for vectorization and multicore parallelism
Yes, but it's much less emphasized on GPU I think. If you have a data parallel algorithm, as long as you design the array ordering to allow coalesced access, the memory architecture will usually already allow better performance than what you can hope to get from CPU even with heavily cache optimized code that's basically unmaintainable (as it will likely perform much differently on the next architecture).
Without lots of CPU cores and with a high-end NVIDIA card, your speedup expectations can just become a bit higher: typically 100x when comparing GPU-friendly algos to unoptimized (but native) CPU code, or 10x when comparing them to decently optimized code running on slower CPUs.
Generally I think a performance/cost comparison is more useful: Take the price of the GPU and compare it to something with equivalent cost in CPU+RAM.
> Typically straightforward Numpy gets me from 40x slower than straightforward C to 5x slower.
I find this hard to believe. What kind of numerical work are you doing? Even something as simple as matrix-matrix multiplication should be hard to beat with C, unless your C code is using a cache efficient algorithm.
Branch heavy code, for example trading order book updating.
People always say "use numpy", but that is only possible if your algorithm can be described in terms of vectorized operations. For many kinds of processing, the only alternative is C/C++ (through Cython)
> People always say "use numpy", but that is only possible if your algorithm can be described in terms of vectorized operations. For many kinds of processing, the only alternative is C/C++ (through Cython
I think using numpy is always a good first step, after just trying to improve the algorithm. Numpy will be less effort than going to Cython. After that, Cython is a good next step. I seriously don't know any situation where I would do the kind of micro-optimizations mentioned in the article.
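For what it's worth, a sketch (not from the article) of what "describable in terms of vectorized operations" means in practice - the same reduction as a pure-Python loop and as a single vectorized call:

import numpy as np

xs = np.random.rand(1_000_000)

# pure-Python loop: every element bounces through the interpreter
total = 0.0
for x in xs:
    total += x * x

# vectorized equivalent: one call, the loop runs in compiled code
total_np = float(np.dot(xs, xs))

assert np.isclose(total, total_np)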
My personal experience is that you can actually get another factor of 2 or 3 speed-up by ditching Cython and using actual C instead (I think it's because optimizers have a hard time cleaning up the C that Cython produces), even if you've turned off things like bounds checking.
I guess you haven't tried it, then. But your lack of knowledge is not a reasonable justification for attacking my integrity.
> Even something as simple as matrix-matrix multiplication
That's the best case for Numpy, not the worst. SGEMM is indeed just as fast when invoked from Numpy as when invoked from C, at least for large matrices.
Indeed, numpy is awesome. I used to do my computational experiments (some of them including neural networks) the plain way (using classic data structures, the language's built-in arithmetic operators and functional facilities) in many different languages. Once I tried Python with numpy, my mind was blown: it's so much faster than anything. I feel like I enjoyed writing functional code more, but given the performance difference I can hardly imagine going back. So the very reason I use Python is performance.
> I believe it is basically impossible for Python to win back all that performance loss without adopting radical and jarring features like static typing, machine-sized integers, and no more "every number is a full-fledged object".
Pypy is often quite a bit faster than CPython for real-world programs so clearly some improvement is possible.
It is a shame that Python became so popular. There are a number of choices that are just as “easy” (or better) and an order of magnitude more efficient.
Two options you can try guiding a budding python user to are Nim and F#
> So, let's prove some people wrong and let's see how we can improve performance of our Python programs and make them really fast!
I have to say, the desperate lengths Python programmers will go to to use it for things it was not meant for rather than learn or use other languages is one of the aspects I most dislike about it. However fast you make it, the same effort would have made it that much faster again in a performant language.
I love Python as a glue language. So much heavy lifting done in numpy or opencv or whatnot. But Python as the interface makes it trivial to explore, experiment, and glue together a workflow, especially when the solution is unclear.
Then at some point if Python isn't needed because you know exactly what you want your software to do, rewrite it in C++ or whatever.
Also with CFFI and other interoperable libraries, it's really quite easy to write some heavy work in a more appropriate language and call into it.
For that kind of workflow you would be far better off with e.g. Julia. You get the same advantage as Python of having a language you can experiment with until you find a solution. The only difference is that the optimization step later does not involve having to rewrite in another language.
If you already know Python, and Python packages already do all you will ever need then sure stick with that. But I don't get why people would go to such lengths to avoid using a new language. Being proficient in Julia is a lot less work than maintaining proficiency in Python and C++.
The last time I checked, using Julia was clunky at best with ridiculously high jit-compile times, packages that refused to build on my machine etc..
What is more, many of the "best" Julia libraries were seemingly just linked-in Python code.
I don't mean to discredit the advantages Julia clearly has over Python, but these are just the kinds of problems that make people like me stick with tried and tested last-gen languages like Python.
Did you ever check after 1.0 was released? In the earlier days it was a lot of problems with packages. Totally agree. JIT compile times are much better now.
A lot of the issues are simply that people have not learned a sensible workflow with Julia. Python guys have a lot of habits that don't translate well to Julia. I know because I work daily with two hardcore python guys. I notice all the time how we approach problems in very different ways.
Python guys seem to love making lots of separate little programs they launch from the shell. Or they just relaunch whole programs all the time.
In Julia in contrast you focus on packages from the get go and you work primarily inside the Julia REPL. You run Revise.jl package which picks up all the changes you make to your Julia package.
I guess it just depends on the workflows you are used to. For me it is the opposite. Whenever I have to jump into our Python code base I absolutely hate it. It is very unnatural for me to work in the Python way. I also find Python code kind of hard to read compared to Julia code.
But I know Python coders have the opposite problem. Basically Python guys look a lot at module names when reading code. Julia developers look more at types. The difference makes some sense since you don't really write types in Python code.
I found that the new Python type annotation system helped me feel at home in Python.
OMG. Until now I didn't realize Cobol is more popular than Haskell and Delphi. I had just read there are ~2.5 Cobol programmers left in the world, who are old like Gandalf, enjoy ridiculously high salaries and can't retire because nobody learns Cobol any more while the world needs somebody to maintain those legacy systems that still run its economy. While Delphi used to be the PHP of the desktop world until very recently (and could hardly be expected to decline so fast) and Haskell seems to be the computer science lingua franca.
>The more a language tutorial is searched, the more popular the language is assumed to be. It is a leading indicator. The raw data comes from Google Trends.
That's the problem, nobody is actively learning Cobol these days and nobody knows how much is running out there in the wild, because none of the big banks or credit companies will actually admit it.
For the pretty much perfect language, try Erlang or Elixir. The runtime guarantees are like no other languages, and it can easily drop down to C, C++, Rust and several others when needed.
Able to abstract concisely at a high level, provides decent facilities for modeling business logic in a succinct yet correct manner etc.
Elixir and Erlang both lack a type system and maintain a focus on keeping the language constructs simple rather than providing various high level abstractions some other more expressive languages have.
I mean sure, you are right, but is there a language that is expressive by your criteria?
You seem to be describing DSLs -- that can be done by all LISPs and some others -- and not programming languages per se.
Elixir has very powerful macros by the way, they are compile-time and not the runtime `method_missing` stuff that Ruby does, so they are basically code generation that you don't get to see but get all the benefits from.
With those macros you can go quite far creating business DSLs with Elixir (and a bunch more languages, in the interest of full honesty). That's why many people qualify Elixir as "LISP-y".
The lack of static typing is indeed something that is poking my eyes as well, but so far I have been managing. The extra cognitive load due to it -- plus the need to utilise the strong but dynamic typing system -- is definitely there though, cannot deny that.
I'm aware, I'm running Elixir in production, including custom macros.
That said, even the macro system pales in comparison with say, Haskell, in terms of expressiveness. Haskell has such a depth of constructs and allows highly sophisticated (& safe) abstractions that you often don't feel like you'd even need a separate DSL or an embedded DSL.
If anything, writing Elixir macros feels a lot like writing Typescript AST transformers to me.
static typing (especially HKT), macros, or laziness? The BEAM is amazing, but Erlang is kind of meh. Also the BEAM is very slow, so you end up writing C to get any real work done. I haven't used Elixir, but I suspect it suffers from the same problems. When the JVM finally supports preemptive multi-tasking, Erlang and the BEAM will become completely unnecessary.
As mentioned in a reply to your sibling comment, the lack of static typing is definitely a minus. Not denying it.
Full laziness I found quite overrated while trying to write several very small business services in Haskell. Even in Elixir, where you can utilise laziness in limited areas (in the I/O processing stdlibs, but there are almost no data structures for it), it's a proven fact [with benchmarks] that unless your data is at least a list with 500+ elements, laziness introduces both a performance penalty and complexity. Not bashing the idea, but it's not as universally good as several groups of rather religious programmers make it out to be. Right tool for the right job and all. ;)
Macros in Elixir are quite powerful and very different from the way it's done in Ruby. They are basically compile-time code generators, utilised with crushing success in Phoenix (web framework), Absinthe (GraphQL) and Ecto (DB data mapper library). Friendly programmer advice: don't underestimate Elixir macros. Many people stayed with Elixir mostly because of them.
---
> When the JVM finally supports preemptive multi-tasking, Erlang and the BEAM will become completely unnecessary.
1. Maybe, but I was hearing about preemptive multi-tasking on the JVM when I gave up on Java and that was around 2009. Believe me when I tell you, I really want at least 4-5 other runtimes to gain the capabilities of the BEAM. But alas, they still don't. Only Rust 3rd party runtime devs seem to be truly trying to push the envelope. Everybody else is like "yeah, we're gonna get it done by 2050, no worries". Sorry if that comes across as a bit cynical and dismissive, but I do read history.
2. The preemptive multi-tasking of the BEAM is indeed a huge selling point, but not the only one. The actor model -- basically green threads with message inboxes -- has historically been proven, more and more, to be a much more successful parallel computing model than... pretty much anything else. Really good Golang devs can kinda-sorta-partially emulate that with channels and mutexes/semaphores, but it's obviously manual and error-prone. Rust's `actix` devs also agree, and if you look at Techempower's benchmarks that try to emulate real web apps, you'll notice that `actix-web` is on top. Kinda funny when you think about how many people here on HN keep saying the actor model is only a good idea on paper and would never perform well in practice.
Additionally, if you keep an eye on the whole ecosystem (as much as that is actually even possible) you'd notice that almost everyone, JS included, is begrudgingly moving to immutable data and lock-less synchronisation. A lot of tech is starting to come to grips with the reality of us humans being flawed and not as bright as we'd like -- me included. I paid my dues with manually crafting mutex and condition variables workflows in C for years, then translating that to Java, then Go, etc. And I can confidently say: it doesn't work sustainably well. You can make a few simple pieces with it but for long-term projects, just go with immutability and lock-less sync.
---
I guess my point is, Erlang/Elixir are still quite strong and don't have much serious competition in their niche (where they are doing well).
Lastly, I agree that the BEAM in general isn't the fastest thing to run code on. Absolutely. In practice however -- and I mean importing a few million lines of CSV/XML a day -- I found that if you invest just a little more effort, Elixir will stay out of your way and your code will be mostly I/O bound. I mean yeah, Elixir isn't as fast as C++ or Rust. But I already used it for several pretty different projects and it has been performing absolutely excellently.
I am looking forward to learning Rust's `actix-web` though. I feel it could be a better fit for some dramatically more computationally demanding web projects.
Actor model is not for parallel computing, it's for concurrency. It introduces unnecessary overhead for parallel computing and is not ideal for it. For concurrency, on the other hand, it's probably superior to anything else out there.
Sure. Just saying that the BEAM lends itself to parallelism rather excellently. Conceptually you are correct but in practice -- in the BEAM at least -- those ideas can be converged.
You're right, Erlang still has an important feature that (almost?) all other languages lack: preemption. I think the actor model is great, but is crippled if required to use cooperative multi-tasking (like Akka); the code starts simple and then becomes increasingly tangled and complicated. I just think that the Erlang ecosystem has stagnated and the language itself feels dated. If/when the JVM or maybe Rust gets preemption, l think it would be best if all moved on to an actor system built on a more performant, robust, and expressive ecosystem.
> If/when the JVM or maybe Rust gets preemption, l think it would be best if all moved on to an actor system built on a more performant, robust, and expressive ecosystem.
If Rust gained all BEAM capabilities I'll switch tomorrow, dude.
At the moment though, Elixir is just the only choice for excellent parallelism and concurrency.
I know Rust is working hard to get there. It's not there yet though.
"...I paid my dues with manually crafting mutex and condition variables workflows in C for years, then translating that to Java, then Go..."
Quite frankly, most multithreading and synchronization work is quite simple and does not bother me, except in some rare cases (dealing with DirectShow, for example). If some languages fail to support real world constructs directly, it is the problem of their designers and users. All this stuff about horrible, horrible manual memory management and awful synchronization issues is overblown, I think.
We'll have to respectfully agree to disagree here. My first job was with 50+ year old guys who chased such bugs in micro-kernels and drivers literally every day. They swore that the world hasn't seen a dumber abstraction than pthreads, lol.
They were exaggerating of course -- and they were like gods, they fixed hundreds of sync bugs in C code, mixed with assembly for like 15 embedded platforms.
I am not such a hardcore programmer like them to this day even though I am almost 40 but the mutex/condition_var combo has bitten me many, many times. Spurious wakes, for example: not all platforms do them right (side note: I realise these may remain a forever fact of life and be a key to having preemptive scheduling like in the BEAM VM [Erlang/Elixir]). A lot of defensive coding is needed. And that's only one example out of dozens.
IMO you are just used to it. If you code in Erlang/Elixir on very parallel-friendly projects I am pretty sure you'll never go back to the pthreads model!
It's just the conclusion of the group of programmers to whom I philosophically belong (basically people who ran away screaming from C/C++ after working 2 to 10 years with them) that this stuff looks deceptively easy and works quite well in many cases... until the project gets big, many people get involved and bugs start to creep in. In such conditions, immutability + the actor model have so far proven to be much more people-friendly paradigms that make introducing a certain class of bugs much harder.
"We'll have to respectfully agree to disagree here"
Sure thing, I have just stated my personal opinion, not looking for anyone to agree. To each their own. Everything has a price, immutability/actor included. The rest is all about the trade-offs one is willing to make
That's unequivocally true! Immutability + actor model definitely punish you with some performance losses, especially in dynamic languages like Erlang and Elixir. No two ways about it.
I am looking forward to seeing whether the Rust community will manage to make a statically-typed BEAM-like runtime with much stronger performance characteristics. That would be amazing.
> [...]the desperate lengths Python programmers will go to to use it for things it was not meant for rather than learn or use other languages[...]
I don't think it's an issue of not wanting to learn other languages.
If you really like screwdrivers, go to screwdriving events and have a collection of screwdrivers, you may end up living in an echo chamber where you and your peers are convinced a screwdriver is a good tool for driving a nail.
I think this is exactly how R accreted features to make it do things that it was never designed to do. When the "tool you use" becomes the "kind of developer you are," things get a little restrictive.
It's also how languages like Scala become "everything and the kitchen sink" multi-paradigm monsters. People making the tool want it to do everything, so that they can get all the developers, so they make it a functional object-oriented imperative procedural declarative hodgepodge of 15 different barely-mutually-operable sub-languages, and you get a mess.
> However fast you make it, the same effort would have made it that much faster again in a performant language.
If you're trying to make something go 'really fast' these days, that means either (a) some kind of vectorization, or (b) pushing work onto a GPU. In either case Python is unlikely to perform meaningfully worse than any other language, since the host language isn't doing much anyway.
This is an unfortunately common misunderstanding of the phrase: "premature optimization is the root of all evil."
Optimization is a crucial part of developing successful software. It can be harmful to get overzealous with certain types of optimization, but basic wins like using string builder primitives or formatted strings from the outset are hardly premature. Some optimizations can only be realized at the early conceptual stages too; going for those early on isn't always premature.
Yeah, I see that a lot. People always leave off the last part of the quote, which was the actual point.
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
So annoying. I would get people telling me not to make an obvious performance improvement that adds no complexity to the code. Yet some basically insist on using the least performant solution possible as somehow being good software engineering. It is insane how rule-bound people can get. No wonder religion exists. People just love inventing rules and forcing others to follow them.
I think you’re right, programmers (and people) can be blinded by rules. Knuth never meant to suggest people should use the least performant solution available. He’d be horrified by some of the things people justify with his quote. The idea was always to make good engineering choices from the start, including using tools and techniques that are known to give good performance, but to wait to get low-level and measure and count cycles until the code is mostly done being written and not going to change significantly later. Choosing a less performant option because it’s slower is a bad choice. Choosing a slower option that is easier to refactor as you go, when it’s clear refactoring will happen, over a faster solution that will make refactoring harder, that is a perfectly acceptable engineering decision.
there's tons of dogma in programming. likely more than in most professions because it's not really a scientifically driven field. e.g. I'd love to just put all Emacs and VI zealots together in a room and show them Engelbart's 1968 demo, Clockwork Orange style.
Hahaha, that is a good one. I remember as a C++ programmer there were several occasions where I saw that a goto statement would have given the cleanest and most maintainable code (typically exiting deeper loops). Yet I always picked more convoluted solutions because I knew what an immense shit-storm I would have caused if I had checked in code with even a single goto statement.
It would not have mattered that I could have provided a rational explanation for why that was a rational choice in that instance. They would have just kept reciting scripture and called me a heretic.
Meanwhile people will let you commit the worst, most unmaintainable code, as long as it doesn't break any of the 10 commandments of coding or whatever the equivalent would be.
I wonder if using string builders is really a critical 3%... and how many people who do practice premature optimization actually measure if their optimization of choice is in their program's critical 3%.
String builders are often not a critical optimization, however the additional cognitive burden on the reader is nearly zero. In some languages, string builders can even overload operator += which makes the type of the object the only visible distinction outside of the final conversion to string.
In languages that have immutable strings, a chain of `+=` operators is basically O(n^2) vs O(n) for a string builder. For how easy the optimization is, there's little excuse to not use them for any bulk append operations.
rv = []
for x in y:
    rv.append(f(x))
    if g(x):
        rv.append(h(x))
return ''.join(rv)
This gives you the same O(N²) to O(N) speedup you would get from a StringBuilder.
More recently, though, I've often been preferring the following construction instead:
for x in y:
    yield f(x)
    if g(x):
        yield h(x)
This is sometimes actually faster (when you can pass the result to somefile.writelines, for example, which does not append newlines to the items despite its name) and is usually less code. If you want to delegate part of this kind of string generation to another function, in Python 3.3+ you can use `yield from f(x)` rather than `for s in f(x): yield s`, or just the `yield f(x)` you would use if `f` returned a string, and the delegation is cleaner and more efficient than if you're appending to a list and the other function is internally joining a list to give you a string.
However, if you're optimizing a deeply nested string generator, you're better off using the list approach and passing in the incomplete list to callee functions so they can append to it. Despite the suggestive syntax, at least last time I checked, `yield from` doesn't directly delegate the transmission of the iterated values; on this old netbook, it costs about 240 ns per item per stack level of `yield from`. (By comparison, a simple Python function call and return takes about 420 ns on the same machine.)
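To make the two styles concrete (hypothetical record/field names, just a sketch):

# generator style with delegation: clean, but each level of `yield from`
# adds a per-item cost as described above
def fields(record):
    yield record["name"]
    yield str(record["age"])

def lines(records):
    for r in records:
        yield from fields(r)
        yield "\n"

# list-passing style: the callee appends into the caller's list,
# and the join happens once at the top
def fields_into(record, out):
    out.append(record["name"])
    out.append(str(record["age"]))

def render(records):
    out = []
    for r in records:
        fields_into(r, out)
        out.append("\n")
    return "".join(out)

records = [{"name": "ada", "age": 36}]
assert "".join(lines(records)) == render(records)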
But if you really wanted your code to run fast you wouldn't have written it in Python anyway. You'd've used JS, LuaJIT, or Golang. Or maybe Scheme. Or C or Rust. But not Python.
This is why you really should always benchmark. In my view, "premature optimization" is not so much about optimizing too early in a project, it's about writing code a particular way you assume will make it faster without testing first.
So that means JS strings aren't truly immutable in a modern environment (which is fine). The runtime environment is internally using an approach similar to a string builder, which is a good optimization.
I agree that you shouldn't operate on assumptions alone for a decision like whether or not you should use a string builder. That's where prior experience should come into play to guide your decisions. For instance, I am not a JS developer, so I have no prior experience to inform a decision to use a builder vs concat in JS.
I cited that case in particular since the slowness of concatenation was called out in the article, and in some languages it actually does make a huge difference at a very small complexity cost.
When concatenating strings, a StringBuilder can be orders of magnitude faster. One C# example from a Google search for "stringbuilder vs string benchmark", from codinggame.com, measures 2.5 minutes for string += and 99 ms for StringBuilder.
None of the performance tuning suggestions are benchmarked, and I find it hard to believe these would ever make a substantial difference. They could make a statistically significant difference, maybe, but local variables vs class attributes? You should show how much of a time saver this is, because I can't envision a realistic scenario where this is worth the developer time.
The runtime cost of instance attribute access rather than local variable access can account for a quarter of a program’s run time; I just tried it on my phone:
Python 3.7.4 (default, Jul 28 2019, 22:33:35)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: class X:
   ...:     def y(z):
   ...:         return z.a + z.a + z.a + z.a + z.a
   ...:     def w(z):
   ...:         a = z.a
   ...:         return a+a+a+a+a
   ...:
In [2]: x = X()
In [3]: x.a = 3
In [4]: x.y()
Out[4]: 15
In [5]: x.w()
Out[5]: 15
In [6]: %timeit x.y()
1.7 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit x.w()
1.11 µs ± 5.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is often surprising to novices in Python, but attribute access involves a hash table lookup.
Note that here we are comparing two instance attribute accesses against seven, not zero against five. Evidently each of them cost about 118 ns, so if we could reduce them to zero, the method call and return and four additions would cost only 870 ns, which is closer to half the runtime than ¾.
Moral: benchmark before pooh-poohing a hotspot.
Also though note that several thousand instructions is a pretty heavy price to pay for four integer additions.
> can account for a quarter of a program’s run time
It can but it will more likely account for much much less than that, unless all your programs are massive loops that do little more than access the same attribute repeatedly.
The fact that the difference is 10ms and 8ms respectively suggests that the speedup of attribute access isn't what's showing up in your measurements. In one case we access the slot "a" once; in the other case we access it five times. How can that be a 20% difference?
Yes, but TFA begins with a discussion of benchmarking methods and it would just take some copy and pasting to figure out if you're right.
Edit: I tested it and there is a difference
I picked the example you mentioned, defined the regex constant as "s" and the line constant as "asas", increased the iteration count from 10^4 to 10^6 to make the difference more noticeable (I got no noticeable difference without that change), and measured the programs with time in Termux on my Nexus 5.
Real time for fast: 3.150, real time for slow: 3.623. Times are the average of three runs ("time python fast|slow.py"). I did one warm-up run with the same setup and threw out its results before the measured runs.
Edit2: actually you mentioned a different example, my bad. I didn't measure any others because vim with a touchscreen keyboard is a PITA, so no idea if the one you're referring to holds up.
Maybe coming from a diff place? It seemed fair: I spent an hour or two today basically undoing some canonical high-level Python/Pandas abstractions into canonical numpy+C, which was exactly this kind of stuff. We normally do GPU offloading (rapids.ai / blazingsql), though today's kernel was the uncomfortable inbetween where latency mattered but the task is too small for the offloading communication hit. We do the same kind of stuff (and a bit cleaner / more predictable) in JS with typed arrays.
I wouldn't call that blazingly fast python though, that's barely approaching C, which is also slow. Fast is maxing out SIMD + cores + GPUs + bandwidth, so should aim for ~20X+ faster than regular C...
Interesting article. While I definitely think you should be profiling your code to figure out the hot spots, cProfile has some limitations: it doesn't give you line numbers, doesn't work with threads, and significantly slows your program down.
I wrote a tool, py-spy (https://github.com/benfred/py-spy), that is worth checking out if you're interested in profiling Python programs. Not only does it solve those problems with cProfile - py-spy also lets you generate a flamegraph, profile running programs in production, works with multiprocess Python applications, can profile native Python extensions, etc.
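Typical invocations look roughly like this (check `py-spy --help` for the full option list; the PID and script name are just placeholders):

$ py-spy top --pid 12345                                  # live, top-like view of a running process
$ py-spy record -o profile.svg --pid 12345                # sample a running process into a flamegraph
$ py-spy record -o profile.svg -- python myprogram.py     # or launch and profile a script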
Have you looked at Yappi[0]? I use it in combination with kcachegrind[1] (call graph viewer) and the combination has been extremely useful in eliminating bottlenecks across entire programs.
Side note: I also used pyreverse, now part of pylint, to diagram entire projects and get a class hierarchy. It helped tremendously in refactoring and decoupling code through whole projects, finding redundancies, and have a better architecture.
Shoutout to Ben, py-spy is an amazing profiler. I believe cProfile has certain limitations and doesn't fully understand deep call stacks. py-spy does not have that limitation. It also offers multiple output formats (especially flamegraph and speedscope format, https://www.speedscope.app/) which make it so much nicer to identify bottlenecks.
At our company, py-spy has helped us a lot for our line-of-business application. I'm not affiliated with Ben in any way, but he deserves some praise for his work on py-spy.
Stopped here immediately. I have been writing software for more than 20 years, mainly in C++ and Python. No professional would start this kind of discussion with this childish attitude (apart from the fact that, content-wise, the problem has been beaten to death for decades).
There kind of are haters of every category though. But then again you also have Python fanboys who will do just about anything to avoid using something that isn't Python.
As a Julia developer I see this a lot. You point out Julia advantages and the Python guy will respond with: oh, I can do that in Python too if I use package X, Y, Z combined with feature A, B, C. Basically their response to a simple, well engineered feature is a complete mess of a solution. But hey, they prefer that because they can still stick the label Python on top of it.
I admit I also get set in my ways, but at least I like to think that when I dismiss another language it is not for purely silly reasons.
This happens with every language that reaches popularity. That’s because it’s typically easier for individuals to engineer solutions with tools they know well, even if suboptimal, than it is to become proficient with new ones that might or might not deliver better results in the end. No community is immune to this, even outside IT.
I’m pretty sure you’ll also occasionally bang nails for which Julia is a poor hammer, you just don’t realize it.
Oh I definitely know there are things Julia is not good at. It is just that Julia does not get in my way as frequently as many other languages.
But I kind of keep a collection of favorite languages under my belt which cover different areas. My favorites are probably Julia, Go, Swift, Python, Lua and LISP in that order.
If I need more low level style coding I would go with Go (pun not intended). Swift is nice if you actually want to make GUI applications and something that is quite robust. The type system in Swift is quite good at catching many problems.
There is a cost to switching and introducing new technologies.
Otherwise you end up with a project which uses all of Python/Perl/Java/Julia/MATLAB/R/C++/Fortran/Rust/Go, because hey, for this particular problem X has the best solution so lets use that.
I think it is valid though to start transitioning to a better language for a project when initial experiments show it is superior for the task.
People are often WAY WAY too reluctant to rewrite code. Instead they spend years maintaining garbage.
I remember rewriting an iOS app from Objective-C to Swift. Everybody thought it was a waste of time and should not be done. People tend to only think about what is of immediate benefit.
I only rewrote the most important parts. About 30% remained in Objective-C. Once it was in Swift lots of developers suddenly started getting interested in joining. They loved working with Swift and made lots of contributions.
But then they hit the Objective-C parts and were bummed out. All the guys who had said rewriting to Swift was a waste of time were now complaining about the existence of Objective-C code and insisting that we had to get rid of it.
So I rewrote the rest. The point is that people seldom realize how much of a benefit a better language can be until they actually start working on a code base written in one. Then they will often start hating the very code base they had previously defended.
Think of the millions of lines of Cobol code stuck on mainframes which are almost impossible to maintain today. We are stuck with that because at every juncture where there was a chance to upgrade and switch to a more modern technology, somebody made a variant of your argument.
Sure, you cannot have a free-for-all. But it must be possible to have a sensible process where you experiment with some alternatives, evaluate the pros and cons, and then switch to the better choice.
I think you underestimate infrastructure (APIs, tooling, people, etc.) here. As it happens, I am myself a beginner in Julia. The language is not the hard part; the gluing with my current working environment is. And as a beginner I won't even dare to think about commercial projects. This is quite non-trivial, and a wonderfully sophisticated one-liner quickly becomes far costlier once all the constraints are considered - even in the long run.
> I like to think that when I dismiss another language it is not for purely silly reasons.
One should also choose a suitable language if one does not want to risk falling victim to the same accusation.
The only Python programs that can be called "blazingly" fast compared to equivalent programs in performant languages are either spending all their time in I/O or spending all their time in C. Python is a nice language, and with some tricks you might speed it up by a factor of 2-10, but writing the same program in, say, Java will often be 50-100x faster.
Exactly. And then after the "blazingly" fast clickbait title we read:
"I'm (mostly) not going to show you some hacks, tricks and code
snippets that will magically solve your performance issues. This
is more about general ideas and strategies, which when used, can
make a huge impact on performance, in some cases up to 30%
speed-up."
CPU-bound Python programs are commonly around 30 times slower than C or Java, and an "up to 30% speedup" still leaves them more than 20 times slower, which is really far from "blazingly" fast.
Yes, but writing C is a completely different, and much more onerous, experience. You have no sensible strings or hashmaps in the standard library, and using external libraries is a massive pain. That's not even to mention Undefined Behaviour and memory-safety issues.
For most data-munging programs, the Python, Node, Java, and Rust code will look roughly similar (Java and Rust will make you annotate types). I've been amazed at the performance you can get from Rust code that looks practically identical to the equivalent JavaScript.
To be honest I find both the fastest C and Rust answers ("C gcc #6" @ 1.64s and "Rust #6" @ 1.70s) highly unreadable.
For me, "Rust #5" @ 1.98s is the first readable solution in the list, and seems comparable (code wise) with the Java, Python, etc implementations. ("C++ g++ #6" is also quite reasonable, but that's C++, not C).
Yeah and if you look at the Python program with the best performance compared to C you'll see that it spends all its time in gmpy2, which is exactly the same library C uses. Python still manages to be 2x slower.
The article has some embarrassing errors, and its advice is not going to make your Python programs blazingly fast, but it's a good start.
Resuming a generator in CPython is a lot faster than creating a whole new function call, and especially a whole new method call, contrary to what the article said. But often enough it's faster to just eagerly materialize a list result.
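A rough way to check this on your own machine (the functions and sizes below are my own toy example, not from the article; absolute numbers will vary):

    import timeit

    def square(x):
        return x * x

    def via_function_calls(n):
        return sum(square(i) for i in range(n))   # one Python function call per item

    def via_generator(n):
        return sum(i * i for i in range(n))       # generator resumed once per item

    def via_list(n):
        return sum([i * i for i in range(n)])     # list built eagerly, then summed

    for fn in (via_function_calls, via_generator, via_list):
        t = timeit.timeit(lambda: fn(100_000), number=50)
        print(f"{fn.__name__}: {t:.3f}s")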
Some other good tips: %timeit, ^C, sort -nk3, Numpy, Pandas, _sre, PyPy, native code. In more detail:
• For benchmarking, use %timeit in IPython. It's much easier and much more precise than time(1). For super lazy benchmarking use %%time instead.
• The laziest profiler is to interrupt your program with ^C. If you do this twice and get the same stack trace, it's a good bet that's where your hotspot is. cProfile is better, at least for single-threaded programs. Others here suggest line_profiler.
• If you have output from the profile or cProfile module saved in a file, you can use the pstats module to re-sort it by different fields. But you probably don't; you just have the text it printed. The shell command `sort -nk3` will re-sort that numerically by column 3, which is close enough. In Vim you can highlight the output and type !sort -nk3, while in Emacs it's M-| sort -nk3.
• You can probably speed up a pure Python program by a factor of 10 with Numpy or Pandas. If it's not a numerical algorithm, it may not be obvious how, but it's usually feasible; it requires sort of turning the whole problem sideways in your mind, and you may not appreciate that effort later when you need to modify the code. There's a sketch of the idea after this list.
• The _sre module is blazingly fast for finite state machines over Unicode character streams. It can be worth it to transmogrify your problem into a regular expression if you can.
• PyPy is probably faster. Use it if you can.
• The standard advice is to rewrite your hotspots in C once you've found them. Maybe this should be updated; Cython, Rust, and C++ are all reasonable alternatives, and for invoking the C (and friends) you now have cffi and ctypes available. In Jython this is all much simpler because you can easily invoke code written in Java, Kotlin, or Clojure. An underappreciated aspect of this is that using native code can save you a lot of memory as well as instructions, and the memory may matter more. Consider trying __slots__ first if you suspect this is the case.
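To illustrate the Numpy point above, here is a rough before/after sketch (the function, data, and sizes are my own; the actual speed-up depends heavily on the workload):

    import timeit
    import numpy as np

    def mean_square_py(values):
        # Pure-Python loop: one bytecode dispatch and one boxed float per element.
        total = 0.0
        for v in values:
            total += v * v
        return total / len(values)

    def mean_square_np(arr):
        # The same computation "turned sideways" into whole-array operations.
        return float(np.mean(arr * arr))

    data = [float(i) for i in range(1_000_000)]
    arr = np.array(data)                    # keep the data as an array up front

    print("pure python:", timeit.timeit(lambda: mean_square_py(data), number=5))
    print("numpy:      ", timeit.timeit(lambda: mean_square_np(arr), number=5))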
> The laziest profiler is to interrupt your program with ^C. If you do this twice and get the same stack trace, it's a good bet that's where your hotspot is.
I do that sometimes, but it has some pitfalls. If most of the time is spent inside a C module (for instance in numpy), then the interrupt won't be caught before the C module is exited, which can lead to a wrong stacktrace.
> If it's not a numerical algorithm, it may not be obvious how, but it's usually feasible. It requires sort of turning the whole problem sideways in your mind.
At this stage why are you even using python anyway? The code isn’t going to be very pythonic or readable and the effort would in my opinion be better spent on C++ or Rust.
Cython is an alternative option for speeding up specific parts of the code. There are also the numba and hope modules, which provide JIT decorators.
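For the numba option, a minimal sketch of the decorator approach (my own toy function; requires numba to be installed):

    import numpy as np
    from numba import njit

    @njit                          # compiles the function to machine code on first call
    def total_of(xs):
        total = 0.0
        for x in xs:
            total += x
        return total

    data = np.random.rand(1_000_000)
    total_of(data)                 # first call pays the JIT compilation cost
    print(total_of(data))          # later calls run at native speed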
Personally, I've tried PyPy out of curiosity, without issues, but in about 15 years of using Python I never ran into Python code being the performance bottleneck. There are highly performant modules for everything.
Do you have advice for finding code snippets that would most benefit from being re-written in C and called with Cython? I know how to find slow functions in Python, but obviously not every slow function will be a good candidate for rewriting.
In my experience, most pure Python code will be 10 to 100x faster if it's rewritten in C++. So I just profile it as usual using cProfile, try to make algorithmic improvements (e.g. caching) and then, if I need another order of magnitude, rewrite it in C++.
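A minimal sketch of that workflow with the standard library (the function and file names here are placeholders):

    import cProfile
    import pstats

    def hot_path():
        # stand-in for the code under investigation
        return sum(i * i for i in range(1_000_000))

    cProfile.run("hot_path()", "profile.out")        # dump raw stats to a file
    stats = pstats.Stats("profile.out")
    stats.sort_stats("cumulative").print_stats(15)   # top 15 entries by cumulative time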
So basically anything where the hot path is in pure Python, rather than a standard library method.
Fourth time this has been posted in 12 days. My comment from 12 days ago is at https://news.ycombinator.com/item?id=21930569 . I pointed out that kernprof profiling shows that 99+% of the time is spent in
s += num / fact
so none of the techniques described gives a blinding speedup. I also suggested pre-compiling the regex.
> Now, re.findall() does cache the last 100 or so regexps, so it probably won't re-evaluate the regex each time. But really, pre-compute that regex with "_my_pattern = re.compile(regex) ... _my_pattern.findall()" and avoid even that cache lookup.
cpburns2009 says it's 512 these days, which doesn't change the essence of my comment.
I regret reading this article and I think the title is clickbait. I was hoping for something like PyPy or Unladen Swallow, etc. The equivalent programs in TFA will be blazingly faster if simply ported to other languages.
> Don't Access Attributes (example: `import re; re.findall(...)` vs `from re import findall; findall(...)`)
I find it a good habit to always import modules and almost never (sane exceptions apply) import individual functions from them. If I use something frequently, I'd alias it for clarity (`import sqlalchemy as sa`).
The reason is that otherwise, patching with mocks becomes somewhat tricky, as you'll have to patch functions in each individual importer module separately. Here's an example: https://stackoverflow.com/a/16134754/116546
Maybe that's wrong but my idea is that I don't want to assume which module calls some specific function but just mock the thing (e.g. make sure Stripe API returns a mock subscription - no matter where exactly it's called from). Then, if I refactor things and move a piece of code around (e.g. extract working with Stripe to a helper module), my unit tests just continue to work.
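A small sketch of the difference (module and function names are made up for illustration):

    # helpers.py
    import re                          # module import: re.findall is looked up at call time

    def count_words(text):
        return len(re.findall(r"\w+", text))

    # other.py
    from re import findall             # function import: the name is copied into other.py

    def count_tokens(text):
        return len(findall(r"\w+", text))

    # in a test: patching "re.findall" affects helpers.count_words no matter who
    # calls it, but NOT other.count_tokens - that needs a separate patch of
    # "other.findall", one per module that imported the function directly.
    from unittest import mock
    with mock.patch("re.findall", return_value=[]):
        ...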
---
> Based on recent tweet from Raymond Hettinger, the only thing we should be using is f-string, it's most readable, concise AND the fastest method.
I love f-strings, but to the best of my knowledge one can't use f-strings for i18n/l10n, so all end-user-facing texts still have to use `%` or `format`. E.g. `_("Hello, {name}").format(name=name)`.
A 30% speed-up in Python is still dog-slow. This is a terrible article; he doesn't even talk about his "example." It's like he gave up a tenth of the way through the post.
The article does not seem to work for me. I only get "undefined" as content. Looking at the network debugger in Firefox, the call to load the article seems to be blocked due to CORS (it tries to make a call on port 1234 for some reason).
Just read the three top comments and their threads. There was absolutely no meaningful discussion or worthwhile contribution in any of them, just fans of less popular languages mostly venting their resentment.
The weirdest thing is that they aren't even using Python, nor, it seems, are they being forced to use it, which makes all this... ranting (there's literally no other word for it) all the more inexplicable.
I don't understand it; I've been using Go for a year now at work. I hate pretty much everything about it, yet in all that time I haven't gone ranting about it under articles about the language. There's just no point to it.
This pretty much sums up the whole thread. Lots of new people are doubting Python in cases where Python is being heavily used in megacorps. Python is special for its community, libraries and the huge amount of work that it's built on top of. And wake me up when some of these obscure languages that are being mentioned here take over python.
But Python zealots can be annoying. That's true for any language. Personally I don't like python's asynchronous programming paradigm. Objectively Go does it better than Python.
Anybody with experience able to chime in on a question? So, at a high-level, I am looking at using Python at my workplace. We are a weird amalgamation of a Java and Microsoft shop, using Java and Kotlin for 'critical' systems, while heavily relying on SQL Server/SSIS/SSRS for all our back-office processing (batch jobs, reporting, ETL etc). This is the stuff my team is responsible for, and we are constantly hitting the limitations of this stack. My feeling is that Python brings enough to the table as a general purpose language to be a good fit for our use-cases. Simple automation of file io, analytics and reporting, small footprint web frameworks (flask), big data tools like Spark, libraries like Pandas, PyTorch etc. Also, I don't have time to learn idiomatic Scala. It's not about laziness, its just that I feel Python brings enough to the table to be useful, while still being productive and readable. Then I read threads like this and start second-guessing myself. I see some red-flags for sure, but I'm just looking for some validation here. Basically, we have a lot that needs fixing, we need to do it quickly, and I'm wondering if Python can work. We are certainly in the realm of 'big-data', and are currently handling everything with procedural SQL, some Java apps that need refactoring, Perl scripts and scheduled tasks on Win Server, and a bloated, poorly implemented Java Web App to provide a front-end to our poorly maintained, non-normalized database.
I should note, I'm not particularly concerned with performance. We already have fairly optimized DB code, views, sprocs, indexes etc. This layer is currently sufficient for our needs. So ideally, we would still continue to leverage the SQL-Server. What we need, is to extract business logic from the DB, into application code which is testable. All of this processing is 'batch', we also have options for deploying (Azure, PCF) which can handle issues of scale. I'm more concerned with getting it right, than making it fast. I'm not very experienced with C#, but have experience with Java/Spring web development, and have yet to find any frameworks that allow for rapid development akin to flask or rails. Java/Kotlin is great for back-end dev with spring-boot, but full-stack... not so much. Also, I don't want to manage the complexities of any front-end JS framework-du-jour. I know React, Angular and some Vue. I'm very much of the YAGNI philosophy when it comes to front-end (at least for Enterprise apps). PyPy is a viable option, as I don't see any immediate need to call into C (although this assumption is likely to come back to bite me).
meh. I'm not trying to sound cultish, but if you're not at least familiar with some of the packages I mentioned... Python is different from Tcl. Python isn't growing in popularity for nothing. At the end of the day, I just want tools that get out of my way, while keeping the LOC I'm responsible for maintaining small and easy to grok.
Python is growing in fashion because those Fortran and C++ GPGPU libraries happen to have Python bindings out of the box, whereas other languages are only getting them now.
That, and it has replaced Java in many introduction-to-programming courses.
Which is good; when learning to program, performance isn't a concern as such.
I've known Python since Zope was the only reason to use it, so around Python 1.5 or something.
Other than replacing what I used Perl for, regarding UNIX shell scripting, I never used Python in any scenario where performance might come into play.
There are plenty of options that beat Python's LOC, while providing an AOT/JIT toolchain out of the box.
I personally dislike the use of caching to increase performance. It is very easy to slap on caching, and then the benchmarks say the problem is fixed, but you end up with unpredictability. You can no longer know how much memory your program is using, and you don't know whether a given function call is the source of a bottleneck or not. Your profiler will show a single hot function while the cache is cold, but all the other calls that happen after caching become invisible.
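A toy example of what I mean (my own sketch): with an unbounded functools cache, only the first call is visible to the profiler, while the cached results quietly stay resident in memory.

    from functools import lru_cache

    @lru_cache(maxsize=None)          # unbounded: entries are never evicted
    def expensive(n):
        return sum(i * i for i in range(n))

    expensive(10_000_000)             # slow; this is what the profiler sees
    expensive(10_000_000)             # near-instant cache hit - invisible in the profile,
                                      # but the result is still held in memory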
There are some interesting things in here I wasn't aware of. That being said, you should really be timing individual lines with line_profiler; otherwise, even if you find a slow function, you won't have any idea which part is making it slow. Often it's extremely counter-intuitive. E.g. compiling regular expressions can be hundreds of times slower than executing them.
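For reference, a minimal sketch of how line_profiler is used (the parse function is a made-up example; the @profile decorator is injected by kernprof at runtime, not imported):

    import re

    @profile                                       # provided by `kernprof -l`, not a normal import
    def parse(lines):
        pat = re.compile(r"\w+")                   # per-line timings show whether compiling
        return [pat.findall(l) for l in lines]     # or matching dominates this function

    if __name__ == "__main__":
        parse(["alpha beta gamma"] * 100_000)

    # Run with:  kernprof -l -v this_script.py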
I'm currently working on a lib that allows choosing the best implementation of a method based on the current browser/OS.
Performance varies wildly across platforms for basic coding decisions, especially for different combinations of browser + OS.
I'm still deciding on a name; I was thinking of concepts like ‘popular’ from the song by Nada Surf, or photo finish (horse racing), or something like unfortunate/wheel of unfortune, poking fun at the need to have this lib.
Here's a messy example that shows this issue (try it in different browsers).
>Generators are not inherently faster as they were made to allow for lazy computation, which saves memory rather than time. However, the saved memory can be cause for your program to actually run faster. How? Well, if you have large dataset and you don't use generators (iterators), then the data might overflow CPUs L1 cache, which will slow down lookup of values in memory significantly.
Can someone chime in about the L1 cache? The claim is made without measurements, so I am skeptical.
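As a rough starting point one could at least time the two variants (a sketch of mine; this measures wall time only and doesn't isolate cache effects, so it can't confirm the L1 explanation either way):

    import timeit

    N = 10_000_000
    gen_time = timeit.timeit(lambda: sum(x * x for x in range(N)), number=3)
    list_time = timeit.timeit(lambda: sum([x * x for x in range(N)]), number=3)
    print(f"generator: {gen_time:.2f}s   list: {list_time:.2f}s")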
The only way to make Python programs "blazingly" fast is to not use the Python interpreter at all in the hot path.
Almost everything the Python interpreter does is ridiculously slow, even for an interpreted language. The language design[1] prevents fast implementations[2].
[1] Restricted subsets of Python do not count
[2] No, PyPy is not fast. It is slow, even for a JIT.
> The language design prevents fast implementations.
Apparently the fact that the complete world may change at any given moment, and that every single operation requires method calls, doesn't prevent the existence of reasonably good JIT compilers for Smalltalk; in fact, they were the genesis of Java's JITs.
> Use Functions. This might seem counter intuitive, as calling function will put more stuff onto stack and create overhead from function returns, but it relates to previous point. If you just put your whole code into one file without putting it into function, it will be much slower because of global variables. Therefore you can speed up your code just by wrapping whole code in main function and calling it once, like so.
Wow, this is the one I didn't expect. I always wrap my scripts in a main function out of pure perfectionism (or perhaps that's OCD), but the fact that a script without it is going to run slower seems counter-intuitive and should really be among the first things taught.
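For anyone else surprised by this, a minimal sketch of the pattern (my own example; the point is that names inside a function are locals, which CPython looks up faster than module-level globals):

    import time

    def main():
        total = 0
        for i in range(10_000_000):   # 'total' and 'i' are fast local lookups here;
            total += i                # at module level they would be dict-based globals
        return total

    if __name__ == "__main__":
        start = time.perf_counter()
        main()
        print(f"{time.perf_counter() - start:.2f}s")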
Some of these optimisations are very similar to what you used to do in JavaScript with slower JS engines, like caching a value in a local variable rather than constantly accessing a property.