Oh, I'm familiar with numba and while it certainly helps, it has plenty of its own issues. You don't always get a performance gain, and you only find that out at the end of a refactor. Your code can get less readable if you need to shuttle data in and out of formats it's compatible with (looking at you, List()).
To say nothing of adding yet another long dependency chain to the language (Python 3.11 is still not supported even though work started in Aug of last year).
I do wonder if the effort put into making this slow language fast could have been put to better use, such as improving a language with Python's ease of use but which was built from the beginning with performance in mind.
I've rewritten real-world performance-critical numpy code in C and easily gotten a 2-5x speedup on several occasions, without having to do anything overly clever on the C side (i.e. no SIMD or multiprocessing in the C code, for example).
Did you rewrite the whole thing or just drop into C for the relevant module(s)? Because the ability to chuck some C into the performance critical sections of your code is another big plus for Python.
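For anyone who hasn't done it, the ctypes route is the low-ceremony version of this: no extension module, no build step on the Python side. A minimal sketch, calling sqrt from the system's libm just to show the mechanism (in a real use you'd load your own compiled .so/.dylib; library lookup is platform-dependent, this assumes find_library can locate libm):

```python
import ctypes
import ctypes.util

# Locate and load the C math library. On a real project this would be
# your own compiled shared library, e.g. ctypes.CDLL("./fastkernel.so").
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so ctypes converts arguments correctly;
# without this, ctypes would pass a Python float as an int and
# assume an int return value.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # → 1.4142135623730951
```

The declaration step is the part people forget: ctypes does no type checking against the C header, so a wrong argtypes/restype silently gives you garbage rather than an error.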
But... pretty much any language can interoperate with C; its calling conventions have become the universal standard. I mean, I still remember at $previousJob when I was deprecating a C library and carefully searched for any mention of the include file... only to discover that a whole lot of Fortran code depended on the thing I was changing, and I had just broken all of it (since Fortran doesn't use include files the same way, my search for "#include <my_library" didn't return any hits, but the function calls were there nonetheless).
Julia, to use the great-great-grand-op's example, seems to also have a reasonably easy C interop (I've never written any Julia, so I'm basing this off skimming the docs, dunno, it might actually be much more of a pain than it looks like here).
I’ve done the same but moved from vanilla numpy to numba. The code mostly stayed the same and it took a couple hours vs however long a port to C or Rust would have taken.
For a package whose pitch is "Just apply one of the Numba decorators to your Python function, and Numba does the rest.", a few hours of work is a long time.
A 2-5x speedup is not a lot; I would say a rewrite from Python to C isn't worth it unless you get an order-of-magnitude improvement.
Because if you weigh the benefit against the cost of the rewrite, the cost of maintaining/updating the C code, and the usual C footguns like manual memory management, there isn't much benefit left.
I highly doubt that numpy can ever be the bottleneck. In a typical Python app there are other things, like I/O, that consume resources and become the bottleneck before you run into numpy's limits and can justify a rewrite in C.
I haven't personally run into IO bottlenecks so I have no idea how you would speed those up in Python.
But there are two schools of thought I've heard from people regarding how to think about these bottlenecks:
1. IO/network is such a bottleneck so it doesn't matter if the rest is not as fast as possible.
2. IO/network is a bottleneck so you have to work extra hard on everything else to make up for it as much as possible.
I tend to fall in the second camp. If you can't work on the data as it's being loaded and have to wait till it's fully loaded, then you need to make sure you process it as quickly as possible to make up for the time you spend waiting.
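And sometimes you can split the difference: if the data arrives in chunks, you can prefetch the next chunk in a background thread while crunching the current one, so part of the IO wait disappears. A toy sketch of that pattern, where load_chunk and process are hypothetical stand-ins for real IO and real number crunching:

```python
import concurrent.futures
import time

def load_chunk(i):
    # Stand-in for a slow read (disk/network); sleep simulates latency.
    time.sleep(0.05)
    return list(range(i * 4, i * 4 + 4))

def process(chunk):
    # Stand-in for the CPU-bound work.
    return sum(x * x for x in chunk)

def pipeline(n_chunks):
    """Prefetch chunk i+1 in a worker thread while processing chunk i."""
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load_chunk, 0)
        for i in range(n_chunks):
            chunk = future.result()           # wait only for the prefetched load
            if i + 1 < n_chunks:
                future = pool.submit(load_chunk, i + 1)  # overlap the next load
            results.append(process(chunk))
    return results

print(pipeline(3))  # → [14, 126, 366]
```

Threads are fine here because the waiting happens in IO (where the GIL is released); if the "load" were itself CPU-bound Python, you'd want processes instead.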
In my typical python apps, it's 0.1-20 seconds of IO and pre-processing, followed by 30 seconds to 10 hours of number crunching, followed by 0.1-20 seconds of post processing and IO.
A 2-5x speedup barely seems worth rewriting something for, unless we're talking about calculations that take literally days to complete, or you're working on the kernel of some system that is used by millions of people.