Profiling your Numba code

IshanMi · 2024-01-30T19:15:18 1706642118

I still remember discovering & using numba for the first time in university. At the time I didn't know any programming languages except for Python, and wasn't super great at Python either. We had to write and run molecular simulations for our numerical methods and simulations class, and people were writing their code in C/C++ and were blazing through it. I remember finding out about Numba through the High Performance Python O'Reilly book, and by just adding a single @jit decorator, my simulations were dramatically faster.

Amazing stuff, honestly felt like black magic at the time.

IanOzsvald · 2024-01-30T21:18:18 1706649498

That's my book :-) Micha and I are working on the 3rd edition right now. Cheers!

mafro · 2024-02-02T08:24:50 1706862290

Amazing! Any idea when the third edition will be published?

Thanks for your work on this

eachro · 2024-01-30T19:45:38 1706643938

Do people use numba outside of scientific computing code? feel like for most "normal" uses, you'd probably be fine with having your code written in numpy or pandas, so I'd love to hear what people generally use numba for outside of scientific computing.

IanOzsvald · 2024-01-30T21:36:38 1706650598

Really Numba will speed up numpy and some scipy (there's partial API coverage) and math based pure python. I think it is unlikely it'd be used away from math problems. As another commenter mentioned it can be used to accelerate numpy-array based Pandas (but not the newer Arrow based arrays), and again that's for numeric work.

a-dub · 2024-01-31T02:51:32 1706669492

numpy and scipy are largely just wrappers around native libraries that are already optimized machine code. numba is good for writing loops or doing other things you can pretty much imagine as simple C code.

while it is pretty cool, it's also a bit awkward thinking about machine structures and machine types in high level python. there are some gotchas with respect to the automatic type inference.

IanOzsvald · 2024-01-31T07:28:54 1706686134

Don't forget that a sequence of numpy operations will likely each allocate their own temporary memory. Numba can often fuse these together, so although the implementation behind numpy is compiled C you end up with fewer memory allocations and less memory pressure, so you still get your results even faster. Numba also offers the OpenMP parallel tools too. I have a nice sequence of simulations in my Higher Performance Python course going from raw python through numpy then to Numba showing how this all comes together. Just try having a function with a=np.fn1(x); b=np.fn2(a); c=np.fn3(b) etc and compile it with @jit and you should get a performance impact. Maybe you can also turn on the OpenMP parallelizer too.

Kaijo · 2024-01-30T23:08:01 1706656081

I've seen numba used in a few addons for Blender 3D. Speeding up reaction-diffusion simulations on meshes, things like that. Scientific computing for artistic purposes!

dheera · 2024-01-30T23:20:37 1706656837

I've used numba numerous times to get stuff done for robotics work. C++ is just a shitshow of a mess right now with 5 billion new unnecessary features yet the real inconveniences (like even the very existence of header files that massively violate DRY) haven't been addressed at all. I can't wait till Rust gains more adoption and ecosystem.

sobellian · 2024-01-31T06:22:25 1706682145

I want to like Rust, but I find myself fighting the language so much - and not because of the lifetimes. Off the top of my head:

- Yes you get const generics, but no you do not get const generic expressions

- Yes you can provide impls for traits, but no you cannot impl a foreign trait for a foreign type

- Macros cover these gaps in meta-programming, but macro code is really painful to read/write

- If you want to write one tiny little proc macro, you need a whole. new. crate. So I usually write ugly, convoluted "declarative" macros instead of what would properly be a proc macro

- Last time I wrote a Rust macro that generated a lot of code, the intellisense in VS Code stopped giving hints for functions in the macro output. Breaking intellisense is very painful

- Yes const fns exist, but no you cannot use iterators or for loops in them. AFAICT a lot of library authors just don't go through the trouble of defining const fns. I was bit by this when I tried to make a const array using the ndarray library. Instead you have to go through the whole song and dance with OnceCell.

- Speaking of which, OnceCell feels like a downgrade compared with static initialization in C++. You get more soundness, but also much more verbosity

There are very good reasons for many of these restrictions, but still. I fear doing anything too fancy because I've come to expect that the language will yank away the football at the last second.

PS. Not a language shortcoming, but - do not just write out idiomatic vector math expressions as you might do with Eigen like:

``` let a = b + c + d + e; ```

IIRC Eigen can be smart enough to only allocate one result vector for this expression. Rust numeric libraries, on the other hand, will allocate many unnecessary temporaries. You need to be thinking much more imperatively to get similar performance:

``` let mut a = b + c; a += d; a += e; ```

abdullahkhalids · 2024-01-31T01:51:53 1706665913

I learned Rust over the last year, to write some scientific simulations. Since, you can just call C++ libraries from Rust with minimal overhead, there is no real reason to not write in Rust.

Also, if you want a Python interface for your library, PyO3 allows you to call Rust from Python naturally. It definitely needs better documentation, but otherwise its quite good.

montebicyclelo · 2024-01-31T08:52:19 1706691139

There's an example here of using Numba to speed up data science code, (disclaimer, I'm the author): https://sidsite.com/posts/python-corrset-optimization/

> for most "normal" uses, you'd probably be fine with having your code written in numpy or pandas

In my experience this is the case. Occasionally though, there might be an bottleneck, and Numba can be a good way to handle it. E.g. when implementing seam carving, I found normal Python / NumPy too slow, and same with Naive Bayes, so used Numba. (Why was I implementing these, instead of using a library? As an exercise. But I guess the takeaway is that if you want to do something more intensive, and are implementing it yourself rather than using a library, Numba can be a good option).

goriloser · 2024-01-30T23:28:08 1706657288

Sometimes you just need a plain old "for" over your pandas dataframe, to do a more complicated aggregation (variable window size, conditionals, ...)

You can try twisting numpy/pandas into a solution, but it might be much simpler to just write a dumb numba function which will be easier to maintain since the logic will be very clear.

itamarst · 2024-01-30T20:15:46 1706645746

You can use Numba to speed up some Pandas calculations: https://pandas.pydata.org/docs/user_guide/enhancingperf.html...

__mharrison__ · 2024-01-31T06:41:25 1706683285

I have a section on speeding up pandas code in Effective Pandas 2. It discuss using NumPy, Cython, and Numba to make pandas faster. In general, numba is magic and makes (decorated) pure Python run fast!

bbstats · 2024-01-31T03:56:37 1706673397

If you must use a for loop on a data transformation (e.g. any recursive calculation), numba is way faster than most alternatives.

bionhoward · 2024-01-30T23:06:53 1706656013

algotrading

tripplyons · 2024-01-30T23:22:00 1706656920

If your trading strategy cares about this level of speed, you shouldn't be using Python.

goriloser · 2024-01-30T23:31:49 1706657509

Algotrading has two parts - analyzing data to find strategies and executing the strategies.

Numba is for the first part, where you might need to process billions of messages to extract signals.

tripplyons · 2024-01-30T23:38:03 1706657883

Thanks for the clarification. That would make more sense.

tripplyons · 2024-01-30T23:27:57 1706657277

I didn't realize Numba supported CUDA:

https://numba.readthedocs.io/en/stable/cuda/index.html

montebicyclelo · 2024-01-31T09:31:36 1706693496

This addresses a real pain point, of Numba not being so easy to profile, (especially compared to Python and line-profiler), so will definitely be trying this out!

LeakedCanary · 2024-01-31T14:56:47 1706713007

I tend to disagree with the following sentence mentioned in the article:

> One hypothesis is instruction-level parallelism

This is Python code, whose execution has a massive gap to the actual CPU instructions executed. The experiment result feels more like something related to the memory cache.