How Julia ODE Solve Compile Time Was Reduced from 30 Seconds to 0.1 (sciml.ai)
177 points by ChrisRackauckas on Sept 21, 2022 | 16 comments



Super impressive work. While work like this shows what's possible in terms of compilation speed, it's probably beyond the reach of the average dev.

It'll be interesting to see, going forward, how many of these recent tooling improvements trickle down to more casual devs, such that they, too, can get significantly lower compile times.

I'm pretty optimistic on that front. It looks like most of the improvement came from writing more inferrible code, removing code that breaks reasonable invariants in Base, using SnoopPrecompile, and using a sysimage.

The first two things should be done anyway, and better tooling will help with that. Future native code caching will probably get results similar to current sysimages. That leaves SnoopPrecompile, which is a small maintenance burden, but a pretty minor addition in most cases.
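For reference, the sysimage piece is handled by PackageCompiler.jl today. A minimal sketch, assuming the package you want baked in is OrdinaryDiffEq (the file names here are illustrative):

    using PackageCompiler

    # Bake the package's native code into a custom system image;
    # start Julia with `julia -J sys_ode.so` to use it.
    create_sysimage([:OrdinaryDiffEq];
                    sysimage_path = "sys_ode.so",
                    precompile_execution_file = "precompile_workload.jl")

The precompile_execution_file is exactly the kind of workload script discussed below.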

One approach is to include a block of code which exercises the functionality of your package. That single block can then serve three purposes at once (see the sketch below):

1. Check your code is inferrible

2. Run static analysis to catch bugs

3. Use SnoopPrecompile
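A minimal sketch of the SnoopPrecompile part, using its documented macros (the workload call is illustrative):

    using SnoopPrecompile

    @precompile_setup begin
        # Representative inputs; setup code itself is not precompiled.
        A = rand(4, 4)
        b = rand(4)
        @precompile_all_calls begin
            # Every call in here gets compiled into the package's
            # precompile cache at package-build time.
            x = A \ b
        end
    end

The same workload can then be run under JET.jl, or checked with Test.@inferred, to cover points 1 and 2.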


Julia has been making big strides recently on (pre-)compilation. Nice work, Chris; beyond just these recent results, you've helped to create a community. I wonder how long it will be until we have a demonstrably stable static/AOT compilation capability that a new user can employ without friction on Linux, Mac, and Windows? No doubt it will come as soon as it becomes a focus area. This is an area I'd love to see advanced soon, and potentially help with, because I use Julia for my work.


I really like Julia. I just came back to writing a lot of Julia (after I realized I was starting down the path of recreating the enormous effort of the Julia devs by adding specialized multiple dispatch to Scheme). It feels incredibly flexible yet performant in a way that most languages fail to realize.

However, it can be incredibly frustrating sometimes. This article is a great example of the kind of wizardry you have to do to figure out why something isn't as fast as you thought it would be. There seem to be a lot of "if you don't do it this way, it'll still work, but it'll be slow as hell" practices that are in the community's collective knowledge but are big stumbling blocks in the beginning (making sure you have concrete types in struct fields, the Val type and dispatch vs. an if statement, how to write "type stable" code, etc.).
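For anyone who hasn't hit these yet, a couple of the classic gotchas, sketched:

    # Abstract field type: the compiler can't specialize accesses.
    struct Slow
        x::Number
    end

    # Concrete field type: fully inferrible.
    struct Fast
        x::Float64
    end

    # Dispatching on Val moves a runtime branch to compile time.
    f(::Val{true})  = "took the fast path"
    f(::Val{false}) = "took the slow path"
    f(Val(true))    # specialized per value, vs. `if flag ... end`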

I had a lot of incorrect ideas and assumptions about the type system and the specialization mechanisms until I watched the Julia internals talk on YouTube from like 2014. This is probably due to it still being relatively young, and it has gotten better over time.

There are a lot of things to like about Julia that make this bearable though! It's really a great language to work in.


There are ways to fix a lot of these issues; for example, over-allocation of arrays could be fixed through compiler escape analysis that deletes the allocations. There are a lot of prototypes of these features, and I expect a good chunk of them to start rolling out over the next year. Some of them, though, are more foundational, like knowing when to use dynamic dispatch vs. avoiding dispatch.
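To make the allocation point concrete: the workaround today is to hoist buffers out of hot code by hand, which is exactly what escape analysis could eventually do automatically. A sketch (step/step! are made-up names):

    using LinearAlgebra

    # Allocating: a fresh result vector is created on every call.
    step(A, x) = A * x

    # Non-allocating: reuse a preallocated buffer via the in-place mul!.
    function step!(y, A, x)
        mul!(y, A, x)   # writes A*x into y without allocating
        return y
    end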


Yea, I fully anticipate that they'll be fixed eventually. Having toyed around with Julia since its 0.3 days, it's amazing to see how far the language has come so quickly. I only bring that stuff up because I think it's better for people to realize that there are some pain points in Julia. If you come to the language with a more realistic viewpoint, then I think you're more likely to stay.

The one that gets me most often is the boxing of captured values. I come from Scheme/Racket and I'm just so used to exploiting closures. I also struggle with "over-typing" function declarations. I'm getting better about that though, now that I have a better handle on what exactly the dispatch mechanism is doing.
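The boxing gotcha, for anyone curious: when a captured variable is reassigned, Julia stores it in a type-unstable Core.Box, and the usual workaround is to capture a Ref instead (counter/counter_fast are made-up names):

    function counter()
        n = 0
        inc() = (n += 1)   # `n` is reassigned in the closure => boxed
        return inc
    end

    function counter_fast()
        n = Ref(0)
        inc() = (n[] += 1) # the binding `n` never changes => no box
        return inc
    end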


> In most programming languages, the linear algebra handling for these kinds of standard operations is performed by underlying libraries called the BLAS and LAPACK library. Most open source projects use an implementation called OpenBLAS, a C implementation of BLAS/LAPACK which does many of the tricks required for getting much higher performance than "simple" codes by using CPU-specialized kernels based on the sizes of the CPU's caches. Open source projects like R and SciPy also ship with OpenBLAS because of its generally good performance and open licensing, though it's known that OpenBLAS is handily outperformed by Intel MKL which is a vendor-optimized BLAS/LAPACK implementation for Intel CPUs (which works on AMD CPUs as well).

The reality is more complex than this. Most open source software doesn't ship with any assumptions about a particular BLAS/LAPACK implementation at all; on HPC systems you are generally expected to choose one as appropriate and compile your code against it. It is generally only when you download a precompiled version that you're given a particular implementation, but that doesn't mean you can't use another one if you compile from source, as the BLAS and LAPACK libraries just present a standard API. Generally, for performance reasons, you want to compile specifically for your platform, because precompiled wheels from Conda, PyPI, etc. will leave performance on the table.

On forward-thinking cluster teams these days, sysadmins use tools like Spack and EasyBuild, and to some degree software is made available to users either directly or by request, so it's usual to log into a cluster and have multiple implementations available to choose from and compile your code against. More often than not, however, it's still on you to compile against the dependencies you need. It's a worthwhile exercise in HPC to try different implementations and check the performance characteristics of your code on the particular machine.


If you look at the LinearSolve.jl defaulting system, if a non-standard BLAS is installed on the system then that takes priority in the default, so this all still works just fine (and would reduce compilation). This extra handling is mostly to make sure that the default desktop setup works sufficiently well, since that's the baseline for most people.
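For anyone who hasn't used it, the default-vs-explicit split in LinearSolve.jl looks roughly like this (a sketch; LUFactorization is one of the solver choices from its docs):

    using LinearSolve

    A = rand(4, 4); b = rand(4)
    prob = LinearProblem(A, b)

    sol = solve(prob)                      # defaulting system picks a method
    sol = solve(prob, LUFactorization())   # or pin a specific backend
    sol.u                                  # the solution vector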


This is true, but for desktop and laptop compute, I'd estimate something like 95% of users won't change the defaults. I'm also not sure how good our solvers are on clusters, but on desktops, the Julia versions are often faster than MKL, which is the fastest BLAS outside Julia that I know of.


This was a real blocker for me when I played with Julia a while ago; waiting 30 seconds for the ODE package was just crazy when Python could import its equivalent basically instantly. I'll look forward to giving Julia another go once I get a chance.


Thanks for writing this up, Chris!

I took a break from Julia a year or two ago because of some of these issues, one of the big ones being that I didn't want to write and maintain a set of non-allocating LAPACK wrappers for iterative solvers, but the memory churn was killing my performance. So, so glad FastLapackInterface and LinearSolve are a thing now, and the MKL situation is much easier with the libblastrampoline development; it makes me want to start working on Julia solvers again.
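The trampoline bit in practice, for anyone who hasn't tried it (assumes the MKL.jl package is installed):

    using LinearAlgebra
    BLAS.get_config()   # shows which BLAS/LAPACK libraries are loaded

    using MKL           # hot-swaps the backing BLAS to MKL
    BLAS.get_config()   # now reports MKL via libblastrampoline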

It does feel difficult to write performant Julia if you don't put in a lot of effort to stay "in the know", as a lot of this knowledge is very dispersed, but I guess that makes sense as the language is still changing quite rapidly.


Really in-depth. A good read if you happen to run into exactly this problem as a Julia package developer.


Is Julia/Rust the new Python/C?


Not exactly, because with Julia you don't need the lower level language for performance.


I think in terms of actual users, Julia is behind Python, MATLAB, and R for scientific use...


True. Such a shame, though. Ignoring the ecosystem and dev experience, the language itself is easily the best of the four.


Julia is the new Python/C; that is the whole point of having a JIT.



