Hacker News
Julia 2.0 isn’t coming anytime soon, and why that is a good thing (logankilpatrick.medium.com)
45 points by logankilpatrick on Sept 12, 2022 | 28 comments



I'd like to use Julia for audio and image processing. The language and library support are well suited to my applications. However, the current state of the PackageManager makes Julia a non-starter for me. I looked at the linked release notes hoping that PackageManager improvements would be slated for 1.9. The startup time for standalone executables is unacceptably large, and baffling given the computing effort required to compile and package them. I reached for Julia to improve upon Python, and the startup costs far exceed the available system time for gains.

The mere existence of the acronym TTFP indicates that the Julia community has neither an appropriate frame of reference nor a prioritization of developing Julia into a truly general programming language -- Julia remains far from production ready. Computing generality goes far beyond the habitual use of: input, analysis, plot. Swift and Rust ought to be the benchmarks for executable startup. From my perspective, Julia runtimes ought to be able to be constructed and frozen in dynamic shared libraries such that startup can approach that of Swift and Rust. Julia's developers could perhaps look to GraalVM for inspiration.

A long wait for Julia 2.0, particularly if standalone executable compilation would be addressed there, cannot possibly help the adoption of Julia from my perspective.


> A long wait for Julia 2.0, particularly if standalone executable compilation would be addressed there, cannot possibly help the adoption of Julia from my perspective.

It's the opposite. Julia 2.0 would not bring standalone executable compilation, but instead would require compiler work for language changes. The whole point of saying "we don't need to do a 2.0 right now" is that folks on the compiler team believe working on things like standalone compilation and improved caching of LLVM code (i.e. fixing TTFP) are a better use of time than changes to language semantics (i.e. a 2.0).


If PackageManager changes can be addressed in 1.10, 1.11+, etc., I would be elated. Again, TTFP does not need to be 'addressed' -- the acronym instead needs to be eliminated from usage -- a shift to 'startup time' would help change the mindset and open the Julia community up to a broader range of applications and adopters. Otherwise, Julia simply continues to feed its currently narrow niche of applications.


I think by PackageManager here you mean PackageCompiler, and yes, these improvements do not need a 2.0. v1.8 included a few things that will, in the near future, allow building binaries without big dependencies like LLVM, and finishing this work is indeed slated for the v1.x releases. Saying "we are not doing a 2.0" is precisely saying that this is more important than things which change the user-facing language semantics.

And TTFP does need to be addressed. It's a current shortcoming of the compiler that native and LLVM code is not cached during the precompilation stages. If such code were able to be precompiled into binaries, startup time would decrease dramatically, because a lot of package code would no longer have to be JIT-compiled. Tim Holy and Valentin Churavy gave a nice talk at JuliaCon 2022 about the current progress of making this work: https://www.youtube.com/watch?v=GnsONc9DYg0

This is all tied up with startup time; in some sense these are all the same issue. Currently, the only way to get LLVM code cached, and thus startup time essentially eliminated, is to build it into what's called the "system image". That system image is the binary that PackageCompiler builds (https://github.com/JuliaLang/PackageCompiler.jl). Julia then ships with a default system image that includes the standard library, in order to remove the major chunk of code that "most" libraries share, which is why all of Julia Base works without JIT lag.

However, that means everyone wants their thing, be it sparse matrices or statistics, in the standard library so that it gets the JIT-lag-free build by default. This makes the system image huge, which is why PackageCompiler -- which simply builds binaries by appending package code to the system image -- builds big binaries.

What needs to happen is for packages to be able to precompile in a way that caches LLVM and native code. Then there's no major compile-time advantage to being in the system image, which will allow things to be pulled out of it for a leaner Julia Base build without major drawbacks. That in turn would mean an LLVM and BLAS build does not have to ship in every binary (which is what takes up most of the space and RAM), which would allow Julia to move much more comfortably beyond the niche of scientific computing.
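As a concrete illustration of the system-image workflow described above -- a minimal sketch, where the package name "Plots" is just an example:

    # Build a custom system image with a package's compiled code baked in,
    # so that loading it later incurs no JIT lag. Assumes PackageCompiler.jl
    # is installed in the current environment.
    using PackageCompiler

    create_sysimage(["Plots"]; sysimage_path="sys_plots.so")

Julia is then started with `julia --sysimage sys_plots.so`, after which `using Plots` loads without the usual compile lag. The tradeoff is exactly the one described: the resulting image bundles LLVM, BLAS, and the standard library, so it is large.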


'startup-time' vs. 'TTFP' are just semantics, but a crucial distinction. The focus on 'time-to-first-plot' is entrenched in the dominant workflow of the Julia community. If you want a broader range of adoption, the word choice matters. Having experienced a very similar dialogue in the Clojure community over startup time, and seen the new range of applications emerge as the issue was effectively addressed, I was startled by the very narrow and focused acronym (TTFP) in the Julia community.

I am happy that there is progress on the code caching front and will look to the link you shared. Thanks for elucidating this further!


The group working on it (Valentin, Tim Holy, etc.) agrees with you, which is why we are tending to call it TTFX (time to first X) instead of TTFP, so it includes everything, not just plotting. It's just that with plotting the delay is very visible.


Can I ask what about your use case would involve starting Julia and importing packages over and over?

At least for my workloads, I often write a small .jl script that imports everything I want and then runs some functions on a list of files, for instance, or I just do my work in a Jupyter notebook, and I pay the load cost once for hours or days of computation. Amortized over the entire time I might use the kernel, the startup time is basically nothing.

I'm not saying the load time shouldn't be a lot faster! That would be awesome! But I see folks complain about this and I'm wondering if they really need to be booting a new kernel constantly, or if they could adapt how they work a little and get their desired result (for instance, by running a .jl script instead of a bash script that calls Julia repeatedly).
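For instance, the .jl-script workflow described above might look like this -- a sketch where the imports and the `process` function are placeholders:

    # driver.jl -- pay the package-load cost once, then loop over many inputs,
    # instead of invoking `julia` once per file from a shell script.
    using CSV, DataFrames   # hypothetical heavy dependencies, loaded one time

    process(path) = nrow(CSV.read(path, DataFrame))   # stand-in for real work

    for path in ARGS        # invoked as: julia driver.jl data/*.csv
        println(path, " => ", process(path))
    end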


Currently, the adaptation cost is too high. The ecosystem that I work in has many small, distributed, fast-startup applications. While I could create a fairly complex client server application written in Julia to work with its limitations, and write compact fast startup applications to communicate with it, the complexity and time investment is undesirable. Again, to use Julia here one must bend to the ecosystem's weaknesses. I think the community truly doesn't understand what a general purpose language is. If you compile with LLVM startup really ought to be a non-issue. Look at the startup of GraalVM, and they began with Java bytecode.


Ah okay well yes I see how that might be an issue. I basically do everything in Julia, including calling out to shell commands, running scripts in other languages, Python packages, etc. etc. It's effectively my shell.

I think the problem is that there needs to be a way to store compiled code to disk and reimport it, and I think there are some plans along that path. There were some developments discussed in this talk (https://www.youtube.com/watch?v=GnsONc9DYg0) that help with this, but the holy grail of fully compiled function caching across Julia kernels is not there yet.


In my last job, I built a data pipeline with Snakemake. Snakemake is a workflow manager where each command (often hundreds) is called from the shell. That was excruciating to do with Julia, and I can think of no alternative.


While I appreciate it as a user, I hope that when it happens they learn from Python and, surprisingly, PHP.

PHP took over 10 years between 5 and 7. A big push to upgrade from 5.6 came around 2017/2018, when support would stop. It was a mess because most developers simply didn't care or didn't know enough to handle it.

Now they're doing yearly releases (7.2, 7.3, 7.4) but also PHP 8 was released already (along with 8.1 and 8.2).


I am interested to hear if folks think a 2.0 release would make Julia more appealing.


Yes, I'd love to see a revamp of the type system with real interfaces instead of the single inheritance with abstract types. I also think packaging and imports need a lot of work — for god's sake, there is no "import adjacent file" mechanism other than a C-style `include`, which just copies and pastes code from another file into the current file.
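For context, a minimal sketch of the `include` mechanism being criticized:

    # helpers.jl
    greet() = println("hello")

    # main.jl -- `include` textually pastes helpers.jl into the current
    # module, as if its code were written here; there is no namespaced
    # "import this adjacent file" form.
    include("helpers.jl")
    greet()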


Not sure if interfaces would work as you'd expect them to in Julia. Values don't have methods, for instance; you just define functions on the types.

So what would an interface look like? Best I can figure, an interface is equivalent to a requirement that a couple of methods are implemented and that a bunch of fields exist, which sort of gets taken care of by the duck typing; the only problem is that it's not documented in code.

One approach I'd been thinking about would be defining a method IMyInterface that returns a tuple of the fields and methods defined by the interface, and saying a type implements that interface if there's an implementation of this method. Technically this performs the same job, but unfortunately it doesn't seem like it'd be easy to use without hefty syntactic sugar.
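A much cruder check is already possible today by querying the method table -- a sketch, not a real interface system:

    # Duck-typed "does this type implement iteration?" check: ask whether a
    # matching `iterate` method exists. This documents the requirement in
    # code but verifies nothing about return types or the state argument.
    implements_iteration(::Type{T}) where {T} = hasmethod(iterate, Tuple{T})

    implements_iteration(Vector{Int})   # true: arrays define iterate
    implements_iteration(Nothing)       # false: no iterate method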

What do you imagine interfaces would look like in Julia?


Mostly I want to be able to write something like `f(it::Iterable{<:Number})`. The current way to do this is to simply not specify the type and then let it fail wherever the duck typing fails (e.g., if you try to use a for loop on a non-iterable argument). I'd rather it fail when calling f, saying no method exists for <passed type>.


I tend to think that this is the one thing that would bring a v2.0. Compile time, startup time, binary building, and native code caching improvements I think take the big priority here (and I describe elsewhere in this thread how those are really all one technical issue), but when that's completed I do think formalizing a traits interface will be the next major semantic change to the language. It could be a v1.x if it does not require a breaking change to the language's semantics, but it's hard (for me at least) to see how to nicely incorporate trait definitions and required dispatches without making a single breaking change.


I agree that that would be better, I'm just wondering what the definition of an interface would look like.

Technically "all" you need is to extend the 'where' keyword a "little":

    Iterable{T,S} = TIter where {
        iterate(iter::TIter)           <: Tuple{T,S},
        iterate(iter::TIter, state::S) <: Tuple{T,S}}


I'm imagining it would look more or less like a struct definition, but with methods instead of fields. Something like

    interface{S,T} Iterable{T}
        Next = Tuple{Union{T,Nothing},S}
        iterate(self::Self)::Next
        iterate(self::Self, state::S)::Next
    end

This means "Iterable has two type parameters, one of which (T) is named explicitly in the type and the other of which (S) is implicitly used in the definition of the interface but is otherwise not nameable."

I think the real question is how to implement structural subtyping of interfaces -- you wouldn't want to have to list the interfaces a struct implements when defining it, because then you're limited to only those traits you (the struct's author) know about. Better (and more Julian) to automatically implement Iterable for any struct with those two iterate methods defined on it, but it seems like a pretty big task to keep track of that at runtime -- especially if you care about matching the return types of the interface's functions, since in general those can't be type-checked until they're actually called (or at least compiled, but the whole point of Julia is to not compile the world ahead of time).
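For comparison, the closest existing idiom is the trait-by-dispatch ("Holy trait") pattern, sketched here with made-up names -- note that nothing verifies the `iterate` methods actually exist, which is exactly the gap under discussion:

    # Traits as ordinary types: a trait function maps a type to a trait
    # value, and dispatch on that value selects the implementation.
    abstract type IterableTrait end
    struct IsIterable  <: IterableTrait end
    struct NotIterable <: IterableTrait end

    IterableTrait(::Type) = NotIterable()                  # default: opt out
    IterableTrait(::Type{<:AbstractArray}) = IsIterable()  # arrays opt in

    describe(x) = describe(IterableTrait(typeof(x)), x)
    describe(::IsIterable,  x) = "can iterate"
    describe(::NotIterable, x) = "cannot iterate"

    describe([1, 2, 3])   # "can iterate"
    describe(nothing)     # "cannot iterate"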


I don't think you really can list the interfaces a struct implements up front; the way I understand Julia, you only implement those methods afterwards, so you'd have an awkward gap between the definition of the struct and the point where the definition is actually correct.

In theory an interface ought to just be a shorthand for 'this duck-typing works', but in a way that ideally can be verified statically.

A static type checker is the only possible (though major) advantage I see. It's debatable if it even makes sense for Julia to check interfaces at runtime, if something doesn't satisfy an interface but the code compiles, then why would it be wrong?


Yes, this is the major issue. When do you check for interface compatibility, and how? If you check for interface compatibility immediately then it will always fail because methods will be defined after the struct. Do you check when the module definition is then completed? Maybe another module completes the interface definition: do you disallow that? This question becomes more subtle with multiple dispatch. If you have a type parameter, A{T}, you may only define the iteration protocol on A{T <: Number}, but the struct definition may allow any T. Should that error? So then if you error if the interface is incomplete at the bottom of the module, should you error if there exists any possible dispatch that is missing an interface function? Maybe the answer is then to error at the bottom of the module if there exists any type parameter that does not implement this interface.

Okay cool, that does sound sensible. But now let's take a case where A{T <: AbstractFloat} defines the interface by doing direct dispatches for every floating point type in Julia Base. It won't error because iterate(::A{Float32}), Float64, etc. are all defined. Now someone adds the package ArbFloats.jl. Do you error when `using ArbFloats` because the interface is now not satisfied for `A{ArbFloat64}`? That would be annoying for users who never use type A, but if you don't error then your interfaces are not guaranteed to work. This would mean interfaces should only ever be used with `A{T <: T2}` where all interface definitions are done with `iterate(A{T}) where {T <: T2}` in the generic form so that it automatically extends well. That would be good code, but does that mean you should enforce that `iterate(::A{<:AbstractFloat})` is defined and error if it's not, even if every `T <: AbstractFloat` case is handled in the current REPL?

There's a lot to think about. But no matter how that's done, I cannot see it not breaking some code somewhere, hence probably being a 2.0.


That's starting to look a lot like Haskell's typeclasses.


Haskell is a well-designed language so there is a lot to learn from it!


It's been very exciting to follow progress on Julia; it's quite a wonderful language. Thank you for all the fantastic work. While I understand a 2.0 making sense in terms of breaking changes, I think a 2.0 release would need a major improvement in one area to be worthy of the title and to get more users interested. For example, an update that made compilation significantly faster, or one that empowered users to never have to restart the REPL with Revise, would be a "defining" thing that interests a lot of people, more so than breaking changes.


I'd like to see a way to specify interfaces / traits, in the base package itself.

I would also like to see a better test suite, with the ability to run a single test if I want to. Test Driven Development on a large project is a nightmare (although it's gotten a little better recently). I still can't believe how people contribute to Base Julia, because running the tests takes forever.

Lastly, better support for debugging in Neovim, Vim, Emacs etc. Better linting tools too. There's a push in other languages to write core language tooling in Rust or Zig, I think Julia would benefit from that approach (there's `juliaup` but I'd like to see more official tools similar to this).

Personally I'd like to see Setfield.jl or Accessors.jl in Base. I'd also like to see improvements in the ergonomics of using immutable structs. In Julia, users should want to use immutable structs all the time, and reach for mutable structs ONLY if they need reference semantics. I think the naming and the semantics here are REALLY confusing when trying to get a team to write performant code. I can't tell you how many times I've had this kind of conversation with various members of my team.

Me: Create a struct for holding all that related data together

F: Okay, let me create a struct. - Refactors code base for a day - Realizes the data is being updated - Uses mutable structs

Me: Don't use mutable structs unless you want reference semantics

F: What even is reference semantics? - Tries to refactor by removing the `mutable` keyword, code breaks when doing `a.b += 1`, writes messy code like `a = A(B(a.b + 1, a.b.c, a.b.d, ...), a.c, a.d, ...)`.

Me: Ugh, why didn't you use `Setfield` instead?

F: No unnecessary extra dependencies to reduce complexity.

Project Manager: Where are we with this task? Just use mutable struct and move on.

Me: Sigh. Fine.
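For reference, the Setfield-based version of that update looks roughly like this (struct names are made up):

    using Setfield  # provides @set: non-mutating nested field update

    struct B; x::Int; c::Int end
    struct A; b::B; d::Int end

    a  = A(B(1, 2), 3)
    a2 = @set a.b.x = a.b.x + 1   # builds a new A, sharing unchanged parts
    # `a` is untouched; a2.b.x == 2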

On a non-language side of things,

1) it would be nice to have a bigger focus on teaching how to write performant Julia code. Currently, most if not all of the material is geared toward someone who already knows how to write performant code (in C++, for example) and showing them the "Julian" way to do it.

2) I'd like to see a usability survey done from Julia users to see what features should be tackled next.


> Personally I'd like to see Setfield.jl or Accessors.jl in Base. I'd also like to see improvements in the ergonomics of using immutable structs. In Julia, users should want to use immutable structs all the time, and reach for mutable structs ONLY if they need reference semantics. I think the naming and the semantics here are REALLY confusing when trying to get a team to write performant code. I can't tell you how many times I've had this kind of conversation with various members of my team.

I agree that one of these should be standardized to be a Base functionality. But that won't need a v2.0 for that to happen: it can happen in a v1.x.

> it would be nice to have a bigger focus on teaching how to write performant Julia code. Currently, most if not all of the material is geared toward someone who already knows how to write performant code (in C++, for example) and showing them the "Julian" way to do it.

The issue is that many of these tricks are actively being addressed, for example with escape analysis to reduce allocations. So what really needs to be taught is how to profile and understand what the code is doing, and how to affect it to do what one wants for performance. That then increases the barrier to entry. I think we shouldn't be teaching people "how to write performant Julia code" much more than is already out there, though: since most of the tricks are really just workarounds for missing optimizations, those compiler optimizations should just get added (and there's some work showing it is possible).


There are of course many breaking changes that could improve the language itself, but in my opinion the attention of maintainers is best spent on improving the ecosystem -- while a 2.0 release may be exciting, I've been very pleased to see the language steadily improve with 1.x releases.


Well, yes, because there are several breaking changes that should happen, but can't before then, because of the adherence to semver.


Can we get a 5 year freeze on languages? There is too much fragmentation in this industry.



