Correctness and composability bugs in the Julia ecosystem (yuri.is)
699 points by benjojo12 on May 16, 2022 | 407 comments



So this one is a tough one for me, because Yuri has certainly spent significant time with Julia and I think he's a very competent programmer, so his criticism is certainly to be taken seriously and I'm sad to hear he ended up with a sour opinion.

There are a lot of different issues mentioned in the post, so I'm not really sure what angle to best go at it from, but let me give it a shot anyway. I think there are a couple of different threads of complaints here. There's certainly one category of issues that are "just bugs" (I'm thinking of things like the HTTP, JSON, etc. issues mentioned). I guess the claim is that this happens more in Julia than in other systems. I don't really know how to judge this. It's not that I think the Julia ecosystem has few bugs, just that in my experience, I basically see 2-3 critical issues whenever I try a new piece of software, independent of what language it's written in.

I think the other thread is "It's hard to know what's expected to work". I think that's a fair criticism and I agree with Yuri that there are some fundamental design decisions contributing here. Basically, Julia tries very hard to make composability work, even if the authors of the packages that you're composing don't know anything about each other. That's a critical feature that makes Julia as powerful as it is, but of course you can easily end up with situations where one or the other package is making implicit assumptions that are not documented (because the author didn't think the assumptions were important in the context of their own package) and you end up with correctness issues. This one is a bit of a tricky design problem. Certainly adding more language support for interfaces and verification thereof could be helpful, but not all implicit assumptions are easily capturable in interfaces. Perhaps there needs to be more explicit documentation around what combinations of packages are "supported". Usually the best way to tell right now is to see what downstream tests are done on CI and if there are any integration tests for the two packages. If there are, they're probably supposed to work together.

To be honest, I'm a bit pained by the list of issues in the blog post. I think the bugs linked here will get fixed relatively quickly by the broader community (posts like this tend to have that effect), but as I said, I do agree with Yuri that we should be thinking about some more fundamental improvements to the language to help out. Unfortunately, I can't really say that that is high priority at the moment. The way that most Julia development has worked for the past two-ish years is that there are a number of "flagship" applications that are really pushing the boundary of what Julia can do, but at the same time also need a disproportionate amount of attention. I think it's overall a good development, because these applications are justifying many people's full-time attention on improving Julia, but at the same time, the issues that these applications face (e.g. "LLVM is too slow", better observability tooling, GC latency issues) are quite different from the issues that your average open source Julia developer encounters. Pre 1.0 (i.e. in 2018) there was a good 1-2 year period where all we did was think through and overhaul the generic interfaces in the language. I think we could use another one of those efforts now, but at least at this precise moment, I don't think we have the bandwidth for it. Hopefully in the future, once things settle down a bit, we'll be able to do that, which would presumably be what becomes Julia 2.0.

Lastly, some nitpicking on the HN editorialization of the title. Only one of the issues linked (https://github.com/JuliaLang/julia/issues/41096) is actually a bug in the language; the rest are various ecosystem issues. Now, I don't want to disclaim responsibility there, because a lot of those packages are also co-maintained by core Julia developers and we certainly feel responsibility to make those work well, but if you're gonna call my baby ugly, at least point at the right baby ;)


FWIW my take is not that Yuri is expressing "there are too many bugs" so much as he's expressing a problem in the culture surrounding Julia itself:

> But systemic problems like this can rarely be solved from the bottom up, and my sense is that the project leadership does not agree that there is a serious correctness problem.

Concisely:

1. The ecosystem is poorly put together. (It's been produced by academics rather than professional software developers.)

2. The language provides few tools to guarantee correctness. (No static typing; no interfaces.)

Personally, what I'd love to see is one of the big tech companies come on board and just write their own ecosystem. The Julia language is amazing. The ecosystem needs to be rewritten.


Lots of things are being rewritten. Remember we just released a new neural network library the other day, SimpleChains.jl, and showed that it gave about a 10x speed improvement on modern CPUs with multithreading enabled vs Jax Equinox (and 22x when AVX-512 is enabled) for smaller neural network and matrix-vector types of cases (https://julialang.org/blog/2022/04/simple-chains/). Then there's Lux.jl fixing some major issues of Flux.jl (https://github.com/avik-pal/Lux.jl). Pretty much everything is switching to Enzyme, which improves performance quite a bit over Zygote and allows for full mutation support (https://github.com/EnzymeAD/Enzyme.jl). So parts of an entire machine learning stack are already being released.

Right now we're in a bit of an uncomfortable spot where we have to use Zygote for a few things and then Enzyme for everything else, but the custom rules system is rather close and that's the piece that's needed to make the full transition.


The fact that things are being rewritten and the primary criterion being looked at is speed IS culturally a big part of the problem. If you don't prioritize provable correctness first, then I guarantee that the code is not correct. And as the complaint explains, incorrect code costs people months and leads them to not trust the result.

Don't believe me? Re-read the blog post about how a major source of bugs is people turning unchecked assumptions into silent errors by removing bounds checks. Simply being able to re-run the same code in a slow mode with the bounds checks turned back on would undoubtedly catch bugs.


100% this. In a discussion on "cultural correctness issues prevents me from using Julia", it's very telling that the response is "more speed!"

There have been a decent number of posts along the lines of "Julia has these problems". And I don't think that's because the world at large has a vendetta; I think it's because the world at large desperately wants to use Julia, but struggles with hard blockers that are currently preventing adoption.

FWIW I do think there's a growing acceptance in the Julia community that these concerns are real, which is good. (See the parallel discussion on the Julia Discourse.)


Two of the mentioned packages, Lux and Enzyme, have increased correctness and decreased API surface... and were not mentioned for speed (though a lot of things end up faster when it's easier to prove correctness in the compiler)... so the response wasn't "more speed" but "here's correctness with resulting speed"...


Actually, Enzyme was mentioned for speed, not correctness. To verify, go back and see that you wrote: "Pretty much everything is switching to Enzyme which improves performance quite a bit over Zygote..."

You didn't mention speed for Lux, but it is a rewrite. The rule is that a rewrite should be assumed to be buggy until proven otherwise. A culture of having everything permanently under construction comes with upsides and downsides, and unless you have good testing, correctness problems are one of the downsides.


> Re-read the blog post about how a major source of bugs is people making assumptions into silent errors by removing bounds checks. Simply being able to re-run the same code in a slow mode with the bounds checks turned back on would undoubtably catch bugs.

Running Julia with the command line argument --check-bounds=yes does that, and package tests are always run with this option, which disables `@inbounds`.
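A minimal sketch of that "slow mode" re-run, driven from Julia itself. The script is hypothetical: `value_after_last` has an off-by-one read hidden by `@inbounds`, so under default settings that read is undefined behaviour rather than an error. The sketch therefore never runs it in the current process; it starts a fresh Julia via `Base.julia_cmd()` with `--check-bounds=yes`, which turns every `@inbounds` into a no-op so the bug surfaces as a `BoundsError`.

```julia
# Hypothetical buggy code: the off-by-one index is hidden by @inbounds,
# so under default settings this read is undefined behaviour, not an error.
const BUGGY_SCRIPT = """
function value_after_last(a)
    i = length(a) + 1        # off by one
    @inbounds v = a[i]
    return v
end
value_after_last([1, 2, 3])
"""

# Re-run the same code in a fresh Julia process with bounds checks forced on.
# --check-bounds=yes makes every @inbounds annotation a no-op, so the bad
# read throws a BoundsError and the child process exits with a failure code.
cmd = `$(Base.julia_cmd()) --check-bounds=yes -e $BUGGY_SCRIPT`
caught = !success(pipeline(cmd; stderr = devnull))
println(caught ? "bounds error caught in checked mode" : "no error raised")
```

For a whole script, the equivalent is simply `julia --check-bounds=yes script.jl`.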


In most cases, being fast is useless without being correct. Even approximate things like simulations depend on the programming language being deterministic and correct. Otherwise the math of approximation doesn't work out.

With my programmer hat on, speed is not the first thing I care about in most cases. Unless there's an explicit need for speed, I don't select the language I'm going to use based on its performance, and I don't port a tool unless its speed becomes limiting.

It's important to make it run first, then make it fast. Otherwise, things go very wrong, very fast (pun intended).


But that feature already exists: you can re-run the same code in a slow mode with the bounds checks turned on... It is just a flag you can set at startup.


Enzyme dev here, so take everything I say as being a bit biased:

While by design Enzyme is able to run very fast by operating within the compiler (see https://proceedings.neurips.cc/paper/2020/file/9332c513ef44b... for details), it aggressively prioritizes correctness. Of course that doesn't mean that there aren't bugs (we're only human and it's a large codebase [https://github.com/EnzymeAD/Enzyme], especially if you're trying out newly-added features).

Notably, this is where the current rough edges for Julia users are: Enzyme will throw an error saying it couldn't prove correctness, rather than running (there is a flag for "making a best guess", but that's off by default). The exception to this is garbage collection, for which you can either run a static analysis, or stick to the "officially supported" subset of Julia that Enzyme specifies.

Incidentally, this is also where being a cross-language tool is really nice -- namely we can see edge cases/bug reports from any LLVM-based language (C/C++, Fortran, Swift, Rust, Python, Julia, etc). So far the biggest code we've handled (and verified correctness for) was O(1 million) lines of LLVM from some C++ template hell.

I will also add that while I absolutely love (and will do everything I can to support) Enzyme being used throughout arbitrary Julia code: in addition to exposing a nice user-facing interface for custom rules in the Enzyme Julia bindings like Chris mentioned, some Julia-specific features (such as full garbage collection support) also need handling in Enzyme.jl, before Enzyme can be considered an "all Julia AD" framework. We are of course working on all of these things (and the more the merrier), but there's only a finite amount of time in the day. [^]

[^] Incidentally, this is in contrast to say C++/Fortran/Swift/etc, where Enzyme has much closer to whole-language coverage than Julia -- this isn't anything against GC/Julia/etc, but we just have things on our todo list.

[ps sorry if this ended up as a dup, I meant to reply deeper in the tree, so I deleted the older comment and moved it here].


With luck you will succeed. And that is a great thing.

But I maintain my position. If users are choosing packages because of speed without worrying about correctness, then packages will become popular that care less about correctness than what you describe. And when people combine popular packages that make conflicting assumptions, correctness will be lost.

In other words the problem is the attitude, not the specific package. For another example of the same problem, look at how C/C++ compilers prioritizing speed has resulted in their taking advantage of undefined behavior in a way that makes it far harder for any significant C/C++ codebase to be correct.


> The Julia language is amazing. The ecosystem needs to be rewritten.

I think this is pretty unfair. Julia has many libraries that have allowed me to build things that would have taken orders of magnitude more effort to produce in other languages with the same conciseness and efficiency.

Composability and efficiency are hard. Are things better elsewhere? Python has excellent libraries. But these are big monoliths that not only do not compose well, but are also hard to understand deeply as they are essentially a thin layer over C, C++, Fortran, etc.

Julia simply needs more maintenance and more tests. There is no big corporate backing, and things depend on individual efforts. In my opinion, most packages are already polished and easy to understand.

IMHO, the biggest problem is that there is no reliable library to build huge transformers.


As a user, I’d prefer “correct but lacking composability” over “composable but sometimes my results will be silently wrong”.

What is Julia’s composability useful for if it leaves me unable to trust my results?


The answer as far as I can see is to write integration tests when you want to use composability (and/or just check that the packages in question already have integration tests against each other -- increasingly many do). It's not especially hard or anything, but you need to know to do it.
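A minimal sketch of such an integration test, using the `Test` standard library. `mymean_buggy` and `mymean_fixed` are hypothetical stand-ins for a generic function in one package, and `Rational` and `BigFloat` stand in for another package's custom number types. The buggy version hard-codes a `Float64` accumulator, an undocumented assumption that silently turns exact rational input into rounded floating-point output.

```julia
using Test

# Hypothetical generic function from "package A": it hard-codes a Float64
# accumulator, an implicit assumption its author never needed to document.
function mymean_buggy(xs)
    s = 0.0
    for x in xs
        s += x
    end
    return s / length(xs)
end

# Same function without the assumption: the accumulator type follows the input.
mymean_fixed(xs) = sum(xs) / length(xs)

# Integration-style test: feed the generic code number types it did not
# anticipate (Rational and BigFloat stand in for "package B"'s custom types).
function check_mean(f)
    @testset "$(nameof(f)) with non-Float64 numbers" begin
        @test f(fill(1//3, 3)) == 1//3              # exact input must stay exact
        @test f([big"1.0", big"2.0"]) isa BigFloat  # precision must not drop
    end
end

check_mean(mymean_fixed)
```

Calling `check_mean(mymean_buggy)` instead reports one failure (the `Rational` case) and throws a `TestSetException` at the end of the test set.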


Hm. Some cases are where non-composable things might just fail; that is indeed "not especially hard or anything" to detect with simple integration tests, sure. But those you'll also probably notice without integration tests too, when your code just obviously doesn't work. Either way, sure, you'll find out about this quick, and know you have to fix it if you want to use that code to get the results you wanted to get.

But some of the examples in OP of non-composability are where edge cases (not all cases) will give you the wrong answers. In the worst case, in ways that aren't immediately obvious.

I think it's too optimistic to say that you just need to know that you should write some integration tests and then write some easy and obvious ("not especially hard") ones, and then you'll always catch these easily and early.


Sure, but that's a caveat that applies to testing in general, not something somehow special to composability. Bugs that only appear in edge cases or give wrong answers are always a concern when writing tests, that's nothing new.

I don't actually even see any evidence here that these sorts of bugs are more common or more insidious than any other sort of bug, rather it looks to me that since dispatch-based composability is a bit of a new phenomenon, people haven't always been on the lookout for these bugs yet, or learned the patterns of where and what to test for to find them -- but once you've seen a few cases, patterns start appearing and it becomes more clear what you need to test, just like anything else.

The broader issue to me is that I think people often underestimate the degree to which Julia is a fundamentally different programming paradigm than any other language most people have used before (unless you've spent a lot of time in maybe CLOS or Dylan) -- there's not even an agreed-upon name for it, but I'd call it "dispatch-oriented programming". Often times people will come in to Julia expecting all their expertise and intuition from class-based OO or whatever other paradigm to just effortlessly transfer to Julia, and that's really not the case, which tends to lead to frustration.


Right, it's a caveat to testing in general that shows "well, just write integration tests, it's not especially hard to do" is not a satisfactory solution to the problems discussed in OP with specific actually occurring examples. You were the one suggesting it was, not me!


Well I don't think it's hard any more than writing any other sort of test in a given language. That doesn't mean it doesn't require expertise in that language.


I used to love javascript. When people asked, I made a similar argument - that you need tests anyway, and if you're writing tests you'll spot any obvious typing bugs in your javascript code.

I think I was wrong. Moving to typescript has been a revelation for me, because I find I need far fewer tests to make working code. (And my code so often just works as soon as it compiles!). I can code much more fearlessly.

Rust is similar. By the time the borrow checker has sufficiently bruised my ego, my Rust code often just works. And that's a joy.

What I'm hearing is that Julia is more like javascript and C, and less like typescript and rust. That in Julia you need to be more paranoid around correctness - because you can easily compile a binary that produces incorrect results. You need extensive integration tests to guard against that, because it's not enough to know that two libraries work individually and compile together. They need to also be tested together or they might silently do the wrong thing.

That sounds like a bad experience. Honestly, that sounds worse than javascript. At least in javascript, type errors almost always result in instant crashes with a useful stack trace. You say testing Julia code isn't hard, but that doesn't mean I want to do it. Building a stone wall by hand isn't hard either, but I'll take my desk job every time.


I can appreciate that perspective for sure.

Just to make sure we're on the same page though, it's perhaps worth clarifying that this particular integration issue isn't something you have to worry about every time you use two packages together (far from it), it's only in the case where you're trying to use a custom type from one package (e.g., one which replaces base Arrays or base Numbers or such) in functions from another package.


> Are things better elsewhere? Python has excellent libraries. But these are big monoliths that not only do not compose well, but are also hard to understand deeply as they are essentially a thin layer over C, C++, Fortran, etc.

I dunno.

Things like the use of scipy.spatial.distance metrics[1] in sklearn clustering[2] seem like a great example of composability that is easy to learn and very efficient.

And the sklearn side isn't a "thin layer over C, C++, Fortran", even if scipy (sort of) is.

[1] https://docs.scipy.org/doc/scipy/reference/spatial.distance....

[2] https://scikit-learn.org/stable/modules/generated/sklearn.me...


How has Python - almost surely the most successful and widely adopted scientific programming ecosystem - avoided the problems of #2? E.g. Python doesn't have static typing.

Is it just that Python is so widely used there's institutional support for incredible linting and type check tools despite the lack of static typing? Or that much of the science/data ecosystem of Python is written in lower level statically typed languages?

(sadly possibly necessary edit/clarification: I'm not trying to be That Guy who answers every complaint about Julia with a matching complaint about Python. I'm legitimately curious about how Python got where it is without static typing, and what that implies about paths to a better ecosystem for Julia.)


Julia makes heavy use of multiple dispatch, along with other convenient type-related features much more complex than Python's, to the point where they are often abused and sometimes have uncaught edge cases. It makes the language very powerful but has its downsides.

And to be fair to Python, static analysis has come a very long way and the CPython interpreter makes far fewer complex assumptions than the Julia compiler. It’s also fairly strongly typed, so I’ve found that challenges with the type system cause more issues with packaging and maintenance than with correctness.


Indeed, Python is a relatively simple language. It also adheres to the principle of least astonishment, and dynamic types are used towards this goal. Finally, it does not mind making large changes to the language.

With Python, most of the time when you get an unexpected result from the language or a library, it is a matter of realizing "OK, that's the way it works" and moving on with your work. The language and libraries strive so much to always return sensible results that there are fewer instances where you would call a behavior a bug.


I don't think it needs a rewrite as much as careful maintenance from people who have time to dedicate to software quality. Most of the APIs are good, it's just that a lot of the code is under-tested and doesn't receive enough love. Having more big companies using Julia would help a lot with that.


Hi Keno,

Thanks for the honest assessment. Do you have any thoughts about correctness/ composability of compiler transforms like AD, reliability of GPU acceleration and predictability of optimizations? (basically what you've discussed in some of your compiler talks).

How is that going to be possible in an imperative language? Right now we have Lux.jl, which is a pure-by-convention DL framework, but that ends up being JAX without the TPUs, kernel fusion, branching (Lux relies on generated functions) and copy elision (though this last part is being worked on IIUC).

A bunch of folks in the ML, Probprog and fancy array spaces have been grappling with things like generated functions and type-level programming, and were wondering about future directions in this space; see https://julialang.zulipchat.com/#narrow/stream/256674-compil... among other discussions.

Edit: re : bandwidth issue Jan Vitek's group is thinking a lot about the verification vs flexibility tradeoff and some people are working on a trait/ static typing system. Maybe something can be done to help them along?


> Thanks for the honest assessment. What about correctness/ composability of compiler transforms like AD, reliability of GPU acceleration and predictability of optimizations? (basically what you've discussed in some of your compiler talks).

I don't think we really have a good answer yet, but it's actively being worked on. That said, I don't think we can be faulted for that one, because I don't think anybody really has a good answer to this particular design problem. There's a lot of new ground being broken, so some experimentation will be required.

> TPUs, kernel fusion, branching (Lux relies on generated functions) and copy elision (though this last part is being worked on IIUC).

We have demonstrated that we can target TPUs. Kernel fusion is a bit of an interesting case, because julia doesn't really use "kernels" in the same way that the big C++ packages do. If you broadcast something, we'll just compile the "fused" kernel on the GPU, no magic required. There is still something remaining, which is that when you're working on the array level, you want to be able to do array-level optimization, which we currently don't really do (though again, the TPU work showed that we could), but is broadly being planned.
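A small CPU-only illustration of the dot-fusion point, using only Base Julia. A chain of dotted operations is fused into a single broadcast, so `y .= a .* x .+ b` runs as one loop with no temporary arrays, like the hand-written loop below. The function names are hypothetical. With a GPU array type (for example `CuArray` from CUDA.jl, not used here), the same dotted expression is what gets compiled into one fused GPU kernel.

```julia
# One fused, in-place broadcast: a single pass over the data, no temporaries.
axpb_broadcast!(y, a, x, b) = (y .= a .* x .+ b; y)

# The explicit loop that the fused broadcast corresponds to.
function axpb_loop!(y, a, x, b)
    for i in eachindex(y, x)    # errors if y and x have different indices
        y[i] = a * x[i] + b
    end
    return y
end

x = collect(1.0:5.0)
y1 = axpb_broadcast!(similar(x), 2.0, x, 1.0)
y2 = axpb_loop!(similar(x), 2.0, x, 1.0)
println(y1 == y2)   # prints true
```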

> Edit: re : bandwidth issue Jan Vitek's group is thinking a lot about the verification vs flexibility tradeoff and some people are working on a trait/ static typing system. Maybe something can be done to help them along?

We work closely with them of course, so I think there'll be some discussions there, but it's a very tough design problem.


Glad to hear it's being worked on!

> That said, I don't think we can be faulted for that one, because I don't think anybody really has a good answer to this particular design problem.

Agreed! To be clear, if there's any implication of "fault", it was certainly not in a moral sense or even anything around making poor design decisions. Julia's compiler is being asked to do many new things with semantics that necessarily predated many advances in PL.

Re kernel fusion, there's another piece here, which you may or may not have included in "array-level optimizations". Julia's "just write loops" ethos is awesome until you get to accelerators; then we're back to an "optimizer-defined sub-language", as TKF puts it. People like loops and flexibility, and Dex, FLoops.jl, Tullio, Loopvec and KA.jl show that it's possible to retain structure and emit accelerator-able loopy code. But none of those, except for Dex, has a solution for fusing kernels that rely on loops. I'm still using the concept of kernels, because there's still a bit of a separation between low-level CUDA.jl code/these various DSLs and higher-level array code, even if not as stark as in Python or C++.

It would be really cool if, like Dex, there were a plan to fuse these sorts of structured loops as well. Dex does it by having type-level indexing and loop effects (they're actually moving to a user-defined parallel effect handler system: https://arxiv.org/abs/2110.07493); the latter can tell the compiler when it's safe to parallelize and fuse+beta reduce loops. But that relies on structured semantics/effects and a higher-level IR than exists in Julia.

Not sure what a Julian solution would look like, if possible. But given the usability wins, it would be great to have in Julia as well.


> But none of those, except for dex, has a solution for fusing kernels that rely on loops.

The LV rewrite will. Some day, I'd like to have it target accelerators, but unlike fusion, I've not actually put any research/engineering into it so can't make any promises.

But my long term goal is that simple loops in -> optimized anything you want out. Enzyme also deserves a shout out for being able to generate reverse mode AD loops with mutation.


To add, as you know, this is part of a more general problem about type-level programming vs. write-your-own-compiler vs. the non-composability of DSLs, where Julia folks in various other non-ML domains like PPLs and fancy arrays have been wondering how to do things that get compiled away without relying on compiler heuristics or generated function blowups: https://julialang.zulipchat.com/#narrow/stream/256674-compil...

Another non-ML example I discussed with some Probprog folks: there was an arXiv review of PPLs which found that the Julian ones that rely heavily on macros don't compose well within and across packages. The same mechanism for composability which Dex uses for parallelism and AD (effect handlers) is what new-gen PPLs in JAX and Haskell are using for composable, transformable semantics, so maybe that's worth looking into.

We've been having some discussions about how to bring that to Julia, but stalled on engineering time and PL knowledge. We eventually wanted to talk to the core team about it with a proposal in hand, but never got there. Let me know if you'd like to talk to some of the folks who have been involved in those discussions as you design the new compiler plugin infra.

https://julialang.zulipchat.com/#narrow/stream/256674-compil...


The big language design problem that I think this post highlights is that the flip side of Julia's composability is that composing generic code with types that implement abstractions can easily expose bugs when the caller and the callee don't agree on exactly what the abstraction is.

Several of the bugs that Yuri reported are a very specific case of this: there's a lot of generic code that assumes that array indexing always starts at one, but that's not always the case since OffsetArrays allow indexing to start anywhere. The older code in the stats ecosystem is particularly badly hit by this because it often predates the existence of OffsetArrays and the APIs that were developed to allow writing efficient generic code that works with arrays that don't start at the typical index (or which might even want to be iterated in a different order).

Fixing these specific OffsetArray bugs is a fairly straightforward matter of searching for `1:length(a)` and replacing it with `eachindex(a)`. But there's a bigger issue that this general problem raises: How does one, in general, check whether an implementation of an abstraction is correct? And how can one test if generic code for an abstraction uses the abstraction correctly?
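A minimal illustration of that bug and its fix, using only Base Julia. `Base.IdentityUnitRange(3:5)` is an unexported Base array type whose only valid indices are 3, 4 and 5, so it behaves like a small `OffsetArray` without needing OffsetArrays.jl. `total_buggy` and `total_fixed` are hypothetical generic functions.

```julia
# Generic code written with the 1-based assumption baked in.
function total_buggy(a::AbstractVector)
    s = zero(eltype(a))
    for i in 1:length(a)        # assumes the indices are 1:n
        s += a[i]
    end
    return s
end

# Same code, asking the array for its own indices instead.
function total_fixed(a::AbstractVector)
    s = zero(eltype(a))
    for i in eachindex(a)       # works for any index range
        s += a[i]
    end
    return s
end

v = [3, 4, 5]                     # ordinary Vector: indices 1:3
r = Base.IdentityUnitRange(3:5)   # elements 3, 4, 5 stored at indices 3:5

println(total_buggy(v), " ", total_fixed(v))   # prints 12 12
println(total_fixed(r))                        # prints 12
# total_buggy(r) throws a BoundsError, because r[1] is not a valid index.
# Inside an @inbounds loop that check may be skipped, which is how this
# kind of assumption can turn into silently wrong results.
```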

Many people have mentioned interfaces and seem to believe that they would solve this problem. I don't believe that they do, although they do help. Why not? Consider the OffsetArray example: nothing about `for i in 1:length(a)` violates anything about a hypothetical interface for AbstractArrays. Yes, an interface can tell you what methods you're supposed to implement. There are a couple of issues with that: 1) you might not actually need to implement all of them, since some code doesn't actually use all of an interface; 2) you can find out what methods you need to implement just by running the code that uses the implementation and seeing what fails. What the interface would guarantee is that if you've implemented these methods, then no user of your implementation will hit a missing method error. But all that tells you is that you've implemented the entire surface area of the abstraction, not that you've implemented the abstraction at all correctly. And I think that covering the entire surface area of an abstraction when implementing it is the least hard part.

What you really want is a way to generically express behaviors of an abstraction in a way that can be automatically tested. I think that Clojure's spec is much closer to what's needed than statically checked interfaces. The idea is that when someone implements an abstraction, they can automatically get tests that their implementation implements the abstraction correctly and fully, including the way it behaves. If you've implemented an AbstractArray, one of the tests might be that if you index the array with each index value returned by `eachindex(a)` that it works and doesn't produce a bounds error.
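A rough sketch of what such an automatically applicable behavioural check could look like for `AbstractVector`, written as a plain Julia function rather than with any spec-style tooling. `conforms_to_vector_interface` and `BrokenVec` are hypothetical names, and the four properties checked are only a small, illustrative subset of what a real specification would cover.

```julia
# Hypothetical behavioural check: properties any AbstractVector should satisfy,
# regardless of how it is implemented.
function conforms_to_vector_interface(a::AbstractVector)
    idx = eachindex(a)
    # 1. there is exactly one index per element
    length(idx) == length(a) || return false
    # 2. every index reported by eachindex can be used without a BoundsError
    for i in idx
        try
            a[i]
        catch err
            err isa BoundsError || rethrow()
            return false
        end
    end
    # 3. iteration visits a[i] for each i in eachindex(a), in that order
    for (x, i) in zip(a, idx)
        isequal(x, a[i]) || return false
    end
    # 4. one step past either end is out of bounds
    return !checkbounds(Bool, a, firstindex(a) - 1) &&
           !checkbounds(Bool, a, lastindex(a) + 1)
end

# A deliberately broken implementation: it reports one element too many.
struct BrokenVec <: AbstractVector{Int}
    data::Vector{Int}
end
Base.IndexStyle(::Type{BrokenVec}) = IndexLinear()
Base.size(v::BrokenVec) = (length(v.data) + 1,)
Base.getindex(v::BrokenVec, i::Int) = v.data[i]

println(conforms_to_vector_interface([1, 2, 3]))                     # prints true
println(conforms_to_vector_interface(view([10, 20, 30, 40], 2:3)))   # prints true
println(conforms_to_vector_interface(BrokenVec([1, 2])))             # prints false
```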

On the other end, you also want some way of generating mock instances of an abstraction for testing generic code. We do a bit of this in Julia's test suite: there are GenericString and GenericSet types, which implement the minimal string/set abstraction, and use these to test generic code to verify that it doesn't assume more than it should about the string and set abstractions. For a GenericArray type, you'd want it to start at an arbitrary index and do other weird stuff that exotic array types are technically allowed to do, so that any generic code that makes invalid assumptions will get caught. You could call this type AdversarialArray or something like that.
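A minimal sketch of such an adversarial mock, assuming nothing beyond Base Julia. `AdversarialVector`, `passes_adversarial`, `first_buggy` and `first_fixed` are all hypothetical names. The mock wraps a `Vector` but starts its indices wherever the test asks, and the helper compares a generic function's result on the plain data against shifted copies; with offset `-1`, for example, `a[1]` silently returns the second element.

```julia
# Hypothetical mock in the spirit of "AdversarialArray": a vector whose
# indices start wherever the test asks them to, instead of at 1.
struct AdversarialVector{T} <: AbstractVector{T}
    data::Vector{T}
    offset::Int                      # valid indices are offset+1 : offset+n
end
Base.IndexStyle(::Type{<:AdversarialVector}) = IndexLinear()
Base.size(a::AdversarialVector) = size(a.data)
Base.axes(a::AdversarialVector) = (a.offset+1:a.offset+length(a.data),)
function Base.getindex(a::AdversarialVector, i::Int)
    checkbounds(a, i)                # check against the shifted axes
    return a.data[i - a.offset]
end

# Run generic code on plain data and on shifted copies of it; a BoundsError
# or a changed result means the code assumed 1-based indexing.
function passes_adversarial(f, data::Vector; offsets = (-3, -1, 0, 7))
    expected = f(data)
    for off in offsets
        got = try
            f(AdversarialVector(data, off))
        catch err
            err isa BoundsError || rethrow()
            return false
        end
        got == expected || return false
    end
    return true
end

first_buggy(a) = a[1]                # assumes 1-based indexing
first_fixed(a) = a[firstindex(a)]    # asks the array where it starts

println(passes_adversarial(first_buggy, [10, 20, 30]))   # prints false
println(passes_adversarial(first_fixed, [10, 20, 30]))   # prints true
```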

I've personally thought quite a bit about these issues, but as Keno has said, there hasn't been time to tackle these problems in the last couple of years. But they certainly are important and worth solving.

On a personal note, Yuri, thanks for all the code and I'm sorry to see you go.


It seems to me that much of the difficulty with interfaces, whether they are made explicit or kept implicit, lies in defining the semantics that the functions are supposed to have.

As we expand the types our generic code can handle, we have to refine the semantics it relies on. For a long time, Base.length(::AbstractArray) could mean “the largest one-based index of the array”, but then we started using the same code that handles regular Arrays for OffsetArrays and this interpretation was no longer valid. I guess the alternative would have been to leave length(::OffsetArray) unimplemented and block the valid use of OffsetArrays for all generic code that understands Base.length as “the number of values”.

It can still be difficult to tell what a function like Base.length should mean if I implement it for my types. For example, should it return the number of local values or the global length for an array that is distributed between multiple processes (e.g. in an MPI program)? Perhaps some generic code will use it to allocate a buffer for intermediate values, in which case it should be the local length. Or some generic code computes an average by dividing the (global) sum by the global length.

It seems impossible to come up with a precise definition of all the semantics your generic code assumes a priori, so we can either restrict our usage of generics to a small number of concrete types that were considered when the code was written, or we have to accept that we occasionally run into these sorts of issues while we refine the semantics.

Anecdotally, it has been my experience that packages that have been made to work in many generic contexts (such as the ODE packages) are likely to work flawlessly with my custom types, while packages that have seen less such effort (e.g. iterative solvers) are more likely to cause issues. This makes me hopeful that it is possible to converge towards very general generic implementations.

It is also worth mentioning that it is very possible to use Julia without ambitious use of cross-package generic functionality, and use it “merely” as a better Fortran or Matlab.


To expand on the "interfaces are not enough" part: Defining an interface on an abstract type only gives you that an implementation exists, not that it is correct, i.e. that the specific implementation for a subtype guarantees the same properties the interface specifies.

On top of this, you really want to be alerted to when you expect more of an interface than the interface guarantees - this is what happened in the case of `1:length(A)` being assumed to give the indices into `A`, when the `AbstractArray` interface really only guarantees that a given set of methods exists.

I feel like these sorts of issues more or less require formal models to be provided and checked by the compiler. Luckily for us, nothing in this space has been implemented or attempted in and for Julia, while there are a lot of experiments with formal methods and proof systems being researched right now (TLA+, Coq, ...). There are of course a lot of footguns[1], but the space is moving fast and I'd love to see something that makes use of this integrated into Julia at some point.

[1]: Why specifications don't compose - https://hillelwayne.com/post/spec-composition/


> Defining an interface on an abstract type only gives you that an implementation exists, not that it is correct

Pretty far off topic for Julia, but the definition of Rust's Traits over semantics rather than syntax (even though of course the compiler will only really check your syntax) gives me a lot of this.

The fact that this Bunch<Doodad> claims to be IntoIterator<Item=Doodad> tells me that the person who implemented that explicitly intends that I can iterate over the Doodads. They can't accidentally be IntoIterator<Item=Doodad>; the author has to literally write the implementation naming the Trait to be implemented.

But that comes at a heavy price, of course: if the author of Bunch never expected me to iterate over it, the best I can do is wrap it in a newtype MyBunch and implement IntoIterator using whatever ingredients Bunch exposes. This raises the price of composition considerably :/

> you really want to be alerted to when you expect more of an interface than the interface guarantees

In the case alluded to (AbstractArray) I feel like the correct thing was not to implement the existing interface. That might have been disruptive at the time, but people adopting a new interface which explicitly warns them not to use 1:length(A) are not likely to screw this up, and by now perhaps everything still popular would have upgraded.

Re-purposing existing interfaces is probably always a bad idea, even if you can persuade yourself it never specifically said it was OK to use it the way you suspect everybody was in practice using it, Hyrum's Law very much applies. That interface is frozen in place, make a new one.


I think the OP specifically complains about the use of @inbounds, and that the documentation advocated an invalid use of it. Some libraries may not have been updated to handle AbstractArrays with non-1-based indices: that's normal software rot. But the OP's actual grievance is that the out-of-bounds access goes unreported.


> What you really want is a way to generically express behaviors of an abstraction in a way that can be automatically tested.

The pure FP ecosystems in Scala often accomplish this in the form of "laws", which are essentially bundles of pre-made unit tests that they ship alongside their core abstraction libraries.
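A minimal sketch of what such a law bundle could look like for Julia's AbstractVector interface. `vector_laws_hold` and the deliberately broken `Backwards` type are hypothetical; a real package would more likely report failures through the Test stdlib than return a Bool.

```julia
# Hypothetical law bundle for AbstractVector implementations. Each law uses only
# the generic interface, so it also applies to vectors with offset indices.
function vector_laws_hold(A::AbstractVector)
    inds = eachindex(A)
    # Law 1: exactly one index per element.
    length(inds) == length(A) || return false
    # Law 2: firstindex/lastindex are the ends of the first axis, whatever they are.
    if !isempty(A)
        firstindex(A) == first(axes(A, 1)) || return false
        lastindex(A) == last(axes(A, 1)) || return false
    end
    # Law 3: indexing with eachindex visits elements in iteration order.
    return all(p -> A[p[1]] == p[2], zip(inds, A))
end

# A deliberately broken implementation: iteration order disagrees with indexing.
struct Backwards <: AbstractVector{Int}
    data::Vector{Int}
end
Base.size(b::Backwards) = size(b.data)
Base.getindex(b::Backwards, i::Int) = b.data[i]
Base.iterate(b::Backwards, s = length(b.data)) = s < 1 ? nothing : (b.data[s], s - 1)
```

An implementer would call the bundle from their own test suite; `Backwards` defines every required method and still fails Law 3.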


Invenia's approach to interface testing ("Development with Interface Packages" on our blog) does some of the things you suggest as a standard of practice, by providing tools to check correctness that implementers can use as part of package tests. ChainRulesTestUtils.jl is a decent example (although this one doesn't come with fake test types). I think this is typically good enough, and only struggles with interface implementations with significant side effects.

One little win could be publishing interface tests like these for Base interfaces in the Test stdlib. I appreciate that the Generic* types are already exposed in the Test stdlib!


> 2) you can find out what methods you need to implement just by running the code that uses the implementation and see what fails.

For large codebases this is SO painful to do. I just don't understand how anyone gets anything done when this is how they have to develop code.


That's why interfaces are useful—they save you from that. But they don't actually solve the problem of checking that an abstraction has been implemented correctly, just that you've implemented the entire API surface area, possibly incorrectly. Note, however, that if you have a way of automatically testing the behavioral correctness of an implementation, then those tests presumably cover the entire API, so automatic testing would subsume the benefit that static interface checking provides—just run the automatic tests and it tells you what you haven't implemented as well as what you may have implemented incorrectly.


Indeed, static types don't save you from this issue, at least structural ones don't: exactly the same issue would occur as in Julia (see: C++ templates). Static structural types have the same problem as Julia here; you gain a lot of compositional power at the expense of potential correctness issues.

However, nominal types do "solve" the problem somewhat, as there's a clear statement of intent when you do "X implements Y" that the compiler enforces. If that promise is missing, the compiler will not let you use an X where a Y is expected. And if you do say X implements Y, then you probably tested that you did it correctly.

But this would also fail at the OffsetArray problem. The only way I can see of protecting against it (statically or dynamically) is to have an "offset-index" type, different from an integer, that you need to have to index an OffsetArray. That makes a[x] not be compatible between regular Arrays and OffsetArrays.
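A rough sketch of the offset-index idea, assuming a one-dimensional container is enough to show the trade-off. `Pos`, `ShiftedVector` and `positions` are hypothetical names.

```julia
# Hypothetical: positions into a ShiftedVector must be wrapped in `Pos`,
# so a bare Int (e.g. from 1:length(v)) cannot index it by accident.
struct Pos
    i::Int
end

struct ShiftedVector{T}          # deliberately *not* an AbstractVector
    data::Vector{T}
    first::Int                   # index of the first element
end

Base.length(v::ShiftedVector) = length(v.data)
Base.getindex(v::ShiftedVector, p::Pos) = v.data[p.i - v.first + 1]

# The only sanctioned way to enumerate valid positions.
positions(v::ShiftedVector) = (Pos(i) for i in v.first:(v.first + length(v) - 1))
```

The catch: generic code written for `Array`, which indexes with plain integers, no longer accepts a `ShiftedVector` at all (`v[1]` is a `MethodError`).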

I don't think anyone wants that mess though. So if your language has OffsetArrays, and they're supposed to be compatible with Arrays, and you can index both the same way, no amount of static types will help (save for dependent/refinement types but those are their own mess).

EDIT: I seem to have replied to the wrong comment, but the right person, so hey, no issue in the end :)


I think I agree with your claim that writing tests would be a superset in terms of covering the use cases of interfaces. But

1) Testing is such a PAIN in Julia. You HAVE to run all the tests every time. You HAVE to write tests in a separate folder. Multiple dispatch and composability prevent having confidence that your tests cover all the cases.

2) In a lot of cases, interfaces improve readability of the code. Being able to just look at code in a large code base and know which interfaces are implemented + need to be implemented is such an advantage.

3) Static analysis tooling can provide linting. This doesn't even have to be implemented by the core team at the moment. The lack of interfaces limits what kind of tooling can be developed.

All in all, when I write Julia code, I know I have to write EXTENSIVE tests. Even more tests than I have to write with Python (with mypy). Almost an order of magnitude more tests than I have to write with Rust.

Sometimes it feels like I spend more time writing / running tests than adding features to our packages. And that is honestly just not fun for me, let alone for other scientists and researchers on my team.


Interfaces provide correctness guarantees by virtue of implementing them being a conscious decision. If your array implements GenericArray, you know about that interface, and presumably what it is used for. Its methods can also contain documentation.

The point is a common point of… trust may be the word? Two developers that don’t even know each other can use each other’s code correctly by programming against a third, hypothetical implementation that they both agree on. Here OffsetArray would simply not implement the GenericArray interface if the latter expects 1-based indexing.

In this specific case the solution would be to move the indexing question into the interface itself - it is not only an implementation detail. Make the UltraGenericArray interface have an offset() method as well and perhaps make [] do 1-based indexing always (with auto-offsetting for indexed arrays), and a separate index-aware get() method, so that downstream usage must explicitly opt in to different indexing.
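A minimal Julia sketch of that proposed interface. `UltraGenericVector`, `storage`, `offset`, `getat` and `Centered` are all hypothetical names; `get` is avoided because `Base.get` already means something else.

```julia
# Hypothetical interface: implementers provide `storage` and `offset`.
abstract type UltraGenericVector{T} end
function storage end   # backing 1-based Vector
function offset end    # "real" index of the first element

Base.length(v::UltraGenericVector) = length(storage(v))
# `v[k]` is always positional and 1-based, for every implementation.
Base.getindex(v::UltraGenericVector, k::Int) = storage(v)[k]
# Code that cares about the "real" index must opt in explicitly.
getat(v::UltraGenericVector, i::Int) = storage(v)[i - offset(v) + 1]

# Example implementation: indices run from -k to k.
struct Centered{T} <: UltraGenericVector{T}
    data::Vector{T}
    k::Int
    function Centered(data::Vector{T}, k::Int) where {T}
        length(data) == 2k + 1 || throw(ArgumentError("need 2k+1 elements"))
        return new{T}(data, k)
    end
end
storage(c::Centered) = c.data
offset(c::Centered) = -c.k
```

With this split, `for k in 1:length(v)` is correct by construction for `[]`, and only code that calls `getat` needs to think about offsets.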


I remember reading a long time ago about the 1-based array and the offset-array 'kludge'.

My first thought was that they should have replicated Ada's design instead; my second was that I hoped they had a good linter, because putting an arbitrary-offset implementation in a library is a minefield.

I don't claim to be especially smart: this is/was obvious. Unfortunately, what isn't obvious is how to fix this issue, and especially how to fix the culture which produces this kind of issue.


Offset arrays aren't a kludge and the package would exist regardless of whether zero or one has been chosen as a default base for indexing. Having arbitrary index ranges in different dimensions is extremely useful in many application domains. When working with FFTs, for example, it's natural for the indices to be symmetrical around zero. Or when doing coordinate transforms like in this example from the excellent Images.jl package: https://juliaimages.org/stable/tutorials/indexing/.


Offset arrays are a wonderful source of footguns, and this should have been obvious from the start. Either you train everyone to write loops in an official way that avoids the problem, or you are going to have bugs. And if you choose training, you have to train EVERYONE, because anyone coming in from another language will have the wrong habits.

Even Perl realized that changing the indexing of arrays was a bad idea. Which is why the special variable $[ has been deprecated every which way possible because it caused too many bugs.


Here's a post I wrote on Julia discourse about why one-based indexing is not the culprit here, no matter what people may feel about it aesthetically: https://discourse.julialang.org/t/offsetarrays-inbounds-and-....

Several of the most serious scientifically minded programming languages have also had arrays that can be indexed starting at different offsets, including Fortran, Chapel and Fortress. If Julia were zero-based, OffsetArrays would still exist and be highly useful. OffsetArrays is not some "kludge" just to allow indexing from zero. Frankly, indexing from zero isn't important enough for that. What really makes OffsetArrays useful is being able to do things like have indices go from -k:k where k may depend on the particular dimension. That way the point A[0, 0, 0] is the center of the array and you can navigate up/down/back from there naturally. Of course you can simulate that with arrays that start at zero or one, but it's a major pain.


Hubris


> there are a number of "flagship" applications that are really pushing the boundary of what Julia can do, but at the same time also need a disproportionate amount of attention.

Disproportionate effort is an obvious sign that hacks to keep such flagships seaworthy are prioritized over a good language and a good library.

> Basically, Julia tries very hard to make composability work, even if the authors of the packages that you're composing don't know anything about each other.

Typically, programming languages and libraries don't need to "try very hard" because they are designed to be safe and correct, at the cost of curbing ambitious features.

> not all implicit assumptions are easily capturable in interfaces. Perhaps there needs to be more explicit documentation around what combinations of packages are "supported".

Supporting useful "combinations of packages" isn't a desirable approach to language and library evolution. Implicit assumptions must disappear, either by becoming explicit or by becoming unnecessary; both ways represent genuine progress, not fruitless firefighting.


> Disproportionate effort is an obvious sign that hacks to keep such flagships seaworthy are prioritized over a good language and a good library.

I do not think this is true; from my limited Julia experience, the reason the flagship features need disproportionate effort is precisely that they are research projects and the developers make sure they are not hacks.


Re the title: ok, we've replaced the submitted title ("The Julia language has a number of correctness flaws") with a representative phrase from the OP which uses the word 'ecosystem'.

HN's title rule calls for using the original title unless it is misleading or linkbait (https://news.ycombinator.com/newsguidelines.html) and "Why I no longer recommend Julia" is generic enough to be a sort of unintentional linkbait - I think it would lead to a less specific and therefore less substantive discussion. In that sense the submitter was probably right to change the title, and for the same reason I haven't reverted it.

I'm going to autocollapse this comment so we don't get a big thread about titles.


Thanks. Appreciate your thoughtful moderation as always :).


Everything has correctness issues somewhere. Julia ships an entire patched version of LLVM to fix correctness bugs in numerical methods. It has its own implementations of things like software-side FMA because the FMA implementation of Windows is incorrect: https://github.com/JuliaLang/julia/pull/43530 . Core Julia devs are now the maintainers of things like libuv because of how much had to be fixed there. So those three points alone show plenty of cases where Python, R, etc. code is incorrect and Julia isn't.

I think what's interesting about Julia is that because the code is all Julia, it's really easy to dig in there and find potential bugs. The standard library functions can be accessed with @edit sum(1:5) and there you go, hack away. The easier it is to look at the code, the easier it is to find issues with it. This is why Julia has such a high developer-to-user ratio. That has its pros and cons, of course. It democratizes the development process, but it also means that people who don't have a ton of development experience (plus Fortran or C knowledge) are not excluded from contributing. Is that good or bad? Personally I believe it's good in the long run, but it can have its bumps.

As an aside, the author highlights "for i in 1:length(A)". I agree, code should never do that. It should be `eachindex(A)`. In general, code should use iterators, which are designed to handle arbitrary indexing. This is true in any language, though you'll always have some newcomers write code (and documentation) with this. Even experienced people who don't tend to use arrays beyond Array tend to do this. It's an interesting issue because coding style issues perpetuate themselves: explicitly assuming 1-based indexing wasn't an issue before GPUs and OffsetArrays, but then loop code like that trains the next generation, and so more people use it. In the end the people who really know how to handle these cases are the people who tend to use these cases, just like how people who write in styles that are ARM-safe tend to be people who use ARM. Someone should just run a bot that opens a PR for every occurrence of this (especially in Base), as that would then change the source that everyone learns from and completely flip the style.
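A small sketch of the iterator-based style for a loop over two arrays; `scale_add!` is a hypothetical name. Passing both arrays to `eachindex` makes Julia check up front that they share the same indices, which is what makes the `@inbounds` annotation safe here.

```julia
# y .+= a .* x, written as an explicit generic loop.
function scale_add!(y::AbstractArray, a, x::AbstractArray)
    # eachindex(y, x) throws DimensionMismatch unless x and y have the same
    # indices, so every i below is valid for both arrays.
    inds = eachindex(y, x)
    @inbounds for i in inds
        y[i] += a * x[i]
    end
    return y
end
```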


> Everything has correctness issues somewhere.

This is the fallacy of gray. The blog post isn't complaining that there are non-zero bugs; it's complaining that when you use the language you hit a lot of correctness bugs. More bugs than you'd hit using e.g. Python.

Also, to the extent that Julia uses LLVM, a correctness bug in LLVM is also a correctness bug in Julia. So arguing "LLVM has lots of correctness bugs" is not helping the case...

> because the code is all Julia, it's really easy to dig in there and find potential bugs.

The blog post is about bugs hit while running code, not bugs found while reading code. The fact the issue can be understood and pointed at is great, but it's the number of issues being hit that's the problem.


> So arguing "LLVM has lots of correctness bugs" is not helping the case

It does not help the case about the correctness of Julia, but it does help the case about Julia having more bugs than other software (negatively for the other projects). Every library built with LLVM that touches those code paths will have those bugs.

Another thing to have in mind is that Julia ships patches for some of these, that are not used upstream yet. So Julia does not suffer from some bugs on LLVM that other projects might.


It shows that Julia's tests are systematically finding (and leading to fixes for) numerical bugs that are pervasive throughout the rest of the LLVM ecosystem. And since Julia's LLVM is patched to solve these while other variants of LLVM are not, Julia is more correct in these aspects than other languages which rely on the base build of LLVM. Of course Julia doesn't solve "all bugs", but some of them (like the correctness of certain math library implementations) really make you question how hard other languages' tests are hammering those for correctness (Julia has a lot of numerical tests checking the precision of such methods against MPFR BigFloats at higher precision to ensure ~X ulp correctness, for example). Julia definitely spends a lot more time testing numerical correctness than it does testing something like a web server. It's just a prioritization thing.
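A minimal sketch of that style of check, not Julia's actual test code: evaluate the function in `Float64`, re-evaluate it in 256-bit `BigFloat` (which Julia implements on top of MPFR) as a reference, and express the difference in units of the result's last place. `ulp_error` and `max_ulp_error` are hypothetical helpers.

```julia
# Error of `f(x)` in Float64, measured in ulps of the Float64 result,
# against a 256-bit BigFloat evaluation used as the reference.
function ulp_error(f, x::Float64)
    approx = f(x)
    reference = setprecision(BigFloat, 256) do
        f(BigFloat(x))
    end
    return Float64(abs(BigFloat(approx) - reference) / eps(approx))
end

# Worst-case error over a set of sample points.
max_ulp_error(f, xs) = maximum(x -> ulp_error(f, x), xs)
```

A correctly rounded function such as `sqrt` should never exceed half an ulp; a typical accuracy target for functions like `exp` is under one ulp.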


Do Julia devs not upstream their LLVM patches?


We do, but it takes a while for LLVM to accept and release patches, so by the time any one issue is fixed, there will be a new bug to take its place.


I do think there's a particularly unique challenge to Julia in that so many packages can theoretically coexist and interoperate. While it quadratically increases the power of Julia, it also quadratically increases the surface area for potential issues. That — to me — is the most interesting part of the blog post. How can we help folks find the "happy" paths so they don't get lost in the weeds by trying to differentiate a distributed SVD routine of an Offset BlockArray filled with Unitful Quaternions? And — as someone who worked with and valued Yuri's reported issues and fixes — how can I more quickly identify that they're not someone who gets joy out of making such a thing work?


Good comments, Chris. I think the author has a little bit of nuance in that Julia isn't correct in the specific use cases he needs them to be. While your point is also well taken that Julia is correct in cases where other languages aren't as well.

I'm a little unfamiliar with the versioning in the package ecosystem, but would you say most packages follow or enforce SemVer? Would enforcing a stricter dependency graph fix some of the foot guns of using packages or would that limit composability of packages too much?


> but would you say most packages follow or enforce SemVer?

The package ecosystem pretty much requires SemVer. If you just say `PackageX = "1"` inside the [compat] section of a Project.toml, then it will assume SemVer, i.e. any version 1.x is non-breaking and thus allowed, but not version 2. Some (but very few) packages do `PackageX = ">=1"`, so you could say Julia doesn't force SemVer (because a package can say that it explicitly believes it's compatible with all future versions), but of course that's nonsense and there will always be some bad actors around. So then:

> Would enforcing a stricter dependency graph fix some of the foot guns of using packages or would that limit composability of packages too much?

That's not the issue. As above, the dependency graphs are very strict. The issue is always at the periphery (for any package ecosystem really). In Julia, one thing that can amplify it is the fact that Requires.jl, the hacky conditional dependency system that is very not recommended for many reasons, cannot specify version requirements on conditional dependencies. I find this to be the root cause of most issues in the "flow" of the package development ecosystem. Most packages are okay, but then oh, I don't want to depend on CUDA for this feature, so a little bit of Requires.jl here, and oh let me do a small hack for OffsetArrays. And now these little hacky features on the edge are both less tested and not well versioned.

Thankfully there's a better way to do it by using multi-package repositories with subpackages. For example, https://github.com/SciML/GalacticOptim.jl is a global interface for lots of different optimization libraries, and you can see all of the different subpackages here https://github.com/SciML/GalacticOptim.jl/tree/master/lib. This lets there be a GalacticOptim and then a GalacticBBO package, each with versioning, but with tests being different while allowing easy co-development of the parts. Very few packages in the Julia ecosystem actually use this (I only know of one other package in Julia making use of this) because the tooling only recently was able to support it, but this is how a lot of packages should be going.

The upside too is that Requires.jl optional dependency handling is by far and away the main source of loading time issues in Julia (because it blocks precompilation in many ways). So it's really killing two birds with one stone: decreasing package load times by about 99% (that's not even a joke, it's the huge majority of the time for most packages which are not StaticArrays.jl) while making version dependencies stricter. And now you know what I'm doing this week and what the next blog post will be on haha. Everyone should join in on the fun of eliminating Requires.jl.


> Julia ships an entire patched version of LLVM to fix correctness bugs in numerical methods

Sounds like the banana ships with the gorilla which requires the entire jungle, and we're too busy fixing the gorilla to give the banana our undivided attention.


I'll be honest, based on my experience with Julia, this makes me more worried about using e.g. libuv in production systems now, not less. I understand your opinion that "The easier it is to look at the code, the easier it is to find issues with it", but I don't think that has anything to do with the fact that `prod((Int8(100), Int8(100)))` and `prod([Int8(100), Int8(100)])` disagree, because someone decided to special-case tuple multiplication. And to make it even worse, this bug was even documented(!) in the comments by whoever committed the original code:

   # TODO: this is inconsistent with the regular prod in cases where the arguments
   # require size promotion to system size.
How did this pass code review? Why would it be okay for a standard library function to be "inconsistent" in this way?

(EDIT: Since writing this comment, I've realized that (100 * 100) % 256 is in fact 16, so the results are a little less inexplicable to me. I think having the types annotated in the REPL would have made it clearer what was going on, and it's still a very difficult inconsistency to debug, especially as an end user)
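A quick check of the arithmetic in that edit: `Int8` multiplication wraps modulo 2^8, while widening first (to the system `Int`, or with `widemul`) gives the mathematically expected product.

```julia
a = Int8(100)

wrapped = a * a              # Int8 * Int8 stays Int8 and wraps modulo 2^8: 10000 % 256 == 16
widened = Int(a) * Int(a)    # promote to the system Int first: no overflow
wide    = widemul(a, a)      # widemul promotes to the next wider type (Int16 here)
```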

I also think your argument that "[...] you'll always have some newcomers write code (and documentation)" that is broken is completely incorrect, and it shifts the responsibility for providing a safe and easy-to-use system from the language authors onto the users. The OP goes to pains to point out that this was not just an issue of "some newcomers": it was a fundamental issue across the entire community, including what seem to be some of the most heavily-used packages in Julia's ecosystem, such as Distributions.jl and StatsBase.jl. It's deeply misleading to blame issues like that simply on "people who don't have a ton of development experience" and "newcomers writing documentation", and it indicates a lack of responsibility and humility from Julia's proponents.

P.S: You're correct that the documentation about @inbounds was written by someone who was new to the language (https://github.com/JuliaLang/julia/pull/19726). But in fact the example itself was copied over entirely as-is from devdocs, where it was written by the author of the boundschecking feature(!) https://github.com/JuliaLang/julia/pull/14474. And it was only fixed last year. And the entire docs PR was reviewed thoroughly by two core team members, with lots of changes and suggestions—but nobody noticed the index issue. So I don't think you can blame this one on newcomers.


Julia released experimental support for arrays whose indexes don't start at 1 in Julia 0.5, October 2016.

The boundschecking feature was added in 2015, so at the time they wrote their code and examples, they were correct.

The documentation and review happened in December and January 2016/2017 when the non 1-based indexing was still experimental and very new, so I don't think this is as big a fail as you've made out either.

Yes, the documentation should have been updated when non-standard indexing was made non-experimental, and the reviewers should maybe have noted the new and experimental array indexing stuff, but it's only natural to miss some things.


That's fair enough! I was unaware of that history. But my point wasn't that the issue was "a big fail", it's that the GP was unfair in assigning the responsibility of that failure to "some newcomers [who] write code (and documentation) with this" while "the people who really know to handle these cases" are fine. The responsibility should have been on the people pushing for the experimental array indexing code to make it work safely with the existing usage of boundschecks that existed in the ecosystem and the existing documentation. It's a fundamental disagreement between whether the onus of code safety is on the user (who is responsible for understanding the totality of the libraries they're using and all of the ways they can fail) or on the programming language (for ensuring the stability and correctness of its code, documentation and ecosystem when making changes).


Just to clarify, the prod() bug you mention was fixed about a year ago.


The problem in this case (as with most issues regarding `@inbounds`) is that this text was written before arrays with non-standard indices existed in Julia. So the example was correct at the time it was written, just like the StatsBase code was correct. Old code needs careful checking to fix all these occurrences.


Discussed in a sibling thread: https://news.ycombinator.com/item?id=31401155.


> I agree, code should never do that. It should be `eachindex(A)`

Will that generate the same code as "i in 1:length(A)"?

Maybe whoever wrote that didn't believe so at least, or perhaps didn't find it so at the time.

The reason @inbounds would have been used is performance, so that's likely why the for loop header was written that way?


`eachindex` is — in quite a few situations — faster than `1:n`.

We've also been trying to promote a culture of not blindly putting `@inbounds` notations on things as the compiler gets smarter. `@inbounds` is a hack around a dumb compiler, especially when the loop is as simple as many of these examples. It's not needed there anymore (but was 5 years ago).


Perhaps that is part of the point of the article? If you accept things like @inbounds, which is a horrible hack and was a horrible hack five years ago, then perhaps the culture is a little too tolerant towards horrible hacks. Because many of the bugs the author enumerates are of the "fixes the problem for now, let's deal with the consequences later" type.


I wouldn't say that @inbounds is a "horrible hack". Just like the `unsafe` part of Rust is not a "horrible hack". There are cases where it is impossible for a compiler to statically verify that an index access is in bounds and in those cases it will need to emit a check and an exception. This prevents many other optimizations (for example SIMD). So for a language that is intended for people to write low-level numerical routines there has to be a way to opt out of these checks or people would have to write their numerical routines in a completely different language. But the important part is that index access is memory safe by default (as opposed to e.g. C) and you can also force boundschecking to be turned on (to override @inbounds) with a command-line flag (--check-bounds=yes). So if you want, you could pretend "@inbounds" doesn't exist by just aliasing your julia executable to apply that command line flag.
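A minimal sketch of how a custom array type takes part in this opt-out, following the `@boundscheck` / `Base.@propagate_inbounds` pattern from the Julia manual's bounds-checking section. `Window` and `window_sum` are hypothetical names.

```julia
# Hypothetical view of data[lo:hi], exposed with indices 1:(hi - lo + 1).
struct Window{T} <: AbstractVector{T}
    data::Vector{T}
    lo::Int
    hi::Int
end

Base.IndexStyle(::Type{<:Window}) = IndexLinear()
Base.size(w::Window) = (w.hi - w.lo + 1,)

Base.@propagate_inbounds function Base.getindex(w::Window, i::Int)
    # Elided when this call is inlined into an @inbounds region,
    # unless Julia was started with --check-bounds=yes.
    @boundscheck checkbounds(w, i)
    return w.data[w.lo + i - 1]
end

# A safe use of @inbounds: eachindex only yields valid indices of w.
function window_sum(w::Window)
    s = zero(eltype(w))
    @inbounds for i in eachindex(w)
        s += w[i]
    end
    return s
end
```

If a caller's `@inbounds` is wrong (say, a `1:length` loop over the wrong range), the check disappears and an out-of-window index can silently read neighboring elements of `data`, which is exactly the failure mode the thread is about.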


Yes and no... Julia's been focused on high performance numerical computing from the beginning (and other related scientific applications). Using macros to get good performance from relatively generic code was (from my outside perspective) a really effective way to support real applications early on and also give time for the compiler to get "sufficiently smart" to make the macros less necessary.


The question is: is it at least as fast in all situations? Was it always that way?

The 1 to length loop just has to initialize a local variable and step it; it cannot do anything else. It doesn't worry about the kinds of array that A may be, with its particular configuration of indexing, right?

You may promote a culture of not doing certain things, but that by itself won't make those things disappear from existing code.

Say you're trying to ship some product and you receive a bulletin from the language mailing list encouraging you, "try not to use @inbounds, it's a hack around a dumb compiler". You know you have that in numerous places; but you're not going to stop what you're doing and start removing @inbounds from the code base. If you're remarkably conscientious, you might open a ticket for that, which someone will look into in another season.


> The question is: is it at least as fast in all situations? Was it always that way?

Yes and yes.


I think it should be fine for performance AFAIU to use `eachindex` instead; at least I know `eachindex` plays nicely with LoopVectorization.jl with no performance costs there.

That said, I think you're exactly right that people may wonder just this and use the seemingly "lower-level" form out of concern with or without testing it.


One of my intentions with the rewrite is to let `@turbo` change the semantics of "unreachable"s, allowing it to hoist them out of loops. This changes the observed behavior of code like

  for i = firstindex(x):lastindex(x)+1
    x[i] += 2
  end
where now, all the iterations that actually would have taken place before the error will not have happened. But, hoisting the error check out when valid will encourage people to write safer code, while still retaining almost all of the performance. There is also a class of examples where the bounds checks will provide the compiler with information enabling optimizations that would've otherwise been impossible -- so there may be cases with the rewrite where `@inbounds` results in slower code than leaving bounds checking enabled.
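For reference, a sketch of the behavior being changed, using ordinary Julia semantics and no `@turbo`: the in-bounds iterations run and mutate `x`, and only then is the error thrown. `add_two_past_end!` is a hypothetical wrapper around the loop above.

```julia
# The loop from above, wrapped in a function. Without hoisting, each
# in-bounds iteration mutates x before the final iteration throws.
function add_two_past_end!(x)
    for i = firstindex(x):lastindex(x)+1
        x[i] += 2
    end
    return x
end
```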


Oh, nice!


Is "for i in 1:length(A)" ever correct? Should Julia just emit a warning any time it encounters that pattern? Or maybe something slightly more complicated, such as that pattern followed by usage of i to index into A inside the loop?


> Is "for i in 1:length(A)" ever correct?

In some rare cases, it very well might be exactly what the code's author intended and needed.

I tend to lean towards what Martin Fowler calls an "enabling attitude"[0] (as opposed to a "directing attitude") -- that is, when faced with a choice about how to design the primitives of an interface, I lean more often towards providing flexibility, and I try to avoid choosing ahead of time what users aren't allowed to do. It's better to document what's usually the wrong way to do something than to enforce it in the design. You can never guess what amazing things people will create when they are given flexible, unrestricted primitives.

So for cases like this, I think it's better to rely on a flexible linting tool (if available) than warnings or errors.

[0] https://martinfowler.com/bliki/SoftwareDevelopmentAttitude.h...


Why not have a feature to allow you to turn off the warning? E.g. have something recognise 1:length(x) and complain unless you write e.g. @nowarn eachindex before it.


Warning about such things is the job of a linter. There is a linter for Julia, so such a check could be added there. It shouldn't be a runtime warning, though, as you propose.
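A toy sketch of the kind of check a lint rule could perform, walking Julia's parsed syntax tree from `Meta.parse` (this is not StaticLint.jl's implementation; `find_length_loops` and `is_one_to_length` are hypothetical names). It only flags loops written literally as `for i in 1:length(X)`; note that the parser stores `in` in a loop header as `=`:

```julia
# True for the expression `1:length(X)`, where X is a plain variable name
function is_one_to_length(r)
    r isa Expr && r.head === :call && length(r.args) == 3 || return false
    r.args[1] === Symbol(":") && r.args[2] == 1 || return false
    inner = r.args[3]
    return inner isa Expr && inner.head === :call && length(inner.args) == 2 &&
           inner.args[1] === :length && inner.args[2] isa Symbol
end

# Collect X for every loop written as `for i in 1:length(X)` inside expression ex
function find_length_loops(ex, found = Symbol[])
    ex isa Expr || return found
    if ex.head === :for
        spec = ex.args[1]  # `for i in r` is stored as `i = r`
        if spec isa Expr && spec.head === Symbol("=") && is_one_to_length(spec.args[2])
            push!(found, spec.args[2].args[3].args[2])
        end
    end
    for arg in ex.args    # recurse into function bodies, nested loops, etc.
        find_length_loops(arg, found)
    end
    return found
end

find_length_loops(Meta.parse("for i in 1:length(A)\n    A[i] += 1\nend"))  # [:A]
```

A real rule would also need an opt-out (such as the `len = length(A)` workaround mentioned further down) and would ideally only fire when `i` is then used to index `A`.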


It is correct if `A` is an `Array`, since a normal `Array` in Julia uses 1-based indexing. It is incorrect if `A` is some other subtype of `AbstractArray` that does not follow 1-based indexing. Normally that case throws an error because of bounds checking; the OP is talking about the case where bounds checking has been turned off with `@inbounds` for speed, so the code silently gives wrong answers instead of raising an error.

An issue was opened some time ago in StaticLint.jl to address this: https://github.com/julia-vscode/StaticLint.jl/issues/337
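To see the failure mode without any packages, here is a sketch using a hypothetical minimal `ZeroBased` vector type as a stand-in for something like OffsetArrays.jl (it only defines `size`, `axes` and `getindex`; a real implementation needs more care). With bounds checking on, `1:length(A)` skips index 0 and then throws a `BoundsError` at index 3, while `eachindex` visits exactly the valid indices:

```julia
# Hypothetical minimal vector whose valid indices are 0:n-1 instead of 1:n
struct ZeroBased{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(a::ZeroBased) = size(a.data)
Base.axes(a::ZeroBased) = (0:length(a.data)-1,)
function Base.getindex(a::ZeroBased, i::Int)
    0 <= i < length(a.data) || throw(BoundsError(a, i))  # valid indices are 0:n-1
    return a.data[i + 1]
end

# Assumes 1-based indexing: wrong for ZeroBased
function sum_by_length(A)
    s = zero(eltype(A))
    for i in 1:length(A)
        s += A[i]
    end
    return s
end

# Works for any AbstractArray, whatever its indices
function sum_by_eachindex(A)
    s = zero(eltype(A))
    for i in eachindex(A)
        s += A[i]
    end
    return s
end

z = ZeroBased([1, 2, 3])
sum_by_eachindex(z)  # 6
# sum_by_length(z) throws a BoundsError; with checks disabled it would read garbage
```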


It's correct if you want to do something `length(A)` times and want an iteration counter, but it's never better than `for idx in eachindex(A)` if what you actually want are indexes into A (which is of course the much more common case).

Julia did not initially support arrays that aren't indexed from 1 (experimental support was added in Julia 0.5; I don't know when it was finalised), and at that time I'm not even sure we had anything like `eachindex`; certainly there would have been no reason to use it for an array.
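When you do want both an iteration counter and an index, one option (our illustration; `label_items` is a hypothetical name) is to `enumerate` the valid indices, so the counter `k` always runs 1, 2, ... while `i` comes from `eachindex` and stays valid whatever the array's indexing:

```julia
function label_items(A)
    labels = String[]
    for (k, i) in enumerate(eachindex(A))  # k: 1, 2, ...; i: a valid index into A
        push!(labels, "item $k is A[$i] = $(A[i])")
    end
    return labels
end

label_items([10, 20])  # ["item 1 is A[1] = 10", "item 2 is A[2] = 20"]
```

For an ordinary `Vector` the two coincide; they only differ for arrays whose indices don't start at 1.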


> Is "for i in 1:length(A)" ever correct?

Yes, actually. While I have approximately zero knowledge of Julia specifically, a language-independent example might be:

  B = OneBasedArray(length(A))
  A_ = iter(A)
  for i in 1:length(A) { B[i] = pop(A_) }
  assert(iter_isdone(A_))
And if that looks contrived... yes; it is contrived.
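The same contrived example written in Julia, as a sketch (`copy_in_order` is a hypothetical name). `B` is an ordinary 1-based `Vector`, so `1:length(A)` counts positions in `B`, while `A` is only traversed through the iteration protocol (`iterate`), so its own indexing never matters:

```julia
function copy_in_order(A)
    B = Vector{eltype(A)}(undef, length(A))  # ordinary 1-based Vector
    next = iterate(A)
    for i in 1:length(A)        # i indexes B, never A
        x, state = next
        B[i] = x
        next = iterate(A, state)
    end
    @assert next === nothing    # A's iterator is exhausted
    return B
end

copy_in_order(5:7)  # [5, 6, 7]
```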

> that pattern followed by usage of i to index into A inside the loop?

I can't think of any legitimate uses for that, but there probably are some; make sure to allow:

  len = length(A)
  for i in 1:len ...
as an `if( (x = foo()) )`-style workaround.


Why allow iterating with 1:length(A) if it's not the right way?


I don't think there's any clean way to stop that at a language level (some languages prevent this by disallowing random access to arrays, but that's a non-starter for a performance-oriented language), and also it would be a massively breaking change.


You can't disallow it at the language level, since either way you are just indexing with Ints. That said, we can add better linting rules to catch things like this.


> Everything has correctness issues somewhere.

Yes but Julia is (yet another) dynamic language, presumably for "ease of use". A language with static types would have made it easier to build correct software (scientific code in e.g. OCaml and F# can look pretty good). Julia chose a path to maximize adoption at the expense of building a reliable ecosystem. Not all languages choose to make this trade-off.


> A language with static types would have made it easier to build correct software

This claim is repeated often, but numerous attempts have failed to demonstrate that this is generally the case in practice (there have been a couple of studies showing an effect in very specific circumstances). Static types might indeed assist with correctness, but they are not the only thing that does, and in some situations they could come at the expense of others. I.e., even if types were shown to significantly help with correctness, it does not follow that if you want correctness your best course would be to add types.

Given empirical studies, the current working hypothesis should be that if static types do have a positive effect on correctness, it is a small one (if it were big, detecting it would have been easy).

Note that Matlab, the workhorse of scientific computing for a few decades now, is even less typed than Julia. That's not to say that Julia doesn't suffer from too many correctness issues (I have no knowledge on the matter), but even if it does, there is little support for the claim that typing is the most effective solution.


We can trade anecdotes on this topic, but I've written numerical code in both OCaml and Julia. The strictness of OCaml's type system is painful in a numerical context, but for virtually everything else it is awesome to pass code to the interpreter/compiler and catch structural problems at compile time rather than (maybe) at runtime.

OCaml's type system is almost certainly not the right model for Julia, but the ad-hoc typing/interface system Julia currently employs is strongly at odds with compile-time correctness. There's almost certainly some middle ground to be discovered which might be unsound in a strict sense but pragmatically constrains code statically, so that you would most likely have to go out of your way to pull the footgun trigger.

You can see how little type annotations are used in practice in major Julia libraries. It should be integral to best practice in the language to specify some traits/constraints that arguments must satisfy to be semantically valid, but what you often see instead is a (potentially inscrutable) runtime error.
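A small illustration of the point (our own example; `scale_all` and `scale_reals` are hypothetical names). Julia's annotations are checked at dispatch time, not at compile time, but they still turn a silent misuse into an immediate `MethodError` at the call boundary. Without them, `*` on strings means concatenation, so passing strings appears to "work":

```julia
# Unconstrained: accepts any xs and a for which a * x happens to be defined
scale_all(xs, a) = [a * x for x in xs]

# Constrained: a vector of real numbers and a real scale factor
scale_reals(xs::AbstractVector{<:Real}, a::Real) = [a * x for x in xs]

scale_all(["x", "y"], "z")   # ["zx", "zy"]: string concatenation, not scaling
scale_reals([1, 2], 3)       # [3, 6]
# scale_reals(["x", "y"], "z") throws a MethodError at the call itself
```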


"Awesome", i.e. more enjoyable for you, and "more correct", i.e. fewer bugs in production, are two very different things. I also prefer typed languages for the software I tend to write and find them more enjoyable, but that still doesn't make me claim that types lead to more correct software.


I am not familiar with the studies you are relying on to make the point that statically-typed languages have no significant difference in terms of number of bugs in production compared to dynamically-typed. Measuring such things is challenging, and the most useful measure may not be in terms of "bugs in production" but by a number of other measures, such as how long it takes to surface bugs after the code is accepted by the interpreter/compiler, how much time is spent on writing the implementation vs. writing & running tests, how many bugs occur on major refactorings, etc. If you have citations for studies you like, I'm certainly interested.

My use of colloquialism aside, it is strictly more rigorous to catch equivalent bugs through the interpreter/compiler than through testing or other runtime-dependent approaches. In my own experience, despite being a more experienced programmer in my Julia-writing phase than in my OCaml-writing phase, it takes much more time to surface bugs in my "running" Julia code than in OCaml. The lack of determinism in surfacing these bugs does not inspire as much confidence in the Julia code. You could counter by saying I'm probably able to implement more functionality in Julia per unit of up-front development time than in OCaml, which I'd probably have to concede, but that just highlights that measuring these things in a directly comparable way is not easy.

In the physical engineering disciplines, we often have disagreements about the level of sophistication of physics-based models that should be used for design and analysis. It's very reminiscent of these static vs. dynamic typing discussions in software development. There isn't a "one size fits all" answer, but generally, the more complex and expensive the system, the more important it is that the models incorporate greater physical fidelity. My analogous conclusion here is that a lot of technical/numerical code is complex enough that more rigor enforced by the language would likely be the right tradeoff for a net win on up-front correctness (vs. correctness as a result of testing).


Anyone is allowed to prefer a programming style that suits their aesthetics and habits, and like that one over all others. Aesthetic preferences are a very valid way to choose your programming language — ultimately that's how we all pick our favourite languages — and there's no need to make up universal empirical claims to support our preferences.

Here's a good talk to watch on the subject: https://youtu.be/ePCpq0AMyVk

And here's a summary of various studies done: https://danluu.com/empirical-pl/

As of today, what we know is that if there's a positive effect of types on correctness, then it is probably a small one.

There's really no need to assert what is really a conjecture, let alone one that's been examined and has not been verified. If you believe the conjecture is intrinsically hard to verify, you're conceding that you're only claiming a small effect at best (big effects are typically not hard to verify), and so there's even less justification for continuing to assert it. It's okay to prefer typed languages even though they do not, as far as we know, have a big impact on correctness.


  Anyone is allowed to prefer a programming style that suits their aesthetics and habits, and like that one over all others. Aesthetic preferences are a very valid way to choose your programming language — ultimately that's how we all pick our favourite languages — and there's no need to make up universal empirical claims to support our preferences.
That's fine, but I'm not sure what it has to do with my comment as it was not about preferences based on aesthetics or habits.

Thanks for the links though.

  There's really no need to assert what is really a conjecture, let alone one that's been examined and has not been verified.
There's no unsupported conjecture in "it is strictly more rigorous to catch equivalent bugs through the interpreter/compiler than through testing or other runtime-dependent approaches."

  If you believe the conjecture is intrinsically hard to verify, you're conceding that you're only claiming a small effect at best (big effects are typically not hard to verify), and so there's even less justification for continuing to assert it.
It's easy to fall victim to the Robert McNamara fallacy, that if something isn't easy to measure its effect or importance is insignificant. Anyone looking back at U.S. defense and procurement policy from his era is free to observe the lack of real-world congruence with such thinking. The Dan Luu page you cited, more than anything else, seems to reinforce that the cited studies are hard to interpret for any rigorous conclusions or for validity of methodology.

This is why I did not make sweeping statements along the lines of "the majority of dynamically-typed software in production [no qualifier on what "production" means] would have fewer bugs if it were statically-typed" or the like.


> That's fine, but I'm not sure what it has to do with my comment as it was not about preferences based on aesthetics or habits.

Because you made the claim that "it is strictly more rigorous to catch equivalent bugs through the interpreter/compiler than through testing or other runtime-dependent approaches," but that claim was simply not found to be true.

> There's no unsupported conjecture in "it is strictly more rigorous to catch equivalent bugs through the interpreter/compiler than through testing or other runtime-dependent approaches."

There is, unless you define "more rigorous" in a tautological way. It does not seem to be the case that soundly enforcing constraints at compile time always leads to fewer bugs.

> It's easy to fall victim to the Robert McNamara fallacy, that if something isn't easy to measure its effect or importance is insignificant.

The statement, "you will have fewer bugs but won't be able to notice it," is unconvincing. For one, if you can't measure it, you can't keep asserting it. At best you can say you believe that to be the case. For another, we care about the effects we can see. If the effect doesn't have a noticeable impact, it doesn't really matter if it exists or not (and we haven't even been able to show that a large effect exists).

That the effect is small is still the likeliest explanation, but even if you have others, your conjecture is still conjecture until it is actually verified.

> The Dan Luu page you cited, more than anything else, seems to reinforce that the cited studies are hard to interpret for any rigorous conclusions or for validity of methodology.

It does support my main point that despite our attempts, we have not been able to show that types actually lead to significantly fewer bugs, i.e. that the approach is "more rigorous" in some useful sense.


I'm not sure if that's true, even big effects can be hard to verify if there are significant confounders.

For example, let's imagine that writing OCaml code really leads to fewer bugs than writing code in Lisp (to just choose two languages) but only after you've trained people in OCaml for ten years. Or maybe, technically Java leads to measurably fewer bugs than Ruby, but because most popular Java projects make heavy use of reflection, the effect dissipates... and so on (these are just examples for potential confounders, I'm not claiming they're true).

You are correct that one cannot claim that "static typing leads to fewer bugs" is a demonstrably correct statement, but I don't think you can claim that there demonstrably can be no (big) effect either. And in the end, you're also allowed to believe in conjectures even when there is no solid evidence behind it. People do that all the time, even scientists.


You can believe in such conjectures, but it's wise to consider the more probable possibility that if an effect hasn't been found, then it is likely small.

Also, in the end it doesn't really matter, because the conjecture that's repeated as an assertion isn't said merely as a scientific claim, but as an attempt to convince. Companies are interested in some bottom line effect, and rather than trying to sell your favourite approach with something like, "I like it; maybe you'll like it, too", you make some unsupported assertion that goes like this: "you should use my thing because it will actually make an important contribution to some bottom-line effect you're interested in; oh, and by the way, you might not notice it." That isn't convincing at all, so it's best to stick with what we know: "I like it, maybe you'll like it, too."


Show me the companies that only ever implement policies that have shown to be effective in rigorous empirical studies.

Usually some person (or a group of people) is in charge of some decision and that person will make judgment calls based on their beliefs. This is no less true of programming techniques than it is of management styles, corporate strategy or anything else.

Your insistence that we may not have beliefs about the very things we work with daily, unless they're empirically verified, is IMHO frankly ridiculous.


That's not my insistence at all. You can believe what you like. What you can't do is make empirical assertions that we've not been able to validate empirically.

Companies may adopt a technique based on empirical findings or anything else they like; most people choose a favourite programming language because they like working with it better. But the statement that types lead to fewer bugs is a very particular assertion that is simply unsupported by evidence. You may believe that using types reduces baldness and make your choices based on that, but it's still a conjecture/belief at best.


I think you're guilty yourself of what you're accusing other people of.

I haven't seen people ITT arguing that there is empirical evidence for types providing better correctness guarantees, just that they strongly believe it to be the case given their own experience.


The original statement was "A language with static types would have made it easier to build correct software." This is an empirical claim that the evidence does not support. Note that it's not that there's merely no evidence supporting the claim, but that studies designed to support the claim failed to do so.


> Given empirical studies, the current working hypothesis should be that if static types do have a positive effect on correctness, it is a small one.

Which use cases, languages and static type systems are you referring to? The context is very important, especially when seeking to draw general conclusions from empirical studies.

As someone who has previously posted extolling the merits of static analysis, I'm very surprised at your position regarding static types. Static types help to constrain a language and enable reasoning, either by additional static analysis or otherwise.

It is precisely the flexibility of dynamic languages that makes them difficult to reason about and difficult to build correct software in. This is why the use of dynamic languages is mostly banned in the defense industry.

Static types clearly help with composition (one of the complaints with Julia), especially at scale. How many academic empirical studies considered multimillion-line code bases? I submit for evidence a lot of expensive type-retrofitting projects such as Facebook Hack, Microsoft Typescript or Python types, which demonstrate that many companies have or had real problems with dynamic languages at any kind of scale.


> Note that Matlab, the workhorse of scientific computing for a few decades now, is even less typed than Julia.

You always make this argument when discussing PL features and I find it irksome. People get along fine without this feature, therefore there’s no sense in implementing it. But it cuts the other way, or we’d all still be using assembly. How many Matlab users know things could be better? Was the superiority of structured programming and avoiding GOTO ever empirically proven, or did we all just collectively realize it was a good idea?


> People get along fine without this feature, therefore there’s no sense in implementing it.

As someone whose job is to add new features to a programming language, that's never been my argument.

> But it cuts the other way, or we’d all still be using assembly

High-level languages were satisfactorily shown to be more productive than Assembly. I don't claim that no innovation works, just that not all do, and certainly not to the same degree. That feature X is helpful is certainly no evidence that feature Y is helpful, and that Python is more productive than Assembly does not support the claim that programs in OCaml are more correct than programs in Clojure.

Also, my argument isn't "we got by without it" or that no idea could ever work. It's that a specific claim was tested and unconfirmed.

> Was the superiority of structured programming and avoiding GOTO ever empirically proven, or did we all just collectively realize it was a good idea?

I don't know about the former, but the latter is certainly true, and until we actually reach consensus you can't claim we have.

BTW, I certainly don't claim that types aren't useful or even that they're not better in some ways (I believe that they help a lot with tooling and organisation), but the particular claim that they universally help with correctness, and do so better than other approaches, was studied, and simply not confirmed. You can't come up with a claim, try and fail to support it time and again, and keep asserting it as if it's obviously true, despite the evidence.


> As someone whose job is to add new features to a programming language, that's never been my argument.

I’ve definitely seen you argue along the lines of “it hasn’t been implemented in Java, therefore nobody uses it and we can’t tell if it’s a good idea or not” before. Forgive me for assuming this followed from that.

> You can't come up with a claim, try and fail to support it time and again, and keep asserting it as if it's obviously true, despite the evidence.

But “correctness” of itself is pretty nebulous. If we define it as whether or not the program conforms to one’s intentions with writing it, I would expect static types alone not to show a significant difference in correctness. Probably formal methods do but they have much higher overhead.

However, in terms of eliminating patterns which are literally never correct, like dereferencing null pointers, violating resource lifetimes, or calling methods that don’t exist, static typing can in fact eliminate those patterns.

> the claim that programs in OCaml are more correct than programs in Clojure

My full-time job is Elixir so I know full well the consequences of maintaining large codebases in dynamic languages. I would switch to OCaml in a heartbeat if it ran on the BEAM! I want to know that I am calling functions correctly within a node when the module can be resolved at compile time. This is a really basic thing to want, and not one that dynamic languages can offer. The qualitative difference is similar to that between structured and unstructured programming: I can actually do local reasoning about a function without having to check all the call sites or write a lot of defensive tests.

This is an obvious advantage, and on some level I don’t really care if it contributes to formal correctness or not because it would make my job easier.
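For contrast with the optionals point above, here is how the same hazard looks in Julia, which is dynamically typed (our illustration; `price_unchecked` and `price_checked` are hypothetical names). `findfirst` returns `nothing` when no element matches, and the language does not force the caller to handle that case, so the unchecked version only fails at runtime, on whichever input triggers it:

```julia
# findfirst returns an index, or `nothing` if name is absent
function price_unchecked(items, prices, name)
    i = findfirst(==(name), items)
    return prices[i]                  # errors at runtime when i === nothing
end

function price_checked(items, prices, name)
    i = findfirst(==(name), items)
    i === nothing && return nothing   # absence handled explicitly
    return prices[i]
end

price_checked(["apple", "pear"], [1.0, 2.5], "plum")  # nothing
```

In a language with checked optionals, the first version would be rejected before it ran.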


> I’ve definitely seen you argue along the lines of “it hasn’t been implemented in Java, therefore nobody uses it and we can’t tell if it’s a good idea or not” before.

You have not seen me argue anything along those lines. I have, however, said the converse, that we try not to adopt features in Java until they've proven themselves elsewhere.

> However, in terms of eliminating patterns which are literally never correct, like dereferencing null pointers, violating resource lifetimes, or calling methods that don’t exist, static typing can in fact eliminate those patterns.

But the implication is reversed! From A => B, i.e. types prevent certain bad things, you're concluding B => A, i.e. if you don't want those bad things then you should use types. That simply does not follow.

> This is an obvious advantage, and on some level I don’t really care if it contributes to formal correctness or not because it would make my job easier.

I wouldn't dare imply that types don't have certain important advantages, but that doesn't support the specific claim that types generally and significantly improve correctness — which many have tried to show and failed — and it certainly doesn't support the much stronger claim that if you want to improve correctness, the most effective way to do it is to use types.


> You have not seen me argue anything along those lines. I have, however, said the converse, that we try not to adopt features in Java until they've proven themselves elsewhere.

You’ve definitely argued that not enough software has been written in Haskell to know whether it’s the right tool for anything and whether strong types are actually a good idea.

> But the implication is reversed! From A => B, i.e. types prevent certain bad things, you're concluding B => A, i.e. if you don't want those bad things then you should use types. That simply does not follow.

I really don’t know of a simpler way to do this than types. Do you? Honestly I would be using it if I did. All of the solutions I know to these problems involve types.

> that doesn't support the specific claim that types generally and significantly improve correctness — which many have tried to show and failed

What definition of correctness is being used here? Surely you’re not saying that optionals don’t eliminate null pointer dereferencing, are you?


> to know whether it’s the right tool for anything and whether strong types are actually a good idea.

Nope. I don't know what you mean by "a good idea", and I prefer typed languages myself (mostly for tooling support), but I do often point out that the claim that types improve correctness — let alone the claim that they do that better than other approaches — is an empirical claim that is not supported by empirical evidence (which, in fact, appears to contradict it).

Also, that there have been few programs written in Haskell, and that Haskell has failed to demonstrate that it leads to better correctness are both pretty basic facts.

> All of the solutions I know to these problems involve types.

I don't know what you mean by "solutions to these problems", but while we've not found a correlation between types and more correct programs, we have found correlations between code reviews and tests and more correct programs. Types might well be the solution to many things (e.g. automatic refactoring and jump-to-definition), but the empirical evidence we have suggests that increased correctness isn't one of them.

> Surely you’re not saying that optionals don’t eliminate null pointer dereferencing, are you?

Types certainly eliminate various kinds of errors, yet studies did not find that they reduce bugs (except in specific circumstances; for example, there was one study that reported that TypeScript has 15% fewer bugs than JavaScript).

Just to give you a sense of one reason that happens, take your example of Maybe types. A null pointer exception occurs when code assumes a reference can't be null but is wrong to make that assumption. A Maybe type would force a test somewhere. But the question, then, is what you do when the value is empty. A brilliant study on software correctness [1] found that most catastrophic crashes in distributed systems occur not because programmers fail to consider certain exceptional situations (in fact, the language forces them to consider those situations), but because they frequently do the wrong thing when those situations occur.

[1]: https://www.usenix.org/system/files/conference/osdi14/osdi14...
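To illustrate that point in Julia terms (our own example, not one from the paper; `timeout_lenient` and `timeout_strict` are hypothetical names): `tryparse` returns `nothing` on bad input, and both functions below do handle that case, but the first handles it wrongly, silently turning a typo into a zero timeout:

```julia
# Handles the missing case, but by substituting a plausible-looking wrong value
timeout_lenient(s::AbstractString) = something(tryparse(Int, s), 0)

# Handles the missing case by failing loudly with an explanation
function timeout_strict(s::AbstractString)
    t = tryparse(Int, s)
    t === nothing && throw(ArgumentError("invalid timeout: $(repr(s))"))
    return t
end

timeout_lenient("30s")  # 0, even though the user clearly meant 30 seconds
```

A type checker would accept both versions equally; only the second one is correct.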


> Types certainly eliminate various kinds of errors, yet studies did not find that they reduce bugs

Right, but this is why I asked about structured programming. These things are hard to study by their nature. I understand there’s not a consensus on this, but there are a lot of programmers who feel quite strongly that static types reduce bugs. Maybe that’s not good enough for you! But it’s clear that you accept other practices as beneficial on insufficient evidence. Or maybe you don’t — maybe you don’t think writing correct unstructured programs is harder.

> most catastrophic crashes in distributed systems occur not because programmers fail to consider certain exceptional situations

This is irrelevant though. The question is not whether static types prevent the most common types of bugs, or the most dangerous (it’s clear that memory safety is more important than static types in that regard.) If static types narrow the problem space to situations where you explicitly made the wrong decision, that’s significant. If there are any bugs caused by not handling exceptional conditions (and we both know that there are), then static type systems help reduce those.


> These things are hard to study by their nature.

Small effects are hard to study by their nature. Big effects are usually easy to spot.

> there are a lot of programmers who feel quite strongly that static types reduce bugs

There are a lot of people who feel quite strongly that homeopathy cures all kinds of diseases, but they've failed to demonstrate that.

> But it’s clear that you accept other practices as beneficial on insufficient evidence

It's not about "accept." I myself practice typed programming without asserting the empirical claim that it reduces bugs, a claim that seems not to be true (or at least, the effect does not seem to be big).

> The question is not whether static types prevent the most common types of bugs, or the most dangerous

So far, we've failed to show that static types reduce bugs, period. You're allowed to like them and promote them, but your feeling towards them does not make a specific empirical claim more or less true.


> There are a lot of people who feel quite strongly that homeopathy cures all kinds of diseases, but they've failed to demonstrate that.

This is a really poor comparison. How many people who know enough to judge these things believe that? I can understand that you’re rigorous in the empirical claims you accept here, but surely you can see that experienced software developers have somewhat more basis to make claims about type systems than random people do about homeopathy.

> or at least, the effect does not seem to be big

Fair enough! It’s probably not as big as testing or code review but I do think it exists. You mentioned the TypeScript study, it’s not like there’s no evidence for believing this like there is with homeopathy.

> So far, we've failed to show that static types reduce bugs, period.

I mean, how could they not? You still haven’t explained this part, except with vague allusions that the problems they catch are not the most common problems. I’ve explained how static types reduce bugs: by eliminating cases where you intend to handle a case and forget to. I’ll specify that I mean a type system with exhaustivity checking or narrowing with control flow (like TS or Kotlin). This is not an empirical argument, it’s a rational argument, and I still legitimately don’t understand what the flaw is. The way I see it, static type systems restrict operations on types to those known to be valid at compile time. In a sound type system and compiler, the resulting program is guaranteed not to make invalid operations on types at runtime. Dynamic programs are fully free to make invalid operations on types at runtime. Some number of bugs are caused by making invalid operations on types at runtime. Where are those bugs in a statically typed program, if they haven’t been eliminated?


The extra bugs in a statically typed program go in the duplicated code that someone copied and changed the type names on because their type system wasn't flexible enough to let different types share the same code. This means that 3 years later when someone fixed a bug in part of the code, the bug remained in the other copy because the person writing the fix didn't know about the copy.

For a simple example, consider Arrow.jl vs the C++ implementation of the Arrow format. The Julia implementation is roughly 1/10th the lines of code (with more functionality), so even if there are 5x more bugs per line, the code still has fewer bugs.

Static types definitely reduce bugs per line, but they can still increase bugs per functionality.
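As a small illustration of the code-sharing side of this argument (our example; `rms` is a hypothetical helper): one generic Julia definition, written once without type annotations, serves integer, `Float32` and complex inputs, so a bug fixed in it is fixed for all of them at once:

```julia
# Root-mean-square of any collection of numbers
rms(xs) = sqrt(sum(abs2, xs) / length(xs))

rms([3, 4])            # Int input
rms((3.0f0, 4.0f0))    # Float32 input, Float32 result
rms([3 + 4im])         # complex input: abs2(3 + 4im) == 25, so 5.0
```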


But there are plenty of static type systems that let different types share the same code, the problem you describe only exists in nominal type systems (like C++) as far as I know. With structural types like in TypeScript or Go you can express this trivially.

For that matter you can do this with subtype polymorphism in most cases. In Rust you can do it with trait objects as long as you control either the type or the trait. Probably there’s a way to do it in C++ too.


> but surely you can see that experienced software developers have somewhat more basis to make claims about type systems

But experienced software developers, more than "random people", should know that if a conjecture they make is tested and not verified, they should reconsider it.

> You mentioned the TypeScript study, it’s not like there’s no evidence for believing this like there is with homeopathy.

There is no evidence for believing this. If you believe the evidence for TS vs JS in particular, then you should also believe the failure to find a more general effect.

> I mean, how could they not?

That's an interesting question and there are many answers; I've been interested in the complex subject of software correctness for years, and have written a bit about it (https://pron.github.io/). The more you study software correctness, the more you learn how complicated it is and that there are no easy answers. In particular, you learn that it's not true that more soundness is always a good path toward more correctness. But if you accept your preconceived notions over empirical study, then there's little hope for making actual progress.

> Where are those bugs in a statically typed program, if they haven’t been eliminated?

I gave you an example of where they are. If you want to go down the path of thinking about the theory of software correctness, start by convincing yourself that for every JavaScript program there is a Haskell program (perhaps living entirely inside an Either monad) that behaves the same way.


I don’t think the subject of software correctness in practice is itself well-studied enough to say conclusively that my conjecture is false. I think what can be said conclusively is that at scale people cannot write memory-safe code in an unsafe language or type-safe code in a dynamic language, but obviously these are not the only kinds of correctness.

> In particular, you learn that it's not true that more soundness is always a good path toward more correctness.

I’m still curious what definition of “correctness” you’re using here. Formal correctness? Bugs per line of code? Generally I’m thinking in terms of formal correctness, in which case I think it’s basically a truism that more soundness leads to more correctness. At some cost, perhaps.

> I gave you an example of where they are.

Yes, but those bugs are not unique to static languages, and I’ve never claimed that static languages eliminate all kinds of bugs. As far as I could tell from reading the study, it’s not evidence that static languages encourage this sort of bug more.

To be blunt: I think the set of kinds of bugs that can be written in dynamic languages is a strict superset of the kinds of bugs that can be written in a static language. Maybe I’m completely wrong about this! But this is the root of my reasoning.


> I don’t think the subject of software correctness in practice is itself well-studied enough to say conclusively that my conjecture is false.

I don't claim that. Given what we know, the likeliest explanation for the findings so far is that an effect, if it exists, is probably small.

> Formal correctness? Bugs per line of code?

Both would work.

> in which case I think it’s basically a truism that more soundness leads to more correctness. At some cost, perhaps.

And you'd be wrong, or, at least, the second part of your statement makes all the difference. What we want is the best correctness we can get for some given cost, or, given some effort, what should you do to get the most correct program? If you follow formal methods, some of the hottest lines of research right now are about reducing soundness to improve correctness.

> As far as I could tell from reading the study, it’s not evidence that static languages encourage this sort of bug more.

I didn't say they did. But you asked how we explain the observation that types don't improve correctness, and one explanation is that the kind of mistakes that types catch aren't the costliest bugs that make it to production, and perhaps the extra effort invested comes at the expense of other approaches that do uncover more serious bugs.

> But this is the root of my reasoning.

That's as good a conjecture to start with as any, but it needs to be revised with findings.


> perhaps the extra effort invested comes at the expense of other approaches that do uncover more serious bugs

I guess in my experience the effort invested programming in a static language is really not that much higher than dynamic, and in some ways I find it less effortful. For example: pattern matching on a sum type, being sure that I’ve handled all the cases I want to. Is there good empirical research on this?

> That's as good a conjecture to start with as any, but it needs to be revised with findings.

I was attempting to make a factual statement, not a conjecture. If it is true that static types eliminate a class of errors, then type errors must be really cheap for static types not to be worth it on those grounds. My prior is that compiler errors are cheaper than runtime errors here.


Until it's been measured, a statement is a conjecture, not "factual." A conjecture that we've tried to verify yet failed to see a large effect is a problematic conjecture.

We don't have good empirical findings about many things, most likely because many effects are small at best. But it doesn't matter. You can say that you still believe something despite failed attempts to measure it, but you can't say it's "factual." That is the difference between fact and conjecture.


> That is the difference between fact and conjecture.

Yes, but I'm trying to make a formal statement of fact here, not an empirical one.

> I think the set of kinds of bugs that can be written in dynamic languages is a strict superset of the kinds of bugs that can be written in a static language.

Here I am attempting to make a formal statement about the set of runtime behaviors that can be exhibited under static type systems. What I'm saying is that there is a set of incorrect runtime behaviors that can only be exhibited in a dynamic type system, that is, the set of type errors. I'm not aware of any runtime errors that can only be exhibited under a static type system. I'm not well educated enough in the relevant fields to be able to formalize this with notation (I would do so if I was, I think notation communicates these things much more clearly than words) but I do believe it has a formal representation.

> A conjecture that we've tried to verify yet failed to see a large effect is a problematic conjecture.

I think we're somewhat talking past each other here. What I'm saying is that the set of possible incorrect runtime behaviors is smaller in a static language. This can't really be empirically verified, it should have a formal answer in type theory or programming language theory. It's possible I'm wrong about what the formal answer is, but I haven't seen you address it yet. It's also possible that this is true but not strongly related to the way that bugs evolve in typical software engineering practice, I've speculated upthread that the total number of bugs might be similar because programmers make a higher number of repeated mistakes from the more narrow set while programming in static languages (though personally this seems unlikely). It's further possible that this is just not a very significant effect, as you have conjectured. However it is my belief that if there is an effect on bugs overall that it derives from what I understand to be a formal character of the runtime behavior of statically typed programs, that there are fewer ways for them to "go wrong."


> What I'm saying is that there is a set of incorrect runtime behaviors that can only be exhibited in a dynamic type system, that is, the set of type errors. I'm not aware of any runtime errors that can only be exhibited under a static type system.

But that doesn't mean they have fewer bugs! They might well have more.

> What I'm saying is that the set of possible incorrect runtime behaviors is smaller in a static language.

Not exactly. For every program in an untyped language, you could write a typed program with the exact same behaviours.

The assertion that typed programs have fewer bugs is simply false in theory, and failed to be confirmed empirically.

> However it is my belief that if there is an effect on bugs overall that it derives from what I understand to be a formal character of the runtime behavior of statically typed programs, that there are fewer ways for them to "go wrong."

I understand that that is your belief, but it is supported by neither theory nor practice.


“Want B” is not the same as “B”.

“Should do A” is not the same as “A”.

The reverse of “types prevent certain kinds of bugs” would be that “all code which doesn’t suffer from those kinds of bugs is typed”, not “if you don’t want those bugs, use types”. The parent does not assume B->A; your actual disagreement with them is over whether A -> B implies A -is-the-most-effective-way-to-> B.

I share your understanding of the evidence base around types, and I agree with your conclusions; my beef here is just with divorcing the language of formal logic from its substance.


Yeah, though I do actually think “all code which doesn’t suffer from those kinds of bugs is typed” is true to a first approximation, for what it's worth. That's a big part of the reason I prefer static types. Under a few thousand lines it's less true. It's really at scale that these things become a problem, and even so they're still manageable most of the time, they're just kind of a pain.


Formalising natural language is tricky, but I believe "every PL that leads to more correct programs is typed" is either equivalent to or stronger than "if you want a PL that leads to more correct programs it should be typed." It could be stronger if the latter phrasing takes measure into account (so a bit of fuzzy logic), meaning the possibility that other approaches also lead to more correct programs but not as well. Still, I contend that my formalisation is correct enough to demonstrate the logical error that even if types imply correctness (not substantiated by evidence), it does not follow that correctness implies types (even less supported by evidence).


> How many Matlab users know things could be better?

not very many in my experience - matlab rots the brain


In particular, not a single issue mentioned in this article would have been prevented by static type checking.


This is not true. For example, the issue regarding custom index ranges causing silent data corruption (6 examples) could be fixed with static types.

Look how many of the other bug reports contain the phrase "does not check for" or refer to specific primitive types.


How would static types help with that? Whether your indexing range starts with zero or one or something else isn't necessarily encoded in the type domain. `1:length(A)` is just a range of `Int`s.
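A minimal self-contained sketch of the failure mode under discussion. `ZeroVec` is a hypothetical stand-in for OffsetArrays.jl (defined here, not from any package) whose indices run from 0. A loop over `1:length(A)` bakes in the 1-based assumption, while Base's `eachindex(A)` asks the array for its own indices.

```julia
# Hypothetical stand-in for OffsetArrays.jl: a vector whose indices are 0:n-1.
struct ZeroVec{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.IndexStyle(::Type{<:ZeroVec}) = IndexLinear()
Base.size(v::ZeroVec) = size(v.data)
Base.axes(v::ZeroVec) = (0:length(v.data)-1,)
Base.getindex(v::ZeroVec, i::Int) = v.data[i + 1]   # shift into the 1-based backing store

# Assumes 1-based indexing: skips index 0 and runs one past the last index.
function naive_sum(A::AbstractVector)
    s = zero(eltype(A))
    for i in 1:length(A)
        s += A[i]
    end
    return s
end

# Uses the array's own indices, so any starting offset works.
function safe_sum(A::AbstractVector)
    s = zero(eltype(A))
    for i in eachindex(A)
        s += A[i]
    end
    return s
end

v = ZeroVec([10, 20, 30])
firstindex(v)   # 0
safe_sum(v)     # 60
# naive_sum(v) reads v[1], v[2], v[3]; v[3] maps past the end of v.data.
```

Here `naive_sum(v)` throws a `BoundsError` only because `ZeroVec` forwards to a bounds-checked `Vector`. For array types whose `getindex` honors `@inbounds`, the same loop under `@inbounds` can instead read out-of-range memory and silently return a wrong sum.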


Why not encode the starting offset into the type domain? Or at least distinguish between normal and unusual. Then the function signature can restrict to 1-offset arrays if that is what it assumes internally.


If the function signature said `Array` rather than `AbstractArray`, then this code would have been fine. `Array` indexing starts at `1`.

```
julia> function f(A::Array)
           println(A[1:length(A)])
       end
f (generic function with 1 method)

julia> f([1,2,3,4])
[1, 2, 3, 4]

julia> using OffsetArrays

julia> f(OffsetArray(1:10, -1))
ERROR: MethodError: no method matching f(::OffsetVector{Int64, UnitRange{Int64}})
```

You could prevent this problem using Julia's type system. The `AbstractArray` might have been too broad. Based on the chronology of the code, that might not have been apparent. See other threads for details.

Another way would be to treat `firstindex` as a trait and dispatch on that.

```
julia> f(A::AbstractArray) = f(A, Val(firstindex(A)))
f (generic function with 1 method)

julia> f(A::AbstractArray, firstindex::Val{1}) = println(A[1:length(A)])
f (generic function with 2 methods)

julia> f(A::AbstractArray, firstindex::Val{T}) where T = error("Indexing for array does not start at 1")
f (generic function with 3 methods)

julia> f(A::AbstractArray, firstindex::Val{0}) = println("So you like 0-based indexing?")
f (generic function with 4 methods)

julia> f([1,2,3,4])
[1, 2, 3, 4]

julia> using OffsetArrays

julia> f(OffsetArray(1:10, 1))
ERROR: Indexing for array does not start at 1
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:33
 [2] f(A::OffsetVector{Int64, UnitRange{Int64}}, #unused#::Val{2})
   @ Main .\REPL[5]:1
 [3] f(A::OffsetVector{Int64, UnitRange{Int64}})
   @ Main .\REPL[3]:1
 [4] top-level scope
   @ REPL[9]:1

julia> f(OffsetArray(1:10, -1))
So you like 0-based indexing?
```


The starting offset is encoded in the type domain, btw, and accessible with the `firstindex` function.

But you will still want to calculate indices at runtime, and then out-of-bounds errors will have to be caught at runtime anyway.
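For indices computed at runtime, Base's `checkbounds(Bool, A, i)` reports whether `i` is valid for `A`'s own axes, whatever they start at. A minimal sketch: `element_at` is a hypothetical helper name, while `checkbounds` and `get` are Base functions.

```julia
# Hypothetical helper: A[i] if i is a valid index for A, otherwise `nothing`.
# checkbounds(Bool, A, i) consults axes(A), so it is correct for offset arrays too.
element_at(A::AbstractArray, i::Integer) = checkbounds(Bool, A, i) ? A[i] : nothing

A = [10, 20, 30]
i = firstindex(A) + length(A) ÷ 2   # index computed at runtime: 2
element_at(A, i)   # 20
element_at(A, 0)   # nothing
get(A, 4, -1)      # -1: Base's get performs the same check and returns a default
# checkbounds(A, 4) (without Bool) would throw a BoundsError instead.
```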


That means disallowing indexing with integers, I presume? Since an integer can take the values 0 or 1 equally. And what about the other end of the array. Must every index be restricted by type to be located in the acceptable range?


it is only fixed with static typing and no generics (i.e. C/Fortran). If you have a generic array supertype, a statically typed language would let you write exactly the same bug.


If one has a different type or trait for unusual and normal range indices, then the signature for the procedure that assumes indexing from 1 can be written to disallow other starting indices.


Julia allows you to specify the type of a datum if you feel the need (not unlike Common Lisp). Are any of the bugs the author mentioned related to the type system?


I'm surprised at this critique, as I thought Julia's type system was often considered to be one of its strongest features.


So, I really respect what you've done (for those who don't know, Chris is the original developer and lead of DifferentialEquations.jl) and use your work heavily. However, understanding and writing idiomatic Julia, especially with these large packages, is severely hampered by the documentation culture.

A prior comment I made, all of which seems unaddressed to me three years later: https://news.ycombinator.com/item?id=20589167

To be fair, I've only submitted a small documentation patch for a package and haven't significantly "put my money where my mouth is" on this topic. But I hope the next time there are thoughts among the core team about what is the next capability to add to the language, addressing this deficiency is prioritized.


FWIW, I posted the other month that I'm looking for any devs who can help with building a multi-package documentation for SciML, since I don't think the "separate docs for all packages" ends up helpful when the usage is intertwined. SciML is looking for anyone looking to help out there (and there's a tiny bit of funding, though "open source sized" funding). In the meantime, we're having a big push for more comprehensive docstrings, and will be planning a Cambridge area hackathon around this (follow https://www.meetup.com/julia-cajun/ for anyone who is curious about joining in).

As for high-level changes, there are a few not-too-difficult things that I think can be done: https://github.com/JuliaLang/julia/issues/36517 and https://github.com/JuliaLang/julia/issues/45086 are two I feel strongly about. I think limiting the type information and decreasing the stack size with earlier error checking on broadcast would make a lot of error messages a lot more sane.


FMA can't be broken on Windows because FMA is implemented in hardware by Intel. What's broken is the compiler that Julia uses on Windows.


When FMA isn't in the hardware (due to using some chip where it doesn't exist) it has a fallback to a software-based emulation. That is incorrectly implemented in Windows. Julia ends up calling that in this case because that's what LLVM ends up calling, and so any LLVM-based language will see this issue.


Even when FMA is implemented in hardware, LLVM will generally use the software version when the arguments are known at compile time.


FMA is only implemented in hardware on Haswell and later uArches. If you’re running on (or compiling for) IVB or earlier, you’ll get a libcall instead, and MSVC’s has been broken since forever.


Is this actually broken in MSVC, or is it broken because Julia is using mingw and linking to an ancient version of libc on windows (which is intentionally left as-is for back-compat)?

(I genuinely don't know, but the linked issue mentioned mingw specifically)


It's broken in MSVC and mingw (in different ways). See https://github.com/MicrosoftDocs/cpp-docs/pull/3526.


Thanks for the link!


the problem is that LLVM will happily miscompile fma instructions by turning them into incorrect constants due to windows having a broken libm. This is a bug in C/C++, and I'm currently unaware of a language that has fma and a good compiler which gives correct fma results on Windows.


CPUs have supported these instructions for 9 years now. If you ignore those older CPUs, most languages and compilers usually do a good job. Example in C which does not depend on any library functions:

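    #include <immintrin.h>

    // Uses the FMA3 instruction directly: needs an FMA-capable CPU
    // and, with GCC/Clang, compiling with -mfma.
    double fma( double a, double b, double c )
    {
        __m128d av = _mm_set_sd( a );
        __m128d bv = _mm_set_sd( b );
        __m128d cv = _mm_set_sd( c );
        return _mm_cvtsd_f64( _mm_fmadd_sd( av, bv, cv ) );
    }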


Two problems: Julia supports CPUs without FMA, and on Windows, LLVM will use libc to constant-fold the value of fma even on computers that have FMA in hardware.
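One quick way to see whether a platform's `fma` is correctly rounded is a case where the exact answer is known. This is a sketch only: it checks one case and cannot tell you whether a wrong result comes from the hardware, libm, or constant folding (with literal arguments like these, the compiler may fold the call at compile time).

```julia
# The double nearest 0.1 is 3602879701896397 / 2^55, so the exact product
# 0.1 * 10.0 is 1 + 2^-54. A correctly rounded fma keeps that 2^-54 tail;
# the unfused expression rounds the product to 1.0 before subtracting.
fused   = fma(0.1, 10.0, -1.0)   # expected: 2^-54, about 5.55e-17
unfused = 0.1 * 10.0 - 1.0       # 0.0

fma_looks_correct = fused == ldexp(1.0, -54)
```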


Hardware requirements are up to product management. For instance, many modern videogames (which generally want compatibility because it directly translates to sales) no longer support 32-bit or pre-AVX1 processors. Technically, Julia could drop support for pre-FMA3 processors if it helps it move forward.

It’s inevitable anyway due to the changing hardware requirements of the OS, the only question is “when”. I don’t think Windows 10 21H2 supports any CPU which doesn’t have SSE 4.1, it’s only a matter of time when Windows will require support of newer instruction sets.

About LLVM, can’t they compile that thing with an option like -mfma to use hardware FMA3 for constant folding?


I am a long-time member of the Julia community and had a discussion with the author about these issues a long time ago – but did not give feedback on the post. Let me first state that Yuri is a great person and was a valuable member of the community. He pushed the boundaries of the language and produced some very nice packages in his time. His concerns are genuine and should be respected and discussed in that context.

Also, let me say that encountering these kinds of bugs is not something I have had experience with. But, I tend to be very conservative with my usage of libraries and fancy composition.

If I had more experience with programming language theory and implementation, perhaps I would have a better name to describe the source of the issues described. My attempt is to call it “type anarchy”. The way I see it, there is not a clear way to assign responsibility for correctness. In the case of the array used in the post, is it the fault of the implementer of the `sum` function (without a type signature, as it should be) or the implementer of the data structure? I am honestly not sure. But as Julia breaks new ground with its type system and multiple dispatch, this could very much be an open question.


I've spent a lot of time developing large computational codebases in Julia, and I think the most insidious of these issues is a product of no formal way of enforcing interfaces. Using one of the common packages to build a trait system and add some sort of guarantee that all the right methods are implemented for a given trait simplifies maintenance dramatically.

This doesn't catch mathematical bugs, but those crop up everywhere. Instead, having the interfaces specified so that you can trust your implementation is crucial, and being able to know when an implementation is invalidated is invaluable.

I've had a few awful bugs involving some of the larger projects in this language, but a proper interface/trait system would simplify things exponentially. There are some coding style things that need to be changed to address this, like using `eachindex` instead of `1:length(A)` for array iteration as the example in the article points out. However, these should be one-off lessons to learn, and a good code linter should be able to catch potential errors like this.

Between a good code linter (or some static analysis, I'm pulling for JET.jl) and a formal interface spec, I really think most of Julia's development-side issues could be quelled.
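Even without a package, the "Holy trait" pattern that Base itself uses is a reasonable starting point for the trait-based approach described above. A minimal sketch that dispatches on the real `Base.IteratorSize` trait; `prealloc_collect` and `_collect_into` are hypothetical names.

```julia
# Dispatch on a trait value rather than on the iterator's concrete type, so any
# new iterator that declares Base.IteratorSize gets the matching code path.
prealloc_collect(itr) = _collect_into(Base.IteratorSize(typeof(itr)), itr)

# Length known up front: allocate once, then fill.
function _collect_into(::Union{Base.HasLength, Base.HasShape}, itr)
    out = Vector{eltype(itr)}(undef, length(itr))
    for (k, x) in enumerate(itr)
        out[k] = x
    end
    return out
end

# Length unknown: grow the output as elements arrive.
function _collect_into(::Base.SizeUnknown, itr)
    out = eltype(itr)[]
    for x in itr
        push!(out, x)
    end
    return out
end

# Infinite iterators are rejected explicitly instead of looping forever.
_collect_into(::Base.IsInfinite, itr) =
    throw(ArgumentError("cannot collect an infinite iterator"))

prealloc_collect(1:5)                              # [1, 2, 3, 4, 5] via HasShape
prealloc_collect(Iterators.filter(iseven, 1:10))   # [2, 4, 6, 8, 10] via SizeUnknown
```

Note that `Base.IteratorSize` defaults to `HasLength()`, so an iterator type that never declares the trait is silently assumed to implement `length`, which is exactly the kind of implicit assumption this thread is about.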


I agree with the kernel of your point here, but also with the author of the article when he says "But systemic problems like this can rarely be solved from the bottom up, and my sense is that the project leadership does not agree that there is a serious correctness problem. They accept the existence of individual isolated issues, but not the pattern that those issues imply."

My impression is that the Julia core devs are more focused on functionality and being able to construct new, more powerful, faster capabilities than on reflecting on how the foundations could or should be made more rigorous. For this, I think the devs have to philosophically agree that soundness in the large should be a first-tier guiding principle, and that the language should have mechanisms whereby correctness-by-construction can be encouraged, if not enforced. Presently, notions of soundness seem to only be considered in the small, such as the behavior of specific floating point ops. Basically, I don't think the core devs are as concerned with soundness, rigor, and consistency as they are with being able to build more impressive capabilities.

I don't want this to sound like I'm ungrateful for the awesomeness that Julia and its ecosystem does bring to the table. For numerical computing, I don't see any alternatives whose tradeoffs are more favorable. But it is disappointing that it doesn't seem to learn the lessons about rigorous language design and the language-level implications for engineering vs. craftsmanship appropriate for a twenty-first century language.


Sounds like Julia needs a Snow Leopard/Mountain Lion/High Sierra release - no new features, just cleaning things up...


Could some of the need for interfaces be addressed by providing an extensive test battery for types of object? It seems like if something claims to be an implementation of a floating point number it should be possible to smash that type into every error ever found to uncover implementation errors.
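A rough sketch of what such a battery could look like for floating-point types. `float_battery` and its particular checks are hypothetical illustrations, not an existing test suite; a real battery would accumulate the edge cases from past bug reports.

```julia
# Hypothetical battery: known floating-point edge cases that any AbstractFloat
# implementation is expected to handle. Returns descriptions of failed checks;
# an empty vector means every check passed.
function float_battery(::Type{T}) where {T<:AbstractFloat}
    nan, inf, z, o = T(NaN), T(Inf), zero(T), one(T)
    failures = String[]
    isnan(nan)              || push!(failures, "T(NaN) is not NaN")
    nan != nan              || push!(failures, "NaN compares equal to itself with ==")
    isequal(nan, nan)       || push!(failures, "isequal does not treat NaN as equal to itself")
    (isinf(inf) && inf > o) || push!(failures, "T(Inf) is not positive infinity")
    -z == z                 || push!(failures, "-0 and +0 compare unequal")
    signbit(-z)             || push!(failures, "negating zero does not set the sign bit")
    o + eps(T) > o          || push!(failures, "one(T) + eps(T) does not exceed one(T)")
    return failures
end

float_battery(Float64)   # empty when every check passes
```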


It's possible to hack interface verification into place at test time, but that has a few problems:

1. Running the whole testing framework to determine if you implemented an interface is a high overhead when you're developing

2. You have a lot of tests to write to really check every error. Perhaps a package which defines an interface could provide a tester for this purpose

3. Interfaces should be attached to the types, and that should be sufficient for verifying the interface

I would settle for something like checking for the implementation of methods a la BinaryTraits.jl over what we have now, which is nothing. A huge step would be documentation and automated testing that proper interface methods are implemented, not even verifying if they're "correct". This drastically reduces the surface area you need to write and check to confirm compatibility with outside code.

This simple interface specification does produce design issues of its own, but correctness is much easier to handle if you know what needs to be correct in the first place.
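A check that the methods exist at all (the weaker check described above, not one that verifies correctness) can be hand-rolled with Base's `hasmethod`. A minimal sketch: the `area`/`perimeter` "shape interface", the `Square` and `Blob` types, and `missing_methods` are all hypothetical, and this is not BinaryTraits.jl's API.

```julia
# Hypothetical "shape" interface: generic functions declared with no methods.
function area end
function perimeter end
const SHAPE_INTERFACE = (area, perimeter)   # each takes the shape as its only argument

struct Square
    side::Float64
end
area(s::Square) = s.side^2
perimeter(s::Square) = 4 * s.side

struct Blob end
area(::Blob) = 1.0   # perimeter deliberately left unimplemented

# Names of interface functions that have no method applicable to a single T argument.
missing_methods(::Type{T}, interface) where {T} =
    [nameof(f) for f in interface if !hasmethod(f, Tuple{T})]

missing_methods(Square, SHAPE_INTERFACE)   # empty
missing_methods(Blob, SHAPE_INTERFACE)     # [:perimeter]
```

`hasmethod` only reports that *some* method applies, so a catch-all fallback (such as Base's generic `getindex(::AbstractArray, ...)`) makes the check pass even when the specific method is missing.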


Yes, although that seems like the easy half of this, making sure `struct NewNum <: AbstractFloat` defines everything. There aren't yet tools for this but they are easy to imagine. And missing methods do give errors.

The hard half seems to be correctness of functions which accept quite generic objects. For example writing `f(x::Number)` in order to allow units, means you also allow quaternions, but many functions doing that will incorrectly assume numbers commute. (And not caring is, for 99% of these, the intention. But it's not encoded anywhere.) Less obviously, we can differentiate many things by passing dual numbers through `f(x::Real)`, but this tends to find edge cases nobody thought of. Right now if your algorithm branches on `if det(X) == 0` (or say a check that X is upper triangular) then it will sometimes give wrong answers. This one should be fixed soon, but I am sure there are other subtleties.
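A minimal illustration of the commutativity assumption. `Quat` is a hypothetical bare-bones quaternion type defined here (not a real package), and `expand_sq` is a hypothetical generic function that accepts any `Number` but silently relies on `x*y == y*x`.

```julia
# Hypothetical bare-bones quaternion w + x*i + y*j + z*k (not a real package).
struct Quat <: Number
    w::Float64
    x::Float64
    y::Float64
    z::Float64
end
Base.:+(a::Quat, b::Quat) = Quat(a.w + b.w, a.x + b.x, a.y + b.y, a.z + b.z)
# Hamilton product: a*b != b*a in general.
Base.:*(a::Quat, b::Quat) = Quat(
    a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
    a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
    a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
    a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w)
Base.:*(s::Real, q::Quat) = Quat(s*q.w, s*q.x, s*q.y, s*q.z)
Base.:(==)(a::Quat, b::Quat) = a.w == b.w && a.x == b.x && a.y == b.y && a.z == b.z

# Accepts any Number, but (x + y)^2 = x^2 + xy + yx + y^2 only equals
# x^2 + 2xy + y^2 when xy == yx. Nothing in the signature says so.
expand_sq(x::Number, y::Number) = x*x + 2*x*y + y*y

i, j = Quat(0, 1, 0, 0), Quat(0, 0, 1, 0)
expand_sq(3.0, 4.0) == (3.0 + 4.0)^2   # true
expand_sq(i, j) == (i + j) * (i + j)   # false: i*j == k but j*i == -k
```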


It might be useful to separate the issues that are "just" bugs from the problems that come with Julia's unusual level of composability. I have no idea if Julia has more bog-standard, local bugs – things like data structure problems or compiler faults – than other languages of comparable maturity and resources, but clearly the OP has bumped into several, which is frustrating.

The composition bugs – as in offsetarrays or AD – are a bit of a special case. In most languages package A will only work with package B if it's specifically designed to, and the combination will be explicitly developed and tested. That A and B can work together by default in Julia is really cool, but it also means that as you add new types and packages, you have a quadratically growing set of untested edges.

The canonical solution is strict interfaces. But Julia is laissez faire about those too (with some good reasons). Together this means that if A doesn't work with B as expected, it's not always easy even to assign fault, and both might be reluctant to effectively special-case the other. Program transformations (autodiff) compound this problem, because the default is that you promise to support the universe, and it's not easy to opt out of the weird cases.

I think it's absolutely right to celebrate Julia's approach to composition. I also hope new research (in Julia or elsewhere) will help us figure out how to tame it a bit.


> That A and B can work together by default in Julia is really cool, but it also means that as you add new types and packages, you have a quadratically growing set of untested edges.

But as the author's example showed, they clearly can't work together; they just fail at runtime instead of at compile time.

Other languages have generics and interfaces to make stuff like this dynamically exchangeable. Sure, your code needs to be designed to support this, but it also means that the author explicitly thought about what they expect from their data structures. If they don't, you might suddenly find yourself violating implicit assumptions like arrays starting at 1.


Is there any tutorial/blog post on what makes Julia's composability special compared to other languages? Is it related to multiple dispatch or delegation?


Julia has a very nice type system, the nicest of any dynamically typed language I am familiar with. This has something to do with multiple dispatch, but it's more to do with trying to have a type system that allows the JIT to unbox all the things that have to be unboxed for high performance without sacrificing the freedom of dynamic typing.

IIUC, Common Lisp is the giant on whose shoulders Julia built in this respect.


Yes. It's a side effect of multiple dispatch being the core paradigm of the language. See Stefan Karpinski's talk about it: https://www.youtube.com/watch?v=kc9HwsxE1OY


The title of Stefan's talk is great: The Unreasonable Effectiveness of Multiple Dispatch. He gives a nice example of composability: how you can throw a new type into an existing algorithm and it just works.
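A minimal sketch of "throw a new type into an existing algorithm": a generic `horner` polynomial evaluator written without any particular number type in mind, reused unchanged with `Rational` and with a hypothetical bare-bones dual-number type to get a derivative. Both `horner` and `Dual` are illustrative, not from any package; real code would use a forward-mode AD package.

```julia
# Generic Horner evaluation of c[1] + c[2]*x + c[3]*x^2 + ...
# Nothing here mentions a concrete number type.
horner(coeffs, x) = foldr((c, acc) -> c + x * acc, coeffs; init = zero(x))

# Hypothetical minimal dual number val + der*ε with ε^2 = 0.
struct Dual <: Number
    val::Float64
    der::Float64
end
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.val * b.der + a.der * b.val)
# Let ordinary reals mix with Dual through Julia's promotion machinery.
Base.convert(::Type{Dual}, x::Real) = Dual(x, 0.0)
Base.promote_rule(::Type{Dual}, ::Type{<:Real}) = Dual

derivative(f, x::Real) = f(Dual(x, 1.0)).der

p(x) = horner([1, 2, 3], x)   # 1 + 2x + 3x^2
p(2.0)               # 17.0
p(1//2)              # 11//4
derivative(p, 2.0)   # 14.0, i.e. 2 + 6x at x = 2
```

The flip side, as discussed elsewhere in this thread, is that `horner` never promised to work with `Dual`; it just happens to, and nothing checks that it keeps doing so.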


The "Unreasonable Effectiveness of Multiple Dispatch" talk is a good example of how multiple dispatch is special in a good way, in that everything (should) work together as new types and functions are added to the ecosystem. However, this also means the scope of potential integration bugs encompasses the entire ecosystem. The Julia manual has a small section about special composability pitfalls arising from multiple dispatch: https://docs.julialang.org/en/v1/manual/methods/#man-method-...

As best as I can summarize it: Multiple dispatch is supposed to dispatch a function call to the implementation with the most "specific" call signature. This means that you must design your functions with an eye to what everyone else has implemented or might implement so whatever function gets called does the "right" thing, and also that your implementation doesn't block someone else from writing their own implementation specialized to other types. This requires some coordination across packages, as shown in one of the manual's examples.

The rules defining type specificity (subtyping) are complicated, and I think not in the manual. They have been inferred by observation: http://janvitek.org/pubs/oopsla18a.pdf. To quote from that paper, "In many systems answering the question whether t1 <: t2 is an easy part of the development. It was certainly not our expectation, approaching Julia, that reverse engineering and formalizing the subtype relation would prove to be the challenge on which we would spend our time and energy. As we kept uncovering layers of complexity, the question whether all of this was warranted kept us looking for ways to simplify the subtype relation. We did not find any major feature that could be dropped." Julia's multiple dispatch allows a high degree of composability, but this does create new complexity and new problems.
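A few subtype queries give a taste of why the relation is not trivial: parametric types are invariant, tuple types are covariant, and a type variable that appears more than once in a tuple follows the "diagonal" rule. These are standard examples of documented behavior, not taken from the cited paper.

```julia
# Parametric types are invariant: a Vector{Int} is not a Vector{Real} ...
Vector{Int} <: AbstractVector{Int}         # true
Vector{Int} <: Vector{Real}                # false
# ... but it is a Vector of *some* subtype of Real.
Vector{Int} <: Vector{<:Real}              # true
# Tuple types, by contrast, are covariant.
Tuple{Int, Float64} <: Tuple{Real, Real}   # true
# Diagonal rule: T used twice must be one concrete type,
# so Int and Float64 cannot both stand in for it.
Tuple{Int, Int}     <: (Tuple{T, T} where T)   # true
Tuple{Int, Float64} <: (Tuple{T, T} where T)   # false
```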


Most of these seem to be about packages in the ecosystem (after clicking through all the links, I found that almost all of them got fixed in a very timely manner, sometimes already in a newer version of the package than the author was using), not about the language itself. Other than that, the message of this seems to be "newer software has bugs", which yes is a thing..?

For example, the majority of issues referenced are specific to a single package, StatsBase.jl - which apparently was written before OffsetArrays.jl was a thing and thus is known to be incompatible:

> Yes, lots of JuliaStats packages have been written before offset axes existed. Feel free to make a PR adding checks.

https://github.com/JuliaStats/StatsBase.jl/issues/646#issuec...

EDIT: Since this comment seems to gain some traction - title is editorialized, original is "Why I no longer recommend Julia".


"known to be incompatible"

Known to whom? People who regularly participate in the Julia forum/chat? Julia's composability relies on people agreeing on unwritten rules and standards.

In other languages, such incompatibilities are caught by the compiler. Even in other dynamic languages like Python or Javascript, it is now considered best practice by many to annotate types whenever you can. Like Julia, Haskell is also composable. Unlike Julia, it does not need to sacrifice correctness.


Agreed, one cannot just expect this to be known.

Do type annotations in Python actually catch type errors? I thought they were mainly for documentation.


Absolutely yes, but you have to use a typechecker like mypy (and generally make it part of your release builds). I've found typechecking my Python code makes my development iterations much faster than writing tests. My biggest issue is that if you are using a legacy codebase or 3P library without type annotations, then the "Any" type becomes pervasive and removes much of the value you get from type annotations. You can run mypy in a mode that flags when this is happening, but it's not like you're going to go type annotate the world just to push your code change.


Yes, if you use tooling (mypy). It definitely helped me a few times.


I'm not sure what to make of this. Yuri is great and I'll certainly miss having him in the Julia community. Yes, of course there are bugs. We work on fixing them all the time. If there are just too many for you, or we are too slow at fixing them for you, then OK I understand you might walk away.

With these kinds of posts (and the reactions to them) lots of issues tend to get conflated. For example there are issues with OffsetArrays because some people write code assuming indexes start at 1. Starting at 0 wouldn't fix that. A static type system wouldn't fix that; most static type systems don't check array bounds. Are we supposed to un-register the OffsetArrays package? Should we disallow overloading indexing? Personally I have told people not to use `@inbounds` many times. We could remove it, but those who want the last drop of performance would not be too happy. The only path I see is to fix the bugs.

> They accept the existence of individual isolated issues, but not the pattern that those issues imply.

I admit, I do not see the pattern allegedly formed by these issues. Of course, static types do remove a whole category of issues, but "switch to static types" is not really a practical request. There are other things you can do, like testing, but we do a LOT of testing. I really do not mean to downplay Yuri's experience here, I am just not sure what to take away other than that we should work even harder on bugs and quality.


I've worked on large engineering projects in physical disciplines. When I am the customer, I often bring in a group of independent experts to review the design products. Often these experts provide inputs that are not 100% usable in the form they're provided. One may have to disentangle their conflation of related-but-not-the-same issues, or ignore the specific solutions they propose, etc.

That being said, I have learned the hard way not to ignore or trivialize these review inputs, even if they are not immediately actionable as-provided. Users and reviewers are really good at figuring out weak areas or flaws even if they can't articulate the solutions, fully disentangle related issues, or do all the generalization or abstraction that would make those issues easier to address. There is usually some truth underlying the negative feedback.

The article looks to potentially be an example of an expert review in the above vein. If you are able to take a step back, you might find the HN discussion on this submission to provide further inputs to help figure out how any of this should be channeled into language, practice, and ecosystem improvements. Certainly there is more to take away here than only "that we should work even harder on bugs and quality."


If you look at the history of lots of packages in MATLAB, they fixed tons of bugs that sound similar to this stuff over the years. It requires consistent hard work by a core group of people who understand the issues to get everything right. I have no idea who maintains Julia and these packages, but the author of the article describes these as language problems. Aren't these just bugs? If gcc were incorrectly multiplying some constant by the wrong value, that wouldn't be a bug in C but a bug in gcc, right?


The author's point seems to be something like: not only are there these bugs, but there are a lot of them that people run into regularly, and the project isn't headed in a direction where the situation will improve (even if these are fixed, by the time that happens there will be even more).

Hard to prove or disprove.


Julia has more than 18k closed issues on its GitHub. No wonder such an active user encountered a lot of them. It's not a problem with the language, though. Yes, it allows using OffsetArrays and @inbounds together, but C can read out-of-memory locations too, so what?

Edit: Julia is better than C in this regard, since the usage of @inbounds is explicit, i.e. everyone can see that the code is potentially unsafe.


I think the point he was trying to make was that the example for @inbounds from the official documentation could cause out-of-bounds accesses, while it was clearly stated that you should only use @inbounds if you are sure that no out-of-bounds accesses are possible.


The issue is that there is no way to verify if OOB access is possible given an abstract type, unless you know how that type behaves, i.e. how it's indexed.

And Julia provides no way of specifying the behaviour of abstract types.


> And Julia provides no way of specifying the behaviour of abstract types.

I'm not sure if "no way" is accurate. There are interfaces and one could use traits. As cited above, we could use `eachindex` or `CartesianIndices` to get a list of the valid indices. The problem is enforcing and testing these interfaces.
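One way to make an informal interface checkable is to pair a declared protocol with a reusable conformance test that any implementing package can run in its own test suite. This is a minimal Python sketch with hypothetical names (`Indexable`, `check_indexable`, `ZeroBased`). Note that `runtime_checkable` only verifies that the methods exist, not what they do, which is why the behavioral checks are needed.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class Indexable(Protocol):
    """Hypothetical interface: a container that reports its own valid indices."""

    def __len__(self) -> int: ...
    def __getitem__(self, i: int) -> object: ...
    def indices(self) -> Iterable[int]: ...


def check_indexable(xs: object) -> None:
    """Behavioral conformance test for the hypothetical Indexable contract."""
    # isinstance on a runtime_checkable Protocol only checks that the methods exist.
    assert isinstance(xs, Indexable), "missing required methods"
    idx = list(xs.indices())
    assert len(idx) == len(xs), "indices() must yield exactly len(xs) indices"
    assert len(set(idx)) == len(idx), "indices() must not repeat"
    for i in idx:
        xs[i]  # every advertised index must be readable without raising


class ZeroBased:
    """Example implementation that follows the contract."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.items[i]

    def indices(self):
        return range(len(self.items))


check_indexable(ZeroBased([1, 2, 3]))  # passes silently
```

An implementation whose `indices()` disagrees with its `__getitem__` fails this check in its own CI, instead of failing later inside someone else's generic code.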


There are no interfaces in Julia. There is simply documentation, and the hope that people will read, understand, and follow it.


> but C can read out-of-memory locations too, so what?

So it's widely considered a plague upon the field, suffered because of the lack of alternative?


> but C can read out-of-memory locations too, so what?

Simply decades of exploitable security issues.


The Julia example is closer to Rust's `unsafe`. Pretty much every language lets you skip bounds checks; in Julia (like other modern languages) it is opt-in. The author was complaining about a library that decided to skip the bounds check in a clumsy way (there happens to be a "correct" way to skip the bounds check). It is not really about the language.


Then it makes sense. Thanks for the clarification. Was worried that skipped bounds checks was something more intricate than simply explicitly annotating a statement to say "trust me, I know what I'm doing!".


A more appropriate title for the OP would have been:

"A new language that makes it easy to write and use generic algorithms on a growing number of custom types developed by others is bound to experience growing pains as difficult-to-foresee correctness bugs have to be discovered and fixed over time."

In my humble opinion, this kind of universal composability, which Julia makes easy via multiple dispatch and naming conventions, is the underlying root cause of all the correctness bugs that have surfaced as the language has evolved. But the bugs are being fixed, one at a time, and ultimately the result should be both beautiful and powerful. We will all be thankful for it!


The most tragic thing here to me is that we're losing Yuri — who has been an invaluable contributor and bug-reporter for issues like these — and that Yuri got burned out instead of feeling empowered.


Yeah, good point. Sometimes I wonder if the fact that so many of the folks developing and using Julia are both highly educated (e.g., in math) and insanely smart (evidently) is a barrier to mass adoption. That is, I wonder if the broader mass of developers out there -- many of whom are less knowledgeable -- find it difficult to benefit from and contribute to the Julia ecosystem.


I am not sure I know of any statically typed languages with generics that experienced the same kind of problems on multiple occasions. The only one I am aware of is C# and its array covariance, which is kept for compatibility purposes.


Viral frequents HN, so I will be curious to see if he engages with this directly in a productive manner.

There are many great qualities of Julia, and I've wanted to love it and use it in production. However, coming from the tooling and correctness of Rust leaves me thinking something is just missing in Julia. One of the links in the post references "cowboy" culture. While I don't think this is the correct nomenclature, there is a sense with looking at the package ecosystem and even Julia itself that makes me think of the pressure in academia to publish constantly. I'm not sure what to make of that, and it's simply a feeling.


I think Keno's comment above pretty much articulates my thoughts as well. I have met Yuri on several occasions and have been thrilled to see his contributions. I find the post constructive and it will certainly help make Julia better, and hope Yuri will be back at a later date.

Some of the issues linked are JuliaStats issues, and there's a lot happening to improve it, which should become more visible over the next few months. Example: https://discourse.julialang.org/t/pushing-julia-statistics-d...

Julia really pushes on language and compiler design in ways many statically typed languages do not. There is real work to be done at the frontiers, and also investment in tooling built on top of that. It is all happening. The package ecosystem takes time to mature. Julia has a deliberate release process, and the key packages have adopted a more deliberate release process, but stuff out in the long tail naturally tends to move fast, as it should.


I've been a user of Julia for some time (at least since beta versions). I love the language and feel like the author of the blog post is maybe exaggerating or generalizing a bit too much. On the other hand, based on my personal experiences with Julia, I can definitely empathize and feel like there's a lot about the blog post that rings true.

I share your sense that "something is just missing in Julia," but I maybe disagree with the author in that I see it as potentially fixable, not hopeless.

Julia has grown tremendously in a short period of time: the language, its implementation, and the size of the community. So in that sense I see it as inevitable that there's going to be a lot of bugs and chaos for a bit.

On the other hand, I've always felt a bit of unease that a numerical language was being developed from the ground up as such, rather than as an offshoot of a more general-purpose language. It's not that I think there's something inherently wrong with it, but I do think that a greater variety of perspectives looking at it is more likely to catch things early.

I don't think in this regard it's a function of academia -- although it certainly could be -- it's more a function of having a very narrow community looking at the language. Regardless of how smart they all are, I think having a broader range of perspectives might catch things earlier.

In this regard, I might have preferred the Julia fervor and effort be put into some numerical Nim libraries, or a numerical "abstracted subset of Rust" or something. It's not so much I dislike Julia as much as it is I'd feel safer with a more generalist perspective on basic language design.

But who knows. To me it's a bit ironic the author focuses on Python as an alternative, because it's not like that is free from problems, and Python has been around for a lot longer. They might be different problems, but they're not absent. Python is a bit ironic too in that it has been sort of kludged together over time into what it is today, for better or worse. I guess it feels to me like all the major numerical programming platforms have this kind of kludgy feeling in different ways; Julia feels/felt a bit like an opportunity for a clean break, if nothing else.


I don't think there is anything "numerical" about the core language design of julia; it is just a general generic-function-based OO language. In fact I think we made many decisions in line with trends in the broader language world, e.g. emphasizing immutable objects, having no concrete inheritance, using tasks and channels for concurrency, and deliberately avoiding "matlab-y" features like implicit array resizing. Of course many in the "general purpose" crowd don't like 1-based indexing, but surely that is not the source of all of our problems :)


Fair enough, I might have to eat my words a bit. Julia does have a lot to offer in terms of language characteristics, that's true, and I think that's part of the appeal. It has been a breath of fresh air, and feels well thought out from basic principles. But along the lines of the original linked article, it's maybe worth thinking about why Julia hasn't seen more widespread adoption in, say, web servers or systems programming. I don't mean that as a criticism, just that I do think it's been marketed (or received) as a numerical computing language, and that's the community that it's primarily developed around for one reason or another, with its concomitant specific blind spots (as do all language communities).

For what it's worth, I prefer 1-based indexing.

My guess is a lot of what's in the post is probably tied to growing pains and maybe butterfly effects of novel language features on bigger-picture patterns. It would be interesting to see where things were at at a similar stage in other languages.


I've been a part of many language communities, and the Julia team is the very best in terms of the professionalism of the language and the key modules.

Maybe the best response to this is to view it as a call to action for us Julia fanboys/girls to stop cheering and fix some bugs ;-).


I've had a couple of conversations on Twitter with Viral B Shah (co-creator of Julia) which I found unprofessional, so I stopped learning Julia. Unless he was just having a very bad day, in my opinion he takes minor criticism of Julia badly (although others might disagree).

Edit, here is one thread I could find quickly: <EDIT2: edited out link which most people seem to think is actually fine, just people getting slightly annoyed on Twitter. I deleted the link as people were going and interacting with people in the old thread>

The comments aren't particularly bad, but they do feel to me like making a bad faith interpretation of someone's comment, then digging in. I don't feel that's a good way to talk to users, and ethos comes from the top.


I don't see anything problematic in what Viral said here; I think it would be fair to say your initial take ("Julia has been the future of machine learning for 10 years and will stay as the future of machine learning for the next 10 years") is likely to be perceived as at least somewhat inflammatory, a defensive response is natural enough in that context.


Yeah, I fully expected based on the description of the twitter interaction to see something really terrible, and from actually looking at it, it seems pretty mild. If anything, it seems like they went out of their way to try to bait the Julia creator and he had a fairly reasonable response to it. I'm not sure what could be considered "inflammatory" about any that.


What part of the conversation justifies "If you truly believe that nobody will ever adopt anything new, we would all have been programming in Fortran or assembly!"? To me that is a stupid escalation: no one was suggesting not to do new things, and Python (the discussed AI alternative) is of course newer than Fortran and assembly for a start!

That just seemed like a bizarre overreaction to me.


With the greatest respect, nothing about his comment is inflammatory in the least, and I say this as someone who is avowedly skeptical about the ability of the Julia creators to accept criticism.


It's nice to hear an independent viewpoint. To me it was "oh, so a randomo on Twitter is coming in randomly and looking angry. Oh, it's not a randomo, it's the co-creator of Julia!".


That thread is just rife with bad communication across the board. It's pretty clear that none of you understand what the others are saying, but you are all very willing to infer.

Maybe try not communicating on twitter.


Do you have an example? I'd like to know more about this - it must have been quite egregious if it makes you stop learning a language.


I posted one in an edit above. It isn't that bad, but to be honest, nowadays I believe the community of a language is as important as, if not more important than, the language itself. I don't want to get into a community whose leaders just start jumping on random minor Twitter users.


That guy is a famous Kaggler that works for Nvidia, not a minor Twitter user


Honestly, it's interesting you say that, I went and looked at his follower count and see what you mean. I knew him back before any of us had Twitter :)


I wonder how much of this is just that Julia is more composable than most people are used to, and the community hasn't yet developed the patterns and culture that are needed to avoid these kinds of problems.

I'm thinking, for example, of the way that Smalltalkers often create parameters with type-evocative names, such as "aString". Or Objective-C with two-letter prefixes to work around lack of namespaces. Or even the Java "EntityAdaptorFactoryFactory" design aesthetic. (Some of you will shudder, and I'm with you, but it did solve real problems that the Java world was facing.)

Julia is still a pretty young language, and it's probably only recently that the ecosystem has gotten big enough to hit these problems.

Edit: come to think of it, one of the issues that the Java folks were dealing with was lack of composability. :-/


As a huge fan of Julia, I have to fully agree. Although I would probably not say "no longer recommend Julia" but rather "give huge caveats when mentioning Julia". Organisations (including those that maintain programming languages) have values; Bryan Cantrill has an excellent talk on this https://youtu.be/2wZ1pCpJUIM and I have to agree with the author that correctness (especially correctness under arbitrary composability) is not a value that Julia teaches and instills in its users. Some Julia users care about this, and some core maintainers do too (as Pkg3 demonstrates). However, there are many invocations (SafeTestSets vs Test) and stumbling blocks. I am aware of no efforts to do formal verification on Julia code. There are no good ways to move certain run-time errors to compile time. Correctness is not a value of the Julia language. Here is the good thing, though: as Bryan demonstrates in his talk, you can hire for values.


In extreme composability, it might be hard to determine where the origin of a bug is. Worse yet, when libraries start adhering and relying on the brokenness of other libraries, fixing the once minor bug isn't enough anymore. How do you address technical debt in such situations?

In my mind, Julia broke new ground in terms of what happens when you create an environment where such composability is possible. The author's finishing thought is apt:

> Ten years ago, Julia was introduced to the world with inspiring and ambitious set of goals. I still believe that they can, one day, be achieved—but not without revisiting and revising the patterns that brought the project to the state it is in today.


Ouch. That sounds all the more damning for the author's studious care to calmly describe instead of angrily rant.

I’ve spent too much time in research working on codebases that feel like quicksand — you never know what changing something might do!— to want to worry about that for stdlib or major package ecosystems, too.


I feel this post is a bit unfair and quite outdated (seems like it's written 9-12 months ago), and I interpret his issue as a prioritization issue, not a language one. If your priorities mandate a more mature ecosystem, you should use one. The Julia ecosystem is much smaller - both in terms of people and development invested, than Python, Java or JavaScript, and still overperforms in many aspects of computing. If those aspects, where Julia is first-of-class, are not your priorities, and your fault tolerance is very low, maybe another tool is better for you.

Also, as with every ecosystem, the Julia ecosystem will naturally see some packages come and go. JSON3 is the third approach to reading JSON (and it's terrific). HTTP.jl is the reference HTTP implementation; Julia hasn't had its `requests.py` moment. Web frameworks have also been immature; Python has had `Django`, `pyramid`, `flask` and so many others before `FastAPI` (along with new language features) came and dominated. Some people need to put effort into attempts that will naturally hit a dead end before we have a super polished and neat FastAPI.jl, and the same goes for everything.

Also, https://github.com/JuliaLang/julia/issues/41096 is referenced with a wrong name that involves the issue's author's misunderstanding, can you update please and, if possible, add a note about the edit?


Only thing "interesting" to me there would be the automatic differentiation bugs ...but is there any argument as to them being the fault of the language, instead of just poor engineering from the library developers' part?

I mean, one can't expect all algorithms to work correctly with all datatypes just because the compiler allows that code to run ...you write tests and guarantee numerical stability for the small subset of types you can actually do it for, and then it's the code's consumers' job to ensure it works with types it's not documented to work with, no? ...Julia is quite a dynamic language, JITed or whatnot, and its semantics are closer to Python and Lisp than to Rust or Haskell ...maybe don't expect guarantees that aren't there and just code more defensively when making libraries others depend on?

Probably the Python + C(++) ecosystems work better because their devs know they are working in loose, dynamic and weakly typed shoot-your-foot-off languages, so they code defensively and test things properly, whereas Julia devs expect the language to give them guarantees that aren't there.


I think the author addresses this. It’s a Catch-22. If you restrict use to a small subset of types you’re undermining one of Julia’s best features.

As someone who has been writing a lot of numerical analysis code recently, I would absolutely love a type system that could describe and enforce numerical stability traits.


Right. It's important to remember that tools like JAX and PyTorch have total control over the numerical libraries they are differentiating, and have freedom to impose whatever semantics, rules and restrictions are convenient (immutability and referential transparency in JAX, for example). Seemingly small decisions in an existing language and library can have a big impact on the feasibility and practicality of AD.


That's exactly where Dex might improve over Julia, with language level control over mutability and effect handlers and array access safety ... time will tell.

So packages can just use those features.

Maybe it will hit the right trade off, or maybe Julia will adopt similar language level tools, but adjusted for dynamic semantics. Is that even possible?


> a type system that could describe and enforce numerical stability traits

Wow, that sounds cool! Have you researched whether anyone has done anything in this area? How would you even start to approach the problem?

Do you think it has any chance of being done without massive sacrifices to performance?


In Rust code, I like using `debug_assert!` to represent numerical expectations/assumptions of the implementation. Later, if I have a problem, I can turn on debug assertions and get a bunch of additional checks, but I can also turn them off and not pay for them all the time.
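Python has a rough analog of this pattern. `assert` statements are compiled out when the interpreter runs with `-O`, which sets `__debug__` to `False`. That lets numerical assumptions be checked during development and skipped in optimized runs. This is a minimal sketch; `mean` is a hypothetical example function, not taken from the thread.

```python
import math


def mean(xs: list[float]) -> float:
    """Mean via math.fsum, with checks that vanish under `python -O`."""
    # Debug-only preconditions: documented assumptions of this implementation.
    assert len(xs) > 0, "mean of empty input is undefined"
    assert all(math.isfinite(x) for x in xs), "inputs must be finite"
    result = math.fsum(xs) / len(xs)
    assert math.isfinite(result), "result overflowed"
    return result


print(mean([1.0, 2.0, 3.0]))  # 2.0
```

Under `python -O` the checks cost nothing. The trade-off is that `mean([])` then surfaces as a bare `ZeroDivisionError` rather than the descriptive assertion message, and non-finite inputs pass through silently.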


So it seems Julia's multiple dispatch (dynamic dispatch for any function based on argument types) has a flaw: namely, if the types used do not match assumptions present in the implementation of the function (e.g. arrays start at 1), the results may be silently incorrect. Julia's multiple dispatch is really cool, but I'm not sure how this issue can be prevented in practice (without a lot of added verbosity). It'd be a pity to have to restrict yourself to a small set of types you know work with the functions you're using, because multiple dispatch is one of Julia's killer features.
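Python's `functools.singledispatch` only dispatches on the first argument, but it is enough to reproduce the failure mode described here. This is a minimal sketch, assuming a hypothetical `OffsetList` container and a hypothetical generic `last` function. The generic method makes an unstated indexing assumption and returns a wrong answer without raising. A more specific method fixes it, but only if someone knows to write one.

```python
from functools import singledispatch


class OffsetList:
    """Hypothetical container whose valid indices run from `first` to `first + len - 1`."""

    def __init__(self, items, first=0):
        self.items, self.first = list(items), first

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        j = i - self.first
        if not 0 <= j < len(self.items):
            raise IndexError(i)
        return self.items[j]


@singledispatch
def last(xs):
    # Generic method: silently assumes indices run 0..len-1.
    return xs[len(xs) - 1]


a = OffsetList([10, 20, 30], first=1)
print(last(a))  # 20: in bounds, so no error, but the wrong element


@last.register
def _(xs: OffsetList):
    # Specialized method that respects the container's own index range.
    return xs[xs.first + len(xs) - 1]


print(last(a))  # 30
```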


Not specific to specific examples in the article, I think some of the things people perceive as "bugs" other people see as features or an opportunity to correct past mistakes.

I can remember an example where I suggested automatic treatment of missing values in a stats library, and the library maintainer disagreed. Meaning, my lobbying for Julia to do what R/Python did was seen as "Yes, but that's wrong and we shouldn't promote that sort of treatment". As a business user, I didn't care that it was theoretically wrong, the maintainer as an academic did.

That ends up becoming open-source prerogative. I could do it wrong "on my own time" in my own code...doesn't make either a bug, but a different choice based on perspective.


This is a pity. It seems like a great language and I'd be keen to dive in more, but it seems fair to expect a math/numerical analysis-oriented language to be especially dependable wrt correctness.

I remember a claim made by Mathworks about MATLAB and wondering if it wasn't far fetched, but if true I appreciate it: "A team of MathWorks engineers continuously verifies quality by running millions of tests on the MATLAB code base every day." https://www.mathworks.com/products/matlab/why-matlab.html#re...


I actually wouldn't be surprised if the total number of tests run in the Julia ecosystem wasn't too different (thousands of packages with typically hundreds to thousands of unit tests, run on every commit and PR) -- virtually every Julia package has CI set up (at least standalone unit tests, though many packages could use more integration tests). Of course, in neither Matlab nor Julia do tests guarantee correctness.


Is that tests for the purpose of verifying correctness or tests of applications that will flag problems incidentally? I'm not too familiar, but like the idea of dedicating resources to that specifically.

Guarantees aside, does MATLAB have an issue with this to the same extent as Julia?


Personally I'd probably categorize most unit tests as verifying correctness (but only for the scenarios tested); integration tests may be more useful for finding incidental issues that you wouldn't have thought to test for directly. I'm for sure on board with dedicating more resources to testing -- and in my case as an academic, this is something I only have really been exposed to as a result of interacting with the Julia community.

Matlab is pretty mature at this point, but I'm sure it's had its share of bugs over the years as well (especially if you also counted the file exchange, which is probably the closest thing they have to an open source package ecosystem); it would be interesting to compare the two at a similar level of maturity / development person-hours if quantitative data could be found.


sample size of 1, but I've run 1 billion tests today in Julia (floating point power for Float16, Float32 and Float64)


For correctness? What was the result?


Up until a few days ago, we were testing that x^y was accurate to 1.3 ULPs for Float16, Float32, and Float64. However, for Float16 and Float32 we were actually accurate to (at least) 0.51 ULP, and Float64 was accurate to 1 ULP, so I made the tests stricter there. There are 2 exceptions to this: x^3 and x^-2. Because people from a math background often write code with literal powers and expect it to be fast, for small constant integer powers (-2, -1, 0, 1, 2, and 3) Julia replaces the call to pow with (for example) `x*x*x` for x^3. As such, the accuracy bound for x^3 is 1.5 ULP and the bound for x^-2 is 2 ULP for all data types.
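To make "accurate to N ULP" concrete, here is a minimal sketch of measuring such an error. It is written in Python rather than Julia and is not Julia's actual test suite. It uses exact rational arithmetic via `fractions.Fraction` as the reference and measures error in units of `math.ulp` (Python 3.9+) of the correctly rounded exact value. `ulp_error` and `max_cube_error` are hypothetical helper names. `x * x * x` rounds twice, so it is not correctly rounded. `float(exact)` is correctly rounded, so its error stays within 0.5 ULP.

```python
import math
import random
from fractions import Fraction


def ulp_error(approx: float, exact: Fraction) -> float:
    """|approx - exact| measured in units of the last place of the exact result."""
    unit = Fraction(math.ulp(float(exact)))
    return float(abs(Fraction(approx) - exact) / unit)


def max_cube_error(samples: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Worst observed ULP error of x*x*x and of a correctly rounded cube."""
    rng = random.Random(seed)
    worst_mul = worst_rounded = 0.0
    for _ in range(samples):
        x = rng.uniform(-1e3, 1e3)
        exact = Fraction(x) ** 3  # exact rational reference, no rounding
        worst_mul = max(worst_mul, ulp_error(x * x * x, exact))
        worst_rounded = max(worst_rounded, ulp_error(float(exact), exact))
    return worst_mul, worst_rounded


print(max_cube_error())
```

A rough error analysis suggests the worst case for `x * x * x` in double precision stays within the 1.5 ULP bound quoted above. The exact maxima printed depend on the sampled inputs. Rerunning a check like this after changing an implementation is the kind of regression guard described here.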

This fixed a rare test failure on CI (since for ^3 and ^-2 the bounds were too tight the previous test would fail roughly 1/1000 runs), and will prevent regressions in accuracy if I ever come back to try to make the implementation faster.


Knowing adgjlsfhk1's work, yes, this would be for correctness, specifically measuring error in ULPs. Most frequently, adgjlsfhk1 pushes Julia's numeric routines to within 0.5 ULP, that is, perfect (correctly rounded) behavior.


I actually am not a believer in perfect rounding. It tends to have a high performance cost, and IMO isn't that useful.


This article contains no instances of the word "test", which seems surprising but entirely in keeping with the author's observations.

> Julia has no formal notion of interfaces, generic functions tend to leave their semantics unspecified in edge cases, and the nature of many common implicit interfaces has not been made precise (for example, there is no agreement in the Julia community on what a number is).

> The Julia community is full of capable and talented people who are generous with their time, work, and expertise. But systemic problems like this can rarely be solved from the bottom up, and my sense is that the project leadership does not agree that there is a serious correctness problem. They accept the existence of individual isolated issues, but not the pattern that those issues imply.

It sounds like the cultural standard for writing libraries is, "works good enough for users like me" which should be good if you are using things the same way as the authors. Writing good tests for numerics is hard and grueling; testing numerics or numerics-like code is not nearly as fun or productive-feeling as using numerics to get shit done, so it all makes sense to me.


I mean this looks like good potential targets to improve the language moving forward, it's healthy to not be in awe of your tools and push to make them better. I don't see this as "bad" honestly.


It seems like the point of the article is that that push is insubstantial, if it even exists. Given the language has been around this long it's a bit worrying that stuff like that is the potential target for moving a language forward.

Julia has always had a reputation in my mind at least of being "by academics, for academics" and there's unfortunately a dark side to that in terms of reliability and maintainability. The concept and goals are great, which is annoying. If this language had stayed focussed on the basics, it would be extremely handy for someone like me who trains and deploys models in an edge computing environment. No way I'm doing that with stuff like this going on.


For what it's worth many people feel similarly about R. R is great for people actively working in statistics research (I assume because that's what I'm always told). But for a lot of us who just want to do some analysis, it's constantly breaking and we've learned to default to just starting from scratch when we need to revisit something we did a few years ago. Or we figure out how to buy a commercial system.


R is not constantly breaking. R Core does a remarkable job ensuring backwards compatibility. There are only a few prominent examples of significant "breaking" behavior across decades of the language existing, and those can often be reverted by setting an option (e.g. `options(stringsAsFactors = TRUE)`). But backwards compatibility is the primary concern with any update to the R language or the packages maintained by R Core.

Now, if you're thinking about changes introduced by a specific user-contributed package breaking your analysis, that can indeed be a problem. But that can't be blamed on the R language. And the main user-contributed R statistics packages that have been around for decades (such as lme4 or survival) are mature and stable.


I suppose we'll see? Honestly, this is maybe an opportunity to adjust some goals of the language if this is how people are feeling now. Outreach to purely CS and SE people will probably be needed, but given the presence Julia has at MIT, I don't see that being a problem.


I think the real test will be whether or not Julia's custodians / developers start putting a greater focus on semantics and correctness.

When a language's raison d'être is to try out certain ideas, it probably makes sense for a while to ignore corner cases and rigor. But as the author points out, they eventually become gating factors for wider adoption.


The question here is are these merely just bugs or is there something about the language that makes Julia error prone?

There is potential in using Julia's type inference engine to check for correctness. For example see JET.jl. "JET.jl employs Julia's type inference to detect potential bugs."

https://github.com/aviatesk/JET.jl https://www.youtube.com/watch?v=7eOiGc8wfE0

The video brings up some potential difficulties with Julia's metaprogramming facilities for static or lexical analysis, but also shows that these issues are also addressable.

The type inference system could be exploited for further effect. For example, the type system could be extended to check for shape information within the type as demonstrated in this prototype: https://twitter.com/KenoFischer/status/1407810981338796035

Julia has guard rails (e.g. default bounds checking), but also provides facilities to work outside them (`@inbounds`, `unsafe_*` methods, `ccall`, in-place methods with a `!` suffix). Typically these trade safety for performance or for access to features. Used judiciously, they let one strike a balance between performance and safety. Julia is not a language that restricts its users to a sandbox in the name of safety, but it does make clear where the sandbox's boundaries are.

Another take away from the original blog post is that much Julia development is happening in the open on Github. These issues and their fixes just require a Github account to contribute to. Is this a feature?


JET.jl is far from a solution. Sooner or later, JuliaComputing (or someone else) will have to pay people full time to develop such tools if it wants to see larger adoption. Nobody expects Julia to be a systems language (a language you write an OS in). The later those tools come, the more code will need to be fixed up.


It's still at version 1.x, maybe an explicit roadmap could help tackling those issues?


The power of allowing everyone to make foundational types and functions that work together is indeed dangerous. I'm not sure you are better off in the even more dangerous waters of C/C++/Fortran, except that they are older and more established, with many times the man-hours sunk into them. Is there a good way to control the interaction of these many different libraries without losing the generality and composability of Julia?

I will say that, as a matter of language design, 1-based indexing is perfectly fine and 0-based indexing is perfectly fine. Choose-your-own indexing is a hilarious footgun, so it's no surprise it went off sometimes. Fortunately, using it seems to be quite rare.


But it's not a matter of language design. The 'choose your own indexing' is something you do entirely in libraries.

You can create your own indexing in Python too; it will just be slow. The 'sin' of Julia is that it will be fast...
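To make that concrete, here is a minimal Python sketch of choose-your-own indexing. The class name `OffsetList` and its `first`/`indices` members are hypothetical, made up for illustration; the sketch only mimics the idea behind Julia's OffsetArrays and is not its API. The explicit bounds check is what turns a wrong assumption about the first index into an error instead of a silent wrong read.

```python
class OffsetList:
    """A list whose valid indices run from `first` to `first + len - 1` (hypothetical helper)."""

    def __init__(self, data, first=0):
        self._data = list(data)
        self.first = first

    def indices(self):
        # Rough analogue of Julia's eachindex: the valid indices, whatever the base.
        return range(self.first, self.first + len(self._data))

    def _position(self, i):
        pos = i - self.first
        # Explicit bounds check; plain Python lists would wrap negative indices instead.
        if not 0 <= pos < len(self._data):
            raise IndexError(f"index {i} outside {self.first}..{self.first + len(self._data) - 1}")
        return pos

    def __getitem__(self, i):
        return self._data[self._position(i)]

    def __setitem__(self, i, value):
        self._data[self._position(i)] = value

    def __len__(self):
        return len(self._data)


# An array indexed from -1 to 1, like a Pascal/Algol array[-1..1].
a = OffsetList([10, 20, 30], first=-1)
print([a[i] for i in a.indices()])  # [10, 20, 30]
```

Code written against `a.indices()` works for any first index. Code that assumes `range(len(a))` quietly reads the wrong elements for 0 and 1 and only fails at `a[2]`; removing the bounds check (as `@inbounds` does in Julia) would remove even that failure.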


Wait, are those examples real?

I remember complaining about 1-based indexing only to be told "julia is great! we have offsetindex". If it's a source of bugs, that ... greatly reduces my future interest in adopting the language.


I got bitten trying to figure out how to combine units of measurement with other numerical computations. A lot of the features look great on paper, but once I started using them, I only ever managed to produce an ungodly mess instead of what I could accomplish in Python in roughly the same time. Unfortunately, everything that goes beyond what Matlab does sometimes looks great on paper but is not very pleasant to use, and is sometimes badly broken. That being said, I work in an area of scientific research where Julia, or more specifically DifferentialEquations.jl, would seem to walk away with the win, but I find myself searching for alternatives implemented in JAX.

I still think most of this comes down to my own failings, but it is also extraordinarily hard to figure out what is going wrong.


Anything other than units? I'd be curious to know. I think Unitful.jl has completely the wrong architecture (it violates many standard assumptions about arrays when used in arrays), so that's a somewhat special case (and I plan to create a new units library to completely remove uses of Unitful).


I greatly appreciate your work, by the way; I just have not been able to make use of it effectively. Some of the problems revolve around the question of how best to proceed in a situation where you have a high-dimensional state space that is naturally partitioned in some way. There are several solutions for this (SubArray etc.), but the burden is entirely put on the user. I had the impression that there was a tension between what I would have preferred to write and what I could easily write as soon as I attempted to generalize from the examples I could find. With JAX, the corresponding libraries operate on pytrees, and as a user you can specify equations pretty naturally and easily without much fuss. If you want to use XLA, MPI, or CUDA in Julia, it is typically the same story: in theory it should be possible to make things work, but in practice I have struggled tremendously to do anything productive.

When I last looked at the adjoint event handling code, I couldn't figure out from the implementation whether the general case was handled correctly, especially since parts of it still seemed in flux. Writing similar code in JAX leaves close to no doubt that the code is correct. I am sure most of it is down to familiarity. But since ultimately I want to do ML-related things, right now JAX and related libraries tie together what exists much more neatly, even though overall SciML implements a more comprehensive set of techniques. I am still closely following the work around it, especially in the area I am interested in, and have some prototypes written in it, but it just hasn't clicked yet.


I highly recommend taking a look at George Hart's work on linear algebra: http://georgehart.com/research/multanal.html , although I do think he misses the point (or at least insufficiently emphasizes) that you almost never want to work with linear maps that cannot be described by a set of units on each axis that are multiplied together to get the entries.


I am curious: what are those many standard assumptions about arrays that Unitful violates? There does not seem to be any room for alternatives other than representing each element as a struct:

    struct Element{T, Unit} <: Number
        value::T
    end

which is placed in the array.


You may already know of it, but if you want differential-equations-in-JAX then allow me to quickly advertise Diffrax: https://github.com/patrick-kidger/diffrax (of which I am the author, disclaimer).


Yes I am aware :) it is missing a few things but I might end up contributing.


Excellent! I'm very happy to take contributions generalising the tool.


>If it's a source of bugs, that ... greatly reduces my future interest in adopting the language.

It can be a source of bugs because some/many packages incorrectly assume that what you pass is 1-based indexed.


I was wondering if the 1-based arrays (and option to change index base) would factor into this.

> OffsetArrays in particular proved to be a strong source of correctness bugs. The package provides an array type that leverages Julia’s flexible custom indices feature to create arrays whose indices don’t have to start at zero or one.

Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.


>Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

Because starting with 0 is neither math nor array indexing in general.

It's just how the base address of an array pointer memory block was referenced in C (and it spread from there).

Which is why all math-focused languages use 1-based indexing (Fortran, APL, MATLAB, R, Mathematica, etc.)


> Because starting with 0 is neither math nor array indexing in general.

It very, very much is. Polynomials all start at a zero "index", as does just about every expansion I can think of (Fourier, Bessel, Legendre, Chebyshev, Spherical Harmonic, etc.) Combinatorics, too, make lots of use of zero indices and zero-sized sets. As for arrays, I'll leave it to Dijkstra[1] to explain why zero indexing is most natural. Zero indexing overwhelmingly makes the most sense in both math and computers because indexing is a different operation than counting.

[1]: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...


>It very, very much is. Polynomials all start at a zero "index"

Notice how you had to put index in quotes.

Because it's not an index, it's the degree of each polynomial term, which is a power.


Notice how you're ad hominem-ing the structure of the argument and not the argument itself? I do not at all see how putting quotes around that word invalidates the argument. I did so because mathematical literature doesn't refer to it as an index (rather as a degree, as you mentioned), but it very much does index each monomial. There are an infinite number of index sets for each polynomial -- just as i can index the i-th monomial, so can (i - 7), or (i - 239842), or (i - pi) -- but one of them is obviously the most natural (pun intended).


>I do not at all see how putting quotes around that word invalidates the argument

When the argument is:

  [0] very, very much is [natural for indexes]
and as an example for that points to something that's not an index -- and the person making the argument knows it is not an index, so they have to put index in quotes:

  Polynomials all start at a zero "index"
...then pointing this out, does invalidate the argument. It might not prove that the opposite is true, but it sure does invalidate the argument.

Notice also how there's no ad hominem in my response (this or the previous one), as you claim. I argue against the case and the choice of example, not against who wrote it.


GP was addressing the argument. Using the word "you" in the reply does not make it ad hominem.

(For more than you ever wanted to know about this, The Ad Hominem Fallacy Fallacy covers it exhaustively and entertainingly:

https://laurencetennant.com/bonds/adhominem.html )


This is an interesting point, though, respectfully, I do still think it's ad hominem. Internet arguments being what they are, I don't much care, but I offer my reasoning here to better understand your point. OP did not engage with any of the points made, merely offering another term (without any sort of elaboration or definition), and said

> Notice how you had to put index in quotes.

as the thrust of the argument. In saying that, they imply that I, the arguer, 'A' in Bond's article, don't actually know what an index is (so how could I have a cogent argument about 'correct' indexing?). As this is the only argument of merit, it seems as though OP is actually trying to counter the point by suggesting (attacking) something of the arguer (myself).

Now, it may be that this ad hominem is justified -- if I truly don't know what an index is, then yes, I probably should not be making claims about them -- but it's still an ad hominem (and, possibly, poor form).

Of course, this is ascribing a lot to 25 words of text with little other context. I would be interested to understand if you see things differently/think I have grossly erred in my analysis.


>OP did not engage with any of the points made, merely offering another term (without any sort of elaboration or definition), and said Notice how you had to put index in quotes.

Yes. That's addressing the point you made.

Ad hominem would be: "You're a bad person/you have this or that flaw/etc (unrelated personal stuff)".

This is: "You put the index in quotes, because even you know that this is not an index. And in any case, this is not considered an index in math, it's a degree, which is a different thing".

I also didn't "merely offer another term", as if I had made up some term on my own, or just offered one of several equal alternatives. Instead, I gave the correct math term for the thing described.

>In saying that, they imply that I, the arguer, 'A' in Bond's article, don't actually know what an index is (so how could I have a cogent argument about 'correct' indexing?).

It would rather imply the opposite: that you know what an index is, and you know that the thing you applied it to, is not an index (which is why it was put in scare quotes).


> In saying that, they imply that I, the arguer, 'A' in Bond's article, don't actually know what an index is (so how could I have a cogent argument about 'correct' indexing?).

That's not what they are saying. They are saying you know what an index is so well that you correctly put quotes around your usage of the term, because you understood it's not in fact a technically correct usage.

Now calling an argument poor form... that's closer to ad hominem.


The coefficients are indexed. The n on a_n in Σ a_n X^n is an index.
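In code the same convention falls out directly: if a Python list holds the coefficients so that `coeffs[n]` is a_n, the coefficient of x^n, then list position and exponent coincide. A minimal Horner's-rule sketch (the function name is illustrative):

```python
def eval_poly(coeffs, x):
    """Evaluate sum(coeffs[n] * x**n) with Horner's rule; coeffs[0] is the constant term a_0."""
    result = 0
    # Work from the highest-degree coefficient down to a_0.
    for a_n in reversed(coeffs):
        result = result * x + a_n
    return result


# 1 + 2x + 3x^2 at x = 2
print(eval_poly([1, 2, 3], 2))  # 17
```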


For some of these polynomials, such as Fourier polynomials, it is natural to think about negative subscripts from a pure mathematical perspective. While these can be mapped into non-negative integers, it is often intuitive to use the "negative subscripts" as indexes, necessitating methods such as `fftshift`. For many of these polynomials, the concept of where they "start" is arbitrary.
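As a sketch of what that mapping looks like, here is a pure-Python version for the integer frequency indices of an n-point DFT. The helper names are hypothetical. The ordering (0, 1, ..., then the negative frequencies) is the standard DFT output layout, matching NumPy's `numpy.fft.fftfreq` scaled by n, and the shift is a rotation by n // 2, which is what NumPy's `fftshift` does for 1-D input.

```python
def dft_frequency_indices(n):
    """Signed frequency index stored in each slot of an n-point DFT output."""
    return [k if k < (n + 1) // 2 else k - n for k in range(n)]


def shift_zero_to_center(values):
    """Rotate right by n // 2 so the zero-frequency entry sits in the middle (like fftshift)."""
    n = len(values)
    k = n // 2
    return values[n - k:] + values[:n - k]


print(dft_frequency_indices(4))                        # [0, 1, -2, -1]
print(shift_zero_to_center(dft_frequency_indices(4)))  # [-2, -1, 0, 1]
print(shift_zero_to_center(dft_frequency_indices(5)))  # [-2, -1, 0, 1, 2]
```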


Math (usually) uses 1-based indexes because those parts of math started before the concept of zero as a number, and then the convention persisted, even down to Matlab.

There are many similar path-dependent conventions in human culture. E.g. percentages originated before the concept of decimal fractions, base-sixty time units come from ancient Mesopotamia, and conventions about multi-dimensional array memory layout are based on the convention for drawing matrices on paper.

Most common mathematical sequences and series work better (more naturally/clearly) when zero indexing is used instead, and off-by-1 errors are a problem in mathematics just like computing (but less of a problem, because notation errors get silently corrected in readers’ heads, and don’t actually have to be interpreted strictly).


I am pretty certain that matrix algebra does not predate the number 0


Math traditionally has had some bad notation from a formal point of view, because humans are good at coping with bad notations (or going back and forth between variants), unlike machines and formal systems. Computer science being a (more) formal science (vs. math, which is overwhelmingly not done formally), it has criticized some traditional math notations that are ad hoc and not nicely formalizable (and put forward variants that are actually better behaved in terms of mathematical structures).

For indices: indices are about referencing elements of finite ordered sets, say of size N. Hence the 'abstract' indexing set for N elements is the ordinal N. The most canonical way to represent it is to take the length-N prefix of the natural numbers (e.g. 0-based indexing, von Neumann ordinals), which happens to have all sorts of additional structure (e.g. mod-N arithmetic). This is also consistent with the offset view (the i-th element is at offset i). The fact that people tend to start ordinal numbers at 1 doesn't change the fact that mathematicians working with ordinal numbers take them to start at 0, for the same reason we start the naturals at 0.
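The mod-N structure is easiest to see with wrap-around (circular) indexing, as in a ring buffer. A small Python sketch (hypothetical helper names) comparing neighbour lookup for slots numbered 0..N-1 versus 1..N:

```python
def neighbors_zero_based(i, n):
    """Previous and next index on a ring of n slots numbered 0..n-1."""
    return (i - 1) % n, (i + 1) % n


def neighbors_one_based(i, n):
    """Same lookup for slots numbered 1..n: shift down, wrap with mod n, shift back up."""
    return (i - 2) % n + 1, i % n + 1


print(neighbors_zero_based(0, 5))  # (4, 1)
print(neighbors_one_based(1, 5))   # (5, 2)
```

Both are correct; the 1-based version just has to convert to the 0..N-1 representation and back around the `%`.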

See also: notation for higher derivatives https://arxiv.org/abs/1801.09553; a bit further afield but in the same vein: notations for free variables in programs as de Bruijn indices (or some variant thereof) (it's further afield because it's practical for doing proofs, but not for writing concrete terms). There are probably other instances.


It makes iteration less error-prone too when the index of the last element is equal to the length of the array. In C it’s pretty easy to iterate past the end of an array if you use <= by mistake in a for loop, or forget a “length - 1” somewhere.


> It's just how the base address of an array pointer memory block was referenced in C (and it spread from there).

There was also the famous 1-pager by Dijkstra: "Why numbering should start at zero"

https://news.ycombinator.com/item?id=777580


Where the argument was "That is ugly"...


> C (and it spread from there).

BCPL.


I don't read much about users of modern languages with 0-based indexing requesting 1-based options/alternatives.


I'll start by saying that I greatly prefer 0-based, and have used both 0- and 1-based indexing, but the choice is largely arbitrary.

0 makes sense as the '0-th offset' when thinking from a pointer perspective, but I often find when teaching, that 1-based comes more naturally for many students (the 'first' item).

You mention mathematical or scientific work...but I often/mainly see enumerations (such as weights x_1, x_2, ... x_n or SUM 1 to N) start with 1, so for these 1-based can be a more natural/direct translation of mathematical notation to code.


My experience is that 0-based offsets (and use of < or even != for upper bounds) mean that I should almost never have to write something like idx - 1 or idx + 1.

I came to 0-based offsets later in my career, having started with Matlab. So I have some real experience with 1-based offsets. Experience that was 'untainted' by being used to a different option. I much prefer 0-based.

Especially because I now sort of have a linter rule in my head: 'if I am writing i - 1, then I am making a mistake or doing something the wrong way'. Which has been quite successful.


I have a similar experience. Matlab was my first experience with computational work. Now I use Python and 0-based is still more natural.


Not for polynomial coefficient indices :-)


Or Fourier coefficients :-) Or pretty much anything where the index/subscript is related to the math itself.

Math textbooks and papers tend to use 1-based subscripts when it doesn't matter. It's hard to come up with examples where starting at 1 facilitates the actual math.


Just for consistency: a_n is the coefficient of x^n, so the constant term ends up being a_0. In my experience, numbering starts from one (like (x_1, x_2, x_3) as a point of R^3) and offsets from zero, e.g., when dealing with discrete time, t_0 is the first.


As other posters noted, in mathematics both 0 based and 1 based indexing is used.

When dealing with matrices and vectors (including data tables and data columns), there is a strong preference for 1-based indexing: first row, first column, first entry, etc. Most matrix and vector algorithms in the literature use 1-based indexing. Programming these in a language with 0-based indexing is a mess, and a common source of errors.
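As an illustration of that translation step, take forward substitution for a lower-triangular system Ly = b, usually written 1-based as y_i = (b_i - Σ_{j=1}^{i-1} l_ij y_j) / l_ii for i = 1..n. In 0-based Python both loop bounds shift down by one. A sketch, assuming L is a nonsingular lower-triangular matrix given as nested lists:

```python
def forward_substitution(L, b):
    """Solve L y = b for lower-triangular L (nested lists), using 0-based indices."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # textbook: i = 1, ..., n
        # textbook: sum over j = 1, ..., i-1  ->  here j = 0, ..., i-1
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y


print(forward_substitution([[2.0, 0.0], [1.0, 1.0]], [4.0, 5.0]))  # [2.0, 3.0]
```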

When dealing with sequences, especially recursively defined ones, there is usually an initial value (indexed with 0) and then the n-th value is obtained by n applications of the recursive step, so 0 based indexing makes more sense, but in literature there is no fixed convention, and you can find examples with 0 based and with 1-based indexing. Another example of 0 based indexing in math are polynomials (and in extension, power series) where the index is the degree of the term, or in general any functional series where the 0-th term is the constant term.
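For recursively defined sequences, a 0-based list lines up with the math: entry n is the value after n applications of the recursive step. A minimal sketch (the function name is illustrative):

```python
def iterate_sequence(step, x0, n):
    """Return [x_0, x_1, ..., x_n] where x_{k+1} = step(x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1]))
    return xs


xs = iterate_sequence(lambda x: 2 * x, 1, 4)
print(xs)     # [1, 2, 4, 8, 16]
print(xs[3])  # 8, i.e. x_3 = 2**3
```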

There are also negative indices.


> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

Because that's how maths work? Literally everywhere in maths you count from 1, except in software engineering. That's why. I hope that clarified your confusion.


My hot take is that 1-based indexing is often a mistake in math too. It's also not universal, even within math. And linear algebra doesn't need 1-based indexing either, and some operations are even more easily expressed with 0-based indexing.


Starting with 0 is quite common in series, e.g. Taylor, Fourier, Chebyshev expansions, etc.


No... in those cases you're starting with 0 because that's the lowest exponent of a polynomial.


You're saying no while explaining my reasoning.

What index to start with only strongly matters when the indexes have semantics. Otherwise you should just treat it as an opaque index, i.e. eachindex(), keys(), etc. In math when there are semantics, the indices usually include 0. When not (vector components, matrix indices, etc), they usually (but not uniformly) don't.

The one nice side-benefit of Julia's mistake in adopting 1-based indexing is that it provided an extra impetus to build machinery to handle arbitrary indexing, though too much code still doesn't work correctly, and code still gets written to only handle 1-based arrays.


> What index to start with only strongly matters when the indexes have semantics.

Which in everyday computing (as opposed to mathematics) they often do, and those cases are (most?) often, in human terms, much more natural to start from 1: "I have an array of N elements. The first of a bunch of things is thing number one, and the last of N things is thing number N." Hence:

   N_things: Array[1..N] of Thing;

   for i := 1 to N do begin

      Whatever := Whatever + Whateverize(N_things[i]);

   end; // for i := 1 to N...
Yeah, that's how old I am: That's Pascal. (With some declarations skipped, and I may have misremembered some syntax.) The canonical example is of course the original Wirth-style max-255-ASCII-characters fixed-length[1] String type: In a string of length N, the nth character is at position n in the string. Character number N is the last one.

> The one nice side-benefit of Julia's mistake in adopting 1-based indexing is that it provided an extra impetus to build machinery to handle arbitrary indexing

1) Arguably, as per the above, not a mistake.

2) Muahaha, "build machinery"? No need to build anything new; that's already existed since the early 70s. (Yeah, that's how old I am: Not adding 19 in front. There was only one "the seventies".) It's not like starting at 1 was mandatory; you could well declare

   My_fifty_things: Array[19..68] of Thing;
And then "for i := 19 to 68 do ..." whatever with it, if those specific numbers happened to be somehow essential to your code.

(At least in Turbo, but AFAICR also in original Wirth Pascal. Though probably with the max-255-ASCII-elements limitation in Wirth, and possibly also in Turbo up to v. 2 or 3 or so.)

__

[1]: Though from at least Turbo Pascal 3 (probably earlier; I also think I saw it on some minicomputer implementation) with the backdoor of changing the length by directly manipulating -- surprise, surprise, it exists! String was a built-in type with its own implementation -- the length byte at index [0]. Better start out with your string declared as length 255, though, so you don't accidentally try to grow it beyond what's allocated.


> often, in human terms, much more natural to start from 1

This meaning of natural is highly culturally dependent. It took the Greeks a startlingly long time to accept that one was a number (because it's a singleton), much less zero. I do not, e.g., want arrays that can't have length one because they have to contain a "number" of things.

> No need to build anything new;

Well, no, not "new". Arrays with arbitrary bounds is a well-trod path. But they still had to make it work in Julia: CartesianIndices, LinearIndices, and overloading of "begin", and "end" keywords, etc. And the radical dependence on multimethod dispatch meant they couldn't quite just reuse existing work from other languages.


Algol had negative indexes. You could declare an array of nine elements going from -4 to 4, for example. I couldn't find why they wanted such a thing.


Hm, maybe Pascal does too. Can't recall.


I'm not "explaining your reasoning" because I don't agree with a single word you just said.

> In math when there are semantics, the indices usually include 0

Complete nonsense. For example, in a Laurent expansion you start in the negatives and go up. Now you're gonna say "but that's an exception, I said _usually_". But it's not, this is the general case.


> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

From a data-analytic point of view, indexing should start with 1. When we analyze a data table, we always call the first row the 1st row, or row #1, not row #0. It would be very strange to label rows as 0, 1, 2, 3, .... It may be fine for people with a Computer Science background, but it would create so much confusion for almost everyone else...


It causes problems for people with a CS background too. I once numbered machines in racks with zero-indexing (so that they could match up with zero-indexed ip addresses). Even though literally everyone who touched those machines had CS background: DO NOT DO THIS.


It just amuses me that one of the big differences between the US and EU is which floor is "first" and which one is "zero" or "minus one".


Yes. A German friend of mine moved into her student dormitory in the US, and when she was told that her room was on the first floor, asked whether there was a lift, because she had a heavy suitcase...

Having said that, given that there are basements (in Europe, at least), it makes sense to call the ground floor 0. We are dealing with integers here, not natural numbers.


But floors of multi-storey buildings are a pretty unique exception in the real world in having a characteristic where zero -- the number of stairs you need to climb from the ground floor -- has an actual tangible meaning (on the ground floor).

How many other such examples can you (editorial you; anyone) come up with? Not many, I'd bet.


I thought you'd be wrong, and immediately came up with:

- Hours after midnight. The (non-anglosaxon) watch goes from 0:00 to 24:00 (the latter is useful for deadlines: The proposal must be submitted by Friday, 24:00 (which coincides with Saturday, 0:00)).

Then I cheated and looked up Wikipedia (https://en.wikipedia.org/wiki/Zero-based_numbering#Other_fie... ) - and whoops, that's basically it!


Yeah, duh, totally forgot that one.

And I should know; the alarm clock on my windowsill says 0:54 right now.


Year -- or rather, decade, century, and millennium -- numbering. We're only in the second year of the second decade of the twenty-first century, so the decade will end and the next start at the end, not the beginning, of 2030.

But this is perhaps more of a problem with numbers and zeroes, and ends and beginnings; people don't get that 10 is the last of the 00s, not the first of the 10s. The very first year was numbered 1, not 0, so that's not the problem. Or, it kind of is: People would be right, if the first year had been 0.
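The arithmetic behind that point: because the AD/CE count starts at year 1 and has no year 0, the ordinal century of year y is (y - 1) // 100 + 1, so 2000 is the last year of the 20th century and 2001 the first of the 21st. Ordinal decades in the sense used above work the same way. A small Python sketch (the function names are illustrative):

```python
def ordinal_century(year):
    """Century number for years counted from 1 (no year 0): years 1-100 are century 1."""
    return (year - 1) // 100 + 1


def ordinal_decade_bounds(year):
    """First and last year of the ordinal decade containing `year`, e.g. 2021-2030."""
    first = (year - 1) // 10 * 10 + 1
    return first, first + 9


print(ordinal_century(2000), ordinal_century(2001))  # 20 21
print(ordinal_decade_bounds(2022))                   # (2021, 2030)
```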


> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

So, no FORTRAN, huh?


> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

Counting things is such a core thing to humans that when we have a bunch of N things we think of them as thing #1 to thing #N. We start counting from 1, not 0.

Indexing from 0 in computing is adapting the human mind to the computer, purely for performance reasons that may have been relevant in the 50s or 60s but were beginning to be obsolete by the 70s. It was done so you could access elements of an array by the simplest possible calculation of your offset into memory. When your first element is stored at Starting_address, you need i for that first element to be 0, just so the compiler doesn't have to add another constant term for each element to "Element is at Starting_address + i * sizeof(element)".

It would have been trivial, even then (as Wirth showed), to add that constant-term calculation to compilers, but C did without it because that eliminated one whole integer operation from each (set of?) array access(es).

Instead, we got the mental gymnastics of

   for (i = 0; i <= N-1; i++) {...}
and its many variations (instead of just for i := 1 to N...), which surely have caused orders of magnitude more headaches in off-by-one bugs over the years than they saved in performance.
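The address arithmetic being argued about, as a small Python model (no real memory is touched; the names are hypothetical). With a lower bound other than 0, the element address picks up an extra constant term, but one standard compiler technique folds that term into a precomputed "virtual origin" once per array, so each access is again just origin + i * size.

```python
def element_address(base, i, elem_size, lower_bound=0):
    """Address of element i when the first element (index lower_bound) is stored at base."""
    return base + (i - lower_bound) * elem_size


def virtual_origin(base, elem_size, lower_bound):
    """Precompute base - lower_bound * elem_size once, so each access is origin + i * size."""
    return base - lower_bound * elem_size


base, size = 1000, 8
print(element_address(base, 0, size))                 # 1000: 0-based, first element
print(element_address(base, 1, size, lower_bound=1))  # 1000: 1-based, first element
origin = virtual_origin(base, size, lower_bound=1)
print(origin + 1 * size)                              # 1000 again, no per-access subtraction
```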


There are good arguments for using either 0- or 1-based indices. As you should be aware, there are many languages on each side.

While preferring one over the other is perfectly fine, I question the intellectual honesty of anyone claiming incredulity about opposite choice.


If packages used generic indexing functions like eachindex, there would be no correctness issue with that specific example.


The problem isn't that 1-base indexing can be "fixed" in Julia. The problem is that you see 1-based indexing as a flaw.


I didn't say I see 1-based indexing as a flaw. I said I complained about it, and then learned they supported multiple types of offsets (which ostensibly resolved the issue for me), only to learn that the stats library was "written before offsetindex" and still has bugs related to it.


It is a flaw. Computers don't work that way fundamentally, and it introduces lots of awkward translation.


But humans don't work 0-based. Try explaining to a bunch of scientists why for rows 2-5 of the DataFrame they have to write df[1:5].
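For reference, this is what a half-open positional slice does, with a plain Python list standing in for the DataFrame rows (pandas positional slicing with `df.iloc[1:5]` follows the same convention): position 1 is the second row, and the stop position 5 is excluded.

```python
rows = ["row 1", "row 2", "row 3", "row 4", "row 5", "row 6"]  # human, 1-based labels

# Rows 2-5 in human terms are positions 1, 2, 3, 4: start at 1, stop before 5.
print(rows[1:5])                # ['row 2', 'row 3', 'row 4', 'row 5']
print(len(rows[1:5]) == 5 - 1)  # True: a half-open slice's length is stop - start
```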


Yeah because humans got it wrong. Really the word for "first" should correspond to the number 0.

Try doing a block iteration over an array, or any kind of interval algorithms in 0-based and 1-based. 0-based with right-open intervals just results in way way more elegant, easier to understand and (very) slightly more efficient code.
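A sketch of the block-iteration case in Python (the helper names are hypothetical). Both versions are correct; the difference is where the `+ 1` and `- 1` terms show up. With 0-based, right-open blocks, each block's end is the next block's start and its length is `end - start`; with 1-based, closed blocks, the end needs a `- 1` and the length a `+ 1`.

```python
def blocks_half_open(n, size):
    """0-based, right-open blocks [start, end) covering 0..n-1."""
    return [(start, min(start + size, n)) for start in range(0, n, size)]


def blocks_closed_one_based(n, size):
    """1-based, closed blocks [start, end] covering 1..n."""
    return [(start, min(start + size - 1, n)) for start in range(1, n + 1, size)]


print(blocks_half_open(10, 4))         # [(0, 4), (4, 8), (8, 10)]
print(blocks_closed_one_based(10, 4))  # [(1, 4), (5, 8), (9, 10)]
```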


The reason why this 0 vs 1 based indexing debate is never resolved is because all of the arguments are subjective. You've claimed definitively that "humans got it wrong", but to back up this argument you've pointed to vague notions of "elegance" and "understandability". Even Dijkstra in his argument relies on a notion of "ugliness". All such arguments fall squarely in the realm of "preferences".

Just think about what you're saying: that natural language usage of the word "first" is wrong, and doing things the opposite way that everyone expects in programming languages is somehow more understandable? Really? Maybe that makes perfect sense to you, but not to people using the word as it's commonly used.


It's not subjective. The fundamental meaning of an index is "how far are we from the start of an array". The first element is 0 from the start of the array.

If humans had got this right then we wouldn't be even having this discussion. Same for similar mistakes like Pi vs Tau, negative electrons. But the mistake is understandable given that we didn't even think of 0 for a long time.


I assume you're not arguing that humans got it wrong to ever count from 1. So you must mean that when counting things, we start from 1 and when indexing we start from 0. Like, "that dog has 4 legs", but when you refer to them it's the "0th leg, 1st leg, 2nd leg and 3rd leg"? This seems weird. It's true that then the first - sorry, 0th - leg is 0 from the start of the array. But it also means that the last element of the array is one less than its size. You'd also struggle to do the operation of counting. You'd have to remember, am I indexing or counting?

If you say so! Maybe you should start a foundation to encourage humanity to change its counting behaviour. The effective altruists might throw you a couple of million.


It's subjective because you haven't defined or quantified elegance or understandability. I could say that zero based indexing is not in fact understandable based on the amount of confusion I encounter explaining the concept to new users. I could say it's inelegant based on the fact it makes algorithms I use harder to implement. Others argue it's more elegant because it makes algorithms they use easier to implement. Same rationale, subjective conclusions.

Dijkstra's argument as well hinges on an undefined notion of "ugliness", which can mean anything to anyone. That's why these conversations never end, because most people are talking past one another based on their own definitions of "elegance" or "ugliness".


You encounter confusion explaining it to new users because they've learned it wrong their whole lives.

Nobody has any trouble in the UK with lifts where the ground floor is 0 - one of the few places where humans got it right.


Right but the existence of confusion cuts against your argument it’s more understandable. What you really mean is your brain works differently than everyone else’s, and you find something straightforward which they find confusing, therefore they are just thinking wrongly and it turns out your brain is the one with the correct worldview.

That’s certainly a perspective.


No, this is not the fundamental meaning of an index, it is simply the interpretation that you are using to make it easier to deal with for you personally. In my opinion, the fundamental meaning of an index is a number referring to element number `n`, so that `a[1]` is the first element, etc. But I don't begrudge you your working definition.

For some reason, people who are used to 1-based indexing generally recognize that both interpretations are perfectly fine, and that each works well in different contexts. For some other reason, people used to 0-based indexing feel so intellectually superior that they are unable to see the other side of the argument.


Yes, precisely. Indexing and counting are different operations.


There are non-subjective arguments. See EWD831: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...


I referenced that one in my post. It hinges on a concept of “ugliness” which, as I said, can mean anything to anyone. It also presumes that this argument based on ugliness is the only one; even if we take his entire argument at face value, that the method he proposes is more “beautiful” than the rest and we agree on what that means, we still can’t say it’s the preferred method in situations where the beauty or ugliness of code doesn’t matter, because Dijkstra doesn’t even acknowledge such situations exist.


> ...humans got it wrong. [...] 0-based with right-open intervals just results in way way more elegant, easier to understand...

You must have warped your brain into thinking like a computer (well, a C-family-language compiler) for so long and/or so thoroughly that you no longer think quite like an ordinary human.

Having to sprinkle semi-arbitrary "-1"s all over your code is in no reasonable sense of the words "way way more elegant, easier to understand" than not having to do so.


> If you pass it an array with an unusual index range, it will access out-of-bounds memory: the array access was annotated with @inbounds, which removed the bounds check.

I think making indexes configurable is a huge mistake. Even if a single convention is not ideal for every situation, having one way to do indexes makes a huge source of confusion and potential bugs just go away. And this is orthogonal to whether you pick 0 or 1 as your starting point, as long as the whole language embraces that.

For example, with C/C++/Rust you know it is zero-based indexing. Even if that is not ideal for your formulas, the mental math of translating to zero-based is worth it to avoid constantly worrying about whether a library is one-based or zero-based and what happens if you compose them.


There's a parallel idea, that you should avoid--insofar as is possible--numerical indexing. In other words, instead of iterating over `0:length(X) - 1` or `1:length(X)`, you use something like `for element in array` or

    X = zeros(3, 4)                   # the same loop works for any number of dimensions
    for index in CartesianIndices(X)
        X[index] = sum(Tuple(index))  # index is a CartesianIndex, e.g. CartesianIndex(2, 3)
    end

If you do that, you don't need to keep track of whether it's zero-based, one-based, or anything else. In fact, you may not even need to keep track of the number of dimensions, as in this example, https://julialang.org/blog/2016/02/iteration/


Works great for trivial cases where there's no interdependency between array elements. As soon as you need to access, for example, adjacent elements, you want to be able to just iterate over 1:length(X) - 1 and access a[i-1] and a[i]. This is the most direct way and thus easiest to get right. Abstractions only make it more error prone.


Is `for i in eachindex(X)` really any worse?

You can still do math on i, it avoids issues with OffsetArrays, and it might even be clearer why you're iterating. It requires that the array type support linear indexing, but so does doing anything sensible with X[i] and X[i-1].
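For the neighbour-access case, one way to avoid hard-coding a `1` is to start the loop one past `firstindex(X)` and stop at `lastindex(X)`. A minimal sketch (the helper name `adjacent_diffs` is hypothetical; Base's own `diff` already does this for vectors):

```julia
# Differences between neighbouring elements, without assuming indices start at 1.
# `adjacent_diffs` is a hypothetical helper for illustration; Base's `diff` does the same job.
function adjacent_diffs(X::AbstractVector{T}) where {T<:Number}
    out = T[]
    # Start one past the first index so that X[i-1] is always a valid index.
    for i in (firstindex(X) + 1):lastindex(X)
        push!(out, X[i] - X[i-1])
    end
    return out
end

adjacent_diffs([1, 4, 9, 16])             # [3, 5, 7]
adjacent_diffs(view([10, 1, 4, 9], 2:4))  # [3, 5]
```

An empty vector gives an empty range, so no special case is needed.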


I'm only skimming this post and I'm not familiar with Julia, so maybe I'm missing it, but does it have a way to get an item AND its index? In Rust there's, I think, `enumerate()`, which gives you a tuple with both the item and its index in cases where you need both.


    julia> pairs("François") |> collect
    8-element Vector{Pair{Int64, Char}}:
     1 => 'F'
     2 => 'r'
     3 => 'a'
     4 => 'n'
     5 => 'ç'
     7 => 'o'
     8 => 'i'
     9 => 's'
Notice the missing index 6, because ç takes two bytes.

In contrast, enumerate() gets you the iteration number:

    julia> enumerate("François") |> collect
    8-element Vector{Tuple{Int64, Char}}:
     (1, 'F')
     (2, 'r')
     (3, 'a')
     (4, 'n')
     (5, 'ç')
     (6, 'o')
     (7, 'i')
     (8, 's')
This can trip you up.


Rust has the same problem with strings: if you don't realize how you are supposed to handle Unicode, you'll get burned when you don't access code points correctly.

Edit: Also thank you for the answer. I have been curious about Julia even though I'm not a data science/ML type, but never find time. I do like to keep an eye on it though.


I haven’t used Rust, but Julia keeps you from getting burned too badly by giving you an informative error message if you try to index "inside" a character:

    julia> "François"[6]
    ERROR: StringIndexError: invalid index [6], valid nearby indices [5]=>'ç' [7]=>'o'
I’m not a data science type either. I came to Julia through physics and general computing. It’s the best language for science I’ve ever encountered.
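To avoid hitting that error in the first place, you can step between valid byte indices instead of using `i + 1`. A short sketch using Base string functions (`nth_char` is a hypothetical helper):

```julia
s = "François"

# Byte (code-unit) indices of each character; 6 is skipped because 'ç' is two bytes.
collect(eachindex(s))   # [1, 2, 3, 4, 5, 7, 8, 9]

# Move between valid indices instead of using i + 1.
nextind(s, 5)           # 7
thisind(s, 6)           # 5, the start of the character containing byte 6
isvalid(s, 6)           # false

# The n-th character, counting characters rather than bytes (hypothetical helper).
nth_char(str, n) = str[nextind(str, 0, n)]
nth_char(s, 6)          # 'o'
```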


    for (i, val) in pairs(array)


OffsetArrays can be really nice for things like convolutions. For example, it ends up being really natural to have a matrix indexed on [-2:2, -2:2] to implement a Gaussian blur. It definitely is a potential bug source, though.
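Without OffsetArrays the same blur still works, but every kernel lookup needs a manual shift from the offset range -2:2 to the storage range 1:5. Here is a minimal sketch of that bookkeeping using only Base. The names `gaussian_kernel` and `blur` are hypothetical, and the edges are handled by replicating border pixels, which is just one possible choice:

```julia
# Normalized (2r+1)×(2r+1) Gaussian kernel; entry k[di + r + 1, dj + r + 1]
# holds the weight for offset (di, dj), with di, dj in -r:r.
function gaussian_kernel(r::Int, σ::Float64)
    k = [exp(-(di^2 + dj^2) / (2σ^2)) for di in -r:r, dj in -r:r]
    return k ./ sum(k)
end

# Convolve `img` with the kernel, replicating edge pixels at the borders.
function blur(img::AbstractMatrix{<:Real}, k::AbstractMatrix{<:Real})
    r = (size(k, 1) - 1) ÷ 2
    rows, cols = axes(img)
    out = similar(img, Float64)
    for j in cols, i in rows
        acc = 0.0
        for dj in -r:r, di in -r:r
            ii = clamp(i + di, first(rows), last(rows))
            jj = clamp(j + dj, first(cols), last(cols))
            # The manual "+ r + 1" shift is what an offset-indexed kernel removes.
            acc += k[di + r + 1, dj + r + 1] * img[ii, jj]
        end
        out[i, j] = acc
    end
    return out
end
```

With OffsetArrays.jl you should be able to wrap the kernel as `OffsetArray(k, -2:2, -2:2)` and write `k[di, dj]` directly. The flip side is that any code receiving that kernel must not assume its indices start at 1.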


Indexes being configurable makes a ton of sense. It's why so many languages end up with a slice type (or array_view or span or whatever you want to call it). Why shouldn't the base array type just itself be the slice type?


What do you mean by 'mistake'? How are the Julia devs going to stop someone from defining arrays with configurable indices?

Are you suggesting that the core language should somehow make this impossible? How?


> Given Julia’s extreme generality it is not obvious to me that the correctness problems can be solved. Julia has no formal notion of interfaces, generic functions tend to leave their semantics unspecified in edge cases, and the nature of many common implicit interfaces has not been made precise (for example, there is no agreement in the Julia community on what a number is).

Does all that apply to Python? I think so? Yet apparently similar problems don't exist in python, and even one of the examples in OP had the reporter moving to python to have no problems getting the same thing to work that was problematic in Julia.

In a language intended for math, I do understand the desire to have something with more formal properties suited for guarantees and such. But Python seems to be doing just fine in that domain without those features, so, I'm not sure what we should conclude here.


The main difference between Julia and python is that most of the "core" python ecosystem has had a lot more dev time put into it. Google, Facebook, and Microsoft all have hundreds of full time developers on major python packages.


Makes sense. I guess the author's contention is that if Julia had those formal features the author wants, it would need very significantly less dev time to reach python's levels of reliability?

It's of course plausible, since that's what those sorts of features are intended for, but I'm not absolutely confident. At any rate, python demonstrates that they are not the only path, contrary to what the author seems to be suggesting ("it is not obvious to me the problem can be solved" without these features, says the author. But it's not obvious to me that those features are either necessary or sufficient to solve the problem...)


Python's reliability here comes because it is a much less flexible language in some ways. If you write your own array type in python, and pass it into tensorflow, you would expect it to error. If you do the same thing in Julia, you would expect it to work.


More like you hope it works is what I'm gathering.


My opinion is that Julia was too ambitious from day one. Reimplementing the whole scientific computing stack AND a new modern language with an innovative type system and introspection AND perfecting tooling is just too big an effort.

Correctness as a priority has been drowned out by too many other issues, and here we are with a 10-year-old language with a very perfectionist and ambitious mindset that is still unripe in basically everything. It's not a few rough edges; it's just too many edges, most of them rough.

I cannot help thinking that if the same amount of people focused on a much smaller goal we could have something much more usable today. As it is now I know Julia won't be production ready for at least 10 years. And that's in the lucky case that it doesn't become irrelevant in the meantime.


If Julia followed your recommendation, it would be irrelevant before it ever started.

There is no way for a new language to be useful or relevant unless it brings significant improvements.


In terms of human time, I have found R to be the fastest for iterative prototyping, exploring, and visualising data.

R still has the best statistical package ecosystem, although python is catching up.


Actual title: “Why I no longer recommend Julia”.


@Dang could we get the title corrected?


Unfortunately this is not a feature but a bug, and the worst kind, a bug at the language design level:

> Basically, Julia tries very hard to make composability work, even if the authors of the packages that you're composing don't know anything about each other. That's a critical feature that makes Julia as powerful as it is, but of course you can easily end up with situations where one or the other package is making implicit assumptions that are not documented (because the author didn't think the assumptions were important in the context of their own package) and you end up with correctness issues.


It's an interesting point in the language design space. Composing unrelated projects gives a rapidly increasing state space of interactions which no one is directly responsible for. I can't decide if that's brilliant or broken by design.


why not both?


This seems hard to evaluate without a quantitative comparison to the abundance of bugs in the package ecosystems of other languages at the same age. So, for instance, how many correctness bugs existed (or, alternatively, had been found and fixed) in the Python ecosystem when Python was ten years old? The author makes a subjective claim, but from the few other languages they mention it seems they are comparing primarily to older and more stable ecosystems.


Correctness in Julia feels like it'll never happen, because interfaces seem like they'll never happen.

Correctness guarantees / interfaces and slow startup are both my biggest pain points in Julia.

I often think what would happen if every Julia dev just dropped the language and used Rust instead. A scientific ecosystem in Rust would be amazing.


As someone who really likes both Rust and Julia, there is absolutely no way Julia's scientific users would switch to a static language. Rust is slow to write, verbose, also suffers from long compile times, has no REPL or garbage collector... It is deeply unsuitable for scientific coding.


> OffsetArrays in particular proved to be a strong source of correctness bugs. The package provides an array type that leverages Julia’s flexible custom indices feature to create arrays whose indices don’t have to start at zero or one.

I always thought this sounded like a bad idea. I remember one time I was working with a C++ guy on a Matlab project, and he handed me some Matlab code with 0 based indexing assumed. I said "Did you even run this code?", and he assured me he had. But of course he had not, because if he did it would have complained about the 0-based indices. But the point is that it did complain when I ran it, and I was able to match it to my code. I imagine in Julia he would have used 0-based indices, and I would have used 1-based, and our programs would have silently failed.


For it to silently fail, though, he would have had to explicitly use the OffsetArrays package and explicitly switch all `Array`s to `OffsetArray`s (which hopefully you would notice), and then you would have to go ahead and use those OffsetArrays in a package which doesn't support them; if you just go ahead and use 0 as an index in plain Julia code, it will error as you would expect.
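A quick check of that last claim: an ordinary `Array` starts at 1, and index 0 raises a `BoundsError`. The helper `zero_index_throws` is hypothetical. Note that this only holds while bounds checking is active; inside `@inbounds` the check is skipped and an out-of-range index is undefined behavior.

```julia
a = [10, 20, 30]     # a plain Vector: indices are 1:3
firstindex(a)        # 1

# Returns true if indexing `v` at 0 throws a BoundsError (hypothetical helper).
function zero_index_throws(v)
    try
        v[0]
        return false
    catch err
        return err isa BoundsError
    end
end

zero_index_throws(a) # true
```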


> In my experience, Julia and its packages have the highest rate of serious correctness bugs of any programming system I’ve used, and I started programming with Visual Basic 6 in the mid-2000s.

Oh God, is this what qualifies you as "old" now?


Kids these days, eh? Lawn, etc.


Nah, it's not like that. In my mind an "older" programmer is like from the 90s.

But I did that stuff in the mid-2000s too: am I now an "old"?

Terrifying!


Relax: No, you're not. You've got another fifteen years.


A lot of these issues can be fixed. Adding robust type constraints (e.g. traits) and accompanying "static analysis" tooling would help a lot. Julia can learn a lot from ML-family languages (e.g. OCaml, Haskell) in that regard. And there are efforts in the Julia community to add these features via third-party libraries. However, I don't see things improving unless such features are baked into the language and used more ubiquitously in open source modules.
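For context, the closest thing Julia has today is the dispatch-based "Holy traits" convention, which Base itself uses for `IndexStyle`. Below is a minimal sketch of the pattern. The trait and function names are hypothetical. Note the gap the comment points at: nothing checks that a new array type declares the right trait, so the default has to be conservative.

```julia
# Hypothetical trait types describing where an array's indices start.
abstract type IndexOrigin end
struct OneBased <: IndexOrigin end
struct UnknownOrigin <: IndexOrigin end

# Conservative default: make no assumption about where indices start.
IndexOrigin(::Type{<:AbstractArray}) = UnknownOrigin()
# Base's Array is always 1-based.
IndexOrigin(::Type{<:Array}) = OneBased()

# Public entry point forwards to a trait-specific method via dispatch.
first_element(A::AbstractArray) = first_element(IndexOrigin(typeof(A)), A)
first_element(::OneBased, A) = A[1]
first_element(::UnknownOrigin, A) = A[firstindex(A)]

first_element([5, 6, 7])              # 5
first_element(view([1, 2, 3], 2:3))   # 2
IndexStyle(Vector{Int})               # IndexLinear(), Base's own trait of this kind
```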


I tried Julia but the compilation time for interactive use was just too insane.

I ended up paying £125 for MATLAB. Nothing else really remotely compares to MATLAB's plotting facilities.


Have you tried Octave, GNU's numerical package that is compatible with MATLAB?


Of course! The language implementation is decent and the GUI is promising, except for the most important feature of the GUI - the plot viewer, which is completely awful. Forget about the same league, it's not even playing the same game as MATLAB.


I use Matlab daily, and the plotting is indeed excellent.

But the language itself is a horrible kludgy mess. Most of the development time is spent on input parsing and contorting your code into a vectorized shape.


Yeah I agree the language is not great, especially for non-matrix things. But Julia isn't exactly great either (unsurprisingly since it is pretty much a MATLAB derivative).

But I only use it for prototyping. I would absolutely not recommend it for production code. If there's some gnarly input processing to be done I'll do that in another language and just have it output CSV or similar.


Similar correctness issues are a big part of the reason that, several years ago, I submitted a series of pull requests to Julia so that its entire test suite would run without memory errors under Valgrind, save for a few that either (i) we understood and wrote suppressions for, or (ii) we did not understand and had open issues for. Unfortunately, no one ever integrated Valgrind into the CI system, so the test suite no longer fully runs under it, last time I checked. (The test suite took nearly a day to run under Valgrind on a fast desktop machine when it worked, so is infeasible for every pull request, but could be done periodically, e.g. once every few days.)

Even a revived effort on getting core Julia tests to pass under Valgrind would not do much to help catch correctness bugs due to composing different packages in the ecosystem. For that, running tests with `--check-bounds=yes` is probably a better solution, and much quicker to execute as well (see e.g. https://github.com/JuliaArrays/OffsetArrays.jl/issues/282).


I tend to be a bit wary of dynamic languages with sophisticated, performant implementations of complex abstractions, especially if they have somewhat niche appeal. In my experience this is a combination that makes for running into a lot of implementation bugs. For example, I've run into many more nasty compiler bugs with lisps (and julia at least qualifies as an almost-lisp) than with more simple-minded dynamic languages like python or erlang[1] or fairly sophisticated but niche statically typed languages.

I think watching Julia over the next few years will be quite interesting: it's the only dynamically typed language that has both sophisticated abstractions and a sophisticated implementation[2] that has enough pull to have a chance to become entrenched in certain domains. I wonder to what extent they will be able to get this problem under control.

[1] BEAM, unlike cpython, is actually a marvel of engineering and making very deliberate trade-offs. But it's not very complex.

[2] Javascript is of course the one pervasive dynamically typed programming language that has sophisticated implementations, but of mostly ill-conceived constructs.


It would be interesting to know which language the author currently uses.


The author mentions that he was stuck on a problem for weeks using Julia, but solved it with Python within hours


That was someone else: Patrick Kidger is mentioned in the article. If I look at the author's github, it's go and javascript.


You're right. I misread.


Pretty sure it was Go last time I talked to Yuri, he is very much a stand-up guy.


The examples provided feel more like bugs in various libraries than an actual problem intrinsic to Julia the language.


According to the article the problem is in the ecosystem, and partly the standard lib.

Basically, it doesn't matter if Julia the language is fine if all the stats packages make wrong calculations. Then what is the point of Julia, if you have to rewrite everything? You might as well use another language where you trust the results of the ecosystem, since it is the ecosystem you need in order to produce results.


All bugs mentioned had been quickly fixed: https://news.ycombinator.com/item?id=31397425


That comment doesn't say all bugs have been fixed, or even quickly fixed. When I check on the posted links, many are in fact still open, e.g.

https://github.com/JuliaStats/Distributions.jl/issues/1253

https://github.com/JuliaStats/StatsBase.jl/issues/642

https://github.com/JuliaStats/StatsBase.jl/issues/616

https://github.com/JuliaLang/julia/issues/39385


I don't get these complaints about `sum!(a, a)`. Sure it's a bit of a footgun that you can overwrite the array you are working with. This doesn't rise to a "major problem" of composability.

The histogram errors seem annoying though. Hopefully they can get fixed.


Sure, it's unsurprising that it produces unexpected results, but there are actually semantics that should be expected. The problem is that implementing those semantics correctly for all cases is hard because of aliasing. It's the same issue that e.g. memcpy() vs memmove() have.


What semantics are expected in other languages? This seems solidly in the realm of undefined behavior as far as I can tell.


The obvious semantics for these functions is that f!(a, args...) should do the same thing as a .= f(args...).

It's only undefined behavior because the simple implementations don't do that in the presence of aliasing.

I brought up memcpy() and memmove() (which in C are copying identity functions on bytes) exactly to make this point. memcpy() has undefined behavior when the source and destination ranges overlap (so it can be implemented as a simple loop), while memmove() does the right thing if they do overlap, at the cost of having to check in which direction they overlap. And in C you can easily check whether they overlap and in what direction, because the only interface is the pointer. Checking aliasing for objects with more complicated internals is difficult, perhaps too difficult to expect. But it is possible if you're only handling your own objects: witness analogous behavior being specified in numpy: https://docs.scipy.org/doc/numpy-1.13.0/release.html#ufunc-b... . They do note that this can require allocation, even in some cases where it shouldn't. But not allocating is of course most of the point of the in-place versions.
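Here is a minimal sketch of the memmove-style approach in Julia, using a hypothetical in-place `reverse_into!` whose intended semantics are `dest .= reverse(src)`. It relies on `Base.mightalias`, which is an unexported Base internal, so treat its availability as an assumption that could change between versions. It pays for a copy only when the arguments might share memory, which is the same allocation cost noted above for numpy.

```julia
# Naive version (memcpy-like): wrong if dest and src share memory, because it
# reads entries that have already been overwritten.
function naive_reverse_into!(dest::AbstractVector, src::AbstractVector)
    lo, hi = firstindex(src), lastindex(src)
    for i in eachindex(dest)
        dest[i] = src[lo + hi - i]
    end
    return dest
end

# memmove-like version: same semantics as `dest .= reverse(src)`, even when aliased.
function reverse_into!(dest::AbstractVector, src::AbstractVector)
    axes(dest) == axes(src) || throw(DimensionMismatch("dest and src must share axes"))
    # Base.mightalias is an unexported internal; copy only if memory may be shared.
    s = Base.mightalias(dest, src) ? copy(src) : src
    lo, hi = firstindex(s), lastindex(s)
    for i in eachindex(dest)
        dest[i] = s[lo + hi - i]
    end
    return dest
end

a = [1, 2, 3, 4]
naive_reverse_into!(a, a)   # [4, 3, 3, 4], corrupted by aliasing
b = [1, 2, 3, 4]
reverse_into!(b, b)         # [4, 3, 2, 1]
```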


Thanks for the detailed response.

Yeah allocation seems like the biggest hangup here. I would rather have a function stick to a "no allocating" contract and allow for some undefined behavior than have a function unexpectedly allocate to preserve safety.


Yea, all are just bugs, not some intrinsic flaws in the language.

Given Julia's goals (performance, abstractions, accessible to science people), it's understandable if they had slightly higher bug concentration than other (similarly sized) ecosystems.


The author's argument is that the bugs all share a pattern, and thus there is an intrinsic flaw. That doesn't necessarily mean the community wants to fix the intrinsic flaw, just like nobody is really interested in fixing the intrinsic memory safety flaws of C. But they shouldn't be denied as real risks, either, or a tradeoff of some kind.


@inbounds is a Base feature.


Yes, and it is a perfectly fine feature when applied correctly. It would be incorrect to assume that an `AbstractArray` starts at `1` or `0` which is why the updated example now correctly uses `eachindex`: https://docs.julialang.org/en/v1/devdocs/boundscheck/#Elidin...

If you want to assume that an array starts at `1`, you need to require an `Array` rather than an `AbstractArray`.


@inbounds isn't the problem, it's incorrect usage of it. The poor docstring is absolutely a problem though, you should be iterating over eachindex(A), not 1:length(A).
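The `1:length(A)` vs `eachindex(A)` difference can be demonstrated without installing anything. This is a sketch under stated assumptions. `ZeroBasedVector` is a hypothetical, deliberately minimal stand-in for OffsetArrays.jl, and the three sum functions are illustrative names. `Base.require_one_based_indexing` is a Base utility that is not exported. The naive loop is run *without* `@inbounds` here so that the mistake raises a `BoundsError` instead of reading past the end of the data.

```julia
# Hypothetical minimal vector with indices 0:n-1, standing in for OffsetArrays.jl.
struct ZeroBasedVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::ZeroBasedVector) = size(v.data)
Base.axes(v::ZeroBasedVector) = (0:length(v.data)-1,)
Base.IndexStyle(::Type{<:ZeroBasedVector}) = IndexLinear()
Base.@propagate_inbounds function Base.getindex(v::ZeroBasedVector, i::Int)
    @boundscheck checkbounds(v, i)   # elided when a caller uses @inbounds
    return v.data[i+1]
end

# The problematic pattern: assumes the indices are 1:length(A).
function naive_sum(A::AbstractVector)
    s = zero(eltype(A))
    for i in 1:length(A)
        s += A[i]
    end
    return s
end

# Generic version: eachindex(A) only yields valid indices, so @inbounds is safe.
function generic_sum(A::AbstractVector)
    s = zero(eltype(A))
    @inbounds for i in eachindex(A)
        s += A[i]
    end
    return s
end

# Alternative: keep the 1-based loop but reject offset arrays up front.
function checked_sum(A::AbstractVector)
    Base.require_one_based_indexing(A)   # throws ArgumentError for offset axes
    s = zero(eltype(A))
    @inbounds for i in 1:length(A)
        s += A[i]
    end
    return s
end

v = ZeroBasedVector([1, 2, 3])
generic_sum(v)   # 6
```

Here `naive_sum(v)` throws a `BoundsError` at index 3, and `checked_sum(v)` throws an `ArgumentError` before touching any element.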


I wonder why not much is done to bring high performance scientific computing to common lisp. There are some interesting projects I was able to find like https://github.com/clasp-developers/clasp and https://github.com/marcoheisig/Petalisp and https://github.com/takagi/avm. But I guess it would be good to have a coordinated effort in this area.


A devastating article for Julia. I was thinking about trying Julia out... but not after reading this.


Oof, accessing out of bounds memory is pretty surprising to me for a dynamic language ... But I guess it's not surprising if your goal is to compile to fast native code (e.g. omit bounds checks).

I don't know that much about how Julia works, but I feel like once you go there, you need to have very high test coverage, and also run your tests in a mode that catches all bound errors at runtime. (they don't have this?)

Basically it's negligent not to use ASAN/Valgrind with C/C++ these days. You can shake dozens or hundreds of bugs out of any real codebase that doesn't use them, guaranteed.

Similarly if people are just writing "fast" Julia code without good tests (which I'm not sure about but this article seems to imply), then I'd say that's similarly negligent.

-----

I've also learned the hard way that composability and correctness are very difficult aspects of language design. There is an interesting tradeoff here between code reuse with multiple dispatch / implicit interfaces and correctness. I would say they are solving O(M x N) problems, but that is very difficult, similar to how the design of the C++ STL is very difficult and doesn't compose in certain ways.

(copy of lobste.rs comment)


You can also use `julia --check-bounds=yes` — and our testing frameworks automatically do so.


Does Python have similar issues?


Think about programming layers: A->B->C->D->...->Compiler->binary->output, where A is the end programmer, and B, C, D are the libraries and modules. I think what the article describes is not much different from issues in any complicated software system, as quite a few comments have also pointed out. However, as the language becomes more expressive and the compiler becomes cleverer, more of the issues will be rooted in the compiler->binary link. I think this is inevitable with the current model of how software works, which I can simplify as: A -> [super compiler] -> output

The middle part is the concatenation of all the middle links and handles the complexity necessary to translate from language to output. As we try to make A less complex, the middle [super compiler] will get more complex, and buggier because of that complexity.

I believe the fundamental issue with this model is the lack of feedback. A gets feedback on the output and makes changes (in A) until the output is correct. With the big, complex, and opaque middle, for one, we can't get full feedback on the output -- that is the correctness issue. The more complex the middle gets, the less coverage testing can achieve. For two, even with clear feedback -- a bug -- A cannot easily fix it, because the logic from A to the output is no longer understandable.

I believe the solution is to abandon the pursuit of a magic A -> [super compiler] -> output and instead focus on how to get feedback from every link in A->B->C->D->...->compiler->binary->output.

For one, this gives A a path to approach and handle complexity. A can choose to check on B or C or ... or directly on the output, depending on A's understanding and experience. At the very least, A can point fingers correctly.

For two, this provides a path to evolve the design. The initial decision about which link handles which complexity, or how much of it, is no longer crucial. Each link, from A to B to C ... to the compiler, can adjust and shift the complexity up and down, and eventually settle into a system that fits the problem and the team.

I believe this is how natural language works. Initially, A tells B to "get an apple", gets direct feedback on which apple B brings back, and may refine A's own layer by adding more detail until it gets the right result. Then, some of the details will be handled by B, and A can give feedback on B's intermediate behavior. As the world gets more complex, the complexity at layer A stays finite but we add middle layers. Usually, A only needs feedback on its immediate link (B) and the final output, but B needs to be able to get feedback on its own immediate link, and if A is capable, A may choose to cut out one of the middlemen.


[flagged]


I'm guessing you've never used Python for a serious project before, because your statement is incorrect. Aside from the fact that Python is memory safe, my experience has been that it is far easier to avoid logic errors in Python than in C/C++.

Equivalent Python code is shorter and less noisy than C/C++, and Python makes it easier to create succinct and simple to use abstractions. The result is that code is far easier to audit. Furthermore, my experience has been that it is much easier to create unit and integration tests in Python, which means that people actually do it and/or test more comprehensively.

My experience has been that if the performance cost of Python is no object then there is a correctness benefit over C/C++ for a disciplined programmer. I suggest you gain some more experience with both languages before making an offhand comparison.


In my experience it's much harder to write Python that won't crash than C/C++ that won't crash. With Python, rarely taken code paths can have very dumb errors in them (e.g. accidentally introducing a new variable with a different name than the variable you wanted or typoing a method name, or passing arguments with the wrong type) that would get caught by a C/C++ compiler but won't be caught in Python until the crash happens some months later. Typically in "reliable" code you're not going to do much dynamic allocation anyway, at least past the initialization stage (because memory fragmentation!) so memory errors aren't that common anyway.

Yes, probably Rust and Ada provide even more guarantees...


C/C++ memory errors are literally responsible for 70% of ALL critical security bugs. Most python "dumb errors" can be caught by linters. Memory is infinitely harder.

Source -- https://msrc-blog.microsoft.com/2019/07/16/a-proactive-appro...


The linter can't tell that the method doesn't exist until runtime, can it? If so, it wouldn't accept lots of valid python... Also, what linter should I use?

Either way, not all software needs to be secure to an adversary (since it's running behind several firewalls and if the attacker has shell access, then it's already game over).


And Python is implemented as a large C program.


66% python, 32% C - for cpython.

94% python, 5% C - for pypy.


cpython is the reference implementation and 32% means about 350,000 lines of code. My point stands.


So is glasgow haskell compiler or ocaml compiler - so what?


> ...there is a correctness benefit over C/C++ for a disciplined programmer.

Businesses want cheap programmers, not disciplined programmers. It's a great utopia if everyone just follows the best practices and does it perfectly all the time, but I guarantee that as margins become thin people writing, say, code for autonomous vehicles are doing the _very_ minimum to meet correctness requirements.


Genuinely curious (I don’t work with low level systems day-to-day), but why would C/C++ be better for safety?

I would imagine that the risk of bugs related to memory access and data racing would be pretty high, compared to a higher-level language or something like Rust that focuses on safety.


AFAIK (and I am not an expert), when they use C/C++ in automotive systems (in the automotive part, not elsewhere), they may tend to use deterministic programming (e.g. fixed-length arrays, no allocations, etc.), which eliminates most of these bugs.

But, again, I am not an expert and Toyota's sudden acceleration problem was said to be caused by bad C programming (said by some, not by others).


I thought it was caused by loose floormats.


Tooling and experience. In principle, there are other languages that might be better than C or C++ for safety-critical software. You know all the hoary jokes about the difference between theory and practice?


> You know all the hoary jokes about the difference between theory and practice?

Only in theory; do you have a practical example?


Can't a guy leave anything as an exercise to the reader anymore?


If you really want safety, use Rust or Ada.


Or Nim [1]..kind of an Ada with Lisp macros and more Pythonesque surface syntax.

All three are ahead-of-time compiled/less REPL friendly than Julia, though. Taking more than 100 milliseconds to compile can be a deal breaker for some, especially in exploratory data analysis/science settings where the mindset is "try out this idea..no wait, this one..oops, I forgot a -1" and so on. In my experience, it's unfortunately hard to get scientist developers onboard with "wait for this compile" workflows.

[1] https://nim-lang.org/


The Python-like syntax is a minus for me. I didn't care for semantic leading whitespace in ABC. I don't care for it in Python. I therefore doubt I'd care for it in Nim. If that's your thing, an Ada with Lisp-like macros sounds delightful.


It is definitely more possible in Nim than in Python to use parens () in many (but not all) places to be more like bracy languages (but with parens being the brace..).


I've actually become somewhat interested in Hy lately. Having a Lisp that targets the Python AST may come in handy. If there was a Nim facility for full Lisp or a very Lispy syntax that could be really nice.

https://github.com/hylang/hy


Even if compiling takes only 100 ms, in practice it costs about 2 seconds, since the user has to type in another command and register that it has finished.


Is C/C++ really such a better choice in safety-critical systems? It's notorious for having all sorts of buffer overflows and memory issues on unexpected input.


MISRA C or MISRA C++ are used, enforcing much stricter guarantees than what the C or C++ specs provide.


IMO MISRA C, CERT C, and CERT C Secure should be standardized together into a compiler with a different language name. I'm aware there are other tools. If the compiler itself enforced everything to do with those standards and rejected violations as invalid code, we could have a much improved grounding for that language. There are already languages similar to a safer C, so we know there's a demand.


like dropping memory safety! oh, wait…


> tons of C++/C engineers needed ... educational background is irrelevant, but all must pass hardcore coding test.

https://twitter.com/elonmusk/status/1224182478501482497?lang...


When Julia was first released, I tried it out and decided I'd write a syntax highlighter for it, so I asked for a grammar. There wasn't one. I was told to refer to the parser source code, which was written in a custom dialect of LISP. That was a red flag for me and I never returned.


Oftentimes people describe languages as "Turing complete" but how often do they talk about languages being "Gödel incomplete?" Another way of stating maybe is "Are what some call flaws what others call features?"

https://stackoverflow.com/questions/7284/what-is-turing-comp...

https://plato.stanford.edu/entries/goedel-incompleteness/


Even fairly simple arithmetic is incomplete, so unless the language is heavily restricted, allowing only multiplication of positive integers (x)or addition of natural numbers, they're all going to be incomplete.



