
I'm a quite happy Julia user, but I feel there are still some warts in the language that should have warranted a bit more time before banging 1.0 on the badge.

Exception handling in Julia is poor, which reminds me of how exceptions are poorly handled (or not handled at all) in R. Code can trap exceptions, but not directly by type as you _would_ expect. Instead, the user is left to check the type of the exception in the catch block. Aside from creating verbose boilerplate at every catch, it's very error prone.
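A minimal sketch of the pattern being described (the function name `safe_parse` is made up for illustration): the catch block receives the raw exception value, and dispatch on its type has to be done by hand:

```julia
# Julia has no `catch err::SomeType` syntax; you inspect the exception yourself.
function safe_parse(s)
    try
        parse(Int, s)
    catch err
        if err isa ArgumentError   # the one case we actually want to handle
            nothing
        else
            rethrow()              # anything unexpected propagates unchanged
        end
    end
end
```

Forgetting the `rethrow()` branch is exactly how overly broad catches swallow unrelated errors.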

Very few packages do it right, and as in R, exceptions either blow up in your face or fail silently because they're handled incorrectly upstream by a catch that is too broad.

Errors, warnings and notices are also often written as if the only use case is a user watching the output interactively. As with R, it's possible but quite cumbersome to consistently fetch the output of a Julia program and be certain that stdout contains only what >you< printed. As I also use Julia as a general-purpose language to replace Python, I feel that Julia is a bit too biased toward interactive usage at times.

That being said, I do love multiple dispatch, and Julia overall has one of the most pragmatic implementations I've come across, which also makes me forget that I don't really like 1-based array indexes.

In 0.7 I put quite a lot of design effort into the logging system to allow log records to be redirected in a sane way. It's by no means a complete system, but I think the changes were worthwhile and the core infrastructure is now in place to allow consistent and structured capture of messages from any package which uses the Base logging interface.
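A quick sketch of what that redirection looks like through the Base logging interface (the buffer and message here are illustrative):

```julia
using Logging

# Route log records to an IOBuffer instead of stderr, filtered at Info level.
buf = IOBuffer()
with_logger(SimpleLogger(buf, Logging.Info)) do
    @info "fitting model" n = 100
    @debug "below the Info threshold, so this record is filtered out"
end
captured = String(take!(buf))
```

Because records carry structured key/value pairs (`n = 100` above), downstream consumers can do more than scrape text.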

Exception handling is indeed somewhat of a wart with the core system not having changed much since very early on. I think there's still some serious design work to do and the core people take it seriously. On the other hand, I suspect that numerical codes don't want to use exceptions for much other than "panic and escape to an extreme outer scope". So a lot of the scientific users are probably content for the time being.

I had a similar reaction about this being a little premature.

On the other hand, I'm wondering if this will help a little with the dependency hell that's caused me to drift away from Julia over the last year or so.

At first I was fairly excited about Julia, and greatly preferred it over R or Python for numerical work, library resources aside. It was fast and I liked the language design itself.

Over the last year or two, though, I've had recurring problems with trying to install and use packages, and them failing at some point due to some dependency not working. Sometimes the target package itself won't install, but most of the time it's some lower-level package.

It's more frustrating than not having packages available, because it creates some sense that packages are available, when they're not. At first I looked at it as some idiosyncratic case with one package, but it's happened over and over and over again.

Basically in this time I've given up on Julia, because there's such a huge discrepancy between what it looks like on the surface and what it looks like in practice, once you get out of certain limited-use cases, or when you're not coding everything yourself from the base language. (Related concerns about misleading speed claims have been raised, although my personal experience in that regard has been mixed, because my experience overall has been pretty good in some critical performance cases, but there have also been some wildly unpredictable exceptions that act as bottlenecks... but it's still much better than R or Python).

When I've tried to figure out what's going on with dependency hell, usually it's because some package that was previously working with an earlier version is no longer working. So maybe stabilizing the language will help that?

Package management is a really-really-really difficult problem that is far from solved. Not to say that your critiques are invalid (quite the opposite), but Julia is just now hitting 1.0 - by contrast Node/NPM have been around quite a bit longer and still have terrible package issues they're working to solve.

If you can, try and find a few hours to help pitch in and solve the package problems, even if it's just updating docs or updating deprecated-but-useful packages.

I know the JS ecosystem has some pretty counterproductive culture when it comes to package management (leftpad), but can you provide some examples of terrible issues still present in NPM? I hear this complaint often and I'm wondering what other people think of as insurmountable technical issues or design flaws in NPM.

I don't believe there are any insurmountable issues with NPM currently.

The current major issue, as it stands, is that it's very easy for a malicious bit of code to sneak into a heavily used JS package and have outsized effects - this happened very recently with a very popular linting-support package.

The other issue is general posting of malicious packages under mistyped names, or takeover of existing packages with malicious updates by new owners.

At the same time, nobody wants to have NPM (the org) manually vet every upload ever made. So, there's that.

Many JS packages are extremely dep heavy, overwhelmingly for minor features (checking something is an array, promise-fying FS, etc) which makes it very easy to infiltrate packages and very hard to vet a package entirely.

Finally, npm (the program) runs into a fair bit of caching woes and its own dumb bugs, which feel like they shouldn't slip into production nowadays. Oh, and sometimes npm (the website) goes down.

The answer for JS, unfortunately, is probably segmentation - as better managed and more secure package repos come up, likely with their own package managers, npm will probably have to up their game. That, I am sure, will bring a whole fresh set of issues.

Because there are far fewer packages with binary dependencies, I've overall had fewer problems than with R. Most packages are pure Julia, and there you shouldn't face issues.

With the new binary builder, binaries should be a non-issue, so long as you download an official Julia binary, or build Julia from source in the same way those binaries are made (e.g. build OpenBLAS with gfortran 7).

That's really disappointing to hear. Dependency hell is what drove me away in 2016. I hope it clears up eventually.

It is a lot better than it was in 2016. The new package manager helps a lot: separate environments per project, and upper-bounding package dependencies by default, are just going to avoid a lot of headaches.

But more generally things are maturing.

And 1.0 will help too, since things won't be chasing a moving target.

If you're not in any hurry, I'd give it 6 months of use by people who don't mind a bit of package breakage (e.g. people like me).

That will be plenty of time for everything to shake out. More than you might expect has actually already been shaken out in the last few weeks in the package ecosystem. Hitting 1.0 should give some package maintainers the drive to get it done.

I don't think _any_ language has got exception handling right yet, despite a lot of effort. The problems are particularly apparent in parallel and multithreaded programs.

That said, I am optimistic Julia will have a good solution at some point: contextual dispatch, a la Cassette.jl, enables just the sort of interventions you want for error handling. I'm not quite sure what the result will look like, but I imagine you will see some experimentation in this direction in the near future.

Exception handling is one of the few subsystems that hasn't really had a revamp. It's probably one of the areas that'll get a good bit of thinking post 1.0.

Also: not allowing the expected `catch e::ErrorType` syntax leaves it open for a better design without breaking existing code which means it can be done in the 1.x timeframe instead of needing to wait until 2.0. This is why that syntax hasn't yet been added with the expected meaning.

Related: https://github.com/JuliaLang/julia/issues/7026

Yeah, I agree with your comments about error handling. It’s far from ideal in non-interactive contexts. It’s especially disappointing since you could easily imagine something like Julia replicating Python’s success at transitioning code from interaction (e.g. Jupyter notebook) to production.

I initially defended the choice, but I now agree that 1-based indexing seems like a poor choice, since Julia has become something more than the original mission of a better MATLAB or Octave. It's an, admittedly, minor tragedy of Julia's success.

This is what 0-based indexing looks like in data analysis:

>In order to read a csv in that doesn't have a header and for only certain columns you need to pass params header=None and usecols=[3,6] for the 4th and 7th columns:


Just reading that hurts me.

This is very much a non-argument. Calling the columns the 4th and 7th is as arbitrary as calling them the 3rd and the 6th.

Again, 0-based indexing exists to fit a purpose: http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EW...

In my opinion, reading `a = b[1:n-1]` hurts much more than reading `a = b[:n]`.

>Call the columns the 4th and 7th is as arbitrary as calling them the 3rd and the 6th.

No, that's their ordinal position, and how 8 billion non-programmers would refer them as in any everyday setting.

That's also how programmers would refer to them if it wasn't for a historical accident.

What's more, that's also how programmers refer to them when they talk among themselves rather than to the machine ("check the 3rd column", not "check the column at index 2").

It's not arbitrary. It's English. In the list [apple, orange, tree] which element is orange? It's the second element.

I have taught Python quite a bit, and I have gotten good at explaining 0 based indexing and slicing based on it. When I switched to Julia there was nothing to explain. And my code has about as many +/- 1s as before...

Just because it is our common convention in lay conversation doesn’t mean it isn’t “arbitrary”.

These spoken language conventions developed before there was an established name for “zero” or even a concept that “nothing” could be a number per se.

For similar reasons, we have no zero cards in our decks, no zero face on a die, no zero hour on our clocks, no zero year in our calendar, no zeroth floors in our buildings, East Asian babies are born with age one year, etc.

It’s only by another set of historical accidents that we have a proper 0 in our method of writing down whole numbers. Thankfully that one was of obvious enough benefit that it became widely adopted.

> no zeroth floors in our buildings

In (North?) America. In Europe, there's a ground floor (zero), then first floor (1), etc. Basement is -1 (etc.).

A European friend of mine arrived at college in the USA, and was assigned a room on the first floor of the dorm. She then asked the housing office whether there was a lift, because she had quite some heavy luggage, earning some rather amused looks :-)

Yes, it would be interesting to know the history.

Whoever designed the European convention for labeling building floors was numerically competent.

Too bad medieval European mathematicians and the designers of Fortran weren’t. ;-)

The base level doesn't need to have a floor; it's just ground. Once you add a floor, you are on the first floor above ground. Really, your condescending tone, as if all the mathematicians who prefer to work with 1-based indexing are just incompetent, is grating.

I'm happy to be writing

    for i in 1:n
        func!(a[i])
    end

to iterate over an object of length n. Or split an array as a[1:m], a[m+1:n]. Slicing semantics, which are far more prevalent in my code (and the code I read) than index arithmetic, are truly vastly simplified by the 1-based indexing of Julia compared to the 0-based numpy conventions. We simply no longer code in the world that Dijkstra argued for, and I have not seen anybody give a clear argument that is actually rooted in math and contemporary programming.
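A quick sketch of that splitting idiom with concrete numbers (chosen only for illustration):

```julia
# In 1-based Julia, splitting at m gives two touching, non-overlapping halves
# with no off-by-one adjustment on either side.
b = collect(1:10)
n = length(b)
m = 4
left, right = b[1:m], b[m+1:n]
# vcat(left, right) reassembles b exactly
```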

I genuinely thought that the Python convention was brilliant, and that 1-based indexing in Julia would suck. It turned out not to be the case.

Sorry, that last bit of my comment was gratuitous.

I am legitimately (mildly) curious about the history of the different naming conventions for floors of buildings though.

> The base level doesn't need to have a floor, it's just ground. Once you add a floor you are on the first floor above ground.

Yes, my point is this is an example where the European 0-based indexing system makes more sense (in my opinion) than the American 1-based indexing system. I speculate that whoever started calling the ground floor the “first floor” hadn’t really put much thought into how well that would generalize to large buildings with many floors including some underground.

Similarly, whoever decided the calendar should start at year 1 AD with the prior year as 1 BC hadn’t really considered that it might be nice to do arithmetic with intervals of years across the boundary.

There are many standard mathematical formulas which are clarified by indexing from 0. But nobody can switch because the 1-indexed version is culturally fixed. Most of the rest of the time the 0-indexed vs. 1-indexed versions makes basically no difference. It is rare that the 1-indexed version is notably nicer.

> Or split an array as a[1:m], a[m+1:n]

Yes, I find it substantially clearer to write this split as a[:m], a[m:]. Particularly when dealing with e.g. parsing serialized structured data. But also when writing numerical code. Carrying the extra 1s around just adds clutter, and forces me to add and subtract 1s all over the place; reasoning about it adds mental overhead, and extra bugs sneak in. (At least when writing Matlab code; I haven’t spent much time with Julia.)

> no zero hour on our clocks

There is. We call it 12 for some crazy reason (it goes 12 AM, 1 AM, 2 AM, ..., 11 AM, 12 PM, 1 PM, ...).

> no zero year in our calendar

Which is quite irritating really. New Year's Day 2000 wasn't the start of the 3rd millennium, because there was no year zero.

> East Asian babies are born with age one year

But not western babies.

The 12 on a clock is a compromise between a 1-indexing oral culture and a natural 0-indexing use case (which came from the Sumerians, who had a better number system).

I don’t know the history of reported ages of Western babies.

> quite irritating really

Yes that is my point.

The equivalent of `a = b[:n]` is `a = b[1:n]`. And I don't think you can get around admitting that there is a fundamental ambiguity in the spoken statement "Take a look at the fourth column!" in a zero-based index system. You always need a follow-up question to clarify whether you mean "everyday informal speech fourth" or "zero-index fourth."

But you can say "Take a look at column 4" instead, which is unambiguous.

Calling the 3rd and 6th columns "3rd" and "6th" is hardly arbitrary.

a = b[1..n]?

> 1-based indexing now seems like a poor choice since Julia has become something more than the original mission of a better MATLAB or Octave. It’s a, admittedly, minor tragedy of Julia’s success.

I’m curious as to why this is a problem outside numerical computing. From my perspective, this is consistent with a long history of mathematics dealing with matrices that predates electronic computers.

0-based arrays are popular because C decided to deviate from what had previously been standard in math and in Fortran.

Is there a reason other than aesthetic preference and habit that makes 0-based indexes better for computing in non-numerical contexts?

I realize both indexing standards are arbitrary and boil down to “that’s the way grandpa did it,” however 1-based indexing grandpa is way older and more entrenched outside computing circles.

EDIT: I suppose with Julia it’s not that important, as other commenters have pointed out that you can choose arbitrary indexes.

> the technical reason we started counting arrays at zero is that in the mid-1960’s, you could shave a few cycles off of a program’s compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.


Thanks for the great article.

An unexpected takeaway: Michael Chastain's time-traveling debugging framework, which he had in 1995, and which still reads like sci-fi from the future[1].

Alas, this quote will probably stay relevant for a while:

>[We keep] finding programming constructs, ideas and approaches we call part of “modern” programming if we attempt them at all, sitting abandoned in 45-year-old demo code for dead languages.


Very strange article. The first part builds suspense for the great truth that's going to be revealed by the author's extensive research, insisting that it's not something as simple as pointer arithmetic.

Then the second part quotes the creator of BCPL, revealing – a-HA! – what we already knew: that the array is a pointer to the first element, the index is the offset from the pointer, and that's why it's zero.

(And after that it veers off into complaining about how the papers documenting this history cost too much money, so they didn't actually read them.)

Wow, I hadn’t seen this before. The history of this feud goes back before C. Thank you for a fascinating read!

I've always found this a rather weak argument - he even implies that 2...12 is at least as clear (it being the starting point of the text).

I also thought I'd seen a longer text focusing more on the counting/indexing.

I still don't see the appeal of "element at zero offset" vs simply "first element".

I do agree that < vs <= etc can get messy. But outside of now fairly archaic programming languages I don't see the need. Just use a construct that handles ranges, like "for each element in collection/range/etc". (Or for math, "pattern matching" (or "for n in 1..m").

I believe that may be cited in the article (but it's not in an obvious place :) )

This was a fantastic (and relevant) read, thank you for sharing it.

0-based indexing has the advantage of easily mapping to memory addresses since the index is an offset from a base address (perhaps multiplied by a unitary per-item length). I personally prefer this since I tend to think about and work with data as an offset from a starting point.

1-based indexing has the advantage of always knowing the length of the array since the index of the last element is this value. You also get "proper" counting numbers as you iterate.

I've used both in various languages. Perl lets you define the array index start to be whatever number you wish.

As noted above, Julia does too. Heck, if you wanted to, in Julia with probably <100 lines of code you could create an array type that only allows prime numbers as indices. And subject to the constraints of the raw calculations needed, it would be as fast as it can be---there is no "extra" performance overhead associated with doing basically whatever you want.
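As a rough sketch of how little code that takes (a toy 0-based vector rather than prime indices; `ZeroVector` is a made-up name, not a real package type):

```julia
# A toy 0-based vector: a handful of methods on the AbstractArray
# interface is all that custom indexing requires.
struct ZeroVector{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(v::ZeroVector) = size(v.data)
Base.axes(v::ZeroVector) = (0:length(v.data) - 1,)          # indices run 0..n-1
Base.getindex(v::ZeroVector, i::Int) = v.data[i + 1]        # shift to storage
Base.setindex!(v::ZeroVector, x, i::Int) = (v.data[i + 1] = x)

v = ZeroVector([10, 20, 30])
```

The compiler specializes these tiny methods, so there is no inherent overhead versus built-in `Array` indexing.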

Zero based arrays are frequently better in a numerical context. Many times when you’re using the index in the computation itself (FFTs for instance), zero based is what you want. For instance, the zeroth frequency (DC) is in the zeroth bin.

Which is why julia doesn't make any assumptions on how your axes are indexed. If you're working in a numerical domain where 0-indexed arrays, or symmetric arrays about the origin, or arbitrarily other transformed axes make sense, just use those.

I understand the argument, but when the default disagrees with your override, there's almost always an impedance mismatch and some pain. It's like being left handed when 99% of the interfaces in the world assume you're right handed.

People argue that zero-based is incidental, and that 1-based is the right way because of its long history in mathematical notation. I would argue that 1-based is incidental, and that zero-based is better most of the time for modern math and computer architectures.

I can understand why you might get that impression, but I'd encourage you to try out Julia and see that we're really quite good at using index-agnostic abstractions, so most code doesn't care what your arrays are indexed with. If a certain set of indices makes sense in your domain (0-based for FFTs as you say, symmetric indices about the origin for image filters, 1-based for just regular lists of things, etc.), just use it, and it'll be convenient interactively, but most library code doesn't really think about it that much.
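The index-agnostic style mentioned here mostly amounts to using `eachindex`/`axes` instead of hard-coded `1:n`; a tiny sketch:

```julia
# Works for any AbstractArray, regardless of how its axes are numbered:
# eachindex yields whatever indices the array declares.
function mysum(a::AbstractArray)
    s = zero(eltype(a))
    for i in eachindex(a)
        s += a[i]
    end
    return s
end
```

Code written this way runs unchanged on 1-based arrays, offset arrays, and anything else implementing the array interface.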

I'll (cautiously) take your word that this works transparently when I supply arrays as arguments to a library function. However, what does the library function return to me as arrays it allocates? What if I do an outer product of two arrays with different base indices?

Whatever your answer, I suspect it is more cognitive overhead to remember than "always 1 based" or "always 0 based".

> However, what does the library function return to me as arrays it allocates?

Depends what the library function does of course. If it's shape preserving (e.g. if it's a map over the array), it'll generally preserve the axes of the input array.

> What if I do an outer product of two arrays with different base indices?

You'll get an offset array indexed by the product of the axes of the input:

    julia> A = OffsetArray(1:3, 0:2)
    OffsetArray(::UnitRange{Int64}, 0:2) with eltype Int64 with indices 0:2:

    julia> B = 4:6

    julia> A*B'
    OffsetArray(::Array{Int64,2}, 0:2, 1:3) with eltype Int64 with indices 0:2×1:3:
      4   5   6
      8  10  12
     12  15  18
> Whatever your answer, I suspect it is more cognitive overhead to remember than "always 1 based" or "always 0 based".

Sure, but the real point here is that in most situations, you don't actually care what the axes are, because you use the higher level abstractions (e.g. iteration over elements or the index space, linear algebra operations, broadcasts, etc.), which all know how to deal with arbitrary axes. The only time you should ever really have to think about what your axes are is if there's some practical relevance to your problem (OffsetArrays are used heavily in the images stack).

Not really. Imagine taking a rectangular region of interest from an image. With offset arrays, you can use the original indices if that suits you. I’d say that’s strictly better than using always zero based offsets.

Well for FFTs you really want periodic indices. Luckily Julia has those too: https://github.com/JuliaArrays/FFTViews.jl

I do a lot of FFTs for a living - I really want my zeroth frequency in my zeroth bin. Negative indices (as done by Python and other places) are nice though.

This points to another example where 0-based should be preferred. When doing modulo arithmetic, 0..N-1 mod N gives 0..N-1, but 1..N mod N puts the zero at the end. I also cringe at languages where -1 mod N does not equal N-1.
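For what it's worth, Julia's `mod`/`rem` pair already splits along exactly this line:

```julia
# mod takes the sign of the divisor; rem takes the sign of the dividend.
mod(-1, 10)   # 9   (so -1 mod N == N - 1, as the comment wants)
rem(-1, 10)   # -1  (the C-style remainder)
mod(11, 10)   # 1   (0..N-1 wraps cleanly under mod)
```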

Yes, people should never use “mod” or % as a synonym for “remainder”. It is horrible.

For any language with a mod operator, a mod b should always be equal to (a + k×b) mod b for any integer k.

Breaking this invariant makes the mod operator useless in pretty much every application I ever have for it. In e.g. JavaScript I need to define a silly helper function like

    mod = (x,y) => x%y + y*(x%y && x>0^y>0)
And then remember to never use the native % operator.

What language defines % as mod and not remainder?

According to Wikipedia: Perl, Python, Lua, Ruby, Tcl, R (as %%), Smalltalk (as \\), various other languages under some alternate name, sometimes with the two variants given different names.

The other (“remainder”) version which takes its sign from the first argument is pretty much worthless in practice. IMO it doesn’t need any name at all. But what it definitely doesn’t need is a shorter and more convenient name than the useful modulo operator. Its ubiquity is a serious screwup in programming language design, albeit largely accidental.

I guess because remainder is simpler to implement? (anecdotally, the assembly generated by @code_native in Julia is much shorter for remainder)

In my opinion neither are used sufficiently often to justify taking up a valuable ASCII symbol (same with xor).

The other point is that remainder makes much more sense for floating point numbers, as it's exact. mod on the other hand is not, and is fiendishly difficult to get "right" (if that is even possible).

In what context do you use it? I use a “mod” operator all the time for floating point calculations, and have never come across a need for the other one.

As one simple example, it is frequently useful to map floats into the range [0, 2π), but I have never once wanted to map positive floats into the range [0, 2π) while negative floats get mapped into (–2π, 0] by the same code.
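A small illustration of that difference for angle reduction, using Julia's built-in `mod2pi`:

```julia
# mod2pi maps any angle into [0, 2π); rem keeps the dividend's sign,
# so negative angles stay negative.
mod2pi(-0.5)      # ≈ 2π - 0.5
rem(-0.5, 2pi)    # -0.5
```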

Well, I have indeed spent more time thinking about this than is probably healthy, but here are some nice edge cases to think about:

- https://github.com/JuliaLang/julia/issues/14826

    mod(-eps(), 4.0)
- https://github.com/JuliaLang/julia/issues/17415

- https://github.com/JuliaLang/julia/issues/3127

- https://github.com/JuliaLang/julia/issues/3104

I'm pretty sure there are more issues.

re: mod 2pi, typically it is most accurate to reduce to (-pi,pi) (i.e. division rounding to nearest). Also to get it accurate you need fancy range reduction algorithms, hence julia has rem2pi https://docs.julialang.org/en/stable/base/math/#Base.Math.re...

Without knowing the intended use case of the code, the edge case behavior in some of these cases is pretty meaningless. Different behavior might make sense in different applications, and in many applications the choice is largely irrelevant. Whichever one you choose someone who wants the other version will have to work around it.

These examples don’t really have much bearing on the general usefulness of “remainder” vs. “modulo” though.

This depends on what the behavior is of the division operator. People should use floor division, then implementing modulo is easy.

I don’t know what is available as chip instructions, I could certainly believe hardware designers made the wrong choice.

I take the opinion that people should have the option to round division (and compute remainders) however they want: down, up, to zero, to nearest (with different options for breaking ties).


Common Lisp has all four division operators ('floor', 'ceiling', 'truncate' (equivalent to what most languages consider division), and 'round') [0], and both 'mod' and 'rem' [1].

[0] http://www.lispworks.com/documentation/lw50/CLHS/Body/f_floo...

[1] http://www.lispworks.com/documentation/lw50/CLHS/Body/f_mod_...
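For comparison, Julia's Base exposes the same family:

```julia
# Floor, ceiling, and truncating division, with their paired remainders.
fld(-7, 2)   # -4  (floor division)
cld(-7, 2)   # -3  (ceiling division)
div(-7, 2)   # -3  (truncation, like C)
mod(-7, 2)   #  1  (pairs with fld)
rem(-7, 2)   # -1  (pairs with div)
```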

> I really want my zeroth frequency in my zeroth bin

That's precisely what it does!

> I also cringe at languages where -1 mod N does not equal N-1.

    julia> mod(-1, 10)
    9

A 1-character infix operator is much cleaner to read than a 3-character name of a 2-parameter function.

But admittedly when you use 1-indexed arrays, any kind of modulo operator becomes pretty inconvenient a lot of the time (lots of futzing to avoid off-by-1 or boundary errors). So maybe it doesn’t matter in the Julia context.

Three years ago, when I found out about Julia (and quickly fell in love with the language), I was not happy at all about the 1-based indices and column-major storage, and at the time a lot of the responses I got were along the lines of "Julia is for mathematicians, and 1-based indexing and column major are just fine for us". Now Julia is able to have indices with any base that you want, and can handle row-major storage as well (thanks to Tim Holy's great work). Why is anybody concerned about this anymore? Julia can do whatever you want; you shouldn't be stuck with languages that can ONLY do 0-based indexing.

> I initially defended the choice, but I now agree that 1-based indexing now seems like a poor choice since Julia has become something more than the original mission of a better MATLAB or Octave. It’s a, admittedly, minor tragedy of Julia’s success

Especially since it would have been very easy to 'do it like Ada' and allow any start index by default (I have used Lua, and it's really annoying to use an extension language with 1-based indexing when the base language is 0-indexed).

The problem is the default behaviour: in Ada you can use whatever you want, in Julia the default is 1-based which is quite controversial.

Picking whatever you want is one line of code, and similar to the line you'd need to allocate any array. And your choice propagates to the downstream operations.

Instead of complaining in the abstract, check it out, you'll be impressed. https://julialang.org/blog/2017/04/offset-arrays

I already know about this, but if you use someone else's code from a library, in Julia the library will most likely only work with 1-indexed arrays; in Ada it will work with any base index.

Default matters!

It's straightforward to write library code in Julia that handles any type of index. If it doesn't, it is probably old code from before offset arrays existed. Take a look at Tim's linked blog post.

It is straightforward to create a language where a user can specify any type of index (like Ada), if you need to import a library everywhere to have the correct behaviour then this library should be a part of the language instead.

The controversy reminds me a bit of the "Python uses whitespace semantically, oh my" back in the days. Close to bikeshedding: Lots of discussion about a very minor point that everyone has an opinion on though.

(not referring specifically to you here!)

Even Visual Basic (of all things) allowed you to set it with Option Base. Never really understood why it's not supported in more languages...

Because due to UNIX's adoption every programming language needed to be like C.

Until then Fortran, Lisp, Algol, Pascal, Basic never had any big issue with indexes.

because the whole Option Foo stuff that Visual Basic had is a hideous global-state thing that makes it hard to reason about code without checking what its options were set to.

Programming languages should not be configurable in that kind of way.

This is most certainly an area that needs improvement.

You can get around some of the pain of handling error types in catch statements if you are comfortable paying the price for a locally defined function:

    result = try
        Something()
    catch err
        h!(e) = throw(e)
        h!(e::MethodError) = SomethingElse()
        h!(err)
    end

This pattern works well if an error case should return a default value and all others should throw.
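A runnable variant of the same pattern (`get_or_default` and its arguments are made up for illustration): the local handler's dispatch picks out the expected error type, and everything else rethrows:

```julia
function get_or_default(xs, i, default)
    try
        xs[i]
    catch err
        h(e) = rethrow()             # unexpected errors propagate unchanged
        h(e::BoundsError) = default  # the one expected case returns a default
        h(err)
    end
end
```

The two `h` definitions are methods of one local function, so the type check that would otherwise be an if/else chain becomes ordinary multiple dispatch.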
