Julia 1.0 (julialang.org)
915 points by montalbano 4 months ago | 434 comments



As an outsider, I'd like to see somewhere near the home page a few short snippets of code to get a feel for Julia and hopefully show the kind of uses for which it is a natural choice.

Nim's home page¹ shows a piece of sample code right at the top. Perl6's page² has a few tabs quickly showing some patterns it's good at. Golang³ has a dynamic interpreter prepopulated with a Hello World.

Julia's home page shows a nice feature list and links to docs to deep dive but it doesn't do a good job of selling it.

¹ https://nim-lang.org

² https://perl6.org

³ https://golang.org


For scientific computing, showing the package ecosystem is the most important thing. When you look at this thread, people are asking about dataframes and differential equations. Julia's site reflects this: yes there are things like Pandas, and for plotting, etc.


I do think we should add a code sample prominently on the page, however. I've always found it really frustrating when I look at a programming language and can't get a quick sense of how it looks. If a language looked like, say, APL, I'd be reluctant to use it even if it had an impressive ecosystem.

Issue filed: https://github.com/JuliaLang/www.julialang.org/issues/115.


Oh I thought the first tab had code and the complaint was it was too low. I guess that was a prototype build of the site. Yeah we should get that back.


Yeah, there is code set up in the page to add a code sample for each tab in the ecosystem. It's a matter of content (adding it).


Racket sounds like a really exciting language, but every time I look at some code...


Racket is having its runtime replaced with the now open-source chez scheme, so it should get faster (my limited understanding) even though it is already faster than a lot of dynamic languages.

Lisp languages do take a long time to get used to.


The blog mentions that Julia is supposed to be a general purpose language, and not a language built specifically for scientific computing. Is that wrong?

The first impression does leave me thinking that using Julia for different programming domains like distributed internet-facing servers or web services is not something it was built for.


> The blog mentions that Julia is supposed to be a general purpose language, and not a language built specifically for scientific computing. Is that wrong?

No. Julia is a general purpose language that has so far been mainly focused on scientific and mathematical programming.

Its design is probably least friendly to the real-time programming domain (it is GC-based), but it can apparently be used there as well:

http://www.juliarobotics.org/

I see an extremely bright future ahead for Julia!


Fun fact, the GC really isn't an issue and instead the opposite issue was found. There had to be callbacks built to slow down the computations for the robotics simulations in order to get it to run at real-time because it was too fast.

https://github.com/JuliaRobotics/RigidBodySim.jl/blob/34ac43...

Notice that this function is purposefully sleeping the differential equation solver in order to slow it down to the exact amount to get the simulation back to real-time.


That's a very amateurish way to solve this problem. Games typically have a main loop which will check the amount of time that has passed on every iteration of the loop.

You can see how much extra time is left for that frame at the targeted frame rate, then sleep for that amount of time.

Then you never have to slow down anything else, you can set a maximum amount of cycles per second and you can sleep once per cycle. The faster the CPU, the less power it should use.
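For anyone who hasn't seen that pattern, here is a minimal sketch of a fixed-rate loop in Julia. The names `run_fixed_rate` and `step!` are made up for illustration; this is not how RigidBodySim.jl does it.

```julia
# Hypothetical fixed-rate loop; `step!` stands in for one simulation update.
function run_fixed_rate(step!, n_frames; fps = 60)
    frame_time = 1 / fps                  # time budget per frame, in seconds
    for _ in 1:n_frames
        started = time()
        step!()                           # do the actual work for this frame
        leftover = frame_time - (time() - started)
        leftover > 0 && sleep(leftover)   # sleep off whatever budget remains
    end
end

counter = Ref(0)
elapsed = @elapsed run_fixed_rate(() -> counter[] += 1, 5; fps = 50)
```

A faster machine finishes `step!` sooner and simply sleeps longer, so the loop rate stays capped without touching the work itself.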


Interesting to hear. This just isn't a thing I think is ever encountered in this kind of stuff. The author's previous struggle was to get to real-time. Getting below it wasn't something they really considered. Here's a talk they gave:

https://www.youtube.com/watch?v=dmWQtI3DFFo


My sense has been it's designed specifically to be the best language for scientific and mathematical computing, but also a general purpose language in the sense you don't have to switch languages when you need to incorporate into a web service or a GUI tool or text munging.

So like Python, you have SciPy Pandas etc. but don't have to leave Python when you need to do a bunch of text processing or whatever.


Big difference being that most of the packages in Julia are written in Julia itself (unlike Python, where they're often written in some more performant language); so the aspiration is that you have to leave Julia even less frequently than you have to leave Python.

At the same time, calling into other languages (eg C) is super straightforward (disclaimer: last time I checked).
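As a small illustration of how direct that is, the sketch below calls `strlen` from the C standard library with `ccall`. It assumes a platform where `strlen` can be resolved from the libraries already loaded into the Julia process, which is the usual case.

```julia
# Call the C standard library's strlen directly; no wrapper code is needed.
# Arguments: function name, return type, tuple of argument types, then the values.
len = ccall(:strlen, Csize_t, (Cstring,), "Julia")
println(len)
```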


For a talk on "Low Level Systems Programming in High Level Julia" see the following video:

http://www.youtube.com/watch?v=AaQ7XuAR2yY&t=5m6s


On a similar note, I continue to be interested in Julia. However, the examples and videos shared by the community are predominantly scientific or mathematical in nature. The material is too dense for the layman programmer such as myself. Not specifically a complaint, but more of a call to action for the community to share more traditional usage examples to intrigue others like myself.

And of course, I should really assist to solve this problem myself. But, I thought to encourage other language experts.


So show code samples using dataframes and differential equations.


Using the IterableTables interface, the DifferentialEquations.jl solutions are presented as tables and directly convert to dataframes:

http://docs.juliadiffeq.org/latest/features/io.html#Tabular-...

    using DifferentialEquations

    # In-place linear ODE du/dt = 1.01u on a 2×2 state
    f_2dlinear = (du,u,p,t) -> du .= 1.01u
    prob = ODEProblem(f_2dlinear, rand(2,2), (0.0,1.0))
    sol1 = solve(prob, Euler(); dt=1//2^(4))

    using DataFrames
    df = DataFrame(sol1)

    # Result
    17×5 DataFrames.DataFrame
    │ Row │ timestamp │ value 1  │ value 2  │ value 3  │ value 4  │
    ├─────┼───────────┼──────────┼──────────┼──────────┼──────────┤
    │ 1   │ 0.0       │ 0.110435 │ 0.569561 │ 0.918336 │ 0.508044 │
    │ 2   │ 0.0625    │ 0.117406 │ 0.605515 │ 0.976306 │ 0.540114 │
    ...


I agree, but I think it is better to show both: a first section presenting the package ecosystem, mostly targeting scientific computing, and a second section with some code examples for general-purpose computing.


Definitely agree. We just revamped the website and I'd love to see some domain-specific examples of Julia code in the multi-tab "ecosystem" section. I think it'd make a great addition. :)


Agreed. Even better if they had code snippets that highlight what makes the language cool (instead of the usual "Hello World").


what is "cool" is very person specific though. It's hard to fully appreciate multiple dispatch in 8 lines for example


I found https://giordano.github.io/blog/2017-11-03-rock-paper-scisso... to be a pretty good attempt at that. Of course, the code's elegance will probably not be immediately evident to a beginner, but a link to the full explanation could suffice to complement that.
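To give a flavour of the idea without following the link, here is a minimal sketch of rock-paper-scissors with multiple dispatch. It is written for this thread and is not the code from the linked post; all type and function names are illustrative.

```julia
abstract type Shape end
struct Rock     <: Shape end
struct Paper    <: Shape end
struct Scissors <: Shape end

# One method per winning pair; dispatch picks on the types of BOTH arguments.
beats(::Paper, ::Rock)     = true
beats(::Scissors, ::Paper) = true
beats(::Rock, ::Scissors)  = true
beats(::Shape, ::Shape)    = false   # fallback for every other pairing

function play(a::Shape, b::Shape)
    beats(a, b) ? "first wins" : beats(b, a) ? "second wins" : "tie"
end

play(Rock(), Scissors())
```

The rules are stated as a table of methods rather than a chain of `if` checks, which is roughly what people mean when they call dispatch-based code elegant.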


Off-topic, but how do you type footnote-style numbers?


Open a julia editor or the julia repl, type `\^1<tab>` and copy and paste the unicode superscript ¹ into your post. Or whatever else your preferred method of getting unicode characters is ;).


I see I'm not the only one that uses the Julia REPL to get unicode characters!


I agree, and I think they had a short code example earlier. It was one of the things that got me interested in it then.


I'm a quite happy Julia user, however I feel there are still some warts in the language that should have warranted a bit more time before banging 1.0 on the badge.

Exception handling in julia is poor, which reminds me of how exceptions are (not/poorly) handled in R. Code can trap exceptions, but not directly by type as you _would_ expect. Instead, the user is left to check the type of the exception in the catch block. Aside from creating verbose blocks of boilerplate at every catch, it's very error prone.
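Concretely, the pattern being described looks roughly like the sketch below. `parse_or_zero` is a made-up helper; the point is the manual `isa` check followed by `rethrow()`.

```julia
# Hypothetical helper: parse an integer, falling back to 0 on bad input only.
function parse_or_zero(s)
    try
        return parse(Int, s)
    catch err
        # There is no `catch err::ArgumentError` form; test the type by hand...
        err isa ArgumentError && return 0
        rethrow()   # ...and re-raise anything we did not mean to handle
    end
end

parse_or_zero("12")    # parses normally
parse_or_zero("abc")   # ArgumentError is swallowed, returns 0
```

Forgetting the `rethrow()` line is exactly how a catch block ends up too broad and hides unrelated failures.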

Very few packages do it right, and like in R, exceptions either blow up in your face or they simply fail silently as the exception is handled incorrectly upstream by being too broad.

Errors, warnings and notices are also often written as if the only use-case scenario is the user watching the output interactively. Like with R, it's possible but quite cumbersome to consistently fetch the output of a julia program and be certain that "stdout" contains only what >you< printed. As I also use julia as a general-purpose language to replace python, I feel that julia is a bit too biased toward interactive usage at times.

That being said, I do love multiple dispatch, and julia overall has one of the most pragmatic implementations I've come across over time, which also makes me forget that I don't really like 1-based array indexes.


In 0.7 I put quite a lot of design effort into the logging system to allow log records to be redirected in a sane way. It's by no means a complete system, but I think the changes were worthwhile and the core infrastructure is now in place to allow consistent and structured capture of messages from any package which uses the Base logging interface.
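As a rough sketch of what that redirection can look like with the `Logging` standard library: the messages and the buffer below are invented for the example, and real code might prefer a custom logger over `SimpleLogger`.

```julia
using Logging

# Collect log records in a buffer instead of letting them reach the terminal.
buf = IOBuffer()
with_logger(SimpleLogger(buf)) do
    @warn "disk almost full"      # goes to `buf`
    println("result: 42")         # ordinary program output, still on stdout
end
captured = String(take!(buf))
```

Anything emitted through `@info`, `@warn` or `@error` inside the block, including from packages that use the Base logging macros, should end up in `captured` rather than mixed into the program's own output.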

Exception handling is indeed somewhat of a wart with the core system not having changed much since very early on. I think there's still some serious design work to do and the core people take it seriously. On the other hand, I suspect that numerical codes don't want to use exceptions for much other than "panic and escape to an extreme outer scope". So a lot of the scientific users are probably content for the time being.


I had a similar reaction about this being a little premature.

On the other hand, I'm wondering if this will help a little with the dependency hell that's caused me to drift away from Julia over the last year or so.

At first I was fairly excited about Julia, and greatly preferred it over R or Python for numerical work, library resources aside. It was fast and I liked the language design itself.

Over the last year or two, though, I've had recurring problems with trying to install and use packages, and them failing at some point due to some dependency not working. Sometimes the target package itself won't install, but most of the time it's some lower-level package.

It's more frustrating than not having packages available, because it creates some sense that packages are available, when they're not. At first I looked at it as some idiosyncratic case with one package, but it's happened over and over and over again.

Basically in this time I've given up on Julia, because there's such a huge discrepancy between what it looks like on the surface and what it looks like in practice, once you get out of certain limited-use cases, or when you're not coding everything yourself from the base language. (Related concerns about misleading speed claims have been raised, although my personal experience in that regard has been mixed, because my experience overall has been pretty good in some critical performance cases, but there have also been some wildly unpredictable exceptions that act as bottlenecks... but it's still much better than R or Python).

When I've tried to figure out what's going on with dependency hell, usually it's because some package that was previously working with an earlier version is no longer working. So maybe stabilizing the language will help that?


Package management is a really-really-really difficult problem that is far from solved. Not to say that your critiques are invalid (quite the opposite), but Julia is just now hitting 1.0 - by contrast Node/NPM have been around quite a bit longer and still have terrible package issues they're working to solve.

If you can, try and find a few hours to help pitch in and solve the package problems, even if it's just updating docs or updating deprecated-but-useful packages.


I know the JS ecosystem has some pretty counterproductive culture when it comes to package management (leftpad), but can you provide some examples of terrible issues still present in NPM? I hear this complaint often and I'm wondering what other people think of as insurmountable technical issues or design flaws in NPM.


I don't believe there are any insurmountable issues with NPM currently.

The current major issue, as it stands, is that it's very easy for a malicious bit of code to sneak into a heavily used JS package and have oversized effects - this happened very recently with a very popular linting-support package.

The other issue is general posting of malicious packages under mistyped names, or takeover of existing packages with malicious updates by new owners.

At the same time, nobody wants to have NPM (the org) manually vet every upload ever made. So, there's that.

Many JS packages are extremely dep heavy, overwhelmingly for minor features (checking something is an array, promise-fying FS, etc) which makes it very easy to infiltrate packages and very hard to vet a package entirely.

Finally, npm (the program) runs into a fair bit of caching woes and its own dumb bugs which feel like they shouldn't slip into production nowadays. Oh, and sometimes npm (the website) goes down.

The answer for JS, unfortunately, is probably segmentation - as better managed and more secure package repos come up, likely with their own package managers, npm will probably have to up their game. That, I am sure, will bring a whole fresh set of issues.


Because there are a lot fewer packages with binary dependencies, I've overall had fewer problems than with R. Most packages are pure Julia, and there you shouldn't face issues.

With the new binary builder, binaries should be another note, so long as you download an official Julia binary, or built Julia from source in the same way those binaries are made (eg, build OpenBLAS with gfortran 7).


That's really disappointing to hear. Dependency hell is what drove me away in 2016. I hope it clears up eventually.


It is a lot lot better than it was in 2016. The new package manager helps a lot. I mean separate environments per project, and upper-bounding package dependencies by default, is just going to avoid a lot of head-aches.
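For those who haven't tried the new package manager, a per-project environment can be set up along these lines. This is only a sketch: the directory is a throwaway temp dir and the `Pkg.add` call is left commented out because it needs network access.

```julia
using Pkg

# Give a project its own environment; the path here is a throwaway temp dir.
project_dir = mktempdir()
Pkg.activate(project_dir)

# Pkg.add("DataFrames")   # would record the dependency in this project's Project.toml
Base.active_project()     # path of the Project.toml now in effect
```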

But more generally things are maturing.

And 1.0 will help too, since things won't be chasing a moving target.

If you're not in any hurry, I'd give it 6 months of people who don't mind a bit of package breakage (e.g. people like me) using it.

That will be plenty of time for everything to shake out. More than you might expect has actually already been shaken out in the last few weeks in the package ecosystem. Hitting 1.0 should give some package maintainers the drive to get it done.


I don't think _any_ language has got exception handling right yet, despite a lot of effort. The problems are particularly apparent in parallel and multithreaded programs.

That said, I am optimistic Julia will have a good solution at some point: contextual dispatch, a la Cassette.jl, enables just the sort of interventions you want for error handling. I'm not quite sure what the result will look like, but I imagine you will see some experimentation in this direction in the near future.


Exception handling is one of the few subsystems that hasn't really had a revamp. It's probably one of the areas that'll get a good bit of thinking post 1.0.


Also: not allowing the expected `catch e::ErrorType` syntax leaves it open for a better design without breaking existing code which means it can be done in the 1.x timeframe instead of needing to wait until 2.0. This is why that syntax hasn't yet been added with the expected meaning.

Related: https://github.com/JuliaLang/julia/issues/7026


Yeah, I agree with your comments about error handling. It’s far from ideal in non-interactive contexts. It’s especially disappointing since you could easily imagine something like Julia replicating Python’s success at transitioning code from interaction (e.g. Jupyter notebook) to production.

I initially defended the choice, but I now agree that 1-based indexing seems like a poor choice since Julia has become something more than the original mission of a better MATLAB or Octave. It’s an admittedly minor tragedy of Julia’s success.


This is what 0-based indexing looks like in data analysis:

>In order to read a csv in that doesn't have a header and for only certain columns you need to pass params header=None and usecols=[3,6] for the 4th and 7th columns:

https://stackoverflow.com/questions/29287224/pandas-read-in-...

Just reading that hurts me.


This is very much a non-argument. Calling the columns the 4th and 7th is as arbitrary as calling them the 3rd and the 6th.

Again, 0-based indexing exists to fit a purpose: http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EW...

In my opinion, reading `a = b[1:n-1]` hurts much more than reading `a = b[:n]`.


>Call the columns the 4th and 7th is as arbitrary as calling them the 3rd and the 6th.

No, that's their ordinal position, and how 8 billion non-programmers would refer to them in any everyday setting.

That's also how programmers would refer to them if it wasn't for a historical accident.

What's more, that's also how programmers refer to them when they talk between them and not to the machine ("check the 3rd column" not "check the column at the index of 2").


It's not arbitrary. It's English. In the list [apple, orange, tree] which element is orange? It's the second element.

I have taught Python quite a bit, and I have gotten good at explaining 0 based indexing and slicing based on it. When I switched to Julia there was nothing to explain. And my code has about as many +/- 1s as before...


Just because it is our common convention in lay conversation doesn’t mean it isn’t “arbitrary”.

These spoken language conventions developed before there was an established name for “zero” or even a concept that “nothing” could be a number per se.

For similar reasons, we have no zero cards in our decks, no zero faces on a dice, no zero hour on our clocks, no zero year in our calendar, no zeroth floors in our buildings, East Asian babies are born with age one year, etc.

It’s only by another set of historical accidents that we have a proper 0 in our method of writing down whole numbers. Thankfully that one was of obvious enough benefit that it became widely adopted.


> no zeroth floors in our buildings

In (North?) America. In Europe, there's a ground floor (zero), then first floor (1), etc. Basement is -1 (etc.).

A European friend of mine arrived at college in the USA, and was assigned a room on the first floor of the dorm. She then asked the housing office whether there was a lift, because she had quite some heavy luggage, earning some rather amused looks :-)


Yes, it would be interesting to know the history.

Whoever designed the European convention for labeling building floors was numerically competent.

Too bad medieval European mathematicians and the designers of Fortran weren’t. ;-)


The base level doesn't need to have a floor, it's just ground. Once you add a floor you are on the first floor above ground. Really your condescending tone as if all the mathematicians that prefer to work with 1 indexing are just incompetent is grating.

I'm happy to be writing

```
for i in 1:n
    func!(a[i])
end
```

to iterate over an object of length n. Or split an array as a[1:m], a[m+1:n]. Slicing semantics, which are far more prevalent in my code (and the code I read) than index arithmetic, are truly vastly simplified by the 1-based indexing of Julia compared to the 0-based numpy conventions. We simply no longer code in the world that Dijkstra argued for, and I have not seen anybody give a clear argument that is actually rooted in maths and contemporary programming.
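For readers who don't write Julia, the split described here looks like this in practice. The array contents and the value of `m` are arbitrary.

```julia
a = collect(10:10:60)     # [10, 20, 30, 40, 50, 60]
m = 2

head = a[1:m]             # the first m elements (ranges include both ends)
tail = a[m+1:end]         # everything after them; `end` is the last index

vcat(head, tail) == a     # the two pieces cover the array exactly once
```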

I genuinely thought that the Python convention was brilliant, and that 1-based indexing in Julia would suck. It turned out not to be the case.


Sorry, that last bit of my comment was gratuitous.

I am legitimately (mildly) curious about the history of the different naming conventions for floors of buildings though.

> The base level doesn't need to have a floor, it's just ground. Once you add a floor you are on the first floor above ground.

Yes, my point is this is an example where the European 0-based indexing system makes more sense (in my opinion) than the American 1-based indexing system. I speculate that whoever started calling the ground floor the “first floor” hadn’t really put much thought into how well that would generalize to large buildings with many floors including some underground.

Similarly, whoever decided the calendar should start at year 1 AD with the prior year as 1 BC hadn’t really considered that it might be nice to do arithmetic with intervals of years across the boundary.

There are many standard mathematical formulas which are clarified by indexing from 0. But nobody can switch because the 1-indexed version is culturally fixed. Most of the rest of the time the 0-indexed vs. 1-indexed versions makes basically no difference. It is rare that the 1-indexed version is notably nicer.

> Or split an array as a[1:m], a[m+1:n]

Yes, I find it substantially clearer to write this split as a[:m], a[m:]. Particularly when dealing with e.g. parsing serialized structured data. But also when writing numerical code. Carrying the extra 1s around just adds clutter, and forces me to add and subtract 1s all over the place; reasoning about it adds mental overhead, and extra bugs sneak in. (At least when writing Matlab code; I haven’t spent much time with Julia.)


> no zero hour on our clocks

There is. We call it 12 for some crazy reason (it goes 12 AM, 1 AM, 2 AM, ..., 11 AM, 12 PM, 1 PM, ...).
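Incidentally, Julia has a function for exactly this "zero is spelled 12" convention: `mod1` maps into `1:n` instead of `0:n-1`. A tiny sketch, with `dial` as a made-up helper name:

```julia
# Convert a 24-hour clock value (0:23) to the reading on a 12-hour dial.
dial(hour24) = mod1(hour24, 12)

dial(0)     # midnight reads as 12
dial(13)    # 1 PM reads as 1
```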

> no zero year in our calendar

Which is quite irritating really. New Year's Day 2000 wasn't the start of the 3rd millennium, because there was no year zero.

> East Asian babies are born with age one year

But not western babies.


The 12 on a clock is a compromise to match between a 1-indexing oral culture and natural 0-indexing use case (which came from the Sumerians who had a better number system).

I don’t know the history of reported ages of Western babies.

> quite irritating really

Yes that is my point.


The equivalent of `a = b[:n]` is `a = b[1:n]`. And I don't think you can get around admitting that there is a fundamental ambiguity in the spoken statement "Take a look at the fourth column!" in a zero-based index system. You always need a follow-up question to clarify whether you mean "everyday informal speech fourth" or "zero-index fourth."


But you can say "Take a look at column 4" instead, which is unambiguous.


Calling the 3rd and 6th columns "3rd" and "6th" is hardly arbitrary.


a = b[1..n]?


> 1-based indexing now seems like a poor choice since Julia has become something more than the original mission of a better MATLAB or Octave. It’s a, admittedly, minor tragedy of Julia’s success.

I’m curious as to why this is a problem outside numerical computing. From my perspective, this is consistent with a long history of mathematics dealing with matrices that predates electronic computers.

0-based arrays are popular because C decided to deviate from what had previously been standard in math and in Fortran.

Is there a reason other than aesthetic preference and habit that makes 0-based indexes better for computing in non-numerical contexts?

I realize both indexing standards are arbitrary and boil down to “that’s the way grandpa did it,” however 1-based indexing grandpa is way older and more entrenched outside computing circles.

EDIT: I suppose with Julia it’s not that important, as other commenters have pointed out that you can choose arbitrary indexes.


> the technical reason we started counting arrays at zero is that in the mid-1960’s, you could shave a few cycles off of a program’s compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.

http://exple.tive.org/blarg/2013/10/22/citation-needed/


Thanks for the great article.

An unexpected takeaway: Michael Chastain's time-traveling debugging framework, which he had in 1995, and which still reads like sci-fi from the future[1].

Alas, this quote will probably stay relevant for a while:

>[We keep] finding programming constructs, ideas and approaches we call part of “modern” programming if we attempt them at all, sitting abandoned in 45-year-old demo code for dead languages.

[1]http://lwn.net/1999/0121/a/mec.html


Very strange article. The first part builds suspense for the great truth that's going to be revealed by the author's extensive research, insisting that it's not something as simple as pointer arithmetic.

Then the second part quotes the creator of BCPL, revealing – a-HA! – what we already knew, that the array is a pointer to the first element and the index is the offset from the pointer, and that's why it's zero.

(And after that it veers off into complaining about how the papers documenting this history cost too much money, so they didn't actually read them.)


Wow, I hadn’t seen this before. The history of this feud goes back before C. Thank you for a fascinating read!



I've always found this a rather weak argument - he even implies that 2...12 is at least as clear (it being the starting point of the text).

I also thought I'd seen a longer text focusing more on the counting/indexing.

I still don't see the appeal of "element at zero offset" vs simply "first element".

I do agree that < vs <= etc can get messy. But outside of now fairly archaic programming languages I don't see the need. Just use a construct that handles ranges, like "for each element in collection/range/etc" (or, for math, "pattern matching" or "for n in 1..m").


I believe that may be cited in the article (but it's not in an obvious place :) )


This was a fantastic (and relevant) read, thank you for sharing it.


0-based indexing has the advantage of easily mapping to memory addresses since the index is an offset from a base address (perhaps multiplied by a unitary per-item length). I personally prefer this since I tend to think about and work with data as an offset from a starting point.

1-based indexing has the advantage of always knowing the length of the array since the index of the last element is this value. You also get "proper" counting numbers as you iterate.
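Both points can be checked from Julia itself. The sketch below compares a hand-computed element address against `pointer(a, i)`; the helper `addr` is made up for the example, and the array is kept alive as a global while the pointers are compared.

```julia
a = [10, 20, 30, 40]
stride = sizeof(eltype(a))            # bytes per element

# With 1-based indices the address of element i is base + (i - 1) * stride;
# with a 0-based index the index would be the offset itself.
addr(i) = pointer(a) + (i - 1) * stride

addr(3) == pointer(a, 3)              # Julia's own element pointer agrees
length(a) == lastindex(a)             # the last index doubles as the length
```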

I've used both in various languages. Perl lets you define the array index start to be whatever number you wish.


As noted above, Julia does too. Heck, if you wanted to, in Julia with probably <100 lines of code you could create an array type that only allows prime numbers as indices. And subject to the constraints of the raw calculations needed, it would be as fast as it can be---there is no "extra" performance overhead associated with doing basically whatever you want.
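As a much smaller illustration than the prime-index idea, here is a sketch of a zero-based wrapper. `ZeroBased` is a hypothetical type written for this comment; it is deliberately not a full `AbstractArray` implementation (OffsetArrays.jl is the real thing).

```julia
# Hypothetical wrapper type, not a full AbstractArray implementation.
struct ZeroBased{T}
    data::Vector{T}
end

Base.getindex(z::ZeroBased, i::Int) = z.data[i + 1]          # shift by one
Base.setindex!(z::ZeroBased, v, i::Int) = (z.data[i + 1] = v)
Base.firstindex(z::ZeroBased) = 0
Base.lastindex(z::ZeroBased) = length(z.data) - 1

z = ZeroBased([10, 20, 30])
z[0]        # first element
z[end]      # `end` lowers to lastindex(z), so this is z[2]
```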


Zero based arrays are frequently better in a numerical context. Many times when you’re using the index in the computation itself (FFTs for instance), zero based is what you want. For instance, the zeroth frequency (DC) is in the zeroth bin.
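To make the off-by-one concrete, here is a deliberately naive DFT written against a plain 1-based Julia vector. It is only a sketch to show the indexing (use a real FFT library for actual work), and `dft` is a name made up for the example.

```julia
# Naive O(N^2) DFT, written only to show the indexing.
function dft(x::Vector{<:Number})
    N = length(x)
    X = zeros(ComplexF64, N)
    for k in 0:N-1, n in 0:N-1          # the math runs over 0:N-1 ...
        X[k + 1] += x[n + 1] * exp(-2π * im * k * n / N)   # ... storage is 1:N
    end
    return X
end

X = dft([1.0, 1.0, 1.0, 1.0])
X[1]    # the DC (zeroth-frequency) term sits at index 1
```

Every array access carries a `+ 1` that the formula itself does not have, which is the mismatch being described.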


Which is why julia doesn't make any assumptions on how your axes are indexed. If you're working in a numerical domain where 0-indexed arrays, or symmetric arrays about the origin, or other arbitrarily transformed axes make sense, just use those.


I understand the argument, but when the default disagrees with your override, there's almost always an impedance mismatch and some pain. It's like being left handed when 99% of the interfaces in the world assume you're right handed.

People argue that zero-based is incidental, and that 1-based is the right way because of its long history in mathematics notation. I would argue that 1-based is incidental, and that zero-based is better most of the time for modern math and computer architectures.


I can understand why you might get the impression, but I'd encourage you to try out julia and see that we're really quite good at using index-agnostic abstractions, so most code doesn't care what your arrays are indexed with. If a certain set of indices make sense in your domain (0-based for FFTs as you say, symmetric indices about the origin for image filters, 1-based for just regular lists of things, etc), just use it, and it'll be convenient interactively, but most library code doesn't really think about it that much.
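A sketch of what "index-agnostic" means in practice: the function below never mentions 0 or 1. `running_max` is a made-up example; it is only exercised on an ordinary `Vector` here, though code in this style is the kind that is meant to accept offset arrays as well.

```julia
# Generic running maximum that never assumes where indexing starts.
function running_max(v::AbstractVector)
    out = similar(v)                       # same shape as the input
    best = v[firstindex(v)]
    for i in eachindex(v)                  # whatever indices `v` actually has
        best = max(best, v[i])
        out[i] = best
    end
    return out
end

running_max([3, 1, 4, 1, 5])
```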


I'll (cautiously) take your word that this works transparently when I supply arrays as arguments to a library function. However, what does the library function return to me as arrays it allocates? What if I do an outer product of two arrays with different base indices?

Whatever your answer, I suspect it is more cognitive overhead to remember than "always 1 based" or "always 0 based".


> However, what does the library function return to me as arrays it allocates?

Depends what the library function does of course. If it's shape preserving (e.g. if it's a map over the array), it'll generally preserve the axes of the input array.

> What if I do an outer product of two arrays with different base indices?

You'll get an offset array indexed by the product of the axes of the input:

    julia> A = OffsetArray(1:3, 0:2)
    OffsetArray(::UnitRange{Int64}, 0:2) with eltype Int64 with indices 0:2:
     1
     2
     3

    julia> B = 4:6
    4:6

    julia> A*B'
    OffsetArray(::Array{Int64,2}, 0:2, 1:3) with eltype Int64 with indices 0:2×1:3:
      4   5   6
      8  10  12
     12  15  18
> Whatever your answer, I suspect it is more cognitive overhead to remember than "always 1 based" or "always 0 based".

Sure, but the real point here is that in most situations, you don't actually care what the axes are, because you use the higher level abstractions (e.g. iteration over elements or the index space, linear algebra operations, broadcasts, etc.), which all know how to deal with arbitrary axes. The only time you should ever really have to think about what your axes are is if there's some practical relevance to your problem (OffsetArrays are used heavily in the images stack).


Not really. Imagine taking a rectangular region of interest from an image. With offset arrays, you can use the original indices if that suits you. I’d say that’s strictly better than using always zero based offsets.


Well for FFTs you really want periodic indices. Luckily Julia has those too: https://github.com/JuliaArrays/FFTViews.jl


I do a lot of FFTs for a living - I really want my zeroth frequency in my zeroth bin. Negative indices (as done by Python and other places) are nice though.

This points to another example where 0-based should be preferred. When doing modulo arithmetic, 0..N-1 mod N gives 0..N-1, but 1..N mod N puts the zero at the end. I also cringe at languages where -1 mod N does not equal N-1.


Yes, people should never use “mod” or % as a synonym for “remainder”. It is horrible.

For any language with a mod operator, a mod b should always be equal to (a + k×b) mod b for any integer k.

Breaking this invariant makes the mod operator useless in pretty much every application I ever have for it. In e.g. JavaScript I need to define a silly helper function like

    mod = (x,y) => x%y + y*(x%y && x>0^y>0)
And then remember to never use the native % operator.
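For comparison, Julia ships both operations under separate names, and `%` there is the remainder. A short sketch of the difference and of the invariant above, with arbitrary values for `a` and `b`:

```julia
rem(-1, 5)    # -1: sign follows the dividend; this is what `%` does in Julia
mod(-1, 5)    #  4: sign follows the divisor (floored modulo)

# The invariant from the comment: shifting by any multiple of b changes nothing.
a, b = 7, 5
all(mod(a + k * b, b) == mod(a, b) for k in -3:3)
```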


What language defines % as mod and not remainder?


According to Wikipedia: Perl, Python, Lua, Ruby, Tcl, R (as %%), Smalltalk (as \\), various other languages under some alternate name, sometimes with the two variants given different names.

The other (“remainder”) version which takes its sign from the first argument is pretty much worthless in practice. IMO it doesn’t need any name at all. But what it definitely doesn’t need is a shorter and more convenient name than the useful modulo operator. Its ubiquity is a serious screwup in programming language design, albeit largely accidental.


I guess because remainder is simpler to implement? (anecdotally, the assembly generated by @code_native in Julia is much shorter for remainder)

In my opinion neither are used sufficiently often to justify taking up a valuable ASCII symbol (same with xor).


The other point is that remainder makes much more sense for floating point numbers, as it's exact. mod on the other hand is not, and is fiendishly difficult to get "right" (if that is even possible).


In what context do you use it? I use a “mod” operator all the time for floating point calculations, and have never come across a need for the other one.

As one simple example, it is frequently useful to map floats into the range [0, 2π), but I have never once wanted to map positive floats into the range [0, 2π) while negative floats get mapped into (–2π, 0] by the same code.
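A small Python sketch of that use case, assuming the angle is an ordinary float. `wrap_angle` and `wrap_angle_truncated` are illustrative names, and the range claim holds only up to rounding for very tiny negative inputs.

```python
import math

TWO_PI = 2 * math.pi

def wrap_angle(theta):
    """Map a float angle into [0, 2*pi) with the floored modulo."""
    return theta % TWO_PI

def wrap_angle_truncated(theta):
    """Same call with the truncated remainder: negative inputs stay negative."""
    return math.fmod(theta, TWO_PI)

for theta in (-7.0, -0.5, 0.0, 0.5, 7.0):
    assert 0.0 <= wrap_angle(theta) < TWO_PI

# The remainder version sends negative angles to (-2*pi, 0] instead.
assert wrap_angle_truncated(-0.5) == -0.5
```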


Well, I have indeed spent more time thinking about this than is probably healthy, but here are some nice edge cases to think about:

- https://github.com/JuliaLang/julia/issues/14826

    mod(-eps(), 4.0)
- https://github.com/JuliaLang/julia/issues/17415

    fld(1.0,0.2)
- https://github.com/JuliaLang/julia/issues/3127

    mod(1.0,0.1)
- https://github.com/JuliaLang/julia/issues/3104

    mod(0.95,0.01)
I'm pretty sure there are more issues.

re: mod 2pi, typically it is most accurate to reduce to (-pi,pi) (i.e. division rounding to nearest). Also to get it accurate you need fancy range reduction algorithms, hence julia has rem2pi https://docs.julialang.org/en/stable/base/math/#Base.Math.re...
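The same kind of float effects can be reproduced in Python, as a rough analogue only; what Julia returns for the linked issues may differ. The last lines use `math.remainder` (Python 3.7+), which rounds the quotient to nearest but works with the double closest to 2π, so it is not the careful range reduction described above.

```python
import math
import sys

eps = sys.float_info.epsilon

# A tiny negative input rounds up to the modulus itself, so the result
# falls outside the half-open range [0, 4).
assert (-eps) % 4.0 == 4.0

# The double 0.2 is slightly larger than 1/5, so floored division gives 4
# even though the rounded quotient 1.0 / 0.2 is exactly 5.0.
assert 1.0 // 0.2 == 4.0
assert 1.0 / 0.2 == 5.0

# Likewise the double 0.1 is slightly larger than 1/10, which leaves a
# remainder just under 0.1 instead of 0.
assert 0.0 < 1.0 % 0.1 < 0.1

# Division rounding to nearest reduces into [-pi, pi].
r = math.remainder(4.0, 2 * math.pi)
assert -math.pi <= r <= math.pi and r < 0
```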


Without knowing the intended use case of the code, the edge case behavior in some of these cases is pretty meaningless. Different behavior might make sense in different applications, and in many applications the choice is largely irrelevant. Whichever one you choose someone who wants the other version will have to work around it.

These examples don’t really have much bearing on the general usefulness of “remainder” vs. “modulo” though.


This depends on what the behavior is of the division operator. People should use floor division, then implementing modulo is easy.

I don’t know what is available as chip instructions, I could certainly believe hardware designers made the wrong choice.


I take the opinion that people should have the option to round division (and compute remainders) however they want: down, up, to zero, to nearest (with different options for breaking ties).

https://github.com/JuliaLang/julia/issues/9283


Common Lisp has all four division operators ('floor', 'ceiling', 'truncate' (equivalent to what most languages consider division), and 'round') [0], and both 'mod' and 'rem' [1].

[0] http://www.lispworks.com/documentation/lw50/CLHS/Body/f_floo...

[1] http://www.lispworks.com/documentation/lw50/CLHS/Body/f_mod_...
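To make the four rounding choices concrete, here is a sketch for Python integers. `divrem` and its mode names are hypothetical, chosen to mirror the Common Lisp operator names; in every mode the pair satisfies a == q*b + r. The floor remainder is what the thread calls "mod" and the truncate remainder is "rem".

```python
def divrem(a, b, rounding):
    """Hypothetical helper: return (q, r) with a == q*b + r, where q is a/b
    rounded according to `rounding`."""
    q, r = divmod(a, b)              # floored quotient; r has the sign of b
    if rounding == "floor":
        pass
    elif rounding == "ceiling":
        if r != 0:
            q += 1
    elif rounding == "truncate":
        if r != 0 and q < 0:         # inexact negative quotient: move toward zero
            q += 1
    elif rounding == "round":
        twice = 2 * abs(r)           # compare the fractional part with 1/2
        if twice > abs(b) or (twice == abs(b) and q % 2 == 1):
            q += 1                   # round up; ties go to the even quotient
    else:
        raise ValueError("unknown rounding mode: " + rounding)
    return q, a - q * b

assert divrem(-7, 2, "floor") == (-4, 1)
assert divrem(-7, 2, "ceiling") == (-3, -1)
assert divrem(-7, 2, "truncate") == (-3, -1)
assert divrem(-7, 2, "round") == (-4, 1)
```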


> I really want my zeroth frequency in my zeroth bin

That's precisely what it does!

> I also cringe at languages where -1 mod N does not equal N-1.

julia> mod(-1,10)

9


A 1-character infix operator is much cleaner to read than a 3-character name of a 2-parameter function.

But admittedly when you use 1-indexed arrays, any kind of modulo operator becomes pretty inconvenient a lot of the time (lots of futzing to avoid off-by-1 or boundary errors). So maybe it doesn’t matter in the Julia context.
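The usual way around that futzing is a 1-based wrap-around helper; Julia ships one as `mod1`. A Python sketch of the same idea for integer indices (the function here is written from scratch, not a Python built-in):

```python
def mod1(i, n):
    """Wrap an integer index into 1..n (n > 0), the 1-based analogue of i % n."""
    return (i - 1) % n + 1

n = 5
assert [mod1(i, n) for i in range(1, 11)] == [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
assert mod1(0, n) == 5       # stepping left from index 1 wraps to the last index
assert mod1(-1, n) == 4
```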



Three years ago, when I found out about Julia (and quickly fell in love with the language), I was not happy at all about the 1-based indices and column-major storage, and at the time, a lot of the responses I got were along the lines of "Julia is for mathematicians, and 1-based indexing and column major are just fine for us". Now, Julia is able to have indices with any base that you want, and can handle row-major storage as well (thanks to Tim Holy's great work). Why is anybody concerned about this anymore? Julia can do whatever you want, you shouldn't be stuck with languages that can ONLY do 0-based indexing.


> I initially defended the choice, but I now agree that 1-based indexing now seems like a poor choice since Julia has become something more than the original mission of a better MATLAB or Octave. It’s a, admittedly, minor tragedy of Julia’s success

Especially since it would have been very easy to 'do it like Ada' and allow any start index by default (I have used Lua and it's really annoying to use an extension language with 1-indexing when the base language is 0-indexed).



The problem is the default behaviour: in Ada you can use whatever you want, in Julia the default is 1-based which is quite controversial.


Picking whatever you want is one line of code, and similar to the line you'd need to allocate any array. And your choice propagates to the downstream operations.

Instead of complaining in the abstract, check it out, you'll be impressed. https://julialang.org/blog/2017/04/offset-arrays


I already know about this, but if you use someone else's library code in Julia, that library will most likely only work with 1-indexed arrays; in Ada it will work with any base index.

Default matters!


It's straightforward to write library code in Julia that handles any type of index. If it doesn't, it is probably old code from before offset arrays existed. Take a look at Tim's linked blog post.


It is straightforward to create a language where a user can specify any type of index (like Ada), if you need to import a library everywhere to have the correct behaviour then this library should be a part of the language instead.


The controversy reminds me a bit of the "Python uses whitespace semantically, oh my" back in the days. Close to bikeshedding: Lots of discussion about a very minor point that everyone has an opinion on though.

(not referring specifically to you here!)


Even Visual Basic (of all things) allowed you to set it with Option Base. Never really understood why it's not supported in more languages...


Because due to UNIX's adoption every programming language needed to be like C.

Until then Fortran, Lisp, Algol, Pascal, Basic never had any big issue with indexes.


Because the whole Option Foo stuff that Visual Basic had is a hideous global-state thing that makes it hard to reason about code without checking what its options were set to.

Programming languages should not be configurable in that kind of way.


This is most certainly an area that needs improvement.

You can get around some of the pain of handling error types in catch statements if you are comfortable paying the price for a locally defined function:

    result = try
        Something()
    catch err
        h!(e) = throw(e)
        h!(e::MethodError) = SomethingElse()
        h!(err)
    end

This pattern works well if an error case should return a default value and all others should throw.
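A rough Python analogue of the pattern above, as a sketch only: `functools.singledispatch` stands in for Julia's method dispatch, and `TypeError` stands in for `MethodError`. The name `call_with_default` is made up for illustration. As in the Julia version, the handler is defined locally on every call.

```python
from functools import singledispatch

def call_with_default(f, default):
    """Return f(), or `default` if f raises TypeError; re-raise anything else."""

    @singledispatch
    def handle(err):
        raise err                    # fallback: propagate unknown errors

    @handle.register
    def _(err: TypeError):
        return default               # the selected error type maps to the default

    try:
        return f()
    except Exception as err:
        return handle(err)

assert call_with_default(lambda: 42, 0) == 42
assert call_with_default(lambda: len(5), 0) == 0   # len(5) raises TypeError
```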


I have high hopes for Julia becoming the de facto open-source scientific language. Despite Python and R both having a massive head start, I'm willing to bet that talented engineers and scientists will be drawn to Julia to implement their next-generation frameworks owing to the powerful features that it offers.

For example, the fact that an array of unions such as Array{Union{Missing,T}} is represented in memory as a much more efficient union of arrays is a perfect example of where a clever compiler can make the logical thing to do also the efficient thing to do!
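One way to picture the layout being described, as a rough sketch and not Julia's actual implementation: keep the values in one dense typed buffer and the "is it missing?" information in a separate one-byte-per-element buffer, instead of boxing every element. `MaskedFloats` is a hypothetical name, and `None` plays the role of `missing`.

```python
from array import array

class MaskedFloats:
    """Hypothetical container: dense float64 values plus a one-byte tag per slot."""

    def __init__(self, items):
        # Missing entries get a placeholder value; the tag says whether to trust it.
        self.values = array("d", (0.0 if x is None else x for x in items))
        self.present = array("B", (int(x is not None) for x in items))

    def __len__(self):
        return len(self.values)

    def __getitem__(self, i):
        return self.values[i] if self.present[i] else None

    def sum_present(self):
        """Sum the non-missing entries by scanning the two flat buffers."""
        return sum(v for v, p in zip(self.values, self.present) if p)

m = MaskedFloats([1.5, None, 2.5])
assert m[1] is None
assert m.sum_present() == 4.0
```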


I am a machine learning library developer and I don’t share your feelings. For example the specific example you cite, I feel, should never be something scientists or engineers actively think about, only language implementers.

Once you make that distinction, then whether you write it as a Cython module exposed in Python or you can use native language features to do it in Julia, nobody cares. It’s encapsulated away from people who use numerical libraries, as it should be.

I also spend time developing these “runtime overhead avoidant” backend numerical libraries, and I would say I’ve seen no significant reason to prefer Julia over Cython.

Don’t get me wrong, Julia is great, just not offering anything fundamentally different. And since there’s already a critical mass of people with engineering and optimization experience in the Cython & Python extension module stack, I’d expect that community to continue dominating Julia just by attrition alone.


It's offering something completely different because of the compatibility, compile-time controls, and ability to fully interprocedurally optimize.

http://www.stochasticlifestyle.com/why-numba-and-cython-are-...


Your article raises several valid points but is overly optimistic. I still don’t believe that you can AD through arbitrary code to produce useful gradient estimates; I’ve seen that fail miserably time after time.

I don’t think Julia’s generic code is the end game for fast code though. Efforts in Python toward code gen, including projects like loopy, which targets accelerators, are still developing but make it easy to decouple kernels from loop domains, target specific devices and optimizations, etc.


The article you link is severely wrong about both numba and Cython. I frequently use Cython to quickly wrap calls to other C++ implementations of tools I want to try and have a working Python module in a matter of minutes, and I have almost no knowledge of C++.

Modern numba can also do a lot more for huge scale projects than what the article suggests.

Julia docs also seem very smugly proud of multiple dispatch and autogenerating implementations for multiple types or signatures.

But Cython fused types allow the exact same polymorphic multiple dispatch. Here’s an example pedagogical project illustrating that point.

< https://github.com/spearsem/buffersort >

This pattern is quite easy and offers a lot of generic strategies in Cython, especially if you just want a bunch of overload options in a pure C backend with a thin entrypoint to Python.


I use Cython a lot, yet heartily welcome Julia. I think you are over reacting a shade here.

With some persistence Python-Cython-Numba can get the work done, but to me it has never felt like a consistent whole, usually a tacked on hodge podge. Numba is picky about what it will and will not optimize; I totally understand why it is so. It has always been a 'will it or won't it' with Numba.

I think the potential that Julia offers that Cython and Numba don't is the option of a programmable syntax for the low level manipulation. Cython has improved, but for the longest time it was always a forced decision between operating at the higher level of array/sub-array manipulation in Python or dropping down to tedious index manipulation in Cython with nothing in between. Not to mention fighting the Cython compiler and Python's C API to get the 'yellows' out.


I think one of my points is that what you call a “tacked on hodge podge” is actually a desirable trait: isolated and targeted optimization of only subcomponents. This is preferred to working in a language where the whole thing inhabits that same space.


The points that you mentioned are not relevant to what's in the article.

Cython fused types are required to be known at the package's compile time, and that's exactly the point that makes it less composable. Not requiring this is exactly the advantage of Julia which is demonstrated.

The article doesn't say Numba cannot do huge scale projects. It just talks about the difference in the compilation strategies. If you read that and think that it means Numba isn't suited for huge scale projects, then that's your interpretation of how Numba works, not an opinion directly stated by the article.

The article gives an easy way for you to demonstrate it wrong. You can show 10 lines of code where you send dual numbers and uncertainty objects through SciPy's ODE solvers to generate solutions with gradients and error bars. If Cython can do this, please show it.


The first and second paragraphs under “Point 1” are specifically saying Cython & numba work just as well as Julia for single functions, then saying the problem where they are inferior to Julia is large code bases.


No, I say that Cython and Numba made an engineering tradeoff that reduces their performance and flexibility in exchange for the ability to compile their codes separately, whereas Julia needs to compile dependently. You are the one that is inferring that means it's inferior. There are cases where this ability to easily reduce compile time can be useful, and there are cases where the improved flexibility and performance is useful. But this is the tradeoff that is made in a concrete form, and the value judgement of whether it's a good one is for you to make.

Besides, please show me those 10 lines of Cython that show it can AD and throw numbers with uncertainties through SciPy's ODE solvers, since it's just as flexible as Julia. It should take fewer characters to write than your previous response!


This is a quote from the link:

> “It's not hard to get a loop written in terms of simple basics (well okay, it is quite hard but I mean "not hard" as in "give enough programmers a bunch of time and they'll get it right"). Cool, we're all the same. And microbenchmark comparisons between Cython/Numba/Julia will point out 5% gains here and 2% losses here and try to extrapolate to how that means entire package ecosystems will collapse into chaos, when in reality most of this is likely due to different compiler versions and go away as the compilers themselves update. So let's not waste time on these single functions. Let's talk about large code bases. Scaling code to real applications is what matters. Here's the issue: LLVM cannot optimize a Python interpreter which sits in the middle between two of your optimized function calls, and this can HURT.”

You seem to be disingenuously recasting what’s written in the link to serve shifting purposes, to the point that I am skeptical you’ve even got a coherent claim here in your comments.


1. You're talking to the author.

2. The quoted text talks exactly about separate compilation, like the comments above do. So it's really unclear where you take offense. Maybe read it again carefully?

3. I have championed numba for a long time, and am now happy to jump to Julia, for pretty much those reasons. I have implemented time critical parts of the code in numba, and that has forced me to reimplement a whole bunch of standard algorithms rather than relying on library implementations. Need sparse matrices in your hot loop? well you can't just call scipy sparse. That's the fundamental disaster of numba. You sit in the middle of the most amazing ecosystem and you can't use it.


I’m not clear on why you being the author is supposed to matter. There’s a direct, unambiguous quote (of your own writing) that disagrees with what you are saying here. I don’t care who wrote it, only that your opinions here in the comments are not consistent with the actual linked post.

Authors can have inconsistent opinions, and can sometimes try to retroactively reinterpret something they wrote to serve a different purpose later on. That is happening here. It’s not a big deal; I’m just saying for this reason I don’t agree with your claims about my comments, nor your overall comments about numba (even after re-reading your link and comments and deliberately reflecting on them).


I'm not the author, the guy you're replying to is. The article that you quote says that crossing function barriers and joint compilation are issues with numba and cython. So does the comment you were replying to. You've failed to articulate where the perceived discrepancy is.


I see your comment grayed out, and I just want to chime in: as someone who does a lot of numerical stuff (more than a decade, published stuff, supporting multiple lab research projects, etc.), I want to second this point of view. When it’s time to get real work done Python is more than good enough, and there are plenty of strategies for acceleration where required.

And when I want Julia’s promise of fast loops, I use Numba. If all the effort gone into Julia had instead been spent on fixing remaining warts in Python workflow for science, we wouldn’t even be having this conversation.


The point is that Python presents a very complex environment where you have to deal with different languages and technologies.

For package developers it is a lot easier to use Julia.

Saying one should focus more on Python and we would not have these problems is missing the point. Enormous resources from countless companies have been poured in to solve the performance problems of Python.

It is almost impossible to do due to the language design of Python. You cannot fix it without breaking the language.

Julia in contrast required minimum effort and resources to get fast. It is almost a toy project compared to Python. It is all down to clever language design which allowed them to use a rather dumb and simple compiler while letting LLVM do most of the heavy lifting.

Every advance of Python is going to require 10x the effort of advancing Julia. It will just be a question of time before Julia catches up. Python has a huge lead so that will still probably take years but it will happen.


I’m not missing the point: I’ve tried doing the same thing in Julia before as I do in Python, and it’s not a 10x difference. No language (relevant to this discussion) is ever going to cut down on lines of code required to do error checking, code gen, plotting etc. Sure, Julia is great as a high-level interface to LLVM, but so is Numba.

Python’s design leaves something to be desired performance-wise for those coming from JVM or native languages, but it’s a trade off, not an obvious win (for Julia), and the problem goes away as programmers get wise to performance strategies in Python.


I have quite limited experience with Cython and tried Numba just a couple of times, but I'm curious how much would it take to rewrite one of my Julia libraries to them.

The library is for reverse-mode automatic differentiation, but let's put AD itself aside and talk about code generation. As an input to code generator, I have a computational graph (or "tape") - a list of functions connecting input and intermediate variables. As an output I want a compiled function for CPU/GPU. (Note: Theano used to do exactly this, but that's a separate huge system not relevant to Numba or Cython).

In Julia I follow the following steps:

1. Convert all operations on the tape to expressions (approx. 1 line per operation type).

2. Eliminate common subexpressions.

3. Fuse broadcasting, e.g. rewrite:

    c .= a .* b
    e .= c .+ d
into

    e .= a .* b .+ d
Dots next to operations mean that they are applied elementwise without creating intermediate arrays. On CPU, Julia compiler / LLVM then generates code that reads and writes to memory exactly once (unlike e.g. what you would get with several separate operations on numpy arrays). On GPU, CUDAnative generates a single CUDA kernel which in my tests is ~1.5 times faster than several separate kernels. Note that `.=` also means that the result of the operation is written directly to a (buffered) destination, so no memory is allocated in the hot loop.

4. Rewrite everything I can into in-place operations. Notably, matrix multiplication `A * B` is replaced with BLAS/CUBLAS alternative.

5. Add to the expression function header, buffers and JIT-compile the result.
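Step 3 above can be pictured in plain Python, purely to show the structure of the rewrite. This is a sketch with lists standing in for arrays and made-up function names; it says nothing about the speedups, which come from the compiled Julia/CUDA code described above.

```python
def unfused(a, b, d):
    """Two passes: the intermediate c is materialized in full."""
    c = [x * y for x, y in zip(a, b)]        # c .= a .* b
    return [x + y for x, y in zip(c, d)]     # e .= c .+ d

def fused_into(e, a, b, d):
    """One pass writing into a preallocated buffer e; no intermediate list."""
    for i in range(len(e)):
        e[i] = a[i] * b[i] + d[i]            # e .= a .* b .+ d
    return e

a, b, d = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.5, 0.5, 0.5]
e = [0.0] * len(a)                           # allocated once, outside the hot loop
assert fused_into(e, a, b, d) == unfused(a, b, d) == [4.5, 10.5, 18.5]
```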

In Python, I imagine using the `ast` module for code parsing and transformations like common subexpression elimination (how hard would it be?). Perhaps Numba can be used to compile Python code to fast CPU and GPU code, but does it fit with AST? Also, do Numba or Cython do optimizations like broadcasting and kernel fusion? I'd love to see a side-by-side comparison of capabilities in such a scenario!


> In Julia I follow the following steps: … does it fit with AST?

I'm fairly certain the steps you've listed can be accomplished through AST manipulations, and would go something like

    import ast, inspect
    import numba

    def manip_ast(fn):
        fn_ast = ast.parse(inspect.getsource(fn))
        new_fn_ast = …
        return compile(new_fn_ast, …)

    def rewrite(fn):
        fn = manip_ast(fn)
        # return the jitted function, otherwise the decorator yields None
        return numba.jit(fn)

    @rewrite
    def func(*args):
        …
there's nothing in the language that prevents this from working with the autograd package, except no one's taken the time to implement it (https://github.com/HIPS/autograd/issues/47). That said, for many tasks with wide vector data, a DL framework is going to do ok, e.g. PyTorch.
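To make the parse/transform/compile round trip concrete, here is a small self-contained sketch using only the standard `ast` module (Python 3.8+ assumed). It works on a source string and does a toy rewrite of `x ** 2` into `x * x`; the transformer, the helper name `rewrite_source`, and the example function are invented for illustration, and no Numba step is included.

```python
import ast

SOURCE = """
def f(x, y):
    return x ** 2 + y
"""

class SquareToMul(ast.NodeTransformer):
    """Rewrite `name ** 2` into `name * name` (plain names only)."""

    def visit_BinOp(self, node):
        self.generic_visit(node)             # rewrite nested expressions first
        if (isinstance(node.op, ast.Pow)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.right.value, int)
                and node.right.value == 2
                and isinstance(node.left, ast.Name)):
            return ast.BinOp(left=node.left, op=ast.Mult(),
                             right=ast.Name(id=node.left.id, ctx=ast.Load()))
        return node

def rewrite_source(source, name):
    """Parse, transform, compile, and return (function, transformed tree)."""
    tree = SquareToMul().visit(ast.parse(source))
    ast.fix_missing_locations(tree)          # new nodes need line/column info
    namespace = {}
    exec(compile(tree, filename="<rewritten>", mode="exec"), namespace)
    return namespace[name], tree

f, tree = rewrite_source(SOURCE, "f")
assert f(3, 1) == 10
```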

> Julia compiler / LLVM then generates code that reads and writes to memory exactly once (unlike e.g. what you would get with several separate operations on numpy arrays)

Numba's gufuncs address exactly this + broadcasting over arbitrary input shapes. I've used this extensively. That said, I don't find fusing broadcasting is always a win, especially when arrays exceed cache size. Numba's CUDA support will also fuse jit functions into a single kernel, or generate device functions.

Sometimes you want manual control over kernel fusion, and I've found the Loopy (https://documen.tician.de/loopy/) to be fairly flexible in this regard, but it's a completely different approach compared to Numba/Julia.

I'd be interested in a side by side comparison as well, and I was thinking that the main difficulty would be that I couldn't write good Julia code, but maybe we can pair up, if that'd be interesting, to address several common topics that come up (fusion, broadcasting, generics but specialization, etc).


> there's nothing in the language that prevents this from working with the autograd package, except no one's taken the time to implement it (https://github.com/HIPS/autograd/issues/47).

I believe it's more complicated than most posters there realize, especially in the context of PyTorch (which uses a fork of autograd under the hood) with its dynamic graphs... Anyway, AD deserves its own discussion, that's why I didn't want to concentrate on it.

> I'd be interested in a side by side comparison as well, and I was thinking that the main difficulty would be that I couldn't write good Julia code, but maybe we can pair up, if that'd be interesting, to address several common topics that come up (fusion, broadcasting, generics but specialization, etc).

Sounds good! Do you have a task at hand that would involve all the topics and could be implemented in limited time? Maybe some kind of Monte Carlo simulation or Gibbs sampling to get started?


I think it is easier to write fast, complicated code in Julia than in C, C++, or Fortran. Not just the syntax, but because of great support for generic code by default and metaprogramming making it relatively simple to write code to generate the code you actually want for any given scenario. Interactive benchmarking and profiling are a boon too.

An example of the value of generic code is that forward mode AD is extremely easy, and almost always just works on whatever code you run it on.
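The idea carries over to any language with operator overloading. Here is a toy Python sketch (not how any Julia package implements it): a minimal dual-number type pushed through a generic Euler integrator that was written with no knowledge of it. `Dual`, `euler`, and `decay` are illustrative names, and only `+` and `*` are supported.

```python
import math

class Dual:
    """Minimal dual number: a value and its derivative w.r.t. one input."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def euler(f, u0, p, dt, steps):
    """Generic explicit Euler for u' = f(u, p); knows nothing about Dual."""
    u = u0
    for _ in range(steps):
        u = u + dt * f(u, p)
    return u

def decay(u, p):
    return -1.0 * p * u                      # u' = -p * u

# Seeding p with derivative 1 yields d(u_final)/dp alongside the value.
out = euler(decay, 1.0, Dual(0.5, 1.0), 0.1, 10)
assert math.isclose(out.val, 0.95 ** 10)
assert math.isclose(out.der, -(0.95 ** 9))
```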

Then, once that's done, multiple dispatch (and possible macros for a DSL) allows for a much cleaner user interface than Python offers for numeric code.

I have a lot more experience with R than Python, but seeing more of the scientist/mathematician/researcher side of things, I have to strongly disagree with the view that they should write slow code and contact a CS guy to write fast code in another language when they need it. Do you honestly think that's practical for grad students' projects? Recently, one of my friends wrote a simulation in R. Most of the work was done by hcubature -- a package written in C -- integrating a function written in R. It could just as easily have been written in Python. That function was slow, and the simulation ran for days before an error caused it to crash, losing days of compute time. I -- a statistics grad student -- helped him rewrite it in Julia, and it finished in 2 hours.

It is a problem that C/C++ code will still run slowly if it has to call your R/Python code. They also can't apply things like AD easily. A common solution, used by Stan for example, is to create a whole new modeling language and have users interface through that. Learning a new language -- albeit relatively simple/domain specific -- which they then cannot debug interactively, is another pain point. All this can be avoided by simply using Julia.


The reason I ran away from Julia and don't plan on ever using it again, and don't recommend anyone use it outside of academia, is that so much of the community is made up of grad students. So you get a lot of research code and people who have never been professional programmers maintaining most of the ecosystem. Julia Computing is largely made up of people they've hired from the community straight out of grad school.


I don't see your point of academia and about hiring from the community?

What I see on Github is as professional as it can get. Issues, discussions, triage, review, CI-tests for example.

Maybe you started too early, before Julia was settled? And/or were too over-enthusiastic to begin with? I think Julia had to grow, find the 'correct' solution with e.g. NA/Missing/Nullable. Break things b/c it didn't work out as expected. Postpone things, debugger (maybe?), for more important areas or because base was not stable yet.

Two years ago in a project I hoped that people would switch immediately from R to Julia. But in retrospect it was good they didn't. Julia was not ready for them and too much ecosystem stuff missing/unclear still. (This said, Julia would in principle have been much much better suited for that project).


Things are decent on average, but there's a persistent carelessness and rush to do things without paying attention to the consequences. More in packages than base nowadays, but there's a lot of merging and releasing things immediately without waiting for code review that could have caught mistakes before breaking users.


Out of curiosity: what language has a package ecosystem that in your opinion does do this right?


Large Apache projects, notable widely-used c++ projects like boost, llvm, zmq, cmake, the c++ language standards committee itself, all take their time and rarely if ever release changes/bugfixes immediately. Things go through review, testing, release candidates, and people other than original authors of code provide input before normal users get their hands on anything. The core pydata projects take their time and are cautious about breaking things.


I complained also about the "cowboy" culture I saw among the Julia developers when I first started with it (people making a change directly to master, or merging their own PR without giving time for people around the world to review, or not having a minimum number of qualified reviewers before merging), but those days are gone, and I feel they've matured quite a lot in the past few years in that respect. Some of it I think was simply the great excitement that comes from being able to be so creative with the language, and a rush to get things figured out and nailed down to finally get to v1.0. As far as projects in other languages, I don't really feel it has much to do with the languages themselves, more the type of people that particular project attracts.


Just in the last few months BinDeps was broken by a "deprecation fix" that was completely wrong and using a name that didn't exist, and it got merged and released by the patch author before anyone else could look at it, breaking many downstream packages.

Refactorings and major changes in ZMQ.jl and the web stack similarly get merged and released immediately with zero review, still. This is a major problem.

Features in the base language have been deleted during 0.7-DEV because a single core developer didn't like them, despite multiple other core developers voicing disagreement that the features were useful and removing them was not urgent or necessary.

It's not a development culture I would rely on when products and money and jobs are at stake. Even the startup you were working with abandoned julia, correct?


> Just in the last few months BinDeps was broken ...

What I don't understand is why you didn't just stay with old stable versions? You wouldn't be exposed to such issues, would you?

> It's not a development culture I would rely on when products and money and jobs are at stake

On the other hand this 'development culture' has brought brilliant results in a relatively short amount of time with a relatively small team.

There was a talk [1] at the Juliacon 2018 where a company very successfully replaced an IBM product with Julia code. At 48:07 there was a question 'about problems with changes in Julia'. Answer: they started with v0.3 and 'didn't really have many problems'. They 'didn't use anything particularly exotic'. So, yes, I'd say if you adapt to the given situation it can (could have) work(ed).

I'm not convinced that a non-cowboy style would have been better. (And besides, this doesn't come free moneywise).

[1]: https://www.youtube.com/watch?v=__gMirBBNXY


These incidents were with respect to 0.6-supporting versions of packages. Pinning is a good idea for reproducibility but it's not the default, so updating packages or new installs are broken when careless things make it instantly into a release.

Talk to me when google, amazon, microsoft, facebook etc are publicly using and officially supporting julia on cloud platforms or even infrastructure libraries like protobuf.

The carelessness isn't responsible for or helping anything. A good diffeq and optimization suite have been built despite the prevalence of careless practices, not because of them.

It's not a question of money either, just patience and code review and recognition of how many things downstream are going to be affected by mistakes. You'll save more time in not having to put out as many fires than it will cost to slow down and not be in such a rush at all times.


I was not expecting the change to MDD (Marketing Driven Development) at the last minute. But at least 1.0 is out; I hope those wild west times are far behind us now. I'll wait for Julia 1.1 and most packages at 1.0 before diving back in.


Dlang. But it's a compiled language and you have the option of statically compiling all your dependencies. The package manager is also quite simple and just works.

https://github.com/dlang/dub


SciPy's ODE solver doesn't have a stable contributor who contributes more than about once a year even though it has many long standing issues, and PyDSTool hasn't had a commit in 2 years and doesn't work on Python 3 (and most of pycont is incomplete...). R's deSolve isn't even in a repository you can track and still hasn't fixed their DDE solver issues even though they directly mention in the docs how it's incorrect. So it's not like other open source software has strong maintenance structures....


SciPy solvers are mostly interfaces to existing established solvers, and I’ve not had any problems with them. We’ve also used PyDSTool without problem, and it appears to support Python 3,

https://github.com/robclewley/pydstool/blob/master/README.rs...

If you think these are poorly maintained you should see XPPAUT, a tool still quite widely used.


It just sounds like your code doesn't require more than the occasional hotloop. That's fine then. There is no reason to leave numba.

If you have anything that requires more complications, numba becomes painful. You seem to somehow insist that your usecase is the only one out there. We are actively developing a scientific simulation library in Julia. The prototype was in Python+numba. The Julia code is vastly simpler, and that is because Julia is not "an interface to LLVM for fast loops". It's a full fledged language with performant abstractions, closures, inline functions, metaprogramming, etc. To get things fast in numba I ended up doing code generation (I talked to the Numba developers, it seemed the only way). Talk about brittle, painful and impossible to generalize.

Now we have Julia code, using sparse matrices in the hot loop is easy, Automatic Differentiation just works, etc...

The correct comparison for Julia in this context is C++, not Python.


I’m not insisting I have the only use case, but apart from the examples of traversing language boundaries, I haven’t seen a good example of what’s so painful in Numba. What is so challenging that it requires code generation?


I have a data structure based on which I generate a dynamical behaviour that I want to integrate. So I construct a rhs.

I further want the user of the library be able to pass it new functions that can be integrated into overall dynamical behaviour.

There are different ways to achieve this, the simplest version is with closures. Pass a list of functions, and some parameters and I construct a right hand side function from it. Unfortunately this does not work with numba. What I ended up doing is passing not the function itself but the function text to generate the code of the function to be jited and then eval that. It worked but it was horrible to maintain, and required users to pass function bodies as text with a very specific format.
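To make the two approaches concrete, here is a rough sketch in plain Python. The names `make_rhs_closure` and `make_rhs_from_text`, and the string format for the bodies, are made up for illustration; numba is left out so the snippet runs anywhere, and in the real setup the generated function is what would be handed to the JIT.

```python
def make_rhs_closure(fs, params):
    """Closure version: fs is a list of callables f(x, p), one per component."""
    def rhs(y, t):
        return [f(y_i, params) for f, y_i in zip(fs, y)]
    return rhs


def make_rhs_from_text(bodies):
    """Text version: each entry is an expression in `x` and `p`, e.g. "-p * x".

    The source of one flat function is assembled and exec'd. Users have to
    stick to this exact string format, which is what makes it brittle.
    """
    lines = ["def rhs(y, t, p):", "    r = [0.0] * len(y)"]
    for i, body in enumerate(bodies):
        lines.append(f"    x = y[{i}]")
        lines.append(f"    r[{i}] = {body}")
    lines.append("    return r")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["rhs"]


closure_rhs = make_rhs_closure([lambda x, p: -p * x, lambda x, p: p * x * x], 2.0)
text_rhs = make_rhs_from_text(["-p * x", "p * x * x"])

print(closure_rhs([1.0, 3.0], 0.0))    # [-2.0, 18.0]
print(text_rhs([1.0, 3.0], 0.0, 2.0))  # [-2.0, 18.0]
```

Both give the same numbers; the difference is that the second one forces users to write strings instead of functions.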

Now in Julia we will probably eventually transition to a macro based approach, but the simple closure based model just worked.

Previously I had large scale, inhomogeneous right hand side functions that I wanted to jit in numba and that need sparse matrices. So I ended up having to implement sparse matrix algorithms by hand because I can't call scipy.sparse.

Another instance: I implemented a solver for stochastic differential equations with algebraic constraints in numba, partly to be able to use it with numba jited functions and get a complete compiled solver out of it. This already constrained my users to use numba compatible code in their right hand side functions.

In order to get this to work I had to implement a non-linear solver from scratch in numba rather than being able to use scipys excellent set of solvers.

Julia is not a magic silver bullet. Getting the ODE solvers to make full use of sparsity still requires some care and attention. But I simply spend a lot less time on bullshit than before. (so I have more time to spend on HackerNews :P)

I decided to switch over when for one paper I was able to implement a problem using the standard tools and packages available in Julia within half a day. The Python equivalent would have involved using a new library that came with its own DSL, which would have meant rewriting quite a bit of my code to take advantage of it. Easily several days work.

With DifferentialEquations.jl I also could just test half a dozen different numerical algorithms on a problem in a matter of minutes, find out which performed best and use that for MonteCarlo. Saved about a week of computation time on one project alone. That's not a critical amount, nobody cares if the paper comes out a week later or earlier, but it's nice (and I don't waste super computer time). With Python libraries with different DSLs this would have taken considerably longer, and I probably would not have done it. This is the result of having one library and interface rather than a whole bunch, if everyone agreed on scipys ode interface (which just got properly established in scipy 1.0.0) this would be easy in Python as well. But that's also the point that people have been making: Julias design for composition over inheritance makes it convenient to rally around one base package.
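To show what I mean by one interface, here is a toy sketch in plain Python. It is not the DifferentialEquations.jl or SciPy API; `solve` and the three steppers are hypothetical names, and the test problem y' = -y is just a stand-in. The point is only that swapping the algorithm is a one-argument change when everything shares a signature.

```python
import math


def euler_step(f, t, y, h):
    return y + h * f(t, y)


def midpoint_step(f, t, y, h):
    return y + h * f(t + h / 2, y + h / 2 * f(t, y))


def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def solve(f, y0, t_end, n_steps, step):
    """Fixed-step driver; `step` is any function with the stepper signature."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    return -y


# Same problem, three algorithms, one loop.
exact = math.exp(-1.0)
errors = {
    step.__name__: abs(solve(decay, 1.0, 1.0, 100, step) - exact)
    for step in (euler_step, midpoint_step, rk4_step)
}
for name, err in errors.items():
    print(f"{name}: error {err:.2e}")
```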

I also personally very much like being able to enforce types when I want to. This is a big win for bigger projects for us.


> With DifferentialEquations.jl I also could just

yep... I took a look at the DE packages in Julia today, and quite frankly they're much better than the situation in Python, perhaps because one or more prolific applied mathematicians are making a concerted effort, which is lacking in Python? I dunno, but I did recommend my colleagues look at Julia for DEs, for this reason.

That said,

> Pass a list of functions, and some parameters and I construct a right hand side function from it. Unfortunately this does not work with numba.

I'm pretty sure I've done this before with numba, so maybe getting concrete would help, e.g. an Euler step

    import numba

    def euler(f, dt, nopython=False):
        @numba.jit(nopython=nopython)
        def step(x):
            return x + dt * f(x)
        return step

where the user can provide a regular Python function or a @numba.jit'd function. If a @numba.jit'd function is provided, and nopython=True, this should result in fused machine code. This sort of code gen through nested functions can be done repeatedly for e.g. the time stepping loop.

I've done this for CPU & GPU code for a complex model space (10 neural mass models, 8 coupling functions, 2 integration schemes, N output functions, ...) which, by the above pattern, results in flexible but fast code.
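Roughly, the nesting for the time stepping loop looks like the sketch below. To keep it runnable without numba, the decorator is a parameter that defaults to a do-nothing stand-in (`identity`, a made-up name); the assumption is that a numba decorator would be passed in its place and that `f` itself is jitted. Whether everything fuses then depends on the numba version.

```python
def identity(func):
    """Stand-in decorator so the sketch runs without numba installed."""
    return func


def make_step(f, dt, jit=identity):
    @jit
    def step(x):
        return x + dt * f(x)
    return step


def make_integrator(step, n_steps, jit=identity):
    # Second level of nesting: the loop closes over the already-built step.
    @jit
    def integrate(x):
        for _ in range(n_steps):
            x = step(x)
        return x
    return integrate


def decay(x):
    return -x


integrate = make_integrator(make_step(decay, 0.001), 1000)
result = integrate(1.0)
print(result)  # close to exp(-1), about 0.368
```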

Is this a pattern that captures your use case or not yet?

> implement sparse matrix algorithms by hand because I can't call scipy.sparse.

agreed, this is a surprising omission, which I attribute to not much of the numerical Python community making use of Numba, but could be fixed rapidly.

> constrained my users to use numba compatible code in their right hand side functions

what did you run into that was problematic?

> I had to implement a non-linear solver from scratch in numba rather than being able to use scipys excellent set of solvers

I didn't follow; passing @numba.jit'd functions to scipy is in the Numba guide, so what exactly didn't work?


This pattern is how I wrote the SDE solver in Python. That works great and is really useful and the reason why I teach closures.

The library we're building now though does something different. Something like this:

  import numpy as np

  def network_rhs(fs, Network):
    def rhs(y, t):
      y2 = np.dot(Network, y)
      r = np.empty_like(y)
      for i, f in enumerate(fs):
        r[i] = f(y2[i])
      return r
    return rhs

> what did you run into that was problematic?

For more complex model building the right hand side functions actually make use of fairly complex class hierarchies. That was the major stumbling block. But people also were using dictionaries and other non-numpy data structures and just generally idiomatic Python that is not always supported. Some of that stuff is inherently slow/bad design of course, but it still ended up killing the use of my solver for this project.

They are now rewriting in C++, which is absolutely a great choice for their case (and probably would have been viable for us too if we had had more people with a C/C++ background in the team).

> passing @numba.jit'd functions to scipy is in the Numba guide

I wanted to use scipy.optimize.root from numba. Not the other way around.

Now if all of the numerical Python community was standardized on numba, a lot of this would not be an issue. Scipys LowLevelCallable is a great step in the right direction. But fundamentally I don't see how you will ever get the different libraries to play together nicely in a performant way. It would require every API to expose numba jitable functions. Last I checked, the only functions you could call from within numba code were other numba functions and the handful of numpy features the numba authors implemented themselves (I remember waiting for dot and inv support). If I have an algorithm by a student implemented on a networkx graph as a data structure I can't just jit that. In Julia it automatically is.


I see what you mean. I’ve done exactly that sort of thing in C with an array of function pointers, but I’m not sure it would work in Numba.

The churn is exhausting but I see the merit of starting over and getting everything done in a fully fledged JITd language.


As said in another part of the thread, we were in a very good spot to do so. There absolutely are many reasons to not jump in at this time.


This just sounds like bad software design to me. You are miswanting something overly generic that’s super not needed, and regardless of implementing in any given language, it sounds like it would benefit hugely from taking a more YAGNI approach to it, restricting its genericity based on likely usage (not intended or imagined usage), and either just manually writing stuff for an exhaustive set of use cases, or code genning just those cases and not allowing or encouraging arbitrary code gen of possible other cases.

I love it when libraries limit what can be done with them and document an extremely specific scope they apply to.

When libraries try to be all things to all people, it’s bad. A sophisticated code gen tool that enables library authors to choose to do that is a bad thing, not a good thing.


You don't know my use case, and you are not right. I have a network of heterogeneous interacting nodes with quite different dynamics on the nodes. I pay great attention to YAGNI, and constantly tell my students and colleagues to cut generality and work from the specific case outward. But this is just essential complexity of the problem domain. I've spent years implementing the concrete cases, I know what research we couldn't and didn't do because it was too painful to do by hand, and this is the minimum level of generality I can get away with.

I have ideas for a more general library of course, :P But I'm not spending time on them.


SciPy's solvers cannot handle events which are nearby, most return codes aren't documented, you cannot run the wrapped solvers in stepping control mode, you cannot give it a linear solver routine, etc. So it wraps established solvers but still only gives the very basic solving feature of it, and most of the details that made the methods famous are not actually available from SciPy's interface.

And it wasn't Python 3 for pydstool. It's SciPy 1.0.0. Some of the recent maintenance for this stuff has actually come from the Julia devs though:

https://github.com/robclewley/pydstool/pull/132


You mentioned Python 3, not me. Btw, I did look through your DE packages, and they are definitely an amazing contribution not seen in Python; I've recommended them to colleagues.


>You mentioned Python 3, not me.

Yeah sorry, I was just acknowledging that I was wrong when I found the PR and noticed the mistake. I guess it came across oddly.


> Julia Computing is largely made up of people they've hired from the community straight out of grad school.

Where do you think most companies get "professional programmers" from, exactly?

Julia's been designed and implemented by some very bright people, and it shows.


Industry gets professional programmers by hiring people who have been hammering out shipping code in paying products for years, and years, doing support, maintenance, and new product development and research.

Grad students may be brilliant, but that does not give them any insight into what makes an ecosystem, toolchain, and feature set good.


We would love to have more professional programmers contribute: unfortunately those 1-based indices put them off.

More seriously: part of the problem does seem to be that Julia does have some significant differences from "traditional" languages (e.g. the concept of a "virtual method" is a bit fuzzy in Julia, what we call a JIT is probably better described as a JAOT, whether it has a "type system", homoiconicity, etc.).

That said, this JuliaCon I have met a lot more people from classical "programmer" backgrounds. So hopefully that is changing.


I've seen quite an evolution over the past 3.4 years I've been using Julia and the 4 JuliaCons I've attended so far. Back at the 2015 JuliaCon, a number of us "older" professional programmers felt like we should stage a palace coup, because it did feel like the input of people who had "been around the block" a few times was not really valued. That's changed quite a lot (maybe because in the intervening years many of the core contributors have gotten their Ph.D.s and are having to live off their blood, sweat, and tears (plus lots of joy, to be sure) of producing things with Julia that people will actually pay money for). Yes, it was young and brash, but those awkward years seem to be past, and I feel the future of Julia is quite bright.


How was it with the origin and design with Python, NumPy, Matplotlib, Pandas? Were the people who originated these projects in their time any more professional and seasoned than Julia people are currently?


Well, if they'd been as brilliant as the GP indicates there would be no need for Julia, would there?


Also, I was actually a postdoc...


I hope that if/as Julia gets adopted in industry, more libraries get written and maintained by professionals. If the language is successful, that may change.

Although AFAIK it hasn't really in the case of R, outside of tidyverse.

As a grad student without a CS background, I don't think I'm qualified to say much more on this.


The issue isn't "professionals", it's domain specialists. In the past, data specialists didn't have a computer science specialty. They were awesome at stats and numbers but lacked strong programming skills.

Hadley Wickham is special because he has both the stats, data science AND programming skills.

data.table is also an amazing library. R to me is the most improved language in the history of programming languages over the past 5 years.

Also R allows anyone with basic hackery R skills to create libraries easily and that is why so many of them are not optimal.


What happens for that scientist when they have to dive into Julia’s stack to debug something weird? In Python and C, you have established debuggers, semantics etc, which means that, yes, there are two languages instead of one, but neither is a moving target compared to a language which just had a 1.0 release.

I get the issue with scientists writing poor code, but Numba has largely solved this problem, by packing an LLVM JIT into a decorator which can be applied to any numerical code to get the same speedups as Julia, with no language switch required.

Citing slow code in the wild with a fast rewrite is a hilariously poor anecdote performance wise. I’ve rewritten Fortran code into Python and gotten speed ups. Regardless of the language, garbage in, garbage out.

Stan is an example where the modeling is “just” a DSL implemented as C++ templates. Does that make that a good choice?


Sometimes low level debugging is a surprisingly pleasant experience as the julia JIT generates proper DWARF debug info. So for instance, you can break in gdb and see the julia source code for any julia generated stack frames, neatly intertwined with the frames of the C runtime.

To be clear, I don't remember needing to do this as a regular user. As an occasional compiler hacker it's been quite nice though.


As someone who's spent decades programming in C/C++, and diving into assembly code (and writing a fair share when the compiler just couldn't do what I needed), I love being able to directly inspect the output code at many levels, including all the way down to "bare metal". Yes, there's a lot of work to be done in the area of debuggers for Julia, but there are already useful debugging tools (like Rebugger) that I haven't seen for any other language.


That’s cool, I wasn’t aware. I think that’s most useful when working with external libs.


It'd probably be a lot easier to debug something somewhere in the Julia stack, than in the C/C++/Fortran code that many R libraries run through.

My point with the rewrite was not garbage in, garbage out. It was that even though the original R code was using a library written in C, that library had to call a function he wrote in R millions of times. That R function being inherently slow is part of the problem. (The easiest fix for that is just writing the function you pass to that library in RCpp, but the overhead on that is still close to a microsecond -- not sure how it is in R. Numba is probably easier.) It is nicer to not have to worry about that.

An alternative approach some libraries provide, like my Stan example in RStan or PyStan, is the DSL they implement to make it easier for end users in R or Python to write fast C++ code.

But now let's say you're working on an optimization problem. You want to use a gradient-based optimization method, while your code is heavily dependent on the FooBars and Widgets libraries. If these libraries are written in Julia, you can write code in Julia using these libraries, and automatic differentiation will just work as you pass it to Optim.jl.

If FooBars and Widgets were Cython libraries, optionally wrapping C/C++ code, would this work? Could you write functions making use of these libraries, and get efficient gradients for optimization for use with an optimization library and have everything be fast?

'Stan is an example where the modeling is “just” a DSL implemented as C++ templates. Does that make that a good choice?'

I gave Stan as an example of a less than ideal situation, because people normally use Stan from R or Python, not from within C++. Therefore R and Python don't integrate well there. If you're already working in C++, Stan seems ideal. You can use arbitrary templated C++ code with Stan, include external libraries, etc. It needs to be C++ because of their autodiff.


Optimization (or any gradient based algorithm) is a good use case for AD, but I don’t see why Julia’s approaches are any better than Python’s, eg autograd, theano, pytorch etc.

And sure that wouldn’t work with arbitrary Cython modules because Cython was designed as a Pythonic syntax over the Python C-API, and it just happened to become popular for numerical work.

I don’t think that’s a strong argument, though, because anything small enough to be usable with AD can be rewritten without too much time lost, whereas those massive Fortran routines with iterative algorithms wouldn’t produce useful gradients in any language.


>And sure that wouldn’t work with arbitrary Cython modules because Cython was designed as a Pythonic syntax over the Python C-API, and it just happened to become popular for numerical work.

>I don’t think that’s a strong argument, though, because anything small enough to be usable with AD can be rewritten without too much time lost, whereas those massive Fortran routines with iterative algorithms wouldn’t produce useful gradients in any language.

No. In Julia you can just stick the entire delay differential equation solver into the AD functions and get a gradient for parameter estimation. Saying you cannot use an arbitrary Cython code is a limitation, and saying you cannot put a random large code into AD is a limitation. It wouldn't be an issue if these weren't already solved problems, but having a performant software with simple and available AD is not something that's unreasonable anymore. If you use Julia, it's just something you can expect to work.
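For anyone who hasn't seen this style of AD, here is a toy sketch of the idea in plain Python: a dual number type pushed through a solver that was written without AD in mind. It is forward-mode AD on a fixed-step Euler loop for y' = -p*y, not a DDE solver and not how any Julia package implements it; `Dual`, `euler_solve` and `decay` are made-up names.

```python
class Dual:
    """Number carrying a value and its derivative with respect to one parameter."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def euler_solve(rhs, y0, p, t_end, n_steps):
    """Generic fixed-step Euler solver; it knows nothing about Dual."""
    h = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y = y + h * rhs(y, p)
    return y


def decay(y, p):
    return -p * y


# Seed the parameter with derivative 1 to get d y(1) / d p at p = 2.
y_end = euler_solve(decay, 1.0, Dual(2.0, 1.0), 1.0, 1000)
print(y_end.val, y_end.der)  # both close in magnitude to exp(-2)
```

The exact solution is exp(-p), so the value should be near exp(-2) and the derivative near -exp(-2).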


No you can’t, at least, not if you want it to work. This has nothing to do with Julia though and I didn’t say you can’t run AD over the solver but that that generally will not produce good gradient estimates unless the solver is written with AD in mind. DDEs are a mixed case where I’d expect some parts to be workable but not in general. Another example is the FFT, totally worthless to use AD.


In Python I can choose whether to use autograd or numba, I can't JIT the autograded function.

Seriously, read this issue:

https://github.com/HIPS/autograd/issues/47


I’m aware of that, but high performance, fat vector AD would be done with eg Theano or PyTorch, not autograd.


That's the problem: You're locked into one particular library. You can not combine libraries without sacrificing massive performance. There is just no way around that.

The numba story in that github issue mirrors my own experience: Excitement! This works! It's fast! Ok here are some limitations that I can work around. Hmm I would really like to use this library, in principle it should be possible to JIT its output/make it JIT compatible. In practice this turns out to be way too subtle. Ok I'm giving up, either I reimplement an algorithm directly in my hotloop or run slow code.

So yeah, as long as your problems do not cross the domain of one package, Python and its ecosystem is great. Stay with it. But I fully expect that we'll see a lot more innovation in Julia. Already now there are classes of problems for which no Python solution exists but which actually have library support in Julia. That's really really remarkable.

Despite my misgivings about the state of tutorials, the release handling process and the approach to tooling, this is why I switched already. I also hope that all these aspects will improve massively post 1.0.

It's genuinely a liberation to no longer be confined to silos of DSLs that do not allow for low cost abstractions.

It's fine if you don't need it. I just don't understand why you hang out in a thread about Julia insisting that I don't need it either and could just use numba + Python when that's exactly what I've been using prior to Julia.


> That's the problem: You're locked into one particular library.

But that's not an issue resolved by any particular language; Julia appears to be free of lock-in because it hasn't had time to develop multiple, exclusive approaches to the same problems. Perhaps Julia builds into the language the ultimate performance solutions, so ok, then, for example, wait until there are N different web frameworks, and there you will find your silos. Python has many approaches to making things fast, which is why there are silos. /shrug

> I switched already

I probably would too if I was still a grad student.

> why you hang out in a thread about Julia

I was perusing whilst waiting for my Python code to complete, when I saw someone suggesting Python is already quite OK, catching some hate. I'm more than happy for the Julia community, but I think it's helpful to get exchanges accurate and critical.


Well I'm not a grad student, but my grad students were happy to switch, too. :) I think you are ignoring the structural reasons why Python has to lead to silos, and that these reasons are addressed in Julia. But in the end, time will tell.


I’m not ignoring anything, but trying to evaluate whether Julia will be worth supporting as the N+1 scientific stack in the lab I work for, and what to recommend to incoming students and people who consider getting off of MATLAB.

I think Julia looks very sexy and students jump on that, often without considering whether they will get their actual work done or spend time porting libs or debugging things I can’t help them with.


I was worried about the same. I think it's a valid reason to sit back and wait. Especially if Numba works for you.

I was also and continue to be worried about the tooling, the lack of good tutorials and especially the Type driven system. I like it so far but Object Oriented is a lot more familiar to many people. The library situation for me specifically has tipped to be a net positive. I also could transition my very small team off Python completely.

So it was not an ad hoc decision, I tried it several times over the last years and decided it wasn't there. In my specific situation, with a rewrite of a core library coming up and the library situation being there, that changed early this year.


> What happens for that scientist when they have to dive into Julia’s stack to debug something weird?

The same things that happened when we had this conversation about what happens to the Fortran writing scientist when Python and Numpy came along. Even at that time it wasn't the first time. I am sure it would not have been a whole lot different when a COBOL alternative came along.


Not quite: the argument for Julia is that a casual user won’t drop down into C from Python for performance while in Julia it wouldn’t be necessary, thus easier.


Anecdote time: I hit a non-deterministic bug in one of the C based Python packages we were using. Most of the time it worked, but we were running MonteCarlo on it and saw many errors.

I guessed correctly that it was using uninitialized memory, and errored when that wasn't zeroed out, but my C wasn't good enough to find where. Had this been Julia the whole C code would have been Julia code and I would have had a chance to dive in and debug. I ended up having to get a colleague who's fluent in C to help.


Let me guess, this can’t happen in Julia because memory is always initialized? Sounds like a performance hit if you know what you’re doing, so maybe you can use uninitialized memory in Julia and run into the same bug. Perhaps Julia makes it easy to use LLVM sanitizers? But you could’ve done this with your C code as well.


The bug could totally happen in Julia. The point was your question: "What happens for that scientist when they have to dive into Julia’s stack to debug something weird?"

And then you claimed that this was somehow better in Python + C, which is not my experience. I expect this to be easier in Julia than in Python and C.

I totally agree that the tooling is not where it needs to be, btw, but now that the target has stopped moving I expect it to get there soon.


> And when I want Julia’s promise of fast loops, I use Numba.

This only works (easily) as long as you don't have user-defined types

> If all the effort gone into Julia had instead been spent on fixing remaining warts in Python workflow for science, we wouldn’t even have this conversation.

Python is too dynamic, you cannot just fix remaining warts. From Julia documentation I know that Julia language has been designed for speed and e.g. some dynamic possibilities have been omitted in order to be able to generate fast code. For a general idea about Julia speed see e.g. this 7 hours old excellent Juliacon video: https://www.youtube.com/watch?v=XWIZ_dCO6X8

Python certainly works. But already for syntax alone, if you have written Julia, it's hard to - in my case - go back to R.


Numba has user defined types,

https://numba.pydata.org/numba-doc/dev/user/jitclass.html

No one intends to fix Python but it’s straightforward to do things like Numba: use a decorator to read out the AST for a function, reimplement it however you like and pass back the compiled function, and document the semantics.


But you have to define the types for them to work. That's exactly the problem and that's what Julia has already solved.


What are user defined types that don’t require being defined?


Generic functions in Julia can use types which have no reference to their definition using dependent compilation.

http://www.stochasticlifestyle.com/why-numba-and-cython-are-...

So you can work with types even when you've never seen their definition given how the compilation will occur with all of the pieces together instead of separately.


I usually profit from type definitions during debugging and memory layout work, so perhaps I don’t see this as a feature.


It's probably us that are using types in a weird way so it's my lack of explanation that's the issue. Types in Julia undergo multiple dispatch, so by passing a type into a generic function you can make the same generic algorithm run in different ways. So I use types to parallelize my code, calculate derivatives, propagate parameter uncertainties, and things like that. This talk from JuliaCon discusses a lot of the things for free that were developed by taking a type from one package and putting it into another:

https://www.youtube.com/watch?v=dmWQtI3DFFo
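A loose Python analogy, in case it helps: below, a generic `mean` is written once and then fed a number-like type it has never heard of. This relies on operator overloading and duck typing rather than Julia's multiple dispatch, and nothing gets compiled or specialized, so treat it only as a picture of the "put a type from one package into another" idea. `Interval` is a made-up toy type.

```python
class Interval:
    """Closed interval [lo, hi]; arithmetic keeps track of the possible range."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        if isinstance(other, Interval):
            return Interval(self.lo + other.lo, self.hi + other.hi)
        return Interval(self.lo + other, self.hi + other)

    __radd__ = __add__

    def __truediv__(self, scalar):
        # Only division by a positive plain number is supported in this sketch.
        return Interval(self.lo / scalar, self.hi / scalar)

    def __repr__(self):
        return f"Interval({self.lo}, {self.hi})"


def mean(xs):
    """Generic algorithm: written once, with no mention of Interval."""
    return sum(xs) / len(xs)


print(mean([1.0, 2.0, 6.0]))                           # 3.0
print(mean([Interval(0.5, 1.5), Interval(2.5, 3.5)]))  # Interval(1.5, 2.5)
```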


thank you


> And when I want Julia’s promise of fast loops, I use Numba. If all the effort gone into Julia had instead been spent on fixing remaining warts in Python workflow for science, we wouldn’t even have this conversation.

Python is rather a mess. Code written in Python can't be sped up without pain/cost, and apparently it will never support concurrency natively. It also suffers from the bane of weakly typed languages, errors at run time instead of compile time.

I think the sweet spot for a language with most of Python's benefits that fixes many of its glaring warts is enormous.


Python supports thread and process based concurrency. Most optimizations require effort, or they wouldn’t be called optimizations.
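A minimal sketch of the thread based flavour using the standard library; `work` is a placeholder function. `ProcessPoolExecutor` in the same module has the same interface for the process based flavour. Note that in standard CPython the GIL keeps threads from running Python bytecode in parallel, so threads mostly help with I/O or with code that releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def work(n):
    # Placeholder task; real code would do I/O or call into native code here.
    return n * n


with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves input order regardless of which thread finishes first.
    squares = list(pool.map(work, range(8)))

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
```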

It’s not weakly typed either: you can’t add a list to a string. It’s dynamically typed.
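A quick way to see the distinction (the `add` helper is just for illustration): the mismatched addition is rejected rather than silently coerced, but only when the line actually runs, and a name is free to be rebound to a value of another type.

```python
def add(a, b):
    return a + b


# Strong typing: no implicit coercion between list and str.
try:
    add([1, 2], "three")
    outcome = "no error"
except TypeError as exc:
    outcome = f"TypeError: {exc}"
print(outcome)

# Dynamic typing: the name x has no fixed type, only its current value does.
x = 1
x = "one"
print(type(x).__name__)  # str
```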


Many scientists might have mathematical ideas about how an operation should be done, but don't want to learn C++ to implement them. We create a division between scientists and programmers that hurts productivity.


Contrarian view: having a division between "general scientists" and "programmer scientists" is a good thing. I really don't miss the bad old days of scientists doing write-only code from a cobbled-together mess based on Numerical Recipes, leaving behind spaghetti C99 or F77 for the next post-doc, commenting out routines for version control; giving us papers with numerical analysis that's not even reproducible by the same group 12 months down the line.

In fact I think it's a much better thing to have a sort of division where a general scientist can put together a simulation/analysis pipeline in Python or R, but if they want to implement new algorithms they'll be sufficiently out of their league that they'll need help from someone who has actually spent time learning how to code properly and efficiently, to do testing and version control etc.

In fact it's very similar to how experimental science often works: you have the general scientist who just knows how to do basic measurements with some instrument, and you have the instrument scientist who has to help if the general scientist wants to use some novel technique, who keeps the equipment tidy and working and logs everything in the big lab journal.


I can understand how one can reasonably hypothesize that. In some sense you can see Julia as one big social experiment in what happens when you do everything possible to blur the line between users and developers of a scientific (and, of course, also general) programming language by solving the two language problem and making the language for users and developers one and the same.

... and it's been wildly successful. Some of the most prolific and talented contributors to Julia and its ecosystem are scientists by day and brilliant programmers of all kinds—crazy meta programmers, compiler hackers, generational GC writers, etc.—by night (as they say). During the keynote last night before tagging 1.0, we went through the history of major features in Julia's development history, it became a running theme that so many of these contributions have come from people who are physicists, chemists, geologists, biologists, etc. So I'd say that Julia is strong evidence that breaking down the separation between scientific users and developers of scientific libraries is a good idea.


I was watching live stream last night from the East coast and very impressed by the achievement by you scientists and grad students since 2012. https://arxiv.org/abs/1209.5145

As a former physicist, I have been through the Numerical Recipes/F77 stage myself two decades ago. In the age of Big Data and Machine Learning, there are many ways to harvest the creativities in people who are not trained as professional engineers. Our software development mindset should be extended beyond the industrial "task oriented" products and platforms.

In the spirit of Turing machine and LISP where code is also data, we can even treat scientific computing ecosystems like Python/R/Julia etc. as models to capture creative scientific minds. It is AI at a high social level. The future is beyond our imagination.

We need more crazy meta programmers.


> I was watching live stream last night from the East coast and very impressed by the achievement by you scientists and grad students since 2012. https://arxiv.org/abs/1209.5145

Thank you, that's very kind of you! To be fair, Jeff, Viral, Alan and I had many collective decades of industry experience interleaved with an equal amount of academia by the time we wrote that paper. And of course the academics are rather suspicious about just how academic we really are.

> We need more crazy meta programmers.

Hear, hear!


That’s a great viewpoint, imho! Scientists in most fields have to write code nowadays, so it's better to make it easier for them to hack together novel and useful tools.


But this is what Cython already is...


He mentioned implementors of frameworks specifically, not users of frameworks which you seem to talk about. Julia is superior to R and Python as a language for package development. This is illustrated by the fact that almost all big popular Python libraries are made in complicated C++. In Julia all the popular packages are native Julia, because it is a high performance language. It means it is much easier to get package contributors and feedback and help from package users. This is why Julia is moving forward so much faster than Python despite having much smaller mindshare.


Library devs also need to actively think about memory layout if they are after crazy amounts of speed.


> For example, the fact that an array of unions such as Array{Union{Missing,T}} is represented in memory as a much more efficient union of arrays is a perfect example of where a clever compiler can make the logical thing to do also the efficient thing to do!

Can you elaborate on how Julia represents arrays of unions, or point to some documentation? I'm working on something where an automatic efficient representation would be useful, and I'd like to learn from others' experience.


There's an overview here: https://docs.julialang.org/en/v1.0.0/devdocs/isbitsunionarra...

In terms of the Julia changes that were needed to support this, there were two key PRs:

https://github.com/JuliaLang/julia/pull/20593
https://github.com/JuliaLang/julia/pull/22441
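
For a quick feel of what this means in practice, here is a minimal sketch (not taken from the devdocs). `Base.isbitsunion` is an unexported internal helper, so treat its availability as an assumption that may change between versions. `MaybeInts` and `getvalue` are hypothetical names that only mimic the payload-plus-selector-byte idea by hand; they are not how Julia implements it.

```julia
# An array whose element type is a small union.
v = Union{Missing, Int64}[1, missing, 3]

# Both union members are isbits types, so the union is eligible for the
# inline (non-pointer) storage described in the devdocs linked above.
# NOTE: Base.isbitsunion is internal and unexported (assumption: present in 1.x).
inline_ok = Base.isbitsunion(eltype(v))                # true
inline_str = Base.isbitsunion(Union{Missing, String})  # false: String is not isbits

# Hypothetical hand-rolled equivalent: one dense payload buffer
# plus one selector byte per element.
struct MaybeInts
    payload::Vector{Int64}   # raw values; the slot is ignored when missing
    selector::Vector{UInt8}  # 0 = missing, 1 = Int64
end

function MaybeInts(xs::AbstractVector)
    payload = Int64[x === missing ? 0 : x for x in xs]
    selector = UInt8[x === missing ? 0 : 1 for x in xs]
    return MaybeInts(payload, selector)
end

# Read an element back by consulting the selector byte first.
getvalue(m::MaybeInts, i::Integer) = m.selector[i] == 0 ? missing : m.payload[i]

m = MaybeInts(v)
```

With this, `getvalue(m, 2)` gives back `missing` and `getvalue(m, 3)` gives back `3`, without ever storing a boxed value.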


Thanks, this looks about as I expected. The real problem occurs when you have an array of union types where some of the union cases are themselves arrays, which it does not look like Julia does anything clever about (because it's not clear to me that it's even possible).


We merged the 1.0 PR at JuliaCon live, with streaming on YouTube. Some people remarked that this might be a first for a major programming language release.

https://youtu.be/1jN5wKvN-Uk?t=1h3m

It was fun to do this with everyone at JuliaCon and online, and I thought it was worth sharing here.


With every release I download Julia and try to work through some tutorials. Every time (0.4.0, 0.6.0, 1.0.0) I get stuck at some error, usually during the pre-compiling of some dependency.

For example, I have downloaded julia-1.0.0. I try to follow this tutorial here, linked in this post by someone: http://juliadb.org/latest/manual/tutorial.html

Then I do this and get an error:

    julia> using JuliaDB
    [ Info: Precompiling JuliaDB [a93385a2-3734-596a-9a66-3cfbb77141e6]
    ERROR: LoadError: UndefVarError: start not defined
    Stacktrace:

Every time. Even the screenshot of Julia code that julialang.org used to have was not runnable, by the core devs' own admission.

What am I doing wrong? How are you able to run large Julia programs successfully?

Edit:

Let's try the tutorial at https://www.analyticsvidhya.com/blog/2017/10/comprehensive-t.... The first command, Pkg.add("IJulia"), fails to install the dependency Conda.

Same for the tutorial at http://ucidatascienceinitiative.github.io/IntroToJulia/Html/...

Sigh. I give up.


> Every time (0.4.0, 0.6.0, 1.0.0) I get stuck

You should have tried 0.4.2, 0.6.3, 1.0.1. Motto: Don't start too fast and be slow giving up ;-)


Yes, if you want stability, you should never use an x.0 or x.0.0 release (even from a big company; how many people remember Windows 3.0?). I, however, am a bit crazy and enjoy living on the bleeding edge, so I am up late tonight hacking to make sure all my packages work correctly on v1.0.0 of Julia!


Thanks for the tip :-)


After every release the package ecosystem usually takes a few weeks to catch up. I'd recommend trying 0.6 until then.


Is it possible for the Julia team to make a short tutorial that does not depend on any external packages? Just show the new features in a fresh installation on a clean machine downloaded from https://julialang.org/downloads/ .

That would help a lot of people get started.


It was literally released today. Of course the packages don't work yet...


That makes sense. Maybe I was used to R (CRAN) where uploaded packages are actually tested against the version they declare to support.


That probably didn't happen at R 1.0.0. (Release notes: ftp://cran.r-project.org/pub/R/R-release-1.0.0.html).

More informative wikipedia page: https://en.wikipedia.org/wiki/R_(programming_language)#CRAN

That being said, R was GNU S once upon a time, so they didn't really build a new language, rather an open source clone of a popular proprietary tool. (In case it wasn't clear I both love R and am madly excited that Julia has finally hit 1.0.0).


CRAN has extremely strong requirements around testing, compatibility, documentation, all aggressively enforced by the maintainers of the repository.

The Julia package ecosystem is much more anarchic, like npm. You basically just have to have a public git repository with certain files in place, and a cursory review from the managers of the package metadata.

There are tradeoffs. I'm honestly not sure which one is the right way. I really appreciate how much I can trust that a CRAN package works, but there's a reason so many R devs are using devtools to do an end-run around it.


On each package's GitHub page, there should be unit test info, including badges that link to Travis and/or AppVeyor, indicating where the tests were run and whether they passed. If the tests haven't run in a while, the package won't have been tested on 1.0. Waiting a month may be a good bet.


JuliaDB declares support for 0.6 not 1.0 - http://juliadb.org/latest/

Which package declared support for 1.0.0 and didn’t compile?


A binary package will be available in CRAN for a given platform/version only if it works (it passes all the tests). It seems that Julia lets you install a non-working package without any warning (you will probably get errors when you run it, but I guess it may also fail silently which is worse).


I was excited to try using Julia 1.0.0 today after a couple years since my last try and... couldn’t.

None of the packages work with it yet. I guess I could go find an older version, but it seems like a problem that Julia will happily allow you to install a package it isn’t compatible with. What’s the point of the Pkg system then? CRAN’s model makes a lot more sense.


Many packages have declared themselves compatible with any future version of Julia... which is clearly a lie. These need upper bounds on their version compatibility, but that will take some time to propagate through the system.
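
To illustrate what an upper bound buys you, here is a rough sketch of the check using Julia's built-in `VersionNumber` comparisons. The bounds and the two `supports_*` helpers are hypothetical, and this is not how the package manager resolves versions internally.

```julia
# Hypothetical declared range: at least 0.6, strictly below 0.7.
const LOWER = v"0.6"
const UPPER = v"0.7"

# With only a lower bound, every future release looks compatible.
supports_unbounded(v::VersionNumber) = v >= LOWER

# With an upper bound, releases the author never tested are excluded.
supports_bounded(v::VersionNumber) = LOWER <= v < UPPER

supports_unbounded(v"1.0.0")  # true, even though nothing was tested on 1.0
supports_bounded(v"1.0.0")    # false
supports_bounded(v"0.6.4")    # true
```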


This is indeed a very good idea. In Julia, the package author defines the range of supported Julia versions. Most packages actually just say "version 0.6 or later". The JuliaDB package has just been updated to prevent installation on Julia 0.7 or 1.0:

https://github.com/JuliaComputing/JuliaDB.jl/commit/8bf3057d...

An automatic check as in R would indeed be a better choice.


That will only prevent installation of master of the package or the immediate next few versions on 0.7. The old package versions without a julia upper bound remain available so users on 0.7 or 1.0 will just be held back to old versions of JuliaDB until a new release without an upper bound gets made.


I've never used Julia but why would a language make breaking changes on each release? It doesn't have backwards compatibility? That sounds like a nightmare to work with. Is it because it was pre-1.0?


Yes. v1.0 is literally the release so that all future 1.x releases are backwards compatible. That's why it's such a big deal.


Pre 1.0, we've generally provided backwards compatibility for one version with deprecations. Somewhat ironically that has often led to people just living with walls of warnings until the version that actually broke it came out, which led to a worse experience. We also have automated upgrade tools now, which can handle many of the simple upgrades (and some not so simple ones) automatically. The situation on 1.0 is slightly worse than in previous releases because we released 0.7 and 1.0 simultaneously to avoid having to ship 1.0 with active deprecations. Of course that means that people will have to fix their packages now, rather than waiting until next year.


Exactly, and that is why 1.0 is a big deal: the end of breaking changes until Julia 2.0.


Is there anything wrong with recommending 0.7? Most things seem to work by now.


v0.7 is mostly for developers upgrading packages. It is nice because it throws warnings instead of errors for things that changed. So if it works on v0.7 without depwarns then it will be 1.0 compatible. But in many cases it's not perfect yet, so staying at v0.6 can be nicer than throwing a bunch of depwarns at new users.


Never had a proper look at the documentation until now. This language is very interesting!

One thing I found is that types in Julia are first-class values [0]. You can put them in variables, pass them around, inspect them, even produce new ones at runtime. Opens up all kinds of metaprogramming opportunities. Very Lispy! (Well, Julia is Lispy). It's also interesting that types are optional, yet they're significant for the optimising compiler. Like, if you do type the arguments of a function, the optimising compiler will have less work to do.

It also looks like much of the compiler pipeline is exposed to the user [1]. You have access to the parser and can tweak the AST. Macros are also there, of course.

[0] https://docs.julialang.org/en/stable/manual/types/ [1] https://docs.julialang.org/en/stable/manual/metaprogramming/


> Like, if you do type the arguments of a function, the optimising compiler will have less work to do.

Giving types for function arguments doesn't actually have any effect on performance: the compiler specializes on concrete runtime argument types anyway, so completely untyped code is just as fast as fully type annotated code—since the types are known when the code is compiled. On the other hand, giving type information is essential for performance involving memory locations, i.e. field types and the element types of collections.
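
A small sketch of both halves of that statement, with hypothetical names (`add_untyped`, `add_typed`, `LoosePoint`, `TightPoint`). `Base.return_types` is an unexported reflection helper, so take its use here as an assumption rather than a stable API.

```julia
# Argument annotations: both versions are specialized for the concrete
# argument types at the call site, so inference sees the same thing.
add_untyped(x, y) = x + y
add_typed(x::Int64, y::Int64) = x + y

# NOTE: Base.return_types is internal reflection (assumption: available in 1.x).
same_inference = Base.return_types(add_untyped, (Int64, Int64)) ==
                 Base.return_types(add_typed, (Int64, Int64))

# Field types are different: an unannotated field is `Any`, so the struct
# has to hold references to boxed values instead of storing them inline.
struct LoosePoint
    x
    y
end

struct TightPoint
    x::Float64
    y::Float64
end

loose_field = fieldtype(LoosePoint, :x)   # Any
loose_bits = isbitstype(LoosePoint)       # false
tight_bits = isbitstype(TightPoint)       # true

# The same applies to element types of collections.
loose_elems = isconcretetype(eltype(Any[1.0, 2.0]))      # false
tight_elems = isconcretetype(eltype(Float64[1.0, 2.0]))  # true
```

In a REPL session, `@code_typed add_untyped(1, 2)` is another way to look at what the compiler produced for a given call.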


Thanks for the correction!


I've actually seen many cases where overuse of concrete types (on function parameters) in Julia can lead to poor performance. For example, a function may declare an argument as `Vector{Int64}`, so callers who only have an iterator are forced to call `collect` (causing a lot of memory allocations) just to convert it into a vector before calling the function. Simply leaving off the `::Vector{Int64}` and getting rid of the `collect` at the call site speeds things up nicely.
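
A minimal sketch of that situation, using made-up function names (`sum_squares_strict` and `sum_squares` are hypothetical):

```julia
# Over-constrained: only a materialized Vector{Int64} is accepted.
function sum_squares_strict(xs::Vector{Int64})
    total = 0
    for x in xs
        total += x * x
    end
    return total
end

# Generic: any iterable works, and the compiler still specializes
# on the concrete type of whatever is passed in.
function sum_squares(xs)
    total = 0
    for x in xs
        total += x * x
    end
    return total
end

r = 1:1_000

a = sum_squares_strict(collect(r))         # caller has to allocate a vector first
b = sum_squares(r)                         # works directly on the range
c = sum_squares(x for x in r if isodd(x))  # and on a lazy generator

# The strict version has no method for a range at all.
strict_accepts_range = applicable(sum_squares_strict, r)
```

The generic version gives the same answer as the strict one for the same data, but lets the caller skip the intermediate allocation.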


It may be a tiny thing, but I'm still amazed at how fast the new REPL starts up, and how quick the package manager is, compared to what was there before. Congrats!


> with a liberal license

I hope Julia doesn't rely on inferior libraries just out of copyleft phobia. I would much rather use FFTW than FFTPACK or whatever other alternative they have in mind. FFTW is really best in class.

I'm okay with them making FFTW optional, but please make it opt-out, not opt-in. People should be getting the best software by default. Copyleft isn't going to hurt anyone but people who are trying to hide source code, and scientific computing needs all of the visible source code we can get.


Many things in Base have moved to separate packages to make it light-weight.

FFTW is available in a package under the MIT license [1]. Also its author is a top contributor to Julia ;).

[1] https://github.com/JuliaMath/FFTW.jl


That MIT license only applies to the Julia wrapper code. The package downloads and dynamically links into an FFTW shared library, which means any code that uses it needs to be GPL if distributed as a whole.


The README for that package [0] states:

> Note that FFTW is licensed under GPLv2 or higher (see its license file), but the bindings to the library in this package, FFTW.jl, are licensed under MIT. This means that code using the FFTW library via the FFTW.jl bindings is subject to FFTW's licensing terms.

If you have an idea on how to make that clearer, we would be happy to review a PR to the FFTW.jl repository.

[0] https://github.com/JuliaMath/FFTW.jl


My mistake, docs there are fine. A few other BinaryBuilder-using packages have neglected to mention this issue, last I checked. And BTW BinaryBuilder is violating even MIT licenses if you don't package and include the license file along with the shared-library download.


Most of these things are just packages. Just install FFTW.jl to get that FFT. https://github.com/JuliaMath/FFTW.jl

Julia's Base is for the language, not for every little detail so this is all handled by the package ecosystem.


Also those bindings were developed by Steve Johnson himself!


Wonderful! The one untold story is that the more I use it, the better a programmer I become. It is so easy to benchmark and profile code. It has a great community that will help you learn how to write high-performance code. Congrats!


I really hope for Julia to become mainstream and maybe replace Python as the de facto language for data science. Julia is an incredible language. Kudos to the team developing it.


This is my thinking as well. Python is nice for gluing things together, but doing high-performance math is not its strength. Things like the GIL should have been addressed long ago, but it seems so fundamental to how Python works that I have big doubts that it will ever be addressed.


I agree that the GIL has become a problem for a variety of high-performance tasks, but, I’m curious, what kind of problems have you encountered with numerical computation? I contribute to both NumPy and TensorFlow, two libraries with different processing models, and I don’t see any obvious area where removing the GIL would provide substantial benefits. However, I’ll readily admit that I don’t think about this too often and it’s entirely possible I’m missing something obvious! Maybe Julia could provide some guidance around this.

I would also bet (but not too much) that we eventually see major progress in removing the GIL. I really don’t think it’ll be around forever!


The numpy wiki summarizes that well [1]. Too many things, especially with complex math, cannot run in parallel unless one spends a lot of time on workarounds.

One starts with a quick and dirty solution, makes it work on a small dataset, and then struggles to make it use at least 4 cores to cut the running time on more realistic datasets. Surely I can code numerical calculations in C++, but then the code cannot be maintained by a Python-only guy. So I hope that Julia, or anything else with better parallel support, replaces Python for scientific calculations, so that scaling quick and dirty solutions becomes straightforward.

[1] http://scipy-cookbook.readthedocs.io/items/ParallelProgrammi...


The GIL has been 'addressed' regularly since python 1.4, just no one has come up with an acceptable solution.


By 'addressed' I think he means 'solved'.


Julia is such a delightful language! It allows rapid prototyping and at the same time it runs fast. It has a great REPL, package manager and workflow with Revise.jl.


I'm sure this is too late to get much visibility, but I recently looked into using Julia (for my MS thesis) and found it sorely lacking in one major way that I found unforgivable.

Their type system is pretty interesting, and allows for some really cool abilities to parameterize things using types. I'd like to have seen more work done on, effectively, strong typedefs (or whatever $lang wants to call them). However that sort of thing is fairly uncommon so it's hard to hold it against them too much.

The biggest issue, and one they seem unwilling to really address, is that actually using the type system to do anything cool requires you to rely entirely on documentation which may or may not exist (or be up-to-date).

Each type has an entirely implicit interface which must be implemented. There is no syntax to mark which methods must be present for a type. No marker for the method definitions, no block where they must be named, or anything like that. You can't even assume you'll find all the interface methods defined in a single file because they can appear literally anywhere.

Whoever wrote the type has in mind some interface, a minimal set of methods, that must be present for any derived type. There are only two possible ways to determine this. The first is to look to the documentation. Even for the basic types defined by Julia this documentation doesn't seem to exist for all types. I don't have high hopes for most libraries to provide, and keep up to date, this documentation either. This concern gets even greater when considering the focus is largely on scientific computing.

Without up-to-date documentation, the only option is to manually review every file in a library and keep track of the methods defined for the type you're interested in. With multiple dispatch, you can't even get away with just checking the first parameter either. Then you need to look at the definitions for those methods to narrow your list down to the minimal set required. This is not an easy task.

This issue has been brought up before and discussed, but nobody seemed very interested in it. This is a fairly major issue in my view, as it cripples the otherwise very interesting type system. As it stands, it seems to be a fairly complex solution to the issue of getting good compiled code out of a dynamic language. It could be so much more.


This is truly an important issue. Right now, every interface represents something that needs to be documented by the author. The AbstractArray and Iteration interfaces are well-documented, but the AbstractDict interface isn't. I believe that documentation for an interface is enough, but I also don't think enough people will take the time to write it. So I agree there should be a technical solution.

The main reason this has not been implemented as a language feature is that people are worried about settling on a design that would be impossible to make fast and concise. It is certainly on the designers' radar, and was discussed specifically at JuliaCon 2017 in Jeff Bezanson's talk.

There are some people who plan to attempt a trait system as a package on Julia 1.0. Perhaps this will be successful and we won't need language changes! Stay tuned.

As an aside, I wouldn't take the lack of action as lack of interest. People are interested, but it hasn't been prioritized yet. It will get effort and attention!


I think you need language changes no matter what. I've seen some of the previous trait packages and while extremely cool, they're insufficient for tackling this problem for a couple reasons.

First, this extends deep into the core of Julia, and I don't see how a traits package would be involved with that.

Second, and this dovetails with the first issue, this needs to be something that people actually use as a default action. Part of that means ensuring the built-in types make use of this.

Related to all of this, I worry it's too late to really make a meaningful change here. The culture and existing packages are already set without it. Adding the feature as a requirement isn't going to happen anytime soon unless people are willing to continue to break things post 1.0. And it needs to be a requirement or it won't get used except by people that will already provide documentation.

The lack of interest I mentioned mainly came from some GitHub issues that either received little attention, had creators with a very "maybe it would be nice if" attitude, or drew responses questioning the benefit compared to the cost of implementing the syntax changes.


>First, this extends deep into the core of Julia, and I don't see how a traits package would be involved with that.

No, Tim Holy described over dinner how all that was necessary to complete the existing traits packages was method deletion, and that's in v1.0.


Interesting.

But as an external package it still has the issues I described as not being part of the defaults of the culture. Without that it just becomes a nice-to-have that only people serious about writing good quality, usable code are going to use. And these are the people most likely to have good documentation in any case.


Yes indeed, but the cultural issue can be addressed by adding it into Base in a 1.x release, since it's not breaking. I don't think it will make it into a 1.x release, though it could.


It's not that people are uninterested; it's more that inventing a system for specifying and enforcing these kinds of generic interfaces is a really hard design problem.

As a comparison, consider that the C++ standards body has been working on, debating and serially rejecting the various "C++ concepts" proposals for years now.


This is not that hard of a problem to solve to a useful level.

If you want to get fancier, yeah, it gets hard. But something as simple as a syntax block that lists the interface methods would be enough for now.

The fancier stuff is already projected so far out (2.0 at the earliest) that I don't understand why a simple working solution wouldn't be desired for now. It would also simplify the work of changing code later to work with 2.0.

I really don't understand how this issue wasn't addressed a long time ago, and why it wasn't a blocker for 1.0.


I think the main two solutions here are introspection and a little TDD.

`methodswith` tells you all methods that have been defined on a type. Since it returns an array of methods you can do some more introspection on that, if you're so inclined.

I've also just written a few test cases using the interface I'd like, then worked on methods until my tests pass (which can be just not throwing MethodErrors).
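
Roughly what that workflow looks like, as a sketch. `Countdown` is a hypothetical type invented for this illustration; the tests exercise the Julia 1.0 iteration interface (`iterate`), and `methodswith` comes from the `InteractiveUtils` standard library when used outside the REPL.

```julia
using InteractiveUtils  # provides `methodswith` outside the REPL
using Test

# Hypothetical type, used only for this illustration.
struct Countdown
    from::Int
end

# Methods added one by one until the tests below stop throwing MethodErrors.
Base.iterate(c::Countdown, state = c.from) = state < 1 ? nothing : (state, state - 1)
Base.length(c::Countdown) = max(c.from, 0)
Base.eltype(::Type{Countdown}) = Int

# The tests double as executable documentation of the interface I want.
@testset "Countdown behaves as an iterator" begin
    @test collect(Countdown(3)) == [3, 2, 1]
    @test sum(Countdown(4)) == 10
    @test isempty(collect(Countdown(0)))
end

# Introspection: list the methods that take a Countdown argument.
found = methodswith(Countdown)
```

Since `found` is an ordinary vector of `Method` objects, it can be filtered or printed like any other array.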
