Jax vs. Julia (Vs PyTorch) (kidger.site)
141 points by sebg on May 4, 2022 | hide | past | favorite | 110 comments



Another massive frustration for me is that Julia has no formal way to say "here are public functions and these are private" but does have a completely orthogonal way of saying "these are functions that will populate your global namespace if you use `using`", i.e. `export variable_name`, and people absolutely confuse the hell out of these. I don't think there's even agreement in the Julia community if you should use `export` for your public API or not.

And if you misspell the exports or change the variable in question, Julia won't even warn you about it. That is straight crazy behavior to me, and I still don't understand how that hasn't been changed.

The `using` + `import` mechanisms in Julia, combined with how `export`s work, make it SUCH a confusing and frustrating experience for beginners in Julia.

I personally like mathematical symbols when I'm writing and reading code in my domain, but I do feel very lost when I'm reading Julia code outside of my area of expertise. All my colleagues hate it too (hard to grep, hard to type if you don't know the math and are just a software engineer) and I'm coming around to the idea of not using it or documenting it explicitly.

The fact that mutable structs are easier to use but immutable structs are more performant, the lack of composition of fields the way Go handles it, the lack of traits or interfaces, the sorry state of compilation time, and the nonexistent tooling all lead a beginner/intermediate Julia developer in the wrong direction, in my opinion. It's very easy to write code that is straight up broken or just not efficient in Julia, and that's probably why I won't pick it for a big project going forward.

But I'm still keeping my eye on it. Maybe in 5 years it'll be the language for a lot of the jobs?


I use export for my public functions. What is wrong with that? How exactly does Python signal what is public? Nothing but conventions like underscore, which btw Julia developers use as well. If I really really don't want somebody to use a function, I might underscore it but generally I don't see the point.
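For comparison, Python's nearest analogue to `export` is `__all__`, which only controls what a star-import pulls in; nothing is enforced, though (unlike Julia's silent misspelled exports) a bad `__all__` entry does raise, at star-import time. A minimal sketch (module name is made up):

```python
import sys
import types

# Build a throwaway module to mimic a library; __all__ is Python's rough
# analogue of Julia's `export`: it controls only `from demo_mod import *`.
mod = types.ModuleType("demo_mod")
exec(
    "__all__ = ['public_fn']\n"
    "def public_fn(): return 'public'\n"
    "def _private_fn(): return 'private'\n",  # leading underscore: convention only
    mod.__dict__,
)
sys.modules["demo_mod"] = mod

ns = {}
exec("from demo_mod import *", ns)
assert "public_fn" in ns              # listed in __all__, so star-import sees it
assert "_private_fn" not in ns        # not exported...
assert mod._private_fn() == "private" # ...but nothing stops a direct call
```

So both languages ultimately rely on convention; the difference is where the (weak) enforcement hooks live.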


The underscore naming convention is far from ubiquitous in either the Python or Julia communities.

The more practical indicator of a function being "public" is whether or not it appears in the published API documentation. Unlike Julia, Python projects are more likely to have complete documentation (or even documentation at all). So there is a better understanding of what "public" means in Python. Without docs, it's a guessing game.


I agree documentation is a problem with Julia, but I dislike the lack of nuance when people discuss this. There is no doubt in my mind that the core Julia libraries are far better documented than the Python equivalents.

Just do something simple like looking up help on `print` in the REPL on Python and Julia. The Python documentation is very primitive compared to what you get in Julia.

Julia has a better standard documentation system than Python, and the standard library has started a better convention for how to document things, which means that as this spreads to other third-party libraries, Julia has the potential to be better documented than Python.

What you see today is immaturity rather than signs of an inherently bad system.


There are effectively private functions in python though.

Double underscores before the name make it impossible to call them from the outside under the plain name. It's more like TypeScript's `private`, however: it's still technically possible to call these functions in a specific way, it just won't happen by accident.
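Concretely, a minimal sketch of the mangling behavior (class name made up):

```python
# Double-underscore attributes are name-mangled to _ClassName__attr at class
# definition time, so accidental outside access fails, but deliberate access
# under the mangled name still works -- a deterrent, not a guarantee.
class Account:
    def __init__(self, balance):
        self.__balance = balance

    def balance(self):
        return self.__balance


acct = Account(100)
assert acct.balance() == 100          # internal access works normally

mangled_only = False
try:
    acct.__balance                    # accidental outside access: AttributeError
except AttributeError:
    mangled_only = True
assert mangled_only

assert acct._Account__balance == 100  # deliberate access is still possible
```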


Yeah I really want to like Julia but for all the reasons you mentioned I’ll just use python. Also python is making a ton of improvements and looks better by the day


You list some reasonable criticisms, but how is 'immutable structs are more performant' a valid criticism? That's just because compilers can do more optimizations with immutables. Unless you want immutables to be intentionally hobbled, what exactly do you expect the Julia devs to do about it?


I feel called out on the academic part, haha. I simply want to code state-of-the-art (thermodynamic) models, and at least Julia helps by providing easy testing and publishing infrastructure. But obviously we can't compete with a corporation in code quality (we are trying!)

Unrelated, but for small sizes, I really prefer to use forward mode in Julia (via ForwardDiff.jl) instead of Zygote. The overhead of reverse ADing over an arbitrary function with mutation is not worth it.
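For anyone unfamiliar: forward mode is essentially dual-number arithmetic, which is the idea behind ForwardDiff.jl. A toy Python sketch of the concept (illustrative only, not the package's actual internals):

```python
# Toy forward-mode AD: each Dual carries a value and a tangent, and every
# arithmetic op propagates both. One pass per input direction, no tape --
# which is why forward mode wins for small input dimensions.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = self._wrap(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)  # product rule


def f(x, y):
    return x * x * y + y

# d/dx f at (3, 2): seed x's tangent with 1 and y's with 0
out = f(Dual(3.0, 1.0), Dual(2.0, 0.0))
assert out.val == 20.0   # f(3, 2) = 9*2 + 2
assert out.dot == 12.0   # df/dx = 2xy = 12
```

Reverse mode instead records the whole computation and replays it backwards, which is the bookkeeping that gets expensive (and fragile with mutation) for tiny systems like these.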


In the context of neural networks with differential equations (which appears to be the original poster's field), the trade-off depends: https://diffeqflux.sciml.ai/dev/ControllingAdjoints/


Yeah, my systems are really small in comparison (1-20), but with higher-order derivatives (up to 4th order), so reverse AD is not the best in that regard.


There are several repeating (valid) critiques. There is also a sense that things won't get better. I don't think that's necessarily the case.

I can't give timelines, but I'll list the themes and point to the very active work going on in those areas:

1. Static analysis: it's hard to write large codebases of correct Julia code.

That's being addressed on several fronts:

a. Work on both traits/interfaces and sound static typing is progressing in Jan Vitek's group at Northeastern

b. User extensible type lattice for correctness proof of programs. See here: https://news.ycombinator.com/item?id=26136212

c. JET.jl already catches method errors at JIT time. It's under-used, IMO.

2. Compilation time and package dev latency.

a. We're getting close to native code caching, and more: https://discourse.julialang.org/t/precompile-why/78770/8

As you'll also read, the difficulty is due to important tradeoffs Julia made with composability and aggressive specialization...but it's not fundamental and can be surmounted. Yes there's been some pain, but in the end hopefully we'll have something approximating the best of both worlds.

b. Runtime free static compilation: https://www.youtube.com/watch?v=YsNC4oO0rLA

c. Semantics for separate compilation: https://gist.github.com/JeffBezanson/dd86043ef867954bd7e2163...

Some combination of the above should also address deployment scenarios, whether that be CLI, mobile, browser or even embedded (Yes, with an (expanding) subset of code, StaticCompiler can produce binaries down to the 10s of Kbs with real working Julia programs that use non trivial abstractions like subtyping and higher order functions)

3. Zygote is rough

a. Enzyme will do (almost) whole language AD, including mutation.

b. Diffractor.jl


Do you have a link to some information regarding the work on traits/interfaces and static typing, please?


In his item #1, he links to https://discourse.julialang.org/t/loaderror-when-using-inter... The issue is actually a Zygote bug (Zygote is a Julia package for auto-differentiation) and is not directly related to the Julia codebase (or the Flux package) itself. Furthermore, the problematic code is working fine now, because DiffEqFlux has switched to Enzyme, which doesn't have that bug. He should first confirm whether the problem he is citing is actually a problem or not.

Item #2, again another Zygote bug.

Item #3, which package? That sounds like hyperbole, an extrapolation from a small sample, and "I'll avoid naming names" is a lazy excuse that hides this. It is similarly easy to point to poorly written (or poorly documented) JAX or Python code as well, so that doesn't prove that "Julia is lacklustre and JAX shines". Also, as an academician, I strongly disagree that learning_rate=... is better than η=..., but that's a matter of convention & taste, with no bearing on the correctness or performance of the package/language. It's bikeshedding. I agree that errors are usually not instructive in Julia ML packages (which needs to be improved), so "monkey typing" is less likely to succeed.

Item #4 is such a nitpick. Sure, Julia may not have special syntax for that particular fringe array slicing like NumPy does and you'll instead need to make a function call, but matrix/tensor code in NumPy is usually filled with calls to zips and cats with explicit indices, whereas in Julia, no explicit indexing usually needs to be done. Also, one can nitpick similarly in the opposite direction: Julia has many language features lacking from JAX or Python; why not talk about those as well?

I'm not a huge fan of Julia either (mainly because of its garbage-collected nature), but this is such a low-effort criticism of it.


I’ve been using Julia since 2017 and still do on a day to day basis, and I agree with the author in a lot of cases, even his subjective naming conventions gripes.

The author’s biggest criticism is that Julia doesn’t have tooling to make the developer experience better. There’s Revise, JuliaFormatter, LanguageServer and JET, but the development experience in Python is enviable: there are like 3 different REPLs and at least two competing linters and auto-formatters. It’s okay to admit that these are places where Julia is lacking.

I think your kind of response to criticism about Julia is what gives the Julia community a bad name. What is wrong with saying these things suck and need improvement? Would you rather Julia not improve and stay the way it is right now forever? Surely I hope not.


There's something a little strange and subjective about saying "a problem with language X is that third party package Y has a bug." Y != X. If Y is some tiny package that hardly anyone uses or cares about, then bugs in Y don't imply much about the typical experience of using X. But if Y is a very common package, practically essential to everyday workflows, then bugs in Y do have implications for the typical X user.

The subjectivity arises in deciding whether some third party package Y is totally common, essential, de facto part of the language itself, or esoteric, negligible, who cares. I perceive that a lot of the back and forth around "is Julia good" basically boils down to this.

"I don't like Julia! I tried to use it and <package> had an annoying bug!" "Okay, but <package> isn't part of Julia itself or even written/maintained by core Julia contributors, so bugs therein are irrelevant for the awesome multiple dispatch speedy beauty of Julia!" "Hard disagree, <package> is the de facto standard for <common task Julia should be good at> and maintained by <important prominent figure in the Julia community>. If even such a high visibility package is broken, the whole ecosystem must be rotten!"

The parties are just talking past each other here; there's no ground truth. We'd all be better off if we were more explicit about whether comments applied to (a) the core language itself (b) core language and standard lib (c) the universal experience of working in Julia, i.e. packages that nearly everyone encounters (or lack thereof) (d) packages which are key for certain kind of work but irrelevant for others; auto differentiation would seem to go here (e) niche packages.

Disclaimers: I love Julia-the-language; it is my favorite programming language by a mile. My brain seems to work similarly to Julia-the-language, so programming in Julia feels fluid and effortless to me; no other language sparks joy the same way. But Julia-the-ecosystem certainly has its holes and weak spots. And the limited size of Julia-the-community means even relatively prominent packages can be less maintained than e.g. Python analogues. Example 1: it's unnerving the extent to which the Julia data ecosystem rests on the heroic efforts of a few people [1]. Example 2: even relatively mainstream packages can have issues and PRs unaddressed indefinitely; e.g., I like to plot with Gadfly.jl but am nonetheless frustrated that I've had an open issue there and an upstream PR for over a year.

[1] https://github.com/orgs/JuliaData/people


I partly agree with you, in that the author is mainly complaining about Flux and other packages they've used.

But on the other hand, I've experienced the exact same things the author listed as annoyances, and more. And I'm definitely not blaming the package maintainers for this. There's either 1) something wrong with the Julia language that results in code that is harder to maintain at scale, or 2) missing tooling to help package maintainers solve some of these problems.

In my personal opinion (and also almost universally the opinion of 20-25 of my colleagues, some of whom are way smarter and more experienced than me), it is a little bit of both. Tooling like JET for linting is just taking off, and for years people have been writing and maintaining packages without linters. The ONLY way to ensure the Julia code you write is correct is to write LOTS and LOTS of tests. In Python, if I use type hints and run mypy, a whole class of errors is avoided. That's not the case in Julia. At work, we maintain a close-to-10,000-line Julia code base, and honestly, if we'd known the tooling was going to be the way it is, we'd not have chosen Julia.
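To illustrate the mypy point (hypothetical function, assumptions mine): annotations let a separate static pass reject bad call sites before anything runs, which Julia's dispatch-based typing can't currently do ahead of the JIT.

```python
# With annotations, running `mypy` as a separate static check catches a whole
# class of errors before any test executes.
def apply_decay(learning_rate: float, step: int) -> float:
    return learning_rate / (1 + step)

# mypy would flag a call like apply_decay(0.1, "10") with an
# incompatible-type error for argument 2; plain Python at runtime only
# fails once the bad expression actually executes.
result = apply_decay(0.1, 10)
assert abs(result - 0.1 / 11) < 1e-12
```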

The fact that startup time is so painful, coupled with the fact that tests are basically the only way to guarantee that you are writing correct code, makes it EXTREMELY painful to work in a large team. Our internal CI takes hours to run for even the smallest changes. Even a small-scale refactoring takes weeks of review from everyone involved, and it is a time-consuming, frustrating, and expensive process for the organization.

I've worked with similar-size codebases in Python and C++ and have had nowhere near this kind of difficulty.

I can't find the link to the talk right now, but Bryan Cantrill talks about "values" of a programming language / ecosystem. And based on the way Julia is designed and implemented, I'd say the "values" are "run time performance", "scientific applications", "research suitability" (ease of creating environments, DataFrames.jl for data science etc). I think "correctness" as a value is FAR from the priority in this language, which incentivizes researchers to write code that gets the job done quick but fails to be long term maintainable or viable.

The worst part is I don't know how Julia will ever solve these issues. Leveraging LLVM to generate fast machine code at runtime is the very basis of the language, so you'll almost always (unless static compilation happens) pay a penalty at start up and first run. Using the type system for dispatch means that static analysis is just going to suck. How JET works is a mystery to me, so I'm not sure what the extent of the improvements there can be, but based on talking with others in the Julia community, I don't think it'll ever rival Python's or CPP's static linting capabilities.

The Julia devs are constantly working on improvements, so I expect things will get marginally better and maybe in 5 years or so it'll be a tolerable experience. But all in all, I'm not holding my breath.


I guess the Cantrill talk you are referring to must be 'Software as a reflection of values' from 2018:

https://www.youtube.com/watch?v=2wZ1pCpJUIM&t=128s


Please see my comment here: https://news.ycombinator.com/item?id=31269739

Based on the above, I expect more than marginal improvements in these areas. Would you agree?


Just trying to use this thread for a market demand survey - Julia devs, would you pay $9.99 a month for better tooling? [Maybe responses to this will encourage devs to notice that there is a viable market here?]


Yes. But not to pay for a software license. I'd pay $10 a month as a donation to a group improving Julia though.


That question is rather cryptic. Your profile LinkedIn url also might be broken? (For added mystery maybe)


I would, for linting and vim integration comparable to what’s available for Python.


What exactly do you think I said about Julia's linters and auto-formatters?


Yup, Julia looks nice on paper, but its devx is really bad


> but that's a matter of convention & taste, with no bearing on the correctness or performance of the package/language. It's bikeshedding.

I heartily disagree with that. Bikeshedding is about focussing on the trivial, and the symbols we choose in our codebases are hardly trivial, and indeed many of us regard naming things as one of the central problems in programming[1].

[1]:https://medium.com/hackernoon/naming-the-things-in-programmi...


Unlike the example in the link you give, η isn't a generic random name like a,x that can mean anything. If you ever read a paper on stochastic gradient optimization, you'd know that η means learning rate in the context.

It is bikeshedding because it is analogous to insisting that using "angle" instead of "θ", or "radius" instead of "r", in a 2D geometry library is superior and takes your code from being lackluster to something that shines (in the words of the original author), while not saying anything useful about the mathematical/technical aspects of the code itself.

Here is the definition of bikeshedding:

> The term was coined as a metaphor to illuminate Parkinson’s Law of Triviality. Parkinson observed that a committee whose job is to approve plans for a nuclear power plant may spend the majority of its time on relatively unimportant but easy-to-grasp issues, such as what materials to use for the staff bikeshed, while neglecting the design of the power plant itself, which is far more important but also far more difficult to criticize constructively. It was popularized in the Berkeley Software Distribution community by Poul-Henning Kamp[1] and has spread from there to the software industry at large.

from https://en.wiktionary.org/wiki/bikeshedding


My interpretation of the point in the blog post was that explicitly spelling out variable names makes APIs and the underlying code much more accessible to a wider audience.

Sure, there'll be a subset of users of these libraries that have read ML/textbooks and are familiar with what η means in this context.

Today, many (most?) users of ML libraries will probably not know what η means without looking it up. Adhering to mathematical notation puts up an unnecessary barrier to using the API/code and ultimately limits wider engagement/collaboration.

To attract a bigger slice of the ML community, choosing names that the ML hobbyist can read, understand and use without pause is the better path forward.


You are saying most people don't know what η in that context means (=people who likely haven't read a book or a paper on stochastic gradient, and don't know how it actually works), but they would somehow magically figure out what it actually does if we call it "learning_rate" in ASCII letters. How does that work?

FYI, the documentation of the function https://fluxml.ai/Flux.jl/stable/training/optimisers/ explicitly says it is learning rate:

> Learning rate (η): Amount by which gradients are discounted before updating the weights.

so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.


> How does that work?

You can look up "learning rate" much easier than to look up "what is this Greek letter on my screen" followed by "what is the use of this Greek letter in my context" and only then followed by searching for "learning rate"

More importantly, it's possible to know what a learning rate is without knowing what Greek letter it's commonly denoted as. Especially since mathematical notation is so inconsistent across authors. I want less ambiguity in code, not more. Explicit is better than implicit.

Mathematical notation is notorious for being an absolute mess of inconsistencies. Who in their right mind looked at it and went "yep, I want more of this in my source code".


This depends a lot on the target audience for your code.

For research-focused code, it is likely that whatever you're implementing was initially described in terms of mathematical notation (e.g., in a paper or book). It can be helpful to have variables that unambiguously match that canonical source. In fact, a lot of my Julia code has docstrings containing references/links to the original paper and a comment noting that it uses the notation therein.

This sidesteps the problem where textual descriptions like `learning_rate` can sometimes be ambiguous: is it the original learning rate, or perhaps the current rate after applying some sort of schedule or decay? I think the Flux documentation is pretty close to ideal, in that it's got a symbol you can match against equations (though no reference to them) as well as text that you can search to learn more.
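A middle ground I've seen work (hypothetical API, Python for illustration): spell out the public keyword, and keep the paper's symbol in the docstring so readers can match against equations either way.

```python
def sgd_step(weights, grads, learning_rate=0.001):
    """One plain SGD update: w <- w - η·g, where η is `learning_rate`
    (symbol as in the usual stochastic-gradient references)."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]

# The call site stays greppable and typeable; the docstring still anchors
# the code to the notation of the original paper.
new_w = sgd_step([1.0, 2.0], [0.5, 0.5], learning_rate=0.1)
assert all(abs(a - b) < 1e-12 for a, b in zip(new_w, [0.95, 1.95]))
```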


You are not answering the (rhetorical) question that you quoted, and the answer to your response is already in the paragraph that followed it:

As I said, the necessary keywords for Googling it, along with a brief description is already present in the documentation.

The quibble here is about the necessity of reproducing all the necessary keywords for an accurate Google search during every single function call.

> Mathematical notation is notorious for being an absolute mess of inconsistencies.

According to whom? What exactly is inconsistent?


> The quibble here is about the necessity of reproducing all the necessary keywords for an accurate Google search during every single function call.

No, that is not the quibble. My quibble is with choosing identifiers that make code less legible when taken by itself. The best code teaches future readers about how it works.


The struct's field name is `eta`, but this is an internal detail. Its constructor takes a positional-only argument, no public name.

The Greek letter is used in the documentation. And the reason is that every optimiser's documentation links to the original paper and tries to follow that. If the Adam paper calls the two decay rates β1, β2, then staying close to that seems the least confusing option.


Perhaps I'm missing your point, but I think you're focussing too much on the specific case that someone who isn't me came up with.

My most general point is that the identifiers we use in our code are almost never just convention or taste when we are sharing that code with anyone else (and for most, "anyone else" includes our future selves). Getting a little more specific, I'm specifically interested in Julia and look forward to working in it further, but I've personally felt pain around scientific/mathematical notation when trying to understand code I've found on github. tagrun dismissing my pain as nonsense and the people who argue for my ilk as perpetrators of bikesheddding is dipshitted. Yeah, I'm probably the asshole for being a college dropout trying to leverage modern scientific computing for my own ends (snoogins), but I'm also willing to bet tagrun is probably the member of a team that talks down to junior members and complains they haven't read enough papers or the right papers to see the magnificience of their code ;).

It's fine to write code that demands a domain expert to understand, but don't pretend like it's good across all dimensions. There are tradeoffs involved.

Personally I find the preponderance of scientific/mathematical notation (whatever you want to call it) in Julia to be cute; it certainly does bind the code to linked papers in a pretty cool way when it all fits together properly. That said, it's a pain in the ass when it doesn't fit together properly, and I've personally had a journey into Julia spoiled by the frequency at which I had to figure out how to notate something, or what word to use regarding some squiggle I hadn't encountered before. I look forward to having a better intuition for the Greek alphabet, but until then Julia will often be harder to read, let alone understand, compared to Ruby or JavaScript or Go or C# or any of the other roughly dozen programming languages I've worked with and feel comfortable translating between.


> > Learning rate (η): Amount by which gradients are discounted before updating the weights.

> so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.

As far as I can tell, it's a documentation complaint: he has to remember "η" from the signature line, past the line "Gradient descent optimizer with learning rate η ...", and a "Parameters" heading, until the quoted line, which explains it in full.

He says this is the API, but that's inaccurate. The API being explained is that the first positional argument is the learning rate. It's not a keyword argument, so you cannot supply it by name. What variable names are used in the code is private, and in fact the struct's field name is `eta`, so that you can access it without typing Greek.

If this makes the top 10 list (even the top 10 list of documentation complaints) then Flux is doing OK. Especially the top 10 list of a guy with a PhD in a mathematical field. (From the sort of university which used to require students to know latin & greek, too.)


> Unlike the example in the link you give, η isn't a generic random name like a,x that can mean anything. If you ever read a paper on stochastic gradient optimization, you'd know that η means learning rate in the context.

Why should reading a paper on stochastic gradient optimization be a prerequisite to understanding your idiosyncratic choices for identifiers? The fact of the matter is that I can understand code much better than academic prose. I'll learn the code and then supplement with the paper as needed. By using idiosyncratic identifiers you're gating off your code from people who haven't jumped through the same specific hoops you have and haven't developed the same mental muscles you have.

> It is bikeshedding because it is analogous to insisting that using "angle" instead of "θ", or "radius" instead of "r", in a 2D geometry library is superior and takes your code from being lackluster to something that shines (in the words of the original author), while not saying anything useful about the mathematical/technical aspects of the code itself.

No. To someone who doesn't have an established mental muscle for mathematical notation, it is analogous to using Thai script to write for an audience that primarily reads English: I can still use Google Translate, but the cognitive load is much higher for the members of the audience who aren't native Thai readers. That isn't bikeshedding; that's caring about understandability.

> You are saying most people don't know what η in that context means (=people who likely haven't read a book or a paper on stochastic gradient, and don't know how it actually works), but they would somehow magically figure out what it actually does if we call it "learning_rate" in ASCII letters. How does that work?

I learn from code a whole lot faster than I do from academic prose. I usually start with the code, then read the whitepapers the code refers to as I go. I learn slower from code that uses symbology I'm not familiar with. In the context of learning a new code base, unfamiliar symbols are bad in several ways.

> so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.

My quibble is with the assumptions I think you make about what comprises good code quality while at the same time having suffered through code from people who share your attitudes. Naming matters to me in more ways than you're apparently versed with. The article I linked is just one small discussion on naming but not comprehensive by any means. I was linking it more in hopes that you would do further thinking of your own about a pretty wide subject. Just my opinion.

Furthermore, the page on FluxML demonstrates the problem I'm referring to. Just down from where you linked, you'll find an entry like `RMSProp(η = 0.001, ρ = 0.9, ϵ = 1.0e-8)` in which `ϵ` is described nowhere in the entry. It's a random symbol that I understand to be used usually in set-membership notation, but in this context (the context of some random link some random person posted on the interwebs) I have no clue what it means, and thus it is a barrier to my understanding.


Unicode should not be in public APIs. This is a standard around Julia. Flux is breaking the standard. Yes, it's not a good thing.


The unicode epsilon isn't in the public API, it's describing the 3rd positional argument.

This was added recently, and for some reason the PR (1840) didn't fix the docs, which is bad. The Optimisers.jl version has an explanation: https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.RMSProp


Interesting! Do you have any links talking about that standard? I'm super interested in Julia and this seems like a good opportunity to learn something I've been missing so far.


I'm not sure if/where it's formalized, but it's just generally something that's been enforced throughout Julia's Base, along with many of the package organizations (like SciML among others). It's something that would be mentioned at code review time by most contributors. It's why you don't see unicode keyword arguments. There's a lot of reasons. I think the best one is that you want the API to be compatible with old terminals you tend to get on HPCs which do not tend to support unicode. We should probably make it a part of the standard formatter rules or something at this point.


Word. Thank you for the response.


> In his item #1, he links to https://discourse.julialang.org/t/loaderror-when-using-inter... The issue is actually a Zygote bug (Zygote is a Julia package for auto-differentiation) and is not directly related to the Julia codebase (or the Flux package) itself. Furthermore, the problematic code is working fine now, because DiffEqFlux has switched to Enzyme, which doesn't have that bug. He should first confirm whether the problem he is citing is actually a problem or not.

> Item #2, again another Zygote bug.

If Flux chose a buggy package as a dependency, that's on them, and users are well justified in steering clear of Flux if its authors are not in the habit of auditing the dependencies they pull in. As of today, the Project.toml for both Flux and DiffEqFlux still lists Zygote as a dependency. Neither lists Enzyme.

https://github.com/FluxML/Flux.jl/blob/master/Project.toml

https://github.com/SciML/DiffEqFlux.jl/blob/master/Project.t...


> If Flux chose a buggy package as a dependency, that's on them, and users are well justified in steering clear of Flux if its authors are not in the habit of auditing the dependencies they pull in. As of today, the Project.toml for both Flux and DiffEqFlux still lists Zygote as a dependency. Neither lists Enzyme.

For DiffEqFlux, it's just for backwards compatibility to Flux. DiffEqFlux is a weird library because it's been "eradicated" over time. There was a point in time where it defined some necessary things to do the things in its documentation. At this point, for most tutorials it's not even required that you have the DiffEqFlux library to do it. That makes it a rather odd library haha. The transition is:

- GalacticOptim.jl has become a fully-fledged optimization package formalizing some of the heuristics and multi-package support it was using internally. https://galacticoptim.sciml.ai/dev/

- The adjoint overloads are handled all by DiffEqSensitivity.jl at this point. If you try to use them without having that library then you get an error asking you to install it. DiffEqFlux.jl just reexports it. This is why you don't see the Enzyme.jl dependency.

- Enzyme.jl has gotten better and better over time, and has become the go-to for DiffEq. In fact, we use a polyalgorithm under the hood that tends to prefer Enzyme.jl and ReverseDiff.jl for the autodiff over Zygote.jl, so it's weird that in 2022 any comparisons still feature Zygote.jl given that, internally, it's almost certainly not using Zygote. Zygote is what the user sees and interacts with, but it's not the core. (Even then, that will be changing soon with the coming Enzyme.jl overloads)

- FastChain was a workaround for issues with Flux, but it has become a full-fledged library of its own, Lux.jl, which isn't completely done yet but will be how those pieces get deleted.

So yes, the state of DiffEqFlux.jl is that it has been a fast-moving library to the point of its own destruction, basically just holding what would become "hacks" to make the high end work that were then slowly absorbed to become things that "just work" when using Julia. What the library is pivoting towards now is just being a high-level interface for common machine learning use cases of differential equations, like defining FFJORD or continuous normalizing flow architectures, which you could do from scratch, but it's nice to have somewhere that these are all defined. What to do with those tutorials, who knows; move them to the DiffEqSensitivity.jl docs, and then we need some kind of inter-module documentation so that people can easily see the 25+ docs of SciML in one website (or whatever the number is).

But honestly, while it becomes a documentation mess to eradicate a higher-level library like this, this has been our dream with Julia over the last few years. Having no place to describe the differentiability of solvers means differentiable programming is truly working. With Enzyme's improvements and such, we're supporting even things like mutation, which is progressively making almost any Julia code you write "just work" with automatic differentiation. No hacks are required to make it work with some underdocumented sublanguage (cough Jax). As everything becomes automated, it becomes harder to document it as a feature because it's instead just a property of being a Julia library.


I strongly agree about readability. In my opinion it's because academia people live in "bubbles" and assume everyone knows what domain-specific terms and Greek letters mean, so for them it's easier to read some omega than, for example, learning_rate or lr.

But for us mortals who cross multiple domains, it's extremely frustrating to read fully math-based notation without any extra info about the notation in the package/functions etc. Debugging multiple sub-packages gets too time-consuming: you have to learn the person's style of writing code, the whole scientific notation, and the domain knowledge before you can even touch the code.


> Academia people live in "bubbles" and they assume everyone knew what a domain specific terms and greek letters

Naming things by their English name is not more universal than using Greek letters. It's just serving another group of people who live in a different bubble.


Yes and no, the example that the author gives is actually a very good one:

> Many Julia APIs look like Optimiser(η=...) rather than Optimiser(learning_rate=...). This is a pretty unreadable convention.

The learning rate is a well-known name that basically everyone will understand; on the other hand, "η" (eta) is not even used consistently in the literature, with some papers using alpha instead.

This just looks clever; it's a pretty bad parameter name.
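To make the contrast concrete, here is a minimal Python sketch with hypothetical class names (neither is a real library API):

```python
# Hypothetical optimizer classes (not from any real library) contrasting
# the two naming conventions under discussion.

class GreekOptimiser:
    """Parameter named after the symbol used in the papers."""
    def __init__(self, η=0.01):  # Python 3 allows Greek identifiers
        self.η = η

class VerboseOptimiser:
    """Parameter spelled out in English."""
    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

# At the call site, only one of these is self-describing to a reader
# who hasn't memorized the notation of a particular paper:
opt_a = GreekOptimiser(η=0.1)
opt_b = VerboseOptimiser(learning_rate=0.1)
print(opt_a.η, opt_b.learning_rate)  # 0.1 0.1
```

The call site is where the convention bites: a keyword argument is read far more often than the paper that named it.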


> The learning rate is a well known name that basically every one will understand

Absolutely! Because as we all know, everyone speaks English.

The GP's point was that greek letters are used in lots and lots of papers even written in other languages. I have read quite a few papers in Japanese that used exactly the same conventions with respect to the greek letters and latin letters used.


How many researchers in the ML/DL community don't speak English? I don't have hard numbers, but I highly doubt that it's a significant proportion. What is the reach of your Japanese papers when almost no one outside Japan can read Japanese?

Even China, despite their best effort to de-westernize their culture still uses English in their research papers.

And if all the above wasn't enough, Julia's libraries are still all in English, so if a hypothetical researcher's English is so poor that they don't know what "learning rate" is, I'd venture that they'll have trouble programming in Julia/JAX/PyTorch.


How many don't speak it as a native language? Quite a lot, as most of the world uses something else as their primary language.

If you're instead asking how many can struggle through an English text supported by machine translators, then that's clearly almost everyone.

There's very often a significant gap between the ease with which the native and the foreign language can be used for reasoning, but surely I don't need to point that out since any bilingual person knows this.


The programming language is already in English; implying that using Greek letters to map to math concepts is easier to understand for non-natives is disingenuous.


It's really not. There are just a few programming language-specific keywords that you have to memorize, unlike infinite possible combinations of English words for parameters.

edit: as an example, when programming for Brazilian businesses, we generally just use a mix and match of Portuguese and English like getFinanciamento. "Get" is a technical concept while "Financiamento" is a business concept and it often doesn't make sense to translate either. Best just to leave them both in their respective domain-native language.

In the case of maths, Greek IS the domain-native language. Now, I wouldn't necessarily enjoy a codebase with Unicode variables for other reasons, but using Greek is best to map directly from the equations.


I am not debating the merits of writing your business logic in whatever language that you want.

Flux, the Julia framework that we are discussing, is in English. Its classes, function names, optimizers, everything is in English. Keeping parameters as Greek letters and arguing that it's easier to understand is insanity.


I am replying to a comment in which you were talking in general terms. Maybe it's insanity because typing Greek symbols in most general-purposes IDEs is hard, querying it in search engines can be tough, etc. Not because it's mixed with English names.

Classes are a programming concept; they do map to the programming domain. Even optimizers and functions probably map to a named mathematical concept, not to a Greek letter. But parameters probably map to a specific symbol in an equation.

In Mathematica, for instance, we use Greek letters and it allows you to just type an equation in the proper visual form. So much easier to map it when you're working with the actual domain-native names and formats.


> Absolutely! Because as we all know, everyone speaks English.

I understand the sentiment here, especially as English is not my native language, but for many domains, for anything close to the state of the art, English is the lingua franca.


Google Translate is one click away. I can easily translate both Japanese and Chinese comments and variable names to get the gist of it. Using single hieroglyphs for it makes the entire endeavor impossible.


So, you are fine with google translating variable names but you absolutely cannot interpret symbols? Wow.


Still better to google translate the meaning of 学习率 than do the same on eta...


That's ridiculous.


I can assure you that more people exist with B1 knowledge of English than folks who have even minimal knowledge of all the math and computer-science domains used for deep learning (even a subset like audio/video), and that holds even if we count only software engineers/ML engineers. You need so much domain knowledge to even consider reading the notation (without a proper explanation in the docs).

And most of the variables in code will still be in English or another natural language, so for MOST (not all) people using the underlying software it is just easier to read English than math notation mixed with English/another language.


I agree, and from my math/APL bubble, let me have my succinct symbols. I invoke the famous Iverson 1979 Turing Award Lecture, "Notation as a Tool for Thought" [1]. If you are in ML and do math to some degree, learn the symbols; it's more than just about succinctness.

[1] https://www.eecg.utoronto.ca/~jzhu/csc326/readings/iverson.p...


> Naming things by their English name is not more universal than using Greek letters.

It is - the people who understand the Greek-letter notation will understand the English words but the inverse statement isn't always true.


You're entirely ignoring the size of the bubbles.


I understand both camps, but I believe these are superficial problems. It's like worrying about the comfort of the seat in the control room of a nuclear plant.


Exactly. I actually find Julia's ecosystem (not the language) way more approachable than Python's.

In Python, most libraries are big monoliths. Whereas in Julia, libraries are small and composable. Furthermore, it's the same language all the way down.

Python's libraries are superb, but the learning curve to develop (not to use them) is really steep.


I don't understand. What do you mean by "learning curve to develop" an existing Python library?


Python's story for package development is a mess compared to Julia's. If you know what you're doing, you can get a new Julia package into the general registry with unit tests, CI, and code coverage in about 30 minutes.

To contribute to an existing repository, you can ]dev the package to download the git repository, @edit a function to open the source in an editor, and push the changes to a PR in about 10 minutes. This makes it much easier for package users to turn into package developers.

I've contributed to a bunch of Julia libraries that if I still used python, I wouldn't have contributed to the equivalent python library because contributing to Julia packages is at least an order of magnitude easier.
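The contribution workflow described above, as a sketch of a Julia REPL session (package and function names are hypothetical):

```
julia> ]                                  # enter Pkg mode
(@v1.7) pkg> dev SomePackage              # clone the repo into ~/.julia/dev/SomePackage

julia> using SomePackage

julia> @edit SomePackage.some_function(x) # jump straight to the source in your editor

# edit, run the tests, then commit in ~/.julia/dev/SomePackage and open a PR as usual
```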


I imagine it has to do with the fact that lots of python libraries are mostly c/c++ or fortran, while Julia packages are usually just Julia.


Superficial, you say. How about I name these in Chinese in my package?


You're annoyed to the point of overreaction, I'd say. I assume Julia's devs use symbols that are culturally established already, not just random glyphs for no good reason.

I've seen this in math. I come from the dev land around Java's peak, where it was a holy trait to write runComputationThatIsMaybeSafeOverArguments(...), because that's where they thought the information was. In math they lived on the opposite side of things: names were few and concision was more important. Also, structural thinking made names largely unimportant, mostly redundancy and a waste of time; the domain and properties were what mattered.

I may be wrong though.


Sure, just try to properly document what it does. There are some characters that are easily confused at first glance even for native speakers, so be sure to use some common sense.


And if you ever do want to edit the code, you have to know the name of every non-ASCII symbol the codebase uses if you want to type out those same symbols without copying and pasting them. If you're not familiar with the material, entering a character like ξ can be a real challenge, and is actually more keystrokes than just typing "xi".


Difference is 1 keystroke: you type \xi. Why is that a "real challenge"?


The challenge comes for a person like me, who doesn't know off the top of my head which Greek letter ξ is. So for each of these symbols I'd have to google it and learn it, or keep some notepad from which I can copy-paste the needed symbol.


You should probably take the time to learn the names of the Greek letters. There are only 24 of them and they're even related to the English ones. It's not a huge time investment, and it's probably worth it if you work in engineering.


So you don't know the Greek alphabet, but write high-performance computing code involving non-trivial math? (Julia's main use case is HPC)


What? Did you not read the parent of this thread?

> academia people live in "bubbles" and assume everyone knows what domain-specific terms and Greek letters mean

In any case, I program HPC stuff myself with PyTorch and no, I don't know the Greek alphabet and probably don't understand "non-trivial math". The assumption that these people can't contribute is pretty off-putting, honestly. More engineers would join such efforts if there wasn't so much gatekeeping.


So you think there's a cabal of "gatekeeper" scientists who are trying to hold the hordes of capable (yet somehow unfamiliar with the corresponding mathematical literature) engineers from making useful contributions to their libraries, because they use the conventional notation from the scientific literature?

They should instead tread carefully and dispose of all Greek letters in their code to ensure you are not put off, because that's the real reason that stops you from making useful contributions?


> So you think there's a cabal of "gatekeeper" scientists

No. I suggested I myself have seen gatekeeping and referenced your comment as example. Make of that what you will, but don't extrapolate it to a conspiracy theory I didn't suggest.


The author's example is using the Greek letter "eta" instead of spelling out "learning_rate"; this is a pretty damning example.


What editors does this work in?


Juno, Jupyter support it out-of-the-box. With plug-ins: VS Code (unicode-latex), Atom (latex-completions), Emacs (which has TeX input), Vim (latex-unicoder), ...


Any editor, on Linux, if you have your keyboard set up to type Greek letters and the Unicode symbols that you use frequently. I can do it directly in the comment box: ξ
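One way to do that setup on X11 is a custom ~/.XCompose file; the key sequences below are arbitrary examples, not a standard:

```
# ~/.XCompose: custom Compose sequences for a few Greek letters.
# Requires a Compose (Multi_key) key, e.g. via: setxkbmap -option compose:ralt
include "%L"

<Multi_key> <g> <a> : "α"   # Compose, g, a -> alpha
<Multi_key> <g> <b> : "β"   # Compose, g, b -> beta
<Multi_key> <g> <x> : "ξ"   # Compose, g, x -> xi
```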


Sure, and that works great with rg and grep.


For me it’s one keystroke, with the dead-Greek modifier key.

I’ll do it right here, by typing <G-x>: ξ


How do you go from reading ξ to saying "oh I need to press <G-x>" ?


Because I recognized the Greek letter that we usually represent in English as “xi”. In fact I wasn’t sure, but I guessed x, and it turned out to be right. For each letter there’s usually one obvious latin-alphabet equivalent.

I bet you know right away how to type α and β.


Ouch! As a big Julia fan I cannot exactly complain about these criticisms. They seem overall very fair. Although some things, like tracking down problems, I think are affected a lot by what you are used to. I am way slower figuring out problems in Python than in Julia.

But the heavily academic code and often poor user documentation for third-party packages is a very real problem. I would like to point out that I think the Julia core libraries are better documented than Python's. But third-party stuff often has a long way to go.

I also think Julia may suffer a bit from people writing a lot of CLEVER code. I am a computer science guy who feels lost in the midst of all these brainy Julia PhDs, postdocs and researchers. I think maybe many of us more regular programmers have somewhat healthier habits, writing code a little less clever and perhaps more verbose.

It is not all bad though. I think that the APIs to a lot of Julia libraries are quite well thought out. It is more the inside of a lot of these packages which can look a bit messy.

Perhaps Julia code quality and documentation can improve when the language starts expanding more outside academia and more among general programmers, industry etc.


> It’s pretty common to see posts on the Julia Discourse saying “XYZ library doesn’t work”, followed by a reply from one of the library maintainers stating something like “This is an upstream bug in the new version a.b.c of the ABC library, which XYZ depends upon. We’ll get a fix pushed ASAP.”

And

> First of all, it’s not in the Flux documentation. Instead it’s in the documentation for a separate component library – Zygote. So you have to check both, and you have to know to check both.

People love to rave about how the julia ecosystem is made up of tons of tiny composable libraries instead of large monolithic libraries like scipy. To me, that sounds like a major headache, and the quotes above are partly why. I'd rather read docs for one library than two. The chances that the maintainers of a large monolithic library break internal compatibility is way smaller than the chance that two disconnected projects become incompatible.


> People love to rave about how the julia ecosystem is made up of tons of tiny composable libraries instead of large monolithic libraries like scipy. To me, that sounds like a major headache

Absolutely agree.

First and foremost, I need to get my work done. I do not want to learn all those libraries.

The API design, its quality, philosophy (or lack thereof) of framework will be different across all the frameworks.

And I am not sure I want too many libraries in my codebase.

It is a whole different level of technical debt.


Tangentially related but there is an effort to get some of the features of JAX into PyTorch: https://pytorch.org/functorch/


""" Even in the major well-known well-respected Julia packages – I’ll avoid naming names – the source code has very obvious cases of unused local variables, dead code branches that can never be reached, etc. """ Please name names. that way we can fix stuff. Other than that, great post!


I think the author is pushing for better code quality tooling in Julia instead of having people manually fix these problems.


We absolutely should, but in the short term, fixing issues is good.


>> Where does JAX shine vs. Julia? ...Documentation...Code Quality...

I am not sure I need to go beyond that. The decision is somewhat easy here. I am surprised that integration is not mentioned. JAX is just Python (a library), so it can be easily integrated into the rest of the system. It fits very well into a Python/C skillset.


The main idea (both pro and con) with Jax is its "automatic performance" model. Essentially, Jax uses a tracer down to XLA which then does some linear algebra optimizations to generate code. For the DSL case where Jax is generally used, this tends to be "good enough" for a good number of use cases. The issue is that when it's not "good enough", it doesn't give package developers much recourse to really dig into the details and optimize for the things which the compiler (and compilers in general) is unable to do.

For example, we released the library SimpleChains.jl that was optimized for a lot of cases for small neural networks (https://julialang.org/blog/2022/04/simple-chains/). There was discussion about Jax, and we saw that for any case where individuals had a modern multi-core CPU threaded machine setup, SimpleChains.jl had about a 10x performance improvement on CPU over the Jax equinox library. This shouldn't be surprising to anyone because the details of how that was achieved was quite clear (memory handling, manual SIMD bits, etc.), though those of course are not the kinds of things you get "automatically" with either language. The main difference of having a full programming language instead of a DSL is that such code can be written to manually optimize what the compilers are not doing.

And this is the pattern that tends to repeat. For small static-like differential equation solves, Jax is fine, but for example the PDE code optimizations done in this tutorial (https://diffeq.sciml.ai/dev/tutorials/faster_ode_example/#Ex...) are completely incompatible with the Jax programming model and accelerate the PDE solve by more than 10x. And if you look into detail at things showing Jax is fine for PDEs, they are using the slower code style and comparing to that slower Julia code, not the faster iteration form. Of course, these are optimizations most people don't know to do so they are the kind of extra order of magnitude you see in Julia packages over Jax, but not as much in user code.

Another repeat of this pattern is the sub-optimality of vmap. It doesn't run code completely independently so it's not optimal in CPU cases where you'd want to effectively use multithreading or MPI, but it's also not generating kernels so it's not optimal in the case of GPUs (for example, we have a new GPU-based ODE solver coming out soon that outperforms the older Julia form and the Jax form by like 100x. Again, no surprises here because special-built GPU kernels have already demonstrated this as possible from C++ directly in this context). So vmap will always get you parallelism, but when you build a library you have to leave it behind at some point if you really want to keep optimizing.
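For readers unfamiliar with the programming model being discussed: vmap takes a function written for a single example and maps it over a batch axis. A toy pure-Python stand-in (nothing like JAX's real tracing-and-vectorizing implementation, but with the same per-example semantics) looks like this:

```python
def toy_vmap(f):
    """Toy stand-in for jax.vmap: maps f over the leading axis of each arg.

    JAX's real vmap traces f once and vectorizes the computation; this
    naive version just loops, which is exactly the per-example semantics
    being discussed (and why the real thing is neither fully independent
    per element nor a fused GPU kernel).
    """
    def batched(*args):
        batch_size = len(args[0])
        return [f(*(a[i] for a in args)) for i in range(batch_size)]
    return batched

# A function written for a single example...
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# ...applied to a batch of inputs with shared weights:
w = [1.0, 2.0]
xs = [[1.0, 1.0], [2.0, 0.5]]
batched_predict = toy_vmap(lambda x: predict(w, x))
print(batched_predict(xs))  # [3.0, 3.0]
```

The point of the comment above is that once this abstraction stops being optimal for your hardware, a library author needs an escape hatch below it.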

I think what it does is admirable. If you can get a code to compile with Jax (which is a big if, it does require pure functions and a lot of extra things Python code normally doesn't satisfy) then it does get you something relatively performant. But there's a reason why what's mentioned here is Flux and Zygote, not SimpleChains and Enzyme, the real tools the Julia community is pushing towards with the full mutation support and the full speed behind it. Jax can compare in the former case, but not the latter case.


One of the biggest hurdles to my use of Jax is that it doesn't work on Windows, my daily driver. It's inconvenient when playing around with a new tool that requires a whole other OS to run. I'm hoping that gets figured out soon.


Does it not work on WSL?


I think WSL is the official workaround

There is also https://github.com/cloudhan/jax-windows-builder


The most next-gen autodiff library is probably https://github.com/breandan/kotlingrad because of its features, ergonomics and type safety


Idk, Enzyme is pretty next gen, all the way down to LLVM code.

https://github.com/EnzymeAD/Enzyme


So Enzyme is pretty cool, but IMO the documentation is terrible and the API is absolutely insane.


Enzyme dev here. The documentation is not just bad, it is in fact actively missing for several things (like forward mode AD, split mode, recent vector).

Having recently finished up a number of big technical challenges (forward, vector, split, BLAS), we're hoping to have a big documentation-athon at the end of summer (all are welcome)!


Interesting. I'm mulling over integrating Enzyme into the LLVM D compiler so I may/will join in with that.


Do it! Would be happy to help. Also we have weekly development meetings (time about to change for summer), which you’re welcome to join.

Also feel free to email me (wmoses@mit.edu), would be happy to chat and or help bootstrap.


On HN I often see library/blog authors react very quickly to references to their projects. I'm curious, how do you achieve this? Is it a coincidence and you were reading HN comments, or do you have a crawler/API that sends you a notification when your project's GitHub link is found in a new HN comment?


I was actually socializing with folks at the Cambridge Area Julia meetup, where I gave a talk on Enzyme's Julia bindings no less, when someone shared this.

So in this case pure coincidence.


Thanks for the data point :)


Did you catch your plane?


I did!


Author here. You are very kind, but Kotlin∇ is definitely a work-in-progress and its claimed contributions are more modest in nature. It is true that Kotlin∇ was one of the first AD frameworks to introduce shape-safe tensor arithmetic, however we were not the only ones to realize the importance of this concept, which has since been adopted by more powerful type systems (e.g., Dex and Hasktorch). Right now, our efforts have mostly migrated from gradient-based machine learning in continuous spaces to generalized differentiation over finite fields (details forthcoming in our ARRAY'22 presentation), which allows us to take derivatives in discrete spaces such as bounded-width regular and context free languages. Our findings indicate that generalized AD has practical applications to robust-parsing, error-correcting codes, type inference and program synthesis.



