I teach a graduate course in optimization methods for machine learning and engineering [1,2]. Julia is just perfect for teaching numerical algorithms.
First, it removes the typical numpy syntax boilerplate. Due to its conciseness, Julia has mostly replaced showing pseudo-code on my slides. It can be just as concise and readable; and on top of that, the students immediately get the "real thing" they can plug into Jupyter notebooks for the exercises.
Second, you get C-like speed. And that counts for numerical algorithms.
Third, Julia's type system and method dispatch are very powerful for scientific programming. They allow for composition of ideas in ways I couldn't imagine before seeing it in action. For example, in the optimization course, we develop a minimalistic implementation of Automatic Differentiation on a single slide. And that can be applied to virtually all Julia functions and combined with code from preexisting Julia libraries.
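The single-slide AD idea translates even outside Julia. Here is a minimal forward-mode sketch in Python using dual numbers (my own illustration, not the course's actual slide), just to show how little machinery the idea needs:

```python
class Dual:
    """A number carrying a value and its derivative (forward-mode AD)."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f at a dual number seeded with derivative 1."""
    return f(Dual(x, 1.0)).der

# d/dx (x^2 + 3x) at x = 2 is 2*2 + 3 = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # -> 7.0
```

The Julia version is shorter still because dispatch and generic arithmetic come for free, which is the point the parent comment is making.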
I'm using Julia (because of the hype) to prototype some numerical optimization stuff. There are a million functions for reshaping multidimensional arrays. The syntax is uncannily like Matlab: retrieving the last element of an array with `[end]`, indexing into a collection with an array of booleans, element-wise versions of operators prefixed with a dot, etc.
However, I keep running into niggling corner cases that kind of make Julia's promise of a powerful, extensible, yet intuitive type system less convincing.
ME: I want to write a custom getproperty() for Tuple!
JULIA: No.
ME: I want to broadcast over the fields of a NamedTuple!
JULIA: Not allowed.
ME: I want to get a view, not copy, with `@view M[m:n, r:s]`, but also get the ability to specify a default for out-of-range indices, like `get()` allows.
As sibling posts have pointed out, you can do all of those things:
1. You can trivially write a `getproperty` method for a tuple. It is considered to be type piracy and thus runs the risk of colliding with someone else's definition, but the language absolutely lets you do it.
2. You can broadcast over the fields of a `NamedTuple` by defining appropriate methods. Again, it's type piracy, so take that into consideration, but the language lets you do this easily as well.
This doesn't really seem like a legitimate complaint. You want some particular pet behaviour, and claim it is impossible to achieve. When someone points out that it is in fact possible, you are unhappy that someone else did not anticipate and implement it pre-emptively...
Do you also expect your 'custom getproperty' (whatever it might do) to have been predicted and pre-implemented by someone else? And do you also expect arrays to 'just know' what value or behaviour you are looking for whenever you index out of bounds?
And as the phrase "reserved" in the error message indicates, it will likely be given a meaning once all the ramifications of doing so are worked out and the best choice of meaning is decided upon. If you're impatient and don't want to wait for that, define it to do what you want. Your code won't even break when it is given an official behavior since your method will overwrite the built-in one.
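For comparison, this is one place where Python is even more restrictive: CPython flatly refuses to let you attach anything to a built-in type, and subclassing is the sanctioned escape hatch. A quick sketch (the `first` property is just a made-up example):

```python
# CPython rejects attribute assignment on built-in types outright:
try:
    tuple.first = property(lambda self: self[0])
except TypeError as err:
    print("rejected:", err)

# Subclassing is the sanctioned way to get custom behavior:
class MyTuple(tuple):
    @property
    def first(self):
        return self[0]

print(MyTuple((1, 2, 3)).first)  # -> 1
```

So "you can do it, but it's type piracy" is actually a more permissive position than most mainstream languages take.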
But this is exactly the kind of thing I called a bothersome corner case. Needing to redefine a global function in order to use fairly intuitive behavior is not great developer experience.
When things like this are left undefined, it's not to intentionally annoy you, as you seem to be taking it. It's generally because there are two or more reasonable possible behaviors and which one is correct hasn't yet been determined. In this particular case, there are subtleties because named tuples can be seen as ordered collections of values and as named associative structures. Deciding that kind of thing takes a lot of time and effort. If you feel that there is a preferred behavior that broadcasting over named tuples ought to have, it would be helpful to post that on GitHub.
There may be other packages or methods for doing the other things you want. I'd think that broadcasting over a NamedTuple would iterate over the key => value pairs, but I haven't tried it.
Looks like vectorizing over NamedTuples is explicitly disallowed. Probably you can still vectorize things over the keys and values separately, along with some helper functions, but it is a bit annoying. Looks like the reason was open questions about whether iteration should be over values or pairs.
Indeed, that is precisely the decision that must be made. Neither one is obviously correct. And once a decision is made and goes into a release, it cannot be unmade — we all have to live with it forever.
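For what it's worth, Python ran into the same values-vs-pairs question with dicts and resolved it by making the choice explicit at the call site:

```python
# Plain iteration over a dict yields keys; .values() and .items()
# spell out the alternatives explicitly at the call site.
d = {"a": 1, "b": 2}
print(list(d))           # -> ['a', 'b']
print(list(d.values()))  # -> [1, 2]
print(list(d.items()))   # -> [('a', 1), ('b', 2)]
```

Neither default is obviously right there either; the design just forces you to say which one you mean.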
> Due to its conciseness, Julia has mostly replaced showing pseudo-code on my slides.
This benefit is really underappreciated IMO — for a lot of "science" applications, the core part of the program should be readable by people who don't program in the language. In research papers, by people who want to understand the fine details of your algorithm, for example.
Julia gets closer to "executable pseudocode" than I would have thought possible.
Fourth, the ability to effortlessly drop down several layers of abstraction: Pointer types, all packages including Base are written in Julia and can easily be extended or patched on the fly, homoiconicity, seamless integration with BLAS and LAPACK.
Julia is a nice language, it's just tough to compete with Python.
- The beginner experience in Julia is still much worse than it is in Python. Stuff that should work intuitively sometimes doesn't, and when you get a cryptic error message, it's difficult to find relevant help online. And when you do find help, some of it is out of date because the language has changed over the past few years.
- You can squeeze a lot of performance out of Python and the ecosystem of libraries is hard to beat.
- Julia has to be way better than Python to give people an incentive to switch. Being just marginally better in some aspects of the language isn't enough. And it's very difficult to be much better than Python, especially in usability and ecosystem.
> Julia has to be way better than Python to give people an incentive to switch.
A language doesn't necessarily have to give all the old programmers an incentive to switch, if it can position itself as a good language for new programmers to learn.
For example: at our institute (computational biology), we had a PhD student who was an early Julia adopter and wrote his model in that. Several students have since joined the project he started, so obviously they're now writing Julia too. That project's experiences with the language were so good, it soon became obvious that for our use case, Julia was superior to any other language we'd used so far. So pretty much the whole research group has now shifted to Julia, and that's what we teach new students. Slowly, other groups in our institute became interested, and more and more people are adopting it, which in turn means that their new students will also end up learning it in future.
If you work with a lot of data, Julia is already a 10-100x improvement over Python.
Being able to iterate and mangle huge columns with real lambdas and without having to marshal arguments to/from C++ is a huge advantage.
Where I used to spend hours in aggregate searching through docs for pandas/numpy, for stupid shit like "how do I shift but also skip NaNs", now I just write a for-loop in a couple minutes and get on with my work.
There's a whole subclass of tasks in R/pandas to work around the interpreter that just aren't needed in Julia.
For me at least it's well worth the syntactical warts and slow interpreter.
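"Shift but also skip NaNs" is underspecified, which is part of why the docs search takes hours; under one plausible reading (each non-NaN entry is replaced by the previous non-NaN value), the for-loop version really is a couple of minutes of work. A plain-Python sketch, no pandas needed:

```python
import math

def shift_skipping_nans(xs):
    """Replace each non-NaN entry with the previous non-NaN value;
    NaN entries stay NaN. (One plausible reading of the task --
    the original comment doesn't pin down the exact semantics.)"""
    out, prev = [], math.nan
    for x in xs:
        if math.isnan(x):
            out.append(math.nan)
        else:
            out.append(prev)
            prev = x
    return out

print(shift_skipping_nans([1.0, math.nan, 2.0, 3.0]))
# -> [nan, nan, 1.0, 2.0]
```

The Julia complaint is about interpreter speed, not expressiveness: this loop is writable in Python too, it's just slow on big columns unless you vectorize it.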
As an experienced python/data science user, this (creating fast complex column-wise transforms) is rarely a problem for me.
The truly huge advantage for Julia is how it handles parallelism. The GIL makes it an absolute pain to do parallelism in Python. It always ends up in threading hacks with numba or joblib, or in multiprocessing, which has its own unfixable flaws.
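A minimal illustration of the usual Python workaround: because threads serialize on the GIL for CPU-bound work, you end up reaching for `multiprocessing` (the toy task and pool size here are arbitrary):

```python
from multiprocessing import Pool

def work(n):
    # CPU-bound toy task: threads would serialize on the GIL here,
    # so Python parallelism typically reaches for processes instead,
    # paying for pickling and process startup along the way.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(work, [10, 100, 1000]))
        # -> [285, 328350, 332833500]
```

In Julia, `Threads.@threads` over a loop gives shared-memory threading with none of the pickling overhead, which is the contrast being drawn.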
Examples: Basic, C (both Basic and C, to a degree), Visual Basic, PHP, Javascript, Python. I'm probably missing some. These displaced older languages just by being adopted by newbies.
This is an insightful and level-headed comment that applies equally to R.
Although Julia is a growing alternative to Fortran/C/etc for long-running computations, it remains awkward and unpleasant for interactive analysis. Users familiar with Python/R/etc must weigh the benefits of Julia against its slow library startup, its cryptic error messages, and its thin documentation.
Also, the lack of a community repository for well-vetted Julia libraries can limit uptake by professional researchers who must be able to trust their tools. A real strength of R (in comparison not just to Julia but also to python) is that such a repository exists, and that it has automated testing across a range of computer architectures and versions of R, including not just unit testing within individual libraries, but also testing of related libraries.
> Julia is a growing alternative to Fortran/C/etc for long-running computations, it remains awkward and unpleasant for interactive analysis
Did you mean to write short-running scripts? If anything, the Julia dev workflow is biased towards interactive analysis in a REPL a la R or IPython/Jupyter. I don't mean to imply that there's no startup overhead, but how often are you restarting the REPL when doing EDA? Unless it's more than once every few minutes (which is a very odd workflow), then startup overhead is effectively amortized.
> A real strength of R (in comparison not just to Julia but also to python) is that such a repository exists
CRAN is certainly a cut above many other package repos here, but I'm not sure "trust their tools" can apply to all packages on there. Anecdotally, I've had a lot of issues with compiled dependencies and missing/out of date external assets on less well-trodden packages. There's a reason MRAN, Conda and JuliaBinaryWrappers exist after all.
For whatever reason, Julia package maintainers also seem more receptive to making their work compatible with other libraries. This goes beyond just multiple dispatch -- imagine if the tidyverse/non-tidyverse divide weren't such a hard split.
There are warts with the beginner experience with Python, principally the awful situation with packaging.
If you care about performance in code that mixes together several packages in nontrivial ways, Julia is way better than Python.
There's a far broader range of libraries in Python than in Julia, but none of them are going to prevent adoption of Julia when its performance advantages are crucial, because of the excellent facilities for using Python from Julia.
I'm not sure the package problem is really a problem for beginners. Just within the last year firsthand I've seen people in undergraduate classes, in graduate classes, and at work try Python the first time, and the default install of Anaconda worked for them in every case. The classes were taught by different professors, and they all suggested Anaconda independently and were not Python programmers.
It is overkill/brute force to install all the Anaconda default packages when a beginner is not going to use over maybe 5-10 libraries, but it's a solution that has worked flawlessly for beginners from my experience watching non-software engineers and "non-technical" people using Linux, Windows, and MacOS try Python for the first time in math and data science classes.
You can, but Pluto is (I would argue) a better notebook than Jupyter. Regardless, the latency experienced with Julia applies equally whether you use Pluto, Jupyter, or any other front-end.
You absolutely can use regular jupyter notebooks for julia! Pluto has some advantages, like being stored as a normal julia file. The julia startup time issues affect both.
Oh, man, this is indeed a major feature. My main point of friction with jupyter notebooks is the stupid json ipynb format. Why can't it be just a regular language file with comments?
> They contain code, rendered Markdown, images, plots, video players, widgets, etc.
The code could be verbatim Python code (or whatever language the notebook uses), and the rest could be embedded inside comments. I don't see any problem with that (besides the very concept of "rendered Markdown" being totally out of order). The fact that they are saving it as JSON by default seems more like laziness by the developers than a well thought-out solution; it could be just a straightforward serializer.
>and the rest could be embedded inside comments. I don't see any problem with that
Do you mean embedding images and plots inside comments? If yes, please elaborate on how you see that happening in the real world.
>The fact that they are saving it as json by default seems more to be laziness by the developers than a well thought-out solution, that could be just a straightforward serializer.
So, how would that well thought-out solution in the form of a "straightforward serializer" work? I have a flat file, and I want to display images, plots that you can zoom into out of, figures, etc. as comments. How would that happen?
>At the very least, you could put the whole json stuff inside a comment. It's already plain text, isn't it?
So instead of having the whole file as JSON, which is lazy and not well thought-out, we'll put all content in JSON, then put that JSON inside a comment in a plain text file. Do I read you correctly?
I feel we're making progress faster than these lazy Jupyter org bandits.
> we'll put all content in JSON, then put that JSON inside a comment in a plain text file
Only the "output" content. The code inside the cells is verbatim, and the markdown cells are regular text comments.
See, I'm not arguing with you just for the sake of it. I have a legitimate problem with ipynb: very often I want to run the code of a notebook from the command line, or import it from another Python program. This is quite cumbersome with ipynb, but it would be trivial if it were a simple program with comments.
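In fairness, tools like `jupyter nbconvert --to script` and jupytext exist for exactly this, but the underlying extraction is simple enough to sketch by hand. Assuming a trimmed-down nbformat-style JSON (a real .ipynb carries more metadata per cell):

```python
import json

# A minimal nbformat-style notebook, hand-built for illustration
# and reduced to the fields this sketch needs.
nb_json = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis notes\n"]},
        {"cell_type": "code", "source": ["x = 21\n", "print(2 * x)\n"]},
    ],
    "nbformat": 4,
})

def extract_code(raw):
    """Pull the code cells out of an .ipynb so they can be run
    or imported as a plain script."""
    nb = json.loads(raw)
    return "\n".join(
        "".join(cell["source"])
        for cell in nb["cells"]
        if cell["cell_type"] == "code"
    )

print(extract_code(nb_json))
```

The friction isn't that extraction is hard; it's that it's an extra step every single time, compared to a file that is already importable.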
I believe people reading this are not detecting the sarcasm. I'm demonstrating that the Jupyter folks are not lazy engineers, and the "obvious" solutions people come up with are not that well thought-out when you start actually thinking about them.
You can also use VS Code notebooks and Julia support in VS Code keeps getting better. As a newcomer to Julia I am super impressed with the experience. No getting around loading the Plots package but producing a high quality plot and getting the data there is a much more enjoyable experience than pandas + numpy + Matplotlib + whatever tensor framework you’ve sworn to.
Do you still have latency issues in Julia 1.6? The latency improvements in the last three versions of Julia have been so significant that I do not really notice it anymore. Supposedly there are additional speedups planned for 1.7.
I also recently tried the beta version of Julia 1.6, and the speed improvements for installing/loading packages are quite impressive. Essentially, packages get precompiled after installation using multiple threads.
Besides this, if you only infrequently install/update packages, you can use PackageCompiler.jl. I use it for PyPlot.jl (based on matplotlib), DataFrames.jl, ... and plotting some data is quasi-instantaneous, as it is in Python (even the very first time in a session).
Julia focuses on scientific and numerical computing, and it is overtaking the Python/numpy combo in that niche. In addition to being considerably faster than Python, it also has quite a few innovative libraries in the area. This can also extend into machine learning, where Python has been the go-to language despite its limitations.
For other areas, like web programming, there is no sign of Julia replacing Python in the foreseeable future.
I second this. Python is actually starting to get significant traction in the scientific community. Depending on the field, R, Fortran and Matlab (and even C++) still have a huge lead.
It's nice that Julia is getting noticed, but it's a distant blip in the radar.
The sci community is really hard to move from existing battle-tested and performant libraries.
I don’t have much insight on the scientific computing landscape in general, but here’s one notable data point: I worked on the CMS experiment of LHC (Large Hadron Collider) for a while, which is one of the highest profile experiments in experimental physics. The majority of CMS code is C++, which you can check for yourself at https://github.com/cms-sw/cmssw (yes, much/most? of the code is open source). What I worked on specifically was prototyped in Python, then ported to C++ and plugged into the massive data processing pipeline where performance is critical due to the sheer amount of data. So I probably wouldn’t put C++ in parentheses.
This need to rewrite, of course, is what Julia is trying to avoid. My workflow is exactly the same, and I’d love to be able to write code in a high-level language like Python and then use that directly instead of having to rewrite.
However, in my case the reason for rewriting isn’t just performance, but also to be able to build compiled binaries. Julia aims to be as high-level as Python but faster - is there a language that’s as high-level as Python but AOT-compiled?
Cython - in fact I think in 2021 if you want to write a pure C or pure C++ program, Cython is the best way to go, and just disable use of CPython.
The “need to rewrite” is actually a sort of advantage with Cython. You only target small pieces of your program to be compiled to C or C++ for optimization, and the rest where runtime is already fast enough or otherwise doesn’t matter, you seamlessly write in plain Python.
Using extension modules is just a time-tested, highly organized, modular, robust design pattern.
Julia and others do themselves a disservice by trying to make "the whole language automatically optimized", which counter-intuitively is worse than making the language optimized overall for flexibility instead of speed, yet with an easy system to patch in optimization modules anywhere they are needed.
I have been using pythran for the last year and the nice thing is that you hardly have to rewrite anything but get speeds which are often as fast (or sometimes faster) than c modules.
The problem with cython is that to really get the performance benefits your code looks almost like C.
I agree with you on optimizing the bits that matter; often the performance-critical parts are very small fractions of the overall code base.
> Using extension modules is just a time-tested, highly organized, modular, robust design pattern.
I really don't get this. I'm fully on the side that limitations may increase design quality. E.g., I accept the argument that Haskell's immutability often leads to good design, and I believe the same is true for Rust's ownership rules (they often force a design where components have a well-defined responsibility: this component only manages resource X starting from { until }.)
But having a performance boundary between components, why would that help?
E.g., this algorithm will be fast with floats but slow with complex numbers. Or: you can provide X and Y as callback functions to our component and they will be blessed and fast, but providing your custom function Z will be slow.
So you should implement support for callback Z in a different layer but not for callbacks X and Y, and you should rewrite your algorithm in a lower-level layer just to support complex numbers.
Will this really lead to a better design?
> “But having a performance boundary between components, why would that help?”
It helps precisely so you don’t pay premature abstraction costs to over-generalize the performance patterns.
One of my biggest complaints with Julia is that zealots for the language insist these permeating abstractions are costless, but they totally aren’t. Sometimes I’m way better off if not everything up the entire language stack is differentiable and carries baggage with it needed for that underlying architecture. But Julia hasn’t given me the choice of this little piece that does benefit from it vs that little piece that, by virtue of being built on top of the same differentiability, is just bloat or premature optimization.
> “you should rewrite your algorithm in a lower level layer just to support complex numbers.”
Yes, precisely. This maximally avoids premature abstraction and premature extensibility. And if, like in Cython, the process of “rewriting” the algorithm is essentially instantaneous, easy, pleasant to work with, then the cost is even lower.
2. Allow each to pursue optimization independently, with clear boundaries and API constraints if you want to hook in
3. When possible, automate large classes of transpilation from outside the separate restricted computation domains to inside them (eg JITs like numba), but never seek a pan-everything JIT that destroys the clear boundaries
4. For everything else (eg cases where you deliberately don’t want a JIT auto-optimizing because you need to restrict the scope or you need finer control), use Cython and write your Python modules seamlessly with some optimization-targeting patches in C/C++ and the rest in just normal, easy to use Python.
> One of my biggest complaints with Julia is that zealots for the language insist these permeating abstractions are costless, but they totally aren’t.
This sounds like it might be interesting, but your later comments about overhead and abstraction costs sound like you maybe don't understand what Julia's JIT is actually doing and how it leverages multiple dispatch and unboxing. Could you be a bit more concrete?
No I think that’s what I’m saying. When raising the issue that using multiple dispatch this way is premature abstraction that has intrinsic costs, all I get is the religious pamphlet about multiple dispatch.
In practice the multiple dispatch overhead is elided by the compiler. If it can’t be you’re doing something truly dynamic, which is generally unavoidably slower. It’s still a better place to be than everything being a generic Object type.
The nice thing about Cython is that you can have both - all the multiple dispatch you want with fused types, or escape that paradigm to do other things if you desire. It gives a lot of surgical control.
I don’t think that is true. As far as I know, Cython lets you do function overloading and single dispatch via class inheritance. I think you also miss out on the type inference that lets you do things like pipe dual numbers through functions without any dispatch-related overhead.
Does compiling with cython decrease the FFI overhead of the calls into native code? My problems with numpy have always been that I have to make a lot of calls on small bits of data, and the FFI overhead eats all my performance gains. If I put more logic on the native side and made fewer, bigger calls it would be faster, but that often doesn't make sense, or it's a slippery slope where pushing logic into native code pulls a data structure over, then another related bit of logic, until I have just a tiny bit of Python left.
Probably. Cython compiles a C-style superset of Python into C. Then a C compiler compiles that to a Python-importable DLL/.so. So, the overhead of calling a C function is no more than declaring its types (programmer overhead), and then, in the generated C, the native C-linkage function can be called like any other. Now, one C function calling another from a different translation unit (i.e., object file or shared lib) can itself be "high" overhead (though nothing like Python FFI), but you may be able to eliminate that too with modern compilers' link-time optimization and some build-environment care.
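The same per-call-overhead effect shows up inside numpy itself: many small calls from a Python loop versus one batched call over the same data. A rough sketch (assuming numpy is available; exact timings will vary by machine):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
mats = rng.standard_normal((10000, 3, 3))

# Many small calls: a Python-level loop, one numpy round-trip per matrix.
t0 = time.perf_counter()
traces_loop = np.array([np.trace(m) for m in mats])
t_loop = time.perf_counter() - t0

# One batched call: a single vectorized einsum over all 10000 matrices.
t0 = time.perf_counter()
traces_batch = np.einsum("nii->n", mats)
t_batch = time.perf_counter() - t0

assert np.allclose(traces_loop, traces_batch)
print(f"per-matrix loop: {t_loop:.4f}s, batched: {t_batch:.4f}s")
```

Batching amortizes the per-call cost, but as the parent comment says, not every algorithm batches naturally -- which is exactly when the "pull more logic into native code" slope starts.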
Just for reference, my experience is mostly computational genomics. R is king of analysis, and most of the actual "meat" is implemented in C++. But I work with other teams as well, so the experience is a bit more varied if you look across different areas.
It's all about which "bubble" you're in. Many people posting here work for startups using micro services (for which Go is a decent fit) and for companies close to the whole Docker/Kubernetes ecosystem, which is based on Go. So naturally they assume Go is huge.
My anecdata kind of tells me that Go is reasonably big, but it's not yet near .NET and Java, worldwide. But it could get there in a few years, I've seen/heard about some enterprises adopting it.
True, but I'm not talking about simple users. I'm talking about companies extending Kubernetes or building adjacent software. Even if their service doesn't necessarily integrate with Kubernetes, there is frequently a temptation to "follow your heroes".
Look at the whole Cloud Native Foundation thing, I think most of their projects are developed using Go.
So if you're using that stack, it's easy to assume that all new development everywhere is in Go.
It will probably balance out once the newness wears off Go (I think this is already happening).
It's not overtaking at all. It's seen growth in some areas.
The issue with regards to web programming/other programming is important, because sometimes it's useful to make a website/build another tool as a scientist. Python can do both easily.
They're not exactly mature frameworks yet though, which is more the point I'm making. Of course, you can do most things in Julia, but does it provide a good experience for it yet?
There are so many tools coming up around this in Julia that it is arguably a problem.
There was a whole session last JuliaCon that was just on web-dashboard tools like Dash.jl and Stipple.jl and several others.
And there was another half-session's worth of other talks about web-related things.
Seriously?!? Julia has no hope of overtaking Python in numerical computing by 2032, expecting movement by 2022 is just delusional. Here is a better prediction: by 2022 people using Python for numerical computing who started doing so in the previous year will exceed the number of people who have ever downloaded Julia since it was first released.
No, not seriously. But also seriously. The article is based on % change which is of course ridiculous because the % increase of a small population isn't at all interesting. And GP also has a ridiculous claim. So I'm offering a stake in the ground to determine if Julia is on the track that GP claims. If GP wants to come back and discuss where the stake should be, it will be an interesting conversation.
You can call Python directly from Julia https://github.com/JuliaPy/PyCall.jl so much of the Python library ecosystem (say, matplotlib) is available to be used in Julia programs.
That helps the adoption story quite a bit. You can do the number-crunching in Julia where performance counts, and then analyse and present the results using Python.
- Using Python directly is a better experience than calling Python from Julia
- I've never run into unsolvable performance issues with Python
So I guess I'm not in the target audience unless I just happen to be curious about a new language? That's kind of my overall point - even if Julia is a good language on its own and I work in data science, I don't have reasons to pick it over Python.
If you haven't hit a brick wall with Python, it is just because you haven't run into the right problem. I was doing something that required lots of conditional operations on small matrices. The FFI into numpy's native library really bogged it down. I didn't have permission to install a compiler on that machine, so I wrote it in VBA in Excel. It was 11x faster.
I said something similar in another thread, but for me it doesn't have to be better than Python (that is largely going to be subjective); the package ecosystem just has to grow and have some offering, at all, for the things that I do.
It's still very barebones compared to Torch/TF/Flax, and I would be hamstringing myself by switching to Julia, even if I find the language otherwise attractive.
Thanks for this, I will definitely follow along there. Yeah if they can just check a few of those boxes I'm much more likely to at least try to more regularly work with Julia.
If it's a pure function. Oh, and if you have state-based control flow, you have to turn off the JIT. Etc. If you take a standard library like some thermodynamics simulator and throw Jax on it, do you expect it to work without modification? Most of the time it'll fail right at the start by using the wrong implementation of Numpy. So no, those are not "ordinary functions": those are functions where people consciously put in the effort to rewrite years of work onto Jax, which is very different.
I found the beginner installation/package installation experience a million times better than python (except that it’s tricky to explain that you type ] to enter the package manager but you don’t see the ] that you typed)
I think your "ecosystem of Python libraries" is the key point. Python got a lot of its mass-adoption mileage from ML. Its libraries provided "easy ML" for the masses right when ML got popular in science and the job market, which quickly brought Python into the mainstream and built up its network effect.
A similar enabler in a new field could help Julia burst in as a general language. My 2c.
Autodiff is a place where there is a gulf between Julia and Python, one that I think can't be bridged well: JuliaDiff is astonishingly flexible and performant.
I linked to the website (which was updated in May, but its contents could do with more work) because it has examples of how well the suite fits together.
I don't know much about Jax. I've seen competent benchmarks showing an order-of-magnitude benefit for using ReverseDiff from the JuliaDiff suite over Autograd, which is what PyTorch uses for reverse-mode autodiff.
I think you are confusing 1995 with 2005. Perl was in decline by 2000 and by 2005 it was terminal; you could probably count the number of perl shops of any consequence in that year on the fingers of one hand.
I don't think that is the case.
Sure Perl may have been in decline for ages, but people were not comparing Perl to Python for that long.
Simply because python hasn't existed that long.
Python 2.0 was released in 2000.
Python 1.0 was 1994, and Python 0.9 (first public release?) was 1991.
People like to substitute "10x better" here, but I think the real number is 100,000x better, aka it's not possible by default. Q: What would it take to replace Windows? A: The iPhone was a new product category that targeted a new market.
It does happen, though. C has mostly replaced FORTRAN for scientific applications. Not entirely, FORTRAN is (infamously) still used, but I don't know anyone who has started a new project with FORTRAN.
Just 6 years ago, I was taught Perl in my Introduction to Bioinformatics course. The teachers were still using Perl because it used to be the go-to language for bioinformaticians. The year after, and every year since, they've taught using Python.
We started a very large, computationally intensive project in Fortran. It is still easier to do maths in Fortran than in C/C++, and Fortran now has a wonderful C binding system allowing direct calls into C .so/.dll libraries if you want to do some SQL or other kind of data input/output.
The idea that people were being forced to learn perl just for bioinfomatics as recently as 6 years ago fascinates me.
Python has gotten exceptionally lucky. I am sure the two or three remaining Perl users on the planet are also on HN and ready to jump to its defense, but to me this just goes to show how heavy the switching cost is for something like this, and also how lucky Python was to be the best language to switch to at that point. It was in the right place at the right time for a lot of these switches away from older languages in obvious decline, and then it was able to leverage numpy and scikit-learn to pick up a lot of additional momentum in ML and data science tasks. It is almost never the 'best' language for the job, but coming in as second choice on most tasks is a huge win.
Jack of all trades, master of none, but oftentimes better than some are at one.
This is in no small part due to a clever language design decision: combining type genericism with multiple dispatch.
For example, Turing.jl for Bayesian Inference plays well with Flux.jl for Neural Networks which plays well with DifferentialEquations.jl for ODEs. Basically, everything in pure Julia plays nicely with everything else.
An example of how this is useful: when neural ODEs became more popular a couple of years ago, Julia users had to do almost nothing to implement and extend them. DifferentialEquations.jl and Flux.jl already played nicely with each other, and you could just run wild. Meanwhile, in Python-land, there are devs building out ODE solvers in Tensorflow and Pytorch, doing a load of duplicate work because the frameworks don't allow the same level of genericism.
The whole ecosystem is like this.
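The genericism being described can be sketched in a few lines. The `Dual` type and `babylonian` function below are made-up names for illustration (real packages like ForwardDiff.jl are far more complete): a toy forward-mode AD number flows unmodified through generic numeric code that was never written with AD in mind.

```julia
# A toy dual number for forward-mode AD (illustrative only).
struct Dual{T<:Real} <: Real
    val::T   # function value
    der::T   # derivative
end

# Teach the arithmetic operators about Dual once...
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:/(a::Dual, b::Dual) = Dual(a.val / b.val, (a.der * b.val - a.val * b.der) / b.val^2)
Base.convert(::Type{Dual{T}}, x::Real) where T = Dual(T(x), zero(T))
Base.promote_rule(::Type{Dual{T}}, ::Type{<:Real}) where T = Dual{T}

# ...and any generic numeric code differentiates itself, unmodified.
# Here: Newton's iteration for sqrt(x).
function babylonian(x; n = 10)
    t = (1 + x) / 2
    for _ in 1:n
        t = (t + x / t) / 2
    end
    t
end

d = babylonian(Dual(2.0, 1.0))
# d.val ≈ sqrt(2), d.der ≈ 1 / (2 * sqrt(2))
```

The same mechanism is why an ODE solver and a neural-network library written independently can compose: each only needs to work on generic numbers.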
So I've decided to stay with Julia. I'm staying with Python too. It's no big deal.
This seems very desirable. Though, by the time self-attention first got popular, it was already available in PyTorch. Python seems to have the edge just because there are lots of people using it. Maybe it is just a matter of time and users. I will probably wait until the ecosystem gets larger, and then switch to it. (Yes, I am too lazy to implement a transformer from scratch.)
I've been using R nonstop for pretty much 5+ years. I'm happy that there's established competition coming from Python and new competition coming from Julia. Having these languages compete over similar types of programmers pushes each one to be better, which is awesome. I'm not a die-hard R person, I'd be more than happy to switch under the right circumstances.
But...I think one thing gets overlooked way too often. For "data scientists" or "statisticians" or [insert new term here], the majority of our non-modeling time is spent on just plain old data wrangling. To me, R is unbeatable here. I've tried Python ~2 years ago and pre-1.0 Julia.
Using tidyverse you can do pretty much anything to any dataset, often *without a monstrous amount of keystrokes*. (The pipe syntax is awesome). If you really need speed you can always switch over to data.table for uglier but faster code. I really tried but I could never replicate the "brain cycles to keystrokes" speed of R in Python/Julia. That is, being able to intuitively and quickly just convert my thoughts into readable data wrangling code.
Sure the base R language is not that "fast" and Julia/Python benchmarks are way faster. But in practice this doesn't matter to me. Most of the performance sensitive packages are written in C/C++/Fortran anyway (rstan, brms, glmnet, caret). I don't care that I could write 3x faster loops. The extra 5 seconds for that one piece of code doesn't make up for the absence of a good data wrangling ecosystem.
My message to the Julia team: You can get a very large portion of the R userbase to switch over if you focus on a Julia version of the tidyverse (especially dplyr). I know that DataFrames.jl exists but it just doesn't even come close. There's a difference between "you can do this in Julia too" and "here's a clean/intuitive way to do this better without extra baggage".
I'm sorry if the above seems harsh. I genuinely appreciate the Julia team's efforts. I can only imagine how hard it is to create a new language. I just wanted to be honest.
I deeply loathe R for its terrible type idiosyncrasies, syntax, and slowness.
However, even I must admit that it is incredibly good at what it was meant to do - analyse and display data. (And yes, the tidyverse is a huge improvement of the syntax, although it's telling that they basically reinvented the language to do so.)
As an ecological modeller, I create my actual simulation models in Julia, because it is a much, much better language for any real programming. But I still analyse the output in R.
I don't understand how people can loathe R. If you take a functional approach, especially using pipes, dplyr and a split-apply-combine style, it is quite beautiful. Much nicer than trying to, say, divide a time period by an integer in Go.
> If you take a functional approach, especially using pipes, dplyr and a split, apply, combine style, it is quite beautiful
Sure, but what if you don't? Sometimes, this is the right way to do things, other times there are other approaches that are more natural/beautiful. In many cases, a loop with conditionals is much easier to understand.
I use a lot of R, and like many aspects of it. But the fact that `f(stop("Hi!"))` may or may not throw an error depending on the internals of `f` is a little maddening. (And there are tons of similar issues.)
When it comes to data wrangling, one huge advantage of Julia over tidyverse/R dataframes/Pandas is that you can write a damn for loop and it won't be brutally slow.
It's so much simpler and faster to use a loop that says "pick this row only if this and that and this other thing are sometimes true" vs having to construct an algebra of column filters to do the same.
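That loop style can be sketched with plain Julia, using a vector of NamedTuples as the "table" (no DataFrames.jl required); the row conditions here are made up for illustration.

```julia
# Ten hypothetical rows: an id, a value, and a flag.
rows = [(id = i, x = i^2, flag = iseven(i)) for i in 1:10]

# "Pick this row only if this and that and this other thing":
# just write the conditions in a loop, no column algebra needed.
picked = eltype(rows)[]
for r in rows
    if r.flag && r.x > 10 && r.id < 9
        push!(picked, r)
    end
end
# picked holds the rows with id 4, 6, 8
```

Because Julia compiles the loop, this stays fast even on millions of rows, which is the point being made above.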
I think that is absolutely a fair criticism. Personally, I rarely run into an issue where I absolutely am bottlenecked by a slow loop. But this sort of thing drew me to Julia in the first place.
There was also an R update in ~2017 that introduced some JIT speed-ups for loops, which made a noticeable difference.
If this is a problem you run into often, I suggest converting your object to a data.table. You can pass a function row-wise over the object very quickly.
I think loops are not ideal for data analysis. They are prone to human error, especially ones that modify the data, and in a way that can be hard to spot (i.e. iterating over the dimensions of the wrong object). A stepwise creation of new logical fields using mutate, followed by a vectorised ifelse command, is more robust, and you can clearly see the steps of the logic.
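For what it's worth, that stepwise style translates directly to Julia's broadcasting too (variable names here are illustrative):

```julia
x = [3, 8, 12, 15, 20]

# Step 1: build the logical fields explicitly...
is_big = x .> 10
is_odd = isodd.(x)

# Step 2: ...then a vectorised ifelse over them.
label = ifelse.(is_big .& is_odd, "keep", "drop")
# label == ["drop", "drop", "drop", "keep", "drop"]
```

Each intermediate step is a named, inspectable vector, so the logic stays visible without any loop bookkeeping.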
I mean, I get your point.
Julia has a bit of a Lisp Curse problem:
http://winestockwebdesign.com/Essays/Lisp_Curse.html
Writing a performant and easy to use data wrangling library for R is a bunch of work and means dealing with C/C++ etc.
So few people are willing to do so, and just contribute to a small number of libraries like dplyr.
(I feel like there are at least 2 other major competitors to that in R?)
Whereas in Julia it's really easy to write a new data wrangling library.
It's just not that much work. So people:
A) do it just for fun or as student projects (none of the major ones are, though), or
B) do it because they have a nontrivial difference of opinion (e.g. Queryverse has a marginally more performant but marginally harder-to-use system for missing data).
The nice thing about Julia, especially for tabular data (thanks to Tables.jl), is that everything works together.
It's actually completely possible to mix and match all of those libraries in a single data processing pipeline.
Which, while generally a weird thing to do, does mean that if an external package uses any of them, it slots into a pipeline built on another.
(One common case: Queryverse has CSVFiles.jl, but CSV.jl is generally faster, and you can just swap one for the other inside a Query.jl pipeline.)
I absolutely agree this makes learning harder.
---
Also that particular example:
> "I need pipes to help me wrangle data more efficiently do I use Base Julia, Chain.jl, Pipe.jl, or Lazy.jl?"
It's piping.
Something would have to be massively screwed up if any of those options were more or less efficient than the others.
The only question is what semantics do you want.
Each is pretty opinionated about how piping should look.
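To make the "only semantics differ" point concrete: base Julia's `|>` is unary-only, and the macro packages just rewrite their placeholder syntax to ordinary calls at macro-expansion time, so there's no runtime cost to any of them. A sketch with base `|>` (the `squash` helper is made up):

```julia
# Base Julia piping: `x |> f` is just f(x), so each stage
# must be a one-argument function.
squash(s) = lowercase(strip(s))

result = "  Hello World  " |> squash |> length
# result == 11

# The macro packages differ only in how extra arguments get
# spliced in, e.g. Pipe.jl's `@pipe x |> f(_, 2)` placeholder
# or Chain.jl's `@chain` begin/end blocks.
```

Since all of these expand to plain function calls before compilation, the choice really is purely about which syntax you prefer.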
The Lisp Curse was written by a then-inexperienced web developer with (then, and likely still) zero Lisp experience, based on extrapolating something he read about Lisp in an essay by Mark Tarver. He prefers it not be submitted to HN due to the embarrassment, yet for some reason keeps the article up (probably because it generates traffic).
Yeah, NSE (non-standard evaluation) is really annoying to work with in dplyr/tidyverse codebases, and this definitely inhibits people from building on top of them.
They are an 80% solution for a lot of data analytic needs, but base-R is 100% the right choice if you want your code to run for a long time without needing updates.
I've never really gotten into data.table for some reason, normally dplyr is fast enough, or I'm using something more efficient than R.
What a constructive, positive, down-to-earth, well-written comment, and what a nice reprieve from everything that's broken about the tone of web discussions these days. You point out that there's still another player in this space (R), but not in a way that's whiny, dismissive, or doctrinaire, and you celebrate the healthy competition. You suggest a streamlined path toward Julia ecosystem maturity, rooted in real-world needs. Nicely done!
I have no real dog in this fight, but I hope Julia team members (and/or aspiring Julia ecosystem contributors) will read and consider your point.
This whole thread seems to be quite civilized. I can see no name-calling or off-topic rants, only a frank exchange of opinions, mixed in with some facts.
Your post seem to indicate that there is some sort of 'fight' going on, or that the tone is broken. I disagree. If most web discussions were like this one, we would have fewer problems in this world.
Oh, that's exactly what I mean -- when I say "everything that's broken about the tone of web discussions these days", I'm talking about threads and topics other than this one. I don't see any 'fight' here, and that's what's so refreshing.
All right! I got the impression you were contrasting that particular post with the rest of this discussion, but apparently not. Still slightly confused here. Oh well, carry on.
Another big area where R has an edge over Python (and I guess Julia, but I'm not sure) is making quick yet presentable plots of data containing different factors that you want to show together. The matplotlib equivalent requires tracking different indices and manually adding layers for each one.
I worked with R and Python during the last 3 years, but I have been learning and dabbling with Julia since 0.6. With [PyCall.jl] and [RCall.jl] available, the transition to Julia can already be easier for Python/R users.
I agree that most of the time data wrangling is super comfortable in R, due to the syntax flexibility exploited by the big packages (tidyverse/data.table/etc.). At the same time, Julia shares a bigger heritage of Lisp influence with R than with Python, because R is also a Lisp-ish language (see [Advanced R, Metaprogramming]). My main gripe with the R ecosystem is not that most of the performance-sensitive packages are written in C/C++/Fortran, but that they are so deeply interconnected with the R environment that porting them to Julia (which also provides an easy, good interface to C/C++/Fortran and more; see the [Julia Interop] repo) seems impossible for some of them.
I also think that Julia reaches a broader scientific programming public than R: it overlaps with Python sometimes, but it provides the Matlab/Octave public with a better alternative. I don't expect to see all the habits from those communities merge into the Julia ecosystem. On the other hand, I think Julia's bigger reach will help it avoid the "base" vs "tidyverse" vs "something else in-between" split that R is in now.
Out of curiosity, when was the last time you looked at DataFrames.jl? A huge amount has happened in the last year. Plus, if you want more tidy-like syntax, you can go with Query.jl (or DataFramesMeta.jl, though that isn't quite finished updating to the new DataFrames syntax), or if you just want pipes on DataFrame operations, there's Pipe.jl and Chain.jl.
I don't think your comments are harsh, you need what you need and you like what you like. I do mostly data wrangling too, but feel much less constrained with Julia than with tidyr. Sometimes having constraints and one right way to do things is good, but it's not for me.
Also worth noting it's not necessarily on the language developers to do this. Even in R, tidyverse is in packages, not in the base language.
My experience with R was somewhat different. R was my first computational language in 2006 (version 2.3, IIRC), and parsing real life data (biological, in my case) into a format acceptable to R was a non-trivial exercise. I had somebody write me a perl script to parse the raw data into a clean CSV, but that has its own problems. The tools that were the kernel of the tidyverse (created 2014) were just beginning to show up, and even magrittr pipes were many years away. The only tidyverse tool even close to mature at the time was ggplot. For me data munging was the limiting factor, and at some point I discovered many people prefer Python for these initial steps. In 2013 I learnt Python with the explicit aim of data munging, while continuing analyses in R. With Pandas I could cover 80% of my use case for R, and eventually dropped it completely. Again, this predates the creation of the tidyverse, which I noted with some irony.
For what it's worth, Hadley Wickham was asked in a Reddit AMA several years ago which platform he'd choose if he were just starting out. He pointed to Julia as his pick.
> My message to the Julia team: You can get a very large portion of the R userbase to switch over if you focus on a Julia version of the tidyverse (especially dplyr).
If we removed dplyr, R scripts would absolutely scream, so I find the speed argument for 'why switch to X' unconvincing. If users cared so deeply about speed, almost no one would be using the tidyverse; instead we'd all be using base-R or data.table.
Multiple dispatch? Hmm, is this really a problem I'm going to come across in the real world, when 90% of our time is spent ingesting a poorly formatted CSV, doing some quick plots, and perhaps building a model to test something out? If the goal of Julia is to replace R/Python, then their priorities feel way off the mark.
> If the goal of Julia is to replace R/Python then their priorities feel way off the mark
There's a lot more to scientific computing than wrangling tabular data. Julia is competing in that overall space with R/Python/Fortran/Java/C++. If R or Pandas is better at data wrangling, then Julia won't win out there. But so be it. No PL is best at everything.
> There's a lot more to scientific computing than wrangling tabular data.
Also a point that gets ignored way too often. My original post differentiated between time spent writing models and time spent data wrangling.
I would never even attempt to write a symplectic integrator in base R (OK maybe Rcpp would be fine but that's not really "R"). Julia, by design, is better at that. But the R ecosystem is so good that I can use the best practical implementation of a symplectic integrator to solve common modeling problems via RStan.
Yes, Stan is a standalone framework that can be accessed from Julia as well. But the following workflow can be done in R much easier:
1) Read in badly formatted CSV data
2) Wrangle the data into a useable form
3) Do some basic exploratory analysis (including plots)
4) Write several models in brms/raw Stan (via rstan)
5) Simulate from the priors and reset them to more sensible values
6) Run the model over the data to generate the posterior
7) Plot/run posterior predictive checks, counterfactual analysis, outlier analysis (PSIS or WAIC), etc.
Again, the above represents my common use case. I fully appreciate that people use Julia to do awesome stuff like "the exploration of chaos and nonlinear dynamics." [0]. I understand that the modern R ecosystem isn't really built for this.
Totally agree there. It is not a replacement; it is trying to solve a different problem. I don't believe Julia contributors are lying awake at night upset that other languages exist, feeling they need to put a stop to that. My point (put across clumsily, I see) is that IF that were their goal, they would be going about it the wrong way, as most R/Python users have different priorities. But it is a moot point, as that would be an absurd motivation to create a whole new language.
> is this really a problem that I'm going to come across in the real-world when 90% of our time is spent ingesting a poorly-formatted csv, doing some quick plots and perhaps building a model to test something out
Yes, multiple dispatch is not some highfalutin ivory tower concept that only comes up in specialized code. For example, the model in question could define custom plotting recipes[1] so that you can just call plot() and have it produce something useful.
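That mechanism can be sketched without Plots.jl at all: a package extends a generic function that someone else owns, for its own type. `FitResult` below is a made-up type, and we extend `Base.show` here; plotting recipes target `plot()` the same way, via RecipesBase.jl.

```julia
# A hypothetical model-output type some package might define.
struct FitResult
    coef::Vector{Float64}
    r2::Float64
end

# Extend someone else's generic function (Base.show) for our type.
# Now anything that displays a FitResult gets this rendering,
# without the caller knowing anything about the type.
function Base.show(io::IO, ::MIME"text/plain", fit::FitResult)
    println(io, "FitResult with R² = ", fit.r2)
    println(io, "  coefficients: ", fit.coef)
end

fit = FitResult([1.0, 2.0], 0.97)
# The REPL (or `display(fit)`) now uses the method above.
```

A plotting recipe is the same pattern aimed at `plot()`: the model package defines how its type turns into plot data, and users just call `plot(fit)`.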
Also, why shouldn't dplyr perform comparably against data.table? Seems like there would be no need for a fragmented library ecosystem here if the abstractions the tidyverse is built upon were lower-cost. Moreover, what if my data isn't CSV or in a table-like shape at all? "real world" does not mean the same thing across different domains.
> Yes, multiple dispatch is not some highfalutin ivory tower concept that only comes up in specialized code. For example, the model in question could define custom plotting recipes[1] so that you can just call plot() and have it produce something useful.
This is literally the whole conception behind generic functions in R (print, plot, summary etc).
I agree it's great, but Julia is building on a lot of prior art here.
For sure, and one would be remiss not to mention Dylan, CL/CLOS and Clojure here as well. My quibble was with the claim that multiple dispatch rarely shows up in practice, which you've pretty clearly shown is not the case in R!
'highfalutin ivory tower' is a great name for a band :D
Naturally you are correct, and I am wrong to dismiss it as unimportant. What I'm saying is that the majority of R/Python users today are not looking for ultimate speed or sophisticated programming paradigms. Most users are doing the unsexy bread and butter of 'take some tabular data' -> analyse -> report on it, and I want to dismiss the argument of 'users will migrate to Julia because of these nifty features' because it ignores the very reasons the existing users use these tools in the first place. It would be as absurd as proclaiming Excel users will switch to Python because the accounts department suddenly cares about NLP.
The comparison between different languages gets tiring when it focuses on making a black-and-white statement like "Julia is better" or "Python is better" and "x is never going to overtake y". Yes, Python has many more libraries thanks to it being much older than Julia, same for R. But at the same time, Julia can be used for impressive work that R/Python struggle with and which only seem solvable in these languages because of large investments into certain packages by big companies.
So I find the fact that many hard problems can be solved very generically and performantly with small libraries written in base Julia much more interesting than countering that much larger and older Python packages, with millions of developer hours poured into them, are currently more feature-complete. Yes, they are, right now. Why wouldn't they be. But doesn't what is being done in Julia with much fewer resources point to an impressive ability of the language to facilitate such development?
While I generally agree with your argument, it's worth noting that the median Julia programmer is probably more invested in the language/ecosystem than the median R/Python programmer.
Back in the mid-90's Java was the new hotness, and it probably made problems that required 100+ lines of C easier, but it's not still full of above-average programmers, as any language/ecosystem that achieves success will inevitably regress to the mean.
That's true. I am of course biased in talking mostly to people on the Julia Slack etc. who enjoy the language a lot and do interesting things with it.
That's one of the reasons, though, why I never find the "how many people are using it" argument the most convincing when talking about the merits of a language. Because most people I've seen using R, Matlab and Python, at university or work for example, used it really superficially, and therefore wouldn't have any interesting things to say about it. Neither do they add anything interesting to the respective ecosystems. I don't think it's the first interest of a new language to get this type of user, although of course in the long term you want to build tools that are easy to be picked up and used by a wide audience, and number of users is some indicator of that.
I'm a graduate student that's switched almost completely over to Julia. Prior to it I worked in both MATLAB (the IDE is so nice, and writing out matrix computations is just great) and Python (for ML). Julia is absolutely nicer to write in than either of the two. MATLAB is slow and at times feels less like a programming language and more like an incomplete and brittle interface with the JVM. Python is also slow, and it feels awkward to use given that it was not explicitly designed for scientific workflows. With Julia I get proper typing, incredible speed, easy parallelization, and a kick ass REPL.
The only thing I truly miss in using Julia is the plotting capacities of MATLAB. I haven't found an environment that can match it in terms of interactivity. Give me the ability to (easily) save interactive figures for later use and Julia would be perfect.
You should check out Makie. Getting it set up can be a bit frustrating if things don’t go right, and there is a small learning curve for using `@lift`, but it is an absolute joy to use once you ramp up.
I use it for my research by default. You can pan, zoom, etc. The subplot/layout system is frankly a lot better than Matlab (and I enjoyed Matlab for plotting!). The best part is that I can insert sliders and drop downs into my plot easily, which means I don’t need to waste time figuring out the best static, 2D plot for my experiment. I just dump all the data into some custom logging struct and use sliders to index into the correct 2D plot (e.g. a heat map changing over time, I just save all the matrices and use the slider to get the heat map at time t).
Wow, I just tried it out. This is really great. And it solves my interactive plot saving requirement. Easy as doing `Plotly.savehtml(fig, "test_fig.html")` :). Thanks!
I have, though I've mostly stuck with the plain PyPlot.jl package due to the familiar syntax and interactivity. Perhaps things have changed, but I just recall being frustrated at the inability to zoom in/out, and again save interactive figs. Perhaps it was just due to the particular backend I was using. I'll give the VS Code experience another try!
In my modest experience the perfect Julia slogan would be:
"fast as C, easy as python, but NEVER the two together"
All the sentences:
"When you’re writing various algorithms, you don’t necessarily want to think about whether you’re on a GPU, or whether you’re on a distributed computer. You don’t necessarily want to think about how you’ve implemented the specific data structure. What you want to do is talk about what you want to compute."
sound nice.
Except in practice, unless someone else bothered doing that for you, you have to do it yourself.
Underrated comment. Yeah, if you want C-like performance, you have to do some low-level considerations, that is unavoidable at some point. So the "speed of C, convenience of Python" is misleading.
However, for many, many small tasks, today's compilers are smart enough that you can express your idea in a high-level language and the generated code will be maximally efficient. The real killer feature of Julia is that, wherever you can gain maximal performance with high-level syntax, you can just choose to do that. A more correct but less sexy slogan for Julia is that it has the best performance/expressiveness tradeoff you have ever seen.
That's still a massive selling point. In python, getting speed can be weird and counterintuitive. In C, a straightforward algorithm can be blazing fast. For example, finding the length of the longest word in a string, you can just iterate through the string keeping track of a few indices. In cases like that, where the obvious simple C function is incredibly faster than the same python, where does Julia fit in? Would that kind of naively written function be closer to C or Python?
In that case, it's closer to C in speed, and usually more generic and "easy on the eyes". I don't know much about C, but seem to recall that it doesn't play well with unicode, usually treating text as bytes. Here's an equivalent Julia example:
function longest_word(st::Union{String, SubString{String}})
    len = 0
    start = 1
    @inbounds for i in 1:ncodeunits(st)
        if codeunit(st, i) == UInt8(' ')
            len = max(len, i - start)
            start = i + 1
        end
    end
    max(len, ncodeunits(st) + 1 - start)
end
This takes about 8.2 µs for an 8.5 KB piece of text on my laptop, but it only works on ASCII text and only treats ' ' as whitespace, not e.g. '\n'. For a more generic one, you can do:
function longest_word(st::Union{String, SubString{String}})
    len = i = 0
    start = 1
    for char in st
        i += 1
        if isspace(char)
            len = max(len, i - start)
            start = i + 1
        end
    end
    max(len, i + 1 - start)
end
This is 20 µs for the same text, so still only 3 ns per char. The underlying functionality, namely String iteration and the `isspace` function, is also implemented in pure Julia.
In general, if you code like Python (highly dynamic code with no consideration for performance), it will be closer to Python in speed; if you code like C/Fortran (completely static, with overspecified types), it should be closer to C. The variance in performance across naive implementations is pretty high. That means it's easy to get into Julia and start programming no matter your background, but idiomatic Julia (which is not something you'll learn in a day) should be concise and high-level like Python (and frequently more concise) and close to C in speed.
For example, what other dynamic languages do, like verbosely typing everything, doesn't really work in Julia (the compiler already knows pretty much every type even without hints). What works is treating a variable as a polymorphic container instead of a dynamic container: you don't know yet what type the variable has (only its behaviour), but whatever it is, you should avoid changing it if possible (what they call type stability). Which is why proper high-performance Julia code might not look obviously optimized when you read it: it is not something you do to make it fast but something you don't do (change a variable's type, forcing the compiler to create a low-performance dynamic box, plus other things like untyped global variables).
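A minimal illustration of the type-stability point (the function names are ours):

```julia
# Type-unstable: `total` starts as an Int and becomes a Float64
# on the first iteration, so the compiler must track a
# Union{Int, Float64} (a dynamic box) through the loop.
function unstable_sum(xs)
    total = 0
    for x in xs
        total += x
    end
    total
end

# Type-stable: initialize with the element type's zero, so
# `total` keeps one concrete type throughout.
function stable_sum(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x
    end
    total
end

# Both return the same answer; only the inferred types differ.
# `@code_warntype unstable_sum(rand(10))` highlights the Union
# in red, while the stable version infers a concrete Float64.
```

Note that no type annotations were added anywhere; the fix is purely about not changing the variable's type mid-loop.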
Yes - if you have a real problem Julia is the way to go. If you are just banging something out to prove a point or make a delivery then Python is often much easier.
Ofc this is like Excel and notebooks - I start doing things in Excel because I can sort out an answer in like 30 seconds. Doing it in a notebook requires 5 minutes, or maybe a little longer. But... see me there a week later, after the feedback and next questions from the customer... now I am in Excel hell and I wish, wish, wish I had started out in a notebook.
This has been my favourite comment about Julia for the last few years. +1 on that.
The ecosystem is extremely poor outside a very few niches, and most of the Deep Learning stuff isn't even faster than the Python API (+C, of course), so swapping is just useless if you don't have time to write your own GPU kernels for every new operation.
At least for the GPU case, the ecosystem is slowly moving towards writing generic kernels that can be executed on both the CPU (multithreaded) and the GPU, without doing anything special in the kernel itself, via KernelAbstractions.jl. It's still got a little way to go, but already some larger codes are using it to great effect. Also, as a member of the JuliaGPU group, I know that AMD and Intel GPUs should be supported by KernelAbstractions within the next month or two, so a single generic kernel will be able to run unmodified on all major GPUs.
I have been using Julia for only a few months, but I've been surprised by the speed-up that's possible vs Python code using pandas. Depending on the size of your datasets the JIT might slow you down a little, but the speed of Julia outweighs this. Liberally using functions really allows Julia to shine.
One thing I've recently seen that concerns me long-term is the creation of various competing macro syntaxes for reducing the wordiness of Julia. There are many competing implementations of pipes and other syntax sugar. These macros definitely make things easier, but as you use them the code becomes more difficult for others to understand, and since there is, at this time, no one set of macros to use, you'll have to know each of the competing sets to make sense of examples.
I have created one of the recent competitors in this space [1], and I think it's not so bad, as long as the macros are reasonably simple to understand. If you look in the readme of my project, the four syntax examples actually look quite similar so I don't anticipate they'd cause a lot of confusion.
When I got annoyed in my own work that some data wrangling syntax was repetitive I was just really glad that I could easily build my optimal solution and didn't have to just accept that there's one suboptimal (for me) way. In Python and R, if you like what they offer that's good, if not - not so good.
Part of the problem comes from Julia not being geared towards DataFrames like R is, but I gladly trade a bit of convenience in one domain against a lot of expressive freedom with very clean rules that apply everywhere.
For example, I think it's quite good that you can only have "weird" behavior in Julia with macros, but they give you a visual indicator with the @ that you're seeing non-standard syntax. While in R, the non-standard evaluation means that literally anything could happen to the variables you pass into any function. It makes for some convenient syntax in some cases, yes, but it's so confusing as a system for writing software! You never really know if you're looking at a variable or just a name, for example.
Julia is way superior if you are building programs and systems, especially if you are building for sustained use (rather than something to do a job once). Julia is less error prone, more expressive, more maintainable, more performant.
But if you are creating cut-and-shut scripts for data science notebooks, Python wins... the REPL start time alone is a killer for Julia; add in the requirement to actually think about structure and the problem, and it's out of my "3 -> 6 hr" workflow.
Julia is amazing for numerics, but the JIT is painfully slow for anything that gets looped only a few times or not at all. I don't think it is usable as a general purpose language until this gets improved somehow.
But is that a problem apart from in scripting? If you ain't looping you ain't waiting in my experience.... Can you give an example which isn't time to plot?
I think that a slow JIT is problematic in any application except for non-interactive numerical calculations. For example in a GUI software or in a server application it would be very undesirable to have each function run a million times slower the first time it gets called.
For GUIs I don't see this as an issue - JITs run a lot faster than humans. All of JavaScript is on a JIT - and that's the dominant UI for now! For servers, generally you find that users are all calling the same functions, so it's very rare that you hit a blip - much more commonly, users get performance problems from something else in the stack, like the network or the client. And while JavaScript is the dominant front end, the dominant back end for servers is Java, and that's got a JIT as well.
JavaScript's JIT is a tracing JIT, so it can compile code in the background while the interpreter/less optimized compiled code is actually running. In Julia, the compiler runs first, and then the compiled code is run. This will probably eventually change as Julia's compiler improves, but regardless, it's important to note this distinction.
Julia is simply a _much_ nicer language to write than Python; it is more functional and it provides the user with far more ways to be expressive. I think the JIT warmup is less of an issue than its overall memory use, but I can see that the former is more apparent to the average user sitting at their terminal.
I wish a larger fraction of the energy invested into propagating the 'Pythonic' approach were _properly_ redirected into improving relevant aspects of Julia.
If you want to help the adoption of Julia in bioinformatics and medical applications, feel free to support BioJulia[1][2] on their OpenCollective page[3]. During a pandemic, projects like these are of special importance.
Julia’s type system is not particularly user-friendly. For example, it has both a “String” and a “SubString” type which cannot always be interchanged. Their language design seems to be much more concerned with execution speed than programmer productivity — Python has the balance in the opposite direction, but they’ve been gradually improving performance for years, and performance is much easier to improve after the language design is set in stone, especially as more and more people add typing info to their programs.
I’ve generally found they can be interchanged in the way you would expect as a Python user: duck typing. I have more experience using SubArrays, but I think the underlying machinery in the language is basically the same.
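A loose Python analogy for that owner-vs-view split (this is Python, not Julia — `bytes`/`memoryview` stand in for `String`/`SubString` here): most APIs that take a bytes-like object accept either, which is exactly the duck typing the comment describes.

```python
# bytes owns its data; memoryview is a zero-copy view into it.
import binascii

data = b"hello world"
view = memoryview(data)[6:]      # view of the tail, no copy made

assert bytes(view) == b"world"   # materialize the view when you need to
# hexlify accepts any bytes-like object, view or owner alike:
assert binascii.hexlify(view) == binascii.hexlify(b"world")
print(binascii.hexlify(view).decode())  # 776f726c64
```

The occasional friction comes from the APIs that insist on the owning type — the same place Julia's `String`/`SubString` distinction bites.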
Adding performance after the fact is not easy. This is why most numerical Python projects depend on C extensions rather than improvements to the Python runtime. Writing C extensions, or jamming your algorithm into the shape of existing C-accelerated APIs, is often not very time efficient.
IMO, array indices are a matter of taste which inspire feelings of religious intensity.
You can change it if you prefer 0 or -14 or whatever number you like. Non-zero effort is required, but if you otherwise like the language it can be changed, AFAIK.
Julia uses modern compiler technology to achieve close to native performance. This is not just generating LLVM IR.
Julia also has its own optimization system for language-specific optimizations that LLVM struggles to do (language-specific info gets lost in the conversion to LLVM IR).
Google's V8 (the JS engine) also uses modern compiler tech, but I think it is not as capable as LLVM optimization-wise (I don't think it is designed to be).
Python will either adapt or perish. Even if it is not Julia, it will be another language.
V8 and the Julia compiler work in quite different ways - because they solve quite different problems. In particular, Julia’s compiler only works well on code that is "type-stable", whereas V8 has no such limitation.
I know that V8 does some interesting stuff with its types.
Julia's creators considered supporting optimization while designing the language.
Whereas the first JavaScript engine that even generated machine code came well after the language's creation.
JS has weird features, like being able to set a getter function on an array index.
There was a memory corruption bug in V8 where the implementation of `Array.sort` would call a getter on the array that changed the array's size, corrupting memory. This was used in an exploit.
The creators of V8 even built a domain-specific language called Torque to implement the language, lol.
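For contrast, here is how CPython closes off the analogous hazard (a Python illustration, not V8): `list.sort` detaches the list's contents while sorting, and if a key function mutates the list mid-sort it raises rather than corrupting memory.

```python
# CPython guards list.sort against mutation during sorting: the list
# appears empty while the sort runs, and mutation is detected afterwards.
items = [3, 1, 2]

def sneaky_key(x):
    items.append(0)   # mutate the list mid-sort, like the V8 getter did
    return x

try:
    items.sort(key=sneaky_key)
    outcome = "no error"
except ValueError as e:
    outcome = str(e)  # "list modified during sort" on CPython

print(outcome)
```

Memory-safe languages can afford this check-and-raise strategy; V8 has to defend hand-written fast paths against the same pattern, which is part of why Torque exists.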
For some numerical things it’s nice to have easy interfaces to modern algorithms. I solved a differential equation recently and it just worked. And Julia feels like a much more proper programming language than Matlab.
At the lower level (which I haven’t looked at in a while, so this may have changed) I found it a bit confusing and messy. The subtyping and method selection are tricky to get right, and they are fundamental to important parts of the language like its numeric tower. But libraries seem to just work.
Macros were horrific, and the AST is inscrutable and liable to change from one version to the next. Quasiquoting was also tricky. So I wouldn’t recommend trying to do anything weird with them. But maybe they are good now.
Pluto notebooks seem a great concept. I tried them recently and mostly they worked (sometimes they didn’t get dependencies right but it’s still beta). I felt like I was fighting a bit with plots.jl. I don’t know if there are things that weren’t obvious to me that I was missing or if it can just be a bit annoying. I haven’t tried gadfly but I would like to at some point. I’ve heard good things about ggplot2 which it is inspired by.
I felt like documentation was a bit lacking in good straightforward tutorials and examples. As well as documentation in general. But I don’t want to put too much emphasis on that.
Julia is fantastic, with a great community. The one issue I have, though, is the use of Greek symbols: while great for those formally trained, it may have a negative impact on wider adoption for deep learning.
Python has a battle-tested, humongous standard library, and a rich ecosystem. It is easy to learn and it's great at gluing everyday stuff. Data scientists like it, engineers like it, even the cashier at Lidl likes it.
I can see how Julia may challenge Python for academic use, but challenging Python in 2021 for industrial use is no joke.
Fortran holdouts say today that there is still no competition to Fortran for optimizing code in multiprocessing/supercomputing environments, and no volunteers to rewrite tons of proven, optimized-to-the-extreme numerical and physics code in another language. Good old languages die hard...
Going by Tiobe, there is a gulf between the top 4 languages and everything else. To put it into perspective, Julia is only twice as relevant as Prolog and on par with Scratch.
Of course I don't necessarily think Tiobe is a great metric for this but it was quoted in the article.
"I saw the 87 percent increase and think it is wonderful to see Julia growing. I think that Julia has great potential to replace C/C++/Python (and of course Fortran) in scientific and technical computing as it matures."
It's really all about the ecosystem, community, and ease of use. Python took off once people developed NumPy, pandas, Anaconda, etc.
On what time scale? Julia is a really nice language, but it will take decades for it to put a serious dent in Python. If the web side of things matures fast and someone builds a killer framework, it might carve out a niche there too.
It's an easy-to-pick-up, super readable, performant language with a good package manager. This already makes it significantly better than a good number of popular web dev languages.
Julia absolutely is a general purpose language and has been from the beginning. However, there are plenty of languages that are fine for building websites, whereas there are no other languages with the combination of speed and usability that Julia offers in technical computing. It's a lovely language for doing all kinds of work and I personally mostly use it for non-technical computing these days — specifically to implement package management client/server infrastructure, which is mostly web + file system wrangling.
Can you compile Julia programs to native binaries with a single command? Can Julia run on mobile and embedded devices? A real general-purpose programming language should do all of those.
True, I'm obviously biased, but at least I can attest to the intention: Julia has always been intended for general purpose computing, with the additional (and very challenging) requirement that it also be excellent at technical computing, which turns out to be a remarkably hard additional requirement.
1. Base-1 indexing is more natural and less verbose. My suspicion is that base-0 indexing grew out of language implementers wanting to reduce their mental overhead rather than a first-principles approach. AKA machine code leaking out to the higher level languages.
2. You have to have some way of structuring syntax. Whether you have "end", ")" or "}". It is not inconsistent to start with something else, like a function signature with the corresponding keyword. Again Julia opts for natural readability.
Both of these complaints are rather superficial anyways. Julia is a marvelous piece of tech and has an interesting story to tell about type inference, multiple dispatch, performance, data-structures, general Lisp-yness and the merits of tailoring a language for a purpose/domain (and its users) in contrast to trying to adhere to paradigms and programmer culture (cults?).
I can't say that I like or use Julia, but just regarding your comment, and because I've seen similar comments about other languages around:
> 2. Julia uses an "end" keyword everywhere, which is imho too verbose (and the corresponding "begin" is missing so it's inconsistent).
This is an absolutely childish approach to comparing or selecting programming languages.
In my eyes, it says a lot about the maturity of software development as a discipline that a big chunk of our debates are at this level.
We should discuss quality, breadth and depth of standard libraries, quality of implementation of the most common interpreters/compilers, etc.
I don't want to fault you personally, OP, I think this approach is quite widespread, unfortunately, one could say that it's part of our software development culture at this point.
While you do interface with it all the time, it's not something you'll be thinking about all the time - at some point it will be completely invisible, even though it's there (unless you're that bored). What will really shape the experience is whatever offers actual resistance to solving your problem and actively wastes brain power rather than muscle memory: language semantics forcing you to write the same thing over and over when a particular feature (like macros) could handle it trivially; the lack of some particular type safety making you keep losing time debugging the same error; lacking interactive tools forcing you to waste time debugging with prints; a workflow where the JIT lag keeps breaking your pace; a community without a culture of documentation, so you lose time trying to decode the source; etc...
If 0-indexing hadn't been the default for so many programmers for so many years (usually because the languages being used required actually thinking about arrays in terms of memory offsets) then it would feel very odd. Humans reason about counting in terms of the natural numbers, so 1-indexing is fine. Just ask any beginner CS student what they think of 0-indexing if you doubt this.
I would argue that if those really are your major gripes about Julia, you either have very bad priorities, or don't know enough about Julia to contribute to this conversation.
Base-1 makes sense in the sciences, where you have Fortran, MATLAB, R, Mathematica. If anything, before Python's rise in popularity, base-0 was the strange choice. Fortunately you can use arbitrary indices with OffsetArrays.
Not really. It was a choice for C and C++ and it has a very solid low level implementation and efficiency motivation. Unfortunately, it also became a curse for these languages. After many decades of existence, they still haven't introduced proper multidimensional arrays, other than a hodgepodge of competing library implementations.
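The low-level motivation mentioned above can be made concrete: with 0-based, half-open ranges, index arithmetic composes without ±1 fixups. A quick illustration in Python (which inherited C's convention):

```python
# 0-based, half-open range invariants: no off-by-one adjustments needed.
a = list(range(10))
i, j, k = 2, 5, 8

# Adjacent half-open slices concatenate cleanly:
assert a[i:j] + a[j:k] == a[i:k]

# The length of a slice is simply the difference of its endpoints:
assert len(a[i:j]) == j - i

# And element a[n] has exactly n elements before it:
n = 4
assert len(a[:n]) == n

print("0-based half-open invariants hold")
```

With 1-based inclusive ranges, each of these identities picks up a +1 or -1 somewhere, which is the "mental overhead" trade the two camps keep arguing about.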
I don't like those two either, but that doesn't change the fact that the rest of the language is absolutely ideal for what I do ;-) (ecosystem modelling)
I'm interested in the following example (3x speedup) - is it possible to read more details about their project?
(From the hpcwire article) "During his talk, Edelman presented an example in which a group of researchers decided to scrap their legacy climate code in Fortran and write it from scratch in Julia. There was some discussion around performance tradeoffs they might encounter in the move to a high level programming language. The group was willing to accept a 3x slowdown for the flexibility of the language. Instead, said Edelman, the switch produced 3x speedup."
I tested my well-optimised R code and saw only a 3x to 10x performance gain. That's still not substantial enough currently to migrate a whole code base, in particular given that the libraries are also still not mature enough. The research group I'm working with also has no interest in adopting anything new. In fact most of our code is still in FORTRAN, so that is something I would be more interested in migrating to Julia, but I don't think that is happening anytime in the next 10 years.
Seems like it would still be useful to use Julia as the backend for the R package instead of Fortran. I've been showcasing a lot of that lately with good success:
It does sound like a lot but it depends on the actual wall clock time. If your run time goes from 10 days down to 1 day then yes, it matters a lot. If it goes from 1s down to 0.1s it might not matter so much.
Of course, but if you observe 3x to 10x performance gains across the board, you will have some programs that run in more than 1 second where it may be worthwhile.
I recently decided to port some of my Python analysis code over to Julia since I've been doing a lot of ODE stuff lately. Overall I'm liking it, but I've found a couple things frustrating: 1) the docs and online issues/answers are a bit of a mess, partly because many of the accepted answers are from pre-1.0 days; 2) the error messages I've found are long and not very helpful. It takes much longer to track down a bug in Julia than Python code.
Is it correct to state that in 5 to 10 years we’ll see Julia as the default for new Data Science projects? (ML and statistical inference). I’m not a biggie on switching tool sets just because something is becoming “hot”. I like to start using something when it’s boring and battle tested, the youngsters can do the bleeding. But it seems like the likely candidate if something is going to displace the Python and R ecosystem?
I’m a huge Julia proponent (and before that, a huge Python proponent over Matlab), but I would be careful about claiming that Julia will be the coming standard. Python has a crazy amount of inertia and excellent projects still in the pipeline.
I think Julia will eventually win because writing your code in one language which isn’t C or FORTRAN is extremely productive and leads to much more composable libraries. The progress that’s been made on, for example, deep learning frameworks is impressive given the lack of massive investment from FAANG. I hope this leads to it dethroning Python, but it might not. If anything will, I think it has the best chance.
I’d suggest you give it a try anyway because its really not a hard language to learn. The ecosystem itself is quite good. The fact that you can write fast code without C extensions leaves you less dependent on the ecosystem, too.
> Is it correct to state that in 5 to 10 years we’ll see Julia as the default for new Data Science projects?
No, it is not correct to make this claim. Nothing is going to de-throne Python in the next five years and it is EXTREMELY unlikely that anything will de-throne it in 10 years. Any language that replaces Python in these tasks will need to be significantly better, and Julia just isn't that. Incremental improvement in a few areas that reek of premature optimization is not going to be a compelling argument for the masses.
The language that de-thrones Python has not been invented yet, and it will probably need some sort of hardware-coupled advance to have a chance (e.g. if the next big leap in mass-produced hardware were to drop 4K cores into a cheap SoC then a simple scripting language that handled internal data and execution concurrency might take over.) Julia is nice, but if anything you are probably going to see more migration from MATLAB and similar older dead-ends to Python over the next five years than you are to see migration from Python to Julia.
Probably yes - but 10 years is more likely than 5. Python really gained steam when it was 15+ years old. Julia has two years since its 1.0 release.
But it will eventually take over Python, unless something third comes and takes the cake before it. Julia is simply much better: More consistent, better designed, faster, more flexible, more extendable and with better tooling.
But I'd say most people can just wait. If Python is working fine for you, and it's not going anywhere in the next 10 years, why not just wait? At that point Julia will be more mature, with better learning resources and a better ecosystem. You can always just pick it up then.
Can anyone recommend some good resources to learn Julia? I am hoping there is something akin to the Rust Book [1] since watching youtube videos is too slow and the exercism option listed on the website is too cumbersome.
I'm a researcher, doing lots of numerical work both professionally and in hobby projects. While Julia has a lot of technical merits, there are just some superficial, syntax-level design decisions that strongly rub me the wrong way: 1-based indexing (makes interfacing with C code hard), explicit begin/end (verbose & ugly) and column-major indexing (personal preference). I understand that these follow in the footsteps of Fortran (and matlab), but they always feel wrong to me. I grew up on C-based languages, and these things made working with R and Lua difficult whenever I had to interface with any non-numerical code (which somehow even most of my research projects ended up needing). It's a weird hill to die on, but I personally will avoid Julia (and actively discourage students from using it, in case I need to work with their code) for as long as I can due to these design decisions.
Who am I to say what hills are worth dying on? But I don't want you to die at all - in case you're ever compelled to switch, I can't help you with end, but OffsetArrays.jl can help with your indexing woes, and I think there are packages to rotate matrices too (though not certain of that).
I'm expecting somebody to add them to LLVM soon. I'd talked to folks at Apple about that some time back, but they weren't able to tell me at the time what their plans were for adding it. That said, I am fully expecting them to just add it to LLVM themselves. If not, somebody in the community will do it. Once that's done, Julia will just pick it up.
I’m not sure what “them” is. Apple has AMX, a Neural Engine, and a GPU which are abstracted through the Core ML and Accelerate libraries.
AMX is not exposed as an instruction set like NEON. I guess this question applies to all of the numerical analysis tool chains: will they be able to leverage Apple silicon coprocessors/accelerators?
I was talking specifically about the ARM ISA extensions. The accelerators are a different question of course. I understand the folks working on the Linux port are trying to reverse the GPU ISA. Apple could of course also just publish that, but I'd put the odds on that lower than the ISA extensions. That said, Apple has sometimes surprised me on these kinds of things.
The most important aspect of any language are the trade-offs and values held by the designers and community. Julia takes a zero-compromise oath to approachability and performance. Julia is not (yet!) a perfect language, but those values make me want to invest in the ecosystem.
In my tests I frequently switch between cpython, pypy and julia (depending on the libraries/task I want to perform) and I haven't found the JIT overhead to be worse than pypy on average.
Count me as one of the 1-based index haters, but I do love multiple dispatch and the language in general. As a language for explorative tools and analysis it's on par with Python (strict preference between the two according to taste).
To me the biggest flaw currently is the poor "catch" syntax for exception handling. There are countless spots where exceptions are incorrectly caught at random points due to the catch-all semantics hiding/masking/breaking stuff. This is one area where I really find the syntax has been chosen poorly and it's causing real damage.
Agreed on error handling actually. It doesn't quite have the feel that it should in a modern language. I think error handling and lifetimes/mutability are two of the top things we're looking at for a fundamental remodel in 2.0.
PyPy is definitely used behind the scenes in many places. IMHO you don't often hear about it because if you're ready to take the performance hit that comes with python you're probably not trying to squeeze the best out of it all the time.
There was also the lack of compatibility with existing packages - a big issue in the past, but nowadays it's pretty rare.
You can often just run your program through both and measure whether using PyPy makes sense for your task. Frequently the free speedup is very welcome, especially for long or repeating jobs.
Even when used opportunistically like this PyPy is still tremendously useful.
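"Run it through both and measure" needs nothing beyond the standard library. A minimal, interpreter-agnostic harness (the workload function here is made up for illustration) - run the same script under CPython and PyPy and compare the reported times:

```python
# Time a representative hot loop; run this file under each interpreter.
import timeit

def workload():
    # Stand-in for your real hot loop: sum of squares below 10_000.
    return sum(i * i for i in range(10_000))

t = timeit.timeit(workload, number=200)
print(f"200 runs: {t:.3f}s, result={workload()}")
```

If the PyPy number is meaningfully lower on your real workload, the switch pays for itself; if not, you have lost nothing but a minute of benchmarking.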
I don't know. I also don't use PyPy much, since Python+Numba is actually fast enough most of the time, and I always see PyPy as a fallback to check whether I can squeeze a little more performance before running a task.
No, from my own (relatively limited) experience with Numba: if you enable nopython mode, it's about the same speed as Julia, which is the same speed as C.
The thing is that Numba is only applicable for simple numeric code. Last I checked it didn't even support custom classes. In fact, last I checked it didn't even support Numpy - to support "Numpy" it had to internally re-implement much of Numpy, which really says something bad about its use cases. In contrast, the Julia JIT speeds up the entire language from string processing to set operations.
Edit: To not be misleading: Julia and C (and Numba) have the same speed only in the simple cases you can apply Numba to. In more diverse workloads, C pulls ahead of Julia for various small reasons.
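The "simple numeric code" Numba handles well is essentially tight scalar loops. A minimal sketch (hedged: the try/except fallback makes it runnable even where Numba isn't installed; with Numba present, `njit` enables nopython mode):

```python
# Sketch of a Numba-friendly scalar loop. If Numba is absent, fall back
# to plain Python so the example still runs.
try:
    from numba import njit          # nopython-mode JIT decorator
except ImportError:
    njit = lambda f: f              # no-op fallback

@njit
def geom_sum(r, n):
    # Sum of the first n terms of a geometric series, as a tight loop.
    total, term = 0.0, 1.0
    for _ in range(n):
        total += term
        term *= r
    return total

print(geom_sum(0.5, 10))  # 1.998046875
```

Loops over scalars and arrays like this are exactly Numba's sweet spot; step outside it (custom classes, string handling, arbitrary objects) and you fall back to the interpreter, which is the asymmetry with Julia the comment above describes.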
I wonder whether I can compile Julia code into a Windows DLL - for instance, with AVX2 and DirectCompute included?
Currently writing C++/17 and HLSL. Would like to evaluate something higher-level. However, I think it’s unreliable in the long run to redistribute and support complicated packages like Python runtime or LLVM. Users mess with environment variables, update Windows, run antimalware, etc. Process startup time also matters, I don’t want to wait 40 seconds for the first output.
> "Because Julia’s compiler is different from the interpreters used for languages like Python or R, you may find that Julia’s performance is unintuitive at first. If you find that something is slow, we highly recommend reading through the Performance Tips section before trying anything else. Once you understand how Julia works, it’s easy to write code that’s nearly as fast as C."
If Julia needs a "Performance Tips" section to produce fast code, I might as well use Python.
The "speed" of Julia comes from LLVM, but there is nothing stopping Python from using LLVM as well where it _makes sense_ (which is the case with XLA in TensorFlow, for example).
I see no added value in learning Julia over existing tools; there is nothing revolutionary and nothing that could alleviate future risks.
I don't want to disparage Julia, it's actually a very nice language, and I was very excited to learn it a couple of years ago.
But, honestly, I think their adoption at this point is less "linux-like" driven and much more "apple-like". In that, the language is 'ok', but the company is going to INCREDIBLE lengths with respect to shrewd marketing and buzz-creation at this point.
Which is admirable but also kinda worrying at the same time.
This "Julia marketing conspiracy theory" that many people on HN seem to believe is so bizarre. What big tech company do you think is behind this incredible, shrewd and presumably well-funded marketing campaign? Julia is the only new major programming language of the last decade that doesn't have a major tech giant backing it. Adoption and development are pretty much entirely grass roots. If you see a lot of enthusiastic posts about Julia on HN, that's because there are a lot of actual people out there using Julia who love it and write posts about it.
Hi Lyndon. Yes, but that's exactly what I meant! I think my phrasing was off. For a company of that size, I've seen very good activity promoting the julia brand, both officially and through word-of-mouth networks. (your own Cambridge meetups notwithstanding). Therefore, I think much of the hype is at least partly that, rather than just the technical merits of the language (which I agree it has plenty). I don't remember this kind of 'buzz' before v1. Back then it was just people who saw promise in its features. Now people seem to be promoting it quite actively.
I've seen a shift in the winds, that's all I'm saying. I didn't mean to come off so negative. (certainly not as negative as Chris took it!)
Thanks for the explanation. I feel that Julia has always been well received on HN ever since we publicly announced it in 2012. I believe that post v1, there are just more users out there and more blogs are being written, more companies are using it, more universities are teaching it, and hence more stories are making their way to HN.
Nowadays, I find new Julia stories and posts when they show up on HN (as opposed to a few years ago when all you had to do was follow juliabloggers).
Hey, the Julia open source organization did have an undergrad in his senior year working part time on community management, though. Can't leave that out. We don't know if JetBrains or Mozilla had something like that.
@StefanKarpinski I said none of those things. I didn't mean to hit a nerve. I actually agree with you. I was one of those grassroots people who enthusiastically tried to get friends to try it. Perhaps 'shrewd marketing' didn't come off as positive as it sounded in my mind.
PS. One forgets people like Stefan and Jeff are likely to be on HN. Apologies. I'd have been a bit more careful in my choice of words otherwise.
The buzz you see is almost all from people who switched from other languages and found that Julia was a gigantic breath of fresh air. I can say that for me, it completely changed my attitude towards programming in general. Before, programming was something I did sometimes as part of my physics research. Now, it’s also my hobby that I probably spend too much time on.
It’s hard not to get a little evangelical when you go through a change like this.
Yes, they went to INCREDIBLE lengths by, umm, spending a number of years creating an excellent language which drew enthusiastic folks who created INCREDIBLE packages on top of that base.
Seriously, when I learned Python (about 20 years ago) I thought it was amazing, and it was, because it let me do things I wouldn't have otherwise done (by reducing the cognitive load on the programming side so I could think more about my problem than the code).
Julia's giving me that kick again - more expressive than Python, doesn't just glue things together but integrates them, and can make code as fast as any language.
It’s extremely silly, but I don’t really like the name Julia for a programming language. It’s just a bit uncomfortable to have a programming language with a particular, kind of formal-sounding human name like Julia (or like Michael, Lauren, or Jonathan).
It just feels weird to me. I know a number of people (family, friends, colleagues) named Julia.
I honestly think it could have an effect on adoption. People have to say the name a lot in making a choice to adopt a language for a project. Names like C, C++, Java, Python are fairly neutral. “Julia” is just an awkward name in this context, in my opinion.
What about "Ada", "Miranda" or "Haskell"? First names, too, albeit ones much less common these days. ("Linda" the language isn't even in that category, popularity-wise, although the source of the name seems to be a weirder story)
I had no idea that “Ada” and “Haskell” were intended to be people names. Those are not super common names. And I had totally forgotten about Miranda, which I agree has the same issue.
I see, that's what I thought. No problem with naming languages after people, but it's easier if it's an homage to a certain person. Common first and last names alone often point to confusing people, and there might be a certain dissonance between the mental images ("I pulled some Julia's pigtails in kindergarten, now I have her name on a CV?").
The same problem would probably arise if the last names were more common, too. "Pascal" and "Turing" are probably rare enough (though "Pascal" was a bit in fashion as a boy's name in Germany when I was young).
I agree that it's easier when it's an homage to a certain person, and a last name, like Pascal. I actually knew that about Pascal but never gave it a second thought.
According to Wikipedia at least, "Julia" is not named after anyone in particular.
Ada was named after Ada Lovelace, for context. And it’s definitely not a common name anymore, at least in the US. It dropped off significantly through the 20th century with a spike several years ago.
Among colleagues I've discussed this with, some thought it came from a person's name, while others thought it came from Julia sets (which in turn are named after the mathematician G. Julia).
[1] https://www.youtube.com/playlist?list=PLdkTDauaUnQpzuOCZyUUZ...
[2] https://drive.google.com/drive/folders/1WWVWV4vDBIOkjZc6uFY3...