I teach a graduate course in optimization methods for machine learning and engineering [1,2]. Julia is just perfect for teaching numerical algorithms.
First, it removes the typical numpy syntax boilerplate. Due to its conciseness, Julia has mostly replaced showing pseudo-code on my slides. It can be just as concise and readable; and on top of that, the students immediately get the "real thing" they can plug into Jupyter notebooks for the exercises.
Second, you get C-like speed. And that counts for numerical algorithms.
Third, Julia's type system and method dispatch are very powerful for scientific programming. They allow for composition of ideas in ways I couldn't imagine before seeing it in action. For example, in the optimization course, we develop a minimalistic implementation of Automatic Differentiation on a single slide. And that can be applied to virtually all Julia functions and combined with code from preexisting Julia libraries.
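The single-slide AD idea translates even outside Julia. Here is a minimal forward-mode sketch in Python using dual numbers (my own illustration, not the course's actual slide), just to show how little machinery the idea needs:

```python
class Dual:
    """A number carrying a value and its derivative (forward-mode AD)."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f at a dual number seeded with derivative 1."""
    return f(Dual(x, 1.0)).der

# d/dx (x^2 + 3x) at x = 2 is 2*2 + 3 = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # -> 7.0
```

The Julia version is shorter still because dispatch and generic arithmetic come for free, which is the point the parent comment is making.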
I'm using Julia (because of the hype) to prototype some numerical optimization stuff. There are a million functions for reshaping multidimensional arrays. The syntax is uncannily like Matlab: retrieving the last element of an array with `[end]`, indexing into a collection with an array of booleans, element-wise versions of operators prefixed with a dot, etc.
However, I keep running into niggling corner cases that kind of make Julia's promise of a powerful, extensible, yet intuitive type system less convincing.
ME: I want to write a custom getproperty() for Tuple!
JULIA: No.
ME: I want to broadcast over the fields of a NamedTuple!
JULIA: Not allowed.
ME: I want to get a view, not copy, with `@view M[m:n, r:s]`, but also get the ability to specify a default for out-of-range indices, like `get()` allows.
As sibling posts have pointed out, you can do all of those things:
1. You can trivially write a `getproperty` method for a tuple. It is considered to be type piracy and thus runs the risk of colliding with someone else's definition, but the language absolutely lets you do it.
2. You can broadcast over the fields of a `NamedTuple` by defining appropriate methods. Again, it's type piracy, so take that into consideration, but the language lets you do this easily as well.
This doesn't really seem like a legitimate complaint. You want some particular pet behaviour, and claim it is impossible to achieve. When someone points out that it is in fact possible, you are unhappy that someone else did not anticipate and implement it pre-emptively...
Do you also expect your 'custom getproperty' (whatever it might do) to have been predicted and pre-implemented by someone else? And do you also expect arrays to 'just know' what value or behaviour you are looking for whenever you index out of bounds?
And as the phrase "reserved" in the error message indicates, it will likely be given a meaning once all the ramifications of doing so are worked out and the best choice of meaning is decided upon. If you're impatient and don't want to wait for that, define it to do what you want. Your code won't even break when it is given an official behavior since your method will overwrite the built-in one.
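For comparison, this is one place where Python is even more restrictive: CPython flatly refuses to let you attach anything to a built-in type, and subclassing is the sanctioned escape hatch. A quick sketch (the `first` property is just a made-up example):

```python
# CPython rejects attribute assignment on built-in types outright:
try:
    tuple.first = property(lambda self: self[0])
except TypeError as err:
    print("rejected:", err)

# Subclassing is the sanctioned way to get custom behavior:
class MyTuple(tuple):
    @property
    def first(self):
        return self[0]

print(MyTuple((1, 2, 3)).first)  # -> 1
```

So "you can do it, but it's type piracy" is actually a more permissive position than most mainstream languages take.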
But this is exactly the kind of thing I called a bothersome corner case. Needing to redefine a global function in order to use fairly intuitive behavior is not great developer experience.
When things like this are left undefined, it's not to intentionally annoy you, as you seem to be taking it. It's generally because there are two or more reasonable possible behaviors and which one is correct hasn't yet been determined. In this particular case, there are subtleties because named tuples can be seen as ordered collections of values and as named associative structures. Deciding that kind of thing takes a lot of time and effort. If you feel that there is a preferred behavior that broadcasting over named tuples ought to have, it would be helpful to post that on GitHub.
There may be other packages or methods for doing the other things you want. I'd think that broadcasting over a NamedTuple would iterate over the key => value pairs, but I haven't tried it.
Looks like vectorizing over NamedTuples is explicitly disallowed. Probably you can still vectorize things over the keys and values separately, along with some helper functions, but it is a bit annoying. Looks like the reason was open questions about whether iteration should be over values or pairs.
Indeed, that is precisely the decision that must be made. Neither one is obviously correct. And once a decision is made and goes into a release, it cannot be unmade — we all have to live with it forever.
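For what it's worth, Python ran into the same values-vs-pairs question with dicts and resolved it by making the choice explicit at the call site:

```python
# Plain iteration over a dict yields keys; .values() and .items()
# spell out the alternatives explicitly at the call site.
d = {"a": 1, "b": 2}
print(list(d))           # -> ['a', 'b']
print(list(d.values()))  # -> [1, 2]
print(list(d.items()))   # -> [('a', 1), ('b', 2)]
```

Neither default is obviously right there either; the design just forces you to say which one you mean.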
> Due to its conciseness, Julia has mostly replaced showing pseudo-code on my slides.
This benefit is really underappreciated IMO — for a lot of "science" applications, the core part of the program should be readable by people who don't program in the language. In research papers, by people who want to understand the fine details of your algorithm, for example.
Julia gets closer to "executable pseudocode" than I would have thought possible.
Fourth, the ability to effortlessly drop down several layers of abstraction: Pointer types, all packages including Base are written in Julia and can easily be extended or patched on the fly, homoiconicity, seamless integration with BLAS and LAPACK.
Julia is a nice language, it's just tough to compete with Python.
- The beginner experience in Julia is still much worse than it is in Python. Stuff that should work intuitively sometimes doesn't, and when you get a cryptic error message, it's difficult to find relevant help online. And when you do find help, some of it is out of date because the language has changed over the past few years.
- You can squeeze a lot of performance out of Python and the ecosystem of libraries is hard to beat.
- Julia has to be way better than Python to give people an incentive to switch. Being just marginally better in some aspects of the language isn't enough. And it's very difficult to be much better than Python, especially in usability and ecosystem.
> Julia has to be way better than Python to give people an incentive to switch.
A language doesn't necessarily have to give all the old programmers an incentive to switch, if it can position itself as a good language for new programmers to learn.
For example: at our institute (computational biology), we had a PhD student who was an early Julia adopter and wrote his model in that. Several students have since joined the project he started, so obviously they're now writing Julia too. That project's experiences with the language were so good, it soon became obvious that for our use case, Julia was superior to any other language we'd used so far. So pretty much the whole research group has now shifted to Julia, and that's what we teach new students. Slowly, other groups in our institute became interested, and more and more people are adopting it, which in turn means that their new students will also end up learning it in future.
If you work with a lot of data, Julia is already a 10-100x improvement over Python.
Being able to iterate and mangle huge columns with real lambdas and without having to marshal arguments to/from C++ is a huge advantage.
Where I used to spend hours in aggregate searching through docs for pandas/numpy, for stupid shit like "how do I shift but also skip NaNs", now I just write a for-loop in a couple minutes and get on with my work.
There's a whole subclass of tasks in R/pandas to work around the interpreter that just aren't needed in Julia.
For me at least it's well worth the syntactical warts and slow interpreter.
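"Shift but also skip NaNs" is underspecified, which is part of why the docs search takes hours; under one plausible reading (each non-NaN entry is replaced by the previous non-NaN value), the for-loop version really is a couple of minutes of work. A plain-Python sketch, no pandas needed:

```python
import math

def shift_skipping_nans(xs):
    """Replace each non-NaN entry with the previous non-NaN value;
    NaN entries stay NaN. (One plausible reading of the task --
    the original comment doesn't pin down the exact semantics.)"""
    out, prev = [], math.nan
    for x in xs:
        if math.isnan(x):
            out.append(math.nan)
        else:
            out.append(prev)
            prev = x
    return out

print(shift_skipping_nans([1.0, math.nan, 2.0, 3.0]))
# -> [nan, nan, 1.0, 2.0]
```

The Julia complaint is about interpreter speed, not expressiveness: this loop is writable in Python too, it's just slow on big columns unless you vectorize it.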
As an experienced python/data science user, this (creating fast complex column-wise transforms) is rarely a problem for me.
The truly huge advantage for Julia is how it handles parallelism. The GIL makes it an absolute pain to do parallelism in Python. It always ends up in threading hacks with numba or joblib, or in multiprocessing, which has its own unfixable flaws.
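A minimal illustration of the usual Python workaround: because threads serialize on the GIL for CPU-bound work, you end up reaching for `multiprocessing` (the toy task and pool size here are arbitrary):

```python
from multiprocessing import Pool

def work(n):
    # CPU-bound toy task: threads would serialize on the GIL here,
    # so Python parallelism typically reaches for processes instead,
    # paying for pickling and process startup along the way.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(work, [10, 100, 1000]))
        # -> [285, 328350, 332833500]
```

In Julia, `Threads.@threads` over a loop gives shared-memory threading with none of the pickling overhead, which is the contrast being drawn.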
Examples: Basic, C (both Basic and C, to a degree), Visual Basic, PHP, Javascript, Python. I'm probably missing some. These displaced older languages just by being adopted by newbies.
This is an insightful and level-headed comment that applies equally to R.
Although Julia is a growing alternative to Fortran/C/etc for long-running computations, it remains awkward and unpleasant for interactive analysis. Users familiar with Python/R/etc must weigh the benefits of Julia against its slow library startup, its cryptic error messages, and its thin documentation.
Also, the lack of a community repository for well-vetted Julia libraries can limit uptake by professional researchers who must be able to trust their tools. A real strength of R (in comparison not just to Julia but also to python) is that such a repository exists, and that it has automated testing across a range of computer architectures and versions of R, including not just unit testing within individual libraries, but also testing of related libraries.
> Julia is a growing alternative to Fortran/C/etc for long-running computations, it remains awkward and unpleasant for interactive analysis
Did you mean to write short-running scripts? If anything, the Julia dev workflow is biased towards interactive analysis in a REPL a la R or IPython/Jupyter. I don't mean to imply that there's no startup overhead, but how often are you restarting the REPL when doing EDA? Unless it's more than once every few minutes (which is a very odd workflow), then startup overhead is effectively amortized.
> A real strength of R (in comparison not just to Julia but also to python) is that such a repository exists
CRAN is certainly a cut above many other package repos here, but I'm not sure "trust their tools" can apply to all packages on there. Anecdotally, I've had a lot of issues with compiled dependencies and missing/out of date external assets on less well-trodden packages. There's a reason MRAN, Conda and JuliaBinaryWrappers exist after all.
For whatever reason, Julia package maintainers also seem more receptive to making their work compatible with other libraries. This goes beyond just multiple dispatch -- imagine if the tidyverse/non-tidyverse divide weren't such a hard split.
There are warts with the beginner experience with Python, principally the awful situation with packaging.
If you care about performance in code that mixes together several packages in nontrivial ways, Julia is way better than Python.
There's a far broader range of libraries in Python than in Julia, but none of them are going to prevent adoption of Julia when its performance advantages are crucial, because of the excellent facilities for using Python from Julia.
I'm not sure the package problem is really a problem for beginners. Just within the last year firsthand I've seen people in undergraduate classes, in graduate classes, and at work try Python the first time, and the default install of Anaconda worked for them in every case. The classes were taught by different professors, and they all suggested Anaconda independently and were not Python programmers.
It is overkill/brute force to install all the Anaconda default packages when a beginner is not going to use over maybe 5-10 libraries, but it's a solution that has worked flawlessly for beginners from my experience watching non-software engineers and "non-technical" people using Linux, Windows, and MacOS try Python for the first time in math and data science classes.
You can, but Pluto is (I would argue) a better notebook than Jupyter. Regardless, the latency experienced with Julia applies equally whether you use Pluto, Jupyter, or any other front-end.
You absolutely can use regular jupyter notebooks for julia! Pluto has some advantages, like being stored as a normal julia file. The julia startup time issues affect both.
Oh, man, this is indeed a major feature. My main point of friction with jupyter notebooks is the stupid json ipynb format. Why can't it be just a regular language file with comments?
> They contain code, rendered Markdown, images, plots, video players, widgets, etc.
The code could be verbatim Python code (or whatever language the notebook uses), and the rest could be embedded inside comments. I don't see any problem with that (besides the very concept of "rendered Markdown" being totally out of order). The fact that they are saving it as JSON by default seems more like laziness by the developers than a well thought-out solution; it could be just a straightforward serializer.
>and the rest could be embedded inside comments. I don't see any problem with that
Do you mean embedding images and plots inside comments? If yes, please elaborate on how you see that happening in the real world.
>The fact that they are saving it as json by default seems more to be laziness by the developers than a well thought-out solution, that could be just a straightforward serializer.
So, how would that well thought-out solution in the form of a "straightforward serializer" work? I have a flat file, and I want to display images, plots that you can zoom into out of, figures, etc. as comments. How would that happen?
>At the very least, you could put the whole json stuff inside a comment. It's already plain text, isn't it?
So instead of having the whole file as JSON, which is lazy and not well thought-out, we'll put all content in JSON, then put that JSON inside a comment in a plain text file. Do I read you correctly?
I feel we're making progress faster than these lazy Jupyter org bandits.
> we'll put all content in JSON, then put that JSON inside a comment in a plain text file
Only the "output" content. The code inside the cells is verbatim, and the markdown cells are regular text comments.
See, I'm not arguing with you just for the sake of it. I have a legitimate problem with ipynb: very often I want to run the code of a notebook from the command line, or import it from another Python program. This is quite cumbersome with ipynb, but it would be trivial if it were a simple program with comments.
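In fairness, tools like `jupyter nbconvert --to script` and jupytext exist for exactly this, but the underlying extraction is simple enough to sketch by hand. Assuming a trimmed-down nbformat-style JSON (a real .ipynb carries more metadata per cell):

```python
import json

# A minimal nbformat-style notebook, hand-built for illustration
# and reduced to the fields this sketch needs.
nb_json = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis notes\n"]},
        {"cell_type": "code", "source": ["x = 21\n", "print(2 * x)\n"]},
    ],
    "nbformat": 4,
})

def extract_code(raw):
    """Pull the code cells out of an .ipynb so they can be run
    or imported as a plain script."""
    nb = json.loads(raw)
    return "\n".join(
        "".join(cell["source"])
        for cell in nb["cells"]
        if cell["cell_type"] == "code"
    )

print(extract_code(nb_json))
```

The friction isn't that extraction is hard; it's that it's an extra step every single time, compared to a file that is already importable.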
I believe people reading this are not detecting the sarcasm. I'm demonstrating that the Jupyter folks are not lazy engineers, and the "obvious" solutions people come up with are not that well thought-out when you start actually thinking about them.
You can also use VS Code notebooks and Julia support in VS Code keeps getting better. As a newcomer to Julia I am super impressed with the experience. No getting around loading the Plots package but producing a high quality plot and getting the data there is a much more enjoyable experience than pandas + numpy + Matplotlib + whatever tensor framework you’ve sworn to.
Do you still have latency issues in Julia 1.6? The latency improvements in the last three versions of Julia have been so significant that I do not really notice it anymore. Supposedly there are additional speedups planned for 1.7.
I also recently tried the beta version of Julia 1.6, and the speed improvements for installing/loading packages are quite impressive. Essentially, packages get precompiled after installation using multiple threads.
Besides this, if you only infrequently install/update packages, you can use PackageCompiler.jl. I use it for PyPlot.jl (based on matplotlib), DataFrames.jl, ... and plotting some data is quasi-instantaneous, as it is in Python (even the very first time in a session).
Julia focuses on scientific and numerical computing, and it is overtaking the Python/numpy combo in that niche. In addition to being considerably faster than Python, it also has quite a few innovative libraries in the area. This can also extend into machine learning, where Python has been the go-to language despite its limitations.
For other areas, like web programming, there is no sign of Julia replacing Python in the foreseeable future.
I second this. Python is actually starting to get significant traction in the scientific community. Depending on the field, R, Fortran and Matlab (and even C++) still have a huge lead.
It's nice that Julia is getting noticed, but it's a distant blip in the radar.
The sci community is really hard to move from existing battle-tested and performant libraries.
I don’t have much insight on the scientific computing landscape in general, but here’s one notable data point: I worked on the CMS experiment of LHC (Large Hadron Collider) for a while, which is one of the highest profile experiments in experimental physics. The majority of CMS code is C++, which you can check for yourself at https://github.com/cms-sw/cmssw (yes, much/most? of the code is open source). What I worked on specifically was prototyped in Python, then ported to C++ and plugged into the massive data processing pipeline where performance is critical due to the sheer amount of data. So I probably wouldn’t put C++ in parentheses.
This need to rewrite, of course, is what Julia is trying to avoid. My workflow is exactly the same, and I’d love to be able to write code in a high-level language like Python and then use that directly instead of having to rewrite.
However, in my case the reason for rewriting isn’t just performance, but also to be able to build compiled binaries. Julia aims to be as high-level as Python but faster - is there a language that’s as high-level as Python but AOT-compiled?
Cython - in fact I think in 2021 if you want to write a pure C or pure C++ program, Cython is the best way to go, and just disable use of CPython.
The “need to rewrite” is actually a sort of advantage with Cython. You only target small pieces of your program to be compiled to C or C++ for optimization, and the rest where runtime is already fast enough or otherwise doesn’t matter, you seamlessly write in plain Python.
Using extension modules is just a time-tested, highly organized, modular, robust design pattern.
Julia and others do themselves a disservice by trying to make "the whole language automatically optimized", which counter-intuitively is worse than making the language optimized overall for flexibility instead of speed, yet with an easy system to patch in optimization modules anywhere they are needed.
I have been using pythran for the last year and the nice thing is that you hardly have to rewrite anything but get speeds which are often as fast (or sometimes faster) than c modules.
The problem with cython is that to really get the performance benefits your code looks almost like C.
I agree with you on optimizing the bits that matter; often the performance-critical parts are very small fractions of the overall code base.
> Using extension modules is just a time-tested, highly organized, modular, robust design pattern.
I really don't get this. I'm fully on the side that limitations may increase design quality. E.g., I accept the argument that Haskell's immutability often leads to good design, and I believe the same is true for Rust's ownership rules (they often force a design where components have a well-defined responsibility: this component only manages resource X starting from { until }.)
But having a performance boundary between components, why would that help?
E.g., this algorithm will be fast with floats but slow with complex numbers. Or: you can provide X and Y as callback functions to our component and they will be blessed and fast, but providing your custom function Z will be slow.
So you should implement support for callback Z in a different layer but not for callbacks X and Y, and you should rewrite your algorithm in a lower-level layer just to support complex numbers.
Will this really lead to a better design?
> “But having a performance boundary between components, why would that help?”
It helps precisely so you don’t pay premature abstraction costs to over-generalize the performance patterns.
One of my biggest complaints with Julia is that zealots for the language insist these permeating abstractions are costless, but they totally aren’t. Sometimes I’m way better off if not everything up the entire language stack is differentiable and carries baggage with it needed for that underlying architecture. But Julia hasn’t given me the choice of this little piece that does benefit from it vs that little piece that, by virtue of being built on top of the same differentiability, is just bloat or premature optimization.
> “you should rewrite your algorithm in a lower level layer just to support complex numbers.”
Yes, precisely. This maximally avoids premature abstraction and premature extensibility. And if, like in Cython, the process of “rewriting” the algorithm is essentially instantaneous, easy, pleasant to work with, then the cost is even lower.
2. Allow each to pursue optimization independently, with clear boundaries and API constraints if you want to hook in
3. When possible, automate large classes of transpilation from outside the separate restricted computation domains to inside them (eg JITs like numba), but never seek a pan-everything JIT that destroys the clear boundaries
4. For everything else (eg cases where you deliberately don’t want a JIT auto-optimizing because you need to restrict the scope or you need finer control), use Cython and write your Python modules seamlessly with some optimization-targeting patches in C/C++ and the rest in just normal, easy to use Python.
> One of my biggest complaints with Julia is that zealots for the language insist these permeating abstractions are costless, but they totally aren’t.
This sounds like it might be interesting, but your later comments about overhead and abstraction costs sound like you maybe don't understand what Julia's JIT is actually doing and how it leverages multiple dispatch and unboxing. Could you be a bit more concrete?
No I think that’s what I’m saying. When raising the issue that using multiple dispatch this way is premature abstraction that has intrinsic costs, all I get is the religious pamphlet about multiple dispatch.
In practice the multiple dispatch overhead is elided by the compiler. If it can’t be you’re doing something truly dynamic, which is generally unavoidably slower. It’s still a better place to be than everything being a generic Object type.
The nice thing about Cython is that you can have both - all the multiple dispatch you want with fused types, or escape that paradigm to do other things if you desire. It gives a lot of surgical control.
I don’t think that is true. As far as I know, Cython lets you do function overloading and single dispatch via class inheritance. I think you also miss out on the type inference that lets you do things like pipe dual numbers through functions without any dispatch-related overhead.
Does compiling with cython decrease the FFI overhead of the calls into native code? My problems with numpy have always been that I have to make a lot of calls on small bits of data, and the FFI overhead eats all my performance gains. If I put more logic on the native side and made fewer, bigger calls it would be faster, but that often doesn't make sense, or it's a slippery slope where pushing logic into native code pulls a data structure over, then another related bit of logic, until I have just a tiny bit of Python left.
Probably. Cython compiles a C-style superset of Python into C. Then a C compiler compiles that to a Python-importable DLL/.so. So, the overhead of calling a C function is no more than declaring its types (programmer overhead), and then, in the generated C, the native C-linkage function can be called like any other. Now, one C function calling another from a different translation unit (i.e., object file or shared lib) can itself be "high" overhead (though nothing like Python FFI), but you may be able to eliminate that too with modern compilers' link-time optimization and some build-environment care.
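The same per-call-overhead effect shows up inside numpy itself: many small calls from a Python loop versus one batched call over the same data. A rough sketch (assuming numpy is available; exact timings will vary by machine):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
mats = rng.standard_normal((10000, 3, 3))

# Many small calls: a Python-level loop, one numpy round-trip per matrix.
t0 = time.perf_counter()
traces_loop = np.array([np.trace(m) for m in mats])
t_loop = time.perf_counter() - t0

# One batched call: a single vectorized einsum over all 10000 matrices.
t0 = time.perf_counter()
traces_batch = np.einsum("nii->n", mats)
t_batch = time.perf_counter() - t0

assert np.allclose(traces_loop, traces_batch)
print(f"per-matrix loop: {t_loop:.4f}s, batched: {t_batch:.4f}s")
```

Batching amortizes the per-call cost, but as the parent comment says, not every algorithm batches naturally -- which is exactly when the "pull more logic into native code" slope starts.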
Just for reference, my experience is mostly computational genomics. R is king of analysis, and most of the actual "meat" is implemented in C++. But I work with other teams as well, so the experience is a bit more varied if you look across different areas.
It's all about which "bubble" you're in. Many people posting here work for startups using micro services (for which Go is a decent fit) and for companies close to the whole Docker/Kubernetes ecosystem, which is based on Go. So naturally they assume Go is huge.
My anecdata kind of tells me that Go is reasonably big, but it's not yet near .NET and Java, worldwide. But it could get there in a few years, I've seen/heard about some enterprises adopting it.
True, but I'm not talking about simple users. I'm talking about companies extending Kubernetes or building adjacent software. Even if their service doesn't necessarily integrate with Kubernetes, there is frequently a temptation to "follow your heroes".
Look at the whole Cloud Native Foundation thing, I think most of their projects are developed using Go.
So if you're using that stack, it's easy to assume that all new development everywhere is in Go.
It will probably balance out once the newness wears off Go (I think this is already happening).
It's not overtaking at all. It's seen growth in some areas.
The issue with regards to web programming/other programming is important, because sometimes it's useful to make a website/build another tool as a scientist. Python can do both easily.
They're not exactly mature frameworks yet though, which is more the point I'm making. Of course, you can do most things in Julia, but does it provide a good experience for it yet?
There are so many tools coming up around this in Julia that it is arguably a problem.
There was a whole session last JuliaCon that was just on web-dashboard tools like Dash.jl and Stipple.jl and several others.
And there was another half-session's worth of other talks about web-related things.
Seriously?!? Julia has no hope of overtaking Python in numerical computing by 2032, expecting movement by 2022 is just delusional. Here is a better prediction: by 2022 people using Python for numerical computing who started doing so in the previous year will exceed the number of people who have ever downloaded Julia since it was first released.
No, not seriously. But also seriously. The article is based on % change which is of course ridiculous because the % increase of a small population isn't at all interesting. And GP also has a ridiculous claim. So I'm offering a stake in the ground to determine if Julia is on the track that GP claims. If GP wants to come back and discuss where the stake should be, it will be an interesting conversation.
You can call Python directly from Julia https://github.com/JuliaPy/PyCall.jl so much of the Python library ecosystem (say, matplotlib) is available to be used in Julia programs.
That helps the adoption story quite a bit. You can do the number-crunching in Julia where performance counts, and then analyse and present the results using Python.
- Using Python directly is a better experience than calling Python from Julia
- I've never run into unsolvable performance issues with Python
So I guess I'm not in the target audience unless I just happen to be curious about a new language? That's kind of my overall point - even if Julia is a good language on its own and I work in data science, I don't have reasons to pick it over Python.
If you haven't hit a brick wall with Python, it is just because you haven't run into the right problem. I was doing something that required lots of conditional operations on small matrices. The FFI into numpy's native library really bogged it down. I didn't have permission to install a compiler on that machine, so I wrote it in VBA in Excel. It was 11x faster.
I said something similar in another thread, but for me it doesn't have to be better than Python (that is largely going to be subjective); the package ecosystem just has to grow and have some offering, at all, for the things that I do.
It's still very barebones compared to Torch/TF/Flax, and I would be hamstringing myself by switching to Julia, even if I find the language otherwise attractive.
Thanks for this, I will definitely follow along there. Yeah if they can just check a few of those boxes I'm much more likely to at least try to more regularly work with Julia.
If it's a pure function. Oh, and if you have state-based control flow, you have to turn off the JIT. Etc. If you take a standard library like some thermodynamics simulator and throw Jax on it, do you expect it to work without modification? Most of the time it'll fail right at the start by using the wrong implementation of Numpy. So no, those are not "ordinary functions": those are functions where people consciously put in the effort to rewrite years of work onto Jax, which is very different.
I found the beginner installation/package installation experience a million times better than python (except that it’s tricky to explain that you type ] to enter the package manager but you don’t see the ] that you typed)
I think your "ecosystem of Python libraries" is the key point. Python got a lot of its mass-adoption mileage from ML. Its libraries provided "easy ML" for the masses right when ML got popular in science and the job market, which quickly brought Python into the mainstream and built up its network effect.
A similar enabler in a new field could help Julia burst in as a general language. My 2c.
Autodiff is a place where there is a gulf between Julia and Python, one that I think can't be bridged well: JuliaDiff is astonishingly flexible and performant.
I linked to the website (which was updated in May, but its contents could do with more work) because it has examples of how well the suite fits together.
I don't know much about Jax. I've seen competent benchmarks showing an order-of-magnitude benefit for using ReverseDiff from the JuliaDiff suite over Autograd, which is what PyTorch uses for reverse-mode autodiff.
I think you are confusing 1995 with 2005. Perl was in decline by 2000 and by 2005 it was terminal; you could probably count the number of perl shops of any consequence in that year on the fingers of one hand.
I don't think that is the case.
Sure Perl may have been in decline for ages, but people were not comparing Perl to Python for that long.
Simply because python hasn't existed that long.
Python 2.0 was released in 2000.
Python 1.0 was 1994, and Python 0.9 (first public release?) was 1991.
People like to substitute "10x better" here, but I think the real number is 100,000x better, aka it's not possible by default. Q: What would it take to replace Windows? A: The iPhone was a new product category that targeted a new market.
It does happen, though. C has mostly replaced FORTRAN for scientific applications. Not entirely, FORTRAN is (infamously) still used, but I don't know anyone who has started a new project with FORTRAN.
Just 6 years ago, I was taught Perl in my Introduction to Bioinformatics course. The teachers were still using Perl because it used to be the go-to language for bioinformaticians. The year after, and every year since, they've taught using Python.
We started a very large, computationally intensive project in Fortran. It is still easier to do maths in Fortran than in C/C++, and Fortran now has a wonderful C binding system allowing direct calls into C .so/.dll libraries if you want to do some SQL or other kind of data input/output.
The idea that people were being forced to learn perl just for bioinfomatics as recently as 6 years ago fascinates me.
Python has gotten exceptionally lucky. I am sure the two or three remaining Perl users on the planet are also on HN and ready to jump to its defense, but to me this just goes to show how heavy the switching cost is for something like this, and also how lucky Python was to be the best language to switch to at that point. It was in the right place at the right time for a lot of these switches away from older languages in obvious decline, and then it was able to leverage numpy and scikit-learn to pick up a lot of additional momentum in ML and data science tasks. It is almost never the 'best' language for the job, but coming in as second choice on most tasks is a huge win.
Jack of all trades, master of none, but oftentimes better than some are at one.
This is in no small part due to a clever language design decision: combining type genericism with multiple dispatch.
For example, Turing.jl for Bayesian Inference plays well with Flux.jl for Neural Networks which plays well with DifferentialEquations.jl for ODEs. Basically, everything in pure Julia plays nicely with everything else.
An example of how this is useful: when neural ODEs became more popular a couple of years ago, Julia users had to do almost nothing to implement and extend them. DifferentialEquations.jl and Flux.jl already played nicely with each other, and you could just run wild. Meanwhile, in Python-land, there are devs building out ODE solvers in Tensorflow and Pytorch, doing a load of duplicate work because the frameworks don't allow the same level of genericism.
The whole ecosystem is like this.
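The genericism being described can be sketched in a few lines. The `Dual` type and `babylonian` function below are made-up names for illustration (real packages like ForwardDiff.jl are far more complete): a toy forward-mode AD number flows unmodified through generic numeric code that was never written with AD in mind.

```julia
# A toy dual number for forward-mode AD (illustrative only).
struct Dual{T<:Real} <: Real
    val::T   # function value
    der::T   # derivative
end

# Teach the arithmetic operators about Dual once...
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:/(a::Dual, b::Dual) = Dual(a.val / b.val, (a.der * b.val - a.val * b.der) / b.val^2)
Base.convert(::Type{Dual{T}}, x::Real) where T = Dual(T(x), zero(T))
Base.promote_rule(::Type{Dual{T}}, ::Type{<:Real}) where T = Dual{T}

# ...and any generic numeric code differentiates itself, unmodified.
# Here: Newton's iteration for sqrt(x).
function babylonian(x; n = 10)
    t = (1 + x) / 2
    for _ in 1:n
        t = (t + x / t) / 2
    end
    t
end

d = babylonian(Dual(2.0, 1.0))
# d.val ≈ sqrt(2), d.der ≈ 1 / (2 * sqrt(2))
```

The same mechanism is why an ODE solver and a neural-network library written independently can compose: each only needs to work on generic numbers.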
So I've decided to stay with Julia. I'm staying with Python too. It's no big deal.
This seems very desirable. Though, by the time self-attention first got popular, it was already available in PyTorch. Python seems to have the edge just because there are lots of people using it. Maybe it is just a matter of time and users. I will probably wait until the ecosystem gets larger, and then switch to it. (Yes, I am too lazy to implement a transformer from scratch.)
I've been using R nonstop for pretty much 5+ years. I'm happy that there's established competition coming from Python and new competition coming from Julia. Having these languages compete over similar types of programmers pushes each one to be better, which is awesome. I'm not a die-hard R person, I'd be more than happy to switch under the right circumstances.
But...I think one thing gets overlooked way too often. For "data scientists" or "statisticians" or [insert new term here], the majority of our non-modeling time is spent on just plain old data wrangling. To me, R is unbeatable here. I've tried Python ~2 years ago and pre-1.0 Julia.
Using tidyverse you can do pretty much anything to any dataset, often *without a monstrous amount of keystrokes*. (The pipe syntax is awesome). If you really need speed you can always switch over to data.table for uglier but faster code. I really tried but I could never replicate the "brain cycles to keystrokes" speed of R in Python/Julia. That is, being able to intuitively and quickly just convert my thoughts into readable data wrangling code.
Sure the base R language is not that "fast" and Julia/Python benchmarks are way faster. But in practice this doesn't matter to me. Most of the performance sensitive packages are written in C/C++/Fortran anyway (rstan, brms, glmnet, caret). I don't care that I could write 3x faster loops. The extra 5 seconds for that one piece of code doesn't make up for the absence of a good data wrangling ecosystem.
My message to the Julia team: You can get a very large portion of the R userbase to switch over if you focus on a Julia version of the tidyverse (especially dplyr). I know that DataFrames.jl exists but it just doesn't even come close. There's a difference between "you can do this in Julia too" and "here's a clean/intuitive way to do this better without extra baggage".
I'm sorry if the above seems harsh. I genuinely appreciate the Julia team's efforts. I can only imagine how hard it is to create a new language. I just wanted to be honest.
I deeply loathe R for its terrible type idiosyncrasies, syntax, and slowness.
However, even I must admit that it is incredibly good at what it was meant to do - analyse and display data. (And yes, the tidyverse is a huge improvement of the syntax, although it's telling that they basically reinvented the language to do so.)
As an ecological modeller, I create my actual simulation models in Julia, because it is a much, much better language for any real programming. But I still analyse the output in R.
I don't understand how people can loathe R. If you take a functional approach, especially using pipes, dplyr and a split-apply-combine style, it is quite beautiful. Much nicer than trying to, say, divide a time period by an integer in Go.
> If you take a functional approach, especially using pipes, dplyr and a split, apply, combine style, it is quite beautiful
Sure, but what if you don't? Sometimes, this is the right way to do things, other times there are other approaches that are more natural/beautiful. In many cases, a loop with conditionals is much easier to understand.
I use a lot of R, and like many aspects of it. But the fact that `f(stop("Hi!"))` may or may not throw an error depending on the internals of `f` is a little maddening. (And there are tons of similar issues.)
When it comes to data wrangling, one huge advantage of Julia over tidyverse/R dataframes/Pandas is that you can write a damn for loop and it won't be brutally slow.
It's so much simpler and faster to use a loop that says "pick this row only if this and that and this other thing are sometimes true" vs having to construct an algebra of column filters to do the same.
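That loop style can be sketched with plain Julia, using a vector of NamedTuples as the "table" (no DataFrames.jl required); the row conditions here are made up for illustration.

```julia
# Ten hypothetical rows: an id, a value, and a flag.
rows = [(id = i, x = i^2, flag = iseven(i)) for i in 1:10]

# "Pick this row only if this and that and this other thing":
# just write the conditions in a loop, no column algebra needed.
picked = eltype(rows)[]
for r in rows
    if r.flag && r.x > 10 && r.id < 9
        push!(picked, r)
    end
end
# picked holds the rows with id 4, 6, 8
```

Because Julia compiles the loop, this stays fast even on millions of rows, which is the point being made above.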
I think that is absolutely a fair criticism. Personally, I rarely run into an issue where I absolutely am bottlenecked by a slow loop. But this sort of thing drew me to Julia in the first place.
There was also an R update in ~2017 that introduced some JIT speed-ups for loops, which made a noticeable difference.
If this is a problem you run into often, I suggest converting your object to a data.table. You can pass a function row-wise over the object very quickly.
I think loops are not ideal for data analysis. They are prone to human error, especially ones that modify the data, and in a way that can be hard to spot (i.e. iterating over the dimensions of the wrong object). A stepwise creation of new logical fields using mutate, followed by a vectorised ifelse command, is more robust, and you can clearly see the steps of the logic.
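For what it's worth, that stepwise style translates directly to Julia's broadcasting too (variable names here are illustrative):

```julia
x = [3, 8, 12, 15, 20]

# Step 1: build the logical fields explicitly...
is_big = x .> 10
is_odd = isodd.(x)

# Step 2: ...then a vectorised ifelse over them.
label = ifelse.(is_big .& is_odd, "keep", "drop")
# label == ["drop", "drop", "drop", "keep", "drop"]
```

Each intermediate step is a named, inspectable vector, so the logic stays visible without any loop bookkeeping.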
I mean, I get your point.
Julia has a bit of a Lisp Curse problem:
http://winestockwebdesign.com/Essays/Lisp_Curse.html
Writing a performant and easy to use data wrangling library for R is a bunch of work and means dealing with C/C++ etc.
So few people are willing to do so, and just contribute to a small number of libraries like dplyr.
(I feel like there are at least 2 other major competitors to that in R?)
Whereas in Julia it's really easy to write a new data wrangling library.
It's just not that much work. So people:
A) do it just for fun or as student projects (none of the major ones are, though), or
B) do it because they have a nontrivial difference of opinion (e.g. Queryverse has a marginally more performant but marginally harder-to-use system for missing data).
The nice thing about Julia, especially for tabular data (thanks to Tables.jl), is that everything works together.
It's actually completely possible to mix and match all of those libraries in a single data processing pipeline.
Which, while generally a weird thing to do, does mean that if an external package uses any of them, it slots into a pipeline built on another.
(One common case: Queryverse has CSVFiles.jl, but CSV.jl is generally faster, and you can just swap one for the other inside a Query.jl pipeline.)
I absolutely agree this makes learning harder.
---
Also that particular example:
> "I need pipes to help me wrangle data more efficiently do I use Base Julia, Chain.jl, Pipe.jl, or Lazy.jl?"
It's piping.
Something would have to be massively screwed up if any of those options were more or less efficient than the others.
The only question is what semantics do you want.
Each is pretty opinionated about how piping should look.
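To make the "only semantics differ" point concrete: base Julia's `|>` is unary-only, and the macro packages just rewrite their placeholder syntax to ordinary calls at macro-expansion time, so there's no runtime cost to any of them. A sketch with base `|>` (the `squash` helper is made up):

```julia
# Base Julia piping: `x |> f` is just f(x), so each stage
# must be a one-argument function.
squash(s) = lowercase(strip(s))

result = "  Hello World  " |> squash |> length
# result == 11

# The macro packages differ only in how extra arguments get
# spliced in, e.g. Pipe.jl's `@pipe x |> f(_, 2)` placeholder
# or Chain.jl's `@chain` begin/end blocks.
```

Since all of these expand to plain function calls before compilation, the choice really is purely about which syntax you prefer.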
The Lisp Curse was written by a then-inexperienced web developer with (then, and likely still) zero Lisp experience, based on extrapolating something he read about Lisp in an essay by Mark Tarver. He prefers it not be submitted to HN due to the embarrassment, yet for some reason keeps the article up (probably because it generates traffic).
Yeah, NSE (non-standard evaluation) is really annoying to work with in dplyr/tidyverse codebases, and this definitely inhibits people from building on top of them.
They are an 80% solution for a lot of data analytic needs, but base-R is 100% the right choice if you want your code to run for a long time without needing updates.
I've never really gotten into data.table for some reason, normally dplyr is fast enough, or I'm using something more efficient than R.
What a constructive, positive, down-to-earth, well-written comment, and what a nice reprieve from everything that's broken about the tone of web discussions these days. You point out that there's still another player in this space (R), but not in a way that's whiny, dismissive, or doctrinaire, and you celebrate the healthy competition. You suggest a streamlined path toward Julia ecosystem maturity, rooted in real-world needs. Nicely done!
I have no real dog in this fight, but I hope Julia team members (and/or aspiring Julia ecosystem contributors) will read and consider your point.
This whole thread seems to be quite civilized. I can see no name-calling or off-topic rants, only a frank exchange of opinions, mixed in with some facts.
Your post seem to indicate that there is some sort of 'fight' going on, or that the tone is broken. I disagree. If most web discussions were like this one, we would have fewer problems in this world.
Oh, that's exactly what I mean -- when I say "everything that's broken about the tone of web discussions these days", I'm talking about threads and topics other than this one. I don't see any 'fight' here, and that's what's so refreshing.
All right! I got the impression you were contrasting that particular post with the rest of this discussion, but apparently not. Still slightly confused here. Oh well, carry on.
Another big area where R has an edge over Python (and I guess Julia, but I'm not sure) is making quick yet presentable plots of data containing different factors that you want to show together. The matplotlib equivalent requires tracking different indices and manually adding layers for each one.
I worked with R and Python during the last 3 years, but I have been learning and dabbling with Julia since 0.6. With [PyCall.jl] and [RCall.jl] available, the transition to Julia can already be easier for Python/R users.
I agree that most of the time data wrangling is super comfortable in R, due to the syntax flexibility exploited by the big packages (tidyverse/data.table/etc.). At the same time, Julia shares a bigger heritage of Lisp influence with R than with Python, because R is also a Lisp-ish language (see [Advanced R, Metaprogramming]). My main gripe with the R ecosystem is not that most of the performance-sensitive packages are written in C/C++/Fortran, but that they are so deeply interconnected with the R environment that porting them to Julia (which also provides an easy, good interface to C/C++/Fortran and more; see the [Julia Interop] repo) seems impossible for some of them.
I also think that Julia reaches a broader scientific programming public than R: it overlaps with Python sometimes, but it provides the Matlab/Octave public with a better alternative. I don't expect to see all the habits from those communities merge into the Julia ecosystem. On the other hand, I think Julia's bigger reach will help it avoid the "base" vs "tidyverse" vs "something else in-between" split that R is in now.
Out of curiosity, when was the last time you looked at DataFrames.jl? A huge amount has happened in the last year. Plus, if you want more tidy-like syntax, you can go with Query.jl (or DataFramesMeta.jl, though that isn't quite finished updating to the new DataFrames syntax), or if you just want pipes on DataFrame operations, there's Pipe.jl and Chain.jl.
I don't think your comments are harsh, you need what you need and you like what you like. I do mostly data wrangling too, but feel much less constrained with Julia than with tidyr. Sometimes having constraints and one right way to do things is good, but it's not for me.
Also worth noting it's not necessarily on the language developers to do this. Even in R, tidyverse is in packages, not in the base language.
My experience with R was somewhat different. R was my first computational language in 2006 (version 2.3, IIRC), and parsing real life data (biological, in my case) into a format acceptable to R was a non-trivial exercise. I had somebody write me a perl script to parse the raw data into a clean CSV, but that has its own problems. The tools that were the kernel of the tidyverse (created 2014) were just beginning to show up, and even magrittr pipes were many years away. The only tidyverse tool even close to mature at the time was ggplot. For me data munging was the limiting factor, and at some point I discovered many people prefer Python for these initial steps. In 2013 I learnt Python with the explicit aim of data munging, while continuing analyses in R. With Pandas I could cover 80% of my use case for R, and eventually dropped it completely. Again, this predates the creation of the tidyverse, which I noted with some irony.
For what it's worth, Hadley Wickham was asked in a Reddit AMA several years ago which platform he'd choose if he were just starting out. He pointed to Julia as his pick.
> My message to the Julia team: You can get a very large portion of the R userbase to switch over if you focus on a Julia version of the tidyverse (especially dplyr).
If we removed dplyr, R scripts would absolutely scream, so I find the speed argument for 'why switch to X' unconvincing. If users cared so deeply about speed, almost no one would be using the tidyverse; instead we'd all be using base-R or data.table.
Multiple dispatch? Hmm, is this really a problem I'm going to come across in the real world, when 90% of our time is spent ingesting a poorly formatted CSV, doing some quick plots, and perhaps building a model to test something out? If the goal of Julia is to replace R/Python, then their priorities feel way off the mark.
> If the goal of Julia is to replace R/Python then their priorities feel way off the mark
There's a lot more to scientific computing than wrangling tabular data. Julia is competing in that overall space with R/Python/Fortran/Java/C++. If R or Pandas is better at data wrangling, then Julia won't win out there. But so be it. No PL is best at everything.
> There's a lot more to scientific computing than wrangling tabular data.
Also a point that gets ignored way too often. My original post differentiated between time spent writing models and time spent data wrangling.
I would never even attempt to write a symplectic integrator in base R (OK maybe Rcpp would be fine but that's not really "R"). Julia, by design, is better at that. But the R ecosystem is so good that I can use the best practical implementation of a symplectic integrator to solve common modeling problems via RStan.
Yes, Stan is a standalone framework that can be accessed from Julia as well. But the following workflow can be done in R much easier:
1) Read in badly formatted CSV data
2) Wrangle the data into a useable form
3) Do some basic exploratory analysis (including plots)
4) Write several models in brms/raw Stan (via rstan)
5) Simulate from the priors and reset them to more sensible values
6) Run the model over the data to generate the posterior
7) Plot/run posterior predictive checks, counterfactual analysis, outlier analysis (PSIS or WAIC), etc.
Again, the above represents my common use case. I fully appreciate that people use Julia to do awesome stuff like "the exploration of chaos and nonlinear dynamics." [0]. I understand that the modern R ecosystem isn't really built for this.
Totally agree there. It is not a replacement; it is trying to solve a different problem. I don't believe Julia contributors are lying awake at night upset that other languages exist, feeling they need to put a stop to that. My point (put across clumsily, I see) is that IF that were their goal, they would be going about it the wrong way, as most R/Python users have different priorities. But it is a moot point, as that would be an absurd motivation to create a whole new language.
> is this really a problem that I'm going to come across in the real-world when 90% of our time is spent ingesting a poorly-formatted csv, doing some quick plots and perhaps building a model to test something out
Yes, multiple dispatch is not some highfalutin ivory tower concept that only comes up in specialized code. For example, the model in question could define custom plotting recipes[1] so that you can just call plot() and have it produce something useful.
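That mechanism can be sketched without Plots.jl at all: a package extends a generic function that someone else owns, for its own type. `FitResult` below is a made-up type, and we extend `Base.show` here; plotting recipes target `plot()` the same way, via RecipesBase.jl.

```julia
# A hypothetical model-output type some package might define.
struct FitResult
    coef::Vector{Float64}
    r2::Float64
end

# Extend someone else's generic function (Base.show) for our type.
# Now anything that displays a FitResult gets this rendering,
# without the caller knowing anything about the type.
function Base.show(io::IO, ::MIME"text/plain", fit::FitResult)
    println(io, "FitResult with R² = ", fit.r2)
    println(io, "  coefficients: ", fit.coef)
end

fit = FitResult([1.0, 2.0], 0.97)
# The REPL (or `display(fit)`) now uses the method above.
```

A plotting recipe is the same pattern aimed at `plot()`: the model package defines how its type turns into plot data, and users just call `plot(fit)`.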
Also, why shouldn't dplyr perform comparably against data.table? Seems like there would be no need for a fragmented library ecosystem here if the abstractions the tidyverse is built upon were lower-cost. Moreover, what if my data isn't CSV or in a table-like shape at all? "real world" does not mean the same thing across different domains.
> Yes, multiple dispatch is not some highfalutin ivory tower concept that only comes up in specialized code. For example, the model in question could define custom plotting recipes[1] so that you can just call plot() and have it produce something useful.
This is literally the whole conception behind generic functions in R (print, plot, summary etc).
I agree it's great, but Julia is building on a lot of prior art here.
For sure, and one would be remiss not to mention Dylan, CL/CLOS and Clojure here as well. My quibble was with the claim that multiple dispatch rarely shows up in practice, which you've pretty clearly shown is not the case in R!
'highfalutin ivory tower' is a great name for a band :D
Naturally you are correct, and I am wrong to dismiss it as unimportant. What I'm saying is that the majority of R/Python users today are not looking for ultimate speed or sophisticated programming paradigms. Most users are doing the unsexy bread and butter of 'take some tabular data' -> analyse -> report on it, and I want to dismiss the argument of 'users will migrate to Julia because of these nifty features' because it ignores the very reasons the existing users use these tools in the first place. It would be as absurd as proclaiming Excel users will switch to Python because the accounts department suddenly cares about NLP.
The comparison between different languages gets tiring when it focuses on making a black-and-white statement like "Julia is better" or "Python is better" and "x is never going to overtake y". Yes, Python has many more libraries thanks to it being much older than Julia, same for R. But at the same time, Julia can be used for impressive work that R/Python struggle with and which only seem solvable in these languages because of large investments into certain packages by big companies.
So I find the fact that many hard problems can be solved very generically and performantly with small libraries written in base Julia much more interesting than countering that much larger and older Python packages, with millions of developer hours poured into them, are currently more feature-complete. Yes, they are, right now. Why wouldn't they be. But doesn't what is being done in Julia with much fewer resources point to an impressive ability of the language to facilitate such development?
While I generally agree with your argument, it's worth noting that the median Julia programmer is probably more invested in the language/ecosystem than the median R/Python programmer.
Back in the mid-90's Java was the new hotness, and it probably made problems that required 100+ lines of C easier, but it's not still full of above-average programmers, as any language/ecosystem that achieves success will inevitably regress to the mean.
That's true. I am of course biased in talking mostly to people on the Julia Slack etc. who enjoy the language a lot and do interesting things with it.
That's one of the reasons, though, why I never find the "how many people are using it" argument the most convincing when talking about the merits of a language. Because most people I've seen using R, Matlab and Python, at university or work for example, used it really superficially, and therefore wouldn't have any interesting things to say about it. Neither do they add anything interesting to the respective ecosystems. I don't think it's the first interest of a new language to get this type of user, although of course in the long term you want to build tools that are easy to be picked up and used by a wide audience, and number of users is some indicator of that.
I'm a graduate student that's switched almost completely over to Julia. Prior to it I worked in both MATLAB (the IDE is so nice, and writing out matrix computations is just great) and Python (for ML). Julia is absolutely nicer to write in than either of the two. MATLAB is slow and at times feels less like a programming language and more like an incomplete and brittle interface with the JVM. Python is also slow, and it feels awkward to use given that it was not explicitly designed for scientific workflows. With Julia I get proper typing, incredible speed, easy parallelization, and a kick ass REPL.
The only thing I truly miss in using Julia is the plotting capacities of MATLAB. I haven't found an environment that can match it in terms of interactivity. Give me the ability to (easily) save interactive figures for later use and Julia would be perfect.
You should check out Makie. Getting it set up can be a bit frustrating if things don’t go right, and there is a small learning curve for using `@lift`, but it is an absolute joy to use once you ramp up.
I use it for my research by default. You can pan, zoom, etc. The subplot/layout system is frankly a lot better than Matlab (and I enjoyed Matlab for plotting!). The best part is that I can insert sliders and drop downs into my plot easily, which means I don’t need to waste time figuring out the best static, 2D plot for my experiment. I just dump all the data into some custom logging struct and use sliders to index into the correct 2D plot (e.g. a heat map changing over time, I just save all the matrices and use the slider to get the heat map at time t).
Wow, I just tried it out. This is really great. And it solves my interactive plot saving requirement. Easy as doing `Plotly.savehtml(fig, "test_fig.html")` :). Thanks!
I have, though I've mostly stuck with the plain PyPlot.jl package due to the familiar syntax and interactivity. Perhaps things have changed, but I just recall being frustrated at the inability to zoom in/out, and again save interactive figs. Perhaps it was just due to the particular backend I was using. I'll give the VS Code experience another try!
In my modest experience the perfect Julia slogan would be:
"fast as C, easy as python, but NEVER the two together"
All the sentences:
"When you’re writing various algorithms, you don’t necessarily want to think about whether you’re on a GPU, or whether you’re on a distributed computer. You don’t necessarily want to think about how you’ve implemented the specific data structure. What you want to do is talk about what you want to compute."
sound nice.
Except in practice, unless someone else bothered doing that for you, you have to do it yourself.
Underrated comment. Yeah, if you want C-like performance, you have to do some low-level considerations, that is unavoidable at some point. So the "speed of C, convenience of Python" is misleading.
However, for many, many small tasks, today's compilers are smart enough that you can express your idea in a high-level language and the generated code will be maximally efficient. The real killer feature of Julia is that, wherever you can gain maximal performance with high-level syntax, you can just choose to do that. A more correct but less sexy slogan for Julia is that it has the best performance/expressiveness tradeoff you have ever seen.
That's still a massive selling point. In python, getting speed can be weird and counterintuitive. In C, a straightforward algorithm can be blazing fast. For example, finding the length of the longest word in a string, you can just iterate through the string keeping track of a few indices. In cases like that, where the obvious simple C function is incredibly faster than the same python, where does Julia fit in? Would that kind of naively written function be closer to C or Python?
In that case, it's closer to C in speed, and usually more generic and "easy on the eyes". I don't know much about C, but seem to recall that it doesn't play well with unicode, usually treating text as bytes. Here's an equivalent Julia example:
function longest_word(st::Union{String, SubString{String}})
    len = 0
    start = 1
    @inbounds for i in 1:ncodeunits(st)
        if codeunit(st, i) == UInt8(' ')
            len = max(len, i - start)
            start = i + 1
        end
    end
    max(len, ncodeunits(st) + 1 - start)
end
This takes about 8.2 µs for an 8.5 KB piece of text on my laptop, but it only works on ASCII text and only treats ' ' as whitespace, not e.g. '\n'. For a more generic one, you can do:
function longest_word(st::Union{String, SubString{String}})
    len = i = 0
    start = 1
    for char in st
        i += 1
        if isspace(char)
            len = max(len, i - start)
            start = i + 1
        end
    end
    max(len, i + 1 - start)
end
This is 20 µs for the same text, so still only 3 ns per char. The underlying functionality, namely String iteration and the `isspace` function, is also implemented in pure Julia.
In general, if you code like Python (highly dynamic code with no consideration for performance), it will be closer to Python in speed; if you code like C/Fortran (completely static, with overspecified types), it should be closer to C. The variance in performance across naive implementations is pretty high. That means it's easy to get into Julia and start programming no matter your background, but idiomatic Julia (which is not something you'll learn in a day) should be concise and high-level like Python (and frequently more concise) and close to C in speed.
For example, what other dynamic languages do, like verbosely typing everything, doesn't really work in Julia (the compiler already knows pretty much every type even without hints). What works is treating a variable as a polymorphic container instead of a dynamic container: you don't know yet what type the variable has (only its behaviour), but whatever it is, you should avoid changing it if possible (what they call type stability). Which is why proper high-performance Julia code might not look obviously optimized when you read it: it is not something you do to make it fast but something you don't do (change a variable's type, forcing the compiler to create a low-performance dynamic box, plus other things like untyped global variables).
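A minimal illustration of the type-stability point (the function names are ours):

```julia
# Type-unstable: `total` starts as an Int and becomes a Float64
# on the first iteration, so the compiler must track a
# Union{Int, Float64} (a dynamic box) through the loop.
function unstable_sum(xs)
    total = 0
    for x in xs
        total += x
    end
    total
end

# Type-stable: initialize with the element type's zero, so
# `total` keeps one concrete type throughout.
function stable_sum(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x
    end
    total
end

# Both return the same answer; only the inferred types differ.
# `@code_warntype unstable_sum(rand(10))` highlights the Union
# in red, while the stable version infers a concrete Float64.
```

Note that no type annotations were added anywhere; the fix is purely about not changing the variable's type mid-loop.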
Yes - if you have a real problem Julia is the way to go. If you are just banging something out to prove a point or make a delivery then Python is often much easier.
Ofc this is like Excel and notebooks - I start doing things in Excel because I can sort out an answer in like 30 seconds. Doing it in a notebook requires 5 minutes, or maybe a little longer. But... see me there a week later, after the feedback and next questions from the customer... now I am in Excel hell and I wish, wish, wish I had started out in a notebook.
This has been my favourite comment about Julia for the last few years. +1 on that.
The ecosystem is extremely poor outside a very few niches, and most of the Deep Learning stuff isn't even faster than the Python API (+C, of course), so swapping is just useless if you don't have time to write your own GPU kernels for every new operation.
At least for the GPU case, the ecosystem is slowly moving towards writing generic kernels that can be executed on both the CPU (multithreaded) and the GPU, without doing anything special in the kernel itself, via KernelAbstractions.jl. It's still got a little way to go, but already some larger codes are using it to great effect. Also, as a member of the JuliaGPU group, I know that AMD and Intel GPUs should be supported by KernelAbstractions within the next month or two, so a single generic kernel will be able to run unmodified on all major GPUs.
I have been using Julia for only a few months, but I've been surprised by the speed-up that's possible vs Python code using pandas. Depending on the size of your datasets the JIT might slow you down a little, but the speed of Julia outweighs this. Liberally using functions really allows Julia to shine.
One thing I've recently seen that concerns me long-term is the creation of various competing macro syntaxes for reducing the wordiness of Julia. There are many competing implementations of pipes and other syntax sugar. These macros definitely make things easier, but as you use them the code becomes more difficult for others to understand, and since there is, at this time, no one set of macros to use, you'll have to know each of the competing sets to make sense of examples.
I have created one of the recent competitors in this space [1], and I think it's not so bad, as long as the macros are reasonably simple to understand. If you look in the readme of my project, the four syntax examples actually look quite similar so I don't anticipate they'd cause a lot of confusion.
When I got annoyed in my own work that some data wrangling syntax was repetitive I was just really glad that I could easily build my optimal solution and didn't have to just accept that there's one suboptimal (for me) way. In Python and R, if you like what they offer that's good, if not - not so good.
Part of the problem comes from Julia not being geared towards DataFrames like R is, but I gladly trade a bit of convenience in one domain against a lot of expressive freedom with very clean rules that apply everywhere.
For example, I think it's quite good that you can only have "weird" behavior in Julia with macros, but they give you a visual indicator with the @ that you're seeing non-standard syntax. While in R, the non-standard evaluation means that literally anything could happen to the variables you pass into any function. It makes for some convenient syntax in some cases, yes, but it's so confusing as a system for writing software! You never really know if you're looking at a variable or just a name, for example.
Julia is way superior if you are building programs and systems, especially if you are building for sustained use (rather than something to do a job once). Julia is less error prone, more expressive, more maintainable, more performant.
But if you are creating cut-and-shut scripts for data science notebooks, Python wins... the REPL start time alone is a killer for Julia; add in the requirement to actually think about structure and the problem, and it's out of my "3 -> 6 hr" workflow.
Julia is amazing for numerics, but the JIT is painfully slow for anything that gets looped only a few times or not at all. I don't think it is usable as a general purpose language until this gets improved somehow.
But is that a problem apart from in scripting? If you ain't looping you ain't waiting in my experience.... Can you give an example which isn't time to plot?
I think that a slow JIT is problematic in any application except for non-interactive numerical calculations. For example in a GUI software or in a server application it would be very undesirable to have each function run a million times slower the first time it gets called.
For GUIs I don't see this as an issue - JITs run a lot faster than humans. All of JavaScript is on a JIT - and that's the dominant UI for now! For servers, generally you find that users are all calling the same functions, so it's very rare that you hit a blip - much more commonly, users get performance problems from something else in the stack, like the network or the client. And while JavaScript is the dominant front end, the dominant back end for servers is Java, and that's got a JIT as well.
JavaScript's JIT is a tracing JIT, so it can compile code in the background while the interpreter/less optimized compiled code is actually running. In Julia, the compiler runs first, and then the compiled code is run. This will probably eventually change as Julia's compiler improves, but regardless, it's important to note this distinction.
Julia is simply a _much_ nicer language to write than Python; it is more functional and it provides the user with far more ways to be expressive. I think the JIT warmup is less of an issue than its overall memory use, but I can see that the former is more apparent to the average user sitting at their terminal.
I wish a larger fraction of the energy invested into propagating the 'Pythonic' approach were _properly_ redirected into improving relevant aspects of Julia.
If you want to help the adoption of Julia in bioinformatics and medical applications, feel free to support BioJulia[1][2] on their OpenCollective page[3]. During a pandemic, projects like these are of special importance.
Julia’s type system is not particularly user-friendly. For example, it has both a “String” and a “SubString” type which cannot always be interchanged. Their language design seems to be much more concerned with execution speed than programmer productivity — Python has the balance in the opposite direction, but they’ve been gradually improving performance for years, and performance is much easier to improve after the language design is set in stone, especially as more and more people add typing info to their programs.
I’ve generally found they can be interchanged in the way you would expect as a Python user: duck typing. I have more experience using SubArrays, but I think the underlying machinery in the language is basically the same.
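A loose Python analogy for that owner-vs-view split (this is Python, not Julia — `bytes`/`memoryview` stand in for `String`/`SubString` here): most APIs that take a bytes-like object accept either, which is exactly the duck typing the comment describes.

```python
# bytes owns its data; memoryview is a zero-copy view into it.
import binascii

data = b"hello world"
view = memoryview(data)[6:]      # view of the tail, no copy made

assert bytes(view) == b"world"   # materialize the view when you need to
# hexlify accepts any bytes-like object, view or owner alike:
assert binascii.hexlify(view) == binascii.hexlify(b"world")
print(binascii.hexlify(view).decode())  # 776f726c64
```

The occasional friction comes from the APIs that insist on the owning type — the same place Julia's `String`/`SubString` distinction bites.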
Adding performance after the fact is not easy. This is why most numerical Python projects depend on C extensions rather than improvements to the Python runtime. Writing C extensions, or jamming your algorithm into the shape of existing C-accelerated APIs, is often not very time efficient.
IMO, array indices are a matter of taste which inspire feelings of religious intensity.
You can change it if you prefer 0 or -14 or whatever number you like. Non-zero effort is required, but if you otherwise like the language it can be changed, AFAIK.
Julia uses modern compiler technology to achieve close to native performance. This is not just generating LLVM IR.
Julia also has its own optimization system for language-specific optimizations that LLVM struggles to do (language-specific info gets lost in the conversion to LLVM IR).
Google's V8 (the JS engine) also uses modern compiler tech, but I think it is not as capable as LLVM optimization-wise (I don't think it is designed to be).
Python will either adapt or perish. Even if it is not Julia, it will be another language.
V8 and the Julia compiler work in quite different ways - because they solve quite different problems. In particular, Julia’s compiler only works well on code that is "type-stable", whereas V8 has no such limitation.
I know that V8 does some interesting stuff with its types.
Julia's creators considered supporting optimization while designing the language.
Whereas the first JavaScript engine that even generated machine code came well after the language's creation.
JS has weird features, like being able to set a getter function on an array index.
There was a memory corruption bug in V8 where the implementation of `Array.sort` would call a getter on the array that changed the array's size, corrupting memory. This was used in an exploit.
The creators of V8 even built a domain-specific language called Torque to implement the language, lol.
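For contrast, here is how CPython closes off the analogous hazard (a Python illustration, not V8): `list.sort` detaches the list's contents while sorting, and if a key function mutates the list mid-sort it raises rather than corrupting memory.

```python
# CPython guards list.sort against mutation during sorting: the list
# appears empty while the sort runs, and mutation is detected afterwards.
items = [3, 1, 2]

def sneaky_key(x):
    items.append(0)   # mutate the list mid-sort, like the V8 getter did
    return x

try:
    items.sort(key=sneaky_key)
    outcome = "no error"
except ValueError as e:
    outcome = str(e)  # "list modified during sort" on CPython

print(outcome)
```

Memory-safe languages can afford this check-and-raise strategy; V8 has to defend hand-written fast paths against the same pattern, which is part of why Torque exists.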
For some numerical things it’s nice to have easy interfaces to modern algorithms. I solved a differential equation recently and it just worked. And Julia feels like a much more proper programming language than Matlab.
At the lower level (which I haven’t looked at in a while, so this may have changed) I found it a bit confusing and messy. The subtyping and method selection are tricky to get right, and they are fundamental to important parts of the language like its numeric tower. But libraries seem to just work.
Macros were horrific, and the AST is inscrutable and liable to change from one version to the next. Quasiquoting was also tricky. So I wouldn’t recommend trying to do anything weird with them. But maybe they are good now.
Pluto notebooks seem a great concept. I tried them recently and mostly they worked (sometimes they didn’t get dependencies right but it’s still beta). I felt like I was fighting a bit with plots.jl. I don’t know if there are things that weren’t obvious to me that I was missing or if it can just be a bit annoying. I haven’t tried gadfly but I would like to at some point. I’ve heard good things about ggplot2 which it is inspired by.
I felt like documentation was a bit lacking in good straightforward tutorials and examples. As well as documentation in general. But I don’t want to put too much emphasis on that.
Julia is fantastic, with a great community. The one issue I have, though, is the use of Greek symbols: while great for those formally trained, it may have a negative impact on wider adoption for deep learning.
Python has a battle-tested, humongous standard library, and a rich ecosystem. It is easy to learn and it's great at gluing everyday stuff. Data scientists like it, engineers like it, even the cashier at Lidl likes it.
I can see how Julia may challenge Python for academic use, but challenging Python in 2021 for industrial use is no joke.
Fortran holdouts say today that there is still no competition to Fortran for optimizing code in multiprocessing/supercomputing environments, and no volunteers to rewrite tons of proven, optimized-to-the-extreme numerical and physics code in another language. Good old languages die hard...
Going by Tiobe, there is a gulf between the top 4 languages and everything else. To put it into perspective, Julia is only twice as relevant as Prolog and on par with Scratch.
Of course I don't necessarily think Tiobe is a great metric for this but it was quoted in the article.
"I saw the 87 percent increase and think it is wonderful to see Julia growing. I think that Julia has great potential to replace C/C++/Python (and of course Fortran) in scientific and technical computing as it matures."
It's really all about the ecosystem, community, and ease of use. Python took off once people developed NumPy, pandas, Anaconda, etc.
On what time scale? Julia is a really nice language, but it will take decades for it to put a serious dent in Python. If the web side of things matures fast and someone builds a killer framework, it might carve out a niche there too.
It's an easy-to-pick-up, super readable, performant language with a good package manager. This already makes it significantly better than a good number of popular web dev languages.
Julia absolutely is a general purpose language and has been from the beginning. However, there are plenty of languages that are fine for building websites, whereas there are no other languages with the combination of speed and usability that Julia offers in technical computing. It's a lovely language for doing all kinds of work and I personally mostly use it for non-technical computing these days — specifically to implement package management client/server infrastructure, which is mostly web + file system wrangling.
Can you compile Julia programs to native binaries with a single command? Can Julia run on mobile and embedded devices? A real general-purpose programming language should do all of those.
True, I'm obviously biased, but at least I can attest to the intention: Julia has always been intended for general purpose computing, with the additional (and very challenging) requirement that it also be excellent at technical computing, which turns out to be a remarkably hard additional requirement.
1. Base-1 indexing is more natural and less verbose. My suspicion is that base-0 indexing grew out of language implementers wanting to reduce their mental overhead rather than a first-principles approach. AKA machine code leaking out to the higher level languages.
2. You have to have some way of structuring syntax. Whether you have "end", ")" or "}". It is not inconsistent to start with something else, like a function signature with the corresponding keyword. Again Julia opts for natural readability.
Both of these complaints are rather superficial anyways. Julia is a marvelous piece of tech and has an interesting story to tell about type inference, multiple dispatch, performance, data-structures, general Lisp-yness and the merits of tailoring a language for a purpose/domain (and its users) in contrast to trying to adhere to paradigms and programmer culture (cults?).
I can't say that I like or use Julia, but just regarding your comment, and because I've seen similar comments about other languages around:
> 2. Julia uses an "end" keyword everywhere, which is imho too verbose (and the corresponding "begin" is missing so it's inconsistent).
This is an absolutely childish approach to comparing or selecting programming languages.
In my eyes, it says a lot about the maturity of software development as a discipline that a big chunk of our debates are at this level.
We should discuss quality, breadth and depth of standard libraries, quality of implementation of the most common interpreters/compilers, etc.
I don't want to fault you personally, OP, I think this approach is quite widespread, unfortunately, one could say that it's part of our software development culture at this point.
While you do interface with it all the time, it's not something you'll be thinking about all the time - at some point it will be completely invisible, even though it's there (unless you're that bored). What will really shape the experience is whatever offers actual resistance to solving your problem and actively wastes brain power rather than muscle memory: language semantics forcing you to write the same thing over and over when a particular feature (like macros) could handle it trivially; the lack of some particular type safety making you keep losing time debugging the same error; lacking interactive tools forcing you to waste time debugging with prints; a workflow where the JIT lag keeps breaking your pace; a community without a culture of documentation, so you lose time trying to decode the source; etc...
If 0-indexing hadn't been the default for so many programmers for so many years (usually because the languages being used required actually thinking about arrays in terms of memory offsets) then it would feel very odd. Humans reason about counting in terms of the natural numbers, so 1-indexing is fine. Just ask any beginner CS student what they think of 0-indexing if you doubt this.
I would argue that if those really are your major gripes about Julia, you either have very bad priorities, or don't know enough about Julia to contribute to this conversation.
Base-1 makes sense in the sciences, where you have Fortran, MATLAB, R, Mathematica. If anything, before Python's rise in popularity, base-0 was the strange choice. Fortunately you can use arbitrary indices with OffsetArrays.
Not really. It was a choice for C and C++ and it has a very solid low level implementation and efficiency motivation. Unfortunately, it also became a curse for these languages. After many decades of existence, they still haven't introduced proper multidimensional arrays, other than a hodgepodge of competing library implementations.
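The low-level motivation mentioned above can be made concrete: with 0-based, half-open ranges, index arithmetic composes without ±1 fixups. A quick illustration in Python (which inherited C's convention):

```python
# 0-based, half-open range invariants: no off-by-one adjustments needed.
a = list(range(10))
i, j, k = 2, 5, 8

# Adjacent half-open slices concatenate cleanly:
assert a[i:j] + a[j:k] == a[i:k]

# The length of a slice is simply the difference of its endpoints:
assert len(a[i:j]) == j - i

# And element a[n] has exactly n elements before it:
n = 4
assert len(a[:n]) == n

print("0-based half-open invariants hold")
```

With 1-based inclusive ranges, each of these identities picks up a +1 or -1 somewhere, which is the "mental overhead" trade the two camps keep arguing about.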
I don't like those two either, but that doesn't change the fact that the rest of the language is absolutely ideal for what I do ;-) (ecosystem modelling)
I'm interested in the following example (3x speedup) - is it possible to read more details about their project?
(From the hpcwire article) "During his talk, Edelman presented an example in which a group of researchers decided to scrap their legacy climate code in Fortran and write it from scratch in Julia. There was some discussion around performance tradeoffs they might encounter in the move to a high level programming language. The group was willing to accept a 3x slowdown for the flexibility of the language. Instead, said Edelman, the switch produced 3x speedup."
I tested my well-optimised R code and saw only a 3x to 10x performance gain. That's still not substantial enough currently to migrate a whole code base, in particular given that the libraries are also still not mature enough. The research group I'm working with also has no interest in adopting anything new. In fact most of our code is still in FORTRAN, so that is something I would be more interested in migrating to Julia, but I don't think that is happening anytime in the next 10 years.
Seems like it would still be useful to use Julia as the backend for the R package instead of Fortran. I've been showcasing a lot of that lately with good success:
It does sound like a lot but it depends on the actual wall clock time. If your run time goes from 10 days down to 1 day then yes, it matters a lot. If it goes from 1s down to 0.1s it might not matter so much.
Of course, but if you observe 3x to 10x performance gains across the board, you will have some programs that run in more than 1 second where it may be worthwhile.
I recently decided to port some of my Python analysis code over to Julia since I've been doing a lot of ODE stuff lately. Overall I'm liking it, but I've found a couple things frustrating: 1) the docs and online issues/answers are a bit of a mess, partly because many of the accepted answers are from pre-1.0 days; 2) the error messages I've found are long and not very helpful. It takes much longer to track down a bug in Julia than Python code.
Is it correct to state that in 5 to 10 years we’ll see Julia as the default for new Data Science projects? (ML and statistical inference). I’m not a biggie on switching tool sets just because something is becoming “hot”. I like to start using something when it’s boring and battle tested, the youngsters can do the bleeding. But it seems like the likely candidate if something is going to displace the Python and R ecosystem?
I’m a huge Julia proponent (and before that, a huge Python proponent over Matlab), but I would be careful about claiming that Julia will be the coming standard. Python has a crazy amount of inertia and excellent projects still in the pipeline.
I think Julia will eventually win because writing your code in one language which isn’t C or FORTRAN is extremely productive and leads to much more composable libraries. The progress that’s been made on, for example, deep learning frameworks is impressive given the lack of massive investment from FAANG. I hope this leads to it dethroning Python, but it might not. If anything will, I think it has the best chance.
I’d suggest you give it a try anyway because its really not a hard language to learn. The ecosystem itself is quite good. The fact that you can write fast code without C extensions leaves you less dependent on the ecosystem, too.
> Is it correct to state that in 5 to 10 years we’ll see Julia as the default for new Data Science projects?
No, it is not correct to make this claim. Nothing is going to de-throne Python in the next five years and it is EXTREMELY unlikely that anything will de-throne it in 10 years. Any language that replaces Python in these tasks will need to be significantly better, and Julia just isn't that. Incremental improvement in a few areas that reek of premature optimization is not going to be a compelling argument for the masses.
The language that de-thrones Python has not been invented yet, and it will probably need some sort of hardware-coupled advance to have a chance (e.g. if the next big leap in mass-produced hardware were to drop 4K cores into a cheap SoC then a simple scripting language that handled internal data and execution concurrency might take over.) Julia is nice, but if anything you are probably going to see more migration from MATLAB and similar older dead-ends to Python over the next five years than you are to see migration from Python to Julia.
Probably yes - but 10 years is more likely than 5. Python really gained steam when it was 15+ years old. Julia has two years since its 1.0 release.
But it will eventually take over Python, unless something third comes and takes the cake before it. Julia is simply much better: More consistent, better designed, faster, more flexible, more extendable and with better tooling.
But I'd say most people can just wait. If Python is working fine for you, and it's not going anywhere in the next 10 years, why not just wait? At that point Julia will be more mature, with better learning resources and a better ecosystem. You can always just pick it up then.
Can anyone recommend some good resources to learn Julia? I am hoping there is something akin to the Rust Book [1] since watching youtube videos is too slow and the exercism option listed on the website is too cumbersome.
I'm a researcher, doing lots of numerical work both professionally and in hobby projects. While Julia has a lot of technical merits, there are just some superficial, syntax-level design decisions that strongly rub me the wrong way: 1-based indexing (makes interfacing with C code hard), explicit begin/end (verbose & ugly) and column-major indexing (personal preference). I understand that these follow in the footsteps of Fortran (and matlab), but they always feel wrong to me. I grew up on C-based languages, and these things made working with R and Lua difficult whenever I had to interface with any non-numerical code (which somehow even most of my research projects ended up needing). It's a weird hill to die on, but I personally will avoid Julia (and actively discourage students from using it, in case I need to work with their code) for as long as I can due to these design decisions.
Who am I to say what hills are worth dying on? But I don't want you to die at all - in case you're ever compelled to switch, I can't help you with end, but OffsetArrays.jl can help with your indexing woes, and I think there are packages to rotate matrices too (though not certain of that).
I'm expecting somebody to add them to LLVM soon. I'd talked to folks at Apple about that some time back, but they weren't able to tell me at the time what their plans were for adding it. That said, I am fully expecting them to just add it to LLVM themselves. If not, somebody in the community will do it. Once that's done, Julia will just pick it up.
I’m not sure what “them” is. Apple has AMX, a Neural Engine, and a GPU which are abstracted through the Core ML and Accelerate libraries.
AMX is not exposed as an instruction set like NEON. I guess this question applies to all of the numerical analysis tool chains: will they be able to leverage Apple silicon coprocessors/accelerators?
I was talking specifically about the ARM ISA extensions. The accelerators are a different question of course. I understand the folks working on the Linux port are trying to reverse the GPU ISA. Apple could of course also just publish that, but I'd put the odds on that lower than the ISA extensions. That said, Apple has sometimes surprised me on these kinds of things.
The most important aspect of any language are the trade-offs and values held by the designers and community. Julia takes a zero-compromise oath to approachability and performance. Julia is not (yet!) a perfect language, but those values make me want to invest in the ecosystem.
In my tests I frequently switch between cpython, pypy and julia (depending on the libraries/task I want to perform) and I haven't found the JIT overhead to be worse than pypy on average.
Count me as one of the 1-based index haters, but I do love multiple dispatch and the language in general. As a language for explorative tools and analysis it's on par with Python (strict preference between the two according to taste).
To me the biggest flaw currently is the poor "catch" syntax for exception handling. There are countless spots where exceptions are incorrectly caught at random points due to the catch-all semantics hiding/masking/breaking stuff. This is one area where I really find the syntax has been chosen poorly and it's causing real damage.
Agreed on error handling actually. It doesn't quite have the feel that it should in a modern language. I think error handling and lifetimes/mutability are two of the top things we're looking at for a fundamental remodel in 2.0.
PyPy is definitely used behind the scenes in many places. IMHO you don't often hear about it because if you're ready to take the performance hit that comes with python you're probably not trying to squeeze the best out of it all the time.
There was also the lack of compatibility with existing packages - a big issue in the past, but nowadays it's pretty rare.
You can often just run your program through both and measure whether using PyPy makes sense for your task. Frequently the free speedup is very welcome, especially for long or repeating jobs.
Even when used opportunistically like this PyPy is still tremendously useful.
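"Run it through both and measure" needs nothing beyond the standard library. A minimal, interpreter-agnostic harness (the workload function here is made up for illustration) - run the same script under CPython and PyPy and compare the reported times:

```python
# Time a representative hot loop; run this file under each interpreter.
import timeit

def workload():
    # Stand-in for your real hot loop: sum of squares below 10_000.
    return sum(i * i for i in range(10_000))

t = timeit.timeit(workload, number=200)
print(f"200 runs: {t:.3f}s, result={workload()}")
```

If the PyPy number is meaningfully lower on your real workload, the switch pays for itself; if not, you have lost nothing but a minute of benchmarking.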
I don't know. I also don't use PyPy much, since Python+Numba is actually fast enough most of the time, and I always see PyPy as a fallback to check whether I can squeeze a little more performance before running a task.
No, from my own (relatively limited) experience with Numba: if you enable nopython mode, it's about the same speed as Julia, which is the same speed as C.
The thing is that Numba is only applicable for simple numeric code. Last I checked it didn't even support custom classes. In fact, last I checked it didn't even support Numpy - to support "Numpy" it had to internally re-implement much of Numpy, which really says something bad about its use cases. In contrast, the Julia JIT speeds up the entire language from string processing to set operations.
Edit: To not be misleading: Julia and C (and Numba) have the same speed only in the simple cases you can apply Numba to. In more diverse workloads, C pulls ahead of Julia for various small reasons.
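The "simple numeric code" Numba handles well is essentially tight scalar loops. A minimal sketch (hedged: the try/except fallback makes it runnable even where Numba isn't installed; with Numba present, `njit` enables nopython mode):

```python
# Sketch of a Numba-friendly scalar loop. If Numba is absent, fall back
# to plain Python so the example still runs.
try:
    from numba import njit          # nopython-mode JIT decorator
except ImportError:
    njit = lambda f: f              # no-op fallback

@njit
def geom_sum(r, n):
    # Sum of the first n terms of a geometric series, as a tight loop.
    total, term = 0.0, 1.0
    for _ in range(n):
        total += term
        term *= r
    return total

print(geom_sum(0.5, 10))  # 1.998046875
```

Loops over scalars and arrays like this are exactly Numba's sweet spot; step outside it (custom classes, string handling, arbitrary objects) and you fall back to the interpreter, which is the asymmetry with Julia the comment above describes.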
I wonder whether I can compile Julia code into a Windows DLL - for instance, with AVX2 and DirectCompute included?
Currently writing C++/17 and HLSL. Would like to evaluate something higher-level. However, I think it’s unreliable in the long run to redistribute and support complicated packages like Python runtime or LLVM. Users mess with environment variables, update Windows, run antimalware, etc. Process startup time also matters, I don’t want to wait 40 seconds for the first output.
> "Because Julia’s compiler is different from the interpreters used for languages like Python or R, you may find that Julia’s performance is unintuitive at first. If you find that something is slow, we highly recommend reading through the Performance Tips section before trying anything else. Once you understand how Julia works, it’s easy to write code that’s nearly as fast as C."
If Julia needs a "Performance Tips" section to produce fast code, I might as well use Python.
The "speed" of Julia comes from LLVM, but there is nothing stopping Python from using LLVM as well where it _makes sense_ (which is the case with XLA in TensorFlow, for example).
I see no added value in learning Julia over existing tools; there is nothing revolutionary and nothing that could alleviate future risks.
I don't want to disparage Julia, it's actually a very nice language, and I was very excited to learn it a couple of years ago.
But, honestly, I think their adoption at this point is less "linux-like" driven and much more "apple-like". In that, the language is 'ok', but the company is going to INCREDIBLE lengths with respect to shrewd marketing and buzz-creation at this point.
Which is admirable but also kinda worrying at the same time.
This "Julia marketing conspiracy theory" that many people on HN seem to believe is so bizarre. What big tech company do you think is behind this incredible, shrewd and presumably well-funded marketing campaign? Julia is the only new major programming language of the last decade that doesn't have a major tech giant backing it. Adoption and development are pretty much entirely grass roots. If you see a lot of enthusiastic posts about Julia on HN, that's because there are a lot of actual people out there using Julia who love it and write posts about it.
Hi Lyndon. Yes, but that's exactly what I meant! I think my phrasing was off. For a company of that size, I've seen very good activity promoting the julia brand, both officially and through word-of-mouth networks. (your own Cambridge meetups notwithstanding). Therefore, I think much of the hype is at least partly that, rather than just the technical merits of the language (which I agree it has plenty). I don't remember this kind of 'buzz' before v1. Back then it was just people who saw promise in its features. Now people seem to be promoting it quite actively.
I've seen a shift in the winds, that's all I'm saying. I didn't mean to come off so negative. (certainly not as negative as Chris took it!)
Thanks for the explanation. I feel that Julia has always been well received on HN ever since we publicly announced it in 2012. I believe that post v1, there are just more users out there and more blogs are being written, more companies are using it, more universities are teaching it, and hence more stories are making their way to HN.
Nowadays, I find new Julia stories and posts when they show up on HN (as opposed to a few years ago when all you had to do was follow juliabloggers).
Hey, the Julia open source organization did have an undergrad in his senior year working part time on community management, though. Can't leave that out. We don't know if JetBrains or Mozilla had something like that.
@StefanKarpinski I said none of those things. I didn't mean to hit a nerve. I actually agree with you. I was one of those grassroots people who enthusiastically tried to get friends to try it. Perhaps 'shrewd marketing' didn't come off as positive as it sounded in my mind.
PS. One forgets people like Stefan and Jeff are likely to be on HN. Apologies. I'd have been a bit more careful in my choice of words otherwise.
The buzz you see is almost all from people who switched from other languages and found that Julia was a gigantic breath of fresh air. I can say that for me, it completely changed my attitude towards programming in general. Before, programming was something I did sometimes as part of my physics research. Now, it’s also my hobby that I probably spend too much time on.
It’s hard not to get a little evangelical when you go through a change like this.
Yes, they went to INCREDIBLE lengths by, umm, spending a number of years creating an excellent language which drew enthusiastic folks who created INCREDIBLE packages on top of that base.
Seriously, when I learned Python (about 20 years ago) I thought it was amazing, and it was, because it let me do things I wouldn't have otherwise done (by reducing the cognitive load on the programming side so I could think more about my problem than the code).
Julia's giving me that kick again - more expressive than Python, doesn't just glue things together but integrates them, and can make code as fast as any language.
It’s extremely silly, but I don’t really like the name Julia for a programming language. It’s just a bit uncomfortable to have a programming language with a particular, kind of formal-sounding human name like Julia (or like Michael, Lauren, or Jonathan).
It just feels weird to me. I know a number of people (family, friends, colleagues) named Julia.
I honestly think it could have an effect on adoption. People have to say the name a lot in making a choice to adopt a language for a project. Names like C, C++, Java, Python are fairly neutral. “Julia” is just an awkward name in this context, in my opinion.
What about "Ada", "Miranda" or "Haskell"? First names, too, albeit ones much less common these days. ("Linda" the language isn't even in that category, popularity-wise, although the source of the name seems to be a weirder story)
I had no idea that “Ada” and “Haskell” were intended to be people names. Those are not super common names. And I had totally forgotten about Miranda, which I agree has the same issue.
I see, that's what I thought. No problem with naming languages after people, but it's easier if it's an homage to a certain person. Common first and last names alone often point to confusing people, and there might be a certain dissonance between the mental images ("I pulled some Julia's pigtails in kindergarten, now I have her name on a CV?").
The same problem would probably arise if the last names were more common, too. "Pascal" and "Turing" are probably rare enough (though "Pascal" was a bit in fashion as a boy's name in Germany when I was young).
I agree that it's easier when it's an homage to a certain person, and a last name, like Pascal. I actually knew that about Pascal but never gave it a second thought.
According to Wikipedia at least, "Julia" is not named after anyone in particular.
Ada was named after Ada Lovelace, for context. And it’s definitely not a common name anymore, at least in the US. It dropped off significantly through the 20th century with a spike several years ago.
Among colleagues I've discussed this with, some thought it came from a person's name, while others thought it came from Julia sets (which in turn are named after the mathematician G. Julia).
[1] https://www.youtube.com/playlist?list=PLdkTDauaUnQpzuOCZyUUZ...
[2] https://drive.google.com/drive/folders/1WWVWV4vDBIOkjZc6uFY3...