Julia v1.2 (julialang.org)
155 points by ziotom78 62 days ago | 58 comments

Bit late to the party here, but here's a thing that's been bugging me: as a veteran of many deep learning frameworks, I’m 100% convinced of many of the arguments put forward in "Tensors Considered Harmful" [1].

Basically, I can summarise the argument as: for complex, real-world projects involving multidimensional arrays, having to remember which dimension means which thing, and to track these through various tensor transformations, is the 21st-century equivalent of how, prior to high-level programming languages, we had to manually track which registers we were using to store which quantities, and how this changed in different parts of a program.

It was the real dark ages of programming, and we’re in that same place with a lot of complex deep learning models now.

So my question is: has this been discussed at all in the Julia community? Is this on anyone's radar? Since one of the core competencies of Julia is array processing and elegant metaprogramming, I would hope that it can gracefully absorb the idea of named array axes in a way that feels natural and complete, and could start showing up in ML frameworks like Zygote and Flux down the line...

[1] http://nlp.seas.harvard.edu/NamedTensor

There are a bunch of packages exploring ideas around labelled dimensions, although we are still far from the vision of caring as little as we do about registers:

* https://github.com/JuliaArrays/AxisArrays.jl
* https://github.com/davidavdav/NamedArrays.jl
* https://github.com/invenia/NamedDims.jl
* https://github.com/rafaqz/DimensionalData.jl
* https://github.com/ITensor/ITensors.jl
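
To give a flavour of the idea, here is a minimal hand-rolled sketch of reducing over a dimension by name instead of position (this is not the API of any of the packages above; `NamedAxisArray` is a made-up name):

```julia
# A thin wrapper pairing an array with a name for each dimension.
struct NamedAxisArray{T,N,A<:AbstractArray{T,N}}
    data::A
    names::NTuple{N,Symbol}
end

# Look up a dimension's position by its name.
dim(x::NamedAxisArray, name::Symbol) = findfirst(==(name), x.names)

# Sum over a dimension referred to by name, not by number.
Base.sum(x::NamedAxisArray; over::Symbol) =
    dropdims(sum(x.data; dims=dim(x, over)); dims=dim(x, over))

img = NamedAxisArray(rand(3, 4), (:height, :width))
col_sums = sum(img; over=:height)   # 4-element vector; no hard-coded dims=1
```

The point is that `over=:height` keeps working even if the storage order changes, whereas `dims=1` silently breaks.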

While these don't pass dimension labels along in the type, there are also a number of Einsum-like packages which should work on any storage order, and could be taught to interface with such types:

* https://github.com/Jutho/TensorOperations.jl
* https://github.com/under-Peter/OMEinsum.jl
* https://github.com/mcabbott/TensorCast.jl

In Julia, as long as a custom type implements the relevant Base methods (through overloading), it will work with anything and at no performance cost (since most base Julia types are written in Julia anyway), including every ML library. For example, Unitful.jl keeps track of physical dimensions and integrates perfectly with the DifferentialEquations.jl library.
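
A minimal sketch of that claim, using a hand-rolled `Meters` type rather than the actual Unitful.jl API:

```julia
# A custom number type that overloads a few Base methods and then
# "just works" with generic code such as Base.sum.
struct Meters <: Number
    val::Float64
end
Base.:+(a::Meters, b::Meters) = Meters(a.val + b.val)
Base.:*(k::Real, m::Meters)   = Meters(k * m.val)
Base.zero(::Type{Meters})     = Meters(0.0)

lengths = [Meters(1.5), Meters(2.5)]
total = sum(lengths)   # generic Base.sum, no special-casing for Meters
```

Because `sum` is written generically against `+` and `zero`, the custom type flows through it with no glue code and no wrapper overhead.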


This release got delayed a bit, but I think it was for great reasons and really showcases the strengths of the Julia community and core developer team.

My understanding (please correct me if I'm wrong somewhere) is that in Julia, for releases which are supposed to be non-breaking such as 1.2, the Julia devs run the test suites of every registered Julia package on the stable version and then on their release candidate, go through any regressions to identify unforeseen breakages, and then either fix them in the Julia release candidate or, if the package relies on an internal implementation detail which is not guaranteed to be stable, try to help the package owners fix the regression. That's seriously impressive, and while I'm sure it happens in other languages, I hadn't heard of a programme like that and thought it was worth mentioning here as the happy reason why this release was delayed.

This process is common for modern languages with modern tooling and that's a good thing.

The Rust project has a tool called `crater` that can build all crates in crates.io and run their tests, a process that takes a couple of days. It also has a github bot that allows running crater on PRs on demand.

This is super super useful and something that all programming languages should have. It allows you to:

* make sure that a new Rust release doesn't break code that compiled with old releases.

* assess the impact of a PR implementing a breaking change (e.g. due to a bug fix)

* obtain quantitative data about language usage for language evolution.

This last point is probably the most impactful. When I was working on C++, a question like "How many people are doing X?" would often come up.

What did we do? Stakeholders in C++ meetings would travel back home, grep their codebases for the X pattern, and report back 6 months later. And that's it. That's all the evidence we could hope to get. And it was often heavily biased by the different coding guidelines and styles at the different companies.

In Rust, you open a PR that makes doing X trigger a compilation error, schedule a crater run, and get a list of all places in all crates in crates.io where the error is triggered. This gives you quantitative information about how often X is done in the wild, and you can then go and analyze the code doing X and obtain even more information.

It's hard to convey how important this is. You'll never have to read discussions about "I've never seen X in the wild" vs "I see X in the wild all the time". Those discussions make no sense when you have a tool that can actually tell you the answer. Every programming language being evolved today should have something like this.

It's partially the same for Scala. They build the (currently) 185 most-used open-source projects. Very important, since they just did a major overhaul of the stdlib and are working on a major overhaul of the compiler (Scala 3.x/Dotty).

Don't you run into sampling bias this way since not everything is published on crates.io? (I'm not saying that having that tool is a bad thing, just saying that maybe that tool doesn't give you the actual answer ...)

There's no better data set, and it's not a big assumption to assume closed source/unpublished code has roughly the same usage patterns.

For R and CRAN this is also being done. Maybe not with 100% coverage (there are 14k packages on CRAN) but an attempt is made to fix packages by R-core developers if breakage occurs.

Perl/CPAN has been doing this for years - http://cpantesters.org/page/about.html

Here's an example of a CPAN module (Moose) which is tested against every release of Perl (going back to 5.8.3 from 2004) against multiple OS/Platforms - http://matrix.cpantesters.org/?dist=Moose

Good work and keep it up.

A nice language for scientific computing after Fortran (some might hate it, but I liked it during my CS studies).

I hope Julia succeeds like Python. I have been using Python for quite a while for scientific computing, machine learning, and as a general-purpose language for web services development. I wanted to use Julia for a machine learning problem predicting prices (time series forecasts), but familiarity with and tool support of Python always held me back, and with Facebook's Prophet (fbprophet) library I could do it easily.

I will definitely give Julia a try in coming months. I like this competition and innovation in scientific computing.

> some might hate it, but I liked it during my CS studies

Isn't a lot of the frustration regarding Fortran aimed towards libraries written by PhDs who don't know a thing about writing maintainable code (because that's not their specialty - not their fault really) and on top of that are so intelligent and specialized in their own field that you end up with even more impenetrable code, while also being dependent on said library because nobody else understands the problem domain in question to the same degree as those PhDs did (emphasis did, to make things worse[0]) when they wrote the library?

That sounds like a frustration layer cake to me, but it's not exactly the language's fault.

[0] https://medium.com/@fjmubeen/ai-no-longer-understand-my-phd-...

I recall, at my first job, improving a PhD candidate's Fortran code that ran experiments on our various mixing rigs.

His initial code's interface was "?", and you entered integer numbers to execute various functions, and there were multiple levels here, so you had to memorise the numbers and where you were in the program :-)

I added prompts (with Mixed case Hollerith statements) that showed the options at each level and what they did.

This was to reduce the risk of a botched run, which, when we scaled up to 1:1 in an 8-1 m dia tank, could cost £20k - the cost of a small flat at the time.

Do it. I'm using it atm after years of Python and I like it.

Julia has a very well-thought-out syntax and runtime.

I hope to see it succeed in the server-side web development area

There is https://genieframework.com/ for a fullstack framework.

Awesome, thanks.

I'd actually want it to get more traction on the DevOps side. The language seems really nice to use; they just need to make it super easy to deploy (think static compilation). Basically, it could take over the niche where Go is right now.

I think this [1] is the latest on static compilation -- they're working on it

[1] https://github.com/JuliaLang/PackageCompiler.jl

That would be nice but it's probably unlikely

That's not what Julia's use case is

Does anyone know where Julia has been deployed for production use cases at companies? Understand it’s quite an undertaking but seems like the fast computations might make it seem worthwhile in certain industries?

Julia isn't really well suited for production environments. And by that, I mean on some server or something. In finance it's gaining marketshare as an exploratory analysis tool for traders and quants though. The code is typically productionized in C++ or whatever in the same way Python or Matlab code is.

Is this a common thing? I'd argue it's an anti-pattern, similar to how we prefer devops over splitting purely development and purely operations nowadays.

At least in the cases I've seen, the first authors proclaimed it done and "just needing a port to production", only to discover pesky things like massive numbers of bugs (no testing in the originals) or unusable performance (it turns out that what you can do on a 64-core cluster overnight is not suitable for doing realtime).

It's very common for larger projects requiring sophisticated algorithm development coupled with demanding runtime requirements such as high throughput, low latency, or high reliability. In such cases, the separate toolchain inefficiency can be justified by higher developer efficiency and the advantage of having two separate code bases developed two ways by two different teams. The two versions can be compared functionally to find many errors that would otherwise be missed due to bad assumptions, library issues, compiler issues, incomplete test coverage, etc.

What's the alternative? Write it with prod in mind, when you don't know if the idea is actually viable, and end up wasting a lot more time/code if it fails?

The idea for exploratory programs is just to see if the core idea is actually viable in the first place, before spending time on things like making it robust/fast.

Ofc, if you could make it robust/fast for free/cheap, then it makes sense to call it an anti-pattern — you made it fragile for no good reason — but I’m not clear on how to obtain robustness/speed without quite a bit of work

My preferred alternative would be not to use a totally different toolchain for prototyping and production, and to make the people responsible for the prototype also responsible for bringing it to production.

I've heard all kinds of arguments against it, but they mostly boil down to "data scientists/analysts/whatever" can't be arsed to bring it to production (or "do not have the right skill set") and vice-versa for software developers.

I've seen much more success with a developer and less-technical person working closely in a team and bringing a project from prototype to production together.

A lot of the time, you'd just be wasting the developer's time. In finance, the research quants are going to have to test and explore dozens of ideas before even one is suitable enough to actually deploy in production. It doesn't make sense for the quant and developer to work closely together in C++ or whatever when there's a high likelihood that the trading idea doesn't work in the first place.

Also, the quant often isn’t good enough at computer science to actually develop a production system himself. And even if he was good enough, it isn’t his comparative advantage so he probably should be focusing on research.

It's certainly _very_ different to MATLAB and Python.

After all, a lot of the performance-sensitive libraries that are written in C/C++/etc. for Matlab/Python are, in Julia, written in Julia itself.

It really depends what you mean by production.

Why is it not suitable for production environments? It's a compiled language.

It depends on the application. Sometimes you need the finer level of control over resources provided by a statically-compiled language like C++. An example would be latency-sensitive applications that cannot tolerate the unpredictable delays inherent in garbage collection and just-in-time compilation. Other issues might include the current availability or stability of the packages.

By this logic, Java would be unsuitable for production, too. I think perhaps you're conflating "production" and "real-time".

Personally, what I'm really looking for when I ask if something is ready for production is, first and foremost, is it buggy? Second, has it stabilized, or are breaking changes still common? Third, is package/dependency management something that ops can work with without too much hassle, preferably without just relying on Docker to keep it manageable like a parent closing the door to their teenager's pigsty of a room.

Also, you don't have to be faster than the bear. Julia's chief competitor, Python, is often used in production even though it fails the 3rd test and only semi-succeeds at the first.

Java is most certainly unsuitable for production in many domains. That doesn't mean it's unsuitable for all, nor does it mean it isn't "production ready" in general. The GP context was about Julia not being well-suited for certain production environments. I gave one of a number of possible examples where that can be true. Just as with Java, that doesn't mean it's unsuitable for all production environments.

It's probably true that Julia isn't suited to real time, but then again the normal prod environments these days are not running on a real-time OS.

I certainly used to write real-time code in FORTRAN IV on DEC RT11 - thank goodness for the DECUS tapes for example code.

Julia should work fine in production (apart from having less mature tools for integrating with other services, compared to popular languages that get direct support from the providers). That said, as others have noted, it's not optimal in some domains: really real-time work that can't tolerate millisecond lags (though honestly, latency can generally be kept low by having multiple replicas handle the requests), embedded systems (because of the high memory profile, for example), and situations where you can't keep the session alive (when the server restarts the application at every request, so the JIT has a large effect on response time).

The difference, in my opinion, between exploratory code and production code is that production code must be correctly structured (comply with an architecture defined by the developers, properly separated into functions, readable), fully documented, unit tested, integration tested, properly tested in homolog/staging, and have logging, metrics, health checks, and extensive error/exception handling.

So in general, during the exploration phase you can use the already-tested production code/libraries and extend them without much care for the above, until you're satisfied with the approach/results (usually in a fast REPL/Jupyter/Revise.jl cycle). Then you start refactoring and implementing all the above to prepare for deployment (and during that testing you'll probably find errors in your exploratory code, which may lead to another exploration phase). Eventually you'll end up with solid production-level code.

However, the built-in facilities for structuring code are somewhat lacking in Julia. I would really like a Protocol system of some kind...

It might be a question of avoiding implicit interfaces in your code (or documenting and testing them extensively in case you need them). They are great for creating really generic libraries (like the Tables.jl interface), but for business-logic production code it might be better to restrict the polymorphism of the language (using stricter dispatch rules, avoiding duck typing, using type declarations as type asserts).
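
For example, one way to restrict polymorphism along those lines (a hand-rolled sketch; `Order` and `notional` are made-up names, not from any package):

```julia
# A concrete type for the business domain instead of duck typing.
struct Order
    qty::Int
    price::Float64
end

# Dispatch is restricted to Order; the ::Float64 annotation both
# documents the contract and acts as a return-type assert.
notional(o::Order)::Float64 = o.qty * o.price

o = Order(10, 2.5)
notional(o)   # 25.0
# notional("oops") would be a MethodError at the call site, rather
# than a confusing failure deep inside the function.
```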

Thanks for the great insight - is there any particular reason why this might be the case?

Julia is extremely focused on data science and scientific computing. Almost everyone using it is concerned with things like ML libraries, dataframes, matrices, etc. It's not that Julia couldn't be a good general-purpose language; it's just not the focus. Compare that to the JVM, for example, which is mostly focused on servers serving requests.

Also, it’s pretty clear from the existing Julia libraries and homepage that serving requests isn’t what’s important to them.

The Wikipedia page for Julia provides several examples of its use in high-profile companies who say it helped them to make tons of cash faster.

You will find a few talks from JuliaCon 2019.

I’ve recently switched to Julia for all my side projects and I’m loving it so far! For me the killer feature is the seamless GPU integration.

What do you mean by seamless GPU integration, is it some specific package?

I have heard a lot of nice things about Julia and I've decided to learn it.

Can someone point me to excellent resources/tutorials for learning Julia?

I mean, I can search the Internet, but I'm not sure I'll stumble upon really good tutorials (I don't want to go through Medium or TutorialsPoint kinds of tutorials).

Any help?

I have just finished Think Julia, which was a pretty good introduction and is available for free online: https://benlauwens.github.io/ThinkJulia.jl/latest/book.html It's a little slow for seasoned programmers, but it has a lot of exercises, and the idiomatic Julia parts tend to stick out more in the simple two-line snippets.

The best place to start is this list of resources: https://julialang.org/learning/

Excluding the university course listings, I think most of the resources there should be up-to-date (post Julia 1.0).

Once you have a basic feel for the language, the forum (discourse.julialang.org) and Slack are both very welcoming.

Thanks a lot!

Honest question - why did the authors feel like creating a new language instead of contributing to/expanding something that exists? E.g., Crystal seems to be extremely close to what Julia offers, or am I wrong? Are there any fundamental decisions that put Julia above Crystal?

Julia came before Crystal?

Also, there's a lot of stuff that Julia has which Crystal doesn't (and vice versa since they focus on very different domains)

Ignoring the Crystal issue that others already answered, why didn't Julia developers instead create a kick-ass JIT for Python or R?

And the answer to that is that the semantics of the language were designed from the start for efficient JITing. Thanks to the language design, Julia can get close to C performance with a comparatively simple JIT approach (resolve types, compile a function statically with LLVM just like C, then use multiple dispatch to select the right function at runtime; simplified, yes, but roughly like this). Compare this with all the tracing-JIT heroics that JavaScript implementations or PyPy pull off, while still getting performance far from what Julia achieves.

Actually, it will often choose the right function to dispatch at compile time. In fact, in certain cases it will even erase the function call entirely and replace it with the final output, and remove unreachable branches from the generated code.
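
A toy illustration of that, using only base Julia:

```julia
# With concrete argument types, dispatch is resolved at compile time
# and the whole call chain can be inlined.
area(r::Float64) = pi * r^2
double_area(r::Float64) = 2 * area(r)

# Inspecting with `@code_typed double_area(1.0)` typically shows `area`
# inlined into `double_area`, with no runtime method lookup left.
@assert double_area(1.0) == 2pi
```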

I'm not really seeing any similarities other than both languages having type inference (which is a pretty standard feature of newer languages and as mentioned, Julia was earlier). Julia is competing with Matlab/R/Numpy. That's not what Crystal is targeting at all. I suppose you could imagine that libraries could be written for Crystal, but except for Python's Numpy, these tend not to go anywhere in practice (Ruby itself has the now mostly dead Sciruby for example).

Crystal and Julia are not alike at all. Crystal is a statically compiled OOP language; Julia is dynamic with multiple dispatch, and is not OOP.

The Crystal guys should have put their effort into Nim or something like it, imo.

Why didn't the crystal guys put the effort into Julia instead?

Going off of wikipedia, it looks like Julia started in 2012, two years before Crystal.
