
Julia's Arraypocalypse Release - ViralBShah
https://github.com/JuliaLang/julia/issues/13157
======
johnmyleswhite
Probably worth noting that the GitHub issue in question relates to work being
done on Julia 0.5, a future release that is not yet ready to ship and won't
ship for several months.

Within a few weeks, Julia 0.4 will be shipping, which is not focused on
changes to arrays.

------
StefanKarpinski
The "arraypocalypse" name is somewhat tongue in cheek, referring to the fact
that this change will include some breaking changes. To provide some very
brief motivation for the two most significant changes that are planned:

1\. Array slices will create views rather than copies. This is just a matter
of performance – although possible, it is currently way too annoying to write
copy-free array code. This change will make it the default.

2\. Slicing will behave more like APL: the rank of a slice of an array will be
the sum of the ranks of the indices. This is a regular and powerful scheme, but
sometimes at odds with how slicing works in Matlab.

These changes will make it easier to write high-performance, general code in
the language while, with some care, not making linear algebra harder.
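
A minimal sketch of the APL-style rank rule, written in Python rather than Julia and not taken from the Julia source. The helper `result_shape` is hypothetical: it models each index only by its shape and assumes the shape of the slice is the concatenation of the index shapes, so the rank of the result is the sum of the ranks of the indices.

```python
def result_shape(index_shapes):
    """Model of the rank rule: concatenate the shapes of the indices.

    Each index is described only by its shape: () for a scalar,
    (n,) for a range or vector of n positions, (m, n) for a matrix
    of positions. This is an illustration, not Julia's implementation.
    """
    shape = ()
    for s in index_shapes:
        shape += tuple(s)
    return shape


# Scalar row index plus a length-3 range: a rank-1 result.
row = result_shape([(), (3,)])
# Two ranges: a rank-2 result.
block = result_shape([(2,), (3,)])
# Two scalars: a rank-0 result (a single element).
elem = result_shape([(), ()])
# A 2x2 matrix of indices plus a range: a rank-3 result.
fancy = result_shape([(2, 2), (3,)])

print(len(row), len(block), len(elem), len(fancy))  # 1 2 0 3
```

Under this model a scalar index always drops its dimension, which is the point where the scheme can differ from Matlab-style slicing.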

~~~
masklinn
> Array slices will create views rather than copies. This is just a matter of
> performance

Except for breaking any existing code which mutates slices (unless slices are
COW?), and implicit sharing can have disastrous effects on memory usage.
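
The kind of breakage described here can be sketched with Python's standard library, used as a stand-in because Julia's planned behaviour is not shown in this thread. Slicing a `bytearray` copies, while slicing a `memoryview` shares the underlying buffer, so the same write has different effects on the original.

```python
data = bytearray(b"abcdef")

# Slicing a bytearray produces a new, independent bytearray.
copied = data[0:3]
copied[0] = ord("X")            # only the copy changes

# Slicing a memoryview produces a view onto the same buffer.
viewed = memoryview(data)[0:3]
viewed[1] = ord("Y")            # writes through to data

print(data)    # bytearray(b'aYcdef')
print(copied)  # bytearray(b'Xbc')
```

Code that mutates a slice and expects the original to stay untouched is correct under copy semantics and silently wrong under view semantics.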

~~~
StefanKarpinski
How can implicit sharing have "disastrous effects on memory usage"? It will
certainly break some code, hence the large amount of warning and the
"pocalypse" appellation. Note that this is how array slices work in NumPy and
that ecosystem has not collapsed.

~~~
masklinn
> How can implicit sharing have "disastrous effects on memory usage"?

Small, long-lived slices keep the original arrays alive implicitly. Oracle
recently undid the same optimisation in Java's strings because of that exact
issue (which obviously broke code the other way around, code relying on cheap
substrings would now take O(n) instead of O(1)). Erlang developers are also
warned early and often about that behaviour in binaries.
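
A rough illustration of the retention problem, again using Python's `memoryview` as a stand-in rather than Julia, Java, or Erlang. The buffer size is arbitrary. A 16-byte view keeps a reference to the whole buffer, while an explicit copy does not.

```python
big = bytearray(10_000_000)          # roughly 10 MB of zeros
small_view = memoryview(big)[:16]    # 16-byte view into the buffer
small_copy = bytes(big[:16])         # independent 16-byte copy

del big                              # drop our own name for the buffer

# The view still references the entire buffer, so it cannot be freed.
print(small_view.nbytes, len(small_view.obj))  # 16 10000000
# The copy holds only its own 16 bytes.
print(len(small_copy))                          # 16
```

Whether this matters in practice depends on how often small slices outlive their parent, which is the usage-pattern question the rest of this exchange turns on.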

~~~
StefanKarpinski
Totally different typical usage patterns. Taking substrings and throwing away
the original string is the norm for text. Taking a column of a matrix and
throwing away the original matrix is not. Different domains call for different
design tradeoffs – this change stems from extensive practical experience with
writing numerical algorithms.

~~~
lqdc13
Would there still be an easy way to make copies?

Like advanced indexing
([http://docs.scipy.org/doc/numpy/reference/arrays.indexing.ht...](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing))
in the Python world?

~~~
elcritch
You could just use a list comprehension, e.g. `[ x for x in arrayview1 ]`, or
perhaps a copy function will be added to Base.

~~~
StefanKarpinski
There's already a copy function that does this.

~~~
elcritch
Great, Julia seems to have a good assortment of those "I need XYZ" functions
(for scientific programming) already in Base. I should check apropos more
often.

------
jsnathan
Could someone maybe tl;dr what is going on here?

I tried out Julia a few years back, but didn't find it ready for serious
use (yet). Am I right to assume that this means the language is still in
serious flux?

~~~
ViralBShah
Compared to a few years back, Julia is in seriously good shape. The long and
short of this is that while the language has largely been stable, the array
library in Julia will get an upgrade to become a lot more modern and clean.

For most users, this should not change much. However, it gives us the ability
to do some really amazing things with our array infrastructure going forward.

~~~
Lofkin
Also, any plans for the stats ecosystem? (dataframe, dashboards, imputation,
data cleaning etc)

It is still a bit wanting.

~~~
ViralBShah
A lot of interesting things are going on in the DataFrames area - especially
with the recent work on NullableArrays by David Gold and John Myles White. A
blog post will be out soon describing it all. In my opinion, Escher.jl makes
it really easy to build amazing dashboards and is progressing well.

But, a lot more remains to be done, and contributions would be really welcome
in these areas that take Julia's statistical computing to the next level.

~~~
elcritch
I'd like to add a note about my experience with doing this; I ended up forking
Distributions.jl and added a fitting function for the Weibull distribution.
Still need to clean it up and do a pull request.

The amazing thing about Julia I've come to enjoy is how clean the source code
is, how easy it is to find functionality, and that it's all written _IN_ Julia. That
means I can modify libraries that are normally C or C++ for perf reasons. It's
changed the prospect of needing to add functionality from one of hesitation to
feeling that it's actually enjoyable.

~~~
KenoFischer
I'm really glad to hear this! I'd like to think that the community and
inspectability argument for Julia is actually one of the strongest ones out
there, but it's not easy convincing people who haven't had that experience.

~~~
elcritch
It really is pretty great.

------
east2west
I love Julia for its expressiveness but I am somewhat confused about its
dynamic typing and its powerful type system. Are we to write mostly
Python-like functions and rely on multiple dispatch and JIT code generation for speed?
Duck typing is one reason I stopped using Python for technical computing. I
tried Julia out a few months ago and found that type inference has some holes
that make type annotation brittle for function parameters. I finally gave up
when I found out functions have no return type in Julia.

Hopefully Julia can work out all the language features and mature into a good
technical language. It would also help if there were a guide to the canonical
programming style. I prefer static typing but thought that by using type
annotations I could come close to static typing while still retaining the
freedom of quick-and-dirty code. I believe for the current Julia I was wrong.

~~~
johnmyleswhite
If you'd like to know canonical programming style, you should read the code in
Julia Base. Learning how to write good Julia is somewhat difficult because the
language occupies an unfamiliar position between traditional static languages
and traditional dynamic languages.

Can you provide some concrete examples of the functionality you're looking
for? What exactly does "quick-and-dirty" code look like to you?

~~~
east2west
I don't use "quick-and-dirty" as a derogatory term, so mostly I mean no error
checking for rare occurrences or not ensuring correct values are being passed
through. An example would be in Python if I just write MatrixA * MatrixB
without any checking. I should check that they are both matrices if that is
what I want, and that is what I did for reusable code. For one-off data
analysis or computation, no-checking is certainly quick.

------
jbssm
Is there any progress when it comes to the slow package loading times of
Julia? That made it quite unusable as a data exploration language when I last
tried it about 1 year ago, but perhaps there was progress in that regard.

~~~
ViralBShah
Julia 0.4, already in the RC phase, has precompilation thanks to Jameson
Nash's work. It has been enabled for several packages, and gives sub-second
load times.

------
Xcelerate
I'll be glad once this is all implemented. I recently rewrote most of my
functions that took abstract vectors and matrices to instead take concrete
arrays along with a few index parameters. I did this because I realized
SubArrays were making huge memory allocations inside the tightest parts of my
loops for some unknown reason. Since I didn't want to spend days debugging
that weirdness, I just took the simpler (but cruder) option of passing in the
"kitchen sink" to each function.

