
Array Programming with NumPy - headalgorithm
https://www.nature.com/articles/s41586-020-2649-2
======
smabie
Everyone uses numpy, but not because they want to or the API itself provides
some massive value. They use it because unless you're calling into C for
literally every operation, Python is far too slow to get any scientific
computing work done.

And because of this, numpy has to be much larger and less orthogonal. It needs
to provide a wide range of functions that ensure that the user will never have
to drop into regular Python and suffer a 100x performance hit. This is
unfortunate, and prevents numpy from having the beauty and orthogonality of true
array languages like APL, J, or kdb+/q.

This problem makes it difficult for new scientific users of Python to write
fast code. Everything can be fast and then you add one if statement and the
performance falls off a cliff. I've helped many people in various labs with
this; it is a recurring problem that researchers face.
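
To illustrate the "one if statement" cliff, here is a minimal sketch, not a benchmark. The function names and the clipping task are made up for illustration; the point is only that a per-element branch forces a Python-level loop, while the same logic can stay inside NumPy via np.where.

```python
import numpy as np

def clip_negative_loop(values):
    """Per-element branch in a Python loop: every iteration runs in the interpreter."""
    out = np.empty_like(values)
    for i in range(values.shape[0]):
        if values[i] < 0:
            out[i] = 0.0
        else:
            out[i] = values[i]
    return out

def clip_negative_vectorized(values):
    """Same logic, but the branch is expressed as np.where and runs in compiled code."""
    return np.where(values < 0, 0.0, values)

rng = np.random.default_rng(0)
data = rng.standard_normal(100_000)

loop_result = clip_negative_loop(data)
vec_result = clip_negative_vectorized(data)

# Both versions compute the same array; only the execution path differs.
same = bool(np.array_equal(loop_result, vec_result))
```

Timing the two with timeit on your own machine should show the gap; the exact ratio depends on the array size and hardware.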

As such, the ease of use that Python supposedly has for scientific computing
is a lie. The APIs are extremely complex, from numpy to Pandas to statsmodels
and scikit-learn. And they have to be, because in scientific Python you can't
compose small pieces of code without paying a performance penalty.

In contrast, Julia is much easier to learn, easier to use, and is pretty much
better than Python for scientific applications in every way imaginable. In
other words, it is strictly superior. The performance is consistently fast and
array manipulation is built into the core language, unlike with Python. You
are free to write Julia code instead of shelling out to C.

I'll probably get downvoted, but Python should not be used for new scientific
computing projects. I've had tons of experience with ML/financial Python and
after trying Julia, I'll never go back willingly. The difference in speed,
complexity, ergonomics, and expressivity is so stark it's hard to imagine why
anyone would choose Python for a greenfield scientific computing project in
2020, especially since Julia can trivially access all of Python's ecosystem.

~~~
geoalchimista
I think you may be overgeneralizing. If the speed of number crunching is the
only concern, then sure, few scientists would use Python, R, or MATLAB. This
is not the case in reality. Whether you would run into a speed limit of course
depends on the size of the problem. It's fine to use a "slow" scripting
language to process a small data set on a personal laptop. And I suspect that
this is what most everyday numpy users do.

Once you move to the other end of the spectrum where the size of the problem
requires a supercomputer, that is when the Python ecosystem becomes clunky and
painful. I think that could be an area that Julia shines in. But even there,
I'm not sure if Julia is a clear winner yet, as the field is still largely
dominated by C, C++, and Fortran.

Also, check ref. 56 in the article. They did cite Julia, among others.

~~~
smabie
The point is that Julia is easier to use and faster than Python for just about
any scientific task. Even if Python is fast enough for you (which it probably
isn't), it's harder to use and less expressive than Julia. This is why I
consider it strictly inferior: it is not better at anything at all, only
worse.

Python is so slow it is even unusable on trivially small data sets. Also it
uses a massive amount of ram. I loaded a 2gb csv file the other day in Pandas,
and it consumed over 40gb of ram. That's totally unacceptable in my view.

~~~
geoalchimista
> The point is that Julia is easier to use and faster than Python for just
> about any scientific task.

Try to make a plot. Last time I checked, the time to first plot was still a
pain point of Julia.

> it's harder to use and less expressive than Julia.

I agree that Julia is more expressive because it's a Lisp in disguise. This is
definitely a plus. But in practice, I don't find Python to be "harder to use."
There is good consistency among the mainstream packages in terms of API. For
example, if you have a NumPy function np.mean, you can assume that Pandas
would have a method .mean() for the DataFrame, and Dask would have that for
DaskArray as well. Not always, but things are moving toward that direction.

> I loaded a 2gb csv file the other day in Pandas, and it consumed over 40gb
> of ram.

Have you tried the `engine='c'` option when calling pandas.read_csv? There is
also a `chunksize` option for reading the file in chunks that may be useful.
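
A rough sketch of the chunked approach, assuming a CSV with hypothetical columns `id` and `value` (an in-memory string stands in for the real file here). Passing explicit narrow dtypes along with `chunksize` is one way to keep peak memory down; whether it helps enough for a given 2gb file is something to measure, not assume.

```python
import io
import pandas as pd

# Stand-in for a large file on disk; the column names are made up for illustration.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 0.5}" for i in range(10))

total = 0.0
rows = 0

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file at once.
reader = pd.read_csv(
    io.StringIO(csv_text),
    engine="c",
    dtype={"id": "int32", "value": "float32"},  # narrower types than the defaults
    chunksize=4,
)
for chunk in reader:
    # Aggregate per chunk so only one chunk is held in memory at a time.
    total += float(chunk["value"].sum())
    rows += len(chunk)
```

For a real file, replace the StringIO object with the path and pick a chunk size in the hundreds of thousands of rows.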

~~~
rsfern
Time to first plot is still painful even in julia 1.5, sure.

But after paying that startup cost, the speed boost can be transformative in
terms of the flexibility and dynamism that it buys. I've been using Pluto.jl
lately, and interactive data analysis feels like way less of a burden than
working in jupyter.

As for "harder to use": writing fast vectorized numpy code can take a lot of
mental effort. For one project I ended up forcing things into this 5-D array
broadcasting mess, where in Julia it's way simpler to just write intuitive
code that's performant without bending over backwards to avoid explicit loops.
For numpy code you can "just" drop down to Cython, but for PyTorch you can get
stuck with thinking up some clever broadcasting solution.
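
A small 3-D version of that trade-off, as a sketch only (the pairwise-distance task and function names are illustrative, not from the project mentioned above). The loop version reads naturally but is slow in Python; the broadcast version is fast but requires reasoning about inserted axes.

```python
import numpy as np

def pairwise_sq_dists_loop(a, b):
    """Explicit double loop: easy to read, slow in pure Python."""
    n, m = a.shape[0], b.shape[0]
    out = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            diff = a[i] - b[j]
            out[i, j] = np.dot(diff, diff)
    return out

def pairwise_sq_dists_broadcast(a, b):
    """Broadcasting: a becomes (n, 1, d), b becomes (1, m, d), difference is (n, m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    # Sum the squared differences over the last (coordinate) axis.
    return (diff ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
a = rng.standard_normal((50, 3))
b = rng.standard_normal((40, 3))

loop_dists = pairwise_sq_dists_loop(a, b)
broadcast_dists = pairwise_sq_dists_broadcast(a, b)

# Same values up to floating-point rounding.
match = bool(np.allclose(loop_dists, broadcast_dists))
```

Note that the broadcast version materializes the full (n, m, d) intermediate array, so the speed comes at a memory cost that grows with each extra axis.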

------
powersnail
I've come to the realization that the real relationship of Python and Numpy is
not a language/library relationship, but a script/engine relationship, similar
to writing scripts in a game or map editor.

Numpy is not just a helping hand to python; instead, it is this efficient,
giant, comprehensive engine, and python is merely the interface used to feed
it data, operate it, and read from it.

Of course, under the hood, it's still just language and library. But the
mental model is quite different.

------
mattip
The real value of NumPy is that it sets the standard for what we expect in an
array processing interface. Of course NumPy did not invent the idea, it builds
on many others, but it is the de facto standard for what we expect. In time,
another language or array processing library will probably replace
Python/NumPy/SciPy, but due to the convenience of the API, I predict whatever
takes over will look a lot like NumPy.

------
joker3
Finally I can take NumPy seriously now that it's been published.

------
nycticorax
I think it's cool that NumPy is being recognized in Nature, but I'm a little
surprised that "the NumPy paper" wasn't published 10-ish years ago...

~~~
alpineidyll3
Back then Travis was still licking wounds over leaving academia.

