
Using D and std.ndslice as a Numpy Replacement - bionsuba
http://jackstouffer.com/blog/nd_slice.html
======
RogerL
Does D have code for: plotting, optimization, probability distributions,
machine learning, Fourier transformations, masked arrays, financial
calculations, structured arrays (read a CSV from disk, get named columns based
on the header), SVD, QR and Cholesky decomposition, eigens, least squares,
Levenberg Marquardt, matrix inverse and pseudoinverses, integration, Runge
Kutta, interpolation, bsplines, fft convolves, multidimensional images,
KDTrees, symbolic equation solvers, merge/join of data sets, etc.?

Because I use almost all of these _every single day_ (I don't do
multidimensional images or b-splines much at all). Are those all in standard
libraries, fully documented, backed by 60-year-old, fully debugged code
(LAPACK, etc.) that I can reliably email to anyone across the world, who can
then immediately run and modify my code because it is such a standard? I
honestly don't know, but I'm guessing not.

I use Python/Numpy/Scipy/Pandas/Matplotlib because everyone else in the world
knows and uses them; they are a standard. Yes, my np.mean() might be slower
than your map(). I almost always don't care. That misses the forest for the
trees.

The article might be a good argument for why library writers might consider
building out D's standard library to support numerical computation, I dunno.
But no one is going to use D for serious number crunching without that
infrastructure in place. People moved from Fortran and Matlab to Python not
because it is fast, but for the environment. These language tricks are cute
and all (I like D well enough, don't get me wrong), but it ain't why we are
using Python.

At this point, if I were to switch languages to something without a lot of
adoption, I'd lean towards Julia. It also has a modern language design, but it
is written from the ground up for numerical computation. I can't think of any
reason I'd ever reach for D.

~~~
9il
Yes, Julia is amazing! At the same time, if you want to write a package for
Julia you _may_ need to use C/C++. D is going to have integration with Julia
in 2016 ;)

D already has good integration with Python. You may want to read this article
[http://d.readthedocs.org/en/latest/examples.html#plotting-
wi...](http://d.readthedocs.org/en/latest/examples.html#plotting-with-
matplotlib-python) (it may be a little bit outdated).

~~~
tadlan
Why would one need to use C or C++ to write a Julia package? It's as fast as
native code, so there's no need to use multiple languages.

~~~
9il
Julia is really fast in 95% of cases, but the other 5% still "make the
weather". Pairwise summation is an example. I will post D vs. Julia benchmarks
next week ;)
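
(For context: pairwise summation recursively splits the array and sums the
halves, which gives much better floating-point accuracy than a naive
left-to-right loop at roughly the same cost. A minimal Python sketch of the
idea, not any library's actual implementation:)

    def pairwise_sum(xs, block=128):
        # Base case: a small chunk is summed directly.
        if len(xs) <= block:
            return sum(xs)
        # Recursive case: split in half and sum each part; the rounding
        # error grows as O(log n) instead of O(n).
        mid = len(xs) // 2
        return pairwise_sum(xs[:mid], block) + pairwise_sum(xs[mid:], block)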

~~~
skariel1
I would say the biggest benefit of D is static typing. In Julia, you can run a
simulation and only discover after half an hour that you misspelled a function
name.

------
jboy
Credit to the D developers for providing a concise, carefully-designed library
for N-D array processing. The chained method invocations demonstrate D's UFCS
(Uniform Function Call Syntax) nicely. And it's a definite bonus that you can
use underscores as digit separators in long integer literals (e.g.,
`100_000`).

But if you use Python + Numpy/Scipy/Matplotlib and you're looking for a
modern, compiled language for execution speedups or greater flexibility than
what Numpy broadcasting operations provide by default, I would recommend Nim.
It's as fast as C++ or D, it has Pythonic syntax, and it already includes many
of D's best features (including type inference, UFCS, and underscores in
integer literals).

And best of all, you _don't_ need to rewrite all your existing Python+Numpy
code into a new language to start using Nim.

The Pymod library we've created allows you to write Nim functions, compile
them as standard CPython extension modules, and simply drop them into your
existing Python code: [https://github.com/jboy/nim-
pymod](https://github.com/jboy/nim-pymod)

The Pymod library even includes a type `ptr PyArrayObject` that provides
native Nim access to Numpy ndarrays via the Numpy C-API [
[https://github.com/jboy/nim-pymod#pyarrayobject-
type](https://github.com/jboy/nim-pymod#pyarrayobject-type) ]. So you can
bounce back and forth between your Python code and your Nim code for the cost
of a Python extension module function call. All of Numpy, Scipy & Matplotlib
are still available to you in Python, in addition to statically-typed C++-like
iterators in Nim+Pymod [ [https://github.com/jboy/nim-pymod#pyarrayiter-
types](https://github.com/jboy/nim-pymod#pyarrayiter-types) ,
[https://github.com/jboy/nim-pymod#pyarrayiter-loop-
idioms](https://github.com/jboy/nim-pymod#pyarrayiter-loop-idioms) ]. The Nim
for-loops will be compiled to C code that the C compiler can then auto-
vectorize.

~~~
9il
D has integration with Python/Matplotlib too =P
[http://pyd.readthedocs.org](http://pyd.readthedocs.org)
[http://d.readthedocs.org/en/latest/examples.html#plotting-
wi...](http://d.readthedocs.org/en/latest/examples.html#plotting-with-
matplotlib-python)

~~~
jboy
It looks like you need to copy your D array to a newly-allocated Numpy ndarray
before you can pass it to Python. So there's no binary PyArrayObject
interoperability between D & Python (right?). Copying large N-D arrays all the
time sounds slow...

(That Matplotlib example uses the function `d_to_python_numpy_ndarray` in the
PyD project, which I found defined here:
[https://github.com/ariovistus/pyd/blob/master/infrastructure...](https://github.com/ariovistus/pyd/blob/master/infrastructure/pyd/extra.d#L82)
. It clearly allocates a new Numpy array:
[https://github.com/ariovistus/pyd/blob/master/infrastructure...](https://github.com/ariovistus/pyd/blob/master/infrastructure/pyd/extra.d#L97)
)

Also, I couldn't find any examples of invoking D functions from Python. In
fact, I could only find mentions on the D mailing list of people reporting
that they _couldn't_ get it to work:
[http://forum.dlang.org/post/rdhrvzhhwxgfyxzjevfu@forum.dlang...](http://forum.dlang.org/post/rdhrvzhhwxgfyxzjevfu@forum.dlang.org)

By compiling (transpiling) to C, Nim really does have an unfair advantage in
the interoperability challenge...

~~~
9il
ndslice was merged into the DLang master repo today. It would not be a problem
to fix PyD. (Compiling to C is crispy.)

EDIT: Exposing D functions to Python:
[http://pyd.readthedocs.org/en/latest/functions.html#exposing...](http://pyd.readthedocs.org/en/latest/functions.html#exposing-
d-functions-to-python)

------
zardeh
And here we have a case of why microbenchmarks don't work. What you're
measuring here isn't a speed difference in the mathematical code; it's a
constant-time overhead from calling into the numerical libs. Increase your
array size by 100x and this will become evident.

Why do I say this? Because inlining the Python function to

means = numpy.mean(numpy.arange(100000).reshape((100, 1000)), axis=0)

from the original example in the article cut the benchmark time down from
around 215 µs to 205 µs in my testing. That was done by removing a single
Python bytecode instruction.

It's quite likely that the D numerical code is actually slower than the
LAPACK-based Python numerical code, but you're hiding this in the
constant-time overhead of a few Python function calls.
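
(To make the overhead claim concrete, here is the kind of probe I mean, as a
small sketch using only numpy and the standard timeit module; if fixed
per-call overhead dominated, the tiny array would cost nearly as much per call
as the big one:)

    import timeit
    import numpy as np

    # Time np.mean on a tiny array and on one 100,000x larger.
    small = np.arange(100)
    big = np.arange(10000000)

    t_small = min(timeit.repeat(lambda: np.mean(small), number=1000)) / 1000
    t_big = min(timeit.repeat(lambda: np.mean(big), number=10)) / 10
    print("small: %.1f us/call, big: %.1f us/call" % (t_small * 1e6, t_big * 1e6))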

~~~
bionsuba
As I stated in the article, I did not include the array creation in the
benchmark, in order to be fair to Numpy with its slow initialization times. The
only Python code that I benchmarked was the numpy.mean line.

~~~
zardeh
That still doesn't matter. So much of the work is being done in Python that
you're benchmarking the overhead of Python, not the speed of Numpy.

I mean, this is a terrible benchmark. Compared to pure Python, the numpy code
is about 4-5x faster on my machine, whereas in reasonable real-world
benchmarks, numpy is hundreds of times faster than pure Python.

For reference my benchmark was

    
    
        %timeit [sum(row)/len(row) for row in mat]
    

and it completed in about 900 µs, given that mat was a Python 2D array.

These results are as valuable as the ones pypy gave me:

    
    
        pypy -mtimeit -s "[sum(row)/len(row) for row in [[range(100000)[i*j] for i in range(100)] for j in range(1000)]]"  
        1000000000 loops, best of 3: 0.00101 usec per loop
    

Edit:

In other words, because the code returns a Python array of length 1000, what
you're benchmarking is Python's speed of array instantiation vs. D's
mathematical speed. It's obvious that D will win. The only fair way to
benchmark the mathematics here is to compare against a D implementation of an
O(n) algorithm that returns a single value, or an O(n^2) algorithm that
returns a 1D array or a single value, etc. Otherwise the Python overhead of
creating O(n) objects will overshadow the actual mathematical calculations on
those O(n) objects.

~~~
bionsuba

        you're benchmarking is Python's speed of array instantiation vs. D's mathematical speed. It's obvious that D will win.
    

Not true: the D code also has the overhead of array initialization.
std.array.array is called, which allocates the results of the range into an
array on the GC heap, which everyone bemoans as being slow as a dog.

Plus, I don't see why this is an invalid benchmark when this is perfectly
normal Numpy code, the kind that you see all the time. That was the reason
behind the benchmark: to find a common piece of Numpy code and see how the
equivalent std.ndslice code stacked up. I don't see how it's fair to say
"Python is really slow at this one thing, so it's not fair to compare it to D
in that area".

~~~
semi-extrinsic
He's absolutely right, this is a crappy benchmark. Increase the array sizes by
at least a factor of 100 to get anything meaningful.

The fact that someone who is "the review manager for std.ndslice's inclusion
into the standard library" does not understand how to profile numerical
algorithms makes me very skeptical of using D for any numerical project.

Plus, the syntax looks god-awfully unintuitive. A main advantage of Python is
that you often get "code that looks like what it does". The D "basic example
with a benchmark", OTOH, looks almost obfuscated. To wit: a Fortran version is
more readable, is fewer lines of code(!), and of course kicks D's butt when it
comes to speed:

    
    
      program p
      implicit none
      real, dimension(100,1000) :: data
      real, dimension(1000) :: means
      integer :: i, j

      ! fill data with 0..99999, matching numpy.arange(100000).reshape((100, 1000))
      forall(i=1:100, j=1:1000)
        data(i,j) = (i-1)*1000 + j - 1
      end forall

      ! repeat the means calculation 1,000,000 times for timing
      do i = 1, 1000000
        forall(j=1:1000) means(j) = sum(data(:,j)) / size(data(:,j))
      end do
      end program p
    

Disclaimer: I wrote this on my phone, only 95% sure that it will compile and
run correctly. Save it in means.f90 and compile with `gfortran -Ofast
means.f90`. This calculates the means 1 million times (the do loop over i
around the second forall); I bet it will be an order of magnitude faster than
D per means calculation if you time it with plain time (the *nix command).

~~~
bionsuba
"Increase the array sizes by at least a factor of 100 to get anything
meaningful."

Ok, let's do that and see what happens:

    
    
        python -m timeit -s 'import numpy; data = numpy.arange(10000000).reshape((1000, 10000))' 'means = numpy.mean(data, axis=0)'
    

D code

    
    
        import std.range : iota;
        import std.array : array;
        import std.algorithm;
        import std.experimental.ndslice;
        import std.datetime;
        import std.conv : to;
        import std.stdio;
    
        enum testCount = 10_000;
    
        void f0() {
            auto means = 10_000_000.iota
                .sliced(1000, 10000)
                .transposed
                .map!(r => sum(r) / r.length)
                .array;
        }
    
        void main() {
            auto r = benchmark!(f0)(testCount);
            auto f0Result = to!Duration(r[0] / testCount);
            f0Result.writeln;
        }
    

Results

    
    
        Python: 14.1 msec
        D:      39   μs
        D is 361.5x faster

~~~
semi-extrinsic
Ok. So three points remain:

* how do I know that the calculations aren't actually optimized away in all that code I can't understand? E.g. what happens if you make f0() return the means array instead of being a void function?

* Given the same number of lines (or characters), are you sure you can't write equally fast Python code?

* I tested the Fortran version on a slightly slower computer (got 14.7 msec on the Python version). Fortran runs at 0.7 μs, i.e. > 55x faster than D, with less code that's more readable to boot.

~~~
bionsuba
"how do I know that the calculations aren't actually optimized away in all
that code I can't understand? E.g. what happens if you make f0() return the
means array instead of being a void function?"

If you can't understand the code, how did you know it was a void function?
Please stop with the hyperbole; it's not adding anything to the discussion.

Updated code:

    
    
        import std.range : iota;
        import std.array : array;
        import std.algorithm;
        import std.experimental.ndslice;
        import std.datetime;
        import std.conv : to;
        import std.stdio;
    
        enum testCount = 10_000;
    
        auto f0() {
            auto means = 10_000_000.iota
                .sliced(1000, 10000)
                .transposed
                .map!(r => sum(r) / r.length)
                .array;
            return means;
        }
    
        void main() {
            auto r = benchmark!(f0)(testCount);
            auto f0Result = to!Duration(r[0] / testCount);
            f0Result.writeln;
        }
    

Results:

    
    
        Python: 14.1 msec
        D:      41   μs
        D is 343.9x faster
    

"Given the same number of lines (or characters), are you sure you can't write
equally fast Python code?"

IMO program size is an almost meaningless statistic outside of code golf
challenges. LOC is not an indicative measure of code readability, usefulness,
or organization.

For example, your Fortran code was 11 lines while the D function (with the
return) is eight lines.

"I tested the Fortran version on a slightly slower computer (got 14.7 msec on
the Python version). Fortran runs at 0.7 μs, i.e. > 55x faster than D, with
less code that's more readable to boot."

Just goes to show why Fortran is still used in a lot of scientific areas.

But two things:

1. Fortran is a much simpler language than D or Python, and it is much harder
to do general-purpose work in (so I'm told by Fortran programmers; I don't
know Fortran myself). So when your program needs to do anything other than
number crunching, it's normally done in a separate language. Using D you can
have everything in one code base.

2. This article was about Numpy and std.ndslice because those are two areas
that I know about, and Numpy is a very popular library. Bringing up Fortran's
speed here is like commenting on how much faster C++ is in a thread about
Ruby.

Also, readability is subjective; I believe the D code is more readable than
the Fortran you wrote. Different strokes.

~~~
semi-extrinsic
> If you can't understand the code, how did you know it was a void function?
> Please stop with the hyperbole; it's not adding anything to the discussion.

I understand "void" perfectly fine, but understanding how D optimizes your
code is a completely different matter. Playing devil's advocate further, you

> IMO program size is an almost meaningless statistic outside of code golf
> challenges. LOC is not an indicative measure of code readability,
> usefulness, or organization.

No, but the numpy code is undoubtedly much simpler, and very general. How does
the D example look for a 3D array where you want to average over the second
dimension?
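
(For reference, the NumPy version of that 3D case is a one-liner; a quick
sketch:)

    import numpy as np

    # Averaging over the second dimension is just a different axis argument:
    arr3d = np.arange(24.0).reshape((2, 3, 4))
    means = np.mean(arr3d, axis=1)  # result has shape (2, 4)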

> For example, your Fortran code was 11 lines while the D function (with the
> return) is eight lines.

You're neglecting the library imports.

> So when your program needs to do anything other than number crunching, it's
> normally done in a separate language. Using D you can have everything in one
> code base.

I do agree other languages are much better for e.g. string processing. But for
the applications where you can afford to trade simpler code for 50x slower
performance, I'd say you're not really caring about performance at all, so why
not just use Python? If performance is mission critical, a two-language code
base (Python + C/C++/Fortran/CUDA) is not that hard to do, fairly common, and
will give you the required performance.

> Bringing up Fortran's speed here is like commenting on how much faster C++
> is in a thread about Ruby.

No; once you start saying "I want more performance than Python", it's
obviously interesting to see what level of performance a "low-level" language
gets, to put the result in perspective.

Now, I'm not trying to beat down on D, so please try to interpret this as
constructive criticism. Finding "where does it fit in the scientific toolbox"
is what makes or breaks a language's adoption in the scientific community.
Just look at R, it's at the same level of performance/abstraction but in a
separate niche from Python, and doing very well. The same goes for Matlab
(which has e.g. Simulink).

------
cannam
I occasionally rewrite Python+NumPy signal processing code in C++ for purposes
of packaging and integration with native apps, so I read these examples with
an eye to how they compare with typical C++, rather than with NumPy. They
compare very well, and it would never have occurred to me to look into D as a
possibility for this sort of code.

I'm guessing the GC might rule it out for many cases where you do signal
processing in C++, but I may as well ask: what's the deployment side of things
like? Can I easily build a shared library and use it from a C++ application?

~~~
jboy
You might also be interested to check out Nim. It transpiles to C before
invoking the C compiler, so it runs as fast as C++ and has excellent
C-compatibility (and by extension, excellent C++-compatibility).

Compiling a shared library is as easy as passing the "--app:lib" option to
the Nim compiler: [http://nim-lang.org/docs/nimc.html#compiler-usage-command-
li...](http://nim-lang.org/docs/nimc.html#compiler-usage-command-line-
switches)

The GC is optional; you can manage your memory manually if you prefer:
[http://nim-lang.org/docs/manual.html#types-reference-and-poi...](http://nim-
lang.org/docs/manual.html#types-reference-and-pointer-types)

The Nim tutorial is here if you want to have a quick skim: [http://nim-
lang.org/docs/tut1.html](http://nim-lang.org/docs/tut1.html)

~~~
carljv
Please stop this. It's approaching spam. Someone put a lot of work into
building a new capability into a language and writing up a lengthy blog post
about it. If you want to do the same for what you think is good about Nim and
submit it to HN, please do. Repeatedly intercepting people's question about D
with "you should look at Nim" is obnoxious. If D users did this in a thread
about Nim, I imagine you would find it frustrating.

------
snydly
Am I correct in thinking that this is only reasonable if you're already using
D? The switching cost seems too high if you're using Python for everything.

I tried the Armadillo C++ library a while ago
([http://arma.sourceforge.net/](http://arma.sourceforge.net/)). The speedup
didn't seem worth the time spent learning the syntax.

~~~
bionsuba
I completely understand that for existing projects it might not make sense to
switch, but as I say at the start of the article

    
    
        why you should consider D for your next numerical project.

~~~
snydly
I just ran your column means benchmark. It is pretty impressive... maybe my
C++ is trash. Never saw that much speedup with Arma. I'll try it out :)

------
p4wnc6
The central claim of the post seems summarized by this quote:

> For example, when using a non-numpy API or functions that don't use Numpy
> that return regular arrays, you either have to use the normal Python
> functions (slow), or use np.asarray which copies the data into a new
> variable (also slow).

but I disagree strongly with this.

First of all, if there is a common use case for some set of operations that
need to be performed on very large data (the type of data you'd look to NumPy
to handle), then generally there is already a subpackage within
numpy/scipy/scikits/pandas/etc. that deals with that use case and natively
handles it with NumPy arrays, with no switching cost to convert back and forth
between lists or tuples or whatever.

And, of course, when a list/tuple-heavy API is only meant to deal with small
data, it's not a problem to use NumPy's facilities for converting between
ndarray and the builtin array types. In cases where you're dealing with a huge
breadth of small data, that casts doubt on whether you should be using NumPy;
it doesn't cast doubt on whatever the other list/tuple-heavy API is. And
probably parallelization (or even the buffer stuff I mention below) is a fine
solution in that case.

Second, in a lot of cases you can make use of the Python Buffer Protocol to
share the underlying data of a NumPy array without copying it. This won't help
if some other API expects Python lists or tuples, but the great thing about
dynamic typing in Python is that all that really matters is that whatever
underlying buffer type you need implements whatever methods that other API
expects to call.
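
(A minimal sketch of that zero-copy sharing, using only numpy and the builtin
memoryview; the variable names are mine:)

    import numpy as np

    a = np.arange(10, dtype=np.int64)
    mv = memoryview(a)                     # buffer-protocol view of a's memory
    b = np.frombuffer(mv, dtype=np.int64)  # a new ndarray over the same buffer
    b[0] = 42
    assert a[0] == 42                      # no copy was made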

You can always write your own extension type that adheres to the Buffer
Protocol and also provides whatever API is needed to conform to some other
library, so the ability to create these double-sided adapters (one side
sharing data with NumPy, the other side appearing as a drop-in acceptable data
structure for the other library's API) is very powerful and generic. It might
take some getting used to the first few times you do this, but if you use
tools like Cython to help, it's really quite easy to do, easy to maintain, and
it solves a surprisingly wide range of NumPy integration problems. In fact,
these things generally already exist for most problems you will run into, and
ultimately they often boil down to simple Cython-based wrappers around C
bindings to the other Python API you're working with.
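
(To illustrate the shape of such a double-sided adapter without writing any
Cython, here is a pure-Python sketch built on NumPy's `__array_interface__`
hook; the class name and the list-style methods are hypothetical stand-ins for
whatever the other library expects:)

    import numpy as np

    class SharedVector:
        # One side: NumPy gets zero-copy access via __array_interface__.
        # Other side: list-style consumers get __len__ and __getitem__.
        def __init__(self, arr):
            self._arr = np.ascontiguousarray(arr)

        @property
        def __array_interface__(self):
            # Delegate to the wrapped ndarray: same memory, no copy.
            return self._arr.__array_interface__

        def __len__(self):
            return len(self._arr)

        def __getitem__(self, i):
            return self._arr[i]

    vec = SharedVector(np.arange(5.0))
    view = np.asarray(vec)  # a view over vec's buffer, not a copy
    view[0] = 99.0
    assert vec[0] == 99.0   # the write is visible through the adapter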

I would argue that the existence of this Buffer Protocol adapter strategy
alone is enough to say that the switching cost to D is virtually never worth
it, and still pretty speculative even if you're starting a brand new numerical
computing project.

Finally, most Python libraries that heavily rely on the list or tuple APIs are
not meant for large data (those APIs mostly already just use NumPy, as I
mentioned, or else they use generators and let the end user decide which array
type will eventually be instantiated as the results are consumed). It's not
common, by intentional design, for list/tuple-heavy APIs to need to cope with
large data, so when someone says something off the cuff like "What do you do
when some library API needs lists and you've got NumPy arrays?" it _sounds_
like a worrisome case, but in practice it's really, really uncommon that such
a situation arises and no one else ran into it before you and no one has
created a NumPy-compliant solution already. It's not impossible, of course.
Just unlikely, and probably not important enough to use as a basis for
language choice, unless you're facing a really special sort of API problem.

Edit: None of this should read at all as a criticism of the D language or this
particular implementation of ndarray data structures. All that stuff is great
and anyone wishing to use D absolutely should.

I'm _only_ arguing against the post's central thesis insofar as it is used to
justify considering D as an alternative to scientific Python. The problem that
the post points out already has solutions solely in the Python ecosystem, many
projects have handled that problem before, and the problem is pretty rare and
esoteric anyway, so it's probably not a good thing to use as the basis of an
important choice like which language to use, or whether to switch languages.

There could be many reasons to prefer D over scientific Python depending on a
given use case, and there could be certain situations where switching from
Python to D is a good idea. Whatever those cases may be, the central issue of
this post, performance degradation caused by NumPy-to-other-API compatibility,
is _not_ one of them.

~~~
srean
Very nicely put, and I agree. What I was expecting to see in the article's
list of numpy problems wasn't there. The major problems that make Numpy
performance lag behind C++ or Fortran are the extra level of indirection
needed to access an element (via stride pointers), and the extra copies that
vectorization forces on you. Numexpr can help in certain cases of the latter,
but it is still quite limited in the types of expressions it can handle. It is
my belief that with some local static analysis, both can be mitigated
somewhat.
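
(A minimal sketch of the copies problem and the numexpr workaround; the array
names are arbitrary:)

    import numpy as np
    import numexpr as ne

    a = np.random.rand(1000000)
    b = np.random.rand(1000000)

    # Plain NumPy: each operator materializes a full temporary array
    # (a * b, then 2 * a, then their sum), so the data makes several
    # round trips through memory.
    r1 = a * b + 2 * a

    # numexpr evaluates the whole expression in cache-sized blocks,
    # without allocating the intermediate arrays.
    r2 = ne.evaluate("a * b + 2 * a")

    assert np.allclose(r1, r2)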

It would be really interesting to see whether D's nd object tackles these
issues. The copying of arrays across function boundaries, as you correctly
pointed out, is mostly a red herring.

~~~
tadlan
Numba obviates these issues.

~~~
srean
I see your excitement about Numba; I guess you follow it closely, or perhaps
have other valid reasons. So, all the best. However, the last time I tried to
use it, about 4 months ago, installation was a bitch and it would crap out
compiling some functions. It holds promise of course; so do the alternatives.

What I think the current post is about is bringing similarly nice and
performant abstractions to D. I love Python a lot, but I have to pay extra
attention to make it performant and to avoid stupid typos. I have to rewrite
parts in Cython or Weave (sadly, the latter gets no love anymore). The larger
the code base becomes, the more time I have to spend in the tail part of the
'tooth to tail' ratio. In D, many of these things are taken care of by
default, so I have to worry a lot less about them. The other fantastic thing
is that DMD has super fast compilation times. It used to be faster than Go in
compilation time, but the latter has caught up in that department (while
grossly lacking in others). D binaries execute a lot faster (when compiled
with LDC or GDC). D may lack some libraries here and there, but that does not
worry me as much; everyone has to start somewhere, and to be honest Python is
lacking in that respect compared to R. What would be worrisome is the presence
of structural aspects that might get in the way of such an ecosystem emerging.
I don't see anything of that kind in D.

In any case, I don't quite see how Numba obviates the issues I mentioned. It
does not change the ndarray representation, and the extra indirections lie
right there. OTOH, some copy elision I would grant.

I can give concrete examples. The isotonic regression implementation in
scikits.learn was absolutely pathetic. It has been rewritten several times,
and its performance is still severely lacking in spite of being Cythonized.
You can take a stab at it with Numba and see what you can do. In C++ (D would
have been nicer), the very first idiomatic STL implementation was faster than
any of these rewrites. I wrote it once and moved on. But C++ is a horrifying
mess; if there were a language that gave me C++ performance (not asking for
much) with none of the mess and numpy-level expressiveness to boot, that would
be sweet. Julia, Nim, D, and PyPy are some of the contenders trying to reach
this holy grail.

~~~
p4wnc6
Numba certainly does not obviate all of these issues. I think user `tadlan` is
referring to some of the compiler optimizations, like loop unrolling or fusion
(e.g. noticing that two subsequent loops can be 'fused' into the subordinate
execution block of one single loop). These things can offer speed-ups even
beyond NumPy, and they can work even when the Python code you start out with
already uses NumPy.

The thing is (and `tadlan` seems unaware of this), a lot of this stuff just
fails in production environments and hits corner cases that the Numba compiler
does not handle. (I'm talking about the first part of the Numba compiler
pipeline, where it converts to _Numba_ IR, not yet to LLVM IR, and does things
like examine the CPython bytecode to convert the representation from CPython's
stack-based form to a register-based form that will be compatible with LLVM
and ultimately with the actual machine itself.) In _that_ compilation step,
the only things that can be handled are things that the Numba team (I used to
be a member of it) explicitly supports. They don't support a full-blown
compiler for the entire Python language, nor even every type of NumPy
operation. That's not a knock against Numba at all -- it's a _specializing_
compiler, and obviously they need to prioritize what to support and make
longer-term goals about supporting more general things. But the point still
remains that you cannot just assume that if you call `jit` on any arbitrary
Python code, it will always become faster. In some cases, it can even become
slower.

I suspect `tadlan` is very interested in Numba and enthusiastic about knowing
the taxonomy of Numba details, but it does not seem like that user has had
real world experience trying to get Numba to work in production, and seeing
all of the numerous buggy and missing features. I don't want to diminish
anyone's enthusiasm for Numba, so it's probably best just not to engage with
`tadlan` about it. That user's mind seems made up already.

~~~
Gtifn
As a former member of the Numba team, what is your outlook on the technology
and the broader Continuum ecosystem?

I'm looking at Numba and friends (DyND, Blaze) for a new stack, but I'm not
sure where the development arc will end up vs., say, Julia. I'm also curious
about the sustainability of Continuum's business model and practices in the
medium and long term.

Any thoughts on this? I understand if you are limited in what you can say, but
I'm open to any nuggets.

~~~
p4wnc6
This is one of those questions that is extremely hard to predict. Even though
I was part of that team, it doesn't mean I have any special insight into what
will happen.

A lot will depend on sources of funding. Will folks like Nvidia start
sponsoring Numba, and if so what will it mean for support of Nvidia
alternatives like OpenCL?

It seems like NumbaPro as a stand-alone for-pay product is not viable on its
own, but that claim could be wrong based on more recent data that I don't have
access to. So external sponsorship may be necessary.

One form of this could be through Continuum's already-established business
model of consulting and support services. But then the question is whether the
nature of those consulting and support projects will allow developers to
actually further the cause of Numba, or merely to hack in poorly conceived
features demanded by the consulting and support customers. Since Numba is open
source, it should be easy enough for anyone to follow the commits and
discussions on GitHub and form their own opinion about what direction it is
going.

The other question that is always hard is staffing. Far and away the
colleagues I had the chance to work with on the Numba team were amazingly
good. But it's not clear if working solely on Numba can justify the sort of
salary that would be required to attract very top engineers and grow the team.
You might start to see more interns and/or post-doc type labor feeding into
Numba, and again I don't know what that will mean for the project ... could be
good or bad.

At the same time, you've also got a lot of active development for Julia, PyPy,
and a lot of people still prefer to use Cython rather than jitting functions.
Some people even call into question the entire goal of making something that
is "easy" but also a "black box" \-- like the way just dropping in `jit` works
for people who merely use, but don't understand the inner workings of,
CPython.

It's an exciting area, and the Numba team has as much talent and ability to
claim a significant piece of the tool space surrounding high performance
computing as anyone else. Whether that will pan out for them is still really
hard to predict.

~~~
Gtifn
Thanks very much for your thoughts.

What do you think about the foundational tech of numba itself?

Do you think it is any more of a black box than, say, Julia? Is there anything
about it that would impede extension into a more stable, predictable, and
feature-rich product?

BTW looks like Intel is doing some stuff with Julia:
[https://github.com/IntelLabs/ParallelAccelerator.jl](https://github.com/IntelLabs/ParallelAccelerator.jl)

There has also been a lot of recent funding to Continuum, and dev of Numba
seems to be going strong. Also some recent work with AMD.

------
tadlan
Use Numba to compile Python loops or array expressions to fast LLVM code, and
the problem is solved. I'm sticking with Python.
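
(A minimal sketch of what that looks like, assuming a recent Numba where array
allocation works in nopython mode, as noted downthread; the function is my own
illustration, not code from the article:)

    import numpy as np
    from numba import njit

    @njit  # compiled to native code through LLVM on first call
    def col_means(data):
        rows, cols = data.shape
        out = np.zeros(cols)       # array allocation inside nopython mode
        for i in range(rows):      # plain loops; no vectorization tricks needed
            for j in range(cols):
                out[j] += data[i, j]
        return out / rows

    means = col_means(np.arange(100000.0).reshape((100, 1000)))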

~~~
bionsuba
Considering that in the benchmarked example, only one line of Numpy code was
used, and it already calls into compiled C, I have a hard time believing that
Numba would catch up to using all compiled code.

~~~
tadlan
Numba compiles entire functions and allows array expressions with allocation
and loop fusion. I don't see the problem.

~~~
bionsuba
The benchmarked code is ALREADY a compiled C function called from Python, and
it still lost.

~~~
tadlan
Numba would still be faster. It would fuse away any intermediates in the code
and remove any overhead of calling into the compiled code.

You also have the option of devectorizing to loops.

Both of these are generally faster than vectorized numpy code.

~~~
p4wnc6
While I agree broadly that the benchmark example in the post is not
representative or useful as a comparison between D and NumPy, I disagree with
your strong insistence on Numba in this case.

There is still _a lot_ of Python code that the Numba-to-LLVM compiler cannot
handle. Yes, it is true that Numba can do a decent job of removing CPython
virtual machine overhead, even for functions in which you statically type the
arguments merely as 'pyobject' -- but not universally.

And there are also a ton of super basic things, for example creating a new
array inside the body of a function when using Numba in 'nopython' mode [1],
that Numba doesn't handle. These things will improve over time, but they may
not improve quickly enough for a given use case, and the D language and this
ndarray implementation may be a fair competitor to Numba in the short-to-
medium term (and could even be superior in the long run, who knows).

A fairer alternative would be comparing with the use of Cython, which for my
money still hands down beats anything like D or Nim/Pymod for performance
enhancement without sacrificing pleasantness of design and development.

Though, of course, none of this stuff holds a candle to Haskell :)

[1] < [http://stackoverflow.com/questions/30427081/creating-
numpy-a...](http://stackoverflow.com/questions/30427081/creating-numpy-arrays-
inside-a-function-decorated-with-numbas-jitnopytho) >

~~~
tadlan
Actually, Numba array allocation in nopython mode is allowed now.

What other numerical features are missing?

------
fizixer
Clickbait. Why should I change my language even if I'm looking for a numpy
replacement?

~~~
bionsuba
How is this clickbait? Did I in any way misrepresent the content of the
article?

If you don't like D and don't want to use it, move on to the next item on the
front page.

