Using D and std.ndslice as a Numpy Replacement

RogerL · on Jan 3, 2016

Does D have code for: plotting, optimization, probability distributions, machine learning, Fourier transformations, masked arrays, finanial calculations, structured arrays (read a CSV from disk, get named columns based on the header), SVD, QR and Cholesky decomposition, eigens, least squares, Levenberg Marquardt, matrix inverse and pseudoinverses, integration, Runge Kutta, interpolation, bsplines, fft convolves, multidimensional images, KDTrees, symbolic equation solvers, merge/join of data sets, etc.?

Because I use almost all of these every single day (I don't do multidimensional images or b-splines much at all). Are those all in standard libraries, fully documented, backed by 60 year old, fully debugge code (LAPACK, etc), that I reliably email to anyone across the world and they can immediately run and modify my code because it is such a standard? I honestly don't know, but I'm guessing not.

I use Python/Numpy/Scipy/Pandas/Matplotlib because everyone else in the world knows and uses them; they are a standard. Yes, my np.mean() might be slower than your map(). I almost always don't care. That misses the forest for the trees.

The article might be a good argument for why library writers might consider building out D's standard library to support numerical computation, I dunno. But no one is going to use D for serious number crunching without that infrastructure in place. People moved from Fortran and Matlab to Python not because it is fast, but for the environment. These language tricks are cute and all (I like D well enough, don't get me wrong), but it ain't why we are using Python.

At this point, if I were to switch languages to something without a lot of adoption I'd lean towards Julia. It also have a modern language design, but it is written from the ground up for numerical computation. I can't think of any reason I'd ever reach for D.

vidarh · on Jan 3, 2016

> Does D have code for: [...]

He covered this, specifically out that Numpy has an advantage in the number of libraries and other resources available for it.

> because everyone else in the world knows and uses them;

No, everyone working in some specialized niches does. And those niches don't even scratch the surface of the number of people who regularly do simple operations on multi-dimensional arrays but who don't need most of what you list.

And sometimes people even in your niche that does depend on the other stuff run into things that are just way too slow, that they have to rewrite in a lower level language.

There are plenty of people for whom this will be very useful without having all the support infrastructure around Numpy in place.

> I can't think of any reason I'd ever reach for D.

D is for people who wants a "C++ done right", where it is implicit that you want a C-like language, and share ideals of C++ such as to whatever extent possible not paying for what you don't use.

Specifically in this case, you'd reach for D if you need multi-dimensional arrays and either prefer a C++-like language over Python or your Python code is too slow and you prefer to avoid a lower level language like C.

That may not be for you, and that's fine. Just like Numpy isn't nearly universal for numerical computing.

> Yes, my np.mean() might be slower than your map(). I almost always don't care.

I find this an amusing objection... I did my MSc thesis on methods for reducing error rates in OCR, and included methods that used various nearest-neighbour variations and statistical methods on bitmap slices and similar.

I did it mainly in Ruby (with some inline C where absolutely necessary for performance - if I remember correctly it was about a dozen lines of code in C) because I cared far more about readable code than fast code, and Python would in my eyes have been a double loss: not the speed of C, yet still ugly and annoying to work with in my opinion.

Seeing the D code examples, I might very well have picked D for it today.

n00b101 · on Jan 3, 2016

>Are those all in standard libraries, fully documented, backed by 60 year old, fully debugge code (LAPACK, etc) ... yes, my np.mean() might be slower than your map(). I almost always don't care.

Hmm ... LAPACK itself is only 23 years old (1992). Its was implemented to be a faster replacement of the "standard" LINPACK library, by effectively exploiting the caches on modern processors.

For most serious users of numerical computing, performance is extremely important.

Also, I think you might be putting far too much faith in an arbitrary assortment of Python libraries. If you think that NumPy is just a thin wrapper on LAPACK, and you are using it in mission or life critical operations, then I think you have some code vetting to do.

Even if some of your chosen libraries do use LAPACK, and even if there is some code in LAPACK that is 60 years old, that does not suddenly mean that all of your chosen Python libraries (and your code on top of them) is all correct. Most serious users of many of the routines that you mention (e.g. "machine learning" and "financial calculations") implement and test these routines themselves in order to ensure their correctness. In very serious model vetting operations, it would be customary to even invest in two different independent implementations of the same routine.

> I reliably email to anyone across the world and they can immediately run and modify my code because it is such a standard?

I don't think "reliable" and "emailing scripts" normally go together. It would be more customary to use a source code control system in a high reliability environment. Personally I have found reproducing Python installations to be very tricky and I would take extreme care in any critical operations that assume two different Python installations produce the exactly the same results (high reliability, like some of the 60+year old codes run by NASA, would even require bug for bug reproducibility).

"Python/Numpy/Scipy/Pandas/Matplotlib" are not the last word on anything, theyt are not standardized as you imply, and they leave vast room for improvement in all dimensions (performance, correctness, reliability, productivity, etc).

9il · on Jan 3, 2016

Yes, Julia is amazing! In the same time, if you want to write a package for Julia you _may_ need to use C/C++. D is going to have integration with Julia in 2016 ;)

D already have good integration with Python. You may want to read this article http://d.readthedocs.org/en/latest/examples.html#plotting-wi... (it may be a little bit outdated).

tadlan · on Jan 3, 2016

What would one need to use c or c++ to write a julia package? It's as fast as native code so no need to use multiple languages.

9il · on Jan 3, 2016

Julia is really fast in 95% cases, but 5% still "make the weather". The pairwise summation is an example. I will post benchmarks D vs Julia next week ;)

skariel1 · on Jan 3, 2016

I would say the biggest benefit of D is static typing. In Julia you can run a simulation and discover only after half an hour that you misspelled a function

tadlan · on Jan 3, 2016

Make sure you use the devect macro or parallel accelerator if you're going to use vector iced code :)

dallbee · on Jan 3, 2016

A lot of that is up and coming.

https://github.com/DlangScience

[x] Plotting: http://code.dlang.org/packages/plot2kill

[+] Optimization: Nothing directly for the purpose, but the intermediate steps are done. And a little bit of work from http://code.dlang.org/packages/atmosphere

[x] Probability Distributions: http://dlangscience.github.io/dstats/api/dstats/random.html and http://dlang.org/phobos/std_mathspecial.html

[-] Machine Learning: Not that I'm aware of, but I vaguely recall some work being done on this

[-] Financial calculations: I couldn't find anything, but I'd be surprised if this wasn't implemented already.

[x] Masked Arrays: Accomplished via language features in combination with ndslice

[x] Structured Arrays: Absolutely. Standard library support as well as a number of 3rd-party libs can do this.

[-] SVD

[?] QR: Afraid I'm not sure what you're referring to.

[+] Cholesky Decomposition: Trivial to implement

[x] Eigenvalues/Eigenvectors: Yes. SciD provides Eigens as well as two other third party packages. https://dlangscience.github.io/scid/api/scid/linalg/eigenval...

[x] Least Squares (linear): Trivial to implement, but also implemented by a few of the 2-d & 3-d graphics libraries.

[x] Least Squares (nonlinear): Nothing, but we both know that this is easily done by applying a function to your independent variable to linearize it in many cases.

[-] Levenberg Marquardt

[x] Matrix Inverse: Provided by SciD

[?] pseudoinverses: I'm afraid my math background doesn't cover that, and I couldn't find anything with a similar naming in the package repository

[x] Integration: SciD provides integration

[-] Range Kutta

[-] Interpolation: Sort of. A few of the graphics libraries have this, but it's asking a bit much to include a graphics library to do interpolation.

[-] Bsplines

[-] fft convoles

[?] Multidimensional images

[?] KDTrees: Probably in one of the 3d graphics libs

[+] Symbolic equation solvers: Some of this was done for Tango back in the day, and it has been ported, but the project looks fairly dead. https://github.com/opticron/libdmathexpr/blob/master/source/...

[x] Merge/join of data sets: Can be done efficiently and easily with core language features

Additionally, anything written for C can be trivially wrapped and used in D. An example is http://scimath.com/, which completes much of what you mentioned.

Are they in standard libraries? SciD needs more time Fully Documented? D tends to encourage good API documentation, and the algorithms are largely ports of "69 year old, fully debugged code".

I think you're point about "reliably email to anyone across the world" is an overstatement. Python is popular, not ubiquitous. D is obviously much less popular. Can anyone run the code? Yes. For them to modify it, they need to learn the language.

You might not care about how long computations take, but I know back at my old university there were dozens of researchers complaining about the resources that had to be spent on supercomputer time, and the annoying amount of time that many calculations involving things like molecular configurations can take. Speeding this up saves money and makes the research less painful.

"No one is ever going to use D for serious number crunching without the infrastructure in place" - Yeah, that's totally true. The infrastructure is a WIP. There's nothing wrong with that.

Your post comes across to me as being rather cynical - but why not be supportive of the good work that's being done?

dsharlet · on Jan 3, 2016

> [?] QR: Afraid I'm not sure what you're referring to.

> [+] Cholesky Decomposition: Trivial to implement

How is it that you can judge Cholesky decomposition to be trivial to implement, while being unfamiliar with QR decomposition?

Cholesky decomposition is not trivial to implement well.

> Your post comes across to me as being rather cynical - but why not be supportive of the good work that's being done?

I had a similar experience to what's happening here myself. I worked on a project that had some very specific requirements for a high performance simulation (described by a differential equation). I worked on a very specialized piece of code for a long time to optimize this differential equation solver, eventually writing a JIT compiler to generate specialized code for each simulation.

I had a D advocate come along and rather vigorously argue that D can solve this problem, without really understanding it, and missing some key requirements.

This sort of comes off the same way, "a numpy replacement". numpy is big and does a lot of stuff. Like Cholesky decomposition, many of these things have trivial implementations, and then they have good implementations. The trivial implementations will take a few hours to write. The good implementations will take a few months, or even longer. Sometimes "not good" means slow, sometimes "not good" means numerical stability issues, etc. If you don't have the right background, these things may not even be obvious.

This is a bit of a pattern with D folks. Several times (this, my previous experience, and others), I've gotten the impression that a D fan comes along, invests a bit of time (a few weeks, months) in a particular domain, and starts making comparisons with other tools that they only scratch the surface of. It can rub people the wrong way.

edit: The symbolic expression reference in this post is another example. Computer algebra is an incredibly difficult thing to implement. I don't know of anything that is harder to implement well than a computer algebra system. Yet, you link a single file under 1000 lines as partially checking the box in your list for symbolic expression manipulation. D might be a highly productive, but it's not that productive.

dallbee · on Jan 5, 2016

I should have provided a key. Where I listed a +, I basically meant that there was either work underway or it was something that could be done yourself without hindering a project too much. I did not intend to imply that symbolic expression was a solved problem, or anything of that sort.

As for QR decomposition, I am actually rather familiar with it, my brain simply decided to fart on the symbols QR without being followed by decomposition.

For Cholesky Decomposition, I'm sticking with "trivial to implement", and I'll even throw in "trivial to implement well": https://github.com/numpy/numpy/blob/v1.10.1/numpy/linalg/lin...

You asked if D had code for a list of math capabilities that you use on a regular basis. My answer says "In many cases, no, but there's work being done toward that goal." No one is saying "we have a feature-complete math library!". There's much more needed than what you listed, and even the things you listed have many items with zero work done toward them. I am not attempting to say "D can do all these things!", and I don't think the author of the article was either.

srean · on Jan 3, 2016

> This is a bit of a pattern with D folks

Nothing D specific here. That's the pattern for every language, be it Haskell, Go, Common Lisp, JavaScript or Java ...

jboy · on Jan 3, 2016

Credit to the D developers for providing a concise, carefully-designed library for N-D array processing. The chained method invocations demonstrate D's UFCS (Uniform Function Call Syntax) nicely. And it's a definite bonus that you can use underscore like a comma separator in long integer literals (eg, `100_000`).

But if you use Python + Numpy/Scipy/Matplotlib and you're looking for a modern, compiled language for execution speedups or greater flexibility than what Numpy broadcasting operations provide by default, I would recommend Nim. It's as fast as C++ or D, it has Pythonic syntax, and it already includes many of D's best features (including type inference, UFCS, and underscores in integer literals).

And best of all, you don't need to rewrite all your existing Python+Numpy code into a new language to start using Nim.

The Pymod library we've created allows you to write Nim functions, compile them as standard CPython extension modules, and simply drop them into your existing Python code: https://github.com/jboy/nim-pymod

The Pymod library even includes a type `ptr PyArrayObject` that provides native Nim access to Numpy ndarrays via the Numpy C-API [ https://github.com/jboy/nim-pymod#pyarrayobject-type ]. So you can bounce back and forth between your Python code and your Nim code for the cost of a Python extension module function call. All of Numpy, Scipy & Matplotlib are still available to you in Python, in addition to statically-typed C++-like iterators in Nim+Pymod [ https://github.com/jboy/nim-pymod#pyarrayiter-types , https://github.com/jboy/nim-pymod#pyarrayiter-loop-idioms ]. The Nim for-loops will be compiled to C code that the C compiler can then auto-vectorize.

9il · on Jan 3, 2016

D has integration with Python/Matplotlib too =P http://pyd.readthedocs.org http://d.readthedocs.org/en/latest/examples.html#plotting-wi...

jboy · on Jan 3, 2016

It looks like you need to copy your D array to a newly-allocated Numpy ndarray before you can pass it to Python. So there's no binary PyArrayObject interoperability between D & Python (right?). Copying large N-D arrays all the time sounds slow...

(That Matplotlib example uses the function `d_to_python_numpy_ndarray` in the PyD project, which I found defined here: https://github.com/ariovistus/pyd/blob/master/infrastructure... . It clearly allocates a new Numpy array: https://github.com/ariovistus/pyd/blob/master/infrastructure... )

Also, I couldn't find any examples of invoking D functions from Python. In fact, I could only find mentions on the D mailing list of people reporting that they couldn't get it to work: http://forum.dlang.org/post/rdhrvzhhwxgfyxzjevfu@forum.dlang...

By compiling (transpiling) to C, Nim really does have an unfair advantage in the interoperability challenge...

9il · on Jan 3, 2016

ndslice was merged to DLang master repo today. It is not a problem to fix PyD. (compiling to C is crispy)

EDIT: Exposing-d-functions-to-python http://pyd.readthedocs.org/en/latest/functions.html#exposing...

stevieboy · on Jan 3, 2016

jboy, Can nim-pymod be used as VLA's for nim? I'm not too fond of the nim seq'type (bit slow for my usage) and prefer arrays, but need their length allocated at runtime. Can this be done via (albeit a clunky route) through nim-pymod? i.e arrays created and accessed all in nim (no python)?

jboy · on Jan 3, 2016

Hi, the short answer is "Yes, but ...".

Yes, Pymod's PyArrayObject can be created & accessed entirely in Nim; yes, its length (actually, shape) is specified at runtime; and yes, it can be resized after creation.

However, Pymod's PyArrayObject is designed for maximum binary compatibility with Numpy. As such:

1. It uses Numpy's array creation functions (such as `createSimpleNew`), which in turn uses the Python runtime memory allocator. So at the very least, Pymod assumes you've linked against `-lpythonX.Y`.

2. It integrates with Python's GC rather than Nim's GC. (This is why it's passed around as `ptr PyArrayObject` rather than `ref PyArrayObject` -- it's untouchable by the Nim GC, but instead uses Python refcounts.)

Of course, we do intend to extend Pymod to include a pure-Nim sibling for PyArrayObject, but it doesn't exist yet.

If you want a high-quality no-Python-dependency VLA in Nim right now, I'd recommend this library by Andrea Ferretti [ https://github.com/unicredit/linear-algebra ]. It can link against your system BLAS, in addition to offering GPU support using NVIDIA CUDA. Andrea is another member of the Nim community who does a lot of scientific computing: http://rnduja.github.io/2015/10/21/scientific-nim/

stevieboy · on Jan 3, 2016

Ok, looking at the linear-algebra VLA stuff now. Thanks for the detailed reply and suggestion. (and apologies to original OP for veering off-topic... the new ndslice package looks like a pretty cool addition the D libraries, I'll be having a look at that too)

zardeh · on Jan 3, 2016

And here we have a case of why microbenchmarks don't work. What you're measuring here isn't a speed difference in the mathematical code, its a constant time overhead from calling into the numerical libs. Up your array size by 100 times and this will become evident.

Why do I say this? Because inlining the python function to

means = numpy.mean(numpy.arange(100000).reshape((100, 1000)), axis=0)

from the original example in the article cut the benchmark time in down from around 215us to 205 us in my testing. That was done by removing a single python bytecode instruction.

Its quite likely that the D numerical code is actually slower than the LAPACK based python numerical code, but you're hiding this in the constant time overhead of a few python function calls.

bionsuba · on Jan 3, 2016

As I stated in the article, I did not include the array creation in the benchmark in order to be fair to Numpy with its slow initialization times. The only python code that I benchmarked was the numpy.mean line.

zardeh · on Jan 3, 2016

That still doesn't matter. So much of the work is being done in python that you're benchmarking the overhead of python, not the numpy speed.

I mean this is a terrible benchmark. Comparing to pure python, the numpy code is about 4-5x faster on my machine, whereas in reasonable real world benchmarks, numpy is hundreds of times faster than pure python.

For reference my benchmark was

    %timeit [sum(row)/len(row) for row in mat]

and it completed in about 900us, given mat was a python 2d array.

These results are as valuable as the ones pypy gave me:

    pypy -mtimeit -s "[sum(row)/len(row) for row in [[range(100000)[i*j] for i in range(100)] for j in range(1000)]]"  
    1000000000 loops, best of 3: 0.00101 usec per loop

Edit:

In other words, because the code returns a python array of length 1000, what you're benchmarking is python's speed of array instantiation vs. Ds mathematical speed. Its obvious that D will win. The only fair way to benchmark the mathematics here is if you compare the D implementation of O(n) algorithm that returns a single value, or an O(n^2) algorithm that returns a 1d array or single value etc. Otherwise the python overhead of creating O(n) objects will overshadow the actual mathematical calculations on those O(n) objects.

srean · on Jan 3, 2016

You may have a point that the benchmarks do not rule out the possibility that D code might be asymptotically slower.

But benchmarks that are designed to hide the fact that CPython function call overheads and object creation overheads are a godawful abomination are far from a fair benchmark. Those overheads are real and have real effects.

zardeh · on Jan 4, 2016

Sure, but the claim he's making is that D is faster than the numpy numerical algorithms. This is patently false. The C or fortan code that is doing the numerical calculation is as fast or faster than the D code he's running. Al of the difference comes from the initialization of O(n) pyobjects to represent the returned values.

If he claimed that handwritten D code was faster than equivalent python, I'd say "duh", but he claimed that the numerical libraries are faster. I take issue to the statement that D will be faster than numpy for real world calculations, because the numpy code will run the actual math faster, and the python overhead will still be on the order of nano or milliseconds.

bionsuba · on Jan 3, 2016

    you're benchmarking is python's speed of array instantiation vs. Ds mathematical speed. Its obvious that D will win.

Not true, the D code also has the overhead of array initialization. std.array.array is called which allocates the results of the range into an array on the GC, which everyone bemoans as being slow as a dog.

Plus, I don't see why this is an invalid benchmark when this is perfectly normal Numpy code, the kind that you see all the time. That was the reason behind the benchmark, to find some common piece of Numpy code and see how equivalent std.ndslice stacked up. I don't see how it's fair to say "Python is really slow at this one thing, so it's not fair to compare it to D in that area".

semi-extrinsic · on Jan 3, 2016

He's absolutely right, this is a crappy benchmark. Increase the array sizes by at least a factor of 100 to get anything meaningful.

The fact that someone who is "the review manager for std.ndslice's inclusion into the standard library" does not understand how to profile numerical algorithms makes me very skeptical of using D for any numerical project.

Plus, the syntax looks god-awfully unintuitive. A main advantage of Python is that you often get "code that looks like what it does". The D "basic example with a benchmark" OTOH looks almost obfuscated. To wit; a Fortran version is more readable, is fewer lines of code(!) and of course kicks D's butt when it comes to speed:

  program p
  real, dimension(100,1000) :: data
  real, dimension(1000) :: means
  
  n=1
  forall(i=1:100,j=1:1000)
    data(i,j)=n
    n=n+1
  end forall

  forall(i=1:1000000,j=1:1000)
    means(j) = sum(data(:,j))/size(data(:,j))
  end forall

Disclaimer: I wrote this on my phone, only 95% sure that it will compile and run correctly. Save it in means.f90 and compile with `gfortran -Ofast means.f90`. This calculates the means 1 mill. times (the loop over i in the second forall); I bet you it will be an order of magnitude faster than D per means calculation if you time it with plain time (the *nix command).

bionsuba · on Jan 3, 2016

"Increase the array sizes by at least a factor of 100 to get anything meaningful."

Ok, let's do that and see what happens:

    python -m timeit -s 'import numpy; data = numpy.arange(10000000).reshape((1000, 10000))' 'means = numpy.mean(data, axis=0)'

D code

    import std.range : iota;
    import std.array : array;
    import std.algorithm;
    import std.experimental.ndslice;
    import std.datetime;
    import std.conv : to;
    import std.stdio;

    enum testCount = 10_000;

    void f0() {
        auto means = 10_000_000.iota
            .sliced(1000, 10000)
            .transposed
            .map!(r => sum(r) / r.length)
            .array;
    }

    void main() {
       auto r = benchmark!(f0)(testCount);
        auto f0Result = to!Duration(r[0] / testCount);
        f0Result.writeln;
    }

Results

    Python: 14.1 msec
    D:      39   μs
    D is 361.5x faster

semi-extrinsic · on Jan 3, 2016

Ok. So three points remain:

* how do I know that the calculations aren't actually optimized away in all that code I can't understand? E.g. what happens if you make f0() return the means array instead of being a void function?

* Given the same number of lines (or characters), are you sure you can't write equally fast Python code?

* I tested the Fortran version on a slightly slower computer (got 14.7 msec on the Python version). Fortran runs at 0.7 μs, i.e. > 55x faster than D, with less code that's more readable to boot.

bionsuba · on Jan 3, 2016

"how do I know that the calculations aren't actually optimized away in all that code I can't understand? E.g. what happens if you make f0() return the means array instead of being a void function?"

If you can't understand the code, how did you know it was a void function. Please stop with the hyperbole, it's not adding anything to the discussion.

Updated code:

    import std.range : iota;
    import std.array : array;
    import std.algorithm;
    import std.experimental.ndslice;
    import std.datetime;
    import std.conv : to;
    import std.stdio;

    enum testCount = 10_000;

    auto f0() {
        auto means = 10_000_000.iota
            .sliced(1000, 10000)
            .transposed
            .map!(r => sum(r) / r.length)
            .array;
        return means;
    }

    void main() {
        auto r = benchmark!(f0)(testCount);
        auto f0Result = to!Duration(r[0] / testCount);
        f0Result.writeln;
    }

Results:

    Python: 14.1 msec
    D:      41   μs
    D is 343.9x faster

"Given the same number of lines (or characters), are you sure you can't write equally fast Python code?"

IMO program size is an almost meaningless statistic outside of code golf challenges. LOC is not an indicative measure of code readability, usefulness, or organization.

For example, your Fortran code was 11 lines while the D function (with the return) is eight lines.

"I tested the Fortran version on a slightly slower computer (got 14.7 msec on the Python version). Fortran runs at 0.7 μs, i.e. > 55x faster than D, with less code that's more readable to boot."

Just goes to show why Fortran is still used in a lot of scientific areas.

But two things:

1. Fortran is a much simpler language than D or Python and is much harder to do multipurpose work in it (so I'm told from Fortran programmers, I don't know Fortran myself). So when your program needs to do anything other than number crunching, it's normally done in a separate language. Using D you can have everything in one code base.

2. This article was about Numpy and std.ndslice because those are two areas that I know about and Numpy is a very popular library. Bringing up Fortran's speed here is like commenting on how much faster C++ is in a thread about Ruby.

Also, readability is a subjective idea; I believe the D code is more readable than the Fortran you wrote. Different strokes.

semi-extrinsic · on Jan 4, 2016

> If you can't understand the code, how did you know it was a void function. Please stop with the hyperbole, it's not adding anything to the discussion.

I understand "void" perfectly fine, but understanding how D optimizes your code is a completely different matter. Playing devil's advocate further, you

> IMO program size is an almost meaningless statistic outside of code golf challenges. LOC is not an indicative measure of code readability, usefulness, or organization.

No, but the numpy code is undoubtedly much simpler, and very general. How does the D example look for a 3D array where you want to average over the second dimension?

> For example, your Fortran code was 11 lines while the D function (with the return) is eight lines.

You're neglecting the library imports.

>So when your program needs to do anything other than number crunching, it's normally done in a separate language. Using D you can have everything in one code base.

I do agree other languages are much better for e.g. string processing. But for the applications where you can afford to trade simpler code for 50x slower performance, I'd say you're not really caring about performance at all, so why not just use Python? If performance is mission critical, a two-language code base (Python + C/C++/Fortran/CUDA) is not that hard to do, fairly common, and will give you the required performance.

> Bringing up Fortran's speed here is like commenting on how much faster C++ is in a thread about Ruby.

No; once you start saying "I want more performance than Python", it's obviously interesting to see what level of performance a "low-level" language gets, to put the result in perspective.

Now, I'm not trying to beat down on D, so please try to interpret this as constructive criticism. Finding "where does it fit in the scientific toolbox" is what makes or breaks a language's adoption in the scientific community. Just look at R, it's at the same level of performance/abstraction but in a separate niche from Python, and doing very well. The same goes for Matlab (which has e.g. Simulink).

zardeh · on Jan 4, 2016

Is D or the compiler doing something sneaky and inlining something, because I'd expect doing 100x more things to take 100x more time for a linear algorithm,but it takes you on the order of 10x more time. Does the original function run in <1ns for you?

Edit:

Because as I showed above, Pypy managed to JIT inline the entire benchmark, so I wouldn't be surprised at all if a clever compiler managed to do something sneaky.

cwyers · on Jan 3, 2016

Because, assuming he's right about what's going on, the relationship between complexity and time isn't linear (so D's performance relative to Python will grow less impressive as complexity increases, and at some point Python with NumPy may even perform better), and your example isn't representative of a task that takes long enough for speed to matter.

zardeh · on Jan 4, 2016

Its that microbenchmarks are useless. If it was worth my time, I could come back and use any of the available python tools (cython, numba, etc.) to compile that function to native code from python syntax and achieve native speed. Or I could write a D implementation of {{some relatively complex matrix algorithm}} and compare it to numpy, showing it was slower.

You could then respond by saying that I wasn't using D optimally, and I'd agree.

cannam · on Jan 3, 2016

I occasionally rewrite Python+NumPy signal processing code in C++ for purposes of packaging and integration with native apps, so I read these examples with an eye to how they compare with typical C++, rather than with NumPy. They compare very well, and it would never have occurred to me to look into D as a possibility for this sort of code.

I'm guessing the GC might rule it out for many cases where you do signal processing in C++, but I may as well ask: what's the deployment side of things like? Can I easily build a shared library and use it from a C++ application?

jboy · on Jan 3, 2016

You might also be interested to check out Nim. It transpiles to C before invoking the C compiler, so it runs as fast as C++ and has excellent C-compatibility (and by extension, excellent C++-compatibility).

Compiling a shared library is as easy as passing the "--app:lib" option to the Nim compiler: http://nim-lang.org/docs/nimc.html#compiler-usage-command-li...

The GC is optional; you can manage your memory manually if you prefer: http://nim-lang.org/docs/manual.html#types-reference-and-poi...

The Nim tutorial is here if you want to have a quick skim: http://nim-lang.org/docs/tut1.html

carljv · on Jan 4, 2016

Please stop this. It's approaching spam. Someone put a lot of work into building a new capability into a language and writing up a lengthy blog post about it. If you want to do the same for what you think is good about Nim and submit it to HN, please do. Repeatedly intercepting people's question about D with "you should look at Nim" is obnoxious. If D users did this in a thread about Nim, I imagine you would find it frustrating.

snydly · on Jan 3, 2016

Am I correct in thinking that this is only reasonable if you're already using D? The switching cost seems too high if you're python everything.

I tried the Armadillo C++ library a while ago (http://arma.sourceforge.net/). The speed up and time spend learning the syntax didn't seem worth it.

bionsuba · on Jan 3, 2016

I completely understand that for existing projects it might not make sense to switch, but as I say at the start of the article

    why you should consider D for your next numerical project.

snydly · on Jan 3, 2016

I just ran your column means benchmark. It is pretty impressive... maybe my C++ is trash. Never saw that much speedup with Arma. I'll try it out :)

p4wnc6 · on Jan 3, 2016

The central claim of the post seems summarized by this quote:

> For example, when using a non-numpy API or functions that don't use Numpy that return regular arrays, you either have to use the normal Python functions (slow), or use np.asarray which copies the data into a new variable (also slow).

but I disagree strongly with this.

First of all, if there is a common use case for some set of operations that need to be performed on very large data (the type of data you'd look to NumPy to handle), then generally there is already a subpackage within numpy/scipy/scikits/pandas/etc that already deals with that use case and natively handles it with NumPy arrays, with no switching cost to convert back forth between lists or tuples or whatever.

And, of course, when a list/tuple-heavy API is only meant to deal with small data, it's not a problem to use NumPy's facilities for converting between ndarray and the builtin array types. In cases where you're dealing with a huge breadth of small data, then that casts doubt on whether you should be using NumPy; it wouldn't be casting doubt on whatever the other list/tuple-heavy API is. And probably parallelization (or even the buffer stuff I mention below) is a fine solution in that case.

Second, in a lot of cases you can make use of the Python Buffer Protocol to share the underlying data of a NumPy array without copying it. This won't help if some other API expects Python lists or tuples, but the great thing about dynamic typing in Python is that all that really matters is that whatever underlying buffer type you need implements whatever methods that other API expects to call.

You can always write your own extension type that adheres to the Buffer Protocol and also provides whatever API is needed to conform to some other library, so the power to create these double-sided adapters (one side sharing data with NumPy, the other side appearing like a drop-in acceptable data structure for the other library API) is very powerful and generic. It might take some getting used to the first few times you do this, but if you use tools like Cython to help, it's really quite easy to do, easy to maintain, and solves a surprisingly wide range of NumPy integration problems. In fact, these things generally already exist for most problems you will run into and ultimately they often boil down to simple Cython-based wrappers around C bindings to the other Python API you're working with.

I would argue that the existence of this Buffer Protocol adapter strategy alone is enough to say that the switching cost to D is virtually never worth it, and still pretty speculative even if you're starting a brand new numerical computing project.

Finally, most Python libraries that heavily rely on the list or tuple APIs are not meant for large data (those APIs mostly already just use NumPy, as I mentioned, or else they use generators and let the end user decide which array type will eventually be instantiated as the results are consumed). It's not common, by intentional design, for list/tuple-heavy APIs to need to cope with large data, so when someone says something off the cuff like "What do you do when some library API needs lists and you've got NumPy arrays?" it sounds like a worrisome case, but in practice it's really, really uncommon that such a situation arises and no one else ran into it before you and no one has created a NumPy-compliant solution already. It's not impossible, of course. Just unlikely, and probably not important enough to use as a basis for language choice, unless you're facing a really special sort of API problem.

Edit: None of this should read at all as a criticism of the D language or this particular implementation of ndarray data structures. All that stuff is great and anyone wishing to use D absolutely should.

I'm only arguing against the post's central thesis insofar as it is used to justify considering D as an alternative to scientific Python. The problem that the post points out already has solutions solely in the Python ecosystem, many projects have handled that problem before, and the problem is pretty rare and esoteric anyway, so it's probably not a good thing to use as the basis of an important choice like which language to use, or whether to switch languages.

There could be many reasons to prefer D over scientific Python depending on a given use case, and there could be certain situations where switching from Python to D is a good idea. Whatever those cases may be, the central issue of this post, performance degradation caused by NumPy-to-other-API compatibility, is not one of them.

srean · on Jan 3, 2016

Very nicely put and I agree. What I was expecting to see in the list of numpy problems mentioned in the article wasn't there. The major problem that makes Numpy performance lag behind C++ or Fortran is the extra level of indirection that is needed for accessing an element (via stride ptr), and the need for extra copies that is forced on you by vectorization. Numexpr can help for certain cases of the latter, but its still quite limited in the type of expressions that numexpr can handle. It is my belief that with some local static analysis both can be mitigated somewhat.

It would be really interesting to see whether D's nd object tackles these issues. The copy of arrays across function boundaries, as you correctly pointed out, is mostly a red herring.

tadlan · on Jan 3, 2016

Numba obviate these issues

srean · on Jan 3, 2016

I see your excitement about Numba, I guess you follow it closely, or perhaps have other valid reasons. So all the best. However, the last time I tried to use it, about 4 months ago, installation was a bitch and it would crap out compiling some functions. It holds promise of course, so does alternatives.

What I think the current post is about is that similar nice and performant abstractions for D. I love Python a lot but I have to pay extra attention so that it is performant, so that I don't make stupid typos. I have to rewrite parts in Cython or Weave (sadly the latter gets no love anymore). Larger the code base becomes I have to spent more time in the tail part of the 'tooth to tail' ratio. In D many of these are taken care by default, I have to worry a lot less about these things. The other fantastic thing is DMD has super fast compilation times. It used to be faster than Go in compilation time but the latter has caught up in that department (grossly lacking in others). D binaries execute a lot faster (when compiled with LDC or GDC). It may lack some libraries here and there, but that does not worry me as much, everyone has to start somewhere, and to be honest Python is lacking in that respect compared to R. What would be worrisome is the presence of structural aspects that may get in the way of such an eco-system emerging. I don't see anything of that kind in D.

In anycase I don't quite see how Numba obviates the issues I mentioned. It does not change the Ndarray representation, and the extra indirections lay right there. OTOH some copy elision I would grant.

I can give concrete examples. The isotonic regression implementation in scikits.learn was absolutely pathetic. It has been rewritten several times and its performance is still severely lacking in spite of being Cythonized. You can take a stab at it with Numba and see what you can do. In C++ (D would have been nicer) the very first idiomatic implementation in STL was faster than any of these rewrites. I wrote it once and moved on. But C++ is a horrifying mess, now if there was a language that gave me C++ performance (not asking for much) but none of the mess and numpy level expressiveness to boot, that would be sweet. Julia, Nim, D, PyPy are some of the contenders trying to reach this holy grail

p4wnc6 · on Jan 3, 2016

Numba certainly does not obviate all of these issues. I think user `tadlan` is referring to some of the compiler optimizations, like loop unrolling or fusion (e.g. noticing that two subsequent loops can be 'fused' into the subordinate execution block of one single loop). These things can offer speed-ups even beyond NumPy, and they can work even when the Python code you start out with already uses NumPy.

The thing is, which `tadlan` seems unaware of, a lot of this stuff just fails in production environments and hits corner cases that the Numba compiler does not handle (I'm talking about the first part of the Numba compiler pipeline, where it converts to Numba IR, and not yet to LLVM IR, and does things like examine the CPython bytecode to alter the representation from a CPython stack-based representation to a register-based representation that will be compatible with LLVM and ultimately with the actual machine itself). In that compilation step, the only things that are able to be handled are things that the Numba team (I used to be a member of it) explicitly support. They don't support a full-blown compiler for the entire Python language, nor even for every type of NumPy operation. That's not a knock against Numba at all -- it's a specializing compiler and obviously they need to prioritize what to support, and make longer term goals about supporting more general things. But the point still remains that you cannot just assume that if you call `jit` on any arbitrary Python code, it will always become faster. In some cases, it can even become slower.

I suspect `tadlan` is very interested in Numba and enthusiastic about knowing the taxonomy of Numba details, but it does not seem like that user has had real world experience trying to get Numba to work in production, and seeing all of the numerous buggy and missing features. I don't want to diminish anyone's enthusiasm for Numba, so it's probably best just not to engage with `tadlan` about it. That user's mind seems made up already.

Gtifn · on Jan 3, 2016

As a former member of the Numba team, what is your outlook on the technology and the broader continuum ecosystem?

I'm looking at Numba and friends (Dynd, blaze) for a new stack, but I'm not sure where the development arc will end up vs say Julia. I'm also curious about the sustainability of Continuum's business model and practices in the medium and long term.

Any thoughts on this? I understand if you are limited in what you can say, but I'm open to any nuggets.

p4wnc6 · on Jan 3, 2016

This is one of those questions that is extremely hard to predict. Even though I was part of that team, it doesn't mean I have any special insight into what will happen.

A lot will depend on sources of funding. Will folks like Nvidia start sponsoring Numba, and if so what will it mean for support of Nvidia alternatives like OpenCL?

It seems like NumbaPro as a stand-alone for-pay product is not viable on its own, but that claim could be wrong based on more recent data that I don't have access to. So external sponsorship may be necessary.

One form of this could be through Continuum's already established business model of consulting and support services. But then the question is whether the nature of those consulting and support projects will allow for developers to actually further the cause of Numba, or just merely hack in poorly conceived features that are demanded by the consulting and support customers? Since Numba is open source, it should be easy enough for anyone to follow along with commits and discussions on GitHub and make their own opinion about what direction that is going.

The other question that is always hard is staffing. Far and away the colleagues I had the chance to work with on the Numba team were amazingly good. But it's not clear if working solely on Numba can justify the sort of salary that would be required to attract very top engineers and grow the team. You might start to see more interns and/or post-doc type labor feeding into Numba, and again I don't know what that will mean for the project ... could be good or bad.

At the same time, you've also got a lot of active development for Julia, PyPy, and a lot of people still prefer to use Cython rather than jitting functions. Some people even call into question the entire goal of making something that is "easy" but also a "black box" -- like the way just dropping in `jit` works for people who merely use, but don't understand the inner workings of, CPython.

It's an exciting area, and the Numba team has as much talent and ability to claim a significant piece of the tool space surrounding high performance computing as anyone else. Whether that will pan out for them is still really hard to predict.

Gtifn · on Jan 3, 2016

Thanks very much for your thoughts.

What do you think about the foundational tech of numba itself?

Do you think it is any more of a black box than say Julia? Is there anything about it that would impede extension into a more stable, predictable and feature rich product?

BTW looks like Intel is doing some stuff with Julia: https://github.com/IntelLabs/ParallelAccelerator.jl

There has also been alot of recent funding to Continuum and dev of numba seems to be going strong. Also some recent work with AMD.

tadlan · on Jan 2, 2016

Use numba to compile python loops or array expressions to fast llvm, and problem solved. I'm sticking with python.

bionsuba · on Jan 3, 2016

Considering that in the benchmarked example, only one like of Numpy code was used which already uses compiled C, I have a hard time believing that that would catch up to using all compiled code.

tadlan · on Jan 3, 2016

Numbs compiles entire functions on and allows array expressions with allocation and loop fusion. I don't see the problem

bionsuba · on Jan 3, 2016

The benchmarked code is ALREADY a comiled C function called from Python and it still lost.

tadlan · on Jan 3, 2016

Numba would still be faster. It would fuse away any intermediates in the code and remove any Overhead to the compiled code.

also have the option of devecting to loops.

both of which are generally faster than vectorized jumpy code.

p4wnc6 · on Jan 3, 2016

While I agree broadly that the benchmark example in the post is not representative or useful as a comparison between D and NumPy, I disagree with your strong insistence on Numba in this case.

There is still a lot of Python code that the Numba-to-LLVM compiler cannot handle. Yes, it is true that Numba can do a decent job of removing CPython virtual machine overhead, even for functions in which you statically type the arguments merely as 'pyobject' -- but not universally.

And there are also a ton of super basic things, for example creating a new array inside the body of a function when using Numba in 'nopython' mode [1], that Numba doesn't handle. These things will improve over time, but they may not improve quickly enough for a given use case, and the D language and this ndarray implementation may be a fair competitor to Numba in the short-to-medium term (and could even be superior in the long run, who knows).

A fairer alternative would be comparing with the use of Cython, which for my money still hands down beats anything like D or Nim/Pymod for performance enhancement without sacrificing pleasantness of design and development.

Though, of course, none of this stuff holds a candle to Haskell :)

[1] < http://stackoverflow.com/questions/30427081/creating-numpy-a... >

tadlan · on Jan 3, 2016

Actually, Numba array allocation in nopython mode is allowed now.

What other numerical features are missing?

srean · on Jan 3, 2016

If we are discussing in 'would' and 'could' why wouldn't a D backend also be able to that. Lets talk once Numba is easy to install and use :)

tadlan · on Jan 3, 2016

It's not a would and gpod. Numba works like that right now, has better interoperability with python and less context switching.

Numba is I easily installed on all systems through conda package manager and is easily used with just a function decorator

Seems you are grasping at straws here.

fizixer · on Jan 3, 2016

click-bait. Why should I change my language even if I'm looking for a numpy replacement?

bionsuba · on Jan 3, 2016

How is this clickbait? Did I in any way misrepresent the content of the article?

If you don't like D and don't want to use it move on to the next item on the front page.