
Show HN: Arraymancer v0.4, Nim tensor library now with OpenCL and Keras-like API - mratsim
https://github.com/mratsim/Arraymancer
======
mratsim
Here is a link to the documentation as well:
[https://mratsim.github.io/Arraymancer/](https://mratsim.github.io/Arraymancer/)
And the examples:
[https://github.com/mratsim/Arraymancer/tree/master/examples](https://github.com/mratsim/Arraymancer/tree/master/examples),
including the mandatory XOR, MNIST convnets and the notorious FizzBuzz with
neural networks ;).

Also, while deep learning is the focus, I've added general linear algebra and ML
features like a least-squares solver and PCA.
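For readers unfamiliar with PCA, the idea can be sketched in a few lines of NumPy (an illustration of the math only, not Arraymancer's Nim API; the function name here is made up):

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD."""
    # Center the data so the principal axes pass through the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    projected = Xc @ components.T
    return projected, components
```

A least-squares solver would similarly reduce to an SVD or QR factorization under the hood.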

------
narimiran
Speed in the benchmarks [0] looks impressive. But they seem to have been run on an
older version of Arraymancer (0.2.90). Were there any speed
improvements/regressions in the newer version(s)?

Is there any "production code" written in Arraymancer?

[0]
[https://github.com/mratsim/Arraymancer#speed](https://github.com/mratsim/Arraymancer#speed)

~~~
mratsim
The code didn't change, so there shouldn't be any difference on Nim stable.
Unfortunately, I caught a regression in the Nim #devel branch itself which has a
significant impact when tensors are created in a tight loop.

All in all, I take performance very seriously and regularly check the generated
assembly and the memory overhead of the library. It should be at least as fast
as any C or C++ library, even those backed by an optimizing compiler of their
own (Tensorflow, Tensor Comprehensions/Halide, or MXNet/TVM): my intermediate
language is C, and my optimizing compiler is GCC/LLVM. Critical parts should
even reach Fortran speed thanks to heavy use of the __restrict__ and
assume_aligned compiler builtins.

I also take great care to implement every algorithm with numerical stability and
speed in mind: for example, a numerically stable one-pass parallel
softmax_cross_entropy expressed as a Frobenius inner product, which I didn't see
in any library I checked (Caffe, Tensorflow, Torch).
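The math behind that trick can be sketched in NumPy (this illustrates the stable log-softmax plus the Frobenius inner product formulation only; it is not Arraymancer's fused one-pass Nim implementation, and the function name is illustrative):

```python
import numpy as np

def softmax_cross_entropy(logits, onehot):
    """Mean cross-entropy loss, stable via the log-sum-exp trick.

    The loss is -(1/N) * <Y, log_softmax(X)>_F, where <.,.>_F is the
    Frobenius inner product (elementwise product, then a full sum).
    """
    # Subtract the row-wise max so exp() never overflows.
    m = logits.max(axis=1, keepdims=True)
    log_softmax = (logits - m
                   - np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    # Frobenius inner product with the one-hot targets, averaged over rows.
    return -np.sum(onehot * log_softmax) / logits.shape[0]
```

Because the target matrix is one-hot, the Frobenius inner product just picks out the log-probability of the correct class per row, which is why the whole loss collapses to a single elementwise-multiply-and-reduce.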

