
Benchmarks for Blaze, A high-performance C++ math library - wall_words
https://code.google.com/p/blaze-lib/wiki/Benchmarks
======
deng
I remember this library being discussed on the Eigen list in 2012. Here's the
thread:

[http://thread.gmane.org/gmane.comp.lib.eigen/3423](http://thread.gmane.org/gmane.comp.lib.eigen/3423)

The main critique at the time was that Blaze always assumes perfectly aligned
data, including inside(!) the matrix, and pads the data with zeros if that is
not the case. Of course, this makes it impossible to map external data, which
is a huge downside. I'm not sure if that is still the case, but from skimming
through the docs it doesn't look to me like this has changed.

~~~
santaclaus
An old thread on the Eigen list also mentioned that the Blaze folks were a
little tricky with their benchmarks. Some of the reported performance numbers
with Blaze were for calls out to Intel MKL routines. Eigen also supports MKL
as a kernel backend, but the Blaze folks failed to enable this feature for the
reported comparisons, if I recall.

~~~
shepardrtc
> a little tricky

That's rather generous. If you're going to do serious work with Eigen, or
Armadillo, or Blaze, you're going to include a BLAS library like OpenBLAS or
Intel MKL (if you can afford it). Not including them is dubious, at best.

------
therobot24
better link: [https://bitbucket.org/blaze-lib/blaze](https://bitbucket.org/blaze-lib/blaze)

~~~
wall_words
Thanks. I tried to find a link to the Bitbucket repository, but had a hard
time turning it up via Google.

~~~
therobot24
It actually took me more than a few minutes as well; I had to use Bitbucket's
search, since Google (rarely enough) was returning crap.

~~~
pmelendez
I have noticed Google not being too useful in some queries this week... I'm
wondering if they are trying new updates or something like that...

------
east2west
I am considering converting a C++03 math library to C++14 as a side project to
learn C++14, and I examined Eigen and Blaze. Eigen's code size seems to be a
fraction of Blaze's, even though their functionality is similar. Eigen also
has some design documents, while Blaze has papers but not much more. It seems
I will try my hand at the Eigen library for now. It is amazing what a couple
of people could do in a few years; Blaze has hundreds of thousands of lines of
code.

~~~
arcanus
Eigen is header only and heavily templated, which certainly keeps the source
down.

However, SLOC is a poor metric for the quality of a codebase, especially
scientific ones. I've certainly found many instances where longer line counts
are more performant, for instance with hand-unrolling loops ( _very_ rare edge
case, not suggesting doing this as a rule!).

Unless you intend to become involved in development of the library, I see no
reason you would care about the lines of code. Even at that point, design
philosophy, features, etc. are more likely to be major factors in your choice.

FYI: Computational Scientist here. I am neither affiliated with Blaze nor
Eigen.

~~~
Profan
I'd say lines of code does matter if the set of functionality is the same;
I'd wager that if you have the same functionality in fewer lines of code,
it's generally easier to verify that it's correct and to avoid daft bugs.

.. Unless it's written in a completely incomprehensible way, of course (such
as some meta C++ stuff), but that's less of a risk in languages with good
metaprogramming.

~~~
arcanus
Scientific software should absolutely, always be verified through regression
and unit tests. Anything less is non-negotiable.

In a decade of work in hpc and computational science, I have very seldom found
looking at the code to be a useful tool for either verification or debugging.

Instead, use the scientific method: hypothesis testing by constructing simple
examples with known analytic solutions and using that for clues as to where
the real problem lies.

~~~
angersock
_Scientific software should absolutely, always be verified through regression
and unit tests. Anything less is non-negotiable._

I like your world. Let's live there. :)

------
shepardrtc
Does Blaze implement its own BLAS subroutines? Or does it wrap around an
existing BLAS library?

~~~
shepardrtc
Turns out that it does implement its own, but it can also wrap existing
libraries:

[https://bitbucket.org/blaze-lib/blaze/src/e09d62ee714745b297...](https://bitbucket.org/blaze-lib/blaze/src/e09d62ee714745b2972e8e80ea81f9872285c46a/blazemark/Configfile?at=master)

Of course, if you're seriously going to use this library, you're going to use
an established BLAS library like OpenBLAS or Intel MKL.

------
santaclaus
It would be interesting to see benchmarks of sparse operations, as well.

------
wall_words
I've had great success with Blaze, despite the fact that it has received
little publicity compared to alternatives like Eigen, Armadillo, etc. Blaze is
consistently the leader of the pack in benchmarks, and even outperforms Intel
MKL on the Xeon E5-2660 (the CPU for which the benchmark results are shown).

~~~
arcanus
For what problems? General statements like this are hard to back up,
especially in the wild world of numerical linear algebra.

From my experience, there are currently no good distributed-memory open source
sparse-direct solvers.

No good distributed-memory ILU implementation, either. Scalability is almost
non-existent beyond 100 cores.

~~~
onalark
That's because direct solvers can't scale. If you want to solve a large
(distributed over hundreds of nodes) sparse linear algebra problem as fast as
possible, decades of research have been poured into efficient techniques
(Krylov methods, Multigrid, preconditioners) for solving them iteratively.

~~~
math_and_stuff
Can't scale in a weak, strong, or asymptotic complexity sense? And for what
sorts of problems (I assume you're thinking of 2D and 3D PDEs discretized with
local basis functions)?

~~~
onalark
Yes, I'm thinking of discretizations of elliptic 2D/3D PDEs. They don't scale
in the weak or strong sense, and they can't hold O(n log n) asymptotic
complexity due to fill-in from Cholesky/LU-style factorizations.

------
fizixer
In the first plot, why do all the libraries slow down at the n=1000 mark?
Something to do with cache?

~~~
acconsta
Yep. A 32KB L1 cache can hold at most 4000 doubles, and
4000 / (2 input vectors + 1 result) ≈ 1333.

[http://danluu.com/3c-conflict/](http://danluu.com/3c-conflict/)

