

High Performance Machine Learning Through Codesign and Rooflining [pdf] - nkurz
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-169.pdf

======
ced
Intriguing. This looks like a good summary:

 _Chapters 2 and 3 present two key design ideas that guide the overall
development of BIDMach: codesign, which means choosing the right combination
of hardware and designing algorithms to best leverage the hardware, and
roofline design, which means quantifying the theoretical performance limits of
each component of an ML algorithm and making sure to get close to them. When
implementing an algorithm, we need its hardware mappings. This includes the
computation pattern required by the algorithm, the memory/IO pattern, and the
communication pattern when designing a distributed algorithm.

To reach performance limits, careful and often iterative design and coding is
needed. This is time-consuming. It would be problematic to do such
optimization for every machine learning algorithm. Instead, we create an
intermediate layer - a set of common computation and communication
kernels/primitives - between BIDMach and the hardware. This includes BIDMat,
butterfly mixing, and Kylix. These deal respectively with matrix algebra,
model synchronization across the network, and algorithms on graphs. An
important role of BIDMat is to manage memory with matrix caching, which we
will describe shortly._
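
For anyone who hasn't seen roofline analysis: attainable throughput is the
minimum of peak compute and memory bandwidth times arithmetic intensity
(FLOPs per byte moved). A minimal sketch of the calculation, with made-up
peak numbers rather than anything measured in the paper:

    // Roofline model: performance is capped by whichever of peak
    // compute or (bandwidth * arithmetic intensity) binds first.
    val peakGflops = 1000.0  // hypothetical peak compute, GFLOP/s
    val peakGBps = 200.0     // hypothetical memory bandwidth, GB/s

    def attainableGflops(intensity: Double): Double =
      math.min(peakGflops, peakGBps * intensity)

    // Dense matrix-vector multiply does ~2 FLOPs per 8-byte element
    // streamed once, so intensity = 0.25 and it is memory-bound:
    // min(1000, 200 * 0.25) = 50 GFLOP/s, i.e. 5% of peak compute.
    val gemvRoof = attainableGflops(2.0 / 8.0)

Most ML inner loops look like that last case, which is presumably why the
thesis pushes so hard on memory behavior (e.g. BIDMat's matrix caching)
rather than raw FLOPs.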

So it's a really well-engineered approach. Some bold claims:

 _We can bring the performance of sample-based Bayesian inference up close to
symbolic methods_

"Symbolic" here seems to refer to expectation propagation algorithms such as
[http://research.microsoft.com/en-
us/projects/infernet/](http://research.microsoft.com/en-us/projects/infernet/)

------
nl
BIDMach (mentioned in this paper) looks interesting, and benchmarks well [1].
I hadn't come across it before.
I hadn't come across it before.

The PageRank benchmarks are very impressive.

[1]
[https://github.com/BIDData/BIDMach/wiki/Benchmarks](https://github.com/BIDData/BIDMach/wiki/Benchmarks)
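
For context on what that benchmark measures: PageRank's inner loop is
basically one sparse matrix-vector multiply per iteration, i.e. exactly the
memory-bound kernel the roofline argument targets. A toy power-iteration
sketch in plain Scala (not BIDMach's API) of the computation being
benchmarked:

    // Toy PageRank by power iteration over an adjacency list.
    // links(i) lists the pages that page i links to.
    // Dangling nodes are ignored for brevity.
    def pagerank(links: Array[Array[Int]],
                 iters: Int = 20, d: Double = 0.85): Array[Double] = {
      val n = links.length
      var rank = Array.fill(n)(1.0 / n)
      for (_ <- 0 until iters) {
        val next = Array.fill(n)((1.0 - d) / n)
        for (i <- 0 until n; j <- links(i))
          next(j) += d * rank(i) / links(i).length
        rank = next
      }
      rank
    }

Each iteration streams the whole edge list once, so an implementation that
saturates memory bandwidth is about as fast as this computation can go.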

