
Open-sourcing FBGEMM for state-of-the-art server-side inference - BooneJS
https://code.fb.com/ml-applications/fbgemm/
======
gok
Does anyone see an actual link to the code?

Edit: found it
[https://github.com/pytorch/fbgemm](https://github.com/pytorch/fbgemm)

------
danmg
Why aren't they showing benchmarks that compare it to other BLAS
implementations? How does it compare to the GEMM in ATLAS, CBLAS, Intel's MKL,
GotoBLAS, or any other library that implements GEMM? Is writing 'jitted' asm
like this better than writing Fortran with -march=native?

~~~
Marat_Dukhan
FBGEMM is faster than the theoretical peak FP32 (single-precision
floating-point) performance, so it's faster than SGEMM/DGEMM in any BLAS
library.

~~~
danmg
If they're showing something higher than a 'theoretical peak' then it's a
fantastic result that must be investigated carefully for any error in data
collection.

Also, that doesn't stop them from showing an apples-to-apples comparison
against other libraries that provide GEMM. If other libraries are reporting
the same 'beyond theoretical peak' then it most certainly is a data collection
error.

~~~
Marat_Dukhan
Performance on the plot is higher than FP32 peak, but there's no error:
FBGEMM does not compute in FP32, it computes in 8-bit fixed point. On a
Broadwell CPU, you can do 16 FP32 multiply-adds per cycle (2x 8-wide FMA
instructions via VFMAxxxPS instructions), but 32 8-bit multiply-adds (1x
32-wide multiplication with accumulation of adjacent results via the
VPMADDUBSW instruction).
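
The lane arithmetic can be sketched in plain Python. This is a minimal emulation of one 256-bit VPMADDUBSW step, assuming the documented instruction semantics (unsigned 8-bit times signed 8-bit, adjacent products summed with signed 16-bit saturation). The function names and constants are illustrative only and are not taken from FBGEMM; the per-cycle counts simply restate the figures above.

```python
def saturate_i16(x):
    """Clamp an integer to the signed 16-bit range, as the instruction does."""
    return max(-32768, min(32767, x))

def vpmaddubsw(a_u8, b_s8):
    """Emulate VPMADDUBSW: a_u8 holds unsigned bytes, b_s8 holds signed bytes.

    Each pair of adjacent byte products is summed into one saturated
    16-bit result, so N input bytes yield N/2 outputs.
    """
    assert len(a_u8) == len(b_s8) and len(a_u8) % 2 == 0
    assert all(0 <= a <= 255 for a in a_u8)
    assert all(-128 <= b <= 127 for b in b_s8)
    return [
        saturate_i16(a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1])
        for i in range(0, len(a_u8), 2)
    ]

# Lane counts for a 256-bit vector register (figures from the comment above).
FP32_LANES = 256 // 32                 # 8 floats per vector
FP32_MACS_PER_CYCLE = 2 * FP32_LANES   # 2 FMA instructions per cycle -> 16
INT8_MACS_PER_INSTR = 256 // 8         # 32 byte multiplies per VPMADDUBSW

# Worst case inputs: the pairwise sum 2 * 255 * 127 overflows 16 bits
# and is clamped, which is one source of error in this scheme.
result = vpmaddubsw([255] * 32, [127] * 32)
```

The last line shows why the intermediate 16-bit sums need care: with extreme inputs the pairwise sum saturates, so kernels built on this instruction have to restrict operand ranges or widen the accumulator early.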

~~~
danmg
Ok. Then this will introduce significant truncation errors and it's not a
general GEMM. That's like claiming you've made the fastest FEM routine in the
world by doing everything in half-precision.
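
How large the error actually is depends on the data and on the quantization scheme. As a rough sketch, assuming simple symmetric per-tensor quantization (a hypothetical scheme chosen for illustration, not necessarily what FBGEMM does), an 8-bit dot product with exact integer accumulation can be compared against the floating-point result:

```python
import random

def quantize(values, bits=8):
    """Symmetric linear quantization; returns (integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    return [round(v / scale) for v in values], scale

def quantized_dot(x, y):
    """Dot product on 8-bit codes, rescaled back to real units."""
    qx, sx = quantize(x)
    qy, sy = quantize(y)
    # Python integers are exact; hardware would use a wider (e.g. 32-bit)
    # accumulator to get the same effect.
    acc = sum(a * b for a, b in zip(qx, qy))
    return acc * sx * sy

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(256)]
y = [random.uniform(-1.0, 1.0) for _ in range(256)]

exact = sum(a * b for a, b in zip(x, y))
approx = quantized_dot(x, y)
abs_err = abs(exact - approx)
print(f"exact={exact:.6f} approx={approx:.6f} abs_err={abs_err:.6f}")
```

With inputs in [-1, 1], each quantized value is off by at most half a step (about 0.004), so the error of a length-256 dot product is bounded by roughly 2 in the worst case and is typically far smaller because the rounding errors partly cancel. Whether that is acceptable is application-specific.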

------
gnufx
Although it's right that most GEMMs are for large-ish, square-ish matrices,
small and skinny ones are actually important and targeted for some HPC
applications. The relevant comparison would be with libxsmm [1], which also
targets deep learning (on x86_64), though I don't think the released version
does reduced precision.

[1] [https://libxsmm.readthedocs.io/](https://libxsmm.readthedocs.io/)

