Open-sourcing FBGEMM for state-of-the-art server-side inference (fb.com)
72 points by BooneJS 74 days ago | 7 comments

Does anyone see an actual link to the code?

Edit: found it https://github.com/pytorch/fbgemm

Why aren't they showing benchmarks that compare it to other BLAS implementations? How does it compare to the GEMM in ATLAS, CBLAS, Intel's MKL, GotoBLAS, or any other library that implements GEMM? Is writing 'jitted' asm like this better than writing Fortran with -march=native?

FBGEMM is faster than the theoretical peak FP32 (single-precision floating-point) performance, therefore it's faster than SGEMM/DGEMM in any BLAS library.

If they're showing something higher than a 'theoretical peak' then it's a fantastic result that must be investigated carefully for any error in data collection.

Also, that doesn't stop them from showing an apples-to-apples comparison against other libraries that provide GEMM. If other libraries are reporting the same 'beyond theoretical peak' then it most certainly is a data collection error.

Performance on the plot is higher than FP32 peak, but there's no error, because FBGEMM does not compute in FP32; it computes in 8-bit fixed point. On a Broadwell CPU, you can do 16 FP32 multiply-adds per cycle (2x 8-wide FMA via VFMAxxxPS instructions), but 32 8-bit multiply-adds (1x 32-wide multiplication with accumulation of adjacent results via the VPMADDUBSW instruction).
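To make the lane arithmetic above concrete, here is a rough sketch in plain Python. It assumes 256-bit AVX2 registers and takes the "2x FMA, 1x byte multiply-add" issue rates from the comment rather than measuring them. The function `vpmaddubsw` is a hypothetical scalar emulation of what the AVX2 instruction VPMADDUBSW does to one register: unsigned bytes times signed bytes, with adjacent products summed into saturated signed 16-bit results.

```python
def vpmaddubsw(a_u8, b_s8):
    """Emulate VPMADDUBSW on lists: a_u8 holds unsigned bytes (0..255),
    b_s8 holds signed bytes (-128..127). Adjacent products are summed
    and saturated to the signed 16-bit range."""
    assert len(a_u8) == len(b_s8) and len(a_u8) % 2 == 0
    out = []
    for i in range(0, len(a_u8), 2):
        pair_sum = a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1]
        out.append(max(-32768, min(32767, pair_sum)))  # signed saturation
    return out

REGISTER_BITS = 256                      # AVX2 register width

# FP32 path: 8 lanes per register, 2 FMA issues per cycle (from the comment)
fp32_lanes = REGISTER_BITS // 32
fp32_macs_per_cycle = 2 * fp32_lanes     # 16

# 8-bit path: 32 lanes per register, 1 issue per cycle (from the comment)
int8_lanes = REGISTER_BITS // 8
int8_macs_per_cycle = 1 * int8_lanes     # 32

small = vpmaddubsw([1, 2, 3, 4], [1, -1, 2, 2])
saturated = vpmaddubsw([255, 255], [127, 127])
```

Note the saturation in the last call: two maximal products overflow 16 bits, so a real kernel has to manage operand ranges or widen the accumulator with further instructions. That overhead is presumably why the practical speedup over FP32 is less than the 2x the lane counts alone suggest.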

Ok. Then this will introduce significant truncation errors and it's not a general GEMM. That's like claiming you've made the fastest FEM routine in the world by doing everything in half-precision.
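For a feel of how large that error is, here is a minimal sketch of a dot product computed through 8-bit quantization and compared with the floating-point result. It assumes simple symmetric per-vector scaling, which is an illustrative choice and not necessarily the scheme FBGEMM uses; `quantize` and `int8_dot` are hypothetical helper names.

```python
import random

def quantize(xs):
    """Symmetric quantization to the signed 8-bit range [-127, 127].
    Returns the integer values and the scale used."""
    max_abs = max(abs(x) for x in xs) or 1.0
    scale = max_abs / 127.0
    return [round(x / scale) for x in xs], scale

def int8_dot(a, b):
    """Dot product with 8-bit operands and exact integer accumulation,
    rescaled back to real units at the end."""
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * sa * sb

random.seed(0)
n = 256
a = [random.uniform(-1, 1) for _ in range(n)]
b = [random.uniform(-1, 1) for _ in range(n)]

exact = sum(x * y for x, y in zip(a, b))
approx = int8_dot(a, b)
abs_err = abs(exact - approx)

# Worst case: each operand is off by at most half a quantization step.
max_a = max(abs(x) for x in a)
max_b = max(abs(x) for x in b)
worst_case = n * max_a * max_b / 127.0
```

The worst-case bound grows linearly with the vector length, while rounding errors of random sign tend to partially cancel. Whether the result is acceptable depends on the application: it may be fine for neural-network inference and unacceptable for something like FEM.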

Although it's right that most GEMMs are for large-ish, square-ish matrices, small and skinny ones are actually important and targeted for some HPC applications. The relevant comparison would be with libxsmm [1], which also targets deep learning (on x86_64), though I don't think the released version does reduced precision.

[1] https://libxsmm.readthedocs.io/
