
For those curious about running their own matmul benchmarks: I wrote a script a while back that works on both Linux and macOS and should make comparison easy.


I saw ~1.2 TFLOPS on the regular M1.

On Linux I had to use `-lcblas` instead of `-lblas`. "6.63171 gflops" with a 24-core AMD EPYC 74F3

I used OpenBLAS on my cheap last-generation AMD Ryzen 7 4700U laptop like so:

git clone https://github.com/xianyi/OpenBLAS && cd OpenBLAS && make PREFIX=/opt/openblas install && curl https://jott.live/code/blas_test.cc | sed -n "/<code>/,/code>/p" | tail -n +2 | head -n -1 > blas_test.cpp

Inspect the blas_test.cpp file, and then...

g++ -I/opt/openblas/include/ blas_test.cpp -lopenblas -std=c++11 -O3 -L/opt/openblas/lib/ -o blas_test && ./blas_test 512 512 512 100 100

and got a peak of about 192 gflops, averaging closer to 180. So yeah, the M1 is > 6x faster in this simple single-precision matrix test.

541 gflops here, following those steps. Well done Apple for making a laptop CPU over 2x faster than a 250W server CPU released this year :)

With my Ryzen 7 5800U laptop I get around 530 gflops, with a peak of 596 if I compile the test against MKL with:

g++ -I/opt/intel/mkl/include/ blas_test.cc -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -std=c++11 -O3 -march=native -L/opt/intel/mkl/lib/intel64 -o blas_test_mkl

If you swap out "code" in the URL for "raw", you get the raw text itself, no sed needed.


2.1 TFLOPS on the 8-core M1 Pro (6 performance + 2 efficiency cores)
