Matrix multiplication via SIMD

Content:

Original link: Beating NumPy's matrix multiplication in 150 lines of C code / Aman Salykov.

SIMD is hot right now.

Here is an example of someone hand-coding C to do matrix multiplication using FMA3 and AVX instructions and ending up with code slightly faster than OpenBLAS (a widely used open source implementation of the classic BLAS library, written in C, Fortran and assembler).

(I don't think the fact that OpenBLAS is called via Python and NumPy has any relevance to the comparison.)

Pretty impressive, even though the author does note that OpenBLAS is likely to be faster on a CPU that supports AVX-512.

And there is a very good explanation of the logic behind the code. Definitely cool stuff.
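For a sense of what this kind of code looks like, here is a minimal sketch of my own. It is not the author's kernel, which goes much further in squeezing out performance; it only shows the basic idea of computing 8 floats of one output row at a time with AVX loads and a fused multiply-add. The function names (matmul, matmul_fma, matmul_scalar) are made up for illustration, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions, so this assumes one of those compilers.

```c
#include <stddef.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_X86_SIMD 1
#endif

/* Computes C += A * B with all matrices stored row-major:
   A is m x k, B is k x n, C is m x n. Plain scalar reference version. */
static void matmul_scalar(const float *A, const float *B, float *C,
                          int m, int n, int k)
{
    for (int i = 0; i < m; i++) {
        for (int p = 0; p < k; p++) {
            float a = A[i * k + p];
            for (int j = 0; j < n; j++)
                C[i * n + j] += a * B[p * n + j];
        }
    }
}

#ifdef HAVE_X86_SIMD
/* Same computation, but 8 columns of C at a time.
   The target attribute (a GCC/Clang extension, assumed here) lets this one
   function use AVX and FMA intrinsics without special compiler flags. */
__attribute__((target("avx,fma")))
static void matmul_fma(const float *A, const float *B, float *C,
                       int m, int n, int k)
{
    for (int i = 0; i < m; i++) {
        int j = 0;
        for (; j + 8 <= n; j += 8) {
            /* Accumulator holds C[i][j..j+7] in one 256-bit register. */
            __m256 acc = _mm256_loadu_ps(&C[i * n + j]);
            for (int p = 0; p < k; p++) {
                __m256 a = _mm256_broadcast_ss(&A[i * k + p]); /* A[i][p] in all 8 lanes */
                __m256 b = _mm256_loadu_ps(&B[p * n + j]);     /* B[p][j..j+7] */
                acc = _mm256_fmadd_ps(a, b, acc);              /* acc += a * b */
            }
            _mm256_storeu_ps(&C[i * n + j], acc);
        }
        /* Leftover columns when n is not a multiple of 8. */
        for (; j < n; j++) {
            float s = C[i * n + j];
            for (int p = 0; p < k; p++)
                s += A[i * k + p] * B[p * n + j];
            C[i * n + j] = s;
        }
    }
}
#endif

/* Picks the SIMD version when the CPU reports AVX and FMA support,
   otherwise falls back to the scalar loop. */
void matmul(const float *A, const float *B, float *C, int m, int n, int k)
{
#ifdef HAVE_X86_SIMD
    if (__builtin_cpu_supports("avx") && __builtin_cpu_supports("fma")) {
        matmul_fma(A, B, C, m, n, k);
        return;
    }
#endif
    matmul_scalar(A, B, C, m, n, k);
}
```

Calling matmul(A, B, C, m, n, k) with C zeroed beforehand gives the plain product. Don't expect this to get anywhere near OpenBLAS: it keeps only one accumulator register busy and does no cache blocking, which is where most of the real work in a fast implementation goes.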

And I like to see such stuff: real low-level stuff, which was common 40 years ago but not today.

Comments: