Nice to see such speedups for CPUs. Are these changes available as a branch or pull request in llama.cpp itself? I'd like to make use of them in that form if possible (as I'm used to using that).
Yes, this is really a phenomenal effort! And what open source is about: Bringing improvements to so many use cases. So that Intel and AMD chip uses can start to perform while taking advantage of their high-performance capabilities, making even old parts competitive.