Hacker News
jart on April 2, 2024 | on: LLaMA now goes faster on CPUs
BLIS does that in their kernels. I've tried doing that, but I was never able to get better than half of MKL's performance. The BLIS technique of tiling across k also requires atomics or an array of locks to write the output, because several threads then accumulate into the same output tile.
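To make the synchronization point concrete, here is a minimal sketch of a matrix multiply that splits the k (reduction) dimension across threads. It is not BLIS's actual code: the function name `matmul_split_k` and the parameters `k_chunks` and `tile` are hypothetical, and plain Python lists stand in for real packed kernels. Each thread produces a partial sum for every element of the output, so the writes collide and need protection. This version uses the "array of locks" option, with one lock per row tile of the output. It is meant to show the write hazard, not to be fast.

```python
import threading


def matmul_split_k(A, B, m, n, k, k_chunks=4, tile=2):
    """Compute C = A @ B (A is m x k, B is k x n) with k split across threads.

    Every thread owns a slice [k0, k1) of the reduction dimension, so every
    thread contributes a partial sum to every element of C.
    """
    C = [[0.0] * n for _ in range(m)]

    # One lock per row tile of C: the "array of locks" approach.
    n_tiles = (m + tile - 1) // tile
    locks = [threading.Lock() for _ in range(n_tiles)]

    def worker(k0, k1):
        for t in range(n_tiles):
            i0, i1 = t * tile, min((t + 1) * tile, m)
            # Accumulate this thread's partial tile privately, lock-free.
            part = [
                [sum(A[i][p] * B[p][j] for p in range(k0, k1)) for j in range(n)]
                for i in range(i0, i1)
            ]
            # Only the read-modify-write into the shared output needs the lock.
            with locks[t]:
                for i in range(i0, i1):
                    row = C[i]
                    for j in range(n):
                        row[j] += part[i - i0][j]

    # Split k into roughly equal chunks, one thread per chunk.
    step = max(1, (k + k_chunks - 1) // k_chunks)
    threads = [
        threading.Thread(target=worker, args=(k0, min(k0 + step, k)))
        for k0 in range(0, k, step)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return C
```

If the work were split across rows or columns of the output instead, each thread would own its output tile outright and no locks would be needed. That trade-off is what the comment is pointing at.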