these LFM 2.5 models are crazy fast. the 8B-A1B model (the biggest in the series) produces 35-40 t/s on an aging 6-core CPU using llama.cpp. it's my go-to model whenever i need fast local inference, and it's also pretty good at tool calling. i'd love to see more finetunes on HF, but it seems not many people have discovered it yet.