Llama2.mojo - outperforms Karpathy’s llama2.c by 30% in multi-threaded inference

notnap · 2023-12-05T06:43:54

Wasn’t llama2.c just a fun project for Karpathy?

When I compared llama2.c to llama.cpp, it was way, way slower.

All the Mojo <insert number>x speedup claims I’m hearing about seem to use some toy example that nobody actually uses IRL as the baseline.

Am I missing anything?