When I compared llama2.c to llama.cpp, llama2.c was way, way slower.
The Mojo <insert number>x speedup claims I keep hearing about always use some toy example that nobody actually uses IRL as the baseline.
Am I missing anything?