
The other way around is whole-number math. I added the 3-node output from the 13B model to GitHub; the timings are below. The 3-node 65B job hasn't finished yet.

    llama_print_timings:        load time =  17766.29 ms
    llama_print_timings:      sample time =    264.42 ms /   128 runs   (    2.07 ms per token,   484.07 tokens per second)
    llama_print_timings: prompt eval time =  10146.71 ms /     8 tokens ( 1268.34 ms per token,     0.79 tokens per second)
    llama_print_timings:        eval time = 287157.12 ms /   127 runs   ( 2261.08 ms per token,     0.44 tokens per second)
    llama_print_timings:       total time = 297598.22 ms
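The per-token and tokens-per-second figures in the log are just ratios of the totals; a quick sanity check in Python (all numbers copied from the quoted output, nothing else assumed):

```python
# Recompute ms/token and tokens/s from the quoted llama_print_timings totals.
# Each (total_ms, runs) pair is taken directly from the log above.
timings = {
    "sample":      (264.42, 128),
    "prompt eval": (10146.71, 8),
    "eval":        (287157.12, 127),
}

for name, (total_ms, runs) in timings.items():
    ms_per_token = total_ms / runs
    tokens_per_second = 1000.0 / ms_per_token
    print(f"{name}: {ms_per_token:.2f} ms per token, "
          f"{tokens_per_second:.2f} tokens per second")
```

The recomputed values match the printed ones to rounding (e.g. 287157.12 ms / 127 runs ≈ 2261.08 ms per token ≈ 0.44 tokens per second for eval).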




This is very interesting, and actually in the usable realm for some use cases.


My networking setup is not optimal, but it was quite surprising how easy it was to get it all to work.
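For anyone wanting to reproduce a multi-node run like this, the sketch below follows the MPI instructions from the llama.cpp README of that era; the hostfile contents, model path, and prompt are placeholders, not the poster's actual setup:

```shell
# Build llama.cpp with MPI support (assumes Open MPI or MPICH is installed).
make CC=mpicc CXX=mpicxx LLAMA_MPI=1

# hostfile lists one entry per node, e.g.:
#   node0 slots=1
#   node1 slots=1
#   node2 slots=1

# Launch inference across 3 nodes; model path and prompt are illustrative.
mpirun -hostfile ./hostfile -n 3 \
    ./main -m ./models/13B/ggml-model-q4_0.bin -p "Hello" -n 128
```

Layers are split across the ranks, so each node only needs enough RAM for its share of the model; the trade-off is that per-token latency is bounded by the network, which matches the ~2.3 s/token eval figure above.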





