
The other way around is whole-number math. I added the 3-node output from the 13B model to GitHub; the timings are below. The 3-node 65B job hasn't finished yet.

    llama_print_timings:        load time =  17766.29 ms
    llama_print_timings:      sample time =    264.42 ms /   128 runs   (    2.07 ms per token,   484.07 tokens per second)
    llama_print_timings: prompt eval time =  10146.71 ms /     8 tokens ( 1268.34 ms per token,     0.79 tokens per second)
    llama_print_timings:        eval time = 287157.12 ms /   127 runs   ( 2261.08 ms per token,     0.44 tokens per second)
    llama_print_timings:       total time = 297598.22 ms
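The per-token and tokens-per-second figures in the log are just ratios of the totals; a quick sanity check in Python (all numbers copied from the quoted output, nothing else assumed):

```python
# Recompute ms/token and tokens/s from the quoted llama_print_timings totals.
# Each (total_ms, runs) pair is taken directly from the log above.
timings = {
    "sample":      (264.42, 128),
    "prompt eval": (10146.71, 8),
    "eval":        (287157.12, 127),
}

for name, (total_ms, runs) in timings.items():
    ms_per_token = total_ms / runs
    tokens_per_second = 1000.0 / ms_per_token
    print(f"{name}: {ms_per_token:.2f} ms per token, "
          f"{tokens_per_second:.2f} tokens per second")
```

The recomputed values match the printed ones to rounding (e.g. 287157.12 ms / 127 runs ≈ 2261.08 ms per token ≈ 0.44 tokens per second for eval).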




This is very interesting, and actually in the usable realm for some use cases.


My networking setup is not optimal, but it was quite surprising how easy it was to get it all to work.
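For anyone wanting to reproduce a multi-node run like this, the sketch below follows the MPI instructions from the llama.cpp README of that era; the hostfile contents, model path, and prompt are placeholders, not the poster's actual setup:

```shell
# Build llama.cpp with MPI support (assumes Open MPI or MPICH is installed).
make CC=mpicc CXX=mpicxx LLAMA_MPI=1

# hostfile lists one entry per node, e.g.:
#   node0 slots=1
#   node1 slots=1
#   node2 slots=1

# Launch inference across 3 nodes; model path and prompt are illustrative.
mpirun -hostfile ./hostfile -n 3 \
    ./main -m ./models/13B/ggml-model-q4_0.bin -p "Hello" -n 128
```

Layers are split across the ranks, so each node only needs enough RAM for its share of the model; the trade-off is that per-token latency is bounded by the network, which matches the ~2.3 s/token eval figure above.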





