
Generally, having to unload data from GPU RAM and load in a new set of weights is quite expensive, so my guess is that the backend is built so that each input gets a reservation on some cluster based on some ordering, and the batch is then run through.
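To make that guess concrete, here is a rough sketch of the kind of scheduling I have in mind. Everything in it is hypothetical (the `Worker` class, the `schedule` function, the `max_batch` parameter, and the "prefer a worker that already holds the weights" ordering); it is a toy model of the idea, not a description of how any real backend works.

```python
from collections import defaultdict, deque


class Worker:
    """Hypothetical stand-in for a GPU (or cluster) holding one model's weights."""

    def __init__(self, name):
        self.name = name
        self.loaded_model = None  # model whose weights are currently resident
        self.swaps = 0            # how many times weights had to be (re)loaded

    def run_batch(self, model, inputs):
        # Loading a different set of weights is the expensive step,
        # so count every time it happens.
        if self.loaded_model != model:
            self.loaded_model = model
            self.swaps += 1
        # Placeholder for real inference: echo back (model, input) pairs.
        return [(model, x) for x in inputs]


def schedule(requests, workers, max_batch=8):
    """Group requests by model, then run each group on a single worker.

    requests: list of (model, input) pairs in arrival order.
    Assumed ordering: prefer a worker that already has the model loaded,
    then an idle worker, then the worker with the fewest swaps so far.
    """
    queues = defaultdict(deque)
    for model, x in requests:
        queues[model].append(x)

    results = []
    for model, queue in queues.items():
        worker = next((w for w in workers if w.loaded_model == model), None)
        if worker is None:
            worker = next((w for w in workers if w.loaded_model is None), None)
        if worker is None:
            worker = min(workers, key=lambda w: w.swaps)

        # Drain this model's queue in batches before touching another model.
        while queue:
            size = min(max_batch, len(queue))
            batch = [queue.popleft() for _ in range(size)]
            results.extend(worker.run_batch(model, batch))
    return results
```

With two workers and requests that alternate between two models, each worker ends up loading weights once, whereas serving the same requests strictly in arrival order on one worker would reload weights on every request. The tradeoff is that results come back grouped by model rather than in arrival order, which is roughly what I mean by inputs getting a reservation based on some ordering.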


