
How does that work? Does it mean that for every token generated it has to page areas of disk into RAM and then back out again?



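You'd need enough RAM to hold the whole model in memory, or performance will be awful. The kernel loads mmap'd pages lazily and can drop them under memory pressure, so with too little RAM the weights would keep being re-read from disk, potentially on every token. Because the mapping is read-only, evicted pages are simply discarded, not written back out. mmap is just a way to get faster startup (if the mmap'd file looks exactly like the in-memory representation) and easier sharing (the mmap region can be shared read-only memory that multiple processes use).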
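To illustrate the "file looks exactly like the in-memory representation" point, here's a minimal Python sketch using only the standard library. It assumes a hypothetical weights file that is nothing but raw float32 values in native byte order (real model formats add headers and metadata, and this is not how any particular loader works). Mapping the file and casting the bytes gives usable floats without parsing or copying; pages are typically read from disk only when first touched. The multi-process sharing benefit isn't shown here.

```python
import mmap
import os
import struct
import tempfile


def write_weights(path, values):
    """Write float32 values in native byte order, i.e. the same layout they have in memory."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}f", *values))


def map_weights(path):
    """Map the file read-only and view it as float32 without copying.

    The OS typically reads pages from disk lazily, on first access, and can
    discard them (not write them back) under memory pressure since the mapping is read-only.
    """
    with open(path, "rb") as f:
        # The mmap keeps its own handle, so closing the file afterwards is fine.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Reinterpret the mapped bytes as float32 in place: no parsing, no copy.
    return mm, memoryview(mm).cast("f")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "weights.bin")  # hypothetical file name
    write_weights(path, [0.5, 1.5, 2.0, -1.0])
    mm, weights = map_weights(path)
    print(len(weights), sum(weights))  # prints "4 3.0"
    weights.release()  # release the view before closing, or close() raises BufferError
    mm.close()
```

The "startup" cost here is just setting up the mapping; if the file needed decoding or reshaping first, you'd lose that benefit and have to copy into a separate buffer.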



