Yeah that's using over 5 gigabytes, not 400 megabytes. Your process monitor is i...

cfn · on Aug 8, 2023

It is strange that it does that given that there's plenty of free memory available in the system (it has 256Gb of RAM and wasn't running anything else).

dTal · on Aug 9, 2023

Not really, it's just a question of accounting. mmap is functionally the same as disk cache. As long as you've got the RAM, it'll run from RAM. If you really want, you can force llama.cpp not to use mmap and explicitly load everything into RAM, but there's not really any performance reason to do that - if the kernel keeps dropping your pages, you're under memory pressure anyway and "locking" that memory will probably end up either thrashing or invoking the OOM killer.