So the deal with AWE (Address Windowing Extensions) is that it lets 32-bit apps access memory above 4GB by essentially doing manual page mapping. You allocate physical pages, then map/unmap them into your 32-bit address space as needed. It's like having a tiny window you keep sliding around over a bigger picture.
The problem is that llama.cpp would need to be substantially rewritten to use it. We're talking:
VirtualAlloc(..., MEM_RESERVE | MEM_PHYSICAL, ...)  // reserve the window
AllocateUserPhysicalPages()  // needs SeLockMemoryPrivilege
MapUserPhysicalPages()       // map a chunk into the window
// do your tensor ops on this chunk
MapUserPhysicalPages(..., NULL)  // passing NULL unmaps; there's no UnmapUserPhysicalPages()
// slide the window, repeat
You'd basically be implementing your own memory manager that swaps chunks of the model weights in and out of your addressable space. It's not impossible, but it's a pretty gnarly undertaking for what amounts to "running AI on a museum piece."
That would be a real issue. I vaguely recall methods to work around this - various mappings, PAE (Intel's Physical Address Extension for addressing memory above 4GB), etc: https://learn.microsoft.com/en-us/windows/win32/memory/addre...
Maybe unrealistic :( I doubt this is drop-in code.