True -- you can use (in OpenCL) clEnqueueMapBuffer to get something that looks like memory-mapped IO, but the consistency guarantees are different from regular host-based MMIO. Specifically, if you map a GPU buffer for writes, there's no guarantee on what you'll get when you read that buffer until you unmap the region. (You can think of it as buffering up writes in host memory until you unmap the region, at which point it's DMAed over to the GPU.)
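Roughly, the usage pattern looks like this (just a sketch, not a complete program: the context/queue/buffer setup via clCreateContext, clCreateCommandQueue and clCreateBuffer is assumed to have happened elsewhere, and error checking is omitted):

    #include <string.h>
    #include <CL/cl.h>

    /* Sketch: map a GPU buffer for writing, fill it on the host, then unmap. */
    static void upload_via_map(cl_command_queue queue, cl_mem buf,
                               const void *src, size_t nbytes)
    {
        cl_int err;
        void *ptr = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                       0, nbytes, 0, NULL, NULL, &err);

        /* Writes go into the host-visible mapping; there's no guarantee the
           device (or a read of the buffer) sees them until the unmap below. */
        memcpy(ptr, src, nbytes);

        /* After the unmap, the runtime is free to DMA the data over to the
           GPU, and kernels will see the updated contents. */
        clEnqueueUnmapMemObject(queue, buf, ptr, 0, NULL, NULL);
        clFinish(queue);
    }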
In other words, OpenCL supports only this very limited buffer interface for compatibility reasons, i.e. this kind of MMIO is the lowest common denominator that any GPU claiming OpenCL compatibility has to implement. That said, this doesn't preclude most desktop discrete GPUs from mapping their whole internal VRAM into the host's memory address space over the PCI bus. From what I understood after skimming several technical documents, this seems to be the common mechanism for the host to access VRAM on modern ATI and NVidia GPUs. It is also, as far as I can tell, the main reason behind the infamous 'memory hole' in 32-bit Windows (the inability to use more than 2.5-3 GB of RAM): the PCI apertures for VRAM and other devices have to fit below the 4 GB mark and crowd out physical RAM in the 32-bit address space.
So, I guess, the correct answer to my initial question -- why it's not possible to use tmpfs with VRAM -- is that it would require special memory allocations made in VRAM. Meaning, a patch to the tmpfs code that makes it allocate its pages out of the mapped VRAM aperture should suffice, if we're willing to limit compatibility to 64-bit x86 with AMD/NVidia GPUs.
See the "Notes" section in https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/c...