It really does seem like a natural extension of the Unix maxim of "everything is a file"; I can imagine a modern browser embedded in a Plan 9 or Inferno style OS implementing a "tab as a file" style semantic.
OMG! I run in either text mode or very minimal graphics mode. This is a very appreciated comment. Now I am going to reconfigure my home machines to steal back that otherwise unused VRAM for swap.
Is this safe? I had the impression that undetected bit-flips were more common with GPU RAM than with CPU RAM banks, unless you have professional GPU cards with ECC.
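On NVIDIA cards you can at least check whether ECC is available or enabled at all; consumer GeForce cards typically just report it as N/A. Roughly:

    # query ECC mode and error counters, if the card supports ECC at all
    nvidia-smi -q -d ECC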
Just swap is inflexible, especially with the way Linux usually handles buffer/pagecache, memory allocation, and countless other things if not explicitly configured otherwise. You'd probably get the most out of it if you used a https://en.wikipedia.org/wiki/Zram setup pointed at that VRAM instead.
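For comparison, a minimal zram-as-swap sketch (this compresses into ordinary RAM; backing it with VRAM would need the block-device tricks discussed elsewhere in this thread):

    # create a zram device, format it as swap, and prefer it over disk swap
    sudo modprobe zram
    sudo zramctl --find --size 4G      # prints the device it grabbed, e.g. /dev/zram0
    sudo mkswap /dev/zram0
    sudo swapon --priority 100 /dev/zram0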
Except if it has a very weak CPU which struggles with the load caused by compression, or power constraints, because mobile? But even then the compression is configurable.
I once used something similar on an old ThinkPad T60p with an ATI FireGL; it had some 32-bit Intel Centrino and only 3.25GB usable RAM out of the 4GB. The FireGL had 256MB, of which I only needed 2, so I let it keep 6 just to be on the safe side. The other 250MB I gave to compcache, pointed at the device created by some (experimental?) block driver I can't remember anymore.
This wasn't automatic, didn't work on the first tries, and produced crashes.
But it was worth it once I had it right(¹), because that feeble thing ran into its limits much later. Noticeably, not just in some meaningless benchmarks.
(¹) Much fiddling with many hexadecimal memory addresses and ranges, and exclusions of the same in other configuration files scattered over /etc/ (mostly Xorg, but also other drivers' PCI address spaces), so as not to have one BYTE of overlap, but also not to waste any ;-)
Edit: Skimming the two links, phram rings a bell.
It also ran Google Earth! Not that it really mattered; on that GPU it probably did everything in software rather than anything the FireGL could deliver, or at least not at the time, given what MESA, XORG and DRI could do with that card back then.
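For the curious: nowadays that block-driver dance would presumably be something like MTD's phram mapped over an unused chunk of the VRAM aperture. The address below is a made-up placeholder; you'd have to look up the real BAR with lspci first:

    # expose a raw physical range (e.g. part of the VRAM BAR) as an MTD block device
    sudo modprobe phram phram=vram,0xd0000000,256Mi   # 0xd0000000 is a placeholder address
    sudo modprobe mtdblock                            # creates /dev/mtdblock0
    sudo mkswap /dev/mtdblock0
    sudo swapon /dev/mtdblock0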
Make that three decades ago. In 1992 I was using a utility called VGADISK to create a 176K ramdisk in the VGA memory - this was on a 286 "laptop" (rather a luggable) with 1MB of RAM, and allowed me to fit entire programs in there as long as they didn't use VGA graphics (which would instantly corrupt the content of the ramdrive).
...and that was almost 7 years ago too; in the meantime the second link there has also died. But that was before the rise of all the AI stuff and its accompanying demand for GPUs with otherwise insane amounts of VRAM.
I don’t know about Linux these days, but Windows WDDM likes to have host-backed storage for VRAM allocations... which makes me wonder if using VRAM for swap could ever get you into a weird recursive state with allocations or swapping. It’s already the case that allocating VRAM on Windows can require waiting on swapping some VRAM to disk. What if that swap had to then go back to VRAM?
In the era of huge ML models, wouldn't it make more sense to do the opposite, i.e. use main RAM or SSD as swap for VRAM? So that we can run e.g. ChatGPT on consumer hardware.
A lot of low-end GPUs do that already: when the working dataset doesn't fit in the GPU's very fast RAM, data ends up being swapped in and out of the GPU over the PCIe bus.
I used, I think, 32 megabytes of VRAM as swap way back when. It was on a single-channel AGP or PCI bus, so it wasn't fast, but it was much faster than disk, so pretty good.
One thing I'm not seeing mentioned is the security implications of running a filesystem in VRAM. Unless things have changed, GPU VRAM is basically a free-for-all and any application can do read/write operations anywhere in the memory bank.
There are certain scenarios where this is okay, but lots of scenarios where it is not (example: WebGL).
Things have in fact changed. GPUs overwhelmingly have their own MMU to separate different user contexts, including as accessed through their DMA engines. The few cases under Linux that don't, such as the Raspberry Pi's GPU, instead use the kernel to wrap any submissions to the GPU, where it can abstract and validate access to memory.
Yeah, it supports it. Pretty much every desktop/laptop GPU from the past 15-20 years supports this, in fact. I actually remember John Carmack lamenting back in ~2009 that although these MMUs were already ubiquitous, he wanted to use them more directly to implement virtual memory for IdTech4's MegaTexture feature but couldn't get the GPU vendors to expose them in usable ways for application developers.
Mantle/Vulkan/DX12/Metal were in a lot of ways a reaction to the thought: 'Well, because of the ubiquitous MMUs in GPUs, user code can only crash its own GPU context, so it can't hurt to give it access to a lower-level, video-game-console-esque API. At most it crashes itself, and that was always allowed.'
> Is this the IOMMU (which isn't generally supported, except on high end hardware [0]) or something else?
No, it's another MMU on the GPU itself. An IOMMU combined with a GPU is a third MMU between the GPU and physical memory, so you can give more or less raw GPU access to a VM securely.
... but why is it that, when my X11 system (it happens in Wayland too, IIRC) has some kind of bug, one window displays the contents of another (with some graphical corruption)?
My supposition was that a buffer of one program ending up in VRAM associated with another program can only happen if there is no MMU; otherwise, what's the point?
Also: do you happen to know why the Qubes folks think it's okay for two mutually untrusted programs to run on the CPU, but can't accept that two untrusted programs can run on the GPU without eavesdropping on each other? Even with an IOMMU, it seems.
It gives me the impression that safely sharing the GPU is a work in progress / not very secure.
> ... but why is it that, when my X11 system (it happens in Wayland too, IIRC) has some kind of bug, one window displays the contents of another (with some graphical corruption)?
> My supposition was that a buffer of one program ending up in VRAM associated with another program can only happen if there is no MMU; otherwise, what's the point?
Because the compositor is running entirely in one GPU user context with access to the different windows. The different applications can't access each other, but the compositor can see all of their final display buffers.
> Also: do you happen to know why the Qubes folks think it's okay for two mutually untrusted programs to run on the CPU, but can't accept that two untrusted programs can run on the GPU without eavesdropping on each other? Even with an IOMMU, it seems.
> It gives me the impression that safely sharing the GPU is a work in progress / not very secure.
Defense in depth, from not trusting MMU hardware that slightly changes its interface to drivers every year and is different for every manufacturer, combined with the lack of documentation on said hardware.
An IOMMU doesn't really address the problem, because it doesn't provide a way to split a GPU, only to hand it as a whole to another context. So, for instance, an IOMMU doesn't enable accessing the GPU from multiple VMs simultaneously either.
Yeah, that isn't right. In the very early WebGL days there were a handful of implementation bugs where previous VRAM content was accidentally exposed in textures, but those were fixed quickly. AFAIK, most of this is now also taken care of down at the hardware and driver level.
WebGL goes to great lengths not to expose any 'junk' VRAM data or allow out-of-bounds access, even on old GPUs. And for the last decade or so, GPUs and their drivers have implemented the same memory protection mechanisms as CPUs (AFAIK, WebGL was a major driving force in that area).
VRAM is accessible via an aperture in physical address space (I believe it's bank-switched for GPUs with more VRAM than the aperture size), which normally isn't directly accessible to user-mode applications, much like most hardware on Linux.
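You can see that aperture as a memory BAR in lspci output; the bus address and size below are just illustrative and vary by card:

    # the VRAM aperture shows up as a (usually prefetchable) memory BAR on the GPU
    lspci -v -s 01:00.0 | grep -i 'memory at'
    #   Memory at e0000000 (64-bit, prefetchable) [size=256M]   <- example output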
That's not too bad. Most desktop work requires only a tiny fraction of the memory a discrete GPU has on board. As long as the OS manages it well enough (letting programs use the remaining memory safely while keeping the VRAM filesystem safe), it seems like a good solution.
It would seem more natural to map VRAM to a block device, using nbdkit. Then you could create a regular filesystem like ext4 on top. Is there a specific reason why writing an esoteric FUSE filesystem is better?
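Something like the sketch below, say; the stock nbdkit memory plugin lives in host RAM, so a VRAM-backed version would need a custom plugin, but the plumbing on top would be the same:

    # serve a 1G RAM-backed disk over NBD, attach it, and put ext4 on top
    nbdkit memory 1G                      # listens on port 10809 and forks into the background
    sudo modprobe nbd
    sudo nbd-client localhost /dev/nbd0
    sudo mkfs.ext4 /dev/nbd0
    sudo mount /dev/nbd0 /mnt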
I did that long ago with an IBM T60p, which had a 32-bit Centrino and only 3.25 or 3.5GB of usable RAM out of 4GB, while the ATI FireGL had 256MB. At the time there was no zram, only its predecessor compcache. IIRC I only needed about 2MB of VRAM for the stuff I did; any capability the FireGL had beyond useless GLXGEARS wasn't supported by MESA/XORG/DRI at the time.
What can I say? After much fiddling it worked stably, with the result that that POS ran into its limits much later, extending its usability. Not by much, but noticeably.
Even Google Earth ran. Probably because it did everything in software anyway, given which capabilities of the GPU were actually supported and exposed at the time.
I assume you're tongue in cheek, but just to make a serious point, it should never make sense to swap from cheap RAM to more expensive RAM. The whole point of swap is to move rarely used stuff into storage with a lower cost per GB. Otherwise, you would just buy more RAM in the first place.
For example, you might have a laptop with soldered RAM and a discrete GPU. There may be tasks where you don't need the dGPU but could use the extra RAM.
It's relatively fast (it probably could be faster than NVMe if optimized, I did not RTFM) and it is available, so it's not the most horrible of ideas. I am fairly certain lots of applications will load assets into VRAM and then evict them from system memory. Why not leverage this for general-purpose use if it makes sense (after taking into account the security implications, of course)?
Unless you have a notebook/desktop with a powerful GPU and there was no option with all other things staying the same but a cheaper GPU, so you ended up with an under-utilised GPU.
It would be interesting to implement this and a highly parallelizable GPU background data compression algorithm...
It could be zram (zram would be a good choice!), it could be something else, it all depends on what kind of performance/compression is desired, and how parallelizable the compression algorithm is...
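On the CPU side at least, the compressor is already pluggable per zram device (which algorithms are listed depends on the kernel build); a quick illustration:

    # list the available compressors and pick one (must be done before the device is sized and used)
    cat /sys/block/zram0/comp_algorithm          # e.g. lzo lzo-rle lz4 [zstd]
    echo lz4 | sudo tee /sys/block/zram0/comp_algorithm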
I have more RAM than I need, and just the other day I set up a RAM disk to put my /tmp directory on. My comic reader unpacks huge rar and zip archives into /tmp at every run, and I don't want it to wear out my SSD. I put this line in /etc/fstab:
none /tmp tmpfs defaults,size=4G 0 0
It works like a charm. And in the worst case, if I run out of RAM anyway, it readily swaps out.
I picked these up from some compliance benchmarks; they're commonly applied to /tmp (roughly the options shown below) -- I'd exercise caution with them elsewhere, they're fairly restrictive.
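For illustration, the /tmp hardening options those benchmarks usually recommend look something like this (not necessarily the exact line meant above):

    none /tmp tmpfs defaults,size=4G,nodev,nosuid,noexec 0 0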
Question is, can I use my spare APU for this? On second thought, giving it more VRAM in the BIOS only means using more RAM for it, so it would just be a RAM disk with extra steps.
[1] Yesterday: TabFS – a browser extension that mounts the browser tabs as a filesystem, https://news.ycombinator.com/item?id=34847611
[2] The day before: Linux's SystemV Filesystem Support Being Orphaned, https://news.ycombinator.com/item?id=34818040