So you say having a decoupled arrangement in software (which happens to be a de facto open standard) is a "terrible awful idea" and that instead you should just rely on whatever your proprietary hardware graphics vendor proposes to you? Why?
And that's assuming they propose anything at all.
Even GVT-g breaks every other Linux release, is at risk of being abandoned by Intel (as they already did with the Xen version) or of being restricted to specific CPU market segments, and already has ridiculous limitations such as a cap on both the number of concurrent framebuffers AND their sizes (why? VMware Workstation offers you an infinitely resizable window, does it with 3D acceleration just fine, and I have never been able to tell whether it has a limit on the number of simultaneous VMs...).
Meanwhile, "software-based GPU virtualization" allows me to share GPUs in the host that will never have hardware-based partitioning support (e.g. ANY consumer AMD card), and allows guests to have working 3D by implementing only one interface (e.g. https://github.com/JHRobotics/softgpu for retro Windows) instead of having to implement drivers for every GPU in existence.
> So you say having a decoupled arrangement in software (which happens to be a de facto open standard) is a "terrible awful idea" and that instead you should just rely on whatever your proprietary hardware graphics vendor proposes to you? Why?
Sandboxing, and resource quotas / allocations / reservations.
By itself, a paravirtualized GPU treats every userland workload that any given guest launches onto the GPU as a sibling of all the others, exactly as if there were no virtualization and you were just running multiple workloads on one host.
And so, just like multiple GPU-using apps on a single non-virtualized host, these workloads will get "thin-provisioned" the resources they need, as they ask for them, with no advance reservation; and workloads may very well end up fighting over those resources, if they attempt to use a lot of them. You're just not supposed to run two things that attempt to use "as much VRAM as possible" at once.
This means that, on a multi-tenant hypervisor host (e.g. the "with GPU" compute machines in most clouds), a paravirtualized GPU would give no protection at all from one tenant using all of a host GPU's resources, leaving none left over for the other guests sharing that host GPU. The cloud vendor would have guaranteed each tenant so much GPU capacity — but that guarantee would be empty!
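To make that concrete, here is a tiny sketch (the numbers and names are made up, and a real allocator is of course far more involved): with a thin-provisioned shared pool and no per-guest reservations, whichever tenant asks first wins, and the other tenant's "guarantee" is worth nothing.

    POOL_MIB = 8192                        # one physical GPU's VRAM
    free_mib = POOL_MIB

    def alloc(tenant, mib):
        """Allocate from the shared pool; no per-tenant cap or reservation."""
        global free_mib
        if mib > free_mib:
            return False                   # latecomer just gets an allocation failure
        free_mib -= mib
        return True

    print(alloc("tenant-A", 7000))         # True:  tenant A grabs nearly everything
    print(alloc("tenant-B", 2000))         # False: tenant B starves despite its "guarantee"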
To enforce multi-tenant QoS, you need hardware-supported virtualization — i.e. the ability to make "all of the GPU" actually mean "some of the GPU", defining how much GPU that is on a per-guest basis.
(And even in PC use-cases, you don't want a guest to be able to starve the host! Especially if you might be running untrusted workloads inside the guest, for e.g. forensic analysis!)
Why does multi-tenant QoS require hardware-supported virtualisation?
An operating system doesn't require virtualisation to manage application resource usage of CPU time, system memory, disk storage, etc – although the details differ from OS to OS, most operating systems have quota and/or prioritisation mechanisms for these – why not for the GPU too?
There is no reason in principle why you can't do that for the GPU too. In fact, there has been a series of Linux cgroup patches going back several years now to add GPU quotas to cgroups, so you can set up per-app quotas on GPU time and GPU memory – https://lwn.net/ml/cgroups/20231024160727.282960-1-tvrtko.ur... is the most recent I could find (from 6-7 months back), but there were earlier iterations broader in scope, e.g. https://lwn.net/ml/cgroups/20210126214626.16260-1-brian.welt... (from 3+ years ago). For whatever reason none of these have yet been merged into the mainline Linux kernel, but I expect it will happen eventually (especially with all the current focus on GPUs for AI applications). Once you have cgroup support for GPUs, why couldn't a paravirtualised GPU driver on a Linux host use it to provide GPU resource management?
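For comparison, this is roughly what that per-app control already looks like for CPU and memory with cgroup v2 today. The cpu.max, memory.max and cgroup.procs knobs below are real cgroup v2 files; the "gpu.max" line is purely hypothetical, since the proposed DRM controller defines its own (not yet upstream) interface. Needs root.

    from pathlib import Path

    root = Path("/sys/fs/cgroup")
    (root / "cgroup.subtree_control").write_text("+cpu +memory")  # enable controllers

    cg = root / "my-vm"
    cg.mkdir(exist_ok=True)
    (cg / "cpu.max").write_text("50000 100000")        # at most 50% of one CPU
    (cg / "memory.max").write_text(str(4 * 1024**3))   # hard cap: 4 GiB of RAM
    # (cg / "gpu.max").write_text("...")               # hypothetical GPU counterpart

    (cg / "cgroup.procs").write_text("12345")          # PID of the VM/app to confine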
And I don't see why it has to wait for GPU cgroups to be upstreamed in the Linux kernel – if all you care about is VMs and not any non-virtualised apps on the same hardware, why couldn't the hypervisor implement the same logic inside a paravirtualised GPU driver?
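Something like this, say (every name here is made up for illustration, not any real hypervisor's API): the host-side backend that already mediates every guest allocation simply checks a per-guest cap before forwarding the request to the real driver.

    class PvGpuBackend:
        """Hypothetical host-side backend for a paravirtualized GPU."""

        def __init__(self, quotas_mib):
            self.quotas = dict(quotas_mib)           # per-guest hard caps set by the admin
            self.used = {g: 0 for g in quotas_mib}   # what each guest currently holds

        def handle_alloc(self, guest, mib):
            """Called for every VRAM allocation a guest forwards to the host."""
            if self.used[guest] + mib > self.quotas[guest]:
                return None                          # reject: would exceed this guest's quota
            self.used[guest] += mib
            return self._host_driver_alloc(mib)      # only now touch the real GPU driver

        def handle_free(self, guest, mib, handle):
            self.used[guest] -= mib
            self._host_driver_free(handle)

        def _host_driver_alloc(self, mib):
            return object()                          # stand-in for a real driver allocation

        def _host_driver_free(self, handle):
            pass

    backend = PvGpuBackend({"guest-A": 4096, "guest-B": 4096})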
> Sandboxing, and resource quotas / allocations / reservations.
But "sandboxing" is not a property of hardware-based virtualization. Hardware-based virtualization may even increase your surface attack, not decrease it, as now the guest directly accesses the GPU in some way software does not fully control (and, for many vendors, is completely proprietary). Likewise, resource quotas can be implemented purely in a software manner. Surely an arbitrary program being able to starve the rest of the system UI is a solved problem in platforms these days, otherwise Android/iOS would be unusable... Assuming the GPU's static partitioning is going to prevent this is assuming too much from the quality of most hardware.
And there is an even bigger elephant in the room: most users of desktop virtualization would consider static allocation of _anything_ a bug, not a feature. That's precisely why most desktop virtualization wants to do thin-provisioning of resources even when it is difficult to do so (e.g. memory). In other words, we are still looking at this from the point of view of server virtualization, which just shows how desktop virtualization and server virtualization have almost diametrically opposed goals.
A soft-gpu driver backed by real hardware "somewhere else" is a beautiful piece of software! While it certainly has applications in virtual machines, and may even be "optimal" for some use cases like desktop gaming, it ultimately doesn't fit the modern definition of "virtualization".
I am talking about virtualization in the sense of being able to divide the hardware resources of a system into isolated domains and give control of those resources to guest operating systems. Passing API calls from guest to host for execution inside of the host domain is not that. A GPU providing a bunch of PCIe virtual functions which are individually mapped to guests interacting directly with the hardware is that.
GPU virtualization should be the base implementation and paravirtualization/HLE/api-passthrough can still sit on top as a fast-path when the compromises of doing it that way can be justified.
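Concretely, the hardware path I mean looks roughly like this (the PCI addresses are made up, it assumes root plus an SR-IOV-capable GPU, and details like unbinding the VF from its default driver and IOMMU groups are glossed over): carve the device into virtual functions via sysfs, then give one VF to the guest through VFIO so it programs the hardware directly, with no API forwarding in the host.

    from pathlib import Path
    import subprocess

    pf = Path("/sys/bus/pci/devices/0000:03:00.0")     # the GPU's physical function
    (pf / "sriov_numvfs").write_text("4")              # ask the device for 4 virtual functions

    vf = "0000:03:00.1"                                # first VF (illustrative address)
    Path(f"/sys/bus/pci/devices/{vf}/driver_override").write_text("vfio-pci")
    Path("/sys/bus/pci/drivers_probe").write_text(vf)  # re-probe so vfio-pci binds the VF

    # Hand the VF to a guest; inside the VM it appears as an ordinary PCI GPU
    # that the guest's own driver talks to directly.
    subprocess.run(["qemu-system-x86_64", "-enable-kvm", "-m", "4G",
                    "-device", f"vfio-pci,host={vf}"])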
I would say the complete opposite. The only reason one may have to use a real GPU driver backed by a partitioned GPU is precisely desktop gaming, as there you are more interested in performance than anything else and the arbitrary limits set by your GPU vendor (e.g. 1 partition only) may not impact you at all.
If you want to really divide hardware resources, then as I argue in the other thread, doing it in software is clearly a much more sensible way to go. You are not subject to the whims of the GPU vendor, and the OS, rather than the firmware, controls the partition boundaries, the same as has been done for practically every other virtualized device (CPUs, memory, etc.). We never expected the hardware to need to partition itself; I'd even have a hard time calling that "virtualization" at all. Plus, the way hardware is designed these days, it is highly unlikely that the PCI virtual functions of a GPU function as an effective security boundary. If it weren't for performance, using hardware partitioning would never be a worthwhile tradeoff.