Today this is mostly implemented by having a guest driver pass calls through to a layer on the host that does the actual rendering. While I agree that there is a lot of magic to making such an arrangement work, it's a terrible awful idea to suggest that relying on a vendor's emulation layer is how things should be done today.
Proper GPU virtualization and/or partitioning is the right way to do it and the vendors need to get their heads out of their ass and stop restricting its use on consumer hardware. Intel already does: you can use GVT-g to get a guest GPU on any platform that wants to implement it.
So you say having a decoupled arrangement in software (which happens to be a de facto open standard) is a "terrible awful idea" and that instead you should just rely on whatever your proprietary hardware graphics vendor proposes to you? Why?
And that's assuming they propose anything at all.
Even GVT-g breaks every other Linux release, is at risk of being abandoned by Intel (they already abandoned the Xen version) or limited to specific CPU market segments, and already has ridiculous limitations, such as a cap on the number of concurrent framebuffers AND on framebuffer sizes (why? VMware Workstation gives you an infinitely resizable window with 3D acceleration just fine, and I have never been able to tell whether it even has a limit on the number of simultaneous VMs...).
Meanwhile, "software-based GPU virtualization" lets me share host GPUs that will never have hardware-based partitioning support (e.g. ANY consumer AMD card), and lets guests get working 3D by implementing only one interface (e.g. https://github.com/JHRobotics/softgpu for retro Windows) instead of having to implement drivers for every GPU in existence.
> So you say having a decoupled arrangement in software (which happens to be a de facto open standard) is a "terrible awful idea" and that instead you should just rely on whatever your proprietary hardware graphics vendor proposes to you? Why?
Sandboxing, and resource quotas / allocations / reservations.
By itself, a paravirtualized GPU treats every userland workload launched onto the GPU by any guest as a sibling of all the others — exactly as if there were no virtualization and you were just running multiple workloads on one host.
And so, just like multiple GPU-using apps on a single non-virtualized host, these workloads will get "thin-provisioned" the resources they need, as they ask for them, with no advance reservation; and workloads may very well end up fighting over those resources, if they attempt to use a lot of them. You're just not supposed to run two things that attempt to use "as much VRAM as possible" at once.
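Purely as an illustration (a toy model with made-up numbers, not any real driver API), the failure mode of first-come-first-served thin-provisioning looks like this:

    # Toy sketch only: one shared VRAM pool, no per-guest reservations.
    # The point is just that allocation is first-come, first-served.
    class SharedGpu:
        def __init__(self, vram_mb):
            self.free_mb = vram_mb  # a single global pool, no quotas

        def alloc(self, guest, mb):
            if mb <= self.free_mb:
                self.free_mb -= mb
                print(f"{guest}: got {mb} MiB ({self.free_mb} MiB left)")
                return True
            print(f"{guest}: allocation of {mb} MiB FAILED")
            return False

    gpu = SharedGpu(vram_mb=8192)
    gpu.alloc("tenant-A", 8000)  # a greedy workload grabs nearly everything
    gpu.alloc("tenant-B", 1024)  # the other tenant's "guaranteed" share is gone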
This means that, on a multi-tenant hypervisor host (e.g. the "with GPU" compute machines in most clouds), a paravirtualized GPU would give no protection at all from one tenant using all of a host GPU's resources, leaving none left over for the other guests sharing that host GPU. The cloud vendor would have guaranteed each tenant so much GPU capacity — but that guarantee would be empty!
To enforce multi-tenant QoS, you need hardware-supported virtualization — i.e. the ability to make "all of the GPU" actually mean "some of the GPU", defining how much GPU that is on a per-guest basis.
(And even in PC use-cases, you don't want a guest to be able to starve the host! Especially if you might be running untrusted workloads inside the guest, for e.g. forensic analysis!)
Why does multi-tenant QoS require hardware-supported virtualisation?
An operating system doesn't require virtualisation to manage application resource usage of CPU time, system memory, disk storage, etc – although the details differ from OS to OS, most operating systems have quota and/or prioritisation mechanisms for these – why not for the GPU too?
There is no reason in principle why you can't do that for the GPU too. In fact, there has been a series of patches going back several years now to add GPU quotas to Linux cgroups, so you can set up per-app quotas on GPU time and GPU memory – https://lwn.net/ml/cgroups/20231024160727.282960-1-tvrtko.ur... is the most recent I could find (from 6-7 months back), but there were earlier iterations broader in scope, e.g. https://lwn.net/ml/cgroups/20210126214626.16260-1-brian.welt... (from 3+ years ago). For whatever reason none of these have yet been merged into the mainline Linux kernel, but I expect it will happen eventually (especially with all the current focus on GPUs for AI applications). Once you have cgroups support for GPUs, why couldn't a paravirtualised GPU driver on a Linux host use that to provide GPU resource management?
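To make the analogy concrete: memory.max and cpu.max are real cgroup v2 interface files you can use today (as root, with the controllers enabled); the GPU line below uses a hypothetical file name, since no GPU controller has been merged yet, just to show what a per-app GPU quota could look like:

    # Sketch by analogy with existing cgroup v2 controllers (needs root and
    # the relevant controllers enabled in the parent's cgroup.subtree_control).
    from pathlib import Path

    cg = Path("/sys/fs/cgroup/vm-workload")
    cg.mkdir(exist_ok=True)

    (cg / "memory.max").write_text("8G\n")          # real: cap system RAM
    (cg / "cpu.max").write_text("200000 100000\n")  # real: ~2 CPUs of time
    # (cg / "gpu.memory.max").write_text("4G\n")    # hypothetical GPU quota;
    #                                               # no such file exists yet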
And I don't see why it has to wait for GPU cgroups to be upstreamed in the Linux kernel – if all you care about is VMs and not any non-virtualised apps on the same hardware, why couldn't the hypervisor implement the same logic inside a paravirtualised GPU driver?
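Something like the following sketch is what I have in mind (all names are hypothetical, this is not actual virtio-gpu or hypervisor code): the host-side backend of a paravirtualised GPU already mediates every allocation request a guest makes, so it can do its own per-guest accounting before forwarding anything to the real driver:

    # Hypothetical host-side backend enforcing per-guest VRAM budgets in
    # software, with no hardware partitioning involved.
    class ParavirtGpuBackend:
        def __init__(self, quotas_mb):
            self.quotas = quotas_mb                # per-guest VRAM budgets
            self.used = {g: 0 for g in quotas_mb}

        def handle_alloc(self, guest, mb):
            if self.used[guest] + mb > self.quotas[guest]:
                return False                       # reject: over budget
            self.used[guest] += mb                 # account, then forward to
            return True                            # the real host GPU driver

    backend = ParavirtGpuBackend({"tenant-A": 4096, "tenant-B": 4096})
    assert backend.handle_alloc("tenant-A", 4096)   # within budget
    assert not backend.handle_alloc("tenant-A", 1)  # greed is contained
    assert backend.handle_alloc("tenant-B", 1024)   # B's share is intact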
> Sandboxing, and resource quotas / allocations / reservations.
But "sandboxing" is not a property of hardware-based virtualization. Hardware-based virtualization may even increase your surface attack, not decrease it, as now the guest directly accesses the GPU in some way software does not fully control (and, for many vendors, is completely proprietary). Likewise, resource quotas can be implemented purely in a software manner. Surely an arbitrary program being able to starve the rest of the system UI is a solved problem in platforms these days, otherwise Android/iOS would be unusable... Assuming the GPU's static partitioning is going to prevent this is assuming too much from the quality of most hardware.
And there is an even bigger elephant in the room: most users of desktop virtualization would consider static allocation of _anything_ a bug, not a feature. That's precisely why most desktop virtualization wants to do thin-provisioning of resources even when it is difficult to do so (e.g. memory). i.e. we are still seeing this from the point of view of server virtualization, which just shows how desktop virtualization and server virtualization have almost diametrically opposed goals.
A soft-gpu driver backed by real hardware "somewhere else" is a beautiful piece of software! While it certainly has applications in virtual machines, and may even be "optimal" for some use cases like desktop gaming, it ultimately doesn't fit with the modern definition of "virtualization".
I am talking about virtualization in the sense of being able to divide the hardware resources of a system into isolated domains and give control of those resources to guest operating systems. Passing API calls from guest to host for execution inside of the host domain is not that. A GPU providing a bunch of PCIe virtual functions which are individually mapped to guests interacting directly with the hardware is that.
GPU virtualization should be the base implementation and paravirtualization/HLE/api-passthrough can still sit on top as a fast-path when the compromises of doing it that way can be justified.
I would say the complete opposite. The only reason one may have to use a real GPU driver backed by a partitioned GPU is precisely desktop gaming, as there you are more interested in performance than anything else and the arbitrary limits set by your GPU vendor (e.g. 1 partition only) may not impact you at all.
If you want to really divide hardware resources, then as I argue in the other thread, doing it in software is clearly the more sensible way to go. You are not subject to the whims of the GPU vendor, and the OS, rather than the firmware, controls the partition boundaries. Same as what has been done for practically every other virtualized device (CPUs, memory, etc.). We never expected the hardware to need to partition itself; I'd even have a hard time calling that "virtualization" at all. Plus, the way hardware is designed these days, it is highly unlikely that the PCI virtual functions of a GPU function as an effective security boundary. If it weren't for performance, using hardware partitioning would never be a worthwhile tradeoff.
If this is truly meant as a PoC to raise awareness, shouldn't there be a writeup of how it works and/or source code? I'm not interested in running some random binary that claims to hack a game, but a technical description of the vuln would be interesting.
I think it can be unfair to characterise single-zone failures as a failure to adequately deploy or architect.
There are many opportunities for failure even if only a single zone goes away; most (if not nearly all) database solutions elect leaders, for example, and "brown-outs" (partial rather than total failures) can lead to the leader wrongly retaining leadership status, or at least mess with quorum.
Other situations can exist where migration out of a zone leaves hardware unavailable for other people to consume. After all, the cloud is not magic: if people's workloads automatically shift to the surrounding (unaffected) zones, that will impact other people's ability to do the same migration, since all the free hardware could be used up.
Honestly, I can think of dozens of examples where, even if you had built everything multi-zonal, you could still be down due to a single zone; for instance if some unknown subsystem was zonal (like IAM?), or if you use regionally available persistent disks and they suddenly perform extremely badly on writes because they can't sync to the unavailable datacenter.
I believe multi-zone is less possible than we would like it to be, there are many cases where you can commit no error but still be completely at the mercy of a single zone going away.
> I believe multi-zone is less possible than we would like it to be, there are many cases where you can commit no error but still be completely at the mercy of a single zone going away.
There are many understandable ways to accidentally have a single point of failure. But if your conclusion after the outage is that there was no mistake, you have made two of them, and the second is much less understandable.
I bet you could get much better color if you used the screen itself to generate the Bayer pattern on film, rather than printing it with a printer.
Get some large format color negative film and thin cyan, magenta, and yellow filters. Put a cyan filter on the LCD, followed by the film on top. In a darkroom, light up the "red" pixels to expose that part of the film. Repeat with the magenta filter and "green" pixels, and finally the yellow filter and "blue" pixels. Develop the film and reattach it to the LCD.
This is kind of how the shadow mask was made in color CRT monitors. A photographic process was used to put the red, green, and blue phosphor dots on the screen by shining light through the shadow mask from the same angles the electron guns would later illuminate the phosphors. This ensures that the red, green, and blue phosphor dot pattern lines up (fairly) accurately with the shadow mask.
Free electron lasers generate X-rays using a single pass through the lasing medium, which is essential because you can't make good mirrors at those wavelengths either.