Nvidia releases open-source GPU kernel modules (nvidia.com)
2410 points by ghishadow on May 11, 2022 | 392 comments


1) This is unambiguously Good News

2) This is not upstreamable in its current form (nvidia admit this in their press release)

3) In an ideal world, nouveau (the open source driver for nvidia hardware) would be able to target this kernel code. Right now though there's no commitment for any sort of stable userland ABI, and that makes that difficult (moving to a new driver version may break interfaces that nouveau uses)

The press release does say that they have plans to figure out a more upstream friendly approach in the long term, which is great - and what has been released will undoubtedly help development of the existing nouveau codebase.


Sadly, it's also not the actual GPU driver, just the kernel part; you still need their massive closed blob in userspace. Almost had me excited there for a moment.


While this is not as good as an open-source GPU driver, it is nonetheless significant progress, which cannot be dismissed as useless.

It is far more important to have open sources for all programs that are executed in privileged mode than to have the source for the programs that are executed in user mode.

The kernel can control what a user-mode program does and limit its access to any hardware resources or CPU time, but it cannot do anything against kernel modules that are executed in privileged mode.

So having the source for all the kernel modules is the main priority. With this new open-source kernel module from NVIDIA, the one closed-source kernel module used by a large number of Linux users is finally no longer closed.

Now the main remaining problems are the BIOS/UEFI of the computers, which is free to override what the operating system does due to the exceedingly stupid System Management Mode invented by Intel, and the firmware for the auxiliary CPUs that may be used for unauthorized remote management in most Intel- and AMD-based computers.


Plus, I assume that this would make it possible for distributions to provide signed modules, making it much easier to use secure boot on systems with NVIDIA drivers (before you had to sign the drivers yourself after building them).


I came here to say that. This isn't an NVIDIA commitment to open source; it is a slap in the face. Rather than open sourcing the drivers, they are patching the kernel to fit the needs of their proprietary driver. Note that I'm not against proprietary software; however, the title of this post was super misleading, and NVIDIA needs to grow up and work on an open-source driver. The world will NOT change for them.


The world doesn't need to change for them. They already have successful closed source drivers.

You're right that this isn't about nVidia suddenly deciding to open source anything, but it's still good for compatibility reasons.



It’s not actually a driver. Note comments like “Initialize driver-specific stuff” in the code.

Mesa refers to https://docs.mesa3d.org/systems.html and Vulkan refers to https://en.m.wikipedia.org/wiki/Vulkan

To quote the linked blog post:

> This blog post will be a tutorial of sorts (we won't have a functioning Vulkan driver in the end, sorry)

> First off, every driver needs a name. We're not actually writing one here but it'll make the examples easier if we pretend we are. Just for the sake of example, I'm going to pick on NVIDIA because... Why not? Such a driver is clearly missing and really should happen soon. (Hint! Hint!) We're going to call this hypothetical new Vulkan driver NVK.

That said… NVIDIA did release a Vulkan driver in January 2022 which supports Vulkan 1.3 so… this blog post is slightly outdated. But it certainly does not include or reference work on an actual NVIDIA driver.


The blog post is about writing a Vulkan driver inside Mesa. Nvidia's blob driver does not count here.


That's a bit of a tired and somewhat misleading argument.

Take a look into the kernel-firmware repo, and you will see a tremendous amount of binary firmware blobs. For example, the Intel microcode blobs are protected by ~7 layers of encryption, and wifi drivers may have government regulatory issues with sharing firmware in anything but binary form. So let's please drop the evil-binary-blob nonsense...

The kernel does in fact accept small or large binary blobs, and some of them even come with source code... So I guess the key takeaway is to look at what separates the Nvidia driver/firmware from all those other binary-firmware-loading drivers.

Hint, it has more to do with hiding the program interface, and using abstractions within abstractions.


This small tirade of yours is arguing against a strawman.

GP is likely referring to the userspace component of nvidia's driver, not the actual blob that gets loaded onto the gpu. If it is indeed true that you need to run proprietary userland code, then this open-source release is nothing but a way to circumvent the GPL issues that led to nvidia's driver partially breaking in Linux 5.??.

While not ideal, I like many don't have a hard stance against loading proprietary hardware firmware blobs. I, however, absolutely take issue with running opaque user-space programs.


All closed-source kernel modules and a part of the firmware blobs (those that are for devices which have communication interfaces, especially wireless interfaces, and which may directly access the memory space without being blocked by an MMU under the control of the OS kernel) are major security risks, because they might be controlled by an attacker, with no means of detection by the user.

With a security-strengthened operating system, or with a VM/container/sandbox, it is possible to ensure that any opaque user-space program is no security risk.

I prefer very much open-source user-space programs for 2 reasons:

1. I am only seldom content with a program conceived by someone else, because usually I want to do something else than they had in mind and their program is suboptimal for my use case. When the program is open-source, that does not matter. I can modify it and combine it with other programs, to do whatever I like.

2. Whenever there are problems due to bugs, incomplete documentation or misconfigurations, if the program is open source any problem can be solved sooner or later, usually sooner. With proprietary programs, typically one gets only some incomprehensible error messages and it may happen that entire departments of IT professionals try for weeks to solve the problem, without any progress. (I had such a case last year; after upgrading an old company laptop with a new one, MS Teams stopped working; 4 or 5 IT support people distributed over 3 continents attempted to discover the cause over 2 weeks, but they all failed).

These 2 reasons make the open source programs much more desirable, but if you have to also use some opaque user-space programs for specific tasks, e.g. a Synopsys Verilog/VHDL synthesis program for designing with a FPGA, that is no big deal.

On the other hand, using an opaque kernel module that may contain either an intentional back-door or just an unintentional bug, which may cause your computer to freeze exactly before saving some large piece of work, or which may corrupt your file buffers before they are saved, is much worse than using any opaque user-mode program.


> With a security-strengthened operating system, or with a VM/container/sandbox, it is possible to ensure that any opaque user-space program is no security risk.

This is absolutely not true.

It is possible to restrict an opaque user-space program to the minimal set of permissions it requires.

In this particular case, the program needs to manipulate system hardware in undocumented ways, via vendor code which is extending the kernel.
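
For the general point about confining opaque user-space code, here is a minimal sketch using Linux's seccomp strict mode (deliberately crude, and not something the NVIDIA userspace itself could run under, since it needs ioctl access to the GPU device nodes):

  /* Minimal sketch, assuming Linux seccomp strict mode: after the prctl()
   * call the kernel only allows read(), write(), _exit() and sigreturn();
   * any other syscall kills the process. Real sandboxes use seccomp-bpf
   * filters, namespaces, cgroups, etc., but the principle is the same. */
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/prctl.h>
  #include <linux/seccomp.h>

  int main(void)
  {
      if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0) {
          perror("prctl(PR_SET_SECCOMP)");
          return 1;
      }
      /* From here on the process can only use fds it already holds; it
       * cannot open devices, talk to the network, or map new hardware. */
      const char msg[] = "running confined\n";
      write(STDOUT_FILENO, msg, sizeof(msg) - 1);
      _exit(0);
  }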


> With a security-strengthened operating system, or with a VM/container/sandbox, it is possible to ensure that any opaque user-space program is no security risk.

In theory this may be true. But the particular opaque user-space program is talking to a driver authored by the same team. That driver is talking to hardware that has access to all of the memory in your computer.

The odds are high that a userspace program talking to this kind of driver could make your computer do something you don't want it to do in a way that is very difficult to suss out just by reviewing the driver source code.

(I agree that an open driver is a great improvement over a closed one. But I don't think you can make any strong guarantees about the safety of the closed userspace portions of this one, yet.)


I'm not sure why you were downvoted; this is 100% correct. Open source makes adding backdoors much more difficult, but I imagine GPU driver code is so complex that there are likely security flaws that even the entire open source community wouldn't catch. Yes, an open source driver is an improvement in many ways, but it certainly doesn't solve these problems because the attack surface is so gigantic.


> has access to all of the memory in your computer

You can use IOMMU to restrict the access to the appropriate memory sections.
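
For anyone curious what "restrict the access" looks like mechanically: with VFIO on top of the IOMMU, userspace explicitly maps which buffers a device may DMA into. A rough fragment (container/group setup ioctls and error handling omitted; not specific to NVIDIA hardware):

  /* Fragment sketch, assuming the VFIO type1 IOMMU API: container_fd comes
   * from /dev/vfio/vfio after the usual VFIO_SET_IOMMU / group setup
   * (omitted). Only ranges mapped this way are reachable by the device's
   * DMA engine; everything else is blocked by the IOMMU. */
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/types.h>
  #include <linux/vfio.h>

  static int allow_dma(int container_fd, void *buf, __u64 iova, __u64 len)
  {
      struct vfio_iommu_type1_dma_map map;

      memset(&map, 0, sizeof(map));
      map.argsz = sizeof(map);
      map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
      map.vaddr = (__u64)(unsigned long)buf; /* process memory backing the buffer */
      map.iova  = iova;                      /* address the device will see  */
      map.size  = len;

      return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
  }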


I don't think, for accelerating a desktop graphics card used across your entire desktop environment, IOMMU changes my argument at all.

I believe it *does* help if, say your desktop's integrated video works natively and you have a second card you want to pass through to a gaming VM. Or if you have a server with no card shared between users and a one-card-per-user scenario.


Interesting, so they can do whatever they want, provided it runs as firmware.


My view is that nvidia's hardware can work however they want it to. FOSS firmware would be cool in the same way that FOSS hardware would be cool, but it's not something I expect. I also don't care that much whether the proprietary firmware is stored on on-board flash and programmed from the factory or if it's uploaded by my OS when the driver is loaded.

But I would prefer if the code I run in userspace or kernelspace is FOSS.


> I also don't care that much whether the proprietary firmware is stored on on-board flash and programmed from the factory or if it's uploaded by my OS when the driver is loaded.

Bingo!

Nobody cares about firmware, that's not the issue, never was... but somehow that became the narrative repeated by people who don't know what they are talking about.

Many firmware-loading drivers have caused drama over the years. I'm not sure if anybody remembers when Poettering and Sievers re-worked udev to change the ordering of how firmware was loaded from user-space, trying to improve how computers booted: graphing firmware loads, determining the loading order, etc. Firmware sometimes needs to be loaded before its kernel driver loads, sometimes the driver itself is able to load the firmware, and sometimes the driver, being lazy, expects udev to load the firmware after the kmod is loaded... As a result, the kernel now loads firmware itself.

I'd say the whole firmware thing needs to be put under review, with concrete rules for how firmware is loaded, what the interfaces have to look like, and which hardware abstraction rules actually apply.


I think this firmware blob vs. kernel blob vs. userspace blob question needs to be decided on a case-by-case basis, and no strict line that shall not be crossed can be drawn anywhere. The important factor is: does their contribution help the community or not, and will it allow the community to fix bugs and port stuff to a new innovative kernel without needing to re-implement everything from Linux?

Personally I would try to decide based on blob size and scope. With Nvidia's driver I suspect they require massive firmware and userspace blobs, and the kernel driver is probably just a pass-through API. I would argue that this doesn't really move the community forward.

However, if the blobs are small and simple, have a specific scope and a reason (like legislation or security), or are even just calibration data useful independent of the OS itself, then I would give them a pass, since that still allows the community to build on top of them.


Where do you draw the border? Even if they open source the firmware, they can also do whatever they want provided it runs as hardware.

If trust is the issue, you would have to open source the hardware/the silicon as well.

If trust isn't the issue, but rather being able to do your own things with the hardware you own, that is something different.


> this open-source release is nothing but a way to circumvent the GPL issues

Talking about strawman arguments... that's a doozie!

Instead of ignoring your strawman and staying focused on the topic... I'm going to address this nonsense...

Firstly, Nvidia is not in violation of the GPL. Secondly, nobody cares about GPL violations, most certainly not the mainline kernel community. They do not pursue blatant GPL violations, because litigation is not productive in terms of improving the kernel, but keep in mind Nvidia is not violating the GPL. Thirdly, user-space libraries that interface with a kernel API are not violating the GPL. Folks are allowed to run proprietary software on Linux. Do I like that? No... I certainly don't love it. However, Nvidia shipping user-space graphics libraries is not much different than Radeon drivers using Mesa in userspace; that's how graphics architectures work in 2022.

So it's kinda ironic that you're arguing about something separate (a straw man), and even then that argument fails...

Now then, putting your tangent aside... The whole point here is to start with the kernel, and get the kernel wrapper driver that interfaces with the hardware's firmware binary interface open sourced. The whole point is to get the driver into mainline, someday... Once we have that, we can untangle the spaghetti of graphics API libraries, perhaps even port them to Mesa. Nvidia most certainly uses an intermediary low-level graphics representation for their graphics cards, so they can support all the various graphics APIs, or whatnot... it's a very common pattern for graphics cards, one Radeon even fell into not long ago. So it's just a matter of time before we decode this stuff.

But you're not wrong in terms of Nvidia's overall stack being proprietary, at least for now. But that's not important; what's important is that Nvidia users won't have so much drama when their distro kernel upgrades and their screen goes blank thanks to Nvidia's driver not being compatible with the changed kernel ABI, and the AKMOD not being able to handle it.


No law or regulation that I know of specifies binary format.


The regulations specify they should not be easily changed. Distributing source makes it easier to change.


The carl9170 firmware at https://github.com/chunkeey/carl9170fw should put that argument to rest. That is the source code for the WiFi firmware I currently use.

Easily changed likely means there's a physical knob somewhere that you could accidentally poke as a layperson.

Not that you get a three year CS education, figure out how your distribution packages dependencies, install the correct embedded toolchain (good luck), find out exactly which chip is in your device, fetch the proper firmware source code in the right version, build the thing, figure out how to flash the result onto the chip / read the Linux kernel sources to figure out the filename inside /lib/firmware. That's not easy.

Even if this entire process was packaged (it is--see <https://packages.debian.org/sid/firmware-linux-free>), it probably still doesn't count (or at least shouldn't count) as "easily" changed.


Also, ath9k as a company.

More libre fw:

https://jxself.org/git/?p=linux-libre-firmware.git;a=tree

People, stop talking out of your asses.


Please note that the Debian firmware-linux-free package does not build carl9170fw from source. There is proper packaging (source package called carl9170fw, binary package will be firmware-carl9170) being worked on but it isn't completed yet:

https://bugs.debian.org/994625


Unless you provide source code that builds reproducibly, and a detached signature that your hardware checks.



Wasn't that law only introduced a few years ago?


Has any rationale been provided around the blob being closed?

Is it licensed, or simply too complicated to pick apart?

Graphics is a complicated field, but the techniques could still be patented - surely their market status isn’t dependent on trade secrets?


My guess - a tangled code mess without clear control over ownership, and bean counters not approving the work of untangling all that to open things up - it's not an easy process for a huge legacy codebase.


It's pretty common to license peripheral bits of an ASIC from other companies. You don't even get to look at their blobs or much else.


There have been earlier reports [1] that the processing power of some GPUs is suppressed by software rather than by the hardware capability itself. It is frequently easier to mass-produce similar chips than to make different chips for differently priced devices. I have come across comments in other online forums where users alleged that some software flags restrict the capability; I hope someone will link those webpages if they come across them.

[1] https://www.tomshardware.com/news/nvidia-gpu-system-processo...


I suppose this is also used for yield quality?

For instance, AMD is known to sell the same actually-8-core chip as a 6-core if the transistor yield was poorer.

I believe this is a standard in the industry, right? You can theoretically unlock these cores yourself, but there's a decent chance it will break or cause other significant problems.


They can move those restrictions into firmware otherwise someone is going to develop a tool that unlocks the driver like people did for the Intel compiler.


this was the case with RTX voice, a simple regedit made it work on non-RTX cards fine.


From my understanding, it used CPU acceleration instead


It would use RT cores by default, but if they weren't available or in use (EG: running a game with Ray Tracing) it could fall back to the older CUDA

That's also why they split it out into two products now

"Nvidia Broadcast" is their "you bought an RTX GPU so here's a shiny toy"

And "RTX Voice" for people with GPU's dating back to the Geforce 400 series that only does it via CUDA


Yes. Drivers represent enormous amounts of work and large parts of the overall development/IP effort in a GPU. Asking why nVidia don't open source it is like asking why they don't just open source the entire GPU. Answer: because then they'd have a much smaller and less valuable business, if they even had a business at all.


> because then they'd have a much smaller and less valuable business, if they even had a business at all.

This does not follow. While GPU drivers are a huge development effort, they are also very specific to the hardware. This is even more true with modern APIs that are closer to how the hardware works.

AMD and Intel both have open source GPU drivers and they are still in business. Are you claiming that their business would be much bigger if not for those open source drivers?


>AMD and Intel both have open source GPU drivers and they are still in business. Are you claiming that their business would be much bigger if not for those open source drivers?

AMD and Intel are underdogs, and benefit from open standards. Nvidia has an outright majority, and as a result their incompatible "standards" benefit them, because it forces everyone to choose between supporting Nvidia (and therefore the majority of their userbase), or "open standards" (a minority).


Apparently nVidia believe that is the case, and they are the market leaders, so - yes?


I expect that once Nouveau has reclocking ironed out, there will be an effort to implement Vulkan on top of it in Mesa. OpenGL is already available.

That will make it possible to avoid the Nvidia blob except for the firmware.


It is not, but I think it removes a giant blocker for a truly OSS driver with Mesa/nouveau.


It takes time for Nvidia to become open-source friendly. Still a good move.

Hopefully the GPU driver is next in their open-source roadmap.


... and like any new software from NVIDIA it only supports the most recent generation of cards.

In normal times that might be a fair way to get people to upgrade but in the last few years (when crypto bros and scalpers have gotten all the cards) it's been outright cruel.


Turing is not the most recent generation of cards.

But that aside, Turing is their first generation with a GPU System Processor (GSP), which runs the heavy binary blob that comes with the open source driver.


> This is not upstreamable in its current form (nvidia admit this in their press release)

True, I remember though that the semi-recent AMD Radeon drop was not immediately merge-able to mainline because they used a bespoke hardware abstraction layer. But (as alluded to by your point #1) it's a huge first step, and in AMD's case I think they eventually reworked it so that it could indeed be merged.


This is the key paragraph in the article relevant to your point:

  This open-source kernel code is currently split into OS-agnostic and kernel interface layer components. This stems from NVIDIA's proprietary driver on Linux largely being shared code across Windows / Linux / FreeBSD / Solaris. For it to be upstreamed in the Linux kernel it would likely need to be more re-factored to cater to Linux, just as AMD's DAL/DC originally had tough time upstreaming due to its numerous abstractions.


> This is not upstreamable in its current form

What does "upstreamable" mean in this context? Is this good or bad?

As a follow-up, what would need to be different so that it is upstreamable and would that be good or bad?


> What does "upstreamable" mean in this context? Is this good or bad?

"Upstream" is the kernel source tree maintained by Linus (and friends). As Linux is open source, anyone can fork it for any reason, and a lot of people do. However, Linus still maintains the common base that people use to build on top of. "Upstreaming" is the process of getting your code included in these official releases. It's significant because there are no stable interfaces inside the kernel. That is, in any release the Linux developers can change any internal system of the kernel in any way they please, with the only caveat that to get that change included they have to fix up any breakage it causes, but only inside upstream. Or, if you are building your own kernels with your own patch sets, any release of the kernel can completely break your build.

Because of this, if you want your code to be actually used by real people, upstreaming it is considered extremely good, to the point of being almost a necessity. However, it can be a significant hurdle to overcome. Partly because there are fairly strict code quality/style requirements (as any kernel dev is supposed to be able to jump into your code to fix issues they caused by modifications elsewhere), but mostly because unlike inside the kernel itself, all interfaces between kernel and userspace are stable. That is, if you start supporting some operation from userspace, you need to be able to support it in perpetuity. Also, the kernel devs are extremely reluctant to add any kind of interface to the kernel where the only thing that uses it is some specific non-free binary blob on the userspace side.
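
To make the "stable interface" point concrete, here is a purely hypothetical uapi header fragment - the driver name and fields are made up - illustrating what gets frozen once something ships in an upstream release:

  /* Hypothetical uapi fragment - "examplegpu" and its fields are invented.
   * Once something like this ships in an upstream kernel and userspace
   * depends on it, the struct layout, the ioctl number and its semantics
   * are frozen; extensions happen via new ioctls or versioned structs,
   * supported alongside the old ones indefinitely. */
  #include <linux/ioctl.h>
  #include <linux/types.h>

  struct examplegpu_alloc {
          __u64 size;     /* in:  bytes requested  */
          __u32 flags;    /* in:  allocation flags */
          __u32 handle;   /* out: buffer handle    */
  };

  #define EXAMPLEGPU_IOCTL_ALLOC _IOWR('E', 0x01, struct examplegpu_alloc)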

Currently, the main reason the code is not upstreamable as-is is that it needs a very substantial userspace program to function, which currently exists only as non-free software, and nVidia doesn't think the interface used to connect to it is settled enough to freeze yet.

So the main hurdle for upstreaming is developing that interface until they are happy with it, and then develop some free version of the userspace portion of their graphics stack that can plug into that same interface.


Excellent answer, thank you. I appreciate both the overall general answer and the detail specifically involving the nvidia driver itself.


> upstreaming it is considered extremely good

are we assuming some sort of automatic testing will be added? unit testing? integration testing? sounds super hard to do for a driver. but... high quality code just being merged into master can still have bugs introduced in the future by accident, right? it's tests that usually make it safe?


You can run these sorts of tests in a VM with PCIe passthrough.


And hardware companies usually have large test-beds of hardware. So, Nvidia should be able to build a comprehensive test-suite.


the same way they should have been able to build an upstreamable opensource driver all this time, right? lol


No. Cause one (hardware) is old-hat and the other (FOSS) is big and risky and scary.


The reason upstreaming it is considered extremely good really doesn't have to do with unit testing, or really with testing at all. The issue, as stated eloquently by the parent of your post, is that without it being upstreamed, it's Nvidia's problem to fix any issues caused by one of their dependencies in the kernel changing, whereas if it's upstreamed, it's only maybe partially their responsibility (the main burden is on whoever changed the dependency).


The kernel has bugs, all software has bugs.

But there are massive massive test suites run by hundreds of companies on linux-next. If you email off a stack of patches that breaks something, chances are you'll get emails back telling you which patch had the breaking change in.


'Upstreamable' means that they send Linus a pull request, and Linus merges it into the kernel. It would then become part of Linux.

It's a negative that it's not upstreamable. I wouldn't go so far as to say it's bad, but it's less than good. That means you'll continue to maintain a separate kernel module that's rebuilt for each kernel.

It would need to be modified to match the kernel's coding standards and whatnot. Part of what makes linux such a strong thing is that you're just not allowed to do certain things in the kernel. They won't include ZFS in the kernel because it's got its own entire vfs -- linux filesystems are only permitted to use linux's vfs, and if you need vfs changes to make it work, you modify the 'one true' vfs. They didn't include the original AMD Radeon driver because it had its own hardware abstraction layer; you're supposed to do that in user space or not at all. (the Radeon driver was changed to remove the abstraction layer, and it was later merged into the kernel)

It's not immediately clear to me how much work it will take to get the driver into the kernel. Hopefully it won't be a lot of work, but it's possible that it would require basically rewriting it from scratch. It's at least written almost entirely in C, so that's a good start.

Regardless, this is tremendously good news.


> send Linus a pull request

Linux doesn't do pull requests. You actually send your patches as an email to a kernel maintainer (different people accept patches for different parts of the codebase) and then they'll review it. They'll pass it up to someone above them, and they'll pass it to someone above them, and eventually it makes its way to Linus.

That's why Linus is famous for angry and aggressive emails. He's not shouting down new developers who have made a simple mistake. He shouts and swears at the people employed below him who failed to properly do quality control and code review.


What would the practical effect be for me as a user, say I render in Blender, or I play a few intensive 3d games?


Currently, you rely on Nvidia to keep the drivers up to date and working with the kernel you use in your OS.

If the drivers aren't upstream, a future kernel can break them. So if you update your OS, it might no longer support your GPU without installing an alternative. Nvidia are responsible for getting this fixed in a timely manner.

If the drivers are upstream, then people aren't allowed to break them when they make changes to the kernel. There should never be a kernel or OS upgrade that causes the drivers to no longer work, the people who made the breaking change are responsible for fixing it before it gets released to you.


Thanks, so it means a better out of the box stability for end users.


And when you update your kernel, you won't need to recompile the kernel modules every time.


You want things to be 'upstreamable' so they can be included in the default upstream kernel. If it gets included in Linus' git tree, then keeping it working on future versions of the kernel is much less hassle.


Upstreamable means adding it to the official Linux source being distributed on kernel.org.

It just means you have to download drivers separately until Nvidia give the kernel guys a pull request for the drivers to be included in the kernel sources by default.


I think the more important thing than the separate download is that when someone works on the kernel, they usually make sure that they don't break other things that are in the official kernel. But there's no obligation not to break things outside the official kernel (apart from the interface to userland).


(1) No, it's unambiguously "meh" news

Nvidia have merely moved their giant closed source drivers into "firmware" - they now have a 34 MB firmware image for the GPU that would be more correctly called the real driver. They have essentially created an open source GPL forwarding layer so that the non-GPL firmware gets access to GPL-only kernel APIs, that's it.


New code on github is never good news.


Why?


Because github is microsoft, the enemy of freedom and the greatest threat the free software movement has ever faced


Had to do a few doubletakes on this, but even with the recent hacks and progress on NVIDIA releasing Tegra source code, I didn't expect this for another few years.

Holy shit.

It's even licensed as MIT.

Even OpenBSD could conceivably port this with enough manpower. Adding just enough emulation for the userspace driver would be a lot easier than maintaining a complete linux emulator.

This is one of the biggest things to happen to hardware support for open source OSes in well over a decade.


That might not even be an overstatement. The last few big desktop Linux crash-and-burns I've run into all had display drivers as a common component.

I like back-foot, underdog NVIDIA. Ascendent AMD hasn't drawn my ire yet, let's hope power corrupts slowly.


AMD changed their Windows drivers to not output video if they detect they're running in a VM. Nvidia went the other way and stopped doing so.

Both can/could be bypassed with some libvirt XML magic, but still. Nvidia seems to be slowly stopping being assholes; AMD has already started.


>AMD changed their Windows drivers to not output video if they detect they're running in a VM.

What? Why?


Presumably market segmentation. You're only allowed VMs that dont feel like shit (i.e. have gpu accel) if you pay for enterprise vGPU shit. Can't have someone buy two of your GPUs to give one to a VM, obviously.


> pay for enterprise vGPU

For AMD the driver is difficult to find and poorly documented (and only available on ESXi unlike NVIDIA vGPU support for Xen, Hyper-V, KVM, Nutanix, and ESXi, etc.). At least the guest drivers don't have licensing issues unlike with NVIDIA IIUC.


And very few AMD GPUs even support it...

(and good luck finding a remotely recent AMD GIM driver)


Plus the quality of the overall experience. And I understand NVIDIA is even worse re: GPU virtualization.

The end result is that it is unusable in practice. Very difficult and restricted to few CPUs/GPUs and very specific software chain.

Otherwise, it'd be open source, universally available and trivial to use.

The good news is that I understand this support is actually good on the Intel side, and Intel has promised that they will actually release competitive GPUs soon. Should this truly be the case, it will automatically make Intel the go-to for GPU virtualization, and might help motivate NVIDIA/AMD to stop segmenting re: GPU virtualization, ending this shitty situation.


> And I understand NVIDIA is even worse re: GPU virtualization.

Nope, it’s much better on the nvidia side actually. The latest AMD GPU with a publicly accessible OSS GIM driver is the AMD S7150, which was released in 2016. (https://github.com/GPUOpen-LibrariesAndSDKs/MxGPU-Virtualiza...)

And it’s locked out from most AMD SKUs today, so even if you got a modern GIM driver, you’ll need very special SKUs to enable it and use virtual GPUs.

> The good news is that I understand this support is actually good on the Intel side

Not anymore. GVT-g is gone on Ice Lake (Intel 10th generation mobile, 11th gen desktop) so that you can no longer do hardware vGPU on newer Intel parts at all.

Sad thing is that what you said used to be true.

Meanwhile NVIDIA GRID needs licensing fees but actually works, with high end GPU options being available. And has all the fancy stuff like vGPU live migration for seamless maintenance too. It doesn’t even compare.


Given the recent moves by Nvidia I wonder if they'd consider the holy grail of passthrough: Officially supporting (even a single) vGPU on linux so that you could use the card on the linux host and a guest.

The tech already seems possible since people have modded the enterprise drivers to do it, but that isn't official support. A few years ago I would have said there was no chance of this happening from nvidia. But I was also hopefully Intel's dedicated GPUs would support GVT-g back then!


Because the drivers for the consumer GPUs are not licensed for datacenter use, and obviously VM == datacenter


This is a problem for Qubes OS, which has a legitimate need for vGPU on a desktop operating system.

It's because of this arbitrary restriction that Qubes is not able to provide GPU acceleration, which is a huge barrier to its adoption.


Wow, I hadn't heard that AMD added that check. Between that and their unending reset problems that makes them a completely inferior choice for GPU passthrough. Before Nvidia stopped the passthrough blocking you could make a case that AMD was a better choice.


The cycle continues


> I like back-foot, underdog NVIDIA. Ascendent AMD hasn't drawn my ire yet, let's hope power corrupts slowly.

That "back-foot" "underdog" nVidia has the edge in the video market still... and 3x the market cap of AMD.


It's fair to extrapolate because their strategic decisions will be based on extrapolations.

NVIDIA had to overclock and hustle the current generation of cards and it's looking even worse for the next generation. Software was a moat when AMD was heavily resource constrained, but now they can afford the headcount to give chase. Between the chip shortage and crypto, there was plenty of noise on top of fundamentals, but one doesn't make strategic plans based on noise.

This is all speculative, of course. I'm sure if asked they would say it was a total coincidence. Just like AMD and Intel switching places on their stance towards overclocking. Complete coincidence that it matches the optimal strategy for their market position -- "milk it" vs "give chase." Somehow it always seems to match, though, and speculation is fun :)


NVIDIA is well, well ahead of AMD.

NVIDIA's cards were faster than AMD's even with the huge gap in transistor density that was the Samsung fab.

Don't get excited for the AMD graphics division up in Canada.


>NVIDIA's cards were faster than AMD's even with the huge gap in transistor density that was the Samsung fab.

They are roughly at par. AMD does better at lower resolutions because of their cache setup.

With the refreshed cards, AMD is slightly ahead.


Keep in mind that is at a particular price point.

NVIDIA's top of the range chip is ahead of AMD's, and the 3080's SKU is at a lower binning point on the bell curve than the 6950's.

Hence NVIDIA would be able to maintain a performance per watt crown at the 6950's price point if it sold its highest bins cheaper.

Given the gap in transistor density, that is an exorbitant architectural delta.


I wish my company were in the same desperate situation as Nvidia. One where we’d be faster than the competition with similar perf/W while using a much inferior silicon process…


APUs are eating the market of novideo, see e.g. the performance of the M1 iGPU


Are APUs different from what we used to call integrated graphics cards?


The difference is getting blurry. APUs generally have better communication/latency/shared resources with the CPU. The ultimate ideal of an APU is to have memory unified with the CPU, which is the case in e.g. the PS3/PS4. Despite progress in heterogeneous computing (the neglected HSA), in SoCs, 3D interposers, high-bandwidth interconnects and 3D memory such as HBM, the PC platform has yet to see a proper APU. In fact the M1 is probably the closest thing to an ideal APU on the market. But yes, the more time passes, the more the term iGPU denotes APU. AMD bought ATI because of the Fusion vision, the idea that sharing silicon, resources and memory between the CPU and the GPU would be the future of computing.

An unrelated but very underrated option is the eGPU. eGPUs are external to the PC, unlike a dGPU. So you can buy a thin laptop, connect it via Thunderbolt to an RTX 3080 and enjoy faster GPU performance than allowed on any laptop on the market, and enjoy a thin, lightweight, silent laptop the rest of the time. Disclaimer: Thunderbolt is still a moderate limiting factor in reaching peak performance.


> the PC platform has yet to see a proper APU

Wat. AMD literally invented the term 'APU' and has been shipping them since 2011. Fully unified CPU+GPU memory since 2014's Kaveri. That's fully cache-coherent CPU & GPU, along with the GPU using the same shared virtual pageable memory as the CPU.

The M1 didn't add anything new to the mix.


It's a spectrum. I don't think that cache coherency was usable by developers/compilers. The only two ways I know of (HMM and HSA) are niche, used by nobody. GPGPU compute would GREATLY benefit from programs that can share memory between CPU and GPU without having to do needless high-latency round-trips and copies. So they failed in practice. They never did a CPU-addressable HBM interposer (despite having invented HBM), unlike what I believe the M1 has.


> An unrelated but very underrated option is the eGPU. eGPUs are external to the PC, unlike a dGPU. So you can buy a thin laptop, connect it via Thunderbolt to an RTX 3080 and enjoy faster GPU performance than allowed on any laptop on the market, and enjoy a thin, lightweight, silent laptop the rest of the time. Disclaimer: Thunderbolt is still a moderate limiting factor in reaching peak performance.

Not just for laptops: this sounds also a bit like what the Switch dock could have been.

(And in some sense, it reminds me of Super FX chip for the SNES.)


APUs are AMD-speak for CPU and GPU on the same die (Intel has similar but doesn't call them that). Integrated graphics cards (a misnomer since there is no card -- IGP or iGPU is probably more accurate) may or may not be on the same die (instead could be on the motherboard, particularly in the chipset). That design is pretty rare/antiquated at this point though. Being on the same die means higher bandwidth, lower latency, etc.


I think Intel calls them XPUs.


Integrated video cards were integrated onto the motherboard. APUs/iGPUs are integrated into the CPU.


Just had my first graphics stack issue since 2013 upgrading to Fedora 36 and was caught flat-footed. I've got multiple GPUs, so now I've got to figure out if it's Wayland, amdgpu, nouveau (since unblacklisting), or dkms. "Just working" has made me lazy.


Was in a similar boat recently. I'm not that up to date with the whole X11 vs Wayland thing, but dammit am I mad!

I feel like JUST as the "Linux X11 Discrete Graphics Scenario" started to become more stable and less (not none, but less) of an issue to set up and upgrade without getting black screens, the Linux world is now turning to a "new windowing server", i.e. Wayland, and we're starting all over again, sigh.

Maybe the answer to having a decent and carefree discrete graphics Linux stack is to fork (Don't you dare link to the XKCD comic about 'Standards') SteamOS.

They are at least motivated (as it's part of their core product) to make it work most of the time. And they have done a boatload of good work for the Linux ecosystem. Well done guys! :)


> It's even licensed as MIT.

Wouldn't be the first time. The old 2D "nv" driver was part of X11, and maintained by Nvidia employees.

The catch, besides it being 2D only, was the "source code" was actually somewhat obfuscated.


Wow! MIT is about as clean as you can go, with no advantage reserved for dual licensing, e.g. using Affero GPLv3 or similar. Not bad.



I wonder if this release will meet the hacker's definition of open source in this case. People like that have a habit of changing the goal lines.


def seems related


it's not at all. it's been in the works via redhat, canonical and others for probably 2 years now.


If the module can use MESA, good. If not, meh.

> Adding just enough emulation for the userspace driver would be a lot easier than maintaining a complete linux emulator.

OpenBSD is the best BSD in terms of support for free (Intel) and semi-free drivers such as the ones from AMD; they already adapted all the src from Linux, KMS included.


That's very different. The source code for the userspace portions of the MESA drivers for AMD/Intel is released under a permissive license, so OpenBSD (and other BSDs) have been able to modify them to compile under their OS (and get those changes committed to the original tree). With NVIDIA, the userspace portions don't use MESA, so they would need some form of translation layer to work on an OS other than Linux.

But, said translation layer would have limited scope; so is a lot more feasible than maintaining a general purpose translation layer indefinitely.


Official NVidia drivers have natively supported FreeBSD for over a decade now.


KMS is from the kernel.


[flagged]


Nvidia have always shipped closed-source drivers, despite AMD and Intel providing open source drivers for their GPUs. This made the experience on Linux second class to Windows, where while also closed source, at least you knew that bugs would probably get fixed. Various non-core features simply wouldn't have support on Linux, e.g. Optimus. Also, shipping closed source binaries would limit which kernel you could run to supported versions.

Lastly, of course, this opens the way to non-Linux OSes receiving support as well.

If you just search the web for "Nvidia issues Linux" you'll see quite a few complaints. Particularly from people who have obscure configurations -- they pretty much had no chance of getting anything to work.


This is a good technical summary of the impact but I think misses the somewhat emotive history here.

See things like this infamous clip from Torvalds[0] for more context on the community sentiment around nvidia in general.

[0] https://www.youtube.com/watch?v=IVpOyKCNZYw


Don't even have to click to know what that is, first thing that came to mind when I saw this headline


> where while also closed source, at least you knew that bugs would probably get fixed.

Yeah, I waited just 18 months (IIRC) for proper DisplayPort DPMS signalling, so my monitor can sleep.

Linux driver bugs get fixed, albeit accidentally.


I think the GP meant the windows drivers would be fixed; not the linux drivers.


>If you just search the web for "Nvidia issues Linux" you'll see quite a few complaints.

I'd wager that is the BULK of the linux-power-user complaints. I'm definitely never buying NVIDIA again for my next PC. The last time I decided to go for Nvidia was because of their CUDA/ML eco-system.

Next time I'll just use the cloud for any ML/GPU-computing. The amount of time wasted by nvidia-driver nonsense is actually absurd !


Note that this is just the kernel modules, not the actual graphics driver.

IIRC these sources were already released under a permissive license along with their driver distribution. This just seems to be them putting those sources on GitHub and being more open to contributions.


The kernel driver was never distributed as source. The driver itself was a giant compiled object file that was linked into a kernel module with some shim code using DKMS. I know because I dug through it trying to fix the RTX 2060 in my G14 not going into D3.


Are they? I thought the “OS-agnostic” part was historically only available as a binary. Maybe that changed since last time I looked.

On quick inspection, this is a complete, MIT-licensed kernel driver.


>Note that the kernel modules built here must be used with gsp.bin firmware and user-space NVIDIA GPU driver components from a corresponding 515.43.04 driver release. This can be achieved by installing the NVIDIA GPU driver from the .run file using the --no-kernel-modules option.


Closed source firmware is acceptable for Linux. See:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/lin...

It would not be much of a stretch at all for nvidia to include their firmware there.

The userspace stack is an issue. For upstream Linux, there’s a fairly hard rule that graphics driver interfaces must be usable with open source userspace. But I don’t think the nvidia graphics user parts are particularly secret, and someone could write such a thing.


It already is for the 510 driver.


That's orthogonal to the parent's question.

My understanding as well was that the bulk of the kernel space driver was previously only available as a binary blob, with only a small shim layer open sourced that loaded the rest of the kernel module and translated all of the internal kernel calls. I heard a rumor this was actually core to their internal legal theory about why they could have a binary blob kernel driver.

Required firmware blobs and the user space libraries are ultimately different components in the stack.


This isn't the same as the source previously provided with the drivers. This is a completely new kernel module.


Thanks for clearing that up, for a second I thought nVidia was finally Linux viable.


The same could be said of a lot of Nvidia IP which was leaked a few months ago.

But this is different, it's voluntary.


> In this open-source release, support for GeForce and Workstation GPUs is alpha quality. GeForce and Workstation users can use this driver on Turing and NVIDIA Ampere architecture GPUs to run Linux desktops and use features such as multiple displays, G-SYNC, and NVIDIA RTX ray tracing in Vulkan and NVIDIA OptiX. Users can opt in using the kernel module parameter NVreg_EnableUnsupportedGpus as highlighted in the documentation. More robust and fully featured GeForce and Workstation support will follow in subsequent releases and the NVIDIA Open Kernel Modules will eventually supplant the closed-source driver. Customers with Turing and Ampere GPUs can choose which modules to install. Pre-Turing customers will continue to run the closed source modules.

Translating & simplifying the language here: sounds like GTX 10xx GPU users (Pascal architecture, e.g. 1070/1080) will stick with closed source for now, but RTX 20xx GPU users (Turing architecture, e.g. 2080) and RTX 30xx GPU users (Ampere architecture, e.g. 3080) will have the option to opt-in to the open source kernel module. Stable open source support for GTX 10xx GPU users may come later.


The reason for this is because NVIDIA's Turing and above GPUs use a new microcontroller called the GSP, which is RISC-V based. From my understanding, NVIDIA has offloaded their proprietary IP from the closed-source driver to the GSP firmware (and not the older microcontroller present on Pascal and lower). This is why `gsp.bin` exists in `linux-firmware` now, and the FOSS driver targets the GSP (because now the proprietary stuff isn't in the kernel driver but rather a RISC-V ELF binary that runs on the GPU), not the older controller.


ctrl+f binary, tx this is the answer I was looking for. The binary is still in linux-firmware similar to intel drivers


> Stable open source support for GTX 10xx GPU users may come later.

Nope, Turing or later gen GPU is a hard requirement.


What about GTX 16xx users? I have a GTX 1650 which is based on Turing but doesn't have the new NVENC encoder, I wonder if the new OSS driver will support this GPU, if it has that RISCV chip that everyone's talking about.


I have a 16xx card. It works fine.


For those who didn't use Nvidia on linux in the old times:

The driver was a proprietary binary. Since a kernel module requires interfacing with the kernel API, it could be considered a derivative work and a breach of the GPL license. So, Nvidia provided a small open source shim which interfaced between the kernel and the proprietary module.
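
Roughly, the shim followed the schematic below. This is not NVIDIA's actual code, the entry-point names are invented, and the real blob exported far more than two symbols; it's just the pattern: a small wrapper compiled on your machine against the running kernel's headers, linked at build time with a precompiled proprietary object file that provides the actual driver logic.

  /* Schematic of the old shim pattern (hypothetical names). */
  #include <linux/module.h>
  #include <linux/init.h>

  /* Provided by the proprietary object file linked into the module. */
  extern int  blob_rm_init_adapter(void);
  extern void blob_rm_shutdown_adapter(void);

  static int __init shim_init(void)
  {
          /* The open wrapper only mediates between kernel APIs and the blob. */
          return blob_rm_init_adapter();
  }

  static void __exit shim_exit(void)
  {
          blob_rm_shutdown_adapter();
  }

  module_init(shim_init);
  module_exit(shim_exit);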

You had to compile that shim yourself with the right arcane command line incantations and if you did anything wrong, missed the right packages or had an incompatible user space, including libs and compiler, you could end up without X11 and no way to easily run a browser or google about the problem you had. You had to do it EVERY FUCKING TIME YOU UPDATED THE KERNEL!

It was still possible to edit xorg.conf or, if you were older, xf86config by hand to fix it and use the VESA driver, but it was very inconvenient. It became more reliable over time and even fully automated with DKMS, but I hated them for it.

I used and recommended ATI and INTEL for most of the people I could for a long time because of this.

I was from a time when it was possible to use 3D acceleration on Linux with 3dfx with fully open source drivers (I think), giving you a cheap UNIX-like graphical workstation with OpenGL support. When Nvidia bought 3dfx and simply killed their drivers, my hate became especially strong.

EDIT: Remember you had to recompile the shim at every kernel update and replaced "module" with "driver".


> I used and recommended ATI and INTEL for most of the people I could for a long time because of this.

Same here but recently I somehow got a 3700X and there's no integrated GPU so I had to look for a GPU. I like my PC not just quiet but nearly silent, so a GPU with a fan was a big no-no. I couldn't find any single GPU able to drive 3840x1600 without a fan... Except for a NVidia one. Of course the proprietary Linux drivers are somehow buggy: the "sleep" doesn't work correctly, failing to reinitialize the correct video mode when waking up. It's always, always, always the same with NVidia GPUs on Linux. Thankfully I can switch to tty2, then back to graphical mode but I hate the inconvenience.

I'm thinking about selling my 3700X and getting a 12th gen Intel with an integrated GPU (I don't game and really couldn't care less about fast GPUs).


If you want to stick with AMD and don't want to swap out the motherboard, then you'd probably just need to get an AMD CPU with on board graphics.

The actual name of the CPU should help you find one that would work in that regard, for example, see this link which explains the naming suffixes: https://www.androidauthority.com/amd-cpu-guide-1222438/

In particular:

  X - Higher clocked desktop processor (what you got)
  G - Has integrated AMD Radeon Vega Graphics (what you probably want)
  GE - Has integrated AMD Radeon Vega Graphics but lower TDP (what you might want in niche use cases)
For example, some of my homelab servers use older AMD Athlon 200 GE as their CPUs, due to the fact that there are on board graphics and because the TDP is 35W, both saving electricity and also allowing me to use passive cooling (with just a large heatsink https://www.arctic.de/en/Alpine-AM4-Passive/ACALP00022A).

For the Zen 2 series that your 3700X belongs to, you might want to look at this table: https://en.wikipedia.org/wiki/List_of_AMD_Ryzen_processors#A...

From what I can tell, the closest option performance-wise to the Ryzen 7 3700X but with on-board graphics would be the Ryzen 7 4700G.


> For example, some of my homelab servers use older AMD Athlon 200 GE as their CPUs, due to the fact that there are on board graphics and because the TDP is 35W, both saving electricity and also allowing me to use passive cooling (with just a large heatsink https://www.arctic.de/en/Alpine-AM4-Passive/ACALP00022A).

I have a server with a 3700x. I originally had a cheapo discrete AMD graphics card in it, but I ended up just yanking that and running the machine without any graphics card. Saves power.


Side note: not that you said it yourself, but since you bought up the "E" SKUs, and it often comes up:

in almost all situations, the "lower tier" CPUs can be replicated simply by taking the higher-tier CPU and setting a lower power limit. A 1800X with a 65W power limit is the same thing as a 1700, a 3400G with a 45W (?) power limit is the same thing as a 3400GE, etc. Since the "GE" chips and other niche SKUs (there is a 3700 non-X iirc, for example) are often OEM-only, and thus they only have a very limited availability, they will often command higher prices on ebay/etc than the "real" chip. In this case, there is no reason to seek out the "E" chip, with the possible exception of if it gets you the "pro" feature set and you happen to need one of the features (particularly for APUs since ECC is disabled on non-pro APUs). So if you see a 3400G for $200 and a 3400GE for $250 (made-up numbers) then don't buy the GE, buy the G and set the power limit yourself.

Lower-end chips do not have better binning - actually the opposite, higher-end chips have better binning and will run at lower voltages for a given frequency than the lower-end chips will. The "higher leakage clocks better" thing is not really a factor that matters on ambient cooling, that is for XOC doing LN2 or LHe runs, but it has entered the public consciousness that "low-TDP chips are binned for efficiency". Not in the consumer market they're not - the exceptions being things like Fury Nano that explicitly are binned better, and for which you pay a premium price for that efficiency. But most low-end consumer processors are just... low-end. They're priced according to performance, not binning.

You can see this in the SiliconLottery historical binning statistics: a 1800X will categorically clock higher at any voltage than a 1700, a 3800X is categorically better at any voltage than a 3700X, etc. - and that also means they will run a lower voltage at any target frequency. AMD bins straight down in terms of chip quality: Epyc and TR get the best, then the high-end enthusiast chips, then the value enthusiast chips, then the efficiency parts at the bottom of the bins.

The "E" parts are efficient because they have a low power limit set - not because they're binned better. Lots of chips can run fine at lower voltages, they just don't do the peak frequencies as well as the binned chips. A 1800X will still get you a bit lower voltage - but since the voltage/power curve is quadratic, the difference in power is compressed at lower frequencies/voltages. Also, at a low-enough frequency you will bump into the minimum voltage required, so that tends to compress things as well. So at 3 GHz, the impact on binning between a 1800X and a 1700 would be a lot smaller than, say, at 4 GHz. The 1800X can usually do that fairly easily in a later sample, but 4 GHz is pretty much always pushing the limits of safe voltage for a 1700, for example, so the 1700 gets crappier silicon because it clocks lower.

https://siliconlottery.com/pages/statistics

https://www.reddit.com/r/Amd/comments/cll1r9/hardware_numb3r...

(the "one weird exception" is lower core-count chips. If you think of each core as a dice roll, this means that an 8-core chip has to roll perfectly 8 times, where a 6-core chip only has to roll perfectly 6 times. Since all-core OC is limited by the performance of the worst core, this means that for equal yields, there may be more lower-core-count chips with high all-core OCs. Many midrange parts do not actually have defective cores, they're locked out for market segmentation to avoid undercutting margins on the higher-end parts (which is why Phenoms and 7950s used to be unlockable, etc) and - while I'm not sure AMD has explicitly ever said it - it would also be sensible that when they are disabling cores on an 8-core to turn it into a 6-core they pick the best 6 cores, which would push silicon quality upwards too. A 1600X actually has silicon quality comparable to a 1800X according to SiliconLottery, for example, despite being a much higher-volume part. This "weird exception" also gets complicated with Zen2/Zen3 because AMD deliberately uses a low-quality die for the second CCD (3900X/3950X) since it will mostly be used under those lower-clocking all-core loads, and the impact of binning is compressed in those lower-clocking situations...)


I heard the rumor that, starting with Zen4, AMD will include an embedded GPU in each and every CPU using the new socket.

This will end this situation, and give a nice bump to AMD's market share in e.g. Valve's steam client statistics.

These days, the performance of the embedded GPUs is already pretty usable, even allowing for running heavy videogames on lowish settings. They've been increasing the performance considerably (30-60%) on each generation for several generations.


There is another downside to G SKUs: many lack PCIe 4.0. For some dedicated GPUs, that is a (minor for now) limitation.

It's why I ended up getting the (surprisingly) cheaper Intel equivalent. It has an iGPU and also PCIe 4.0.


Why not go for a 5700g? That would hopefully avoid you having to replace your motherboard.


Exactly


> I couldn't find any single GPU able to drive 3840x1600 without a fan

> I don't game

Let me help. https://www.ebay.com/itm/194948432276 that's a full DP 1.2 port in there https://www.techpowerup.com/gpu-specs/sapphire-ultimate-r7-2... so it'll drive 3840 x 1600 up to 108 Hz even without custom mode shenanigans.


PS: I didn't know Newegg merchants were selling used stuff, but here we are: https://www.newegg.com/sapphire-radeon-hd-7750-11202-03-40g/...


The 3700X with a fan-cooled GPU is going to be a lot easier to make silent than a 12th-gen Intel is.

Also many GPUs support the fan turning off when idle.


I was about to post this. The fans on my AMD RX 580 never turn on unless I'm gaming or doing some crazy WebGL stuff. I can only imagine that newer ones are even more efficient in this regard.


It’s pretty damn stunning how quiet an AIO 2x 140mm cooler can be. And graphics cards with those big triple fans are silent until needed.

I’ve been through a couple 12th gens. I like them, but unless you need the machine updated right now, 13th gen is 6 months out.


AIO performance on CPUs is largely limited by thermal transfer through the coldplate/IHS (Integrated Heat Spreader), not by radiator size. Basically the radiator is keeping the fluid very cool already, but heat can't move through the IHS quickly enough. So it takes a large improvement in fluid temperature to make a small improvement in die temperature - you are "pushing on a string" as the expression goes.

Almost no AIOs have the fluid-temperature sensors that would allow you to measure this directly, so everyone uses the die sensors. Which, since they're behind the IHS, will be much higher than the fluid itself. The die temperature is a measurement of interest too - I'm merely explaining why the number you're seeing in the die sensor isn't really the big picture of how good a job the radiator is doing at cooling. The die is hot, but the fluid is cool.

The AMD R9 295X2 is an extremely good example of this observation - this card had a single 120mm radiator, with one fan, and it could dissipate >500W of heat at ~60C die temperature. (Not sure if the source below says this directly, but the non-OC power was ~430W average during gaming, and ~250W is not unreasonable for each 290X chip - actually they could go to 300W or higher if you really poured it on, but they also generally showed some significant power scaling with temperature, so ~250W per chip/500W total is a reasonable estimate imo.)

https://www.techpowerup.com/review/amd-r9-295-x2/28.html

You might say - but that's a dual-GPU card, with bare dies. And yes, that's my point, when the coldplate/IHS is no longer a bottleneck moving heat into the loop, a 120mm radiator is comfortably capable of dissipating 500W of power back out of the loop at extremely reasonable operating temperatures (60C die temperature). 60C is actually barely breaking a sweat, you could probably do 1000W through that 120mm if you didn't mind a die temperature in the 80-90C range. In CPU overclocking - your power limits/temps are almost entirely limited by how fast you can get that heat through the IHS. Reducing fluid temps (by increasing radiator size) is pushing on a string, it takes big gains in fluid temp to produce a small improvement in die temp.
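
To make the "pushing on a string" point concrete, here's a minimal series thermal-resistance sketch (the resistance values are made-up illustrative numbers, not measurements of any real cooler):

    #include <stdio.h>

    /* Heat path: die -> coldplate/IHS -> coolant -> radiator -> air.
     * Resistances in K/W; the values below are purely illustrative. */
    static double die_temp(double watts, double r_ihs, double r_rad, double t_ambient)
    {
        double t_fluid = t_ambient + watts * r_rad; /* radiator keeps the fluid near ambient */
        return t_fluid + watts * r_ihs;             /* most of the rise happens across the IHS */
    }

    int main(void)
    {
        double watts = 250.0, t_amb = 25.0;
        printf("baseline AIO:          %.1f C\n", die_temp(watts, 0.20, 0.050, t_amb));
        printf("double the radiator:   %.1f C\n", die_temp(watts, 0.20, 0.025, t_amb));
        printf("halve IHS resistance:  %.1f C\n", die_temp(watts, 0.10, 0.050, t_amb));
        return 0;
    }

With those (made-up) numbers, doubling the radiator buys a few degrees, while attacking the coldplate/IHS path buys several times that - which is the whole argument for direct-die below.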

Incidentally, direct-die cooling is the last untapped frontier of gains for ambient (non-chilled) overclocking. Der8auer and IceManCooler.com both make "support brackets" that replace part of the ILM (integrated loading mechanism - the socket and its tensioning mechanism and attachment to the motherboard) that holds the processor. This is necessary since the ILM normally clamps down on the sides of the IHS, so removing the IHS would change the mounting pressure, and the ILM needs to keep a specific level of pressure on the chip to make good contact with the pins but without damaging anything. But you can delid the processor (there are services that do this for soldered chips, I don't recommend doing it at home) and use one of those brackets with a "normal" waterblock/AIO (or even air-cooler), since the bracket is holding the chip in the pin-bed at the proper tension.

Thermal density is going nowhere but up, Dennard scaling is over, so that is the only way to really improve thermals on <= 7nm-class nodes. Even AMD runs hot - they routinely run in the mid-80C range nowadays, even though they don't pull a lot of power - because of that thermal density, and every time they shrink it's going to get worse. The gains may be more worth it on Intel though - they show better scaling from power/voltage, TSMC nodes seem to pretty much top out at about 4 GHz and past there it gets exponentially worse for very little actual performance gain. 4.3, 4.4, sure, but they don't seem to do 5-5.3 GHz like Intel can on their Intel 7 given good temps and enough voltage.

But yes, to go back to your original point, I really like my 3090 Kingpin as well. It runs extremely cool, I can keep the die at literally 30C with the fans cranked all the way up, and it'll keep the VRAM at under 70C (!). And since it is a 2-slot card it doesn't turn into a compatibility mess with motherboard pcie slots getting blocked and needing airspace/etc. I am 100% behind AIOs on the larger gpus that we are seeing lately, this is a better solution than triple-slot or 3.5 slot coolers, which are (imo) completely ridiculous.


It's more of an adventure with a GPU than a CPU but you can add an aio after market with eg https://nzxt.com/product/kraken-g12

Then you're not paying the Kingpin markup if you're not planning to x-oc it. I did that on a Titan X (Pascal) which was still using a blower design and it worked fantastically.


This is true, but: you have to watch compatibility (note that there are no 3000 series chips on that list - because NVIDIA changed their hole placement again, and AMD has a couple different sizes for their different chips, 6500XT/6400 is definitely smaller for example), and also it doesn't do as good a job cooling VRAM. You can put add-on heatsinks on the VRAM chips, but they can fall off and short something. And adding them on the back can run into compatibility problems with bumping into the CPU heatsink.

And VRAM temperatures are a big problem on the Ampere cards - I don't really think running >100C all the time is really gonna be great for them long-term. Even gaming (vs mining) it's not abnormal to see VRAM over 100C (especially 3090, with the chips on the back, but also on the other GDDR6X cards, GDDR6X just runs extraordinarily hot). I know what NVIDIA and Micron say, I'm not sure I believe it. Above-100C is really really dubious imo.

for the 3090, with the VRAM on the back, I think it makes sense to go with a factory-configured AIO. Other cards, and especially GDDR6 cards, sure, it does work and it does help. Don't go too nuts tightening the AIO down though (ask me why! >.<)

Gelid used to make nice little cooling shields for the VRM and memory modules. I'm disappointed they stopped, although I'm sure it was a tiny market. For single-sided cards that is a much much nicer solution than stick-on heatsinks imo.

https://www.quietpc.com/gelid-icy-vision-gtx1080kit


When I ran it the vram temps were actually fine (although not the scorching GDDR6X variety of course). Since it's not sharing thermal mass with the GPU die, the direct airflow on its own was plenty.

You actually can see this with the 3090 in fact. Simply pointing a fan at the back of the card does wonders and easily keeps them in spec without a heatsink at all, although the backplate is acting as a bit of a spreader. Which makes sense since each memory chip is only like 2-3W. You don't need a heatsink for that, just a little bit of airflow


Thanks for the detailed write up, I learnt something new today.

I sometimes feel I should stop wasting so much time on the internet, but sometimes I realize that it means missing out on such comments that distill a lot of/important information.


> I'm thinking about selling my 3700X and getting a 12th gen Intel with an integrated GPU (I don't game and really couldn't care less about fast GPUs).

Don't suppose you're in Ireland/UK/EU? Looking at building a low/mid-range gaming rig for a sibling, and a 3700X would fit fine if you're looking to sell.


I bought some midrange AMD graphics card a few years back, picked for idle wattage, linux support and quiet cooling.

The card isn't silent, but it's quieter than the high end, low wattage PSU fan, which is needed to dissipate heat from everything else.

Anyway, eliminating the video card fan would probably make my desktop louder, which was counterintuitive to me at the time.

(The setup is extremely quiet, FWIW.)


> The module was a proprietary binary. Since a kernel module requires interfacing with the kernel API, it could be considered a derivative work and a breach of the GPL license.

I never quite understood this logic: the same (?) binary blob is used for the FreeBSD and Solaris drivers.

* https://www.nvidia.com/en-us/drivers/unix/

So how can it be a 'derivative' of the GPL Linux if it is also used on non-GPL systems?


Because to make a driver work with Linux you have to add Linux-specific code that typically uses Linux's source code, and that combination of the driver and Linux-specific code could be considered a "derivative".

Note that the word "derivative" is used here as defined by the license, not in its plain English meaning.


Linus Torvalds:

    But one gray area in particular is something like a driver that was
    originally written for another operating system (ie clearly not a derived
    work of Linux in origin). At exactly what point does it become a derived
    work of the kernel (and thus fall under the GPL)?
    
    THAT is a gray area, and _that_ is the area where I personally believe
    that some modules may be considered to not be derived works simply because
    they weren't designed for Linux and don't depend on any special Linux
    behaviour.
    
    Basically:
     - anything that was written with Linux in mind (whether it then _also_
       works on other operating systems or not) is clearly partially a derived
       work.
     - anything that has knowledge of and plays with fundamental internal
       Linux behaviour is clearly a derived work. If you need to muck around
       with core code, you're derived, no question about it.
* https://yarchive.net/comp/linux/gpl_modules.html

Then you have things like (Open)ZFS and DTrace.


By that reasoning every program written to run on MS-DOS was a "derived work" of MS-DOS. Such programs were written with MS-DOS in mind, and often mucked around with fundamental internal MS-DOS behavior.

Same with pre-OS X Mac. Most Mac programs were written with Mac in mind, and it was not uncommon for programs to muck around with OS code and data.

What matters is how copyright law defines derivative work, and in the US that has nothing to do with whether or not your work was written with another work in mind or plays with fundamental internal behavior of another work. What matters is whether or not it incorporates copyrighted elements of another work.


> don't depend on any special Linux behaviour.

This is the key point, and it is nearly impossible for firmware blobs to depend on any OS behavior, let alone "special Linux behavior". The only way they could do so is via the open-source part, so it should be easy enough to check that.


You quickly get into something similar to the Ship of Theseus (or Trigger's broom) argument. You write some code that must link to a GPL library to function; that code is now GPL because it's a derivative work of the library.

You rewrite that library under an MIT license, so now your code can link to that and run. Is your original code still a derivative work?


Your original code is never a derivative work. You retain copyright to the code you wrote yourself, even if it's combined with GPL later. GPL even contains this interesting clause:

> You are not required to accept this License, since you have not signed it.

So to answer your question: no, unless you've copied bits of GPL library into your code (or similar that would be judged as a copyright violation).

There's also the crappy situation of Oracle vs Google that made APIs copyrightable, so now it's not entirely clear if your code + your rewrite of the library is still yours if it uses an API of the GPL library.


> Your original code is never a derivative work[...]

> > You are not required to accept this License, since you have not signed it.

> So to answer your question: no, unless you've copied bits of GPL library into your code (or similar that would be judged as a copyright violation).

Actually that clears a lot up for me, and I'd have considered myself reasonably knowledgeable when it comes to copyright in general; I think I had a few conflicting ideas about what it means to be an original work. Thank you.


> unless you've copied bits of GPL library into your code (or similar that would be judged as a copyright violation).

Which might happen very easily: one #include and you might be there.


The license is abundantly clear about this and answers all your questions.

It matters who is doing the rewriting and how they got the code in the first place.


> The license is abundantly clear about this and answers all your questions.

The GPL is emphatically not clear about anything. It's a legal minefield precisely because no one has any idea what it means, and everyone has their own interpretation.

> It matters who is doing the rewriting and how they got the code in the first place.

Well, then,... that's about as clear as mud.


I think one of the matters that confused me about it was the CLISP question. IIRC, CLISP linked to readline, but was released under a non-GPL license.

RMS contacted them, and asked them to relicense. They suggested either reimplementing a stub readline-library, or rewriting their line editing code against another lib instead. RMS insisted that they would still be a GPL-derivative, resulting in the current license situation.

I may be misremembering the recount of this, as this was way before my time.


RMS may have believed or wanted that, but it is my understanding (IANAL!) that the case law has been settled differently. If you are found in violation of the GPL due to a dependency you weren't aware was released under the GPL, you can fix that violation by rewriting your application to avoid the GPL dependency.

CLISP et al cannot be forced to distribute their code under the GPL. It's their code and their choice; contract law cannot compel someone who has never entered into the contract to do something against their will -- CLISP didn't knowingly distribute GPL code, so that distribution doesn't trigger acceptance of the GPL terms. They just have to make the situation right once they're made aware of the violation.


It's more nuanced. If you've included GPL code and modified or redistributed it, then you either have to comply with the GPL to have permission for that use, or you've potentially committed a copyright violation.

To comply with the GPL you only need to publish the one specific snapshot of source code you've combined and redistributed with GPL code, but you don't need to permanently relicense your project if it doesn't contain any code you don't have rights to use. The "tainted" version will be granted as GPLed forever, but other earlier or later versions that don't use any GPL code don't have to.

Or you can go the copyright way, and claim it wasn't a copyright violation (because it was a fair use, or non-copyrightable code) or settle the matter in whatever way the law lets you get away with.


That isn’t any different from what I said though? If someone points out you violated their copyright then you need to (1) fix the issue, and (2) pay appropriate damages. But for open source software the damages are nil, so fixing the violation is the only thing you need to do. And you can do that by either releasing the code as GPL, or removing the dependency. Either would be an acceptable remedy, in the eyes of the law.


Here's the email chain between RMS and Bruno Haible, author of CLISP:

    https://sourceforge.net/p/clisp/clisp/ci/tip/tree/doc/Why-CLISP-is-under-GPL
Apparently the situation was that CLISP was distributed as `lisp.a` and `libreadline.a` (with source for Readline included) and the end-user linked them together. Haible offered to write a `libnoreadline.a` library, exporting Readline's function but not providing their functionality, but RMS insisted that the result would still be a derived work of Readline.


I’m not a lawyer, but my understanding is that the GPL requires you to make source available upon request to users of your software.

So I guess if someone asks you for the source code, you can require them to prove that they actually have a copy of the GPL version, and not the new one.


> Note that the word "derivative" is used here as defined by the license, not in its plain English meaning

The license does not contain a definition for "derivative" nor of the similar term "derived" which it also uses.


That's an interesting cognitive dissonance that I've always been fascinated by. I've heard people criticize developers who release proprietary drivers for the linux kernel, but never those who release something dual licensed as GPL2/MIT, or those who distribute a dual licensed GPL2/MIT module as if it were solely under the MIT license; surely that would violate the Linux kernel's GPL (in being a derivative work) as much as a proprietary module would?


MIT is Ok because MIT is compatible with GPL. GPL has language saying you can't add restrictions, but the combination GPL+MIT is essentially GPL so it's ok.

Dual GPL/MIT essentially means that you as a user can choose whether to use the code as GPL or as MIT, but if you contribute to the code you must provide the full GPL+MIT rights.

As to why release a driver as GPL/MIT instead of just the GPL, I think the idea is that the BSD's (or other OS'es) can take the code and use it under the terms of the MIT license and port it to their kernels. IIRC there are many drivers in Linux that are dual licensed in this way for that reason.


I'm not sure about the details, but to write a Linux kernel module you must include some .h files that are under the GPL license. You really have to include them because the Linux kernel ABI is intentionally not stable, exactly to prevent proprietary abuse. So, AFAIK, every functional Linux kernel driver must be released under the GPL.
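
As a bare-bones illustration of what that dependency looks like in practice (just a sketch of a trivial module, nothing to do with NVIDIA's actual code), even a do-nothing module pulls in GPL-licensed kernel headers and declares a license to the module loader:

    #include <linux/module.h>  /* GPL-licensed kernel header every module needs */
    #include <linux/init.h>
    #include <linux/kernel.h>

    static int __init hello_init(void)
    {
        pr_info("hello: loaded\n");
        return 0;
    }

    static void __exit hello_exit(void)
    {
        pr_info("hello: unloaded\n");
    }

    module_init(hello_init);
    module_exit(hello_exit);

    /* Anything other than a GPL-compatible string here taints the kernel on load
     * and blocks the module from using EXPORT_SYMBOL_GPL()-only symbols. */
    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("Minimal example module");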


APIs and ABIs should be open. GPL taking a different stance here is not helping anyone.


The point is: if you want to include a GPL licensed header, your code must comply with its license. Your opinion on the GPL is an entirely different matter.


btw for the Oracle vs Google lawsuit, OpenJDK was licensed as GPLv2. It was very much a GPL (purported) violation case.


"Derivate" I think here is red herring. The biggest problem is that the combination of kernel + nvidia modules could never be redistributed, which means that technically no distro should have been able to ship these drivers by default.


If I remember correctly, the open source ATI drivers were always a bit buggy and it wasn't that easy getting them installed either. The tradeoff was always Nvidia: proprietary but works well, ATI: open but buggy.


As far as I'm aware, since AMD took over, they've been fairly stable (although occasionally omitting support for the latest features until the next kernel release)


IIRC there were some problems with the Linux drivers for Navi 1.0 that continued for about a year after launch. Supposedly those have been fixed.

My Vega 56 has been perfectly stable and trouble-free for years.


As a Navi 10 (5700 XT) owner, those problems still exist. It used to be that at least once a week while gaming the driver would crash with some undecipherable error message in dmesg, and because the card had the reset bug the only recourse was to reboot the machine entirely. 4 years later the only thing that's changed is that the crash shows up less frequently (I'd say once every 3 months).


Have you ruled out power supply issues and are you running at stock clocks (for CPU and RAM as well)?

Anyway, going from "at least once a week" to "once every 3 months" means that 90% of your crashes have been fixed.

> with some undecipherable error message in dmesg

What kind of message would you expect that would be more decipherable?


> Have you ruled out power supply issues and are you running at stock clocks (for CPU and RAM as well)?

Yes for both. No overclocking whatsoever.

> Anyway, going from "at least once a week" to "once every 3 months" means that 90% of your crashes have been fixed.

I don't think I'm supposed to be ok with a device I paid premium money for crashing once every three months with no explanation from the manufacturer. They could've fixed 99% for all I care, it's still absurd that it's even an issue in the first place.

> What kind of message would you expect that would be more decipherable.

One that would lead me to an actual solution or at least an explanation, not just year old threads of people reporting this exact issue with replies saying it was fixed in kernel version X, where X is different for each thread.


Are you using Debian or Ubuntu LTS? LTS distros with older kernels are a separate beast when it comes to hardware support.


> the open source ATI drivers were always a bit buggy and it wasn't that easy getting them installed either.

No. Once mainlined, you had to do absolutely nothing to get the hardware working.


The part about them being buggy is definitely true.

Up until somewhere around 2016-2017 the ATI/AMD drivers were really bad.

I had an "HD 7850" GPU on Linux around that time and it was barely usable. The performance was less than half of what you got on Windows, and the drivers would crash very often, sometimes several times a day if I was trying to play games like Team Fortress 2.

It was so bad that I decided to replace the HD 7850 with a new GTX 970 and decided not to buy any more AMD GPUs for the indefinite future. The GTX 970 was stable and performed very well with the closed source drivers, and other than them being closed source I never had an issue with them. I always installed the closed drivers through the system package manager which handled all of the tricky stuff for me (Arch Linux maintains the nvidia driver as a system package and makes sure it runs on the current kernel before releasing it).

In modern times the situation has flipped though. I still haven't bought an AMD GPU since then but I am pretty sure my next one will be.


I agree; 2016-17 was about the turning point. I bought a Fury X around then, and it was flawless back then. In contrast, my old nvidia cards had become unusable.

On the AMD, FreeSync and HDMI audio didn't work at first. (For any card; the driver documentation said those features were a work in progress.)

Anyway, I unplugged it for a year or so, and recently plugged it back in. One apt get upgrade later FreeSync and HDMI audio just work.

It's gotten to the point where I'd opt for an ARM laptop over one without AMD or intel graphics. From what I can tell suspend/resume doesn't work on intel CPUs (on windows or linux), so it's basically AMD GPU or no x86 at all from a compatibility perspective. (Did AMD also eliminate S3 suspend, and not replace it with a working alternative?)


Just bought a 6600XT, and it's been great.

Literally just plugged it in and installed the driver packages I hadn't installed during initial setup; on most distros it would've literally been plug and play.


I also had a HD 7850, and though I had pushed it less than you I never noticed any huge issues.

It was in a uniquely terrible position of being one of the last cards released supported by radeon when all the development had moved to amdgpu, which it supposedly could run if you jumped through the right hurdles. I remember the xorg feature table having several things working for older and newer models but not the 7850.

Still, my experience with it led to another AMD card that I've also been quite happy with.


About them being buggy, I won't discuss. But you didn't have to do anything to even get them installed.


For the people downvoting: the dude is saying there was nothing to download, since the drivers come with the OS install.


I believe this is talking about radeonhd/radeon/ati circa 2015 or earlier.

Around then, you still had to install the corresponding X11 portion of the drivers, though the nvidia equivalent had the same limitation.

radeon/radeonhd, or fglrx (which was the proprietary AMD graphics driver) absolutely worked worse than nouveau or the proprietary nvidia drivers at that time. It was only a couple of years into amdgpu where the tables turned.

At this point it would be nice if they'd backport their Linux drivers to Windows, as I'm now on my third AMD GPU in 12-13 years (HD 5770, r9 290x, 6900XT) to have issues where the driver will randomly crash when playing hardware accelerated video on one monitor while playing a directx game on another monitor under Windows.


I'm pretty sure I needed to mess with xorg.conf and other settings to get things like screen resolution and Compiz working correctly. I don't know what part of the stack was responsible for those issues, but I thought it was related to the graphics driver.

I could be misremembering though, this was 15+ years ago now.


Except having been downgraded from OpenGL 4.1 to OpenGL 3.3, because the GPU wasn't interesting enough, well done AMD open source drivers.


What do you mean with "old times"? :)

That's basically still what happens. Fedora automates this nicely with akmods, which automatically rebuilds these source only modules and installs them when you update the kernel. Has been working smoothly for a while, but it is fundamentally the same thing still.


Debian does the same with DKMS.


> EDIT: Remember you had to recompile the shim at every kernel update and replaced "module" with "driver".

I haven't bought NVIDIA since then


I remember using that for a Geforce2 MX and the installer.

People have no idea what FREEDOM you have if you aren't bound to crappy licenses from Nvidia and CUDA for serious computing, where you can be limited per core.


Those were simpler times


On Debian I do: module-assistant build-install nvidia

And it works every time, but you do need to run it every time. There is a way to automate it on new kernel installs.

> missed the right packages or had an incompatible user space, including libs and compiler, you could end up without X11 and no way to easily run a browser or google about the problem you had

I always kept the previous version of the kernel and module in case of this.

I've been recompiling my nvidia module each kernel release for over a decade and I've had no problems, you install the kernel, you install the nvidia module, and you reboot.


> [...] you could end up without X11 and no way to easily run a browser or google about the problem you had.

You don't need X11 to run a browser. But you are right that it's pretty inconvenient without.


Thank you for bringing back so many memories of learning the unnecessarily hard way.

This is great news


The old times are what prompted Linus to give Nvidia the finger.


I did a double take because the title is almost clickbait for us desktop Linux users, and immediately wondered "what's the catch?". It's a significant catch, but also a significant step in the right direction.


It sounds like this "GSP" lets them binary blob their "secret sauce" on the controller itself, so the rest of the driver can be open source now.


Probably all their "enterprise" feature lockouts(vgpu etc..) are in the gigantic firmware that's loaded into GSP. But things that matter to desktop users probably aren't going to be locked out this way.


It will probably end up like how AMD cards work.

The closed source driver still exists but there will hopefully be a completely open source stack (Nouveau++?) for Nvidia.

This blog post has more details about Red Hat's plans for this driver.

https://blogs.gnome.org/uraeus/2022/05/11/why-is-the-open-so...


> The closed source driver still exists but there will hopefully be a completely open source stack (Nouveau++?) for Nvidia.

I can only hope they change the name to aidivn, like any sane driver should.


I've never properly understood why the closed source AMD driver still exists. Is it substantially different from the open source one? Does it offer anything not included in the open source one?


For the most part, it's just specific support for specific workstation applications that AMD can't release publicly due to some contractual reasons, or that those specific changes would be a detriment to a general selection of applications. There are a few OpenGL features in there not in the open driver. There's also the DirectGMA thing for having hardware DMA directly to and from the GPU without CPU involvement.

The latter I wish was just a general purpose feature now that things like resizable bar are seeing general support in the consumer space.


The AMD proprietary driver actually has only a few proprietary bits. Other than OpenCL, which was already mentioned, there are:

* Proprietary shader compiler that can also be used for Vulkan

* Legacy OpenGL driver optimized for closed-source workstation apps



The rocm stack has a short list of supported hardware, but it'll run in YMMV fashion on other hardware.


OpenCL, for one.

If you don't care about that, you can run pretty much anything else and performance is great.


I think support for professional users.


With this announcement, I'm now interested in trying out some Nvidia graphics cards on the Pi again. Nouveau had some issues, and the official drivers had no source available, so I couldn't hack them to work.

With the source available... it could be possible! Of course, CUDA support may never happen there, at least not using open source code.


It's a bit too early now, isn't it? The driver is still very alpha, and doesn't support most display stuff.


It… works. I'm typing this comment from a machine running the new FOSS drivers like a champ on KDE Plasma. Not that it is perfectly stable, but it definitely works.


This early is a good time to also iron out some of the inevitable aarch64 bugs that have already been ironed out in other kernel drivers for AMD/other stacks.



Which ones are connectable to it? And does this rely on the "replace the USB3 controller with a PCIe bridge" hardware mod?


Parent is talking about the Compute Module 4, which can plug into (among other things) a first party IO board that has a PCIe 1x slot. With a simple hardware mod (cut the slot connector, cut the GPU, use a riser, etc etc), any GPU can physically fit. He's only gotten one to work so far though, an ancient ATI card. Lots of fun YouTube videos and GitHub issues about this if you search parent's name or look at links in his profile.


<meta>-F geerl: not disappointed

Thank you for the work you do, I'm going through your ssd on rpi4 today!


23 years ago in middle school I made my first ever linux user group post [1], trying to shift from an nvidia geforce to the onboard cyrix mediagx because the geforce only had closed source drivers.

It’s been a long time coming lol.

[1] https://www.spinics.net/lists/xf-xpert/msg04601.html


They can finally close that support ticket!


>in middle school

Haha that's awesome! I was 1 yo 23 years ago, but in the same vein, I had my middle school years of Linux. Although it was significantly simpler than what you were up to at the same age!


The kernel modules are the first stage I guess, since a massive amount of the hardware programming knowledge is in user space, like with AMD/Intel GPUs.

I wonder how much LAPSUS$ hack has to do with it.

I wonder if the nvidia hardware programming interface is a mess like the AMD one, just curious.


>I wonder how much LAPSUS$ hack has to do with it.

Probably zero.

1) There have been rumors about this for months

2) The hacks only happened very recently, this certainly would have taken longer to do than that.


Do you have a reference for the AMD interface? I know it exists but don't know where to find it.


If I recall properly, the command circular buffers of 2^n bytes ("queues" in vulkan3d) are VRAM IOMMAP-ed (you just need atomic R/W pointers for synchronization, see mathematically proven synchronization algorithms). There is a "GPU IRQ" circular buffer of 2^n bytes coupled with PCIe MSIs (and I recall something about a hardware "message box"). The "thing" is, for many of them, how to use those commands and how they are defined feels very weird (for instance the 3D/compute pipeline register programming).

Have a look at libdrm from the mesa project (the AMDGPU submodule); it will give you pointers on where to look in the kernel DRM via the right IOCTLs.

Basically, the kernel code is initialization, quirks detection and restoration (firmware blobs are failing hard here), and setting up the various VRAM virtual address spaces (16 on the latest GPUs) and the various circular buffers. The 3D/compute pipeline programming is done from userspace via those circular buffers.

If I am not too much mistaken, on the latest GPUs "everything" should be in one 64-bit PCIe BAR (thanks to BAR size reprogramming).

The challenge for AMD is to make all that dead simple and clean while keeping the extreme performance (a GPU is all about performance). I've heard rumors about near zero-driver hardware (namely "ready" at power-up).
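
If you just want a small toe-hold before reading the Mesa winsys code, the libdrm side can be poked at directly. A minimal sketch (the render-node path is an assumption for your machine, error handling trimmed; build against pkg-config's libdrm_amdgpu):

    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <amdgpu.h>   /* libdrm's thin wrapper over the kernel amdgpu IOCTLs */

    int main(void)
    {
        int fd = open("/dev/dri/renderD128", O_RDWR); /* render node index may differ */
        if (fd < 0) { perror("open"); return 1; }

        uint32_t major = 0, minor = 0;
        amdgpu_device_handle dev;
        if (amdgpu_device_initialize(fd, &major, &minor, &dev) != 0) {
            fprintf(stderr, "not an amdgpu device?\n");
            close(fd);
            return 1;
        }
        printf("amdgpu DRM interface version %u.%u\n", major, minor);

        amdgpu_device_deinitialize(dev);
        close(fd);
        return 0;
    }

From there, the interesting parts are the query and command-submission IOCTLs that libdrm wraps, which is exactly the userspace/kernel boundary described above.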


> Have a look at libdrm from the mesa project (the AMDGPU submodule), then it will give you pointers where to look into the kernel-DRM via the right IOCTLs.

Exactly the pointer I was looking for, thank you.


Likely very related, wasn't this one of their exact demands? Looks like Nvidia caved haha.



This is well after the deadline iirc, and not the sort of thing that gets rushed.

I doubt this is directly because of that.


maybe for the user space then.


I wonder if this has been in the works for years, or if this is a reaction to the recent Lapsus hack.


The datacenter focus here probably just means that $$$$ did the talking somewhere. AKA some large customer/former customer/potentially former customer said "open source or else" and they decided that having a couple people clean up, and push the special bits into the firmware/etc was a good way to solve the problem and keep $ALTERNATIVE at bay for the next generation or two.


I'd be interested to know too. The hackers purportedly demanded release of the Nvidia drivers as open source:

> The LAPSUS$ hacking group, which has taken credit for the breach, had an unusually populist demand: it stated that it wants Nvidia to open source its GPU drivers forever and remove its Ethereum cryptocurrency mining nerf from all Nvidia 30-series GPUs (such as newer models of the RTX 3080) rather than directly asking for cash. [1]

[1] https://www.theverge.com/2022/3/4/22962217/nvidia-hack-lapsu...


My money is on the "Steam-Deck Effect".


What's that?


Looking over the GitHub commit history of the two contributors on the repo, I’d say it’s a reaction.

These folks look like they’ve barely touched much open source code before now.

A planned run up would surely have these folks doing more GitHub based commits even if on very private repos.


C'mon y'all, read the other threads before commenting. It's not. https://blogs.gnome.org/uraeus/2022/05/11/why-is-the-open-so...


Hell is freezing?

Seriously, does it mean we won't need Nouveau anymore? How many and which binary blobs does it still need? Are they encrypted, or do they require signing?


From the linked article:

> The current codebase does not conform to the Linux kernel design conventions and is not a candidate for Linux upstream. [...]

> In the meantime, published source code serves as a reference to help improve the Nouveau driver. Nouveau can leverage the same firmware used by the NVIDIA driver, exposing many GPU functionalities, such as clock management and thermal management, bringing new features to the in-tree Nouveau driver.


Nouveau is still needed because Nvidia's drivers do not conform to Linux kernel standards, so they can't be upstreamed. Nouveau is conforming, so it still has a reason to exist. Fortunately Nouveau can more easily improve by using Nvidia's now-open source as a reference.


Nouveau is still needed if you want open-source userland; the new nvidia open source thingy is only kernel-side and presumably works only with their own userspace drivers.


> presumably works only with their own userspace drivers.

For now, the plan is to replicate the way AMD drivers work: shared firmware but separate userlands, one closed and the other libre (Mesa for AMD, Nouveau++ for Nvidia?).

More details here from Red Hat's Christian Schaller:

https://blogs.gnome.org/uraeus/2022/05/11/why-is-the-open-so...


It means Nouveau can finally get good, I think.


Nope, nouveau is still hosed because they've been blocked by Nvidia's secretive FALCONs, which have the reclocking and power management APIs locked away behind proprietary firmware blobs.


This new driver targets GPUs with FALCON, and nouveau will be able to control it, as FALCON is GSP (https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-R...).


They're likely changing it because security researchers pwned FALCON.

There is a way to leak the hash against which High Security mode FALCON compares the microcode for signature validation.

So hopefully you'll forgive me if I maintain my skepticism over this being anything but Nvidia iterating to close a massive security defeat by redesigning their FALCON controller to mitigate the thorough pwning that's been achieved.

I don't doubt there are other improvements, I just think the timing is pretty darn convenient, especially for being in the midst of a semiconductor shortage for going and doing a massive manufacturing change like that.


Reading the hash doesn’t sound like much of a flaw to me, could you elaborate or provide a link?


nvidia switched away from FALCON to RISC-V.


Currently no, the driver is alpha code at this point.

But the goal is to create a complete open source stack like MESA for AMD.

Also its only for Turing and newer (GTX 16XX+) https://blogs.gnome.org/uraeus/2022/05/11/why-is-the-open-so...



Phoronix link about what this actually means:

https://www.phoronix.com/scan.php?page=article&item=nvidia-o...

Main takeaways:

- support for gaming workstation GPUs is alpha

- the user space stuff (OpenGL/Vulkan) is still closed source. (A LOT of heavy lifting is done here.)


Ok, we've changed to that from https://developer.nvidia.com/blog/nvidia-releases-open-sourc... above. Thanks!

Edit: ok, changed back


Uh why would you change from a good, official source to a worse rehash of the same stuff with less detail from Phoronix?

I don't even dislike Phoronix, but what was wrong with the first-party source?


User suggestions about better articles tend to be pretty reliable, so we tend to trust them. Sometimes they turn out not to be better (at least not by consensus)—in which case we can change the link back. I've done so in this case.

If you want to understand this process, you should understand that none of it involves actually reading the articles!


I would argue the Phoronix is the vastly superior link.

That blog is the most dedicated to the confluence of 3D, Linux, and open source.

The Nvidia copy is typical engineering marketing copy. The Phoronix post dives into everything with great detail and goes further by discussing the future of open source Nvidia support with the upcoming technologies, and existing technologies (Nouveau, Mesa, etc.)

Nvidia fanboys care little for Phoronix because that site consistently points out how unfriendly and uncooperative Nvidia has been with the whole of the Linux community. (Web search for yourself: Nvidia EGLStreams GBM.)


Please change it back. Nvidia was the official source and Phoronix is low-quality blogspam of dubious accuracy.


> the user space stuff (OpenGL/Vulkan) is still closed source. (A LOT of heavy lifting is done here.)

Much like AMD - surely they benefit a lot from having a relatively stable non-kernel ABI they can target, though. The problem right now is that everything changes every time the kernel is updated, but if you turn it into a "here's how you dispatch PTX to the card" layer and a "here's how you turn OGL/Vulkan into PTX" blob then the dispatch layer probably isn't changing as much.

(graphics doesn't really use PTX but you get what I mean... dispatching compiled assembly and graphics API calls to the card.)

It doesn't help further the copyleft cause as much as if NVIDIA had open-sourced everything, of course, but from an end-user perspective of "I don't want my kernel updates to be tied to my driver release-cycle" it should solve almost all of the problem?


With the kernel driver open source + redistributable firmware, I guess the graphics APIs can be provided by Mesa.


Nouveau has very close to 100% OpenGL and OpenGL ES support. It has a deficit of out-of-spec extension support though. Also, no open-source nvidia Vulkan implementation??

https://mesamatrix.net/


I guess nobody bothered to add vulkan support due to almost unusable performance without reclocking. Hopefully some work will get started with this news!


Reclocking... you're bringing back so many memories from a decade of reading Phoronix nouveau news. (BTW, "nouveau" means "new" in French.)


If I understand this correctly, Nvidia is moving its proprietary code from the drivers to the card itself. Looks like we are looking at a future where graphics cards will get firmware updates instead of driver updates. I am also wondering about how large the newest graphics cards are, and how their power requirements are so much larger than the other components'. I see a future where we have PC cases and laptops with a PCIe slot exposed externally, to attach an external graphics card which has its own power supply and cooling. It is attached to the PC when you need the GPU power for gaming or work, then you just remove it for a silent PC. As we start hitting the physical limits of Moore's law, I think such a future is a big possibility.


Would love to hear from folks who worked on this!

Was this an ongoing thing? Did it have to be pitched hard? What finally made the difference? I'm assuming here it's not because of LAPSUS as some speculate. This seems like it must have been in the making for quite a while.


I didn't work on this but I have watched it with interest for a very long time. It has been a strategic initiative in order to improve the GPU compute ecosystem. No one loves having a tainted kernel and everyone who uses a GPU with Linux (which is almost all data center GPUs) would prefer to have the kernel modules open source. It took a lot of work and planning to figure out how to do this over many teams for a very long time, and then an enormous amount of testing to prove the new drivers were fast enough to be deployed.


It could be quite interesting to outline just how far back in time this started (with lots of details), considering those demands to open-source certain code that were apparently made around March 3rd this year.

My own motivation is to believe that NVidia isn't as broken as everyone insists it is, I guess :) and more broadly speaking it honestly seems like a Good And Interesting Idea to make the situation more clear in any case, particularly given the coverage and significant collective awareness it's attracted.

Also, I found https://www.phoronix.com/scan.php?page=news_item&px=Big-New-... in another comment - 22 May to 12 May, that's some serious stamina lol


This functionality has been rolled out (shipping) for the past year. From the blog post: "This was made possible by the phased rollout of the GSP driver architecture over the past year, designed to make the transition easy for NVIDIA customers. https://download.nvidia.com/XFree86/Linux-x86_64/510.39.01/R..."


And further, the GSP driver arch depends on the GSP controller available on Turing and later GPU's. Per wikipedia Turing was unveiled in 2018, so I guess design work was started several years prior to that unveiling. Not saying the decision to open source the driver was made back in 2015(?) or so, but the wheels were set in motion that eventually enabled the open source decision a long time ago.


Why is the firmware closed source?


So that people could not bypass DRM (Digital Rights Management, not Direct Rendering Manager). And signing the firmware would not help because with source code it is easier to find vulnerabilities in signature verification code. Though they still could publish the firmware partially.


Why would it be open source?


Because anyone would be able to improve it.


See the RedHat graphics director here: https://linuxactionnews.com/240

He says that Nvidia's interest intensified in the last 3 months. Probably Steam Deck challenging their monopoly is making them uneasy.


> Open kernel modules support all Ampere and Turing GPUs. Datacenter GPUs are supported for production, and support for GeForce and Workstation GPUs is alpha quality.

Sounds like this is not going to be what I game on for at least a little bit?



- Stock market crashing

- Crypto being an absolute dumpster fire

- Nvidia doing something to support Linux

Catastrophe is upon us!


The end times are upon us!


I'm curious to see how they'll manage updates, so far it's one mega commit.

> Showing 2,519 changed files with 1,060,036 additions and 0 deletions.


That's not really surprising when they are importing to a new repo and they aren't sure there isn't anything confidential (or bad optics) in previous commits or messages.


from the article:

> With each new driver release, NVIDIA publishes a snapshot of the source code on GitHub

That definitely sounds like squashed commits will be the norm for now


According to README:

> There will likely only be one git commit per driver release.


So is it correct that they are only open-sourcing the kernel-mode portion of the driver and that the real meat of the driver like the shader compiler will remain in a closed-source binary?


Wow! I literally never thought I'd see the day.

I've been on linux for at least a couple of decades now, and this has been a thorn in my side from the get-go. I can't overstate how huge this is!

Great work Nvidia, seriously. It does look like it's not perfect, but damn it's a great step.


I am guessing Nvidia is more okay with doing this because they have moved more functionality to the userspace components?


They moved more function to the GPU firmware: https://news.ycombinator.com/item?id=31346024


The bits that implement the userland graphics libraries are closed source, unlike Mesa. So this is still useless, but don't get me wrong... I'll take it, but I'm still going to bitch about it. My hope is now folks will be able to adapt these Nvidia cards into the Mesa ecosystem, like ~10 years from now or whatever.


Since the open-sourced drivers are explicitly stated in TFA to also help the Nouveau driver improve, the latter does not matter that much. That's what Mesa is here for.


watch out for the fine print though:

https://twitter.com/marcan42/status/1524615058688724992

34MB firmware

Good, but I'll try to stick with AMD. If only AMD's opencl support was better...


we regret to inform you...

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/lin...

AMD still has closed-source binary blobs in their cards too, as does everyone else. Their userland is also closed-source too, just like NVIDIA's.


The Nvidia module does not taint the kernel anymore!


For the uninitiated: If the linux kernel is asked to load a module with an incompatible license, it's called "tainted": https://www.kernel.org/doc/html/latest/admin-guide/tainted-k...


> load a module with an incompatible license

Loading any out-of-tree module taints the kernel, for example the vbox modules are GPL: https://archlinux.org/packages/community/x86_64/virtualbox-h...
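
If you want to see what your running kernel reports, the taint flags are just a bitmask in procfs; a quick sketch (bit meanings per the doc linked above):

    #include <stdio.h>

    int main(void)
    {
        unsigned long taint = 0;
        FILE *f = fopen("/proc/sys/kernel/tainted", "r");
        if (!f) { perror("fopen"); return 1; }
        if (fscanf(f, "%lu", &taint) != 1) { fclose(f); return 1; }
        fclose(f);

        printf("taint mask: %lu\n", taint);
        printf("  P (proprietary module loaded): %s\n", (taint & (1UL << 0))  ? "yes" : "no");
        printf("  O (out-of-tree module loaded): %s\n", (taint & (1UL << 12)) ? "yes" : "no");
        printf("  E (unsigned module loaded):    %s\n", (taint & (1UL << 13)) ? "yes" : "no");
        return 0;
    }

So even with the new modules, a build that isn't upstream/in-tree will still flip the O bit; it's the P (proprietary) bit that goes away.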


I saw this message scroll by during booting, somewhere in 2004? I searched for it online and those were my first lessons on FOSS and the GPL. How great to see this happening. Is it the hack (I don't think so)? Is it the success of Steam (Deck)? Is it Proton? Deep learning? Probably all of it…


16xx, 20xx, 30xx, and some enterprise cards. Still great progress.


lol!


I'm confused! Why would Nvidia want to keep any of this closed source. Surely they make money when people buy graphics cards, and having open source out-of-the-box graphics support in Linux would mean they would sell more graphics cards?


They only really care about Linux usage in embedded computers (Jetson) and in datacenters, and Free drivers would allow you to modify them to permit using consumer GPUs in virtual machines. Currently, you need to spend significantly more money for an enterprise GPU that has the same specs as a consumer GPU just so the driver will allow you to use GPU passthrough. They did recently allow consumers to pass a GPU to a single Windows VM guest so you could run a Windows-only game, but you can't split access to the single GPU among multiple VMs.


Drivers are a large part of the development costs of a GPU, just like how an operating system is a large part of the development cost of a general computer.

If nVidia open sourced all their drivers tomorrow the risk of some cheap Chinese shop making clones of their hardware and re-targeting the nVidia drivers to get lots of the features would be very high. It'd significantly reduce the value of what they'd built (to them).

Really, I don't get why so many Linux users ask questions like this. Most software is proprietary because it costs money to develop. This site we're talking on isn't open source. Windows isn't. macOS / iOS isn't. Games generally aren't. Google isn't. Azure/Bing aren't. Open source is the exception, not the norm.


AMD and Intel both have open source video drivers and I really doubt anyone has cloned them.

The non-free features for the Nvidia cards and many other chips are achieved by the card itself running closed source binaries. It is actually probably a better way to protect IP anyway since no one can decompile the encrypted binary (well until recently...).

Open source and proprietary software exist as a duality. It's not an absolute expectation that everything will be open source but Nvidia is very late vs its competitors.

Google, Windows, macOS, etc. all have large open source parts. Games are kind of an exception because they are treated like a single work of art. Also, crucially no other software or hardware has a game as a dependency so interoperability isn't a concern.


AMD and Intel aren't (or weren't at least) shipping cutting edge GPU tech so there's less need to look to them for ideas and shortcuts anyway.

Well I used to work at Google and the open source parts are a tiny, tiny fraction of their actual codebase. Windows isn't really open source at all, although in recent years a few utilities have been opened up - note, only after Windows stopped being so important to Microsoft.

And as for macOS. Well. You can download and read some code, sometimes. Good luck trying to actually build it or do anything useful with it at all. You'll find that it's (a) completely undocumented and (b) all depends on internal stuff you don't have access to. An exception is WebKit.


You mean, like Intel GPU drivers with just the libre free-as-in-freedom kernel and MESA? Blender? Krita? Cinelerra-CV? Darktable? ImageMagick? FFMPEG? KVM/Qemu? Most programming languages and frameworks? Clang/LLVM?

WTF are you talking about?

If anything, science today is made thanks to FLOSS software; proprietary software is the exception. And the trend is looking worse for proprietary environments.

Money will come from support and integration, not from the software.

A complex, science-related, reproducible ad-hoc environment for Guix may cost a little more on a single PC than a proprietary OS license plus setting up the rest yourself, but you will be able to replicate that setup everywhere and forever, with the guarantee that once your paper/experiment is replicated, you get the same environment no matter where and how. That's the difference.


Krita? You're just naming random open source Linux programs that hardly anyone uses.

Yes, we can all make long lists of open source projects. That's not my point. The point is that the average person, on an average day, is using lots of proprietary software (or proprietary forks of open source software).

I've been hearing about how the future of software is charging for support tickets and hand-waved 'integration' for 30 years. The biggest, richest and most powerful tech firms today all ignored that advice. There's only one company that did well out of that approach and they're now called IBM.


> Krita? You're just naming random open source Linux programs that hardly anyone uses.

More than you think, and ditto with Blender.


Probably their management is too old fashioned to get this.


Nvidia... THANK YOU!


It seems the modules bridge the kernel with the driver [1], so it is this part that is GPL/MIT, the driver itself is still a binary blob. AMD probably does the same? Correct me if I'm wrong.

[1] https://github.com/NVIDIA/open-gpu-kernel-modules/tree/main/...


You're looking at the wrong thing. https://github.com/NVIDIA/open-gpu-kernel-modules/tree/main/... contains the largest parts of the driver.


Can I get an ELI5 on this? I don't do enough in this space to understand the significance. Thanks in advance to whomever can help out!


Might this enable the possibility of compiling NVIDIA drivers for ARM-based PCs?


https://github.com/NVIDIA/open-gpu-kernel-modules#supported-...

"Currently, the kernel modules can be built for x86_64 or aarch64."


Nvidia has been providing ARM drivers for a little while and they work on Ampere Altra.


Looking forward to a day when out-of-the-box Nvidia and AMD GPUs just work on Linux at full power and with all features.


I first used OpenGL on linux in 1995. Software-based commercial X server that was unusably slow. By 98 or so I had Mesa, and was running software-based open source OpenGL (still pretty slow, but almost usable). By 2001, I think, I had a FireGL card that was supported in linux doing hardware OpenGL (I think it was a partly open source kernel driver - first time I had competitive performance to lower-end SGIs in the lab). FireGL was then acquired by ATI which sold their cards as high-end and continued the driver. After that I reverted to software for driver reasons, then to the nvidia driver, which I've used on linux for over a decade now. I will give ATI and nvidia credit for having at least some fairly good level of support for linux over the past two decades.


The Devil's putting on ice skates.


Does this mean we get a bit closer to having native, kernel-level Optimus drivers for Linux?


This year just keeps getting more weird


This is what happens when you don't just give up and "be pragmatic". Kudos to NVidia for coming to the table, and I look forward to seeing more of this sort of thing.


Isn't this connected with the fact that Nvidia got threats from the Lapsus$ group to open-source their drivers, or else the hackers would publish some stolen confidential data? https://videocardz.com/newz/hackers-now-demand-nvidia-should...


Is this connected to the hack/leak where the hackers requested this very move?

“WE REQUEST THAT NVIDIA COMMITS TO COMPLETELY OPEN-SOURCE (AND DISTRIBUTE UNDER A FOSS LICENSE) THEIR GPU DRIVERS”

https://www.theverge.com/2022/3/1/22957212/nvidia-confirms-h...


No, there's no reason for NVIDIA to react to that. Anyone working in the industry even glancing at a leak from the competitor would become a toxic legal liability. It's been a problem for reactos with windows leaks, and it would be orders of magnitude worse for NVIDIA/AMD.


This may be a boon for unlocking hardware features on GeForce. Stuff like the article below shows that some things are just disabled in hardware, but lots of reverse engineering would be needed to bypass that.

https://www.ibtimes.com/nvidia-geforce-graphics-card-virtual...


With this NVIDIA can move these things to the GPU firmware, making things like vGPU enablement harder.


And just like that, my next GPU will probably be NVIDIA. AMD's recent prices have made Nvidia look like the better option, and the only thing holding me back was AMD's amazing open-source drivers and the options that came with them. If Nvidia's drivers become comparable or are well on their way, they would be the obvious choice in today's market.


Finally some good news for free software \o/


This is such good news... I hope we see real-world improvements soon!

The number of times I was stuck with a black screen after a reboot, and the amount of time I wasted on Nvidia drivers! I swore I would never buy NVIDIA again!



Well, then... hopefully FreeBSD and illumos will finally start getting support for CUDA.


This. A clutch of really nice laptops fail hard when you include the .ko GPU blobs, and most developers in BSD land shrug and say "use VESA 2D", which is fine for all but high-DPI use. (OK, it's fine everywhere, but sub-par: you use more grunt doing less work to deal with a dumb video device.)

So OK, not exactly "this == CUDA", but this == decent kernel-module drivers for the GPU.


Thank you nvidia, we've been waiting for this and you finally came through for us!


Anyone know if this makes it easy to hack in SR-IOV support for desktop-class GPUs?


Can this be used with CUDA for GPGPU or is it somehow only relevant for graphics?


You might be curious about Kompute++.

But anyway, the user-space stuff is still the same. This only affects the kernel modules.
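
To make that concrete, here's a minimal sketch of ordinary user-space CUDA code (just an illustrative vector add; it assumes the usual CUDA toolkit and the closed user-space libraries are installed). Code like this keeps building and running against the same user-space stack whether the kernel side underneath is the new open module or the old proprietary one:

    // Illustrative sketch only: a plain CUDA vector add.
    // Nothing here depends on which flavor of kernel module is loaded.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void add(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        // Unified memory keeps the sketch short.
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        add<<<(n + 255) / 256, 256>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %.1f (expect 3.0)\n", c[0]);
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Build it with nvcc as usual; the open release only swaps out the kernel-mode piece that the (still closed) user-space driver talks to.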


I don't normally curse on HN but I think I speak for all of us when I say

Fucking finally.


Just out of curiosity, how easy would it have been to decompile and reverse-engineer the original closed-source modules and then write an open-sourced version of them? Would that have been legal?


That's called Nouveau.


Congrats!

Most importantly, will they open source Optimus? Even though the drivers work, getting everything to render on the GPU, and output to the laptop LCD, has always been an inconsistent pain.


Oh frabjous day! I have been waiting for this for literal decades.


That makes ThinkPads with Linux look much different than before when compared to MacBooks with macOS. (And it saves a lot of update/re-configure/re-install cycles...)


I didn’t see anything for Pascal GPUs. Are they left out?


Yes, Turing or newer is a hard requirement.


There is no god


Will Linus revisit his rejection of Nvidia integration?


Wow, this is big for Linux & Nvidia, isn't it?


Linus' middle finger lowers slightly.


Can I get an ELI5 on this? At the time of writing it has 2018 points, but I have no clue what the significance is.


This is great. Anyone know how/if this will/does impact laptops with nvidia dGPU?


How does what is being open-sourced differ from the proprietary binary modules?


What does it mean? I don't know anything about it. Why is it good news?


I wonder what impact this will have on the SteamDeck if any.


The Steam Deck uses an AMD GPU, so it was probably the other way around. I'm betting this was partly done so nVidia GPUs can be used in similar devices going forward.


They are contributing to https://github.com/Plagman/gamescope/pull/454 (gamescope is the SteamOS session compositing window manager), so maybe a future Steam Deck will use Nvidia tech.


Can someone point me to the reason why this open-sourcing only covers the newest generation(s) of hardware?


Never thought this would happen in my lifetime.


I was worried they had actually decided not to do this after seeing all of the other recent developments and then a pause. Glad to see it was just part of the path to it.


I believe everyone in the comments is like, "Omg! They finally took us seriously when we told them they are horrible at writing drivers for Linux." Good job, Nvidia.


https://youtu.be/_36yNWw_07g Linus expressing his deepest gratitude.


TIL HN has an excellent duplicate link feature. It's my first time posting a link, and behold, teleported straight to the existing post.


Can someone ELI5 for this news?


Yay!!!


I wonder how much of this decision was due to the hack leaking everything they had.


Woah... that only took, like, forever. Anyhow, good job.


Wow. Am I dreaming? That's great.


I wonder if it is the result of the LAPSUS$ hack. This was their statement from early March:

"After evaluating our position and Nvidia's, we decided to add one more requirement."

"We request that Nvidia commits to completely Open Source (an distribute under a FOSS License) ther GPU drivers for Windows, macOS and Linux, from now on and forever."

"If this request is not met, on Friday we will release the complete Silicon, Graphics and Computer Chipset Files for all recent Nvidia GPUs."


I don't think so. Phoronix.com just over a year ago reported that a big GPU maker was going to open source their drivers. I can't find the link but I believe this thing has been in the making for a long while

[edit] found it: https://www.phoronix.com/scan.php?page=news_item&px=Big-New-...


This doesn't actually solve the major standing issue with Nvidia's drivers as far as I'm aware, because the biggest thorn has been the secretive FALCON units and the code signing of firmware blobs that's kept projects like nouveau from being able to gain traction.

Remember, it isn't enough that the kernel module is FOSS. The firmware is where the crux is.


Radeons have the same issue, tho.


Incredible if extortion actually worked.


I remember reading an article where it was claimed that Nvidia successfully did a “counter hack” on that group, whatever that means.

Considering this is coming well after the supposed deadline, I don’t think it’s related.


>If this request is not met, on Friday we will release the complete Silicon, Graphics and Computer Chipset Files for all recent Nvidia GPUs.

Since the deadline passed I guess Nvidia negotiated, or they were bluffing.


I wonder if that means their userspace drivers will follow soon.


No way they're releasing CUDA or Nvidia RTX code. That'll stay closed source (and I respect that)


Well, the libre GNU world will be able to use OpenCL with even more performance.


They may be limiting that stuff in the closed-source parts or at the firmware level. There's no way Nvidia will allow full-blown OpenCL/Vulkan performance and support on their cards.


Direct GPU assembly programming for even more performance, like with AMD GPUs.

I guess the Nintendo Switch "may" benefit from GPU assembly programming... that said, AMD's GPU ISA documentation is top notch, and I wonder if Nvidia will catch up to that.


Definitely. There are no coincidences in Nvidia strategy.


I thought they had open-source CUDA kernels for a minute...


AYYYYYY

Shout out to nvidia for this.


nvidia, fuck you no more!


Hopefully Linus concurs


Ten bucks Linus will curse at them in the near future.


Maybe, but at least he won't show the middle finger.




