The AMD Radeon Graphics Driver Makes Up Roughly 10.5% of the Linux Kernel (phoronix.com)
706 points by akvadrako 10 days ago | 462 comments





My two cents as a kernel developer: the driver is pretty abominable compared to the code quality of most of the rest of the kernel.

However, having a GPU driver not just be open source but in the upstream Linux kernel is a gigantic deal. Kernel development takes a long time, we have millions of lines in the amdgpu driver, and if every one of those dealt with the lengthy review process it would have never made it in the tree.

So it's a necessary evil. I do wish they would clean it up, though. I once sent a fix to amdgpu that was the same change in 3 different files that were largely duplicated. That kind of thing wouldn't fly anywhere else in the kernel.


I would also mention that GPUs are a GIANT abstraction, and since they are rev'd faster than arguably any hardware in a system, there are abstractions layered on top of that for the families and models of GPUs too.

Another way of looking at it is -- I started playing with openwrt for a relatively small router, with 5 ports plus wifi.

I was amazed at not only the amount of openwrt code required to support the different router families and the different router models, but at the sheer amount of stuff turned on by default in the kernel just in case I might need to load a module for some obscure feature or package. I assume the same goes for a gpu driver both at the source level and in the kernel.


Yeah AMD/NV typically do 2-3 chips per year. That complexity adds up pretty quickly when backwards/forwards compatibility is fairly strict and the inputs/abstractions not particularly well defined or behaved.

I'm wondering at which point it would make sense to split the driver into multiple device family drivers instead of lumping it all together into a mess of unmaintainable abstractions.

At the point where people are making jokes about the Linux kernel making up X% of the GPU driver.

The opposite. The lack of abstractions, mere industrial copy pasta code is the problem.

I’m not saying it’s true here but duplication over the wrong abstraction is always better. Seems to me if each graphics card is different enough there’s probably lots to duplicate.

In software development absolutes are always dangerous.

Apart from absolutes about absolutes.

Nonsense. This duplication is not maintainable and needs massive amounts of memory. Proper abstraction adds a few if else in the data, and is about 20x smaller. You can even read and understand that, e.g. what changed with this HW upgrade. No chance with duplicated blobs of structs and enums.
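To make the two positions in this sub-thread concrete, here is a minimal sketch in Python (not kernel C, and not taken from amdgpu). The generation names, register offsets, pipe counts, and the flush quirk are all hypothetical. The first two functions show the copy-paste style; `init_family` keeps the per-generation differences as data and the logic in one place.

```python
from dataclasses import dataclass


# Hypothetical per-generation parameters; none of these values come from amdgpu.
@dataclass(frozen=True)
class FamilyParams:
    reg_base: int                    # made-up register block offset
    num_pipes: int                   # made-up pipe count
    needs_flush_quirk: bool = False  # made-up hardware quirk


# Duplicated style: one near-identical function per hardware generation.
def init_gen_a():
    return [(0x1000 + 4 * pipe, 1) for pipe in range(2)]


def init_gen_b():
    writes = [(0x2000 + 4 * pipe, 1) for pipe in range(4)]
    writes.append((0x2000 + 0xFC, 1))  # extra flush write on this generation
    return writes


# Table-driven style: the differences live in data, the logic exists once.
FAMILIES = {
    "gen_a": FamilyParams(reg_base=0x1000, num_pipes=2),
    "gen_b": FamilyParams(reg_base=0x2000, num_pipes=4, needs_flush_quirk=True),
}


def init_family(name):
    """Return the list of (register, value) writes for one generation."""
    p = FAMILIES[name]
    writes = [(p.reg_base + 4 * pipe, 1) for pipe in range(p.num_pipes)]
    if p.needs_flush_quirk:  # the "few if else in the data"
        writes.append((p.reg_base + 0xFC, 1))
    return writes
```

Whether this holds up depends on how much the generations really share: if they diverge beyond a handful of flags, the table grows into the "wrong abstraction" the earlier comment warns about.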

> I’m not saying it’s true

Yes, I did caveat my suggestion. Why not submit a fix if it's so simple ;-)


If the Radeon driver is in the kernel now, then maybe at some point someone other than AMD will pick it up and start cleaning up excessive copy paste code.

Assuming they play nice with the community, it could be a huge benefit to AMD in the long run.

Still, 2 million lines is a massive amount of code to start working on.


Unlikely, unless that entity has all the supported hardware at hand and ready for automated tests. Refactoring without a thorough test harness is ref*toring.

Isn't that what Google are hoping to do with Fuchsia, to make a next generation of Android that's not dependent on device drivers in the kernel?

I can't see how they'd be able to achieve that, unless you mean simply that the device drivers would run in userland.

Reinventing QNX will always be cutting edge

Are there still workarounds for specific games/programs inside the driver?

You can see all the workaround used in mesa here: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/u...

Proprietary drivers (especially Nvidia's) most likely have lots of similar game-specific workarounds and optimizations (even going as far as overriding shaders in games with better ones they wrote). [1]

1: https://www.gamedev.net/forums/topic/666419-what-are-your-op...


Yes. In fact, that's essentially what the drivers are.

Leaving aside arguments about "what the drivers are", the kernel driver being discussed here generally doesn't have or need that kind of thing. The user-space drivers which talk to the kernel drivers are under the Mesa umbrella as part of Gallium for OpenGL and Direct3D support (e.g. https://github.com/mesa3d/mesa/tree/master/src/gallium/drive...) or as a standalone driver for Vulkan support (https://github.com/mesa3d/mesa/tree/master/src/amd/vulkan). That said, I haven't seen many app-specific hacks in the open source drivers, even in the user-space code.

If anyone wants to learn more about lower-level aspects of GPUs, the Vulkan driver code I linked is one of the best places to start. It directly implements the Vulkan API on one end and talks to the kernel drivers on the other end, so it's relatively easy to follow if you're a systems programmer with an API-level understanding of graphics. Just pick a Vulkan function of your choice and start tracing through the code, e.g. vkCmdDraw: https://github.com/mesa3d/mesa/blob/master/src/amd/vulkan/ra.... The Vulkan driver calls into some of the low-level radeonsi code I linked from the Gallium tree but it isn't a Gallium-based driver, so you don't have to deal with those extra layers of abstraction.


> That said, I haven't seen many app-specific hacks in the open source drivers, even in the user-space code.

They are enabled via driconf [0]. Not nearly as many as what I imagine you'd find in the proprietary Windows drivers though.

[0] https://github.com/mesa3d/mesa/blob/master/src/util/00-mesa-...



I understand how big a deal this is and want to buy an AMD card for my next PC, just to support them, but is the driver actually good? Ie, is support for AMD cards on par with Windows?

The Nvidia driver is crappy, doesn't support Optimus, etc, but at least I haven't had any problems with it for as long as I've used it.


I bought an AMD GPU specifically for use with my Linux workstation and haven't regretted it. Perhaps I simply had bad luck with specific nVidia cards, but the AMD driver is stable in a way the nVidia driver simply never was, especially w/ respect to GPU accelerated desktop environments and screen capture utilities. The only change I made was to switch Arch over to the LTS kernel, as the upstream kernel in Arch isn't quite as battle hardened, and did occasionally require a rollback. That's not something that's likely to affect any other distro though, it's a side effect of Arch's bleeding edge nature.

Anecdotal data and all that. I'm on a Radeon VII, pretty darn solid, will probably continue to choose AMD cards in the future. Wish the Windows driver were a bit more stable, and it's... frankly weird to be saying that in comparison to the Linux driver for the same card.


I have had a similar experience with my laptop that uses a Ryzen 5 Pro 2500U, it would crash once every couple days under Windows, but no such issue crops up using mainline drivers on Debian.

Yeah, from what I understand the difference in driver stability between Nvidia and AMD on Linux is exactly the reverse of their relationship on Windows.

>Wish the Windows driver were a bit more stable

I have an AMD 5700XT in my Windows games machine, and the driver is an absolute travesty. And looking over the installed files is a horror show. Qt5WebEngineCore.dll and avcodec-58.dll, because a browser engine and ffmpeg are essential in a device driver. And why does FacebookClient.exe exist? Fuck knows.


It's even worse with NVIDIA - they ship an entire custom NodeJS with their drivers.

Also don't forget that these are user-space apps that are simply bundled with the driver but not necessarily part of it. Qt5WebEngineCore.dll is most likely used by the UI portion of the driver (settings dialogs, radeon software etc.), same with the ffmpeg dll and the facebook client.

NVIDIA does the exact same btw. - see [1]

[1] https://www.ghacks.net/2020/03/13/nv-updater-nvidia-driver-u...


I think that crap is only included if you install "Geforce Experience". Nvidia parted off their garbage into an optional component, while AMD forces you to install it.

For anyone using Nvidia on Windows, here's a useful tool to carve out most of the trash from the driver prior to installing.

https://www.techpowerup.com/download/techpowerup-nvcleanstal...


Still nothing compared to typical RGB control software, that on Windows only runs while actively logged in (stops when locking the screen/desktop) instead of a tight/light service that uses the "last known" config that updates from the desktop/gui. Let alone painfully missing technical docs or support for Linux.

Does AMD make you sign in to open their locally-installed driver utility? Nvidia seems to have thought it was a good idea and went ahead and did that a year or so ago...

No, and neither does Nvidia. "Geforce Experience" is not the driver. It's just bloatware nobody needs.

I have the same card and have issues with locking up and crashing in both Windows and Linux. It looks like the kernel in Ubuntu 20.10 might have a fix for some of the issues.

I remember back when AMD had both closed and open drivers and, having trouble with the proprietary drivers, I switched to the OSS version. It was NIGHT AND DAY. Games that would crash and had weird oddities now ran smooth with higher frame rates. A lot of weird video issues, especially those that come from running more than one X server or Xnest, all went away.

I would only use AMD cards in my Linux boxes. The nvidia drivers/cards pale in comparison.


At least on par. That said, the AMD driver on Windows is notoriously crap compared to Nvidia.

I'm hoping AMD's next gen turns out to be competitive with the RTX3000 series for my next GPU for the same reason.


That is really interesting, I haven't had an AMD card for >10 years now and would really like to be rid of Nvidia due to the closed source drivers. How is suspend/resume? Thinking about canceling my pre-order queue with EVGA and getting a 6xxx card.

For system integration stuff like suspend/resume, display hotplug and resolution changes, etc, the open-source radeon driver is good, and probably the best option on linux. The 3D accel is not bad (but not as good as nvidia).

However, don't expect a new Radeon GPU to be well supported on day of release, expect 1 kernel release cycle until it basically works, and one more until it has most of the bugs ironed out, and then wait until your favorite distro gets that kernel. So you're looking at 3 to 9 months depending on what distro you use.

I'm personally going to be looking for people selling their RX 5700, to replace my RX 480 ...


Agree mostly, but the best option under Linux seems to be Intel graphics (or at least was until a few years ago) - arguably not beefy enough for some things, but regarding supported features, stability and power consumption the best supported mainstream gpu in the Linux kernel.

Intel simply has no closed source driver for Linux. New hardware is often supported/merged before it is even sold. AMD is trying the same, but not there yet.


The Intel i915 driver STILL crashes my system regularly even on the latest kernels... I have a skylake i5-2600k and the iGPU is absolute dogshit. Not sure if it's a hardware or driver issue but it still hasn't been sorted out after all these years.

Typically the entire system will freeze (speakers will continue to play whatever was in the short audio buffer - pretty awful) for 10-15s, then the driver will detect the hang and reboot the iGPU. Happens much more frequently (every ~15m) when using more graphically intense programs. I can't use blender because sometimes when it hangs it won't reset and requires a full reboot.

There are dozens of issues about it and related problems in Intel's drm fork of the kernel [0]. I (finally) posted a bug report about it months ago since it seemed to have gotten worse after 5.4 but never heard back from them.

All this to say - be wary of Intel graphics on linux.

[0] https://gitlab.freedesktop.org/drm/intel/-/issues


Ever since kernel 5.7 was released my i7-5500 will not boot. (Well it will boot with “nomodeset” option but then X doesn’t work so not very useful.) It’s still not fixed in 5.9.

Wouldn't even say that, I've experienced regressions/bugs on intel drivers for laptops a few times.

In general, it's kind of a crapshoot no matter which way you go, and expect pain if the gpu chipset is less than a year old.


> The 3D accel is not bad (but not as good as nvidia).

How is that? I think Mesa provides state of the art OpenGL and Vulkan support, especially with work on ACO. Nvidia doesn't have any edge in that anymore. They did a few years ago still, but not today.


Last time i checked (which was about a couple of months ago), Mesa had very primitive support for display lists (most of the time you get a command playback, though if you only submit vertex commands it gets converted to a VBO - and i think that was added recently-ish), whereas Nvidia's driver performs optimizations in background threads to convert the list to the best GPU format, splits it as necessary into the minimum number of calls, and when rendering performs culling before processing the full list. AMD's Windows drivers also do some of that stuff (though not all).

Mesa does implement a lot of stuff but they do not take much advantage of what the higher level parts of the API allow to optimize rendering. From what i remember until AMD pushed some devs on it, they didn't care about supporting the entire API at all.

Vulkan support is most likely good though.

(EDIT: yes, "display lists are deprecated", but this is irrelevant, the API is there, available and works and works great on Nvidia and still very good on AMD Windows driver and a lot of applications use it - Khronos splitting the API to core/compatibility was a mistake that made everything more complicated than necessary when what they should have done if they wanted a clean API would be to make something new like they eventually did with Vulkan and avoid messing up OpenGL )


> Mesa does implement a lot of stuff but they do not take much advantage of what the higher level parts of the API allow to optimize rendering.

There is always more that could be optimized, especially when it comes to niche use cases, but generally Mesa/radeonsi do a decent job of making things fast.

> yes, "display lists are deprecated", but this is irrelevant, the API is there, available and works and works great on Nvidia and still very good on AMD Windows driver and a lot of applications use it

By "lot of applications" you mean some workstation applications that refuse to upgrade their code. You can still use AMD's closed source driver on Linux if you need optimizations for those. If you don't (and most people won't) then Mesa works extremely well.

> Khronos splitting the API to core/compatibility was a mistake that made everything more complicated than necessary when what they should have done if they wanted a clean API would be to make something new like they eventually did with Vulkan and avoid messing up OpenGL

You could argue for drivers not providing newer features in the compatibility profile (and Mesa did that until recently) but as long as there are customers demanding support for newer features while refusing to move off the older APIs, this is what you will get. I don't think having OpenGL Core and OpenGL Compat sharing some of the API hurt anything here.


> There is always more that could be optimized, especially when it comes to niche use cases, but generally Mesa/radeonsi do a decent job of making things fast.

Sure, i didn't dispute that, what i wrote was that Nvidia's drivers are faster in some cases based on code i've actually seen. And they used to be slower until not too long ago in that case too, so it isn't like they aren't improving. But still Nvidia's implementation is faster.

> By "lot of applications" you mean some workstation applications that refuse to upgrade their code. You can still use AMD's closed source driver on Linux if you need optimizations for those. If you don't (and most people won't) then Mesa works extremely well.

I mean games, applications and tools, not workstation applications. Not every application uses the latest and -rarely- greatest version of everything out there, nor are all applications always updated - or even under development (especially games). Those that are may have other priorities too.

But why an application uses some API is irrelevant; the important part is that the API is being used and one implementation is faster than another, showing that that other implementation has room for improvement.

> You could argue for drivers not providing newer features in the compatibility profile (and Mesa did that until recently) but as long as there are customers demanding support for newer features while refusing to move off the older APIs, this is what you will get. I don't think having OpenGL Core and OpenGL Compat sharing some of the API hurt anything here.

My point was that the split itself was a mistake (it isn't like splitting OpenGL into Core and Compatibility was a mandate from heaven -or hell- it was something Khronos came up with) and the hurt was that it made things complicated for a lot of people (e.g. not everyone cares about having the best performance out there - some applications are, e.g., tools that won't even come close to using even 1% of a GPU's power, but they'd still prefer to rely only on open APIs instead of some proprietary one or some library that may be abandoned next year - code written for OpenGL 1.x 25 years ago can still work fine on modern PCs after all) and split the OpenGL community into two "camps".

This created issues like libraries and tools only supporting one version or the other, tons of bugs and wasted time for "integrating" to Core (or supporting both Compatibility and Core), invalidating a ton of existing knowledge and books (OpenGL being backwards compatible down to 1.0 is very helpful since you can always start at the beginning with something proven and work your way towards more modern functionality on an as-needed basis) and at the end all of that was a huge waste of time since everyone outside Apple decided that Compatibility is necessary - and Apple decided that splitting OpenGL in two halves wasn't enough, so they made everyone's life even harder and came up with a proprietary API all on their own.


ACO developers will work on OpenGL at some point too. OpenGL in general isn't the case I worry about, as long as it performs sufficiently well. All modern things should be using Vulkan anyway, especially if something requires focus on performance.

And deprecated features? I think there are better things to focus on first optimization wise.


Well, the original comparison was with Nvidia's driver and Nvidia has a much more optimized driver.

Also it is much more practical (and realistic) to have a few devs optimize a handful of API implementations than expect the thousands of devs who work on thousands of applications to do that (also why OpenGL etc isn't going anywhere).


> Well, the original comparison was with Nvidia's driver and Nvidia has a much more optimized driver.

I wouldn't say that. In all common cases they don't. And as above, deprecated features are the last thing I'd start comparing that on. If you use something deprecated, performance shouldn't be your worry; rather, you should worry about rewriting your code.


That sounds just like sour grapes :-P "Mesa is as fast as Nvidia" "But they are slower in these cases" "That doesn't count".

At least on the hardware that I've had, it's basically rock-solid in practice. I use it with high-refresh-rate monitors, I've tried FreeSync and that works, it works with all my displays, and recently the older of my GPUs (Radeon Pro WX 7100) finally got audio output over DisplayPort, as the newer of them (Radeon VII), though I never really had any use for that feature.

The acceleration is solid as well, particularly with RadeonSI and RADV, and particularly as the RADV developers (independents, Valve, and some smaller companies I wish I remembered the names of) have been making massive improvements on the shader compiler side. RADV's own shader compiler (ACO) is noticeably better than the first-party AMD LLVM stack, and RADV is substantially faster than any of the first-party AMD Vulkan drivers for both graphics and compute workloads. I hope ACO in RadeonSI becomes a thing, I think it will be a major improvement.

Message to anyone listening from AMD: maybe look into making ACO your primary target rather than LLVM, it is clearly a better design for your GPUs, it has substantially less overhead, and there's no legal reason it can't be a part of all of your drivers.

As for kernel support, it is often same-day or at least it can access the displays on launch day, provided you have the latest stable kernel. ArchLinux is rarely that far behind a new stable kernel release, so on ArchLinux, same-day support of one form or another, and full support that day or some day soon, is the norm.


Suspend / resume works fine with my Sapphire Pulse RX 5700 XT.

Is it really crap? I have it and it feels stable and the Crimson UI seems well made. It feels way better than the Catalyst days.

It is crap enough for me (RX 5700 XT user) to keep a backup of the few previous successful drivers so that when one inevitably breaks things i can roll back to a previous driver.

Some issues i had with a variety of AMD drivers on my current PC, off the top of my head: turning on the monitor before the PC would cause the GPU to not realize there is a monitor attached; letting the monitor go into power save mode would also cause the GPU to think the monitor was lost; settings for display scaling would be lost after every full reboot (full=real reboot, not the fast hibernate-based one Win10 does most of the time; you get a full reboot after updates, some installs, etc); random full system hangs when trying to play GPU accelerated video (which is pretty much most videos on the web as well as some applications like Microsoft's new XBox Games app); random reboots too, etc.

So i tend to be careful with updating the drivers. The last issue i had wasn't as bad as the random hangs/reboots (which fortunately haven't happened recently), but i simply couldn't launch the crimson UI at all. I had to do a full reset and reinstall of the drivers for it to appear again.

In comparison updating to the latest Nvidia driver when i had an Nvidia GPU (which was since early 2000s to ~2 years ago) was basically a non-issue: i wouldn't even think twice about it as i never had any issue.

And FWIW that was the same on Linux too: i never had issues with Nvidia's drivers there either and performance was more or less the same (at least for OpenGL stuff). But note that i avoid stuff like Wayland, hybrid GPUs, etc like the plague.


> turning on the monitor before the PC would cause the GPU to not realize there is a monitor attached

I have a similar issue with a Dell display attached to an AMD card. After suspending the PC, the monitor does not detect the PC at the other end of the DP cable, except for Amazon Basic cables which work for some reason. Digital standards are weird.


I've had all the same problems with my recently bought DisplayPort Monitor (previous ones were all HDMI and worked flawlessly).

The fix for me was switching from Xorg to Wayland. Haven't had a problem since, apart from Steam not liking it all that much.


Interesting you mention this standby issue. I just moved a monitor from an nvidia setup where it had zero issues.

Now when you turn the laptop (with Radeon gfx) on, it requires me to turn the monitor off and on before it is recognised.


The back and forth in this thread about nuisances like that is one of the reasons I am definitely sticking with Intel integrated GPUs when running Linux. It's 2020 and stuff like that should be much smoother :-(.

Note that in my comment above i was referring to the Windows AMD driver. I haven't used Linux much with this machine (though when i did it had a 50/50 chance to completely hang the system, but i think this was an issue with the kernel and the then-new Zen APUs that was quickly fixed).

I have Lexa PRO in my workstation (Fedora) - Suspend/Resume works so far.

I have an issue though where switching off the monitor for a few days might make the AMD card disable the outputs and not recognize the monitor afterwards (I think it is related to the order in which I try to "wake" the monitor) - which I cannot recover from without rebooting the machine.

But this is with a machine never going into suspend or any sleep state - and I can't say if this would be the same with the NVIDIA card. I do not use the NVIDIA card for video output because the proprietary driver would regularly stop showing my desktop - or suddenly any output at all after reboot.

The integrated Intel GPU on my laptop is mostly without issues whatsoever.

On laptops I would still recommend Intel GPUs anyway for power consumption reasons - although AMD APUs are quite interesting and I don't have recent knowledge about how well they compare. The CPU and its ability to lower power consumption under sleep is also relevant there, and this was way better under Intel so far. Unless you need the increase in performance an AMD GPU/APU would offer...


I have a similar issue with Nvidia on Linux. My larger display is slow to start, so I have to rerun xrandr after suspend in order to get it working.

I remember the Catalyst days. I used to work for a company which included a pc in the price when selling its software. We unofficially supported people who would run it on their own PC but eventually had to put our foot down and explicitly state that we wouldn't support AMD cards.

Hmm, that's a low bar, huh. Is AMD on Linux anywhere close to Nvidia on Windows?

The thing is, Nvidia also has issues, but their PR game is historically better. Many graphics developers have had experiences with Nvidia support where they run into a strange bug and are instructed to set a magic value to enable a driver hack. AMD drivers have had good and bad periods and hacks of their own, but are usually better behaved in this respect. But it's actually Intel that gets the most praise for adhering to spec, and therefore being a useful baseline. So user perceptions and dev perceptions diverge on what makes the drivers good, actually, and this has shifted with the different generations of APIs too; as we've gone towards a lower level access model, the basic driver functionality has become less focused on performance hacks, but there is a lot of legacy support there to support old games.

We're long past the worst period for Radeon on Linux which was back in the 2000's with "fglrx" - a driver that I never managed to get working. The new stuff will run with some competence.


I recently bought a RX 5700 XT, and installed it on a computer that first ran Linux and then Windows 10.

In Linux, the driver (including audio) seemed very robust, but I didn't find anything like a detailed control panel for the card's graphics features.

On Windows, the AMD-supplied control panel has plenty of knobs and buttons, but the driver itself seems less robust, particularly w.r.t. audio-over-HDMI.


That's very informative, thanks. I wonder if there's a cli utility on Linux instead...


Thanks for the links; I will definitely test those out.

Sure. For some reason they aren't packaged yet in common distros, so that makes them not well known.

Maybe have a look at corectrl, which aims to create a beautiful control panel for graphics cards.

What about: is AMD on Linux anywhere close to Intel on Linux? No games, just 3D acceleration for the desktop, bug-free suspend and resume, etc.

Maybe I'm just lucky, but I have not had a single issue on Windows with my RX560. I know AMD/ATI drivers used to be horrible on Windows back in the day, but I really think they've gotten a lot better, I'd say on par with Nvidia's.

It is not. My 5500 XT is unusable when using 2 monitors. https://gitlab.freedesktop.org/drm/amd/-/issues/929

Apparently AMD doesn't have the resources to debug these millions of lines of code, since this has been open for a year now.

Yet people still say NVIDIA on Linux has issues. They don't support Wayland and tend to lag behind with Linux-only tech in general, but the driver itself is top notch. I haven't had an nv driver crash on Linux in 10 years. It's only the same echo chamber borne by the famous moment of Linus flipping NVIDIA the bird.


My experience is 180° opposite to what you describe.

I never had a mentionable issue with AMD cards since switching to the open source driver approx. a decade ago. I have an NVIDIA 1060 card in my workstation for CUDA - every single time I put it in running state again, I have a realistic chance of completely borking my system.

In fact I had an AMD card installed after the first two incidents, simply to have at least a chance of having working video output when the NVIDIA driver once again doesn't want to talk to the kernel.

That and the whole practical implications and idealistic differences of having an (mostly) open source driver vs. a (mostly) closed source driver (I think we can agree that the open source NVIDIA driver is out of the discussion).

Obviously you might run into problems if you try to run very recent hardware right after availability. Kernel driver development is not ideal for cutting edge hardware and some things might break and it might need some time for your distro to ship the newest kernel/driver.


The Nvidia driver started supporting proper Optimus at the beginning of 2020 (it can run apps on the integrated and dedicated cards simultaneously). I use it regularly on my XPS 15 (to play Kerbal Space Program). It's called "DRI PRIME". You have to set an environment variable when starting an application, saying which GPU you want it to run on.
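A minimal sketch of "set an environment variable when launching", in Python. It assumes the variable meant here is Mesa's `DRI_PRIME`; NVIDIA's proprietary driver documents its own pair for render offload (`__NV_PRIME_RENDER_OFFLOAD` and `__GLX_VENDOR_LIBRARY_NAME`), so check which one applies to your setup. The helper names are made up for illustration.

```python
import os
import subprocess


def offload_env(base_env=None, variables=None):
    """Return a copy of the environment with GPU-offload variables added."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: default to Mesa's DRI_PRIME=1 (render on the secondary GPU).
    env.update({"DRI_PRIME": "1"} if variables is None else variables)
    return env


def run_on_dedicated_gpu(argv, variables=None):
    """Launch argv with the offload variables applied and return its exit code."""
    return subprocess.run(argv, env=offload_env(variables=variables)).returncode
```

For example, `run_on_dedicated_gpu(["glxinfo", "-B"])` should report the dedicated GPU as the renderer if offload works; for the NVIDIA proprietary driver you would pass `variables={"__NV_PRIME_RENDER_OFFLOAD": "1", "__GLX_VENDOR_LIBRARY_NAME": "nvidia"}` instead.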

I am, however, very much looking forward to the new AMD GPUs. Hopefully the RX 6000 series will be near a 3080 in more than the 3 hand picked games in their teaser. Would love to use Wayland on my desktop.


That's interesting, thanks, I tried to use Optimus on my XPS years ago but it wouldn't work. I'll try it now, thanks!

Search for “amdgpu ring gfx timeout”. There seems to be a whole class of bugs that have been open for years which not only haven’t been fixed, but there isn’t even any clear indication of what the root cause(s) is/are.

I tried a couple of different AMD cards, and my machine crashes on resume if I try to use either of them (but the Intel iGPU works fine).

Searching for amdgpu bug reports leads to:

https://amdgpu-install.readthedocs.io/en/latest/install-bugr...

which links to a page saying "Bugzilla is no longer in use" :-(

This is under Qubes/Xen, though, so maybe that causes extra problems. If any devs are reading, I did report it here in the end:

https://github.com/QubesOS/qubes-issues/issues/5459


It could be misbehaved applications. While AMDGPU and Mesa are much faster than AMD's proprietary driver (on some OpenGL workloads I have seen a 2x improvement compared to AMDGPU-PRO or the Windows driver) and are normally stable, I had several issues where bad shaders brought down the whole GPU (with "ring gfx timeout"). Things like out-of-bounds access or division by zero.

I upgraded from a Geforce 460GTX to a Radeon RX560, and I ran into two issues. Nothing major, and I've had worse issues with the Nvidia drivers, but they are still something to be aware of.

The first was that my distro (KDE Neon based on Ubuntu 18.04) shipped an older version of Mesa at the time, which was too old for the AMDGPU driver, so I had to add a PPA with an updated version. Since Neon updated to a 20.04 base, it works straight from a clean install. It also worked with no issues when I switched to openSUSE Leap 15.2.

The second was that DVI output was limited to single-link instead of dual-link. My monitor at the time only supported full 1440p through dual-link DVI or displayport, and the old GPU didn't have displayport. Buying a displayport cable was a quick fix, and I believe the DVI issue is fixed in the driver now.

Aside from those two minor hurdles, it has been smooth sailing, very good OpenGL performance in the games I play.


Not sure if this is a driver problem but there are a LOT of general usability issues on AMDGPU + Linux. The default thermal control being an absolute catastrophe, for one.

How is it a catastrophe? I game every day on AMD on Linux and have no issues. 99.9% of consumers don’t care about overclocking so if that’s what you’re referring to I think it’s a non-issue.

It runs 75C at idle because the fan curves are wonky.

AMD has caught up to Intel but still lags behind nVidia (on Windows at least). I'm just not sure they can fight a two front war. Something has to give.

If we're talking about CPUs wrt Intel, and GPUs wrt nVidia, I think they'll do fine. IIRC, they're separate internal groups with the same overall leader (Dr. Su).

Wait a few months after a new GPU comes out, maybe until the next major version cycle (like if you want a new card that comes out in November, wait until the 21.04 Ubuntu/PopOS release).

I bought my RX 5700 XT shortly after release, and for several months afterwards I was running alpha/beta kernel releases and downloading extra files manually just to get it working; even then, an upgrade or update could turn into a blank screen on boot. It also broke out-of-the-box support for running full VMs, which was pretty painful for me as well, and I wasn't going down that rabbit hole to try and build it myself.

YMMV of course.. but that's just my take on it.. I bought specifically for Linux support, but took a few months to shake out.


Have you tried Nvidia on-demand option for Optimus?

It is good enough. I'd say overall Nvidia's driver is worse.

I have bought two Radeons for their specs and returned both, because the drivers were bad enough that I couldn’t get the same-clock performance as Nvidia or, worse, had to deal with driver crashes and system reboots. Note that this was between 10 and 20 years ago. I am seriously considering trying again at the end of the month when they announce the new cards, but the fact that it’s a Radeon is still a downside to me.

Most of the Linux community has a historical hatred of Nvidia because of the driver issue so there’s a lot of relative love out in forums, but just “stable” would be a step up for me for Radeons on windows.


I recently built a system based around a Ryzen 5 3600 CPU and Radeon RX 5600 XT GPU, and in both Windows 10 and Linux with a 5.4+ kernel it's rock solid. Gaming in Windows is simply amazing and it pairs well with my 1440p monitor. On Linux gaming is also extremely good, with only a couple of "Windows only" titles acting buggy under Proton/Steam. Considering Proton itself is in its infancy, that's to be expected.

With native performance on official Linux games on par with or better than the Windows equivalent, and more and more games getting Linux ports due to Vulkan, I just about have no need to boot into Windows at home anymore apart from Fusion 360.

As a workstation in Windows, since I don't overclock I don't see any stability issues. Fusion 360 is fast and fluid unlike my 8 year old Sandy Bridge dinosaur at work, even after adding a GT 1030. Good quality Crucial RAM and a no-frills AsRock B450 board make for a rock-solid build. Ditto on Linux as a workstation, everything just works and works well, and it's superb for 3D modeling and music creation (two of my main hobbies).


Good to hear that things have gotten better! Will be watching the oct 27 reveal of the new cards :)

I'm also very interested in giving my 2080 Ti to my partner (a Windows user) and getting the fastest next-gen Radeon for myself.

It is not on par at all - my 5600 got annihilated by driver issues.

AMD has incredible CPUs, but just buy an Nvidia GPU - especially if you are using linux.


Nvidia has subpar support for Wayland on Linux because it uses its own EGLStreams buffer API instead of the standard GBM buffer API, which is better-supported. Both AMD and Intel use GBM.

Also, the open source driver for Nvidia (nouveau) has incredibly poor performance compared to Nvidia's proprietary driver, and lacks essential features such as reclocking for recent hardware generations:

https://nouveau.freedesktop.org/PowerManagement.html

AMD's and Intel's open source drivers are their primary offerings on Linux and have good performance across all hardware generations.


Intel has actually gone downhill lately, especially for prior generations. I've had to live with 5 or so years of tearing with multi-monitor support on Ivy Bridge, and even single monitor tears inexplicably with some software (that shouldn't). The Intel Xorg driver is unmaintained and the generic modesetting driver doesn't work quite as well. When I first got my Ivy Bridge system, triple head mode didn't work for a while either, so it's not like they have great support when the hardware is current either.

I've switched to AMD now and things are much better. Go with AMD.


The Xorg modesetting driver works quite reliably on Intel in my experience.

The SNA acceleration architecture in the Intel Xorg driver was a disaster in terms of correctness and stability. When SNA appeared as an option it initially seemed quite fast, but it didn't take long to reveal that it was also quite broken compared to UXA.

I used to explicitly use UXA but for the last 5-10 years simply using modesetting has been the way to go.

Personally I think you're conflating Xorg and kernel driver issues. Xorg is basically unmaintained in general now and unfortunately SNA was the last major development in that context for the Intel driver, and it was not good.


This doesn't apply if you want to run CUDA-dependent software. I've generally gone for Nvidia for my personal machine since Torch has behaved oddly on AMD cards in the past.

It's true that Nvidia doesn't support Wayland properly, but that's not really an issue in my opinion. Wayland still has its own problems that mean switching from X11 isn't viable yet.


Although your argument is valid, are we talking about CUDA? Obviously CUDA is an NVIDIA thing under all platforms, right? I don't think anyone would buy AMD with the intention of running CUDA.

Regarding GPUs and how well they work under Linux, computing on GPUs is only a part of the discussion, I would argue...


What issues have you had with Wayland? Switching to it has given me a tear-free experience on both AMD & Intel laptops; besides that, it performs similarly to X11.

> tear free

> 5 or so years of tearing

I know what people are referring to, but a less geeky person might come away from this thinking people get very emotional about bad Linux graphics drivers.


My main problem with it is limited software support. Xmonad isn't available and as far as I can tell what support exists for screen recording and screenshots is half-baked at best. I haven't seen anywhere near enough problems with X11 to make switching window managers worth it, and the screen recording thing would be a massive pain to work around.

I'm still on an Intel system (skylake) and my experience is similar to yours. 5+ years of bugs and crashes, tearing, multi-monitor headaches and general instability.

Eagerly awaiting the new AMD hardware.


I've found the Wayland server to be a great experience with Intel; the only weird bits I've seen are full-screen noise in Firefox and poor support for high DPI, the latter of which is even shittier under X11. The server is really very usable nowadays.

AMD's ok if you have the room for the discrete card, but I wish they would invest more in integrated on-board chips.


Modern AMD GPUs work better on linux than nvidia. No tearing, multi-monitor works, and vulkan is very smooth. Nvidia is actually less stable, and has some peculiar quirks, such as needing a compositing manager running to get rid of tearing, spotty multi-monitor support, etc.

You are dismissing people saying they ARE having issues with AMD on Linux. In fact my AMD card does not do multi-monitor, and in this thread I'm not the only one that has multi-monitor issues on AMD.

Which card are you using? I'm aware the older cards are still bad, especially if you still need fglrx. In my personal experience, the modern AMD GPUs are the first time graphics have worked reasonably well on linux. Even the intel drivers are riddled with bugs and instability (not to mention they still don't even do gallium), with the GMA 3650 (PowerVR SGX based) being the most infamous worst driver ever.

A 5500 XT bought in June, so not old at all. I've heard the opposing argument, that since it's a relatively new card (out since Dec 2019?) I should expect some bugs, which is insane one year later. It's actually unusable, I have to log into my machine via SSH to restart it, or force reboot. It might break after 30 minutes or 3 days, when idle or busy.

https://gitlab.freedesktop.org/drm/amd/-/issues/929

AMD developers in that thread are chasing their tails and still haven't figured out why so many cards are having issues and why others aren't, but as a consumer, that's really not inspiring at all.


Funny, I have a 5600 XT (Sapphire Pulse) and it runs like a dream. The out-of-box experience with Linux has been very good. Note that some of the aftermarket cards are actually bad and the instability might not be software related. Before the 5600 XT, I used an R9 290, and while it did require some tweaks to enable all features (due to being an older card), it still ran relatively stable and in general was a better experience than any nvidia card I had used in the past.

This guy is having the same issues I'm having with a 5600. Multi-monitor, entirely new computer built a couple months ago.

Randomly locks up, random black screen, random rainbow colors all over my monitors.

With my new Nvidia 2060 which I bought to replace it; nothing. No issues. Works just fine on Manjaro.

For whatever reason, the AMD cards just get clapped on Linux.


My experience with linux is that the nvidia drivers and support are the worst of the bunch, and if I had a nickel for every time I could trace a kernel panic through their driver I'd get a very nice lunch. Their popularity seems to be driven primarily by exclusive access to CUDA APIs and windows gaming. Nouveau is OK for accelerated 2d but is hardly in the same ballpark as the AMD drivers.

That said I just picked up a quadro (not my choice, came with a prebuilt NUC) and I've been pleased to find that it "just works" on freebsd (I use it to realtime transcode video), so clearly great experiences are possible and I don't want to be needlessly harsh.

Personally, I'm dying for a discrete intel card. I can't recall any hiccups with intel chipsets, ever, and that matters WAY more to me than raw performance.


> the driver is pretty abominable compared to the code quality of most of the rest of the kernel.

Could you say more about what specifically makes the driver abominable? Is it just those files with largely duplicated code?


duplication 3 times with small differences between is a good case to keep separate imo.

abstraction is one of the main sources of code complexity.

you start with one function used in 3 places, then add boolean args to it to get slightly different functionality at each place, eventually it becomes a mess of complexity


I think that's very subjective and situational.

The amdgpu driver has duplicated files for different versions of things, so it'll have thing_v6.c and thing_v7.c and thing_v8.c with a lot of duplicated functions.

The more common way of doing something like this would be to have structs of function pointers that get populated based on what version of GPU you have. You have one file with all the common functions that they can share, in the definitions for each GPU version you set the majority of the function pointers to the common version they all share and for ones that have to be different, you set them to their unique version. That way you can define all the common functions once, and point to them in the structs for each version.
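To make that concrete, here's a minimal sketch of the shared-ops-table pattern. Every name in it (thing_funcs, thing_common_init, the clock numbers) is made up for illustration and has nothing to do with the real amdgpu symbols; it only shows v6 and v7 pointing at the common functions while v8 overrides the one function that differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-version operations table (illustrative names only). */
struct thing_funcs {
    int (*init)(int base);
    int (*clock_mhz)(int level);
};

/* Common implementations, defined once and shared by every version. */
static int thing_common_init(int base) { return base + 1; }
static int thing_common_clock_mhz(int level) { return 300 * level; }

/* v8 is assumed to need a different clock calculation. */
static int thing_v8_clock_mhz(int level) { return 350 * level; }

static const struct thing_funcs thing_v6_funcs = {
    .init = thing_common_init,
    .clock_mhz = thing_common_clock_mhz,
};

static const struct thing_funcs thing_v7_funcs = {
    .init = thing_common_init,
    .clock_mhz = thing_common_clock_mhz,
};

static const struct thing_funcs thing_v8_funcs = {
    .init = thing_common_init,        /* still shared */
    .clock_mhz = thing_v8_clock_mhz,  /* the only override */
};

/* Pick the table once, based on the detected hardware version. */
const struct thing_funcs *thing_funcs_for(int version)
{
    switch (version) {
    case 6: return &thing_v6_funcs;
    case 7: return &thing_v7_funcs;
    case 8: return &thing_v8_funcs;
    default: return NULL;
    }
}

/* Callers go through the table and never name a version-specific function. */
int thing_clock_mhz(int version, int level)
{
    const struct thing_funcs *funcs = thing_funcs_for(version);
    assert(funcs != NULL);
    return funcs->clock_mhz(level);
}
```

With this layout a fix to thing_common_init lands in one place, and the per-version tables double as documentation of what actually differs between versions.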

Having a quick flick through the code now, they do use structs of function pointers in each version for common operations but they still don't abstract out the ones that are either identical or have very few differences that you could special case.

Refactoring such a giant driver for no performance gain is going to be extremely low on AMD's todo list, so it'll probably stay like that. It just doesn't look like anything else in the kernel


This is literally what everyone does in embedded C land. The repetitive definitions are generally intended to be used with macros and are typically generated from the same definitions as the chip registers themselves. Some places also auto-generate embedded C/C++ structs or classes, which IMO is better. But I have gotten quite a bit of pushback for doing it.

A big issue is also the use of bitfields, as much as reg duplication. Bitfields in C/C++ are a minefield if you don't lock down a known-good compiler version, because so much of their layout is left implementation-defined or unspecified by the standard. Oftentimes you'll also have issues where certain register fields exist for some registers of a series and not the next, or where the functionality/sizing/interpretation is context dependent, or where certain locks or write orders are needed for correct access, and these are often handled with presence-checking macros.
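For anyone wondering what the usual alternative to bitfields looks like: a rough sketch using explicit masks and shifts, where the layout is written out in the source rather than left to the compiler. The register layout and the REG_* names below are invented for the example and don't describe any real chip.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit register: bits [3:0] are MODE, bits [15:8] are DIVIDER. */
#define REG_MODE_SHIFT     0u
#define REG_MODE_MASK      (0xFu << REG_MODE_SHIFT)
#define REG_DIVIDER_SHIFT  8u
#define REG_DIVIDER_MASK   (0xFFu << REG_DIVIDER_SHIFT)

/* Extract a field; the result does not depend on compiler bitfield layout. */
uint32_t reg_get_field(uint32_t reg, uint32_t mask, uint32_t shift)
{
    return (reg & mask) >> shift;
}

/* Return reg with one field replaced and all other bits preserved. */
uint32_t reg_set_field(uint32_t reg, uint32_t mask, uint32_t shift,
                       uint32_t value)
{
    /* The value must fit inside the field it is written to. */
    assert(((value << shift) & ~mask) == 0);
    return (reg & ~mask) | ((value << shift) & mask);
}
```

It's more typing than a struct with bitfields, but the bit positions are the same on every compiler and ABI, which is the whole point when the other side is hardware.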

IMO, if we want better driver code, it's time for GCC/Clang to nail down the bitfield layouts for the embedded use cases. This has been broken for far too long.


Sounds like an excellent way for someone looking for something to contribute to get their code into the kernel though

It would be very difficult to get accepted. You'd have to get the AMDGPU driver maintainers on board, and you'd probably have to do a lot of it at once to justify the change. It would also take some discussion, and you're talking about refactoring a lot of stuff which probably moves underneath you during this, so you have to keep iterating to keep up with the changes, all without knowing if they'll even end up taking it...

Changes like this are probably a good way to get started but I would guess the AMDGPU driver is one of the worst places to get started as a beginner.


I mean, each new version is separate, correct? So the only change that can happen under you is when something is backported. How often does that happen for a gpu driver, and how far back does that go?

Or you duplicate code in 3 places, and apply the same fixes or updates in 3 places for all of eternity. There are pros and cons to both methods and each has its place, no need to start this constant debate here.

That's why this approach can tend to be a positive for driver versions matched to hardware iterations: a given fix may or may not apply to a given hardware config, and likely has to be tested against each config separately.

It's one of the unusual circumstances where, unfortunately, abstraction can decrease flexibility and increase development time.


Proverb: "A little copying is better than a little dependency." (Rob Pike)

That is, it's better to have duplications than the wrong abstraction. This may also be in reference to C compilation, in that loading header files and dependencies costs more than inlined code. That's one of the goals that the Go language sought to resolve, anyway.


> Though as reported previously, much of the AMDGPU driver code base is so large because of auto-generated header files for GPU registers, etc. In fact, 1.79 million lines as of Linux 5.9 for AMDGPU is simply header files that are predominantly auto-generated. It's 366k lines of the 2.71 million lines of code that is actual C code.

Why not generate it during the build? Is there a good reason not to do that?

It was generated by the hardware division. These are the registers that are authorized for disclosure in the open-source driver by the AMD employed open-source driver developers.

So, it includes many times more register definitions than are ever used (consider there are 8x more register definition lines than actual code lines that could use them), and it includes many sets of 16 or 64 definitions that a software developer would have collapsed into one parameterized definition (all the same except for _00, _01, _02, _03, etc). But this is exactly what the hardware guys generated for public release, and it is to be used as-is.
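A quick sketch of the difference, with invented names and offsets (HYP_CTX_REG_* is hypothetical, not taken from the amdgpu headers), and assuming the instances sit at a fixed stride, which real hardware doesn't always guarantee:

```c
#include <assert.h>
#include <stdint.h>

/* Generated style: one definition per instance (hypothetical offsets). */
#define HYP_CTX_REG_00 0x1000u
#define HYP_CTX_REG_01 0x1004u
#define HYP_CTX_REG_02 0x1008u
#define HYP_CTX_REG_03 0x100Cu

/* Hand-written style: one parameterized definition covering every instance. */
#define HYP_CTX_REG_BASE   0x1000u
#define HYP_CTX_REG_STRIDE 0x4u
#define HYP_CTX_REG_COUNT  4u
#define HYP_CTX_REG(n)     (HYP_CTX_REG_BASE + (n) * HYP_CTX_REG_STRIDE)

/* Bounds-checked accessor, so an out-of-range index fails loudly. */
uint32_t hyp_ctx_reg_offset(uint32_t n)
{
    assert(n < HYP_CTX_REG_COUNT);
    return HYP_CTX_REG(n);
}
```

The flat list has one advantage the macro doesn't: you can grep for an exact register name from a hardware bug report and land on its definition.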

IMHO it's kinda annoying and sad. The rest of the kernel is held to a higher standard, that's why all the other non-trivial multi-arch multi-family multi-generation code in the linux kernel is much more concise / less sloppy. It takes a lot of effort to make it that way, and commercial companies pretty much never bother, except when required by the Linux maintainers.

But, modern graphics drivers are way too complex and way too much work, and most people do want some proper modern GPU support in the kernel, so compromises have to be made. It's not too bad, just a bunch of inert header lines, git and the compiler handle them just fine I guess.


Isn't that the beauty of open source, when if someone has severe OCD they could just spend their time tidying up the kernel driver instead of watching mind numbing telly?

I'm not sure what the score is - if these things were tidied up would AMD still be able to upstream their own changes or do they take back fixes from Kernel devs? Seems likely a complex political process...

> It was generated by the hardware division. These are the registers that are authorized for disclosure in the open-source driver by the AMD employed open-source driver developers.

...which is arguably not compatible with the GPL:

"The source code for a work means the preferred form of the work for making modifications to it."


It's not applicable, in practice.

This is the published hardware interface for the driver, the formal public contract. You can't change it without changing the hardware itself.

If you really want to run the generator... well, the preferred form for modification is open to interpretation and if it's some proprietary tool then just getting the output is preferable to a dependency. Sometimes the rabbit hole is too deep, and we have to draw a line.


> ...which is arguably not compatible with the GPL:

From what I can tell, most if not all of the driver is licensed with an MIT-style license. But even if it was GPL, AMD would be the licensor, so it gets to decide the “preferred form of the work”.


"Preferred form for modification" is a form that is suitable for a skilled stranger to modify it with little exposure.

What I meant is that the copyright owner is not bound by the terms of a GPL license he grants to others. Similarly, a licensee who receives software from the copyright owner under a GPL license cannot compel the copyright owner to do anything.

An author that licenses the software under GPL, but does not release the source code in that format cannot legally incorporate outsider contributions into his GPL'd work as he would be in a position of infringing the derivative work author's right.

> a licensee who receives ... GPL license cannot compel the copyright owner to do anything.

Unless the licensee in question has also contributed to a published revision of the original licensor's code. And for that to work (remember the wording "preferred form for modification"), you need a form suitable for modification by a skilled stranger with little prior exposure to said work. You would otherwise get a different preferred form of modification for each contributor, which is unworkable.


> you need a form suitable for modification by a skilled stranger with little prior exposure to said work

That’s a nice idea, but it’s not a condition of the GPL. GPL v2 and v3 both only state, “The ‘source code’ for a work means the preferred form of the work for making modifications to it.” That definition exists because without it a licensee might try to argue that distribution of modified and then obfuscated code satisfies the source code offer condition.

Regarding a project licensed to others under the GPL, if the project owner accepts contributions under the GPL, then he becomes a licensee of the contributions. So, as you pointed out, he would need to meet the “preferred form” clause and other terms, at least as regards to the contributed portions. As you might expect, for a substantial project with many contributors, this could become very complicated. Therefore, many projects require contributions be made under a more liberal license (or even a copyright assignment) that allows the contribution to be sub-licensed to others without conditions.


> Therefore, many projects require contributions be made under a more liberal license (or even a copyright assignment) that allows the contribution to be sub-licensed to others without conditions.

Most, but not all, European jurisdictions have a legal stipulation that all copyright assignments are either void or revocable even if the assigner says otherwise, except for work-for-hire. You therefore cannot release yourself from the preferred-form requirement even if you required a copyright assignment. Otherwise you will get stuck: any further published modifications to your work, covering not only the contributions themselves but also any parts that interact with them so closely as to be inseparable, may become illegal overnight, even when published by the original licensor. As the GPL does not state "the form deemed preferred for modifications by the licensor(s)", but "preferred form ... for modifications", you need to apply the objective definition I stated above. It would be nice if they stated it explicitly that way though, relieving judges of a lot of work in resolving a possible dispute over which forms are preferable for modification and which are not.


It may help to think about who can sue whom. Generally only a copyright owner can sue an infringer. A license operates as a defense against a claim of infringement. If a licensee fails to meet a condition, then the license is invalid.

So, in the case of the project owner who (1) starts out owning all of the rights to the project, (2) incorporates code licensed from a contributor and, (3) distributes the combination, the only person who could possibly sue the project owner for copyright infringement is the contributor. The claim would only pertain to the contributor's code, because that is the only part he owns the copyright to. The project owner/defendant would raise the license as a defense and the key question would shift to whether the owner/defendant violated any of the conditions of the license.

Where the license is the GPL, one of the conditions is partially affected by the "preferred form" definition of source code. The court would look at what the owner/defendant did and whether he met that condition. Importantly, the condition and "preferred form" definition would only be considered in relation to the plaintiff's code; the owner/defendant's code wouldn't be relevant.

Regarding the contributor's code being "inseparable", that will not be the case for one very simple reason: If the contributor sues the project owner, then he must identify which portion of the code he is suing about. If he can't do that or can't show ownership of it, then he will lose.


> license operates as a defense against a claim of infringement

It works like that in fully assignable IP jurisdictions (like USA), but it works like a contract of adhesion in the author's compulsory rights jurisdictions (like Germany and Czechia).

What I meant by inseparable contribution was a significant contribution, when eliminated, that would make entire work not resemble the current state of the work; i.e. the line that tells derivative work versus near-equal co-authorship apart (which are treated similarly in fully assignable IP jurisdictions, yet have entirely different regimes in the compulsory rights jurisdictions). Not the entirety of the work indeed.

> the condition and "preferred form" definition would only be considered in relation to the plaintiff's code; the owner/defendant's code wouldn't be relevant.

It would, in a compulsory rights jurisdiction, because all copyright assignments are either void or revocable at will in such jurisdictions.


> It would, in a compulsory rights jurisdiction, because all copyright assignments are either void or revocable at will in such jurisdictions.

I didn't believe this, so I looked at a study of EU copyright law[0]. Rights of authors are split into moral rights and economic rights. Economic rights are transferable as property. Moral rights, however, inure to the author and are inalienable. In some countries, the moral rights include the right to withdraw the work from circulation. This right to withdraw is probably what you are referring to when you say that copyright assignments are void or revocable.

The right to withdraw a work from circulation, however, does not come for free. In Spain it is only, "after indemnification of the holders of exploitation rights for damages and prejudice."[1] In Estonia, "The rights ... shall be exercised at the expense of the author and the author is required to compensate for damage caused to the person who used the work."[2] In France, "... he may only exercise that right on the condition that he indemnify the assignee beforehand for any prejudice the reconsideration or withdrawal may cause him."[3] In Romania, the right is "subject to indemnification of any holder of exploitation rights who might be prejudiced by the exercise of the said withdrawal right."[4]

In all of the examples I could find, the withdrawal right essentially extinguishes an assignment of the economic rights. So, in a sense you are correct that an assignment is revocable. Practically, however, the author who exercises that right would be liable for damages to the assignee, which could be significant, and the author would not be able to exercise the right if he could not pay for the economic harm.

Anyway, this has been interesting and I learned something about European copyright regimes. Thanks.

[0] https://www.europarl.europa.eu/RegData/etudes/STUD/2018/6251...

[1] Id. at 134.

[2] Id. at 93.

[3] Id. at 173.

[4] Id. at 301.


What modifications would you make that might be useful? The (proprietary) hardware isn't going to change.

If the generated code is a representation of certain unchangeable data about the hardware, you might still want to

1) represent it more compactly;

2) represent it in a form that can more easily be read and transformed to handle future use-cases for the data;

3) after some future restructuring of the driver, represent the data in a form that better fits with that structure.

If you have to regenerate the code using the proprietary tool in order to restructure the driver, the generated code is not "the preferred form of the work for making modifications".


All you're going to end up doing is changing the names. And for that, in my view, a big long list of defines (or whatever), autogenerated or not, is as good a form of the work as any other.

And, besides, there is an excellent chance that you will never end up changing the names.


You might want to port it to a new language, in which case having the hardware description and a generator tool is easier and better than converting the C headers.

And yeah sure pragmatically it might not make much of a difference in this specific case, but if the AMD devs were to port their driver to a new language they wouldn't edit the C headers they would certainly just update their generator, so the preferred form for modification is clearly not the generated C headers.

Not to mention if all you wanted to do was change the names, maybe prefix them with something, editing the generator is _still_ clearly the preferred format for making that change.


But the GPL applies to the driver they released, not some hypothetical driver that you or somebody else might create in the future. You're already going to have to rewrite it all in this proposed alternative language... this header is the least of your worries.

Strikes me that AMD have supplied everything required: all the driver code in the preferred form for modification of the driver, i.e., a bunch of C files.

Some of these C files are a big long list of slightly opaque magic number defines that relate to the hardware, perhaps generated by some unreleased tool, who can say - it's all speculation at this point - but that's OK! The hardware is not the bit you're going to modify. As far as the people modifying the drivers are concerned, those numbers are never going to change. This portion of the driver is fixed.


You fundamentally can't, because the defining code is usually hard-core proprietary or a proprietary toolchain artifact from Cadence/Synopsys. We're talking about something like a memory map of the entire system or 1MB+ XML blobs.

Honestly lifting it from a header file manually is going to be easier for everyone.


What do you think AMD would do if they decided to port their driver to a new language? Would they update their generator or would they copy and edit the existing header?

They sure as shit didn't type out these header files by hand, so clearly these are not the "preferred form" for modification.


Remove some bugs, or improve its performance. Hardware drivers get updated all the time even when the hardware remains the same.

I'm not an open-source absolutist: I think the pragmatic solution Linux went with is good here. But it's silly to suggest that the driver couldn't be improved if it were more open.


The topic is not the driver - it’s the definition of the lowest level hardware interface.

It’s lists of registers and stuff like that; not things that can really be fixed by external devs.


We can say that it's generated from the "hardware schematics". AMD hardware isn't open-source hardware.

> These are the registers that are authorized for disclosure in the open-source driver by the AMD employed open-source driver developers.

In other words, there's more functionality that they're keeping secret? Sounds like a challenge...

Edit: so the hacker spirit is not welcome here...?


Modern high-end processors have a lot of undocumented features. This is rather widely known, though of course not universally known. These have existed for a long time - https://en.wikipedia.org/wiki/Illegal_opcode .

And ... you know this. Checking your comment history https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... I easily found https://news.ycombinator.com/item?id=8834863

> Intel CPUs have had undocumented features since their introduction; it's not hard to imagine their chipsets do too.

Before I did the search I thought you were one of the 10,000 (https://xkcd.com/1053/ ), surprised that others didn't share your enthusiasm. Now I don't understand your surprise.

You must surely know your comment about "the hacker spirit is not welcome here" comes across like snobbish gatekeeping, yes? At the very least, the implied lack of knowledge about undocumented features makes it seem that you aren't one to judge what the hacker spirit might be. ... which cannot be correct given your posting history.


What you're describing sounds like a binary blob masquerading as C header files.

It's just a list of register names, offsets, field names, and bit assignments. It is nothing like a binary blob. It is GPU documentation in the form of C header files.

Now I happen to know that the vast majority of it is shared between GPU generations to some extent, so someone could abstract things out manually to remove duplication, but it's a huge task.


> Why not generate it during the build?

If those headers aren't expected to change then, with regard to accountability, it's far better to have the code checked into version control and processed as is.

More importantly, if the code is already generated then there's no need to make the build system more brittle by adding a non-standard build target that depends on custom/third-party tools.


The Linux kernel doesn't play the Firefox game of requiring Ruby, Python3, Python2, NodeJS and Rust to even be able to build the thing.

Not for the build itself, but there are Perl and Python scripts in the kernel source, which are referenced by the kernel's main Makefile.

If you don’t want to use languages that enable code reuse, then you can’t complain about repeated code

>Why not generate it during the build? Is there a good reason not to do that?

In many hardware shops the C definitions for the visible registers are generated automatically from the hardware's source code.
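As a rough sketch of what such a generator step might look like (assuming a hypothetical `struct reg_field` description and a made-up `REG__FIELD__SHIFT` / `REG__FIELD_MASK` naming scheme, not any vendor's actual tooling), the core is just formatting a table of field descriptions into `#define` lines:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of one register field, as a generator might
 * receive it from the hardware description. */
struct reg_field {
    const char *reg;    /* register name */
    const char *field;  /* field name */
    unsigned shift;     /* position of the lowest bit */
    unsigned width;     /* number of bits, 1..32 */
};

/* Format the SHIFT and MASK defines for one field into buf.
 * Returns what snprintf returns: the length the full text needs. */
static int emit_field_defines(char *buf, size_t len, const struct reg_field *f)
{
    uint32_t ones = (f->width >= 32) ? 0xFFFFFFFFu : ((1u << f->width) - 1u);
    uint32_t mask = ones << f->shift;

    return snprintf(buf, len,
                    "#define %s__%s__SHIFT 0x%x\n"
                    "#define %s__%s_MASK 0x%08x\n",
                    f->reg, f->field, f->shift,
                    f->reg, f->field, (unsigned)mask);
}
```

Run once per field over the whole register map, a tool like this would produce exactly the sort of long, repetitive header that gets checked in.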


and I should add ... chances are the linux kernel is not the primary user of these addresses, more likely it's the internal DV ('Design Verification' - hardware QA/testing) teams who need access to all those internal debug/setup/etc registers that are not normally architecturally visible to the downstream software teams (like Linux/etc)

Wait hardware has source code? I’m just a web dev so I’m not aware of this. What does the code look like?

Yes, there is "source code" for describing hardware. Here are two you can take a look at:

VHDL: https://en.wikipedia.org/wiki/Vhdl

Verilog: https://en.wikipedia.org/wiki/Verilog


Depends on what you're building. Boards tend to be done as net lists (essentially a list of components and wires and how they are connected), but digital chips of more than a few dozen gates are normally written in a high-level language (linked to by other posters here), which can be compiled into machine code that can be simulated and also synthesised into gates and wires (a net list) that can be laid out onto a chip.

I'll tell you more, hardware is often simulated in software from those source descriptions before going to any sort of production. Which is probably one of the reasons for the existence of these definition languages.

If it never changes, generating the headers during the build is just wasted time for whoever/whatever is running that build.

The real shame is that Nvidia is still doing binary blob drivers 15 years after I started caring about Linux. Are they really that afraid of someone taking their Lucky Charms?

My new theory is the Nvidia driver can't be GPL and in the Linux kernel, because then they couldn't ban datacenter usage of their GeForce cards by not licensing the driver for datacenter use. The upcharge on the Tesla series of cards is huge compared to GeForce for mostly the same chips. (For those not aware, see if you can find an RTX 2080 or 3080 from a cloud provider. It's not a thing. This is actually a huge deal for the machine learning industry, massively increasing costs. I doubt Google would have made the TPU if not for this.)

Also, their driver is very complex, and they are constantly improving their hardware. They don't want to be dependent on getting new features and performance improvements upstreamed.


Don't forget the fact that most of their silicon is basically the same and you can easily change it with some hardware/software mods[1] --- I think they have tried to lock that down a bit more, but ultimately it's a cat-and-mouse game and the only ones who win are those willing to ignore the insanity of Imaginary Property laws and take matters into their own hands.

[1] https://www.eevblog.com/forum/general-computing/hacking-nvid...


AMD does the same thing and so does Intel, and this goes for CPUs too. Silicon yield means there is some probability that some transistors won't work, so they disable those cores and create lower-end models. Sometimes, to meet demand, they simply disable working cores, as it's also cheaper to have one process. Tesla does the same thing with their cars, funnily enough.

IBM did it for their mainframes back in the day.

They still do to this day.

I've spent a lot of time trying to come up with a better term for these laws, and I think your "Imaginary Property" phrase here is better than anything I've come up with. Thanks!

> not licensing the driver for datacenter use.

Why are these kinds of licenses even allowed? If I buy a product, surely I can do with it as I please?

Also, why doesn't TSMC slap a license on every IC that leaves their fab, taking (say) a 30% profit from every application in which their ICs are being used?


Sure, you can do anything with the hardware which you actually have bought!

The problem is in the software (the driver) which you never can buy, only license under a long list of conditions which prohibit specific uses.

If e.g. Nouveau could implement the interfaces needed for CUDA, you could probably try to use a 3050 in a datacenter. I bet NVidia has provisions against this turn of events, too.


> The problem is in the software (the driver) which you never can buy, only license under a long list of conditions which prohibit specific uses.

Ok, so who gave software a special status over hardware? Is this desirable? Can we reverse it?


> Ok, so who gave software a special status over hardware?

Software is rarely sold (outside of bespoke development). All the off the shelf software is essentially rented.

Software itself has no legal value - the copyright is what is considered to be property. That property can be leased or sold. This is why copyright infringement is called infringement and not theft.

When you “buy” software, you are actually entering into a lease contract to use the software (sometimes perpetual, but increasingly only temporary) which can have various terms and conditions (that you really should read, but never do). But that lease doesn’t grant you the copyright.


I think that's misleading then, because when I buy a GPU, they make me believe I own it, when, apparently, in reality I don't.

I don't think this way of selling (or as you say renting) stuff should be considered legal.


I think your idea is agreeable, but if we did treat hardware this way it would change almost everything. Apple/Nintendo/Sony/etc would all be required to give users root access to the software and remove their ToS.

And then it gets even more complex when you get into online services. Game consoles are going online-only next gen. If you buy the PS5 digital edition and you mod your OS and Sony bans you from their servers, your console is now a brick. But in many cases it's fair to be banned, such as banning cheaters.


What's the point of that? We do the same thing in software all the time. You get basic functionality for one price, and pay for a key to unlock extra features. Why should hardware be any different? So the law would somehow require any feature on a hardware product to have some physical difference and not be purely a software limitation? What is the advantage of that? Just increases cost to the manufacturer (which will get passed down), then also precludes any possibility of upgrades by purchasing a software patch.

It should be clearly communicated and never be misleading.

What's misleading? You get the functionality you pay for. The fact that software controls that is an implementation detail.

As mr_toad says, the software is essentially rented.

This means I'm not buying but renting, which is not how it is advertised.


You own it, and can talk to it the same way you can talk to a brick, lol :D

(I avoid nvidia whenever possible)


> I think that's misleading then, because when I buy a GPU, they make me believe I own it, when, apparently, in reality I don't.

If you buy a GPU you own it and the copy of the software it came with. You are free to use that combination as you choose, forever.

It’s not renting because you don’t have to pay rent to continue to use it. There may be software license restrictions, typically against modifying or reverse-engineering the software. However, it is an error to say that those license restrictions convert your ownership into anything like a rental agreement.

Some digital activists say that we don’t really own the devices that we buy because of license restrictions or restricted device firmware. It’s hyperbole. We do own our devices and the copies of the software they came with, even if they came with artificial limitations.


Let's test this idea of ownership: My phone auto-updates, and the manufacturer prevents me from reverting updates. One update has removed my ability to record my calls.

Does that sound like ownership? Can a BMW employee pop over to your garage one day and remove some bits of the car he thinks you shouldn't have any more?


The problem of features being changed or removed by a software update is real and the owner can be harmed, as you were. As the owner, however, if you are harmed in that way then you may have a claim against the manufacturer. For example, in a recent class action case by PlayStation 3 owners against Sony over the removal of the Linux OS feature, the court seemed to agree that owners were entitled to damages because Sony ended up paying millions of dollars to class members in a settlement. If you or the PlayStation 3 owners were not owners, then you wouldn’t have a good claim.

By the sounds of it PlayStation 'owners' were paid compensation, but could not get the Linux feature back; in other words, they were not made whole. They don't control what is happening to their property, and without Sony's agreement they cannot repair the damage Sony did to it.

That does not sound like ownership to me. Again, think back to car ownership. Firstly, tampering with your car would have been criminal damage.

Secondly, BMW does not get a say in how you use your car. They can't stop you going over the speed limit. You could get your car fixed without having to involve BMW or going to court to force their hand.

In my view this Sony case looks like compensation for breach of a lease-like contract.


> By the sounds of it playstation 'owners' were paid compensation, but could not get the Linux feature back, in other words they were not made whole.

Members of the class could opt out of the settlement and sue Sony individually. A court could theoretically enjoin Sony to restore the feature for those individual plaintiffs, but the plaintiffs would have to show that monetary damages would be insufficient. Generally courts don’t like to force defendants to do things when paying money would be an acceptable outcome.

> In my view this Sony case looks like compensation for breach of a lease-like contract.

I haven’t read the complaint in that case but the plaintiffs probably alleged a breach of the implied covenant of good faith and fair dealing. So, yes, possibly a breach of contract claim but not a lease. (Note: A lease is a specific form of contract in which a lessor transfers possession of property to a lessee, but retains a future interest in the property after the contract term ends.)


Your idea of ownership is way too primitive and doesn't reflect reality.

You do NOT own the software that comes with your GPU!

Ownership implies the ability to transfer, modify, and resell, none of which are within the rights granted by the license of said software.

It's not "rental" either - it's licensing. You don't have to become a lawyer, but knowing and understanding the difference between proprietorship (ownership) and possession is a good start. Same goes for renting vs. licensing vs. ownership.

TL;DR you do not have ownership of any software that came with any device you bought and it's not hyperbole at all.


> You do NOT own the software that comes with your GPU!

When you purchase a consumer GPU that comes with software, you acquire the GPU, the copy of the software it came with, and a license to use the software subject to particular terms and conditions. That is what you own, no more, no less.


You do own it. And you're free to use the open source drivers if you want.

> When you “buy” software, you are actually entering into a lease contract to use the software

This is inaccurate, at least as to purchased software. A license is not a contract because the licensee is not required to do anything. A license can have conditions (restrictions), but not covenants (promises to do something). A license basically functions as a defense against a claim of infringement.

Note: For purchased software there is a contract for the sale of the software subject to the license, but that shouldn’t be confused with the license itself.


> For purchased software there is a contract for the sale of the software subject to the license, but that shouldn’t be confused with the license itself.

That's simply not true. You are indeed making a contract for the sale of the license itself. Otherwise subscription models wouldn't work and you would even be legally allowed to share and resell the software, which you aren't (i.e. just because it's possible to resell an acquired license while keeping a working copy, that doesn't make it legal to do so).


I agree with you. My earlier point was that a license is not a contract, and shouldn’t be confused with one. My note at the end was that there is also a contract when you acquire a license through a purchase. The contract is typically of the form “you pay us money, we give you license”. That contract too shouldn't be confused with the license acquired.

As you correctly point out, one who sells his only license to a piece of software no longer has a license. If he kept a copy of the software and continues to use it, he is committing an act of infringement. That is the same whether the license is for a term (subscription) or perpetual.


Keep in mind it's the same special status that allows the GPL to have the condition that you must release your source code if you distribute something that includes GPL code. So, "reversing" it would also reverse the GPL.

Not exactly. The GPL's special status generally comes from the fundamentals of copyright law: it attaches conditions to the duplication, modification, and distribution of a work. If not for the GPL, you'd have no right to distribute something containing the copyrighted code.

The datacenter-versus-personal conditions of NVidia drivers attach instead to the use of the copyrighted work. These restrictions are based on the idea of an end user license agreement as an enforceable contract, either agreed-upon when the driver is downloaded or through a theory that copyright attaches to the temporary (in-memory) copy of the driver necessary to run it.


Yes, maybe, but step #1 is to sue Oracle.

See if you can convince them to "let you" publish a benchmark of their database management system.

Start there.


Amusingly, Oracle is known as the slowest major DB despite their heavy handed tactics. So, actual benchmarks might actually help their sales rather than people simply assuming it’s unacceptably slow.

Did this change recently? I remember my database professor in college was adamant that when they talk about databases, I am to assume some things as a given (going by memory, am probably not completely accurate):

that the data set is large enough that cannot fit in memory

that storage is orders of magnitude slower than memory and memory is orders of magnitude slower than processor cache

Oracle has the “best implementation” given these constraints.

Is that not the case?


It's worth noting that in current conditions the assumptions may be unwarranted.

First, while storage used to be orders of magnitude slower than memory, now SSD storage is just a single order of magnitude slower;

Second, in many domains now it's often practical to ensure that your data set can fit in memory. For example, if your system is for storing financial transactions (which is a prime market for Oracle), then your enterprise has to be quite large to get a terabyte of transactions and you can put a terabyte (or much more) of RAM in a database system if you choose to.


That's precisely the point, Oracle forbids benchmarking and comparisons in its licensing.

So how would anyone (legally) know?


Well, you cannot legally publish a benchmark, but you can set up your own for your private uses. It is not like Oracle DB detects it is being benchmarked and shuts off itself.

It's not a special status, anyone has the right to deny you a hardware product as well. I don't have to sell cars to anyone if I don't want to. If I do want to sell someone a car, I can specify a contract or license that they must follow if they buy my car. Ferrari famously only sells exclusive models to customers who have been pre-approved, i.e. they have a certain amount of income and own 5+ Ferraris already. I also cannot walk into a Lockheed Martin dealership and tell them to sell me a F-22, even if I can afford it, even if my country has permissive laws regarding the ownership of fighter aircraft.

As for software, well, EA has the right to ban me from their servers if I hack their games, even if I did pay for the product, and this makes sense because it ruins everyone else's experience. I don't pay for HN but if I did they still would have a right to ban my account if I start posting slurs or other abusive content.

Is it desirable? Of course it's desirable; imagine having no control over your own creations and having to deal with the consequences of other people abusing it.


None of those examples are equivalent. The hardware examples are where companies refuse to sell a product (so you never own the product to begin with), whereas the EA example is where you’ve been kicked off online services (you still have the capability to play the game offline, you just can’t access their servers, but you don’t buy their servers when you buy the game) and the HN example is a termination of subscription. Neither of those examples demonstrates legal limitations to software usage with a product you own (though the EA one at least comes close from a superficial perspective).

> you still have the capability to play the game offline

EA famously uses online-only DRM in many of their modern titles; if you get banned from, say, SimCity, you can't run the game at all. There is no "offline mode".


And is this desirable? How does hacking your copy of a single player game harm anyone?

Also, SimCity eventually got an offline mode.


It's not a single player game I'm pretty sure - there are leaderboards and achievements that allow you to compete with your friends. Obviously these features are moot if the top 10,000 players have a score of MAX_INT. It would have been nice to "disconnect" your city from the leaderboards if you wanted to go crazy, but unfortunately this mode was not added.

For the record I am against always on DRM so I did not buy this game nor any other game that uses it. I don't believe we need to codify laws banning the practice or any such thing that requires software developers to build things they don't want to build (with the exception of critical fields such as healthcare and aviation).

It's desirable in that a one time purchase does not entitle a customer to a lifetime of server resources; they paid for the game and they can certainly keep the game, but they don't have a right to the services required by the game (those are recurring costs). This makes sense since the alternative is forcing EA to pay to host servers for people that violated their terms of service.

You are correct that it got an offline mode eventually, I overlooked this. But this demonstrates that the market corrected this problem: Enough consumers complained to force a change. Therefore, is there need for external intervention? The simple solution to always-on DRM seems to be to just avoid buying any products that use it.


>If I do want to sell someone a car, I can specify a contract or license that they must follow if they buy my car.

Only in a limited form e.g. exhaustion doctrine prevents you from restricting resale. If someone wants to resell their exclusive Ferrari, there's nothing Ferrari can legally do (though this'll probably get you blacklisted from ever receiving an exclusive vehicle).

In general, terms can't go against existing laws and have to be 'conscionable' to be enforceable (i.e. they can't be obviously 'unfair').


The examples of software you list aren't close to equivalent. They are all services and you can get banned from a service if you misbehave. But a piece of software such as a driver is not a service.

Software updates aren't a service? Doesn't Nvidia provide updates to its drivers over time? They can choose to cut anyone off from those, including the very first initial driver download. Sure, you can own and do whatever you want with the hardware - but good luck getting it to do anything useful if you can't access Nvidia's driver download service.

Yes, software updates can be a service. Nvidia, however, doesn't provide a working version of their driver with a card. Just like buying a game is not a service but receiving updates can be.

Sure, but a reseller could resell it under the first sale doctrine. https://en.wikipedia.org/wiki/First-sale_doctrine

> Ferrari famously only sells exclusive models to customers who have been pre-approved, i.e. they have a certain amount of income and own 5+ Ferraris already.

Sounds like discrimination to me, and not desirable.


Only a few very specific types of discrimination (religion, sex, ethnicity/race, etc) are prohibited, every other discrimination is fair game and often desired. For example, it's quite desirable to discriminate potential developer hires according to their programming ability, to discriminate potential borrowers according to their ability to repay the loan, etc.

And discrimination is only discrimination if it's illegal where PeterisP lives?

Discrimination is discrimination everywhere, as it applies to all activities where people make distinction and treat some things, people or activities differently.

But I'm arguing that when you hear "X is discrimination" then it's wrong to automatically imply that X is bad or X should be changed. There's just a narrow subset of discrimination that's immoral and should be avoided, and there's a narrow subset of discrimination that's illegal (there's some overlap between these two subsets but they are not exactly the same, of course). Most discrimination, and certainly the default situation, is just the reasonable human activity of applying common sense and acting according to the specific situation instead of blindly acting the same no matter what, like robots would. It's completely normal to adapt to the specific person and act differently to be most suitable with them, to make adjustments and custom approaches for different individuals, which definitely is discrimination, but there's nothing a priori wrong with that. For example, custom pricing is one form of discrimination: offering a discount for students or senior people is certainly discrimination, but we generally consider it entirely appropriate.

And in certain cases a lack of discrimination would be completely immoral - for example, the concept of "reasonable accommodations" is a requirement for discrimination; for example, a policy that forbids electronic devices in an exam does not discriminate in any way and applies equally to anyone (in colloquial language one might call it a "discriminatory policy" but that's wrong; perhaps I'm nitpicking on that but it is a misuse of words to mean their exact opposite), but as it forbids hearing aids for people who need them, then that non-discrimination is bad; and also simply equally allowing all devices would be bad for other reasons, so ADA and equivalent laws require to discriminate and apply different rules to people with different abilities.

So if you see a practice that seems definitely bad and harmful, then "is it discrimination?" is the wrong question to ask, since it's very likely that it may be harmful but not discrimination, or it may be discrimination but nothing wrong with it; these aren't edge cases, the overlap is just partial. The proper question to ask is whether the criteria of the discrimination is fair (the up-thread issue of discriminating upon wealth certainly is debatable whether that should or should not be acceptable) and whether the results of that discrimination are appropriate.


But building a brand by only selling to rich people?

Sure, everything that's not explicitly prohibited is permitted, and wealth is not one of those very few things prohibited for discrimination. You're free to have a club that only admits billionaires or offer a discount that applies only to people below a certain amount of income.

The example on ability to repay is closely related to discrimination by pure wealth, but there are businesses with even more straightforward criteria, e.g. financial services that are offered only to individuals with net worth above a certain (quite large) amount, and having less money than that automatically disqualifies you from that service even if you were able and willing to pay the involved fees.


> Sure, everything that's not explicitly prohibited is permitted

That was not the issue. The question was whether it is desirable.

Personally it leaves a bad taste. It reminds me of a fashion brand that doesn't sell to obese people (can't remember the name but it was in a documentary).


So don't buy one

> Sounds like discrimination to me, and not desirable.

Are you allowing everybody who wants to have sex with you to have sex with you, or are you discriminating in favor of a select few or a unique person?

Discriminations is part of human nature.


> Discriminations is part of human nature.

Glad to see someone also came to this conclusion!


That's the point of luxury models. Not everyone can have them.

The difference between hardware and software is that copying is free for software. You can own the hardware and do whatever you want with it because for you to reproduce it would require you to effectively be Nvidia. For software you can't give a user ownership of exactly 1 copy of software. If the purchaser has all of the rights of ownership they would have the right to distribute copies for free, which obviously makes selling the same software impossible. Software is copied and hardware is moved; they're fundamentally different, so they have to be treated differently.

> Ok, so who gave software a special status over hardware?

Some American politician extended copyright protection towards software. The rest of the world eventually did the same.

> Is this desirable?

No.

> Can we reverse it?

Sure. We just need to have billions of dollars just like the copyright industry. That money can buy a lot of influence.



> Some american politician extended copyright protection towards software. The rest of the world eventually did the same.

>> Is this desirable?

> No.

So I'm sure you'd be happy if I just took the software for whatever great startup idea you'd been slaving away on for the last two years, slapped better marketing on it, and undercut you by 50% since I didn't have to employ all those pesky overpaid engineers.


If someone purchased a product from me, and then used it however they want, that's fine.

It is their product at that point, because they purchased it.

You should not have the right to control someone else's product, such as a graphics card, or whatever, after they have purchased it. It's theirs now.


You're free to write your own graphics driver for the hardware you own, just as Nvidia is free to not help you.

Nvidia is free to not give out any graphics driver; however, that would make their graphics cards nonfunctional and hard to sell.

However, if nvidia has sold me a functional graphics card including the driver as an unalienable part of the package that I purchased (since the driver being functional is part of the card being 'fit for purpose' of the sale), I should be free to use the driver without any unreasonable restrictions. I have legally bought [a copy] of it, it's not copyright infringement for me to run it on a computer - even if it resides in a datacenter.


> your own graphics driver

My whole point is that there needs to be more effort to hack and modify these things, and that this would be "more desirable".

And that orgs should be using their power to cause this to happen more. For example, if open source orgs can weaponize licensing agreements against nvidia, in order to force them to do this, then they should and this would be desirable.


If only! The hardware is made so that only software written by nvidia can drive the graphics card.

Buy a Radeon. I'm not seeing the problem. You have options? Nobody is forcing you to buy Nvidia hardware.

> You have options? Nobody is forcing you to buy Nvidia hardware.

Laptops are generally an all or nothing proposition. I wanted a laptop with a high performance CPU and the nvidia GPU just came along with it. Couldn't even disable the thing in firmware since hardware video decoding with the Intel GPU caused kernel panics.


If you claim it matters, but not enough to impact the purchasing decision, did it actually matter?

Like you showed your disapproval of Nvidia by giving them your money anyway. So... They're right - people care enough to complain, but not buy something else, so it doesn't really matter.


> So... They're right - people care enough to complain, but not buy something else

Are you aware of the concept of market power, switching costs, barriers to entry, and market lock in?

If so, then that should enlighten you as to the explanation for this.

> did it actually matter?

Yes. It matters, yet still did not change consumer behavior, due to the concept of market power.


How can you take software that was never published in the first place? There's a reason everything is a service these days. So what's the point of these protections?

There's nothing stopping nvidia from theoretically offering a gpu that is only rented out rather than sold. It's just not really considered acceptable for hardware (at the moment).

That's only because the hardware is a useless dongle without the software.

Sure, in theory you could run an open source driver, and in practice sometimes the driver won't crash, but there's no point because you could get a video card with an equally good open source driver for the same price, since you can't get the fancy card's peak performance from the open source driver.


Isn't that what IAAS is in essence?

rms, amongst others, long maintained that firmware is different to software.

If the hardware is effectively useless without the software driver, it could be argued the whole thing is a bit of a fraud / misrepresentation. But I guess nobody wants to sue somebody with pockets as deep as nVidia to change the status quo.

GTX/RTX cards are sold as gaming cards so it's not misrepresentation. If it was then every bit of hardware with locked down software would be fraud.

“Locking down” is not the problem - the problem is that you are told you’re buying goods when you are actually buying goods which require access to rented software in order to function at all.

It’s like buying a blender and then finding out that you’re not allowed to blend anything unless someone in the manufacturers’ operation approves of it.


It's not fraud; you can return the purchase if you don't like the software license.

"The problem is in the software (the driver) which you never can buy, only license under a long list of conditions which prohibit specific uses."

Well, buying and licensing are not so different in Europe (first sale doctrine). The company can not forbid you to resell a license (Exhaustion of intellectual property rights) in Europe.


The US respects first sale doctrine as well. See Vernor v. Autodesk, Inc. 555 F. Supp. 2d 1164. [0]

However, the ability to resell a license doesn’t remove other conditions, e.g., restrictions on data center usage.

[0] https://en.m.wikipedia.org/wiki/Vernor_v._Autodesk,_Inc.


lol. When was the last time anyone actually bought a copy of software? You own nothing. You are a party to a contract written by nvidia and signed by you when you installed their driver. You can do only what they allow and they can yank their permission whenever they see fit.

Not happy? That is what makes FOSS so appealing.


> When was the last time anyone actually bought a copy of software?

Satya Nadella bought Skyrim recently. All of it.


Skyrim includes licensed code such as Bink Video.

Microsoft did; even Satya is perhaps not rich enough to spend $7.5B on his own.

Have enough details about TSMC contracts ever been released/leaked to know they don't do this?

I don't follow the semiconductor industry closely enough to know anything about TSMC's business practices, but these kinds of contracts are far from unheard of in other sectors.


Maybe you could just use it for the datacenter anyway. NVIDIA doesn't have a right to know how you are using it.

What are they going to do, call you and ask how you are using the GPUs? Don't answer. Message you on Facebook? Don't answer. Visit you? Don't publish your address.

Alternatively, just don't call it a datacenter. Just call it a private internet gaming cafe or something of that sort. NVIDIA doesn't have a right to know what's actually inside.


I suppose you may get raided by an organization like the BSA upon suspicion (e.g. when you get reported by a disgruntled employee).

https://en.wikipedia.org/wiki/BSA_(The_Software_Alliance)


Build your dreams in a country whose government won't give a damn about enforcing it then. I can think of several where you can safely do so, and the government will just laugh it off as a waste of time if someone tried to file a suit about something like this.

The US will fall behind in tech if it insists on enforceability of things like this.


Most companies would rather just buy the enterprise card than go through all of this hassle. It's not even a rip off when you consider that the enterprise cards pay for the research and development on CUDA, which puts enterprise-grade tools in the hands of students and hobbyists.

AMD's version is simply not supporting their version of CUDA (ROCm) on consumer cards (the Navi ones anyway).


> It's not even a rip off when you consider that the enterprise cards pay for the research and development on CUDA, which puts enterprise-grade tools in the hands of students and hobbyists.

Monopolists can do anything with your money including sitting on their hands. Also, supporting students and hobbyists may be noble, but education is something we all pay tax for.

Also, hobbyists would be better served if they could develop their own version of cuda.


All of the BSA members are US tech companies. One could argue the enforceability of US Copyright law is to protect this industry.

I suppose the idea is that you can't use the driver.

Let's say I do this in Russia or Morocco or whatever.

How are they going to prevent me from doing so?


How are you going to sell said cloud service to companies doing business in the US or EU?

Using the internet and a payment processor, of course. The true hardware would be hidden from the client in one way or another to protect from judgements, and any inspection would be respectfully denied.

> If I buy a product, surely I can do with it as I please?

We no longer buy products these days. We license them. Another form of rent that allows the true owner to maintain control. Somehow this became the norm.


There's nearly no market for people willing to pay to own.

That's because most people don't know they don't really own the stuff they "buy" these days.

The EULA / driver license may prevent you from reverse engineering the driver to enable these features, but that is only legal protection. nVidia sells these cards saying they don't provide feature X; nVidia also sells some cards, which do provide feature X (at a different price point). There is imho nothing per se wrong with this practice. The silicon being the same in both products is an implementation detail.

If you sold pork at different prices on the condition of eating it in a wood vs. a stone house, some people would consider it market segmentation or maximizing profits. Others might call it illegal price discrimination.

Who calls it illegal?

> nothing per se wrong with this practice.

There is nothing wrong with someone doing whatever they want with a product that they now own.

If someone wants to modify their own hardware, that is their right.


It's cheaper than designing, manufacturing, and stocking more chip models. It's also cheaper than designing and manufacturing 1 model and physically disabling the pieces after.

You could try to regulate that what is manufactured is not gimped on its way to the consumer for ideological reasons, but in the end you'd just end up paying more for a separate physical model, because the profit margins on these advanced use cases are simply what drives GPU design.

As for the royalty licensing, TSMC is ahead in abilities and has captured an enormous portion of the market, but it's not so far ahead that it can eat as far into customer income streams as it wants. Other manufacturers still exist and get deals; Nvidia is using Samsung 8nm for the latest round of GPUs, for example. If it continues to increase its lead then we may see that type of agreement grow, though.


> Also, why doesn't TSMC slap a license on every IC that leaves their fab, taking (say) a 30% profit from every application in which their ICs are being used?

Because companies would stop using TSMC chips...?

Not to mention the logistical problems to attribute "profit" to any chip in particular.


I don't think not using TSMC is a viable option anymore. There's the (slightly worse?) Samsung alternative, but they can't satisfy the demand.

Also, if TSMC introduces the license, then Samsung may do the same.

But perhaps I'm thinking too much from the perspective of what large US-based businesses would do.


US trusts are nothing compared to Asian trusts.

en.wikipedia.org/wiki/Keiretsu

en.m.wikipedia.org/wiki/Chaebol


It's not about trusts but about too much capitalism in the US (like the pharmaceutical industry?)

For GPUs, I agree that you currently have to use TSMC. But if TSMC were to charge 30% of profits, you would almost certainly see a migration to other fabs which would harm TSMC's long-term profitability.

> For GPUs, I agree that you currently have to use TSMC.

RTX 30 is manufactured by Samsung.


Not the Quadros used in datacenters. GA100 is on TSMC 7nm according to Nvidia themselves: https://developer.nvidia.com/blog/nvidia-ampere-architecture...

Give it 10 years and a lot of engineering time, maybe half a trillion USD, and you'll get the equivalent in the mainland US. Until then, it's more convenient to use US military resources to protect Taiwan from the PRC.

This is called a royalty and is a pretty common business arrangement when licensing e.g. a patent for your product or some stock footage for your movie.

>Why are these kind of licenses even allowed.

They're not in (some parts of) Europe.


This sounds dubious, isn't it easier/more reliable to just block parts of the hardware using fuses?

To differentiate consumer and enterprise products, Intel disables ECC RAM support on the Core i5 and higher series and enables it on the Xeon E series (the i3 and below are sold into both markets, so ECC is not disabled there).

NVIDIA reduces the number of FP64 units (there are actually fewer on the die) and disables ECC RAM support on GeForce cards (except some Titans) so that they won't be used in datacenters. Previously this worked, because most scientific computing requires FP64 and reliability matters.

But now, in the deep learning era, FP64 isn't needed and a rare RAM error doesn't matter. So they must enforce the EULA to keep dirt-cheap GeForce cards from being used in datacenters for deep learning.


I don't think the latest i3s have ECC anymore, but they are releasing Xeon Ws which are the same chips as i7s with ECC enabled.

The price premium is like ~10%, which is fair.


I've caught up on the latest SKUs, thanks. Rebranding Xeon E to Xeon W doesn't look meaningful.

No, because they want students to have CUDA to learn with and home devs to have it so they develop tools for it. Then, when enterprises use it for profit, they have to pay for the development of the platform.

It's basically the Adobe route.

How? That’s like saying an Intel i7 can’t be used in a datacentre.

Meanwhile people say capitalism drives innovation.

How much is nvidia single handedly holding back innovation and new discoveries?


That viewpoint is adorably naïve. The Computer History Museum in Mountain View pretty clearly falls into three categories: government projects, genuinely innovative ideas from the private sector that failed, and the people who ripped off those ideas and made a killing. There is very little overlap between the last two categories.

> The Computer History Museum in Mountain View pretty clearly falls into three categories: government projects, genuinely innovative ideas from the private sector that failed, and the people who ripped off those ideas and made a killing. There is very little overlap between the last two categories.

This is ignoring two very important things.

The first is the number of government-funded projects that burned a mountain of cash and led to nothing. Unfortunately this is the rule rather than the exception in modern times because modern government has been captured by interest groups that divert money from where it's supposed to be going to themselves, which makes everything cost ten times more than it did when the government was funding the Apollo program and ARPANET. So you can't just say "government fund more stuff" without fixing that first.

And the second is that private companies inventing stuff only to see somebody else successfully commercialize it is still causing it to be invented. And the overlap between invention and commercial success can be very little and still cause people to do it, because the reward when it happens is very large.


> Meanwhile people say capitalism drives innovation.

The saying is really that free market competition drives innovation.

Obviously patents and copyrights are government-issued monopolies, and monopolies are by definition lacking in competition.

The theory is that by granting the monopolies we get more innovation. Often the theory is wrong.

Especially when we allow the company to leverage the monopoly on the thing they actually invented into a monopoly on ancillary things that are only used in combination with that class of product.


I mean, Nvidia is just cashing in on their innovation advantage. AMD's stack was worse forever, and OSS is their white flag/hope that someone else picks up the ball and creates an ecosystem to leverage their HW.

Your second sentence doesn't contradict the first sentence. Capitalism (or more precisely, IP law) can simultaneously drive innovation and hold back innovation. The more worthwhile question is whether capitalism drives more innovation overall, but that's hard to prove either way with a snarky one-liner.

"The more worthwhile question is whether capitalism drives more innovation overall, but that's hard to prove either way with a snarky one-liner."

Sent from my iPhone.


I think you proved the parent's point.

Maybe they can't open-source it because they don't own all the IP? That's very likely the case for Windows as well, for example, Microsoft just didn't licence all the code they used for releasing the source, and now you can't go back to 1000 different IP owners and negotiate anything reasonable.

Didn't they have to manually prepare a binary patch for a security issue in the Word Equation Editor, because they either lost or could not compile the source code anymore?

The original Equation Editor was licensed from a third party (Design Science), and it is possible that Microsoft never had the source code. Maybe the third party vendor lost the source code, but I think it is more likely that getting the third party vendor to fix the bug would have required negotiation with that vendor, and maybe Microsoft and that vendor were having trouble agreeing. (This is speculation on my part, I have no inside info.)

Or equally likely, that vendor no longer exists.

Likely, perhaps, but not true, since they still exist: https://www.dessci.com/en/

Microsoft probably started wondering internally why they don't just write their own equation editor, but didn't have time, so decided to do a crazy patch to this one and then start on a rewrite.


I think that Microsoft could open-source most of the Windows sources. Nobody would care too much about a few binary blobs, and I don't believe that they don't own the license for a significant portion of the OS.

This is exactly what happened with Solaris, and it turned out to be a rather massive problem because it meant that the community couldn't actually functionally produce a derivative distribution because the original released source code didn't actually represent the entire distribution. And a project that the community can't build will always be critically undermined by that flaw.

I think the momentum behind an open-source Windows would be immense, so the community would overcome any problems. I mean, people are making Windows distributions right now, with all sources closed, and they're doing amazing work if you ask me, with all those reverse-engineered knobs and whistles. Solaris is a niche OS after all, unlike Windows.

It was a solvable problem though, right? A bunch of different OpenSolaris distributions exist now.

I dunno if I'd call it "solved" or not... Illumos reduced the binary blobs, but to this day you have to download a bundle of them when you build it. The whole issue also added significant friction early on, which I personally think stunted the project's growth, but I'm not sure that's really knowable.

But yes, it did eventually get mitigated.


Well, at the very least, they could allow loading unsigned firmware, or allow their firmware to be redistributed, then.

This is the number one usability issue with nouveau: no firmware means no re-clocking, which means bad perf.


Here's a kernel engineer from Microsoft answering the question: What do you think about open sourcing windows and getting rid of the licensing code? [0]

[0] https://www.quora.com/What-do-you-think-about-open-sourcing-...


The last assertion made in that answer is unfounded and false - "Even if the entire OS code was made public tomorrow morning, it would take years before someone figures out how to build it, the complexity of the build system itself is mind boggling."

is contradicted by the fact that just recently a version of the Windows source (old, but still) was leaked, and people did manage to successfully build and boot the leaked Windows (XP and Server 2003 IIRC) code within days of that source becoming available.


that's what kept Solaris from being open sourced for years.

There's a talk by Bryan Cantrill about that.

They basically could not provide a fully functional OS because some marginal yet used-everywhere parts were licensed and proprietary (Bryan cites the internationalization library as an example).


second-hand information but apparently the reason they can't is that the driver contains code licensed from other companies and they can't open-source that.

https://www.reddit.com/r/hardware/comments/j217oo/gamers_nex...

while obviously not an official source, that isn't particularly surprising either.

as an additional relatively-well-known-but-possibly-incorrect bit of internet lore, right now their Linux driver is basically a wrapper around their Windows driver, so that explanation makes a lot of sense. They would have to go through and disentangle what parts they own and what needs to be stripped out / replaced for the linux version at an absolute minimum.


Why is it a shame? They had been providing quality Linux drivers for years, when nobody else cared about high-end graphics for Linux. Remember fglrx?

Now AMD is opensource? Great! However, it's still very far from perfect. You only have to take a look at the list of AMDGPU issues at freedesktop[1]... because being opensource is one thing, but working fine in a stable manner is another.

[1]https://gitlab.freedesktop.org/drm/amd/-/issues


It's a shame, because they hinder the progress of the Linux desktop and prevent Nouveau from reclocking properly.

And what about AMD bug tracker? It's open, so you can see the bugs. That's a plus, not a minus. Nvidia blob has all the bugs hidden somewhere, so you don't see them. It doesn't mean the blob doesn't have them.


> It's a shame, because they hinder the progress of the Linux desktop and prevent Nouveau from reclocking properly.

I think it's the opposite. Some years ago, Nvidia was your only chance to have accelerated graphics on Linux. ATI/AMD didn't care about it at all, and Intel cards were not for gaming. So Nvidia made it possible to do things in Linux when nobody else allowed you to... how's that hindering the progress of anything? Especially when nobody forces you to get an Nvidia card.

> And what about AMD bug tracker? It's open, so you can see the bugs. That's a plus, not a minus. Nvidia blob has all the bugs hidden somewhere, so you don't see them. It doesn't mean the blob doesn't have them.

Yes, I didn't say Nvidia was bug free. I just said AMD drivers for Linux are, at the moment, far from perfect, despite being opensource. I'd say, for newer cards, they're worse than Nvidia's. I value opensource, but if I have to choose between having an opensource desktop crashing twice a day, vs. the Nvidia blob, of course I'd go for the latter, as much as I'd love to have a fully opensource OS.


I was an Nvidia user for a long time due to the above, but today they aren't worth bothering with. AMD can be slower to fix bugs or have more of them on release day due to having smaller teams, but they are gradually ramping that up, and their current level of support already doesn't bother me, while they are providing a proper open source driver. Nvidia doesn't and has no plans to. I'd take AMD over Nvidia today any time.

Regarding slowing down progress, I was talking about modern desktop like Wayland compositors and so on. Nvidia was hindering it for years. And their attitude towards Nouveau is disgusting.


Well, I've been using Nvidia cards for years, and the last time I built a new computer (some months ago) I had a hard time deciding whether to stick with Nvidia or switch to AMD. Eventually I chose to stay with Nvidia, because getting a new AMD card (apart from the fact that there seem to be no budget AMD cards...) seemed like a lottery in terms of having a stable Linux desktop environment. My best bet would have been an older-generation card (RX570 or RX580), but those had little availability and were overpriced here.

As for slowing down Linux desktop progress, I think it's not Nvidia's fault: you could always get a card from another vendor, although the alternatives were not as good. Well, maybe those other vendors are to blame, and not Nvidia...


I'd say it's their fault, since due to the above situation, there were a lot of Linux users with Nvidia cards. Nvidia didn't care to upstream things and that caused them not to support Wayland and many other modern use cases for years.

Today it's less relevant, since usage of Nvidia on Linux is gradually dropping, so their damage to the progress is also diminishing thanks to that. Wayland compositors' developers can simply say - we don't support the blob and don't plan to and be done with it. In the past it was much harder, due to how many Linux users had Nvidia still while alternatives were way less viable.


Providing support for a marginal platform is also much harder when you’re <0.1 times the size.

Are they really that afraid of someone taking their Lucky Charms?

They're afraid of patent trolls.


That's my dream... or in a short time frame at least Wayland support.

Well, in their defense, I generally haven't run into any issues with their driver and it's also pretty easy to package/install as an enduser.

I have. Like keeping a copy of the last page you viewed in Chrome and overlaying it on the screen after exiting the browser.

That one is fun.


I encountered a really fun bug where a game on Linux crashed and a ghost of the game remained on the monitor, even after it was connected to a different computer and power cycled. It remained for days. Some really interesting cascading bugs there.

That's not a software bug, that's an image retention issue with your monitor. If it was that severe then it probably got into a state where it was sending severely invalid timings to the TFT LCD array, DC biasing it, which causes long-term retention and may even cause permanent damage if done for too long.

Software isn't supposed to be able to cause that. That's on your monitor.


I _think_ it was every other frame, and I am fairly sure that it was a software/driver issue. It had never happened before, it has not happened since, and it started immediately when starting the game and the symptoms got progressively worse and triggering the mild symptoms happened every time I started the game.

What I convinced myself of after a few minutes being sure I wasn't hallucinating was that the graphics driver was pushing out malformed data in some way or the other which was triggering bugs in the monitor hardware/firmware, which is easy to believe are plentiful. It would be an interesting project to try to track down and replicate the bug.


That reminds me of the spookiest bug I've ever encountered: once, when resuming a Dell laptop from suspend at work, it showed a Windows desktop. Said laptop had been running Linux exclusively for several months (but it had previously been used with Windows). Interacting with the laptop made the expected xscreensaver unlock screen appear, and everything worked normally afterwards. The only explanation I could come up with was that, somehow, a snapshot of the Windows screen had survived intact in a corner of the framebuffer which the Linux driver didn't touch, even after months of power off/on and suspend/resume cycles, and a bizarre driver glitch made it visible in that particular resume cycle.

That ghost Windows desktop didn't show the date and time on the taskbar, did it?

If I recall correctly, the Windows XP default was to show only the time, so the date probably wasn't visible.

I get this on MacBook Pros pretty often when connecting external displays. I think the nvidia drivers are universally bad.

I agree, if it weren't for the fact that they give zero shits about Wayland support. I'd be totally fine with them staying closed source as long as they kept up with the standards.

What are you talking about? Wayland is supported on DEs that wanted to support nVidia chips.

Meanwhile projects like Sway have a direct "Go to hell if you use nVidia, we won't let you run this code." It's bizarre that you blame nVidia for this.


It's possible to create a Wayland compositor that works with the proprietary nVidia drivers, but it requires using nVidia-specific interfaces because nVidia refuses to support the same interfaces for non-GLX, non-X11 hardware acceleration provided by every other Linux graphics driver.

It's hardly surprising that a lot of Wayland compositor developers would rather not put in a ton of extra effort to add a special case for one particular set of proprietary drivers, which they would then need to maintain and support separately from the common code path.


To tell the whole story, the NVIDIA argument is that they want cross platform standard interfaces (EGLStream), which would use the same code the Windows driver uses, but the Linux world is pushing for Linux-only interfaces (EGL)

That may be, but the fact remains that nVidia is pushing an interface that no other Linux drivers currently support, for reasons that really only benefit them. The Linux kernel team has never been particularly supportive of middleware layers designed to promote common drivers between Linux and other operating systems, and for good reason—it impedes the development of optimized, native Linux drivers.

The only way I see nVidia succeeding here is if they clearly demonstrate that EGLStreams is a technically superior alternative to GBM, not just for their own hardware but in general, and also contribute the changes needed to support EGLStreams for all the other graphics drivers currently using GBM so that applications don't need to deal with both systems. As long as the EGLStreams code path can only be exercised in combination with the proprietary nVidia drivers it will remain a second-class citizen and projects would be well-advised to avoid it. (Drew DeVault goes into more detail[1] in his objection to the inclusion of EGLStreams support in KWin, which I agree with 100%.)

Or they could just acknowledge that this is a Linux driver, not a Windows driver, and implement the standard Linux GBM interfaces like everyone else even if that means less shared code.

[1] https://lists.sr.ht/~sircmpwn/public-inbox/%3C20190220154143...


Typo, by Linux-only interface I meant GBM.

It’s “””supported”””. It’s apparently very buggy and very difficult to debug. Sway lets you run it after you set a flag making it very clear that if something is broken you may not report a bug since the developers are unable to reasonably fix it.

Most Linux distros will also prevent you from submitting a bug report for a kernel issue if you have a tainted kernel.


I have. I needed a prerelease kernel for a new driver but nvidia had not released a binary for the new kernel yet so I was unable to use anything but the open source nvidia driver.

You could just as easily blame the author of the driver that only works on a prerelease kernel.

I still haven't been able to get my laptop's 1660Ti to display anything other than glitch art, proprietary or nouveau.

Is it? This is the same driver that makes me shut down X and fucks up xorg.conf every time I need to update?

Not to mention that nvidia uses proprietary configuration options even in Xorg.conf. A multi-monitor configuration which works fine in nouveau (or really any other driver) refuses to work with nvidia, because if you use the binary driver you have to set bizarre metamode options to make it work.

wayland doesn't work at all so if you have a 4k monitor and a non-4k monitor and an nvidia card, you're basically just fucked, because you can't selectively scale things
