
Netgpu and the hazards of proprietary kernel modules - rbanffy
https://lwn.net/Articles/827596/
======
microcow
It seems reasonable to reject the patch for technical and legal reasons, but
it's pretty disappointing to see toxic comments from prominent kernel
developers like:

> "Seriously? If you only even considered this is something reasonable to do
> you should not be anywhere near Linux kernel development. Just go away!"

[https://lwn.net/ml/netdev/20200727073509.GB3917@lst.de/](https://lwn.net/ml/netdev/20200727073509.GB3917@lst.de/)
[https://lwn.net/ml/netdev/20200728064706.GA21377@lst.de/](https://lwn.net/ml/netdev/20200728064706.GA21377@lst.de/)

~~~
kabdib
> "Seriously? If you only even considered this is something reasonable to do
> you should not be anywhere near Linux kernel development. Just go away!"

A response like this would get you a serious talkin' to at just about any
place I've worked. Possibly fired, if it was consistent behavior.

I don't care if you're Donald Knuth, Dennis Ritchie and Edsger Dijkstra all
rolled into one: act like a jerk and you're off my Christmas list.

~~~
stefan_
So what do you get for trying to upstream an entirely useless patchset that
exists purely as a GPL workaround to enable the ever-proprietary NVIDIA
driver? Do you realize that's spitting into the face of the very people trying
to drive open-source software across the entire stack?

Greg KH called this trolling, and he isn't one to mindlessly throw around
insults.

------
viraptor
It wasn't clear from the article, and I can't really figure out the answer,
but I'm curious whether this will impact any existing popular modules. For
example, is the NVIDIA driver affected by the symbol-import limitation as
proposed?

------
ajb
Ignoring the legal spat, it's interesting that someone is trying to direct
packets at the GPU. In a way, GPUs and high-end routers have similar demands:
execute the same code in parallel on huge numbers of data units - and I
understand that there are also similarities in the microarchitecture. So it
wouldn't surprise me if you could implement a high-end router using a GPU. All
those floating-point ALUs would be sitting idle, of course, so it might be an
expensive solution; although the economies of scale in GPUs might be larger
than routers, so it might work out. Devil is in the detail, of course.

------
znpy
The GPL-shim loophole could actually be useful. Having a GPL shim means that
one could insert tracing code into such a shim, helping reverse engineering
with the goal of developing free drivers.

------
f00zz
There was a module that used CUDA to accelerate operations in the Linux kernel
a few years ago (but the author knew better than to try to mainline it). I
think it was this one:
[https://github.com/wbsun/kgpu/](https://github.com/wbsun/kgpu/)

~~~
formerly_proven
> Treating the GPU as a computing co-processor.

I'm starting to get mildly disappointed about this paradigm. GPUs have the
hardware for very powerful dynamic parallelism, yet only CUDA supports that
(since no GPU driver of note supports more than OpenCL 1.x). I'd much rather
see the GPU-CPU pair used as a pair of nodes that are linked with a high-speed
interconnect (PCIe), yet APIs apart from CUDA never moved much beyond the
compute model of OpenCL 1.2.

------
trasz
I wonder if it’s somewhat similar to Linux developers refusing to support TCP
Offload Engine functionality, inventing various excuses easily proven false by
experiences with TOE support for other systems.

------
sillysaurusx
What's the best way to accomplish network card -> GPU streaming right now?
Netgpu sounds handy. Is there an alternative, or was netgpu the only way?

If network -> GPU is fast enough, you might be able to stream your training
data from a separate box. That'd be useful for massive datasets, e.g. training
GANs on terabytes of photos.

I'm also interested in the reverse: GPU memory -> network, without passing
through host memory. I've wanted to make real-time (>20 FPS) ML
visualizations, streamed directly from the training box down to the user.

------
shmerl
Nvidia should stop fooling around and support Nouveau. Their dinosaur blob
driver approach is not something anyone needs.

~~~
Arnavion
Unfortunately it's something that all their customers who use it today need,
and are quite satisfied with using. As long as their customers have no
problems using the prop driver, nvidia has no reason to stop pushing it.

~~~
shmerl
_> Unfortunately it's something that all their customers who use it today
need, and are quite satisfied with using._

Doesn't look like it:
[https://www.gamingonlinux.com/index.php?module=statistics&vi...](https://www.gamingonlinux.com/index.php?module=statistics&view=trends)

Nvidia usage among gamers on Linux, at least, is gradually dropping. The blob
is causing all kinds of problems (especially in Wayland session use cases),
while not offering any advantages over the open drivers from AMD.

And with Intel coming out with gaming GPUs soon too, I'd expect this trend to
accelerate even further.

~~~
Arnavion
Their gamer customers are mostly on Windows. Their Linux customers use their
cards for ML and CUDA.

~~~
shmerl
But we were discussing Linux users. The Windows market is irrelevant to that.

Industrial Linux users are not using their gaming cards.

~~~
Arnavion
>But we were discussing Linux users. Windows market is irrelevant for that.

You brought up Linux gamers, so I pointed out that the Linux gamer market is
minuscule because most gamers are on Windows.

>Industrial Linux users are not using their gaming cards.

Okay? The thread never said anything about gaming cards. Again, you're the one
who brought up gamers in the first place.

~~~
shmerl
_> The Linux gamer market is minuscule_

Not when it comes to the Linux driver. Whether Nvidia cares is another
question.

_> The thread never said anything about gaming cards_

And Linux gamers are the main driving force behind open drivers. Industrial
users are too slow-moving to have an effect on that.

------
smartmic
Good news, seems like the GNU Linux-libre team has enough to do anyhow …
([https://lists.gnu.org/archive/html/info-gnu/2020-08/msg00001.html](https://lists.gnu.org/archive/html/info-gnu/2020-08/msg00001.html))

------
vermilingua
I must be misunderstanding, but isn't that symbol-access limitation entirely
useless? If someone has the technical know-how to develop a kernel patch set,
they certainly have the capability to do so on a kernel they've recompiled
without this limitation.

~~~
gnu8
Sure, anyone can do that and it is entirely fair to use the software that way
privately. The barrier is that they can’t distribute the resulting modules
because no one else will have a kernel that has been modified this way.

~~~
vermilingua
Right of course, yours and the sibling comment explained it perfectly. Cheers.

------
xvilka
NVIDIA is the source of most kernel problems related to proprietary drivers.
Pure evil. They manage to poison the ecosystem without even doing anything.

------
gridlockd
Make BSD great again.

~~~
akerro
it's always been great

~~~
zokula
Lol no.

~~~
non-entity
Do you have anything more substantive to say?

------
microcolonel
NVIDIA is even undermining kernel development that it had no hand in; is this
a first?

~~~
wrkronmiller
> The sad part is that, by all appearances, the goal of this work was not to
> add functionality for NVIDIA GPUs in particular. Lemon does not seem to be
> an NVIDIA employee; the patches included a Facebook email address.

~~~
pabs3
It's interesting that the person posting the objection uses an @nvidia.com
address.

~~~
striking
Which person?

~~~
Macha
Jason Gunthorpe

------
IshKebab
> The sad part is that, by all appearances, the goal of this work was not to
> add functionality for NVIDIA GPUs in particular.

I don't see how that can be true given that it is designed to speed up ML, and
only nVidia graphics cards are used for ML. Approximately nobody uses Intel or
AMD.

Maybe they meant it wasn't written by an nVidia employee? But it's clearly
intended to be used with nVidia GPUs in particular.

~~~
deft
It uses NVIDIA APIs, it's made for NVIDIA tasks (CUDA), and it was only ever
designed to work on NVIDIA cards. Apparently being a Facebook employee means
you can't have pro-NVIDIA anti-anything-else intentions... even when that's
clearly the case.

~~~
mroche
_Apparently being a Facebook employee means you can't have pro-NVIDIA
anti-anything-else intentions... even when that's clearly the case._

That’s being a tad disingenuous. Developing a solution to an in-house use case
while ignoring alternative platforms that you don’t use does not make one
“anti-anything else”. For a business of Facebook’s scale (or any scale for
that matter), would you or your manager approve of spending time on such a
task? Unless it was simple, there was plenty of time, or there was known
potential of switching product lines, I would venture to guess: no.

However, trying to upstream such work without considering where it's going
(the very last patch review of the 21 is where the problem shows up) isn't
the brightest idea.

------
userbinator
I feel like all this needless bureaucracy could've been avoided completely if
Linux hadn't turned into a target for everyone to (try to) put code in. The
GPL allows modification and redistribution, after all.

 _With this patch applied, any module that imports symbols from a proprietary
module is itself marked as being proprietary, denying it access to GPL-only
symbols._

That's starting to sound like DRM.
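The quoted rule can be sketched as a toy model. This is purely illustrative: all the names here (`Module`, `link`, the license strings) are hypothetical stand-ins, and the real logic lives in the kernel's module loader, not in anything resembling this Python.

```python
# Toy model of the proposed rule: a module that imports any symbol from
# a proprietary module is itself marked proprietary, and proprietary
# modules are denied access to GPL-only symbols.

class Module:
    def __init__(self, name, license, gpl_only_exports=(), exports=()):
        self.name = name
        self.license = license                         # "GPL" or "proprietary"
        self.gpl_only_exports = set(gpl_only_exports)  # like EXPORT_SYMBOL_GPL
        self.exports = set(exports)                    # like EXPORT_SYMBOL

    @property
    def proprietary(self):
        return self.license != "GPL"

def link(importer, exporter, symbol):
    """Resolve `symbol` from `exporter`, applying the gating rule."""
    if symbol not in exporter.exports | exporter.gpl_only_exports:
        raise LookupError(f"unknown symbol {symbol}")
    if importer.proprietary and symbol in exporter.gpl_only_exports:
        raise PermissionError(f"{importer.name} may not use GPL-only {symbol}")
    if exporter.proprietary:
        # The new step: importing from a proprietary module taints the importer.
        importer.license = "proprietary"

core = Module("vmlinux", "GPL", gpl_only_exports={"gpl_sym"}, exports={"plain_sym"})
blob = Module("nvidia", "proprietary", exports={"blob_sym"})
shim = Module("shim", "GPL")

link(shim, blob, "blob_sym")     # shim is now treated as proprietary...
try:
    link(shim, core, "gpl_sym")  # ...so this fails
except PermissionError as e:
    print(e)                     # shim may not use GPL-only gpl_sym
```

Of course, nothing technically stops you from deleting the taint step and rebuilding your own kernel; the rule only bites for modules distributed against stock kernels.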

~~~
josephcsible
> That's starting to sound like DRM.

It's close, but the key difference is it doesn't actually restrict the user,
since you're both legally allowed and technically able to build and use a
kernel without that code.

~~~
trasz
Not being able to use functionality because it never got implemented due to
some licensing shenanigans does not restrict the user?

