
Deep Learning and Free Software - pilooch
https://lwn.net/SubscriberLink/760142/c328ef70b2d47794/
======
infinity0
Debian packager of Leela Zero here.

Most of the questions raised in the thread were basically irrelevant to Leela
Zero, which does everything "correctly" from the point of view of a strict
interpretation of free software:

- open freely licensed data set
- open freely licensed training code

The only issue relevant to user software freedom is that the results of the
training process can't easily be reproduced.

I was irritated by the thread because 90% of it was theorising about some
alternative situation in which the data set or the training code was not free.
That is not the situation with Leela Zero, so we can skip that discussion and
focus on what's _actually_ at issue, i.e. the reproduction and verification of
the training process.

~~~
GistNoesis
How do you deal with evolving software versions (and dataset versions, and the
build environment)? Do you have to retrain the whole set of weights to make
sure the training would result in exactly the same weights? Maybe some unit
tests or non-regression tests between versions, to check that there are no
breaking changes and that the results are fully deterministic?

~~~
infinity0
That's a very interesting question, especially since the training data is also
(partly) generated by the software in a self-feedback loop. For the case of
Leela Zero, most of it has been generated by Leela Zero itself based on its
own algorithm, but some of the data has been supplied from the data set
generated by the Facebook ELF Go engine. (They used it as a shortcut to catch
up to the level of ELF, and then surpass it.)

In order to reproduce the current best-weights, then, one would have to record
exactly which versions of Leela Zero and ELF were used to generate which
subsets of data, and which subsets were used to create further subsets of the
data. I don't think anyone has kept that information around, so I'd guess the
current best-weights will never actually be reproducible.

In future, one can imagine other software that keeps track of this information
and is therefore actually able to reproduce a particular resulting set of
weights.
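
For illustration, here is a minimal sketch (in Python) of the kind of
provenance record such software could keep. The structure and field names are
purely hypothetical, not anything Leela Zero actually records today:

    # Hypothetical sketch: the kind of provenance record a training pipeline
    # could keep so that a resulting set of weights stays reproducible.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class DataSubset:
        sha256: str             # hash of the archived self-play games
        generator: str          # e.g. "leela-zero" or "elf-opengo"
        generator_version: str  # exact git commit of the engine that made it
        parent_weights: str     # hash of the weights that engine was running

    @dataclass
    class TrainingRun:
        training_code_version: str  # git commit of the training scripts
        hyperparameters: dict       # learning rate, batch size, schedule, ...
        inputs: List[DataSubset] = field(default_factory=list)
        output_weights_sha256: str = ""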

However, let's step back a bit. At a high level, nobody actually cares whether
the results are fully deterministic; they only care that the results are a
faithful representation of what the source code does. This is true both for
software determinism and for weights-model determinism. Determinism is a
(relatively) easy property which, once achieved, allows us to verify that the
results (binary software, trained weights) don't contain backdoors or other
unpleasantness that isn't visible in the source code. But the latter is what
we "actually" care about.

If we can achieve the latter property without achieving determinism, then we
are also mostly satisfied. That would involve being able to examine the model
directly and see what it does, and see that it doesn't contain backdoors or
other things. I can't even begin to imagine how to achieve this; it is a hard
problem, and a solution to it would also answer the criticism that these AI
weights/models are opaque and don't really contribute much to human knowledge.

Determinism is still useful for other purposes though. If you know exactly how
something was produced, you have much greater control and understanding of how
to tweak it, which might actually help us with the aforementioned goal of
deeply-understanding these weights from a human perspective.

------
JanisL
It's good to see some discussion about this. I'm currently maintaining an
open-source automated speech transcription library
([https://github.com/persephone-tools/persephone](https://github.com/persephone-tools/persephone))
that's being used in some research, and we are extremely concerned about
enabling reproducible research. Because the system is neural-net based
(TensorFlow), we have gone to some effort beyond just open-sourcing the code
to make sure that people can rerun experiments. Reading this article and the
comments makes me realise that there might be sources of nondeterminism or
other issues I'm not aware of. What issues should we watch out for to make
sure that people can replicate results that were trained earlier? Are there
any resources that discuss how people have gone about doing this? Any advice
would be much appreciated.

~~~
nlowell
I think distributing the weights will give you guaranteed reproducibility. If
your users intend to retrain the network, another idea would be to set
ballpark performance expectations on a validation set. So you could tell them,
"we got 90% accuracy on this dataset; if you retrain and get below 80%, you've
probably made a mistake somewhere." The scary thing to me would be very small
test cases, where different trained neural nets may end up having a lot of
variance because there is barely more than noise to learn.
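
As a rough sketch of that "ballpark expectation" idea, a retraining script
could compare validation accuracy against the published reference figure
rather than expecting bit-identical weights. The evaluate_on() helper below is
hypothetical:

    # Sketch: flag a retrained model whose validation accuracy falls far
    # below the accuracy the maintainers report for the same dataset.
    REFERENCE_ACCURACY = 0.90   # figure published by the maintainers
    TOLERANCE = 0.10            # more than 10 points below is suspicious

    def check_retrained_model(model, validation_set):
        accuracy = evaluate_on(model, validation_set)  # hypothetical helper
        if accuracy < REFERENCE_ACCURACY - TOLERANCE:
            raise RuntimeError(
                "Validation accuracy %.1f%% is far below the reported %.1f%%;"
                " the retraining setup probably differs."
                % (100 * accuracy, 100 * REFERENCE_ACCURACY))
        return accuracy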

~~~
JanisL
Distributing the weights seems to be the safest method, something I'll have to
look into a bit more. It's a bit disturbing that training with the same data
and the same parameters could lead to different weights and therefore
different accuracies, but it's better to know that this is the case than not.
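
For what it's worth, a common partial mitigation (assuming the TensorFlow 1.x
API of the time) is to pin every random seed the training run touches; some
GPU/cuDNN kernels remain nondeterministic even then, so this narrows the
run-to-run variance rather than removing it:

    # Partial mitigation, assuming TensorFlow 1.x: fix the seeds used by the
    # Python, NumPy and TensorFlow random number generators before training.
    import random

    import numpy as np
    import tensorflow as tf

    SEED = 42
    random.seed(SEED)         # Python-level shuffling, augmentation choices
    np.random.seed(SEED)      # NumPy-based preprocessing / initialisation
    tf.set_random_seed(SEED)  # graph-level seed for TF weight initialisers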

------
jsty
'In fact, they are probably not redistributable unless all the training data
is supplied, since the GPL's definition of "source code" is the "preferred
form for modification". For a pretrained neural network that is the training
data'.

To me, this is the crux of the matter. If the view is taken that software
distributed under a "free-software"-compatible license is non-free without the
ability to obtain all training sets for any models, there is going to be huge
difficulty in incorporating ML into free software in many cases. Since many
datasets can't be redistributed freely (e.g. for licensing, legal, or cost
reasons), that's an enormous advantage for non-free offerings.

A possible route around this might be 'community curated' datasets, where
contributors freely license their data in the same way as we do code. It'd be
interesting to see whether they come with an analog of the AGPL, i.e. models
trained on this dataset must be released to the user as source. (This might
already exist?)

~~~
sanxiyn
In the specific case of Leela Zero, the entire training dataset, all 3
gigabytes of it, is available for download at
[https://leela.online-go.com/zero/](https://leela.online-go.com/zero/)

~~~
NhanH
The above link actually only has the raw SGF files (game records). The actual
training data (the output of the MCTS search) is here:
[https://leela.online-go.com/training/](https://leela.online-go.com/training/)

------
xvilka
NVIDIA has been a major obstacle to FOSS for decades already, and they are
keeping up that spirit even now. Mainline TensorFlow, for example, still
doesn't support OpenCL.

~~~
stochastic_monk
And OpenCL was recently deprecated by OSX, wasn’t it?

Edit: yes. Discussion on hn:
[https://news.ycombinator.com/item?id=17231593](https://news.ycombinator.com/item?id=17231593)

~~~
dragandj
So what? High-end GPUs are used in servers and desktops, which in this space
almost exclusively run Linux. The really relevant thing here is hardware
(Nvidia vs AMD) and software: Nvidia drivers + CUDA (+ Nvidia's OpenCL) vs AMD
drivers + OpenCL.

So far Nvidia does excellent technical work with their hardware, their
out-of-the-box CUDA toolkit, and support in every major deep learning library.
The drawback is that they keep everything closed and charge a huge premium.
AMD has great hardware at good prices, but the software side is non-existent
and the Linux support goes from bad to worse.

Nvidia is the one having absolute control here and they choose to squeeze the
market because they can.

~~~
stochastic_monk
And Nvidia isn’t the only company to do so. Intel had a similar monopoly and
squeezed it for all it was worth, while providing the best available
scientific computing libraries. I can’t blame them, but I would certainly
prefer it if there were a way to generically dispatch optimal code for
heterogeneous hardware.

~~~
gnufx
The advantages of MKL (or does that mean something else?) are over-sold.
Indeed, I understand the relevant primitives in this context were driven by
Pabst's free implementations in libxsmm. I don't know whether "dispatch
optimal code for heterogeneous hardware" means chose the SIMD instructions or
offload appropriately, but what's missing in whichever case it is?

Intel's free software releases seem to me to count rather in their favour
compared with other vendors.

~~~
stochastic_monk
I was speaking about different GPUs. I’d like a framework which generates the
fastest code for a given task on whatever GPU you have.

MKL’s FFT isn’t a huge improvement on FFTW with AVX-512, but its BLAS is
currently ~3x as fast as OpenBLAS. And before these open-source projects
caught up, it was by far the best.

And I agree, they’ve been better about it lately. I was comparing Intel then
and Nvidia now.

~~~
gnufx
I don't know a lot about GPU compilation/offload, but I'd normally expect to
dispatch to library kernels the way linear algebra frameworks do, given that
we can't even compile GEMM effectively on CPUs. Sufficiently smart tools are
always welcome, though.

For a free AVX-512 BLAS, use the current release of BLIS. OpenBLAS recently
gained SKX (but not KNL) GEMM support, but I don't know how good it is, as I
don't have the hardware.

------
stared
People not sharing weights (or sharing them only under non-open licenses) is
an issue. Other things are not an issue (and from the tone of this post it is
clear that the author is repeating misheard things).

First, you can load weights into your network regardless of whether you use a
CPU or a GPU. And if needed, people can write GPU code for other
architectures.

Second, inference (prediction) is fast. Sure, it may not be fast enough for
real-time applications on a CPU (like self-driving cars), but for detecting a
single object it can be fast (see e.g.
[https://transcranial.github.io/keras-js/#/resnet50](https://transcranial.github.io/keras-js/#/resnet50)).

Third, using things like TensorFlow.js you get GPU acceleration with any GPU
card, not only nVidia. It is not nearly as fast, but still faster than Python
+ CPU. There are real-time demos such as
[https://experiments.withgoogle.com/collection/ai/move-
mirror...](https://experiments.withgoogle.com/collection/ai/move-
mirror/view/mirror).

Side note: I just started [https://inbrowser.ai/](https://inbrowser.ai/) for
tutorials and open-source templates for fully front-end AI.

------
black_puppydog
Fundamentally, the large-scale collection of (often user-generated) data to
train models puts more and more power into the hands of those doing the
collecting.

Since giving away this data is usually neither legally possible nor desirable,
nor in the interest of the dataset owner, there is a trade-off between wanting
to learn from data and preserving the privacy of the people generating, or
described by, the data.

That is not to say that science is not trying; see, for example, this paper:

[https://dl.acm.org/citation.cfm?id=2813687](https://dl.acm.org/citation.cfm?id=2813687)

------
tree_of_item
People keep saying it's "not feasible" to do deep learning on a CPU. Is that
actually true? I'm thinking of papers like this one[0] from Uber, which claims
that CPU training is very much feasible, in this case with a cluster of 10
machines, each with ~72 CPUs. There's a blog post about someone recreating the
paper here[1].

Sure, you can't do it on ONE CPU, but the point is to have a cluster of CPUs.
It's not the case that you're forced to use nVidia's proprietary stuff in
order to do deep learning.

[0]: [https://arxiv.org/abs/1712.06567](https://arxiv.org/abs/1712.06567) [1]:
[https://towardsdatascience.com/paper-repro-deep-
neuroevoluti...](https://towardsdatascience.com/paper-repro-deep-
neuroevolution-756871e00a66)

~~~
blt
It's definitely feasible and widely used for reinforcement learning on low
dimensional systems, where the neural networks are small and the simulator is
more expensive than backprop. On other hand, deep Q learning from pixels on
Atari is practically impossible without GPUs.

~~~
tree_of_item
Huh? I just showed you reinforcement learning from pixels on Atari with CPU...

------
jxub
So... deep learning is just creating black-box, platform-specific behaviours
that can't be shared and thus threaten the OSS model.

Is the era of computers + internet = freedom and knowledge-sharing over?

Perhaps I'm over-dramatising, but this is not something to be okay with.

------
sanxiyn
In the specific case of Leela Zero, there is no NVIDIA lock-in whatsoever.
Leela Zero uses OpenCL, so people with AMD GPUs can use it. Deep learning
research code can require NVIDIA; end-user game software mostly can't.

------
protomikron
This is definitely an issue, and I am glad there is some emerging discussion.
It's true that in practice you can run SGD on a CPU, but it is really not
feasible, as your training time might blow up by a factor of 50, and it is not
unusual to train a network for a week on Nvidia GPUs (with their proprietary
drivers and frameworks like cuDNN).

At this point in time you just can't do some of the DL work without
proprietary Nvidia tech.

------
gaius
I kinda think this is a non-issue. If you are working at the level of Keras,
the back-end is pluggable. You can use a fancy GPU relying on a binary blob,
you can use PlaidML to drive any random GPU, or you can fall back to the CPU
as a last resort. The Keras code is all the same; you just set the backend you
want in an environment variable.
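
As a minimal sketch of that (assuming PlaidML is installed; the backend name
below is the one the PlaidML project documents), the model code itself does
not change when the backend does:

    # Select the Keras backend before keras is imported; swap the value for
    # "tensorflow" or "theano" to use a different backend.
    import os
    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

    import keras
    from keras.layers import Dense
    from keras.models import Sequential

    # The model definition is backend-agnostic.
    model = Sequential([Dense(10, activation="softmax", input_shape=(784,))])
    model.compile(optimizer="sgd", loss="categorical_crossentropy")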

