
Developing Bug-Free Machine Learning Systems with Formal Mathematics - pkay
https://arxiv.org/abs/1706.08605
======
rhaps0dy
Finally, an excuse to learn to use theorem provers and include them in my
research as an auxiliary tool!

Note that they claim performance similar to TensorFlow on CPU only. But a
direct translation of this into Python TensorFlow code could run on a GPU,
and would probably be easy to write. It wouldn't be proven correct, but it
would still be much more likely to be correct than code you've just written
from scratch for your research.

Also, as scribu states, this doesn't allow you to prove end-to-end goals of
the system like "classify images with 95% accuracy", nor does it save you
from insufficient or inaccurate data.

(Edit: added last paragraph, finished reading paper)

~~~
kg9000
> Note that they claim performance similar to TensorFlow on CPU only. But a
> direct translation of this into Python TensorFlow code could run on a GPU,
> and would probably be easy to write. It wouldn't be proven correct, but it
> would still be much more likely to be correct than code you've just
> written from scratch for your research.

They are wrapping unverified C++ code (Eigen) for the primitive kernels
anyway, such as gemm, so AFAIK it could be extended to launch kernels on GPUs
without any modification to the part in Lean that they proved correct.

Since most ML users are happy to trust TensorFlow, one could also wrap
(unverified) TensorFlow and then use their approach to prove properties
about real TensorFlow programs, e.g. that various model transformations are
sound.

~~~
dselsam
Author here.

> They are wrapping unverified C++ code (Eigen) for the primitive kernels
> anyway, such as gemm, so AFAIK it could be extended to launch kernels on
> GPUs without any modification to the part in Lean that they proved correct.

That is correct. Our Lean development assumes that we have primitive kernels
that we trust, and it is otherwise agnostic about how they are implemented. It
would be straightforward to have kernels run on GPUs.

> Since most ML users are happy to trust TensorFlow, one could also wrap
> (unverified) TensorFlow and then use their approach to prove properties
> about real TensorFlow programs, e.g. that various model transformations
> are sound.

This is a great idea. There are a bunch of useful properties TensorFlow users
could prove about their programs. For example, a user could prove that their
TensorFlow program will never produce a NaN, or that a mutation operation is
"safe" and won't corrupt the gradients (see
[https://groups.google.com/a/tensorflow.org/forum/#!topic/dis...](https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/kdhnp35QJfs)
for an example of unsafe usage). Moreover, since verifying specific properties
of specific programs is much more routine than verifying algorithms, most of
the proofs could be push-button.
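
To make the mutation hazard concrete, here is a minimal illustrative sketch
(mine, not from the linked thread) in TF 1.x graph mode. The assign and the
gradient computation have no specified relative order, so backprop may
silently observe the mutated value:

    # Illustrative sketch of "unsafe" in-place mutation (TF 1.x graph mode).
    # The assign races with the gradient computation: TensorFlow does not
    # specify their relative order, so the gradient may be computed on
    # either the old or the mutated value of w.
    import tensorflow as tf

    w = tf.Variable(2.0)
    loss = w * w                              # d(loss)/dw = 2*w
    clip = tf.assign(w, tf.minimum(w, 1.0))   # in-place mutation of w
    grad = tf.gradients(loss, w)[0]

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # May print 4.0 (grad before assign) or 2.0 (grad after assign),
        # depending on how the runtime schedules the two ops.
        print(sess.run([grad, clip]))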

To TensorFlow users reading this: what are some common errors you make that
are annoying to track down, that we may be able to catch (or prove the absence
of) using formal methods? If there is enough demand perhaps we will work on
this next.

------
scribu
Sounds cool, but how do you "state a formal theorem defining what it means for
the system to be correct" when the task is classifying objects in an image,
for example?

~~~
rhaps0dy
You can't, not yet. However, you can specify the following:

>The main purpose of the system is to take a program describing a stochastic
computation graph and to run a randomized algorithm (stochastic
backpropagation) that, in expectation, provably generates unbiased samples of
the gradients of the loss function with respect to the parameters.

which is pretty good, and can save you from errors both in manipulating the
math of your gradient estimator to reduce its variance and in implementing
it.
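
For concreteness, the standard instance of that guarantee is the
score-function (likelihood-ratio) identity, which holds under the usual
regularity conditions that let you exchange the derivative and the
expectation:

    \nabla_\theta \, \mathbb{E}_{x \sim p_\theta}[f(x)]
        = \mathbb{E}_{x \sim p_\theta}\left[ f(x) \,
          \nabla_\theta \log p_\theta(x) \right]

Each sampled value of f(x) \nabla_\theta \log p_\theta(x) is therefore an
unbiased estimate of the true gradient, which is exactly the sense of
"unbiased samples" in the quote above.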

------
kmicklas
This is the final state of tech (until general AI): fully proven
infrastructure, AI-powered user-level features.

------
alonchu
> Stochastic computation graphs extend the computation graphs that underlie
> systems like TensorFlow ... by allowing nodes to represent random variables
> ...

Doesn't TensorFlow support random variables too? They even compare their
performance to TensorFlow on a model with random variables.

~~~
dselsam
> Doesn't TensorFlow support random variables too?

The paper doesn't explain this well, but although you can put random variables
in TensorFlow programs, you cannot backpropagate through them. With stochastic
computation graphs, you can differentiate the expected loss as long as the
probability density/mass functions of the random variables are differentiable.
One of the main benefits of stochastic computation graphs is that you can
train with arbitrary, non-differentiable simulators as long as they only
depend on the parameters indirectly through random variables.
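
As a tiny illustration of that last point, here is a NumPy sketch (mine,
not code from the paper): the "simulator" f below is a non-differentiable
step function, yet we can still follow unbiased gradient samples of the
expected loss, because the gradient flows only through the log-density of
the random variable:

    # Maximize E_{x ~ Bernoulli(theta)}[f(x)], f non-differentiable.
    # Gradients never touch f itself; they flow through the log-density:
    #   gradient sample = f(x) * d/dtheta log p(x; theta)   (unbiased)
    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):
        return 1.0 if x > 0.5 else 0.0           # black-box step function

    theta = 0.2                                  # Bernoulli parameter
    lr = 0.05
    for _ in range(2000):
        x = float(rng.random() < theta)          # x ~ Bernoulli(theta)
        score = x / theta - (1 - x) / (1 - theta)  # d/dtheta log p(x; theta)
        theta += lr * f(x) * score               # ascend E[f(x)]
        theta = float(np.clip(theta, 1e-3, 1 - 1e-3))

    print(theta)  # approaches 1, since E[f] = theta is maximized there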

> They even compare their performance to TensorFlow on a model with random
> variables.

The naive variational autoencoder of Figure 2 cannot be trained in TensorFlow
since the random variable depends on the parameters. However, this particular
model can be reparameterized so that it doesn't depend on them without
affecting the expected loss (Section 4.6), and the resulting model can be
trained in TensorFlow.
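
For reference, the standard Gaussian reparameterization (stated generically
here, not quoted from the paper) moves the parameters out of the sampling
step:

    z \sim \mathcal{N}(\mu_\theta, \sigma_\theta^2)
        \quad\Longleftrightarrow\quad
        z = \mu_\theta + \sigma_\theta \, \epsilon,
        \qquad \epsilon \sim \mathcal{N}(0, 1)

The noise \epsilon no longer depends on \theta, so ordinary backpropagation
through \mu_\theta and \sigma_\theta suffices.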

