Developing Bug-Free Machine Learning Systems with Formal Mathematics (arxiv.org)
111 points by pkay 141 days ago | 9 comments



Finally, an excuse to learn to use theorem provers and include them in my research as an auxiliary tool!

Note that they claim performance similar to TensorFlow on CPU only. But a direct translation of this into Python TensorFlow code could run on a GPU, and is probably easy to write. It wouldn't be proven, but it would be much more likely to be correct than code written from scratch for your research.

Also as scribu states, this doesn't allow you to prove final goals of the system like "classify images with 95% accuracy", nor does it save you from insufficient or inaccurate data.

(Edit: added last paragraph, finished reading paper)


Author here.

> Also as scribu states, this doesn't allow you to prove final goals of the system like "classify images with 95% accuracy", nor does it save you from insufficient or inaccurate data.

We focused on implementation-based notions of correctness, which are both much easier to state and much more useful when actually developing software, but in principle downstream notions of correctness could be stated and proved as well. For example, for some models and under some assumptions about the data, you could formally verify that the accuracy of your trained model will probably be high on an unseen test set. However, the limiting factor is that nobody knows that much yet about the assumptions under which most useful models (such as neural networks) are guaranteed to perform well.
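As an illustration of the sort of downstream statement that could in principle be formalized (my example, not one from the paper): the classical Hoeffding-plus-union-bound generalization guarantee for a finite hypothesis class H, given m i.i.d. training examples and confidence level 1 - delta:

```latex
% With probability at least 1 - \delta over the draw of the m training
% examples, every hypothesis h in the finite class \mathcal{H} satisfies:
\Pr\left[\,\forall h \in \mathcal{H}:\;
  \mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h)
  \;+\; \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2m}}\,\right]
\;\ge\; 1 - \delta
```

Statements of this shape are well understood for simple hypothesis classes; the gap the parent describes is that for models like neural networks we don't yet know realistic assumptions under which comparable bounds are meaningful.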


> Note that they claim performance similar to TensorFlow on CPU only. But a direct translation of this into Python TensorFlow code could run on a GPU, and is probably easy to write. It wouldn't be proven, but it would be much more likely to be correct than code written from scratch for your research.

They are wrapping unverified C++ code (Eigen) for the primitive kernels anyway, such as gemm, so AFAIK it could be extended to launch kernels on GPUs without any modification to the part in Lean that they proved correct.

Since most ML users are happy to trust TensorFlow, one could also wrap (unverified) TensorFlow and then use their approach to prove properties about real TensorFlow programs, e.g. that various model-transformations are sound.


Author here.

> They are wrapping unverified C++ code (Eigen) for the primitive kernels anyway, such as gemm, so AFAIK it could be extended to launch kernels on GPUs without any modification to the part in Lean that they proved correct.

That is correct. Our Lean development assumes that we have primitive kernels that we trust, and it is otherwise agnostic about how they are implemented. It would be straightforward to have kernels run on GPUs.

> Since most ML users are happy to trust TensorFlow, one could also wrap (unverified) TensorFlow and then use their approach to prove properties about real TensorFlow programs, e.g. that various model-transformations are sound.

This is a great idea. There are a bunch of useful properties TensorFlow users could prove about their programs. For example, a user could prove that their TensorFlow program will never produce a NaN, or that a mutation operation is "safe" and won't corrupt the gradients (see https://groups.google.com/a/tensorflow.org/forum/#!topic/dis... for an example of unsafe usage). Moreover, since verifying specific properties of specific programs is much more routine than verifying algorithms, most of the proofs could be push-button.
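To make the NaN property concrete, here is a minimal NumPy sketch (my illustration, not TensorFlow code and not from the paper) of the classic failure mode: a naively computed softmax overflows and produces NaNs that would silently poison gradients, while the log-sum-exp trick avoids it. "This program never produces a NaN" is exactly the kind of property one could hope to prove:

```python
import numpy as np

logits = np.array([1000.0, 0.0, -1000.0])

# Naive softmax: exp(1000) overflows to inf, so the normalization
# gives inf/inf = nan, which would silently corrupt any gradients.
naive = np.exp(logits) / np.exp(logits).sum()
print(np.isnan(naive).any())   # the nan is already there

# Numerically stable log-softmax via the log-sum-exp trick:
# subtracting the max first means exp never overflows.
def log_softmax(x):
    shifted = x - x.max()
    return shifted - np.log(np.exp(shifted).sum())

stable = log_softmax(logits)
print(np.isfinite(stable).all())
```

A formal proof would essentially certify that every intermediate value stays in a range where the floating-point operations are well-behaved.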

To TensorFlow users reading this: what are some common errors you make that are annoying to track down, that we may be able to catch (or prove the absence of) using formal methods? If there is enough demand perhaps we will work on this next.


Sounds cool, but how do you "state a formal theorem defining what it means for the system to be correct" when the task is classifying objects in an image, for example?


You can't, not yet. However, you can specify the following:

>The main purpose of the system is to take a program describing a stochastic computation graph and to run a randomized algorithm (stochastic backpropagation) that, in expectation, provably generates unbiased samples of the gradients of the loss function with respect to the parameters.

which is pretty good, and can save you from errors both in manipulating the math of your gradient to reduce its variance and in implementing it.
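To see what "unbiased samples of the gradients" means concretely, here is a minimal NumPy sketch (my own example, not the paper's code) of the score-function estimator: for a loss f(x) = x^2 with x ~ N(mu, 1), the true gradient of E[f(x)] = mu^2 + 1 with respect to mu is 2*mu, and the Monte Carlo estimate recovers it:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.5
n = 2_000_000

# Sample x ~ N(mu, 1); loss is f(x) = x^2, so E[f(x)] = mu^2 + 1
# and the true gradient with respect to mu is 2*mu.
x = rng.normal(mu, 1.0, size=n)

# Score-function estimator:
#   grad ~= f(x) * d/dmu log N(x; mu, 1) = x^2 * (x - mu)
grad_samples = x**2 * (x - mu)
print(grad_samples.mean())   # close to 2*mu = 3.0; unbiased but noisy
```

The estimator is unbiased but has high variance, which is why the variance-reduction manipulations the parent mentions matter so much in practice, and why it's valuable to prove they preserve the expectation.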


This is the final state of tech (until general AI): fully proven infrastructure, with AI-powered user-level features.


> Stochastic computation graphs extend the computation graphs that underly systems like TensorFlow ... by allowing nodes to represent random variables ...

Doesn't TensorFlow support random variables too? They even compare their performance to TensorFlow on a model with random variables.


> Doesn't TensorFlow support random variables too?

The paper doesn't explain this well, but although you can put random variables in TensorFlow programs, you cannot backpropagate through them. With stochastic computation graphs, you can differentiate the expected loss as long as the probability density/mass functions of the random variables are differentiable. One of the main benefits of stochastic computation graphs is that you can train with arbitrary, non-differentiable simulators as long as they only depend on the parameters indirectly through random variables.

> They even compare their performance to TensorFlow on a model with random variables.

The naive variational autoencoder of Figure 2 cannot be trained in TensorFlow since the random variable depends on the parameters. However, this particular model can be reparameterized so that it doesn't depend on them without affecting the expected loss (Section 4.6) and the resulting model can be trained in TensorFlow.
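Here is a minimal NumPy sketch of the reparameterization being described (my own toy example, not the paper's model): instead of sampling z ~ N(mu, sigma^2) directly, which makes the sampling depend on the parameters, sample eps ~ N(0, 1) and set z = mu + sigma*eps. The expected loss is unchanged, and gradients can now flow through z pathwise:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
n = 1_000_000

# Reparameterization: z ~ N(mu, sigma^2) rewritten as
# z = mu + sigma * eps with eps ~ N(0, 1), so the random draw
# itself no longer depends on the parameters.
eps = rng.standard_normal(n)
z = mu + sigma * eps

# Toy loss f(z) = z^2, so E[f(z)] = mu^2 + sigma^2 and the true
# gradients are d/dmu = 2*mu and d/dsigma = 2*sigma.
grad_mu = (2 * z).mean()           # pathwise: df/dz * dz/dmu = 2z * 1
grad_sigma = (2 * z * eps).mean()  # pathwise: df/dz * dz/dsigma = 2z * eps
print(grad_mu, grad_sigma)         # close to 2.0 and 1.0
```

This is the same trick that makes the variational autoencoder trainable in TensorFlow: after reparameterization, backpropagation only ever sees deterministic, differentiable operations.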



