
Certigrad: bug-free machine learning on stochastic computation graphs - kg9000
https://github.com/dselsam/certigrad
======
taneq
I find the appeal of formally proven languages somewhat confusing. All you're
doing is moving the bugs from the source code to the specification.
(Alternately, you can think of source code as a 'specification' for a compiled
program. You still have to transfer the same amount of information to the
computer.)

~~~
dselsam
> All you're doing is moving the bugs from the source code to the
> specification.

The value of doing this can vary, but there are some cases in which the gain
is immense and indisputable. Suppose you are writing a compiler optimization.
Your source code could be arbitrarily complicated, but your specification is
simply "an optimized program always behaves the same as the original".
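The contrast can be made concrete in Lean. A rough sketch (all identifiers invented for illustration): however complicated the optimizer's implementation, its specification is a single line.

```lean
-- Hypothetical sketch: the complete specification of a compiler
-- optimization, independent of how the optimizer works internally.
constants (Prog Val : Type)
constant eval : Prog → Val        -- the semantics of programs
constant optimize : Prog → Prog   -- an arbitrarily complex optimization pass

-- The entire spec in one line: optimized programs behave the same.
def optimize_correct : Prop :=
∀ p : Prog, eval (optimize p) = eval p
```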

~~~
Scea91
> your specification is simply "an optimized program always behaves the same
> as the original"

I wouldn't say it is simple, since it is undecidable for general programs.

~~~
jroesch
Daniel's point is that the specification itself is simple; he is saying
nothing about the complexity of proving that your program matches the
specification.

The decidability of the specification isn't important in this context. It is
known that there is no procedure to build a proof of this specification for an
arbitrary language or optimization, but as humans we can prove that it holds
for a specific configuration of compilers and optimizations, and this has been
done many times, for example in CompCert and Alive.

------
gtani
This seems to be a newly hot topic, whether to defend against adversarial
inputs or to look for efficiencies. I've bookmarked a few relevant blogs (but
really need to read them more deeply):

[http://composition.al/blog/2017/05/31/proving-that-safety-
cr...](http://composition.al/blog/2017/05/31/proving-that-safety-critical-
neural-networks-do-what-theyre-supposed-to-where-we-are-where-were-going-
part-2-of-2/)

[http://www.cleverhans.io/security/privacy/ml/2017/06/14/veri...](http://www.cleverhans.io/security/privacy/ml/2017/06/14/verification.html)

[https://blog.foretellix.com/](https://blog.foretellix.com/)

and another preprint:
[https://arxiv.org/abs/1706.10268](https://arxiv.org/abs/1706.10268)

------
binarymax
Tangential to the Certigrad project: for those mystified by the 'Lean'
programming language, here is a good start:
[https://leanprover.github.io/programming_in_lean/programming...](https://leanprover.github.io/programming_in_lean/programming_in_lean.pdf)

~~~
taneq
PDF warning for those of us on mobile.

------
alimw
Previous discussion of the associated paper:
[https://news.ycombinator.com/item?id=14658832](https://news.ycombinator.com/item?id=14658832)

------
pkay
> Note: building Certigrad currently takes ~15 minutes and consumes ~7 GB of
> memory.

Why does it take so long and use so much memory?

~~~
dselsam
Author here.

Building Certigrad involves replaying all tactic scripts in the entire project
to reconstruct all of the formal proofs, and then checking each of the formal
proof objects in Lean's small trusted kernel. Proving (and checking) the main
correctness theorem for stochastic backpropagation is very fast. The vast
majority of the time and memory is spent verifying that a specific machine
learning model (AEVB) satisfies all the preconditions for backprop. This
involves proving several technical conditions, e.g. that various large terms
are (uniformly) integrable. We have not experimented much with simplification
strategies, and there is probably a lot of room for improvement in bringing
these numbers down. It would also be good to provide an option to build the
system without reconstructing the proofs; checking the proofs is analogous to
running the entire test suite, and most users do not do this for every tool
they build.

~~~
xgk
Could this problem have been (partly) avoided by going for an LCF-based prover
such as HOL or Isabelle/HOL, rather than a system based on the Curry-Howard
correspondence?

~~~
dselsam
I do not even know how I would have built Certigrad in Isabelle/HOL in the
first place. In my first attempt to build Certigrad, I used a non-dependent
type for Tensors (T : Type), and I accumulated so much technical debt that I
eventually gave up and started again with a dependent type for tensors (T :
Shape -> Type). This was my only major design error in the project, and
development went smoothly once I made this change.
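The difference between the two designs can be sketched in Lean (identifiers here are invented for illustration; this is not Certigrad's actual code):

```lean
-- First attempt: a single non-dependent type of tensors. The type system
-- knows nothing about shapes, so shape bookkeeping leaks into every lemma.
constant T₀ : Type

-- Second attempt: tensors indexed by their shape (T : Shape → Type),
-- the design Certigrad settled on. Shape errors become type errors.
def shape : Type := list ℕ
constant T : shape → Type

-- For example, addition can now only be stated for equally-shaped tensors:
constant add : Π {s : shape}, T s → T s → T s
```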

~~~
xgk
ITTs like Lean are more expressive as logics than HOL, but I doubt that
Certigrad needs even a fraction of HOL's power. So anything you express as a
type in Lean, you should be able to express as a theorem in HOL. Presumably
that is inconvenient (= technical debt), but why?

~~~
jojo3000
Tensors are a good example: in a proof you might want to do induction over the
dimensions of a tensor. This means your type of tensors needs to contain all
tensors of all dimensions. But working with the type of all tensors is not so
nice anymore: many algebraic properties do not hold; they only hold for
tensors of a specific dimension. An example is the Tensor library in the
Deep_Learning AFP entry.

Now, in Isabelle most structures are encoded as type classes, but when your
type is no longer in the type class, you suddenly need to prove a lot of
theorems yourself, and the automation does not work as nicely as with type
classes. Generally, in HOL provers you want your type to have nice algebraic
properties over the entire type; if that is not the case, proofs get much more
convoluted. Isabelle's simplifier supports this situation via conditional
rewrite rules, but it still does not work as nicely as using type inference to
handle these cases.

In dependently typed theorem provers it isn't a problem to prove everything
about tensors with a fixed dimension. When a proof requires induction on the
dimension itself, that kind of shape argument can always be constructed within
the proof itself.
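A rough Lean sketch of both points (names invented; laws are stated per fixed shape, and induction happens on the shape index inside a proof):

```lean
-- With shape-indexed tensors, algebraic laws are stated for each fixed
-- shape s, where they actually hold:
constant T : list ℕ → Type
constant add : Π {s : list ℕ}, T s → T s → T s
axiom add_comm : ∀ {s : list ℕ} (x y : T s), add x y = add y x

-- A proof that needs induction over the dimensions simply performs
-- induction on the shape index:
example (P : list ℕ → Prop)
  (h₀ : P []) (h₁ : ∀ n s, P s → P (n :: s)) :
  ∀ s, P s :=
λ s, list.rec h₀ h₁ s
```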

------
misterAxiom
> Machine learning systems are also difficult to engineer because it can
> require substantial expertise in mathematics (e.g. linear algebra,
> statistics, multivariate analysis, measure theory, differential geometry,
> topology) to even understand what a machine learning algorithm is supposed
> to do and why it is thought to do it correctly.

It's like Teen Talk Barbie.

