
Conformal Prediction: Machine Learning with Confidence Intervals - scottlocklin
https://scottlocklin.wordpress.com/2016/12/05/predicting-with-confidence-the-best-machine-learning-idea-you-never-heard-of/
======
arcanus
Interesting post!

A quibble:

> If you’re a Bayesian, or use a model with confidence intervals baked in, you
> may be in pretty good shape. But let’s face it, Bayesian techniques assume
> your prior is correct, and that new points are drawn from your prior. If
> your prior is wrong, so are your confidence intervals, and you have no way
> of knowing this.

I don't agree. Bayesian models must be validated, like any model. While no
validation process is exhaustive, a predictive validation process is designed
to directly test the applicability of the prior to a set of results.

Furthermore, there are priors (Jeffreys, for example) that are
entropy-maximizing from an information standpoint. These non-informative
priors are designed to be used where a possibly misspecified informative prior
would otherwise be introduced. It is not uncommon for reviewers to ask for
results reproduced with Jeffreys priors to ascertain whether this is indeed
the case.

~~~
scottlocklin
I said it in an awkward way. The important thing is that your CP gizmo will
tell you when something has gone haywire while you're using it. Your Bayes
doodad might not (other than by noticing errors). In particular, some very
important new point may have a bad prediction associated with it, with bad
error bars: CP can be a big help here. There's an example of this on pages
102-106 of the original book, I think. Another good example is applying this
idea to HMMs, which you can read about in "Hidden Markov Models with
Confidence" by Giovanni Cherubin and Ilia Nouretdinov.

I'm not throwing poo at Bayesian models, which I think are sadly neglected
these days, but with CP ideas you can get more useful results. While I think
CP is useful for practitioners now, the most exciting applications are in
stuff like active learning, and developing novel techniques associated with
this basket of ideas.

Really, my blog kind of missed the mark. I need to do more with examples.

~~~
stdbrouw
> Really, my blog kind of missed the mark.

It'd be sad if people only published stuff once they considered it perfect. I
enjoyed the post, thanks for writing.

~~~
arcanus
> I enjoyed the post, thanks for writing.

Same. My quibble was just part of the discussion, not intended to imply it was
a poor article.

------
shoo
I'm glad to see conformal prediction getting a bit more exposure. The idea is
quite interesting and reasonable (in terms of the assumptions you make), and
in principle the transductive conformal approach can be used to turn any
standard supervised learning algorithm into one that produces some kind of
conformal confidence intervals. But, as Scott writes

> Practically speaking, this kind of transductive prediction is
> computationally prohibitive
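
To make the cost concrete, here is a minimal sketch of full (transductive) conformal prediction for classification, using the classic 1-NN nonconformity score from the Vovk et al. book. The function names and the 1-D toy data are my own for illustration; the point is that every candidate label for every test point requires rescoring the entire augmented training set, which is what makes the approach expensive.

```python
import math

def nonconformity(idx, pts, labels):
    """1-NN nonconformity score: distance to the nearest same-label
    point divided by distance to the nearest other-label point."""
    d_same = d_other = math.inf
    for j, (p, y) in enumerate(zip(pts, labels)):
        if j == idx:
            continue
        d = abs(p - pts[idx])
        if y == labels[idx]:
            d_same = min(d_same, d)
        else:
            d_other = min(d_other, d)
    return d_same / d_other if d_other > 0 else math.inf

def conformal_pvalues(train_x, train_y, x_new, label_set):
    """Full (transductive) CP: for each candidate label, augment the
    training set with (x_new, label), rescore everything, and return
    the p-value (fraction of scores at least as large as the new
    point's score). A prediction region at level alpha keeps every
    label whose p-value exceeds alpha."""
    pvals = {}
    for y in label_set:
        pts = train_x + [x_new]
        labels = train_y + [y]
        scores = [nonconformity(i, pts, labels) for i in range(len(pts))]
        a_new = scores[-1]
        pvals[y] = sum(1 for a in scores if a >= a_new) / len(scores)
    return pvals
```

On a toy set with two well-separated clusters, a point near the label-0 cluster gets a large p-value for 0 and a small one for 1, so the alpha-level prediction set typically contains only label 0.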

I tried to use this some years ago - with the simpler ridge regression
conformal approach described in the book [1] - when fitting empirical models
to experimental data (a low number of samples, a very high cost of obtaining
more, a high-dimensional space), where it seemed desirable to produce some
kind of reasonable estimate of the uncertainty of the model fit without making
a bunch of assumptions about the underlying relationship.

> There are a number of ad hoc ways of generating confidence intervals using
> resampling methods and generating a distribution of predictions.

In practice I ended up doing something ad hoc -- just bootstrap-sample and fit
a bunch of decision trees, then back some kind of crude confidence interval
out of the distribution of resulting predictions. I think I ended up
preferring this over a conformal regularised linear model approach because the
trees seemed to be better able to model the actual relationship than whatever
simple family of linear models we were using (probably just degree-2
polynomials in the raw input values; there wasn't really enough data for the
number of dimensions to support doing much else).
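
The bootstrap-percentile idea is easy to sketch. This is my own toy version, with a k-NN regressor standing in for the decision trees (and made-up function names): resample the training data with replacement, refit, and take empirical percentiles of the resulting predictions as a crude interval.

```python
import random
import statistics

def knn_predict(train, x, k=3):
    """Toy base learner: k-nearest-neighbour regression on (x, y) pairs.
    A decision tree (or anything else) could be dropped in here."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.mean(y for _, y in nearest)

def bootstrap_interval(train, x, n_boot=200, alpha=0.1, seed=0):
    """Ad-hoc interval: refit the base learner on bootstrap resamples
    of the training data, then take the alpha/2 and 1 - alpha/2
    empirical percentiles of the prediction distribution."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        sample = [rng.choice(train) for _ in train]
        preds.append(knn_predict(sample, x))
    preds.sort()
    lo = preds[int((alpha / 2) * n_boot)]
    hi = preds[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Unlike conformal intervals, nothing here guarantees the stated coverage level; the interval only reflects the variability of the fit under resampling, which is the "ad hoc" part.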

I've never read up on the non-transductive approaches to conformal prediction,
so it'll be interesting to go through some of the references from this post.
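
For what it's worth, the standard non-transductive variant is split (inductive) conformal regression, which avoids the refitting cost entirely: fit once on a proper training set, then size the interval from residuals on a held-out calibration set. A minimal sketch, with a trivial least-squares line as the base model (my choice for brevity; any regressor works):

```python
import math

def fit_line(xs, ys):
    """Trivial base model for illustration: least-squares fit of
    y = a*x + b. Any fitted regressor could be used instead."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

def split_conformal(train, calib, x_new, alpha=0.1):
    """Split (inductive) conformal regression: fit once on the proper
    training set, rank absolute residuals on the calibration set, and
    use the conservative (1 - alpha) residual quantile as the
    interval half-width."""
    model = fit_line([x for x, _ in train], [y for _, y in train])
    resid = sorted(abs(y - model(x)) for x, y in calib)
    n = len(resid)
    # conservative quantile index: ceil((1 - alpha) * (n + 1)) - 1
    k = min(n - 1, math.ceil((1 - alpha) * (n + 1)) - 1)
    q = resid[k]
    pred = model(x_new)
    return pred - q, pred + q
```

The price for the one-shot fit is data splitting (the calibration set is not used for fitting), but under exchangeability the interval still carries the finite-sample coverage guarantee, and on noisy data its width is just the (1 - alpha) calibration residual quantile.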

[1] Algorithmic learning in a random world - Vovk, Gammerman, Shafer
[http://www.alrw.net/](http://www.alrw.net/)

------
volodia
That's an interesting discussion! Having read the Vovk papers, this blog post
definitely presents things much more clearly. The original papers often don't
adhere to the standard definition/lemma/proof style of mathematical
exposition, which makes them really hard to follow.

It's also an interesting coincidence that this story is on the front page
today.
I'm giving a talk tomorrow at AAAI on some work that extends this theory. We
show how to do uncertainty estimation (e.g. calibrated probabilities for ML
classifiers) under fully adversarial assumptions (input data can be chosen by
an adversary). I'll do a shameless plug and post the paper here, in case
people are interested in this general topic:

[https://arxiv.org/abs/1607.03594](https://arxiv.org/abs/1607.03594)

~~~
scottlocklin
Some folks on Reddit pointed this out as a clearer presentation:

[https://people.dsv.su.se/~henke/DSWS/johansson.pdf](https://people.dsv.su.se/~henke/DSWS/johansson.pdf)

