
Unwittingly obfuscating the fact that you're not doing AI - s_Hogg
https://breakitdownto.earth/2019/06/06/Obfuscating_a_lack_of_AI.html
======
ageitgey
This is great advice. Another related issue I see is when you engineer a new
feature that can't always be 100% accurate because the source data is spotty,
but you intuitively think the new feature should help the classifier anyway
when it is present. And if the new feature's feature importance in the trained
model turns out really high, you think you've done something great. But in the
end you've made a model that simply detects the presence of your new feature,
which you knew wasn't 100% accurate anyway because the source data it is
derived from is spotty. So you've accomplished precisely nothing.
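The trap described above is easy to reproduce in a toy sketch (all data and parameter choices here are invented for illustration; scikit-learn supplies the model). The "engineered feature" is just a copy of the label that is wrong 20% of the time, mimicking a feature derived from spotty source data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 5000

# The label we actually want to predict.
y = rng.integers(0, 2, size=n)

# A "spotty" engineered feature: a copy of the label that is wrong 20%
# of the time, standing in for a feature derived from incomplete data.
flip = rng.random(n) < 0.2
leaky = np.where(flip, 1 - y, y)

# Genuinely uninformative features for comparison.
noise = rng.normal(size=(n, 3))
X = np.column_stack([leaky, noise])

clf = RandomForestClassifier(
    n_estimators=100, max_depth=2, max_features=None, random_state=0
).fit(X, y)

# The engineered feature dominates the importance ranking...
print(clf.feature_importances_)

# ...but the model is just echoing that feature: its predictions agree
# with the raw feature values almost everywhere, so accuracy is capped
# at the feature's own ~80% reliability.
agreement = (clf.predict(X) == leaky).mean()
print(agreement)
```

The high feature importance here reports nothing beyond "the model found the feature you planted".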

~~~
s_Hogg
OP Here, glad you liked it!

The thing you're talking about definitely happens heaps as well, because of a
fundamental mental blind spot we have. I'd definitely love to hear if you've
got any more stories along these lines. The psychology of what makes a
successful machine learning project really interests me, and I don't mean in
terms of platitudes about openness and transparency.

I'm really tempted to write another post about specifically the sort of thing
you talk about in your example - narrative fallacies in machine learning.
Basically because we operate in the unknown we tend to want to string the
evidence we have together in a nice appealing way.

------
_bxg1
There was an article on here a few months back that said something like, "The
majority of today's applications of AI could be just as well - if not better -
served by a simple heuristic"

~~~
ksaj
I thought it was "database." Or maybe I'm projecting. A lot of "AI" project
sales blurbs I've seen suggest they are really just databases with a pretty
search function. A number of Knowledge Bases fit this example.

------
privong
I think this is minor, but I noticed the example doesn't use error bars for
the customer numbers. The customer counts in the product/churn categories are
counting statistics, so have an associated Poisson uncertainty. My guess is
that considering the uncertainties when doing the likelihood parameter
estimation won't fully obviate this issue, but I wonder how much it would
help. I'm also not sure if real-world implementations commonly consider such
uncertainties on their measured metrics.
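The Poisson point above is cheap to quantify: for a pure counting statistic, each cell carries a sqrt(N) uncertainty, so the smaller cells are proportionally noisier. A sketch with invented counts:

```python
import numpy as np

# Hypothetical 2x2 churn table (rows: product A/B; columns: churned /
# stayed). The counts are invented for illustration.
counts = np.array([[120.0, 30.0],
                   [45.0, 200.0]])

# Poisson uncertainty on a raw count N is sqrt(N).
errors = np.sqrt(counts)

# Relative uncertainty per cell: small cells are proportionally noisier.
rel = errors / counts
print(rel)
```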

~~~
s_Hogg
Hi - the reason there are no error bars for the number of customers in that
2x2 table is that there is no associated uncertainty. Those numbers are what
you would get by summing over the raw per-customer dataset for this particular
problem. The problem we're looking at here is a binary churn/no-churn problem,
as opposed to one that models how many people churn out of a population.

That said, that absolute lack of uncertainty is itself the problem. The
Maximum Likelihood approach to this sort of modelling implicitly assumes that
the data you have is all there is to know about the problem you're working on
and so can very easily overfit on a weird artefact like that. If you want to
incorporate some kind of uncertainty in your estimates, then you either need
to augment your dataset (i.e. include some random examples, roughly speaking),
or estimate your model using the Bayesian approach which explicitly allows for
uncertainty relating to the data itself.

Hope this clarifies!
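The overfitting point above can be made concrete with a toy, perfectly separated dataset (values invented): plain maximum likelihood pushes the logistic coefficient towards infinity, while a small L2 penalty (regularisation) keeps it finite.

```python
import numpy as np

# A toy, perfectly separated dataset (invented): x >= 0 always churns,
# x < 0 never does -- the kind of artefact MLE will happily overfit.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def fit_logistic(x, y, l2=0.0, steps=20000, lr=0.1):
    """Gradient ascent on the (optionally L2-penalised) log-likelihood."""
    w = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-w * x))
        grad = np.sum((y - p) * x) - l2 * w
        w += lr * grad
    return w

# Plain MLE: under perfect separation the likelihood keeps improving as
# w grows, so the estimate just increases with the number of steps.
w_mle = fit_logistic(x, y)

# A small L2 penalty keeps the coefficient finite.
w_reg = fit_logistic(x, y, l2=1.0)

print(w_mle, w_reg)
```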

~~~
privong
> Those numbers are as you might find them in a dataset from summing with this
> particular problem. The problem we're looking at here is a binary churn/no
> churn problem, as opposed to one that looks at how many people churn out of
> a population.

But isn't that still a count per bin? Since it's a count it has a Poisson
uncertainty.

> If you want to incorporate some kind of uncertainty in your estimates, then
> you either need to augment your dataset (i.e. include some random examples,
> roughly speaking), or estimate your model using the Bayesian approach which
> explicitly allows for uncertainty relating to the data itself.

Or use a likelihood that considers uncertainties. That also allows one to
explicitly consider the uncertainties when maximizing the likelihood.

~~~
s_Hogg
> But isn't that still a count per bin? Since it's a count it has a Poisson
> uncertainty.

Yes, as presented there. But in a binary classification setting, that's not
how the data would be presented to the model. Instead you would have one row
per customer with a churn/no churn label for that customer along with values
for a number of independent variables you deem relevant. The reason I put it
in that 2x2 table like that is just to make the problem more apparent. If you
had potentially millions of customers (and therefore rows), the exact
separation problem would not be as blatantly obvious to the naked eye as it is
there - this is partly why I recommend using confusion matrices to check
whether this is happening.
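A sketch of that confusion-matrix check, with invented per-customer data: the "model" below simply predicts a single binary feature, and the near-diagonal matrix exposes a dependence that eyeballing a million raw rows would hide.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
n = 1000

# Hypothetical per-customer data: one binary feature per customer...
feature = rng.integers(0, 2, size=n)

# ...and a churn label that tracks the feature 95% of the time
# (a near-separation artefact).
churn = np.where(rng.random(n) < 0.95, feature, 1 - feature)

# A "model" that has learned nothing beyond echoing the feature.
pred = feature

# An almost perfectly diagonal confusion matrix is the tell-tale sign.
cm = confusion_matrix(churn, pred)
print(cm)
```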

> Or use a likelihood that considers uncertainties. That also allows one to
> explicitly consider the uncertainties when maximizing the likelihood.

Insofar as a set of parameters arrived at by means of MLE has associated
standard errors, yes there is some uncertainty involved. If I understand you
correctly, what you're talking about is modifying the likelihood to be flatter
so that it can't get caught in this one localised whirlpool as easily. That's
effectively regularisation - which I cover in the article. You could do it,
but it's more or less a band aid. Really the data itself is the problem. Did
you have a specific thing in mind when you were talking about likelihoods?

~~~
privong
> But in a binary classification setting, that's not how the data would be
> presented to the model. Instead you would have one row per customer with a
> churn/no churn label for that customer along with values for a number of
> independent variables you deem relevant. The reason I put it in that 2x2
> table like that is just to make the problem more apparent.

I see. Thanks for clarifying that.

> If I understand you correctly, what you're talking about is modifying the
> likelihood to be flatter so that it can't get caught in this one localised
> whirlpool as easily.

Effectively, yes. But by adding a term for the uncertainty on the
measurements, not an uncertainty on the fit parameters (though those exist as
well).

