
Why are we using black box models in AI when we don’t need to? (2019) - Hooke
https://hdsr.mitpress.mit.edu/pub/f9kuryi8
======
crazygringo
I think it's important to note that _human_ pattern recognition is basically
black-box as well.

We can't "explain" how we recognize a particular person's face, for example.

Robust pattern recognition based on hundreds of factors is just inherently black
box.

Even when people make decisions, they're generally a weighted combination of a
bunch of "feelings". We attempt to explain a kind of simple, logical
rationalization afterwards, but psychologists have shown that this is often a
bunch of post-hoc fiction.

Black box doesn't necessarily mean bad. I think the relevant question is: how
do we ensure machine learning is trained and verified in ways that don't
encode bias, and only used with "intended" inputs, so a model isn't being
trusted to make predictions on things it wasn't trained for?

And also: when do we want to decide to use alternatives like decision trees
instead -- because despite being less accurate, they can have designers who
can be held legally accountable and liable, which can be more important in
certain situations?

~~~
daoxid
> I think it's important to note that human pattern recognition is basically
> black-box as well.

Agreed. But as you note, even though humans are basically black boxes we can
ask them questions in order to find out how they came to a particular
conclusion. (How reliable the answers to these questions are is of course a
different matter.)

So maybe we don't necessarily need fully interpretable models but simply a way
to ask black-box models specific questions about their state, e.g., "To what
degree does a person's age influence the output?".

~~~
knzhou
> But as you note, even though humans are basically black boxes we can ask
> them questions in order to find out how they came to a particular
> conclusion.

No, you can't. If somebody treats you with suspicion, it's because of a
combination of their news intake, their culture, local events, what their
friends and family would think, the way you present yourself, and many other
factors. You can always ask somebody to state their reason as a simple "if-
then" statement, and they can make one up on the spot, but it'll be so
oversimplified that it's basically a lie.

> So maybe we don't necessarily need fully interpretable models but simply a
> way to ask black-box models specific questions about their state, e.g., "To
> what degree does a person's age influence the output?".

You can already do that. Just change that number in the input and see how the
output changes. To that extent, even the most black box AI model is more
transparent than human decision making.
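
To make that concrete, here's a minimal sketch of that kind of probe, assuming a scikit-learn-style classifier with a `predict_proba` method (the model, the example `x`, and the age column index are placeholders):

```python
import numpy as np

def probe_feature(model, x, feature_idx, deltas):
    """Perturb one input feature and report how the model's output shifts.

    model: any fitted classifier exposing predict_proba (scikit-learn style)
    x: a single example as a 1-D numpy array
    feature_idx: index of the feature to probe (e.g., the age column)
    deltas: list of perturbations to add to that feature
    """
    base = model.predict_proba(x.reshape(1, -1))[0, 1]
    shifts = []
    for d in deltas:
        x_mod = x.copy()
        x_mod[feature_idx] += d
        p = model.predict_proba(x_mod.reshape(1, -1))[0, 1]
        shifts.append((d, p - base))  # change in predicted probability
    return shifts

# e.g. probe_feature(model, x, feature_idx=AGE_COL, deltas=[-10, -5, 5, 10])
```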

~~~
daoxid
> You can always ask somebody to state their reason as a simple "if-then"
> statement, and they can make one up on the spot, but it'll be so
> oversimplified that it's basically a lie.

Well, I guess it depends on how self-aware a person is. I think the biggest
danger is trying to rationally explain your decision when in fact it was based
mostly on your feelings, in which case I agree that the explanation is
"basically a lie". One needs to be honest when something is not based on a
fact but on a feeling to prevent pointless discussions. (If I hold an opinion
based on a feeling then you cannot convince me that I am wrong by giving me
facts.)

> You can already do that. Just change that number in the input and see how
> the output changes.

Makes sense. But I guess transparent models would still be generally
preferable because you can fully understand how the output is produced,
whereas in black-box models you might have to ask quite a lot of questions to
get a feeling for it, but even then you can't be sure that you have a full
understanding of it.

------
mkl
They never got to the results of the competition and how the interpretable
model did!

From [https://www.fico.com/en/newsroom/fico-announces-winners-of-inaugural-xml-challenge](https://www.fico.com/en/newsroom/fico-announces-winners-of-inaugural-xml-challenge):
"The team representing Duke University, which included Chaofan
Chen, Kangcheng Lin, Cynthia Rudin, Yaron Shaposhnik, Sijia Wang and Tong
Wang, received the FICO Recognition Award acknowledging their submission for
going above and beyond expectations with a fully transparent global model and
a user-friendly dashboard to allow users to explore the global model and its
explanations. The Duke team took home $3,000."

Cynthia Rudin is one of the article's authors.

~~~
RSchaeffer
But how did their model compare against others? The article only mentions how
their interpretable model compared against their own ML attempts.

~~~
jph00
Their model didn't win. IBM's model won, based on actual metrics around useful
insights.

~~~
Mathnerd314
The IBM team got $5,000 and the second place/honorable mention NYU got $2,000.
So going by prize amounts, the Duke model was still pretty good.

IBM turned the model/paper into a toolkit:
[https://www.ibm.com/blogs/research/2019/08/ai-explainability-360/](https://www.ibm.com/blogs/research/2019/08/ai-explainability-360/)
Their model seems to be a variant of decision trees that
has a knob controlling how complicated the trees are.

And the evaluation was completely subjective, so there's not any meaning to
the Duke people losing besides that the judges didn't like them.

~~~
netcan
Reading "subjective" to mean "nonexistent" is a potentially big mistake.

Objectivity is more accurate, sure. The winner of an objective contest is
always objectively better against objective criteria. But, objective criteria
are generally narrow. This works well if one is either (a) seeking fundamental
principles like in physics or (b) the narrow objective criteria _are_ the
definite goal.

In this area, we don't exactly know how to define narrow, objective goals &
subsequent criteria. We can define goal _posts_, but not goals. These are
guesses at useful markers of success, useful to the larger goal of
useful/novel ai.

Subjective goals have their own (massive) problems, but since we can't
objectively define the goals of ai research... we need to fall back on human
subjectivity to define our subgoals.

~~~
dragonwriter
All objective criteria are chosen, directly or indirectly, based on subjective
criteria.

~~~
netcan
Also true. Subjectivity is unavoidable as long as we are relevant.. or so it
seems circa 2020

------
1e-9
The main advantage of a blackbox ML solution is shorter development time to a
useful performance level. Creating a transparent, explainable solution
typically takes more time, more work, and a higher level of expertise to get
to the same performance level. If the problem is complicated and the cost of a
mistake is low, then your best approach today is likely to be blackbox. If the
cost of a mistake is high, you should not even be considering a blackbox
approach (at least not without transparent safeguards). There is a large gray
area between these two extremes that requires good engineering judgement to
choose well.

I think we definitely need more R&D dedicated to creating easier-to-use,
lower-cost approaches to transparent, explainable ML. There is way too much
effort devoted to blackbox R&D today. Ultimately, transparent, explainable ML
should almost always beat blackbox ML due to a better ability to find and
resolve hidden problems, such as dataset biases, that may be holding back
performance.

~~~
autokad
> The main advantage of a blackbox ML solution is shorter development time to
> a useful performance level.

I think that's kinda true but kinda false. It's true that deep learning often
makes feature engineering moot. However, a lot of deep learning projects take
machine learning engineers and applied scientists along with a host of other
support engineers, plus hardware costs (I've seen people at work say: I only
used 16 GPU instances over a couple of weeks of training).

Meanwhile, I consider gradient boosting fairly interpretable, and it can get
pretty close results with a lot less tweaking and training time. If you want
to go fully non-black box, logistic regression with L1 penalization and only a
little bit of feature engineering often does really well, probably with a lot
less development time/cost compared to those high-cost PhD research scientists.
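
As a rough sketch of that last option, L1-penalized logistic regression in scikit-learn on synthetic data looks like this (purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=50, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The L1 penalty drives most coefficients to exactly zero, so the surviving
# features and the signs of their weights can be read directly off the model.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X_tr, y_tr)

print("test accuracy:", clf.score(X_te, y_te))
print("nonzero coefficients:", (clf.coef_[0] != 0).sum(), "of", X.shape[1])
```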

~~~
MiroF
gradient boosted trees are black box

~~~
autokad
I would disagree with that statement. There are many ways of getting at what
the model is doing: feature importances, SHAP, etc. It's not as clear as
logistic regression or a single decision tree, but it's not a black box either.
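
For example, something along these lines (assuming the `shap` package is installed; the data is synthetic and purely illustrative):

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Global view: built-in, impurity-based feature importances.
print(model.feature_importances_)

# Local view: SHAP values attribute each individual prediction to features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])
print(shap_values.shape)  # one attribution per (example, feature)
```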

------
owenshen24
My understanding is that interpretable models, especially for neural networks,
are very far away from state-of-the-art performance. Work that e.g. tries
to approximate neural nets with decision trees has yet to be applied to very
large models [1].

Even in computer vision, which is where I think they've been most successful,
the visualization techniques used seem more suggestive than explanatory [2].

[1] [http://www.shallowmind.co/jekyll/pixyll/2017/12/30/tree-regularization/](http://www.shallowmind.co/jekyll/pixyll/2017/12/30/tree-regularization/)
[2] [https://distill.pub/2019/activation-atlas/](https://distill.pub/2019/activation-atlas/)

~~~
throwawayjava
I think it's a real open question whether the interpretable models are
actually worse, or merely worse in competition/benchmark problem sets. The
more deep models I build, the more I'm convinced that behind every inscrutable
parameter hides a certain amount of overfitting, with maybe a few notable
exceptions. E.g., can you build a decision tree that's not obviously overfit
but is also susceptible to adversarial perturbations?

~~~
mlthoughts2018
It’s also a real open question whether any of the interpretable models are
actually interpretable, or even if they are in any well-defined sense more
interpretable than “black box” alternatives.

In practice the answer is a massive “no” so far. Some of the least
interpretable models I’ve had the misfortune to deal with in practice are
misspecified linear regression models, especially when non-linearities in the
true covariate relationships cause linear models to give wildly misleading
statistical significance outputs and classical model fitting leads to
estimating coefficients _of the wrong sign._
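
A toy illustration of how this happens: fit a straight line to a U-shaped relationship, and the sign of the fitted "effect" depends entirely on which part of the range you sampled.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def true_y(x):
    # The true relationship is U-shaped: y falls and then rises in x.
    return (x - 2.0) ** 2 + rng.normal(scale=0.1, size=x.shape)

# Fit the (misspecified) linear model on two different sampling windows.
for lo, hi in [(0.0, 1.5), (2.5, 4.0)]:
    x = rng.uniform(lo, hi, size=500)
    slope = LinearRegression().fit(x.reshape(-1, 1), true_y(x)).coef_[0]
    print(f"x sampled in [{lo}, {hi}]: fitted slope = {slope:+.2f}")

# The fitted coefficient flips sign with the sampling window, even though the
# data generating process never changed -- the linear form is simply wrong.
```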

Real interpretability is not a property of the mechanism of the model, but
rather consistent understanding of the data generating process.

Unfortunately, people like to conflate the mechanism of the model for some
notion of “explainability” because it’s politically convenient and susceptible
to arguments from authority (if you control the subjective standards of
“explainability”).

If your model does not adequately predict the data generating process, then
your model absolutely does not explain it or articulate its inner working.

~~~
throwawayjava
_> If your model does not adequately predict the data generating process, then
your model absolutely does not explain it or articulate its inner working._

That's a very dynamicist viewpoint. I don't necessarily disagree.

However, in what sense do the prototypical deep learning models predict the
data generating process?

I tend to agree that a lot of work with "interpretable" in the title is
horseshit and misses the forest for the trees.

------
sigmaprimus
I think the main reason is that creating a so called "Black Box" model is
faster and easier to some degree.

It's very similar to the argument kids make: "Why do I have to learn how to do
long division when I have a calculator?" or "Why do I need to show my work on
the exam if I got the right answers?"

I tend to agree with the article's premise that any model being used in
critical decision making should at the very least have a list of parameters
and the weight given to each, no matter how long and complicated such a list
might be.

I think eventually this will be the end result of most critical
implementations of machine learning applications as they make their way
through the courts, as I can't imagine a judge accepting the argument "The
machine made a mistake, we are not sure why or the reason why is proprietary,
but we are not responsible because the machine made the error".

~~~
sigmaprimus
Of course the scary thing is when decisions are being made that will affect a
person's life but will never make it to the courts. Like the example of
predicting loan defaults or parole releases, those decisions are made with
little to no explanation. As much as I hate regulations, I think this may be a
good place to legislate in order to protect those wrongly affected.

It makes me think about the mess the derivatives markets made in 2008

~~~
taneq
It’s not like your bank manager used to give you a phone call to explain in
depth why your loan was declined. Most of these situations were already
impenetrable from a human standpoint.

Humans are far more ‘black box’ than neural nets; we just happen to be able
to construct plausible explanations in parallel.

~~~
BoiledCabbage
No, they were impenetrable from the customer perspective, but either the human
making the decision or the software being used can answer. With a black box
model, nobody can answer.

~~~
taneq
From the post I responded to:

> the scary thing is when decisions are being made that will affect a persons
> life but will never make it to the courts. Like the example of predicting
> loan defaults or parole releases, those decisions are made with little to no
> explanation.

I was pointing out that it’s already like this. Whether it’s a black box ML
model or a capricious bank employee makes no difference if there’s no
transparency either way.

------
anfilt
I have pretty much always thought this when I read about ML or uses of it. The
process and end result are just math, so it has to be explainable in some way.
I feel like not enough effort is given to understanding the important variables
in ML models or how results are derived. When I see uses of it, it's more like
the following: "Oh, it works, that's good enough."

For instance, when a model starts being overfitted, one can say it's pretty
much converging to something akin to a LUT (lookup table). While overfitting
is generally undesirable, it still might be interesting, particularly if
someone could figure out how it is basically indexing the data. Perhaps you
could create a simpler hash function, or find some rules for creating useful
hash functions for that type of data.
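
As a small sketch of the "overfit model as lookup table" intuition, a fully grown decision tree on noisy synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)  # deliberately noisy labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unrestricted tree keeps splitting until each training point gets its own
# leaf -- effectively a lookup table keyed on the training examples.
tree = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_tr, y_tr)
print("train accuracy:", tree.score(X_tr, y_tr))  # ~1.0: it memorized the data
print("test accuracy: ", tree.score(X_te, y_te))  # noticeably lower
```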

~~~
psv1
The whole point is that your models should perform well on data that they
haven't seen before. You can't create a hash function for that. Let alone one
that works for 100s or 1000s of dimensions.

~~~
rubyn00bie
I'm confused by what you're saying... by data that it hasn't seen before, do you
mean new variables which need to be hypothesis tested? Or just a larger
sample?

I also don't see why you can't create a "function", hash or otherwise, that
works for 1000s of dimensions. The model itself, if it were perfect, would be an
equation... the number of dimensions really doesn't matter that much.

Sorry, I'm just real confused by your statement.

~~~
mkolodny
Say you're building a self-driving car. You need to take an image of what the
car sees, and figure out whether/where there is a pedestrian/car/bicycle/etc
in the image. So your input/variable is the image itself.

Almost every image the car sees is going to be different. I'd guess that's why
GP said that it doesn't make sense to use a hashing function - there's little
value in mapping inputs to results, because your input images are pretty much
always going to be different. So pretty much every time you look up an image
in your hashmap, you wouldn't find a match.

That's the point of ML models - find patterns in data so that when you see a
new example, you can predict what you're seeing based on what you've seen in
the past.

~~~
xelxebar
It seems like you might be deep in the ML rabbit hole. Zoom out a bit. A hash
function is "just a function." Every image labeling black box can be thought of
as a hash from images to label vectors.

OP's comment effectively asks whether there is another, more grokkable function
that maps/hashes inputs to the same labels.

Granted, that question boils down to "can we create human-understandable
models?" which is the whole point of this discussion.

It's a good question, though. If we had black-box-like spaghetti code
performing the same task, I predict that the comments here would be very
different.

~~~
mkolodny
Thanks for explaining that perspective. I was seeing things a little too
narrow-mindedly for sure.

Based on your phrasing of the issue, it seems like we could think of the
problem as: can we reduce the number of parameters in an ML model to the point
where humans can understand all of the parameters? That's related to an active
research area - minimizing model size. ML models can have billions of
parameters. It's infeasible for a human to evaluate all of them. Research
shows that (sometimes) you can reduce the number of parameters in a model by 97%
without hurting its accuracy [0]. But 3% of a billion parameters is still
way too many for people to evaluate. So I think the answer so far is no, we
can't create human-understandable models that perform as effectively as black
boxes.

[0]
[https://openreview.net/forum?id=rJl-b3RcF7](https://openreview.net/forum?id=rJl-b3RcF7)
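
[0]'s actual method is iterative pruning with retraining; the crudest version of the underlying idea, global magnitude pruning, looks something like this:

```python
import numpy as np

def magnitude_prune(weights, keep_fraction=0.03):
    """Zero out all but the largest-magnitude weights across all layers.

    weights: list of numpy arrays, one per layer
    keep_fraction: fraction of parameters to keep (0.03 keeps roughly 3%)
    """
    flat = np.concatenate([w.ravel() for w in weights])
    k = max(1, int(keep_fraction * flat.size))
    threshold = np.sort(np.abs(flat))[-k]  # magnitude of the k-th largest weight
    return [np.where(np.abs(w) >= threshold, w, 0.0) for w in weights]
```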

------
bitL
I am worried about the recent trend of "ethical AI", "interpretable models"
etc. IMO it attracts people that can't come up with SOTA advances on real
problems, and it's their "easier, vague target" to hit and finish their PhDs
while getting published in top journals. Those same people will likely at some
point call for a strict regulation of AI using their underwhelming models to
keep their advantage, faking results of their interpretable models, then
acting as arbiters and judges of the work of others, preventing future
advancements of the field.

~~~
throwawayjava
_> IMO it attracts people that can't come up with SOTA advances on real problems
and it's their "easier, vague target" to hit and finish their PhDs while
getting published in top journals._

I'm also pretty wary of interpretability/explainability research in AI. Work
on robustness and safety tends to be a bit better (those communities at least
mathematically characterize their goals and contributions, and propose
reasonable benchmarks).

But I'm also skeptical of a lot of modern deep learning research in general.

In particular, your critique goes both directions.

If I had a penny for every dissertation in the past few years that boiled down
to "I built an absurdly over-fit/wrongly-fit model in domain D and claimed it
beats SoTA in that domain. Unfortunately, I never took a course about D and
ignored or wildly misused that domain's competitions/benchmarks. No one in
that community took my amazing work seriously, so I submitted to
NeurIPS/AAAI/ICML/IJCAI/... instead. On the Nth resubmission I got some
reviewers who don't know anything about D but lose their minds over anything
with the word deep (conv, residual, variational, adversarial, ... depending on
the year) in the title. So, now I have a PhD in 'AI for D' but everyone doing
research in D rolls their eyes at my work."

 _> Those same people will likely at some point call for a strict regulation
of AI..._

The most effectual calls for regulation of the software industry will not come
from technologists. The call will come from politicians in the vein of, e.g.,
Josh Hawley or Elizabeth Warren. Those politicians have very specific goals
and motivations which do not align with those of researchers doing
interpretability/explainability research. If the tech industry is regulated,
it's extremely unlikely that those regulations will be based upon proposals
from STEM PhDs. At least in the USA.

 _> faking results of their interpretable models_

Jumping from "this work is probably not valuable" to "this entire research
community are a bunch of fraudsters" is a pretty big jump. Do you have any
evidence of this happening?

~~~
MiroF
> If I had a penny for every dissertation in the past few years that boiled
> down to...

This is very, very accurate. On the other hand, I oftentimes see field-
specific papers from field experts with little ML experience using very basic
and unnecessary ML techniques, which are then blown out of the water when
serious DL researchers give the problem a shot.

One field that comes to mind where I have really noticed this problem is
genomics.

------
lalaland1125
Interpretability is probably one of the most misunderstood topics in AI.
Interpretability is fundamentally not a math or statistics challenge, it's a
Human Computer Interaction (HCI) problem and needs to be studied in a similar
manner, with user studies. The fundemental goal of interpretability is to help
users achieve some concrete goal such as detecting bias, identifying bad
models, and debugging. This is similar how we can better design keyboards to
help users achieve faster typing speed. Unfortunately, most interpretability
studies don't actually bother testing if their methods actually work and I am
highly skeptical that things like "white box models" or salience maps or
attention weights or whatever latest methods can actually achieve the aims
that interpretability proponents claim they do. I would love to see a simple
randomized user study that would emperically show a concrete improvement in
something like bias detection or error detection.

~~~
anthony_doan
> Interpretability is fundamentally not a math or statistics challenge

No, I keep seeing AI/ML/DS people downplaying statistics.

Statistics interprets things. The majority of the models out there have a
one-to-one, predictor-to-response interpretation, holding all other predictors
constant (linear regression, logistic regression, ARIMA, ANOVA, etc.).
Statistical inference is a thing. Inference is interpreting. Descriptive
statistics is interpreting. Parsimony is a thing. Experimental design is a
thing. Degrees of freedom are a thing in statistics.
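
A minimal sketch of what that inferential reading looks like, using statsmodels on made-up data (the variable names here are just placeholders):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 70, n)
income = rng.normal(50, 15, n)
y = 0.3 * age - 0.1 * income + rng.normal(scale=5.0, size=n)

X = sm.add_constant(np.column_stack([age, income]))
fit = sm.OLS(y, X).fit()

# Each coefficient is the estimated change in y per unit change in that
# predictor, holding the other predictor constant, and it comes with a
# standard error, confidence interval, and p-value -- inference, not just
# prediction.
print(fit.summary(xname=["const", "age", "income"]))
```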

If you want interpretability, do statistics. One of its tenets is to quantify
and live with uncertainty, not to fit a curve and lots of coefficients just to
predict. Not just classification.

It's a reason why biostatistics or econometrics is a thing. Statistics.

Even the blog cited statistics papers, even though it barely mentions
statistical models. ~~And Rudin is a statistician and contributed a lot to
statistics.~~ Wrong person (I'm thinking of Rubin for causality and
missingness).

This is not a tribal fight between statistics and ML. This is pointing out that
ignoring statistics is a detriment to AI/ML/DS as a field.

I predict that from 2020 to 2030 statistics will be coming to AI/ML much more,
regardless of how much people downplay it.

~~Seeing Dr. Rudin is coming over.~~ I've seen other statisticians too. Dr.
Loh's work took decision trees and added ANOVA and chi-square tests to build
parsimonious decision trees.

------
dchichkov
I happened to be in that workshop room and chose the 2% robot, not the
15% surgeon. If I remember correctly, the point of the question was
determining society's willingness to use black box models, not challenging
the need for model interpretability.

Interpretability obviously doesn't hurt accuracy. But it is costly to
engineer. And not always possible to make, because human capacity (and
willingness to put in the effort) to understand the explanation is limited.

~~~
1_over_n
>> Interpretability obviously doesn't hurt accuracy.

Why do you say this? From what I have seen it certainly can and does. For some
industries, finding the trade-off is where the magic is.

[https://link.springer.com/article/10.1007/s10664-018-9638-1](https://link.springer.com/article/10.1007/s10664-018-9638-1)

[https://www.oreilly.com/ideas/predictive-modeling-striking-a-balance-between-accuracy-and-interpretability](https://www.oreilly.com/ideas/predictive-modeling-striking-a-balance-between-accuracy-and-interpretability)

~~~
dchichkov
You can take nearly any black box model and add _some_ interpretability to it,
without modifying the model itself. No modifications to the model, so accuracy
stays the same.
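
One model-agnostic example, sketched with scikit-learn's permutation importance (the random forest here just stands in for any black box):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

black_box = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle one feature at a time on held-out data and measure how much the
# score drops; the model itself is never modified or retrained.
result = permutation_importance(black_box, X_te, y_te, n_repeats=10,
                                random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```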

------
rubyn00bie
> [...] these black box models are created directly from data by an algorithm,
> meaning that humans, even those who design them, cannot understand how
> variables are being combined to make predictions.

So uhhh, isn't this like not science? Like my biggest problem with "machine
learning" is people assume the data they have can correctly answer the
question they want to answer.

If your data is off, your model might be incredibly close for a huge portion
of the population (or for the biased sample you unknowingly have), but then
wildly off somewhere else and you won't know until it's too late because it's
not science (e.g. racist AI making racist predictions).

A model cannot be accurate if it doesn't have enough information (like
predicting crime, or the stock market). There are an insane amount of
statistical tests to detect bullshit, and yet we're pushing those and
hypothesis testing right out the window when we create models we don't
understand.

Like I just don't get how some folks say "look what my AI knows" and assume
it's even remotely right without understanding the underlying system of
equations and dataset. You honestly don't even know if the answer you're
receiving back is to the question you're asking, it might just be strongly
correlated bullshit that's undetectable to you the ignorant wizard.

I find it pretty hard to believe we can model the physical forces inside a
fucking neutron star (holla at strange matter) but literally no one in the
world could pen an equation (model) on how to spot a fucking cat or a hotdog
without AI? Of course someone could, it would just feel unrewarding to invest
that much time into doing it correctly.

I guess I can sum this up with: I wish people looked at AI more as a tool to
help guide our intuition, helping us solve problems we already have well-defined
knowledge (and data) of, and not as an end in itself.

~~~
StuffedParrot
> So uhhh, isn't this like not science?

Very little of technology has anything to do with validating hypotheses.

> meaning that humans, even those who design them, cannot understand how
> variables are being combined to make predictions.

The _intention_ is to not rely on the explanation to evaluate the
effectiveness of the model. This does not preclude any of the infinite
narratives that might explain the model.

This is fundamentally a cost saving mechanism to avoid hiring engineers to
code heuristics useful to business. There is nothing related to science here
at all. A "black box" model is fashionable to those who prefer to observe and
not create meaning, even if the observed meaning is deeply flawed from a human
perspective. After all, people spend money based on less all the time.

~~~
1_over_n
At our company, we are working with "mechanistic" or mathematical models rather
than statistical approaches. I have observed:

1\. It is harder than it should be to explain the concept to people
(particularly VCs).

2\. People struggle to understand that a mechanistic model could have more
utility than a machine learning black box.

3\. People think you are doing something wrong if you are not using a neural
network.

4\. The less people understand about neural networks, the more they seem to
believe they are appropriate for all predictive/modelling problems.

5\. There is generally quite a low understanding of scientific method in the
startup/VC space (speaking as someone who has worked in and around academia for
years) vs. how "scientific" people believe they are, because it sounds good to
be data-driven and scientific about running startups and funding them.

~~~
deepnotderp
Do you mean like symbolic regression or algorithmic information theory based
stuff?

If so, I'd love to get in touch, shoot me an email

~~~
1_over_n
We are using biophysics based approaches. Will send an email over :)

~~~
nickpsecurity
Send me one, too, if you don't mind. I like collecting these techniques to
give to researchers or practitioners wanting to try new things.

------
jph00
This article seems to misrepresent a number of important issues, and as a
result significantly overstates their claims. I'll pick just one illustrative
(but important) example:

> " _For instance, when ProPublica journalists tried to explain what was in
> the proprietary COMPAS model for recidivism prediction, they seem to have
> mistakenly assumed that if one could create a linear model that approximated
> COMPAS and depended on race, age, and criminal history, that COMPAS itself
> must depend on race. However, when one approximates COMPAS using a nonlinear
> model, the explicit dependence on race vanishes, leaving dependence on race
> only through age and criminal history. This is an example of how an
> incorrect explanation of a black box can spiral out of control._ "

The concern about the strong relationship between race and COMPAS predictions
is not largely based on a concern about whether there is an _explicit
dependence_ in the model. The concern is whether there's a relationship either
explicitly _or implicitly_. And in particular, whether such a relationship
results in _unfair_ outcomes. The findings of the ProPublica study
([https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm](https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm))
strongly suggested this was the case:

"\- Black defendants were often predicted to be at a higher risk of recidivism
than they actually were. Our analysis found that black defendants who did not
recidivate over a two-year period were nearly twice as likely to be
misclassified as higher risk compared to their white counterparts (45 percent
vs. 23 percent).

\- White defendants were often predicted to be less risky than they were. Our
analysis found that white defendants who re-offended within the next two years
were mistakenly labeled low risk almost twice as often as black re-offenders
(48 percent vs. 28 percent).

\- The analysis also showed that even when controlling for prior crimes,
future recidivism, age, and gender, black defendants were 45 percent more
likely to be assigned higher risk scores than white defendants."

I understand the desire of the MIT researchers to promote the value of their
work, but in this case they appear to be doing so in a potentially damaging
way.

~~~
makomk
No, they're not misrepresenting this at all. ProPublica's article
[https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)
was pushing the claim that somehow, the COMPAS black
box was implicitly deducing defendants' race from the "137 questions" input to
it and labelling them likely reoffenders based on it in a way that was
independent of the key factors known to affect reoffending rates, such as age
and gender. The paper in question seems to demonstrate the exact opposite is
true: ProPublica were inadvertently using race as an imperfect proxy for age
at sentencing, which is what the COMPAS algorithm really cared about, because
their attempt to control for age at sentencing didn't work. After controlling
for the actual age factor, COMPAS results didn't have any relationship to race
anymore. (It's not even a weird weighting factor: predicted reoffending risk
falls off rapidly with increasing age at first, then more slowly in a smooth
fashion. It's just not linear.)

Now, it's of course possible to argue that judging reoffending risk based on
age is in fact unfair and racist because it has disproportionate impact on
certain racial groups, even though it's strongly predictive across all racial
groups. That's not the argument ProPublica made, though. Their argument was
about the supposed perils of black boxes, and they kind of acknowledged that
age probably wasn't a racist criterion - or at least that it would be a lot
harder to justify calling it one - by attempting to strip out its effects in
the first place. It's also a different kind of argument entirely, one that
revolves not around whether the algorithm is somehow treating people
differently based on their detected race - because it isn't - but around what
it means for a decision like this to be fair in the first place.

~~~
jph00
Whether a variable is latent or explicit isn't really relevant to the question
of algorithmic fairness.

The link I provided gives the actual details of the method and findings; this
is probably a more useful source for the details. The claim that the actual
source of the difference is 'age' doesn't really make sense. There isn't
enough of a difference in the number of young people between black and white
populations to result in the differences found in the analysis.

(I do agree that the actual attempt to control for age was poorly done; it
really shouldn't have been done at all, since it had nothing useful to add to
the analysis or results.)

PS: It's 'COMPAS', not 'COMPASS'.

------
mlthoughts2018
“Explainability” is a subjective idea that allows people to act as gatekeepers
about the use of a model through arbitrary politics.

“The Mythos of Model Interpretability” is good reading on this.

[https://arxiv.org/pdf/1606.03490.pdf](https://arxiv.org/pdf/1606.03490.pdf)

------
netcan
Ultimately, a lot comes down to the accessibility of tools and know how. The
more widespread and general a technique or tool, the more widespread and
general it becomes... because people already have it and know how to apply it.

OTOH, these authors (and old school heavyweights like David Ferucci) are not
wrong. Interpretability, explainability & interoperability with human
intelligence is not something to just give up on.

I like "challenges" as ways of exploring these areas. Good luck to the
authors. Female team, btw.

------
andlima
It's not that accuracy will _always_ be sacrificed if one wants an explainable
model. The point is: if interpretability is an important constraint, it could
prevent improvements on accuracy.

Sometimes, the best interpretable model is as good as a black box, and that's
great.

When this is not the case, the trade-off is that one should see what's more
important for the actual problem. Perhaps interpretability is not a big deal.

Another solution is to try to extract interpretability from the more accurate
black box model with something like SHAP.

~~~
1_over_n
This is a great point. There is a general lack of understanding about what it
means for models to be interpretable & explainable. These words get thrown
around often by people who don't understand the definition, and also the trade
off with accuracy.

Some papers i found interesting on the subject:

[https://arxiv.org/abs/1606.03490](https://arxiv.org/abs/1606.03490)

[https://arxiv.org/abs/1707.03886](https://arxiv.org/abs/1707.03886)

[https://arxiv.org/abs/1806.07552](https://arxiv.org/abs/1806.07552)

[https://arxiv.org/abs/1702.08608](https://arxiv.org/abs/1702.08608) (I found
this was a good summary of the issues)

------
boltzmannbrain
A couple salient quotes from "Interpreting AI Is More Than Black And White" in
Forbes [1]:

"No matter the definition, developing an AI system to be interpretable is
typically challenging and ambiguous. It is often the case that a model or
algorithm is too complex to understand or describe because its purpose is to
model a complex hypothesis or navigate a high-dimensional space, a catch-22.
Not to mention what is interpretable in one application may be useless in
another."

"Underspecified and misaligned notions of interpretation impede progress
towards the rigorous development of understandable, transparent, trusted AI
systems."

[1]
[https://www.forbes.com/sites/alexanderlavin/2019/06/17/beyond-black-box-ai](https://www.forbes.com/sites/alexanderlavin/2019/06/17/beyond-black-box-ai)

------
Ericson2314
The interest in black-box models is partly for the reasons given, that it is
presumed that their inexplicability makes them more powerful, but also because
business people are sick of dealing with programmers and analysts as a class:
people with arcane knowledge, limited replaceability, and a penchant for blowing
deadlines.

The dream of machine learning is to fire the programmer and break up the last
trade (barring doctors and lawyers).

If it were just business people being alienated, I'd say fuck it, they deserve
it. But the fact is everyone, many of the programmers included, is alienated
when much of the code makes sense to no one. I would rather fix programming to
be comprehensible to more people, even though it will take down the walls
between programmer and non-programmer.

------
tripzilch
> there are actually a lot of applications where people do not try to
> construct an interpretable model, because they might believe that for a
> complex data set, an interpretable model could not possibly be as accurate
> as a black box. Or perhaps they want to preserve the model as proprietary.

I think they missed an important reason here. It's why I decided to study AI
and Machine Learning: The ability to "just throw some data at it and see if it
figures it out" is just so _exciting_ , mysterious and intriguing.

I remember actually being a bit disappointed when I learned how (classic, 3
layer, early 2000s) neural networks worked. That it was just something super
simple with derivatives, and resulted in something that seemed a lot like a
more complex form of statistical regression. Kind of took the magic out of it
for me, a little (didn't stop me of course, it was and still is an exciting
field of research).

I know it's not a _good_ reason, but surely I'm not the only one who thinks
it's extremely _cool_ that one is even able to build a black box that does
useful things with what you feed it, but you don't know how it works, and yet
you built it.

It's just something mystifying, and I believe there is also a tinge of fear
for that disappointment, to figure it out and find out it's not really doing
as clever things as you hoped it might.

------
tdhttt
When using ML models to deal with natural sciences such as physics, working
well plus being interpretable actually implies new discovery. Working well
alone, on the other hand, cannot win you a Nobel Prize.

------
computerex
This is a pointless article. Obviously everyone would prefer fully
interpretable models if it was possible to get comparable performance with the
leading black box/deep learning approaches. No one _chooses_ to make black
box, non-interpretable models, it's simply the fact that the models that give
the best performance tend to be black box. There is a lot of work happening on
trying to interpret deep learning models, because no one prefers non-
interpretable models and being unable to pin point and explain why some
prediction was wrong.

~~~
erostrate
The article explains that their fully interpretable model actually got within
1% accuracy of the best black box model. Which was within cross validation
error. That was at a data science competition whose explicit goal was to
encourage people to explain black box models. They also give other examples of
cases where people have used black box models unnecessarily.

~~~
computerex
> The article explains that their fully interpretable model actually got
> within 1% accuracy of the best black box model. Which was within cross
> validation error. That was at a data science competition whose explicit goal
> was to encourage people to explain black box models.

That's awesome, but this was a single model for a single application, measured
against what were presumably limited test criteria/benchmarks. The point
still stands that the recent narrow AI "renaissance" is largely due to deep
learning, which is inherently black box. There is a lot of work going on in
making deep learning more interpretable precisely because it's so prominent
today and because its lack of interpretability is a huge con.

Machine vision for example has come a long way due to deep learning. A lot of
autonomous car companies are relying on it for perception. All of these
companies hate the fact that there is no way to tell when the classifier will
fail or why it'd fail. When using deep learning in a finance setting, its lack
of interpretability is a huge downside.

Despite this deep learning is being used because it provides an appreciable
improvement in performance over the alternatives. Certainly there could be
alternatives to deep learning that may perform just as well or even better.
But finding those alternatives for the vast number of applications deep
learning is being used for today is much easier said than done.

------
m0zg
Human cognition itself is a black box model. We routinely make snap decisions
first and then justify them after the fact. Anything that requires serious
consideration is darn near insurmountable for a human mind in the presence of
incomplete information, emotions, and conflicting motivations.

So while I'd like neural nets to be more interpretable, to me it'd take a
distant second place. The first would be to get models that actually work
better than humans for practical tasks, even under limited circumstances.

~~~
Rapzid
I share this sentiment.

It occurs to me that human decision making can be explained either at a very
high level, or at a very low level (neurons firing). But the magic in between is
too complicated to draw direct lines between stimuli and decisions.

AI seems to be the same way. We have statistical models that explain how most
everything works at a low level. That a simple math formula can approximate
just about anything. How we arrive at weights that separate data sets in
different dimensions and etc. For convolution networks we also understand how
layer "decisions" flow through to other layers. But, it's too complicated to
look at an image and explain exactly how the input pixels will result in a
classification output.

I'm not sure how much I should care about that.. Seems like a problem for
mathematicians/statisticians. Having the doctor explain himself gives me no
more true insight into how his brain works than having somebody explain at a
high level how the robot was trained and reacts in different situations.

------
Lucadg
The convenience of being able to say "we didn't take the decision, the
algorithm did" is the holy grail of non-accountability.

I wouldn't be surprised if someone, especially in business, is actively
opposing transparent AI.

------
amitpaka
Interpretable models are an exciting new entry to the ML toolkit. While they
are likely sufficient for everyday ML tasks, they might not be expressive
enough vs deep networks to tackle complex tasks like fraud and anti-money
laundering classification. As with any tool, interpretable vs. black box models
have optimal target applications that data scientists can apply them to. As
interpretable models get wider tooling for training, expect them to play a
larger role in simpler ML tasks. Black box models are, however, here to stay and
will always have their place, given their versatility.

------
hooande
I don't know how many of you are familiar with Kaggle competitions, but the
winning scores on the leaderboards can come down to 0.0001 points of accuracy
or less. Winning something like that with an explainable model is like having
one hand tied behind your back.

If tiny fractions of a point matter (the rare case), then making the model
explainable is adding a ton of complexity for little practical gain. It sounds
nice, but the numbers are unforgiving

------
szczepano
What a dumb question. The same logic is behind sports and Nobel Prize
awards.

It's easier to tell people who make financial decisions that this is great
artificial intelligence that will solve all your problems than to explain the
mathematical equations behind it, which 90% of people won't understand. The more
complicated the stuff is, the higher the probability it will be sold. That's
been the standard in the IT world for quite some time.

------
nurettin
We have all these tools to build neural networks, and we use them to perform
feats where a simple C4.5 would have sufficed.

To continue innovation in the algorithmic field, I think we need a branch of
scientific research or a competition where researchers are constrained to
100MHz CPU clocks and no GPU cheats when actually running their algorithms
(not when loading and parsing data)

------
tw1010
I feel like we're ignoring the obvious answer: because most people in AI right
now are overwhelmed by the mathematical background and comfort necessary
to produce first-principles, generative, innovative algorithms and
models, so all they're left with is following the latest trend, which is
locked on deep learning right now.

------
i000
While I can definitely see the value of black box models in engineering and
commercial applications (including clinical ones), I am quite troubled by the
new hype of applying deep learning to so many basic scientific problems.
The 'deep' models do not tend to result in deeper insights - which
idealistically ought to be the goal.

------
somurzakov
When I want to predict something, I need the highest accuracy, and I am not
interested in interpretability of an individual parameter.

And interpretability is only a half measure. If I really want to understand
the model, I will conduct causal analysis with a randomized controlled trial
and find the root cause and impact of the factor I am interested in.

------
zevets
I think this argument fails to understand that the purpose of black-box models is
to be impossible to understand, as it shields the operator from pre-emptive
accountability.

Then, since they are supposedly the 'only way' to provide whatever function
the market is 'demanding', then clearly we must abandon any notion of
accountability, as this need 'must' be met.

This logic is perhaps most evident in google/facebooks content moderation
dilemmas, where both companies refuse to define any sort of non-vague
actionable standards about what they censor (or fail to promote) as 1) they
can't provide them and 2) they don't want to be held responsible for what they
actually are.

Then, as outrage has grown over facebook/google's terrible content moderation
and censorship policies, and the need for accountability has grown, both
companies have been forced to hire ever more moderators and censors, because
the technologies as claimed just don't work well enough.

~~~
taneq
Well maybe also because black box models can be applied to a wider range of
systems and can help to understand general properties of a class of
algorithms. I’m sure it’s mostly all a conspiracy, though.

------
hyperpallium
It's true our own cognition is also a black box, e.g. how do we recognize?
It's as much a mystery to introspection as digestion. But the goal of science
is to understand. Whereas with DL, we're just chimps in the dirt playing with
sticks.

------
ummonk
I think there is a lot of scope for interpretation in black box models. E.g.,
when doing transfer learning with a frozen pretrained model, the new
classifications can be explained in terms of the pretrained classes.
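
A sketch of what that could look like: use the frozen model's class probabilities as features for a linear head, then read the head's weights as "this new label looks like a mix of these pretrained classes." Everything named here (`pretrained_probs`, `train_images`, `train_labels`, `pretrained_class_names`) is a placeholder, and a multiclass new task is assumed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholders: pretrained_probs(images) returns the frozen model's output
# probabilities over its original classes (n_images x n_pretrained_classes),
# and pretrained_class_names lists those original classes.
features = pretrained_probs(train_images)
head = LogisticRegression(max_iter=1000).fit(features, train_labels)

# For a multiclass new task, each row of coef_ weights the pretrained classes
# for one new label, which reads as a human-interpretable explanation.
for new_label, weights in zip(head.classes_, head.coef_):
    top = np.argsort(weights)[::-1][:3]
    print(new_label, "looks like:", [pretrained_class_names[i] for i in top])
```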

------
sytelus
There are probably dozens of holes one can drill into this sensationally
titled click-baity study. It's surprising what gets through Nature's review
system these days.

~~~
hesselink
And yet you provide none. What do you base this on?

------
jayd16
Can you train a white box solution with only a tagged data set? It seems
obvious to me why we have black box solutions even if we could have spent man
hours engineering an algorithm.

~~~
swiftcoder
You actually can - for example
[https://arxiv.org/abs/1806.10574](https://arxiv.org/abs/1806.10574)

------
colinprince
Off topic, apologies, but why are we putting (2019) in the title when this
article is barely two months old?

Makes it seem like it's an out-of-date article. Is it?

------
StuffedParrot
Absolution of liability. Why else use the term "AI" to describe your
heuristic?

------
OrgNet
why include 2019 in the title of this post? was it automatically added (is
that a bug)?

~~~
Gaelan
There's a (human-enforced) HN rule to do that for old articles, so they're not
mistaken for news. I'm not sure what the standard for "old" is, but it isn't
2019 anymore, so I guess it's technically correct.

~~~
OrgNet
2 months old is usually not considered old news on this site... I think that
their algo is buggy on year-change

~~~
Gaelan
As far as I know, there's no algorithm doing it. People put it in the title
themselves, or mods do if they forgot.

------
rdlecler1
For god's sake. Enough of this "black box, we don't understand how this works"
narrative. Neural networks are JUST computational circuits. Unfortunately
there's been a lot of laziness in cleaning those circuits up (setting
W_ijk = 0) to better reveal the topological circuitry.

