Stop Explaining Black Box Models for High Stakes Decisions (arxiv-vanity.com)
150 points by polm23 17 days ago | 63 comments

Humans do not use explainable models. We confabulate an explanation post hoc. There is some evidence that we may do it down to very simple cases (“why did you reach for that cereal? A: it’s good for me”).

We don’t do this for complex plans where we track multiple intermediate goals, even if not on a piece of paper, but the intermediate gaps between what is written down are truly inscrutable.

That being said I have some sympathy for the technical portions of her argument (and for the rest too).

Humans don’t have completely explainable models, but we do have partially explainable models which are still meaningful. A doctor who, for example, knows that antibiotics don’t work on viral infections can explain sufficient reasons to justify their actions. Shareable heuristics are a powerful tool and tossing them away binds you to the biases in your training sets.

Perhaps I was not clear; here's an example of what I meant:

We make complex reasoned plans like "how to make coffee": "I'm going to need a filter in the pot, I'm going to need ground coffee, water, etc.; let's make sure the filter's there before I add the coffee, and both before the water, etc." Then we use some sort of hierarchical planning heuristics to run it. If someone asks what I did I can explain it at varying levels of resolution ("I made coffee", "I got these pieces together (xxx) and then made coffee...") again depending on various "explanation heuristics" which we learned as part of being eusocial organisms.

These plans are complex (even "get out of bed and pee" is a pretty complex plan).

However below that are a ton of decisions we aren't even aware we're making. I'm convinced (and my software reflects) that the actual "plans" we make organically are very short, and that the interesting plans -- the ones we can talk about and that we typically care about -- are very abstract ones. Making coffee is super abstract, after all.

Perhaps for an analogy: the extremely abstract reasoning for not using an antibiotic is at the level of "chemistry" while what I'm talking about is at the level of "physics".

I'm reminded of how proofs work. We don't understand how mathematicians really think, and each mathematician might think differently, using different mental images. But a mathematician can write a proof, and other mathematicians can verify it.

Similarly, the output of an opaque and perhaps unreliable process could be not just an answer but also a justification. If the justification can be verified more easily than starting from scratch, we've made progress.
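A toy sketch of that asymmetry, assuming integer factorization as a stand-in task: finding the factors may be expensive, but checking a claimed answer is just a multiplication. The function name `verify_factorization` and the numbers are illustrative assumptions, not anything from the paper.

```python
def verify_factorization(n, factors):
    """Check a claimed factorization of n.

    The checker does not care how the factors were found (an opaque
    process, a guess, a lookup); it only confirms the justification.
    """
    if not factors or any(f <= 1 for f in factors):
        return False
    product = 1
    for f in factors:
        product *= f
    return product == n


# An untrusted source claims 8051 = 83 * 97; verifying is cheap.
claim_ok = verify_factorization(8051, [83, 97])
# A wrong justification is rejected just as cheaply.
claim_bad = verify_factorization(8051, [83, 96])
```

The answer alone ("8051 is composite") would require trust; the answer plus its factors can be checked by anyone.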

Even in the humanities there is a difference between argument from authority (blind trust) and argument based on reason and justification. And sometimes we have both. Court decisions include not just the decisions but also an explanation of why the court ruled that way. It certainly doesn't eliminate bad decisions, but it's useful.

So the discipline of being able to explain yourself seems pretty important. I don't think AI will be able to fully participate in high-stakes decision-making until it can do that.

Yes! An excellent example of what I was attempting to describe.

This kind of sentiment is just a higher-level version of gaslighting. It conflates rationality and explainability.

Humans have a very good intuitive understanding of how other humans think, even when that thinking is irrational. This understanding has evolved over millions of years and it even predates our species. We developed all kinds of mechanisms for controlling irrational behavior and tested them for thousands of years.

We have lived and evolved with human decision making since we've existed.

Taking a new system of machine decisions and adding it to the mix sounds problematic.

I recently watched a video on why you should hire a management company if you rent to tenants. One example they gave: say you rent your place to someone because you met them and they seemed trustworthy. You just broke fair housing laws in many states and can be sued, because you won't be able to provide objective criteria for why you didn't rent to other tenants.

I think this relates to the current discussion. You don't need an explainable model for what cereal you choose, but you need a 100% clear one on what tenants you choose, who you lend to, who you hire, etc.

To quote one Maciej Cegłowski, "Machine learning is money laundering for bias".

That's a lovely and succinct way of putting it.

I really dislike the way so many companies are pushing ml based systems as some kind of scientific, perfect, measure for difficult human problems like recruitment.

It's a good line, what is the context / source for that quote?


His talks, particularly about superintelligence and automation, are also relevant.


polm23 mentioned that this is from a talk by Maciej (idlewords on HN), I think the particular talk is this one:


I'm no expert in ML, but doesn't this paper basically argue for reintroduction of rule-based expert systems (though obviously what's going to happen is that rules are going to get so complicated that they're no longer sensibly interpretable) and has basically no useful suggestion for actually complicated (and successful) fields of ML like computer vision.

> when a new image needs to be evaluated, the network finds parts of the test image that are similar to the prototypical parts it learned during training

basically just waving away the complexity/black-box-ness of "computer vision" into the complexity/black-box-ness of "similarity".

I don't think so.

There's a difference between expert systems and black boxes.

Black boxes are problematic in domains where what matters isn't just your decision today, but the evolution of your decision making process.

Easy examples being finance, medicine, legal, education etc.

In these areas, when you are explicitly weighing competing interests/rights/harms, it's pretty important that you be able to explain your reasoning to another. So they can check it, so they can test it, and so they can apply it if it's good.

Not just because your decision could be wrong, but because the process by which we evolve our decisions is important (think precedent for law, blow-up analysis for finance, etc.).

If we want to push our understanding of a domain forward, black boxes populated with a lot of data aren't super helpful.

They are able to spot complex patterns, yes, many of which can be cleaned/restructured into simple patterns.

In reality most of the best uses of ML thus far have been either rapid screening/classification based on simple patterns (think OCR on a check - the character recognizer engine in the machine isn't really teaching us much about language or typology, it's just processing existing patterns), or in domains with extremely rigid game mechanics, where the rules never change but you can run a billion simulations (chess, go, video games etc).

Yes. I think the idea is that once you have a predictive model, you go through the computationally hard process of factorization: identifying inputs that, if removed, don’t affect predictive accuracy that much. Rinse and repeat until you have an explainable model.

IMHO you won’t always get an explainable model, because sometimes there may be too many factors that are predictive, but the effort is what’s important.
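A minimal sketch of that "rinse and repeat" loop, under several assumptions: the model is an arbitrary callable, "removing" an input means replacing it with 0.0, and the toy `black_box` and the 2% tolerance are invented for illustration. Real pruning procedures differ in how they mask inputs and how they measure the loss.

```python
def accuracy(predict, rows, labels, active):
    """Fraction of rows predicted correctly when only `active` inputs are kept."""
    correct = 0
    for row, label in zip(rows, labels):
        # Assumption: a removed input is replaced by 0.0.
        masked = [v if i in active else 0.0 for i, v in enumerate(row)]
        correct += predict(masked) == label
    return correct / len(rows)


def prune_features(predict, rows, labels, tolerance=0.02):
    """Drop inputs one at a time while accuracy stays within `tolerance` of baseline."""
    active = set(range(len(rows[0])))
    baseline = accuracy(predict, rows, labels, active)
    changed = True
    while changed:
        changed = False
        for i in sorted(active):
            trial = active - {i}
            if baseline - accuracy(predict, rows, labels, trial) <= tolerance:
                active = trial  # input i barely matters, so drop it
                changed = True
                break
    return active


def black_box(row):
    """Hypothetical stand-in model: in effect only input 0 decides the output."""
    return int(row[0] + 0.1 * row[1] > 0.5)


rows = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [black_box(r) for r in rows]
kept = prune_features(black_box, rows, labels)  # only input 0 survives
```

With many weakly predictive inputs the loop may stop early with a large set still active, which matches the caveat that you will not always end up with an explainable model.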

Similarity can be a learnable metric or user defined.

>(ii) Explainable ML methods provide explanations that are not faithful to what the original model computes.

>Explanations must be wrong. They cannot have perfect fidelity with respect to the original model. If the explanation was completely faithful to what the original model computes, the explanation would equal the original model, and one would not need the original model in the first place, only the explanation. (In other words, this is a case where the original model would be interpretable.) This leads to the danger that the explanation method can be an inaccurate representation of the original model in parts of the feature space.

This is such a succinct phrasing of what makes me so uncomfortable with these approximate explanations.

The Morning Papers had its take on this paper: https://blog.acolyer.org/2019/10/28/interpretable-models/

> Because the data are finite, the data could admit many close-to-optimal models that predict differently from each other: a large Rashomon set.

Hmm. But the ultimate goal isn’t actually to find a model that makes good predictions on the training data. It’s to find a model that makes good predictions on data that’s ingested in the future when the algorithm is put to actual use. But the set of possible future data is infinite! (Or at least exponentially large, depending on whether the input fields have finite precision.)

Yet, if an ML model can act as a reasonably accurate classifier on that data, the presence of an internal structure of that data is proven. In other words, there exists a transformation to a space in which the data set separates along lines similar to what is desired.

Yet, without any kind of understanding of that transformation, we cannot reason about its properties in a meaningful way. This is the downfall of the current generation of successful ML models. We may not need an exact understanding of a training result. An approximate one may be enough, depending on the kind of insight that needs to be extracted.

My pipedream vision is that ML models are some day just mere tools that help design simpler models with formally guaranteed properties.

This is a different issue. We already make some assumptions about the relationship between training data and future data, even when not worried about interpretability, and these assumptions are enough for this paper's argument too.

While I mostly agree, I’ll also point out that humans are also black boxes, who often have a hard time explaining their decisions.

We have hundreds of thousands of years (if not more) of trying to understand other humans behind us. It's encoded in our biology (theory of mind, mirror neurons), and more recently, in countless disciplines we engage in - from arts, through law, to psychology. Truly unexpected human behavior is a very rare thing.

We ought to be able to do better than building our ML models as black boxes and saying "humans are black boxes too". We're engineering this technology, so we have an opportunity to make it inspectable. We should take it.

One of the things neuroscience and psychology showed us is that humans always have a post hoc reasoning they can explain, but it's not always the actual reason they did something.

So humans explaining themselves is often misleading data, which may be worse than no data.

Personally, I currently believe that this research is quite probably bunk. There was a recent HN thread in which it was pointed out that some of the results of the experiments involved might have been data processing errors. And then there's the most obvious confounding factor - people's reported reasoning may not be their true, or complete, reasoning, because people aren't always comfortable with revealing all the details.

But regardless of that, while post-hoc explanations may not be that good for building predictive models of behavior, observed behavior is. We generally know how people behave, and we know the expected variance due to individual and situational circumstances. Truly unexpected behavior is rare in society, and we tend to filter it out. Truly unpredictable people get locked up and/or are given medical help, but even less unpredictable people are tested and kept out of professions and activities where that unpredictability could cause problems.

True, humans are more predictable and we can kind of probe at it until we reveal the truth. I guess we could do sensitivity analysis on ML models but most people aren't equipped for that.
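For what it's worth, the simplest form of sensitivity analysis is not much code. Below is a rough one-at-a-time sketch, assuming the model is a callable returning a number; the `toy_model` and the step size `delta` are made up for illustration, and this local probe says nothing about interactions between inputs.

```python
def sensitivity(model, x, delta=1e-3):
    """Estimate how much the output moves per unit change in each input.

    One-at-a-time finite differences around the point x: bump a single
    input by `delta`, hold the rest fixed, and compare outputs.
    """
    base = model(x)
    effects = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += delta
        effects.append((model(bumped) - base) / delta)
    return effects


def toy_model(x):
    """Hypothetical black box: callers only see inputs and outputs."""
    return 3.0 * x[0] - 2.0 * x[1] + 0.0 * x[2]


effects = sensitivity(toy_model, [1.0, 2.0, 5.0])
# Approximately [3, -2, 0]: input 0 pushes the score up, input 1 pushes it
# down, and input 2 has no effect at this point.
```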

>Truly unpredictable people get locked up and/or are given medical help, but even less unpredictable people are tested and kept out of professions and activities where that unpredictability could cause problems.

Given our knowledge about, e.g., narcissists or psychopaths, their behavior is not totally unpredictable. I guess a serial killer could be extremely predictable, but not something we want in society.

Variance reduction isn't the direct reason we lock away people or restrict their opportunities; personal and public safety is. So a serial killer is hunted and imprisoned, and work is done to detect would-be killers before they kill. Variance, however, is a safety issue in many areas - e.g. gun ownership, police service, flying, driving. So we have psychological evaluations not just to weed out the individuals obviously incapable of handling a task safely, but also the highly unpredictable ones.

"Humans explaining themselves" is an entirely different thing than "humans understanding human behaviour". You may have rationalized your choices, but I understand both your choices AND that you rationalized them.

But do all those techniques actually work better for humans than trying to post-hoc explain machine learning? Perhaps both cases are just about inventing stories around the decisions so they can be perceived as relatable.

I'm talking about variance. We roughly know what to expect of other people, we have a good idea of the shapes of distributions of human behaviour. We know the range of expected behaviours in a given situation with high confidence. We know what stupidity looks like.

None of this applies to ML. ML models can fail in ways we can't easily predict, for reasons we don't expect. Viewed as minds, they run on a different architecture and on different firmware than human minds. They're alien to us. Alien like a kitten who suddenly freaks out for no reason, like ants that can get stuck walking in a loop - except more so, because we've known cats and ants for as long as humanity has existed, and they're more similar to us than ML models are, hardware- and firmware-wise.

Ok, fair point. So I guess whereas one could argue the explainability problem away, the predictability of failure modes is non-negotiable.

Perhaps, but explainability is very helpful for predictability - it allows us to do better than just making statistical inferences from observed behaviour. Since ML models, unlike biology, are entirely our design, we have an opportunity to make them explainable and reap the benefits in terms of predictability.

You would be mistaken to trust a random expert without some independent verification of the accuracy of their predictions.

humans can be held accountable for their decisions though

This point is really about incentives: since we know that our fellow humans are held accountable, we know they have the incentive to make high-quality decisions that will not impact the world negatively. Therefore, it's easier to trust them, even when they go with their gut.

We can't project the same incentive structure onto software companies. They are a different scale than humans, and might be able to tolerate the possible hit to their reputation better than a human can. Their incentives are usually money-based rather than social-based.

And for the models themselves, they have no incentives, unless you count "reduce their error rate" or "be interesting enough for researchers to continue to research them". We barely know why they work. Our basis for trust is rather tenuous.

In this case the analogy would be that software companies should be held accountable for the decisions their AI makes?

The article makes the case that if an interpretable model (one we can explain) isn't used, then the user of the black box model should have the burden of proof to show that no interpretable model exists that does the job, and some level of responsibility for trying to develop one.

Only because we all agree that a human is a ‘person’ ie. a thing that can take blame.

Fascinating. I'd never thought of defining a 'person' this way.

You might have never been in a VP+ position in a company then. A lot of tasks in these positions are about avoiding consequences or shifting blame for actions that are either necessary but look bad or that had to be made with way too little input information. And not necessarily just one's own actions.

I was definitely thinking about corporate blame-shifting when I wrote that, and actually nearly said "legal 'person'" instead.

So make these AIs corporations - instant personhood.

I didn't read the full paper yet, but a quick ctrl+f shows no mention of Activation Atlas for vision neural networks, a collaboration between Google AI and OpenAI:


Also no mention of the "Unforeseen Attack Robustness" metric by OpenAI:


There are probably other publications for NLP models, all the big players are aware that explanation is key.

Human knowledge is built upon high-level logical relations. Take the 3-body problem, for example: the equations over the variables are always right, although no one can predict the motions accurately using those equations.

Current black-box AI does not learn such high-level logical relations, although it might predict the motions most accurately.

High-level logical relations likely generalize to other domains. Low-level prediction models are sensitive to the distribution of the data and hardly generalize across domains.

Perhaps we need a hybrid system to combine both abstract logical reasoning and semantic tensor computations.

To be fair, the original article does have "please" in the title.

Still a terrible title for a scientific paper.

In many uses, inscrutability is the main feature sought. Accuracy is secondary, where it counts at all.

This probably needs legislative correction, because scrupulous data scientists are not in the driver's seat.

Sparse decision trees and regression models are easily explainable.

In some domains, with sufficient feature engineering you can get reasonable results with these approaches.

"sparsity is a useful measure of interpretability, since humans can handle at most 7±2 cognitive entities at once"

If you are going to limit models to 9 features, or 9 combinations of features, or 9 rules, models are not going to work as well. This just seems like a very weak argument to introduce.

"A black box model is either a function that is too complicated for any human to comprehend, or a function that is proprietary" These seem like two very different things. A false equivalence is then made with many arguments against proprietary models being made as if they apply to complex models.

"It is a myth that there is necessarily a trade-off between accuracy and interpretability....However, this is often not true" It's a bit of a straw man to suggest that there is "necessarily" a trade-off, as there are surely cases where there is not. One could say that there is "often" a trade-off between accuracy and interpretability. I don't think there are many people out there with the naive view that one should never do feature engineering.

"If the explanation was completely faithful to what the original model computes, the explanation would equal the original model" This is just nonsensical to me. The idea is not to be completely faithful, but to raise things up to another level of abstraction. The series of videos from Wired comes to mind where a concept is explained at multiple levels. https://www.youtube.com/watch?v=OWJCfOvochA

"Black box models are often not compatible in situations where information outside the database needs to be combined with a risk assessment." This is absolutely untrue, depending on one's definition of often. The output of a model, whether it falls into this incorrect definition of a model or not, can be treated as a feature in another model. Ensemble learning exists.

COMPAS is the punching bag, but no one seems to know what it is. I haven't seen the evidence that its performance is equal to three if statements. It certainly doesn't have anything to say about machine learning in general, as it is a set of expert-designed rules. So, it is actually the kind of algorithm the author favors, except proprietary.

"typographical errors seem to be common in computing COMPAS.... This, unfortunately, is a drawback of using black box models" Unclear why typographical errors only affect black box models.

The BreezoMeter case is not clear evidence of anything broader either. It is unclear whether the one error noted out of millions of predictions is from bad source data. Stretching this to concern any sort of proprietary prediction, such as mortgage ratings, is a stretch that doesn't really tell us anything.

"Solving constrained problems is generally harder than solving unconstrained problems." This doesn't make sense to me at all. All evidence is that ML works better on constrained problems.

The idea that CORELS is somehow better, even if it comes up with a ruleset of millions of rules, doesn't make sense. The proposed workaround for this "the model would contain an additional term only if this additional term reduced the error by at least 1%" could result in the failure to create a model if no one term provided 1% on its own.
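A toy illustration of that failure mode, with heavy caveats: this is a greedy one-term-at-a-time search over a made-up parity model on XOR-style data, not CORELS, and it makes no claim about how CORELS actually searches or applies its threshold. It only shows that a minimum-gain rule combined with greedy search can return an empty model when no single term helps on its own.

```python
from itertools import combinations

# XOR-style toy data: ((x0, x1), label). Neither input is predictive alone.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
N_FEATURES = 2


def error(terms):
    """Error rate of a parity model: predict 1 when an odd number of chosen inputs are on."""
    wrong = sum((sum(x[i] for i in terms) % 2) != y for x, y in DATA)
    return wrong / len(DATA)


def greedy(min_gain=0.01):
    """Add the best single term only if it cuts the error by at least `min_gain`."""
    chosen = ()
    while True:
        candidates = [chosen + (i,) for i in range(N_FEATURES) if i not in chosen]
        if not candidates:
            return chosen
        best = min(candidates, key=error)
        if error(chosen) - error(best) < min_gain:
            return chosen  # no single term clears the threshold, so stop
        chosen = best


def exhaustive():
    """Search all subsets of terms instead of adding one at a time."""
    subsets = [c for r in range(N_FEATURES + 1)
               for c in combinations(range(N_FEATURES), r)]
    return min(subsets, key=error)


greedy_terms = greedy()      # () with error 0.5: the search never gets started
best_terms = exhaustive()    # (0, 1) with error 0.0
```

Whether the objection applies to a given method depends on whether its search is greedy or considers combinations of terms jointly.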

Scoring systems are useful, but it's like a single layer perceptron in the example provided. You need to consider combinations of factors to see their impact. A high X is bad, unless Y is also high and Z is low.

The fundamental problem here is trying to limit the power of the algorithm to the power of the human mind. From the very beginning we have used computers to do things that are difficult or impossible for us to do. In some cases the answers were provably correct, but we have now reached a point where computer generated proofs are accepted. The "oracle" mode of computing will compute an answer for us, but we ultimately have to choose whether or not to accept it on the basis of the evidence, much like we do with the opinion of an expert. Simple techniques such as providing one's own test data set, and the kind of analysis done by Tetlock around Superforecasters, can go a long way to building that understanding of accuracy of predictions, such that we have a guideline for evaluating algorithms that are beyond our ability to understand.
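As a rough sketch of the "bring your own test set" idea: one common way to score probabilistic predictions against known outcomes is the Brier score, a measure often used to evaluate forecasters. The function below is a plain implementation of the mean squared difference between forecast probabilities and 0/1 outcomes; the example numbers are invented.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect score; always answering 0.5 scores 0.25.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally sized, non-empty forecasts and outcomes")
    total = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes))
    return total / len(forecasts)


# Held-out cases whose outcomes you know but the model has not seen.
outcomes = [1, 0]
perfect = brier_score([1.0, 0.0], outcomes)    # 0.0
hedging = brier_score([0.5, 0.5], outcomes)    # 0.25
```

Tracking a score like this over your own cases gives a basis for trusting (or not trusting) an oracle without understanding its internals.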

I appreciate this thoughtful and detailed reply. I was thinking all those things in my head while reading as well, but couldn't bring myself to invest the time to address them all. I got the impression that the author wasn't someone who has a lot of experience building real-world predictive models, otherwise they'd appreciate the trade-offs that need to be made sometimes to get something that works well and can be debugged/interpreted without too much trouble. Of course this isn't to say we shouldn't be striving to develop more interpretable solutions, but I don't think this paper is very helpful due to its lack of rigor and straw-man tactics.

With an interpretable model typographical errors are obvious in the result. For example, if the system denies bail because you have four convictions, but you actually don't, then the problem is obvious. If the system denies bail with no interpretation then the typographical error goes unnoticed.
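To make that concrete, here is a minimal sketch of a rule list that returns its reason along with its decision. The rules, field names, and thresholds are hypothetical and invented purely for illustration; they are not COMPAS or any real system.

```python
# Hypothetical rules: (human-readable reason, condition, outcome).
RULES = [
    ("prior_convictions >= 4", lambda p: p["prior_convictions"] >= 4, "deny"),
    ("failed_to_appear", lambda p: p["failed_to_appear"], "deny"),
]


def decide(person):
    """Return (outcome, reason): the first matching rule explains the decision."""
    for reason, condition, outcome in RULES:
        if condition(person):
            return outcome, reason
    return "grant", "no rule matched"


# A data-entry error: the record says 4 convictions, the person has none.
record_with_typo = {"prior_convictions": 4, "failed_to_appear": False}
outcome, reason = decide(record_with_typo)
# The stated reason names the field to check, so the person can dispute it.
```

The reason only helps when the wrong field is the one that triggered the rule; an error that leads to a favorable outcome, or one buried among many weighted factors, would be less visible.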

I guess I don't see that part. If the typo is in the number of convictions, wouldn't an interpretable model also be subject to that typo? An interpretable model would only consider number of convictions as one of the factors. So if you look at a model like one of the scoring models shown and there are 20-30 factors under consideration, the impact would not be any more apparent than it would be from reviewing the input data. Like if it said a person has zero convictions and allowed bail, but they had four convictions, it wouldn't be obvious from the result that there was a typo somewhere.

But it's required by law, at least in the EU. And what do you do when you can't really explain them? You BS people

This is dangerously close to trying to repurpose the term “black box” for irrational fearmongering.

The higher the stakes of the decision, the more incentive to approach it like a rational Bayesian agent who would use whatever tool has the best risk / reward tradeoff, totally regardless of explainability — or if “explainability” (which is not some universal concept, but instead differs hugely from situation to situation) is directly part of the objective, then its importance will be factored into the risk / reward tradeoff without any of this dressed up FUD language, and you might even have to pursue complex auxiliary models to get explainability in the primary models.

For a good example, consider large bureaucratic systems, like a military chain of command connected to a country’s political apparatus - the series of decisions routed through that is way too complex for a human to understand and it’s almost impossible to actually get access to the intermediate data about decision states flowing from A to B in, say, a decision to use a drone to assassinate someone and accidentally killing a civilian.

You could consider various legal frameworks or tax codes the same way.

What does “explainable” mean to these systems? A human can give an account of every decision junction, yet the total system is entirely inscrutable and not understandable, and has been for decades.

Turning this around on ML systems is just disingenuous, because there is no single notion of “explainable” — it’s some arbitrary political standard that applies selectively based on who can argue to be in control of what.

> The higher the stakes of the decision, the more incentive to approach it like a rational Bayesian agent who would use whatever tool has the best risk / reward tradeoff, totally regardless of explainability

These sorts of problems are mostly difficult precisely because we don't know the risk/reward tradeoff.

For example, in the prison system you need to evaluate the risk of recidivism. How exactly do you evaluate the tradeoff between releasing someone who is potentially violent vs. keeping someone potentially reformed behind bars? You'd need to weigh the dollar value of keeping them imprisoned and the personal harm to them and the rest of the prison population when they're imprisoned against the potential harm to anyone else if they're released... plus many other factors that I'm sure I've overlooked.

Anyone who tries to bake those tradeoffs into a "rational Bayesian agent model" will fail.

You're trying to describe the world in terms of game theory, but we live in a world where we usually don't know the risks, or the payoffs, or the rules of the game.

> “These sorts of problems are mostly difficult precisely because we don't know the risk/reward tradeoff.”

If you don’t know this trade-off, then you don’t have a decision problem.

Basically what you’re saying here doesn’t add up at all. Issues where an ML model is used but people want more “explainability” are places where we absolutely do know the risk / reward characteristics, and we’re specifically trying to measure more of the relationship between the internals of the decision process and those components of the risk / reward characteristics (for purposes of political arguments over who controls the process, dressed up as if they are scientific inquiry instead of politics).

If you did not even know what objective you are pursuing at all, you would not be using an ML model, and likely could not even identify what type of decision process you even need yet.

> If you don’t know this trade-off, then you don’t have a decision problem

Let me clarify what I mean.

In the cases I've worked with professionally (finance, autonomous driving), no-one gives you a reward function. You have a rough idea of the outcome you want, and you fiddle around with reward functions until your algorithm kinda-sorta delivers the outcomes you knew you wanted in the first place.

In this sense, the OP gets it precisely backwards. OP wants some sort of hyper-rational Bayesian model that optimises outcomes based on the one true loss function.

There is no correct reward function, and objective functions in general tend to be far too simple to capture the nuance one encounters in the real world.

> If you did not even know what objective you are pursuing at all, you would not be using an ML model...

It's not that you don't know at all, but you should be very clear that your objective is a fudge that probably doesn't capture what you really want. Maths is clean: the real world is messy.

> decision to use a drone to assassinate someone and accidentally killing a civilian.

So who's held responsible in this case? "Nobody" will not be an acceptable answer forever.

I think the prevailing political system is intentionally set up so that it is “nobody” or a low-level scapegoat. That’s the whole point of the system. Similar with corporate legal structure, corporate oversight and the way executive actors can avoid personal liability.

My overall point is that “explainability” is inherently subjective & situation-specific and whether a decision process “is explainable” has virtually nothing to do with the concept substrate it is made out of (e.g. “machine learning models” or “military chain of command” or “company policy” or “legal precedent” and so on...).

It’s about who successfully argues for control, nothing more.

The one who ordered the drone strike is responsible (but hardly held responsible). Easy to draw the parallel with ordering the deployment of anti-personnel landmines: the one who ordered deployment is responsible (but may not have signed the treaty, and thus, is hardly held responsible).

Autonomy or explainability is often a red herring. Look at who gives the orders. It is unlikely to ever be the programmer, even if they made a grave mistake. We have a history for that with smart rocket systems.

It’s been acceptable for a long time now.
