We don't do this for complex plans where we track multiple intermediate goals, not even on a piece of paper, and the gaps between what is written down are truly inscrutable.
That being said I have some sympathy for the technical portions of her argument (and for the rest too).
We make complex reasoned plans, like "how to make coffee": I'm going to need a filter in the pot, I'm going to need ground coffee, water, etc.; let's make sure the filter's there before I add the coffee, and both before the water. Then we use some sort of hierarchical planning heuristics to run it. If someone asks what I did, I can explain it at varying levels of resolution ("I made coffee", "I got these pieces together (xxx) and then made coffee..."), again depending on various "explanation heuristics" which we learned as part of being eusocial organisms.
These plans are complex (even "get out of bed and pee" is a pretty complex plan).
However, below that are a ton of decisions we aren't even aware we're making. I'm convinced (and my software reflects this) that the actual "plans" we make organically are very short, and that the interesting plans -- the ones we can talk about and that we typically care about -- are very abstract ones. Making coffee is super abstract, after all.
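To make the "varying levels of resolution" idea concrete, here's a minimal hypothetical sketch (the plan contents and the `explain` function are invented for illustration, not anything from the thread): a plan as a tree of abstract steps, where an explanation is just a read-out of that tree at a chosen depth.

```python
# Hypothetical sketch: a plan as a tree of abstract steps that expand
# into concrete ones; an "explanation" is a read-out at a chosen depth.
PLAN = {
    "make coffee": ["gather pieces", "brew"],
    "gather pieces": ["get filter", "get grounds", "get water"],
    "brew": ["insert filter", "add grounds", "add water", "start machine"],
}

def explain(step, depth):
    """Render a plan at a chosen level of abstraction."""
    if depth == 0 or step not in PLAN:
        return [step]
    out = []
    for sub in PLAN[step]:
        out.extend(explain(sub, depth - 1))
    return out

# explain("make coffee", 0) -> ["make coffee"]   (most abstract account)
# explain("make coffee", 1) -> ["gather pieces", "brew"]
```

Every level is a faithful description of the same plan; which one counts as "the" explanation depends on who's asking.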
Perhaps for an analogy: the extremely abstract reasoning for not using an antibiotic is at the level of "chemistry" while what I'm talking about is at the level of "physics".
Similarly, the output of an opaque and perhaps unreliable process could be not just an answer but also a justification. If the justification can be verified more easily than starting from scratch, we've made progress.
Even in the humanities there is a difference between argument from authority (blind trust) and argument based on reason and justification. And sometimes we have both. Court decisions include not just the decisions but also an explanation of why the court ruled that way. It certainly doesn't eliminate bad decisions, but it's useful.
So the discipline of being able to explain yourself seems pretty important. I don't think AI will be able to fully participate in high-stakes decision-making until it can do that.
Humans have a very good intuitive understanding of how other humans think, even when that thinking is irrational. This understanding has evolved over millions of years and even predates our species. We developed all kinds of mechanisms for controlling irrational behavior and have tested them for thousands of years.
Taking a new system of machine decisions and adding it to the mix sounds problematic.
I think this relates to the current discussion. You don't need an explainable model for what cereal you choose, but you need a 100% clear one on what tenants you choose, who you lend to, who you hire, etc.
I really dislike the way so many companies are pushing ML-based systems as some kind of scientific, perfect measure for difficult human problems like recruitment.
His talks, particularly about superintelligence and automation, are also relevant.
> when a new image needs to be evaluated, the network finds parts of the test image that are similar to the prototypical parts it learned during training
basically just waving away the complexity/black-box-ness of "computer vision" into the complexity/black-box-ness of "similarity".
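The move being criticized can be shown with a toy sketch (hypothetical code, not the paper's actual network; the two-dimensional "patches", the prototype names, and the log-distance similarity are all invented stand-ins): the explanation names which prototype a patch matched, but the prototype itself is still an opaque learned vector.

```python
import math

# Opaque learned vectors with friendly labels attached after the fact.
PROTOTYPES = {"beak-like": [1.0, 0.0], "wing-like": [0.0, 1.0]}

def similarity(patch, prototype):
    """Toy prototype-part similarity: large when the patch is close to
    the prototype in feature space (log-squashed inverse squared L2
    distance, in the spirit of prototype-part networks)."""
    d2 = sum((p - q) ** 2 for p, q in zip(patch, prototype))
    return math.log((d2 + 1) / (d2 + 1e-4))

def best_prototype(patch):
    """The 'explanation' is just the argmax over opaque vectors."""
    return max(PROTOTYPES, key=lambda name: similarity(patch, PROTOTYPES[name]))
```

Saying a patch "matched the beak-like prototype" sounds interpretable, but all the black-box work is hidden in how the feature space and the prototype vectors were learned.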
There's a difference between expert systems and black boxes.
Black boxes are problematic in domains where what matters isn't just your decision today, but the evolution of your decision making process.
Easy examples being finance, medicine, legal, education etc.
In these areas, when you are explicitly weighing competing interests/rights/harms, it's pretty important that you be able to explain your reasoning to another. So they can check it, so they can test it, and so they can apply it if it's good.
Not just because your decision could be wrong, but because the process by which we evolve our decisions is important (think precedent for law, blow-up analysis for finance, etc.).
If we want to push our understanding of a domain forward, black boxes populated with a lot of data aren't super helpful.
They are able to spot complex patterns, yes, many of which can be cleaned/restructured into simple patterns.
In reality most of the best uses of ML thus far have been either rapid screening/classification based on simple patterns (think OCR on a check: the character-recognizer engine in the machine isn't really teaching us much about language or typography, it's just processing existing patterns), or domains with extremely rigid game mechanics, where the rules never change but you can run a billion simulations (chess, go, video games, etc.).
Imho you won't always get an explainable model, because sometimes there may be too many factors that are predictive, but the effort is what's important.
>Explanations must be wrong. They cannot have perfect fidelity with respect to the original model. If the explanation was completely faithful to what the original model computes, the explanation would equal the original model, and one would not need the original model in the first place, only the explanation. (In other words, this is a case where the original model would be interpretable.) This leads to the danger that the explanation method can be an inaccurate representation of the original model in parts of the feature space.
This is such a succinct phrasing of what makes me so uncomfortable with these approximate explanations.
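The quote can be made concrete with a toy example (hypothetical code; the quadratic "black box" and its linear surrogate are invented): an explanation that is perfectly faithful at the point it was built around silently diverges elsewhere in the feature space.

```python
def black_box(x):
    # stand-in for an opaque model
    return x * x

def surrogate(x):
    # the "explanation": the tangent line to x^2 at x = 1
    return 2 * x - 1

def fidelity_gap(x):
    """Disagreement between the model and its explanation at x."""
    return abs(black_box(x) - surrogate(x))

# Faithful at the point being explained:   fidelity_gap(1.0) == 0.0
# Wrong far away from it:                  fidelity_gap(3.0) == 4.0 (9 vs 5)
```

If the surrogate agreed everywhere, it would simply be the model, which is the article's point.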
Hmm. But the ultimate goal isn't actually to find a model that makes good predictions on the training data. It's to find a model that makes good predictions on data that's ingested in the future when the algorithm is put to actual use. But the set of possible future data is infinite! (Or at least exponentially large, depending on whether the input fields have finite precision.)
Yet, without any kind of understanding of that transformation, we cannot reason about its properties in a meaningful way. This is the downfall of the current generation of successful ML models. We may not need an exact understanding of a training result; an approximate one may be enough, depending on the kind of insight that needs to be extracted.
My pipedream vision is that ML models are some day just mere tools that help design simpler models with formally guaranteed properties.
We ought to be able to do better than building our ML models as black boxes and saying "humans are black boxes too". We're engineering this technology, so we have an opportunity to make it inspectable. We should take it.
So humans explaining themselves is often misleading data, which may be worse than no data.
But regardless of that, while post-hoc explanations may not be that good for building predictive models of behavior, observed behavior is. We generally know how people behave, and we know the expected variance due to individual and situational circumstances. Truly unexpected behavior is rare in society, and we tend to filter it out. Truly unpredictable people get locked up and/or are given medical help, but even less unpredictable people are tested and kept out of professions and activities where that unpredictability could cause problems.
>Truly unpredictable people get locked up and/or are given medical help, but even less unpredictable people are tested and kept out of professions and activities where that unpredictability could cause problems.
Given our knowledge about e.g. narcissists or psychopaths, their behavior is not totally unpredictable. I guess a serial killer could be extremely predictable, but that's not something we want in society.
None of this applies to ML. ML models can fail in ways we can't easily predict, for reasons we don't expect. Viewed as minds, they run on a different architecture and different firmware than human minds. They're alien to us. Alien like a kitten who suddenly freaks out for no reason, like ants that can get stuck walking in a loop, except more so, because we've known cats and ants for as long as humanity has existed, and they're more similar to us than ML models are, hardware- and firmware-wise.
We can't project the same incentive structure onto software companies. They operate at a different scale than humans, and might be able to tolerate the possible hit to their reputation better than a human can. Their incentives are usually money-based rather than social-based.
And for the models themselves, they have no incentives, unless you count "reduce their error rate" or "be interesting enough for researchers to continue to research them". We barely know why they work. Our basis for trust is rather tenuous.
Also no mention of the "Unforeseen Attack Robustness" metric by OpenAI:
There are probably other publications for NLP models, all the big players are aware that explanation is key.
Current black-box AI does not learn such high-level logical relations, although it might predict motions most accurately.
High-level logical relations likely generalize to other domains. Low-level prediction models are sensitive to the distributions of the data and are hardly generalizable across domains.
Perhaps we need a hybrid system to combine both abstract logical reasoning and semantic tensor computations.
This probably needs legislative correction, because scrupulous data scientists are not in the driver's seat.
In some domains, with sufficient feature engineering you can get reasonable results with these approaches.
If you are going to limit models to 9 features, or 9 combinations of features, or 9 rules, models are not going to work as well. This just seems like a very weak argument to introduce.
"A black box model is either a function that is too complicated for any human to comprehend, or a function that is proprietary"
These seem like two very different things. A false equivalence is then made with many arguments against proprietary models being made as if they apply to complex models.
"It is a myth that there is necessarily a trade-off between accuracy and interpretability....However, this is often not true"
It's a bit of a straw man to suggest that there is "necessarily" a trade-off, as there are surely cases where there is not. One could say that there is "often" a trade-off between accuracy and interpretability. I don't think there are many people out there with the naive view that one should never do feature engineering.
"If the explanation was completely faithful to what the original model computes, the explanation would equal the original model"
This is just nonsensical to me. The idea is not to be completely faithful, but to raise things up to another level of abstraction. The series of videos from Wired comes to mind where a concept is explained at multiple levels. https://www.youtube.com/watch?v=OWJCfOvochA
"Black box models are often not compatible in situations where information outside the database needs to be combined with a risk assessment."
This is absolutely untrue, depending on one's definition of often. The output of a model, whether it falls into this incorrect definition of a model or not, can be treated as a feature in another model. Ensemble learning exists.
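A minimal sketch of that point (hypothetical code; the weights, the fixed risk score, and the `interview_score` field are all invented): the opaque model's output is just one feature in a simple, inspectable final rule that can fold in information from outside the database.

```python
def black_box_risk(record):
    # stand-in for an opaque model's risk score in [0, 1]
    return 0.7

def final_decision(record, outside_info):
    """Interpretable combination layer over the black box's output,
    mixing in information that was never in the original database."""
    combined = 0.6 * black_box_risk(record) + 0.4 * outside_info["interview_score"]
    return "flag" if combined > 0.5 else "pass"
```

This is stacking in miniature: the final layer is fully inspectable even though one of its inputs is not.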
COMPAS is the punching bag, but no one seems to know what it is. I haven't seen the evidence that its performance is equal to three if statements. It certainly doesn't have anything to say about machine learning in general, as it is a set of expert-designed rules. So it is actually the kind of algorithm the author favors, except proprietary.
"typographical errors seem to be common in computing COMPAS.... This, unfortunately, is a drawback of using black box models"
Unclear why typographical errors only affect black box models.
The BreezoMeter case is not clear evidence of anything broader either. It is unclear whether the one error noted out of millions of predictions is from bad source data. Stretching this to cover any sort of proprietary prediction, such as mortgage ratings, doesn't really tell us anything.
"Solving constrained problems is generally harder than solving unconstrained problems."
This doesn't make sense to me at all. All evidence is that ML works better on constrained problems.
The idea that CORELS is somehow better, even if it comes up with a ruleset of millions of rules, doesn't make sense. The proposed workaround for this "the model would contain an additional term only if this additional term reduced the error by at least 1%" could result in the failure to create a model if no one term provided 1% on its own.
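That failure mode is easy to demonstrate (hypothetical sketch; the term names and error reductions are invented): if no single term clears the 1% bar on its own, a greedy add-one-term-at-a-time rule builds an empty model even when two terms together would reduce error substantially.

```python
def greedy_select(terms, error_reduction, threshold=0.01):
    """Add a term only if it alone reduces error by at least threshold."""
    model = []
    for t in terms:
        if error_reduction(model, t) >= threshold:
            model.append(t)
    return model

def reduction(model, term):
    # Invented numbers: each term alone gives 0.8%; once one term is in,
    # the second would give 5%. Greedy never reaches that second branch.
    return 0.008 if not model else 0.05

# greedy_select(["x1", "x2"], reduction) -> []   (no model at all)
```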
Scoring systems are useful, but it's like a single-layer perceptron in the example provided. You need to consider combinations of factors to see their impact: a high X is bad, unless Y is also high and Z is low.
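As a toy contrast (hypothetical code; all thresholds and weights are invented): with one sign per factor, an additive score scores the "ok" case at least as high as the "bad" case, so a single cutoff cannot honor the exception.

```python
def additive_score(x, y, z):
    # one weight per factor, no interactions; higher = riskier
    return 3 * x + 1 * y - 1 * z

def interaction_rule(x, y, z):
    """High X is bad, unless Y is also high and Z is low."""
    if x > 0.5 and not (y > 0.5 and z < 0.5):
        return "bad"
    return "ok"

# interaction_rule(0.9, 0.9, 0.1) -> "ok"   (the exception applies)
# interaction_rule(0.9, 0.1, 0.1) -> "bad"  (high X, no exception)
# Yet with these weights the "ok" case gets the HIGHER additive score,
# so any single cutoff that flags the "bad" case also flags the "ok" one.
```

Capturing the exception needs either an explicit interaction term or weights whose signs no longer read as "this factor is good/bad".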
The fundamental problem here is trying to limit the power of the algorithm to the power of the human mind. From the very beginning we have used computers to do things that are difficult or impossible for us to do. In some cases the answers were provably correct, but we have now reached a point where computer generated proofs are accepted. The "oracle" mode of computing will compute an answer for us, but we ultimately have to choose whether or not to accept it on the basis of the evidence, much like we do with the opinion of an expert. Simple techniques such as providing one's own test data set, and the kind of analysis done by Tetlock around Superforecasters, can go a long way to building that understanding of accuracy of predictions, such that we have a guideline for evaluating algorithms that are beyond our ability to understand.
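The "provide one's own test data set" idea is simple to sketch (hypothetical code; the toy oracle and data are invented): treat the algorithm as an oracle and judge it purely by its track record on data the evaluator controls.

```python
def track_record(oracle, test_set):
    """Accuracy of an opaque predictor on evaluator-supplied data."""
    hits = sum(1 for x, y in test_set if oracle(x) == y)
    return hits / len(test_set)

opaque_oracle = lambda x: x % 2          # stand-in for a black box
test_set = [(1, 1), (2, 0), (3, 1), (4, 1)]

# track_record(opaque_oracle, test_set) -> 0.75
```

No understanding of the oracle's internals is needed; the evidence is the same kind we use to calibrate human forecasters.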
The higher the stakes of the decision, the more incentive to approach it like a rational Bayesian agent who would use whatever tool has the best risk / reward tradeoff, totally regardless of explainability — or if “explainability” (which is not some universal concept, but instead differs hugely from situation to situation) is directly part of the objective, then its importance will be factored into the risk / reward tradeoff without any of this dressed up FUD language, and you might even have to pursue complex auxiliary models to get explainability in the primary models.
For a good example, consider large bureaucratic systems, like a military chain of command connected to a country's political apparatus: the series of decisions routed through it is way too complex for a human to understand, and it's almost impossible to actually get access to the intermediate data about decision states flowing from A to B in, say, a decision to use a drone to assassinate someone that accidentally kills a civilian.
You could consider various legal frameworks or tax codes the same way.
What does “explainable” mean to these systems? A human can give an account of every decision junction, yet the total system is entirely inscrutable and not understandable, and has been for decades.
Turning this around on ML systems is just disingenuous, because there is no single notion of “explainable” — it’s some arbitrary political standard that applies selectively based on who can argue to be in control of what.
These sorts of problems are mostly difficult precisely because we don't know the risk/reward tradeoff.
For example, in the prison system you need to evaluate the risk of recidivism. How exactly do you evaluate the tradeoff between releasing someone who is potentially violent vs. keeping someone potentially reformed behind bars? You'd need to weigh the dollar value of keeping them imprisoned and the personal harm to them and the rest of the prison population when they're imprisoned against the potential harm to anyone else if they're released... plus many other factors that I'm sure I've overlooked.
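To see why this is hard, try writing the tradeoff down as an expected-cost rule (hypothetical sketch; every constant is an invented and contested value judgment, which is exactly the point):

```python
COST_IMPRISON = 1.0     # cost of a year of detention (made up)
COST_REOFFENSE = 5.0    # harm from a reoffense after release (made up)

def decision(p_reoffend):
    """Detain iff the expected cost of release exceeds detention cost."""
    expected_release = p_reoffend * COST_REOFFENSE
    return "detain" if expected_release > COST_IMPRISON else "release"

# decision(0.1) -> "release"   (expected cost 0.5 < 1.0)
# decision(0.5) -> "detain"    (expected cost 2.5 > 1.0)
```

The arithmetic is trivial; choosing the two constants and estimating `p_reoffend` is where all the real disagreement lives, and no model can settle that.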
Anyone who tries to bake those tradeoffs into a "rational Bayesian agent model" will fail.
You're trying to describe the world in terms of game theory, but we live in a world where we usually don't know the risks, or the payoffs, or the rules of the game.
If you don’t know this trade-off, then you don’t have a decision problem.
Basically what you’re saying here doesn’t add up at all. Issues where an ML model is used but people want more “explainability” are places where we absolutely do know the risk / reward characteristics, and we’re specifically trying to measure more of the relationship between the internals of the decision process and those components of the risk / reward characteristics (for purposes of political arguments over who controls the process, dressed up as if they are scientific inquiry instead of politics).
If you did not even know what objective you are pursuing at all, you would not be using an ML model, and likely could not even identify what type of decision process you even need yet.
Let me clarify what I mean.
In the cases I've worked with professionally (finance, autonomous driving), no one gives you a reward function. You have a rough idea of the outcome you want, and you fiddle around with reward functions until your algorithm kinda-sorta delivers the outcomes you knew you wanted in the first place.
In this sense, the OP gets it precisely backwards. OP wants some sort of hyper-rational Bayesian model that optimises outcomes based on the one true loss function.
There is no correct reward function, and objective functions in general tend to be far too simple to capture the nuance one encounters in the real world.
> If you did not even know what objective you are pursuing at all, you would not be using an ML model...
It's not that you don't know at all, but you should be very clear that your objective is a fudge that probably doesn't capture what you really want. Maths is clean: the real world is messy.
So who's held responsible in this case? "Nobody" will not be an acceptable answer forever.
My overall point is that “explainability” is inherently subjective & situation-specific and whether a decision process “is explainable” has virtually nothing to do with the concept substrate it is made out of (e.g. “machine learning models” or “military chain of command” or “company policy” or “legal precedent” and so on...).
It’s about who successfully argues for control, nothing more.
Autonomy or explainability is often a red herring. Look at who gives the orders. It is unlikely to ever be the programmer, even if they made a grave mistake. We have a history for that with smart rocket systems.