We can't "explain" how we recognize a particular person's face, for example.
Robust pattern recognition based on 100+'s of factors is just inherently black box.
Even when people make decisions, they're generally a weighted combination of a bunch of "feelings". We attempt to explain a kind of simple, logical rationalization afterwards, but psychologists have shown that this is often a bunch of post-hoc fiction.
Black box doesn't necessarily mean bad. I think the relevant question is: how do we ensure machine learning is trained and verified in ways that don't encode bias, and only used with "intended" inputs, so a model isn't being trusted to make predictions on things it wasn't trained for?
And also: when do we want to decide to use alternatives like decision trees instead -- because despite being less accurate, their designers can be held legally accountable and liable, which can be more important in certain situations?
This is one of the reasons the scientific method blossomed: objectivity, rigor, transparency, reproducibility, etc. Black boxes can lead to bad decisions because it's difficult to question the process leading to decisions or highlight flaws in conclusions. When a model is developed with more rigor, it can be openly critiqued.
Instead, we have models running across such massive datasets with so many degrees of freedom that we have no feasible way of isolating problems when we see or suspect certain conclusions are amiss. Instead, we throw more data at it or train the model around those edge cases.
To be clear, I'm not saying ANN/DNN models are bad, just that we need to understand what we're getting into when we use them and recognize effects that unknown error bounds may cause.
If, when the model fails to correctly classify a new data point, the result is your photo editing tool can't remove a background person properly... then so be it, no harm no foul. If the result is that the algorithm classified a face incorrectly with high certainty and lands someone in prison as a result (we're not there, yet) then we need to understand our method has potential unknown flaws and we should proceed carefully before jumping to conclusions.
The industry needs to stop misusing the term bias this way. Virtually every attempt to find this supposed human bias has failed. Latest public example was Amazon and hiring
Bias is the tendency to consistently misclassify towards a certain class, or to consistently over- or under-estimate.
Somehow the term has been hijacked to mean 'discriminate on factors that are politically incorrect'. You can have a super racist model that's bias-free, and most models blinded to protected factors are in fact statistically biased.
It's not constructive to conflate actual bias with political incorrectness.
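For concreteness, here's a minimal sketch (with invented numbers) of bias in that statistical sense: a predictor that consistently overshoots has a nonzero mean residual, with no protected attribute anywhere in sight.

```python
# Toy illustration of statistical bias: a predictor that systematically
# overshoots has a nonzero mean residual. The numbers are made up.
true_vals = [10, 12, 9, 11, 10, 13]
preds     = [12, 14, 11, 13, 12, 15]  # every prediction is +2 too high

residuals = [p - t for t, p in zip(true_vals, preds)]
bias = sum(residuals) / len(residuals)
print(bias)  # 2.0 -- consistent overestimation, i.e. statistical bias
```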
Operational decision making, whether AI or human or statistical, faces an inherent trilemma: it's impossible to simultaneously treat everyone the same way, to have a useful model, and to have uniformly distributed 'bias'-free outcomes. At best a model can strive to achieve two of these factors.
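That trilemma can be made concrete with confusion-matrix arithmetic. Assuming a simple binary classifier and two groups that differ only in base rate (the numbers below are illustrative), holding recall and precision equal across groups forces unequal false-positive rates:

```python
# Confusion-matrix identity: FPR = TPR * prev/(1-prev) * (1-PPV)/PPV.
# Hold recall (TPR) and precision (PPV) equal across two groups with
# different base rates, and the false-positive rates cannot match.
def fpr(tpr, ppv, prevalence):
    return tpr * (prevalence / (1 - prevalence)) * ((1 - ppv) / ppv)

tpr, ppv = 0.8, 0.7            # same recall and precision for both groups
fpr_a = fpr(tpr, ppv, 0.5)     # group A: 50% base rate
fpr_b = fpr(tpr, ppv, 0.2)     # group B: 20% base rate
print(round(fpr_a, 3), round(fpr_b, 3))  # 0.343 vs 0.086
```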
Neither is wrong, but insisting that a naming clash carries any substantive significance on an underlying issue is just silly. Similarly, insisting that nonmathematicians should stop using a certain word unless they use it how mathematicians use it is a tad ridiculous.
If anything, it's more reasonable for mathematicians to change their language. After all, their intended meaning is far less commonly understood.
Amazon's problem is a bug; they even describe its nature. And given how flawed their recommendation algorithms are, it's especially unreasonable to assume this one is infallible.
So that linked Reuters article does not show a failure to find bias; if anything, it shows a design error.
The data says the criterion is an eigenvalue of the problem, and no matter how hard Amazon tried to blind the solution to that eigenvalue, the ML system kept finding ways to infer it because it was that strongly correlated with the fitness function.
This is the difference between political newspeak '''bias''' and actual bias. Amazon scrapped the model despite it performing just fine and being bias-free, because it kept finding ways to discriminate on a protected attribute which is a PR nightmare in the age of political outrage cancel culture. It's fine to explicitly decide that some attributes should not be discriminated upon, but this comes with a cost either in terms of model utility or in terms of discrimination against other demographics. There's no way around this. In designing operational decision making systems, one must explicitly choose a victim demographic or not to implement the system at all. There's no everyone-wins scenario.
The harm of the newspeak version of '''bias''' is that it misleads people into thinking that making system inputs uniform somehow makes it bias-free when the opposite is typically true. Worse, it creates the impression that some kind of magical bias-free system can exist where everyone is treated fairly, even though we've formally demonstrated that to be false.
No amount of white-boxing or model transparency will get around this trilemma. The sooner the industry comes to grips with it and learns to explicitly wield it when required, the better.
Agreed. The optima of multiple criteria will essentially never intersect.
But for Amazon, there is no evidence the tool was accurately selecting the best candidates. They themselves never said it was. After all, altering word choices in a trivial way dramatically affects ranking. On the points you mention, why should we assume their data was relevant or their fitness function even doing what they thought? If they were naive enough, they could just be building something that predicts what the staff of an e-commerce monopoly in a parallel universe will look like.
The most likely story is that they failed at what they were doing. Part of that failure happened to be controversial and so got unwanted attention. I would guess there were quite a few incredible correlations the tool "discovered" that did not get to press.
At any rate, their recommendation engine is more important and has been worked on longer yet it is conspicuously flawed. When their recommendation tool inspires awe then maybe we could take their recruiting engine seriously enough to imagine it has found deep socio-psychological-genetic secrets.
Of course, black box AI itself is not the right solution. As more and more cross-domain, multitask settings emerge, open-box AI will gradually take off. It's about compositional capability, like functors and monads in functional languages. Explainable or not is just a communication problem, which runs parallel to the ultimate intelligence problem. It's very possible that human intelligence is bounded.
How do we "know" we see red and not green or vice versa? How do we "know" we are feeling heat or coldness? We just do. We feel different things in different, yet recognizable ways.
And these "feelings" are multi-dimensional: feeling heat and feeling color are not on the same axis. Red and green seem to be different points on the same axis, while hot and cold are points on some other axis, a different dimension.
How many different dimensions of "feelings" are there? Note they are not the same as different senses. There can be many different feelings associated with things we see and recognize that way: colors, and shapes and lightness vs. darkness etc.
It's the decision making that's complicated.
To make a decision on whether it would like chocolate ice cream or not (before trying it), a brain will take all the data gathered over time and cross-reference it with related data (does chocolate taste good alone? did I like it? it looked rather inedible; was it high cocoa content or not? chocolate yoghurt seems similar, but I did not like it; I do like cold, but only when it's hot outside; my friend says chocolate ice cream tastes good; my other friend says it tastes bad; trust or discard opinions? and much more).
All within minutes, seconds even.
An AGI would need an absolutely massive database of knowledge similar to the one acquired by a human over the first 20-30 years of their life, if it's to be truly general.
It would also need bias correction, and maybe 2-5 other AGIs to form a consensus before making any decision properly.
And even that would not be enough to fit in with humans - if you want an AGI to make the objectively right decisions, it will have to ignore many human feelings/biases.
If you want it to fit in with people, it will have to make "mistakes", i.e. ineffective decisions.
It can measure them and process it as a number, but when does sensing come into play? Does a red sheet of paper recorded by a simple camera evoke the same "feeling" of redness inside the camera that it evokes in a human being?
The flippant answer is "of course not, the camera has no consciousness/perception". The question is then how and when this feeling of "redness" gets created and what are the necessary conditions for it to happen.
That would come from learned data, experience, imo. A child who's never touched a red hot electric stove, for example, would not have any bias towards it. No fear, no love, unless they previously interacted with one.
They would try and get close, watch it, smell it, touch it, to learn more.
The most interesting part of consciousness is how does one decide based on incomplete information? If you need to learn more, where do you find the raw information? It seems to be done subconsciously, some people have a higher affinity for learning/fact-finding than others. But all people are pre-wired to learn from others, distributed computing works best heh.
I guess you're asking the same question as me, where is this "programming" and how is it created? It's not just raw experience, it seems to be genetic/evolutionary. A sort of basic firmware to bootstrap further learning.
I find it fascinating, it would seem the brain never stops processing the data it acquires. During the day, and during the night, it always runs learning jobs.
When you're looking at a red sheet of paper, reading the input in the form of a signal is not the only thing happening. The signal is definitely read and passed along to other brain circuitry for processing, but somehow another thing happens: the experience of red (which is what the other poster called "feeling"). This experience of red is fundamentally different from the experience of green or blue or from the complete absence of looking. This experience is what I was referring to.
This experience seems to happen without any prior knowledge or exposure to the color red. The first time you stumble upon red light, this experience of red arises. You can tell red from green by the marked difference in experiences, but you cannot explain this difference in words to someone who hasn't experienced red or green themselves.
How does this experience get created? Does any sensor (such as a camera) have such experiences? Why not?
Yes. And the models developed like this haven't solved the problems the current black box models have.
> we have models running across such massive datasets with so many degrees of freedom that we have no feasible way of isolating problems when we see or suspect certain conclusions are amiss. Instead, we throw more data at it or train the model around those edge cases.
There are ways being studied to check for sensitivity to parameters, biases, etc. But in the end, reality is difficult and there's no way of dealing with that, or of "getting the right answer" every time.
The question of why we would choose a black box model over something interpretable that works equally or nearly as well still needs to be asked, especially if there is a misconception of black box models working better than interpretable models due to current trends and the hype cycle.
> held legally accountable
I think this misses the point. It is not about accountability but about getting the right results. The right question is where we need an interpretable model to eliminate biases or otherwise satisfy ethical requirements. Often you can't know whether a decision was ethical without knowing how it was reached. Would you accept a trial verdict given by AI? What about a lending decision, or a job promotion? If you need to justify a decision and not merely reach a decision, you need something interpretable.
Why? Human intelligence evolved over millions of years. And we're struggling to keep our changes to the world from killing us.
Political systems are also designed. If human behavior is in danger of killing us, we must look to the political systems we have created and solve the problems there.
Or, more commonly, people work to ensure their machine learning is trained and limited to not specifically notice any patterns of fact that would be politically unacceptable to notice and make use of, even if it was rational and useful for the task at hand.
The status quo moral system of our society rests on a collection of objectively false beliefs about the physical world. Most humans know enough to avoid noticing these facts because of the social cost and possible societal negative impacts of shaking those moral foundations. It's not even necessarily wrong, since agreed-upon lies can do a lot to reduce conflict and cruelty. But machines need to be specifically trained to not notice these facts. You need to teach the machine not to notice the emperor's nudity.
And what's even better is that this'll never get acknowledged in the literature. It's one of those self-hiding facts which exists if you're willing to notice it yourself, but which no authority will ever tell you (I love these, wish there was a word for them).
Mammalian pattern recognition is fairly well understood. We have a good understanding of the information path from the eye to V1 in most model animals. So sure are we of this pathway that we've successfully implanted chips into blind patients and restored sight (results vary a LOT, though).
I know this may not fit pattern recognition in a super-philosophical sense, but damned if it's not a ways down that road. We're literally shocking a person's brain so that they 'see' again (!!!). How V1 feeds into the rest of the brain is VERY active research and has had a lot of successes, but we're a long ways from a 'black box' these days.
Honestly, the auditory pathways are a lot better understood, as they seem to be 'older' and specific tones are represented physically in the mammalian brain (tonotopy). As such we know a lot better (relative to sight, smell, taste, etc) how tones are encoded and how they elicit specific recognitions and responses (this work is NOT straightforward).
In Neuroscience/Biochem, we are getting a LOT closer, but yes, we are not there yet by any means. The 'black box' idea of the mammalian brain is not going to last another 500 years, likely. We're marching along with good time as a species.
That said, wait, what? We can't do the back-propagation calculus in a NN? When has that been true and for how long? I thought it was fairly straightforward to know the weights of the connections of the nodes in your network. It's just a tensor you grep for, right?
Edit: I guess you were referring to this:
> Robust pattern recognition based on 100+'s of factors is just inherently black box.
I think he meant " ... given our current knowledge."
Sure, they say "Oops! It was an algorithm, we didn't do anything!", but is that any different than management saying "Oops! It was just a rogue employee, we didn't do anything!" For anything sufficiently consequential or systematic the second excuse doesn't work, so why should the first excuse work?
That's what individuals within companies do all the time.
You're onto something, form a limited liability company for your AIs and hire them as contractors. There, a clear assigned accountable entity :)
Agreed. But as you note, even though humans are basically black boxes we can ask them questions in order to find out how they came to a particular conclusion. (How reliable the answers to these questions are is of course a different matter.)
So maybe we don't necessarily need fully interpretable models but simply a way to ask black-box models specific questions about their state, e.g., "To what degree does a person's age influence the output?".
No, you can't. If somebody treats you with suspicion, it's because of a combination of their news intake, their culture, local events, what their friends and family would think, the way you present yourself, and many other factors. You can always ask somebody to state their reason as a simple "if-then" statement, and they can make one up on the spot, but it'll be so oversimplified that it's basically a lie.
> So maybe we don't necessarily need fully interpretable models but simply a way to ask black-box models specific questions about their state, e.g., "To what degree does a person's age influence the output?".
You can already do that. Just change that number in the input and see how the output changes. To that extent, even the most black box AI model is more transparent than human decision making.
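A minimal sketch of that probe, using an invented stand-in scorer (the feature names and weights are hypothetical, not any real model):

```python
# Stand-in "black box": a hidden linear scorer. The feature names and
# weights are invented for illustration, not any real model.
def model(features):
    return 0.8 * features["income"] - 0.5 * features["age"] + 0.1

base = {"income": 3.0, "age": 40.0}
baseline = model(base)

# Probe: vary one input, hold the rest fixed, watch the output move.
for delta in (-10, -5, 5, 10):
    probe = dict(base, age=base["age"] + delta)
    print(delta, round(model(probe) - baseline, 2))
```

The same probing works on any model you can call repeatedly, which is exactly the transparency advantage being claimed over human decision making.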
Well, I guess it depends on how self-aware a person is. I think the biggest danger is trying to rationally explain your decision when in fact it was based mostly on your feelings, in which case I agree that the explanation is "basically a lie". One needs to be honest when something is not based on a fact but on a feeling to prevent pointless discussions. (If I hold an opinion based on a feeling then you cannot convince me that I am wrong by giving me facts.)
> You can already do that. Just change that number in the input and see how the output changes.
Makes sense. But I guess transparent models would still be generally preferable because you can fully understand how the output is produced, whereas in black-box models you might have to ask quite a lot of questions to get a feeling for it, but even then you can't be sure that you have a full understanding of it.
You punctuate the second sentence as though it were of secondary importance. But in many cases, we have little ability to figure out how we came to a conclusion, while being much better at fabricating plausible and politically acceptable answers. I put it to you that having questions answered with plausible fabrications is actually a significantly worse situation than not yet being able to ask the questions at all. At least in the latter situation, we know what we need to be working on.
Hmm, to me it feels like I can explain the reasons why I came to a conclusion in many (but certainly not all) cases. You "just" need to clearly identify your feelings and emotions and separate them from your rational arguments.
Anyway, these are our own shortcomings and of course don't have to be adopted by any artificially built black-box model.
Maybe the next direction in AI will be to bridge the gap between expert systems and black box models?
For the most basic cognitive tasks, we typically can't.
If you show someone a picture of a cat and a dog, they can easily recognize which is which.
If you then ask, "but how do you know?", I don't think most people could say anything useful. If it looks like a cat and it walks like a cat...
Epistemology and phenomenology are philosophical fields that deal with this. By no means is it a solved problem.
At each synapse a significant amount of data processing occurs, right from the rod and cone cells in the retina and at nearly every synapse along the way. 'Data' is more or less conserved into V1, where all hell seems to break loose, path-wise. After V1 and in nearly the entire cortex, these data are distributed in unique and very complicated ways throughout the brain. Most of the mammalian eye-V1 pathway is well understood, but that process is too much for a comment on HN. But, I want to stress, we have a very very good idea of what is going on.
That said, the research seems to indicate (highly debated still) that there are specific synapses that encode 'the left side of grandma's nose in dim lighting from 10 feet away' and many other things. Again, I cannot stress enough that such 'grandma' synapses are not firm science and there is a LOT of research still ongoing. But the evidence seems to be pointing that way at this time. Such cells are fed this information from V1 and, likely, a lot of other places. When such 'grandma' cells fire, they then send that signal out to other cells they may be connected to. Such a system is likely replicated many times all over the brain; there are many copies of 'grandma' cells and they are wired up in many different ways. The physical location of 'grandma' cells is highly unique, if they exist at all. Again, research is very much ongoing.
I think the article tries to compare black box learned algorithms to explicit algorithms. Humans being a blackbox is more of a philosophical issue from that perspective.
Isn’t the first sentence just an example of the second?
That's not the same question as whether there were feelings involved, or whether the process was conscious.
Sure, but that doesn't mean "a weighted combination of feelings" is an accurate description of the true process. That's tendentious and speculative. What I think we can be sure of is that there are parts of our decision making that we are not consciously aware of, but precisely because of that, one should be cautious of assuming how it works even if you're not doing it by reflex. Psychological explanations are the rationalizations that people make after the fact.
http://www.theunconsciousconsumer.com/consumer-psychology/20... (regrettably thin on citations)
If you look at the physical world with objects that have relationships and characteristics, and describe a mental world that also has discrete entities that interact in designated ways, then I think you are reasoning by (unfounded) analogy.
Does it matter? If a brain constructs a post-hoc argument for a given decision and the argument is defensible against other arguments, then the argument itself is good enough substantiation of the decision, even if it differs from the actual decision process.
Are you implying you cannot be held accountable for black box models?
The argument being made is that even though the demonetisation of certain classifications of content is immoral (e.g. demonetising any and all LGBT content, even when family friendly), it is tacitly being allowed to happen due to YouTube's stance that "the algorithm" made the decision, not them.
Which may then lead them to a position of "supporting" LGBT people as a company while simultaneously demonetising said content, content which could hurt their business in countries actively hostile to LGBT people.
This isn't proven, but it's an example of how it could be used to absolve yourself of blame and point it at an inscrutable system.
AI systems aren't activating themselves, they're being used because a chain of people are authorizing them based on promises made by other people. So if an AI system in a courtroom wrongly jails people, you can still hold judges accountable for using the system. You can still hold manufacturers and businesses accountable for promising a degree of accuracy that their system couldn't meet.
In the same way, if my dog bites someone, and I get sued and argue, "animal motivation is really hard, we don't know why the dog chose to do that", I'm not going to win the case. The reason it happened is because I didn't leash my dog.
The reason the black box hurt someone is because an accountable human improperly used it or inaccurately advertised it.
A few commenters here posit that without a black box, designers might be held accountable. If there's a system where software manufacturers would be held accountable for bugs, is any judge going to say, "well, the software is bugged, but they don't know how to fix the bugs, so they're off the hook"?
If there's proof that Twizzlers are poisonous and kill people, the Twizzlers manufacturer can't say, "but we don't know why, we just threw a bunch of chemicals in a vat at random without writing down the labels. So therefore, it's not our fault."
I don't know, can they? I'm not a lawyer, maybe everything in our legal system is way more horrifyingly broken than I assume.
However, in this article and elsewhere Professor Rudin has cited compelling evidence of cases in which black box models have been demonstrated to be no more accurate than interpretable alternatives. I feel this fairly justifies the question in the title of the article. For example, based upon available evidence, it appears reasonable that some onus should lie on the creators and buyers of COMPAS (a proprietary black box recidivism model) to demonstrate COMPAS actually is more accurate than an interpretable baseline. While it may not be the case, as the article seems to suggest, that in all modeling cases there is an interpretable alternative with comparable accuracy, in cases where there is, there doesn't seem to be any justification for using a black box model.
On the matter of "human-style" interpretability, we are brought to the difference between "interpretability" and "explainability." Humans have a complex capacity for constructing explanations for the thoughts and actions of ourselves and others (among other things). As OP points out, a lot of famous psychological experiments by Kahneman and others have shown how much of our reasoning appears to be post-hoc, often biased, and often inaccurate (in other words, human explanations are not actually true transparent interpretations of our thoughts and actions). However, we humans do have a powerful capacity to evaluate and challenge the explanations presented by others, and we are able to reject bad explanations. For those interested, a great book on this topic is "The Enigma of Reason" by Mercier and Sperber (https://www.hup.harvard.edu/catalog.php?isbn=9780674237827), but the gist here is that we must understand that while explanations are not the same as transparent interpretability, they are still useful.
I would conjecture that at some level of complexity (which some predictive tasks like pixel-to-label image recognition seem to exhibit), true end-to-end interpretability is not possible -- the best we can do is to construct an explanation. However, two very important points should be observed when considering this conjecture:
1. (Professor Rudin's point in the article) In cases which are not too complex for interpretable models to achieve comparable accuracy to black-box models, we can and should use them, as they offer super-human transparency at no cost in accuracy.
2. Constructing no explanations (or bad explanations) is not the same as reaching the same level of semi-transparency that humans offer. If we want to use human interpretability as a benchmark, black box models with no explanations are not up to par.
I don't think it's fair to describe CNNs and human vision as being black boxes in the same way.
In my eyes this is very similar to our lack of understanding about the inner workings of a CNN, or human vision. We understand the input and the output and their rough relationship, we just don't understand how the move from input to output happens.
The distinction I would make is that even before Newton, people could make useful predictions about the trajectory of falling objects. My dog does it well, every time he catches a ball. He even seems to grok the effect of wind on a frisbee.
We have no way of predicting how a CNN will react to an input. GANs really illustrate this point. I think this is an important distinction.
Decision tree ensembles are black box models and are highly accurate. (Think XGBoost, LightGBM, Random Forest, etc.)
Not sure what point you are trying to make here.
From https://www.fico.com/en/newsroom/fico-announces-winners-of-i...: "The team representing Duke University, which included Chaofan Chen, Kangcheng Lin, Cynthia Rudin, Yaron Shaposhnik, Sijia Wang and Tong Wang, received the FICO Recognition Award acknowledging their submission for going above and beyond expectations with a fully transparent global model and a user-friendly dashboard to allow users to explore the global model and its explanations. The Duke team took home $3,000."
Cynthia Rudin is one of the article's authors.
IBM turned the model/paper into a toolkit: https://www.ibm.com/blogs/research/2019/08/ai-explainability... Their model seems to be a variant of decision trees that has a knob controlling how complicated the trees are.
And the evaluation was completely subjective, so there's not any meaning to the Duke people losing besides that the judges didn't like them.
That's what you get if you use a black box for judging :-)
Objectivity is more accurate, sure. The winner of an objective contest is always objectively better against objective criteria. But, objective criteria are generally narrow. This works well if one is either (a) seeking fundamental principles like in physics or (b) the narrow objective criteria is the definite goal.
In this area, we don't exactly know how to define narrow, objective goals and subsequent criteria. We can define goalposts, but not goals. These are guesses at useful markers of success, useful to the larger goal of useful/novel AI.
Subjective goals have their own (massive) problems, but since we can't objectively define the goals of AI research... we need to fall back on human subjectivity to define our subgoals.
I think we definitely need more R&D dedicated to creating easier-to-use, lower-cost approaches to transparent, explainable ML. There is way too much effort devoted to blackbox R&D today. Ultimately, transparent, explainable ML should almost always beat blackbox ML due to a better ability to find and resolve hidden problems, such as dataset biases, that may be holding back performance.
Other narratives can also creep in, like "once we get enough data...", which for some problems may as well be never.
Also, for some systems, perturbing the system changes the problem, meaning your dataset and model may not reflect changes in a dynamic/complex system.
Also, blackbox ML approaches struggle to combine data from multiple modalities which is often relevant to many real world problems (at least with the majority of algorithms which are realistically implementable off the shelf).
I think that's kinda true but kinda false. It's true that deep learning often makes feature engineering moot. However, a lot of deep learning projects take machine learning engineers and applied scientists along with a host of other support engineers, plus hardware costs (I've seen people at work say: I only used 16 GPU instances over a couple weeks of training).
Meanwhile, I consider gradient boosting fairly interpretable, and it can get pretty close results with a lot less tweaking and training time. If you want to go full non-black-box, logistic regression with L1 penalization and only a little bit of feature engineering often does really well, probably with a lot less development time/cost compared to those high-cost PhD research scientists.
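As a rough illustration of why the L1 route stays readable, here is a pure-NumPy proximal-gradient (ISTA) sketch on synthetic data; the data, step size, and penalty are all invented:

```python
import numpy as np

# Rough sketch: L1-penalized logistic regression fit by proximal gradient
# descent (ISTA). Only feature 0 drives the label, so the L1 penalty
# should shrink the other weights to (near) zero -- which is what keeps
# the fitted model easy to read off.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(float)

w, lr, lam = np.zeros(5), 0.1, 0.02
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))                        # predicted probabilities
    w = w - lr * (X.T @ (p - y) / len(y))                   # gradient step
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # L1 soft-threshold

print(np.round(w, 2))  # feature 0 dominates; the rest are ~0
```

The sparse weight vector is the whole explanation: one nonzero coefficient per relevant feature, sign and magnitude included.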
DL's reduction of the required feature/model engineering is a big deal for difficult problem domains where the cost of a mistake is low. That doesn't mean you don't still benefit from adding development resources, it's just that the development cost/performance tradeoff is still typically better than with a similar-performing explainable solution. I hope this will change in the future.
> ..., logistic regression with l1 penalization and only a little bit of feature engineering often does really well,...
While I agree that Lasso is far more explainable than DL, its explainability rapidly degrades as the useful feature dimension increases and it requires significant feature engineering for good performance on difficult problems.
This is not always true.
Many problems are modeled very well with less than 100 parameters and adding more is of little-to-no benefit.
Many problems are naturally hierarchical such that simple models can be combined to yield a large number of explainable parameters. If done well, this can result in a high-performing solution. Admittedly, this is usually harder than just applying a blackbox.
In critical applications, an explainable model with benign failure modes (even if it has worse overall performance), can be far preferable to a blackbox with wildly unpredictable failure modes. From a utility standpoint, the explainable results are better.
> There's a limit to how much humans can understand. We use machine learning to surpass that limit
We can also work to improve our ability to discover and understand. I think that holds far more promise than improving our ability to do things we don't understand.
Even in computer vision, which is where I think they've been most successful, the visualization techniques used seem more suggestive than explanatory.
Well, that and a decision tree isn't any more interpretable if its nodes are all filled with `Pixel[i,j] > Threshold` -- or is the idea that you would somehow extract logical predicates by tracing paths through the tree or glean interpretation from those predicates?
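It's easy to see that concretely: a shallow tree on raw pixels produces splits that are technically "transparent" yet tell a human nothing (sketch using scikit-learn's 8x8 digits; the depth is arbitrary):

```python
# Sketch: a shallow decision tree on raw pixel intensities.
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_digits(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Every rule reads like "pixel_42 <= 7.50" -- a fully traceable path
# through the tree, but not a meaningful explanation of "this is a 3".
rules = export_text(tree, feature_names=[f"pixel_{i}" for i in range(64)])
print(rules)
```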
In practice the answer is a massive “no” so far. Some of the least interpretable models I’ve had the misfortune to deal with in practice are misspecified linear regression models, especially when non-linearities in the true covariate relationships cause linear models to give wildly misleading statistical-significance outputs and classical model fitting leads to estimating coefficients of the wrong sign.
Real interpretability is not a property of the mechanism of the model, but rather consistent understanding of the data generating process.
Unfortunately, people like to conflate the mechanism of the model for some notion of “explainability” because it’s politically convenient and susceptible to arguments from authority (if you control the subjective standards of “explainability”).
If your model does not adequately predict the data generating process, then your model absolutely does not explain it or articulate its inner working.
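A minimal illustration of the wrong-sign failure mode, with made-up data: fit a straight line to a U-shaped relationship, and OLS confidently reports a negative slope even though increasing x increases y over much of the range.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 1, size=2000)
y = x**2 + rng.normal(scale=0.1, size=x.size)  # true relation is U-shaped

# OLS slope of the (misspecified) linear model y ~ a + b*x
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(f"fitted slope: {slope:.2f}")  # strongly negative, near -1
```

The coefficient is "interpretable" in the mechanical sense, but it describes a model of the data-generating process that is simply wrong.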
That's a very dynamicist viewpoint. I don't necessarily disagree.
However, in what sense do the prototypical deep learning models predict the data generating process?
I tend to agree that a lot of work with "interpretable" in the title is horseshit and misses the forest for the trees.
But it's also unclear to me how to get a decision tree to perform as well on image recognition tasks the same way that a CNN does. (Of course, as you mention, the CNN will likely face adversarial examples.)
It's very similar to the argument kids make: "Why do I have to learn how to do long division when I have a calculator?" or "Why do I need to show my work on the exam if I got the right answers?"
I tend to agree with the article's premise that any model being used in critical decision making should at the very least have a list of parameters and the weight given to each, no matter how long and complicated such a list might be.
I think eventually this will be the end result of most critical implementations of machine learning applications as they make their way through the courts, as I can't imagine a judge accepting the argument "The machine made a mistake, we are not sure why or the reason why is proprietary, but we are not responsible because the machine made the error".
I once worked at a data science consultancy. They had a project where they ended up using a random forest. But they also ran a simple single-layer net to validate the results, scoring similar accuracy (around 70%). They gave a presentation one Friday to the company (in a regular show-and-tell demo slot).
At the end, I asked them if they double checked whether the net weights corresponded to the random forest and almost everyone in the room just looked at me like I was an idiot.
I ended up causing a heated discussion in the pub later on because I’d actually done something similar for my masters thesis (it was for modelling XOR operations on binary inputs).
The company in question is notoriously short term focussed, so it shouldn’t really have been a surprise that they didn’t even try to think about it.
And that’s kind of the problem. Short term speculative projects don’t care about interpretability. A director at some clothing firm doesn’t care about the super nerdy maths, they like getting their numbers.
But, on the flip side, explaining loss functions and backprop is also a no-go for the directors. If you can’t help them understand an algorithm in two sentences, they often don’t buy it. So then explainability becomes a big thing (hence why the random forest was implemented and not the net).
It makes me think about the mess the derivatives markets made in 2008
Humans are far more ‘black box’ than neural nets, we just happen to be able to construct plausible explanations in parallel.
> the scary thing is when decisions are being made that will affect a persons life but will never make it to the courts. Like the example of predicting loan defaults or parole releases, those decisions are made with little to no explination.
I was pointing out that it’s already like this. Whether it’s a black box ML model or a capricious bank employee makes no difference if there’s no transparency either way.
There's reality, there's legal reality, and then there's what legal reality does to reality.
Exactly. We included mathematicians, computer scientists and engineers in the design of those calculators. They use algorithms backed by rigorous proofs, run upon circuits with rigorous tolerances such that the answer is correct (to a reasonable amount of precision). Kids with their calculators are intellectually lazy -- we should expect that, but continue to teach otherwise. To see this from full-grown adults in academia is an embarrassment. But, god, would I enjoy a ML salary...
For instance, when a model starts being overfitted, one can say it's pretty much converging to something akin to a LUT (lookup table). While overfitting is generally undesirable, it still might be interesting, principally if someone could figure out how it is basically indexing the data. Perhaps you could create a simpler hash function, or find some rules for creating useful hash functions for that type of data.
I also don't see why you can't create a "function", hash or otherwise, that works for thousands of dimensions. The model itself, if it were perfect, would be an equation... the number of dimensions really doesn't matter that much.
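One concrete sense in which an overfit model is "converging to a LUT": a 1-nearest-neighbor classifier literally stores the training set and indexes into it at prediction time (sketch; digits dataset is just a convenient stand-in):

```python
# An overfit model behaving like a lookup table: 1-NN memorizes everything.
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
lut = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Near-perfect on training data, because it is pure memorization;
# generalization depends entirely on new points landing near stored ones.
print(lut.score(X, y))
```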
Sorry, I'm just real confused by your statement.
Almost every image the car sees is going to be different. I'd guess that's why GP said that it doesn't make sense to use a hashing function - there's little value in mapping inputs to results, because your input images are pretty much always going to be different. So pretty much every time you look up an image in your hashmap, you wouldn't find a match.
That's the point of ML models - find patterns in data so that when you see a new example, you can predict what you're seeing based on what you've seen in the past.
OP's comment effectively asks whether there is another, more grokkable function that maps/hashes inputs to the same labels.
Granted, that question boils down to "can we create human-understandable models?" which is the whole point of this discussion.
It's a good question, though. If we had black-box-like spaghetti code performing the same task, I predict that the comments here would be very different.
Based on your phrasing of the issue, it seems like we could think of the problem as: can we reduce the number of parameters in an ML model to the point where humans can understand all of them? That's related to an active research area, minimizing model size. ML models can have billions of parameters, and it's infeasible for a human to evaluate all of them. Research shows that (sometimes) you can reduce the number of parameters in a model by 97% without hurting its accuracy. But 3% of a billion parameters is still way too many for people to evaluate. So I think the answer so far is no, we can't create human-understandable models that perform as effectively as black boxes.
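The pruning results mentioned are usually based on simple magnitude pruning; a toy sketch of the mechanism (a random matrix stands in for trained weights, so no accuracy claim is made here):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 1000))  # stand-in for a trained weight matrix

# Magnitude pruning: zero out the 97% of weights with smallest |value|.
threshold = np.quantile(np.abs(W), 0.97)
pruned = np.where(np.abs(W) >= threshold, W, 0.0)

remaining = np.count_nonzero(pruned)
print(f"{remaining} weights remain ({remaining / W.size:.0%})")
```

In real pruning pipelines this step is interleaved with retraining, which is what preserves accuracy; the surviving 3% is still tens of millions of numbers for a billion-parameter model.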
For example, you train an image recognition model to tell cats and dogs apart using images from the internet, then you take out your phone, snap an image of your dog and give it to your algorithm to determine the species. When preprocessed this picture is equivalent to one row in a table with thousands of columns and this specific combination of pixel values doesn't exist anywhere else. Where is the hash function looking and for what in that case?
Whilst your model may not be able to explain to you today why it can operate on future data, if it can explain to you why it works on any data you've already given it, it makes it easier to explain behaviour.
You have to trust it on new data.
You can always verify why it made the choices it did on old data.
That seems like something incredibly helpful, to me.
This makes no sense. Your model typically isn't asked to make predictions on "old data" because it was trained on it. And understanding why it made a certain prediction isn't any different on "new" vs. old data.
You can look into bias-variance tradeoff, train-test splitting and cross validation to get a better picture of this.
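As a concrete sketch of the cross-validation part (the model and dataset choices here are arbitrary): each fold is scored on data the model never saw during fitting, which is the standard way to estimate behavior on "new" data.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
model = LogisticRegression(max_iter=2000)

# 5-fold CV: fit on 4/5 of the data, score on the held-out 1/5, rotate.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```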
Or, like in the article, you can make use of introspective models that can tell you what dimensions were fitted. This can more easily show you where overfitting is occurring.
You can see the weights of the model and follow the decision network, which allows you to explain why something has occurred in a much clearer fashion.
An introspective model gives you a much clearer picture of why certain choices were made, rather than the fuzzy picture you get running statistical analysis against a black-box'd algorithm.
I'm also pretty wary of interpretability/explainability research in AI. Work on robustness and safety tends to be a bit better (those communities at least mathematically characterize their goals and contributions, and propose reasonable benchmarks).
But I'm also skeptical of a lot of modern deep learning research in general.
In particular, your critique goes both directions.
If I had a penny for every dissertation in the past few years that boiled down to "I built an absurdly over-fit/wrongly-fit model in domain D and claimed it beats SoTA in that domain. Unfortunately, I never took a course about D and ignored or wildly misused that domain's competitions/benchmarks. No one in that community took my amazing work seriously, so I submitted to NeurIPS/AAAI/ICML/IJCAI/... instead. On the Nth resubmission I got some reviewers who don't know anything about D but lose their minds over anything with the word deep (conv, residual, variational, adversarial, ... depending on the year) in the title. So, now I have a PhD in 'AI for D' but everyone doing research in D rolls their eyes at my work."
> Those same people will likely at some point call for a strict regulation of AI...
The most effectual calls for regulation of the software industry will not come from technologists. The call will come from politicians in the vein of, e.g., Josh Hawley or Elizabeth Warren. Those politicians have very specific goals and motivations which do not align with those of researchers doing interpretability/explainability research. If the tech industry is regulated, it's extremely unlikely that those regulations will be based upon proposals from STEM PhDs. At least in the USA.
> faking results of their interpretable models
Jumping from "this work is probably not valuable" to "this entire research community are a bunch of fraudsters" is a pretty big jump. Do you have any evidence of this happening?
This is very, very accurate. On the other hand, I oftentimes see field-specific papers from field experts with little ML experience using very basic and unnecessary ML techniques, which are then blown out of the water when serious DL researchers give the problem a shot.
One field that comes to mind where I have really noticed this problem is genomics.
While this is probably true (and IMO, possibly the right choice), this:
is taking a huge leap.
Why are they more likely to fake their results than companies selling a black box model?
But I'll let Geoff Hinton himself explain why the reliance on state-of-the-art results is effectively hamstringing progress in the field:
GH: One big challenge the community faces is that if you want to get a paper published in machine learning now it's got to have a table in it, with all these different data sets across the top, and all these different methods along the side, and your method has to look like the best one. If it doesn’t look like that, it’s hard to get published. I don't think that's encouraging people to think about radically new ideas.
Now if you send in a paper that has a radically new idea, there's no chance in hell it will get accepted, because it's going to get some junior reviewer who doesn't understand it. Or it’s going to get a senior reviewer who's trying to review too many papers and doesn't understand it first time round and assumes it must be nonsense. Anything that makes the brain hurt is not going to get accepted. And I think that's really bad.
What we should be going for, particularly in the basic science conferences, is radically new ideas. Because we know a radically new idea in the long run is going to be much more influential than a tiny improvement. That's I think the main downside of the fact that we've got this inversion now, where you've got a few senior guys and a gazillion young guys.
My point wasn't about lack of investment/propagation of fundamental research that is not trendy, it was about hijacking what should be science by "softy PhDs" that found a niche in less demanding areas and will likely impose their will over the ones who are doing hard science and not politics, like how CoCs were recently used to take control over open source/free software licenses by some fringe non-technical groups. It's a pattern that is repeating across all industry and academia in the past, the ones that move field forward are often displaced by their "soft-skilled" and less-capable peers.
1) I would prefer something that works most of the time (blackbox) than something interpretable that doesn't
2) If my DL model converged to, e.g., solving some complex partial differential equation, how does it help me that I could "interpret" that, if 99.9999% of the human population has no clue what is going on regardless? "Your loan was rejected because the solution to this obscure PDE with these starting conditions said so; these Nobel-prize-winning economists created this model with these simplifications, and we deem it good enough to keep you where you are."
No. I keep seeing AI/ML/DS people downplaying statistics.
Statistics interprets things. The majority of the models out there have a one-to-one, predictor-to-response interpretation, holding all other predictors constant (linear regression, logistic regression, ARIMA, ANOVA, etc.). Statistical inference is a thing. Inference is interpreting. Descriptive statistics is interpreting. Parsimony is a thing. Experimental design is a thing. Degrees of freedom are a thing in statistics.
If you want interpretability, do statistics. One of its tenets is to quantify and live with uncertainty, not to fit a curve and a lot of coefficients just to predict. Not just classification.
It's a reason why biostatistics and econometrics are a thing. Statistics.
Even the blog cites statistics papers, though it barely mentions statistical models. ~~And Rudin is a statistician and contributed a lot to statistics.~~ Wrong person (I was thinking of Rubin, for causality and missingness).
This is not a tribal fight between statistics and ML. This is pointing out that ignoring statistics is a detriment to AI/ML/DS as a field.
I predict that from 2020 to 2030, statistics will be coming to AI/ML much more, regardless of how much people downplay it.
~~Seeing on Dr. Rudin is coming over.~~ I've seen other statisticians too. Dr. Loh's work took decision trees and added ANOVA and chi-square tests to build parsimonious decision trees.
Interpretability obviously doesn't hurt accuracy. But it is costly to engineer, and not always possible to achieve, because human capacity (and willingness to put in the effort) to understand the explanation is limited.
Why do you say this? From what I have seen, it certainly can and does. For some industries, finding that trade-off is where the magic is.
So uhhh, isn't this like not science? Like my biggest problem with "machine learning" is people assume the data they have can correctly answer the question they want to answer.
If your data is off, your model might be incredibly close for a huge portion of the population (or for the biased sample you unknowingly have), but then wildly off somewhere else and you won't know until it's too late because it's not science (e.g. racist AI making racist predictions).
A model cannot be accurate if it doesn't have enough information (like predicting crime, or the stock market). There are an insane amount of statistical tests to detect bullshit, and yet we're pushing those and hypothesis testing right out the window when we create models we don't understand.
Like I just don't get how some folks say "look what my AI knows" and assume it's even remotely right without understanding the underlying system of equations and dataset. You honestly don't even know if the answer you're receiving back is to the question you're asking, it might just be strongly correlated bullshit that's undetectable to you the ignorant wizard.
I find it pretty hard to believe we can model the physical forces inside a fucking neutron star (holla at strange matter) but literally no one in the world could pen an equation (model) on how to spot a fucking cat or a hotdog without AI? Of course someone could, it would just feel unrewarding to invest that much time into doing it correctly.
I guess I can sum this up with, I wish people looked at AI more as a tool to help guide our intuition helping us solve problems we already have well defined knowledge (and data) of, and not as an means to an end itself.
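One concrete version of the "statistical tests to detect bullshit" point survives even with a black box: check whether deployment inputs come from the training distribution at all. A toy sketch with a two-sample KS test (synthetic data; the shift is deliberate):

```python
# Hedged sketch: detecting covariate shift on a single feature.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, size=5000)   # what we trained on
deploy_feature = rng.normal(0.5, 1.0, size=5000)  # shifted population

stat, p_value = ks_2samp(train_feature, deploy_feature)
print(p_value)  # tiny: the deployment data is not what we trained on
```

This doesn't tell you the model is wrong, only that it is being asked about a population it never saw, which is exactly the "intended inputs" worry.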
But all science works like this. All scientific models can have errors, and presumably they all do. Even really basic empirical science is only reliable to the extent that our models of how photons and our visual systems work are correct, and those models have errors we know about and probably others we don’t know about.
Fallibility does not mean that something is not science, on the contrary, denying that some theory or model is fallible is profoundly unscientific.
But of course, that doesn’t mean we should accept “black box” algorithms as the end of the story. We should strive to develop explanations for those things just like for all other things.
Very little of technology has anything to do with validating hypotheses.
> meaning that humans, even those who design them, cannot understand how variables are being combined to make predictions.
The intention is to not rely on the explanation to evaluate the effectiveness of the model. This does not preclude any of the infinite narratives that might explain the model.
This is fundamentally a cost saving mechanism to avoid hiring engineers to code heuristics useful to business. There is nothing related to science here at all. A "black box" model is fashionable to those who prefer to observe and not create meaning, even if the observed meaning is deeply flawed from a human perspective. After all, people spend money based on less all the time.
1. It is harder than it should be to explain the concept to people (particularly VCs)
2. people struggle to understand that a mechanistic model could have more utility than a machine learning black box
3. people think you are doing something wrong if you are not using a neural network
4. The less people understand about neural networks, the more they seem to believe they are appropriate for all predictive / modelling problems
5. There is generally quite a low understanding of scientific method in the startup / VC space (speaking as someone who has worked in and around academia for years) vs how "scientific" people believe they are because it sounds good to be data driven and scientific about running startups and funding them.
If so, I'd love to get in touch, shoot me an email
You could use a black box model if you're more interested in predicting correctly images of handwritten digits than in understanding how the pixels relate to each other.
Of course, usually people want both accuracy and interpretability. It boils down to understanding what's more important for the problem at hand and making the compromises accordingly.
Every new theory has a plain english explanation that's easy to understand. Few of them have the raw accuracy or reliability of top ML models
Problem is, most machine learning algorithms cannot incorporate background knowledge except by hard-coding inductive biases (as, e.g., the convolutional filters in convolutional neural nets). Unfortunately, this is a very limited way to incorporate existing knowledge.
This is actually why most machine learning work tries to learn concepts end-to-end, i.e. without any attempt to make use of previously learned or known concepts: because it doesn't have a choice.
Imagine trying to learn all of physics from scratch: no recourse to knowledge about mechanics, electromagnetism, any kind of dynamics, anything. That's how a machine learning algorithm would try to solve a physics problem. Or any other problem for which "we already have well defined knowledge (and data)". We might as well be starting from before the stone age.
I had a philosophy lecturer who would vehemently disagree that it is even possible to construct an algorithmic AI to decide what is and isn’t a cat.
For a start, do you mean domestic cats, or the cat family? What about photos, sculptures and other representations?
I mean, is Garfield a cat?
People use statistical machine learning over algorithmic AI because trying to model the real world with algorithms is an endless and often pointless exercise.
Let's look at your question.
Writing an equation for a cat is hard, actually really hard. Humans cannot reliably explain their decisions here. If I ask a person to tell me how they classify between cat and not-cat, the answer will invariably be something along the lines of "well, it has the general shape of a cat", which is actually just a huge combination of heuristics it took about ten years to work out. There is quite a lot of work in neuroscience suggesting that the actual decision you make when you classify a cat happens before a rationale is developed.
We could encode a function for that, but it relies on us knowing a lot about cats, which takes time and only works for toy examples.
If you use a convolutional neural network, you can get close to human-level performance on much more complex topics with little domain-specific insight. There is no universal law for classifying handwritten letters; they are an individual's interpretation of some symbols we made up. This task will always be 'non-rigorous' because the underlying thing is not actually well defined. When does a 3 become an 8?
So we could have a person toil away and come up with a bunch of heuristics that we encode in a regression, but why is this better than having a machine learn those heuristics? Most problems are not life or death. What is the real added value in having people hand crafting features for predicting traffic jam delays or customer retention, when the end use is probably just to have a rough indication.
As somebody who does research using a huge range of models, I object that we should be guided by our intuition- our intuition is mostly wrong about non trivial problems.
Basically any "equation" somebody has discovered for what happens in a neutron star is "simple". Either there is a large amount of observational data, it is a consequence of some already well-proven theorem, it relies on something well established to narrow the range of possible descriptions immensely, or (most commonly, in my experience) the equation is basically a human version of deep learning, where grad students toil away making tweaks and heuristics until the description fits the data somewhat well, and then there is some attempt to ascribe meaning after the fact.
For example, we can describe the trajectory of a comet using a "few" lines of high-school-level math. This means it is actually feasible for a person to have a reasonable intuition about what is happening, as the problem is dominated by a handful of important variables. Good luck getting anywhere near that simple a description of cats (again, a domain where the line between what is and isn't a cat is not even a property of its physical attributes, so the problem is not properly defined under your requirement). To tell whether something is or is not a cat would require a DNA sequence; that is how we define the cat. So by your own definition, we do not have sufficient data in our dataset to properly do this classification.
I'm not sure you really understand the point you make about "statistical tests for bullshit".
Most statistical tests are themselves ivory towers of theory and assumption which nobody ever verifies in practice (which is as unsciency as anything you accuse machine learning of). And people do actually use well grounded ways of evaluating machine learning models. Cross validation is very common and predates most machine learning, and has various "correctness" results.
For any model we build, if we do not have data that encodes some pathological behaviour we can test it out on, there is no test, no statistical procedure to tell us that model is flawed. If we have that data, we can run the exact same test on a black box model.
You should not conflate science with formalism or complexity. Running a statistical test is pseudoscientific unless you do it correctly and appropriately.
Saying something is not scientific because the data may not contain enough information to fully answer the question is flat out wrong.
> "For instance, when ProPublica journalists tried to explain what was in the proprietary COMPAS model for recidivism prediction, they seem to have mistakenly assumed that if one could create a linear model that approximated COMPAS and depended on race, age, and criminal history, that COMPAS itself must depend on race. However, when one approximates COMPAS using a nonlinear model, the explicit dependence on race vanishes, leaving dependence on race only through age and criminal history. This is an example of how an incorrect explanation of a black box can spiral out of control."
The concern about the strong relationship between race and COMPAS predictions is not largely based on a concern about whether there is an explicit dependence in the model. The concern is whether there's a relationship either explicitly or implicitly. And in particular, whether such a relationship results in unfair outcomes. The findings of the ProPublica study (https://www.propublica.org/article/how-we-analyzed-the-compa...) strongly suggested this was the case:
"- Black defendants were often predicted to be at a higher risk of recidivism than they actually were. Our analysis found that black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent vs. 23 percent).
- White defendants were often predicted to be less risky than they were. Our analysis found that white defendants who re-offended within the next two years were mistakenly labeled low risk almost twice as often as black re-offenders (48 percent vs. 28 percent).
- The analysis also showed that even when controlling for prior crimes, future recidivism, age, and gender, black defendants were 45 percent more likely to be assigned higher risk scores than white defendants."
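For readers unfamiliar with the metrics: the first two findings are group-wise false-positive and false-negative rates. A toy sketch of the computation (illustrative counts only, not ProPublica's actual data):

```python
# Toy counts chosen to mimic the quoted rates; not ProPublica's data.
def error_rates(high_risk_no_reoffend, total_no_reoffend,
                low_risk_reoffend, total_reoffend):
    fpr = high_risk_no_reoffend / total_no_reoffend  # labeled risky, didn't reoffend
    fnr = low_risk_reoffend / total_reoffend         # labeled safe, did reoffend
    return fpr, fnr

black_fpr, black_fnr = error_rates(450, 1000, 280, 1000)
white_fpr, white_fnr = error_rates(230, 1000, 480, 1000)
print(black_fpr / white_fpr)  # roughly 2x false-positive disparity
```

Note that this disparity is a property of the predictions and outcomes alone; it can be measured without any access to the model's internals, which is why the explicit/implicit distinction matters less than the article suggests.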
I understand the desire of the MIT researchers to promote the value of their work, but in this case they appear to be doing so in a potentially damaging way.
Now, it's of course possible to argue that judging reoffending risk based on age is in fact unfair and racist because it has disproportionate impact on certain racial groups, even though it's strongly predictive across all racial groups. That's not the argument ProPublica made, though. Their argument was about the supposed perils of black boxes, and they kind of acknowledged that age probably wasn't a racist criteria - or at least that it would be a lot harder to justify calling it one - by attempting to strip out its effects in the first place. It's also a different kind of argument entirely, one that revolves not around whether the algorithm is somehow treating people differently based on their detected race - because it isn't - but around what it means for a decision like this to be fair in the first place.
The link I provided gives the actual details of the method and findings; this is probably a more useful source for the details. The claim that the actual source of the difference is 'age' doesn't really make sense. There isn't enough of a difference in the number of young people between black and white populations to result in the differences found in the analysis.
(I do agree that the actual attempt to control for age was poorly done; it really shouldn't have been done at all, since it had nothing useful to add to the analysis or results.)
PS: It's 'COMPAS', not 'COMPASS'.
“The Mythos of Model Interpretability” is good reading on this.
OTOH, these authors (and old school heavyweights like David Ferucci) are not wrong. Interpretability, explainability & interoperability with human intelligence is not something to just give up on.
I like "challenges" as ways of exploring these areas. Good luck to the authors. Female team, btw.
Sometimes, the best interpretable model is as good as a black box, and that's great.
When this is not the case, the trade-off is that one should see what's more important for the actual problem. Perhaps interpretability is not a big deal.
Another solution is to try to extract interpretability from the more accurate black box model with something like SHAP.
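For a dependency-light flavor of what SHAP-style attribution buys you, permutation importance on a fitted black box gives a similar global ranking (SHAP itself would use, e.g., shap.TreeExplainer; this sketch uses synthetic data where the first three features are informative by construction):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# shuffle=False keeps informative features in the first columns.
X, y = make_classification(n_samples=600, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle one column at a time and measure the accuracy drop:
# big drop = the black box relies heavily on that feature.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean.round(3))
```

This is post-hoc: it tells you which inputs matter, not why, which is exactly the limitation the article is worried about.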
Some papers I found interesting on the subject:
https://arxiv.org/abs/1702.08608 (I found this a good summary of the issues)
"No matter the definition, developing an AI system to be interpretable is typically challenging and ambiguous. It is often the case that a model or algorithm is too complex to understand or describe because its purpose is to model a complex hypothesis or navigate a high-dimensional space, a catch-22. Not to mention what is interpretable in one application may be useless in another."
"Underspecified and misaligned notions of interpretation impede progress towards the rigorous development of understandable, transparent, trusted AI systems."
The dream of machine learning is to fire the programmer and break up the last trade (barring doctors and lawyers).
If it were just business people being alienated, I'd say fuck it, they deserve it. But the fact is everyone, including many of the programmers, is alienated when much of the code makes sense to no one. I would rather fix programming to be comprehensible to more people, even though it will take down the walls between programmer and non-programmer.
I think they missed an important reason here. It's why I decided to study AI and Machine Learning: The ability to "just throw some data at it and see if it figures it out" is just so exciting, mysterious and intriguing.
I remember actually being a bit disappointed when I learned how (classic, 3 layer, early 2000s) neural networks worked. That it was just something super simple with derivatives, and resulted in something that seemed a lot like a more complex form of statistical regression. Kind of took the magic out of it for me, a little (didn't stop me of course, it was and still is an exciting field of research).
I know it's not a good reason, but surely I'm not the only one who thinks it's extremely cool that you can build a black box that does useful things with what you feed it, without knowing how it works, even though you built it.
It's just something mystifying, and I believe there is also a tinge of fear of that disappointment: figuring it out and finding that it's not really doing things as clever as you hoped it might.
That's awesome, but this was a single model for a single application, measured against presumably limited test criteria/benchmarks. The point still stands that the recent narrow-AI "renaissance" is largely due to deep learning, which is inherently black box. There is a lot of work going on in making deep learning more interpretable precisely because it's so prominent today and because its lack of interpretability is a huge con.
Machine vision, for example, has come a long way due to deep learning. A lot of autonomous car companies are relying on it for perception. All of these companies hate the fact that there is no way to tell when the classifier will fail or why it would fail. When using deep learning in a finance setting, its lack of interpretability is a huge downside.
Despite this, deep learning is being used because it provides an appreciable improvement in performance over the alternatives. Certainly there could be alternatives to deep learning that perform just as well or even better. But finding those alternatives for the vast number of applications deep learning is being used for today is much easier said than done.
So while I'd like neural nets to be more interpretable, to me it'd take a distant second place. The first would be to get models that actually work better than humans for practical tasks, even under limited circumstances.
It occurs to me that human decision-making can be explained either at a very high level or at a very low level (neurons firing). But the magic in between is too complicated to draw direct lines between stimuli and decisions.
AI seems to be the same way. We have statistical models that explain how most everything works at a low level: how a simple mathematical formula can approximate just about anything, how we arrive at the weights that separate data sets in different dimensions, and so on. For convolutional networks we also understand how each layer's "decisions" flow through to the next. But it's too complicated to look at an image and explain exactly how the input pixels will produce a classification output.
I'm not sure how much I should care about that. Seems like a problem for mathematicians/statisticians. Having the doctor explain himself gives me no more true insight into how his brain works than having somebody explain at a high level how the robot was trained and how it reacts in different situations.
I wouldn't be surprised if someone, especially in business, is actively opposing transparent AI.
If tiny fractions of a point matter (the rare case), then making the model explainable adds a ton of complexity for little practical gain. It sounds nice, but the numbers are unforgiving.
It's easier to tell people who make financial decisions that this is great artificial intelligence that will solve all their problems than to explain the mathematical equations behind it, which 90% of people won't understand. The more complicated the stuff is, the higher the probability it will be sold. That's been the standard in the IT world for quite some time.
To continue innovation in the algorithmic field, I think we need a branch of scientific research, or a competition, where researchers are constrained to 100 MHz CPU clocks and no GPU cheats when actually running their algorithms (not when loading and parsing data).
And interpretability is only a half measure. If I really want to understand the model, I will conduct causal analysis with a randomized controlled trial and find the root cause and impact of the factor of my interest.
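As a toy illustration of that point (all numbers invented): a randomized controlled trial randomizes who gets the factor of interest, so the difference in group means estimates its causal effect directly, with no need to open up any model.

```python
import random
import statistics

random.seed(0)

# Hypothetical data-generating process: the true causal effect of X is +2.0.
def outcome(x_on, noise):
    return 2.0 * (1.0 if x_on else 0.0) + noise

units = 1000
# Randomized assignment breaks any link between treatment and confounders.
assignments = [random.random() < 0.5 for _ in range(units)]
results = [outcome(a, random.gauss(0.0, 1.0)) for a in assignments]

treated = [r for a, r in zip(assignments, results) if a]
control = [r for a, r in zip(assignments, results) if not a]
effect = statistics.mean(treated) - statistics.mean(control)
# effect should land near the true +2.0
```

The estimate is causal precisely because the assignment was randomized; the same difference-in-means computed on observational data could be confounded.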
Then, since these systems are supposedly the 'only way' to provide whatever function the market is 'demanding', clearly we must abandon any notion of accountability, since this need 'must' be met.
This logic is perhaps most evident in Google's and Facebook's content-moderation dilemmas, where both companies refuse to define any sort of non-vague, actionable standards about what they censor (or fail to promote), because 1) they can't provide them and 2) they don't want to be held responsible for what those standards actually are.
Then, as outrage over Facebook's and Google's terrible content moderation and censorship policies has grown, along with the need for accountability, both companies have been forced to hire ever more moderators and censors, because the technologies, as claimed, just don't work well enough.
This is just outright wrong. And the argument about moderation is completely unrelated.
Occam's razor offers the easier explanation of the big players' actions: their own internal responses to cultural pressure not to allow things they don't like to happen or be tolerated in the world.
Makes it seem like it's an out-of-date article. Is it?