Cynthia Rudin and interpretable ML models (quantamagazine.org)
67 points by SirLJ on May 1, 2023 | 54 comments



I’ve always wondered what a sufficient explanation of a neural network would entail.

At a very low level, there’s no secret: all of the weights are there, all of the relationships between weights are known. The problem is that it doesn’t tell you anything about the emergent properties of the network, in the same way that quantum physics doesn’t give much insight into biology.

It may be possible that there is no English sentence you can utter about the network which is both explanatory and fully accurate. What is the network doing? It’s trying to approximate the function you’ve given it. That’s it.

You can try other things like ablation to find the effects of lobotomizing the network in certain ways, but this also can’t fully explain 2nd and higher order relationships.


This sentence from the article stood out to me:

> The explanations have to be wrong, because if their explanations were always right, you could just replace the black box with the explanations.

I've never thought of it like that, but I think that really gets to the core of the issue...

To be able to reason about the behaviour of a neural network with 100% confidence you would need an explanation that incorporated every weight, otherwise you have to accept some degree of uncertainty.

But the idea that you could ever explain something with billions of parameters in a way a human could comprehend seems ridiculous. Some imperfect generalisation would therefore be inevitable – "this bit of the network seems to perform x function". But in doing this you have to accept that the generalisation comes with uncertainty, because such a high-level approximation is unlikely to capture all of the nuances. And if it did capture all the nuances, then the black box wouldn't be needed.


Obviously debatable, but this seems wrong. It could easily be that we can't figure out how to do something but, when presented with a solution, understand it. Many people understand public/private key cryptography but could never in a million years come up with it. Same for much of physics, a lot of math, etc. As for the sheer complexity of millions/billions of parameters: people can explain and understand molecular dynamics simulations perfectly well. They can't use a small number of words to explain a specific configuration, but it's explainable.


I think there’s a major difference though. You can understand public key crypto at varying levels of complexity because it is a topic created from first principles. Any sub-topic within the umbrella of public key crypto can be explored further. In this sense the knowledge is like a large program.

Neural networks aren’t like that. The whole thing works together simultaneously to derive a result. So I’m skeptical that there is a high level explanation other than “it’s doing its best to approximate the data it was trained on”, which is not that useful.


There must be some kind of pattern in the high-dimensional space of NN decision making that can be used to explain its decision, at least broadly (given some well-defined query). I’m not saying it’s easy, and I’m not saying the end user will have access to it, but surely someone in the pipeline needs to be able to acquire that. Or else there will never be a serious certification process before general and widespread use.


I guess this raises the question of levels of explanation. What kind of explanation would you like to be able to get? It reminds me of something the physicist Tim Maudlin described on a recent podcast (I will give a poor summary):

Suppose you have the spinning color wheel on your MacBook. You give it to a computer scientist who says that the program is in an infinite loop, and the wheel will never stop spinning. Then you give the computer to a superhuman physicist who studies all of the components including the state of the transistors and internal components and concludes that the wheel will spin for 9 years before the screen gives out. [1]

The point is that both are correct, but we’d really prefer the first explanation, since it gives insight into what the program is doing.

The problem is that neural networks aren’t like programs. They don’t have reducible internal components unless you reduce down to the weights of the network itself. [2]

[1] https://m.youtube.com/watch?v=k0CQ_D7dwA0

[2] https://karpathy.medium.com/software-2-0-a64152b37c35


Yes, I totally agree, and thanks for the Karpathy link, it’s really good. And yes, I don’t know at which level the sweet spot is. But the fact that it’s software 2.0 means we need debugging 2.0 too.


Sure — abstractions elide details.

But “this portion of the network appears to recognize faces” is still a useful abstraction. In the same way it is for neurology. Even if both NNs and brains sometimes see faces that aren’t there because the network isn’t “recognizing faces” but a system which signals on a complex set of heuristics that mostly detects faces.

How does the answer change if I only want 99% confidence and accept the existence of illusions? — can we describe the bulk of the behavior in a summary?


It would be very satisfying if we could have an explanation like that. But that’s far from guaranteed, unless it was part of the explicit design, as in an MMoE (multi-gate mixture of experts), for example. Very likely, if you ask “what part of the network recognizes faces”, the best answer is “the whole thing”.


There's some evidence for emergent "modules" inside NNs.

https://www.alignmentforum.org/posts/j84JhErNezMxyK4dH/llm-m...


We certainly cannot comprehend the entire thing at the same time. “XAI” is a bit of a buzzword right now, and every HCI researcher struggling to find their grip has jumped on the hype boat, meaning lots of them will just say things because it sounds nice to people with even less knowledge of NNs than they have.

However, to think we cannot comprehend them at all is too much. We can certainly ”query” a NN to understand the weights and activations that led to some decision in some specific scenario. How to consume and use that information is another matter, of course, but a debugging process is certainly feasible (as it is with any large scale computer system).
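For example, a minimal sketch of that kind of "query" using PyTorch forward hooks (the toy model and the layer being hooked are just placeholders, not any particular real system):

    import torch
    import torch.nn as nn

    # Toy network standing in for whatever model you actually care about.
    model = nn.Sequential(
        nn.Linear(10, 32), nn.ReLU(),
        nn.Linear(32, 2),
    )

    captured = {}

    def save_activation(name):
        # Forward hook: stash this layer's output for later inspection.
        def hook(module, inputs, output):
            captured[name] = output.detach()
        return hook

    # Attach the hook to the hidden ReLU (index 1 in this Sequential).
    model[1].register_forward_hook(save_activation("hidden_relu"))

    x = torch.randn(1, 10)  # one specific input scenario
    logits = model(x)

    print("decision:", logits.argmax(dim=1).item())
    print("hidden activations:", captured["hidden_relu"])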

Plus, the architectures of neural networks are not as complicated as most people think. Sure, it gets weird after training, but the basic architecture of the components is not nearly as complicated as the hardware of a GPU, for example, or even that of a complex OS kernel.


All models are wrong, but some models are useful.


The neural net gives a very noisy approximation of some ideal structure. So you're looking for what that ideal structure is, which very possibly does not require a neural network to represent.


The map is not the territory.

Or, at more length and in more depth, The Analytical Language of John Wilkins by Borges.


Thank you for sharing, the wiki has a pretty funny note:

> Lewis Carroll, in Sylvie and Bruno Concluded (1893), made the point humorously with his description of a fictional map that had "the scale of a mile to the mile". A character notes some practical difficulties with such a map and states that "we now use the country itself, as its own map, and I assure you it does nearly as well."


I'm reminded of: "I have the world's largest collection of seashells. I keep it on all the beaches of the world... perhaps you've seen it." - Steven Wright


One thing that would be fun to know is, e.g., "If the LLM answers question X correctly, then what's a minimal-sized set of things we could remove from the training set and cause it to get that question wrong?" I think with current methods this would be pretty expensive to find out, but, in principle I'm guessing it would be pretty illuminating.


Causal modeling lets you determine where particular facts are stored, without too much computational cost, and also how you can edit those facts in the model. That might let you decouple the problem: first find the desired set of gradients (converging at the target modification, zero for the weights you don't care about), then do a linear solve to find an approximately minimal subset of the inputs which, when combined, would give you the inverse of the target gradient (IIRC finding a truly minimal subset is NP-hard or something). The linear solve works because the probabilistic weighting on how likely an input is in the training set impacts each of the gradients linearly in basically every neural architecture of note, including most LLM modifications. Then remove that much of the weighting for each of those inputs.
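A much cruder, first-order sketch of the gradient-alignment idea (plain logistic regression, with gradient dot products standing in for a real influence method; the data and test point here are made up):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy training set: two noisy blobs, binary labels.
    X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    # Train logistic regression with plain gradient descent.
    w = np.zeros(2)
    for _ in range(2000):
        w -= 0.5 * X.T @ (sigmoid(X @ w) - y) / len(y)

    def example_grad(x, label):
        # Gradient of the log-loss for one example at the final weights.
        return (sigmoid(x @ w) - label) * x

    x_test, y_test = np.array([0.2, 0.3]), 1
    g_test = example_grad(x_test, y_test)

    # Score = alignment between a training example's gradient and the test
    # example's gradient; high scores mark training points whose removal
    # would, to first order, hurt this prediction the most.
    scores = np.array([example_grad(X[i], y[i]) @ g_test for i in range(len(y))])
    print("most influential training points:", np.argsort(-scores)[:5])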


>It may be possible that there is no English sentence you can utter about anything which is both explanatory and fully accurate.

FTFY. Philosophy aside, humans understand complex things by building simpler abstractions for them. I think the key to NN understandability centers on building simpler NNs focused on specific tasks that build into bigger ones.


> the same way that quantum physics doesn’t give much insight into biology

This is very insightful. I've long maintained that "understanding" requires some predictive ability in addition to an ability to explain what's happened in the past.

In the case of neural networks, we should be able to "simply" generate the network that has the desired behavior.

But we can't. We're still shackled to compiling some collection of training data that may or may not produce the desired behavior, tested by some other collection of testing data that may or may not test the desired phenomenon.


I've thought about pattern matching NN weights against known graph structures. I'd imagine, for example, that decision trees emerge at some point in GPT.

The real problem IMO is our ability to represent natural concepts as such graphs. Is love, for example, a DFA? Can we search for its isomorphism?


Same thing as the brain pretty much, except we have root control over the NN in a way we don't with the brain.

To my knowledge, some angles have been explored. Finding which inputs maximally stimulate individual neurons, and tracing that up the chain.

There was a fantastic article that did this on ImageNet - the takeaway was that neurons adjacent to the input encode sharp, basic features (like edges), and "higher-level" ones encode more nuanced stuff (textures, fur, wheels).
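A rough sketch of the "which inputs maximally stimulate a neuron" idea, via gradient ascent on the input of a toy convnet in PyTorch (the architecture and channel index are arbitrary placeholders, not the actual ImageNet work):

    import torch
    import torch.nn as nn

    # Toy convnet standing in for a trained image model.
    net = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    )
    net.eval()

    channel = 5  # which feature map ("neuron") to probe
    img = torch.randn(1, 3, 64, 64, requires_grad=True)
    opt = torch.optim.Adam([img], lr=0.05)

    # Gradient ascent on the input: find a pattern that drives the channel.
    for _ in range(200):
        opt.zero_grad()
        loss = -net(img)[0, channel].mean()
        loss.backward()
        opt.step()

    # In a real trained network, early layers yield edge/grating-like patterns
    # and later layers yield textures or object parts.
    print("final mean activation:", net(img)[0, channel].mean().item())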

Then ablations, as you mentioned. And finally, rerunning the same training data on differently sized + shaped architectures.

AI systems are missing at least one level of complexity. All neurons in an NN layer fire simultaneously, unlike brain neurons, which trigger each other asynchronously.


The brain is heterogeneous in ways that a neural network needn’t be. Certain brain regions definitely perform certain functions, certain regions are hardwired to other regions. You can chop out a chunk of a person’s brain and they’ll be completely unable to do certain things.

Afaik the properties you’re referring to in AlexNet (the network trained to perform on ImageNet) have to do with the nature of repeated convolution operations which, while interesting, is not as deeply insightful as I believe the article is aiming for.


> It may be possible that there is no English sentence you can utter about the network which is both explanatory and fully accurate.

reminds me of: "The Tao that can be told is not the eternal Tao."


Perhaps a sequence of sentences, almost a series in the mathematical sense, where you iterate your explanation-finder, generating more sentences in the series, until it's sufficiently detailed for the question you care about today.

Also, remember that the explanation doesn't really have to fully contain the semantics of the model. It doesn't have to fully encode the weights, just activate sufficiently sympathetic structures in a human brain.


Getting AI to show its work isn't just for accountability. "Showing your work" gives you a clearer picture of the problem / solution and prevents / fixes bugs in implicit reasoning, the key problem current AI has which prevents it from being truly autonomous.

Ask GPT4 to do a task, and then ask it to do the same task showing its work; you'll find that GPT4 is less likely to make mistakes on the latter. This is especially apparent for tasks like counting # of words and multi-step problems, which GPT normally has trouble with.

But GPT4 still tends to struggle even breaking the task down, to the point where it starts producing extremely obvious mistakes (e.g. "the turtle moves 1 unit up, from (1, 0) to (2, 0)"). One possibility is that it isn't actually showing its work, it's just generating backwards explanations from a latent conclusion. Maybe this research will clarify whether this is the case, and help us develop a more coherent LLM.
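For what it's worth, the "ask it to do the same task showing its work" comparison above is as simple as this (a sketch assuming the current openai Python client; the model name and task are just examples):

    from openai import OpenAI  # assumes the openai >= 1.0 Python client

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    task = ("How many words are in the sentence: "
            "'The quick brown fox jumps over the lazy dog'?")

    def ask(prompt):
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Direct answer vs. "show your work": the second is usually more reliable.
    print(ask(task))
    print(ask(task + " Show your work step by step before the final count."))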


>If you want to trust a prediction, you need to understand how all the computations work.

I disagree with the premise here. You don't understand how all the computations work in the brains of the people whose predictions you trust. You simply have a mental calculation of their batting average through exposure to their track record, and this batting average functions as a proxy for trust.

I find this is more or less the same way that I learn whether or not I can rely on GPT-4 for a particular use-case. If its batting average is north of a certain % for a given use-case, then it doesn't need to be right 100% of the time for me to derive value from relying on it.
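Concretely, the "batting average" check I have in mind is nothing fancier than this (the counts and threshold are made up):

    import math

    def wilson_lower_bound(successes, trials, z=1.96):
        # Lower bound of the 95% Wilson score interval on the true success rate.
        if trials == 0:
            return 0.0
        p = successes / trials
        centre = p + z * z / (2 * trials)
        margin = z * math.sqrt((p * (1 - p) + z * z / (4 * trials)) / trials)
        return (centre - margin) / (1 + z * z / trials)

    # Say GPT-4 gave a usable answer on 46 of 50 tries for this use-case:
    # the bound is ~0.81, so I rely on it if my bar for the task is 80%.
    print(wilson_lower_bound(46, 50))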

I think we are slowly crossing a threshold where we accept indeterminism and mistakes from machines in a way that we haven't in the past.


I disagree too (despite being a researcher in the general neighborhood), but I find the usual comparison with a human brain to be inaccurate. We don’t trust a person just because of their track record, the fact that they are human to begin with already comes with a lot of baggage. I think the best analogy is with a very complicated piece of hardware.

That said, someone has to have a good understanding of what is happening (not necessarily the end user). In my experience neural networks today are basically like compiled software whose source code has been lost. Every act of debugging and fixing has to include reverse engineering as well (I’m talking about the actual network weights, not the e.g. PyTorch code). It’s too inefficient and cumbersome, and as they become more and more common, statistics says the failures will become more frequent and more severe.

When do we, for example, decide to put the first NN in an airplane cockpit?


The point is that you trust those people. You can't trust a black-box model, but you can trust a result if the explanation that is provided actually explains it. That is, if the explanation has factors X, Y, and Z, then every time those hold the result should be the same.

Otherwise, you are just trusting the training set and the training process. Sometimes that's fine, as the article also mentions.


No, the point is those people are also a black-box to you and they are a black-box that you trust because of YOUR training set and training process.


Yes, your exact thought process may be a black box to me, but I myself am a black box made to the same (DNA etc) spec. We have millions of ancestral years of experience trusting each other and we know how to simulate each other because we are the same kind of black box. Your trust in other humans is explained much more by your priors over human behavior than by any observed behavior of individual humans--that's layered on top.

In short: we trust people because we can model them effectively, and being one helps. We trust mathematical models and traditional programs because we understand them and can check their work in any specific instance. Large ANNs don't (yet) benefit from either of these two kinds of trust.


They may be referring more to a theory of mind than a theory of neuro-physics. I trust the decisions and opinions of those whose reasoning capacities I trust. It's not enough to know what they decide, you must also know the set of choices in front of them at the time and their process for choosing one among them.


The models (neural networks) Professor Rudin is considering have a LOT of parameters, dimensions, neurons, neuron values, etc.

Okay. Since apparently no one has the explanations desired, we have to guess. So, let's do some guessing:

Given so many parameters, etc., we have in some sense -- in some setting of geometry and spaces, maybe vector spaces as in linear algebra -- a lot of dimensions.

Then something surprising holds (once we get precise about the space, it's easy enough to prove): given a sphere in the space, we can calculate its volume, and we can do this for a space of any finite dimension. Here is the surprise: when we have a lot of dimensions, there is a LOT of volume in that sphere, and nearly all of that volume is just inside the surface of the sphere. E.g., if you do some work in nearest neighbors, you discover this surprise in strong terms.
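(A quick numeric check of that surprise, as a sketch: the volume of a d-dimensional ball scales as r^d, so the fraction of its volume lying in the outer 1% shell is 1 - 0.99^d.)

    # Fraction of a d-ball's volume within 1% of the surface.
    for d in (2, 10, 100, 1000):
        print(d, 1 - 0.99 ** d)
    # d=2: ~0.02, d=100: ~0.63, d=1000: ~0.99996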

Net, in the space being considered, there is a LOT of volume. Then ...: There is plenty of volume to put faces of cats over here, dogs over there, men another place, women still well separated, essays on bone cancer far away, ..., for thousands, millions, ..., more things, thoughts, topics, etc. Then given some new data, say, a white cat not in the training data, likely the data on that white cat will settle on the volume with the cats instead of dogs, monkeys, etc. and, thus, we will have recognized a cat via some emergent functionality.

Just a guess.


I view explainability or interpretability of a network as the ability to take a network and replace it with a drastically smaller set of functions and tables that (a) you can explain and (b) work pretty much the same as the network does.

Because we understand these functions and tables, we understand exactly how well the network will work, and also what is missing (i.e., how we can expand its accuracy.)

I think this is a very hard problem, but it is one that needs to be solved.
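As a toy illustration of the "drastically smaller set of functions and tables" idea, here is a sketch of a global surrogate: a small decision tree fit to a network's own predictions rather than the original labels (the dataset, network size, and tree depth are arbitrary):

    from sklearn.datasets import make_moons
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)

    # The "black box" we want to approximate.
    net = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000,
                        random_state=0).fit(X, y)

    # Fit a small, readable surrogate to the network's predictions.
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X, net.predict(X))

    # Fidelity: how often the small model agrees with the big one.
    print("fidelity:", (surrogate.predict(X) == net.predict(X)).mean())
    print(export_text(surrogate, feature_names=["x1", "x2"]))

The gap between that fidelity number and 1.0 is exactly the "what is missing" part.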


I've always seen interpretability and explainability as different sides of the same coin.

If you take an information-theoretic approach to it, and think of a DL model like any other model, there is a certain equivalence in understanding the model features and how it behaves with reference to the universe of data it is applied to.

It was an interesting article but I felt like it created problems that need not be there (or maybe it's just describing problems that others created?)


To trust a prediction you do not need to understand the underlying computations. What you do need is an on-demand, understandable, rational justification of how the prediction was derived, at the right semantic domain level.


> They extract deeply hidden patterns in large data sets that our limited human brains can’t parse.

I think that the bulk of ML has so far produced what our brains in fact easily see. We can easily perform classification, or generation.


They're definitely worse than me at spotting spam.


Sometimes I think the interpretation of model parameters is all bunk. I think the legacy of over-interpretation of parameters and results has resulted in the ossification of some very shaky science.


We have no idea how the human brain works but no one seems to care.


I’m pretty sure the field of neuroscience is focused on understanding how the brain works.


Depends on what you mean here. The good scientists mostly give up when trying to 'describe' higher level function (at best maybe generate some analytical expression which may correlate with IQ or something) and focus instead on biochemistry at the single neuron level. That's a very different science than trying to 'explain AI'.


Have we made any fundamental progress in understanding how the human brain works in the last 20 years? 50?


Issue there is the ability to trace out the brain networks accurately. We're getting closer. Once we're in the ~0.5 micron scanning range, things will get VERY interesting.

I believe we've already mapped some invertebrate brain, maybe 250 neurons.


We've made pretty good progress. But the same problem that plagues explainability of ANNs rears its ugly head in neuroscience.

Low level descriptions provide shockingly little insight into high level behavior.


Yes, but also your original question was about how much people care, so questioning the amount of progress is at best a weird tangent.


I am genuinely curious. It doesn't seem like we have made the slightest progress in truly explaining how the human brain works or why we are sentient and self-aware.


The only interesting things now are the machines; that’s where all the money goes now.

We’re just “meat bags”.


Can you please stop posting unsubstantive comments? This is below the quality line for an interesting HN thread.


Sorry Dang, but that’s just how a lot of people see it.


Ok, but that doesn't make for a substantive comment, and we're trying for substantive comments here.


Not sure if it was intentional, but the title of this article is pretty hilarious considering she explicitly badmouths explainability (trying to peer inside a black box) and advocates for interpretability (building models that are less black-boxy in nature).

Also, man, quanta feels really rough and popsci-y when it comes to CS.


I was amused by her example of poetry needing interpretability, given that we already know why (and already did when they did that research): https://gwern.net/gpt-3#bpes


> Also, man, quanta feels really rough and popsci-y when it comes to CS.

I'm guessing this is just the Gell-Mann Amnesia effect. Why do you think the quality is better for other fields?



