
Zoom In: Speculative claims about neural circuits - dsr12
https://distill.pub/2020/circuits/zoom-in/
======
gammadens
There's an entire literature on very closely related concepts and issues --
many of the same issues arguably -- in the psychological test and measurement
literature. There it's discussed in terms of internal and external validity,
but interpretation is at its core, and the scenario (and often models, at some
level) are very similar. There you are trying to discriminate between
psychologically relevant states, or outcomes, or variables, based on inputs in
the form of responses to test items. The focus is on articulating how to
interpret test items and model structural features vis-a-vis inputs and
outputs.

The literature on this is too hard to summarize in a post, but basically it
turns into an empirical-scientific question of making predictions about model
features and testing these predictions scientifically.

------
dmvaldman
thousands of people should be studying this. we’ll look back on these moments
as the dawn of a new empirical science

------
activatedgeek
I am happy with how the tail sections of the article address the main concern
I've had for a long time regarding this line of research. The research is no
doubt interesting as a purely scientific pursuit.

~~~
colah3
Thank you! I've been pretty obsessively thinking about meta-science issues
around interpretability for the last six months or so. :)

A notable researcher privately told me that they think all interpretability
research is nonsense. As someone who's dedicated the last six years of my life
to this field, that was pretty uncomfortable to hear. But I think it's
important to pay attention to, because I think it's actually a pretty common,
unspoken view.

As a result, this has been on my mind a great deal. I think two important
questions are:

(1) How can we surface the disagreements that are leading to such divergent
views between different members of the research community? (Especially when
people are generally too polite to say that they think something is total
nonsense.)

(2) What would a more epistemically stable foundation for interpretability
look like?

I'm not sure what the right answers to these are, but I think they're
important to discuss.

