On the idea of interpreting the weights, I've been very interested if it's possi...

On the idea of interpreting the weights, I've been very interested if it's possible to compute basis vectors of the weights matrix to define the core concepts within the model and then do a change of basis to allow reorganizing the model to more human understood concepts?

I think the inherent compression of a specific training set into a matrix makes this more difficult cause the basis vectors likely won't contain clean representations of human ideas, but I also wonder if starting a new training set with an initialized (or fixed) matrix of human defined concepts would help align the model's weights to something that can be interpretable