
(Re)Discovering Protein Structure and Function Through Language Modeling - baylearn
https://blog.einstein.ai/provis/
======
BERTology Meets Biology: Interpreting Attention in Protein Language Models

Abstract

Transformer architectures have been shown to learn useful representations for
protein classification and generation tasks, but these representations are
difficult to interpret. Through the lens of attention, we analyze the inner
workings of the Transformer and explore how the model discerns structural and
functional properties of proteins. We show that attention (1) captures the
folding structure of proteins, connecting amino acids that are far apart in
the underlying sequence but spatially close in the three-dimensional
structure, (2) targets binding sites, a key functional component of proteins,
and (3) focuses on progressively more complex biophysical properties with
increasing layer depth. We also present a three-dimensional visualization of
the interaction between attention and protein structure. Our findings align
with known biological processes and provide a tool to aid discovery in
protein engineering and synthetic biology.
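Finding (1) can be made concrete with a small sketch. This is not the paper's actual analysis code; the function name and the toy attention/contact values below are invented for illustration. Given one attention head's weight matrix and a binary residue contact map (derived from the 3D structure), we measure what fraction of the head's attention mass lands on contacting residue pairs:

```python
def attention_to_contacts(attn, contacts):
    """Fraction of total attention mass that falls on contacting residue pairs.

    attn: square matrix of attention weights for one head (rows sum to ~1).
    contacts: binary matrix; contacts[i][j] = 1 if residues i and j are
    spatially close in the folded structure.
    """
    n = len(attn)
    total = sum(sum(row) for row in attn)
    on_contact = sum(
        attn[i][j] for i in range(n) for j in range(n) if contacts[i][j]
    )
    return on_contact / total if total else 0.0


# Toy 4-residue example: this head attends mostly to the (0, 3) pair,
# which the (symmetric) contact map marks as spatially close even though
# the residues are far apart in the sequence.
attn = [
    [0.10, 0.10, 0.10, 0.70],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
]
contacts = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
print(round(attention_to_contacts(attn, contacts), 2))  # 0.35
```

Heads whose score far exceeds the contact map's background density are the ones the analysis flags as structure-aware.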

Paper: [https://arxiv.org/abs/2006.15222](https://arxiv.org/abs/2006.15222)

Code:
[https://github.com/salesforce/provis](https://github.com/salesforce/provis)

