A related work, “MSA Transformer” [1], contrasts the strengths and weaknesses of language models that take single sequences as input (as here) with language models that take alignments as input. The authors perform ablation studies and various kinds of feature randomization, and compare directly against the state of the art (Potts models) in predicting residue-residue couplings.
One interesting note from “MSA Transformer”:
> Potts models and single-sequence language models predict protein contacts in fundamentally different ways. Potts models are trained on a single MSA; they extract information directly from the covariance between mutations in columns of the MSA. Single-sequence language models do not have access to the MSA, and instead make predictions based on patterns seen during training. The MSA Transformer may use both covariance-based and pattern-based inference.
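To make the covariance-based side of that contrast concrete, here is a minimal toy sketch of the underlying idea: scoring a pair of MSA columns by how strongly the amino acids in them co-vary. Real Potts models fit a global pairwise model with regularization and corrections like APC rather than this simple mutual-information score, and the alignment and function here are made-up illustrations, not anything from the paper.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """Mutual information between columns i and j of an aligned MSA.
    Higher values mean the two columns co-vary more strongly."""
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    i_counts = Counter(seq[i] for seq in msa)
    j_counts = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # Compare the joint frequency to what independence would predict.
        mi += p_ab * math.log(p_ab / ((i_counts[a] / n) * (j_counts[b] / n)))
    return mi

# Toy alignment: columns 0 and 2 co-vary (A pairs with W, C pairs with Y),
# while column 3 varies independently of column 0.
msa = ["AKWDE", "CKYDE", "AKWNE", "CKYNE", "AKWDE", "CKYDQ"]
print(column_mutual_information(msa, 0, 2))  # ~0.69: columns co-vary
print(column_mutual_information(msa, 0, 3))  # 0.0: independent columns
```

A single-sequence language model has no such alignment to compute statistics over, which is the point of the quoted contrast: it must instead rely on patterns internalized during training.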
1. https://www.biorxiv.org/content/biorxiv/early/2021/02/13/202...