Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (arxiv.org)
85 points by tmfi 22 days ago | 2 comments

Here's a nice video by Yannic Kilcher explaining the Nyströmformer: https://www.youtube.com/watch?v=m-zrcmRd7E4

The benefit over regular transformers is efficiency (fewer operations). Self-attention in the original transformer has quadratic complexity in the number of input tokens, whereas the Nyström approximation scales linearly with sequence length.
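For anyone who wants to see where the savings come from, here is a minimal NumPy sketch of Nyström-style attention. It assumes segment-mean landmarks (m of them, with n divisible by m) and uses np.linalg.pinv in place of the iterative pseudoinverse approximation the paper describes. The function names are made up for illustration; this is not the authors' code. The point is that it only ever builds n x m and m x m score matrices, never the full n x n one.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    # Standard softmax attention: builds an n x n score matrix, so O(n^2) time and memory.
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def segment_means(X, m):
    # Landmarks: mean of each of m consecutive segments (assumes n is divisible by m).
    n, d = X.shape
    return X.reshape(m, n // m, d).mean(axis=1)

def nystrom_attention(Q, K, V, m):
    # Hypothetical helper: approximates softmax(QK^T/sqrt(d)) V with m landmarks.
    d = Q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    Q_l = segment_means(Q, m)          # (m, d) query landmarks
    K_l = segment_means(K, m)          # (m, d) key landmarks
    F = softmax(Q @ K_l.T * scale)     # (n, m)
    A = softmax(Q_l @ K_l.T * scale)   # (m, m)
    B = softmax(Q_l @ K.T * scale)     # (m, n)
    # Multiply right to left so no n x n matrix is ever formed.
    return F @ (np.linalg.pinv(A) @ (B @ V))

rng = np.random.default_rng(0)
n, d, m = 512, 64, 32
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
approx = nystrom_attention(Q, K, V, m)
exact = full_attention(Q, K, V)
print(approx.shape)  # (512, 64)
```

For a fixed number of landmarks m, each step is linear in n (the pseudoinverse costs O(m^3), independent of n). As a sanity check, with m = n the landmarks are just Q and K themselves, and the sketch reproduces exact attention.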

It also links to a comparison that is not in the paper, against Performer, Linformer and Reformer:


