
As a prerequisite to the attention paper, one to check out is:

A Survey on Contextual Embeddings https://arxiv.org/abs/2003.07278

Embeddings are more or less what all of this is built on, so it should help demystify the newer papers. (It's actually newer than the attention paper, but it's a better overview than starting with the older word2vec paper.)
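To make the core idea concrete: an embedding is just a lookup from a token to a vector, and nearby vectors mean related tokens. Here's a minimal sketch with made-up numbers (real embeddings, whether static word2vec vectors or the contextual ones the survey covers, are learned from data):

```python
import numpy as np

# Toy embedding table; the vectors are invented for illustration only.
embeddings = {
    "king":  np.array([0.8, 0.1, 0.6]),
    "queen": np.array([0.7, 0.9, 0.6]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    # Cosine similarity: how closely two word vectors point the same way.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# With learned embeddings, related words score higher than unrelated ones.
print(cosine(embeddings["king"], embeddings["queen"]))
print(cosine(embeddings["king"], embeddings["apple"]))
```

Contextual embeddings (the survey's subject) go one step further: the vector for "bank" differs depending on the sentence around it, instead of being a fixed table entry like the above.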

Then after the attention paper an important one is:

Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165
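The key idea in that paper is that the "training examples" can live in the prompt itself, with no weight updates. A rough sketch of what a few-shot prompt looks like (the English–French pairs below echo the paper's translation example; the model is expected to continue the last line):

```python
# Few-shot prompt: a task description, a handful of demonstrations,
# and an unfinished final example for the model to complete.
prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "cheese => fromage\n"
    "plush giraffe => girafe peluche\n"
    "mint => "  # model continues from here, e.g. "menthe"
)
print(prompt)
```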

I'm intentionally not giving a big list because these papers are so time-consuming. I'm sure you'll quickly branch out based on your interests.


