Those Computerphile videos[0] by Rob Miles helped me understand transformers. He specifically references the "Attention is all you need" paper.
And for a deeper dive, Andrej Kharpaty has this hands-on video[1] where he builds a transformer from scratch. You can check-out his other videos on NLP as well they are all excellent.
And for a deeper dive, Andrej Kharpaty has this hands-on video[1] where he builds a transformer from scratch. You can check-out his other videos on NLP as well they are all excellent.
[0] https://youtu.be/rURRYI66E54, https://youtu.be/89A4jGvaaKk
[1] https://youtu.be/kCc8FmEb1nY