
The Illustrated Transformer - rerx
http://jalammar.github.io/illustrated-transformer/
======
rerx
This blog post explains the self-attention mechanism very well, which makes it
well worth a read for anybody interested in the state of the art in deep
learning models for machine translation. The other parts of the Transformer
model are glossed over more quickly, though.

------
vstuart
Likewise, Alexander Rush
([http://nlp.seas.harvard.edu/rush.html](http://nlp.seas.harvard.edu/rush.html))
at HarvardNLP offers an excellent web page, "The Annotated Transformer"
([http://nlp.seas.harvard.edu/2018/04/03/attention.html](http://nlp.seas.harvard.edu/2018/04/03/attention.html)),
which gives a line-by-line discussion of the code and how it relates to the
Transformer model.

* Code for The Annotated Transformer blog post (GitHub): [https://github.com/harvardnlp/annotated-transformer](https://github.com/harvardnlp/annotated-transformer)

Rush also presents a workshop paper on this model
([http://aclweb.org/anthology/W18-2509](http://aclweb.org/anthology/W18-2509)).

Of course, all of that is in reference to the original Google Brain/Google
Research paper, "Attention Is All You Need":

* arXiv landing page: [https://arxiv.org/abs/1706.03762](https://arxiv.org/abs/1706.03762)

* PDF: [https://arxiv.org/pdf/1706.03762.pdf](https://arxiv.org/pdf/1706.03762.pdf)

