The Illustrated Transformer (jalammar.github.io)
17 points by rerx 3 months ago | 2 comments

The self-attention mechanism is explained very well in this blog post, which makes it well worth a read for anyone interested in state-of-the-art deep learning models for machine translation. Other parts of the Transformer model are glossed over, though.
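For a sense of what the post illustrates, here is a minimal sketch of single-head scaled dot-product self-attention in plain NumPy. The function and variable names are my own; this is a toy illustration of the formula Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, not code from the post or the paper.

```python
# Toy sketch of single-head scaled dot-product self-attention.
# Names (self_attention, W_q, etc.) are illustrative, not from the post.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model); W_q, W_k, W_v: (d_model, d_k) projections."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1) # each row is a distribution over tokens
    return weights @ V                 # weighted sum of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))            # 4 tokens, d_model = 8
W_q, W_k, W_v = (rng.normal(size=(8, 3)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)                       # (4, 3): one d_k-dim output per token
```

Each output row is a mixture of all tokens' value vectors, weighted by query-key similarity; that mixing is the part the blog post spends most of its diagrams on.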

Likewise, Alexander Rush (http://nlp.seas.harvard.edu/rush.html) at HarvardNLP provides an excellent web page, "The Annotated Transformer" (http://nlp.seas.harvard.edu/2018/04/03/attention.html), which offers a line-by-line discussion of the code and how it relates to the Transformer model.

* Code for The Annotated Transformer blog post (GitHub): https://github.com/harvardnlp/annotated-transformer

Rush has also published a workshop paper on this model (http://aclweb.org/anthology/W18-2509).

Of course, all of this refers to the original Google Brain/Research paper, "Attention Is All You Need":

* arXiv landing page: https://arxiv.org/abs/1706.03762

* PDF: https://arxiv.org/pdf/1706.03762.pdf
