I actually found this to be one of the best explanations on this topic I've read. Fully recommend the author's book too.

I'd also recommend this, either as a follow-up or as an alternative: https://arxiv.org/abs/1703.01619

Are you sure you linked to the right article? The linked article is about neural translation using seq-to-seq, while TFA is about neural models for all kinds of language processing.

It's probably domain/industry/company dependent, but the vast majority (>90%) of NLP work I do nowadays is sequence-to-sequence models.

