
When Recurrent Models Don't Need to Be Recurrent - noisydonut
http://offconvex.github.io/2018/07/27/approximating-recurrent/
======
joycian
In what sense are we sure that natural language actually contains these long-
term dependencies (i.e. k > 25) in such abundance that they would show up in
datasets of current size? When I was younger, I took Latin, and while some of
the sentences definitely demanded a fairly large attention window, I'm not
sure such long dependencies would show up often.

Even then, I wonder how many humans alive today would correctly pick up on an
incorrectly translated word in a text by Livius or Caesar. Maybe this is
different for modern languages, but I feel they tend to use shorter sentences.
Dependencies that transcend a single sentence are less common, maybe even so
uncommon that a model would not pick them up during training.
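
One way to make this concrete is to count how often a syntactic head and its
dependent sit more than 25 tokens apart in a parsed corpus (25 matching the
k in the post). This is only a rough sketch and assumes spaCy with its small
English model; it also only sees within-sentence dependencies, which is
exactly the limitation raised above for cross-sentence dependencies.

    # Rough sketch: estimate how often within-sentence dependencies span
    # more than k tokens. Assumes spaCy and en_core_web_sm are installed
    # (pip install spacy && python -m spacy download en_core_web_sm).
    import spacy

    K = 25  # dependency window from the post

    nlp = spacy.load("en_core_web_sm")

    def long_dependency_rate(texts, k=K):
        """Fraction of tokens whose syntactic head is more than k tokens away."""
        total, long_arcs = 0, 0
        for doc in nlp.pipe(texts):
            for token in doc:
                # skip whitespace tokens and sentence roots (head == token)
                if token.is_space or token.head is token:
                    continue
                total += 1
                if abs(token.i - token.head.i) > k:
                    long_arcs += 1
        return long_arcs / total if total else 0.0

    # Placeholder corpus for illustration; any iterable of strings works.
    corpus = ["The senator, whom the committee had questioned at length "
              "about the budget earlier in the year, finally resigned."]
    print(f"share of arcs longer than {K} tokens: "
          f"{long_dependency_rate(corpus):.4f}")

Running this over a larger corpus would give a rough sense of how rare
k > 25 within-sentence dependencies actually are.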

Can someone with experience in the field comment?

