If you are somewhat familiar with the deep learning literature, you will see that this paper, while taking a very interesting angle, uses a standard encoder-decoder network underneath, with a CNN as the encoder and an LSTM as the decoder. This kind of application has been studied before:


The above paper shows nice results converting images to LaTeX expressions and images to HTML.
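
For readers who have not seen this kind of model, here is a minimal sketch of a CNN encoder feeding an LSTM decoder, written in PyTorch. It is not the architecture of either paper: the class names, layer sizes, vocabulary size, and the choice to average-pool the feature map into a single vector are all assumptions made for illustration, and published image-to-markup models typically add an attention mechanism that is left out here.

```python
import torch
import torch.nn as nn


class CNNEncoder(nn.Module):
    """Hypothetical encoder: image -> one summary feature vector per image."""

    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, feat_dim, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )

    def forward(self, images):                  # images: (B, 1, H, W)
        fmap = self.conv(images)                # (B, feat_dim, H/4, W/4)
        return fmap.flatten(2).mean(dim=2)      # average over positions -> (B, feat_dim)


class LSTMDecoder(nn.Module):
    """Hypothetical decoder: image features + previous tokens -> next-token logits."""

    def __init__(self, vocab_size, feat_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.init_h = nn.Linear(feat_dim, hidden)   # maps image features to the initial state
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feats, tokens):           # feats: (B, feat_dim), tokens: (B, T)
        h0 = torch.tanh(self.init_h(feats)).unsqueeze(0)   # (1, B, hidden)
        c0 = torch.zeros_like(h0)
        states, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.out(states)                 # (B, T, vocab_size)


# Usage with random data: 2 grayscale images, 10 target tokens each (made-up sizes).
encoder, decoder = CNNEncoder(), LSTMDecoder(vocab_size=100)
images = torch.randn(2, 1, 32, 128)
tokens = torch.randint(0, 100, (2, 10))
logits = decoder(encoder(images), tokens)
```

In training, the logits would be compared against the target markup tokens shifted by one position using a cross-entropy loss. The same skeleton applies whether the output vocabulary is LaTeX tokens or HTML tags, which is why the architecture itself is not the novel part.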

