
The Reformer – Pushing the limits of language modeling - datashrimp
https://colab.research.google.com/drive/15oP52_7W5dRcAnbgX3tYADsu4R3cjMIf?usp=sharing
======
bitforger
An alternative to this has recently been published at ICML that claims to
be faster. The website and tutorial video are very nice, too.

[https://linear-transformers.com/](https://linear-transformers.com/)

~~~
blueblimp
Are the results actually good? Table 2 reports 3.40 bits/dim on CIFAR-10, but
PixelRNN in 2016 got 3.06 bits/dim (Table 3 in
[https://arxiv.org/abs/1601.06759](https://arxiv.org/abs/1601.06759)). I would
also like to compare the MNIST results, but I'm having trouble converting
between bits/dim and nats in a way that gives a sensible result. It's a bit
annoying that the paper does not compare to previously-reported numbers on
these benchmarks.
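
For reference, here is the conversion I am attempting (a quick sketch; the
bits/dim value below is a placeholder, not a number from either paper):

    import math

    # A bits/dim figure converts to a total per-image NLL in nats as:
    #   nats_per_image = bits_per_dim * num_dims * ln(2)
    bits_per_dim = 1.0   # placeholder value for illustration
    dims = 28 * 28       # 784 pixels per MNIST image
    nats_per_image = bits_per_dim * dims * math.log(2)
    print(f"{nats_per_image:.2f} nats/image")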

~~~
joeddav
IMO the theoretical insight w.r.t. transformers as RNNs through the kernel
formulation of self-attention is more interesting than the experimental
results.
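
Roughly, the paper replaces softmax(QK^T)V with phi(Q)(phi(K)^T V) for the
positive feature map phi(x) = elu(x) + 1, which lets causal attention be
computed with a running sum, i.e. as an RNN. A minimal numpy sketch of that
idea (my own paraphrase, not the authors' code):

    import numpy as np

    def phi(x):
        # Feature map from the paper: elu(x) + 1, which is always positive
        return np.where(x > 0, x + 1.0, np.exp(x))

    def causal_linear_attention(Q, K, V):
        # Q, K: (N, d_k) float arrays; V: (N, d_v) float array.
        # O(N) causal attention via a constant-size recurrent state.
        Qp, Kp = phi(Q), phi(K)
        S = np.zeros((K.shape[1], V.shape[1]))  # running sum of outer(phi(k_t), v_t)
        z = np.zeros(K.shape[1])                # running sum of phi(k_t) (normalizer)
        out = np.empty_like(V)
        for t in range(Q.shape[0]):
            S += np.outer(Kp[t], V[t])
            z += Kp[t]
            out[t] = (Qp[t] @ S) / (Qp[t] @ z + 1e-6)
        return out

The pair (S, z) is exactly the constant-size RNN state the authors identify:
each step folds one token in and never looks back at the sequence.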

------
random_savv
This could be a transformative way to share research papers - it seems you can
run all their models _inside_ the paper!

------
Der_Einzige
What is the performance of Reformer or Linformer or any of these other new
models in practical applications (not the benchmarks that researchers game)?
Are they better than BERT?

~~~
The_rationalist
I actively follow the state-of-the-art pre-trained models on
paperswithcode.com and NLP-progress. The state of the art (often
outperforming BERT by far) is XLNet, which sadly dates from 2019. 2020 has
been stagnant (except for the special case of generative tasks with GPT-3).
I have observed that zero researchers have tried to improve on top of XLNet,
while BERT has had ~20 alternative implementations that improve upon it.
Researchers are often unaware of the current state of the art, and this
induces a lag in research progress.

~~~
ThomThom
"zero researchers have tried to improve on top of XLnet" I question this
assertion.

In particular at least the Roberta model by Facebook is already improving
significantly upon XLNet.

