
Plug and Play Language Model: Steer topic and attributes of GPT-2 models - sdan
https://github.com/uber-research/PPLM
======
minimaxir
Conditional text generation is the real future of AI text generation. This
approach is _interesting_ but somewhat hard to parse; the corresponding blog
post and papers are very long!

I had experimented with CTRL
([https://github.com/salesforce/ctrl](https://github.com/salesforce/ctrl)), a
previous Transformer-based model focused on conditional generation that is
also referenced by this project, and got very good (albeit inconsistent)
results:
[https://minimaxir.com/2019/09/ctrl-fake-news/](https://minimaxir.com/2019/09/ctrl-fake-news/)

------
cfoster0
Big fan of this work. The headline is only slightly misleading: you'll still
need a CUDA-enabled GPU to generate text at a reasonably interactive speed.
But you don't have to fine-tune a language model beforehand, which is a _huge_
win. Plus, you can use whatever attribute model you'd like, provided you can
get a gradient from it.
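
To make "get a gradient from it" concrete, here's a minimal toy sketch of the
idea (my own code, not the repo's API; the function name, hyperparameters, and
one-layer "attribute model" are all placeholders, and the real method perturbs
the LM's past key/values and adds a KL penalty to keep the text fluent):

```python
import torch
import torch.nn as nn

def steer_hidden(hidden, attribute_model, target_class,
                 step_size=0.02, num_steps=3):
    # Learn a small perturbation of the hidden state, not the LM weights.
    delta = torch.zeros_like(hidden, requires_grad=True)
    for _ in range(num_steps):
        logits = attribute_model(hidden + delta)
        # Ascend log p(target_class | hidden + delta).
        loss = -torch.log_softmax(logits, dim=-1)[..., target_class].mean()
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad / (delta.grad.norm() + 1e-12)
            delta.grad.zero_()
    return (hidden + delta).detach()

# Any differentiable scorer works as the attribute model: here, a tiny
# linear classifier over a GPT-2-small-sized (768-dim) hidden state.
attribute_model = nn.Linear(768, 2)
hidden = torch.randn(1, 768)
steered = steer_hidden(hidden, attribute_model, target_class=1)
```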

------
sdan
"Fortunately, Uber AI’s Plug and Play Language Model allows researchers to
make use of the few pretrained models out there: rather than requiring
everyone to train their own wooly mammoth, PPLM lets users combine small
attribute models with an LM to steer its generation. Attribute models can be
100,000 times smaller than the LM and still be effective in steering it, like
a mouse sitting atop our wooly mammoth friend and telling it where to go
(Figure 1)."

Essentially, Uber put small attribute models on top of GPT-2 to steer how text
is generated. This is better than finetuning the entire model, which at the
moment requires access to expensive TPUs or 32GB-VRAM GPUs, hardware available
mostly to select research institutions (AWS rents comparable instances, but at
a high cost).
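
To put rough numbers on that mouse-and-mammoth claim (sizes assumed here for
illustration, not quoted from the paper): GPT-2 large is ~774M parameters with
a 1280-dim hidden state, while an attribute model can be as small as a single
linear layer over that state:

```python
import torch.nn as nn

# Assumed sizes for illustration: GPT-2 large (~774M parameters, 1280-dim
# hidden state) vs. a one-layer attribute model over that hidden state.
attribute_head = nn.Linear(1280, 4)  # e.g. 4 topic classes
n_params = sum(p.numel() for p in attribute_head.parameters())
print(n_params)  # 5124 -> roughly 150,000x smaller than the 774M-param LM
```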

Finetuning is still great when you specifically want to create poems or songs
rather than a jumbled-up string of words that may not be what you're aiming
for.

