
Salesforce releases language model bigger than GPT-2 large
https://github.com/salesforce/ctrl
======
minimaxir
I am working on a guide (should be released tomorrow) to easily get it up and
running for personal use. Here's my Twitter thread of current experiments with
the model:
[https://twitter.com/minimaxir/status/1173081315177975810](https://twitter.com/minimaxir/status/1173081315177975810)

I recommend reading the linked paper in the repo as it gives decent
examples/instructions on how to use the model. Although the size and
architecture is comparable to GPT-2, the emphasis on conditional generation
differentiates it.

~~~
riku_iki
> running for personal use

How can one use it for personal use? In my understanding it won't fit into
the GPU memory available to the average person. Does someone need to distill
the model first?

~~~
minimaxir
It currently fits into a P100, but _barely_.
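Back-of-the-envelope arithmetic (my own numbers, not from the repo) shows why it's tight: the reported 1.63B parameters in fp32 are already over 6 GB before activations, against the P100's 16 GB.

```python
# Rough memory estimate for a ~1.63B-parameter model (illustrative only).
params = 1.63e9          # parameter count reported for CTRL
bytes_per_param = 4      # fp32

weights_gb = params * bytes_per_param / 1024**3
print(f"weights alone: {weights_gb:.1f} GB")  # ~6.1 GB

# Generation also needs activations and cached keys/values per layer,
# so the practical footprint lands well above the raw weight size.
```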

------
purple_ducks
Wow, that's some license addendum:

> This software should not be used to promote or profit from:

> violence, hate, and division,

> environmental destruction,

> abuse of human rights, or

> the destruction of people's physical and mental health.

~~~
solarkraft
So Salesforce could never use it?

~~~
bmm6o
Not sure which of those you think they would be violating? And they own it, so
the license doesn't apply to them, it applies to everyone else. (Apologies if
you're making a joke and I'm ruining it)

~~~
solarkraft
You can very easily argue that Salesforce, by dealing with companies that do
those things, promotes or profits from everything forbidden by the license.

------
rdiddly
Anyone have a real-world use case for something like this? I must admit I'm
having trouble thinking of any that aren't essentially deceptive. Because in
my little biased world, I have no need of "text" per se, and what value any
text has to me is closely linked to the fact that it came from a human.

~~~
zawerf
Machine learning researchers aren't working on language modeling because they
want to enable fake news.

They are working on it because it improves all downstream NLP tasks. See:
[http://ruder.io/nlp-imagenet/](http://ruder.io/nlp-imagenet/). BERT, Elmo and
XLNet all fall under this use case.

For example if you're trying to recognize speech or translate some text, it
helps a lot if you can start off producing something that is statistically
grammatical even if the content is nonsense.
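A toy illustration of "statistically grammatical" (a from-scratch bigram model over a made-up corpus; nothing here comes from CTRL itself): a model that has only seen word-pair statistics will still score a grammatical ordering above a scrambled one.

```python
import math
from collections import Counter

# Tiny bigram language model "trained" on a toy corpus.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def log_prob(sentence):
    """Sum of log P(w_i | w_{i-1}) with add-one smoothing."""
    words = sentence.split()
    vocab = len(unigrams)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(words, words[1:])
    )

# The grammatical order gets a higher score than the scrambled one,
# even though the model has no idea what any word means.
print(log_prob("the cat sat on the mat"))
print(log_prob("mat the on sat cat the"))
```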

------
skybrian
From the blog post: "Beyond the technical work to develop this model, we’ve
also taken several steps to anticipate and mitigate malicious use cases where
possible."

From the preprint, this seems to amount to doing some review before release
and keeping a code of conduct in the GitHub repo.

------
novalis78
The unicorn prompt is the new text generator lorem ipsum

------
visarga
It was trained on 140GB of text on 256 TPUs for 2 weeks, the model being made
of 48 transformer layers. I'm wondering when we will see a model trained on
1TB or 10TB of text.

~~~
p1esk
I doubt training a scaled up transformer on 10TB of text will lead to
significant improvements (btw, 10TB is about the size of all books in English
in the Library of Congress). Image classifiers don't get _a lot_ better when
trained on a lot more data than ImageNet. 140GB is probably enough to train a
general model, which could be finetuned on extra data for specific tasks.

Text generators need a world model and situational awareness, something like a
map and a GPS signal. So we are probably two major breakthroughs away from a
machine that actually _understands_ something (or at least which seems to
understand something, if you're philosophically opposed to the idea that a
machine can understand something).

------
foundart
Could someone provide a high level summary of what this is for a technical
person not conversant with the field?

~~~
csande17
Salesforce has created a computer program where you put in a small prompt,
like "Wikipedia page about badgers" or "News article starting with the line,
'Donald Trump was impeached today'", or "French translation of 'I like
pears'", and it tries to predict what the text will be. You can also run the
program in reverse, where you put in a snippet of text and it predicts whether
it came from Wikipedia or a mystery novel or the fitness subreddit.
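The "run it in reverse" trick is essentially scoring the snippet under each source's model and picking the best fit. A toy unigram version with made-up word counts (nothing from the actual model) looks like this:

```python
import math
from collections import Counter

# Made-up word statistics standing in for two "sources".
sources = {
    "wikipedia": Counter("the badger is a mammal of the family mustelidae".split()),
    "fitness":   Counter("do more reps and track your macros every day".split()),
}

def classify(snippet):
    """Return the source whose unigram model likes the snippet best."""
    def score(counts):
        total = sum(counts.values())
        vocab = len(counts)
        return sum(
            math.log((counts[w] + 1) / (total + vocab))  # add-one smoothing
            for w in snippet.split()
        )
    return max(sources, key=lambda name: score(sources[name]))

print(classify("the badger is a mammal"))  # → wikipedia
print(classify("track your reps"))         # → fitness
```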

Salesforce created the program by first writing some relatively simple linear
algebra, then fiddling with the constants until the output happened to look
right. Their program contains 1.6 billion constants, which is more than any
other program of its kind.

This program is also special because Salesforce has released it publicly;
other organizations, like OpenAI, have previously claimed that text-generation
software is too dangerous to release to the general public.

~~~
lixtra
> writing some relatively simple linear algebra

Except that it wouldn't work if it were purely linear.

~~~
csande17
Right, yeah, it's linear algebra combined with a few non-linear functions. The
point is that Salesforce didn't come up with an algorithm that generated
English text by writing a grammar or thinking really hard about what sentences
look like—all the functionality comes from the "training" process that set the
constants.
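"Linear algebra combined with a few non-linear functions" fits in a few lines; the 1.6 billion trained constants are just much bigger versions of the W1, b1, W2, b2 below (toy sizes and hand-picked values, purely illustrative):

```python
# A two-layer "network": matrix multiplies plus one non-linearity.
# The numbers in W1, b1, W2, b2 are the trained constants; CTRL has
# ~1.6 billion of them, this toy has eleven.
W1 = [[0.5, -0.2], [0.1, 0.8]]
b1 = [0.0, 0.1]
W2 = [[1.0], [-1.0]]
b2 = [0.2]

def relu(x):
    return max(0.0, x)

def forward(x):
    # hidden = relu(x @ W1 + b1). The non-linearity is essential:
    # without relu, the two layers collapse into one linear map.
    hidden = [
        relu(sum(xi * W1[i][j] for i, xi in enumerate(x)) + b1[j])
        for j in range(2)
    ]
    return [
        sum(hi * W2[i][j] for i, hi in enumerate(hidden)) + b2[j]
        for j in range(1)
    ]

print(forward([1.0, 2.0]))
```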

------
buboard
> Advertisement

Yeap, this one is indistinguishable from reality.

------
dan_mctree
Are there any hardware reqs to work with this?

~~~
pas
In theory, no. But for any decent performance, you need a big CUDA-capable
GPU, as far as I know.

But you can try it on a CPU, of course (maybe with some modifications; see
this:
[https://news.ycombinator.com/item?id=20977776](https://news.ycombinator.com/item?id=20977776)
; also, if someone can get it working in Google Colab, you get a GPU-enabled
instance for free).

------
kevinwang
Open AI did the right thing by not releasing their model; it's disappointing
that researchers are so callous about the potential effects of their research
in the name of progress.

~~~
csande17
I've never really gotten why AI types are so concerned about text-generation
models.

Like, sure, I can kind of see why you wouldn't want to make the Deepfakes
program public; it currently takes a lot of time, effort, and expertise to
swap faces realistically in a video, and maybe we don't want to give every
average Joe the ability to do that.

But pretty much everyone in the world can already pretty trivially write text.
(I'm doing it right now!) And the "typical" generation output from these
programs usually isn't very good—OpenAI had to try like thirty times for each
of the prompts in their PR materials—so it usually ends up being less work to
just write the fake news yourself instead of using the software.

My personal conspiracy theory is that all this talk of "the model is too
dangerous to release" really boils down to "if we let people test out the
model, they'll find it doesn't work as well as our PR team wants them to think
it does".

~~~
visarga
I dunno, this time the text looks really good. I got 5 or 6 sentences deep
before it said anything silly. I would have been fooled if I read it in real
life.

My guess is that they will perfect the transformer and its training process,
curate the dataset and make this method really easy to use. Maybe it can do
translation, math, even auto-complete code. That is only by iterating more on
the current formulation of the Transformer.

But it is also possible that it will be surpassed by something even better. A
new language model could replace the inductive bias specific to the
Transformer (the ability to "attend" to any part of the input text) with
something more efficient, because Transformers are quite hard and expensive to
train right now. Maybe the Transformer's inductive bias is too general (like a
fully connected network) and needs too much data; with a slightly different
idea it could be made much more efficient and probably more convincing.

