
Show HN: A Python tool for text-based AI training and generation using GPT-2 - minimaxir
https://github.com/minimaxir/aitextgen
======
minimaxir
For fun, here's a little demo of aitextgen that you can run on your own
computer.

First install aitextgen:

    
    
        pip3 install aitextgen
    

Then you can download and generate from a custom Hacker News GPT-2 model I
made (only 30MB, compared to 500MB for the 124M GPT-2) using the CLI!

    
    
        aitextgen generate --model minimaxir/hacker-news --n 20 --to_file False
    

Want to create Show HN titles? You can do that.

    
    
        aitextgen generate --model minimaxir/hacker-news --n 20 --to_file False --prompt "Show HN:"
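
If you'd rather do this from Python instead of the CLI, the equivalent is
roughly the following (a sketch; see the README for the exact options):

    from aitextgen import aitextgen

    # Downloads the ~30MB custom Hacker News model on first use
    ai = aitextgen(model="minimaxir/hacker-news")

    # Print 20 generated titles seeded with a "Show HN:" prompt
    ai.generate(n=20, prompt="Show HN:")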

~~~
totetsu
how does one download these models?

~~~
totetsu

      Ask HN: What's your favorite computer science podcasts?
      ==========
      Ask HN: How do I convince a non-technical exercise to keep a journal
      ==========
      Ask HN: Is it just me or not?
      ==========
      Ask HN: What do I do with my MVP?
      ==========
      Ask HN: How to sell?
      ==========
      Ask HN: How do you use HackerNews?
      ==========
      Ask HN: Best way to make a B2B startup?
      ==========
      Ask HN: Why do I have to live in San Francisco?
      ==========
      Ask HN: How to tell my heart changes?
      ==========
      Ask HN: How to deal with the difference between a job interview and a product?
      ==========
      Ask HN: What is your favorite open-source sytem?
      ==========
      Ask HN: What are your favorite blogs and resources?
      ==========
      Ask HN: What are the best books for learning a new language/frameworks?
      ==========
      Ask HN: What's your favorite HN post?
      ==========
      Ask HN: What is your favorite RSS reader
      ==========
      Ask HN: Is the SE not a mistake like a safe space business?
      ==========
      Ask HN: How do I start programming in a job?

------
simonw
I've been following minimaxir's work with GPT-2 for a while - I've tried
building things on
[https://github.com/minimaxir/gpt-2-simple](https://github.com/minimaxir/gpt-2-simple)
for example - and this looks like a HUGE leap forward in terms of developer
usability. The old stuff was pretty good on that front, but this looks
absolutely amazing. Really exciting project.

------
superasn
This is just brilliant. As someone who has little working knowledge but a
massive interest in this field, I found your guide exceptionally well written
and newbie friendly (the way you've explained how to set everything up and
left so many tips throughout is very useful).

I'm going to have a lot of fun with this, and it's going to be my starting
point for learning more about Colab notebooks and AI (I've always loved doing
practical things instead of reading theory to learn something new).

Kudos to you for all this amazing work.

p.s. Sorry if this is a lame question, but can this be used the way Gmail has
recently started autocompleting my email sentences?

------
starskublue
Awesome work! Whenever people tell me they want to get started with NLP I tell
them to play around with your libraries as they're the easiest way to
immediately start doing cool things.

------
neoncontrails
Huge fan of your gpt2-simple library, which I used to train a satirical news
generator in a Colab notebook:
[https://colab.research.google.com/drive/1buF7Tju3DkZeL-EV4Ft...](https://colab.research.google.com/drive/1buF7Tju3DkZeL-EV4FtSu2KEJ4yepbL0?usp=sharing)

> Generates text faster than gpt-2-simple and with better memory efficiency!
> (even from the 1.5B GPT-2 model!)

This is exciting news. One of the very few drawbacks of gpt2-simple is the
inability to fine-tune a model with more than ~355M parameters. Do these
memory-management improvements make it possible to fine-tune a larger one?

~~~
minimaxir
> Do these memory management improvements make it possible to fine-tune a
> larger one?

Unfortunately not yet; I need to implement gradient checkpointing first.
Memory-wise, the results for finetuning 124M are promising (<8 GB VRAM, where
it used to take about 12 GB VRAM with gpt-2-simple).

~~~
britmob
Have I been using gpt-2-simple wrong? I've been fine-tuning 355M on an 8GB
1080 for months...

~~~
minimaxir
gpt-2-simple has gradient checkpointing; aitextgen does not (yet).
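
(For the curious: on the Transformers side, that means flipping it on for the
underlying model, something like the sketch below - the exact call depends on
the Transformers version, and it isn't wired into aitextgen yet.)

    from transformers import GPT2LMHeadModel

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.config.use_cache = False           # caching conflicts with checkpointing
    model.gradient_checkpointing_enable()    # trade extra compute for lower VRAM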

------
IanCal
This looks great!

If I want to fine-tune this on some text data, are there obvious constraints
to be aware of? I've got a reasonable amount of text (~50-100G), but seeing
that a JSON file gets created makes me think that's probably too much.
gpt-2-simple seems to describe 100M as 'massive', so what's a reasonable
amount to aim for?

Or should I be training from scratch? (edit: having looked into training from
scratch, I'm guessing that's a 'no', since I don't have thousands to throw at
this)

~~~
minimaxir
~50-100G isn't "some" text data. The _original_ GPT-2 was trained on 40G of
text data.

I'm not 100% sure you can encode and store that much data in memory with the
current implementation, even with the fast tokenizers.
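
For scale, fine-tuning on a normal-sized text file looks roughly like this (a
sketch with a made-up filename; see the docs for the exact arguments):

    from aitextgen import aitextgen
    from aitextgen.TokenDataset import TokenDataset

    # The corpus is encoded up front and kept in memory, which is why a
    # 50-100GB corpus is likely too much for the current implementation.
    data = TokenDataset("corpus.txt")

    ai = aitextgen()  # defaults to the 124M GPT-2
    ai.train(data, batch_size=1, num_steps=5000)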

~~~
IanCal
Oh, that's less than I was expecting - I'm used to having significantly less
data to play with than the major entities. I guess I still do, but in this
case a pretty reasonable amount of data was enough for very impressive
results.

> I'm not 100% sure you can encode and store that much data in memory with the
> current implementation, even with the fast tokenizers.

That makes sense. I wasn't too sure what sensible sizes would be; there are
probably some interesting subsets of the data (or some sampled data) I could
take and use for fine-tuning, though - maybe down to 100M, since that sounded
like a large-but-OK amount to use.

I'm looking forward to seeing what I can get out of this - thanks for making
something simple enough that I can do that for an "I wonder if" kind of
problem!

------
alphagrep12345
Your API looks really clean, but what's the difference between this and plain
GPT-2 or HuggingFace's implementations?

~~~
minimaxir
I talk about deviations from previous approaches in the DESIGN doc
([https://github.com/minimaxir/aitextgen/blob/master/DESIGN.md](https://github.com/minimaxir/aitextgen/blob/master/DESIGN.md)),
but to answer the difference between aitextgen and Huggingface Transformers:

Model I/O: aitextgen abstracts some of the boilerplate, and has better
support for custom GPT-2 models and for importing the old TensorFlow models.

Training: _Completely_ different from Transformers. Different file processing
and encoding, training loop leverages pytorch-lightning.

Generation: Abstracts boilerplate, allowing the addition of more utility
functions (e.g. bolding when printing to the console, printing bulk text to a
file). Generation is admittedly not that much different from Transformers, but
future iterations will increasingly diverge.
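
As a rough side-by-side (sketch code, not canonical examples from either
project):

    # Plain Huggingface Transformers, roughly:
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    input_ids = tokenizer.encode("Show HN:", return_tensors="pt")
    output = model.generate(input_ids, max_length=64, do_sample=True, top_p=0.9)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

    # aitextgen, roughly:
    from aitextgen import aitextgen

    ai = aitextgen()                # defaults to the 124M GPT-2
    ai.generate(prompt="Show HN:")  # handles tokenization, decoding, printing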

------
jakearmitage
Does anyone know an efficient way to "embed" models like this? I'm currently
working on a Tamagotchi-style RPi toy and I use GPT-2 to generate answers in
the chat. I wrote a simple API that returns responses from a server. If I
could embed my model, it would save me from having to run a server.
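
For reference, the kind of API I mean is something along these lines (a
simplified sketch, not my actual code):

    from flask import Flask, jsonify, request
    from aitextgen import aitextgen

    app = Flask(__name__)
    ai = aitextgen()  # load the (fine-tuned) model once at startup

    @app.route("/reply", methods=["POST"])
    def reply():
        prompt = request.json.get("prompt", "")
        # return_as_list=True returns strings instead of printing to the console
        text = ai.generate(prompt=prompt, return_as_list=True)[0]
        return jsonify({"reply": text})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)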

~~~
minimaxir
The hard part of embedding is that the smallest 124M GPT-2 model itself is
huge at 500MB, which would be unreasonable for performance/storage on the user
end (and quantization/tracing can't save _that_ much space).

That's why I'm looking into smaller models, which has been difficult, but
releasing aitextgen was a necessary first step.

------
brendanfalk
Don't understand why this didn't get more hype. This is amazing. Well done.

~~~
minimaxir
AI text generation in general is an industry that's been underhyped, which is
why I'm trying to help shape it. :)

------
harshalaxman
Very cool. Can I ask what your use case is, or if it's just for fun?

~~~
minimaxir
I intend to productionize text generation, and this is a necessary
intermediate step. (gpt-2-simple had too many issues in this area, so I needed
to start from scratch.)

~~~
jramz
That is cool, do you have a timeline set out for this?

~~~
minimaxir
I'll likely start by creating a web API service similar to what I did for
gpt-2-simple, except more efficient:
[https://github.com/minimaxir/gpt-2-cloud-run](https://github.com/minimaxir/gpt-2-cloud-run)

The next step is architecting an infrastructure for scalable generation; that
depends on a few fixes for both aitextgen and the base Transformers. No ETA.

------
cedyf
Nice, trying this out!

------
britmob
Great work, been loving gpt-2-simple recently!

------
master_yoda_1
One question: why put more sh!!t on top of Huggingface's already really good
code?

------
IRegretNothing
Very interesting results

    >>> ai.generate(1, prompt="Trump")
    Trump] The best way to start your life is to have sex with someone who is still a virus

As funny as it seems, it shows what things are being associated with the
prompt.
