Show HN: A Python tool for text-based AI training and generation using GPT-2 (github.com)
174 points by minimaxir 16 days ago | 41 comments

For fun, here's a little demo of aitextgen that you can run on your own computer.

First install aitextgen:

    pip3 install aitextgen

Then you can download and generate from a custom Hacker News GPT-2 model I made (only 30MB, compared to 500MB for the 124M GPT-2) using the CLI!

    aitextgen generate --model minimaxir/hacker-news --n 20 --to_file False

Want to create Show HN titles? You can do that.

    aitextgen generate --model minimaxir/hacker-news --n 20 --to_file False --prompt "Show HN:"

Show HN: Numericcal – A simple, distributed, and fast backups

Show HN: A simple, free and open source alternative to Turkish potatoies

Show HN: A boilerplate for mobile development

Show HN: Simple UI Gao-Parser (for the Web)

Show HN: A fast, fully-featured web application framework

Show HN: I have a side project you want to sell in a startup?

Show HN: S3CARP Is Down

Show HN: Finding the right work with friends and family

Show HN: I built a webapp to remind users to view your photoshopped stripes

Show HN: Send a hands-only gift reason to the Mark Zuckerberg & Stay a lot.

Show HN: A simple, high-performance, full-disk encryption

Show HN: Peer-to-peer programming language

Show HN: Browse and duplicate images in your app's phone

Show HN: Waze – Send a face back end to the internet

Show HN: A simple, minimal, real-time building app to control your Mac.

Show HN: Sheldonize – A collaborative group for startups

Show HN: Gumroad – Make your web app faster

Show HN: An easy way to track time using MD5?

Show HN: A simple, fast, and elegant ORM/Lambda: progressive web apps for Vim

Show HN: A simple landing page I've been working on elsdst Certy. Here is how I was within the last year

>progressive web apps for Vim

Well, it knows how to get HN users' attention all right.

Show HN: An easy way to track time using MD5?

I need to see this in action.

Hash the current time in seconds; every time the first three hex digits are 000, on average another hour and 8 minutes has passed.
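The arithmetic behind that estimate can be checked with a short sketch. It assumes the timestamp is hashed as the decimal string of whole epoch seconds and that MD5 output behaves as uniformly distributed; the function names are illustrative and not taken from any library.

```python
import hashlib


def is_tick(epoch_seconds: int, zeros: int = 3) -> bool:
    """Return True when the MD5 hex digest of the timestamp starts with `zeros` zeros."""
    digest = hashlib.md5(str(epoch_seconds).encode("ascii")).hexdigest()
    return digest.startswith("0" * zeros)


def expected_interval_seconds(zeros: int = 3) -> int:
    """Each hex digit has 16 values, so a tick is expected once every 16**zeros seconds."""
    return 16 ** zeros


def count_ticks(start: int, end: int, zeros: int = 3) -> int:
    """Count ticks in the half-open range [start, end) of epoch seconds."""
    return sum(is_tick(t, zeros) for t in range(start, end))


minutes, seconds = divmod(expected_interval_seconds(), 60)
print(f"expected gap: {minutes} min {seconds} s")  # expected gap: 68 min 16 s
```

With three leading zeros the expected gap is 16^3 = 4096 seconds, which matches the "hour and 8 minutes" figure. Individual gaps will vary widely, since each second is an independent 1-in-4096 chance.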

It's easy, awkward, time consuming and probably pretty wrong for tracking hours. Just like regular time tracking!

I get a source.error("unbalanced parenthesis") if I put unbalanced parentheses in the --prompt "argument)"

Likely same issue as here: https://github.com/minimaxir/aitextgen/issues/8

Will take a look.

how does one download these models?

  Ask HN: What's your favorite computer science podcasts?
  Ask HN: How do I convince a non-technical exercise to keep a journal
  Ask HN: Is it just me or not?
  Ask HN: What do I do with my MVP?
  Ask HN: How to sell?
  Ask HN: How do you use HackerNews?
  Ask HN: Best way to make a B2B startup?
  Ask HN: Why do I have to live in San Francisco?
  Ask HN: How to tell my heart changes?
  Ask HN: How to deal with the difference between a job interview and a product?
  Ask HN: What is your favorite open-source sytem?
  Ask HN: What are your favorite blogs and resources?
  Ask HN: What are the best books for learning a new language/frameworks?
  Ask HN: What's your favorite HN post?
  Ask HN: What is your favorite RSS reader
  Ask HN: Is the SE not a mistake like a safe space business?
  Ask HN: How do I start programming in a job?

Seems you figured it out, but for posterity, it will automatically download the models if not cached.

First you have to run this in Python:

  from aitextgen import aitextgen
  ai = aitextgen(model="minimaxir/hacker-news")

Then use the CLI, I guess.
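For reference, generation can also be done entirely from Python. This is a minimal sketch that assumes the `aitextgen(model=...)` constructor shown in this comment and the `generate(n, prompt=...)` call that appears elsewhere in this thread; the wrapper name `generate_titles` is made up for illustration.

```python
def generate_titles(prompt: str = "Show HN:", n: int = 20) -> None:
    """Load the Hacker News model and print n generated titles to the console."""
    # Imported lazily so this function can be defined without aitextgen installed.
    from aitextgen import aitextgen

    # Downloads the model on first use if it is not already cached.
    ai = aitextgen(model="minimaxir/hacker-news")
    ai.generate(n=n, prompt=prompt)
```

Calling `generate_titles()` should behave roughly like the `aitextgen generate` CLI command with `--prompt "Show HN:"`, though the exact output differs on every run.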

I've been following minimaxir's work with GPT-2 for a while - I've tried building things on https://github.com/minimaxir/gpt-2-simple for example - and this looks like a HUGE leap forward in terms of developer usability. The old stuff was pretty good on that front, but this looks absolutely amazing. Really exciting project.

This is just brilliant. As someone who has little working knowledge but massive interest in this field, I found your guide exceptionally well written and newbie friendly (the way you've explained how to set up this and that and left so many tips throughout is indeed very useful).

I'm going to have a lot of fun with this, and it's going to be my starting point for learning more about Colab notebooks and AI (I've always loved doing practical things instead of reading theory to learn something new).

Kudos to you for all this amazing work.

P.S. Sorry if this is a lame question, but can this be used the way Gmail recently started autocompleting my email sentences?

Awesome work! Whenever people tell me they want to get started with NLP I tell them to play around with your libraries as they're the easiest way to immediately start doing cool things.

Huge fan of your gpt-2-simple library, which I used to train a satirical news generator in a Colab notebook: https://colab.research.google.com/drive/1buF7Tju3DkZeL-EV4Ft...

> Generates text faster than gpt-2-simple and with better memory efficiency! (even from the 1.5B GPT-2 model!)

This is exciting news. One of the very few drawbacks of gpt-2-simple is the inability to fine-tune a model of more than ~355M parameters. Do these memory management improvements make it possible to fine-tune a larger one?

> Do these memory management improvements make it possible to fine-tune a larger one?

Unfortunately not yet; I need to implement gradient checkpointing first. Memory-wise, the results for finetuning 124M are promising (<8 GB VRAM when it used to take about 12 GB VRAM with gpt-2-simple)

Have I been using gpt-2-simple wrong? I've been fine-tuning 355M on an 8GB 1080 for months.

gpt-2-simple has gradient checkpointing; aitextgen does not (yet).

This looks great!

If I want to fine-tune this to some text data, are there obvious constraints to be aware of? I've got a reasonable amount of text (~50-100G) but seeing that there's a json file created makes me think that's probably too much. gpt-2-simple seems to describe 100M as 'massive' so what's a reasonable amount to aim for?

Or should I be training from scratch? (edit - looking into training from scratch since I don't have thousands to throw at this I'm guessing that's a 'no')

~50-100G isn't "some" text data. The original GPT-2 was trained on 40G of text data.

I'm not 100% sure you can encode and store that much data in memory with the current implementation, even with the fast tokenizers.

Oh, that's less than I was expecting - I'm used to having significantly less data to play with than the major entities. I guess I do but in this case a pretty reasonable amount of data was enough for very impressive results.

> I'm not 100% sure you can encode and store that much data in memory with the current implementation, even with the fast tokenizers.

That makes sense. I wasn't too sure what sensible sizes would be. There are probably some interesting subsets of the data I could take and use for fine-tuning (or some sampled data), maybe down to 100M, as that sounded like a large-but-OK amount to use.
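One way to cut a large corpus down to a size like that is to sample lines at random up to a byte budget. The sketch below is only an illustration under stated assumptions: the corpus is line-oriented, its total size is known up front (for a file, something like `os.path.getsize` could supply it), and the helper name `sample_to_budget` is hypothetical rather than part of aitextgen or gpt-2-simple.

```python
import random
from typing import Iterable, List


def sample_to_budget(
    lines: Iterable[str], total_bytes: int, target_bytes: int, seed: int = 0
) -> List[str]:
    """Keep each line with probability target/total, never exceeding the byte budget."""
    if total_bytes <= 0:
        return []
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    keep_prob = min(1.0, target_bytes / total_bytes)
    kept: List[str] = []
    used = 0
    for line in lines:
        size = len(line.encode("utf-8"))
        # Skip lines that lose the coin flip or would overflow the budget.
        if rng.random() < keep_prob and used + size <= target_bytes:
            kept.append(line)
            used += size
    return kept


# Toy corpus standing in for a much larger text file.
corpus = [f"example line {i}\n" for i in range(1000)]
total = sum(len(s.encode("utf-8")) for s in corpus)
subset = sample_to_budget(corpus, total_bytes=total, target_bytes=total // 10)
print(len(subset), "lines kept")
```

Because `lines` can be any iterable, the same function could stream over an open file without loading the whole corpus into memory.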

I'm looking forward to seeing what I can get out of this. Thanks for making something simple enough that I can do that for an "I wonder if" kind of problem!

Your API looks really clean, but what's the difference between this and just GPT-2 or Hugging Face's implementations?

I talk about deviations from previous approaches in the DESIGN doc (https://github.com/minimaxir/aitextgen/blob/master/DESIGN.md), but to summarize the difference between aitextgen and Hugging Face Transformers:

Model I/O: aitextgen abstracts some of the boilerplate and supports custom GPT-2 models and importing the old TensorFlow models better.

Training: Completely different from Transformers. Different file processing and encoding, training loop leverages pytorch-lightning.

Generation: Abstracts boilerplate, allowing addition of more utility functions (e.g. bolding when printing to console, printing bulk text to file). Generation is admittedly not that much different from Transformers, but future iterations will increasingly diverge.

Does anyone know an efficient way to "embed" models like this? I'm currently working on a Tamagotchi-style RPi toy and I use GPT-2 to generate answers to the chat. I wrote a simple API that returns responses from the server. If I could embed my model, it would save me from having to run a server.

The hard part of embedding is that the smallest 124M GPT-2 model itself is huge at 500MB, which would be unreasonable for performance/storage on the user end (and quantization/tracing can't save that much space).
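The 500MB figure is roughly what the parameter count predicts. A back-of-the-envelope sketch, assuming weights are stored as 4-byte float32 values and using the nominal round parameter counts (real checkpoint files carry some extra overhead):

```python
def model_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the stored weights in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


# Nominal GPT-2 sizes mentioned in this thread; float32 storage is an assumption.
for name, params in [("124M", 124_000_000), ("355M", 355_000_000), ("1.5B", 1_500_000_000)]:
    print(f"GPT-2 {name}: ~{model_size_mb(params):.0f} MB as float32")
```

For the 124M model this gives about 496 MB, in line with the size quoted above.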

Hence why I'm looking into smaller models, which has been difficult, but releasing aitextgen was a necessary first step.

The size of the model you need to get good enough generation with something like GPT-2 is going to be pretty impractical on a Raspberry Pi. You might be able to fit a 3-layer distilled GPT-2 in RAM (not quite sure what the latest RPi has in terms of RAM, 4GB?), but the latency is going to be pretty horrible (multiple seconds).

Why not put it on a server and just use an API to communicate and get the results? Then the embedded code that interfaces with the API would be much smaller, and the server can be as big as you need.

What do you mean by embed the model?

Don't understand why this didn't get more hype. This is amazing. Well done

AI text generation in general is an industry that's been underhyped. Which is why I'm trying to help shape it. :)

Very cool. Can I ask what your use case is, or if it's just for fun?

I had to do something similar (not this library, but I wish I had known about it) just last week. I'm building out a product demo and I wanted to fill it with books. I didn't want to go searching for out of print books, so I created fake authors, book titles, descriptions, and reviews. The longer text was sometimes great, and sometimes had to be redone but overall it worked really well.

I intend to productionize text generation, and this is a necessary intermediate step. (gpt-2-simple had too many issues in this area so I needed to start from scratch)

That is cool, do you have a timeline set out for this?

I'll likely start by creating a web API service similar to what I did for gpt-2-simple, except more efficient: https://github.com/minimaxir/gpt-2-cloud-run

The next step is architecting an infrastructure for scalable generation; that depends on a few fixes for both aitextgen and the base Transformers. No ETA.

I’m planning to ingest all my historical email into the model and then I’ll have a generator that writes for me using my voice...

Nice, trying this out.

Great work, been loving gpt-2-simple recently!

One question: Why put more sh!!t on top of Hugging Face's already really good code?

Very interesting results

  >>> ai.generate(1, prompt="Trump")
  Trump] The best way to start your life is to have sex with someone who is still a virus

As funny as it seems, it shows what things are being associated with each other.
