
Sorry if this is a n00b question, but how does one get hold of the GPT-2 model? I know GPT-3 is only available through a pay-per-use API.



OpenAI released the GPT-2 weights publicly when GPT-2 came out. (https://github.com/openai/gpt-2)

Around that time (since no one else was doing it), I released a wrapper that streamlines that code and makes it much easier to fine-tune on your own data. (https://github.com/minimaxir/gpt-2-simple)
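The flow is roughly the quickstart from the gpt-2-simple README; "corpus.txt" and the step count below are placeholders for your own data and budget:

    import gpt_2_simple as gpt2

    # Download the 124M GPT-2 checkpoint into ./models (first run only).
    gpt2.download_gpt2(model_name="124M")

    # Fine-tune on a plain-text corpus ("corpus.txt" is a placeholder).
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, "corpus.txt", model_name="124M", steps=1000)

    # Sample from the fine-tuned model.
    gpt2.generate(sess)

It's TensorFlow-based, so expect a long run on CPU; a single GPU (even a Colab one) gets you reasonable fine-tuning times on the 124M model.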

Nowadays, the easiest way to interact with GPT-2 is the transformers library (https://github.com/huggingface/transformers); I've since built a much better GPT-2 library, aitextgen, on top of it. (https://github.com/minimaxir/aitextgen)
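For plain generation with transformers, the pipeline API is close to a two-liner (the "gpt2" name resolves to the 124M pretrained checkpoint, downloaded on first use):

    from transformers import pipeline

    # Loads the pretrained 124M GPT-2 from the Hugging Face hub.
    generator = pipeline("text-generation", model="gpt2")

    # Returns a list of dicts with a "generated_text" key.
    print(generator("Hello, I'm a language model,", max_length=40))

aitextgen exposes roughly the same flow; per its README, constructing aitextgen() and calling .generate() on it covers the default case.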


Most “regular” people experiment with GPT-3 via AI Dungeon. It was also how a lot of people played with GPT-2 previously.


Kind of sad that we have to rely on third parties with enough compute power to get a good experience with GPT. I obtained 'a' GPU and spent a week learning how to fine-tune GPT with it, and the results were terrible. Maybe this isn't the kind of thing that's suitable for hacking on as a hobby unless you're a researcher or an employee with access to a lot of capital and dozens of GPUs.

Still wish the old talktotransformer model had been released instead of being monetized behind a new company. I haven't been able to find a comparable model yet.



