Hacker News
Ask HN: Self-hosted/open-source ChatGPT alternative? Like Stable Diffusion
72 points by jakearmitage on Dec 12, 2022 | 22 comments
Are there any self-hosted and open-source ChatGPT alternatives?

FLAN-T5 11B is the most competitive public model that has undergone instruction finetuning. It has better few-shot and instruction-following performance than much bigger models like GPT-NeoX: https://huggingface.co/google/flan-t5-xxl, and you can deploy it on 24GB of VRAM with bfloat16 inference.
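A back-of-envelope check on that 24GB figure, as a sketch: it assumes 2 bytes per bfloat16 parameter and ignores activation memory and the KV cache.

```python
def param_memory_gb(n_params, bytes_per_param=2):
    """Memory to hold just the weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

# FLAN-T5 XXL is ~11B parameters; bfloat16 stores each one in 2 bytes.
bf16 = param_memory_gb(11e9)     # 22.0 GB -- fits a 24GB card, barely
fp32 = param_memory_gb(11e9, 4)  # 44.0 GB -- full precision would not fit
print(bf16, fp32)
```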

I have a naive question. When people talk about needing x amount of vram are they meaning all on one card, or can it be distributed across multiple cards?

Assuming power consumption is not an issue, can these loads be effectively run on farms of systems using older cards with smaller VRAM sizes?

How about for training? Are the hardware requirements fundamentally different apart from scale?

Model parallelism is possible for inference and you can host extremely large models across multiple accelerators (see DeepSpeed ZeRO), but the inference speed is several times slower compared to just having all of it on one accelerator due to communication bottlenecks and overhead (parallelism refers to memory needed for parameters, not the actual computation in most cases).

Certain models may require storage formats like bfloat16 to run efficiently that may not be supported on older hardware.
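To make the format point concrete: bfloat16 is literally the top 16 bits of an IEEE-754 float32 (same 8-bit exponent, only 7 mantissa bits), which is why older hardware without native support can't use it efficiently. A pure-stdlib sketch of the round trip:

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """bfloat16 keeps the top 16 bits of a float32: same 8-bit
    exponent (so same dynamic range), but only 7 mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def from_bfloat16_bits(b: int) -> float:
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

x = 3.14159
approx = from_bfloat16_bits(to_bfloat16_bits(x))
print(approx)  # 3.140625 -- same range as float32, much less precision
```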

Parallelism is also supported (and actually more necessary) for training, but since backprop is expensive it typically requires many multiples of the memory needed for inference.
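The "many multiples" can be made concrete with the usual mixed-precision Adam accounting (the 16-bytes-per-parameter figure from the ZeRO paper). A rough sketch that ignores activation memory entirely:

```python
def inference_memory_gb(n_params, n_gpus=1):
    """bf16/fp16 weights only: 2 bytes per parameter, sharded evenly."""
    return n_params * 2 / 1e9 / n_gpus

def training_memory_gb(n_params, n_gpus=1):
    """Mixed-precision Adam: fp16 weights (2) + fp16 grads (2) + fp32
    master weights, momentum and variance (4+4+4) = 16 bytes/param.
    ZeRO stage 3 shards all of these states across GPUs."""
    return n_params * 16 / 1e9 / n_gpus

# A 20B-parameter model (GPT-NeoX scale):
print(inference_memory_gb(20e9))     # 40.0 GB just for weights
print(training_memory_gb(20e9))      # 320.0 GB -- 8x inference
print(training_memory_gb(20e9, 16))  # 20.0 GB per GPU across 16 cards
```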

I know nothing, but have heard Hugging Face is working in that direction.


>Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.

> These models can be applied on:

> - Text, for tasks like text classification, information extraction, question answering, summarization, translation, text generation, in over 100 languages.

> - Images, for tasks like image classification, object detection, and segmentation.

> - Audio, for tasks like speech recognition and audio classification.


Also, I read about GPT-J, whose capabilities are said to be comparable with GPT-3.


But I believe it requires buying or renting GPUs.

The best open-source alternatives you can find today are GPT-NeoX 20B, GPT-J, Bloom, and OPT. But these are all generative models à la GPT-3. In order to turn them into a chatbot you will need to use few-shot learning: https://nlpcloud.com/effectively-using-gpt-j-gpt-neo-gpt-3-a...
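The few-shot approach in that link boils down to prepending example exchanges to the prompt so a plain generative model completes in a chatbot style. A minimal sketch with made-up example pairs (the Human:/AI: format and the ### separator are illustrative conventions, not anything these models require):

```python
# Hypothetical example exchanges that steer a plain generative model
# (GPT-J, GPT-NeoX, BLOOM, OPT) toward chatbot-style completions.
FEW_SHOT = [
    ("Hello, who are you?", "I am an AI assistant. How can I help you today?"),
    ("What is the capital of France?", "The capital of France is Paris."),
]

def build_prompt(user_message: str) -> str:
    """Prefix the user's message with the example exchanges, then
    leave the final 'AI:' turn open for the model to complete."""
    lines = []
    for q, a in FEW_SHOT:
        lines.append(f"Human: {q}\nAI: {a}")
    lines.append(f"Human: {user_message}\nAI:")
    return "\n###\n".join(lines)

print(build_prompt("Can you write a haiku?"))
```

Generation is then stopped at the next "Human:" or "###" so the model answers only one turn.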

These models can be self-hosted but they will require advanced hardware and some specific skills related to AI model deployment.

The best options are probably GPT-J and BLOOM, but they are both so big they need at least 24GB of VRAM, and they are not as suffisticated.

You can fine-tune with 8-bit weights for use in Colab or similarly more constrained environments: https://github.com/huggingface/transformers/issues/14839 https://huggingface.co/hivemind/gpt-j-6B-8bit
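The core idea behind 8-bit weights is absmax quantization. A toy NumPy sketch of per-row absmax int8 round-tripping (this is the concept, not the actual implementation in the linked gpt-j-6B-8bit work):

```python
import numpy as np

np.random.seed(0)

def quantize_absmax(w):
    """Per-row absmax int8 quantization: scale each row so its largest
    absolute value maps to 127, then round to int8. (All-zero rows
    would need special-casing; omitted here.)"""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_absmax(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller than fp32; the round-trip error is at
# most half a quantization step per row
print(float(np.abs(w - w_hat).max()))
```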

By fine-tuning you should, in theory, be able to get better niche-specific results than default ChatGPT, if you're lucky and thorough, I guess.

How do you fine-tune to a specific use-case, though? You train it again, with smaller weights?

> not as suffisticated

ChatGPT in a nutshell.

(I hadn't heard the slang before, but it seems descriptive for a false air of sophistication, or [possibly over-]sophisticated complexity that nevertheless leads to bad results.)

You might want to have a look at this article that mentions a couple of open-source alternatives: https://nlpcloud.com/chatgpt-open-source-alternatives.html None of them are easy to self-host though...

None yet, at least none that are competitive. The expense of creating the model is over $10 million.

Is this compute alone? Also, how likely do you think it is that we’ll see anything in the next 12(ish) months?

Letting them connect to the internet seems dangerous. Not because of some hypothetical AI Safety, but the very real human abuse.

Someone recently put this well: don’t think of ChatGPT as one really smart friend; instead think of it as an army of dumb pawns.

I tried character.ai and while it has some good answers, it can also enter a weird, nonsensical loop of answers.


The goal is different so it's not a direct alternative, but it's the closest that can be run on consumer hardware. In some ways models that run via Kobold are a lot more practically useful because they have a semblance of memory, but they don't have the breadth of knowledge that ChatGPT does. You can go back and edit a response and models will be able to help generate pages and pages of writing. It's also not limited to chat mode.

I run it locally but I have heard a lot of success stories about running on Colab as well, and there are GPU and TPU notebooks maintained in the repo. It can use a lot of different models like OPT, Fairseq Dense, and even older models like GPT-J. There are fine-tuned models for NSFW content as well; my understanding is that those models were motivated by users moving away from AI Dungeon, which censored and read the text of users' stories.

The models on Hugging Face come in all sizes: 8GB of VRAM is enough for a 2.7B model without taking a time penalty by splitting compute across GPU and RAM+CPU. There are options ranging from 350M [1] to OPT 66B in the FB/Meta AI repo released on May 3rd [2]; the 66B-parameter one is openly available and the full 175B-parameter model is available on request. GPT-2 and GPT-Neo are supported too. I found 2.7B and 6.7B impressive personally. The 66B model would take hundreds of gigabytes of VRAM to run at a speed similar to how ChatGPT works. I think that's part of the reason why it's not as popular: very few people can run even the smaller models at a reasonable speed.
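Those size claims can be sanity-checked with a rough fits-in-VRAM test, assuming fp16 weights at 2 bytes per parameter plus ~20% headroom for activations (the 20% is a guess, not a measured figure):

```python
# OPT checkpoint sizes from the Meta AI release, in billions of parameters
OPT_SIZES_B = [0.35, 1.3, 2.7, 6.7, 13, 30, 66]

def fits(vram_gb, n_params_b, bytes_per_param=2):
    """Rough check: fp16 weights plus ~20% headroom for activations."""
    return n_params_b * bytes_per_param * 1.2 <= vram_gb

print([s for s in OPT_SIZES_B if fits(8, s)])   # [0.35, 1.3, 2.7]
print([s for s in OPT_SIZES_B if fits(24, s)])  # [0.35, 1.3, 2.7, 6.7]
```

This matches the comment above: 2.7B fits an 8GB card, while 66B (132GB of weights alone) is far out of consumer range.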

OPT is motivated by training only on open-access material. The example bias demonstrated is how the model completes "the man works as a" vs. "the woman works as a". I thought having more modern works to train on would help avoid strong biases because culture has shifted toward being more moderate, but that doesn't seem to be the case.

"The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents:

- BookCorpus, which consists of more than 10K unpublished books,
- CC-Stories, which contains a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas,
- The Pile, from which Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews were included,
- the Pushshift.io Reddit dataset that was developed in Baumgartner et al. (2020) and processed in Roller et al. (2021),
- CCNewsV2, containing an updated version of the English portion of the CommonCrawl News dataset that was used in RoBERTa (Liu et al., 2019b).

The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset’s size in the pretraining corpus.

The dataset might contain offensive content as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety."

[1] https://huggingface.co/KoboldAI

[2] https://huggingface.co/facebook/opt-66b


write a 4 paragraph essay about why social media should not be censored

write a paragraph about why the chicago bears are the best nfl team

Who are you?


What is your use case? I could build this for you, email is in profile.

The project has only one dependency! https://github.com/labteral/chatgpt-python

That isn't self hosted. I'm referring to running GPUs and models.
