I have got Vicuna-13B working on an RTX 3090 Ti + OpenCL + CPU, with 90% of the weights on the GPU (otherwise it runs out of memory), at around 500ms per token.
This model is really good for a (semi-)open source model. I think this may be the first locally runnable model that I will actually use for real stuff rather than just play around for fun.
It's not ChatGPT level but it's not that far behind. It will draw ASCII art HUDs for a text adventure, analyze data, recognize languages, or write stories. AFAIK it's been trained on ChatGPT discussions, so that makes sense.
This AI still gets uppity sometimes about offensive content, but unlike ChatGPT, you can edit the prompts to put words in its mouth and encourage it to answer properly.
I only got it working yesterday and there's no nice UX at all. I'm not sure I recommend trying to use this, as llama.cpp will probably have this feature in no time with a much better user experience, although I am also trying to make it more usable.
If you follow the instructions on the Vicuna page on how to apply the deltas, and you can compile the project, then you can run:
Where /models/vicuna13b is the HuggingFace-compatible model. This will put 90% of the weights on the GPU and the remaining 10% on the CPU, which is just barely enough to not run out of GPU memory (on a 24 GB card).
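To give a rough idea of what a percentage split means (this is only a sketch of how such a split could work, not the project's actual code; every struct, field, and tensor name below is made up for illustration), you could greedily assign tensors to the GPU until you've used up the requested fraction of the total parameters:

    // Sketch: put tensors on the GPU until the requested fraction of the
    // total parameter count is used up; everything after that stays on the CPU.
    // (Hypothetical types and names, not the project's real API.)
    struct Tensor {
        name: String,
        num_params: usize,
    }

    #[derive(Debug)]
    enum Placement {
        Gpu,
        Cpu,
    }

    fn split_weights(tensors: &[Tensor], gpu_fraction: f64) -> Vec<(String, Placement)> {
        let total: usize = tensors.iter().map(|t| t.num_params).sum();
        let budget = (total as f64 * gpu_fraction) as usize;
        let mut used = 0usize;
        tensors
            .iter()
            .map(|t| {
                let place = if used + t.num_params <= budget {
                    used += t.num_params;
                    Placement::Gpu
                } else {
                    Placement::Cpu
                };
                (t.name.clone(), place)
            })
            .collect()
    }

    fn main() {
        // Made-up tensor list just to show the call; a real model has one
        // entry per weight tensor.
        let tensors = vec![
            Tensor { name: "tok_embeddings".into(), num_params: 163_840_000 },
            Tensor { name: "layers.0.attention.wq".into(), num_params: 26_214_400 },
        ];
        for (name, place) in split_weights(&tensors, 0.9) {
            println!("{name} -> {place:?}");
        }
    }

The nice thing about a fraction-based knob is that you can just tune it down until the model barely fits on a 24 GB card.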
Create a text file 'prompt' with the prompt. I've been using this template:
You are a helpful and precise assistant for checking the quality of the answer.###Human: Can you explain nuclear power to me?###Assistant:
(the model seems to use ### as delimiters to distinguish Human and Assistant). The "system prompt" is whatever text is written at the beginning.
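In case it's useful, here's a tiny sketch of how you could assemble that prompt format programmatically (the helper below is made up for illustration, it's not part of the project):

    // Sketch: build a Vicuna-style prompt with "###" separators between turns.
    // The trailing "###Assistant:" is left open so the model continues from there.
    fn build_prompt(system: &str, turns: &[(&str, &str)]) -> String {
        let mut prompt = String::from(system);
        for (speaker, text) in turns {
            prompt.push_str(&format!("###{}: {}", speaker, text));
        }
        prompt.push_str("###Assistant:");
        prompt
    }

    fn main() {
        let prompt = build_prompt(
            "You are a helpful and precise assistant for checking the quality of the answer.",
            &[("Human", "Can you explain nuclear power to me?")],
        );
        // Prints the same text as the 'prompt' file example above.
        println!("{prompt}");
    }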
The feature to load a percentage of the weights onto the GPU is novel and amazing! I couldn't get the project up and running myself (it requires a nightly Rust build), but I love this particular innovation.