MMMercy2's comments

MMMercy2 · on April 6, 2023

You can try the smaller 7B version.

MMMercy2 · on April 3, 2023

You can use this command to apply the delta weights. (https://github.com/lm-sys/FastChat#vicuna-13b) The delta weights are hosted on huggingface and will be automatically downloaded.

superkuh · on April 3, 2023

Thanks! https://huggingface.co/lmsys/vicuna-13b-delta-v0

Edit, later: I found some instructive pages on how to use the vicuna weights with llama.cpp (https://lmsysvicuna.miraheze.org/wiki/How_to_use_Vicuna#Use_...) and pre-made ggml format compatible 4-bit quantized vicuna weights, https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/tree/ma... (8GB ready to go, no 60+GB RAM steps needed)

eurekin · on April 3, 2023

I did try, but got:

``` ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported. ```

superkuh · on April 3, 2023

> Unfortunately there's a mismatch between the model generated by the delta patcher and the tokenizer (32001 vs 32000 tokens). There's a tool to fix this at llama-tools (https://github.com/Ronsor/llama-tools). Add 1 token like (C controltoken), and then run the conversion script.

DrSiemer · on April 4, 2023

Just rename it in the tokenconfig.json

eurekin · on April 4, 2023

Thanks, that indeed worked!

This and using conda in wsl2, instead on bare windows

MMMercy2 · on April 3, 2023

They are the parameters of this large language model. There are 13B fp16 numbers.

MMMercy2 · on March 31, 2023

There are certainly some effective language model benchmarks; however, they are not well-suited for evaluating a chat assistant. Some projects employ human evaluation, while this blog post explores an alternative approach based on GPT-4. Both methods have their advantages and disadvantages, making this blog post an intriguing case study that can inspire the future development of more comprehensive evaluations.

MMMercy2 · on March 31, 2023

I am a Vicuna developer. We plan to release the weights once we have addressed all concerns and have a low-resource version of the inference code ready. We released the demo first to get some early feedback on the model.

techdragon · on March 31, 2023

We hear a lot about "concerns" and many of us don't share the same ones... It would be good for clarity to know what are the concerns you feel are important enough to hold back releasing the weights?

zhwu · on March 31, 2023

It is mainly because of the legal issues caused by the license of llama model weights. We need to figure it out with Meta's llama team before releasing.

jart · on March 31, 2023

Hi. I wrote the weights file format that llama.cpp uses, as of yesterday https://github.com/ggerganov/llama.cpp/pull/613 What can I do to assist you getting these deltas ready?

number6 · on March 31, 2023

Financial and Political would be my guess. But maybe I just want to tease out an answer...

nenkoru · on March 31, 2023

It would be great if you can help me with this PR as well as adding a support for exporting a model that was quantized using GPTQ, bitsandbytes, plain torch. This would bring a lot of benefit from both worlds:

- Low memory footprint(thanks quantization)

- Fast inference(thanks io binding)

Particularly in case of alpaca I have seen a 5x decrease in latency on A100 and 10x on AMD EPYC. I believe this is the way for users to have an AI that could genereate a response as fast as it can on their hardware. I have also added a link to my profile on hf with small alpacas turned into ONNX format. Take a look into them.

[1] https://github.com/huggingface/optimum/pull/922

[2] https://huggingface.co/nenkoru

imjonse · on March 31, 2023

Has LoRA been considered as possible alternative for finetuning on your dataset? In that case releasing the 'diff' against the LLaMA weights would be simpler to work with.

zhisbug · on March 31, 2023

Yeah that might work but this model wasn’t tuned with lora

luckystarr · on March 31, 2023

If it's based on LLaMA, aren't these weights just some sort of "patch" for the initial model, which is licensed under a restrictive license?

Or is this work "transferable" to other LLMs, once they become available?

shallichange · on March 31, 2023

Why is it not called Vicuña as it should? Vicuna does not sound the same way

MMMercy2 · on March 30, 2023

This project fine-tunes LLaMA on ShareGPT and gets competitive performance compared to Google's Bard.

https://vicuna.lmsys.org/

zhwu · on March 30, 2023

They even have a eval page showing that they beat Bard by only training on ShareGPT. https://vicuna.lmsys.org/eval/

MMMercy2 · on March 30, 2023

It can do some basic coding. The code syntax highlighting looks cool.

> Do a quick sort in python

> Here's an example of a quicksort implementation in Python:

def quicksort(array): if len(array) <= 1: return array pivot = array[0] less = [x for x in array[1:] if x <= pivot] greater = [x for x in array[1:] if x > pivot] return quicksort(less) + [pivot] + quicksort(greater)

fauxpause_ · on March 31, 2023

Quick sort is a terrible test case as it will have numerous training data examples of this exact concept