Some notes:
- This is part of 33B V1 grafted onto 13B V2. They are different base models! The script is here (a rough sketch of the general idea follows these notes): https://huggingface.co/chargoddard/llama2-22b/blob/main/fran...
- It's coherent, but doesn't seem to "improve" the model; eval scores land around 13B V2's.
- But the larger model may have a greater capacity for learning from finetuning, and that may help "settle" the transplant.
- VRAM use seems about perfect for a single 24GB GPU: "I'm looking at 21 - 21.5GB VRAM usage on q4_M with 4K context limit and 1K used." https://old.reddit.com/r/LocalLLaMA/comments/156nvfk/llama22...
- This is just the beginning of experimentation, more finetunes are being trained with unclear results: https://huggingface.co/models?sort=modified&search=22b
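For anyone curious how a graft like this works mechanically, here's a rough sketch of the general idea, not the linked script (its URL is truncated above, so I can't vouch for its exact method): load both checkpoints, slice the donor's larger weight tensors down to the base model's shapes, and splice the resulting layers into the base stack. Model names, layer indices, and the crude truncation below are all illustrative assumptions.

  import copy
  import torch
  from transformers import AutoModelForCausalLM

  # Assumed checkpoint names -- substitute whatever 13B V2 / 33B V1 weights you have.
  BASE = "meta-llama/Llama-2-13b-hf"
  DONOR = "huggyllama/llama-30b"

  base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
  donor = AutoModelForCausalLM.from_pretrained(DONOR, torch_dtype=torch.float16)

  def shrink_to_base(donor_layer, template_layer):
      """Deep-copy a base-shaped decoder layer, then overwrite its weights
      with the donor layer's tensors sliced down to the base shapes.
      Crude truncation -- purely illustrative, not the linked script."""
      new_layer = copy.deepcopy(template_layer)
      src = dict(donor_layer.named_parameters())
      with torch.no_grad():
          for name, dst in new_layer.named_parameters():
              # Slice each dimension of the donor tensor to the target shape.
              idx = tuple(slice(0, n) for n in dst.shape)
              dst.copy_(src[name][idx])
      return new_layer

  # Splice a handful of donor layers into the middle of the 13B stack
  # (insertion point and donor indices chosen arbitrarily for the example).
  insert_at, donor_ids = 20, [28, 29, 30, 31]
  grafted = [shrink_to_base(donor.model.layers[i], base.model.layers[insert_at])
             for i in donor_ids]
  layers = list(base.model.layers)
  layers[insert_at:insert_at] = grafted
  base.model.layers = torch.nn.ModuleList(layers)
  base.config.num_hidden_layers = len(layers)

  base.save_pretrained("llama-graft-sketch")

The slicing is the hacky part: 33B's hidden size (6656) doesn't match 13B's (5120), so something has to reconcile the shapes before grafted layers can sit in the 13B residual stream, which is presumably why the result needs finetuning to "settle."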