Some notes:
- This is part of 33B V1 grafted onto 13B V2. They are different base models! The script is here (a rough sketch of the general idea follows these notes): https://huggingface.co/chargoddard/llama2-22b/blob/main/fran...
- It's coherent, but doesn't seem to "improve" the model; eval scores land around 13B V2's.
- But the larger model may have a greater capacity for learning from finetuning, and that may help "settle" the transplant.
- VRAM use seems about perfect for a single 24GB GPU: "I'm looking at 21 - 21.5GB VRAM usage on q4_M with 4K context limit and 1K used." https://old.reddit.com/r/LocalLLaMA/comments/156nvfk/llama22...
- This is just the beginning of experimentation, more finetunes are being trained with unclear results: https://huggingface.co/models?sort=modified&search=22b
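For anyone curious how a graft like this works mechanically, here's a rough sketch of the general idea, not the linked script (its URL is truncated above, so I can't vouch for its exact method): load both checkpoints, slice the donor's larger weight tensors down to the base model's shapes, and splice the resulting layers into the base stack. Model names, layer indices, and the crude truncation below are all illustrative assumptions.

  import copy
  import torch
  from transformers import AutoModelForCausalLM

  # Assumed checkpoint names -- substitute whatever 13B V2 / 33B V1 weights you have.
  BASE = "meta-llama/Llama-2-13b-hf"
  DONOR = "huggyllama/llama-30b"

  base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
  donor = AutoModelForCausalLM.from_pretrained(DONOR, torch_dtype=torch.float16)

  def shrink_to_base(donor_layer, template_layer):
      """Deep-copy a base-shaped decoder layer, then overwrite its weights
      with the donor layer's tensors sliced down to the base shapes.
      Crude truncation -- purely illustrative, not the linked script."""
      new_layer = copy.deepcopy(template_layer)
      src = dict(donor_layer.named_parameters())
      with torch.no_grad():
          for name, dst in new_layer.named_parameters():
              # Slice each dimension of the donor tensor to the target shape.
              idx = tuple(slice(0, n) for n in dst.shape)
              dst.copy_(src[name][idx])
      return new_layer

  # Splice a handful of donor layers into the middle of the 13B stack
  # (insertion point and donor indices chosen arbitrarily for the example).
  insert_at, donor_ids = 20, [28, 29, 30, 31]
  grafted = [shrink_to_base(donor.model.layers[i], base.model.layers[insert_at])
             for i in donor_ids]
  layers = list(base.model.layers)
  layers[insert_at:insert_at] = grafted
  base.model.layers = torch.nn.ModuleList(layers)
  base.config.num_hidden_layers = len(layers)

  base.save_pretrained("llama-graft-sketch")

The slicing is the hacky part: 33B's hidden size (6656) doesn't match 13B's (5120), so something has to reconcile the shapes before grafted layers can sit in the 13B residual stream, which is presumably why the result needs finetuning to "settle."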