
> I uploaded GGUFs, 4bit quants, dynamic quants

The dynamic quantization[1] looks really interesting. Now, I've just been dabbling, but did I understand correctly that this dynamic quantization is compatible with GGUF? If so, how do you convert it? Just the standard way, or is there something extra involved?

I was really curious to try the dynamic 4-bit version of the Llama 3.2 11B Vision model, as I found the Q8 variant much better than the standard Q4_K_M variant in certain cases, but it doesn't fully fit on my GPU, so it's significantly slower.
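
For reference, the usual fallback when a quant doesn't fit is partial offload - splitting layers between GPU and CPU, which is where the slowdown comes from. A minimal sketch with llama-cpp-python (the file name and layer count are just placeholders):

    # Minimal sketch, assuming llama-cpp-python built with GPU support.
    # Model path and layer count are placeholders, not a real setup.
    from llama_cpp import Llama

    llm = Llama(
        model_path="model.Q8_0.gguf",  # hypothetical GGUF file
        n_gpu_layers=28,  # offload as many layers as fit in VRAM; rest run on CPU
        n_ctx=4096,
    )
    print(llm("Describe quantization in one sentence.", max_tokens=64))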

[1]: https://unsloth.ai/blog/dynamic-4bit



Oh, the dynamic 4bit quants sadly are not GGUF compatible yet - they currently work through Hugging Face transformers, Unsloth and other trainers.
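
Loading one of the pre-quantized dynamic 4bit checkpoints through transformers is just the normal from_pretrained path - you need bitsandbytes installed, and the repo id below is only an example of the naming pattern:

    # Minimal sketch - requires transformers, accelerate and bitsandbytes.
    # The repo id is an example; check the Unsloth HF page for actual uploads.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit"  # example repo
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_id)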

My goal was to make a dynamic quant for GGUF as well - it's just a tad complicated to select which layers to quantize and which to leave alone with GGUF - I might have to manually edit llama.cpp's quantize C file.
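
The rough idea behind the selection (a toy simplification, not the exact algorithm) is to measure how much each layer degrades under 4bit and keep the worst offenders in higher precision:

    # Toy sketch of the layer-selection idea - names and threshold are made up.
    import numpy as np

    def quant_error(w, bits=4):
        # Simulate symmetric round-to-nearest quantization, return relative error.
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        return np.linalg.norm(w - q * scale) / np.linalg.norm(w)

    def layers_to_keep_high_precision(weights, threshold=0.05):
        # weights: dict of layer name -> weight matrix
        return [name for name, w in weights.items()
                if quant_error(w, bits=4) > threshold]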

Also, I'm unsure whether llama.cpp as of 11th Jan 2025 supports Llama Vision (or maybe that's new?) - I do remember Qwen / LLaVA type models are working.


> Oh the dynamic 4bit quants sadly are not GGUF compatible yet

Ah bummer. Is this a GGUF file-format issue or mostly "just" a code-doesn't-exist issue?

> Also I'm unsure yet if llama.cpp as of 11th Jan 2025 supports Llama Vision (or maybe it's new?)

Ah, I totally forgot Ollama did that on their own and didn't merge upstream.

I'm using Ollama because it was so easy to get running on my main Windows rig, so I can take advantage of my GPU there (I still do a bit of gaming), while all the stuff that uses Ollama for inference runs on my server.
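
In case anyone wants a similar split setup: the official ollama Python package can point at a remote host, roughly like this (the address and model name are just placeholders):

    # Minimal sketch using the official `ollama` Python package.
    # Host address and model name are placeholders for my setup.
    from ollama import Client

    client = Client(host="http://192.168.1.50:11434")  # the GPU machine
    reply = client.chat(
        model="llama3.2-vision",
        messages=[{"role": "user", "content": "What's in this image?"}],
    )
    print(reply["message"]["content"])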

Anyway thanks for the response.


> Ah bummer. Is this a GGUF file-format issue or mostly "just" a code-doesn't-exist issue?

Just code! Technically I was working on dynamic quants for DeepSeek V3 (200GB in size), which will increase accuracy by a lot for a 2bit model (if you leave attention in 4bit) while only using ~20GB more - but I'm still working on it!
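
Rough napkin math on why attention in 4bit only costs ~20GB extra - the attention share here is my rough assumption for illustration, not an exact count:

    # Back-of-envelope only - 671B is DeepSeek V3's total parameter count,
    # the ~80B attention share is a rough assumption, not an exact figure.
    total_params = 671e9
    attn_params = 80e9

    base_2bit_gb = total_params * 2 / 8 / 1e9   # ~168 GB at pure 2bit
    extra_gb = attn_params * (4 - 2) / 8 / 1e9  # ~20 GB to lift attn to 4bit
    print(base_2bit_gb, extra_gb)               # 167.75 20.0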

> Ah, I totally forgot Ollama did that on their own and didn't merge upstream.

Yep, they have Llama Vision support! Llama.cpp has Qwen and LLaVA support - I think Llama Vision support is coming, but it'll take much more time - the arch is vastly different from normal transformers due to cross attention.
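
To illustrate the difference (a toy PyTorch sketch, not the actual Llama Vision code): some decoder layers attend over the vision encoder's states instead of just prior text tokens:

    # Toy sketch of cross-attention - not the real Llama Vision architecture.
    import torch
    import torch.nn as nn

    class CrossAttnBlock(nn.Module):
        def __init__(self, dim, heads):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, text_hidden, image_hidden):
            # Queries come from the text stream, keys/values from the vision
            # encoder - a normal decoder layer would use text for all three.
            out, _ = self.attn(text_hidden, image_hidden, image_hidden)
            return text_hidden + out

    block = CrossAttnBlock(dim=64, heads=4)
    text = torch.randn(1, 10, 64)    # 10 text tokens
    image = torch.randn(1, 50, 64)   # 50 vision patches
    print(block(text, image).shape)  # torch.Size([1, 10, 64])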



