
I don't see the Unsloth files yet, but they'll be here: https://huggingface.co/unsloth/gpt-oss-20b-GGUF
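
Once they land, something like this should pull a single quant down without grabbing the whole repo (rough sketch; the filename pattern is a guess until the files actually appear):

    # Sketch: download one GGUF quant from the repo linked above.
    # The *Q4_K_M* pattern is an assumption about how the files
    # will be named once they're uploaded.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="unsloth/gpt-oss-20b-GGUF",
        allow_patterns=["*Q4_K_M*.gguf"],  # one quant, not the full repo
    )
    print(local_dir)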

Super excited to test these out.

The benchmarks from the 20B are blowing away major >500B models. Insane.

On my hardware: 43 tokens/sec.

I got an error with flash attention turned on. Can't it run with flash attention?

31,000 context is the max it will allow, or the model won't load. No K or V cache quantization either.
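
If anyone wants to poke at the same settings, here's a rough llama-cpp-python sketch under the constraints I hit (flash attention off, context capped, KV cache left at the default f16). The filename is a placeholder:

    # Sketch: load the 20B GGUF under the constraints reported above.
    # model_path is a placeholder; flash_attn / type_k / type_v are the
    # llama-cpp-python knobs for the features that failed here.
    from llama_cpp import Llama

    llm = Llama(
        model_path="gpt-oss-20b-Q4_K_M.gguf",  # placeholder filename
        n_ctx=31000,       # larger values refused to load for me
        flash_attn=False,  # enabling this errored out
        # type_k / type_v left at the default (f16): K/V cache
        # quantization didn't work with this model
    )

    out = llm("Hello", max_tokens=32)
    print(out["choices"][0]["text"])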




