
qeternity 62 days ago | on: Quantized Llama models with increased speed and a ...

> with tolerably close accuracy.

No, speculative decoding has exactly the same accuracy as the target model. With greedy sampling, its output is mathematically identical to greedy decoding of the target model.

kgc 60 days ago

Is there a reference for this? I was wondering the same thing.

qeternity 60 days ago

Read the original whitepaper or go look at how any framework implements it.

You will see that draft tokens not predicted by greedy sampling of the target model are rejected. Ergo, the two outputs are mathematically identical.
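
To make the rejection argument concrete, here is a minimal sketch of the greedy case. It is not taken from any framework: `target_next` and `draft_next` are hypothetical toy stand-ins for real models, and the verification loop calls the target once per position, where a real implementation would score all drafted positions in a single forward pass.

```python
from typing import Callable, List

# A "model" here is just a function from a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

VOCAB = 50  # toy vocabulary size (assumption for illustration)


def target_next(prefix: List[int]) -> int:
    """Hypothetical deterministic target model."""
    return (sum(prefix) * 31 + len(prefix) * 7 + 3) % VOCAB


def draft_next(prefix: List[int]) -> int:
    """Hypothetical cheaper draft model: wrong whenever len(prefix) % 3 == 0."""
    guess = target_next(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % VOCAB


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain greedy decoding: append the model's top token n_new times."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_greedy_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep only what the target agrees with."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks each proposed token against its own greedy choice.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: reject it and everything after it,
                # and emit the target's own token instead.
                accepted.append(expected)
                break
        else:
            # All k drafts accepted: the target contributes one extra token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal]


prompt = [1, 2, 3]
reference = greedy_decode(target_next, prompt, 20)
speculative = speculative_greedy_decode(target_next, draft_next, prompt, 20)
print(speculative == reference)
```

Every token that survives is one the target would have picked for that exact prefix, so the draft model only affects how many target steps are needed, not what gets emitted. With sampling at nonzero temperature, the rejection scheme in the original papers preserves the target's output distribution rather than a specific token sequence.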
