
BERT-style encoder-only models, like the embedding model being discussed here, don't need a KV cache for inference: they process the entire input sequence in a single forward pass, so there are no previous-step keys and values to reuse. A KV cache only pays off for encoder-decoder and decoder-only (a.k.a. GPT-style) models, which generate tokens one at a time and would otherwise recompute attention keys/values for the whole prefix at every step.
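
To make the distinction concrete, here's a minimal sketch using the Hugging Face transformers library (the model names "bert-base-uncased" and "gpt2" are just illustrative choices, not anything from the thread):

    import torch
    from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM

    # Encoder-only (BERT): the whole sequence is embedded in one forward
    # pass, so there is nothing to cache between steps.
    bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    enc_out = bert(**bert_tok("an example sentence", return_tensors="pt"))
    print(enc_out.last_hidden_state.shape)  # (batch, seq_len, hidden), done

    # Decoder-only (GPT-2): tokens are generated one at a time, and each
    # new token attends to every earlier one. Caching per-layer keys and
    # values avoids recomputing them for the whole prefix at each step.
    gpt_tok = AutoTokenizer.from_pretrained("gpt2")
    gpt = AutoModelForCausalLM.from_pretrained("gpt2")
    ids = gpt_tok("an example prompt", return_tensors="pt").input_ids
    step1 = gpt(ids, use_cache=True)
    next_id = step1.logits[:, -1:].argmax(-1)  # greedy pick of next token
    # Only the new token is fed in; the prefix comes from the KV cache.
    step2 = gpt(next_id, past_key_values=step1.past_key_values, use_cache=True)

Note the second GPT-2 call takes a single token plus the cache, whereas the BERT call has no equivalent: rerunning BERT on a new input is always a fresh full-sequence pass.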



