
BERT-style encoder-only models, like the embedding model being discussed here, don't need a KV cache for inference: they process the entire input sequence in a single forward pass, so there are no previous-step keys and values to reuse. A KV cache only pays off for encoder-decoder and decoder-only (a.k.a. GPT-style) models, which generate tokens one at a time and would otherwise recompute attention keys/values for the whole prefix at every step.
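
To make the distinction concrete, here's a minimal sketch using the Hugging Face transformers library (the model names "bert-base-uncased" and "gpt2" are just illustrative choices, not anything from the thread):

    import torch
    from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM

    # Encoder-only (BERT): the whole sequence is embedded in one forward
    # pass, so there is nothing to cache between steps.
    bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    enc_out = bert(**bert_tok("an example sentence", return_tensors="pt"))
    print(enc_out.last_hidden_state.shape)  # (batch, seq_len, hidden), done

    # Decoder-only (GPT-2): tokens are generated one at a time, and each
    # new token attends to every earlier one. Caching per-layer keys and
    # values avoids recomputing them for the whole prefix at each step.
    gpt_tok = AutoTokenizer.from_pretrained("gpt2")
    gpt = AutoModelForCausalLM.from_pretrained("gpt2")
    ids = gpt_tok("an example prompt", return_tensors="pt").input_ids
    step1 = gpt(ids, use_cache=True)
    next_id = step1.logits[:, -1:].argmax(-1)  # greedy pick of next token
    # Only the new token is fed in; the prefix comes from the KV cache.
    step2 = gpt(next_id, past_key_values=step1.past_key_values, use_cache=True)

Note the second GPT-2 call takes a single token plus the cache, whereas the BERT call has no equivalent: rerunning BERT on a new input is always a fresh full-sequence pass.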



