
The latter depends very strongly on how much computation is needed to compute those embedding vectors.

If you run a GPT-3.5-sized model to compute that embedding (which would be a bit absurd, but if you really want GPT-3.5-quality classification, you may well be doing something like this), you're pushing the input through tens of billions of parameters and doing a correspondingly large number of FLOPs, which could be just as expensive as running gzip over your whole (small, private) training set.
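For comparison, here is a minimal sketch of the compression-distance classifier being discussed, assuming the 1-NN / normalized-compression-distance setup commonly used for this trick (names are illustrative). Note it has to touch the whole training set per query:

    import gzip

    def clen(s: str) -> int:
        # Compressed length in bytes; the only primitive the method needs.
        return len(gzip.compress(s.encode("utf-8")))

    def ncd(a: str, b: str) -> float:
        # Normalized compression distance between two strings.
        ca, cb = clen(a), clen(b)
        return (clen(a + " " + b) - min(ca, cb)) / max(ca, cb)

    def classify(query: str, train: list[tuple[str, str]]) -> str:
        # 1-NN over the training set: one gzip pass per training example,
        # per query; this is the cost being discussed above.
        return min(train, key=lambda pair: ncd(query, pair[0]))[1]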




No, because the compute cost scales with the number of classes you wish to classify into: with n classes, you need to do n gzip compressions at inference time. In the embedding world, you only call the embedding model once on insert, and only need a dot product at inference time.
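A minimal sketch of that asymmetry, with a toy random-vector embed() standing in for one call to a real embedding model (all names here are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    _toy_table: dict[str, np.ndarray] = {}

    def embed(text: str) -> np.ndarray:
        # Stand-in for one call to a real embedding model; returns a
        # fixed-size unit vector for the whole text.
        vecs = [_toy_table.setdefault(w, rng.standard_normal(64))
                for w in text.lower().split()]
        v = np.mean(vecs, axis=0)
        return v / np.linalg.norm(v)

    train = [("the team won the match", "sports"),
             ("the market fell sharply", "finance")]

    # On insert: embed every example once and keep the vectors around.
    index = np.stack([embed(text) for text, _ in train])
    labels = [label for _, label in train]

    def classify(query: str) -> str:
        q = embed(query)                           # one model call per query
        return labels[int(np.argmax(index @ q))]   # then just dot products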

The same logic extends to using a self-hosted embedding model, which tend to be as good as Ada on most benchmarks and, yes, can be fine-tuned over your private data.
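For example, assuming the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint as one such self-hosted model (any similar encoder would do):

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # runs locally
    vectors = model.encode(
        ["some private document", "another one"],
        normalize_embeddings=True,  # unit vectors, so dot product = cosine
    )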


>The latter depends very strongly on how much computation is needed to compute those embedding vectors.

Sure, but the gzip metrics are worse than FastText, which computes the embeddings in essentially no time: tokenize, look up embeddings by token id, then average. Compared to that, the gzip approach is very slow.
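A rough sketch of that pipeline, with a toy vocabulary and random table standing in for trained FastText vectors (real FastText also adds subword n-gram vectors, omitted here):

    import numpy as np

    # Toy stand-ins; a real FastText model supplies these from training.
    vocab = {"the": 0, "market": 1, "fell": 2, "team": 3, "won": 4}
    table = np.random.default_rng(0).standard_normal((len(vocab), 32))

    def sentence_vector(text: str) -> np.ndarray:
        # Tokenize, look up rows by token id, average: no big matrix
        # multiplies, which is why this runs in essentially no time.
        ids = [vocab[w] for w in text.lower().split() if w in vocab]
        return table[ids].mean(axis=0)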



