
Are you tied to any particular transformer model? Using a smaller model, throwing more hardware at the problem, or generating embeddings in parallel are easy ways to make it faster. Depending on what you're doing with the output, you may also consider truncating your documents (can be good for stuff like clustering) or breaking your documents into chunks (can improve search performance).
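
To sketch the truncate/chunk ideas, something like the following; the embed() callable is a stand-in for whatever model or API you end up with, and the sizes are illustrative rather than recommendations:

    from typing import Callable, List

    def truncate(doc: str, max_chars: int = 2000) -> str:
        # Keep just the start of the document -- often enough for clustering.
        return doc[:max_chars]

    def chunk(doc: str, chunk_chars: int = 500, overlap: int = 50) -> List[str]:
        # Split a long document into overlapping pieces so each piece gets its
        # own embedding -- usually better for search/retrieval.
        step = chunk_chars - overlap
        return [doc[i:i + chunk_chars] for i in range(0, max(len(doc), 1), step)]

    def embed_corpus(docs: List[str],
                     embed: Callable[[List[str]], List[List[float]]],
                     batch_size: int = 64) -> List[List[float]]:
        # Feed the embedder fixed-size batches instead of one call per document;
        # swap this loop for a thread pool if embed() is a network call.
        vectors: List[List[float]] = []
        for i in range(0, len(docs), batch_size):
            vectors.extend(embed(docs[i:i + batch_size]))
        return vectors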

Another option if you just want search (and aren't training or tuning your own models) is a managed search offering where you aren't responsible for generating embeddings.




Thanks for the advice! We're not tied to any model, no.

Naively perhaps, we at first hoped to get by using a 3rd-party API. We're hosted on GCP and initially tried the Vertex AI `textembedding-gecko` model. Now we're investigating running models on our own infra, although I'm not sure how far we've got with that yet, since someone else is working on it.
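
For reference, calling it through the Vertex AI Python SDK (google-cloud-aiplatform) looks roughly like this; the project, region, and model version here are placeholders:

    import vertexai
    from vertexai.language_models import TextEmbeddingModel

    vertexai.init(project="my-gcp-project", location="us-central1")

    model = TextEmbeddingModel.from_pretrained("textembedding-gecko")
    # Each request only accepts a small batch of texts, so a large corpus
    # means many round trips unless you parallelize the calls.
    embeddings = model.get_embeddings(["some document text", "another document"])
    vectors = [e.values for e in embeddings]  # one list of floats per input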


If you're committed to using a 3rd-party API, then parallelizing your API calls seems like the easiest way to speed things up. The benefit of a 3rd-party API is, of course, that you can likely generate embeddings with a much more powerful model. That said, you may not need something as powerful as PaLM, and sending everything over the network might simply take too long. In my experience (which is entirely use-case dependent), something like SentenceTransformers (even the smallest pretrained models) can get you up and running on your own infra pretty quickly and generate embeddings of reasonable quality in a reasonable amount of time on modest hardware.
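
To make that concrete, the SentenceTransformers route is roughly the following; the model name and settings are just examples, not specific recommendations:

    from sentence_transformers import SentenceTransformer

    # all-MiniLM-L6-v2 is one of the smaller pretrained checkpoints and runs
    # acceptably on CPU; larger models trade speed for quality.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    docs = ["first document ...", "second document ..."]
    embeddings = model.encode(
        docs,
        batch_size=64,              # bigger batches help if you have a GPU
        show_progress_bar=True,
        normalize_embeddings=True,  # unit vectors, convenient for cosine similarity
    )
    # embeddings is a numpy array with one row per document (384 dims for this model)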



