Are you tied to any particular transformer model? Using a smaller model, throwing more hardware at the problem, or generating embeddings in parallel are all easy ways to speed things up. Depending on what you're doing with the output, you may also consider truncating your documents (can be good for stuff like clustering) or breaking them into smaller pieces (can improve search performance).
Another option if you just want search (and aren't training or tuning your own models) is a managed search offering where you aren't responsible for generating embeddings.
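To illustrate the truncating and splitting options mentioned above, here is a minimal sketch in plain Python. The function names (`truncate_words`, `split_into_chunks`) and the word-based splitting are assumptions made for illustration; a real pipeline would more likely count tokens with the model's own tokenizer.

```python
from typing import List


def truncate_words(text: str, max_words: int) -> str:
    """Keep only the first max_words whitespace-separated words."""
    return " ".join(text.split()[:max_words])


def split_into_chunks(text: str, chunk_words: int, overlap: int = 0) -> List[str]:
    """Split text into chunks of up to chunk_words words.

    Consecutive chunks share `overlap` words, so a sentence that straddles
    a boundary still appears whole in at least one chunk more often.
    """
    if chunk_words <= 0:
        raise ValueError("chunk_words must be positive")
    if not 0 <= overlap < chunk_words:
        raise ValueError("overlap must be in [0, chunk_words)")

    words = text.split()
    step = chunk_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        # Stop once this chunk has reached the end of the document.
        if start + chunk_words >= len(words):
            break
    return chunks


doc = "a b c d e f g"
short_doc = truncate_words(doc, 3)
plain_chunks = split_into_chunks(doc, 3)
overlapping_chunks = split_into_chunks(doc, 3, overlap=1)
```

Each chunk would then be embedded separately; shorter inputs are generally cheaper to embed, which is where the speed-up comes from.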
Thanks for the advice! We're not tied to any model, no.
Perhaps naively, we first hoped to get by with a 3rd-party API. We're hosted in GCP and initially tried the Vertex AI `textembedding-gecko` model. We're now investigating running models on our own infra, though I'm not sure how far that has got, as someone else is working on it.
If you're committed to using a 3rd-party API, then parallelizing your API calls seems like the easiest way to speed things up. The benefit of a 3rd-party API is, of course, that you can likely generate embeddings with a much more powerful model. That said, you may not need something as powerful as PaLM, and sending everything over a network might just take too long. IME (which is entirely use-case dependent), something like SentenceTransformers (even the smallest pretrained models) can get you up and running on your own infra pretty quickly and generate embeddings of reasonable quality in a reasonable amount of time on modest hardware.
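As a rough sketch of what parallelizing the API calls could look like, the snippet below fans batches out over a thread pool using only the standard library. `embed_in_parallel` and `fake_embed_batch` are hypothetical names; `fake_embed_batch` is a stand-in for whatever client call you actually use (it is not the Vertex AI API), and the batch size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_parallel(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 16,
    max_workers: int = 8,
) -> List[Vector]:
    """Embed texts by sending batches to embed_batch concurrently.

    Threads suit this case because each call mostly waits on the network.
    """
    batches = [
        list(texts[i:i + batch_size]) for i in range(0, len(texts), batch_size)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so vectors line up with texts.
        results = pool.map(embed_batch, batches)
        return [vec for batch_vectors in results for vec in batch_vectors]


def fake_embed_batch(batch: List[str]) -> List[Vector]:
    """Hypothetical stand-in for a remote embedding call (assumption)."""
    return [[float(len(text))] for text in batch]


vectors = embed_in_parallel(
    ["a", "bb", "ccc"], fake_embed_batch, batch_size=2, max_workers=2
)
```

In practice the worker count would need to stay within whatever rate limits or quotas your provider enforces, and a real `embed_batch` would probably want retries on transient failures.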