Nice work and starred! I am curious how you get the embeddings computed, the # of dimensions for the embeddings and if you have run any benchmarks against OpenAI's offering?
The embeddings are computed using llama-cpp, but langchain provides a nice convenience wrapper for getting them directly, so I use that. The embeddings are 4096-dimensional vectors.
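For the curious, a minimal sketch of what that looks like via langchain's llama-cpp wrapper (the model path here is a placeholder; point it at whatever GGML file you're using):

```python
# Hedged sketch: getting embeddings through langchain's llama-cpp wrapper.
# MODEL_PATH is a hypothetical placeholder; substitute any GGML-format model.
import os

MODEL_PATH = "models/llama-2-7b.ggmlv3.q4_0.bin"  # placeholder path

if os.path.exists(MODEL_PATH):
    from langchain.embeddings import LlamaCppEmbeddings

    embedder = LlamaCppEmbeddings(model_path=MODEL_PATH)
    vec = embedder.embed_query("hello world")
    print(len(vec))  # 4096 for Llama2 7B: the model's hidden size
else:
    print("model file not found; download a GGML model first")
```

`embed_query` handles a single string; `embed_documents` takes a list of strings and returns a list of vectors.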
And no, I haven’t benchmarked them against OpenAI’s embeddings. I should point out that this code works with any model in GGML format, so if there are fine-tuned Llama2 variants optimized for embedding, you could swap one in easily (or any other model). This project is more about making it easy to go from model to embeddings on demand via an API, and then letting you do useful things with those embeddings.
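As one example of the "useful things" side, the most common operation on embeddings is nearest-neighbor lookup by cosine similarity. A self-contained sketch (using toy 4-dimensional vectors in place of the real 4096-dimensional ones):

```python
# Illustrative sketch: ranking documents against a query by cosine similarity.
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional stand-ins for real 4096-dimensional embeddings.
query = [1.0, 0.0, 1.0, 0.0]
docs = {
    "doc_a": [0.9, 0.1, 1.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0, 1.0],
}

# Pick the document whose embedding is most similar to the query's.
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # doc_a: nearly parallel to the query vector
```

The same loop works unchanged on real embeddings; at scale you'd swap the linear scan for a vector index.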
Cheers!