Two questions:
1. Starting from this, what would be the proper way to create embeddings for a complete document (i.e. a long paragraph)? My goal is to directly compare two PDFs according to their contents. It seems that `compute_similarity_between_strings` could be used, but then what is `get_all_embedding_vectors_for_document` useful for?
2. Using your API, does the inference run directly on the VPS? Does it need any special kind of hardware (a GPU, TPU, or whatever)?
Sorry if my questions are dumb, but I really appreciate your project's simplicity, and I want to know if it could suit my needs.
Sure, the difference is that the first endpoint gives you back a single embedding vector for the entire paragraph, while the second gives you a separate embedding vector for each sentence in the paragraph.
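If the end goal is comparing two PDFs, one common approach (not necessarily what this project does internally) is to mean-pool the per-sentence vectors into a single document vector and then compare document vectors with cosine similarity. A minimal sketch, with toy 2-D vectors standing in for real sentence embeddings; the helper names are mine, not part of this API:

```python
import numpy as np

def mean_pool(sentence_vectors):
    # Average per-sentence embeddings into one document-level vector.
    # Assumes equal-length vectors, e.g. the output of an
    # all-embeddings-for-document style endpoint.
    return np.mean(np.asarray(sentence_vectors, dtype=float), axis=0)

def cosine_similarity(a, b):
    # Standard cosine similarity between two 1-D vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real sentence embeddings:
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[1.0, 1.0]]
sim = cosine_similarity(mean_pool(doc_a), mean_pool(doc_b))
```

Mean pooling throws away sentence order, but for whole-document comparison it tends to work well and keeps the comparison cheap: one vector per PDF.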
And yes, everything in this code is designed to run well on the CPU of a modest machine, and it's 100% self-hosted; no API keys needed at all. But if you do have a GPU installed and configured, it will automatically use it, since it's powered by llama-cpp, which now supports CUDA.
Honestly, I'm not exactly sure about that myself. Someone asked for that feature on Reddit and I realized that it wouldn't be too hard to add, so I did. Another poster here mentioned that it might be useful for doing word-level highlighting of whatever was most relevant to a semantic query string within a sentence. I do think it's able to capture a lot more nuance, simply because it's giving you so much more data. The question is how best to leverage that incremental data, and I'm still trying to figure that out.
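For the word-level highlighting idea, one way to leverage the extra data would be to score each word's embedding against the query embedding and highlight the best match. A rough sketch with made-up vectors; the function name is hypothetical and not part of this project's API:

```python
import numpy as np

def most_relevant_word(word_vectors, query_vector):
    # Return the index of the word whose embedding has the highest
    # cosine similarity to the query embedding, plus all the scores.
    q = np.asarray(query_vector, dtype=float)
    scores = []
    for v in word_vectors:
        v = np.asarray(v, dtype=float)
        scores.append(float(np.dot(v, q) / (np.linalg.norm(v) * np.linalg.norm(q))))
    return int(np.argmax(scores)), scores

# Toy per-word vectors; in practice these would come from the
# per-word embeddings the endpoint returns.
idx, scores = most_relevant_word([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], [1.0, 0.0])
```

The per-word scores could then drive a highlight intensity in a UI, rather than just a binary match/no-match.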
Looks quite clean, congrats.
Thanks for sharing this piece of work.
;-)