I've read the doubts about your approach compared to bge, gte, and e5; however, I wonder whether the advantages of your approach could be:
- a 2048-token context window.
- multilingual support.
- in-context learning or other discrete prompt tuning, which might make it possible to beat bge/gte/e5 on some tasks.
- optimization via quantized models, fine-tuned models, etc.
I am also just wondering about speed vs. bge/gte/e5.
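For the speed question, a simple way to get comparable numbers is to time each model's `encode` call on the same corpus. Below is a minimal, model-agnostic timing sketch; the `SentenceTransformer` usage and model names in the commented-out section are assumptions (they require `sentence-transformers` and a model download), while the harness itself is plain Python.

```python
import time
from typing import Callable, List


def throughput(encode: Callable[[List[str]], object],
               texts: List[str],
               runs: int = 3) -> float:
    """Return best-case sentences/second for an arbitrary encode callable.

    Takes the minimum wall time over `runs` passes (after one warm-up pass)
    to reduce noise from caching and lazy initialization.
    """
    encode(texts[:2])  # warm-up: trigger any lazy model/tokenizer setup
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        encode(texts)
        best = min(best, time.perf_counter() - t0)
    return len(texts) / best


# Hypothetical usage with sentence-transformers (model names are assumptions):
# from sentence_transformers import SentenceTransformer
# corpus = ["some sentence"] * 1000
# bge = SentenceTransformer("BAAI/bge-base-en-v1.5")
# print("bge:", throughput(lambda batch: bge.encode(batch), corpus), "sent/s")
```

Running the same harness against each candidate model on identical hardware and batch sizes gives an apples-to-apples speed comparison.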