Looks quite interesting! I’ve been working on AnyModal, a framework for integrating different data types (like images and audio) with LLMs: https://github.com/ritabratamaiti/AnyModal. voyage-multimodal-3 seems quite promising for developing multimodal LLMs, but I am not sure whether that is the intended use case.