want to thank the op for sharing this; i threw this together over the last couple of days ramping up to the "steamroll" - we think one of the key problems with LLMs in general, but especially voice, is evals, and we wanted a good place to evaluate voice-to-voice systems. these systems can be end-to-end like openai's, or (asr+llm)->tts, or asr->(llm+tts), or asr->llm->tts
we built an Elo benchmark, very much in the style of LMSYS, and will be releasing results every two weeks
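for anyone curious how the rating side works, here's a minimal sketch of a standard Elo update from pairwise votes - this is just the textbook formula with assumed parameters (K=32, 400-point scale), not necessarily what bench.audio uses internally:

```python
# standard Elo rating update from a single pairwise comparison.
# K and the 400-point scale are conventional defaults, assumed here for illustration.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that system A beats system B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32) -> tuple[float, float]:
    """Update both ratings after one matchup; score_a is 1.0 (A wins), 0.5 (tie), or 0.0."""
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1 - score_a) - (1 - e_a))
    return new_a, new_b

# two systems start even at 1000; A wins one vote -> A moves to 1016, B to 984
new_a, new_b = elo_update(1000, 1000, 1.0)
```

(LMSYS-style leaderboards typically aggregate many such votes, often with a Bradley-Terry fit rather than sequential updates, but the pairwise-preference idea is the same.)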
source code here: https://github.com/thevoicecompany/bench.audio
will be adding a proper contributing guide soon