
satvikpendem 4 days ago | on: Gemini 3

I'd do the transcript and the summary parts separately. Dedicated audio models from vendors like ElevenLabs or Soniox use speaker detection models to produce an accurate, speaker-attributed transcript. I'm not sure that Google's models do the same; they may just hallucinate the speakers instead.
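
A rough sketch of what keeping the two stages separate could look like, assuming each stage is just a pluggable function. Every name here (`Segment`, `Transcriber`, `Summarizer`, `format_transcript`, `transcribe_then_summarize`) is hypothetical and not taken from any vendor's SDK; the actual ElevenLabs, Soniox, or Gemini calls would sit behind the two callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    speaker: str  # label from the speaker detection step, e.g. "S1" (hypothetical format)
    text: str


# Hypothetical stage signatures: any audio model or LLM can be wrapped to fit them.
Transcriber = Callable[[bytes], List[Segment]]
Summarizer = Callable[[str], str]


def format_transcript(segments: List[Segment]) -> str:
    """Merge consecutive segments by the same speaker into 'speaker: text' lines."""
    merged: List[Segment] = []
    for seg in segments:
        if merged and merged[-1].speaker == seg.speaker:
            # Same speaker kept talking: extend the previous turn.
            merged[-1] = Segment(seg.speaker, merged[-1].text + " " + seg.text)
        else:
            merged.append(Segment(seg.speaker, seg.text))
    return "\n".join(f"{s.speaker}: {s.text}" for s in merged)


def transcribe_then_summarize(
    audio: bytes, transcriber: Transcriber, summarizer: Summarizer
) -> Tuple[str, str]:
    """Stage 1 produces the speaker-labelled transcript; stage 2 only ever sees text."""
    transcript = format_transcript(transcriber(audio))
    return transcript, summarizer(transcript)
```

The point of the split is that the summarizer never touches the audio, so the speaker labels come only from the model that was built to detect them, and the transcript can be checked on its own before anything is summarized.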

trvz 4 days ago

Agreed. I don’t see the need for Gemini to be able to do this task, although it should be able to offload it to another model.
