
Probably because they're uploading (and playing back) from a webpage, and Web Audio is weird and inconsistent across browsers, so sticking to a builtin codec is probably more reliable. As someone who trains on their data, I find it usable anyway. Training on 1000 hours of Common Voice makes my model better in very clear ways.

https://caniuse.com/#search=mp3

https://caniuse.com/#search=opus

I got FLAC working for speech.talonvoice.com with an asm codec, so in theory they could use whatever codec they want, but I do sometimes get audio artifacts.




Yeah, compatibility with Apple browsers in particular was very important for them. I'd added functionality to normalize audio for verification, but they removed it multiple times because it didn't work on Safari for various reasons.

I ended up building an extension for Firefox that normalizes the audio on the website if installed: https://github.com/est31/vmo-audio-normalizer https://addons.mozilla.org/de/firefox/addon/vmo-audio-normal...

In general I don't think normalization should happen at the backend. It's useful for training data to have multiple loudness levels, so that the network can understand them all.
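To make the distinction concrete, here is a rough Python sketch, not the code from the extension or from the Common Voice site. It assumes audio is a plain list of floats in [-1.0, 1.0]; the function names, the 0.9 target peak, and the -12 dB to 0 dB gain range are all made up for illustration. `peak_normalize` is the kind of thing you would apply only at playback time for verification, while `random_gain` shows the opposite idea for training: deliberately keeping (or adding) loudness variation.

```python
import random


def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the largest absolute value equals target_peak.

    Intended for playback only; the stored recording is left untouched.
    target_peak=0.9 is an arbitrary illustrative value.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def random_gain(samples, min_db=-12.0, max_db=0.0, rng=random):
    """Apply a random gain (in decibels) so a model sees varied loudness.

    The dB range is a hypothetical choice, not a recommendation.
    """
    gain_db = rng.uniform(min_db, max_db)
    gain = 10.0 ** (gain_db / 20.0)  # convert dB to a linear amplitude factor
    return [s * gain for s in samples]
```

With something like this, a verification player could call `peak_normalize` on a clip right before playing it, and a training pipeline could call `random_gain` per example, without either one changing what is stored on the backend.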



