
Making audio more accessible with two new apps - prostoalex
https://www.blog.google/outreach-initiatives/accessibility/making-audio-more-accessible-two-new-apps/
======
alexgmcm
It seems to me that the audio space is still quite undeveloped when it comes
to machine learning.

I mean, we have visual style transfer (look at the Deep Video Portraits shown
at SIGGRAPH, it's insane...), yet the equivalent in speech isn't even usable.

I think it is largely just a harder problem. You can break a video into
frames and treat each frame independently. Of course you may wish to take
the sequential nature into account, but there is a lot of information in
each static frame.

Contrast this with audio, where a very small clip (analogous to a video frame)
is insufficient to get even phonemes, let alone words, or to identify whether
the speaker is male or female, or whether there is any speech at all...
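
To put rough numbers on that, here is a back-of-the-envelope sketch. The 16 kHz
sample rate, the 25 fps frame rate, and the 100 ms phoneme length are all
assumptions picked for illustration, not measurements, and real phoneme
lengths vary a lot.

```python
# Back-of-the-envelope only: every constant below is an assumed, illustrative value.
SAMPLE_RATE_HZ = 16_000  # assumed speech sample rate
VIDEO_FPS = 25           # assumed video frame rate
PHONEME_MS = 100         # assumed rough phoneme length in milliseconds


def clip_stats(sample_rate_hz, fps, phoneme_ms):
    """Describe an audio clip that lasts as long as one video frame."""
    frame_ms = 1000 / fps               # duration of one video frame in ms
    samples = sample_rate_hz // fps     # audio samples inside that duration
    fraction = frame_ms / phoneme_ms    # share of one assumed phoneme covered
    return frame_ms, samples, fraction


frame_ms, samples, fraction = clip_stats(SAMPLE_RATE_HZ, VIDEO_FPS, PHONEME_MS)
print(f"{frame_ms:.0f} ms clip = {samples} samples, "
      f"about {fraction:.0%} of one assumed phoneme")
```

Under those assumptions, a clip as long as one video frame covers less than
half of a single phoneme, whereas the video frame itself is a complete image.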

That said, I think it is the area with the most potential and the one that
interests me the most.

