
Show HN: Audible for the web, Switch seamlessly between audio and text - avantion
[http://bit.ly/2qeHLev](http://bit.ly/2qeHLev)

I've wanted to listen to my favorite articles or web novels and switch seamlessly between the audio and the text, so I can read, and if I want to cook or do something else, I can pick up where I left off in the audio. I could not find this anywhere, so I created it: [http://bit.ly/2qeHLev](http://bit.ly/2qeHLev). Click on the audio version, scrub the player, and scroll the content to test it out. All of it is done programmatically, so it works with anyone's voice. It also works on the mobile web, so feel free to check it out on your phone.

What do you guys think about my side project? I'd love to hear your thoughts.
======
vitovito
I don't see any audio controls, or audio download progress, in Safari,
Chrome, or Firefox. It looks like a syntax error: there's some HTML in a
JavaScript file.

Conceptually, since I can't try it, it reminds me a bit of hyperaudio. Have
you seen [http://hyperaud.io](http://hyperaud.io)?

I also did a hypertranscript similar to this here:
[http://vitor.io/uxr101](http://vitor.io/uxr101) Click on any word to scrub to
it, or click in whitespace to toggle play/pause.
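
For illustration, the click-to-scrub and highlight-while-playing behavior comes down to two lookups over a sorted list of per-word start times. The sketch below is not taken from either demo; the function names and the sample timings are made up for the example.

```python
from bisect import bisect_right

def word_index_at(start_times, playback_seconds):
    """Return the index of the word being spoken at playback_seconds.

    start_times must be sorted ascending, one entry per word.
    Times before the first word map to word 0.
    """
    i = bisect_right(start_times, playback_seconds) - 1
    return max(i, 0)

def seek_time_for(start_times, word_index):
    """Return the time the player should jump to when a word is clicked."""
    return start_times[word_index]

# Hypothetical timings (seconds) for a five-word sentence.
starts = [0.0, 0.4, 0.8, 1.2, 1.6]
current = word_index_at(starts, 0.9)   # word to highlight at t = 0.9 s
target = seek_time_for(starts, 3)      # where to scrub when word 3 is clicked
```

In a browser the same two lookups would be driven by the player's time-update events and by click handlers on the word elements.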

~~~
avantion
How did you manage the sync between the audio and the words? ... In my little
demo, I used IBM Watson's Speech to Text API to analyze the audio. Its results
include timestamps for the recognized words, and I used some algorithms to
match them with the text. Naturally, it missed roughly 20% of the words, but I
was able to massage the result enough that it looked smooth.
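
A minimal sketch of one way such matching could work, assuming the recognizer returns `(word, start_seconds)` pairs. It is not the code behind the demo: it swaps in Python's `difflib.SequenceMatcher` for "some algorithms" and uses linear interpolation to guess timings for words the recognizer missed. All names and sample data here are hypothetical.

```python
from difflib import SequenceMatcher

def normalize(word):
    """Lowercase and strip punctuation so "Fox," matches "fox"."""
    return "".join(ch for ch in word.lower() if ch.isalnum())

def align(text_words, recognized):
    """Map each text word to a start time, or None if it was not recognized.

    recognized is a list of (word, start_seconds) pairs from the recognizer.
    """
    times = [None] * len(text_words)
    a = [normalize(w) for w in text_words]
    b = [normalize(w) for w, _ in recognized]
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            times[block.a + k] = recognized[block.b + k][1]
    return times

def fill_gaps(times, total_duration):
    """Linearly interpolate start times for words that have none."""
    filled = list(times)
    n = len(filled)
    prev_idx, prev_t = -1, 0.0  # virtual anchor just before the first word
    i = 0
    while i < n:
        if filled[i] is not None:
            prev_idx, prev_t = i, filled[i]
            i += 1
            continue
        j = i
        while j < n and filled[j] is None:
            j += 1
        next_t = filled[j] if j < n else total_duration
        span = j - prev_idx
        for k in range(i, j):
            filled[k] = prev_t + (next_t - prev_t) * (k - prev_idx) / span
        i = j
    return filled

# Hypothetical example: the recognizer missed "brown".
text_words = "The quick brown fox jumps".split()
recognized = [("the", 0.0), ("quick", 0.4), ("fox", 1.2), ("jumps", 1.6)]
raw = align(text_words, recognized)
smooth = fill_gaps(raw, total_duration=2.0)
```

Interpolation only approximates where a missed word falls, which is probably fine for highlighting but may drift on long unrecognized stretches.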

~~~
vitovito
I had a verbatim transcript already, and I used CMU Sphinx to do the alignment
between that and the recording.

With raw audio only, I'd probably use a transcription service that provides
timings, or software like Trint to do a machine-learning pass first, and then
clean up by hand, like you did.

The software I wrote to handle the playback is here:
[https://github.com/vitorio/hyperaudio-lite](https://github.com/vitorio/hyperaudio-lite)

------
visarga
I have a similar system implemented with TTS. It highlights text as it reads
it. Do you depend on human generated audio files?

~~~
avantion
I'd love to see it. Do you have a link? ... Yes, I depend on a human-generated
audio file rather than TTS. Synthesized voices are not so great, though they
are getting better every year. I wanted something similar to Audible's
Whispersync ([http://www.audible.com/mt/wfs](http://www.audible.com/mt/wfs)),
which uses real voice narrators.

~~~
visarga
It's just a bookmarklet for Chrome on OS X, using the Alex voice from the
operating system. I packed a ton of JS into it to find text and read it back
while changing the color of the words. You can trigger it by clicking on a
word. The script also restyles the text so it is easier to read.

[https://gist.github.com/Visarga/e6597edcb7ec6993521829ef5c17...](https://gist.github.com/Visarga/e6597edcb7ec6993521829ef5c1798db)

The code is uglified for bookmark use, but the original is not something I
would be proud to share (it's almost a mess).

By the way, I have special treatment for YC, reddit and arXiv. Only the posts
are converted into speech, skipping other text.

------
optikals
that's The Thing 5+

~~~
avantion
I'm not really sure what you mean by 'The Thing 5+'

