Show HN: Auditus – Ebook to Audiobook Conversion (auditus.cc)
146 points by Immortalin 4 months ago | 53 comments

I like your project, but I'd like it more if it were accompanied by a video of the text being highlighted as it is read (a kind of dual-modality reading: visual + audio). I created such a tool for myself on macOS with the Alex voice; it works in browsers on any page and in PDFs. I find that dual-modality reading enhances my focus a lot. Btw, I'm using this tool to read this very thread of comments.

And of course I would want to hear the amazing Wavenet voices used in this role.

Do you plan to share yours? Looks very interesting from the dual modality. I would definitely try it out.

It's only for personal use, but if you want to try:


Also, it only works on macOS because it uses the Alex voice, which comes bundled with the system.

That's really cool, thanks for sharing!

I saw a site that highlighted the official audiobook/ebook of "The Lean Startup" on one page. I love this as a generic solution.

Also, whether by design or not, this links back to your personal website which leads to directly identifiable information about you.

Just FYI!

I just tried it, and this is so awesome. Btw, do you have to manually select the text before the voice option appears?

Hey, it seems like this 404s now. Do you have an updated link?

The DAISY markup spec is a cool project if you’re into the tech of text/audio sync.

Also check out LearningAlly.org, a non-profit that produces audiobooks with synced text highlights. They are specifically oriented toward students with reading challenges, but I believe anyone with some certification of a learning difference can join. (They require certification to avoid conflict with copyright protections.)

Funnily enough, I did look into it: https://www.youtube.com/channel/UCP4mlh8yZqOaEf0x7FKncIw

Have not quite gotten the whole video thing working yet

I made a similar tool and host some audio here. You can tap and read: http://book.vidalab.co/

I have the habit of converting everything I write into audio before I publish it. It's a good way to make your mistakes pop out.

I use fromtexttospeech[1] to convert to audio. Judging by the voices, it seems like the author is also using the same speech engine.

If I'm not mistaken, these are from Nextup's TextAloud software.


Edit: Though this beats the 50k character limit

AWS Polly

It occurs to me that the mistakes in emphasis that I hear in these samples are the same mistakes I hear from young readers who are concentrating on decoding the words one at a time. The method that fluent text speakers use is to process the entire sentence while beginning to speak it.

I wonder if a better model of word emphasis considering whole sentences could move an automated reader out of the uncanny valley.

Currently the text is run through a sentence segmenter before conversion.
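For anyone curious what "segment, then convert" might look like, here is a minimal sketch. It is not the Auditus implementation: the regex segmenter, the function names, and the 3000-character default are all assumptions made for illustration. The idea is to split the text at sentence boundaries and pack whole sentences into chunks small enough for a single TTS request, so that no sentence is cut in half.

```python
import re

# Naive segmenter (an assumption for illustration): split after ., ! or ?
# followed by whitespace. It mishandles abbreviations such as "Dr.".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the non-empty sentences found in text."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def chunk_sentences(text, max_chars=3000):
    """Pack whole sentences into chunks of at most max_chars characters.

    max_chars is a placeholder; use the per-request limit of your TTS service.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single over-long sentence becomes its own (oversized) chunk.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to the speech engine separately and the resulting audio concatenated in order. A real segmenter would need to handle abbreviations, quotations, and ellipses better than this regex does.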

Hey, I just created a similar application using the Cloud Text-To-Speech API from Google and the textract (https://textract.readthedocs.io) library to extract text from lots of kinds of documents:


It runs inside a Docker container so fairly easy to try it out.

Hi! I built this to have an easier way to convert ebooks to audiobooks! The backend is powered by AWS Polly. If you have any feedback or feature requests, please feel free to drop me an email at <last 3 characters of username> @ <myusername>.com

What is the typical wait from payment to delivery? I know you can't be specific, but just ballpark it (e.g., a few minutes, a few hours, etc.).

I'm not sure what the book's letter-count is, but the price was $2.82. It's only been about 10 minutes, so I'm not displeased, just curious what to expect.


Please email me the file. It should not take more than 15 minutes, but the server's rather overloaded right now.

It looks good! Could you add a plain-text (or even markdown) field as an input option? I'd be interested in trying this with blog posts and magazine articles.

(P.S. Kudos for the "Accelerando")

Planned :)

This is great! I forwarded this to my girlfriend (who is blind) and she loves it.

On Android there is http://www.hyperionics.com/atVoice/ which I really like and which seems on a similar level (at least judging by the example).

If you have any suggestions, do let me know! I am currently working on adding support for PDF files, and I'm also considering adding support for the new Google Wavenet speech synthesis, but it is much more expensive (about 4x the cost) :(

I built something similar to listen to Paul Graham's essays. It's a console app and uses OSX's "say" command for the TTS. Contributions are welcome. https://github.com/hemantasapkota/awesome-essays

I get a metamask (ETH Wallet) phishing warning on this site. Anyone else experience this?

Interesting. The MetaMask phishing detector keeps a blacklist of URLs/domains and compares a site's domain against it using the Levenshtein distance algorithm. So it could be a false positive. After a quick check I didn't find Auditus on there:
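To illustrate how a fuzzy match like that can produce a false positive, here is a rough sketch of Levenshtein-based domain comparison. This is not MetaMask's code: the function names, the tolerance of 2 edits, and the example domains are all hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (or match)
            ))
        previous = current
    return previous[-1]


def looks_similar(domain, listed_domains, tolerance=2):
    """Return listed domains within `tolerance` edits of `domain`, excluding exact matches."""
    return [d for d in listed_domains if 0 < levenshtein(domain, d) <= tolerance]
```

With these assumptions, `looks_similar("metamsk.io", ["metamask.io", "example.com"])` flags `metamask.io`, because the two names differ by a single inserted letter. Any short domain that happens to sit within the tolerance of a listed one would be flagged the same way, which is how an innocent site can trip the warning.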


This is cool and I figured we'd be heading down this path soon enough. A lot of the best audiobooks I've listened to were narrated by people that can do multiple voices well. I was thinking that being able to produce an audiobook that uses different voices for different characters would be great. Something like Narrator:


Narrator, though, uses macOS text-to-speech, which is nowhere near the level of Polly or Google Cloud Speech.

There are a few iOS apps that do this in real time. The best one by far is 'Voice Dream', and they use the same voices. It is basically an audiobook in your pocket anytime, anywhere, for any text file. It shows the words as it reads them back and lets you start/stop/pause, adjust speed, change voice, etc. All-around awesome. When the new Google voices or an equivalent make it to iOS, it will be almost human-like.

This is a good example of a tool that was created for the accessibility community (vision impaired, dyslexic) and has subsequently been adopted by mainstream readers.

As a language nerd I would like to praise naming the project after a Latin past participle.

It's really cool to see the applications made possible by the high quality, reasonably priced, and fairly licensed text-to-speech APIs offered by AWS, Azure, and Google Cloud.

The most fleshed out service of this type that I've found is narro.co, which offers web/pdf/epub/video/rss/email/text to audio conversions.

What are the best practices for doing the reverse: taking audio and producing text? I don't mind if the transcription is rough, and the error rate can be quite high for my purposes, but I want the process to recover rather than get stuck, so that it gets through a full-length talk.

From my experiments generating subtitles from TV/movie audio tracks, accuracy ranges from 75% (worst case) to 95% (best case). If you model it as a normal distribution, it centers somewhere around 85-90% accuracy. Most services provide much better accuracy for things like calls or conferences with proper microphones and minimal background noise than for TV shows and movies. If the input audio is noisy, I would do some noise filtering before piping it into conversion.

Which conversion tools/services do you have in mind?

Google and Azure

As an easier problem, what I’d find useful is a way to keep a pirated audiobook and pirated e-book in sync, the way that Amazon does with WhisperSync. A single app where I upload the .epub and the mp3s and it keeps me in sync when I read in either format.

You can also do this in iBooks with any of the built-in voices available on iOS.

Just turn on Speak Screen in Settings -> General -> Accessibility -> Speech and then swipe down with two fingers while reading your book. It'll even turn the page for you.

This is awesome. But doesn't Amazon bill you for usage? What's keeping you afloat?

Each conversion costs a couple of dollars depending on length - cheaper than most audiobooks, at the expense of human realism. You can listen to a sample of a human-read version of Accelerando: https://www.audiobooks.com/audiobook/accelerando/210129

The one generated by Auditus is too smooth and slightly unnatural.

Here's a human narrated sample for comparison: https://www.audiobooks.com/audiobook/accelerando/210129

That link returns a 404 for me.

Try again?

I usually do this by hand with surprisingly good results: I use calibre to convert the ePub to txt and then fix some common problems (e.g. remove line breaks and page numbers) using regular expressions. Then I convert it to an audio file using the macOS Automator text-to-speech action (be sure to download the high-quality voices first).
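A rough sketch of that regex cleanup step, for anyone who wants to script it. The patterns and the function name are my own guesses at the "common problems", not the poster's actual expressions, and they assume the book has already been exported to plain text (for example with calibre's `ebook-convert` command-line tool).

```python
import re


def clean_book_text(raw):
    """Tidy plain text exported from an ebook before sending it to a TTS engine."""
    # Drop lines holding only a page number, e.g. "42" or "- 42 -".
    # Caveat: this also drops any other line that is just a number.
    text = re.sub(r"(?m)^[ \t]*-?[ \t]*\d+[ \t]*-?[ \t]*$\n?", "", raw)
    # Collapse the extra blank lines that removal can leave behind.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Rejoin words hyphenated across a line break: "conver-\nsion" -> "conversion".
    # Caveat: a genuine hyphen at a line end ("well-\nknown") is lost too.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks inside a paragraph into spaces,
    # keeping blank lines (paragraph breaks) intact.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Squeeze runs of spaces or tabs left over from the joins.
    return re.sub(r"[ \t]{2,}", " ", text).strip()
```

The order matters: page numbers are removed first so that the paragraph they interrupted can be stitched back together. The cleaned text can then go to whatever speech engine you prefer.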

Update: Server's overloaded right now. Any conversion that has not been sent will be delivered by end of tomorrow.

Love the idea behind the project. I tried to upload an EPUB and got an error page. I tried 3 times with different voices. I look forward to seeing more of it and think it's an awesome idea.

Send me the epub and I will convert it for free! Thanks for catching the bug, will look into it soon. Edit: Was the epub in English?

This doesn't seem to be working. I have tried uploading a sample epub. After the epub is uploaded it sends me to a conversions page. That page is just a copy of the homepage.

Also, you need to select the file type on the left, currently epub is the only option but PDF support is planned soon.

Send me the epub and I will convert it for free! Thanks for catching the bug, will look into it soon.

Part of me would actually really enjoy having the option to use an old-school voice generator. Imagine a horror novel narrated by MS Sam.

Really neat solution.

Is this in any way based on Amazon Polly?

Looks like it's down. I get a "The page you were looking for cannot be served." error.

Metamask warns me that this site is on the Ethereum phishing list...
