The Project Gutenberg Open Audiobook Collection (windows.net)
403 points by isbn on Sept 11, 2023 | 184 comments


When I read Rikki Tikki Tavi to my 8 yr old daughter, we play a game. She asks me to change one of the words in the page and she tries to listen and see if she can figure it out. It is mentally taxing at the end of a long day to do that on the fly without pausing to figure out the word to slip in. And, my daughter is very sharp and catches them.

I listened to a few of these. The voice sounds muted at times, as if the reader has a stuffy nose. H.G. Wells was read with a pause in between each period because it "thinks" that each letter boundary is a sentence change, which drove me batty. And, there is zero life in the stories. It might be a good thing to put in front of a kid to put them to sleep, maybe? But, it would not put me to sleep because it is just aggravating to listen to these stories stripped of all life by AI.

Like Louis CK said: "Everything is amazing and no one is happy." I know it is incredible that AI can take in a transcript and produce something that most people would not be able to distinguish from a real human. But, we should ask if you would want to hang out with the voice actor at a party.


> Like Louis CK said: "Everything is amazing and no one is happy."

Everything is not amazing. Sure things are amazing from a technical perspective. But most tech advancements I think have been harmful to society in the last 30 years or so. It's awesome that computers are so powerful and we have awesome video and photos and can share things so easily. But technology should better lives, and not cheapen them, which it often does. Tech is being used to try and replace essential human lived experiences to try and inject advertising into them and extract money.

Technology cannot replace the human; it's impossible. No matter how good the AI is at reading the book, it will never replace sitting next to your parent and them reading it. No matter how easy it is to share a video or a photo, it will never replace sitting next to someone and them showing you photos, or better yet being there when the photo was taken.


I forget the exact quote but the thing I've seen making the rounds sums it up pretty well: Computers were supposed to do the work so people could make art and write poetry. Now the computers are making art and writing poetry and I still have to have a job.

In another life I'd love to do voice over work. (I even have a face for radio!) But, instead, technology is being used to avoid even having humans do that type of work. Sure, today it's PG, but they're definitely doing this with an eye to replacing actual voiceover actors.

Every advance in AI is "how can we replace people and save money?" and not "how can people have better lives and work less?" And it's going to continue until it's "what the fuck do we do with all these jobless people who've been replaced?"


As software developers, we know what getting workers to have better lives while working less looks like. There's some sleight of hand at play, though, in the employer/employee relationship (favoring the employer).

> Every advance in AI is "how can we replace people and save money?" and not "how can people have better lives and work less?"

It's not just AI, but technology generally. And it's because when it comes to managing people, organizations for the most part don't actually concern themselves with getting their employees to produce value—that is, whether they are, and how much, and at what cost (to the business) it comes at, and where that measure of productivity lies (objectively) when scored against some rubric. Instead what they make their most immediate concern is whether their employees are exposed to sufficient toil. Look at any example that involves someone accepting a new job with a set of work duties/expectations where they proceed to automate part of their workload and thus provide the same value (or more) in comparison to what they were doing before, or in comparison to their coworkers, or in comparison to whomever would have ended up with the job if the person who did accept and automate it had accepted an offer elsewhere instead: they end up soliciting feedback (or opining themselves) about whether what they're doing is unethical.

This is the mechanism that wealth disparity through concentration of wealth comes from, but everyone (the employer and the employee alike) walks around as if they either don't notice it or—if they do—as if it's wrong when there's a known path for the concentration to flow upward but it isn't happening.


> favoring the employer

If you've ever been an employer, you'd be disabused of that quickly.


Disabused of what?


> their most immediate concern is whether their employees are exposed to sufficient toil.

Protestant work ethic, twisted and disfigured through late capitalism has become a sadistic and wholly disgusting human trait. To impose moral, intellectual and physical labour on others, not of necessity, nor to create value, but to serve a system rooted in guilt and a craving for validation in the eyes of others is about as un-Christian as can be.


Wealth in a free market is not concentrated, it is created. It does not "flow", either, as free trades are an exchange of value, not a flow of value.

Wealth disparity comes from people creating different amounts of value.


Only until your business is targeted by big corp and goes bust/bankrupt.

Or acquired. Then the wealth flows upwards, employees are cut, to make things more profitable, and the people who originally created something great are not getting much for it. Instead, if they are not let go, they are under new lords, who take a big chunk of the profits.

Or a competitor gets VC funded and, by means of marketing and sales instead of actually making a better product, dwarfs the adoption of your business's product.

I think there are many reasons why some business can fail, and most of them are not about the amount of created value. The free market is not a rationally acting person.


The reason businesses fail is they cost more to operate than the value they produce.


You haven't responded (clearly) here to anything I actually said. You just posted two short, dismissive comments consisting of glib non-specifics.

If you want to dispute what I'm saying, how about starting with the example I gave (an employee figures out how to automate part of their job, enabling them to either 1. deliver the same amount of value to their employer at a fraction of the effort, or 2. deliver something like 2x–10x or more value, owing to the fact that they've been able to automate it)?


If you automate part of your job resulting in a 2x improvement of your productivity, you have demonstrated a skill that you can sell for more money. That's how you realize the value you created.

The wealth didn't "flow" to you. You created it.


> The wealth didn't "flow" to you. You created it.

Er, right. The "flow" here refers to what happens to the wealth after creation.

If after you create it you or your employer then undertake some change to the work arrangement (e.g. imposing a higher quota on productivity—thus allowing your employer to capture the additional value that you created while keeping your net wealth constant, along with the day-to-day toil you the worker are subjected to, probably—or maybe even increasing it), that would be an example of wealth flowing upward.

You created outsized value relative to needs of the employer. Your (wealthier) employer captured it and enriched themselves. That's wealth concentration.


When buying lunch are you paying the (fixed) price asked by its creator or are you sharing (some of) the future value that hamburger is providing you in nourishment and work energy for the rest of the afternoon?


What?


I think it’s more complex than that. Wealth is often created by monopolising things (e.g. enclosures) instead of by creating them.


There are 33 million businesses in the US. Are they monopolies?


On what do you base these statements?


Listening to and reading books by economists. If you think about it, you'll see it in action all the time. After all, consider yourself. Does wealth "flow" to you? Or do you create value at your job, and exchange that value for your paycheck?


Some Ayn Rand novel would be my guess.


> Every advance in AI is "how can we replace people and save money?"

this is not true now, and also does not have to be true. Instead of a "look at the incentives" talk to someone having a bad comment moment.. instead we can be reminded of Doug Engelbart, who said "computer systems can augment human intelligence and team interaction" and specifically NOT "replace humans" .. As I understand it, in Palo Alto, Doug found great interest among the DoD crowd .. a good portion of whom would have a second meeting after his demos, and then discuss how they can get back to the important work of replacing people.

Consider the incentives, consider who has an interest in this hype cycle, and sales profits. When you see a US visit to Vietnam this week, with MSFT pitching "social trust" AI services to "ordinary people" .. does this really sound like trust in the making? Is AI drones in combat really what we need now ? Replacing striking Hollywood writers and getting name-brand actors for pennies on the dollar, is that what "we" need?

I do not agree that AI can only replace people.. however, there is a lot of short term profit and control ready for those that do.. maybe something needs to be done about that?


> what the fuck do we do with all these jobless people who've been replaced?

Around 1800, 93% of labor in America worked on farms. Today we have jobs that were unimaginable in 1800.

> "how can we replace people and save money?" and not "how can people have better lives and work less?"

Those two are actually the same thing.


In my opinion we seriously need to think about providing people with new perspectives, as we replace their jobs / automate them away. We need a social system, that encourages learning at every step in people's lives. A nation should have an interest in getting people back into meaningful jobs and should act according to that interest. The coal mining industry worker, who loses their job, because we no longer want to mine coal? How can we get that person a good new job? How can we make it so that that person gets the necessary qualifications?

We are still (I think in most countries around the world, and at the very least where I live) throwing away an enormous amount of human potential.


> Those two are actually the same thing.

How exactly are they the same thing? It seems that the savings are made by the employer here at the expense of the employee.

There’s no guarantee that the savings will be passed on as price cuts.


> There’s no guarantee that the savings will be passed on as price cuts.

Profit margins tend to be consistent across industries, meaning savings wind up as price cuts sooner or later.

(Unless the government interferes with the price setting incentives.)


It will probably all fall apart when there is no one left to purchase this stuff, no job, no money, no purchasing power.

Once purchasing power has evaporated, then and only then will the system change.

Alternatively AI will also replace the jobless.


We'll invent a third World War long before that happens - to thin the herd and remind everyone using rationing and austerity about how great consumerism is, while creating plenty of jobs rebuilding the industrialized world.


it's simple and it works.. every time. /s


I wouldn't blame folks wanting to work on fast takeoff AI with no human alignment concerns. Heads, the world ends because you've bootstrapped something unsympathetic and more powerful than humanity. Tails, you've bootstrapped something that might be able to overpower entrenched interests, providing a chance at a better societal outcome.


> I'd love to do voice over work

Voice over work and screen actors put stage actors and burlesque workers and traveling minstrels out of business.


> Every advance in AI is "how can we replace people and save money?"

I think what happens is that the repeat jobs are automated, and the (remaining) people get the hard corner cases.


I think the thing that has surprised everyone in this revolution is that the opposite has happened. Musk wasted billions trying to automate vehicle manufacturing while AI is threatening to take the jobs of novelists and graphic designers.


On the other hand, I am reminded of a quote by Christopher Hitchens (from memory), “They say that everyone has a book in them. For most of them, it would be better if it stayed there".

Some of the films and TV programmes I've watched recently have made me wonder, as I gaze across at the writers on strike who have some legitimate concerns but who have also provided some bloody awful writing, if I wouldn't prefer AI to take over the production of art - it certainly wouldn't be able to produce a messy bed, would it? That'd be a win too.

https://en.wikipedia.org/wiki/My_Bed


> I gaze across at the writers on strike who have some legitimate concerns but who have also provided some bloody awful writing

Too often the writing of film and TV is dictated by the producers/studio -- people who have an interest in financial returns, not quality. Those writers would probably love to write their own show, their way, unhindered, and would probably produce something watchable.

Of course, writers subvert their instructions sometimes to great effect. On BSG I believe they were told their show was "too dark" and someone insisted someone have "a birthday party". Which they duly put in, and then had them all die in some kind of terror/bomb incident.


90% of books and movies are not worth the time to read or watch.


It hasn't been a surprise to anyone in the field. Turns out it's much easier to read digital content in the form of bits than to read real world data. Hardware is harder than software.


These kinds of responses are so easy to write after the fact. Show us a quote from ten years ago then, please, that says that creative writing and art will be among the first things to be automated at a mass scale. Since this

> hasn't been a surprise to anyone in the field

apparently, it should be pretty easy to find such a quote.


> it will never replace sitting next to someone and them showing you photos

It definitely does replace that. It sucks so much to be trapped next to someone showing you their photo album or vacation slides, when you don’t really care, that this became a stock scene in 20th-century comedy TV series and films. Nowadays when people are sharing their photos online, that gives their peers the choice of whether to look or whether to ignore, and that is immensely freeing.


The photo slide show of someone's vacation was a stock scene in comedies. But have you never sat down with family and gone through old photos? Having conversations about where was that? who was this? who was this as a baby? It's a very different and much more personal experience than flipping through Facebook.


He cherry-picked one of the three examples in your post and attacked it. He won't choose the other two because he has no argument for them. Ignore.


Sure have, it’s hellacious. I don’t give a fuck about who that baby was, why would I? Relations who my parents only vaguely remember going somewhere boring that I would never go, or someone’s 12th trip to the same lake, is a great use for Facebook. Most people are crazy boring, if I care I can always ask.


To quote a GP comment:

> But most tech advancements I think have been harmful to society in the last 30 years or so.

I thought this was overly cynical until I read your comment. Now I'm not so sure. Has our attention span really become so shot that just being with family has become a chore, and we'd rather our parents just post their life stories into the void that is Facebook?


Sounds like I struck a nerve. Next time you’re telling someone a detailed story about what route you took to dinner, stop and ask yourself if it’s actually that interesting.

Seriously though, technology has nothing to do with this. People have been bored out of their mind by other people for centuries, the only difference is there’s no excuse for having nothing interesting to say now.


You’re talking about _being shown_ photos you don’t care about, while GP is talking about showing your own photos to someone. I agree with the example you’re discussing, though.


Ah, you’re talking about main character syndrome. Showing pictures to someone is one of the cruelest things you can do; you’re probably boring and a terrible storyteller (most people are) but they’re going to feel obligated to not tell you that.

It’s always amazing to me that people almost universally hate other people’s slideshows, and then don’t have the self awareness to realize that they do the exact same thing.


>Showing pictures to someone is one of the cruelest things you can do

So you are saying that friends, family and significant others had better not share old photos and memories with you? Because that is one of the cruelest things to do?


They can share those old photos and memories on social media, where their friends, families, and significant others can choose whether to look or ignore it.


That’s not what I’m talking about, just for clarity.


I'd suggest reading How to Do Nothing by Jenny Odell [1]. I think it addresses some of the concerns you have.

1. https://www.goodreads.com/en/book/show/42771901


Please don't dissect an aphorism by a standup comedian like it was a Ph.D. thesis.


Elevenlabs is a lot closer to compelling audiobook narration (needs a better way to deal with multiple characters in a story without manual use of multiple voices): https://pub-a24da573c61f4b2d905bdebb2d0ecf88.r2.dev/ElevenLa... (an H.G.Wells example I just asked it to read).



thanks!


I was going to mention ElevenLabs, too. Their samples are very impressive in how the intonation and word stress are varied based on the text’s meaning. Their pricing is a bit high for personal use, though.

(The link you posted seems to have been truncated. Can you try posting it again?)


Yeah, sadly it'd cost about $100 to get a book per month... Not quite competitive with Audible yet, but give it a year perhaps, or a few iterations of the open source models... (fixed the link)


100 dollars per book, right, but that book is public and can be shared between millions of people.


Any open source alternatives?


None of the open source models I've seen are as "well-rounded", production ready as Eleven Labs. Though for example bark is really great at prosody: https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a4... And piper isn't bad at speech quality: https://rhasspy.github.io/piper-samples/

We might only be a few papers away from a good open source Elevenlabs competitor.


Now you have a pretty good idea how blind people must feel. Yes, a good audiobook should be read by a human. But if you don't have that, speech synthesis is the best or even only thing you can get. And then, many years later, you read a post like yours. And you realize that man is spoiled.

Signed, a blind man


> But, we should ask if you would want to hang out with the voice actor at a party.

I think the question is really “Will I be able to enjoy great books I otherwise would not have experienced?”

For me, it’s not that these are superior or equivalent to parents reading to their kid or to voice actors; it’s whether I’ll listen to a book for free that I wouldn’t be able to afford at $10-30.


Plus lots of books don’t have audiobooks. I’ve a few sitting on my to read list for years on end just cuz there’s no audio. Being able to make one myself with AI would be awesome.


This precisely.

Also, I’ve bought ebooks and want to listen to the book but don’t want to pay again for the voice acting. So I’ve bought a license to the IP and would like to listen rather than read.


Audible is $7.95 a month and you can listen to whatever book you want (like Spotify). If you’re not willing to go even with that in order to listen to an actual human, then maybe yeah, you can try AI.


> Audible is $7.95 a month and you can listen to whatever book you want (like Spotify)

Not true at all. Audible Plus gives you access to a tiny subset of the full library, the rest (which includes all the best titles) need to be purchased separately.


Unless things have changed since I was a subscriber, you get a token every month which can be used to purchase any book from the full library. So it's effectively 1 book a month + a few extra bonuses for $7.95


You don't get a token without paying for the premium plan at $15 a month. Also, don't tell anyone but if you subscribe and then cancel and give the reason that it is too expensive you can often get a reduced price the next few months.


It's not at all like Spotify. The library you get for $7.95/mo is very limited. If it was like Spotify I'd happily pay a hell of a lot more than that.


Audible is $15/month and you get to choose one title.

I think you’re confusing audible today with audible of 20 years ago.


I typically buy at least two titles per month, and the best deal ended up being:

Audible Premium Plus - 1 Credit Every Other Month for $17 ($8.50/mo)

You can buy 3 more credits for $37.99 (12.66/ea). It’s also worth checking individual titles because quite a few cost less than the credits.

Correction: I guess I actually buy slightly fewer than 2/mo because there’s a plan for that ($22.95/mo) that’s cheaper than the 27.50/mo from my numbers above. I had that one for a while but ended up feeling pressured to use them before they expire.
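For anyone checking the numbers above, here is a rough sketch of the arithmetic in Python. The function name and parameters are made up for illustration, and the prices are simply the ones quoted in this comment, not a statement of Audible's current pricing.

```python
def monthly_cost(titles_per_month: float,
                 plan_price: float = 17.00,        # 1 credit every other month (price quoted above)
                 extra_pack_price: float = 37.99,  # pack of 3 extra credits (price quoted above)
                 extra_pack_size: int = 3) -> float:
    """Average monthly cost, assuming extra credits are bought pro rata."""
    # Look at a two-month window, since the plan bills every other month.
    titles_needed = titles_per_month * 2
    extra_credits = max(titles_needed - 1, 0)  # the plan itself covers one title
    price_per_extra = extra_pack_price / extra_pack_size
    two_month_total = plan_price + extra_credits * price_per_extra
    return two_month_total / 2


print(round(37.99 / 3, 2))          # cost of one extra credit
print(round(monthly_cost(2.0), 2))  # two titles per month
```

Two titles a month works out to one plan credit plus one three-pack every two months, i.e. (17 + 37.99) / 2, which is where the roughly 27.50/mo figure comes from.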


You probably won't be interested since it's even more pressure to use them before they expire, but there's also annual plans which are even cheaper if you can be happy using 12 (or 24) tokens within 12 months (you get them at the start and they expire at the end of the year):

Audible Premium Plus Annual - 12 Credits $149.50/year (way cheaper in UK: £69.99/year)

Audible Premium Plus Annual - 24 Credits $229.50/year (£109.99/year)

US: https://www.audible.com/ep/memberbenefits UK: https://help.audible.co.uk/s/article/what-are-the-different-...

Although, as soon as I'm logged in with my account (UK) which had subscribed in the past but isn't currently, it doesn't seem to be giving me any options except to start a 1 month free trial for 1 token/month, not sure if other options aren't available or just extremely well hidden...

edit: no it is available for my account, though I'm going to remain a non-subscriber and keep using my local digital library :)


Hah thank you, the optionality is worth $0.20/mo to me.

I wish there were better options for listening 20-25 hours/mo. Like Amazon in general, the selection + convenience is tough to beat.


We're living through the Great Enshittification.


And it is living through us, or on us.


Seth Godin did a whole Akimbo podcast that was written by ChatGPT, and the audio was AI generated. The voice was spot on, the content and delivery was dead. I almost fell asleep listening to it, which is NEVER the case for any other episode of Akimbo I've listened to.


> H.G. Wells was read with a pause in between each period because it "thinks" that each letter boundary is a sentence change

This is why I'm a firm "two spaces after the period" guy. Makes it unambiguous the difference between the abbrevs. period and the sentence-end period. Otherwise you get sentences like "Let's not forget that Dr. Principal does not care about this." which can be read in two valid ways.


Of course some style guides would tell you not to put a dot after "Dr" because "r" is the last letter of "Doctor". Similarly, the abbreviation of "Saint" would be "St", while the abbreviation of "Street" would be "St.", according to those style guides.

Meanwhile the GB military style guide says never to use a dot after any abbreviation, I think.

Also, the style guides I'm familiar with prescribe "H. G. Wells", rather than "H.G. Wells", but "H.G.W." if you're abbreviating all of the words.

None of this is of much interest to anyone who isn't an editor but I thought I'd mention it anyway.


> "H. G. Wells",

Right. That's probably the most common historical form, and is a good example of how the punctuation for sentence-ends and abbreviations is often the same - period and then single-space.


This trick doesn't work across linebreaks (unless you adopt a rule like "treat the spaces in the nouns as non-breaking and do not permit a linebreak for anything that isn't a sentence boundary").


Emacs does (or did) exactly that, perhaps by default: I think I had to disable it once because it was annoying me ... (setq sentence-end-double-space nil)?


Not the same thing.


Sidenote, I asked ChatGPT about where to put the comma and how it would change the meaning of the sentence. It got it right.


Fair point, the sentence I invented off the top of my head isn't perfectly grammatically correct but it's close-enough that it shows the ambiguity problem. It's a lot to ask text-to-speech and typesetting programs to figure out contextually which periods are abbreviations and which periods are end-of-sentence, and so having a hard text cue like double-space would help. Then typesetters would have a hard cue of when to replace the space with a thin-space (which is supposed to happen in the case of something like "H. G. Wells").
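To make the "hard text cue" idea concrete, here is a minimal sketch in Python, assuming the source text really does use two spaces after every sentence-ending period. The function names are hypothetical, and this is not how any particular TTS engine segments text. As noted elsewhere in the thread, the cue is lost wherever a sentence ends at a line break.

```python
import re

# Sentence-ending punctuation followed by two or more literal spaces.
DOUBLE_SPACE_BOUNDARY = re.compile(r"(?<=[.!?]) {2,}")

# Naive rule for comparison: any whitespace after sentence-ending punctuation.
NAIVE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_on_double_space(text: str) -> list[str]:
    """Split only where the writer left a double space after the period."""
    return [part.strip() for part in DOUBLE_SPACE_BOUNDARY.split(text) if part.strip()]


def split_naively(text: str) -> list[str]:
    """Split after every period, which also breaks at abbreviations and initials."""
    return [part.strip() for part in NAIVE_BOUNDARY.split(text) if part.strip()]


sample = "Let's not forget that Dr. Principal does not care.  H. G. Wells wrote it."
print(split_on_double_space(sample))
print(split_naively(sample))
```

With the double-space rule, "Dr." and "H. G." stay inside their sentences; the naive rule cuts the text at each of them, which matches the pause-after-every-initial behaviour complained about upthread.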


How does it feel to have websites and books and newspapers and practically every other place silently ignore your double spaces and treat them as a single space?


See also: Librivox [1], for public-domain audio books read by actual humans

[1] https://librivox.org/


Lit2Go is also good: https://etc.usf.edu/lit2go/

The sibling poster is right, the quality varies. But the upper end of the quality range is really quite good. One of the best-read audiobooks I've ever heard was a Lit2Go edition of Pygmalion. And, for that matter, one of the worst-read audiobooks I've ever heard was an edition of an extremely famous and commercially successful book that I bought on Audible.


Are you a specist? Why should we value more an audiobook that's read by a human?

Sorry, just joking. But here's a reason: these things were not quality-checked at all. Click on Moon Voyage by Jules Verne and be greeted with a very human-like voice reading a numbered list of "other works by the author" in an extremely awkward fashion that's probably caused by how the .TXT file is organized.


Most likely the AI was actually trained on LibriVox, potentially even on largely the same books...

EDIT: The first book on the list https://marhamilresearch4.blob.core.windows.net/gutenberg-pu... is "100%: the Story of a Patriot" and the LibriVox version is at https://librivox.org/100-the-story-of-a-patriot-by-upton-sin...


Not a good idea then. The librivox experience turned me away from professionally read audiobooks for far too long.

Amateur readers are hit & miss. A lot of professional readers are actors or have a lot of experience. There is a reason people do pay for professionally read books instead of electronic reading or librivox only.


Thanks, I'm happy this exists. I think I'll start contributing this fall.


I was kind of hoping this was going to be human beings contributing read aloud versions of Gutenberg content. Since it’s not, I’ll propose a cool project. Raise money to enlist high quality voice actors to create audiobooks from Gutenberg. Release these audiobooks to the world for free. Which books come first could be voted upon. As someone who has used TTS a lot in recent projects, I’m not excited about listening to AI read a book to me. It feels soulless.


>As someone who has used TTS a lot in recent projects, I’m not excited about listening to AI read a book to me. It feels soulless.

AI TTS is still uncanny valley enough to distract. I prefer even more soulless traditional TTS, which sounds "neutral" after habituation, to the point where my brain can start layering on characterization as if I were reading. AI TTS feels like listening to a mediocre voice actor, where it's hard to overwrite their creative choices, so I'm just left disappointed and annoyed.


I agree completely! I kinda like the neutral tone of a soulless robot when it knows how to stay out of the way. Far better than a bad AI _or_ a poor human reading.


I have used TTS in the past and in the last few years there has been a quantum leap in TTS quality. A similar such leap in the next few years and it will dominate the audiobook scene for good or bad.


AI might dominate, but it would be a loss. Here’s a tutorial explaining modern audio fiction:

https://www.drabblecast.org/2018/07/30/inside-drabblecast-au...

(In audio format, of course; roughly 1.5 hours)

————

This episode takes you inside Drabblecast audio production. Ever wonder how we produce an episode of the Drabblecast? Wonder no more!

We dig into all the technical aspects like voice acting, sound editing and mixing, foley effects, music and more.

Preproduction? Reading? Acting? Yeah, it’s all here folks, all the blood sweat and tears that go into every production of the Drabblecast.


It might be worse than human narration, but at some point the economics become so lopsided that its dominance is inevitable. One good thing I can see coming out of that will be an abundance of audiobooks of copyright expired books.


Are the economics actually better, or do they look better due to a lack of quality control? Because no TTS - even the most current AI ones - is perfect. They need corrections, which involves a human's time. And it's time that dictates prices, not skill (which largely reduces time).


The key is just which time is faster. If you are able to just listen to it once, and note a few errors, and slightly adjust, it may still be much faster to use AI.


Based off Apple’s advertised times to produce AI audiobooks, the times are comparable. AI is not running quickly nor inexpensively for this task it seems.


The economics are only lopsided if the cost of producing the audio version is significant compared to the cost of writing the work of fiction.


Does anyone know of any TTS available now that doesn't completely muck up foreign words? I know you can make custom pronouncing dictionaries to use with some of the open source ones, but I wonder if any of the more modern systems are good for this. I have been listening to the English news podcast from a Japanese newspaper that is made with TTS, and it gets its one job, pronouncing Japanese names and places, completely and jarringly wrong.


Shameless plug - if you download lots of audiobooks and need help organizing them and figuring out which to listen to next, check out Audiobook Locker: https://gitlab.com/fonner/audiobook-locker. It's a desktop app (built with Tauri) that scans your audiobook folder and lets you sort, search and tag your audiobooks.


Sounded better than the handful of random corresponding Librivox recordings I listened to in order to compare them. To be honest, a lot of people go to great lengths to make Librivox recordings without having the skills to read aloud.

Which is a pity, but nonetheless.


I agree. I love the idea of Librivox, but the volunteers vary widely in quality.

Some are non-native English speakers, some have lisps or other articulation problems, some have other marks of fluency deficiencies, some have under- or over-dramatic intonation, etc.

And even if they're perfect voice actors, often their microphone setups are sub-par, and it comes through in the recording.

Frankly, these AI voices are now at a level where the few mistakes they make are easier to forgive than some of those issues from human readers.

That said, the final hurdle -- giving them the brains to know when to skip or resolve hiccups in the source material, such as typos, formatting issues, or text not intended to be read aloud -- is going to be very hard to overcome.


> hiccups in the source material

From Joyce's Ulysses (capitalization possibly wrong): "nes. yo."

Good luck with that!


Absolutely.

So, will your books be available with an audio section as a free encore in the future? :)


It's getting more passable. As someone who has listened to a lot of TTS at high speed for years, I eventually adapted my brain to it, and now it feels similar to physical reading with subvocalization, where I can adjust the voices of characters. It's occasionally even preferable, e.g. with the many overproduced podcasts these days, where I just TTS the transcript.


Great news. It seems there is still much room for improvement, though. E.g. in “A Short History of the World” by H. G. Wells, the AI reads Darius I and Charles V as “Darius Eye” and “Charles Vee”. Open and closed brackets in sentences are not read out. The intonation is also a bit unnatural. But it is intelligent enough to parse 1,782 as 1782 rather than two numbers.

Another problem is that the audio clips are not broken into sections. There is no way to locate the beginning of a chapter, for instance.


I noticed the same, especially when some kind of structured text (like a bullet point or numbered list) comes in.


Some podcasts embed chapter start timestamps into the rss (atom? m3u?) metadata.

That way, you get one file (and gapless playback), but most players have a chapter skip button that will do what you are asking for.
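As a related sketch (a different mechanism from the feed metadata mentioned above): chapter markers can also be embedded in the audio file itself, and FFmpeg accepts them in its FFMETADATA1 text format. The helper below only builds that text from a list of start times; the function names and the chapter data are hypothetical, and muxing the result into a file is left out because it depends on the container and player.

```python
def _escape(value):
    """Backslash-escape the characters FFmpeg treats as special in metadata."""
    for ch in ("\\", "=", ";", "#", "\n"):
        value = value.replace(ch, "\\" + ch)
    return value

def ffmetadata_chapters(chapters, total_ms):
    """Build FFMETADATA1 text from (start_ms, title) pairs.

    Each chapter ends where the next one starts; the last one ends at total_ms.
    """
    chapters = sorted(chapters)
    lines = [";FFMETADATA1"]
    for i, (start, title) in enumerate(chapters):
        end = chapters[i + 1][0] if i + 1 < len(chapters) else total_ms
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are in milliseconds
            "START=%d" % start,
            "END=%d" % end,
            "title=%s" % _escape(title),
        ]
    return "\n".join(lines) + "\n"
```

For example, `ffmetadata_chapters([(0, "Chapter 1"), (60000, "Chapter 2")], 120000)` yields two `[CHAPTER]` blocks. The text can be saved to a file and passed to FFmpeg as an extra input when remuxing, which keeps the single-file, gapless setup described above.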


Oh, snap, we've been working on importing Gutenberg to LR:

https://www.languagereactor.com/m/t_en_-

We're ranking them using the download count, and also this prompt to chatGPT (it's primarily for language learners):

"Is this text engaging and interesting for a modern reader, someone not into fine literature? Rate the text excellent, good, ok or poor. I don't want crusty, flowery, contorted language, talking about buttons and mannerisms and the hue of the sky etc."

Then, we're rewriting the ~1000 most popular books using chatGPT to modernise/simplify the text.

Using some markdown as an internal format, drawing from the gutenberg plain text and html formats, this will go to a github repo shortly.

There are translations too, and then we need to look at the current best TTS voices.


Standard Ebooks has the ability to filter books by reading level.

That seems much better for people trying to learn English.

https://standardebooks.org/

They carefully curate and copy-edit their books, and go for quality over quantity. I think that is probably the right choice. We already have free access to an effectively infinite amount of mediocre content on the internet.


I did check that out. The thing is, the criteria are a little different. This is for learners who need to practice reading English. The scripts surface a lot of material in Gutenberg that otherwise would be time-consuming to find: 1950s science fiction that still reads well, or story books for children. Someone could certainly call these books 'mediocre', but I think that's a bit adjacent to what we're trying to do here. Whether this idea is worth the effort relative to other sources of material.. still undecided.


Just for fun, here's what happens to Pride and Prejudice:

User: Rewrite and simplify the following text for a modern audience: "It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of a wife. However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered as the rightful property of some one or other of their daughters."

ChatGPT: "People generally believe that a rich single guy must be looking for a wife. Even if we don't really know what he's thinking when he moves to a new area, everyone assumes he's up for grabs by one of the local girls."


This is a cool project. I'm learning French and am always on the lookout for sources of books -- and especially audio -- at the right level.

To others questioning the need for rewriting: language changes over the course of a century! It's no problem for me, as a native English speaker, to read English from around the turn of the 20th Century. However, it increases the degree of difficulty for me to try to read French or Spanish from the same time period. Also, to get through an old book I have to learn a bunch of useless words that have fallen into disuse; it's much more effective for me to learn the version of the language that is relevant in today's world.

It looks like the French TTS isn't working properly yet? At least on the couple of titles I tried. The selection of titles you have listed looks great.


If you've already determined the text is interesting to the modern reader, why rewrite?

Wouldn't it make more sense to look for texts that are thematically relevant, but with inaccessible language - and rewrite those?

I still shudder to think how this system will handle something like Shakespeare's sonnets...


It's for students learning English. The top 1000 will be rewritten and be available in addition to the original version, and about another 10,000 available in original only (selected using vocabulary frequency analysis and the chatGPT query).

Midsummer Night's Dream:

THESEUS Now, fair Hippolyta, our nuptial hour Draws on apace; four happy days bring in Another moon: but, O, methinks, how slow This old moon wanes! she lingers my desires, Like to a step-dame or a dowager Long withering out a young man's revenue.

HIPPOLYTA Four days will quickly steep themselves in night; Four nights will quickly dream away the time; And then the moon, like to a silver bow New-bent in heaven, shall behold the night Of our solemnities.

Rewritten:

THESEUS Now, dear Hippolyta, our wedding day is approaching quickly. In just four more happy days, a new moon will arrive. But oh, it feels like this current moon is taking forever to disappear! It's holding back my excitement, like an old, slow stepmother or a widow who keeps a young man's money for far too long.

HIPPOLYTA Don't worry, Theseus. Four days will pass by in no time, and then four nights will also quickly slip away. After that, the new moon will shine brightly in the sky, like a freshly bent silver bow, and it will witness the night of our special celebration.

Hmm. Shakespeare is the extreme case. It does make it more accessible. The style is lost. shrug


It's amazing that a decade ago I'd have called you an exceptionally demented individual for doing this, but these days there's so much stupid unconscionable shit going on with AI that it hardly stands out.


floren: In epochs bygone, a mere decade hence, I would have proffered thee the epithet of an egregiously unhinged denizen, were thou to embark upon such an endeavor. However, in these current times, wherein the domain of artificial intelligence doth teem with preposterous and morally indefensible machinations, thy conduct emerges as naught but a faint ember in the vast tapestry of perplexities that now envelop us.

davidzweig: Hark! Methinks 'tis the act of rewriting, not the sifting, that doth wound thy pride most deeply, and affront thee grievously. Set thine eyes upon the instance presented hence, to glean a clearer understanding of our noble mission, one that doth strive to render these splendid volumes more reachable:

Original:

The other person was a man named O’Brien, a member of the Inner Party and holder of some post so important and remote that Winston had only a dim idea of its nature. A momentary hush passed over the group of people round the chairs as they saw the black overalls of an Inner Party member approaching.

O’Brien was a large, burly man with a thick neck and a coarse, humorous, brutal face. In spite of his formidable appearance he had a certain charm of manner. He had a trick of resettling his spectacles on his nose which was curiously disarming—in some indefinable way, curiously civilized. It was a gesture which, if anyone had still thought in such terms, might have recalled an eighteenth-century nobleman offering his snuffbox.

Winston had seen O’Brien perhaps a dozen times in almost as many years. He felt deeply drawn to him, and not solely because he was intrigued by the contrast between O’Brien’s urbane manner and his prize-fighter’s physique. Much more it was because of a secretly held belief—or perhaps not even a belief, merely a hope—that O’Brien’s political orthodoxy was not perfect. Something in his face suggested it irresistibly.

Rewritten:

The other person was a man named O'Brien. He belonged to a group called the Inner Party and had a very important job that Winston didn't fully understand. When people saw O'Brien, they became quiet, because he was a high-ranking member of the Inner Party.

O'Brien was a big, strong man with a thick neck and a rough, funny, and tough-looking face. Despite his intimidating appearance, he had a certain charm in the way he acted. He had a habit of adjusting his glasses on his nose, which was oddly disarming, and in some way, it seemed polite and refined, like an old-fashioned nobleman offering his snuffbox.

Winston had seen O'Brien about a dozen times over the past few years. He felt a strong connection to him, not just because he was curious about the difference between O'Brien's polite behavior and his tough appearance. It was more because Winston secretly hoped that O'Brien wasn't completely loyal to the government's beliefs. There was something about his face that made Winston believe this, even though it was just a feeling.

---> Orwell already has a clear style that I very much admire. The somewhat minor changes here would make for easier parsing by students, without changing the message much. I can't imagine Orwell would have objected.

I think your message is misplaced and pretty rude.


The example doesn't change my mind.

It throws away the flavor of the original text. Why reword the great works of literature into YA-level blandness? Might as well just read the Cliffs Notes at that point. 1984 isn't a particularly challenging work, either, and I think readers do themselves a disservice if they don't read the original.

Edit: hey I do want to apologize for my tone in the first comment though, it was too much. Sorry.


Our software is used by Koreans, Turks, Brazilians, Hungarians etc. who are trying to learn a foreign language (English). Whether importing (and rewriting some of) Gutenberg is the best use of time for them, well, I'm not sure. If you don't see a difference in difficulty between the two texts, then I don't think you are able to appreciate the perspective of a language learner.


Well, the filtering prompt could have caused offense to someone who enjoys literature. It was the prompt I settled on after some iterations. If you go through Gutenberg (~70,000 books), much of it is pretty indigestible to a modern reader, never mind a learner. Personally I'm finding a learning curve with chatGPT etc., where you have to relearn to directly ask for what you want.


I don't really understand the purpose here, given that literature is significantly aesthetic by nature. Can't students just read contemporary, accessible books if they want something easier?


Literature: "Anything written in print that's not Twilight." (urban dictionary) :)

Yes, they could.. but we can't put Harry Potter in the public library of texts for copyright reasons, and licensing material is difficult and not fun; I prefer to write software. Users can paste it in themselves, but then there are extra steps. The center of the project is still a browser extension that runs on Netflix and YouTube, so we sidestep the copyright issue.


If you are looking for short stories, I strongly recommend audio fiction podcasts.

Escape Artists is one of my favorite production houses. The recordings are creative commons licensed, and the authors (and other artists) get paid professional rates:

https://escapeartists.net/

Other sites to check out (all are donation-supported and pay authors):

https://www.drabblecast.org/

https://www.lightspeedmagazine.com/podcasting/

https://www.asimovs.com/more-stuff/podcasts/

To give you an idea of what’s available, “Money in the Bank” by John Kessel and Bruce Sterling will likely sit well with the HN crowd:

https://www.lightspeedmagazine.com/fiction/money-in-the-bank...

I could have picked literally hundreds of other stories; this one wins due to recency bias and the authors’ cyberpunk roots.


For horror short stories - I'm a fan of the "Horrorbabble" podcast - https://www.google.com/search?client=firefox-b-1-d&q=horrorb...


This has dictation errors in the very first two words, in the title no less. https://ia801604.us.archive.org/29/items/synapseml_gutenberg...

"Mrs. [pause] Shelly by Lucy Maddox Brown Rosetti"

I expected a bit better than this for a launch of "the next amazing cool thing"; distinguishing between full stops and honorifics seems pretty dang basic. As xrd said, issues like this make the books unlistenable; it's too distracting and weird. didntcheck plugged Librivox, which is nice if mixed in terms of quality; I'd also plug "libby" for anyone who doesn't have it: check out audiobooks from your library.
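To illustrate the full-stop-versus-honorific distinction, here is a minimal sketch of a sentence splitter that refuses to break after a short list of abbreviations or after a single capital initial. It is not how the project's pipeline works (that isn't described here); the abbreviation list and the function name are assumptions made for the example.

```python
import re

# Small illustrative list; a real one would be much longer.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "st", "jr", "sr"}

def split_sentences(text):
    """Split on ., ! or ? followed by whitespace, skipping honorifics and initials."""
    sentences = []
    start = 0
    for m in re.finditer(r"[.!?]\s+", text):
        words = text[start:m.start()].split()
        last_word = words[-1] if words else ""
        is_period = text[m.start()] == "."
        # "Mrs." and friends: the period belongs to the abbreviation.
        is_abbrev = is_period and last_word.lower() in ABBREVIATIONS
        # A lone capital letter, like the "G." in "H. G. Wells".
        is_initial = is_period and len(last_word) == 1 and last_word.isupper()
        if is_abbrev or is_initial:
            continue
        sentences.append(text[start:m.start() + 1].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

With this, `split_sentences("Mrs. Shelly was a writer. H. G. Wells was too.")` gives two sentences instead of five fragments. The initial rule has an obvious cost: a sentence that really ends in "I." will not be split, so it trades one kind of error for a rarer one.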


I picked Alice's Adventures in Wonderland[1] just to check out the quality and was met with:

>Lice was beginning to get very

>tired of sitting by her sister on

>the bank and of having nothing

>to do. Once or twice she had

Great concept but jeez the execution leaves a lot to be desired.

[1] https://ia801606.us.archive.org/35/items/synapseml_gutenberg...


Very cool. Although they shouldn't have even bothered with the poems; they sound terrible.

https://archive.org/download/synapseml_gutenberg_a_bell_s_bi...


I thought I'd look up something I knew and spotted 'Thérèse Raquin', which gets butchered into 'Rackwin'... some way to go yet, I think!

https://ia804709.us.archive.org/35/items/synapseml_gutenberg...

(Hats off for the effort, though.)


I put together a script to read epub books using Coqui TTS and I think the results are not far off from this. It's super quick if you've got a GPU, but it's reasonable too if it's just using CPU to do the text to speech.

https://github.com/aedocw/epub2tts


Does this handle text cleanup? E.g. replacing Roman numerals so they aren’t read literally, etc.? May need to dust off my Python for a PR if not.


It does not handle that. A PR to replace stuff like that would be fantastic, I'd love it - please do!
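A rough sketch of what such a cleanup pass could look like for regnal numerals ("Charles V" read as "Charles Vee"). This is not code from epub2tts; the function name, the ordinal table, and the idea of passing in an allowlist of names are all assumptions for illustration. The allowlist is there because a bare pattern of "capitalized word plus numeral" would also rewrite ordinary text such as "Then I went".

```python
import re

# Ordinal wording for the numerals this sketch knows about.
ORDINALS = {
    "I": "the First", "II": "the Second", "III": "the Third",
    "IV": "the Fourth", "V": "the Fifth", "VI": "the Sixth",
    "VII": "the Seventh", "VIII": "the Eighth", "IX": "the Ninth",
    "X": "the Tenth",
}

def expand_regnal_numerals(text, names):
    """Rewrite "<Name> <Roman numeral>" as "<Name> the <Ordinal>".

    Only names in the caller-supplied allowlist are touched.
    """
    if not names:
        return text
    # Longer alternatives first so "VIII" is tried before "V".
    name_alt = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    numeral_alt = "|".join(sorted(ORDINALS, key=len, reverse=True))
    pattern = re.compile(r"\b(" + name_alt + r") (" + numeral_alt + r")\b")
    return pattern.sub(lambda m: m.group(1) + " " + ORDINALS[m.group(2)], text)
```

For example, `expand_regnal_numerals("Darius I and Charles V", {"Darius", "Charles"})` produces "Darius the First and Charles the Fifth". It still gets "Charles I think is wrong" wrong, so anything like this would need a human-reviewed name list or more context than a regex can see.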


This is cool. Narration of audio books is a time consuming process! I agree with some of these comments here about how AI narration can sound robotic though and may not be too pleasant to listen to.

However, for anyone who is, or knows a family member/friend with a certified disability, or is a veteran, there is a free program to listen to a vast collection of audio books (with real narration) provided by the US Government. Check out https://www.loc.gov/nls/ (Braille material too!)


>I agree with some of these comments here about how AI narration can sound robotic though and may not be too pleasant to listen to.

I have encountered readers on Librivox with such terrible pronunciation that following the story was rather difficult. On the other hand, a robotic voice could work well on some cyberpunk material.


I tried to listen to The Call of the Wild and it was impossible to follow since accents and emphasis on words are all wrong. I could barely understand the story. I guess AI has more work to do.


Yes, the voice sounds very natural and not computer-generated. But it gets a lot of, even simple, pronunciations wrong. There's a long way to go before this is genuinely an enjoyable and useful option.


Upon first impression, this is incredible! Audiobooks have enabled me to enjoy fiction books that I, otherwise, would not have been able to due to time constraints or other commitments. Perhaps, in the near future, AI will be able to make many obscure books that are collecting dust in museums and libraries accessible to the public through audiobooks. That is a future to look forward to.


Some human book readers have a lot more than accurate reading skills. (Can even read a title like 'Astounding Science...' (without the 3 second pause before) 'Fiction September 1959'.) https://ia801608.us.archive.org/25/items/synapseml_gutenberg...

They have wonderful voice acting skills as well. They can create multiple character voices and personalities. They can detect and express emotions appropriately, without hints from the text.

They'll continue to be highly valued, and far more enjoyable to listen to, for a -long- time.


In sampling a couple I would call these narrations "serviceable" rather than "high quality". My benchmark is the voice of my mom reading Shakespeare and Grahame, with intonation and voice to each character. Perhaps AI authored narration could do that, but these haven't.


I'm curious about the sub-domain https://marhamilresearch4.blob.core.windows.net. Are these auto-generated? I'm guessing this style of subdomain is not named by a human.


Looks like Azure Blob Storage and 'marhamilresearch4' is the name of the storage account (think website hosted in a public S3 bucket).

Azure requires these names to be globally unique and only allows lowercase letters and digits.
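As a small sketch of that naming rule: as far as I recall from the Azure documentation, storage account names must be 3 to 24 characters of lowercase letters and digits, and the default blob endpoint is the account name followed by `.blob.core.windows.net`. The helper names below are hypothetical, and global uniqueness can only be checked by Azure itself, not locally.

```python
import re

# 3-24 characters, lowercase ASCII letters and digits only (assumed from the docs).
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")

def is_valid_storage_account_name(name):
    """Check the local syntax rule; uniqueness is not something we can test here."""
    return _ACCOUNT_NAME.fullmatch(name) is not None

def blob_endpoint(name):
    """Return the default blob endpoint for a syntactically valid account name."""
    if not is_valid_storage_account_name(name):
        raise ValueError("not a valid storage account name: %r" % name)
    return "https://%s.blob.core.windows.net" % name
```

So `blob_endpoint("marhamilresearch4")` gives the host from the submission, which fits a human picking a name and appending a digit because earlier variants were taken.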


This is likely Mark Hamilton's static site deployed from his own blob storage account https://github.com/mhamilton723


Isn't listening to people tell stories fundamental to what we are? Wouldn't you rather be a part of a culture that cares about the difference between listening to a person's voice vs. a bot?

Edit: My concern is audio files ending up in places where they aren't clearly labelled as AI-generated.


I think we should recognize that most choices are not between bread and cake.

They are between bread and going hungry.

I am not certain AI voiced audio books are better than nothing, but that's the way I'd bet. YMMV.


Do you have the same objection to reading transcripts?


Transcripts of what? I think stories are special, and I wouldn't lump them together with "all text".


Anyone interested in free, well-narrated audiobooks should check out the Classic Tales podcast. I can't really say enough about it. The host is a fantastic narrator and the books range from Plutarch's Lives to Philip K Dick.


Exactly the same scene as I use in my daily life. It feels great to have an audiobook version of any e-book. You don’t have to wait/look for any real person to complete their audio work. I have used this tool [1] to convert and listen to many books.

1. https://github.com/p0n1/epub_to_audiobook EPUB to audiobook converter with Azure TTS, optimized for Audiobookshelf.


A good human reader doesn't read as much as he dramatises the book. This conveys ideas and feelings more than what is just written. He takes liberties doing that which makes the narration interesting. I don't know if that's possible in the same way with an AI voice. I read to my kids often and try to dramatise the books in a similar way.

The LibriVox project, which produces contributor-driven audiobooks, is, I think, a more valuable contribution to human culture than AI-generated audio files.


When I'm reading to my kid I can respond to the situation too.

We're currently going through the Chronicles of Narnia, and are finishing up the second book (The Magician's Nephew). I'll do different voices for each character (I'm a professional voice actor). But we've noticed that close to bedtime it can have an enlivening effect so that it keeps him awake, so I'll just read the voices in my normal narrator voice.


That's sweet. The "negative" effects of being too good at your job. :)


Sometimes he admonishes me for doing the voices too. "I said NO voices!" So hard to please :)


Yeah it’s still robotic, I just tried. When enunciating a long string of modifiers in a sentence, such as “A long windy snow filled country road”, it doesn’t know that this is one set of adjectives describing a country road. It’s a dead giveaway that it’s AI. Maybe they can fix this.


I'm sure they will but even if they do and produce something that's exactly the same as a human narrator, there's something intangible that's lost. That's more or less my point. To make it even more intangible, I think the loss is greater to the narrator than to the listener as is the case with many of these AI generated creative pieces.


Thirty-five seconds into the very first link [1] and I hear the phrase "kon-fee de rate". It took me five seconds to realize the AI can't pronounce "confederate".

[1] https://archive.org/download/synapseml_gutenberg__quot_run_t...


I'm not even sure what this project is supposed to accomplish. I tried a few books, and gave up within a few seconds. Pauses and punctuations, intonations, the spacing of words, and a whole lot more are really out of whack - it takes me a lot of effort at times to figure out what's being said. After a short while, it was really starting to stress me out so I gave up.


The audio book for Project Hail Mary is brilliantly done with amazing voice acting and even uses effects on Rocky’s voice to emphasize his musicality. Listening to a good audio book is like listening to the perfect film adaptation - it adds to the reading experience.

There’s a long, long way to go for AI to learn emotion before I’d spend 20+ hours listening to a book read by one.


Great example. I agree that Ray Porter knocked it out of the park with that book.

Listen to this. Apple must have licensed his voice, and while it is impressive, it goes to show how deadpan the voice still is.

"Mitchell, a digital voice"

https://authors.apple.com/support/4519-digital-narration-aud...


Fantastic progress. Nevertheless, here is something for the internet veterans: Remember Microsoft Sam? MS came a long way to finally do good text to speech: https://youtu.be/3db_4xYahVc?si=SsXKvfHCabQ5rLef

Enjoy the ROFLcopter. ;)


I cannot listen to audiobooks read by text to speech; they sound awful.

Only human narrators are acceptable.


Only professional, human narrators are acceptable to me. With few exceptions (e.g., Jimmy Carter), I can’t listen to an audiobook that was narrated by the author.



I applaud the effort, but the voices are too distracting for me to focus for more than a minute.


The first example I clicked on turned out to be a super racist book! Luckily the narration was repetitive (like a record skipping), atonal, and with prosody that chopped up sentences to the point of near unintelligibility.


The site is curiously broken in Safari browser.


Oh man, they have obscure Plato! I can't even spend money to get all of Plato read.


Hi all, we created this work and are happy to answer any questions!


So many works are labeled [] or nan.


Listening to a few of them, I'm amazed at what bad shape text to speech technology is in. After all these years it still sounds robotic.


Search

> kritik der praktischen vernunft gutenberg

and then go to 2:00 and listen until it says "Moral reason" it's a bit creepy :(


Yet another example of how jobs will go away. Goodbye, voice actors.


I am an avid consumer of audiobooks and I will never pay/listen to anything AI-generated. Maybe it's just me, I don't know, but just because they have shown that it is technically feasible, that doesn't mean that there is a market for it. I am skeptical. Listening to audiobooks is already a compromise over reading the book, listening to an AI-generated audiobook sounds to me like a bit too much. But let's see.


Check out audio fiction podcasts. Some do full productions of short stories. Depending on the work, reading the text is a compromise over listening to the reading. For instance, music is extremely important to these two stories by Aliya Whiteley:

https://www.drabblecast.org/2007/12/20/drabblecast-43-jelly-...

Warning: the above has mild language and adult themes.

https://www.drabblecast.org/2010/07/27/drabblecast-173-go-be...


You'll never pay/listen to anything you're able to identify as AI-generated.


Obviously there are lots of short snippets of audio that are machine generated. But, no, at the current state of the art I'm not going to listen to a machine generated audiobook much less pay for it.


It’s not for you, it’s for the millions/billions not currently listening to audiobooks.

This reminds me of the “I’ll never listen to mp3, I love my <whatever>.” The goal wasn’t to convince existing people but to expand to new people.


I would agree but there is one big exception: books I really want to read but there's no audiobook version.

I have a particular interest in early Mormon history and the history of the western US, and there are some really great books that aren't available as audio. I ended up generating some with AWS, and while the voice annoyed me, I was willing to do it, even though the cost was much higher than a normal audiobook would cost.

I think in reality, the more popular books will get a pro reading, but as long as it's labeled, there will be a market for AI audiobooks.


For now, actual voice actors make audiobook listening way more enjoyable. Those AI voices are convincing but lack the soul, emotions and art direction of actual voice acting. Think about listening to a book for 8 hours with a monotonous AI voice reading it...


I mostly listen to real audiobooks now so I understand their appeal, but I have also listened to dozens of books using the primitive TTS built into my old Kindle (keyboard model). TTS was rough at first but I knew it could work, because I know blind people use it, so I stuck with it and I found that after a few hours I no longer perceived the awkwardness and it became an effective and satisfying way for me to 'read' books. Brain plasticity is a marvel.


It’s all voice and no acting


I was quickly put off by the monotone voices. Maybe it’ll improve in a couple of years, but until then I’m sticking to real voice actors. For some books like Greenlights, I reckon you miss out on most of the impact if you don’t hear it in Matthew’s voice.


It seems pretty obvious that, at least at this point, the competition is either people doing this sort of thing as a hobby or (maybe) at race to the bottom wages. (Or not at all--as is largely the case with machine transcription vs. human transcription.)

If I want mediocre text to speech, I have that on my Kindle.


Seems like reasonable backfill for the countless books that will never get audio treatment.


True, and sadly they will go too quickly for voice actors to have any time to adapt.



