Right now, I split the text into 4,000-character chunks (the OpenAI TTS input limit) and convert them to audio on demand.
About 1-2 minutes before the current chunk's audio ends, I start generating the next one, so the transition is seamless.
Generating one chunk takes about 30-40 seconds (20-30 s via the OpenAI API, ~40 s via the Azure OpenAI API).
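The chunking step described above could be sketched roughly like this (a minimal sketch, assuming a 4,000-character limit and preferring sentence boundaries so the TTS voice doesn't cut off mid-sentence; `chunk_text` is a hypothetical helper, not part of any OpenAI SDK):

```python
import re

def chunk_text(text: str, limit: int = 4000) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    breaking at sentence boundaries where possible (hypothetical
    helper; the 4,000-char limit comes from the OpenAI TTS API)."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for s in sentences:
        if len(current) + len(s) + 1 <= limit:
            # sentence still fits: append it to the current chunk
            current = f"{current} {s}".strip()
        else:
            if current:
                chunks.append(current)
            # a single sentence longer than the limit gets hard-split
            while len(s) > limit:
                chunks.append(s[:limit])
                s = s[limit:]
            current = s
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence boundaries (rather than hard 4,000-char cuts) also tends to make the prefetch handoff between chunks less audible.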
I was planning to convert the whole book up front (just by queuing and parallelizing the requests) and concatenate the result into a single MP3 (or one MP3 per chapter), but that isn't ready yet.
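The whole-book plan (queue, parallelize, concatenate) could look something like this minimal sketch — `tts_fn` is a placeholder for the actual TTS call (e.g. a wrapper around OpenAI's speech endpoint), and naive byte-concatenation of same-codec MP3 streams generally plays back fine, though re-muxing with ffmpeg would be cleaner:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def synthesize_book(chunks: list[str],
                    tts_fn: Callable[[str], bytes],
                    max_workers: int = 4) -> bytes:
    """Convert all chunks in parallel and concatenate the MP3 bytes.
    `tts_fn` is a hypothetical callable: text chunk in, MP3 bytes out."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order even though requests run in parallel
        parts = list(pool.map(tts_fn, chunks))
    return b"".join(parts)
```

`max_workers` would need tuning against the API's rate limits; with ~30-40 s per chunk, even modest parallelism shortens a whole-book conversion considerably.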