
Hi everyone, cofounder of play.ht (the startup behind this podcast) here. Let me know if you have any questions.

To give more context, the podcast was totally AI generated. The content itself was generated by a GPT3 model finetuned on Steve Jobs' biography, and the voices were cloned from a few hours of audio of both Joe and Steve, even though it was tough to get good content for Steve Jobs. The podcast artwork was generated by SD.

We will be releasing more episodes soon which will be even more mind-blowing!




Please take the naysayers with a grain of salt; this is a fantastic demo of what's possible. While the voices aren't 100% convincing on good speakers, if this were over the phone it would be indistinguishable. The content of the podcast is also spot on, with Joe's sort of rambling and Steve always acting like he's making a point or telling a story.

Great work.


Honestly I expected Joe to ramble a lot more, but I understand this demo is meant for people to listen to Jobs, so they probably had to cut out a lot of Joe's ramblings.


Do you have Joe Rogan's permission to use his likeness?

Do you have Jobs'?


This probably falls under the same umbrella as remixing samples into a song or parodying a public figure.


Wow, I was sure, from listening to the first few minutes, that the script was written by a human trying to be funny. The part about the NeXT Computer and the three applications was just too funny. Edit: Not to mention the reference to the movie Ghost.


How do you intend to monetize your startup? If you're planning on advertising, do you expect to pay the people you are cloning?


We will never use any cloned voice in any commercial way without consent and compensation. We only wanted to show the community what is possible and what generative AI models can do.


Aren't you using the cloned voices to generate marketing material for your company to make profit? Indirectly profiting without consent or compensation seems like a difficult ethical line. It also seems like something you want to have a very good stance on when a law could be passed that completely shuts down your ability to operate.


I know the reaction here is mixed and tbh that's what makes HN so interesting for me. But FWIW I love this podcast! It's a great demonstration of what AI can do. I am going to share it with my students before the next class so that they see what can be possible.


Thank you :) that is the main point of it, to show people what is possible and inspire them to create.


Have you reached out to Joe Rogan's team?

It seriously seems like something he would be interested in putting on his show feed and accompanying with a real interview about the state of AI.


How much human editing was done on the output from GPT3?


>...the content itself was generated from a finetuned GPT3 on SteveJobs' biography

Was the actual dialog generated by GPT-3, or just Jobs' responses? If the former, was any of the dialog human edited/spliced together or was the entire script generated as a single output?


No, it was all GPT3 generated. Check the other comment about how we prompted GPT3 to start the conversation.


Does this have a textual transcription? Please publish it!!

I found it so interesting when he was talking about how glad he was that his wife was born, and the purpose of life and such.


Can you elaborate more on what the prompts for the text content were?


This was the prompt: " Podcast.AI Great people, great interviews with our host Joe Rogan. Episode 1 - Steve Jobs Summary: " Then GPT3 generated the summary, then we added: " Transcript: Joe: " That is all.
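
For anyone trying to picture the flow, here is a rough sketch of the two-step prompting described above. It is only an illustration under assumptions: `complete` is a hypothetical stand-in for whatever call sends a prompt to the finetuned model and returns its text, the names are made up for this example, and feeding the generated summary back in before the transcript suffix is one reading of "then we added", not a confirmed detail.

```python
from typing import Callable, Dict

# Prompt text as quoted in the comment above (original line breaks unknown).
SUMMARY_PROMPT = (
    "Podcast.AI Great people, great interviews with our host Joe Rogan. "
    "Episode 1 - Steve Jobs Summary: "
)
TRANSCRIPT_SUFFIX = " Transcript: Joe: "


def generate_episode(complete: Callable[[str], str]) -> Dict[str, str]:
    """Run the two-step flow with a caller-supplied completion function.

    `complete` is hypothetical: it takes a prompt and returns generated text.
    """
    # Step 1: the model writes the episode summary.
    summary = complete(SUMMARY_PROMPT)
    # Step 2 (assumed): header + generated summary + transcript suffix
    # becomes the prompt for the dialog itself.
    transcript_prompt = SUMMARY_PROMPT + summary + TRANSCRIPT_SUFFIX
    transcript = complete(transcript_prompt)
    return {
        "summary": summary,
        "transcript_prompt": transcript_prompt,
        "transcript": transcript,
    }


def fake_complete(prompt: str) -> str:
    """Stub model so the sketch runs without any API access."""
    if prompt.endswith("Joe: "):
        return "[generated transcript]"
    return "[generated summary]"


if __name__ == "__main__":
    episode = generate_episode(fake_complete)
    print(episode["transcript_prompt"])
```

Swapping `fake_complete` for a real model call is the only change needed to try it; nothing here reflects how the actual pipeline was implemented.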


I still have a hard time believing that the opening of the script was auto-generated. Did GPT-3 really generate the part about the movie Ghost? If so, it has a surprising amount of understanding of what it's generating.


It's still unclear to me. Did you then feed the entire above blurb into GPT3 on a continuous basis to get the actual interview? To be more specific:

In: "Podcast.AI Great people, great interviews with our host Joe Rogan. Episode 1 - Steve Jobs Summary:"

Out: [the summary]

In: [all the above] + " Transcript: Joe: "

Out: the entire podcast?


Is your TTS model a decoder-only language model using discrete audio features inspired by the likes of Tortoise-TTS? The results are impressive.


Great work! How did you fine-tune GPT3? Did you convert the bio into prompt/completion format?


Is the audio entirely generated from scratch along with their voices or was it cut and paste from existing audio?


Everything, both the content and the voices, is generated by AI. Check my other comments for how we did that.


Absolutely incredible! I've shared this with everyone I know and they are all gobsmacked!


How much audio do you need to build a model?


The original model (https://play.ht/blog/introducing-truly-realistic-text-to-spe...) was trained on 50k hours of audio. The above voices were just finetuned on top of that model, with only 4-6 hours each.

We just finetuned another voice recently with only 1 hr though... I think eventually (soon) we will only need 15-20 mins with zero-shot, not even finetuning.


Just wanted to say this is amazing!


heyy felfel, Mustafa here, nice work man! so proud to see you on the orange website.



