Show HN: The HN Recap – AI generated daily HN podcast (buzzsprout.com)
177 points by wondercraft on May 5, 2023 | 88 comments
We've been running The HN Recap for a month to make it easier to consume Hacker News. While this started as a PoC to understand adoption of AI-generated podcasts, we now plan to keep it going, since lots of people are listening to it daily.

Let us know what other content channels you'd like to receive as podcasts and we'll get on it.

Read more about our learnings here → https://wondercraft.ai/blog/learnings-from-1-month-of-ai-pod...




This is really impressive. Is anyone else really freaked out by that? I'm having an uncanny-valley type feeling, because the audio is 99.9% convincing, and the only thing that gives it away is inconsistent, not-always-correct pronunciation. But I'm not picking at the technical details. I'm freaked out by how good it is, and how easily it could pass for some person reading a human-written podcast. I know speech generation has been getting progressively better, but I guess I haven't heard it in a while (compare to the stock TikTok voice, for example.) Coupled with an LLM, this is too close for comfort.


Thanks! Yeah, the quality of this audio was what made us automate this process. With LLMs doing the legwork on the content curation as well, it's fairly straightforward. We built a UI around it at https://app.wondercraft.ai/ if you wanna check it out. The TTS engine used is ElevenLabs, btw.


It's good - but the even crazier thing is that this can be done by a script kiddie in a few hours - not an expert who spends months or years whittling away at some part of the process, as it was several years ago.


You're right that it's a script, but it does have some intricacies that require a lot of testing to get right. Examples:

Using LLMs:

- formulate the right prompts for the intro and outro generation

- pass the content of a post in segments while maintaining history, since doing it in one go would exceed the token limit (a rough chunking sketch is included after this list)

- figure out how to integrate comments properly

- turn the summary into a spoken format, not condensed written prose

Using TTS:

- train the right voice, one that fits the content. Not all voices of a TTS engine have the same characteristics.

- understand the bugs of the TTS engine. For example, ElevenLabs, which we're using (and it's beyond amazing overall, and the team is fantastic), struggles when given "$2.5". It will read it out as "dollar 2 (long pause) 5".

- a few more things

Overall:

- figure out how to connect all of the different segments, music, intros, outros, etc.
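To make the chunking point concrete, here's a minimal sketch of what passing a post in segments with a running summary might look like. This is an illustration only: `call_llm` is a placeholder for whatever LLM API is actually used, the prompt wording is ours, and chunking by characters rather than tokens is a simplification.

    # Hypothetical sketch: summarize a long post in chunks while carrying a running summary,
    # so no single call exceeds the model's context window. `call_llm` is a placeholder.
    def call_llm(prompt: str) -> str:
        """Placeholder: swap in a real chat-completion API call here."""
        raise NotImplementedError

    def summarize_post(text: str, chunk_chars: int = 8000) -> str:
        chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
        running_summary = ""
        for chunk in chunks:
            prompt = (
                "You are writing a spoken podcast script.\n"
                f"Summary so far:\n{running_summary}\n\n"
                f"Next part of the post:\n{chunk}\n\n"
                "Update the summary, keeping it in a natural spoken style."
            )
            running_summary = call_llm(prompt)  # each call sees the history so far
        return running_summary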


It is still child's play, and more importantly, having such a low barrier to entry is scary as hell.


Oh man, these edge cases are frustrating.

I ran into “it’s a 50…………50 chance”, apparently it reads a hyphen as (long pause) too.

I'm bullish on being able to give cues that are performed rather than read aloud, like Bark does. Bark's audio quality isn't quite as polished as ElevenLabs', but it's convincing / uncanny valley in other ways - laughs, throat clears, stutters, pauses.
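For what it's worth, a lot of these quirks can be papered over with a small normalization pass before the text reaches the TTS engine. A rough illustration, where the specific substitutions are examples rather than anything the OP actually uses:

    # Illustrative pre-TTS normalization: rewrite patterns the TTS engine reads oddly.
    import re

    def normalize_for_tts(text: str) -> str:
        # "$2.5" -> "2.5 dollars", so it isn't read as "dollar 2 (pause) 5"
        text = re.sub(r"\$(\d+(?:\.\d+)?)", r"\1 dollars", text)
        # "50-50" -> "50 50", so the hyphen isn't rendered as a long pause
        text = re.sub(r"(\d+)-(\d+)", r"\1 \2", text)
        return text

    print(normalize_for_tts("It's a 50-50 chance that $2.5 is enough."))
    # -> "It's a 50 50 chance that 2.5 dollars is enough."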


For me the intonation of words and the pauses between them seem quite off from natural speech as well.


It sounds like unnatural "podcast speech".


As a non-native speaker, I very much prefer this steadier voice to the unbelievably fake intonation a lot of American podcasters/YouTubers use when reading from a script.

Mark Rober comes to mind as having a particularly unnatural and annoying cadence, intonation and pitch shifting during sentences.


It does. It's irritatingly stilted in exactly the same way as The Daily.


> Music enhances the experience: Simply put, music makes everything better

Actually, some think the exact opposite.

Music is put everywhere "in some territories", but some people flatly refuse to try to focus on content while other stimuli are present. Some people find it distracting and senseless.

(In fact, some of us are considering using ML to remove music from content whose creators bewilderingly decided that you should "be helped to feel" while taking in intellectual material like documentaries.)


If I'm watching something for the emotional impact and experience - i.e. drama - then music is absolutely welcome. If I'm watching it because I want to learn about something and form my own opinions - i.e. documentaries - I can't stand music, as I feel like I'm being emotionally manipulated.

Then there are the absolute best dramas, which don't need music to have an emotional impact, and on the flip side the absolute worst documentaries, which need it to be of any interest.


> Some people find it distracting and senseless.

And some people who prefer music OR no music, depending on the moment and mindset, are ... The same person!


Interesting! I'm working in video and am very interested in this. Is there somewhere you could point me to learn more?


Wonderful!

I have a similar project idea in mind but for a different audience.

Purpose: learning a new language (kid and beginner friendly)

Idea:

- Take a reliable publisher of news stories in the target language (either text or audio).

- Grab top three headlines daily.

- Translate the headlines into English.

- Create audio with the headlines alternating between both languages.

This will help listeners connect current affairs with the words and grammar used to describe them in sentences of the target language, and learn those concepts better. It also raises the likelihood of encountering new words and ideas in a way that doesn't feel forced. (A rough sketch of the pipeline is included below.)
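Roughly sketched, with all helper names as hypothetical placeholders, just to show the shape of the pipeline:

    # Hypothetical sketch of the bilingual-headline pipeline described above.
    # `fetch_top_headlines`, `translate_to_english`, and `synthesize` are placeholders.
    def build_bilingual_script(headlines, translate_to_english):
        """Alternate each target-language headline with its English translation."""
        lines = []
        for headline in headlines[:3]:                 # top three headlines of the day
            lines.append(headline)                     # original (e.g. Kannada)
            lines.append(translate_to_english(headline))
        return "\n".join(lines)

    # script = build_bilingual_script(fetch_top_headlines(), translate_to_english)
    # audio = synthesize(script)   # hand the alternating script to a TTS engine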

I am hoping to find some guidance or sample projects that I can adapt for this use case, either by myself or working along with my kids as a hobby project for the summer.

If anyone can point me to interesting open-source projects like these, or like the OP's, I'd appreciate it.

Here is the news podcast in the target language (Kannada) that gave me this idea -- it is currently published 2-3 times a day: https://www.prajavani.net/podcast


Yeah, a few different people are doing this on the platform already. Give it a go, see if it works for you: https://app.wondercraft.ai/ and ping us if you need help.


I sat in my listening room attempting to find the flaws in the audio and I was left wanting

Having tried this kind of text to speech a few times in the past I’m impressed.

I think having specific subreddits would be a great option.

More importantly though I’d like to be able to just specify my own url list and have it generate the recap of those.

More generally, this feels like the missing puzzle piece in a personalized voice Q&A service.


(same answer to another similar comment) Yeah the hyperpersonalised content is super interesting. For 10' of this audio, compute would cost about $1. So at $1 per daily episode, a realistic price a business would charge for this is $49 a month. Would you pay that? Maybe you could opt in for ads, that would pay that $1 for your attention. After all, the advertiser would know exactly your interests, based on your twitter feed.

An alternative is the semi-personalisation (e.g. HN, or subreddits), where the content might not be hyper personal, but still close enough, and the cost is absorbed by community.


is anyone else weirded out that it simulates breath? like this is a robot, not a real woman, but it pauses to take a breath to mimic human sound.

on the one hand i like it. she sounds like a real podcast host and person with a nice, professional voice.

on the other hand it's weird. like why does it need to do that other than to pretend to be a person.


Probably helps with engagement, as it's more natural.


I agree...I really want to know if it could do the kind of verbal-keepalive feedback you hear in conversational Japanese, between two people. It'd be amazing to hear an AI group podcast like that, just for technical amazement purposes.


Most people don't like listening to overtly artificial voices for anything long form. Breath pauses also give the listener time to chunk things.


if you think about how these models were trained it makes sense. When a human reads an audiobook, they would breathe. So the model has learned how to make the breathing sound.


Yes. That's what I found most impressive. Incredible.


This is so good! I'm amazed it's even possible for this to be done "so easily" (I'm sure it's a lot of work!)

I honestly don't recall being this amazed at technology in quite a while. This is the future

I wonder if you could try different voices for different comments to make it seem like a conversation ;-)

I want this in my pocket to help me navigate the HUGE amounts of information we have nowadays. I don't even care that I don't get the exact subset of that information that I would personally highlight if I were to sift through everything that hits my inbox and screen on any given day - I am happy to outsource that to the model even if it's only 80% accurate.


I have very mixed feelings about this. I produce a podcast, and a large (1hr) episode with multiple interviews, raw audio sources, etc. can involve 6-10 hours of editing work. Seeing it generated at the push of a button is technically impressive (and in line with predictions I've made here about the automatability of such things), but it also demotivates me from doing my own editing labor, as I can see my decades' worth of audio editing skills becoming obsolete and economically unsustainable.


Hey! I understand your point, but I think as with every technology there are two perspectives: 1. You get demotivated by thinking that the machine will replace you. 2. You use all of your experience as a springboard and leverage this technology to accelerate your day to day. We already have 3 podcast studios as customers, and they love it, as they can create a new podcast in no time, when before it took a while. Message us at (team AT wondercraft.ai) if you want to know more.


Thanks for the nice words! The conversation style pods still need a little bit of work... the interaction between the voices is a little bit unnatural.


Listening to the May 4th ep: The (amount of) dead air between segments feels a bit off. Feels related to the timing of the muzak. Maybe get some experienced podcast producer/audio engineer/whatever to design the transitions for you?

Otherwise: wow.


Isn't there a risk of copyright violation, because you use content from third parties that is only linked on Hacker News?

Some people are even against being posted on Hacker News in the first place.

And I don't know if all commenters agree to their comments being used.


Maybe in the EU at least? Google News has to license the content it uses from news sources. IANAL but I can't imagine any commenters here would have any licensing rights to their publicly posted comments.


Does this mean we can now get good-sounding audiobooks of pretty much any book available in digital format in the near future?


Which text to speech platform do you use? Sounds really good.


Going to take a guess it's their own, https://wondercraft.ai/.


In another comment [0], it's mentioned that the TTS engine is ElevenLabs' [1]

[0] https://news.ycombinator.com/item?id=35832886

[1] https://elevenlabs.io/


It's ElevenLabs [1], according to OP's earlier comments.

[1] https://beta.elevenlabs.io/


According to the link's description, it's Wondercraft: https://app.wondercraft.ai/


Elevenlabs! Those guys are amazing!


I’m blown away by the audio quality, well done. It’s nice that it also recaps the comments.

You said the cost is about $2 an episode, is that mostly for the summarization or audio generation?


Unless they are doing something unreasonable, I have to believe it's the audio. ElevenLabs is $0.30/1,000 characters. The summarization cost shouldn't come close.

Excited to see competition to drive the cost down here.

I’m under the impression their margins are crazy.
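A rough back-of-envelope (the words-per-minute and characters-per-word figures are assumptions, not numbers from this thread):

    # Back-of-envelope: why TTS plausibly dominates the per-episode cost.
    # Assumed (not from the thread): ~150 spoken words per minute, ~6 characters per word.
    minutes = 10
    characters = minutes * 150 * 6              # ~9,000 characters of script
    tts_cost = characters / 1000 * 0.30         # ElevenLabs at roughly $0.30 per 1,000 characters
    print(f"~${tts_cost:.2f} for audio")        # ~$2.70, in the ballpark of "$2 an episode"
    # LLM summarization of the same material costs a small fraction of that,
    # consistent with the OP saying audio generation is ~90% of the cost.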


Yes, it's 90% the audio generation. It's about to get a lot cheaper though.


I'd love to have something that could create daily summaries of private Twitter lists that I use to track developments in AI. The audio part would be great to consume during my re-emerging commute to the office. I would imagine this would be a pretty cool thing to have for any information worker as well. The difference is that this would be for an audience of 1.


Yeah the hyperpersonalised content is super interesting. For 10' of this audio, compute would cost about $1. So at $1 per daily episode, a realistic price a business would charge for this is $49 a month. Would you pay that? Maybe you could opt in for ads, that would pay that $1 for your attention. After all, the advertiser would know exactly your interests, based on your twitter feed.

An alternative is the semi-personalisation (e.g. HN, or subreddits), where the content might not be hyper personal, but still close enough, and the cost is absorbed by community.


I think I would love to have something like this generated from data in github pull requests, closed JIRA tickets, and confluence pages. I would very very willingly spend 15 minutes a day listening and learning about progress on different projects across my organization


Reminds me of a very recent neat feature from Slack [1] that among other things summarizes the unread channel messages.

[1] https://slack.com/blog/news/introducing-slack-gpt


Wow I literally posted the same thing a month ago - check out https://radio-hn.pages.dev

I did not use music, but I did orchestrate a multi-host show, based on ChatGPT, ElevenLabs, and a bit of ffmpeg scripting.


There's also this! https://camrobjones.com/hackercast/

It was posted a few days ago and got 0 comments. https://news.ycombinator.com/item?id=35751065


I think people are blown away by the quality of the audio narration of this one, not by the idea or content itself. AWS Polly sounds like the current generation of artificial voice we are used to.


will the podcast break when it covers itself tomorrow?


(* laughs in robot *)


lol


I'm shocked by how realistic it sounds. Can we embed it in our website/blog?


A big fan of this project! I would be really interested in seeing if you can select a different voice or select the length you would like


Thanks! Yeah there's a selection of different voices and you can even clone your own. Length is really up to you, no restriction!


This is really outstanding. Especially the voice - it sounds like a real human (or is it a real human?)

Looking forward to seeing you on _Google Podcasts_ soon.


Just enabled!


Well executed! I had a similar idea in mind, except doing an independent summary of what went on in Parliament. The data is available at https://parliamentlive.tv. Subtitles are available, but the AI would need to remember speaker names and voices to know who said what.


Unless there is a transcript that comes with it? If you have one, try creating a summary from it at https://app.wondercraft.ai/


The AI pronounced `sudo` as "su-dough" instead of "su-doo". Literally unwatchable

:P

This is fantastic work, keep up the great job.


hehe thanks!!


Wow this is amazing! I was going to launch a similar podcast for a completely different audience this Friday. I was going to use a multi-step approach to create the episodes.

Going to delay the idea and try out Wondercraft, this has blown my mind!


I know your focus is on the podcast/audio side of things (and you nailed this, very very cool), but it would be interesting to me to see how you generated the summaries themselves, if that was available, or at least a brief summary of the code, to understand what it took to get to that point.


Hey! Get in contact (team AT wondercraft.ai), happy to chat!


I made something similar, https://odysseysplace.buzzsprout.com/ - there's an episode where an AI interviews an AI too. I thoroughly enjoy the concept of your podcast though - it's found a niche!


That’s one of the standard voices from ElevenLabs right?


Yeah, I just tuned it a bit. The other voice it interviews is custom-made though.


It might be fun to add a few different podcasts for the same news day that present the material in different tones, like "Witty", "Dry", and "Sarcastic". I imagine it would just be adding prompt info for the LLM generating the text.
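A minimal sketch of what that could look like, where the prompt wording is illustrative and `call_llm` is a placeholder rather than the OP's actual pipeline:

    # Hypothetical sketch: render one daily summary in several tones via the prompt.
    # `call_llm` is a placeholder for whatever LLM API generates the script.
    TONES = ["witty", "dry", "sarcastic"]

    def tone_prompt(summary: str, tone: str) -> str:
        return (
            f"Rewrite the following podcast script in a {tone} tone, "
            f"keeping every fact unchanged:\n\n{summary}"
        )

    # variants = {tone: call_llm(tone_prompt(daily_summary, tone)) for tone in TONES}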


Might need to play with a few different voices as well, to match the tone of the language. But yeah totally plausible. Different languages as well, easily done!


The background music was just so distracting I couldn't focus at all on what was being said and had to turn it off almost immediately. Granted I do have ADHD, so YMMV.


such a great idea! I wonder how a podcast with two voices sounds, it would be interesting to hear the AI interaction


Thanks Mario. It gets a bit more challenging when two voices are interacting, to be honest... You can test it out at https://app.wondercraft.ai/ if you want :)


Wondercraft, just FYI: when you OAuth in, the app name shown is some random Firebase app URL. Keep up the good work though!


Yup, thanks! Need to get to that.


This is pretty cool! Thank you for sharing.

What was the technical effort to create such a podcast?


Thanks for the nice words!

The technical work is split into three pieces: 1) LLM prompting and chaining for script generation, 2) workarounds for some TTS bugs (not that many, ElevenLabs is amazing), 3) stitching all the moving pieces together (roughly sketched below).

Nothing advanced technically, but it requires a lot of experimentation, as neither LLMs nor TTS are deterministic, which creates headaches in production.
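For illustration, the stitching step could look something like this with pydub; the file names are hypothetical and this is only a sketch of the general idea, not the actual pipeline:

    # Rough sketch of stitching TTS segments into one episode with pydub (hypothetical file names).
    from pydub import AudioSegment

    intro = AudioSegment.from_file("intro_music.mp3")
    outro = AudioSegment.from_file("outro_music.mp3")
    pause = AudioSegment.silent(duration=600)   # short gap between stories, in ms

    episode = intro
    for path in ["story_1.mp3", "story_2.mp3", "story_3.mp3"]:   # per-story TTS output
        episode += pause + AudioSegment.from_file(path)
    episode += pause + outro

    episode.export("hn_recap_episode.mp3", format="mp3")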


Impressive. How about using a male voice? I think it would fit the character of HN since the vast majority here seem to be men. (Optimally it would be someone who sounds roughly like a tech person, if that makes sense, though I guess there isn't so much choice in voices.)


Read your comment again and ask yourself why the tech sphere is often perceived as sexist


I don't think this is sexist, e.g. even as a man in some predominantly female community I would agree a female voice would be a somewhat better fit.


We have a selection of voices, we just thought that Anna fit this role very well. You can also clone your own voice if you'd like. https://app.wondercraft.ai/


"Hi. One hardcore stereotype please."

"Uh. Generally even tech insiders don't say this kind of thing anymore...it makes life even harder for..."

"Male majority though, oh and give them a tech voice too"

"Hold on, we may actually wish to study your brain, to see how it responds to stimuli from this century as compared to the last"


I feel sorry for the mindset which would prompt such a response.


Tech people can’t be women?


The intro music has a circular feel to it. Who can tell me about this? Why is it used when it's used?


impressive but kind of an onslaught of words


I love it, thanks for the 2x voice speed-up.


Thank you!


really good stuff!


Thanks!!



