I’m consuming 5500 hours of Joe Rogan with the help of AI (medium.com/steamship)
26 points by EniasCailliau 6 days ago | hide | past | favorite | 37 comments

Would love to have a service that lets me search through all the podcasts I'm consuming. There are so many times when I have some anchor knowledge but can't remember where I last heard it, and can't find it again.

You may be interested in Andrej Karpathy's experiment with OpenAI's latest STT Whisper model. Relevant tweet here: https://twitter.com/karpathy/status/1574474950416617472

By the way, if you are looking for a clean podcast consumption experience, do give a try to https://jkstream.com. Easy way to subscribe to your favorite podcast interviewers and guests.

When that happens, how can you be sure that it is something you heard on a podcast? Maybe it was something you heard in a YouTube video. Or read in a Medium post. What I’m saying is, wouldn’t a real solution to the problem you’re describing require access to all forms of verbal media that you are consuming?

That's probably true. I guess we have to start somewhere. I do agree with thimm: I often reference a podcast but forget which one. Being able to query the exact time a topic is discussed in a podcast does sound valuable.

Do you know of a solution that aims to solve this problem?

If it's something read in a post on the internet, then it's usually easy enough to find by typing the parts you remember into Google. But I agree, I have occasionally wished that YouTube made the closed captions of videos searchable for this reason.

Time for YT to up their caption game.

Steamship is happy to support a company looking to develop this!

Using AI is probably the only way I would consume Joe Rogan podcasts. But do you worry that this will contribute to our polarizing political landscape by taking things out of context? Do we trust AI to have deep understanding of the nuances of conversation and debate?

[Steamshipper here]

We debated this a lot internally -- specifically whether we should pick a less charged initial dataset to experiment with.

In the end, one of the reasons we decided to run with it was that we felt the combination of controversy and listenership actually made it more in need of computer-assisted search.

There aren't a lot of potentially hazardous situations that could benefit from a deep analysis of Fresh Air, but there are quite a few situations related to Rogan's show that probably could have been engaged with more effectively if there was better access to the underlying data.

The hope is that easier search into "original/source data" will ultimately act as a net positive societally. E.g., the best way to show the Rogan show was behaving irresponsibly during the pandemic is to make it really easy to get the receipts. But more generally, so too on either side of any debate.

Totally agree on the "AI and nuance" problem. I think this is going to be a perpetual (and good, and necessary) question that needs a lot of attention.

That makes sense! And as I thought about it, this is an excellent example of where AI can enhance but not replace human decision making. It can make it easier to search through his podcasts, but still require human input to make the correct interpretation.

Hello, bias. Have you listened to any full episodes? He's generally a pretty great interviewer. There's a reason his audience is bigger than most other media outlets. There are a lot of episodes I'm not interested in but some are very interesting and informative.

So now if someone doesn't like something you like, they're the biased one?

Cool project! I can't verify your results, because I don't listen to Joe Rogan, but your setup seems sound.

I'd never heard of Steamship before. You used it to analyze audio and create summaries; is that its sole purpose? We just started research on text classification for extremely small datasets, and I was wondering if I should add it to the list.

Me neither. Not planning on spending a year listening to just 1 podcast. :)

Re: quality - Entity extraction is super reliable given proper transcriptions. Summaries are having a hard time though. Some of them give random names to the guests. I saw Elon Musk getting called "Francis" before.

Re: Steamship - We're building a developer SDK for language packages. We're a great solution if you need stateful language AI (you can search using language AI features) or if you want an easy interface for tagging documents.

We've done text classification before in our ticket tagger. Here's a blog that explains how we did it: https://medium.com/steamship/bootstrapping-classification-wi...
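Since small-dataset classification came up: one generic way to bootstrap a classifier from a handful of labeled examples is nearest-centroid matching over embedding vectors. This is a hedged illustration of that general technique, not necessarily how the Steamship ticket tagger works; the vectors here are placeholders standing in for real sentence embeddings.

```python
import numpy as np

def nearest_centroid_classify(train_vecs, train_labels, query_vec):
    """Assign query_vec the label whose class centroid is closest by cosine similarity.

    train_vecs: list of embedding vectors (e.g., from a sentence encoder)
    train_labels: parallel list of string labels
    """
    labels = sorted(set(train_labels))
    # Average each class's example vectors into one centroid per label.
    centroids = np.stack([
        np.mean([v for v, l in zip(train_vecs, train_labels) if l == lab], axis=0)
        for lab in labels
    ])
    # Normalize so the dot product equals cosine similarity.
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    return labels[int(np.argmax(centroids @ q))]
```

With only a few examples per class this tends to be far more robust than training a model from scratch, because all the generalization comes from the pretrained embedding space.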

Great, I'll add it to our list and try it out in the future.

Thank you for sharing this! One thing I noticed is that, to benefit from the natural language processing tooling, you had to first transcribe the audio. Is there some mechanism for avoiding that pre-processing step with the help of speech recognition models (e.g., OpenAI's new Whisper model)?

Steamship works a bit like jQuery plugins in their heyday. Except instead of manipulating a web page like jQuery, it orchestrates remote NLP workflows over your data. We haven't released our SDK yet, but we're working to make a bunch of awesome reference plugins to let folks mix and match different models out there.

Whisper will definitely be in the mix!
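For the curious, transcribing an episode locally with the open-source openai-whisper package looks roughly like this. The file name is a placeholder, and the timestamp helper is just a small convenience for readable output:

```python
def format_timestamp(seconds: float) -> str:
    """Render a Whisper segment offset (in seconds) as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{(s % 3600) // 60:02d}:{s % 60:02d}"

if __name__ == "__main__":
    import whisper  # pip install openai-whisper

    model = whisper.load_model("base")          # larger models trade speed for accuracy
    result = model.transcribe("episode.mp3")    # placeholder file name
    # Whisper returns timestamped segments, which is exactly what you
    # need to jump to the moment a topic is discussed.
    for seg in result["segments"]:
        print(f'[{format_timestamp(seg["start"])}] {seg["text"].strip()}')
```

The per-segment timestamps are the key feature for this use case: they make the transcript indexable down to the moment a topic comes up.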

This is equal parts hilarious and great. I got an especially good laugh out of:

> “Enias, you work at an NLP startup. Just have the computer tell you what this guy thinks”.

Pretty sure that's how we end up with Skynet /s

Do you have to have the transcriptions available or can this work on any audio?

Thanks! Happy you liked it! Theoretically works with any audio/video stream.

Interesting take. I also suffer from FOMO about podcasts and tv series. Especially with long-running podcasts, I never even start listening to any of them as I would need to start from episode 1. In the case of Joe Rogan, that's virtually impossible.

Steamship seems like a handy framework for some prototyping. Are there any similar tools that can do similar things?

[Disclosure: I'm a Steamshipper]

The similar thing this project has got me really wanting is the ability to find snippets of a topic across all the podcast archives I like. Sort of the podcast equivalent of falling into a Wikipedia hole and learning all about a topic from different angles.

Re: other tools -- We're a developer platform, so we're offering tooling from that angle: packages you can drop into a platform and just start using. (In this case, audio search). What's nice about the way it works is you can swap out components: use any transcription engine, any set of models, etc -- and then query across the results.

Some of the transcription-specific API companies (like Assembly) are starting to build in search capabilities, which will also be useful depending on workload and whether you want to add your models or endpoints to the mix.

I know. And it seems like every influencer out there is starting their own podcast these days. Information overload.

Tv series is a good idea! What insights would you extract from them?

I can already see myself analysing the mood in Shark Tank pitches. I wonder if you could create a model to analyze all the pitches on Shark Tank and then come up with its own. That would be cool!

There is no reason to start at episode 1. His show is not a story nor does it build much on the past. Just search through the backlog for guests you are interested in.

For example, I skip most episodes with comedians, entertainers, etc., and mostly listen to creators, doers, and experts in a field, as I find those the most interesting.

Good idea!

What point is this supposed to be making about the product? I read the article and I see no insights of value produced; it doesn't really seem to be a good advert for this technology.

This is not a judgement on the value of Joe Rogan content. But if the goal was to extract useful, interesting, or accurate insights from that content, it seems to be a failure.

Sorry you feel that way. Have you tried out the demo? It showcases how you can use our tech stack to index across the Joe Rogan podcasts.

For us though it was a nice exercise to show we can support audio transcription and large data files.

Curious: what type of database are you using for storing the language AI features? I would guess SQL since it's structured, but I could be wrong. Are you planning to support embeddings too? That would probably help make the search semantic instead of keyword-based.

We've built a hybrid database that federates workloads & queries atop both a relational & vector component. Right now we've got fairly limited interaction between those two halves of the brain, but it's one of the things that gets me most excited when thinking about potential.

There's a lot of great work going on in this space too, by the way: everything from Postgres plugins for vectors, to vector databases adding relations, to things in between.

It's going to be an awesome few years to watch. There's a real pull toward finding a way to comfortably merge the two: it enables some pretty incredible queries.

We have a database engine layer that can wrap around multiple databases. Today though we're mainly using SQL, since it's so fast with the right indexing. Soon we're planning to support search using embeddings too; we'll probably use an optimised vector store for that, like FAISS or NGT.
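To make the relational-plus-vector idea concrete, here is a toy sketch of the hybrid query pattern: filter rows relationally first (SQLite here), then rank the survivors by embedding similarity. All the data and vectors below are made up, and plain NumPy stands in for a real vector store like FAISS:

```python
import sqlite3
import numpy as np

def cosine_rank(query_vec, matrix):
    """Return row indices of `matrix` sorted by cosine similarity to `query_vec`."""
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    return np.argsort(-(m @ q))

# Toy corpus: relational metadata in SQLite, embeddings in a NumPy array
# (in production the array would live in a vector index such as FAISS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segment (id INTEGER PRIMARY KEY, guest TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO segment VALUES (?, ?, ?)",
    [(0, "Guest A", "talks about rockets"),
     (1, "Guest B", "talks about nutrition"),
     (2, "Guest A", "talks about AI safety")],
)
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # placeholder vectors

def search(query_vec, guest=None, k=2):
    """Relational filter first, then vector ranking over the surviving rows."""
    if guest:
        rows = conn.execute(
            "SELECT id, text FROM segment WHERE guest = ?", (guest,)).fetchall()
    else:
        rows = conn.execute("SELECT id, text FROM segment").fetchall()
    ids = [r[0] for r in rows]
    order = cosine_rank(query_vec, embeddings[ids])
    return [rows[i] for i in order[:k]]
```

The interesting design question, as noted above, is how much the two halves interact: pushing the relational predicate down before the vector scan (as here) is simple, but richer interleavings enable much more expressive queries.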

Well, I don't see anyone commenting so far who has listened to much JRE, but I've probably listened to 200 episodes or so. While the tone / style / type of humor of the provided results wasn't far off, the actual descriptions, prescriptions, and content were just all wrong. Those aren't actual summaries or representations of anything I've seen the quoted figures say. So I guess you made an effective fake-news generator / textual deepfake that creates falsely attributable statements and quotes which sound reasonable at a surface level, but don't actually represent the real views of the person being modeled.

Cool, I guess, but also really dangerous if people start linguistically shadowboxing fake versions of their ideological opponents who don't even share the same views as the real opponent.

I imagine a world where Jordan Peterson fanboys have arguments with a virtualized Fauci (that makes stylistically accurate but factually incorrect defenses of the vaccine).

Neat experiment, but I'd really urge care in how you describe your results.

Appreciate the comment. Are you referring to the summaries at the bottom of the blog post? Happy to hear more (here or on enias@steamship.com) — if we drew an inaccurate excerpt we should fix it.

If the goal of the tldr tool was to extract a thesis statement or a broad summary of the episode's content, it doesn't seem to do that; rather, it produces a random selection of phrases or sentences taken out of context and stitched together.

And the excerpts in the blog post don't really fit my mental model of what the guests would say, or aren't complete enough to give any real understanding.

But the audio transcription, searchability, and sentiment analysis stuff is neat. I just don't think your summarization feature is any good.

Appreciate the feedback. Whether or not a model's summary represents the underlying text is absolutely something we all have to pay careful attention to as these things improve. One thing we've seen work well is systems that always link back to the source, so you can see the content that underlay the synthesis. Not a full solution, but it's important to keep that provenance.

I'll update the UI so readers can listen to each chapter themselves, and add a disclaimer that these summaries are AI-generated and may lack context or be false.

Plug here from Steamship: if you're working on a project with audio/video and want some form of search, analysis, or triggering, we would really love to hear how you're thinking about it.

just use whisper...

We will!

Part of what we're doing is building a platform that captures the broader lifecycle of tasks beyond "inference alone" -- things like data import, index building & maintenance, drift detection, corpus query.


