By the way, if you are looking for a clean podcast consumption experience, give https://jkstream.com a try. It's an easy way to subscribe to your favorite podcast interviewers and guests.
Do you know of a solution that aims to solve this problem?
We debated this a lot internally -- specifically whether we should pick a less charged initial dataset to experiment with.
In the end, one of the reasons we decided to run with it was that we felt the combination of controversy and listenership actually made it a better candidate for computer-assisted search.
There aren't many potentially hazardous situations that would benefit from a deep analysis of Fresh Air, but there are quite a few situations related to Rogan's show that probably could have been engaged with more effectively had there been better access to the underlying data.
The hope is that easier search into "original/source data" will ultimately be a net positive for society. E.g.: the best way to show the Rogan show was behaving irresponsibly during the pandemic is to make it really easy to get the receipts. And the same goes, more generally, for either side of any debate.
Totally agree on the "AI and nuance" problem. I think this is going to be a perpetual (and good, and necessary) question that needs a lot of attention.
I'd never heard of Steamship before. You used it to analyse audio and create summaries; is that its sole purpose? We just started researching text classification for extremely small datasets, and I was wondering if I should add it to the list.
Re: quality - Entity extraction is super reliable given proper transcriptions. Summaries struggle, though: some of them give random names to the guests. I've seen Elon Musk get called "Francis" before.
Re: Steamship - We're building a developer SDK for language packages. We're a great fit if you need stateful processing (you can search using language AI features) or if you want an easy interface to tag documents.
We've done text classification before in our ticket tagger. Here's a blog that explains how we did it: https://medium.com/steamship/bootstrapping-classification-wi...
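The linked post covers the actual approach used in the ticket tagger. As a generic illustration of classifying with only a handful of labeled examples per class (a stdlib-only sketch with toy data, not Steamship's implementation), nearest-centroid classification over bag-of-words vectors is a common bootstrap:

```python
from collections import Counter
import math

def vectorize(text):
    # Simple bag-of-words term frequencies.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(texts):
    # Average the term-frequency vectors of a handful of labeled examples.
    total = Counter()
    for t in texts:
        total.update(vectorize(t))
    return Counter({t: c / len(texts) for t, c in total.items()})

def classify(text, centroids):
    # Assign the label whose centroid is most similar to the input.
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Tiny "dataset": two labels, three examples each.
centroids = {
    "bug": centroid(["app crashes on login", "crash when saving file", "error on startup"]),
    "feature": centroid(["please add dark mode", "support export to csv", "add keyboard shortcuts"]),
}
print(classify("the app crashes whenever I log in", centroids))  # → bug
```

Swapping the bag-of-words vectors for embeddings from a pretrained model is the usual next step when vocabulary overlap gets too sparse.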
Except instead of manipulating a web page like jQuery, it orchestrates remote NLP workflows over your data. We haven't released our SDK yet, but we're working on a bunch of awesome reference plugins that let folks mix and match the different models out there.
Whisper will definitely be in the mix!
> “Enias, you work at an NLP startup. Just have the computer tell you what this guy thinks”.
Pretty sure that's how we end up with Skynet /s
Do you have to have the transcriptions available or can this work on any audio?
Steamship seems like a handy framework for prototyping. Are there other tools that do similar things?
One thing this project has got me really wanting is the ability to find snippets on a topic across all the podcast archives I like. Sort of the podcast equivalent of falling into a Wikipedia hole and learning all about a topic from different angles.
Re: other tools -- We're a developer platform, so we're offering tooling from that angle: packages you can drop into a platform and just start using. (In this case, audio search). What's nice about the way it works is you can swap out components: use any transcription engine, any set of models, etc -- and then query across the results.
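To illustrate the "swap components, query across the results" pattern described above (a generic sketch, not Steamship's API): output from any transcription engine can be normalized into timestamped segments and indexed for search.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    episode: str
    start: float  # seconds into the audio
    text: str

def build_index(segments):
    # Inverted index: token -> list of segments mentioning it.
    index = {}
    for seg in segments:
        for token in set(seg.text.lower().split()):
            index.setdefault(token, []).append(seg)
    return index

def search(index, query):
    # Return segments containing every query token.
    tokens = query.lower().split()
    if not tokens:
        return []
    results = index.get(tokens[0], [])
    for t in tokens[1:]:
        hits = index.get(t, [])
        results = [seg for seg in results if seg in hits]
    return results

# Toy segments, as if emitted by any transcription engine.
segments = [
    Segment("ep101", 12.5, "today we talk about vaccine research"),
    Segment("ep101", 340.0, "back to sports news"),
    Segment("ep207", 88.2, "new vaccine trial results came out"),
]
index = build_index(segments)
for seg in search(index, "vaccine"):
    print(seg.episode, seg.start)
```

Because the index only sees `Segment` objects, the transcription engine (Whisper, Assembly, anything else) is a pluggable upstream component.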
Some of the transcription-specific API companies (like Assembly) are starting to build in search capabilities, which will also be useful depending on workload and whether you want to add your models or endpoints to the mix.
TV series is a good idea! What insights would you extract from them?
I can already see myself analysing the mood in Shark Tank pitches. I wonder if you could build a model that analyzes all the pitches on Shark Tank and then comes up with its own. That would be cool!
For example, I skip most episodes with comedians, entertainers, etc., and mostly listen to creators, doers, and experts in a field, as I find those the most interesting.
This is not a judgement on the value of Joe Rogan content. If the goal was to extract useful, interesting or accurate insights from that content it seems to be a failure.
For us though it was a nice exercise to show we can support audio transcription and large data files.
There's a lot of great work going on in this space too, by the way: everything from Postgres plugins for vectors, to vector databases adding relations, to things in between.
It's going to be an awesome few years to watch. There's a real pull toward finding a way to comfortably merge the two, and it enables some pretty incredible queries.
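The kind of merged query being hinted at here, relational filters plus vector similarity, can be sketched in plain Python (the rows and embeddings are made up; systems like pgvector push this same shape of query into the database):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical rows: relational metadata alongside an embedding vector.
rows = [
    {"show": "rogan", "guest": "A", "vec": [0.9, 0.1, 0.0]},
    {"show": "rogan", "guest": "B", "vec": [0.1, 0.9, 0.0]},
    {"show": "fresh_air", "guest": "C", "vec": [0.95, 0.05, 0.0]},
]

def query(rows, show, target, k=1):
    # Relational filter first, then rank the survivors by vector similarity.
    filtered = [r for r in rows if r["show"] == show]
    return sorted(filtered, key=lambda r: cosine(r["vec"], target), reverse=True)[:k]

top = query(rows, "rogan", [1.0, 0.0, 0.0])
# The fresh_air row is the most similar overall, but the relational
# filter excludes it before ranking.
print(top[0]["guest"])  # → A
```

The interesting design question the comment points at is exactly this interleaving: whether to filter then rank, rank then filter, or let a planner decide.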
Cool I guess but also really dangerous if people start linguistically shadowboxing fake versions of their ideological opponents that don't even share the same views as the real opponent.
I imagine a world where Jordan Peterson fanboys have arguments with a virtualized Fauci (that makes stylistically accurate but factually incorrect defenses of the vaccine).
Neat experiment, but I'd really urge care in how you describe your results.
And the excerpts in the blog post don't really fit my mental model of what the guests would say, or aren't complete enough to give any real understanding.
But the audio transcription, searchability, and sentiment analysis stuff is neat. I just don't think your summarization feature is any good.
I'll update the UI so the reader can listen to each chapter by themselves and add a disclaimer that these summaries are AI generated and may lack context or be false.
Part of what we're doing is building a platform that captures the broader lifecycle of tasks beyond "inference alone" -- things like data import, index building & maintenance, drift detection, corpus query.
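As a toy illustration of one of those lifecycle tasks, drift detection (hypothetical helpers, not the platform's actual API): compare the token distribution of a new batch of transcripts against a baseline using Jensen-Shannon divergence, and flag large shifts.

```python
from collections import Counter
import math

def token_dist(texts):
    # Normalized token frequency distribution over a batch of texts.
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def js_divergence(p, q):
    # Jensen-Shannon divergence in bits: 0 = identical, 1 = disjoint.
    vocab = set(p) | set(q)
    m = {t: (p.get(t, 0) + q.get(t, 0)) / 2 for t in vocab}
    def kl(a):
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

# Hypothetical corpora: the baseline vs. a topically shifted new batch.
baseline = token_dist(["great interview about science", "science and health topics"])
new_batch = token_dist(["crypto prices going up", "crypto and markets"])

print(js_divergence(baseline, baseline))  # → 0.0
drifted = js_divergence(baseline, new_batch) > 0.3  # threshold is arbitrary
```

In practice you would compute this over embeddings or model confidence scores rather than raw tokens, but the monitoring loop has the same shape: baseline, new batch, distance, threshold.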