
Show HN: Full Text Search on Podcasts - nmiodice
https://atshpthmkhc-app.azurewebsites.net/
======
wenbin
If you just want meta data search (with 150k transcripts search), try
[https://www.listennotes.com/](https://www.listennotes.com/)

And [https://www.listennotes.com/api](https://www.listennotes.com/api)

~~~
gingerjoos
Hi wenbin,

Listennotes is one of my favourite single-person companies. Congratulations on
a wonderful product! I've been really impressed that you were able to build
such a good and stable product; I suppose some of that comes from using
"boring technology" ([https://www.listennotes.com/blog/the-boring-technology-behin...](https://www.listennotes.com/blog/the-boring-technology-behind-a-one-person-23/))

What made you add the playlist feature? It seems more b2c than some of your
other features. Is it gaining traction? Do you think at some point you would
make your own podcast app?

Thanks! Anirudh

------
krat0sprakhar
Great idea but I think you should expose more search options to make this
useful. Options like post, duration, etc would make this very very useful!
Something like:

    podcast:thedaily q:coronavirus testing

I can't count the number of times I've remembered which podcast I heard
something on but couldn't recall the episode.
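
A minimal sketch of how a query in that style might be split into filters and free text. The `podcast` and `duration` keys, the `q:` prefix handling, and the `parse_query` helper are all assumptions made for illustration; the site does not document any such syntax.

```python
# Hypothetical filter keys; nothing here reflects the site's real query syntax.
FILTER_KEYS = {"podcast", "duration"}


def parse_query(raw):
    """Split a raw query into (filters, free_text).

    Tokens shaped like `key:value` with a known key become filters.
    A `q:` prefix is stripped and its value joins the free-text terms.
    Every other token is treated as a free-text term.
    """
    filters = {}
    terms = []
    for token in raw.split():
        key, sep, value = token.partition(":")
        if sep and value and key in FILTER_KEYS:
            filters[key] = value
        elif sep and key == "q":
            if value:
                terms.append(value)
        else:
            terms.append(token)
    return filters, " ".join(terms)


filters, text = parse_query("podcast:thedaily q:coronavirus testing")
print(filters, text)
```

The filters could then narrow the set of episodes before the free text is handed to the full text search.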

~~~
andrewmatte
Cool idea for an add-on!

I've noticed that I feel better about giving feedback to uncompensated
developers using "this is great and" instead of "this is great but."

It could make or break someone's day.

~~~
krat0sprakhar
That's great feedback - thank you. I will keep that in mind for the future :)

------
nmiodice
I made a tool that enables you to run full text search against audio content
and explore the results using an embedded media player.

As of now (mostly for cost reasons) I have ingested a limited set of podcasts.

Please let me know what you think!

Note: It is not yet mobile optimized!

~~~
nexuist
Are there plans to allow users to pick their own podcast episodes?

~~~
pdwittig
Very strong second on this! Use case for me: Like most ppl, I often listen to
podcasts while doing some other primary task (cooking, driving, etc.), and am
unable to "note" the interesting snippets. When I go back to find those
snippets, I am often unsure of which exact episode I heard it on, and if I do
remember that, using the audio scrubber to find it is still a disaster. Would
love to give it a try when you roll this out.

------
tiew9Vii
I had a similar idea wanting to play with the AWS / Google speech to text
services.

I wanted to pipe in audio of various Youtube tech conference videos then apply
some basic taxonomy/tagging and provide full text search so you can find a
conference talk which contains some specific technology/subject you want to
view.

I ran into difficulty because technology / software conferences use very
specific acronyms and words that are not very general. Also, being
international, there are many accents and levels of English. This meant the
AWS/Google APIs struggled to transcribe the videos, which was made harder by
using the compressed audio streams you get from Youtube rather than WAVs.

~~~
lowdose
Google offers the functionality to add your own acronyms and products on the
commercial speech to text. I think there is even a manual quality feedback
loop in alpha.

------
crawdog
Very interesting. With the speech to text APIs out there, it would be
interesting to further expand to point in time queries of the articles,
similar to
[https://podcastsearch.david-smith.org](https://podcastsearch.david-smith.org).
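
One way point in time queries could work, as a rough sketch: index each transcript segment by its start time so that a hit can jump straight to that moment in the audio. The segment data, `build_index`, and `find_moments` are hypothetical, and a real speech to text service would supply the timestamps.

```python
from collections import defaultdict

# Hypothetical transcript segments: (episode_id, start_seconds, text).
SEGMENTS = [
    ("ep1", 0.0, "welcome to the show"),
    ("ep1", 42.5, "today we talk about testing"),
    ("ep2", 10.0, "testing at scale is hard"),
]


def build_index(segments):
    """Map each lowercased word to the (episode_id, start_seconds) pairs containing it."""
    index = defaultdict(list)
    for episode_id, start, text in segments:
        for word in set(text.lower().split()):
            index[word].append((episode_id, start))
    return index


def find_moments(index, word):
    """Return the moments where `word` is spoken, sorted by episode and then time."""
    return sorted(index.get(word.lower(), []))


index = build_index(SEGMENTS)
print(find_moments(index, "testing"))
```

An embedded player could then seek to the returned offset instead of starting the episode from the beginning.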

Facets would be a nice to have - such as:

Series, Category, Guest, Date

Extra credit:

Speaker diarization and be able to search by individual speakers! If multiple
channel feeds were available for the podcasts this would be easier to do...
Maybe a search engine for podcasts where you partner with the content creators
and give them incentives to tag/provide better feeds?

edit formatting.

------
bryanrasmussen
It looks like you're wrapping some sort of api from hubhopper, so I guess
there is not much you can do about setting up the search engine, but as a
general rule you want to give extra weight to specific fields to end up with
something useful.

For example, when I searched for coronavirus, the first hit, some sort of
mountain bike podcast, ranked higher than a number of podcasts that had
coronavirus in the title.
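
To illustrate the field weighting idea, here is a toy scorer in which a title match counts for more than a transcript match. The weights, field names, and the `score` and `rank` helpers are assumptions for the example; a production engine would normally combine per-field boosts with a proper relevance model rather than raw term counts.

```python
# Hypothetical weights: a title match counts five times as much as a transcript match.
FIELD_WEIGHTS = {"title": 5.0, "description": 2.0, "transcript": 1.0}


def score(doc, query):
    """Sum weighted term counts across the fields of one document."""
    terms = query.lower().split()
    total = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        words = doc.get(field_name, "").lower().split()
        for term in terms:
            total += weight * words.count(term)
    return total


def rank(docs, query):
    """Return matching documents, highest score first."""
    scored = [(score(doc, query), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for s, doc in scored if s > 0]


docs = [
    {"title": "Mountain Bike Weekly", "transcript": "coronavirus closed the trails coronavirus"},
    {"title": "Coronavirus Update", "transcript": "daily news"},
]
print([doc["title"] for doc in rank(docs, "coronavirus")])
```

With these weights the episode that has the term in its title outranks the one that only mentions it twice in the transcript.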

------
mnfn
I've been hoping that someone would do a broader version of David Smith's
podcast search[1] for a long time. It's really helpful for answering 'what was
that episode where they talked about x?' type questions.

[1]: [http://podcastsearch.david-smith.org/](http://podcastsearch.david-smith.org/)

------
smcleod
It doesn’t seem to handle searching multiple words, for example with someone’s
full name - if I search for Doug Stanhope or “Doug Stanhope” it only
returns irrelevant results for the word Doug and ignores Stanhope.

~~~
nmiodice
My hunch is that, due to the limited dataset, there are just no results for
"Stanhope".

~~~
smcleod
What exactly is the dataset? The page has nothing on it other than the search
bar.

------
lihaciudaniel
This may be the best post. You can't believe how useful this is.

~~~
nmiodice
Thank you! I always appreciate feedback on how to make it better. Let me know
if you have any specific feedback.

------
wusatiuk
Would be interested in the stack / workflows / frameworks / tools you use.

~~~
nmiodice
The front-end is built in React. The service is deployed as a single Spring
Boot application written in Kotlin. In terms of workflows, there is a state
machine that jobs progress through (asynchronously with persistent state)
within the service itself. It depends on no external workflow engines.
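
As a rough illustration of a job state machine with persistent state, here is a small sketch. It is written in Python for brevity rather than Kotlin, and the state names, transition table, and JSON serialization are all assumptions, not the actual service design.

```python
import json
from enum import Enum


class JobState(Enum):
    # Hypothetical ingestion stages; the real service may use different ones.
    QUEUED = "queued"
    DOWNLOADED = "downloaded"
    TRANSCRIBED = "transcribed"
    INDEXED = "indexed"
    FAILED = "failed"


# Allowed transitions out of each state; INDEXED and FAILED are terminal.
TRANSITIONS = {
    JobState.QUEUED: {JobState.DOWNLOADED, JobState.FAILED},
    JobState.DOWNLOADED: {JobState.TRANSCRIBED, JobState.FAILED},
    JobState.TRANSCRIBED: {JobState.INDEXED, JobState.FAILED},
    JobState.INDEXED: set(),
    JobState.FAILED: set(),
}


class Job:
    def __init__(self, job_id, state=JobState.QUEUED):
        self.job_id = job_id
        self.state = state

    def advance(self, new_state):
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state

    def to_json(self):
        """Serialize the job so it can be stored (here a string stands in for a database record)."""
        return json.dumps({"job_id": self.job_id, "state": self.state.value})

    @classmethod
    def from_json(cls, payload):
        """Rebuild a job from its stored form so that work can resume after a restart."""
        data = json.loads(payload)
        return cls(data["job_id"], JobState(data["state"]))


job = Job("episode-1")
job.advance(JobState.DOWNLOADED)
restored = Job.from_json(job.to_json())
print(restored.job_id, restored.state.name)
```

Saving the state after every transition is what lets a worker pick a job back up without an external workflow engine.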

Infrastructure components include CosmosDB, Azure Blob Storage. The service is
deployed as an App Service.

The STT and Search tech I'll keep under wraps for now as they may change.

------
rohan_shah
Is there any site that ranks podcasts by viewership/listenership numbers?

------
schlu
Did you handle dynamic ad insertion? If so what approach are you taking?

------
jdc
I get a 500 error when I put my query in quotation marks.

------
notRobot
What podcasts does this search?

~~~
rapsey
Not many it seems.

~~~
nmiodice
Correct. It’s costly to ingest data and the app is in beta so it doesn’t make
sense to invest - yet - in indexing broader content

~~~
bravura
So charge people to ingest their favorite podcast

~~~
KMnO4
How much are people willing to pay? A quick search shows Google's STT API at
$1.44/hour. As an example, the Joe Rogan Experience is ~1500 multi-hour
episodes, meaning it would cost >$5000 for just that one show.
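
The arithmetic behind that estimate can be checked with a few lines. The price and episode count come from the figures above; the 2.5 hour average episode length is an assumption, since the comment only says "multi-hour".

```python
PRICE_PER_HOUR_USD = 1.44        # STT price quoted above
EPISODES = 1500                  # approximate episode count quoted above
ASSUMED_HOURS_PER_EPISODE = 2.5  # assumption: the source only says "multi-hour"


def transcription_cost(episodes, hours_per_episode, price_per_hour):
    """Total cost in USD to transcribe every episode once."""
    return episodes * hours_per_episode * price_per_hour


def break_even_hours(budget, episodes, price_per_hour):
    """Average episode length at which the total cost reaches `budget`."""
    return budget / (episodes * price_per_hour)


cost = transcription_cost(EPISODES, ASSUMED_HOURS_PER_EPISODE, PRICE_PER_HOUR_USD)
print(f"Estimated cost: ${cost:,.0f}")
print(f"Hours per episode needed to reach $5000: {break_even_hours(5000, EPISODES, PRICE_PER_HOUR_USD):.2f}")
```

Under that assumption the total comes to roughly $5,400, and the ">$5000" figure holds whenever episodes average more than about 2.3 hours.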

Presumably the OP is using an offline speech processing tool, but compute
costs would still be expensive.

~~~
artificial
Instead of Folding@home, what about STT@Home, offering credits on the
service for pooling resources? I've got compute I'd kick at this.

------
_curious_
Cool, thanks for sharing this!

