What data sources are you using for the transcripts behind the API?
As an idea for others who are better able to implement it, I'd love a system that would search every podcast I've ever listened to. It's very difficult for me to remember in which podcast I heard a specific story, maxim, or interview when I want to share it with a friend. Bonus points if the transcript has timestamps lined up.
Everyone says they want transcripts, but after some digging, I decided that transcripts are not very useful for now:
For listeners, those who choose to listen hate reading text.
For podcasters, it's expensive to produce transcripts and the apparent SEO boost is not easy to justify: 1) it takes time for SEO to work; 2) conversational content (most podcasts) is not high quality when read as text.
For Listen Notes (a podcast search engine), indexing transcripts introduces more noise than signal.
I did some experiments with transcripts, e.g., https://www.listennotes.com/e/51222de65c2c484e8a47608eac1329... But I decided not to continue for now.
The search results from Listen Notes include a UUID for each episode & podcast, so the client side (e.g., a podcast player) can keep track of a user's listen history.
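A rough sketch of what that client-side tracking might look like, in Python. The field names `episode_uuid` and `podcast_uuid` and the `ListenHistory` class are made up for illustration; the actual response fields of the API may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ListenHistory:
    """Hypothetical client-side store of what one user has listened to."""

    # Maps episode UUID -> podcast UUID for every episode the user has played.
    episodes: Dict[str, str] = field(default_factory=dict)

    def record(self, episode_uuid: str, podcast_uuid: str) -> None:
        """Remember that the user played this episode."""
        self.episodes[episode_uuid] = podcast_uuid

    def filter_heard(self, results: List[dict]) -> List[dict]:
        """Keep only the search results the user has already listened to.

        Assumes each result is a dict with an "episode_uuid" key
        (an assumed field name, not confirmed by the API docs).
        """
        return [r for r in results if r["episode_uuid"] in self.episodes]
```

A player could call `record` on every play, then run `filter_heard` over the search results to answer "which episode did I hear this in?".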
We do the first option you bring up in your later comment.
1) it's hella expensive to make transcripts of podcasts. Allow users to contribute a set amount for podcast transcriptions they're interested in (e.g. 50 cents per episode).
2) standard subscription model. Give tiered access to n podcasts for a set amount per month.
3) modified subscription model. Target 5 minutes of transcribed audio per user. Split audio files into small overlapping segments. People can either pay a subscription fee equal to 5 minutes of audio transcription or can transcribe 5 minutes of content per month.
Any thoughts on which would work best? The crowdsourcing of transcriptions would need stitching together and editing to make it flow, but it might be less obtrusive to people who don't want to pay.
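For option 3, the "small overlapping segments" part could be sketched roughly like this in Python. The 30-second segment length and 5-second overlap are arbitrary assumptions, and the function only computes time ranges; actually cutting the audio would need an audio library.

```python
from typing import List, Tuple


def overlapping_segments(
    duration_s: float,
    segment_s: float = 30.0,  # assumed segment length, not from the thread
    overlap_s: float = 5.0,   # assumed overlap between neighbouring segments
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds covering the whole audio file.

    Each segment starts `segment_s - overlap_s` seconds after the previous
    one, so neighbours share `overlap_s` seconds that can be used to stitch
    the crowdsourced transcripts back together.
    """
    if segment_s <= 0 or not 0 <= overlap_s < segment_s:
        raise ValueError("need segment_s > 0 and 0 <= overlap_s < segment_s")

    step = segment_s - overlap_s
    segments: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        if end >= duration_s:
            break  # the last segment reached the end of the file
        start += step
    return segments
```

With these defaults, 5 minutes of transcription per user works out to roughly ten segments each.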
It may be more realistic to ask podcasters to provide good show notes, instead of full transcripts. All the important things (Wikipedia entries, guests' backgrounds, places mentioned, ...) should be in the show notes. Listeners may be interested in contributing to show notes, which are lightweight enough to produce.
That's a good point I hadn't considered. Personally I don't consume show notes, but perhaps reading those would help me out overall.
1. Depth-first approach. Position yourself as an AI company and do in-audio search, but only index a small set of podcasts. It's as if Google indexed just 1000 web pages and tried to refine keyword matching techniques -- too little data to improve search relevance & ranking.
2. Breadth-first approach. Start with something simple. Index just the metadata for as many podcasts & episodes as possible. It's not sexy, in terms of raising money or making the headlines of TechCrunch. It's not AI (for now).
Given the limited resources I have now (i.e., one person, no VC funding, 2 months of full-time work), I have to take the breadth-first approach. In the future, I still have the choice to add in-audio search gradually. It's like playing a strategy game; it's about build order :) Ideally you do everything all at once, e.g., AI, player app, community, ... But in reality, you have to take the very first step...
I'm definitely in that category of people that spend more time listening than watching, but I can't find 5+ hours per day. Although I might if I replaced all blogs and HN with podcasts haha.
In any case, I'll definitely try your service out and show some friends :). Hope you get some donations!
How do you have time for that? Do you listen while doing light work? Looks like a great service anyway!
I also listen to podcasts during my commute, workouts, grocery shopping, ...
I've heard several of my friends say that they spend more time listening to podcasts / audiobooks than watching TV / reading books. I don't know whether it's a Silicon Valley bubble or actually a trend :)
I just can't concentrate on code and a podcast at the same time. Both will suffer if I try to combine them.
Great service! Thanks for sharing.
And got an empty result.
Looks like you need to sign up with some external service to use it? Why that? For the monetization? Or am I missing something?
I host my API on Mashape, which takes care of some mundane but necessary tasks for building an API, e.g., API key/secret, rate limiting, payment, documentation, ...
For details, please refer to https://market.mashape.com/listennotes/listennotes
It says "Search (almost) all podcasts & episodes on the Internet."
Where does the data come from?
In the future I'll use my own data to come up with my own top chart, e.g., data on searches, clicks, plays, shares, ratings (TODO)...
Sure, I need to make it clearer.