Show HN: Listen Notes Podcast Search API (listennotes.com)
80 points by wenbin on Dec 1, 2017 | 30 comments

This looks very promising! I've thought of doing something similar and started (very slowly) on the project a month ago.

What are the data sources you're using for the transcripts behind the API?

As an idea for others who are better able to implement it, I'd love a system that would search every podcast I've ever listened to. It's very difficult for me to remember which podcast I heard a specific story, maxim, or interview on when I want to share it with a friend. Bonus points if a transcript has some timestamps lined up.

I have metadata for podcasts & episodes in my own database, including title, description, publisher, ... But I don't have transcripts.

Everyone says they want transcripts, but after some digging, I decided that transcripts are not very useful for now --

For listeners, those who choose to listen hate reading text.

For podcasters, it's expensive to produce transcripts, and the apparent SEO boost is not easy to justify -- 1, it takes time for SEO to work; 2, conversational content (most podcasts) is not high quality when you see it as text.

For Listen Notes (podcast search engine), indexing transcripts introduces more noise than signal.

I did some experiments around transcripts, e.g., https://www.listennotes.com/e/51222de65c2c484e8a47608eac1329... But I decided not to continue for now.

The search results from Listen Notes include a UUID for each episode & podcast, so the client side (e.g., a podcast player) can keep track of a user's listen history.
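A minimal sketch of how a client might do that, assuming each search result is a dict with an `episode_uuid` field. The field name and the `ListenHistory` class are hypothetical, not part of the actual API:

```python
from collections import defaultdict
from datetime import datetime, timezone


class ListenHistory:
    """In-memory listen history keyed by episode UUID (hypothetical helper)."""

    def __init__(self):
        # user_id -> {episode_uuid: time the episode was last played}
        self._played = defaultdict(dict)

    def record(self, user_id, episode_uuid, when=None):
        """Remember that a user played an episode."""
        self._played[user_id][episode_uuid] = when or datetime.now(timezone.utc)

    def has_listened(self, user_id, episode_uuid):
        # .get() avoids creating an empty entry for unknown users
        return episode_uuid in self._played.get(user_id, {})

    def filter_results(self, user_id, results):
        """Keep only the search results this user has already played."""
        return [r for r in results if self.has_listened(user_id, r["episode_uuid"])]
```

A player app could call `record()` on every play and then run `filter_results()` over search results to answer "which episode did I hear that in?".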

If you're interested in transcript search, we do it as part of our video processing engine, and we work with some podcast creators to provide a search API. Message me if you are interested.

We do the first option you bring up in your later comment.

I've also looked into this problem. It would be very valuable to search podcasts if transcription were accurate. Lots of companies and services get mentioned in passing in podcasts but never get to find out about it. I think quite a few companies would pay to be "alerted" about this, much in the way they do Twitter searches now. I'd certainly do it for my own name and business.

No idea if my idea is feasible or not, but I have 3 concepts for how it could work out:

1) it's hella expensive to make transcripts of podcasts. Allow users to contribute a set amount for podcast transcriptions they're interested in (e.g. 50 cents per episode).

2) standard subscription model. Give tiered access to n podcasts for a set amount per month.

3) modified subscription model. Target 5 minutes of transcribed audio per user. Split audio files into small overlapping segments. People can either pay a subscription fee equal to 5 minutes of audio transcription or can transcribe 5 minutes of content per month.
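For option 3, the splitting step could be sketched roughly like this. The 30-second segment length and 5-second overlap are assumptions for illustration, and `overlapping_segments` is a hypothetical helper that only computes time boundaries (it does not touch the audio itself):

```python
def overlapping_segments(duration_s, segment_s=30.0, overlap_s=5.0):
    """Return (start, end) times in seconds covering the whole audio file.

    Consecutive segments share `overlap_s` seconds so that words cut off at
    a boundary appear whole in at least one segment.
    """
    if segment_s <= 0 or not 0 <= overlap_s < segment_s:
        raise ValueError("need segment_s > 0 and 0 <= overlap_s < segment_s")

    step = segment_s - overlap_s  # how far each new segment advances
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        if end >= duration_s:
            break  # the last segment reached the end of the file
        start += step
    return segments
```

With these defaults, a user's 5-minute monthly quota works out to ten 30-second segments, and the shared seconds give some text to line up when stitching neighbouring transcriptions together.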

Any thoughts on which would work best? The crowdsourcing of transcriptions would need stitching together and editing to make it flow, but it might be less obtrusive to people who don't want to pay.

To be honest, I'm hoping for 4) have a computer transcribe and index podcast contents to at least a 90% accuracy. It appears this is a big ask right now, however, which surprises me given how good things like Alexa are.

Full transcripts sound appealing, but are they really what people want? From all the signals I've got so far, there's no strong use case for podcast transcripts.

It may be more realistic to ask podcasters to provide good show notes, instead of full transcripts. All important things (Wikipedia entries, guests' backgrounds, places mentioned, ...) should be in show notes. Listeners may be interested in contributing to show notes, which are lightweight enough to produce.

My downfall is remembering a sentence or two but having no idea which of the approximately 20 podcasts I listen to the content came from. There are sometimes guests who are on multiple podcasts, overlapping topics, etc.

That's a good point I hadn't considered. Personally I don't consume show notes, but perhaps reading those would help me out overall.

fyi http://audiosear.ch/ just closed its doors after years of struggling to do in-episode search. it's an expensive problem to tackle and something people clearly don't want enough for Audiosearch to survive. it's easy to ask for and imagine these, but running these services is extraordinarily hard.

For a podcast search engine, you can take two approaches.

1. Depth first approach. Position yourself as an AI company and do in-audio search, but only index a small set of podcasts. It's as if Google indexed just 1000 web pages and tried to refine keyword matching techniques -- too little data to improve search relevance & ranking.

2. Breadth first approach. Start with something simple. Index just metadata for as many podcasts & episodes as possible. It's not sexy, in terms of raising money or making TechCrunch headlines. It's not AI (for now).

Given the limited resources I have now (i.e., one person, no VC funding, 2 months of full-time work), I have to take the breadth first approach. In the future, I still have the choice to do in-audio search gradually. It's like playing a strategy game; it's about build order :) Ideally you do everything all at once, e.g., AI, player app, community, ... But in reality, you have to start with the very first step...
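To illustrate how simple the breadth first approach can start out, here is a toy inverted index over episode metadata. This is only a sketch under assumed field names (`uuid`, `title`, `description`, `publisher`), not how Listen Notes is actually implemented:

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(episodes):
    """Map each token found in the metadata to the UUIDs of matching episodes."""
    index = defaultdict(set)
    for ep in episodes:
        for field in ("title", "description", "publisher"):
            for token in tokenize(ep.get(field, "")):
                index[token].add(ep["uuid"])
    return index


def search(index, query):
    """Return the UUIDs of episodes whose metadata contains every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = [index.get(token, set()) for token in tokens]
    return set.intersection(*hits)
```

Relevance ranking is left out on purpose; the point is that metadata-only search needs no audio processing at all, so it scales to many podcasts cheaply.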

oh for sure man i understand. not criticizing you. i was just trying to respond to the guy above me.

I love this, I just love this. This is what Google should have released a few years back when podcasts started taking over. I can't donate with PayPal because that platform hates me, maybe I can Venmo you money for a beer!? Thanks for sharing it with us!

just build on it, he's trying to monetize it not ask for donations

He is asking for donations. There's a PayPal link at the bottom of the home page: 'Buy me coffee or donate some server time'

That makes sense, I find it too distracting to listen to unless I'm doing something entirely monotonous. The illusion of multitasking and all that. I'm jealous that you can just pause to think!

I'm definitely in that category of people that spend more time listening than watching, but I can't find 5+ hours per day. Although I might if I replaced all blogs and HN with podcasts haha.

In any case, I'll definitely try your service out and show some friends :). Hope you get some donations!

"I'm an avid podcast listener. I listen to 5+ hours podcasts everyday."

How do you have time for that? Do you listen while doing light work? Looks like a great service anyway!

I wear AirPods while coding. You know, modern programming work involves a lot of mechanical work... like moving code blocks around, copy & paste... For such mechanical work, I can multitask by listening to podcasts -- so I don't feel that I waste time :) Whenever I need to think, I just double tap my AirPods to pause.

I also listen to podcasts while commuting, working out, grocery shopping, ...

I've heard several of my friends say that they spend more time listening to podcasts / audiobooks than watching TV / reading books. I don't know whether it's a Silicon Valley bubble or it's actually a trend :)

I'm with you up until the part about listening while coding. For all of the other more mundane tasks like shopping, dishes, showering, video games, commuting, throwing on a podcast makes sense.

I just can't concentrate on code and a podcast at the same time. Both will suffer if I try to combine them.

Great service! Thanks for sharing.

I'm a full-time student with a part-time job, and I probably average between 2 and 3 hours of podcasts a day. I live nowhere near Silicon Valley.

Time? Speed-listen, as was discussed a couple of weeks ago: https://news.ycombinator.com/item?id=15741428

He listens to podcasts for 5 hours or more per day, not 5 hours of content in less time.

Very cool! How are you doing the typeahead feature? Do you have a big list of topics or something?

Big podcast fan here. After looking at the page, I tried:


And got an empty result.

Looks like you need to sign up with some external service to use it? Why that? For the monetization? Or am I missing something?

The API can't be accessed via www.listennotes.com.

I host my API on Mashape, where it takes care of some mundane but necessary tasks for building an API, e.g., API key/secret, rate limit, payment, documentation, ...

For details, please refer to https://market.mashape.com/listennotes/listennotes

Yup, got it!

It says "Search (almost) all podcasts & episodes on the Internet."

Where does the data come from?

From iTunes initially. But right now, more and more podcasters submit their podcasts to Listen Notes. Podcasters want their shows to be discovered.

how exactly do you rate "best podcasts"? not disagreeing, just interested in methodology. ever thought of giving per episode rating?

I want to be frank here: it's from the iTunes top chart via their API https://affiliate.itunes.apple.com/resources/documentation/i...

In the future I'll use my own data to come up with my own top chart, e.g., data for searches, clicks, plays, shares, ratings (TODO)...
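As a rough idea of what such a chart could look like, here is a weighted-sum sketch. The weights, the `stats` field, and both function names are made up for illustration; nothing here reflects an actual ranking formula:

```python
# Hypothetical weights: a share counts for more than a play, a play for
# more than a click, and so on. These numbers are placeholders.
WEIGHTS = {"searches": 1.0, "clicks": 2.0, "plays": 3.0, "shares": 5.0}


def score(stats, weights=WEIGHTS):
    """Weighted sum of engagement counts; missing counts are treated as 0."""
    return sum(weight * stats.get(name, 0) for name, weight in weights.items())


def top_chart(podcasts, n=10):
    """Return the n podcasts with the highest engagement score."""
    return sorted(podcasts, key=lambda p: score(p["stats"]), reverse=True)[:n]
```

Ratings could be added later as one more weighted term once that data exists.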

ah i see. might want to disclose that then. i have no problem with it but it might confuse people that you are adding your own rating somehow

For now, there's a footnote at the bottom of that page :)

Sure, I need to make it more clear.
