Show HN: Lexika – Search Engine for Spoken Words/Phrases in YouTube Videos (lexika.io)
85 points by stephensonsco on Oct 13, 2015 | 33 comments



YouTube actually had something like this available for a while (or still does this?) through Caption Search.[1] If a video had captions available (either user provided or automated), one could search based on the captions on that video. Results would return the video and the time associated with that clip in question.

I am not exactly sure why they discontinued the user experience, since it seems like it could be really useful for finding relevant parts in potentially long videos.

[1] http://youtube-global.blogspot.ca/2012/02/captions-for-all-m...


Yeah, good point. We knew about this but the accuracy wasn't all that great. Our algorithm is fuzzy and can find results even if they are transcribed incorrectly (even today, every speech transcription technology out there still makes plenty of mistakes).
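
To make "fuzzy" concrete, here is a minimal sketch of approximate matching over a timed transcript. It is not Lexika's algorithm, which the thread does not describe; it swaps in Python's standard difflib as a stand-in. The transcript data, the fuzzy_find name, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical transcript: (start time in seconds, recognized word).
TRANSCRIPT = [
    (12.0, "the"),
    (12.3, "wrong"),
    (12.6, "kind"),
    (12.9, "pantalons"),  # deliberately misrecognized word
    (13.4, "ah"),
]


def fuzzy_find(transcript, query, threshold=0.8):
    """Return (time, word, score) for every word that resembles the query."""
    hits = []
    for start, word in transcript:
        # ratio() is 1.0 for identical strings and drops toward 0.0
        # as the two strings share fewer characters in order.
        score = SequenceMatcher(None, query.lower(), word.lower()).ratio()
        if score >= threshold:
            hits.append((start, word, round(score, 2)))
    return hits


if __name__ == "__main__":
    # The correctly spelled query still finds the misrecognized word.
    print(fuzzy_find(TRANSCRIPT, "pantaloons"))
```

Character-level similarity like this tolerates small spelling slips in the transcript, but it would not catch errors where the recognizer outputs a different word that merely sounds similar.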


I would love to contact you regarding using this for sales phone calls. Often times, my employees don't remember an important portion of some conversation. This makes it really easy to find that!

tonydiepenbrock[@at]gmail.com


Me too!

manu <at> korfmann <dot> info


Will this work with videos that aren't already subtitled? I haven't gotten one without subtitles to work yet, though maybe I'm just not waiting long enough.


We do two things simultaneously. If there is no closed caption, then the video is submitted to our API to be processed (which takes longer on longer videos - a link shows up on the page to let you know the future URL). If there is a CC, then we parse that right away (which gives ok search results) but still submit to our API for processing, so later the search results will purely be from our indexing.
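
A rough sketch of that two-path flow, as I read the description above. Everything here is hypothetical: the lookup function, the "finished" dictionary, and the in-memory queue are stand-ins for illustration, not the actual service code.

```python
def lookup(video_url, captions, finished, queue):
    """Decide which source answers searches for a video.

    finished: dict mapping URL -> words from completed audio processing
    queue:    list of URLs waiting for audio processing
    """
    # Once audio processing is done, results come purely from that index.
    if video_url in finished:
        return "audio-index", finished[video_url]
    # Every video is submitted for processing, captions or not.
    if video_url not in queue:
        queue.append(video_url)
    # Captions, when present, give usable results right away.
    if captions:
        return "captions", captions
    # No captions yet: the caller can only show a "come back later" link.
    return "pending", []


if __name__ == "__main__":
    finished, queue = {}, []
    print(lookup("video-1", ["hello"], finished, queue))
    print(lookup("video-2", None, finished, queue))
    finished["video-1"] = ["hello", "world"]
    print(lookup("video-1", ["hello"], finished, queue))
```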


If you're interested we can talk about some mutual cooperation. We (https://pexe.so) have data for every publicly available video on YouTube and other sites (Facebook, Twitter, Vine, ...) including fresh metadata (views, keywords, ...).


This only works if you know the particular YouTube video the phrase appears in.

Scraping the content would be more useful, but I'm guessing that gets shut down quick.


You are right that you need to input the particular video for now, but we are expanding to include multi-video search (constantly indexing in the background). Right now the use case is particularly nice for finding phrases in long videos (look for Donald Trump saying random stuff {wall, tremendous, many many, women}, it is pretty comical - e.g. http://www.lexika.io/?s=https%3A%2F%2Fwww.youtube.com%2Fwatc...).


Do you have an agreement with Google? I'm guessing this is against their TOS, especially considering they're doing a lot of work in this area themselves.


A heads up on the homepage: the video you featured with Steve Carell and Jimmy Fallon and the keyword "pantaloons" is captioned as "the wrong kind pantaloons ah", when the spoken phrase was "the wrong kind of pantaloons on" - maybe this is me being paranoid and pedantic, but I feel like having a featured video with a pre-selected keyword that returns flawed captions gives a worse impression than just having the video flags without any captions at all. I understand you guys are still refining the product, but highlighting one of your flaws (however small) so early on can rub people the wrong way.

On a separate note, when I google "lexika search engine" I get your old homepage lexika.co instead of the one linked here, lexika.io. Consider setting up a redirect from .co to .io?


Thanks for the transcription feedback. You can look at this as a problem but we think it's a cute demonstration of our search ability. We didn't really set out to make transcriptions perfect (that's very, very hard), we wanted to make search very accurate. Having a better transcription model would be great though (we are working on that!).

We'll definitely work on the site. (We should have done that earlier.) Thanks again!


Nice service! May I ask what machine learning technology you're using? I'm experimenting with neural nets these days in order to learn the basics of image & speech recognition, and would love to hear about your experiences if you're using ANNs.


You've got it. Deep neural networks are currently the best performers when doing automatic speech recognition.


YouTube is a huge domain - the potential for audio indexing could bring vast improvements to search relevancy. What other audio domains are you experimenting with?


The first thing that comes to mind is podcasts. That is a slightly different direction, but a great fit for our service. They're long, have well-spoken words, and are regularly released, so there is a mound of searchable audio data out there just waiting to be liberated.


I saw this chrome extension a while back that embeds a search button directly in YouTube and searches for keywords/phrases in the video: https://chrome.google.com/webstore/detail/ytfind/gkeiaihfolc...


Also, once you've found videos you like, you can then load them by voice command (voice bookmark):

https://youtube.com/treycent/search?query=youtube


Really well executed product in the 'why doesn't this exist already' category!


I love this idea - can you shed any light on the technologies you are using?


Yeah for sure! This site is basically a demo/test for our backend.

We built an API that processes audio and forms an index for that file. The API search function then goes into that index to look for queries. It doesn't just match the text you see, but also the way the words sound.
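
For anyone curious what "matching the way it sounds" can look like in the simplest case, here is a toy sketch. It is not our backend: it swaps in a simplified Soundex-style key as the sound representation, and the names sound_key, build_index, and search, along with the sample words, are made up for illustration.

```python
from collections import defaultdict

# Consonant groups used by Soundex-style codes; vowels get no code.
_CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def sound_key(word):
    """Simplified Soundex-style key: first letter plus up to three codes."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        # Skip uncoded letters and immediate repeats of the same code.
        if code and code != prev:
            key += code
        prev = code
    return (key + "000")[:4]  # pad or truncate to four characters


def build_index(timed_words):
    """Map each sound key to the (time, word) pairs that produce it."""
    index = defaultdict(list)
    for start, word in timed_words:
        index[sound_key(word)].append((start, word))
    return index


def search(index, query):
    """Return every indexed word whose sound key equals the query's key."""
    return index.get(sound_key(query), [])


if __name__ == "__main__":
    index = build_index([(4.2, "tremendus"), (9.0, "wall")])
    print(search(index, "tremendous"))
```

The point of the design is that lookup goes through the sound key rather than the spelling, so a word the transcriber wrote down wrong can still be found as long as it sounds close to the query.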


The potential for using this to find GIFs is enormous.


I am loving making mashup videos with it. Trump talking about walls: https://www.youtube.com/watch?v=F38EwKOmDdU


How did you construct the mashup - is there a way to feed the search results to a video editor or seamless playlist generator?


Not yet, but this is a pet project that I would love to get going (time strapped now, though). I used the Lexika API to get the times from query results (i.e. when the words were said), then used some hacked-together Python along with ffmpeg, with a lot of auto-generated seeking command-line arguments, to splice together the video feed. Then the title screens were done in a normal editor.

I would love to make this an app though. Just search for phrases, choose clips, and output a cute video. You could just have a thousand people saying "a thousand", or something equally weird.
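
Roughly, the splicing step looks like the sketch below. This is a reconstruction of the idea, not the original script: the hit list is hard-coded here (in practice it would come from the search results), and the function names and file names are placeholders. It only builds the ffmpeg commands; running them is left to the reader.

```python
import shlex


def cut_command(source, start, end, out_path):
    """ffmpeg arguments that cut [start, end) out of source, re-encoding."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.2f}",       # seek the input to the clip start
        "-i", source,
        "-t", f"{end - start:.2f}",  # keep only the clip's duration
        out_path,
    ]


def concat_list(clip_paths):
    """Contents of the list file read by ffmpeg's concat demuxer."""
    return "".join(f"file '{path}'\n" for path in clip_paths)


def concat_command(list_path, out_path):
    """ffmpeg arguments that join the listed clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", out_path]


def plan_mashup(hits, out_path="mashup.mp4", list_path="clips.txt"):
    """Turn (source, start, end) hits into commands plus list-file text."""
    clip_paths = [f"clip_{i:03d}.mp4" for i in range(len(hits))]
    commands = [
        cut_command(src, start, end, clip)
        for (src, start, end), clip in zip(hits, clip_paths)
    ]
    commands.append(concat_command(list_path, out_path))
    return commands, concat_list(clip_paths)


if __name__ == "__main__":
    # Hypothetical hits: (local video file, start seconds, end seconds).
    hits = [("speech_a.mp4", 12.5, 14.0), ("speech_b.mp4", 3.0, 4.25)]
    commands, listing = plan_mashup(hits)
    for cmd in commands:
        print(shlex.join(cmd))
    print(listing, end="")
```

Each clip is re-encoded when it is cut so that all pieces share the same encoding, which is what lets the final join use a plain stream copy. The list text has to be written to clips.txt before the last command is run.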


A decade ago, SMIL players could perform seamless editing/composition of media streams on the fly, using only a playlist of streams and start/stop times. It's unfortunate that those standards have fallen by the wayside, at least until someone builds an HTML5 equivalent.


The Search button is absent (in mobile Chrome); there is only the text field. How do I submit a request?


You should be able to hit enter in the search field on your phone's soft keyboard (or the button might just say search). Which phone/operating system is this?


Chrome on Android. In Firefox on Android I can see the label "youtube video url" (in Chrome I thought it was a query field for the words to be searched). But in Firefox the images on the page are not rendered, so the page looks strange.

I think it's easier for you to try it yourself on an Android phone. Maybe it's only me, but I browse a lot on that phone, and only lexika.io behaves strangely.


Could this be used for subtitling videos? I have a lot of anime VHS tapes in Japanese.


Ah, that would be really nice to do. Our speech transcription only does English at this time, but we definitely have our eye on other languages.


When it supports other languages, please post a link to it on /r/languagelearning. It would be very helpful for a lot of people!


What's next after indexing and search?



