
Show HN: Lexika – Search Engine for Spoken Words/Phrases in YouTube Videos - stephensonsco
http://www.lexika.io
======
tsurantino
YouTube actually had something like this available for a while (or still does
this?) through Caption Search.[1] If a video had captions available (either
user provided or automated), one could search based on the captions on that
video. Results would return the video and the time associated with that clip
in question.

I am not exactly sure why they discontinued the user experience, since it
seems like it could be really useful for finding relevant parts in potentially
long videos.

[1] [http://youtube-global.blogspot.ca/2012/02/captions-for-all-m...](http://youtube-global.blogspot.ca/2012/02/captions-for-all-more-options-for-your.html)

~~~
stephensonsco
Yeah, good point. We knew about this but the accuracy wasn't all that great.
Our algorithm is fuzzy and can find results even if they are transcribed
incorrectly (even today, every speech transcription technology out there still
makes plenty of mistakes).
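
For readers wondering what fuzzy search over a noisy transcript can look like, here is a minimal sketch. It is not Lexika's algorithm (the thread does not describe it); it just slides a window over timestamped words and scores each window with Python's standard `difflib`. The transcript data, the `fuzzy_find` name, and the 0.75 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical transcript: (start_seconds, word) pairs from a speech recognizer.
transcript = [
    (12.0, "the"), (12.3, "wrong"), (12.6, "kind"),
    (13.0, "pantaloons"), (13.5, "ah"),
]

def fuzzy_find(transcript, query, threshold=0.75):
    """Return (start_time, matched_text, score) for windows similar to the query."""
    q_words = query.lower().split()
    n = len(q_words)
    q = " ".join(q_words)
    hits = []
    # Compare the query against every run of n consecutive transcript words.
    for i in range(len(transcript) - n + 1):
        window = transcript[i:i + n]
        text = " ".join(word for _, word in window).lower()
        score = SequenceMatcher(None, q, text).ratio()
        if score >= threshold:
            hits.append((window[0][0], text, round(score, 2)))
    return hits

# A misspelled query still lands on the right timestamp.
print(fuzzy_find(transcript, "pantalons"))
```

Lowering the threshold tolerates worse transcriptions at the cost of more false positives.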

------
tonydiv
I would love to contact you regarding using this for sales phone calls. Often
times, my employees don't remember an important portion of some conversation.
This makes it really easy to find that!

tonydiepenbrock[@at]gmail.com

~~~
mkorfmann
Me too!

manu <at> korfmann <dot> info

------
pmorici
Will this work with videos that aren't already subtitled? I haven't gotten one
without subtitles to work yet, but maybe I'm just not waiting long enough.

~~~
stephensonsco
We do two things simultaneously. If there is no closed caption, then the video
is submitted to our API to be processed (which takes longer on longer videos -
a link shows up on the page to let you know the future URL). If there is a CC,
then we parse that right away (which gives ok search results) but still submit
to our API for processing, so later the search results will purely be from our
indexing.

~~~
doh
If you're interested we can talk about some mutual cooperation. We
([https://pexe.so](https://pexe.so)) have data for every publicly available
video on Youtube and other sites (Facebook, Twitter, Vine, ...) including
fresh metadata (views, keywords, ...).

------
fiatmoney
This only works if you know the _particular_ youtube video the phrase appears
in.

Scraping the content would be more useful, but I'm guessing that gets shut
down quick.

~~~
stephensonsco
You are right that you need to input the particular video for now, but we are
expanding to include multi-video search (constantly indexing in the
background). Right now the use case is particularly nice for finding phrases
in long videos (look for Donald Trump saying random stuff {wall, tremendous,
many many, women}, it is pretty comical - e.g.
[http://www.lexika.io/?s=https%3A%2F%2Fwww.youtube.com%2Fwatc...](http://www.lexika.io/?s=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DH-DSfvYCKwY)).

~~~
fiatmoney
Do you have an agreement with Google? I'm guessing this is against their TOS,
especially considering they're doing a lot of work in this area themselves.

------
jjzhang
A heads up on the homepage: the featured video of Steve Carell and Jimmy
Fallon with the keyword "pantaloons" is captioned as "the wrong kind
pantaloons ah", when the spoken phrase was "the wrong kind of pantaloons on".
Maybe this is me being paranoid and pedantic, but I feel like having a
featured video with a pre-selected keyword that returns flawed captions gives
a worse impression than just having the video flags without any captions at
all. I understand you guys are still refining the product, but highlighting one
of your flaws (however small) so early on can rub people the wrong way.

On a separate note, when I google "lexika search engine" I get your old
homepage lexika.co instead of the one linked here, lexika.io. Consider setting
up a redirect from .co to .io?

~~~
stephensonsco
Thanks for the transcription feedback. You can look at this as a problem but
we think it's a cute demonstration of our search ability. We didn't really set
out to make transcriptions perfect (that's very, very hard), we wanted to make
search very accurate. Having a better transcription model would be great
though (we are working on that!).

We'll definitely work on the site. (We should have done that earlier.) Thanks
again!

------
mrborgen
Nice service! May I ask what machine learning technology you're using? I'm
experimenting with neural nets these days, in order to learn the basics of
image & speech recognition, and would love to hear about your experiences if
you're using ANNs.

~~~
stephensonsco
You've got it. Deep neural networks are currently the best performers when
doing automatic speech recognition.

------
chejazi
YouTube is a huge domain - the potential for audio indexing could bring vast
improvements to search relevancy. What other audio domains are you
experimenting with?

~~~
stephensonsco
The first thing that comes to mind is podcasts. That's a slightly different
direction, but a great fit for our service. They're long, have well spoken
words, and are regularly released, so there is a mound of searchable audio
data out there just waiting to be liberated.

------
xasos
I saw this chrome extension a while back that embeds a search button directly
in YouTube and searches for keywords/phrases in the video:
[https://chrome.google.com/webstore/detail/ytfind/gkeiaihfolc...](https://chrome.google.com/webstore/detail/ytfind/gkeiaihfolcgfgijebiihmmgknapfgpj?hl=en-US)

------
dmcswain
Also, once you've found videos you like, you can then load them by voice
command (voice bookmark):

[https://youtube.com/treycent/search?query=youtube](https://youtube.com/treycent/search?query=youtube)

------
bedeho
Really well executed product in the 'why doesn't this exist already' category!

------
misiti3780
I love this idea - can you shed any light on the technologies you are using?

~~~
stephensonsco
Yeah for sure! This site is basically a demo/test for our backend.

We built an API that processes audio and forms an index for that file. The API
search function then goes into that index to look for queries. It doesn't just
look for words that match the text you see, but also the way it sounds.
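
To make the "matches the way it sounds" idea concrete, here is a rough sketch of a sound-based index. It is an assumption-laden illustration, not the Lexika backend: it uses a simplified Soundex-style key on transcribed text, whereas a real system would more likely work from the audio or phoneme output directly. The names `sound_key`, `build_index`, and `search`, and the sample words, are hypothetical.

```python
from collections import defaultdict

# Consonant groups that share a code in Soundex-style schemes.
_CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        _CODES[ch] = digit

def sound_key(word):
    """Simplified Soundex-style key: first letter plus up to three consonant codes."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (key + "000")[:4]

def build_index(transcript):
    """Map each sound key to the (start_seconds, word) entries that produce it."""
    index = defaultdict(list)
    for start, word in transcript:
        index[sound_key(word)].append((start, word))
    return index

def search(index, query):
    """Return entries whose word sounds like the query word."""
    return index.get(sound_key(query), [])

# Hypothetical transcript in which "there" was transcribed as "their".
index = build_index([(4.2, "their"), (4.6, "wall"), (5.1, "tremendous")])
print(search(index, "there"))
```

The query "there" finds the entry transcribed as "their" because both reduce to the same key, which is the kind of miss an exact text match would produce.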

------
duaneb
The potential for using this to find GIFs is enormous.

~~~
stephensonsco
I am loving making mashup videos with it. Trump talking about walls:
[https://www.youtube.com/watch?v=F38EwKOmDdU](https://www.youtube.com/watch?v=F38EwKOmDdU)

~~~
walterbell
How did you construct the mashup - is there a way to feed the search results
to a video editor or seamless playlist generator?

~~~
stephensonsco
Not yet but this is a pet project that I would love to get going (time
strapped now, though). I used the Lexika API to get the times from query
results (i.e. when the words were said), then used some hacked together python
along with ffmpeg with a lot of auto-generated seeking command line arguments
to splice together the video feed. Then the title screens were done in a
normal editor.
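
For anyone who wants to try the same workflow, here is a minimal sketch of the "auto-generated seeking arguments" step. It is a guess at the shape of such a script, not the actual one: it assumes you already have (start, end) times in seconds from a search, and it only builds the ffmpeg argument lists and a concat list file, using the standard `-ss`, `-t`, and concat demuxer options. The function names, padding value, and file names are made up for illustration.

```python
def clip_commands(source, hits, pad=0.25, out_prefix="clip"):
    """Build one ffmpeg argument list per (start, end) hit, times in seconds."""
    commands = []
    for i, (start, end) in enumerate(hits):
        begin = max(0.0, start - pad)      # pad a little, but never seek below zero
        duration = (end + pad) - begin
        out = f"{out_prefix}_{i:03d}.mp4"
        commands.append([
            "ffmpeg", "-y",
            "-ss", f"{begin:.2f}",         # seek to the clip start
            "-i", source,
            "-t", f"{duration:.2f}",       # keep only this many seconds
            out,
        ])
    return commands

def concat_list(commands):
    """Text of a list file for ffmpeg's concat demuxer, one clip per line."""
    return "".join(f"file '{cmd[-1]}'\n" for cmd in commands)

# Hypothetical hit times for one search phrase.
cmds = clip_commands("in.mp4", [(10.0, 11.0), (0.1, 0.5)])
for cmd in cmds:
    print(" ".join(cmd))
print(concat_list(cmds))
```

Each list can be passed to `subprocess.run`, and the clips can then be joined with `ffmpeg -f concat -safe 0 -i list.txt -c copy out.mp4` after writing the list text to `list.txt`.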

I would love to make this an app though. Just search for phrases, choose
clips, and output a cute video. You could just have a thousand people saying a
"thousand", or something equally weird.

~~~
walterbell
A decade ago, SMIL players could perform seamless editing/composition of media
streams on the fly, using only a playlist of streams and start/stop times.
It's unfortunate that those standards have fallen by the wayside, at least
until someone builds an HTML5 equivalent.

------
avodonosov
The Search button is absent (on mobile Chrome); only the text field is shown.
How do I submit a request?

~~~
stephensonsco
You should be able to hit enter in the search field on your phone's soft
keyboard (or the button might just say search). Which phone/operating system
is this?

~~~
avodonosov
Chrome on Android. In Firefox on Android I can see the label "youtube video
url" (in Chrome I thought it was a query field for words to be searched). But in
Firefox the images on the page are not rendered, so the page looks strange.

I think it's easier for you to try it yourself on an Android phone. Maybe it's
only me, but I browse a lot on that phone and only lexika.io works strangely.

------
Hydraulix989
Could this be used for subtitling videos? I have a lot of anime VHS tapes in
Japanese.

~~~
stephensonsco
Ah, that would be really nice to do. Our speech transcription only does
English at this time, but we definitely have our eye on other languages.

~~~
superplussed
When it supports other languages, please post a link to it on
/r/languagelearning. It would be very helpful for a lot of people!

------
Kinnard
What's next after indexing and search?

