Millions of Videos, and Now a Way to Search Inside Them

Nick_Smith · on Feb 25, 2007

I'm not much of an expert on this stuff... what do you all think about their chances?

jwp · on Feb 26, 2007

I work on speech recognition. What Blinkx is doing isn't novel. (Sorry, Blinkx.) Google has top speech researchers working on search for speech and video. Same for MSFT. Remember the Kai-Fu Lee thing? The guy who built CMU Sphinx, an open-source HMM-based speech recognizer, in the late 80s? He and many other solid speech people are at Google to work on searching audio/video.

And there are other companies in this space, but they tend to center around US gov customers. Virage is one. It's owned by the Autonomy group, where, according to the article, the founder of this company used to work.

There's also Podzinger, a subsidiary of BBN, which is another company that gets a lot of gov business. Podzinger runs BBN's speech recognition system on podcasts and videos, and pipes the output to a search engine: <http://podzinger.com/>.

I could go on... And if people are interested, I'd be happy to post links to some relevant papers and tools.

To my mind 2 interesting things are going on here. 1) The company appears to be thriving by applying 20-year-old stuff from the lab to a new problem, in apparently no special way. (And that's not a bad thing!) 2) They got an article in the NYT business section to talk about Hidden Markov Models. Although maybe that's not so surprising, since hedge funds have recently started speaking out about using machine learning.

danielha · on Feb 25, 2007

All I know is that it works. I tried out a few terms and got what I had in mind every time.

They heavily emphasize speech recognition, I think. For what this is, it's very cool. The technology is there and the product works. I think this is going places.

dangrsmind · on Feb 26, 2007

I'd say I'm a little skeptical.

The first question I'd have is how fast they can parse video. The second is how much it costs to do it.

It seems you would have to be able to do recognition much faster than real-time for a realistic web video search capability (see for example http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=599600) and you would certainly need a lot of hardware to do this at scale for millions of video clips.

See also: http://www.newmediamusings.com/blog/2005/09/blinkx_a_citize.html

jwp · on Feb 26, 2007

The first link you cited is spot on. The authors are from Univ of Cambridge, and work on HTK <http://htk.eng.cam.ac.uk/>.

That paper is 10 years old. As I'm sure you can imagine, there have been improvements in the field since then. To be completely honest, I don't stay on top of search applied to speech, but the keyword you want is "Spoken Document Retrieval" (SDR). Ciprian Chelba and TJ Hazen do cool stuff in this area; they are giving a tutorial on SDR at ICASSP this year.

An aside. Both of these approaches use the fact that when you process speech, you essentially form a graph of words (or phonemes). Paths through the graph represent possible transcriptions. So, since the graph is a denser, richer thing to search than the transcript, and we've got graph algorithms sitting around, there are neat tricks you can do to build a search engine index for speech...
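
To make the graph idea concrete, here is a minimal sketch, not any particular system's method. It assumes each clip's recognizer output is a toy lattice of edges carrying a word and a posterior probability; the clip names, the numbers, and the helpers `build_index` and `search` are all made up for illustration. The index stores every word hypothesis in the lattice, weighted by its posterior, instead of only the single best transcript.

```python
from collections import defaultdict

# Hypothetical toy lattices: each edge is (from_node, to_node, word, posterior).
# Competing edges between the same two nodes are alternative hypotheses.
LATTICES = {
    "clip_a": [
        (0, 1, "recognize", 0.6),
        (0, 1, "wreck", 0.4),
        (1, 2, "speech", 0.6),
        (1, 2, "beach", 0.4),
    ],
    "clip_b": [
        (0, 1, "nice", 0.9),
        (1, 2, "beach", 0.8),
        (1, 2, "peach", 0.2),
    ],
}


def build_index(lattices):
    """Map each word to {clip_id: summed posterior over all edges with that word}."""
    index = defaultdict(lambda: defaultdict(float))
    for clip_id, edges in lattices.items():
        for _src, _dst, word, posterior in edges:
            index[word][clip_id] += posterior
    return index


def search(index, word):
    """Return (clip_id, score) pairs for a one-word query, best score first."""
    hits = index.get(word, {})
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


INDEX = build_index(LATTICES)
print(search(INDEX, "beach"))
```

The best single transcript of clip_a would read "recognize speech", so a transcript-only index would never return it for "beach". The lattice index still returns it, just ranked below clip_b.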

I've recently been reading some interesting work that uses locality-sensitive hashing to search audio. The Google speech people are presenting a lot of it at ICASSP this year. See this post for more, and chase the links in their papers for even more: <http://googleresearch.blogspot.com/2007/02/hear-here-sample-of-audio-processing.html>
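
For anyone who hasn't seen locality-sensitive hashing, here is a minimal sketch of one common variant (random hyperplanes, which groups vectors pointing in similar directions). It is not necessarily the scheme used in the papers linked above. It assumes each audio snippet has already been reduced to a fixed-length feature vector; the class `LSHIndex`, its parameters, and the example vectors are made up for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed):
    """Draw num_bits random hyperplanes (as normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


class LSHIndex:
    """Several hash tables; vectors with equal signatures share a bucket."""

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        self.tables = [
            (make_hyperplanes(num_bits, dim, seed + t), defaultdict(set))
            for t in range(num_tables)
        ]

    def add(self, key, vector):
        for planes, buckets in self.tables:
            buckets[signature(vector, planes)].add(key)

    def candidates(self, vector):
        """Keys that collide with the query in at least one table."""
        found = set()
        for planes, buckets in self.tables:
            found |= buckets.get(signature(vector, planes), set())
        return found


index = LSHIndex(dim=4)
index.add("snippet_1", [0.9, 0.1, 0.0, 0.3])
louder_copy = [1.8, 0.2, 0.0, 0.6]  # same direction, twice the magnitude
print(index.candidates(louder_copy))
```

The point is that a lookup only touches a handful of buckets instead of comparing the query against every stored snippet. The candidates it returns would still need an exact similarity check afterwards.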

dangrsmind · on Feb 26, 2007

Thanks for the information and links. My background is in video and image processing, well originally multiple target tracking, sensor management, and sensor fusion, but now I work in biometrics and video analytics. Understood about processing the information into a graph.

Your point about Google raises one of the obvious questions about this company... if Google is doing leading edge research in this field it seems unlikely they need to buy a "video search destination" site employing lesser technologies, that is unless it gets really really big (i.e. YouTube). They might be interested in some deep technology, but my impression from the reading I've done and the links you've posted is that Blinkx is using standard well known techniques to achieve their results.

FWIW: I was applying Markov modeling to areas such as mission planning and modeling integrated air defense networks back almost twenty years ago now. We didn't call them HMMs, but there were some very similar ideas employed.

jwp · on Feb 26, 2007

Hmm, perhaps we should talk. Email me at e40.32313371@bloglines.com if you're interested.

Nick_Smith · on Feb 25, 2007

Since the NYT demands your info:

THE World Wide Web is awash in digital video, but too often we can't find the videos we want or browse for what we might like.

That's a loss, because if we could search for Internet videos, they might become the content of a global television station, just as the Web's hypertext, once it was organized and tamed by search, became the stuff of a universal library.

What we need, says Suranga Chandratillake, a co-founder of Blinkx, a start-up in San Francisco, is a remote control for the Web's videos, a kind of electronic TV Guide. He's got just the thing.

Videos have multiplied on social networks like YouTube and MySpace as well as on news and entertainment sites because of the emergence of video-sharing, user-generated video, free digital storage and broadband and Wi-Fi networks.

Today, owing to the proliferation of large video files, video accounts for more than 60 percent of the traffic on the Internet, according to CacheLogic, a company in Cambridge, England, that sells media delivery systems to Internet service providers. "I imagine that within two years it will be 98 percent," says Hui Zhang, a computer scientist at Carnegie Mellon University in Pittsburgh.

But search engines like Google that were developed during the first, text-based era of the Web do a poor job of searching through this rising sea of video. That's because they don't search the videos themselves, but rather things associated with them, including the text of a Web page, the metadata that computers use to display or understand pages (like keywords or the semantic tags that describe different content), video-file suffixes (like .mpeg or .avi), or captions or subtitles.

None of these methods are very satisfactory. Many Internet videos have little or obscure text, and clips often have no or misleading metadata. Modern video players do not reveal video-file suffixes, and captions and subtitles imperfectly capture the spoken words in a video.

The difficulties of knowing which videos are where challenge the growth of Internet video. "If there are going to be hundreds of millions of hours of video content online," Mr. Chandratillake said, "we need to have an efficient, scalable way to search through it."

Mr. Chandratillake's history is unusual for Silicon Valley. He was born in Sri Lanka in 1977 and divided his childhood among England and various countries in South Asia where his father, a professor of nuclear chemistry, worked. Then he studied distributed processing at King's College, Cambridge, before becoming the chief technology officer of Autonomy, a company that specializes in something called meaning-based computing. This background possibly suggested an original approach to search when he founded Blinkx in 2004.

Mr. Chandratillake's solution does not reject any existing video search methods, but supplements them by transcribing the words uttered in a video, and searching them. This is an achievement: effective speech recognition is a nontrivial problem, in the language of computer scientists.

Blinkx's speech-recognition technology employs neural networks and machine learning using hidden Markov models, a method of statistical analysis in which the hidden characteristics of a thing are guessed from what is known.

Mr. Chandratillake calls this method "contextual search," and he says it works so well because the meanings of the sounds of speech are unclear when considered by themselves. "Consider the phrase 'recognize speech,'" he wrote in an e-mail message. "Its phonemes ('rek-un-nise-peach') are incredibly similar to those contained in the phrase 'wreck a nice beach.' Our systems use our knowledge of which words typically appear in which contexts and everything we know about a given clip to improve our ability to guess what each phoneme actually means."
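
One standard way to use word context like this is to combine an acoustic score with a language-model score and keep the candidate with the best total. The sketch below illustrates that general idea only; it is not a description of Blinkx's system. The bigram probabilities, the acoustic scores, and the function names are all invented for the example.

```python
import math

# Hypothetical bigram probabilities P(word | previous word); "<s>" marks the start.
BIGRAM_PROB = {
    ("<s>", "recognize"): 0.02,
    ("recognize", "speech"): 0.30,
    ("<s>", "wreck"): 0.005,
    ("wreck", "a"): 0.10,
    ("a", "nice"): 0.02,
    ("nice", "beach"): 0.01,
}
UNSEEN_PROB = 1e-6  # crude floor for word pairs missing from the table


def lm_log_prob(words):
    """Log-probability of a word sequence under the toy bigram model."""
    total = 0.0
    previous = "<s>"
    for word in words:
        total += math.log(BIGRAM_PROB.get((previous, word), UNSEEN_PROB))
        previous = word
    return total


def best_transcription(candidates, lm_weight=1.0):
    """Pick the candidate maximizing acoustic score + weighted language-model score.

    candidates is a list of (words, acoustic_log_prob) pairs.
    """
    best = max(candidates, key=lambda c: c[1] + lm_weight * lm_log_prob(c[0]))
    return best[0]


# Made-up acoustic scores: the sounds alone slightly favor the wrong phrase.
CANDIDATES = [
    (["recognize", "speech"], -10.0),
    (["wreck", "a", "nice", "beach"], -9.5),
]

print(best_transcription(CANDIDATES))               # context included
print(best_transcription(CANDIDATES, lm_weight=0))  # sounds only
```

With these invented numbers, the sounds alone pick "wreck a nice beach", while adding the word-context score flips the choice to "recognize speech".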

While neural networks and machine learning are not new, their application to video search is unique to Blinkx, and very clever.

How good is Blinkx search? When you visit blinkx.com, the first thing you see is the video wall, 25 small, shimmering tiles, each displaying a popular video clip, indexed that hour. (The wall provides a powerful sense of the collective mind of our popular culture.)

To experiment, I typed in the phrase "Chronic WHAT cles of Narnia," the shout-out in the "Saturday Night Live" digital short called "Lazy Sunday," a rap parody of two New York slackers. I wanted a phrase that a Web surfer would know more readily than the real title of a video. I also knew that "Lazy Sunday," for all its cultish fame, would be hard to find: NBC Universal had freely released the rap parody on the Internet after broadcasting it in December 2005, but last month the company insisted that YouTube pull it.

Nonetheless, Blinkx found eight instances of "Lazy Sunday" when I tried it last week. By contrast, Google Video found none. Typing "Lazy Sunday" into the keyword search box on Google's home page produced hundreds of results, but many were commentaries about the video, and many had nothing to do with "Saturday Night Live."

Blinkx, which has raised more than $12.5 million from angel investors, earns money by licensing its technology to other sites. Although Blinkx has more than 80 such partners, including Microsoft, Playboy, Reuters and MTV, it rarely discloses the terms of its deals. Mr. Chandratillake said some licensees pay Blinkx directly while others share revenue and some do both. Blinkx has revealed the details of one deal: ITN, a British news broadcaster, will share the revenue generated by advertising inserted in its videos.

For all of Blinkx's level coolness, there are at least three obvious obstacles to the company's success.

First, because Google Video is not much good now doesn't mean it won't get better: after all, when Blinkx was founded, it first applied machine learning to searching the desktops of personal computers, a project that was abandoned when Google and Microsoft released their own desktop search bars.

Second, even if Google improbably fails to develop effective video search, the field will still be crowded: TruVeo, Flurl, ClipBlast and other start-ups are all at work on different subsets of the market.

Finally, Blinkx might not go far enough in searching the content of videos: the company searches their sounds, but not their images.

THIS last objection is the most serious.

"Because Blinkx emphasizes speech recognition, there is a great amount of multimedia content that they cannot address, like photographs," said John R. Smith, a senior manager in the intelligent information management department of I.B.M.'s T. J. Watson Research Center in Hawthorne, N.Y. "But what's worse, speech is not a very good indicator of what's being shown in a video."

Mr. Smith says he has been working on an experimental video search engine called Marvel, which also uses machine learning but organizes visual information as well as speech.

Still, at least for now, Blinkx leads video search: it searches more than seven million hours of video and is the largest repository of digital video on the Web.

"Search is our navigation, our interface to the Internet," said John Battelle, chief of Federated Media Publishing and author of "The Search," an account of the rise of Google. With Blinkx, we may have such an interface for digital video, and be a little closer to Mr. Chandratillake's vision of a universal remote control.

Jason Pontin is the editor in chief and publisher of Technology Review, a magazine and Web site owned by M.I.T. E-mail: pontin@nytimes.com.