
Millions of Videos, and Now a Way to Search Inside Them - onebeerdave
http://www.nytimes.com/2007/02/25/business/yourmoney/25slip.html?_r=1&oref=slogin
======
Nick_Smith
I'm not much of an expert on this stuff... what do you all think about their
chances?

~~~
dangrsmind
I'd say I'm a little skeptical.

The first question I'd have is how fast they can parse video. The second is
how much it costs to do it.

It seems you would have to be able to do recognition much faster than real-
time for a realistic web video search capability (see for example
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=599600) and you would
certainly need a lot of hardware to do this at scale for millions of video
clips.

See also: http://www.newmediamusings.com/blog/2005/09/blinkx_a_citize.html

~~~
jwp
The first link you cited is spot on. The authors are from Univ of Cambridge,
and work on HTK <http://htk.eng.cam.ac.uk/>.

That paper is 10 years old. As I'm sure you can imagine, there have been
improvements in the field since then. To be completely honest, I don't stay on
top of search applied to speech, but the keyword you want is "Spoken Document
Retrieval" (SDR). Ciprian Chelba and TJ Hazen do cool stuff in this area; they
are giving a tutorial on SDR at ICASSP this year.

An aside. Both of these approaches use the fact that when you process speech,
you essentially form a graph of words (or phonemes). Paths through the graph
represent possible transcriptions. Since the graph is a denser, richer thing
to search than the transcript, and we've got graph algorithms sitting around,
there are neat tricks you can do to build a search engine index for speech...
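The lattice trick can be sketched in a few lines. This is a toy illustration, not HTK's or anyone's production code; the lattice below (node -> list of (word, next_node, probability) edges) is invented. Indexing every word on any path through the graph makes alternative hypotheses searchable that a single 1-best transcript would drop:

```python
from collections import defaultdict

# Hypothetical word lattice for one clip: node -> [(word, next_node, prob)].
# Node 3 is the final node; two competing paths share it.
lattice = {
    0: [("recognize", 1, 0.6), ("wreck", 2, 0.4)],
    1: [("speech", 3, 0.9)],
    2: [("a", 4, 1.0)],
    4: [("nice", 5, 0.8)],
    5: [("beach", 3, 0.7)],
    3: [],
}

def index_lattice(clip_id, lattice, start=0):
    """Walk every edge reachable from `start` and map each word to the clip,
    so the index covers all hypothesized transcriptions, not just the best."""
    index = defaultdict(set)
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for word, nxt, _prob in lattice.get(node, []):
            index[word].add(clip_id)
            stack.append(nxt)
    return index

idx = index_lattice("clip42", lattice)
# Both "speech" and "beach" now retrieve the clip, even though a 1-best
# transcript would have kept only one of the two paths.
```

A real system would also keep edge probabilities in the postings so results can be ranked by how confident the recognizer was in each hypothesis.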

I've recently been reading some interesting work that uses locality-sensitive
hashing to search audio. The Google speech people are presenting a lot of it
at ICASSP this year. See this post for more, and chase the links in their
papers for even more:
<http://googleresearch.blogspot.com/2007/02/hear-here-sample-of-audio-processing.html>
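The core idea of locality-sensitive hashing is easy to sketch. This is a generic random-hyperplane LSH toy, not the scheme from the Google papers; the dimensions, bit count, and "audio feature" vectors are all made up for illustration:

```python
import random

random.seed(0)
DIM, BITS = 16, 8
# Random hyperplanes; each contributes one bit of the signature.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def lsh_signature(vec):
    """One bit per hyperplane: which side of the plane the vector falls on.
    Vectors at a small angle tend to get the same bits, so near-duplicates
    collide in the same hash bucket and can be found without a linear scan."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                 for plane in planes)

a = [random.gauss(0, 1) for _ in range(DIM)]   # stand-in audio feature vector
b = [x + random.gauss(0, 0.01) for x in a]     # slightly perturbed copy
hamming = sum(x != y for x, y in zip(lsh_signature(a), lsh_signature(b)))
# `hamming` is small: the perturbed clip lands in (nearly) the same bucket.
```

In practice one uses several independent hash tables so that a near-duplicate only has to collide in one of them to be retrieved.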

~~~
dangrsmind
Thanks for the information and links. My background is in video and image
processing, well originally multiple target tracking, sensor management, and
sensor fusion, but now I work in biometrics and video analytics. Understood
about processing the information into a graph.

Your point about Google raises one of the obvious questions about this
company... if Google is doing leading-edge research in this field, it seems
unlikely they need to buy a "video search destination" site employing lesser
technologies, unless it gets really, really big (i.e. YouTube). They might be
interested in some deep technology, but my impression from the reading I've
done and the links you've posted is that Blinkx is using standard, well-known
techniques to achieve their results.

FWIW: I was applying Markov modeling to areas such as mission planning and
modeling integrated air defense networks back almost twenty years ago now. We
didn't call them HMMs, but there were some very similar ideas employed.

~~~
jwp
Hmm, perhaps we should talk. Email me at e40.32313371@bloglines.com if you're
interested.

------
Nick_Smith
Since the NYT demands your info:

THE World Wide Web is awash in digital video, but too often we can't find the
videos we want or browse for what we might like.

Thats a loss, because if we could search for Internet videos, they might
become the content of a global television station, just as the Webs
hypertext, once it was organized and tamed by search, became the stuff of a
universal library.

"What we need," says Suranga Chandratillake, a co-founder of Blinkx, a
start-up in San Francisco, "is a remote control for the Web's videos, a kind
of electronic TV Guide." He's got just the thing.

Videos have multiplied on social networks like YouTube and MySpace as well as
on news and entertainment sites because of the emergence of video-sharing,
user-generated video, free digital storage and broadband and Wi-Fi networks.

Today, owing to the proliferation of large video files, video accounts for
more than 60 percent of the traffic on the Internet, according to CacheLogic,
a company in Cambridge, England, that sells media delivery systems to
Internet service providers. "I imagine that within two years it will be 98
percent," says Hui Zhang, a computer scientist at Carnegie Mellon University
in Pittsburgh.

But search engines like Google, which were developed during the first,
text-based era of the Web, do a poor job of searching through this rising sea
of video. That's because they don't search the videos themselves, but rather
things associated with them, including the text of a Web page, the metadata
that computers use to display or understand pages (like keywords or the
semantic tags that describe different content), video-file suffixes (like
.mpeg or .avi), or captions or subtitles.

None of these methods is very satisfactory. Many Internet videos have little
or obscure text, and clips often carry no metadata or misleading metadata.
Modern video players do not reveal video-file suffixes, and captions and
subtitles imperfectly capture the spoken words in a video.

The difficulties of knowing which videos are where challenge the growth of
Internet video. "If there are going to be hundreds of millions of hours of
video content online," Mr. Chandratillake said, "we need to have an efficient,
scalable way to search through it."

Mr. Chandratillakes history is unusual for Silicon Valley. He was born in Sri
Lanka in 1977 and divided his childhood among England and various countries in
South Asia where his father, a professor of nuclear chemistry, worked. Then he
studied distributed processing at Kings College, Cambridge, before becoming
the chief technology officer of Autonomy, a company that specializes in
something called meaning-based computing. This background possibly suggested
an original approach to search when he founded Blinkx in 2004.

Mr. Chandratillakes solution does not reject any existing video search
methods, but supplements them by transcribing the words uttered in a video,
and searching them. This is an achievement: effective speech recognition is a
nontrivial problem, in the language of computer scientists.

Blinkxs speech-recognition technology employs neural networks and machine
learning using hidden Markov models, a method of statistical analysis in
which the hidden characteristics of a thing are guessed from what is known.
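That idea of guessing hidden characteristics from what is known can be made concrete with a tiny HMM and Viterbi decoding. Everything below is invented for illustration (two phoneme-like hidden states, two observable acoustic cues); it is not Blinkx's model:

```python
# Hidden states are phoneme-like labels; observations are acoustic cues.
# Viterbi recovers the most probable hidden sequence behind the observations.
states = ["S", "P"]
start = {"S": 0.6, "P": 0.4}
trans = {"S": {"S": 0.7, "P": 0.3}, "P": {"S": 0.4, "P": 0.6}}
emit  = {"S": {"hiss": 0.8, "pop": 0.2}, "P": {"hiss": 0.1, "pop": 0.9}}

def viterbi(obs):
    """Dynamic programming over the hidden states: keep, for each state, the
    probability of the best path ending there, plus backpointers."""
    V = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[-1][p] * trans[p][s])
            col[s] = V[-1][prev] * trans[prev][s] * emit[s][o]
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["hiss", "pop", "pop"]))  # -> ['S', 'P', 'P']
```

The observed cues never name the hidden states directly; the decoder infers them from the emission and transition probabilities, which is the sense in which "hidden characteristics are guessed from what is known."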

Mr. Chandratillake calls this method "contextual search," and he says it works
so well because the meanings of the sounds of speech are unclear when
considered by themselves. "Consider the phrase 'recognize speech,'" he wrote
in an e-mail message. "Its phonemes (rek-un-nise-peach) are incredibly
similar to those contained in the phrase 'wreck a nice beach.' Our systems use
our knowledge of which words typically appear in which contexts and everything
we know about a given clip to improve our ability to guess what each phoneme
actually means."
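How context breaks the tie can be sketched with a toy language model. The bigram probabilities below are made up, and the acoustic fit of the two transcriptions is assumed equal, so only the word-sequence prior decides; a real system would also weigh acoustic scores:

```python
import math

# Hypothetical bigram probabilities, standing in for "knowledge of which
# words typically appear in which contexts."
bigram = {
    ("recognize", "speech"): 0.020,
    ("wreck", "a"): 0.001,
    ("a", "nice"): 0.050,
    ("nice", "beach"): 0.004,
}

def lm_logprob(words, floor=1e-6):
    """Log-probability of the word sequence under the bigram model."""
    return sum(math.log(bigram.get(pair, floor))
               for pair in zip(words, words[1:]))

# Two transcriptions that fit the same phonemes; the language model prefers
# the one whose word pairs are more typical.
candidates = [["recognize", "speech"], ["wreck", "a", "nice", "beach"]]
best = max(candidates, key=lm_logprob)
```

In a clip about shoreline cleanup, the bigram counts would tilt the other way, which is exactly the "everything we know about a given clip" part of the quote.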

While neural networks and machine learning are not new, their application to
video search is unique to Blinkx, and very clever.

How good is Blinkx search? When you visit blinkx.com, the first thing you see
is the "video wall": 25 small, shimmering tiles, each displaying a popular
video clip, indexed that hour. (The wall provides a powerful sense of the
collective mind of our popular culture.)

To experiment, I typed in the phrase "Chronic (WHAT) cles of Narnia," the
shout-out in the "Saturday Night Live" digital short called "Lazy Sunday," a
rap parody of two New York slackers. I wanted a phrase that a Web surfer would
know more readily than the real title of a video. I also knew that "Lazy
Sunday," for all its cultish fame, would be hard to find: NBC Universal had
freely released the rap parody on the Internet after broadcasting it in
December 2005, but last month the company insisted that YouTube pull it.

Nonetheless, Blinkx found eight instances of "Lazy Sunday" when I tried it
last week. By contrast, Google Video found none. Typing "Lazy Sunday" into the
keyword search box on Google's home page produced hundreds of results, but
many were commentaries about the video, and many had nothing to do with
"Saturday Night Live."

Blinkx, which has raised more than $12.5 million from angel investors, earns
money by licensing its technology to other sites. Although Blinkx has more
than 80 such partners, including Microsoft, Playboy, Reuters and MTV, it
rarely discloses the terms of its deals. Mr. Chandratillake said some
licensees pay Blinkx directly, while others share revenue, and some do both.
Blinkx has revealed the details of one deal: ITN, a British news broadcaster,
will share the revenue generated by advertising inserted in its videos.

For all of Blinkxs level coolness, there are at least three obvious obstacles
to the companys success.

First, that Google Video is not much good now doesn't mean it won't get
better: after all, when Blinkx was founded, it first applied machine learning
to searching the desktops of personal computers, a project that was abandoned
when Google and Microsoft released their own desktop search bars.

Second, even if Google improbably fails to develop effective video search, the
field will still be crowded: TruVeo, Flurl, ClipBlast and other start-ups are
all at work on different subsets of the market.

Finally, Blinkx might not go far enough in searching the content of videos:
the company searches their sounds, but not their images.

THIS last objection is the most serious.

"Because Blinkx emphasizes speech recognition, there is a great amount of
multimedia content that they cannot address, like photographs," said John R.
Smith, a senior manager in the intelligent information management department
of I.B.M.'s T. J. Watson Research Center in Hawthorne, N.Y. "But what's worse,
speech is not a very good indicator of what's being shown in a video."

Mr. Smith says he has been working on an experimental video search engine
called Marvel, which also uses machine learning but organizes visual
information as well as speech.

Still, at least for now, Blinkx leads video search: it searches more than
seven million hours of video and is the largest repository of digital video on
the Web.

"Search is our navigation, our interface to the Internet," said John Battelle,
chief of Federated Media Publishing and author of "The Search," an account of
the rise of Google. With Blinkx, we may have such an interface for digital
video, and be a little closer to Mr. Chandratillake's vision of a universal
remote control.

Jason Pontin is the editor in chief and publisher of Technology Review, a
magazine and Web site owned by M.I.T. E-mail: pontin@nytimes.com.

