
Show HN: Deep learning visual search and data analytics - aub3bhat
https://github.com/AKSHAYUBHAT/DeepVideoAnalytics
======
aub3bhat
Deep Video Analytics is an open source visual data analytics platform. The
platform uses deep learning based indexing, detection and recognition models
for visual search. Using Deep Video Analytics, users can quickly load,
annotate and index images & videos. They can detect and recognize objects (such
as faces) and seamlessly import and share processed datasets using Visual Data
Network. Deep Video Analytics is developed using Django, Postgres, TensorFlow
& Docker to enable flexible deployment. You can find more information here:
[https://deepvideoanalytics.com/](https://deepvideoanalytics.com/)

~~~
garysieling
When you say video, are you talking just images, or also audio? I built a
video search site
([https://www.findlectures.com](https://www.findlectures.com)) and I'm
investigating options like this to feed more ranking factors into the system.

~~~
aub3bhat
Hi, the system is under active development. I am planning to add support for
audio processing using:

1\.
[https://projects.csail.mit.edu/soundnet/](https://projects.csail.mit.edu/soundnet/)

2\.
[https://github.com/aalireza/SimpleAudioIndexer](https://github.com/aalireza/SimpleAudioIndexer)
(using PocketSphinx, NOT Watson)

~~~
garysieling
Awesome, thanks!

------
Omnipresent
This is phenomenal. Is the primary use case for this to:

\- Find similar looking frames?

Question:

\- Does it perform object detection on the frame? Similar to the video demo on
Clarifai - [https://clarifai.com/demo](https://clarifai.com/demo) ?

~~~
aub3bhat
We have Visual Search as the primary interface. However, the goal is to build
an application-agnostic visual data analytics platform. Similar to a relational
database, we have high-level concepts of indexers (convert an image/bounding
box into a feature vector), clusterers (cluster feature vectors) and retrievers
(retrieve similar images/objects/annotated regions).

To answer the second question: we also detect objects (VOC, YOLO 9000, faces,
etc.), and detected objects are also indexed and retrieved when performing
visual search. Further, you can perform clustering on this set of "indexing"
vectors for things such as fast retrieval and quick labeling/annotation. We use
Flickr's LOPQ to implement ANN (approximate nearest neighbor) search, but like
everything else you can plug in a custom algorithm. I am working on adding
indexing over any set of annotations/detections/frames.
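
As a rough illustration of the indexer/retriever split described above, here
is a minimal sketch in plain Python. It is not the project's code: the names
`toy_indexer`, `BruteForceRetriever` and the region ids are hypothetical, the
"indexer" just passes its input through, and exact brute-force cosine search
is swapped in for the LOPQ-based ANN.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class IndexEntry:
    region_id: str       # hypothetical id of a frame, detection or annotation
    vector: List[float]  # feature vector produced by an indexer


class BruteForceRetriever:
    """Exact nearest-neighbour search; a stand-in for an ANN index."""

    def __init__(self) -> None:
        self.entries: List[IndexEntry] = []

    def add(self, region_id: str, vector: Vector) -> None:
        self.entries.append(IndexEntry(region_id, list(vector)))

    def search(self, query: Vector, k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored vector against the query, best match first.
        scored = [
            (entry.region_id, cosine_similarity(query, entry.vector))
            for entry in self.entries
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def toy_indexer(pixels: Sequence[float]) -> List[float]:
    """Hypothetical indexer: a real one would run a neural network here."""
    return [float(p) for p in pixels]


retriever = BruteForceRetriever()
retriever.add("frame_1", toy_indexer([1.0, 0.0, 0.0]))
retriever.add("frame_2/face_0", toy_indexer([0.0, 1.0, 0.0]))
retriever.add("frame_3", toy_indexer([0.9, 0.1, 0.0]))

# Query with a vector identical to frame_1; frame_3 is the next closest.
results = retriever.search(toy_indexer([1.0, 0.0, 0.0]), k=2)
```

Because frames and detected objects share one vector space here, a single
query can return whole frames and bounding-box regions side by side.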

You can find more information about the design goals and vision behind the
project in the presentation at
[https://deepvideoanalytics.com/](https://deepvideoanalytics.com/)

------
thedatamonger
This looks awesome. I look forward to the usual hacker news banter of "yes
this is awesome but this is awesomer, see xyz" :) yes awesomer is a word

------
dk8996
Is there a way to find other things besides faces, for example a blue car or a
logo?

~~~
aub3bhat
All frames and detected objects are indexed using Inception pool3 features,
which serve as a general-purpose indexer. So yes, you can search for arbitrary
objects such as a blue car, a logo or a particular "scene". At the same time,
when the definition of "similarity" is more object-specific/fine-grained (such
as in the case of faces), you can use a custom network (such as FaceNet for
faces) to generate the embedding/indexing vector.

To summarize: yes, we provide two indexers out of the box, a general-purpose
Inception v3 and a FaceNet. We plan to add more indexers soon, e.g. trained on
Open Images or other domain-specific datasets.
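
One way to picture "general-purpose indexer by default, custom indexer for
specific object types" is a small lookup table. The sketch below is only an
assumption about how such dispatch could look; the function names and the
`INDEXERS_BY_OBJECT_TYPE` table are hypothetical, and both indexers are
placeholders rather than real networks.

```python
import math
from typing import Callable, Dict, List, Tuple

Indexer = Callable[[List[float]], List[float]]


def general_purpose_indexer(pixels: List[float]) -> List[float]:
    """Placeholder for a general-purpose network; it just copies the input."""
    return list(pixels)


def face_indexer(pixels: List[float]) -> List[float]:
    """Placeholder for a face-specific network; it L2-normalises the input."""
    norm = math.sqrt(sum(p * p for p in pixels))
    return [p / norm for p in pixels] if norm else list(pixels)


# Hypothetical registry: object types that get a specialised indexer.
INDEXERS_BY_OBJECT_TYPE: Dict[str, Indexer] = {"face": face_indexer}


def index_region(object_type: str, pixels: List[float]) -> Tuple[str, List[float]]:
    """Pick the specialised indexer if one exists, else the general one."""
    indexer = INDEXERS_BY_OBJECT_TYPE.get(object_type, general_purpose_indexer)
    return indexer.__name__, indexer(pixels)


face_name, face_vector = index_region("face", [3.0, 4.0])
car_name, car_vector = index_region("car", [3.0, 4.0])
```

Adding another domain-specific indexer would then be a matter of registering
one more entry in the table.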

------
Jayakumark
Exactly what's needed now, given growing datasets and deep learning models.

~~~
aub3bhat
Yeah, I am actively working on improving Visual Data Network. Essentially, the
goal is to make sharing, downloading and configuring datasets a single-click
operation.

Sharing visual data opens up a whole new set of opportunities for both
businesses and researchers.

~~~
Omnipresent
Can you please explain some real world use cases you're thinking about?

------
r0lisz
Could this be used to power something similar to Google Photos?

~~~
aub3bhat
Yes, we have the entire pipeline (detection -> embedding -> clustering) for
faces, as well as the ability to extract text tags using models such as those
trained on Open Images.
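
The detection -> embedding -> clustering flow can be sketched end to end in a
few lines. Everything below is a toy under stated assumptions: `detect_faces`
and `embed_face` are hypothetical stand-ins that read precomputed values
instead of running models, and the greedy threshold clustering is a simple
substitute for whatever clusterer the platform actually uses.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]


def detect_faces(frame: Dict) -> List[Box]:
    """Hypothetical detector: returns boxes stored in the toy frame."""
    return frame["boxes"]


def embed_face(frame: Dict, box: Box) -> List[float]:
    """Hypothetical embedder: looks up a toy embedding for the box."""
    return frame["embeddings"][box]


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster(embeddings: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Greedy clustering: join the first cluster whose representative (its
    first member) is within `threshold`, otherwise start a new cluster."""
    representatives: List[List[float]] = []
    labels: List[int] = []
    for vec in embeddings:
        for label, rep in enumerate(representatives):
            if euclidean(vec, rep) <= threshold:
                labels.append(label)
                break
        else:
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


def group_faces(frames: List[Dict]) -> List[int]:
    """Run detection, then embedding, then clustering over all frames."""
    embeddings = [embed_face(f, box) for f in frames for box in detect_faces(f)]
    return cluster(embeddings)


frames = [
    {"boxes": [(0, 0, 10, 10)],
     "embeddings": {(0, 0, 10, 10): [0.0, 0.0]}},
    {"boxes": [(5, 5, 15, 15), (20, 20, 30, 30)],
     "embeddings": {(5, 5, 15, 15): [0.1, 0.0], (20, 20, 30, 30): [1.0, 1.0]}},
]
labels = group_faces(frames)  # one cluster label per detected face
```

Faces that land in the same cluster would be treated as the same person, which
is the grouping a Google Photos style album view needs.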

------
salilpn12
Is there a way to create an API which returns these features?

~~~
aub3bhat
The features are stored in .npy files. Currently there is a REST API, but it's
only for Django models (using the amazing Django REST framework). For search
and feature retrieval, creating an API is straightforward and I will add one
soon.
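
Until such an endpoint exists, the core of it would likely be "load the .npy
file, return it as JSON". The sketch below shows only that step with NumPy;
the `features_as_json` helper, the payload layout and the file name are
assumptions for illustration, not the project's API.

```python
import json
import os
import tempfile

import numpy as np


def features_as_json(npy_path: str) -> str:
    """Load a feature array from a .npy file and serialise it as JSON."""
    features = np.load(npy_path)
    return json.dumps({
        "shape": list(features.shape),
        "dtype": str(features.dtype),
        "features": features.tolist(),  # nested Python lists of floats
    })


# Round trip with a small made-up array standing in for real features.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame_0.npy")  # hypothetical file name
    np.save(path, np.array([[0.25, 0.5], [0.75, 1.0]], dtype=np.float32))
    payload = json.loads(features_as_json(path))
```

A web view would return the same string as its response body. For large
feature arrays, JSON gets bulky, so serving the raw .npy bytes may be the
better choice.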

