Deep Video Analytics is an open source visual data analytics platform. The platform uses deep learning based indexing, detection and recognition models for visual search. Using Deep Video Analytics, users can quickly load, annotate, and index images and videos. They can detect and recognize objects (such as faces) and seamlessly import and share processed datasets using the Visual Data Network. Deep Video Analytics is developed using Django, Postgres, TensorFlow and Docker to enable flexible deployment. You can find more information here: https://deepvideoanalytics.com/
When you say video, are you talking just images, or also audio? I built a video search site (https://www.findlectures.com) and I'm investigating options like this to feed more ranking factors into the system.
We have Visual Search as the primary interface. However, the goal is to build an application-agnostic visual data analytics platform. Similar to a relational database, we have high-level concepts of indexers (convert an image/bounding box into a feature vector), clusterers (cluster feature vectors) and retrievers (retrieve similar images/objects/annotated regions).
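To make the indexer/retriever split concrete, here is a minimal sketch in Python. It is not the project's actual code: the class names `MeanColorIndexer` and `BruteForceRetriever` and their methods are hypothetical, and the "feature" is just the mean color of the image rather than a deep network embedding.

```python
import numpy as np


class MeanColorIndexer:
    """Toy indexer (hypothetical): maps an H x W x 3 image to a 3-d vector."""

    def apply(self, image):
        # Average over all pixels, keeping one value per color channel.
        return np.asarray(image, dtype=np.float32).reshape(-1, 3).mean(axis=0)


class BruteForceRetriever:
    """Toy retriever (hypothetical): exact nearest-neighbor lookup."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, item_id, vector):
        self.ids.append(item_id)
        self.vectors.append(np.asarray(vector, dtype=np.float32))

    def query(self, vector, k=1):
        # Euclidean distance from the query to every stored vector.
        dists = np.linalg.norm(np.stack(self.vectors) - vector, axis=1)
        return [self.ids[i] for i in np.argsort(dists)[:k]]
```

Any model that turns an image or bounding box into a fixed-length vector could stand in for the indexer here, and the retriever only ever sees vectors, which is what keeps the two concepts independent.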
To answer the second question: we also detect objects (VOC, YOLO 9000, faces, etc.), and detected objects are also indexed and retrieved when performing visual search. Further, you can perform clustering on this set of "indexing" vectors for things such as fast retrieval and quick labeling/annotation. We use Flickr's LOPQ to implement approximate nearest neighbor (ANN) search, but like everything else, you can use a custom algorithm. I am working on adding indexing over any set of annotations/detections/frames.
You can find more information about the design goals and vision behind the project in the presentation at https://deepvideoanalytics.com/
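As a rough illustration of why clustering helps with fast retrieval, here is a sketch of a coarse-quantizer search in Python. To be clear, this is not LOPQ and not what the project ships: it is plain k-means plus a search restricted to the query's nearest cluster, and all function names are made up for the example.

```python
import numpy as np


def assign(vectors, centroids):
    """Return, for each vector, the index of its nearest centroid."""
    dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(vectors, k, iters=10, seed=0):
    """Plain k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), size=k, replace=False)
    centroids = vectors[start].copy()
    for _ in range(iters):
        labels = assign(vectors, centroids)
        for c in range(k):
            members = vectors[labels == c]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[c] = members.mean(axis=0)
    return centroids, assign(vectors, centroids)


def search(query, vectors, centroids, labels, k=3):
    """Approximate search: only scan vectors in the query's nearest cluster."""
    cell = assign(query[None, :], centroids)[0]
    candidates = np.flatnonzero(labels == cell)
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]
```

The search is approximate because a true neighbor sitting just across a cluster boundary is never scanned. The same cluster labels can also drive quick labeling, since one annotation can be applied to a whole cluster at once.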
All frames and detected objects are indexed using Inception pool3 features, which serve as a general-purpose indexer. So yes, you can search for arbitrary objects such as a blue car or a long or a particular "scene". At the same time, when the definition of "similarity" is more object-specific or fine-grained (as in the case of faces), you can use a custom network (such as FaceNet for faces) to generate the embedding/indexing vector.
To summarize: yes, we provide two indexers out of the box, a general-purpose Inception v3 and a FaceNet. We plan to add more indexers soon, e.g. ones trained on Open Images or another domain-specific dataset.
Yeah, I am actively working on improving the Visual Data Network. Essentially the goal is to make sharing, downloading and configuring datasets a single-click operation.
Sharing visual data opens up a whole new set of opportunities for both businesses and researchers.
The features are stored in .npy files. Currently there is a REST API, but it's only for the Django models (using the amazing Django REST framework). For search and feature retrieval, creating an API is straightforward and I will add one soon.
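Since the features are plain .npy files, you can already work with them directly from NumPy. Here is a small sketch that assumes a file holds one feature vector per row. The file name, the 2048-d width (the usual size of Inception v3 pool3 output) and the helper names are assumptions for the example, not the project's actual layout.

```python
import os
import tempfile

import numpy as np


def save_features(path, features):
    """Write an (n_items, dim) feature matrix to a .npy file."""
    np.save(path, np.asarray(features, dtype=np.float32))


def load_features(path):
    """Read a feature matrix back from a .npy file."""
    return np.load(path)


def top_k(query, matrix, k=5):
    """Return row indices of the k most cosine-similar feature vectors."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return np.argsort(-(m @ q))[:k]


# Stand-in data: random vectors instead of real network features.
demo_path = os.path.join(tempfile.mkdtemp(), "features.npy")
save_features(demo_path, np.random.default_rng(0).random((100, 2048)))
demo_features = load_features(demo_path)
nearest = top_k(demo_features[7], demo_features, k=5)
```

A search endpoint would mostly be a thin wrapper around something like `top_k`: take an uploaded image, run it through the indexer, and return the matching row ids.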