Current state of the art in objects classification (rodrigob.github.io)
90 points by ZeljkoS on Aug 14, 2018 | 19 comments



I see object detection systems often compared by accuracy percentages but rarely by speed.

I ask because I'm building a surveillance system that uses object detection to answer simple questions, like: "Is there a squirrel on my bird feeder?", "Is my car in the garage?", "Is there a package on my front step?".

Right now I'm playing with Darknet and YOLO from pjreddie because he emphasized performance, and I'm getting about 69 fps on my 1080 Ti.

Is speed not considered important by researchers at this time?

Is there any system that provides decent performance on CPUs? Dedicating $700 GPUs to detecting squirrels on the bird feeder is fine for crazy people like me, but others may find that excessive.


Inference speed and network size are definitely considered important, and some researchers are definitely tackling them (e.g. https://arxiv.org/abs/1602.07360). However, it's more of an engineering area (meaning lots of engineers are working on it) and less of a research area (meaning fewer papers get produced per unit of work), partly because there isn't much in the way of performance-per-GFLOP benchmarks.

CPU performance is good-ish when you're doing sequential stuff (e.g. it's not bad at old-school RNNs), but for image recognition even a cheap graphics unit is still going to be better. And if the question is "can I use cheaper hardware and still get good performance?", the answer is yes; you just have to use some tricks (e.g. distillation).
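For context, "distillation" here just means training a small student network to mimic a big teacher's softened outputs, so the student is cheap enough for CPU inference. A minimal sketch of the loss (PyTorch, with stand-in logits and assumed temperature/weighting hyperparameters), not anyone's exact recipe:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Stand-in batch: 8 images, 10 classes.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```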

There are plenty of image recognition projects running on the cheap Raspberry Pi and its graphics unit.


Inference speed is very important. The TensorFlow object detection model zoo lists timings for its models; we picked our model to meet timing demands on a cell phone.

I don't think CPUs are going to give you decent performance unless you micro-optimize the inference for the host architecture, and even then the gains are marginal.

If you want to avoid expensive GPUs, take a look at things like Intel's Neural Compute Stick, Google's Vision Kit and the Edge TPU. But you can also run models on modestly priced cell phones: we use $100 cell phones to do object detection and tracking, though it's not pretty.
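For a rough idea of that workflow, here's a minimal sketch of timing a detection model converted to TensorFlow Lite on the CPU; the .tflite file name is a placeholder, the conversion step isn't shown, and the interpreter lived under tf.contrib.lite in older TF versions:

```python
import time
import numpy as np
import tensorflow as tf

# Load a previously converted model (e.g. an SSD from the model zoo).
interpreter = tf.lite.Interpreter(model_path="detect.tflite")  # placeholder path
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Stand-in frame with whatever shape/dtype the model expects.
frame = np.zeros(inp["shape"], dtype=inp["dtype"])

runs = 50
start = time.time()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], frame)
    interpreter.invoke()
print("avg ms/frame:", (time.time() - start) / runs * 1000)
```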


BTW, try SSD; it was even a bit faster than YOLOv2.


Is there a Dockerfile available for playing with SSD if you're not working in the domain?

I have some 100Ks of pics I'd like to put through classifiers, but I could never get YOLOv2 working properly in a container, in part because even after reading the code I couldn't figure out how to simply point a trained model at a directory of images and get classifications out... :/

Definitely PEBKAC, but if there are any dead-simple tutorials available I'd love a pointer. Classification speed isn't a factor; it would all be CPU, non-real-time, and could be leisurely. I'm just interested in seeing what the state of the art is in classifying a personal image set without training against it.


There are some wrappers around YOLOv3 in Python which may help: https://github.com/qqwweee/keras-yolo3

It's not Dockerised, but you could grab a container with the usual machine learning packages and install it on top.

Also look at NVIDIA DIGITS (https://github.com/NVIDIA/DIGITS), which has Docker images.
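If all you want is classification (rather than detection) over a directory, a pretrained ImageNet model needs no training at all. A rough sketch with tf.keras; the "my_pix/" directory is a placeholder, and everything runs fine (if slowly) on CPU:

```python
import glob
import numpy as np
from tensorflow.keras.applications.resnet50 import (
    ResNet50, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet")  # downloads pretrained weights on first use

for path in glob.glob("my_pix/*.jpg"):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    _, label, score = decode_predictions(model.predict(x), top=1)[0][0]
    print(path, label, round(float(score), 3))
```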


Thanks Josh!


Hmm https://github.com/knjcode/mxnet-finetuner looks like maybe what I want...

Any comments from people who know what they're doing welcome :)


Check out MobileNetV2 as a backbone; the paper uses it for SSD, and I believe the weights are released.
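If it helps, a minimal sketch of what "MobileNetV2 as a backbone" means in practice with tf.keras; the SSD detection heads that go on top aren't shown, and the input size is just an example:

```python
import tensorflow as tf

# MobileNetV2 with released ImageNet weights, used purely as a feature extractor.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,   # drop the classifier head, keep the convolutional features
    weights="imagenet",
)
feats = backbone(tf.zeros([1, 224, 224, 3]))
print(feats.shape)  # (1, 7, 7, 1280) feature map that a detection head would consume
```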


Agreed. I wish that more effort would be made towards reducing inference time, since that is the limiting factor in many real-world applications.


Speed is considered a fundamental problem; see e.g. the aforementioned YOLO and _Faster_ R-CNN.


If you search for "tensorflow raspberry", you will find several pointers on running TF on cheaper hardware. You would probably get about 1 frame per second with it.

You may also look at recent Snapdragon-powered dev boards, which have accelerator cores for TF and provide a serious FPS boost for less than $700.

Also, a cloud ML service is an option if you only have a few inferences per day to process.


> Last updated on 2016-02-22.

Given the speed of development in machine learning, the numbers are probably outdated.


Many years of "human optimization" on the same old datasets must lead to severe overfitting, so these results should be taken with a grain of salt. Related research:

https://arxiv.org/abs/1806.00451


Have you read the linked paper? It finds no evidence of such overfitting happening, despite common fears of this phenomenon.

Quote: "This shows that the current research methodology of “attacking” a test set for an extended period of time is surprisingly resilient to overfitting."


This is two years out of date. The current state of the art for CIFAR has about a quarter of the error rate:

https://arxiv.org/abs/1805.09501


Similar data (but for a wider range of problems - not just object classification) was tabulated by the EFF's AI Progress Measurement project [1]; I made an alternative interface to visualise it [2].

[1]: https://www.eff.org/ai/metrics

[2]: https://jamesscottbrown.github.io/ai-progress-vis/index.html


Anyone know of something like this for face recognition?


Must be very old; DenseNet-BC isn't mentioned anywhere...



