
Object Detection with Open Images using Tensorflow - mikeyanderson
https://blog.algorithmia.com/deep-dive-into-object-detection-with-open-images-using-tensorflow/
======
joshvm
One thing that really puts me off Tensorflow is the large amount of work you
have to do just to get your data in.

I recently trained a detector using Nvidia's detectnet and the DIGITS
environment really makes it easy from dataset/database creation to model
generation and training. The web interface handles all the padding and
resizing for you when you import your images, and detectnet does online
augmentation during training. It takes data in KITTI format which is just
image + textfile with labels. No wrangling in Python needed.
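For reference, a KITTI detection label file is one object per line, whitespace-separated; for plain 2D detection only the class name (field 0) and the pixel bounding box (fields 4-7) matter. A minimal sketch of a parser (the function name and dict layout here are mine, not from detectnet or DIGITS):

```python
# Minimal parser for KITTI-format detection labels: one object per
# line; field 0 is the class name, fields 4-7 are the 2D bbox as
# (left, top, right, bottom) in pixels. The remaining fields hold
# truncation/occlusion and 3D info that 2D detection ignores.

def parse_kitti_labels(text):
    objects = []
    for line in text.strip().splitlines():
        fields = line.split()
        objects.append({
            "class": fields[0],
            "bbox": tuple(float(v) for v in fields[4:8]),
        })
    return objects

sample = ("Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 "
          "1.65 1.67 3.64 -0.65 1.71 46.70 -1.59")
labels = parse_kitti_labels(sample)
assert labels[0]["class"] == "Car"
assert labels[0]["bbox"] == (587.01, 173.33, 614.12, 200.12)
```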

The documentation is pretty terrible though, you have to pore over github
issues to fix and modify things, but it does work.

~~~
zeryx
I'd be lying if I said I didn't agree with you. The valuable part is that
Tensorflow's extensibility and functionality are best in class. Making an
object detector in scikit-learn is incredibly difficult, same with torch.
Keras can definitely improve the experience, but it's also quite fragile.
When in doubt, use the best tool for the job.

~~~
joshvm
For what I was using it for - locating fruit on a plant - detectnet works
pretty well (it runs at 7fps on a TX2 at 1280x960). It's a modified GoogLeNet
with some custom layers on the end to do the bounding boxes. DIGITS itself is
just a management console.

That said, you can train Tensorflow models using DIGITS, so I assume that
DIGITS handles the data IO for you:

[https://github.com/NVIDIA/DIGITS/blob/master/docs/GettingSta...](https://github.com/NVIDIA/DIGITS/blob/master/docs/GettingStartedTensorflow.md)

It looks like you just define a wrapper around your model so that DIGITS knows
how to run it and evaluate the loss, but otherwise your model could be
anything.

------
zeryx
Author here, if you have any questions/comments I'll check back in
periodically and answer them here!

~~~
AndrewKemendo
1. Assuming someone is looking for only one or a handful of objects, why
would they train their own model on Open Images instead of using the
inception/object_detection built into TF? Seems like this use case is for
systems that are looking to eval/classify a lot of different object classes.

2. Why would someone run this on Algorithmia instead of their own hosted
NC/P2/P3 instance?

3. Have you tried Open Images with any other DNNs like Shang's implementation
of FastMaskRCNN?

~~~
zeryx
1) The biggest reason is label diversity. COCO is a great dataset, but at 90
labels you're going to be missing some potential. The problem with commercial
machine learning development is access to data, and with Open Images and its
545 classification labels we get that diversity.

2) Scaling is hard, cross-language interoperability is hard, and bringing your
product to market is hard. Algorithmia solves all of those problems with grace
and has a development experience that is second to none. Don't take my word
for it, compare it against tensorflow-serving and other competitor technologies.

3) I haven't yet, but I could certainly see an improvement in performance; we
did notice a dramatic difference between our SSD and faster-rcnn models, as
expected. This was only an intro, but if you create a model that performs
better than ours, we'd love to host it on Algorithmia.

~~~
krasin
A small elaboration. OpenImages v2 has 545 classes for object detection, and
5000 classes for image classification.

~~~
zeryx
Thanks, it's worth mentioning that we're entirely focused on the dataset for
object detection here, but it has a massive classification dataset as well.

------
dataronin
I mean, this is neat, but as someone who has actually tried to build a computer
vision product, can I just say the Open Images data aren't quite enough? Also,
computer vision isn't quite at "human level" yet. For your own project,
building a model that has 90% accuracy on the test set is awesome, but for an
actual product released into the wild, it could have serious problems
(not to mention adversarial examples).

------
pstoll
A minor nit, but the function to dedup the image IDs and the corresponding
comments seem to be off from a data-structure/algorithmic POV.

"Looking at our deduplication function, it’s functional and performant, but
not very descriptive. Essentially it checks a running set called seen, which
is checked for originality as the deduplication script progresses. As python
Dicts are essentially a hash map, the in check compares element hashes instead
of each individual component of the dictionary object. This massively speeding
up the deduplication process."

Uhm... I don't think so.

It is just extra work to check if the objects are already in the set vs just
stuffing them all in and letting the set handle uniqueness.

Am I missing something?

[https://gist.github.com/pstoll/ae73582763540051d321a4eb15304...](https://gist.github.com/pstoll/ae73582763540051d321a4eb15304226)

Again, a minor thing. But seeing something like that makes my 'what else do I
need to review' detectors go up.

------
doppenhe
live demo:
[https://algorithmia.com/algorithms/zeryx/openimagesDemo](https://algorithmia.com/algorithms/zeryx/openimagesDemo)

