Object Detection with Open Images using Tensorflow (algorithmia.com)
154 points by mikeyanderson on Oct 27, 2017 | 18 comments



One thing that really puts me off Tensorflow is the large amount of work you have to do just to get your data in.

I recently trained a detector using Nvidia's detectnet, and the DIGITS environment really makes it easy, from dataset/database creation to model generation and training. The web interface handles all the padding and resizing for you when you import your images, and detectnet does online augmentation during training. It takes data in KITTI format, which is just an image plus a text file with labels. No wrangling in Python needed.
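
For anyone who hasn't used it: each image gets a sibling .txt file with one line per object, and detectnet only reads the class name and the four bbox fields (left, top, right, bottom), so the trailing 3D fields can be zeroed. A made-up example (values are illustrative, not from a real dataset):

    dog 0.0 0 0.0 604.0 310.0 975.0 785.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    person 0.0 0 0.0 120.5 88.0 310.2 540.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0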

The documentation is pretty terrible, though; you have to pore over GitHub issues to fix and modify things, but it does work.


I would be lying if I said I didn't agree with you. The valuable part is that Tensorflow's extensibility and functionality are best in class. Making an object detector in scikit-learn is incredibly difficult, same with Torch. Keras can definitely improve the experience, but Keras is also quite fragile. When in doubt, use the best tool for the job.


For what I was using it for - locating fruit on a plant - detectnet works pretty well (it runs at 7fps on a TX2 at 1280x960). It's a modified GoogLeNet with some custom layers on the end to do the bounding boxes. DIGITS itself is just a management console.

That said, you can train Tensorflow models using DIGITS, so I assume that DIGITS handles the data IO for you:

https://github.com/NVIDIA/DIGITS/blob/master/docs/GettingSta...

It looks like you just define a wrapper around your model so that DIGITS knows how to run it and evaluate the loss, but otherwise your model could be anything.
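
From their getting-started doc it's roughly this shape; treat the exact names here (Tower, model_property, self.x / self.y / self.nclasses) as my from-memory paraphrase of the docs rather than gospel:

    from model import Tower           # supplied by DIGITS
    from utils import model_property  # supplied by DIGITS
    import tensorflow as tf

    class UserModel(Tower):

        @model_property
        def inference(self):
            # self.x is the input batch DIGITS feeds in; any graph can go here
            flat = tf.contrib.layers.flatten(self.x)
            return tf.contrib.layers.fully_connected(flat, self.nclasses,
                                                     activation_fn=None)

        @model_property
        def loss(self):
            # DIGITS drives the optimizer with whatever scalar you return
            return tf.reduce_mean(
                tf.nn.sparse_softmax_cross_entropy_with_logits(
                    logits=self.inference, labels=self.y))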


Shameless plug: our deep learning platform SignalBox helps solve this problem; it's a GUI for TensorFlow with a drag-and-drop neural net editor.

We’re still growing and iterating the platform rapidly based on customer feedback, so feel free to reach out.


I participated in a closed-source project that tackled exactly this. We added TensorFlow support to DIGITS, and it made it a much easier framework to use.


Author here, if you have any questions/comments I'll check back in periodically and answer them here!


Hi, zeryx:

How does this work in production? I'm especially curious about the stuff that isn't part of the TensorFlow graph like anchor generation and post-processing. Is your service running the graph directly or does it run some Python script that executes the graph?


So the object detection extension is mostly focused on breaking down complex operations like anchor generation into functional steps that are compatible with the base TensorFlow metagraph definition. You can see these builders here: https://github.com/tensorflow/models/tree/master/research/ob...

What that means is that when it comes to inference in a production environment, we only need the TensorFlow Python package, as the metagraph is defined in terms the base TensorFlow package can decipher.
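
Concretely, serving looks something like this with just base TF 1.x. The tensor names are the ones the export script usually bakes into the frozen graph (image_tensor, detection_boxes, etc.), but verify them against your own export:

    import numpy as np
    import tensorflow as tf

    # Load the frozen graph exported by the object detection API
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())

    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name='')

    with tf.Session(graph=graph) as sess:
        image = np.zeros((1, 480, 640, 3), dtype=np.uint8)  # stand-in image batch
        boxes, scores, classes = sess.run(
            ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0'],
            feed_dict={'image_tensor:0': image})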

Although, as I'm not an author of the object detection API, there is probably a more nuanced answer here.


1. Assuming someone is looking for only one or a handful of objects, why would they train their own model on Open Images instead of using the inception/object_detection built into TF? It seems like this use case is for systems that are looking to eval/classify a lot of different object classes.

2. Why would someone run this on Algorithmia instead of their own hosted NC/P2/P3 instance?

3. Have you tried open image with any other DNNs like Shang's implementation of FastMaskRCNN?


1) The biggest reason is label diversity. COCO is a great dataset, but at 90 labels you're going to be missing some potential. The problem with commercial machine learning development is access to data, and with Open Images and its 545 classification labels we get that diversity.

2) Scaling is hard, cross-language interoperability is hard, and bringing your product to market is hard. Algorithmia solves all of those problems with grace and has a development experience that is second to none. Don't take my word for it; compare it against tensorflow-serving and other competitor technologies.

3) I haven't yet, but I could certainly see an improvement in performance; we did notice a dramatic difference between our SSD and Faster R-CNN models, as expected. This was only an intro, but if you create a model that performs better than ours, we'd love to host it on Algorithmia.


A small elaboration. OpenImages v2 has 545 classes for object detection, and 5000 classes for image classification.


Thanks. It's worth mentioning that we're focused entirely on the object detection dataset here, but it has a massive classification dataset as well.


Hi

On this demo image [0], why isn't "text" found as well, since there is clearly text in the image? Is it because "text" isn't considered an object? If so, what would it take (based on your article) to also train for text?

Thanks!

[0] https://i.imgur.com/0jwgJAG.png


I can't quite parse this sentence: "The Open Images dataset is comprehensive and large, but many of its classes are unbalanced which effects or precision of underrepresented classes." Do you mean "affects our"?


Yeah, sorry, that was a fix made on mobile; thanks for catching it. I'll update shortly.


I mean, this is neat, but as someone who has actually tried to build a computer vision product, can I just say the Open Images data aren't quite enough? Also, computer vision isn't quite at "human level" yet. For your own project, building a model that hits 90% accuracy on the test set is awesome, but for an actual product released into the wild, that could mean serious problems (not to mention adversarial examples).


A minor nit, but the function to dedupe the image ids and the corresponding comments seem off from a data structure/algorithmic POV.

"Looking at our deduplication function, it’s functional and performant, but not very descriptive. Essentially it checks a running set called seen, which is checked for originality as the deduplication script progresses. As python Dicts are essentially a hash map, the in check compares element hashes instead of each individual component of the dictionary object. This massively speeding up the deduplication process."

Uhm..I don't think so.

It is just extra work to check whether the objects are already in the set versus just stuffing them all in and letting the set handle uniqueness.
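
A sketch of what I mean (the variable names are my guess at their code):

    from collections import OrderedDict

    image_ids = ['a', 'b', 'a', 'c', 'b']   # stand-in data

    # The article's approach:
    seen = set()
    deduped = []
    for img_id in image_ids:
        if img_id not in seen:      # an extra lookup per element
            seen.add(img_id)
            deduped.append(img_id)

    # If order doesn't matter, the set can do all the work:
    deduped = set(image_ids)

    # If first-seen order does matter, this dedupes in one pass:
    deduped = list(OrderedDict.fromkeys(image_ids))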

Am I missing something?

https://gist.github.com/pstoll/ae73582763540051d321a4eb15304...

Again, a minor thing. But seeing something like that makes my 'what else do I need to review' detectors go off.




