This paper compares AWS Rekognition, Google Cloud Vision, and Azure Computer Vision. I tried Google and Azure, and I also tried Clarifai.
To get 100% accuracy I ended up building my own service and doing some fairly unconventional things. I used TensorFlow, but I may rewrite the whole thing in PyTorch.
I tried using neural architecture search but it was a dead end.
The key for me was training data distribution search.
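Roughly, the idea: sweep over candidate class mixtures for the training set, retrain a cheap model for each, and keep the mixture that scores best on a fixed validation set. A toy sketch (nearest-centroid on 1-D Gaussians standing in for real data; everything here is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for real image features: two 1-D Gaussian classes.
def sample(cls, n):
    return rng.normal(0.0 if cls == 0 else 2.0, 1.0, n)

pool0, pool1 = sample(0, 2000), sample(1, 2000)           # labeled pool
val_x = np.concatenate([sample(0, 500), sample(1, 500)])  # fixed val set
val_y = np.array([0] * 500 + [1] * 500)

def train_and_score(p0, n=500):
    """Build a training set with class-0 fraction p0, fit a
    nearest-centroid classifier, and score it on the val set."""
    n0 = int(n * p0)
    c0, c1 = pool0[:n0].mean(), pool1[:n - n0].mean()
    pred = (np.abs(val_x - c1) < np.abs(val_x - c0)).astype(int)
    return (pred == val_y).mean()

# The "distribution search": sweep candidate mixtures, keep the best one.
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
best = max(candidates, key=train_and_score)
print(best, round(train_and_score(best), 3))
```

In a real pipeline the search space would be richer (augmentation mixes, hard-negative ratios, and so on), but the loop is the same: the thing being searched is the training set, not the architecture.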
The ML hype train has led to a frustrating amount of throwing the baby out with the bathwater, where people who should know better decide to use ML for more things than is reasonable (something something "end-to-end").
Ideally, ML should be used for a few very specific tasks, and classical machine vision / geometric analysis / plain old logic should do the rest. If you don't do it this way, you eventually end up with the problems described in this paper, where performance is inconsistent and impossible to debug, and nobody can tell you what's going on or why.
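A sketch of that division of labor, with the ML stage stubbed out (all names and numbers below are made up): the model only measures, and plain tolerance logic makes the accept/reject call, so a bad decision is always traceable to an explicit rule.

```python
def measure_diameter(image):
    """ML stage (stubbed): locate the part and return its diameter in mm.
    In a real system this would be a detector or segmentation model;
    here it is a placeholder so the deterministic stage can be shown."""
    return image["diameter_mm"]

def accept(image, nominal_mm=25.0, tol_mm=0.2):
    """Classical stage: an explicit, debuggable tolerance rule.
    When a part is rejected, you can say exactly why."""
    d = measure_diameter(image)
    return abs(d - nominal_mm) <= tol_mm

print(accept({"diameter_mm": 25.1}))  # True: within +/-0.2 mm
print(accept({"diameter_mm": 25.6}))  # False: 0.6 mm oversize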
Because all of the factors are known, you could almost just do pixel counts and compare against "true" circles (no need for AI models); any washers that didn't meet the criteria were pneumatically puffed off the line by an air hose.
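A sketch of that kind of check, assuming a binary mask of the washer has already been extracted (the dimensions and tolerance here are made up): count the pixels and compare against the ideal annulus area.

```python
import numpy as np

def washer_area_ok(mask, r_outer, r_inner, tol=0.05):
    """Compare the pixel count of a binary washer mask against the ideal
    annulus area pi*(R^2 - r^2); reject if it deviates by more than tol."""
    expected = np.pi * (r_outer ** 2 - r_inner ** 2)
    return abs(mask.sum() - expected) / expected <= tol

# Synthesize masks to exercise the check.
yy, xx = np.mgrid[-50:51, -50:51]
rr = np.hypot(yy, xx)
good = (rr <= 40) & (rr >= 15)   # intact annulus
chipped = good & (xx < 30)       # a chord sheared off one side

print(washer_area_ok(good, 40, 15))     # passes: within 5% of ideal
print(washer_area_ok(chipped, 40, 15))  # fails: too much material missing
```

The point is the same one the parent makes: when geometry and tolerances are known up front, the decision rule can be stated in one line of arithmetic.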
Any tech that you think shows the most promise?
As mentioned in another comment, targeted application of AI/ML can provide excellent results when expectations are carefully defined and the objective is clearly stated.
Many AI/ML proponents jump directly to a "black box" attitude, where a product enters one side, leaves the other, and a smiley or frowny face tells you if it is good or bad. There are no definitions of dimensional tolerances or geometric features; it's just good or bad according to the trained model. And unfortunately, garbage in, garbage out (GIGO) applies very strongly to the training data.
The comments here (so far) address "does deep-learning image analysis work?", which is a broader question than the one addressed in the paper. Importantly, other kinds of image analysis methods, including other ML approaches, are not being compared.
The authors seem to be raising a bit of an alarm about services like these, reflected (weakly) in the paper title:
[RH1] Computer vision services do not respond with consistent outputs between services, given the same input image.
[RH2] The responses from computer vision services are non-deterministic and evolving, and the same service can change its top-most response over time given the same input image.
[RH3] Computer vision services do not effectively communicate this evolution and instability, introducing risk into engineering these systems.
To a non-specialist, this seems like a detailed description of a useful real-world investigation, like a lab exercise. The authors' skepticism is healthy, and the paper overall looks good. On the negative side, the discussion of labels in computer vision does not sufficiently distinguish between fundamental problems of taxonomy and classification, problems with data grouping in general, and problems specific to this kind of deep-learning image identification.
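RH1 and RH2 are at least easy to probe: call each service on the same image (and the same service over time) and diff the label sets. A minimal sketch, assuming the responses have already been normalized to label-to-confidence dicts (the responses below are invented, not real service output):

```python
def top_label(response):
    """Highest-confidence label in a label -> confidence dict."""
    return max(response, key=response.get)

def label_overlap(a, b):
    """Jaccard similarity between the label sets two responses contain."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# Invented responses: two services on the same image (RH1), and the
# same service on the same image at a later date (RH2).
service_a       = {"dog": 0.97, "mammal": 0.95, "pet": 0.90}
service_b       = {"canine": 0.92, "mammal": 0.88, "animal": 0.85}
service_a_later = {"mammal": 0.96, "dog": 0.94, "pet": 0.91}

print(label_overlap(service_a, service_b))               # 0.2: little agreement (RH1)
print(top_label(service_a), top_label(service_a_later))  # top label drifted (RH2)
```

Low overlap between services is partly a taxonomy problem ("dog" vs. "canine"), which is exactly the distinction the label discussion could have drawn more sharply.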
This is known as overfitting, and it's one of the main things that should cast doubt on any ML system's ability to reliably produce results outside of its training set.
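The classic small-scale demonstration: give a model enough capacity to memorize noisy training points and it will, at the cost of everything in between. A sketch with a degree-9 polynomial fit (toy numbers, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy samples of a plain linear relationship y = 2x.
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2 * x_train + rng.normal(0.0, 0.1, 10)

# A degree-9 polynomial can pass through all ten points exactly:
# it memorizes the noise along with the signal.
coeffs = np.polyfit(x_train, y_train, 9)

x_test = np.linspace(0.05, 0.95, 50)  # points between the training samples
train_err = np.abs(np.polyval(coeffs, x_train) - y_train).max()
test_err = np.abs(np.polyval(coeffs, x_test) - 2 * x_test).max()

print(train_err)  # essentially zero: the training set is memorized
print(test_err)   # far larger: the fit fails off the training points
```

A deep network with millions of parameters is in the same regime as the degree-9 polynomial, just in many more dimensions.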
In a way, these outcomes are predictable to a point (in the sense of "the possibility exists", as opposed to "this set of weights yields these generalizability results") if you take the whole "neural network" thing a bit more literally than many academics are comfortable with. The human brain, or any collection of biological neurons, is in a state of constant flux, creating different networks in order to react to stimuli in the environment and implement actions that bring us closer to achieving $goals.
Who hasn't experienced an off day where the gears of your mind just aren't producing what you darn well know they should be? It's just a fundamental change in the primary set of neural tools you've got to work with that day. When the weights change, so too does the output, and the function modeled. The cerebellum, if I recall correctly, actually acts as a kind of QA function built into our own minds.
There's nothing magical about simulating neural networks in silicon that suddenly gets you to a more "free-of-mistakes" state, besides being able to condense far more dimensions of data into the network's "sensory space", as it were. Even then, the possibility of suboptimal functions being imitated is inescapable.
Try explaining that to someone who wants to save millions on workforce, or automate safety-critical tasks without concern for the consequences, though. It's amazing the cognitive barriers we can build.
"The quality of your classifier depends on the amount, quality, and variety of the labeled data you provide it and how balanced the overall dataset is."
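Balance, at least, is cheap to check before training. A sketch over hypothetical inspection labels (the classes and counts are invented):

```python
from collections import Counter

def class_balance(labels):
    """Fraction of the dataset in each class, largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()}

# Hypothetical training labels: good parts dominate, defects are rare --
# exactly the kind of imbalance the quoted warning is about.
labels = ["good"] * 900 + ["scratched"] * 70 + ["cracked"] * 30
print(class_balance(labels))  # {'good': 0.9, 'scratched': 0.07, 'cracked': 0.03}
```

A classifier trained on that split can score 90% accuracy by always answering "good", which is why accuracy alone says little about a quality-control model.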