
Losing Confidence in Quality: Unspoken Evolution of Computer Vision Services - mr_tyzic
https://arxiv.org/abs/1906.07328
======
vidanay
I have worked in industrial machine vision for the last 18 years, and this
summary (I didn't read the paper) reflects my comments to management every
time they bring up the topic of AI or ML. Our inspection systems operate at
anywhere from 300 parts per minute to 3000 parts per minute. AI/ML has way too
high of a false reject rate (or, even worse, a false accept rate!). The worst
scenario I explain to my managers is: if we implement an AI/ML system and for
some reason it starts rejecting 50% of the customer's product at 3am on a
Sunday, there is no practical way to analyze the results to determine a
definitive cause for the rejects and what the corrective action needs to be.
The final gut punch is then we would have to tell the customer that it could
be several hours to retrain the model (and that's only after we figure out what
needs to be represented in the good and bad image sets to account for the new
failure mode).

~~~
p1esk
So what are your inspection systems based on?

~~~
ohazi
Vision systems in controlled environments like factories that need to be fast
and stupidly reliable are often based on "classical" (non-ML) computer vision
techniques.

The ML hype train has led to a frustrating amount of throwing the baby out
with the bathwater, where people who should know better decide to use ML for
more things than is reasonable (something something "end-to-end").

Ideally, ML should be used for a few _very_ specific tasks, and then classical
machine vision / geometric analysis / plain old logic should be doing the
rest. If you don't do it this way, you eventually end up with the problems
described in this paper, where performance is inconsistent and impossible to
debug, and nobody can tell you what's going on or why.
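A minimal sketch of that split (all names and tolerances are hypothetical): a deterministic geometric gate handles the clear-cut cases, and an ML classifier is consulted only for the ambiguous remainder, so most decisions stay auditable.

```python
# Sketch of a hybrid inspection pipeline: deterministic geometric checks
# first, ML only for the ambiguous remainder. All names/tolerances are
# hypothetical, not from the paper or any real system.

def geometric_check(part):
    """Plain-old-logic gate. Returns 'accept', 'reject', or 'ambiguous'."""
    if part["diameter_mm"] < 9.5 or part["diameter_mm"] > 10.5:
        return "reject"          # clearly out of spec -- no ML needed
    if abs(part["diameter_mm"] - 10.0) <= 0.3:
        return "accept"          # clearly in spec -- no ML needed
    return "ambiguous"           # borderline: defer to the classifier

def ml_classify(part):
    """Stand-in for an ML model, used only on the borderline cases."""
    return "accept" if part.get("surface_score", 0.0) > 0.8 else "reject"

def inspect(part):
    verdict = geometric_check(part)
    if verdict != "ambiguous":
        return verdict, "geometry"   # auditable, deterministic path
    return ml_classify(part), "ml"   # small, well-scoped ML path

print(inspect({"diameter_mm": 10.1}))                        # geometry path
print(inspect({"diameter_mm": 10.45, "surface_score": 0.9})) # ml path
```

The point of the structure: when a reject happens at 3am, the routing tag tells you immediately whether a hard tolerance fired (debuggable) or the model did (retrain territory).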

------
mistrial9
This paper shows measured results from using "popular image recognition
services" that include Azure, AWS, Google, IBM, and other current commercial
offerings (the immediate implication is that the tested services are running
some DeepLearning system on the server side). The paper specifically says
"from the point of view of a software developer" and spends quite a bit of
effort questioning the assumptions of a user of these services, identifying
potential pitfalls from a mismatch with user assumptions, including consistency
over time, consistency between the services themselves, and employing a machine
that produces deterministic outcomes versus probabilistic ones. The paper
looks at the behavior of Vision-as-a-Service use from a Software Quality
Assurance (SQA) point of view: is the result of commercial services on the
web reliable over time? Liability within safety-critical environments is also
questioned.

The comments here (so far) address "does DeepLearning image analysis work?",
which is a broader question than what is being addressed in the paper.
Importantly, other kinds of image analysis methods, including other ML
approaches, are not being compared.

The authors seem to be raising a bit of an alarm about services like these,
reflected in the paper title (weakly):

[RH1] Computer vision services do not respond with consistent outputs between
services, given the same input image.

[RH2] The responses from computer vision services are non-deterministic and
evolving, and the same service can change its top-most response over time
given the same input image.

[RH3] Computer vision services do not effectively communicate this evolution
and instability, introducing risk into engineering these systems.
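RH2 in particular suggests a defensive practice: record each service's top label per image and flag when the answer silently changes. A minimal sketch, where `check_drift` and the service names are hypothetical stand-ins for a real cloud vision API:

```python
# Sketch of a drift check motivated by RH2: record the first top label a
# service returns for an image, then flag any later change for the same
# image. Service names and images here are hypothetical.
import hashlib

baseline = {}  # (service, image_hash) -> first observed top label

def image_key(image_bytes):
    return hashlib.sha256(image_bytes).hexdigest()

def check_drift(service, image_bytes, top_label):
    """Return None on first sight or a stable result, else a drift record."""
    key = (service, image_key(image_bytes))
    if key not in baseline:
        baseline[key] = top_label   # pin the first answer as the baseline
        return None
    if baseline[key] != top_label:
        return {"service": service, "was": baseline[key], "now": top_label}
    return None

# Same image, same service, different answer a week later -> flagged.
img = b"\x89PNG...fake image bytes"
assert check_drift("svc-a", img, "tabby cat") is None
assert check_drift("svc-a", img, "tabby cat") is None
print(check_drift("svc-a", img, "tiger cat"))
```

This doesn't fix the instability, but it turns a silent model update on the provider's side into a visible event on yours, which is what RH3 says the services themselves fail to provide.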

To a non-specialist, this seems like a detailed description of a useful real-
world investigation, like a lab exercise. The authors' skepticism is healthy,
and the paper overall looks good. On the negative side, the discussion of
labels in Computer Vision does not sufficiently distinguish between
fundamental problems in taxonomy and classification, problems with data
grouping in general, and problems specific to this kind of DeepLearning image
identification.

~~~
salawat
One thing to keep in mind is that any neural net based ML system is
essentially just a mathematical function imitator. The observations in this
paper are spot on in the sense that many mathematical functions can have the
same subset of results (success within your training data set), but can have
wildly varying behavior in the general case.

This is known as overfitting, and it's one of the main things that should cast
doubt on any ML system's ability to reliably produce results outside of its
training set.

In a way, these outcomes are to a point predictable (in the sense of "the
possibility exists" as opposed to "this set of weights yields these
generalizability results") if you take the whole "neural network" thing a bit
more straight than many academics are comfortable with you doing. The human
brain, or any collection of biological neurons, is in a state of constant
flux, creating different networks in order to react to stimuli in the
environment and implement actions that bring us closer to achieving $goals.

Who hasn't experienced an off day where the gears of your mind just aren't
producing what you darn well know they should be? It's just a fundamental
change in the primary set of neural tools you've got to work with that day.
When the weights change, so too does the output, and the function modeled. The
cerebellum, if I recall correctly, actually acts as a kind of QA function
built into our own minds.

There's nothing magic about simulating neural networks in silicon that
suddenly gets you to a more "free-of-mistakes" state besides being able to
condense way more dimensions of data into the network's "sensory space" as it
were. Even then though, the possibility of suboptimal functions being imitated
is inescapable.

Try explaining that to someone who wants to save millions on workforce costs,
or automate safety-critical tasks without concern for the consequences, though.
It's amazing the cognitive barriers we can build.

------
andrewtbham
I feel like you can infer these services are stochastic and not deterministic
based on the documentation.

https://docs.microsoft.com/en-us/azure/cognitive-services/custom-vision-service/getting-started-improving-your-classifier

"The quality of your classifier depends on the amount, quality, and variety of
the labeled data you provide it and how balanced the overall dataset is."
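If you take that documented stochasticity seriously, client code shouldn't act on a single response. A hedged sketch of one defensive pattern, with `classify` as a hypothetical stub for a real service call: require a confidence floor and majority agreement across repeated calls.

```python
# Sketch of defensive use of a stochastic classifier: require a confidence
# floor and agreement across repeated calls before trusting a label.
# `classify` is a hypothetical stand-in for a real vision-service call.
from collections import Counter

def robust_label(classify, image, n_calls=3, min_conf=0.7):
    """Return a label only if repeated calls agree with high confidence."""
    votes = Counter()
    for _ in range(n_calls):
        label, conf = classify(image)
        if conf >= min_conf:
            votes[label] += 1
    if not votes:
        return None                       # nothing confident enough
    label, count = votes.most_common(1)[0]
    return label if count > n_calls // 2 else None  # require a majority

# Deterministic stub standing in for a real (stochastic) service.
def fake_classify(image):
    return ("cat", 0.92)

print(robust_label(fake_classify, b"img"))
```

Repeated calls cost money against a metered API, so this is a trade-off, not a free fix; but it makes the non-determinism explicit in the client instead of hiding it.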

------
BigGodzilla
Good analysis for a real-world problem that software developers face. Customer
expectations definitely do not align with the realistic capabilities of the
technology.

