
Are Human Experts Less Prone to Catastrophic Errors than Machine-Learned Models? - breily
http://anand.typepad.com/datawocky/2008/05/are-human-experts-less-prone-to-catastrophic-errors-than-machine-learned-models.html
======
osipov
This is a fascinating discussion, particularly in light of human brain
architecture. Back in school, my prof used to reference the distinction
between the left and right brains in the context of the alternative computing
approaches used by each half. The right brain seems to be more in line with
the statistical machine learning approach: effectively a black-box data
processor producing intuitive results from large quantities of data. The right
brain works great in the context of normal data distributions. The left brain
is the logical brain and can override the right brain. The left brain is about
language-based reasoning and can handle novel and unexpected situations, in
other words the statistical outliers that give trouble to the right brain.

~~~
randomhack
Is there any proof for the statements made about left/right brains? Was the
prof learned in neuroscience? Or is it just some folk tale stuff?

~~~
osipov
Nothing folksy about the left/right brain distinction. Check out this tour de
force TED video of a Harvard neuroscientist talking about the differences
between the brain halves: <http://www.ted.com/talks/view/id/229>

------
tndalpaul
A human expert may be better when you're in unknown territory (i.e., something
happens that doesn't fall into the domain of expertise), because then broader
experience acquired over a lifetime may prove more useful than textbook or
algorithmic knowledge.

But more importantly I _do_ know that statistical models are better at
diagnosis and prediction than are human experts. Sad to say, doctors, lawyers,
judges and other foolish people keep us from using them.

FuturePundit summary article "Statistical Prediction Rules More Accurate Than
Experts": <http://www.futurepundit.com/archives/001558.html>

The FuturePundit article reviews the paper: "50 Years of Successful Predictive
Modeling Should Be Enough: Lessons for Philosophy of Science" by Michael A.
Bishop and J. D. Trout
[http://www.google.com/search?hl=en&q=%2250+Years+of+Succ...](http://www.google.com/search?hl=en&q=%2250+Years+of+Successful+Predictive+Modeling+Should+Be+Enough%22&btnG=Google+Search)

------
mark_h
Peter Norvig is "... now taking a short leave of absence from Google to update
his AI textbook."

This is news to me, and potentially quite exciting (even if I have just
finally acquired my own copy). Does anyone know anything else about this; what
sort of new material, etc?

------
msg
This comment will be a little roundabout, but it has a real conclusion.

It seems like the subject of this Google Tech Talk keeps coming up over and
over again.

<http://video.google.com/videoplay?docid=-2469649805161172416>

Here's a summary. Yann Le Cun discusses his research on deep learning. The
basic problem with standard learning methods (SVMs, neural networks, and the
like) is that they are limited to coming up with shallow templates for
classification, using more or less fancy linear and nonlinear combinations of
weights. The number of such templates you need in a high-dimensional space
like computer vision is exponential (think about how much data you need to
represent one object at different combinations of distance, lighting,
orientation, focus, etc.).

Instead, he suggests, we need to learn how to get past shallow template-
matching and train, essentially, features of features or networks of networks.
This gives us a shot at discovering highly abstract features of the data we
have and doing real learning.
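
For concreteness, here is a minimal numpy sketch of the "features of features"
idea. The shapes, random weights, and the layer names in the comments are
invented for illustration; a real system would learn the weights from data.

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(x, w):
        # One learned feature bank: linear map followed by a nonlinearity.
        return np.tanh(w @ x)

    x  = rng.standard_normal(64)        # raw input, e.g. a flattened image patch
    w1 = rng.standard_normal((32, 64))  # first-level features (edges, blobs)
    w2 = rng.standard_normal((16, 32))  # features of features (parts)
    w3 = rng.standard_normal((8, 16))   # features of parts (whole objects)

    h = layer(layer(layer(x, w1), w2), w3)  # deep, composed representation

    # A shallow template-matcher would instead compare x directly against
    # stored templates, needing roughly one template per pose/lighting/scale
    # combination, hence the exponential blowup.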

If you have any interest in the subject, I strongly suggest you carve out an
hour for the video.

TD-Gammon is another example. It is a learning program that plays backgammon.
Training from simple board features, it eventually derived what you might call
first-order expert features by doing shallow learning. But, when similar
first-order expert features (presence of an anchor, etc.) were added by hand
to the initial state space, TD-Gammon derived deeper expert features to the
point that it and similar programs became better than the best players in the
world.
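
For the curious, here is a rough sketch of the temporal-difference update at
the heart of TD-Gammon, assuming a linear value function for brevity. The real
program used TD(lambda) with a multilayer network, and the board encodings
below are made up.

    import numpy as np

    alpha = 0.1        # learning rate
    w = np.zeros(4)    # weights over hand-coded board features

    def value(features):
        # Linear value estimate; TD-Gammon used a neural network here.
        return float(w @ features)

    def td_update(features, next_features, reward):
        # Nudge V(s) toward reward + V(s'), the temporal-difference target.
        global w
        error = reward + value(next_features) - value(features)
        w += alpha * error * features

    # One illustrative update: a move between two (invented) board encodings,
    # with no terminal reward yet.
    td_update(np.array([1.0, 0.0, 2.0, 0.5]),
              np.array([0.0, 1.0, 2.0, 0.5]),
              reward=0.0)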

The point here is that there was no way to transition from the shallow
features to the deep features without recoding the system from scratch. That
obviously won't scale to the kinds of AI problems that are interesting.

Big finish: The advantage humans have over computers (at the moment) is that
we work on those multiple levels of abstraction all the time. We do deep
learning in every field of human endeavor. That's what the deeply connected
neural networks in our brain are all about. In fields where such expertise is
possible, humans have it all over the computers. In fields where brute-force
calculation can win, the computers have it all over the humans.

What that implies to me is that we can't train computers to think as deeply
about catastrophes as we do. It requires a new paradigm of learning to get the
computer to that point.

------
jsmcgd
Although this doesn't involve machine-learned models, this story highlights
the worth of good human judgment, something that currently isn't replicable in
machines.

<http://en.wikipedia.org/wiki/Stanislav_Petrov>

------
aidanf
Failure would rarely be "catastrophic" in the context of web search as
described in the article. If the ML model makes an incorrect prediction on new
data, you have a bad search result. No big deal - just feed the new data back
into the model and learn again.
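
In code, that feedback loop is about as simple as it sounds. A toy sketch (the
storage and training routine are stand-ins):

    training_data = []   # accumulated (features, label) pairs

    def record_feedback(features, was_bad_result):
        # A bad search result just becomes one more labeled example.
        training_data.append((features, 1 if was_bad_result else 0))

    def retrain(fit):
        # `fit` is any training routine; rerun it on the grown dataset.
        return fit(training_data)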

If the data set is large enough then the ML model may find patterns that
escape a human expert. When it comes to finding patterns in very large
datasets, machines scale much better than humans. Given a large enough
dataset, an ML approach should be less susceptible to the Black Swan
phenomenon than human experts are.

On the other hand, if failure of the system really could be considered
catastrophic, then there could always be a human involved. In these cases
output from ML models could be one of the inputs that the expert considers
before coming to a final decision. E.g., you wouldn't want an ML model doing
medical diagnosis by itself, but it could be very useful for identifying
patients that should be double-checked, scanning diagnoses for errors, etc.
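
A hedged sketch of that pattern, where the model only queues cases for a
second look (`predict_risk` is any function mapping a record to a probability,
and the 0.2 threshold is an invented placeholder a clinician would have to
set):

    def flag_for_review(patients, predict_risk, threshold=0.2):
        # Highest-risk cases first; a human expert makes the final call.
        flagged = [(p, predict_risk(p)) for p in patients]
        flagged = [(p, r) for p, r in flagged if r >= threshold]
        return sorted(flagged, key=lambda pair: pair[1], reverse=True)

    # Example with a toy risk function over dict-shaped records.
    patients = [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.05}]
    for patient, risk in flag_for_review(patients, lambda p: p["score"]):
        print(f"review patient {patient['id']} (risk {risk:.2f})")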

------
kurtosis
It's possible to think of the process that airline safety regulators go
through (observing crashes, guessing at the causes, then suggesting fixes) as
a type of learning problem.

For goog's case it would be fun to try to build a supervised learning system
whose sole purpose is to identify the queries which a human observer would
consider a "catastrophic failure".
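
A hedged sketch of what that might look like, using scikit-learn (the queries,
labels, and features are invented placeholders; a real system would have far
richer signals than query text alone):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Queries a human evaluator has labeled: 1 = results were judged a
    # catastrophic failure, 0 = results were acceptable.
    queries = ["buy cheap flights", "gm foods", "apple stock price"]
    labels  = [1, 0, 0]

    vec = TfidfVectorizer(ngram_range=(1, 2))
    X = vec.fit_transform(queries)

    clf = LogisticRegression().fit(X, labels)

    # Score incoming queries; high scorers get routed to human evaluators.
    risk = clf.predict_proba(vec.transform(["cheap hotels"]))[0][1]
    print(f"failure risk: {risk:.2f}")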

I've also heard of stories where decision trees supposedly outperformed human
cardiologists in making diagnoses. (I'm skeptical of this claim but let's
assume that it's true.) If this type of advance is real, it could save a lot
of lives. Unfortunately, if goog's engineering team has this kind of doubt
about the machine, then I imagine it would be easy to persuade a random jury
that installing such a poorly understood black box is negligence.

~~~
tndalpaul
"I've also heard of stories where decision trees supposedly outperformed human
cardiologists in making diagnoses. (I'm skeptical of this claim but let's
assume that it's true)"

If you substitute "Statistical Prediction Rules (SPRs)" for "decision trees"
in your statement, then it's true in every tested field of medicine. Read the
FuturePundit article (linked in my earlier comment) and its links. The system
_always_ outperforms the experts in its field, without fail, within the domain
of expertise. Even the _best_ experts. Always.
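
For readers unfamiliar with the term: an SPR is typically nothing more exotic
than a fixed weighted sum of a few cues compared against a cutoff. A toy
sketch (the cues, weights, and cutoff are invented placeholders, not a
validated rule):

    CUE_WEIGHTS = {"cue_a": 0.7, "cue_b": -0.3, "cue_c": 0.5}
    CUTOFF = 1.0

    def spr_decision(cues):
        # Apply the same fixed rule every time; no case-by-case judgment.
        score = sum(CUE_WEIGHTS[name] * v for name, v in cues.items())
        return "positive" if score >= CUTOFF else "negative"

    print(spr_decision({"cue_a": 2.0, "cue_b": 1.0, "cue_c": 0.4}))  # positive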

~~~
kurtosis
This article is very interesting, thank you for posting it. I haven't read it
yet, but it seems possible that this review suffers from a kind of selection
bias, in that the only SPRs which have appeared in the literature are those
which were more successful than human experts. This does not mean that a
widespread switch to SPRs would outperform human experts, because the average
implementation could be inferior to human experts. Consider how difficult it
is for IT departments to follow best practices in security.

Another aspect of these systems which people may find repugnant is that they
allow one to weigh the costs and benefits of different treatment decisions in
a consistent way. I personally feel, after watching people I loved suffer in
an oncology ward, that denying treatments which are very expensive but have a
small probability of success would be a good thing, but I know other people
don't feel this way and there would be outrage.

