
Artificial Intelligence Needs a Bullshit Meter - amplifier_khan
https://gab41.lab41.org/i-need-an-ai-bs-meter-27e94d48c8c1#.omnk0unal
======
tensor
This is a very confused article with many errors. It appears to imply that no
one validates the accuracy of machine learning models, or that they lie about
the accuracy. Even more strangely, it suggests that the only way to address
this is via ensemble methods.

Validation of models is one of the most important parts of any machine
learning system. Every expert practitioner measures the accuracy of their
models with well-established methods such as cross validation or hold-out
tests. So the basic premise of this article seems quite at odds with
reality.
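To make the point concrete, here is a minimal sketch of the standard validation workflow the comment describes: k-fold cross validation plus a separate held-out test set. The dataset and model here are illustrative stand-ins, not anything from the article.

```python
# Illustrative only: synthetic data and a simple classifier, to show the
# standard cross-validation + hold-out workflow.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out a test set the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5-fold cross validation on the training portion estimates accuracy.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Final check against the untouched hold-out set.
model.fit(X_train, y_train)
test_score = model.score(X_test, y_test)
```

The hold-out score is the honest number to report, since the cross-validation folds were available while choosing the model.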

Further, the article uses the security domain as an example of the lack of
validation. Most applications of ML to security use _unsupervised_ algorithms
to perform anomaly detection. This is an entirely different thing from a
supervised algorithm. Anomaly detection via unsupervised algorithms is well
known to produce many false positives.

But possibly the worst error in the article is suggesting that ensemble
methods are a way to validate the accuracy of a model. An ensemble technique
is _not_ a way to validate accuracy. Rather, it's a way to try to obtain higher
accuracy. You still need to validate your ensemble via something like k-fold
cross validation to understand the expected error.

~~~
nl
_This is a very confused article with many errors._

Au contraire! It is a good article which highlights a number of subtle points!

 _But possibly the worst error in the article is suggesting that ensemble
methods are a way to validate the accuracy of a model. An ensemble technique
is not a way to validate accuracy. Rather, it's a way to try to obtain higher
accuracy._

Err... to quote the article: "One of the ways to _improve result quality_ is
by running ensembles of algorithms."

The talk of blending recommender systems and deep learning appears to be
inspired by Google's Wide and Deep Learning[1] work, which is effectively a
way of blending global and local results.

 _Every expert practitioner measures the accuracy of their models with well
established methods such as cross validation or hold out tests._

The problem here is knowing how well the model will work with radically (or
even somewhat) different data than it was trained on. This is not the same as
doing CV or hold out.

For example, the ImageNet set has an enormous number of dog pictures. This
means that its CV or hold-out performance tends to translate well to
performance on similar datasets, and if the new dataset has a lot of dog
pictures it will translate very well.

However, if you attempt to use a network trained on ImageNet in a completely
different context (classifying X-rays, for example) it is unclear how well it
will perform before testing.
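A toy illustration (mine, not the article's) of why in-distribution scores don't transfer: a classifier with excellent accuracy on its training distribution can degrade sharply on shifted data, and no amount of cross validation on the original distribution predicts that.

```python
# Illustrative: train on one pair of clusters, evaluate on shifted
# clusters. The shift moves one class onto the learned boundary.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training distribution: two well-separated 2-D clusters.
X_train = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
y_train = np.array([0] * 200 + [1] * 200)

# "Deployment" data drawn from shifted cluster centers.
X_shift = np.vstack([rng.normal(2, 1, (200, 2)), rng.normal(6, 1, (200, 2))])
y_shift = np.array([0] * 200 + [1] * 200)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
in_dist_acc = model.score(X_train, y_train)   # near-perfect
shifted_acc = model.score(X_shift, y_shift)   # substantially worse
```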

 _Further, the article uses the security domain as an example of the lack of
validation. Most applications of ML to security use unsupervised algorithms to
perform anomaly detection. This is an entirely different thing from a
supervised algorithm. Anomaly detection via unsupervised algorithms is well
known to produce many false positives._

Lab41 works in the intelligence space. That isn't your normal computer-
security anomaly detection. Have a look at their other work[2] - only one of
those projects is conventional security log-file analysis.

[1] [https://research.googleblog.com/2016/06/wide-deep-
learning-b...](https://research.googleblog.com/2016/06/wide-deep-learning-
better-together-with.html)

[2] [http://www.lab41.org/work/](http://www.lab41.org/work/)

~~~
tensor
They've removed the part about ensembles and replaced it with a call for more
"human understandable models." This at least addresses the premise of the
article, though I'm not completely sold on it as a solution.

> The problem here is knowing how well the model will work with radically (or
> even somewhat) different data than it was trained on. This is not the same
> as doing CV or hold out.

This isn't a problem with validation techniques, but rather a problem with the
data supplied to the algorithms. The established solution here is to make sure
your test data actually resembles the data you intend the algorithms to work
on.

Regarding the new suggestion of using models that are easy for humans to
understand, I suppose my question would be "how do you know that a model that
a human feels good about actually generalizes any better?" In my experience
things that seem to be right by intuition are often wrong. This is one of the
advantages of machine learning approaches over rules-based systems: the
machine can learn things a human wouldn't think of. The other big advantage is
being able to scale easily, which becomes harder if you are expecting a human
to manually inspect every model.

I'm definitely on board with their suggestion to continually monitor
performance. There are some validation techniques that can help with that such
as incremental validation.
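One such technique is prequential ("test-then-train") evaluation: each incoming batch is scored before the model trains on it, giving a running accuracy estimate on live data. The sketch below is illustrative; the streaming data and model choice are my stand-ins.

```python
# Illustrative prequential evaluation: score each batch, then fold it
# into the model with partial_fit.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(random_state=0)
classes = np.array([0, 1])

running_scores = []
for step in range(20):
    # Simulated incoming batch: label depends on the sign of a feature sum.
    X = rng.normal(size=(50, 5))
    y = (X.sum(axis=1) > 0).astype(int)

    if step > 0:
        # Test on the new batch *before* training on it.
        running_scores.append(model.score(X, y))

    # Then update the model incrementally.
    model.partial_fit(X, y, classes=classes)
```

A drop in the running score is an early warning that live data has drifted away from what the model was trained on.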

~~~
nl
_The established solution here is to make sure your test data actually
resembles the data you intend the algorithms to work on._

That's an ideal solution, yes.

But in practice - especially for deep neural networks that take weeks to train
on pretty significant hardware - most people use pre-trained models, and
attempt to retrain the last layer[1].

The other issue is of course the large amounts of supervised training data
needed.

[1] eg
[https://www.tensorflow.org/versions/r0.9/how_tos/image_retra...](https://www.tensorflow.org/versions/r0.9/how_tos/image_retraining/index.html)
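As a minimal stand-in (not TensorFlow) for that retraining pattern: keep the pretrained feature extractor frozen and fit only a new final layer on the target task. Here a fixed random projection plays the role of the frozen network; everything in this sketch is illustrative.

```python
# Illustrative: "frozen pretrained layers" + retrained last layer.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Frozen "pretrained" layer: fixed weights we never update.
W_frozen = rng.normal(size=(10, 32))

def extract_features(X):
    # Forward pass through the frozen layer (ReLU activation).
    return np.maximum(X @ W_frozen, 0.0)

# New task data.
X_new = rng.normal(size=(300, 10))
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)

# Only the last layer (a logistic-regression head) is trained.
head = LogisticRegression(max_iter=1000).fit(extract_features(X_new), y_new)
train_acc = head.score(extract_features(X_new), y_new)
```

This is cheap because only the head's weights are fit, which is exactly why the frozen features had better suit the new data.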

------
Animats
From the article: _" What I hope I have done is sufficiently piqued your
interest to get you involved in Lab41."_ So this is an ad.

While I sort of agree that AI could use a bullshit meter, it's way better than
it was in the 1980s. Today, much of the stuff actually works. Real work is
done with AI. Deposit a handwritten check at an ATM and watch it be read
properly. I'm amazed that works.

------
make3
Again with the use of the word AI for what is really just supervised (deep)
machine learning. Pretty vacuous article on a subject covered at length since
the dawn of machine learning by a very large number of authors, in much more
detail than here.

~~~
tnecniv
I'm curious as to what you consider AI. I tend to feel AI is a catch-all for
hard problems that we don't know how to solve. Once we know how to solve
something, we give it a name.

~~~
blahi
Intelligence implies the ability to reason. What people call AI are glorified
calculators. And it's getting really tiresome.

------
dharma1
Given who they work for, I'm surprised they open source all this -
[http://www.lab41.org/work/](http://www.lab41.org/work/)

------
shulmanbrent
Gotta love good ol' Bob G

------
willhamina
Nice try, NSA.

