

Strong profiling is not mathematically optimal for discovering rare malfeasors - unignorant
http://www.pnas.org/content/106/6/1716.full.pdf+html

======
bermanoid
There's a practical problem when trying to apply this paper to something like
airport screening: it minimizes E[stops until terrorist is found], whereas
what we really want to minimize is P(terrorist gets through without being
stopped).

It's a highly useful analysis as applied to, for instance, profiling for
random stop-and-frisk street searches where you're just trying to find as many
"evildoers" in a crowd as possible - most city cops would do well to realize
that even if "profiling works", they're probably doing it far too
aggressively, and actually ending up with fewer arrests per search than they
would if they didn't profile at all. The paper's conclusion is not so useful
when there may or may not be evildoers coming through your screening station
and your task is to make sure that as few get through as possible, given a
ceiling on the number of searches that you can do.

btilly mentioned (<http://news.ycombinator.com/item?id=4154989>) that the
optimal solution (neglecting game theoretical considerations, which are
extremely important!) to the airport screening problem is to set a prior
probability cutoff based on resources available, and screen everyone in a
group that has P(terrorist|group) above that cutoff, but he didn't elaborate.
The gist of the proof, by example, is that if you can only screen 10 people
out of a line of 100, then to minimize the chance of a terrorist getting
through you should never _ever_ use up one valuable screening on someone with
P(terrorist) = .02 if there's someone else with P'(terrorist) = .05. You'd
have to be crazy to do so, since that's an extra 3% chance that a terrorist
gets through, with no additional benefit.
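That example can be sketched in a few lines (the per-person priors here are invented for illustration; only their ordering matters for the argument):

```python
# Toy sketch of the cutoff rule: with a fixed screening budget, always spend
# it on the highest-prior people, because P(slip through) is just the sum of
# the priors of everyone you *don't* screen.

def p_slip_through(priors, screened):
    """P(terrorist is present and unscreened) = sum of unscreened priors."""
    return sum(p for i, p in enumerate(priors) if i not in screened)

priors = [0.05, 0.02, 0.01, 0.01, 0.01]  # made-up per-person priors
# Budget of one screening: compare spending it on the 0.05 person vs the 0.02 person.
best = {max(range(len(priors)), key=lambda i: priors[i])}   # the 0.05 person
worse = {1}                                                  # the 0.02 person

print(round(p_slip_through(priors, best), 4))   # 0.05
print(round(p_slip_through(priors, worse), 4))  # 0.08 - an extra 3% chance
```

Screening the lower-prior person raises the slip-through probability by exactly the 3% gap between the two priors, matching the example above.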

As for the game theory issues, all I'll add there is that _if_ your prior
probabilities are "correct" for the sample coming through the gate, then the
strategy still holds. The problem is that if you know that terrorists know
that you're profiling and might decide to send in lower probability people,
your prior probabilities should change accordingly. How far, who can say...the
fact is, our priors are extremely uncertain to begin with in this area, so all
of this becomes much more subjective.

------
blahedo
Interesting for its sociopolitical primary application, but this paragraph
from the discussion section shows its general relevance to a lot of strong-AI
tasks:

 _It applies whenever a ‘‘bell-ringer’’ event must be found by sampling with
replacement, but can be recognized when seen. For example, one can thus sample
paths through a trellis or hidden Markov model when their number is too large
to enumerate explicitly, but one path can be recognized (e.g., by secondary
testing) as the desired bell ringer. It seems peculiar that the method is not
better known._

It _does_ seem a little peculiar, although it's not quite as unknown as the
author implies; rather, it provides a mathematical justification for one of
the hacks we would sometimes try (often helpfully!) if our figure-of-merit was
being overly dominated by high-ranking items: take the square root of the
probability[0] and use that. :)

[0] Well, once we'd got to the point of this kind of hackery to improve our
performance, it typically wasn't much of a probability anymore, at least not
as such; call it a "probability-derived score".
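A toy Monte Carlo of the quoted bell-ringer setup bears this out (all numbers here are invented): one of N candidates, drawn from a skewed prior, is the recognizable "bell ringer", and we sample with replacement until we draw it.

```python
import math
import random

random.seed(0)

weights = [9, 3, 1, 1, 1, 1]               # made-up skewed scores
p = [w / sum(weights) for w in weights]     # prior that item i is the bell ringer

def avg_draws(q, trials=20000):
    """Mean samples-with-replacement until the (recognizable) target is drawn."""
    idx = range(len(p))
    total = 0
    for _ in range(trials):
        target = random.choices(idx, weights=p)[0]   # where the bell ringer really is
        n = 1
        while random.choices(idx, weights=q)[0] != target:
            n += 1
        total += n
    return total / trials

sqrt_q = [math.sqrt(x) for x in p]  # the square-root "hack"; choices() renormalizes

print(avg_draws(p))       # sampling proportional to the prior: about 6
print(avg_draws(sqrt_q))  # square-root sampling: closer to 4.8
```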

------
starship
Profiling is absurdly counterproductive, but not for the reasons the paper
describes. We don't need detailed analysis for this one, simple math will do.
Just to be clear, we're talking about heightened security procedures for
passengers of Arab ethnicity (as a proxy for people of Muslim faith).

The problem? The majority of Muslims aren't Arab! There are large numbers of:

- Asian Muslims (Indonesia is the world's largest Muslim country)
- black Muslims across various African countries, e.g. _the Underwear Bomber_
- white Muslims, in the Balkans and Chechnya for example
- and of course there are large numbers of Muslims in India, Pakistan, and
Iran who are not of Arab ethnicity, although they might or might not look
similar enough (depending on dress, most likely) to get thrown into the same
profiling bucket.

The paper goes into the details of how a profiling system would be optimally
set up, but the entire issue is moot. The premise that we have a good
understanding of what groups should be receiving heightened security screening
is itself wrong.

(Note: I couldn't find the statistic with a few minutes of Google searching,
so I'll keep looking, but I know I have seen it reported that >50% of Muslims
in the world are of ethnicities other than Arab.)

~~~
yummyfajitas
The paper describes profiling as being statistically optimal, not
counterproductive.

Also, no one proposes using Arab ethnicity as a proxy for being Muslim - who
cares if someone is Muslim? Arab ethnicity (or being a Muslim) is used as
_probabilistic evidence_ (not a proxy) that a person is a terrorist.

------
defen
This analysis completely ignores the cost of _not_ finding malfeasors, does it
not?

~~~
barrkel
It looks at how you can proportionally allocate resources under the assumption
that you're not going to examine everyone deeply. Can you explain why the cost
of a false negative affects the allocation of limited resources, under its
assumptions?

~~~
defen
I don't care about false negatives, per se - only whether or not things get
blown up. It seems to be optimizing for the wrong thing - minimizing samples,
rather than minimizing total cost of malfeasance and its countermeasures. In
the real world, resources available for countermeasures are not fixed, but are
a function of the perceived cost of terrorism.

------
confluence
Great to hear that some of my "no duh" thoughts have mathematical backing!

I've always found ethnic profiling rather strange. If we assume, correctly,
that the vast majority of people are not "malfeasors", and that said
"malfeasors" will plan around any and all barriers we put into place, why on
Earth would you think that those you "nab" at airports and the like would be
anything but innocent?

Seems like you'd only nab "malfeasors" who would've failed anyway, and in turn
antagonize a large population for little to no benefit.

What better way to breed "malfeasors" than to select and irritate a large
number of people for long periods of time for no particular reason?

Contempt breeds contempt breeds contempt.

One last thought: malfeasance is a result of some cause. Treating malfeasance
is all well and good, but I always assumed it would be better to address the
root causes that create "malfeasors", which are (in no particular order):
brutal poverty, and constant assault and general mistreatment at the hands of
others.

I wonder what would have happened had we flooded the Middle East with cheap
food/consumables and paid for hospitals, schools and infrastructure instead of
going in and starting a rather prolonged war.

I suppose it probably doesn't matter now - the damage is done.

~~~
bermanoid
Don't overstate the result here: this paper says that (mathematically
speaking) if you want to find terrorists in a crowd with as few checks as
possible, it _is_ optimal to profile rather aggressively, you just shouldn't
do so directly in proportion to the probability that each person is a
terrorist, but instead sample in proportion to the square root of that
probability.

This is not super surprising mathematically - there's no reason we ever should
have assumed that P(terrorist|race) is exactly proportional to the optimal
sampling frequency for finding terrorists when all we know is race; all we'd
assume up front is that there's _some_ relation.
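The optimization behind this is compact: with sampling weights q over a prior p, the expected number of samples-with-replacement until the malfeasor is drawn is the sum of p_i/q_i (geometric waiting times averaged over the prior), which Cauchy-Schwarz minimizes at q_i proportional to sqrt(p_i). A toy comparison with invented numbers:

```python
import math

def expected_draws(p, q):
    """E[samples with replacement until the malfeasor is drawn] = sum p_i / q_i."""
    return sum(pi / qi for pi, qi in zip(p, q))

def normalize(w):
    s = sum(w)
    return [x / s for x in w]

p = normalize([9, 3, 1, 1, 1, 1])                 # made-up skewed prior
uniform = normalize([1] * len(p))                 # no profiling
strong = p                                        # "strong" profiling: q = p
sqrt_rule = normalize([math.sqrt(x) for x in p])  # the paper's optimum

print(round(expected_draws(p, uniform), 3))    # 6.0 - exactly N
print(round(expected_draws(p, strong), 3))     # 6.0 - strong profiling gains nothing
print(round(expected_draws(p, sqrt_rule), 3))  # about 4.77 - the sqrt rule wins
```

Note the paper's headline result falls out directly: sampling in exact proportion to the prior gives E = N, no better than ignoring the prior entirely, while the square-root rule strictly beats both whenever the prior is non-uniform.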

------
Evbn
MIT researchers published a paper showing this simple mathematical fact in
2002 when the terrorism scare first appeared in the US.

