
Weapons of Math Destruction (The Dark Side of Data Science) - oldbuzzard
http://boingboing.net/2016/09/06/weapons-of-math-destruction-i.html
======
trendia
The problem with most of these examples is a failure to adjust for cohort
changes. For instance, in the SAT example, the author writes:

> In the 1980s, the Reagan administration seized on a report called A Nation
> at Risk, which claimed that the US was on the verge of collapse due to its
> falling SAT scores.

Suppose that low-income individuals start to take the SAT in 1980 whereas they
didn't in 1970. The _wrong_ way to analyze SAT scores is to evaluate:

P(SAT Score | Y) = sum over cohorts P(SAT Score | cohort, Y) * P(cohort | Y)

where Y is the year. For instance, you might compare the total average score
in 1980 vs. 1970. Doing so will show a decrease in SAT score because of the
increase in low-income individuals taking the SAT, _not_ because the high-
income individuals are doing worse. (This assumes that low-income people have
less access to SAT training materials, and those training materials affect the
score).

The correct way is to only compare scores _within a cohort_ :

P(SAT Score | cohort, 1980) > P(SAT Score | cohort, 1970)

That is, did the same cohort do better in 1980 vs. 1970?

(There might _still_ be some differences between the cohorts in 1980 vs. 1970.
Maybe the low-income individuals who took it in 1970 had high confidence in
school, whereas the 1980s kids were from a broader background.)
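
To make the composition effect concrete, here is a minimal sketch in Python. The cohort names, head counts, and mean scores are all made-up numbers chosen for illustration, not real SAT data. Each cohort's mean is held fixed between the two years, and only the share of low-income test-takers changes.

```python
# Hypothetical numbers, chosen only to illustrate the composition effect.
# Each entry maps cohort -> (number of test-takers, mean score).
year_1970 = {"high_income": (900, 550.0), "low_income": (100, 450.0)}
year_1980 = {"high_income": (900, 550.0), "low_income": (600, 450.0)}


def pooled_mean(year):
    """Average over all test-takers, ignoring cohort membership."""
    total_takers = sum(n for n, _ in year.values())
    total_points = sum(n * mean for n, mean in year.values())
    return total_points / total_takers


pooled_1970 = pooled_mean(year_1970)
pooled_1980 = pooled_mean(year_1980)

# Change in mean score within each cohort between the two years.
within_cohort_change = {
    cohort: year_1980[cohort][1] - year_1970[cohort][1] for cohort in year_1970
}

print("pooled 1970:", pooled_1970)
print("pooled 1980:", pooled_1980)
print("within-cohort change:", within_cohort_change)
```

With these assumed numbers the pooled average drops from 540 to 510 even though neither cohort's average moved, which is the pattern the naive year-over-year comparison would misread as a decline.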

~~~
oldbuzzard
The article addresses that a couple of paragraphs down. That's her whole
point. To quote:

"The Nation at Risk report that started it all turned out to be bullshit, by
the way -- grounded in another laughable statistical error. Sandia Labs later
audited the findings from the report and found that the researchers had failed
to account for the ballooning number of students who were taking the SATs,
bringing down the average score.

In other words: SATs were falling because more American kids were confident
enough to try to go to college: the educational system was working so well
that young people who would never have taken an SAT were taking it, and the
larger pool of test-takers was bringing the average score down."

~~~
trendia
I was converting her text into math-ish notation. She's saying that

P(SAT Score > 700 | 1970) > P(SAT Score > 700 | 1980)

is inaccurate, and that we should instead use:

P(SAT Score > 700 | cohort, 1970) ~ P(SAT Score > 700 | cohort, 1980)

------
ccvannorman
> These brokers are training their model on the corrupted data of the past.
> They look at the racialized sentencing outcomes of the past -- the outcomes
> that sent young black men to prison for years for minor crack possession,
> while letting rich white men walk away from cocaine possession charges --
> and conclude that people from poor neighborhoods, whose family members and
> friends have had run-ins with the law, and "predict" that this person will
> reoffend, and recommend long sentences to keep them away from society

This is an extremely important point for our times. Be aware that this sort of
algorithm is harming society when it comes to prison sentences, and you're
paying for it at multiple levels.

> Amazon carefully tracks those customers who abandon their shopping carts ..
> interested in knowing everything they can about "recidivism" among shoppers
> .. [and they seek out and talk] to their subjects -- to improve their
> system.

 _If the prison system was run like Amazon ... [it would be] oriented toward
rehabilitation ..._

(emphasis mine) (edit:formatting)

