
Make Algorithms Accountable
http://www.nytimes.com/2016/08/01/opinion/make-algorithms-accountable.html
======
yummyfajitas
It's tremendously disheartening to see the mainstream media repeating
ProPublica's lies. They ran a statistical analysis. Their R-script said that
bias was statistically insignificant. So they repeated a bunch of anecdotes in
the story and left that part of the analysis out.

Reporting null results won't get you cited by the NYT, I guess.

[https://www.chrisstucchio.com/blog/2016/propublica_is_lying.html](https://www.chrisstucchio.com/blog/2016/propublica_is_lying.html)

The power of the media to push a socially convenient lie is pretty amazing.

~~~
obastani
I might be misunderstanding the article you linked, but it seems very
misleading to me (though I agree with the conclusion that the ProPublica
article is misleading). They claim to conclude:

"The predictor is probably not biased against any particular race - the
race_factorAfrican-American:score_factorHigh term is not statistically
significant. Or, as ProPublica puts it, it's "almost statistically
significant"."

I don't think you can conclude that the predictor is probably not biased
against any particular race -- you can only use a significance test to reject
the null hypothesis, not to prove it (especially since the p-value was close
to the threshold). Am I misunderstanding the claim in the article?

~~~
yummyfajitas
I wrote the article. It's always tricky for me to figure out how to phrase a
statement about a frequentist method (since frequentist methods formally say
so little). I may have phrased this incorrectly.

So what I'm attempting to say is that they ran a statistical test, were unable
to reject the null hypothesis, and then wrote an article phrased as if they
had rejected it. But I think you are right that my phrasing is incorrect.
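
To make the distinction concrete, here is a minimal sketch of the kind of
test at issue, in Python with statsmodels rather than their actual R script;
the file, column, and level names are hypothetical stand-ins, not
ProPublica's:

    # Hypothetical sketch of a logistic regression with a race x score
    # interaction, the sort of test discussed above. Not the actual
    # analysis; "compas_scores.csv" and the column names are invented.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("compas_scores.csv")  # hypothetical input file

    # Fit recidivism ~ race, score, and their interaction.
    model = smf.logit("two_year_recid ~ race_factor * score_factor",
                      data=df).fit()
    print(model.summary())

    # The coefficient in dispute is the interaction term. A p-value above
    # the 0.05 threshold means we FAILED TO REJECT the null hypothesis of
    # no differential effect; it does not prove the null ("no bias") true.
    p = model.pvalues["race_factor[T.African-American]:score_factor[T.High]"]
    print("fail to reject H0" if p > 0.05 else "reject H0")

The asymmetry obastani points out lives in that last comment: failing to
reject tells you the data were insufficient to demonstrate the effect, not
that the effect is absent.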

~~~
GFK_of_xmaspast
> It's always tricky to me to figure out how to phrase a statement about a
> frequentist method

That seems to me to be a fault whose location is not in your stars.

------
radarsat1
"In Wisconsin, for instance, the risk-score formula was developed by a private
company and has never been publicly disclosed because it is considered
proprietary. This secrecy has made it difficult for lawyers to challenge a
result."

Shouldn't that on the contrary make it extremely easy for lawyers to argue
that the score evidence should be thrown out?

------
wyager
I have a moral objection to the government using mechanized algorithms for
e.g. sentencing, because the government is (or ought to be) accountable to the
public, and must be able to justify all actions. Proprietary algorithms are
particularly odious, because it is impossible to "justify" a decision even in
the sense of tracing how the algorithm got to that point.

On the other hand, private individuals ought not to be accountable to anyone
for their decisions (unless they contractually agreed to be), so I have no
issue with e.g. a bank making loan decisions with a computer. After all, if
the bank's algorithm is inaccurate, this actually hurts their bottom line
(because they either deny good customers or accept bad customers), so there is
a strong incentive to have good algorithms. No such incentive exists for e.g.
sentencing.
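
As a back-of-envelope illustration of that incentive (every number below is
invented), consider the expected profit of a lender whose model errs in
either direction:

    # Toy sketch: both kinds of misclassification cost a lender money.
    # All figures are made up for illustration.
    PROFIT_PER_GOOD_LOAN = 1_000   # interest earned on a repaid loan
    LOSS_PER_BAD_LOAN = 10_000     # principal lost on a default

    def expected_profit(approve_good, approve_bad,
                        base_rate_good=0.9, applicants=1_000):
        """approve_good: fraction of creditworthy applicants approved;
        approve_bad: fraction of non-creditworthy applicants approved."""
        good = applicants * base_rate_good
        bad = applicants - good
        return (good * approve_good * PROFIT_PER_GOOD_LOAN
                - bad * approve_bad * LOSS_PER_BAD_LOAN)

    print(expected_profit(approve_good=0.95, approve_bad=0.10))  # 755000.0
    print(expected_profit(approve_good=0.80, approve_bad=0.20))  # 520000.0

A sloppier model loses money both by turning away good customers and by
approving bad ones, which is exactly the feedback loop a sentencing algorithm
lacks.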

~~~
GFK_of_xmaspast
[https://en.wikipedia.org/wiki/Redlining](https://en.wikipedia.org/wiki/Redlining)

~~~
wyager
Yes, this was a viable solution back in the day because there was a 90% chance
(made up number, but you get the idea) that denying someone from a redlined
district was a good financial choice, and it was too expensive to do a
detailed analysis to determine if an applicant was in the other 10%. With the
automated credit checking and statistical analysis available these days, this
isn't really an issue anymore. Banks can afford to do a more detailed check of
each applicant, and they will, because failing to do so would be an economic
disadvantage.

~~~
jonathankoren
> Yes, this was a viable solution back in the day because there was a 90%
> chance (made up number, but you get the idea) that denying someone from a
> redlined district was a good financial choice,

No. That wasn't the reason redlining existed. It existed because it was
_explicitly_ racist policy, enabled directly by the National Housing Act of
1934[0] -- which was basically just codifying the existing racist attitudes of
the day.[1] It stemmed from the racist idea that blacks were inherently bad
neighbors in all the ways you can imagine. To say this was merely a
good-enough heuristic, or even the result of government overreach, is to
grossly misunderstand the historical context of these policies. These policies
were no more controversial than saying we should not zone a preschool next to
an oil refinery.

[0]
[https://en.wikipedia.org/wiki/Redlining#History](https://en.wikipedia.org/wiki/Redlining#History)

[1] [http://www.theatlantic.com/magazine/archive/2014/06/the-case-for-reparations/361631/](http://www.theatlantic.com/magazine/archive/2014/06/the-case-for-reparations/361631/)

~~~
wyager
Your first link indicates that, as I said, the purpose of redlining was to
direct wise economic decisions. Some research (linked from Wikipedia) suggests
that the data may have been biased against black neighborhoods due to bias of
the appraisers. It does _not_ suggest that the purpose was to specifically
screw over black people.

------
shanacarp
I think there are elements of the complicated tags in complicated algorithms
that people need to see and understand. All the details - nah. I don't know
all the details that go into the math of my credit report. I do know I have
the right and responsibility to see what debts and savings are reported.

I do think the author is right - issues that can affect someone's rights and
responsibilities, and that are being embedded in algorithms, are part of the
public trust and should be examined a bit more closely by the public. Do I or
anyone else need to see all the details? No. Do I need to understand some
basics and some of the tagging? Yes.

This is what helps uphold a free society.

------
pducks32
I would love (assuming Congress had their heads in the right place) to see a
small council that understands tech the way we do - a council to prevent
Luddite or crazy Presidents/Congresses from completely wreaking havoc. People
who could just comment "Yeah, that's not how this works," or "Hey, techies
think this is a good idea." I give a lot of governments credit on tech stuff
(especially local governments), but at the federal level I think we're a
little lost, and lobbyists get to influence what members of Congress believe.

~~~
Houshalter
Technology isn't special. Politicians are also uneducated on every other
industry and area of life. It's just that when they misunderstand technology
you notice.

~~~
pducks32
Well, I notice it in a couple of areas, but in a ton of those areas we have
specific branches of the executive to deal with the nitty-gritty - just not
really one for technology. It's similar in the UK, where they have white wall
ministries to deal with things that parliamentarians should not be expected
to have deep knowledge of.

~~~
jamessb
Presumably you mean "Whitehall", rather than "white wall"?

------
wiwofone
Here is a paper [1] discussing the effects of the new European Union
legislation mentioned in the article ("right to explanation"). Interesting
related read, regardless of the validity of the numbers mentioned in this
opinion piece.

[1]
[http://arxiv.org/pdf/1606.08813v2.pdf](http://arxiv.org/pdf/1606.08813v2.pdf)

------
dmreedy
I think the problem is vastly more subtle and fundamental than the author of
this piece (and ProPublica in general, given their previous coverage of the
same story) is giving credit.

Software is in a dangerous place right now[1]. Half[2]-way between stupid and
smart. Recent advances from fields like Machine Learning have moved software
out of the world of the purely deterministic. Stochastic methods have
given us software that can live in the gray that is reality; this has made it
mighty. Particular subdomains that were once entirely the purview of human
workers are rapidly moving towards automation. These algorithms have gotten
good enough at what they do, accurate enough at what they do, that they are
approaching the semantic-work-outsourcing limit that is 'trustworthiness'; you
can treat them more like agents than tools, and trust that they do their job
sufficiently well that you, the consumer of their work, need not worry about
the details; you get to take the executive role of dealing only with the
abstractions they provide. "Just tell me yes or no if we should do this". You
implicitly trust that the system will "Do The Right Thing"; of course it will,
it's got a fantastic resume with some great recommendations[3].

The problem is, the algs are also still dumb. Very dumb. They cannot model
themselves. They cannot introspect. And, perhaps most crucially, they cannot
interface with their newly-promoted executives in the lingua franca to explain
why they're dumb. When you want to figure out why the intern decided that it
would be a good idea to name all their variables some permutation of the words
'herp' and 'derp', you march over to their cube and ask them. A conversation
occurs. Different perspectives are exchanged via a common protocol[4]. New
knowledge is acquired. A mutual understanding is reached. When you want to
figure out why your Facial Recognition Software isn't acknowledging Black
People[5]... you go get a Masters in Statistics, Distributed Systems, and
Probability Theory, with a minor in Anthropology and Demographics. Then you
spend a month reading code and running experiments. The software, briefly
perceived as a trustable agent that knows how to do its job, suddenly becomes
a tool again, because you can't just ask it why it screwed up so badly. And
not just any tool; an incredibly complicated, fragile, and opaque tool, with a
million different knobs and dials and a Gordian nest of pipework and conduits
that would make even the bravest chaotician sweat a little bit. Even when
you've open sourced the data and implementation and the deployment
architecture and the napkins you've been scribbling hyperparameters on, the
box is still pretty damn black.

And so, while auditability or accountability is important, it's only a small
(and very, very, very hard) piece of the societal changes that might be
needed. Changes in ethics (who goes to jail when a UAV confuses a hospital for
a barracks? What does a smart gun do if its wielder pulls the trigger while
pointing at a civilian?). Changes in focus (Is it correct to make such
deterministic choices about the world? Maybe Hume was right and the
predictability of human behavior is slightly harder than we currently state it
to be[6]. Maybe accuracy is bounded lower than we like, and so justification
should be the primary target). Changes in education (and a reduction in the
magical thinking about computers. I hope some distant descendants of mine
finally see the day that computers are as boring and obvious as hammers).

I worry about focusing so much on accountability because, as it stands now, a
highly auditable system as defined by the article still needs deep domain
knowledge to even begin theorizing about. All of these things above and more
will probably need to shift as well, whether deliberately or not, in order to
accommodate these smart-dumb tool-agent hybrids we have now, systems that
we're building right now, that are just powerful enough to be dangerous.

\---

[1] Every generation has said these exact words about every technological
advance across every field for as long as we've been finding ways of rendering
human beings obsolete. I'm sure there's some philologically reconstructible
PIE for the phrase "those scientists are playing god". We've discovered it
before, and I have no doubt we'll discover again solutions for these things,
but that doesn't render the conversation in the interim any less meaningful.

[2] 'Half' is probably optimistic. Consider it an upper bound.

[3] Studied at MNIST... Tuned by Hinton and Ng... Deployed on three customer
engagements with great KPIs... Very impressive stuff, Mr. Convolutional!

[4] I mean Natural Language here, but I suppose baseball bats and guttural
screams may be other possible channels in this particular scenario.

[5] Or why your self-driving smart car crashed into a cement barrier.

[6] I was trying to construct some pun about "the son not rising to the
mistakes of the father tomorrow". It was pretty bad.

~~~
Houshalter
So, first of all, this isn't a new thing. The statistical-methods-versus-
human-judgement debate goes back decades. Long before computers, people were
training simple linear models with pencil and paper. The earliest paper I
found is from 1928, on a similar issue to the one in the article, where a
very crude statistical algorithm was better at predicting recidivism of
inmates than three prison psychologists.

Second, the models in question aren't as complicated as you imagine. These are
just decision trees with a few hundred inputs at most. Even with more
complicated models, you can train simpler models to mimic them, and then
inspect those. There are other ways to make algorithms more transparent, but
my point is these aren't complicated image-recognition deep neural nets. They
don't need to be, either.
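
For what it's worth, here is a minimal sketch of that mimic idea (sometimes
called a surrogate or distilled model); the data and model choices are
illustrative, not from the article:

    # Fit a shallow, readable tree to a black-box model's *predictions*.
    # Synthetic data; everything here is an invented example.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

    # The opaque model we want to explain.
    black_box = RandomForestClassifier(n_estimators=200,
                                       random_state=0).fit(X, y)

    # Train the surrogate on the black box's outputs, not the true labels.
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X, black_box.predict(X))

    # Fidelity: how often the surrogate agrees with the black box.
    fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
    print(f"fidelity: {fidelity:.2%}")
    print(export_text(surrogate))  # inspectable if/then rules

The tree won't capture everything the forest does, but the fidelity score
tells you how much of the behavior you're actually explaining.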

Third, I don't think these algorithms are "dumb". By all accounts they
significantly outperform humans. Humans, even experts, are terrible at doing
even basic statistics in their heads. Often our decisions aren't much better
than random chance. Even when humans are allowed to see the results of an
algorithm, and tweak its output when they think it's making a mistake, they
do worse than the algorithm alone.

There's now evidence to suggest humans are irrationally biased against
algorithms. Search for "algorithm aversion". A study shows that even after
watching a statistical algorithm do better, people still prefer worse human
judgement.

~~~
dmreedy
Yep, like I said, it's absolutely an old problem, even when it comes to
statistical techniques.

I understand that the particular models in the article are on the simpler
side. I was speaking more generally. As to whether the models in question here
need to be more complicated, well, that is an interesting question.

Humans are absolutely terrible at doing basic statistics in their heads.
Algorithm aversion is probably a thing, no doubt motivated by anthropocentric
biases similar to Luddism. My point was that there are other ways of
measuring how 'good' a system is. In the space of a well-defined problem with
well-defined parameters, sure, maybe accuracy is king. But this strikes me as
akin to trying to build a bridge while assuming that the world is a
frictionless vacuum. If everyone has agreed that the problem is correctly
framed, then yes, a statistical system can be trusted; the mathematics are
inevitable. But a statistical model (the current iterations of them, at least)
is never going to ask whether or not it -should- only be considering the
inputs it's given, or whether more is needed. The world these systems
understand consists of the subset of the world provided to them by a set of
sample data. The patterns they learn can only ever be as good as that.

That's what I mean by 'dumb'. Not that they aren't good at what they do. That
they aren't good at knowing about what they can't do.
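
A toy illustration of that last point (all data synthetic): a model trained
on a skewed slice of the world can be excellent on that slice and useless
outside it, and nothing in the model will tell you so:

    # Hypothetical sketch: the model's "world" is its training sample.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_group(n, flip):
        """Two subpopulations whose feature/label relationship differs."""
        X = rng.normal(size=(n, 2))
        y = (X[:, 0] > 0).astype(int)
        return X, (1 - y if flip else y)

    X_a, y_a = make_group(5000, flip=False)  # group A: in the training data
    X_b, y_b = make_group(5000, flip=True)   # group B: never sampled

    model = LogisticRegression().fit(X_a, y_a)  # trained on group A only

    print("accuracy on group A:", model.score(X_a, y_a))  # near 1.0
    print("accuracy on group B:", model.score(X_b, y_b))  # near 0.0

The model reports nothing unusual on group B; it simply applies the only
pattern it has ever seen.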

