
EU regulations on algorithmic decision-making and a “right to explanation” - pmlnr
http://arxiv.org/abs/1606.08813
======
InTheArena
Disclaimer - I work in the financial services industry.

This is nothing new; in most financial services, machine learning algorithms have
to meet these requirements. This is why supervised analytics are more
popular, and one of the reasons why algorithms such as credit scores are
typically generated and structured as scorecards with explicit reason codes
for each independent characteristic, variable or vector.

This is really needed for a number of different reasons. The biggest is that
algorithms are increasingly running our world. Even outside of mortgages, have
you ever had to fork over the right to pull your credit score to rent an
apartment or (in some limited cases) to apply for a job? Imagine not being able
to understand: a) why you were rejected, b) why you are paying a higher interest
rate, or c) what you can do to fix that.

Housing and employment are fundamentally questions of human rights, not just
banking profitability or academic algorithm improvements.

Part of the reason credit scoring (which is intrinsically algorithmic
decision-making) caught on is that it replaced the "old boys" network that used
to dominate. In that world you went to your banker, and the decision to extend
credit hinged on your personal relationship with a banker who might not be the
same ethnicity, religion, nationality, etc. as you, and that relationship
determined your ability to purchase housing. The credit score democratized
financial access.

It can still be used to discriminate. A number of studies have shown that
simply being from a white community tends to increase the number of other
people you can lean on in an emergency, and hence makes you less likely to
default on a loan. From a pure predictive point of view, a bank is at lower
risk of default lending to someone from that community, but that in turn denies
financial access to people outside that community, continuing the trend.

A big problem in this kind of financial model is that it's relatively easy to
accidentally or deliberately find other variables that mirror "protected"
vectors. A simplistic example is zip code, where the zip code reflects an area
that is predominantly ethnic minorities.

So it's not cut and dried. It's my PoV that it's not predictability versus
red tape, and the people trying to do unaccountable analytics in this space
are (perhaps inadvertently) perpetuating racism.

~~~
sergioisidoro
I think the case of zip-code-based discrimination even has a name. It's called
redlining, and the term is used in the machine learning world to describe
indirect discrimination based on proxy attributes (e.g. discriminating against
people based on their zip code, where that zip code has a mostly black
population).

~~~
yummyfajitas
But it's also very important to understand _why_ machine learning systems do
this. If you take race neutral data, and run virtually any machine learning
system on it, you'll get a race neutral output. I.e., if a $40k/year black
person in 10001 is equally likely to default as a $40k/year white person in
10002, then the algorithm can be expected to give them equal credit scores.

In the event that an algorithm chooses black zip codes for extra penalties,
it's because the algorithm has deduced that _stereotypes are correct_ - the
bank will make more money not lending to blacks at given rates, because
something directly correlated with race and not captured by fields like
income/past defaults/etc. is predictive.

I discuss this in greater detail here, taking linear regression as a toy
example:
[https://www.chrisstucchio.com/blog/2016/alien_intelligences_...](https://www.chrisstucchio.com/blog/2016/alien_intelligences_and_discriminatory_algorithms.html)
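
A minimal sketch of that point (a toy simulation, not the code from the linked
post): if default risk genuinely depends only on income, a model that is handed
the group label anyway ends up ignoring it.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 50_000
    income = rng.normal(40, 10, n)                     # $k/year
    race = rng.integers(0, 2, n)                       # 0/1 group label, independent of risk
    p_default = 1 / (1 + np.exp((income - 40) / 5))    # risk driven by income only
    default = (rng.random(n) < p_default).astype(int)

    X = np.column_stack([income, race])
    model = LogisticRegression().fit(X, default)
    print(model.coef_)                                 # weight on the race column is ~0

    # Two applicants identical except for the group label get the same score.
    print(model.predict_proba([[40, 0], [40, 1]])[:, 1])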

Having consulted for folks doing lending, I'll mention an insanely difficult
$100B startup idea here: build an accurate algorithm that exhibits no
disparate impact. The amount of money you'll save banks is staggering.

~~~
InTheArena
True, and then the question becomes: are the stereotypes correct, or is the
credit score propagating a system that enforces this outcome?

~~~
vkou
The stereotype (that predominantly black households in a poor area code are
poorer than the predominantly white households in a rich area code) can be
proven correct by simple demographics, plus the fact that household wealth is
correlated with credit score.

The problem is that there's a positive feedback loop at play, here.

~~~
Houshalter
The bank already knows their assets and income. The question is if an equally
poor and educated white is just as likely to repay a loan as an equally poor
and educated black. I imagine most of the difference would go away. Unless you
really believe black people are inherently less likely to pay back loans, all
else equal.

~~~
yummyfajitas
If you can come up with an algorithm that reproduces this conclusion - with
accuracy even remotely close to the "racist" ones - banks will move heaven
and earth to pay you $billions.

Unfortunately the only public analysis I'm aware of is from a blog using
Zillow data:
[https://randomcriticalanalysis.wordpress.com/2015/11/22/on-t...](https://randomcriticalanalysis.wordpress.com/2015/11/22/on-the-relationship-between-negative-home-owner-equity-and-racial-demographics/#my_analysis)

This effect is reproduced in various walks of life, e.g. education.

You'll be doing social good in addition to earning billions of dollars. What's
not to like?

~~~
Houshalter
I do in fact have an algorithm to remove racism from models, but I doubt it's
worth "$billions". The whole point of my argument is that it shouldn't be
necessary. Surely you don't really believe racist stereotypes are true?

~~~
yummyfajitas
I do, in fact. There is extensive research supporting the fact that many
(though not all) are accurate. The typical racist stereotype is about twice as
likely to replicate as the typical sociology paper.

Here are a couple of review articles to get you started:
[http://www.spsp.org/blog/stereotype-accuracy-response](http://www.spsp.org/blog/stereotype-accuracy-response)
[http://emilkirkegaard.dk/en/wp-content/uploads/Jussim-et-al-...](http://emilkirkegaard.dk/en/wp-content/uploads/Jussim-et-al-unbearable-accuracy-of-stereotypes.pdf)

I didn't say that simply removing "racism" (by which I assume you mean
disparate impact) is worth billions. I said doing so with the same accuracy as
models which are "racist" is worth billions. Obviously you can do anything you
want if you give up accuracy.

Why do you believe they are false? Simply because it's socially unacceptable
to think otherwise?

~~~
dang
> _Simply because it's socially unacceptable to think otherwise?_

Please stop insinuating this at people in HN comments. You've done it
frequently, and it's rude. (That goes for "simply because of your mood
affiliation?", etc., too.)

~~~
yummyfajitas
Dang, I'm really confused here. As Houshalter points out, he was deliberately
making an argument based on the social unacceptability of my beliefs. Searching
for the phrase on hn.algolia.com, I've used the term 5 times on HN ever, never
in the manner you imply.

I'm also confused about my use of the term "mood affiliation". Searching my
use of the term, most of the time I use it to refer to an external source with
a data table/graph that supports a claim I make, but a tone which contradicts
mine. For example, I might cite Piketty's book which claims the best way to
grow the economy is to funnel money to the rich (this follows directly from
his claims that rich people have higher r than non-rich people). What's the
dang-approved way to point this out?

In the rare occasion I use it to refer to a comment on HN, I'm usually asking
whether I or another party are disagreeing merely at DH2 Tone (as per Paul
Graham's levels of disagreement) or some higher level. What's the dang-
approved way to ask this?

[http://paulgraham.com/disagree.html](http://paulgraham.com/disagree.html)

Let me know, I'll do exactly as you ask.

~~~
dang
> _usually asking whether I or another party are disagreeing merely at DH2
> Tone (as per Paul Graham's levels of disagreement) or some higher level.
> What's the dang-approved way to ask this?_

That seems like a good way to ask it right there.

Possibly I misread you in this case (possibly even in every case!) but my
sense is frequently that you are not asking someone a question so much as
implying that they have no serious concern, only emotional baggage that
they're irrationally unwilling to let go of. That's a form of derision, and it
doesn't lead to better arguments.

But if I've gotten you completely wrong, I'd be happy to see it and apologize.

------
sergioisidoro
I ran a couple of experiments on discrimination-free machine learning models
with naive Bayes, and I changed my perspective on data science.

Usually people are concerned with maximising prediction accuracy, and never
stop to think about what correlations the model is finding underneath, or about
the human biases present in the data annotations.

Removing sensitive variables (gender, race, etc.) doesn't always help, and
especially if the models have a high impact on people's lives (applying for
loans, university, scholarships, insurance), we cannot afford to blindly use
existing annotations in black-box models.

All of this sets aside the fact that companies will probably maximise profit,
and will use "the algorithm" as an excuse to turn people down while
disregarding ethics.

Edit: here is code for the curious
[https://github.com/sergioisidoro/aequalis](https://github.com/sergioisidoro/aequalis)
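
One common preprocessing trick in this space (a sketch of Kamiran & Calders
style reweighing, offered as an illustration rather than as what the repo above
does) is to weight training rows so the sensitive attribute and the label come
out statistically independent before fitting:

    import pandas as pd
    from sklearn.naive_bayes import GaussianNB

    def reweigh(df, group_col, label_col):
        """Weight each (group, label) cell by expected / observed frequency."""
        weights = pd.Series(1.0, index=df.index)
        for g in df[group_col].unique():
            for y in df[label_col].unique():
                p_g = (df[group_col] == g).mean()
                p_y = (df[label_col] == y).mean()
                cell = (df[group_col] == g) & (df[label_col] == y)
                if cell.any():
                    weights[cell] = (p_g * p_y) / cell.mean()
        return weights

    # Column names below are hypothetical:
    # df = pd.read_csv("applications.csv")
    # w = reweigh(df, group_col="gender", label_col="repaid")
    # clf = GaussianNB().fit(df[["income", "age"]], df["repaid"], sample_weight=w)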

~~~
AnthonyMouse
> Usually people are concerned with maximising prediction accuracy, and never
> stop to think about what correlations the model is finding underneath, or
> about the human biases present in the data annotations.

Because maximizing prediction accuracy is inherently unbiased. Bias is when
the predictions made are _inaccurate_ to the detriment of a group of people.
If you had a prediction algorithm that functioned using time travel to tell
you with 100.0% accuracy who would pay back their loans, there would be a
racial disparity in the result, but the cause of it is not the fault of the
algorithm.

And you can't fix it there because that's not where the problem is.

Suppose you have a group of 800 white middle managers, 100 white sales clerks
and 100 black sales clerks. Clearly the algorithm is going to have a racial
disparity in outcome if the middle managers are at 500% of the cutoff and the
sales clerks are right on the line, because it will accept all of the middle
managers and half of the sales clerks which means it will accept >94% of the
white applicants and 50% of the black applicants.
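
Spelling out the arithmetic in that example:

    # 800 white middle managers + 100 white sales clerks vs. 100 black sales clerks;
    # every middle manager passes, half of each group of sales clerks passes.
    accepted_white = 800 + 0.5 * 100
    accepted_black = 0.5 * 100
    print(accepted_white / 900)   # 0.944... -> >94% of white applicants accepted
    print(accepted_black / 100)   # 0.5      -> 50% of black applicants accepted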

But the source of the disparity is that black people are underrepresented as
middle managers and overrepresented as sales clerks. The algorithm is just
telling you that. It can't change it.

And _inserting_ bias into the algorithm to "balance" the outcome doesn't
actually do that; all you're doing is _creating_ bias against white sales
clerks who had previously been on the same footing as black sales clerks. The
white middle managers will be unaffected, because they're sufficiently far
above the cutoff that the change doesn't affect them, even though they're the
source of the imbalance.

~~~
tombone12
You entirely miss the point! The point is that in supervised learning for
example, if you optimize prediction accuracy with respect to your human
generated examples, you will get a model that exactly reproduces the racist
judgment of the human that generated your training set.

~~~
AnthonyMouse
> The point is that in supervised learning for example, if you optimize
> prediction accuracy with respect to your human generated examples, you will
> get a model that exactly reproduces the racist judgment of the human that
> generated your training set.

In which case you _aren't_ optimizing prediction accuracy. Prediction
accuracy is measured by whether the predictions are true. If you have bias in
the predictions which doesn't exist in the actual outcomes, then there is money
to be made by eliminating it.

It seems like the strangest place to put an objection where the profit motive
is directly aligned with the desired behavior.

~~~
wodenokoto
You need to think about how we measure truth and even what truth is.

In machine learning we tend to assume the annotations and labels are "true"
and build a system towards that version of the "truth".

> Prediction accuracy is measured by whether the predictions are true.

The more I think about this sentence, the less sense it makes. Prediction
accuracy can only be measured against records of something, and that record
will be a distortion and simplification of reality.

~~~
AnthonyMouse
> Prediction accuracy can only be measured against records of something, and
> that record will be a distortion and simplification of reality.

Prediction accuracy can be measured against _what actually happens_. If the
algorithm says that 5% of people like Bob will default and you give loans to
people like Bob and 7% of them default then the algorithm is off by 2%.
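
That check is essentially a calibration test. A rough sketch of what it might
look like on a book of funded loans (the table and column names are made up):

    import pandas as pd

    def calibration_table(loans, n_buckets=10):
        """Compare predicted default probability to the realized default rate
        within each score bucket of funded loans."""
        buckets = pd.qcut(loans["predicted_p_default"], n_buckets, duplicates="drop")
        return loans.groupby(buckets, observed=True).agg(
            predicted=("predicted_p_default", "mean"),
            actual=("defaulted", "mean"),
            n=("defaulted", "size"),
        )

    # loans = pd.read_csv("funded_loans.csv")   # hypothetical columns as above
    # If "people like Bob" are scored at 5% but the `actual` column shows 7%,
    # the model is off by 2 points for that bucket.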

~~~
wodenokoto
You are still assuming everything that is recorded to be "like Bob" is the
truth and captures reality clearly.

Moreover, you need to give loans to everybody in order to check the accuracy of
the algorithm. You can't just check a non-random subset and expect to get
unbiased results.

~~~
AnthonyMouse
> You are still assuming everything that is recorded to be "like Bob" is the
> truth and captures reality clearly.

Nope, just finding correlations between "records say Bob has bit 24 set" and
"Bob paid his loans." The data could say that Bob is a pink space alien from
Andromeda and the algorithm can still do something useful. Because if the data
is completely random then it will determine that that field is independent
from whether Bob will pay his loans and ignore it, but if it correlates with
paying back loans then it has predictive power. The fact that you're really
measuring something other than what you thought you were doesn't change that.

> Moreover, you need to give loans to everybody in order to check the accuracy
> of the algorithm. You can't just check a non-random subset and expect to get
> unbiased results.

What you _can_ do is give loans to a random subset of the people you otherwise
wouldn't, to see what happens.

But even that isn't usually necessary, because in reality there isn't a huge
cliff right at the point where you decide whether to give the loan or not, and
different variables will place on opposite sides of the decision. There will
be people you decide to give the loan to even though their income was on the
low side because their repayment history was very good. If more of those
people than expected repay their loans then you know that repayment history is
a stronger predictor than expected and income is a weaker one, and if fewer
then the opposite.

------
Houshalter
Is my reading of this law correct? It sounds like it outlaws all use of
algorithms to evaluate people. The right to explanation is just for the rare
exceptions where a "member state" authorizes it. But the legalese is difficult
to parse, and no one else seems to get this impression.

I think this is really bad. That's the majority of uses of machine learning.
There is also a lot of economic value in predicting things like how likely
someone is to pay back a loan, or even who is a spammer.

Most importantly, most of these applications will go back to human judgement.
And humans are _far worse_. Human predictions about candidates are really bad.
We are incredibly biased by things like race, gender, and especially
attractiveness. Unattractive people get twice the sentences of attractive
people. Not to mention random stuff, like people being judged more harshly
when the judge is hungry before lunch. Humans also rarely give explanations for
their decisions, if they are even aware of the true reasons for them.

Going back to humans is a huge step backwards and will hurt a lot more people
than it helps. I think the same regulations that apply to algorithms should
apply to humans. That would show the absurdity of these laws. But humans are
algorithms after all. And particularly bad ones (for this purpose, anyway).

~~~
germanier
> It sounds like it outlaws all use of algorithms to evaluate people. The
> right to explanation is just for the rare exceptions where a "member state"
> authorizes it.

Who says that this is rare? That's a pretty standard way EU regulations are
written to mean "Action A can only be allowed if conditions B are met".

If the regulation is approved, member states take a look and change their laws
accordingly. They could outlaw algorithmic decision making if they wanted to
(but they wouldn't need this regulation to do that). Much more likely is that
they impose those conditions on the companies that use such algorithms, just as
they already impose a lot of conditions (such as the current rule that
ethnicity is not allowed to be a direct input variable).

~~~
ddebernardy
> If the regulation is approved, member states take a look and change their
> laws accordingly.

Small nitpick if I may: you seem to be confusing Regulations with Directives,
which must get transposed into national law within a certain timeframe.
Regulations apply directly - no transposition needed.

Also: either way, EU law has primacy over national law.

~~~
germanier
Thanks for pointing this out, my post was not entirely clear on that matter.
This proposed regulation does in fact contain some directive-like clauses with
regards to the issue at hand. In this specific case they require the member
states to disallow certain practices in their national law.

It's also not uncommon for member states to enact laws dealing with the
effects of a (directly applicable) regulation.

------
tzs
The United States does not have anything as sweeping as this, but in the
limited area of credit the Equal Credit Opportunity Act (ECOA) requires that
lenders that turn down your credit application give you an explanation. From
the Federal Trade Commission's site:

    
    
      The creditor must tell you the specific
      reason for the rejection or that you are
      entitled to learn the reason if you ask
      within 60 days. An acceptable reason might
      be: “your income was too low” or “you haven’t
      been employed long enough.” An unacceptable
      reason might be “you didn’t meet our minimum
      standards.” That information isn’t specific
      enough.
    

[https://www.consumer.ftc.gov/articles/0347-your-equal-credit...](https://www.consumer.ftc.gov/articles/0347-your-equal-credit-opportunity-rights)

(The ECOA is at 15 U.S. Code § 1691, for those who want to go find the gory
details)

In one of the lectures from Caltech's "Learning from Data" MOOC, the professor
mentioned that this problem came up at a major lender he had consulted for. He
did not say how they solved it.

I wondered if you could generate a satisfactory rejection reason, at least as
far as the law is concerned, by taking rejected applications and running them
through the system again with some of the inputs tweaked until they get an
acceptance. Then report that the rejection was due to the original (un-tweaked)
value of the parameter you had to change.

For instance, if you picked income to tweak, you'd raise the income by, say,
$5000 and try again. If that is rejected, raise it another $5000. If it then
passes, you can tell the applicant they were rejected because their income was
too low, and that an additional $10000 income would be enough to qualify them
for a loan.

You'd have to put a bit of sophistication into this to make the explanations
reasonable. For instance, if a rejected application would be approved with a
very large tweak to income or with a small tweak to employment length it would
probably be better to give employment length as the reason for rejection.
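
A rough sketch of that tweak-and-retry idea; the model.predicts_approval call
and the step sizes are placeholders, with the step sizes encoding what counts
as a "small" versus "large" tweak for each field:

    def rejection_reason(model, application, tweaks, max_steps=20):
        """tweaks maps field -> step size, e.g. {"income": 5000, "employment_months": 6}.
        Returns the field needing the fewest steps to flip the decision."""
        best = None   # (field, steps_needed, total_change)
        for field, step in tweaks.items():
            candidate = dict(application)
            for i in range(1, max_steps + 1):
                candidate[field] = application[field] + i * step
                if model.predicts_approval(candidate):      # hypothetical model API
                    if best is None or i < best[1]:
                        best = (field, i, i * step)
                    break
        if best is None:
            return "rejected for a combination of factors"
        field, _, delta = best
        return f"rejected because {field} was too low; about {delta} more would qualify"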

~~~
PeterisP
A potential problem with this is that if you "tell the applicant they were
rejected because their income was too low", and this is disputed, then they
will likely be able to point to many other applicants with even lower income
who were accepted, because their total combination of factors was better.

A simple explanation "factor X is too low" implies that there exists a
particular cutoff that is required and sufficient; but an explanation that
_accurately_ describes why you were rejected would likely be too complex to be
understandable.

But yes, probably the intended result could/should be simply identifying the
top 1-2 factors that dragged your score down compared to the average accepted
candidate (which can be rather easily done automatically even for the most
black-box ML methods) and naming them appropriately.
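
A crude sketch of that idea (one feature swapped at a time against the average
accepted applicant; works with any black-box scoring function, and the names
here are invented):

    def top_drag_factors(score_fn, applicant, accepted_mean, k=2):
        """score_fn: any black-box scoring function over a dict of features.
        accepted_mean: per-feature averages over accepted applicants."""
        base = score_fn(applicant)
        gains = {}
        for feature, avg_value in accepted_mean.items():
            probe = dict(applicant)
            probe[feature] = avg_value
            gains[feature] = score_fn(probe) - base   # how much this swap helps
        return sorted(gains, key=gains.get, reverse=True)[:k]

    # e.g. accepted_mean = {"income": 52_000, "years_at_job": 4.1, "utilization": 0.3}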

~~~
basseq
> an explanation that accurately describes why you were rejected would likely
> be too complex to be understandable.

This is the core of the problem. Have you ever tried to explain a statistical
segmentation to someone? It requires them to _let go_ of how they think about
if-x-then-y criteria and attributes. Very intelligent people who understand
statistics, weighting, clustering, etc. struggle with it--think about an
"average" person.

You can back into a reason: "Here's the attribute(s) on which you deviate the
most from the median of the segment of which you want to be a part." But it's
not a linear, single-variable "fix".

------
Kenji
Good luck arguing the 'why' when you have a trained neural network and all you
have is the network and weights.

On a more serious note, I love transparency, but again, this is an overeager
regulation (not surprising from the EU). You almost never get the true reasons
for being rejected, be it for an interview or for credit. You have to figure
that out yourself from the often very vague rejection letter. Our minds are
basically running algorithms to which we have even less insight than our
computer algorithms. Therefore, this law only hampers the flourishing of the
economy while providing no value whatsoever.

Another law that 'solves' a non-issue, brought into being by overpaid career
politicians.

~~~
sparkie
If all you have is a network and weights, you have a black box. While it might
be magic and make the _right_ decision most of the time, we need to look in
and see why those decisions are being made, somehow.

This basically means seeing how the net was trained, and what its initial
conditions were. If you threw away your training methodology and data set, why
should anyone trust that your algorithm can make suitable decisions? Are we to
assume that nobody who writes AIs that make influential decisions has vested
interests?

Look, for example, at common law: we have a system where decisions by judges
are binding, and future equivalent cases are required to uphold the previous
decisions. The only way this system can work, of course, is if someone is
keeping a record of the previous decisions. The alternative is a system of
judgement based on hearsay, in which we can have about as much confidence as in
asking a random person on the street to make the decision.

The technical challenge is really about storage and retrieval. If we know that
discarding training data makes our neural networks "unaccountable for their
decisions", then our technical requirement must be that we store the training
data _in its entirety_, such that we can look back and maybe glimpse why a
neural network behaves as it does.

For example, I might create an AI which is used to decide whether to give
someone a mortgage. I could have millions of samples as training data, but I
might choose to sort the training data by race and begin training so that
non-white people who have been rejected for mortgages initially overtrain the
network to correlate race with rejection, then use a limited sample of mostly
accepted mortgage requests from white men to train it the other way.

Of course this is extreme, and an expert who looked at such a data set would
quickly notice that the AI is unfit for purpose, and blatantly racist. But he
needs the training set to even have a chance of concluding that. Without it,
he has effectively random numbers which tell him next to nothing.

These laws aren't meant to stifle innovation or economic benefits, but only to
ensure that fair treatment is practiced in their development. As far as I see
it, if you have a neural network, a sound justification of its design and the
methodology you used, combined with a complete data set and training set which
can be analysed for biases, then there's no reason these regulations should
get in your way.

------
ddebernardy
Trying to picture the "right to explanation" being applied to a Google car or
an automated vacuum cleaner. Mm...

More seriously, would it even be feasible for recent machine learning
algorithms to explain _why_ they opted for a decision?

~~~
chriswarbo
I suppose we can always weasel around with the definition of "why".

 _Why_ was my claim refused?

Because you have a large number of consonants in your surname which, when
combined with the fact that your phone number has a prime number of "4" digits,
leads to an increased risk of fraud.

 _Why_ do those factors indicate fraud?

Because our training data indicates that they do.

 _Why_ does your training data indicate that?

...

~~~
PeterisP
Well, that's not a problem - when you say "my training data indicates that
they do" what you really mean is "our previous experience shows that
applicants with these factors have a significantly elevated chance of not
paying back our money", which by itself is a valid reason as it is objective
and based on real world facts.

A bigger problem would be when your features actually turn out to be proxies
for 'undesired' features. For example, if you do put names in as features,
then any credit risk system _will_ learn that e.g. "Tyrone Washington" is a
higher risk customer (assuming everything else is equal) than "Eric Shmidt",
since those names are highly predictive of most socioeconomic factors;
however, you'll risk a judge saying that this implements racial discrimination
even if you exclude race directly, since those names are even more predictive
of skin color than income.

~~~
fixermark
Kind of, mostly. If the training data is correlating consonants in surname or
prime number of '4' digits in a telephone number with increased credit risk,
there's a really solid case there to ask where the training data sample came
from and check it for biased sourcing.

~~~
PeterisP
If the system is correlating consonants in surname or prime number of '4'
digits in a telephone number with increased risk, well, then in practice that
simply means that you have a bad system with a serious overfitting problem -
i.e., not a problem with your data but a problem with your learning process
that's obtained "superstition" by treating random noise as important signal.

However, if you do some reason analysis, then this is useful and gives a lot
of interesting information.

Names will be correlated with socioeconomic status and ethnicity, and also
with education level [of your parents], income, etc.

Addresses obviously are strong indicators especially if you combine them with
data about that location or even particular building if that's available.

Even phone numbers can carry a lot of true information e.g. location, or for
countries that handed out new ranges for mobile phones, it would be correlated
with how long you've had your phone number/how often you change them, which is
a proxy for stability.

I'd bet that you could even build a somewhat predictive model of credit risk
based on character-by-character analysis of email addresses - even excluding
domain names, whether someone chose fluffybunny420 or jonathan.cumbersnatch
carries _some_ signal; not very much, but if you don't have better data then it
will serve as a vague proxy for age, lifestyle and frivolousness (and in the
case of domain names, possibly employer), all of which are important
influencers of risk.

~~~
fixermark
This explanation is solid and speaks loudly to the benefits of a right to
explanation. Because if the explanation is "we've found signal in the data
that correlates strongly with social biases (example: surname analysis - of
course there's increased risk if the lender is operating in a country where
people don't trust anyone whose name starts with "Mc"), and we can use that
signal to basically ethnically profile without having to admit that we're
ethnically profiling..." that's bad, and the law may address it.

"Redlining" practices aren't bad because they don't economically work. They do
economically work; if one demographic is societally disadvantaged, you don't
do your firm any short-run favors by catering to them, because they aren't
where the money is. They're bad because they work by unfairly shifting the pie
away from people who don't actually deserve to have the pie shifted away from
them---it's a variant on "sins of the fathers" reasoning to bias on categories
like surname, or race, or demographics that correlate strongly with those
demographics (like geography, in cities in the United States). There are moral
reasons to not allow those practices to get codified into algorithms so the
people using them can excuse bad behavior as "just following the computer's
orders."

------
mknocker
The first thing that came to my mind was... what about advertising on the web?
This is an area where machine learning and algorithmic decision-making are used
to decide whether or not to show an ad to a particular user. There are millions
of decisions made every day. Under that regulation, could you ask for the
reason an ad was shown to you?

~~~
matt4077
Nope, because advertisement doesn't "significantly affect" you. At least
that's how it's meant and how it will be understood. This is about credit,
employment, insurance, medical care etc.

~~~
Symmetry
Medical care?! Oh God. Well, I suppose the AMA will keep the US from embracing
machine learning so it's not like this is actually a chance for the US to
reverse its lag in mortality statistics.

------
dharma1
I think it will be very hard to implement/enforce this regulation. I attended
the London AI summit last week, and they had a speaker from a German lender
called Kreditech.

There was a question about black-box credit scoring, and the speaker made a
fair point - their models have 20,000 vectors in determining credit worthiness.
How would you begin to break that down into something explainable? You can list
the sources of the data, or offer manual review of the application if rejected
(which is something they do) - but it would be very hard to show exact causal
reasoning patterns in the automatic scoring.

~~~
kaijush
>> their models have 20,000 vectors in determining credit worthiness. How
would you begin to break that down to something explainable?

Well, somehow they decided that their 20k-parameter model is accurate. They
should at least be able to explain why they took that decision, even if the
model itself is too complex.

~~~
JoeAltmaier
Because the training set was covered with x% accuracy. Ok, how does knowing
that help?

~~~
kaijush
OK, so we're assuming supervised learning. In that case, how were the features
of the training set chosen? And how was the training set labelled?

Most of the time, outside of semi-supervised learning, those things are not in
the data, someone has to (painstakingly) decide them. That can very well be
explained I believe.

More generally, what I mean to say is that you can learn any number of models
from the same data, so to a great extent you choose what your model learns by
manipulating your algorithm's (hyper)parameters, the training set's features,
choosing what goes into the training set itself etc etc. That process leaves
room for a lot of regulatory oversight.

- in principle. In practice, since we're talking about people lending money or
selling insurance, I expect we're only ever going to get those details out of
their cold dead hands, and only with considerable effort at that.

------
zoner
Finally I get to know why my mortgage was declined :)

~~~
phkahler
>> Finally I get to know why my mortgage was declined :)

My first attempt at one was declined due to "problems with my credit report"
or some such. I happened to be working at a place where they did credit checks
on customers, so I asked someone to pull my credit report. They had all my
stuff, but had mixed in a bunch of information from someone else with the same
name - different age, there were loans on there from when I was 3. I called
the reporting company to complain and they kept getting caught up in "how did
you get your report?" as if I had done something wrong in just having access
to their incorrect information. Perhaps having someone pull the info instead
of the "consumer" getting it through proper channels was against some rule -
this was 1995. I think the same is often true today, companies want to collect
data on you but don't really want you to know what they've got or how they use
it.

~~~
dublinben
This is exactly why you are legally entitled to receive a copy of your credit
report every year. You can challenge anything on it, and have incorrect
information removed.

~~~
jerf
When mentioning this, I find it helpful to include this page:
[https://www.ftc.gov/faq/consumer-protection/get-my-free-cred...](https://www.ftc.gov/faq/consumer-protection/get-my-free-credit-report)

If you are one of the people just finding out about this today, and you Google
"free credit report", you come up with a _loooot_ of bad, scammy links. This
is how you get your legally-mandated, really free, annual credit report,
starting from an ftc.gov address so you know it's really the right one.

~~~
sshumaker
Credit Karma is legit and free. Also gives you your credit score and credit
monitoring (all free).

(Disclaimer: I work there).

~~~
dublinben
To be fair, the credit score (VantageScore 3) you provide is not the credit
score (FICO) most people are interested in, and that is used for most credit
decisions.

------
Symmetry
I wonder if machine learning could just handle this the way that humans do.
Human decision processes are of course extremely opaque, even if we're good at
coming up with post-hoc justifications for our reasoning as needed.

"You're income is too low" "But his income is also low and he got a loan!"
"Well he has a better credit score." etc

It would obviously be impossible to give a complete and accurate explanation
of a machine learning decision process the same way it would be impossible for
a human learning decision process, but I wonder if it would actually be
difficult to come up with particularly salient distinguishing features in
general or between certain cases the same way that people do?

------
gersh
Can't you just say the decision was based on a random forest trained on the
following parameters with historical data? You could get more specific about
how the model was trained, or even show the model if you have to. I'm sure you
can create a small-print tree diagram.
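
For a single tree (or any one tree pulled out of a forest), that part is
straightforward with off-the-shelf tooling; a sketch on dummy data:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 3))                 # stand-in for applicant features
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in for repaid / defaulted

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(export_text(tree, feature_names=["income", "debt_ratio", "years_at_job"]))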

In terms of human review, does this require more than a spot check? I can
imagine looking over the p-values, possibly spot-checking the input data, and
then approving it.

------
bwgoodman
Hi, I'm one of the co-authors. This feedback is awesome. I have a longer
version in the works. If you're interested, please feel free to email me: bryce
dot goodman at stx dot ox dot ac dot uk.

Thanks! Bryce

------
vixen99
So what effect will this have on risk assessment as used by insurance
companies?

"Insurance companies use a methodology called risk assessment to calculate
premium rates for policyholders. Using software that computes a predetermined
algorithm, insurance underwriters gauge the risk that you may file a claim
against your policy. These algorithms are based on key indicators about you
and then measured against a data set to weigh risk. Insurance underwriters
carefully balance the insurance company’s profitability with your potential
need to use the policy."

------
zxcvvcxz
> We argue that while this law will pose large challenges for industry, it
> highlights opportunities for machine learning researchers to take the lead
> in designing algorithms and evaluation frameworks which avoid
> discrimination.

This sounds like a very slippery slope.

Let me get this straight about what the EU wants to do here. If I make a
business that decides how to, say, make loans to people, and I have software
that analyzes how they filled out an application or took a written/logical test
or _something_, I may have to:

1) Disclose the findings and methodology of my proprietary algorithm

2) Potentially undo decisions that are deemed discriminatory

3) Thereby potentially be forced to choose who I do and do not do business
with

Some regulations are good, they help us avoid things like moral hazards. Some
regulations, however, act more like power grabs that take away freedom from
individuals and businesses.

I'm sorry to say but this sort of thing sounds like the latter.

There are some uncomfortable facts about people and society that algorithms
will uncover, and that people will try using to avoid making mistakes. Here's
a poignant example, albeit with much less sophistication:
[http://www.nytimes.com/2015/10/31/nyregion/hudson-city-bank-...](http://www.nytimes.com/2015/10/31/nyregion/hudson-city-bank-settlement.html)

> Instead, some officials say, some banks have quietly institutionalized bias
> in their operations, deliberately placing branches, brokers and mortgage
> services outside minority communities, even as other banks find and serve
> borrowers in those neighbourhoods.

Alright sure. I think we all know it's common sense to not open a jewellery
store in inner city Chicago. The local clientele would not be able to afford
such products and services, and that's enough of an argument, it's probably
business school 101. Just like how you're not going to find a good market for
a food delivery startup in Middletown, USA. The economics probably don't work.
EDIT - and if you think these businesses are wrong, and they should be in
these communities, let their competitors eat their lunch!

But it's viewed as discriminatory. Back to the paper, it's discussing
essentially the same thing: it's the natural evolution of "don't start a
jewellery store in inner city Chicago" turning into "don't start a jewellery
store in neighbourhood where access to urban center is less than X, available
nearby 3-lane highways is less than Y, percent of women from 18-29 is less
than Z, proportion of people from culture W who are in the bottom quartile of
people who buy jewellery is more than V, ...".

The solution here isn't to regulate and force people to open jewellery stores
in inner city Chicago. The solution, if we really want to bring diamonds to
Englewood, is to drastically reduce the poverty and crime rates so that
businesses will _want_ to sell there
([https://en.wikipedia.org/wiki/Englewood,_Chicago#Socioeconom...](https://en.wikipedia.org/wiki/Englewood,_Chicago#Socioeconomics)).

And if you think having access to a local business is a "right", then you must
concede that such a public good should be provided by a central government of
some type. And then you have to ask yourself, do you want the government
managing your bank loans, or would you rather have private companies compete
on prices and service? If you want the latter, then you need to accept that
certain things are just going to arise (or not arise) naturally.

