Disclaimer - I work in the financial services industry.

This is nothing new: in most financial services, machine learning algorithms already have to meet these requirements. This is why supervised analytics are more popular, and one of the reasons why algorithms such as credit scores are typically generated and structured as scorecards with explicit reason codes for each independent characteristic, variable, or vector.
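To make the scorecard idea concrete, here is a minimal sketch of a points-based scorecard with reason codes; the characteristics, bins, point values, and reason-code wording are all invented for illustration, not taken from any real scorecard:

    # Minimal sketch of a points-based scorecard with adverse-action reason codes.
    # Every characteristic, bin, and point value here is invented for illustration.
    SCORECARD = {
        "utilization": [(0.30, 55), (0.70, 30), (10.0, 5)],            # (upper bound, points)
        "months_since_delinquency": [(12, 10), (36, 30), (9999, 50)],
        "oldest_account_age_months": [(24, 10), (60, 25), (9999, 40)],
    }
    REASON_CODES = {
        "utilization": "Proportion of balances to credit limits is too high",
        "months_since_delinquency": "Time since most recent delinquency is too short",
        "oldest_account_age_months": "Length of credit history is too short",
    }

    def score(applicant):
        total, shortfalls = 0, []
        for characteristic, bins in SCORECARD.items():
            best = max(points for _, points in bins)
            for upper, points in bins:
                if applicant[characteristic] < upper:
                    total += points
                    # Rank reasons by how far below the best possible points
                    # the applicant landed on this characteristic.
                    shortfalls.append((best - points, REASON_CODES[characteristic]))
                    break
        shortfalls.sort(reverse=True)
        return total, [reason for _, reason in shortfalls[:2]]

    print(score({"utilization": 0.85,
                 "months_since_delinquency": 8,
                 "oldest_account_age_months": 30}))
    # -> (40, ['Proportion of balances to credit limits is too high',
    #          'Time since most recent delinquency is too short'])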

This is really needed for a number of different reasons. The biggest is that algorithms are increasingly running our world. Even outside of mortgages, have you ever had to fork over the right to pull your credit score to rent an apartment or (in some limited cases) to apply for a job? Imagine not being able to understand a) why you were rejected, b) why you are paying a higher interest rate, or c) what you can do to fix that.

Housing and employment are fundamentally questions of human rights, not just banking profitability or academic algorithm improvements.

Part of the reason credit scoring (which is intrinsically algorithmic decision-making) won out is that it replaced the "old boys" network that used to dominate. In that world you went to your banker, and the decision to extend credit was driven by your personal relationship with a banker who might not be the same ethnicity, religion, nationality, etc. as you, and that relationship determined your ability to purchase housing. The credit score democratized financial access.

It can still be used to discriminate. A number of studies have shown that simply being from a white community tends to increase the number of other people you can lean on in an emergency, and hence makes you less likely to default on a loan. From a pure predictive point of view, a bank is at lower risk of default lending to someone from that community, but that in turn denies financial access to people outside that community, continuing the trend.

A big problem in this kind of financial model is that it's relatively easy to accidentally or deliberately find other variables that mirror "protected" vectors. A simplistic example is zip code, where the zip code reflects an area that is predominantly ethnic minorities.
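One rough way to see how easily this happens is to check, before any modeling, how well each candidate feature predicts the protected attribute on its own. A sketch of such a check, assuming a pandas data frame with hypothetical column names:

    # Sketch: flag candidate features that could act as proxies for a protected
    # attribute by testing how well each one predicts it on its own.
    # The data frame, column names, and any threshold you apply are assumptions.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    def proxy_report(df, protected, candidates):
        report = {}
        for col in candidates:
            X = pd.get_dummies(df[[col]], drop_first=True)   # encodes categoricals such as zip code
            model = LogisticRegression(max_iter=1000).fit(X, df[protected])
            auc = roc_auc_score(df[protected], model.predict_proba(X)[:, 1])
            report[col] = auc   # ~0.5 means no proxy power; close to 1.0 means a strong proxy
        return report

    # e.g. proxy_report(applications, protected="minority",
    #                   candidates=["zip_code", "income", "utilization"])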

So it's not cut and dried. It's my PoV that it's not predictability versus red tape, and the people trying to do unaccountable analytics in this space are (perhaps inadvertently) perpetuating racism.



I think there are two problems with requiring explainable models:

Explainable models will be more easily gamed, and they are likely to be less accurate.

First, the features in the models will become less useful at their task: they will be gamed. This is roughly along the lines of Campbell's law [1], though I've seen other, better explanations that I can't find. What happens when someone is turned down for reasons A and B? They go and fix A and B, but in the meantime so have many other credit seekers, diminishing those features' predictive value. By then the modelers have created a new, different explainable model that no longer uses A and B but somewhat related predictors C and D, which haven't yet been used and so haven't yet been gamed, and which the original seeker does not meet.

Second, explainable models, being a subset of all models, are unlikely to contain the most accurate ones. I don't know anything about the domain of credit scoring (maybe state-of-the-art models are actually small and understandable?), but in speech recognition, for example, models are constantly growing in complexity, their individual decisions are way beyond explainable to anyone in a reasonable amount of time, and they are only getting more powerful as they get larger and more complex. In speech, models are already many gigabytes. In credit scoring, less accurate models mean higher rates, so there is an aggregate loss.

[1] https://en.wikipedia.org/wiki/Campbell%27s_law


A fair point, but as a society we have decided that racial discrimination is not a valid mechanism for banks to profit by. That does result in everyone paying a bit more in interest as the risk pool is larger, but it's an acceptable tradeoff.

In terms of gaming, verification is just as important as scoring. If the data going into the system is rigged, and income is not being properly validated, bad things will happen.


As a society we have directed banks to make bad loans to blacks and charge non-blacks extra to make up the difference? I'd be surprised if even 10% of people know this decision was made.

Also, what makes it acceptable to engage in this form of surreptitious wealth redistribution on racial lines?


Not being a racist makes it acceptable to not take race, or a surrogate for race, into consideration for a loan ;-)

(Please don't take that the wrong way. I am not accusing anyone of racism. I'm simply stating that at some point our ideals are more valuable than an additional point of profit for the bank.)

Disparate impact and its use in credit scoring is mostly governed by the Equal Credit Opportunity Act (ECOA), but most of the banks I am aware of go steps further in ensuring that disparate impact does not occur.


This is only important if you believe that racial stereotypes are true but should be ignored; that is, that if you control for education, income, region, etc., differences between races are still significant.

Disparate impact goes well beyond removing race as a feature. You sometimes can't use features that correlate with race, even if they are highly predictive. E.g. education.

It also has nothing to do with the profits of banks. Better prediction algorithms for loans mean lower interest rates for people who are good borrowers, and less crippling debt for those who aren't. That has huge benefits for the economy and society.


[flagged]


Society has chosen to not regulate which alleyways people choose to walk down.

There are many things that individuals are free to do but organizations are not.


I think the case of zip-code-based discrimination even has a name: redlining. The term is used in the machine learning world to describe indirect discrimination based on certain attributes (e.g. discriminating against people based on their zip code, where that zip code has a mostly black population).


But it's also very important to understand why machine learning systems do this. If you take race-neutral data and run virtually any machine learning system on it, you'll get a race-neutral output. I.e., if a $40k/year black person in 10001 is equally likely to default as a $40k/year white person in 10002, then the algorithm can be expected to give them equal credit scores.

In the event that an algorithm chooses black zip codes for extra penalties, it's because the algorithm has deduced that stereotypes are correct: the bank will make more money not lending to blacks at given rates, because something directly correlated with race and not captured by fields like income/past defaults/etc. is predictive.

I discuss this in greater detail here, taking linear regression as a toy example: https://www.chrisstucchio.com/blog/2016/alien_intelligences_...
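A small simulation makes the same point (this is my own toy example, not the one in the linked post; every number is invented). When default risk depends only on income, the regression learns to ignore zip code; once an unobserved factor correlated with zip code drives defaults, the zip coefficient picks it up:

    # Toy simulation: logistic regression ignores zip code when the data are
    # zip-neutral given income, and weights it once an unobserved factor
    # correlated with zip code raises default risk. All numbers are invented.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 50_000
    zip_b = rng.integers(0, 2, n)            # 1 = zip code "10001", 0 = "10002"
    income = rng.normal(40, 10, n)           # $k/year, same distribution in both zips

    def fitted_coefs(defaulted):
        X = np.column_stack([income, zip_b])
        model = LogisticRegression(max_iter=1000).fit(X, defaulted)
        return dict(zip(["income", "zip_10001"], model.coef_[0].round(3)))

    # Case 1: risk depends on income only -> zip coefficient comes out near 0.
    p1 = 1 / (1 + np.exp(0.1 * (income - 40)))
    print(fitted_coefs(rng.random(n) < p1))

    # Case 2: an unobserved factor correlated with zip raises risk -> zip gets weight.
    p2 = 1 / (1 + np.exp(0.1 * (income - 40) - 0.8 * zip_b))
    print(fitted_coefs(rng.random(n) < p2))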

Having consulted for folks doing lending, I'll mention an insanely difficult $100B startup idea here: build an accurate algorithm that exhibits no disparate impact. The amount of money you'll save banks is staggering.


True, and then the question becomes: are the stereotypes correct, or is the credit score propagating a system that enforces this outcome?


The stereotype (that predominantly black households in a poor area code are poorer than the predominantly white households in a rich area code) can be proven correct by simple demographics and the fact that household wealth is correlated with credit score.

The problem is that there's a positive feedback loop at play, here.


The bank already knows their assets and income. The question is whether an equally poor and educated white person is just as likely to repay a loan as an equally poor and educated black person. I imagine most of the difference would go away, unless you really believe black people are inherently less likely to pay back loans, all else equal.


If you can come up with an algorithm that reproduces this conclusion - with accuracy even remotely close to the "racist" ones - banks will cross heaven and earth to pay you $billions.

Unfortunately the only public analysis I'm aware of is from a blog using Zillow data: https://randomcriticalanalysis.wordpress.com/2015/11/22/on-t...

This effect is reproduced in various walks of life, e.g. education.

You'll be doing social good in addition to earning billions of dollars. What's not to like?


I do in fact have an algorithm to remove racism from models, but I doubt it's worth "$billions". The whole point of my argument is that it shouldn't be necessary. Surely you don't really believe racist stereotypes are true?


I for one of course do not believe that people from other regions/continents are somehow inherently worse or better.

But I do believe that certain elements of different cultures and different ways of social upbringing can have a lasting positive or negative effect on a person. (Although who am I to judge what is positive or negative? I try not to; I just see differences.)

If these things didn't affect how we later view the world as adults, interact with others, and respond to different types of challenges in our lives, then you could expect that basically everyone around the world would have the same moral value system and beliefs about almost everything.

This is obviously not the case.


I do, in fact. There is extensive research supporting the fact that many (though not all) are accurate. The typical racist stereotype is about twice as likely to replicate as the typical sociology paper.

Here are a couple of review articles to get you started: http://www.spsp.org/blog/stereotype-accuracy-response http://emilkirkegaard.dk/en/wp-content/uploads/Jussim-et-al-...

I didn't say that simply removing "racism" (by which I assume you mean disparate impact) is worth billions. I said doing so with the same accuracy as models which are "racist" is worth billions. Obviously you can do anything you want if you give up accuracy.

Why do you believe they are false? Simply because it's socially unacceptable to think otherwise?


I'm not sure whether or not the stereotype is true. The point is that you are forced to acknowledge a pretty unpopular belief to defend this idea, and the kind of people super concerned about disparate impact are usually not the kind of people who believe that. And I think that if you do acknowledge there are differences, the argument that it's unfair and wrong is much weaker.

In any case, the link you posted earlier suggested it could be explained by different rates of numeracy and literacy, which sound like obvious proxies for IQ. If you gave people an IQ test, or at least a literacy or numeracy test, that would probably eliminate any racial bias.

> I said doing so with the same accuracy as models which are "racist" is worth billions.

You necessarily lose some accuracy if you really think race is predictive, but you need not lose all of it. The current methods for removing bias just fuzz features that correlate with race. But there is a way to make sure a model doesn't learn to use features merely as proxies for race, and instead estimates how much they matter independent of race.
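For what it's worth, at least one published proposal works along those lines (Pope and Sydnor's approach to statistical profiling, as I understand it; I don't know if that's what the parent means): include race while fitting, so the other coefficients are estimated net of it, then average race out at scoring time. A rough sketch with made-up names:

    # Rough sketch of a proxy-controlled model: fit with race included so the
    # other coefficients are not forced to stand in for it, then neutralize
    # race when scoring. Column names and inputs are made up for illustration.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_with_race(features, race, defaulted):
        X = np.column_stack([features, race])
        return LogisticRegression(max_iter=1000).fit(X, defaulted)

    def score_race_neutral(model, features, race_share):
        # Every applicant is scored as if they had the population-average value
        # for race, so predictions vary only with the non-race features.
        neutral = np.full((len(features), 1), race_share)
        return model.predict_proba(np.column_stack([features, neutral]))[:, 1]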


> Simply because it's socially unacceptable to think otherwise?

Please stop insinuating this at people in HN comments. You've done it frequently, and it's rude. (That goes for "simply because of your mood affiliation?", etc., too.)


Dang, I'm really confused here. As Houshalter points out, he was deliberately making an argument based on the social unacceptability of my beliefs. Searching for the phrase on hn.algolia.com, I've used the term 5 times on HN ever, never in the manner you imply.

I'm also confused about my use of the term "mood affiliation". Searching my use of the term, most of the time I use it to refer to an external source with a data table or graph that supports a claim I make, but a tone which contradicts mine. For example, I might cite Piketty's book, which claims the best way to grow the economy is to funnel money to the rich (this follows directly from his claim that rich people have a higher r than non-rich people). What's the dang-approved way to point this out?

In the rare occasion I use it to refer to a comment on HN, I'm usually asking whether I or another party are disagreeing merely at DH2 Tone (as per Paul Graham's levels of disagreement) or some higher level. What's the dang-approved way to ask this?

http://paulgraham.com/disagree.html

Let me know, I'll do exactly as you ask.


> usually asking whether I or another party are disagreeing merely at DH2 Tone (as per Paul Graham's levels of disagreement) or some higher level. What's the dang-approved way to ask this?

That seems like a good way to ask it right there.

Possibly I misread you in this case (possibly even in every case!) but my sense is frequently that you are not asking someone a question so much as implying that they have no serious concern, only emotional baggage that they're irrationally unwilling to let go of. That's a form of derision, and it doesn't lead to better arguments.

But if I've gotten you completely wrong, I'd be happy to see it and apologize.


As the person he was replying to, I didn't find that statement rude in context. The belief being discussed really is socially unacceptable; my argument, in fact, relied on that.


I personally draw the line between acceptable and unacceptable discrimination at whether the disadvantaged person can change that aspect.

Skin color cannot be changed, but zip code certainly can. There is no rule I'm aware of that restricts visible minorities from living in rich neighbourhoods.


Except, of course, for the fact that visible minorities cannot in fact afford to live there, or get a loan to live there, because they currently live in a poor neighborhood.


Interesting aside (but still on topic): this works for IP addresses as well as zip codes.


> It's my PoV that it's not predictability versus red tape, and the people trying to do unaccountable analytics in this space are (perhaps inadvertently) perpetuating racism.

I was with you until that last line. It is not racism to provide better credit conditions to groups that demonstrably have a lower risk of defaulting on a loan.

It would be racism if you offered worse conditions to a certain group without any rational, business-related explanation.

If some group has a 2x chance of defaulting on a loan, then it isn't about the color of their skin - they get to pay more interest because it is riskier to offer this group a loan in the first place.

With that logic you could just as well say that offering better credit conditions to rich people is racist against poor people.


You said: "I was with you until that last line. It is not racism to provide better credit conditions to groups that demonstrably have a lower risk of defaulting on a loan."

It absolutely is, under ECOA, which explicitly maintains that you may not discriminate by race for loans.

And by the way, as someone very familiar with credit scoring once mentioned to me, a small part of the reason credit scores do not take into account what you currently make is that ability to pay is often a very poor predictor of willingness to pay. This burned a lot of people in 2006/2007, when it was assumed a good credit score justified a big loan, without any proper verification of income, assets, and liabilities.


I didn't say that it is okay to discriminate by race for anything; I'm against that.

What I'm saying is that people should be treated fairly instead of using some kind of affirmative action.

If you did a proper verification of income, assets, liabilities and education, you'd likely find that some groups would have to receive worse credit conditions than others.

This doesn't mean that the system is or would be racist; it would strictly evaluate on socio-economic background, not on race.

What I reject is handicapping one group to make up for the losses that another group generates for a bank by artificially lowering standards for certain groups.

Although this is done with the best intentions, it is inherently racist (it presumes that some races are inferior to others, hence the introduction of lower standards), whereas evaluating the way I described would not be.

Btw: I'm myself a second-generation minority. If I had had the option to navigate through life on some social security autopilot and affirmative action provided by society, I would never have made it to the upper strata of society.

You have to understand that expectations for life, from where I and many others started, were much lower than for most others, so people at this level are much more willing to accept a lower living standard provided for free than to work hard for a chance at a higher standard.

From the perspective of upper-middle-class people, these kinds of affirmative action and social benefits might seem like genuinely helping people, whereas from my perspective they look like poison that could have inhibited my will to work on myself and fight to move upwards in society.


I was trying to think of how to specify this position more precisely:

1. Certain factors are thought to be causally correlated with higher default rates by individuals.

2. Certain populations have higher incidences of individuals matching such factors.

3. These populations have higher than average default rates.

4. Other factors that correlate with membership in these populations will thus also be correlated with higher default rates.

5. For an algorithm to be fair, it must be restricted to using only vectors for which causality can be proved.

One of the questions is, even if a vector is "proved" to be causal, would it be disallowed if it also strongly predicted membership in a protected population?


Yep. Or other -isms:

> What if [your hiring algorithm] is weeding out women most likely to be pregnant in the next year?

https://theoverspill.wordpress.com/2016/07/04/start-up-linke...


This is the law of unintended consequences. Data points are correlated with other data points. Unless your hiring algorithm is a random number generator, you could find some sort of "bias" you didn't intend.

At least an algorithm is measurable, repeatable, and consistent. Yes, you have to monitor the output of any process for unintended consequences, and then make tweaks as appropriate to eliminate unintended outcomes.
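As a sketch of what that monitoring could look like, one common rule of thumb is the EEOC's "four-fifths" test on outcome rates by group; the groups and counts below are hypothetical:

    # Sketch of output monitoring: compare approval rates across groups using
    # the "four-fifths" rule of thumb. Groups and counts are hypothetical.
    def adverse_impact_ratio(approved_by_group):
        """approved_by_group maps group -> (approved_count, applicant_count)."""
        rates = {g: a / n for g, (a, n) in approved_by_group.items()}
        ratio = min(rates.values()) / max(rates.values())
        return ratio, ratio >= 0.8        # below 0.8 is commonly treated as a red flag

    print(adverse_impact_ratio({"group_a": (480, 600), "group_b": (300, 500)}))
    # -> (0.75, False): group_b is approved at 75% of group_a's rate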



