
On the Rise of FinTechs – Credit Scoring Using Digital Footprints - zt
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3163781
======
21
There was another article recently which argued that because there is a strong
correlation between race and default rates, if you apply machine learning to a
credit dataset, the algorithm will find a way to extract what is essentially a
proxy for race from the data.

So basically any sort of ML applied to credit data will run afoul of the Equal
Credit Opportunity Act.

The article also made the point that basically all ML credit-scoring startups
are illegal because of this, but they get away with it only because they are
small and not on the radar.
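The proxy effect the article describes can be sketched in a few lines: a model that never sees the protected attribute, but scores on a correlated feature (zip code, in this toy setup), ends up concentrating the protected group in its high-risk bucket. All rates and names below are invented for illustration.

```python
import random

random.seed(0)

# Hypothetical setup: the model never sees the protected attribute, only a
# correlated feature (which zip code the applicant lives in).
n = 10_000
rows = []
for _ in range(n):
    group = random.random() < 0.3                          # protected attribute
    zip_a = random.random() < (0.8 if group else 0.1)      # correlated feature
    default = random.random() < (0.25 if group else 0.10)  # outcome differs by group
    rows.append((group, zip_a, default))

# "Model": historical default rate per zip code, learned without the attribute.
by_zip = {True: [], False: []}
for group, z, d in rows:
    by_zip[z].append(d)
score = {z: sum(ds) / len(ds) for z, ds in by_zip.items()}

# The learned score now acts as a proxy: the higher-risk zip is mostly one group.
high_risk_zip = max(score, key=score.get)
in_group = sum(1 for g, z, _ in rows if z == high_risk_zip and g)
total = sum(1 for _, z, _ in rows if z == high_risk_zip)
print(f"share of protected group in the high-scored zip: {in_group / total:.2f}")
```

Even though the attribute was never an input, rejecting applicants by zip-code score rejects the protected group at far above its 30% base rate.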

~~~
tyfon
I make credit scoring models for a living, and I can tell you that in most
countries you won't find "proxies" or any other strong variables just by
applying "machine learning".

Usually when you try neural networks in this segment you end up with exactly
the same variables and outcome as you would with an ordinary logistic
regression, with 10x the complications and a much less stable model.

There simply are not enough input parameters that are significant to the
outcome.

It might be different in the US, where the field is less regulated and you are
allowed to collect all kinds of information that could proxy something like
ethnicity, although I have not found ethnicity significant in any of my data
sets. We do get some variables even though we are not allowed to use them.
Again, this might be different in the US.
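The kind of significance check described above can be sketched with a plain chi-square test of independence between a candidate variable and default, written from scratch (the counts are invented for illustration):

```python
def chi_square_2x2(table):
    """Chi-square statistic for a 2x2 contingency table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate([(a, b), (c, d)]):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: group A / group B; columns: defaulted / repaid (invented counts).
table = [[50, 950], [55, 945]]
stat = chi_square_2x2(table)
# With 1 degree of freedom, stat < 3.84 means not significant at the 5% level.
print(f"chi2 = {stat:.3f}, significant at 5%: {stat > 3.84}")
```

With near-identical default rates per group, the variable fails the test and would be dropped from the model.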

~~~
21
Quote from the article:

> You can load in tons and tons of demographic data, and it’s disturbing when
> you see percent black in a zip code and percent Hispanic in a zip code be
> more important than borrower debt-to-income ratio when you run a credit
> model.

[https://thenextweb.com/syndication/2017/12/29/future-fintech...](https://thenextweb.com/syndication/2017/12/29/future-fintech-racist-according-anonymous-data-scientist/)

~~~
tyfon
If you account for most other significant factors like income, education,
social status, job etc., you will find that ethnicity is not significant.

The fact that you can sometimes use ethnicity as a proxy for social status and
other things just shows the discrimination that happens in some places. But
when you hold all other factors equal, someone from Asia, Africa, or Northern
Europe will have the same default rate. At least in the (European) countries
I've run models in.

~~~
taurine
People with the exact same FICO score have different default rates if you
manage to bin them by race: Asians > Caucasians > Latinos > Blacks.

> We do get some variables even though we are not allowed to use them.

Cool, so you had access to an ethnicity variable to measure its proxy power
and significance? I feel this is important and very rare outside of Europe.

------
acjohnson55
The book Weapons of Math Destruction talks all about this. I've come to
believe that pure risk shouldn't be the only factor in a person's interest
rate.

That obviously makes a ton of sense from a business standpoint. You want to
contain losses for risky borrowers but compete with other lenders for low risk
borrowers.

But socially, this is perverse. People tend to be risky because they are
already poor. So now money costs more for those who have the least of it. This
is one of the feedback loops that makes poverty (and affluence, for that
matter) so sticky.

I had this realization in my personal experience when I was able to refinance
almost $100k in student loans at a crazy low interest rate. My household's
finances are in great shape as my wife and I enter our prime earning years.
But for us, such an opportunity is a gift, on top of an already sweet
situation. The savings could be a game changer for a family whose finances are
more marginal.

~~~
taurine
You can overdo fairness and cause trouble for the poor. Very concretely:
giving someone a loan despite their marginal credit score will severely mess
up that score forever if they can't repay you.

If you are poor, then don't take on more debt! It should be hard to rack up
such debt, not easy and accessible. No amount of credit is going to increase
your social status, because you have to pay it back with your own current or
near-future money.

If we want social justice for the poor through access to more money, then
capitalism is not a good way to go. The state should become a credit provider.

~~~
skinnymuch
How would someone’s credit score be messed up forever? Debts are cleared
completely after 7 years and have less weight after about half way through
that time. Bankruptcy is along the same lines.

So your example isn’t correct.

~~~
taurine
You still owe any debts after 7 years (up to 15 years in some states). Zombie
debt is not "completely cleared". But OK, read "mess up your credit score for
7 years" instead. The point remains: marginal high-risk credit underwriting
is dangerous not only to the institution, but also to the receivers of the
loans (and the economy in general). It is socially perverse to hook
lower-income people on consumer credit, or to have middle-income people lose
their houses.

------
adrr
The problem with utilizing data points like digital footprints is that it
will run afoul of the Equal Credit Opportunity Act. The ECOA was designed to
stop banks from redlining neighbourhoods, which usually punished minorities.
With digital footprints, they'll in theory be redlining people's digital
lives, including sites visited, products purchased, etc.

------
incompatible
The attributes, from the paper:

- the device type (for example, tablet or mobile)
- the operating system (for example, iOS or Android)
- the channel through which a customer comes to the website (for example,
search engine or price comparison site)
- a do-not-track dummy equal to one if a customer uses settings that do not
allow tracking of device, operating system, and channel information
- the time of day of the purchase (for example, morning, afternoon, evening,
or night)
- the email service provider (for example, gmail or yahoo)
- two pieces of information about the email address chosen by the user
(includes first and/or last name; includes a number)
- a lower-case dummy if a user consistently uses lower case when writing
- a dummy for a typing error when entering the email address
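As a sketch of how attributes like these become model inputs, here is a hypothetical mapping from one website visit to 0/1 dummy features in the spirit of the paper's list (field names and categories are my own, not the paper's):

```python
def footprint_features(visit):
    """Map one website visit to 0/1 dummy features mirroring the list above."""
    local_part, _, provider = visit["email"].partition("@")
    return {
        "is_mobile_or_tablet": visit["device"] in ("mobile", "tablet"),
        "is_ios": visit["os"] == "iOS",
        "via_price_comparison": visit["channel"] == "price_comparison",
        "do_not_track": visit["do_not_track"],
        "night_purchase": visit["hour"] < 6,
        "gmail_address": provider == "gmail.com",
        "email_has_name": visit["last_name"].lower() in local_part.lower(),
        "email_has_number": any(c.isdigit() for c in local_part),
        "all_lower_case": visit["typed_text"] == visit["typed_text"].lower(),
        "email_typo": visit["retyped_email"],
    }

# One invented visit record.
visit = {
    "device": "mobile", "os": "iOS", "channel": "search_engine",
    "do_not_track": False, "hour": 22, "last_name": "Garcia",
    "email": "m.garcia84@gmail.com", "typed_text": "max garcia",
    "retyped_email": False,
}
print(footprint_features(visit))
```

Each dummy is trivially observable at registration time, which is exactly what makes the approach both cheap and unsettling.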

~~~
acjohnson55
I feel like _none_ of these things should be considered relevant to access to
credit.

~~~
taurine
Let's play a game. You are in charge of a large pile of cash and want to make
it grow by giving out loans. Each day, two people apply, and you can give out
one loan (you will have to rank the applicants). When people defraud you, you
lose the entire loan. When people don't or can't pay you back, you lose the
entire loan. When people pay back the loan, you make a little money.

Day 1: User agent: iPhone latest vs. Windows XP
Day 2: Referral: Facebook friend vs. a search for "cheapest loans"
Day 3: Time of interaction: 21:30 vs. 04:30
Day 4: Email: ari.johnson@cs.mit.edu vs. hpqwoovz11721@hotmail.com
Day 5: Funnel: someone who spent 10 seconds vs. someone who spent 10 minutes,
made a mistake in the name, entered an email address, then deleted it and
entered another email address at a different provider

Now, if your gut feeling does not point you to the first applicant every day,
you look at the data for guidance. You find that the number of fraudsters and
non-payers is statistically significantly higher among people with the second
set of characteristics.

The alternative is to use third-party data providers. That's another can of
worms. Or flip a coin and start gambling proper.
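The daily ranking in this game can be sketched as combining per-characteristic historical bad rates under a naive independence assumption (all rates and category names are invented):

```python
# Invented historical default/fraud rates per observed characteristic.
HISTORICAL_BAD_RATE = {
    ("user_agent", "iphone_latest"): 0.05,
    ("user_agent", "windows_xp"): 0.20,
    ("referral", "facebook_friend"): 0.06,
    ("referral", "search_cheapest_loans"): 0.18,
    ("hour", "evening"): 0.07,
    ("hour", "night"): 0.15,
}

def risk_score(applicant):
    """Multiply per-feature odds (naive independence); higher means riskier."""
    odds = 1.0
    for key in applicant.items():
        p = HISTORICAL_BAD_RATE[key]
        odds *= p / (1 - p)
    return odds

applicant_a = {"user_agent": "iphone_latest", "referral": "facebook_friend",
               "hour": "evening"}
applicant_b = {"user_agent": "windows_xp", "referral": "search_cheapest_loans",
               "hour": "night"}

# Fund the applicant with the lower estimated risk.
winner = "A" if risk_score(applicant_a) < risk_score(applicant_b) else "B"
print(winner)
```

This is essentially a naive Bayes ranker; a real scorecard would estimate the weights jointly, but the intuition of the game is the same.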

~~~
pps43
That game is common in credit scoring classes for new analysts. The class is
split into small teams. Each team is given ten anonymized, but real, credit
applications and the corresponding credit bureau pulls: five from customers
who subsequently defaulted, and five from customers who paid the loan back.
Each team tries to guess which are which. The team that makes the most
correct guesses wins.

Then the results are compared with the FICO score, and usually FICO is
clearly better. Even with people with banking experience on the teams, it's
very rare to see humans beat the model, partly because humans tend to base
their decisions on irrelevant details and their own biases.

~~~
taurine
Let's change the game. You can allocate an investment to either of two banks.
When building credit scoring models, one bank has access to just FICO scores;
the other has access to FICO scores plus behavioral and signature data. Which
bank do you allocate your cash to?

Now change the game so that FICO is unavailable, for instance when
micro-lending to entrepreneurs in third-world countries. Do you still feel
these digital signatures are irrelevant to making better credit risk
decisions?

~~~
mmt
> Do you still feel these digital signatures are irrelevant to making better
> credit risk decisions?

Yes, absent a credible theory as to why a particular characteristic could
reasonably be linked to loan performance.

Otherwise, it's too likely the model could fall prey to confusing correlation
with causation.

~~~
taurine
Let's say you add these digital signature variables to your credit risk
scoring model anyway. The model then falls prey to confusing correlation with
causation. What happens to the performance of the model?

~~~
mmt
I have no idea, as merely _adding_ them may have no effect at all.

However, depending on them exclusively (or in substantial/majority part),
which I believe is the main premise, means the eventual performance will
depend entirely on whether the _actual_ causal relationship that created the
correlation holds true. If it doesn't, the model would no longer be
predictive.

[https://en.wikipedia.org/wiki/Confounding](https://en.wikipedia.org/wiki/Confounding)

------
yjftsjthsd-h
"We analyze the information content of the digital footprint – information
that people leave online simply by accessing or registering on a website – for
predicting consumer default."

Wonderful. Gameable and dystopian all at the same time.

~~~
Spartan-S63
Feels like the open market is coming up with a solution similar to China's
"social credit" scoring system.

~~~
azborder
The only thing stopping it in the US is the Equal Credit Opportunity Act.
Most of these “novel” credit scoring solutions are just attempts to work
around the race and other prohibitions in credit scoring. The good news is
that these things get shut down quickly with enough complaints. This study
points out that the digital tracking is likely a violation.

~~~
ThrowAway1451
If that is true ("the Equal Credit Opportunity Act is the only thing stopping
it in the US"), then race must be the strongest factor[1] in credit scoring,
so I'm not sure how China does it (the Chinese population is racially
homogeneous)?

[1]
[https://en.wikipedia.org/wiki/Factor_analysis](https://en.wikipedia.org/wiki/Factor_analysis)

~~~
voidmain0001
China does recognize races within its borders[1]; additionally, it may be of
interest to the scorer to know who is Hui, Tibetan, or Uyghur[2]. I don't
support this; I'm just noting that China is not racially homogeneous.

[1]
[https://en.m.wikipedia.org/wiki/Five_Races_Under_One_Union](https://en.m.wikipedia.org/wiki/Five_Races_Under_One_Union)
[2]
[https://en.m.wikipedia.org/wiki/Ethnic_issues_in_China](https://en.m.wikipedia.org/wiki/Ethnic_issues_in_China)

------
pps43
The baseline (FICO) has an AUC of 68.3%, which looks low. This may be because
the analysis is performed not on the entire through-the-door population, but
only on the customers who passed the creditworthiness check (which uses
FICO).

In such situations it is customary to do some kind of reject inference or
testing below the cutoff, as well as swap-in and swap-out analysis. It does
not look like they did any of that.
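For reference, AUC here is just the probability that a randomly chosen good customer is scored above a randomly chosen defaulter. A from-scratch sketch with made-up scores:

```python
def auc(scores_good, scores_bad):
    """Fraction of (good, bad) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for g in scores_good:
        for b in scores_bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(scores_good) * len(scores_bad))

# Hypothetical FICO-like scores for repaid vs. defaulted loans.
good = [720, 680, 650, 700, 610]
bad = [640, 590, 660, 600]

print(f"AUC = {auc(good, bad):.3f}")  # 0.5 = coin flip, 1.0 = perfect ranking
```

By this measure, 68.3% means the score ranks a random good/bad pair correctly only about two times in three on their sample, which is why the selection effect above matters.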

------
rahimnathwani
I lead the Data Science team at Oakam, a London-based fintech company founded
in 2006.

If you find the article interesting, you may also be interested in Oakam's
work using alternative data to predict credit default, which was covered
recently in The Economist:

[https://www.economist.com/special-report/2018/05/03/mobile-f...](https://www.economist.com/special-report/2018/05/03/mobile-financial-services-are-cornering-the-market)

If you're a Data Scientist looking to work in this area, or just looking for a
new challenge, please contact me (personal email in my profile) so we can have
a chat!

We are also hiring software engineers (stack is React Native for iOS/Android,
and mostly C# for everything else).

