‘Simple’ AI can anticipate bank managers’ loan decisions to over 95% accuracy (unite.ai)
107 points by Hard_Space on Feb 19, 2022 | 90 comments



This headline and article are horrible and misrepresent the problem and the outcome.

The paper is about the manual process of re-assigning a credit score on a scale of 1 to 15 based on other customer criteria. Really the fact that this process exists at all shows that their initial credit scoring approach is flawed or too simplistic. The argument of "just replace it with an if statement" does not hold up in this scenario.

So this is not an "if number big, lend; if number small, no lend" problem. It's a 15-way multi-class classification problem. They even give a baseline in the paper for what happens if you randomly pick or always pick the biggest class.

> As is typical in machine learning we also report the Accuracy p-value computed from a one-sided test (Kuhn et al., 2008) which compares the prediction accuracy to the "no information rate", which is the largest class percentage in the data (23.85%).

So yeah, 95% is somewhat better than 23.85%.
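
For anyone curious what that comparison actually involves, here is a rough sketch (with placeholder labels and predictions, not the paper's data) of computing the no-information rate and the one-sided accuracy p-value the quoted passage describes:

    # Sketch of the baseline comparison above: the "no information rate" is the
    # share of the largest class, and the accuracy p-value is a one-sided
    # binomial test of the model's accuracy against that rate.
    # The labels/predictions below are placeholders, not the paper's data.
    import numpy as np
    from scipy.stats import binomtest

    y_true = np.random.randint(1, 16, size=1000)          # fake 15-class scores
    y_pred = y_true.copy()
    flip = np.random.random(len(y_pred)) < 0.05            # pretend ~95% agreement
    y_pred[flip] = np.random.randint(1, 16, size=flip.sum())

    _, counts = np.unique(y_true, return_counts=True)
    no_information_rate = counts.max() / len(y_true)

    n_correct = int((y_pred == y_true).sum())
    accuracy = n_correct / len(y_true)

    # One-sided test: is the observed accuracy greater than the NIR?
    p_value = binomtest(n_correct, len(y_true), no_information_rate,
                        alternative="greater").pvalue
    print(no_information_rate, accuracy, p_value)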

I agree with the general sentiment that it is likely a fairly straightforward problem to predict if you are familiar with the bank's operating procedures, as there is no way these individuals are making their own risk models and independent decisions. They are there to follow the rules and provide human accountability.

An error analysis on the items the model couldn't predict would definitely have been most interesting.


Plus, the systemic risk of repeatable exploitation is more likely without humans in the loop. Making a bad loan for $1M is bad, but if "attackers" can repeatedly probe until they get a bad-risk $1B loan approved, it becomes business-shattering.


Yes, which is why there are hard breakpoints where banks' credit policies change. Specifically, there is an absolute ceiling for automation, in terms of risk of loss on lending, set in the bank's credit policies, and any changes have to be approved by their regulator.

No bank in any credible jurisdiction will have an automated system approving 1bn USD equivalent loans any time soon. A typical setup would be: loans up to a certain amount get approved automatically, up to $xM by a credit officer, up to $yM by a senior credit officer, and anything over that by the credit committee. Regulators push back very hard on automated decision-making for large loans, particularly because of "default correlation skew"[1] problems as were revealed in the 2008 crisis. Relatively few bad decisions on big loans can push a bank into difficulties if it is not well-capitalized. This is particularly a problem for automated decision-making because as loans get larger they also get more idiosyncratic, and therefore it's much harder to fit a model with confidence because there simply aren't enough data points.[2]

[1] Often credit quality for a group of loans rises because of idiosyncratic factors but deteriorates together, so as loans become more risky in an adverse economic environment the correlation of the default probability between the loans goes up. An intuitive way of thinking about this is in housing loans. If 3 or 4 of your neighbours default on their loan, property prices on your street will go down (because the banks will be trying to sell all those houses at once), making it much more likely you and the rest of the residents will default too.

[2] Say I'm trying to approve a 200k loan to expand a pizza restaurant. If I'm in a big bank I have hundreds of similar loans to use as data points for pricing and risk. If I'm trying to approve a 200M loan to build a luxury hotel complex that includes 5 restaurants, accommodation, retail etc it is completely a one-off. Even if I'm the largest commercial lender this loan will be unique in my portfolio. I will have many other large loans but there will be lots of idiosyncratic factors that make them different.


Do they remove the human in the loop? That doesn't seem like a smart idea. A model would be good just for suggestions.

Humans are both biased and high-variance (not to mention corruptible), but an algorithm can benefit from much better scrutiny and ensure uniform application of the criteria. If a human overrides it, they ought to have a good reason.


In Europe you have an absolute right to a human making your loan decision[1]. So if a loan decision is made automatically you can request a human decision instead.

[1] https://ec.europa.eu/info/law/law-topic/data-protection/refo...


This isn't surprising. Most 'AI' comes down to correlating a small number of variables (one or two) with a prediction target. The real benefit of any form of machine learning is detecting functional relationships between variables (ie, "when this AND NOT that OR this"). It's just that these relationships don't provide a real benefit in 99/100 real world use cases.

In the case of a loan, if the credit score is high they'll probably qualify. If it's low, they won't. All other variables are very minor in comparison to this one, which already encapsulates almost all of the relevant information. This is by far the most common situation for practical machine learning.
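
A toy illustration of that point, on entirely made-up data (the 640 cutoff and the noise level below are invented): when one variable dominates, a one-line threshold rule already captures almost all of the achievable accuracy, leaving a fancier model next to no headroom.

    # Synthetic example of a single dominant variable: approval is essentially
    # a threshold on credit score plus a little noise, so the simple rule is
    # already near the ceiling. All numbers here are invented.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    credit_score = rng.normal(650, 80, n)
    income = rng.normal(50_000, 15_000, n)

    approved = (credit_score > 640).astype(int)     # dominant variable
    approved[rng.random(n) < 0.03] ^= 1             # a little label noise

    rule_pred = (credit_score > 640).astype(int)    # "if number big, lend"
    print("one-variable rule accuracy:", (rule_pred == approved).mean())
    # A fancier model trained on credit_score AND income can at best recover
    # the ~3% noise, i.e. almost no headroom over the simple rule.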


When dealing with stakeholders in big corporations I learned that they absolutely needed AI, machine learning and data science as capabilities in the projects that they started, even if data was only very tangentially related to the core product/project. If so, for example, one needed to datascience the shit out of project metadata to create a project efficiency self-optimization loop. Or whatever crap one could come up with.

Else they would not be able to secure funding for the idea from upper management because they had learned that 'data is the new oil'.

The second VS part always needed/needs to be how this project enables the product to be (or become) a platform where the company can then run value-added services on top (value-adding for the company, not the customer).

Imagine a car being a platform where you can book additional horsepower for an upcoming trip. Or change the background image for the instrument panel. Or pay extra to natively access your Spotify playlists. And then datascience the user data to recommend even more stuff, resulting in $$$ (by imagination from management, fuelled by slide decks created by some junior strategist who has never even owned a car but presented by the client account lead at a lush retreat).


Are we headed for another AI winter? Because this sounds like a recipe for another AI winter.


I am unclear whether you are saying (a) that they need data science to justify their projects, because "an AI told us we need $X extra for the project" carries more weight than "we did the analysis by hand", or (b) that the data science actually provided value in this case.


No. They need AI in their projects to get management buy-in (and funding), because management swallowed the hook and now wants everything AI- and data-driven (even if it makes no sense), to be able to brag to managers from the competition and other industries that they now use AI and are a data-driven company.


So you can feed them data from an RNG and say it’s from an AI and they’ll follow it?


Yes. But there is also an interesting failure mode: people don't bother applying if they don't think they'll get accepted. So presumably even a model that just answers "YES" will be a fairly good model of what actually happens, without necessarily capturing the real model that the bank is using. A 95% accuracy rate isn't necessarily very good because of this effect (it might be, it just isn't certain).


My SO used to work in a bank, in the business loan securities back office. Believe me, lots of people and businesses apply even if they themselves see no realistic chance of receiving a loan.

And among private, non-business customers there is a significant number who have less than nothing to their name, live off welfare, and still don't see why they should not receive a loan from the bank for the newest iPhone. Or a new car. Or a TV because "football's coming home".

Excuse my snarkiness, but the stories she told me from her apprenticeship at the bank, when she was customer-facing, did not paint a good picture of humans (well off or not, btw). But that is probably the case in any consumer-facing job.


Excuse my snarkiness, but having actually worked for a bank and seen the inside of the loan department from an IT perspective, the bulk of that translates into: banks will let people who are already wealthy leverage themselves to become even more wealthy, while working hard to keep those already down firmly in their place. Not to mention taking advantage of them by charging exorbitant fees for the little bit of credit that is extended to them.


I am with you. Shitty entitled customers calling female tellers whores that need to be f**ed to loosen up a bit (happened more than once) do not absolve banks from their abysmal policies. Or the system that not only enables but sometimes regulates/necessitates these policies.

I am not siding with banks in general here. I was just exemplifying one aspect to show how it is not always black and white, and how there can be unreasonable customers.

[Edit typos]


Since you have industry experience how would you have banks rectify this perceived injustice? I can’t conceive of an underwriting process worthy of the name that isn’t going to consider those borrowers with collateral more creditworthy than those without.


For one, overdraft fees should only be charged once per calendar month, not for every overage.

Also, the burden on the bank for an overdraft is near zero, so why is it $35 for the customer? Perhaps a fee of only $5 would be less regressive?


I don’t know if it’s still the case, but it used to be at least some US banks would order pending transactions in descending order of amount to maximize overdraft fees. For example, if you had $100 in your account and had charges for $1, $1, $1, $1, and $100 then as a “courtesy”[1] the bank would clear the $100 and then charge $35 times 4 for the remaining transactions.

Anyhow that’s really awful and needs to stop, but I don’t see eliminating overdraft fees making a real dent in the advantages wealthy borrowers have.

How could underwriting be changed to identify truly creditworthy borrowers who don’t have any financial or real assets? We already know that simply lowering underwriting standards has problems.

[1] Their words, not mine.


An EU bank director once told me that their best customer is not the one that pays its loans on time but the one that misses payment dates, so they can charge high fees.


Yeah, that's a very high-interest loan.


> why they should not receive a loan from the bank for the newest iPhone

Excuse my snarkiness but as someone who's been poor and needed a phone, I'd have got a loan to buy a 2nd hand phone if anyone would've given me one. Trouble is, both the companies providing the phones, and those providing the loans seem to prefer that I get the brand new one.


I am not talking about reasonable needs here. I am talking about people who clearly feel entitled to receive loans they know they can never repay and would not even be willing to try.

I myself had to live on 345 euros a month for quite some time, and for even longer I was living below what Germany considers the monthly income threshold for not being called poor ("Armutsgrenze"). I know the feeling and I hear you. The only difference is that I could not even get a contract, as no telco would find me creditworthy enough. I needed to do prepaid and see how I could get hold of any old mobile.


Yup. This has been shown over and over by Meehl and that gang.

> The real benefit of any form of machine learning is detecting functional relationships between variables (ie, "when this AND NOT that OR this"). It's just that these relationships don't provide a real benefit in 99/100 real world use cases.

And the real problem with humans, ironically enough, is that due to narrative fallacy, confirmation bias, etc., humans vastly overvalue the contributions of these special circumstances.

If a dumb 2-3 variable rule predicts your friend will go to the movies at the weekend, you are likely to go, "Yeah, maybe. But she complained of that headache this morning; maybe there's something deeper there that also makes her not want to go to the movies."

In most cases, you'll be wrong to override the dumb rule.

So why do people do this? Well, the original movie-going prediction might only have been a 55% shot. So 45% of the time, your friend won't go to the movies, headache or not. But if they don't, you'll think you made the right call considering that headache.

If they do go, you'll end up thinking, "Right, of course. They had the headache but their friend really wanted to go so of course they would endure it."

In other words, you'll think of a way to frame your incorrect call as the right one. (Hindsight bias.)


Title is oddly framed. What is interesting or useful about merely not-quite predicting what a human will do?

Do the AI's 5% discrepancy picks perform better or worse than the human's picks?

The title is worded to suggest that the AI can do, or is very close to doing, the human's job, which could well be so.

But it could also be that the AI loses 5% vs the human, and the bank only makes 5% on loans in the first place, so losing 5% of them would erase the entire income of the bank (from loans), which would make the AI a complete Hindenburg, rather the opposite of the implication of the title.


Absolutely! There is not enough data here to signify the actual performance of the algorithm.

Without more data it's not clear if the algorithm is actually effective. E.g., if the approval rate of these loans is 80%, a formula of "always return true" could be said to be '80% accurate compared to human decisions'.

And if the AI makes the same decisions as a human 95% of the time, but then makes horrific errors the other 5% of the time, it wouldn't be an appropriate replacement for the human. This is the same issue as with self-driving: it doesn't matter if you make great decisions 99% of the time if 1% of the time you decide to drive into a wall (at least until you have got your error rate down to a point where it's safer on aggregate than a human).


It takes a long while to validate this statistically though as default cohorts move through the lending cycle.


Given the research by Meehl, if there's a decent numeric rule and a human decision that are closely predicting each other, my money would be on the numeric rule being slightly more accurate.



I was just coming here to post the same link. The 95% accuracy should be compared to the general approval rate - does the fancy AI beat a simple return true?


The logistic regression would likely pick up on that if it were a stratification issue.


Yup. I always advocate that the first model one should build is the one that always returns the most common answer.

That way, you can compare the more fancy stuff to something to see whether you're really improving.

You also get to evaluate the economic gains of the more sophisticated model against the development, maintenance, and data costs of it, compared to the dumb one.

(Another good baseline is returning a random historic result in proportion to how often it occurs. It sometimes helps against exploit attempts at the expense of data.)
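
Something like this, as a rough sketch with a synthetic dataset standing in for real loan data: the majority-class baseline, the "random draw in proportion to history" baseline, and a fancier model side by side, using scikit-learn's DummyClassifier.

    # Compare two dumb baselines against a fancier model on synthetic data.
    # The dataset here is a placeholder, not real loan data.
    from sklearn.dummy import DummyClassifier
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=5000, n_classes=3, n_informative=5)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    majority = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
    stratified = DummyClassifier(strategy="stratified").fit(X_tr, y_tr)  # random draw per class share
    fancy = GradientBoostingClassifier().fit(X_tr, y_tr)

    for name, model in [("majority", majority), ("stratified", stratified), ("fancy", fancy)]:
        print(name, model.score(X_te, y_te))   # is the fancy model worth its cost?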


Accuracy is almost never the right statistic; here, profitability is. As soon as you realise that the target statistic applies a human goal-value weighting to the prediction outcomes, it should be easier to see why ML systems are often unsuited to the tasks they're set.

Here, those 5% of cases are likely business-ending or business-making, precisely because they aren't naively routine.


I think you can be even bolder: accuracy never matters. It's always about the consequences, not probabilities. Sometimes the two are the same, but most of the time they are not.

Also worth noting that the arithmetic expectation is only a good way to measure the profitability if we are talking about small amounts compared to your total wealth. For any other case, you should use the geometric expectation of total wealth to evaluate options. (This is equivalent to maximising log wealth, the Kelly criterion.)


Outcome * probability = expectancy, which is the quantity usually maximised or minimised in AI.

The problem in this case was defined as reproducing the human scores. Mainly, it's a demonstration of the information content in the data and the scores. To me, it demonstrates how simplistic the human scores are.

With information about historic outcomes, we could have compared the effectiveness of the "credit score method" with some other AI algorithm that was optimizing total value.


One little expansion: what you described first is the arithmetic expectation, which is a good approximation when the numbers involved are small compared to total wealth.

When you start taking larger bets, you want the geometric expectation of total wealth, i.e. (current wealth + outcome)^probability.

(This is the Kelly criterion for judging significant opportunities.)
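
A tiny numeric sketch of that distinction, with invented numbers: a bet can look good on arithmetic expectation while its geometric expectation of total wealth says it leads to ruin.

    # Arithmetic vs geometric expectation for one hypothetical all-in style bet.
    wealth = 100.0
    p_win, win, lose = 0.5, 120.0, -100.0

    arithmetic = p_win * win + (1 - p_win) * lose                       # +10 per play
    geometric = (wealth + win) ** p_win * (wealth + lose) ** (1 - p_win)

    print(arithmetic)   # 10.0 -> looks attractive on average
    print(geometric)    # sqrt(220) * sqrt(0) = 0.0 -> ruin in the long run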


If you spend 5 minutes in a bank this is not surprising. Probably 99.99999% of bank loan staff have zero input on whether a loan is accepted or not. They just follow the rules set from on high and tell you yes or no to your face instead of a computer doing it. Their job is really just to calm you down when you get rejected, or to tell you what you have to do to get accepted under their secret rules.

I even worked on a loan program years back to centralize a bunch of acquired banks. They literally told their staff not to tell customers that there was nothing they could do and that a set of rules in the computer now made 99% of the decision for the loans.


I had an uncle who worked at a rural independent regional bank. While this was a while ago, the bank could and would use a variety of factors, including references, propensity towards substance issues and others, to make a loan decision.

I wouldn't doubt that if you sat down and created a data set/data collection scheme to gather this data, you could make an algorithm to closely mirror the outcomes of a loan decision. However, as a human, the loan officer might simply add a new criterion as desired.


How does "simple AI" compare with an old fashioned small table lookup and a few "if" statements?


In my experience, many financial products marketed as having "intelligence" are indeed just layers of SQL queries (rules).

The benefit of using a SQL rules engine in a financial setting is that you can prove causality and intent throughout. Why a customer was declined for a loan can always be traced deterministically to some SQL rule that legal previously approved per the regulations in that jurisdiction.
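
Something like the sketch below (rules and names entirely made up, and in Python rather than SQL just to keep it short): every rule is explicit and signed off in advance, and every decision records which rule fired, which is what makes a decline traceable.

    # Toy sketch of the traceability idea: each rule is explicit, and the
    # decision carries the ID of the rule that produced it. All rules,
    # thresholds and field names here are hypothetical.
    RULES = [
        ("R-001", "credit_score below floor", lambda a: a["credit_score"] < 580, "decline"),
        ("R-002", "debt-to-income too high",  lambda a: a["dti"] > 0.43,         "decline"),
        ("R-003", "default: within policy",   lambda a: True,                    "approve"),
    ]

    def decide(applicant: dict) -> dict:
        for rule_id, description, condition, outcome in RULES:
            if condition(applicant):
                return {"decision": outcome, "rule": rule_id, "reason": description}

    print(decide({"credit_score": 560, "dti": 0.2}))
    # {'decision': 'decline', 'rule': 'R-001', 'reason': 'credit_score below floor'}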


Decision trees (the base learner in random forests and gradient boosted decision trees) are effectively nested “if” statements, and predictions from tree ensembles (RF and GBDT) are effectively averages of predictions from those nested “if” statements.
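
You can see that directly by printing a fitted tree, e.g. with scikit-learn (the dataset is synthetic and the feature names below are invented):

    # Fit a small decision tree and print it as the nested "if" statements it is.
    from sklearn.tree import DecisionTreeClassifier, export_text
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

    print(export_text(tree, feature_names=["credit_score", "income", "dti", "age"]))
    # The output reads as nested ifs, e.g.:
    # |--- credit_score <= <threshold>
    # |   |--- income <= <threshold>
    # |   |   |--- class: 0
    # ...
    # A random forest / GBDT prediction is then effectively an average over
    # many such learned rule sets.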




The difference is that random decision forests learn the rules for themselves; they don't have to be programmed in manually by experts. They're one of the most performant "pre-deep-learning" machine learning models and a sensible baseline for many ML tasks before you go out and buy a $1500 GPU.


The ability to program approval rules and understand why decisions have been made (and then tweak the ruleset based on economic/statistical analysis) is a feature for these sort of organizations, not a limitation.

There will be elements of AI which are useful, but ultimately banks will want to know why a certain decision was made, and want to incorporate their own economic calculations and forecasts into the model.


Decision forests don't provide that explainability, though: how can you interpret averages of hundreds of decision trees?


The "AI" or "machine learning" part is what decides on those statements. Nothing special there, typically banks are using simple methods which produce results that also humans can understand.

Towards Data Science has a neat example [1]; scroll to the end to see the sample "scorecard", which can be implemented as lookup tables and ifs. The example shows how a customer gets a certain number of points depending on their age group, home ownership and income group. Sum up these points and you get the total score, which then translates to the probability of getting your money back from that customer.

[1] https://towardsdatascience.com/intro-to-credit-scorecard-9af...
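
A toy version of such a scorecard (the point values and bands below are invented, not the ones from the linked article):

    # Look up points per attribute, sum them, and compare against a cutoff.
    # Bands, point values and the cutoff are all hypothetical.
    SCORECARD = {
        "age":            [(25, 10), (35, 20), (50, 28), (200, 35)],   # (upper bound, points)
        "home_ownership": {"rent": 5, "mortgage": 15, "own": 25},
        "income":         [(20_000, 5), (40_000, 15), (80_000, 25), (10**9, 35)],
    }

    def score(applicant: dict) -> int:
        total = 0
        for bound, pts in SCORECARD["age"]:
            if applicant["age"] <= bound:
                total += pts
                break
        total += SCORECARD["home_ownership"][applicant["home_ownership"]]
        for bound, pts in SCORECARD["income"]:
            if applicant["income"] <= bound:
                total += pts
                break
        return total

    applicant = {"age": 30, "home_ownership": "mortgage", "income": 45_000}
    print(score(applicant))   # 20 + 15 + 25 = 60 -> compare against a cutoff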


A simple AI is merely an algorithm to generate the if statements. It wouldn’t be surprising if the final model could be reduced to 20 or so “ifs”, literally.


I mean, Zillow's valuation platform that bought homes had great accuracy. We all know how that turned out: Zillow ended up buying a whole bunch of houses for much more than they were worth. Such is the fate of recommender systems operating in auction markets against humans who know better.


I really wish we would stop calling this AI.

I understand startups need funding from VCs whose idea of technology is "BLOCKCHAIN AI", but as someone interested in AI, the amount of spam I have to dig through to find actual AI work is insane.


What does actual AI work look like these days?


If key==0: print(“its zero”) else: print(“here is your 10 sec un-skippable ad”)


I had a look at LendingClub data and indeed unsexy RF and GB were pretty good. This shouldn't surprise us, though, because what are the likely predictive features other than how much the person makes compared to what they want to borrow? Sure, there's what they intend to do with it, and perhaps adjustments for how high their income is relative to where they live. All things that a model can help quantify, but nothing terribly surprising either.

What's left is a bit of judgement as to whether an applicant is temporarily showing misleading data: e.g. if they're a student starting their first job, is their income low or high? But basically it means the bulk of cases can be done by machine and the loan officer looks at some corner cases.

Note this person is also an actual person, i.e. capable of being held to account. I once worked with a prime broker who decided to waive the red flagged checks on a certain client in order to win the business. The guy blew up badly, the bank lost money, and he lost his job.


A loan decision is a relatively straightforward algorithmic process. I see no reason why AI is needed or why it wouldn't give similar answers.


Shouldn't the sentence be flipped in that case? I see no reason why an expensive human is needed or why they wouldn't give similar answers.


A human picking up obvious discrepancies the model isn't trained on or doesn't accept as inputs is an order of magnitude or three less expensive than making more bad loans (of the size/rate that aren't already determined by a basic credit check). Also, checking that the loan matches the bank's credit criteria isn't the only task a bank employee performs during the course of their employment, which likely includes quite a few tasks the AI is utterly terrible at, like talking to people.


I think they mean algorithmic as in, simpler and deterministic and auditable plain if/then/math algorithm, vs black box magic AI.

Even if an AI can produce seemingly the same results as a human, it should be out of the question anyway to let an inscrutable black box make decisions over people's lives. Because at least with a human, you (their boss, or a judge, etc.) can ask them "Why did you decide that?" and they can tell you. A racist or misogynist or religious human can be identified and fired or corrected etc. How do you judge whether an AI is giving inhumane decisions?

The decisions themselves can't really be judged, only the process that generated them, and you can't see that process in an AI.

If someone doesn't get a loan, and someone else does, you can't tell that wasn't right just from the final result.

Even if the results "look" wrong, like only 30% of black people get the loan while they made up 40% of applicants, even that could possibly be exactly correct, but you can't know if you can't see the process.

But a human can be asked, and simple algorithm code can be read.

Probably these days the human has so little discretion that the corporate policy is the algorithm, and the human is pointless except as a sham human-looking interface to appease customers. It helps sales to have a human, but the human in fact wields none of the human power that the customer wants a human for.


I remember ten years ago at least some AI systems were tried out on loan approvals, and bam, out came the same structural racism in financing and loans.

If the number of variables is low, then the extant bias may be cooked into the inputs and the AI result is then inevitable. For example, income will be suppressed, all other variables being the same, and that's an immediate loss in approval chances.

If crime maps and the like go into some risk profile on the safety of the property, well guess what, that'll be effectively racist and/or lower loan preapproval.

It doesn't have to just be that the race checkmark/dropdown becomes an input.

But you're right, the AI is a black box. Rules engines or other calculations at least can be back-traced, and maybe counter-weighted.

A great deal of structural and persistent racism comes down to housing: where you live. It doesn't take a die-hard Zillow user to know that where you live will signal things to the loan application algorithms, regardless of "race blind" applications if that is even a thing.

Thus: higher loan rates, lower maximum values, less access to various stratifications of neighborhood "eliteness". And the transaction rate/speed on housing is really low (annoyance of moving, realtor fees, closing costs, PITA to shop for the new place and sell your old one), so a policy to address the bias would take longer to actually show up in the stats than its political survival time (because so many well-monied interests will lobby for its removal).

And thus, the more things change, the more it just stays the same.


Those days when you would dress up, go to your local bank and the banker would make a decision based on their 'knowledge' about you are long, long gone. Loan applications are standardised and things like credit scores play a big role. Things are probably a little bit different when we are talking about 7-8 digit loans.


The point is that it can be solved with simple algorithms. No expensive human nor AI needed.


Not to forget that AI can be essentially a black or grey box. You have some inputs and you have some outputs. Mostly correct, but what if it fails catastrophically? At least algorithms, in this case, can mostly be walked through by regular humans and errors possibly noted, unless the complexity has been made idiotic in the chase for profits.


Mostly they are there as a sanity check: just collect the numbers and input them into the system. Humans really can't be trusted, especially with money, so someone in the loop has to check whether the information provided can be trusted.


Every bank is already using a data driven model for credit decisions too. I am sure that is the major input for the human decision already.


Some more details would be nice.

So a loan can be good or bad. The loan officer can rate it as good or bad. And the software can rate it as good or bad.

To evaluate the situation, would like more than the "95%":

(1) Would like to see the arithmetic that yielded the "95%".
(2) Would like to know the rate (probability) of false positives, when the software said the loan was good but it wasn't.
(3) Would like to know the rate (probability) of false negatives, when the software said the loan was bad but it was good.

And, really, when the software and the loan officer disagreed, what were the ratings of the applications, and when was the software correct and when was the officer correct?

Finally, what was the average cost per mistake for the false positives and for the false negatives. E.g., a false negative could cost the bank some business but a false positive could cost them $millions.
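
For concreteness, a minimal sketch of that kind of evaluation, with made-up labels and costs: break the disagreements into false positives and false negatives, then weight each kind of mistake by what it actually costs the bank instead of treating them equally.

    # Hypothetical labels, predictions and costs; only the bookkeeping is real.
    import numpy as np

    y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])   # 1 = loan turned out good
    y_pred = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # software's call

    fp = np.sum((y_pred == 1) & (y_true == 0))    # approved but went bad
    fn = np.sum((y_pred == 0) & (y_true == 1))    # rejected but was good

    fp_rate = fp / np.sum(y_true == 0)
    fn_rate = fn / np.sum(y_true == 1)

    COST_FP = 1_000_000    # a bad loan approved
    COST_FN = 20_000       # lost business from a wrongly rejected good loan
    expected_cost = fp * COST_FP + fn * COST_FN
    print(fp_rate, fn_rate, expected_cost)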


One of my all-time favorite papers is "In Praise of Epistemic Irresponsibility: How Lazy and Ignorant Can You Be?" by Michael Bishop.

The author looks at empirical evidence (experiments) and shows that you can take experts' predictions, use a super-dumb statistical model of what they do, and then outperform the experts' judgments with a simple linear model!

https://philarchive.org/archive/BISIPOv1


If you want to impress me, tell me also the results of these:

- Performance against a fair dice
- Performance against a group of humans trained to predict bank managers' loan decisions


The article is a bit rubbish. They're not predicting the binary "would we give these people a loan or not"; they're predicting manual corrections of credit scores by bank managers. It's a 15-way classification problem (1 is a low score, 15 is high). The data is distributed in a bell-curve-like way, with most people in the 6 or 7 bracket.

From the paper:

> As is typical in machine learning we also report the Accuracy p-value computed from a one-sided test (Kuhn et al., 2008) which compares the prediction accuracy to the "no information rate", which is the largest class percentage in the data (23.85%).

So the baseline of always picking the largest class gets 23.85% (a fair 15-sided die would only get about 6.7%); the model gets 95%.

That said I bet a human who had read the banking rules and regulations and recommendations on lending could easily match this performance.


Performance against a group of humans not trained to predict bank managers' loan decisions might come in pretty close as well.


I'd be more interested if the AI made better financial decisions than the managers.


If I run N different machine learning models over the same data, each with some random error in fitting the objective function, and then I pick the one which matches the validation data best, isn't there a danger of picking the one which was "luckiest" with the random errors? Presumably for large N that's a real problem? How do people account for that?


You have a test set that isn’t used during training, separate from the validation set. But in general this isn’t too much of a problem.

For large sample sizes the models will tend to converge on the same logic. It's actually much more of a problem with small samples. Simpler models like linear regressions and decision trees could even be deterministic.



Cross validation doesn't solve that problem. As the Wikipedia article says: "The variance of F* can be large.[26][27] For this reason, if two statistical procedures are compared based on the results of cross-validation, the procedure with the better estimated performance may not actually be the better of the two procedures (i.e. it may not have the better value of EF). Some progress has been made on constructing confidence intervals around cross-validation estimates,[26] but this is considered a difficult problem. "


Well, the historical data {(x, y)...} is assumed to be distributed according to the true distribution, such that y = t(x), where t is the true function which maps x to y. Of course, in many situations no such function exists (i.e., there are genuinely ambiguous xs, such that t(x) cannot produce a single y -- consider an ambiguous cat/dog picture).

If we sweep models f1,...,fn across the validation set V and choose the max() of scores() of f1..fn on V, we get f*.

Now if your issue is that f* might be an "unlucky draw", you're right. But there is no statistical way of fixing this -- if we knew (via experiment, etc.) what the true distribution is, we could measure |f* - t| -- but if we knew this, we wouldn't bother finding f*.

If you want a mechanism to mitigate these problems, there is one main one: the scientific method. To test whether f* poorly reflects t, go and do some experiments. If you can't, then you won't know.

(Hence: there is no way of doing science via "mere statistics". It is the experimental conditions constructed by concept-laden, in-the-world experimenters which obtain the sequence of datasets needed to give confidence to any given model. ML is therefore not able to know anything; its "conclusions" are entirely derivative of, and limited by, human experimentation. The intelligence occurs in the experimental design; when that's done, everything else is "stamp collecting".)


Thank you for the detailed response. I think this is exactly what I worry about. If you were cynical enough and had enough datasets x1...xn and models f1...fn, you would eventually find a dataset on which a simple model performs well and be able to publish a paper like this. Even if the author here wasn't cynical enough to do that on purpose, many individually well-meaning researchers looking for a nice result to publish effectively do the same thing!
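
This is easy to demonstrate with a small simulation (no real data; every "model" here is literally a coin flip): pick the best of N useless models on a validation set and its score looks impressive, then it falls back to chance on a held-out test set.

    # Selection bias from picking the "best" of N models on a validation set.
    import numpy as np

    rng = np.random.default_rng(0)
    n_models, n_val, n_test = 1000, 200, 200
    true_rate = 0.5                                   # every model really is a coin flip

    val_scores = rng.binomial(n_val, true_rate, n_models) / n_val
    best = int(np.argmax(val_scores))

    test_score = rng.binomial(n_test, true_rate) / n_test   # the chosen model on fresh data
    print("best validation score:", val_scores[best])        # noticeably above 0.5
    print("its held-out test score:", test_score)            # back near 0.5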


I reject 95% of loan applications, so if you just predict "reject" 100% of the time, then your accuracy is 95%. Just stupid shit.


Genuinely surprised that "bank managers" either exist or have any say in loan decisions. In the UK it appears to be pretty much entirely computerised, with a small amount of oversight from teams of analysts if the computer decision is borderline. Is it different elsewhere?


    # Hypothetical threshold: the average credit score among loans that defaulted
    minimum_credit_score = Loan.joins(:borrower)
                               .where("loans.default = true")
                               .average("borrowers.credit_score")

    if loan_applicant.credit_score > minimum_credit_score
      decision = "approve"
    else
      decision = "reject"
    end


You want to use the average credit score of all defaulting loans as the threshold? That seems really low; you're setting your bank up for a lot of defaults. But then there's a lot of selection bias in your data -- presumably your bank has been denying loans to people with bad scores, so over time your minimum credit score is the upward-inching average inside the cracks between safe and denied loans.


It depends heavily on the interest rate as well. If the interest rate is high enough, you can show a profit even with a fair number of loans sent to collections.


On a long enough timeline minimum_credit_score will converge towards maximum_possible_credit_score.


Would a linear decision boundary not be fit by the logistic model?


Because this is a trivial problem reducible to an n-dimensional vector.


"Heuristics that work 95% of the time"


As a bank I'd look into those 4.x% of loans where the machine learning disagrees. This headline feels 20 years old, though, except for calling it "AI".


I don't think you need AI. Like in Italy the total of your loans cannot exceed 1/3rd of your monthly salary, as simple as that.


Does this mean most Italians rent their homes for life and a small fraction purchase their homes in cash?


No, we have some of the highest home ownership rates in the world.

1/3rd of salary is more than enough to buy you a home. That's enough to get a 200k loan for 30 years while making average salary (1700 net euros).


Aha you mean the interest cannot exceed 1/3 of salary, not the principal? That makes more sense.


I'm not sure I expressed myself correctly: whatever you make, you can't have your loan payments exceed one third of your monthly salary. So if you make 2000 euros, your monthly loan payments cannot exceed 666 €.


More likely to be based on the payment (principal plus interest) than on the interest alone.



