Hacker News | DrRavenstein's comments

Thanks for the comment.

My point - which you actually make for me at the end of your comment - is that this stuff is probably intuitive to an NLP practitioner but not to a layperson, so there's an education/awareness piece to be done, which is what I'm trying to do here. There are no profound statements being made, and I like to think I know my way around this stuff pretty well.

Based on the feedback I've added an update with a couple of new experiments where I play with multi-turn contexts. With a true RNG this shouldn't make a difference, but given the way LLMs use context, I figured you're probably right: it's worth trying.

Looks like (and again, probably no big surprise for those of us familiar with these systems) multi-turn probabilistic behaviour still doesn't line up with what the prompt asked for.
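For anyone who wants to reproduce the comparison, here's a minimal chi-square goodness-of-fit sketch in pure Python. The `random` module stands in for the "true RNG" baseline; to test an LLM you'd substitute its sampled numbers for `rng_samples` (the 1-10 range and sample size are illustrative, not from the original experiment):

```python
import random
from collections import Counter

def chi_square_uniform(samples, categories):
    """Chi-square statistic of `samples` against a uniform distribution."""
    counts = Counter(samples)
    expected = len(samples) / len(categories)
    return sum((counts.get(c, 0) - expected) ** 2 / expected for c in categories)

# Baseline: a true RNG picking numbers 1-10 uniformly.
random.seed(0)
rng_samples = [random.randint(1, 10) for _ in range(1000)]
stat = chi_square_uniform(rng_samples, range(1, 11))

# With 9 degrees of freedom the 95th percentile is ~16.9; a true RNG
# should usually land below that, while LLM "random" numbers - which
# over-favour certain digits - typically blow far past it.
print(round(stat, 2))
```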


I've been investigating how well the smallest Phi3 and Llama3 models do at BioASQ - a biomedical Q&A reading comprehension task.

Llama 3 8B is certainly competitive and compelling. Phi doesn't do quite as well, but there may be opportunities for fine-tuning.


This headline and article are horrible and misrepresent the problem and the outcome.

The paper is about the manual process of re-assigning a credit score on a scale of 1 to 15 based on other customer criteria. Really, the fact that this process exists at all shows that their initial credit-scoring approach is flawed or too simplistic. The argument of "just replace it with an if statement" does not hold up in this scenario.

So this is not an "if number big, lend; if number small, no lend" problem. It's a 15-way multi-class classification problem. They even give a baseline in the paper for what happens if you pick randomly or always pick the biggest class:

> As is typical in machine learning we also report the Accuracy p-value computed from a one-sided test (Kuhn et al., 2008) which compares the prediction accuracy to the "no information rate", which is the largest class percentage in the data (23.85%).

So yeah, 95% is somewhat better than 23.85%.
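For the curious, that significance claim can be sanity-checked with an exact one-sided binomial test in pure Python. The 23.85% no-information rate is from the paper; the test-set size `n` is not given in this thread, so 400 is a made-up illustration:

```python
from math import comb

def binom_p_value(successes, n, p0):
    """One-sided exact binomial test: P(X >= successes | p = p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(successes, n + 1))

# No-information rate: always predict the largest class (23.85% per the paper).
nir = 0.2385

# Hypothetical test-set size and the reported 95% accuracy.
n = 400
correct = int(0.95 * n)  # 380 correct predictions

p = binom_p_value(correct, n, nir)
print(p < 1e-10)  # → True: the gap from 23.85% to 95% is overwhelmingly significant
```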

I agree with the general sentiment that this is likely a fairly straightforward problem to predict if you are familiar with the bank's operating procedures, as there is no way these individuals are building their own risk models and making independent decisions. They are there to follow the rules and provide human accountability.

An error analysis on the items the model couldn't predict would have been the most interesting part.


Plus, the systemic risk of repeatable exploitation is higher without humans in the loop. Making one bad $1M loan is bad, but if "attackers" can repeatedly probe the system until they get a bad-risk $1B loan approved, it becomes business-shattering.


Yes, which is why there are hard breakpoints where banks' credit policies change and specifically an absolute ceiling for automation in terms of risk of loss on lending that is set in the credit policies of the bank and any changes have to be approved by their regulator.

No bank in any credible jurisdiction will have an automated system approving $1bn-equivalent loans any time soon. A typical setup: loans up to a certain amount are approved automatically, up to $xM by a credit officer, up to $yM by a senior credit officer, and anything over that by the credit committee. Regulators push back very hard on automated decision-making for large loans, particularly because of the "default correlation skew"[1] problems revealed in the 2008 crisis. Relatively few bad decisions on big loans can push a bank into difficulties if it is not well-capitalized. This is a particular problem for automated decision-making because as loans get larger they also get more idiosyncratic, so it's much harder to fit a model with confidence: there simply aren't enough data points.[2]
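The tiered structure described above can be sketched as a simple routing function. The dollar thresholds below are hypothetical placeholders standing in for the $xM/$yM limits, not any real bank's policy:

```python
def approval_route(amount_usd,
                   auto_limit=250_000,
                   officer_limit=2_000_000,
                   senior_limit=10_000_000):
    """Route a loan to the lowest approval tier allowed by credit policy.

    All limits are illustrative, not a real bank's credit policy.
    """
    if amount_usd <= auto_limit:
        return "automated"
    if amount_usd <= officer_limit:
        return "credit officer"
    if amount_usd <= senior_limit:
        return "senior credit officer"
    return "credit committee"

print(approval_route(200_000))        # → automated
print(approval_route(1_000_000_000))  # → credit committee
```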

[1] Credit quality for a group of loans often rises for idiosyncratic reasons but deteriorates together: as loans become riskier in an adverse economic environment, the correlation of default probability between them goes up. An intuitive way to think about this is housing loans. If 3 or 4 of your neighbours default on their loans, property prices on your street will go down (because the banks will be trying to sell all those houses at once), making it much more likely that you and the rest of the residents will default too.
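The footnote's intuition can be demonstrated with a toy one-factor default simulation in pure Python. The 20% marginal default probability, portfolio size, and factor loading are all made-up numbers chosen only to make the clustering effect visible:

```python
import random
import statistics

def clustered_default_rate(rho, p=0.2, n_loans=10, trials=20_000, seed=1):
    """Fraction of trials where at least half the loans default together.

    Loan i defaults when sqrt(rho)*Z + sqrt(1-rho)*e_i falls below the
    threshold matching marginal default probability p; Z is the common
    factor (the shared economic environment), e_i is loan-specific noise.
    """
    rng = random.Random(seed)
    threshold = statistics.NormalDist().inv_cdf(p)
    clustered = 0
    for _ in range(trials):
        z = rng.gauss(0, 1)  # common factor shared by the whole street
        defaults = sum(
            (rho ** 0.5) * z + ((1 - rho) ** 0.5) * rng.gauss(0, 1) < threshold
            for _ in range(n_loans)
        )
        clustered += defaults >= n_loans // 2
    return clustered / trials

# Same 20% marginal default rate in both cases, but correlation
# concentrates losses into "many neighbours default at once" scenarios.
print(clustered_default_rate(rho=0.0))
print(clustered_default_rate(rho=0.6))
```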

[2] Say I'm trying to approve a $200k loan to expand a pizza restaurant. If I'm at a big bank, I have hundreds of similar loans to use as data points for pricing and risk. If I'm trying to approve a $200M loan to build a luxury hotel complex with 5 restaurants, accommodation, retail, etc., it is a complete one-off. Even if I'm the largest commercial lender, this loan will be unique in my portfolio; I will have many other large loans, but lots of idiosyncratic factors make them different.


Do they remove the human from the loop? That doesn't seem like a smart idea. A model would be good just for suggestions.

Humans are both biased and high-variance (not to mention corruptible), whereas an algorithm can benefit from much better scrutiny and ensure uniform application of the criteria. If a human overrides it, they ought to have a good reason.


In Europe you have an absolute right to have a human make your loan decision[1], so if a loan decision is made automatically you can request a human decision instead.

[1] https://ec.europa.eu/info/law/law-topic/data-protection/refo...


The article is a bit rubbish. They're not predicting the binary "would we give these people a loan or not"; they're predicting manual corrections of credit scores by bank managers. It's a 15-way classification problem (1 is a low score, 15 is high). The data is distributed in a bell-curve-like way, with most people in the 6 or 7 bracket.

From the paper:

> As is typical in machine learning we also report the Accuracy p-value computed from a one-sided test (Kuhn et al., 2008) which compares the prediction accuracy to the "no information rate", which is the largest class percentage in the data (23.85%).

So the no-information baseline (always guessing the biggest class) gets 23.85%; the model gets 95%.

That said, I bet a human who had read the bank's rules, regulations, and lending recommendations could easily match this performance.


The difference is that random decision forests learn the rules for themselves; they don't have to be programmed in manually by experts. They're one of the most performant "pre-deep-learning" machine learning models and a sensible baseline for many ML tasks before you go out and buy a $1500 GPU.
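The "learns the rules itself" point can be illustrated with a toy bagged ensemble of single-threshold stumps in pure Python. This is a drastic simplification of a real random forest (one feature, no random feature subsampling, depth-1 trees), but it shows that nobody hand-codes the if statement:

```python
import random

def fit_stump(xs, ys):
    """Learn a single if-statement: the threshold on x that best splits ys."""
    best = None
    for t in sorted(set(xs)):
        # Predict 1 when x >= t, 0 otherwise; score accuracy on the data.
        acc = sum((x >= t) == y for x, y in zip(xs, ys)) / len(xs)
        if best is None or acc > best[1]:
            best = (t, acc)
    return best[0]

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Bagging: each stump is trained on a bootstrap resample of the data."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds

def predict(thresholds, x):
    """Majority vote over the ensemble's thresholds."""
    votes = sum(x >= t for t in thresholds)
    return int(votes > len(thresholds) / 2)

# Nobody programmed "approve if score >= 8": the rule is learned from labels.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
forest = fit_forest(xs, ys)
print(predict(forest, 9), predict(forest, 2))  # → 1 0
```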


The ability to program approval rules and understand why decisions have been made (and then tweak the ruleset based on economic/statistical analysis) is a feature for these sort of organizations, not a limitation.

There will be elements of AI which are useful, but ultimately banks will want to know why a certain decision was made, and want to incorporate their own economic calculations and forecasts into the model.


Decision forests don't provide that explainability out of the box, though. How do you interpret the averaged votes of hundreds of decision trees?

