This is a serious problem with all AI that makes decisions like this. Having a Hotmail account is a symptom of a problem which could lead to higher risk, but the model judges that symptom as if it were a cause of risk.
It's exactly like the supposition that black people statistically commit more crime, and thus should pre-emptively receive harsher bail or be profiled. Unless you can scientifically prove that ethnicity causes crime, it's disgusting. This is obvious to us.
In years past, correlation fallacies might have meant that black people were profiled by police more. Today, it means we build AIs that make life-altering decisions: determining repeat-offender risk, visa status, employment decisions, all of which predetermine an unknown outcome by correlating your known qualities with the known outcomes of people similar to you.
We need regulation in place to stop any punitive decision making, public and private, which can be found in court to be based on correlation instead of causation.
The real concern here isn't discriminating against people with Hotmail accounts. On the surface level, there's nothing wrong with that. This does become an issue if the property of having a Hotmail account is used as an avenue to discriminate against people based on their membership in a protected class -- a sort of actuarial parallel construction, if you will. At that point, we do have a bit of a problem.
> We need regulation in place to stop any punitive decision making, public and private, which can be found in court to be based on correlation instead of causation.
So, young men could not be charged more for insurance than young women? In fact, young people in general would have to get the same price as everyone else? People in New York would have to pay the same as someone in rural Alaska? Someone who's been driving a year gets charged the same as someone who has been driving fifteen, etc.? Even being in an accident doesn't cause you to get in another accident. It's just correlated with that outcome. Throw out the entire actuarial aspect of insurance? These "punitive" charges are all based on correlation. I don't see this happening.
That said, as a non-statistician, I'm interested in what kinds of tests can be run on signals to verify that they're not proxies for selecting and discriminating against protected classes. Is there a way to modify a signal using its correlation with membership in these classes such that you can marginalize the attributes you're not allowed to discriminate based on?
For example, it seems reasonable to believe that hotmail use is correlated with age, race, gender, etc. As an insurance company, I would have information on hand to inform me as to the extent of this correlation, at least among my customer base, and I would in turn know the correlations of those factors with risk. Could I somehow remove the racial, gender, and age components of the Hotmail risk signal to obtain a signal that only conveys the portion of the risk correlation that is not based on those classes? If so, what is that statistical technique called?
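One standard name for the technique being asked about is residualization (closely related to partial correlation): regress the signal on the protected attributes and keep only the residual, the part they cannot explain. Here is a minimal sketch with entirely made-up numbers, assuming for illustration that age is the protected attribute and that Hotmail use correlates with it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data: risk depends on both age and Hotmail use,
# and Hotmail use itself correlates with age.
age = rng.normal(45, 15, n)
hotmail = (0.04 * age + rng.normal(0, 1, n) > 2.5).astype(float)
risk = 0.02 * age + 0.5 * hotmail + rng.normal(0, 1, n)

# Residualize: regress the Hotmail indicator on the protected
# attributes and keep only what they cannot explain.
X = np.column_stack([np.ones(n), age])           # intercept + protected attrs
beta, *_ = np.linalg.lstsq(X, hotmail, rcond=None)
hotmail_resid = hotmail - X @ beta               # "age-free" Hotmail signal

# By construction the residual is uncorrelated with age...
print(np.corrcoef(age, hotmail_resid)[0, 1])     # ~0
# ...but it still carries some of the risk information.
print(np.corrcoef(risk, hotmail_resid)[0, 1])    # noticeably positive
```

Whether a residualized signal would actually satisfy a regulator is a separate question; disparate-impact analysis does not necessarily stop at linear adjustments.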
Racial discrimination requires "strict scrutiny" to be legal.
Sex discrimination requires "intermediate scrutiny" to be legal.
This podcast gives a wonderful history:
All that remains to do in the US is find the political will to outlaw discrimination that favors women.
Edit: and then I get downvoted for sticking my neck out. Let me break the parent post down.
It asserts that charging genders equally will shift more of the burden to women. This is clearly true: premiums are higher for men because we have more accidents (so much for all the female-driver jokes).
It also asserts women were a previously disadvantaged class. Also clearly true.
So to the downvoters out there, you might have found something disagreeable about the parent post, but you're wrong and he's right.
> fair share
Male drivers and female drivers don't have a "fair share" of insurance costs. No one should be obligated to pay for the actions of others simply because they're the same sex.
Driving risk should be evaluated on an individual basis based on those individuals' behavior.
> putting them at even more of a disadvantage
Young women are no longer at a disadvantage. Young women graduate college more often and earn more than young men.
I focus on young women and young men in this case because those are the people most affected by gender discrimination in auto insurance pricing, because they lack individual driving records on which to base insurance rates.
Women graduate from college more often, but college is not what it once was: I'm a college dropout and I'm sure I earn more than most graduates (plus I don't have the debt). Women still earn less than men; your claim that they earn more is just false.
But maybe if you look beneath the surface you will find out that minorities are more likely to drive older, less safe cars, which leads to them having more accidents - they don't have more accidents because they are black, they have more accidents because of other factors.
The same with men and women: I think it's fair to say that men drive more powerful, larger cars (on average) than women do. So... they get into more accidents because the cars they drive are objectively more difficult to drive. Which obviously means that you should calculate your premium based on the type of vehicle being insured, not on the gender of the driver; and, by extension, insurance for a man and a woman on the same vehicle should be identical. Penalizing a man in this situation because men as a group get into more accidents is absolutely unfair.
I know you're sorta spitballing here, but at least in my experience that's a bit backwards. Around here, I'd guess women drive SUVs more than men do. Growing up, fathers would buy their daughters SUVs because they were safer. Now, husbands have their wives drive the family SUV for the same reason. Plus, moms and minivans, etc. That might just be a thing in the area I grew up, though. It's probably different in downtown Seattle.
> So....they get into more accidents because the cars they drive are objectively more difficult to drive.
I'd bet the testosterone doesn't help much, either :-)
I'd love to see actual data on this, especially power and size of cars grouped by gender.
>>I'd bet the testosterone doesn't help much, either :-)
Women have their own behaviour-affecting hormones too, you know.
Where in the UK are you? I lived several years in England as a teenager and went to a 'nice' school in the suburbs of Surrey and almost everybody drove their kids to school.
Also in the UK and can report that mothers driving “Chelsea tractors” is totally a thing.
> Women have their own behaviour-affecting hormones too, you know.
Of course. I was making a half-joke.
Whoa there. Someone might infer from this that you believe men might generally make decisions / respond to situations differently than women with biological reasons being a contributing factor. That's a dangerous line of thought.
Oh, come on, do you really think insurance companies don't break out all the different factors? They're in a cutthroat business where every tiny margin they make over their competitors is a big advantage. But if they guess wrong, they'll cut premiums too aggressively and lose money on accident claims.
Car size and gender are both factors. It's possible they're partly correlated, but actuaries can still incorporate both factors into their models.
Charging higher premiums for men is fair if you only care about accident rates. If you feel it's a form of discrimination and want to level the playing field, that has to be done via government regulation.
The simplest "AI" (linear regression) will already factor this out if it has data about both the car driven and gender. Any remaining gender imbalance is attributable either to another factor or is actually gender-related (e.g. increased road rage, high-speed driving, and risky following behavior due to testosterone might be a stronger accident predictor than over-cautious driving, not matching speed fast enough when merging onto the highway, driving too slowly, or worse spatial reasoning, which are usually correlated with higher estrogen levels).
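A small simulation (invented numbers, purely illustrative) shows what "factor this out" means: if risk is driven entirely by car power but men drive more powerful cars, a regression with both predictors attributes the effect to the car, not the gender:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Assumed setup: claims depend only on engine power, but men
# drive more powerful cars, so gender correlates with claims.
male = rng.integers(0, 2, n).astype(float)
power = rng.normal(100 + 30 * male, 20, n)    # hp, shifted by gender
claims = 0.01 * power + rng.normal(0, 1, n)   # risk depends on power only

# Fit claims ~ intercept + power + gender by least squares.
X = np.column_stack([np.ones(n), power, male])
beta, *_ = np.linalg.lstsq(X, claims, rcond=None)
print(beta)  # power coefficient ~0.01, gender coefficient ~0
```

The regression recovers a near-zero gender coefficient here because gender adds no information once power is known; any residual gender coefficient in real data would point to something the other variables don't capture.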
In general the AI will just become a better predictor the more data it has and the more detailed that data is. But an effeminate man or a masculine woman might behave more like the opposite gender and thus be treated wrongly. We could probably solve this by attaching a recent endocrinological report to every application, but that gets quite privacy-invasive at some point.
In general, if you get insurance you want to be part of a larger risk group to spread out your individual risk. Having no insurance just means your risk group is the fairest one possible: just yourself.
Because actuarial tables say so. Insurance companies that decided it did not matter either folded or changed their opinion after the losses.
Race is the other way around.
Insurance companies have a strong incentive to price policies accurately, so you have to assume that discriminatory decisions are made with that goal in mind. Any restriction on factors that make the model less accurate (like forbidding race) will lead to higher volatility and a lower stock price (insurance companies are prized in portfolio management for their low volatility).
The government, however, recognizes that most people are forced to buy car insurance, which means they are being forced to pay for those profit-maximizing decisions. And to the extent that being forced to pay can create a downward spiral of high costs leading to poor decision making and even higher costs, that is something to be prevented.
So where to draw the line? Well, the people most at risk of getting into a downward spiral are the already economically disadvantaged (read: poor). The government already prevents considering whether the insured is poor (either explicitly or implicitly by not allowing questions about income/net worth on applications).
But really, any factor which would tend to disproportionately target the already-poor would need to be limited as well, since that can often become an end-run around the limitation. If you consider the factors that would do that at the population level, race would be a very strong indicator of income, but gender would not be. (Yes, I know women make less in the same jobs, but we're talking about the macro here, including women who are heads of household, 2-income families, and stay-at-home moms.)
Now, it doesn't explain why other factors that would seem to be associated with low income like zipcode are allowed. But to answer your question about why gender is ok to price discriminate against while race is not, I think the answer is we try not to force poor people to pay more for government-mandated things just because they are poor or otherwise exhibit a feature that would characterize them that way to an uninformed observer.
While some studies show that it is true that Mexicans/African-Americans/insert-other-group-here are a higher insurance risk due to crime and, to a smaller extent, due to accidents, this is not because of their ethnic background but because of other correlated but not causally linked factors: they are more likely to live in poor areas with higher crime rates and to be unable to afford better security measures. Being Mexican/black/other is not a causal factor despite some correlation, so it is not fair to discriminate based on that property. Any discrimination should be based upon the causal factors rather than the lazy correlation, to avoid being unfair to those to whom said factors do not apply.
(FYI: male here, with many an anecdotal tale to support us being worse drivers generally, if only by a little bit)
tl;dr: correlation does not imply causation. It is not fair to discriminate based on factors that are merely correlated with the risk, but it is fair to discriminate based on factors that can be shown to be causally related to the insurance risk.
Of course in the UK (and the EU more generally?) it has gone the other way: insurance companies were forced to stop discriminating by gender which means they can no longer give women lower premiums. Did they lower male premiums at the same time to make up the difference, or just bump female premiums up and pocket the extra? Go on, guess...
Are you suggesting that their "inherent racism" is causing them to forgo shareholder profits?
Using discriminating factors that are simply correlated rather than causal is unfair as it penalises some people within a given profiled group for no good reason. This may not be due to inherent racism, but merely due to not properly understanding the statistics. Or not due to inherent racism now but because of lazy "it has always been done this way" reasoning. It may in part be due to racism, of course, as people are less likely to question results that agree with their worldview.
Identifying the correct causal factors and using them as discriminators is fair, and can be demonstrated to be better for the business too, since it allows lower prices for some groups, aiding competitiveness. But it can be harder work, leading to the less effective and less fair option being used: saving effort now at the expense of being fair (and of some potential longer-term business benefits).
A similar norm I often see from caregivers is that they consider boys to be "rough and tumble" and girls "soft and dainty". It is easy to see how those behaviour classes could extend to insurance risk.
Is being a woman/man really the cause? Or is it something like the environment the genders are brought up in? If it is the latter, how is that any different from a minority being brought up in a poor neighbourhood (and, statistically, staying there)? After all, in both cases you are penalizing those whose upbringing is outside the statistical norm, purely on the presence of a correlation.
With a lot of states changing the requirements for drivers permits to require more training and disallowing multiple occupants under certain conditions when the driver is newly licensed it will be interesting to see if this changes.
Why should an actuary care about cause? The correlation is what matters for that purpose.
Edit: guys, you should probably consult a lawyer. If you don't refuse to disclose but lie instead, that's called deception and at least in the EU it's a criminal offense. Also, morality, huh?
Why “should” it? Don’t hold your breath.
Why wouldn't the same be true of differential treatment to the detriment of men?
Do you? Because you're going to actually have to make this argument for me - I don't think I've ever seen feminists say that capital accumulation mechanics are the cause of misogyny.
In this case auto insurance rates are both a consequence of the stereotype and a mechanism that imposes it on new drivers.
The argument for restricting discrimination on a gendered basis relies on the socially instructive nature of vehicular insurance to teenagers.
That feels exceptionally weak.
I really enjoy the fact that young people pay more for car "insurance" because of their increased risk while driving, but old people don't pay more for health "insurance" (if they pay at all) because of their increased risk of getting ill. Because one of those would be totally unfair while the other one is totally OK. You understand which one is which, right?
It is basically youth serfdom.
My health plan, purchased through my state's ACA exchange in the US, costs me as a 57 year old 2.71x what it would have cost a 23 year old purchasing the exact same plan.
I would happily take a 23 year old's car insurance premiums if I could get a 23 year old's health insurance premiums in exchange.
They do in fact pay much more for health insurance. US law does now limit how large of a difference there can be to lower than the natural amount, though, so young people are subsidizing older people's health insurance.
The reasoning being: over your lifetime you pay a bit more when young, which acts as a virtual savings account for when you're older and need more care.
If you relied on people saving, they wouldn't and you'd have old people dying in the streets. Similar logic applies to enforcing pension savings.
The young generation is guaranteed nothing for its contribution.
That could only be sound if the population were to grow organically and indefinitely, which is obviously unsustainable as well.
The modern economy works this way: you give up ~40% of your income, which is then redistributed among retirees. Infrastructure and all other kinds of social programs also take their cut. This works when there are two workers for every retired person.
If there were as many retirees as workers, you would end up giving 80% of your income, which is totally unsustainable. Imagine that you pay $10 for a coffee and the barista only gets $2 towards her account, post-tax. She won't be able to make a decent living. You strangle the life out of the economy and push it into the grey zone.
Yes, we may be able to produce endless amounts of wealth, but we are not able to make this sheet balance itself. There are only 100 percent to go around, and they don't make more of those.
Say you _don't_ collect info on people's religiousness, but other properties you do collect in effect cause you to discriminate for or against religious people, or perhaps against atheists.
If there is no awareness, is there still liability?
"Where a disparate impact is shown, the plaintiff can prevail without the necessity of showing intentional discrimination unless the defendant employer demonstrates that the practice or policy in question has a demonstrable relationship to the requirements of the job in question"
You would need to keep in mind xkcd #882 (https://xkcd.com/882/).
OK, so I find that among Black users, Hotmail is associated with a 1.05 relative risk to the baseline, and among white users, 1.15 relative risk. How do I then apply this knowledge to an incoming user of an unknown race with a Hotmail account? Do I give them the population-weighted average of the relative risks among all my buckets? What if I do know their race? Do Black users get assigned 1.05x the risk and white users 1.15x?
Also, what if some of my buckets aren't full enough to get a high confidence of the relative Hotmail risk in that bucket? Let's say I don't have enough gay Hispanic women in my cohort. Do I just drop them from the analysis and hope they're similar to the other populations?
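The pooled answer is trickier than a weighted average of the bucket relative risks: if baseline risk and Hotmail prevalence differ across buckets, the relative risk measured on the pooled population can land outside both per-bucket values (the Simpson's-paradox trap the xkcd reference is about). A sketch with invented numbers:

```python
# Hypothetical figures only: per-group Hotmail relative risks of 1.05
# and 1.15, but different baseline risks and Hotmail prevalences.
def pooled_relative_risk(groups):
    """groups: (population_share, baseline_risk, hotmail_share, rel_risk)."""
    hot = nohot = hot_risk = nohot_risk = 0.0
    for share, base, p_hot, rr in groups:
        hot += share * p_hot
        nohot += share * (1 - p_hot)
        hot_risk += share * p_hot * base * rr        # expected risk, Hotmail
        nohot_risk += share * (1 - p_hot) * base     # expected risk, others
    return (hot_risk / hot) / (nohot_risk / nohot)

groups = [
    (0.2, 0.10, 0.5, 1.05),  # 20% of users, baseline 10%, half on Hotmail
    (0.8, 0.05, 0.2, 1.15),  # 80% of users, baseline 5%, a fifth on Hotmail
]
print(pooled_relative_risk(groups))  # ~1.33: outside both 1.05 and 1.15
```

Here the pooled signal mostly reflects that the higher-baseline group uses Hotmail more, not the within-group effect, which is exactly why per-bucket estimates and pooled estimates can disagree.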
No, it's not. Insurance is not about establishing causality. Insurance is about assessing risk. If you are correctly assessing risk via proxy signals, you are still correctly predicting the losses that will have to be covered by your insurance, it's completely irrelevant how the losses are causally connected to the signals that you use for the risk assessment. If careless people are more likely to both cause accidents and to have hotmail accounts, then use of a hotmail account does correctly predict an increased risk.
> Its exactly similar to the supposition that black people statistically commit more crime, and thus should pre-emptively receive harsher bail or be profiled.
No, insurance is not about influencing people; it is about assessing risk. Insurance companies ultimately don't care how large the losses they have to cover are; they only care about pricing their premiums such that they have the money to pay for them. Insurance companies don't raise premiums to incentivize people to reduce risk; they raise premiums because there are more losses to cover where the risk is higher. That this might sometimes also incentivize people to reduce risk is a side effect, not an intention of the insurer.
Historically, insurance companies only had things like postcode, gender, age, accident history to go off... but they're always interested in other data sets that could be used to modify pricing (or select less risky customers).
Clearly someone or something has identified email domain as a factor (presumably based on data).
More customers (full stop, if you have customers then your competitors don't have them, so you grow and they fail)
Customers who pay more money
Correctly assessing risk is not irrelevant but it's a small factor. You will not go out of business if you get the risk factors slightly wrong, you will if you can't attract customers or if they don't pay.
In markets where you insure fewer, bigger customers you care even less about accurately computing their individual risk, and more about a problem called aggregate risk: if you insure all the residential blocks in San Francisco against earthquake damage, the Big One wipes you out. Insure one such block in every US city and you're fine; the Big One means you take a loss in SF but not in New York or Seattle. But if they're all built from Newsteel, and then it turns out Newsteel turns to mush after 25 years, you're back to big problems. So you are suddenly very interested in understanding what is the Same about the things you insure.
It's no secret that sports cars are more costly to insure, specifically because people are more likely to drive recklessly in them.
If we didn't discriminate against people's car preferences, everyone's insurance would increase overnight.
No, it's not judging that symptom as a cause. It doesn't even matter if it's a cause or a symptom, as long as it is statistically sound. Hotmail, like [not] smoking, [not] exercising, [not] drinking alcohol, is a choice. Having chosen one way or the other does, statistically, imply other things.
Whether we, as a society, want to allow this kind of statistical inference to meaningfully impact things like insurance costs, prison terms and other important decisions is a completely different issue.
Now, assume for a second that ability to use modern technology (like touch screens and digital displays) correlates with safer driving -- an assumption that could actually be tested. If that's the case, then you would probably accept that a score on some objective test of tech savvy could affect car insurance rates, right? An objective, though slightly less accurate, estimator of tech savvy is the domain name - "@microsoft.com" or "@google.com" means likely savvy, "@aol.com" or "@hotmail.com" likely less so.
Age, Gender and Religion are other predictors of the same kind; e.g. observant Jews have significantly lower (~15% in some stats) chance of being in a car accident. That's not because they are better drivers or drive better cars - it's because they refrain from driving for one day a week. Knowing someone is Jewish therefore implies lower chance of being in a car accident - maybe only 3%, because 80% do not observe. So "being Jewish" is, statistically, an informative predictor.
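The arithmetic behind going from ~15% to ~3% (using the figures quoted above, which are illustrative rather than sourced): only the observant share of the group gets the reduction, so the group-level effect is the product of the two numbers.

```python
# ~20% observant (80% are not), each with ~15% lower accident risk.
observant_share = 0.20
risk_reduction = 0.15

# Group-level reduction = share affected x size of their reduction.
group_effect = observant_share * risk_reduction
print(round(group_effect, 2))  # 0.03, i.e. the ~3% quoted above
```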
Again, I am not saying these SHOULD be legal - but that's a philosophical/legal issue. Where I now reside, the legally approved predictors are: (a) age, (b) accidents in the last 5 years, (c) most recent validity period of drivers license, and (d) age and safety rating of car. It's not as "free" a market, but it is much saner.
 but objectivity is subjective... http://wiki.c2.com/?ObjectivityIsAnIllusion
I know some non-causal predictors, yet they're all controversial: various data as predictors for race, for example an address to determine which part of the city one lives.
This is my argument: it could be a `good` predictor but it's a proxy and it only works because there is no transparency. As such I don't think it's ethical for two different reasons.
I'd also like to add to your anecdata: I still use my first hotmail email as primary. I'm a young technical and safe driver (I hope).
That's your definition. "Causal" is well defined, "true predictor" isn't. Let's adopt your definition for the sake of argument ...
> age causes defects, smoking causes sickness, alcohol impairs
Age is correlated with defects but does not directly cause them. Some people at 40 have fewer "age-related defects" than other people at 30.
Smoking causes sickness, but is also correlated with less obesity, and -- in the minutes/hours after smoking, is correlated with being more alert (possibly caused by a nicotine response, I don't remember the biology).
Religion _sometimes_ causes behavior; strictly, the only thing you can say is that they are correlated. The vast majority of Jewish people I know are NOT observant, and yet, over the general population, being Jewish does correlate with less time on the roads.
Yes, it's a proxy, but so is age, or "time since you got your license" - Some new drivers are way better than other drivers who have been driving for 20 years.
Whether it works only because of (no) transparency is up for debate - my great uncle could not switch from AOL to gmail on his own without my help, and does not want to. He might want to if he knew it saved him $100/year in insurance costs, true, but it would cost me time in support costs (gmail is easier; but he's been using AOL for over 20 years and it's hard for him to change his ways).
> As such I don't think it's ethical for two different reasons.
I don't think it's ethical either. And I would thus like it to be made illegal - because, statistically speaking, right now it is likely a good proxy and predictor for a lot of things, and therefore as long as it's legal it will be used.
We have "blacklisted" other things, like race -- e.g., in the US, unfortunately "skin darkness" is a good predictor for the question "having spent time in jail" (about 10% of black people vs. 1% of white people). It is illegal to use as a predictor -- but it turns out that other "legal" predictors such as "lives in a high crime neighborhood" correlate strongly with "skin darkness" and ARE used to determine sentences.
The right thing, IMO, is to whitelist acceptable predictors, not blacklist unacceptable ones because you can always find proxies for the unacceptable ones, and it is easy to game one of those proxies, but hard to game 20 of them at the same time. But that's an ethical, legal and philosophical debate. Statistically, proxies are sound.
 I have no access to real data, I'm just assuming for the sake of argument
I'm actually not satisfied with that definition, but for now it works well enough for the sake of argument. I would like to define it as a predictor whose strength does not change when it's used transparently. But that definition is too vague for now.
And you're right that when used transparently, the predictor will probably become more accurate.
> Smoking causes sickness, but is also correlated with less obesity, and -- in the minutes/hours after smoking, is correlated with being more alert (possibly caused by a nicotine response, I don't remember the biology).
In : [Since] nicotine is a metabolic stimulant and appetite suppressant, quitting or reducing smoking could lead to weight gain.
I didn't know that smoking caused less obesity. I would have thought it was because of monetary reasons.
> I didn't know that smoking caused less obesity. I would have thought it was because of monetary reasons.
It does not cause less obesity. It is correlated with less obesity, possibly through suppressed appetite -- but it's also possible that it's actually a reflection of economic background which is also correlated with obesity.
It apparently does cause more alertness (in the same way that coffee does -- enough of it will make you awake and alert, too much will kill you).
I am personally ok with higher car insurance premiums for people with a prior DUI conviction, and higher health insurance premiums for smokers; I'm not ok with higher health insurance premiums for albinos, or people with a family history of cardiac problems or family history of type I diabetes -- even though they are mathematically justified (which means that -- not being in either group -- my own premiums are higher as a result).
> Hotmail, like [not] smoking, [not] exercising, [not] drinking alcohol, is a choice.
You might want to understand correlation, causation, dependence, and implication before repeating that 1000 times. I specifically used the term "statistically imply", which is correct and means "has predictive power", which is true in this case of correlation whether or not causation is involved.
Correlation actually implies a lot of things. At the very least, it implies statistical dependence (to the confidence attainable with the data). The converse is NOT true -- lack of correlation does not imply independence. But as independence implies LACK of correlation, correlation necessarily implies dependence. This is very basic probability, and is relevant to estimation of which insurance is an application. Causation is a completely different subject.
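A standard counterexample makes the asymmetry concrete: take a symmetric variable and its square. The square is a deterministic function of the variable (maximal dependence), yet their correlation is essentially zero, because correlation only captures linear relationships. This is a quick sketch, not tied to anything in the thread:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(0, 1, 100_000)   # symmetric around zero
y = x ** 2                      # fully determined by x: maximally dependent

# Correlation is ~0 despite total dependence, so zero correlation
# does not imply independence. The converse direction still holds:
# independence forces zero correlation, hence nonzero correlation
# implies dependence.
print(np.corrcoef(x, y)[0, 1])
```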
Care to elaborate? What exactly do you disagree with? Is any of the listed things not a choice?
Think screening for a virus vs trying to cure the virus. If warts are a symptom of the virus, having warts absolutely matters in screening. However, cutting off the warts will not remove the virus.
If a geneticist were to state that some gene or another were linked to the unibrow, would you consider this scientifically dubious? After all, in that case all they did was look at people who have unibrows and find characteristics more common to this group than to others. What if that genetic indicator did show up disproportionately among people with unibrows and was relatively less common among those without?
 - https://www.nature.com/articles/ncomms10815
If someone was to prove such a fact (say, a gene prevalent in some ethnicity that causes propensity for fraud) would you support decision making based on the ethnicity? How about based on the presence of the gene?
Is it fair? No. Will legislation be put in place to fix it? No. The only thing that levels the playing field is .. Nevermind, let's not get into politics.
Let's imagine that being black was actually what caused black people to commit more crimes and this was scientifically proven, how does this make it less bad to discriminate against some black guy who may or may not be a criminal?
From a business perspective, correlation is a good measure and making decisions based on it is perfectly fine (if the correlation is real).
From a moral perspective, it is not OK to discriminate based on correlation or causation.
The distinction between correlation and causation matters when you are attempting to affect the outcome. For example you can't cure someone's sickle cell anemia by painting them white.
Let's say you accept the answer you've received based on your heuristic that black people will statistically commit more crimes.
If a policy was based upon this heuristic, then you would have people living in poverty that would not be subject to the same policy. This is what discrimination is about. Citizens would thus not have the same rights / opportunity.
And the only reason you are accepting this heuristic as "good enough", is because you make the assumption that "you have modelled [enough of] the variables that make up the solution space".
This is not acceptable for policy makers and should be watched for closely in a democracy.
> and this was scientifically proven
which renders moot any discussion as you simply did not use the same assumptions. The discussion was about specialist decision systems using correlation as a heuristic for causation. You cannot explore the merit (or shortfalls) of this common practice and its possible effect on policy-making when positing that causation has been proven. This is nonsense.
Have a good day.
And the discussion was NOT about systems using correlation as a heuristic for causation. If a company filters users based on something, only correlation matters for their purposes. Believing correlation is only useful as a heuristic for causation is ignorant. For some purposes, correlation matters. For some purposes, the root cause matters. Not everything is the same.
Whether or not ethnicity is a causal factor is irrelevant. Even if it is a causal factor, it is still an injustice to act on it.
And if you didn't care about justice and just wanted to achieve an optimal prediction, whether or not it's a causal factor doesn't actually matter. Probability theory does not require a causal link.
"If it is rainy then it is cloudy." Is a valid inference despite the fact that rain does not cause clouds.
It is not, at least locally (because sunshowers happen.)
That's not how the scientific method works.
Actuarial analysis for insurance to exercise a privilege (i.e. driving) is not something that needs to be more heavily regulated. This is not health care/insurance, food, housing, or other critical necessities.
Correlation is a core part of actuarial science. You'd actually destroy the insurance business if you regulated that into nonexistence as you desire.