Hacker News
Car insurers accused of discriminating against people with Hotmail accounts (theguardian.com)
122 points by lumisota 10 months ago | 219 comments



It's obvious why this is happening: some form of statistical modeling or AI correlated having a Hotmail account with being higher risk.

This is a serious problem with all AI that makes decisions like this. Having a Hotmail account is a symptom of a problem which could lead to higher risk, but it is judging that symptom as if it is a cause which leads to risk.

It's exactly similar to the supposition that black people statistically commit more crime, and thus should pre-emptively receive harsher bail or be profiled. Unless you can scientifically prove that ethnicity causes crime to happen, it's disgusting. This is obvious to us.

In years past, correlation fallacies might have meant that black people were profiled by police more. Today, it means that we build AIs which make life-ending decisions like determining repeat offender risk, visa status, employment decisions, all of which predetermine an unknown outcome by correlating your known qualities with the known outcome of people with similar qualities to you.

We need regulation in place to stop any punitive decision making, public and private, which can be found in court to be based on correlation instead of causation.


I'd say there's one really important difference between being Black and using Hotmail, and that difference is the same reason why one is a protected class and one isn't.

The real concern here isn't discriminating against people with Hotmail accounts. On the surface level, there's nothing wrong with that. This does become an issue if the property of having a Hotmail account is used as an avenue to discriminate against people based on their membership in a protected class -- a sort of actuarial parallel construction, if you will. At that point, we do have a bit of a problem.

> We need regulation in place to stop any punitive decision making, public and private, which can be found in court to be based on correlation instead of causation.

So, young men could not be charged more for insurance than young women? In fact, young people in general would have to get the same price as everyone else? People in New York would have to pay the same as someone in rural Alaska? Someone who's been driving a year gets charged the same as someone who has been driving fifteen, etc.? Even being in an accident doesn't cause you to get in another accident. It's just correlated with that outcome. Throw out the entire actuarial aspect of insurance? These "punitive" charges are all based on correlation. I don't see this happening.

----

That said, as a non-statistician, I'm interested in what kinds of tests can be run on signals to verify that they're not proxies for selecting and discriminating against protected classes. Is there a way to modify a signal using its correlation with membership in these classes such that you can marginalize the attributes you're not allowed to discriminate based on?

For example, it seems reasonable to believe that Hotmail use is correlated with age, race, gender, etc. As an insurance company, I would have information on hand to inform me as to the extent of this correlation, at least among my customer base, and I would in turn know the correlations of those factors with risk. Could I somehow remove the racial, gender, and age components of the Hotmail risk signal to obtain a signal that only conveys the portion of the risk correlation that is not based on those classes? If so, what is that statistical technique called?


If you can explain why it is OK for car insurance to charge differently by gender, but by race is not OK, I will be impressed.


Because they are different things (in the US).

Racial discrimination requires "strict scrutiny" to be legal.

https://en.wikipedia.org/wiki/Strict_scrutiny

Sex discrimination requires "intermediate scrutiny" to be legal.

https://en.wikipedia.org/wiki/Intermediate_scrutiny

This podcast gives a wonderful history:

https://www.wnyc.org/story/sex-appeal/


I can't say for certain, because I wasn't involved in the decision. ;) My gut reaction is that it only ends up being seen as OK because the group being penalized has historically been at an advantage in most other areas financially. I am aware that some European countries are moving to do away with charging by gender.


Not some - as far as I am aware, all of the EU has now banned charging men more for car insurance.


Not only car insurance, but also e.g. life insurance. Men used to pay up to 50% more, now they pay less and women pay more - all tariffs are "unisex" now.


The US did the same thing with health insurance. It seems the moral debate is over; charging different rates for insurance based on gender has been deemed discrimination throughout the first world.

All that remains to do in the US is find the political will to outlaw discrimination that favors women.


Well, then women will be forced to pay more than their fair share for car insurance, putting them at even more of a disadvantage. Right?


And this observation is downvoted because? It's just stating obviously true facts as far as I can tell.

Edit: And then I get downvoted for sticking my neck out. Let me break the parent post down.

It asserts that charging genders equally will shift more burden to women. This is clearly true because premiums are higher for men because we have more accidents (so much for all the female driver jokes).

It also asserts women were a previously disadvantaged class. Also clearly true.

So to the downvoters out there, you might have found something disagreeable about the parent post, but you're wrong and he's right.


I see two things wrong in the post:

> fair share

Male drivers and female drivers don't have a "fair share" of insurance costs. No one should be obligated to pay for the actions of others simply because they're the same sex.

Driving risk should be evaluated on an individual basis based on those individuals' behavior.

> putting them at even more of a disadvantage

Young women are no longer at a disadvantage. Young women graduate college more often and earn more than young men.

I focus on young women and young men in this case because those are the people most affected by gender discrimination in auto insurance pricing, because they lack individual driving records on which to base insurance rates.


I've heard there's an app for that in the US, you drive with it for a week or two and then they adjust your premiums according to how you drive. That would be a more fair system. We're not discussing that though, we're discussing the current system vs one where insurance companies can't discriminate on the basis of gender - clearly that advantages men unfairly and disadvantages women unfairly. Yeah the system is still unfair the way it is, but this change would make it more so.

Women graduate from college more often, but college is not what it once was - I'm a college dropout and I'm sure I earn more than most graduates (plus I don't have the debt.) Women still earn less than men. Your claim that they earn more is just false.



That's interesting, although as those sources indicate, the pay gap increases with age so that by the time women hit their mid thirties they're substantially behind men in compensation (for the same work.) It's an improvement to be sure, but I wouldn't go so far as to call the problem solved and say that women aren't still a disadvantaged class, when it comes to compensation.


Look at my other reply to the parent post for my explanation as to why I disagree.


I suspect few HN users could provide sound logic to support their anonymous downvotes. Downvotes are good — at least it means there’s a chance someone eventually modulates their current viewpoint.


Don’t worry about downvotes; oppression by the status quo for random and varying reasons is a discriminatory mechanism to make our words invisible. Keep commenting!


This is a common mistake. Reducing inequality is not morally equivalent to penalizing the person who was previously in a superior position.


The issue here is the phrase "fair share" - why is it a fair share? Because women get into fewer accidents? Cool, what if we get out of the statistical data that black people have more accidents(this is just an example, I don't know if they really do)? Should it be justified to charge them more?

But maybe if you look beneath the surface you will find out that minorities are more likely to drive older, less safe cars, which leads to them having more accidents - they don't have more accidents because they are black, they have more accidents because of other factors.

The same with men and women - I think it's fair to say that men drive more powerful, larger cars(on average) than women do. So....they get into more accidents because the cars they drive are objectively more difficult to drive. Which obviously means that you should calculate your premium based on the type of vehicle being insured, not on the gender of the driver - and, by extension, insurance for a man and a woman, on the same vehicle, should be identical. Penalizing a man in this situation because men as a group get into more accidents is absolutely unfair.


> I think it's fair to say that men drive more powerful, larger cars(on average) than women do

I know you're sorta spitballing here, but at least in my experience that's a bit backwards. Around here, I'd guess women drive SUVs more than men do. Growing up, fathers would buy their daughters SUVs because they were safer. Now, husbands have their wives drive the family SUV for the same reason. Plus, moms and minivans, etc. That might just be a thing in the area I grew up, though. It's probably different in downtown Seattle.

> So....they get into more accidents because the cars they drive are objectively more difficult to drive.

I'd bet the testosterone doesn't help much, either :-)


Maybe it's different in the US, but over here in the UK with everyone I know the ratio I see is a man with an SUV/large powerful sedan, woman with a small city car to get to work and back. Moms in minivans isn't as much of a stereotype because kids just walk to school or take the bus/metro. I suspect it's more common in rural areas but in my experience taking kids to school by car is still not the primary way of getting to school. But obviously that's just my experience, my perspective.

I'd love to see actual data on this, especially power and size of cars grouped by gender.

>>I'd bet the testosterone doesn't help much, either :-)

Women have their own behaviour-affecting hormones too, you know.


> Moms in minivans isn't as much of a stereotype because kids just walk to school or take the bus/metro

Where in the UK are you? I lived several years in England as a teenager and went to a 'nice' school in the suburbs of Surrey and almost everybody drove their kids to school.


> over here in the UK with everyone I know the ratio I see is a man with an SUV/large powerful sedan, woman with a small city car to get to work and back. Moms in minivans isn't as much of a stereotype

Also in the UK and can report that mothers driving “Chelsea tractors” is totally a thing.


Sure, but most mothers in the UK are not driving Range Rovers :-) It's very much a rich people thing. Working-class mothers don't drive massive SUVs in general, even if they drop off their kids at school - tiny city cars seem far more common than tanks.


Heh. It's always interesting seeing the differences between countries.

> Women have their own behaviour-affecting hormones too, you know.

Of course. I was making a half-joke.


> I'd bet the testosterone doesn't help much, either :-)

Whoa there. Someone might infer from this that you believe men might generally make decisions / respond to situations differently than women with biological reasons being a contributing factor. That's a dangerous line of thought.


> I think it's fair to say that men drive more powerful, larger cars(on average) than women do. So....they get into more accidents because the cars they drive are objectively more difficult to drive. [...] Penalizing a man in this situation because men as a group get into more accidents is absolutely unfair.

Oh, come on, do you really think insurance companies don't break out all the different factors? They're in a cutthroat business where every tiny margin they make over their competitors is a big advantage. But if they guess wrong, they'll cut premiums too aggressively and lose money on accident claims.

Car size and gender are both factors. It's possible they're partly correlated, but actuaries can still incorporate both factors into their models.

Charging higher premiums for men is fair if you only care about accident rates. If you feel it's a form of discrimination and want to level the playing field, that has to be done via government regulation.


>The same with men and women - I think it's fair to say that men drive more powerful, larger cars(on average) than women do. So....they get into more accidents because the cars they drive are objectively more difficult to drive. Which obviously means that you should calculate your premium based on the type of vehicle being insured, not on the gender of the driver - and, by extension, insurance for a man and a woman, on the same vehicle, should be identical. Penalizing a man in this situation because men as a group get into more accidents is absolutely unfair.

The simplest "AI" (linear regression) will already factor this out if it has the data about the car driven and gender. Any remaining gender imbalance is attributable either to another factor or it is actually gender related (e.g. increased road rage, high speed driving and risky following behavior due to testosterone might be a stronger accident predictor than too careful driving (not speed matching fast enough when merging on the highway), too slow driving or worse spatial reasoning which are usually correlated with higher estrogen levels.)

In general the AI will just become a better predictor the more data it has available and the more detailed the data is. But an effeminate man or a manly woman might behave more like the opposite gender and thus be treated wrongly. We could probably solve this by tacking on a recent endocrinological report to every application, but this gets quite privacy invasive at some point.

In general if you get insurance you want to be part of a larger risk group to mitigate your effects. Having no insurance just means you have the most fair risk group, which is just yourself.
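
To make the regression point concrete, here is a minimal sketch with invented numbers, not real actuarial data. It assumes accident rates depend only on car size and that men in this made-up population drive big cars more often, then fits a weighted least-squares model by hand. The cell counts, rates, and function names are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def weighted_ols(rows):
    """rows are (weight, features, response); returns the fitted coefficients."""
    k = len(rows[0][1])
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for w, x, y in rows:
        for i in range(k):
            xty[i] += w * x[i] * y
            for j in range(k):
                xtx[i][j] += w * x[i] * x[j]
    return solve_linear_system(xtx, xty)


# Hypothetical cells: (drivers, [intercept, is_male, big_car], accident rate).
# ASSUMPTION: the rate depends on car size only, never on gender.
CELLS = [
    (60, [1.0, 1.0, 1.0], 0.09),
    (40, [1.0, 1.0, 0.0], 0.05),
    (30, [1.0, 0.0, 1.0], 0.09),
    (70, [1.0, 0.0, 0.0], 0.05),
]


def mean_rate(is_male):
    """Average accident rate for one gender, ignoring car size."""
    cells = [(w, y) for w, x, y in CELLS if x[1] == is_male]
    return sum(w * y for w, y in cells) / sum(w for w, _ in cells)


raw_gap = mean_rate(1.0) - mean_rate(0.0)
intercept, male_coef, car_coef = weighted_ols(CELLS)
print(f"raw gap={raw_gap:.3f} male coef={male_coef:.3f} car coef={car_coef:.3f}")
```

With these made-up cells the raw gap between men and women is about 0.012, while the fitted gender coefficient is about zero and the car coefficient is about 0.04. If gender carried its own effect in the data, it would show up as a nonzero coefficient instead.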


> The issue here is the phrase "fair share" - why is it a fair share? Because women get into fewer accidents? Cool, what if we get out of the statistical data that black people have more accidents(this is just an example, I don't know if they really do)?

Because actuarial tables say so. Insurance companies that decided it did not matter folded or changed their opinion after the losses.


So every insurance company in the EU collapsed after charging people more depending on gender was banned EU-wide?


They leveled the prices at the higher rates.


In the case of life insurance or auto insurance, if you charge the same by gender, it basically means women subsidize men. Given that women were the more disadvantaged group historically, that felt wrong.

Race is the other way around.


I'll take a swing.

Insurance companies have a strong incentive to price policies accurately, so you have to assume that discriminatory decisions are made with that goal in mind. Any restriction on factors that make the model less accurate (like forbidding race) will lead to higher volatility and a lower stock price (insurance companies are prized in portfolio management for their low volatility).

The government, however, recognizes that most people are forced to buy car insurance, which means that they are being forced to pay for those profit-maximizing decisions. And to the extent that being forced to pay can create a downward spiral of high costs leading to poor decision making and even higher costs, that is something to be prevented.

So where to draw the line? Well, the people most at risk of getting into a downward spiral are the already economically disadvantaged (read:poor). The government already prevents considering whether the insured is poor (either explicitly or implicitly by not allowing questions about income/net worth on applications).

But really, any factor which would tend to disproportionately target the already-poor would need to be limited as well, since that can often become an end-run around the limitation. If you consider the factors that would do that at the population level, race would be a very strong indicator of income, but gender would not be. (yes, I know women make less in the same jobs, but we're talking about the macro here, including women who are heads of household, 2-income families, and stay-at-home moms).

Now, it doesn't explain why other factors that would seem to be associated with low income like zipcode are allowed. But to answer your question about why gender is ok to price discriminate against while race is not, I think the answer is we try not to force poor people to pay more for government-mandated things just because they are poor or otherwise exhibit a feature that would characterize them that way to an uninformed observer.


Because it has not been shown that any indicator of causation of insurance risk due to race is anything other than a correlated symptom of other factors, where many properly peer-reviewed studies do seem to show that women, particularly within certain age and economic groups, are slightly safer drivers by an amount that is not small enough to be random noise in the stats. The effect is general enough for it to be defensible to say it is at least in part due to gender. Of course, other properties such as age, location, and profession, are much more significant factors.

While some studies show that it is true that Mexicans/African-Americans/insert-other-group-here are a higher insurance risk due to crime and to a smaller extent due to accidents, this is not because of their ethnic background but because of other perhaps correlated but not causally linked factors - they are more likely to live in poor areas with higher crime rates and not be able to afford better security measures. Being Mexican/black/other is not a causal factor despite there being some correlation, so it is not fair to discriminate based on that property. Any discrimination should be based upon the causal factors, rather than the lazy correlation, to avoid being unfair to those to whom said factors do not apply.

(FYI: male here, with many an anecdotal tale to support us being worse drivers generally, if only by a little bit)

lt;cbatr: correlation does not imply causation. It is not fair to discriminate based on factors that are merely correlated with the risk, but it is fair to discriminate based on factors that can be shown to be causally related to the insurance risk.

Of course in the UK (and the EU more generally?) it has gone the other way: insurance companies were forced to stop discriminating by gender which means they can no longer give women lower premiums. Did they lower male premiums at the same time to make up the difference, or just bump female premiums up and pocket the extra? Go on, guess...


If there is no basis for being higher risk, why wouldn't an insurance company want to make more money?

Are you suggesting that their "inherent racism" is causing them to forgo shareholder profits?


They need to be as cheap as possible to remain competitive. This means identifying risky clients and either charging them more or, worse, refusing their business, to make sure they are not taking on too much risk while trying to be competitive. If this were not the case then they would keep it simple and just charge a flat rate (that covers the highest risk they are willing to accept) to all customers.

Using discriminating factors that are simply correlated rather than causal is unfair as it penalises some people within a given profiled group for no good reason. This may not be due to inherent racism, but merely due to not properly understanding the statistics. Or not due to inherent racism now but because of lazy "it has always been done this way" reasoning. It may in part be due to racism, of course, as people are less likely to question results that agree with their worldview.

Identifying the correct causal factors and using them as discriminators is fair, and can be demonstrated to be better for the business too by allowing lower prices for some groups aiding competitiveness, but it can be harder work leading to the less effective and less fair option being used - saving effort now at the expense of being fair (and some potential longer-term business benefits).


Something I see come up regularly in discussions about women in tech is that women are often treated differently in their upbringing. For instance, many report an implicit (or even sometimes explicit) "computers are for boys" message that came from their caregivers, which shaped their connection to the tech industry. The causation here is not being a woman, it is being exposed to an environment that treats women in a certain way.

A similar norm I often see from caregivers is that they consider boys to be "rough and tumble" and girls "soft and dainty". It is easy to see how those behaviour classes could extend to insurance risk.

Is being a woman/man really the causation? Or is it something like the environment that the genders are brought up in? If it is the latter, how is that any different than a minority being brought up in a poor neighbourhood (and, statistically, stay there)? After all, in both cases, you are penalizing those who had an upbringing that is outside of the statistical norm due to a presence of correlation.


And young women (girls to me) drive less because young men (boys) tend to drive when they are dating. So maybe it's not because of their gender (or perhaps more properly sex here) but because of other "perhaps correlated but not causally linked factors."

With a lot of states changing the requirements for driver's permits to require more training and disallowing multiple occupants under certain conditions when the driver is newly licensed, it will be interesting to see if this changes.


> It is not fair to discriminate based on factors that are merely correlated with the risk, but it is fair to discriminate based on factors that can be shown to be causally related to the insurance risk.

Why should an actuary care about cause? The correlation is what matters for that purpose.


Here's one reason: In most countries people have an official sex, shown on official documents. (That will change one day: sex should be treated as potentially confidential medical data.) However, most countries have never had an official concept of "race". So if an insurer asked people for their race they could answer whatever gives them the lowest premium and they could not be accused of lying.


That's not true. Most countries don't have an official concept of social classes, so that makes it okay to say that you earn above X (e.g. to someone lending you money) even though you're not making anything? It's still lying.


Most countries have an official concept of annual income. It's used to assess income tax.


Most countries also have laws that prohibit the employer from asking for this information, it's still not OK to lie. Refuse to disclose, yes, but don't lie.

Edit: guys, you should probably consult a lawyer. If you don't refuse to disclose but lie instead, that's called deception and at least in the EU it's a criminal offense. Also, morality, huh?


Is deception in general illegal in the EU? I don't think that is the case in the US. Fraud in the US requires that the person being deceived suffers an injury as a result of their reliance on a misrepresentation of a fact. That would be the case if you were applying for a loan because your income affects your ability to repay the loan, but I don't think it would be the case for an employer because they aren't harmed by your income being lower than what you tell them it is.


Yes, deception is illegal even if no one was, is or will be harmed; the rule is that it's illegal if the deception was made for profit (not just monetary). There are many situations where you profit from deception even though no one is harmed, and that's illegal.


What if an insurer asks for the applicant's passport photo and then uses artificial or human intelligence to classify the applicant by race?


That's an interesting question. Of course, not everyone has a passport, but that's a nit: people applying for car insurance typically do have a driving licence with a photo. What would an AI make of a huge database of photos and insurance claims? I'd love it if someone would try the experiment! I doubt the AI would invent the concept of "race" (whatever that means), but it might decide that people with tattoos or piercings should pay more, for example.


If the insurers have everyone's photo, the AI can directly correlate the photos to accident rates and discriminate based on whatever mysterious aspect of the image appearance causes the correlation.


> That will change one day: sex should be treated as potentially confidential medical data.

Why “should” it? Don’t hold your breath.


Because differential treatment between the sexes doesn't aggregate and self-amplify negative economic outcomes across generations to create distinct economic disparities between minorities?


Feminists have argued that differential treatment between the sexes has aggregated and self-amplified across generations, to the detriment of women.

Why wouldn't the same be true of differential treatment to the detriment of men?


>Feminists have argued

Do you? Because you're going to actually have to make this argument for me - I don't think I've ever seen feminists say that capital accumulation mechanics are the cause of misogyny.


Not capital accumulation, but social expectations. In this case, one obvious candidate would be that the stereotype that young men are more dangerous drivers creates a social pressure for young men to take more risks to prove their manhood.

In this case auto insurance rates are both a consequence of the stereotype and a mechanism that imposes it on new drivers.


>In this case auto insurance rates are ... a mechanism that imposes [the stereotype] on new drivers.

The argument for restricting discrimination on a gendered basis relies on the socially instructive nature of vehicular insurance to teenagers.

That feels exceptionally weak.


One of many factors. Movies and other fiction are no doubt the most powerful force communicating the expectation that young men should act like race car drivers.


> In fact, young people in general would have to get the same price as everyone else?

I really enjoy the fact that young people pay for car "insurance" more because of their increased risk at driving, but old people don't pay for health "insurance" more (if they pay at all) because of their increased risk at getting ill. Because one of those would be totally unfair while the other one is totally OK. You understand which one is which, right?

It is basically youth serfdom.


> I really enjoy the fact that young people pay for car "insurance" more because of their increased risk at driving, but old people don't pay for health "insurance" more (if they pay at all) because of their increased risk at getting ill

My health plan, purchased through my state's ACA exchange in the US, costs me as a 57 year old 2.71x what it would have cost a 23 year old purchasing the exact same plan.

I would happily take a 23 year old's car insurance premiums if I could get a 23 year old's health insurance premiums in exchange.


> old people don't pay for health "insurance" more

They do in fact pay much more for health insurance. US law does now limit how large of a difference there can be to lower than the natural amount, though, so young people are subsidizing older people's health insurance.


Here in the UK, if you elect to get private health cover in addition to the NHS-provided general care, older people do pay more.


I think they pay more in the US as well? I thought an aspect of the ACA limits how much more they have to pay (and the Republican drafts had similar rules, but allowed a greater differential).

The reasoning being, over your lifetime you pay a bit more when young, which acts as a virtual savings account for when you're older and need more care.

If you relied on people saving, they wouldn't and you'd have old people dying in the streets. Similar logic applies to enforcing pension savings.


But it's not savings it's redistribution, i.e. extortion.

The young generation is guaranteed nothing for their contribution.

It could only be sound if the population were to grow organically and indefinitely, which would obviously be unsustainable too.


The society getting richer is probably the defining property that makes it sustainable, not the number of bodies in it.


Unfortunately that's not how the modern economy works. And we don't know how to make it work that way.

The modern economy works this way: you give up ~40% of your income, which is then redistributed between retirees. Infrastructure and all other kinds of social programs also take their cut. This works when there are two workers for every retired person.

If there were as many retirees as workers, you would end up giving 80% of your income. But that is totally unsustainable. Imagine that you pay $10 for a coffee, and the barista only gets $2 towards her account, post-tax. She won't be able to make a decent living. You strangle the life out of the economy and push it into the grey zone.

Yes, we may be able to produce an endless amount of wealth, but we are not able to make this sheet balance itself. There are just 100 percentage points, and they don't make those anymore.
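
The arithmetic behind that scaling can be sketched in a few lines. This is only the comment's simplified model, assuming the required contribution rate is inversely proportional to the number of workers per retiree and pinned at 40% for a 2:1 ratio. The function names are made up for illustration.

```python
def required_rate(workers_per_retiree, rate_at_two_workers=0.40):
    """Contribution rate if the burden scales with retirees per worker.

    ASSUMPTION: pure pay-as-you-go scaling, anchored at 40% for a 2:1 ratio.
    """
    return rate_at_two_workers * 2.0 / workers_per_retiree


def take_home(price, contribution_rate):
    """What is left of a payment after the contribution is taken out."""
    return price * (1.0 - contribution_rate)


rate_two_to_one = required_rate(2.0)  # two workers per retiree
rate_one_to_one = required_rate(1.0)  # as many retirees as workers
barista_share = take_home(10.0, rate_one_to_one)
print(rate_two_to_one, rate_one_to_one, round(barista_share, 2))
```

Under those assumptions the rate doubles from 40% to 80% when the ratio drops from 2:1 to 1:1, which is where the $2 out of a $10 coffee comes from.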


What happens in your case if an entity is not in any way "aware" of people's protected class (they don't collect that info) but using these other properties coincidentally affects a protected class?

Say you _don't_ collect info on people's religiousness but other properties you collect in effect result in you either discriminating for or against either religious people or perhaps against atheists.

If there is no awareness, is there still liability?


Yes, you can still be liable even if you're not aware:

"Where a disparate impact is shown, the plaintiff can prevail without the necessity of showing intentional discrimination unless the defendant employer demonstrates that the practice or policy in question has a demonstrable relationship to the requirements of the job in question"

https://en.wikipedia.org/wiki/Disparate_impact


I'm not a lawyer, but I'd say if you make a good faith effort to prevent that eventuality, it would probably at least mitigate liability. That's a great question, though.


There's a simple technique to "control for" those factors. First you split your population into groups that each have a similar race, gender, and age. Then within those groups you compare the Hotmail users to the others. https://en.wikipedia.org/wiki/Controlling_for_a_variable

You would need to keep in mind xkcd #882 (https://xkcd.com/882/).
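
A rough sketch of that splitting step, using invented counts and a single hypothetical stratum variable (age band) in place of the full race/gender/age split. The numbers are chosen so that Hotmail looks risky in the pooled data but not within any stratum.

```python
from collections import defaultdict

# Hypothetical records: (stratum, uses_hotmail, policies, claims).
# ASSUMPTION: invented counts, one stratum variable only.
RECORDS = [
    ("young", True, 200, 30),
    ("young", False, 100, 15),
    ("older", True, 100, 5),
    ("older", False, 400, 20),
]


def claim_rate(rows):
    """Claims per policy over a set of records."""
    return sum(r[3] for r in rows) / sum(r[2] for r in rows)


def relative_risk(rows):
    """Claim rate of Hotmail users divided by the claim rate of everyone else."""
    hotmail = [r for r in rows if r[1]]
    others = [r for r in rows if not r[1]]
    return claim_rate(hotmail) / claim_rate(others)


def stratified_relative_risk(rows):
    """Relative risk computed separately inside each stratum."""
    by_stratum = defaultdict(list)
    for r in rows:
        by_stratum[r[0]].append(r)
    return {name: relative_risk(group) for name, group in by_stratum.items()}


pooled = relative_risk(RECORDS)
per_stratum = stratified_relative_risk(RECORDS)
print(f"pooled: {pooled:.2f}", per_stratum)
```

Here the pooled relative risk is about 1.67 only because Hotmail users skew young in the made-up data; inside each age band it is 1.0. Every extra stratum is another comparison, which is why the xkcd warning about multiple comparisons applies.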


Let me ask some clarifying questions, because I think I understand the broad strokes of what you're saying, but I'm not clear on some details.

OK, so I find that among Black users, Hotmail is associated with a 1.05 relative risk to the baseline, and among white users, 1.15 relative risk. How do I then apply this knowledge to an incoming user of an unknown race with a Hotmail account? Do I give them the population weighted average of the relative risks among all my buckets? What if I do know their race? Do Black users get assigned 1.05 as much risk and white users get 1.15?

Also, what if some of my buckets aren't full enough to get a high confidence of the relative Hotmail risk in that bucket? Let's say I don't have enough gay Hispanic women in my cohort. Do I just drop them from the analysis and hope they're similar to the other populations?


Also, what if Hotmail is associated with a 1.15 relative risk for both black and white users, but a black user is significantly more likely to have a Hotmail account than a white user?
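To put numbers on that scenario: a small sketch where the 1.15 relative risk is identical in both groups and only the share of Hotmail users differs. The 40% and 10% shares are invented for illustration, and non-Hotmail users are assumed to sit at a baseline multiplier of 1.0:

```python
def average_multiplier(hotmail_share, hotmail_relative_risk):
    """Average premium multiplier for a group, given its share of Hotmail users."""
    return hotmail_share * hotmail_relative_risk + (1 - hotmail_share) * 1.0

# Same per-user surcharge, different (hypothetical) Hotmail prevalence.
group_x = average_multiplier(0.40, 1.15)
group_y = average_multiplier(0.10, 1.15)

print(f"group X average multiplier: {group_x:.3f}")
print(f"group Y average multiplier: {group_y:.3f}")
```

Nobody is priced on group membership here, yet the group with more Hotmail users pays more on average, which is the disparate-impact question in a nutshell.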


> Having a Hotmail account is a symptom of a problem which could lead to higher risk, but it is judging that symptom as if it is a cause which leads to risk.

No, it's not. Insurance is not about establishing causality. Insurance is about assessing risk. If you are correctly assessing risk via proxy signals, you are still correctly predicting the losses that will have to be covered by your insurance, it's completely irrelevant how the losses are causally connected to the signals that you use for the risk assessment. If careless people are more likely to both cause accidents and to have hotmail accounts, then use of a hotmail account does correctly predict an increased risk.

> Its exactly similar to the supposition that black people statistically commit more crime, and thus should pre-emptively receive harsher bail or be profiled.

No, insurance is not about influencing people, it is about assessing risk. Insurance companies ultimately don't care about how much losses they have to cover, they only care about pricing their premiums such that they do have the money to pay for it. Insurance companies don't raise premiums to incentivize people to reduce risk. Insurance companies raise premiums because there are more losses to cover where the risk is higher. That that might also sometimes incentivize people to reduce risk is a side effect, but not an intention of the insurer.


that's sort of how insurance pricing works though. An actuarial process groups you with others based on similar attributes, and works out an overall risk value for you based on the combined risk profiles of the "groups" to which you belong.

Historically, insurance companies only had things like postcode, gender, age, accident history to go off... but they're always interested in other data sets that could be used to modify pricing (or select less risky customers).

Clearly someone or something has identified email domain as a factor (presumably based on data).


They don't want "less risky customers". In car insurance and similar markets they want:

* More customers (full stop: if you have customers then your competitors don't have them, so you grow and they fail)

* Customers who pay more money

Correctly assessing risk is not irrelevant but it's a small factor. You will not go out of business if you get the risk factors slightly wrong, you will if you can't attract customers or if they don't pay.

In markets where you insure fewer, bigger customers you care even less about accurately computing their individual risk, and more about a problem called aggregate risk - if you insure all the residential blocks in San Francisco against earthquake damage, the Big One wipes you out. Insure one such block in every US city and you're fine, the Big One means you take a loss in SF but not in New York or Seattle. But if they're all built from Newsteel and then it turns out Newsteel turns into mush after 25 years you're back to big problems. So you are suddenly very interested in understanding what is the Same about things you insure.


Even the obvious one, your insurance changes based on the type of car you're insuring.. because it's not all about cost.

It's no secret that sports cars are more costly to insure, specifically because people are more likely to drive recklessly in them.

If we didn't discriminate against people's car preferences, everyone's insurance would increase overnight.


> Having a Hotmail account is a symptom of a problem which could lead to higher risk, but it is judging that symptom as if it is a cause which leads to risk.

No, it's not judging that symptom as a cause. It doesn't even matter if it's a cause or a symptom, as long as it is statistically sound. Hotmail, like [not] smoking, [not] exercising, [not] drinking alcohol, is a choice. Having chosen one way or the other does, statistically, imply other things.

Whether we, as a society, want to allow this kind of statistical inference to meaningfully impact things like insurance costs, prison terms and other important decisions is a completely different issue.


Then it needs to be clear what the effects of the choice are, so you can choose. Then the effect will disappear because it's not a true predictor. Each of your examples has a clear explanation, hotmail doesn't.


What's a "true predictor"? Statistics deals with useful predictors (biased/unbiased, efficient/inefficient, sufficient/insufficient etc), and in that sense, it's likely that a hotmail account is a good predictor of (not) being good or up to date with modern computer technology; that's definitely true of my sample anecdata (and AOL is a stronger predictor of the same).

Now, assume for a second that ability to use modern technology (like touch screens and digital displays) correlates with safer driving -- an assumption that could actually be tested. If that's the case, then you would probably accept that a score on some objective[0] test of tech savvy could affect car insurance rates, right? An objective, though slightly less accurate, estimator of tech savvy is the domain name - "@microsoft.com" or "@google.com" means likely savvy, "@aol.com" or "@hotmail.com" likely less so.

Age, Gender and Religion are other predictors of the same kind; e.g. observant Jews have significantly lower (~15% in some stats) chance of being in a car accident. That's not because they are better drivers or drive better cars - it's because they refrain from driving for one day a week. Knowing someone is Jewish therefore implies lower chance of being in a car accident - maybe only 3%, because 80% do not observe. So "being Jewish" is, statistically, an informative predictor.
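The 3% is just the two quoted figures multiplied together. A quick check, assuming non-observant drivers sit at the baseline risk (that part is my assumption, not something from the stats):

```python
# Figures quoted above; the variable names are only for illustration.
observant_share = 0.20           # "80% do not observe"
observant_risk_reduction = 0.15  # "~15% in some stats"

# Average reduction across the whole group if everyone else is at baseline risk.
group_risk_reduction = observant_share * observant_risk_reduction
print(f"{group_risk_reduction:.0%}")
```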

Again, I am not saying these SHOULD be legal - but that's a philosophical/legal issue. Where I now reside, the legally approved predictors are: (a) age, (b) accidents in the last 5 years, (c) most recent validity period of drivers license, and (d) age and safety rating of car. It's not as "free" a market, but it is much saner.

[0] but objectivity is subjective... http://wiki.c2.com/?ObjectivityIsAnIllusion


Let's say a true predictor is causal. Because each example you give is causal: age causes defects, smoking causes sickness, alcohol impairs, religion causes behaviour. Using hotmail versus gmail will not cause something -- private domains are outliers. If you make the predictor transparent the effect should disappear because no one will use a hotmail address unless they are technically inept.

I know some non-causal predictors, yet they're all controversial: various data as predictors for race, for example an address to determine which part of the city one lives.

This is my argument: it could be a `good` predictor but it's a proxy and it only works because there is no transparency. As such I don't think it's ethical for two different reasons.

I'd also like to add to your anecdata: I still use my first hotmail email as primary. I'm a young technical and safe driver (I hope).


> Let's say a true predictor is causal.

That's your definition. "Causal" is well defined, "true predictor" isn't. Let's adopt your definition for the sake of argument ...

> age causes defects, smoking causes sickness, alcohol impairs

Age is correlated with defects but does not directly cause them. Some people at 40 have fewer "age related defects" than other people at 30.

Smoking causes sickness, but is also correlated with less obesity, and -- in the minutes/hours after smoking, is correlated with being more alert (possibly caused by a nicotine response, I don't remember the biology).

Religion _sometimes_ causes behavior; the only thing you can say is that they are correlated - the vast majority of Jewish people I know are NOT observant, and yet - over the general population - being Jewish does correlate with less time on the roads.

Yes, it's a proxy, but so is age, or "time since you got your license" - Some new drivers are way better than other drivers who have been driving for 20 years.

Whether it works only because of (no) transparency is up for debate - my great uncle could not switch from AOL to gmail on his own without my help, and does not want to. He might want to if he knew it saved him $100/year in insurance costs, true, but it would cost me time in support costs (gmail is easier; but he's been using AOL for over 20 years and it's hard for him to change his ways).

> As such I don't think it's ethical for two different reasons.

I don't think it's ethical either. And I would thus like it to be made illegal - because, statistically speaking, right now it is likely[0] a good proxy and predictor for a lot of things, and therefore as long as it's legal it will be used.

We have "blacklisted" other things, like race -- e.g., in the US, unfortunately "skin darkness" is a good predictor for the question "having spent time in jail" (about 10% of black people vs. 1% of white people). It is illegal to use as a predictor -- but it turns out that other "legal" predictors such as "lives in a high crime neighborhood" correlate strongly with "skin darkness" and ARE used to determine sentences.

The right thing, IMO, is to whitelist acceptable predictors, not blacklist unacceptable ones because you can always find proxies for the unacceptable ones, and it is easy to game one of those proxies, but hard to game 20 of them at the same time. But that's an ethical, legal and philosophical debate. Statistically, proxies are sound.

[0] I have no access to real data, I'm just assuming for the sake of argument


You seem to keep arguing that your examples are not causal but only correlated. Just because something does not happen 100% of the time does not mean it's not causal. Smoking causes cancer, yet not everyone who smokes actually gets cancer.

I'm actually not satisfied with that definition, but for now it works well enough for the sake of argument. I would like to define it as a predictor whose strength does not change when it's used transparently. But that definition is too vague for now.

And you're right that when used transparently, the predictor will probably become more accurate.

> Smoking causes sickness, but is also correlated with less obesity, and -- in the minutes/hours after smoking, is correlated with being more alert (possibly caused by a nicotine response, I don't remember the biology).

In [0]: [Since] nicotine is a metabolic stimulant and appetite suppressant, quitting or reducing smoking could lead to weight gain.

I didn't know that smoking caused less obesity. I would have thought it was because of monetary reasons.

[0] http://www.nber.org/papers/w21937


I am using this definition of causality[0], which is precise, and you use it in a much more general sense.

> I didn't know that smoking caused less obesity. I would have thought it was because of monetary reasons.

It does not cause less obesity. It is correlated with less obesity, possibly through suppressed appetite -- but it's also possible that it's actually a reflection of economic background which is also correlated with obesity.

It apparently does cause more alertness (in the same way that coffee does -- enough of it will make you awake and alert, too much will kill you).

[0] https://en.wikipedia.org/wiki/Causality#Probabilistic_causat...


That's correct on all counts except it's also irrelevant whether the variable is a choice or not.


It's a philosophical/ethical/legal question what should be allowed to go into insurance rates, so relevance is determined by the framework in which you evaluate things.

I am personally ok with higher car insurance premiums for people with a prior DUI conviction, and higher health insurance premiums for smokers; I'm not ok with higher health insurance premiums for albinos, or people with a family history of cardiac problems or family history of type I diabetes -- even though they are mathematically justified (which means that -- not being in either group -- my own premiums are higher as a result).


Yes, but that is a different issue as you said in your original comment. Statistically it is irrelevant. The "is it a choice" part also belongs in the "completely different issue" basket, so to speak.


gosh. no, having a hotmail account doesn't imply anything at all. correlation is not causation, that's worth repeating 1000 times.

> Hotmail, like [not] smoking, [not] exercising, [not] drinking alcohol, is a choice.

lol.


> it doesn't imply anything at all. correlation is not causation, that's worth repeating 1000 times.

You might want to understand correlation, causation and dependence, and implication before repeating that 1000 times. I specifically used the term "statistically imply", which is correct, and means "has predictive power", which is true in this case of correlation whether or not causation is involved.

Correlation actually implies a lot of things. At the very least, it implies statistical dependence (to the confidence attainable with the data). The converse is NOT true -- lack of correlation does not imply independence. But as independence implies LACK of correlation, correlation necessarily implies dependence. This is very basic probability, and is relevant to estimation, of which insurance is an application. Causation is a completely different subject.
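A textbook-style illustration of the "lack of correlation does not imply independence" half, using a toy distribution chosen purely for the example: X uniform on {-1, 0, 1} and Y = X squared. The covariance is exactly zero even though Y is completely determined by X:

```python
from fractions import Fraction

# Each (x, y) outcome has probability 1/3; y is always x squared.
outcomes = [(-1, 1), (0, 0), (1, 1)]
p = Fraction(1, 3)

e_x = sum(p * x for x, _ in outcomes)
e_y = sum(p * y for _, y in outcomes)
e_xy = sum(p * x * y for x, y in outcomes)
covariance = e_xy - e_x * e_y

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_x0 = sum(p for x, _ in outcomes if x == 0)
p_y0 = sum(p for _, y in outcomes if y == 0)
p_joint = sum(p for x, y in outcomes if x == 0 and y == 0)
is_independent = p_joint == p_x0 * p_y0

print("covariance:", covariance)
print("independent:", is_independent)
```

Exact fractions are used so that the zero covariance is not a floating point accident.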

> lol

Care to elaborate? What exactly do you disagree with? Is any of the listed things not a choice?


Yelling that correlation is not causation 1000 times is ignorant when the effectiveness of a method depends only on correlation, and does not depend on the existence of a causal link. And in the case of filtering, that is the case.

Think screening for a virus vs trying to cure the virus. If warts are a symptom of the virus, having warts absolutely matters in screening. However, cutting off the warts will not remove the virus.


What would you define as "scientifically proven"? In general, an observation that is observable, repeatable, and predictive is considered to meet the criteria there. Take for instance genetics. The entire field is, with a few exceptions, built on nothing but correlations. And social science is 100% built on correlations.

If a geneticist were to state that some gene or another were linked to the unibrow [1] would you consider this scientifically dubious? After all, in that case all they did was look at people who have unibrows and look to find characteristics that were more common to this group than others. What if that genetic indicator did show up disproportionately among all people with unibrows and was relatively less common among those without a unibrow?

[1] - https://www.nature.com/articles/ncomms10815


> Unless you can scientifically prove that ethnicity causes crime to happen, its disgusting.

If someone was to prove such a fact (say, a gene prevalent in some ethnicity that causes propensity for fraud) would you support decision making based on the ethnicity? How about based on the presence of the gene?


Both of those are already happening in health insurance. Not a "gene" that causes "crime", but you're higher risk under certain conditions depending on genes and ethnicity.

Is it fair? No. Will legislation be put in place to fix it? No. The only thing that levels the playing field is .. Nevermind, let's not get into politics.


You should watch Gattaca.


And then consider the alternate ending: his heart condition led him to die halfway through the mission, and without a pilot everyone else onboard died.


I came away from the film with the impression that he didn't have and never had any heart condition. He was assumed to be super unhealthy because he was a natural birth (a kind of superstition, if you will), but that makes him like us. Do we all have heart conditions?


IIRC heart disease is the second most common cause of death, something in the region of 20%. So not all, but a substantial fraction of us have heart conditions.


Which might be why they thought that he had it. They seemed to be incorrect about something or he wouldn't routinely outperform his brother in physical sports.


Many athletes die young from heart conditions - a heart that's overperforming is one that's wearing itself out faster.


He had to fake the heart-monitoring test when he was running on a treadmill.


Did he have to or did he assume he had to?


I disagree completely. Why does it matter whether it's just correlation or causation?

Let's imagine that being black was actually what caused black people to commit more crimes and this was scientifically proven, how does this make it less bad to discriminate against some black guy who may or may not be a criminal?

From a business perspective, correlation is a good measure and making decisions based on it is perfectly fine (if the correlation is real).

From a moral perspective, it is not OK to discriminate based on correlation or causation.

The distinction between correlation and causation matters when you are attempting to affect the outcome. For example you can't cure someone's sickle cell anemia by painting them white.


That makes the assumption that you have modelled all the variables that make up the solution space. For example, the real, underlying reason might be poverty. So while the solution might be "good enough" for the purposes of the application, in the broader context (society) there are going to be serious consequences.


No, I do not make any such assumption. And I already covered for which purposes the "underlying cause" matters and for which it doesn't.


Let's assume that higher crime rates are caused by poverty and not by the color of your skin.

Let's say you accept the answer you've received based on your heuristic that black people will statistically commit more crimes.

If a policy was based upon this heuristic, then you would have people living in poverty that would not be subject to the same policy. This is what discrimination is about. Citizens would thus not have the same rights / opportunity.

And the only reason you are accepting this heuristic as "good enough", is because you make the assumption that "you have modelled [enough of] the variables that make up the solution space".

This is not acceptable for policy makers and should be watched for closely in a democracy.


I don't understand what you're arguing against, which part of my comment do you consider incorrect?


Going back to your original post:

> and this was scientifically proven

which renders moot any discussion as you simply did not use the same assumptions. The discussion was about specialist decision systems using correlation as a heuristic for causation. You cannot explore the merit (or shortfalls) of this common practice and its possible effect on policy-making when positing that causation has been proven. This is nonsense.

Have a good day.


I said that even if it was scientifically proven it still wouldn't be OK to discriminate. Reading comprehension.

And the discussion was NOT about systems using correlation as a heuristic for causation. If a company filters users based on something, only correlation matters for their purposes. Believing correlation is only useful as a heuristic for causation is ignorant. For some purposes, correlation matters. For some purposes, the root cause matters. Not everything is the same.


>Unless you can scientifically prove that ethnicity causes crime to happen, its disgusting. This is obvious to us.

Whether or not ethnicity is a causal factor is irrelevant. Even if it is a causal factor, it is still an injustice to act on it.

And if you didn't care about justice and just wanted to achieve an optimal prediction, whether or not it's a causal factor doesn't actually matter. Probability theory does not require a causal link.

"If it is rainy then it is cloudy." Is a valid inference despite the fact that rain does not cause clouds.


> "If it is rainy then it is cloudy." Is a valid deduction

It is not, at least locally (because sunshowers happen.)


I thought I could sneak that fix in before anyone would notice. Yes, it's a valid inference, true with a high probability, but not a probability of 1.


>scientifically prove

That's not how the scientific method works.


> We need regulation in place to stop any punitive decision making, public and private, which can be found in court to be based on correlation instead of causation.

Actuarial analysis for insurance to exercise a privilege (i.e. driving) is not something that needs to be more heavily regulated. This is not health care/insurance, food, housing, or other critical necessities.

Correlation is a core part of actuarial science. You'd actually destroy the insurance business if you regulated that into nonexistence as you desire.


I think the troubling aspect here is that these are actually statistically sound decisions. The company does not need to have a strong implication that Hotmail -> bad driver. They only need to know that P(bad driver | Hotmail) > P(bad driver | Gmail). And through some correlation (many of the theories proposed by others seem plausible), this apparently holds.
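For concreteness, this is all the comparison amounts to. The counts below are invented, and "bad driver" is stood in for by "had an at-fault claim", which is my simplification:

```python
# Hypothetical counts per email provider: (customers, customers with an at-fault claim)
counts = {"hotmail": (2000, 130), "gmail": (5000, 250)}

def claim_rate(provider):
    """Estimate P(claim | provider) from the raw counts."""
    customers, claims = counts[provider]
    return claims / customers

for provider in counts:
    print(provider, claim_rate(provider))

# Share of Hotmail customers in this made-up data who never had a claim.
hotmail_without_claim = 1 - claim_rate("hotmail")
print("hotmail users without a claim:", hotmail_without_claim)
```

Even with a clearly higher rate for Hotmail, the large majority of Hotmail users in this made-up data never file a claim, and they pay the surcharge anyway.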

As developers of these systems, we need to be careful of how we might apply superficial correlations like these, so that we don't cause harm and burden to those who happen to be caught up in them through no fault of their own.

As a side note, I happen to have a 10+ year old Hotmail account that I use for signing up to services. My Gmail address is only given out to real people. Personally, I view it as a testament to my diligence that I have managed to give my email out to hundreds of websites, several of which have had database breaches, and still only see one or two unwanted emails per week.


If insurance were a way to minimize risk by distributing it, differential pricing would be applied only for factors you can influence, to the degree that you can influence them. E.g. if you are a smoker, health insurance would be more expensive, but if you are willing to go into therapy, the insurance would pay for it to save on cancer treatments down the line.

Of course insurance companies are first and foremost trying to maximize their profits by charging everyone just slightly more than their expected payouts [1]. That also means that their profits go up when they get better at modeling someone's risk profile and then charge them more. The whole business model of insurance depends on treating people differently, even if they are different due to no fault of their own.

[1] corollary: if you have enough money, you shouldn't buy insurance (expected loss), but insurance companies (expected profit)


The more perfectly insurance companies can model future payouts, the less it acts like insurance.


You should never buy insurance.

The expected loss with insurance is higher than without for two reasons:

* The insurance company must make a profit

* You are much more likely to go bankrupt than the insurance company. Bankruptcy limits the payout.

The only time you should take insurance is when you know something the insurance company doesn't. For example, you know how very dangerous your house electrics are, while they don't. Take that thinking too far and it's called "insurance fraud" though.


Like those who are of a certain gender and are charged more?[1]

Or those from a certain zip code who are charged more? [2]

Or those who drive a certain car model who are charged more? [3]

How is email address any different?

[1] https://www.nerdwallet.com/blog/insurance/teen-boys-high-car...

[2] http://www.latimes.com/business/autos/la-fi-hy-ten-worst-zip...

[3] https://electrek.co/2017/06/05/tesla-owners-insurance-rates/


It's much easier to change your email than your gender or zip code. Charging different amounts based on a car model makes sense though, some cars are much safer than others.


Yes, the people that STILL haven't bothered to change from Hotmail even though it's quite easy are more likely to not be good drivers. I don't think the difficulty of changing the factor should matter, in fact I'd say the easier they are to change the more impact they likely have as it may show laziness (higher risk). Having these factors be openly noted however would be important.


> we need to be careful of how we might apply superficial correlations like these

Well, for one, correlation != causation. Period.


Well, that's fine, but insurance companies are only interested in correlation. It's rather obvious that Hotmail addresses don't cause car accidents.


Yes, it is obvious, hence my comment. Using correlation as a stand-in for causation is stupid.


Why should an insurance company care about causation?


I never said insurance companies should care about it. Obviously some folks do care about it else this article would not be on the front page of HN, since the entire premise is "correlation may not be a good way to discriminate!!"


But does that mean that causation is a good way to discriminate or that the article is claiming anything like that?


My wife runs an online consumer-facing business and her experience is that hotmail account users are significantly more of a pain than any other email account type. They seem to be less able to understand instructions - it is a bit of a running joke in our house: whenever she complains to me about some painful customer, I ask her if they are a hotmail customer.


I can second this. If I get a support email from a hotmail user the first thing I do is to refund them their upgrade payment, because most of the time it is completely hopeless. Another pita with hotmail users is when they click the "Report spam" button on email address confirmation emails ...


What would you rather have? Insurers using broad classes and heuristics about you to put you in a bucket that may not always be fair, or a company like Google who knows about your life to an extreme amount of detail calculating exactly how much risk you pose and providing that to the insurers as a service?

If you don't want to be put by insurers into buckets, well, all I can say is, be careful what you wish for.


Personally, I'd rather car insurance were tailored as individually as possible. If I could get that option without my data ever leaving Google's silo, I would do it in a heartbeat. Then again, I know I'm very low risk compared to most people.

In general, I do think it's good that people pay insurance based on the real risk of loss they pose. I'm fine with this in the case of umbrella insurance, car insurance, home insurance, life insurance, etc. The only exception I have is health insurance, because I don't think people should die of curable illnesses, just because they can't afford treatment.

But if someone poses a high risk of causing a car accident, I prefer that they not be able to drive. If someone is likely to burn their house down (maybe they're into basement welding), they probably should have to pay more to insure it. Insurance companies know that many people would prefer to subject themselves to the panopticon to save money, which is why some of them are starting to offer the option of installing a speed tracker in your car, among other things.

I recognize that this is not everyone's preference of course, but you asked, so I thought I might give you some perspective from someone who thinks differently from you.


> Personally, I'd rather car insurance were tailored as individually as possible.

Aren't you then saying that ultimately you would prefer if insurance didn't exist?

I mean, that would asymptotically approach a prediction of your personal future losses, and thus your premiums would asymptotically approach your future losses, and thus what's the point of giving your money to an insurance company only to get it back later?

But why wait for insurance to not exist, really? For most things, you can actually get exactly and ideally, without any approximation, what you wish for: Just don't buy insurance, and you will only pay for exactly your individual risk--there is no way to tailor your insurance more individually than that, is there?


I don’t think you can predict risk well enough for that to come true. No matter the predictive power, real life still carries a probabilistic nature. We can assign 0.0001 probability to somebody’s house burning down, but in real life it either will, or will not.

They can then choose to skip insurance, and carry a very small risk of losing a lot of money. Or buy insurance, and carry a very high probability of wasting a much smaller amount of money.
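A back-of-the-envelope version of that trade-off, reusing the 0.0001 probability from above. The 300,000 house value and the 40 premium are numbers I made up:

```python
import math

def outcome_stats(outcomes):
    """Mean and standard deviation of a list of (probability, cost) pairs."""
    mean = sum(p * c for p, c in outcomes)
    variance = sum(p * (c - mean) ** 2 for p, c in outcomes)
    return mean, math.sqrt(variance)

p_fire = 0.0001   # probability the house burns down in a year
loss = 300_000    # hypothetical cost of losing the house
premium = 40      # hypothetical yearly premium, a bit above the expected loss

uninsured = [(p_fire, loss), (1 - p_fire, 0)]
insured = [(1.0, premium)]

mean_u, sd_u = outcome_stats(uninsured)
mean_i, sd_i = outcome_stats(insured)
print(f"uninsured: expected cost {mean_u:.2f}, std dev {sd_u:.2f}")
print(f"insured:   expected cost {mean_i:.2f}, std dev {sd_i:.2f}")
```

Skipping insurance is cheaper on average, but the spread of outcomes is enormous compared with the fixed premium, and that spread is what the premium buys off.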


> I don’t think you can predict risk well enough for that to come true.

Which doesn't change that that is the ideal goal that GP is suggesting, right?

> No matter the predictive power, real life still carries a probabilistic nature. We can assign 0.0001 probability to somebody’s house burning down, but in real life it either will, or will not.

The question was not whether the goal is reachable. The question was whether the goal is actually a good idea. And in order to evaluate that, you have to consider the consequences of the hypothetical case that it is reachable.

> They can then choose to skip insurance, and carry a very small risk of losing a lot of money. Or buy insurance, and carry a very high probability of wasting a much smaller amount of money.

What do probabilities matter to the outcome here? If a hypothetical insurance company managed to achieve the ideal goal with regards to fire insurance, in that they made the insurance as individual as (logically) possible, that would by definition mean that they would charge a premium only from exactly those people whose houses would in fact burn down later on, and then pay them back when it does indeed burn down. Not insuring yourself gets you exactly the same end result that you would get in that hypothetical world: If your house does not burn down, you don't pay anything and you don't get anything back, and if your house does burn down, you pay/save massive insurance premiums that you later get paid back/have in your savings account. Or rather, you don't, in the latter case, because you can't afford it. So, if you think that that is a goal to strive for, why not realize it for yourself now?


> The question was not whether the goal is reachable. The question was whether the goal is actually a good idea. And in order to evaluate that, you have to consider the consequences of the hypothetical case that it is reachable.

So: yes, it would be ideal if we could predict all traffic collisions and all home fires. And indeed if that were true, insurance wouldn't exist. But that's not a very enlightening scenario.

> If a hypothetical insurance company managed to achieve the ideal goal with regards to fire insurance, in that they made the insurance as individual as (logically) possible, that would by definition mean that they would charge a premium only from exactly those people whose houses would in fact burn down later on, and then pay them back when it does indeed burn down. Not insuring yourself gets you exactly the same end result that you would get in that hypothetical world: If your house does not burn down, you don't pay anything and you don't get anything back, and if your house does burn down, you pay/save massive insurance premiums that you later get paid back/have in your savings account.

Indeed, but that's because the insurance company can accurately predict what will cause house fires. And people could use that prediction: if taking up welding means my insurance premium goes from 0 to 300k because I'm definitely going to burn my house down if I take up welding, I can use that information to make an informed decision about whether to take up welding.

> So, if you think that that is a goal to strive for, why not realize it for yourself now?

How? Where can I get a quote, now, on whether a given hobby/lifestyle change is going to lead to me burning down my house?


Only if "as individually as possible" means perfect prediction. If you assume that there's a limit to how well the future can be predicted, insurance still has value.

Also, car insurance specifically is usually legally mandated. There's a tangible benefit to having it (being allowed to drive) besides the risk mitigation.


> Only if "as individually as possible" means perfect prediction. If you assume that there's a limit to how well the future can be predicted, insurance still has value.

The question was not whether insurance has value if that ideal goal is not going to be reached. The question was whether the goal makes sense in the first place.

> Also, car insurance specifically is usually legally mandated. There's a tangible benefit to having it (being allowed to drive) besides the risk mitigation.

Which admittedly does not allow GP to realize this now with regards to car insurance, true.


> I mean, that would asymptotically approach a prediction of your personal future losses, and thus your premiums would asymptotically approach your future losses, and thus what's the point of giving your money to an insurance company only to get it back later?

If we could estimate that well, risk would no longer be a real thing, and then there would be no need for insurance at all. However, you can't just take ideas to the absolute theoretical limit and expect the results to come out making sense in the real world.


> If we could estimate that well, risk would no longer be a real thing, and then there would be no need for insurance at all.

Or maybe there would?

> However, you can't just take ideas to the absolute theoretical limit and expect the results to come out making sense in the real world.

Erm ... yes, you can, and you should? If you don't actually mean "as individually as possible", then you presumably mean "more individual than now, up to the point where I wouldn't agree anymore", which is essentially a vacuous statement.

Either you actually mean your idea, or you don't. If you don't mean it, then it's up to you to clarify, not up to me to make assumptions about what you might have meant.


Take "as individually as possible," to mean, "as individually as practically possible," rather than, "as individually as theoretically possible," and you'll have a better understanding of what I was trying to communicate.


Well, where I live that is not allowed by law. You must have insurance, and if you don't, you will automatically be given an expensive policy to protect others. The second problem is that if somebody causes an accident involving me, that person needs to be able to pay. Without insurance, what assures me that there's money on the other end? The only viable alternative to monthly payments is to have a big sum up front, or the security of being able to take out a loan if something should happen.


I don't necessarily disagree with you. I posed that question because if Google actually did provide that kind of service, people here would be up in arms against it.

And god forbid if Facebook or Uber did [mind you that Uber is in a reasonably good position to], they would be sharpening the guillotines and calling for Zuck's and Travis' heads.


> Personally, I'd rather car insurance were tailored as individually as possible.

It exists. It is called self-insurance where the number of insured parties is 1.


No, I still value risk smoothing. I know a lot about my driving skill, but I don't have the ability to make perfect predictions about all possible future losses.


I would favour individualised pricing and I don't think it would be too hard to achieve without getting too invasive. Basically, your driving record should speak for itself over time.

The problem with that approach is that insurance companies want to be able to keep an element of fuzziness in their pricing. Predictable pricing is the last thing they want. Why do they only care about beating your current quote, but have no interest in what you were paying for insurance in years gone by? If you have a 10+ year history of perfect driving in dull boring cars, how can insurance companies significantly increase your premium without being able to justify why they suddenly think you are now riskier?

By putting people in buckets they can adjust prices without singling anyone out, and as a by-product keep the consumer a bit confused about what is going on.

I see it happening all the time. Drivers with no claims, no points, no convictions and who drive safe, dull dull cars suddenly having their premiums increased. It's not because of them or what they did, it's because the insurance company decided to put them in a bucket that justified a price hike.


I would like car insurers to be only allowed to ask a limited white-listed list of options:

* Address

* Where you keep your car

* Typical mileage

* Car model & age

That's probably it. In addition they shouldn't be allowed to use age as a discriminator (we don't do that for National Insurance in the UK and it works well).

It's ridiculous that they ask for things like your job and your marital status.


You haven't said why it is ridiculous. You have merely just asserted it.

Why is it okay to ask for the car model? Yeah you can say that very old cars or specifically unsafe cars might have worse risk profiles but most modern cars (made in the last 10 years, for example) are more or less all road safe yet their risk buckets are different. That's not because of the car itself, it's because of the kinds of drivers who end up buying them. So we are back to the 'hotmail' argument.


It's ok for comprehensive cover because the cost to repair or replace different cars varies.

For third party actually I think you are right - it shouldn't depend on the specific model you drive, but it could depend on the power, e.g. you're more likely to crash in a fast car.

The point is insurance premiums shouldn't depend on things that you can't realistically change like your age, medical history, salary, marital status etc.


Considering drivers are legally required to have insurance, the insurers should at minimum be legally required to provide a readable itemized explanation for their cost decision. I.e. this accident was factored in this much, your email domain was factored in this much, your race was factored in this much, etc. In other words, they should be required to explain the full details of whatever model they are using.

Also potentially the actual variables allowed in this computation should be white-listed by regulators. I.e. you can't use email domains because they were never allowed in the first place.

All of this could be considered "anti-innovation" or whatever, but I see it as the minimal consumer protection that should be provided for a service that drivers may not skip.
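
A minimal sketch of what such an itemized, white-listed quote could look like, assuming a purely additive rating model. The factor names, the white-list, and every amount below are hypothetical and not taken from any real insurer or regulator.

```python
# Hypothetical white-list of rating factors a regulator might allow.
ALLOWED_FACTORS = {"base", "at_fault_accidents", "annual_mileage", "vehicle_repair_cost"}


def itemized_quote(contributions):
    """Return (total, breakdown lines) for an additive premium model.

    `contributions` maps a factor name to the amount it adds to the premium.
    Raises ValueError if any factor is not on the white-list.
    """
    disallowed = sorted(set(contributions) - ALLOWED_FACTORS)
    if disallowed:
        raise ValueError(f"factors not on the white-list: {disallowed}")
    # One human-readable line per factor, in the order supplied.
    lines = [f"{name}: {amount:+.2f}" for name, amount in contributions.items()]
    total = sum(contributions.values())
    return total, lines


# Made-up example quote with three allowed factors.
total, lines = itemized_quote(
    {"base": 400.0, "at_fault_accidents": 150.0, "annual_mileage": 35.5}
)
```

Passing something like `{"base": 400.0, "email_domain": 30.0}` would raise `ValueError` here, which is the "never allowed in the first place" behaviour. A real rating model is unlikely to be purely additive, so an honest breakdown would be harder to produce than this suggests.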


I have no clue about what it takes to run a profitable insurance business, so no opinion on the soundness of that practice.

But I certainly do use metadata about people's communication in my general assessment of their relevance, trustfulness, clue level, etc. Do you use a Hotmail address? Well then, you really are out of it. Gmail? A lot sleeker, yes, but our conversation will be under third party scrutiny, and chances are that you haven't thought that part of it through. Your work mail for a private correspondence? I wouldn't do that in a million years, so yeah, your total score just went down a notch. Is that an Outlook client I see you're using? Fine, but sort of humdrum, and you are probably the kind of person who will send me .docx documents to read, and make a fuss when you get them back, formatting screwed up. And so on and so forth, most of it subconscious, but the evaluations stick, and mostly turn out to be accurate.

And yes, I have made first sortings of job applicants on the same kind of criteria.


Hotmail addresses still get modern email software - it is essentially the same online client that office 365 uses, and it works well.

I dare say it is more powerful than gmail and just as modern. In fact, much more modern; gmail hasn’t changed much in the last 5 years (or longer).

I have an outlook.com address primarily, but it’s tied to the same hotmail address that I used back in 2006. Yes, Hotmail gives you multiple email addresses for free, and gives you all sorts of options for managing them (and yes, they work for sign in).


Yes, I know all that. But I also know the kind of impression a Hotmail-address is likely to give. So will conclude that a hotmailer either doesn't know or doesn't care about the impression aspect. All part of the picture.


Wait, you're saying that you intentionally favor job applicants based on your admittedly subconscious biases?


Read again. I said most of it is subconscious. Whenever I have evaluated applications, I certainly have made the effort to be explicit about every criterion I've used.

It's not a farfetched idea that the person sending a job application from the web client of a Gmail account is probably a different type than the one with a personal domain and running Thunderbird/Linux.


That's not really fair. There are a lot of folks who are bound to the MS Office suite not by choice, but because that's what the corporate system of their employer uses.


In which case we are talking work mail. My evaluation won't really concern someone personally, then, but the organisation they work for.


Someone with a HotMail account has managed to keep a consistent address for about twenty years. That is a good thing. The idea that they should ditch HotMail for the functionally equivalent GMail just because GMail is popular is, of course, idiotic.

But this is an insurance company, so it is plausible that they crunched the numbers and found a real effect.


I still log into my old Hotmail account every few months just to keep it from getting deleted. It’s a complete piece of shit, it’s not even usable let alone comparable in any way to gmail.


If it's a complete piece of shit, why do you want to prevent it from being deleted?


In my case, I've maintained old email addresses when:

1) A website ties accounts to email addresses and doesn't allow changing them, and

2) the email provider recycles expired addresses, i.e. a new user could sign up for my old address.


For historical reasons, in case there were ever some account on some rarely used service where I forgot to change the email, and in case I ever need to build something on the MS Graph API.


maybe he has a forward to a usable email account.


Are you using outlook.com? How is it a complete piece of shit? Microsoft improved it quite a bit. It’s different than Gmail, but far from awful.


I find outlook.com aka hotmail vastly superior to cluttered gmail, are you sure you even tried it?


>The idea that they should ditch HotMail for the functionally equivalent GMail just because GMail is popular is, of course, idiotic.

I think the "functionally equivalent" part is quite arguable, at least it was in the past when many techies made the switch from hotmail to gmail.

People switched because the features and interface were far superior.


>People switched because the features and interface were far superior.

No one should use the web interface, let alone "techies." Use a real email client and everything behaves identically.


Not really, portability was not easy to accomplish back then when you were constantly shifting between work stations and different setups.

Even to this day, I don't know of an easy way to switch between fresh devices all the time and use a "real" email client.

I don't see the problem with someone preferring gmail - personally the +suffix tagging was what I still use to this day.


Well, you could also see it like, somebody not interested in switching to better alternatives, even after 20 years.

Say, driving a 20 year old car even though there are safer alternatives available.


Or...too lacking in ambition to find a proper service and forward the previous address?


Lacking in ambition sounds good from an actuarial point of view.


So what email address should I get that will most reduce my risk of crashing my car?


Probably Hotmail, since without insurance you won't be allowed to buy a car at a dealership, thereby reducing your overall likelihood of acquiring a vehicle, which in turn reduces the likelihood you drive, which in turn reduces the likelihood you crash your car.


How about @nasa.gov? If you can help build a rocket you can probably drive.


Not sure if trolling... Correlation does not mean causation


Here's a bonus messed up way of discriminating that auto insurance companies do.

The minimum liability coverage in many states is $10K. However, every major insurance company insists they always recommend $100K. The average liability claim in an auto accident is around $15K.

However, if you look at that and think, hey, I have liquid funds, $10K - $100K isn't that big a gap for me, and if $15K is the average then $10K seems like a pretty good economic level.

Guess what? The insurance companies have decided that having the minimum $10K worth of coverage means that you are exhibiting high risk behavior and you get to pay more if you ever want to increase that compared to those who already have it, say to $25K when you get married and your wife is more nervous about it.

Oh, you want to shop around do you? Well they also target other insurance companies who primarily market themselves as selling those $10K policies that the larger companies refuse to write. Your high-risk tag will hit if you are coming from one of those (like the General) regardless of your coverage.

I'll repeat that in case it's unclear. I have to pay more for my legally required insurance because I didn't want to buy more than what I am legally required to buy. I am being punished because I didn't buy what they preferred to sell me from the companies they preferred me to buy it from.

But once I behave "correctly", by paying more for a policy from one of the correct insurance companies with the arbitrary minimum the insurance companies prefer, they will bless me by removing my high-risk tag after 6 months. Thanks.


rant: another weird symptom of the world rapidly turning into shit.

It seems like more knowledge only makes the world worse. For a consumer there is very very very little benefit of corporations knowing anything about you.

The entire concept of insurance is to lessen the financial impact of an unfortunate event by everyone contributing a smaller amount. Fighting against this principle are insurance companies that want to keep accepting money but really don't want to hand it out again. To do this they will disadvantage anyone who is more likely to need their money. Some things make sense: maybe smokers should pay more for health insurance. But with more and more data mining they will be able to find trends where people that do X might be 0.001% more likely to claim, and will therefore charge them more. If you happen to also do X, you will pay more even if you yourself aren't more likely to claim. And when they chuck ML/AI into the mix to look for trends, it will just get worse.


So you think it would be better if insurance companies knew exactly all the details of your life so they could custom design your premium for you? Because that is the real outcome of making premiums specifically catered to your life experiences as opposed to general buckets.

If everyone were charged the same premium, your premiums would actually be much more, because now you would be deemed as posing the same amount of risk on the road as a person who has caused 5 accidents over 10 years.

You can't have your cake and eat it too.


no, definitely not! god no!

that does away with the idea of insurance. what if your genes said that you were 15% more likely to get cancer and all the insurance companies won't accept you?

what if to keep your premiums you had to not eat bacon?

hellish!


More worryingly, they also appear to be using first name to influence insurance prices. Which is problematic, since first name correlates with race. See: https://news.ycombinator.com/item?id=16212991


Frankly maybe they should just be allowed to charge differential prices based on race. If race correlates with different insurance outcomes for some reason then covering up the symptoms doesn't change anything, the same link might just end up being discovered by an ML system via other signals. I've seen ML repeatedly discover inconvenient truths in other (non individual related) contexts and suppressing the feature it was using to make the connection just caused it to find worse equivalents based on combinations of other features.

My sympathy is somewhat limited because in recent times it feels like there's plenty of explicit racism in the world against whites, like the BBC constantly posting internships and job ads that specifically forbid white people from applying. If western societies have somehow concluded that it's OK to engage in that sort of thing, then why shouldn't race be a factor in insurance contributions? At least that has some sort of coherent argument for why it's a good thing (more accurate premiums for everyone).


Firstly, it's correlation not causation though, at the moment certain names are associated with lower economic status and poor education, purely because immigrants take time to accrue assets and move up the ladder, and certain races have been historically heavily discriminated against and so were forced into lower economic statuses and worse schools.

Secondly, you seem to be against positive discrimination (which I was for a time).

All you have to do to change your mind is look at the actual statistics. We've been trying to not discriminate for over a generation, for over 30/40 years.

For example, how many female executives are there? How many female politicians are there? In the UK, for example, both those numbers were terrible. Now, for politicians, we've got all-women short lists, and it's working, the numbers are starting to approach an even balance. We've even just got our second female Prime Minister, before the US even managed one.

It's basically a tiny percentage. People like employing people in their "club". People just like them. For well paid people, that's the white, male club. Maybe it's subconscious racism, maybe it's conscious racism, maybe it's simply network effects, it doesn't matter.

What matters is that we've tried the "we don't discriminate" path for over a generation, it didn't work and therefore positive discrimination is necessary to correct imbalances in many fields. At some point we'll be able to stop, but simply saying that as a society we wouldn't discriminate any more didn't work.


I'm not sure your British example shows what you think.

The Labour party in the UK has all women shortlists, as befitting its left wing policies. It has never had a female leader, let alone a female prime minister.

The Conservative party refuses to use gender quotas, and has given the UK both its female prime ministers, who won fair elections, in which neither campaigned on a Clintonesque "vote for me because I'm a woman and that'd be neat" line.

To me, it doesn't seem like "positive discrimination" even works there, and by the way, let's call it what it is, sexism and racism against majorities. Nothing positive about it. The British Empire oppressed all sorts of ethnic majorities (in the countries that were conquered), because they believed the conquered people were savages/lesser people/etc. We don't hesitate to call that racism do we?


I think those claims are fundamentally based on the belief that in a truly fair society, we'd see proportionate representations of all genders, ethnicities, etc. in all places. That's just a belief, though. There's no evidence or science supporting that. It is at least as possible (if not more so) that cultural factors, individuals' own decisions, and other factors are contributing more.

The reality is that you can draw lines anywhere between large groups of people and see different outcomes. You will see different outcomes between people born in New York or LA. People with brown eyes and people with hazel eyes. People in Kansas and people in Arkansas. We can openly acknowledge differences in outcomes between these groups will happen despite a lack of systemic oppression keeping one or the other down, yet as soon as race or gender come up, people insist we have to put blinders on and pretend we're all identical robots all the way up into the board room / Congress.

When you outright assume that differences in outcomes are caused by oppression, you're making an unfalsifiable claim. Nothing has been proven, and you could just be wrong. And if you're wrong, then you're just rationalizing racism against people you perceive are unfairly oppressive. But since it's unfalsifiable, you can never be called out on it.


We have clear evidence it's not working. That was my point.

Your entire point is predicated on the assumption we don't, and so your whole argument falls.

For example, what's unfalsifiable about "evidence shows racism and sexism is still rampant in executive appointments". It's verifiable and falsifiable. You can, in less than a minute with a search engine, find the percentages of women on executive boards.


> It's verifiable and falsifiable.

You can verify the representations are not proportionate. There is no evidence that, if no oppression existed, the representations would be proportionate. This is a very basic and clear difference.

The NBA has a very high representation of black men, and a very low representation of Jewish or Hispanic men. This in no way proves racism is rampant in the NBA. The cause is likely other factors, and if you decided to "positively discriminate" against black men to "fix" this problem, you wouldn't actually be "fixing" anything. You'd just be rationalizing discriminating on a racial basis.


Or maybe - and call me crazy here - men and women are different, and want different things, and therefore, as we reduce discrimination, nearly every job and hobby will end up with a huge gender imbalance as individuals sort themselves into groups doing what they truly, freely want to do.

Then your conviction that every job must be exactly 50% of each gender no matter how much power must be wielded to make it happen would be a massively misguided recipe for tyranny, harming both men and women.


Well, I pay significantly more on my insurance purely because I'm a man.


I would not be surprised if there is a correlation between maintaining a hotmail or yahoo account and poor decision-making impacting one's driving record.


I received an email from Admiral yesterday, which seems related but not completely:

"""

You may have seen a story in the news which claims we use customers' names to price our insurance based on race. This is 100% not the case and we do not, and have never, used this information to provide a price to our customers. I'm sorry if this story has caused you any concerns.

To offer our prices we use a complex rating structure and rate on many different variables and data sources. The journalists have misunderstood our pricing structure and the insurance quotes in the story are not like for like.

This email is to offer you an explanation of the press story and to offer my apologies for any concern caused. There is no need for you to take any action.

"""

Not sure which story that's referring to; seems somewhat different to (and much worse than) this one.

On a slightly different note: that old "trick" of adding an experienced driver to one's insurance has always seemed a bit odd to me. And as a 36-year-old, I have to say I'd assumed I was already past the point of its applicability. But I had cause to add my mother to an Admiral policy this week, and was very surprised to discover that adding her would give me a refund.


There's loads of crazy stuff like that.

The quote system asks if your car has anything added to it, like roof racks, spoilers, tow bar, body kit. You go "Oh yeah, a tow bar" thinking crap this will cost extra and their backend correlation engine figures "tow bar, probably caravan trailer, old boring safe driver, low risk" and you get a discount.

The UK government gave insurers direct electronic access to driving license records. If you tell them your details they can instantly check if you've got points, or have been disqualified from driving. Insurers ask for your license details. To use the government database right? Nope, that's a bunch of expensive R&D, if you make it optional, people with bad records just say they don't know their license number, charge them more. Simple, no extra R&D needed.


There was a story going around where apparently the premium was significantly higher if you put your name as "Mohammed", rather than "John".


So, I wonder what fees they charge for @aol email accounts?


Please correct me if I am wrong, but as far as I can tell there is no advantage to the insurers, with respect only to the returns on policies, in producing accurate assessments of individual risk levels, as compared with only accurately assessing the aggregate risk level - the expected return is the same either way (and I would imagine that the individualised assessment is much more challenging).

Assuming I am correct about that, then the advantage to them of providing individualised premiums is to appear competitive in the market place, by being able to confidently match their premiums to individual risk levels, and thus being able to advertise (the possibility of) lower premiums.

Again assuming all of that is true, then there really isn't any net benefit - in fact presumably we have to pay for this whole individualised risk assessment exercise, which will actually raise the total costs.

More speculatively, I imagine that their job is made somewhat easier by cognitive biases that cause people to think, on average, that they are better than average at (for example) driving.
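
The aggregate-versus-individual point above can be sketched with some arithmetic. The four expected claim costs are made up, and the sketch assumes the same people buy a policy under either scheme and ignores what the risk assessment itself costs.

```python
# Hypothetical expected annual claim cost per policyholder (made-up numbers).
expected_cost = [200.0, 300.0, 400.0, 1100.0]

# Pooled pricing: everyone pays the average expected cost.
pooled_premium = sum(expected_cost) / len(expected_cost)
pooled_revenue = pooled_premium * len(expected_cost)

# Individualised pricing: everyone pays exactly their own expected cost.
individual_revenue = sum(expected_cost)

# How much each person over-pays (+) or under-pays (-) under pooling.
transfer = [pooled_premium - cost for cost in expected_cost]
```

Both schemes collect the same total (2000.0 here), so under these assumptions the insurer's expected return is unchanged. What differs is who pays: under pooling the three lower-risk people subsidise the fourth, which is why a competitor offering them individualised quotes can advertise lower premiums.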


Related: Capital One was giving different loans based on browser used[0].

[0] https://consumerist.com/2010/11/01/capital-one-made-me-diffe...


Kind of wondering what would happen if you applied for a loan using w3m or lynx.


Just invent a time machine and go back to 2010 to check with Capital One. (or ... just wait, I'm sure this practice is still around and we'll hear about other cases)


I've actually modeled email domain as a feature in a regression model of customer spending habits. We were all surprised to find it is a very predictive signal!

Of course insurance companies want to use every signal available to them to price discriminate on rates. If you are a good driver, the insurance company wants you because you are unlikely to file claims. Likewise, you are happy to receive a discount for being a good driver. A simple signal of being a good driver is your driving record, but we all know this isn't very reliable by itself. So you end up with other signals being used. Literally any signal that is not protected (like race/gender) will get used in any insurance model. None of this is new. The color of your car is used surely, so why is it so shocking that your email domain is used as well?
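
For anyone wondering how an email domain even becomes a model input, here is a minimal sketch using one-hot encoding. This is not the model described above; the function names, the list of known domains, and the sample addresses are all hypothetical.

```python
def email_domain(address):
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def one_hot_domains(addresses, known_domains):
    """Encode each address as a 0/1 row: one column per known domain, plus 'other'."""
    columns = list(known_domains) + ["other"]
    rows = []
    for address in addresses:
        domain = email_domain(address)
        # Anything outside the known list falls into the catch-all column.
        key = domain if domain in known_domains else "other"
        rows.append([1 if col == key else 0 for col in columns])
    return columns, rows


# Made-up addresses; example.org is not in the known list.
columns, rows = one_hot_domains(
    ["a@Hotmail.com", "b@gmail.com", "c@example.org"],
    ["hotmail.com", "gmail.com"],
)
```

In a regression with an intercept, one of these columns would normally be dropped so the remaining ones are measured against a baseline domain.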


And now we have to know, what did the regression show for popular email services?


I didn't get paid to say.


Rightfully so, I use my hotmail for junk too. It's a crummy service compared to, like, gmail. Hotmail pretty much lets most spam through to the inbox.


If Hotmail uses the same backend as Outlook, I cannot agree; I've never felt like I've gotten spam of any kind.

Today, e-mail spam seems to be a solved problem from my own user perspective. The next big thing to tackle is telemarketer phone spam.


> The next big thing to tackle is telemarketer phone spam.

Even tried to use a new number, worked for a year or two and it got leaked to some telemarketer.

Given the effort involved in changing my number again, I decided to try a new tactic: waste their time.

They call me, I identify it as spam (e.g. first name instead of real name), ask them to hold, and then go silent/ghost until they hang up.

I feel guilty to hear "sir, sir, voice of despair sir... beep beep beep" because I worked in a call center and understand these guys just want to make a living. I tried to be nice and asked my number to be removed, no luck.

I engage and then waste their time and now suddenly I stop getting telemarketer calls?! On a side-note I feel like I started to receive more SMS spam. So I'm now at the point of wanting to opt-in so that they can call me to waste their time. _sigh_


> Today, e-mail spam seems to be a solved problem from my own user perspective.

Honest question.. how long have you had the email addresses you're using for comparison?

I've had the same email for the past 12 years, corporate, and now moved into O365. It's _average_ for spam filtering as far as my subjective experience goes.


Same email for past 8 years, all (100%) of the spam goes directly into the spam folder where it belongs.

Marketing emails go into "promotions" - but I could probably unsubscribe from those if I wanted.

The only messages in my inbox are either from real people or are otherwise notifications I want to get.


> hotmail pretty much lets most spam through to the inbox

At the same time they are one of the most difficult email providers to deliver transactional emails (eg. email address confirmation) to.


Why am I not surprised that it's Admiral, one of the worst car insurers I've ever had the misfortune to deal with.


The same company that wanted to force people to give them access to their social media to help determine their premiums.

https://www.theguardian.com/technology/2016/nov/02/admiral-t...

Thankfully, Facebook put a stop to it. But if they hadn't I'm certain it would have become a common practice.


Compared to some of the antics we had with Direct Line, Admiral have been really good - not had any serious problems and when my wife crashed my car last year they looked after everything pretty well.


If you're an awful company, you surely need to make sure that your data is more accurate than the competition. It's like running an awful store, it's doable with prices low enough.


I do use Gmail. But I give my hotmail address to companies (like insurers), which ultimately forwards to my gmail.


I wonder if a computer noticed that a hotmail account was correlated with more payouts.

https://jacobitemag.com/2017/08/29/a-i-bias-doesnt-mean-what...


I don't know about Mahapatra, but Stuccio is a denizen of this very site: https://news.ycombinator.com/user?id=yummyfajitas


He was either banned or left after being threatened with a ban. Check his last post time.


Thanks for the link. I found some fairly interesting articles there.


I think the real correlation here is with age. People who still use a Hotmail address are likely carryovers from the MSN era, i.e. millennials.


Millennials?!

My experience has always been that customers with email addresses at older services (Hotmail, Yahoo, Earthlink, AOL etc) are usually boomers that are using an account set up for them by their millennial aged grandchildren back in the 90's or early 2000's...


Exactly the demographic that will be more likely to exhibit impaired driving.

This makes perfect sense.


Except that age is almost certainly already a variable in the rating system, and its effect should be taken into account already.


My comment was a little tongue-in-cheek, but you make a good point. As other posts have mentioned, the email would be a proxy for other factors associated with age that age by itself can't capture.

If it's age at all. I'd bet it might be more closely associated with something harder to articulate or measure, along the lines of "a proclivity for clinging to subpar solutions." Someone who uses a subpar email service might also have ingrained poor driving habits.


Aaah, but it's not just age. It's also technical incompetence. People with Hotmail accounts are probably less likely to do their own maintenance, after all they haven't maintained their e-mail by moving to a provider with functional spam filtering. Their cars are more likely to break down. Not all insurance claims are due to accidents.


Millennial male here (car insurers love taking my money), still use the same hotmail account that I made when I wanted to chat with my elementary school friends. I have two different gmail accounts, but I can't ever seem to make the switch over (I've tried multiple times). I think I'm the only person who uses gmail for spam accounts and hotmail as my real account.


<snort>

I can't think of a millennial that would be caught dead with a Hotmail account.

However, I do agree it's age. Probably advanced age rather than young age.

The real problem is that we are allowing "AI analysis" to allow people to discriminate upon something (Hotmail ownership) which we would NOT allow people to discriminate on (age, in this instance) outright.


Car insurers are allowed to discriminate by age...? As I’ve aged out of college, my rates have gone down quite a bit.

I guess one worry is that if you have multicollinearity, you might get a “double counting” effect on the same signal. But you should be able to see that in a proper model.
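
A toy sketch of what "seeing it in a proper model" could mean. It uses stratification by age band rather than a regression, and the data is synthetic: by construction, claim cost depends only on age band and the email flag is nothing more than a proxy for it. The bands, counts, and costs are all made up.

```python
from statistics import mean

# Synthetic policies: (age_band, has_legacy_email, annual_claim_cost).
# By construction, cost depends only on age band, never on email.
policies = (
    [("A", False, 400.0)] * 8 + [("A", True, 400.0)] * 2 +
    [("B", False, 800.0)] * 2 + [("B", True, 800.0)] * 8
)


def email_gap(rows):
    """Mean cost of legacy-email rows minus mean cost of the rest."""
    legacy = [cost for _, has_legacy, cost in rows if has_legacy]
    other = [cost for _, has_legacy, cost in rows if not has_legacy]
    return mean(legacy) - mean(other)


# Gap when age is ignored: the email flag looks predictive.
raw_gap = email_gap(policies)

# Gap inside each age band: the email flag adds nothing once age is known.
gap_within_band = {
    band: email_gap([row for row in policies if row[0] == band])
    for band in ("A", "B")
}
```

Ignoring age, legacy-email holders look 240.0 more expensive; inside each band the gap is 0.0. In this toy setup a model that already rates on age has no reason to charge again for the email domain. If a gap did survive within bands, the domain would be carrying information beyond age.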


Car insurers have to place everybody in the same age group into the same bracket by default. They can't penalize for something unrelated to actual historical performance of the individual situation (like type of car, zipcode, etc.).

The danger with allowing these kinds of "AI" models is that they will become instances of parallel construction: "We don't want to insure people who are <X>, so construct a model until it coughs up a surrogate to <X> at a 95% probability."



