
> Having a Hotmail account is a symptom of a problem which could lead to higher risk, but it is judging that symptom as if it is a cause which leads to risk.

No, it's not judging that symptom as a cause. It doesn't even matter if it's a cause or a symptom, as long as it is statistically sound. Hotmail, like [not] smoking, [not] exercising, [not] drinking alcohol, is a choice. Having chosen one way or the other does, statistically, imply other things.

Whether we, as a society, want to allow this kind of statistical inference to meaningfully impact things like insurance costs, prison terms and other important decisions is a completely different issue.

Then it needs to be clear what the effects of the choice are, so you can choose. Then the effect will disappear because it's not a true predictor. Each of your examples has a clear explanation; Hotmail doesn't.

What's a "true predictor"? Statistics deals with useful predictors (biased/unbiased, efficient/inefficient, sufficient/insufficient etc), and in that sense, it's likely that a hotmail account is a good predictor of (not) being good or up to date with modern computer technology; that's definitely true of my sample anecdata (and AOL is a stronger predictor of the same).

Now, assume for a second that ability to use modern technology (like touch screens and digital displays) correlates with safer driving -- an assumption that could actually be tested. If that's the case, then you would probably accept that a score on some objective[0] test of tech savvy could affect car insurance rates, right? An objective, though slightly less accurate, estimator of tech savvy is the domain name - "@microsoft.com" or "@google.com" means likely savvy, "@aol.com" or "@hotmail.com" likely less so.

Age, Gender and Religion are other predictors of the same kind; e.g. observant Jews have a significantly lower (~15% in some stats) chance of being in a car accident. That's not because they are better drivers or drive better cars - it's because they refrain from driving for one day a week. Knowing someone is Jewish therefore implies a lower chance of being in a car accident - maybe only 3% lower, because 80% do not observe. So "being Jewish" is, statistically, an informative predictor.
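The 3% figure is just the 15% reduction diluted over the whole group. A minimal sketch of that arithmetic, taking the quoted numbers at face value and assuming (my assumption, not stated in the comment) that non-observant members sit exactly at the baseline risk; the function name is made up for illustration:

```python
def group_level_reduction(observant_share: float, observant_reduction: float) -> float:
    """Average risk reduction across the whole group, relative to baseline.

    Assumes non-observant members sit exactly at the baseline risk,
    so they contribute a reduction of zero.
    """
    return observant_share * observant_reduction


# Figures quoted in the comment, taken at face value:
# ~15% lower risk for observant members, and 80% of the group not observant.
reduction = group_level_reduction(observant_share=0.20, observant_reduction=0.15)
print(f"group-level reduction: {reduction:.1%}")
```

The group label still carries information even though it says nothing about most individuals in the group.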

Again, I am not saying these SHOULD be legal - but that's a philosophical/legal issue. Where I now reside, the legally approved predictors are: (a) age, (b) accidents in the last 5 years, (c) most recent validity period of driver's license, and (d) age and safety rating of car. It's not as "free" a market, but it is much saner.

[0] but objectivity is subjective... http://wiki.c2.com/?ObjectivityIsAnIllusion

Let's say a true predictor is causal. Because each example you give is causal: age causes defects, smoking causes sickness, alcohol impairs, religion causes behaviour. Using hotmail versus gmail will not cause something -- private domains are outliers. If you make the predictor transparent the effect should disappear because no one will use a hotmail address unless they are technically inept.

I know some non-causal predictors, yet they're all controversial: various data as predictors for race, for example an address to determine which part of the city one lives in.

This is my argument: it could be a `good` predictor but it's a proxy and it only works because there is no transparency. As such I don't think it's ethical for two different reasons.

I'd also like to add to your anecdata: I still use my first hotmail email as primary. I'm young, technical and a safe driver (I hope).

> Let's say a true predictor is causal.

That's your definition. "Causal" is well defined, "true predictor" isn't. Let's adopt your definition for the sake of argument ...

> age causes defects, smoking causes sickness, alcohol impairs

Age is correlated with defects but does not directly cause them. Some people at 40 have fewer "age related defects" than other people at 30.

Smoking causes sickness, but is also correlated with less obesity, and -- in the minutes/hours after smoking, is correlated with being more alert (possibly caused by a nicotine response, I don't remember the biology).

Religion _sometimes_ causes behavior; the only thing you can say is that they are correlated - the vast majority of Jewish people I know are NOT observant, and yet - over the general population - it does correlate with less time on the roads.

Yes, it's a proxy, but so is age, or "time since you got your license" - Some new drivers are way better than other drivers who have been driving for 20 years.

Whether it works only because of (no) transparency is up for debate - my great uncle could not switch from AOL to gmail on his own without my help, and does not want to. He might want to if he knew it saved him $100/year in insurance costs, true, but it would cost me time in support costs (gmail is easier; but he's been using AOL for over 20 years and it's hard for him to change his ways).

> As such I don't think it's ethical for two different reasons.

I don't think it's ethical either. And I would thus like it to be made illegal - because, statistically speaking, right now it is likely[0] a good proxy and predictor for a lot of things, and therefore as long as it's legal it will be used.

We have "blacklisted" other things, like race -- e.g., in the US, unfortunately "skin darkness" is a good predictor for the question "having spent time in jail" (about 10% of black people vs. 1% of white people). It is illegal to use as a predictor -- but it turns out that other "legal" predictors such as "lives in a high crime neighborhood" correlate strongly with "skin darkness" and ARE used to determine sentences.

The right thing, IMO, is to whitelist acceptable predictors, not blacklist unacceptable ones because you can always find proxies for the unacceptable ones, and it is easy to game one of those proxies, but hard to game 20 of them at the same time. But that's an ethical, legal and philosophical debate. Statistically, proxies are sound.

[0] I have no access to real data, I'm just assuming for the sake of argument

You seem to keep arguing that your examples are not causal but only correlated. Just because something does not happen 100% of the time does not mean it's not causal. Smoking causes cancer, yet not everyone who smokes actually gets cancer.

I'm actually not satisfied with that definition, but for now it works well enough for the sake of argument. I would like to define it as a predictor whose strength does not change when it's used transparently. But that definition is too vague for now.

And you're right that when used transparently, the predictor will probably become more accurate.

> Smoking causes sickness, but is also correlated with less obesity, and -- in the minutes/hours after smoking, is correlated with being more alert (possibly caused by a nicotine response, I don't remember the biology).

In [0]: [Since] nicotine is a metabolic stimulant and appetite suppressant, quitting or reducing smoking could lead to weight gain.

I didn't know that smoking caused less obesity. I would have thought it was because of monetary reasons.

[0] http://www.nber.org/papers/w21937

I am using this definition of causality[0], which is precise, and you use it in a much more general sense.

> I didn't know that smoking caused less obesity. I would have thought it was because of monetary reasons.

It does not cause less obesity. It is correlated with less obesity, possibly through suppressed appetite -- but it's also possible that it's actually a reflection of economic background which is also correlated with obesity.

It apparently does cause more alertness (in the same way that coffee does -- enough of it will make you awake and alert, too much will kill you).

[0] https://en.wikipedia.org/wiki/Causality#Probabilistic_causat...

That's correct on all counts except it's also irrelevant whether the variable is a choice or not.

It's a philosophical/ethical/legal question what should be allowed to go into insurance rates, so relevance is determined by the framework in which you evaluate things.

I am personally ok with higher car insurance premiums for people with a prior DUI conviction, and higher health insurance premiums for smokers; I'm not ok with higher health insurance premiums for albinos, or people with a family history of cardiac problems or family history of type I diabetes -- even though they are mathematically justified (which means that -- not being in either group -- my own premiums are higher as a result).

Yes, but that is a different issue as you said in your original comment. Statistically it is irrelevant. The "is it a choice" part also belongs in the "completely different issue" basket, so to speak.

gosh. no, having a hotmail account doesn't imply anything at all. correlation is not causation, that's worth repeating 1000 times.

> Hotmail, like [not] smoking, [not] exercising, [not] drinking alcohol, is a choice.

lol

> correlation is not causation, that's worth repeating 1000 times.

You might want to understand correlation, causation and dependence, and implication before repeating that 1000 times. I specifically used the term "statistically imply", which is correct, and means "has predictive power", which is true in this case of correlation whether or not causation is involved.

Correlation actually implies a lot of things. At the very least, it implies statistical dependence (to the confidence attainable with the data). The converse is NOT true -- lack of correlation does not imply independence. But as independence implies LACK of correlation, correlation necessarily implies dependence. This is very basic probability, and is relevant to estimation, of which insurance is an application. Causation is a completely different subject.
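For anyone who wants to check the "lack of correlation does not imply independence" half, here is the standard textbook counterexample worked exactly: X uniform on {-1, 0, 1} and Y = X^2. The variable and helper names are mine, and the distribution is a toy one, not insurance data.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X^2 is fully determined by X.
outcomes = [(x, x * x) for x in (-1, 0, 1)]
prob = Fraction(1, len(outcomes))  # each (x, y) pair has probability 1/3


def expect(f):
    """Exact expectation of f(X, Y) over the three outcomes."""
    return sum(prob * f(x, y) for x, y in outcomes)


def p(event):
    """Exact probability that event(X, Y) holds."""
    return sum(prob for x, y in outcomes if event(x, y))


# Cov(X, Y) = E[XY] - E[X]E[Y]
covariance = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P(X=a, Y=b) == P(X=a) * P(Y=b) for every a, b.
independent = all(
    p(lambda x, y: x == a and y == b) == p(lambda x, y: x == a) * p(lambda x, y: y == b)
    for a in (-1, 0, 1)
    for b in (0, 1)
)

print("covariance:", covariance)
print("independent:", independent)
```

The covariance is exactly zero, yet knowing X tells you Y with certainty, so the two are as dependent as they can be. The other direction (nonzero correlation implies dependence) is just the contrapositive of "independence implies zero correlation".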

> lol

Care to elaborate? What exactly do you disagree with? Are any of the listed things not a choice?

Yelling that correlation is not causation 1000 times is ignorant when the effectiveness of a method depends only on correlation, and does not depend on the existence of a causal link. And in the case of filtering, that is the case.

Think screening for a virus vs trying to cure the virus. If warts are a symptom of the virus, having warts absolutely matters in screening. However, cutting off the warts will not remove the virus.
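The screening half of that analogy is plain Bayes' rule. A rough sketch with entirely made-up numbers (5% prevalence, warts in 80% of infected and 10% of healthy people are hypothetical, as is the function name):

```python
from fractions import Fraction


def posterior(prior, p_sign_if_infected, p_sign_if_healthy):
    """P(infected | sign present), by Bayes' rule."""
    with_sign = prior * p_sign_if_infected + (1 - prior) * p_sign_if_healthy
    return prior * p_sign_if_infected / with_sign


# Hypothetical numbers, chosen only for illustration.
prior = Fraction(5, 100)                 # share of the population infected
p_warts_if_infected = Fraction(80, 100)  # how often the virus produces warts
p_warts_if_healthy = Fraction(10, 100)   # warts from unrelated causes

p_infected_given_warts = posterior(prior, p_warts_if_infected, p_warts_if_healthy)
print(float(prior), float(p_infected_given_warts))
```

With these numbers, seeing warts moves the estimate from 5% to roughly 30%, so the symptom is a useful screen. Cutting the warts off changes none of the inputs that describe who is infected, which is why it does nothing for the virus.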
