
Let's say a true predictor is causal. Because each example you give is causal: age causes defects, smoking causes sickness, alcohol impairs, religion causes behaviour. Using Hotmail versus Gmail does not cause anything -- private domains are outliers. If you make the predictor transparent, the effect should disappear, because no one will use a Hotmail address unless they are technically inept.

I know of some non-causal predictors, yet they're all controversial: various data used as predictors for race, for example using an address to determine which part of the city someone lives in.

This is my argument: it could be a `good` predictor but it's a proxy and it only works because there is no transparency. As such I don't think it's ethical for two different reasons.

I'd also like to add to your anecdata: I still use my first Hotmail address as my primary email. I'm young, technical, and a safe driver (I hope).




> Let's say a true predictor is causal.

That's your definition. "Causal" is well defined, "true predictor" isn't. Let's adopt your definition for the sake of argument ...

> age causes defects, smoking causes sickness, alcohol impairs

Age is correlated with defects but does not directly cause them. Some people at 40 have fewer "age related defects" than other people at 30.

Smoking causes sickness, but is also correlated with less obesity, and -- in the minutes/hours after smoking, is correlated with being more alert (possibly caused by a nicotine response, I don't remember the biology).

Religion _sometimes_ causes behavior; the only thing you can say is that they are correlated. The vast majority of Jewish people I know are NOT observant, and yet, over the general population, it does correlate with less time on the roads.

Yes, it's a proxy, but so is age, or "time since you got your license". Some new drivers are way better than other drivers who have been driving for 20 years.

Whether it works only because of the lack of transparency is up for debate. My great uncle could not switch from AOL to Gmail without my help, and does not want to. He might want to if he knew it saved him $100/year in insurance costs, true, but it would cost me time in support (Gmail is easier, but he's been using AOL for over 20 years and it's hard for him to change his ways).

> As such I don't think it's ethical for two different reasons.

I don't think it's ethical either. And I would thus like it to be made illegal - because, statistically speaking, right now it is likely[0] a good proxy and predictor for a lot of things, and therefore as long as it's legal it will be used.

We have "blacklisted" other things, like race -- e.g., in the US, unfortunately "skin darkness" is a good predictor for the question "having spent time in jail" (about 10% of black people vs. 1% of white people). It is illegal to use as a predictor -- but it turns out that other "legal" predictors such as "lives in a high crime neighborhood" correlate strongly with "skin darkness" and ARE used to determine sentences.

The right thing, IMO, is to whitelist acceptable predictors rather than blacklist unacceptable ones, because you can always find proxies for the unacceptable ones, and while it is easy to game one of those proxies, it is hard to game 20 of them at the same time. But that's an ethical, legal and philosophical debate. Statistically, proxies are sound.
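To illustrate the "20 proxies" point, here is a toy simulation. Everything in it is an assumption made up for the sake of argument (the 70% agreement rate, the population size, the function name `simulate`); it uses no real data. Each person has a hidden binary attribute that is "blacklisted", plus 20 independent noisy proxies that each agree with it 70% of the time. A simple majority vote over the proxies recovers the hidden attribute much better than any one proxy does.

```python
import random


def simulate(n_people=5000, n_proxies=20, agreement=0.7, seed=0):
    """Return (accuracy of one proxy, accuracy of a majority vote over all proxies).

    All parameters are hypothetical; this is a sketch, not a model of real data.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    single_hits = 0
    combined_hits = 0
    for _ in range(n_people):
        hidden = rng.random() < 0.5  # the blacklisted attribute
        # Each proxy independently matches the hidden attribute with
        # probability `agreement`, otherwise it reports the opposite.
        proxies = [
            hidden if rng.random() < agreement else (not hidden)
            for _ in range(n_proxies)
        ]
        # Guess from a single proxy.
        single_hits += proxies[0] == hidden
        # Guess from a strict majority of proxies (a tie counts as False).
        majority = sum(proxies) * 2 > n_proxies
        combined_hits += majority == hidden
    return single_hits / n_people, combined_hits / n_people


single_acc, combined_acc = simulate()
print(f"one proxy:  {single_acc:.3f}")
print(f"20 proxies: {combined_acc:.3f}")
```

Under these assumptions, removing the attribute itself from the model changes little as long as the proxies remain available, which is the argument for a whitelist.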

[0] I have no access to real data, I'm just assuming for the sake of argument


You seem to keep arguing that your examples are not causal but only correlated. Just because something does not happen 100% of the time does not mean it's not causal. Smoking causes cancer, yet not everyone who smokes actually gets cancer.

I'm actually not satisfied with that definition, but for now it works well enough for the sake of argument. I would like to define it as a predictor whose strength does not change when it's used transparently. But that definition is too vague for now.

And you're right that when used transparently, the predictor will probably become more accurate.

> Smoking causes sickness, but is also correlated with less obesity, and -- in the minutes/hours after smoking, is correlated with being more alert (possibly caused by a nicotine response, I don't remember the biology).

From [0]: "[Since] nicotine is a metabolic stimulant and appetite suppressant, quitting or reducing smoking could lead to weight gain."

I didn't know that smoking caused less obesity. I would have thought it was because of monetary reasons.

[0] http://www.nber.org/papers/w21937


I am using this definition of causality[0], which is precise, whereas you use the word in a much more general sense.

> I didn't know that smoking caused less obesity. I would have thought it was because of monetary reasons.

It does not cause less obesity. It is correlated with less obesity, possibly through suppressed appetite -- but it's also possible that it's actually a reflection of economic background which is also correlated with obesity.
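A minimal sketch of how a background variable alone can produce that correlation. All numbers and names here are invented for illustration, not taken from any study. There are two hypothetical groups, A and B, and within each group obesity is independent of smoking, so smoking has no effect at all by construction. The overall rates still differ because the groups differ in both smoking and obesity.

```python
from fractions import Fraction

# Hypothetical population: two equally sized background groups.
p_group = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
# Made-up rates. Within a group, obesity does not depend on smoking.
p_smoke = {"A": Fraction(3, 5), "B": Fraction(1, 5)}
p_obese = {"A": Fraction(1, 5), "B": Fraction(2, 5)}


def p_obese_given(smoker):
    """P(obese | smoker status), marginalised over the background group."""
    joint = Fraction(0)   # P(obese and smoker status)
    status = Fraction(0)  # P(smoker status)
    for g in p_group:
        p_status = p_smoke[g] if smoker else 1 - p_smoke[g]
        status += p_group[g] * p_status
        joint += p_group[g] * p_status * p_obese[g]
    return joint / status


obese_if_smoker = p_obese_given(True)
obese_if_nonsmoker = p_obese_given(False)
print(obese_if_smoker)     # 1/4
print(obese_if_nonsmoker)  # 1/3
```

So smokers look less obese overall (1/4 vs. 1/3) even though, inside either group, smoking makes no difference in this toy setup. It says nothing about which explanation holds in reality, only that the correlation by itself cannot tell them apart.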

It apparently does cause more alertness (in the same way that coffee does -- enough of it will make you awake and alert, too much will kill you).

[0] https://en.wikipedia.org/wiki/Causality#Probabilistic_causat...



