Basically, one instance of bias is that many crime-prediction models are trained on police data, which means they will predict crime in the places the police already target most. The model's predictions then amplify that effect, since even more training data gets generated from the places that are now policed more heavily, and so on.
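To make that concrete, here's a minimal toy simulation of that feedback loop (purely illustrative, with made-up numbers): two neighborhoods with identical true crime rates, but crimes only get recorded where patrols are sent, and patrols get reallocated based on recorded crime.

```python
import random

# Toy simulation of the feedback loop described above (illustrative only).
# Two neighborhoods with the SAME true crime rate; A starts with more patrols.
true_rate = {"A": 0.3, "B": 0.3}   # chance a crime is observed per patrol visit
patrols   = {"A": 70, "B": 30}     # initial allocation of 100 patrol units
recorded  = {"A": 0, "B": 0}       # cumulative crimes recorded (the "training data")

random.seed(0)
for _ in range(50):  # 50 "training rounds"
    # Crimes are only RECORDED where police are present to observe them.
    for hood in patrols:
        recorded[hood] += sum(random.random() < true_rate[hood]
                              for _ in range(patrols[hood]))
    # The "model" reallocates patrols proportionally to recorded crime so far.
    total = recorded["A"] + recorded["B"]
    patrols["A"] = round(100 * recorded["A"] / total)
    patrols["B"] = 100 - patrols["A"]

print(recorded, patrols)
# Despite identical true rates, neighborhood A keeps most of the patrols and
# most of the recorded crime: the data keeps confirming the allocation that
# produced it, so the initial skew never corrects itself.
```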
There are lots of resources on AI fairness out there these days. I think everyone who tries stuff like crime prediction should read up on that topic.
You can listen to an interview she did on EconTalk -- interesting if you want to learn more about the hidden biases.
+1000. That book should be required reading for anyone working in machine learning. Written by a former Wall Street quant who has the math down cold.
Her point about rampant bias in allegedly politically agnostic machine-learning circles is that formulating and producing answers is trivial compared to formulating and producing the questions.
Super-relevant to this thread is her work on the recidivism risk-scoring algos run on prisoners and defendants. The feedback loops these algos spur are seriously damaging the lives of huge numbers of people in the criminal justice system, far out of proportion to the offenses that brought them there.
If you train an AI on data that solely consists of trading firms & banks, you should recognize that it's an AI biased towards detecting activity at trading firms & banks, and that it might be lacking in other areas.
It becomes dangerous when such an AI is marketed as, and assumed to be, an unbiased source of truth for detecting all insider trading activity.
In other words, if done decently well, they'd obtain data from all places but focus on the problematic areas until those reach equilibrium, and then redirect to the next hotspot, no?
It's a commonly known fact that crime-ridden, mostly minority neighborhoods have a complete dearth of services such as police, firefighters, and ambulances. Even back in the 80s, Public Enemy released a song called "911 Is a Joke" because if you called 911 from a black neighborhood, they often wouldn't respond.
It's well known that cops would rather stay in rich neighborhoods and let the poorer neighborhoods fester. This happened during the LA riots, when Koreatown burned because most of the cops went to Beverly Hills and other wealthy areas to protect the rich.
I saw this firsthand in Detroit about 8 years ago. I was driving through a bad neighborhood in Detroit with my friend, and we entered Grosse Pointe, a rich area. Cops followed us until we left, which is their way of saying "you don't belong here." Meanwhile, among the burnt-down houses and broken windows of Detroit proper, you couldn't see a single cop.
The difference? The neighboring town had a decades-long reputation as a hotbed for crime.
That reputation would be cemented in an AI system trained on that police data.
See this tutorial given at this year's NIPS machine learning conference: http://mrtz.org/nips17/#/
Having worked in law enforcement at various levels (state and federal) in a prior professional life, I can attest to the differences in what gets reported and how, based upon who was working or supervising and where they were assigned. Humans are simply not reliable reporters for this kind of data. No matter how hard we try to make the reports plain and standardized, our biases, one way or another, will always seep in.
The code is open-sourced in an R Notebook: http://minimaxir.com/notebooks/predicting-arrests/
The model's performance isn't good enough to usher in precrime, even in the best case. There are likely better approaches nowadays (e.g., since the location data is spatial, a convolutional neural network might work better).
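For what it's worth, a rough sketch of what that CNN approach could look like, assuming the arrest data were rasterized onto a fixed grid of city cells (the grid size, day window, and layer sizes here are all invented for illustration; this isn't the notebook's code):

```python
import torch
import torch.nn as nn

# Hypothetical setup: arrest counts rasterized onto a 32x32 grid of city cells,
# with the last 7 days as input channels and the next day's grid as the target.
class ArrestGridCNN(nn.Module):
    def __init__(self, in_days: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_days, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),  # per-cell predicted count
        )

    def forward(self, x):          # x: (batch, in_days, 32, 32)
        return self.net(x)         # -> (batch, 1, 32, 32)

# Dummy tensors just to show the shapes; a real pipeline would rasterize the
# same Chicago arrest data used in the linked notebook.
x = torch.randn(8, 7, 32, 32)
y = torch.randn(8, 1, 32, 32)
model = ArrestGridCNN()
loss = nn.MSELoss()(model(x), y)
loss.backward()
print(loss.item())
```

Of course, a fancier model doesn't fix any of the data-bias issues discussed elsewhere in the thread; it just fits the recorded data more closely.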
Statistics have consistently shown that men commit more criminal acts than women. Self-reported delinquent acts are also higher for men than women across many different actions. Burton et al. (1998) found that low levels of self-control are associated with criminal activity. Many professionals have offered explanations for this sex difference, including men's evolutionary tendency toward risk and violent behavior, sex differences in activity, social support, and gender inequality.
OP has stated elsewhere in the comments that the reason they're interested in the tech is for testing it in some other unrelated area.
Palantir already does all this on a massive scale for the US govt. Want to affect future crime in a positive way? Solve the problems that contribute to it.
Not that you asked.
I know people have done these types of studies before, but they easily became biased, and thus there's a wariness about using them (like the judge AI that was more likely to convict black people). I'm not sure how it is in Norway, but I don't expect it to be much different from America, where some places are disproportionately convicted of crimes that in other areas are treated as mere infractions. This is really going to mess with the data and perpetuate the bad system.
If I recall correctly, it was kind of a lone-wolf effort, so I don't know the rigor of his techniques; however, you never know, he might want to share results or collaborate.
Don’t have a link handy, but that should be enough info to google if you’re interested.
He's the founder of the Murder Accountability Project:
The British series "The Code" speaks a little bit about it in ep 3:
Food for thought on how incredibly biased these efforts can be.
And in the case of crime, Chicago should be a pretty good dataset.
It's a similar problem to using ML to give people credit scores.
If the training data includes a lot of minorities and poor people breaking laws / delinquent on payments, then your ML will simply key on race/economic status as a predictor.
So you've built a system that simply targets those groups.
But you might object and say that this race/economic-status targeting gives the highest accuracy! It was learned from the training data, after all. Yet you can make a great classifier that is extremely unfair.
So you have to realize there is a conflict here between accuracy and fairness -- a conflict between the observational data you train on and using that data to produce decisions and outcomes.
If you make decisions that reinforce the training data, you don't give racial minorities or people of low economic status a chance to improve their lives.
That is extremely inhuman, predatory, and unfair.
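Here's a tiny illustration of that accuracy-vs-fairness conflict, with entirely synthetic data: a classifier that applies a harsher cutoff to one group can still look decent on overall accuracy while its error rates differ sharply between groups. The groups, rates, and thresholds below are all invented for illustration.

```python
import numpy as np

# Synthetic credit-scoring example (illustrative only).
rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)               # 0 = group A, 1 = group B (protected attribute)
y_true = rng.random(n) < 0.2                # same true default rate in both groups
score = y_true * 0.5 + rng.random(n) * 0.5  # noisy score that correlates with the truth

# A model that effectively keys on group membership: harsher cutoff for group B.
y_pred = np.where(group == 0, score > 0.5, score > 0.35)  # True = rejected

accuracy = (y_pred == y_true).mean()

def rejection_rate(g):
    return y_pred[group == g].mean()

def false_positive_rate(g):
    # non-defaulters wrongly rejected, within group g
    return y_pred[(group == g) & ~y_true].mean()

print(f"accuracy           : {accuracy:.2f}")
print(f"rejection rate gap : {rejection_rate(1) - rejection_rate(0):.2f}")
print(f"false positive gap : {false_positive_rate(1) - false_positive_rate(0):.2f}")
# Overall accuracy looks fine, but group B's good applicants are rejected far
# more often -- an "accurate" classifier that is nonetheless deeply unfair.
```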
Lady Justice doesn't wear a blindfold as a fashion accessory. Discarding information is a key factor in nearly every established system of justice / morality. Refusing to do so (i.e. "just" running a ML algorithm) places you directly at odds with society's hard-earned best practices.
I never noticed that before. Thanks for pointing this out!
Ok, and to what end?
I assume someone else will be consuming these predictions, else you wouldn't bother at all.
What are your customers/users going to do with these predictions?
Or is that simply not your responsibility; someone else's problem?