
Show HN: I scraped local court records to find dirty cops - kristintynski
https://lawsuit.org/keeping-cops-accountable/
======
abstractbarista
You have to be careful about jumping to conclusions based on these
statistics. I can imagine these numbers being sucked up by various ill-
informed political commentators to mislead the public on what police
departments are doing. (Regardless, I love this work and would love to see it
done nationwide!)

-It could be that some cops are assigned to densely white areas, while others must work in mostly black areas. This does not necessitate racism on the part of the officers or management. (Perhaps it does in the overall local culture, as indicated by the self-segregation? To police an entire area, you must distribute resources to various portions of that area.) Racist cops might even _prefer_ the dominant race of the area they work in, yet _still_ mostly cite that dominant race, simply because there are so few disliked-race targets to look for.

-Certain violations cited to predominantly one race may simply indicate a _reality_ in which people of that race break that law more often. This does not necessitate racist targeting on the part of the officers. At that point, maybe a discussion about how seriously this law needs to be enforced should be had. Window tint? Dime bag of weed? Is that _really_ such a big deal? (Ideally, these laws would just be removed. But politically this seems harder than using officer discretion.)

~~~
kristintynski
I agree with you, and that's why the officer names are redacted. Things we
might at first glance think are because of biased/racist/sexist cops might
have good explanations. That said, visualizing this data does surface outliers
really well, and each outlier should be further investigated IMO.

It all starts with getting all this data into one place, scraped from the
thousands of county court records sites. It's a big task, but this data is
priceless.

~~~
abstractbarista
Definitely agree, and hey this is really cool work. :) Wouldn't it be
beautiful to have a nationwide map, where users could zoom to their
county/city/town level, and immediately look at various outliers?

~~~
kristintynski
I'm beginning to think it might be the start of a more just society. Am I
making this bigger than it is? The implications just seem big to me. I
don't think we are going to get privacy back, but if we can at least have
transparency into things like policing, we stand a much better chance at
holding our government accountable.

------
dkn775
I worked in this area for the state of MD at one of the foremost traffic
records outfits in the US and have done this work previously. There are a few
considerations you are leaving out that come from knowing these data.

You never want to look at things at the citation level; you should group
citations into stops. One driver can get many citations per stop, which
confounds your findings. Grouping can be done by making an ID of DL# + date of
stop. If time is included, add it ONLY if it is the same for all citations
issued in the stop (like it is here in MD). You could end here, or you could
add a Y/N indicator for whether a citation was issued. This way, you are
looking at stops where at least 1 citation was issued, instead of 3 counts for
a race off of one stop where three tickets were issued.
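
A minimal sketch of that grouping, assuming each citation row carries a
driver's license number and a stop date. The field names (dl_number,
stop_date, race, charge) and the sample rows are hypothetical, made up for
illustration, not the schema of any real court records site:

```python
from collections import defaultdict

# Hypothetical citation rows; field names and values are illustrative only.
citations = [
    {"dl_number": "A123", "stop_date": "2019-03-01", "race": "Black", "charge": "window tint"},
    {"dl_number": "A123", "stop_date": "2019-03-01", "race": "Black", "charge": "expired tag"},
    {"dl_number": "A123", "stop_date": "2019-03-01", "race": "Black", "charge": "no insurance"},
    {"dl_number": "B456", "stop_date": "2019-03-01", "race": "White", "charge": "speeding"},
    {"dl_number": "A123", "stop_date": "2019-04-10", "race": "Black", "charge": "speeding"},
]


def group_into_stops(rows):
    """Group citation rows into stops keyed by (DL#, date of stop)."""
    stops = defaultdict(list)
    for row in rows:
        stop_id = (row["dl_number"], row["stop_date"])
        stops[stop_id].append(row)
    return dict(stops)


def count_by_race(rows):
    """Count rows per recorded race category."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["race"]] += 1
    return dict(counts)


stops = group_into_stops(citations)

# Citation-level counts: one stop with three tickets counts three times.
citation_counts = count_by_race(citations)

# Stop-level counts: each stop counts once, using its first citation's race.
stop_counts = count_by_race(rows[0] for rows in stops.values())

print("per citation:", citation_counts)
print("per stop:    ", stop_counts)
```

With these made-up rows, the three-ticket stop inflates the citation-level
count for one group, while the stop-level count treats it as a single event.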

Also, the racial categories are based upon the DMV database, and what options
they have permitted. The confounder here is that racially ambiguous persons may
be misclassified, while obviously black people etc. will reliably be called such.

Additionally, you want to use something like the National Household Travel
Survey to get an idea of the racial background of DRIVERS in the area you are
examining. It is quite possible that an area has more drivers of a certain
race. Note this still will not be perfect due to out of jurisdiction drivers.

Please contact me with any questions. Very interesting stuff, I just want to
make sure you do it right. If you hope to make any changes at the govt
level, these considerations will be widely known in the traffic records
sector.

~~~
kristintynski
This is SO useful. Thank you so much for the insights. I'm not 100% certain,
but I think finding traffic stop data is pretty rare for most counties.

If citation data is all we have, even by itself in its raw form it is still
important, considering how locked away this data is currently.

I think there are a lot of smart people like yourself that would do in-depth
analysis of even this messy data to pull out useful insights.

I'd be eager to get people like you involved if you had an interest. You can
email me at:

Kristin - at - Frac.tl

That goes for anyone who has an interest in helping. :)

------
kristintynski
I plan to do this for many more counties. I'm sure it isn't a coincidence how
difficult many are to scrape. This data often feels purposefully obfuscated.
It only seems right to me that everyone should see local data aggregated like
this. If police intend to use data to track the population, we should be able
to track them as well.

~~~
dkn775
Before you expand to more counties, please address the critiques here. Also
note that these data are not homogeneous across jurisdictions. I like what
you’re doing, but you need to be very careful about this analysis, or else it
will just come off as uninformed to the people who work in this space. State
public safety data is not uniformly collected.

You want to ensure you have all these caveats mentioned so that when it gains
traction, you won’t waste as much time dealing with people who will try to
invalidate your findings.

~~~
kristintynski
I'm much more interested in making this data as open and complete, and
potentially as close to live updating, as possible.

I agree that it will take more depth to get at true patterns in the data, and
there are many important considerations for data collection, cleaning, and
analysis.

I hope to find some others who would like to work with me on this.

~~~
dkn775
I sent you an email.

------
justwalt
The only objection I can think of that an honest police department might have
against this sort of analysis taking place is that some figures might be taken
out of context or not adequately explained, such as the numbers for officers
who patrol neighborhoods that are predominantly one ethnicity.

If I believe in my work, I don’t mind it scrutinized, and would welcome the
analysis, given that the analyst is not excessively biased.

~~~
t-writescode
Think more evil-ly on the part of an invested party with a mission using the
data in ways they want.

Imagine high-resolution evaluation of your work as a programmer by a nefarious
party that is out to get you or your team.

~~~
withinboredom
> Imagine high-resolution evaluation of your work as a programmer by a
> nefarious party that is out to get you or your team.

Been there. I had to “make one commit a day” because I wasn’t committing as
much as my peers. The next week of commits was me automating commit + PR
generation that only made changes to white space (plus normal work, which was
mostly research at the time).

------
oftenwrong
This seems like a promising way of detecting bias. However, I would be
surprised if this type of analysis was permitted to continue without
objections from police organisations.

~~~
kristintynski
It's a matter of public record, though I guess they could gripe about any data
viz methodologies... still, we need this type of oversight.

------
lowdose
It would be nice if the actual racial distribution of the population of the
county were added for context. For a random visitor from the internets the lack
of context hinders the ability to understand what implications the data has.

When 90% of the population is black the conclusions are different from when
the population is 90% white.
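
To make that concrete, here is a rough sketch that divides each group's share
of stops by its share of a baseline population. All numbers are invented for
illustration, and the baseline could just as well be an estimate of local
drivers rather than residents:

```python
def disparity_ratios(stop_counts, baseline_counts):
    """Share of stops divided by share of the baseline, per group.

    A ratio near 1.0 means a group is stopped roughly in proportion to its
    share of the baseline; this says nothing about why it deviates.
    """
    total_stops = sum(stop_counts.values())
    total_base = sum(baseline_counts.values())
    ratios = {}
    for group, base in baseline_counts.items():
        if base == 0:
            continue  # no baseline for this group, so the ratio is undefined
        stop_share = stop_counts.get(group, 0) / total_stops
        base_share = base / total_base
        ratios[group] = stop_share / base_share
    return ratios


# Same hypothetical stop counts, two hypothetical county populations.
stop_counts = {"Black": 450, "White": 550}
county_a = {"Black": 900, "White": 100}  # 90% black
county_b = {"Black": 100, "White": 900}  # 90% white

ratios_a = disparity_ratios(stop_counts, county_a)
ratios_b = disparity_ratios(stop_counts, county_b)

print("county A:", ratios_a)
print("county B:", ratios_b)
```

The identical stop counts point in opposite directions depending on which
baseline is used, which is why the raw charts are hard to read without one.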

------
sneak
Redacting data in your results is very close to the same as not publishing
your results. Please do not redact critical information that allows people to
validate or otherwise cross-check your results.

