
Do the police give more tickets at the end of the month to meet quotas? - rpicard
http://robert.io/posts/4.html
======
mattdeboard
Now I'm just an uneducated bumpkin, but it seems like you've tried to answer
two questions here but mistakenly labeled them as one (until you back off in
the last paragraph of "Analysis"):

1\. Do police give more tickets at the end of the month?

2\. If they do, is it because of quotas?

The first is a statistical question, the second is much more complex. I can
think of a few very good reasons why numbers would spike at the beginning of
the month, plummet in the fat middle then spike at the end. None of them
really have much to do with statistics directly.

I think this would be compelling if you'd compare stats for years in which
"quotas" (whatever they decide to call them) were policy against the stats
from years which they weren't.

~~~
wpietri
Circa 1992 I read a book titled something like "How to Speed and Get Away With
It". It was written by a retired state trooper. One of the many things he
mentioned was that the middle of the month is the best time to speed because
troopers tended to fill their quotas early and late.

The most useful piece of advice, though, was on how to pull over. Turns out
that every time a cop pulls somebody over, he's worried about getting smooshed
by traffic or shot by a loon. His advice, which I've followed faithfully, is
to pull waaaay over, so the cop can park close to traffic and create a bubble
of protected space. Then you turn on the interior lights, roll down the
windows, turn off the car, throw the keys on the dash, and hold the steering
wheel at the top with both hands.

Authority figures like respect, and this display of considerate compliance has
worked wonders for me.

~~~
Evbn
Scrum programming burndown charts corroborate this S-curve effect.

------
mapgrep
I don't understand all the gymnastics to correct for the fact that months end
on different days (30, 31, etc.) and all the effort put into dealing with days
by number.

For a given month, you can determine the last day. You can also determine the
last weekday.

So just do a series that for each month shows "average weekday citations, non
automatic" and "final weekday citations, non automatic" (and maybe "first
weekday citations, non automatic").

~~~
Evbn
If quotas are monthly, not daily, some gymnastics is required.

~~~
mapgrep
This is exactly my point - the quotas are monthly not daily, but he's doing
the wrong kind of work. He's putting his data into a daily format and then
trying to extrapolate monthly information.

The right way to solve this problem is to find the exact thing you want - the
precise last weekday of the month, the precise first weekday, and a precise
weekday average - which is trivial with any decent date/time library. These
would be the RIGHT kind of gymnastics. Tallying information by day and then
running some kind of regression to approximate last-day traffic is a waste
when you can just GET the last-day traffic.
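
A minimal sketch of the approach described here, using only Python's standard `calendar` and `datetime` modules. The `citations` mapping (date to count of non-automatic citations) and all function names are illustrative assumptions; the original post's data layout is not shown in this thread.

```python
import calendar
from datetime import date, timedelta

def last_weekday_of_month(year, month):
    """Return the last Monday-to-Friday date of the month."""
    days_in_month = calendar.monthrange(year, month)[1]
    d = date(year, month, days_in_month)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

def first_weekday_of_month(year, month):
    """Return the first Monday-to-Friday date of the month."""
    d = date(year, month, 1)
    while d.weekday() >= 5:
        d += timedelta(days=1)
    return d

def monthly_summary(citations, year, month):
    """citations: hypothetical dict mapping date -> citation count.

    Returns (average weekday count, first-weekday count, last-weekday count).
    Dates missing from the dict are treated as zero citations.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    all_days = [date(year, month, day) for day in range(1, days_in_month + 1)]
    weekdays = [d for d in all_days if d.weekday() < 5]
    counts = [citations.get(d, 0) for d in weekdays]
    average = sum(counts) / len(counts)
    first = citations.get(first_weekday_of_month(year, month), 0)
    last = citations.get(last_weekday_of_month(year, month), 0)
    return average, first, last
```

Running `monthly_summary` for every month gives the three series directly, with no need to align months of different lengths.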

------
Maro
Warning, bad science!

Looking at the last plot, you can't conclude that police give out more tickets
at the end of the month, unless you also conclude that police like to give out
tickets around the 8th, and don't like to give out tickets around the 11th of
each month. Problem is, there are no error bars, if there were, it'd probably
show that there is no significant evidence for the hypothesis. The error of
the mean of the last plot is probably comparable to the signal itself.
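
One way to put numbers on this concern is to compute the standard error of the mean for each day of the month and draw it as an error bar. The sketch below assumes a hypothetical `tickets` mapping from date to ticket count; the function name is made up for illustration.

```python
import math
from collections import defaultdict
from datetime import date
from statistics import mean, stdev

def mean_and_sem_by_day(tickets: dict[date, int]):
    """Return {day_of_month: (mean, standard error of the mean)}.

    tickets is a hypothetical mapping of date -> ticket count.
    Days of the month with fewer than two samples are skipped,
    because a spread cannot be estimated from a single value.
    """
    by_day = defaultdict(list)
    for d, count in tickets.items():
        by_day[d.day].append(count)

    result = {}
    for day, counts in by_day.items():
        if len(counts) < 2:
            continue
        sem = stdev(counts) / math.sqrt(len(counts))
        result[day] = (mean(counts), sem)
    return result
```

If the difference between two days' means is small relative to their standard errors, the plot alone cannot distinguish that difference from noise.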

~~~
rpicard
I appreciate your concern! I didn't really draw any conclusions from the data,
as I mentioned in the "Conclusions" sections. This wasn't meant to be hard
science, just some fun with data.

I did make an update with a few, more conventional statistics as well as an
adjustment based on day of the week (see "Updates"):
<http://robert.io/posts/4.html>

------
endianswap
It would be interesting to adjust the data to compensate for day of the week,
as well as for the dates that only happen in a few months out of the year. I
wonder if the 31sts in that year were all in the middle of the week, or were
all on weekends?

~~~
rpicard
That's an interesting thought. It looks like they were spread out pretty
evenly throughout the week in 2011 and 2010, though.

> as well as for the dates that only happen in a few months out of the year

That actually was compensated for. That's what the "Correcting for more
frequent dates" section was about.

~~~
lukeschlather
I'd be interested to see this same graph run for each day of the week, again
normalized.

------
sareon
I can't speak about all police but I've asked this question to my father who
works with the RCMP. He has been in charge of traffic sections for years. They
don't have a quota, not in the usual sense but they do keep track of data on
tickets, this includes how many tickets each member is giving out. They don't
require them to give out X tickets per month but what they do look for is
members who are not giving out around the same number of tickets as their
peers. This usually indicates they are not patrolling and doing their job.

Another thing they can look at with this data is where they are giving out
tickets. If they notice one officer has all his tickets in one location which
is known for high accident rates then they know that person is "fishing"
rather than actually doing active patrolling.

~~~
lostlogin
Isn't giving out tickets in a high accident area exactly what should
happen? Tickets in a low accident area must surely be less likely to change
accident rates?

~~~
msg
This is why the Roomba cleans the edges of the room. It's the dirty part.

------
geofft
Isn't the answer "statistically, we can't disprove the null hypothesis,
because the variation near the end of the month wasn't bigger than the month-
wide variation"? Getting some actual standard deviations might help.

But I think the data is a little more conclusively inconclusive than a mere
"I'm not sure I can tell anything from the data".

------
RyanIyengar
Well out of the 7 times a 31st appears, 2 of them are holidays, Halloween and
New Year's Eve. I would have guessed that this might increase ticket issuance,
but I suppose it's possible it could have a negative impact as well.

~~~
ShabbyDoo
Do union-required holiday pay rates cause the PD to only staff for minimum
requirements? Perhaps the PD simply staffs at minimum levels because most
officers want those days off? Do holidays cause so many "real" crimes that the
police don't have time to write tickets?

~~~
DrStalker
In Australia long weekends usually trigger double demerits and an increase in
visible police, especially roadside breath testing. I don't know what it does
to the ticket rates but the effect on every driver I know personally is to be
extra cautious, which would reduce ticket rates.

~~~
bigiain
"which would reduce ticket rates"

Maybe, there's a possibility of increased ticket rates too due to the extra
driving km done by people "going away for the long weekend", and by drivers
being less experienced on the routes they're driving (compared to regular
commuter traffic who know exactly where the speed/redlight cameras are, and
are usually in heavy enough traffic to not be able to exceed the speed limit).

I wonder if there's local (NSW, Australia - for me) tickets-by-day data
available to analyse?

------
beder
A few comments on the stats themselves:

1\. It looks like the total number of tickets in 2009 and 2010 is about 10%
that of 2011. I'm guessing that there weren't actually ten times as many
tickets given in 2011, so either the data is incomplete (as the author
suggested), or there was a typo. If the data is incomplete, I'd suggest
normalizing to the 2011 totals; otherwise, the 3-year average doesn't make
much sense.

2\. The scale of the "normalized" difference graphs (showing "Actual -
Expected"). The formula given is

(actual - expected) / total * 1000 = normalized number

If this is the case, then since the scale goes to about +/- 5, the differences
are very small (less than 1% away from what you'd expect!). But from
eyeballing the data, that doesn't seem right.

In any case, a better scale might be to expect the data to be normally
distributed, and scale the differences to # of standard deviations. (See,
e.g.,
[http://en.wikipedia.org/wiki/Normal_distribution#Standard_de...](http://en.wikipedia.org/wiki/Normal_distribution#Standard_deviation_and_tolerance_intervals))

~~~
rpicard
1) Yes, there were far fewer tickets in the data for 2009 and 2010 than 2011.

2) The fact that the normalized numbers were so small was very unintuitive to
me at first too, but the important thing to realize is that in that formula,
you're dividing the _difference_, not the actual value for the given day, by the
_total number for the year_. When I first ran those numbers I was so confused
by the output. I was originally thinking that I'd normalize it by saying "X
percent of the total for that year," but since I was working with the
differences, and not the actual values, the numbers were too small a fraction.

Either that or I made some huge mistake in my logic...

WRT the use of standard deviations, like I said in the post, I'm not a
statistician, so I wasn't really sure what the canonical way of normalizing
data was. I pretty much just made one up. Thanks for pointing that out. I'll
look into using standard deviation for the next one. :)

~~~
beder
In that case, you should divide by "expected", so you get a percentage
difference for each day. (Normalizing by total for the year doesn't make
sense, since imagine that there were 300 days per month instead of 30 - your
numbers would be divided by 10 again, but the data you want to visualize would
stay the same.)
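
A small sketch of the point being made here, with made-up numbers. Both function names are hypothetical; `per_thousand_of_total` follows the formula quoted earlier in the thread, and `percent_deviation` is the divide-by-expected alternative.

```python
def percent_deviation(actual, expected):
    """Deviation of each day's count from its expected count, in percent."""
    return [100.0 * (a - e) / e for a, e in zip(actual, expected)]

def per_thousand_of_total(actual, expected):
    """The formula quoted in the thread: (actual - expected) / total * 1000."""
    total = sum(actual)
    return [1000.0 * (a - e) / total for a, e in zip(actual, expected)]

# Made-up data: one day is 20% above expectation in a 30-day "month".
actual_30 = [120] + [100] * 29
expected_30 = [100] * 30

# The same 20% bump, but in an imaginary 300-day "month".
actual_300 = [120] + [100] * 299
expected_300 = [100] * 300

# Dividing by expected reports the same bump in both cases.
bump_30 = percent_deviation(actual_30, expected_30)[0]
bump_300 = percent_deviation(actual_300, expected_300)[0]

# Dividing by the total shrinks the bump as the number of days grows.
scaled_30 = per_thousand_of_total(actual_30, expected_30)[0]
scaled_300 = per_thousand_of_total(actual_300, expected_300)[0]
```

The total-based number depends on how many days share the total, while the expected-based number depends only on the day in question.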

~~~
rpicard
Great point! Dividing by expected would have been a much nicer solution. I
think the method I used still works though, just not as nicely. Am I wrong?

~~~
Kaizyn
Z-Score is your friend for this kind of data normalization.

<http://en.wikipedia.org/wiki/Standard_score>

~~~
rpicard
Thanks. It looks like I need to do a little self-study in statistics.
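
For reference, the standard score suggested above is simple to compute with Python's `statistics` module. This is a minimal sketch; the function name is illustrative, and it uses the sample standard deviation, which is one of two common conventions.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standard score of each value: (value - mean) / sample standard deviation.

    Needs at least two values, and the values must not all be equal
    (otherwise the standard deviation is zero).
    """
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]
```

Applied to the per-day differences (actual minus expected), this expresses each day in units of standard deviations rather than raw ticket counts.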

------
michaelhoffman
It doesn't make sense to look at an averaged plot like this without error
bars. Just comparing 2009 and 2010 there seems to be a huge amount of
variation.

Even better than error bars would be box plots of the data corresponding to
each day of the month. You can easily produce such plots with R.

~~~
sesqu
It's also worth noting that the error bars on a given year won't be constant
for the month, due to the differences in expected citations.

------
redxaxder
It bothers me that in some months, the 27th is 3 days from the end of the
month, but in some months, the 27th is 4 days from the end. Could this mess up
the graph?

The 27th appears in every month, but the time remaining to meet your
(possible) quota would be different for different months.

~~~
paulhodge
I bet it would be interesting to have a graph of tickets over "number of days
until the end of current month"

~~~
rpicard
I agree! That might be a good reason for an update.

------
ck2
Just wait until they have to pay for all the drones.

They'll just task one to follow you until you make a ticketable driving
mistake.

~~~
wicker
The joke's on them. The ticket will go to the R&D department of the driverless
cab company.

------
casualaistudent
The data should have been labelled from the end of each month rather than from
the beginning - then there wouldn't be the anomaly on the 31st.

I.e. n[i] should be sum over all days d of tickets[d] where d is the i-th day
from the end of the month that d occurs in.
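
The indexing described here can be sketched as follows, assuming a hypothetical `tickets` mapping from date to ticket count. Index 0 is the last day of each month, 1 is the day before it, and so on.

```python
import calendar
from collections import defaultdict
from datetime import date

def tickets_by_days_from_month_end(tickets):
    """Sum ticket counts by distance from the end of the month.

    tickets: hypothetical dict mapping date -> ticket count.
    Returns {days_from_end: total}, where 0 means the last day of a month.
    """
    totals = defaultdict(int)
    for d, count in tickets.items():
        days_in_month = calendar.monthrange(d.year, d.month)[1]
        totals[days_in_month - d.day] += count
    return dict(totals)
```

With this labelling, January 31st and February 28th land in the same bucket, so every month contributes to index 0.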

------
fnordfnordfnord
It would be interesting to see weekdays vs weekends. Houston works the
freeways pretty hard during the week, and not so much on the weekend. Or, at
least that's how it seems.

Also, re quotas, from my cop buddy (constable) "Of course there are no quotas,
that's ridiculous. What they do, is tell you to attend the meeting of the
commissioner's court; during which the commissioners bemoan the budget woes
and discuss which county employees may have to be laid off if things don't
improve, etc. etc." He said that usually inspires the troops, and nobody ever
says the word "quota."

------
Patient0
From my memories of statistics class:

the way an actual statistician would try to answer this question would be to
first describe a "null" hypothesis, called H0, then talk about the probability
of seeing this set of results, or an even more extreme set, if H0 were in fact
the case.

H0: The police do not change their ticket collection strategies at the end of
the month

H1: They try to collect more tickets at the end of the month than the other
days of the month

H0 would describe some distribution for the numbers you are seeing. This is
where you can put all of your assumptions about how things are - so that you
ultimately end up with a model that generates a certain distribution.

Under this model, there's going to be a certain probability of seeing this
result, or a more extreme result.

If that probability is less than, say, 5%, then this is saying that you'd have
less than a 5% chance of seeing these numbers (or more extreme ones) given that
there isn't in fact any conspiracy to collect more tickets.

In such a case, you might then "reject" the null hypothesis in favour of H1.
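
One concrete way to carry out this procedure is a permutation test. That is a swapped-in technique, not something prescribed in this comment: it assumes that under H0 the "end of month" label carries no information, so shuffling the labels shows how often a difference at least as large arises by chance. The function name and inputs are hypothetical.

```python
import random

def permutation_p_value(end_of_month, other_days, n_permutations=10_000, seed=0):
    """One-sided p-value for 'end-of-month days have a higher mean count'.

    end_of_month, other_days: lists of daily ticket counts (hypothetical data).
    Under H0 the group labels are assumed exchangeable, so the pooled counts
    are shuffled and re-split many times.
    """
    rng = random.Random(seed)
    k = len(end_of_month)
    observed = sum(end_of_month) / k - sum(other_days) / len(other_days)

    pooled = list(end_of_month) + list(other_days)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            at_least_as_extreme += 1

    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

A returned value below the chosen threshold (5% in the example here) would be grounds to reject H0. Note that this simple version ignores day-of-week effects, which other comments in the thread suggest matter.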

If the probability was higher than 5%, you might say that there is no
"significant" evidence in favour of H1.

------
fenfir
Did anyone else notice the bumps on the 7th, 14th, 21st and 28th? I'd be
curious to see the distribution of days of the week over those days.

------
roomnoise
It could be that they have to get in the paperwork before the 31st so it can
be manually processed. So the last day to them would be a day or two before.
Also the beginning of the month makes sense because you'd want to dole out a
bunch in the first week so you could coast a little in the middle.

~~~
rpicard
That was my first thought, but then I think we'd see a larger drop on the 30th
as well, since that's the last day of the month sometimes too. Someone
suggested doing a graph with "days until end of month" on the X-axis, which
might be interesting.

------
lutorm
You need to compare the variations to the expected variation for a
Poisson distribution with the observed rate to judge whether they are
significant or likely to arise from noise. Just looking at the deviations in
absolute number isn't telling you anything.
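
For a Poisson-distributed count with mean λ, the standard deviation is √λ, so a deviation can be expressed in units of the square root of the expected count. A minimal sketch, with a hypothetical function name, assuming the expected count per day is already known:

```python
import math

def poisson_z(observed, expected):
    """Deviation from the expected count in units of the Poisson
    standard deviation, sqrt(expected). Requires expected > 0."""
    return (observed - expected) / math.sqrt(expected)
```

For example, 120 tickets against an expected 100 is 2 standard deviations out, while 1,020 against an expected 1,000 is well under 1, even though both are 20 tickets over. This treats tickets as independent events, which is itself an assumption.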

------
tomrod
Good analysis. Let me comment that it would be improved by investigating
quarterly or yearly quota, or controlling for budget deficits (e.g. if 2010
has a large expected budget deficit, likely in September 2009 tickets begin
being written with greater frequency).

~~~
ShabbyDoo
It would be interesting to ask the Baltimore PD what incentives are given to
officers to write speeding tickets. [I bet they claim none] Then, ask some
cops on the street what the real incentive system is like. From there, use the
data to see how cops respond to both the real and official incentives.

I wonder if ticket writing happens when there's nothing better to do. It would
be interesting to see the correlation between arrests for "real" stuff and
traffic tickets. Are there differences in pay rates based on days/times
worked? Do more tickets get written when rates are low vs. 2.5x pay on a
holiday? Does the PD only staff for necessities when rates are higher or when
officers most desire time off?

I'm surprised I haven't read something yet about Stephen Levitt looking at
this data. This provides such incredible insight into the impact of incentives
and lets us compare the probability that the official incentive policy is
perceived as accurate by the officers vs. the likely real but unwritten
incentives.

~~~
jeffool
It'd probably be better to first ask cops, and then ask officials. If cops
give examples, and officials say none, then you can ask officials about the
examples cops have. But ask them plainly first. Give them a chance to be
honest with you. Or hang themselves.

------
jeaguilar
I'd be interested to see what the distribution of automated tickets is
compared to police-issued tickets. The automated tickets provide a baseline of
random lawlessness. Do police issue more tickets on the same days that more
automated tickets are issued?

------
btilly
There is a strong weekly cycle to human behavior. I would suggest starting by
measuring that cycle, then subtracting it out of the data before computing
your day of the month distribution.
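
A simple version of this suggestion is to subtract each weekday's average from every observation and then analyse the residuals by day of the month. The sketch below assumes a hypothetical `tickets` mapping from date to ticket count. Subtracting the per-weekday mean is only one way to model the weekly cycle.

```python
from collections import defaultdict
from datetime import date

def remove_weekly_cycle(tickets):
    """Subtract the mean count for each weekday from every observation.

    tickets: hypothetical dict mapping date -> ticket count.
    Returns a dict mapping date -> residual (count minus that weekday's mean).
    """
    by_weekday = defaultdict(list)
    for d, count in tickets.items():
        by_weekday[d.weekday()].append(count)  # 0 = Monday ... 6 = Sunday

    weekday_mean = {wd: sum(c) / len(c) for wd, c in by_weekday.items()}
    return {d: count - weekday_mean[d.weekday()] for d, count in tickets.items()}
```

The residuals can then be grouped by day of the month in place of the raw counts.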

------
ekianjo
That was a very poor analysis. I am not a statistician either but I have
experience in analysing large pools of data, and you would typically look at a
number of other factors, such as day of the week, impact of weekends, impact
of seasons, error range, standard deviation in order to get a hint of what is
out of the average or not. Just plotting data and comparing bars does not
qualify, even remotely, as an analysis.

~~~
shadowmint
On the other hand when you apply any non mathematical (ie. not standard
deviation, etc) segmentation on the dataset you are _artificially_ fitting the
data.

The correct approach would be to search for patterns over time in the dataset
(frequency analysis) and see what turns up.

_Not_ arbitrarily segmenting the data into week-sized blocks because you 'think'
there might be a data pattern in there.

I see this sort of 'inspiration based' analysis in web analytics all the time,
and it's complete nonsense.

Look for patterns in the data, don't look at the world and try to fit the data
to it. You'll end up with stupid and statistically invalid results.

~~~
ekianjo
I see your point but I do not fully agree with your view. There are so many
times where we see patterns, correlations emerging from the data, which make
no sense whatsoever. Statistical correlations can happen by chance (by
definition) and something sticking out does not mean that something is
happening. You would need to repeat the analysis on different pools of data to
prove that there is indeed something you are missing.

By looking at established segments, at least you know where to look for things
that MAY make sense, and then derive hypotheses.

If not, you end with bullcrap analysis we see every single day in literature
where researchers find a "link between vegetarians and personality disorder"
or things like that, without even probing the fact such links may be
coincidental at best.

~~~
shadowmint
The problem is that mathematically, when you partition your data into subsets
and analyse those subsets, you are no longer analysing the original data set.

Your data is now composed of the original set of data [n1, n2, n3... nn] + a
_NEW DATA SET_ [s1, s2, s3 ... sn] which is your selection criteria.

This is often masked by the fact that your selection data set is a function
f(x) ("we'll just split them up by week and partition by gender"), so it
doesn't look like you're actually combining two datasets, but _you are_.

This new set of data S, biases the result of the analysis, and the more
complex it is, the closer you are to over-fitting to find the 'golden
correlation' you're looking for in your data.

Humans are pattern matching machines. We see patterns in random noise, and
hear voices in random tones. It's _easy_ for us to see patterns (or what we
think are patterns) in data sets, but it's _much harder_ to statistically
substantiate those patterns as distinct from randomness.

It's provable, I imagine, that any data pattern can be found in a data set if
we have sufficiently complex selection criteria for the samples, and
sufficiently random raw data.

_Don't do this._

If you do, at the very least acknowledge that you have selectively modified
the original dataset and specifically test your partitioning rules.

It's much, much better to search the dataset for a target pattern, and then
investigate the partitioning rules that your search matches against (because
you can examine the partitioning rules for complexity, and data-content and
easily spot over-fitting).

------
brucehart
Has there been any research on average miles driven vs. the day of the month?
I would speculate that people who live paycheck to paycheck might drive more
at the beginning of the month (after payday when they can afford to fill up on
gas and run errands like buying groceries) than at the end of the month. I'm
not sure if this would have a significant impact on the data.

~~~
tankenmate
You also need to consider that most wage earners have more discretionary money
available at the end of the month. Some of this money will be spent on alcohol
derived entertainment, and a consequent increase in the number of DUIs. I
wonder if different types of violations are more common by time of month.

------
mahyarm
I think the explanation that monthly tickets, arrests and convictions are easy
numbers for managers to rate the 'productivity' of police is enough to explain
such behavior. If you're at all career-minded as a police officer, or you're
getting shit from your manager, you'll probably initiate such behavior.

------
igorgue
This was a scandal in the NYPD; "pay the rent," they called it. Related TAL story
with audio evidence of the existence of a quota:
[http://www.thisamericanlife.org/radio-archives/episode/414/r...](http://www.thisamericanlife.org/radio-archives/episode/414/right-to-remain-silent?act=2)

------
crntaylor
I'd like to see some error bars or confidence intervals on these plots. If you
just compare the data from two different years there seems to be a lot of
variation. Some error bars would help you decide if the variation between days
of the month is meaningful, or if it can be explained by noise.

------
efa
"It’s also worth noting that a quota system wouldn’t really explain the drop
on the 31st...".

Maybe on the 28th or so they panic and start issuing tickets. And by the 31st
they've made their quota and relax. They don't want to wait to the last day
and risk it.

Or maybe these results mean nothing (more likely)!

------
alainbryden
There's a question dedicated to this on Skeptics.SE:

[http://skeptics.stackexchange.com/questions/9578/do-police-o...](http://skeptics.stackexchange.com/questions/9578/do-police-officers-have-monthly-quotas-of-traffic-tickets-to-write).

In some areas, this has been confirmed.

~~~
lclarkmichalek
While it's interesting to read about the individual cases of unofficial
quotas, it'd be much more interesting to see a larger study done on multiple
police forces over a significant amount of time to see how widespread the
practice is. I only say this as this thread is full of anecdotal evidence, and
the skeptics question's responses seem to be much the same.

~~~
alainbryden
Well it's not necessarily anecdotal evidence. It's anecdotal in that it proves
that only some forces in some countries have a quota system, but that is
enough to answer the question: Some definitely do. It certainly doesn't answer
the question for all known police forces in all countries, but I don't think
that's a reasonable expectation.

------
arbuge
An end of month bump doesn't really surprise me... for an ex-cop's viewpoint
into how police officers operate, Dale Carson's book (Arrest-Proof Yourself)
is an illustrative read.

------
ShabbyDoo
So, does anyone here know any Baltimore cops? Could someone who works downtown
walk outside, find a cop, and ask? We're all speculating here.

------
daem0n
Also, the data could drop low on the 31st because not all months have 31 days
so when tickets are distributed evenly over months, that day does not easily
match up with 28 (being the definite last day of a month)...then 29, 30 and
finally 31 being the least used.

Could this be true?

~~~
rpicard
That was actually compensated for in the "Correcting for more frequent dates"
section. The first graph of that section shows what you would expect the
distribution to look like, since not all months have the same number of days.

The following graphs are based on variation from that expectation.
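
The expectation described here can be reproduced by counting how often each day number occurs in a year. This is a sketch of the idea, not the code behind the post's graphs; the function names are made up for illustration.

```python
import calendar

def day_of_month_occurrences(year):
    """Return {day_of_month: number of months in `year` containing that day}."""
    occurrences = {day: 0 for day in range(1, 32)}
    for month in range(1, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            occurrences[day] += 1
    return occurrences

def expected_share(year):
    """Share of a year's tickets expected on each day number if tickets
    were spread evenly over every calendar day of the year."""
    occurrences = day_of_month_occurrences(year)
    total_days = sum(occurrences.values())
    return {day: n / total_days for day, n in occurrences.items()}
```

In a non-leap year such as 2011, days 1 through 28 occur 12 times each, the 29th and 30th occur 11 times, and the 31st occurs 7 times, so an even spread already predicts a dip on the 31st.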

~~~
rhino42
But the analysis did not account for the variable number of weekend/weekday
days on each day of the month. I don't know how meaningful this is, but it
would be worth it to inspect this separately and then weigh it in if needed.

~~~
rpicard
Just from a brief look at a calendar for 2011 and 2010, it seems like the days
were pretty evenly spread out over the week.

------
dutchbrit
Would like to see this for meeting yearly quotas, so month by month.

