
Using FOIA Data and Unix to halve major source of parking tickets - tptacek
http://mchap.io/using-foia-data-and-unix-to-halve-major-source-of-parking-tickets.html
======
tptacek
Matt's schtick is automated, large-scale FOIA requesting; he obtains huge
collections of data from cities and then tries to do interesting stuff with
it. Here, he apparently managed to get all the tickets in Chicago for several
years running, and then used that data to fix the parking signs.

This, to me, is so neat.

~~~
bpchaps
Hey! Thanks for posting this, and the kind words.

Interesting note about getting data like this - Illinois FOIA allows a
requester to submit a SQL query as part of their request, so long as they
know the tables and columns within the database ;)

~~~
ir0nic
Now wouldn't it be funny if you got them to drop a table this way?

~~~
arenaninja
Wouldn't you assign readonly permissions to the account used to fulfill these
requests?

~~~
johndavidback
Ideally, but until someone asks for it the first time, they probably don't
even have an account for the purpose. So, an irritated DBA who has myriad
other backlog tasks probably runs it on a copy if smart, and production if
lazy or in a hurry. Would not surprise me in the slightest if an important
table was dropped in this way.
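
As a rough illustration of the read-only idea discussed above, here is a minimal sketch that uses SQLite as a stand-in. Nothing in the thread says which database the city actually runs, and the `tickets` table, its columns, and both function names are hypothetical. It shows a requester-supplied `SELECT` succeeding while a `DROP TABLE` is rejected by a connection opened in read-only mode.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def build_demo_db() -> str:
    """Create a throwaway database with a hypothetical tickets table."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE tickets (plate TEXT, address TEXT)")
    conn.execute("INSERT INTO tickets VALUES ('ABC123', '1100 N STATE ST')")
    conn.commit()
    conn.close()
    return path


def run_requester_sql(db_path: str, sql: str) -> list:
    """Run requester-supplied SQL on a connection opened read-only."""
    # mode=ro is SQLite's URI flag for a read-only connection; any statement
    # that tries to modify the database raises sqlite3.OperationalError.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

On a server database the equivalent would be a dedicated account that has only been granted `SELECT`, ideally pointed at a copy rather than production.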

------
floatrock
He cost the city $60k in revenue by asking them to clear up confusing signage.

That's some small-government activism I can get behind!

~~~
jcims
Given that many of the tickets were issued because this is a taxi stand, it's
also possible that revenues increased for the taxi company in that area.

------
hpincket
This is great! I often request records but have never come up with a great use
case!

Under Florida public record law, source code produced by state employees is,
in very narrow circumstances, a non-exempt public record (the code can't
process sensitive data, etc.). I'm considering a future endeavor where I
periodically request the code to such projects until the I.T. department
decides it's worth the effort to open source it.

I like to think this is a step towards consolidating publicly funded code and
reducing duplicate effort. Ahh, imagine making a pull request to your city's
website! But I'm getting ahead of myself...

~~~
nappy
Interesting - any idea what the other circumstances are? Is there a statute for
this? Have you considered requesting some specific source code and publishing
it yourself? Might make sense to start small here.

I have a lot of experience in making public records requests and would be
happy to help.

------
mehrdadn
This is awesome. Question though: how is producing license plate data like
this not a disallowed privacy invasion? It seems like you could totally track
who's parking where and potentially do nasty stuff, if you know (say) someone
well-off whom you don't like and who doesn't seem to mind getting tickets on a
regular basis.

~~~
rlpb
Criminal proceedings are usually public in most countries. Where do parking
tickets fall? Even if not technically criminal, some might, for the same
underlying reasons, consider it acceptable for "someone well-off whom you
don't like and who doesn't seem to mind getting tickets on a regular basis" to
have this illegal behavior on the public record. "Don't want your name
tarnished? Don't park illegally."

[Edit: on the other hand, if the ticket is unfair (eg. confusing signage as in
this example), then you have a valid point; I just wanted to point out the
other side of the coin]

~~~
mehrdadn
I was asking about the legality, not the morality. If it's not technically
criminal then that's all that matters.

Also, this is completely missing the point here:

> Don't want your name tarnished? Don't park illegally.

It's not about reputation, it's about privacy -- and safety.

~~~
mikekchar
Don't know about where you are, but in Ontario Canada (the only place I've
gotten in enough trouble to care about the difference ;-) ), there is a
criminal code and the highway traffic act. There are illegal things that you
can do in your car that may be in either or both areas.

"Criminal" means in the criminal code, but both are illegal. I think that you
don't have a right to privacy for either because it obfuscates the application
of the law. Indeed, the Japanese government can query the Ontario government
to get a list of transgressions that you had while driving a car in Ontario (I
know this because they did so when I converted my driver's license to a
Japanese one -- and they didn't need my consent).

I think OP's use of the term "criminal" is a bit loose, but I would be
surprised if you have any right to privacy for a fine levied due to a legal
infraction. Whether or not you _should_ have a right to privacy is a
completely different conversation...

Aside: It was important to me because many years ago I inadvertently drove
while suspended. I had an unpaid ticket that I had forgotten about and my
license was suspended. The suspension got lost in the mail (first a postal
strike and then the delivery person put my mail in the wrong "super box" -- I
eventually got it months later). When I was first getting my visa for Japan, I
needed to find out if this was a criminal offence or a highway traffic act
offence.

------
btrettel
Would be nice to use the same skills to help reduce cars illegally parked in
the bike lane. Identify areas which cyclists commonly complain about that
(e.g., to the city) and encourage them to put up better signs?

~~~
jimmaswell
Could also signal the area is starved for parking, so the lane should be
removed and bicyclists should just ride in the road for those stretches. Small
inconvenience to the bicyclists who still get to use the road, big win for the
drivers.

~~~
thatjsguy
Good lord, just park and walk. Or use that bike lane. Cities have an
overabundance of expensive parking [1], which does nothing but waste space and
act as giant thermal batteries. Everyone loves complaining about traffic, but
we don’t seem to have the political will to do something about it (namely:
reduce our dependence on cars). Cities should work to reduce their dependence
on cars, not increase it at the expense of cyclists and pedestrians, who are
actually the life blood of cities (think about how many pedestrians stop into
small business compared to drivers speeding by at 35 mi/h).

[1] [https://www.strongtowns.org/journal/more-evidence-that-we-ha...](https://www.strongtowns.org/journal/more-evidence-that-we-have-too-much-parking)

~~~
mmirate
And how do _you_ travel more than a couple miles and arrive on-time, during
the summer, wearing businesslike attire, without smelling like a cyclist upon
arrival?

~~~
phyzome
Don't wear businesslike attire when bicycling in hot weather. You pack it,
instead, and ride at a comfortable pace instead of doing speed-racer stuff.
Pack a thermos of ice water to drink from occasionally and you won't even
break a sweat.

Or use an e-bike so you don't have to pedal as hard.

I live in the Boston area and I see a ton of people in business attire riding
bikes.

~~~
mmirate
> Boston area

That explains. Contrast the climates of Georgia or Florida.

During the summers here, just _standing outdoors in the sunlight_ wearing
anything heavier than beachwear, is inadvisable at best.

~~~
btrettel
I commute by bike daily in Austin, TX. In the summer I bring a change of
clothes and wear a damp athletic shirt and damp headwrap. My commute is
relatively short as well. I don't shower when I get to work and most of the
time I smell perfectly fine. (Confirmed by people who have no problem
contradicting me.) Yes, you'll sweat, but nowhere near as much as most people
think you will if you plan it well.

Humidity would make this more difficult. Austin's not as humid as Houston, but
it's not a desert either.

~~~
phyzome
Boston gets pretty humid in the summer. Drinking ice water towards the end of
the ride works really well.

------
pasbesoin
Reminds me of working with free-form, manual entry order detail information,
in a former life.

Hundreds of thousands of records a month. I ended up importing them into
Excel(1) and then using... what was that called? An MS/Windows library that
came with IE 5 and/or a few other things, that provided regex support (with a
few quirks) that was accessible via VBA.

The point was, I could programmatically mine it -- including regex pattern
matching and replacement of and within cell contents -- while also having a
flexible UI within which to find and handle one-off cases. When the one-offs
demonstrated a repeating pattern, I could quickly iterate to add that to the
programmatic mining logic.
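
The same loop (mine with patterns, flag what did not match, review by hand, then add a new pattern) can be sketched outside Excel. This is only an illustration in Python rather than the VBA described here; the two patterns and the field names are made up. It keeps the raw text next to the derived fields and marks rows that need manual follow-up.

```python
import re

# Hypothetical patterns; real ones would be added as one-off cases
# turn out to repeat in the data.
PATTERNS = {
    "quantity": re.compile(r"\bqty[:\s]*(\d+)", re.IGNORECASE),
    "sku": re.compile(r"\b([A-Z]{2}-\d{4})\b"),
}


def mine(raw: str) -> dict:
    """Return the untouched raw text plus one derived field per pattern."""
    row = {"raw": raw}
    for name, pattern in PATTERNS.items():
        match = pattern.search(raw)
        row[name] = match.group(1) if match else None
    # Rows where any pattern found nothing are flagged for manual review.
    row["needs_review"] = any(row[name] is None for name in PATTERNS)
    return row
```

For example, `mine("Qty: 12 of AB-1234 rush")` fills both fields, while a row such as `"two widgets please"` comes back flagged for review.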

This included adding color cueing for items of particular interest, manual
follow-up. Excel's sorting capabilities to bring potentially related instances
into visually displayed groups. And the like.

It ended up working quite well. I might have preferred something else to VBA,
and I did use Perl and other stuff, elsewhere (something that also gave me
both power and the flexibility to rapidly iterate).

But the point is, with such data, I found it very useful to combine regex and
rapid programmatic manipulation, together with a good visual interface
(including visual cues, the ability to comment upon instances -- Excel cell-
level comments -- etc.) and manual manipulation.

As a final aside, the extensive set of Excel keyboard shortcuts greatly aided
in rapidly and effectively navigating and massaging the imported data.

--

1. This was back when Excel had... I think it was a 64K (or a bit less) limit
on the number of rows in a sheet.

P.S. I tended to retain the originally imported data in its columns, and to
produce my mining of it in other columns. That way, I could always and
immediately see what I started with, for any particular record. (And, if
things visually started to be "too many columns", well, Excel lets you hide a
range of columns from the view. As one example of how its features really
helped, on the visual front while doing this work.)

I still had to learn and allow for some quirks Excel exhibited with respect to
importing text data. That included making sure the cells/columns being
imported into carried the correct/needed formatting designation _before_
importing into them (usually, "Text").

~~~
pasbesoin
I'm pretty sure what I was thinking of is Windows Script Host (WSH). At the
time, I picked it up as a part of IE 5.

[https://en.wikipedia.org/wiki/Windows_Script_Host](https://en.wikipedia.org/wiki/Windows_Script_Host)

------
jklein11
Are there any resources explaining the FOIA process? I'm not sure what types
of information are available, what they can be used for, etc., and am always amazed
with the type of information people are able to get the government to hand
over.

~~~
lucb1e
I had this question as well and asked someone who did a FOIA request. There is
no listing. It's just that, if you notice or can logically conclude that a
certain kind of data exists, you can request it. In this example, it is fairly
logical that the city has a record of parking tickets that were written out,
and so the author requested them.

I'm surprised he asked after license plates, though. I don't know if that is
different in the USA, but in Europe that certainly wouldn't fly because of
privacy. I wouldn't even have asked because I shouldn't want to have such
data. Perhaps one could get an anonymized version to be able to correlate how
often a certain plate got a ticket, but not which plate that was. Anyway, the
general concept of a FOIA request is the same. (Edit: Oh, someone else
remarked this as well:
[https://news.ycombinator.com/item?id=17754396](https://news.ycombinator.com/item?id=17754396))
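
One way the anonymized version mentioned above could work is keyed hashing: the agency replaces each plate with an HMAC computed under a secret it never releases, so repeat offenders still group together but the plate itself is not published. This is only a sketch of that idea, not something the city or the article does, and the function names and the 16-character truncation are arbitrary choices.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(plate: str, secret_key: bytes) -> str:
    """Map a plate to a stable pseudonym using HMAC-SHA256."""
    normalized = plate.strip().upper().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; an arbitrary choice


def tickets_per_pseudonym(plates, secret_key: bytes) -> Counter:
    """Count tickets per pseudonymized plate."""
    return Counter(pseudonymize(plate, secret_key) for plate in plates)
```

A plain unkeyed hash would not be enough here: the space of valid plates is small enough that someone could hash every possible plate and reverse the mapping, which is why the key has to stay with the agency.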

------
Bobbleoxs
I too feel I couldn't pass the $190m cost in the first place. Granted, I can
see where the cost ramps up as explained by @morei. Could someone explain
whether this is for the 10-year contract or a license of some sort for each
year?

If it is annually, they got 17m tickets over 7 years so for 10 years, assuming
they issue just over 19m tickets, that means each parking ticket needs to be
at least $10 to cover the cost, even at $100 per ticket, IBM is banking on 10%
share? That seems excessive to me but I never worked in government so could
someone enlighten me on this?
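
The arithmetic above can be checked in a few lines. The $190m, 19m-ticket, and $100 figures are the ones quoted in this comment; the straight-line extrapolation from 17m tickets over 7 years is an extra assumption added here for comparison, not a figure from the article.

```python
def cost_per_ticket(contract_cost: float, tickets: float) -> float:
    """Contract cost spread evenly over the number of tickets issued."""
    return contract_cost / tickets


CONTRACT_COST = 190_000_000   # dollars, as quoted in the thread
ASSUMED_TICKETS = 19_000_000  # the commenter's 10-year assumption
HYPOTHETICAL_FINE = 100       # dollars per ticket, the commenter's example

per_ticket = cost_per_ticket(CONTRACT_COST, ASSUMED_TICKETS)
share_of_fine = per_ticket / HYPOTHETICAL_FINE

# Added assumption: scale 17m tickets over 7 years linearly to 10 years.
linear_tickets = 17_000_000 / 7 * 10
linear_per_ticket = cost_per_ticket(CONTRACT_COST, linear_tickets)

print(f"${per_ticket:.2f} per ticket, {share_of_fine:.0%} of a $100 fine")
print(f"linear: {linear_tickets / 1e6:.1f}m tickets, ${linear_per_ticket:.2f} each")
```

With 19m tickets the contract works out to $10 per ticket, or 10% of a $100 fine. Under the straight-line extrapolation it is roughly 24.3m tickets and about $7.82 per ticket.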

By any chance there's a conflict of interest for government to be willing to
make improvement and cut down parking tickets or any other similar source of
income? Or maybe that's what public audit is for?

------
lettergram
I did something similar for universities, to help students select their
courses:

[https://easy-a.net/](https://easy-a.net/)

I wrote a blog post about it, because it requires a ton of work to get FOIA
requested data - this I'm assuming was done in the same painstaking way:

[https://austingwalters.com/foia-requesting-100-universities/](https://austingwalters.com/foia-requesting-100-universities/)

I give this props. I'm sure it required a ton of work

------
kioleanu
Wow, great stuff!

Did you give more thought into the address cleaning bit? Or does anyone have
an idea how to go about transforming mangled addresses into coordinates?

I have a problem that's been bothering me for months, similar to what you have
here: people from an emergency service call-center are inputting the addresses
of the emergencies. For emergencies that happen on the public domain, there is
often not a specific address, but rather names of landmarks. Something like
"Street StreetName / Opposite Train Station Y", which can be written like "st
stName / opp tr st y" or some other infinite variations.

I don't have any after-data to corroborate, but I do have previous instances
where the operator inputted the same address better. If I can extract the
correct landmarks, I think I can do a Google Places search for them, with a
cleaned query, like "Store Amazon, Best Street, Ohio" to get coordinates that
can fall into an acceptable area.

PS: in the example you gave with Lake Shore Drive, I think you could easily
correct the names with an algorithm based on the Levenshtein distance
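
A minimal sketch of that suggestion, assuming a list of known-good street names is available (the `known_streets` list, the function names, and the `max_distance` cutoff are illustrative, not from the article): compute the Levenshtein distance from a mangled name to every known name and accept the nearest one only if it is close enough.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def closest_street(name: str, known_streets: list, max_distance: int = 3):
    """Return the nearest known street, or None if nothing is close enough."""
    cleaned = name.strip().upper()
    best = min(known_streets, key=lambda street: levenshtein(cleaned, street))
    return best if levenshtein(cleaned, best) <= max_distance else None
```

With `["LAKE SHORE DR", "STATE ST", "DIVISION ST"]` as the known list, an input like `"lake shoer dr"` snaps to `"LAKE SHORE DR"`, while an unrelated name such as `"MICHIGAN AVE"` returns `None` instead of being forced onto a wrong street.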

~~~
bpchaps
I've put a LOT of thought into address cleaning! And yep - Levenshtein distance
seems to be the way to go.

My current stack is:

1. Send addresses to [https://smartystreets.com/](https://smartystreets.com/) -
They gave me a year's worth of unlimited geocoding for free. They also
tokenize the addresses, but I had about a 50% success rate with them.

2. Tokenize raw addresses with
[https://github.com/datamade/usaddress](https://github.com/datamade/usaddress).

3. Use a normalized Levenshtein distance algo to get ratio of difference.

4. Compare all of the addresses' Levenshtein distances with each other.

5. Apply logistic regression/gradient ascent algo to tickets by chaining
heavily typo'd addresses to less-typo'd and eventually to a static list of
verified-correct addresses.
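
Steps 3 to 5 can be roughly sketched as follows. This is not the author's code: the logistic regression / gradient ascent step is replaced by a much simpler greedy pass with a fixed threshold, and the function names and the 0.9 cutoff are assumptions. It only illustrates how a heavily typo'd address can reach a verified one through a less-typo'd intermediate.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized ratio: 1.0 for identical strings, lower as edits pile up."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def chain_to_verified(addresses, verified, threshold: float = 0.9) -> dict:
    """Map each address to a verified one, chaining through resolved typos."""
    resolved = {address: address for address in verified}
    pending = set(addresses) - set(resolved)
    progress = True
    while pending and progress:
        progress = False
        for address in sorted(pending):
            # Compare against everything resolved so far, typos included.
            anchor = max(resolved, key=lambda known: similarity(address, known))
            if similarity(address, anchor) >= threshold:
                resolved[address] = resolved[anchor]
                pending.discard(address)
                progress = True
    return resolved  # addresses that never got close enough are left out
```

With `"1100 N STATE ST"` verified, `"1100 N STAT ST"` resolves directly, and `"1100 N STT ST"`, which is too far from the verified form on its own, resolves through `"1100 N STAT ST"`. A fixed threshold like this is exactly where short street names cause trouble, since a single edit is a large fraction of the string.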

It works surprisingly well, but there are still a lot of problems that can't
easily be solved:

1. Street types (st/ave/blvd/etc) are missing. So, when two addresses have
the same street name, it's difficult to pair the two. It's still possible with
some probability work and matching the ticketers' paths to the nearest
street.

2. Addresses have a LOT of one-off situations. For example, there's a street
name called "Avenue A". The street name here is "Avenue", and the street type
(usually st/ave/etc) is "A".

3. Lots of four letter streets make Levenshtein distance very difficult.

Glad you enjoyed it!

~~~
kioleanu
I did enjoy it, yes, and I'm following your idea for my town also (it's open
data here). Lucky for me, it's a little bit prettier (I think they have
autocomplete on their devices for the addresses).

I already have some preliminary data - in a city with 350k inhabitants, they
gave 150k fines last year, totaling 2.5 mil EUR. I can't wait to search for
the hotspots

~~~
bpchaps
Let me know how it goes!

------
exikyut
What I'm most impressed about is that the author was able to include the FOIA
data.

I guess I just learned I half expected each person who wanted FOIA data to
have to request it themselves, for their own personal use.

In this case I can see reusing this for _interesting_ reasons (the plates in
the .txt.gz have not been removed), so...

------
flaxton
404 error? (page not found)

~~~
bpchaps
Should be better now. Let me know if it persists!

~~~
ReverseCold
Nope, still a Google 404 page.

~~~
Rick-Butler
Consider putting the site behind Cloudflare or similar content cache provider
maybe?

------
Anthony-G
I really enjoyed this article – not only because of the content but the
distraction-free layout makes it a pleasure to read. It’s rare that I come
across a site using such minimal and effective graphic design. As a bonus, the
site loads quickly and doesn’t rely on a stream of third-party JavaScript
files or other web resources. For a first blog post, I’m impressed. If I ever
get around to publishing my own blog, I know what to aim for. Keep up the good
work. The web needs more of this!

The footer indicates that the web page was generated using bashblog [1] –
looks like it might be worth checking out.

[1]
[https://github.com/cfenollosa/bashblog](https://github.com/cfenollosa/bashblog)

------
amaccuish
$190 million?! What does the architecture consist of? A database, some forms
and some integrations/api? I'm 90% sure they could have done that with free
software and a good support contract with a UNIX provider for far less :/

~~~
morei
This is a very common reaction that shows a lack of experience in dealing with
scale IT systems.

You're correct that a simple DB with some forms would be cheap.

But integration tends to be crazy expensive. For this sort of system, other
things that also need to be covered: 1. Billing integration. Including
changes to billing codes, bill (fine) printing, testing. 2. Audit
integration. Because whenever money is handled, audit follows. 3. Customer
support integration. Including UI for customer service, training, testing.
This is often a very complex item because customer service already have a
zillion systems they have to use and their training requirements are ongoing
and expensive, so they want you to integrate with their existing systems
instead of giving them a brand new thing, and integrate with their existing
training processes, etc etc. 4. Integrate with all those hand-held readers,
inc vendor compliance, testing etc. 5. Contract management. You have a
contract with the government and they'd like to know that you did what you
claim you did. So there's teams of people to deal with on an ongoing basis.
6. Project management. There's more than one person working on this, and a
lot of complex integration requiring changes in other systems => extensive
project management. 7. Ongoing changes to requirements, often conflicting.
All the integration points above are moving targets, so expect that they'll
have to be re-done a few times both before and after launch. 8. Arse
covering. You now have a large contract with the US Government. You will get
sued and they will get sued (typically by whoever didn't win the contract). Vast
amounts of documentation covering _everything_, including documenting the
process by which documents are written => tech writers galore, plus lawyers
plus lawyers.

Honestly, this is barely scratching the surface. I haven't even touched the
(expensive) work before the contract is even signed.

$190M doesn't go very far!

~~~
masklinn
FWIW in case you wanted to format the list HN does not really support markup,
so you need a blank line between list items so they get displayed as separate
paragraphs rather than folded into a single one.

------
ccleve
I know this intersection. It's at the corner of State and Division, in the
heart of the Rush Street neighborhood.

The likely reason there are many tickets there is that there are many bars
there, and great crowds of people who have had a bit too much to drink. There
are also great crowds of cops there every weekend.

Without looking at the data, I'd expect that many of the tickets are getting
written in the middle of the night, when people are too inebriated, or too
distracted, to read signs carefully.

~~~
snappieT
Too inebriated to read a sign but safe enough to drive a metal box at high
speeds?

Not saying the signage was clear, but that is a very very weak excuse to not
understand them.

~~~
ccleve
Explanation != excuse.

------
rootsudo
Alternatively, if you're a parking meter in Chicago, you now know how to meet
quota and can fill in the rest of your day with meaningful things.

------
vectorEQ
very interesting example of data analysis and its practical implications.
awesome blog post, and you won't hear me say that quickly.

"This'll be my first blog post on the internet, ever. Hopefully it's
interesting and accurate. Please point out any mistakes if you see any!"

KEEP IT UP MATT! and data munging, not sure if it's a word, but it sounds nice
:D

------
punnerud
I’m thinking about a similar post. After a long run between different
departments in Norway I got out all historic train delays and all form/e-mail
contact with the rail company, including the number of people getting money
back because of the delays. What interested me most in this article was
mawk/AWK.

------
NegativeK
Matt/bpchaps, have you shared the results with the aldermen? I'd love to hear
about their reactions.

~~~
bpchaps
Kind of, but nothing super formal. I've contacted a few of the aldermen's
offices, but never the aldermen themselves.

Though, during last mayoral election, some of the mayoral candidates wanted to
use parking tickets as part of their campaign, and through some connections I
found my way into Bob Fioretti's campaign manager's office to discuss parking
tickets, alongside an ex-candidate, Amara Enyia's campaign manager. They were
super, super interested - Fioretti's CM calling the work "fucking golden".
But.. they both went silent after that, even though Fioretti started using parking
tickets as a major part of his campaign. Go figure.

There's a lot more to that story - I'll end up writing about it sometime later :).

------
donjh
This is great. I wonder if, using this data, patterns could be found showing
when and where tickets are written. Would be interesting to know when and how
often certain areas are checked for illegal parking, if such a pattern exists.
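
A first pass at that question could simply bucket tickets by place, weekday, and hour and see which buckets stand out. The sketch below assumes each ticket has been reduced to an issue timestamp string and an address; that layout and the function name are assumptions, not the format of the released data.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def ticket_patterns(tickets) -> Counter:
    """Count tickets per (address, weekday, hour) bucket.

    `tickets` is an iterable of (issued_at, address) pairs, where issued_at
    looks like "2014-06-07 01:15" (an assumed layout).
    """
    counts = Counter()
    for issued_at, address in tickets:
        when = datetime.strptime(issued_at, "%Y-%m-%d %H:%M")
        counts[(address, DAYS[when.weekday()], when.hour)] += 1
    return counts
```

Calling `ticket_patterns(rows).most_common(10)` would then list the ten busiest place-and-time buckets, which is a crude but quick way to spot regular enforcement rounds.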

------
TheKarateKid
This is great, but I can only imagine there being hidden pushback by
government agencies if these requests to fix things became more common.

Revenue from parking tickets is easy money for a violation that is generally
harmless.

~~~
chapium
Honestly I don't expect any pushback for nefarious reasons. It's just as easy to
ignore a problem that's making the city money, rather than defend it.

------
ekr
On the one hand, this is a really neat hack. On the other hand, given the
current situation with cars, pollution in cities (air quality, noise-level
etc), global warming, I would personally think twice before doing something
that would make car driving easier and more pleasant, as that simply
incentivizes more driving. (getting a ticket might be a small inconvenience,
but it does contribute to creating a slight aversion to driving).

Sure, this may seem a bit "evil", and the better solution is reducing negative
externalities through taxing, which is a more transparent and ethical
solution, but most of us don't have that level of power and influence within
local councils and governments.

~~~
fredley
American cities are built for cars - there's no way around that issue. To
navigate cities you _need_ some sort of powered transportation vehicle,
outside of a few just-about-pedestrian-navigable areas downtown.

Don't punish cars, promote electric cars, scooters, bikes, and other ways of
getting around. Dedicate more parking to electric-only spots (with chargers),
bike racks and bike share docks, and scooter parks.

------
throwawayjava
How much did the FOIA request cost?

~~~
bpchaps
It was free. I've never had to pay for any information coming from an Illinois
FOIA request.

~~~
dannyw
Have you thought about distributing the data, for accessibility and
reproducibility? As well as save the taxpayers a little bit of dough in case
someone else wants to request the same data.

edit: my bad, didn't see it at the end

~~~
kingbirdy
The dataset is linked to at the end of the article
([http://freeourinfo.com/tickets/](http://freeourinfo.com/tickets/))

------
jamisteven
This is incredible, thanks for sharing.

------
mxuribe
This is awesome!

------
mrfusion
Is anyone else bothered that they paid 190 million for a database and
interface to it?

~~~
baud147258
See that part of the discussion:

[https://news.ycombinator.com/item?id=17754670](https://news.ycombinator.com/item?id=17754670)

There's a few good explanations there.

