
Mapping Motor Vehicle Collisions in New York City - lil_tee
https://toddwschneider.com/posts/nyc-motor-vehicle-collisions-map/
======
takk309
Nice work, I am glad that there is a paragraph talking about exposure. Crash
trends based strictly on total number of crashes are easy to predict just
based on where there is more traffic. Using crashes per vehicle mile traveled
for road segments or crashes per entering vehicle for intersections can help
tease out trends. Controlling for severity is also important.
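
A minimal sketch of the exposure-based rates mentioned above. The scaling constants (per million entering vehicles for intersections, per 100 million vehicle miles for segments) are common conventions assumed here, not something stated in this comment, and the function and parameter names are made up for illustration.

```python
def intersection_crash_rate(crashes, daily_entering_vehicles, years):
    """Crashes per million entering vehicles (assumed scaling convention)."""
    exposure = daily_entering_vehicles * 365 * years  # total entering vehicles
    return crashes * 1_000_000 / exposure


def segment_crash_rate(crashes, daily_volume, length_miles, years):
    """Crashes per 100 million vehicle miles traveled (assumed scaling convention)."""
    vmt = daily_volume * 365 * years * length_miles  # total vehicle miles traveled
    return crashes * 100_000_000 / vmt


# Hypothetical numbers: the busy intersection has more crashes in total,
# but the quiet one has the higher rate once exposure is accounted for.
busy = intersection_crash_rate(crashes=30, daily_entering_vehicles=40_000, years=3)
quiet = intersection_crash_rate(crashes=12, daily_entering_vehicles=5_000, years=3)
print(round(busy, 2), round(quiet, 2))
```

The inputs are invented; the point is only that raw counts and exposure-adjusted rates can rank locations differently.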

When I do a crash analysis for a city, one of the tasks I do regularly for my
job, I generate a crash rate and severity index for each intersection. The
severity index is basically a weighted average based on severity, non-
injury=1, minor injury=3, and severe injury or fatality=8. The crash rate and
severity index are divided to create a Severity Rate. While not perfect or
statistically valid, it does help identify trends. Also, I am in a rural state
so it is rare that there are enough crashes to make any statistically valid
conclusions.
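
A rough sketch of the severity index as described: a weighted average of crash counts using the 1/3/8 weights above. The category keys and function name are hypothetical. The comment does not say which way the crash rate and severity index are divided to get the Severity Rate, so that step is left out rather than guessed.

```python
# Weights taken from the comment above; the dictionary keys are illustrative names.
SEVERITY_WEIGHTS = {"no_injury": 1, "minor_injury": 3, "severe_or_fatal": 8}


def severity_index(counts):
    """Weighted average severity for one intersection.

    counts maps each severity category to the number of crashes in it.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no crashes recorded, nothing to average
    weighted = sum(SEVERITY_WEIGHTS[category] * n for category, n in counts.items())
    return weighted / total


# Hypothetical intersection: 20 non-injury, 8 minor-injury, 2 severe or fatal crashes.
example = {"no_injury": 20, "minor_injury": 8, "severe_or_fatal": 2}
print(severity_index(example))  # (20*1 + 8*3 + 2*8) / 30 = 2.0
```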

~~~
mikeash
What’s the basis for the severity weights? I’d expect the weights to be way
more spread out, like 1/10/100/1000. It would definitely not be a good trade
off to eliminate nine non-injury crashes at the cost of one additional
fatality. But I certainly could be missing something about this sort of
evaluation.

~~~
dsfyu404ed
Fatalities are a tiny minority of crashes and aren't really interesting to
study, because you usually wind up studying the behavior of drunk people and
people who don't wear seat belts. If you filter those out, there's not much
data left, making meaningful conclusions hard to draw. Fatal accidents are
often just normal accidents with a couple of aggravating variables on top (e.g.
a person rear-ends a semi-truck instead of a normal truck, or gets in a minor
accident while not wearing a seat belt), so it doesn't make sense to fixate on
them. Anything that reduces normal crashes by some amount will also affect
fatal crashes.

~~~
takk309
Drunk drivers and people that don't wear seat belts are still worth reviewing.
While there are rarely engineering solutions to the fatalities that result, it
can help inform education programs and initiatives. Amazingly, buckle-up and
don't drive drunk advertising can make a difference.

~~~
apendleton
They absolutely are, but are rare enough that it's difficult to reach
statistical significance when talking in the aggregate. That a particular part
of town went from one fatality one year to zero fatalities the next year is
probably not evidence of the success of any particular safety-related policy
intervention, it's just noise. Studying all crashes provides a proxy that
hopefully helps decrease the odds that the fatal ones will occur while making
it possible to make robust, data-driven claims about success or failure.

~~~
takk309
On a project I am currently working on, we saw pedestrian fatalities shift
from 7 to 13 in consecutive years. It is a nearly 100% increase but like you
said, it is just noise. This is in a city with around 100,000 residents.
Convincing politicians that it is just noise is a whole different story.
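
One way to put a number on the "just noise" point is a sketch like the one below. It assumes fatalities in each year are independent Poisson counts with the same exposure, which is a modeling assumption and not something from the project described. Under that assumption, given the combined total, the second year's count follows a Binomial(total, 0.5) distribution if the underlying rate did not change.

```python
from math import comb


def two_sided_equal_rate_p_value(count_a, count_b):
    """Exact conditional test that two Poisson counts share the same rate.

    Given n = count_a + count_b, the larger count is compared against
    Binomial(n, 0.5); the upper tail is doubled for a two-sided p-value.
    """
    n = count_a + count_b
    larger = max(count_a, count_b)
    upper_tail = sum(comb(n, k) for k in range(larger, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


p = two_sided_equal_rate_p_value(7, 13)
print(round(p, 3))
```

For 7 versus 13 this comes out at roughly 0.26, well above the usual 0.05 threshold, so the data are consistent with no change in the underlying rate.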

------
clhenrick
I've worked extensively with this dataset on a similar project,
[http://crashmapper.org](http://crashmapper.org), and through that process
found that the data is extremely error prone. Perhaps 20% of the collisions
recorded are not geocoded (i.e. lack lat, long coordinates) and don't contain
other location information such as street, cross street, and zip code that
could be used to geocode them. It appears that some precincts of the NYPD do a
better job at recording a crash location than others. Even more of the data
lacks values for "contributing factors" so it seems difficult to use as a
metric for analysis. Often there is a mismatch between the total number of
persons injured or killed and the number of pedestrians, cyclists, or
motorists injured or killed. Furthermore, whoever maintains this dataset will
periodically go back in time and update it seemingly at random, editing
existing data or adding new data, potentially months or years back in time.
Often it appears to be that the data maintainer is changing values for fields
such as the number of pedestrians, cyclists, motorists injured or killed.
Presumably this is because more information surfaced about an incident at a
later point in time and the city must go back and update it. However this can
result in stats from the data not aligning with the NYPD's or DOT's official
stats from a previous year. I would advise anyone to keep these facts in mind
if trying to use the data for analysis and policy recommendations. Such is
open data.
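
A small sketch of how two of the problems described above (missing coordinates, and injury totals that don't match the per-category counts) could be measured before doing any analysis. The field names are hypothetical placeholders, not the dataset's real column headers, and rows are assumed to be dictionaries of strings such as `csv.DictReader` would yield.

```python
# Hypothetical field names; map them to the real column headers before use.
PART_FIELDS = ("pedestrians_injured", "cyclists_injured", "motorists_injured")


def to_int(value):
    """Treat blank or missing values as zero."""
    return int(value) if value else 0


def audit_rows(rows):
    """Count rows with missing coordinates and rows whose injury total
    differs from the sum of the per-category injury counts."""
    missing_location = 0
    injured_mismatch = 0
    for row in rows:
        if not row.get("latitude") or not row.get("longitude"):
            missing_location += 1
        parts = sum(to_int(row.get(field)) for field in PART_FIELDS)
        if to_int(row.get("persons_injured")) != parts:
            injured_mismatch += 1
    return {
        "rows": len(rows),
        "missing_location": missing_location,
        "injured_mismatch": injured_mismatch,
    }


# Two made-up records: the second lacks coordinates and has an inconsistent total.
sample = [
    {"latitude": "40.7", "longitude": "-73.9", "persons_injured": "2",
     "pedestrians_injured": "1", "cyclists_injured": "0", "motorists_injured": "1"},
    {"latitude": "", "longitude": "", "persons_injured": "1",
     "pedestrians_injured": "0", "cyclists_injured": "0", "motorists_injured": "0"},
]
print(audit_rows(sample))
```

Because the dataset is revised after the fact, running the same audit on snapshots taken at different dates would also show how much the numbers drift.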

------
xyzwave
Having done something similar for the Long Beach, CA area in college, one of
the most interesting takeaways was the relative spatial distribution between
fatal and non-fatal accidents.

Non-fatal accidents clearly clustered around high traffic areas, but fatal
accidents didn’t reveal the same clustering. Instead they appeared to be
uniformly distributed across the city.

I’m sure there is an explanation for this, and this was only 10 years of data for
a single city, but it always felt a little spooky that these accidents were
equally likely to happen anywhere (though most likely later in the night).

~~~
icsllaf
High traffic areas tend to move traffic much slower than lower density areas.
Getting into a fatal traffic crash when going 15 miles per hour in stop and
go traffic is much harder than when you lose control of a car when going 50 on
an empty street.

------
jermaustin1
I'm not sure what constitutes a "collision", but in 2015, I lived on Lexington
between 121 and 122 and saw the investigation of a Hit and Run of a homeless
man. I talked to a couple of the witnesses who saw it happen.

This incident was at Lexington and 123rd. In the data, I do not see this
incident.

------
karussell
The question is if the highlighted areas are really more dangerous or if there
are just more visitors. Shouldn't one take into account the traffic counts?

BTW: there is similar (open) data for Germany:
[https://unfallatlas.statistikportal.de/](https://unfallatlas.statistikportal.de/)
(It clearly shows the problem I mentioned)

Update: sorry, it seems that this issue is already discussed in this thread

~~~
djtriptych
Yup. This is pretty much a map of population density in NYC.

------
jdlyga
Lots of crashes in Hell's Kitchen. That area is full of people going out to
bars and restaurants, tiny sidewalks, and lots of impatient drivers trying to
get through Manhattan to New Jersey.

------
bonyt
The map of total deaths includes a significant blip on the west side near Pier
40 and the Holland Tunnel, which I think is from the 2017 truck attack.

[https://en.wikipedia.org/wiki/2017_New_York_City_truck_attac...](https://en.wikipedia.org/wiki/2017_New_York_City_truck_attack)

Map: [https://imgur.com/a/jNbOv7W](https://imgur.com/a/jNbOv7W)

~~~
kevin_thibedeau
That area has a higher rate of incidents in general because of Brooklynites
trying to avoid the excessive toll on Verrazano when leaving the city.

------
ryeguy_24
I would bet that the shadow/light patterns on Roosevelt Avenue & 94th Street,
Queens cause significant visual distractions to drivers and pedestrians.

------
dsfyu404ed
Drivers mostly hit other things when there's too many things demanding their
attention (poor visibility + difficult left turn + busy traffic + bikes +
pedestrians = high risk of accidents) so this is probably just a heat map of
intersections that are the busiest (in terms of things going on, not
necessarily throughput).

I'd like to see a month by month heat map.

~~~
magduf
>Drivers mostly hit other things when there's too many things demanding their
attention

And this is exactly why humans shouldn't be driving. Hopefully human driving
will be banned before too long, as machines can do it so much better.

------
brianbreslin
I would love to pay the author to do this for my city. I'm fairly certain I
could get the local govt to pay up for this.

------
slowhand09
Very impressive!

------
skizm
[https://xkcd.com/1138/](https://xkcd.com/1138/)

~~~
InitialLastName
To be fair, they mention that in the first paragraph after they introduce what
the data is actually doing.

There is still a value to looking at a population-correlated heatmap in order
to draw conclusions from the discrepancies between the two.

