Love this. I took a few minutes and converted your image into an Illustrator/vector file. Not 100% true to your original, but pretty good. Infinitely scalable so people can print it if they want.
The trouble with making a higher res image is that when you make the lat/lon buckets smaller, you have fewer samples in each one, so the image gets noisier. For the best possible image you'd want to download all the years of data.
The process of making it was quite simple. I zeroed a 2d array of integers, then took all the pickup/dropoff points and incremented the nearest cell. The pixel values are based on the logarithm of the counts, since otherwise everything outside midtown would be pretty much black.
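For anyone who wants to try this, here is a minimal Python sketch of the steps described above. It is not the original C++ code. The function names, the bounding box arguments, and the 0-255 output range are assumptions made for illustration, and it bins each point into the cell that contains it, which may differ slightly from "nearest cell" at the edges.

```python
import math

def density_grid(points, lat_min, lat_max, lon_min, lon_max, rows, cols):
    """Count (lat, lon) points per cell of a rows x cols grid over a bounding box."""
    grid = [[0] * cols for _ in range(rows)]  # zeroed 2D array of integers
    for lat, lon in points:
        # Skip points outside the bounding box.
        if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
            continue
        # Row 0 is the northern edge so the image is not upside down.
        row = int((lat_max - lat) / (lat_max - lat_min) * rows)
        col = int((lon - lon_min) / (lon_max - lon_min) * cols)
        # Clamp in case a point sits exactly on the far edge.
        grid[min(row, rows - 1)][min(col, cols - 1)] += 1
    return grid

def to_pixels(grid):
    """Map counts to 0..255 using the logarithm of (count + 1)."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    scale = 255 / math.log1p(peak)
    return [[round(math.log1p(count) * scale) for count in row] for row in grid]
```

With a linear scale, a cell holding 10 points next to one holding 1000 would come out at about 1% brightness. With the log scale it lands at roughly a third of full brightness, which is why the outer boroughs stay visible.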
There are some artifacts, like the thin vertical line down the East River. I think that was because of how the data was rounded, i.e. the number of unique longitude values that map to a certain image column.
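A toy Python sketch of how that kind of rounding could produce a stripe. The numbers are made up and are not the real precision of the data or the real image width. If longitudes only take values on a fixed spacing, and the column width is not a whole multiple of that spacing, some columns receive more distinct longitude values than their neighbours and so collect more points.

```python
from collections import Counter

def unique_values_per_column(n_values, n_columns):
    """Count how many distinct rounded longitudes land in each image column.

    Longitudes are modelled as the integers 0..n_values-1 (hypothetical
    rounding steps) spread evenly across n_columns columns.
    """
    counts = Counter(v * n_columns // n_values for v in range(n_values))
    return [counts[c] for c in range(n_columns)]

print(unique_values_per_column(10, 4))  # [3, 2, 3, 2]
```

Here columns 0 and 2 get three possible longitude values each and the others get two, so with evenly spread points they would be about 50% brighter.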
I wrote this myself with a few hundred lines of C++, though I'm sure there's GIS software out there that will do all this for you with a few clicks.
Can you explain why there is a line going to the airport? Shouldn't the airport be more of an island of light (since I imagine most people aren't getting picked up / dropped off a half mile from the airport)?
Also did you overlay it onto a map? How did you get the angled effect if it's just a grid?
> Can you explain why there is a line going to the airport?
I assume it is because there is a fixed fare to/from JFK so drivers have little incentive to start/stop the meter at the exact pickup/dropoff location.
> Also did you overlay it onto a map?
No. If taxis did not pick up or drop off people on some street, that street does not appear. For example, there is an area downtown where the streets have had security barriers since 9/11, so no taxis go there.
> How did you get the angled effect if it's just a grid?
None of NYC's grids are exactly north/south/east/west aligned.
If you are interested in what the 2013 data looks like at full resolution, I have a web map of it (https://www.mapbox.com/blog/vector-density/). Haven't updated with the new data yet.
You don't even need dedicated mapping programs. It's just latitudes and longitudes, so any 2D plotting program would be sufficient (as long as the data can fit into memory!).
Although the country boundaries aren't strictly necessary. Since there appears to be a demand for generating these maps, I'll work on a tutorial for this NYC data set.
I think so...? There was a free period and now I'm getting charged about $1.50 a month (though on my credit card, it's billed as Google AdWords...). However, I just checked the actual invoice and I don't see a line item for the taxi data, just for the 40GB of other data that I have online. The taxi data is about 90GB.
OK, just so I understand (I am new to BigQuery): even though I am not hosting this dataset (and thus not being charged for it), I can run up to 1TB worth of queries for free using BigQuery?
Whoever owns the storage, NYC TLC in this case, pays a minimal fee of 2 cents per GB per month. This includes multiple factors of replication/durability.
Whoever is doing the querying - this can be you - pays 5 dollars per TB queried. First 1TB per month is free.
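As a rough worked example of that pricing in Python, using only the rates quoted in this thread (they may well have changed since, and the function names are made up for illustration):

```python
STORAGE_USD_PER_GB_MONTH = 0.02   # paid by whoever owns the storage
QUERY_USD_PER_TB = 5.00           # paid by whoever runs the query
FREE_QUERY_TB_PER_MONTH = 1.0     # first 1TB queried each month is free

def monthly_storage_cost(gb_stored):
    """Storage cost in USD per month for the dataset owner."""
    return gb_stored * STORAGE_USD_PER_GB_MONTH

def monthly_query_cost(tb_queried):
    """Query cost in USD per month after the free tier is used up."""
    billable_tb = max(0.0, tb_queried - FREE_QUERY_TB_PER_MONTH)
    return billable_tb * QUERY_USD_PER_TB

print(f"${monthly_storage_cost(90):.2f}")   # $1.80 for the ~90GB taxi data
print(f"${monthly_query_cost(0.5):.2f}")    # $0.00, still inside the free tier
print(f"${monthly_query_cost(3):.2f}")      # $10.00 for 3TB queried in a month
```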
I'm not able to review the data yet but I wonder if TLC took in any of the feedback to better anonymize the data. [0]
Also, I would love to see an analysis of whether traffic is actually getting worse compared to last year. This claim was made by Mayor de Blasio as a reason to cap Uber rides.
Local governments should be demanding detailed data from companies like Uber in exchange for legalization, even raw data on locations of available cars. They have the leverage to get it now but they're wasting the chance. This data could eventually be used to avoid a true monopoly.
We already have enough hoops for companies to jump through that prevent competition, so we certainly do not need any more. New York is a perfect example where regulation has so contorted the market that you can make money selling your permission to run the business, to the point it might be more profitable than running the business.
From medallions to food cart and restaurant permits, regulation is keeping competition out while rewarding those who merely sit on permits and rent their use. It is nearly an identical situation to how badly patents are managed and rewarded.
One of the reasons that taxis are licensed is that they're supposed to pick up everyone and anyone. Uber and Lyft don't seem to have that problem with race (AFAIK) but there are two other situations where people have trouble:
(1) small children, where you need a car seat (or two)
(2) people with disabilities - service animal, wheelchair, etc.
Analyzing the data would hopefully show whether people had to wait for 2 hours.
Also, as a substitute for race, it might be possible to see if certain areas are under-served or not served at all. Perhaps drivers are avoiding picking up in Harlem.
Anecdata: As a wheelchair user and frequent traveler, Uber has never ever failed me, but hailing a cab was nearly impossible (was, because at this point even if Uber charged twice what a cab does, I'd still use them every time for the certainty that they'll pick me up).
The difference is, the city has the authority to crack down on it. I was just in NYC, and cabs are running a video in the back seat informing passengers that it's illegal for cabs not to pick you up for race/disability. The city doesn't legally have that kind of leverage over Uber.
Leverage that is rarely and reluctantly used, and that no one reasonably expects to be used, and that does not translate into any observable consequences in the lives of black people attempting to get a ride.
I've once heard a quip that "You have no constitutional right to eat at a restaurant, but you do have one to a speedy trial -- but which one feels more secure?"
The city has "leverage" to stamp out discrimination against black people wanting a cab ride, but "no leverage" to stamp out discrimination against black people on Uber -- yet which one will more reliably secure a ride?
NYC did exactly this. The city threatened all rideshare companies with a cap on their driver growth rate, which Uber successfully "defeated" by giving the city access to detailed data.
Uber hailed this as a victory - but the way I see it, when your victory means "maintenance of the status quo while conceding your data to a third party" it probably wasn't actually a victory.
This is a disturbing dataset, in that it seems straightforward to extract personal data from it. Consider the information revealed by a taxi ride between a personal address and a workplace, a sensitive location, or another personal address...
This kind of data was previously an issue with Uber, too.
http://i.imgur.com/ov6K6mt.jpg