A couple of cautions about drawing conclusions from this data, though:
1. A trend in the outliers of a distribution does not imply the same trend in the mean values of the distribution. Although I'm sure there's an upward trend in the mean values, it's not good science to encourage people to jump to that second conclusion. (Not without the supporting data at least.)
2. What about sampling bias? Intuitively, it seems like the reporting weather stations would not be uniformly distributed geographically, but would rather show a higher concentration near urban areas, which store and release extra heat. How large is this effect, and how could we correct for it?
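To make the first caution concrete, here is a minimal sketch of how hot outliers can become more frequent while the mean does not move at all. The numbers are assumptions chosen for illustration (standardized temperatures, a normal shape, a 20% wider spread); nothing here comes from the station data.

```python
from statistics import NormalDist

# Assumed, illustrative distributions in standardized temperature units.
baseline = NormalDist(mu=0.0, sigma=1.0)
wider = NormalDist(mu=0.0, sigma=1.2)  # same mean, larger spread

# "Hot outlier" cutoff: the baseline's own 98th percentile.
hot_cutoff = baseline.inv_cdf(0.98)

# Probability of exceeding that fixed cutoff under each distribution.
p_hot_baseline = 1.0 - baseline.cdf(hot_cutoff)
p_hot_wider = 1.0 - wider.cdf(hot_cutoff)

print(f"mean: {baseline.mean} -> {wider.mean}")
print(f"P(hot outlier): {p_hot_baseline:.3f} -> {p_hot_wider:.3f}")
```

In this toy case the cold tail grows by the same amount, so looking at both tails together is what separates a change in spread from a change in the mean.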
For anyone interested, you can also get at the raw data without signing up for enigma.io:
Your first point is very well taken - we also urge caution in drawing too many conclusions regarding an overall trend in mean temperatures. However, I think our conclusion - that we are having more hot outlier days now than ever before - is interesting in and of itself.
Your second point is an interesting one, and I too would be curious to look further into the correlation between proximity to a city and temperature. However, to look for outliers, we created a long-running distribution of seasonal temperatures for each station individually - so in some sense the map is already corrected for this effect. Each anomaly you see is an anomaly for that station alone - meaning if an urban station regularly gets higher temperatures than a rural one, it will take a proportionally higher temperature to trigger an anomaly on the former than on the latter.
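A minimal sketch of why a per-station threshold absorbs a constant offset. The two stations, the +3 F heat-island offset, and the helper name `hot_threshold` are all hypothetical; this is not the code behind the map, and it only covers an offset that stays constant over time.

```python
import random
from statistics import quantiles


def hot_threshold(temps):
    """Return the 98th-percentile cut point of one station's own history."""
    # n=50 yields 49 cut points at 2%, 4%, ..., 98%; the last one is the 98th.
    return quantiles(temps, n=50)[-1]


rng = random.Random(0)
# Hypothetical rural station: 1,000 daily highs in whole degrees F.
rural = [round(rng.gauss(75, 8)) for _ in range(1000)]
# Hypothetical urban station: same weather plus an assumed constant +3 F offset.
urban = [t + 3 for t in rural]

rural_cut = hot_threshold(rural)
urban_cut = hot_threshold(urban)

# Days flagged as hot anomalies, each judged against its own station's cutoff.
rural_days = [i for i, t in enumerate(rural) if t > rural_cut]
urban_days = [i for i, t in enumerate(urban) if t > urban_cut]

print("cutoff difference:", urban_cut - rural_cut)
print("same days flagged:", rural_days == urban_days)
```

The urban cutoff sits about 3 F above the rural one, and the same days are flagged at both stations. A heat-island effect that grows over the years would not cancel this way.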
Furthermore, NOAA has been good about getting a good national distribution of these stations, so it's less of a concern than you might think.
That said, urban stations may still show artifacts compared to rural ones. For example, after an extreme warm outlier, cities may be more likely to have another warm outlier the following day due to the heat storage effect you mention. I'm not sure.
"However, over the Northern Hemisphere land areas where urban heat islands are most apparent, both the trends of lower-tropospheric temperature and surface air temperature show no significant differences. In fact, the lower-tropospheric temperatures warm at a slightly greater rate over North America (about 0.28°C/decade using satellite data) than do the surface temperatures (0.27°C/decade), although again the difference is not statistically significant. "
Some GISS temperature data, for example, excludes urban stations, which are identified by night lights in satellite images. The remaining rural stations show similar trends.
More at the ever-resourceful myth-collecting Skeptical Science site. http://www.skepticalscience.com/urban-heat-island-effect.htm
In this particular case, they are using GHCN Daily data that does not include any correction for this effect:
The GHCN Monthly data does include such corrections:
The magnitude of the changes made is quite large compared to the effects being measured. They average to zero, but are bimodal, with peaks around +1F and -1F:
I think it's a brilliant visualization, but if we are to presume the corrections are necessary and correct, it would also be reasonable to question what conclusions can be drawn from an analysis of data that does not include such corrections. At the least, I think it would be interesting to see their analysis applied to the more rural CRN1 and CRN2 stations versus the majority of lower quality CRN3, CRN4, and CRN5 stations that make up the bulk of the readings.
If you want to draw more conclusions, then you want different analysis anyway.
For anyone interested in playing a little with the global data in a very easy way, check out woodfortrees.org:
I think you mean we are having more outlier days now than in 1964, not ever.
As a quick data point right now: the snowpack in the High Sierra in CA above 11,000 feet differs markedly from that below. This observed data conflicts with reported NOAA data because of sensor location issues.
We are having a drought in CA and in the Sierras, but the snowpack, even versus last year, is anomalous only at certain altitudes. From what I've heard, our storms have only been precipitating above a threshold floor (for various reasons). This is something that we've experienced in previous years as well (to some extent, in 2013).
Wind-energy potential is also very non-uniform:
Just for a quick example. This involves altitude (e.g., wind/weather shadows) as much as micro-topography (ridgelines, etc.). Again, this can affect sensor-reported anomalies (micro-climates, etc.).
So a bit of caution when extrapolating to things like continental scale.
And in terms of the energy surplus (or deficit) involved, these are huge changes. It takes a lot of energy to heat (or cool) the world.
Why would this be intuitive at all?
First, if you play the animation and look at the dots on the map and have a basic understanding of how the US population is distributed, I just can't see how you would draw your conclusion. The dots are pretty evenly distributed geographically.
But even without the visualization, why would you assume there were more reporting stations in urban areas? Why would people be inclined to waste time and money supporting a new reporting station if there is already one close by?
Finally, the "urban heat islands are skewing the data and climate change is not real" argument has been debunked so thoroughly that it is hard to believe anyone who brings it up brings it up in good faith. Especially when it is phrased as an "innocuous question" which could have been answered with 30 seconds and your favorite search engine.
It does if the distribution is normal and the trend in the outliers is asymmetrical (as it is in this case). Because temperature distributions are (almost certainly) the result of many additive factors, they are approximately normal by the central limit theorem.
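A rough sketch of the asymmetry argument, assuming a normal shape with fixed spread and an illustrative mean shift of half a standard deviation (both are assumptions, not measured values). Under those assumptions, a shift in the mean makes hot outliers more common and cold outliers rarer at the same time.

```python
from statistics import NormalDist

# Assumed, illustrative distributions in standardized temperature units.
baseline = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=0.5, sigma=1.0)  # same spread, warmer mean

# Outlier cutoffs are fixed from the baseline: bottom and top 2%.
cold_cutoff = baseline.inv_cdf(0.02)
hot_cutoff = baseline.inv_cdf(0.98)

# Tail probabilities under the shifted distribution.
p_cold = shifted.cdf(cold_cutoff)
p_hot = 1.0 - shifted.cdf(hot_cutoff)

print(f"P(cold outlier): 0.020 -> {p_cold:.3f}")
print(f"P(hot outlier):  0.020 -> {p_hot:.3f}")
```

This only shows that a mean shift produces the asymmetric pattern when normality holds. Whether real station temperatures are close enough to normal is a separate question.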
One comment I might have is that your model assumes the average temperature on a given day of the year is constant across years. I would have to spend some time thinking of how to control for it, but have you considered the impact of climatic oscillation? Your data seems to reflect these patterns to some degree, and removing/reducing them might make the overall shift clearer.
Sure, some will claim that, "Alaska is always cold, and Hawai'i is always hot," which is a typical fallacy.
People surf in Alaska, and it snows in Hawai'i. (We just had a hailstorm a few weeks ago.)
Back in the 1970s I remember seeing snow fall halfway down the slopes of 13,796-foot Mauna Kea, something we don't see too often anymore.
Alaska's record low (-80F) is a little bit colder than Hawaii's record low (15F) though.
If you want that kind of temp without worrying about island fever, try any city in Southern California that's within 20-50 miles of the Pacific Ocean...
ex: Manhattan Beach, CA
Average low temp is 25f. Lovely place.
I don't agree with the decision to include a political argument in the writeup, but I suppose that's to be expected.
EDIT: Not working in Chrome, but it works in IE.
Have you considered aggregating a little more, so that years move by faster?
I bet there are some cheap wins you can find in there OP! Keep up the good work.
"Armed with this refined dataset, we computed the historical range of low and high temperatures for each station for each month of the year. We then compared each station's daily temperatures to its corresponding monthly distribution. If one or both of these measurements fell in the bottom or top 2% on a given day, we labeled it an "anomaly" according to the typology above."
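A minimal sketch of the procedure that passage describes, assuming records arrive as `(station, month, tmin, tmax)` tuples. The function names, the record layout, and the label strings are hypothetical; the article's actual typology and code are not shown here.

```python
from collections import defaultdict
from statistics import quantiles


def build_cutoffs(history):
    """Compute 2nd/98th percentile cutoffs per (station, month).

    history: iterable of (station, month, tmin, tmax) tuples.
    Returns {(station, month): {"tmin": (p2, p98), "tmax": (p2, p98)}}.
    """
    grouped = defaultdict(lambda: {"tmin": [], "tmax": []})
    for station, month, tmin, tmax in history:
        grouped[(station, month)]["tmin"].append(tmin)
        grouped[(station, month)]["tmax"].append(tmax)

    cutoffs = {}
    for key, series in grouped.items():
        cutoffs[key] = {}
        for name, values in series.items():
            # n=50 gives 49 cut points at 2%, 4%, ..., 98%.
            cuts = quantiles(values, n=50)
            cutoffs[key][name] = (cuts[0], cuts[-1])
    return cutoffs


def label_day(cutoffs, station, month, tmin, tmax):
    """Return (hypothetical) labels for measurements in the bottom or top 2%."""
    labels = []
    for name, value in (("tmin", tmin), ("tmax", tmax)):
        low_cut, high_cut = cutoffs[(station, month)][name]
        if value < low_cut:
            labels.append(f"cold {name}")
        elif value > high_cut:
            labels.append(f"warm {name}")
    return labels


# Toy history: one made-up station, July, 100 days of invented readings.
history = [("STN1", 7, 50 + d % 20, 70 + d % 25) for d in range(100)]
cutoffs = build_cutoffs(history)
print(label_day(cutoffs, "STN1", 7, tmin=60, tmax=120))
```

A day is an "anomaly" here if the returned list is non-empty, which matches the "one or both of these measurements" wording in the quote.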
In case it wasn't apparent to you, there were a number of fairly obvious (let's call them "junior") errors of reasoning embedded in the original posting. As in, the kinds of things we would have readily lost points for in that undergraduate-level "quantitative reasoning" course we crammed through on the way to fulfilling our social science degree.
Wholly independent, mind you, of all the fancy-schmancy talk about generalized linear models and whatnot. Which, while being quite nice-sounding, serve only to distract from the more basic (and glaring) errors of inference in the surrounding text.
Which, in turn, is why the label "dataviz" is a more than appropriate description of the otherwise quite fun and entertaining HTML5 demo provided by the Enigma team.
"Data science", however, it is not.
But it's hard to take the data seriously when the y-axis is rarely labeled...
"The NOAA, NASA and the Hadley Center press releases should be ignored. The reason which is expanded on with case studies in the full report is that the surface based data sets have become seriously flawed and can no longer be trusted for climate trend or model forecast assessment in decision making by congress or the EPA."