A couple of cautions about drawing conclusions from this data, though:
1. A trend in the outliers of a distribution does not imply the same trend in the mean values of the distribution. Although I'm sure there's an upward trend in the mean values, it's not good science to encourage people to jump to that second conclusion. (Not without the supporting data at least.)
2. What about sampling bias? Intuitively, it seems like the reporting weather stations would not be uniformly distributed geographically, but would rather show a higher concentration near urban areas, which store and release extra heat. How large is this effect, and how could we correct for it?
For anyone interested, you can also get at the raw data without signing up for enigma.io:
http://www.ncdc.noaa.gov/oa/climate/ghcn-daily/
Your first point is very well taken - we also urge caution in drawing too many conclusions regarding an overall trend in mean temperatures. However, I think our conclusion - that we are having more hot outlier days now than ever before - is interesting in and of itself.
Your second point is an interesting one, and I too would be curious to look further into the correlation between proximity to a city and temperature. However, to look for outliers, we built a long-term distribution of seasonal temperatures for each station individually - so in some sense the map is already corrected for this effect. Each anomaly you see is an anomaly for that station alone - meaning that if an urban station regularly gets higher temperatures than a rural one, it will take a correspondingly higher temperature to trigger an anomaly at the former than at the latter.
Furthermore, NOAA has done a good job of distributing these stations nationally, so it's less of a concern than you might think.
That said, urban stations may still show artifacts compared to rural ones; e.g., after an extreme warm outlier, cities may be more likely to have another warm outlier the following day due to the heat storage effect you mention. I'm not sure.
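Roughly, a per-station baseline like the one described above could be sketched as follows. This is a minimal Python sketch, not the project's actual code: it assumes a per-station, per-calendar-month baseline and 5th/95th percentile cutoffs, and the record format and function names (`seasonal_thresholds`, `classify`) are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles

def seasonal_thresholds(records, low_pct=5, high_pct=95):
    """Per-station, per-month percentile cutoffs built from each station's own history.

    records: iterable of (station_id, month, temp) tuples (hypothetical format).
    Returns {(station_id, month): (low_cutoff, high_cutoff)}.
    """
    history = defaultdict(list)
    for station, month, temp in records:
        history[(station, month)].append(temp)
    cutoffs = {}
    for key, temps in history.items():
        pct = quantiles(temps, n=100)  # 99 cut points; pct[k - 1] is the k-th percentile
        cutoffs[key] = (pct[low_pct - 1], pct[high_pct - 1])
    return cutoffs

def classify(station, month, temp, cutoffs):
    """Label a reading relative to that station's own seasonal distribution."""
    low, high = cutoffs[(station, month)]
    if temp > high:
        return "hot"
    if temp < low:
        return "cold"
    return "normal"

# Toy July histories: the urban station runs about 6 degrees warmer than the rural one.
records = [("urban", 7, 28 + i * 0.1) for i in range(61)]
records += [("rural", 7, 22 + i * 0.1) for i in range(61)]
cutoffs = seasonal_thresholds(records)
print(classify("rural", 7, 31.0, cutoffs), classify("urban", 7, 31.0, cutoffs))  # prints: hot normal
```

Because each station is compared only with its own history, the same 31-degree reading counts as a hot outlier at the cooler rural station but is unremarkable at the warmer urban one.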
About the second point, the U.S. population has been growing (from 179 million in 1960 to 308 million in 2010, according to the US Census). So a station that was in the same location in 2014 as in 1964 could well have more urban surroundings in 2014. In fact, on average this will almost surely be the case. Since more urban surroundings lead to higher temperatures, this must be a biasing factor. Does anybody have any idea how large this bias is? Is there any literature on the issue?
"However, over the Northern Hemisphere land areas where urban heat islands are most apparent, both the trends of lower-tropospheric temperature and surface air temperature show no significant differences. In fact, the lower-tropospheric temperatures warm at a slightly greater rate over North America (about 0.28°C/decade using satellite data) than do the surface temperatures (0.27°C/decade), although again the difference is not statistically significant. "
Yes, it's been discussed extensively, even in public, for at least a decade. In blogs, comment sections, and columns, "but it's just the urban heat island" is a common myth that pops up all the time and has to be debunked constantly.
Some GISS temperature data, for example, excludes urban stations, which are identified by their night lights in satellite images. These rural stations also show similar trends.
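For anyone wanting to try that kind of comparison, here is a toy Python sketch: fit an ordinary least-squares trend per station (in degrees per decade, the unit used in the quote above) and average it for a rural subset versus all stations. It is not GISS's procedure; the input format and the `is_rural` flags (standing in for something like a night-lights classification) are assumptions.

```python
from statistics import mean

def trend_per_decade(years, temps):
    """Ordinary least-squares slope of temps against years, scaled to degrees per decade."""
    x_bar, y_bar = mean(years), mean(temps)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, temps))
    den = sum((x - x_bar) ** 2 for x in years)
    return 10 * num / den

def compare_groups(series_by_station, is_rural):
    """Return (mean trend of rural stations, mean trend of all stations).

    series_by_station: {station_id: (years, annual_mean_temps)} (hypothetical format)
    is_rural: {station_id: bool}, e.g. from a night-lights classification (assumed)
    """
    trends = {s: trend_per_decade(y, t) for s, (y, t) in series_by_station.items()}
    rural = [v for s, v in trends.items() if is_rural[s]]
    return mean(rural), mean(trends.values())
```

If the rural-only average comes out close to the all-station average, the urban stations are not driving the overall trend in that sample.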
I'm in danger of doing it too, but I think you are reacting defensively to a valid, specific question because of your views on the larger climate reality. There is definitely an effect on individual stations from changes to their surrounding environment. The question is whether the corrections applied to account for it significantly affect the results of any given analysis.
The magnitude of the adjustments is quite large compared to the effects being measured. They average to zero, but the distribution is bimodal, with peaks at about +1F and -1F:
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/techreports/Technical%20Report%20NCDC%20No12-02-3.2.0-29Aug12.pdf
I think it's a brilliant visualization, but if we are to presume the corrections are necessary and correct, it would also be reasonable to question what conclusions can be drawn from an analysis of data that does not include such corrections. At the least, I think it would be interesting to see their analysis applied to the better-sited CRN1 and CRN2 stations versus the lower-quality CRN3, CRN4, and CRN5 stations that make up the bulk of the readings.
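One way to run that comparison, sketched in Python under stated assumptions: given observations already labeled hot/cold/normal and a hypothetical lookup from each station to its CRN siting rating, compute the share of hot outliers per decade separately for CRN1-2 and CRN3-5 stations. The function name and input formats are illustrative, not from the project.

```python
from collections import defaultdict

def hot_share_by_siting(observations, crn_class):
    """Share of observations labeled 'hot', split by siting group and decade.

    observations: iterable of (station_id, year, label), label in {'hot', 'cold', 'normal'}
    crn_class: {station_id: CRN rating 1-5} (hypothetical lookup)
    Returns {(group, decade): fraction_hot}.
    """
    hot = defaultdict(int)
    total = defaultdict(int)
    for station, year, label in observations:
        group = "CRN1-2" if crn_class[station] <= 2 else "CRN3-5"
        key = (group, year // 10 * 10)
        total[key] += 1
        if label == "hot":
            hot[key] += 1
    return {key: hot[key] / total[key] for key in total}
```

If the well-sited group shows a similar rise in hot outliers over the decades, siting quality is less likely to be what drives the result.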
Adding to the parent above, a few more considerations on anomalies and sensor distribution.
As a quick data point: right now, the snowpack in the High Sierra in CA above 11,000 feet is very different from that below. This observed data conflicts with reported NOAA data because of sensor-location issues.
We are having a drought in CA and in the Sierras, but the snowpack, even compared with last year, is anomalous only at certain altitudes. From what I've heard, our storms have only been precipitating above a threshold elevation (for various reasons). We've experienced this in previous years as well (to some extent, in 2013).
Wind-energy potential is also very non-uniform:
Just as a quick example. It depends on altitude (e.g., wind/weather shadows) as much as on micro-topography (ridgelines, etc.), and both can affect sensor-reported anomalies (micro-climates, etc.).
So use a bit of caution when extrapolating to something like a continental scale.
You are currently grouping low daily maximums and low daily minimums together, and the same for highs. I would be interested in comparing them separately, to test and quantify urban-heat-related theories that daily minimums have increased much more than daily maximums.
Great point. We grouped the anomalies into simpler categories in order to make the project a bit more digestible. However, some of our initial analyses suggested that daily minimums have indeed increased more than daily maximums. Do you have any links to journal articles that discuss this theory?
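To keep daily maximums and minimums apart, one option is to carry the GHCN-Daily element code (TMAX or TMIN) through the counting step instead of collapsing it. A minimal Python sketch, assuming each record has already been labeled hot/cold/normal; the tuple format and the function names are hypothetical.

```python
from collections import Counter

def anomaly_counts(records):
    """Count non-normal readings keyed by (element, decade, label).

    records: iterable of (element, year, label), where element is a GHCN-Daily
    code such as 'TMAX' or 'TMIN' and label is 'hot', 'cold' or 'normal'.
    """
    counts = Counter()
    for element, year, label in records:
        if label != "normal":
            counts[(element, year // 10 * 10, label)] += 1
    return counts

def hot_tmin_per_hot_tmax(counts, decade):
    """Hot-TMIN anomalies per hot-TMAX anomaly in a decade (None if there were no hot TMAX)."""
    hot_max = counts[("TMAX", decade, "hot")]
    if hot_max == 0:
        return None
    return counts[("TMIN", decade, "hot")] / hot_max
```

A ratio that climbs decade over decade would be consistent with minimums warming faster than maximums, though it would still need checking against the station-level data.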
The thing about weather temperature outliers, however, is that they're highly relevant to the effects that result from them: a hard freeze at an unexpected time, or a colder seasonal low, can have huge impacts on agriculture and infrastructure (roads, pipes, overhead wires, and structures can be damaged by freezing or ice storms); similarly, heat can result in ruined crops, heat deaths, drought, and fires. Diseases, pests, and parasites can also be strongly influenced by weather.
And in terms of the energy surplus (or deficit) involved, these are huge changes. It takes a lot of energy to heat (or cool) the world.
> Intuitively, it seems like the reporting weather stations would not be uniformly distributed geographically, but would rather show a higher concentration near urban areas
Why would this be intuitive at all?
First, if you play the animation and look at the dots on the map and have a basic understanding of how the US population is distributed, I just can't see how you would draw your conclusion. The dots are pretty evenly distributed geographically.
But even without the visualization, why would you assume there were more reporting stations in urban areas? Why would people be inclined to waste time and money supporting a new reporting station if there is already one close by?
Finally, the "urban heat islands are skewing the data and climate change is not real" argument has been debunked so thoroughly that it is hard to believe anyone who brings it up brings it up in good faith. Especially when it is phrased as an "innocuous question" which could have been answered with 30 seconds and your favorite search engine.
> A trend in the outliers of a distribution does not imply the same trend in the mean values of the distribution
It does if the distribution is normal and the trend in the outliers is asymmetric (as it is in this case). Because temperature distributions are (almost certainly) the result of many additive factors, they are normal by the central limit theorem.
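The argument can be checked numerically. This Python sketch takes the normality assumption at face value, fixes the outlier thresholds at the 5th and 95th percentiles of a baseline N(0, 1), and compares a hypothetical shift in the mean (0.2 standard deviations) with a hypothetical widening of the spread (standard deviation 1.2). The numbers are illustrative, not fitted to any station data.

```python
from statistics import NormalDist

def tail_probabilities(mu, sigma, low, high):
    """Return (P(X < low), P(X > high)) for X ~ Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(low), 1 - dist.cdf(high)

# Outlier thresholds fixed at the 5th and 95th percentiles of a baseline N(0, 1).
baseline = NormalDist(0, 1)
LOW, HIGH = baseline.inv_cdf(0.05), baseline.inv_cdf(0.95)

before = tail_probabilities(0.0, 1.0, LOW, HIGH)   # roughly 5% cold, 5% hot
shifted = tail_probabilities(0.2, 1.0, LOW, HIGH)  # mean moved up by 0.2 sd
wider = tail_probabilities(0.0, 1.2, LOW, HIGH)    # same mean, larger spread

for name, (cold, hot) in [("before", before), ("shifted", shifted), ("wider", wider)]:
    print(f"{name:8s} cold={cold:.3f} hot={hot:.3f}")
```

Shifting the mean makes cold outliers rarer and hot ones more common, an asymmetric change, while widening the spread alone makes both tails grow by the same amount. Under the normality assumption, that is why an asymmetric trend in the outliers points to a shift in the mean.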