U.S. Daily Temperature Anomalies 1964-2013 (enigma.io)
206 points by dandelany on April 10, 2014 | 56 comments



Absolutely beautiful visualization.

A couple cautions about drawing conclusions from this data though:

1. A trend in the outliers of a distribution does not imply the same trend in the mean values of the distribution. Although I'm sure there's an upward trend in the mean values, it's not good science to encourage people to jump to that second conclusion. (Not without the supporting data at least.)

2. What about sampling bias? Intuitively, it seems like the reporting weather stations would not be uniformly distributed geographically, but would rather show a higher concentration near urban areas, which store and release extra heat. How large is this effect, and how could we correct for it?

For anyone interested, you can also get at the raw data without signing up for enigma.io:

http://www.ncdc.noaa.gov/oa/climate/ghcn-daily/


Thanks for the excellent comment.

Your first point is very well taken - we also urge caution in drawing too many conclusions regarding an overall trend in mean temperatures. However, I think our conclusion - that we are having more hot outlier days now than ever before - is interesting in and of itself.

Your second point is an interesting one, and I too would be curious to look further into the correlation between proximity to a city and temperature. However, to look for outliers, we created a long-running distribution of seasonal temperatures for each station individually - so in some sense the map is already corrected for this effect. Each anomaly you see is an anomaly for that station alone - meaning if an urban station regularly gets higher temperatures than a rural one, it will take a proportionally higher temperature to trigger an anomaly on the former than on the latter.

Furthermore, NOAA has been good about getting good national distribution of these stations so it's less of a concern than you might think.

That said, urban stations may still show artifacts compared to rural ones, e.g. when there is an extreme warm outlier, cities may be more likely to have another warm outlier the following day due to the heat storage effect you mention. I'm not sure.
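A minimal sketch of the per-station idea described above, assuming a plain list of (station, month, temperature) records and the 2% tail cutoff quoted from the page elsewhere in this thread. This is not Enigma's actual pipeline; the names `build_thresholds` and `classify` and the record layout are hypothetical.

```python
from collections import defaultdict


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 1]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("no values")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def build_thresholds(records, tail=0.02):
    """records: iterable of (station_id, month, temperature).

    Returns {(station_id, month): (low_cut, high_cut)}, so every station
    is only ever compared against its own history for that month.
    """
    groups = defaultdict(list)
    for station, month, temp in records:
        groups[(station, month)].append(temp)
    thresholds = {}
    for key, temps in groups.items():
        temps.sort()
        thresholds[key] = (percentile(temps, tail), percentile(temps, 1 - tail))
    return thresholds


def classify(station, month, temp, thresholds):
    """Label one reading as 'low', 'high', or 'normal' for its own station."""
    low_cut, high_cut = thresholds[(station, month)]
    if temp < low_cut:
        return "low"
    if temp > high_cut:
        return "high"
    return "normal"
```

With this setup, a station that runs consistently warmer gets a higher `high_cut`, so the same reading can be an anomaly at a cooler station and unremarkable at a warmer one. It does not address the separate concern of a station's surroundings changing over the 50 years.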


About the second point, the U.S. population has been growing (from 179 million in 1960 to 308 million in 2010 according to the US Census). So a particular station that was in the same location in 2014 as it was in 1964 could well have a more urban surrounding in 2014. In fact, one knows that surely on average this will be the case. Since more urban surroundings lead to higher temperatures, this must be a biasing factor. Does anybody have any idea how large this biasing factor is? Is there any literature on that issue?


Apparently the factor has been recognized and analyzed.

http://www.grida.no/publications/other/ipcc_tar/?src=/climat...

"However, over the Northern Hemisphere land areas where urban heat islands are most apparent, both the trends of lower-tropospheric temperature and surface air temperature show no significant differences. In fact, the lower-tropospheric temperatures warm at a slightly greater rate over North America (about 0.28°C/decade using satellite data) than do the surface temperatures (0.27°C/decade), although again the difference is not statistically significant. "


Yes it's been talked about extensively even in the public for over a decade at least. In blogs and comment fields and columns the "but it's just the urban heat island" is a common myth that pops up all the time and has to be debunked constantly.

Some GISS temperature data, for example, excludes urban stations, which are classified by night lights in satellite images. These rural stations also show similar trends.

More at the ever resourceful myth-collecting Skeptical Science site. http://www.skepticalscience.com/urban-heat-island-effect.htm


I'm in danger of doing it also, but I think you are reacting defensively to a valid specific question because of your views on the larger climate reality. There is definitely an effect on individual stations as a result of changes to their surrounding environment. The question is whether the corrections applied to correct for it significantly affect the results of any given analysis.

In this particular case, they are using GHCN Daily data that does not include any correction for this effect: http://www.ncdc.noaa.gov/oa/climate/ghcn-daily/index.php?nam...

The GHCN Monthly data does include such corrections: https://www.ncdc.noaa.gov/ghcnm/v3.php?asection=homogeneity_...

The magnitude of the changes made is quite large compared to the effects being measured. They average to zero, but are bimodal, centered around about +1F and -1F: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/techreports/Technical%20Report%20NCDC%20No12-02-3.2.0-29Aug12.pdf

I think it's a brilliant visualization, but if we are to presume the corrections are necessary and correct, it would also be reasonable to question what conclusions can be drawn from an analysis of data that does not include such corrections. At the least, I think it would be interesting to see their analysis applied to the more rural CRN1 and CRN2 stations versus the majority of lower quality CRN3, CRN4, and CRN5 stations that make up the bulk of the readings.


Yes, you are correct. I was reacting defensively.

If you want to draw more conclusions, then you want a different analysis anyway.

For anyone interested in playing a little with the global data in a very easy way, check out woodfortrees.org:

http://woodfortrees.org/plot/gistemp-dts/from:1960/mean:12/p...


> I think our conclusion - that we are having more hot outlier days now than ever before

I think you mean we are having more outlier days now than in 1964, not ever.


Adding to the parent above, a few more considerations on anomaly and sensor distribution.

As a quick datapoint right now: the snowpack in the High Sierra in CA above 11,000 feet differs sharply from that below. This observed data conflicts with reported NOAA data because of sensor location issues.

http://www.nohrsc.noaa.gov/nsa/

We are having a drought in CA and in the Sierras, but the snowpack, even versus last year, is anomalous only at certain altitudes. From what I've heard, our storms have only been precipitating above a threshold floor (for various reasons). This is something that we've experienced in previous years as well (to some extent, in 2013).

The Wind-energy potential is also very non-uniform:

http://www.thepelicanpost.org/wp-content/uploads/2011/08/US_...

Just as a quick example. This involves altitude (e.g., wind/weather shadows) as much as micro-topography (ridgelines, etc.), which again can impact sensor-reported anomalies (micro-climates, etc.).

So a bit of caution when extrapolating to things like continental scale.


You are currently grouping low daily maximums and low daily minimums together, and the same for high. I would be interested in being able to compare those separately, to test and quantify urban-heat-related theories that daily minimums have increased much more than daily maximums have increased.


Great point. We grouped the anomalies into simpler categories in order to make the project a bit more digestible. However, some of our initial analyses suggested that daily minimums have indeed increased more than daily maximums. Do you have any links to journal articles that discuss this theory?


The thing about weather temperature outliers, however, is that they're highly relevant because of the effects that result from them: a hard freeze at an unexpected time, or a colder seasonal low, can have huge impacts on agriculture and infrastructure (roads, pipes, overhead wires, and structures can be damaged by freezing or ice storms). Similarly, heat can result in ruined crops, heat deaths, drought, and fires. Diseases, pests, and parasites can also be strongly influenced by weather.

And in terms of the energy surplus (or deficit) represented, these are huge changes being represented. Takes a lot of energy to heat (or cool) the world.


> Intuitively, it seems like the reporting weather stations would not be uniformly distributed geographically, but would rather show a higher concentration near urban areas

Why would this be intuitive at all?

First, if you play the animation and look at the dots on the map and have a basic understanding of how the US population is distributed, I just can't see how you would draw your conclusion. The dots are pretty evenly distributed geographically.

But even without the visualization, why would you assume there were more reporting stations in urban areas? Why would people be inclined to waste time and money supporting a new reporting station if there is already one close by?

Finally, the "urban heat islands are skewing the data and climate change is not real" argument has been debunked so thoroughly that it is hard to believe anyone who brings it up brings it up in good faith. Especially when it is phrased as an "innocuous question" which could have been answered with 30 seconds and your favorite search engine.


> A trend in the outliers of a distribution does not imply the same trend in the mean values of the distribution

It does if the distribution is normal and the trend in the outliers is asymmetrical (as it is in this case). Because temperature distributions are (almost certainly) the result of many additive factors, they are normal by the central limit theorem.
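A small numeric sketch of this argument, assuming temperatures at a station really are normally distributed and using the 2% cutoff from the writeup. The function name `tail_fractions` and its parameters are made up for illustration; nothing here comes from the actual analysis.

```python
from statistics import NormalDist


def tail_fractions(mean_shift, sigma_scale=1.0, pct=0.02):
    """Return (cold_fraction, hot_fraction) beyond the baseline cutoffs.

    The baseline is a standard normal. The "current" climate is the same
    normal shifted by mean_shift (in baseline standard deviations) and
    widened by sigma_scale. Cutoffs are the baseline's pct and 1 - pct
    quantiles.
    """
    baseline = NormalDist(0.0, 1.0)
    low_cut = baseline.inv_cdf(pct)
    high_cut = baseline.inv_cdf(1.0 - pct)
    current = NormalDist(mean_shift, sigma_scale)
    cold = current.cdf(low_cut)
    hot = 1.0 - current.cdf(high_cut)
    return cold, hot


# Mean shifted up by half a standard deviation: hot outliers rise, cold fall.
cold_shift, hot_shift = tail_fractions(0.5)

# Same mean but a wider spread: both tails grow by the same amount.
cold_wide, hot_wide = tail_fractions(0.0, sigma_scale=1.2)
```

Under these assumptions a shift in the mean produces an asymmetric change in the tails, while a change in spread alone produces a symmetric one. That is why the asymmetry matters to the argument, and also why the original caution holds when the distribution is not normal or its variance is changing.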


If you increase the level of rigorousness enough, you can't say anything about the world based on any data.


Loved It!

One comment I might have is that your model assumes the average temperature on a given day of the year is constant across years. I would have to spend some time thinking of how to control for it, but have you considered the impact of climatic oscillation? Your data seems to reflect these patterns to some degree, and removing/reducing them might make the overall shift clearer.

http://en.wikipedia.org/wiki/Effects_of_the_El_Ni%C3%B1o%E2%...


Yeah, that is an interesting point and highlights the difference between "weather" and "climate" - it also brings up a tension between two different objectives of the chart: to visualize weather patterns over time intuitively, and to draw general conclusions about climate trends. In the context of the first intention, climatic oscillations like El Niño are interesting signals - you can see how they affect weather throughout the country in unexpected ways. But in the context of the latter goal, they are noise which should be filtered out/corrected for.


That makes sense; my only thought was that if you are graphing "anomalies" you might want to filter out non-anomalous behavior. Higher highs or lower lows are actually to some degree expected in those years. I suppose it could be best not to control for oscillatory behavior though, as the effect of any climate shift on those oscillations is possibly not insignificant.


Gorgeous visualization! Takes a really long time to watch though, even after clicking the "up" on speed a bunch of times, and so it's hard to get the same sense from watching the points on the map as you get instantly from the chart under the map.

Have you considered aggregating a little more, so that years move by faster?


I'm probably being blind, but where are the units for the Y axis on the top graph?


You aren't blind :) We just had trouble fitting a y-scale on the top chart that was both clean and readable and didn't get in the way of the time slider... However this is just supposed to be an overview of the trend - the proportional bar chart further down the page is the same data, plotted in more depth and explained further.


Too bad the graphics only show the lower 48 states.

Sure, some will claim that, "Alaska is always cold, and Hawai'i is always hot," which is a typical fallacy.

People surf in Alaska, and it snows in Hawai'i. (We just had a hailstorm a few weeks ago.)

http://www.hawaiinewsnow.com/story/25102371/flash-flood-watc...

Back in the 1970's I remember seeing snow fall halfway down the slopes of 13,796 foot Mauna Kea, something we don't see too often anymore.


Little known fact: Hawaii's record high temperature (100F) is the same as Alaska's record high temperature (100F), and both are the lowest record high temperatures for the 50 states.

Alaska's record low (-80F) is a little bit colder than Hawaii's record low (15F) though.


I'd love to live in Hawaii. It sounds like heaven on Earth.


It is. Not so much if you have to work 2-3 jobs though. Or if you happen to have island fever.

If you want that kind of temp without worrying about island fever, try any city in Southern California that's within 20-50 miles of the Pacific Ocean...

ex: Manhattan Beach, CA http://www.weather.com/weather/wxclimatology/monthly/graph/U...

Average low temp is 25f. Lovely place.


Click and drag to pan the map - Alaska, Hawaii and Puerto Rico all have data :)


Thank you... Not sure why I didn't notice this initially. (blames outdated mobile device)


Hail is not the same as snow. When I lived in the mid-atlantic region, we got hail almost exclusively in the summer, as it was typically caused by thunderstorms.


Wouldn't it be better to use a rolling window rather than monthly distribution for anomaly detection? One might expect more anomalies in the shoulder months using the monthly distribution.


Yes, that's a good point. It would be better to create one distribution for each day of the year, with the values drawn from the 2-3 weeks before / after that day, but we chose a month to minimize the computational complexity. Presumably these artifacts wouldn't affect the aggregate trend over time, only the appearance of more outliers at the beginning of March / October, etc. each year. We also were drawing inspiration from NOAA's Climate Extremes Index which uses monthly data: https://www.ncdc.noaa.gov/extremes/cei/
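A rough sketch of the day-of-year window described above, assuming observations are (day_of_year, temperature) pairs for one station with days numbered 1 to 365. The half-width of 15 days is an arbitrary pick within the "2-3 weeks" mentioned, leap days are ignored, and the function names are hypothetical.

```python
def circular_day_distance(a, b, year_length=365):
    """Distance in days between two days of the year, wrapping at New Year."""
    d = abs(a - b) % year_length
    return min(d, year_length - d)


def window_values(observations, target_day, half_width=15, year_length=365):
    """Collect temperatures observed within half_width days of target_day.

    observations: iterable of (day_of_year, temperature) pooled over all
    years for a single station. Returns the temperatures sorted ascending,
    ready to be turned into percentile cutoffs for target_day.
    """
    return sorted(
        temp
        for day, temp in observations
        if circular_day_distance(day, target_day, year_length) <= half_width
    )
```

Because the window is centered on each day, early March is compared against late February and mid March rather than against all of March, which should reduce the shoulder-month artifact. The cost is one distribution per station per day instead of per month.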


Yes, this would be a good improvement to the data pipeline. I only built the front-end so I can't speak to exactly what difficulties would be involved, but I agree.


Really awesome visualization, but I didn't read the text because the columns were insanely wide on my 24" monitor. Might want to constrain the width a bit, chaps.


Thanks for the feedback! Just pushed an update with a max-width.


Nice work, really cool method for visualizing this data.

I don't agree with the decision to include a political argument in the writeup, but I suppose that's to be expected.


Is the data not loading for anyone else? It's been 10 minutes for me and all I get is "Loading data".

EDIT: Not working in Chrome, but it works in IE.


Hmm, that's odd - it was developed in Chrome and I haven't seen any Chrome-related issues after testing on several computers. Could be due to a badly-behaving browser extension? Let me know if you see any JS errors on the console, or if you figure out what was causing this...


A-ha! Looks like HTTPS-Everywhere was messing it up. After loading it Incognito it worked.


Neat visualization, 100 megabytes of ram consumption! Sometimes as much as 8 meg GC during a JavaScript frame, making the frame bloat to over 50ms of running time.

I bet there are some cheap wins you can find in there OP! Keep up the good work.


You should've seen it before we found all the cheap wins :) Seriously though, I'd be interested to hear any ideas you have - it has already been optimized quite a bit but it's just a lot of data (3 million rows over 50 years). It uses canvas and throws away all the circle references every day so that helps a lot.


I was wanting to do something just like this recently. Very visually pleasing to watch. Now we just need some slow-building dramatic melody to watch as our planet melts away!


There hasn't been any warming for 17 years. How can the graph point up? I'm sure I will get lambasted. But there hasn't been any warming for 17 years.


Would it be very hard to do this for a sample of all the stations world wide? The US centered view is kind of odd considering the subject matter.


Theoretically no, the database we used contains weather station data from all over the world. However during our analysis we found that the quality of data from international stations was a lot more varied - as measured by many factors such as geographical distribution of stations, regularity of measurements and number of implausible outliers. I would still like to come back to this eventually & try to iron out as many of those issues as possible - but for the scope of this project it was much easier to just use US data as it is extremely well-curated.


I'd be curious to see maximum and minimum temperatures modeled with an extreme value distribution. Might do it myself.
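One simple way to start on that, as a sketch only: fit a Gumbel distribution (the zero-shape special case of the generalized extreme value family) to a station's annual maxima by the method of moments. A full GEV fit with a shape parameter would need a proper optimizer; the helper names are hypothetical.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit: returns (mu, beta).

    Uses mean = mu + gamma * beta and variance = (pi * beta)^2 / 6.
    """
    beta = stdev(maxima) * math.sqrt(6) / math.pi
    mu = mean(maxima) - EULER_GAMMA * beta
    return mu, beta


def gumbel_exceedance(x, mu, beta):
    """P(block maximum > x) under Gumbel(mu, beta)."""
    return 1.0 - math.exp(-math.exp(-(x - mu) / beta))


def return_level(period, mu, beta):
    """Value exceeded on average once every `period` blocks (e.g. years)."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / period))
```

For annual minima, the usual trick is to negate the values, fit the same way, and negate the results back. Comparing fitted parameters between, say, the first and last decades would be one way to quantify a shift in the extremes.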


I couldn't find the answer to this (perhaps I missed it): what is defined as an outlier?


From the page:

"Armed with this refined dataset, we computed the historical range of low and high temperatures for each station for each month of the year. We then compared each station's daily temperatures to its corresponding monthly distribution. If one or both of these measurements fell in the bottom or top 2% on a given day, we labeled it an "anomaly" according to the typology above."


It's "dataviz", bro. Not data science.


Yo, downvoter:

In case it wasn't apparent to you, there were a number of fairly obvious (let's call them "junior") errors of reasoning embedded in the original posting. As in, the kinds of things we would have readily lost points for in that undergraduate-level "quantitative reasoning" course we crammed through on the way to fulfilling our social science degree.

Wholly independent, mind you, of all the fancy-schmancy talk about generalized linear models and whatnot. Which, while being quite nice-sounding, serve only to distract from the more basic (and glaring) errors of inference in the surrounding text.

Which, in turn, is why the label "dataviz" is a more than appropriate description of the otherwise quite fun and entertaining HTML5 demo provided by the Enigma team.

"Data science", however, it is not.


Great looking graphs.

But it's hard to take the data seriously when the y-axis is rarely labeled...


Can't wait to see 2014. Right up there with... hmm, maybe not.


Is it the raw, original weather station data, or the data the NOAA has been "adjusting" over and over to make the past cooler?


Joe D'Aleo, the first director of meteorology at the Weather Channel who now works with WeatherBELL, is explaining to anyone who will listen that NOAA did exactly that.

http://icecap.us/images/uploads/NOAAroleinclimategate.pdf

Closing line:

"The NOAA, NASA and the Hadley Center press releases should be ignored. The reason which is expanded on with case studies in the full report is that the surface based data sets have become seriously flawed and can no longer be trusted for climate trend or model forecast assessment in decision making by congress or the EPA."


AFAIK this is based on raw, original weather station data.


Strange coincidence: temperatures started going up after the EPA declared war on sulfur.


You have to ask yourself this question: If I saw a webpage like this, but with the opposite trend (or no trend), would I consider this to be strong evidence against global warming, or would I dismiss it as cherry picked data, a statistical artifact, etc.?


I think the solution to such an issue lies not at the macroeconomic level but at the micro-behavioral level of our culture and life habits. If we solve it there, all the macro factors will adapt accordingly.



