Ask HN: What are some good sources of climate data to analyze?
108 points by rpeden 6 months ago | 40 comments
I've noticed that articles mentioning climate and climate change that make it to the front page end up being flagged.

I think this is probably fair, because discussions about such articles invariably end up being political and don't end up contributing very positively to HN.

I certainly have thoughts and opinions on this topic, but upon reflection I realize they're not especially well informed. They're mainly based on what I've read in popular media, and I'm hoping I can do better than that.

So I'd like to approach it from another direction. If I want to get some raw climate data for myself, and analyze it using R or Python, what would be a good starting point?

Are there specific data sets that I should focus on? Are there any well known papers that I should read?

"No One Gives a Fuck About Climate Change" made the front page of Hacker News in June. This inspired a team to create the first Open API for Climate Data. It's a repository of carbon dioxide data as measured on Mauna Loa by NOAA.

The prototype is up and running with data from 1958 - 2017! http://api.carbondoomsday.com prototype frontend: http://carbondoomsday.com.

The project is open source on GitHub: https://github.com/giving-a-fuck-about-climate-change

The original blog post that inspired the project: No One Gives a Fuck About Climate Change (http://titojankowski.com/no-one-gives-a-fck-about-climate-ch...)

Would love to get your perspective and feedback! Maybe you want to follow the project on GitHub?

Also, continue this discussion with other like-minded hackers!

Join the "Giving Fucks about Climate Change" Google Group: https://groups.google.com/d/forum/giving-fucks-about-climate...

I have a Master's degree in physics from the University of Copenhagen, and I have taught together with one of the leading climate scientists from that university, Bo Møllesøe Vinther. He claimed that the best dataset on global average temperatures is the satellite-based University of Alabama in Huntsville (UAH) dataset, which goes back to 1979.

You can find it here: https://www.nsstc.uah.edu/climate/

Direct link for monthly temperature averages: http://www.nsstc.uah.edu/data/msu/v6.0beta/tlt/uahncdc_lt_6....
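If you want a first thing to do with a monthly anomaly series like this one, a least-squares trend is the usual starting point. Here is a minimal sketch using only the Python standard library. The `anomalies` list is made-up illustrative data, not values from the UAH file, and `linear_trend` is a hypothetical helper name. Parsing the real file is left out because its exact layout isn't shown here.

```python
from statistics import mean

def linear_trend(values):
    """Ordinary least-squares slope and intercept of values against their index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical monthly anomalies (deg C), made up for illustration only:
# a 0.001 deg C/month rise plus a small alternating wiggle.
anomalies = [0.001 * m + (0.05 if m % 2 else -0.05) for m in range(480)]

slope_per_month, intercept = linear_trend(anomalies)
trend_per_decade = slope_per_month * 120  # 120 months per decade
print(f"trend: {trend_per_decade:.3f} deg C per decade")
```

A straight-line fit says nothing about cause, and the result can be sensitive to the start and end months you pick, so treat it as a first look rather than a conclusion.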

Wouldn't you need data going back for thousands of years? 40 years of data won't tell you anything about the cause of the temperatures in this data set. You'll need to eliminate the possibility that we are in a Little Heat Wave cycle; if you got forty years of temperature data during the beginning of the Little Ice Age, you won't have a representative sample, for instance.

It's a unique dataset. It does not purport to be a multi-source, long-history dataset. You raise a question about long-term trends, which is another topic.


Not just data, but analysis tools as well. Run by a Berkeley physicist who was originally a climate change skeptic and changed his mind after doing this analysis.

I looked into this a while back and came to the conclusion that this is the best source.

To combat the "oh no they are 'corrected'" claims, they host the uncorrected, completely raw files too[1]. Then you can have the enjoyable task of trying to work out how to correct for sensors moving, cities growing, etc. Look for the "raw" datasets, as opposed to the Breakpoint Adjusted files.

[1] http://berkeleyearth.org/data/
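To give a feel for what "correcting" a raw record involves, here is a deliberately naive sketch in Python. It assumes you already know the month at which a station was moved and simply shifts the earlier segment so the means on either side of the break agree. This is not Berkeley Earth's method. `remove_step`, the window size, and the sample data are all hypothetical.

```python
from statistics import mean

def remove_step(series, break_index, window=12):
    """Shift everything before break_index so the means just before and
    just after the break agree. A naive correction for one known step."""
    if not 0 < break_index < len(series):
        raise ValueError("break_index must fall strictly inside the series")
    before = series[max(0, break_index - window):break_index]
    after = series[break_index:break_index + window]
    offset = mean(after) - mean(before)
    return [v + offset for v in series[:break_index]] + list(series[break_index:])

# Hypothetical station record: flat 10.0 deg C, then reads 0.5 deg C lower
# after the sensor is moved at month 24.
raw = [10.0] * 24 + [9.5] * 24
adjusted = remove_step(raw, break_index=24)
```

The obvious weakness is that any real change in climate that happens to coincide with the break gets erased too, which is one reason serious adjustments lean on comparisons with neighbouring stations.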

Rich Muller is a smart guy. I was at the conference where he announced he had changed his mind about climate change, and why. FWIW, his book "Physics for Future Presidents" -- while not really about climate change -- is very good too.

It's slightly tangential, but you might find this project, and the surrounding community, to be of interest.


Also, in addition to historical data that's already collected and available online, an interesting side note is that it's possible to directly receive some satellite data yourself using a relatively inexpensive SDR (Software Defined Radio) device. This article deals specifically with receiving images from NOAA satellites, but there are probably other signals floating around out there that would be useful to you.


The Apache project you link is not tangential at all -- it's spot on! See the examples at:


The people advising the project have significant publications and research experience. I believe some of their examples are replicating peer-reviewed work with that software.

Fair enough. I just meant tangential in the sense that the OP was asking about data, not software specifically. But yeah, it is very much about working with climate models. Unfortunately I can't claim any experience working with the project myself... I'm only familiar with it in passing.

The authors inhabit the same floor of the lab where I work, so I may have a touch of bias. ;-) I think some data sources are linked with the examples, but I have not checked myself.

I follow some websites casually, usually daily, to get an idea of what the numbers are showing me. It's just a ritual I go through in the mornings before going to work.

When I want to see where we are temperature-wise, I go to http://www.drroyspencer.com/latest-global-temperatures/ for the latest satellite monthly temperatures.

The Watts Up With That? website (which is strongly biased towards the "denier" side of the debate) has a great collection of graphs related to Arctic and Antarctic sea ice: http://wattsupwiththat.com/reference-pages/sea-ice-page/

I go to http://spaceweather.com/ to watch the sunspot numbers (just for curiosity's sake), http://sealevel.colorado.edu/ for sea level information, and http://www.cpc.noaa.gov/products/analysis_monitoring/ensostu... to see if we're in an El Niño or a La Niña (or somewhere in the middle).

Google's BigQuery has the following huge set of weather measurements, and you can use both the initial trial credit and a substantial free quota every month to run analysis on it right within BigQuery: https://bigquery.cloud.google.com/dataset/bigquery-public-da...

Quote: "This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. This dataset covers GSOD data between 1929 and present, collected from over 9000 stations.

Dataset Source: NOAA

Category: Weather

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: daily"
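Once you have daily records exported from a dataset like this, a common first step is collapsing them into annual means per station. Here is a minimal standard-library sketch. The field names (`stn`, `year`, `temp`), the Fahrenheit units, and the 9999.9 missing-value sentinel are my assumptions about the GSOD layout, so check them against the schema. The sample rows are made up.

```python
from collections import defaultdict

MISSING_TEMP = 9999.9  # assumed GSOD sentinel for a missing mean temperature

def annual_mean_celsius(rows):
    """Average daily mean temperatures per (station, year), in deg C.

    rows: iterable of dicts with 'stn', 'year' and 'temp' keys, where
    'temp' is a daily mean in degrees Fahrenheit (assumed layout).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for row in rows:
        temp_f = float(row["temp"])
        if temp_f >= MISSING_TEMP:
            continue  # skip missing observations
        key = (row["stn"], int(row["year"]))
        sums[key][0] += (temp_f - 32.0) * 5.0 / 9.0
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical records, made up for illustration.
rows = [
    {"stn": "000001", "year": "2016", "temp": "32.0"},
    {"stn": "000001", "year": "2016", "temp": "50.0"},
    {"stn": "000001", "year": "2016", "temp": "9999.9"},
    {"stn": "000001", "year": "2017", "temp": "68.0"},
]
means = annual_mean_celsius(rows)
```

A plain average like this ignores gaps in coverage. A station that reports only in summer will look warm, so counting the days per station-year before trusting the mean is a sensible next step.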

Here's a compilation of links to get you started: https://tamino.wordpress.com/climate-data-links/

Climate science is a big subject and you should start by understanding the basics of the subject. Spencer Weart's history of climate science is a good place to start: https://history.aip.org/climate/index.htm

Also, I see many people with strong statistics/engineering backgrounds analyzing climate data and coming to wrong conclusions because they never bothered to learn the physics. Don't be those people.

Thanks for the suggestions.

And thanks for the heads up regarding coming to incorrect conclusions. That's important and useful advice.

The IPCC full reports are probably the place to start.

However, before embarking on your own analysis, you should consider what questions you're trying to answer and how to avoid misleading yourself. There are two very important and extremely complex areas: proxies and forcing.

Proxies are the use of data other than temperature measurements to infer temperature, such as tree rings and ice cores.

Forcing is the second-order effects of more CO2, such as causing more water evaporation; water vapor is itself a greenhouse gas.

I think you've identified the two most important topics. A nitpick, though: I disagree with your definition of forcing, it's the same as my definition of feedback.

A forcing is anything that moves the temperature away from the equilibrium between radiative losses and insolation, whether higher or lower. Additional CO2 is itself a forcing.
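To make that definition concrete, here is a toy zero-dimensional energy balance in Python: the planet is treated as a blackbody whose emission must match absorbed sunlight plus any extra forcing term. The solar constant and albedo are approximate round values. The 3.7 W/m^2 input is a figure often quoted for doubled CO2, used here purely as an illustrative number. The function name is hypothetical.

```python
SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
SOLAR_CONSTANT = 1361.0  # approximate solar irradiance at Earth, W m^-2
ALBEDO = 0.3             # approximate planetary albedo (assumed round value)

def equilibrium_temperature(forcing=0.0):
    """Temperature (K) at which blackbody emission balances absorbed
    sunlight plus an extra forcing term (W m^-2)."""
    absorbed = SOLAR_CONSTANT * (1.0 - ALBEDO) / 4.0
    return ((absorbed + forcing) / SIGMA) ** 0.25

baseline = equilibrium_temperature()
perturbed = equilibrium_temperature(forcing=3.7)  # assumed illustrative value
shift = perturbed - baseline
print(f"baseline {baseline:.1f} K, shift {shift:.2f} K")
```

The baseline comes out near 255 K, which is an effective emission temperature rather than a surface temperature, and the shift is roughly 1 K. The model has no atmosphere and no feedbacks such as water vapor, so it only shows what a forcing does to the balance point, not how much warming to expect.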

Thanks so much for the tips!

This is exactly why I did an Ask HN. I'm sure I could've searched around and found some data, but as you've pointed out, there are other things to consider. This thread helped me discover some of the 'unknown unknowns'.

I believe that the dataset published by ICARUS in Maynooth University might be more complete and up to date than some of the others mentioned here. It is a combination of a large number of datasets, is global in scope and runs from 1900 to 2012.


It is the basis for the two following articles:



IIRC the largest files might not download correctly from the repository linked above. If that is the case, you should contact the researcher Peter Thorne:


Right now I am using this data: https://data.giss.nasa.gov/gistemp/station_data/

While my motivation is to demonstrate matrix decomposition (see: https://twitter.com/pmigdal/status/902563470125789188), I wanted to use a credible source. See also a few threads on Open Data Stack Exchange (e.g. https://opendata.stackexchange.com/questions/1546/how-might-...).

Since you mentioned the Open Data Stack Exchange, it might also be worth pointing out a couple of sub-reddits that might also be of use to the OP (or anybody):



If you're looking for raw data, you can find measurements from NASA's earth observation missions here: https://earthdata.nasa.gov/

NASA's Worldview application presents the data as overlays on maps of the Earth and it lets you scroll backward in time from the start of the missions (around 2000 or so) up to near real time (a few hours old). If you want to make sense of the scientific data, you'll still need to put some work into understanding what the instruments are measuring: https://worldview.earthdata.nasa.gov/

The National Centers for Environmental Information are probably a good place to start. A collection of links to data can be found here: https://www.ncdc.noaa.gov/data-access/quick-links. Alternatively, there are some collected sets available here (which may be a little cleaner or have some metadata annotation): https://public.enigma.com/search/climate.

Data from European Union's satellites (Copernicus Program) is available for free.

Copernicus Climate Change Service is under development, some data is already available:


Some other Copernicus services may also contain relevant information related to climate change: atmosphere, land, marine, etc.


Oregon University has precip/temperature data for CONUS freely available!


http://www.woodfortrees.org/ - excellent stuff here

There's an interesting wrinkle going on in Australia at the moment. The Bureau of Meteorology routinely adjusts the "raw" data (for entirely sensible statistical reasons). However, the net effect of these adjustments is apparently to amplify the warming trend. There's controversy even in the raw data - any analysis produced from it can be criticised. I don't think going to the source is going to resolve the politics.


Quoting from the website "Search and access 194 data sets covering the Atmosphere, Ocean, Land and more. Explore climate indices, reanalyses and satellite data and understand their application to climate model metrics. This is the only data portal that combines data discovery, metadata, figures and world-class expertise on the strengths, limitations and applications of climate data."

There's a reply that's dead, but I'd like to address it. It asked:

"We have 25 years or so invested in the work. Why should I make the data available to you, when your aim is to try and find something wrong with it?"

That's a fair question, but my aim definitely isn't to try to find something wrong with the data. My aim is only to explore the data and find out what's right and what's wrong about my current understanding of the situation. I don't plan to blog about or even discuss the work I do.

Also, isn't publishing something in order for others to "find something wrong with it" a core component of the scientific method?

sure, but statistically someone who knows how to correctly interpret this data probably already has the academic communication channels to solicit it directly, so someone randomly asking for it online probably has an axe to grind

Right, which is why I thought it was a fair point for the original poster of the message to bring up. And it seems like there are lots of people with an axe to grind on this topic.

I'm just looking to learn more for myself, and I decided to ask HN because I was having trouble deciding where to start. I realize it's tough to look at data on a complex topic without a thorough academic background in it. But I figure I'll never learn more if I don't at least get started.

Thanks for explaining this, I would have thought the same: I'm a novice, looking for good climate data sets to analyze... asking HN could be a good place to start. Based on the feedback/links provided so far, I think it is!

Based on my interpretation, using that logic as a reason to withhold the data probably furthers the exclusionary/isolationist phenomenon that exists in modern academia already. Certainly people who are formally accredited to 'correctly interpret this data' are quantifiable, but without providing it to the non-credentialed as well we shouldn't draw statistical claims based on their ability to correctly analyze something they've been denied access to, or suppose their intentions as a reason to exclude access.

Example: me

I pride myself on being able to change my mind even on topics that I have been sure about since I was a kid. (Example: laws against drugs. Based on thoughtful opinions by people here and possibly elsewhere, I now think we would be better off if certain drugs were decriminalized.)

On certain topics, however, like AGW or not, there is still just a shouting match: flagging, downvoting, accusations ("skepticism is just another word for denier", "anyone who disagrees is a paid shill", "questions are never real, it is the YAQ tactic")

Luckily I've met some more reasonable folks at work that kindly answered my questions about data -with data- instead of flaming me.

You see, people like me want data like metric tons of CO2 by source

- not data about how many scientists have agreed.

> metric tons of CO2 by source

Surprisingly hard to measure without controversy. You can take point measurements of atmospheric content and ocean concentration, but all sorts of things like melting permafrost outgassing have to be estimated.

I've never used it myself, but Richard Muller's Berkeley Earth project publishes its datasets online.
