Hacker News

In case someone is looking for historical weather data for ML training and prediction, I created an open-source weather API which continuously archives weather data.

Past and forecast data from multiple numerical weather models can be combined using ML to achieve better forecast skill than any individual model. Because each model is physically constrained, the resulting ML model should be stable.
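A very simple way to see the idea: weight each model by how well it did over a shared past period. The sketch below uses inverse mean-squared-error weights, a deliberately basic stand-in for an ML blend (not Open-Meteo's method). The model names and numbers are hypothetical.

```python
from statistics import fmean

def inverse_mse_weights(past_forecasts, observed):
    """Weight each model by 1 / MSE over a shared training period.

    past_forecasts: dict of model name -> list of past forecast values
    observed: list of verifying values, same length as each forecast list
    """
    inverse = {}
    for name, values in past_forecasts.items():
        mse = fmean((f - o) ** 2 for f, o in zip(values, observed))
        inverse[name] = 1.0 / max(mse, 1e-9)  # guard against a "perfect" model
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}

def blend(new_forecasts, weights):
    """Combine today's forecasts from several models with the learned weights."""
    return sum(weights[name] * value for name, value in new_forecasts.items())

# Hypothetical training data for two models at one location.
past = {"model_a": [10.0, 12.0, 14.0], "model_b": [11.0, 13.0, 12.0]}
observed = [10.5, 12.5, 13.5]
weights = inverse_mse_weights(past, observed)
print({k: round(v, 3) for k, v in weights.items()})  # {'model_a': 0.786, 'model_b': 0.214}
print(blend({"model_a": 20.0, "model_b": 22.0}, weights))
```

A real setup would learn weights per variable, lead time and location, or use a proper regression/ML model on the same inputs.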

See: https://open-meteo.com




Is there somewhere to see historical forecasts?

So not "the weather on 25 December 2022 was such and such" but rather "on 20 December 2022 the forecast for 25 December 2022 was such and such"


Not yet, but I am working towards it: https://github.com/open-meteo/open-meteo/issues/206


I’ve always wanted to see something like that. I always wonder if forecasts are a coin flip beyond a window of a few hours.


I just quit photographing weddings (and other stuff) this year. It's a job where the forecast really impacts you, so you tend to pay attention.

The number of brides I've had to calm down when rain was forecast for their day is pretty high. In my experience, in my region, precipitation forecasts more than 3 days out are worthless except when it's supposed to rain for several days straight. Temperature and wind forecasts are better, but they can still swing significantly one way or the other.

For other types of shoots, I'd tell people that ideally we'd make the call to postpone on the day of, and to only start worrying about it the day before the shoot.

I'm in Minnesota, so our weather is quite a bit more dynamic than many regions, for what it's worth.


I know at a minimum that hurricane forecasts have gotten significantly better over time. We can now

https://www.nhc.noaa.gov/verification/verify5.shtml

Our 96-hour projections are as accurate today as the 24-hour projections were in 1990.


Looks like https://sites.research.google/weatherbench/ attempts to "benchmark" different forecast models/systems.

They're very cautious about naming a "best" model though!

> Weather forecasting is a multi-faceted problem with a variety of use cases. No single metric fits all those use cases. Therefore, it is important to look at a number of different metrics and consider how the forecast will be applied.


That last paragraph sounds like something ChatGPT would write.


Are you thinking something like https://www.forecastadvisor.com/?


I would like to see an independent forecast comparison tool similar to Forecast Advisor, which evaluates numerical weather models. However, getting reliable ground truth data on a global scale can be a challenge.

Since Open-Meteo continuously downloads every weather model run, the resulting time series closely resembles assimilated gridded data. GraphCast relies on the same data to initialize each weather model run. By comparing past forecasts to future assimilated data, we can assess how much a weather model deviates from the "truth," eliminating the need for weather station data for comparison. This same principle is also applied to validate GraphCast.
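A minimal sketch of that verification idea for a single grid cell, assuming each run is stored as lead time (hours) -> value, and that the lead-0 value of the run initialized at the valid time stands in for the assimilated "truth". The integer hour offsets are hypothetical; this is not GraphCast's or Open-Meteo's actual evaluation code.

```python
from collections import defaultdict
from math import sqrt

def rmse_by_lead_time(runs):
    """runs: dict of init hour -> dict of lead hour -> forecast value.

    The lead-0 value of the run initialised at the valid time is used as
    "truth", i.e. a stand-in for the assimilated analysis.
    """
    squared_errors = defaultdict(list)
    for init, forecast in runs.items():
        for lead, value in forecast.items():
            if lead == 0:
                continue
            truth_run = runs.get(init + lead)
            if truth_run is None or 0 not in truth_run:
                continue  # no analysis available (yet) for this valid time
            squared_errors[lead].append((value - truth_run[0]) ** 2)
    return {lead: sqrt(sum(e) / len(e)) for lead, e in sorted(squared_errors.items())}

# Hypothetical runs every 6 hours at one grid cell.
runs = {0: {0: 10.0, 6: 12.0, 12: 15.0}, 6: {0: 11.0, 6: 14.0}, 12: {0: 13.0}}
print(rmse_by_lead_time(runs))  # {6: 1.0, 12: 2.0}
```

Applied over all grid cells and many runs, the error growth with lead time is the kind of curve benchmarks like WeatherBench report.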

Moreover, storing past weather model runs can enhance forecasts. For instance, if a weather model consistently predicts high temperatures for a specific large-scale weather pattern, a machine learning model (or a simple multilinear regression) can be trained to mitigate such biases. This improvement can be done for a single location with minimal computational effort.


How did you handle missing data? I’ve used NOAA data a few times and I’m always surprised at how many days of historical data are missing. They have also stopped recording at certain locations and started at new ones over time, making it hard to get solid historical weather information.


Open-Meteo has a great API too. I used it to build my iOS weather app Frej (open source and free: https://github.com/boxed/frej)

It was super easy and the responses are very fast.


That’s awesome! I’ve hooked something similar up to my service - https://dropory.com which predicts which day it will rain the least for any location

Based on historical data!
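Purely as an illustration of the "least rain, based on historical data" idea (not how Dropory actually works): count, for each calendar day, the fraction of past years that were wet, then pick the driest date in a range. The 1 mm wet-day threshold and the input format are assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

WET_DAY_MM = 1.0  # assumed threshold for calling a day "rainy"

def rain_frequency(history):
    """history: dict of date -> daily precipitation (mm) over several years.

    Returns dict of (month, day) -> fraction of years that day was wet.
    """
    wet = defaultdict(int)
    total = defaultdict(int)
    for day, precip_mm in history.items():
        key = (day.month, day.day)
        total[key] += 1
        if precip_mm >= WET_DAY_MM:
            wet[key] += 1
    return {key: wet[key] / total[key] for key in total}

def least_rainy_day(history, start, end):
    """Date in [start, end] whose calendar day was historically wet least often."""
    freq = rain_frequency(history)
    candidates = []
    day = start
    while day <= end:
        key = (day.month, day.day)
        if key in freq:
            candidates.append((freq[key], day))  # ties go to the earliest date
        day += timedelta(days=1)
    return min(candidates)[1] if candidates else None

# Hypothetical history for three early-June days over two years.
history = {
    date(2020, 6, 1): 5.0, date(2021, 6, 1): 3.0,
    date(2020, 6, 2): 0.0, date(2021, 6, 2): 2.0,
    date(2020, 6, 3): 0.0, date(2021, 6, 3): 0.0,
}
print(least_rainy_day(history, date(2024, 6, 1), date(2024, 6, 3)))  # 2024-06-03
```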


Yikes, after completing three steps I was asked for my email. No to your bait and switch, thanks!


It can take up to 10 minutes to generate a report. I had a spinner before, but people just left the page, so I implemented a way to send it to them instead. I’ve never used the emails for anything other than that. Try it with a 10-minute disposable email address if you like. Thanks for your feedback!


Ok, it seems like your UI is not coming from a place of malice. However, pulling out an email input form at the final step is a very widespread UI dark pattern, so if nothing else, please let people know that you will ask for their email before they start interacting with your forms.


Hi Jeff, Great work, Respect!

I just hit the daily limit on the second request at https://climate-api.open-meteo.com/v1/climate

I see the limit for non-commercial use should be "less than 10.000 daily API calls". Technically 2 is less than 10.000, I know, but still I decided to drop you a comment. :)


10.000 requests / (24 hours * 60 minutes * 60 seconds) ≈ 0.116 requests / second

or 1 request every ~9 seconds.

Maybe you just didn't space them enough.
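The same arithmetic in a few lines. This only gives the average rate a daily quota allows; whether the API actually enforces it per second is the speculation above, not something confirmed here.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def quota_equivalents(daily_limit):
    """Express a daily request quota as average rates and a spacing between calls."""
    per_second = daily_limit / SECONDS_PER_DAY
    per_minute = daily_limit / (24 * 60)
    spacing_s = SECONDS_PER_DAY / daily_limit
    return per_second, per_minute, spacing_s

per_second, per_minute, spacing_s = quota_equivalents(10_000)
print(f"{per_second:.3f} req/s, {per_minute:.2f} req/min, one call every {spacing_s:.2f} s")
# 0.116 req/s, 6.94 req/min, one call every 8.64 s
```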


Maybe, that would be funny. ~7 requests per minute would be a more dev-friendly way of enforcing the same quota.
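A client-side sketch of the ~7-requests-per-minute idea, assuming you just want your own code to stay under the quota (it says nothing about how Open-Meteo enforces limits). Note that 7 × 1440 = 10,080, slightly over 10,000, so a strict client might use 6. The clock and sleep functions are injectable so it can be tested without waiting.

```python
import time
from collections import deque

class PerMinuteLimiter:
    """Sliding-window limiter: at most `limit` calls in any `window_s` seconds."""

    def __init__(self, limit=7, window_s=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls still inside the window

    def acquire(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the window.
            while self.calls and now - self.calls[0] >= self.window_s:
                self.calls.popleft()
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call falls out of the window.
            self.sleep(self.window_s - (now - self.calls[0]))
```

Usage would be `limiter = PerMinuteLimiter()` once, then `limiter.acquire()` before each API request.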


I confirm, open-meteo is awesome and has a great API (and API playground!). And is the only source I know to offer 2 weeks of hourly forecasts (I understand at that point they are more likely to just show a general trend, but it still looks spectacular).

It's a pleasure being able to use it in https://weathergraph.app


> And is the only source I know to offer 2 weeks of hourly forecasts

Enjoy the data directly from the source producing them.

American weather agency: https://www.nco.ncep.noaa.gov/pmb/products/gfs/

European weather agency: https://www.ecmwf.int/en/forecasts/datasets/open-data

The data’s not necessarily easy to work with, but it’s all there, and you get all the forecast ensembles (potential forecasted weather paths) too.


Thank you, I didn't know! I'd love to, but I'd need another 24 hours in a day to also process the data. I'm glad I can build on the work of others and use the friendly APIs :).


This is awesome. I was trying to do a weather project a while ago, but couldn't find an API to suit my needs for the life of me. It looks like yours still doesn't have exactly everything I'd want but it still has plenty. Mainly UV index is something I've been trying to find wide historical data for, but it seems like it just might not be out there. I do see you have solar radiation, so I wonder if I could calculate it using that data. But I believe UV index also takes into account things like local air pollution and ozone forecast as well.
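For reference, the UV index is defined from UV spectral irradiance weighted by the CIE erythemal action spectrum (WHO/WMO Global Solar UV Index), which is why broadband solar radiation alone isn't enough to derive it: total ozone strongly attenuates UV-B while barely changing total shortwave radiation.

```latex
\mathrm{UVI} = k_{er} \int_{250\,\mathrm{nm}}^{400\,\mathrm{nm}} E_\lambda(\lambda)\, s_{er}(\lambda)\, d\lambda,
\qquad k_{er} = 40\ \mathrm{m^2\,W^{-1}}
```

Here E_λ is the spectral solar irradiance in W m⁻² nm⁻¹ and s_er is the erythemal weighting function.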


How about https://pirateweather.net/en/latest/ ?

Has anyone compared this API with the one posted here?


Both APIs use weather models from NOAA GFS and HRRR, providing accurate forecasts in North America. HRRR updates every hour, capturing recent showers and storms in the upcoming hours. PirateWeather gained popularity last year as a replacement for the Dark Sky API when Dark Sky servers were shut down.

With Open-Meteo, I'm working to integrate more weather models, offering access not only to current forecasts but also past data. For Europe and South-East Asia, high-resolution models from 7 different weather services improve forecast accuracy compared to global models. The data covers not only common weather variables like temperature, wind, and precipitation but also includes information on wind at higher altitudes, solar radiation forecasts, and soil properties.
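If you want to try the data from code, here is a minimal, standard-library-only URL builder. The `latitude`, `longitude` and `hourly` parameters follow the documented example at api.open-meteo.com; treat `models`, `past_days` and any model identifiers as assumptions to check against the current API docs.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.open-meteo.com/v1/forecast"

def forecast_url(latitude, longitude, hourly, models=None, past_days=None):
    """Build a request URL for hourly variables at one location."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(hourly),
    }
    if models:
        params["models"] = ",".join(models)  # assumed parameter: select specific weather models
    if past_days is not None:
        params["past_days"] = past_days  # assumed parameter: include recent past days
    return f"{BASE_URL}?{urlencode(params, safe=',')}"  # keep commas readable

url = forecast_url(52.52, 13.41, ["temperature_2m", "precipitation"])
print(url)
# https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m,precipitation
```

The response is JSON, so `json.load(urllib.request.urlopen(url))` is enough to read it.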

Using custom compression methods, large historical weather datasets like ERA5 are compressed from 20 TB to 4 TB, making them accessible through a time-series API. All data is stored in local files; no database set-up required. If you're interested in creating your own weather API, Docker images are provided, and you can download open data from NOAA GFS or other weather models.


This is great. I am very curious about the architectural decisions you've taken here. Is there a blog post / article about them? 80 yrs of historical data -- are you storing that somewhere in PG and the APIs are just fetching it? If so, what indices have you set up to make APIs fetch faster etc. I just fetched 1960 to 2022 in about 12 secs.


Traditional database systems struggle to handle gridded data efficiently. Using PG with time-based indices is memory- and storage-intensive. It works well for a limited number of locations, but global weather models at 9-12 km resolution have 4 to 6 million grid cells.

I am exploiting the homogeneity of gridded data. In a 2D field, calculating the data position for a geographical coordinate is straightforward. Once you add time as a third dimension, you can pick any timestamp at any point on earth. To optimize read speed, all time steps are stored sequentially on disk in a rotated/transposed OLAP cube.
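A sketch of what "calculating the data position" can look like, assuming a regular lat/lon grid and a layout where each grid cell's time steps are stored back to back. The grid parameters are illustrative; models on rotated, triangular or reduced Gaussian grids need a different index function, and the real Open-Meteo files are additionally chunked and compressed.

```python
from array import array

def grid_index(lat, lon, lat0, lon0, dlat, dlon, nx):
    """Nearest cell on a regular lat/lon grid, numbered row by row (y * nx + x)."""
    y = round((lat - lat0) / dlat)
    x = round(((lon - lon0) % 360) / dlon) % nx  # wrap around the dateline
    return y * nx + x

def value_offset(cell, t, nt):
    """Position of time step t when each cell's nt values are stored contiguously."""
    return cell * nt + t

def read_series(values, cell, nt):
    """A full time series for one location is a single contiguous slice."""
    start = value_offset(cell, 0, nt)
    return values[start:start + nt]

# Tiny demo: 4 x 3 grid, 5 time steps, values are just 0..59 as float32.
nx, ny, nt = 4, 3, 5
values = array("f", range(nx * ny * nt))
print(list(read_series(values, cell=5, nt=nt)))  # [25.0, 26.0, 27.0, 28.0, 29.0]
```

On a 0.25° global grid starting at (-90, -180), `grid_index(52.52, 13.41, -90.0, -180.0, 0.25, 0.25, 1440)` gives cell 821574; reading that location's history is then one sequential read instead of one lookup per timestamp.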

Although the data now consists of millions of floating-point values without accompanying attributes like timestamps or geographical coordinates, the storage requirements are still high. Open-Meteo chunks data into small portions, each covering 10 locations and 2 weeks of data. Each block is individually compressed using an optimized compression scheme.
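A rough sketch of chunked storage with per-chunk compression: locate the chunk for a (cell, time step), then quantize floats to integers, delta-encode along time and deflate. `zlib` is a stand-in here, not Open-Meteo's optimized scheme; hourly data and the scale factor of 20 (0.05 resolution) are assumptions.

```python
import struct
import zlib

LOCATIONS_PER_CHUNK = 10
STEPS_PER_CHUNK = 14 * 24  # two weeks, assuming hourly time steps

def chunk_of(cell, t):
    """(chunk coordinates, position inside the chunk) for grid cell `cell`, step `t`."""
    return ((cell // LOCATIONS_PER_CHUNK, t // STEPS_PER_CHUNK),
            (cell % LOCATIONS_PER_CHUNK, t % STEPS_PER_CHUNK))

def compress_chunk(series_per_location, scale=20.0):
    """Quantize, delta-encode each location's series in time, then deflate."""
    ints = []
    for series in series_per_location:
        previous = 0
        for value in series:
            q = round(value * scale)
            ints.append(q - previous)  # consecutive hours are similar, so deltas are small
            previous = q
    raw = struct.pack(f"<{len(ints)}i", *ints)
    return zlib.compress(raw, 9)

def decompress_chunk(blob, n_locations, n_steps, scale=20.0):
    """Inverse of compress_chunk: inflate, undo deltas, rescale to floats."""
    ints = struct.unpack(f"<{n_locations * n_steps}i", zlib.decompress(blob))
    out = []
    for i in range(n_locations):
        q = 0
        series = []
        for delta in ints[i * n_steps:(i + 1) * n_steps]:
            q += delta
            series.append(q / scale)
        out.append(series)
    return out

# Hypothetical chunk: 10 locations with a repeating daily cycle.
chunk = [[15.0 + (h % 24) * 0.5 for h in range(STEPS_PER_CHUNK)] for _ in range(LOCATIONS_PER_CHUNK)]
blob = compress_chunk(chunk)
print(len(blob), "bytes vs", LOCATIONS_PER_CHUNK * STEPS_PER_CHUNK * 4, "bytes as float32")
```

Quantization is lossy (values are rounded to the nearest 1/scale), which is why the scale factor has to be chosen per variable.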

While this process isn't groundbreaking and is supported by file formats like NetCDF, Zarr, or HDF5, the challenge lies in efficiently working with multiple weather models and updating data with each new weather model run every few hours.

You can find more information here: https://openmeteo.substack.com/i/64601201/how-data-are-store...


I always suspect that they don't tell me the actual temperature. Maybe I'm totally wrong, but I'm suspicious. I need to get my own physical thermometer (not the digital one in my room), put it outside my house, and have a camera focused on it, so that later I can speed up the video and see how much the weather varied the previous night.


What? Why?


There is also https://github.com/google-research/weatherbench2 which has baselines of numerical weather models.


this is really cool, I've been looking for good snow-related weather APIs for my business. I tried looking on the site, but how does it work, given that it's coordinate-based?

I'm used to working with different weather stations, e.g. seeing different snowfall prediction at the bottom of a mountain, halfway up, and at the top, where the coordinates are quite similar.


You'll need a local weather expert to assist, as terrain, geography and other hyper-local factors make forecasts unpredictable. For example, Jay Peak in VT has its own weather: the road in has no snow, but there's a raging snowstorm on the mountain.
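To make the elevation point concrete: locations with nearly identical coordinates can fall in one model grid cell with a single elevation, so a simple correction is to shift temperature with a fixed lapse rate. The sketch assumes the 6.5 K/km standard-atmosphere rate and a 1 °C rain/snow threshold; the elevations are hypothetical, and it ignores inversions and orographic effects, which is exactly why local expertise still matters.

```python
STANDARD_LAPSE_RATE = 0.0065  # K per metre (standard atmosphere average)

def temperature_at_elevation(t_grid_c, grid_elevation_m, target_elevation_m,
                             lapse_rate=STANDARD_LAPSE_RATE):
    """Shift a grid-cell temperature to another elevation with a fixed lapse rate."""
    return t_grid_c - lapse_rate * (target_elevation_m - grid_elevation_m)

def likely_snow(t_c, threshold_c=1.0):
    """Crude rain/snow split on air temperature alone (assumed threshold)."""
    return t_c <= threshold_c

# Hypothetical: grid cell at 700 m says 3.0 °C; base at 300 m, summit at 1200 m.
for name, elevation in [("base", 300), ("summit", 1200)]:
    t = temperature_at_elevation(3.0, 700, elevation)
    print(name, round(t, 2), "snow" if likely_snow(t) else "rain")
```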


Is it able to provide data on extreme events? Say, the current and potential path of a hurricane, similar to the .kml files that NOAA provides?


Extreme weather is predicted by numerical weather models. Correctly representing hurricanes has driven development on the NOAA GFS model for centuries.

Open-Meteo focuses on providing access to weather data for single locations or small areas. If you look at data for coastal areas, forecast and past weather data will show severe winds. Storm tracks or maps are not available, but might be implemented in the future.


I would love to hear about this centuries-old NOAA GFS model. The one I know about definitely doesn't have that kind of history behind it.


Some of the oldest data may come from ships' logs dating back to 1836.

https://www.reuters.com/graphics/CLIMATE-CHANGE-ICE-SHIPLOGS...


Sorry, decades.

KML files for storm tracks are still the best way to go. You could calculate storm tracks yourself for other weather models like DWD ICON, ECMWF IFS or MeteoFrance ARPEGE, but storm tracks based on GFS ensembles are easy to use and sufficiently accurate.


Appreciate the response. Do you know of any services that provide what I described in the previous comments? I'm specifically interested in extreme weather conditions and their visual representation (hurricanes, tornadoes, hail, etc.) with API capabilities.


Go to nhc.noaa.gov/gis. There's a list of data and products with KMLs, KMZs, GeoJSONs, and all sorts of stuff. I haven't actually used the API for retrieving these, but NOAA has a pretty solid track record with data dissemination.
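If you end up with one of those KML track files, the standard library can pull the coordinates out. This sketch assumes the OGC KML 2.2 namespace and LineString geometries; check the actual files, and unzip KMZs first (they are zip archives, e.g. via `zipfile`). The sample track below is made up.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

def track_points(kml_text):
    """Return (lon, lat) pairs from every LineString in a KML document."""
    root = ET.fromstring(kml_text)
    points = []
    for coords in root.iterfind(".//kml:LineString/kml:coordinates", KML_NS):
        # KML coordinates are whitespace-separated "lon,lat[,alt]" tuples.
        for tuple_text in (coords.text or "").split():
            lon, lat, *_ = (float(v) for v in tuple_text.split(","))
            points.append((lon, lat))
    return points

sample = (
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document><Placemark>'
    "<LineString><coordinates>-75.0,25.0,0 -76.5,26.2,0</coordinates></LineString>"
    "</Placemark></Document></kml>"
)
print(track_points(sample))  # [(-75.0, 25.0), (-76.5, 26.2)]
```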


I was going to ask about air quality, but just opened the site and you have air quality as well! Thanks!


Are multiple data sources supported?



