Pirate Weather
Quick Links
Publications and Press
Weather forecasts are primarily determined using models run by government agencies, but the outputs aren't easy to use or in formats built for the web.
To try to address this, I've put together a service that reads weather forecasts and serves them following the Dark Sky API style.
Before going any further, I wanted to add a link to sign up and support this project! Running this on AWS means that it scales beautifully and is much more reliable than if I were trying to host it myself, but it also costs real money. I would love to keep this project going long-term, but I'm still paying back my student loans, which limits how much I can spend on it! Anything helps, and a $2 monthly donation lets me raise your API limit from 10,000 calls/month to 25,000 calls/month.
Alternatively, I also have a GitHub Sponsorship page set up on my profile! This gives the option to make a one-time donation to support this project. This project (especially the free tier) wouldn't be possible without the ongoing support from the project sponsors, so they're the heroes here!
Recent Updates - Spring 2023
Up to version 1.4! As always, details are available in the changelog.
- New sign-up portal: https://pirate-weather.apiable.io/. This will let me spend way less time managing subscriptions and more time data wrangling. It also addresses a ton of bugs related to the old developer portal. APIs requested via the old portal will continue to work, though!
- Much better alert support.
- A ton of assorted bug fixes.
- Published official API specifications.
- Major revamp of the Home Assistant integration.
Background
This project started from two points: as part of my PhD, I had to become very familiar with working with NOAA forecast results (https://orcid.org/0000-0003-4725-3251). Separately, I had an old tablet set up as a "Magic Mirror," which was using a weather module that relied on the Dark Sky API, as did my Home Assistant setup. So when I heard that it was shutting down, I thought, "I wonder if I could do that." Plus, I love learning new things (http://alexanderrey.ca/), and I had been looking for a project to learn Python with, so this seemed like the perfect opportunity!
Spoiler alert: it was much more difficult than I thought, but I learned a lot throughout the process, and I think the end result turned out really well!
Why?
This API is designed to be a drop-in replacement/alternative to the Dark Sky API, and as a tool for assessing GFS and HRRR forecasts via a JSON API. This serves a few goals:
- It allows legacy applications to continue running after the Dark Sky shutdown, such as Home Assistant integrations, Magic Mirror cards, and a whole host of other applications that have been developed over the years.
- For anyone who is interested in knowing exactly how their weather forecasts are generated, this is the "show me the numbers" approach, since the data returned is directly from NOAA models, and every processing step I do is documented. There are plenty of existing services that provide custom forecasts using their own unique technologies, which can definitely improve accuracy, but I'm an engineer, so I wanted to be able to know what's going into the forecasts I'm using. If you're the type of person who wants a dense 34-page PowerPoint about why it rained when the forecast said it wouldn't, then this might be for you.
- I wanted to provide a more community-focused source of weather data. Weather is local, but I'm only in one spot, so I rely on people submitting issues to help improve the forecast!
Current Process - AWS
The key to everything here is AWS's Elastic File System (EFS). I wanted to avoid "reinventing the wheel" as much as possible, and there is already a great tool for extracting data from forecast files: WGRIB2! Moreover, NOAA data was already being stored on AWS. This meant that, from the 10,000-foot perspective, data could be downloaded and saved to a file system that could then be easily accessed by a serverless function, instead of trying to move it into a database.
That's the "one-sentence" explanation of how this is set up, but for more details, read on!
Architecture overview
- EventBridge timers launch a Step Function to trigger Fargate
- WGRIB2 image pulled from the repo
- Fargate cluster launched
- Task to: download, merge, and chunk GRIB files
- Data saved to EFS
- NWS alerts saved to EFS as GeoJSON
- Lambda reads forecast data, processes and interpolates it, and returns JSON
- Expose JSON forecast data via API Gateway
- Distribute the API endpoint via CloudFront
- Monitor API key usage
Data Sources
Starting from the beginning, three NOAA models are used for the raw forecast data: HRRR, GFS, and the GEFS.
HRRR
The High-Resolution Rapid Refresh (HRRR) provides forecasts over the entire continental US, as well as most of the Canadian population. 15-minute forecasts are provided every hour out to 18 hours, and every 6 hours a 48-hour forecast is run, all at a 3 km resolution. This was perfect for this project, since Dark Sky provided a minute-by-minute forecast for 1 hour, which can be loosely approximated using the 15-minute HRRR forecasts. HRRR has almost all of the variables required for the API, with the exception of UV radiation and ozone. Personally, this is my favourite weather model, and the one that produced the best results during my thesis research on Hurricane Dorian (https://doi.org/10.1029/2020JC016489).
GFS
The Global Forecast System (GFS) is NOAA's global weather model. Running at a resolution of about 30 km (0.25 degrees), the GFS model provides hourly forecasts out to 120 hours, and 3-hour forecasts out to 240 hours. Here, GFS data is used for anywhere in the world not covered by the HRRR model, and for all results past 48 hours.
The GFS model also underpins the Global Ensemble Forecast System (GEFS), which is the 30-member ensemble (the website says 21, but there are 30 data files) version of the GFS. This means that 30 different "versions" of the model are run, each with slightly different starting assumptions.
GEFS
The Global Ensemble Forecast System (GEFS) is the ensemble version of NOAA's GFS model. By running the model with slightly different parameters and inputs, 30 different versions are run at the same time, providing 3-hour forecasts out to 240 hours. The API uses the GEFS to get precipitation type, quantity, and probability, since it seemed like the most accurate way of determining them. I don't know how Dark Sky did it, and I'm very open to suggestions about other ways this could be assigned, since getting the precipitation probability number turned out to be one of the most complex parts of the entire setup!
ERA5
To provide historical weather data, the European Reanalysis 5 (ERA5) dataset is used. This source is particularly interesting, since unlike the real-time NOAA models that I need to convert, it is provided in the "cloud native" Zarr file format. This lets the data be accessed directly and quickly in S3 from Lambda. There aren't nearly as many parameters available as with the GFS or HRRR models, but there are enough to cover the most important variables.
Others
There are a number of other models that I could have used as part of this API. The Canadian model (HRDPS) is even higher resolution (2.5 km), and seems to do particularly well with precipitation. Also, the European models are generally considered better global models than the GFS, which would make them a great addition. However, HRRR and GFS were enough to get things working, and since they're already stored on AWS, there were no data transfer costs!
As the rest of this document explains, the data pipeline here is fairly flexible, and given enough interest, it would be relatively straightforward to add more model sources/historic forecasts.
Forecast data is provided by NOAA in GRIB2 format. This file type has a steep learning curve, but is great once I figured out how it worked. In short, it saves all of the forecast parameters and includes metadata on their names and units. GRIB files are compressed to save space, but are indexed in a way that lets individual parameters be quickly extracted. To see what's going on inside a GRIB file, the NASA Panoply reader works incredibly well.
Lambda, Fargate, and WGRIB2 Setup
AWS Lambda allows code to run without requiring any underlying server infrastructure (serverless). In my case, I used Python as the target language, since I was interested in learning it! Once triggered, a Lambda function will run with the configured memory. It can pull data from S3 or the Elastic File System (EFS), and can use information passed as part of the trigger. Lambda functions can depend on layers of supporting code packages. This API uses Lambda to retrieve and process the forecast data when a call is made.
Lambda handles Python packages as layers, so I created layers for NetCDF4, Astral, pytz, and timezonefinder. To get historical data, I added a Zarr layer; however, it is too large to be combined with the NetCDF4 layer in Lambda, which is why historical data is a separate API call compared to the forecast API.
For processing, I wanted to use the WGRIB2 utility as much as I could, since it has been extensively optimized for this kind of work. Pywgrib2, the Python interface for working with WGRIB2 files, was recently released. I used the pywgrib2_s flavour, and then always called it using the `.wgrib2` method.
I initially had this set up using Lambda, but ran into the 500 MB temporary storage limit that Lambda has. Instead, processing is now done using the Elastic Container Service (ECS) and Fargate. This lets code run inside a Docker container, which I set up to compile the WGRIB2 code from source. This image is stored on AWS, and gets retrieved each time a processing job is run! The Dockerfile to generate this container is pretty simple, relying on Ubuntu as a base image and following the instructions from the WGRIB2 README. The only interesting part is a nifty "one-liner" to replace a specific line in the makefile: `RUN sed -i "s|MAKE_SHARED_LIB=0|MAKE_SHARED_LIB=1|g" makefile`.
Data Pipeline
Trigger
Forecasts are saved from NOAA onto the AWS Public Cloud into three buckets for the HRRR, GFS, and GEFS:
|  | GFS | GEFS | HRRR - 48h | HRRR - 18h / Sub-Hourly |
|---|---|---|---|---|
| Run Times (UTC) | 0, 6, 12, 18 | 0, 6, 12, 18 | 0, 6, 12, 18 | 0-24 |
| Delay | 5:00 | 7:00 | 2:30 | 1:45 |
| Ingest Times (UTC) | 5, 11, 17, 23 | 7, 13, 19, 1 | 2:30, 8:30, 14:30, 20:30 | 1:45-00:45 |
Each rule calls a separate AWS Step Function, which is the tool that oversees the data pipeline. The Step Function takes the current time from the trigger, adds several other environment parameters (like which bucket the data is saved in and which processing script to use), and then finally starts a Fargate task using the WGRIB2/Python Docker image! Step Functions have the added perk that they can retry the task if it fails for some reason. I spent some time optimizing the tasks to maximize network speed and minimize the RAM requirements, settling on 1 CPU and 5 GB of RAM. The Fargate task is set up to have access to the NOAA S3 buckets, as well as an EFS file system to save the processed files. The Python processing scripts are explained below, with the source code available in the repository.
Download, Filter, and Merge
For all of the models, the download process works in a similar way:
- Get the bucket, time, and download path from the environment variables set by the Step Function.
- Set up paths on the EFS filesystem.
- Step through the required files and download them using `boto3` (see the sketch after this list).
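As an illustration, the download step could look something like the following sketch; the bucket layout, key pattern, and environment variable names here are assumptions for the example, not the service's actual code:

```python
import os

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Hypothetical environment variables set by the Step Function / task definition
bucket = os.environ["BUCKET"]              # e.g. "noaa-hrrr-bdp-pds"
run_date = os.environ["RUN_DATE"]          # e.g. "20230425"
run_hour = os.environ["RUN_HOUR"]          # e.g. "12"
download_dir = os.environ["DOWNLOAD_DIR"]  # directory on the mounted EFS volume

# The NOAA public buckets allow anonymous access
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# Step through the required forecast-hour files and download each one
for fhr in range(0, 19):
    key = f"hrrr.{run_date}/conus/hrrr.t{run_hour}z.wrfsfcf{fhr:02d}.grib2"
    local_path = os.path.join(download_dir, os.path.basename(key))
    s3.download_file(bucket, key, local_path)
```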
For the HRRR model, the wind directions need to be converted from grid-relative to earth-relative, using the wgrib2 `-new_grid_winds` command. Separately, for the GFS/GEFS models, there are two accumulated precipitation fields (`APCP`), one representing 3 hours of accumulation, and one representing 0 hours to the forecast hour. wgrib2 has a `-ncep_norm` command; however, it requires that all the time steps are in the same grib file, which isn't how they're saved to the buckets. Instead, I used tip #66 from the (ever helpful) wgrib2 tricks website, and added the `-quit` command to stop wgrib2 from processing the second `APCP` field.
My full pywgrib2_s command ended up looking like this:
pywgrib2_s.wgrib2([download_path, '-new_grid_winds', 'earth', '-new_grid_interpolation', 'neighbor', '-match', matchString, '-new_grid', HRRR_grid1, HRRR_grid2, HRRR_grid3, download_path_GB])
pywgrib2_s.wgrib2([download_path, '-rewind_init', download_path, '-new_grid_winds', 'earth', '-new_grid_interpolation', 'neighbor', '-match', 'APCP', '-append','-new_grid', HRRR_grid1, HRRR_grid2, HRRR_grid3, download_path_GB, '-quit'])
Where `matchString` is the list of parameters to extract, `HRRR_grid1`, `HRRR_grid2`, and `HRRR_grid3` are the HRRR grid parameters, and `download_path_GB` is the output file location.
Once wgrib2 has run, the processed grib files are appended to a NetCDF file (via `pywgrib2_s.wgrib2([download_path_GB, '-append', '-netcdf', download_path_NC])`). This is a NetCDF3 file, so there is no compression, but it is much easier to work with than GRIB files. After each step is added to the NetCDF file, the original GRIBs are removed to save space.
For most of the scripts, there are actually two different processes happening at the same time, downloading slightly different files. For the GFS model, this is the primary and secondary variable versions; for GEFS it is the complete ensemble as well as the mean; and for HRRR it is the hourly and sub-hourly forecasts. The process is the same as above, just replicated to reduce the number of scripts that need to be run.
Compress, Chunk, and Save
My initial plan was to simply save the grib files to EFS and access them via pywgrib2; however, despite EFS being very fast and wgrib2's optimizations, this was never fast enough to be realistic (~20 seconds). Eventually, I was pointed in the direction of a more structured file type, and since there was already a great NetCDF Python package, it seemed perfect!
From the merged NetCDF3 files, the next steps are fairly simple (a rough sketch follows the list):
- Create a new in-memory NetCDF4 file.
- Copy variables over from NetCDF3 to NetCDF4, enabling compression and a significant-digit limit for each.
- Chunk the NetCDF4 file by time to dramatically speed up access times and save it to EFS.
- A separate pickle file is saved with the latitudes and longitudes of each grid node.
- Old model results are removed from the EFS filesystem.
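A minimal sketch of that copy/compress/chunk step using the netCDF4 package is shown below; the file paths, dimension names, and chunk shapes are illustrative assumptions rather than the exact production script:

```python
import numpy as np
from netCDF4 import Dataset

# Copy a merged NetCDF3 file into a compressed, time-chunked NetCDF4 file on EFS.
src = Dataset("/tmp/hrrr_merged.nc", "r")                  # NetCDF3 output from wgrib2
dst = Dataset("/mnt/efs/hrrr_processed.nc", "w", format="NETCDF4")

# Copy dimensions
for name, dim in src.dimensions.items():
    dst.createDimension(name, None if dim.isunlimited() else len(dim))

n_time = len(src.dimensions["time"])

for name, var in src.variables.items():
    # Chunk 3-D (time, y, x) variables so a full time series for one grid cell
    # sits together, which is the access pattern the API uses.
    chunks = (n_time, 1, 1) if var.ndim == 3 else None
    out = dst.createVariable(
        name, var.datatype, var.dimensions,
        zlib=True, complevel=1,                            # light compression
        least_significant_digit=1 if np.issubdtype(var.dtype, np.floating) else None,
        chunksizes=chunks,
    )
    out[:] = var[:]

src.close()
dst.close()
```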
While the process is simple, the details here are tricky. The chunking and compression are the key components, since they allow for fast data retrieval, while using an in-memory dataset speeds things up a fair bit.
Model-Specific Notes
- In order to get UV data, a separate grib file is required for the GFS model, since it is labelled as one of the "least commonly used parameters." The data ingest steps are the same, but there is an extra step where the wgrib2 `-append` command is used to merge the two NetCDF3 files together.
- The ensemble data was by far the most difficult to deal with. There are several extra steps:
    - The 30 ensemble grib files for a given time step are merged and saved as a single grib file in `/tmp/`.
    - The wgrib2 `-ens_processing` command is then run on this merged grib file. This produces the probability of precipitation, the mean, and the spread (which is used for the precipitation intensity error) from the 30-member ensemble; however, it gives the probability of any (>0) precipitation. Since this is a little too sensitive, I used the excellent wgrib2 trick #65, which combines `-rpn` and `-set_prob` to allow arbitrary threshold values to be used.
    - These three values are then exported to NetCDF3 files with the `-set_ext_name` option set to 1.
    - The files are then converted to NetCDF4 and chunked in the same way.
- For most variables, the `least_significant_digit` parameter is set to 1, and the compression level is also set to 1. There is probably some room for further optimization here.
Retrieval
When a request comes in, a Lambda function is triggered and passed the URL parameters (latitude/longitude/time/extended forecast/units) as a JSON payload. These are extracted, and then the nearest grid cell to the lat/long is found using the pickle files created from the model results. Using the time parameter (if available, otherwise the current time), the most recent completed model runs are identified. Weather variables are then iteratively extracted from the NetCDF4 files and saved to 2-dimensional numpy arrays. This is then repeated for each model, skipping the HRRR results if the requested location is outside of the HRRR domain. For the GFS model, precipitation accumulation is adjusted from the varying time step in the grib file to a standard 1-hour time step.
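As a sketch of how that lookup might work (the pickle file name and layout here are assumptions), finding the nearest node is just an argmin over the stored coordinates:

```python
import pickle

import numpy as np

# Grid-node coordinates saved during the ingest step (illustrative path/layout)
with open("/mnt/efs/hrrr_latlon.pickle", "rb") as f:
    lats, lons = pickle.load(f)        # 1-D arrays, one entry per grid node

def nearest_index(lat, lon):
    """Index of the grid node closest to the requested point."""
    dist_sq = (lats - lat) ** 2 + (lons - lon) ** 2
    return int(np.argmin(dist_sq))

idx = nearest_index(45.42, -75.70)     # e.g. Ottawa
```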
Once the data has been read in, arrays are created for the minutely and hourly forecasts, and the data series from the model results are interpolated onto these new output arrays. This process worked incredibly well, since NetCDF files natively save timestamps, so the same method can be followed for each data source.
Some precipitation parameters are true/false (will it rain, snow, hail, etc.), and for these, the same interpolation is done using 0 and 1, and then the precipitation category with the highest value is selected and saved. Currently a 10:1 snow-to-rain ratio is used (1 mm of rain is 10 mm of snow), but this could be improved. Where available, HRRR sub-hourly results are used for minutely precipitation (and all `currently` results), and the GFS ensemble model is used for the hourly time series. Daily data is calculated by processing the hourly time series, calculating maximum, minimum, and mean values.
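A toy example of this interpolation step, with made-up numbers standing in for the arrays read from the NetCDF files:

```python
import numpy as np

# Model output every 3 hours interpolated onto an hourly time axis, with 0/1
# precipitation-type flags interpolated the same way before picking a category.
model_times = np.array([0, 3, 6, 9, 12], dtype=float)        # hours from run start
model_temp = np.array([271.0, 272.5, 274.0, 273.0, 270.5])   # e.g. 2 m temperature (K)
model_is_snow = np.array([1, 1, 0, 0, 1], dtype=float)       # categorical flag as 0/1
model_is_rain = np.array([0, 0, 1, 1, 0], dtype=float)

hourly_times = np.arange(0, 13, dtype=float)
hourly_temp = np.interp(hourly_times, model_times, model_temp)
snow_val = np.interp(hourly_times, model_times, model_is_snow)
rain_val = np.interp(hourly_times, model_times, model_is_rain)

# The category with the highest interpolated value is kept for each hour
precip_type = np.where(snow_val >= rain_val, "snow", "rain")
```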
For the GFS and GEFS models, the returned value is a weighted average (by 1 over the distance) of the nearest 9 grid cells. For variables where taking an average isn't realistic (true/false variables), the most common (mode) result is used. While this approach isn't used for the HRRR model, since its cells are much closer together, I did get it working using the numpy `np.argpartition` function to find the 9 closest points.
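A rough sketch of that weighting, with `np.argpartition` picking the nine nearest nodes; the function and array names are placeholders, not the service's actual code:

```python
import numpy as np

def idw_nine_nearest(lat, lon, lats, lons, values):
    """Inverse-distance weighted average of the nine closest grid cells."""
    dist = np.sqrt((lats - lat) ** 2 + (lons - lon) ** 2) + 1e-12
    nearest = np.argpartition(dist, 9)[:9]        # nine smallest distances, unsorted
    weights = 1.0 / dist[nearest]
    return float(np.sum(values[nearest] * weights) / np.sum(weights))

def mode_nine_nearest(lat, lon, lats, lons, flags):
    """Most common value of the nine closest cells, for true/false variables."""
    dist = (lats - lat) ** 2 + (lons - lon) ** 2
    nearest = np.argpartition(dist, 9)[:9]
    vals, counts = np.unique(flags[nearest], return_counts=True)
    return vals[np.argmax(counts)]
```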
A few more parameters are calculated without using the NOAA models. The incredibly useful timezonefinder Python library is used to determine the local time zone for a request, which is needed to determine when days start and end and which icon to use. Astral is used for sunrise, sunset, and moon phases. Apparent temperature is found by adjusting for either wind chill or humidex, and the UV index is calculated from the modelled solar radiation. This variable has some uncertainty, since the official documentation suggests that these values should be multiplied by 40. I've found that this produces values that are incorrect, so instead the model results are multiplied by 0.4. Dark Sky provides both `temperatureHigh` and `temperatureMax` values, and since I'm not sure what the difference between them is, the same value is currently used for both.
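For the time zone and sun calculations, the two libraries fit together roughly like this (the coordinates and the use of pytz here are just illustrative):

```python
from datetime import date

import pytz
from astral import LocationInfo
from astral.sun import sun
from timezonefinder import TimezoneFinder

lat, lon = 45.42, -75.70

# Local time zone for the requested point, needed for day boundaries and icons
tz_name = TimezoneFinder().timezone_at(lat=lat, lng=lon)   # e.g. "America/Toronto"
tz = pytz.timezone(tz_name)

# Sunrise and sunset in local time via Astral
observer = LocationInfo(latitude=lat, longitude=lon).observer
day = sun(observer, date=date.today(), tzinfo=tz)
sunrise, sunset = day["sunrise"], day["sunset"]
```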
Icons are based on the categorical precipitation if any is expected, and on the total cloud cover percentage and visibility otherwise. For weather alerts, a GeoJSON is downloaded every 10 minutes from the NWS, and the requested point is iteratively checked to see if it is inside one of the alert polygons. If a point is inside an alert, the details are extracted from the GeoJSON and returned.
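One way to do that point-in-polygon test is with shapely, as in the sketch below; this is an illustration of the idea, not necessarily the library the service itself uses, and the file path is a placeholder:

```python
import json

from shapely.geometry import Point, shape

# NWS alert GeoJSON saved to EFS by the 10-minute refresh job (illustrative path)
with open("/mnt/efs/nws_alerts.geojson") as f:
    alerts = json.load(f)

point = Point(-75.70, 45.42)                     # GeoJSON uses (lon, lat) order

active = []
for feature in alerts["features"]:
    geom = feature.get("geometry")
    if geom and point.within(shape(geom)):
        active.append(feature["properties"])     # event, severity, description, ...
```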
Finally, the forecast is converted into the requested units (defaulting to US customary units for compatibility), and then into the returned JSON payload. The Lambda function takes between 1 and 3 seconds to run, depending on whether the point is inside the HRRR model domain and how many alerts are currently active in the US.
Historical Data
Historical data is stored in the AWS ERA5 bucket in Zarr format, which makes it incredibly easy to work with here! I largely followed the approach outlined here: https://github.com/zflamig/birthday-weather, with some minor tweaks to read one location instead of the entire domain and to process accumulation variables. This dataset doesn't include cloud cover, which presented a significant problem, since that's what's used to determine the weather icons. To work around this, I used the provided shortwave radiation flux variable and compared it against the theoretical clear-sky radiation. This isn't a perfect proxy, since it doesn't work at night, and there are other factors that can impact shortwave radiation besides cloud cover (notably elevation), but it provides a reasonable approximation.
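As a rough sketch of that access pattern with xarray and s3fs; the bucket path and variable name follow the public `era5-pds` Zarr layout as I understand it, so treat them as assumptions:

```python
import s3fs
import xarray as xr

# Open one month of one ERA5 variable straight from the public S3 Zarr store
fs = s3fs.S3FileSystem(anon=True)
store = s3fs.S3Map("era5-pds/zarr/2021/06/data/air_temperature_at_2_metres.zarr", s3=fs)
ds = xr.open_zarr(store, consolidated=True)

# Pull the time series for the single nearest grid cell instead of the whole domain
point = ds["air_temperature_at_2_metres"].sel(
    lat=45.42, lon=360.0 - 75.70, method="nearest"  # ERA5 longitudes run 0-360
)
series = point.load()
```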
AWS API
The final piece of this service relies on two other AWS products: API Gateway and the developer portal. I found the API Gateway (using the REST protocol) fairly straightforward. In this implementation there is one resource, a `GET` request to the custom domain name, which extracts the `{api-key}` and `{location}` from the URL as path parameters. It also checks for URL query parameters. This method then authenticates the request, passes it to the Lambda function, and returns the result.
The trickiest part of this setup was, by far, getting the API Gateway to use an API key from the URL. This isn't officially supported (as opposed to URL query parameters). That makes sense, since passing API keys in a URL isn't a [great idea](https://security.stackexchange.com/questions/118975/is-it-safe-to-include-an-api-key-in-a-requests-url), but for compatibility, I needed to find a way.
After several attempts, what ended up working was a custom Lambda authorizer as described here. Essentially, the API Gateway passes the request to this short Lambda function, which converts the URL path parameter into the API key. This is then passed back to the API Gateway for validation. For this to work, the `API Key Source` needs to be set to `AUTHORIZER` under the settings panel.
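A minimal sketch of what such an authorizer can look like; the path parameter name and policy details are assumptions, but `usageIdentifierKey` is the documented field for returning a key when the API Key Source is set to `AUTHORIZER`:

```python
def lambda_handler(event, context):
    # REQUEST-type authorizer: API Gateway forwards the path parameters,
    # so the key embedded in the URL can be pulled out here.
    api_key = event["pathParameters"]["api-key"]

    return {
        "principalId": "user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow",
                    "Resource": event["methodArn"],
                }
            ],
        },
        # API Gateway validates this value against its configured API keys
        "usageIdentifierKey": api_key,
    }
```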
Next Steps
While this service currently covers almost everything that the Dark Sky API does, I have several ideas for future improvements!
- Text summaries. This is the largest missing piece. Dark Sky open-sourced their translation library, so my plan is to build off that to get this working. All the data is there; it's just a matter of writing the logic required to go from numerical forecasts to weather summaries.
- Additional sources. The approach developed here is largely source agnostic. Any weather forecast service that delivers data using grib files that wgrib2 can understand (all the main ones) could theoretically be added. The NOAA North American Mesoscale (NAM) model would provide higher-resolution forecasts out to 4 days (instead of the 2 days from HRRR). The Canadian HRDPS model is another tempting addition, since it provides data at a resolution even higher than HRRR (2.5 km vs. 3 km)! The European model would also be fantastic to add, since it generally outperforms the GFS model; however, the data is not open, which would add a significant cost.
Other Notes and Assumptions
- While this API will give minutely forecasts for anywhere in the world, they are calculated using the HRRR sub-hourly forecasts, so they are only accurate to 15-minute periods. Outside of the HRRR domain, they are calculated using the GFS hourly forecasts, so they really don't add much value!
- Precipitation probabilities are a difficult problem to solve: weather models don't give a range of results, just one answer. To get probabilities, this implementation relies on the Global Ensemble Forecast System (GEFS). This is a 30-member ensemble, so if 1 member predicts precipitation, the probability will be 1/30. GEFS data is also used to predict precipitation type and accumulation. A 1:10 snow-water ratio is assumed.
- Current conditions are based on model results (from the HRRR sub-hourly model), which assimilate observations, but they are not direct observations.
- Why "Pirate Weather"? I've always thought that the HRRR model was pronounced the same way as the classic pirate "ARRR". Also, there's one company out there that thinks APIs can be copyrighted, which might apply here.
Do you use Pirate Weather? Open a pull request to add it to the list.