Query the USDA NWCC Air and Water Database REST API • awdb

The awdb package provides functions for querying the four endpoints of the Air and Water Database (AWDB) REST API maintained by the National Water and Climate Center (NWCC) at the United States Department of Agriculture (USDA): data, forecasts, reference-data, and metadata. The package is lightweight, with Rust (via extendr) doing most of the heavy lifting to deserialize and flatten deeply nested JSON responses. It is also designed to support pretty printing of results as tibbles if you load the tibble package.

Installation

You can install the release version of awdb from CRAN with:

install.packages("awdb")

Or you can get the development version from GitHub with:

# install.packages("pak")
pak::pak("kbvernon/awdb")

The AWDB REST API

This package provides a separate function to query each endpoint of the USDA AWDB REST API:

Endpoint	Function
data	`get_elements()`
forecasts	`get_forecasts()`
reference-data	`get_references()`
metadata	`get_stations()`

Because the API does not support spatial queries, a request made with an area of interest (aoi) first asks the metadata endpoint for all stations in the database and their spatial coordinates. The package then converts that set to an sf object, applies a spatial filter with the aoi, and sends a second request for elements or forecasts at the stations that fall inside the aoi.
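The filtering step can be sketched with sf alone. This is a minimal illustration of the idea, not the package's internal code: the station table, its coordinates, and the rectangular aoi below are all made up, and no request is sent to the API.

```r
library(sf)

# Hypothetical station table standing in for the metadata response;
# the identifiers and coordinates are invented for illustration.
stations <- data.frame(
  station_triplet = c("A:UT:SNTL", "B:UT:SNTL", "C:ID:SNTL"),
  longitude = c(-111.5, -111.3, -113.0),
  latitude = c(41.9, 42.1, 43.5)
)

# Convert the table to an sf object of points in WGS 84 (EPSG:4326).
stations_sf <- st_as_sf(
  stations,
  coords = c("longitude", "latitude"),
  crs = 4326
)

# Hypothetical area of interest: a rectangle built from a bounding box.
aoi <- st_as_sfc(
  st_bbox(
    c(xmin = -111.7, ymin = 41.6, xmax = -111.1, ymax = 42.5),
    crs = st_crs(4326)
  )
)

# Keep only the stations that intersect the aoi.
in_aoi <- st_filter(stations_sf, aoi)

# These identifiers are what a follow-up request would ask for.
in_aoi[["station_triplet"]]
```

Only the first two stations fall inside the rectangle, so only their identifiers would be passed on to the second request.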

Get Stations

Find all AWDB stations around Bear Lake in northern Utah that measure soil moisture percent at various depths.

library(awdb)
library(sf)
library(tibble)

stations <- get_stations(bear_lake, elements = "SMS:*")

stations
#> Simple feature collection with 9 features and 14 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -111.6296 ymin: 41.68541 xmax: -111.1663 ymax: 42.4132
#> Geodetic CRS:  WGS 84
#> # A tibble: 9 × 15
#>   station_triplet station_id state_code network_code name   dco_code county_name
#>   <chr>           <chr>      <chr>      <chr>        <chr>  <chr>    <chr>      
#> 1 374:UT:SNTL     374        UT         SNTL         Bug L… UT       Rich       
#> 2 484:ID:SNTL     484        ID         SNTL         Frank… UT       Franklin   
#> 3 1114:UT:SNTL    1114       UT         SNTL         Garde… UT       Cache      
#> 4 493:ID:SNTL     493        ID         SNTL         Giveo… ID       Bear Lake  
#> 5 1115:UT:SNTL    1115       UT         SNTL         Klond… UT       Cache      
#> 6 1013:UT:SNTL    1013       UT         SNTL         Templ… UT       Cache      
#> 7 823:UT:SNTL     823        UT         SNTL         Tony … UT       Cache      
#> 8 1113:UT:SNTL    1113       UT         SNTL         Tony … UT       Cache      
#> 9 1098:UT:SNTL    1098       UT         SNTL         Usu D… UT       Rich       
#> # ℹ 8 more variables: huc <chr>, elevation <dbl>, data_time_zone <dbl>,
#> #   pedon_code <chr>, shef_id <chr>, begin_date <chr>, end_date <chr>,
#> #   geometry <POINT [°]>

Get Elements

USDA NWCC refers to soil, snow, stream, and weather variables measured at AWDB stations as “elements.” Here we get snow water equivalent and soil moisture measurements around Bear Lake in early May of 2015.

elements <- get_elements(
  bear_lake,
  elements = c("WTEQ", "SMS:8"),
  awdb_options = set_options(
    begin_date = "2015-05-01",
    end_date = "2015-05-07"
  )
)

elements[c(
  "station_triplet",
  "element_code",
  "element_values"
)]
#> # A tibble: 10 × 3
#>    station_triplet element_code element_values  
#>    <chr>           <chr>        <list>          
#>  1 374:UT:SNTL     WTEQ         <tibble [7 × 2]>
#>  2 471:ID:SNTL     WTEQ         <tibble [7 × 2]>
#>  3 484:ID:SNTL     WTEQ         <tibble [7 × 2]>
#>  4 1114:UT:SNTL    WTEQ         <tibble [7 × 2]>
#>  5 493:ID:SNTL     WTEQ         <tibble [7 × 2]>
#>  6 1115:UT:SNTL    WTEQ         <tibble [7 × 2]>
#>  7 1013:UT:SNTL    WTEQ         <tibble [7 × 2]>
#>  8 823:UT:SNTL     WTEQ         <tibble [7 × 2]>
#>  9 1113:UT:SNTL    WTEQ         <tibble [7 × 2]>
#> 10 1098:UT:SNTL    WTEQ         <tibble [7 × 2]>

elements[["element_values"]][[1]]
#> # A tibble: 7 × 2
#>   date       value
#>   <chr>      <dbl>
#> 1 2015-05-01   2.1
#> 2 2015-05-02   1.1
#> 3 2015-05-03   0  
#> 4 2015-05-04   0  
#> 5 2015-05-05   0  
#> 6 2015-05-06   0  
#> 7 2015-05-07   0

These are time series, so the element values come in a list column of data frames, each with at least a date and a value column. tidyr::unnest() is a convenient way to unpack all of them at once.
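As a sketch of that unpacking step, the example below builds a small tibble with the same shape as the get_elements() result and unnests it. The object elements_demo is hypothetical, and the values for the second station are made up, so the example runs without calling the API.

```r
library(tibble)
library(tidyr)

# Toy data shaped like the get_elements() result above;
# elements_demo is a stand-in and some values are invented.
elements_demo <- tibble(
  station_triplet = c("374:UT:SNTL", "484:ID:SNTL"),
  element_code = c("WTEQ", "WTEQ"),
  element_values = list(
    tibble(date = c("2015-05-01", "2015-05-02"), value = c(2.1, 1.1)),
    tibble(date = c("2015-05-01", "2015-05-02"), value = c(5.0, 4.6))
  )
)

# Expand the list column: one row per station, element, and date.
long <- unnest(elements_demo, cols = element_values)

# Dates arrive as character, so convert them before doing date arithmetic.
long[["date"]] <- as.Date(long[["date"]])

long
```

The station and element columns are repeated for every date, which is usually the layout that plotting and grouping functions expect.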

Get Forecasts

Get streamflow forecasts for the Cascades in west central Oregon. As with get_elements(), this returns a list column.

forecasts <- get_forecasts(cascades, elements = "SRVO")

forecasts[c(
  "station_triplet",
  "element_code",
  "publication_date",
  "forecast_period",
  "forecast_values"
)]
#> # A tibble: 155 × 5
#>    station_triplet element_code publication_date forecast_period forecast_values
#>    <chr>           <chr>        <chr>            <chr>           <list>         
#>  1 14050000:OR:US… SRVO         2025-01-01 00:00 02-01:07-31     <tibble>       
#>  2 14050000:OR:US… SRVO         2025-02-01 00:00 02-01:07-31     <tibble>       
#>  3 14050000:OR:US… SRVO         2025-01-01 00:00 02-01:09-30     <tibble>       
#>  4 14050000:OR:US… SRVO         2025-02-01 00:00 02-01:09-30     <tibble>       
#>  5 14050000:OR:US… SRVO         2025-03-01 00:00 03-01:07-31     <tibble>       
#>  6 14050000:OR:US… SRVO         2025-03-01 00:00 03-01:09-30     <tibble>       
#>  7 14050000:OR:US… SRVO         2025-01-01 00:00 04-01:07-31     <tibble>       
#>  8 14050000:OR:US… SRVO         2025-02-01 00:00 04-01:07-31     <tibble>       
#>  9 14050000:OR:US… SRVO         2025-03-01 00:00 04-01:07-31     <tibble>       
#> 10 14050000:OR:US… SRVO         2025-04-01 00:00 04-01:07-31     <tibble>       
#> # ℹ 145 more rows

forecasts[["forecast_values"]][[1]]
#> # A tibble: 5 × 2
#>   probability value
#>   <chr>       <dbl>
#> 1 10             50
#> 2 30             43
#> 3 50             38
#> 4 70             34
#> 5 90             28
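A common next step is to pull a single probability level out of each nested table. The sketch below does this for the 50 percent row using toy data shaped like the output above. The object forecasts_demo, the shortened station identifier, the second set of values, and the column name value_p50 are all assumptions made for illustration.

```r
library(tibble)

# Toy data shaped like the get_forecasts() result above;
# the second nested table holds invented values.
forecasts_demo <- tibble(
  station_triplet = c("14050000:OR", "14050000:OR"),
  publication_date = c("2025-01-01 00:00", "2025-02-01 00:00"),
  forecast_values = list(
    tibble(
      probability = c("10", "30", "50", "70", "90"),
      value = c(50, 43, 38, 34, 28)
    ),
    tibble(
      probability = c("10", "30", "50", "70", "90"),
      value = c(47, 41, 36, 33, 27)
    )
  )
)

# probability is a character column in the output above, so compare to "50".
# vapply() checks that each nested table yields exactly one number.
forecasts_demo[["value_p50"]] <- vapply(
  forecasts_demo[["forecast_values"]],
  function(fv) fv[["value"]][fv[["probability"]] == "50"],
  numeric(1)
)

forecasts_demo[c("publication_date", "value_p50")]
```

If a nested table has no row at that probability level, vapply() raises an error rather than silently returning a shorter vector, which makes gaps easy to spot.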

Get References

An unusual endpoint in this REST API is reference-data. If you have ever worked with government employees or the military, you may be aware that they prefer an extremely condensed form of speech jammed full of acronyms and other codes. The reference-data endpoint helps clarify this cryptic language with data dictionaries that explain what each code used in the database actually means. In the process, it also provides an exhaustive list of available options. You can access all of this with get_references(). For instance, getting a table of all possible station elements in the AWDB is as simple as this:

get_references("elements")
#> # A tibble: 116 × 9
#>    code  name     physical_element_name function_code data_precision description
#>    <chr> <chr>    <chr>                 <chr>                  <int> <chr>      
#>  1 TAVG  AIR TEM… air temperature       V                          1 Average Ai…
#>  2 TMAX  AIR TEM… air temperature       X                          1 Maximum Ai…
#>  3 TMIN  AIR TEM… air temperature       N                          1 Minimum Ai…
#>  4 TOBS  AIR TEM… air temperature       C                          1 Instantane…
#>  5 PRES  BAROMET… barometric pressure   C                          2 Barometric…
#>  6 BATT  BATTERY  battery               C                          2 Battery Vo…
#>  7 BATV  BATTERY… battery               V                          2 <NA>       
#>  8 BATX  BATTERY… battery               X                          2 Maximum Ba…
#>  9 BATN  BATTERY… battery               N                          2 Minimum Ba…
#> 10 ETIB  BATTERY… battery-eti precip g… C                          2 <NA>       
#> # ℹ 106 more rows
#> # ℹ 3 more variables: stored_unit_code <chr>, english_unit_code <chr>,
#> #   metric_unit_code <chr>

Additional Query Parameters

In the get_elements() example above, we use set_options() to pass additional query parameters. If you don’t pass any arguments, it uses the defaults assumed by the AWDB REST API. Note that not every parameter is passed to every endpoint. The reference-data endpoint, for example, doesn’t take any query parameters other than reference_type. To see what goes where, print the awdb_options list returned by set_options(). This also shows the current value of each parameter.

set_options()
#> 
#> ── AWDB Query Parameter Set ────────────────────────────────────────────────────
#> Options passed to each endpoint.
#> 
#>                           VALUE STATION ELEMENT FORECAST
#> networks                      *     [X]     [X]      [X]
#> duration                  DAILY     [X]     [X]      [ ]
#> begin_date                 NULL     [ ]     [X]      [ ]
#> end_date                   NULL     [ ]     [X]      [ ]
#> period_reference            END     [ ]     [X]      [ ]
#> central_tendency           NULL     [ ]     [X]      [ ]
#> return_flags              FALSE     [ ]     [X]      [ ]
#> return_original_values    FALSE     [ ]     [X]      [ ]
#> return_suspect_values     FALSE     [ ]     [X]      [ ]
#> begin_publication_date     NULL     [ ]     [ ]      [X]
#> end_publication_date       NULL     [ ]     [ ]      [X]
#> exceedence_probabilities   NULL     [ ]     [ ]      [X]
#> forecast_periods           NULL     [ ]     [ ]      [X]
#> station_names              NULL     [X]     [X]      [X]
#> dco_codes                  NULL     [X]     [X]      [X]
#> county_names               NULL     [X]     [X]      [X]
#> hucs                       NULL     [X]     [X]      [X]
#> return_forecast_metadata  FALSE     [X]     [ ]      [ ]
#> return_reservoir_metadata FALSE     [X]     [ ]      [ ]
#> return_element_metadata   FALSE     [X]     [ ]      [ ]
#> active_only                TRUE     [X]     [X]      [X]
#> request_size                 10     [ ]     [X]      [X]
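Because set_options() returns an ordinary object, one way to work with it is to build the parameter set once and reuse it across calls. The sketch below uses only parameter names that appear in the table above. The chosen values and the variable name opts are illustrative, and the commented-out request assumes the same bear_lake area of interest used earlier.

```r
library(awdb)

# Build a parameter set once; anything not named here keeps its default.
opts <- set_options(
  begin_date = "2015-05-01",
  end_date = "2015-05-07",
  return_flags = TRUE
)

# Printing shows the current values and which endpoints receive them.
opts

# The same object can then be passed to a query (this sends a request):
# get_elements(bear_lake, elements = "WTEQ", awdb_options = opts)
```

Per the table, begin_date, end_date, and return_flags are only passed to the elements endpoint, so they are the parameters to check first when a get_elements() query returns unexpected dates or columns.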