💾 Archived View for republic.circumlunar.space › users › johngodlee › posts › 2020-11-08-soil_query.… captured on 2023-06-14 at 14:50:55. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2021-12-04)
-=-=-=-=-=-=-
DATE: 2020-11-08
AUTHOR: John L. Godlee
This week I have been working on the SEOSAW database[1], which holds tree measurement data from around 7000 survey plots in southern Africa. The database is always difficult to deal with, mostly because the component datasets were all collected with different initial purposes, meaning there is a significant amount of data bashing to get each one into the SEOSAW format. I chose to use an R package to hold functions and workflows for cleaning the datasets and compiling the main database. In previous versions of the package I had a number of reduced size spatial data layers held as data objects in the package, which I could then query for information at each plot such as estimated herbivore biomass, elevation, and administrative region. I've since removed all of these functions, because they were too temperamental and the trade-off in spatial resolution that came from storing reduced size versions of each data layer in the package was too great. Now the data layers are queried at full resolution in a separate process right at the end of dataset compilation.
One of the functions which I was actually quite proud of was one that queried the SoilGrids[2] REST API[3] to get soil information. The function also provided functionality for averaging over soil depths. The SoilGrids REST API is still experimental, and since I first wrote the function they upgraded to v2.0. This broke the function and so it didn't work for a long time.
3: https://rest.soilgrids.org/soilgrids/v2.0/docs
I re-wrote the function yesterday and thought I would post it here, as it probably won't get any use elsewhere. I also wrote a companion function which sends an empty query to get back all possible combinations of soil attributes, depths, and values (mean, Q0.05 etc.).
#' Query SoilGrids via the REST API #' #' See https://rest.soilgrids.org/soilgrids/v2.0/docs for more information #' #' @param x dataframe of plot level data #' @param plot_id column name string of plot IDs #' @param longitude_of_centre column name string of plot longitude #' @param latitude_of_centre column name string of plot latitude #' @param attrib vector of soil attributes to extract #' @param depth vector of soil depths over which to extract the attributes, e.g. 0-5, 30-60 #' @param average optional list of vectors, each of length 2, describing ranges #' of depths over which to average (mean) the values of attrib #' @param value vector of soil values to extract for each attribute, e.g. mean, Q0.05 #' #' @details See \code{soilQueryOptions()} for all possibly combinations of #' attrib, depth, and value #' #' @return dataframe of soil values by depth #' #' @importFrom httr GET content #' @export #' soilQuery <- function(x, plot_id = "plot_id", longitude_of_centre = "longitude_of_centre", latitude_of_centre = "latitude_of_centre", attrib = c("cec", "cfvo", "clay", "nitrogen", "ocd", "ocs", "phh20", "sand", "silt", "soc"), depth = c("0-5", "5-15", "15-30", "30-60", "60-100", "100-200"), average = list(c(0,30)), value = "mean") { default_attrib <- c("bdod", "cec", "cfvo", "clay", "nitrogen", "ocd", "ocs", "phh20", "sand", "silt", "soc") default_depth <- c("0-5", "5-15", "15-30", "30-60", "60-100", "100-200") depth_values <- unique(unlist(strsplit(default_depth, "-"))) default_value <- c("Q0.05", "Q0.5", "Q0.95", "mean", "uncertainty") # Check input if (any(!attrib %in% default_attrib)) { stop("Illegal soil attribute: ", paste(attrib[!attrib %in% default_attrib], collapse = ", "), "\n\tAllowed values: ", paste(default_attrib, collapse = ", "), "\n\tRun `soilQueryOptions()` to find valid attributes") } if (any(!depth %in% default_depth)) { stop("Illegal depth: ", paste(depth[!depth %in% default_depth], collapse = ", "), "\n\tallowed values: ", paste(default_depth, collapse = ", "), "\n\tRun `soilQueryOptions()` to find valid depths") } if (!is.null(average)) { if (any(!unlist(average) %in% depth_values)) { stop("Illegal averaging depth: ", paste(average[!unlist(average) %in% depth_values], collapse = ", "), "\n\tallowed values: ", paste(depth_values, collapse = ", ")) } if (any(unlist(average) > max(depth_values) | unlist(average) < min(depth_values))) { stop("average range values outside sampled depths: ", paste(unlist(average)[unlist(average) > max(depth_values) | unlist(average) < min(depth_values)], collapse = ", ")) } if (any(sapply(average, function(y) { length(y) != 2 }))) { stop("All average ranges must be vectors of length 2") } } if (any(!value %in% default_value)) { stop("Illegal value: ", paste(value[!value %in% default_value], collapse = ", "), "\n\tallowed values: ", paste(default_value, collapse = ", "), "\n\tRun `soilQueryOptions()` to find valid values") } # Construct query attrib_string <- paste0(paste0("&property=", attrib), collapse = "") depth_string <- paste0(paste0("&depth=", depth, "cm"), collapse = "") value_string <- paste0(paste0("&value=", value), collapse = "") queries <- lapply(1:nrow(x), function(y) { call <- paste0("https://rest.soilgrids.org/soilgrids/v2.0/properties/query?", "lon=", x[y, longitude_of_centre], "&lat=", x[y, latitude_of_centre], attrib_string, depth_string, value_string) }) # Run GET query query_get <- lapply(queries, httr::GET) # If any queries fail, end function if (any(unlist(lapply(query_get, `[[`, "status_code")) != 200)) { stop("Some queries failed") } # Flatten to list query_list <- lapply(query_get, httr::content, as = "parsed") # For each query, for each attrib, for each depth, for each value, extract values query_extract <- lapply(query_list, function(y) { lapply(y$properties$layers, function(z) { lapply(z$depths, function(v) { lapply(v$values, function(w) { w }) }) }) }) # Make tidy dataframe of values extract_df <- data.frame( plot_id = rep(x[[plot_id]], each = length(attrib) * length(depth) * length(value)), attrib = rep(rep(attrib, each = length(depth) * length(value)), times = length(x[[plot_id]])), depth = rep(rep(rep(depth, each = length(value)), times = length(x[[plot_id]])), times = length(attrib)), value = rep(rep(rep(value, times = length(x[[plot_id]])), times = length(attrib)), times = length(depth)), extract = unlist(query_extract)) # Optionally average each value and attrib over depths if (!is.null(average)) { extract_split <- split(extract_df, list(extract_df$plot_id, extract_df$attrib, extract_df$value)) extract_df <- do.call(rbind, lapply(extract_split, function(y) { do.call(rbind, lapply(average, function(z) { y$min_depth <- sapply(strsplit(y$depth, "-"), `[`, 1) y$max_depth <- sapply(strsplit(y$depth, "-"), `[`, 2) out_df <- data.frame(plot_id = y[1,plot_id], attrib = y[1, "attrib"], depth = paste(y$min_depth[which(y$min_depth == z[1])], y$max_depth[which(y$max_depth == z[2])], sep = "-"), value = y[1, "value"], extract = mean(y$extract[which(y$min_depth == z[1]) : which(y$max_depth == z[2])], na.rm = TRUE)) })) })) row.names(extract_df) <- NULL } return(extract_df) }
The function to get all possible input options:
#' Query SoilGrids via the REST API #' #' See https://rest.soilgrids.org/soilgrids/v2.0/docs for more information #' #' @return Dataframe of all soil attributes, depths and values #' #' @importFrom httr GET content #' @export #' soilQueryOptions <- function() { query_get <- httr::GET("https://rest.soilgrids.org/soilgrids/v2.0/properties/layers") query_list <- httr::content(query_get, as = "parsed") out <- do.call(rbind, lapply(query_list$layers, function(x) { attrib <- x$property attrib_df <- do.call(rbind, lapply(x$layer_structure, function(y) { depth <- y$range value <- unlist(y$values) data.frame(depth, value) })) cbind(attrib = attrib, attrib_df) })) return(out) }
A test of soilQuery() for three locations in Africa:
x <- data.frame(plot_id = c("A", "B", "C", "D"), longitude_of_centre = c(22.674 , 15.555, 32.122, -1.319738), latitude_of_centre = c(7.140, -14.736, -6.231, 53.065368)) attrib = c("cec", "clay", "nitrogen") depth = c("0-5", "5-15", "15-30", "60-100") average = list(c(0,30), c(0,15), c(60,100)) value = c("mean", "Q0.5") soilQuery(x, attrib = attrib, depth = depth, average = average, value = value)
┌─────────┬──────────┬────────┬───────┬─────────┐ │ plot_id │ attrib │ depth │ value │ extract │ ╞═════════╪══════════╪════════╪═══════╪═════════╡ │ A │ cec │ 0-30 │ mean │ 102 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ A │ cec │ 0-15 │ mean │ 106 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ A │ cec │ 60-100 │ mean │ 100 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ B │ cec │ 0-30 │ mean │ 83 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ B │ cec │ 0-15 │ mean │ 84.500 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ B │ cec │ 60-100 │ mean │ 72 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ C │ cec │ 0-30 │ mean │ 53 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ C │ cec │ 0-15 │ mean │ 56.500 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ C │ cec │ 60-100 │ mean │ 44 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ D │ cec │ 0-30 │ mean │ 152.330 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ D │ cec │ 0-15 │ mean │ 173 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ D │ cec │ 60-100 │ mean │ 94 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ A │ clay │ 0-30 │ mean │ 187 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ A │ clay │ 0-15 │ mean │ 161.500 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ A │ clay │ 60-100 │ mean │ 335 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ B │ clay │ 0-30 │ mean │ 85.670 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ B │ clay │ 0-15 │ mean │ 73.500 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ B │ clay │ 60-100 │ mean │ 321 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ C │ clay │ 0-30 │ mean │ 108 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ C │ clay │ 0-15 │ mean │ 110.500 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ C │ clay │ 60-100 │ mean │ 265 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ D │ clay │ 0-30 │ mean │ 195.330 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ D │ clay │ 0-15 │ mean │ 211 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ D │ clay │ 60-100 │ mean │ 111 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ A │ nitrogen │ 0-30 │ mean │ 97 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ A │ nitrogen │ 0-15 │ mean │ 110.500 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ A │ nitrogen │ 60-100 │ mean │ 45 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ B │ nitrogen │ 0-30 │ mean │ 48 │ ├─────────┼──────────┼────────┼───────┼─────────┤ │ . │ ... │ ... │ ... │ ... │ └─────────┴──────────┴────────┴───────┴─────────┘