DATE: 2020-10-31
AUTHOR: John L. Godlee
There are lots of R packages that generate species by site abundance matrices from a long-format dataframe of records. For example, labdsv::matrify() takes a three-column dataframe like this and returns a dataframe with one row per site and one column per species:
┌──────┬────────────────┬───────────┐
│ Site │ Species        │ Abundance │
╞══════╪════════════════╪═══════════╡
│ A    │ Quercus robur  │ 10        │
├──────┼────────────────┼───────────┤
│ B    │ Quercus robur  │ 2         │
├──────┼────────────────┼───────────┤
│ B    │ Betula pendula │ 30        │
├──────┼────────────────┼───────────┤
│ ...  │ ...            │ ...       │
└──────┴────────────────┴───────────┘
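To make the target format concrete, here is a minimal base R sketch of the same long-to-wide conversion using xtabs(). It is not how matrify() is implemented, only an illustration of the output shape; the dataframe `dat` and its values are made up to mirror the table above.

```r
# Toy summarised records mirroring the table above (illustrative values)
dat <- data.frame(
  Site = c("A", "B", "B"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  Abundance = c(10, 2, 30)
)

# Cross-tabulate: sites become rows, species become columns,
# and site-species pairs with no record are filled with 0
comm <- xtabs(Abundance ~ Site + Species, data = dat)

# Convert the table to a plain dataframe
comm_df <- as.data.frame.matrix(comm)
comm_df
```

Like matrify(), this only works because each row already holds a summarised abundance value.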
matrify() relies on the data already being summarised, but what if each row were a single record, as would be the case with raw tree diameter measurements, rather than a count of abundance:
┌──────┬────────────────┬────────┐
│ Site │ Species        │ DBH    │
╞══════╪════════════════╪════════╡
│ A    │ Quercus robur  │ 15.600 │
├──────┼────────────────┼────────┤
│ A    │ Quercus robur  │ 5.400  │
├──────┼────────────────┼────────┤
│ A    │ Betula pendula │ 11     │
├──────┼────────────────┼────────┤
│ ...  │ ...            │ ...    │
└──────┴────────────────┴────────┘
It wouldn't be hard to turn this into a summary table with some dplyr:
count(dat, Site, Species)
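For readers without dplyr, roughly the same summary can be sketched in base R with table(). The dataframe `dat` below is a made-up stand-in for the raw records, and the column name `n` is chosen to match what dplyr::count() uses by default.

```r
# Toy raw records, one row per stem (illustrative values)
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11)
)

# Count the rows in each Site x Species combination
counts <- as.data.frame(
  table(Site = dat$Site, Species = dat$Species),
  responseName = "n", stringsAsFactors = FALSE
)

# Drop combinations that never occur, so only observed pairs remain
counts <- counts[counts$n > 0, ]
counts
```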
Additionally, what if sampling effort varies among individuals, for example if stems less than 10 cm DBH were only measured in a 20x10 m subplot within a larger 20x50 m plot, giving those records a sampling weight (FPC) of 200 / 1000 = 0.2:
┌──────┬────────────────┬────────┬───────┐
│ Site │ Species        │ DBH    │ FPC   │
╞══════╪════════════════╪════════╪═══════╡
│ A    │ Quercus robur  │ 15.600 │ 1     │
├──────┼────────────────┼────────┼───────┤
│ A    │ Quercus robur  │ 5.400  │ 0.200 │
├──────┼────────────────┼────────┼───────┤
│ A    │ Betula pendula │ 11     │ 1     │
├──────┼────────────────┼────────┼───────┤
│ ...  │ ...            │ ...    │ ...   │
└──────┴────────────────┴────────┴───────┘
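The arithmetic behind the weighting can be sketched in a few lines of base R, assuming each record should count as 1 / FPC individuals at the scale of the whole plot. The dataframe `dat` and the column name `weighted` are made up for illustration.

```r
# Toy records mirroring the table above (illustrative values)
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11),
  FPC = c(1, 0.2, 1)
)

# Each record stands for 1 / FPC individuals in the full plot
dat$weighted <- 1 / dat$FPC

# Sum the weights within each Site x Species combination
scaled <- aggregate(weighted ~ Site + Species, data = dat, FUN = sum)
scaled
```

Here the 5.4 cm Quercus robur stem was sampled in a fifth of the plot, so it counts as 5 individuals and the species total at site A is 6 rather than 2.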
Alternatively, the measure of abundance might not be a count of individuals, but the canopy cover of each individual:
┌──────┬────────────────┬────────┬───────┐
│ Site │ Species        │ DBH    │ Cover │
╞══════╪════════════════╪════════╪═══════╡
│ A    │ Quercus robur  │ 15.600 │ 2.530 │
├──────┼────────────────┼────────┼───────┤
│ A    │ Quercus robur  │ 5.400  │ 1.010 │
├──────┼────────────────┼────────┼───────┤
│ A    │ Betula pendula │ 11     │ 2.400 │
├──────┼────────────────┼────────┼───────┤
│ ...  │ ...            │ ...    │ ...   │
└──────┴────────────────┴────────┴───────┘
Then it becomes much harder to create one of these matrices.
Wouldn't it be nice to have a base R function to create species by site abundance matrices, which can deal with sampling effort, alternative measures of abundance, and unsummarised data?
#' Generate a species by site abundance matrix
#'
#' @param x dataframe of individual records
#' @param site_id column name string of site IDs
#' @param species_id column name string of species names
#' @param fpc optional column name string of sampling weights of each record,
#'     between 0 and 1
#' @param abundance optional column name string with an alternative abundance
#'     measure such as biomass, canopy cover, body length
#'
#' @return dataframe of species abundances (columns) per site (rows)
#'
#' @examples
#' x <- data.frame(site_id = rep(c("A", "B", "C"), each = 3),
#'   species_id = sample(c("a", "b", "c", "d"), 9, replace = TRUE),
#'   fpc = rep(c(0.5, 0.6, 1), each = 3),
#'   abundance = 1:9)
#' abMat(x, "site_id", "species_id")
#' abMat(x, "site_id", "species_id", "fpc")
#' abMat(x, "site_id", "species_id", "fpc", "abundance")
#'
#' @export
abMat <- function(x, site_id, species_id, fpc = NULL, abundance = NULL) {
  # If no fpc or abundance column is given, give every record a value of 1
  if (is.null(fpc)) {
    x$fpc <- 1
  } else {
    x$fpc <- x[[fpc]]
  }

  if (is.null(abundance)) {
    x$abundance <- 1
  } else {
    x$abundance <- x[[abundance]]
  }

  # Get all species and sites
  species <- unique(x[[species_id]])
  sites <- unique(x[[site_id]])

  # Create empty site (rows) by species (columns) matrix
  comm <- matrix(0, nrow = length(sites), ncol = length(species))

  # Fill matrix; seq_along() avoids looping over c(1, 0) when x has no rows
  for (i in seq_along(sites)) {
    for (j in seq_along(species)) {
      abu <- x[x[[site_id]] == sites[i] & x[[species_id]] == species[j],
        c(site_id, species_id, "fpc", "abundance")]
      # Scale each record's abundance by its sampling weight, then sum
      comm[i, j] <- sum(abu$abundance / abu$fpc, na.rm = TRUE)
    }
  }

  # Make tidy with names
  comm <- data.frame(comm)
  names(comm) <- as.character(species)
  row.names(comm) <- sites

  return(comm)
}
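The nested loop is easy to follow but touches every site-species cell in turn. As a possible alternative, the same sums can be sketched with a single tapply() call. The function name `abMatTapply` is hypothetical and not part of the original post; it assumes the same arguments as abMat() and the same abundance / fpc weighting.

```r
# Hypothetical vectorised variant of abMat(), for illustration only
abMatTapply <- function(x, site_id, species_id, fpc = NULL, abundance = NULL) {
  # Default weight and abundance of 1 per record, as in abMat()
  w <- if (is.null(fpc)) rep(1, nrow(x)) else x[[fpc]]
  a <- if (is.null(abundance)) rep(1, nrow(x)) else x[[abundance]]

  # Sum abundance / fpc within each site-species cell
  tab <- tapply(a / w, list(x[[site_id]], x[[species_id]]), sum, na.rm = TRUE)

  # Cells with no records come back as NA, so set them to 0
  tab[is.na(tab)] <- 0

  as.data.frame.matrix(tab)
}

# Toy records (illustrative values)
x <- data.frame(
  site = c("A", "A", "B"),
  sp = c("a", "a", "b"),
  fpc = c(1, 0.5, 1),
  cover = c(2, 3, 4)
)

abMatTapply(x, "site", "sp")
abMatTapply(x, "site", "sp", "fpc")
abMatTapply(x, "site", "sp", "fpc", "cover")
```

One difference to be aware of: tapply() orders rows and columns alphabetically (or by factor level), whereas abMat() keeps sites and species in order of first appearance.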