Making abundance matrices

DATE: 2020-10-31

AUTHOR: John L. Godlee

There are lots of R packages that generate species by site abundance matrices from a long-format dataframe of records. For example, labdsv::matrify() takes a three-column dataframe like this:

┌──────┬────────────────┬───────────┐
│ Site │    Species     │ Abundance │
╞══════╪════════════════╪═══════════╡
│  A   │ Quercus robur  │    10     │
├──────┼────────────────┼───────────┤
│  B   │ Quercus robur  │     2     │
├──────┼────────────────┼───────────┤
│  B   │ Betula pendula │    30     │
├──────┼────────────────┼───────────┤
│ ...  │      ...       │    ...    │
└──────┴────────────────┴───────────┘
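
If the data are already summarised like this, base R can do a similar reshaping. As a rough sketch (not how matrify() works internally, just an equivalent result), xtabs() sums the Abundance column for each site and species combination. The dataframe name dat is my own choice, and the rows are only those shown in the table above:

```r
# Summarised long-format records, copied from the table above
dat <- data.frame(
  Site = c("A", "B", "B"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  Abundance = c(10, 2, 30),
  stringsAsFactors = FALSE)

# Cross-tabulate: sites as rows, species as columns, summed abundance in cells
comm <- xtabs(Abundance ~ Site + Species, data = dat)
comm
```

Combinations that never occur in the data, such as Betula pendula in site A, are filled with zero.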

matrify() relies on the data already being summarised. But what if each row is an individual record, as would be the case with raw tree diameter measurements, rather than a count of abundance:

┌──────┬────────────────┬────────┐
│ Site │    Species     │  DBH   │
╞══════╪════════════════╪════════╡
│  A   │ Quercus robur  │ 15.600 │
├──────┼────────────────┼────────┤
│  A   │ Quercus robur  │ 5.400  │
├──────┼────────────────┼────────┤
│  A   │ Betula pendula │   11   │
├──────┼────────────────┼────────┤
│ ...  │      ...       │  ...   │
└──────┴────────────────┴────────┘

It wouldn't be hard to turn this into a summary table with some dplyr:

count(dat, Site, Species)
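
For anyone avoiding dplyr, here is a base R sketch of the same count. It assumes a dataframe called dat holding the three records from the table above, and the column name n is my own choice. One difference to be aware of: table() also returns site and species combinations with zero records, which count() drops by default.

```r
# Individual records, one row per stem, copied from the table above
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11),
  stringsAsFactors = FALSE)

# Count records per site and species, returned in long format
counts <- as.data.frame(
  table(Site = dat$Site, Species = dat$Species),
  responseName = "n")
counts
```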

Additionally, what if individuals vary according to sampling effort, for example if stems smaller than 10 cm DBH were only measured in a 20x10 m box within a larger 20x50 m plot:

┌──────┬────────────────┬────────┬───────┐
│ Site │    Species     │  DBH   │  FPC  │
╞══════╪════════════════╪════════╪═══════╡
│  A   │ Quercus robur  │ 15.600 │   1   │
├──────┼────────────────┼────────┼───────┤
│  A   │ Quercus robur  │ 5.400  │ 0.200 │
├──────┼────────────────┼────────┼───────┤
│  A   │ Betula pendula │   11   │   1   │
├──────┼────────────────┼────────┼───────┤
│ ...  │      ...       │  ...   │  ...  │
└──────┴────────────────┴────────┴───────┘
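
To make the weighting concrete, here is a small worked sketch. It assumes FPC is the fraction of the full plot area in which an individual could have been recorded, so a small stem sampled in the 20x10 m box gets 200 / 1000 = 0.2 and stands in for 1 / 0.2 = 5 stems at the scale of the whole plot. The variable names are mine, and the rows are the three from the table above:

```r
# Areas of the full plot and of the box where small stems were measured (m^2)
plot_area <- 20 * 50
subplot_area <- 20 * 10
fpc_small <- subplot_area / plot_area

dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11),
  stringsAsFactors = FALSE)

# Stems under 10 cm DBH were only sampled in the smaller box
dat$FPC <- ifelse(dat$DBH < 10, fpc_small, 1)

# Each record counts as 1 / FPC individuals at the scale of the full plot
dat$weight <- 1 / dat$FPC

# Weighted abundance of Quercus robur in site A: one large stem plus
# one small stem scaled up by its sampling weight
oak <- sum(dat$weight[dat$Species == "Quercus robur"])
oak
```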

Or if the measure of abundance isn't individual presence, but the canopy cover of the individual:

┌──────┬────────────────┬────────┬───────┐
│ Site │    Species     │  DBH   │ Cover │
╞══════╪════════════════╪════════╪═══════╡
│  A   │ Quercus robur  │ 15.600 │ 2.530 │
├──────┼────────────────┼────────┼───────┤
│  A   │ Quercus robur  │ 5.400  │ 1.010 │
├──────┼────────────────┼────────┼───────┤
│  A   │ Betula pendula │   11   │ 2.400 │
├──────┼────────────────┼────────┼───────┤
│ ...  │      ...       │  ...   │  ...  │
└──────┴────────────────┴────────┴───────┘

Then it becomes much harder to create one of these matrices.

Wouldn't it be nice to have a base R function to create species by site abundance matrices, which can deal with sampling effort, alternative measures of abundance, and unsummarised data?

#' Generate a species by site abundance matrix
#'
#' @param x dataframe of individual records
#' @param site_id column name string of site IDs
#' @param species_id column name string of species names
#' @param fpc optional column name string of sampling weights of each record,
#'     greater than 0 and no more than 1
#' @param abundance optional column name string with an alternative abundance 
#'     measure such as biomass, canopy cover, body length
#'
#' @return dataframe of species abundances (columns) per site (rows)
#' 
#' @examples
#' x <- data.frame(site_id = rep(c("A", "B", "C"), each = 3), 
#'   species_id = sample(c("a", "b", "c", "d"), 9, replace = TRUE), 
#'   fpc = rep(c(0.5, 0.6, 1), each = 3), 
#'   abundance = 1:9)
#' abMat(x, "site_id", "species_id")
#' abMat(x, "site_id", "species_id", "fpc")
#' abMat(x, "site_id", "species_id", "fpc", "abundance")
#' 
#' @export
#' 
abMat <- function(x, site_id, species_id, fpc = NULL, abundance = NULL) {
  # If no fpc or abundance column is given, default each to 1
  if (is.null(fpc)) {
    x$fpc <- 1
  } else {
    x$fpc <- x[[fpc]]
  }
  if (is.null(abundance)) {
    x$abundance <- 1 
  } else {
    x$abundance <- x[[abundance]]
  }

  # Get all species and sites
  species <- unique(x[[species_id]])
  sites <- unique(x[[site_id]])

  # Create empty species by site matrix
  comm <- matrix(0, nrow = length(sites), ncol = length(species))

  # Fill matrix
  for (i in seq_along(sites)) {
    for (j in seq_along(species)) {
      # Subset to records of species j in site i
      abu <- x[x[[site_id]] == sites[i] & x[[species_id]] == species[j],
        c(site_id, species_id, "fpc", "abundance")]
      # Scale each record up by its sampling weight, then sum
      comm[i,j] <- sum(abu$abundance / abu$fpc, na.rm = TRUE)
    }
  }

  # Make tidy with names
  comm <- data.frame(comm)
  names(comm) <- species
  row.names(comm) <- sites

  return(comm)
}
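
As a cross-check on the nested loop, the same sum of abundance divided by sampling weight can be sketched without a loop using tapply(). This is an alternative I'm offering for comparison, not part of the function above. The first three rows come from the tables earlier in the post, while the site B row is made up so that the matrix has more than one row:

```r
dat <- data.frame(
  Site = c("A", "A", "A", "B"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula",
    "Betula pendula"),
  FPC = c(1, 0.2, 1, 1),
  Cover = c(2.53, 1.01, 2.4, 1.8),  # last value is hypothetical
  stringsAsFactors = FALSE)

# Sum weighted cover within each site and species combination
comm <- tapply(dat$Cover / dat$FPC, list(dat$Site, dat$Species), sum)

# Combinations with no records come back as NA, so set them to zero
comm[is.na(comm)] <- 0
comm
```

Unlike the loop, this version doesn't drop NA values within a cell, so a record with a missing cover or weight would make the whole cell zero after the NA replacement. The loop's na.rm = TRUE is the safer behaviour for messy data.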