💾 Archived View for republic.circumlunar.space › users › johngodlee › posts › 2019-11-15-gginext.gmi captured on 2023-04-19 at 23:51:41. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2021-12-04)
-=-=-=-=-=-=-
DATE: 2019-11-15
AUTHOR: John L. Godlee
There is an R package called iNEXT[1] which allows you to rarefy and extrapolate Hill numbers (measures of diversity) to correct for differences in sampling effort between replicates. In my case, I have a dataset of plots (replicates) with trees in them, but the plots vary in size (sampling effort). Smaller plots by probability will have fewer species in them, but this doesn't mean the landscape species richness is lower.
1: https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12613
I start off with a list where each list item is a vector of species abundances found in a particular plot. In this example I'm using the spider dataset which comes from iNEXT. I'll also load some packages:
library(iNEXT) library(ggplot2) library(dplyr) data(spider)
Then I run iNEXT() on the data to estimate the Hill Number of order q, which is the equivalent numbers species richness:
inext_out <- iNEXT(spider, q = 0, datatype = "abundance")
The default visualisation provided by iNEXT looks like this:
ggiNEXT(inext_out)
It's pretty ugly and I think it makes some bad design choices. It works fine for a visualisation during analysis, but if I want to put this figure in an article it needs to look crisper.
I define a function to create a better ggplot() object from the output of iNEXT():
ggiNEXT_fix <- function(x){ inext_out_list <- list() inext_out_list[[1]] <- bind_rows(x$iNextEst, .id = "id") inext_out_list[[2]] <- inext_out_list[[1]] %>% filter(method == "observed") inext_out_list[[3]] <- inext_out_list[[1]] %>% filter(method == "interpolated") inext_out_list[[4]] <- inext_out_list[[1]] %>% filter(method == "extrapolated") inext_out_list[[5]] <- inext_out_list[[1]] %>% group_by(id) %>% summarise(m_max = max(m), qd_max = max(qD)) ggplot() + geom_line(data = inext_out_list[[3]], aes(x = m, y = qD, colour = id, linetype = "Interpolated")) + geom_line(data = inext_out_list[[4]], aes(x = m, y = qD, colour = id, linetype = "Extrapolated")) + geom_point(data = inext_out_list[[2]], aes(x = m, y = qD, fill = id, shape = "Observed"), size = 2, colour = "black") + geom_point(data = inext_out_list[[5]], aes(x = m_max, y = qd_max, fill = id, shape = "Extrapolated"), size = 2, colour = "black") + scale_linetype_manual(name = NA, values = c(3, 1), labels = c('Extrapolated','Interpolated')) + scale_shape_manual(name = NA, values = c(22,21), labels = c("Extrapolated", "Interpolated")) + scale_fill_discrete(name = NA, guide = FALSE) + labs(x = "Bootstrap sample size", y = "Extrapolated species richness") + theme_classic() + theme(legend.title = element_blank()) } ggiNEXT_fix(inext_out)
Of course, the default ggiNEXT() function is a lot more flexible than my function, it can create different plots depending on options given, like facetting by Hill number if there are multiple or facetting by both Hill number and sample if there are lots of samples. It can also make sample completeness curves and coverage-based rarefaction/extrapolation curves, which my function can't do. Maybe in the future if I need to plot those methods I will expand the function.