
An R function to fill abbreviated genus names in a list of species

DATE: 2018-10-15

AUTHOR: John L. Godlee

I had a list of species names written up by a colleague, who had abbreviated each repeated adjacent instance of a genus to its first letter followed by a dot. That is common in written prose, but is pretty daft in a dataset.

Instead of going through and manually writing in all the genus names, I wrote a function in R to do it for me:

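fill.genus <- function(x, abbrev = "."){
    # Run-length encode so adjacent repeats (e.g. "V.", "V.") become one run
    rel_enc <- rle(as.character(x))
    # Runs whose value contains the abbreviation marker (matched literally)
    empty <- which(grepl(abbrev, rel_enc$values, fixed = TRUE))
    # Replace each abbreviated run with the value of the run before it
    rel_enc$values[empty] <- rel_enc$values[empty - 1]
    # Expand the runs back into a full-length vector
    inverse.rle(rel_enc)
}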
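
To see why this works, it may help to look at what rle() and inverse.rle() do on a short vector. This is only an illustrative sketch; the vector x and the variable runs are made up for the example.

```r
# A made-up vector with one run of repeated abbreviations
x <- c("Vangueriopsis", "V.", "V.", "Xylopia")

runs <- rle(x)
runs$values   # "Vangueriopsis" "V." "Xylopia"
runs$lengths  # 1 2 1

# Overwrite the abbreviated run with the value of the run before it
runs$values[2] <- runs$values[1]

# Expanding the runs restores the original length
filled <- inverse.rle(runs)
filled  # "Vangueriopsis" "Vangueriopsis" "Vangueriopsis" "Xylopia"
```

Because the two adjacent "V." entries collapse into a single run, replacing one value fills both of them.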

So if the dataset looks like this:

genus <- c("Tapiphyllum",
    "Terminalia",
    "T.",
    "Tortuga",
    "T.",
    "Vangueriopsis",
    "V.",
    "V.",
    "Xeroderris",
    "Xylopia")

The output of fill.genus(genus) would look like:

c("Tapiphyllum",
    "Terminalia",
    "Terminalia",
    "Tortuga",
    "Tortuga",
    "Vangueriopsis",
    "Vangueriopsis",
    "Vangueriopsis",
    "Xeroderris",
    "Xylopia")
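
The function assumes that the first entry is a full genus name and that every abbreviation belongs to the genus directly before it. A variant with two sanity checks might look something like the sketch below. The name fill.genus.checked and both checks are my own assumptions for illustration, not part of the original function.

```r
# Hypothetical variant of fill.genus() with two sanity checks
fill.genus.checked <- function(x, abbrev = ".") {
    rel_enc <- rle(as.character(x))
    empty <- which(grepl(abbrev, rel_enc$values, fixed = TRUE))
    # An abbreviation in the first run has no preceding genus to copy
    if (any(empty == 1)) {
        stop("first entry is abbreviated; nothing to fill from")
    }
    filled <- rel_enc$values[empty - 1]
    # The first letter of each abbreviation should match its replacement
    mismatch <- substr(rel_enc$values[empty], 1, 1) != substr(filled, 1, 1)
    if (any(mismatch)) {
        warning("some abbreviations do not match the preceding genus")
    }
    rel_enc$values[empty] <- filled
    inverse.rle(rel_enc)
}

# A shortened version of the example data
genus_short <- c("Terminalia", "T.", "Vangueriopsis", "V.", "V.", "Xylopia")
result <- fill.genus.checked(genus_short)
```

The warning would fire for input such as c("Terminalia", "V."), where the abbreviation cannot sensibly refer to the genus above it.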