DATE: 2018-10-15
AUTHOR: John L. Godlee
A colleague wrote up a list of species names for me, but had abbreviated each adjacent repeat of a genus to its first letter followed by a dot. That's common in written prose, but pretty daft in a dataset.
Instead of going through and manually writing in all the genus names, I wrote a function in R to do it for me:
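fill.genus <- function(x, abbrev = "."){
  # Run-length encode, so adjacent repeats collapse into single runs
  rel_enc <- rle(as.character(x))
  # Find runs containing the abbreviation marker (matched literally)
  empty <- which(grepl(abbrev, rel_enc$values, fixed = TRUE))
  # Overwrite each abbreviated run with the value of the run before it
  rel_enc$values[empty] <- rel_enc$values[empty - 1]
  inverse.rle(rel_enc)
}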
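To see why this works, it can help to look at what rle() returns before and after the replacement. This is a minimal sketch on a made-up vector (`demo` is only for illustration and isn't part of the dataset):

```r
# Hypothetical toy vector, only for illustration
demo <- c("Terminalia", "T.", "T.", "Xylopia")

# rle() collapses adjacent repeats into runs, so the two "T." entries
# become a single run of length 2
runs <- rle(demo)
runs$values
runs$lengths

# Overwrite each abbreviated run with the value of the run before it
abbreviated <- which(grepl(".", runs$values, fixed = TRUE))
runs$values[abbreviated] <- runs$values[abbreviated - 1]

# inverse.rle() expands the runs back out to the original length
filled <- inverse.rle(runs)
filled
```

Because the replacement happens on runs rather than on individual elements, any number of consecutive "T." entries gets filled in one step.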
So if the dataset looks like this:
genus <- c("Tapiphyllum", "Terminalia", "T.", "Tortuga", "T.", "Vangueriopsis", "V.", "V.", "Xeroderris", "Xylopia")
The output of fill.genus(genus) would look like:
c("Tapiphyllum", "Terminalia", "Terminalia", "Tortuga", "Tortuga", "Vangueriopsis", "Vangueriopsis", "Vangueriopsis", "Xeroderris", "Xylopia")
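One thing the function doesn't do is check that an abbreviation actually belongs to the genus before it, or that the list doesn't start with an abbreviation. A stricter variant might look something like the sketch below. The name `fill.genus.checked` and the error messages are my own assumptions here, not part of the original function:

```r
# Hypothetical stricter variant of fill.genus(), for illustration only
fill.genus.checked <- function(x, abbrev = ".") {
  runs <- rle(as.character(x))
  abbreviated <- which(grepl(abbrev, runs$values, fixed = TRUE))

  # An abbreviation in the first position has nothing to copy from
  if (any(abbreviated == 1)) {
    stop("First entry is abbreviated, so there is no genus to copy")
  }

  previous <- runs$values[abbreviated - 1]

  # Compare the first letter of each abbreviation with the preceding genus
  mismatch <- substr(runs$values[abbreviated], 1, 1) != substr(previous, 1, 1)
  if (any(mismatch)) {
    stop("Abbreviation does not match preceding genus: ",
         paste(runs$values[abbreviated][mismatch], collapse = ", "))
  }

  runs$values[abbreviated] <- previous
  inverse.rle(runs)
}
```

Stopping with an error rather than silently filling seems the safer choice for a dataset, since a mismatched abbreviation usually means a row is missing or out of order.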