Taxonomic name parsing


This notebook demonstrates the functionality of a taxonomic name parser. Name parsing means identifying the components of a string as parts of a scientific name. In general, these are:

generic name/genus,
specific name/epithet,
infraspecies markers (subsp./var./f.),
infraspecific name/epithet,
authors.
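To make the idea concrete, here is a deliberately simplified parser. `toy_parse()` is a hypothetical function written only for illustration; it is not how the GBIF parser works. It uses one regular expression with capture groups to split a name into the genus, the specific epithet, an optional infraspecific marker and epithet, and the remaining author string. A name with a prefix such as `W0_758_` simply fails to match, which is the kind of problem the pre-processing later in this notebook deals with.

```r
# Toy parser (hypothetical, for illustration only): handles names of the form
# "Genus epithet [subsp.|var.|f. infraepithet] [authors]"
toy_parse <- function(x) {
  pattern <- paste0(
    "^([[:upper:]][[:lower:]]+) ([[:lower:]-]+)",  # genus, specific epithet
    "( (subsp\\.|var\\.|f\\.) ([[:lower:]-]+))?",  # optional marker + infraspecific epithet
    "( (.+))?$"                                     # optional remainder = authors
  )
  parts <- regmatches(x, regexec(pattern, x, perl = TRUE))
  rows <- lapply(parts, function(p) {
    # no match: the name could not be parsed, so every component is NA
    if (length(p) == 0) p <- rep(NA_character_, 8)
    data.frame(genus = p[2], epithet = p[3], marker = p[5],
               infraepithet = p[6], authors = p[8], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

toy_parse(c("Bellis perennis L.",
            "Leucanthemum vulgare subsp. ircutianum (DC.) Tzvelev",
            "W0_758_Allobates hodli"))
```

Components that are absent in a parsed name (for example, the marker in "Bellis perennis L.") come back as empty strings. A name that does not fit the pattern at all returns NA in every column.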

Additional elements can complicate name parsing. These include hybrid signs, other markers such as cv. or agg., spelling errors, and custom information added to names (e.g., Bellis perennis_plot123). In this notebook, we use the taxonomic name parser from GBIF. It has some limitations, and in the hands-on part we will try to overcome them by pre-processing the data before sending them to the parser. We will also try to speed up name parsing with R's parallel processing capabilities.

Prerequisites

To run the code presented here, you will need

the sample names list provided in the workshop,
a functioning R environment,
the R packages data.table, rgbif, and doSNOW installed, and
an internet connection, since rgbif's name_parse() sends the names to the GBIF API.
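If you are unsure whether the packages are installed, a quick check like the following can help. It is a convenience sketch and not part of the original workflow. The install line is commented out, so nothing is installed without your consent.

```r
# packages used in this notebook
required <- c("data.table", "rgbif", "doSNOW")

# requireNamespace() returns FALSE (without attaching anything) if a package is missing
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them from CRAN
} else {
  message("All required packages are installed.")
}
```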

Code

The first block of code loads libraries and prepares the workspace. You will need to adapt the working directory.

# load packages
library(data.table) # handle large datasets
library(rgbif) # access GBIF data
library(doSNOW) # parallel computing

# clear workspace
rm(list = ls())

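# set working directory
# .brd is not defined in this notebook; it presumably holds the author's base directory
# (names starting with a dot survive rm(list = ls())). Replace it with your own path.
setwd(paste0(.brd, "gfoe NFDI taxonomic harmonization workshop"))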

# load data
plants <- fread("plant names_2024-04-08.txt", sep = "\t")
animals <- fread("animal names_2024-04-09.txt", sep = "\t")

Loading required package: foreach

Loading required package: iterators

Loading required package: snow

Both plants and animals are tables with a single column of names. The names differ in format: most notably, some include authors while others do not. To get the best results in the name harmonization later on, we need to separate the authors and remove problematic characters from the data.

Encoding

Unfortunately, data from different sources are often encoded in different ways. Basic English letters are stored the same way on any machine. Accented letters and some other special characters, however, may be stored differently depending on whether the data were saved on a computer in the US or in Japan, and on whether it runs Windows, macOS, or Linux.

We will deal with the most common case: data stored in the Windows-specific CP-1252 encoding (sometimes mislabeled as "ANSI" or "latin1") instead of UTF-8.
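To see what this means at the byte level, the snippet below encodes the character "é" both ways using base R's `iconv()` and `charToRaw()`. UTF-8 needs two bytes for it, while CP-1252 needs a single byte. That single byte is not valid UTF-8 on its own, so `validUTF8()` can flag text that is not UTF-8.

```r
# "é" (U+00E9) written as a Unicode escape; R stores it as UTF-8
x <- "\u00e9"

# UTF-8 encodes it as two bytes
utf8_bytes <- charToRaw(enc2utf8(x))
utf8_bytes    # c3 a9

# CP-1252 encodes the same character as a single byte
cp1252_bytes <- iconv(x, from = "UTF-8", to = "CP1252", toRaw = TRUE)[[1]]
cp1252_bytes  # e9

# that single byte is not a valid UTF-8 sequence
validUTF8(rawToChar(cp1252_bytes))  # FALSE
```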

How R treats text in different encodings depends on the locale of your session. You can check it as follows:

Sys.getlocale()

'LC_COLLATE=German_Germany.utf8;LC_CTYPE=German_Germany.utf8;LC_MONETARY=German_Germany.utf8;LC_NUMERIC=C;LC_TIME=German_Germany.utf8'

If your locale does not use UTF-8 (regardless of the language), you can change it like this:

# Windows-style locale name; on Linux or macOS, use e.g. "de_DE.UTF-8" or "en_US.UTF-8"
Sys.setlocale(category = "LC_ALL", locale = "German_Germany.utf8")

Other encodings may work too, but they can cause errors later on. So let’s check whether the data are valid UTF-8. If they are not, we repair them on the assumption that they are CP-1252, which is our best guess and likely correct in the vast majority of cases.

# check whether correct encoding is UTF-8
table(validUTF8(plants$oldName))
table(validUTF8(animals$modName))

FALSE  TRUE 
   73  4927 

TRUE 
5000 

# create new columns for the names to be cleaned
plants[, newName := oldName]
animals[, newName := modName]
# correct the encoding of plant names that are not valid UTF-8, assuming they are CP-1252
# (all animal names are already valid UTF-8, see above)
plants[!validUTF8(newName), newName := iconv(newName, from = "CP1252", to = "UTF-8")]
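The same repair can be tried on a small, made-up vector without loading the workshop files. The second name is built from raw CP-1252 bytes to stand in for a badly encoded input. `validUTF8()` flags it, and `iconv()` converts only the flagged elements.

```r
# "Simões" stored as CP-1252 bytes: "õ" is the single byte 0xF5
cp1252_name <- rawToChar(as.raw(c(0x53, 0x69, 0x6d, 0xf5, 0x65, 0x73)))
names_vec <- c("Bellis perennis", cp1252_name)

# flag elements that are not valid UTF-8
bad <- !validUTF8(names_vec)
bad  # FALSE TRUE

# convert only the flagged elements, assuming they are CP-1252
names_vec[bad] <- iconv(names_vec[bad], from = "CP1252", to = "UTF-8")
validUTF8(names_vec)  # TRUE TRUE
```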

Name parsing

Let’s try to parse the names using the GBIF name parser.

resP <- data.table(name_parse(plants$newName))
resA <- data.table(name_parse(animals$newName))

table(resP$parsed)
table(resA$parsed)

FALSE  TRUE 
   11  4989 

FALSE  TRUE 
   17  4983 

That looks like a good result: all but 11 plant names and 17 animal names were parsed. Let’s look at the animal names that could not be parsed.

resA[parsed == FALSE]

A data.table: 17 × 18
scientificname	type	genusorabove	authorship	year	parsed	parsedpartially	canonicalname	canonicalnamecomplete	canonicalnamewithmarker	specificepithet	bracketauthorship	bracketyear	rankmarker	infraspecificepithet	infrageneric	cultivarepithet	sensu
<chr>	<chr>	<chr>	<chr>	<chr>	<lgl>	<lgl>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>
W0_758_Allobates hodli Simões, Lima & Farias	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Andinobates spec X Batista, Jaramillo, Ponce, & Crawford, 2014	HYBRID	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
XM3_777_Anthracothorax viridis (Audebert & Vieillot, 1801)	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
U_291_Balearica pavonina (Linnaeus, 1758)	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Q4_650_Colpophyllia Milne Edwards & Haime	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
O_824_Crypthelia glebulenta Cairns, 1986	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
DISsOSURA LONGICAUDUS (Gmelin, 1788)	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
C9_428_Euphlyctis hexadactylus (Lesson, 1834)	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
0S_65_Flabellum siboae Gardiner, 1904	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
165_251_Glaucidium nubicola Robbins & Stiles, 1999	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
HOMOPUq AREOLATUS (Thunberg, 1787)	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
LEPIDOPhRA SYMMETRICA Cairns, 1991	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
P_944_Podocnemis erythrocephala (Spix, 1824)	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
X_242_RHODOPIS	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
0P_663_Saiga tatarica (Linnaeus, 1766)	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
98Y_336_Stichopathes semiglabra (van Pesch, 1914)	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
6X_832_Uroplatus henkeli	NO_NAME	NA	NA	NA	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

Pre-processing

The problem with most of these names is the combination of letters and numbers in front of the actual name, which needs to be removed before using the name parser. These prefixes seem to consist of one to three uppercase letters or digits followed by an underscore, repeated twice, so we can find them as shown below. Regular expressions, which describe patterns to search for, are essential here. Their syntax is largely the same across programming languages; some information specifically on R can be found here.

animals[grepl("^([[:upper:]]|\\d){1,3}_([[:upper:]]|\\d){1,3}", newName), "newName"]

A data.table: 14 × 1
newName
<chr>
W0_758_Allobates hodli Simões, Lima & Farias
XM3_777_Anthracothorax viridis (Audebert & Vieillot, 1801)
U_291_Balearica pavonina (Linnaeus, 1758)
WCO_501_Calliphlox mitchellii (Bourcier, 1847)
Q4_650_Colpophyllia Milne Edwards & Haime
O_824_Crypthelia glebulenta Cairns, 1986
C9_428_Euphlyctis hexadactylus (Lesson, 1834)
0S_65_Flabellum siboae Gardiner, 1904
165_251_Glaucidium nubicola Robbins & Stiles, 1999
P_944_Podocnemis erythrocephala (Spix, 1824)
X_242_RHODOPIS
0P_663_Saiga tatarica (Linnaeus, 1766)
98Y_336_Stichopathes semiglabra (van Pesch, 1914)
6X_832_Uroplatus henkeli
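Before changing the pattern, it helps to see exactly which part of each name it matches. Base R's `regexpr()` together with `regmatches()` extracts the matched substrings. The two example names are taken from the output above.

```r
# two names from the output above: one with a prefix, one without
x <- c("W0_758_Allobates hodli", "Uroplatus henkeli")

# position and length of the first match in each string (-1 = no match)
m <- regexpr("^([[:upper:]]|\\d){1,3}_([[:upper:]]|\\d){1,3}", x)

# extract only the matched parts (non-matching strings are dropped)
regmatches(x, m)  # "W0_758"
```

Comparing the matched text with the full prefix shows what a removal based on this pattern will leave behind.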

Removing such a sequence could be done more or less like this.

# create a new variable to not overwrite the original data
animals[, testName := newName]

# remove the name sequences
animals[, testName := sub("^([[:upper:]]|\\d){1,3}_([[:upper:]]|\\d){1,3}", "", testName)]

# check whether it worked
animals[testName != newName, c("newName", "testName")]

TASKS:

Try to fix the code so that it gives the desired result.

To increase the accuracy of later matching, also look for these combinations of uppercase letters and numbers in the specific epithet.

Then, try to fix the problems with the other unparsed animal and plant names.

There may also be some generic terms you want to remove (e.g., spec., spp., agg.).

Some useful functions can be found below.

# check for a number after the genus name, but before the year
animals[grepl("^\\S+\\s.*\\d.*\\s\\d{4}$", newName), "newName"][1:3]
resA[grepl("^\\S+\\s.*\\d.*\\s\\d{4}$", animals$newName)][1:3]

# check for spec., species, morpho, spp.
animals[grepl("spec\\.|species|morpho|spp\\.", newName), "newName"][1:3]
resA[grepl("spec\\.|species|morpho|spp\\.", animals$newName)][1:3]

# find name parts after an equal sign
plants[grepl("=", newName), "newName"][1:3]
resP[grepl("=", plants$newName)][1:3]

A data.table: 3 × 1
newName
<chr>
ACANTHASTREA LORDHOWENSIS_4_889 Veron & Pichon, 1982
Accipiter spp.-5 Rothschild & Hartert, 1926
Acropora morphospec1 Veron & Wallace, 1984

A data.table: 3 × 18
scientificname	type	genusorabove	authorship	year	parsed	parsedpartially	canonicalname	canonicalnamecomplete	canonicalnamewithmarker	specificepithet	bracketauthorship	bracketyear	rankmarker	infraspecificepithet	infrageneric	cultivarepithet	sensu
<chr>	<chr>	<chr>	<chr>	<chr>	<lgl>	<lgl>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>
ACANTHASTREA LORDHOWENSIS_4_889 Veron & Pichon, 1982	DOUBTFUL	Acanthastrea	Lordhowensis	NA	TRUE	TRUE	Acanthastrea	Acanthastrea Lordhowensis	Acanthastrea	NA	NA	NA	NA	NA	NA	NA	NA
Accipiter spp.-5 Rothschild & Hartert, 1926	INFORMAL	Accipiter	NA	NA	TRUE	TRUE	Accipiter spec.	Accipiter spec.	Accipiter spec.	NA	NA	NA	sp.	NA	NA	NA	NA
Acropora morphospec1 Veron & Wallace, 1984	SCIENTIFIC	Acropora	NA	NA	TRUE	TRUE	Acropora	Acropora	Acropora	NA	NA	NA	NA	NA	NA	NA	NA

A data.table: 3 × 1
newName
<chr>
Accipiter spp.-5 Rothschild & Hartert, 1926
Acropora morphospec1 Veron & Wallace, 1984
Brookesia spp. Q Brygoo & Domergue, 1975

A data.table: 3 × 18
scientificname	type	genusorabove	authorship	year	parsed	parsedpartially	canonicalname	canonicalnamecomplete	canonicalnamewithmarker	specificepithet	bracketauthorship	bracketyear	rankmarker	infraspecificepithet	infrageneric	cultivarepithet	sensu
<chr>	<chr>	<chr>	<chr>	<chr>	<lgl>	<lgl>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>
Accipiter spp.-5 Rothschild & Hartert, 1926	INFORMAL	Accipiter	NA	NA	TRUE	TRUE	Accipiter spec.	Accipiter spec.	Accipiter spec.	NA	NA	NA	sp.	NA	NA	NA	NA
Acropora morphospec1 Veron & Wallace, 1984	SCIENTIFIC	Acropora	NA	NA	TRUE	TRUE	Acropora	Acropora	Acropora	NA	NA	NA	NA	NA	NA	NA	NA
Brookesia spp. Q Brygoo & Domergue, 1975	INFORMAL	Brookesia	NA	NA	TRUE	FALSE	Brookesia spp.Q	Brookesia spp.Q	Brookesia spp.Q	spp.Q	NA	NA	sp.	NA	NA	NA	NA

A data.table: 3 × 1
newName
<chr>
Artemisia vulgaris x verlotiorum = A. x wurzellii C.M. James & Stace
Lolium perenne x multiflorum = L. x boucheanum Kunth
Mentha arvensis x aquatica x spicata = M. x smithiana R.A. Graham

A data.table: 3 × 17
scientificname	type	parsed	parsedpartially	genusorabove	canonicalname	canonicalnamecomplete	canonicalnamewithmarker	rankmarker	specificepithet	authorship	infraspecificepithet	bracketauthorship	notho	sensu	nomstatus	strain
<chr>	<chr>	<lgl>	<lgl>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>	<chr>
Artemisia vulgaris x verlotiorum = A. x wurzellii C.M. James & Stace	HYBRID	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Lolium perenne x multiflorum = L. x boucheanum Kunth	HYBRID	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Mentha arvensis x aquatica x spicata = M. x smithiana R.A. Graham	HYBRID	FALSE	FALSE	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
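As a possible starting point for the task on generic terms, the sketch below removes the markers spec., spp. and agg. from a few made-up names (loosely based on the output above) with `gsub()`. The pattern is an assumption and deliberately simple. It does not fully handle variants such as "spp.-5", where the "-5" would remain.

```r
# made-up example names (not taken from the workshop files)
x <- c("Brookesia spp. Brygoo & Domergue, 1975",
       "Accipiter spec. Rothschild & Hartert, 1926",
       "Rubus fruticosus agg.")

# drop " spec.", " spp." or " agg." together with the whitespace before them
cleaned <- gsub("[[:space:]]+(spec|spp|agg)\\.", "", x)
cleaned
```

Dropping a marker discards the information that a record was not identified to species level. Depending on your matching workflow, you may prefer to move it into a separate column instead.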

You can compare your results with the reference data provided below. For plants, these were created using a name parser developed specifically for the names in the TRY database. For animals, they are an extract of the CITES names list: the unmodified, correct names from which the erroneous names used here were derived.

# for testing
plantsFull <- fread("plant names full_2024-04-08.txt", sep = ",")
animalsFull <- fread("animal names full_2024-04-09.txt", sep = ",")

str(plantsFull)
str(animalsFull)

Classes 'data.table' and 'data.frame':	5000 obs. of  18 variables:
 $ oldName         : chr  "" "(lauraceae) pubescente" "?Betulaceae sp." "Abarema curvicarpa" ...
 $ newName         : chr  "" "" "" "" ...
 $ familyNameFound : logi  FALSE TRUE TRUE FALSE FALSE FALSE ...
 $ oldFamilyName   : chr  "" "lauraceae" "Betulaceae" "" ...
 $ newFamilyName   : chr  "" "Lauraceae" "Betulaceae" "" ...
 $ genus           : chr  "" "Pubescente" "" "Abarema" ...
 $ hybrid1         : chr  "" "" "" "" ...
 $ species1        : chr  "" "" "" "curvicarpa" ...
 $ subSpeciesFlag  : chr  "" "" "" "" ...
 $ subSpecies      : chr  "" "" "" "" ...
 $ varSpeciesFlag  : chr  "" "" "" "" ...
 $ varSpecies      : chr  "" "" "" "" ...
 $ formaSpeciesFlag: logi  NA NA NA NA NA NA ...
 $ formaSpecies    : logi  NA NA NA NA NA NA ...
 $ hybrid2         : chr  "" "" "" "" ...
 $ species2        : chr  "" "" "" "" ...
 $ author          : chr  "" "" "" "" ...
 $ kingdom         : chr  "" "" "" "P" ...
 - attr(*, ".internal.selfref")=<externalptr> 
Classes 'data.table' and 'data.frame':	5000 obs. of  55 variables:
 $ TaxonId                    : int  2581 3734 1703 68243 68179 68198 68076 68212 68150 68149 ...
 $ Kingdom                    : chr  "Animalia" "Animalia" "Animalia" "Animalia" ...
 $ Phylum                     : chr  "Chordata" "Chordata" "Chordata" "Chordata" ...
 $ Class                      : chr  "Aves" "Aves" "Reptilia" "Reptilia" ...
 $ Order                      : chr  "Apodiformes" "Apodiformes" "Sauria" "Sauria" ...
 $ Family                     : chr  "Trochilidae" "Trochilidae" "Anguidae" "Anguidae" ...
 $ Genus                      : chr  "Abeillia" "Abeillia" "Abronia" "Abronia" ...
 $ Species                    : chr  "" "abeillei" "" "anzuetoi" ...
 $ Subspecies                 : chr  "" "" "" "" ...
 $ FullName                   : chr  "Abeillia" "Abeillia abeillei" "Abronia" "Abronia anzuetoi" ...
 $ AuthorYear                 : chr  "Bonaparte, 1850" "(Lesson & DeLattre, 1839)" "Gray, 1838" "Campbell & Frost, 1993" ...
 $ RankName                   : chr  "GENUS" "SPECIES" "GENUS" "SPECIES" ...
 $ CurrentListing             : chr  "II" "II" "I/II" "I" ...
 $ FullAnnotationEnglish      : chr  "Appendix II:" "Appendix II:" "Appendix II:Except the species included in Appendix I. Zero export quota for wild specimens for <i>Abronia auri"| __truncated__ "Appendix I:" ...
 $ AnnotationEnglish          : chr  "Appendix II:" "Appendix II:" "Appendix II:Except the species included in Appendix I. Zero export quota for wild specimens for <i>Abronia auri"| __truncated__ "Appendix I:" ...
 $ AnnotationSpanish          : chr  "Appendix II:" "Appendix II:" "Appendix II:Excepto las especies incluidas en el Apéndice I. Cupo de exportación nulo para los especímenes silv"| __truncated__ "Appendix I:" ...
 $ AnnotationFrench           : chr  "Appendix II:" "Appendix II:" "Appendix II:Sauf les espèces inscrites à l’Annexe I. Quota d’exportation zéro pour les spécimens sauvages pour "| __truncated__ "Appendix I:" ...
 $ #AnnotationSymbol          : chr  "" "" "" "" ...
 $ #Annotation                : chr  "Appendix II:" "Appendix II:" "Appendix II:" "Appendix I:" ...
 $ SynonymsWithAuthors        : chr  "" "Ornismya abeillei Lesson & DeLattre, 1839" "" "Abronia anzuetoi Köhler, 2000" ...
 $ EnglishNames               : chr  "" "Emerald-chinned Hummingbird" "" "Anzuetoi arboreal alligator lizard" ...
 $ SpanishNames               : chr  "" "Colibrí barbiesmeralda" "" "" ...
 $ FrenchNames                : chr  "" "Colibri d'Abeillé" "" "" ...
 $ CitesAccepted              : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ All_DistributionISOCodes   : chr  "" "SV, GT, HN, MX, NI" "" "GT" ...
 $ All_DistributionFullNames  : chr  "" "El Salvador, Guatemala, Honduras, Mexico, Nicaragua" "" "Guatemala" ...
 $ NativeDistributionFullNames: chr  "" "El Salvador, Guatemala, Honduras, Mexico, Nicaragua" "" "Guatemala" ...
 $ Introduced_Distribution    : chr  "" "" "" "" ...
 $ Introduced(?)_Distribution : chr  "" "" "" "" ...
 $ Reintroduced_Distribution  : chr  "" "" "" "" ...
 $ Extinct_Distribution       : chr  "" "" "" "" ...
 $ Extinct(?)_Distribution    : chr  "" "" "" "" ...
 $ Distribution_Uncertain     : chr  "" "" "" "" ...
 $ modOrder                   : chr  "Apodiformes" "Apodiformes" "Sauria" "Sauria" ...
 $ modFamily                  : chr  "Trochilidae" "Trochilidae" "Anguidae" "Anguidae" ...
 $ modGenus                   : chr  "Abeillia" "Abeillia" "Abronia" "Abronia" ...
 $ modSpecies                 : chr  "" "abeillei" "" "anzuetoi" ...
 $ modSubspecies              : chr  "" "" "" "" ...
 $ modAuthorYear              : chr  "Bonaparte, 1850" "(Lesson & DeLattre, 1839)" "Gray, 1838" "Campbell & Frost, 1993" ...
 $ uppercase                  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ lowercase                  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ changedOne                 : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ omittedOne                 : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ shuffle                    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ noAuthors                  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ abbrAuthors                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ noYear                     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ abbrGenus                  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ addPlot                    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ addFamily                  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ morphoSpec                 : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ cutGenusEpi                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ modName                    : chr  "Abeillia Bonaparte, 1850" "Abeillia abeillei (Lesson & DeLattre, 1839)" "Abronia Gray, 1838" "Abronia anzuetoi Campbell & Frost, 1993" ...
 $ cutName                    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ Name                       : chr  "Abeillia Bonaparte, 1850" "Abeillia abeillei (Lesson & DeLattre, 1839)" "Abronia Gray, 1838" "Abronia anzuetoi Campbell & Frost, 1993" ...
 - attr(*, ".internal.selfref")=<externalptr> 
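One way to quantify your results is to compare the parsed canonical names with a reference column. The helper `compare_names()` below is hypothetical and not part of the workshop material. It assumes that both vectors are in the same row order, which you should verify first (for example, by checking that `animalsFull$modName` equals `animals$modName`).

```r
# Hypothetical helper: share of parsed names that equal the reference names
compare_names <- function(parsed, reference) {
  stopifnot(length(parsed) == length(reference))
  # collapse runs of whitespace and trim, so spacing differences do not count
  norm <- function(x) trimws(gsub("[[:space:]]+", " ", x))
  p <- norm(parsed)
  r <- norm(reference)
  # NA on either side (e.g. an unparsed name) counts as a mismatch
  same <- !is.na(p) & !is.na(r) & p == r
  list(match_rate = mean(same), mismatches = which(!same))
}

res <- compare_names(
  parsed    = c("Abeillia abeillei", "Abronia  anzuetoi", NA, "Saiga"),
  reference = c("Abeillia abeillei", "Abronia anzuetoi", "Uroplatus henkeli", "Saiga tatarica")
)
res$match_rate  # 0.5
res$mismatches  # 3 4
```

For the animals, a call could look like `compare_names(resA$canonicalname, animalsFull$FullName)`. This assumes that `FullName` (genus, species and subspecies without authors) corresponds to the parser's `canonicalname`.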

Parallel processing

With 5000 names per list, parsing takes only a few seconds. For larger lists, it may be worth speeding up the process by parallelizing it. Since the name_parse function already accepts several names at once, we can split the names list into as many chunks as we have cores available for parallel processing.

Let’s check how many cores are available on the system.

parallel::detectCores()

16

You may well have fewer cores available. From earlier trials with the GBIF API, I can tell you that it is wise to limit the number of cores to 24 at most. First, let’s re-run the sequential name parsing and measure the time it takes, for comparison.

timeStart <- Sys.time()
resP <- data.table(name_parse(plants$newName))
Sys.time() - timeStart
timeStart <- Sys.time()
resA <- data.table(name_parse(animals$newName))
Sys.time() - timeStart

Time difference of 1.98634 secs

Time difference of 3.512737 secs

Now let’s split the lists into chunks and let each worker run independently.

nLists <- min(24, parallel::detectCores() - 1)
(nNames <- nrow(plants) %/% nLists)
(nNamesLast <- nNames + nrow(plants) %% nLists)

333

338

We use one worker fewer than the number of available cores, so that the computer can still handle other tasks while the script is running, and at most 24 workers. On my computer, this means that each chunk has 333 names to process, and the last chunk has 338. Let’s create the parallel environment now and compare the times. For simplicity, we only do this for the plants.
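As an alternative to computing the chunk sizes by hand, the base package parallel provides `splitIndices()`. It splits the indices 1 to n into a given number of nearly equal, contiguous chunks, which avoids the special case for the last chunk. The code that follows does not use it. The example uses 5000 names and 15 chunks, matching the machine described above.

```r
# split the indices 1..5000 into 15 contiguous chunks of (nearly) equal size
idx <- parallel::splitIndices(5000, 15)

length(idx)          # 15 chunks
range(lengths(idx))  # chunk sizes differ by at most one
# every index is used exactly once, in order
identical(unlist(idx), seq_len(5000))  # TRUE
```

Inside the `foreach()` loop, each worker would then parse `plants$newName[idx[[i]]]`, with `idx` built from `nrow(plants)` and `nLists`.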

# create the cluster for parallel processing
cl <- makeCluster(nLists)
registerDoSNOW(cl)

# run the name parsing in parallel
# fill = TRUE lets rbind() combine chunks whose results have different columns
timeStart <- Sys.time()
resP_parallel <- foreach(
	i = seq_len(nLists), .combine = function(...) rbind(..., fill = TRUE),
	.packages = c("data.table", "rgbif")
) %dopar% {
	if (i < nLists) {
		res <- data.table(name_parse(plants$newName[seq_len(nNames) + (i - 1) * nNames]))
	} else {
		res <- data.table(name_parse(plants$newName[seq_len(nNamesLast) + (i - 1) * nNames]))
	}
	res
}
Sys.time() - timeStart

# stop the cluster
stopCluster(cl)

Time difference of 5.201381 secs

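For this small dataset, the parallel run took longer than the sequential one (about 5.2 vs. 2 seconds). The overhead of the parallel run presumably outweighed the speed gain; this includes, for example, loading the packages on each worker and sending the data to the workers. Creating the cluster is not even included in this timing, because timeStart is set only after makeCluster(). Let’s check whether the results are the same.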

all(resP == resP_parallel, na.rm = TRUE)

TRUE

The results match, and with larger lists, this approach could save time.
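A note on the check above: `all(a == b, na.rm = TRUE)` silently ignores positions where only one side is `NA`, and `==` compares columns by position. The small made-up example below shows a difference that this check misses and that `identical()` catches.

```r
# two made-up tables that differ in row 2 of column x
a <- data.frame(x = c("A", NA), y = c("b", "c"), stringsAsFactors = FALSE)
b <- data.frame(x = c("A", "Z"), y = c("b", "c"), stringsAsFactors = FALSE)

# the NA vs "Z" comparison yields NA, which na.rm = TRUE drops
all(a == b, na.rm = TRUE)  # TRUE
# identical() reports the difference
identical(a, b)            # FALSE
```

For the data.tables in this notebook, data.table's `all.equal()` method offers a stricter comparison that also tolerates a different column order, e.g. `isTRUE(all.equal(resP, resP_parallel, ignore.col.order = TRUE))`.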