Prepare Species tolerance to frost data from TRY for use

The Species tolerance to frost data from TRY describes the ability of a species to endure frost.

If you intend to clean more than one or two traits, we recommend the use of the batch pre-processing script. Refer to the TRY main page for details.

If you have questions, suggestions, spot errors, or want to contribute, get in touch with us through planthub@idiv.de.

Author: David Schellenberger Costa

Requirements

To run the script, the following is needed:

  • TRY data, available here

  • the data.table package, which may need to be installed first (install.packages("data.table"))

  • a table of USDA plant hardiness zones (see below)

Code

# load in libraries
library(data.table) # handle large datasets

# clear workspace
rm(list = ls())

Let’s load the TRY data.

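# set working directory (adapt this!); .brd is a user-defined base path that
# this script does not create, so define it or replace it with your own path
setwd(paste0(.brd, "PlantHub"))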

# read in data (adapt this!)
TRY <- fread("TRY_PlantHub.gz")

# select data of interest
TRYSubset <- TRY[TraitName == "Species tolerance to frost"]

Some of this trait’s data are coded as USDA plant hardiness zones. To decode them, i.e. to convert the plant hardiness zones into minimum temperatures, a table describing the zones is needed. It can be downloaded from Wikipedia. Convert that table into a tab-separated text file with a header row and no row names, containing a column “Zone” with the USDA zones (e.g. “5a”, “5b”) and the temperature bounds in °C in the second and third columns, as expected by the code below. The code strips the subzone letters and keeps, for each whole zone, the highest temperature bound of its subzones.

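# read the zone table (adapt this!): a "Zone" column plus temperature bounds (°C) in columns 2 and 3
PHZ <- read.table(paste0(.brd, "PlantHub/USDA plant hardiness zones.txt"), sep = "\t", header = TRUE)
# strip subzone letters, e.g. "5a" and "5b" both become zone 5
PHZ$ZoneClean <- as.numeric(gsub("\\D", "", PHZ$Zone))
# one row per zone: column 1 = zone number, column 2 = highest bound across its subzones
PHZ <- t(sapply(unique(PHZ$ZoneClean), function(x) c(x, max(PHZ[PHZ$ZoneClean == x, 2:3]))))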
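To make the expected layout concrete, here is a minimal sketch that runs the same collapsing steps on a small stand-in table. The rows for subzones 1a to 2b, the column names From and To, and the object name phzDemo are illustrative assumptions, not the downloaded file; a separate name is used so that the real PHZ is not overwritten.

```r
# Hypothetical stand-in for the downloaded table: one row per subzone, with
# illustrative lower and upper bounds (°C) in columns 2 and 3
phzDemo <- data.frame(
	Zone = c("1a", "1b", "2a", "2b"),
	From = c(-51.1, -48.3, -45.6, -42.8),
	To   = c(-48.3, -45.6, -42.8, -40.0)
)

# Strip the subzone letter, as in the script
phzDemo$ZoneClean <- as.numeric(gsub("\\D", "", phzDemo$Zone))

# Keep one row per zone: zone number and the highest bound of its subzones
phzDemo <- t(sapply(unique(phzDemo$ZoneClean), function(x) {
	c(x, max(phzDemo[phzDemo$ZoneClean == x, 2:3]))
}))

phzDemo  # zone 1 -> -45.6, zone 2 -> -40.0
```

The result is a plain two-column matrix, so later lookups go through column 1 (zone number) and column 2 (temperature).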

The Species tolerance to frost data from TRY mix numeric values (e.g. temperatures in degrees) and categorical values. We process the categorical values first and the numeric ones afterwards, and split both into several traits.

To get an overview of the data, we convert values to lowercase, sort them, and show them as a table.

# extract original data strings
oriVals <- TRYSubset$OrigValueStr # oriVals == original values

# change all to lowercase to ease later classification
oriVals <- tolower(oriVals)

# get an overview of the data: table() lists values alphabetically, order() then sorts them by frequency
valueOverview <- table(oriVals)
valueOverview[order(valueOverview)]
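Note that table() already returns its entries in alphabetical order, while indexing with order() re-sorts them by frequency, least frequent first. A toy vector of hypothetical values shows the difference:

```r
# Hypothetical lowercase values standing in for oriVals
demoVals <- c("resistant", "high", "resistant", "low", "resistant", "high")

demoTab <- table(demoVals)
names(demoTab)                   # alphabetical: "high" "low" "resistant"
names(demoTab[order(demoTab)])   # by frequency: "low" "high" "resistant"
```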

As only the categorical values are of interest here, we set purely numeric values, i.e. those without any letters, to NA.

oriVals[!grepl("[[:lower:]]", oriVals)] <- NA
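The pattern [[:lower:]] matches any lowercase letter, so every value without letters becomes NA; existing NAs stay NA because grepl() returns FALSE for them. A small sketch on hypothetical values:

```r
# Hypothetical values: numbers, category labels and a missing entry
demoVals <- c("-5", "high", "12.5", "very hardy", NA)

# Keep only values containing at least one lowercase letter
demoVals[!grepl("[[:lower:]]", demoVals)] <- NA
demoVals  # NA "high" NA "very hardy" NA
```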

There are a couple of different traits mixed here. They can be distinguished by their names found in the “DataName” column. We will create new standardized names to use as actual trait names.

# show trait names
(catTraits <- names(table(TRYSubset$DataName[!is.na(oriVals)])))

# show values within each trait
for (i in seq_along(catTraits)) {
	print(catTraits[i])
	print(table(oriVals[TRYSubset$DataName == catTraits[i]]))
}

# create new trait names for those categories, in the same (alphabetical) order as catTraits
catTraitNames <- c(
	"Plant exposed to freezing in natural range",
	"Plant non-woody tissue frost tolerance",
	"Plant seedling tissue frost tolerance",
	"Plant frost hardiness in spring",
	"Plant frost hardiness in winter"
)
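The new names are paired with catTraits by position, so their order has to follow the alphabetical order in which table() returns the DataName values. A sketch of making that pairing explicit with a named vector; the DataName labels and the object names exampleTraits, exampleNames and nameMap are hypothetical (the real labels are those printed above):

```r
# Hypothetical DataName labels; table() returns them in sorted order
exampleTraits <- names(table(c("Winter hardiness", "Frost exposure", "Winter hardiness")))
exampleNames <- c(
	"Plant exposed to freezing in natural range",
	"Plant frost hardiness in winter"
)

# Fail early if the two vectors are out of step, then pair them by name
stopifnot(length(exampleTraits) == length(exampleNames))
nameMap <- setNames(exampleNames, exampleTraits)
nameMap[["Winter hardiness"]]  # "Plant frost hardiness in winter"
```

With a named vector, each DataName can be looked up directly, which makes an ordering mistake visible when the mapping is printed.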

As many different expressions are used for the same categories, we standardize them here and then write the new values into the “CleanedValueStr” column.

oriVals <- sub("high.*", "high", oriVals)
oriVals <- sub("intermediate.*", "intermediate", oriVals)
oriVals <- sub("hardy", "resistant", oriVals)
oriVals <- sub("very resistant", "resistant", oriVals)
# use string replacements explicitly; sub() would coerce TRUE/FALSE to "TRUE"/"FALSE" anyway
oriVals <- sub("freezingexposed", "TRUE", oriVals)
oriVals <- sub("freezingunexposed", "FALSE", oriVals)

# integrate data into TRY
TRY[TraitName == "Species tolerance to frost", CleanedValueStr := oriVals]
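Because sub() replaces only the matched part of a string, the order of these calls matters: "very hardy" first becomes "very resistant" and only then "resistant". A sketch that wraps the same replacements in a function (standardize_frost_terms() is a hypothetical name) and applies it to hypothetical input strings:

```r
# Same replacements as above, in the same order
standardize_frost_terms <- function(x) {
	x <- sub("high.*", "high", x)                  # "high ..." -> "high"
	x <- sub("intermediate.*", "intermediate", x)  # "intermediate ..." -> "intermediate"
	x <- sub("hardy", "resistant", x)              # "hardy" -> "resistant"
	x <- sub("very resistant", "resistant", x)     # merge "very resistant" into "resistant"
	x <- sub("freezingexposed", "TRUE", x)         # exposure flags as logical strings
	x <- sub("freezingunexposed", "FALSE", x)
	x
}

standardize_frost_terms(c("high (no damage)", "intermediate", "very hardy", "freezingunexposed"))
# "high" "intermediate" "resistant" "FALSE"
```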

We will now process the numerical values.

To get an overview of the data, we convert values to lowercase, sort them, and show them as a table.

# extract original data strings
oriVals <- TRYSubset$OrigValueStr # oriVals == original values

# change all to lowercase to ease later classification
oriVals <- tolower(oriVals)

# get an overview of the data: table() lists values alphabetically, order() then sorts them by frequency
valueOverview <- table(oriVals)
valueOverview[order(valueOverview)]

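We set purely categorical values, i.e. those without any digit, to NA and convert the remainder to numeric. Strings that contain digits but are not plain numbers also become NA at this step, which R reports with a coercion warning.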

oriVals[!grepl("\\d", oriVals)] <- NA
oriVals <- as.numeric(oriVals)
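Since as.numeric() silently turns strings such as a range into NA (apart from the warning), it can be worth listing the values lost in the conversion. A sketch on hypothetical values, with the warning suppressed so that the lost values are inspected explicitly instead:

```r
# Hypothetical values: plain numbers, a category and a range
demoVals <- c("-15", "high", "5", "-10 to -5")

demoVals[!grepl("\\d", demoVals)] <- NA          # drop values without any digit
demoNum <- suppressWarnings(as.numeric(demoVals))
demoNum                                          # -15 NA 5 NA
demoVals[!is.na(demoVals) & is.na(demoNum)]      # lost in conversion: "-10 to -5"
```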

Again, a couple of different traits are mixed here. They can be distinguished by their names in the “DataName” column, and we will create new standardized names to use as actual trait names. Some traits with “Frost hardiness” in their names contain very little data and will be removed. In addition, Fahrenheit temperatures have to be converted to Celsius, and USDA zones to temperatures in °C using the zone table loaded above (the zones themselves are defined by the average annual extreme minimum temperature).

# show different traits using numerical values
(numTraits <- names(table(TRYSubset$DataName[!is.na(oriVals)])))

# show values within each trait
for (i in seq_along(numTraits)) {
	print(numTraits[i])
	print(table(TRYSubset$OrigUnitStr[TRYSubset$DataName == numTraits[i]]))
	print(table(oriVals[TRYSubset$DataName == numTraits[i]]))
}

# little data for frost hardiness of different plant organs, so remove these traits
# (the match is case-sensitive, so "USDA Frost Hardiness Zone" is kept)
numTraits <- numTraits[!grepl("Frost hardiness", numTraits)]

# create new trait names, in the same (alphabetical) order as numTraits
numTraitNames <- c(
	"Plant minimum of frost free days in occurrence range",
	"Plant minimal tolerated temperature for recruitment",
	"Plant minimal tolerated temperature",
	"Plant minimal tolerated temperature",
	"Plant minimal tolerated temperature"
)

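# convert Fahrenheit and USDA hardiness zones into °C
# accept the unit string spelled "farenheit" as well as "fahrenheit"; %in% also avoids NA indices
isFahrenheit <- tolower(TRYSubset$OrigUnitStr) %in% c("farenheit", "fahrenheit")
oriVals[isFahrenheit] <- (oriVals[isFahrenheit] - 32) * 5 / 9
# look zones up by zone number (column 1 of PHZ) rather than by row position
isUSDA <- TRYSubset$DataName %in% "USDA Frost Hardiness Zone"
oriVals[isUSDA] <- PHZ[match(oriVals[isUSDA], PHZ[, 1]), 2]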

# remove Frost hardiness trait data
oriVals[grepl("Frost hardiness", TRYSubset$DataName)] <- NA
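Two details of these conversions are easy to get wrong. The Fahrenheit offset has to be subtracted before scaling, °C = (°F - 32) × 5/9, so that 32 °F maps to 0 °C and -40 °F to -40 °C. The zone lookup is safest when zones are matched against the zone column of the table rather than used as row numbers, because a table starting at a zone other than 1 would otherwise shift every result. A minimal sketch with the hypothetical helpers fahrenheit_to_celsius() and usda_zone_to_celsius() and a hypothetical two-zone table in the layout of PHZ:

```r
# Hypothetical helper: °F -> °C, subtracting the offset before scaling
fahrenheit_to_celsius <- function(f) (f - 32) * 5 / 9

# Hypothetical helper: look up a zone's temperature by zone number
# (column 1 of a PHZ-like matrix); unknown zones give NA
usda_zone_to_celsius <- function(zones, phz) phz[match(zones, phz[, 1]), 2]

fahrenheit_to_celsius(c(32, 14, -40))   # 0 -10 -40

# Hypothetical two-zone table: zone number, temperature in °C
phzDemo <- rbind(c(1, -45.6), c(2, -40.0))
usda_zone_to_celsius(c(2, 1), phzDemo)  # -40.0 -45.6
```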

We now insert the data into the “CleanedValueStr” column, taking care to overwrite only the rows with numeric values and not the categorical values inserted earlier.

TRY[
	TraitName == "Species tolerance to frost" & grepl("\\d", OrigValueStr),
	# CleanedValueStr holds strings, so convert the numbers explicitly
	CleanedValueStr := as.character(oriVals[grepl("\\d", TRYSubset$OrigValueStr)])
]
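This assignment relies on both conditions selecting the same rows in the same order: TRYSubset holds the rows of this trait in their original order, so the digit mask computed on TRYSubset lines up with the rows picked in TRY. A base-R sketch of the same masked overwrite on a hypothetical three-row table (demo and numVals are illustrative names):

```r
# Hypothetical rows of one trait after the categorical step
demo <- data.frame(
	OrigValueStr    = c("high", "-15", "5"),
	CleanedValueStr = c("high", NA, NA),
	stringsAsFactors = FALSE
)
numVals <- c(NA, -15, 5)  # numeric results, aligned with demo's rows

# Overwrite only rows whose original value contains a digit
hasDigit <- grepl("\\d", demo$OrigValueStr)
demo$CleanedValueStr[hasDigit] <- as.character(numVals[hasDigit])
demo$CleanedValueStr  # "high" "-15" "5"
```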

We will now change the trait names and add units to them.

for (i in seq_along(catTraits)) {
	TRY[TraitName == "Species tolerance to frost" & DataName == catTraits[i], TraitName := catTraitNames[i]]
}
for (i in seq_along(numTraits)) {
	TRY[TraitName == "Species tolerance to frost" & DataName == numTraits[i], TraitName := numTraitNames[i]]
	if (grepl("days", numTraitNames[i])) {
		TRY[TraitName == numTraitNames[i], OrigUnitStr := "d"]
	} else if (grepl("temperature", numTraitNames[i])) {
		TRY[TraitName == numTraitNames[i], OrigUnitStr := "°C"]
	}
}
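The unit follows from the new trait name: names containing "days" get "d", names containing "temperature" get "°C", and other names keep their unit. A vectorized sketch of that rule, where unit_from_trait_name() is a hypothetical helper and NA marks names whose unit the loop above would leave unchanged:

```r
# Hypothetical helper mirroring the if/else rule of the loop above
unit_from_trait_name <- function(traitName) {
	ifelse(grepl("days", traitName), "d",
		ifelse(grepl("temperature", traitName), "\u00b0C", NA_character_))
}

unit_from_trait_name(c(
	"Plant minimum of frost free days in occurrence range",
	"Plant minimal tolerated temperature",
	"Plant frost hardiness in winter"
))
# "d" "°C" NA
```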

Some traits with very little data were not used, so their trait name was not changed. We remove them from the dataset.

TRY <- TRY[TraitName != "Species tolerance to frost"]

Let’s write the data to a file.

# fwrite() gzip-compresses the output because the file name ends in .gz
fwrite(TRY, file = paste0("TRY_processed_", Sys.Date(), ".gz"))