Digitize data from images#

This is an example script showing how to use the R package “digitize” to extract data from graphics when the data are not available in digital form (e.g. plots from old publications). For more information on this R package, visit the documentation.

If you have questions, suggestions, spot errors, or want to contribute, get in touch with us through planthub@idiv.de.

Author: David Schellenberger Costa

Requirements#

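To run the script, you need R with the “digitize” package installed and the two example images, fig1.jpg and fig2.jpg, in your working directory.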

Code#

# load libraries
library("digitize")

# clear workspace
rm(list = ls())

# set working directory (adapt this!)
# note: .brd is not defined in this script; replace it with the path to your own base directory
setwd(paste0(.brd, "snippets"))

We will work with the two images that can be found via the link above. The first is an example of a bar plot, the second of a scatter plot.

Extracting data from a bar plot#

Specifically, we want to get the heights of the parts of a stacked bar plot representing the amount of variation in plant functional traits attributable to different taxonomic levels, e.g. species or family.

Let’s load and calibrate the image. The loaded image will appear in your graphics window. Now you have to input four calibration points. They should correspond to clearly visible axis ticks in the image. The first two points are used to calibrate the x axis, the last two to calibrate the y axis. The points do not need to be the extremes, but a longer distance between the two points on an axis increases accuracy.

cal <- ReadAndCal("fig1.jpg")
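
The role of the calibration points can be illustrated with a minimal sketch of a two-point linear mapping for one axis. The helper name `pixel_to_axis` and all pixel positions are hypothetical and serve only to show the idea; this is not the package's internal code.

```r
# Hypothetical helper: map pixel positions to axis values using two
# calibration points (a straight line through both points)
pixel_to_axis <- function(pixel, cal_pixel, cal_value) {
  slope <- (cal_value[2] - cal_value[1]) / (cal_pixel[2] - cal_pixel[1])
  cal_value[1] + slope * (pixel - cal_pixel[1])
}

# Assumed example: the y ticks 0 and 1 sit at pixel positions 50 and 450
pixel_to_axis(c(50, 250, 450), cal_pixel = c(50, 450), cal_value = c(0, 1))
# 0.0 0.5 1.0

# The same 2-pixel click error on the second calibration point distorts
# the result less when the calibration points are far apart
err_far  <- pixel_to_axis(250, c(50, 452), c(0, 1)) - 0.5     # ticks 0 and 1
err_near <- pixel_to_axis(250, c(50, 152), c(0, 0.25)) - 0.5  # ticks 0 and 0.25
abs(err_far) < abs(err_near)
```

In this made-up case the error is roughly four times smaller with the widely spaced calibration points, which is why distant ticks are preferable.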

Point and click on the dividing lines of all bars in your image. To ease later computation, also mark the y values of 0 and 1 in each bar. Work through the bars one at a time and from bottom to top, because the code below assumes this order. Finish your input by clicking the right mouse button or pressing ESC.

raw <- DigitData(col = "red")

We now need to transform the raw coordinates to the scale of the plot axes. The four numbers are the true axis values of the four points chosen in the calibration: the x value of point 1, the x value of point 2, the y value of point 3, and the y value of point 4.

points <- Calibrate(raw, cal, 0, 1, 0, 1)

We will add some description to our data. As we can see in the image, the different bars represent plant functional traits, so we name the columns of our matrix accordingly.

# transform values to matrix and round results
dat <- matrix(round(points$y, 2), nrow = 6, ncol = nrow(points) / 6)

# add colnames
colnames(dat) <- c("Leaf area", "SLA", "LDMC", "Plant height", "SSD", "LeafC", "LeafN", "LeafP")[seq_len(ncol(dat))]
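
The reshaping step relies on the clicks being made bar by bar and from bottom to top, because `matrix()` fills its result column by column. The following sketch uses a synthetic data frame `pts` (made-up values, not taken from the figure) in place of the calibrated clicks to show two simple checks one might run before reshaping.

```r
# Synthetic stand-in for the calibrated clicks: two bars, six clicks each,
# clicked bar by bar from bottom (0) to top (1)
pts <- data.frame(
  x = rep(c(1, 2), each = 6),
  y = c(0, 0.10, 0.30, 0.55, 0.80, 1,
        0, 0.20, 0.35, 0.60, 0.90, 1)
)
n_marks <- 6  # clicks per bar: 0, four dividing lines, 1

# the number of clicks must be a multiple of the clicks per bar
stopifnot(nrow(pts) %% n_marks == 0)

# matrix() fills column by column, so each column holds one bar
m <- matrix(round(pts$y, 2), nrow = n_marks)
m[, 2]
# 0.00 0.20 0.35 0.60 0.90 1.00

# values within each bar should increase from bottom to top
all(apply(m, 2, function(col) !is.unsorted(col)))
```

If the last check returns FALSE, some clicks were probably made out of order and the affected bar should be digitized again.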

As we want to get the individual bar segment sizes, we calculate them by subtracting each row from the row above it.

dat <- t(sapply(2:nrow(dat), function(x) dat[x, ] - dat[x - 1, ]))
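
The subtraction can be checked on synthetic numbers. The sketch below uses made-up cumulative heights in a matrix called `cum` (a hypothetical name) and also shows that base R's `diff()` gives the same result for a matrix, since it takes differences between consecutive rows within each column.

```r
# Made-up cumulative heights for two bars (rows: 0, four dividing lines, 1)
cum <- cbind(
  "Leaf area" = c(0, 0.10, 0.30, 0.55, 0.80, 1),
  "SLA"       = c(0, 0.20, 0.35, 0.60, 0.90, 1)
)

# approach used in the script: subtract each row from the next one
seg <- t(sapply(2:nrow(cum), function(x) cum[x, ] - cum[x - 1, ]))

# diff() on a matrix takes the same row-wise differences per column
seg_diff <- diff(cum)
all.equal(seg, seg_diff, check.attributes = FALSE)

# each bar runs from 0 to 1, so its segments should sum to 1
colSums(seg)
```

Checking the column sums is a quick way to spot a missed or duplicated click in the real data.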

We still need to add rownames. As seen in the image, they should be the different taxonomic levels.

rownames(dat) <- c("Order", "Family", "Genus", "Species", "Within")

We are now ready to work with the data and can also redraw the plot. Let’s do this for visual confirmation of the results.

barplot(dat, legend = rownames(dat))

Extracting the data points from a scatter plot#

This time, we want to get the coordinates of the points in a scatter plot. We will not digitize the lines, but the procedure would be basically the same: select enough points along each line to get its shape right. Here, the scatter plot shows the variation of stem specific density with elevation for three different plant groups: trees, herbs, and epiphytes.

Let’s load and calibrate the image. The loaded image will appear in your graphics window. Now you have to input four calibration points. They should correspond to clearly visible axis ticks in the image. The first two points are used to calibrate the x axis, the last two to calibrate the y axis. The points do not need to be the extremes, but a longer distance between the two points on an axis increases accuracy.

cal <- ReadAndCal("fig2.jpg")

Point and click on each data point that belongs to the same group in your plot viewer. Click the right mouse button or press ESC to finish. As there are three groups, the code runs in a loop and asks us to select points three times. To get the scale right, it is important that you adapt the x and y values in the Calibrate() function to the ones you selected during the calibration above.

numGroups <- 3 # number of groups
dat <- list()
for (i in 1:numGroups) {
	raw <- DigitData(col = "red")
	# type in the real x and y values you marked with the blue calibration crosses
	# (note that R reads 1.600 as 1.6, not as 1600)
	dat[[i]] <- Calibrate(raw, cal, 1.600, 2.800, 0, 0.4)
}

For visual confirmation, let’s first extract the data range from the data and then plot the three groups of points in a loop.

# extract x and y ranges from data
xRange <- range(sapply(dat, function(x) range(round(x$x, 2))))
yRange <- range(sapply(dat, function(x) range(round(x$y, 2))))

# plot the data
plot(NULL, xlim = xRange, ylim = yRange, xlab = "elevation [m]", ylab = "SSD [g/cm^3]")
for (i in 1:numGroups) {
	points(dat[[i]], pch = i, lwd = 2, col = i)
}
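
For further analysis it is often convenient to have all groups in one table. The following is a minimal sketch under stated assumptions: `dat_demo` is a synthetic stand-in for the list produced by the digitizing loop (made-up values), and `group_names` assumes the groups were digitized in the order trees, herbs, epiphytes.

```r
# Synthetic stand-in for the list returned by the digitizing loop
dat_demo <- list(
  data.frame(x = c(1.7, 2.1), y = c(0.35, 0.30)),
  data.frame(x = c(1.65, 2.5, 2.7), y = c(0.10, 0.12, 0.15)),
  data.frame(x = 1.9, y = 0.20)
)
group_names <- c("Trees", "Herbs", "Epiphytes")  # assumed digitizing order

# attach the group label to each data frame, then stack them
combined <- do.call(
  rbind,
  Map(function(d, g) cbind(d, group = g), dat_demo, group_names)
)
rownames(combined) <- NULL

# number of digitized points per group
table(combined$group)
```

The resulting data frame could then be saved with `write.csv(combined, "digitized_points.csv", row.names = FALSE)`, where the file name is only an example.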