Lathyrus vernus IPMs

Richard P. Shefferson, Shun Kurokawa, and Johan Ehrlén

In this vignette, we will use the lathyrus dataset to illustrate the estimation of integral projection models (IPMs). To reduce output size, we have prevented some statements from running if they produce long stretches of output. Examples include most summary() calls. In these cases, we include hashtagged versions of these calls, and our text assumes that the user runs these statements without hashtags. We have also written this vignette using the default Ehrlén format as the output format for all historical MPMs. All examples work with deVries format as well. This vignette is only a sample analysis. Detailed information and instructions on using lefko3 are available through a free online e-book called lefko3: a gentle introduction, available on the projects page of the Shefferson lab website. This website also includes links to long-format vignettes and other demonstrations of this package.

ORGANISM AND POPULATION

Lathyrus vernus (family Fabaceae) is a long-lived forest herb, native to Europe and large parts of northern Asia. Individuals grow larger slowly and usually flower only after 10-15 years of vegetative growth. Flowering individuals have an average conditional lifespan of 44.3 years (Ehrlén and Lehtila 2002). L. vernus lacks organs for vegetative spread and individuals are well delimited (Ehrlén 2002). One or several erect shoots of up to 40 cm height emerge from a subterranean rhizome in March and April. Flowering occurs about four weeks after shoot emergence. Shoot growth is determinate, and the number of flowers is determined in the previous year (Ehrlén and Van Groenendael 2001). Individuals sometimes do not produce aboveground structures every year, instead remaining dormant in one or more seasons. L. vernus is self-compatible but requires visits from bumble-bees to produce seeds. Individuals produce few, large seeds and establishment from seeds is relatively frequent (Ehrlén and Eriksson 1996). The pre-dispersal seed predator Bruchus atomarius often consumes a large fraction of developing seeds, and roe deer (Capreolus capreolus) sometimes consume the shoots (Ehrlén and Munzbergova 2009).

Data for this study were collected from six permanent plots in a population of L. vernus located in a deciduous forest in the Tullgarn area in southeastern Sweden (58.9496 N, 17.6097 E), during 1988–1991 (Ehrlén 1995). The six plots were similar with regard to soil type, elevation, slope, and canopy cover. Within each plot, all individuals were marked with numbered tags that remained over the study period, and their locations were mapped. New individuals were included in the study in each year. Individuals were recorded at least three times every growing season. At the time of shoot emergence, we recorded whether individuals were alive and produced above-ground shoots, and if shoots had been grazed. During flowering, we recorded flower number and the height and diameter of all shoots. At fruit maturation, we counted the number of intact and damaged seeds. To derive a measure of above-ground size for each individual, we calculated the volume of each shoot as \(\pi × (\frac{1}{2} diameter)^2 × height\), and summed the volumes of all shoots. This measure is strongly correlated with the dry mass of aboveground tissues (\(R^2 = 0.924\), \(P < 0.001\), \(n = 50\), log-transformed values; Ehrlén 1995). Size of individuals that had been grazed was estimated based on measures of shoot diameter in grazed shoots, and the relationship between shoot diameter and shoot height in non-grazed individuals. Only individuals with an aboveground volume of more than 230 mm3 flowered and produced fruits during this study. Individuals that lacked aboveground structures in one season but reappeared in the following year were considered dormant. Individuals that lacked aboveground structures in two subsequent seasons were considered dead from the year in which they first lacked aboveground structures. Probabilities of seeds surviving to the next year, and of being present as seedlings or seeds in the soil seed bank, were derived from separate yearly sowing experiments in separate plots adjacent to each subplot (Ehrlén and Eriksson 1996).

BASIC WORKFLOW

An IPM is a kind of function-based MPM in which transitions are modeled on a continuous state variable rather than discrete stages (Ellner and Rees 2006). A size metric is often used as this continuous state variable. In practice, all size-classified matrix projection models including IPMs require discretized size classes. So, although size is modeled on the Gaussian distribution, the actual stages developed break size up into many fine-scale, equally sized classes or bins. Although the number of these bins varies from analysis to analysis, package lefko3 uses a default of 100 (this can be changed as an option). Because vital rates are modeled rather than directly calculated from the data, the number of individuals moving through any particular size class on any particular occasion does not need to be considered in determining stage boundaries (as they would be in raw MPM estimation). IPMs are often complex, meaning that they include some life history stages that fall outside of the size classifications developed for the matrices, such as dormant seeds or juveniles.

The dataset that we have provided is organized in horizontal format, meaning that each row holds all of the data for a single, unique individual, and columns correspond to individual condition in particular monitoring occasions (which we refer to as years here, since there was one main census in each year). The original spreadsheet file used to keep the dataset has a repeating pattern to these columns, with each year having a similarly arranged group of variables. Let’s load the dataset.

data(lathyrus)
#summary(lathyrus)

This dataset includes information on 1,119 individuals, so there are 1,119 rows with data. There are 38 columns. The first two columns give identifying information about each individual (SUBPLOT refers to the patch, and GENET refers to individual identity), with each individual’s data entirely restricted to one row. This is followed by four sets of nine columns, each named VolumeXX, lnVolXX, FCODEXX, FlowXX, IntactseedXX, Dead19XX, DormantXX, Missing19XX, and SeedlingXX, where XX corresponds to the year of observation and with years organized consecutively. Thus, columns 3-11 refer to year 1988, columns 12-20 refer to year 1989, etc. This strictly repeated pattern allows us to manipulate the original dataset quickly and efficiently. We should know the number of years used, which is 4 years here, and we should also have arranged the columns in the same order for each year, with years in consecutive order with no extra columns between them. This order is not required, provided that we are willing to input all variable names in the correct order when transforming the dataset later.

Step 1. Life history model development

To begin, we will create a stageframe for this dataset. A stageframe is a data frame that describes all stages in the life history of the organism, in a way usable by the functions in this package and using stage names and classifications that completely match those used in the dataset. It must include complete descriptions of all stages that occur in the dataset, with each stage defined uniquely. Since this object can be used for automated classification of individuals, all sizes, reproductive states, and other characteristics defining each stage in the dataset need to be accounted for explicitly. This can be difficult if a few data points exist outside the range of sizes specified in the stageframe, so great care must be taken to include all relevant size values and values of other descriptor variables occurring within the dataset. The final description of each stage occurring in the dataset must not completely overlap with any other stage, although partial overlap is allowed. We will base our stageframe on the life history model provided in Ehrlén (2000), but use a different size classification based on leaf volume to allow IPM construction and make all mature stages other than vegetative dormancy reproductive, as shown in Figure 6.1 below.

Figure 6.1. Life history model of Lathyrus vernus. Not all adult classes are shown. Survival transitions are indicated with solid arrows, while fecundity transitions are indicated with dashed arrows.

In the stageframe code below, we show that we want an IPM by choosing two stages that serve as the size limits for IPM size classification. These two size classes should have exactly the same characteristics in the stageframe other than size. By choosing these two size limits, we can skip adding and describing the many size classes that will fall between these limits - function sf_create() will create all of these for us. We mark these limits in the vector that we load into the stagenames option using the string ipm. Package lefko3 will then create and name all IPM size classes according to its own conventions. The default number of size classes is 100 bins, which can be changed using the ipmbins option.

sizevector <- c(0, 100, 0, 1, 7100)
stagevector <- c("Sd", "Sdl", "Dorm", "ipm", "ipm")
repvector <- c(0, 0, 0, 1, 1)
obsvector <- c(0, 1, 0, 1, 1)
matvector <- c(0, 0, 1, 1, 1)
immvector <- c(1, 1, 0, 0, 0)
propvector <- c(1, 0, 0, 0, 0)
indataset <- c(0, 1, 1, 1, 1)
binvec <- c(0, 100, 0.5, 1, 1)
comments <- c("Dormant seed", "Seedling", "Dormant", "ipm adult stage", "ipm adult stage")
lathframeipm <- sf_create(sizes = sizevector, stagenames = stagevector, 
  repstatus = repvector, obsstatus = obsvector, propstatus = propvector,
  immstatus = immvector, matstatus = matvector, comments = comments,
  indataset = indataset, binhalfwidth = binvec, ipmbins = 100, roundsize = 3)
#lathframeipm

This stageframe has 103 stages. The IPM portion technically starts with the fourth stage and keeps going through the 103rd stage. Stage names within this range are concatenations of the size centroid (designated with sz), and, given a limited string length, the reproductive status, maturity status, and observation status. The first three stages, which fall outside of the IPM classification, are left unaltered.

Step 2a. Dataset standardization

To work with this dataset, we first need to format the data into vertical format, in which each row corresponds to the state of a single individual in two (if ahistorical) or three (if historical) consecutive time intervals. Because this is an IPM, we will need to estimate linear models of vital rates. This will require us to avoid NAs in size and fecundity (fortunately, in this dataset, NA is equivalent to 0), so we will set NAas0 = TRUE to inform R to treat NAs as zeroes. We will also set NRasRep = TRUE because we will assume that all adult stages other than dormancy are reproductive, and there are mature individuals in the dataset that do not reproduce but need to be included in reproductive stages (setting this option to TRUE makes sure that the reproductive status of non-reproductive individuals in potentially reproductive stages is set to 1, although the actual fecundity is not altered). Finally, we will ignore patches marked in the dataset and estimate matrices only for the full population, in order to preserve statistical power for vital rate modeling in historical IPM analysis.

In the input to verticalize3() below, we utilize a repeating pattern of variable names arranged in the same order for each monitoring occasion. This arrangement allows us to enter only the first variable in each set, as long as noyears and blocksize are set properly and no gaps or shuffles appear in the dataset. The data management functions that we have created for lefko3 do not require such repeating patterns, but they do make the required input in the function much shorter and more succinct. To see a more detailed summary of the standardized dataset, please remove full = FALSE from the summary_hfv() call.

lathvertipm <- verticalize3(lathyrus, noyears = 4, firstyear = 1988, 
  individcol = "GENET", blocksize = 9, juvcol = "Seedling1988", 
  sizeacol = "Volume88", repstracol = "FCODE88", fecacol = "Intactseed88", 
  deadacol = "Dead1988", nonobsacol = "Dormant1988", stageassign = lathframeipm,
  stagesize = "sizea", censorcol = "Missing1988",censorkeep = NA, censor = TRUE,
  NAas0 = TRUE, NRasRep = TRUE)
#> hfv dataset has been censored using a non-repeating censor variable.
#>       Please check the resulting dataset with the summary_hfv() function to make
#>       sure that not too much has been removed from the original dataset.
summary_hfv(lathvertipm, full = FALSE)
#> 
#> This hfv dataset contains 2552 rows, 42 variables, 1 population, 1 patch, 276 individuals, and 3 time steps.

Before we move on to the next steps in analysis, let’s take a closer look at fecundity. In this dataset, fecundity is mostly a count of intact seeds, and only differs in six cases where the seed output was estimated based on other models. To see this, try the following code.

writeLines(paste0("Length of fecundity variable in t+1: ", length(lathvertipm$feca3),
    ", and # non-integer entries: ", length(which(lathvertipm$feca3 !=
    round(lathvertipm$feca3)))))
#> Length of fecundity variable in t+1: 2552, and # non-integer entries: 0
writeLines(paste0("Length of fecundity variable in t: ", length(lathvertipm$feca2),
    ", and # non-integer entries: ", length(which(lathvertipm$feca2 !=
    round(lathvertipm$feca2)))))
#> Length of fecundity variable in t: 2552, and # non-integer entries: 6
writeLines(paste0("Length of fecundity variable in t-1: ", length(lathvertipm$feca1),
    ", and # non-integer entries: ", length(which(lathvertipm$feca1 !=
    round(lathvertipm$feca1)))))
#> Length of fecundity variable in t-1: 2552, and # non-integer entries: 6

We see that we have quite a bit of fecundity data, and that it is overwhelmingly but not exclusively integer. So, we can either treat fecundity as a continuous variable, or round the values and treat fecundity as a count variable. We will choose the latter approach in this analysis.

lathvertipm$feca3 <- round(lathvertipm$feca3)
lathvertipm$feca2 <- round(lathvertipm$feca2)
lathvertipm$feca1 <- round(lathvertipm$feca1)

Although we wish to treat fecundity as a count, it is still not clear what underlying distribution we should use. This package currently allows eight choices: Gaussian, Gamma, Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, zero-truncated Poisson, and zero-truncated negative binomial. To assess which to use, we should first assess whether the mean and variance of the count are equal using a dispersion test. The Poisson distribution assumes that the mean and variance are equal, and so we can test this assumption using a chi-squared test. If it is not significantly different, then we may use some variant of the Poisson distribution. If the data are significantly over- or under-dispersed, then we should use the negative binomial distribution. If fecundity of 0 is possible in reproductive stages, as in cases where reproductive status is defined by flowering rather than by offspring production, then we should also test whether the number of zeroes is significantly greater than expected under these distributions, and use a zero-inflated distribution if so (if fecundity does not equal 0 in any reproductive individuals at all, then we should use a zero-truncated distribution).

Let’s start off by looking at a plot of the distribution of fecundity.

hist(subset(lathvertipm, repstatus2 == 1)$feca2, main = "Fecundity", 
  xlab = "Intact seeds produced in occasion t")

Figure 6.2. Histogram of fecundity in occasion t

We see that the distribution conforms to a classic count variable with a very low mean value. The first bar suggests that there may be too many zeroes, as well. Let’s now formally test our assumptions. We will use the hfv_qc() function, which will allow us to assess the quality of the data from the viewpoint of the linear models to be built later. We will use most of the input from the modelsearch() call that we will conduct later.

hfv_qc(data = lathvertipm, suite = "size",
  vitalrates = c("surv", "obs", "size", "fec"), juvestimate = "Sdl",
  indiv = "individ", year = "year2", age = "obsage", year.as.random = TRUE)
#> Survival:
#> 
#>   Data subset has 43 variables and 2267 transitions.
#> 
#>   Variable alive3 has 0 missing values.
#>   Variable alive3 is a binomial variable.
#> 
#> 
#> Observation status:
#> 
#>   Data subset has 43 variables and 2122 transitions.
#> 
#>   Variable obsstatus3 has 0 missing values.
#>   Variable obsstatus3 is a binomial variable.
#> 
#> 
#> Primary size:
#> 
#>   Data subset has 43 variables and 1916 transitions.
#> 
#>   Variable sizea3 has 0 missing values.
#>   Variable sizea3 appears to be a floating point variable.
#>   1256 elements are not integers.
#>   The minimum value of sizea3 is 3.4 and the maximum is 6646.
#>   The mean value of sizea3 is 512.8 and the variance is 507200.
#>   The value of the Shapiro-Wilk test of normality is 0.7134 with P = 3.014e-49.
#>   Variable sizea3 differs significantly from a Gaussian distribution.
#> 
#>   Variable sizea3 is fully positive, lacking even 0s.
#> 
#> 
#> Reproductive status:
#> 
#>   Data subset has 43 variables and 1916 transitions.
#> 
#>   Variable repstatus3 has 0 missing values.
#>   Variable repstatus3 is a binomial variable.
#> 
#> 
#> Fecundity:
#> 
#>   Data subset has 43 variables and 2267 transitions.
#> 
#>   Variable feca2 has 0 missing values.
#>   Variable feca2 appears to be an integer variable.
#> 
#>   Variable feca2 is fully non-negative.
#> 
#>   Overdispersion test:
#>     Mean feca2 is 1.278
#>     The variance in feca2 is 23.13
#>     The probability of this dispersion level by chance assuming that
#>     the true mean feca2 = variance in feca2,
#>     and an alternative hypothesis of overdispersion, is 0
#>     Variable feca2 is significantly overdispersed.
#> 
#>   Zero-inflation and truncation tests:
#>     Mean lambda in feca2 is 0.2785
#>     The actual number of 0s in feca2 is 2000
#>     The expected number of 0s in feca2 under the null hypothesis is 631.4
#>     The probability of this deviation in 0s from expectation by chance is 0
#>     Variable feca2 is significantly zero-inflated.
#> 
#> 
#> Juvenile survival:
#> 
#>   Data subset has 43 variables and 285 transitions.
#> 
#>   Variable alive3 has 0 missing values.
#>   Variable alive3 is a binomial variable.
#> 
#> 
#> Juvenile observation status:
#> 
#>   Data subset has 43 variables and 210 transitions.
#> 
#>   Variable obsstatus3 has 0 missing values.
#>   Variable obsstatus3 is a binomial variable.
#> 
#> 
#> Juvenile primary size:
#> 
#>   Data subset has 43 variables and 193 transitions.
#> 
#>   Variable sizea3 has 0 missing values.
#>   Variable sizea3 appears to be a floating point variable.
#>   127 elements are not integers.
#>   The minimum value of sizea3 is 2.1 and the maximum is 61.
#>   The mean value of sizea3 is 11.23 and the variance is 50.81.
#>   The value of the Shapiro-Wilk test of normality is 0.5997 with P = 5.72e-21.
#>   Variable sizea3 differs significantly from a Gaussian distribution.
#> 
#>   Variable sizea3 is fully positive, lacking even 0s.
#> 
#> 
#> Juvenile reproductive status:
#> 
#>   Data subset has 43 variables and 193 transitions.
#> 
#>   Variable repstatus3 has 0 missing values.
#>   Variable repstatus3 is a binomial variable.
#> 
#> 
#> Juvenile maturity status:
#> 
#>   Data subset has 43 variables and 210 transitions.
#> 
#>   Variable matstatus3 has 0 missing values.
#>   Variable matstatus3 is a binomial variable.

All of the probability variables look like binomials, so we have no problem there. Size and juvenile size appear to be continuous variables that differ significantly from the assumptions of the Gaussian distribution. However, we will still use a Gaussian distribution in this case, for instructional purposes. Fecundity is a count variable with significant overdispersion and significantly more zeros than expected, so we should use a zero-inflated negative binomial distribution.

Step 2b: Develop supplemental information for matrix estimation

Now we will create supplement tables, which provide extra data for matrix estimation that is not included in the main demographic dataset. Specifically, we will provide the seed dormancy probability and germination rate, which are given as transitions from the dormant seed stage to another year of seed dormancy or to the germinated seedling stage, respectively. We assume that the germination rate is the same regardless of whether seed was produced in the previous year or has been in the seedbank for longer. We will incorporate these terms both as fixed constants for specific transitions within the resulting matrices, and as multipliers for fecundity, since ultimately fecundity will be estimated as the production of seed multiplied by the seed germination rate or the seed dormancy rate. Because some individuals stay in the seedling stage for only 1 year, and the seed stage itself cannot be observed and so does not exist in the dataset, we will also set a proxy set of transitions so that R assumes that the transitions from seed in occasion t-1 to seedling in occasion t to all mature stages in occasion t+1 are equal to the equivalent transitions from seedling in both occasions t-1 and t. We will start with the ahistorical case, and then move on to the historical case, where we also need to input the corresponding stages in occasion t-1 and transition types from occasion t-1 to t for each transition.

lathsupp2 <- supplemental(stage3 = c("Sd", "Sdl", "Sd", "Sdl"), 
  stage2 = c("Sd", "Sd", "rep", "rep"),
  givenrate = c(0.345, 0.054, NA, NA),
  multiplier = c(NA, NA, 0.345, 0.054),
  type = c(1, 1, 3, 3), stageframe = lathframeipm, historical = FALSE)
lathsupp3 <- supplemental(stage3 = c("Sd", "Sd", "Sdl", "Sdl", "npr", "Sd", "Sdl"), 
  stage2 = c("Sd", "Sd", "Sd", "Sd", "Sdl", "rep", "rep"),
  stage1 = c("Sd", "rep", "Sd", "rep", "Sd", "mat", "mat"),
  eststage3 = c(NA, NA, NA, NA, "npr", NA, NA),
  eststage2 = c(NA, NA, NA, NA, "Sdl", NA, NA),
  eststage1 = c(NA, NA, NA, NA, "Sdl", NA, NA),
  givenrate = c(0.345, 0.345, 0.054, 0.054, NA, NA, NA),
  multiplier = c(NA, NA, NA, NA, NA, 0.345, 0.054),
  type = c(1, 1, 1, 1, 1, 3, 3), type_t12 = c(1, 2, 1, 2, 1, 1, 1),
  stageframe = lathframeipm, historical = TRUE)
#lathsupp2
#lathsupp3

These supplement tables provide the best means of adding external data to our MPMs because they allow both specific transitions to be isolated, and because they allow the use of shorthand to identify large groups of transitions (e.g. using mat, immat, rep, nrep, prop, npr, obs, nobs, groupX, or all to signify all mature stages, immature stages, reproductive stages, non-reproductive stages, propagule stages, non-propagule stages, observable stages, unobservable stages, stages within group X, or simply all stages, respectively). They also allow more powerful changes to MPMs, such as block reset of groups of transitions to zero or other constants.

Step 3. Tests of history, and vital rate modeling

Integral projection models (IPMs) require functions of vital rates to populate them. Here, we will develop these functions as linear models using modelsearch(). This function automates several crucial and complex tasks in MPM analysis. Specifically, it automates 1) the building of global models for each vital rate requested, 2) the exhaustive construction of all reduced models, and 3) the selection of the best-fit models. Let’s look at some of the options that we will utilize for this purpose (please note that this list includes only some of the options actually offered by the function - further options are shown in the documentation for modelsearch(), and further theoretical details are shown in the Basic theory and concepts vignette):

historical: Setting this to TRUE or FALSE indicates to include state in occasion t-1 in the global models.

approach: Designates whether to use generalized linear models (glm), in which all factors are treated as fixed, or mixed effects models (mixed), in which factors are treated as either fixed or random (most notably, occasion, patch, and individual identity can be treated as random).

suite: Designates which factors to include in the global model. Possible values include size, which includes size only; rep, which includes reproductive status only; main, which includes both size and reproductive status as main effects only; full, which includes both size and reproductive status and all two-way interactions; and cons, which includes only an intercept. These factors are in addition to individual identity, patch, and occasion.

vitalrates: Designates which vital rates to model. The default is to model survival, size, and fecundity, but users can also model observation status and reproduction status. Fewer vital rates may also be chosen.

juvestimate: Designates the name of the juvenile stage transitioning to maturity, in cases where the dataset includes data on juveniles.

juvsize: Denotes whether size should be used as an independent term in models involving transition from the juvenile stage.

bestfit: Denotes whether the best-fit model should be chosen as the model with the lowest AICc (AICc) or as the most parsimonious model (AICc&k), where the latter is the model that has the fewest estimated parameters and is within 2 AICc units of the model with the lowest AICc.

sizedist: Designates the distribution used for size. The options include gaussian, gamma, poisson, and negbin, the last of which refers to the negative binomial distribution (quadratic parameterization). Versions also exist for up to two further size variables in addition to primary size.

ipm_method: Designates whether to estimate size transition probabilities using the cumulative density function (CDF) method (the default), or the midpoint method. The CDF method is exact while the midpoint method generally overestimates (see Doak et al. (2021) for further details).

fecdist: Designates the distribution used for fecundity. The options include gaussian, gamma, poisson, and negbin, the last of which refers to the negative binomial distribution (quadratic parameterization).

fectime: Designates whether fecundity is estimated within occasion t (the default) or occasion t+1. Plant ecologists are likely to choose the former, since fecundity is typically estimated via a proxy such as flowers, fruit, or seed produced. Wildlife ecologists might choose the latter, since fecundity may be best estimated as a count of actual juveniles within nests, burrows, or other family structures.

size.zero: Designates whether the size distribution should be zero-inflated. Only applies to the Poisson and negative binomial distributions. Versions for up to two other size variables exist.

size.trunc: Designates whether to use a zero-truncated distribution for size. Only applies to the Poisson and negative binomial distributions, and cannot be applied if size.zero = TRUE. Versions for up to two other size variables exist.

fec.zero: Designates whether the fecundity distribution should be zero-inflated. Only applies to the Poisson and negative binomial distributions.

fec.trunc: Designates whether to use a zero-truncated distribution for fecundity. Only applies to the Poisson and negative binomial distributions, and cannot be applied if fec.zero = TRUE.

jsize.zero: Designates whether the juvenile size distribution should be zero-inflated. Only applies to the Poisson and negative binomial distributions. Versions for up to two other size variables exist.

jsize.trunc: Designates whether to use a zero-truncated distribution for juvenile size. Only applies to the Poisson and negative binomial distributions, and cannot be applied if jsize.zero = TRUE. Versions for up to two other size variables exist.

indiv: Designates the variable corresponding to individual identity in the vertical dataset.

patch: Designates the variable corresponding to patch identity in the vertical dataset.

year: Designates the variable corresponding to time in occasion t in the vertical dataset.

year.as.random: Designates whether to treat year as a random factor, and is only used when approach = "mixed".

patch.as.random: Designates whether to treat patch identity as a random factor, and is only used when approach = "mixed".

quiet: Denotes whether to silence guidepost statements and most warning messages. Only warnings incurred in model testing will be visible, including warnings from the testing of global and reduced models produced during model dredging.

This function both develops the vital rate models necessary for function-based MPM estimation, and allows us to test whether individual history affects demography. Setting historical = TRUE fits size and/or reproductive status in occasions t and t-1 into global models, while setting historical = FALSE limits testing to the impacts of state in occasion t only. The model building and selection protocols can then be used to see if history has a significant impact on a vital rate, by assessing whether a historical term is retained in the best-fit model.

First, we will create the historical models to assess whether history is a significant influence on vital rates. Because all stages are defined as potentially reproductive, we will not test for the impacts of reproductive status. Please remove the hashtag from the summary call to see details of the best-fit models.

lathmodels3ipm <- modelsearch(lathvertipm, historical = TRUE, approach = "mixed", 
  suite = "size", vitalrates = c("surv", "obs", "size", "fec"), 
  juvestimate = "Sdl", bestfit = "AICc&k", sizedist = "gaussian", 
  fecdist = "negbin", fec.zero = TRUE, indiv = "individ", year = "year2", 
  year.as.random = TRUE, juvsize = TRUE, quiet = TRUE)
#summary(lathmodels3ipm)

We see here, as before, that size in occasion t-1 exerts an influence on some vital rates, including survival to occasion t+1 and size in occasion t+1. So, the historical IPM is the correct choice here. Accuracy is also quite high for adult survival and juvenile and adult observation, but lower elsewhere.

We will also create an ahistorical IPM for comparison. For that purpose, we will create the ahistorical linear model set.

lathmodels2ipm <- modelsearch(lathvertipm, historical = FALSE, 
  approach = "mixed", suite = "size", 
  vitalrates = c("surv", "obs", "size", "fec"), juvestimate = "Sdl", 
  bestfit = "AICc&k", sizedist = "gaussian", fecdist = "negbin", 
  fec.zero = TRUE, indiv = "individ", year = "year2", year.as.random = TRUE, 
  juvsize = TRUE, quiet = TRUE)
#summary(lathmodels2ipm)

Step 4. IPM estimation

We will now create the historical suite of matrices covering the years of study. These matrices will be extremely large - large enough that some computers might have difficulty with them. If you encounter an error message telling you that you have run out of memory, then please try this on a more powerful computer :) .

lathmat3ipm <- flefko3(stageframe = lathframeipm, modelsuite = lathmodels3ipm,
  supplement = lathsupp3, data = lathvertipm, reduce = FALSE)
#summary(lathmat3ipm)

These are giant matrices. With 10,609 rows and columns, there are a total of 112,550,881 elements per matrix. But they are also amazingly sparse - with 915,103 elements estimated, only 0.8% of elements per matrix are non-zero. The survival probability sums all look good, so we appear to have no problems with overly large given and proxy survival transitions provided through our supplemental tables. Let’s now build the ahistorical IPMs.

lathmat2ipm <- flefko2(stageframe = lathframeipm, modelsuite = lathmodels2ipm,
  supplement = lathsupp2, data = lathvertipm, reduce = FALSE)
#summary(lathmat2ipm)

The ahistorical IPMs are certainly smaller than the historical IPMs, but are nonetheless quiet large. Although huge, these matrices are not sparse - an average of 9,182.3 elements out of 10,609 per matrix are estimated (86.6%; note that the summary() function only counts elements as estimated if they equal a value other than 0, leading the exact number of estimated elements to vary among matrices when elements are estimated at near 0).

Let’s do a further comparison - let’s view a matrix plot of each kind of MPM, ahistorical and then historical. First, the ahistorical plot.

image3(lathmat2ipm, used = 1)

Figure 6.3. Image of ahistorical projection matrix

#> [[1]]
#> NULL

Now the historical plot.

image3(lathmat3ipm, used = 1)

Figure 6.4. Image of historical projection matrix

#> [[1]]
#> NULL

The plots above show that the ahistorical matrix is large but dense, mostly full of non-zero entries (the colored elements correspond to non-zero elements). In contrast, the historical matrix is huge and sparse, mostly full of zeroes with a general pattern to the distribution of non-zero elements.

Now let’s estimate the mean IPM matrices. We will estimate one mean matrix each, because we did not separate patches in the data reorganization and vital rate modeling. As a check, let’s also look over the summaries.

lath2ipmmean <- lmean(lathmat2ipm)
lath3ipmmean <- lmean(lathmat3ipm)
#summary(lath2ipmmean)
#summary(lath3ipmmean)

All looks fine! Let’s also take a look at a portion of one of the conditional historical matrices, particularly the matrix conditional on vegetative dormancy in occasion t-1. This matrix can be compared to the ahistorical mean to assess the impacts of history on the matrix elements themselves. Please remember to remove the hashtag.

l3mcond <- cond_hmpm(lath3ipmmean)
#l3mcond$Mcond[[1]]$Dorm

Step 5. MPM analysis

Now let’s estimate and plot the deterministic population growth rate, \(\lambda\), for each year.

ipm2lambda <- lambda3(lathmat2ipm)
ipm3lambda <- lambda3(lathmat3ipm)

plot(lambda ~ year2, data = ipm2lambda, xlab = "Year", ylab = "Lambda", 
  ylim = c(0.65, 1.00), type = "l", lwd = 2, bty = "n")
lines(lambda ~ year2, data = ipm3lambda, lwd = 2, lty = 2, col = "red")
legend("bottomleft", c("ahistorical", "historical"), lty = c(1, 2), 
  col = c("black", "red"), lwd = 2, bty = "n")

Figure 6.5. Ahistorical vs. historical lambda

Ahistorical estimates of \(\lambda\) are lower than historical estimates, and the historical \(\lambda\) values are more in line with estimates from the other Lathyrus vignettes.

Let’s now look at \(\lambda\) for the mean matrices and compare with the stochastic growth rate, \(a = \text{log} \lambda _{S}\). We will set the number of simulations low in the historical case in order to keep the amount of memory used and computation time low, because the size of the historical matrices will use up plenty of both (the default is 10,000). We will also set the seed for the random number generator to make our results reproducible.

lambda3(lath2ipmmean)
#>   pop patch    lambda
#> 1   1     1 0.8074309
lambda3(lath3ipmmean)
#>   pop patch    lambda
#> 1   1     1 0.8777134

set.seed(42)
slambda3(lathmat2ipm)
#>   pop patch          a       var        sd          se
#> 1   1     1 -0.2211827 0.0998485 0.3159881 0.003159881
set.seed(42)
slambda3(lathmat3ipm, times = 1000)
#>   pop patch          a        var        sd          se
#> 1   1     1 -0.1364482 0.07710515 0.2776781 0.008780954

The historical growth rate is larger than the ahistorical in both deterministic and stochastic analyses, although all four numbers suggest a declining population. Now let’s compare the stable stage distribution from both the ahistorical and historical mean MPMs.

ipm2ss <- stablestage3(lath2ipmmean)
ipm3ss <- stablestage3(lath3ipmmean)
ipm2ss_s <- stablestage3(lathmat2ipm, stochastic = TRUE, seed = 42)
ipm3ss_s <- stablestage3(lathmat3ipm, stochastic = TRUE, times = 1000, seed = 42)

ss_put_together <- cbind.data.frame(ipm2ss$ss_prop, ipm3ss$ahist$ss_prop,
  ipm2ss_s$ss_prop, ipm3ss_s$ahist$ss_prop)
names(ss_put_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(ss_put_together) <- ipm2ss$stage_id

lty.o <- par("lty")
par(lty = 0)
barplot(t(ss_put_together), beside=T, ylab = "Proportion", xlab = "Stage",
  col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
  col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")

Figure 6.6. Ahistorical vs. historically-corrected stable stage distribution

par(lty = lty.o)

Both ahistorical and historical approaches show the stable stage distribution dominated by the second stage, which is the seedling stage. The next highest is the dormant seed stage, followed by the adult dormant stage. Stochastic analysis shows similar patterns, with ahistorical analyses suggesting a greater overall share of the population in the seedling stage.

Let’s now try a sensitivity analysis. We will conduct both deterministic and stochastic analyses of ahistorical and historical MPMs. Because of the size of the hMPMs, we will limit run time of the stochastic sensitivity analysis to 200 time steps (this may take several hours even with reduced time steps). We will also limit our analysis to biologically possible transitions by ignoring the sensitivities of zero elements.

lath2ipmsens <- sensitivity3(lath2ipmmean)
lath3ipmsens <- sensitivity3(lath3ipmmean)

set.seed(42)
lath2ipmsens_s <- sensitivity3(lathmat2ipm, stochastic = TRUE)
set.seed(42)
lath3ipmsens_s <- sensitivity3(lathmat3ipm, stochastic = TRUE, steps = 200)

writeLines("Highest ahistorical deterministic sensitivity is associated with
  element ")
#> Highest ahistorical deterministic sensitivity is associated with
#>   element
which(lath2ipmsens$ah_sensmats[[1]] == max(lath2ipmsens$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> [1] 273
writeLines("Highest historically-corrected deterministic sensitivity is
  associated with element ")
#> Highest historically-corrected deterministic sensitivity is
#>   associated with element
which(lath3ipmsens$ah_sensmats[[1]] == max(lath3ipmsens$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> [1] 273
writeLines("Highest ahistorical stochastic sensitivity is associated with
  element ")
#> Highest ahistorical stochastic sensitivity is associated with
#>   element
which(lath2ipmsens_s$ah_sensmats[[1]] == max(lath2ipmsens_s$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> [1] 273
writeLines("Highest historically-corrected stochastic sensitivity is associated
  with element ")
#> Highest historically-corrected stochastic sensitivity is associated
#>   with element
which(lath3ipmsens_s$ah_sensmats[[1]] == max(lath3ipmsens_s$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> [1] 273
writeLines("Highest historical deterministic sensitivity is associated with
  element ")
#> Highest historical deterministic sensitivity is associated with
#>   element
which(lath3ipmsens$h_sensmats[[1]] == max(lath3ipmsens$h_sensmats[[1]][which(lath3ipmmean$A[[1]] > 0)]))
#> [1] 2
writeLines("Highest historical stochastic sensitivity is associated with
  element ")
#> Highest historical stochastic sensitivity is associated with
#>   element
which(lath3ipmsens_s$h_sensmats[[1]] == max(lath3ipmsens_s$h_sensmats[[1]][which(lath3ipmmean$A[[1]] > 0)]))
#> [1]   2 105

Our sensitivity analyses generally suggest that population growth rate is most sensitive to element 273, which is the third column (271 / 103 = 2.650) and the 67th row and so is associated with the transition from vegetative dormancy to one of the mid-to-large-sized adult stages. Deterministic analysis of the hMPM concurs with this (see the historically-corrected output). Historical deterministic and stochastic sensitivity analyses both suggest that population growth rate is most sensitive to changes in element 2, which is the transition from dormant seed in times t-1 and t to seedling in time t+1. Historical stochastic sensitivity also shows an importance to element 105, but this is a biologically impossible transition requiring the individual to be in the dormant seed and seedling stage concurrently in time t. So, sensitivity analysis strongly suggests that vegetative dormancy and mid-sized adults have strong impacts on population growth rate.

We will now move on to elasticity analysis. We will reduce the number of occasion steps simulated in the stochastic historical analysis to reduce the amount of computer processing time (this might take several hours even with the number of steps reduced). We will also remove the sensitivity analyses from memory.

rm("lath2ipmsens", "lath3ipmsens", "lath2ipmsens_s", "lath3ipmsens_s")

lath2ipmelas <- elasticity3(lath2ipmmean)
lath3ipmelas <- elasticity3(lath3ipmmean)

set.seed(42)
lath2ipmelas_s <- elasticity3(lathmat2ipm, stochastic = TRUE)
set.seed(42)
lath3ipmelas_s <- elasticity3(lathmat3ipm, stochastic = TRUE, steps = 200)

writeLines("\nThe highest ahistorical deterministic elasticity is associated
  with element: ")
#> 
#> The highest ahistorical deterministic elasticity is associated
#>   with element:
which(lath2ipmelas$ah_elasmats[[1]] == max(lath2ipmelas$ah_elasmats[[1]]))
#> [1] 209
writeLines("\nThe highest historically-corrected deterministic elasticity is
  associated with element: ")
#> 
#> The highest historically-corrected deterministic elasticity is
#>   associated with element:
which(lath3ipmelas$ah_elasmats[[1]] == max(lath3ipmelas$ah_elasmats[[1]]))
#> [1] 2
writeLines("\nThe highest ahistorical stochastic elasticity is associated with
  element: ")
#> 
#> The highest ahistorical stochastic elasticity is associated with
#>   element:
which(lath2ipmelas_s$ah_elasmats[[1]] == max(lath2ipmelas_s$ah_elasmats[[1]]))
#> [1] 209
writeLines("\nThe highest historically-corrected stochastic elasticity is
  associated with element: ")
#> 
#> The highest historically-corrected stochastic elasticity is
#>   associated with element:
which(lath3ipmelas_s$ah_elasmats[[1]] == max(lath3ipmelas_s$ah_elasmats[[1]]))
#> [1] 5
writeLines("\nThe highest historical deterministic elasticity is associated
  with element: ")
#> 
#> The highest historical deterministic elasticity is associated
#>   with element:
which(lath3ipmelas$h_elasmats[[1]] == max(lath3ipmelas$h_elasmats[[1]]))
#> [1] 10717
writeLines("\nThe highest historical stochastic elasticity is associated with
  element: ")
#> 
#> The highest historical stochastic elasticity is associated with
#>   element:
which(lath3ipmelas_s$h_elasmats[[1]] == max(lath3ipmelas_s$h_elasmats[[1]]))
#> [1] 10717

Ahistorical analyses agree that \(\lambda\) is most elastic in response to stasis in vegetative dormancy (element 209 is the element in the third column, 209 / 103 = 2.029, and the third row, 209 - (103 * 2) = 3). The historical analyses both agree that population growth rate is most elastic to changes in element 10,717, which is in the 2nd column and the 108th row and corresponds to the transition from dormant seed in time t-1 and seedling in time t to one of the smallest adult stages in time t+1. Historically-corrected deterministic and stochastic results suggest that elements 2 and 5 are most crucial, and these correspond to the transitions from dormant seed to seedling and small adult (second size class). Now let’s plot the elasticity of \(\lambda\) to transitions from both perspectives.

elas_put_together <- cbind.data.frame(colSums(lath2ipmelas$ah_elasmats[[1]]), 
  colSums(lath3ipmelas$ah_elasmats[[1]]), colSums(lath2ipmelas_s$ah_elasmats[[1]]), 
  colSums(lath3ipmelas_s$ah_elasmats[[1]]))
names(elas_put_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(elas_put_together) <- lath2ipmelas$ah_stages$stage_id

lty.o <- par("lty")
par(lty = 0)
barplot(t(elas_put_together), beside=T, ylab = "Elasticity", xlab = "Stage",
  col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
  col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")

Figure 6.7. Ahistorical vs. historically-corrected elasticity of lambda to stage

par(lty = lty.o)

The plot of these distributions shows the strong importance of vegetative dormancy and small adults. The tallest bars in ahistorical analyses correspond to dormant adults. Historical analyses reduce the importance of dormant adults somewhat, and show a greater importance to slightl larger adults than in the ahistorical case.

Now let’s take a look at the summed elasticities of different kinds of transitions, beginning with a comparison of ahistorical to historically-corrected transitions.

lath2elas_sums <- summary(lath2ipmelas)
lath3elas_sums <- summary(lath3ipmelas)
lath2elas_sums_s <- summary(lath2ipmelas_s)
lath3elas_sums_s <- summary(lath3ipmelas_s)

elas_sums_together <- cbind.data.frame(lath2elas_sums$ahist[,2],
  lath3elas_sums$ahist[,2], lath2elas_sums_s$ahist[,2], lath3elas_sums_s$ahist[,2])
names(elas_sums_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(elas_sums_together) <- lath2elas_sums$ahist$category

barplot(t(elas_sums_together), beside=T, ylab = "Elasticity", xlab = "Transition",
  col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
  col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")

Figure 6.8. Ahistorical vs. historically-corrected elasticity of lambda to transitions

We see similar transition elasticities between ahistorical and historical analyses, with growth the most influential on \(\lambda\) and shrinkage next, while fecundity is the least influential depending on whether the analysis is ahistorical or historical.

Package lefko3 also includes functions to conduct deterministic and stochastic life table response experiments, and two general projection functions that can be used to project IPMs for analyses such as quasi-extinction analysis as well as for density dependent analysis. Users wishing to conduct these analyses should see our free e-manual called lefko3: a gentle introduction and other vignettes, including long-format and video vignettes, on the projects page of the Shefferson lab website. Our function-based Cypripedium candidum vignette includes an example.

Acknowledgements

We are grateful to two anonymous reviewers whose scrutiny improved the quality of this vignette. The project resulting in this package and this tutorial was funded by Grant-In-Aid 19H03298 from the Japan Society for the Promotion of Science.

Literature cited

Doak, Daniel F., Ellen Waddle, Ryan E. Langendorf, Allison M. Louthan, Nathalie Isabelle Chardon, Reilly R. Dibner, Douglas A. Keinath, et al. 2021. “A Critical Comparison of Integral Projection and Matrix Projection Models for Demographic Analysis.” Ecological Monographs 91 (2): e01447. https://doi.org/10.1002/ecm.1447.
Ehrlén, Johan. 1995. “Demography of the Perennial Herb Lathyrus Vernus. I. Herbivory and Individual Performance.” Journal of Ecology 83 (2): 287–95. https://doi.org/10.2307/2261567.
———. 2000. “The Dynamics of Plant Populations: Does the History of Individuals Matter?” Ecology 81 (6): 1675–84. https://doi.org/10.1890/0012-9658(2000)081[1675:TDOPPD]2.0.CO;2.
———. 2002. “Assessing the Lifetime Consequences of Plant-Animal Interactions for the Perennial Herb Lathyrus Vernus (Fabaceae).” Perspectives in Plant Ecology, Evolution and Systematics 5 (3): 145–63. https://doi.org/10.1078/1433-8319-00031.
Ehrlén, Johan, and Ove Eriksson. 1996. “Seedling Recruitment in the Perennial Herb Lathyrus Vernus.” Flora 191 (4): 377–83. https://doi.org/10.1016/S0367-2530(17)30744-2.
Ehrlén, Johan, and Kari Lehtila. 2002. “How Perennal Are Perennial Plants?” Oikos 98: 308–22. https://doi.org/10.1034/j.1600-0706.2002.980212.x.
Ehrlén, Johan, and Zuzana Munzbergova. 2009. “Timing of Flowering: Opposed Selection on Different Fitness Components and Trait Covariation.” The American Naturalist 173 (6): 819–30. https://doi.org/10.1086/598492.
Ehrlén, Johan, and Jan Van Groenendael. 2001. “Storage and the Delayed Costs of Reproduction in the Understorey Perennial Lathyrus Vernus.” Journal of Ecology 89 (2): 237–46. https://doi.org/10.1046/j.1365-2745.2001.00546.x.
Ellner, Stephen P., and Mark Rees. 2006. “Integral Projection Models for Species with Complex Demography.” American Naturalist 167 (3): 410–28. https://doi.org/10.1086/499438.