\documentclass{article}
\usepackage[pdftex]{graphicx}
\usepackage{amsmath}
\addtolength{\textwidth}{1in}
\addtolength{\oddsidemargin}{-.5in}
\setlength{\evensidemargin}{\oddsidemargin}
\SweaveOpts{keep.source=TRUE}
%\VignetteIndexEntry{The lmekin function}
%\VignetteDependes{nlme}
%\VignetteDependes{coxme, nlme}
\title{The lmekin function}
\author{Terry Therneau\\ Mayo Clinic}
\begin{document}
\maketitle
<>=
options(continue=' ', width=60)
@
\section{Background}
The original kinship library had an implementation of linear mixed effects
models using the matrix code found in coxme.
Since the primary motivation for the functions in that library was to
fit models with random family effects, i.e., using a kinship
matrix for the correlation, the name \emph{lmekin} was
chosen.
The reason for the program was entirely to check our arithmetic:
the result of the matrix manipulations contained in it should give
exactly the same answer as lme, and since the underlying routines were shared
with coxme that gave a validity check for parts of coxme.
With more time and a larger test suite the routine is no longer
necessary for this purpose, however, it became popular with
users (they often do unanticipated things)
since it can fit a few models that lme cannot.
Let me emphasis this: most models that can be fit with the
lmekin function can also be fit with lme and/or lmer. For any
such model the lme/lmer functions will be faster and
have superior support routines (residuals, printing, plotting, etc.)
The solution code for lmer is likely also more reliable since it has
been exercised on a much wider variety of data sets.
However, there are models that lmekin will fit which lme will not.
The most obvious of these are models with a random genetic effect,
e.g. a kinship matrix.
The second class will be models for which the user has written
their own variance extension, as described in the variance vignette.
The follow-up methods for lmekin are limited, which reflects the
fact that linear mixed effects models are not a primary focus for
me, the author of the coxme package.
A primary reason to update lmekin at all is a desire to depreciate
the original kinship package; this routine was the last bit of functionality
that is not otherwise available.
The set of models fit by lmekin was also extended to include all of the
random effects structures supported by coxme,
which should make the routine more valuable.
Contributions by others with deeper interest will be warmly received.
Nevertheless, the core code is solid and reliable to the best of
my ability and will be actively maintained.
\section{Simple Models}
The control code for lmekin is identical to coxme with respect to
specifying the random effects, and both are modeled on the
methods used in lmer. Here is a simple example using one of the
data sets from Pinheiro and Bates.
<<>>=
library(coxme)
require(nlme)
fit1 <- lme(effort~Type, random= ~ 1|Subject,data=ergoStool,
method="ML")
fit2 <- lmekin(effort ~ Type + (1|Subject), data=ergoStool,
method="ML")
print(fit1)
print(fit2)
@
And here is a slightly more complex one based on data from
J. Cortinas \cite{Cortinas02}. There are 37 centers of varying size, and the
simulated data set has both random intercepts and treatment
effects per center.
<<>>=
tdata <-eortc
tdata$center2 <- factor(tdata$center)
fit3 <- lme(y ~ trt, random= ~ trt|center2, data=tdata,
method="ML")
fit3
fit4 <- lmekin(y ~ trt + (1+ trt|center), tdata)
fit4
all.equal(fit3$logLik, fit4$loglik)
@
First note that the two fits give identical log-likelihoods,
even though the coefficients differ.
The log-likelihood function is somewhat flat on top, and
because of different default starting estimates the two
programs do not end up at exactly the same place.
One small difference above is that lmekin is a little more
forgiving with respect to groups. The center variable
in the eortc data set is numeric, when it appears on
the right hand side of the vertical bar \verb!(1 + trt|center)!
the program assumes it is a grouping effect.
The lme routine insists that the grouping variable be a factor.
(In defense of lme, if one were to accidentally put a continuous
variable on the right such as age, which has no business being
there, the error message is welcome.)
A more important difference
from lme (and lmer) is the inclusion of random intercepts.
In lmer a random term like \texttt{(age | group} will actually fit the
model \texttt{(1+age | group)}, i.e., an intercept term is assumed
unless it is specifically removed by adding \texttt{-1} to the model.
In lmekin an intercept is not assumed, the random effect you
type is the one that you get.
The primary reason for this is that lmer mimics lm, which
also adds an intercept unless it is explicitly suppressed.
The coxme function mimics coxph, which does not add an
intercept. Since lmekin is built on the same routines as
coxme it also follows that convention.
(In Cox models there is not an intercept term for the fixed effects
since this is absorbed into the baseline hazard).
\section{GAW example}
\begin{figure}
\resizebox{\textwidth}{!}{\includegraphics{lmekin-gaw.pdf}}
\caption{Pedigree 9 from the GAW data.}
\label{gawfig}
\end{figure}
The following examples use data from one of the Genetic Analaysis Workshops
(I don't remeber which year).
First read in the saved data, create the pedigrees, and create the kinship
matrix.
Figure \ref{gawfig} shows a plot of the smallest of the 23 families in the
file.
<>=
require(kinship2)
load("gaw.rda")
gped <- with(gdata, pedigree(id, father, mother, sex=sex, famid=famid))
kmat <- kinship(gped)
plot(gped[9])
gfit0 <- lm(age ~ q1, gdata)
summary(gfit0)
gfit1 <- lmekin(age ~ q1 + (1|id), data=gdata, varlist=kmat*2)
gfit1
@
The fit predicts age at onset using one quantitative trait along with a
familial affect.
The residual error is decreased when we include a familial effect, and the
familial effects is substantial.
The kinship matrix has diagonal elements of .5 (if there is no inbreeding);
it is traditional to use a scaled version with elements of 1 in genetics
models.
A next step is to look at the effect of a particular locus.
The saved rda file also contains the results of a single SOLAR run at
locus 6.90 along with the \texttt{pedindex} file created by SOLAR.
We need to convert these into sparse matrix form, and add appropriate labels.
(When there are kinship or ibd matrices, the coxme routine uses the
matrix labels to match the proper row/col to the proper subject).
The SOLAR package may reorder subjects in the data set; the
pedindex matrix contains the new subject and family numbers in colums
1 and 6, and the original family and subject values in the last two columns.
In this data set each subject has a unique identifier, so we do not need to
include the family id in the matrix dimnames to obtain correct matches.
<<>>=
sid <- pedindex[,9]
ibd6.90 <- with(solar6.90, sparseMatrix(id2, id1, x=x, symmetric=TRUE,
dimnames=list(sid, sid)))
gfit2 <- lmekin(age ~ (1|id), data=gdata,
varlist= list(kmat, ibd6.90))
print(gfit2)
@
The specific effect is modest for this locus: it partitions the familial
effect found above into about 1/3 locus specific and 2/3 multifactorial.
Another possible fit is to assume a common environmental effect for
each family.
(For pedigrees this large I have serious doubts about the
relevance of the model below, but it serves as an illustration).
When there are multiple random terms the varlist argument is matched up
to them one by one, with the default choice used for any remaining,
so in the model below the first will be a kinship effect and the second an uncorrelated random
intercept per family.
<<>>=
gfit3 <- lmekin(age ~ q1 + (1|id) + (1|famid), data=gdata,
varlist=kmat)
gfit3
@
If one wanted to be specific the above model could be written as
below, to identify the actual variance functions used for each.
<>=
lmekin(age ~ q1 + (1|id) + (1|famid), data=gdata,
varlist=list(coxmeMlist(kmat), coxmeFull))
@
\section{Computation}
The random effects linear model is
\begin{align}
y &= X\beta + Zb + \epsilon \\
b &\sim N(0, \sigma^2 A(\theta) \\
\epsilon &= N(0, \sigma^2)
\end{align}
Here $\beta$ are the fixed and $b$ the random coefficients, and the
variance matrix $A$ of the random effects depends on some arbitrary
vector of parameters $\theta$.
For any fixed value of $\theta$ the solution for the remaining
parameters is based on a QR decomposition, exactly as is laid
out in section 2.2 of Pinheiro and Bates (\cite{Jose}), leading also
a profile likelihood value $L(\theta)$.
For known $A$, this is solved as an augmented
least squares problem with
\begin{equation*}
y^*=\left(\begin{array}{c} y\\0 \end{array} \right) \qquad
X^*=\left(\begin{array}{c} X\\0 \end{array} \right) \qquad
Z^*=\left(\begin{array}{c} Z\\ \Delta \end{array} \right)
\end{equation*}
where $\Delta' \Delta= A^{-1}$.
The dummy rows of data have $y=0$, $X=0$ and $\Delta$ as
the predictor variables.
With known $\Delta$, this gives the solution to all the other
parameters as an ordinary least squares problem, which is solved
using a QR decompostion.
The $Z$ matrix is often sparse, so the QR computations are done using the
Matrix library to take advantage of this.
Maximization of $L(\theta)$ with respect to $\theta$ is accomplished
with the optim() function.
Thus, during the solution process $A$ will contain relative
variances for components of $b$, something that Pinheiro and Bates
refer to as the \emph{precision} matrix.
When the results of a fit are printed out $A$ is multiplied by
$\sigma^2$ to give the variance of $b$ directly.
This decomposition will be invisible to most users, unless they
either set initial values or retrieve variances directly from the
coxme object.
Initial values are for the parameters $\theta$ of $A$, and the
results of the VarCorr function will also be terms of $\theta$,
not multiplied by the residual variance.
This causes a complication if a user wanted to fix the the overall
variance of the random effect at some constant; no solution to this
is yet in place.
For comparison see section 2.1.1 of Pinheiro and Bates.
They also use the values of the
Cholesky decomposition $\Delta$ directly as the unknowns for
the optim function.
This has the advantage of further numerical precision, avoids
computing the Cholesky decompostion anew at each iteration,
and guarrantees that the variance matrix is positive definite.
However, though it works well for an unstructured variance, the
lme default, the common genetic models do not have a simple
representation in the Cholesky space and so we work directly
with $A$.
\begin{thebibliography}{9}
\bibitem{Jose} Jos{\'{e}} C. Pinheiro and Douglas M. Bates,
\emph{Mixed-Effects Models in S and S-PLUS},
Springer, 2000.
\bibitem{Cortinas02}
Cortinas Abrahantes, Jose; Burzykowski, Tomasz,
A version of the EM algorithm for proportional hazards models with
random effects, \emph{Lecture Notes of the ICB Seminars},
p. 15-20, 2002.
\end{thebibliography}
\end{document}