SMHS LinearModeling LMM Assumptions
Linear mixed effects analyses - Mixed Effect Model Assumptions
First review the Linear mixed effects analyses section.
The conditions that apply to the fixed effect multivariate linear model also apply to mixed and random effect models: absence of collinearity and of influential data points, homoscedasticity, and normality of the residuals. These assumptions can be checked with residual plots, histograms of the residuals, or Q-Q normal probability plots.
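The checks named above can be sketched in a few lines of R. This is a minimal illustration, not the analysis from this section: it assumes the lme4 package is installed and uses lme4's built-in sleepstudy data as a stand-in for the study data.

```r
# Minimal sketch: residual diagnostics for a linear mixed model.
# Assumption: lme4's built-in sleepstudy data stand in for the study data.
library(lme4)

fit <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
res <- resid(fit)    # residuals of the fitted mixed model
fv  <- fitted(fit)   # fitted values

op <- par(mfrow = c(1, 3))
# A funnel shape here would suggest heteroscedasticity
plot(fv, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
# A strongly skewed histogram would suggest non-normal residuals
hist(res, main = "Histogram of residuals", xlab = "Residuals")
# Points far from the reference line would also suggest non-normality
qqnorm(res)
qqline(res)
par(op)
```

The same three plots can be drawn for any fitted lmer() model by replacing the formula and the data.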
The fixed effect independence condition is relaxed in mixed/random effect models; resolving dependencies in the data was the main motivation for mixed models. Mixed effect models still require independence once the random effects are accounted for, and the assumption can be violated, e.g., when a needed random effect is ignored and only a fixed effect for a variable of interest is included. For instance, in a model that does not include a random effect for “Player”, we have multiple Weight responses per Player, which violates the LME model independence assumption. Careful selection of fixed effects and random effects is necessary to resolve potential dependencies in the data.
The function dfbeta() can’t be used for assessing influential data points in mixed effects linear models the way it can for fixed effect models. To check for influential points in mixed effect models the package influence.ME or a leave-one-out validation can be employed.
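A possible use of influence.ME is sketched below. It assumes the lme4 and influence.ME packages are installed, and it again uses lme4's sleepstudy data as a stand-in, with “Subject” playing the role of the grouping factor. Influence is assessed here per group (each Subject is removed in turn) rather than per row.

```r
# Minimal sketch, assuming lme4 and influence.ME are installed.
# Assumption: sleepstudy stands in for the study data.
library(lme4)
library(influence.ME)

fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Re-estimate the model with each Subject left out in turn
infl <- influence.ME::influence(fit, group = "Subject")

cd  <- cooks.distance(infl)   # overall influence of each Subject
dfb <- dfbetas(infl)          # standardized change in each fixed effect

print(cd)
print(dfb)
```

Subjects with unusually large values in either table are candidates for closer inspection.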
For example, we can define a numeric vector whose length equals the number of rows in the data. Iterating over each row (i), we estimate a new mixed model excluding the current row (data[-i,]) and store one coefficient from it. The function fixef() extracts the fixed-effect coefficients, and the stored coefficient can be adapted to the specific analysis. Running fixef() on the fitted model lists the coefficients in order, which shows the position of the relevant one. For example, position “1” refers to the intercept (which is always the first coefficient in the coefficient table) and position “2” reflects the effect of “Height”, which appears second in the list of coefficients.
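library(lme4)   # provides lmer() and fixef()
df <- as.data.frame(data)
results <- numeric(nrow(df))   # one coefficient per left-out row
for (i in 1:nrow(df)) {
  # Generic form:
  # myfullmodel <- lmer(response ~ predictor + (1 + predictor | randomeffect), data = df[-i, ])
  # results[i] <- fixef(myfullmodel)[parameter position index]
  fullmodel <- lmer(Weight ~ Height + (1 + Height | Team), data = df[-i, ])
  results[i] <- fixef(fullmodel)[2]   # position 2 = effect of Height
  cat("Row = ", i, "\n")              # progress report
}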
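The loop only collects the coefficients. One way to read them, sketched below, is to compare each leave-one-out estimate with the estimate from the full data and look for the rows that shift it most. The sketch assumes lme4 is installed, uses lme4's sleepstudy data as a stand-in, and uses a simpler random-intercept model to keep the run short. The names full, loo and shift are illustrative only.

```r
# Minimal sketch: which rows move the slope estimate the most?
# Assumption: sleepstudy stands in for the study data.
library(lme4)

full <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
n    <- nrow(sleepstudy)
loo  <- numeric(n)

for (i in seq_len(n)) {
  m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy[-i, ])
  loo[i] <- fixef(m)[2]          # slope of Days without row i
}

# Change in the slope caused by dropping each row
shift <- loo - fixef(full)[2]

# The five rows whose removal changes the slope the most
top5 <- head(order(abs(shift), decreasing = TRUE), 5)
print(top5)

plot(shift, type = "h", xlab = "Left-out row",
     ylab = "Change in slope estimate")
```

Rows that stand out in the plot are not necessarily errors, but they deserve a second look before the model is interpreted.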
Comments
Fixed effects represent explanatory predictors that are expected to have a systematic and predictable influence on the data (response), whereas random effects represent covariates expected to have a non-systematic, idiosyncratic, unpredictable, or “random” influence on the response variable. Examples of such random effects in experimental studies include “subject/patient/player/unit” and “Age”, as we generally have no control over idiosyncrasies of individual subjects or their age at time of observation.
Often fixed effects are expected to exhaust the population of interest, or the levels of a factor. In the MLB study the factor “Team” may not exhaust the space, as there are other teams/leagues. However, for MLB at a fixed time, the “Team” factor may be fully exhaustive. The same holds for Height. Random effects represent sub-samples from the population of interest and may not “exhaust” the population, as more players or teams could be included in the study. The levels of a random factor may represent only a small subset of all levels of the factor.
Hands-on Activity
Use these cancer data (http://www.ats.ucla.edu/stat/data/hdp.csv), representing cancer phenotypes and predictors (e.g., "IL6", "CRP", "LengthofStay", "Experience") and outcome measures (e.g., remission) collected on patients nested within doctors (DID) and within hospitals (HID), to fit a mixed model (http://www.ats.ucla.edu/stat/r/dae/melogit.htm) and examine remissions as cancer outcomes.
This lung cancer dataset includes a variety of outcomes collected on patients, nested within doctors, who are in turn nested within hospitals. Doctor level variables include experience.
hdp <- read.csv("http://www.ats.ucla.edu/stat/data/hdp.csv")
hdp <- within(hdp, {
  Married <- factor(Married, levels = 0:1, labels = c("no", "yes"))
  DID <- factor(DID)
  HID <- factor(HID)
})
Plot several continuous predictor variables to examine the distributions and catch coding errors (e.g., if values range from 0 to 7, but we see a 999), and explore the relationship among our variables.
# install.packages("GGally")   # package names are case-sensitive
# library(GGally)
# library("ggplot2")
# ggpairs(hdp[, c("IL6", "CRP", "LengthofStay", "Experience")])
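Once the data look sensible, the mixed model for remission can be fit. The sketch below shows one possible shape of such a fit: a logistic mixed model with a random intercept per doctor, using glmer() from lme4. To stay self-contained it runs on simulated stand-in data, not on the real hdp data. The simulated values, the effect sizes and the choice of only two predictors are assumptions made for illustration, as is the coding of remission as 0/1.

```r
# Minimal sketch, assuming lme4 is installed.
# The data below are SIMULATED stand-ins, not the real hdp data.
library(lme4)
set.seed(1)

n_doc <- 20                     # hypothetical number of doctors
n_pat <- 30                     # hypothetical patients per doctor
did   <- rep(seq_len(n_doc), each = n_pat)

sim <- data.frame(
  DID = factor(did),
  IL6 = rnorm(n_doc * n_pat, mean = 4, sd = 1),
  CRP = rnorm(n_doc * n_pat, mean = 5, sd = 1)
)

# Doctor-level random intercepts and a made-up IL6 effect
doc_eff <- rnorm(n_doc, sd = 1)
eta     <- -0.5 - 0.3 * (sim$IL6 - 4) + doc_eff[did]
sim$remission <- rbinom(nrow(sim), size = 1, prob = plogis(eta))

# Logistic mixed model: fixed effects for IL6 and CRP,
# random intercept for each doctor (DID)
fit <- glmer(remission ~ IL6 + CRP + (1 | DID),
             data = sim, family = binomial)
print(summary(fit))
```

With the real data, sim would be replaced by hdp, and further predictors such as LengthofStay and Experience could be added to the fixed part of the formula.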
Next See
Machine Learning Algorithms section for data modeling, training, testing, forecasting, prediction, and simulation.
- SOCR Home page: http://www.socr.umich.edu