SMHS LinearModeling LMM Assumptions

From SOCR
Revision as of 12:28, 17 February 2016 by Dinov (talk | contribs) (Created page with "== Linear mixed effects analyses - Mixed Effect Model Assumptions== First review the SMHS_LinearModeling_LMM | Linear mixed effects analyses sec...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Linear mixed effects analyses - Mixed Effect Model Assumptions

First review the Linear mixed effects analyses section.

The same conditions we have in the fixed effect multivariate linear model apply to mixed and random effect models – co-linearity, influential data points, homoscedasticity, and lack of normality. These assumptions can be checked by creating residual plots, histogram plots of the residuals or a Q-Q normal probability plots.

The fixed effect independence condition is relaxed in mixed/random effect models as this was the main motivation for mixed models – to resolve dependencies in the data. Mixed effect models still require independence, e.g., when ignoring independent and including just a fixed effect for a variable of interest. For instance, working with a model that does not include a random effect “Player”, then we have multiple Weight responses per Player. This would violate the LME model independence assumption. Careful selection of fixed effects and random effects is necessary to resolve potential dependencies in the data.

The function dfbeta() can’t be used for assessing influential data points in mixed effects linear models the way it can for fixed effect models. To check for influential points in mixed effect models the package influence.ME or a leave-one-out validation can be employed.

For example we can define a vector of size equal to the number of rows in the data. Iterating over each row (i), we estimate a new mixed model excluding the current row index (data[-i,]). The function fixef() extracts the coefficients of interest, which can be adapted to the specific analysis. Running fixef() on the linear model yields the position of the relevant coefficient. For example, position “1” refers to the intercept (which is always the first coefficient mentioned in the coefficient table) and position “2” reflects the effect of “Height” appears second in the list of coefficients.

df <- as.data.frame(data)
all.res=numeric(nrow(df)) 
for(i in 1:nrow(df))
{ 		# Generic
# myfullmodel=lmer(response~predictor+ (1+predictor|randomeffect))
# results[i]=fixef(myfullmodel)[parameter position index]
fullmodel=lmer(Weight~Height+ (1+Height|Team), data=data[-i,])
results[i]=fixef(fullmodel)[2]
echo ("Row = ", i)
}

Comments

Fixed effects represent explanatory predictors that are expected to have a systematic and predictable influence on the data (response). Whereas random effects represent covariates expected to have a non-systematic, idiosyncratic, unpredictable, or “random” influence on the response variable. Examples of such random effects in experimental studies include “subject/patient/player/unit” and “Age”, as we generally have no control over idiosyncrasies of individual subjects or their age at time of observation.

Often fixed effects are expected to exhaust the population of interest, or the levels of a factor. In the MLB study the factor “Team” may not exhaust the space as there are other teams/leagues. However, for MLB at a fixed time, the “Team” factor may be fully exhaustive. Same with Height. Random effects represent sub-samples from the population of interest and may not “exhaust” the population as more players or teams could be included in the study. The levels of random factor may only represent a small sub-subset of all levels of the factor.

Hands-on Activity

Use these cancer data (http://www.ats.ucla.edu/stat/data/hdp.csv), representing cancer phenotypes and predictors (e.g., "IL6", "CRP", "LengthofStay", "Experience") and outcome measures (e.g., remission) collected on patients, nested within doctors (DID) and within hospitals (HID). To fit a mixed model (http://www.ats.ucla.edu/stat/r/dae/melogit.htm) and examine remissions as cancer outcomes.

This lung cancer dataset includes a variety of outcomes collected on patients, nested within doctors, who are in turn nested within hospitals. Doctor level variables include experience.
hdp <- read.csv("http://www.ats.ucla.edu/stat/data/hdp.csv")
hdp <- within(hdp, {
Married <- factor(Married, levels = 0:1, labels = c("no", "yes"))
DID <- factor(DID)
HID <- factor(HID)
})

Plot several continuous predictor variables to examine the distributions and catch coding errors (e.g., if values range from 0 to 7, but we see a 999), and explore the relationship among our variables.

# install.packages("ggally")
# library(GGally)
# library("ggplot2")
# ggpairs (hdp[, c("IL6", "CRP", "LengthofStay", "Experience")])


Next See

Machine Learning Algorithms section for data modeling, training , testing, forecasting, prediction, and simulation.





Translate this page:

(default)
Uk flag.gif

Deutsch
De flag.gif

Español
Es flag.gif

Français
Fr flag.gif

Italiano
It flag.gif

Português
Pt flag.gif

日本語
Jp flag.gif

България
Bg flag.gif

الامارات العربية المتحدة
Ae flag.gif

Suomi
Fi flag.gif

इस भाषा में
In flag.gif

Norge
No flag.png

한국어
Kr flag.gif

中文
Cn flag.gif

繁体中文
Cn flag.gif

Русский
Ru flag.gif

Nederlands
Nl flag.gif

Ελληνικά
Gr flag.gif

Hrvatska
Hr flag.gif

Česká republika
Cz flag.gif

Danmark
Dk flag.gif

Polska
Pl flag.png

România
Ro flag.png

Sverige
Se flag.gif