
Scientific Methods for Health Sciences - Hierarchical Linear Models (HLM)


Hierarchical linear models (also called multilevel models) are statistical models whose parameters vary at more than one level. They can be regarded as a generalization of the linear model and are widely applied across fields, especially in research designs where participant data are organized at more than one level. In this section, we present a general introduction to hierarchical linear models, compare them with the classical ANOVA approach, and illustrate their application with examples.


How can we deal with data that are organized at multiple levels? What advantages does the hierarchical linear model have over the classical approach? How does the hierarchical linear model work?


1) Hierarchical linear models: these are statistical models whose parameters vary at more than one level. They combine the advantages of the mixed-model ANOVA, with its flexible modeling of fixed and random effects, and of regression, with its ability to handle unbalanced data and predictors that are discrete or continuous.

  • Data for such models are hierarchically structured, with first-level units nested within second-level units, second-level units nested within third-level units, and so on. Parameters for such models may be viewed as having a hierarchical linear structure, varying across level-two units as a function of level-two characteristics. Higher levels may be added without limit, though to date no published applications involve more than three levels.
  • Compared to the classical experimental design models: the random factors are nested and never crossed; however, fixed factors can be crossed with random factors, and random factors may be nested within fixed factors. The data may be unbalanced at any level, and continuous predictors can be defined at any level. Both discrete and continuous predictors can be specified as having random effects, which are allowed to covary.
  • The units of analysis are usually individuals who are nested within aggregate units at the higher level. Although the lowest level of data in the model is usually an individual, repeated measurements of individuals may also be examined. The hierarchical linear model thus provides an alternative to univariate or multivariate analysis of repeated measures, and individual differences in growth curves may be examined. Multilevel models can also be used as an alternative to ANCOVA, where scores on the dependent variable are adjusted for covariates; the multilevel model can analyze these experiments without the assumption of homogeneity of regression slopes that ANCOVA requires.

2) Two-level hierarchical linear model: hierarchical linear models can be used on data with many levels, though two-level models are the most common. The dependent variable must be measured at the lowest level of analysis. Take the example of a study design involving students nested within schools. The level-one model specifies how student-level predictors relate to the student-level outcome, while at level two each of the regression coefficients defined in the level-one model becomes an outcome to be predicted by school-level characteristics.

  • The level-one model (the student level): the outcome $y_{ij}$ for student $i$ in school $j$ $(i = 1, 2, \cdots, n_{j}; j = 1, 2, \cdots, J)$ varies as a function of student characteristics, $X_{qij}, q = 1, \cdots, Q,$ and a random error $r_{ij}$ according to the linear regression model $y_{ij}=\beta_{0j}+\sum_{q=1}^{Q}\beta_{qj}X_{qij} + r_{ij}, r_{ij} \sim N(0,\sigma^2),$ where $\beta_{0j}$ is the intercept and each $\beta_{qj}, q = 1, \cdots, Q,$ is a regression coefficient indicating the strength of association between $X_{qij}$ and the outcome within school $j$. The error of prediction of $y_{ij}$ by the $X$'s is $r_{ij}$, which is assumed normally distributed and, for simplicity, homoscedastic.
  • The level-two model: each regression coefficient, $\beta_{qj}, q = 0, 1, \cdots, Q,$ defined by the level-one model becomes an outcome variable to be predicted by school-level characteristics, $W_{sj}, s = 1, \cdots, S$, according to the regression model $\beta_{qj}=\Theta_{q0}+\sum_{s=1}^{S}\Theta_{qs}W_{sj}+\mu_{qj}$, where $\Theta_{q0}$ is an intercept and each $\Theta_{qs}, s = 1, 2, \cdots, S$, is a regression slope specifying the strength of association between $W_{sj}$ and the outcome $\beta_{qj}$. The random effects are assumed sampled from a $(Q+1)$-variate normal distribution, where each $\mu_{qj}, q=0,1, \cdots, Q,$ has a mean of $0$ and variance $\tau_{qq}$, and the covariance between $\mu_{qj}$ and $\mu_{q'j}$ is $\tau_{qq'}$. There are a considerable number of options in modeling each $\beta_{qj}, q=0,1,\cdots, Q$. If every $W_{sj}$ is assumed to have no effect, the regression coefficients $\Theta_{qs}, s=1,2,\cdots, S,$ are set to zero. If the random effect $\mu_{qj}$ is also constrained to zero, then $\beta_{qj} = \Theta_{q0}$, i.e., $\beta_{qj}$ is fixed across all schools.
  • The one-way analysis of variance:
    • Classical approach of the one-way ANOVA model: $y_{ij} = \mu + \alpha_{j} + r_{ij}, r_{ij} \sim N(0,\sigma^2)$, where $y_{ij}$ is the observation for subject $i$ assigned to level $j$ of the independent variable; $\mu$ is the grand mean; $\alpha_{j}$ is the effect associated with level $j$; and $r_{ij}$ is assumed normally distributed with mean $0$ and homogeneous variance $\sigma^2$.
    • Hierarchical linear model: start from the level-one model $y_{ij}=\beta_{0j}+\sum_{q=1}^{Q}\beta_{qj}X_{qij}+r_{ij}, r_{ij}\sim N(0,\sigma^2),$ and set every regression coefficient except the intercept to zero, which gives the level-one model $y_{ij}=\beta_{0j}+r_{ij}, r_{ij} \sim N(0,\sigma^2)$. Here $\sigma^2$ is the within-group variance and $\beta_{0j}$ is the group mean, the only parameter that needs to be predicted at level two. The model for that parameter is similarly simplified so that all regression coefficients except the intercept are set to zero: $\beta_{0j}=\Theta_{00}+\mu_{0j}, \mu_{0j}\sim N(0,\tau^2)$. Here, $\Theta_{00}$ is the grand mean and $\mu_{0j}$ is the effect associated with level $j$. In the random effects model, the effect is typically assumed normally distributed with a mean of zero; in the fixed effects model, each $\mu_{0j}$ is a fixed constant. Substituting the level-two model into the level-one model gives the single model $y_{ij}=\Theta_{00}+\mu_{0j}+r_{ij}$, where $r_{ij} \sim N(0,\sigma^2)$ and $\mu_{0j} \sim N(0,\tau^2)$, which is clearly the one-way random effects ANOVA.
    • Hypothesis testing: a simple test of the null hypothesis of no group effects, i.e., $H_{0}: \tau^2=0$, is given by the statistic $H=\sum_{j} \hat{P}_{j}(\bar{y}_{\cdot j}-\bar{y}_{\cdot \cdot})^2$, where $\hat{P}_{j}=\frac{n_{j}}{\hat{\sigma}^2}$ is the estimated precision of the group mean $\bar{y}_{\cdot j}$ under the null hypothesis. $H$ has a large-sample chi-square distribution with $J-1$ degrees of freedom under the null hypothesis. For balanced data, the sum of precision-weighted squared differences reduces to $H=\hat{P}\sum_{j}(\bar{y}_{\cdot j}-\bar{y}_{\cdot \cdot})^2 = (J-1)\frac{MS_{b}}{MS_{w}}$, revealing clearly that $\frac{H}{J-1}$ is the usual $F$ statistic for testing group differences in ANOVA.
  • The two-factor nested design: hierarchical linear models generalize maximum likelihood estimation to the case of unbalanced data, and covariates measured at each level can be discrete or continuous and can have either fixed or random effects.
    • The classical approach: $y_{ijk}=\mu+\alpha_{k}+\pi_{j(k)}+r_{ijk}, \pi_{j(k)} \sim N(0,\tau^2), r_{ijk} \sim N(0,\sigma^2)$, where $y_{ijk}$ is the outcome for subject $i$ nested within level $j$ of the random factor, which is, in turn, nested within level $k$ of the fixed factor $(i=1, \cdots, n_{jk}; j=1, \cdots, J_{k}; k=1,\cdots, K); \mu$ is the grand mean; $\alpha_{k}$ is the effect associated with the $k^{th}$ level of the fixed factor; $\pi_{j(k)}$ is the effect associated with the $j^{th}$ level of the random factor within the $k^{th}$ level of the fixed factor; and $r_{ijk}$ is the random (within cell) error. In the case of balanced data ($n_{jk}=n$ for every level of the random factor), the standard analysis of variance method and the method of restricted maximum likelihood coincide.
    • Analysis by means of a hierarchical linear model: as in the case of the one-way ANOVA, first set every regression coefficient except the intercept to zero: $y_{ij}=\beta_{0j} + r_{ij}, r_{ij} \sim N(0,\sigma^2)$. Consider, for example, students nested within classes, with each class experiencing one of two instructional methods ($K=2$). The level-two (between-class) model is a regression model in which the class mean $\beta_{0j}$ is the outcome and the predictor is a contrast between the two treatments. The level-two model, $\beta_{0j}=\Theta_{00} + \Theta_{01}W_{j} + \mu_{0j}, \mu_{0j} \sim N(0,\tau^2)$, has $W_{j}=\frac{1}{2}$ for classes experiencing instructional method $1$ and $W_{j} =-\frac{1}{2}$ for classes experiencing instructional method $2$. Hence, the correspondences between the hierarchical model and the ANOVA model are $\Theta_{00} = \mu, \Theta_{01} = \alpha_{1} - \alpha_{2}, \mu_{0j} = \pi_{j(k)}$. The single model in this case is $y_{ij} = \Theta_{00} + \Theta_{01}W_{j} + \mu_{0j} + r_{ij}$. In general, $K-1$ contrasts must be included to represent the variation among the $K$ methods in order to duplicate the ANOVA results.
  • The two-factor crossed design (with replications within cells): the mixed case, with one factor fixed and the other random.
    • The classical approach for the mixed model for the two-factor crossed design: $y_{ijk}=\mu+\alpha_{k}+\pi_{j}+(\alpha \pi)_{jk}+r_{ijk}, \pi_{j} \sim N(0,\tau^2), (\alpha \pi)_{jk} \sim N(0, \delta^2), r_{ijk} \sim N(0, \sigma^2).$
    • Analysis by means of a hierarchical linear model: in the two-factor mixed crossed model, the fixed factor is specified in the level-one model with $K-1$ $X$'s. Consider, for example, subjects nested within tutors, where each tutor has subjects in both a practice and a no-practice condition. With $K=2$, only one $X$ is specified, so the level-one model becomes $y_{ij} = \beta_{0j} + \beta_{1j}X_{1ij} + r_{ij}, r_{ij} \sim N(0, \sigma^2)$, where $y_{ij}$ is the outcome for subject $i$ having tutor $j$, $\beta_{0j}$ is the mean for the $j^{th}$ tutor, $\beta_{1j}$ is the contrast between the practice and no-practice conditions within tutor $j$, $X_{1ij}=1$ for subjects of tutor $j$ having practice and $-1$ for those having no practice, and $r_{ij}$ is the within-cell error. To replicate the results of the ANOVA, the level-two model is formulated to allow both coefficients to vary randomly across tutors: $\beta_{0j} = \Theta_{00}+\mu_{0j}$ and $\beta_{1j} = \Theta_{10} + \mu_{1j},$ where $\Theta_{00}$ is the grand mean, $\mu_{0j}$ is the unique effect of tutor $j$ on the mean level of the outcome, $\Theta_{10}$ is the average value of the treatment contrast, and $\mu_{1j}$ is the unique effect of tutor $j$ on that contrast. The correspondences between the hierarchical model and the ANOVA model are $\Theta_{00} = \mu, \Theta_{10} = \frac{\alpha_{2}-\alpha_{1}}{2}, \mu_{0j}=\pi_{j}, \mu_{1j} = \frac{(\alpha \pi)_{j2}-(\alpha \pi)_{j1}}{2}, \tau_{00} = \tau^2, \tau_{11} = \frac{\delta^2}{2}$. Combining the two levels yields the single model $y_{ij} = \Theta_{00} + \Theta_{10}X_{1ij} + \mu_{0j} + \mu_{1j}X_{1ij} + r_{ij}$.
  • Randomized block (and repeated measures) designs: these involve mixed models having both fixed and random effects. The blocks will typically be viewed as having random levels, and within blocks there will commonly be a fixed effects design. The fixed effects may represent experimental treatment levels or, in longitudinal studies, polynomial trends.
    • Classical approach for a randomized block design where there are no between-block factors: $y_{ij} = \mu + \alpha_{i} + \pi_{j} + e_{ij}$, where $\pi_{j} \sim N(0,\tau^2)$ and $e_{ij} \sim N(0,\sigma^2)$. Here $\alpha_{i}$ is the fixed effect of treatment $i$, $\pi_{j}$ is the effect of block $j$, and $e_{ij}$ is the error, which has two components, $(\alpha \pi)_{ij}$ and $r_{ij}.$
    • Analysis by means of a hierarchical linear model for the randomized block design: the specification is similar to that for the two-factor crossed design discussed above. The difference is that in the randomized block design there is no replication within cells, so the model needs to be simplified. According to the level-one (within-block) model, the outcome depends on polynomial trend components plus error: $y_{ij} = \beta_{0j} + \beta_{1j}(LIN)_{ij} + \beta_{2j}(QUAD)_{ij} + \beta_{3j}(CUBE)_{ij} + r_{ij}, r_{ij} \sim N(0,\sigma^2)$. Here $y_{ij}$ is the outcome for subject $i$ in block $j$; $\beta_{0j}$ is the mean for block $j$; $(LIN)_{ij}$ assigns the linear contrast values $(-1.5, -0.5, 0.5, 1.5)$ to durations $(1,2,3,4)$, respectively; $(QUAD)_{ij}$ assigns the quadratic contrast values $(0.5, -0.5, -0.5, 0.5)$; $(CUBE)_{ij}$ assigns the cubic contrast values $(-0.5, 1.5, -1.5, 0.5)$; $\beta_{1j}, \beta_{2j}$ and $\beta_{3j}$ are the linear, quadratic, and cubic regression parameters, respectively; and $r_{ij}$ is the within-cell error. With four observations per block and four regression coefficients ($\beta$'s) in the level-one model, no degrees of freedom remain to estimate the within-cell error. If we assume that the trends do not vary across blocks, we can treat the trend parameters as fixed, yielding the level-two model: $\beta_{0j} = \Theta_{00} + \mu_{0j}, \mu_{0j} \sim N(0,\tau^2)$; $\beta_{1j}=\Theta_{10}$; $\beta_{2j} = \Theta_{20}$; $\beta_{3j} = \Theta_{30}$, where $\Theta_{00}$ is the grand mean and $\mu_{0j}$ is the unique effect of block $j$, assumed normally distributed with mean zero and variance $\tau^2$. The coefficients $\beta_{1j}, \beta_{2j}$ and $\beta_{3j}$ are constrained to be invariant across blocks. The correspondences between the hierarchical model and the ANOVA model are $\Theta_{00}=\mu$; $\Theta_{10} = (-1.5\alpha_{1} - 0.5\alpha_{2} + 0.5\alpha_{3} + 1.5\alpha_{4})$; $\Theta_{20} = (0.5\alpha_{1} - 0.5\alpha_{2} - 0.5\alpha_{3} + 0.5\alpha_{4})$; $\Theta_{30} = (-0.5\alpha_{1} + 1.5\alpha_{2} - 1.5\alpha_{3} + 0.5\alpha_{4})$; and $\mu_{0j} = \pi_{j}$. Combining the equations above yields the single model $y_{ij} = \Theta_{00} + \Theta_{10}(LIN)_{ij} + \Theta_{20}(QUAD)_{ij} + \Theta_{30}(CUBE)_{ij} + \mu_{0j} + r_{ij}.$ This model assumes that the variance-covariance matrix of the repeated measures is compound symmetric: $Var(Y_{ij}) = \tau^2 + \sigma^2; Cov(Y_{ij}, Y_{i'j}) = \tau^2.$
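
The hypothesis test described for the one-way model above can be checked numerically. The following is a minimal sketch in base R, not part of the original material: it simulates balanced one-way data and confirms that the precision-weighted statistic $H$ equals $(J-1)F$. The values of `J`, `n`, `tau`, `sigma`, the grand mean of 50, and all variable names are illustrative assumptions.

```r
# Sketch: for balanced one-way data, H = sum_j P_j (ybar_j - ybar)^2
# should equal (J - 1) * F from the classical ANOVA.
# All numbers below are simulated; J, n, tau and sigma are assumptions.
set.seed(1)
J <- 8        # number of groups (level-two units)
n <- 10       # observations per group (balanced design)
tau <- 2      # between-group standard deviation
sigma <- 3    # within-group standard deviation

group <- factor(rep(seq_len(J), each = n))
u <- rnorm(J, mean = 0, sd = tau)                 # group effects mu_0j
y <- 50 + u[as.integer(group)] + rnorm(J * n, mean = 0, sd = sigma)

group_means <- tapply(y, group, mean)             # ybar_.j
grand_mean <- mean(y)                             # ybar_..

# Pooled within-group variance (the ANOVA mean square within)
ms_within <- sum((y - group_means[as.integer(group)])^2) / (J * (n - 1))

P_hat <- n / ms_within                            # estimated precision n / sigma^2
H <- sum(P_hat * (group_means - grand_mean)^2)

# Classical one-way F statistic for comparison
f_stat <- anova(lm(y ~ group))[["F value"]][1]

# Large-sample chi-square p-value with J - 1 degrees of freedom
p_value <- pchisq(H, df = J - 1, lower.tail = FALSE)

print(c(H = H, F_times_df = (J - 1) * f_stat, p_value = p_value))
```

The first two printed numbers agree up to floating-point error because, with equal group sizes, $\hat{P}\sum_{j}(\bar{y}_{\cdot j}-\bar{y}_{\cdot \cdot})^2$ is the between-group sum of squares divided by $MS_{w}$.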


This article presented an overview of the logic and rationale of hierarchical linear models. Due to the inherently hierarchical nature of organizations, data collected in organizations consist of nested entities. More specifically, individuals are nested in work groups, work groups are nested in departments, departments are nested in organizations, and organizations are nested in environments. Hierarchical linear models provide a conceptual and statistical mechanism for investigating and drawing conclusions regarding the influence of phenomena at different levels of analysis. This introductory paper: (a) discusses the logic and rationale of hierarchical linear models, (b) presents a conceptual description of the estimation strategy, and (c) using a hypothetical set of research questions, provides an overview of a typical series of multi-level models that might be investigated.

This article provided a review of the educational application of hierarchical linear models. The search for appropriate statistical methods for hierarchical, multilevel data has been a prominent theme in educational statistics over the past 15 years. As a result of this search, an important class of models, termed hierarchical linear models by this review, has emerged. In the paradigmatic application of such models, observations within each group (e.g., classroom or school) vary as a function of group-level or “microparameters.” However, these microparameters vary randomly across the population of groups as a function of “macroparameters.” Research interest has focused on estimation of both micro- and macroparameters. This paper reviews estimation theory and application of such models. Also, the logic of these methods is extended beyond the paradigmatic case to include research domains as diverse as panel studies, meta-analysis, and classical test theory. Microparameters to be estimated may be as diverse as means, proportions, variances, linear regression coefficients, and logit linear regression coefficients. Estimation theory is reviewed from Bayes and empirical Bayes viewpoints and the examples considered involve data sets with two levels of hierarchy.




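Example: using the Dyestuff data from the lme4 package
library(lme4)       # provides lmer() and the Dyestuff data
str(Dyestuff)       # 30 yields, 5 from each of 6 batches
'data.frame':   30 obs. of  2 variables:
 $ Batch: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ Yield: num  1545 1440 1440 1520 1580 ...
summary(Dyestuff)
 Batch     Yield     
 A:5   Min.   :1440  
 B:5   1st Qu.:1469  
 C:5   Median :1530  
 D:5   Mean   :1528  
 E:5   3rd Qu.:1575 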


[Figure: SMHS Fig1 Hierarchical Linear Models.png]

model <- lmer(Yield ~ 1 + (1 | Batch), Dyestuff)   # random intercept for each batch
model                                              # print the fitted model
Linear mixed model fit by REML ['lmerMod']
Formula: Yield ~ 1 + (1 | Batch) 
   Data: Dyestuff 
REML criterion at convergence: 319.6543 
Random effects:
 Groups   Name        Std.Dev.
 Batch    (Intercept) 42.00   
 Residual             49.51   
Number of obs: 30, groups: Batch, 6
Fixed Effects:

summary(model)
Linear mixed model fit by REML ['lmerMod']
Formula: Yield ~ 1 + (1 | Batch) 
   Data: Dyestuff 

REML criterion at convergence: 319.6543

Random effects:
 Groups   Name        Variance Std.Dev.
 Batch    (Intercept) 1764     42.00   
 Residual             2451     49.51   
Number of obs: 30, groups: Batch, 6
Fixed effects:
            Estimate Std. Error t value
(Intercept)  1527.50      19.38    78.8
fitted(model)       # fitted values: each batch mean shrunk toward the grand mean
 [1] 1509.893 1509.893 1509.893 1509.893 1509.893 1527.891 1527.891 1527.891 1527.891 1527.891 1556.062 1556.062
[13] 1556.062 1556.062 1556.062 1504.415 1504.415 1504.415 1504.415 1504.415 1584.233 1584.233 1584.233 1584.233
[25] 1584.233 1482.505 1482.505 1482.505 1482.505 1482.505
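
Because the Dyestuff design is balanced, the REML results above can be reproduced with closed-form ANOVA formulas, in line with the earlier remark that the analysis of variance method and restricted maximum likelihood coincide for balanced data. The sketch below uses base R only. The yields are transcribed by hand from the lme4 Dyestuff data, which is an assumption of this sketch; with lme4 installed, `Dyestuff$Yield` and `Dyestuff$Batch` are the authoritative source. The variable names and the shrinkage weight `lambda` are illustrative and do not come from the original text.

```r
# Sketch: ANOVA (method-of-moments) estimates for the balanced one-way
# random effects model, compared with the lmer() REML output above.
# Data transcribed from lme4's Dyestuff (assumption; prefer lme4::Dyestuff).
yield <- c(1545, 1440, 1440, 1520, 1580,   # batch A
           1540, 1555, 1490, 1560, 1495,   # batch B
           1595, 1550, 1605, 1510, 1560,   # batch C
           1445, 1440, 1595, 1465, 1545,   # batch D
           1595, 1630, 1515, 1635, 1625,   # batch E
           1520, 1455, 1450, 1480, 1445)   # batch F
batch <- factor(rep(LETTERS[1:6], each = 5))

J <- nlevels(batch)                        # 6 batches
n <- 5                                     # observations per batch
batch_means <- tapply(yield, batch, mean)
grand_mean <- mean(yield)

# Mean squares between and within batches
ms_between <- n * sum((batch_means - grand_mean)^2) / (J - 1)
ms_within <- sum((yield - batch_means[as.integer(batch)])^2) / (J * (n - 1))

sigma2_hat <- ms_within                            # residual variance
tau2_hat <- max((ms_between - ms_within) / n, 0)   # batch variance, truncated at 0

# Standard error of the grand mean: sqrt((tau^2 + sigma^2 / n) / J)
se_intercept <- sqrt(ms_between / (J * n))

# Fitted values: each batch mean is shrunk toward the grand mean
lambda <- tau2_hat / (tau2_hat + sigma2_hat / n)
shrunken_means <- grand_mean + lambda * (batch_means - grand_mean)

print(round(c(batch_sd = sqrt(tau2_hat), residual_sd = sqrt(sigma2_hat),
              intercept = grand_mean, se = se_intercept), 2))
print(round(shrunken_means, 3))
```

The printed standard deviations, intercept, and standard error can be compared with the `summary(model)` output, and the shrunken means with the six distinct values returned by `fitted(model)`.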

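model.up <- update(model, REML = FALSE)   ## refit the model by maximum likelihood (ML)
## For this balanced one-way classification, the ML fixed-effect estimate and residual standard deviation equal the REML ones; the ML estimate of the batch standard deviation is smaller (37.26 versus 42.00).
model.up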

Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Yield ~ 1 + (1 | Batch) 
   Data: Dyestuff 
      AIC       BIC    logLik  deviance 
 333.3271  337.5307 -163.6635  327.3271 
Random effects:
 Groups   Name        Std.Dev.
 Batch    (Intercept) 37.26   
 Residual             49.51   
Number of obs: 30, groups: Batch, 6
Fixed Effects:
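
The ML fit can also be checked by hand. The sketch below, in base R, uses the standard closed-form ML estimators for the balanced one-way random effects model. These formulas are not given in the original text and are valid only when the resulting batch variance is positive. As in the previous sketch, the yields are transcribed from the lme4 Dyestuff data (an assumption) and the variable names are illustrative.

```r
# Sketch: closed-form ML estimates and deviance for the balanced one-way
# random effects model, compared with the update(model, REML = FALSE) output.
# Data transcribed from lme4's Dyestuff (assumption; prefer lme4::Dyestuff).
yield <- c(1545, 1440, 1440, 1520, 1580,   # batch A
           1540, 1555, 1490, 1560, 1495,   # batch B
           1595, 1550, 1605, 1510, 1560,   # batch C
           1445, 1440, 1595, 1465, 1545,   # batch D
           1595, 1630, 1515, 1635, 1625,   # batch E
           1520, 1455, 1450, 1480, 1445)   # batch F
batch <- factor(rep(LETTERS[1:6], each = 5))

J <- nlevels(batch)
n <- 5
N <- J * n
batch_means <- tapply(yield, batch, mean)

ss_between <- n * sum((batch_means - mean(yield))^2)
ss_within <- sum((yield - batch_means[as.integer(batch)])^2)

# ML estimators (interior solution): the residual variance equals MS within,
# while the batch variance divides SS between by J instead of J - 1.
sigma2_ml <- ss_within / (J * (n - 1))
tau2_ml <- (ss_between / J - sigma2_ml) / n
stopifnot(tau2_ml > 0)                     # the closed form assumes this

# Deviance (-2 log-likelihood) at the ML estimates
v <- sigma2_ml + n * tau2_ml
deviance_ml <- N * log(2 * pi) + J * (n - 1) * log(sigma2_ml) + J * log(v) +
  ss_within / sigma2_ml + ss_between / v
aic_ml <- deviance_ml + 2 * 3              # 3 parameters: intercept, tau^2, sigma^2

print(round(c(batch_sd = sqrt(tau2_ml), residual_sd = sqrt(sigma2_ml),
              deviance = deviance_ml, AIC = aic_ml), 4))
```

The only difference from the REML calculation is the divisor applied to the between-batch sum of squares, which is why the batch standard deviation drops while the residual standard deviation is unchanged.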


Multilevel Model Wikipedia

Hierarchical Linear Models and Experimental Design
