SMHS rANOVA
Scientific Methods for Health Sciences - Repeated measures Analysis of Variance (rANOVA)
Overview
The phrase repeated measures is used when the same objects, units, or entities take part in all conditions of an experiment. Because there are multiple measures on each subject, the analysis has to account for the correlation among the measures taken on the same subject. Repeated measures ANOVA (rANOVA) is a commonly used statistical approach for analyzing repeated measures designs. It is the equivalent of the one-way ANOVA, but for related rather than independent groups, and it is also referred to as within-subjects ANOVA or ANOVA for correlated samples. The test detects any overall differences between related means. Repeated measures ANOVA can be used for two types of study design: studies that investigate either (1) changes in mean scores over three or more time points, or (2) differences in mean scores under three or more different conditions.
Motivation
If you want to test the equality of means, ANOVA is often a good choice. However, a standard ANOVA is inappropriate for data with repeated measures because it fails to model the correlation between the repeated measures: the data violate the ANOVA assumption of independence. Repeated measures designs are used for several reasons. First, some research hypotheses require repeated measures. Longitudinal research, for example, measures each sample member at each of several ages; in this case, age is a repeated factor. Second, when there is a great deal of variation between sample members, error variance estimates from standard ANOVAs are large. Repeated measures of each sample member provide a way of accounting for this variance, thus reducing error variance. Third, when sample members are difficult to recruit, repeated measures designs are economical because each member is measured under all conditions. Fourth, they apply when members have been matched according to some important characteristic.
To demonstrate how to calculate a repeated measures ANOVA, we use the example of a 6-month exercise-training intervention in which six subjects had their fitness level measured on three occasions: pre-intervention, at 3 months, and post-intervention (6 months). Their data are shown below, along with some initial calculations:
Subjects | Pre- | 3 Months | 6 Months | Subject Means |
1 | 45 | 50 | 55 | 50 |
2 | 42 | 42 | 45 | 43 |
3 | 36 | 41 | 43 | 40 |
4 | 39 | 35 | 40 | 38 |
5 | 51 | 55 | 59 | 55 |
6 | 44 | 49 | 56 | 49.7 |
Time-point Means | 42.8 | 45.3 | 49.7 | 45.9 (grand mean) |
Source | SS | df | MS | F |
Conditions | $SS_{conditions}$ | $(k-1)$ | $MS_{conditions}$ | $\frac {MS_{conditions}}{MS_{error}}$ |
Subjects | $SS_{subjects}$ | $(n-1)$ | $MS_{subjects}$ | $\frac{MS_{subjects}} {MS_{error}}$ |
Error | $SS_{error}$ | $(k-1)(n-1)$ | $MS_{error}$ | |
Total | $SS_{T}$ | $(N-1)$ |
$F=\frac {MS_{conditions}}{MS_{error}}=\frac {MS_{time}}{MS_{error}}$
$SS_{Total} = SS_{conditions}+SS_{subjects}+SS_{Error}$
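with corresponding degrees of freedom:
$df_{Total}=df_{conditions}+df_{subjects}+df_{Error}=(k-1)+(n-1)+(k-1)(n-1)=nk-1=N-1,$
where $k$ is the number of conditions, $n$ is the number of subjects, and $N=nk$ is the total number of observations.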
$SS_{Time}$ is the same as $SS_{b}$ in an independent ANOVA:
$SS_{Time}=SS_{b}=\sum_{i=1}^{k} n_{i}(\bar x_{i}-\bar x)^{2},$
where $k$ is the number of conditions, $n_{i}$ is the number of subjects under the $i$th condition, $\bar x_{i}$ is the mean score for the $i$th condition, and $\bar x$ is the overall grand mean of all conditions.
Here, we have
$SS_{Time}=SS_{b}=\sum_{i=1}^{k} n_{i}(\bar x_{i}-\bar x)^{2}=6[(42.83-45.94)^{2}+(45.33-45.94)^{2}+(49.67-45.94)^{2}]\approx 6[9.679+0.373+13.855]=6(23.907)=143.44$ (using unrounded means).
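A minimal Python sketch (standard library only) that reproduces $SS_{Time}$ from the raw fitness table. It keeps the condition means unrounded, which is why it lands on 143.44 rather than the value you would get from the one-decimal means in the table. The variable names (`scores`, `ss_time`) are illustrative and not part of any SOCR tool.

```python
# Fitness scores from the exercise example.
# Rows = subjects 1..6; columns = Pre, 3 Months, 6 Months.
scores = [
    [45, 50, 55],
    [42, 42, 45],
    [36, 41, 43],
    [39, 35, 40],
    [51, 55, 59],
    [44, 49, 56],
]

n = len(scores)     # number of subjects
k = len(scores[0])  # number of conditions (time points)
grand_mean = sum(sum(row) for row in scores) / (n * k)

# Mean of each condition (column), left unrounded.
condition_means = [sum(row[j] for row in scores) / n for j in range(k)]

# SS_Time: each condition mean's squared distance from the grand mean,
# weighted by the number of subjects in that condition (here n_i = n).
ss_time = sum(n * (m - grand_mean) ** 2 for m in condition_means)
print(f"SS_Time = {ss_time:.2f}")  # SS_Time = 143.44
```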
The within-groups (within-conditions) variation $SS_{W}$ is calculated as
$SS_{W}=\sum_{i=1}^{n}(x_{i1}-\bar x_{1})^{2}+\sum_{i=1}^{n}(x_{i2}-\bar x_{2})^{2}+\cdots+\sum_{i=1}^{n}(x_{ik}-\bar x_{k})^{2},$
where $x_{ij}$ is the score of the $i$th subject under condition $j$ and $\bar x_{j}$ is the mean score for condition $j$.
For this example we have: $SS_{W}=[(45-42.83)^2+(42-42.83)^2+\cdots+(44-42.83)^2]+\cdots+[(55-49.67)^2+\cdots+(56-49.67)^2]=134.83+265.33+315.33=715.5.$
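The same sum can be checked in a few lines of Python. This is a sketch that assumes a complete table with no missing scores. It accumulates the squared deviations of each score from its own condition (column) mean.

```python
# Rows = subjects; columns = Pre, 3 Months, 6 Months.
scores = [
    [45, 50, 55],
    [42, 42, 45],
    [36, 41, 43],
    [39, 35, 40],
    [51, 55, 59],
    [44, 49, 56],
]

n = len(scores)
k = len(scores[0])

ss_within = 0.0
for j in range(k):
    column = [row[j] for row in scores]  # all subjects' scores under condition j
    col_mean = sum(column) / n
    # Squared deviations of each score from its own condition mean.
    ss_within += sum((x - col_mean) ** 2 for x in column)

print(f"SS_W = {ss_within:.2f}")  # SS_W = 715.50
```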
$SS_{subject}$ is calculated as $SS_{subject}=k\sum_{i=1}^{n}(\bar x_{i}-\bar x)^{2}$, where here $\bar x_{i}$ denotes the mean score of the $i$th subject and $\bar x$ is the grand mean.
In our example, we have $SS_{subject}=3[(50-45.94)^2+(43-45.94)^2+(40-45.94)^2+(38-45.94)^2+(55-45.94)^2+(49.67-45.94)^2]=658.3.$
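A short Python sketch of the subject term, assuming the same complete table. Each subject's mean across the $k$ conditions is compared with the grand mean, and the sum is scaled by $k$. Subtracting this value from $SS_{W}$ gives $SS_{Error}$.

```python
# Rows = subjects; columns = Pre, 3 Months, 6 Months.
scores = [
    [45, 50, 55],
    [42, 42, 45],
    [36, 41, 43],
    [39, 35, 40],
    [51, 55, 59],
    [44, 49, 56],
]

n = len(scores)
k = len(scores[0])
grand_mean = sum(sum(row) for row in scores) / (n * k)

# Each subject's mean across the k conditions.
subject_means = [sum(row) / k for row in scores]

# SS_subject = k * sum of squared deviations of subject means from the grand mean.
ss_subject = k * sum((m - grand_mean) ** 2 for m in subject_means)
print(f"SS_subject = {ss_subject:.1f}")  # SS_subject = 658.3
```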
Thus,
$SS_{Error}= SS_{w}-SS_{subject}=715.5-658.3=57.2$
$F=\frac{MS_{Time}}{MS_{Error}}=\frac{SS_{Time}/df_{Time}}{SS_{Error}/df_{Error}}=\frac{SS_{Time}/(k-1)}{SS_{Error}/[(n-1)(k-1)]}$
$=\frac{143.44/(3-1)}{57.2/(5\times 2)}=\frac{71.72}{5.72}=12.53$
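The whole partition can be wrapped in one function. The sketch below uses a hypothetical helper, `rm_anova`, which is not a library API. It assumes a complete subjects × conditions table and computes $SS_{Error}$ as whatever remains of $SS_{Total}$ after removing conditions and subjects. This is algebraically the same as $SS_{W}-SS_{subject}$.

```python
def rm_anova(scores):
    """One-way repeated measures ANOVA for a complete table.

    scores: list of rows, one per subject, each with one score per condition.
    Missing values are not supported.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # What is left after removing conditions and subjects is error
    # (equal to SS_W - SS_subject).
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return {
        "ss_cond": ss_cond, "ss_subj": ss_subj, "ss_error": ss_error,
        "ss_total": ss_total, "df_cond": df_cond, "df_error": df_error,
        "F": f_stat,
    }


# Exercise example: rows = subjects; columns = Pre, 3 Months, 6 Months.
scores = [
    [45, 50, 55],
    [42, 42, 45],
    [36, 41, 43],
    [39, 35, 40],
    [51, 55, 59],
    [44, 49, 56],
]
result = rm_anova(scores)
print(f"F({result['df_cond']}, {result['df_error']}) = {result['F']:.2f}")
# F(2, 10) = 12.53
```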
We can now look up (or compute with a computer program) the critical value of the F-distribution with $df_{Time}$ and $df_{Error}$ degrees of freedom and determine whether our F-statistic indicates a statistically significant result.
We report the F-statistic from a repeated measures ANOVA as:
$F(df_{Time}, df_{Error}) = F\text{-value},\ p = p\text{-value},$
which for our example would be:
$F(2, 10) = 12.53, p = 0.002,$ see SOCR Java F-distribution calculator, or the Distributome HTML5 F-distribution Calculator
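The p-value can also be computed directly. For numerator degrees of freedom equal to 2, the upper tail of the F-distribution has the closed form $P(F>f)=(1+2f/df_{2})^{-df_{2}/2}$. The sketch below uses that formula and therefore applies only when $df_{Time}=2$. For other degrees of freedom, a general routine such as SciPy's `scipy.stats.f.sf(f, dfn, dfd)` would be needed. The function name here is hypothetical.

```python
def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for an F(2, df2) random variable.

    Only valid when the numerator degrees of freedom equal 2: in that case
    the survival function reduces to (1 + 2 f / df2) ** (-df2 / 2).
    """
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


p_value = f_upper_tail_df1_2(12.53, 10)
print(f"p = {p_value:.4f}")  # p = 0.0019
```

The result agrees with the reported $p = 0.002$ after rounding.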
Since $p < 0.05$, we reject the null hypothesis that mean fitness is equal at all three time points and accept the alternative hypothesis that at least one time-point mean differs. As we will discuss later, there are assumptions and effect sizes we can calculate that can alter how we report the above result. Otherwise, we would report the findings for this example exercise study as: The six-month exercise-training program had a statistically significant effect on fitness levels, $F(2, 10) = 12.53, p = .002.$
Theory
Assumptions: As with all statistical analyses, specific assumptions should be met to justify the use of this test. With repeated measures ANOVA, standard univariate and multivariate assumptions apply. The assumptions include:
- Normality: For each level of the within subject factor, the dependent variable must have a normal distribution.
- Sphericity: Difference scores computed between two levels of a within-subjects factor must have the same variance for the comparison of any two levels (this assumption applies only when the within-subjects factor has more than two levels).
- Randomness: Cases should be derived from a random sample, and the scores of each participant are independent of those of other participants.
- Multivariate normality: The difference scores are multivariately normally distributed in the population.
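To see what the sphericity assumption refers to, the following sketch computes the sample variance of the difference scores for every pair of time points in the exercise example. This is an informal look only. It is not Mauchly's test or any other formal test, and with only six subjects the variances are noisy. Under sphericity the three values would be roughly equal.

```python
from itertools import combinations
from statistics import variance

# Rows = subjects; columns = Pre, 3 Months, 6 Months.
scores = [
    [45, 50, 55],
    [42, 42, 45],
    [36, 41, 43],
    [39, 35, 40],
    [51, 55, 59],
    [44, 49, 56],
]
labels = ["Pre", "3 Months", "6 Months"]

diff_variances = {}
for a, b in combinations(range(len(labels)), 2):
    # Difference score for each subject between conditions b and a.
    diffs = [row[b] - row[a] for row in scores]
    v = variance(diffs)  # sample variance (n - 1 denominator)
    diff_variances[(labels[b], labels[a])] = v
    print(f"{labels[b]} - {labels[a]}: variance = {v:.2f}")
# 3 Months - Pre: variance = 13.90
# 6 Months - Pre: variance = 17.37
# 6 Months - 3 Months: variance = 3.07
```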
One of the greatest advantages of rANOVA, as with repeated measures designs in general, is the ability to partition out variability due to individual differences. Consider the general structure of the F-statistic:
$F=\frac{MS_{Treatment}}{MS_{Error}}=\frac{SS_{Treatment}/df_{Treatment}}{SS_{Error}/df_{Error}}$
In a between-subjects design, there is an element of variance due to individual differences that is combined with the treatment and error terms:
$SS_{Total}=SS_{Treatment}+SS_{Error}$, $df_{Total}=N-1$
In a repeated measures design it is possible to partition subject variability from the treatment and error terms. In such a case, variability can be broken down into between-treatments variability (or within-subjects effects, excluding individual differences) and within-treatments variability. The within-treatments variability can be further partitioned into between-subjects variability (individual differences) and error (excluding the individual differences):
$F=\frac{MS_{conditions}}{MS_{Error}}=\frac{MS_{Time}}{MS_{Error}}$
$SS_{Error}=SS_{w}-SS_{subject}$
$SS_{Total}=SS_{conditions}+SS_{Subjects}+SS_{Error}$,
$df_{Total}=df_{conditions}+df_{subjects}+df_{Error}=(k-1)+(n-1)+(k-1)(n-1)=N-1$.
From the general structure of the F-statistic, partitioning out the between-subjects variability increases the F-value, because the error sum of squares becomes smaller and therefore so does $MS_{Error}$. However, partitioning also removes $n-1$ degrees of freedom from the error term (from $N-k$ to $(k-1)(n-1)$), so the between-subjects variability must be large enough to offset this loss. If between-subjects variability is small, this process may actually reduce the F-value.
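The effect of this partitioning can be made concrete with the exercise data. The sketch below computes the F-statistic twice. The first version treats the three time points as independent groups, so the error is $SS_{W}$ on $N-k$ df. The second version removes $SS_{subject}$ from the error, leaving $(k-1)(n-1)$ df. In this example the subjects differ a lot, so the repeated measures F is much larger despite the lost degrees of freedom.

```python
# Rows = subjects; columns = Pre, 3 Months, 6 Months.
scores = [
    [45, 50, 55],
    [42, 42, 45],
    [36, 41, 43],
    [39, 35, 40],
    [51, 55, 59],
    [44, 49, 56],
]
n, k = len(scores), len(scores[0])
grand = sum(sum(row) for row in scores) / (n * k)
cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
subj_means = [sum(row) / k for row in scores]

ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
ss_within = sum((row[j] - cond_means[j]) ** 2 for row in scores for j in range(k))
ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
ms_cond = ss_cond / (k - 1)

# Independent-groups ANOVA: subject differences stay in the error term.
df_indep = n * k - k
f_independent = ms_cond / (ss_within / df_indep)
# Repeated measures ANOVA: SS_subject is removed from the error, costing n - 1 df.
df_rm = (k - 1) * (n - 1)
f_repeated = ms_cond / ((ss_within - ss_subj) / df_rm)

print(f"independent F({k - 1}, {df_indep}) = {f_independent:.2f}")  # 1.50
print(f"repeated    F({k - 1}, {df_rm}) = {f_repeated:.2f}")         # 12.53
```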
F test:
As with other analysis of variance tests, the rANOVA makes use of an F statistic to determine significance. Depending on the number of within-subjects factors and assumption violations, it is necessary to select the most appropriate of three tests:
- Standard Univariate ANOVA F test: This test is commonly used when the within-subjects factor has only two levels (e.g., time point 1 and time point 2). It is not recommended when the within-subjects factor has more than two levels, because the assumption of sphericity is commonly violated in such cases.
- Alternative Univariate tests: These tests account for violations of the assumption of sphericity and can be used when the within-subjects factor has more than two levels. The F statistic is the same as in the Standard Univariate ANOVA F test, but it is associated with a more accurate p-value. The correction multiplies both degrees of freedom by an estimated factor $\varepsilon$ (between $1/(k-1)$ and 1), lowering them before the critical F value is determined. Two corrections are commonly used: the Greenhouse-Geisser correction and the Huynh-Feldt correction. The Greenhouse-Geisser correction is more conservative, but it addresses a common issue of increasing variability over time in a repeated measures design.
- Multivariate Test: This test does not assume sphericity, but it is also highly conservative.
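The sketch below shows one standard way to estimate the Greenhouse-Geisser $\varepsilon$. It double-centres the sample covariance matrix $S$ of the $k$ conditions to get $S^{*}$, then computes $\varepsilon=\mathrm{tr}(S^{*})^{2}/[(k-1)\,\mathrm{tr}(S^{*2})]$. The function name is hypothetical, and a complete table is assumed. The corrected test keeps the same F but evaluates it on $\varepsilon(k-1)$ and $\varepsilon(k-1)(n-1)$ degrees of freedom.

```python
from statistics import mean


def greenhouse_geisser_epsilon(scores):
    """Greenhouse-Geisser epsilon for a complete subjects x conditions table."""
    n, k = len(scores), len(scores[0])
    cols = [[row[j] for row in scores] for j in range(k)]
    means = [mean(c) for c in cols]
    # Sample covariance matrix of the k conditions (n - 1 denominator).
    cov = [
        [sum((cols[i][s] - means[i]) * (cols[j][s] - means[j]) for s in range(n)) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]
    row_means = [mean(r) for r in cov]  # equal to column means, since cov is symmetric
    grand = mean(row_means)
    # Double-centre: subtract row and column means, add back the grand mean.
    centred = [[cov[i][j] - row_means[i] - row_means[j] + grand for j in range(k)]
               for i in range(k)]
    trace = sum(centred[i][i] for i in range(k))
    sum_sq = sum(v * v for r in centred for v in r)  # trace(S* @ S*) for symmetric S*
    return trace ** 2 / ((k - 1) * sum_sq)


# Exercise example: rows = subjects; columns = Pre, 3 Months, 6 Months.
scores = [
    [45, 50, 55],
    [42, 42, 45],
    [36, 41, 43],
    [39, 35, 40],
    [51, 55, 59],
    [44, 49, 56],
]
n, k = len(scores), len(scores[0])
eps = greenhouse_geisser_epsilon(scores)
print(f"epsilon = {eps:.3f}; adjusted df = ({(k - 1) * eps:.2f}, {(k - 1) * (n - 1) * eps:.2f})")
# epsilon = 0.638; adjusted df = (1.28, 6.38)
```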
Effect size:
One of the most commonly reported effect size statistics for rANOVA is partial eta-squared, $η_{p}^{2}=SS_{conditions}/(SS_{conditions}+SS_{Error})$. It is also common to use the multivariate $η^{2}$ when the assumption of sphericity has been violated and the multivariate test statistic is reported. A third effect size statistic that is reported is the generalized $η^{2}$, which is comparable to $η_{p}^{2}$ in a one-way repeated measures ANOVA. It has been shown to be a better estimate of effect size with other within-subjects tests.
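For the exercise example, partial eta-squared follows directly from the two sums of squares already computed ($SS_{Time}=143.44$, $SS_{Error}=57.2$). This is a minimal sketch, and the function name is illustrative.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared: share of (effect + error) variation due to the effect."""
    return ss_effect / (ss_effect + ss_error)


# Sums of squares from the exercise example worked above.
eta_p2 = partial_eta_squared(143.44, 57.2)
print(f"partial eta^2 = {eta_p2:.2f}")  # partial eta^2 = 0.71
```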
Caution: rANOVA is not always the best statistical analysis for repeated measures designs. It is vulnerable to the effects of missing values, imputation, time points that are not equivalent across subjects, and violations of sphericity. These issues can result in sampling bias and inflated rates of Type I error. In such cases it may be better to consider a linear mixed model.
Applications
This article introduced examples of analysis of variance and covariance (ANOVA) and computer programs for planning data-collection designs and estimating power. It presented the basic statistical model setup and ANOVA models for one-factor designs, nested designs, fully replicated factorial designs, randomized-block designs, split-plot designs, repeated-measures designs, and unreplicated designs. For repeated measures ANOVA, it included examples of one-factor repeated-measures models and of two- and three-factor models with repeated measures.
This article used repeated measures ANOVA to analyze data from a pretest-posttest design. The pretest-posttest control group design (or an extension of it) is a highly prestigious experimental design. A popular analytic strategy is to subject the data from this design to a repeated measures analysis of variance (ANOVA). Unfortunately, the statistical results of this type of analysis can easily be misinterpreted, because the score model underlying the analysis is not correct. Examples from recently published articles are used to demonstrate that this statistical procedure has led to (a) incorrect statements regarding treatment effects, (b) completely redundant reanalyses of the same data, and (c) problems with respect to post hoc investigations. Two alternative strategies, gain scores and analysis of covariance, are discussed and compared.
Software (R)
Problems
1) Consider the following experiment: 10 participants engage in a lexical decision task (LDT), in which they are exposed to 4 different types of word pairs. The reaction time (RT) to click Word or Non-Word is recorded and averaged for each pair type:
- Forward-prime pairs (e.g., baby-stork)
- Backward-prime pairs (e.g., stork-baby)
- Unrelated pairs (e.g., glass-apple)
- Non-word pairs (e.g., door-blug)
Carry out the overall repeated measures ANOVA to test $H_0: \mu_f = \mu_b = \mu_u = \mu_n$
$H_a$: At least one treatment mean is different from another.
The data set is listed below:
Participant | Forward (f) | Backward (b) | Unrelated (u) | Non-word (n) | Participant Sum (P) | |
1 | 0.2 | 0.1 | 0.4 | 0.7 | 1.4 | |
2 | 0.5 | 0.3 | 0.8 | 0.9 | 2.5 | |
3 | 0.4 | 0.3 | 0.6 | 0.8 | 2.1 | $\sum F^{2}= 1.36$ |
4 | 0.4 | 0.2 | 0.8 | 0.9 | 2.3 | $\sum B^{2}=.58$ |
5 | 0.6 | 0.4 | 0.8 | 0.8 | 2.6 | $\sum U^{2}=3.82$ |
6 | 0.3 | 0.3 | 0.5 | 0.8 | 1.9 | $\sum N^{2}= 6.63$ |
7 | 0.1 | 0.1 | 0.5 | 0.7 | 1.4 | $\sum P^{2}= 40.45$ |
8 | 0.2 | 0.1 | 0.6 | 0.9 | 1.8 | $\sum X_{t}^{2}= 12.39$ |
9 | 0.3 | 0.2 | 0.4 | 0.7 | 1.6 | |
10 | 0.4 | 0.2 | 0.6 | 0.9 | 2.1 | |
$\sum$ | 3.4 | 2.2 | 6 | 8.1 | 19.7 | All = 19.7 |
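Before using these helper totals in computational formulas such as $SS_{subjects}=\sum P^{2}/k-G^{2}/N$ (with $G$ the grand total), it can help to recompute them from the raw reaction times. The sketch below does only that. It checks the table's arithmetic and does not carry out the ANOVA asked for in the problem.

```python
# Reaction times from Problem 1: one list per pair type, participants 1..10 in order.
rt = {
    "F": [0.2, 0.5, 0.4, 0.4, 0.6, 0.3, 0.1, 0.2, 0.3, 0.4],  # forward prime
    "B": [0.1, 0.3, 0.3, 0.2, 0.4, 0.3, 0.1, 0.1, 0.2, 0.2],  # backward prime
    "U": [0.4, 0.8, 0.6, 0.8, 0.8, 0.5, 0.5, 0.6, 0.4, 0.6],  # unrelated
    "N": [0.7, 0.9, 0.8, 0.9, 0.8, 0.8, 0.7, 0.9, 0.7, 0.9],  # non-word
}

column_sum_sq = {name: sum(x * x for x in vals) for name, vals in rt.items()}
# Participant sums P: add across the four pair types for each participant.
participant_sums = [sum(vals) for vals in zip(*rt.values())]

sum_p_sq = sum(p * p for p in participant_sums)  # sum of P^2
sum_x_sq = sum(column_sum_sq.values())           # sum of X^2 over all 40 scores
grand_total = sum(participant_sums)

print({name: round(v, 2) for name, v in column_sum_sq.items()})
print(round(grand_total, 2), round(sum_p_sq, 2), round(sum_x_sq, 2))
# {'F': 1.36, 'B': 0.58, 'U': 3.82, 'N': 6.63}
# 19.7 40.45 12.39
```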
2) Use these SOCR Data and apply this rANOVA protocol (R-tutorials) to showcase the group differences (NC, MCI, and AD) relative to some covariates (two or more of MMSE, CDR, Sex, Age, TBV, GMV, WMV, CSFV); you can focus on only one shape measure (e.g., SA).
References
Repeated Measures Design Wikipedia
- SOCR Home page: http://www.socr.umich.edu