Difference between revisions of "SMHS rANOVA"

From SOCR
Jump to: navigation, search
(Problems)
Line 1: Line 1:
==[[SMHS| Scientific Methods for Health Sciences]] - Repeated measures Analysis of Variance (rANOVA) ==
+
==[[SMHS| Scientific Methods for Health Sciences]] - Repeated Measures Analysis of Variance (rANOVA) ==
  
 
===Overview===
 
===Overview===
The phrase Repeated measures is used in situations when the same objects/units/entities take part in all conditions of an experiment. Given there is multiple measures on the same subject, we have to control for correlation between multiple measures on the same subject. Repeated measures ANOVA (rANOVA) is a commonly used statistical approach for analyzing repeated measure designs. It is the equivalent of the one-way ANOVA, but for related, not independent, groups. It is also referred to as within-subject ANOVA or ANOVA for correlated samples. The test is to detect any overall differences between related means. We can analyze data using repeated measures ANOVA for two types of study design. Studies that investigate either (1) changes in mean scores over three or more time points, or (2) difference in mean scores under three or more different conditions.  
+
'''Repeated Measures Analysis of Variance (rANOVA)''' is a statistical method for analyzing data where the same subjects are measured multiple times under different conditions or at different time points. Unlike independent ANOVA, rANOVA accounts for the correlation between repeated measurements on the same subject, increasing statistical power by reducing error variance attributable to individual differences. This section provides a comprehensive treatment of rANOVA, including its mathematical foundations, assumptions, modern alternatives, and practical implementation in R.
  
 
===Motivation===
 
===Motivation===
If you want to test the equality of means, ANOVA would often be a good way to go. However, when it comes to data with repeated measures, a standard ANOVA would be inappropriate because it fails to model the correlation between the repeated measures: the data violate the ANOVA assumption of independence. Repeated measures is used for several reasons: First, some research hypotheses require repeated measures. Longitudinal research, for example, measures each sample member at each of several ages. In this case, age would be a repeated factor. Second, in cases where there is a great deal of variation between sample members, error variance estimates from standard ANOVAs are large. Repeated measures of each sample member provide a way of accounting for this variance, thus reducing error variance. Third, when sample members are difficult to recruit, repeated measures designs are economical because each member is measured under all conditions. Fourth, it applies when members have been matched according to some important characteristic.  
+
Consider a clinical trial measuring blood pressure in patients at baseline, 3 months, and 6 months post-treatment. Using independent ANOVA would violate the independence assumption since measurements are correlated within subjects. rANOVA addresses this by:
 +
1. **Partitioning variance** into between-subjects and within-subjects components
 +
2. **Increasing statistical power** by reducing error variance
 +
3. **Accounting for individual differences** through subject-specific effects
 +
4. **Enabling longitudinal analysis** of change over time
  
In order to provide a demonstration of how to calculate a repeated measures ANOVA, we shall use the example of a 6-month exercise-training intervention where six subjects had their fitness level measured on three occasions: pre-, 3 months, and post-intervention. Their data is shown below along with some initial calculations:
+
For studies with missing data, complex covariance structures, or unbalanced designs, '''Linear Mixed Models (LMMs)''' provide a more flexible alternative to traditional rANOVA.
<center>
 
{| class="wikitable" style="text-align:center; width:45%" border="1"
 
|-
 
|Subjects|| Pre- ||3 Months|| 6 Months|| Subject Means
 
|-
 
|1|| 45|| 50|| 55|| 50
 
|-
 
|2|| 42|| 42|| 45|| 43
 
|-
 
|3|| 36|| 41|| 43|| 40
 
|-
 
|4|| 39 ||35|| 40|| 38
 
|-
 
|5|| 51 ||55|| 59|| 55
 
|-
 
|6 ||44|| 49 ||56|| 49.7
 
|-
 
|Monthly Means|| 42.8|| 45.3|| 49.7||
 
|-
 
|}
 
</center>
 
  
 +
===Theory===
  
<center>
+
====1) Mathematical Model====
{| class="wikitable" style="text-align:center; width:45%" border="1"
 
|-
 
|Source|| SS|| df|| MS|| F
 
|-
 
|Conditions|| $SS_{conditions}$||$(k-1)$||$MS_{conditions}$|| $\frac {MS_{conditions}}{MS_{error}}$
 
|-
 
|Subjects|| $SS_{subjects}$||$(n-1)$|| $MS_{subjects}$||$\frac{MS_{subjects}} {MS_{error}}$
 
|-
 
|Error || $SS_{error}$||$(k-1)(n-1)$||$MS_{error}$|| 
 
|-
 
|Total|| $SS_{T}$|| $(N-1)$|| ||
 
|-
 
|}
 
</center>
 
  
$F=\frac {MS_{conditions}}{MS_{error}}=\frac {MS_{time}}{MS_{error}}$
+
=====Univariate Approach=====
 +
For \(n\) subjects measured at \(k\) time points, the univariate rANOVA model is:
 +
\[
 +
y_{ij} = \mu + \pi_i + \tau_j + \varepsilon_{ij}, \quad i=1,\ldots,n,\; j=1,\ldots,k
 +
\]
 +
where:
 +
- \(y_{ij}\) is the measurement for subject \(i\) at time \(j\)
 +
- \(\mu\) is the overall mean
 +
- \(\pi_i \sim N(0, \sigma_{\pi}^2)\) is the random subject effect
 +
- \(\tau_j\) is the fixed time effect (\(\sum_{j=1}^k \tau_j = 0\))
 +
- \(\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon}^2)\) is the within-subject error
  
$SS_{Total} = SS_{conditions}+SS_{subjects}+SS_{Error}$
+
The covariance structure assumes:
 +
\[
 +
\text{Cov}(y_{ij}, y_{i'j'}) =  
 +
\begin{cases}  
 +
\sigma_{\pi}^2 + \sigma_{\varepsilon}^2 & \text{if } i=i', j=j' \\
 +
\sigma_{\pi}^2 & \text{if } i=i', j \neq j' \\
 +
0 & \text{if } i \neq i'
 +
\end{cases}
 +
\]
  
 +
=====Matrix Formulation=====
 +
Let \(\mathbf{y}_i = (y_{i1}, \ldots, y_{ik})^\top\) be the vector of measurements for subject \(i\):
 +
\[
 +
\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol{\varepsilon}_i
 +
\]
 +
where:
 +
- \(\mathbf{X}_i\) is the design matrix for fixed effects
 +
- \(\boldsymbol{\beta}\) contains fixed effects (including time effects)
 +
- \(\mathbf{Z}_i\) is the design matrix for random effects
 +
- \(\mathbf{b}_i \sim N(\mathbf{0}, \mathbf{G})\) are random subject effects
 +
- \(\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R}_i)\) is the within-subject error
  
with corresponding degrees of freedom:
+
For compound symmetry structure:
 +
\[
 +
\text{Var}(\mathbf{y}_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^\top + \mathbf{R}_i = \sigma_{\pi}^2\mathbf{J}_k + \sigma_{\varepsilon}^2\mathbf{I}_k
 +
\]
 +
where \(\mathbf{J}_k\) is a \(k \times k\) matrix of ones and \(\mathbf{I}_k\) is the identity matrix.
  
$df_{Total}=df_{conditions}+df_{subjects}+df_{Error}
+
====2) Variance Partitioning====
=(k-1)+(n-1)+((n-k)(n-1))$
 
  
 +
The total sum of squares decomposes as:
 +
\[
 +
SST = SS_{BetweenSubjects} + SS_{WithinSubjects}
 +
\]
 +
\[
 +
SS_{WithinSubjects} = SS_{Time} + SS_{Error}
 +
\]
  
$SS_{Time}$s the same as for $SS_{b}$ in an independent ANOVA, and
+
More formally:
 +
\[
 +
\begin{aligned}
 +
SST &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{\cdot\cdot})^2 \\
 +
SS_{Subjects} &= k \sum_{i=1}^n (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \\
 +
SS_{Time} &= n \sum_{j=1}^k (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 \\
 +
SS_{Error} &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2
 +
\end{aligned}
 +
\]
  
 +
Degrees of freedom:
 +
\[
 +
\begin{aligned}
 +
df_{Total} &= nk - 1 \\
 +
df_{Subjects} &= n - 1 \\
 +
df_{Time} &= k - 1 \\
 +
df_{Error} &= (n - 1)(k - 1)
 +
\end{aligned}
 +
\]
  
$SS_{Time}=SS_{b}=\sum_{i=1}^{k} n_{i} (\bar x_{i}-\bar x)^{2}=$
+
Mean squares:
 +
\[
 +
\begin{aligned}
 +
MS_{Subjects} &= SS_{Subjects} / (n-1) \\
 +
MS_{Time} &= SS_{Time} / (k-1) \\
 +
MS_{Error} &= SS_{Error} / ((n-1)(k-1))
 +
\end{aligned}
 +
\]
  
where $k$ is the number of conditions, $n_{i}$ is the number of subjects under $ith$ condition, $\bar x_{i}$ is the mean score for each $ith$ condition and $\bar x$ is the overall grand mean of all conditions.
+
F-statistic for time effect:
 +
\[
 +
F = \frac{MS_{Time}}{MS_{Error}} \sim F_{k-1, (n-1)(k-1)} \quad \text{under } H_0: \tau_1 = \cdots = \tau_k
 +
\]
  
 +
====3) Sphericity and Corrections====
  
Here, we have
+
The **sphericity assumption** (also called **circularity**) requires:
 +
\[
 +
\text{Var}(y_{ij} - y_{ij'}) = \text{constant for all } j \neq j'
 +
\]
 +
Equivalently, the covariance matrix of the difference scores has equal variances and equal covariances.
  
$SS_{Time}=SS_{b}=\sum_{i=1}^{k} n_{i} (\bar x_{i}-\bar x)^{2}=6[(42.8-45.9)^{2}+(45.3-45.9)^{2}+(49.7-45.9)^{2}]=6[9.61+0.36+14.44]=143.44.$
+
'''Mauchly's Test for Sphericity:'''
 +
\[
 +
W = \frac{| \mathbf{E} |}{(\text{tr}(\mathbf{E})/p)^{p}} \sim \chi^2_{(p(p-1)/2 - 1)}
 +
\]
 +
where \(\mathbf{E}\) is the \(p \times p\) covariance matrix of the difference scores, \(p = k-1\).
  
 +
When sphericity is violated, corrections adjust degrees of freedom:
  
The within-subject variation $SS_{w}$ is calculated as
+
1. **Greenhouse-Geisser correction**: \(\hat{\epsilon} = \frac{(\text{tr}(\mathbf{E}))^2}{(p-1)\sum_{i=1}^p \lambda_i^2}\)
$SS_{W}=\sum_{1}(x_{i1}-\bar x_{1})^{2}+\sum_{2}(x_{i2}-\bar x_{2})^{2} +⋯+\sum_{k}(x_{ik}-\bar x_{k})^{2}$
+
  \[
 +
  F_{GG} \sim F_{\hat{\epsilon}(k-1), \hat{\epsilon}(n-1)(k-1)}
 +
  \]
  
where $x_{ik}$ is the score of the $ith$ subject in group $k$.
+
2. **Huynh-Feldt correction**: \(\tilde{\epsilon} = \frac{n(k-1)\hat{\epsilon} - 2}{(k-1)[(n-1) - (k-1)\hat{\epsilon}]}\)
 +
  \[
 +
  F_{HF} \sim F_{\tilde{\epsilon}(k-1), \tilde{\epsilon}(n-1)(k-1)}
 +
  \]
  
For this example we have:
+
====4) Multivariate Approach (MANOVA)====
$SS_{W}=\sum_{1}(x_{i1}-\bar x_{1})^{2}+\sum_{2}(x_{i2}-\bar x_{2})^{2} +⋯+\sum_{k}(x_{ik}-\bar x_{k})^{2}=[(45-42.8)^2+(42-42.8)^2+⋯+(56-49.7)^2 ]=715.5$
 
  
 +
For \(k\) time points, consider the \((k-1) \times 1\) vector of difference scores \(\mathbf{d}_i = (y_{i2}-y_{i1}, \ldots, y_{ik}-y_{i,k-1})^\top\):
 +
\[
 +
\mathbf{d}_i \sim N_{k-1}(\boldsymbol{\mu}_d, \boldsymbol{\Sigma}_d)
 +
\]
 +
Test \(H_0: \boldsymbol{\mu}_d = \mathbf{0}\) using Hotelling's \(T^2\):
 +
\[
 +
T^2 = n\bar{\mathbf{d}}^\top \mathbf{S}_d^{-1} \bar{\mathbf{d}} \sim \frac{(n-1)(k-1)}{n-k+1} F_{k-1, n-k+1}
 +
\]
 +
where \(\bar{\mathbf{d}}\) is the sample mean and \(\mathbf{S}_d\) is the sample covariance of difference scores.
  
 +
===Assumptions and Diagnostics===
  
$SS_{subject}$ is calculated by: $SS_{subject}=k\sum(\bar x_{i}-\bar x )^{2}$ where $\bar x_{i}$ is the mean score of the $ith$ subject, $\bar x$ is the grand mean.
+
'''Key Assumptions:'''
 +
1. **Normality**: Residuals \(\varepsilon_{ij}\) should be normally distributed
 +
2. **Sphericity**: Equal variances of differences between all pairs of repeated measures
 +
3. **Missing Data**: Missing completely at random (MCAR) for valid inference
 +
4. **No Time × Subject Interaction**: Treatment effects are consistent across subjects
  
 +
'''Diagnostic Procedures:'''
 +
<pre>
 +
# Comprehensive rANOVA diagnostic function
 +
check_ranova_assumptions <- function(data, subject_var, time_var, response_var) {
 +
 
 +
  cat("=== REPEATED MEASURES ANOVA DIAGNOSTICS ===\n")
 +
 
 +
  # 1. Normality of residuals
 +
  # Fit a linear model ignoring the repeated measures structure first
 +
  model_lm <- lm(as.formula(paste(response_var, "~", time_var)), data = data)
 +
  residuals <- resid(model_lm)
 +
 
 +
  par(mfrow = c(2, 3))
 +
 
 +
  # Q-Q plot
 +
  qqnorm(residuals, main = "Q-Q Plot of Residuals")
 +
  qqline(residuals)
 +
 
 +
  # Shapiro-Wilk test
 +
  shapiro_test <- shapiro.test(residuals)
 +
  cat("1. Normality (Shapiro-Wilk): W =", round(shapiro_test$statistic, 4),
 +
      "p =", format.pval(shapiro_test$p.value, digits = 4), "\n")
 +
 
 +
  # 2. Sphericity test (requires ez package)
 +
  if (require(ez, quietly = TRUE)) {
 +
    # Convert to wide format for Mauchly's test
 +
    wide_data <- reshape(data,
 +
                        idvar = subject_var,
 +
                        timevar = time_var,
 +
                        direction = "wide")
 +
   
 +
    # Remove subject column
 +
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
 +
    response_matrix <- as.matrix(wide_data[, response_cols])
 +
   
 +
    # Mauchly's test using car package
 +
    if (require(car, quietly = TRUE)) {
 +
      # Create an idata frame for repeated measures
 +
      time_levels <- unique(data[[time_var]])
 +
      idata <- data.frame(time = factor(time_levels))
 +
     
 +
      # Fit linear model for repeated measures
 +
      mlm <- lm(response_matrix ~ 1)
 +
     
 +
      # Mauchly's test
 +
      mauchly_test <- Anova(mlm, idata = idata, idesign = ~time, type = "III")
 +
      mauchly_result <- mauchly_test$sphericity.tests
 +
     
 +
      cat("\n2. Mauchly's Test for Sphericity:\n")
 +
      print(mauchly_result)
 +
     
 +
      # Greenhouse-Geisser and Huynh-Feldt corrections
 +
      epsilon_GG <- mauchly_result["Greenhouse-Geisser epsilon", "GG eps"]
 +
      epsilon_HF <- mauchly_result["Huynh-Feldt epsilon", "HF eps"]
 +
      cat("\nEpsilon corrections:\n")
 +
      cat("  Greenhouse-Geisser:", round(epsilon_GG, 3), "\n")
 +
      cat("  Huynh-Feldt:", round(epsilon_HF, 3), "\n")
 +
    }
 +
  }
 +
 
 +
  # 3. Homogeneity of variance across time points
 +
  boxplot(as.formula(paste(response_var, "~", time_var)),
 +
          data = data,
 +
          main = "Boxplots by Time Point",
 +
          xlab = time_var, ylab = response_var)
 +
 
 +
  # Levene's test for homogeneity of variance
 +
  if (require(car, quietly = TRUE)) {
 +
    levene_test <- leveneTest(as.formula(paste(response_var, "~ factor(", time_var, ")")),
 +
                              data = data)
 +
    cat("\n3. Levene's Test for Homogeneity of Variance:\n")
 +
    print(levene_test)
 +
  }
 +
 
 +
  # 4. Individual profiles plot
 +
  data_sorted <- data[order(data[[subject_var]], data[[time_var]]), ]
 +
 
 +
  # Create empty plot
 +
  plot(1, type = "n",
 +
      xlim = range(as.numeric(data[[time_var]])),
 +
      ylim = range(data[[response_var]]),
 +
      xlab = time_var, ylab = response_var,
 +
      main = "Individual Profiles")
 +
 
 +
  # Plot each subject's profile
 +
  subjects <- unique(data[[subject_var]])
 +
  colors <- rainbow(length(subjects))
 +
 
 +
  for (i in seq_along(subjects)) {
 +
    subject_data <- data[data[[subject_var]] == subjects[i], ]
 +
    lines(as.numeric(subject_data[[time_var]]),
 +
          subject_data[[response_var]],
 +
          col = colors[i], type = "b", lwd = 1)
 +
  }
 +
 
 +
  # Add group mean profile
 +
  group_means <- aggregate(as.formula(paste(response_var, "~", time_var)),
 +
                          data = data, mean)
 +
  lines(as.numeric(group_means[[time_var]]),
 +
        group_means[[response_var]],
 +
        col = "black", lwd = 3, type = "b")
 +
 
 +
  legend("topright", legend = c("Individual", "Group Mean"),
 +
        col = c("gray", "black"), lwd = c(1, 3), bty = "n")
 +
 
 +
  # 5. Residuals vs fitted plot
 +
  plot(fitted(model_lm), residuals,
 +
      xlab = "Fitted Values", ylab = "Residuals",
 +
      main = "Residuals vs Fitted")
 +
  abline(h = 0, col = "red", lty = 2)
 +
 
 +
  # 6. Histogram of residuals
 +
  hist(residuals, main = "Histogram of Residuals",
 +
      xlab = "Residuals", col = "lightblue", freq = FALSE)
 +
  curve(dnorm(x, mean = mean(residuals), sd = sd(residuals)),
 +
        add = TRUE, col = "red", lwd = 2)
 +
 
 +
  par(mfrow = c(1, 1))
 +
 
 +
  # Return diagnostic summary
 +
  return(list(
 +
    shapiro_test = shapiro_test,
 +
    sphericity_tests = if(exists("mauchly_result")) mauchly_result else NULL,
 +
    homogeneity_test = if(exists("levene_test")) levene_test else NULL
 +
  ))
 +
}
 +
</pre>
  
In our example, we have $SS_{subject}=k\sum(\bar x_{i})-\bar x^{2}=3[(50-45.9)^2+(40-45.9)^2+(38-45.9)^2+(55-45.9)^2+(49.7-45.9)^2 ]=658.3.$
+
===Applications===
  
 +
====Example 1: Exercise Training Intervention====
 +
<pre>
 +
# Recreate the example from the motivation section
 +
exercise_data <- data.frame(
 +
  subject = factor(rep(1:6, each = 3)),
 +
  time = factor(rep(c("Pre", "3Months", "6Months"), 6)),
 +
  fitness = c(45, 50, 55,    # Subject 1
 +
              42, 42, 45,    # Subject 2
 +
              36, 41, 43,    # Subject 3
 +
              39, 35, 40,    # Subject 4
 +
              51, 55, 59,    # Subject 5
 +
              44, 49, 56)    # Subject 6
 +
)
  
Thus,
+
cat("=== Exercise Training Intervention Study ===\n")
 +
cat("Research question: Does fitness level change over time?\n\n")
  
$SS_{Error}= SS_{w}-SS_{subject}=715.5-658.3=57.2$
+
# Display the data
 +
cat("Dataset:\n")
 +
print(exercise_data)
  
$F=MS_{Time}⁄MS_{Error} =(SS_{Time}⁄df_{Time})(SS_{Error}⁄df_{Error})=$
+
# Summary statistics
 +
cat("\nSummary statistics by time:\n")
 +
print(aggregate(fitness ~ time, data = exercise_data,
 +
                FUN = function(x) c(mean = mean(x), sd = sd(x))))
  
$=(SS_{Time}⁄(k-1))⁄(SS_{Error}⁄(n-1)(k-1))=$
+
# Run diagnostics first
 +
cat("\n=== Model Diagnostics ===\n")
 +
diagnostics <- check_ranova_assumptions(exercise_data, "subject", "time", "fitness")
  
$(146.44⁄((3-1))(57.2⁄(5*2))=12.53$
+
# Traditional rANOVA using aov()
 +
cat("\n=== Traditional rANOVA ===\n")
 +
model_aov <- aov(fitness ~ time + Error(subject/time), data = exercise_data)
 +
print(summary(model_aov))
  
 +
# Extract F-statistic and p-value
 +
aov_summary <- summary(model_aov)
 +
F_time <- aov_summary[[2]][[1]]$"F value"[1]
 +
df_time <- aov_summary[[2]][[1]]$"Df"[1]
 +
df_error <- aov_summary[[2]][[1]]$"Df"[2]
 +
p_time <- aov_summary[[2]][[1]]$"Pr(>F)"[1]
  
We can now look up (or use a computer program) to ascertain the critical F-statistic for our F-distribution with our degrees of freedom for time $df_{Time}$ and error $df_{Error}$ and determine whether our F-statistic indicates a statistically significant result.
+
cat(sprintf("\nF(%d, %d) = %.2f, p = %.3f\n",
We report the F-statistic from a repeated measures ANOVA as:
+
            df_time, df_error, F_time, p_time))
  
 +
# Effect size calculation (Partial eta-squared)
 +
SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
 +
SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
 +
eta_sq <- SS_time / (SS_time + SS_error)
 +
cat("Partial eta-squared (effect size):", round(eta_sq, 3), "\n")
  
$F(df_{Time}, df_{Error} = F-value, p = p-value,$
+
# Post-hoc pairwise comparisons with Bonferroni correction
 +
cat("\n=== Post-hoc Pairwise Comparisons ===\n")
 +
library(emmeans)
  
which for our example would be:
+
# Fit linear model for post-hoc tests
 +
model_lm <- lm(fitness ~ time + subject, data = exercise_data)
 +
emm_time <- emmeans(model_lm, specs = pairwise ~ time, adjust = "bonferroni")
 +
print(emm_time$contrasts)
  
$F(2, 10) = 12.53, p = 0.002,$
+
# Visualization
see [http://socr.ucla.edu/htmls/dist/Fisher_Distribution.html  SOCR Java F-distribution calculator],  
+
library(ggplot2)
or  [http://www.distributome.org/V3/calc/FCalculator.html  the Distributome HTML5 F-distribution Calculator]
+
ggplot(exercise_data, aes(x = time, y = fitness, group = subject)) +
 +
  geom_line(aes(color = subject), alpha = 0.6) +
 +
  geom_point(aes(color = subject), size = 2) +
 +
  stat_summary(aes(group = 1), fun = mean, geom = "line",
 +
              color = "black", size = 1.5) +
 +
  stat_summary(fun.data = mean_se, geom = "errorbar",
 +
              width = 0.1, color = "black") +
 +
  labs(title = "Fitness Levels Over Time",
 +
      subtitle = "Individual profiles with group mean",
 +
      x = "Time Point", y = "Fitness Score") +
 +
  theme_minimal()
 +
</pre>
  
 +
====Example 2: Clinical Trial with Missing Data (Using Mixed Models)====
 +
<pre>
 +
# Simulated clinical trial data with missing values
 +
set.seed(123)
 +
n_subjects <- 30
 +
n_timepoints <- 4
  
This means we can reject the null hypothesis and accept the alternative hypothesis. As we will discuss later, there are assumptions and effect sizes we can calculate that can alter how we report the above result. However, we would otherwise report the above findings for this example exercise study as: The six-month exercise-training program had a statistically significant effect on fitness levels, $F(2, 10) = 12.53, p = .002.$
+
clinical_data <- expand.grid(
 +
  subject = factor(1:n_subjects),
 +
  time = factor(paste0("Week_", 0:3)),
 +
  treatment = factor(rep(c("Drug", "Placebo"), each = n_subjects/2))
 +
)
  
 +
# Generate data with treatment effect and time trend
 +
clinical_data$response <- with(clinical_data,
 +
  50 +  # Baseline
 +
  ifelse(treatment == "Drug", 5, 0) +  # Treatment effect
 +
  as.numeric(time) * 2 +  # Time effect (2 units per week)
 +
  ifelse(treatment == "Drug", as.numeric(time) * 1, 0) +  # Treatment × time interaction
 +
  rep(rnorm(n_subjects, 0, 3), each = n_timepoints) +  # Random intercept
 +
  rnorm(n_subjects * n_timepoints, 0, 2)  # Residual error
 +
)
  
===Theory===
+
# Introduce missing data (MCAR)
Assumption:
+
missing_indices <- sample(1:nrow(clinical_data), size = 20)
As with all statistical analyses, specific assumptions should be met to justify the use of this test. With repeated measures ANOVA, standard univariate and multivariate assumptions apply. The list of assumptions include:
+
clinical_data$response[missing_indices] <- NA
*Normality: For each level of the within subject factor, the dependent variable must have a normal distribution.
 
*Sphericity: Difference scores computed between two levels of a within-subjects factor must have the same variance for the comparison of any two levels (this assumptions only applied with more than 2 levels of independent variables).
 
*Randomness: Cases should be derived from a random sample, and different scores for each participant are independent from those of other participants.
 
*Multivariate normality: The difference scores are multivariately normally distributed in the population.
 
  
One of the greatest advantages of rANOVA, as is the case with repeated measures designs in general, is the ability to partition out variability due to individual differences. Consider the general structure of F-statistic:
+
cat("=== Clinical Trial with Repeated Measures ===\n")
 +
cat("Research question: Does the treatment affect response over time?\n\n")
  
$F=MS_{Treatment}⁄MS_{Error} =(SS_{Treatment}⁄df_{Treatment})(SS_{Error}⁄df_{Error})$
+
cat("Missing data pattern:\n")
 +
print(table(is.na(clinical_data$response)))
  
 +
# Approach 1: Linear Mixed Model (recommended for missing data)
 +
cat("\n=== Linear Mixed Model Approach ===\n")
 +
library(lme4)
 +
library(lmerTest)
  
Is a between-subject design there is an element of variance due to individual difference that is combined with the treatment and error terms:
+
# Fit mixed model with random intercepts
$SS_{Total}=SS_{Treatment}+SS_{Error},df_{Total}=n-1$
+
mixed_model <- lmer(response ~ time * treatment + (1|subject),
 +
                    data = clinical_data,  
 +
                    na.action = na.omit)
  
 +
cat("\nModel summary:\n")
 +
print(summary(mixed_model))
  
In a repeated measures design it is possible to partition subject variability from the treatment and error terms. In such a case, variability can be broken down into between-treatments variability (or within-subjects effects, excluding individual differences) and within-treatments variability. The within-treatments variability can be further partitioned into between-subjects variability (individual differences) and error (excluding the individual differences):
+
# Test fixed effects
 +
cat("\nTests of fixed effects:\n")
 +
print(anova(mixed_model))
  
 +
# Random effects variance components
 +
cat("\nVariance components:\n")
 +
print(VarCorr(mixed_model))
  
$F=MS_{conditions}⁄MS_{Error} =MS_{Time}⁄MS_{Error}$
+
# Estimated marginal means
+
if (require(emmeans)) {
$SS_{Error}=SS_{w}-SS_{subject}$
+
  cat("\n=== Estimated Marginal Means ===\n")
 +
  emm_treatment <- emmeans(mixed_model, specs = ~ treatment)
 +
  print(emm_treatment)
 +
 
 +
  emm_time <- emmeans(mixed_model, specs = ~ time)
 +
  print(emm_time)
 +
 
 +
  # Interaction plot
 +
  emm_interaction <- emmeans(mixed_model, specs = ~ treatment:time)
 +
  print(emm_interaction)
 +
 
 +
  # Plot interaction
 +
  plot_data <- as.data.frame(emm_interaction)
 +
  ggplot(plot_data, aes(x = time, y = emmean, group = treatment, color = treatment)) +
 +
    geom_line(size = 1.5) +
 +
    geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.1) +
 +
    geom_point(size = 3) +
 +
    labs(title = "Treatment × Time Interaction",
 +
        subtitle = "Estimated marginal means with 95% confidence intervals",
 +
        x = "Time Point", y = "Response") +
 +
    theme_minimal()
 +
}
  
$SS_{Total}=SS_{conditions}+SS_{Subjects}+SS_{Error}$,  
+
# Approach 2: Traditional rANOVA (for comparison)
 +
cat("\n=== Traditional rANOVA (for comparison) ===\n")
 +
# Remove subjects with missing data for traditional ANOVA
 +
complete_data <- clinical_data[complete.cases(clinical_data), ]
 +
complete_subjects <- unique(complete_data$subject)
 +
cat("Subjects with complete data:", length(complete_subjects), "\n")
  
$df_{Total}=df_{conditions}+df_{subjects}+df_{Error}=(k-1)+(n-1)+((n-k)(n-1))$.
+
if (length(complete_subjects) >= 3) {
 +
  model_aov2 <- aov(response ~ time * treatment + Error(subject/time),
 +
                    data = complete_data)
 +
  print(summary(model_aov2))
 +
} else {
 +
  cat("Insufficient complete cases for traditional rANOVA\n")
 +
}
 +
</pre>
  
 +
====Example 3: Real Dataset - CO2 Uptake in Plants====
 +
<pre>
 +
# Using built-in CO2 dataset from R
 +
data(CO2)
 +
cat("=== CO2 Uptake in Plants ===\n")
 +
cat("Research question: Does CO2 uptake change with concentration?\n\n")
  
In reference to the general structure of the F-statistic, it is clear that by partitioning out the between-subjects variability, the F-value will increase because the sum of squares error term will be smaller resulting in a smaller MSError. It is noteworthy that partitioning variability reduces degrees of freedom from the F-test, therefore the between-subjects variability must be significant enough to offset the loss in degrees of freedom. If between-subjects variability is small this process may actually reduce the F-value.
+
# Prepare data: focus on Quebec plants, type Qn1
 +
co2_data <- CO2[CO2$Type == "Quebec" & CO2$Plant == "Qn1", ]
 +
co2_data$conc_factor <- factor(co2_data$conc)
  
===F test:===
+
cat("Data structure:\n")
As with other analysis of variance tests, the rANOVA makes use of an F statistic to determine significance. Depending on the number of within-subjects factors and assumption violations, it is necessary to select the most appropriate of three tests:
+
str(co2_data)
  
*Standard Univariate ANOVA F test — This test is commonly used given only two levels of the within-subjects factor (i.e. time point 1 and time point 2). This test is not recommended given more than 2levels of the within-subjects factor because the assumption of sphericity is commonly violated in such cases.
+
# Visual inspection
*Alternative Univariate test — These tests account for violations to the assumption of sphericity, and can be used when the within-subjects factor exceeds 2 levels. The F statistic is the same as in the Standard Univariate ANOVA F test, but is associated with a more accurate p-value. This correction is done by adjusting the degrees of freedom downward for determining the critical F value. Two corrections are commonly used—The Greenhouse-Geisser correction and the Huynh-Feldt correction. The Greenhouse-Geisser correction is more conservative, but addresses a common issue of increasing variability over time in a repeated-measures design.
+
library(ggplot2)
*Multivariate Test—This test does not assume sphericity, but is also highly conservative.
+
ggplot(co2_data, aes(x = conc, y = uptake, group = 1)) +
 +
  geom_line() +
 +
  geom_point() +
 +
  labs(title = "CO2 Uptake by Concentration",
 +
      x = "CO2 Concentration (mL/L)", y = "Uptake (umol/m^2 sec)") +
 +
  theme_minimal()
  
 +
# Since we have only one plant, this is a single-subject repeated measures design
 +
# We'll use a polynomial contrast analysis
  
'''Effect size:'''
+
cat("\n=== Polynomial Trend Analysis ===\n")
One of the most commonly reported effect size statistics for rANOVA is partial eta-squared $(η_{p}^{2})$. It is also common to use the multivariate $η^{2}$ when the assumption of sphericity has been violated, and the multivariate test statistic is reported. A third effect size statistic that is reported is the generalized $η^{2}$, which is comparable to $η_{p}^{2}$ in a one-way repeated measures ANOVA. It has been shown to be a better estimate of effect size with other within-subjects tests. 
+
# Create orthogonal polynomial contrasts for concentration
 +
conc_levels <- length(unique(co2_data$conc))
 +
poly_contrasts <- contr.poly(conc_levels)
  
'''Caution:'''
+
# Apply contrasts
rANOVA is not always the best statistical analyses for repeated measure designs. It is vulnerable to effects from missing values, imputation, unequivalent time points between subjects and violations of sphericity. These issues can result in sampling bias and inflated rates of Type I error. In such cases it may be better to consider use of a linear mixed model.
+
co2_data$conc_factor <- factor(co2_data$conc)
 +
contrasts(co2_data$conc_factor) <- poly_contrasts
  
 +
# Fit model with polynomial contrasts
 +
model_poly <- lm(uptake ~ conc_factor, data = co2_data)
 +
print(summary(model_poly))
  
===Applications===
+
# Extract polynomial components
 +
anova_poly <- anova(model_poly)
 +
cat("\nPolynomial decomposition:\n")
 +
print(anova_poly)
  
[http://www.southampton.ac.uk/~cpd/anovas/datasets/index.htm  This article] introduced some examples of analysis of variance and covariance (ANOVA) and computer programs for planning data collection designs and estimating power. It introduced about the basic statistical model set up and ANOVA models with one-factor design, nested designs, fully replicated factorial designs, randomized-block designs, split-plot designs, repeated-measures designs and un-replicated designs. In the repeated measure ANOVA, it included examples of one-factor repeated-measures model, two and three factor model with repeated measures.
+
# Calculate proportion of variance explained by each polynomial component
 +
SS_total <- sum(anova_poly$`Sum Sq`)
 +
for (i in 1:nrow(anova_poly)) {
 +
  prop_var <- anova_poly$`Sum Sq`[i] / SS_total
 +
  cat(sprintf("Component %d: %.1f%% of variance\n",  
 +
              i, prop_var * 100))
 +
}
 +
</pre>
  
[http://psycnet.apa.org/journals/bul/82/4/511/ This article] used repeated measure ANOVA to analyze the data from a pretest-posttest design. The pretest-posttest control group design (or an extension of it) is a highly prestigious experimental design. A popular analytic strategy involves subjecting the data provided by this design to a repeated measures analysis of variance (ANOVA). Unfortunately, the statistical results yielded by this type of analysis can easily be misinterpreted, since the score model underlying the analysis is not correct. Examples from recently published articles are used to demonstrate that this statistical procedure has led to (a) incorrect statements regarding treatment effects, (b) completely redundant reanalyzes of the same data, and (c) problems with respect to post hoc investigations. 2 alternative strategies-gain scores and covariance-are discussed and compared. (19 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
+
===Advanced Topics===
  
 +
====1) Power Analysis for Repeated Measures====
 +
<pre>
 +
# Power analysis for repeated measures designs
 +
calculate_power_rm <- function(n, k, rho, f2, alpha = 0.05) {
 +
  # n: number of subjects
 +
  # k: number of time points
 +
  # rho: correlation between repeated measures
 +
  # f2: effect size (Cohen's f^2)
 +
 
 +
  # Degrees of freedom
 +
  df1 <- k - 1
 +
  df2 <- (n - 1) * (k - 1)
 +
 
 +
  # Non-centrality parameter
 +
  lambda <- n * k * f2
 +
 
 +
  # Critical F value
 +
  F_crit <- qf(1 - alpha, df1, df2)
 +
 
 +
  # Power
 +
  power <- 1 - pf(F_crit, df1, df2, ncp = lambda)
 +
 
 +
  return(power)
 +
}
  
===Software (R)===
+
# Example: Clinical trial with 20 subjects, 4 time points
[http://www.southampton.ac.uk/~cpd/anovas/datasets/ANOVA%20in%20R.htm  ANOVA datasets]
+
power_example <- calculate_power_rm(n = 20, k = 4, rho = 0.5, f2 = 0.25)
 +
cat("Power for repeated measures design:\n")
 +
cat("Subjects: 20, Time points: 4, Effect size f² = 0.25\n")
 +
cat("Estimated power:", round(power_example, 3), "\n")
  
[http://ww2.coastal.edu/kingw/statistics/R-tutorials/repeated.html R-tutorials] 
+
# Power curve visualization
 +
library(ggplot2)
 +
n_range <- seq(10, 50, by = 5)
 +
power_values <- sapply(n_range, function(n) calculate_power_rm(n, 4, 0.5, 0.25))
  
===Problems===
+
power_df <- data.frame(n = n_range, power = power_values)
1) A simple problem would be consider the following experiment:
+
ggplot(power_df, aes(x = n, y = power)) +
10 participants engage in an LDT, within which they are exposed to 4 different types of word pairs. RT to click Word or Non-Word recorded and averaged for each pair type.
+
  geom_line(size = 1.5, color = "blue") +
Forward-Prime pairs (e.g., baby-stork)
+
  geom_point(size = 3) +
 +
  geom_hline(yintercept = 0.8, linetype = "dashed", color = "red") +
 +
  labs(title = "Power Analysis for Repeated Measures Design",
 +
      subtitle = "k = 4 time points, ρ = 0.5, f² = 0.25",
 +
      x = "Number of Subjects", y = "Statistical Power") +
 +
  theme_minimal()
 +
</pre>
  
Backward-Prime pairs (e.g., stork-baby)
+
====2) Handling Autocorrelated Errors====
 
+
<pre>
Unrelated Pairs (e.g., glass-apple)
+
# When measurements are equally spaced in time, errors may be autocorrelated
 
+
library(nlme)
Non-Word Pairs (e.g., door-blug)
 
 
 
Do a repeated measures ANOVA test on for the overall RM ANOVA:
 
$Ho: µf = µb = µu= µn$
 
 
 
$Ha$:  At least one treatment mean is different from another.
 
 
The data set is listed as below:
 
<center>
 
{| class="wikitable" style="text-align:center; width:35%" border="1"
 
|-
 
||Part|| Forward f||Backward b|| Unrelated  u|| Non-word  n  || Sum Part.||
 
|-
 
|1||0.2||0.1||0.4||0.7||1.4||
 
|-
 
|2||0.5||0.3||0.8||0.9||2.5||
 
|-
 
|3||0.4||0.3||0.6||0.8||2.1||$\sum F^{2}= 1.36$
 
|-
 
|4||0.4||0.2||0.8||0.9||2.3||$\sum B^{2}=.58$
 
|-
 
|5||0.6||0.4||0.8||0.8||2.6||$\sum U^{2}=3.82$
 
|-
 
|6||0.3||0.3||0.5||0.8||1.9||$\sum N^{2}= 6.63$
 
|-
 
|7||0.1||0.1||0.5||0.7||1.4||$\sum P^{2}= 385$
 
|-
 
|8||0.2||0.1||0.6||0.9||1.8||$\sum X_{t}^{2}= 12.39$
 
|-
 
|9||0.3||0.2||0.4||0.7||1.6||
 
|-
 
|10||0.4||0.2||0.6||0.9||2.1||
 
|-
 
|$\sum$||3.4||2.2||6||8.1||19.7||All = 19.7
 
|}
 
</center>
 
 
 
2) use these data
 
[http://wiki.socr.umich.edu/index.php/SOCR_Data_July2009_ID_NI SOPCR Data] and apply this rANOVA protocol
 
[http://ww2.coastal.edu/kingw/statistics/R-tutorials/repeated.html R-tutorials] to showcase the group differences (NC, MCI and AD) relative to some covariates (2 or more of, MMSE CDR Sex Age TBV GMV WMV CSFV) – you can focus only on one shape-measure (e.g., SA).
 
 
 
 
 
 
 
===References===
 
 
 
[http://en.wikipedia.org/wiki/Repeated_measures_design  Repeated Measures Design Wikipedia]
 
 
 
[https://statistics.laerd.com/statistical-guides/repeated-measures-anova-statistical-guide.php  Repeated Measures ANOVA]
 
 
 
 
 
 
  
 +
# Simulate data with AR(1) errors
 +
set.seed(456)
 +
n <- 15
 +
k <- 5
 +
time_points <- 1:k
  
 +
ar_data <- data.frame(
 +
  subject = rep(1:n, each = k),
 +
  time = rep(time_points, n),
 +
  treatment = rep(rep(c("A", "B"), length.out = n), each = k)
 +
)
  
 +
# Generate data with AR(1) errors
 +
for (i in 1:n) {
 +
  ar_errors <- arima.sim(model = list(ar = 0.7), n = k, sd = 2)
 +
  idx <- ((i-1)*k + 1):(i*k)
 +
  ar_data$response[idx] <- 50 +
 +
    ifelse(ar_data$treatment[idx] == "A", 5, 0) +
 +
    ar_data$time[idx] * 2 +
 +
    ar_errors
 +
}
  
 +
# Fit model with AR(1) correlation structure
 +
model_ar1 <- lme(response ~ time * treatment,
 +
                random = ~ 1 | subject,
 +
                correlation = corAR1(form = ~ time | subject),
 +
                data = ar_data)
  
 +
cat("=== Model with AR(1) Correlation Structure ===\n")
 +
print(summary(model_ar1))
 +
cat("\nCorrelation parameter:", intervals(model_ar1)$corStruct[2], "\n")
  
 +
# Compare with compound symmetry model
 +
model_cs <- lme(response ~ time * treatment,
 +
                random = ~ 1 | subject,
 +
                correlation = corCompSymm(form = ~ 1 | subject),
 +
                data = ar_data)
  
 +
# Model comparison
 +
cat("\n=== Model Comparison ===\n")
 +
print(anova(model_ar1, model_cs))
 +
</pre>
  
 +
===Software Implementation===
  
 +
<pre>
 +
# Comprehensive rANOVA analysis function
 +
run_ranova_analysis <- function(data, subject_var, time_var, response_var,
 +
                                between_vars = NULL, method = "mixed") {
 +
 
 +
  cat("=== REPEATED MEASURES ANOVA ANALYSIS ===\n\n")
 +
  cat("Subject variable:", subject_var, "\n")
 +
  cat("Time variable:", time_var, "\n")
 +
  cat("Response variable:", response_var, "\n")
 +
  if (!is.null(between_vars)) {
 +
    cat("Between-subjects variables:", paste(between_vars, collapse = ", "), "\n")
 +
  }
 +
  cat("Method:", method, "\n\n")
 +
 
 +
  # Check for missing data
 +
  missing_count <- sum(is.na(data[[response_var]]))
 +
  cat("Missing observations:", missing_count,
 +
      sprintf("(%.1f%%)\n", missing_count/nrow(data)*100))
 +
 
 +
  if (method == "mixed") {
 +
    # Linear mixed model approach
 +
    library(lme4)
 +
    library(lmerTest)
 +
   
 +
    # Build formula
 +
    if (is.null(between_vars)) {
 +
      formula_str <- paste(response_var, "~", time_var, "+ (1|", subject_var, ")")
 +
    } else {
 +
      between_str <- paste(between_vars, collapse = " + ")
 +
      # Add interactions with time
 +
      interaction_str <- paste(paste(time_var, between_vars, sep = ":"), collapse = " + ")
 +
      formula_str <- paste(response_var, "~", time_var, "*", between_str,
 +
                          "+ (1|", subject_var, ")")
 +
    }
 +
   
 +
    cat("Fitting model:", formula_str, "\n\n")
 +
    model <- lmer(as.formula(formula_str), data = data)
 +
   
 +
    cat("--- Model Summary ---\n")
 +
    print(summary(model))
 +
   
 +
    cat("\n--- Tests of Fixed Effects ---\n")
 +
    print(anova(model))
 +
   
 +
    # Variance components
 +
    cat("\n--- Variance Components ---\n")
 +
    print(VarCorr(model))
 +
   
 +
    # Calculate intraclass correlation coefficient (ICC)
 +
    vc <- VarCorr(model)
 +
    sigma_subject <- as.data.frame(vc)$vcov[1]
 +
    sigma_residual <- as.data.frame(vc)$vcov[2]
 +
    ICC <- sigma_subject / (sigma_subject + sigma_residual)
 +
    cat("\nIntraclass Correlation Coefficient (ICC):", round(ICC, 3), "\n")
 +
   
 +
  } else if (method == "traditional") {
 +
    # Traditional rANOVA using aov()
 +
    if (!is.null(between_vars)) {
 +
      formula_str <- paste(response_var, "~",
 +
                          paste(between_vars, collapse = " + "),
 +
                          "+ Error(", subject_var, "/", time_var, ")")
 +
    } else {
 +
      formula_str <- paste(response_var, "~", time_var,
 +
                          "+ Error(", subject_var, "/", time_var, ")")
 +
    }
 +
   
 +
    cat("Fitting model:", formula_str, "\n\n")
 +
    model <- aov(as.formula(formula_str), data = data)
 +
   
 +
    cat("--- ANOVA Table ---\n")
 +
    print(summary(model))
 +
  }
 +
 
 +
  # Post-hoc tests if there's a significant time effect
 +
  cat("\n--- Post-hoc Comparisons ---\n")
 +
  if (require(emmeans)) {
 +
    if (method == "mixed") {
 +
      emm_time <- emmeans(model, specs = pairwise ~ time_var)
 +
    } else {
 +
      # For traditional ANOVA, fit a linear model for post-hoc tests
 +
      if (is.null(between_vars)) {
 +
        lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var))
 +
      } else {
 +
        lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var,
 +
                                      "+", paste(between_vars, collapse = " + ")))
 +
      }
 +
      lm_model <- lm(lm_formula, data = data)
 +
      emm_time <- emmeans(lm_model, specs = pairwise ~ time_var, adjust = "tukey")
 +
    }
 +
    print(emm_time$contrasts)
 +
  }
 +
 
 +
  # Effect sizes
 +
  cat("\n--- Effect Sizes ---\n")
 +
  if (method == "traditional") {
 +
    aov_summary <- summary(model)
 +
    if (!is.null(between_vars)) {
 +
      # This is more complex with between-subjects factors
 +
      cat("Note: Effect size calculation for factorial rANOVA requires manual computation\n")
 +
    } else {
 +
      SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
 +
      SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
 +
      eta_sq <- SS_time / (SS_time + SS_error)
 +
      cat("Partial eta-squared:", round(eta_sq, 3), "\n")
 +
    }
 +
  }
 +
 
 +
  return(model)
 +
}
  
 +
# Example usage with built-in dataset
 +
cat("\n\n=== EXAMPLE: Chick Weight Dataset ===\n")
 +
data(ChickWeight)
 +
ChickWeight$Time <- factor(ChickWeight$Time)
 +
ChickWeight$Chick <- factor(ChickWeight$Chick)
  
 +
# Run analysis
 +
result <- run_ranova_analysis(
 +
  data = ChickWeight,
 +
  subject_var = "Chick",
 +
  time_var = "Time",
 +
  response_var = "weight",
 +
  between_vars = "Diet",
 +
  method = "mixed"
 +
)
 +
</pre>
  
 +
===Common Issues and Solutions===
  
 +
####1) Missing Data Patterns
 +
<pre>
 +
# Identify missing data patterns
 +
library(mice)
  
 +
analyze_missing_patterns <- function(data, subject_var, time_var, response_var) {
 +
  # Convert to wide format
 +
  wide_data <- reshape(data,
 +
                      idvar = subject_var,
 +
                      timevar = time_var,
 +
                      direction = "wide")
 +
 
 +
  # Create pattern matrix
 +
  pattern_matrix <- md.pattern(wide_data[, -1], plot = FALSE)
 +
 
 +
  cat("Missing Data Patterns:\n")
 +
  print(pattern_matrix)
 +
 
 +
  # Visualize missingness
 +
  if (require(VIM)) {
 +
    aggr_plot <- aggr(wide_data[, -1],
 +
                      numbers = TRUE,
 +
                      sortVars = TRUE,
 +
                      labels = names(wide_data)[-1],
 +
                      cex.axis = 0.7,
 +
                      gap = 3,
 +
                      ylab = c("Missing Data Pattern", "Frequency"))
 +
  }
 +
 
 +
  # Test for MCAR using Little's test
 +
  if (require(norm2)) {
 +
    # Prepare data for Little's test
 +
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
 +
    test_data <- wide_data[, response_cols]
 +
   
 +
    # Perform Little's MCAR test
 +
    little_test <- mcar.test(test_data)
 +
    cat("\nLittle's MCAR Test:\n")
 +
    print(little_test)
 +
  }
 +
 
 +
  return(pattern_matrix)
 +
}
  
 +
# Example
 +
missing_pattern <- analyze_missing_patterns(clinical_data, "subject", "time", "response")
 +
</pre>
  
 +
####2) Violation of Sphericity
 +
<pre>
 +
handle_sphericity_violation <- function(data, subject_var, time_var, response_var) {
 +
  cat("=== Handling Sphericity Violation ===\n")
 +
 
 +
  # 1. Use Greenhouse-Geisser or Huynh-Feldt corrections
 +
  cat("\n1. Apply epsilon corrections:\n")
 +
  # (Implementation shown in check_ranova_assumptions function above)
 +
 
 +
  # 2. Use multivariate approach (MANOVA)
 +
  cat("\n2. Multivariate approach (MANOVA):\n")
 +
 
 +
  # Convert to wide format
 +
  wide_data <- reshape(data,
 +
                      idvar = subject_var,
 +
                      timevar = time_var,
 +
                      direction = "wide")
 +
 
 +
  # Extract response columns
 +
  response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
 +
  response_matrix <- as.matrix(wide_data[, response_cols])
 +
 
 +
  # Fit MANOVA
 +
  manova_result <- manova(response_matrix ~ 1)
 +
 
 +
  # Test using Pillai's trace (most robust)
 +
  manova_summary <- summary(manova_result, test = "Pillai")
 +
  print(manova_summary)
 +
 
 +
  # 3. Use linear mixed models with unstructured covariance
 +
  cat("\n3. Linear mixed model with unstructured covariance:\n")
 +
  library(nlme)
 +
 
 +
  # Fit model with unstructured covariance
 +
  model_unstructured <- lme(as.formula(paste(response_var, "~", time_var)),
 +
                          random = ~ 1 | subject_var,
 +
                          correlation = corSymm(form = ~ as.numeric(get(time_var)) | subject_var),
 +
                          data = data,
 +
                          control = lmeControl(opt = "optim"))
 +
 
 +
  print(summary(model_unstructured))
 +
 
 +
  # Compare with compound symmetry model
 +
  model_compound <- lme(as.formula(paste(response_var, "~", time_var)),
 +
                      random = ~ 1 | subject_var,
 +
                      correlation = corCompSymm(form = ~ 1 | subject_var),
 +
                      data = data)
 +
 
 +
  cat("\nModel comparison (unstructured vs. compound symmetry):\n")
 +
  print(anova(model_unstructured, model_compound))
 +
}
 +
</pre>
  
 +
===Problems and Exercises===
  
 +
1. **Conceptual Problems**:
 +
  a) Derive the expected mean squares for the rANOVA model
 +
  b) Prove that the F-test for time effects in rANOVA follows an F-distribution under the null hypothesis
 +
  c) Show how the intraclass correlation coefficient relates to the variance components in the mixed model
  
 +
2. **Applied Problems**:
 +
  a) Analyze the [SOCR LDT dataset](http://wiki.socr.umich.edu/index.php/SOCR_Data_072108_LDT) using rANOVA
 +
  b) Conduct a repeated measures analysis of the [ChickWeight dataset](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ChickWeight.html) with diet as a between-subjects factor
 +
  c) Perform power analysis for a planned longitudinal study with 5 time points, expecting a medium effect (f² = 0.15), and correlation ρ = 0.6
  
 +
3. **Simulation Study**:
 +
<pre>
 +
# Simulation to compare traditional rANOVA and linear mixed models
 +
simulate_rm_comparison <- function(n_sim = 500, n = 20, k = 4, missing_rate = 0.1) {
 +
 
 +
  type1_error_aov <- numeric(n_sim)
 +
  type1_error_lmm <- numeric(n_sim)
 +
 
 +
  for (sim in 1:n_sim) {
 +
    # Simulate data under null hypothesis (no time effect)
 +
    data <- expand.grid(
 +
      subject = factor(1:n),
 +
      time = factor(1:k)
 +
    )
 +
   
 +
    # Random intercept model
 +
    subject_effects <- rnorm(n, 0, 5)
 +
    data$response <- rep(subject_effects, each = k) + rnorm(n * k, 0, 2)
 +
   
 +
    # Introduce missing data
 +
    if (missing_rate > 0) {
 +
      missing_indices <- sample(1:nrow(data), size = round(nrow(data) * missing_rate))
 +
      data$response[missing_indices] <- NA
 +
    }
 +
   
 +
    # Method 1: Traditional rANOVA (complete cases only)
 +
    complete_data <- data[complete.cases(data), ]
 +
    complete_subjects <- length(unique(complete_data$subject))
 +
   
 +
    if (complete_subjects >= 3) {
 +
      try({
 +
        model_aov <- aov(response ~ time + Error(subject/time), data = complete_data)
 +
        aov_summary <- summary(model_aov)
 +
        p_aov <- aov_summary[[2]][[1]]$"Pr(>F)"[1]
 +
        type1_error_aov[sim] <- p_aov < 0.05
 +
      }, silent = TRUE)
 +
    }
 +
   
 +
    # Method 2: Linear mixed model (uses all available data)
 +
    try({
 +
      library(lmerTest)
 +
      model_lmm <- lmer(response ~ time + (1|subject), data = data,
 +
                        na.action = na.omit)
 +
      p_lmm <- anova(model_lmm)$"Pr(>F)"[1]
 +
      type1_error_lmm[sim] <- p_lmm < 0.05
 +
    }, silent = TRUE)
 +
  }
 +
 
 +
  cat("Simulation Results (n =", n_sim, "):\n")
 +
  cat("Type I error rate - Traditional rANOVA:",
 +
      mean(type1_error_aov, na.rm = TRUE), "\n")
 +
  cat("Type I error rate - Linear Mixed Model:",
 +
      mean(type1_error_lmm, na.rm = TRUE), "\n")
 +
 
 +
  return(data.frame(
 +
    simulation = 1:n_sim,
 +
    aov_error = type1_error_aov,
 +
    lmm_error = type1_error_lmm
 +
  ))
 +
}
  
 +
# Run simulation
 +
sim_results <- simulate_rm_comparison(n_sim = 200, n = 15, k = 3, missing_rate = 0.1)
 +
</pre>
  
 +
===References===
  
 +
1. Maxwell, S. E., & Delaney, H. D. (2004). *Designing Experiments and Analyzing Data: A Model Comparison Perspective* (2nd ed.). Psychology Press.
 +
2. Pinheiro, J. C., & Bates, D. M. (2000). *Mixed-Effects Models in S and S-PLUS*. Springer.
 +
3. Verbeke, G., & Molenberghs, G. (2000). *Linear Mixed Models for Longitudinal Data*. Springer.
 +
4. Field, A., Miles, J., & Field, Z. (2012). *Discovering Statistics Using R*. Sage Publications.
 +
5. Gueorguieva, R., & Krystal, J. H. (2004). Move over ANOVA: Progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry. *Archives of General Psychiatry, 61*(3), 310-317.
  
 +
####Online Resources####
 +
* [https://cran.r-project.org/web/views/RepeatedMeasures.html Repeated Measures Analysis in R]
 +
* [https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf Linear Mixed Models with lme4]
 +
* [https://www.socr.umich.edu SOCR Resources and Datasets]
 +
* [https://stats.idre.ucla.edu/r/seminars/repeated-measures-analysis-with-r/ UCLA R Seminar on Repeated Measures]
  
 
<hr>
 
<hr>
* SOCR Home page: http://www.socr.umich.edu
+
* SOCR Home page: https://www.socr.umich.edu
  
 
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_rANOVA}}
 
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_rANOVA}}

Revision as of 12:23, 9 December 2025

Scientific Methods for Health Sciences - Repeated Measures Analysis of Variance (rANOVA)

Overview

Repeated Measures Analysis of Variance (rANOVA) is a statistical method for analyzing data where the same subjects are measured multiple times under different conditions or at different time points. Unlike independent ANOVA, rANOVA accounts for the correlation between repeated measurements on the same subject, increasing statistical power by reducing error variance attributable to individual differences. This section provides a comprehensive treatment of rANOVA, including its mathematical foundations, assumptions, modern alternatives, and practical implementation in R.

Motivation

Consider a clinical trial measuring blood pressure in patients at baseline, 3 months, and 6 months post-treatment. Using independent ANOVA would violate the independence assumption since measurements are correlated within subjects. rANOVA addresses this by: 1. **Partitioning variance** into between-subjects and within-subjects components 2. **Increasing statistical power** by reducing error variance 3. **Accounting for individual differences** through subject-specific effects 4. **Enabling longitudinal analysis** of change over time

For studies with missing data, complex covariance structures, or unbalanced designs, Linear Mixed Models (LMMs) provide a more flexible alternative to traditional rANOVA.

Theory

1) Mathematical Model

Univariate Approach

For \(n\) subjects measured at \(k\) time points, the univariate rANOVA model is: \[ y_{ij} = \mu + \pi_i + \tau_j + \varepsilon_{ij}, \quad i=1,\ldots,n,\; j=1,\ldots,k \] where: - \(y_{ij}\) is the measurement for subject \(i\) at time \(j\) - \(\mu\) is the overall mean - \(\pi_i \sim N(0, \sigma_{\pi}^2)\) is the random subject effect - \(\tau_j\) is the fixed time effect (\(\sum_{j=1}^k \tau_j = 0\)) - \(\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon}^2)\) is the within-subject error

The covariance structure assumes: \[ \text{Cov}(y_{ij}, y_{i'j'}) = \begin{cases} \sigma_{\pi}^2 + \sigma_{\varepsilon}^2 & \text{if } i=i', j=j' \\ \sigma_{\pi}^2 & \text{if } i=i', j \neq j' \\ 0 & \text{if } i \neq i' \end{cases} \]

Matrix Formulation

Let \(\mathbf{y}_i = (y_{i1}, \ldots, y_{ik})^\top\) be the vector of measurements for subject \(i\): \[ \mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol{\varepsilon}_i \] where: - \(\mathbf{X}_i\) is the design matrix for fixed effects - \(\boldsymbol{\beta}\) contains fixed effects (including time effects) - \(\mathbf{Z}_i\) is the design matrix for random effects - \(\mathbf{b}_i \sim N(\mathbf{0}, \mathbf{G})\) are random subject effects - \(\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R}_i)\) is the within-subject error

For compound symmetry structure: \[ \text{Var}(\mathbf{y}_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^\top + \mathbf{R}_i = \sigma_{\pi}^2\mathbf{J}_k + \sigma_{\varepsilon}^2\mathbf{I}_k \] where \(\mathbf{J}_k\) is a \(k \times k\) matrix of ones and \(\mathbf{I}_k\) is the identity matrix.

2) Variance Partitioning

The total sum of squares decomposes as: \[ SST = SS_{BetweenSubjects} + SS_{WithinSubjects} \] \[ SS_{WithinSubjects} = SS_{Time} + SS_{Error} \]

More formally: \[ \begin{aligned} SST &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Subjects} &= k \sum_{i=1}^n (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Time} &= n \sum_{j=1}^k (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Error} &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2 \end{aligned} \]

Degrees of freedom: \[ \begin{aligned} df_{Total} &= nk - 1 \\ df_{Subjects} &= n - 1 \\ df_{Time} &= k - 1 \\ df_{Error} &= (n - 1)(k - 1) \end{aligned} \]

Mean squares: \[ \begin{aligned} MS_{Subjects} &= SS_{Subjects} / (n-1) \\ MS_{Time} &= SS_{Time} / (k-1) \\ MS_{Error} &= SS_{Error} / ((n-1)(k-1)) \end{aligned} \]

F-statistic for time effect: \[ F = \frac{MS_{Time}}{MS_{Error}} \sim F_{k-1, (n-1)(k-1)} \quad \text{under } H_0: \tau_1 = \cdots = \tau_k \]

3) Sphericity and Corrections

The **sphericity assumption** (also called **circularity**) requires: \[ \text{Var}(y_{ij} - y_{ij'}) = \text{constant for all } j \neq j' \] Equivalently, the covariance matrix of the difference scores has equal variances and equal covariances.

Mauchly's Test for Sphericity: \[ W = \frac{| \mathbf{E} |}{(\text{tr}(\mathbf{E})/p)^{p}} \sim \chi^2_{(p(p-1)/2 - 1)} \] where \(\mathbf{E}\) is the \(p \times p\) covariance matrix of the difference scores, \(p = k-1\).

When sphericity is violated, corrections adjust degrees of freedom:

1. **Greenhouse-Geisser correction**: \(\hat{\epsilon} = \frac{(\text{tr}(\mathbf{E}))^2}{(p-1)\sum_{i=1}^p \lambda_i^2}\)

  \[
   F_{GG} \sim F_{\hat{\epsilon}(k-1), \hat{\epsilon}(n-1)(k-1)}
   \]

2. **Huynh-Feldt correction**: \(\tilde{\epsilon} = \frac{n(k-1)\hat{\epsilon} - 2}{(k-1)[(n-1) - (k-1)\hat{\epsilon}]}\)

  \[
   F_{HF} \sim F_{\tilde{\epsilon}(k-1), \tilde{\epsilon}(n-1)(k-1)}
   \]

4) Multivariate Approach (MANOVA)

For \(k\) time points, consider the \((k-1) \times 1\) vector of difference scores \(\mathbf{d}_i = (y_{i2}-y_{i1}, \ldots, y_{ik}-y_{i,k-1})^\top\): \[ \mathbf{d}_i \sim N_{k-1}(\boldsymbol{\mu}_d, \boldsymbol{\Sigma}_d) \] Test \(H_0: \boldsymbol{\mu}_d = \mathbf{0}\) using Hotelling's \(T^2\): \[ T^2 = n\bar{\mathbf{d}}^\top \mathbf{S}_d^{-1} \bar{\mathbf{d}} \sim \frac{(n-1)(k-1)}{n-k+1} F_{k-1, n-k+1} \] where \(\bar{\mathbf{d}}\) is the sample mean and \(\mathbf{S}_d\) is the sample covariance of difference scores.

Assumptions and Diagnostics

Key Assumptions: 1. **Normality**: Residuals \(\varepsilon_{ij}\) should be normally distributed 2. **Sphericity**: Equal variances of differences between all pairs of repeated measures 3. **Missing Data**: Missing completely at random (MCAR) for valid inference 4. **No Time × Subject Interaction**: Treatment effects are consistent across subjects

Diagnostic Procedures:

# Comprehensive rANOVA diagnostic function
check_ranova_assumptions <- function(data, subject_var, time_var, response_var) {
  
  cat("=== REPEATED MEASURES ANOVA DIAGNOSTICS ===\n")
  
  # 1. Normality of residuals
  # Fit a linear model ignoring the repeated measures structure first
  model_lm <- lm(as.formula(paste(response_var, "~", time_var)), data = data)
  residuals <- resid(model_lm)
  
  par(mfrow = c(2, 3))
  
  # Q-Q plot
  qqnorm(residuals, main = "Q-Q Plot of Residuals")
  qqline(residuals)
  
  # Shapiro-Wilk test
  shapiro_test <- shapiro.test(residuals)
  cat("1. Normality (Shapiro-Wilk): W =", round(shapiro_test$statistic, 4),
      "p =", format.pval(shapiro_test$p.value, digits = 4), "\n")
  
  # 2. Sphericity test (requires ez package)
  if (require(ez, quietly = TRUE)) {
    # Convert to wide format for Mauchly's test
    wide_data <- reshape(data, 
                         idvar = subject_var, 
                         timevar = time_var, 
                         direction = "wide")
    
    # Remove subject column
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
    response_matrix <- as.matrix(wide_data[, response_cols])
    
    # Mauchly's test using car package
    if (require(car, quietly = TRUE)) {
      # Create an idata frame for repeated measures
      time_levels <- unique(data[[time_var]])
      idata <- data.frame(time = factor(time_levels))
      
      # Fit linear model for repeated measures
      mlm <- lm(response_matrix ~ 1)
      
      # Mauchly's test
      mauchly_test <- Anova(mlm, idata = idata, idesign = ~time, type = "III")
      mauchly_result <- mauchly_test$sphericity.tests
      
      cat("\n2. Mauchly's Test for Sphericity:\n")
      print(mauchly_result)
      
      # Greenhouse-Geisser and Huynh-Feldt corrections
      epsilon_GG <- mauchly_result["Greenhouse-Geisser epsilon", "GG eps"]
      epsilon_HF <- mauchly_result["Huynh-Feldt epsilon", "HF eps"]
      cat("\nEpsilon corrections:\n")
      cat("  Greenhouse-Geisser:", round(epsilon_GG, 3), "\n")
      cat("  Huynh-Feldt:", round(epsilon_HF, 3), "\n")
    }
  }
  
  # 3. Homogeneity of variance across time points
  boxplot(as.formula(paste(response_var, "~", time_var)), 
          data = data,
          main = "Boxplots by Time Point",
          xlab = time_var, ylab = response_var)
  
  # Levene's test for homogeneity of variance
  if (require(car, quietly = TRUE)) {
    levene_test <- leveneTest(as.formula(paste(response_var, "~ factor(", time_var, ")")), 
                              data = data)
    cat("\n3. Levene's Test for Homogeneity of Variance:\n")
    print(levene_test)
  }
  
  # 4. Individual profiles plot
  data_sorted <- data[order(data[[subject_var]], data[[time_var]]), ]
  
  # Create empty plot
  plot(1, type = "n", 
       xlim = range(as.numeric(data[[time_var]])), 
       ylim = range(data[[response_var]]),
       xlab = time_var, ylab = response_var,
       main = "Individual Profiles")
  
  # Plot each subject's profile
  subjects <- unique(data[[subject_var]])
  colors <- rainbow(length(subjects))
  
  for (i in seq_along(subjects)) {
    subject_data <- data[data[[subject_var]] == subjects[i], ]
    lines(as.numeric(subject_data[[time_var]]), 
          subject_data[[response_var]], 
          col = colors[i], type = "b", lwd = 1)
  }
  
  # Add group mean profile
  group_means <- aggregate(as.formula(paste(response_var, "~", time_var)), 
                          data = data, mean)
  lines(as.numeric(group_means[[time_var]]), 
        group_means[[response_var]], 
        col = "black", lwd = 3, type = "b")
  
  legend("topright", legend = c("Individual", "Group Mean"), 
         col = c("gray", "black"), lwd = c(1, 3), bty = "n")
  
  # 5. Residuals vs fitted plot
  plot(fitted(model_lm), residuals,
       xlab = "Fitted Values", ylab = "Residuals",
       main = "Residuals vs Fitted")
  abline(h = 0, col = "red", lty = 2)
  
  # 6. Histogram of residuals
  hist(residuals, main = "Histogram of Residuals",
       xlab = "Residuals", col = "lightblue", freq = FALSE)
  curve(dnorm(x, mean = mean(residuals), sd = sd(residuals)), 
        add = TRUE, col = "red", lwd = 2)
  
  par(mfrow = c(1, 1))
  
  # Return diagnostic summary
  return(list(
    shapiro_test = shapiro_test,
    sphericity_tests = if(exists("mauchly_result")) mauchly_result else NULL,
    homogeneity_test = if(exists("levene_test")) levene_test else NULL
  ))
}

Applications

Example 1: Exercise Training Intervention

# Recreate the example from the motivation section
exercise_data <- data.frame(
  subject = factor(rep(1:6, each = 3)),
  time = factor(rep(c("Pre", "3Months", "6Months"), 6)),
  fitness = c(45, 50, 55,    # Subject 1
              42, 42, 45,    # Subject 2
              36, 41, 43,    # Subject 3
              39, 35, 40,    # Subject 4
              51, 55, 59,    # Subject 5
              44, 49, 56)    # Subject 6
)

cat("=== Exercise Training Intervention Study ===\n")
cat("Research question: Does fitness level change over time?\n\n")

# Display the data
cat("Dataset:\n")
print(exercise_data)

# Summary statistics
cat("\nSummary statistics by time:\n")
print(aggregate(fitness ~ time, data = exercise_data, 
                FUN = function(x) c(mean = mean(x), sd = sd(x))))

# Run diagnostics first
cat("\n=== Model Diagnostics ===\n")
diagnostics <- check_ranova_assumptions(exercise_data, "subject", "time", "fitness")

# Traditional rANOVA using aov()
cat("\n=== Traditional rANOVA ===\n")
model_aov <- aov(fitness ~ time + Error(subject/time), data = exercise_data)
print(summary(model_aov))

# Extract F-statistic and p-value
aov_summary <- summary(model_aov)
F_time <- aov_summary[[2]][[1]]$"F value"[1]
df_time <- aov_summary[[2]][[1]]$"Df"[1]
df_error <- aov_summary[[2]][[1]]$"Df"[2]
p_time <- aov_summary[[2]][[1]]$"Pr(>F)"[1]

cat(sprintf("\nF(%d, %d) = %.2f, p = %.3f\n", 
            df_time, df_error, F_time, p_time))

# Effect size calculation (Partial eta-squared)
SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
eta_sq <- SS_time / (SS_time + SS_error)
cat("Partial eta-squared (effect size):", round(eta_sq, 3), "\n")

# Post-hoc pairwise comparisons with Bonferroni correction
cat("\n=== Post-hoc Pairwise Comparisons ===\n")
library(emmeans)

# Fit linear model for post-hoc tests
model_lm <- lm(fitness ~ time + subject, data = exercise_data)
emm_time <- emmeans(model_lm, specs = pairwise ~ time, adjust = "bonferroni")
print(emm_time$contrasts)

# Visualization
library(ggplot2)
ggplot(exercise_data, aes(x = time, y = fitness, group = subject)) +
  geom_line(aes(color = subject), alpha = 0.6) +
  geom_point(aes(color = subject), size = 2) +
  stat_summary(aes(group = 1), fun = mean, geom = "line", 
               color = "black", size = 1.5) +
  stat_summary(fun.data = mean_se, geom = "errorbar", 
               width = 0.1, color = "black") +
  labs(title = "Fitness Levels Over Time",
       subtitle = "Individual profiles with group mean",
       x = "Time Point", y = "Fitness Score") +
  theme_minimal()

Example 2: Clinical Trial with Missing Data (Using Mixed Models)

# Simulated clinical trial data with missing values
set.seed(123)
n_subjects <- 30
n_timepoints <- 4

clinical_data <- expand.grid(
  subject = factor(1:n_subjects),
  time = factor(paste0("Week_", 0:3)),
  treatment = factor(rep(c("Drug", "Placebo"), each = n_subjects/2))
)

# Generate data with treatment effect and time trend
clinical_data$response <- with(clinical_data, 
  50 +  # Baseline
  ifelse(treatment == "Drug", 5, 0) +  # Treatment effect
  as.numeric(time) * 2 +  # Time effect (2 units per week)
  ifelse(treatment == "Drug", as.numeric(time) * 1, 0) +  # Treatment × time interaction
  rep(rnorm(n_subjects, 0, 3), each = n_timepoints) +  # Random intercept
  rnorm(n_subjects * n_timepoints, 0, 2)  # Residual error
)

# Introduce missing data (MCAR)
missing_indices <- sample(1:nrow(clinical_data), size = 20)
clinical_data$response[missing_indices] <- NA

cat("=== Clinical Trial with Repeated Measures ===\n")
cat("Research question: Does the treatment affect response over time?\n\n")

cat("Missing data pattern:\n")
print(table(is.na(clinical_data$response)))

# Approach 1: Linear Mixed Model (recommended for missing data)
cat("\n=== Linear Mixed Model Approach ===\n")
library(lme4)
library(lmerTest)

# Fit mixed model with random intercepts
mixed_model <- lmer(response ~ time * treatment + (1|subject), 
                    data = clinical_data, 
                    na.action = na.omit)

cat("\nModel summary:\n")
print(summary(mixed_model))

# Test fixed effects
cat("\nTests of fixed effects:\n")
print(anova(mixed_model))

# Random effects variance components
cat("\nVariance components:\n")
print(VarCorr(mixed_model))

# Estimated marginal means
if (require(emmeans)) {
  cat("\n=== Estimated Marginal Means ===\n")
  emm_treatment <- emmeans(mixed_model, specs = ~ treatment)
  print(emm_treatment)
  
  emm_time <- emmeans(mixed_model, specs = ~ time)
  print(emm_time)
  
  # Interaction plot
  emm_interaction <- emmeans(mixed_model, specs = ~ treatment:time)
  print(emm_interaction)
  
  # Plot interaction
  plot_data <- as.data.frame(emm_interaction)
  ggplot(plot_data, aes(x = time, y = emmean, group = treatment, color = treatment)) +
    geom_line(size = 1.5) +
    geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.1) +
    geom_point(size = 3) +
    labs(title = "Treatment × Time Interaction",
         subtitle = "Estimated marginal means with 95% confidence intervals",
         x = "Time Point", y = "Response") +
    theme_minimal()
}

# Approach 2: Traditional rANOVA (for comparison)
cat("\n=== Traditional rANOVA (for comparison) ===\n")
# Remove subjects with missing data for traditional ANOVA
complete_data <- clinical_data[complete.cases(clinical_data), ]
complete_subjects <- unique(complete_data$subject)
cat("Subjects with complete data:", length(complete_subjects), "\n")

if (length(complete_subjects) >= 3) {
  model_aov2 <- aov(response ~ time * treatment + Error(subject/time), 
                    data = complete_data)
  print(summary(model_aov2))
} else {
  cat("Insufficient complete cases for traditional rANOVA\n")
}

Example 3: Real Dataset - CO2 Uptake in Plants

# Using built-in CO2 dataset from R
data(CO2)
cat("=== CO2 Uptake in Plants ===\n")
cat("Research question: Does CO2 uptake change with concentration?\n\n")

# Prepare data: focus on Quebec plants, type Qn1
co2_data <- CO2[CO2$Type == "Quebec" & CO2$Plant == "Qn1", ]
co2_data$conc_factor <- factor(co2_data$conc)

cat("Data structure:\n")
str(co2_data)

# Visual inspection
library(ggplot2)
ggplot(co2_data, aes(x = conc, y = uptake, group = 1)) +
  geom_line() +
  geom_point() +
  labs(title = "CO2 Uptake by Concentration",
       x = "CO2 Concentration (mL/L)", y = "Uptake (umol/m^2 sec)") +
  theme_minimal()

# Since we have only one plant, this is a single-subject repeated measures design
# We'll use a polynomial contrast analysis

cat("\n=== Polynomial Trend Analysis ===\n")
# Create orthogonal polynomial contrasts for concentration
conc_levels <- length(unique(co2_data$conc))
poly_contrasts <- contr.poly(conc_levels)

# Apply contrasts
co2_data$conc_factor <- factor(co2_data$conc)
contrasts(co2_data$conc_factor) <- poly_contrasts

# Fit model with polynomial contrasts
model_poly <- lm(uptake ~ conc_factor, data = co2_data)
print(summary(model_poly))

# Extract polynomial components
anova_poly <- anova(model_poly)
cat("\nPolynomial decomposition:\n")
print(anova_poly)

# Calculate proportion of variance explained by each polynomial component
SS_total <- sum(anova_poly$`Sum Sq`)
for (i in 1:nrow(anova_poly)) {
  prop_var <- anova_poly$`Sum Sq`[i] / SS_total
  cat(sprintf("Component %d: %.1f%% of variance\n", 
              i, prop_var * 100))
}

Advanced Topics

1) Power Analysis for Repeated Measures

# Power analysis for repeated measures designs
calculate_power_rm <- function(n, k, rho, f2, alpha = 0.05) {
  # n: number of subjects
  # k: number of time points
  # rho: correlation between repeated measures
  # f2: effect size (Cohen's f^2)
  
  # Degrees of freedom
  df1 <- k - 1
  df2 <- (n - 1) * (k - 1)
  
  # Non-centrality parameter
  lambda <- n * k * f2
  
  # Critical F value
  F_crit <- qf(1 - alpha, df1, df2)
  
  # Power
  power <- 1 - pf(F_crit, df1, df2, ncp = lambda)
  
  return(power)
}

# Example: Clinical trial with 20 subjects, 4 time points
power_example <- calculate_power_rm(n = 20, k = 4, rho = 0.5, f2 = 0.25)
cat("Power for repeated measures design:\n")
cat("Subjects: 20, Time points: 4, Effect size f² = 0.25\n")
cat("Estimated power:", round(power_example, 3), "\n")

# Power curve visualization
library(ggplot2)
n_range <- seq(10, 50, by = 5)
power_values <- sapply(n_range, function(n) calculate_power_rm(n, 4, 0.5, 0.25))

power_df <- data.frame(n = n_range, power = power_values)
ggplot(power_df, aes(x = n, y = power)) +
  geom_line(size = 1.5, color = "blue") +
  geom_point(size = 3) +
  geom_hline(yintercept = 0.8, linetype = "dashed", color = "red") +
  labs(title = "Power Analysis for Repeated Measures Design",
       subtitle = "k = 4 time points, ρ = 0.5, f² = 0.25",
       x = "Number of Subjects", y = "Statistical Power") +
  theme_minimal()

2) Handling Autocorrelated Errors

# When measurements are equally spaced in time, errors may be autocorrelated
library(nlme)

# Simulate data with AR(1) errors
set.seed(456)
n <- 15
k <- 5
time_points <- 1:k

ar_data <- data.frame(
  subject = rep(1:n, each = k),
  time = rep(time_points, n),
  treatment = rep(rep(c("A", "B"), length.out = n), each = k)
)

# Generate data with AR(1) errors
for (i in 1:n) {
  ar_errors <- arima.sim(model = list(ar = 0.7), n = k, sd = 2)
  idx <- ((i-1)*k + 1):(i*k)
  ar_data$response[idx] <- 50 + 
    ifelse(ar_data$treatment[idx] == "A", 5, 0) +
    ar_data$time[idx] * 2 +
    ar_errors
}

# Fit model with AR(1) correlation structure
model_ar1 <- lme(response ~ time * treatment, 
                 random = ~ 1 | subject,
                 correlation = corAR1(form = ~ time | subject),
                 data = ar_data)

cat("=== Model with AR(1) Correlation Structure ===\n")
print(summary(model_ar1))
cat("\nCorrelation parameter:", intervals(model_ar1)$corStruct[2], "\n")

# Compare with compound symmetry model
model_cs <- lme(response ~ time * treatment, 
                random = ~ 1 | subject,
                correlation = corCompSymm(form = ~ 1 | subject),
                data = ar_data)

# Model comparison
cat("\n=== Model Comparison ===\n")
print(anova(model_ar1, model_cs))

Software Implementation

# Comprehensive rANOVA analysis function
run_ranova_analysis <- function(data, subject_var, time_var, response_var, 
                                between_vars = NULL, method = "mixed") {
  
  cat("=== REPEATED MEASURES ANOVA ANALYSIS ===\n\n")
  cat("Subject variable:", subject_var, "\n")
  cat("Time variable:", time_var, "\n")
  cat("Response variable:", response_var, "\n")
  if (!is.null(between_vars)) {
    cat("Between-subjects variables:", paste(between_vars, collapse = ", "), "\n")
  }
  cat("Method:", method, "\n\n")
  
  # Check for missing data
  missing_count <- sum(is.na(data[[response_var]]))
  cat("Missing observations:", missing_count, 
      sprintf("(%.1f%%)\n", missing_count/nrow(data)*100))
  
  if (method == "mixed") {
    # Linear mixed model approach
    library(lme4)
    library(lmerTest)
    
    # Build formula
    if (is.null(between_vars)) {
      formula_str <- paste(response_var, "~", time_var, "+ (1|", subject_var, ")")
    } else {
      between_str <- paste(between_vars, collapse = " + ")
      # Add interactions with time
      interaction_str <- paste(paste(time_var, between_vars, sep = ":"), collapse = " + ")
      formula_str <- paste(response_var, "~", time_var, "*", between_str, 
                          "+ (1|", subject_var, ")")
    }
    
    cat("Fitting model:", formula_str, "\n\n")
    model <- lmer(as.formula(formula_str), data = data)
    
    cat("--- Model Summary ---\n")
    print(summary(model))
    
    cat("\n--- Tests of Fixed Effects ---\n")
    print(anova(model))
    
    # Variance components
    cat("\n--- Variance Components ---\n")
    print(VarCorr(model))
    
    # Calculate intraclass correlation coefficient (ICC)
    vc <- VarCorr(model)
    sigma_subject <- as.data.frame(vc)$vcov[1]
    sigma_residual <- as.data.frame(vc)$vcov[2]
    ICC <- sigma_subject / (sigma_subject + sigma_residual)
    cat("\nIntraclass Correlation Coefficient (ICC):", round(ICC, 3), "\n")
    
  } else if (method == "traditional") {
    # Traditional rANOVA using aov()
    if (!is.null(between_vars)) {
      formula_str <- paste(response_var, "~", 
                          paste(between_vars, collapse = " + "), 
                          "+ Error(", subject_var, "/", time_var, ")")
    } else {
      formula_str <- paste(response_var, "~", time_var, 
                          "+ Error(", subject_var, "/", time_var, ")")
    }
    
    cat("Fitting model:", formula_str, "\n\n")
    model <- aov(as.formula(formula_str), data = data)
    
    cat("--- ANOVA Table ---\n")
    print(summary(model))
  }
  
  # Post-hoc tests if there's a significant time effect
  cat("\n--- Post-hoc Comparisons ---\n")
  if (require(emmeans)) {
    if (method == "mixed") {
      emm_time <- emmeans(model, specs = pairwise ~ time_var)
    } else {
      # For traditional ANOVA, fit a linear model for post-hoc tests
      if (is.null(between_vars)) {
        lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var))
      } else {
        lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var,
                                      "+", paste(between_vars, collapse = " + ")))
      }
      lm_model <- lm(lm_formula, data = data)
      emm_time <- emmeans(lm_model, specs = pairwise ~ time_var, adjust = "tukey")
    }
    print(emm_time$contrasts)
  }
  
  # Effect sizes
  cat("\n--- Effect Sizes ---\n")
  if (method == "traditional") {
    aov_summary <- summary(model)
    if (!is.null(between_vars)) {
      # This is more complex with between-subjects factors
      cat("Note: Effect size calculation for factorial rANOVA requires manual computation\n")
    } else {
      SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
      SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
      eta_sq <- SS_time / (SS_time + SS_error)
      cat("Partial eta-squared:", round(eta_sq, 3), "\n")
    }
  }
  
  return(model)
}

# Example usage with built-in dataset
cat("\n\n=== EXAMPLE: Chick Weight Dataset ===\n")
data(ChickWeight)
ChickWeight$Time <- factor(ChickWeight$Time)
ChickWeight$Chick <- factor(ChickWeight$Chick)

# Run analysis
result <- run_ranova_analysis(
  data = ChickWeight,
  subject_var = "Chick",
  time_var = "Time",
  response_var = "weight",
  between_vars = "Diet",
  method = "mixed"
)

Common Issues and Solutions

        1. 1) Missing Data Patterns
# Identify missing data patterns
library(mice)

analyze_missing_patterns <- function(data, subject_var, time_var, response_var) {
  # Convert to wide format
  wide_data <- reshape(data, 
                       idvar = subject_var, 
                       timevar = time_var, 
                       direction = "wide")
  
  # Create pattern matrix
  pattern_matrix <- md.pattern(wide_data[, -1], plot = FALSE)
  
  cat("Missing Data Patterns:\n")
  print(pattern_matrix)
  
  # Visualize missingness
  if (require(VIM)) {
    aggr_plot <- aggr(wide_data[, -1], 
                      numbers = TRUE, 
                      sortVars = TRUE, 
                      labels = names(wide_data)[-1],
                      cex.axis = 0.7, 
                      gap = 3, 
                      ylab = c("Missing Data Pattern", "Frequency"))
  }
  
  # Test for MCAR using Little's test
  if (require(norm2)) {
    # Prepare data for Little's test
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
    test_data <- wide_data[, response_cols]
    
    # Perform Little's MCAR test
    little_test <- mcar.test(test_data)
    cat("\nLittle's MCAR Test:\n")
    print(little_test)
  }
  
  return(pattern_matrix)
}

# Example
missing_pattern <- analyze_missing_patterns(clinical_data, "subject", "time", "response")
        1. 2) Violation of Sphericity
handle_sphericity_violation <- function(data, subject_var, time_var, response_var) {
  cat("=== Handling Sphericity Violation ===\n")
  
  # 1. Use Greenhouse-Geisser or Huynh-Feldt corrections
  cat("\n1. Apply epsilon corrections:\n")
  # (Implementation shown in check_ranova_assumptions function above)
  
  # 2. Use multivariate approach (MANOVA)
  cat("\n2. Multivariate approach (MANOVA):\n")
  
  # Convert to wide format
  wide_data <- reshape(data, 
                       idvar = subject_var, 
                       timevar = time_var, 
                       direction = "wide")
  
  # Extract response columns
  response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
  response_matrix <- as.matrix(wide_data[, response_cols])
  
  # Fit MANOVA
  manova_result <- manova(response_matrix ~ 1)
  
  # Test using Pillai's trace (most robust)
  manova_summary <- summary(manova_result, test = "Pillai")
  print(manova_summary)
  
  # 3. Use linear mixed models with unstructured covariance
  cat("\n3. Linear mixed model with unstructured covariance:\n")
  library(nlme)
  
  # Fit model with unstructured covariance
  model_unstructured <- lme(as.formula(paste(response_var, "~", time_var)),
                           random = ~ 1 | subject_var,
                           correlation = corSymm(form = ~ as.numeric(get(time_var)) | subject_var),
                           data = data,
                           control = lmeControl(opt = "optim"))
  
  print(summary(model_unstructured))
  
  # Compare with compound symmetry model
  model_compound <- lme(as.formula(paste(response_var, "~", time_var)),
                       random = ~ 1 | subject_var,
                       correlation = corCompSymm(form = ~ 1 | subject_var),
                       data = data)
  
  cat("\nModel comparison (unstructured vs. compound symmetry):\n")
  print(anova(model_unstructured, model_compound))
}

Problems and Exercises

1. **Conceptual Problems**:

  a) Derive the expected mean squares for the rANOVA model
  b) Prove that the F-test for time effects in rANOVA follows an F-distribution under the null hypothesis
  c) Show how the intraclass correlation coefficient relates to the variance components in the mixed model

2. **Applied Problems**:

  a) Analyze the [SOCR LDT dataset](http://wiki.socr.umich.edu/index.php/SOCR_Data_072108_LDT) using rANOVA
  b) Conduct a repeated measures analysis of the [ChickWeight dataset](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ChickWeight.html) with diet as a between-subjects factor
  c) Perform power analysis for a planned longitudinal study with 5 time points, expecting a medium effect (f² = 0.15), and correlation ρ = 0.6

3. **Simulation Study**:

# Simulation to compare traditional rANOVA and linear mixed models
simulate_rm_comparison <- function(n_sim = 500, n = 20, k = 4, missing_rate = 0.1) {
  
  type1_error_aov <- numeric(n_sim)
  type1_error_lmm <- numeric(n_sim)
  
  for (sim in 1:n_sim) {
    # Simulate data under null hypothesis (no time effect)
    data <- expand.grid(
      subject = factor(1:n),
      time = factor(1:k)
    )
    
    # Random intercept model
    subject_effects <- rnorm(n, 0, 5)
    data$response <- rep(subject_effects, each = k) + rnorm(n * k, 0, 2)
    
    # Introduce missing data
    if (missing_rate > 0) {
      missing_indices <- sample(1:nrow(data), size = round(nrow(data) * missing_rate))
      data$response[missing_indices] <- NA
    }
    
    # Method 1: Traditional rANOVA (complete cases only)
    complete_data <- data[complete.cases(data), ]
    complete_subjects <- length(unique(complete_data$subject))
    
    if (complete_subjects >= 3) {
      try({
        model_aov <- aov(response ~ time + Error(subject/time), data = complete_data)
        aov_summary <- summary(model_aov)
        p_aov <- aov_summary[[2]][[1]]$"Pr(>F)"[1]
        type1_error_aov[sim] <- p_aov < 0.05
      }, silent = TRUE)
    }
    
    # Method 2: Linear mixed model (uses all available data)
    try({
      library(lmerTest)
      model_lmm <- lmer(response ~ time + (1|subject), data = data, 
                        na.action = na.omit)
      p_lmm <- anova(model_lmm)$"Pr(>F)"[1]
      type1_error_lmm[sim] <- p_lmm < 0.05
    }, silent = TRUE)
  }
  
  cat("Simulation Results (n =", n_sim, "):\n")
  cat("Type I error rate - Traditional rANOVA:", 
      mean(type1_error_aov, na.rm = TRUE), "\n")
  cat("Type I error rate - Linear Mixed Model:", 
      mean(type1_error_lmm, na.rm = TRUE), "\n")
  
  return(data.frame(
    simulation = 1:n_sim,
    aov_error = type1_error_aov,
    lmm_error = type1_error_lmm
  ))
}

# Run simulation
sim_results <- simulate_rm_comparison(n_sim = 200, n = 15, k = 3, missing_rate = 0.1)

References

1. Maxwell, S. E., & Delaney, H. D. (2004). *Designing Experiments and Analyzing Data: A Model Comparison Perspective* (2nd ed.). Psychology Press. 2. Pinheiro, J. C., & Bates, D. M. (2000). *Mixed-Effects Models in S and S-PLUS*. Springer. 3. Verbeke, G., & Molenberghs, G. (2000). *Linear Mixed Models for Longitudinal Data*. Springer. 4. Field, A., Miles, J., & Field, Z. (2012). *Discovering Statistics Using R*. Sage Publications. 5. Gueorguieva, R., & Krystal, J. H. (2004). Move over ANOVA: Progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry. *Archives of General Psychiatry, 61*(3), 310-317.

        1. Online Resources####



Translate this page:

(default)
Uk flag.gif

Deutsch
De flag.gif

Español
Es flag.gif

Français
Fr flag.gif

Italiano
It flag.gif

Português
Pt flag.gif

日本語
Jp flag.gif

България
Bg flag.gif

الامارات العربية المتحدة
Ae flag.gif

Suomi
Fi flag.gif

इस भाषा में
In flag.gif

Norge
No flag.png

한국어
Kr flag.gif

中文
Cn flag.gif

繁体中文
Cn flag.gif

Русский
Ru flag.gif

Nederlands
Nl flag.gif

Ελληνικά
Gr flag.gif

Hrvatska
Hr flag.gif

Česká republika
Cz flag.gif

Danmark
Dk flag.gif

Polska
Pl flag.png

România
Ro flag.png

Sverige
Se flag.gif