Difference between revisions of "SMHS rANOVA"

From SOCR
Jump to: navigation, search
(References)
 
(19 intermediate revisions by 2 users not shown)
Line 1: Line 1:
==[[SMHS| Scientific Methods for Health Sciences]] - Repeated measures Analysis of Variance (rANOVA) ==
+
==[[SMHS| Scientific Methods for Health Sciences]] - Repeated Measures Analysis of Variance (rANOVA) ==
  
 
===Overview===
 
===Overview===
The phrase Repeated measures is used in situations when the same objects/units/entities take part in all conditions of an experiment. Given there is multiple measures on the same subject, we have to control for correlation between multiple measures on the same subject. Repeated measures ANOVA (rANOVA) is a commonly used statistical approach for analyzing repeated measure designs. It is the equivalent of the one-way ANOVA, but for related, not independent, groups. It is also referred to as within-subject ANOVA or ANOVA for correlated samples. The test is to detect any overall differences between related means. We can analyze data using repeated measures ANOVA for two types of study design. Studies that investigate either (1) changes in mean scores over three or more time points, or (2) difference in mean scores under three or more different conditions.  
+
'''Repeated Measures Analysis of Variance (rANOVA)''' is a statistical method for analyzing data where the same subjects are measured multiple times under different conditions or at different time points. Unlike independent ANOVA, rANOVA accounts for the correlation between repeated measurements on the same subject, increasing statistical power by reducing error variance attributable to individual differences. This section provides a comprehensive treatment of rANOVA, including its mathematical foundations, assumptions, modern alternatives, and practical implementation in R.
  
 
===Motivation===
 
===Motivation===
If you want to test the equality of means, ANOVA would often be a good way to go. However, when it comes to data with repeated measures, a standard ANOVA would be inappropriate because it fails to model the correlation between the repeated measures: the data violate the ANOVA assumption of independence. Repeated measures is used for several reasons: First, some research hypotheses require repeated measures. Longitudinal research, for example, measures each sample member at each of several ages. In this case, age would be a repeated factor. Second, in cases where there is a great deal of variation between sample members, error variance estimates from standard ANOVAs are large. Repeated measures of each sample member provide a way of accounting for this variance, thus reducing error variance. Third, when sample members are difficult to recruit, repeated measures designs are economical because each member is measured under all conditions. Fourth, it applies when members have been matched according to some important characteristic.
+
Consider a clinical trial measuring blood pressure in patients at baseline, 3 months, and 6 months post-treatment. Using independent ANOVA would violate the independence assumption since measurements are correlated within subjects. rANOVA addresses this by:
  
In order to provide a demonstration of how to calculate a repeated measures ANOVA, we shall use the example of a 6-month exercise-training intervention where six subjects had their fitness level measured on three occasions: pre-, 3 months, and post-intervention. Their data is shown below along with some initial calculations:
+
1. '''Partitioning variance''' into between-subjects and within-subjects components
<center>
 
{| class="wikitable" style="text-align:center; width:45%" border="1"
 
|-
 
|Subjects|| Pre- ||3 Months|| 6 Months|| Subject Means
 
|-
 
|1|| 45|| 50|| 55|| 50
 
|-
 
|2|| 42|| 42|| 45|| 43
 
|-
 
|3|| 36|| 41|| 43|| 40
 
|-
 
|4|| 39 ||35|| 40|| 38
 
|-
 
|5|| 51 ||55|| 59|| 55
 
|-
 
|6 ||44|| 49 ||56|| 49.7
 
|-
 
|Monthly Means|| 42.8|| 45.3|| 49.7||
 
|-
 
|}
 
</center>
 
  
 +
2. '''Increasing statistical power''' by reducing error variance
  
<center>
+
3. '''Accounting for individual differences''' through subject-specific effects
{| class="wikitable" style="text-align:center; width:45%" border="1"
 
|-
 
|Source|| SS|| df|| MS|| F
 
|-
 
|Conditions|| $SS_{conditions}$||$(k-1)$||$MS_{conditions}$|| $\frac {MS_{conditions}}{MS_{error}}$
 
|-
 
|Subjects|| $SS_{subjects}$||$(n-1)$|| $MS_{subjects}$||$\frac{MS_{subjects}} {MS_{error}}$
 
|-
 
|Error || $SS_{error}$||$(k-1)(n-1)$||$MS_{error}$|| 
 
|-
 
|Total|| $SS_{T}$|| $(N-1)$|| ||
 
|-
 
|}
 
</center>
 
  
$F=\frac {MS_{conditions}}{MS_{error}}=\frac {MS_{time}}{MS_{error}}$
+
4. '''Enabling longitudinal analysis''' of change over time
  
$SS_{Total} = SS_{conditions}+SS_{subjects}+SS_{Error}$
+
For studies with missing data, complex covariance structures, or unbalanced designs, '''Linear Mixed Models (LMMs)''' provide a more flexible alternative to traditional rANOVA.
  
 +
===Theory===
 +
 +
====Mathematical Model====
 +
 +
=====Univariate Approach=====
 +
For <math>n</math> subjects measured at <math>k</math> time points, the univariate rANOVA model is:
 +
<math>
 +
y_{ij} = \mu + \pi_i + \tau_j + \varepsilon_{ij}, \quad i=1,\ldots,n,\; j=1,\ldots,k
 +
</math>
 +
where:
 +
- <math>y_{ij}</math> is the measurement for subject <math>i</math> at time <math>j</math>
 +
- <math>\mu</math> is the overall mean
 +
- <math>\pi_i \sim N(0, \sigma_{\pi}^2)</math> is the random subject effect
 +
- <math>\tau_j</math> is the fixed time effect (<math>\sum_{j=1}^k \tau_j = 0</math>)
 +
- <math>\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon}^2)</math> is the within-subject error
 +
 +
The covariance structure assumes:
 +
<math>
 +
\text{Cov}(y_{ij}, y_{i'j'}) =
 +
\begin{cases}
 +
\sigma_{\pi}^2 + \sigma_{\varepsilon}^2 & \text{if } i=i', j=j' \\
 +
\sigma_{\pi}^2 & \text{if } i=i', j \neq j' \\
 +
0 & \text{if } i \neq i'
 +
\end{cases}
 +
</math>
 +
 +
=====Matrix Formulation=====
 +
Let <math>\mathbf{y}_i = (y_{i1}, \ldots, y_{ik})^\top</math> be the vector of measurements for subject <math>i</math>:
 +
<math>
 +
\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol{\varepsilon}_i
 +
</math>
 +
where:
 +
- <math>\mathbf{X}_i</math> is the design matrix for fixed effects
 +
- <math>\boldsymbol{\beta}</math> contains fixed effects (including time effects)
 +
- <math>\mathbf{Z}_i</math> is the design matrix for random effects
 +
- <math>\mathbf{b}_i \sim N(\mathbf{0}, \mathbf{G})</math> are random subject effects
 +
- <math>\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R}_i)</math> is the within-subject error
 +
 +
For compound symmetry structure:
 +
<math>
 +
\text{Var}(\mathbf{y}_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^\top + \mathbf{R}_i = \sigma_{\pi}^2\mathbf{J}_k + \sigma_{\varepsilon}^2\mathbf{I}_k
 +
</math>
 +
where <math>\mathbf{J}_k</math> is a <math>k \times k</math> matrix of ones and <math>\mathbf{I}_k</math> is the identity matrix.
 +
 +
====Variance Partitioning====
 +
 +
The total sum of squares decomposes as:
 +
<math>
 +
SST = SS_{BetweenSubjects} + SS_{WithinSubjects}
 +
</math>
 +
<math>
 +
SS_{WithinSubjects} = SS_{Time} + SS_{Error}
 +
</math>
 +
 +
More formally:
 +
<math>
 +
\begin{aligned}
 +
SST &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{\cdot\cdot})^2 \\
 +
SS_{Subjects} &= k \sum_{i=1}^n (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \\
 +
SS_{Time} &= n \sum_{j=1}^k (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 \\
 +
SS_{Error} &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2
 +
\end{aligned}
 +
</math>
 +
 +
Degrees of freedom:
 +
<math>
 +
\begin{aligned}
 +
df_{Total} &= nk - 1 \\
 +
df_{Subjects} &= n - 1 \\
 +
df_{Time} &= k - 1 \\
 +
df_{Error} &= (n - 1)(k - 1)
 +
\end{aligned}
 +
</math>
  
with corresponding degrees of freedom:  
+
Mean squares:
 +
<math>
 +
\begin{aligned}
 +
MS_{Subjects} &= SS_{Subjects} / (n-1) \\
 +
MS_{Time} &= SS_{Time} / (k-1) \\
 +
MS_{Error} &= SS_{Error} / ((n-1)(k-1))
 +
\end{aligned}
 +
</math>
  
$df_{Total}=df_{conditions}+df_{subjects}+df_{Error}
+
F-statistic for time effect:
=(k-1)+(n-1)+((n-k)(n-1))$
+
<math>
 +
F = \frac{MS_{Time}}{MS_{Error}} \sim F_{k-1, (n-1)(k-1)} \quad \text{under } H_0: \tau_1 = \cdots = \tau_k
 +
</math>
  
 +
====Sphericity and Corrections====
  
$SS_{Time}$s the same as for $SS_{b}$ in an independent ANOVA, and
+
The '''sphericity assumption''' (also called '''circularity''') requires:
 +
<math>
 +
\text{Var}(y_{ij} - y_{ij'}) = \text{constant for all } j \neq j'
 +
</math>
 +
Equivalently, the covariance matrix of the difference scores has equal variances and equal covariances.
  
 +
'''Mauchly's Test for Sphericity:'''
 +
<math>
 +
W = \frac{| \mathbf{E} |}{(\text{tr}(\mathbf{E})/p)^{p}} \sim \chi^2_{(p(p-1)/2 - 1)}
 +
</math>
 +
where <math>\mathbf{E}</math> is the <math>p \times p</math> covariance matrix of the difference scores, <math>p = k-1</math>.
  
$SS_{Time}=SS_{b}=\sum_{i=1}^{k} n_{i} (\bar x_{i}-\bar x)^{2}=$
+
When sphericity is violated, corrections adjust degrees of freedom:
  
where $k$ is the number of conditions, $n_{i}$ is the number of subjects under $ith$ condition, $\bar x_{i}$ is the mean score for each $ith$ condition and $\bar x$ is the overall grand mean of all conditions.
+
1. '''Greenhouse-Geisser correction''': <math>\hat{\epsilon} = \frac{(\text{tr}(\mathbf{E}))^2}{(p-1)\sum_{i=1}^p \lambda_i^2}</math> <math>F_{GG} \sim F_{\hat{\epsilon}(k-1), \hat{\epsilon}(n-1)(k-1)}</math>
  
 +
2. '''Huynh-Feldt correction''': <math>\tilde{\epsilon} = \frac{n(k-1)\hat{\epsilon} - 2}{(k-1)[(n-1) - (k-1)\hat{\epsilon}]}</math> <math>F_{HF} \sim F_{\tilde{\epsilon}(k-1), \tilde{\epsilon}(n-1)(k-1)}</math>
  
Here, we have
+
====4) Multivariate Approach (MANOVA)====
  
$SS_{Time}=SS_{b}=\sum_{i=1}^{k} n_{i} (\bar x_{i}-\bar x)^{2}=6[(42.8-45.9)^{2}+(45.3-45.9)^{2}+(49.7-45.9)^{2}]=6[9.61+0.36+14.44]=143.44.$
+
For <math>k</math> time points, consider the <math>(k-1) \times 1</math> vector of difference scores <math>\mathbf{d}_i = (y_{i2}-y_{i1}, \ldots, y_{ik}-y_{i,k-1})^\top</math>:
 +
<math>
 +
\mathbf{d}_i \sim N_{k-1}(\boldsymbol{\mu}_d, \boldsymbol{\Sigma}_d)
 +
</math>
 +
Test <math>H_0: \boldsymbol{\mu}_d = \mathbf{0}</math> using Hotelling's <math>T^2</math>:
 +
<math>
 +
T^2 = n\bar{\mathbf{d}}^\top \mathbf{S}_d^{-1} \bar{\mathbf{d}} \sim \frac{(n-1)(k-1)}{n-k+1} F_{k-1, n-k+1}
 +
</math>
 +
where <math>\bar{\mathbf{d}}</math> is the sample mean and <math>\mathbf{S}_d</math> is the sample covariance of difference scores.
  
 +
===Assumptions and Diagnostics===
  
The within-subject variation $SS_{w}$ is calculated as
+
'''Key Assumptions:'''
$SS_{W}=\sum_{1}(x_{i1}-\bar x_{1})^{2}+\sum_{2}(x_{i2}-\bar x_{2})^{2} +⋯+\sum_{k}(x_{ik}-\bar x_{k})^{2}$
+
1. '''Normality''': Residuals <math>\varepsilon_{ij}</math> should be normally distributed
 +
2. '''Sphericity''': Equal variances of differences between all pairs of repeated measures
 +
3. '''Missing Data''': Missing completely at random (MCAR) for valid inference
 +
4. '''No Time × Subject Interaction''': Treatment effects are consistent across subjects
  
where $x_{ik}$ is the score of the $ith$ subject in group $k$.
+
'''Diagnostic Procedures:'''
 +
<pre>
 +
# Comprehensive rANOVA diagnostic function
 +
check_ranova_assumptions <- function(data, subject_var, time_var, response_var) {
 +
 
 +
  cat("=== REPEATED MEASURES ANOVA DIAGNOSTICS ===\n")
 +
 
 +
  # 1. Normality of residuals
 +
  # Fit a linear model ignoring the repeated measures structure first
 +
  model_lm <- lm(as.formula(paste(response_var, "~", time_var)), data = data)
 +
  residuals <- resid(model_lm)
 +
 
 +
  par(mfrow = c(2, 3))
 +
 
 +
  # Q-Q plot
 +
  qqnorm(residuals, main = "Q-Q Plot of Residuals")
 +
  qqline(residuals)
 +
 
 +
  # Shapiro-Wilk test
 +
  shapiro_test <- shapiro.test(residuals)
 +
  cat("1. Normality (Shapiro-Wilk): W =", round(shapiro_test$statistic, 4),
 +
      "p =", format.pval(shapiro_test$p.value, digits = 4), "\n")
 +
 
 +
  # 2. Sphericity test (requires ez package)
 +
  if (require(ez, quietly = TRUE)) {
 +
    # Convert to wide format for Mauchly's test
 +
    wide_data <- reshape(data,
 +
                        idvar = subject_var,
 +
                        timevar = time_var,
 +
                        direction = "wide")
 +
   
 +
    # Remove subject column
 +
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
 +
    response_matrix <- as.matrix(wide_data[, response_cols])
 +
   
 +
    # Mauchly's test using car package
 +
    if (require(car, quietly = TRUE)) {
 +
      # Create an idata frame for repeated measures
 +
      time_levels <- unique(data[[time_var]])
 +
      idata <- data.frame(time = factor(time_levels))
 +
     
 +
      # Fit linear model for repeated measures
 +
      mlm <- lm(response_matrix ~ 1)
 +
     
 +
      # Mauchly's test
 +
      mauchly_test <- Anova(mlm, idata = idata, idesign = ~time, type = "III")
 +
      mauchly_result <- mauchly_test$sphericity.tests
 +
     
 +
      cat("\n2. Mauchly's Test for Sphericity:\n")
 +
      print(mauchly_result)
 +
     
 +
      # Greenhouse-Geisser and Huynh-Feldt corrections
 +
      epsilon_GG <- mauchly_result["Greenhouse-Geisser epsilon", "GG eps"]
 +
      epsilon_HF <- mauchly_result["Huynh-Feldt epsilon", "HF eps"]
 +
      cat("\nEpsilon corrections:\n")
 +
      cat("  Greenhouse-Geisser:", round(epsilon_GG, 3), "\n")
 +
      cat("  Huynh-Feldt:", round(epsilon_HF, 3), "\n")
 +
    }
 +
  }
 +
 
 +
  # 3. Homogeneity of variance across time points
 +
  boxplot(as.formula(paste(response_var, "~", time_var)),
 +
          data = data,
 +
          main = "Boxplots by Time Point",
 +
          xlab = time_var, ylab = response_var)
 +
 
 +
  # Levene's test for homogeneity of variance
 +
  if (require(car, quietly = TRUE)) {
 +
    levene_test <- leveneTest(as.formula(paste(response_var, "~ factor(", time_var, ")")),
 +
                              data = data)
 +
    cat("\n3. Levene's Test for Homogeneity of Variance:\n")
 +
    print(levene_test)
 +
  }
 +
 
 +
  # 4. Individual profiles plot
 +
  data_sorted <- data[order(data[[subject_var]], data[[time_var]]), ]
 +
 
 +
  # Create empty plot
 +
  plot(1, type = "n",
 +
      xlim = range(as.numeric(data[[time_var]])),
 +
      ylim = range(data[[response_var]]),
 +
      xlab = time_var, ylab = response_var,
 +
      main = "Individual Profiles")
 +
 
 +
  # Plot each subject's profile
 +
  subjects <- unique(data[[subject_var]])
 +
  colors <- rainbow(length(subjects))
 +
 
 +
  for (i in seq_along(subjects)) {
 +
    subject_data <- data[data[[subject_var]] == subjects[i], ]
 +
    lines(as.numeric(subject_data[[time_var]]),
 +
          subject_data[[response_var]],
 +
          col = colors[i], type = "b", lwd = 1)
 +
  }
 +
 
 +
  # Add group mean profile
 +
  group_means <- aggregate(as.formula(paste(response_var, "~", time_var)),
 +
                          data = data, mean)
 +
  lines(as.numeric(group_means[[time_var]]),
 +
        group_means[[response_var]],
 +
        col = "black", lwd = 3, type = "b")
 +
 
 +
  legend("topright", legend = c("Individual", "Group Mean"),
 +
        col = c("gray", "black"), lwd = c(1, 3), bty = "n")
 +
 
 +
  # 5. Residuals vs fitted plot
 +
  plot(fitted(model_lm), residuals,
 +
      xlab = "Fitted Values", ylab = "Residuals",
 +
      main = "Residuals vs Fitted")
 +
  abline(h = 0, col = "red", lty = 2)
 +
 
 +
  # 6. Histogram of residuals
 +
  hist(residuals, main = "Histogram of Residuals",
 +
      xlab = "Residuals", col = "lightblue", freq = FALSE)
 +
  curve(dnorm(x, mean = mean(residuals), sd = sd(residuals)),
 +
        add = TRUE, col = "red", lwd = 2)
 +
 
 +
  par(mfrow = c(1, 1))
 +
 
 +
  # Return diagnostic summary
 +
  return(list(
 +
    shapiro_test = shapiro_test,
 +
    sphericity_tests = if(exists("mauchly_result")) mauchly_result else NULL,
 +
    homogeneity_test = if(exists("levene_test")) levene_test else NULL
 +
  ))
 +
}
 +
</pre>
  
For this example we have:
+
===Applications===
$SS_{W}=\sum_{1}(x_{i1}-\bar x_{1})^{2}+\sum_{2}(x_{i2}-\bar x_{2})^{2} +⋯+\sum_{k}(x_{ik}-\bar x_{k})^{2}=[(45-42.8)^2+(42-42.8)^2+⋯+(56-49.7)^2 ]=715.5$
 
  
 +
====Example 1: Exercise Training Intervention====
 +
<pre>
 +
# Recreate the example from the motivation section
 +
exercise_data <- data.frame(
 +
  subject = factor(rep(1:6, each = 3)),
 +
  time = factor(rep(c("Pre", "3Months", "6Months"), 6)),
 +
  fitness = c(45, 50, 55,    # Subject 1
 +
              42, 42, 45,    # Subject 2
 +
              36, 41, 43,    # Subject 3
 +
              39, 35, 40,    # Subject 4
 +
              51, 55, 59,    # Subject 5
 +
              44, 49, 56)    # Subject 6
 +
)
  
 +
cat("=== Exercise Training Intervention Study ===\n")
 +
cat("Research question: Does fitness level change over time?\n\n")
  
$SS_{subject}$ is calculated by: $SS_{subject}=k\sum(\bar x_{i}-\bar x )^{2}$ where $\bar x_{i}$ is the mean score of the $ith$ subject, $\bar x$ is the grand mean.
+
# Display the data
 +
cat("Dataset:\n")
 +
print(exercise_data)
  
 +
# Summary statistics
 +
cat("\nSummary statistics by time:\n")
 +
print(aggregate(fitness ~ time, data = exercise_data,
 +
                FUN = function(x) c(mean = mean(x), sd = sd(x))))
  
In our example, we have $SS_{subject}=k\sum(\bar x_{i})-\bar x^{2}=3[(50-45.9)^2+(40-45.9)^2+(38-45.9)^2+(55-45.9)^2+(49.7-45.9)^2 ]=658.3.$
+
# Run diagnostics first
 +
cat("\n=== Model Diagnostics ===\n")
 +
diagnostics <- check_ranova_assumptions(exercise_data, "subject", "time", "fitness")
  
 +
# Traditional rANOVA using aov()
 +
cat("\n=== Traditional rANOVA ===\n")
 +
model_aov <- aov(fitness ~ time + Error(subject/time), data = exercise_data)
 +
print(summary(model_aov))
  
Thus,
+
# Extract F-statistic and p-value
 +
aov_summary <- summary(model_aov)
 +
F_time <- aov_summary[[2]][[1]]$"F value"[1]
 +
df_time <- aov_summary[[2]][[1]]$"Df"[1]
 +
df_error <- aov_summary[[2]][[1]]$"Df"[2]
 +
p_time <- aov_summary[[2]][[1]]$"Pr(>F)"[1]
  
$SS_{Error}= SS_{w}-SS_{subject}=715.5-658.3=57.2$
+
cat(sprintf("\nF(%d, %d) = %.2f, p = %.3f\n",
 +
            df_time, df_error, F_time, p_time))
  
$F=MS_{Time}⁄MS_{Error} =(SS_{Time}⁄df_{Time})(SS_{Error}⁄df_{Error})=$
+
# Effect size calculation (Partial eta-squared)
 +
SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
 +
SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
 +
eta_sq <- SS_time / (SS_time + SS_error)
 +
cat("Partial eta-squared (effect size):", round(eta_sq, 3), "\n")
  
$=(SS_{Time}⁄(k-1))⁄(SS_{Error}⁄(n-1)(k-1))=$
+
# Post-hoc pairwise comparisons with Bonferroni correction
 +
cat("\n=== Post-hoc Pairwise Comparisons ===\n")
 +
library(emmeans)
  
$(146.44⁄((3-1))(57.2⁄(5*2))=12.53$
+
# Fit linear model for post-hoc tests
 +
model_lm <- lm(fitness ~ time + subject, data = exercise_data)
 +
emm_time <- emmeans(model_lm, specs = pairwise ~ time, adjust = "bonferroni")
 +
print(emm_time$contrasts)
  
 +
# Visualization
 +
library(ggplot2)
 +
ggplot(exercise_data, aes(x = time, y = fitness, group = subject)) +
 +
  geom_line(aes(color = subject), alpha = 0.6) +
 +
  geom_point(aes(color = subject), size = 2) +
 +
  stat_summary(aes(group = 1), fun = mean, geom = "line",
 +
              color = "black", size = 1.5) +
 +
  stat_summary(fun.data = mean_se, geom = "errorbar",
 +
              width = 0.1, color = "black") +
 +
  labs(title = "Fitness Levels Over Time",
 +
      subtitle = "Individual profiles with group mean",
 +
      x = "Time Point", y = "Fitness Score") +
 +
  theme_minimal()
 +
</pre>
  
We can now look up (or use a computer program) to ascertain the critical F-statistic for our F-distribution with our degrees of freedom for time $df_{Time}$ and error $df_{Error}$ and determine whether our F-statistic indicates a statistically significant result.
+
====Example 2: Clinical Trial with Missing Data (Using Mixed Models)====
We report the F-statistic from a repeated measures ANOVA as:
+
<pre>
 +
# Simulated clinical trial data with missing values
 +
set.seed(123)
 +
n_subjects <- 30
 +
n_timepoints <- 4
  
 +
clinical_data <- expand.grid(
 +
  subject = factor(1:n_subjects),
 +
  time = factor(paste0("Week_", 0:3)),
 +
  treatment = factor(rep(c("Drug", "Placebo"), each = n_subjects/2))
 +
)
  
$F(df_{Time}, df_{Error} = F-value, p = p-value,$
+
# Generate data with treatment effect and time trend
 +
clinical_data$response <- with(clinical_data,
 +
  50 +  # Baseline
 +
  ifelse(treatment == "Drug", 5, 0) +  # Treatment effect
 +
  as.numeric(time) * 2 +  # Time effect (2 units per week)
 +
  ifelse(treatment == "Drug", as.numeric(time) * 1, 0) +  # Treatment × time interaction
 +
  rep(rnorm(n_subjects, 0, 3), each = n_timepoints) +  # Random intercept
 +
  rnorm(n_subjects * n_timepoints, 0, 2)  # Residual error
 +
)
  
which for our example would be:
+
# Introduce missing data (MCAR)
 +
missing_indices <- sample(1:nrow(clinical_data), size = 20)
 +
clinical_data$response[missing_indices] <- NA
  
$F(2, 10) = 12.53, p = 0.002,$
+
cat("=== Clinical Trial with Repeated Measures ===\n")
see [http://socr.ucla.edu/htmls/dist/Fisher_Distribution.html  SOCR Java F-distribution calculator],
+
cat("Research question: Does the treatment affect response over time?\n\n")
or  [http://www.distributome.org/V3/calc/FCalculator.html  the Distributome HTML5 F-distribution Calculator]
 
  
 +
cat("Missing data pattern:\n")
 +
print(table(is.na(clinical_data$response)))
  
This means we can reject the null hypothesis and accept the alternative hypothesis. As we will discuss later, there are assumptions and effect sizes we can calculate that can alter how we report the above result. However, we would otherwise report the above findings for this example exercise study as: The six-month exercise-training program had a statistically significant effect on fitness levels, $F(2, 10) = 12.53, p = .002.$
+
# Approach 1: Linear Mixed Model (recommended for missing data)
 +
cat("\n=== Linear Mixed Model Approach ===\n")
 +
library(lme4)
 +
library(lmerTest)
  
 +
# Fit mixed model with random intercepts
 +
mixed_model <- lmer(response ~ time * treatment + (1|subject),
 +
                    data = clinical_data,
 +
                    na.action = na.omit)
  
===Theory===
+
cat("\nModel summary:\n")
Assumption:
+
print(summary(mixed_model))
As with all statistical analyses, specific assumptions should be met to justify the use of this test. With repeated measures ANOVA, standard univariate and multivariate assumptions apply. The list of assumptions include:
+
 
*Normality: For each level of the within subject factor, the dependent variable must have a normal distribution.
+
# Test fixed effects
*Sphericity: Difference scores computed between two levels of a within-subjects factor must have the same variance for the comparison of any two levels (this assumptions only applied with more than 2 levels of independent variables).
+
cat("\nTests of fixed effects:\n")
*Randomness: Cases should be derived from a random sample, and different scores for each participant are independent from those of other participants.
+
print(anova(mixed_model))
*Multivariate normality: The difference scores are multivariately normally distributed in the population.
+
 
 +
# Random effects variance components
 +
cat("\nVariance components:\n")
 +
print(VarCorr(mixed_model))
 +
 
 +
# Estimated marginal means
 +
if (require(emmeans)) {
 +
  cat("\n=== Estimated Marginal Means ===\n")
 +
  emm_treatment <- emmeans(mixed_model, specs = ~ treatment)
 +
  print(emm_treatment)
 +
 
 +
  emm_time <- emmeans(mixed_model, specs = ~ time)
 +
  print(emm_time)
 +
 
 +
  # Interaction plot
 +
  emm_interaction <- emmeans(mixed_model, specs = ~ treatment:time)
 +
  print(emm_interaction)
 +
 
 +
 
 +
# Robust Interaction plot
 +
if (require(emmeans)) {
 +
  cat("\n=== Estimated Marginal Means ===\n")
 +
  emm_treatment <- emmeans(mixed_model, specs = ~ treatment)
 +
  print(emm_treatment)
 +
 
 +
  emm_time <- emmeans(mixed_model, specs = ~ time)
 +
  print(emm_time)
 +
 
 +
  # Interaction plot
 +
  emm_interaction <- emmeans(mixed_model, specs = ~ treatment:time)
 +
 
 +
  # Convert to data frame and ensure confidence intervals are present
 +
  plot_data <- as.data.frame(emm_interaction)
 +
 
 +
  # Check if confidence interval columns exist
 +
  if (!all(c("asymp.LCL", "asymp.LCL") %in% names(plot_data))) {
 +
    # If not, get them using summary
 +
    plot_data <- as.data.frame(summary(emm_interaction))
 +
  }
 +
 
 +
  # Now plot
 +
  ggplot(plot_data, aes(x = time, y = emmean, group = treatment, color = treatment)) +
 +
    geom_line(size = 1.5) +
 +
    geom_errorbar(aes(ymin = asymp.LCL, ymax = asymp.UCL), width = 0.1) +
 +
    geom_point(size = 3) +
 +
    labs(title = "Treatment × Time Interaction",
 +
        subtitle = "Estimated marginal means with 95% confidence intervals",
 +
        x = "Time Point", y = "Response") +
 +
    theme_minimal()
 +
}
 +
 
 +
}
 +
 
 +
# Approach 2: Traditional rANOVA (for comparison)
 +
cat("\n=== Traditional rANOVA (for comparison) ===\n")
 +
# Remove subjects with missing data for traditional ANOVA
 +
complete_data <- clinical_data[complete.cases(clinical_data), ]
 +
complete_subjects <- unique(complete_data$subject)
 +
cat("Subjects with complete data:", length(complete_subjects), "\n")
 +
 
 +
if (length(complete_subjects) >= 3) {
 +
  model_aov2 <- aov(response ~ time * treatment + Error(subject/time),
 +
                    data = complete_data)
 +
  print(summary(model_aov2))
 +
} else {
 +
  cat("Insufficient complete cases for traditional rANOVA\n")
 +
}
 +
</pre>
 +
 
 +
====Example 3: Real Dataset - CO2 Uptake in Plants====
 +
<pre>
 +
# Using built-in CO2 dataset from R
 +
data(CO2)
 +
cat("=== CO2 Uptake in Plants ===\n")
 +
cat("Research question: Does CO2 uptake change with concentration?\n\n")
 +
 
 +
# Prepare data: focus on Quebec plants, type Qn1
 +
co2_data <- CO2[CO2$Type == "Quebec" & CO2$Plant == "Qn1", ]
 +
co2_data$conc_factor <- factor(co2_data$conc)
 +
 
 +
cat("Data structure:\n")
 +
str(co2_data)
  
One of the greatest advantages of rANOVA, as is the case with repeated measures designs in general, is the ability to partition out variability due to individual differences. Consider the general structure of F-statistic:
+
# Visual inspection
 +
library(ggplot2)
 +
ggplot(co2_data, aes(x = conc, y = uptake, group = 1)) +
 +
  geom_line() +
 +
  geom_point() +
 +
  labs(title = "CO2 Uptake by Concentration",
 +
      x = "CO2 Concentration (mL/L)", y = "Uptake (umol/m^2 sec)") +
 +
  theme_minimal()
  
$F=MS_{Treatment}⁄MS_{Error} =(SS_{Treatment}⁄df_{Treatment})⁄(SS_{Error}⁄df_{Error})$
+
# Since we have only one plant, this is a single-subject repeated measures design
 +
# We'll use a polynomial contrast analysis
  
 +
cat("\n=== Polynomial Trend Analysis ===\n")
 +
# Create orthogonal polynomial contrasts for concentration
 +
conc_levels <- length(unique(co2_data$conc))
 +
poly_contrasts <- contr.poly(conc_levels)
  
Is a between-subject design there is an element of variance due to individual difference that is combined with the treatment and error terms:
+
# Apply contrasts
$SS_{Total}=SS_{Treatment}+SS_{Error},df_{Total}=n-1$
+
co2_data$conc_factor <- factor(co2_data$conc)
 +
contrasts(co2_data$conc_factor) <- poly_contrasts
  
 +
# Fit model with polynomial contrasts
 +
model_poly <- lm(uptake ~ conc_factor, data = co2_data)
 +
print(summary(model_poly))
  
In a repeated measures design it is possible to partition subject variability from the treatment and error terms. In such a case, variability can be broken down into between-treatments variability (or within-subjects effects, excluding individual differences) and within-treatments variability. The within-treatments variability can be further partitioned into between-subjects variability (individual differences) and error (excluding the individual differences):
+
# Extract polynomial components
 +
anova_poly <- anova(model_poly)
 +
cat("\nPolynomial decomposition:\n")
 +
print(anova_poly)
  
 +
# Calculate proportion of variance explained by each polynomial component
 +
SS_total <- sum(anova_poly$`Sum Sq`)
 +
for (i in 1:nrow(anova_poly)) {
 +
  prop_var <- anova_poly$`Sum Sq`[i] / SS_total
 +
  cat(sprintf("Component %d: %.1f%% of variance\n",
 +
              i, prop_var * 100))
 +
}
 +
</pre>
  
$F=MS_{conditions}⁄MS_{Error} =MS_{Time}⁄MS_{Error}$
+
===Advanced Topics===
 
$SS_{Error}=SS_{w}-SS_{subject}$
 
  
$SS_{Total}=SS_{conditions}+SS_{Subjects}+SS_{Error}$,
+
====Power Analysis for Repeated Measures====
 +
<pre>
 +
# Power analysis for repeated measures designs
 +
calculate_power_rm <- function(n, k, rho, f2, alpha = 0.05) {
 +
  # n: number of subjects
 +
  # k: number of time points
 +
  # rho: correlation between repeated measures
 +
  # f2: effect size (Cohen's f^2)
 +
 
 +
  # Degrees of freedom
 +
  df1 <- k - 1
 +
  df2 <- (n - 1) * (k - 1)
 +
 
 +
  # Non-centrality parameter
 +
  lambda <- n * k * f2
 +
 
 +
  # Critical F value
 +
  F_crit <- qf(1 - alpha, df1, df2)
 +
 
 +
  # Power
 +
  power <- 1 - pf(F_crit, df1, df2, ncp = lambda)
 +
 
 +
  return(power)
 +
}
  
$df_{Total}=df_{conditions}+df_{subjects}+df_{Error}=(k-1)+(n-1)+((n-k)(n-1))$.
+
# Example: Clinical trial with 20 subjects, 4 time points
 +
power_example <- calculate_power_rm(n = 20, k = 4, rho = 0.5, f2 = 0.25)
 +
cat("Power for repeated measures design:\n")
 +
cat("Subjects: 20, Time points: 4, Effect size f² = 0.25\n")
 +
cat("Estimated power:", round(power_example, 3), "\n")
  
 +
# Power curve visualization
 +
library(ggplot2)
 +
n_range <- seq(10, 50, by = 5)
 +
power_values <- sapply(n_range, function(n) calculate_power_rm(n, 4, 0.5, 0.25))
  
In reference to the general structure of the F-statistic, it is clear that by partitioning out the between-subjects variability, the F-value will increase because the sum of squares error term will be smaller resulting in a smaller MSError. It is noteworthy that partitioning variability reduces degrees of freedom from the F-test, therefore the between-subjects variability must be significant enough to offset the loss in degrees of freedom. If between-subjects variability is small this process may actually reduce the F-value.
+
power_df <- data.frame(n = n_range, power = power_values)
 +
ggplot(power_df, aes(x = n, y = power)) +
 +
  geom_line(size = 1.5, color = "blue") +
 +
  geom_point(size = 3) +
 +
  geom_hline(yintercept = 0.8, linetype = "dashed", color = "red") +
 +
  labs(title = "Power Analysis for Repeated Measures Design",
 +
      subtitle = "k = 4 time points, ρ = 0.5, f² = 0.25",
 +
      x = "Number of Subjects", y = "Statistical Power") +
 +
  theme_minimal()
 +
</pre>
  
===F test:===
+
====Handling Autocorrelated Errors====
As with other analysis of variance tests, the rANOVA makes use of an F statistic to determine significance. Depending on the number of within-subjects factors and assumption violations, it is necessary to select the most appropriate of three tests:
 
  
*Standard Univariate ANOVA F test — This test is commonly used given only two levels of the within-subjects factor (i.e. time point 1 and time point 2). This test is not recommended given more than 2levels of the within-subjects factor because the assumption of sphericity is commonly violated in such cases.
+
In this example, note that we show refitting the model using maximum likelihood (method = "ML") instead of REML.  
*Alternative Univariate test — These tests account for violations to the assumption of sphericity, and can be used when the within-subjects factor exceeds 2 levels. The F statistic is the same as in the Standard Univariate ANOVA F test, but is associated with a more accurate p-value. This correction is done by adjusting the degrees of freedom downward for determining the critical F value. Two corrections are commonly used—The Greenhouse-Geisser correction and the Huynh-Feldt correction. The Greenhouse-Geisser correction is more conservative, but addresses a common issue of increasing variability over time in a repeated-measures design.
+
This often resolves the non-positive definite matrix issue for computing intervals on variance-covariance components,  
*Multivariate Test—This test does not assume sphericity, but is also highly conservative.
+
as ML doesn't apply the same adjustments that can destabilize the approximation under REML.
  
  
'''Effect size:'''
+
<pre>
One of the most commonly reported effect size statistics for rANOVA is partial eta-squared $(η_{p}^{2})$. It is also common to use the multivariate $η^{2}$ when the assumption of sphericity has been violated, and the multivariate test statistic is reported. A third effect size statistic that is reported is the generalized $η^{2}$, which is comparable to $η_{p}^{2}$ in a one-way repeated measures ANOVA. It has been shown to be a better estimate of effect size with other within-subjects tests. 
+
# When measurements are equally spaced in time, errors may be autocorrelated
 +
library(nlme)
  
'''Caution:'''
+
# Simulate data with AR(1) errors
rANOVA is not always the best statistical analyses for repeated measure designs. It is vulnerable to effects from missing values, imputation, unequivalent time points between subjects and violations of sphericity. These issues can result in sampling bias and inflated rates of Type I error. In such cases it may be better to consider use of a linear mixed model.  
+
set.seed(456)
 +
n <- 15
 +
k <- 5
 +
time_points <- 1:k
  
 +
ar_data <- data.frame(
 +
  subject = rep(1:n, each = k),
 +
  time = rep(time_points, n),
 +
  treatment = rep(rep(c("A", "B"), length.out = n), each = k)
 +
)
  
===Applications===
+
# Generate data with AR(1) errors
 +
for (i in 1:n) {
 +
  ar_errors <- arima.sim(model = list(ar = 0.7), n = k, sd = 2)
 +
  idx <- ((i-1)*k + 1):(i*k)
 +
  ar_data$response[idx] <- 50 +
 +
    ifelse(ar_data$treatment[idx] == "A", 5, 0) +
 +
    ar_data$time[idx] * 2 +
 +
    ar_errors
 +
}
  
[http://www.southampton.ac.uk/~cpd/anovas/datasets/index.htm  This article] introduced some examples of analysis of variance and covariance (ANOVA) and computer programs for planning data collection designs and estimating power. It introduced about the basic statistical model set up and ANOVA models with one-factor design, nested designs, fully replicated factorial designs, randomized-block designs, split-plot designs, repeated-measures designs and un-replicated designs. In the repeated measure ANOVA, it included examples of one-factor repeated-measures model, two and three factor model with repeated measures.
+
# Fit model with AR(1) correlation structure
 +
model_ar1 <- lme(response ~ time * treatment,  
 +
                random = ~ 1 | subject,
 +
                correlation = corAR1(form = ~ time | subject),
 +
                data = ar_data)
  
[http://psycnet.apa.org/journals/bul/82/4/511/ This article] used repeated measure ANOVA to analyze the data from a pretest-posttest design. The pretest-posttest control group design (or an extension of it) is a highly prestigious experimental design. A popular analytic strategy involves subjecting the data provided by this design to a repeated measures analysis of variance (ANOVA). Unfortunately, the statistical results yielded by this type of analysis can easily be misinterpreted, since the score model underlying the analysis is not correct. Examples from recently published articles are used to demonstrate that this statistical procedure has led to (a) incorrect statements regarding treatment effects, (b) completely redundant reanalyzes of the same data, and (c) problems with respect to post hoc investigations. 2 alternative strategies-gain scores and covariance-are discussed and compared. (19 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
+
# Refit the model using maximum likelihood (method = "ML") instead of REML.  
 +
# This often resolves the non-positive definite matrix issue for computing intervals on variance-covariance components,  
 +
# as ML doesn't apply the same adjustments that can destabilize the approximation under REML.
 +
# Refit the model with ML
 +
model_ar1_ml <- update(model_ar1, method = "ML")
  
 +
cat("=== Initial Model with AR(1) Correlation Structure ===\n")
 +
print(summary(model_ar1))
  
===Software (R)===
+
cat("=== Refit Model with AR(1) Correlation Structure and Maximum Likelihood  Estimation ===\n")
[http://www.southampton.ac.uk/~cpd/anovas/datasets/ANOVA%20in%20R.htm  ANOVA datasets]
+
print(summary(model_ar1_ml))
  
[http://ww2.coastal.edu/kingw/statistics/R-tutorials/repeated.html R-tutorials]
+
cat("\nCorrelation parameter:", intervals(model_ar1_ml)$corStruct[2], "\n")
  
===Problems===
+
# Compare with compound symmetry model
1) A simple problem would be consider the following experiment:
+
model_cs <- lme(response ~ time * treatment,
10 participants engage in an LDT, within which they are exposed to 4 different types of word pairs. RT to click Word or Non-Word recorded and averaged for each pair type.
+
                random = ~ 1 | subject,
Forward-Prime pairs (e.g., baby-stork)
+
                correlation = corCompSymm(form = ~ 1 | subject),
 +
                data = ar_data,  
 +
                method = "ML")
  
Backward-Prime pairs (e.g., stork-baby)
+
# Model comparison
 +
cat("\n=== Model Comparison ===\n")
 +
print(anova(model_ar1_ml, model_cs))
 +
</pre>
  
Unrelated Pairs (e.g., glass-apple)
+
===Software Implementation===
  
Non-Word Pairs (e.g., door-blug)
+
<pre>
 +
# Comprehensive rANOVA analysis function
 +
run_ranova_analysis <- function(data, subject_var, time_var, response_var, between_vars = NULL, method = "mixed") {
 +
  cat("=== REPEATED MEASURES ANOVA ANALYSIS ===\n\n")
 +
  cat("Subject variable:", subject_var, "\n")
 +
  cat("Time variable:", time_var, "\n")
 +
  cat("Response variable:", response_var, "\n")
 +
  if (!is.null(between_vars)) {
 +
    cat("Between-subjects variables:", paste(between_vars, collapse = ", "), "\n")
 +
  }
 +
  cat("Method:", method, "\n\n")
 +
 
 +
  # Check for missing data
 +
  missing_count <- sum(is.na(data[[response_var]]))
 +
  cat("Missing observations:", missing_count, sprintf("(%.1f%%)\n", missing_count/nrow(data)*100))
 +
 
 +
  if (method == "mixed") {
 +
    # Linear mixed model approach
 +
    library(lme4)
 +
    library(lmerTest)
 +
   
 +
    # Build formula
 +
    if (is.null(between_vars)) {
 +
      formula_str <- paste(response_var, "~", time_var, "+ (1|", subject_var, ")")
 +
    } else {
 +
      between_str <- paste(between_vars, collapse = " + ")
 +
      # Add interactions with time
 +
      interaction_str <- paste(paste(time_var, between_vars, sep = ":"), collapse = " + ")
 +
      formula_str <- paste(response_var, "~", time_var, "*", between_str, "+ (1|", subject_var, ")")
 +
    }
 +
    cat("Fitting model:", formula_str, "\n\n")
 +
    model <- lmer(as.formula(formula_str), data = data)
 +
   
 +
    cat("--- Model Summary ---\n")
 +
    print(summary(model))
 +
   
 +
    cat("\n--- Tests of Fixed Effects ---\n")
 +
    print(anova(model))
 +
   
 +
    # Variance components
 +
    cat("\n--- Variance Components ---\n")
 +
    print(VarCorr(model))
 +
   
 +
    # Calculate intraclass correlation coefficient (ICC)
 +
    vc <- VarCorr(model)
 +
    sigma_subject <- as.data.frame(vc)$vcov[1]
 +
    sigma_residual <- as.data.frame(vc)$vcov[2]
 +
    ICC <- sigma_subject / (sigma_subject + sigma_residual)
 +
    cat("\nIntraclass Correlation Coefficient (ICC):", round(ICC, 3), "\n")
 +
   
 +
  } else if (method == "traditional") {
 +
    # Traditional rANOVA using aov()
 +
    if (!is.null(between_vars)) {
 +
      formula_str <- paste(response_var, "~", paste(between_vars, collapse = " + "), "+ Error(", subject_var, "/", time_var, ")")
 +
    } else {
 +
      formula_str <- paste(response_var, "~", time_var, "+ Error(", subject_var, "/", time_var, ")")
 +
    }
 +
    cat("Fitting model:", formula_str, "\n\n")
 +
    model <- aov(as.formula(formula_str), data = data)
 +
   
 +
    cat("--- ANOVA Table ---\n")
 +
    print(summary(model))
 +
  }
 +
 
 +
  # Post-hoc tests if there's a significant time effect
 +
  cat("\n--- Post-hoc Comparisons ---\n")
 +
  if (require(emmeans)) {
 +
    # Check for significant time effect (optional: adjust p-value threshold as needed)
 +
    if (method == "mixed") {
 +
      anova_res <- anova(model)
 +
      time_p <- anova_res[time_var, "Pr(>F)"]
 +
    } else {
 +
      anova_res <- summary(model)
 +
      time_p <- anova_res[[2]][[1]]$"Pr(>F)"[1]  # Assuming time is the first term in Error strata
 +
    }
 +
   
 +
    if (!is.na(time_p) && time_p < 0.05) {
 +
      if (method == "mixed") {
 +
        # Dynamically build the specs formula
 +
        specs_formula <- formula(paste("pairwise ~", time_var))
 +
        emm_time <- emmeans(model, specs = specs_formula, adjust = "tukey")
 +
      } else {
 +
        # For traditional ANOVA, fit a linear model for post-hoc tests
 +
        if (is.null(between_vars)) {
 +
          lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var))
 +
        } else {
 +
          lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var, "+", paste(between_vars, collapse = " + ")))
 +
        }
 +
        lm_model <- lm(lm_formula, data = data)
 +
       
 +
        # Dynamically build the specs formula
 +
        specs_formula <- formula(paste("pairwise ~", time_var))
 +
        emm_time <- emmeans(lm_model, specs = specs_formula, adjust = "tukey")
 +
      }
 +
      print(emm_time$contrasts)
 +
    } else {
 +
      cat("No significant time effect detected (p >= 0.05). Skipping post-hoc comparisons.\n")
 +
    }
 +
  }
 +
 
 +
  # Effect sizes
 +
  cat("\n--- Effect Sizes ---\n")
 +
  if (method == "traditional") {
 +
    aov_summary <- summary(model)
 +
    if (!is.null(between_vars)) {
 +
      # This is more complex with between-subjects factors
 +
      cat("Note: Effect size calculation for factorial rANOVA requires manual computation\n")
 +
    } else {
 +
      SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
 +
      SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
 +
      eta_sq <- SS_time / (SS_time + SS_error)
 +
      cat("Partial eta-squared:", round(eta_sq, 3), "\n")
 +
    }
 +
  }
 +
 
 +
  return(model)
 +
}
  
Do a repeated measures ANOVA test on for the overall RM ANOVA:
+
# Example usage with built-in dataset
$Ho: µf = µb = µu= µn$
+
cat("\n\n=== EXAMPLE: Chick Weight Dataset ===\n")
 +
data(ChickWeight)
 +
ChickWeight$Time <- factor(ChickWeight$Time)
 +
ChickWeight$Chick <- factor(ChickWeight$Chick)
  
$Ha$:  At least one treatment mean is different from another.
+
# Run analysis
+
result <- run_ranova_analysis(
The data set is listed as below:
+
  data = ChickWeight,
<center>
+
  subject_var = "Chick",
{| class="wikitable" style="text-align:center; width:35%" border="1"
+
  time_var = "Time",
|-
+
  response_var = "weight",
|Part|| Forward ||Backward|| Unrelated|| Non-word|| Sum ||  ||
+
   between_vars = "Diet",
|-
+
  method = "mixed"
|   || f || b || u || n || Part.|| ||
+
)
|-
+
</pre>
|1|| 0.2||0.1||0.4|| 0.7||1.4||
 
|-
 
|2 ||0.5||0.3||0.8||0.9||2.5||
 
|-
 
|3|| 0.4||0.3||0.6||0.8||2.1||$\sum F^{2}= 1.36$
 
|-
 
|4 ||0.4||0.2||0.8||0.9||2.3||$\sum B^{2}=.58$
 
|-
 
|5|| 0.6||0.4||0.8||0.8||2.6||$\sum U^{2}=3.82$
 
|-
 
|6|| 0.3||0.3||0.5||0.8||1.9||$\sum N^{2}= 6.63$
 
|-
 
|7|| 0.1||0.1||0.5||0.7||1.4||$\sum P^{2}= 385$
 
|-
 
|8|| 0.2||0.1||0.6||0.9||1.8||$\sum X_{t}^{2}= 12.39$
 
|-
 
|9|| 0.3||0.2||0.4||0.7||1.6||
 
|-
 
|10|| 0.4||0.2||0.6||0.9||2.1||
 
|-
 
|$\sum$||3.4||2.2||6||8.1||19.7||All = 19.7
 
|}
 
</center>
 
  
 +
===Common Issues and Solutions===
  
+
==== Missing Data Patterns====
  
 +
<pre>
 +
# Identify missing data patterns
 +
library(mice)
  
 +
analyze_missing_patterns <- function(data, subject_var, time_var, response_var) {
 +
  # Convert to wide format
 +
  wide_data <- reshape(data,
 +
                      idvar = subject_var,
 +
                      timevar = time_var,
 +
                      direction = "wide")
 +
 
 +
  # Create pattern matrix
 +
  pattern_matrix <- md.pattern(wide_data[, -1], plot = FALSE)
 +
 
 +
  cat("Missing Data Patterns:\n")
 +
  print(pattern_matrix)
 +
 
 +
  # Visualize missingness
 +
  if (require(VIM)) {
 +
    aggr_plot <- aggr(wide_data[, -1],
 +
                      numbers = TRUE,
 +
                      sortVars = TRUE,
 +
                      labels = names(wide_data)[-1],
 +
                      cex.axis = 0.7,
 +
                      gap = 3,
 +
                      ylab = c("Missing Data Pattern", "Frequency"))
 +
  }
 +
 
 +
  # Test for MCAR using Little's test
 +
  if (require(norm2)) {
 +
    # Prepare data for Little's test
 +
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
 +
    test_data <- wide_data[, response_cols]
 +
   
 +
    # Perform Little's MCAR test
 +
    little_test <- mcar.test(test_data)
 +
    cat("\nLittle's MCAR Test:\n")
 +
    print(little_test)
 +
  }
 +
 
 +
  return(pattern_matrix)
 +
}
  
 +
# Example
 +
missing_pattern <- analyze_missing_patterns(clinical_data, "subject", "time", "response")
 +
</pre>
  
 +
====Violation of Sphericity====
  
 +
First we define the worker function.
  
 +
<pre>
 +
handle_sphericity_violation <- function(data, subject_var, time_var, response_var) {
 +
  cat("=== Handling Sphericity Violation ===\n")
 +
 
 +
  # 1. Use Greenhouse-Geisser or Huynh-Feldt corrections
 +
  cat("\n1. Apply epsilon corrections:\n")
 +
  # (Implementation shown in check_ranova_assumptions function above)
 +
 
 +
  # 2. Use multivariate approach (MANOVA)
 +
  cat("\n2. Multivariate approach (MANOVA):\n")
 +
 
 +
  # Convert to wide format
 +
  wide_data <- reshape(data,
 +
                      idvar = subject_var,
 +
                      timevar = time_var,
 +
                      direction = "wide")
 +
 
 +
  # Extract response columns
 +
  response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
 +
  response_matrix <- as.matrix(wide_data[, response_cols])
 +
 
 +
  # Fit MANOVA
 +
  manova_result <- manova(response_matrix ~ 1)
 +
 
 +
  # Test using Pillai's trace (most robust)
 +
  manova_summary <- summary(manova_result, test = "Pillai")
 +
  print(manova_summary)
 +
 
 +
  # 3. Use linear mixed models with unstructured covariance
 +
  cat("\n3. Linear mixed model with unstructured covariance:\n")
 +
  library(nlme)
 +
 
 +
  # Fit model with unstructured covariance
 +
  model_unstructured <- lme(as.formula(paste(response_var, "~", time_var)),
 +
                          random = ~ 1 | subject_var,
 +
                          correlation = corSymm(form = ~ as.numeric(get(time_var)) | subject_var),
 +
                          data = data,
 +
                          control = lmeControl(opt = "optim"))
 +
 
 +
  print(summary(model_unstructured))
 +
 
 +
  # Compare with compound symmetry model
 +
  model_compound <- lme(as.formula(paste(response_var, "~", time_var)),
 +
                      random = ~ 1 | subject_var,
 +
                      correlation = corCompSymm(form = ~ 1 | subject_var),
 +
                      data = data)
 +
 
 +
  cat("\nModel comparison (unstructured vs. compound symmetry):\n")
 +
  print(anova(model_unstructured, model_compound))
 +
}
 +
</pre>
  
 +
Next, we test the ''handle_sphericity_violation ()'' function using a small toy dataset with 4 subjects and 3 time points
 +
to keep things simple and illustrative (avoiding potential issues with missing data or large outputs in built-in datasets like ChickWeight).
 +
This dataset simulates repeated measures where the response increases over time, which might violate sphericity in a real analysis.
  
 +
<pre>
 +
# Load required library
 +
library(nlme)
  
 +
# Create a small toy dataset for demonstration (modified to avoid collinearity/rank deficiency)
 +
data <- data.frame(
 +
  subject = rep(1:4, each = 3),
 +
  time = rep(1:3, 4),
 +
  response = c(5, 7, 8, 5.5, 6, 7.5, 4, 5, 5.5, 6, 8, 9)
 +
)
  
 +
# Define the function (with fixes for grouping variable and time conversion)
 +
handle_sphericity_violation <- function(data, subject_var, time_var, response_var) {
 +
  cat("=== Handling Sphericity Violation ===\n")
 +
 
 +
  # 1. Use Greenhouse-Geisser or Huynh-Feldt corrections
 +
  cat("\n1. Apply epsilon corrections:\n")
 +
  # (Implementation shown in check_ranova_assumptions function above)
 +
 
 +
  # 2. Use multivariate approach (MANOVA) for time effect
 +
  cat("\n2. Multivariate approach (MANOVA):\n")
 +
  # Convert to wide format
 +
  wide_data <- reshape(data, idvar = subject_var, timevar = time_var, direction = "wide")
 +
  # Extract response columns (assuming sorted by time levels)
 +
  response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
 +
  response_matrix <- as.matrix(wide_data[, response_cols])
 +
  # Create successive differences to test time effect (k-1 variables)
 +
  if (ncol(response_matrix) > 1) {
 +
    diff_matrix <- response_matrix[, 2:ncol(response_matrix)] - response_matrix[, 1:(ncol(response_matrix)-1)]
 +
    # Fit MANOVA on differences ~1 (tests if mean differences are zero)
 +
    manova_result <- manova(diff_matrix ~ 1)
 +
    # Test using Pillai's trace (most robust)
 +
    manova_summary <- summary(manova_result, test = "Pillai")
 +
    print(manova_summary)
 +
  } else {
 +
    cat("Not enough time points for MANOVA.\n")
 +
  }
 +
 
 +
  # 3. Use linear mixed models with unstructured covariance
 +
  cat("\n3. Linear mixed model with unstructured covariance:\n")
 +
  # library(nlme) # Already loaded globally
 +
 
 +
  # Convert grouping variable to factor (required for lme grouping)
 +
  data[[subject_var]] <- factor(data[[subject_var]])
 +
 
 +
  # Convert time_var to numeric if it's a factor (for correlation covariate)
 +
  if (is.factor(data[[time_var]])) {
 +
    data[[time_var]] <- as.numeric(as.character(data[[time_var]]))
 +
  }
 +
 
 +
  # Fit model with unstructured covariance (dynamic formula)
 +
  cor_form <- as.formula(paste("~", time_var, "|", subject_var))
 +
 
 +
  # *** FIX APPLIED HERE: Create random formula dynamically ***
 +
  random_form <- as.formula(paste("~ 1 |", subject_var))
 +
 
 +
  model_unstructured <- lme(as.formula(paste(response_var, "~", time_var)),
 +
                            random = random_form, # Use the dynamically created formula
 +
                            correlation = corSymm(form = cor_form),
 +
                            data = data,
 +
                            control = lmeControl(opt = "optim"))
 +
  print(summary(model_unstructured))
 +
 
 +
  # Compare with compound symmetry model
 +
  model_compound <- lme(as.formula(paste(response_var, "~", time_var)),
 +
                        random = random_form, # Also use the dynamically created formula
 +
                        correlation = corCompSymm(form = as.formula(paste("~ 1 |", subject_var))),
 +
                        data = data)
 +
  cat("\nModel comparison (unstructured vs. compound symmetry):\n")
 +
  print(anova(model_unstructured, model_compound))
 +
}
  
 +
# Call the function with the toy dataset
 +
handle_sphericity_violation(data, "subject", "time", "response")
 +
</pre>
  
 +
===Problems and Exercises===
  
 +
1. '''Conceptual Problems''':
 +
  a) Derive the expected mean squares for the rANOVA model
 +
  b) Prove that the F-test for time effects in rANOVA follows an F-distribution under the null hypothesis
 +
  c) Show how the intraclass correlation coefficient relates to the variance components in the mixed model
  
 +
2. '''Applied Problems''':
 +
  a) Analyze the [http://wiki.socr.umich.edu/index.php/SOCR_Data_072108_LDT SOCR LDT dataset] using rANOVA
 +
  b) Conduct a repeated measures analysis of the [https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ChickWeight.html ChickWeight dataset] with diet as a between-subjects factor
 +
  c) Perform power analysis for a planned longitudinal study with 5 time points, expecting a medium effect (f² = 0.15), and correlation ρ = 0.6
  
  
 +
3. '''Simulation Study''':
 +
<pre>
 +
# Simulation to compare traditional rANOVA and linear mixed models
 +
simulate_rm_comparison <- function(n_sim = 500, n = 20, k = 4, missing_rate = 0.1) {
 +
 
 +
  type1_error_aov <- numeric(n_sim)
 +
  type1_error_lmm <- numeric(n_sim)
 +
 
 +
  for (sim in 1:n_sim) {
 +
    # Simulate data under null hypothesis (no time effect)
 +
    data <- expand.grid(
 +
      subject = factor(1:n),
 +
      time = factor(1:k)
 +
    )
 +
   
 +
    # Random intercept model
 +
    subject_effects <- rnorm(n, 0, 5)
 +
    data$response <- rep(subject_effects, each = k) + rnorm(n * k, 0, 2)
 +
   
 +
    # Introduce missing data
 +
    if (missing_rate > 0) {
 +
      missing_indices <- sample(1:nrow(data), size = round(nrow(data) * missing_rate))
 +
      data$response[missing_indices] <- NA
 +
    }
 +
   
 +
    # Method 1: Traditional rANOVA (complete cases only)
 +
    complete_data <- data[complete.cases(data), ]
 +
    complete_subjects <- length(unique(complete_data$subject))
 +
   
 +
    if (complete_subjects >= 3) {
 +
      try({
 +
        model_aov <- aov(response ~ time + Error(subject/time), data = complete_data)
 +
        aov_summary <- summary(model_aov)
 +
        p_aov <- aov_summary[[2]][[1]]$"Pr(>F)"[1]
 +
        type1_error_aov[sim] <- p_aov < 0.05
 +
      }, silent = TRUE)
 +
    }
 +
   
 +
    # Method 2: Linear mixed model (uses all available data)
 +
    try({
 +
      library(lmerTest)
 +
      model_lmm <- lmer(response ~ time + (1|subject), data = data,
 +
                        na.action = na.omit)
 +
      p_lmm <- anova(model_lmm)$"Pr(>F)"[1]
 +
      type1_error_lmm[sim] <- p_lmm < 0.05
 +
    }, silent = TRUE)
 +
  }
 +
 
 +
  cat("Simulation Results (n =", n_sim, "):\n")
 +
  cat("Type I error rate - Traditional rANOVA:",
 +
      mean(type1_error_aov, na.rm = TRUE), "\n")
 +
  cat("Type I error rate - Linear Mixed Model:",
 +
      mean(type1_error_lmm, na.rm = TRUE), "\n")
 +
 
 +
  return(data.frame(
 +
    simulation = 1:n_sim,
 +
    aov_error = type1_error_aov,
 +
    lmm_error = type1_error_lmm
 +
  ))
 +
}
  
 +
# Run simulation
 +
sim_results <- simulate_rm_comparison(n_sim = 200, n = 15, k = 3, missing_rate = 0.1)
 +
</pre>
  
 +
===References===
  
 +
1. Maxwell, S. E., & Delaney, H. D. (2004). *Designing Experiments and Analyzing Data: A Model Comparison Perspective* (2nd ed.). Psychology Press.
  
 +
2. Pinheiro, J. C., & Bates, D. M. (2000). *Mixed-Effects Models in S and S-PLUS*. Springer.
  
 +
3. Verbeke, G., & Molenberghs, G. (2000). *Linear Mixed Models for Longitudinal Data*. Springer.
  
 +
4. Field, A., Miles, J., & Field, Z. (2012). *Discovering Statistics Using R*. Sage Publications.
  
 +
5. Gueorguieva, R., & Krystal, J. H. (2004). Move over ANOVA: Progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry. *Archives of General Psychiatry, 61*(3), 310-317.
  
 +
===Online Resources===
 +
* [https://cran.r-project.org/web/views/RepeatedMeasures.html Repeated Measures Analysis in R]
 +
* [https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf Linear Mixed Models with lme4]
 +
* [https://www.socr.umich.edu SOCR Resources and Datasets]
 +
* [https://stats.idre.ucla.edu/r/seminars/repeated-measures-analysis-with-r/ UCLA R Seminar on Repeated Measures]
  
 
<hr>
 
<hr>
* SOCR Home page: http://www.socr.umich.edu
+
* SOCR Home page: https://www.socr.umich.edu
  
 
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_rANOVA}}
 
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_rANOVA}}

Latest revision as of 13:35, 9 December 2025

Scientific Methods for Health Sciences - Repeated Measures Analysis of Variance (rANOVA)

Overview

Repeated Measures Analysis of Variance (rANOVA) is a statistical method for analyzing data where the same subjects are measured multiple times under different conditions or at different time points. Unlike independent ANOVA, rANOVA accounts for the correlation between repeated measurements on the same subject, increasing statistical power by reducing error variance attributable to individual differences. This section provides a comprehensive treatment of rANOVA, including its mathematical foundations, assumptions, modern alternatives, and practical implementation in R.

Motivation

Consider a clinical trial measuring blood pressure in patients at baseline, 3 months, and 6 months post-treatment. Using independent ANOVA would violate the independence assumption since measurements are correlated within subjects. rANOVA addresses this by:

1. Partitioning variance into between-subjects and within-subjects components

2. Increasing statistical power by reducing error variance

3. Accounting for individual differences through subject-specific effects

4. Enabling longitudinal analysis of change over time

For studies with missing data, complex covariance structures, or unbalanced designs, Linear Mixed Models (LMMs) provide a more flexible alternative to traditional rANOVA.

Theory

Mathematical Model

Univariate Approach

For \(n\) subjects measured at \(k\) time points, the univariate rANOVA model is\[ y_{ij} = \mu + \pi_i + \tau_j + \varepsilon_{ij}, \quad i=1,\ldots,n,\; j=1,\ldots,k \] where: - \(y_{ij}\) is the measurement for subject \(i\) at time \(j\) - \(\mu\) is the overall mean - \(\pi_i \sim N(0, \sigma_{\pi}^2)\) is the random subject effect - \(\tau_j\) is the fixed time effect (\(\sum_{j=1}^k \tau_j = 0\)) - \(\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon}^2)\) is the within-subject error

The covariance structure assumes\[ \text{Cov}(y_{ij}, y_{i'j'}) = \begin{cases} \sigma_{\pi}^2 + \sigma_{\varepsilon}^2 & \text{if } i=i', j=j' \\ \sigma_{\pi}^2 & \text{if } i=i', j \neq j' \\ 0 & \text{if } i \neq i' \end{cases} \]

Matrix Formulation

Let \(\mathbf{y}_i = (y_{i1}, \ldots, y_{ik})^\top\) be the vector of measurements for subject \(i\)\[ \mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol{\varepsilon}_i \] where: - \(\mathbf{X}_i\) is the design matrix for fixed effects - \(\boldsymbol{\beta}\) contains fixed effects (including time effects) - \(\mathbf{Z}_i\) is the design matrix for random effects - \(\mathbf{b}_i \sim N(\mathbf{0}, \mathbf{G})\) are random subject effects - \(\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R}_i)\) is the within-subject error

For compound symmetry structure\[ \text{Var}(\mathbf{y}_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^\top + \mathbf{R}_i = \sigma_{\pi}^2\mathbf{J}_k + \sigma_{\varepsilon}^2\mathbf{I}_k \] where \(\mathbf{J}_k\) is a \(k \times k\) matrix of ones and \(\mathbf{I}_k\) is the identity matrix.

Variance Partitioning

The total sum of squares decomposes as\[ SST = SS_{BetweenSubjects} + SS_{WithinSubjects} \] \( SS_{WithinSubjects} = SS_{Time} + SS_{Error} \)

More formally\[ \begin{aligned} SST &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Subjects} &= k \sum_{i=1}^n (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Time} &= n \sum_{j=1}^k (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Error} &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2 \end{aligned} \]

Degrees of freedom\[ \begin{aligned} df_{Total} &= nk - 1 \\ df_{Subjects} &= n - 1 \\ df_{Time} &= k - 1 \\ df_{Error} &= (n - 1)(k - 1) \end{aligned} \]

Mean squares\[ \begin{aligned} MS_{Subjects} &= SS_{Subjects} / (n-1) \\ MS_{Time} &= SS_{Time} / (k-1) \\ MS_{Error} &= SS_{Error} / ((n-1)(k-1)) \end{aligned} \]

F-statistic for time effect\[ F = \frac{MS_{Time}}{MS_{Error}} \sim F_{k-1, (n-1)(k-1)} \quad \text{under } H_0: \tau_1 = \cdots = \tau_k \]

Sphericity and Corrections

The sphericity assumption (also called circularity) requires\[ \text{Var}(y_{ij} - y_{ij'}) = \text{constant for all } j \neq j' \] Equivalently, the covariance matrix of the difference scores has equal variances and equal covariances.

Mauchly's Test for Sphericity: \( W = \frac{| \mathbf{E} |}{(\text{tr}(\mathbf{E})/p)^{p}} \sim \chi^2_{(p(p-1)/2 - 1)} \) where \(\mathbf{E}\) is the \(p \times p\) covariance matrix of the difference scores, \(p = k-1\).

When sphericity is violated, corrections adjust degrees of freedom:

1. Greenhouse-Geisser correction\[\hat{\epsilon} = \frac{(\text{tr}(\mathbf{E}))^2}{(p-1)\sum_{i=1}^p \lambda_i^2}\] \(F_{GG} \sim F_{\hat{\epsilon}(k-1), \hat{\epsilon}(n-1)(k-1)}\)

2. Huynh-Feldt correction\[\tilde{\epsilon} = \frac{n(k-1)\hat{\epsilon} - 2}{(k-1)[(n-1) - (k-1)\hat{\epsilon}]}\] \(F_{HF} \sim F_{\tilde{\epsilon}(k-1), \tilde{\epsilon}(n-1)(k-1)}\)

4) Multivariate Approach (MANOVA)

For \(k\) time points, consider the \((k-1) \times 1\) vector of difference scores \(\mathbf{d}_i = (y_{i2}-y_{i1}, \ldots, y_{ik}-y_{i,k-1})^\top\)\[ \mathbf{d}_i \sim N_{k-1}(\boldsymbol{\mu}_d, \boldsymbol{\Sigma}_d) \] Test \(H_0: \boldsymbol{\mu}_d = \mathbf{0}\) using Hotelling's \(T^2\)\[ T^2 = n\bar{\mathbf{d}}^\top \mathbf{S}_d^{-1} \bar{\mathbf{d}} \sim \frac{(n-1)(k-1)}{n-k+1} F_{k-1, n-k+1} \] where \(\bar{\mathbf{d}}\) is the sample mean and \(\mathbf{S}_d\) is the sample covariance of difference scores.

Assumptions and Diagnostics

Key Assumptions: 1. Normality: Residuals \(\varepsilon_{ij}\) should be normally distributed 2. Sphericity: Equal variances of differences between all pairs of repeated measures 3. Missing Data: Missing completely at random (MCAR) for valid inference 4. No Time × Subject Interaction: Treatment effects are consistent across subjects

Diagnostic Procedures:

# Comprehensive rANOVA diagnostic function
check_ranova_assumptions <- function(data, subject_var, time_var, response_var) {
  
  cat("=== REPEATED MEASURES ANOVA DIAGNOSTICS ===\n")
  
  # 1. Normality of residuals
  # Fit a linear model ignoring the repeated measures structure first
  model_lm <- lm(as.formula(paste(response_var, "~", time_var)), data = data)
  residuals <- resid(model_lm)
  
  par(mfrow = c(2, 3))
  
  # Q-Q plot
  qqnorm(residuals, main = "Q-Q Plot of Residuals")
  qqline(residuals)
  
  # Shapiro-Wilk test
  shapiro_test <- shapiro.test(residuals)
  cat("1. Normality (Shapiro-Wilk): W =", round(shapiro_test$statistic, 4),
      "p =", format.pval(shapiro_test$p.value, digits = 4), "\n")
  
  # 2. Sphericity test (requires ez package)
  if (require(ez, quietly = TRUE)) {
    # Convert to wide format for Mauchly's test
    wide_data <- reshape(data, 
                         idvar = subject_var, 
                         timevar = time_var, 
                         direction = "wide")
    
    # Remove subject column
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
    response_matrix <- as.matrix(wide_data[, response_cols])
    
    # Mauchly's test using car package
    if (require(car, quietly = TRUE)) {
      # Create an idata frame for repeated measures
      time_levels <- unique(data[[time_var]])
      idata <- data.frame(time = factor(time_levels))
      
      # Fit linear model for repeated measures
      mlm <- lm(response_matrix ~ 1)
      
      # Mauchly's test
      mauchly_test <- Anova(mlm, idata = idata, idesign = ~time, type = "III")
      mauchly_result <- mauchly_test$sphericity.tests
      
      cat("\n2. Mauchly's Test for Sphericity:\n")
      print(mauchly_result)
      
      # Greenhouse-Geisser and Huynh-Feldt corrections
      epsilon_GG <- mauchly_result["Greenhouse-Geisser epsilon", "GG eps"]
      epsilon_HF <- mauchly_result["Huynh-Feldt epsilon", "HF eps"]
      cat("\nEpsilon corrections:\n")
      cat("  Greenhouse-Geisser:", round(epsilon_GG, 3), "\n")
      cat("  Huynh-Feldt:", round(epsilon_HF, 3), "\n")
    }
  }
  
  # 3. Homogeneity of variance across time points
  boxplot(as.formula(paste(response_var, "~", time_var)), 
          data = data,
          main = "Boxplots by Time Point",
          xlab = time_var, ylab = response_var)
  
  # Levene's test for homogeneity of variance
  if (require(car, quietly = TRUE)) {
    levene_test <- leveneTest(as.formula(paste(response_var, "~ factor(", time_var, ")")), 
                              data = data)
    cat("\n3. Levene's Test for Homogeneity of Variance:\n")
    print(levene_test)
  }
  
  # 4. Individual profiles plot
  data_sorted <- data[order(data[[subject_var]], data[[time_var]]), ]
  
  # Create empty plot
  plot(1, type = "n", 
       xlim = range(as.numeric(data[[time_var]])), 
       ylim = range(data[[response_var]]),
       xlab = time_var, ylab = response_var,
       main = "Individual Profiles")
  
  # Plot each subject's profile
  subjects <- unique(data[[subject_var]])
  colors <- rainbow(length(subjects))
  
  for (i in seq_along(subjects)) {
    subject_data <- data[data[[subject_var]] == subjects[i], ]
    lines(as.numeric(subject_data[[time_var]]), 
          subject_data[[response_var]], 
          col = colors[i], type = "b", lwd = 1)
  }
  
  # Add group mean profile
  group_means <- aggregate(as.formula(paste(response_var, "~", time_var)), 
                          data = data, mean)
  lines(as.numeric(group_means[[time_var]]), 
        group_means[[response_var]], 
        col = "black", lwd = 3, type = "b")
  
  legend("topright", legend = c("Individual", "Group Mean"), 
         col = c("gray", "black"), lwd = c(1, 3), bty = "n")
  
  # 5. Residuals vs fitted plot
  plot(fitted(model_lm), residuals,
       xlab = "Fitted Values", ylab = "Residuals",
       main = "Residuals vs Fitted")
  abline(h = 0, col = "red", lty = 2)
  
  # 6. Histogram of residuals
  hist(residuals, main = "Histogram of Residuals",
       xlab = "Residuals", col = "lightblue", freq = FALSE)
  curve(dnorm(x, mean = mean(residuals), sd = sd(residuals)), 
        add = TRUE, col = "red", lwd = 2)
  
  par(mfrow = c(1, 1))
  
  # Return diagnostic summary
  return(list(
    shapiro_test = shapiro_test,
    sphericity_tests = if(exists("mauchly_result")) mauchly_result else NULL,
    homogeneity_test = if(exists("levene_test")) levene_test else NULL
  ))
}

Applications

Example 1: Exercise Training Intervention

# Recreate the example from the motivation section
exercise_data <- data.frame(
  subject = factor(rep(1:6, each = 3)),
  time = factor(rep(c("Pre", "3Months", "6Months"), 6)),
  fitness = c(45, 50, 55,    # Subject 1
              42, 42, 45,    # Subject 2
              36, 41, 43,    # Subject 3
              39, 35, 40,    # Subject 4
              51, 55, 59,    # Subject 5
              44, 49, 56)    # Subject 6
)

cat("=== Exercise Training Intervention Study ===\n")
cat("Research question: Does fitness level change over time?\n\n")

# Display the data
cat("Dataset:\n")
print(exercise_data)

# Summary statistics
cat("\nSummary statistics by time:\n")
print(aggregate(fitness ~ time, data = exercise_data, 
                FUN = function(x) c(mean = mean(x), sd = sd(x))))

# Run diagnostics first
cat("\n=== Model Diagnostics ===\n")
diagnostics <- check_ranova_assumptions(exercise_data, "subject", "time", "fitness")

# Traditional rANOVA using aov()
cat("\n=== Traditional rANOVA ===\n")
model_aov <- aov(fitness ~ time + Error(subject/time), data = exercise_data)
print(summary(model_aov))

# Extract F-statistic and p-value
aov_summary <- summary(model_aov)
F_time <- aov_summary[[2]][[1]]$"F value"[1]
df_time <- aov_summary[[2]][[1]]$"Df"[1]
df_error <- aov_summary[[2]][[1]]$"Df"[2]
p_time <- aov_summary[[2]][[1]]$"Pr(>F)"[1]

cat(sprintf("\nF(%d, %d) = %.2f, p = %.3f\n", 
            df_time, df_error, F_time, p_time))

# Effect size calculation (Partial eta-squared)
SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
eta_sq <- SS_time / (SS_time + SS_error)
cat("Partial eta-squared (effect size):", round(eta_sq, 3), "\n")

# Post-hoc pairwise comparisons with Bonferroni correction
cat("\n=== Post-hoc Pairwise Comparisons ===\n")
library(emmeans)

# Fit linear model for post-hoc tests
model_lm <- lm(fitness ~ time + subject, data = exercise_data)
emm_time <- emmeans(model_lm, specs = pairwise ~ time, adjust = "bonferroni")
print(emm_time$contrasts)

# Visualization
library(ggplot2)
ggplot(exercise_data, aes(x = time, y = fitness, group = subject)) +
  geom_line(aes(color = subject), alpha = 0.6) +
  geom_point(aes(color = subject), size = 2) +
  stat_summary(aes(group = 1), fun = mean, geom = "line", 
               color = "black", size = 1.5) +
  stat_summary(fun.data = mean_se, geom = "errorbar", 
               width = 0.1, color = "black") +
  labs(title = "Fitness Levels Over Time",
       subtitle = "Individual profiles with group mean",
       x = "Time Point", y = "Fitness Score") +
  theme_minimal()

Example 2: Clinical Trial with Missing Data (Using Mixed Models)

# Simulated clinical trial data with missing values
set.seed(123)
n_subjects <- 30
n_timepoints <- 4

clinical_data <- expand.grid(
  subject = factor(1:n_subjects),
  time = factor(paste0("Week_", 0:3)),
  treatment = factor(rep(c("Drug", "Placebo"), each = n_subjects/2))
)

# Generate data with treatment effect and time trend
clinical_data$response <- with(clinical_data, 
  50 +  # Baseline
  ifelse(treatment == "Drug", 5, 0) +  # Treatment effect
  as.numeric(time) * 2 +  # Time effect (2 units per week)
  ifelse(treatment == "Drug", as.numeric(time) * 1, 0) +  # Treatment × time interaction
  rep(rnorm(n_subjects, 0, 3), each = n_timepoints) +  # Random intercept
  rnorm(n_subjects * n_timepoints, 0, 2)  # Residual error
)

# Introduce missing data (MCAR)
missing_indices <- sample(1:nrow(clinical_data), size = 20)
clinical_data$response[missing_indices] <- NA

cat("=== Clinical Trial with Repeated Measures ===\n")
cat("Research question: Does the treatment affect response over time?\n\n")

cat("Missing data pattern:\n")
print(table(is.na(clinical_data$response)))

# Approach 1: Linear Mixed Model (recommended for missing data)
cat("\n=== Linear Mixed Model Approach ===\n")
library(lme4)
library(lmerTest)

# Fit mixed model with random intercepts
mixed_model <- lmer(response ~ time * treatment + (1|subject), 
                    data = clinical_data, 
                    na.action = na.omit)

cat("\nModel summary:\n")
print(summary(mixed_model))

# Test fixed effects
cat("\nTests of fixed effects:\n")
print(anova(mixed_model))

# Random effects variance components
cat("\nVariance components:\n")
print(VarCorr(mixed_model))

# Estimated marginal means
if (require(emmeans)) {
  cat("\n=== Estimated Marginal Means ===\n")
  emm_treatment <- emmeans(mixed_model, specs = ~ treatment)
  print(emm_treatment)
  
  emm_time <- emmeans(mixed_model, specs = ~ time)
  print(emm_time)
  
  # Interaction plot
  emm_interaction <- emmeans(mixed_model, specs = ~ treatment:time)
  print(emm_interaction)
  

# Robust Interaction plot
if (require(emmeans)) {
  cat("\n=== Estimated Marginal Means ===\n")
  emm_treatment <- emmeans(mixed_model, specs = ~ treatment)
  print(emm_treatment)
  
  emm_time <- emmeans(mixed_model, specs = ~ time)
  print(emm_time)
  
  # Interaction plot
  emm_interaction <- emmeans(mixed_model, specs = ~ treatment:time)
  
  # Convert to data frame and ensure confidence intervals are present
  plot_data <- as.data.frame(emm_interaction)
  
  # Check if confidence interval columns exist
  if (!all(c("asymp.LCL", "asymp.LCL") %in% names(plot_data))) {
    # If not, get them using summary
    plot_data <- as.data.frame(summary(emm_interaction))
  }
  
  # Now plot
  ggplot(plot_data, aes(x = time, y = emmean, group = treatment, color = treatment)) +
    geom_line(size = 1.5) +
    geom_errorbar(aes(ymin = asymp.LCL, ymax = asymp.UCL), width = 0.1) +
    geom_point(size = 3) +
    labs(title = "Treatment × Time Interaction",
         subtitle = "Estimated marginal means with 95% confidence intervals",
         x = "Time Point", y = "Response") +
    theme_minimal()
}

}

# Approach 2: Traditional rANOVA (for comparison)
cat("\n=== Traditional rANOVA (for comparison) ===\n")
# Remove subjects with missing data for traditional ANOVA
complete_data <- clinical_data[complete.cases(clinical_data), ]
complete_subjects <- unique(complete_data$subject)
cat("Subjects with complete data:", length(complete_subjects), "\n")

if (length(complete_subjects) >= 3) {
  model_aov2 <- aov(response ~ time * treatment + Error(subject/time), 
                    data = complete_data)
  print(summary(model_aov2))
} else {
  cat("Insufficient complete cases for traditional rANOVA\n")
}

Example 3: Real Dataset - CO2 Uptake in Plants

# Using built-in CO2 dataset from R
data(CO2)
cat("=== CO2 Uptake in Plants ===\n")
cat("Research question: Does CO2 uptake change with concentration?\n\n")

# Prepare data: focus on Quebec plants, type Qn1
co2_data <- CO2[CO2$Type == "Quebec" & CO2$Plant == "Qn1", ]
co2_data$conc_factor <- factor(co2_data$conc)

cat("Data structure:\n")
str(co2_data)

# Visual inspection
library(ggplot2)
ggplot(co2_data, aes(x = conc, y = uptake, group = 1)) +
  geom_line() +
  geom_point() +
  labs(title = "CO2 Uptake by Concentration",
       x = "CO2 Concentration (mL/L)", y = "Uptake (umol/m^2 sec)") +
  theme_minimal()

# Since we have only one plant, this is a single-subject repeated measures design
# We'll use a polynomial contrast analysis

cat("\n=== Polynomial Trend Analysis ===\n")
# Create orthogonal polynomial contrasts for concentration
conc_levels <- length(unique(co2_data$conc))
poly_contrasts <- contr.poly(conc_levels)

# Apply contrasts
co2_data$conc_factor <- factor(co2_data$conc)
contrasts(co2_data$conc_factor) <- poly_contrasts

# Fit model with polynomial contrasts
model_poly <- lm(uptake ~ conc_factor, data = co2_data)
print(summary(model_poly))

# Extract polynomial components
anova_poly <- anova(model_poly)
cat("\nPolynomial decomposition:\n")
print(anova_poly)

# Calculate proportion of variance explained by each polynomial component
SS_total <- sum(anova_poly$`Sum Sq`)
for (i in 1:nrow(anova_poly)) {
  prop_var <- anova_poly$`Sum Sq`[i] / SS_total
  cat(sprintf("Component %d: %.1f%% of variance\n", 
              i, prop_var * 100))
}

Advanced Topics

Power Analysis for Repeated Measures

# Power analysis for repeated measures designs
calculate_power_rm <- function(n, k, rho, f2, alpha = 0.05) {
  # n: number of subjects
  # k: number of time points
  # rho: correlation between repeated measures
  # f2: effect size (Cohen's f^2)
  
  # Degrees of freedom
  df1 <- k - 1
  df2 <- (n - 1) * (k - 1)
  
  # Non-centrality parameter
  lambda <- n * k * f2
  
  # Critical F value
  F_crit <- qf(1 - alpha, df1, df2)
  
  # Power
  power <- 1 - pf(F_crit, df1, df2, ncp = lambda)
  
  return(power)
}

# Example: Clinical trial with 20 subjects, 4 time points
power_example <- calculate_power_rm(n = 20, k = 4, rho = 0.5, f2 = 0.25)
cat("Power for repeated measures design:\n")
cat("Subjects: 20, Time points: 4, Effect size f² = 0.25\n")
cat("Estimated power:", round(power_example, 3), "\n")

# Power curve visualization
library(ggplot2)
n_range <- seq(10, 50, by = 5)
power_values <- sapply(n_range, function(n) calculate_power_rm(n, 4, 0.5, 0.25))

power_df <- data.frame(n = n_range, power = power_values)
ggplot(power_df, aes(x = n, y = power)) +
  geom_line(size = 1.5, color = "blue") +
  geom_point(size = 3) +
  geom_hline(yintercept = 0.8, linetype = "dashed", color = "red") +
  labs(title = "Power Analysis for Repeated Measures Design",
       subtitle = "k = 4 time points, ρ = 0.5, f² = 0.25",
       x = "Number of Subjects", y = "Statistical Power") +
  theme_minimal()

Handling Autocorrelated Errors

In this example, note that we show refitting the model using maximum likelihood (method = "ML") instead of REML. This often resolves the non-positive definite matrix issue for computing intervals on variance-covariance components, as ML doesn't apply the same adjustments that can destabilize the approximation under REML.


# When measurements are equally spaced in time, errors may be autocorrelated
library(nlme)

# Simulate data with AR(1) errors
set.seed(456)
n <- 15
k <- 5
time_points <- 1:k

ar_data <- data.frame(
  subject = rep(1:n, each = k),
  time = rep(time_points, n),
  treatment = rep(rep(c("A", "B"), length.out = n), each = k)
)

# Generate data with AR(1) errors
for (i in 1:n) {
  ar_errors <- arima.sim(model = list(ar = 0.7), n = k, sd = 2)
  idx <- ((i-1)*k + 1):(i*k)
  ar_data$response[idx] <- 50 + 
    ifelse(ar_data$treatment[idx] == "A", 5, 0) +
    ar_data$time[idx] * 2 +
    ar_errors
}

# Fit model with AR(1) correlation structure
model_ar1 <- lme(response ~ time * treatment, 
                 random = ~ 1 | subject,
                 correlation = corAR1(form = ~ time | subject),
                 data = ar_data)

# Refit the model using maximum likelihood (method = "ML") instead of REML. 
# This often resolves the non-positive definite matrix issue for computing intervals on variance-covariance components, 
# as ML doesn't apply the same adjustments that can destabilize the approximation under REML.
# Refit the model with ML
model_ar1_ml <- update(model_ar1, method = "ML")

cat("=== Initial Model with AR(1) Correlation Structure ===\n")
print(summary(model_ar1))

cat("=== Refit Model with AR(1) Correlation Structure and Maximum Likelihood  Estimation ===\n")
print(summary(model_ar1_ml))

cat("\nCorrelation parameter:", intervals(model_ar1_ml)$corStruct[2], "\n")

# Compare with compound symmetry model
model_cs <- lme(response ~ time * treatment, 
                random = ~ 1 | subject,
                correlation = corCompSymm(form = ~ 1 | subject),
                data = ar_data, 
                method = "ML")

# Model comparison
cat("\n=== Model Comparison ===\n")
print(anova(model_ar1_ml, model_cs))

Software Implementation

# Comprehensive rANOVA analysis function
run_ranova_analysis <- function(data, subject_var, time_var, response_var, between_vars = NULL, method = "mixed") {
  cat("=== REPEATED MEASURES ANOVA ANALYSIS ===\n\n")
  cat("Subject variable:", subject_var, "\n")
  cat("Time variable:", time_var, "\n")
  cat("Response variable:", response_var, "\n")
  if (!is.null(between_vars)) {
    cat("Between-subjects variables:", paste(between_vars, collapse = ", "), "\n")
  }
  cat("Method:", method, "\n\n")
  
  # Check for missing data
  missing_count <- sum(is.na(data[[response_var]]))
  cat("Missing observations:", missing_count, sprintf("(%.1f%%)\n", missing_count/nrow(data)*100))
  
  if (method == "mixed") {
    # Linear mixed model approach
    library(lme4)
    library(lmerTest)
    
    # Build formula
    if (is.null(between_vars)) {
      formula_str <- paste(response_var, "~", time_var, "+ (1|", subject_var, ")")
    } else {
      between_str <- paste(between_vars, collapse = " + ")
      # Add interactions with time
      interaction_str <- paste(paste(time_var, between_vars, sep = ":"), collapse = " + ")
      formula_str <- paste(response_var, "~", time_var, "*", between_str, "+ (1|", subject_var, ")")
    }
    cat("Fitting model:", formula_str, "\n\n")
    model <- lmer(as.formula(formula_str), data = data)
    
    cat("--- Model Summary ---\n")
    print(summary(model))
    
    cat("\n--- Tests of Fixed Effects ---\n")
    print(anova(model))
    
    # Variance components
    cat("\n--- Variance Components ---\n")
    print(VarCorr(model))
    
    # Calculate intraclass correlation coefficient (ICC)
    vc <- VarCorr(model)
    sigma_subject <- as.data.frame(vc)$vcov[1]
    sigma_residual <- as.data.frame(vc)$vcov[2]
    ICC <- sigma_subject / (sigma_subject + sigma_residual)
    cat("\nIntraclass Correlation Coefficient (ICC):", round(ICC, 3), "\n")
    
  } else if (method == "traditional") {
    # Traditional rANOVA using aov()
    if (!is.null(between_vars)) {
      formula_str <- paste(response_var, "~", paste(between_vars, collapse = " + "), "+ Error(", subject_var, "/", time_var, ")")
    } else {
      formula_str <- paste(response_var, "~", time_var, "+ Error(", subject_var, "/", time_var, ")")
    }
    cat("Fitting model:", formula_str, "\n\n")
    model <- aov(as.formula(formula_str), data = data)
    
    cat("--- ANOVA Table ---\n")
    print(summary(model))
  }
  
  # Post-hoc tests if there's a significant time effect
  cat("\n--- Post-hoc Comparisons ---\n")
  if (require(emmeans)) {
    # Check for significant time effect (optional: adjust p-value threshold as needed)
    if (method == "mixed") {
      anova_res <- anova(model)
      time_p <- anova_res[time_var, "Pr(>F)"]
    } else {
      anova_res <- summary(model)
      time_p <- anova_res[[2]][[1]]$"Pr(>F)"[1]  # Assuming time is the first term in Error strata
    }
    
    if (!is.na(time_p) && time_p < 0.05) {
      if (method == "mixed") {
        # Dynamically build the specs formula
        specs_formula <- formula(paste("pairwise ~", time_var))
        emm_time <- emmeans(model, specs = specs_formula, adjust = "tukey")
      } else {
        # For traditional ANOVA, fit a linear model for post-hoc tests
        if (is.null(between_vars)) {
          lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var))
        } else {
          lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var, "+", paste(between_vars, collapse = " + ")))
        }
        lm_model <- lm(lm_formula, data = data)
        
        # Dynamically build the specs formula
        specs_formula <- formula(paste("pairwise ~", time_var))
        emm_time <- emmeans(lm_model, specs = specs_formula, adjust = "tukey")
      }
      print(emm_time$contrasts)
    } else {
      cat("No significant time effect detected (p >= 0.05). Skipping post-hoc comparisons.\n")
    }
  }
  
  # Effect sizes
  cat("\n--- Effect Sizes ---\n")
  if (method == "traditional") {
    aov_summary <- summary(model)
    if (!is.null(between_vars)) {
      # This is more complex with between-subjects factors
      cat("Note: Effect size calculation for factorial rANOVA requires manual computation\n")
    } else {
      SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
      SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
      eta_sq <- SS_time / (SS_time + SS_error)
      cat("Partial eta-squared:", round(eta_sq, 3), "\n")
    }
  }
  
  return(model)
}

# Example usage with built-in dataset
cat("\n\n=== EXAMPLE: Chick Weight Dataset ===\n")
data(ChickWeight)
ChickWeight$Time <- factor(ChickWeight$Time)
ChickWeight$Chick <- factor(ChickWeight$Chick)

# Run analysis
result <- run_ranova_analysis(
  data = ChickWeight,
  subject_var = "Chick",
  time_var = "Time",
  response_var = "weight",
  between_vars = "Diet",
  method = "mixed"
)

Common Issues and Solutions

Missing Data Patterns

# Identify missing data patterns
library(mice)

analyze_missing_patterns <- function(data, subject_var, time_var, response_var) {
  # Convert to wide format
  wide_data <- reshape(data, 
                       idvar = subject_var, 
                       timevar = time_var, 
                       direction = "wide")
  
  # Create pattern matrix
  pattern_matrix <- md.pattern(wide_data[, -1], plot = FALSE)
  
  cat("Missing Data Patterns:\n")
  print(pattern_matrix)
  
  # Visualize missingness
  if (require(VIM)) {
    aggr_plot <- aggr(wide_data[, -1], 
                      numbers = TRUE, 
                      sortVars = TRUE, 
                      labels = names(wide_data)[-1],
                      cex.axis = 0.7, 
                      gap = 3, 
                      ylab = c("Missing Data Pattern", "Frequency"))
  }
  
  # Test for MCAR using Little's test
  if (require(norm2)) {
    # Prepare data for Little's test
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
    test_data <- wide_data[, response_cols]
    
    # Perform Little's MCAR test
    little_test <- mcar.test(test_data)
    cat("\nLittle's MCAR Test:\n")
    print(little_test)
  }
  
  return(pattern_matrix)
}

# Example
missing_pattern <- analyze_missing_patterns(clinical_data, "subject", "time", "response")

Violation of Sphericity

First we define the worker function.

handle_sphericity_violation <- function(data, subject_var, time_var, response_var) {
  cat("=== Handling Sphericity Violation ===\n")
  
  # 1. Use Greenhouse-Geisser or Huynh-Feldt corrections
  cat("\n1. Apply epsilon corrections:\n")
  # (Implementation shown in check_ranova_assumptions function above)
  
  # 2. Use multivariate approach (MANOVA)
  cat("\n2. Multivariate approach (MANOVA):\n")
  
  # Convert to wide format
  wide_data <- reshape(data, 
                       idvar = subject_var, 
                       timevar = time_var, 
                       direction = "wide")
  
  # Extract response columns
  response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
  response_matrix <- as.matrix(wide_data[, response_cols])
  
  # Fit MANOVA
  manova_result <- manova(response_matrix ~ 1)
  
  # Test using Pillai's trace (most robust)
  manova_summary <- summary(manova_result, test = "Pillai")
  print(manova_summary)
  
  # 3. Use linear mixed models with unstructured covariance
  cat("\n3. Linear mixed model with unstructured covariance:\n")
  library(nlme)
  
  # Fit model with unstructured covariance
  model_unstructured <- lme(as.formula(paste(response_var, "~", time_var)),
                           random = ~ 1 | subject_var,
                           correlation = corSymm(form = ~ as.numeric(get(time_var)) | subject_var),
                           data = data,
                           control = lmeControl(opt = "optim"))
  
  print(summary(model_unstructured))
  
  # Compare with compound symmetry model
  model_compound <- lme(as.formula(paste(response_var, "~", time_var)),
                       random = ~ 1 | subject_var,
                       correlation = corCompSymm(form = ~ 1 | subject_var),
                       data = data)
  
  cat("\nModel comparison (unstructured vs. compound symmetry):\n")
  print(anova(model_unstructured, model_compound))
}

Next, we test the handle_sphericity_violation () function using a small toy dataset with 4 subjects and 3 time points to keep things simple and illustrative (avoiding potential issues with missing data or large outputs in built-in datasets like ChickWeight). This dataset simulates repeated measures where the response increases over time, which might violate sphericity in a real analysis.

# Load required library
library(nlme)

# Create a small toy dataset for demonstration (modified to avoid collinearity/rank deficiency)
data <- data.frame(
  subject = rep(1:4, each = 3),
  time = rep(1:3, 4),
  response = c(5, 7, 8, 5.5, 6, 7.5, 4, 5, 5.5, 6, 8, 9)
)

# Define the function (with fixes for grouping variable and time conversion)
handle_sphericity_violation <- function(data, subject_var, time_var, response_var) {
  cat("=== Handling Sphericity Violation ===\n")
  
  # 1. Use Greenhouse-Geisser or Huynh-Feldt corrections
  cat("\n1. Apply epsilon corrections:\n")
  # (Implementation shown in check_ranova_assumptions function above)
  
  # 2. Use multivariate approach (MANOVA) for time effect
  cat("\n2. Multivariate approach (MANOVA):\n")
  # Convert to wide format
  wide_data <- reshape(data, idvar = subject_var, timevar = time_var, direction = "wide")
  # Extract response columns (assuming sorted by time levels)
  response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
  response_matrix <- as.matrix(wide_data[, response_cols])
  # Create successive differences to test time effect (k-1 variables)
  if (ncol(response_matrix) > 1) {
    diff_matrix <- response_matrix[, 2:ncol(response_matrix)] - response_matrix[, 1:(ncol(response_matrix)-1)]
    # Fit MANOVA on differences ~1 (tests if mean differences are zero)
    manova_result <- manova(diff_matrix ~ 1)
    # Test using Pillai's trace (most robust)
    manova_summary <- summary(manova_result, test = "Pillai")
    print(manova_summary)
  } else {
    cat("Not enough time points for MANOVA.\n")
  }
  
  # 3. Use linear mixed models with unstructured covariance
  cat("\n3. Linear mixed model with unstructured covariance:\n")
  # library(nlme) # Already loaded globally
  
  # Convert grouping variable to factor (required for lme grouping)
  data[[subject_var]] <- factor(data[[subject_var]])
  
  # Convert time_var to numeric if it's a factor (for correlation covariate)
  if (is.factor(data[[time_var]])) {
    data[[time_var]] <- as.numeric(as.character(data[[time_var]]))
  }
  
  # Fit model with unstructured covariance (dynamic formula)
  cor_form <- as.formula(paste("~", time_var, "|", subject_var))
  
  # *** FIX APPLIED HERE: Create random formula dynamically ***
  random_form <- as.formula(paste("~ 1 |", subject_var)) 
  
  model_unstructured <- lme(as.formula(paste(response_var, "~", time_var)),
                            random = random_form, # Use the dynamically created formula
                            correlation = corSymm(form = cor_form),
                            data = data,
                            control = lmeControl(opt = "optim"))
  print(summary(model_unstructured))
  
  # Compare with compound symmetry model
  model_compound <- lme(as.formula(paste(response_var, "~", time_var)),
                        random = random_form, # Also use the dynamically created formula
                        correlation = corCompSymm(form = as.formula(paste("~ 1 |", subject_var))),
                        data = data)
  cat("\nModel comparison (unstructured vs. compound symmetry):\n")
  print(anova(model_unstructured, model_compound))
}

# Call the function with the toy dataset
handle_sphericity_violation(data, "subject", "time", "response")

Problems and Exercises

1. Conceptual Problems:

  a) Derive the expected mean squares for the rANOVA model
  b) Prove that the F-test for time effects in rANOVA follows an F-distribution under the null hypothesis
  c) Show how the intraclass correlation coefficient relates to the variance components in the mixed model

2. Applied Problems:

  a) Analyze the SOCR LDT dataset using rANOVA
  b) Conduct a repeated measures analysis of the ChickWeight dataset with diet as a between-subjects factor
  c) Perform power analysis for a planned longitudinal study with 5 time points, expecting a medium effect (f² = 0.15), and correlation ρ = 0.6


3. Simulation Study:

# Simulation to compare traditional rANOVA and linear mixed models
simulate_rm_comparison <- function(n_sim = 500, n = 20, k = 4, missing_rate = 0.1) {
  
  type1_error_aov <- numeric(n_sim)
  type1_error_lmm <- numeric(n_sim)
  
  for (sim in 1:n_sim) {
    # Simulate data under null hypothesis (no time effect)
    data <- expand.grid(
      subject = factor(1:n),
      time = factor(1:k)
    )
    
    # Random intercept model
    subject_effects <- rnorm(n, 0, 5)
    data$response <- rep(subject_effects, each = k) + rnorm(n * k, 0, 2)
    
    # Introduce missing data
    if (missing_rate > 0) {
      missing_indices <- sample(1:nrow(data), size = round(nrow(data) * missing_rate))
      data$response[missing_indices] <- NA
    }
    
    # Method 1: Traditional rANOVA (complete cases only)
    complete_data <- data[complete.cases(data), ]
    complete_subjects <- length(unique(complete_data$subject))
    
    if (complete_subjects >= 3) {
      try({
        model_aov <- aov(response ~ time + Error(subject/time), data = complete_data)
        aov_summary <- summary(model_aov)
        p_aov <- aov_summary[[2]][[1]]$"Pr(>F)"[1]
        type1_error_aov[sim] <- p_aov < 0.05
      }, silent = TRUE)
    }
    
    # Method 2: Linear mixed model (uses all available data)
    try({
      library(lmerTest)
      model_lmm <- lmer(response ~ time + (1|subject), data = data, 
                        na.action = na.omit)
      p_lmm <- anova(model_lmm)$"Pr(>F)"[1]
      type1_error_lmm[sim] <- p_lmm < 0.05
    }, silent = TRUE)
  }
  
  cat("Simulation Results (n =", n_sim, "):\n")
  cat("Type I error rate - Traditional rANOVA:", 
      mean(type1_error_aov, na.rm = TRUE), "\n")
  cat("Type I error rate - Linear Mixed Model:", 
      mean(type1_error_lmm, na.rm = TRUE), "\n")
  
  return(data.frame(
    simulation = 1:n_sim,
    aov_error = type1_error_aov,
    lmm_error = type1_error_lmm
  ))
}

# Run simulation
sim_results <- simulate_rm_comparison(n_sim = 200, n = 15, k = 3, missing_rate = 0.1)

References

1. Maxwell, S. E., & Delaney, H. D. (2004). *Designing Experiments and Analyzing Data: A Model Comparison Perspective* (2nd ed.). Psychology Press.

2. Pinheiro, J. C., & Bates, D. M. (2000). *Mixed-Effects Models in S and S-PLUS*. Springer.

3. Verbeke, G., & Molenberghs, G. (2000). *Linear Mixed Models for Longitudinal Data*. Springer.

4. Field, A., Miles, J., & Field, Z. (2012). *Discovering Statistics Using R*. Sage Publications.

5. Gueorguieva, R., & Krystal, J. H. (2004). Move over ANOVA: Progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry. *Archives of General Psychiatry, 61*(3), 310-317.

Online Resources




Translate this page:

(default)
Uk flag.gif

Deutsch
De flag.gif

Español
Es flag.gif

Français
Fr flag.gif

Italiano
It flag.gif

Português
Pt flag.gif

日本語
Jp flag.gif

България
Bg flag.gif

الامارات العربية المتحدة
Ae flag.gif

Suomi
Fi flag.gif

इस भाषा में
In flag.gif

Norge
No flag.png

한국어
Kr flag.gif

中文
Cn flag.gif

繁体中文
Cn flag.gif

Русский
Ru flag.gif

Nederlands
Nl flag.gif

Ελληνικά
Gr flag.gif

Hrvatska
Hr flag.gif

Česká republika
Cz flag.gif

Danmark
Dk flag.gif

Polska
Pl flag.png

România
Ro flag.png

Sverige
Se flag.gif