Difference between revisions of "SMHS rANOVA"

Latest revision as of 13:35, 9 December 2025

Scientific Methods for Health Sciences - Repeated Measures Analysis of Variance (rANOVA)

Overview

Repeated Measures Analysis of Variance (rANOVA) is a statistical method for analyzing data where the same subjects are measured multiple times under different conditions or at different time points. Unlike independent ANOVA, rANOVA accounts for the correlation between repeated measurements on the same subject, increasing statistical power by reducing error variance attributable to individual differences. This section provides a comprehensive treatment of rANOVA, including its mathematical foundations, assumptions, modern alternatives, and practical implementation in R.

Motivation

Consider a clinical trial measuring blood pressure in patients at baseline, 3 months, and 6 months post-treatment. Using independent ANOVA would violate the independence assumption since measurements are correlated within subjects. rANOVA addresses this by:

1. Partitioning variance into between-subjects and within-subjects components

2. Increasing statistical power by reducing error variance

3. Accounting for individual differences through subject-specific effects

4. Enabling longitudinal analysis of change over time

For studies with missing data, complex covariance structures, or unbalanced designs, Linear Mixed Models (LMMs) provide a more flexible alternative to traditional rANOVA.

Theory

Mathematical Model

Univariate Approach

For \(n\) subjects measured at \(k\) time points, the univariate rANOVA model is\[ y_{ij} = \mu + \pi_i + \tau_j + \varepsilon_{ij}, \quad i=1,\ldots,n,\; j=1,\ldots,k \] where: - \(y_{ij}\) is the measurement for subject \(i\) at time \(j\) - \(\mu\) is the overall mean - \(\pi_i \sim N(0, \sigma_{\pi}^2)\) is the random subject effect - \(\tau_j\) is the fixed time effect (\(\sum_{j=1}^k \tau_j = 0\)) - \(\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon}^2)\) is the within-subject error

The covariance structure assumes\[ \text{Cov}(y_{ij}, y_{i'j'}) = \begin{cases} \sigma_{\pi}^2 + \sigma_{\varepsilon}^2 & \text{if } i=i', j=j' \\ \sigma_{\pi}^2 & \text{if } i=i', j \neq j' \\ 0 & \text{if } i \neq i' \end{cases} \]

Matrix Formulation

Let \(\mathbf{y}_i = (y_{i1}, \ldots, y_{ik})^\top\) be the vector of measurements for subject \(i\)\[ \mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol{\varepsilon}_i \] where: - \(\mathbf{X}_i\) is the design matrix for fixed effects - \(\boldsymbol{\beta}\) contains fixed effects (including time effects) - \(\mathbf{Z}_i\) is the design matrix for random effects - \(\mathbf{b}_i \sim N(\mathbf{0}, \mathbf{G})\) are random subject effects - \(\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R}_i)\) is the within-subject error

For compound symmetry structure\[ \text{Var}(\mathbf{y}_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^\top + \mathbf{R}_i = \sigma_{\pi}^2\mathbf{J}_k + \sigma_{\varepsilon}^2\mathbf{I}_k \] where \(\mathbf{J}_k\) is a \(k \times k\) matrix of ones and \(\mathbf{I}_k\) is the identity matrix.

Variance Partitioning

The total sum of squares decomposes as\[ SST = SS_{BetweenSubjects} + SS_{WithinSubjects} \] \( SS_{WithinSubjects} = SS_{Time} + SS_{Error} \)

More formally\[ \begin{aligned} SST &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Subjects} &= k \sum_{i=1}^n (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Time} &= n \sum_{j=1}^k (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Error} &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2 \end{aligned} \]

Degrees of freedom\[ \begin{aligned} df_{Total} &= nk - 1 \\ df_{Subjects} &= n - 1 \\ df_{Time} &= k - 1 \\ df_{Error} &= (n - 1)(k - 1) \end{aligned} \]

Mean squares\[ \begin{aligned} MS_{Subjects} &= SS_{Subjects} / (n-1) \\ MS_{Time} &= SS_{Time} / (k-1) \\ MS_{Error} &= SS_{Error} / ((n-1)(k-1)) \end{aligned} \]

F-statistic for time effect\[ F = \frac{MS_{Time}}{MS_{Error}} \sim F_{k-1, (n-1)(k-1)} \quad \text{under } H_0: \tau_1 = \cdots = \tau_k \]

Sphericity and Corrections

The sphericity assumption (also called circularity) requires\[ \text{Var}(y_{ij} - y_{ij'}) = \text{constant for all } j \neq j' \] Equivalently, the covariance matrix of the difference scores has equal variances and equal covariances.

Mauchly's Test for Sphericity: \( W = \frac{| \mathbf{E} |}{(\text{tr}(\mathbf{E})/p)^{p}} \sim \chi^2_{(p(p-1)/2 - 1)} \) where \(\mathbf{E}\) is the \(p \times p\) covariance matrix of the difference scores, \(p = k-1\).

When sphericity is violated, corrections adjust degrees of freedom:

1. Greenhouse-Geisser correction\[\hat{\epsilon} = \frac{(\text{tr}(\mathbf{E}))^2}{(p-1)\sum_{i=1}^p \lambda_i^2}\] \(F_{GG} \sim F_{\hat{\epsilon}(k-1), \hat{\epsilon}(n-1)(k-1)}\)

2. Huynh-Feldt correction\[\tilde{\epsilon} = \frac{n(k-1)\hat{\epsilon} - 2}{(k-1)[(n-1) - (k-1)\hat{\epsilon}]}\] \(F_{HF} \sim F_{\tilde{\epsilon}(k-1), \tilde{\epsilon}(n-1)(k-1)}\)

4) Multivariate Approach (MANOVA)

For \(k\) time points, consider the \((k-1) \times 1\) vector of difference scores \(\mathbf{d}_i = (y_{i2}-y_{i1}, \ldots, y_{ik}-y_{i,k-1})^\top\)\[ \mathbf{d}_i \sim N_{k-1}(\boldsymbol{\mu}_d, \boldsymbol{\Sigma}_d) \] Test \(H_0: \boldsymbol{\mu}_d = \mathbf{0}\) using Hotelling's \(T^2\)\[ T^2 = n\bar{\mathbf{d}}^\top \mathbf{S}_d^{-1} \bar{\mathbf{d}} \sim \frac{(n-1)(k-1)}{n-k+1} F_{k-1, n-k+1} \] where \(\bar{\mathbf{d}}\) is the sample mean and \(\mathbf{S}_d\) is the sample covariance of difference scores.

Assumptions and Diagnostics

Key Assumptions: 1. Normality: Residuals \(\varepsilon_{ij}\) should be normally distributed 2. Sphericity: Equal variances of differences between all pairs of repeated measures 3. Missing Data: Missing completely at random (MCAR) for valid inference 4. No Time × Subject Interaction: Treatment effects are consistent across subjects

Diagnostic Procedures:

# Comprehensive rANOVA diagnostic function
check_ranova_assumptions <- function(data, subject_var, time_var, response_var) {
  
  cat("=== REPEATED MEASURES ANOVA DIAGNOSTICS ===\n")
  
  # 1. Normality of residuals
  # Fit a linear model ignoring the repeated measures structure first
  model_lm <- lm(as.formula(paste(response_var, "~", time_var)), data = data)
  residuals <- resid(model_lm)
  
  par(mfrow = c(2, 3))
  
  # Q-Q plot
  qqnorm(residuals, main = "Q-Q Plot of Residuals")
  qqline(residuals)
  
  # Shapiro-Wilk test
  shapiro_test <- shapiro.test(residuals)
  cat("1. Normality (Shapiro-Wilk): W =", round(shapiro_test$statistic, 4),
      "p =", format.pval(shapiro_test$p.value, digits = 4), "\n")
  
  # 2. Sphericity test (requires ez package)
  if (require(ez, quietly = TRUE)) {
    # Convert to wide format for Mauchly's test
    wide_data <- reshape(data, 
                         idvar = subject_var, 
                         timevar = time_var, 
                         direction = "wide")
    
    # Remove subject column
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
    response_matrix <- as.matrix(wide_data[, response_cols])
    
    # Mauchly's test using car package
    if (require(car, quietly = TRUE)) {
      # Create an idata frame for repeated measures
      time_levels <- unique(data[[time_var]])
      idata <- data.frame(time = factor(time_levels))
      
      # Fit linear model for repeated measures
      mlm <- lm(response_matrix ~ 1)
      
      # Mauchly's test
      mauchly_test <- Anova(mlm, idata = idata, idesign = ~time, type = "III")
      mauchly_result <- mauchly_test$sphericity.tests
      
      cat("\n2. Mauchly's Test for Sphericity:\n")
      print(mauchly_result)
      
      # Greenhouse-Geisser and Huynh-Feldt corrections
      epsilon_GG <- mauchly_result["Greenhouse-Geisser epsilon", "GG eps"]
      epsilon_HF <- mauchly_result["Huynh-Feldt epsilon", "HF eps"]
      cat("\nEpsilon corrections:\n")
      cat("  Greenhouse-Geisser:", round(epsilon_GG, 3), "\n")
      cat("  Huynh-Feldt:", round(epsilon_HF, 3), "\n")
    }
  }
  
  # 3. Homogeneity of variance across time points
  boxplot(as.formula(paste(response_var, "~", time_var)), 
          data = data,
          main = "Boxplots by Time Point",
          xlab = time_var, ylab = response_var)
  
  # Levene's test for homogeneity of variance
  if (require(car, quietly = TRUE)) {
    levene_test <- leveneTest(as.formula(paste(response_var, "~ factor(", time_var, ")")), 
                              data = data)
    cat("\n3. Levene's Test for Homogeneity of Variance:\n")
    print(levene_test)
  }
  
  # 4. Individual profiles plot
  data_sorted <- data[order(data[[subject_var]], data[[time_var]]), ]
  
  # Create empty plot
  plot(1, type = "n", 
       xlim = range(as.numeric(data[[time_var]])), 
       ylim = range(data[[response_var]]),
       xlab = time_var, ylab = response_var,
       main = "Individual Profiles")
  
  # Plot each subject's profile
  subjects <- unique(data[[subject_var]])
  colors <- rainbow(length(subjects))
  
  for (i in seq_along(subjects)) {
    subject_data <- data[data[[subject_var]] == subjects[i], ]
    lines(as.numeric(subject_data[[time_var]]), 
          subject_data[[response_var]], 
          col = colors[i], type = "b", lwd = 1)
  }
  
  # Add group mean profile
  group_means <- aggregate(as.formula(paste(response_var, "~", time_var)), 
                          data = data, mean)
  lines(as.numeric(group_means[[time_var]]), 
        group_means[[response_var]], 
        col = "black", lwd = 3, type = "b")
  
  legend("topright", legend = c("Individual", "Group Mean"), 
         col = c("gray", "black"), lwd = c(1, 3), bty = "n")
  
  # 5. Residuals vs fitted plot
  plot(fitted(model_lm), residuals,
       xlab = "Fitted Values", ylab = "Residuals",
       main = "Residuals vs Fitted")
  abline(h = 0, col = "red", lty = 2)
  
  # 6. Histogram of residuals
  hist(residuals, main = "Histogram of Residuals",
       xlab = "Residuals", col = "lightblue", freq = FALSE)
  curve(dnorm(x, mean = mean(residuals), sd = sd(residuals)), 
        add = TRUE, col = "red", lwd = 2)
  
  par(mfrow = c(1, 1))
  
  # Return diagnostic summary
  return(list(
    shapiro_test = shapiro_test,
    sphericity_tests = if(exists("mauchly_result")) mauchly_result else NULL,
    homogeneity_test = if(exists("levene_test")) levene_test else NULL
  ))
}

Applications

Example 1: Exercise Training Intervention

# Recreate the example from the motivation section
exercise_data <- data.frame(
  subject = factor(rep(1:6, each = 3)),
  time = factor(rep(c("Pre", "3Months", "6Months"), 6)),
  fitness = c(45, 50, 55,    # Subject 1
              42, 42, 45,    # Subject 2
              36, 41, 43,    # Subject 3
              39, 35, 40,    # Subject 4
              51, 55, 59,    # Subject 5
              44, 49, 56)    # Subject 6
)

cat("=== Exercise Training Intervention Study ===\n")
cat("Research question: Does fitness level change over time?\n\n")

# Display the data
cat("Dataset:\n")
print(exercise_data)

# Summary statistics
cat("\nSummary statistics by time:\n")
print(aggregate(fitness ~ time, data = exercise_data, 
                FUN = function(x) c(mean = mean(x), sd = sd(x))))

# Run diagnostics first
cat("\n=== Model Diagnostics ===\n")
diagnostics <- check_ranova_assumptions(exercise_data, "subject", "time", "fitness")

# Traditional rANOVA using aov()
cat("\n=== Traditional rANOVA ===\n")
model_aov <- aov(fitness ~ time + Error(subject/time), data = exercise_data)
print(summary(model_aov))

# Extract F-statistic and p-value
aov_summary <- summary(model_aov)
F_time <- aov_summary[[2]][[1]]$"F value"[1]
df_time <- aov_summary[[2]][[1]]$"Df"[1]
df_error <- aov_summary[[2]][[1]]$"Df"[2]
p_time <- aov_summary[[2]][[1]]$"Pr(>F)"[1]

cat(sprintf("\nF(%d, %d) = %.2f, p = %.3f\n", 
            df_time, df_error, F_time, p_time))

# Effect size calculation (Partial eta-squared)
SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
eta_sq <- SS_time / (SS_time + SS_error)
cat("Partial eta-squared (effect size):", round(eta_sq, 3), "\n")

# Post-hoc pairwise comparisons with Bonferroni correction
cat("\n=== Post-hoc Pairwise Comparisons ===\n")
library(emmeans)

# Fit linear model for post-hoc tests
model_lm <- lm(fitness ~ time + subject, data = exercise_data)
emm_time <- emmeans(model_lm, specs = pairwise ~ time, adjust = "bonferroni")
print(emm_time$contrasts)

# Visualization
library(ggplot2)
ggplot(exercise_data, aes(x = time, y = fitness, group = subject)) +
  geom_line(aes(color = subject), alpha = 0.6) +
  geom_point(aes(color = subject), size = 2) +
  stat_summary(aes(group = 1), fun = mean, geom = "line", 
               color = "black", size = 1.5) +
  stat_summary(fun.data = mean_se, geom = "errorbar", 
               width = 0.1, color = "black") +
  labs(title = "Fitness Levels Over Time",
       subtitle = "Individual profiles with group mean",
       x = "Time Point", y = "Fitness Score") +
  theme_minimal()

Example 2: Clinical Trial with Missing Data (Using Mixed Models)

# Simulated clinical trial data with missing values
set.seed(123)
n_subjects <- 30
n_timepoints <- 4

clinical_data <- expand.grid(
  subject = factor(1:n_subjects),
  time = factor(paste0("Week_", 0:3)),
  treatment = factor(rep(c("Drug", "Placebo"), each = n_subjects/2))
)

# Generate data with treatment effect and time trend
clinical_data$response <- with(clinical_data, 
  50 +  # Baseline
  ifelse(treatment == "Drug", 5, 0) +  # Treatment effect
  as.numeric(time) * 2 +  # Time effect (2 units per week)
  ifelse(treatment == "Drug", as.numeric(time) * 1, 0) +  # Treatment × time interaction
  rep(rnorm(n_subjects, 0, 3), each = n_timepoints) +  # Random intercept
  rnorm(n_subjects * n_timepoints, 0, 2)  # Residual error
)

# Introduce missing data (MCAR)
missing_indices <- sample(1:nrow(clinical_data), size = 20)
clinical_data$response[missing_indices] <- NA

cat("=== Clinical Trial with Repeated Measures ===\n")
cat("Research question: Does the treatment affect response over time?\n\n")

cat("Missing data pattern:\n")
print(table(is.na(clinical_data$response)))

# Approach 1: Linear Mixed Model (recommended for missing data)
cat("\n=== Linear Mixed Model Approach ===\n")
library(lme4)
library(lmerTest)

# Fit mixed model with random intercepts
mixed_model <- lmer(response ~ time * treatment + (1|subject), 
                    data = clinical_data, 
                    na.action = na.omit)

cat("\nModel summary:\n")
print(summary(mixed_model))

# Test fixed effects
cat("\nTests of fixed effects:\n")
print(anova(mixed_model))

# Random effects variance components
cat("\nVariance components:\n")
print(VarCorr(mixed_model))

# Estimated marginal means
if (require(emmeans)) {
  cat("\n=== Estimated Marginal Means ===\n")
  emm_treatment <- emmeans(mixed_model, specs = ~ treatment)
  print(emm_treatment)
  
  emm_time <- emmeans(mixed_model, specs = ~ time)
  print(emm_time)
  
  # Interaction plot
  emm_interaction <- emmeans(mixed_model, specs = ~ treatment:time)
  print(emm_interaction)
  

# Robust Interaction plot
if (require(emmeans)) {
  cat("\n=== Estimated Marginal Means ===\n")
  emm_treatment <- emmeans(mixed_model, specs = ~ treatment)
  print(emm_treatment)
  
  emm_time <- emmeans(mixed_model, specs = ~ time)
  print(emm_time)
  
  # Interaction plot
  emm_interaction <- emmeans(mixed_model, specs = ~ treatment:time)
  
  # Convert to data frame and ensure confidence intervals are present
  plot_data <- as.data.frame(emm_interaction)
  
  # Check if confidence interval columns exist
  if (!all(c("asymp.LCL", "asymp.LCL") %in% names(plot_data))) {
    # If not, get them using summary
    plot_data <- as.data.frame(summary(emm_interaction))
  }
  
  # Now plot
  ggplot(plot_data, aes(x = time, y = emmean, group = treatment, color = treatment)) +
    geom_line(size = 1.5) +
    geom_errorbar(aes(ymin = asymp.LCL, ymax = asymp.UCL), width = 0.1) +
    geom_point(size = 3) +
    labs(title = "Treatment × Time Interaction",
         subtitle = "Estimated marginal means with 95% confidence intervals",
         x = "Time Point", y = "Response") +
    theme_minimal()
}

}

# Approach 2: Traditional rANOVA (for comparison)
cat("\n=== Traditional rANOVA (for comparison) ===\n")
# Remove subjects with missing data for traditional ANOVA
complete_data <- clinical_data[complete.cases(clinical_data), ]
complete_subjects <- unique(complete_data$subject)
cat("Subjects with complete data:", length(complete_subjects), "\n")

if (length(complete_subjects) >= 3) {
  model_aov2 <- aov(response ~ time * treatment + Error(subject/time), 
                    data = complete_data)
  print(summary(model_aov2))
} else {
  cat("Insufficient complete cases for traditional rANOVA\n")
}

Example 3: Real Dataset - CO2 Uptake in Plants

# Using built-in CO2 dataset from R
data(CO2)
cat("=== CO2 Uptake in Plants ===\n")
cat("Research question: Does CO2 uptake change with concentration?\n\n")

# Prepare data: focus on Quebec plants, type Qn1
co2_data <- CO2[CO2$Type == "Quebec" & CO2$Plant == "Qn1", ]
co2_data$conc_factor <- factor(co2_data$conc)

cat("Data structure:\n")
str(co2_data)

# Visual inspection
library(ggplot2)
ggplot(co2_data, aes(x = conc, y = uptake, group = 1)) +
  geom_line() +
  geom_point() +
  labs(title = "CO2 Uptake by Concentration",
       x = "CO2 Concentration (mL/L)", y = "Uptake (umol/m^2 sec)") +
  theme_minimal()

# Since we have only one plant, this is a single-subject repeated measures design
# We'll use a polynomial contrast analysis

cat("\n=== Polynomial Trend Analysis ===\n")
# Create orthogonal polynomial contrasts for concentration
conc_levels <- length(unique(co2_data$conc))
poly_contrasts <- contr.poly(conc_levels)

# Apply contrasts
co2_data$conc_factor <- factor(co2_data$conc)
contrasts(co2_data$conc_factor) <- poly_contrasts

# Fit model with polynomial contrasts
model_poly <- lm(uptake ~ conc_factor, data = co2_data)
print(summary(model_poly))

# Extract polynomial components
anova_poly <- anova(model_poly)
cat("\nPolynomial decomposition:\n")
print(anova_poly)

# Calculate proportion of variance explained by each polynomial component
SS_total <- sum(anova_poly$`Sum Sq`)
for (i in 1:nrow(anova_poly)) {
  prop_var <- anova_poly$`Sum Sq`[i] / SS_total
  cat(sprintf("Component %d: %.1f%% of variance\n", 
              i, prop_var * 100))
}

Advanced Topics

Power Analysis for Repeated Measures

# Power analysis for repeated measures designs
calculate_power_rm <- function(n, k, rho, f2, alpha = 0.05) {
  # n: number of subjects
  # k: number of time points
  # rho: correlation between repeated measures
  # f2: effect size (Cohen's f^2)
  
  # Degrees of freedom
  df1 <- k - 1
  df2 <- (n - 1) * (k - 1)
  
  # Non-centrality parameter
  lambda <- n * k * f2
  
  # Critical F value
  F_crit <- qf(1 - alpha, df1, df2)
  
  # Power
  power <- 1 - pf(F_crit, df1, df2, ncp = lambda)
  
  return(power)
}

# Example: Clinical trial with 20 subjects, 4 time points
power_example <- calculate_power_rm(n = 20, k = 4, rho = 0.5, f2 = 0.25)
cat("Power for repeated measures design:\n")
cat("Subjects: 20, Time points: 4, Effect size f² = 0.25\n")
cat("Estimated power:", round(power_example, 3), "\n")

# Power curve visualization
library(ggplot2)
n_range <- seq(10, 50, by = 5)
power_values <- sapply(n_range, function(n) calculate_power_rm(n, 4, 0.5, 0.25))

power_df <- data.frame(n = n_range, power = power_values)
ggplot(power_df, aes(x = n, y = power)) +
  geom_line(size = 1.5, color = "blue") +
  geom_point(size = 3) +
  geom_hline(yintercept = 0.8, linetype = "dashed", color = "red") +
  labs(title = "Power Analysis for Repeated Measures Design",
       subtitle = "k = 4 time points, ρ = 0.5, f² = 0.25",
       x = "Number of Subjects", y = "Statistical Power") +
  theme_minimal()

Handling Autocorrelated Errors

In this example, note that we show refitting the model using maximum likelihood (method = "ML") instead of REML. This often resolves the non-positive definite matrix issue for computing intervals on variance-covariance components, as ML doesn't apply the same adjustments that can destabilize the approximation under REML.

# When measurements are equally spaced in time, errors may be autocorrelated
library(nlme)

# Simulate data with AR(1) errors
set.seed(456)
n <- 15
k <- 5
time_points <- 1:k

ar_data <- data.frame(
  subject = rep(1:n, each = k),
  time = rep(time_points, n),
  treatment = rep(rep(c("A", "B"), length.out = n), each = k)
)

# Generate data with AR(1) errors
for (i in 1:n) {
  ar_errors <- arima.sim(model = list(ar = 0.7), n = k, sd = 2)
  idx <- ((i-1)*k + 1):(i*k)
  ar_data$response[idx] <- 50 + 
    ifelse(ar_data$treatment[idx] == "A", 5, 0) +
    ar_data$time[idx] * 2 +
    ar_errors
}

# Fit model with AR(1) correlation structure
model_ar1 <- lme(response ~ time * treatment, 
                 random = ~ 1 | subject,
                 correlation = corAR1(form = ~ time | subject),
                 data = ar_data)

# Refit the model using maximum likelihood (method = "ML") instead of REML. 
# This often resolves the non-positive definite matrix issue for computing intervals on variance-covariance components, 
# as ML doesn't apply the same adjustments that can destabilize the approximation under REML.
# Refit the model with ML
model_ar1_ml <- update(model_ar1, method = "ML")

cat("=== Initial Model with AR(1) Correlation Structure ===\n")
print(summary(model_ar1))

cat("=== Refit Model with AR(1) Correlation Structure and Maximum Likelihood  Estimation ===\n")
print(summary(model_ar1_ml))

cat("\nCorrelation parameter:", intervals(model_ar1_ml)$corStruct[2], "\n")

# Compare with compound symmetry model
model_cs <- lme(response ~ time * treatment, 
                random = ~ 1 | subject,
                correlation = corCompSymm(form = ~ 1 | subject),
                data = ar_data, 
                method = "ML")

# Model comparison
cat("\n=== Model Comparison ===\n")
print(anova(model_ar1_ml, model_cs))

Software Implementation

# Comprehensive rANOVA analysis function
run_ranova_analysis <- function(data, subject_var, time_var, response_var, between_vars = NULL, method = "mixed") {
  cat("=== REPEATED MEASURES ANOVA ANALYSIS ===\n\n")
  cat("Subject variable:", subject_var, "\n")
  cat("Time variable:", time_var, "\n")
  cat("Response variable:", response_var, "\n")
  if (!is.null(between_vars)) {
    cat("Between-subjects variables:", paste(between_vars, collapse = ", "), "\n")
  }
  cat("Method:", method, "\n\n")
  
  # Check for missing data
  missing_count <- sum(is.na(data[[response_var]]))
  cat("Missing observations:", missing_count, sprintf("(%.1f%%)\n", missing_count/nrow(data)*100))
  
  if (method == "mixed") {
    # Linear mixed model approach
    library(lme4)
    library(lmerTest)
    
    # Build formula
    if (is.null(between_vars)) {
      formula_str <- paste(response_var, "~", time_var, "+ (1|", subject_var, ")")
    } else {
      between_str <- paste(between_vars, collapse = " + ")
      # Add interactions with time
      interaction_str <- paste(paste(time_var, between_vars, sep = ":"), collapse = " + ")
      formula_str <- paste(response_var, "~", time_var, "*", between_str, "+ (1|", subject_var, ")")
    }
    cat("Fitting model:", formula_str, "\n\n")
    model <- lmer(as.formula(formula_str), data = data)
    
    cat("--- Model Summary ---\n")
    print(summary(model))
    
    cat("\n--- Tests of Fixed Effects ---\n")
    print(anova(model))
    
    # Variance components
    cat("\n--- Variance Components ---\n")
    print(VarCorr(model))
    
    # Calculate intraclass correlation coefficient (ICC)
    vc <- VarCorr(model)
    sigma_subject <- as.data.frame(vc)$vcov[1]
    sigma_residual <- as.data.frame(vc)$vcov[2]
    ICC <- sigma_subject / (sigma_subject + sigma_residual)
    cat("\nIntraclass Correlation Coefficient (ICC):", round(ICC, 3), "\n")
    
  } else if (method == "traditional") {
    # Traditional rANOVA using aov()
    if (!is.null(between_vars)) {
      formula_str <- paste(response_var, "~", paste(between_vars, collapse = " + "), "+ Error(", subject_var, "/", time_var, ")")
    } else {
      formula_str <- paste(response_var, "~", time_var, "+ Error(", subject_var, "/", time_var, ")")
    }
    cat("Fitting model:", formula_str, "\n\n")
    model <- aov(as.formula(formula_str), data = data)
    
    cat("--- ANOVA Table ---\n")
    print(summary(model))
  }
  
  # Post-hoc tests if there's a significant time effect
  cat("\n--- Post-hoc Comparisons ---\n")
  if (require(emmeans)) {
    # Check for significant time effect (optional: adjust p-value threshold as needed)
    if (method == "mixed") {
      anova_res <- anova(model)
      time_p <- anova_res[time_var, "Pr(>F)"]
    } else {
      anova_res <- summary(model)
      time_p <- anova_res[[2]][[1]]$"Pr(>F)"[1]  # Assuming time is the first term in Error strata
    }
    
    if (!is.na(time_p) && time_p < 0.05) {
      if (method == "mixed") {
        # Dynamically build the specs formula
        specs_formula <- formula(paste("pairwise ~", time_var))
        emm_time <- emmeans(model, specs = specs_formula, adjust = "tukey")
      } else {
        # For traditional ANOVA, fit a linear model for post-hoc tests
        if (is.null(between_vars)) {
          lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var))
        } else {
          lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var, "+", paste(between_vars, collapse = " + ")))
        }
        lm_model <- lm(lm_formula, data = data)
        
        # Dynamically build the specs formula
        specs_formula <- formula(paste("pairwise ~", time_var))
        emm_time <- emmeans(lm_model, specs = specs_formula, adjust = "tukey")
      }
      print(emm_time$contrasts)
    } else {
      cat("No significant time effect detected (p >= 0.05). Skipping post-hoc comparisons.\n")
    }
  }
  
  # Effect sizes
  cat("\n--- Effect Sizes ---\n")
  if (method == "traditional") {
    aov_summary <- summary(model)
    if (!is.null(between_vars)) {
      # This is more complex with between-subjects factors
      cat("Note: Effect size calculation for factorial rANOVA requires manual computation\n")
    } else {
      SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
      SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
      eta_sq <- SS_time / (SS_time + SS_error)
      cat("Partial eta-squared:", round(eta_sq, 3), "\n")
    }
  }
  
  return(model)
}

# Example usage with built-in dataset
cat("\n\n=== EXAMPLE: Chick Weight Dataset ===\n")
data(ChickWeight)
ChickWeight$Time <- factor(ChickWeight$Time)
ChickWeight$Chick <- factor(ChickWeight$Chick)

# Run analysis
result <- run_ranova_analysis(
  data = ChickWeight,
  subject_var = "Chick",
  time_var = "Time",
  response_var = "weight",
  between_vars = "Diet",
  method = "mixed"
)

Common Issues and Solutions

Missing Data Patterns

# Identify missing data patterns
library(mice)

analyze_missing_patterns <- function(data, subject_var, time_var, response_var) {
  # Convert to wide format
  wide_data <- reshape(data, 
                       idvar = subject_var, 
                       timevar = time_var, 
                       direction = "wide")
  
  # Create pattern matrix
  pattern_matrix <- md.pattern(wide_data[, -1], plot = FALSE)
  
  cat("Missing Data Patterns:\n")
  print(pattern_matrix)
  
  # Visualize missingness
  if (require(VIM)) {
    aggr_plot <- aggr(wide_data[, -1], 
                      numbers = TRUE, 
                      sortVars = TRUE, 
                      labels = names(wide_data)[-1],
                      cex.axis = 0.7, 
                      gap = 3, 
                      ylab = c("Missing Data Pattern", "Frequency"))
  }
  
  # Test for MCAR using Little's test
  if (require(norm2)) {
    # Prepare data for Little's test
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
    test_data <- wide_data[, response_cols]
    
    # Perform Little's MCAR test
    little_test <- mcar.test(test_data)
    cat("\nLittle's MCAR Test:\n")
    print(little_test)
  }
  
  return(pattern_matrix)
}

# Example
missing_pattern <- analyze_missing_patterns(clinical_data, "subject", "time", "response")

Violation of Sphericity

First we define the worker function.

handle_sphericity_violation <- function(data, subject_var, time_var, response_var) {
  cat("=== Handling Sphericity Violation ===\n")
  
  # 1. Use Greenhouse-Geisser or Huynh-Feldt corrections
  cat("\n1. Apply epsilon corrections:\n")
  # (Implementation shown in check_ranova_assumptions function above)
  
  # 2. Use multivariate approach (MANOVA)
  cat("\n2. Multivariate approach (MANOVA):\n")
  
  # Convert to wide format
  wide_data <- reshape(data, 
                       idvar = subject_var, 
                       timevar = time_var, 
                       direction = "wide")
  
  # Extract response columns
  response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
  response_matrix <- as.matrix(wide_data[, response_cols])
  
  # Fit MANOVA
  manova_result <- manova(response_matrix ~ 1)
  
  # Test using Pillai's trace (most robust)
  manova_summary <- summary(manova_result, test = "Pillai")
  print(manova_summary)
  
  # 3. Use linear mixed models with unstructured covariance
  cat("\n3. Linear mixed model with unstructured covariance:\n")
  library(nlme)
  
  # Fit model with unstructured covariance
  model_unstructured <- lme(as.formula(paste(response_var, "~", time_var)),
                           random = ~ 1 | subject_var,
                           correlation = corSymm(form = ~ as.numeric(get(time_var)) | subject_var),
                           data = data,
                           control = lmeControl(opt = "optim"))
  
  print(summary(model_unstructured))
  
  # Compare with compound symmetry model
  model_compound <- lme(as.formula(paste(response_var, "~", time_var)),
                       random = ~ 1 | subject_var,
                       correlation = corCompSymm(form = ~ 1 | subject_var),
                       data = data)
  
  cat("\nModel comparison (unstructured vs. compound symmetry):\n")
  print(anova(model_unstructured, model_compound))
}

Next, we test the handle_sphericity_violation () function using a small toy dataset with 4 subjects and 3 time points to keep things simple and illustrative (avoiding potential issues with missing data or large outputs in built-in datasets like ChickWeight). This dataset simulates repeated measures where the response increases over time, which might violate sphericity in a real analysis.

# Load required library
library(nlme)

# Create a small toy dataset for demonstration (modified to avoid collinearity/rank deficiency)
data <- data.frame(
  subject = rep(1:4, each = 3),
  time = rep(1:3, 4),
  response = c(5, 7, 8, 5.5, 6, 7.5, 4, 5, 5.5, 6, 8, 9)
)

# Define the function (with fixes for grouping variable and time conversion)
handle_sphericity_violation <- function(data, subject_var, time_var, response_var) {
  cat("=== Handling Sphericity Violation ===\n")
  
  # 1. Use Greenhouse-Geisser or Huynh-Feldt corrections
  cat("\n1. Apply epsilon corrections:\n")
  # (Implementation shown in check_ranova_assumptions function above)
  
  # 2. Use multivariate approach (MANOVA) for time effect
  cat("\n2. Multivariate approach (MANOVA):\n")
  # Convert to wide format
  wide_data <- reshape(data, idvar = subject_var, timevar = time_var, direction = "wide")
  # Extract response columns (assuming sorted by time levels)
  response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
  response_matrix <- as.matrix(wide_data[, response_cols])
  # Create successive differences to test time effect (k-1 variables)
  if (ncol(response_matrix) > 1) {
    diff_matrix <- response_matrix[, 2:ncol(response_matrix)] - response_matrix[, 1:(ncol(response_matrix)-1)]
    # Fit MANOVA on differences ~1 (tests if mean differences are zero)
    manova_result <- manova(diff_matrix ~ 1)
    # Test using Pillai's trace (most robust)
    manova_summary <- summary(manova_result, test = "Pillai")
    print(manova_summary)
  } else {
    cat("Not enough time points for MANOVA.\n")
  }
  
  # 3. Use linear mixed models with unstructured covariance
  cat("\n3. Linear mixed model with unstructured covariance:\n")
  # library(nlme) # Already loaded globally
  
  # Convert grouping variable to factor (required for lme grouping)
  data[[subject_var]] <- factor(data[[subject_var]])
  
  # Convert time_var to numeric if it's a factor (for correlation covariate)
  if (is.factor(data[[time_var]])) {
    data[[time_var]] <- as.numeric(as.character(data[[time_var]]))
  }
  
  # Fit model with unstructured covariance (dynamic formula)
  cor_form <- as.formula(paste("~", time_var, "|", subject_var))
  
  # *** FIX APPLIED HERE: Create random formula dynamically ***
  random_form <- as.formula(paste("~ 1 |", subject_var)) 
  
  model_unstructured <- lme(as.formula(paste(response_var, "~", time_var)),
                            random = random_form, # Use the dynamically created formula
                            correlation = corSymm(form = cor_form),
                            data = data,
                            control = lmeControl(opt = "optim"))
  print(summary(model_unstructured))
  
  # Compare with compound symmetry model
  model_compound <- lme(as.formula(paste(response_var, "~", time_var)),
                        random = random_form, # Also use the dynamically created formula
                        correlation = corCompSymm(form = as.formula(paste("~ 1 |", subject_var))),
                        data = data)
  cat("\nModel comparison (unstructured vs. compound symmetry):\n")
  print(anova(model_unstructured, model_compound))
}

# Call the function with the toy dataset
handle_sphericity_violation(data, "subject", "time", "response")

Problems and Exercises

1. Conceptual Problems:

  a) Derive the expected mean squares for the rANOVA model
  b) Prove that the F-test for time effects in rANOVA follows an F-distribution under the null hypothesis
  c) Show how the intraclass correlation coefficient relates to the variance components in the mixed model

2. Applied Problems:

  a) Analyze the SOCR LDT dataset using rANOVA
  b) Conduct a repeated measures analysis of the ChickWeight dataset with diet as a between-subjects factor
  c) Perform power analysis for a planned longitudinal study with 5 time points, expecting a medium effect (f² = 0.15), and correlation ρ = 0.6

3. Simulation Study:

# Simulation to compare traditional rANOVA and linear mixed models
simulate_rm_comparison <- function(n_sim = 500, n = 20, k = 4, missing_rate = 0.1) {
  
  type1_error_aov <- numeric(n_sim)
  type1_error_lmm <- numeric(n_sim)
  
  for (sim in 1:n_sim) {
    # Simulate data under null hypothesis (no time effect)
    data <- expand.grid(
      subject = factor(1:n),
      time = factor(1:k)
    )
    
    # Random intercept model
    subject_effects <- rnorm(n, 0, 5)
    data$response <- rep(subject_effects, each = k) + rnorm(n * k, 0, 2)
    
    # Introduce missing data
    if (missing_rate > 0) {
      missing_indices <- sample(1:nrow(data), size = round(nrow(data) * missing_rate))
      data$response[missing_indices] <- NA
    }
    
    # Method 1: Traditional rANOVA (complete cases only)
    complete_data <- data[complete.cases(data), ]
    complete_subjects <- length(unique(complete_data$subject))
    
    if (complete_subjects >= 3) {
      try({
        model_aov <- aov(response ~ time + Error(subject/time), data = complete_data)
        aov_summary <- summary(model_aov)
        p_aov <- aov_summary[[2]][[1]]$"Pr(>F)"[1]
        type1_error_aov[sim] <- p_aov < 0.05
      }, silent = TRUE)
    }
    
    # Method 2: Linear mixed model (uses all available data)
    try({
      library(lmerTest)
      model_lmm <- lmer(response ~ time + (1|subject), data = data, 
                        na.action = na.omit)
      p_lmm <- anova(model_lmm)$"Pr(>F)"[1]
      type1_error_lmm[sim] <- p_lmm < 0.05
    }, silent = TRUE)
  }
  
  cat("Simulation Results (n =", n_sim, "):\n")
  cat("Type I error rate - Traditional rANOVA:", 
      mean(type1_error_aov, na.rm = TRUE), "\n")
  cat("Type I error rate - Linear Mixed Model:", 
      mean(type1_error_lmm, na.rm = TRUE), "\n")
  
  return(data.frame(
    simulation = 1:n_sim,
    aov_error = type1_error_aov,
    lmm_error = type1_error_lmm
  ))
}

# Run simulation
sim_results <- simulate_rm_comparison(n_sim = 200, n = 15, k = 3, missing_rate = 0.1)

References

1. Maxwell, S. E., & Delaney, H. D. (2004). *Designing Experiments and Analyzing Data: A Model Comparison Perspective* (2nd ed.). Psychology Press.

2. Pinheiro, J. C., & Bates, D. M. (2000). *Mixed-Effects Models in S and S-PLUS*. Springer.

3. Verbeke, G., & Molenberghs, G. (2000). *Linear Mixed Models for Longitudinal Data*. Springer.

4. Field, A., Miles, J., & Field, Z. (2012). *Discovering Statistics Using R*. Sage Publications.

5. Gueorguieva, R., & Krystal, J. H. (2004). Move over ANOVA: Progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry. *Archives of General Psychiatry, 61*(3), 310-317.

Online Resources

SOCR Home page: https://www.socr.umich.edu

Translate this page:

(default)	Deutsch	Español	Français	Italiano	Português	日本語	България	الامارات العربية المتحدة	Suomi	इस भाषा में	Norge
한국어	中文	繁体中文	Русский	Nederlands	Ελληνικά	Hrvatska	Česká republika	Danmark	Polska	România	Sverige