SMHS rANOVA

From SOCR
Revision as of 12:54, 9 December 2025 by Dinov (talk | contribs) (Handling Autocorrelated Errors)
Jump to: navigation, search

Scientific Methods for Health Sciences - Repeated Measures Analysis of Variance (rANOVA)

Overview

Repeated Measures Analysis of Variance (rANOVA) is a statistical method for analyzing data where the same subjects are measured multiple times under different conditions or at different time points. Unlike independent ANOVA, rANOVA accounts for the correlation between repeated measurements on the same subject, increasing statistical power by reducing error variance attributable to individual differences. This section provides a comprehensive treatment of rANOVA, including its mathematical foundations, assumptions, modern alternatives, and practical implementation in R.

Motivation

Consider a clinical trial measuring blood pressure in patients at baseline, 3 months, and 6 months post-treatment. Using independent ANOVA would violate the independence assumption since measurements are correlated within subjects. rANOVA addresses this by:

1. Partitioning variance into between-subjects and within-subjects components

2. Increasing statistical power by reducing error variance

3. Accounting for individual differences through subject-specific effects

4. Enabling longitudinal analysis of change over time

For studies with missing data, complex covariance structures, or unbalanced designs, Linear Mixed Models (LMMs) provide a more flexible alternative to traditional rANOVA.

Theory

Mathematical Model

Univariate Approach

For \(n\) subjects measured at \(k\) time points, the univariate rANOVA model is\[ y_{ij} = \mu + \pi_i + \tau_j + \varepsilon_{ij}, \quad i=1,\ldots,n,\; j=1,\ldots,k \] where: - \(y_{ij}\) is the measurement for subject \(i\) at time \(j\) - \(\mu\) is the overall mean - \(\pi_i \sim N(0, \sigma_{\pi}^2)\) is the random subject effect - \(\tau_j\) is the fixed time effect (\(\sum_{j=1}^k \tau_j = 0\)) - \(\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon}^2)\) is the within-subject error

The covariance structure assumes\[ \text{Cov}(y_{ij}, y_{i'j'}) = \begin{cases} \sigma_{\pi}^2 + \sigma_{\varepsilon}^2 & \text{if } i=i', j=j' \\ \sigma_{\pi}^2 & \text{if } i=i', j \neq j' \\ 0 & \text{if } i \neq i' \end{cases} \]

Matrix Formulation

Let \(\mathbf{y}_i = (y_{i1}, \ldots, y_{ik})^\top\) be the vector of measurements for subject \(i\)\[ \mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol{\varepsilon}_i \] where: - \(\mathbf{X}_i\) is the design matrix for fixed effects - \(\boldsymbol{\beta}\) contains fixed effects (including time effects) - \(\mathbf{Z}_i\) is the design matrix for random effects - \(\mathbf{b}_i \sim N(\mathbf{0}, \mathbf{G})\) are random subject effects - \(\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R}_i)\) is the within-subject error

For compound symmetry structure\[ \text{Var}(\mathbf{y}_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^\top + \mathbf{R}_i = \sigma_{\pi}^2\mathbf{J}_k + \sigma_{\varepsilon}^2\mathbf{I}_k \] where \(\mathbf{J}_k\) is a \(k \times k\) matrix of ones and \(\mathbf{I}_k\) is the identity matrix.

Variance Partitioning

The total sum of squares decomposes as\[ SST = SS_{BetweenSubjects} + SS_{WithinSubjects} \] \( SS_{WithinSubjects} = SS_{Time} + SS_{Error} \)

More formally\[ \begin{aligned} SST &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Subjects} &= k \sum_{i=1}^n (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Time} &= n \sum_{j=1}^k (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Error} &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2 \end{aligned} \]

Degrees of freedom\[ \begin{aligned} df_{Total} &= nk - 1 \\ df_{Subjects} &= n - 1 \\ df_{Time} &= k - 1 \\ df_{Error} &= (n - 1)(k - 1) \end{aligned} \]

Mean squares\[ \begin{aligned} MS_{Subjects} &= SS_{Subjects} / (n-1) \\ MS_{Time} &= SS_{Time} / (k-1) \\ MS_{Error} &= SS_{Error} / ((n-1)(k-1)) \end{aligned} \]

F-statistic for time effect\[ F = \frac{MS_{Time}}{MS_{Error}} \sim F_{k-1, (n-1)(k-1)} \quad \text{under } H_0: \tau_1 = \cdots = \tau_k \]

Sphericity and Corrections

The sphericity assumption (also called circularity) requires\[ \text{Var}(y_{ij} - y_{ij'}) = \text{constant for all } j \neq j' \] Equivalently, the covariance matrix of the difference scores has equal variances and equal covariances.

Mauchly's Test for Sphericity: \( W = \frac{| \mathbf{E} |}{(\text{tr}(\mathbf{E})/p)^{p}} \sim \chi^2_{(p(p-1)/2 - 1)} \) where \(\mathbf{E}\) is the \(p \times p\) covariance matrix of the difference scores, \(p = k-1\).

When sphericity is violated, corrections adjust degrees of freedom:

1. Greenhouse-Geisser correction\[\hat{\epsilon} = \frac{(\text{tr}(\mathbf{E}))^2}{(p-1)\sum_{i=1}^p \lambda_i^2}\] \(F_{GG} \sim F_{\hat{\epsilon}(k-1), \hat{\epsilon}(n-1)(k-1)}\)

2. Huynh-Feldt correction\[\tilde{\epsilon} = \frac{n(k-1)\hat{\epsilon} - 2}{(k-1)[(n-1) - (k-1)\hat{\epsilon}]}\] \(F_{HF} \sim F_{\tilde{\epsilon}(k-1), \tilde{\epsilon}(n-1)(k-1)}\)

4) Multivariate Approach (MANOVA)

For \(k\) time points, consider the \((k-1) \times 1\) vector of difference scores \(\mathbf{d}_i = (y_{i2}-y_{i1}, \ldots, y_{ik}-y_{i,k-1})^\top\)\[ \mathbf{d}_i \sim N_{k-1}(\boldsymbol{\mu}_d, \boldsymbol{\Sigma}_d) \] Test \(H_0: \boldsymbol{\mu}_d = \mathbf{0}\) using Hotelling's \(T^2\)\[ T^2 = n\bar{\mathbf{d}}^\top \mathbf{S}_d^{-1} \bar{\mathbf{d}} \sim \frac{(n-1)(k-1)}{n-k+1} F_{k-1, n-k+1} \] where \(\bar{\mathbf{d}}\) is the sample mean and \(\mathbf{S}_d\) is the sample covariance of difference scores.

Assumptions and Diagnostics

Key Assumptions: 1. Normality: Residuals \(\varepsilon_{ij}\) should be normally distributed 2. Sphericity: Equal variances of differences between all pairs of repeated measures 3. Missing Data: Missing completely at random (MCAR) for valid inference 4. No Time × Subject Interaction: Treatment effects are consistent across subjects

Diagnostic Procedures:

# Comprehensive rANOVA diagnostic function
check_ranova_assumptions <- function(data, subject_var, time_var, response_var) {
  
  cat("=== REPEATED MEASURES ANOVA DIAGNOSTICS ===\n")
  
  # 1. Normality of residuals
  # Fit a linear model ignoring the repeated measures structure first
  model_lm <- lm(as.formula(paste(response_var, "~", time_var)), data = data)
  residuals <- resid(model_lm)
  
  par(mfrow = c(2, 3))
  
  # Q-Q plot
  qqnorm(residuals, main = "Q-Q Plot of Residuals")
  qqline(residuals)
  
  # Shapiro-Wilk test
  shapiro_test <- shapiro.test(residuals)
  cat("1. Normality (Shapiro-Wilk): W =", round(shapiro_test$statistic, 4),
      "p =", format.pval(shapiro_test$p.value, digits = 4), "\n")
  
  # 2. Sphericity test (requires ez package)
  if (require(ez, quietly = TRUE)) {
    # Convert to wide format for Mauchly's test
    wide_data <- reshape(data, 
                         idvar = subject_var, 
                         timevar = time_var, 
                         direction = "wide")
    
    # Remove subject column
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
    response_matrix <- as.matrix(wide_data[, response_cols])
    
    # Mauchly's test using car package
    if (require(car, quietly = TRUE)) {
      # Create an idata frame for repeated measures
      time_levels <- unique(data[[time_var]])
      idata <- data.frame(time = factor(time_levels))
      
      # Fit linear model for repeated measures
      mlm <- lm(response_matrix ~ 1)
      
      # Mauchly's test
      mauchly_test <- Anova(mlm, idata = idata, idesign = ~time, type = "III")
      mauchly_result <- mauchly_test$sphericity.tests
      
      cat("\n2. Mauchly's Test for Sphericity:\n")
      print(mauchly_result)
      
      # Greenhouse-Geisser and Huynh-Feldt corrections
      epsilon_GG <- mauchly_result["Greenhouse-Geisser epsilon", "GG eps"]
      epsilon_HF <- mauchly_result["Huynh-Feldt epsilon", "HF eps"]
      cat("\nEpsilon corrections:\n")
      cat("  Greenhouse-Geisser:", round(epsilon_GG, 3), "\n")
      cat("  Huynh-Feldt:", round(epsilon_HF, 3), "\n")
    }
  }
  
  # 3. Homogeneity of variance across time points
  boxplot(as.formula(paste(response_var, "~", time_var)), 
          data = data,
          main = "Boxplots by Time Point",
          xlab = time_var, ylab = response_var)
  
  # Levene's test for homogeneity of variance
  if (require(car, quietly = TRUE)) {
    levene_test <- leveneTest(as.formula(paste(response_var, "~ factor(", time_var, ")")), 
                              data = data)
    cat("\n3. Levene's Test for Homogeneity of Variance:\n")
    print(levene_test)
  }
  
  # 4. Individual profiles plot
  data_sorted <- data[order(data[[subject_var]], data[[time_var]]), ]
  
  # Create empty plot
  plot(1, type = "n", 
       xlim = range(as.numeric(data[[time_var]])), 
       ylim = range(data[[response_var]]),
       xlab = time_var, ylab = response_var,
       main = "Individual Profiles")
  
  # Plot each subject's profile
  subjects <- unique(data[[subject_var]])
  colors <- rainbow(length(subjects))
  
  for (i in seq_along(subjects)) {
    subject_data <- data[data[[subject_var]] == subjects[i], ]
    lines(as.numeric(subject_data[[time_var]]), 
          subject_data[[response_var]], 
          col = colors[i], type = "b", lwd = 1)
  }
  
  # Add group mean profile
  group_means <- aggregate(as.formula(paste(response_var, "~", time_var)), 
                          data = data, mean)
  lines(as.numeric(group_means[[time_var]]), 
        group_means[[response_var]], 
        col = "black", lwd = 3, type = "b")
  
  legend("topright", legend = c("Individual", "Group Mean"), 
         col = c("gray", "black"), lwd = c(1, 3), bty = "n")
  
  # 5. Residuals vs fitted plot
  plot(fitted(model_lm), residuals,
       xlab = "Fitted Values", ylab = "Residuals",
       main = "Residuals vs Fitted")
  abline(h = 0, col = "red", lty = 2)
  
  # 6. Histogram of residuals
  hist(residuals, main = "Histogram of Residuals",
       xlab = "Residuals", col = "lightblue", freq = FALSE)
  curve(dnorm(x, mean = mean(residuals), sd = sd(residuals)), 
        add = TRUE, col = "red", lwd = 2)
  
  par(mfrow = c(1, 1))
  
  # Return diagnostic summary
  return(list(
    shapiro_test = shapiro_test,
    sphericity_tests = if(exists("mauchly_result")) mauchly_result else NULL,
    homogeneity_test = if(exists("levene_test")) levene_test else NULL
  ))
}

Applications

Example 1: Exercise Training Intervention

# Recreate the example from the motivation section
exercise_data <- data.frame(
  subject = factor(rep(1:6, each = 3)),
  time = factor(rep(c("Pre", "3Months", "6Months"), 6)),
  fitness = c(45, 50, 55,    # Subject 1
              42, 42, 45,    # Subject 2
              36, 41, 43,    # Subject 3
              39, 35, 40,    # Subject 4
              51, 55, 59,    # Subject 5
              44, 49, 56)    # Subject 6
)

cat("=== Exercise Training Intervention Study ===\n")
cat("Research question: Does fitness level change over time?\n\n")

# Display the data
cat("Dataset:\n")
print(exercise_data)

# Summary statistics
cat("\nSummary statistics by time:\n")
print(aggregate(fitness ~ time, data = exercise_data, 
                FUN = function(x) c(mean = mean(x), sd = sd(x))))

# Run diagnostics first
cat("\n=== Model Diagnostics ===\n")
diagnostics <- check_ranova_assumptions(exercise_data, "subject", "time", "fitness")

# Traditional rANOVA using aov()
cat("\n=== Traditional rANOVA ===\n")
model_aov <- aov(fitness ~ time + Error(subject/time), data = exercise_data)
print(summary(model_aov))

# Extract F-statistic and p-value
aov_summary <- summary(model_aov)
F_time <- aov_summary[[2]][[1]]$"F value"[1]
df_time <- aov_summary[[2]][[1]]$"Df"[1]
df_error <- aov_summary[[2]][[1]]$"Df"[2]
p_time <- aov_summary[[2]][[1]]$"Pr(>F)"[1]

cat(sprintf("\nF(%d, %d) = %.2f, p = %.3f\n", 
            df_time, df_error, F_time, p_time))

# Effect size calculation (Partial eta-squared)
SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
eta_sq <- SS_time / (SS_time + SS_error)
cat("Partial eta-squared (effect size):", round(eta_sq, 3), "\n")

# Post-hoc pairwise comparisons with Bonferroni correction
cat("\n=== Post-hoc Pairwise Comparisons ===\n")
library(emmeans)

# Fit linear model for post-hoc tests
model_lm <- lm(fitness ~ time + subject, data = exercise_data)
emm_time <- emmeans(model_lm, specs = pairwise ~ time, adjust = "bonferroni")
print(emm_time$contrasts)

# Visualization
library(ggplot2)
ggplot(exercise_data, aes(x = time, y = fitness, group = subject)) +
  geom_line(aes(color = subject), alpha = 0.6) +
  geom_point(aes(color = subject), size = 2) +
  stat_summary(aes(group = 1), fun = mean, geom = "line", 
               color = "black", size = 1.5) +
  stat_summary(fun.data = mean_se, geom = "errorbar", 
               width = 0.1, color = "black") +
  labs(title = "Fitness Levels Over Time",
       subtitle = "Individual profiles with group mean",
       x = "Time Point", y = "Fitness Score") +
  theme_minimal()

Example 2: Clinical Trial with Missing Data (Using Mixed Models)

# Simulated clinical trial data with missing values
set.seed(123)
n_subjects <- 30
n_timepoints <- 4

clinical_data <- expand.grid(
  subject = factor(1:n_subjects),
  time = factor(paste0("Week_", 0:3)),
  treatment = factor(rep(c("Drug", "Placebo"), each = n_subjects/2))
)

# Generate data with treatment effect and time trend
clinical_data$response <- with(clinical_data, 
  50 +  # Baseline
  ifelse(treatment == "Drug", 5, 0) +  # Treatment effect
  as.numeric(time) * 2 +  # Time effect (2 units per week)
  ifelse(treatment == "Drug", as.numeric(time) * 1, 0) +  # Treatment × time interaction
  rep(rnorm(n_subjects, 0, 3), each = n_timepoints) +  # Random intercept
  rnorm(n_subjects * n_timepoints, 0, 2)  # Residual error
)

# Introduce missing data (MCAR)
missing_indices <- sample(1:nrow(clinical_data), size = 20)
clinical_data$response[missing_indices] <- NA

cat("=== Clinical Trial with Repeated Measures ===\n")
cat("Research question: Does the treatment affect response over time?\n\n")

cat("Missing data pattern:\n")
print(table(is.na(clinical_data$response)))

# Approach 1: Linear Mixed Model (recommended for missing data)
cat("\n=== Linear Mixed Model Approach ===\n")
library(lme4)
library(lmerTest)

# Fit mixed model with random intercepts
mixed_model <- lmer(response ~ time * treatment + (1|subject), 
                    data = clinical_data, 
                    na.action = na.omit)

cat("\nModel summary:\n")
print(summary(mixed_model))

# Test fixed effects
cat("\nTests of fixed effects:\n")
print(anova(mixed_model))

# Random effects variance components
cat("\nVariance components:\n")
print(VarCorr(mixed_model))

# Estimated marginal means
if (require(emmeans)) {
  cat("\n=== Estimated Marginal Means ===\n")
  emm_treatment <- emmeans(mixed_model, specs = ~ treatment)
  print(emm_treatment)
  
  emm_time <- emmeans(mixed_model, specs = ~ time)
  print(emm_time)
  
  # Interaction plot
  emm_interaction <- emmeans(mixed_model, specs = ~ treatment:time)
  print(emm_interaction)
  

# Robust Interaction plot
if (require(emmeans)) {
  cat("\n=== Estimated Marginal Means ===\n")
  emm_treatment <- emmeans(mixed_model, specs = ~ treatment)
  print(emm_treatment)
  
  emm_time <- emmeans(mixed_model, specs = ~ time)
  print(emm_time)
  
  # Interaction plot
  emm_interaction <- emmeans(mixed_model, specs = ~ treatment:time)
  
  # Convert to data frame and ensure confidence intervals are present
  plot_data <- as.data.frame(emm_interaction)
  
  # Check if confidence interval columns exist
  if (!all(c("asymp.LCL", "asymp.LCL") %in% names(plot_data))) {
    # If not, get them using summary
    plot_data <- as.data.frame(summary(emm_interaction))
  }
  
  # Now plot
  ggplot(plot_data, aes(x = time, y = emmean, group = treatment, color = treatment)) +
    geom_line(size = 1.5) +
    geom_errorbar(aes(ymin = asymp.LCL, ymax = asymp.UCL), width = 0.1) +
    geom_point(size = 3) +
    labs(title = "Treatment × Time Interaction",
         subtitle = "Estimated marginal means with 95% confidence intervals",
         x = "Time Point", y = "Response") +
    theme_minimal()
}

}

# Approach 2: Traditional rANOVA (for comparison)
cat("\n=== Traditional rANOVA (for comparison) ===\n")
# Remove subjects with missing data for traditional ANOVA
complete_data <- clinical_data[complete.cases(clinical_data), ]
complete_subjects <- unique(complete_data$subject)
cat("Subjects with complete data:", length(complete_subjects), "\n")

if (length(complete_subjects) >= 3) {
  model_aov2 <- aov(response ~ time * treatment + Error(subject/time), 
                    data = complete_data)
  print(summary(model_aov2))
} else {
  cat("Insufficient complete cases for traditional rANOVA\n")
}

Example 3: Real Dataset - CO2 Uptake in Plants

# Using built-in CO2 dataset from R
data(CO2)
cat("=== CO2 Uptake in Plants ===\n")
cat("Research question: Does CO2 uptake change with concentration?\n\n")

# Prepare data: focus on Quebec plants, type Qn1
co2_data <- CO2[CO2$Type == "Quebec" & CO2$Plant == "Qn1", ]
co2_data$conc_factor <- factor(co2_data$conc)

cat("Data structure:\n")
str(co2_data)

# Visual inspection
library(ggplot2)
ggplot(co2_data, aes(x = conc, y = uptake, group = 1)) +
  geom_line() +
  geom_point() +
  labs(title = "CO2 Uptake by Concentration",
       x = "CO2 Concentration (mL/L)", y = "Uptake (umol/m^2 sec)") +
  theme_minimal()

# Since we have only one plant, this is a single-subject repeated measures design
# We'll use a polynomial contrast analysis

cat("\n=== Polynomial Trend Analysis ===\n")
# Create orthogonal polynomial contrasts for concentration
conc_levels <- length(unique(co2_data$conc))
poly_contrasts <- contr.poly(conc_levels)

# Apply contrasts
co2_data$conc_factor <- factor(co2_data$conc)
contrasts(co2_data$conc_factor) <- poly_contrasts

# Fit model with polynomial contrasts
model_poly <- lm(uptake ~ conc_factor, data = co2_data)
print(summary(model_poly))

# Extract polynomial components
anova_poly <- anova(model_poly)
cat("\nPolynomial decomposition:\n")
print(anova_poly)

# Calculate proportion of variance explained by each polynomial component
SS_total <- sum(anova_poly$`Sum Sq`)
for (i in 1:nrow(anova_poly)) {
  prop_var <- anova_poly$`Sum Sq`[i] / SS_total
  cat(sprintf("Component %d: %.1f%% of variance\n", 
              i, prop_var * 100))
}

Advanced Topics

Power Analysis for Repeated Measures

# Power analysis for repeated measures designs
calculate_power_rm <- function(n, k, rho, f2, alpha = 0.05) {
  # n: number of subjects
  # k: number of time points
  # rho: correlation between repeated measures
  # f2: effect size (Cohen's f^2)
  
  # Degrees of freedom
  df1 <- k - 1
  df2 <- (n - 1) * (k - 1)
  
  # Non-centrality parameter
  lambda <- n * k * f2
  
  # Critical F value
  F_crit <- qf(1 - alpha, df1, df2)
  
  # Power
  power <- 1 - pf(F_crit, df1, df2, ncp = lambda)
  
  return(power)
}

# Example: Clinical trial with 20 subjects, 4 time points
power_example <- calculate_power_rm(n = 20, k = 4, rho = 0.5, f2 = 0.25)
cat("Power for repeated measures design:\n")
cat("Subjects: 20, Time points: 4, Effect size f² = 0.25\n")
cat("Estimated power:", round(power_example, 3), "\n")

# Power curve visualization
library(ggplot2)
n_range <- seq(10, 50, by = 5)
power_values <- sapply(n_range, function(n) calculate_power_rm(n, 4, 0.5, 0.25))

power_df <- data.frame(n = n_range, power = power_values)
ggplot(power_df, aes(x = n, y = power)) +
  geom_line(size = 1.5, color = "blue") +
  geom_point(size = 3) +
  geom_hline(yintercept = 0.8, linetype = "dashed", color = "red") +
  labs(title = "Power Analysis for Repeated Measures Design",
       subtitle = "k = 4 time points, ρ = 0.5, f² = 0.25",
       x = "Number of Subjects", y = "Statistical Power") +
  theme_minimal()

Handling Autocorrelated Errors

# When measurements are equally spaced in time, errors may be autocorrelated
library(nlme)

# Simulate data with AR(1) errors
set.seed(456)
n <- 15
k <- 5
time_points <- 1:k

ar_data <- data.frame(
  subject = rep(1:n, each = k),
  time = rep(time_points, n),
  treatment = rep(rep(c("A", "B"), length.out = n), each = k)
)

# Generate data with AR(1) errors
for (i in 1:n) {
  ar_errors <- arima.sim(model = list(ar = 0.7), n = k, sd = 2)
  idx <- ((i-1)*k + 1):(i*k)
  ar_data$response[idx] <- 50 + 
    ifelse(ar_data$treatment[idx] == "A", 5, 0) +
    ar_data$time[idx] * 2 +
    ar_errors
}

# Fit model with AR(1) correlation structure
model_ar1 <- lme(response ~ time * treatment, 
                 random = ~ 1 | subject,
                 correlation = corAR1(form = ~ time | subject),
                 data = ar_data)

# Refit the model using maximum likelihood (method = "ML") instead of REML. 
# This often resolves the non-positive definite matrix issue for computing intervals on variance-covariance components, 
# as ML doesn't apply the same adjustments that can destabilize the approximation under REML.
# Refit the model with ML
model_ar1_ml <- update(model_ar1, method = "ML")

cat("=== Initial Model with AR(1) Correlation Structure ===\n")
print(summary(model_ar1))

cat("=== Refit Model with AR(1) Correlation Structure and Maximum Likelihood  Estimation ===\n")
print(summary(model_ar1_ml))
cat("\nCorrelation parameter:", intervals(model_ar1)$corStruct[2], "\n")

# Compare with compound symmetry model
model_cs <- lme(response ~ time * treatment, 
                random = ~ 1 | subject,
                correlation = corCompSymm(form = ~ 1 | subject),
                data = ar_data)

# Model comparison
cat("\n=== Model Comparison ===\n")
print(anova(model_ar1, model_cs))

Software Implementation

# Comprehensive rANOVA analysis function
run_ranova_analysis <- function(data, subject_var, time_var, response_var, 
                                between_vars = NULL, method = "mixed") {
  
  cat("=== REPEATED MEASURES ANOVA ANALYSIS ===\n\n")
  cat("Subject variable:", subject_var, "\n")
  cat("Time variable:", time_var, "\n")
  cat("Response variable:", response_var, "\n")
  if (!is.null(between_vars)) {
    cat("Between-subjects variables:", paste(between_vars, collapse = ", "), "\n")
  }
  cat("Method:", method, "\n\n")
  
  # Check for missing data
  missing_count <- sum(is.na(data[[response_var]]))
  cat("Missing observations:", missing_count, 
      sprintf("(%.1f%%)\n", missing_count/nrow(data)*100))
  
  if (method == "mixed") {
    # Linear mixed model approach
    library(lme4)
    library(lmerTest)
    
    # Build formula
    if (is.null(between_vars)) {
      formula_str <- paste(response_var, "~", time_var, "+ (1|", subject_var, ")")
    } else {
      between_str <- paste(between_vars, collapse = " + ")
      # Add interactions with time
      interaction_str <- paste(paste(time_var, between_vars, sep = ":"), collapse = " + ")
      formula_str <- paste(response_var, "~", time_var, "*", between_str, 
                          "+ (1|", subject_var, ")")
    }
    
    cat("Fitting model:", formula_str, "\n\n")
    model <- lmer(as.formula(formula_str), data = data)
    
    cat("--- Model Summary ---\n")
    print(summary(model))
    
    cat("\n--- Tests of Fixed Effects ---\n")
    print(anova(model))
    
    # Variance components
    cat("\n--- Variance Components ---\n")
    print(VarCorr(model))
    
    # Calculate intraclass correlation coefficient (ICC)
    vc <- VarCorr(model)
    sigma_subject <- as.data.frame(vc)$vcov[1]
    sigma_residual <- as.data.frame(vc)$vcov[2]
    ICC <- sigma_subject / (sigma_subject + sigma_residual)
    cat("\nIntraclass Correlation Coefficient (ICC):", round(ICC, 3), "\n")
    
  } else if (method == "traditional") {
    # Traditional rANOVA using aov()
    if (!is.null(between_vars)) {
      formula_str <- paste(response_var, "~", 
                          paste(between_vars, collapse = " + "), 
                          "+ Error(", subject_var, "/", time_var, ")")
    } else {
      formula_str <- paste(response_var, "~", time_var, 
                          "+ Error(", subject_var, "/", time_var, ")")
    }
    
    cat("Fitting model:", formula_str, "\n\n")
    model <- aov(as.formula(formula_str), data = data)
    
    cat("--- ANOVA Table ---\n")
    print(summary(model))
  }
  
  # Post-hoc tests if there's a significant time effect
  cat("\n--- Post-hoc Comparisons ---\n")
  if (require(emmeans)) {
    if (method == "mixed") {
      emm_time <- emmeans(model, specs = pairwise ~ time_var)
    } else {
      # For traditional ANOVA, fit a linear model for post-hoc tests
      if (is.null(between_vars)) {
        lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var))
      } else {
        lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var,
                                      "+", paste(between_vars, collapse = " + ")))
      }
      lm_model <- lm(lm_formula, data = data)
      emm_time <- emmeans(lm_model, specs = pairwise ~ time_var, adjust = "tukey")
    }
    print(emm_time$contrasts)
  }
  
  # Effect sizes
  cat("\n--- Effect Sizes ---\n")
  if (method == "traditional") {
    aov_summary <- summary(model)
    if (!is.null(between_vars)) {
      # This is more complex with between-subjects factors
      cat("Note: Effect size calculation for factorial rANOVA requires manual computation\n")
    } else {
      SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
      SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
      eta_sq <- SS_time / (SS_time + SS_error)
      cat("Partial eta-squared:", round(eta_sq, 3), "\n")
    }
  }
  
  return(model)
}

# Example usage with built-in dataset
cat("\n\n=== EXAMPLE: Chick Weight Dataset ===\n")
data(ChickWeight)
ChickWeight$Time <- factor(ChickWeight$Time)
ChickWeight$Chick <- factor(ChickWeight$Chick)

# Run analysis
result <- run_ranova_analysis(
  data = ChickWeight,
  subject_var = "Chick",
  time_var = "Time",
  response_var = "weight",
  between_vars = "Diet",
  method = "mixed"
)

Common Issues and Solutions

Missing Data Patterns

# Identify missing data patterns
library(mice)

analyze_missing_patterns <- function(data, subject_var, time_var, response_var) {
  # Convert to wide format
  wide_data <- reshape(data, 
                       idvar = subject_var, 
                       timevar = time_var, 
                       direction = "wide")
  
  # Create pattern matrix
  pattern_matrix <- md.pattern(wide_data[, -1], plot = FALSE)
  
  cat("Missing Data Patterns:\n")
  print(pattern_matrix)
  
  # Visualize missingness
  if (require(VIM)) {
    aggr_plot <- aggr(wide_data[, -1], 
                      numbers = TRUE, 
                      sortVars = TRUE, 
                      labels = names(wide_data)[-1],
                      cex.axis = 0.7, 
                      gap = 3, 
                      ylab = c("Missing Data Pattern", "Frequency"))
  }
  
  # Test for MCAR using Little's test
  if (require(norm2)) {
    # Prepare data for Little's test
    response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
    test_data <- wide_data[, response_cols]
    
    # Perform Little's MCAR test
    little_test <- mcar.test(test_data)
    cat("\nLittle's MCAR Test:\n")
    print(little_test)
  }
  
  return(pattern_matrix)
}

# Example
missing_pattern <- analyze_missing_patterns(clinical_data, "subject", "time", "response")

Violation of Sphericity

handle_sphericity_violation <- function(data, subject_var, time_var, response_var) {
  cat("=== Handling Sphericity Violation ===\n")
  
  # 1. Use Greenhouse-Geisser or Huynh-Feldt corrections
  cat("\n1. Apply epsilon corrections:\n")
  # (Implementation shown in check_ranova_assumptions function above)
  
  # 2. Use multivariate approach (MANOVA)
  cat("\n2. Multivariate approach (MANOVA):\n")
  
  # Convert to wide format
  wide_data <- reshape(data, 
                       idvar = subject_var, 
                       timevar = time_var, 
                       direction = "wide")
  
  # Extract response columns
  response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
  response_matrix <- as.matrix(wide_data[, response_cols])
  
  # Fit MANOVA
  manova_result <- manova(response_matrix ~ 1)
  
  # Test using Pillai's trace (most robust)
  manova_summary <- summary(manova_result, test = "Pillai")
  print(manova_summary)
  
  # 3. Use linear mixed models with unstructured covariance
  cat("\n3. Linear mixed model with unstructured covariance:\n")
  library(nlme)
  
  # Fit model with unstructured covariance
  model_unstructured <- lme(as.formula(paste(response_var, "~", time_var)),
                           random = ~ 1 | subject_var,
                           correlation = corSymm(form = ~ as.numeric(get(time_var)) | subject_var),
                           data = data,
                           control = lmeControl(opt = "optim"))
  
  print(summary(model_unstructured))
  
  # Compare with compound symmetry model
  model_compound <- lme(as.formula(paste(response_var, "~", time_var)),
                       random = ~ 1 | subject_var,
                       correlation = corCompSymm(form = ~ 1 | subject_var),
                       data = data)
  
  cat("\nModel comparison (unstructured vs. compound symmetry):\n")
  print(anova(model_unstructured, model_compound))
}

Problems and Exercises

1. Conceptual Problems:

  a) Derive the expected mean squares for the rANOVA model
  b) Prove that the F-test for time effects in rANOVA follows an F-distribution under the null hypothesis
  c) Show how the intraclass correlation coefficient relates to the variance components in the mixed model

2. Applied Problems:

  a) Analyze the [SOCR LDT dataset](http://wiki.socr.umich.edu/index.php/SOCR_Data_072108_LDT) using rANOVA
  b) Conduct a repeated measures analysis of the [ChickWeight dataset](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ChickWeight.html) with diet as a between-subjects factor
  c) Perform power analysis for a planned longitudinal study with 5 time points, expecting a medium effect (f² = 0.15), and correlation ρ = 0.6


3. Simulation Study:

# Simulation to compare traditional rANOVA and linear mixed models
simulate_rm_comparison <- function(n_sim = 500, n = 20, k = 4, missing_rate = 0.1) {
  
  type1_error_aov <- numeric(n_sim)
  type1_error_lmm <- numeric(n_sim)
  
  for (sim in 1:n_sim) {
    # Simulate data under null hypothesis (no time effect)
    data <- expand.grid(
      subject = factor(1:n),
      time = factor(1:k)
    )
    
    # Random intercept model
    subject_effects <- rnorm(n, 0, 5)
    data$response <- rep(subject_effects, each = k) + rnorm(n * k, 0, 2)
    
    # Introduce missing data
    if (missing_rate > 0) {
      missing_indices <- sample(1:nrow(data), size = round(nrow(data) * missing_rate))
      data$response[missing_indices] <- NA
    }
    
    # Method 1: Traditional rANOVA (complete cases only)
    complete_data <- data[complete.cases(data), ]
    complete_subjects <- length(unique(complete_data$subject))
    
    if (complete_subjects >= 3) {
      try({
        model_aov <- aov(response ~ time + Error(subject/time), data = complete_data)
        aov_summary <- summary(model_aov)
        p_aov <- aov_summary[[2]][[1]]$"Pr(>F)"[1]
        type1_error_aov[sim] <- p_aov < 0.05
      }, silent = TRUE)
    }
    
    # Method 2: Linear mixed model (uses all available data)
    try({
      library(lmerTest)
      model_lmm <- lmer(response ~ time + (1|subject), data = data, 
                        na.action = na.omit)
      p_lmm <- anova(model_lmm)$"Pr(>F)"[1]
      type1_error_lmm[sim] <- p_lmm < 0.05
    }, silent = TRUE)
  }
  
  cat("Simulation Results (n =", n_sim, "):\n")
  cat("Type I error rate - Traditional rANOVA:", 
      mean(type1_error_aov, na.rm = TRUE), "\n")
  cat("Type I error rate - Linear Mixed Model:", 
      mean(type1_error_lmm, na.rm = TRUE), "\n")
  
  return(data.frame(
    simulation = 1:n_sim,
    aov_error = type1_error_aov,
    lmm_error = type1_error_lmm
  ))
}

# Run simulation
sim_results <- simulate_rm_comparison(n_sim = 200, n = 15, k = 3, missing_rate = 0.1)

References

1. Maxwell, S. E., & Delaney, H. D. (2004). *Designing Experiments and Analyzing Data: A Model Comparison Perspective* (2nd ed.). Psychology Press. 2. Pinheiro, J. C., & Bates, D. M. (2000). *Mixed-Effects Models in S and S-PLUS*. Springer. 3. Verbeke, G., & Molenberghs, G. (2000). *Linear Mixed Models for Longitudinal Data*. Springer. 4. Field, A., Miles, J., & Field, Z. (2012). *Discovering Statistics Using R*. Sage Publications. 5. Gueorguieva, R., & Krystal, J. H. (2004). Move over ANOVA: Progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry. *Archives of General Psychiatry, 61*(3), 310-317.

Online Resources




Translate this page:

(default)
Uk flag.gif

Deutsch
De flag.gif

Español
Es flag.gif

Français
Fr flag.gif

Italiano
It flag.gif

Português
Pt flag.gif

日本語
Jp flag.gif

България
Bg flag.gif

الامارات العربية المتحدة
Ae flag.gif

Suomi
Fi flag.gif

इस भाषा में
In flag.gif

Norge
No flag.png

한국어
Kr flag.gif

中文
Cn flag.gif

繁体中文
Cn flag.gif

Русский
Ru flag.gif

Nederlands
Nl flag.gif

Ελληνικά
Gr flag.gif

Hrvatska
Hr flag.gif

Česká republika
Cz flag.gif

Danmark
Dk flag.gif

Polska
Pl flag.png

România
Ro flag.png

Sverige
Se flag.gif