Difference between revisions of "SMHS rANOVA"
(→Sphericity and Corrections) |
|||
| Line 122: | Line 122: | ||
1. '''Greenhouse-Geisser correction''': <math>\hat{\epsilon} = \frac{(\text{tr}(\mathbf{E}))^2}{(p-1)\sum_{i=1}^p \lambda_i^2}</math> | 1. '''Greenhouse-Geisser correction''': <math>\hat{\epsilon} = \frac{(\text{tr}(\mathbf{E}))^2}{(p-1)\sum_{i=1}^p \lambda_i^2}</math> | ||
| − | <math> | + | <math>F_{GG} \sim F_{\hat{\epsilon}(k-1), \hat{\epsilon}(n-1)(k-1)}</math> |
| − | |||
| − | |||
2. '''Huynh-Feldt correction''': <math>\tilde{\epsilon} = \frac{n(k-1)\hat{\epsilon} - 2}{(k-1)[(n-1) - (k-1)\hat{\epsilon}]}</math> | 2. '''Huynh-Feldt correction''': <math>\tilde{\epsilon} = \frac{n(k-1)\hat{\epsilon} - 2}{(k-1)[(n-1) - (k-1)\hat{\epsilon}]}</math> | ||
| − | <math> | + | <math>F_{HF} \sim F_{\tilde{\epsilon}(k-1), \tilde{\epsilon}(n-1)(k-1)}</math> |
| − | |||
| − | |||
====4) Multivariate Approach (MANOVA)==== | ====4) Multivariate Approach (MANOVA)==== | ||
Revision as of 12:28, 9 December 2025
Contents
- 1 Scientific Methods for Health Sciences - Repeated Measures Analysis of Variance (rANOVA)
Scientific Methods for Health Sciences - Repeated Measures Analysis of Variance (rANOVA)
Overview
Repeated Measures Analysis of Variance (rANOVA) is a statistical method for analyzing data where the same subjects are measured multiple times under different conditions or at different time points. Unlike independent ANOVA, rANOVA accounts for the correlation between repeated measurements on the same subject, increasing statistical power by reducing error variance attributable to individual differences. This section provides a comprehensive treatment of rANOVA, including its mathematical foundations, assumptions, modern alternatives, and practical implementation in R.
Motivation
Consider a clinical trial measuring blood pressure in patients at baseline, 3 months, and 6 months post-treatment. Using independent ANOVA would violate the independence assumption since measurements are correlated within subjects. rANOVA addresses this by:
1. Partitioning variance into between-subjects and within-subjects components
2. Increasing statistical power by reducing error variance
3. Accounting for individual differences through subject-specific effects
4. Enabling longitudinal analysis of change over time
For studies with missing data, complex covariance structures, or unbalanced designs, Linear Mixed Models (LMMs) provide a more flexible alternative to traditional rANOVA.
Theory
Mathematical Model
Univariate Approach
For \(n\) subjects measured at \(k\) time points, the univariate rANOVA model is\[ y_{ij} = \mu + \pi_i + \tau_j + \varepsilon_{ij}, \quad i=1,\ldots,n,\; j=1,\ldots,k \] where: - \(y_{ij}\) is the measurement for subject \(i\) at time \(j\) - \(\mu\) is the overall mean - \(\pi_i \sim N(0, \sigma_{\pi}^2)\) is the random subject effect - \(\tau_j\) is the fixed time effect (\(\sum_{j=1}^k \tau_j = 0\)) - \(\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon}^2)\) is the within-subject error
The covariance structure assumes\[ \text{Cov}(y_{ij}, y_{i'j'}) = \begin{cases} \sigma_{\pi}^2 + \sigma_{\varepsilon}^2 & \text{if } i=i', j=j' \\ \sigma_{\pi}^2 & \text{if } i=i', j \neq j' \\ 0 & \text{if } i \neq i' \end{cases} \]
Matrix Formulation
Let \(\mathbf{y}_i = (y_{i1}, \ldots, y_{ik})^\top\) be the vector of measurements for subject \(i\)\[ \mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol{\varepsilon}_i \] where: - \(\mathbf{X}_i\) is the design matrix for fixed effects - \(\boldsymbol{\beta}\) contains fixed effects (including time effects) - \(\mathbf{Z}_i\) is the design matrix for random effects - \(\mathbf{b}_i \sim N(\mathbf{0}, \mathbf{G})\) are random subject effects - \(\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R}_i)\) is the within-subject error
For compound symmetry structure\[ \text{Var}(\mathbf{y}_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^\top + \mathbf{R}_i = \sigma_{\pi}^2\mathbf{J}_k + \sigma_{\varepsilon}^2\mathbf{I}_k \] where \(\mathbf{J}_k\) is a \(k \times k\) matrix of ones and \(\mathbf{I}_k\) is the identity matrix.
Variance Partitioning
The total sum of squares decomposes as\[ SST = SS_{BetweenSubjects} + SS_{WithinSubjects} \] \( SS_{WithinSubjects} = SS_{Time} + SS_{Error} \)
More formally\[ \begin{aligned} SST &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Subjects} &= k \sum_{i=1}^n (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Time} &= n \sum_{j=1}^k (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 \\ SS_{Error} &= \sum_{i=1}^n \sum_{j=1}^k (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2 \end{aligned} \]
Degrees of freedom\[ \begin{aligned} df_{Total} &= nk - 1 \\ df_{Subjects} &= n - 1 \\ df_{Time} &= k - 1 \\ df_{Error} &= (n - 1)(k - 1) \end{aligned} \]
Mean squares\[ \begin{aligned} MS_{Subjects} &= SS_{Subjects} / (n-1) \\ MS_{Time} &= SS_{Time} / (k-1) \\ MS_{Error} &= SS_{Error} / ((n-1)(k-1)) \end{aligned} \]
F-statistic for time effect\[ F = \frac{MS_{Time}}{MS_{Error}} \sim F_{k-1, (n-1)(k-1)} \quad \text{under } H_0: \tau_1 = \cdots = \tau_k \]
Sphericity and Corrections
The sphericity assumption (also called circularity) requires\[ \text{Var}(y_{ij} - y_{ij'}) = \text{constant for all } j \neq j' \] Equivalently, the covariance matrix of the difference scores has equal variances and equal covariances.
Mauchly's Test for Sphericity: \( W = \frac{| \mathbf{E} |}{(\text{tr}(\mathbf{E})/p)^{p}} \sim \chi^2_{(p(p-1)/2 - 1)} \) where \(\mathbf{E}\) is the \(p \times p\) covariance matrix of the difference scores, \(p = k-1\).
When sphericity is violated, corrections adjust degrees of freedom:
1. Greenhouse-Geisser correction\[\hat{\epsilon} = \frac{(\text{tr}(\mathbf{E}))^2}{(p-1)\sum_{i=1}^p \lambda_i^2}\]
\(F_{GG} \sim F_{\hat{\epsilon}(k-1), \hat{\epsilon}(n-1)(k-1)}\)
2. Huynh-Feldt correction\[\tilde{\epsilon} = \frac{n(k-1)\hat{\epsilon} - 2}{(k-1)[(n-1) - (k-1)\hat{\epsilon}]}\]
\(F_{HF} \sim F_{\tilde{\epsilon}(k-1), \tilde{\epsilon}(n-1)(k-1)}\)
4) Multivariate Approach (MANOVA)
For \(k\) time points, consider the \((k-1) \times 1\) vector of difference scores \(\mathbf{d}_i = (y_{i2}-y_{i1}, \ldots, y_{ik}-y_{i,k-1})^\top\)\[ \mathbf{d}_i \sim N_{k-1}(\boldsymbol{\mu}_d, \boldsymbol{\Sigma}_d) \] Test \(H_0: \boldsymbol{\mu}_d = \mathbf{0}\) using Hotelling's \(T^2\)\[ T^2 = n\bar{\mathbf{d}}^\top \mathbf{S}_d^{-1} \bar{\mathbf{d}} \sim \frac{(n-1)(k-1)}{n-k+1} F_{k-1, n-k+1} \] where \(\bar{\mathbf{d}}\) is the sample mean and \(\mathbf{S}_d\) is the sample covariance of difference scores.
Assumptions and Diagnostics
Key Assumptions: 1. Normality: Residuals \(\varepsilon_{ij}\) should be normally distributed 2. Sphericity: Equal variances of differences between all pairs of repeated measures 3. Missing Data: Missing completely at random (MCAR) for valid inference 4. No Time × Subject Interaction: Treatment effects are consistent across subjects
Diagnostic Procedures:
# Comprehensive rANOVA diagnostic function
check_ranova_assumptions <- function(data, subject_var, time_var, response_var) {
cat("=== REPEATED MEASURES ANOVA DIAGNOSTICS ===\n")
# 1. Normality of residuals
# Fit a linear model ignoring the repeated measures structure first
model_lm <- lm(as.formula(paste(response_var, "~", time_var)), data = data)
residuals <- resid(model_lm)
par(mfrow = c(2, 3))
# Q-Q plot
qqnorm(residuals, main = "Q-Q Plot of Residuals")
qqline(residuals)
# Shapiro-Wilk test
shapiro_test <- shapiro.test(residuals)
cat("1. Normality (Shapiro-Wilk): W =", round(shapiro_test$statistic, 4),
"p =", format.pval(shapiro_test$p.value, digits = 4), "\n")
# 2. Sphericity test (requires ez package)
if (require(ez, quietly = TRUE)) {
# Convert to wide format for Mauchly's test
wide_data <- reshape(data,
idvar = subject_var,
timevar = time_var,
direction = "wide")
# Remove subject column
response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
response_matrix <- as.matrix(wide_data[, response_cols])
# Mauchly's test using car package
if (require(car, quietly = TRUE)) {
# Create an idata frame for repeated measures
time_levels <- unique(data[[time_var]])
idata <- data.frame(time = factor(time_levels))
# Fit linear model for repeated measures
mlm <- lm(response_matrix ~ 1)
# Mauchly's test
mauchly_test <- Anova(mlm, idata = idata, idesign = ~time, type = "III")
mauchly_result <- mauchly_test$sphericity.tests
cat("\n2. Mauchly's Test for Sphericity:\n")
print(mauchly_result)
# Greenhouse-Geisser and Huynh-Feldt corrections
epsilon_GG <- mauchly_result["Greenhouse-Geisser epsilon", "GG eps"]
epsilon_HF <- mauchly_result["Huynh-Feldt epsilon", "HF eps"]
cat("\nEpsilon corrections:\n")
cat(" Greenhouse-Geisser:", round(epsilon_GG, 3), "\n")
cat(" Huynh-Feldt:", round(epsilon_HF, 3), "\n")
}
}
# 3. Homogeneity of variance across time points
boxplot(as.formula(paste(response_var, "~", time_var)),
data = data,
main = "Boxplots by Time Point",
xlab = time_var, ylab = response_var)
# Levene's test for homogeneity of variance
if (require(car, quietly = TRUE)) {
levene_test <- leveneTest(as.formula(paste(response_var, "~ factor(", time_var, ")")),
data = data)
cat("\n3. Levene's Test for Homogeneity of Variance:\n")
print(levene_test)
}
# 4. Individual profiles plot
data_sorted <- data[order(data[[subject_var]], data[[time_var]]), ]
# Create empty plot
plot(1, type = "n",
xlim = range(as.numeric(data[[time_var]])),
ylim = range(data[[response_var]]),
xlab = time_var, ylab = response_var,
main = "Individual Profiles")
# Plot each subject's profile
subjects <- unique(data[[subject_var]])
colors <- rainbow(length(subjects))
for (i in seq_along(subjects)) {
subject_data <- data[data[[subject_var]] == subjects[i], ]
lines(as.numeric(subject_data[[time_var]]),
subject_data[[response_var]],
col = colors[i], type = "b", lwd = 1)
}
# Add group mean profile
group_means <- aggregate(as.formula(paste(response_var, "~", time_var)),
data = data, mean)
lines(as.numeric(group_means[[time_var]]),
group_means[[response_var]],
col = "black", lwd = 3, type = "b")
legend("topright", legend = c("Individual", "Group Mean"),
col = c("gray", "black"), lwd = c(1, 3), bty = "n")
# 5. Residuals vs fitted plot
plot(fitted(model_lm), residuals,
xlab = "Fitted Values", ylab = "Residuals",
main = "Residuals vs Fitted")
abline(h = 0, col = "red", lty = 2)
# 6. Histogram of residuals
hist(residuals, main = "Histogram of Residuals",
xlab = "Residuals", col = "lightblue", freq = FALSE)
curve(dnorm(x, mean = mean(residuals), sd = sd(residuals)),
add = TRUE, col = "red", lwd = 2)
par(mfrow = c(1, 1))
# Return diagnostic summary
return(list(
shapiro_test = shapiro_test,
sphericity_tests = if(exists("mauchly_result")) mauchly_result else NULL,
homogeneity_test = if(exists("levene_test")) levene_test else NULL
))
}
Applications
Example 1: Exercise Training Intervention
# Recreate the example from the motivation section
exercise_data <- data.frame(
subject = factor(rep(1:6, each = 3)),
time = factor(rep(c("Pre", "3Months", "6Months"), 6)),
fitness = c(45, 50, 55, # Subject 1
42, 42, 45, # Subject 2
36, 41, 43, # Subject 3
39, 35, 40, # Subject 4
51, 55, 59, # Subject 5
44, 49, 56) # Subject 6
)
cat("=== Exercise Training Intervention Study ===\n")
cat("Research question: Does fitness level change over time?\n\n")
# Display the data
cat("Dataset:\n")
print(exercise_data)
# Summary statistics
cat("\nSummary statistics by time:\n")
print(aggregate(fitness ~ time, data = exercise_data,
FUN = function(x) c(mean = mean(x), sd = sd(x))))
# Run diagnostics first
cat("\n=== Model Diagnostics ===\n")
diagnostics <- check_ranova_assumptions(exercise_data, "subject", "time", "fitness")
# Traditional rANOVA using aov()
cat("\n=== Traditional rANOVA ===\n")
model_aov <- aov(fitness ~ time + Error(subject/time), data = exercise_data)
print(summary(model_aov))
# Extract F-statistic and p-value
aov_summary <- summary(model_aov)
F_time <- aov_summary[[2]][[1]]$"F value"[1]
df_time <- aov_summary[[2]][[1]]$"Df"[1]
df_error <- aov_summary[[2]][[1]]$"Df"[2]
p_time <- aov_summary[[2]][[1]]$"Pr(>F)"[1]
cat(sprintf("\nF(%d, %d) = %.2f, p = %.3f\n",
df_time, df_error, F_time, p_time))
# Effect size calculation (Partial eta-squared)
SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
eta_sq <- SS_time / (SS_time + SS_error)
cat("Partial eta-squared (effect size):", round(eta_sq, 3), "\n")
# Post-hoc pairwise comparisons with Bonferroni correction
cat("\n=== Post-hoc Pairwise Comparisons ===\n")
library(emmeans)
# Fit linear model for post-hoc tests
model_lm <- lm(fitness ~ time + subject, data = exercise_data)
emm_time <- emmeans(model_lm, specs = pairwise ~ time, adjust = "bonferroni")
print(emm_time$contrasts)
# Visualization
library(ggplot2)
ggplot(exercise_data, aes(x = time, y = fitness, group = subject)) +
geom_line(aes(color = subject), alpha = 0.6) +
geom_point(aes(color = subject), size = 2) +
stat_summary(aes(group = 1), fun = mean, geom = "line",
color = "black", size = 1.5) +
stat_summary(fun.data = mean_se, geom = "errorbar",
width = 0.1, color = "black") +
labs(title = "Fitness Levels Over Time",
subtitle = "Individual profiles with group mean",
x = "Time Point", y = "Fitness Score") +
theme_minimal()
Example 2: Clinical Trial with Missing Data (Using Mixed Models)
# Simulated clinical trial data with missing values
set.seed(123)
n_subjects <- 30
n_timepoints <- 4
clinical_data <- expand.grid(
subject = factor(1:n_subjects),
time = factor(paste0("Week_", 0:3)),
treatment = factor(rep(c("Drug", "Placebo"), each = n_subjects/2))
)
# Generate data with treatment effect and time trend
clinical_data$response <- with(clinical_data,
50 + # Baseline
ifelse(treatment == "Drug", 5, 0) + # Treatment effect
as.numeric(time) * 2 + # Time effect (2 units per week)
ifelse(treatment == "Drug", as.numeric(time) * 1, 0) + # Treatment × time interaction
rep(rnorm(n_subjects, 0, 3), each = n_timepoints) + # Random intercept
rnorm(n_subjects * n_timepoints, 0, 2) # Residual error
)
# Introduce missing data (MCAR)
missing_indices <- sample(1:nrow(clinical_data), size = 20)
clinical_data$response[missing_indices] <- NA
cat("=== Clinical Trial with Repeated Measures ===\n")
cat("Research question: Does the treatment affect response over time?\n\n")
cat("Missing data pattern:\n")
print(table(is.na(clinical_data$response)))
# Approach 1: Linear Mixed Model (recommended for missing data)
cat("\n=== Linear Mixed Model Approach ===\n")
library(lme4)
library(lmerTest)
# Fit mixed model with random intercepts
mixed_model <- lmer(response ~ time * treatment + (1|subject),
data = clinical_data,
na.action = na.omit)
cat("\nModel summary:\n")
print(summary(mixed_model))
# Test fixed effects
cat("\nTests of fixed effects:\n")
print(anova(mixed_model))
# Random effects variance components
cat("\nVariance components:\n")
print(VarCorr(mixed_model))
# Estimated marginal means
if (require(emmeans)) {
cat("\n=== Estimated Marginal Means ===\n")
emm_treatment <- emmeans(mixed_model, specs = ~ treatment)
print(emm_treatment)
emm_time <- emmeans(mixed_model, specs = ~ time)
print(emm_time)
# Interaction plot
emm_interaction <- emmeans(mixed_model, specs = ~ treatment:time)
print(emm_interaction)
# Plot interaction
plot_data <- as.data.frame(emm_interaction)
ggplot(plot_data, aes(x = time, y = emmean, group = treatment, color = treatment)) +
geom_line(size = 1.5) +
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.1) +
geom_point(size = 3) +
labs(title = "Treatment × Time Interaction",
subtitle = "Estimated marginal means with 95% confidence intervals",
x = "Time Point", y = "Response") +
theme_minimal()
}
# Approach 2: Traditional rANOVA (for comparison)
cat("\n=== Traditional rANOVA (for comparison) ===\n")
# Remove subjects with missing data for traditional ANOVA
complete_data <- clinical_data[complete.cases(clinical_data), ]
complete_subjects <- unique(complete_data$subject)
cat("Subjects with complete data:", length(complete_subjects), "\n")
if (length(complete_subjects) >= 3) {
model_aov2 <- aov(response ~ time * treatment + Error(subject/time),
data = complete_data)
print(summary(model_aov2))
} else {
cat("Insufficient complete cases for traditional rANOVA\n")
}
Example 3: Real Dataset - CO2 Uptake in Plants
# Using built-in CO2 dataset from R
data(CO2)
cat("=== CO2 Uptake in Plants ===\n")
cat("Research question: Does CO2 uptake change with concentration?\n\n")
# Prepare data: focus on Quebec plants, type Qn1
co2_data <- CO2[CO2$Type == "Quebec" & CO2$Plant == "Qn1", ]
co2_data$conc_factor <- factor(co2_data$conc)
cat("Data structure:\n")
str(co2_data)
# Visual inspection
library(ggplot2)
ggplot(co2_data, aes(x = conc, y = uptake, group = 1)) +
geom_line() +
geom_point() +
labs(title = "CO2 Uptake by Concentration",
x = "CO2 Concentration (mL/L)", y = "Uptake (umol/m^2 sec)") +
theme_minimal()
# Since we have only one plant, this is a single-subject repeated measures design
# We'll use a polynomial contrast analysis
cat("\n=== Polynomial Trend Analysis ===\n")
# Create orthogonal polynomial contrasts for concentration
conc_levels <- length(unique(co2_data$conc))
poly_contrasts <- contr.poly(conc_levels)
# Apply contrasts
co2_data$conc_factor <- factor(co2_data$conc)
contrasts(co2_data$conc_factor) <- poly_contrasts
# Fit model with polynomial contrasts
model_poly <- lm(uptake ~ conc_factor, data = co2_data)
print(summary(model_poly))
# Extract polynomial components
anova_poly <- anova(model_poly)
cat("\nPolynomial decomposition:\n")
print(anova_poly)
# Calculate proportion of variance explained by each polynomial component
SS_total <- sum(anova_poly$`Sum Sq`)
for (i in 1:nrow(anova_poly)) {
prop_var <- anova_poly$`Sum Sq`[i] / SS_total
cat(sprintf("Component %d: %.1f%% of variance\n",
i, prop_var * 100))
}
Advanced Topics
Power Analysis for Repeated Measures
# Power analysis for repeated measures designs
calculate_power_rm <- function(n, k, rho, f2, alpha = 0.05) {
# n: number of subjects
# k: number of time points
# rho: correlation between repeated measures
# f2: effect size (Cohen's f^2)
# Degrees of freedom
df1 <- k - 1
df2 <- (n - 1) * (k - 1)
# Non-centrality parameter
lambda <- n * k * f2
# Critical F value
F_crit <- qf(1 - alpha, df1, df2)
# Power
power <- 1 - pf(F_crit, df1, df2, ncp = lambda)
return(power)
}
# Example: Clinical trial with 20 subjects, 4 time points
power_example <- calculate_power_rm(n = 20, k = 4, rho = 0.5, f2 = 0.25)
cat("Power for repeated measures design:\n")
cat("Subjects: 20, Time points: 4, Effect size f² = 0.25\n")
cat("Estimated power:", round(power_example, 3), "\n")
# Power curve visualization
library(ggplot2)
n_range <- seq(10, 50, by = 5)
power_values <- sapply(n_range, function(n) calculate_power_rm(n, 4, 0.5, 0.25))
power_df <- data.frame(n = n_range, power = power_values)
ggplot(power_df, aes(x = n, y = power)) +
geom_line(size = 1.5, color = "blue") +
geom_point(size = 3) +
geom_hline(yintercept = 0.8, linetype = "dashed", color = "red") +
labs(title = "Power Analysis for Repeated Measures Design",
subtitle = "k = 4 time points, ρ = 0.5, f² = 0.25",
x = "Number of Subjects", y = "Statistical Power") +
theme_minimal()
# When measurements are equally spaced in time, errors may be autocorrelated
library(nlme)
# Simulate data with AR(1) errors
set.seed(456)
n <- 15
k <- 5
time_points <- 1:k
ar_data <- data.frame(
subject = rep(1:n, each = k),
time = rep(time_points, n),
treatment = rep(rep(c("A", "B"), length.out = n), each = k)
)
# Generate data with AR(1) errors
for (i in 1:n) {
ar_errors <- arima.sim(model = list(ar = 0.7), n = k, sd = 2)
idx <- ((i-1)*k + 1):(i*k)
ar_data$response[idx] <- 50 +
ifelse(ar_data$treatment[idx] == "A", 5, 0) +
ar_data$time[idx] * 2 +
ar_errors
}
# Fit model with AR(1) correlation structure
model_ar1 <- lme(response ~ time * treatment,
random = ~ 1 | subject,
correlation = corAR1(form = ~ time | subject),
data = ar_data)
cat("=== Model with AR(1) Correlation Structure ===\n")
print(summary(model_ar1))
cat("\nCorrelation parameter:", intervals(model_ar1)$corStruct[2], "\n")
# Compare with compound symmetry model
model_cs <- lme(response ~ time * treatment,
random = ~ 1 | subject,
correlation = corCompSymm(form = ~ 1 | subject),
data = ar_data)
# Model comparison
cat("\n=== Model Comparison ===\n")
print(anova(model_ar1, model_cs))
Software Implementation
# Comprehensive rANOVA analysis function
run_ranova_analysis <- function(data, subject_var, time_var, response_var,
between_vars = NULL, method = "mixed") {
cat("=== REPEATED MEASURES ANOVA ANALYSIS ===\n\n")
cat("Subject variable:", subject_var, "\n")
cat("Time variable:", time_var, "\n")
cat("Response variable:", response_var, "\n")
if (!is.null(between_vars)) {
cat("Between-subjects variables:", paste(between_vars, collapse = ", "), "\n")
}
cat("Method:", method, "\n\n")
# Check for missing data
missing_count <- sum(is.na(data[[response_var]]))
cat("Missing observations:", missing_count,
sprintf("(%.1f%%)\n", missing_count/nrow(data)*100))
if (method == "mixed") {
# Linear mixed model approach
library(lme4)
library(lmerTest)
# Build formula
if (is.null(between_vars)) {
formula_str <- paste(response_var, "~", time_var, "+ (1|", subject_var, ")")
} else {
between_str <- paste(between_vars, collapse = " + ")
# Add interactions with time
interaction_str <- paste(paste(time_var, between_vars, sep = ":"), collapse = " + ")
formula_str <- paste(response_var, "~", time_var, "*", between_str,
"+ (1|", subject_var, ")")
}
cat("Fitting model:", formula_str, "\n\n")
model <- lmer(as.formula(formula_str), data = data)
cat("--- Model Summary ---\n")
print(summary(model))
cat("\n--- Tests of Fixed Effects ---\n")
print(anova(model))
# Variance components
cat("\n--- Variance Components ---\n")
print(VarCorr(model))
# Calculate intraclass correlation coefficient (ICC)
vc <- VarCorr(model)
sigma_subject <- as.data.frame(vc)$vcov[1]
sigma_residual <- as.data.frame(vc)$vcov[2]
ICC <- sigma_subject / (sigma_subject + sigma_residual)
cat("\nIntraclass Correlation Coefficient (ICC):", round(ICC, 3), "\n")
} else if (method == "traditional") {
# Traditional rANOVA using aov()
if (!is.null(between_vars)) {
formula_str <- paste(response_var, "~",
paste(between_vars, collapse = " + "),
"+ Error(", subject_var, "/", time_var, ")")
} else {
formula_str <- paste(response_var, "~", time_var,
"+ Error(", subject_var, "/", time_var, ")")
}
cat("Fitting model:", formula_str, "\n\n")
model <- aov(as.formula(formula_str), data = data)
cat("--- ANOVA Table ---\n")
print(summary(model))
}
# Post-hoc tests if there's a significant time effect
cat("\n--- Post-hoc Comparisons ---\n")
if (require(emmeans)) {
if (method == "mixed") {
emm_time <- emmeans(model, specs = pairwise ~ time_var)
} else {
# For traditional ANOVA, fit a linear model for post-hoc tests
if (is.null(between_vars)) {
lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var))
} else {
lm_formula <- as.formula(paste(response_var, "~", subject_var, "+", time_var,
"+", paste(between_vars, collapse = " + ")))
}
lm_model <- lm(lm_formula, data = data)
emm_time <- emmeans(lm_model, specs = pairwise ~ time_var, adjust = "tukey")
}
print(emm_time$contrasts)
}
# Effect sizes
cat("\n--- Effect Sizes ---\n")
if (method == "traditional") {
aov_summary <- summary(model)
if (!is.null(between_vars)) {
# This is more complex with between-subjects factors
cat("Note: Effect size calculation for factorial rANOVA requires manual computation\n")
} else {
SS_time <- aov_summary[[2]][[1]]$"Sum Sq"[1]
SS_error <- aov_summary[[2]][[1]]$"Sum Sq"[2]
eta_sq <- SS_time / (SS_time + SS_error)
cat("Partial eta-squared:", round(eta_sq, 3), "\n")
}
}
return(model)
}
# Example usage with built-in dataset
cat("\n\n=== EXAMPLE: Chick Weight Dataset ===\n")
data(ChickWeight)
ChickWeight$Time <- factor(ChickWeight$Time)
ChickWeight$Chick <- factor(ChickWeight$Chick)
# Run analysis
result <- run_ranova_analysis(
data = ChickWeight,
subject_var = "Chick",
time_var = "Time",
response_var = "weight",
between_vars = "Diet",
method = "mixed"
)
Common Issues and Solutions
Missing Data Patterns
# Identify missing data patterns
library(mice)
analyze_missing_patterns <- function(data, subject_var, time_var, response_var) {
# Convert to wide format
wide_data <- reshape(data,
idvar = subject_var,
timevar = time_var,
direction = "wide")
# Create pattern matrix
pattern_matrix <- md.pattern(wide_data[, -1], plot = FALSE)
cat("Missing Data Patterns:\n")
print(pattern_matrix)
# Visualize missingness
if (require(VIM)) {
aggr_plot <- aggr(wide_data[, -1],
numbers = TRUE,
sortVars = TRUE,
labels = names(wide_data)[-1],
cex.axis = 0.7,
gap = 3,
ylab = c("Missing Data Pattern", "Frequency"))
}
# Test for MCAR using Little's test
if (require(norm2)) {
# Prepare data for Little's test
response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
test_data <- wide_data[, response_cols]
# Perform Little's MCAR test
little_test <- mcar.test(test_data)
cat("\nLittle's MCAR Test:\n")
print(little_test)
}
return(pattern_matrix)
}
# Example
missing_pattern <- analyze_missing_patterns(clinical_data, "subject", "time", "response")
Violation of Sphericity
handle_sphericity_violation <- function(data, subject_var, time_var, response_var) {
cat("=== Handling Sphericity Violation ===\n")
# 1. Use Greenhouse-Geisser or Huynh-Feldt corrections
cat("\n1. Apply epsilon corrections:\n")
# (Implementation shown in check_ranova_assumptions function above)
# 2. Use multivariate approach (MANOVA)
cat("\n2. Multivariate approach (MANOVA):\n")
# Convert to wide format
wide_data <- reshape(data,
idvar = subject_var,
timevar = time_var,
direction = "wide")
# Extract response columns
response_cols <- grep(paste0("^", response_var, "\\."), names(wide_data), value = TRUE)
response_matrix <- as.matrix(wide_data[, response_cols])
# Fit MANOVA
manova_result <- manova(response_matrix ~ 1)
# Test using Pillai's trace (most robust)
manova_summary <- summary(manova_result, test = "Pillai")
print(manova_summary)
# 3. Use linear mixed models with unstructured covariance
cat("\n3. Linear mixed model with unstructured covariance:\n")
library(nlme)
# Fit model with unstructured covariance
model_unstructured <- lme(as.formula(paste(response_var, "~", time_var)),
random = ~ 1 | subject_var,
correlation = corSymm(form = ~ as.numeric(get(time_var)) | subject_var),
data = data,
control = lmeControl(opt = "optim"))
print(summary(model_unstructured))
# Compare with compound symmetry model
model_compound <- lme(as.formula(paste(response_var, "~", time_var)),
random = ~ 1 | subject_var,
correlation = corCompSymm(form = ~ 1 | subject_var),
data = data)
cat("\nModel comparison (unstructured vs. compound symmetry):\n")
print(anova(model_unstructured, model_compound))
}
Problems and Exercises
1. Conceptual Problems:
a) Derive the expected mean squares for the rANOVA model b) Prove that the F-test for time effects in rANOVA follows an F-distribution under the null hypothesis c) Show how the intraclass correlation coefficient relates to the variance components in the mixed model
2. Applied Problems:
a) Analyze the [SOCR LDT dataset](http://wiki.socr.umich.edu/index.php/SOCR_Data_072108_LDT) using rANOVA b) Conduct a repeated measures analysis of the [ChickWeight dataset](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ChickWeight.html) with diet as a between-subjects factor c) Perform power analysis for a planned longitudinal study with 5 time points, expecting a medium effect (f² = 0.15), and correlation ρ = 0.6
3. Simulation Study:
# Simulation to compare traditional rANOVA and linear mixed models
simulate_rm_comparison <- function(n_sim = 500, n = 20, k = 4, missing_rate = 0.1) {
type1_error_aov <- numeric(n_sim)
type1_error_lmm <- numeric(n_sim)
for (sim in 1:n_sim) {
# Simulate data under null hypothesis (no time effect)
data <- expand.grid(
subject = factor(1:n),
time = factor(1:k)
)
# Random intercept model
subject_effects <- rnorm(n, 0, 5)
data$response <- rep(subject_effects, each = k) + rnorm(n * k, 0, 2)
# Introduce missing data
if (missing_rate > 0) {
missing_indices <- sample(1:nrow(data), size = round(nrow(data) * missing_rate))
data$response[missing_indices] <- NA
}
# Method 1: Traditional rANOVA (complete cases only)
complete_data <- data[complete.cases(data), ]
complete_subjects <- length(unique(complete_data$subject))
if (complete_subjects >= 3) {
try({
model_aov <- aov(response ~ time + Error(subject/time), data = complete_data)
aov_summary <- summary(model_aov)
p_aov <- aov_summary[[2]][[1]]$"Pr(>F)"[1]
type1_error_aov[sim] <- p_aov < 0.05
}, silent = TRUE)
}
# Method 2: Linear mixed model (uses all available data)
try({
library(lmerTest)
model_lmm <- lmer(response ~ time + (1|subject), data = data,
na.action = na.omit)
p_lmm <- anova(model_lmm)$"Pr(>F)"[1]
type1_error_lmm[sim] <- p_lmm < 0.05
}, silent = TRUE)
}
cat("Simulation Results (n =", n_sim, "):\n")
cat("Type I error rate - Traditional rANOVA:",
mean(type1_error_aov, na.rm = TRUE), "\n")
cat("Type I error rate - Linear Mixed Model:",
mean(type1_error_lmm, na.rm = TRUE), "\n")
return(data.frame(
simulation = 1:n_sim,
aov_error = type1_error_aov,
lmm_error = type1_error_lmm
))
}
# Run simulation
sim_results <- simulate_rm_comparison(n_sim = 200, n = 15, k = 3, missing_rate = 0.1)
References
1. Maxwell, S. E., & Delaney, H. D. (2004). *Designing Experiments and Analyzing Data: A Model Comparison Perspective* (2nd ed.). Psychology Press. 2. Pinheiro, J. C., & Bates, D. M. (2000). *Mixed-Effects Models in S and S-PLUS*. Springer. 3. Verbeke, G., & Molenberghs, G. (2000). *Linear Mixed Models for Longitudinal Data*. Springer. 4. Field, A., Miles, J., & Field, Z. (2012). *Discovering Statistics Using R*. Sage Publications. 5. Gueorguieva, R., & Krystal, J. H. (2004). Move over ANOVA: Progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry. *Archives of General Psychiatry, 61*(3), 310-317.
Online Resources
- Repeated Measures Analysis in R
- Linear Mixed Models with lme4
- SOCR Resources and Datasets
- UCLA R Seminar on Repeated Measures
- SOCR Home page: https://www.socr.umich.edu
Translate this page: