Difference between revisions of "SMHS ANCOVA"
(→References) |
|||
| (10 intermediate revisions by 2 users not shown) | |||
| Line 2: | Line 2: | ||
===Overview=== | ===Overview=== | ||
| − | Analysis of | + | '''Analysis of Covariance (ANCOVA)''' is a statistical technique that blends [[SMHS_ANOVA|Analysis of Variance (ANOVA)]] and [[SMHS_Regression|regression analysis]] to assess whether population means of a dependent variable (DV) differ across levels of a categorical independent variable (IV) while controlling for the effects of continuous covariates (CVs). ANCOVA extends ANOVA by incorporating continuous predictors, which increases statistical power and reduces bias from preexisting group differences. This section provides a comprehensive treatment of ANCOVA, its multivariate extensions (MANCOVA), and practical implementation with R. |
===Motivation=== | ===Motivation=== | ||
| − | + | Consider a clinical trial comparing two treatments for blood pressure reduction. Patients differ in baseline blood pressure, age, and BMI. Simple ANOVA comparing treatment groups would ignore these covariates, potentially biasing results. ANCOVA addresses this by: | |
| + | 1. '''Increasing statistical power''' by reducing within-group error variance | ||
| + | 2. '''Adjusting for preexisting differences''' in nonequivalent groups | ||
| + | 3. '''Reducing bias''' from confounding variables | ||
| + | 4. '''Enabling more precise estimation''' of treatment effects | ||
| + | |||
| + | For multivariate outcomes (e.g., blood pressure, cholesterol, weight), '''Multivariate ANCOVA (MANCOVA)''' extends this framework to multiple DVs simultaneously. | ||
===Theory=== | ===Theory=== | ||
| − | + | ====1) ANOVA Review==== | |
| − | + | ||
| − | + | =====One-way ANOVA===== | |
| − | + | For \(k\) groups with \(n_i\) observations in group \(i\), the model is: | |
| − | + | \[ | |
| + | y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2) | ||
| + | \] | ||
| + | where: | ||
| + | - \(y_{ij}\) is the \(j\)-th observation in group \(i\) | ||
| + | - \(\mu\) is the overall mean | ||
| + | - \(\tau_i\) is the treatment effect for group \(i\) (\(\sum_{i=1}^k \tau_i = 0\)) | ||
| + | - \(\varepsilon_{ij}\) is the random error | ||
| + | |||
| + | The total sum of squares partitions as: | ||
| + | \[ | ||
| + | SST = SSB + SSW = \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^k n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 | ||
| + | \] | ||
| + | |||
| + | The F-statistic tests \(H_0: \tau_1 = \tau_2 = \cdots = \tau_k\): | ||
| + | \[ | ||
| + | F = \frac{MSB}{MSW} = \frac{SSB/(k-1)}{SSW/(n-k)} \sim F_{k-1, n-k} \quad \text{under } H_0 | ||
| + | \] | ||
| + | |||
| + | =====Two-way ANOVA===== | ||
| + | For factors A (a levels) and B (b levels) with r replicates: | ||
| + | \[ | ||
| + | y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} | ||
| + | \] | ||
| + | where: | ||
| + | - \(\alpha_i\) is the main effect of factor A | ||
| + | - \(\beta_j\) is the main effect of factor B | ||
| + | - \((\alpha\beta)_{ij}\) is the interaction effect | ||
| + | - \(\varepsilon_{ijk} \sim N(0, \sigma^2)\) | ||
| + | |||
| + | Sum of squares decomposition: | ||
| + | \[ | ||
| + | SST = SSA + SSB + SSAB + SSE | ||
| + | \] | ||
| + | |||
| + | ====2) ANCOVA Model==== | ||
| + | |||
| + | =====Basic Model===== | ||
| + | The ANCOVA model with one covariate and one factor: | ||
| + | \[ | ||
| + | y_{ij} = \mu + \tau_i + \beta(x_{ij} - \bar{x}_{\cdot\cdot}) + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2) | ||
| + | \] | ||
| + | where: | ||
| + | - \(y_{ij}\) is the response for observation \(j\) in group \(i\) | ||
| + | - \(\mu\) is the overall mean | ||
| + | - \(\tau_i\) is the treatment effect (\(\sum \tau_i = 0\)) | ||
| + | - \(\beta\) is the regression coefficient for the covariate \(x_{ij}\) | ||
| + | - \(\bar{x}_{\cdot\cdot}\) is the overall mean of the covariate (centering reduces multicollinearity) | ||
| + | |||
| + | The adjusted group mean for group \(i\) is: | ||
| + | \[ | ||
| + | \mu_i^{adj} = \mu + \tau_i = \bar{y}_{i\cdot} - \beta(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}) | ||
| + | \] | ||
| + | |||
| + | =====Matrix Formulation===== | ||
| + | For \(k\) groups and \(p\) covariates: | ||
| + | \[ | ||
| + | \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}) | ||
| + | \] | ||
| + | where: | ||
| + | \[ | ||
| + | \mathbf{X} = \begin{bmatrix} | ||
| + | \mathbf{1} & \mathbf{Z} & \mathbf{C} | ||
| + | \end{bmatrix}, \quad | ||
| + | \boldsymbol{\beta} = \begin{bmatrix} | ||
| + | \mu \\ \boldsymbol{\tau} \\ \boldsymbol{\gamma} | ||
| + | \end{bmatrix} | ||
| + | \] | ||
| + | - \(\mathbf{Z}\) is the design matrix for group indicators | ||
| + | - \(\mathbf{C}\) is the matrix of covariates | ||
| + | - \(\boldsymbol{\tau}\) are treatment effects | ||
| + | - \(\boldsymbol{\gamma}\) are covariate coefficients | ||
| + | |||
| + | The least squares estimates: | ||
| + | \[ | ||
| + | \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y} | ||
| + | \] | ||
| + | |||
| + | =====Hypothesis Testing===== | ||
| + | 1. '''Test for covariate effect''': \(H_0: \beta = 0\) vs \(H_1: \beta \neq 0\) | ||
| + | \[ | ||
| + | F = \frac{MS_{reg}}{MSE} \sim F_{1, n-k-1} | ||
| + | \] | ||
| + | |||
| + | 2. '''Test for treatment effect adjusted for covariate''': \(H_0: \tau_1 = \cdots = \tau_k = 0\) | ||
| + | \[ | ||
| + | F = \frac{MS_{trt|cov}}{MSE} \sim F_{k-1, n-k-1} | ||
| + | \] | ||
| + | |||
| + | The adjusted treatment sum of squares: | ||
| + | \[ | ||
| + | SS_{trt|cov} = SS_{total} - SS_{cov} - SSE | ||
| + | \] | ||
| + | |||
| + | ====3) MANOVA and MANCOVA==== | ||
| − | + | =====MANOVA Model===== | |
| − | + | For \(p\) dependent variables: | |
| − | + | \[ | |
| − | + | \mathbf{Y}_{n\times p} = \mathbf{X}_{n\times q}\mathbf{B}_{q\times p} + \mathbf{E}_{n\times p}, \quad \text{vec}(\mathbf{E}) \sim N(\mathbf{0}, \mathbf{I}_n \otimes \boldsymbol{\Sigma}) | |
| − | + | \] | |
| − | { | + | where \(\boldsymbol{\Sigma}\) is the \(p\times p\) covariance matrix of errors. |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | + | The hypothesis matrix \(\mathbf{H}\) and error matrix \(\mathbf{E}\): | |
| − | + | \[ | |
| − | + | \mathbf{H} = \mathbf{Y}^\top\mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y} - \mathbf{Y}^\top\mathbf{1}(\mathbf{1}^\top\mathbf{1})^{-1}\mathbf{1}^\top\mathbf{Y} | |
| + | \] | ||
| + | \[ | ||
| + | \mathbf{E} = \mathbf{Y}^\top\mathbf{Y} - \mathbf{Y}^\top\mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y} | ||
| + | \] | ||
| − | + | Test statistics based on eigenvalues \(\lambda_i\) of \(\mathbf{E}^{-1}\mathbf{H}\): | |
| − | + | 1. '''Wilks' Lambda''': \(\Lambda = \frac{|\mathbf{E}|}{|\mathbf{H}+\mathbf{E}|} = \prod_{i=1}^s \frac{1}{1+\lambda_i}\) | |
| − | + | 2. '''Pillai's Trace''': \(V = \text{tr}[\mathbf{H}(\mathbf{H}+\mathbf{E})^{-1}] = \sum_{i=1}^s \frac{\lambda_i}{1+\lambda_i}\) | |
| + | 3. '''Hotelling-Lawley Trace''': \(U = \text{tr}(\mathbf{E}^{-1}\mathbf{H}) = \sum_{i=1}^s \lambda_i\) | ||
| + | 4. '''Roy's Largest Root''': \(\theta = \frac{\lambda_1}{1+\lambda_1}\) | ||
| − | + | =====MANCOVA Model===== | |
| − | + | Extends MANOVA with covariates: | |
| − | + | \[ | |
| − | + | \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{Z}\mathbf{\Gamma} + \mathbf{E} | |
| − | + | \] | |
| − | + | where \(\mathbf{Z}\) contains covariates and \(\mathbf{\Gamma}\) their coefficients. | |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | + | ====4) Assumptions and Diagnostics==== | |
| − | + | '''Key Assumptions:''' | |
| − | + | 1. '''Linearity''': Relationship between DV and covariates is linear | |
| − | + | 2. '''Homogeneity of regression slopes''': \(\beta\) is constant across groups | |
| − | + | 3. '''Normality''': Residuals \(\varepsilon_{ij} \sim N(0, \sigma^2)\) | |
| − | + | 4. '''Homoscedasticity''': Constant variance across groups | |
| − | + | 5. '''Independence''': Observations are independent | |
| + | 6. '''No multicollinearity''': Covariates not highly correlated | ||
| − | + | '''Diagnostic Tests:''' | |
| − | + | <pre> | |
| − | + | # R function for ANCOVA diagnostics | |
| − | + | check_ancova_assumptions <- function(model, data, group_var, covariate) { | |
| − | + | par(mfrow = c(2, 3)) | |
| − | + | ||
| − | + | # 1. Normality of residuals | |
| − | + | residuals <- resid(model) | |
| − | + | qqnorm(residuals, main = "Q-Q Plot of Residuals") | |
| − | + | qqline(residuals) | |
| − | + | shapiro_test <- shapiro.test(residuals) | |
| − | + | cat("Shapiro-Wilk normality test: W =", shapiro_test$statistic, | |
| − | + | "p =", shapiro_test$p.value, "\n") | |
| − | + | ||
| − | + | # 2. Homogeneity of variance | |
| − | + | plot(fitted(model), residuals, | |
| − | + | xlab = "Fitted Values", ylab = "Residuals", | |
| − | + | main = "Residuals vs Fitted") | |
| − | + | abline(h = 0, col = "red") | |
| − | + | ||
| − | </ | + | # Levene's test (using car package) |
| + | if (require(car)) { | ||
| + | levene_test <- leveneTest(residuals ~ data[[group_var]]) | ||
| + | cat("Levene's test for homogeneity of variance: F =", | ||
| + | levene_test$F[1], "p =", levene_test$Pr[1], "\n") | ||
| + | } | ||
| + | |||
| + | # 3. Linearity check | ||
| + | plot(data[[covariate]], residuals, | ||
| + | xlab = covariate, ylab = "Residuals", | ||
| + | main = "Residuals vs Covariate") | ||
| + | abline(h = 0, col = "red") | ||
| + | |||
| + | # 4. Homogeneity of regression slopes | ||
| + | interaction_model <- lm(formula(paste("y ~", group_var, "*", covariate)), data = data) | ||
| + | anova_interaction <- anova(model, interaction_model) | ||
| + | cat("\nTest for homogeneity of slopes (interaction test):\n") | ||
| + | print(anova_interaction) | ||
| + | |||
| + | # 5. Multicollinearity (if multiple covariates) | ||
| + | if (require(car)) { | ||
| + | vif_values <- vif(model) | ||
| + | cat("\nVariance Inflation Factors (VIF):\n") | ||
| + | print(vif_values) | ||
| + | } | ||
| + | |||
| + | par(mfrow = c(1, 1)) | ||
| + | } | ||
| + | </pre> | ||
| − | + | Below is a more sophisticated function, ''check_ancova_assumptions_simple()'', which is more robust to model configurations. | |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | + | <pre> | |
| − | + | check_ancova_assumptions_simple <- function(model, data, group_var, covariate) { | |
| − | + | # Get response variable name from model formula | |
| − | + | response_var <- all.vars(formula(model))[1] | |
| − | + | ||
| + | cat("=== ANCOVA ASSUMPTION CHECKS ===\n") | ||
| + | cat("Response variable:", response_var, "\n") | ||
| + | cat("Group variable:", group_var, "\n") | ||
| + | cat("Covariate:", covariate, "\n\n") | ||
| + | |||
| + | # Set up plot layout | ||
| + | par(mfrow = c(2, 3)) | ||
| + | |||
| + | # 1. Normality of residuals | ||
| + | residuals <- resid(model) | ||
| + | qqnorm(residuals, main = "Q-Q Plot of Residuals") | ||
| + | qqline(residuals) | ||
| + | shapiro_test <- shapiro.test(residuals) | ||
| + | cat("1. Normality (Shapiro-Wilk): W =", round(shapiro_test$statistic, 4), | ||
| + | ", p =", format.pval(shapiro_test$p.value, digits = 4), "\n") | ||
| + | |||
| + | # 2. Homogeneity of variance | ||
| + | fitted_vals <- fitted(model) | ||
| + | plot(fitted_vals, residuals, | ||
| + | xlab = "Fitted Values", ylab = "Residuals", | ||
| + | main = "Residuals vs Fitted") | ||
| + | abline(h = 0, col = "red", lty = 2) | ||
| + | |||
| + | # 3. Linearity check (residuals vs covariate) | ||
| + | plot(data[[covariate]], residuals, | ||
| + | xlab = covariate, ylab = "Residuals", | ||
| + | main = paste("Residuals vs", covariate)) | ||
| + | abline(h = 0, col = "red", lty = 2) | ||
| + | |||
| + | # 4. Homogeneity of regression slopes - FIXED: Properly nested comparison | ||
| + | # Get all terms from the original model | ||
| + | model_terms <- attr(terms(model), "term.labels") | ||
| + | |||
| + | # Remove any existing interaction involving the group_var and covariate | ||
| + | # Keep all other terms | ||
| + | other_terms <- model_terms[!grepl(paste0(group_var, ":", covariate, "|", | ||
| + | covariate, ":", group_var), model_terms)] | ||
| + | |||
| + | # Build the model formula without the interaction | ||
| + | if (length(other_terms) > 0) { | ||
| + | base_formula <- paste(response_var, "~", paste(other_terms, collapse = " + ")) | ||
| + | } else { | ||
| + | base_formula <- paste(response_var, "~ 1") | ||
| + | } | ||
| + | |||
| + | # Build interaction model formula by adding the interaction term | ||
| + | interaction_formula <- paste(base_formula, "+", group_var, "*", covariate) | ||
| + | |||
| + | # Fit both models | ||
| + | base_model <- lm(as.formula(base_formula), data = data) | ||
| + | interaction_model <- lm(as.formula(interaction_formula), data = data) | ||
| + | |||
| + | # Test for significant interaction | ||
| + | interaction_test <- anova(base_model, interaction_model) | ||
| + | cat("\n2. Homogeneity of slopes test:\n") | ||
| + | print(interaction_test) | ||
| + | |||
| + | # Check if the interaction is significant - FIXED: Handle NA p-values | ||
| + | p_value <- interaction_test$`Pr(>F)`[2] | ||
| + | |||
| + | if (!is.na(p_value)) { | ||
| + | if (p_value < 0.05) { | ||
| + | cat("\nWARNING: Significant interaction detected (p =", | ||
| + | format.pval(p_value, digits = 4), | ||
| + | "). Slopes are not homogeneous.\n") | ||
| + | |||
| + | # Plot different slopes | ||
| + | plot(data[[covariate]], data[[response_var]], | ||
| + | col = as.numeric(data[[group_var]]), | ||
| + | xlab = covariate, ylab = response_var, | ||
| + | main = "Different Slopes by Group (Interaction Present)") | ||
| + | |||
| + | # Add regression lines for each group | ||
| + | groups <- unique(data[[group_var]]) | ||
| + | for (i in seq_along(groups)) { | ||
| + | group_data <- data[data[[group_var]] == groups[i], ] | ||
| + | if (nrow(group_data) > 1) { | ||
| + | abline(lm(as.formula(paste(response_var, "~", covariate)), | ||
| + | data = group_data), | ||
| + | col = i, lwd = 2) | ||
| + | } | ||
| + | } | ||
| + | |||
| + | legend("topright", legend = levels(data[[group_var]]), | ||
| + | col = 1:length(levels(data[[group_var]])), | ||
| + | lwd = 2, pch = 1) | ||
| + | } else { | ||
| + | cat("\nNo significant interaction (p =", | ||
| + | format.pval(p_value, digits = 4), | ||
| + | "). Homogeneity of slopes assumption is satisfied.\n") | ||
| + | } | ||
| + | } else { | ||
| + | cat("\nNote: Could not calculate p-value for interaction test.\n") | ||
| + | cat("Model comparison summary:\n") | ||
| + | print(interaction_test) | ||
| + | } | ||
| + | |||
| + | # 5. Residual histogram | ||
| + | hist(residuals, main = "Histogram of Residuals", | ||
| + | xlab = "Residuals", col = "lightblue") | ||
| + | |||
| + | par(mfrow = c(1, 1)) | ||
| + | |||
| + | # Additional diagnostics | ||
| + | cat("\n3. Additional Statistics:\n") | ||
| + | cat("- Mean of residuals:", round(mean(residuals), 4), "\n") | ||
| + | cat("- SD of residuals:", round(sd(residuals), 4), "\n") | ||
| + | cat("- Max absolute residual:", round(max(abs(residuals)), 4), "\n") | ||
| + | |||
| + | # Check for outliers (residuals > 3 SD) | ||
| + | if (sd(residuals) > 0) { | ||
| + | outlier_count <- sum(abs(scale(residuals)) > 3, na.rm = TRUE) | ||
| + | cat("- Potential outliers (>3 SD):", outlier_count, "\n") | ||
| + | } | ||
| + | |||
| + | # Return diagnostic results | ||
| + | return(list( | ||
| + | shapiro_test = shapiro_test, | ||
| + | interaction_test = interaction_test, | ||
| + | p_value_interaction = p_value | ||
| + | )) | ||
| + | } | ||
| + | </pre> | ||
===Applications=== | ===Applications=== | ||
| − | + | ====Example 1: Clinical Trial with Baseline Adjustment==== | |
| + | <pre> | ||
| + | # Simulated clinical trial data | ||
| + | set.seed(123) | ||
| + | n <- 100 | ||
| + | trial_data <- data.frame( | ||
| + | patient_id = 1:n, | ||
| + | treatment = factor(rep(c("Drug", "Placebo"), each = n/2)), | ||
| + | baseline_bp = rnorm(n, mean = 150, sd = 15), | ||
| + | age = sample(40:75, n, replace = TRUE), | ||
| + | bmi = rnorm(n, mean = 28, sd = 4) | ||
| + | ) | ||
| − | + | # Generate post-treatment BP with treatment effect and baseline correlation | |
| + | trial_data$post_bp <- with(trial_data, | ||
| + | baseline_bp * 0.7 + | ||
| + | ifelse(treatment == "Drug", -15, -5) + | ||
| + | (age - 60) * 0.2 + | ||
| + | (bmi - 28) * 0.5 + | ||
| + | rnorm(n, 0, 8) | ||
| + | ) | ||
| − | === | + | cat("=== Basic ANOVA (ignoring covariates) ===\n") |
| + | anova_simple <- aov(post_bp ~ treatment, data = trial_data) | ||
| + | print(summary(anova_simple)) | ||
| − | + | cat("\n=== ANCOVA (adjusting for baseline BP) ===\n") | |
| − | + | ancova_model <- aov(post_bp ~ treatment + baseline_bp, data = trial_data) | |
| − | + | print(summary(ancova_model)) | |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | === | + | cat("\n=== ANCOVA (multiple covariates) ===\n") |
| + | ancova_full <- lm(post_bp ~ treatment + baseline_bp + age + bmi, data = trial_data) | ||
| + | print(summary(ancova_full)) | ||
| − | + | cat("\n=== Adjusted Means ===\n") | |
| + | library(emmeans) | ||
| + | adj_means <- emmeans(ancova_full, specs = ~ treatment) | ||
| + | print(adj_means) | ||
| − | + | cat("\n=== Pairwise Comparisons with Adjustment ===\n") | |
| − | + | pairwise_comparisons <- pairs(adj_means, adjust = "holm") | |
| − | + | print(pairwise_comparisons) | |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | + | # Diagnostic checks | |
| − | + | cat("\n=== Model Diagnostics ===\n") | |
| − | + | check_ancova_assumptions_simple(ancova_full, trial_data, "treatment", "baseline_bp") | |
| − | + | </pre> | |
| − | + | ====Example 2: Educational Intervention Study==== | |
| − | + | <pre> | |
| + | # Using built-in iris dataset to demonstrate MANCOVA | ||
| + | data(iris) | ||
| + | # Create a categorical variable and covariate | ||
| + | set.seed(456) | ||
| + | iris$treatment <- factor(sample(c("Method_A", "Method_B", "Control"), | ||
| + | nrow(iris), replace = TRUE)) | ||
| + | iris$pretest_score <- iris$Sepal.Length * 10 + rnorm(nrow(iris), 50, 5) | ||
| − | + | # MANCOVA with multiple DVs | |
| − | + | cat("=== MANCOVA Example ===\n") | |
| − | + | Y <- cbind(iris$Sepal.Width, iris$Petal.Length, iris$Petal.Width) | |
| − | + | manova_model <- manova(Y ~ treatment + pretest_score, data = iris) | |
| − | |||
| − | |||
| − | - | ||
| − | |||
| − | + | cat("\n--- Wilks' Lambda Test ---\n") | |
| − | + | print(summary(manova_model, test = "Wilks")) | |
| − | |||
| − | |||
| − | + | cat("\n--- Pillai's Trace Test ---\n") | |
| − | + | print(summary(manova_model, test = "Pillai")) | |
| − | |||
| − | |||
| − | + | cat("\n--- Individual ANCOVAs ---\n") | |
| − | + | for (i in 1:3) { | |
| − | + | cat("\nDV", i, ":\n") | |
| − | + | ancova_indiv <- lm(Y[, i] ~ treatment + pretest_score, data = iris) | |
| − | + | print(summary(ancova_indiv)) | |
| − | + | } | |
| − | |||
| − | |||
| − | + | # Visualizing adjusted means | |
| + | library(ggplot2) | ||
| + | library(dplyr) | ||
| + | # Calculate adjusted means using emmeans | ||
| + | if (require(emmeans)) { | ||
| + | adj_means_plot <- list() | ||
| + | for (i in 1:3) { | ||
| + | model <- lm(Y[, i] ~ treatment + pretest_score, data = iris) | ||
| + | emm <- emmeans(model, specs = ~ treatment) | ||
| + | adj_means_plot[[i]] <- as.data.frame(emm) %>% | ||
| + | mutate(DV = paste("DV", i)) | ||
| + | } | ||
| + | |||
| + | plot_data <- bind_rows(adj_means_plot) | ||
| + | ggplot(plot_data, aes(x = treatment, y = emmean, fill = treatment)) + | ||
| + | geom_bar(stat = "identity", alpha = 0.7) + | ||
| + | geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.2) + | ||
| + | facet_wrap(~ DV, scales = "free_y") + | ||
| + | labs(title = "Adjusted Means with 95% Confidence Intervals", | ||
| + | y = "Adjusted Mean", x = "Treatment Group") + | ||
| + | theme_minimal() | ||
| + | } | ||
| + | </pre> | ||
| − | === | + | ====Example 3: Real Dataset Analysis - Plant Growth==== |
| − | [ | + | <pre> |
| + | # Using PlantGrowth dataset with simulated covariate | ||
| + | data(PlantGrowth) | ||
| + | set.seed(789) | ||
| + | PlantGrowth$soil_quality <- rnorm(nrow(PlantGrowth), mean = 5, sd = 1) + | ||
| + | ifelse(PlantGrowth$group == "trt1", 0.5, | ||
| + | ifelse(PlantGrowth$group == "trt2", -0.5, 0)) | ||
| + | |||
| + | cat("=== Plant Growth ANCOVA ===\n") | ||
| + | cat("Research question: Do treatments affect plant weight after controlling for soil quality?\n\n") | ||
| + | |||
| + | # EDA | ||
| + | cat("--- Exploratory Data Analysis ---\n") | ||
| + | cat("Group means (unadjusted):\n") | ||
| + | print(aggregate(weight ~ group, data = PlantGrowth, mean)) | ||
| + | |||
| + | cat("\nCorrelation between weight and soil quality:", | ||
| + | cor(PlantGrowth$weight, PlantGrowth$soil_quality), "\n") | ||
| + | |||
| + | # Visualization | ||
| + | ggplot(PlantGrowth, aes(x = soil_quality, y = weight, color = group)) + | ||
| + | geom_point(size = 3) + | ||
| + | geom_smooth(method = "lm", se = FALSE) + | ||
| + | labs(title = "Relationship between Plant Weight and Soil Quality by Treatment", | ||
| + | x = "Soil Quality", y = "Plant Weight") + | ||
| + | theme_minimal() | ||
| + | |||
| + | # ANCOVA analysis | ||
| + | ancova_plant <- lm(weight ~ group + soil_quality, data = PlantGrowth) | ||
| + | cat("\n--- ANCOVA Results ---\n") | ||
| + | print(summary(ancova_plant)) | ||
| + | |||
| + | # Check assumptions | ||
| + | cat("\n--- Assumption Checks ---\n") | ||
| + | check_ancova_assumptions_simple(ancova_plant, PlantGrowth, "group", "soil_quality") | ||
| + | |||
| + | # Contrasts and post-hoc tests | ||
| + | cat("\n--- Post-hoc Comparisons ---\n") | ||
| + | library(multcomp) | ||
| + | contrasts <- glht(ancova_plant, linfct = mcp(group = "Tukey")) | ||
| + | print(summary(contrasts)) | ||
| + | |||
| + | # Effect sizes | ||
| + | cat("\n--- Effect Sizes ---\n") | ||
| + | library(effectsize) | ||
| + | eta_squared <- eta_squared(ancova_plant, partial = TRUE) | ||
| + | print(eta_squared) | ||
| + | |||
| + | # Power analysis | ||
| + | cat("\n--- Power Analysis ---\n") | ||
| + | library(pwr) | ||
| + | # Calculate achieved power | ||
| + | f2 <- eta_squared$Eta2_partial[1] / (1 - eta_squared$Eta2_partial[1]) | ||
| + | power_achieved <- pwr.f2.test(u = 2, v = 27, f2 = f2, sig.level = 0.05)$power | ||
| + | cat("Achieved power for treatment effect:", round(power_achieved, 3), "\n") | ||
| + | </pre> | ||
| + | |||
| + | ===Advanced Topics=== | ||
| + | |||
| + | ====1) Nonparametric ANCOVA==== | ||
| + | <pre> | ||
| + | # Quandrant test or rank-based ANCOVA | ||
| + | if (require(Rfit)) { | ||
| + | cat("=== Rank-Based ANCOVA ===\n") | ||
| + | rank_model <- rfit(weight ~ group + soil_quality, data = PlantGrowth) | ||
| + | print(summary(rank_model)) | ||
| + | } | ||
| + | </pre> | ||
| + | |||
| + | ====2) Mixed Effects ANCOVA==== | ||
| + | <pre> | ||
| + | # For repeated measures or clustered data | ||
| + | if (require(lme4)) { | ||
| + | # Simulated longitudinal data | ||
| + | set.seed(321) | ||
| + | long_data <- data.frame( | ||
| + | subject = rep(1:20, each = 3), | ||
| + | time = rep(1:3, 20), | ||
| + | treatment = factor(rep(sample(c("A", "B"), 20, replace = TRUE), each = 3)), | ||
| + | baseline = rnorm(60, 100, 15), | ||
| + | response = NA | ||
| + | ) | ||
| + | |||
| + | # Generate responses | ||
| + | long_data$response <- with(long_data, | ||
| + | 100 + | ||
| + | ifelse(treatment == "A", 5, -5) + | ||
| + | 0.7 * baseline + | ||
| + | rnorm(60, 0, 10) + # Residual error | ||
| + | rep(rnorm(20, 0, 5), each = 3) # Random intercept | ||
| + | ) | ||
| + | |||
| + | # Linear mixed model (ANCOVA with random effects) | ||
| + | mixed_model <- lmer(response ~ treatment + baseline + time + (1|subject), | ||
| + | data = long_data) | ||
| + | cat("\n=== Mixed Effects ANCOVA ===\n") | ||
| + | print(summary(mixed_model)) | ||
| + | |||
| + | # Compare with standard ANCOVA | ||
| + | standard_ancova <- lm(response ~ treatment + baseline + time, data = long_data) | ||
| + | cat("\n--- Comparison: Standard vs Mixed ANCOVA ---\n") | ||
| + | cat("Standard ANCOVA AIC:", AIC(standard_ancova), "\n") | ||
| + | cat("Mixed model AIC:", AIC(mixed_model), "\n") | ||
| + | } | ||
| + | </pre> | ||
| + | |||
| + | ====3) Power Analysis and Sample Size Planning==== | ||
| + | <pre> | ||
| + | # Power analysis for ANCOVA | ||
| + | calculate_power_ancova <- function(k, n_per_group, f2, rho, alpha = 0.05) { | ||
| + | # k: number of groups | ||
| + | # n_per_group: sample size per group | ||
| + | # f2: effect size (Cohen's f^2) | ||
| + | # rho: correlation between covariate and DV | ||
| + | |||
| + | N <- k * n_per_group | ||
| + | df1 <- k - 1 | ||
| + | df2 <- N - k - 1 # -1 for covariate | ||
| + | |||
| + | # Adjust f2 for covariate inclusion | ||
| + | f2_adj <- f2 / (1 - rho^2) | ||
| + | |||
| + | power <- pf(qf(1 - alpha, df1, df2), df1, df2, ncp = N * f2_adj, lower.tail = FALSE) | ||
| + | return(power) | ||
| + | } | ||
| + | |||
| + | cat("=== ANCOVA Power Calculator ===\n") | ||
| + | # Example: 3 groups, 20 per group, medium effect (f2 = 0.15), rho = 0.5 | ||
| + | power_example <- calculate_power_ancova(k = 3, n_per_group = 20, f2 = 0.15, rho = 0.5) | ||
| + | cat("Power for specified design:", round(power_example, 3), "\n") | ||
| + | |||
| + | # Sample size calculation for desired power | ||
| + | library(pwr) | ||
| + | ss <- pwr.f2.test(u = 2, f2 = 0.15, sig.level = 0.05, power = 0.80) | ||
| + | cat("Required sample size (per group for 3 groups):", ceiling(ss$v/3 + 3), "\n") | ||
| + | </pre> | ||
| + | |||
| + | ===Software Implementation=== | ||
| + | |||
| + | <pre> | ||
| + | # Comprehensive ANCOVA analysis function | ||
| + | run_ancova_analysis <- function(formula, data, group_var, covariates, | ||
| + | pairwise = TRUE, diagnostics = TRUE) { | ||
| + | |||
| + | # Fit model | ||
| + | model <- lm(formula, data = data) | ||
| + | |||
| + | cat("=== ANCOVA ANALYSIS REPORT ===\n\n") | ||
| + | cat("Model formula:", deparse(formula), "\n") | ||
| + | cat("Number of groups:", length(unique(data[[group_var]])), "\n") | ||
| + | cat("Number of covariates:", length(covariates), "\n") | ||
| + | cat("Total observations:", nrow(data), "\n\n") | ||
| + | |||
| + | # Model summary | ||
| + | cat("--- MODEL SUMMARY ---\n") | ||
| + | print(summary(model)) | ||
| + | cat("\n") | ||
| + | |||
| + | # ANOVA table | ||
| + | cat("--- ANOVA TABLE ---\n") | ||
| + | print(anova(model)) | ||
| + | cat("\n") | ||
| + | |||
| + | # Assumption checks | ||
| + | if (diagnostics) { | ||
| + | cat("--- ASSUMPTION DIAGNOSTICS ---\n") | ||
| + | check_ancova_assumptions_simple(model, data, group_var, covariates[1]) | ||
| + | } | ||
| + | |||
| + | # Adjusted means and comparisons | ||
| + | if (pairwise && require(emmeans)) { | ||
| + | cat("\n--- ADJUSTED GROUP MEANS ---\n") | ||
| + | adj_means <- emmeans(model, specs = as.formula(paste("~", group_var))) | ||
| + | print(adj_means) | ||
| + | |||
| + | cat("\n--- PAIRWISE COMPARISONS (Holm-adjusted) ---\n") | ||
| + | pairwise_results <- pairs(adj_means, adjust = "holm") | ||
| + | print(pairwise_results) | ||
| + | |||
| + | # Plot adjusted means | ||
| + | plot_data <- as.data.frame(adj_means) | ||
| + | p <- ggplot(plot_data, aes_string(x = group_var, y = "emmean", fill = group_var)) + | ||
| + | geom_bar(stat = "identity", alpha = 0.7) + | ||
| + | geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.2) + | ||
| + | labs(title = "Adjusted Group Means with 95% Confidence Intervals", | ||
| + | y = "Adjusted Mean", x = group_var) + | ||
| + | theme_minimal() | ||
| + | print(p) | ||
| + | } | ||
| + | |||
| + | # Effect sizes | ||
| + | if (require(effectsize)) { | ||
| + | cat("\n--- EFFECT SIZES ---\n") | ||
| + | es <- eta_squared(model, partial = TRUE) | ||
| + | print(es) | ||
| + | } | ||
| + | |||
| + | return(model) | ||
| + | } | ||
| + | |||
| + | # Example usage with mtcars | ||
| + | cat("\n\n=== EXAMPLE: ANCOVA with mtcars dataset ===\n") | ||
| + | data(mtcars) | ||
| + | mtcars$cyl <- factor(mtcars$cyl) | ||
| + | mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual")) | ||
| + | |||
| + | # Run comprehensive analysis | ||
| + | model_result <- run_ancova_analysis( | ||
| + | formula = mpg ~ cyl + hp + wt, | ||
| + | data = mtcars, | ||
| + | group_var = "cyl", | ||
| + | covariates = c("hp", "wt") | ||
| + | ) | ||
| + | </pre> | ||
| + | |||
| + | ===Common Issues and Solutions=== | ||
| + | |||
| + | ====Violation of Homogeneity of Slopes==== | ||
| + | <pre> | ||
| + | # When slopes differ by group | ||
| + | check_slopes <- function(data, dv, group, covariate) { | ||
| + | # Fit interaction model | ||
| + | interaction_model <- lm(as.formula(paste(dv, "~", group, "*", covariate)), data = data) | ||
| + | |||
| + | # Test interaction | ||
| + | simple_model <- lm(as.formula(paste(dv, "~", group, "+", covariate)), data = data) | ||
| + | anova_result <- anova(simple_model, interaction_model) | ||
| + | |||
| + | cat("Test for homogeneity of regression slopes:\n") | ||
| + | print(anova_result) | ||
| + | |||
| + | if (anova_result$`Pr(>F)`[2] < 0.05) { | ||
| + | cat("\nWARNING: Significant interaction detected. Slopes are not homogeneous.\n") | ||
| + | cat("Consider:\n") | ||
| + | cat("1. Reporting separate slopes for each group\n") | ||
| + | cat("2. Using Johnson-Neyman technique to identify regions of significance\n") | ||
| + | cat("3. Considering moderated regression analysis\n") | ||
| + | |||
| + | # Visualize different slopes | ||
| + | library(ggplot2) | ||
| + | p <- ggplot(data, aes_string(x = covariate, y = dv, color = group)) + | ||
| + | geom_point() + | ||
| + | geom_smooth(method = "lm", se = FALSE) + | ||
| + | labs(title = "Different Slopes by Group (Interaction Present)", | ||
| + | subtitle = "ANCOVA assumption violated") + | ||
| + | theme_minimal() | ||
| + | print(p) | ||
| + | } else { | ||
| + | cat("\nNo significant interaction. Homogeneity of slopes assumption satisfied.\n") | ||
| + | } | ||
| + | } | ||
| + | |||
| + | # Example | ||
| + | check_slopes(mtcars, "mpg", "cyl", "wt") | ||
| + | </pre> | ||
| + | |||
| + | ====Dealing with Missing Data==== | ||
| + | <pre> | ||
| + | # Multiple imputation for ANCOVA with missing covariate values | ||
| + | if (require(mice)) { | ||
| + | cat("=== Multiple Imputation for ANCOVA ===\n") | ||
| + | |||
| + | # Create dataset with missing values | ||
| + | set.seed(123) | ||
| + | missing_data <- PlantGrowth | ||
| + | missing_data$soil_quality[sample(1:nrow(missing_data), 5)] <- NA | ||
| + | |||
| + | # Perform multiple imputation | ||
| + | imputed_data <- mice(missing_data, m = 5, method = "pmm", printFlag = FALSE) | ||
| + | |||
| + | # Fit ANCOVA on each imputed dataset | ||
| + | models <- with(imputed_data, lm(weight ~ group + soil_quality)) | ||
| + | |||
| + | # Pool results | ||
| + | pooled_results <- pool(models) | ||
| + | cat("Pooled coefficients:\n") | ||
| + | print(summary(pooled_results)) | ||
| + | } | ||
| + | </pre> | ||
| + | |||
| + | ===Problems and Exercises=== | ||
| + | |||
| + | 1. '''Conceptual Problems''': | ||
| + | a) Prove that the adjusted group mean in ANCOVA is unbiased when the covariate is uncorrelated with treatment assignment | ||
| + | b) Derive the variance of the adjusted treatment effect estimator | ||
| + | c) Show how ANCOVA increases power compared to ANOVA | ||
| + | |||
| + | 2. '''Applied Problems''': | ||
| + | a) Analyze the SOCR [http://wiki.socr.umich.edu/index.php/SOCR_Data_Dinov_021808_ConsumerPriceIndex3Way Consumer Price Index dataset] using ANCOVA | ||
| + | b) Conduct a MANCOVA on the [https://archive.ics.uci.edu/ml/datasets/iris Iris dataset] with species as the factor and sepal length as covariate | ||
| + | c) Perform power analysis for a planned study with 4 groups, expecting a medium effect (f² = 0.20), and correlation ρ = 0.6 between covariate and DV | ||
| − | + | 3. '''Simulation Study''': | |
| + | <pre> | ||
| + | # Simulation to demonstrate ANCOVA advantages | ||
| + | simulate_ancova_power <- function(n_sim = 1000, n_per_group = 30, | ||
| + | effect_size = 0.5, rho = 0.6) { | ||
| + | |||
| + | power_anova <- numeric(n_sim) | ||
| + | power_ancova <- numeric(n_sim) | ||
| + | |||
| + | for (i in 1:n_sim) { | ||
| + | # Simulate data | ||
| + | group <- factor(rep(1:3, each = n_per_group)) | ||
| + | covariate <- rnorm(n_per_group * 3) | ||
| + | |||
| + | # Generate response with treatment effect and covariate relationship | ||
| + | response <- effect_size * as.numeric(group) + rho * covariate + | ||
| + | rnorm(n_per_group * 3, 0, sqrt(1 - rho^2)) | ||
| + | |||
| + | data <- data.frame(response, group, covariate) | ||
| + | |||
| + | # Fit ANOVA | ||
| + | anova_model <- aov(response ~ group, data = data) | ||
| + | power_anova[i] <- summary(anova_model)[[1]]$`Pr(>F)`[1] < 0.05 | ||
| + | |||
| + | # Fit ANCOVA | ||
| + | ancova_model <- aov(response ~ group + covariate, data = data) | ||
| + | power_ancova[i] <- summary(ancova_model)[[1]]$`Pr(>F)`[1] < 0.05 | ||
| + | } | ||
| + | |||
| + | cat("Simulation Results (n =", n_sim, "):\n") | ||
| + | cat("ANOVA power:", mean(power_anova), "\n") | ||
| + | cat("ANCOVA power:", mean(power_ancova), "\n") | ||
| + | cat("Power increase:", mean(power_ancova) - mean(power_anova), "\n") | ||
| + | |||
| + | return(data.frame(ANOVA = power_anova, ANCOVA = power_ancova)) | ||
| + | } | ||
| − | + | # Run simulation | |
| + | results <- simulate_ancova_power(n_sim = 500, n_per_group = 25, | ||
| + | effect_size = 0.4, rho = 0.5) | ||
| + | </pre> | ||
| − | + | ===References=== | |
| − | + | 1. Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). *Designing Experiments and Analyzing Data: A Model Comparison Perspective* (3rd ed.). Routledge. | |
| − | + | 2. Rutherford, A. (2011). *ANOVA and ANCOVA: A GLM Approach* (2nd ed.). Wiley. | |
| + | 3. Tabachnick, B. G., & Fidell, L. S. (2018). *Using Multivariate Statistics* (7th ed.). Pearson. | ||
| + | 4. Miller, G. A., & Chapman, J. P. (2001). Misunderstanding analysis of covariance. *Journal of Abnormal Psychology, 110*(1), 40-48. | ||
| + | 5. Stevens, J. P. (2012). *Applied Multivariate Statistics for the Social Sciences* (5th ed.). Routledge. | ||
| + | ===Online Resources=== | ||
| + | * [https://www.jstatsoft.org/article/view/v033i01 ANCOVA in R Tutorial] | ||
| + | * [https://cran.r-project.org/web/packages/emmeans/vignettes/emmeans.html Estimated Marginal Means] | ||
| + | * [https://cran.r-project.org/web/views/Multivariate.html Multivariate Statistics in R] | ||
| + | * [https://www.socr.umich.edu SOCR Resources] | ||
<hr> | <hr> | ||
| − | * SOCR Home page: | + | * SOCR Home page: https://www.socr.umich.edu |
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_ANCOVA}} | {{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_ANCOVA}} | ||
Latest revision as of 10:09, 9 December 2025
Contents
Scientific Methods for Health Sciences - Analysis of Covariance (ANCOVA)
Overview
Analysis of Covariance (ANCOVA) is a statistical technique that blends Analysis of Variance (ANOVA) and regression analysis to assess whether population means of a dependent variable (DV) differ across levels of a categorical independent variable (IV) while controlling for the effects of continuous covariates (CVs). ANCOVA extends ANOVA by incorporating continuous predictors, which increases statistical power and reduces bias from preexisting group differences. This section provides a comprehensive treatment of ANCOVA, its multivariate extensions (MANCOVA), and practical implementation with R.
Motivation
Consider a clinical trial comparing two treatments for blood pressure reduction. Patients differ in baseline blood pressure, age, and BMI. Simple ANOVA comparing treatment groups would ignore these covariates, potentially biasing results. ANCOVA addresses this by: 1. Increasing statistical power by reducing within-group error variance 2. Adjusting for preexisting differences in nonequivalent groups 3. Reducing bias from confounding variables 4. Enabling more precise estimation of treatment effects
For multivariate outcomes (e.g., blood pressure, cholesterol, weight), Multivariate ANCOVA (MANCOVA) extends this framework to multiple DVs simultaneously.
Theory
1) ANOVA Review
One-way ANOVA
For \(k\) groups with \(n_i\) observations in group \(i\), the model is: \[ y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2) \] where: - \(y_{ij}\) is the \(j\)-th observation in group \(i\) - \(\mu\) is the overall mean - \(\tau_i\) is the treatment effect for group \(i\) (\(\sum_{i=1}^k \tau_i = 0\)) - \(\varepsilon_{ij}\) is the random error
The total sum of squares partitions as: \[ SST = SSB + SSW = \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^k n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 \]
The F-statistic tests \(H_0: \tau_1 = \tau_2 = \cdots = \tau_k\): \[ F = \frac{MSB}{MSW} = \frac{SSB/(k-1)}{SSW/(n-k)} \sim F_{k-1, n-k} \quad \text{under } H_0 \]
Two-way ANOVA
For factors A (a levels) and B (b levels) with r replicates: \[ y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \] where: - \(\alpha_i\) is the main effect of factor A - \(\beta_j\) is the main effect of factor B - \((\alpha\beta)_{ij}\) is the interaction effect - \(\varepsilon_{ijk} \sim N(0, \sigma^2)\)
Sum of squares decomposition: \[ SST = SSA + SSB + SSAB + SSE \]
2) ANCOVA Model
Basic Model
The ANCOVA model with one covariate and one factor: \[ y_{ij} = \mu + \tau_i + \beta(x_{ij} - \bar{x}_{\cdot\cdot}) + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2) \] where: - \(y_{ij}\) is the response for observation \(j\) in group \(i\) - \(\mu\) is the overall mean - \(\tau_i\) is the treatment effect (\(\sum \tau_i = 0\)) - \(\beta\) is the regression coefficient for the covariate \(x_{ij}\) - \(\bar{x}_{\cdot\cdot}\) is the overall mean of the covariate (centering reduces multicollinearity)
The adjusted group mean for group \(i\) is: \[ \mu_i^{adj} = \mu + \tau_i = \bar{y}_{i\cdot} - \beta(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}) \]
Matrix Formulation
For \(k\) groups and \(p\) covariates: \[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}) \] where: \[ \mathbf{X} = \begin{bmatrix} \mathbf{1} & \mathbf{Z} & \mathbf{C} \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \mu \\ \boldsymbol{\tau} \\ \boldsymbol{\gamma} \end{bmatrix} \] - \(\mathbf{Z}\) is the design matrix for group indicators - \(\mathbf{C}\) is the matrix of covariates - \(\boldsymbol{\tau}\) are treatment effects - \(\boldsymbol{\gamma}\) are covariate coefficients
The least squares estimates: \[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y} \]
Hypothesis Testing
1. Test for covariate effect: \(H_0: \beta = 0\) vs \(H_1: \beta \neq 0\) \[ F = \frac{MS_{reg}}{MSE} \sim F_{1, n-k-1} \]
2. Test for treatment effect adjusted for covariate: \(H_0: \tau_1 = \cdots = \tau_k = 0\) \[ F = \frac{MS_{trt|cov}}{MSE} \sim F_{k-1, n-k-1} \]
The adjusted treatment sum of squares: \[ SS_{trt|cov} = SS_{total} - SS_{cov} - SSE \]
3) MANOVA and MANCOVA
MANOVA Model
For \(p\) dependent variables: \[ \mathbf{Y}_{n\times p} = \mathbf{X}_{n\times q}\mathbf{B}_{q\times p} + \mathbf{E}_{n\times p}, \quad \text{vec}(\mathbf{E}) \sim N(\mathbf{0}, \mathbf{I}_n \otimes \boldsymbol{\Sigma}) \] where \(\boldsymbol{\Sigma}\) is the \(p\times p\) covariance matrix of errors.
The hypothesis matrix \(\mathbf{H}\) and error matrix \(\mathbf{E}\): \[ \mathbf{H} = \mathbf{Y}^\top\mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y} - \mathbf{Y}^\top\mathbf{1}(\mathbf{1}^\top\mathbf{1})^{-1}\mathbf{1}^\top\mathbf{Y} \] \[ \mathbf{E} = \mathbf{Y}^\top\mathbf{Y} - \mathbf{Y}^\top\mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y} \]
Test statistics based on eigenvalues \(\lambda_i\) of \(\mathbf{E}^{-1}\mathbf{H}\): 1. Wilks' Lambda: \(\Lambda = \frac{|\mathbf{E}|}{|\mathbf{H}+\mathbf{E}|} = \prod_{i=1}^s \frac{1}{1+\lambda_i}\) 2. Pillai's Trace: \(V = \text{tr}[\mathbf{H}(\mathbf{H}+\mathbf{E})^{-1}] = \sum_{i=1}^s \frac{\lambda_i}{1+\lambda_i}\) 3. Hotelling-Lawley Trace: \(U = \text{tr}(\mathbf{E}^{-1}\mathbf{H}) = \sum_{i=1}^s \lambda_i\) 4. Roy's Largest Root: \(\theta = \frac{\lambda_1}{1+\lambda_1}\)
MANCOVA Model
Extends MANOVA with covariates: \[ \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{Z}\mathbf{\Gamma} + \mathbf{E} \] where \(\mathbf{Z}\) contains covariates and \(\mathbf{\Gamma}\) their coefficients.
4) Assumptions and Diagnostics
Key Assumptions: 1. Linearity: Relationship between DV and covariates is linear 2. Homogeneity of regression slopes: \(\beta\) is constant across groups 3. Normality: Residuals \(\varepsilon_{ij} \sim N(0, \sigma^2)\) 4. Homoscedasticity: Constant variance across groups 5. Independence: Observations are independent 6. No multicollinearity: Covariates not highly correlated
Diagnostic Tests:
# R function for ANCOVA diagnostics
check_ancova_assumptions <- function(model, data, group_var, covariate) {
par(mfrow = c(2, 3))
# 1. Normality of residuals
residuals <- resid(model)
qqnorm(residuals, main = "Q-Q Plot of Residuals")
qqline(residuals)
shapiro_test <- shapiro.test(residuals)
cat("Shapiro-Wilk normality test: W =", shapiro_test$statistic,
"p =", shapiro_test$p.value, "\n")
# 2. Homogeneity of variance
plot(fitted(model), residuals,
xlab = "Fitted Values", ylab = "Residuals",
main = "Residuals vs Fitted")
abline(h = 0, col = "red")
# Levene's test (using car package)
if (require(car)) {
levene_test <- leveneTest(residuals ~ data[[group_var]])
cat("Levene's test for homogeneity of variance: F =",
levene_test$F[1], "p =", levene_test$Pr[1], "\n")
}
# 3. Linearity check
plot(data[[covariate]], residuals,
xlab = covariate, ylab = "Residuals",
main = "Residuals vs Covariate")
abline(h = 0, col = "red")
# 4. Homogeneity of regression slopes
interaction_model <- lm(formula(paste("y ~", group_var, "*", covariate)), data = data)
anova_interaction <- anova(model, interaction_model)
cat("\nTest for homogeneity of slopes (interaction test):\n")
print(anova_interaction)
# 5. Multicollinearity (if multiple covariates)
if (require(car)) {
vif_values <- vif(model)
cat("\nVariance Inflation Factors (VIF):\n")
print(vif_values)
}
par(mfrow = c(1, 1))
}
Below is a more sophisticated function, check_ancova_assumptions_simple(), which is more robust to model configurations.
check_ancova_assumptions_simple <- function(model, data, group_var, covariate) {
# Get response variable name from model formula
response_var <- all.vars(formula(model))[1]
cat("=== ANCOVA ASSUMPTION CHECKS ===\n")
cat("Response variable:", response_var, "\n")
cat("Group variable:", group_var, "\n")
cat("Covariate:", covariate, "\n\n")
# Set up plot layout
par(mfrow = c(2, 3))
# 1. Normality of residuals
residuals <- resid(model)
qqnorm(residuals, main = "Q-Q Plot of Residuals")
qqline(residuals)
shapiro_test <- shapiro.test(residuals)
cat("1. Normality (Shapiro-Wilk): W =", round(shapiro_test$statistic, 4),
", p =", format.pval(shapiro_test$p.value, digits = 4), "\n")
# 2. Homogeneity of variance
fitted_vals <- fitted(model)
plot(fitted_vals, residuals,
xlab = "Fitted Values", ylab = "Residuals",
main = "Residuals vs Fitted")
abline(h = 0, col = "red", lty = 2)
# 3. Linearity check (residuals vs covariate)
plot(data[[covariate]], residuals,
xlab = covariate, ylab = "Residuals",
main = paste("Residuals vs", covariate))
abline(h = 0, col = "red", lty = 2)
# 4. Homogeneity of regression slopes - FIXED: Properly nested comparison
# Get all terms from the original model
model_terms <- attr(terms(model), "term.labels")
# Remove any existing interaction involving the group_var and covariate
# Keep all other terms
other_terms <- model_terms[!grepl(paste0(group_var, ":", covariate, "|",
covariate, ":", group_var), model_terms)]
# Build the model formula without the interaction
if (length(other_terms) > 0) {
base_formula <- paste(response_var, "~", paste(other_terms, collapse = " + "))
} else {
base_formula <- paste(response_var, "~ 1")
}
# Build interaction model formula by adding the interaction term
interaction_formula <- paste(base_formula, "+", group_var, "*", covariate)
# Fit both models
base_model <- lm(as.formula(base_formula), data = data)
interaction_model <- lm(as.formula(interaction_formula), data = data)
# Test for significant interaction
interaction_test <- anova(base_model, interaction_model)
cat("\n2. Homogeneity of slopes test:\n")
print(interaction_test)
# Check if the interaction is significant - FIXED: Handle NA p-values
p_value <- interaction_test$`Pr(>F)`[2]
if (!is.na(p_value)) {
if (p_value < 0.05) {
cat("\nWARNING: Significant interaction detected (p =",
format.pval(p_value, digits = 4),
"). Slopes are not homogeneous.\n")
# Plot different slopes
plot(data[[covariate]], data[[response_var]],
col = as.numeric(data[[group_var]]),
xlab = covariate, ylab = response_var,
main = "Different Slopes by Group (Interaction Present)")
# Add regression lines for each group
groups <- unique(data[[group_var]])
for (i in seq_along(groups)) {
group_data <- data[data[[group_var]] == groups[i], ]
if (nrow(group_data) > 1) {
abline(lm(as.formula(paste(response_var, "~", covariate)),
data = group_data),
col = i, lwd = 2)
}
}
legend("topright", legend = levels(data[[group_var]]),
col = 1:length(levels(data[[group_var]])),
lwd = 2, pch = 1)
} else {
cat("\nNo significant interaction (p =",
format.pval(p_value, digits = 4),
"). Homogeneity of slopes assumption is satisfied.\n")
}
} else {
cat("\nNote: Could not calculate p-value for interaction test.\n")
cat("Model comparison summary:\n")
print(interaction_test)
}
# 5. Residual histogram
hist(residuals, main = "Histogram of Residuals",
xlab = "Residuals", col = "lightblue")
par(mfrow = c(1, 1))
# Additional diagnostics
cat("\n3. Additional Statistics:\n")
cat("- Mean of residuals:", round(mean(residuals), 4), "\n")
cat("- SD of residuals:", round(sd(residuals), 4), "\n")
cat("- Max absolute residual:", round(max(abs(residuals)), 4), "\n")
# Check for outliers (residuals > 3 SD)
if (sd(residuals) > 0) {
outlier_count <- sum(abs(scale(residuals)) > 3, na.rm = TRUE)
cat("- Potential outliers (>3 SD):", outlier_count, "\n")
}
# Return diagnostic results
return(list(
shapiro_test = shapiro_test,
interaction_test = interaction_test,
p_value_interaction = p_value
))
}
Applications
Example 1: Clinical Trial with Baseline Adjustment
# Simulated clinical trial data
set.seed(123)
n <- 100
trial_data <- data.frame(
patient_id = 1:n,
treatment = factor(rep(c("Drug", "Placebo"), each = n/2)),
baseline_bp = rnorm(n, mean = 150, sd = 15),
age = sample(40:75, n, replace = TRUE),
bmi = rnorm(n, mean = 28, sd = 4)
)
# Generate post-treatment BP with treatment effect and baseline correlation
trial_data$post_bp <- with(trial_data,
baseline_bp * 0.7 +
ifelse(treatment == "Drug", -15, -5) +
(age - 60) * 0.2 +
(bmi - 28) * 0.5 +
rnorm(n, 0, 8)
)
cat("=== Basic ANOVA (ignoring covariates) ===\n")
anova_simple <- aov(post_bp ~ treatment, data = trial_data)
print(summary(anova_simple))
cat("\n=== ANCOVA (adjusting for baseline BP) ===\n")
ancova_model <- aov(post_bp ~ treatment + baseline_bp, data = trial_data)
print(summary(ancova_model))
cat("\n=== ANCOVA (multiple covariates) ===\n")
ancova_full <- lm(post_bp ~ treatment + baseline_bp + age + bmi, data = trial_data)
print(summary(ancova_full))
cat("\n=== Adjusted Means ===\n")
library(emmeans)
adj_means <- emmeans(ancova_full, specs = ~ treatment)
print(adj_means)
cat("\n=== Pairwise Comparisons with Adjustment ===\n")
pairwise_comparisons <- pairs(adj_means, adjust = "holm")
print(pairwise_comparisons)
# Diagnostic checks
cat("\n=== Model Diagnostics ===\n")
check_ancova_assumptions_simple(ancova_full, trial_data, "treatment", "baseline_bp")
Example 2: Educational Intervention Study
# Using built-in iris dataset to demonstrate MANCOVA
data(iris)
# Create a categorical variable and covariate
set.seed(456)
iris$treatment <- factor(sample(c("Method_A", "Method_B", "Control"),
nrow(iris), replace = TRUE))
iris$pretest_score <- iris$Sepal.Length * 10 + rnorm(nrow(iris), 50, 5)
# MANCOVA with multiple DVs
cat("=== MANCOVA Example ===\n")
Y <- cbind(iris$Sepal.Width, iris$Petal.Length, iris$Petal.Width)
manova_model <- manova(Y ~ treatment + pretest_score, data = iris)
cat("\n--- Wilks' Lambda Test ---\n")
print(summary(manova_model, test = "Wilks"))
cat("\n--- Pillai's Trace Test ---\n")
print(summary(manova_model, test = "Pillai"))
cat("\n--- Individual ANCOVAs ---\n")
for (i in 1:3) {
cat("\nDV", i, ":\n")
ancova_indiv <- lm(Y[, i] ~ treatment + pretest_score, data = iris)
print(summary(ancova_indiv))
}
# Visualizing adjusted means
library(ggplot2)
library(dplyr)
# Calculate adjusted means using emmeans
if (require(emmeans)) {
adj_means_plot <- list()
for (i in 1:3) {
model <- lm(Y[, i] ~ treatment + pretest_score, data = iris)
emm <- emmeans(model, specs = ~ treatment)
adj_means_plot[[i]] <- as.data.frame(emm) %>%
mutate(DV = paste("DV", i))
}
plot_data <- bind_rows(adj_means_plot)
ggplot(plot_data, aes(x = treatment, y = emmean, fill = treatment)) +
geom_bar(stat = "identity", alpha = 0.7) +
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.2) +
facet_wrap(~ DV, scales = "free_y") +
labs(title = "Adjusted Means with 95% Confidence Intervals",
y = "Adjusted Mean", x = "Treatment Group") +
theme_minimal()
}
Example 3: Real Dataset Analysis - Plant Growth
# Using PlantGrowth dataset with simulated covariate
data(PlantGrowth)
set.seed(789)
PlantGrowth$soil_quality <- rnorm(nrow(PlantGrowth), mean = 5, sd = 1) +
ifelse(PlantGrowth$group == "trt1", 0.5,
ifelse(PlantGrowth$group == "trt2", -0.5, 0))
cat("=== Plant Growth ANCOVA ===\n")
cat("Research question: Do treatments affect plant weight after controlling for soil quality?\n\n")
# EDA
cat("--- Exploratory Data Analysis ---\n")
cat("Group means (unadjusted):\n")
print(aggregate(weight ~ group, data = PlantGrowth, mean))
cat("\nCorrelation between weight and soil quality:",
cor(PlantGrowth$weight, PlantGrowth$soil_quality), "\n")
# Visualization
ggplot(PlantGrowth, aes(x = soil_quality, y = weight, color = group)) +
geom_point(size = 3) +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Relationship between Plant Weight and Soil Quality by Treatment",
x = "Soil Quality", y = "Plant Weight") +
theme_minimal()
# ANCOVA analysis
ancova_plant <- lm(weight ~ group + soil_quality, data = PlantGrowth)
cat("\n--- ANCOVA Results ---\n")
print(summary(ancova_plant))
# Check assumptions
cat("\n--- Assumption Checks ---\n")
check_ancova_assumptions_simple(ancova_plant, PlantGrowth, "group", "soil_quality")
# Contrasts and post-hoc tests
cat("\n--- Post-hoc Comparisons ---\n")
library(multcomp)
contrasts <- glht(ancova_plant, linfct = mcp(group = "Tukey"))
print(summary(contrasts))
# Effect sizes
cat("\n--- Effect Sizes ---\n")
library(effectsize)
eta_squared <- eta_squared(ancova_plant, partial = TRUE)
print(eta_squared)
# Power analysis
cat("\n--- Power Analysis ---\n")
library(pwr)
# Calculate achieved power
f2 <- eta_squared$Eta2_partial[1] / (1 - eta_squared$Eta2_partial[1])
power_achieved <- pwr.f2.test(u = 2, v = 27, f2 = f2, sig.level = 0.05)$power
cat("Achieved power for treatment effect:", round(power_achieved, 3), "\n")
Advanced Topics
1) Nonparametric ANCOVA
# Quandrant test or rank-based ANCOVA
if (require(Rfit)) {
cat("=== Rank-Based ANCOVA ===\n")
rank_model <- rfit(weight ~ group + soil_quality, data = PlantGrowth)
print(summary(rank_model))
}
2) Mixed Effects ANCOVA
# For repeated measures or clustered data
if (require(lme4)) {
# Simulated longitudinal data
set.seed(321)
long_data <- data.frame(
subject = rep(1:20, each = 3),
time = rep(1:3, 20),
treatment = factor(rep(sample(c("A", "B"), 20, replace = TRUE), each = 3)),
baseline = rnorm(60, 100, 15),
response = NA
)
# Generate responses
long_data$response <- with(long_data,
100 +
ifelse(treatment == "A", 5, -5) +
0.7 * baseline +
rnorm(60, 0, 10) + # Residual error
rep(rnorm(20, 0, 5), each = 3) # Random intercept
)
# Linear mixed model (ANCOVA with random effects)
mixed_model <- lmer(response ~ treatment + baseline + time + (1|subject),
data = long_data)
cat("\n=== Mixed Effects ANCOVA ===\n")
print(summary(mixed_model))
# Compare with standard ANCOVA
standard_ancova <- lm(response ~ treatment + baseline + time, data = long_data)
cat("\n--- Comparison: Standard vs Mixed ANCOVA ---\n")
cat("Standard ANCOVA AIC:", AIC(standard_ancova), "\n")
cat("Mixed model AIC:", AIC(mixed_model), "\n")
}
3) Power Analysis and Sample Size Planning
# Power analysis for ANCOVA
calculate_power_ancova <- function(k, n_per_group, f2, rho, alpha = 0.05) {
# k: number of groups
# n_per_group: sample size per group
# f2: effect size (Cohen's f^2)
# rho: correlation between covariate and DV
N <- k * n_per_group
df1 <- k - 1
df2 <- N - k - 1 # -1 for covariate
# Adjust f2 for covariate inclusion
f2_adj <- f2 / (1 - rho^2)
power <- pf(qf(1 - alpha, df1, df2), df1, df2, ncp = N * f2_adj, lower.tail = FALSE)
return(power)
}
cat("=== ANCOVA Power Calculator ===\n")
# Example: 3 groups, 20 per group, medium effect (f2 = 0.15), rho = 0.5
power_example <- calculate_power_ancova(k = 3, n_per_group = 20, f2 = 0.15, rho = 0.5)
cat("Power for specified design:", round(power_example, 3), "\n")
# Sample size calculation for desired power
library(pwr)
ss <- pwr.f2.test(u = 2, f2 = 0.15, sig.level = 0.05, power = 0.80)
cat("Required sample size (per group for 3 groups):", ceiling(ss$v/3 + 3), "\n")
Software Implementation
# Comprehensive ANCOVA analysis function
run_ancova_analysis <- function(formula, data, group_var, covariates,
pairwise = TRUE, diagnostics = TRUE) {
# Fit model
model <- lm(formula, data = data)
cat("=== ANCOVA ANALYSIS REPORT ===\n\n")
cat("Model formula:", deparse(formula), "\n")
cat("Number of groups:", length(unique(data[[group_var]])), "\n")
cat("Number of covariates:", length(covariates), "\n")
cat("Total observations:", nrow(data), "\n\n")
# Model summary
cat("--- MODEL SUMMARY ---\n")
print(summary(model))
cat("\n")
# ANOVA table
cat("--- ANOVA TABLE ---\n")
print(anova(model))
cat("\n")
# Assumption checks
if (diagnostics) {
cat("--- ASSUMPTION DIAGNOSTICS ---\n")
check_ancova_assumptions_simple(model, data, group_var, covariates[1])
}
# Adjusted means and comparisons
if (pairwise && require(emmeans)) {
cat("\n--- ADJUSTED GROUP MEANS ---\n")
adj_means <- emmeans(model, specs = as.formula(paste("~", group_var)))
print(adj_means)
cat("\n--- PAIRWISE COMPARISONS (Holm-adjusted) ---\n")
pairwise_results <- pairs(adj_means, adjust = "holm")
print(pairwise_results)
# Plot adjusted means
plot_data <- as.data.frame(adj_means)
p <- ggplot(plot_data, aes_string(x = group_var, y = "emmean", fill = group_var)) +
geom_bar(stat = "identity", alpha = 0.7) +
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.2) +
labs(title = "Adjusted Group Means with 95% Confidence Intervals",
y = "Adjusted Mean", x = group_var) +
theme_minimal()
print(p)
}
# Effect sizes
if (require(effectsize)) {
cat("\n--- EFFECT SIZES ---\n")
es <- eta_squared(model, partial = TRUE)
print(es)
}
return(model)
}
# Example usage with mtcars
cat("\n\n=== EXAMPLE: ANCOVA with mtcars dataset ===\n")
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
# Run comprehensive analysis
model_result <- run_ancova_analysis(
formula = mpg ~ cyl + hp + wt,
data = mtcars,
group_var = "cyl",
covariates = c("hp", "wt")
)
Common Issues and Solutions
Violation of Homogeneity of Slopes
# When slopes differ by group
check_slopes <- function(data, dv, group, covariate) {
# Fit interaction model
interaction_model <- lm(as.formula(paste(dv, "~", group, "*", covariate)), data = data)
# Test interaction
simple_model <- lm(as.formula(paste(dv, "~", group, "+", covariate)), data = data)
anova_result <- anova(simple_model, interaction_model)
cat("Test for homogeneity of regression slopes:\n")
print(anova_result)
if (anova_result$`Pr(>F)`[2] < 0.05) {
cat("\nWARNING: Significant interaction detected. Slopes are not homogeneous.\n")
cat("Consider:\n")
cat("1. Reporting separate slopes for each group\n")
cat("2. Using Johnson-Neyman technique to identify regions of significance\n")
cat("3. Considering moderated regression analysis\n")
# Visualize different slopes
library(ggplot2)
p <- ggplot(data, aes_string(x = covariate, y = dv, color = group)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Different Slopes by Group (Interaction Present)",
subtitle = "ANCOVA assumption violated") +
theme_minimal()
print(p)
} else {
cat("\nNo significant interaction. Homogeneity of slopes assumption satisfied.\n")
}
}
# Example
check_slopes(mtcars, "mpg", "cyl", "wt")
Dealing with Missing Data
# Multiple imputation for ANCOVA with missing covariate values
if (require(mice)) {
cat("=== Multiple Imputation for ANCOVA ===\n")
# Create dataset with missing values
set.seed(123)
missing_data <- PlantGrowth
missing_data$soil_quality[sample(1:nrow(missing_data), 5)] <- NA
# Perform multiple imputation
imputed_data <- mice(missing_data, m = 5, method = "pmm", printFlag = FALSE)
# Fit ANCOVA on each imputed dataset
models <- with(imputed_data, lm(weight ~ group + soil_quality))
# Pool results
pooled_results <- pool(models)
cat("Pooled coefficients:\n")
print(summary(pooled_results))
}
Problems and Exercises
1. Conceptual Problems:
a) Prove that the adjusted group mean in ANCOVA is unbiased when the covariate is uncorrelated with treatment assignment b) Derive the variance of the adjusted treatment effect estimator c) Show how ANCOVA increases power compared to ANOVA
2. Applied Problems:
a) Analyze the SOCR Consumer Price Index dataset using ANCOVA b) Conduct a MANCOVA on the Iris dataset with species as the factor and sepal length as covariate c) Perform power analysis for a planned study with 4 groups, expecting a medium effect (f² = 0.20), and correlation ρ = 0.6 between covariate and DV
3. Simulation Study:
# Simulation to demonstrate ANCOVA advantages
simulate_ancova_power <- function(n_sim = 1000, n_per_group = 30,
effect_size = 0.5, rho = 0.6) {
power_anova <- numeric(n_sim)
power_ancova <- numeric(n_sim)
for (i in 1:n_sim) {
# Simulate data
group <- factor(rep(1:3, each = n_per_group))
covariate <- rnorm(n_per_group * 3)
# Generate response with treatment effect and covariate relationship
response <- effect_size * as.numeric(group) + rho * covariate +
rnorm(n_per_group * 3, 0, sqrt(1 - rho^2))
data <- data.frame(response, group, covariate)
# Fit ANOVA
anova_model <- aov(response ~ group, data = data)
power_anova[i] <- summary(anova_model)[[1]]$`Pr(>F)`[1] < 0.05
# Fit ANCOVA
ancova_model <- aov(response ~ group + covariate, data = data)
power_ancova[i] <- summary(ancova_model)[[1]]$`Pr(>F)`[1] < 0.05
}
cat("Simulation Results (n =", n_sim, "):\n")
cat("ANOVA power:", mean(power_anova), "\n")
cat("ANCOVA power:", mean(power_ancova), "\n")
cat("Power increase:", mean(power_ancova) - mean(power_anova), "\n")
return(data.frame(ANOVA = power_anova, ANCOVA = power_ancova))
}
# Run simulation
results <- simulate_ancova_power(n_sim = 500, n_per_group = 25,
effect_size = 0.4, rho = 0.5)
References
1. Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). *Designing Experiments and Analyzing Data: A Model Comparison Perspective* (3rd ed.). Routledge.
2. Rutherford, A. (2011). *ANOVA and ANCOVA: A GLM Approach* (2nd ed.). Wiley.
3. Tabachnick, B. G., & Fidell, L. S. (2018). *Using Multivariate Statistics* (7th ed.). Pearson.
4. Miller, G. A., & Chapman, J. P. (2001). Misunderstanding analysis of covariance. *Journal of Abnormal Psychology, 110*(1), 40-48.
5. Stevens, J. P. (2012). *Applied Multivariate Statistics for the Social Sciences* (5th ed.). Routledge.
Online Resources
- SOCR Home page: https://www.socr.umich.edu
Translate this page: