Difference between revisions of "SMHS ANOVA"

Latest revision as of 17:18, 10 February 2026

Scientific Methods for Health Sciences - Analysis of Variance (ANOVA)

Overview

Analysis of Variance (ANOVA) is a statistical method used to test differences between two or more group means. While the t-test is limited to comparing two groups, ANOVA generalizes this to multiple groups, partitioning the total observed variance into components attributable to different sources (e.g., between‑group and within‑group variation). This chapter introduces one‑way and two‑way ANOVA, discusses underlying assumptions, provides step‑by‑step calculations, and illustrates applications with real data and R examples.

Motivation

Suppose a plant biologist wants to compare the yield of five different varieties of wheat. Each variety is planted in four randomly assigned plots, yielding 20 plots total. The yields (in bushels/acre) are:

Variety A	Variety B	Variety C	Variety D	Variety E
26.2	29.2	29.1	21.3	20.1
24.3	28.1	30.8	22.4	19.3
21.8	27.3	33.9	24.3	19.9
28.1	31.2	32.8	21.8	22.1

If we denote the population means of the five varieties by \(\mu_1, \mu_2, \mu_3, \mu_4, \mu_5\), we could perform \(\binom{5}{2}=10\) pairwise t‑tests. However, this approach inflates the Type I error rate and lacks a single overall test. ANOVA provides an integrated framework that simultaneously tests \(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5\) vs. \(H_a:\) at least one mean differs, using a single overall test at the chosen significance level \(\alpha\).
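The inflation from running many t‑tests can be made concrete with a little arithmetic. The sketch below assumes, as a simplification, that the 10 tests are independent (they are not exactly, because they share data) and that each is run at \(\alpha = 0.05\).

```r
# Family-wise error rate (FWER) for m tests at level alpha,
# assuming (as a simplification) that the tests are independent.
alpha <- 0.05
m <- choose(5, 2)              # 10 pairwise comparisons among 5 varieties
fwer <- 1 - (1 - alpha)^m      # P(at least one false rejection)
round(fwer, 3)                 # about 0.401

# Bonferroni-adjusted per-test level that keeps the FWER at or below alpha
alpha / m                      # 0.005
```

Under these assumptions, roughly 4 in 10 such analyses would report at least one spurious difference even when all five means are equal.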

Theory

One‑Way ANOVA

One‑way ANOVA compares the means of \(k\) independent groups under the assumption that the populations are normally distributed with equal variances.

Notation:

\(y_{ij}\) = \(j\)‑th observation in group \(i\) (\(i=1,\dots,k\); \(j=1,\dots,n_i\)).
\(n_i\) = number of observations in group \(i.\)
\(n = \sum_{i=1}^k n_i\) = total sample size.
Group mean: \(\bar{y}_{i.} = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}.\)
Grand mean: \(\bar{y}_{..} = \frac{1}{n}\sum_{i=1}^k \sum_{j=1}^{n_i} y_{ij}.\)

Variance decomposition:
Total sum of squares: \(SS_{total} = \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2\) with \(df_{total}=n-1.\)
Between‑groups sum of squares: \(SS_{between} = \sum_{i=1}^k n_i (\bar{y}_{i.} - \bar{y}_{..})^2\) with \(df_{between}=k-1.\)
Within‑groups (error) sum of squares: \(SS_{within} = \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2\) with \(df_{within}=n-k.\)

The fundamental identity is \(SS_{total} = SS_{between} + SS_{within}\), and likewise \(df_{total} = df_{between} + df_{within}.\)
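The decomposition can be checked numerically. The following sketch computes the three sums of squares directly from the definitions above, using the wheat yields from the Motivation table; the variable names (yield, variety, ss_total, and so on) are chosen here for illustration.

```r
# Wheat yields (bushels/acre) from the Motivation table, in long format
yield <- c(26.2, 24.3, 21.8, 28.1,   # Variety A
           29.2, 28.1, 27.3, 31.2,   # Variety B
           29.1, 30.8, 33.9, 32.8,   # Variety C
           21.3, 22.4, 24.3, 21.8,   # Variety D
           20.1, 19.3, 19.9, 22.1)   # Variety E
variety <- factor(rep(c("A", "B", "C", "D", "E"), each = 4))

grand_mean  <- mean(yield)
group_means <- tapply(yield, variety, mean)    # one mean per variety
group_sizes <- tapply(yield, variety, length)  # n_i for each variety

ss_total   <- sum((yield - grand_mean)^2)
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((yield - ave(yield, variety))^2)  # ave() returns each observation's group mean

# The identity SS_total = SS_between + SS_within holds up to rounding error
all.equal(ss_total, ss_between + ss_within)
```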

Mean squares and F‑statistic:
\(MS_{between} = \frac{SS_{between}}{df_{between}}\)
\(MS_{within} = \frac{SS_{within}}{df_{within}}\)
Test statistic: \(F = \frac{MS_{between}}{MS_{within}}.\)

Under \(H_0\) (all population means equal), \(F\) follows an F‑distribution with \((k-1,\; n-k)\) degrees of freedom. A large \(F\) value suggests that the between‑group variation is substantial relative to within‑group variation, providing evidence against \(H_0\).
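As an illustration, the F‑statistic and its p‑value can be computed by hand and compared with the output of aov(). This is a sketch using the wheat yields from the Motivation table; pf() is the F‑distribution function in base R, and the variable names are illustrative.

```r
yield <- c(26.2, 24.3, 21.8, 28.1,  29.2, 28.1, 27.3, 31.2,
           29.1, 30.8, 33.9, 32.8,  21.3, 22.4, 24.3, 21.8,
           20.1, 19.3, 19.9, 22.1)
variety <- factor(rep(c("A", "B", "C", "D", "E"), each = 4))

k <- nlevels(variety)   # number of groups
n <- length(yield)      # total sample size

ss_between <- sum(tapply(yield, variety, length) *
                  (tapply(yield, variety, mean) - mean(yield))^2)
ss_within  <- sum((yield - ave(yield, variety))^2)

ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)
f_obs      <- ms_between / ms_within

# Upper-tail probability of the F(k-1, n-k) distribution
p_value <- pf(f_obs, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Cross-check against the table produced by aov()
tab <- summary(aov(yield ~ variety))[[1]]
all.equal(f_obs, tab[["F value"]][1])
all.equal(p_value, tab[["Pr(>F)"]][1])
```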

ANOVA table (general form):

Source of Variation	Degrees of Freedom (df)	Sum of Squares (SS)	Mean Square (MS)	F‑statistic	P‑value
Between Groups	\(k-1\)	\(SS_{between}\)	\(MS_{between}\)	\(F = \frac{MS_{between}}{MS_{within}}\)	\(P(F_{k-1,\; n-k} > F_{\text{obs}})\)
Within Groups (Error)	\(n-k\)	\(SS_{within}\)	\(MS_{within}\)
Total	\(n-1\)	\(SS_{total}\)

Assumptions and diagnostics:

Independence: Observations are independent within and across groups.
Normality: The residuals ( \(e_{ij} = y_{ij} - \bar{y}_{i.}\) ) should be approximately normally distributed for each group. This can be checked with Q‑Q plots, Shapiro‑Wilk, or Kolmogorov‑Smirnov tests.
Homoscedasticity (equal variances): Group populations should have the same variance \(\sigma^2\). Formal tests (Levene’s, Bartlett’s) or visual inspection of residual‑vs‑fitted plots can be used.

If assumptions are violated, consider transformations (log, square‑root) or non‑parametric alternatives (Kruskal‑Wallis test).
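A minimal sketch of these checks and fallbacks in base R, again using the wheat yields from the Motivation table. Levene's test is omitted here because it requires the car package; Bartlett's test is used instead, and the log transformation is shown only as an example of the approach, not as a recommendation for these data.

```r
yield <- c(26.2, 24.3, 21.8, 28.1,  29.2, 28.1, 27.3, 31.2,
           29.1, 30.8, 33.9, 32.8,  21.3, 22.4, 24.3, 21.8,
           20.1, 19.3, 19.9, 22.1)
variety <- factor(rep(c("A", "B", "C", "D", "E"), each = 4))

fit <- aov(yield ~ variety)

# Normality of residuals (Shapiro-Wilk)
sw <- shapiro.test(residuals(fit))
sw

# Equal variances (Bartlett's test; note it is sensitive to non-normality)
bt <- bartlett.test(yield ~ variety)
bt

# Rank-based alternative to the F-test if the assumptions look doubtful
kw <- kruskal.test(yield ~ variety)
kw

# Alternatively, refit on a transformed response
fit_log <- aov(log(yield) ~ variety)
summary(fit_log)
```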

Effect size – Eta‑squared and partial Eta‑squared:
\(\eta^2 = \frac{SS_{between}}{SS_{total}}\) (proportion of total variance explained by group differences).
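Partial \(\eta^2 = \frac{SS_{between}}{SS_{between} + SS_{within}}\). In one‑way ANOVA this equals \(\eta^2\), because \(SS_{total} = SS_{between} + SS_{within}\); the two measures differ only in designs with more than one factor.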

Interpretation: 0.01 = small, 0.06 = medium, 0.14 = large effect (Cohen, 1988).
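Both quantities can be read off the ANOVA table. The sketch below does this for the wheat yields from the Motivation table, without any add‑on package; the names eta_sq and partial_eta_sq are illustrative.

```r
yield <- c(26.2, 24.3, 21.8, 28.1,  29.2, 28.1, 27.3, 31.2,
           29.1, 30.8, 33.9, 32.8,  21.3, 22.4, 24.3, 21.8,
           20.1, 19.3, 19.9, 22.1)
variety <- factor(rep(c("A", "B", "C", "D", "E"), each = 4))

tab <- summary(aov(yield ~ variety))[[1]]
ss_between <- tab[["Sum Sq"]][1]   # row 1: the grouping factor
ss_within  <- tab[["Sum Sq"]][2]   # row 2: residuals

eta_sq         <- ss_between / sum(tab[["Sum Sq"]])
partial_eta_sq <- ss_between / (ss_between + ss_within)

# In a one-way design the two coincide
c(eta_sq = eta_sq, partial_eta_sq = partial_eta_sq)
```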

Post‑hoc comparisons: If the overall \(F\)-test is significant, we conduct pairwise comparisons to identify which groups differ. Common methods (with correction for multiple testing) include:
Tukey’s HSD (Honest Significant Difference)
Bonferroni correction
Scheffé’s method
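Two of these methods are available in base R. The sketch below applies them to the wheat yields from the Motivation table; Scheffé's method is not in base R and is left out.

```r
yield <- c(26.2, 24.3, 21.8, 28.1,  29.2, 28.1, 27.3, 31.2,
           29.1, 30.8, 33.9, 32.8,  21.3, 22.4, 24.3, 21.8,
           20.1, 19.3, 19.9, 22.1)
variety <- factor(rep(c("A", "B", "C", "D", "E"), each = 4))

fit <- aov(yield ~ variety)

# Tukey's HSD: one row per pair, with difference, confidence limits, adjusted p-value
tukey <- TukeyHSD(fit, conf.level = 0.95)
tukey$variety

# Pairwise t-tests with Bonferroni-adjusted p-values (pooled SD by default)
bonf <- pairwise.t.test(yield, variety, p.adjust.method = "bonferroni")
bonf$p.value
```

With five groups there are \(\binom{5}{2}=10\) comparisons, so the Tukey table has 10 rows and the Bonferroni output is a lower‑triangular 4 × 4 matrix of p‑values.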

Two‑Way ANOVA

Two‑way ANOVA extends the idea to two categorical factors (A and B), allowing us to test:

Main effect of factor A.
Main effect of factor B.
Interaction effect between A and B (whether the effect of one factor depends on the level of the other).

Model: \(y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}\)

where:

\(\mu\) = grand mean.
\(\alpha_i\) = effect of level \(i\) of factor A (\(i=1,\dots,a\)).
\(\beta_j\) = effect of level \(j\) of factor B (\(j=1,\dots,b\)).
\((\alpha\beta)_{ij}\) = interaction effect.
\(\varepsilon_{ijk} \sim N(0,\sigma^2)\) (error), with \(k=1,\dots,r\) replicates per cell.

Sum of squares decomposition: \(SS_{total} = SS_A + SS_B + SS_{AB} + SS_{error}\)

with degrees of freedom \(df_{total} = N-1,\; df_A = a-1,\; df_B = b-1,\; df_{AB} = (a-1)(b-1),\; df_{error} = N - ab\), where \(N = a \times b \times r\) is the total number of observations.
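For a balanced design, each term of this decomposition can be computed from marginal and cell means. The sketch below does so for the cholesterol data used in Example 2 of this chapter, taking gender as factor A and exercise as factor B; the formulas used for \(SS_A\), \(SS_B\), and \(SS_{AB}\) are the standard balanced‑design expressions and are an addition here, not spelled out in the text above.

```r
chol <- c(220,210,230,215,225, 200,195,205,210,200,
          190,185,195,190,180, 210,205,215,200,210,
          180,175,185,180,170, 170,165,175,170,160)
gender   <- factor(rep(rep(c("M", "F"), each = 5), 3))
exercise <- factor(rep(c("None", "Light", "Heavy"), each = 10))

a <- nlevels(gender)            # levels of factor A
b <- nlevels(exercise)          # levels of factor B
r <- length(chol) / (a * b)     # replicates per cell (balanced design)
N <- a * b * r

grand   <- mean(chol)
mean_A  <- tapply(chol, gender, mean)
mean_B  <- tapply(chol, exercise, mean)
mean_AB <- tapply(chol, list(gender, exercise), mean)  # a x b matrix of cell means

ss_A     <- b * r * sum((mean_A - grand)^2)
ss_B     <- a * r * sum((mean_B - grand)^2)
ss_AB    <- r * sum((mean_AB - grand)^2) - ss_A - ss_B
ss_error <- sum((chol - ave(chol, gender, exercise))^2)  # deviations from cell means
ss_total <- sum((chol - grand)^2)

all.equal(ss_total, ss_A + ss_B + ss_AB + ss_error)

c(df_A = a - 1, df_B = b - 1, df_AB = (a - 1) * (b - 1),
  df_error = N - a * b, df_total = N - 1)
```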

Hypotheses:

\(H_0^{(A)}: \alpha_1 = \alpha_2 = \dots = \alpha_a = 0\) (no main effect of A).
\(H_0^{(B)}: \beta_1 = \beta_2 = \dots = \beta_b = 0\) (no main effect of B).
\(H_0^{(AB)}: (\alpha\beta)_{ij} = 0\) for all \(i,j\) (no interaction).

ANOVA table for two‑way design (balanced, with replication):

Source	df	SS	MS	F	P‑value
Factor A	\(a-1\)	\(SS_A\)	\(MS_A = \frac{SS_A}{df_A}\)	\(F_A = \frac{MS_A}{MS_{error}}\)	…
Factor B	\(b-1\)	\(SS_B\)	\(MS_B = \frac{SS_B}{df_B}\)	\(F_B = \frac{MS_B}{MS_{error}}\)	…
Interaction A×B	\((a-1)(b-1)\)	\(SS_{AB}\)	\(MS_{AB} = \frac{SS_{AB}}{df_{AB}}\)	\(F_{AB} = \frac{MS_{AB}}{MS_{error}}\)	…
Error	\(N-ab\)	\(SS_{error}\)	\(MS_{error} = \frac{SS_{error}}{df_{error}}\)
Total	\(N-1\)	\(SS_{total}\)

Assumptions: same as one‑way ANOVA (independence, normality, homoscedasticity). In addition, a model fitted without the interaction term assumes that the effects of the two factors are additive.

Interactions: If the interaction is significant, the main effects cannot be interpreted separately. Simple‑effects analysis (comparing levels of one factor at fixed levels of the other) is then appropriate.
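One simple way to sketch a simple‑effects analysis is to fit a separate one‑way ANOVA of one factor within each level of the other. The example below does this for the cholesterol data used in Example 2. Note the simplification: each subset uses its own error term rather than the pooled \(MS_{error}\) from the full two‑way model, so this is an approximation of the usual procedure.

```r
chol <- c(220,210,230,215,225, 200,195,205,210,200,
          190,185,195,190,180, 210,205,215,200,210,
          180,175,185,180,170, 170,165,175,170,160)
gender   <- factor(rep(rep(c("M", "F"), each = 5), 3))
exercise <- factor(rep(c("None", "Light", "Heavy"), each = 10))
dat <- data.frame(chol, gender, exercise)

# Effect of exercise at each fixed level of gender
# (separate error term per subset; a simplification)
simple_effects <- lapply(split(dat, dat$gender), function(d) {
  summary(aov(chol ~ exercise, data = d))
})
simple_effects
```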

Applications

Example 1: One‑Way ANOVA with R

A clinical trial tests three new analgesics (Drug A, B, C) against a placebo for pain relief (score 0‑10, lower is better). Data:

Pain = c(5,6,4,5,7, 3,4,5,4,3, 2,3,4,3,2, 6,7,6,5,7)
Group = factor(rep(c("Placebo","DrugA","DrugB","DrugC"), each=5))

Step 1: Exploratory data analysis

boxplot(Pain ~ Group, col="lightblue", main="Pain Score by Treatment", 
        xlab="Treatment", ylab="Pain Score")

Step 2: Fit ANOVA model

fit <- aov(Pain ~ Group)
summary(fit)

Output:

            Df Sum Sq Mean Sq F value   Pr(>F)    
Group        3  35.35  11.783   13.86 0.000103 ***
Residuals   16  13.60   0.850                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Step 3: Check assumptions

# Normality of residuals
shapiro.test(residuals(fit))

# Homogeneity of variances (Levene’s test)
library(car)
leveneTest(Pain ~ Group)

Step 4: Post‑hoc comparisons (Tukey)

TukeyHSD(fit, conf.level=0.95)
plot(TukeyHSD(fit))

Step 5: Effect size

eta_squared <- summary(fit)[[1]][1,2] / sum(summary(fit)[[1]][,2])
eta_squared

Example 2: Two‑Way ANOVA with Interaction

A study examines how gender (M/F) and exercise regimen (None, Light, Heavy) affect cholesterol level (mg/dL). Data are balanced with 5 subjects per cell.

R analysis:

chol <- c(220,210,230,215,225, 200,195,205,210,200, 
          190,185,195,190,180, 210,205,215,200,210,
          180,175,185,180,170, 170,165,175,170,160)
gender <- factor(rep(rep(c("M","F"), each=5), 3))
exercise <- factor(rep(c("None","Light","Heavy"), each=10),
                   levels=c("None","Light","Heavy"))  # keep None, Light, Heavy order in output and plots

# Two‑way ANOVA with interaction
fit2 <- aov(chol ~ gender * exercise)
summary(fit2)

# Check interaction visually
interaction.plot(exercise, gender, chol, type="b", col=c("blue","red"), 
                 pch=c(16,18), main="Interaction Plot: Cholesterol")

If the interaction is not significant, we may refit the model without it:

fit2_additive <- aov(chol ~ gender + exercise)
summary(fit2_additive)
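The two fits can also be compared formally. As a sketch, anova() applied to the nested models tests whether adding the interaction term reduces the residual sum of squares significantly; for this design the resulting F‑statistic matches the interaction row of summary(fit2). The data are repeated so that the snippet runs on its own.

```r
chol <- c(220,210,230,215,225, 200,195,205,210,200,
          190,185,195,190,180, 210,205,215,200,210,
          180,175,185,180,170, 170,165,175,170,160)
gender   <- factor(rep(rep(c("M", "F"), each = 5), 3))
exercise <- factor(rep(c("None", "Light", "Heavy"), each = 10))

fit2          <- aov(chol ~ gender * exercise)  # with interaction
fit2_additive <- aov(chol ~ gender + exercise)  # without interaction

# F-test comparing the additive model against the interaction model
cmp <- anova(fit2_additive, fit2)
cmp
```

If this comparison is significant, the additive model is not adequate and the interaction model should be retained.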

Software

R Commands for ANOVA

One‑way ANOVA: aov(y ~ group, data)
Two‑way ANOVA (with interaction): aov(y ~ factor1 * factor2, data)
Check assumptions (leveneTest() comes from the car package):

 - Normality: shapiro.test(residuals(model))
 - Homogeneity of variances: leveneTest(y ~ group, data)

Post‑hoc tests: TukeyHSD(model), pairwise.t.test(y, group, p.adjust.method="bonferroni")
Effect size: library(effectsize); eta_squared(model)

Complete R Example (One‑Way)

# Simulated data: three diets (A, B, C) and weight loss (lbs)
set.seed(123)
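# Group sizes: 7, 7, and 6 subjects (20 in total)
diet <- factor(rep(c("A","B","C"), times=c(7,7,6)))
# Responses are generated in the same order as the diet labels (do not shuffle the labels)
weight_loss <- c(rnorm(7,5,1.2), rnorm(7,7,1.2), rnorm(6,4,1.2))
data <- data.frame(diet, weight_loss)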

# ANOVA
model <- aov(weight_loss ~ diet, data=data)
summary(model)

# Diagnostic plots
par(mfrow=c(2,2))
plot(model)
par(mfrow=c(1,1))

# Post‑hoc (if significant)
if (summary(model)[[1]][["Pr(>F)"]][1] < 0.05) {
  print(TukeyHSD(model))  # print() so the result also shows when the script is sourced
}

# Effect size
library(effectsize)
eta_squared(model)

Problems

Problem 1 (One‑Way ANOVA)

A manufacturer compares the assembly times (seconds) of three brands of ping‑pong tables.

Assembly Time (sec)	Brand
93, 67, 77, 92, 97, 62	1
136, 120, 115, 104, 115, 121, 102, 130	2
198, 217, 209, 221, 190	3

At \(\alpha=0.05\), test whether the average assembly times differ across brands.

Solution outline:

1. State hypotheses: \(H_0: \mu_1 = \mu_2 = \mu_3\) vs. \(H_a\): at least one mean differs.

2. Check assumptions (normality, equal variance).

3. Compute ANOVA table.

4. Make decision based on F‑test.

5. If significant, perform post‑hoc comparisons.
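A possible starting point in R, following the outline above. The variable names (assembly_time, brand, p1, fit_p1) are illustrative, and Bartlett's test stands in for the equal‑variance check so that no add‑on package is needed. Interpreting the output is left as part of the exercise.

```r
# Data entry in long format: one row per table assembled
assembly_time <- c(93, 67, 77, 92, 97, 62,                    # Brand 1
                   136, 120, 115, 104, 115, 121, 102, 130,    # Brand 2
                   198, 217, 209, 221, 190)                   # Brand 3
brand <- factor(rep(c("1", "2", "3"), times = c(6, 8, 5)))
p1 <- data.frame(assembly_time, brand)
table(p1$brand)                      # unbalanced design: 6, 8, 5

fit_p1 <- aov(assembly_time ~ brand, data = p1)

# Step 2: assumption checks
shapiro.test(residuals(fit_p1))
bartlett.test(assembly_time ~ brand, data = p1)

# Steps 3-4: ANOVA table and F-test
summary(fit_p1)

# Step 5: post-hoc comparisons (only meaningful if the F-test is significant)
TukeyHSD(fit_p1)
```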

Problem 2 (Two‑Way ANOVA)

A researcher investigates how gender (M, F) and education level (High School, Bachelor, Graduate) affect starting salary. Data (in thousands):

Gender	Education	Salaries
M	High School	45, 48, 50
M	Bachelor	65, 68, 70
M	Graduate	85, 88, 90
F	High School	42, 44, 46
F	Bachelor	60, 62, 64
F	Graduate	80, 82, 84

Test for main effects and interaction. If the interaction is significant, conduct simple‑effects analysis.
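A possible way to enter these data and fit the model in R. The variable names are illustrative, and the education levels are ordered explicitly so that tables and plots follow High School, Bachelor, Graduate. Interpreting the output is left as part of the exercise.

```r
salary <- c(45, 48, 50,  65, 68, 70,  85, 88, 90,   # M: High School, Bachelor, Graduate
            42, 44, 46,  60, 62, 64,  80, 82, 84)   # F: High School, Bachelor, Graduate
gender <- factor(rep(c("M", "F"), each = 9))
education <- factor(rep(rep(c("High School", "Bachelor", "Graduate"), each = 3), times = 2),
                    levels = c("High School", "Bachelor", "Graduate"))

table(gender, education)             # balanced: 3 observations per cell

fit_p2 <- aov(salary ~ gender * education)
summary(fit_p2)
```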

References

See the SOCR SDA app ANOVA Learning Module
SOCR ANOVA Chapter
ANOVA – Wikipedia
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Kutner, M.H., Nachtsheim, C.J., Neter, J., & Li, W. (2005). Applied Linear Statistical Models (5th ed.). McGraw‑Hill.


@@ Line 1: / Line 1: @@
-==[[SMHS| Scientific Methods for Health Sciences]] - Analysis of Variance (ANOVA) ==
+==[[SMHS|Scientific Methods for Health Sciences]] - Analysis of Variance (ANOVA)==
+===Overview===
+[[EBook#Chapter_XI:_Analysis_of_Variance_.28ANOVA.29|Analysis of Variance (ANOVA)]] is a statistical method used to test differences between two or more group means. While the [[AP_Statistics_Curriculum_2007_Infer_2Means_Dep|t-test]] is limited to comparing two groups, ANOVA generalizes this to multiple groups, partitioning the total observed variance into components attributable to different sources (e.g., between‑group and within‑group variation). This chapter introduces one‑way and two‑way ANOVA, discusses underlying assumptions, provides step‑by‑step calculations, and illustrates applications with real data and R examples.
+===Motivation===
+Suppose a plant biologist wants to compare the yield of five different varieties of wheat. Each variety is planted in four randomly assigned plots, yielding 20 plots total. The yields (in bushels/acre) are:
+<center>
+{| class="wikitable" style="text-align:center; width:60%" border="1"
+|-
+! Variety A !! Variety B !! Variety C !! Variety D !! Variety E
+|-
+| 26.2 || 29.2 || 29.1 || 21.3 || 20.1
+|-
+| 24.3 || 28.1 || 30.8 || 22.4 || 19.3
+|-
+| 21.8 || 27.3 || 33.9 || 24.3 || 19.9
+|-
+| 28.1 || 31.2 || 32.8 || 21.8 || 22.1
+|}
+</center>
+If we denote the population means of the five varieties by <math>\mu_1, \mu_2, \mu_3, \mu_4, \mu_5</math>, we could perform <math>\binom{5}{2}=10</math> pairwise t‑tests. However, this approach inflates the Type I error rate and lacks a single overall test. ANOVA provides an integrated framework that simultaneously tests
+<math>H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5</math> vs. <math>H_a:</math> at least one mean differs,
+using a single overall test at the chosen significance level <math>\alpha</math>.
+===Theory===
+====One‑Way ANOVA====
+One‑way ANOVA compares the means of <math>k</math> independent groups under the assumption that the populations are normally distributed with equal variances.
+*Notation:
+* <math>y_{ij}</math> = <math>j</math>‑th observation in group <math>i</math> (<math>i=1,\dots,k</math>; <math>j=1,\dots,n_i</math>).
+* <math>n_i</math> = number of observations in group <math>i.</math>
+* <math>n = \sum_{i=1}^k n_i</math> = total sample size.
+* Group mean: <math>\bar{y}_{i.} = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}.</math>
+* Grand mean: <math>\bar{y}_{..} = \frac{1}{n}\sum_{i=1}^k \sum_{j=1}^{n_i} y_{ij}.</math>
+*Variance decomposition:
+* Total sum of squares: <math>SS_{total} = \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2</math> with <math>df_{total}=n-1.</math>
+* Between‑groups sum of squares: <math>SS_{between} = \sum_{i=1}^k n_i (\bar{y}_{i.} - \bar{y}_{..})^2</math> with <math>df_{between}=k-1.</math>
+* Within‑groups (error) sum of squares: <math>SS_{within} = \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2</math> with <math>df_{within}=n-k.</math>
+The fundamental identity is:
+<math>SS_{total} = SS_{between} + SS_{within}</math>, and likewise <math>df_{total} = df_{between} + df_{within}.</math>
+*Mean squares and F‑statistic:
+* <math>MS_{between} = \frac{SS_{between}}{df_{between}}</math>
+* <math>MS_{within} = \frac{SS_{within}}{df_{within}}</math>
+* Test statistic: <math>F = \frac{MS_{between}}{MS_{within}}.</math>
+Under <math>H_0</math> (all population means equal), <math>F</math> follows an F‑distribution with <math>(k-1,\; n-k)</math> degrees of freedom. A large <math>F</math> value suggests that the between‑group variation is substantial relative to within‑group variation, providing evidence against <math>H_0</math>.
+*ANOVA table (general form):
+<center>
+{| class="wikitable" style="text-align:center; width:70%" border="1"
+|-
+! Source of Variation !! Degrees of Freedom (df) !! Sum of Squares (SS) !! Mean Square (MS) !! F‑statistic !! P‑value
+|-
+| Between Groups || <math>k-1</math> || <math>SS_{between}</math> || <math>MS_{between}</math> || <math>F = \frac{MS_{between}}{MS_{within}}</math> || <math>P(F_{k-1,\; n-k} > F_{\text{obs}})</math>
+|-
+| Within Groups (Error) || <math>n-k</math> || <math>SS_{within}</math> || <math>MS_{within}</math> || ||
+|-
+| Total || <math>n-1</math> || <math>SS_{total}</math> || || ||
+|}
+</center>
+*Assumptions and diagnostics:
+# '''Independence''': Observations are independent within and across groups.
+# '''Normality''': The residuals ( <math>e_{ij} = y_{ij} - \bar{y}_{i.}</math> ) should be approximately normally distributed for each group. This can be checked with Q‑Q plots, Shapiro‑Wilk, or Kolmogorov‑Smirnov tests.
+# '''Homoscedasticity''' (equal variances): Group populations should have the same variance <math>\sigma^2</math>. Formal tests (Levene’s, Bartlett’s) or visual inspection of residual‑vs‑fitted plots can be used.
+If assumptions are violated, consider transformations (log, square‑root) or non‑parametric alternatives (Kruskal‑Wallis test).
+*Effect size – Eta‑squared and partial Eta‑squared:
+* <math>\eta^2 = \frac{SS_{between}}{SS_{total}}</math> (proportion of total variance explained by group differences).
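+* Partial <math>\eta^2 = \frac{SS_{between}}{SS_{between} + SS_{within}}</math>. In one‑way ANOVA this equals <math>\eta^2</math>, because <math>SS_{total} = SS_{between} + SS_{within}</math>; the two measures differ only in designs with more than one factor.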
+Interpretation: 0.01 = small, 0.06 = medium, 0.14 = large effect (Cohen, 1988).
+*Post‑hoc comparisons: If the overall <math>F</math>-test is significant, we conduct pairwise comparisons to identify which groups differ. Common methods (with correction for multiple testing) include:
+* Tukey’s HSD (Honest Significant Difference)
+* Bonferroni correction
+* Scheffé’s method
+====Two‑Way ANOVA====
+Two‑way ANOVA extends the idea to two categorical factors (A and B), allowing us to test:
+# Main effect of factor A.
+# Main effect of factor B.
+# Interaction effect between A and B (whether the effect of one factor depends on the level of the other).
+*Model:
+<math>y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}</math>
+where:
+* <math>\mu</math> = grand mean.
+* <math>\alpha_i</math> = effect of level <math>i</math> of factor A (<math>i=1,\dots,a</math>).
+* <math>\beta_j</math> = effect of level <math>j</math> of factor B (<math>j=1,\dots,b</math>).
+* <math>(\alpha\beta)_{ij}</math> = interaction effect.
+* <math>\varepsilon_{ijk} \sim N(0,\sigma^2)</math> (error), with <math>k=1,\dots,r</math> replicates per cell.
+*Sum of squares decomposition:
+<math>SS_{total} = SS_A + SS_B + SS_{AB} + SS_{error}</math>
+with degrees of freedom:
+<math>df_{total} = N-1,\; df_A = a-1,\; df_B = b-1,\; df_{AB} = (a-1)(b-1),\; df_{error} = N - ab</math>,
+where <math>N = a \times b \times r</math> is the total number of observations.
+*Hypotheses:
+# <math>H_0^{(A)}: \alpha_1 = \alpha_2 = \dots = \alpha_a = 0</math> (no main effect of A).
+# <math>H_0^{(B)}: \beta_1 = \beta_2 = \dots = \beta_b = 0</math> (no main effect of B).
+# <math>H_0^{(AB)}: (\alpha\beta)_{ij} = 0</math> for all <math>i,j</math> (no interaction).
+*ANOVA table for two‑way design (balanced, with replication):
+<center>
+{| class="wikitable" style="text-align:center; width:75%" border="1"
+|-
+! Source !! df !! SS !! MS !! F !! P‑value
+|-
+| Factor A || <math>a-1</math> || <math>SS_A</math> || <math>MS_A = \frac{SS_A}{df_A}</math> || <math>F_A = \frac{MS_A}{MS_{error}}</math> || …
+|-
+| Factor B || <math>b-1</math> || <math>SS_B</math> || <math>MS_B = \frac{SS_B}{df_B}</math> || <math>F_B = \frac{MS_B}{MS_{error}}</math> || …
+|-
+| Interaction A×B || <math>(a-1)(b-1)</math> || <math>SS_{AB}</math> || <math>MS_{AB} = \frac{SS_{AB}}{df_{AB}}</math> || <math>F_{AB} = \frac{MS_{AB}}{MS_{error}}</math> || …
+|-
+| Error || <math>N-ab</math> || <math>SS_{error}</math> || <math>MS_{error} = \frac{SS_{error}}{df_{error}}</math> || ||
+|-
+| Total || <math>N-1</math> || <math>SS_{total}</math> || || ||
+|}
+</center>
+*Assumptions: same as one‑way ANOVA (independence, normality, homoscedasticity). In addition, a model fitted without the interaction term assumes that the effects of the two factors are additive.
+*Interactions: If the interaction is significant, the main effects cannot be interpreted separately. Simple‑effects analysis (comparing levels of one factor at fixed levels of the other) is then appropriate.
+===Applications===
+====Example 1: One‑Way ANOVA with R====
+A clinical trial tests three new analgesics (Drug A, B, C) against a placebo for pain relief (score 0‑10, lower is better). Data:
+<pre>
+Pain = c(5,6,4,5,7, 3,4,5,4,3, 2,3,4,3,2, 6,7,6,5,7)
+Group = factor(rep(c("Placebo","DrugA","DrugB","DrugC"), each=5))
+</pre>
+'''Step 1: Exploratory data analysis'''
+<pre>
+boxplot(Pain ~ Group, col="lightblue", main="Pain Score by Treatment",
+        xlab="Treatment", ylab="Pain Score")
+</pre>
+'''Step 2: Fit ANOVA model'''
+<pre>
+fit <- aov(Pain ~ Group)
+summary(fit)
+</pre>
+Output:
+<pre>
+            Df Sum Sq Mean Sq F value   Pr(>F)
+Group        3  35.35  11.783   13.86 0.000103 ***
+Residuals   16  13.60   0.850
+---
+Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+</pre>
+'''Step 3: Check assumptions'''
+<pre>
+# Normality of residuals
+shapiro.test(residuals(fit))
+# Homogeneity of variances (Levene’s test)
+library(car)
+leveneTest(Pain ~ Group)
+</pre>
+'''Step 4: Post‑hoc comparisons (Tukey)'''
+<pre>
+TukeyHSD(fit, conf.level=0.95)
+plot(TukeyHSD(fit))
+</pre>
+'''Step 5: Effect size'''
+<pre>
+eta_squared <- summary(fit)[[1]][1,2] / sum(summary(fit)[[1]][,2])
+eta_squared
+</pre>
+==== Example 2: Two‑Way ANOVA with Interaction====
+A study examines how gender (M/F) and exercise regimen (None, Light, Heavy) affect cholesterol level (mg/dL). Data are balanced with 5 subjects per cell.
+'''R analysis:'''
+<pre>
+chol <- c(220,210,230,215,225, 200,195,205,210,200,
+          190,185,195,190,180, 210,205,215,200,210,
+          180,175,185,180,170, 170,165,175,170,160)
+gender <- factor(rep(rep(c("M","F"), each=5), 3))
+exercise <- factor(rep(c("None","Light","Heavy"), each=10),
+                   levels=c("None","Light","Heavy"))  # keep None, Light, Heavy order in output and plots
+# Two‑way ANOVA with interaction
+fit2 <- aov(chol ~ gender * exercise)
+summary(fit2)
+# Check interaction visually
+interaction.plot(exercise, gender, chol, type="b", col=c("blue","red"),
+                 pch=c(16,18), main="Interaction Plot: Cholesterol")
+</pre>
+If the interaction is not significant, we may refit the model without it:
+<pre>
+fit2_additive <- aov(chol ~ gender + exercise)
+summary(fit2_additive)
+</pre>
+===Software===
+====R Commands for ANOVA====
+* One‑way ANOVA: aov(y ~ group, data)
+* Two‑way ANOVA (with interaction): aov(y ~ factor1 * factor2, data)
+* Check assumptions (leveneTest() comes from the car package):
+  - Normality: shapiro.test(residuals(model))
+  - Homogeneity of variances: leveneTest(y ~ group, data)
+* Post‑hoc tests: TukeyHSD(model), pairwise.t.test(y, group, p.adjust.method="bonferroni")
+* Effect size: library(effectsize); eta_squared(model)
+====Complete R Example (One‑Way)====
+<pre>
+# Simulated data: three diets (A, B, C) and weight loss (lbs)
+set.seed(123)
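+# Group sizes: 7, 7, and 6 subjects (20 in total)
+diet <- factor(rep(c("A","B","C"), times=c(7,7,6)))
+# Responses are generated in the same order as the diet labels (do not shuffle the labels)
+weight_loss <- c(rnorm(7,5,1.2), rnorm(7,7,1.2), rnorm(6,4,1.2))
+data <- data.frame(diet, weight_loss)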
+# ANOVA
+model <- aov(weight_loss ~ diet, data=data)
+summary(model)
+# Diagnostic plots
+par(mfrow=c(2,2))
+plot(model)
+par(mfrow=c(1,1))
+# Post‑hoc (if significant)
+if (summary(model)[[1]][["Pr(>F)"]][1] < 0.05) {
+  print(TukeyHSD(model))  # print() so the result also shows when the script is sourced
+}
+# Effect size
+library(effectsize)
+eta_squared(model)
+</pre>
+===Problems===
+====Problem 1 (One‑Way ANOVA)====
+A manufacturer compares the assembly times (seconds) of three brands of ping‑pong tables.
+<center>
+{| class="wikitable" style="text-align:center; width:60%" border="1"
+|-
+! Assembly Time (sec) !! Brand
+|-
+| 93, 67, 77, 92, 97, 62 || 1
+|-
+| 136, 120, 115, 104, 115, 121, 102, 130 || 2
+|-
+| 198, 217, 209, 221, 190 || 3
+|}
+</center>
+At <math>\alpha=0.05</math>, test whether the average assembly times differ across brands.
+* Solution outline:
+1. State hypotheses: <math>H_0: \mu_1 = \mu_2 = \mu_3</math> vs. <math>H_a</math>: at least one mean differs.
+2. Check assumptions (normality, equal variance).
+3. Compute ANOVA table.
+4. Make decision based on F‑test.
+5. If significant, perform post‑hoc comparisons.
+====Problem 2 (Two‑Way ANOVA)====
+A researcher investigates how gender (M, F) and education level (High School, Bachelor, Graduate) affect starting salary. Data (in thousands):
+<center>
+{| class="wikitable" style="text-align:center; width:70%" border="1"
+|-
+! Gender !! Education !! Salaries
+|-
+| M || High School || 45, 48, 50
+|-
+| M || Bachelor || 65, 68, 70
+|-
+| M || Graduate || 85, 88, 90
+|-
+| F || High School || 42, 44, 46
+|-
+| F || Bachelor || 60, 62, 64
+|-
+| F || Graduate || 80, 82, 84
+|}
+</center>
+Test for main effects and interaction. If the interaction is significant, conduct simple‑effects analysis.
+===References===
+* [https://sda.statisticalcomputing.org/learning See the SOCR SDA app ANOVA Learning Module]
+* [http://wiki.stat.ucla.edu/socr/index.php/Probability_and_statistics_EBook#Chapter_XI:_Analysis_of_Variance_.28ANOVA.29 SOCR ANOVA Chapter]
+* [https://en.wikipedia.org/wiki/Analysis_of_variance ANOVA – Wikipedia]
+* Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
+* Kutner, M.H., Nachtsheim, C.J., Neter, J., & Li, W. (2005). Applied Linear Statistical Models (5th ed.). McGraw‑Hill.
+===See Also===
+* [[AP_Statistics_Curriculum_2007_ANOVA_1Way | AP Statistics: One‑Way ANOVA]]
+* [[AP_Statistics_Curriculum_2007_ANOVA_2Way | AP Statistics: Two‑Way ANOVA]]
+* [[SOCR_EduMaterials_AnalysisActivities_ANOVA_1 | SOCR One‑Way ANOVA Activity]]
+* [[SOCR_EduMaterials_AnalysisActivities_ANOVA_2 | SOCR Two‑Way ANOVA Activity]]
 <hr>
-* SOCR Home page: http://www.socr.umich.edu
+* SOCR Home page: https://socr.umich.edu
-{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_ANOVA}}
+{{translate|pageName=https://wiki.socr.umich.edu/index.php?title=SMHS_ANOVA}}