Difference between revisions of "AP Statistics Curriculum 2007 ANOVA 1Way"

From SOCR

Revision as of 12:08, 19 February 2008

General Advance-Placement (AP) Statistics Curriculum - One-Way Analysis of Variance (ANOVA)

In the two-sample inference chapter we compared two independent group means using the independent T-test. Now, we expand our inference methods to study and compare k independent samples. In this case, we will be decomposing the entire variation in the data into (independent/orthogonal) components - i.e., we'll be analyzing the variance of the data. Hence, this procedure is called Analysis of Variance (ANOVA).

Motivational Example

Suppose 5 varieties of peas are currently being tested by a large agribusiness cooperative to determine which is best suited for production. A field was divided into 20 plots, with each variety of peas planted in four plots. The yields (in bushels of peas) produced from each plot are shown in two identical forms in the tables below.

Variety of Pea
A B C D E
26.2 29.2 29.1 21.3 20.1
24.3 28.1 30.8 22.4 19.3
21.8 27.3 33.9 24.3 19.9
28.1 31.2 32.8 21.8 22.1



A 26.2,24.3,21.8,28.1
B 29.2,28.1,27.3,31.2
C 29.1,30.8,33.9,32.8
D 21.3,22.4,24.3,21.8
E 20.1,19.3,19.9,22.1

Using the SOCR Charts (see SOCR Box-and-Whisker Plot Activity and Dot Plot Activity) we can generate plots that enable us to compare visually the yields of the 5 different types of peas.

SOCR EBook Dinov ANOVA1 021708 Fig1.jpg

Using ANOVA, the data are regarded as random samples from k populations (here k = 5). Suppose the population means are \(\mu_1, \mu_2, \mu_3, \mu_4, \mu_5\) and the population standard deviations are \(\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5\). We have 5 group means to compare. Why not just carry out \({5\choose 2}=10\) T-tests comparing all pairs of (independent) groups?

Repeated T-tests would mean testing null hypotheses of the type \(H_o: \mu_i = \mu_j, \forall i\not= j\). What is the problem with this approach? Suppose each test is carried out at \(\alpha = 0.05\), so the probability of a type I error is 5% for each test. Then, the overall risk of making at least one type I error is larger than 0.05 and gets much larger as the number of groups (k) gets larger. To solve this problem, we need to make multiple comparisons with an overall error of \(\alpha = 0.05\) (or whichever level is specified initially).
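To get a feel for how quickly the overall risk grows, here is a rough sketch. It assumes, purely for illustration, that the pairwise tests are independent (they are not exactly independent, since the tests share groups), so the numbers are approximations, not exact error rates. The function name familywise_error_rate is our own.

```python
from math import comb


def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one type I error in num_tests tests at level alpha.

    Assumes the tests are independent, which is only an approximation
    for pairwise T-tests that share groups.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


alpha = 0.05
for k in (2, 3, 5, 10):
    m = comb(k, 2)  # number of pairwise comparisons among k groups
    risk = familywise_error_rate(alpha, m)
    print(f"k={k:2d}  pairwise tests={m:2d}  approx. overall type I risk={risk:.3f}")
```

Under this independence assumption, the 10 pairwise tests for the 5 pea varieties would carry an overall type I risk of roughly 0.40, far above the nominal 0.05.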


The main idea behind ANOVA is that we need to know how much inherent variability there is in the data before we can judge whether there is a difference in the sample means - i.e., presence of a grouping effect. To make an inference about means we compare two types of variability:

variability between sample means
variability within each group

It is very important that we keep these two types of variability in mind as we work through the following formulas. It is our goal to come up with a numerical recipe that describes/computes each of these variabilities.


One-Way ANOVA Calculations

We use the following notation:

\(y_{i,j}\) = the measurement from group i, observation-index j.

k = number of groups

\(n_i\) = number of observations in group i

n = total number of observations, \(n= n_1 + n_2 + \cdots + n_k\)
The group mean for group i is \[\bar{y}_{i,.} = {\sum_{j=1}^{n_i}{y_{i,j}} \over n_i}\]
The grand mean is \[\bar{y}=\bar{y}_{.,.} = {\sum_{i=1}^k {\sum_{j=1}^{n_i}{y_{i,j}}} \over n}\]
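As a small illustration of this notation, the following sketch computes the group means and the grand mean for the pea yields from the motivational example. The variable names (yields, group_means, grand_mean) are our own choices, not part of SOCR.

```python
# Pea yields (bushels) by variety, copied from the motivational example.
yields = {
    "A": [26.2, 24.3, 21.8, 28.1],
    "B": [29.2, 28.1, 27.3, 31.2],
    "C": [29.1, 30.8, 33.9, 32.8],
    "D": [21.3, 22.4, 24.3, 21.8],
    "E": [20.1, 19.3, 19.9, 22.1],
}

k = len(yields)                                # number of groups
n = sum(len(ys) for ys in yields.values())     # total number of observations

# Group mean: average of the observations within each group.
group_means = {g: sum(ys) / len(ys) for g, ys in yields.items()}

# Grand mean: average of all n observations, ignoring group labels.
grand_mean = sum(sum(ys) for ys in yields.values()) / n

for g, m in group_means.items():
    print(f"mean of variety {g}: {m:.2f}")
print(f"grand mean: {grand_mean:.2f}")
```

For these data the group means come out to about 25.10, 28.95, 31.65, 22.45 and 20.35, and the grand mean to about 25.70.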

To compute the difference between the means we will compare each group mean to the grand mean.

  • SST (Sum Square due to Treatment):

First, we describe the variation between the group means. For the independent T-test we described the difference between two group means as \(\bar{y}_1 - \bar{y}_2\). In ANOVA we describe the differences among k means using the sum of squares due to treatments (or between-group variation):

SST(between) = \(\sum_{i=1}^{k}{n_i(\bar{y}_{i,.}-\bar{y})^2}\). SST can be thought of as the sum of squared differences between each group mean and the grand mean, weighted by group size.
Degrees of Freedom: df (Between) = k – 1.
Mean Sum Square due to Treatment (Between): \(MST(between) = {SST(between)\over df(Between)}\). This measures variability between the sample means.
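The SST formula above can be sketched in a few lines for the pea data. The helper name sst_between is hypothetical and only serves this illustration.

```python
# Pea yields (bushels) by variety, copied from the motivational example.
yields = {
    "A": [26.2, 24.3, 21.8, 28.1],
    "B": [29.2, 28.1, 27.3, 31.2],
    "C": [29.1, 30.8, 33.9, 32.8],
    "D": [21.3, 22.4, 24.3, 21.8],
    "E": [20.1, 19.3, 19.9, 22.1],
}


def sst_between(groups):
    """Sum over groups of n_i * (group mean - grand mean)^2."""
    n = sum(len(ys) for ys in groups.values())
    grand_mean = sum(sum(ys) for ys in groups.values()) / n
    return sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in groups.values()
    )


sst = sst_between(yields)
df_between = len(yields) - 1      # k - 1
mst = sst / df_between            # mean square due to treatment

print(f"SST = {sst:.2f}, df(Between) = {df_between}, MST = {mst:.2f}")
```

For the pea data this gives SST of about 342.04 on 4 degrees of freedom, so MST is about 85.51.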
  • SSE (Sum Square due to Error):

Second, we assess the within-group variation. Recall that to measure the variability within a single sample we used the sample standard deviation \(\sqrt{\sum_{i=1}^n{(y_i - \bar{y})^2} \over n-1}\). In ANOVA, to describe the combined variation within the k groups we use the sum of squares due to error (within-group variation):

SSE(within) = \(\sum_{i=1}^{k}{\sum_{j=1}^{n_i}{(y_{i,j}-\bar{y}_{i,.})^2}}\), which can be thought of as the combined variation within the k groups.
Degrees of Freedom: df (Within) = n - k.
Mean variability within groups: \(MSE(within) = {SSE(within)\over df(Within)}\). This is a measure of variability within the groups.
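A matching sketch for the within-group quantities, again on the pea data. The helper name sse_within is hypothetical.

```python
# Pea yields (bushels) by variety, copied from the motivational example.
yields = {
    "A": [26.2, 24.3, 21.8, 28.1],
    "B": [29.2, 28.1, 27.3, 31.2],
    "C": [29.1, 30.8, 33.9, 32.8],
    "D": [21.3, 22.4, 24.3, 21.8],
    "E": [20.1, 19.3, 19.9, 22.1],
}


def sse_within(groups):
    """Sum of squared deviations of each observation from its own group mean."""
    total = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        total += sum((y - group_mean) ** 2 for y in ys)
    return total


n = sum(len(ys) for ys in yields.values())
k = len(yields)

sse = sse_within(yields)
df_within = n - k                 # n - k
mse = sse / df_within             # mean square error

print(f"SSE = {sse:.2f}, df(Within) = {df_within}, MSE = {mse:.3f}")
```

For the pea data this gives SSE of about 53.52 on 15 degrees of freedom, so MSE is about 3.568.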
  • Decomposition of Variance: Note that we have the following decomposition of the total variability in the data:
(Verbal): Deviation of an observation from the grand mean (Total variability) = Variation-Within + Variation-Between.
(Mathematics) \[y_{i,j} - \bar{y} = (y_{i,j} - \bar{y}_{i,.}) + (\bar{y}_{i,.} - \bar{y})\]. Squaring both sides and summing over all observations (the cross-product terms sum to zero, because the deviations from a group mean sum to zero within each group) gives the desired ANOVA decomposition.
(ANOVA Decomposition) \[\sum_{i=1}^{k}{\sum_{j=1}^{n_i}{(y_{i,j} - \bar{y})^2}} = \sum_{i=1}^{k}{\sum_{j=1}^{n_i}{(y_{i,j} - \bar{y}_{i,.})^2}} + \sum_{i=1}^{k}{n_i(\bar{y}_{i,.} - \bar{y})^2}\].
  • Interpretations:
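This means SS(total) = SSE(within) + SST(between), where SS(total) denotes the total sum of squares on the left-hand side of the decomposition (not to be confused with SST, the sum of squares due to treatment).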
df(Total) = df(Within) + df(Between) = (n - k) + (k – 1) = n - 1
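The decomposition can be checked numerically. The sketch below computes the total, within-group, and between-group sums of squares for the pea data independently and confirms that they add up (up to floating-point rounding). Variable names are our own.

```python
# Pea yields (bushels) by variety, copied from the motivational example.
yields = {
    "A": [26.2, 24.3, 21.8, 28.1],
    "B": [29.2, 28.1, 27.3, 31.2],
    "C": [29.1, 30.8, 33.9, 32.8],
    "D": [21.3, 22.4, 24.3, 21.8],
    "E": [20.1, 19.3, 19.9, 22.1],
}

all_values = [y for ys in yields.values() for y in ys]
n, k = len(all_values), len(yields)
grand_mean = sum(all_values) / n

# Total: squared deviations of every observation from the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

# Within: squared deviations of every observation from its group mean.
ss_within = sum(
    (y - sum(ys) / len(ys)) ** 2 for ys in yields.values() for y in ys
)

# Between: group-size-weighted squared deviations of group means from the grand mean.
ss_between = sum(
    len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in yields.values()
)

print(f"total = {ss_total:.2f}, within + between = {ss_within + ss_between:.2f}")
print(f"df: {n - 1} = {n - k} + {k - 1}")
```

For the pea data both sides come out to about 395.56, and the degrees of freedom split as 19 = 15 + 4.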

SOCR ANOVA Calculations

SOCR Analyses provide the tools to compute the 1-way ANOVA. For example, the ANOVA for the peas data above may be easily computed - see the image below. Note that SOCR ANOVA requires the data to be entered in this format:

Variety of Pea
Value Group-Index
26.2 A
24.3 A
21.8 A
28.1 A
29.2 B
28.1 B
27.3 B
31.2 B
29.1 C
30.8 C
33.9 C
32.8 C
21.3 D
22.4 D
24.3 D
21.8 D
20.1 E
19.3 E
19.9 E
22.1 E
SOCR EBook Dinov ANOVA1 021708 Fig2.jpg
F-Value = 23.966
P-Value = 2.2855121203368967E-6
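As a cross-check of the reported F-value, here is a minimal sketch that starts from the long (Value, Group-Index) format shown above and forms the ratio MST/MSE by hand. This is not how SOCR is implemented; it is only an independent recomputation. The p-value step is optional and assumes SciPy is installed (it uses the survival function of the F distribution, scipy.stats.f.sf); without SciPy the p-value is simply skipped.

```python
from collections import defaultdict

# Data in the long format used for SOCR ANOVA input: (value, group label).
rows = [
    (26.2, "A"), (24.3, "A"), (21.8, "A"), (28.1, "A"),
    (29.2, "B"), (28.1, "B"), (27.3, "B"), (31.2, "B"),
    (29.1, "C"), (30.8, "C"), (33.9, "C"), (32.8, "C"),
    (21.3, "D"), (22.4, "D"), (24.3, "D"), (21.8, "D"),
    (20.1, "E"), (19.3, "E"), (19.9, "E"), (22.1, "E"),
]

# Collect the values belonging to each group label.
groups = defaultdict(list)
for value, label in rows:
    groups[label].append(value)

n, k = len(rows), len(groups)
grand_mean = sum(value for value, _ in rows) / n

sst = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values())
sse = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)

df_between, df_within = k - 1, n - k
f_value = (sst / df_between) / (sse / df_within)   # F = MST / MSE
print(f"F({df_between}, {df_within}) = {f_value:.3f}")

# Optional: upper-tail probability of the F distribution (requires SciPy).
try:
    from scipy.stats import f as f_dist
    p_value = f_dist.sf(f_value, df_between, df_within)
    print(f"p-value = {p_value:.4e}")
except ImportError:
    p_value = None
    print("SciPy not available; skipping the p-value.")
```

The hand-computed ratio rounds to 23.966, in agreement with the F-Value reported above.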
