AP Statistics Curriculum 2007 NonParam ANOVA

From SOCR

Revision as of 00:13, 2 March 2008

General Advanced Placement (AP) Statistics Curriculum - Means of Several Independent Samples

In this section, we extend the multi-sample inference discussed in the ANOVA section to situations where the ANOVA assumptions are not valid. In such cases, we use a non-parametric analysis to study differences in centrality (location) between two or more populations.

Motivational Example

Suppose four groups of students were randomly assigned to be taught with four different techniques, and their achievement test scores were recorded. Are the distributions of test scores the same, or do they differ in location? The data are presented in the table below.

Teaching Method
Method 1    Method 2    Method 3    Method 4
65          75          59          94
87          69          78          89
73          83          67          80
79          81          62          88

The small sample sizes and the lack of information about the distribution of each of the four samples suggest that ANOVA may not be appropriate for analyzing these data.

The Kruskal-Wallis Test

Kruskal-Wallis one-way analysis of variance by ranks is a non-parametric method for testing the equality of two or more population medians. Intuitively, it is a one-way analysis of variance in which the raw data (observed measurements) are replaced by their ranks.

Since it is a non-parametric method, the Kruskal-Wallis test does not assume normally distributed populations, unlike the analogous one-way ANOVA. However, the test does assume that all groups have identically shaped distributions that differ, if at all, only in their centers (e.g., medians).

Calculations

  1. Rank all data from all groups together; i.e., rank the data from 1 to N ignoring group membership. Assign any tied values the average of the ranks they would have received had they not been tied.
  2. The test statistic is given by:

\[K = (N-1)\frac{\sum_{i=1}^g n_i(\bar{r}_{i\cdot} - \bar{r})^2}{\sum_{i=1}^g\sum_{j=1}^{n_i}(r_{ij} - \bar{r})^2}\], where:

    • \(n_i\) is the number of observations in group \(i\), and \(g\) is the number of groups
    • \(r_{ij}\) is the rank (among all observations) of observation j from group i
    • \(N = \sum_{i=1}^g n_i\) is the total number of observations across all groups
    • \(\bar{r}_{i\cdot} = \frac{\sum_{j=1}^{n_i}{r_{ij}}}{n_i}\) is the average rank of the observations in group i
    • \(\bar{r} = (N+1)/2\) is the average of all the \(r_{ij}\)
    • When there are no ties, the ranks are exactly 1, 2, ..., N, so the denominator of the expression for \(K\) equals \((N-1)N(N+1)/12\). Thus \(K = \frac{12}{N(N+1)}\sum_{i=1}^g n_i(\bar{r}_{i\cdot} - \bar{r})^2\).
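  3. A correction for ties can be made by dividing the shortcut form \(K = \frac{12}{N(N+1)}\sum_{i=1}^g n_i(\bar{r}_{i\cdot} - \bar{r})^2\) by \(1 - \frac{\sum_{i=1}^G (t_{i}^3 - t_{i})}{N^3-N}\), where G is the number of distinct values shared by two or more observations and \(t_i\) is the number of observations tied at the i-th such value (here i indexes tie groups, not samples). The first expression for K, whose denominator uses the actual ranks, already accounts for ties and needs no correction. The correction usually makes little difference to the value of K unless there are many ties.
  4. Finally, the p-value is approximated by \(\Pr(\chi^2_{g-1} \ge K)\), the upper tail of the chi-square distribution with g - 1 degrees of freedom. If some of the \(n_i\) are small (e.g., fewer than 5), the distribution of K can differ substantially from this chi-square approximation.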
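The calculation steps above can be sketched in Python as follows. This is a minimal, standard-library-only sketch under our own assumptions, not SOCR's implementation; the names `midranks`, `chi2_sf` and `kruskal_wallis` are hypothetical. The chi-square tail probability is built from the regularized incomplete gamma recurrence, which is exact for integer degrees of freedom (always the case here, since df = g - 1). The sketch computes K from the definition (actual rank spread in the denominator) and from the 12/(N(N+1)) shortcut divided by the tie correction, so you can see that the two agree with or without ties.

```python
import math
from typing import Dict, List, Sequence

# Minimal sketch (assumption: illustrative only, not SOCR's implementation).


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..N; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2.0 + 1.0  # mean of 1-based ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Pr(chi-square with df degrees of freedom >= x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Regularized upper incomplete gamma Q(df/2, y) via the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def kruskal_wallis(groups: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Kruskal-Wallis K in three equivalent forms, plus its chi-square p-value."""
    pooled = [v for grp in groups for v in grp]
    ranks = midranks(pooled)  # rank all observations together
    n = len(pooled)
    r_bar = (n + 1) / 2.0
    between, pos = 0.0, 0
    for grp in groups:
        group_ranks = ranks[pos:pos + len(grp)]
        between += len(grp) * (sum(group_ranks) / len(grp) - r_bar) ** 2
        pos += len(grp)
    total = sum((r - r_bar) ** 2 for r in ranks)  # assumes not all values are tied
    k_definition = (n - 1) * between / total
    k_shortcut = 12.0 / (n * (n + 1)) * between  # exact only when there are no ties
    # Tie correction: t = number of observations sharing each distinct value.
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_sum = sum(t ** 3 - t for t in counts.values())
    k_corrected = k_shortcut / (1.0 - tie_sum / (n ** 3 - n))
    df = len(groups) - 1
    return {
        "K_definition": k_definition,
        "K_shortcut": k_shortcut,
        "K_tie_corrected": k_corrected,
        "df": df,
        "p_value": chi2_sf(k_corrected, df),  # chi-square approximation
    }


scores = [[65, 87, 73, 79], [75, 69, 83, 81], [59, 78, 67, 62], [94, 89, 80, 88]]
result = kruskal_wallis(scores)
print(f"K = {result['K_tie_corrected']:.3f}, p = {result['p_value']:.3f}")
```

For the teaching-methods data this prints `K = 8.956, p = 0.030`. The scores contain no ties, so all three forms of K coincide; with tied data, `K_definition` and `K_tie_corrected` still agree while the uncorrected shortcut is slightly smaller.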

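The null hypothesis of equal population medians is then rejected at significance level α if \(K \ge \chi^2_{\alpha: g-1}\), where \(\chi^2_{\alpha: g-1}\) is the critical value with upper-tail probability α of the chi-square distribution with g - 1 degrees of freedom.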
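Because the chi-square upper-tail probability decreases as its argument grows, this critical-value rule gives the same decision as comparing the approximate p-value with α:

\[K \ge \chi^2_{\alpha: g-1} \iff \Pr(\chi^2_{g-1} \ge K) \le \alpha\]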

The Kruskal-Wallis Test using SOCR Analyses

Computing the test by hand is tedious; it is much quicker to use SOCR Analyses (http://socr.ucla.edu/htmls/SOCR_Analyses.html) to compute the statistical significance of this test. The SOCR KruskalWallis Test activity may also help in understanding how to run this test in SOCR.

For the teaching-methods example above, we can easily compute the statistical significance of the differences between the group medians (centers):

[Figure: SOCR_EBook_Dinov_KruskalWallis_030108_Fig1.jpg, SOCR Analyses output of the Kruskal-Wallis test for the teaching-methods data]
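As a hand check for the teaching-methods data (our own arithmetic, not values read from the figure): ranking all N = 16 scores together produces no ties, and the rank sums are 31 (Method 1), 35 (Method 2), 15 (Method 3) and 55 (Method 4). The mean ranks are therefore 7.75, 8.75, 3.75 and 13.75, with \(\bar{r} = 8.5\), and the no-ties formula gives:

\[K = \frac{12}{16 \cdot 17} \cdot 4\left[(-0.75)^2 + 0.25^2 + (-4.75)^2 + 5.25^2\right] = \frac{12 \cdot 203}{272} \approx 8.96\]

With g - 1 = 3 degrees of freedom, \(\Pr(\chi^2_3 \ge 8.96) \approx 0.03\); equivalently, K exceeds \(\chi^2_{0.05:3} \approx 7.815\), so the hypothesis of equal medians is rejected at α = 0.05. With only four observations per group, this chi-square approximation is rough.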

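The overall Kruskal-Wallis test indicates a significant difference among the group medians. The pairwise comparisons below, which compare the absolute difference between each pair of group mean ranks with the critical value 5.2056, show that only two pairs differ significantly: Method 1 vs. Method 4 and Method 3 vs. Method 4. The other four pairs are not significantly different.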

Group Method1 vs. Group Method2: 1.0 < 5.2056
Group Method1 vs. Group Method3: 4.0 < 5.2056
Group Method1 vs. Group Method4: 6.0 > 5.2056
Group Method2 vs. Group Method3: 5.0 < 5.2056
Group Method2 vs. Group Method4: 5.0 < 5.2056
Group Method3 vs. Group Method4: 10.0 > 5.2056
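The threshold 5.2056 in the list above matches, to four decimals, the Conover-Iman procedure for comparing mean ranks after a Kruskal-Wallis test at α = 0.05: groups i and j are declared different when the absolute difference of their mean ranks exceeds t(1-α/2, N-g) · sqrt(S²(N-1-K)/(N-g)) · sqrt(1/n_i + 1/n_j), where S² is the variance of the pooled ranks and K is the (tie-corrected) Kruskal-Wallis statistic. That SOCR uses this procedure is our inference from the matching numbers, not something stated on this page. The sketch below hard-codes t(0.975, 12) ≈ 2.1788 from a t table because the Python standard library has no t quantile; `conover_pairs` and `T_975_DF12` are hypothetical names.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# Assumption: 0.975 quantile of Student's t with N - g = 16 - 4 = 12 degrees
# of freedom, taken from a t table (the standard library has no t quantile).
T_975_DF12 = 2.1788128


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..N; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def conover_pairs(groups: Sequence[Sequence[float]],
                  t_quantile: float) -> List[Tuple[int, int, float, float]]:
    """Return (i, j, |mean rank difference|, critical value) for every pair of groups."""
    pooled = [v for grp in groups for v in grp]
    ranks = midranks(pooled)
    n, g = len(pooled), len(groups)
    r_bar = (n + 1) / 2.0
    mean_ranks: List[float] = []
    pos = 0
    for grp in groups:
        mean_ranks.append(sum(ranks[pos:pos + len(grp)]) / len(grp))
        pos += len(grp)
    s2 = sum((r - r_bar) ** 2 for r in ranks) / (n - 1)  # variance of pooled ranks
    # Kruskal-Wallis K with the actual rank variance (already tie-corrected).
    k = sum(len(grp) * (m - r_bar) ** 2 for grp, m in zip(groups, mean_ranks)) / s2
    rows = []
    for i, j in combinations(range(g), 2):
        diff = abs(mean_ranks[i] - mean_ranks[j])
        crit = t_quantile * math.sqrt(
            s2 * (n - 1 - k) / (n - g) * (1.0 / len(groups[i]) + 1.0 / len(groups[j])))
        rows.append((i + 1, j + 1, diff, crit))
    return rows


scores = [[65, 87, 73, 79], [75, 69, 83, 81], [59, 78, 67, 62], [94, 89, 80, 88]]
for a, b, diff, crit in conover_pairs(scores, T_975_DF12):
    sign = ">" if diff > crit else "<"
    print(f"Group Method{a} vs. Group Method{b}: {diff:.1f} {sign} {crit:.4f}")
```

The printed lines follow the same layout as the list above; only the Method 1 vs. Method 4 and Method 3 vs. Method 4 differences exceed the threshold.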

Practice Examples

TBD

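Notes

  • The Friedman Fr test is the rank-based analogue of the two-way analysis of variance F test for a randomized block design.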
