Difference between revisions of "AP Statistics Curriculum 2007 NonParam 2MedianIndep"

From SOCR
m (Text replacement - "{{translate|pageName=http://wiki.stat.ucla.edu/socr/" to ""{{translate|pageName=http://wiki.socr.umich.edu/")
 
(14 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
==[[AP_Statistics_Curriculum_2007 | General Advance-Placement (AP) Statistics Curriculum]] -  Difference of Medians of Two Independent Samples==
 
==[[AP_Statistics_Curriculum_2007 | General Advance-Placement (AP) Statistics Curriculum]] -  Difference of Medians of Two Independent Samples==
  
As we discusse in the [[AP_Statistics_Curriculum_2007_NonParam_2MedianPair | paired case]], non-parametric statistical methods provide alternatives to the (standard) [[EBook#Chapter_VIII:_Hypothesis_Testing | parametric tests that we saw earlier]], and they are applicable when the distribution of the data is unknown.
+
As we discussed in the [[AP_Statistics_Curriculum_2007_NonParam_2MedianPair | paired case]], non-parametric statistical methods provide alternatives to the (standard) [[EBook#Chapter_VIII:_Hypothesis_Testing | parametric tests that we saw earlier]], and they are applicable when the distribution of the data is unknown.
  
==Motivational Example==
+
===Motivational Example===
Nine observations of surface soil [http://en.wikipedia.org/wiki/PH pH] were made at two different (independent) locations.  Does the data suggest that the true mean soil [http://en.wikipedia.org/wiki/PH pH] values differ for the two locations?  Note that there is no pairing in this design, even though this is a balanced design with 9 observation in each (independent) group. Test using <math>\alpha = 0.05</math>, and be sure to check any necessary assumptions for the validity of your test.
+
Nine observations of surface soil [http://en.wikipedia.org/wiki/PH pH] were made at two different (independent) locations.  Does the data suggest that the true mean soil [http://en.wikipedia.org/wiki/PH pH] values differs for the two locations?  Note that there is no pairing in this design, even though this is a balanced design with 9 observations in each (independent) group. Test using <math>\alpha = 0.05</math>, and be sure to check any necessary assumptions for the validity of your test.
  
 
<center>
 
<center>
Line 31: Line 31:
 
</center>
 
</center>
  
We see the clear analogy of this study design to the [[AP_Statistics_Curriculum_2007_Infer_2Means_Indep |independent 2-sample designs]] we saw before. However, if we were to plot these data we can see that their distributions may be different or not even symmetric, unimodal and bell-shaped (i.e., not Normal). Therefore, we can not use the [[AP_Statistics_Curriculum_2007_Infer_2Means_Indep |independent T-test]] to test a Null-hypothesis that the centers of the two distributions (that the 2 samples came from) are identical, using this parametric test.
+
We see the clear analogy of this study design to the [[AP_Statistics_Curriculum_2007_Infer_2Means_Indep |independent 2-sample designs]] we saw before. However, if we were to plot these data we can see that their distributions may be different or not even symmetric, unimodal and bell-shaped (i.e., not Normal). Therefore, we cannot use the [[AP_Statistics_Curriculum_2007_Infer_2Means_Indep |Independent T-Test]] to test a Null-hypothesis that the centers of the two distributions (that the 2 samples came from) are identical, using this parametric test.
  
 
<center>[[Image:SOCR_EBook_Dinov_NonParam_Wilcoxon_022408_Fig1.jpg|600px]]
 
<center>[[Image:SOCR_EBook_Dinov_NonParam_Wilcoxon_022408_Fig1.jpg|600px]]
 
[[Image:SOCR_EBook_Dinov_NonParam_Wilcoxon_022408_Fig2.jpg|600px]]</center>
 
[[Image:SOCR_EBook_Dinov_NonParam_Wilcoxon_022408_Fig2.jpg|600px]]</center>
  
The first of these two figures shows the index plot of the pH levels for both samples. The second figure shows the sample histograms of these samples, which are clearly not Normal-like. Therefore, the [[AP_Statistics_Curriculum_2007_Infer_2Means_Indep |independent T-test]] would not be appropriate to analyze these data.
+
The first figure shows the index plot of the pH levels for both samples. The second figure shows the sample histograms of these samples, which are clearly not Normal-like. Therefore, the [[AP_Statistics_Curriculum_2007_Infer_2Means_Indep |Independent T-Test]] would not be appropriate to analyze these data.
  
Intuitively, we may consider these group differences significantly large, aspecially if we look at the Box-and-whisker plots, but this is a qualitative inference that demands a more quantitative statistical analyses that can back up our intuition.
+
Intuitively, we may consider these group differences significantly large, especially if we look at the [[SOCR_EduMaterials_Activities_BoxPlot | Box-and-Whisker Plots]], but this is a qualitative inference that demands a more quantitative statistical analysis that can back up our intuition.
  
 
<center>[[Image:SOCR_EBook_Dinov_NonParam_Wilcoxon_022408_Fig3.jpg|600px]]</center>
 
<center>[[Image:SOCR_EBook_Dinov_NonParam_Wilcoxon_022408_Fig3.jpg|600px]]</center>
  
 
==The Wilcoxon-Mann-Whitney Test==
 
==The Wilcoxon-Mann-Whitney Test==
The '''sign test''' is a non-parametric alternative to the [[AP_Statistics_Curriculum_2007_Infer_2Means_Dep | one-sample and paired T-test]]. The sign test has no requirements for the data to be Normally distributed. It assigns a positive (+) or negative (-) sign to each observation according to whether it is greater or less than some hypothesized value. Then it measures the difference between the <math>\pm</math> signs and how distinct is this difference from what we would expect to observe by chance alone. For example, if there were no effect of developing acute renal failure on the outcome from sepsis, about half of the 16 studies above would be expected to have a relative risk less than 1.0 (a "-" sign) and the remaining 8 would be expected to have a relative risk greater than 1.0 (a "+" sign). In the actual data, 3 studies had "-" signs and the remaining 13 studies had "+" signs. Intuitively, this difference of 10 appears large to be simply due to random variation. If so, the effect of developing acute renal failure would be significant on the outcome from sepsis.
+
The Wilcoxon-Mann-Whitney (WMW) Test (also known as Mann-Whitney U Test, Mann-Whitney-Wilcoxon Test, or Wilcoxon Rank-Sum Test) is a non-parametric test for assessing whether two samples come from the same distribution. The null hypothesis is that the two samples are drawn from a single population, and therefore that their probability distributions are equal. It requires that the two samples are independent, and that the observations are [[AP_Statistics_Curriculum_2007#Types_of_Data |ordinal]] or continuous measurements.
  
 +
===Calculations===
 +
The ''U'' statistic for the WMW test may be approximated for sample sizes above about 20 using the [[AP_Statistics_Curriculum_2007#Chapter_V:_Normal_Probability_Distribution |Normal Distribution]]. 
  
===Calculations===
+
The ''U'' test is provided as part of [http://socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Analyses] - [[SOCR_EduMaterials_AnalysisActivities_Wilcoxon |see this activity]].
Suppose N+ is the number of "+" signs and  we fix a significance level of <math>\alpha= 0.05</math>. And consider the following two hypotheses:
+
 
 +
For small samples, we can directly compute the WMW test-statistic as follows.
 +
 
 +
# Choose the sample for which the ranks seem to be smaller. Call this ''Sample 1'', and call the other sample ''Sample 2.''
 +
 
 +
# Taking each observation in ''Sample 2'', count the number of observations in ''Sample 1'' that are smaller than it (count 0.5 for any that are equal to it).
  
: <math>H_o: N_+=8</math> (equivalent to <math>N_-=8</math>):  The effect of developing acute renal failure is not significant on the outcome from sepsis.
+
# The total of these counts is ''U''.
: <math>H_1: N_+ \not=8</math>:  The effect of developing acute renal failure is significant on the outcome from sepsis.
 
  
Define the following test-statistics
+
For larger samples, a formula can be used:
:<math>B_s = \max{(N_+ , N_-)}</math>, where <math>N_+</math> and <math>N_-</math> are the number of positive and negative signs, respectively.
+
# Arrange all the observations into a single ranked series. That is, rank all the observations without regard to which sample they come from.
  
Then the distribution of <math>B_s \sim Binomial(n=16, p=8/16=0.5)</math>.
+
# Add up the ranks in ''Sample 1''.  The sum of ranks in ''Sample 2'' follows by calculation, since the sum of all the ranks equals ''N'' (''N''&nbsp;+&nbsp;1)/2, where ''N'' is the total number of observations.
  
For our data, <math>B_s = \max{(N_+ , N_-)}=\max{13,3}=13</math> and the probability that such [[AP_Statistics_Curriculum_2007_Distrib_Binomial |binomial variable]] exceeds 13 is <math>P(Bin(16,0.5,13))=0.010635</math>. Therefore, we can reject the null hypothesis <math>H_o</math> and regard as significant the effect of developing acute renal failure on the outcome from sepsis.
+
# "U" is then given by:
  
<center>[[Image:SOCR_EBook_Dinov_NonParam_SignTest_022308_Fig2.jpg|600px]]</center>
+
:<math>U_1=R_1 -  {n_1(n_1+1) \over 2}  \,\!,</math>
  
===The Sign test using SOCR Analyses===
+
where ''n''<sub>1</sub> is the two sample size for ''Sample 1'', and ''R''<sub>1</sub> is the sum of the ranks in ''Sample 1''.
It is much quicker to use [http://socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Analyses] to compute the statistical significance of the sign test. This [[SOCR_EduMaterials_AnalysisActivities_TwoPairedSign | SOCR Sign test activity]] may also be helpful in understanding how to use the sign test method in SOCR.
 
  
===Example===
+
:Note that there is no specification as to which sample is considered ''Sample 1''.  An equally valid formula for ''U'' is
A set of 12 identical twins are given psychological tests to determine whether the ''first born'' of the set tends to be more aggressive than the ''second born''.  Each twin is scored according to aggressiveness; a higher score indicates greater aggressiveness. Because of the natural pairing in a set of twins these data can be considered paired. 
+
::<math>U_2=R_2 - {n_2(n_2+1) \over 2}. \,\!</math>
  
<center>
+
:The sum of the two values is then given by
{| class="wikitable" style="text-align:center; width:40%" border="1"
+
::<math>U_1 + U_2 = R_1 - {n_1(n_1+1) \over 2} + R_2 - {n_2(n_2+1) \over 2}. \,\!</math>
|-
 
| Twin-Index || 1<sup>st</sup> Born || 2<sup>nd</sup> Born || Sign
 
|-
 
| 1 || 86 || 88 || -
 
|-
 
| 2 || 71 || 77 || -
 
|-
 
| 3 || 77 || 76 || +
 
|-
 
| 4 || 68 || 64 || +
 
|-
 
| 5 || 91 || 96 || -
 
|-
 
| 6 || 72 || 72 || 0 (Drop)
 
|-
 
| 7 || 77 || 65 || +
 
|-
 
| 8 || 91 || 90 || +
 
|-
 
| 9 || 70 || 65 || +
 
|-
 
| 10 || 71 || 80 || -
 
|-
 
| 11 || 88 || 81 || +
 
|-
 
| 12 || 87 || 72 || +
 
|}
 
</center>
 
  
We first plot the data using [[SOCR_EduMaterials_Activities_LineChart | the SOCR Line Chart]]. Visually there does not seem to be a strong effect of the order of birth on baby's aggression.
+
: Knowing that ''R''<sub>1</sub>&nbsp;+&nbsp;''R''<sub>2</sub> = ''N''(''N''&nbsp;+&nbsp;1)/2 and ''N'' = ''n''<sub>1</sub>&nbsp;+&nbsp;''n''<sub>2</sub>&nbsp;, and doing some algebra, we find that the sum is
 +
::<math>U_1 + U_2 = n_1 n_2. \,\!</math>
  
<center>[[Image:SOCR_EBook_Dinov_NonParam_SignTest_022308_Fig3.jpg|600px]]</center>
+
The maximum value of ''U'' is the product of the sample sizes for the two samples.  In such a case, the "other" ''U'' would be 0.
  
Next we can use the [[SOCR_EduMaterials_AnalysisActivities_TwoPairedSign | SOCR Sign Test Analysis]] to quantitatively evaluate the evidence to reject the null hypothesis that there is no birth-order effect on baby's aggressiveness.
+
* [http://socr.umich.edu/Applets/WilcoxonRankSumTable.html SOCR WMW Critical scores Table]. These critical values for the WMW test-statistics are computed using the following R-script
 +
 +
cat("n1", " n2 ")
 +
for (a in c(1,5,10,100,200)) cat(a/2000, " ")
 +
for (x in 4:20) {
 +
for (y in 1:20) {
 +
cat(x,y, "")
 +
for (a in c(1,5,10,100,500)) {
 +
cat(if (qwilcox(a/2000,x,y,lower.tail = TRUE,
 +
                        log.p = FALSE)-1>=0) qwilcox(a/2000,x,y,lower.tail = TRUE,
 +
                        log.p = FALSE)-1 else 0, " ")
 +
}
 +
cat("\n")
 +
}
 +
}
  
<center>[[Image:SOCR_EBook_Dinov_NonParam_SignTest_022308_Fig4.jpg|600px]]</center>
+
Note that if F(x) denotes the CDF of the Wilcoxon-Mann-Whitney $U$ statistic, the R-function ''qwilcox'' computed the quantile function $Q(α)=\inf\{x∈N:F(x)≥α\}, α∈(0,1)$, for $U$. Since U is a discrete variable, for a given probability $\alpha$, we can't always find a critical value $x$ corresponding to $\alpha$, i.e., $F(x)=α$. However, there will be a minimum $x$, such that $F(x)= F(Q(α))>α$. If $C(α)$ denotes the critical value for the WMW test, $F(C(α))≤α$, to preserve the false-positive error rate of the test. Thus, $C(α)=\sup\{x∈N:F(x)≤α\},α∈(0,1)$ and if there exists a value $x$ such that $F(x)=α$, then, $C(α)=Q(α)$, otherwise, $C(α)=Q(α)−1$.
  
Clearly the p-value reported is 0.274, and our data can not reject the null hypothesis.  
+
===The Wilcoxon-Mann-Whitney Test Using SOCR Analyses===
 +
It is much quicker to use [http://socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Analyses] to compute the statistical significance of the sign test. This [[SOCR_EduMaterials_AnalysisActivities_Wilcoxon | SOCR Wilcoxon-Mann-Whitney Test Activity]] may also be helpful in understanding how to use this test in SOCR. We have the following results for the pH data above:
  
==The Wilcoxon signed rank test==
+
<center>[[Image:SOCR_EBook_Dinov_NonParam_Wilcoxon_022408_Fig4.jpg|600px]]</center>
  
Like the [[AP_Statistics_Curriculum_2007_NonParam_2MedianPair#The_Sign-Test | sign test]] and the [[AP_Statistics_Curriculum_2007_Hypothesis_S_Mean | T-test]], the [http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test Wilcoxon signed rank test] involves comparisons of differences between measurements. It requires that the data are measured at an interval level of measurement, but does not require assumptions about the form of the distribution of the measurements. It should therefore be used whenever the distributional assumptions of the T-test are not satisfied.
+
Clearly the p-value < 0.05, and therefore our data provides sufficient evidence to reject the null hypothesis. So we assumes that there were significant pH differences between the two soil lots tested in this experiment.
 +
: One-Sided P-Value for Sample2 < Sample1: 0.00040
  
===Example===
+
: Two-Sided P-Value for Sample1 not equal to Sample2: 0.00079
[[AP_Statistics_Curriculum_2007_NonParam_2MedianPair#References | Whitley and Ball reported]] data on the '''central venous oxygen saturation''' (SvO<sub>2</sub> (%)) from 10 consecutive patients at 2 time points; at admission and 6 hours after admission to the intensive care unit (ICU). The null hypothesis is that there is no effect of 6 hours of ICU treatment on SvO<sub>2</sub>. Under the null hypothesis, the mean of the differences between SvO<sub>2</sub> at admission and that at 6 hours after admission should be zero.  
 
  
<center>
+
==WMW Test vs. [[AP_Statistics_Curriculum_2007_Infer_2Means_Indep |Independent T-Test]]==
{| class="wikitable" style="text-align:center; width:40%" border="1"
+
Both types of tests answer the same question, but treat data differently.  
|-
+
*The WMW test uses rank ordering
| '''Patient''' || '''On Admission''' || '''At 6 Hours''' || '''Difference''' || '''Rank'''
+
: Positive: Doesn’t depend on normality or population parameters
|-
+
: Negative: Distribution free lacks power because it doesn't use all the info in the data
| 2 || 59.1 || 56.7 || -2.4 || 1
 
|-
 
| 7 || 58.2 || 60.7 || 2.5 || 2
 
|-
 
| 9 || 56.0 || 59.5 || 3.5 || 3
 
|-
 
| 10 || 65.3 || 59.8 || -5.5 || 4
 
|-
 
| 3 || 56.1 || 61.9 || 5.8 || 5
 
|-
 
| 5 || 60.6 || 67.7 || 7.1 || 6
 
|-
 
| 6 || 37.8 || 50.0 || 12.2 || 7
 
|-
 
| 1 || 39.7 || 52.9 || 13.2 || 8
 
|-
 
| 4 || 57.7 || 71.4 || 13.7 || 9
 
|-
 
| 8 || 33.6 || 51.3 || 17.7 || 10
 
|}
 
</center>
 
  
<center>[[Image:SOCR_EBook_Dinov_NonParam_SignTest_022308_Fig5.jpg|600px]]</center>
+
* The T-test uses the raw measurements
 +
: Positive: Incorporates all of the data into calculations
 +
: Negative: Must meet normality assumption
 +
 +
* Neither test is uniformly superior. If the data are normally distributed we use the T-test. If the data are not normal use the WMW test.
  
Clearly, we can reject the null-hypothesys at <math>\alpha=0.05</math>, as the one- and two-sided alternative hypotheses p-values for the [http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test Wilcoxon signed rank test] reported by the [[SOCR_EduMaterials_AnalysisActivities_TwoPairedRank | SOCR Analysis]] are respectively
+
==Practice Examples==
: One-Sided p-value = 0.011
 
: Two-Sided p-value = 0.022
 
  
==Practice Problems==
+
===Urinary Fluoride Concentration in Cattle===
Suppose 10 randomly selected rats were chosen to see if they could be trained to escape a maze.  The rats were released and timed (sec.) before and after 2 weeks of training (N means the rat did not complete the maze-test).  Do the data provide evidence to suggest that the escape time of rats is different after 2 weeks of training?  Test using  <math>\alpha= 0.05</math>.  
+
The urinary fluoride concentration (ppm) was measured both for a sample of livestock grazing in an area previously exposed to fluoride pollution and also for a similar sample of livestock grazing in an unpolluted area.
  
 
<center>
 
<center>
{| class="wikitable" style="text-align:center; width:40%" border="1"
+
{| class="wikitable" style="text-align:center; width:35%" border="1"
|-
 
| '''Rat''' || '''Before''' || '''After''' || '''Sign'''
 
|-
 
| 1 || 100 || 50 || +
 
|-
 
| 2 || 38 || 12 || +
 
 
|-
 
|-
| 3 || N || 45 || +
+
| '''Polluted''' || '''Unpolluted'''
 
|-
 
|-
| 4 || 122 || 62 || +
+
| 21.3 || 10.1
 
|-
 
|-
| 5 || 95 || 90 || +
+
| 18.7 || 18.3
 
|-
 
|-
| 6 || 116 || 100 || +
+
| 21.4 || 17.2
 
|-
 
|-
| 7 || 56 || 75 || -
+
| 17.1 || 18.4
 
|-
 
|-
| 8 || 135 || 52 || +
+
| 11.1 || 20.0
 
|-
 
|-
| 9 || 104 || 44 || +
+
| 20.9 ||  
 
|-
 
|-
| 10 || N || 50 || +
+
| 19.7 ||  
 
|}
 
|}
 
</center>
 
</center>
  
 
==References==
 
==References==
* Whitley, E. and Ball, J. (2002) [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=153434 Statistics review 6: Nonparametric methods]. Critical Care, 6(6): 509–513.
 
  
 
<hr>
 
<hr>
 
* SOCR Home page: http://www.socr.ucla.edu
 
* SOCR Home page: http://www.socr.ucla.edu
  
{{translate|pageName=http://wiki.stat.ucla.edu/socr/index.php?title=AP_Statistics_Curriculum_2007_NonParam_2MedianIndep}}
+
"{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=AP_Statistics_Curriculum_2007_NonParam_2MedianIndep}}

Latest revision as of 12:24, 3 March 2020

General Advance-Placement (AP) Statistics Curriculum - Difference of Medians of Two Independent Samples

As we discussed in the paired case, non-parametric statistical methods provide alternatives to the (standard) parametric tests that we saw earlier, and they are applicable when the distribution of the data is unknown.

Motivational Example

Nine observations of surface soil pH were made at two different (independent) locations. Do the data suggest that the true mean soil pH values differ for the two locations? Note that there is no pairing in this design, even though this is a balanced design with 9 observations in each (independent) group. Test using \(\alpha = 0.05\), and be sure to check any necessary assumptions for the validity of your test.

Location 1 Location 2
8.10 7.85
7.89 7.30
8.00 7.73
7.85 7.27
8.01 7.58
7.82 7.27
7.99 7.50
7.80 7.23
7.93 7.41

We see the clear analogy of this study design to the independent 2-sample designs we saw before. However, if we plot these data, we can see that their distributions may be different, or not even symmetric, unimodal and bell-shaped (i.e., not Normal). Therefore, we cannot use the parametric Independent T-Test to test the null hypothesis that the centers of the two distributions (from which the 2 samples came) are identical.

SOCR EBook Dinov NonParam Wilcoxon 022408 Fig1.jpg SOCR EBook Dinov NonParam Wilcoxon 022408 Fig2.jpg

The first figure shows the index plot of the pH levels for both samples. The second figure shows the sample histograms of these samples, which are clearly not Normal-like. Therefore, the Independent T-Test would not be appropriate to analyze these data.

Intuitively, we may consider these group differences significantly large, especially if we look at the Box-and-Whisker Plots, but this is a qualitative inference that demands a more quantitative statistical analysis that can back up our intuition.

SOCR EBook Dinov NonParam Wilcoxon 022408 Fig3.jpg

The Wilcoxon-Mann-Whitney Test

The Wilcoxon-Mann-Whitney (WMW) Test (also known as Mann-Whitney U Test, Mann-Whitney-Wilcoxon Test, or Wilcoxon Rank-Sum Test) is a non-parametric test for assessing whether two samples come from the same distribution. The null hypothesis is that the two samples are drawn from a single population, and therefore that their probability distributions are equal. It requires that the two samples are independent, and that the observations are ordinal or continuous measurements.

Calculations

For sample sizes above about 20, the distribution of the U statistic for the WMW test may be approximated using the Normal Distribution.

The U test is provided as part of SOCR Analyses - see this activity.

For small samples, we can directly compute the WMW test-statistic as follows.

  1. Choose the sample for which the ranks seem to be smaller. Call this Sample 1, and call the other sample Sample 2.
  2. Taking each observation in Sample 2, count the number of observations in Sample 1 that are smaller than it (count 0.5 for any that are equal to it).
  3. The total of these counts is U.
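
As a minimal sketch of the counting procedure above, the following R code applies it to the soil pH data from the motivational example. The helper name count_u is hypothetical (it is not part of SOCR or base R), and treating Location 2 as Sample 1 is an assumption based on its visibly smaller values.

```r
# Soil pH data from the motivational example
loc1 <- c(8.10, 7.89, 8.00, 7.85, 8.01, 7.82, 7.99, 7.80, 7.93)
loc2 <- c(7.85, 7.30, 7.73, 7.27, 7.58, 7.27, 7.50, 7.23, 7.41)

# Hypothetical helper: for each observation in sample2, count the
# observations in sample1 that are smaller (ties count 0.5), then total.
count_u <- function(sample1, sample2) {
  counts <- sapply(sample2, function(v) {
    sum(sample1 < v) + 0.5 * sum(sample1 == v)
  })
  sum(counts)
}

# Location 2 has the smaller values, so it plays the role of Sample 1
u <- count_u(sample1 = loc2, sample2 = loc1)
print(u)
```

For these data the count should be 78.5: every Location 1 value exceeds at least 8 of the 9 Location 2 values, and the single tie at 7.85 contributes 0.5. Swapping the roles of the two samples gives the complementary count, and the two counts add up to 9 × 9 = 81.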

For larger samples, a formula can be used:

  1. Arrange all the observations into a single ranked series. That is, rank all the observations without regard to which sample they come from.
  2. Add up the ranks in Sample 1. The sum of ranks in Sample 2 follows by calculation, since the sum of all the ranks equals N(N + 1)/2, where N is the total number of observations.
  3. U is then given by:

\[U_1=R_1 - {n_1(n_1+1) \over 2} \,\!,\]

where \(n_1\) is the sample size for Sample 1, and \(R_1\) is the sum of the ranks in Sample 1.

Note that there is no specification as to which sample is considered Sample 1. An equally valid formula for U is
\[U_2=R_2 - {n_2(n_2+1) \over 2}. \,\!\]
The sum of the two values is then given by
\[U_1 + U_2 = R_1 - {n_1(n_1+1) \over 2} + R_2 - {n_2(n_2+1) \over 2}. \,\!\]
Knowing that \(R_1 + R_2 = N(N + 1)/2\) and \(N = n_1 + n_2\), and doing some algebra, we find that the sum is
\[U_1 + U_2 = n_1 n_2. \,\!\]
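
The algebra behind this identity can be spelled out in one line. Substituting the total rank sum \(N(N+1)/2\) with \(N = n_1 + n_2\) for \(R_1 + R_2\), and expanding the products, the squared and linear terms cancel and only the cross term remains:
\[U_1 + U_2 = \frac{(n_1+n_2)(n_1+n_2+1)}{2} - \frac{n_1(n_1+1)}{2} - \frac{n_2(n_2+1)}{2} = \frac{2 n_1 n_2}{2} = n_1 n_2.\]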

The maximum value of U is the product of the sample sizes for the two samples. In such a case, the "other" U would be 0.
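
The rank-sum formulas above can be sketched in R using the soil pH data from the motivational example. This is only an illustration under stated assumptions: Location 2 is taken as Sample 1 (it has the smaller values), and tied observations receive averaged ranks, which is the default behavior of R's rank function.

```r
# Soil pH data from the motivational example
loc1 <- c(8.10, 7.89, 8.00, 7.85, 8.01, 7.82, 7.99, 7.80, 7.93)
loc2 <- c(7.85, 7.30, 7.73, 7.27, 7.58, 7.27, 7.50, 7.23, 7.41)

# Sample 1 = Location 2 (smaller values), Sample 2 = Location 1
n1 <- length(loc2)
n2 <- length(loc1)

# Rank all observations together; rank() averages tied ranks by default
r  <- rank(c(loc2, loc1))
R1 <- sum(r[seq_len(n1)])        # rank sum of Sample 1
R2 <- sum(r[n1 + seq_len(n2)])   # rank sum of Sample 2

U1 <- R1 - n1 * (n1 + 1) / 2
U2 <- R2 - n2 * (n2 + 1) / 2

cat("R1 =", R1, " R2 =", R2, "\n")
cat("U1 =", U1, " U2 =", U2, " U1 + U2 =", U1 + U2, "\n")
```

Here the two U values are as far apart as they can nearly be (one close to 0, the other close to the maximum of 81), which reflects how little the two samples overlap.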

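The critical values for the WMW test statistic in the SOCR WMW Critical Scores Table are computed using the following R script:

# Print a table of lower-tail critical values of the WMW U statistic.
# Rows: sample sizes (n1, n2); columns: lower-tail probabilities a/2000.
cat("n1", " n2 ")
# Note: this header lists 200/2000 = 0.1 as its last level,
# while the loop below uses 500/2000 = 0.25.
for (a in c(1,5,10,100,200)) cat(a/2000, " ")
cat("\n")
for (x in 4:20) {
	for (y in 1:20) {
		cat(x, y, "")
		for (a in c(1,5,10,100,500)) {
			# Critical value: one below the quantile, floored at 0
			q <- qwilcox(a/2000, x, y, lower.tail = TRUE, log.p = FALSE) - 1
			cat(if (q >= 0) q else 0, " ")
		}
		cat("\n")
	}
}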

Note that if $F(x)$ denotes the CDF of the Wilcoxon-Mann-Whitney $U$ statistic, the R function qwilcox computes the quantile function $Q(\alpha)=\inf\{x\in\mathbb{N}:F(x)\geq\alpha\}$, $\alpha\in(0,1)$, for $U$. Since $U$ is a discrete variable, for a given probability $\alpha$ we cannot always find a value $x$ with $F(x)=\alpha$ exactly. In that case, $Q(\alpha)$ is the minimum $x$ such that $F(x)>\alpha$. To preserve the false-positive error rate of the test, the critical value $C(\alpha)$ for the WMW test must satisfy $F(C(\alpha))\leq\alpha$. Thus, $C(\alpha)=\sup\{x\in\mathbb{N}:F(x)\leq\alpha\}$, $\alpha\in(0,1)$: if there exists a value $x$ such that $F(x)=\alpha$, then $C(\alpha)=Q(\alpha)$; otherwise, $C(\alpha)=Q(\alpha)-1$.
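
The relationship between $Q(\alpha)$ and $C(\alpha)$ can be checked numerically. The sketch below computes $C(\alpha)$ directly from its definition using pwilcox and compares it with qwilcox. The helper name wmw_critical and the choice of sample sizes and $\alpha$ are assumptions made for illustration only. Unlike the table script, which prints 0 when $Q(\alpha)-1$ would be negative, this helper returns NA when no valid critical value exists.

```r
# Hypothetical helper: largest x with F(x) <= alpha, i.e. C(alpha).
# Returns NA when even F(0) exceeds alpha (no valid critical value).
wmw_critical <- function(alpha, m, n) {
  xs <- 0:(m * n)                          # support of U
  ok <- xs[pwilcox(xs, m, n) <= alpha]     # values keeping the error rate
  if (length(ok) == 0) NA else max(ok)
}

# Illustrative settings (assumed): two samples of size 9, lower tail 0.025
alpha <- 0.025
m <- 9
n <- 9

crit <- wmw_critical(alpha, m, n)
q    <- qwilcox(alpha, m, n)

cat("C(alpha) =", crit, " Q(alpha) =", q, "\n")
cat("F(C) =", pwilcox(crit, m, n), " F(C + 1) =", pwilcox(crit + 1, m, n), "\n")
```

The printed CDF values bracket $\alpha$: $F(C(\alpha))$ stays at or below it, while the next value of $U$ already exceeds it.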

The Wilcoxon-Mann-Whitney Test Using SOCR Analyses

It is much quicker to use SOCR Analyses to compute the statistical significance of the WMW test. This SOCR Wilcoxon-Mann-Whitney Test Activity may also be helpful in understanding how to use this test in SOCR. We have the following results for the pH data above:

SOCR EBook Dinov NonParam Wilcoxon 022408 Fig4.jpg

The p-value is clearly below 0.05, and therefore our data provide sufficient evidence to reject the null hypothesis. We conclude that there were significant pH differences between the two soil locations tested in this experiment.

One-Sided P-Value for Sample2 < Sample1: 0.00040
Two-Sided P-Value for Sample1 not equal to Sample2: 0.00079
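
For readers without access to SOCR, the Normal approximation mentioned under Calculations can be sketched by hand in R. Under the null hypothesis, U has mean \(n_1 n_2/2\) and variance \(n_1 n_2 (n_1+n_2+1)/12\) (the standard result without a correction for ties). Whether SOCR Analyses uses exactly this formula is an assumption here, and with only 9 observations per group the approximation is used purely for illustration.

```r
# Soil pH data from the motivational example
loc1 <- c(8.10, 7.89, 8.00, 7.85, 8.01, 7.82, 7.99, 7.80, 7.93)
loc2 <- c(7.85, 7.30, 7.73, 7.27, 7.58, 7.27, 7.50, 7.23, 7.41)
n1 <- length(loc1)
n2 <- length(loc2)

# U for Location 2: pairs where a Location 2 value exceeds a
# Location 1 value, counting ties as 0.5
U <- sum(outer(loc2, loc1, ">")) + 0.5 * sum(outer(loc2, loc1, "=="))

# Normal approximation under the null (no tie or continuity correction)
mu    <- n1 * n2 / 2
sigma <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z     <- (U - mu) / sigma

p_one <- pnorm(z)              # lower tail: Location 2 tends to be smaller
p_two <- 2 * pnorm(-abs(z))    # two-sided

cat("U =", U, " z =", z, "\n")
cat("one-sided p =", p_one, " two-sided p =", p_two, "\n")

# Cross-check with base R (Normal approximation, no continuity correction)
wt <- wilcox.test(loc1, loc2, exact = FALSE, correct = FALSE)
cat("wilcox.test two-sided p =", wt$p.value, "\n")
```

Under these assumptions the two-sided value comes out near 0.0008, in line with the SOCR output above. A small difference from wilcox.test is expected, because that function applies a tie correction to the variance.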

WMW Test vs. Independent T-Test

Both types of tests answer the same question, but treat data differently.

  • The WMW test uses rank ordering
      Positive: Does not depend on normality or population parameters
      Negative: Being distribution-free, it can lack power because it does not use all the information in the data
  • The T-test uses the raw measurements
      Positive: Incorporates all of the data into calculations
      Negative: Must meet the normality assumption
  • Neither test is uniformly superior. If the data are normally distributed, we use the T-test. If the data are not normal, we use the WMW test.

Practice Examples

Urinary Fluoride Concentration in Cattle

The urinary fluoride concentration (ppm) was measured both for a sample of livestock grazing in an area previously exposed to fluoride pollution and also for a similar sample of livestock grazing in an unpolluted area.

Polluted Unpolluted
21.3 10.1
18.7 18.3
21.4 17.2
17.1 18.4
11.1 20.0
20.9
19.7
