Difference between revisions of "AP Statistics Curriculum 2007 Infer 2Proportions"
Latest revision as of 11:47, 3 March 2020
General Advance-Placement (AP) Statistics Curriculum - Inferences about Two Proportions
Testing for Equality of Two Proportions
Suppose we have two populations and we want to test whether the proportions of subjects that have a certain characteristic of interest (e.g., a given gender) are equal in the two populations. To make this inference we obtain two samples {\(X_1, X_2, X_3, \cdots, X_n\)} and {\(Y_1, Y_2, Y_3, \cdots, Y_k\)}, where each \(X_i\) and \(Y_i\) indicates whether the ith observation in the respective sample has the characteristic of interest. That is, \[X_i = \begin{cases}0,& \texttt{Characteristic-absent},\\ 1,& \texttt{Characteristic-present}.\end{cases}\] and \(Y_i = \begin{cases}0,& \texttt{Characteristic-absent},\\ 1,& \texttt{Characteristic-present}.\end{cases}\)
The raw sample proportions of observations having the characteristic of interest are \[\hat{p_x}={1 \over n}\sum_{i=1}^n{x_i}\] and \(\hat{p_y}={1 \over k}\sum_{i=1}^k{y_i}.\)
The corrected sample proportions (for small samples) are \[\tilde{p_x}={\sum_{i=1}^n{x_i}+0.5z_{\alpha \over 2}^2 \over n+z_{\alpha \over 2}^2},\] and \(\tilde{p_y}={\sum_{i=1}^k{y_i}+0.5z_{\alpha \over 2}^2 \over k+z_{\alpha \over 2}^2},\) where \(z_{\alpha \over 2}\) is the normal critical value we saw earlier.
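The two estimators above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample counts (7 successes out of 20) are hypothetical, and \(\alpha = 0.05\) is assumed as the default level.

```python
from statistics import NormalDist


def raw_proportion(successes, n):
    """Raw sample proportion: number of 1's divided by the sample size."""
    return successes / n


def corrected_proportion(successes, n, alpha=0.05):
    """Corrected proportion: adds 0.5*z^2 to the successes and z^2 to n,
    where z is the normal critical value z_{alpha/2}."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (successes + 0.5 * z**2) / (n + z**2)


# Hypothetical small sample: 7 of 20 observations have the characteristic.
print(raw_proportion(7, 20))
print(corrected_proportion(7, 20))
```

The correction pulls the estimate toward 0.5, which matters most when the sample is small; for large samples the two estimates nearly coincide.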
By the independence of the samples, the standard error of the difference of the two proportion estimates is:
- Raw proportions: \[SE_{\hat{p_x}-\hat{p_y}} = \sqrt{SE_{\hat{p_x}}^2 + SE_{\hat{p_y}}^2}= \sqrt{ {\hat{p_x}(1-\hat{p_x})\over n} + {\hat{p_y}(1-\hat{p_y})\over k}}.\]
- Corrected proportions: \[SE_{\tilde{p_x}-\tilde{p_y}} = \sqrt{SE_{\tilde{p_x}}^2 + SE_{\tilde{p_y}}^2}= \sqrt{ {\tilde{p_x}(1-\tilde{p_x})\over n+z_{\alpha \over 2}^2} + {\tilde{p_y}(1-\tilde{p_y})\over k+z_{\alpha \over 2}^2}}.\]
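As a rough sketch, both standard errors can be computed directly from the success counts and sample sizes. The function names and the example counts (7 of 20 and 12 of 25) are hypothetical, and \(\alpha = 0.05\) is an assumed default.

```python
from math import sqrt
from statistics import NormalDist


def se_raw(x_succ, n, y_succ, k):
    """Standard error of the difference of the raw proportions."""
    p_x, p_y = x_succ / n, y_succ / k
    return sqrt(p_x * (1 - p_x) / n + p_y * (1 - p_y) / k)


def se_corrected(x_succ, n, y_succ, k, alpha=0.05):
    """Standard error of the difference of the corrected proportions."""
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    p_x = (x_succ + 0.5 * z2) / (n + z2)
    p_y = (y_succ + 0.5 * z2) / (k + z2)
    return sqrt(p_x * (1 - p_x) / (n + z2) + p_y * (1 - p_y) / (k + z2))


# Hypothetical small samples: 7 of 20 versus 12 of 25.
print(se_raw(7, 20, 12, 25))
print(se_corrected(7, 20, 12, 25))
```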
Hypothesis Testing the Difference of Two Proportions
- Null Hypothesis: \[H_o: p_x=p_y\], where \(p_x\) and \(p_y\) are the population proportions of interest.
- Alternative Research Hypotheses:
  - One sided (uni-directional): \[H_1: p_x > p_y\], or \(H_1: p_x < p_y\)
  - Double sided: \[H_1: p_x \not= p_y\]
- Test Statistic: \[Z_o={\tilde{p_x} - \tilde{p_y} \over SE_{(\tilde{p_x}-\tilde{p_y})}} \sim N(0,1)\] (approximately, under \(H_o\)).
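The test above can be sketched as a single function that returns the statistic \(Z_o\) and a p-value from the standard normal distribution. The function name, the `alternative` argument, and the example counts are hypothetical; the corrected proportions are used because that is how the test statistic is written here, and \(\alpha = 0.05\) is assumed for the correction.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x_succ, n, y_succ, k,
                          alternative="two-sided", alpha=0.05):
    """Z test of H_o: p_x = p_y based on the corrected proportions."""
    std_normal = NormalDist()
    z2 = std_normal.inv_cdf(1 - alpha / 2) ** 2
    p_x = (x_succ + 0.5 * z2) / (n + z2)
    p_y = (y_succ + 0.5 * z2) / (k + z2)
    se = sqrt(p_x * (1 - p_x) / (n + z2) + p_y * (1 - p_y) / (k + z2))
    z = (p_x - p_y) / se
    if alternative == "greater":      # H_1: p_x > p_y
        p_value = std_normal.cdf(-z)
    elif alternative == "less":       # H_1: p_x < p_y
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # H_1: p_x != p_y
        p_value = 2 * std_normal.cdf(-abs(z))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Hypothetical counts: 15 of 20 versus 5 of 20.
print(two_proportion_z_test(15, 20, 5, 20, alternative="greater"))
```

The upper-tail probability is computed as `cdf(-z)` rather than `1 - cdf(z)` so that very small p-values are not lost to floating-point rounding.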
Genders of Siblings Example
Is the gender of a second child influenced by the gender of the first child, in families with more than one child? A research hypothesis needs to be formulated before collecting, looking at, or interpreting the data that will be used to address it. Research hypothesis: mothers whose first child is a girl are more likely to have a girl as a second child, compared to mothers whose first child is a boy. Data: 20 years of birth records from one hospital in Auckland, New Zealand.
First Child | Second Child: Male | Second Child: Female | Total
Male | 3,202 | 2,776 | 5,978
Female | 2,620 | 2,792 | 5,412
Total | 5,822 | 5,568 | 11,390
Let \(p_1\) = true proportion of girls among second children of mothers whose first child was a girl, and \(p_2\) = true proportion of girls among second children of mothers whose first child was a boy. The parameter of interest is \(p_1- p_2\).
- Hypotheses: \(H_o: p_1- p_2=0\) (skeptical reaction) versus \(H_1: p_1- p_2>0\) (research hypothesis).
Group | Number of births | Number of girls (second child) | Proportion
1 (Previous child was girl) | \(n_1=5412\) | 2792 | \(\hat{p}_1=0.516\)
2 (Previous child was boy) | \(n_2=5978\) | 2776 | \(\hat{p}_2=0.464\)
- Test Statistic: \[Z_o = {\text{Estimate}-\text{HypothesizedValue}\over SE(\text{Estimate})} = {\hat{p}_1 - \hat{p}_2 - 0 \over SE(\hat{p}_1 - \hat{p}_2)} = {\hat{p}_1 - \hat{p}_2 - 0 \over \sqrt{{\hat{p}_1(1-\hat{p}_1)\over n_1} + {\hat{p}_2(1-\hat{p}_2)\over n_2}}} \sim N(0,1)\] and \(Z_o=5.4996\).
- \(P_{value} = P(Z>Z_o) \approx 1.9\times 10^{-8}\). This small p-value provides extremely strong evidence to reject the null hypothesis that the proportion of girls among second children is the same for mothers whose first child was a girl and mothers whose first child was a boy. Hence there is strong statistical evidence that the genders of siblings are not independent.
- Practical significance: The practical significance of the effect (of the gender of the first child on the gender of the second child, in this case) is best assessed using confidence intervals. A 95% \(CI (p_1- p_2) =[0.033; 0.070]\) is computed by \(\hat{p}_1-\hat{p}_2 \pm 1.96\, SE(\hat{p}_1 - \hat{p}_2)\). A difference of roughly 3 to 7 percentage points is a practically negligible effect, and no reasonable person would make important prospective family decisions based on the gender of their (first) child.
- This SOCR Analysis Activity illustrates how to use the SOCR Analyses to compute the p-values and answer the hypothesis testing challenge.
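The calculations in this example can be checked with a short script. This is a sketch using only the counts from the table above and the raw proportions, as in the test statistic of the example; the variable names are illustrative. The results should come out close to the quoted values of \(Z_o\), the p-value, and the confidence interval, up to rounding.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the birth-records table.
n1, girls1 = 5412, 2792   # group 1: previous child was a girl
n2, girls2 = 5978, 2776   # group 2: previous child was a boy

p1_hat = girls1 / n1
p2_hat = girls2 / n2
diff = p1_hat - p2_hat

# Standard error of the difference of the raw proportions.
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# One-sided test of H_o: p1 - p2 = 0 against H_1: p1 - p2 > 0.
z = diff / se
p_value = NormalDist().cdf(-z)   # upper-tail probability P(Z > z)

# 95% confidence interval for p1 - p2.
margin = 1.96 * se
ci = (diff - margin, diff + margin)

print(f"p1_hat = {p1_hat:.3f}, p2_hat = {p2_hat:.3f}")
print(f"Z_o = {z:.4f}, p-value = {p_value:.2e}")
print(f"95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```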
Problems
- SOCR Home page: http://www.socr.ucla.edu