Difference between revisions of "AP Statistics Curriculum 2007 Hypothesis Proportion"

Latest revision as of 15:55, 3 March 2020

General Advance-Placement (AP) Statistics Curriculum - Testing a Claim about Proportion

Background

Recall that for large samples, the sampling distribution of the sample proportion \(\hat{p}\) is approximately Normal by the Central Limit Theorem (CLT), because the sample proportion can be expressed as a sample average of Bernoulli random variables. When the sample size is small, the normal approximation may be inadequate. To accommodate this, we modify the sample-proportion \(\hat{p}\) slightly and obtain the corrected-sample-proportion \(\tilde{p}\): \[\hat{p}={y\over n} \longrightarrow \tilde{p}={y+0.5z_{\alpha \over 2}^2 \over n+z_{\alpha \over 2}^2},\] where \(y\) is the observed number of successes, \(n\) is the sample size, and \(z_{\alpha \over 2}\) is the normal critical value we saw earlier.

The standard error of \(\hat{p}\) also needs a slight modification: \[SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})\over n} \longrightarrow SE_{\tilde{p}} = \sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}.\]
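The two corrections above can be sketched in a few lines of Python. This is a minimal illustration, not part of the SOCR tools: the function name `corrected_proportion` and the sample of 3 successes in 10 trials are hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def corrected_proportion(y, n, alpha=0.05):
    """Return (p_tilde, se_tilde) for y successes in n trials.

    Hypothetical helper that implements the corrected sample
    proportion and its standard error as defined in the text.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # normal critical value z_{alpha/2}
    p_tilde = (y + 0.5 * z**2) / (n + z**2)
    se_tilde = sqrt(p_tilde * (1 - p_tilde) / (n + z**2))
    return p_tilde, se_tilde


# Hypothetical small sample: 3 successes in 10 trials
p_tilde, se_tilde = corrected_proportion(3, 10)
print(round(p_tilde, 4), round(se_tilde, 4))
```

Note that the correction pulls the estimate toward 1/2: the raw proportion here is 0.3, while the corrected value is somewhat larger.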

Hypothesis Testing about a Single Sample Proportion

Null Hypothesis: \(H_o: p=p_o\) (e.g., \(p_o={1\over 2}\)), where \(p\) is the population proportion of interest.
Alternative Research Hypotheses:
- One sided (uni-directional): \(H_1: p >p_o\), or \(H_1: p<p_o\)
- Double sided: \(H_1: p \not= p_o\)

Test Statistic: \[Z_o={\tilde{p} -p_o \over SE_{\tilde{p}}} \sim N(0,1).\]

Example

Suppose a researcher is interested in studying the effect of aspirin in reducing heart attacks. He randomly recruits 500 subjects with evidence of early heart disease and has them take one aspirin daily for two years. At the end of the two years, he finds that only 17 subjects had a heart attack during the study. Use \(\alpha=0.05\) to test the hypothesis that the proportion of subjects on aspirin treatment who have a heart attack within 2 years of treatment is \(p_o=0.04\).

\[\tilde{p} = {17+0.5z_{0.025}^2\over 500+z_{0.025}^2}= {17+1.92\over 500+3.84}=0.038\]

\[SE_{\tilde{p}}= \sqrt{0.038(1-0.038)\over 500+3.84}=0.0085\]

And the corresponding test statistic is \[Z_o={\tilde{p} - 0.04 \over SE_{\tilde{p}}}={0.038-0.04 \over 0.0085}={-0.002 \over 0.0085}=-0.2353\]

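The p-value corresponding to this test statistic (roughly 0.8 for a two-sided alternative) is far greater than \(\alpha=0.05\), so the result is not statistically significant and we fail to reject \(H_o\).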
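The aspirin calculation can be reproduced with a short script. This is a sketch under two assumptions that the text does not fix: a two-sided alternative, and full floating-point precision rather than the rounded intermediate value \(\tilde{p}=0.038\). For that reason the magnitude of the statistic differs slightly from the hand calculation, while the conclusion stays the same.

```python
from math import sqrt
from statistics import NormalDist

# Data from the aspirin example
y, n = 17, 500          # heart attacks, subjects
p0, alpha = 0.04, 0.05  # hypothesized proportion, significance level

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{0.025}, about 1.96

# Corrected proportion and its standard error
p_tilde = (y + 0.5 * z_crit**2) / (n + z_crit**2)
se_tilde = sqrt(p_tilde * (1 - p_tilde) / (n + z_crit**2))

# Test statistic and two-sided p-value (two-sided alternative is an assumption)
z_stat = (p_tilde - p0) / se_tilde
p_two_sided = 2 * (1 - std_normal.cdf(abs(z_stat)))

print(f"p_tilde={p_tilde:.4f}  SE={se_tilde:.4f}  Z={z_stat:.4f}  p={p_two_sided:.4f}")
print("reject H_o" if p_two_sided < alpha else "fail to reject H_o")
```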

Genders of Siblings Example

Is the gender of a second child influenced by the gender of the first child, in families with more than one child? The research hypothesis must be formulated before collecting, looking at, or interpreting the data that will be used to address it. Research hypothesis: mothers whose first child is a girl are more likely to have a girl as a second child than mothers whose first child is a boy. Data: 20 years of birth records from one hospital in Auckland, New Zealand.

First Child	Second Child: Male	Second Child: Female	Total
Male	3,202	2,776	5,978
Female	2,620	2,792	5,412
Total	5,822	5,568	11,390

Let \(p_1\) = true proportion of girls among the second children of mothers whose first child was a girl, and \(p_2\) = true proportion of girls among the second children of mothers whose first child was a boy. The parameter of interest is \(p_1- p_2\).

Hypotheses: \(H_o: p_1- p_2=0\) (skeptical reaction). \(H_1: p_1- p_2>0\) (research hypothesis).

Group	Number of births	Number of girls (second child)	Proportion
1 (Previous child was girl)	\(n_1=5412\)	2792	\(\hat{p}_1=0.516\)
2 (Previous child was boy)	\(n_2=5978\)	2776	\(\hat{p}_2=0.464\)

Test Statistic: \[Z_o = {\text{Estimate}-\text{HypothesizedValue}\over SE(\text{Estimate})} = {\hat{p}_1 - \hat{p}_2 - 0 \over SE(\hat{p}_1 - \hat{p}_2)} = {\hat{p}_1 - \hat{p}_2 - 0 \over \sqrt{{\hat{p}_1(1-\hat{p}_1)\over n_1} + {\hat{p}_2(1-\hat{p}_2)\over n_2}}} \sim N(0,1)\] and \(Z_o=5.4996\).

\(P_{value} = P(Z>Z_o)\approx 1.9\times 10^{-8}\). This small p-value provides extremely strong evidence to reject the null hypothesis that the proportion of girls among second children is the same for mothers whose first child was a girl and for mothers whose first child was a boy. Hence there is strong statistical evidence that the genders of siblings are not independent.
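The test statistic and one-sided p-value for the sibling data can be checked with the following sketch. The variable names are illustrative. The counts come from the table above, and the unpooled standard error matches the formula given for \(Z_o\).

```python
from math import sqrt
from statistics import NormalDist

# Group 1: first child was a girl; Group 2: first child was a boy
n1, girls1 = 5412, 2792
n2, girls2 = 5978, 2776

p1_hat = girls1 / n1
p2_hat = girls2 / n2

# Unpooled standard error of the difference, as in the formula for Z_o
se_diff = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# Test statistic for H_o: p1 - p2 = 0 against H_1: p1 - p2 > 0
z_stat = (p1_hat - p2_hat - 0) / se_diff

# Upper-tail probability P(Z > z_stat); by symmetry this equals cdf(-z_stat),
# which is numerically more accurate than 1 - cdf(z_stat) far in the tail
p_value = NormalDist().cdf(-z_stat)

print(f"p1_hat={p1_hat:.3f}  p2_hat={p2_hat:.3f}  Z={z_stat:.4f}  p-value={p_value:.2e}")
```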

Practical significance: The practical significance of the effect (of the gender of the first child on the gender of the second child, in this case) can only be assessed using confidence intervals. A 95% \(CI (p_1- p_2) =[0.033; 0.070]\) is computed by \(\hat{p}_1-\hat{p}_2 \pm 1.96\, SE(\hat{p}_1 - \hat{p}_2)\). Clearly, this is a practically negligible effect (a difference of roughly 3 to 7 percentage points) and no reasonable person would make important prospective family decisions based on the gender of their (first) child.
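The interval quoted above can be recomputed as follows. This is a minimal sketch; the helper name `diff_proportion_ci` is hypothetical, and the multiplier 1.96 is the one used in the text.

```python
from math import sqrt


def diff_proportion_ci(y1, n1, y2, n2, z=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error.

    Hypothetical helper; z=1.96 gives an approximate 95% interval.
    """
    p1_hat, p2_hat = y1 / n1, y2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Sibling data: girls as second child, by gender of first child
ci_low, ci_high = diff_proportion_ci(2792, 5412, 2776, 5978)
print(f"95% CI for p1 - p2: [{ci_low:.3f}; {ci_high:.3f}]")
```

Because the whole interval lies above zero, it agrees with the rejection of \(H_o\), while its endpoints show how small the estimated difference is.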

This SOCR Analysis Activity illustrates how to use the SOCR Analyses to compute the p-values and answer the hypothesis testing challenge.

Problems

Notes

Read this Science discussion "Mission Improbable: A Concise and Precise Definition of P-Value".
Use these data to investigate whether there are significant gender effects on mortality rates in this study of heart attacks (Acute Myocardial Infarction).

SOCR Home page: http://www.socr.ucla.edu

@@ Line 1: / Line 1: @@
 ==[[AP_Statistics_Curriculum_2007 | General Advance-Placement (AP) Statistics Curriculum]] - Testing a Claim about Proportion==
-=== Testing a Claim about Proportion===
+=== Background===
-Example on how to attach images to Wiki documents in included below (this needs to be replaced by an appropriate figure for this section)!
+[[AP_Statistics_Curriculum_2007_Estim_Proportion |Recall that for large samples]], the sampling distribution of the sample proportion <math>\hat{p}</math> is approximately Normal, by [[AP_Statistics_Curriculum_2007_Limits_CLT |CLT]], as the sample proportion may be presented as a [[AP_Statistics_Curriculum_2007_Limits_Norm2Bin |sample average or Bernoulli random variables]]. When the sample size is small, the normal approximation may be inadequate. To accommodate this, we will modify the '''sample-proportion''' <math>\hat{p}</math> slightly and obtain the '''corrected-sample-proportion''' <math>\tilde{p}</math>:
-<center>[[Image:AP_Statistics_Curriculum_2007_IntroVar_Dinov_061407_Fig1.png|500px]]</center>
+: <math>\hat{p}={y\over n} \longrightarrow \tilde{p}={y+0.5z_{\alpha \over 2}^2 \over n+z_{\alpha \over 2}^2},</math>
+where [[AP_Statistics_Curriculum_2007_Normal_Critical | <math>z_{\alpha \over 2}</math> is the normal critical value we saw earlier]].
-===Approach===
+The standard error of <math>\hat{p}</math> also needs a slight modification
-Models & strategies for solving the problem, data understanding & inference.
+: <math>SE_{\hat{p}} =  \sqrt{\hat{p}(1-\hat{p})\over n} \longrightarrow SE_{\tilde{p}} =  \sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}.</math>
-* TBD
+=== Hypothesis Testing about a Single Sample Proportion===
+* Null Hypothesis: <math>H_o: p=p_o</math> (e.g., <math>p_o={1\over 2}</math>), where ''p'' is the population proportion of interest.
+* Alternative Research Hypotheses:
+** One sided (uni-directional): <math>H_1: p >p_o</math>, or <math>H_1: p<p_o</math>
+** Double sided: <math>H_1: p \not= p_o</math>
-===Model Validation===
+* Test Statistics: <math>Z_o={\tilde{p} -p_o \over SE_{\tilde{p}}} \sim N(0,1).</math>
-Checking/affirming underlying assumptions.
-* TBD
+===Example===
+Suppose a researcher is interested in studying the effect of aspirin in reducing heart attacks. He randomly recruits 500 subjects with evidence of early heart disease and has them take one aspirin daily for two years.  At the end of the two years, he finds that during the study only 17 subjects had a heart attack. Use <math>\alpha=0.05</math> to formulate a test a research hypothesis that the proportion of subject on aspirin treatment that have heart attacks within 2 years of treatment is <math>p_o=0.04</math>.
-===Computational Resources: Internet-based SOCR Tools===
+: <math>\tilde{p} = {17+0.5z_{0.025}^2\over 500+z_{0.025}^2}== {17+1.92\over 500+3.84}=0.038</math>
-* TBD
-===Examples===
+: <math>SE_{\tilde{p}}= \sqrt{0.038(1-0.038)\over 500+3.84}=0.0085</math>
-Computer simulations and real observed data.
-* TBD
+And the corresponding test statistics is
+: <math>Z_o={\tilde{p} - 0.04 \over SE_{\tilde{p}}}={0.002 \over 0.0085}=0.2353</math>
-===Hands-on activities===
-Step-by-step practice problems.
-* TBD
+The p-value corresponding to this test-statistics is clearly insignificant.
+===Genders of Siblings Example===
+Is the gender of a second child influenced by the gender of the first child, in families with >1 kid? Research hypothesis needs to be formulated first before collecting/looking/interpreting the data that will be used to address it. Mothers whose 1<sup>st</sup> child is a girl are more likely to have a girl, as a second child, compared to mothers with boys as 1<sup>st</sup> children. Data: 20 yrs of birth records of 1 Hospital in Auckland, New Zealand.
+<center>
+{| class="wikitable" style="text-align:center; width:75%" border="1"
+|-
+| colspan=2 rowspan=2|&nbsp;
+| colspan=3| '''Second Child'''
+|-
+|  Male || Female || '''Total'''
+|-
+| rowspan=3| '''First Child''' || Male ||  3,202 ||  2,776 || 5,978
+|-
+|  Female || 2,620 || 2,792 || 5,412
+|-
+|  '''Total''' || 5,822 || 5,568 || 11,390
+|}
+</center>
+Let <math>p_1</math>=true proportion of girls in mothers with girl as first child, <math>p_2</math>=true proportion of girls in mothers with boy as first child. The parameter of interest is <math>p_1- p_2</math>.
+* Hypotheses: <math>H_o: p_1- p_2=0</math> (skeptical reaction). <math>H_1: p_1- p_2>0</math> (research hypothesis).
+<center>
+{| class="wikitable" style="text-align:center; width:75%" border="1"
+|-
+| colspan=2 rowspan=2|&nbsp;
+| colspan=3| '''Second Child'''
+|-
+|  Number of births || Number of girls || '''Proportion'''
+|-
+| rowspan=2| '''Group''' || 1 (Previous child was girl) ||  <math>n_1=5412</math>||2792 || <math>\hat{p}_1=0.516</math>
+|-
+|  2 (Previous child was boy) || <math>n_2=5978</math>|| 2776 || <math>\hat{p}_2=0.464</math>
+|}
+</center>
+* Test Statistics: <math>Z_o = {Estimate-HypothesizedValue\over SE(Estimate)} = {\hat{p}_1 - \hat{p}_2 - 0 \over SE(\hat{p}_1 - \hat{p}_2)} = {\hat{p}_1 - \hat{p}_2 - 0 \over \sqrt{{\hat{p}_1(1-\hat{p}_1)\over n_1} + {\hat{p}_2(1-\hat{p}_2)\over n_2}}} \sim N(0,1)</math> and <math>Z_o=5.4996</math>.
+* <math>P_{value} = P(Z>Z_o)< 1.9\times 10^{-8}</math>. This small p-values provides extremely strong evidence to reject the null hypothesis that there are no differences between the proportions of mothers that had a girl as a second child but had either boy or girl as their first child. Hence there is strong statistical evidence implying that genders of siblings are not independent.
+* '''Practical significance''': The practical significance of the effect (of the gender of the first child on the gender of the second child, in this case) can only be assessed using [[AP_Statistics_Curriculum_2007#Estimating_a_Population_Proportion |confidence intervals]]. A 95% <math>CI (p_1- p_2) =[0.033; 0.070]</math> is computed by <math>p_1-p_2 \pm 1.96 SE(p_1 - p_2)</math>. Clearly, this is a practically negligible effect and no reasonable person would make important prospective family decisions based on the gender of their (first) child.
+* This [[SOCR_EduMaterials_AnalysisActivities_Chi_Contingency | SOCR Analysis Activity]] illustrates how to use the [http://socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Analyses] to compute the p-values and answer the hypothesis testing challenge.
+<center>[[Image:SOCR_EBook_Dinov_Hypothesis_020508_Fig6.jpg|700px]]</center>
 <hr>
-===References===
-* TBD
+===[[EBook_Problems_Hypothesis_Proportion|Problems]]===
+===Notes===
+* [http://sciencenow.sciencemag.org/cgi/content/full/2009/1030/1 Read this Science discussion "Mission Improbable: A Concise and Precise Definition of P-Value"].
+* Use these data to investigate if there are significant gender effects on mortality rates in [[SOCR_Data_AMI_NY_1993_HeartAttacks|this study of heart attacks/ Acute Myocardial Infarction]].
 <hr>
 * SOCR Home page: http://www.socr.ucla.edu
-{{translate|pageName=http://wiki.stat.ucla.edu/socr/index.php?title=AP_Statistics_Curriculum_2007_Hypothesis_Proportion}}
+"{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=AP_Statistics_Curriculum_2007_Hypothesis_Proportion}}
