Difference between revisions of "AP Statistics Curriculum 2007 Hypothesis Basics"

From SOCR

Latest revision as of 12:17, 3 March 2020

General Advance-Placement (AP) Statistics Curriculum - Fundamentals of Hypothesis Testing

Fundamentals of Hypothesis Testing

A (statistical) hypothesis test is a method of making statistical decisions about populations or processes based on experimental data. Hypothesis testing answers the question of how well the findings fit the possibility that chance alone is responsible for the observed discrepancy between the theoretical model and the empirical observations. This is accomplished by asking and answering a hypothetical question: what is the likelihood of the observed summary statistics of interest, if the data did come from the distribution specified by the null hypothesis? One use of hypothesis testing is deciding whether experimental results contain enough information to cast doubt on conventional wisdom.

  • Example: Consider determining whether a suitcase contains some radioactive material. Placed under a Geiger counter, the suitcase produces 10 clicks (counts) per minute. The null hypothesis is that there is no radioactive material in the suitcase and that all measured counts are due to ambient radioactivity typical of the surrounding air and harmless objects in a suitcase. We can then calculate how likely it is that the null hypothesis produces 10 counts per minute. If it is likely, for example if the null hypothesis predicts on average 9 counts per minute, we say that the suitcase is compatible with the null hypothesis. This does not imply that there is no radioactive material; we just cannot determine it from the 1-minute sample taken with this specific method. On the other hand, if the null hypothesis predicts, for example, 1 count per minute, then the suitcase is not compatible with the null hypothesis, and other factors are likely responsible for the increased radioactive counts.
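
To make "how likely it is that the null hypothesis produces 10 counts per minute" concrete, the sketch below assumes that background counts follow a Poisson distribution. That model is an assumption made for illustration (the text does not name one), and the function name is hypothetical. It computes the probability of observing 10 or more counts when the expected background rate is 9 or 1 count per minute.

```python
import math


def poisson_upper_tail(observed: int, rate: float) -> float:
    """P(X >= observed) for X ~ Poisson(rate); the Poisson model is an assumption."""
    # Sum P(X = k) for k = 0 .. observed - 1, then take the complement.
    below = sum(
        math.exp(-rate) * rate**k / math.factorial(k) for k in range(observed)
    )
    return 1.0 - below


# Probability of at least 10 clicks per minute under two background rates.
p_rate_9 = poisson_upper_tail(10, 9.0)
p_rate_1 = poisson_upper_tail(10, 1.0)
print(f"background 9/min: P(X >= 10) = {p_rate_9:.4f}")
print(f"background 1/min: P(X >= 10) = {p_rate_1:.2e}")
```

Under these assumptions the first probability is large (roughly 0.41), so 10 counts are compatible with the null hypothesis, while the second is smaller than one in a million, so they are not.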

Hypothesis testing is also known as statistical significance testing. The null hypothesis is a conjecture that exists solely to be disproved, rejected or falsified by the sample statistics used to estimate the unknown population parameters. Statistical significance is a possible finding of the test: that the sample is unlikely to have occurred in this process by chance, given the truth of the null hypothesis. The name of the test describes its formulation and its possible outcome. One characteristic of hypothesis testing is its crisp decision about the null hypothesis: reject or do not reject (which is not the same as accept).

Null and Alternative (Research) Hypotheses

A null hypothesis is a thesis set up to be nullified or refuted in order to support an alternative (research) hypothesis. The null hypothesis is presumed true until statistical evidence, in the form of a hypothesis test, indicates otherwise. In science, the null hypothesis is used to test differences between treatment and control groups, and the assumption at the outset of the experiment is that no difference exists between the two groups for the variable of interest (e.g., population means). The null hypothesis proposes something initially presumed true, and it is rejected only when it becomes evidently false, that is, when a researcher has a certain degree of confidence, usually 95% to 99%, that the data do not support the null hypothesis.

Example 1: Gender effects

If we want to compare the test scores of two random samples of men and women, a null hypothesis would be that the mean score of the male population was the same as the mean score of the female population:

\(H_0: \mu_{men} = \mu_{women}\)

where:

\(H_0\) = the null hypothesis,
\(\mu_{men}\) = the mean of the males (population 1), and
\(\mu_{women}\) = the mean of the females (population 2).

Alternatively, the null hypothesis can postulate that the two samples are drawn from the same population, so that the center, variance and shape of the distributions are equal.

Formulation of the null hypothesis is a vital step in testing statistical significance. Having formulated such a hypothesis, one can establish the probability of observing the obtained data, or data more extreme, if the null hypothesis is true. That probability is what is commonly called the p-value (or observed significance level) of the results.

In many scientific experimental designs we predict that a particular factor will produce an effect on our dependent variable — this is our alternative hypothesis. We then consider how often we would expect to observe our experimental results, or results even more extreme, if we were to take many samples from a population where there was no effect (i.e. we test against our null hypothesis). If we find that this happens rarely (up to, say, 5% of the time), we can conclude that our results support our experimental prediction — we reject our null hypothesis.
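
The idea of asking "how often would results this extreme occur if there were no effect" can be sketched with a permutation test. This is one possible illustration, not the method prescribed by the text: group labels are shuffled many times, and we count how often the shuffled difference in means is at least as large as the observed one. The scores below are invented purely for illustration, and the function name is hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical test scores, invented purely for illustration.
scores_men = [72, 85, 78, 90, 66, 81, 77, 88]
scores_women = [75, 83, 80, 92, 70, 79, 84, 86]
p_value = permutation_p_value(scores_men, scores_women)
print(f"estimated p-value: {p_value:.3f}")
```

If the estimated p-value falls at or below the chosen threshold (say, 0.05), the null hypothesis of equal means would be rejected; otherwise it would not be rejected.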

Type I Error, Type II Error and Power

Directly related to hypothesis testing are the following 3 concepts:

  • Type I Error: The false positive (Type I) error is the error of rejecting the null hypothesis given that it is actually true; e.g., a court finding a person guilty of a crime that they did not actually commit.
  • Type II Error: The false negative (Type II) error is the error of failing to reject the null hypothesis given that the alternative hypothesis is actually true; e.g., a court finding a person not guilty of a crime that they did actually commit.
  • Statistical Power: The power of a statistical test is the probability that the test will reject a false null hypothesis (that it will not make a Type II error). As power increases, the chance of a Type II error decreases. The probability of a Type II error is referred to as the false negative rate (β). Therefore, power is equal to 1 − β. See also the SOCR Power Activity.
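
As a small numerical illustration of power = 1 − β, the sketch below computes the power of a one-sided z-test. It assumes normally distributed data with a known standard deviation and a specific alternative mean; these assumptions, the parameter values, and the function name are hypothetical choices made for the example.

```python
from statistics import NormalDist


def one_sided_z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu0 vs. H1: mu = mu0 + effect."""
    standard_normal = NormalDist()
    # Reject H0 when the z statistic exceeds this critical value.
    critical_z = standard_normal.inv_cdf(1.0 - alpha)
    # Under H1 the z statistic is shifted by effect * sqrt(n) / sigma.
    shift = effect * n**0.5 / sigma
    # beta = P(fail to reject H0 | H1 is true).
    beta = standard_normal.cdf(critical_z - shift)
    return 1.0 - beta


power = one_sided_z_test_power(effect=0.5, sigma=1.0, n=25)
print(f"power = {power:.3f}")
```

With no true effect (effect = 0) the rejection probability reduces to α, and the power grows as the sample size or the effect increases.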
{| class="wikitable" style="text-align:center;" border="2"
|-
! colspan=2 rowspan=2 | &nbsp;
! colspan=2 | Actual condition
|-
! Absent (H<sub>o</sub> is true)
! Present (H<sub>1</sub> is true)
|-
! rowspan=2 | Test<br>Result
! Negative<br>(fail to reject H<sub>o</sub>)
| Condition absent + Negative result = True (accurate) Negative (TN, 0.98505)
| Condition present + Negative result = False (invalid) Negative (FN, 0.00025)<br/>Type II error \((\beta)\)
|-
! Positive<br>(reject H<sub>o</sub>)
| Condition absent + Positive result = False Positive (FP, 0.00995)<br/>Type I error \((\alpha)\)
| Condition present + Positive result = True Positive (TP, 0.00475)
|-
! Test<br>Interpretation
| Power = 1 - FN/(FN+TP) =<br>1 - 0.00025/0.005 = 0.95
| Specificity: TN/(TN+FP) =<br>0.98505/(0.98505 + 0.00995) = 0.99
| Sensitivity: TP/(TP+FN) =<br>0.00475/(0.00475 + 0.00025) = 0.95
|}
  • Remarks:
    • A Specificity of 100% means that the test recognizes all healthy individuals as (normal) healthy. The maximum is trivially achieved by a test that claims everybody is healthy regardless of the true condition. Therefore, the specificity alone does not tell us how well the test recognizes positive cases.
    • False positive rate (α)= \(\frac{FP}{FP+TN} = \frac{0.00995}{0.00995 + 0.98505}=0.01 \)= 1 - Specificity.
    • Sensitivity is a measure of how well a test correctly identifies a condition, whether this is a medical screening test picking up on a disease, or quality control in a factory deciding if a new product is good enough to be sold. Sensitivity \(=\frac{TP}{TP+FN} =\frac{0.00475}{0.00475+ 0.00025}= 0.95\).
    • False Negative Rate (β)= \(\frac{FN}{FN+TP} = \frac{0.00025}{0.00025+0.00475}=0.05 \)= 1 - Sensitivity.
    • Power = 1 − β = 0.95. Power \(= P(\mbox{reject null hypothesis} | \mbox{null hypothesis is false})\). For example, see Power Analysis for Normal Distribution.
    • Both the Type I (\(\alpha\)) and Type II (\(\beta\)) errors are proportions in the range [0,1], so they represent error rates. They are listed in the corresponding cells of the table because they are directly proportional to the numerical values of FP and FN, respectively.
    • The two alternative definitions of power are equivalent:
        power \(=1-\beta\), and
        power = sensitivity.
      This is because power \(=1-\beta=1-\frac{FN}{FN+TP}=\frac{FN+TP}{FN+TP} - \frac{FN}{FN+TP}=\frac{TP}{FN+TP}=\) sensitivity.
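
The remarks above can be checked numerically. The following minimal sketch recomputes each rate from the four joint proportions given in the table (TN, FN, FP, TP); the variable names are chosen here for illustration only.

```python
# Joint proportions taken from the table above.
TN, FN = 0.98505, 0.00025
FP, TP = 0.00995, 0.00475

specificity = TN / (TN + FP)   # share of condition-absent cases testing negative
sensitivity = TP / (TP + FN)   # share of condition-present cases testing positive
alpha = FP / (FP + TN)         # false positive rate = 1 - specificity
beta = FN / (FN + TP)          # false negative rate = 1 - sensitivity
power = 1.0 - beta             # equals sensitivity

print(f"specificity = {specificity:.2f}, sensitivity = {sensitivity:.2f}")
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}, power = {power:.2f}")
```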

Example 2: Sodium content in hot-dogs

Use the Hot-dog dataset to see if there are statistically significant differences in the sodium content of the poultry vs. meat hotdogs.

  • Formulate Hypotheses: \(H_o: \mu_p = \mu_m\) vs. \(H_1: \mu_p \not= \mu_m\), where \(\mu_p, \mu_m\) represent the mean sodium content in poultry and meat hotdogs, respectively.
  • Plugging the data into SOCR Analyses under the Two Independent Sample T-Test (Unpooled) generates the results shown in the figure below. The two-sided P-value (unpooled) is 0.196, which does not provide strong evidence to reject the null hypothesis that the two types of hot-dogs have the same mean sodium content.
[Figure: SOCR Analyses output for the Two Independent Sample T-Test (Unpooled) on the hot-dog data (SOCR_EBook_Dinov_Hypothesis_020508_Fig1.jpg)]
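
For readers without access to SOCR Analyses, the unpooled (Welch) two-sample t statistic and its approximate degrees of freedom can be sketched in a few lines. The sample values below are hypothetical placeholders, not the SOCR hot-dog data, so the output will not reproduce the P-value of 0.196 reported above; the function name is also an illustrative choice.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Variance of each sample mean, using separate (unpooled) sample variances.
    var_mean_a = variance(sample_a) / n_a
    var_mean_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_mean_a + var_mean_b)
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a**2 / (n_a - 1) + var_mean_b**2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical sodium values (mg), NOT the SOCR hot-dog dataset.
poultry = [400, 450, 500, 380, 470]
meat = [420, 390, 510, 360, 440]
t_stat, df = welch_t(poultry, meat)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The two-sided P-value is then the tail area of a t distribution with `df` degrees of freedom beyond |t|; the Python standard library has no t distribution, so a statistics package or a t table would be needed for that last step.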

Example 3: Rapid testing in strep-throat

A study by Sheeler et al. (2002; see References) investigated the accuracy of rapid diagnosis of group A \(\beta\)-streptococcal pharyngitis by commercial immunochemical antigen test kits in the setting of recent streptococcal pharyngitis. Specifically, it explored whether the false-positive rate of the rapid test was increased because of presumed antigen persistence.

The study used 443 patients in total: 211 patients who had clinical pharyngitis diagnosed as group A \(\beta\)-hemolytic streptococcus infection in the past 28 days (the cell counts in Table 1 sum to 211), compared with 232 control patients who had symptoms of pharyngitis but no recent diagnosis of streptococcal pharyngitis. The aim was narrowly focused on comparing the rapid strep test with the culture method used in clinical practice.

The study found that the rapid strep test in this setting showed no difference in specificity (0.96 vs. 0.98). Hence, the assertion that rapid antigen testing had higher false-positive rates in those with recent infection was not confirmed. It also found that in patients who had recent streptococcal pharyngitis, the rapid strep test appeared to be more reliable (sensitivity 0.91 vs. 0.70, P < .001) than in patients who had not had recent streptococcal pharyngitis. These findings indicated that the rapid strep test is both sensitive and specific in the setting of recent group A \(\beta\)-hemolytic streptococcal pharyngitis, and its use might allow earlier treatment in this subgroup of patients.
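
The reported difference in sensitivity (0.91 vs. 0.70, P < .001) can be sanity-checked with a pooled two-proportion z-test on the culture-positive counts from Tables 1 and 2 (104 of 114 vs. 44 of 63). The text does not say which test the study's authors used, so the choice of test here is an assumption, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Return (z, two-sided p-value) using the pooled normal approximation."""
    p_1, p_2 = successes_1 / n_1, successes_2 / n_2
    # Pooled proportion under H0: both groups share one true proportion.
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / standard_error
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# True positives / culture-positive patients: recent vs. no recent infection.
z, p_value = two_proportion_z_test(104, 114, 44, 63)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.5f}")
```

Under this approximation the p-value is below 0.001, which agrees with the significance level quoted from the study.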

Table 1. Sensitivity and Specificity of Laboratory Culture and Rapid Strep Test in Patients With Recently Treated Cases of Streptococcal Pharyngitis (N=211).

{| class="wikitable" style="text-align:center;" border="2"
|-
! Results || Culture Negative || Culture Positive
|-
| Rapid strep test negative || 93 || 10
|-
| Rapid strep test positive || 4 || 104
|-
! Measure || Estimate || 95% CI
|-
| Sensitivity || 104/(104+10) = 0.91 || 0.84, 0.96
|-
| Specificity || 93/(93+4) = 0.96 || 0.90, 0.99
|-
| Positive predictive value || 0.96 || 0.91, 0.99
|-
| Negative predictive value || 0.90 || 0.83, 0.95
|-
| False-positive rate || 0.04 || 0.01, 0.10
|-
| False-negative rate || 0.09 || 0.04, 0.15
|}

Table 2. Sensitivity and Specificity of Laboratory Culture and Rapid Strep Test in Patients With No Recently Treated Cases of Streptococcal Pharyngitis (N=232).

{| class="wikitable" style="text-align:center;" border="2"
|-
! Results || Culture Negative || Culture Positive
|-
| Rapid strep test negative || 165 || 19
|-
| Rapid strep test positive || 4 || 44
|-
! Measure || Estimate || 95% CI
|-
| Sensitivity || 44/(44+19) = 0.70 || 0.57, 0.81
|-
| Specificity || 165/(165+4) = 0.98 || 0.94, 0.99
|-
| Positive predictive value || 0.92 || 0.80, 0.99
|-
| Negative predictive value || 0.90 || 0.84, 0.94
|-
| False-positive rate || 0.02 || 0.01, 0.06
|-
| False-negative rate || 0.30 || 0.19, 0.43
|}
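
The point estimates in Tables 1 and 2 follow directly from the four cell counts of each table. The short sketch below recomputes them; the function and dictionary key names are illustrative, and the confidence intervals are not reproduced because the text does not state how they were calculated.

```python
def diagnostic_metrics(tn, fn, fp, tp):
    """Compute standard diagnostic-accuracy measures from 2x2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Cell counts from Table 1 (recent infection) and Table 2 (no recent infection).
recent = diagnostic_metrics(tn=93, fn=10, fp=4, tp=104)
no_recent = diagnostic_metrics(tn=165, fn=19, fp=4, tp=44)

for name in recent:
    print(f"{name}: {recent[name]:.2f} vs. {no_recent[name]:.2f}")
```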

Problems


References

Robert D. Sheeler, MD, Margaret S. Houston, MD, Sharon Radke, RN, Jane C. Dale, MD, and Steven C. Adamson, MD. (2002) Accuracy of Rapid Strep Testing in Patients Who Have Had Recent Streptococcal Pharyngitis. JABFP, 15(4), 261-265. http://jabfm.org/cgi/content/abstract/15/4/261

