# Difference between revisions of "AP Statistics Curriculum 2007 Contingency Fit"


## Revision as of 22:35, 12 March 2008


## General Advance-Placement (AP) Statistics Curriculum - Multinomial Experiments: Chi-Square Goodness-of-Fit

The Chi-Square Test is used to test whether a data sample comes from a population with specific characteristics. The Chi-Square Goodness-of-Fit Test is applied to binned data (data put into classes or categories). In most situations, the data histogram or frequency histogram may be obtained and the Chi-Square Test may be applied to these (frequency) values. This test requires a sufficient sample size for the Chi-Square approximation to be valid; a common rule of thumb is an expected count of at least 5 in each category.

The Kolmogorov-Smirnov Test is an alternative to the Chi-Square Goodness-of-Fit Test, but it is restricted to continuous distributions. The Chi-Square Goodness-of-Fit Test may also be applied to discrete distributions such as the Binomial and the Poisson.

## Motivational Example

Mendel's Pea Experiment relates to the transmission of hereditary characteristics from parent organisms to their offspring; it underlies much of genetics. Suppose a *tall offspring* is the event of interest and that the true proportion of tall peas (based on a 3:1 phenotypic ratio) is 3/4 or *p = 0.75*. We would like to test whether Mendel's data follow this 3:1 phenotypic ratio.

| | Observed (O) | Expected (E) |
|---|---|---|
| Tall | 787 | 798 |
| Dwarf | 277 | 266 |

## Calculations

Suppose there were *N = 1064* data measurements with *Observed(Tall) = 787* and *Observed(Dwarf) = 277*. These are the O’s (observed values). To calculate the E’s (expected values), we take the hypothesized proportions under \(H_o\) and multiply them by the total sample size *N*. Expected(Tall) = (0.75)(1064) = 798 and Expected(Dwarf) = (0.25)(1064) = 266. As a quick check, the expected counts sum to the sample size: 798 + 266 = 1064 = *N*.

- The Hypotheses:

\(H_o\): P(tall) = 0.75 and P(dwarf) = 0.25 (no effect; the data follow a 3:1 phenotypic ratio)

\(H_a\): P(tall) ≠ 0.75 and P(dwarf) ≠ 0.25

- Test Statistic:

\[\chi_o^2 = \sum_{\text{all categories}}{(O-E)^2 \over E} \sim \chi_{(df = \text{number of categories} - 1)}^2\]

- P-values and critical values for the Chi-Square Distribution may be easily computed using SOCR Distributions.

- Results:

For Mendel's Pea Experiment, the Chi-Square test statistic is: \[\chi_o^2 = {(787-798)^2 \over 798} + {(277-266)^2 \over 266} = 0.152+0.455=0.607\]

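- p-value = \(P(\chi_{(df=1)}^2 > \chi_o^2) = 0.436\). Because this p-value is well above conventional significance levels such as 0.05, we fail to reject \(H_o\): the data are consistent with the 3:1 phenotypic ratio.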
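The calculation above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function names `chi_square_gof` and `p_value_df1` are illustrative and not part of SOCR. The p-value helper is valid only for df = 1, where it relies on the identity \(P(Z^2 > x) = \mathrm{erfc}(\sqrt{x/2})\) for a standard normal \(Z\).

```python
import math


def chi_square_gof(observed, proportions):
    """Return (statistic, df, expected) for a chi-square goodness-of-fit test."""
    n = sum(observed)
    # Expected count = hypothesized proportion times total sample size
    expected = [p * n for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1, expected


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with df = 1 only."""
    return math.erfc(math.sqrt(statistic / 2))


observed = [787, 277]        # tall, dwarf
proportions = [0.75, 0.25]   # 3:1 phenotypic ratio under H_o

stat, df, expected = chi_square_gof(observed, proportions)
p = p_value_df1(stat)
print(expected, round(stat, 3), df, round(p, 3))
```

With Mendel's counts this prints the expected values 798 and 266, a statistic of about 0.607 with df = 1, and a p-value of about 0.436, matching the hand calculation.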

## Examples

### Butterfly Hotspots

A hotspot is defined as a \(10 km^2\) area that is species rich (heavily populated by the species of interest). Suppose that, in a study of butterfly hotspots, 165 of the 2,588 sampled \(10 km^2\) areas are butterfly hotspots. In theory, 5% of the areas should be butterfly hotspots. Do the data provide evidence that the proportion of butterfly hotspots exceeds the theoretical value? Test using \(\alpha= 0.01\).
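One possible way to set up this problem is as a two-category goodness-of-fit test (hotspot vs. not a hotspot) with hypothesized proportions 0.05 and 0.95. The sketch below is an illustration under that assumption, not a worked solution from the original page. The variable names are hypothetical, and the one-sided p-value is obtained by halving the df = 1 chi-square p-value, which is appropriate only because the observed count lies above the expected count.

```python
import math

n = 2588         # sampled areas
hotspots = 165   # observed butterfly hotspots
p0 = 0.05        # theoretical hotspot proportion
alpha = 0.01     # significance level

observed = [hotspots, n - hotspots]
expected = [p0 * n, (1 - p0) * n]

# Chi-square goodness-of-fit statistic over the two categories
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Two-sided p-value for df = 1: P(Z**2 > x) = erfc(sqrt(x / 2))
p_two_sided = math.erfc(math.sqrt(chi2 / 2))

# One-sided alternative (more hotspots than expected): halve the two-sided
# p-value only when the observed count is on the hypothesized side
if hotspots > expected[0]:
    p_one_sided = p_two_sided / 2
else:
    p_one_sided = 1 - p_two_sided / 2

print(round(chi2, 2), p_one_sided < alpha)
```

Under these assumptions the statistic is about 10.31 and the one-sided p-value falls below 0.01, which would suggest that the proportion of hotspots exceeds the theoretical 5%.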

### Cell-Phone Usage

Among 250 randomly selected cell phone users, is there evidence of a difference in area of home residence, defined as: Northern California (North); Southern California (South); or Out of State (Out)? Without further information, suppose we have P(North) = 0.24, P(South) = 0.45, and P(Out) = 0.31. Is there any evidence suggesting different use of cell phones in these three groups of users?

### Brain Cancer

Suppose 200 randomly selected cancer patients were asked if their primary diagnosis was brain cancer and if they owned a cell phone before their diagnosis. The results are presented in the table below:

| | Brain cancer: Yes | Brain cancer: No | Total |
|---|---|---|---|
| Cell Phone Use: Yes | 18 | 80 | 98 |
| Cell Phone Use: No | 7 | 95 | 102 |
| Total | 25 | 175 | 200 |

Does it seem like there is an association between brain cancer and cell phone use?

Of the brain cancer patients, 18 out of 25 (0.72) owned a cell phone before their diagnosis.
*P(CP|BC) = 0.72* is the estimated probability that a patient owned a cell phone, given that the patient has brain cancer.

Of the other cancer patients, 80 out of 175 (about 0.46) owned a cell phone before their diagnosis.
*P(CP|NBC) = 0.46* is the estimated probability that a patient owned a cell phone, given that the patient has a different type of cancer.
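The two conditional proportions can be computed directly from the table counts. The sketch below is illustrative only: the dictionary layout and the helper name `conditional` are assumptions, while the labels CP, BC, and NBC follow the notation used above.

```python
# Counts from the table, keyed by (cell phone use, cancer type)
counts = {
    ("CP", "BC"): 18, ("CP", "NBC"): 80,
    ("NoCP", "BC"): 7, ("NoCP", "NBC"): 95,
}


def conditional(counts, phone, cancer):
    """Estimate P(phone | cancer) as a proportion within one cancer column."""
    column_total = sum(v for (_, c), v in counts.items() if c == cancer)
    return counts[(phone, cancer)] / column_total


p_cp_given_bc = conditional(counts, "CP", "BC")     # 18 / 25
p_cp_given_nbc = conditional(counts, "CP", "NBC")   # 80 / 175
print(round(p_cp_given_bc, 2), round(p_cp_given_nbc, 2))
```

This prints 0.72 and 0.46. Comparing the two column-wise proportions is the informal check for association described above.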

## Applications

### Polynomial Model Fitting

## References

- TBD

- SOCR Home page: http://www.socr.ucla.edu
