# AP Statistics Curriculum 2007 StudentsT

## General Advance-Placement (AP) Statistics Curriculum - Student's T Distribution

In practice, the population variance is usually unknown, so we need to estimate it using the sample variance. This leads to the T-distribution, a one-parameter family indexed by the degrees of freedom (df) that connects the Cauchy and standard Normal distributions: \(Cauchy=T_{(df=1)} \longrightarrow T_{(df)}\longrightarrow N(0,1)=T_{(df=\infty)}\).

### Student's T Distribution

The Student's t-distribution arises in the problem of estimating the mean of a normally distributed population when the sample size is small and the population variance is unknown. It is the basis of the popular Student's t-tests for the statistical significance of the difference between two sample means, and for confidence intervals for the difference between two population means.

Suppose \(X_1, \ldots, X_n\) are independent random variables that are Normally distributed with expected value \(\mu\) and variance \(\sigma^2\). Let
\[ \overline{X}_n = {X_1+X_2+\cdots+X_n \over n}\] be the sample mean, and

\[{S_n}^2=\frac{1}{n-1}\sum_{i=1}^n\left(X_i-\overline{X}_n\right)^2\] be the sample variance. We already discussed the statistic \[Z=\frac{\overline{X}_n-\mu}{\sigma/\sqrt{n}},\]

which is normally distributed with mean 0 and variance 1, since the sample mean \(\scriptstyle \overline{X}_n \) is normally distributed with mean \( \mu\) and standard deviation \(\scriptstyle\sigma/\sqrt{n}\).

Gosset studied a related quantity under the pseudonym *Student*,
\[T=\frac{\overline{X}_n-\mu}{S_n / \sqrt{n}},\]
which differs from *Z* in that the (unknown) population standard deviation \(\scriptstyle \sigma\) is replaced by the sample standard deviation \(S_n\). Technically, \(\scriptstyle(n-1)S_n^2/\sigma^2\) has a Chi-square distribution, \(\scriptstyle\chi_{n-1}^2\). Gosset's work showed that *T* has a specific probability density function, which approaches Normal(0,1) as the degrees of freedom (df = sample size - 1) increase.

### Computing with T-distribution

- You can see the discretized T-table or
- Use the interactive SOCR T-distribution or
- Use the high precision T-distribution calculator.
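When none of these tools is at hand, a critical value can also be approximated numerically. The sketch below is one possible approach, not the method used by the SOCR calculators: it integrates the T density with Simpson's rule and inverts the result by bisection. The helper names `t_pdf`, `t_cdf` and `t_quantile` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's T with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) via composite Simpson's rule on [0, |x|], using symmetry about 0."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3.0  # P(0 <= T <= |x|)
    return 0.5 + half if x > 0 else 0.5 - half

def t_quantile(p, df):
    """Value t with P(T <= t) = p, found by bisection; intended for 0.5 < p < 1."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Two-sided 95% critical value for df = 24 (upper tail area 0.025)
print(round(t_quantile(0.975, 24), 3))  # 2.064
```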

### Example

Suppose a researcher wants to examine CD4 counts for HIV(+) patients seen at her clinic. She randomly selects a sample of n = 25 HIV(+) patients and measures their CD4 levels (cells/uL). Suppose she obtains the following results and we are interested in calculating a 95% (\(\alpha=0.05\)) confidence interval for \(\mu\):

| Variable | N | N* | Mean | SE of Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
|---|---|---|---|---|---|---|---|---|---|---|
| CD4 | 25 | 0 | 321.4 | 14.8 | 73.8 | 208.0 | 261.5 | 325.0 | 394.0 | 449.0 |

What do we know from the background information? \[\overline{y}= 321.4\] \[s = 73.8\] \[SE = 14.8\] \[n = 25\]

\[CI(\alpha)=CI(0.05): \overline{y} \pm t_{(n-1,\, {\alpha\over 2})} {1\over \sqrt{n}} \sqrt{\sum_{i=1}^n{(y_i-\overline{y})^2\over n-1}}.\]

\[321.4 \pm t_{(24, 0.025)}{73.8\over \sqrt{25}}\]

\[321.4 \pm 2.064\times 14.8\]

\[[290.85, 351.95]\]
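The arithmetic above can be reproduced in a few lines. This is a minimal sketch that takes the summary statistics and the critical value 2.064 directly from the example; the function name `t_interval` is a hypothetical helper.

```python
import math

def t_interval(mean, se, t_crit):
    """Return (lower, upper) limits of the interval mean +/- t_crit * se."""
    margin = t_crit * se
    return mean - margin, mean + margin

n = 25            # sample size
mean = 321.4      # sample mean of CD4 counts (cells/uL)
sd = 73.8         # sample standard deviation
se = 14.8         # standard error as reported in the summary table
t_crit = 2.064    # t critical value for df = 24 and upper tail area 0.025

# Sanity check: the reported SE is sd / sqrt(n), rounded to one decimal place.
print(round(sd / math.sqrt(n), 2))  # 14.76

lower, upper = t_interval(mean, se, t_crit)
print(round(lower, 2), round(upper, 2))  # 290.85 351.95
```

Using the unrounded standard error (14.76) instead of the reported 14.8 narrows the interval slightly, to roughly [290.94, 351.86].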

#### CI Interpretation

Still, does this CI (290.85, 351.95) mean anything to us? Consider the following information: The U.S. Government classification of AIDS has three official categories of CD4 counts:

- Asymptomatic = greater than or equal to 500 cells/uL
- AIDS related complex (ARC) = 200-499 cells/uL
- AIDS = less than 200 cells/uL

Now how can we interpret our CI?

### SOCR CI Experiments

The SOCR Confidence Interval Experiment provides empirical evidence that the definition and the construction protocol for Confidence intervals are consistent.
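The same kind of empirical check can be sketched offline. The simulation below is not the SOCR experiment itself: it assumes a hypothetical Normal population (mean 320, standard deviation 75, values invented for illustration), repeatedly draws samples of size 25, builds the T-based 95% interval with the critical value 2.064 from the example above, and reports how often the interval covers the true mean.

```python
import math
import random
import statistics

def coverage(trials=5000, n=25, mu=320.0, sigma=75.0, t_crit=2.064, seed=1):
    """Fraction of simulated T-intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)  # stdev uses the n-1 divisor
        if m - t_crit * se <= mu <= m + t_crit * se:
            hits += 1
    return hits / trials

print(coverage())
```

If the construction is consistent with its definition, the printed fraction should be close to 0.95.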

### Activities

- A biologist obtained body weights of male reindeer from a herd during the seasonal round-up. He measured the weight of a random sample of 102 reindeer in the herd, and found the sample mean and standard deviation to be 54.78 kg and 8.83 kg, respectively. Suppose these data come from a normal distribution. Calculate a 99% confidence interval for the mean body weight of male reindeer in the herd.

- Suppose the proportion of blood type O in the population is 0.44. If we take a random sample of 12 subjects and note their blood types, what is the probability that exactly 6 subjects in the sample have type O blood?

#### Approach I (exact)

\[P(X=6)=?\] where \(X\sim B(12, 0.44)\) \[P(X=6)={12\choose 6}p^6(1-p)^{6}=\frac{12!}{6!(6)!}0.44^6 0.56^6=0.2068,\] using SOCR Binomial interactive GUI or calculator.
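For readers without access to the SOCR tools, the exact probability can be checked with a short script. This is a minimal sketch; `binom_pmf` is a hypothetical helper name.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binom_pmf(6, 12, 0.44), 4))  # 0.2068
```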

#### Approach II (Approximate)

\[X \sim B(n=12, p=0.44).\] \[X \;(\text{approx.}) \sim N [\mu = n p = 5.28; \sigma=\sqrt{np(1-p)}=1.7].\] With a continuity correction, \(P(X=6) \approx P(X_1 \leq X \leq X_2) = P(Z_1\leq Z \leq Z_2)\), where \(Z = {{X-5.28} \over {1.7}}\), \(X_1=5.5,\) \(X_2=6.5,\) and \(Z_1\), \(Z_2\) are the standardized values of \(X_1\), \(X_2\). So, \(P(X=6)\approx P(Z_1 \leq Z \leq Z_2)=0.211.\)

#### Approach III (Approximate)

\[X \sim B(n=12, p=0.44).\] The sample proportion \(\hat{p} = X/n\) is approximately \(N [\mu = p = 0.44; \sigma = \sqrt{p(1-p)/n}=0.1433].\) Thus, \(P(X=6) = P(\hat{p}=0.5) \approx P(p_1 \leq \hat{p} \leq p_2),\) where \(p_1=0.5-1/24\) and \(p_2=0.5+1/24.\) Approach II is very similar to approach III. However, the former uses the total number of successes (X), whereas the latter employs the proportion (X/n). This is why the left-right additive term of 0.5 in approach II becomes 0.5*(1/12) = 1/24 in approach III. Finally, standardize each of the 2 limits (\(p_1\) and \(p_2\)), using the \(Z = (\hat{p}-0.44)/0.1433\) transformation, to get \[P(X=6) \approx P(p_1 \leq \hat{p} \leq p_2) = P(Z_1 \leq Z \leq Z_2) = 0.211.\]
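The two Normal approximations can be compared side by side. The sketch below uses `statistics.NormalDist` from the Python standard library and keeps the unrounded standard deviations, so the result may differ in the third decimal from the value quoted above, which was obtained with rounded inputs. Variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p, k = 12, 0.44, 6

# Approach II: approximate the count X, with a continuity correction of 0.5.
count_dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
approx_count = count_dist.cdf(k + 0.5) - count_dist.cdf(k - 0.5)

# Approach III: approximate the proportion X/n; the correction becomes 0.5/n = 1/24.
prop_dist = NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))
half_width = 0.5 / n
approx_prop = (prop_dist.cdf(k / n + half_width)
               - prop_dist.cdf(k / n - half_width))

print(f"Approach II : {approx_count:.4f}")
print(f"Approach III: {approx_prop:.4f}")
```

Both lines print the same value, because dividing X, its mean, its standard deviation and the correction term by n leaves the standardized limits unchanged.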

### Problems

- SOCR Home page: http://www.socr.ucla.edu
