SOCR EduMaterials Activities Central Limit Theorem Chi square examples

Outline

This hands-on activity demonstrates the concept of the central limit theorem (CLT). Background, motivation, experiments and other applications of the CLT can be found here.

Example 1

• Answer true or false, with an explanation, for each of the following:
a. The standard deviation of the sample mean $$\overline{x}$$ increases as the sample size increases.
b. The Central Limit Theorem allows us to claim, in certain cases, that the distribution of the sample mean $$\overline{x}$$ is normally distributed.
c. The standard deviation of the sample mean $$\overline{x}$$ is usually approximately equal to the unknown population standard deviation $$\sigma$$.
d. The standard deviation of the total of a sample of n observations exceeds the standard deviation of the sample mean.
e. If $$X \sim N(8,\sigma)$$ then $$P(\overline{x} > 4)$$ is less than $$P(X > 4).$$

• a. False. The standard deviation of the sample mean is $$\frac{\sigma}{\sqrt n}$$, so it decreases as the sample size n increases.
• b. True
• c. False. The standard deviation of the sample mean is $$\frac{\sigma}{\sqrt n}$$, which is smaller than $$\sigma$$ whenever n is greater than one.
• d. True. The standard deviation of the total of a sample of n observations is $$\sigma\sqrt n$$, while the standard deviation of the sample mean is $$\frac{\sigma}{\sqrt n}$$. Unless n is one, the standard deviation of the total exceeds the standard deviation of the sample mean.
• e. False. For example, assume $$\sigma=2$$ and $$n=2$$. In this case, the z-score for $$P(\overline{X} > 4)$$ is -2.828, while the z-score for $$P(X>4)$$ is -2. Since $$P(Z>-2.828) > P(Z>-2)$$, $$P(\overline{X} > 4)$$ is larger than $$P(X > 4)$$, and the statement is false.
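
The formulas used in answers (a), (d) and (e) can be checked numerically. The following is a minimal Python sketch using only the standard library. The helper names `sd_of_mean` and `sd_of_total` are illustrative and not part of the SOCR materials, and the values $$\sigma=2$$, $$n=2$$ are the ones assumed in answer (e).

```python
from math import sqrt
from statistics import NormalDist


def sd_of_mean(sigma, n):
    """Standard deviation of the mean of n independent observations."""
    return sigma / sqrt(n)


def sd_of_total(sigma, n):
    """Standard deviation of the sum of n independent observations."""
    return sigma * sqrt(n)


# Values assumed in answer (e): X ~ N(8, sigma) with sigma = 2 and n = 2.
mu, sigma, n = 8.0, 2.0, 2

# P(X > 4) for a single observation.
p_single = 1 - NormalDist(mu, sigma).cdf(4)

# P(sample mean > 4); the sample mean has the smaller spread sigma / sqrt(n).
p_mean = 1 - NormalDist(mu, sd_of_mean(sigma, n)).cdf(4)

print("SD of mean :", sd_of_mean(sigma, n))
print("SD of total:", sd_of_total(sigma, n))
print("P(X > 4)     =", p_single)
print("P(Xbar > 4)  =", p_mean)
```

Because 4 lies below the mean of 8, shrinking the spread moves more probability above 4, which is why the sample mean has the larger probability here.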

Example 2

A selective college would like to have an entering class of 1200 students. Because not all students who are offered admission accept, the college admits 1500 students. Past experience shows that 70% of the students admitted will accept. Assuming that students make their decisions independently, the number who accept, X, follows the binomial distribution with n = 1500 and p = 0.70.

• a. Write an expression for the exact probability that at least 1000 students accept.
• b. Approximate the above probability using the normal distribution.

• a. $$P(X \ge 1000)= P(X=1000)+P(X=1001)+\cdots+P(X=1500)= \sum_{x=1000}^{1500}{1500 \choose x}(\frac{7}{10})^x (\frac{3}{10})^{1500-x}$$.
• b. We can use the normal approximation to the binomial with $$\mu = np = 1500 \times 0.70 = 1050$$ and $$\sigma = \sqrt{npq} = \sqrt{1500 \times 0.7 \times 0.3}= 17.748.$$ Applying a continuity correction (using 999.5 in place of 1000):

$$P(X \ge 1000) \approx P(Z> \frac{999.5-1050}{17.748})=P(Z>-2.845)=.9977$$
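
As a cross-check that does not rely on the SOCR applet, the sketch below evaluates both the exact binomial sum from part (a), with the binomial coefficient included, and the continuity-corrected normal approximation from part (b). It is only an illustration in Python's standard library. The variable names are arbitrary, and exact integer arithmetic is used as an assumption-free way to avoid floating-point overflow and underflow in terms such as $$0.7^{1000}$$.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 1500, 0.70

# Exact probability: P(X >= 1000) = sum_x C(n, x) * 0.7^x * 0.3^(n - x).
# Writing 0.7 = 7/10 and 0.3 = 3/10 lets us sum exact integers and divide
# once at the end, so no individual term overflows or underflows.
numerator = sum(comb(n, x) * 7**x * 3 ** (n - x) for x in range(1000, n + 1))
exact = numerator / 10**n

# Normal approximation with continuity correction (999.5 instead of 1000).
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = 1 - NormalDist(mu, sigma).cdf(999.5)

print("exact binomial       :", exact)
print("normal approximation :", approx)
print("absolute difference  :", abs(exact - approx))
```

With n this large and p not close to 0 or 1, the two values should agree closely, which is the point of part (b).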

Below you can see a snapshot for this approximation:

Example 3

An insurance company wants to audit health insurance claims in its very large database of transactions. In a quick attempt to assess the level of overstatement of this database, the insurance company selects at random 400 items from the database (each item represents a dollar amount). Suppose that the population mean overstatement of the entire database is $8, with population standard deviation $20.

• a. Find the probability that the sample mean of the 400 items is less than $6.50.
• b. Why can we use the normal distribution in obtaining an answer to part (a)?
• c. For what value of w can we say that $$P(\mu-w < \overline{X} < \mu+w)$$ is equal to 80%?
• d. Let T be the total overstatement for the 400 randomly selected items. Find the number b so that P(T > b) = 0.975.

Answer (a):

• $$\overline{X} \sim N(8, \frac{20}{\sqrt{400}})$$, that is, $$\overline{X} \sim N(8,1)$$. Therefore $$P(\overline{X} <6.50) =P(Z<-1.5)=.0668$$
• Below you can see a snapshot for this part:

Answer (b):

• The central limit theorem states that the distribution of the sample mean approaches the normal distribution as the sample size gets bigger. As a common rule of thumb, if $$n \ge 30$$ we can assume that $$\overline{X} \sim N(\mu,\frac{\sigma}{\sqrt n})$$ approximately. In this case $$n=400$$, so n is large enough.

Answer (c):

• $$\overline{X} \sim N(8,1).$$ According to the snapshot below, the middle 80% of this distribution is (6.721, 9.279). Therefore $$w=8-6.721 =1.279 \approx 1.28$$

Answer (d):

• $$T \sim N(n\mu,\sigma\sqrt n).$$ In this case, $$T \sim N(3200,400).$$ We know that $$P(T>b) =.975$$, so we need to find the 2.5th percentile of this distribution using SOCR. According to the SOCR snapshot below, the 2.5th percentile of this distribution is 2416. Therefore b = 2416.

Example 4

A telephone company has determined that during nonholidays the number of phone calls that pass through the main branch office each hour follows the normal distribution with mean $$\mu = 80000$$ and standard deviation $$\sigma = 35000$$. Suppose that a random sample of 60 nonholiday hours is selected and the sample mean $$\overline{X}$$ of the incoming phone calls is computed.

• a. Describe the distribution of $$\overline{X}$$.
• b. Find the probability that the sample mean $$\overline{X}$$ of the incoming phone calls for these 60 hours is larger than 91970.
• c. Is it more likely that the sample average $$\overline{X}$$ will be greater than 75000 calls, or that one hour's incoming calls will be?

Answer (a):

• $$\overline{X} \sim N(\mu, \frac{\sigma}{\sqrt n })$$. In this case, $$\overline{X} \sim N(80000, 4518.48)$$.

Answer (b):

• We can find the answer using SOCR. The answer is 0.004032. Please see the snapshot below:

Answer (c):

• We can find the answer right away using SOCR. Please see the snapshots below for the distribution of $$X$$ and the distribution of $$\overline{X}$$.
• The probabilities are 55.7% for one hour vs. 86.6% for the sample mean. Therefore the sample mean is more likely to be greater than 75000 calls.

Example 5

Assume the daily S&P return follows the normal distribution with mean $$\mu = 0.00032$$ and standard deviation $$\sigma = 0.00859$$.

• a. Find the 75th percentile of this distribution.
• b. What is the probability that in exactly 2 of the following 5 days, the daily S&P return will be larger than 0.01?

Consider the sample average S&P return of a random sample of 20 days.

• c. What is the distribution of the sample mean?
• d. What is the probability that the sample mean will be larger than 0.005?
• e. Is it more likely that the sample average S&P return will be greater than 0.007, or that one day's S&P return will be?

Answer (a):

• According to the SOCR snapshot below, the 75th percentile is 0.006115.
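
The percentile in Answer (a) can also be reproduced without the SOCR applet. The lines below are a small sketch using `statistics.NormalDist` from the Python standard library. The variable names `daily_return` and `q75` are chosen here for illustration only.

```python
from statistics import NormalDist

# Daily S&P return model from Example 5: normal with the stated mean and SD.
daily_return = NormalDist(mu=0.00032, sigma=0.00859)

# The 75th percentile is the value q with P(X <= q) = 0.75.
q75 = daily_return.inv_cdf(0.75)

# Sanity check: feeding q back into the CDF should recover 0.75.
recovered = daily_return.cdf(q75)

print(f"75th percentile: {q75:.6f}")
print(f"CDF at that value: {recovered:.4f}")
```

The same `NormalDist` object can be reused for the tail probabilities in the remaining parts by evaluating `1 - daily_return.cdf(value)`.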

Answer (b):

• $$P(X>.01)=.13.$$ We can see this in the snapshot below:
• The number of days out of 5 with a return larger than 0.01 is binomial with n = 5 and p = .13, so the probability of exactly 2 such days is $${5 \choose 2} (.13)^2 (.87)^3= 0.11128.$$

Answer (c):

• $$\overline{X} \sim N(.00032,.00192)$$

Answer (d):

• $$P(\overline{X}>.005)=.0074$$

Answer (e):

• One day's return is more likely to be greater than .007. The probabilities are about 0.22 for $$X$$ vs. about .00025 for $$\overline{X}$$. The snapshots for $$P(X>.007)$$ and $$P(\overline{X}>.007)$$ are shown below.

Example 6

• If Y has a $$\chi^2$$ distribution with n degrees of freedom, then Y can be represented as $$Y=\sum_{i=1}^{n}X_i$$, where the $$X_i$$ are independent, each having a $$\chi^2$$ distribution with 1 degree of freedom.

A machine in a heavy-equipment factory produces steel rods of length Y, where Y is a normal random variable with $$\mu = 6$$ inches and $$\sigma^2 = 0.2$$. The cost C of repairing a rod that is not exactly 6 inches in length is proportional to the square of the error and is given, in dollars, by $$C = 4(Y-\mu)^2$$. If 50 rods with independent lengths are produced in a given day, approximate the probability that the total cost for repairs for that day exceeds $48.

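The total cost is $$\sum_{i=1}^{50}4(Y_i-\mu)^2$$, which exceeds 48 exactly when $$\sum_{i=1}^{50}\frac{(Y_i-\mu)^2}{\sigma^2} > \frac{48}{4 \times 0.2} = 60$$. Each term $$\frac{(Y_i-\mu)^2}{\sigma^2}$$ is the square of a standard normal variable, so it has a $$\chi^2$$ distribution with 1 degree of freedom, and therefore

$$\sum_{i=1}^{50}\frac{(Y_i-\mu)^2}{\sigma^2} \sim \chi^2_{50}$$

$$P(\chi^2_{50}>60)=.157$$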
• Below you can see a snapshot for this example:

• We could also have used the Z distribution to approximate the answer, because a $$\chi^2_n$$ variable is a sum of n independent, identically distributed terms and the central limit theorem applies. The mean of $$\chi^2_n$$ is n, and the standard deviation is $$\sqrt{2n}$$. Therefore the probability is approximately
$$P(Z>\frac{60-50}{\sqrt{100}})=P(Z>1)$$
• Using the snapshot below, we find the answer, which is .159. This approximation is close to the answer that was obtained using the $$\chi^2$$ distribution.
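
Both routes in Example 6 can be compared numerically. The sketch below is an illustration, not the SOCR computation. It assumes the standard closed form for the upper tail of a $$\chi^2$$ distribution with an even number of degrees of freedom, $$P(\chi^2_{2k} > x) = e^{-x/2}\sum_{j=0}^{k-1}\frac{(x/2)^j}{j!}$$, so that only the Python standard library is needed. The function name `chi2_sf_even_df` is hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-square_df > x), valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half_x = x / 2.0
    term = exp(-half_x)          # j = 0 term of the finite sum
    total = term
    for j in range(1, df // 2):  # j = 1, ..., df/2 - 1
        term *= half_x / j       # builds (x/2)^j / j! incrementally
        total += term
    return total


n_rods = 50
sigma2 = 0.2            # variance of a rod length
cost_threshold = 48.0   # dollars

# Total cost = 4 * sigma^2 * (chi-square with 50 df), so rescale the threshold.
chi2_threshold = cost_threshold / (4 * sigma2)

# Route 1: chi-square distribution with 50 degrees of freedom.
exact = chi2_sf_even_df(chi2_threshold, n_rods)

# Route 2: CLT-based normal approximation with mean n and SD sqrt(2n).
approx = 1 - NormalDist(mu=n_rods, sigma=sqrt(2 * n_rods)).cdf(chi2_threshold)

print("chi-square tail     :", exact)
print("normal approximation:", approx)
```

The closed form was chosen only because 50 is even. For odd degrees of freedom a numerical library routine would be needed instead.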