Latest revision as of 10:25, 24 May 2010
SOCR Educational Materials - Activities - Applications of the General Central Limit Theorem (CLT)
This activity is a component of, and is based on, the SOCR EduMaterials Activities GeneralCentralLimitTheorem.
Goals
The aim of this activity is to demonstrate several practical applications of the general CLT. There are many practical examples of using the CLT to solve real-life problems. Here are some examples that may be solved using the SOCR CLT resources.
The SOCR CLT Experiment
Start by carefully reading the SOCR EduMaterials Activities GeneralCentralLimitTheorem. Then go over each of the CLT applications that are worked out in this activity supplement. You may find it useful to go back and forth between these applications and the formal CLT activity.
Application 1 (Poisson)
Suppose a call service center expects to get 20 calls a minute for questions regarding each of 17 different vendors that rely on this call center for handling their calls. What is the probability that in a 1-minute interval they receive less than 300 calls in total?
Let \(X_i\) be the random variable representing the number of calls received about the \(i^{th}\) vendor within a minute. Then \(X_i \sim Poisson (20)\), as \(X_i\) is the number of arrivals within a unit interval and the mean arrival count is given to be 20. Assuming the vendors' call counts are independent, the total number of calls is \(T = \sum_{i=1}^{17}X_i \sim Poisson(17 \times 20)\). By the CLT, \(T\) is approximately \(Normal(\mu = 17 \times 20, \sigma^2 = 17\times20)\), which approximates the exact distribution of the total sum. Using the SOCR Distributions applet one can compute the exact probability \(P(T<300 | T \sim Poisson(17 \times 20))= 0.014021\). On the other hand, one may use the CLT to compute a Normal approximation to the probability of the same event, \(P(T<300 | T \sim Normal(\mu = 17\times20, \sigma^2 = 17\times20)) = 0.014896\). This quantity is again obtained using the SOCR Distributions applet, without continuity correction. With continuity correction (replacing the cutoff 300 by 299.5) the approximation improves, \(P(T<299.5 | T \sim Normal(\mu = 17\times20, \sigma^2 = 17\times20)) = 0.0140309\). Arguably, the CLT-based calculation is less computationally intensive and more appealing to students and trainees than computing the exact probability.
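The three probabilities in Application 1 can also be cross-checked outside the applet. The following is a minimal sketch, not the SOCR implementation; the helper names `poisson_cdf` and `normal_cdf` are our own, and only the Python standard library is assumed.

```python
import math

def poisson_cdf(k, mean):
    """P(T <= k) for T ~ Poisson(mean); each pmf term is evaluated in log space."""
    return sum(math.exp(j * math.log(mean) - mean - math.lgamma(j + 1))
               for j in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Z <= x) for Z ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mean_total = 17 * 20                 # mean (and variance) of the total T
sigma = math.sqrt(mean_total)

exact = poisson_cdf(299, mean_total)               # P(T < 300) = P(T <= 299)
approx_plain = normal_cdf(300, mean_total, sigma)  # no continuity correction
approx_cc = normal_cdf(299.5, mean_total, sigma)   # with continuity correction

print("exact Poisson:", exact)
print("Normal, no correction:", approx_plain)
print("Normal, continuity correction:", approx_cc)
```

Small differences from the applet values in the last digits are possible, since the two tools may use different numerical routines.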
Application 2 (Exponential)
It is believed that the life-times, in hours, of light-bulbs are Exponentially distributed, say \(Exp({1\over{2,000}})\), with an expected life of 2,000 hours. Recall that the Exponential distribution is called the Mean-Time-To-Failure distribution. You can find more about it from the SOCR Distributions applet. Suppose a University wants to purchase 100 of these light-bulbs and estimate their average life-span. What is a CLT-based estimate of the probability that the average life-span exceeds 2,200 hrs? Let \(X_i \sim Exp({ 1\over{2,000}})\) and \(\overline{X}= { 1\over 100}\sum_{i=1}^{100}{X_i}\). Notice that in this case the exact distribution of \(\overline{X}\) is (generally) not Exponential, even though the density may be computed in closed form (Khuong & Kong, 2006). If we use the CLT, however, we can approximate the probability of interest
\(P(\overline{X} > 2,200) \approx P(\overline{X} > 2,200 | \overline{X} \sim N(\mu_{\overline{X}}=2,000, \sigma_{\overline{X}}^2 = {2,000^2 \over 100})),\)
as we know that the mean and the standard deviation of \(X_i\) are both \({1\over {\lambda}} =2,000\), so the standard deviation of \(\overline{X}\) is \({1\over {\sqrt{100} \times \lambda}} =200\). Therefore, \(P(\overline{X} >2,200) \approx 0.158655\), using the CLT approximation and the SOCR Distributions calculator, see figure below.
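As a hedged illustration of Application 2, the sketch below recomputes the CLT estimate and, as an addition not found in the original activity, compares it with an exact value. The exact value relies on the standard fact that a sum of \(n\) independent \(Exp(\lambda)\) variables has an Erlang (Gamma) distribution with survival function \(\sum_{k=0}^{n-1} e^{-\lambda s}(\lambda s)^k / k!\). The function names are hypothetical.

```python
import math

def normal_sf(x, mu, sigma):
    """P(Z > x) for Z ~ Normal(mu, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def erlang_sf(s, shape, rate):
    """P(S > s) for S = sum of `shape` independent Exp(rate) variables."""
    a = rate * s
    return sum(math.exp(k * math.log(a) - a - math.lgamma(k + 1))
               for k in range(shape))

rate = 1.0 / 2000.0    # lambda, so the mean life-time is 2,000 hours
n = 100                # number of light-bulbs

mu_bar = 1.0 / rate                      # mean of the sample average
sd_bar = 1.0 / (math.sqrt(n) * rate)     # standard deviation of the sample average

clt_estimate = normal_sf(2200, mu_bar, sd_bar)
# The average exceeds 2,200 exactly when the total exceeds n * 2,200.
exact_value = erlang_sf(n * 2200, n, rate)

print("CLT estimate:", clt_estimate)
print("exact (Erlang) value:", exact_value)
```

The two printed values should be close, which is consistent with the Exponential distribution being only moderately skewed at \(n=100\).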
Application 3 (Exponential)
A weekly TV talk show from broadcaster U invites viewers to call to express their opinions about the program. Many people call, which sometimes results in quite a long wait time until the host replies. The time it takes the host to respond tends to follow an exponential distribution with a mean of 50 seconds. A competing TV network W has a similar talk show and would like to respond to callers faster than broadcaster U. To do that, W executives need to know how long U takes to respond. So, W personnel make 25 calls per week to the U show (for 50 weeks) and measure how long it takes the U host to respond. Then W executives compute the average wait time for each weekly sample of size 25. At the end of the year they plot the distribution of the sample means. What do you think are the center, spread and shape of this distribution? Find out using the SOCR CLT applet. Approximately what proportion of the time does the average weekly wait time for broadcaster U exceed 45 seconds?
The Figure below shows the corresponding sampling simulation (SOCR CLT Applet using \(Exp(\lambda= {1 \over 50} = 0.02)\) and drawing 50 samples, each of size 25). Notice the differences in the summary statistics between the native distribution, the sample distribution and the sampling distribution for the mean. The answer to this application may then be computed using the SOCR Distributions applet, \(P(\overline{X} > 45) \approx P(\overline{X} > 45 | \overline{X} \sim N(\mu_{\overline{X}}=50, \sigma_{\overline{X}}^2 = {50^2 \over 25}))=0.691463.\) This chance may also be estimated empirically by counting the number of weekly samples that generate an average wait time over 45 seconds and dividing this number by 50 (the total number of weeks in this survey).
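The empirical counting procedure described above can be imitated in a few lines. This is a minimal simulation sketch, not the SOCR CLT applet; the function name `simulate_weekly_means` and the fixed seed are our own assumptions, chosen only to make the run repeatable.

```python
import math
import random

def simulate_weekly_means(weeks=50, calls=25, mean_wait=50.0, seed=0):
    """Return one average wait time per week, each from `calls` Exp(1/mean_wait) draws."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0 / mean_wait) for _ in range(calls)) / calls
            for _ in range(weeks)]

weekly_means = simulate_weekly_means()

# Empirical proportion of weeks whose average wait exceeds 45 seconds.
empirical = sum(m > 45 for m in weekly_means) / len(weekly_means)

# CLT approximation: the weekly mean is roughly Normal(50, (50 / sqrt(25))^2).
sd_bar = 50.0 / math.sqrt(25)
clt_estimate = 0.5 * math.erfc((45 - 50.0) / (sd_bar * math.sqrt(2.0)))

print("empirical proportion:", empirical)
print("CLT estimate:", clt_estimate)
```

With only 50 weekly samples the empirical proportion varies noticeably from run to run (try different seeds), so it should be read as a rough check on the CLT value rather than a replacement for it.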
Application 4 (Binomial)
Suppose a player plays a standard Roulette game (SOCR Roulette Experiment) and bets $1 on a single number. Find the probability the casino will make at least $28 in 100 games.
- Solution 1: One way to solve this problem is to find the distribution of the casino's payoff first. If Y is the random variable representing the payoff for the casino in a single game, the probability mass function for Y is given by \(P(Y=1)={ 37 \over 38}\) and \(P(Y=-35)={ 1 \over 38}\), as there are 38 numbers in total (0, 00, 1, 2, …, 36). The player may place a bet on any of these numbers, with a player success payoff of $35 (casino loss of $35) and a player loss of $1 (casino win of $1). Therefore, the casino's expected return per game is \(\mu_{Y}=E(Y)= {2 \over 38}=0.05263\), the variance of the casino return is \(\sigma_{Y}^2=Var(Y) \approx 33\) (\(\sigma_{Y}=SD(Y) \approx 5.8\)), and the range of the return is [-35 : 1], for one game. The exact probability of interest may be computed using the Binomial Distribution. If the total casino return in 100 games is denoted by \(T=\sum_{i=1}^{100}{Y_i}\), then the expected casino return in 100 games is $5.26 and \(P(T \ge 28)=P(X \ge k)\), where \(X \sim Binomial(p= {37\over 38}=0.97368, n=100)\) is the number of games the casino wins and k is the integer solution of the following dollar amount equation: \(k \times \$1 - (100-k)\times \$35 = \$28\), which gives \(k=98\). Therefore, \(P(T \ge 28)=P(X \ge 98)=0.508326\). The last probability represents the exact solution and is computed using the SOCR Binomial Distribution applet. This exact calculation is numerically intractable for large sample-sizes (n>200), although approximations exist.
- Solution 2: One could use the CLT to approximate this type of probability. For example, in the case above (n=100), we can estimate the probability of interest by \(P(T>28) \approx P(T>28 | T \sim Normal(\mu_T = n \times \mu_Y = 100 \times 0.05263, \sigma_T^2 = \sigma_Y^2 \times n = 5.8^2 \times 100))=0.347.\) Notice that the effort of this calculation does not depend on the sample size, and hence it is widely applicable, whereas the former exact probability calculation (Solution 1) is limited to small n. What caused the large discrepancy between the exact (\(P(T \ge 28)= 0.508326\)) and approximate (\(P(T \ge 28) \approx 0.347\)) values of the probability of interest? This is an example where the usual rule of 30 measurements breaks down, because of the skewed underlying Binomial distribution. Such limitations of the CLT, even for large sample-sizes, have been previously observed and reported for severely skewed distributions (Freedman, Pisani, & Purves, 1998). Here one would need a much larger sample to get a reasonably good approximation using the CLT. For example, if n=1,000, and we are looking for \(P(T>100)\), then k=975, the exact probability is \(P(T>100)=P(X>975)=0.4493287\), and the CLT approximation is much closer: \(P(T>100) \approx P(T>100 | T \sim Normal(\mu_T = n \times \mu_Y = 1,000 \times 0.05263, \sigma_T^2 = \sigma_Y^2 \times n = 5.8^2 \times 1,000))=0.3979.\)
- Solution 3: Finally, we show how one can use the SOCR CLT applet alone to estimate the probability of interest, P(T>28), for n=100, completely empirically. The figure below demonstrates how we can manually construct the native probability mass function for the random variable Y (casino payoff of one roulette game). A simple linear transformation is needed to convert the values of Y to W (\(W= {32 \over 36} (Y+35)\)), so that the range of the Y variable [-35 : 1] is mapped to the default range of the native distribution, [0 : 32]. Now, recalling the definitions above (for the n=100 case), we have that \(P(T>28)=P(\overline{Y} >0.28)=P(\overline{W}>31.36) \approx 0.397614\), where \(\overline{W}\) is the sample mean of the transformed values. The last approximation is obtained by noticing that \(\overline{W}\) will have approximately a \(Normal(\mu=31.23332; \sigma^2=0.48811895^2)\) distribution, with empirical mean and standard deviation obtained from row 3 in the table.
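The calculations in Solutions 1 and 2 can be sketched as follows. This is an illustrative cross-check under our own assumptions, not the SOCR applet code: the helper names (`binom_tail`, `normal_sf`, `wins_needed`, `clt_tail`) are hypothetical, and the sketch uses the unrounded variance of Y rather than the rounded value 5.8 for the standard deviation, so the CLT values may differ slightly in the third decimal place from those quoted above.

```python
import math

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); each pmf term is evaluated in log space."""
    log_p, log_q = math.log(p), math.log(1.0 - p)
    return sum(math.exp(math.lgamma(n + 1) - math.lgamma(j + 1)
                        - math.lgamma(n - j + 1) + j * log_p + (n - j) * log_q)
               for j in range(k, n + 1))

def normal_sf(x, mu, sigma):
    """P(Z > x) for Z ~ Normal(mu, sigma^2)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def wins_needed(n, target):
    """Smallest k with k * 1 - (n - k) * 35 >= target (integer ceiling division)."""
    return -(-(target + 35 * n) // 36)

# Casino payoff Y per game: +1 with probability 37/38, -35 with probability 1/38.
p_casino = 37.0 / 38.0
mu_y = 1 * p_casino - 35 * (1 - p_casino)
var_y = (1 - mu_y) ** 2 * p_casino + (-35 - mu_y) ** 2 * (1 - p_casino)

def clt_tail(n, target):
    """CLT approximation to P(T > target) for the total return T over n games."""
    return normal_sf(target, n * mu_y, math.sqrt(n * var_y))

k_100 = wins_needed(100, 28)
exact_100 = binom_tail(k_100, 100, p_casino)   # P(T >= 28) for n = 100
clt_100 = clt_tail(100, 28)
clt_1000 = clt_tail(1000, 100)

print("k for n=100:", k_100)
print("exact P(T >= 28), n=100:", exact_100)
print("CLT P(T > 28), n=100:", clt_100)
print("CLT P(T > 100), n=1000:", clt_1000)
```

Comparing `exact_100` with `clt_100` reproduces the large discrepancy discussed in Solution 2, which arises because the per-game payoff is heavily skewed.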
Additional Applications of the CLT are available here
- Back to SOCR CLT Activity
- SOCR Home page: http://www.socr.ucla.edu
- SOCR CLT Activity at MERLOT
- SOCR CLT Activity at CAUSEweb
- Dinov, ID, Christou, N, and Sanchez, J (2008) Central Limit Theorem: New SOCR Applet and Demonstration Activity. Journal of Statistics Education, Volume 16, Number 2.