# SMHS CLT LLN


## Scientific Methods for Health Sciences - Limit Theory: Central Limit Theorem and Law of Large Numbers

### Overview:

The two most commonly used theorems in probability, the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT), are often referred to as the first and second fundamental laws of probability. The CLT states that the arithmetic mean of a sufficiently large number of independent random variables, under certain conditions, will be approximately normally distributed. The LLN states that when the same experiment is performed a large number of times, the average of the results should be close to the expected value and tends to get closer to the expected value as the number of trials increases. In this section, we introduce these two probability theorems and illustrate their applications with examples. Finally, we discuss some common misconceptions about the CLT and LLN.

### Motivation:

Suppose we independently conduct the same experiment repeatedly. Assume we are interested in the relative frequency of occurrence of one event, whose probability of being observed at each experiment is p. Then the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of (identical and independent) experiments increases. This is an informal statement of the Law of Large Numbers (LLN). Another important property that emerges with large sample sizes is captured by the CLT. What happens when an experiment is repeated a sufficiently large number of times? Does it matter what distribution each individual outcome follows? What does the CLT tell us in situations like these, and how can we apply this theorem to solve more complicated problems in research?

### Theory

#### Law of Large Numbers (LLN)

When performing the same experiment a large number of times, the average of the results obtained should be close to the expected value and tends to get closer to the expected value with increasing number of trials.
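This convergence is easy to see empirically by tracking the running proportion of successes across repeated Bernoulli trials. The sketch below is in Python (the scripts later in this page are in R) and uses only the standard library; the function name and seed are illustrative, not part of any SOCR tool.

```python
import random

def running_proportion(p, n, seed=0):
    """Running proportion of successes across n Bernoulli(p) trials."""
    rng = random.Random(seed)
    successes = 0
    props = []
    for i in range(1, n + 1):
        if rng.random() < p:
            successes += 1
        props.append(successes / i)
    return props

props = running_proportion(p=0.5, n=100_000)
# Early proportions fluctuate widely; later ones settle near p = 0.5
print(props[9], props[99], props[-1])
```

Plotting `props` against the trial index reproduces the familiar LLN picture: large early swings that dampen toward p as the number of trials grows.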

• It is generally necessary to draw parallels between the formal LLN statements (in terms of sample averages) and the frequent interpretations of the LLN (in terms of probabilities of various events). Suppose we observe the same process independently multiple times, and assume a binarized (dichotomous) function of the outcome of each trial is of interest (e.g., failure may denote the event that the continuous voltage measure is $< 0.5V$, and the complement, success, that the voltage is $≥ 0.5V$; this is the situation in electronic chips, which binarize electric currents to 0 or 1). Researchers are often interested in the event of observing a success at a given trial, or in the number of successes in an experiment consisting of multiple trials. Let’s denote $p=P(success)$ at each trial. Then the ratio of the total number of successes to the number of trials $(n)$ is the sample average: $\bar X_{n}=\frac{1}{n}\sum_{i=1}^{n}X_{i}$, where $X_{i}=0$ for failure and $X_{i}=1$ for success. Hence, $\bar X_{n}=\hat p$, the ratio of the observed frequency of the event to the total number of repetitions, estimates the true $p=P(success)$. Therefore, $\hat p$ converges towards $p$ as the number of (identical and independent) trials increases.
• LLN Application: One demonstration of the law of large numbers provides practical algorithms for estimation of transcendental numbers. The two most popular transcendental numbers are $\pi$ and $e$.
• The SOCR Uniform e-Estimate Experiment provides the complete details of this simulation. In a nutshell, we can estimate the value of the natural number $e$ using random sampling from the Uniform distribution. Suppose $X_{1},X_{2},…,X_{n}$ are drawn from the Uniform distribution on $(0,1)$ and define $U= \arg\min_{n}( X_{1}+X_{2}+⋯+X_{n}>1)$; note that all $X_{i}≥0$. Then the expected value $E(U)=e\approx 2.7182$. Therefore, by the LLN, taking averages of ${U_{1},U_{2},…,U_{k}}$ values, each computed from random samples $X_{1},X_{2},…,X_{n}\sim U(0,1)$ as described above, will provide a more accurate estimate (as $k\rightarrow\infty$) of the natural number $e$. The Uniform e-Estimate Experiment, part of the SOCR Experiments, provides a hands-on demonstration of how the LLN facilitates stochastic simulation-based estimation of $e$.
• Common misconceptions: (1) If we observe a streak of 10 consecutive heads (when p=0.5, say), the probability that the 11th trial is a head is > p. This is, of course, incorrect, as the coin tosses are independent trials (an example of a memoryless process). (2) If we run a large number of coin tosses, the number of heads and the number of tails become more and more nearly equal. This is also incorrect: the LLN only guarantees that the sample proportion of heads will converge to the true population proportion (the p parameter that we selected). In fact, the difference |Heads - Tails| typically diverges, growing on the order of $\sqrt{n}$.
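The e-estimation scheme above can be sketched in a few lines. The following Python snippet (a standalone standard-library sketch; the SOCR experiment itself is an interactive applet, and the helper name is hypothetical) draws Uniform(0,1) variates until their sum exceeds 1, records the count U, and averages many replicates; by the LLN this average should approach $e$.

```python
import random

def u_count(rng):
    """U = arg min_n (X1 + ... + Xn > 1), with Xi ~ Uniform(0,1)."""
    total, n = 0.0, 0
    while total <= 1.0:
        total += rng.random()
        n += 1
    return n

rng = random.Random(42)
k = 200_000                      # number of replicates of U
estimate = sum(u_count(rng) for _ in range(k)) / k
print(estimate)                  # should be close to e ~ 2.71828
```

Since Var(U) is finite, the standard error of the average shrinks like $1/\sqrt{k}$, so the estimate tightens steadily as more replicates are added.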

#### Central Limit Theorem (CLT)

The arithmetic mean of a sufficiently large number of independent random variables will, under certain conditions, be approximately normally distributed. More generally, the theorem states that the sum of many independent and identically distributed (i.i.d.) random variables tends to be distributed according to one of a small set of attractor distributions. There are various statements of the central limit theorem, but all of them are weak-convergence results concerning (mostly) sums of independent, identically distributed random variables.

• Definition of CLT: let ${X_{1},X_{2},…,X_{n}}$ be an i.i.d. random sample of size $n$ drawn from a distribution with expected value $\mu$ and finite variance $\sigma^{2}$, and let $\bar{X}_{n}=\frac{X_{1}+X_{2}+⋯+X_{n}}{n}$ denote the sample average. By the LLN, the sample averages converge in probability and almost surely to the expected value $\mu$ as $n\rightarrow \infty$. As $n$ gets larger, the distribution of the difference between the sample average $\bar{X}_{n}$ and its limit $\mu$, when multiplied by the factor $\sqrt n$, approximates the normal distribution with mean 0 and variance $\sigma^{2}$: $\sqrt n(\bar{X}_{n}-\mu)\rightarrow N(0,\sigma^{2})$ when $n$ is large enough. Thus, for large enough $n$, the distribution of $\bar{X}_{n}$ is close to the normal distribution with mean $\mu$ and variance $\frac{\sigma^{2}}{n}$: $\bar{X}_{n}\rightarrow N(\mu,\frac{\sigma^{2}}{n})$.
• Multidimensional CLT: the central limit theorem extends to the case where $X_1,X_2,…,X_n$ are i.i.d. random vectors in $R^k$ with mean vector $μ=E(X_i)$ and covariance matrix $Σ$. Let ${X_i}=\begin{bmatrix} X_{i(1)} \\ \vdots \\ X_{i(k)} \end{bmatrix}$ be the $i$-th random vector (the boldface $\mathbf{X_i}$ indicates a random vector, not a univariate random variable). Then the sum of the random vectors is $\begin{bmatrix} X_{1(1)} \\ \vdots \\ X_{1(k)} \end{bmatrix}+\begin{bmatrix} X_{2(1)} \\ \vdots \\ X_{2(k)} \end{bmatrix}+\cdots+\begin{bmatrix} X_{n(1)} \\ \vdots \\ X_{n(k)} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} \left [ X_{i(1)} \right ] \\ \vdots \\ \sum_{i=1}^{n} \left [ X_{i(k)} \right ] \end{bmatrix} = \sum_{i=1}^{n} \left [ \mathbf{X_i} \right ],$ and the average is $\left (\frac{1}{n}\right)\sum_{i=1}^{n} \left [ \mathbf{X_i} \right ]= \frac{1}{n}\begin{bmatrix} \sum_{i=1}^{n} \left [ X_{i(1)} \right ] \\ \vdots \\ \sum_{i=1}^{n} \left [ X_{i(k)} \right ] \end{bmatrix} = \begin{bmatrix} \bar X_{(1)} \\ \vdots \\ \bar X_{(k)} \end{bmatrix}=\mathbf{\bar X_n}$. Thus, $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left [\mathbf{X_i} - E\left ( X_i\right ) \right ]=\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \left [ \mathbf{X_i} - \mu \right ]=\sqrt{n}\left(\mathbf{\overline{X}}_n - \mu\right) .$

The multivariate central limit theorem implies that $$\sqrt{n}\left(\mathbf{\overline{X}}_n - \mu\right)\ \stackrel{D}{\rightarrow}\ \mathcal{N}_k(0,\Sigma),$$ where the covariance matrix $Σ$ is equal to $$\Sigma=\begin{bmatrix} {Var \left (X_{1(1)} \right)} & {Cov \left (X_{1(1)},X_{1(2)} \right)} & Cov \left (X_{1(1)},X_{1(3)} \right) & \cdots & Cov \left (X_{1(1)},X_{1(k)} \right) \\ {Cov \left (X_{1(2)},X_{1(1)} \right)} & {Var \left (X_{1(2)} \right)} & {Cov \left(X_{1(2)},X_{1(3)} \right)} & \cdots & Cov \left(X_{1(2)},X_{1(k)} \right) \\ Cov \left (X_{1(3)},X_{1(1)} \right) & {Cov \left (X_{1(3)},X_{1(2)} \right)} & Var \left (X_{1(3)} \right) & \cdots & Cov \left (X_{1(3)},X_{1(k)} \right) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ Cov \left (X_{1(k)},X_{1(1)} \right) & Cov \left (X_{1(k)},X_{1(2)} \right) & Cov \left (X_{1(k)},X_{1(3)} \right) & \cdots & Var \left (X_{1(k)} \right) \\ \end{bmatrix}.$$
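A quick numerical check of the univariate statement: for $X_i \sim$ Uniform(0,1) we have $\mu = 1/2$ and $\sigma^2 = 1/12$, so $\sqrt n(\bar X_n - \mu)$ should have mean near 0 and standard deviation near $\sqrt{1/12} \approx 0.2887$. The Python sketch below (standard library only; the sample size and replicate count are illustrative choices) verifies this.

```python
import math
import random
import statistics

rng = random.Random(1)
mu, sigma = 0.5, math.sqrt(1 / 12)   # Uniform(0,1) mean and standard deviation
n, m = 50, 20_000                    # sample size, number of replicate samples

# Build m replicates of the CLT-scaled statistic sqrt(n)*(xbar - mu)
scaled = []
for _ in range(m):
    xbar = sum(rng.random() for _ in range(n)) / n
    scaled.append(math.sqrt(n) * (xbar - mu))

# By the CLT, 'scaled' should look like N(0, sigma^2)
print(statistics.mean(scaled))    # should be near 0
print(statistics.stdev(scaled))   # should be near sigma ~ 0.2887
```

A histogram of `scaled` would show the familiar bell shape even though each underlying observation is uniformly distributed.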

• The chart below demonstrates the CLT: The sample means are generated using a random number generator, which draws numbers between 1 and 100 from a uniform probability distribution. It illustrates that increasing sample sizes result in the 500 measured sample means being more closely distributed about the population mean (50 in this case). It also compares the observed distributions with the distributions that would be expected for a normalized Gaussian distribution, and shows the chi-squared values that quantify the goodness of the fit (the fit is good if the reduced chi-squared value is less than or approximately equal to one). The input into the normalized Gaussian function is the mean of sample means (~50) and the mean sample standard deviation divided by the square root of the sample size (~28.87/√n), which is called the standard deviation of the mean (since it refers to the spread of sample means).

Use the following R-script to generate the graph below:

```r
par(mfrow=c(3,3))   # 3 rows (sample sizes) x 3 columns (native distributions)
k <- 5              # base sample size
m <- 200            # number of samples

# Row 1: sample means of n=5 observations from three native distributions
xbarn.5 <- apply(matrix(rnorm(m*k,50,15),nrow=m),1,mean)
hist(xbarn.5,col="blue",xlim=c(0,100),prob=T,xlab="",ylab="",main="Normal(50,15)")
mtext(expression(bar(x)),side=4,line=1)

xbaru.5 <- apply(matrix(runif(m*k,0,1),nrow=m),1,mean)
hist(xbaru.5,col="blue",xlim=c(0,1),prob=T,xlab="",ylab="",main="Uniform(0,1)")
mtext(expression(bar(x)),side=4,line=1)

xbare.5 <- apply(matrix(rlnorm(m*k,1,1),nrow=m),1,mean)
hist(xbare.5,col="blue",xlim=c(0,15),prob=T,xlab="",ylab="",main="Log-Normal(1,1)")
mtext(expression(bar(x)),side=4,line=1)

# Row 2: sample means of n=10 observations
xbarn.10 <- apply(matrix(rnorm(m*k*2,50,15),nrow=m),1,mean)
hist(xbarn.10,col="blue",xlim=c(0,100),prob=T,xlab="",ylab="",main="")
mtext(expression(bar(x)),side=4,line=1)

xbaru.10 <- apply(matrix(runif(m*k*2,0,1),nrow=m),1,mean)
hist(xbaru.10,col="blue",xlim=c(0,1),prob=T,xlab="",ylab="",main="")
mtext(expression(bar(x)),side=4,line=1)

xbare.10 <- apply(matrix(rlnorm(m*k*2,1,1),nrow=m),1,mean)
hist(xbare.10,col="blue",xlim=c(0,15),prob=T,xlab="",ylab="",main="")
mtext(expression(bar(x)),side=4,line=1)

# Row 3: sample means of n=30 observations (k*6 = 30 columns per sample)
xbarn.30 <- apply(matrix(rnorm(m*k*6,50,15),nrow=m),1,mean)
hist(xbarn.30,col="blue",xlim=c(0,100),prob=T,xlab="",ylab="",main="")
mtext(expression(bar(x)),side=4,line=1)

xbaru.30 <- apply(matrix(runif(m*k*6,0,1),nrow=m),1,mean)
hist(xbaru.30,col="blue",xlim=c(0,1),prob=T,xlab="",ylab="",main="")
mtext(expression(bar(x)),side=4,line=1)

xbare.30 <- apply(matrix(rlnorm(m*k*6,1,1),nrow=m),1,mean)
hist(xbare.30,col="blue",xlim=c(0,15),prob=T,xlab="",ylab="",main="")
mtext(expression(bar(x)),side=4,line=1)

# Alternative plots
m <- 2000                       # number of samples
n <- 16                         # size of each sample
mu <- 50
sigma <- 15
sigma.xbar <- sigma/sqrt(n)     # standard deviation of the sample mean
rnv <- rnorm(m*n,mu,sigma)      # m samples of size n
rnvm <- matrix(rnv,nrow=m)      # m x n matrix

samplemeans <- apply(rnvm,1,mean)  # compute the mean of each row

hist(samplemeans)                  # plain histogram
hist(samplemeans,prob=T)           # density histogram
xs <- seq((mu-4*sigma.xbar),(mu+4*sigma.xbar),length=800)
ys <- dnorm(xs,mu,sigma.xbar)
lines(xs,ys,type="l")              # superimpose the normal density
par(mfrow=c(1,1))

par(col.main="blue",pty="s")
hist(samplemeans,prob=T,col="blue",breaks="scott",
     xlab=expression(bar(X)),
     main=expression(paste("X~N(50,15^2): Simulated Sampling Distribution of ", bar(X))))
lines(xs,ys,type="l",lwd=2,col="red")  # superimpose the normal density
Alpha <- round(mean(samplemeans),5)
Beta <- round(sd(samplemeans),5)
text(37,.08,bquote(hat(mu)[bar(X)]==.(Alpha)),pos=4,col="blue")
text(37,.07,bquote(hat(sigma)[bar(X)]==.(Beta)),pos=4,col="blue")
text(55,.08,bquote(mu[bar(X)]==.(mu)),pos=4,col="red")
text(55,.07,bquote(sigma[bar(X)]==.(sigma.xbar)),pos=4,col="red")
```

• CLT instructional challenges: We have extensive CLT pedagogical experience based on graduate and undergraduate teaching, interacting with students (and teaching assistants) and evaluating students’ performance in various probability and statistics classes. In our endeavors, we have used a variety of classical (e.g., mathematical formulations), hands-on activities (e.g., beads, sums, Quincunx) and technological approaches (e.g., applets, demonstrations). Our prior efforts have identified the following instructional challenges in teaching the concept of the CLT using purely classical and physical hands-on activities.
• Some of these challenges may be addressed by employing modern IT-based technologies, like interactive applets and computer activities: What is a native process (native distribution), a sample, a sample distribution, a parameter estimator, a sample-driven numerical parameter (point) estimate or a sampling distribution? What is the relationship between the inference of the CLT and its applications in the real world? How does one improve CLT knowledge retention, which seems to decay over time? Are there paramount characteristics we can demonstrate in the classroom, which may later serve as a foundation for reconstructing the detailed statement of the CLT and improving communication of CLT meaning and results? How does one directly involve and challenge students in thinking about CLT (in and out of classroom)?
• Traditional CLT teaching techniques (symbolic mathematics and physical demonstrations) are typically restricted in terms of time and space (e.g., shown once in class) and may have the limitations of involving one native process, studying one population parameter and restricting the scope of the inference (e.g., sample-size constraints).
• Modern IT-based blended instruction approaches address many of these CLT teaching challenges by utilizing the Internet and the available computational power. For example, a Java CLT applet may be invoked repeatedly under different initial conditions (choosing sample sizes and numbers of experiments, native process distributions, parameters of interest, etc.). Such tests may be performed from remote locations (e.g., classroom, library, home), and may provide enhanced interactive features (e.g., fitting a Normal model to the sampling distribution) demonstrated in different experimental modes (e.g., intense computational vs. visual animated sampling). Such features are especially useful for active, visual and deductive learners. Furthermore, interactive demonstrations are thought to significantly enhance the learning process for some student populations.
• Students in probability and statistics classes are generally expected to master difficult concepts that ultimately lead to understanding the basis of data variation, modeling and analysis. For many students relying on procedural manipulations and recipes is natural, perhaps because of their prior experiences with (deterministic) Newtonian sciences. Various statistics-education researchers have experimented with technology to explore novel exploratory data-analysis techniques that emphasize making sense of data via data manipulation, visualization and simulation. Such investigators refer to statistical literacy as the process of acquiring and utilizing intuition for discovering and interpreting trends, proposing solutions and counterexamples to basic problems in probability, as well as understanding statistical data modeling and analysis. Because the concepts of distribution, variation, probability, randomness, modeling and estimation are so ubiquitously used and entangled, instructors frequently forget that these notions should be defined, explained and demonstrated in (most) undergraduate probability and statistics classes. Various sampling and simulation applets and demonstrations are quite useful for this purpose.

### Applications

This article demonstrated the theory and application of the LLN using SOCR tools. It illustrated the theoretical meaning and practical implications of the LLN and presented the LLN in a variety of situations. It also provided empirical evidence in support of LLN convergence and dispelled common LLN misconceptions.

This article presents the CLT via a new SOCR applet and a corresponding demonstration activity.

Abstract: Modern approaches for information technology based blended education utilize a variety of novel instructional, computational and network resources. Such attempts employ technology to deliver integrated, dynamically linked, interactive content and multi-faceted learning environments, which may facilitate student comprehension and information retention. In this manuscript, we describe one such innovative effort of using technological tools for improving student motivation and learning of the theory, practice and usability of the Central Limit Theorem (CLT) in probability and statistics courses. Our approach is based on harnessing the computational libraries developed by the Statistics Online Computational Resource (SOCR) to design a new interactive Java applet and a corresponding demonstration activity that illustrate the meaning and the power of the CLT. The CLT applet and activity have clear common goals: to provide a graphical representation of the CLT, to improve student intuition, and to empirically validate and establish the limits of the CLT. The SOCR CLT activity consists of four experiments that demonstrate the assumptions, meaning and implications of the CLT and tie these to specific hands-on simulations. We include a number of examples illustrating the theory and applications of the CLT. Both the SOCR CLT applet and activity are freely available online for the community to test, validate and extend.

This article presents the CLT for quadratic forms in strongly dependent linear variables and its application to the asymptotic normality of Whittle’s estimate. A central limit theorem for quadratic forms in strongly dependent linear (or moving average) variables is proved, generalizing the results of Avram, and of Fox and Taqqu, for Gaussian variables. The theorem is applied to prove the asymptotic normality of Whittle's estimate of the parameter of strongly dependent linear sequences.

This article studied the LLN for a continuum of i.i.d. random variables. There are two problems with the common argument that a continuum of independent and identically distributed random variables sums to a nonrandom quantity in “large economies”. First, it may be unintelligible, in that it may call for the measure of a non-measurable set. However, there is a probability measure, consistent with the finite-dimensional distributions, which assigns zero measure to the set of realizations having that difficulty. Second, the “law of large numbers” may not hold even when there is no measurability problem.

### Problems

6.1) Your friend is in Vegas playing Keno, and he has noticed that some numbers have been coming up more frequently than others. He declares that the other numbers are "due" to come up, and puts all of his money on those numbers. Is this a correct assessment?

(a) Yes, the Law of Averages says that the numbers that haven't shown up will now come up more often, because the probabilities will even out in the end.

(b) No, this is a misconception, because random phenomena do not "compensate" for what happened in the past.

(c) No, the game is probably broken, and the other numbers won't be coming up more frequently.

(d) Yes, the more often a certain number doesn't come up, its probability of coming up next turn increases.

6.2) You are flipping a coin, and it has already landed heads seven times in a row. For the next flip, the probability of getting tails will be greater than the probability of getting heads. (a) TRUE (b) FALSE

The following hands-on practice activities help students experiment with the SOCR LLN activity and understand the meaning, ramifications and limitations of the LLN.

6.3) Run the SOCR Coin Toss LLN Experiment twice with stop=100 and p=0.5. This corresponds to flipping a fair coin 100 times and observing the behavior of the proportion of heads across (discrete) time.

What will be different in the outcomes of the 2 experiments?

What properties of the 2 outcomes will be very similar?

If we did this 10 times, what is expected to vary and what may be predicted accurately?

6.4) Use the SOCR Uniform e-Estimate Experiment to obtain stochastic estimates of the natural number $e≈2.7182$.

Try to explain in words, and support your argument with data/results from this simulation, why the expected value of the variable U (defined above) equals e, i.e., $E(U) = e$.

How does the LLN come into play in this experiment?

How would you proceed in practice if you had to estimate $e^2≈7.38861124$?

Similarly, try to estimate $π≈3.141592$ and $π^2≈9.8696044$ using the SOCR Buffon’s Needle Experiment.

6.5) Run the SOCR Roulette Experiment and bet on 1-18 (out of the 38 possible numbers/outcomes). What is the probability of success (p)?

What does the LLN imply about $p$ and repeated runs of this experiment?

Run this experiment 3 times. What is the sample estimate of p ($\hat{p}$)? What is the difference $p-\hat{p}$?

Would this difference change if we ran the experiment 10 or 100 times? How?

In 100 Roulette experiments, what can you say about the difference of the number of successes (outcome in 1-18) and the number of failures? How about the proportion of successes?

6.6) Work through the experiments given in this article to (1) empirically validate that the sample average of random observations (from most processes) follows a normal distribution; (2) demonstrate that the sample average is special, and that other sample statistics (e.g., the median or the variance) generally do not have normal distributions; (3) illustrate that the expectation of the sample average equals the population mean; and (4) show that the variation of the sampling distribution of the mean rapidly decreases as the sample size increases.

If a native process has $σ_X = 10$ and we take a sample of size 10, what will be ($σ_{\bar{X}}$)? Does it depend on the shape of the original process? How large should the sample-size be so that $σ_{\bar{X}}=\frac{2}{3} σ_X$?