Difference between revisions of "SMHS ProbabilityDistributions"

From SOCR
(Negative multinomial distribution (NMD))
(Generating Probability Tables)
 
(28 intermediate revisions by 2 users not shown)
Line 2: Line 2:
  
 
===Overview===
 
===Overview===
Distributions are the fundamental basis of probability theory. There are two types of processes that we observe in nature, discrete and continuous, and they are modeled by the corresponding distributions. (There can also be mixture-, multidimensional and tensor distributions[http://www.example.com link title]; these are not discussed here). The type of distribution depends on the type of data. Discrete and continuous distributions represent discrete or continuous random variables, respectively. This section aims to introduce various discrete and continuous distributions and discuss the relationships between distributions.  
+
Distributions are the fundamental basis of probability theory. There are two types of processes that we observe in nature - ''discrete'' and ''continuous'' - and they are modeled by the corresponding distributions. (There can also be mixture, multidimensional and tensor distributions, which are not discussed here). The type of distribution depends on the type of data. Discrete and continuous distributions represent discrete or continuous random variables, respectively. This section aims to introduce various discrete and continuous distributions and to discuss the relationships between distributions.  
  
*Discrete distributions: [[AP_Statistics_Curriculum_2007_Distrib_Binomial|Bernoulli distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Binomial|Binomial distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Multinomial|Multinomial distribution]], [[SOCR_EduMaterials_Activities_Explore_Distributions#Geometric_probability_distribution|Geometric distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Dists#HyperGeometric|Hypergeometric distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Dists#Negative_Binomial|Negative binomial distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Dists#Negative_Multinomial_Distribution_.28NMD.29|Negative multinomial distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Poisson|Poisson distribution]].
+
* (Common) Discrete distributions: [[AP_Statistics_Curriculum_2007_Distrib_Binomial|Bernoulli distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Binomial|Binomial distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Multinomial|Multinomial distribution]], [[SOCR_EduMaterials_Activities_Explore_Distributions#Geometric_probability_distribution|Geometric distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Dists#HyperGeometric|Hypergeometric distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Dists#Negative_Binomial|Negative binomial distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Dists#Negative_Multinomial_Distribution_.28NMD.29|Negative multinomial distribution]], [[AP_Statistics_Curriculum_2007_Distrib_Poisson|Poisson distribution]].
  
*Continuous distributions: [[AP_Statistics_Curriculum_2007#Chapter_V:_Normal_Probability_Distribution|Normal distribution]], [[SOCR_BivariateNormal_JS_Activity| Multivariate normal distribution]].
+
* (Common) Continuous distributions: [[AP_Statistics_Curriculum_2007#Chapter_V:_Normal_Probability_Distribution|Normal distribution]], [[SOCR_BivariateNormal_JS_Activity| Multivariate normal distribution]], [[AP_Statistics_Curriculum_2007_Beta| Beta distribution]], [[AP_Statistics_Curriculum_2007_Exponential|exponential distribution]].
  
 
===Motivation===
 
===Motivation===
We have talked about different types of data and the fundamentals of probability theory. In order to capture and estimate patterns in data, we introduced the concept of a distribution. A probability distribution assigns a probability to each measurable subset of the possible outcomes of a random experiment. It can either be univariate or multivariate. A univariate distribution gives the probability of a single random variable while a a multivariate distribution (i.e., a joint probability distribution) gives the probability of a random vector, which is a set of two or more random variables taking on various combinations of values. Consider the coin tossing experiment, what distribution would we expect the outcomes to follow?
+
We have talked about different types of data and the fundamentals of probability theory. In order to capture and estimate patterns in data, we introduced the concept of a distribution. A probability distribution assigns a probability to each measurable subset of the possible outcomes of a random experiment. It can either be univariate or multivariate. A univariate distribution gives the probability of a single random variable, while a multivariate distribution (i.e., a joint probability distribution) gives the probability of a random vector, which is a set of two or more random variables taking on various combinations of values. Consider the coin tossing experiment; what distribution would we expect the outcomes to follow?
  
 
===Theory===
 
===Theory===
 
'''Random variables''': A random variable is a function or a mapping from a sample space onto the real numbers (most of the time). In other words, a random variable assigns real values to outcomes of experiments.
 
'''Random variables''': A random variable is a function or a mapping from a sample space onto the real numbers (most of the time). In other words, a random variable assigns real values to outcomes of experiments.
  
'''Probability density / mass functions and the cumulative distribution function'''  
+
'''Probability density / mass functions and the cumulative distribution function'''
 +
 
 
The probability density function (pdf) or probability mass function (pmf) for a continuous or discrete random variable, respectively, is denoted $p(x)$. In the discrete case, it is defined by the probability of the subset of the sample space $\{s\in S | X(s)=x\}\subset S$: $p(x)=P(\{s\in S | X(s)=x\})$, for all $x$. In the continuous case, the density satisfies $P(a\leq X\leq b)=\int_a^b{p(x)dx}$.
 
The probability density function (pdf) or probability mass function (pmf) for a continuous or discrete random variable, respectively, is denoted $p(x)$. In the discrete case, it is defined by the probability of the subset of the sample space $\{s\in S | X(s)=x\}\subset S$: $p(x)=P(\{s\in S | X(s)=x\})$, for all $x$. In the continuous case, the density satisfies $P(a\leq X\leq b)=\int_a^b{p(x)dx}$.
  
Line 33: Line 34:
  
 
====Binomial distribution====
 
====Binomial distribution====
Suppose we conduct an experiment observing n trials of a Bernoulli process. If we are interested in the RV $x$ = {Number of heads in $n$ trials}, then $X$ is called a [[AP_Statistics_Curriculum_2007_Distrib_Binomial#Binomial_Random_Variables|binomial RV]] and its distribution is called binomial distribution. We say $X \sim B(n,p)$,where $n$ is the sample size and $p$ is the probability of heads during one trial. $P(X=x)={n\choose x} p^x (1-p)^{n-x}$, for $x=0,1,…,n$, where ${n\choose x}=\frac {n!} {x!(n-x)!}$ is the binomial coefficient.
+
Suppose we conduct an experiment observing $n$ trials of a Bernoulli process. If we are interested in the random variable (RV) $X$ = {Number of heads in $n$ trials}, then $X$ is called a [[AP_Statistics_Curriculum_2007_Distrib_Binomial#Binomial_Random_Variables|binomial RV]] and its distribution is called the binomial distribution. We say $X \sim B(n,p)$, where $n$ is the sample size and $p$ is the probability of heads during one trial. $P(X=x)={n\choose x} p^x (1-p)^{n-x}$, for $x=0,1,…,n$, where ${n\choose x}=\frac {n!} {x!(n-x)!}$ is the binomial coefficient.
 
$$E[X]=np;VAR[X]=np(1-p)$$
 
$$E[X]=np;VAR[X]=np(1-p)$$
 +
 +
* Example: Suppose a dementia clinic (A) has $n=10$ mildly cognitively impaired (MCI) patients. A binomial probability model, $B(n,p)$, can be used to model the number of patients that are expected to convert to dementia within 1 year. A similar dementia clinic (B) reported that they tend to see 30% of the MCI patients convert to dementia annually. Then the probability that $k$ patients (from the 10 MCIs in clinic A) would convert to dementia is given by the binomial probability. Compute the following:
 +
** The number of MCI patients in clinic A expected to convert to dementia in a one year time frame ($3$).
 +
** The variance of the number of expected MCI-to-dementia conversions annually ($2.1$).
 +
** The probability that 2 or more MCI patients in clinic A will convert to dementia in a year, $P(X≥2)$, ($0.8506917$). Think of the ''complement''.
 +
 +
* What [[AP_Statistics_Curriculum_2007_Distrib_Binomial#Binomial_Random_Variables|situations is a Binomial distribution a good model]] for?
  
 
====Multinomial distribution====
 
====Multinomial distribution====
Line 42: Line 50:
  
 
====Geometric distribution====
 
====Geometric distribution====
The probability distribution of the number, X, of Bernoulli trials needed to obtain one success is called the [[AP_Statistics_Curriculum_2007_Distrib_Dists#Geometric|geometric distribution]]. It is supported on the set $\{1,2,3,…\}$. $P(X=x)=(1-p)^{x-1}p$, for $x = 1, 2, … $
+
The probability distribution of the number, $X$, of Bernoulli trials needed to obtain one success is called the [[AP_Statistics_Curriculum_2007_Distrib_Dists#Geometric|geometric distribution]]. It is supported on the set $\{1,2,3,…\}$. $P(X=x)=(1-p)^{x-1}p$, for $x = 1, 2, … $
  
 
$$E[X]=\dfrac {1} {p},VAR[X]= \frac {1-p} {p^{2}}$$
 
$$E[X]=\dfrac {1} {p},VAR[X]= \frac {1-p} {p^{2}}$$
Line 65: Line 73:
  
 
==== Negative binomial distribution====
 
==== Negative binomial distribution====
Suppose X is the trial index (n) of the $r^{th}$ success, or the total number of experiments ($n$) needed to get $r$ successes. The [[AP_Statistics_Curriculum_2007_Distrib_Dists#Negative_Binomial| negative binomial distribution]] has the following mass function $P(X=n)={n-1 \choose r-1} p^r (1-p)^{(n-r)}$, for $n=r,r+1,r+2,…$, where $n$ is the trial number of the $𝑟^{𝑡ℎ}$ success.
+
Suppose $X$ is the trial index ($n$) of the $r^{th}$ success, or the total number of experiments ($n$) needed to get $r$ successes. The [[AP_Statistics_Curriculum_2007_Distrib_Dists#Negative_Binomial| negative binomial distribution]] has the following mass function $P(X=n)={n-1 \choose r-1} p^r (1-p)^{(n-r)}$, for $n=r,r+1,r+2,…$, where $n$ is the trial number of the $r^{th}$ success.
  
 
$$E[X]=\frac {r} {p},VAR[X]=\frac {r(1-p)} {p^{2}}$$
 
$$E[X]=\frac {r} {p},VAR[X]=\frac {r(1-p)} {p^{2}}$$
 
   
 
   
Suppose Y is the number of failures ($k$) to get $r$ successes. $P(Y=k)={k+r-1 \choose k} p^{r} (1-p)^{k}$, for $k=0,1,2,…,$ where $k$ is the number of failures before the $ r^{th} $ success. $Y \sim NegBin(r,p)$, the probability of $k$ failures and $r$ successes in $n = k+1$ $Bernoulli(p)$ trials with success on the last trial.
+
Suppose $Y$ is the number of failures ($k$) to get $r$ successes. $P(Y=k)={k+r-1 \choose k} p^{r} (1-p)^{k}$, for $k=0,1,2,…,$ where $k$ is the number of failures before the $ r^{th} $ success. $Y \sim NegBin(r,p)$, the probability of $k$ failures and $r$ successes in $n = k+r$ $Bernoulli(p)$ trials with success on the last trial.
  
 
$$E[Y]=\frac{r(1-p)}{p},VAR[Y]=\frac {r(1-p)} {p^{2}}$$
 
$$E[Y]=\frac{r(1-p)}{p},VAR[Y]=\frac {r(1-p)} {p^{2}}$$
Line 103: Line 111:
 
* ''probability density'' function $ f(x)= {e^{-x^2 \over 2} \over \sqrt{2 \pi}} $ and a  
 
* ''probability density'' function $ f(x)= {e^{-x^2 \over 2} \over \sqrt{2 \pi}} $ and a  
 
* ''cumulative distribution'' function $\Phi(y)= \int_{-\infty}^{y}{{e^{-x^2 \over 2} \over \sqrt{2 \pi}} dx}.$
 
* ''cumulative distribution'' function $\Phi(y)= \int_{-\infty}^{y}{{e^{-x^2 \over 2} \over \sqrt{2 \pi}} dx}.$
 +
* Computing [[AP_Statistics_Curriculum_2007_Normal_Critical| critical values]] and [[AP_Statistics_Curriculum_2007_Normal_Prob|probability values]] for the Normal distribution.
 +
 +
====Moment Generating Function====
 +
For a random variable $X$, the moment generating function (MGF) is defined by
 +
$$ M_X(t)=E(e^{tX})= \begin{cases}
 +
\sum_x {e^{tx}P(X=x)}, & \text{X=discrete} \\
 +
\int_{-\infty}^{\infty} {e^{tx}f_X(x)dx}, & \text{X=continuous}
 +
\end{cases},$$
 +
for all $t$ for which the sum (or integral) converges.
 +
 +
===== MGF Properties=====
 +
* When $ M_X(t)$ exists for $|t|< t_o$, the MGF uniquely determines the distribution of $X$.
 +
* Moments about the origin may be found by power series expansion:
 +
$$ M_X(t)=E(e^{tX})= E \bigg ( \sum_{k=0}^{\infty} {\frac{(tX)^k} {k!}}\bigg )=$$
 +
$$\sum_{k=0}^{\infty} {\frac{t^k} {k!}E(X^k)}.$$
 +
: Hence, for a given MGF, the power-series (Taylor) expansion of this function of $t$ yields the $k^{th}$ moment of the random variable about the origin, as the coefficient, $E(X^k)$, of the term $\frac{t^k} {k!}$ in the expansion.
 +
 +
* Moments about the origin may also be determined by differentiating the MGF:
 +
$$\frac{d^k} {dt^k} \{M_X(t) \} = \frac{d^k} {dt^k} \{E(e^{tX}) \} = E \bigg [ \frac{d^k} {dt^k} e^{tX} \bigg ] = E\bigg [ X^ke^{tX} \bigg ].$$
 +
 +
: Thus, $\frac{d^k} {dt^k} \{M_X(t) \}_{t=0} = E(X^k).$
 +
 +
: Note that $M_{a+bX}(t) = E\bigg ( e^{t(a+bX)} \bigg ) = e^{at}M_X(bt).$
 +
:: Example: if $Z \sim N(0,1)$, then the MGF of $Z$ is $M_Z(t) = E(e^{tZ})=
 +
\int_{-\infty}^{\infty} {e^{tz}\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2} dz} =$
 +
: $ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} {e^{-\frac{1}{2}(z^2 -2tz+t^2)+\frac{1}{2}t^2} dz}=$
 +
: $ e^{\frac{1}{2}t^2} \bigg \{ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} {e^{-\frac{1}{2}(z-t)^2}dz} \bigg \} =  e^{\frac{1}{2}t^2},$ since the expression in braces is the total probability of the $N(t,1)$ density, which equals $1$.
 +
 +
* For independent variables ($X_1, X_2, ..., X_n$): $MGF_{X_1+...+X_n} =MGF_{X_1}MGF_{X_2}...MGF_{X_n}.$
 +
:: For example, if $Z_1, Z_2, ..., Z_n \sim N(0,1)$ are independent, then $V=Z_1^2+...+Z_n^2 \sim \chi_n^2$; see the [http://web.am.qub.ac.uk/users/g.gribakin/sor/Chap6.pdf proof here].
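
The closed form of the standard normal MGF, $e^{\frac{1}{2}t^2}$, can be checked numerically. The following is a minimal R sketch, assuming numerical integration with the base R function <code>integrate</code> is accurate enough for moderate $t$; the helper name <code>mgf_std_normal</code> and the chosen $t$ values are illustrative only.

```r
# Numerically approximate M_Z(t) = E(exp(t*Z)) for Z ~ N(0,1)
# (mgf_std_normal is a hypothetical helper name, not part of any package)
mgf_std_normal <- function(t) {
  integrate(function(z) exp(t * z) * dnorm(z), lower = -Inf, upper = Inf)$value
}

t_values    <- c(-1, 0, 0.5, 1, 2)           # illustrative values of t
numeric_mgf <- sapply(t_values, mgf_std_normal)
closed_form <- exp(t_values^2 / 2)           # known closed form exp(t^2/2)

# Compare the numerical integral with the closed form side by side
print(cbind(t = t_values, numeric = numeric_mgf, closed_form = closed_form))
```

The two columns should agree up to the integration error reported by <code>integrate</code>.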
  
 
===Applications===
 
===Applications===
*[http://www.mdm.com/articles/28757-the-case-for-proactive-inside-sales?v=preview  The article] examined how a proactive inside sales force can be critical to serving mid-market and small customers as part of a broader multichannel strategy and in included steps for initiating an effective program.
+
*[http://www.mdm.com/articles/28757-the-case-for-proactive-inside-sales?v=preview  The article] examines how a proactive inside sales force can be critical to serving mid-market and small customers as part of a broader multichannel strategy. It also includes steps for initiating an effective program.
  
*[http://wiki.socr.umich.edu/index.php/SOCR_EduMaterials_Activities_NegativeBinomial This article] provides an example of Negative Binomial Experiment by SOCR. The goal of this experiment is to provide a simulation demonstrating properties of the Negative Binomial(k,p) distribution. The applet facilitates the calculations of the Negative Binomial mass/density function, the moments and cumulative distribution function. It gives the specific steps of the experiment in SOCR and it allows users to learn about the variation of the distribution with changing parameters.
+
*[http://wiki.socr.umich.edu/index.php/SOCR_EduMaterials_Activities_NegativeBinomial This article] provides an example of Negative Binomial Experiment by SOCR. The goal is to provide a simulation demonstrating properties of the Negative Binomial ($k,p$) distribution. The applet facilitates the calculations of the Negative Binomial mass/density function, the moments and cumulative distribution function. It gives the specific steps of the experiment in SOCR and it allows users to learn about the variation of the distribution with changing parameters.
 +
 
 +
===Generating Probability Tables===
 +
One can use R (and many other programming languages) to generate probability tables like the [http://socr.umich.edu/Applets/index.html#Tables popular SOCR Probability Tables]. You can also use the [http://socr.ucla.edu/htmls/dist/Fisher_Distribution.html Java Applet] or the [http://www.distributome.org/V3/calc/FCalculator.html HTML5/JavaScript Webapp] interactive
 +
F-Distribution calculators to obtain more dense and accurate measures of probability or critical values.
 +
 
 +
The following example generates one of the [http://socr.umich.edu/Applets/F_Table.html F distribution tables: $F(\alpha=0.001, df.num, df.deno)$]:
 +
 
 +
# Define the right-tail probability of interest $\alpha=0.001$
 +
right_tail_p <- 0.001
 +
 +
# Define the vectors storing the indices corresponding to numerator (n1) and denominator (n2, row)
 +
# degrees of freedom for $F(\alpha, n_1, n_2)$. Note that Inf corresponds to $\infty$.
 +
 +
n1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 40, 60, 120, Inf)
 +
n2 <- c(1:30, 40, 60, 120, Inf)
 +
 +
# Define printing precision (4 significant digits)
 +
options(digits=4)
 +
 +
# Generate an empty matrix of critical f-values
 +
f_table <- matrix(ncol=length(n1), nrow=length(n2))
 +
 +
# Use the F distribution quantile function to fill in the matrix values in a nested loop
 +
# Recall that df, pf, qf and rf provide the density, distribution function, quantile function and random generation for the F distribution
 +
 +
for (i in 1:length(n2)){
 +
    for (j in 1:length(n1)){
 +
f_table[i,j] <- qf(right_tail_p, n1[j], n2[i], lower.tail = FALSE)
 +
    }
 +
}
 +
 +
# Label rows and columns
 +
rownames(f_table) <- n2; colnames(f_table) <- n1
 +
 +
# Print results
 +
f_table
 +
 +
# save results to a file
 +
write.table(f_table, file="C:\\User\\f_table.txt")
  
 
===Software===
 
===Software===
Line 118: Line 195:
 
===Problems===
 
===Problems===
 
* If sampling distributions of sample means are examined for samples of size 1, 5, 10, 16 and 50, you will notice that as sample size increases, the shape of the sampling distribution appears more like that of the:
 
* If sampling distributions of sample means are examined for samples of size 1, 5, 10, 16 and 50, you will notice that as sample size increases, the shape of the sampling distribution appears more like that of the:
: (a) normal distribution
+
: (a) Normal distribution
: (b) uniform distribution
+
: (b) Uniform distribution
: (c) population distribution
+
: (c) Population distribution
: (d) binomial distribution
+
: (d) Binomial distribution
  
* Which of the following statements best describes the effect on the Binomial Probability Model if the number of trials is held constant and the p(the probability of "success") increases?
+
* Which of the following statements best describes the effect on the Binomial Probability Model if the number of trials is held constant and $p$ (the probability of "success") increases?
: (a) None of these statements are true
+
: (a) None of these statements are true.
: (b) The mean and the standard deviation both increase
+
: (b) The mean and the standard deviation both increase.
: (c) The mean decreases and the standard deviation increases
+
: (c) The mean decreases and the standard deviation increases.
: (d) The mean increases and the standard deviation decreases
+
: (d) The mean increases and the standard deviation decreases.
: (e) The mean and standard deviation both decrease
+
: (e) The mean and standard deviation both decrease.
  
 
* Suppose you draw one card from a standard deck three times, with replacement. What is the probability that you get spades all three times? Choose one answer.
 
* Suppose you draw one card from a standard deck three times, with replacement. What is the probability that you get spades all three times? Choose one answer.
Line 136: Line 213:
 
: (d) 0.021
 
: (d) 0.021
  
* Suppose the number of cars that enter a parking lot in an hour is a Poisson random variable, and suppose that P(X=0)=0.05. Determine the variance of X.
+
* Suppose the number of cars that enter a parking lot in an hour is a Poisson random variable, and suppose that $P(X=0)=0.05$. Determine the variance of $X$.
 
: (a) 0.349
 
: (a) 0.349
 
: (b) 3.232
 
: (b) 3.232
Line 143: Line 220:
  
 
* A researcher converts 100 lung capacity measurements to z-scores. The lung capacity measurements do not follow a normal distribution. What can we say about the standard deviation of the 100 z-scores?
 
* A researcher converts 100 lung capacity measurements to z-scores. The lung capacity measurements do not follow a normal distribution. What can we say about the standard deviation of the 100 z-scores?
: (a) It depends on the standard deviation of the raw scores
+
: (a) It depends on the standard deviation of the raw scores.
: (b) It equals 1
+
: (b) It equals 1.
: (c) It equals 100
+
: (c) It equals 100.
: (d) It must always be less than the standard deviation of the raw scores
+
: (d) It must always be less than the standard deviation of the raw scores.
: (e) It depends on the shape of the raw score distribution
+
: (e) It depends on the shape of the raw score distribution.
  
* Among first year students at a certain university, scores on the verbal SAT follow the normal curve. The average is around 500 and the SD is about 100. Tatiana took the SAT, and placed at the 85% percentile. What was her verbal SAT score?
+
* Among first year students at a certain university, scores on the verbal SAT follow the normal curve. The average is around 500 and the $SD$ is about 100. Tatiana took the SAT, and placed at the 85th percentile. What was her verbal SAT score?
 
: (a) 604
 
: (a) 604
 
: (b) 560
 
: (b) 560
Line 155: Line 232:
 
: (d) 403
 
: (d) 403
  
* Consider a random sample 100 orc soldiers and found the mean and the standard deviation to be 200lbs and and 20lbs respectively. He can be 68% confident that the mean weight in the population of orc soldiers is between
+
* Consider a random sample of 100 orc soldiers in which the mean and the standard deviation of the weights are 200 lbs. and 20 lbs., respectively. We can be 68% confident that the mean weight in the population of orc soldiers is between
 
: (a) 196 to 204 lbs
 
: (a) 196 to 204 lbs
 
: (b) 198 to 202 lbs
 
: (b) 198 to 202 lbs
Line 173: Line 250:
 
: (d) 62.44 inches
 
: (d) 62.44 inches
  
* The settlement (in cm) of a structure shown in the following figure may be evaluated from S = 0.3A + 0.2B + 0.1C,
+
* The settlement (in cm.) of a structure shown in the following figure may be evaluated from S = 0.3A + 0.2B + 0.1C,
 
<center>[[Image:SMHS_ProbabDist_Fig3.png]]</center>
 
<center>[[Image:SMHS_ProbabDist_Fig3.png]]</center>
: where A, B, and C are respectively the thickness (in m) of the three layers of soil as shown. Suppose A, B, and C are modeled as independent normal random variables as: $A \sim N(5,1)$, $B \sim N(8,2)$, $C \sim N(7,1)$,  
+
: where A, B, and C are, respectively, the thicknesses (in m) of the three layers of soil as shown. Suppose A, B, and C are modeled as independent normal random variables: $A \sim N(5,1)$, $B \sim N(8,2)$, $C \sim N(7,1)$.
 
: (a) Determine the probability that the settlement will exceed 4 cm.
 
: (a) Determine the probability that the settlement will exceed 4 cm.
 
: (b) If the total thickness of the three layers is known exactly as 20 m; and furthermore, thicknesses A and B are correlated with correlation coefficient equal to 0.5, determine the probability that the settlement will exceed 4 cm.
 
: (b) If the total thickness of the three layers is known exactly as 20 m; and furthermore, thicknesses A and B are correlated with correlation coefficient equal to 0.5, determine the probability that the settlement will exceed 4 cm.
  
* Suppose that the distribution of X in the population is strongly skewed to the left. If you took 200 independent and random samples of size 3 from this population, calculated the mean for each of the 200 samples, and drew the distribution of the sample means, what would the sampling distribution of the means look like?
+
* Suppose that the distribution of $X$ in the population is strongly skewed to the left. If you took 200 independent and random samples of size 3 from this population, calculated the mean for each of the 200 samples and drew the distribution of the sample means, what would the sampling distribution of the means look like?
 
: (a) It will be perfectly normal and the mean will be equal to the median.
 
: (a) It will be perfectly normal and the mean will be equal to the median.
 
: (b) It will be close to the normal and the mean will be close to the median.
 
: (b) It will be close to the normal and the mean will be close to the median.
Line 186: Line 263:
  
 
* A polling agency has been hired to predict the proportion of voters who favor a certain candidate. The polling agency picks a random sample of 1000 voters of which 400 indicate that they favor the candidate. If they increase the sample size to 2000, how does the standard error change?
 
* A polling agency has been hired to predict the proportion of voters who favor a certain candidate. The polling agency picks a random sample of 1000 voters of which 400 indicate that they favor the candidate. If they increase the sample size to 2000, how does the standard error change?
: (a) The standard error will decrease by one-fourth
+
: (a) The standard error will decrease by one-fourth.
: (b) The standard error will not change; the margin of error changes
+
: (b) The standard error will not change; the margin of error changes.
: (c) Since the sample size is doubled, the standard error will be halved
+
: (c) Since the sample size is doubled, the standard error will be halved.
: (d) The standard error will decrease not by a factor of 1/2 but by the square of root of 1/2
+
: (d) The standard error will decrease not by a factor of 1/2 but by the square root of 1/2.
  
 
* The probability of winning a certain instant scratch-n-win game is 0.02. You play the game 80 times. Find the probability that you win 3 times.
 
* The probability of winning a certain instant scratch-n-win game is 0.02. You play the game 80 times. Find the probability that you win 3 times.
Line 197: Line 274:
 
: (d) 0.2391
 
: (d) 0.2391
  
* After firing 1000 boxes of ammunition, a certain handgun jamed according to a Poisson distribution with a mean of 0.4 per box of ammunition. Approximate the probability that more than 350 boxes of ammunition contain some that is jammed.
+
* After firing 1,000 boxes of ammunition, a certain handgun jammed according to a Poisson distribution with a mean of 0.4 jams per box of ammunition. Approximate the probability that more than 350 boxes of ammunition had at least one jam.
 
: (a) .0032
 
: (a) .0032
 
: (b) .0012
 
: (b) .0012

Latest revision as of 13:44, 18 March 2016

Scientific Methods for Health Sciences - Probability Distributions

Overview

Distributions are the fundamental basis of probability theory. There are two types of processes that we observe in nature - discrete and continuous - and they are modeled by the corresponding distributions. (There can also be mixture, multidimensional and tensor distributions, which are not discussed here). The type of distribution depends on the type of data. Discrete and continuous distributions represent discrete or continuous random variables, respectively. This section aims to introduce various discrete and continuous distributions and to discuss the relationships between distributions.

Motivation

We have talked about different types of data and the fundamentals of probability theory. In order to capture and estimate patterns in data, we introduced the concept of a distribution. A probability distribution assigns a probability to each measurable subset of the possible outcomes of a random experiment. It can either be univariate or multivariate. A univariate distribution gives the probability of a single random variable, while a multivariate distribution (i.e., a joint probability distribution) gives the probability of a random vector, which is a set of two or more random variables taking on various combinations of values. Consider the coin tossing experiment; what distribution would we expect the outcomes to follow?

Theory

Random variables: A random variable is a function or a mapping from a sample space onto the real numbers (most of the time). In other words, a random variable assigns real values to outcomes of experiments.

Probability density / mass functions and the cumulative distribution function

The probability density function (pdf) or probability mass function (pmf) for a continuous or discrete random variable, respectively, is denoted $p(x)$. In the discrete case, it is defined by the probability of the subset of the sample space $\{s\in S | X(s)=x\}\subset S$: $p(x)=P(\{s\in S | X(s)=x\})$, for all $x$. In the continuous case, the density satisfies $P(a\leq X\leq b)=\int_a^b{p(x)dx}$.

The cumulative distribution function (cdf) $F(x)$ for any random variable $X$ with probability mass or density function $p(x)$ is defined by the total probability of all outcomes $s\in S$ for which $X(s) \leq x$: $F(x)=P(X\leq x)$, for all $x$.

Expectation and variance

  • Expectation: The expected value, expectation or mean, of a discrete random variable $X$ is defined as $E[X]=\sum_i {x_i P(X=x_i)}$. The expected value of a continuous random variable $Y$ is defined as $E[Y]=\int_y{yP(y)dy}$. This is the integral over the domain of $Y$, where $P(y)$ is the probability density function of $Y$. An important property of the expectation is that it is a linear functional, i.e., $E[aX+bY]=aE[X]+bE[Y]$.
  • Variance: The variance of a discrete random variable $X$ is defined as $VAR[X]=\sum_i {(x_i-E[X])^2 P(X=x_i)}$. The variance of a continuous random variable $Y$ is defined as $VAR[Y]=\int_y {(y-E[Y])^2 P(y)dy}$. This is the integral over the domain of $Y$ and $P(y)$ is the probability density function of $Y$. The second moment, variance, does not quite have the same linear functional properties as the expectation: $VAR[aX]= a^2 VAR[X]$ and $VAR[X+Y]=VAR[X]+VAR[Y]+2COV(X,Y)$.
  • Covariance: $COV(X,Y)=E[(X-E[X])(Y-E[Y])]$.
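
The definitions above can be evaluated directly for a small discrete distribution. The following is a minimal R sketch, assuming a fair six-sided die as a hypothetical example; the variable names and the scaling constant are illustrative.

```r
# Hypothetical discrete random variable: a fair six-sided die
x  <- 1:6
px <- rep(1 / 6, 6)                     # P(X = x_i)

mean_x <- sum(x * px)                   # E[X] = sum of x_i * P(X = x_i)
var_x  <- sum((x - mean_x)^2 * px)      # VAR[X] = sum of (x_i - E[X])^2 * P(X = x_i)

# Check the scaling property VAR[aX] = a^2 VAR[X] for an illustrative constant a
a      <- 3
var_ax <- sum((a * x - a * mean_x)^2 * px)

print(c(mean = mean_x, variance = var_x, var_aX = var_ax, a2_var = a^2 * var_x))
```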

Bernoulli distribution

A Bernoulli trial is an experiment whose dichotomous outcomes are random (e.g., ‘head’ vs. ‘tail’). $X(s)= \begin{cases} 1, & \text{s=head} \\ 0, & \text{s=tail} \end{cases}$. If $p=P(head)$, then $E[X]=p$ and $VAR[X]=p(1-p)$.

Binomial distribution

Suppose we conduct an experiment observing $n$ trials of a Bernoulli process. If we are interested in the random variable (RV) $X$ = {Number of heads in $n$ trials}, then $X$ is called a binomial RV and its distribution is called the binomial distribution. We say $X \sim B(n,p)$, where $n$ is the sample size and $p$ is the probability of heads during one trial. $P(X=x)={n\choose x} p^x (1-p)^{n-x}$, for $x=0,1,…,n$, where ${n\choose x}=\frac {n!} {x!(n-x)!}$ is the binomial coefficient. $$E[X]=np;VAR[X]=np(1-p)$$

  • Example: Suppose a dementia clinic (A) has $n=10$ mildly cognitively impaired (MCI) patients. A binomial probability model, $B(n,p)$, can be used to model the number of patients that are expected to convert to dementia within 1 year. A similar dementia clinic (B) reported that they tend to see 30% of the MCI patients convert to dementia annually. Then the probability that $k$ patients (from the 10 MCIs in clinic A) would convert to dementia is given by the binomial probability. Compute the following:
    • The number of MCI patients in clinic A expected to convert to dementia in a one year time frame ($3$).
    • The variance of the number of expected MCI-to-dementia conversions annually ($2.1$).
    • The probability that 2 or more MCI patients in clinic A will convert to dementia in a year, $P(X≥2)$, ($0.8506917$). Think of the complement.
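
The three answers in this example can be reproduced numerically. The following is a minimal R sketch, assuming the 30% annual conversion rate reported by clinic B also applies to clinic A; the variable names are illustrative.

```r
# Binomial model for the MCI-to-dementia example: X ~ B(n = 10, p = 0.3)
# (assumes the clinic B conversion rate applies to clinic A)
n <- 10
p <- 0.3

expected_conversions <- n * p             # E[X] = np
variance_conversions <- n * p * (1 - p)   # VAR[X] = np(1-p)

# P(X >= 2) = 1 - P(X <= 1), using the complement
p_two_or_more <- 1 - pbinom(1, size = n, prob = p)

print(c(mean = expected_conversions,
        variance = variance_conversions,
        p_two_or_more = p_two_or_more))
```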

Multinomial distribution

The multinomial distribution is an extension of the binomial distribution where the experiment consists of $n$ repeated trials and each trial has a discrete number ($k$) of possible outcomes. In any given trial, the probability that a particular outcome will occur is constant, and the trials are independent.

$ p=P(X_1=r_1 \cap \cdots \cap X_k=r_{k}│r_1 + ⋯ +r_k=n)$ = ${n\choose r_1,…,r_k} p_1^{r_1} p_2^{r_2}…p_k^{r_k}$ for all (∀) $r_1+⋯+r_k=n$ where ${n\choose r_1,…,r_k}=\frac {n!}{r_1! \times … \times r_k!}$.
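
The mass function above can be compared against the base R function <code>dmultinom</code>. The following is a minimal R sketch; the outcome probabilities and counts are hypothetical values chosen only for illustration.

```r
# Hypothetical example: n = 10 trials with k = 3 possible outcomes
p <- c(0.2, 0.3, 0.5)      # outcome probabilities p_1, ..., p_k (sum to 1)
r <- c(2, 3, 5)            # observed counts r_1, ..., r_k
n <- sum(r)                # total number of trials

# Direct evaluation: n! / (r_1! ... r_k!) * p_1^r_1 ... p_k^r_k
manual  <- factorial(n) / prod(factorial(r)) * prod(p^r)

# Built-in multinomial mass function
builtin <- dmultinom(r, size = n, prob = p)

print(c(manual = manual, builtin = builtin))
```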

Geometric distribution

The probability distribution of the number, $X$, of Bernoulli trials needed to obtain one success is called the geometric distribution. It is supported on the set $\{1,2,3,…\}$. $P(X=x)=(1-p)^{x-1}p$, for $x = 1, 2, … $

$$E[X]=\dfrac {1} {p},VAR[X]= \frac {1-p} {p^{2}}$$
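A short R sketch of these formulas follows. Note that base R's `dgeom` counts the number of failures before the first success (support $0,1,2,…$), so the trial-count version used here corresponds to `dgeom(x - 1, p)`. The value $p=0.3$ is an arbitrary choice for illustration.

```r
p <- 0.3   # assumed success probability (illustrative)
x <- 3     # first success occurs on trial 3

# R parameterizes the geometric by failures, hence the shift by 1
p_builtin <- dgeom(x - 1, prob = p)
p_manual  <- (1 - p)^(x - 1) * p

# Check E[X] = 1/p and VAR[X] = (1-p)/p^2 with a long truncated sum
xs  <- 1:1000
pmf <- dgeom(xs - 1, prob = p)
mean_numeric <- sum(xs * pmf)
var_numeric  <- sum((xs - mean_numeric)^2 * pmf)

print(c(mean_numeric = mean_numeric, mean_formula = 1 / p,
        var_numeric = var_numeric, var_formula = (1 - p) / p^2))
```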

Hypergeometric distribution

A discrete probability distribution that describes the number of successes in a sequence of $n$ draws from a finite population without replacement. An experimental design for using the hypergeometric distribution is illustrated in the table below. A shipment of $N$ objects includes $m$ defective ones. The hypergeometric distribution describes the probability that in a sample of $n$ distinct objects drawn from the shipment, exactly $k$ will be defective.

Type | Drawn | Not-Drawn | Total
Defective | $k$ | $m-k$ | $m$
Non-Defective | $n-k$ | $N+k-n-m$ | $N-m$
Total | $n$ | $N-n$ | $N$

$$ P(X=k)=\frac {{m \choose k}{N-m \choose n-k}} {N \choose n}, E[X]=\frac{nm}{N}, VAR[X]=\frac{\frac{nm}{N}(1-\frac{m}{N})(N-n)} {N-1}$$
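The shipment setting can be sketched in R with the base `dhyper` function. Its arguments are ordered differently from the notation above: `dhyper(x, m, n, k)` takes the number of defectives drawn, the number of defectives in the shipment, the number of non-defectives in the shipment, and the sample size. The shipment sizes below are assumed values for illustration only.

```r
N_total <- 50   # N: objects in the shipment (illustrative)
m_def   <- 5    # m: defective objects in the shipment
n_draw  <- 10   # n: objects sampled without replacement
k_def   <- 1    # k: defective objects in the sample

# Built-in mass function versus the binomial-coefficient formula
p_builtin <- dhyper(k_def, m_def, N_total - m_def, n_draw)
p_manual  <- choose(m_def, k_def) * choose(N_total - m_def, n_draw - k_def) /
  choose(N_total, n_draw)

# Mean and variance computed from the full mass function
ks  <- 0:min(m_def, n_draw)
pmf <- dhyper(ks, m_def, N_total - m_def, n_draw)
mean_numeric <- sum(ks * pmf)
var_numeric  <- sum((ks - mean_numeric)^2 * pmf)

# Closed-form mean and variance from the formulas above
mean_formula <- n_draw * m_def / N_total
var_formula  <- mean_formula * (1 - m_def / N_total) * (N_total - n_draw) / (N_total - 1)
```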

Negative binomial distribution

Suppose $X$ is the trial index ($n$) of the $r^{th}$ success, or the total number of experiments ($n$) needed to get $r$ successes. The negative binomial distribution has the following mass function $P(X=n)={n-1 \choose r-1} p^r (1-p)^{(n-r)}$, for $n=r,r+1,r+2,…$, where $n$ is the trial number of the $r^{th}$ success.

$$E[X]=\frac {r} {p},VAR[X]=\frac {r(1-p)} {p^{2}}$$

Suppose $Y$ is the number of failures ($k$) observed before the $r^{th}$ success. $P(Y=k)={k+r-1 \choose k} p^{r} (1-p)^{k}$, for $k=0,1,2,…$. $Y \sim NegBin(r,p)$ gives the probability of $k$ failures and $r$ successes in $n = k+r$ $Bernoulli(p)$ trials with success on the last trial.

$$E[Y]=\frac{r(1-p)}{p},VAR[Y]=\frac {r(1-p)} {p^{2}}$$

NOTE: $X=Y+r,E[X]=E[Y]+r,VAR[X]=VAR[Y]$.
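The two parameterizations can be compared numerically. Base R's `dnbinom(x, size, prob)` uses the failure-count form ($Y$), so the trial-count form ($X$) is obtained by shifting the argument by $r$. The values of $r$, $p$ and $n$ below are arbitrary illustrations.

```r
r <- 3     # required number of successes (illustrative)
p <- 0.4   # success probability (illustrative)
n <- 7     # trial index of the r-th success

# Trial-count form: P(X = n) = choose(n-1, r-1) p^r (1-p)^(n-r)
p_x_manual <- choose(n - 1, r - 1) * p^r * (1 - p)^(n - r)

# Failure-count form via dnbinom: P(Y = n - r), since X = Y + r
p_x_builtin <- dnbinom(n - r, size = r, prob = p)

# Means of the two forms differ by r; the variances coincide
mean_y <- r * (1 - p) / p
mean_x <- r / p
var_xy <- r * (1 - p) / p^2

print(c(p_x_manual = p_x_manual, p_x_builtin = p_x_builtin,
        mean_x = mean_x, mean_y = mean_y, variance = var_xy))
```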

Negative multinomial distribution (NMD)

The NMD is a generalization of the two-parameter $NegBin(r,p)$ to more than one outcome. Suppose we have $m+1$ possible outcomes $\{X_0,…,X_m\}$, each with probability $\{p_0,…,p_m \}$, respectively, where $0<p_i<1$ and $\sum_{i=0}^m {p_i} =1$. Suppose the experiment generates independent outcomes until $X_0$ occurs exactly $k_0$ times, at which point the other outcomes $\{X_1,…,X_m \}$ have occurred $\{k_1,…,k_m \}$ times; then the counts $\{X_{1},…,X_{m}\}$ follow a negative multinomial distribution with parameter vector $(k_0,\{p_{1},…,p_{m}\})$, where $m$ represents the degrees of freedom.

  • In the special case of $m=1$, the NMD reduces to the negative binomial distribution: $X$ is the total number of experiments ($n$) necessary to get $k_{0}$ outcomes of $X_0$ and $n-k_{0}$ outcomes of the other possible outcome $(X_{1})$. $X \sim NegativeMultinomial(k_{0},\{p_{0},p_{1}\})$
  • NMD Probability Mass Function\[ P(k_1, \cdots, k_m|k_0,\{p_1,\cdots,p_m\}) = \left (\sum_{i=0}^m{k_i}-1\right)!\frac{p_0^{k_0}}{(k_0-1)!} \prod_{i=1}^m{\frac{p_i^{k_i}}{k_i!}},\] or equivalently:

\[ P(k_1, \cdots, k_m|k_0,\{p_1,\cdots,p_m\}) = \Gamma\left(\sum_{i=0}^m{k_i}\right)\frac{p_0^{k_0}}{\Gamma(k_0)} \prod_{i=1}^m{\frac{p_i^{k_i}}{k_i!}},\]

where \(\Gamma(x)\) is the Gamma function.
  • Mean (vector)\[\mu=E(X_1,\cdots,X_m)= (\mu_1=E(X_1), \cdots, \mu_m=E(X_m)) = \left ( \frac{k_0p_1}{p_0}, \cdots, \frac{k_0p_m}{p_0} \right).\]
  • Variance-Covariance (matrix)\[Cov(X_i,X_j)= \{cov[i,j]\},\] where

\[ cov[i,j] = \begin{cases} \frac{k_0 p_i p_j}{p_0^2},& i\not= j,\\ \frac{k_0 p_i (p_i + p_0)}{p_0^2},& i=j.\end{cases}\]
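Base R has no built-in NMD mass function, so the sketch below defines a hypothetical helper, `dnegmultinom` (the name and argument order are assumptions made for this illustration). It evaluates the mass function on the log scale with `lgamma`, using $(n-1)! = \Gamma(n)$, and checks that the $m=1$ case agrees with `dnbinom`.

```r
# Hypothetical helper (not a base R function): NMD probability mass function.
# k  : counts (k_1, ..., k_m) of the outcomes X_1, ..., X_m
# k0 : required number of occurrences of X_0 (the stopping outcome)
# p  : probabilities (p_1, ..., p_m); p_0 is recovered as 1 - sum(p)
dnegmultinom <- function(k, k0, p) {
  p0 <- 1 - sum(p)
  log_pmf <- lgamma(k0 + sum(k)) - lgamma(k0) - sum(lgamma(k + 1)) +
    k0 * log(p0) + sum(k * log(p))
  exp(log_pmf)
}

# Two non-stopping outcomes (m = 2), illustrative values
k0 <- 3
p  <- c(0.2, 0.3)                       # so p_0 = 0.5
prob_2_1 <- dnegmultinom(c(2, 1), k0, p)

# Mean vector from the formula above: k_0 * p_i / p_0
nmd_mean <- k0 * p / (1 - sum(p))

# Special case m = 1: the NMD agrees with the negative binomial
nmd_m1 <- dnegmultinom(4, k0 = 3, p = 0.6)
nb_m1  <- dnbinom(4, size = 3, prob = 0.4)
```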

Poisson distribution

The discrete Poisson distribution expresses the probability of a number of events occurring in a fixed interval of time given these events occur at a known average rate that is independent of the time since the last event. The figure below shows the PDF of a Poisson distribution with varying parameter ($\lambda$) values.

SMHS Probability Fig1.png

The distribution is right-skewed, but for increasing $\lambda$ (say $\lambda>40$) the distribution becomes bell-shaped. See the Normal approximation to the Poisson distribution section. The figure below shows the CDF of the Poisson distribution with varying parameter values.

SMHS ProbabilityDistribution fig2.png

You can also see the Distributome interactive Poisson calculator.

$$P(X=k)=\frac{\lambda^{k}e^{-\lambda}}{k!},E[X]=\lambda,VAR[X]=\lambda.$$

The CDF is discontinuous at the integer values of $k$ and flat everywhere else because the variable only takes on integer values. That is, the CDF of the Poisson distribution is right continuous but not left continuous. Also note that the CDF equals 0 for $k<0$, jumps to $e^{-\lambda}$ at $k=0$, and is non-decreasing with increasing numbers of occurrences. It approaches 1 as the number of occurrences grows.
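The mass function and the step shape of the CDF can be explored with base R's `dpois` and `ppois`. This is only a sketch; the rate $\lambda=2$ is an arbitrary choice.

```r
lambda <- 2   # assumed rate parameter (illustrative)

# Mass function at k = 3 versus the formula lambda^k e^(-lambda) / k!
p_builtin <- dpois(3, lambda)
p_manual  <- lambda^3 * exp(-lambda) / factorial(3)

# The CDF is flat between integers: its value on [2, 3) equals its value at 2,
# and its value just below 2 equals its value at 1
cdf_at_2    <- ppois(2, lambda)
cdf_at_2.5  <- ppois(2.5, lambda)
cdf_at_1.9  <- ppois(1.9, lambda)
cdf_at_1    <- ppois(1, lambda)

# The CDF at 0 is P(X = 0) = e^(-lambda)
cdf_at_0 <- ppois(0, lambda)

# Mean and variance from a long truncated sum; both should be close to lambda
ks  <- 0:200
pmf <- dpois(ks, lambda)
mean_numeric <- sum(ks * pmf)
var_numeric  <- sum((ks - mean_numeric)^2 * pmf)
```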

Normal distribution

The continuous standard normal distribution has a

  • probability density function $ f(x)= {e^{-x^2 \over 2} \over \sqrt{2 \pi}} $ and a
  • cumulative distribution function $\Phi(y)= \int_{-\infty}^{y}{{e^{-x^2 \over 2} \over \sqrt{2 \pi}} dx}.$
  • Computing critical values and probability values for the Normal distribution.
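As a sketch of how probability values and critical values can be computed, the following R code integrates the standard normal density numerically and compares the result with the built-in `pnorm` and `qnorm` functions. The cut-off 1.96 and the probability 0.975 are illustrative choices.

```r
# Standard normal density, written out as in the formula above
f <- function(x) exp(-x^2 / 2) / sqrt(2 * pi)

y <- 1.96

# Phi(y) by numerical integration of the density from -Inf to y
phi_numeric <- integrate(f, lower = -Inf, upper = y)$value

# Phi(y) from the built-in distribution function
phi_builtin <- pnorm(y)

# Critical value: the quantile z with Phi(z) = 0.975
z_crit <- qnorm(0.975)

print(c(phi_numeric = phi_numeric, phi_builtin = phi_builtin, z_crit = z_crit))
```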

Moment Generating Function

For a random variable $X$, the moment generating function (MGF) is defined by $$ M_X(t)=E(e^{tX})= \begin{cases} \sum_x {e^{tx}P(X=x)}, & \text{X=discrete} \\ \int_{-\infty}^{\infty} {e^{tx}f_X(x)dx}, & \text{X=continuous} \end{cases},$$ for all $t$ for which the sum (or integral) is convergent.

MGF Properties
  • When $ M_X(t)$ exists for $|t|< t_0$, the MGF uniquely determines the distribution of $X$.
  • Moments about the origin may be found by power series expansion:

$$ M_X(t)=E(e^{tX})= E \bigg ( \sum_{k=0}^{\infty} {\frac{(tX)^k} {k!}}\bigg )=$$ $$\sum_{k=0}^{\infty} {\frac{t^k} {k!}E(X^k)}.$$

Hence, for a given MGF, the power-series (Taylor) expansion of this function of $t$ yields the $k^{th}$ moment of the random variable about the origin, as the coefficient, $E(X^k)$, of the term $\frac{t^k} {k!}$ in the expansion.
  • Moments about the origin may also be determined by differentiating the MGF:

$$\frac{d^k} {dt^k} \{M_X(t) \} = \frac{d^k} {dt^k} \{E(e^{tX}) \} = E \bigg [ \frac{d^k} {dt^k} e^{tX} \bigg ] = E\bigg [ X^ke^{tX} \bigg ].$$

Thus, $\frac{d^k} {dt^k} \{M_X(t) \}_{t=0} = E(X^k).$
Note that $M_{a+bX}(t) = E\bigg ( e^{t(a+bX)} \bigg ) = e^{at}M_X(bt).$
Example: if $Z \sim N(0,1)$, then the MGF of $Z$ is $$M_Z(t) = E(e^{tZ})= \int_{-\infty}^{\infty} {e^{tz}\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2} dz} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} {e^{-\frac{1}{2}(z^2 -2tz+t^2)+ \frac{1}{2}t^2}dz}=$$ $$ e^{\frac{1}{2}t^2} \bigg \{ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} {e^{-\frac{1}{2}(z-t)^2}dz} \bigg \} = e^{\frac{1}{2}t^2},$$ since the term in braces is the integral of the $N(t,1)$ density, which equals 1.
  • For independent variables ($X_1, X_2, ..., X_n$): $MGF_{X_1+...+X_n} =MGF_{X_1}MGF_{X_2}...MGF_{X_n}.$
For example, if $Z_1, Z_2, ..., Z_n \sim N(0,1)$ are independent, then $V=Z_1^2+...+Z_n^2 \sim \chi_n^2, $ see the proof here.
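The standard normal example and the differentiation property can be checked numerically. The sketch below compares a numerical evaluation of $E(e^{tZ})$ with the standard closed form $e^{t^2/2}$ for the $N(0,1)$ MGF, and then approximates the first two moments by finite differences at $t=0$. The function names, the grid of $t$ values and the step size `h` are assumptions made for this illustration.

```r
# E(e^{tZ}) for Z ~ N(0,1), by numerical integration against the density
mgf_numeric <- function(t) {
  integrate(function(z) exp(t * z) * dnorm(z), lower = -Inf, upper = Inf)$value
}

# Closed form of the N(0,1) MGF
mgf_closed <- function(t) exp(t^2 / 2)

t_values <- c(-1, 0, 0.5, 1)
comparison <- cbind(t = t_values,
                    numeric = sapply(t_values, mgf_numeric),
                    closed_form = mgf_closed(t_values))
print(comparison)

# Moments about the origin from derivatives of the MGF at t = 0,
# approximated by central finite differences
h <- 1e-3
first_moment  <- (mgf_closed(h) - mgf_closed(-h)) / (2 * h)                    # E(Z), close to 0
second_moment <- (mgf_closed(h) - 2 * mgf_closed(0) + mgf_closed(-h)) / h^2   # E(Z^2), close to 1
```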

Applications

  • The article examines how a proactive inside sales force can be critical to serving mid-market and small customers as part of a broader multichannel strategy. It also includes steps for initiating an effective program.
  • This article provides an example of Negative Binomial Experiment by SOCR. The goal is to provide a simulation demonstrating properties of the Negative Binomial ($k,p$) distribution. The applet facilitates the calculations of the Negative Binomial mass/density function, the moments and cumulative distribution function. It gives the specific steps of the experiment in SOCR and it allows users to learn about the variation of the distribution with changing parameters.

Generating Probability Tables

One can use R (and many other programming languages) to generate probability tables like the popular SOCR Probability Tables. You can also use the Java Applet or the HTML5/JavaScript Webapp interactive F-Distribution calculators to obtain more dense and accurate measures of probability or critical values.

The following example generates one of the F distribution tables: $F(\alpha=0.001, df.num, df.deno)$:

# Define the right-tail probability of interest, $\alpha=0.001$
right_tail_p <- 0.001

# Define the vectors storing the numerator (n1, columns) and denominator (n2, rows)
# degrees of freedom for $F(\alpha, n_1, n_2)$. Note that Inf corresponds to $\infty$.
n1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 40, 60, 120, Inf)
n2 <- c(1:30, 40, 60, 120, Inf)

# Set the printing precision (4 significant digits)
options(digits=4)

# Generate an empty matrix of critical f-values
f_table <- matrix(ncol=length(n1), nrow=length(n2))

# Use the F distribution quantile function, qf(), to fill in the matrix values in a nested 2-loop.
# Recall that R provides the density (df), distribution function (pf), quantile function (qf)
# and random generation (rf) for the F distribution.
for (i in seq_along(n2)){
    for (j in seq_along(n1)){
        f_table[i,j] <- qf(right_tail_p, n1[j], n2[i], lower.tail = FALSE)
    }
}

# Label rows and columns
rownames(f_table) <- n2; colnames(f_table) <- n1

# Print results
f_table

# Save results to a file (adjust the path for your system)
write.table(f_table, file="C:\\User\\f_table.txt")

Software

Problems

  • If sampling distributions of sample means are examined for samples of size 1, 5, 10, 16 and 50, you will notice that as sample size increases, the shape of the sampling distribution appears more like that of the:
(a) Normal distribution
(b) Uniform distribution
(c) Population distribution
(d) Binomial distribution
  • Which of the following statements best describes the effect on the Binomial Probability Model if the number of trials is held constant and $p$ (the probability of "success") increases?
(a) None of these statements are true.
(b) The mean and the standard deviation both increase.
(c) The mean decreases and the standard deviation increases.
(d) The mean increases and the standard deviation decreases.
(e) The mean and standard deviation both decrease.
  • Suppose you draw one card from a standard deck three times, with replacement. What is the probability that you get spades all three times? Choose one answer.
(a) 0.002
(b) 0.321
(c) 0.015
(d) 0.021
  • Suppose the number of cars that enter a parking lot in an hour is a Poisson random variable, and suppose that $P(X=0)=0.05$. Determine the variance of $X$.
(a) 0.349
(b) 3.232
(c) 9.321
(d) 2.996
  • A researcher converts 100 lung capacity measurements to z-scores. The lung capacity measurements do not follow a normal distribution. What can we say about the standard deviation of the 100 z-scores?
(a) It depends on the standard deviation of the raw scores.
(b) It equals 1.
(c) It equals 100.
(d) It must always be less than the standard deviation of the raw scores.
(e) It depends on the shape of the raw score distribution.
  • Among first year students at a certain university, scores on the verbal SAT follow the normal curve. The average is around 500 and the $SD$ is about 100. Tatiana took the SAT, and placed at the 85th percentile. What was her verbal SAT score?
(a) 604
(b) 560
(c) 90
(d) 403
  • Consider a random sample of 100 orc soldiers in which the mean and the standard deviation are 200 lbs. and 20 lbs., respectively. We can be 68% confident that the mean weight in the population of orc soldiers is between
(a) 196 to 204 lbs
(b) 198 to 202 lbs
(c) 194 to 206 lbs
(d) None of the above
  • The Rockwell hardness of certain metal pins is known to have a mean of 50 and a standard deviation of 1.5. If the distribution of all such pin hardness measurements is known to be normal, what is the probability that the average hardness for a random sample of nine pins is at least 50.5?
(a) Approximately 4
(b) 0.4
(c) Approximately 0.1587
(d) Approximately 0
  • You read that the heights of college women are nearly normal with a mean of 65 inches and a standard deviation of 2 inches. If Vanessa is at the 10th percentile (shortest 10% for women) in height for college women, then her height is closest to:
(a) 64.5 inches
(b) It cannot be determined from this information
(c) 60.5 inches
(d) 62.44 inches
  • The settlement (in cm.) of a structure shown in the following figure may be evaluated from S = 0.3A + 0.2B + 0.1C,
SMHS ProbabDist Fig3.png
Where A, B, and C are respectively the thickness (in m) of the three layers of soil as shown. Suppose A, B, and C are modeled as independent normal random variables as: $A \sim N(5,1)$, $B \sim N(8,2)$, $C \sim N(7,1)$,
(a) Determine the probability that the settlement will exceed 4 cm.
(b) If the total thickness of the three layers is known exactly as 20 m; and furthermore, thicknesses A and B are correlated with correlation coefficient equal to 0.5, determine the probability that the settlement will exceed 4 cm.
  • Suppose that the distribution of $X$ in the population is strongly skewed to the left. If you took 200 independent and random samples of size 3 from this population, calculated the mean for each of the 200 samples and drew the distribution of the sample means, what would the sampling distribution of the means look like?
(a) It will be perfectly normal and the mean will be equal to the median.
(b) It will be close to the normal and the mean will be close to the median.
(c) On a p-plot, most of the points will be on the line.
(d) It will be skewed to the left and the mean will be less than the median.
  • A polling agency has been hired to predict the proportion of voters who favor a certain candidate. The polling agency picks a random sample of 1000 voters of which 400 indicate that they favor the candidate. If they increase the sample size to 2000, how does the standard error change?
(a) The standard error will decrease by one-fourth.
(b) The standard error will not change; the margin of error changes.
(c) Since the sample size is doubled, the standard error will be halved.
(d) The standard error will decrease not by a factor of 1/2 but by the square root of 1/2.
  • The probability of winning a certain instant scratch-n-win game is 0.02. You play the game 80 times. Find the probability that you win 3 times.
(a) 0.2983
(b) 0.1378
(c) 0.3231
(d) 0.2391
  • After firing 1,000 boxes of ammunition, a certain handgun jammed according to a Poisson distribution with a mean of 0.4 per box of ammunition. Approximate the probability that more than 350 boxes of ammunition contain some that jammed.
(a) .0032
(b) .0012
(c) .0231
(d) .0089

References




