Scientific Methods for Health Sciences - Probability Distributions
Overview
Distributions are the fundamental basis of probability theory. There are two basic types of processes that we observe in nature, discrete and continuous, and they are modeled by discrete and continuous distributions, respectively (there are also mixture, multidimensional, and tensor distributions, which are not discussed here). The type of distribution depends on the type of data: discrete and continuous distributions represent discrete and continuous random variables, respectively. This section introduces the most common discrete and continuous distributions and the relationships between them.
- Discrete distributions: Bernoulli distribution, Binomial distribution, Multinomial distribution, Geometric distribution, Hypergeometric distribution, Negative binomial distribution, Negative multinomial distribution, Poisson distribution.
- Continuous distributions: Normal distribution, Multivariate normal distribution.
Motivation
We have talked about different types of data and the fundamentals of probability theory. In order to capture and estimate the patterns of data, we introduced the concept of distribution. A probability distribution assigns a probability to each measurable subset of the possible outcomes of a random experiment. It can be either univariate or multivariate. A univariate distribution gives the probabilities of a single random variable, while a multivariate distribution (a joint probability distribution) gives the probabilities of a random vector, i.e., a set of two or more random variables, taking on various combinations of values. Consider the coin-tossing experiment: what would be the distribution of the outcome?
Theory
Random variables: a random variable is a function or a mapping from a sample space into the real numbers (most of the time). In other words, a random variable assigns real values to outcomes of experiments.
Probability density / mass and (cumulative) distribution functions

The probability density (for a continuous random variable) or probability mass (for a discrete random variable) function (pdf/pmf) is the function defined by the probability of the corresponding subset of the sample space $S$: $p(x)=P(\{s\in S \mid X(s)=x\})$, for all $x$. The cumulative distribution function (cdf) $F(x)$ of any random variable $X$ with probability mass or density function $p(x)$ is defined by the total probability of the subset $\{s\in S \mid X(s)\leq x\}$: $F(x)=P(X\leq x)$, for all $x$.
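For a discrete variable, the cdf is just the running total of the pmf over the support. Below is a minimal sketch of this accumulation in plain Python; the three-point distribution is hypothetical and chosen only for illustration.

```python
# Sketch: build the cdf F(x) = P(X <= x) of a discrete RV by accumulating its pmf.
from itertools import accumulate

support = [0, 1, 2]            # hypothetical support of X
pmf = [0.5, 0.3, 0.2]          # P(X = x) for each support point; sums to 1
cdf = list(accumulate(pmf))    # F(x) at each support point

print(list(zip(support, cdf)))  # [(0, 0.5), (1, 0.8), (2, 1.0)]
```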
Expectation and variance
- Expectation: The expected value (expectation, or mean) of a discrete random variable $X$ is defined as $E[X]=\sum_i {x_i P(X=x_i)}$. The expected value of a continuous random variable $Y$ is defined as $E[Y]=\int {y\,P(y)\,dy}$, where the integral is over the domain of $Y$ and $P(y)$ is the probability density function of $Y$. An important property of expectation is that it is a linear functional, i.e., $E[aX+bY]=aE[X]+bE[Y]$.
- Variance: The variance of a discrete random variable $X$ is defined as $VAR[X]=\sum_i {(x_i-E[X])^2 P(X=x_i)}$. The variance of a continuous random variable $Y$ is defined as $VAR[Y]=\int {(y-E[Y])^2 P(y)\,dy}$, where the integral is over the domain of $Y$ and $P(y)$ is the probability density function of $Y$. The variance does not quite have the same linear-functional properties as the expectation: $VAR[aX]= a^2 VAR[X]$ and $VAR[X+Y]=VAR[X]+VAR[Y]+2COV(X,Y)$.
- Covariance: $COV(X,Y)=E[(X-E[X])(Y-E[Y])]$. A numeric check of these identities follows this list.
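As a sanity check of the identities above, the sketch below computes $E$, $VAR$, and $COV$ for a small joint pmf in plain Python and verifies $VAR[X+Y]=VAR[X]+VAR[Y]+2COV(X,Y)$; the particular pmf values are hypothetical and illustrative only.

```python
# Sketch: verify the expectation/variance/covariance identities on a tiny joint pmf.

# Hypothetical joint pmf P(X = x, Y = y); the four probabilities sum to 1.
pmf = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

E = lambda f: sum(p * f(x, y) for (x, y), p in pmf.items())  # E[f(X, Y)]

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VX  = E(lambda x, y: (x - EX) ** 2)                 # VAR[X]
VY  = E(lambda x, y: (y - EY) ** 2)                 # VAR[Y]
COV = E(lambda x, y: (x - EX) * (y - EY))           # COV(X, Y)
VXY = E(lambda x, y: (x + y - EX - EY) ** 2)        # VAR[X + Y], computed directly

print(round(VXY, 10), round(VX + VY + 2 * COV, 10))  # both print 0.56: the identity holds
```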
Bernoulli distribution
A Bernoulli trial is an experiment whose dichotomous outcomes are random (e.g., ‘head’ vs. ‘tail’). $X(outcome)= \begin{cases} 0, & \text{s=tail} \\ 1, & \text{s=head} \end{cases}$. If $p=P(head)$, then $E[X]=p$ and $VAR[X]=p(1-p)$.
Binomial distribution
Suppose we conduct an experiment observing a process of $n$ independent Bernoulli trials. If we are interested in the RV $X$ = {number of heads in the $n$ trials}, then $X$ is called a Binomial RV and its distribution is called the Binomial distribution, $X \sim B(n,p)$, where $n$ is the sample size and $p$ is the probability of a head on any single trial. $P(X=x)={n\choose x} p^x (1-p)^{n-x}$, for $x=0,1,\ldots,n$, where ${n\choose x}=\frac {n!} {x!(n-x)!}$ is the binomial coefficient. $$E[X]=np,\quad VAR[X]=np(1-p)$$
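The pmf above is easy to evaluate directly with the standard library; the sketch below does so for hypothetical values $n=10$, $p=0.3$ and confirms the mean and variance formulas by brute-force summation.

```python
# Sketch: Binomial(n, p) pmf from the formula above, plus a mean/variance check.
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3                                        # hypothetical parameters
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(pmf))
var  = sum((x - mean)**2 * q for x, q in enumerate(pmf))

print(round(mean, 6), round(var, 6))   # 3.0 and 2.1, i.e. np and np(1-p)
```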
Multinomial distribution
The multinomial distribution is an extension of the binomial in which the experiment consists of $n$ repeated independent trials and each trial has a fixed number $k$ of possible outcomes; on any given trial, the probability that a particular outcome will occur is constant.
$P(X_1=r_1, \ldots, X_k=r_k)={n\choose r_1,\ldots,r_k} p_1^{r_1} p_2^{r_2}\cdots p_k^{r_k}$, for all $r_1+\cdots+r_k=n$, where ${n\choose r_1,\ldots,r_k}=\frac {n!}{r_1!\times \cdots \times r_k!}$ is the multinomial coefficient.
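The sketch below evaluates this pmf with the standard library; the die-rolling counts and probabilities are purely illustrative.

```python
# Sketch: multinomial pmf P(X_1 = r_1, ..., X_k = r_k) for n = r_1 + ... + r_k trials.
from math import factorial

def multinomial_pmf(counts, probs):
    """Multinomial coefficient times the product of p_i ** r_i."""
    coef = factorial(sum(counts))
    for r in counts:
        coef //= factorial(r)
    prob = 1.0
    for r, p in zip(counts, probs):
        prob *= p**r
    return coef * prob

# Hypothetical example: 6 rolls of a fair die giving two 1s, two 2s, one 3, one 4.
print(multinomial_pmf([2, 2, 1, 1, 0, 0], [1/6] * 6))   # ~ 0.00386
```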
Geometric distribution
The probability distribution of the number $X$ of independent Bernoulli($p$) trials needed to get one success, supported on the set $\{1,2,3,\ldots\}$: $P(X=x)=(1-p)^{x-1}p$, for $x = 1, 2, \ldots$
$$E[X]=\dfrac {1} {p},\quad VAR[X]= \dfrac {1-p} {p^{2}}$$
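A quick way to check $E[X]=1/p$ is to sum the truncated series $\sum_x x\,P(X=x)$ numerically; the sketch below does this for a hypothetical $p=0.25$.

```python
# Sketch: Geometric(p) pmf and a numeric check of E[X] = 1/p via a truncated series.
p = 0.25                                      # hypothetical success probability
pmf = lambda x: (1 - p)**(x - 1) * p          # P(X = x) for x = 1, 2, 3, ...

mean = sum(x * pmf(x) for x in range(1, 500))   # truncation error is negligible here
print(round(mean, 6))                           # ~ 4.0 = 1/p
```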
Hypergeometric distribution
A discrete probability distribution that describes the number of successes in a sequence of $n$ draws from a finite population without replacement. An experimental design that calls for the Hypergeometric distribution is illustrated in the table below: a shipment of $N$ objects contains $m$ defective ones. The Hypergeometric distribution describes the probability that a sample of $n$ distinct objects drawn from the shipment contains exactly $k$ defective objects.
Type          | Drawn | Not-Drawn | Total
Defective     | k     | m-k       | m
Non-Defective | n-k   | N+k-n-m   | N-m
Total         | n     | N-n       | N
$$P(X=k)=\dfrac {{m \choose k}{N-m \choose n-k}} {{N \choose n}},\quad E[X]=\frac{nm}{N},\quad VAR[X]=\frac{\frac{nm}{N}(1-\frac{m}{N})(N-n)} {N-1}$$
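The sketch below evaluates this pmf for a hypothetical shipment ($N=50$, $m=5$, $n=10$) and confirms that the probabilities sum to 1 and that the mean equals $nm/N$.

```python
# Sketch: hypergeometric pmf for the shipment example (N objects, m defective, n drawn).
from math import comb

def hypergeom_pmf(k, N, m, n):
    """P(X = k): exactly k defectives among n draws without replacement."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

N, m, n = 50, 5, 10                       # hypothetical shipment parameters
ks = range(0, min(m, n) + 1)              # possible numbers of defectives in the sample

print(round(sum(hypergeom_pmf(k, N, m, n) for k in ks), 10))      # 1.0: pmf sums to one
print(round(sum(k * hypergeom_pmf(k, N, m, n) for k in ks), 10))  # 1.0 = nm/N
```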
Negative binomial distribution
Suppose $X$ = trial index ($n$) of the $r^{th}$ success, i.e., the total number of experiments ($n$) needed to get $r$ successes. $P(X=n)={n-1 \choose r-1} p^r (1-p)^{n-r}$, for $n=r,r+1,r+2,\ldots$, where $n$ is the trial number of the $r^{th}$ success.
$$E[X]=\dfrac {r} {p},\quad VAR[X]=\dfrac {r(1-p)} {p^{2}}$$
Suppose $Y$ = number of failures ($k$) before the $r^{th}$ success. $P(Y=k)={k+r-1 \choose k} p^{r} (1-p)^{k}$, for $k=0,1,2,\ldots$, where $k$ is the number of failures before the $r^{th}$ success. $Y \sim NegBin(r,p)$ gives the probability of $k$ failures and $r$ successes in $n = k+r$ Bernoulli($p$) trials, with a success on the last trial.
$$E[Y]=\dfrac{r(1-p)}{p},\quad VAR[Y]=\dfrac {r(1-p)} {p^{2}}$$
NOTE: $X=Y+r,E[X]=E[Y]+r,VAR[X]=VAR[Y]$.
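The two parameterizations are linked by the shift $X=Y+r$; the sketch below evaluates both pmfs for hypothetical $r=3$, $p=0.4$ and shows $P(Y=k)=P(X=k+r)$ term by term.

```python
# Sketch: the two negative binomial parameterizations and the shift X = Y + r.
from math import comb

def pmf_trials(n, r, p):
    """P(X = n): the r-th success occurs on trial n."""
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

def pmf_failures(k, r, p):
    """P(Y = k): k failures occur before the r-th success."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

r, p = 3, 0.4                       # hypothetical parameters
for k in range(5):                  # each pair of printed values agrees
    print(pmf_failures(k, r, p), pmf_trials(k + r, r, p))
```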
Negative multinomial distribution (NMD)

A generalization of the two-parameter negative binomial distribution $NB(r,p)$ to more than one outcome. Suppose we have $m+1$ possible outcomes $\{X_{0},\ldots,X_{m}\}$, $m\geq 1$, with probabilities $\{p_0,\ldots,p_m\}$ respectively, where $0<p_i<1$ and $\sum_{i=0}^m p_i =1$. Suppose the experiment generates independent outcomes until $X_0$ occurs exactly $k_0$ times. Then the counts of the remaining outcomes, $(X_{1},\ldots,X_{m})$, follow a Negative Multinomial distribution with parameter vector $(k_0,\{p_{1},\ldots,p_{m}\})$; the number of degrees of freedom is $m$. In the simplest case ($m=1$), $X$ is the total number of experiments ($n$) needed to get $k_{0}$ outcomes $X_0$ and $n-k_{0}$ outcomes of the other possible outcome $X_{1}$: $X \sim NegativeMultinomial(k_{0},\{p_{0},p_{1}\})$.
$$P(k_{1},\ldots,k_{m} \mid k_{0},\{p_{1},\ldots,p_{m}\}) = \left(\sum_{i=0}^{m}k_{i}-1\right)!\,\dfrac {p_{0}^{k_{0}}}{(k_{0}-1)!}\,\prod_{i=1}^{m}\dfrac{p_{i}^{k_{i}}} {k_{i}!}$$
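The sketch below implements this pmf and checks the $m=1$ reduction against the negative binomial pmf above; the parameter values are hypothetical.

```python
# Sketch: negative multinomial pmf as reconstructed above; m = 1 reduces to NegBin.
from math import comb, factorial

def neg_multinomial_pmf(ks, k0, p0, ps):
    """P(k_1, ..., k_m | k_0, {p_1, ..., p_m}); requires p0 + sum(ps) == 1."""
    coef = factorial(k0 + sum(ks) - 1) / factorial(k0 - 1)
    prob = p0**k0
    for k, p in zip(ks, ps):
        prob *= p**k / factorial(k)
    return coef * prob

k0, p0, p1, k1 = 3, 0.4, 0.6, 2                               # hypothetical parameters
print(round(neg_multinomial_pmf([k1], k0, p0, [p1]), 10))     # ~ 0.13824
print(round(comb(k1 + k0 - 1, k1) * p0**k0 * p1**k1, 10))     # same value from NegBin
```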
Poisson distribution

A discrete distribution that expresses the probability of a given number of events occurring in a fixed interval of time, if these events occur with a known average rate and independently of the time since the last event.
$$P(X=k)=\dfrac{\lambda^k e^{-\lambda}}{k!},\quad E[X]=\lambda,\quad VAR[X]=\lambda.$$

Figure 1: PDF of the Poisson distribution with changing parameter.

Figure 1 shows the PDF of the Poisson distribution for increasing values of the parameter λ: for small λ the mass concentrates near zero, while for larger λ (say, greater than 1) the peak moves to the right and the distribution takes on a skewed bell shape. The function is only defined at integer values of k, so the connecting lines are only guides for the eye.

Figure 2: CDF of the Poisson distribution with changing parameters.

Figure 2 shows the CDF of the Poisson distribution for the same values of λ; the curve rises to 1 faster for smaller λ. The CDF jumps at the integer values of k and is flat everywhere else, because the variable only takes on integer values; that is, the CDF of the Poisson distribution is right-continuous but not left-continuous. Note also that the CDF is 0 below zero occurrences, is non-decreasing in the number of occurrences, and approaches 1 as the number of occurrences grows.
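The sketch below evaluates the Poisson pmf for a hypothetical λ = 4 and confirms $E[X]=VAR[X]=\lambda$ by truncated summation.

```python
# Sketch: Poisson(lam) pmf with a numeric check that mean and variance both equal lam.
from math import exp, factorial

lam = 4.0                                       # hypothetical rate parameter
pmf = lambda k: lam**k * exp(-lam) / factorial(k)

ks = range(0, 100)                              # truncation; the tail mass is negligible
mean = sum(k * pmf(k) for k in ks)
var  = sum((k - mean)**2 * pmf(k) for k in ks)

print(round(mean, 6), round(var, 6))            # both ~ 4.0 = lam
```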
- SOCR Home page: http://www.socr.umich.edu