AP Statistics Curriculum 2007 MultivariateNormal

EBook - Multivariate Normal Distribution

The multivariate normal distribution, or multivariate Gaussian distribution, is a generalization of the univariate (one-dimensional) normal distribution to higher dimensions. A random vector is said to be multivariate normally distributed if every linear combination of its components has a univariate normal distribution. The multivariate normal distribution may be used to study different associations (e.g., correlations) between real-valued random variables.

Definition

In k dimensions, a random vector \(X = (X_1, \cdots, X_k)\) is multivariate normally distributed if it satisfies any one of the following equivalent conditions (Gut, 2009):

Every linear combination of its components, Y = a₁X₁ + … + a_kX_k, is normally distributed. In other words, for any constant vector \(a\in R^k\), the linear combination \(Y = a^TX = \sum_{i=1}^{k}{a_iX_i}\), which is a univariate random variable, has a univariate normal distribution.

There exists a random ℓ-vector Z, whose components are independent normal random variables, a k-vector μ, and a k×ℓ matrix A, such that \(X = AZ + \mu\). Here ℓ is the rank of the variance-covariance matrix.

There is a k-vector μ and a symmetric, nonnegative-definite k×k matrix Σ, such that the characteristic function of X is

\[ \varphi_X(u) = \exp\Big( iu^T\mu - \tfrac{1}{2} u^T\Sigma u \Big). \]

When the support of X is the entire space R^k, there exists a k-vector μ and a symmetric positive-definite k×k variance-covariance matrix Σ, such that the probability density function of X can be expressed as

\[ f_X(x) = \frac{1}{ (2\pi)^{k/2}|\Sigma|^{1/2} } \exp\!\Big( {-\tfrac{1}{2}}(x-\mu)^T\Sigma^{-1}(x-\mu) \Big), \] where \(|\Sigma|\) is the determinant of Σ, and where \((2\pi)^{k/2}|\Sigma|^{1/2} = |2\pi\Sigma|^{1/2}\). This formulation reduces to the density of the univariate normal distribution if Σ is a scalar (i.e., a 1×1 matrix).
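The density above can be evaluated numerically. The following is a minimal sketch using NumPy, not a reference implementation; the function names `mvn_pdf` and `normal_pdf` and the test values are illustrative assumptions. It also checks the stated reduction to the univariate density when Σ is a 1×1 matrix.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of a nonsingular multivariate normal at the point x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    k = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), computed by solving
    # Sigma w = (x - mu) rather than forming the inverse explicitly.
    quad = diff @ np.linalg.solve(sigma, diff)
    norm_const = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(sigma))
    return float(np.exp(-0.5 * quad) / norm_const)

def normal_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return float(np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var))

# Illustrative values: with k = 1 and Sigma = [[4.0]] the two densities agree.
p_multi = mvn_pdf(0.7, 1.0, 4.0)
p_uni = normal_pdf(0.7, 1.0, 4.0)
print(p_multi, p_uni)
```

Solving the linear system instead of inverting Σ is a numerical convenience only; both give the same quadratic form when Σ is positive definite.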

If the variance-covariance matrix is singular, the corresponding distribution has no density with respect to Lebesgue measure on R^k, because it is concentrated on a lower-dimensional affine subspace. An example of this case is the distribution of the vector of residuals in ordinary least squares regression. Note also that the X_i are in general not independent; they can be seen as the result of applying the matrix A to a collection of independent Gaussian variables Z.
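The representation \(X = AZ + \mu\) also gives a simple way to simulate such vectors. The sketch below assumes a particular μ and Σ chosen only for illustration, and takes A to be the Cholesky factor of Σ (one valid choice among many, since any A with \(AA^T = \Sigma\) works). It then shows a rank-deficient A, which yields a singular variance-covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not taken from the text).
mu = np.array([1.0, -2.0])
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

# Lower-triangular A with A @ A.T equal to sigma.
A = np.linalg.cholesky(sigma)

n = 200_000
Z = rng.standard_normal(size=(2, n))   # independent N(0, 1) components
X = A @ Z + mu[:, None]                # each column is one draw of X

sample_mean = X.mean(axis=1)
sample_cov = np.cov(X)
print(sample_mean)
print(sample_cov)

# Singular case: a 2x1 matrix A1 maps a single Gaussian variable into R^2,
# so the covariance A1 @ A1.T has rank 1 and the draws lie on a line.
A1 = np.array([[1.0],
               [2.0]])
sigma_singular = A1 @ A1.T
rank_singular = np.linalg.matrix_rank(sigma_singular)
print(rank_singular)
```

The sample mean and sample covariance should be close to μ and Σ for large n, while the rank of the second covariance matrix is 1, matching ℓ in the second condition.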

Bivariate (2D) case

See the SOCR Bivariate Normal Distribution Activity and corresponding Webapp.

In two dimensions, for the nonsingular bivariate normal distribution (\(k=rank(\Sigma) = 2\)), the probability density function of the vector (X,Y) is \[ f(x,y) = \frac{1}{2 \pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)}\left[ \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y} \right] \right), \] where ρ is the correlation between X and Y. In this case, \[ \mu = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}. \]
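One way to convince oneself that the explicit bivariate formula agrees with the general matrix form is to evaluate both at the same point. This is a small sketch under assumed parameter values; the function names `bivariate_pdf` and `matrix_pdf` are hypothetical helpers introduced only for this check.

```python
import numpy as np

def bivariate_pdf(x, y, mu_x, mu_y, s_x, s_y, rho):
    """Explicit bivariate normal density in terms of sigma_x, sigma_y, rho."""
    zx = (x - mu_x) / s_x
    zy = (y - mu_y) / s_y
    q = (zx ** 2 + zy ** 2 - 2.0 * rho * zx * zy) / (1.0 - rho ** 2)
    return float(np.exp(-0.5 * q) / (2.0 * np.pi * s_x * s_y * np.sqrt(1.0 - rho ** 2)))

def matrix_pdf(v, mu, sigma):
    """General density, using the normalizing constant |2*pi*Sigma|^(1/2)."""
    diff = v - mu
    quad = diff @ np.linalg.solve(sigma, diff)
    return float(np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(2.0 * np.pi * sigma)))

# Illustrative parameters (assumptions).
mu_x, mu_y, s_x, s_y, rho = 1.0, -1.0, 2.0, 0.5, 0.6
mu = np.array([mu_x, mu_y])
sigma = np.array([[s_x ** 2,        rho * s_x * s_y],
                  [rho * s_x * s_y, s_y ** 2]])

p_explicit = bivariate_pdf(0.3, -0.8, mu_x, mu_y, s_x, s_y, rho)
p_matrix = matrix_pdf(np.array([0.3, -0.8]), mu, sigma)
print(p_explicit, p_matrix)
```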

In the bivariate case, the first equivalent condition for multivariate normality can be relaxed: it is sufficient to verify that a countably infinite set of distinct linear combinations of X and Y are normal in order to conclude that the vector \( [ X, Y ] ^T\) is bivariate normal.

Properties

Normally distributed and independent

If X and Y are normally distributed and independent, then they are "jointly normally distributed"; that is, the pair (X, Y) has a bivariate normal distribution. However, a pair of jointly normally distributed variables need not be independent; they may be correlated.
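As a quick check of the first claim, the joint density under independence is the product of the two marginal normal densities, and it can be matched against the bivariate density of the previous section (same symbols):

\[ f_X(x)\,f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left( -\frac{(x-\mu_x)^2}{2\sigma_x^2} \right) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_y} \exp\left( -\frac{(y-\mu_y)^2}{2\sigma_y^2} \right) = \frac{1}{2 \pi \sigma_x \sigma_y} \exp\left( -\frac{1}{2}\left[ \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} \right] \right), \]

which is the bivariate normal density with ρ = 0, that is, with a diagonal Σ.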

Two normally distributed random variables need not be jointly bivariate normal

The fact that two random variables X and Y both have a normal distribution does not imply that the pair (X, Y) has a joint normal distribution. A simple example is provided below:

Let X ~ N(0,1).

Let \(Y = \begin{cases} X,& |X| > 1.33,\\ -X,& |X| \leq 1.33.\end{cases}\)

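Then both X and Y are individually Normally distributed (Y is standard Normal because the N(0,1) density is symmetric about 0, so flipping the sign of X on a symmetric set does not change the distribution); however, the pair (X,Y) is not jointly bivariate Normally distributed. The constant c=1.33 is not special; any other positive constant also works.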

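Furthermore, as X and Y are not independent, the sum Z = X+Y is not guaranteed to be a (univariate) Normal variable. In this case, Z is clearly not Normal: \[Z = \begin{cases} 0,& |X| \leq 1.33,\\ 2X,& |X| > 1.33.\end{cases}\] Z equals 0 with probability \(P(|X| \leq 1.33) \approx 0.82\), yet it is not constant, whereas a Normal variable with positive variance assigns probability 0 to every single point. Since the linear combination X+Y is not Normal, the first condition in the definition fails and (X,Y) cannot be bivariate Normal.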
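The counterexample is easy to explore by simulation. The sketch below is illustrative only (the seed and sample size are arbitrary assumptions): it draws X, builds Y and Z as defined above, and compares the observed fraction of exact zeros in Z with \(P(|X| \leq c) = \mathrm{erf}(c/\sqrt{2})\).

```python
import math
import numpy as np

rng = np.random.default_rng(1)
c = 1.33
n = 500_000

X = rng.standard_normal(n)
Y = np.where(np.abs(X) > c, X, -X)   # keep X in the tails, flip its sign in the middle
Z = X + Y                            # 0 when |X| <= c, 2X otherwise

# Marginally, Y should look standard normal: P(Y <= 1) is about 0.8413.
frac_y_le_1 = np.mean(Y <= 1.0)

# Z has a point mass at 0 of size P(|X| <= c), which no normal variable has.
frac_zero = np.mean(Z == 0.0)
expected_zero = math.erf(c / math.sqrt(2.0))

print(frac_y_le_1, frac_zero, expected_zero)
```

The exact comparison `Z == 0.0` is safe here because x + (-x) is exactly zero in floating-point arithmetic.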

Applications

This SOCR activity demonstrates the use of 2D Gaussian distribution, expectation maximization and mixture modeling for classification of points (objects) in 2D.

Problems

References

Gut, A. (2009). An Intermediate Course in Probability. Springer, chapter 5. ISBN 9781441901613.

SOCR Home page: http://www.socr.ucla.edu