AP Statistics Curriculum 2007 MultivariateNormal


EBook - Multivariate Normal Distribution

The multivariate normal distribution, or multivariate Gaussian distribution, is a generalization of the univariate (one-dimensional) normal distribution to higher dimensions. A random vector is said to be multivariate normally distributed if every linear combination of its components has a univariate normal distribution. The multivariate normal distribution may be used to study different associations (e.g., correlations) between real-valued random variables.


In k dimensions, a random vector \(X = (X_1, \cdots, X_k)\) is multivariate normally distributed if it satisfies any one of the following equivalent conditions (Gut, 2009):

  • Every linear combination of its components, \(Y = a_1X_1 + \cdots + a_kX_k\), is normally distributed. In other words, for any constant vector \(a\in R^k\), the linear combination (which is a univariate random variable) \(Y = a^TX = \sum_{i=1}^{k}{a_iX_i}\) has a univariate normal distribution.
  • There exist a random ℓ-vector Z, whose components are independent standard normal random variables, a k-vector μ, and a k×ℓ matrix A, such that \(X = AZ + \mu\). Here ℓ is the rank of the variance-covariance matrix \(\Sigma = AA^T\).
  • There is a k-vector μ and a symmetric, nonnegative-definite k×k matrix Σ, such that the characteristic function of X is

\[ \varphi_X(u) = \exp\Big( iu^T\mu - \tfrac{1}{2} u^T\Sigma u \Big). \]

  • When the support of X is the entire space \(R^k\), there exist a k-vector μ and a symmetric positive-definite k×k variance-covariance matrix Σ such that the probability density function of X can be expressed as

\[ f_X(x) = \frac{1}{ (2\pi)^{k/2}|\Sigma|^{1/2} } \exp\!\Big( {-\tfrac{1}{2}}(x-\mu)^T\Sigma^{-1}(x-\mu) \Big), \]
where |Σ| is the determinant of Σ and \((2\pi)^{k/2}|\Sigma|^{1/2} = |2\pi\Sigma|^{1/2}\). This formulation reduces to the density of the univariate normal distribution when Σ is a scalar (i.e., a 1×1 matrix).
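The density formula for \(f_X(x)\) can be evaluated directly. The sketch below is a minimal pure-Python version (no NumPy, so it stays self-contained); the helper names `cholesky` and `mvn_pdf` are hypothetical. It assumes the Cholesky factorization \(\Sigma = LL^T\), a common choice because the quadratic form \((x-\mu)^T\Sigma^{-1}(x-\mu)\) becomes a sum of squares after one triangular solve, and \(|\Sigma|^{1/2}\) is simply the product of the diagonal entries of L. A singular Σ makes the factorization fail, which matches the fact that such a distribution has no density.

```python
import math

def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma.
    sigma must be symmetric positive-definite; otherwise ValueError is raised."""
    k = len(sigma)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("sigma is not positive-definite (singular or invalid)")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def mvn_pdf(x, mu, sigma):
    """Density f_X(x) of X ~ N_k(mu, sigma) for a nonsingular sigma (hypothetical helper)."""
    k = len(mu)
    L = cholesky(sigma)
    d = [x[i] - mu[i] for i in range(k)]
    # Forward substitution solves L z = (x - mu); then
    # (x - mu)^T Sigma^{-1} (x - mu) = z^T z.
    z = [0.0] * k
    for i in range(k):
        z[i] = (d[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i]
    quad = sum(zi * zi for zi in z)
    # det(Sigma) = det(L)^2, so |Sigma|^{1/2} is the product of L's diagonal.
    sqrt_det = math.prod(L[i][i] for i in range(k))
    return math.exp(-0.5 * quad) / ((2.0 * math.pi) ** (k / 2) * sqrt_det)
```

For k = 1 and Σ = [[σ²]] this reproduces the univariate normal density; for example, `mvn_pdf([0.0], [0.0], [[1.0]])` equals \(1/\sqrt{2\pi}\).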

If the variance-covariance matrix is singular, the corresponding distribution has no density. An example of this case is the distribution of the vector of residuals in ordinary least squares regression. Note also that the \(X_i\) are in general not independent; they can be seen as the result of applying the matrix A to a collection of independent Gaussian variables Z.
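The representation \(X = AZ + \mu\) also gives a direct way to simulate multivariate normal vectors. The sketch below uses only the Python standard library, and its function names are hypothetical. It draws Z with independent N(0, 1) components, so the implied variance-covariance matrix is \(\Sigma = AA^T\). Choosing a k×ℓ matrix A with ℓ < k produces the singular case: every draw lies in an ℓ-dimensional affine subspace through μ, so no density on \(R^k\) exists.

```python
import random

def sample_mvn(A, mu, n, seed=0):
    """Draw n samples of X = A Z + mu, with Z having independent N(0, 1) components.
    A is k-by-l (a list of k rows of length l); mu has length k."""
    rng = random.Random(seed)
    k, l = len(A), len(A[0])
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(l)]
        samples.append([mu[i] + sum(A[i][j] * z[j] for j in range(l)) for i in range(k)])
    return samples

def covariance_from_A(A):
    """Sigma = A A^T, the variance-covariance matrix implied by X = A Z + mu."""
    k, l = len(A), len(A[0])
    return [[sum(A[i][m] * A[j][m] for m in range(l)) for j in range(k)] for i in range(k)]

def sample_covariance(samples):
    """Unbiased sample variance-covariance matrix of a list of k-vectors."""
    n, k = len(samples), len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(k)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
             for j in range(k)] for i in range(k)]

if __name__ == "__main__":
    A = [[1.0, 0.0], [0.5, 2.0]]          # full rank: Sigma = [[1, 0.5], [0.5, 4.25]]
    print(covariance_from_A(A))
    print(sample_covariance(sample_mvn(A, [1.0, -2.0], 20_000)))
    A1 = [[1.0], [2.0]]                   # k = 2, l = 1: singular Sigma = [[1, 2], [2, 4]]
    print(sample_mvn(A1, [0.0, 0.0], 3))  # every draw satisfies X2 = 2 * X1
```

With the 2×1 matrix `A1`, both components are driven by a single Z, so the two coordinates of every draw are exactly proportional; this is the rank-deficient situation described above.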

Bivariate (2D) case

See the SOCR Bivariate Normal Distribution Activity and corresponding Webapp.

In two dimensions, for the nonsingular bivariate normal distribution (\(k = \operatorname{rank}(\Sigma) = 2\)), the probability density function of the vector (X, Y) is \[ f(x,y) = \frac{1}{2 \pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)}\left[ \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y} \right] \right), \] where ρ is the correlation between X and Y, \(\sigma_x, \sigma_y > 0\), and \(|\rho| < 1\). In this case, \[ \mu = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}. \]
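As a consistency check, the closed-form bivariate density and the general k-dimensional formula with k = 2 and the μ and Σ given above should agree at every point. The minimal sketch below (function names are hypothetical) computes both, using the explicit 2×2 inverse \(\Sigma^{-1} = \frac{1}{|\Sigma|}\begin{pmatrix} \sigma_y^2 & -\rho\sigma_x\sigma_y \\ -\rho\sigma_x\sigma_y & \sigma_x^2 \end{pmatrix}\) and \(|\Sigma| = \sigma_x^2\sigma_y^2(1-\rho^2)\).

```python
import math

def bivariate_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Closed-form bivariate normal density; needs sigma_x, sigma_y > 0 and |rho| < 1."""
    dx = (x - mu_x) / sigma_x
    dy = (y - mu_y) / sigma_y
    q = (dx * dx + dy * dy - 2.0 * rho * dx * dy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

def matrix_form_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Same density from the k = 2 matrix formula with Sigma = [[a, b], [b, c]]."""
    a = sigma_x ** 2
    b = rho * sigma_x * sigma_y
    c = sigma_y ** 2
    det = a * c - b * b                     # |Sigma| = sigma_x^2 sigma_y^2 (1 - rho^2)
    dx, dy = x - mu_x, y - mu_y
    # (x - mu)^T Sigma^{-1} (x - mu) with Sigma^{-1} = [[c, -b], [-b, a]] / det
    quad = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    # (2 pi)^{k/2} |Sigma|^{1/2} with k = 2
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

if __name__ == "__main__":
    args = (1.2, -0.7, 0.5, 0.1, 2.0, 0.5, 0.6)
    print(bivariate_pdf(*args), matrix_form_pdf(*args))
```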

In the bivariate case, the first equivalent condition for multivariate normality is less restrictive: it is sufficient to verify that countably many distinct linear combinations of X and Y are normal in order to conclude that the vector \( [ X, Y ] ^T\) is bivariate normal.


Normally distributed and independent

If X and Y are normally distributed and independent, then they are jointly normally distributed; that is, the pair (X, Y) has a bivariate normal distribution (with ρ = 0). However, a pair of jointly normally distributed variables need not be independent: they may be correlated.
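A standard way to see jointly normal but dependent variables is the construction \(Y = \rho X + \sqrt{1-\rho^2}\,W\), where X and W are independent N(0, 1). This has the form \(AZ + \mu\) from the definition (with Z = (X, W)), so (X, Y) is bivariate normal with standard normal marginals and correlation ρ; for ρ ≠ 0 the components are not independent. The sketch below uses only the Python standard library, and its function names are hypothetical.

```python
import math
import random

def correlated_standard_normals(rho, n, seed=1):
    """Draw n pairs (X, Y) with X, W independent N(0, 1) and
    Y = rho * X + sqrt(1 - rho^2) * W, so Var(Y) = 1 and corr(X, Y) = rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        x, w = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((x, rho * x + s * w))
    return pairs

def sample_correlation(pairs):
    """Pearson sample correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    for rho in (0.0, 0.7):
        print(rho, round(sample_correlation(correlated_standard_normals(rho, 20_000)), 3))
```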

Two normally distributed random variables need not be jointly bivariate normal

The fact that two random variables X and Y both have a normal distribution does not imply that the pair (X, Y) has a joint normal distribution. A simple example is provided below:

Let X ~ N(0,1).
Let \(Y = \begin{cases} X,& |X| > 1.33,\\ -X,& |X| \leq 1.33.\end{cases}\)

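Then X and Y are each normally distributed: Y has the same N(0,1) distribution as X, because the standard normal distribution is symmetric about 0 and Y only flips the sign of X on the symmetric set \(|X| \leq 1.33\). However, the pair (X, Y) is not bivariate normal. The constant c = 1.33 is not special; any other positive constant works as well.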

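Because the pair (X, Y) is not jointly normal, the sum Z = X + Y is not guaranteed to be a (univariate) normal variable. In this case, Z is clearly not normal: \[Z = \begin{cases} 0,& |X| \leq 1.33,\\ 2X,& |X| > 1.33,\end{cases}\] so Z equals 0 with probability \(P(|X| \leq 1.33) \approx 0.82\), a point mass that no normal distribution has. By the first equivalent condition above, this non-normal linear combination also confirms that (X, Y) is not bivariate normal.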
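The counterexample is easy to check by simulation. The sketch below (standard library only; function names are hypothetical) draws X ~ N(0, 1), builds Y with the cutoff c = 1.33, compares the empirical CDF of Y with the standard normal CDF \(\Phi(t) = \tfrac{1}{2}\big(1 + \operatorname{erf}(t/\sqrt{2})\big)\), and measures how often Z = X + Y is exactly 0. The exact value of that probability is \(P(|X| \leq c) = 2\Phi(c) - 1 = \operatorname{erf}(c/\sqrt{2})\).

```python
import math
import random

def counterexample_sample(n, c=1.33, seed=2):
    """Draw n pairs (X, Y) with X ~ N(0, 1) and Y = X if |X| > c, else Y = -X."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        pairs.append((x, x if abs(x) > c else -x))
    return pairs

def standard_normal_cdf(t):
    """Phi(t) expressed through the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def empirical_cdf(values, t):
    """Fraction of values that are <= t."""
    return sum(1 for v in values if v <= t) / len(values)

def fraction_z_zero(pairs):
    """Empirical P(Z = 0) for Z = X + Y; x + (-x) is exactly 0.0 in floating point."""
    return sum(1 for x, y in pairs if x + y == 0.0) / len(pairs)

def exact_prob_z_zero(c=1.33):
    """P(Z = 0) = P(|X| <= c) = 2 * Phi(c) - 1 = erf(c / sqrt(2))."""
    return math.erf(c / math.sqrt(2.0))

if __name__ == "__main__":
    pairs = counterexample_sample(100_000)
    ys = [y for _, y in pairs]
    # The empirical CDF of Y should track Phi at every t (Y is N(0, 1)).
    for t in (-1.5, -0.5, 0.0, 1.0, 2.0):
        print(t, round(empirical_cdf(ys, t), 3), round(standard_normal_cdf(t), 3))
    # Z = X + Y has a point mass at 0, so Z is not normal.
    print(round(fraction_z_zero(pairs), 3), round(exact_prob_z_zero(), 3))
```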


This SOCR activity demonstrates the use of 2D Gaussian distributions, expectation maximization, and mixture modeling to classify points (objects) in 2D.


