AP Statistics Curriculum 2007 MultivariateNormal

From SOCR

EBook - Multivariate Normal Distribution

The multivariate normal distribution, or multivariate Gaussian distribution, is a generalization of the univariate (one-dimensional) normal distribution to higher dimensions. A random vector is said to be multivariate normally distributed if every linear combination of its components has a univariate normal distribution. The multivariate normal distribution may be used to study different associations (e.g., correlations) between real-valued random variables.

Definition

In k-dimensions, a random vector \(X = (X_1, \cdots, X_k)\) is multivariate normally distributed if it satisfies any one of the following equivalent conditions (Gut, 2009):

  • Every linear combination of its components \(Y = a_1X_1 + \cdots + a_kX_k\) is normally distributed. In other words, for any constant vector \(a\in R^k\), the linear combination (which is a univariate random variable) \(Y = a^TX = \sum_{i=1}^{k}{a_iX_i}\) has a univariate normal distribution.
  • There exists a random ℓ-vector Z, whose components are independent normal random variables, a k-vector μ, and a k×ℓ matrix A, such that \(X = AZ + \mu\). Here ℓ is the rank of the variance-covariance matrix.
  • There is a k-vector μ and a symmetric, nonnegative-definite k×k matrix Σ, such that the characteristic function of X is

\[ \varphi_X(u) = \exp\Big( iu^T\mu - \tfrac{1}{2} u^T\Sigma u \Big). \]

  • When the support of X is the entire space \(R^k\), there exists a k-vector μ and a symmetric positive-definite k×k variance-covariance matrix Σ, such that the probability density function of X can be expressed as

\[ f_X(x) = \frac{1}{ (2\pi)^{k/2}|\Sigma|^{1/2} } \exp\!\Big( {-\tfrac{1}{2}}(x-\mu)^T\Sigma^{-1}(x-\mu) \Big), \]

where |Σ| is the determinant of Σ, and where \((2\pi)^{k/2}|\Sigma|^{1/2} = |2\pi\Sigma|^{1/2}\). This formulation reduces to the density of the univariate normal distribution if Σ is a scalar (i.e., a 1×1 matrix).
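To illustrate why these conditions are linked, here is a short derivation of the first condition from the characteristic-function condition. For any fixed \(a\in R^k\) and real t, set \(Y = a^TX\). Then

\[ \varphi_Y(t) = E\big[e^{itY}\big] = E\big[e^{i(ta)^TX}\big] = \varphi_X(ta) = \exp\Big( it\,a^T\mu - \tfrac{1}{2} t^2\, a^T\Sigma a \Big), \]

which is the characteristic function of a univariate normal distribution with mean \(a^T\mu\) and variance \(a^T\Sigma a\) (a point mass at \(a^T\mu\) when \(a^T\Sigma a = 0\)).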

If the variance-covariance matrix is singular, the corresponding distribution has no density with respect to Lebesgue measure on \(R^k\); its probability mass is concentrated on a lower-dimensional affine subspace. An example of this case is the distribution of the vector of residual errors in ordinary least squares regression. Note also that the \(X_i\) are in general not independent; they can be seen as the result of applying the matrix A to a collection of independent Gaussian variables Z.
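The construction \(X = AZ + \mu\) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the components of Z are taken to be independent standard normal (so that \(\Sigma = AA^T\)), and the helper names `sample_mvn` and `sample_covariance`, as well as the particular values of `mu` and `A`, are hypothetical choices made for this example.

```python
import random

def sample_mvn(mu, A, rng):
    """Draw one sample X = A Z + mu, with Z a vector of independent N(0, 1) draws."""
    ell = len(A[0])  # number of independent normal inputs (columns of A)
    z = [rng.gauss(0.0, 1.0) for _ in range(ell)]
    return [m + sum(a_ij * z_j for a_ij, z_j in zip(row, z))
            for m, row in zip(mu, A)]

def sample_covariance(samples):
    """Unbiased sample covariance matrix of a list of equal-length vectors."""
    n, k = len(samples), len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(k)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
             for j in range(k)] for i in range(k)]

rng = random.Random(0)          # fixed seed so the run is reproducible
mu = [1.0, -2.0]                # illustrative mean vector (assumption)
A = [[2.0, 0.0],
     [1.0, 1.0]]                # illustrative matrix; A A^T = [[4, 2], [2, 2]]

samples = [sample_mvn(mu, A, rng) for _ in range(50_000)]
cov = sample_covariance(samples)
print(cov)                      # should be close to A A^T
```

The off-diagonal entry of the estimated covariance is nonzero, which shows concretely that the components of X are correlated even though the components of Z are independent.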

Bivariate (2D) case

See the SOCR Bivariate Normal Distribution Activity and corresponding Webapp.

In 2 dimensions, for the nonsingular bivariate normal distribution (\(k=rank(\Sigma) = 2\)), the probability density function of a (bivariate) vector (X,Y) is \[ f(x,y) = \frac{1}{2 \pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)}\left[ \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y} \right] \right), \] where ρ is the correlation between X and Y, with \(|\rho| < 1\) and \(\sigma_x, \sigma_y > 0\). In this case, \[ \mu = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}. \]
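As a sanity check, the closed-form bivariate density can be compared numerically with the general matrix formula specialised to k = 2. The sketch below uses only the Python standard library; the function names `bivariate_normal_pdf` and `mvn_pdf_2d` and the parameter values are hypothetical and chosen purely for illustration.

```python
import math

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Closed-form bivariate normal density; requires |rho| < 1."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx + zy * zy - 2.0 * rho * zx * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

def mvn_pdf_2d(x, mu, sigma):
    """General density formula for k = 2, using an explicit 2x2 inverse."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

# Illustrative parameters (assumptions, not taken from the text)
mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -0.5, 2.0, 1.5, 0.6
sigma = [[sigma_x ** 2, rho * sigma_x * sigma_y],
         [rho * sigma_x * sigma_y, sigma_y ** 2]]

p_closed = bivariate_normal_pdf(0.3, 0.7, mu_x, mu_y, sigma_x, sigma_y, rho)
p_matrix = mvn_pdf_2d([0.3, 0.7], [mu_x, mu_y], sigma)
print(p_closed, p_matrix)  # the two values should agree up to rounding
```

The agreement rests on the identity \(|\Sigma|^{1/2} = \sigma_x \sigma_y \sqrt{1-\rho^2}\), which follows directly from the 2×2 matrix Σ given above.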

In the bivariate case, the first equivalent condition for multivariate normality can be relaxed: it is sufficient to verify that countably infinitely many distinct linear combinations of X and Y are normal in order to conclude that the vector \( [ X, Y ] ^T\) is bivariate normal.

Properties

Normally distributed and independent

If X and Y are normally distributed and independent, then they are "jointly normally distributed", that is, the pair (X,Y) must have a bivariate normal distribution. However, a pair of jointly normally distributed variables need not be independent: they could be correlated.
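One way to see the first claim in the nonsingular case: independence means the joint density is the product of the two univariate normal densities, and this product is exactly the bivariate normal density with ρ = 0:

\[ f_X(x)\,f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_x} e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_y} e^{-\frac{(y-\mu_y)^2}{2\sigma_y^2}} = \frac{1}{2 \pi \sigma_x \sigma_y} \exp\left( -\frac{1}{2}\left[ \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} \right] \right). \]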

Two normally distributed random variables need not be jointly bivariate normal

The fact that two random variables X and Y both have a normal distribution does not imply that the pair (X,Y) has a joint normal distribution. A simple example is provided below:

Let X ~ N(0,1).
Let \(Y = \begin{cases} X,& |X| > 1.33,\\ -X,& |X| \leq 1.33.\end{cases}\)

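Then, both X and Y are individually normally distributed: because the standard normal distribution is symmetric about 0, flipping the sign of X on the symmetric set |X| ≤ 1.33 leaves its distribution unchanged, so Y ~ N(0,1). However, the pair (X,Y) is not jointly bivariate normal. The constant c = 1.33 is not special; any other positive constant also works.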

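Furthermore, as X and Y are not independent, the sum Z = X+Y is not guaranteed to be a (univariate) normal variable. In this case, it is clear that Z is not normal: \[Z = \begin{cases} 0,& |X| \leq 1.33,\\ 2X,& |X| > 1.33.\end{cases}\] Z equals 0 with positive probability P(|X| ≤ 1.33) but is not constant, so it cannot be normal. Since Z is a linear combination of X and Y, the first equivalent condition in the definition fails, which confirms that (X,Y) is not bivariate normal.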
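The counterexample is easy to check by simulation. The following is a minimal sketch using only the Python standard library; the sample size, the seed, and the variable names are arbitrary choices for illustration. It estimates the mean and variance of Y (which should be near 0 and 1) and the fraction of draws with Z exactly 0, which should be near P(|X| ≤ 1.33), roughly 0.82.

```python
import math
import random

C = 1.33                      # threshold from the example above
rng = random.Random(42)       # fixed seed so the run is reproducible
n = 200_000

xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [x if abs(x) > C else -x for x in xs]   # Y = X outside [-C, C], -X inside
zs = [x + y for x, y in zip(xs, ys)]         # Z = X + Y

mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)

# x + (-x) is exactly 0.0 in floating point, so an equality test is safe here
frac_zero = sum(1 for z in zs if z == 0.0) / n

# Theoretical P(|X| <= C) for a standard normal X
p_theory = math.erf(C / math.sqrt(2.0))

print(mean_y, var_y)          # close to 0 and 1: Y looks standard normal
print(frac_zero, p_theory)    # Z has a large point mass at 0
```

A point mass of this size at 0, together with nonzero values elsewhere, is incompatible with any normal distribution for Z.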

Applications

This SOCR activity demonstrates the use of 2D Gaussian distribution, expectation maximization and mixture modeling for classification of points (objects) in 2D.

Problems


References



