Multivariate Normal Distribution


You might recall from the univariate course that there is a central limit theorem for the sample mean of a large sample of random variables.

A similar result is available in multivariate statistics: if we have a collection of random vectors X1, X2, ..., Xn that are independent and identically distributed, then the sample mean vector, x̄, is approximately multivariate normally distributed for large samples.
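As a quick sketch of this multivariate central limit theorem, the simulation below (all values illustrative) draws repeated samples from a non-normal bivariate distribution with exponential marginals and checks that the sample mean vector concentrates around the population mean vector, with covariance shrinking at rate 1/n:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: n i.i.d. bivariate vectors with (non-normal)
# exponential marginals, replicated many times.
n, p, reps = 500, 2, 2000
means = np.array([1.0, 2.0])   # exponential scale parameter equals its mean

# One sample mean vector x_bar per replication.
xbar = rng.exponential(means, size=(reps, n, p)).mean(axis=1)

# Multivariate CLT: x_bar is approximately MVN with mean vector `means`
# and covariance Sigma / n (Sigma here is diag(1, 4), the exponential variances).
print(xbar.mean(axis=0))      # near [1.0, 2.0]
print(xbar.var(axis=0) * n)   # near [1.0, 4.0]
```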

For the univariate normal distribution, the probability density function of X is given by

f(x) = (1/(σ√(2π))) exp{−(x − µ)²/(2σ²)}

The density value is maximized when x is equal to µ, since the exponential function is monotone and the exponent −(x − µ)²/(2σ²) attains its maximum value of zero there.

The shorthand notation is X ~ N(µ, σ²)
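As a small numerical check (the mean and standard deviation below are arbitrary illustrative values), scipy's `norm.pdf` confirms that the univariate normal density peaks at x = µ:

```python
import numpy as np
from scipy.stats import norm

# Illustrative parameters (any mu and sigma > 0 behave the same way).
mu, sigma = 5.0, 2.0

# Evaluate the N(mu, sigma^2) density on a grid around the mean.
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 1001)
pdf = norm.pdf(x, loc=mu, scale=sigma)

# The exponent -(x - mu)^2 / (2 sigma^2) is largest (zero) at x = mu,
# so the density peaks there.
peak = x[np.argmax(pdf)]
print(peak)   # near 5.0, i.e. at the mean
```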

Similarly, in the multivariate case, if you have a p × 1 random vector X that is distributed according to a multivariate normal distribution with population mean vector µ and population variance-covariance matrix ∑, then this random vector X will have the joint density function

f(x) = (2π)^(−p/2) |∑|^(−1/2) exp{−(1/2)(x − µ)′ ∑⁻¹ (x − µ)}

where |∑| denotes the determinant of the variance-covariance matrix and ∑⁻¹ is its inverse.

Again, this density takes its maximum value when the vector x is equal to the mean vector µ.

If p = 2, you have a bivariate normal distribution; its density yields a bell-shaped surface in three dimensions.

The shorthand notation, similar to the univariate version above, is X ~ N (µ, ∑)
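A minimal sketch, using made-up values of µ and ∑, that evaluates the joint density formula directly and compares it against scipy's `multivariate_normal` implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up mean vector and variance-covariance matrix (p = 2).
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 0.0])

# Density from the formula:
#   (2*pi)^(-p/2) * |Sigma|^(-1/2) * exp(-0.5 * (x - mu)' Sigma^{-1} (x - mu))
p = len(mu)
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff
dens = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

# scipy evaluates the same density.
print(dens, multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```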

Some things to note about the multivariate normal distribution:

1.      The exponent of the multivariate normal density is a quadratic form, also called the squared Mahalanobis distance between the random vector x and the mean vector µ.
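The squared Mahalanobis distance in the exponent can be computed directly; the sketch below (with illustrative µ, ∑, and x) also checks it against scipy's `mahalanobis` helper, which returns the unsquared distance:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Illustrative mean vector, covariance matrix, and observation.
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 0.0])

# Squared Mahalanobis distance: (x - mu)' Sigma^{-1} (x - mu).
VI = np.linalg.inv(Sigma)
d2 = (x - mu) @ VI @ (x - mu)

# scipy's helper returns the unsquared distance given the inverse covariance VI.
print(d2, mahalanobis(x, mu, VI) ** 2)
```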

2.      If the variables are uncorrelated, then the variance-covariance matrix is a diagonal matrix with the variances of the individual variables on the main diagonal and zeros everywhere else. In this case, the multivariate normal density function simplifies.

 


In this uncorrelated case, the joint density factors into a product of univariate normal densities:

f(x) = ∏ (j = 1 to p) (1/(σj√(2π))) exp{−(xj − µj)²/(2σj²)}

Note: In this density function, the product term, given by capital pi (∏), acts very much like the summation sign, but instead of adding we multiply over the elements ranging from j = 1 to j = p. Inside this product is the familiar univariate normal density with the random variables subscripted by j. In this case, the elements of the random vector X1, X2, ..., Xp are independent random variables.
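A quick numerical illustration of this factorization, using arbitrary means and standard deviations: when ∑ is diagonal, the joint MVN density equals the product of the univariate normal densities.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Arbitrary illustrative means and standard deviations (p = 3).
mu = np.array([0.0, 3.0, -2.0])
sigmas = np.array([1.0, 2.0, 0.5])
Sigma = np.diag(sigmas ** 2)   # diagonal covariance: uncorrelated variables
x = np.array([0.5, 2.0, -1.5])

# Joint density with a diagonal covariance matrix...
joint = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

# ...equals the product over j of the univariate N(mu_j, sigma_j^2) densities.
product = np.prod(norm.pdf(x, loc=mu, scale=sigmas))
print(joint, product)
```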

3.      We could also consider linear combinations of the elements of a multivariate normal random vector, as in the expression below:

Y = c'X = c1X1 + c2X2 + ... + cpXp
Note: To define a linear combination, the random variables Xj need not be uncorrelated. The coefficients are not chosen arbitrarily; specific values are selected according to the problem of interest and so are influenced very much by subject-matter knowledge. Looking back at the Women's Nutrition Survey Data, for example, we selected the coefficients to obtain the total intake of vitamins A and C.

Vitamin A is measured in micrograms while vitamin C is measured in milligrams. There are a thousand micrograms per milligram, so the total intake of the two vitamins, Y, expressed in milligrams, can be written as:

Y = 0.001 XA + XC

where XA is the vitamin A intake in micrograms and XC is the vitamin C intake in milligrams.
Now suppose that the random vector X is multivariate normal with mean µ and variance-covariance matrix ∑. Then Y = c'X is normally distributed with mean and variance given by

E(Y) = c'µ and Var(Y) = c'∑c
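A short sketch of these results, using hypothetical (made-up) intake means and covariance rather than the actual Women's Nutrition Survey values; the coefficient vector c = (0.001, 1) converts vitamin A from micrograms to milligrams before adding:

```python
import numpy as np

# Hypothetical coefficient vector: 0.001 converts vitamin A from
# micrograms to milligrams; vitamin C is already in milligrams.
c = np.array([0.001, 1.0])

# Made-up illustrative values, not the actual survey estimates.
mu = np.array([800.0, 80.0])          # mean intakes: A in ug, C in mg
Sigma = np.array([[40000.0, 200.0],   # variance-covariance matrix
                  [200.0, 900.0]])

# For Y = c'X with X ~ N(mu, Sigma):
mean_Y = c @ mu          # E(Y) = c' mu
var_Y = c @ Sigma @ c    # Var(Y) = c' Sigma c
print(mean_Y, var_Y)
```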
