Clear Understanding on Mahalanobis Distance
With multivariate (multi-characteristic) data, we need a measure of divergence, or distance, between groups that takes all the characteristics into account.
Suppose you want to measure the difference (distance) between two groups $G_1$ and $G_2$ on a $p$-dimensional random vector $X$. A common assumption is that $X$ has the same variation about its mean within either group.
The difference between the groups can then be expressed as the difference between the mean vectors of $X$ in the two groups, relative to the common within-group variation (measured by the common, or pooled, covariance matrix).
The most widely used measure for multi-characteristic data is the Mahalanobis distance, $\Delta$ (uppercase delta).
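The square of the Mahalanobis distance is given by
$$\Delta^2 = (\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2),$$
or, writing the transpose with a prime,
$$\Delta^2 = (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 - \mu_2),$$
where $\mu_1$ and $\mu_2$ are the mean vectors of $X$ in the two groups and $\Sigma$ is the common covariance matrix.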
A covariance matrix such as $\Sigma$ is symmetric, and when it is nonsingular its inverse $\Sigma^{-1}$ exists and is also symmetric, so the transpose of $\Sigma^{-1}$ is $\Sigma^{-1}$ itself.
The inverse of the covariance matrix $\Sigma$ of $X$ appears in the quadratic form of the Mahalanobis distance to allow for the different scales on which the variables are measured and for non-zero correlations between the variables.
Equivalently, the quadratic form in $\Sigma^{-1}$ has the effect of transforming the variables to uncorrelated, standardized variables $Y$ and then computing the squared Euclidean distance between the mean vectors of $Y$ in the two groups.
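The equivalence between the quadratic form and the Euclidean distance after standardization can be checked numerically. The sketch below is only an illustration: the mean vectors and covariance matrix are made-up values, the function name `mahalanobis_sq` is hypothetical, and the Cholesky factor is just one of several possible choices for the standardizing transformation.

```python
import numpy as np

def mahalanobis_sq(mu1, mu2, sigma):
    """Squared Mahalanobis distance between two mean vectors."""
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    # Solve sigma @ z = diff rather than forming the inverse explicitly.
    return float(diff @ np.linalg.solve(sigma, diff))

# Made-up parameters: unequal variances and a positive correlation.
mu1 = np.array([0.0, 0.0])
mu2 = np.array([2.0, 1.0])
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

delta_sq = mahalanobis_sq(mu1, mu2, sigma)

# Standardize: with sigma = L @ L.T, the variable Y = inv(L) @ X
# has the identity matrix as its covariance.
L = np.linalg.cholesky(sigma)
y1 = np.linalg.solve(L, mu1)
y2 = np.linalg.solve(L, mu2)
euclid_sq = float(np.sum((y1 - y2) ** 2))

print(delta_sq, euclid_sq)  # the two values agree up to rounding
```

Using `np.linalg.solve` instead of an explicit inverse is a numerical-stability choice, not part of the definition.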
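To understand the quadratic form of a matrix: if $A$ is a square matrix and $x$ is a vector of matching dimension, the quadratic form is the scalar $x^T A x$. For a $2 \times 2$ matrix,
$$x^T A x = a_{11} x_1^2 + (a_{12} + a_{21})\, x_1 x_2 + a_{22} x_2^2.$$
By looking at the exponents in the final expression (every term is of degree two in the elements of $x$), you can see why this is called a quadratic form of $A$.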
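As a quick numerical check, the sketch below evaluates a quadratic form twice: once as a matrix product and once as a sum of second-degree terms. The matrix and vector are made-up values and `quadratic_form` is a hypothetical helper name.

```python
import numpy as np

def quadratic_form(A, x):
    """Return the scalar x^T A x for a square matrix A and a vector x."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(x @ A @ x)

# Made-up 2 x 2 example.
A = np.array([[2.0, 1.0],
              [3.0, 4.0]])
x = np.array([1.0, 2.0])

via_matrix = quadratic_form(A, x)

# Same quantity written out term by term; each term has total degree two in x.
via_expansion = (A[0, 0] * x[0] ** 2
                 + (A[0, 1] + A[1, 0]) * x[0] * x[1]
                 + A[1, 1] * x[1] ** 2)

print(via_matrix, via_expansion)
```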
It is now known that many standard distance measures such as Kolmogorov's variational distance, the Hellinger distance, Rao's distance, etc., are increasing functions of Mahalanobis distance under assumptions of normality and homoscedasticity and in certain other situations.
Sample Version of the Mahalanobis Distance, $D^2$
In practice, the means $\mu_1$ and $\mu_2$ and the common covariance matrix $\Sigma$ of the two groups $G_1$ and $G_2$ are generally unknown and must be estimated from random samples of sizes $n_1$ and $n_2$ from $G_1$ and $G_2$, yielding sample means $\bar{x}_1$ and $\bar{x}_2$ and (bias-corrected) sample covariance matrices $S_1$ and $S_2$.
The common covariance matrix $\Sigma$ can then be estimated by the pooled estimate
$$S = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}.$$
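The sample version of $\Delta^2$ is denoted by $D^2$ and is given by
$$D^2 = (\bar{x}_1 - \bar{x}_2)^T S^{-1} (\bar{x}_1 - \bar{x}_2),$$
where $S$ is the pooled estimate of $\Sigma$.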
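The pooled covariance estimate and the sample distance $D^2$ can be sketched in a few lines of NumPy. This is a minimal illustration, assuming each sample is an array with one observation per row and at least two columns; the function names and the simulated data are hypothetical.

```python
import numpy as np

def pooled_covariance(x1, x2):
    """Pooled estimate of the common covariance matrix of two samples."""
    n1, n2 = len(x1), len(x2)
    s1 = np.cov(x1, rowvar=False)  # np.cov divides by n - 1 by default
    s2 = np.cov(x2, rowvar=False)
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

def sample_mahalanobis_sq(x1, x2):
    """Sample squared Mahalanobis distance D^2 between two groups."""
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    s = pooled_covariance(x1, x2)
    return float(diff @ np.linalg.solve(s, diff))

# Simulated data with made-up parameters, for illustration only.
rng = np.random.default_rng(0)
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
x1 = rng.multivariate_normal([0.0, 0.0], sigma, size=30)
x2 = rng.multivariate_normal([2.0, 1.0], sigma, size=40)

d_sq = sample_mahalanobis_sq(x1, x2)
print(d_sq)
```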
The sample Mahalanobis distance, $D^2$, is known to overestimate its population counterpart, $\Delta^2$.
In situations where $D^2$ is used, knowledge of its sampling distribution is therefore needed.
Under the assumption of normality, $cD^2$ follows a noncentral F-distribution with $p$ and $N - p + 1$ degrees of freedom and noncentrality parameter $k\Delta^2$, where $c = k(N - p + 1)/(pN)$, $k = n_1 n_2/(n_1 + n_2)$ and $N = n_1 + n_2 - 2$.
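The scaling constant and the degrees of freedom are easy to get wrong by hand, so a small helper may be useful. The sketch below takes $N = n_1 + n_2 - 2$; the function name `scaled_d2` and the input values are made up for illustration.

```python
def scaled_d2(d_sq, n1, n2, p):
    """Return (c * D^2, numerator df, denominator df) for the F reference."""
    big_n = n1 + n2 - 2              # N, assumed to be n1 + n2 - 2
    k = n1 * n2 / (n1 + n2)
    c = k * (big_n - p + 1) / (p * big_n)
    return c * d_sq, p, big_n - p + 1

# Made-up inputs: D^2 = 1.5 from two samples of 20 observations on 3 variables.
stat, dfn, dfd = scaled_d2(d_sq=1.5, n1=20, n2=20, p=3)
print(stat, dfn, dfd)
```

When the two population means are equal ($\Delta^2 = 0$) the noncentrality parameter is zero, so the scaled statistic can be referred to a central F-distribution, for example with `scipy.stats.f.sf(stat, dfn, dfd)` if SciPy is available.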
When the Mahalanobis distance is used to test whether an observed random sample $x_1, \ldots, x_n$ comes from a multivariate normal distribution, the squared distances $D_j^2$ ($j = 1, \ldots, n$) should, under the null hypothesis, be approximately independent, with a common distribution that can be approximated by a chi-squared distribution with $p$ degrees of freedom.
The Mahalanobis distance of the $j$th observation from the sample mean is
$$D_j^2 = (x_j - \bar{x})^T S^{-1} (x_j - \bar{x}),$$
where $\bar{x}$ denotes the sample mean and $S$ denotes the (bias-corrected) sample covariance matrix of the $n$ observations in the observed sample.
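The per-observation distances can be computed for a whole sample at once. The sketch below is one possible vectorized version, assuming the data are stored with one observation per row and at least two columns; the function name and the simulated sample are hypothetical.

```python
import numpy as np

def mahalanobis_sq_per_observation(x):
    """Squared Mahalanobis distance of each row of x from the sample mean."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    s = np.cov(x, rowvar=False)  # bias-corrected (divides by n - 1)
    # Row-wise quadratic forms (x_j - xbar)^T S^{-1} (x_j - xbar).
    solved = np.linalg.solve(s, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)

# Simulated multivariate normal sample, for illustration only.
rng = np.random.default_rng(1)
x = rng.standard_normal((200, 3))
d_sq = mahalanobis_sq_per_observation(x)

# Sanity check: with the bias-corrected S, the distances sum to (n - 1) * p.
print(d_sq.sum(), (x.shape[0] - 1) * x.shape[1])
```

A common follow-up is to compare the sorted distances with the quantiles of a chi-squared distribution with $p$ degrees of freedom, for example in a Q-Q plot.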
Alternatively, we can form the modified Mahalanobis distances $d_1, \ldots, d_n$, where
$$d_j^2 = (x_j - \bar{x}_{(j)})^T S_{(j)}^{-1} (x_j - \bar{x}_{(j)}),$$
and $\bar{x}_{(j)}$ and $S_{(j)}$ denote respectively the sample mean and (bias-corrected) sample covariance matrix of the $n - 1$ observations remaining after the deletion of $x_j$ ($j = 1, \ldots, n$).
In this case, the $d_j^2$ can be taken to be approximately independent, with the common distribution of $q d_j^2$ given exactly by an F-distribution with $p$ and $n - p - 1$ degrees of freedom, where $q = (n - 1)(n - p - 1)/\{pn(n - 2)\}$.
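The leave-one-out distances can be sketched with a plain loop that deletes one observation at a time. This is a simple, unoptimized illustration, assuming one observation per row and at least two columns; the function names and the simulated data are hypothetical.

```python
import numpy as np

def modified_mahalanobis_sq(x):
    """Leave-one-out squared distances: each x_j against the other n - 1 rows."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    out = np.empty(n)
    for j in range(n):
        rest = np.delete(x, j, axis=0)      # sample with x_j removed
        diff = x[j] - rest.mean(axis=0)
        s_j = np.cov(rest, rowvar=False)    # bias-corrected covariance of the rest
        out[j] = diff @ np.linalg.solve(s_j, diff)
    return out

def f_scale(n, p):
    """Constant q such that q * d_j^2 is referred to F(p, n - p - 1)."""
    return (n - 1) * (n - p - 1) / (p * n * (n - 2))

# Simulated sample, for illustration only.
rng = np.random.default_rng(2)
x = rng.standard_normal((50, 3))
d_sq = modified_mahalanobis_sq(x)
scaled = f_scale(*x.shape) * d_sq
print(scaled[:5])
```

The loop recomputes the mean and covariance n times, which is fine for small samples; for large n a rank-one update of the full-sample quantities would be cheaper.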
Interesting questions that can be answered using the Mahalanobis distance
- How different are the metabolic characteristics of normal persons, chemical diabetics and overt diabetics, as determined by a total glucose tolerance test, and how can a diagnosis be made?
- On the basis of remote sensing data from satellites, how do you classify various tracts of land by vegetation type, rock type, etc.?
- Problems of pattern recognition or discriminant analysis (using the optimal discriminant function, measured in terms of $\Delta^2$).
- Classification problems (and how classification differs from discriminant analysis).
References
- Article on the Mahalanobis distance by G. J. McLachlan, Resonance, 1999: https://www.ias.ac.in/article/fulltext/reso/004/06/0020-0026
- Chapter 17 Quadratic Form of a Matrix | Matrix Algebra for Educational Scientists (zief0002.github.io)