In statistics, Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936.

It is based on correlations between variables by which different patterns can be identified and analyzed. It is a useful way of determining similarity of an unknown sample set to a known one. It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant, i.e. not dependent on the scale of measurements.

One way to think about the Mahalanobis distance is that in Euclidean space every dimension is given the same importance, or weight. That is, to measure the distance between two points $(x_1,y_1,z_1)$ and $(x_2,y_2,z_2)$ we simply take:

Let us call vector $ \vec{v_1}=(x_1,y_1,z_1)^{\top} $

and $ \vec{v_2}=(x_2,y_2,z_2)^{\top} $

$ D(\vec{v_2},\vec{v_1})=\sqrt{(x_2-x_1)^2 + (y_2-y_1)^2 + (z_2-z_1)^2} $

Also, $ D(\vec{X},\vec{0}) =\sqrt{x^2 + y^2 + z^2} $ is defined as the norm of the vector $ \vec{X} $ with components $(x, y, z)$.
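The Euclidean distance and norm above can be computed directly; here is a minimal sketch with NumPy (the point coordinates are arbitrary example values):

```python
import numpy as np

# Two example points in 3D (arbitrary values for illustration)
v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([4.0, 6.0, 3.0])

# Euclidean distance: sqrt((x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2)
d = np.sqrt(np.sum((v2 - v1) ** 2))

# The norm of a vector is its distance from the origin
norm_v1 = np.linalg.norm(v1)

print(d)        # same value as np.linalg.norm(v2 - v1)
print(norm_v1)
```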

It follows immediately that all points at the same distance from the origin form a sphere: plotting all points that satisfy $ D(\vec{X},\vec{0})=k $, for some constant $k$, gives a sphere of radius $k$.


On the other hand, suppose we want to weigh some dimensions more (or less) than the others.

Let us suppose we scale the dimension $x$ by a weight $w_1$, $y$ by $w_2$, and $z$ by $w_3$.

Then the new distance is:

$ D(\vec{v_1},\vec{v_2})=\sqrt{\left(\frac{x_2-x_1}{w_1}\right)^2 + \left(\frac{y_2-y_1}{w_2}\right)^2 + \left(\frac{z_2-z_1}{w_3}\right)^2} $

where dividing by the weights scales each dimension according to its importance.
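The weighted distance can be computed the same way; a short sketch, with the weights chosen arbitrarily for illustration:

```python
import numpy as np

# Assumed example weights for the x, y, z dimensions
w = np.array([1.0, 2.0, 0.5])

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([4.0, 6.0, 3.0])

# Each coordinate difference is divided by its weight before squaring
d_weighted = np.sqrt(np.sum(((v2 - v1) / w) ** 2))
print(d_weighted)
```

A large weight $w_i$ shrinks differences along that dimension, so that dimension counts for less in the total distance.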

We can rewrite this as:

$ D(\vec{v_1},\vec{v_2})= \sqrt{(\vec{v_1}-\vec{v_2})^{\top} D^{-1} (\vec{v_1}-\vec{v_2})} $

where $ (\vec{v_1}-\vec{v_2})^{\top} $ is the transpose of $ \vec{v_1}-\vec{v_2} $, and $D$ is the diagonal matrix with entries $ w_1^2 $, $ w_2^2 $, and $ w_3^2 $.

Here the norm of a vector $ \vec{X} $ is defined as

$ D(\vec{X},\vec{0})= \sqrt{\vec{X}^{\top} D^{-1} \vec{X}} $
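The per-coordinate weighted formula and the matrix form above are the same computation; a quick numerical check, using the same assumed example weights and points as before:

```python
import numpy as np

# Assumed example weights and points, as above
w = np.array([1.0, 2.0, 0.5])
v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([4.0, 6.0, 3.0])

# Diagonal matrix D with entries w1^2, w2^2, w3^2
D = np.diag(w ** 2)

diff = v1 - v2
# Matrix form: sqrt((v1-v2)^T D^{-1} (v1-v2))
d_matrix = np.sqrt(diff @ np.linalg.inv(D) @ diff)

# Per-coordinate form: divide each difference by its weight
d_weighted = np.sqrt(np.sum((diff / w) ** 2))

print(d_matrix, d_weighted)  # the two agree
```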

Thus it follows immediately that all points at the same distance $k$ from the origin obey:

$ \left(\frac{x}{w_1}\right)^2 + \left(\frac{y}{w_2}\right)^2 + \left(\frac{z}{w_3}\right)^2 = k^2 $

This is easily recognized as the equation of an ellipsoid whose axes are aligned with the coordinate axes.

Finally, we also take the correlation between variables into account when computing statistical distances. Correlation means that there are associations between the variables, and we want the axes of the ellipsoid to reflect them. This is obtained by allowing the axes of the ellipsoid of constant distance to rotate. Then $D$ is no longer diagonal: it becomes the covariance matrix of the data. Its diagonal entries are the variances along each dimension, while its off-diagonal entries are the covariances between pairs of variables.
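Putting this together, the full Mahalanobis distance replaces $D$ with the sample covariance matrix. A sketch on synthetic correlated data (the data-generating coefficients are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated 2D data: y depends linearly on x plus noise
n = 1000
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)
data = np.column_stack([x, y])

# Sample covariance matrix; off-diagonal entries capture the correlation
S = np.cov(data, rowvar=False)
S_inv = np.linalg.inv(S)

def mahalanobis(u, v, S_inv):
    """Mahalanobis distance sqrt((u-v)^T S^{-1} (u-v))."""
    diff = np.asarray(u) - np.asarray(v)
    return np.sqrt(diff @ S_inv @ diff)

center = data.mean(axis=0)
d = mahalanobis(data[0], center, S_inv)
print(d)
```

Note that with $S$ equal to the identity matrix this reduces to the ordinary Euclidean distance, and with a diagonal $S$ it reduces to the weighted distance above.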

References

A nice example can be seen at :

http://www.jennessent.com/arcview/mahalanobis_description.htm
