Whitening and Coloring Transforms for Multivariate Gaussian Random Variables

A slecture by ECE student Maliha Hossain

Partly based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.

## Introduction

Data whitening is required prior to many data processing techniques such as principal component analysis, independent component analysis, etc. This slecture discusses how to whiten data that is normally distributed but keep in mind that although I have illustrated the whitening and coloring transforms with Gaussian data, these transforms can be used on data drawn from according to any probability law. The slecture also covers how to transform white data so that it resembles data drawn from another Gaussian distribution with known mean and covariance. This will be useful material for when the reader would like to generate data points from non-white Gaussian distributions.

Throughout this slecture, we will denote the probability density function (pdf) of the random variable X as f$_X$ : R$^d$R where XR$^d$ is a d-dimensional Gaussian random vector with mean μ and covariance matrix Σ. This slecture assumes that the reader is familiar with topics from linear algebra including but not limited to vector spaces, eigenvalues, and eigenvectors. It is no surprise that eigendecomposition plays an important role in data whitening since we would ultimately like to produce a diagonal covariance matrix. The transforms revolve around expressing Σ as

$\mathbf{\Sigma} = \mathbf U^T\mathbf U$

We will focus on finding U via eigendecomposition but there are other methods such as the Cholesky decomposition for symmetric positive definite matrices that could be used to correlate and decorrelate vectors.

The pdf of X is given by

$f_{\mathbf X} (\mathbf x) = \frac{1}{(2\pi)^{\frac{d}{2}}|\mathbf{\Sigma}|^{\frac{1}{2}}}\mbox{exp}\left[{-\frac{1}{2}(\mathbf x - \mu)^T\mathbf{\Sigma}^{-1}(\mathbf x - \mu)}\right]$

## Whitening Transform

Let X be a multivariate Gaussian random vector with arbitrary covariance matrix Σ and mean vector μ. We would like to transform it to another random vector whose covariance is the identity matrix. This means that the components of our new random variable are uncorrelated and have variance 1. This process is called whitening because the output resembles a white noise vector.

There are two steps to the procedure. First, we must decorrelate the components of the vector. This produces a data point drawn from a distribution with a diagonal covariance matrix. Finally, we must scale the different components so that they have unit variance. In order to do this, we need the eigendecomposition of the original covariance matrix Σ. This is either known or it can be estimated from your data set. We assume that Σ is positive definite. We also assume wlog that X has zero mean. If not, then subtract the mean (estimated or otherwise) from your samples such that

\begin{align} \mathbf {\mu} &= \mathbb E[\mathbf X] = \mathbf 0 \\ \Rightarrow\mathbf{\Sigma} &= \mathbb E[\mathbf{XX}^T]\end{align}

The covariance matrix can be expressed as follows

$\mathbf{\Sigma} = \mathbf{\Phi\Lambda\Phi}^{-1} = \mathbf{\Phi\Lambda}^{\frac{1}{2}}\mathbf{\Lambda}^{\frac{1}{2}}\mathbf{\Phi}^{-1}$

where Λ is a diagonal matrix with the eigenvalues, λ$_i$, of Σ on the diagonal and Φ is the matrix of corresponding eigenvectors, e$_i$. In addition, the columns of Φ are orthonormal so that

$\mathbf{\Phi}^{-1} = \mathbf{\Phi}^T$

Let

$\mathbf Y = \mathbf{\Phi}^T \mathbf X$

Then Y is a random vector with a decorrelated multivariates Gaussian distribution with variances λ$_i$ on the diagonal of its covariance matrix. The next step then is quite obvious. Let

$\mathbf W = \mathbf{\Lambda}^{-\frac{1}{2}} \mathbf Y = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf{\Phi}^T\mathbf X$

W is then random vector with the standard multivariate distribution. Note that X = x, Y = y, and W = w are at the same Mahalanobis distance away from their respective means (in this case, the origin) and in the case of W, the Mahalanobis distance and the Euclidean distance are the same.

Now that you have the procedure, you may be asking why this transform works. Well, we can derive the covariance matrix of Y. Recall that its mean is the zero vector so its covariance is

\begin{align}\mathbb E[\mathbf {YY}^T] &=\mathbb E[\mathbf{\Phi}^T\mathbf{ XX}^T\mathbf{\Phi}] \\ &= \mathbf{\Phi}^T\mathbb E[\mathbf{ XX}^T]\mathbf{\Phi} \\ &= \mathbf{\Phi}^T\mathbf{\Sigma}\mathbf{\Phi} \\ &= \mathbf{\Phi}^T\mathbf{\Phi}\mathbf{\Lambda}\mathbf{\Phi}^T\mathbf{\Phi} \\ &= \mathbf{\Lambda} \end{align}

This proves what we previously stated; the components of Y are uncorrelated since the covariance matrix is diagonal. In the Gaussian case, this is equivalent to independence since we can express the joint pdf of the components of Y as a product of the marginals.

\begin{align} f_{\mathbf Y}(\mathbf y)&=\frac{1}{(2\pi)^{\frac{d}{2}}|\mathbf{\Lambda}|^{\frac{1}{2}}}\mbox{exp}\left(-\frac{1}{2}\mathbf y^T\mathbf{\Lambda}^{-1}\mathbf y\right) \\ &= \frac{1}{(2\pi)^{\frac{d}{2}}|\mathbf{\Lambda}|^{\frac{1}{2}}}\mbox{exp}\left(-\frac{1}{2}\sum_{i=1}^d \frac{y_i^2}{\lambda_i}\right) \\ &= \prod_{i=1}^d \frac{1}{\sqrt{2\pi\lambda_i}}\mbox{exp}\left(-\frac{y_i^2}{2\lambda_i}\right) \end{align}

Using the fact that diagonal matrix Λ$^{0.5}$ is symmetric, the covariance matrix of W is

\begin{align} \mathbb E[\mathbf {WW}^T] &=\mathbb E[\mathbf{\Lambda}^{-\frac{1}{2}T}\mathbf{ YY}^T\mathbf{\Lambda}^{-\frac{1}{2}}] \\ &= \mathbf{\Lambda}^{-\frac{1}{2}T}\mathbb E[\mathbf{ YY}^T]\mathbf{\Lambda}^{-\frac{1}{2}} \\ &= \mathbf{\Lambda}^{-\frac{1}{2}T}\mathbf{\Lambda}\mathbf{\Lambda}^{-\frac{1}{2}} \\ &= \mathbf{\Lambda}^{-\frac{1}{2}} \mathbf{\Lambda}^{\frac{1}{2}} \mathbf{\Lambda}^{\frac{1}{2}} \mathbf{\Lambda}^{-\frac{1}{2}} \\ &= \mathbf I \mathbf I \\ &= \mathbf I \end{align}

This concludes the proof of how the linear transform given by matrix pre-multiplication by Λ$^{-0.5}$Φ$^T$ produces whitened data.

Let us now work through an example. For the purpose of illustration, I will use a bivariate Gaussian vector X∼N(0,Σ) where

$\mathbf {\Sigma} = \begin{bmatrix}10 && -6 \\ -6 && 5 \end{bmatrix}$

Let S be the collection of data points obtained by taking $n$ samples from X. The columns of S are random vectors X$_i$, $i = 1,2,...,n$. S can be generated in MATLAB using the following code.

% Generate n samples from colored bivariate Gaussian
n = 1000;
CovX = [10 -6;-6 5];
mu = [0 0];
S = mvnrnd(mu,CovX,n);
S = S';
Fig 1: Scatter plots and Contours. All contours are at Mahalanobis distance 1. (a) Original colored density of X, (b) decorrelated density of Y, (c) whitenend density of W.

Figure 1(a) shows a scatter plot of the generated samples along with one contour at unit Mahalanobis distance. The contour is an ellipse that is centered at the origin and rotated through an angle. Say that we do not know the parameters of the distribution from which S was obtained so we estimate them. We use the maximum likelihood estimates for the parameters.

$\hat{\mu} = \frac{1}{n}\sum_{i=1}^n\mathbf X_i = \begin{bmatrix}-0.0753 \\ 0.0568\end{bmatrix}$
\begin{align} \hat{\mathbf{\Sigma}} &= \frac{1}{n-1}\sum_{i=1}^n(\mathbf X_i-\mu)(\mathbf X_i-\mu)^T \\ &= \frac{1}{n-1}\sum_{i=1}^n\mathbf X_i\mathbf X_i^T \\ &= \begin{bmatrix} 9.9559 && -5.8718 \\ -5.8718 && 4.8099 \end{bmatrix} \\ \end{align}

The code for the above estimation is given by

% sample mean
mus = mean(S,2);
% estimate covariance
Covs = bsxfun(@minus,S,mus) * bsxfun(@minus,S,mus)'/(n-1);

Next, we find the eigenvectors of the estimated covariance matrix by finding the orthonormal basis for the nullspace of matrix (Σ$_i$I). In MATLAB, we simply use the eig command to find both Φ and Λ.

[Phi,Lam] = eig(Covs);

\begin{align}\mathbf{\Lambda}=\begin{bmatrix}0.9881 && 0 \\ 0 && 13.8774\end{bmatrix} && \mathbf{\Phi} = \begin{bmatrix}-0.5576 && -0.8301 \\ -0.8301 && 0.5576\end{bmatrix}\end{align}

Now let

${\mathbf Y} = \mathbf{\Phi}\mathbf S$

Then, Y is a set of $n$ data points drawn from an uncorrelated bivariate Gaussian distribution with diagonal covariance matrix Λ. The premultiplication by Φ essentially rotates the ellipse in figure 1(a) so that its principal axes are aligned with the $x_1-x_2$ axes. The result of this transformation is shown in figure 1(b).

Finally, we scale the samples so the radii of the ellipse are of unit length.

$\mathbf W = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf Y$

In MATLAB,

W = sqrt(Lam)\ Y;

The whitened data is shown in figure 1(c). Note that the normalization produces a contour that is now a circle. From MATLAB, the estimated covariance of the whitened samples is given by

$\hat{\mathbf{\Lambda}}_W = \begin{bmatrix}1.0005 &&-0.0001 \\ -0.0001 && 1.0000\end{bmatrix}$

Note that the whitening transform is not unique. If e$_1$ is an eigenvector, then so is -e$_1$. Also, Λ$^{-0.5}$ is not unique either. We could have taken the negative roots of the eigenvalues but since we think of them as standard deviations, it is easier to think of them as positive quantities. In our intermediary step, we rotated the ellipse so that the major axis is aligned vertically in figure 1(c) but we could have done it the other way around and arrived at the same conclusion.

## Coloring Transform

The coloring transform is the inverse of the whitening transform. Here we see how to transform W, a multivariate Gaussian variable whose components are i.i.d with unit variance to X, a Gaussian random vector with some desired covariance matrix.

Say you have S, a matrix whose $n$ columns are $n$ samples drawn from a whitened Gaussian distribution. So S is of the form [W$_1$,...W$_n$] but what you want is $n$ samples from a distribution that has covariance matrix Σ and mean μ. The first step is to perform the eigendocomposition of your desired covariance matrix Σ.

$\mathbf{\Sigma} = \mathbf{\Phi\Lambda\Phi}^T = \mathbf{\Phi\Lambda}^{\frac{1}{2}}\mathbf{\Lambda}^{\frac{1}{2}}\mathbf{\Phi}^T$

where Λ is a diagonal matrix with the eigenvalues, λ$_i$, of Σ on the diagonal and Φ is the matrix of corresponding eigenvectors, e$_i$. In addition, the columns of Φ are orthonormal so that

$\mathbf{\Phi}^{-1} = \mathbf{\Phi}^T$

The first step is to scale the samples so their spread reflects your desired variances along the principal axes. This step changes the spherical contours to elliptical ones.

$\mathbf Y = \mathbf{\Lambda }^{\frac{1}{2}}\mathbf{S}$

The next step is to rotate the data points so that they are now correlated.

$\mathbf X = \mathbf{\Phi Y} = \mathbf{\Phi \Lambda }^{\frac{1}{2}}\mathbf{S}$

Fig 2: Scatter plots and Contours. All contours are at Mahalanobis distance 1. (a) Original white density of W, (b) scaled density of Y, (c) colored density of X.

Figure 2 shows the mapping from W to Y to X for a bivariate distribution with

$\mathbf{\Sigma} = \begin{bmatrix} 10 && -6 \\ -6 && 5 \end{bmatrix}$

It is just like figure 1 from right to left. Just like the whitening transform, the coloring transform is not unique either. Finally, add μ to your samples so that the mean is translated from the origin to your desired location. The transform can be implemented in MATLAB with the following code.

function[X] = colortran(mu, CovX, N)

% function generates N data points from multivariate Gaussian
% with mean vector mu and covariance matrix CovX.

% Find eigenvectors and eigenvalues
[Phi,Lam] = eig(CovX);
% Generate N samples of white data
d = length(mu);
S = mvnrnd(zeros(1,d),eye(d),N)';
% Scale to desired variances
Y = sqrt(Lam)*S;
% Rotate or correlate samples
X = Phi*Y;
% Add the mean vector to every column of X
X = bsxfun(@plus,X,mu);

end

## Summary

So we see that data whitening can be broken down into two steps:

• Decorrelation. In other words, a rotation that maps the data so that the axes along which the original data has the largest variance are aligned with your co-ordinate axes. The resulting covariance matrix is diagonal. This is really just a change of basis.
• Scaling so that the principal axes are each of unit length. The data is squeezed or stretched along individual axes depending on whether the variance along said axis is greater or less than one. The resulting covariance matrix is identity.

We can summarize coloring as the inverse of whitening. I also find it helpful to compare it with the simplest case, when we are working with scalars. Recall that multiplying a random variable with a constant produces another whose standard deviation is scaled by a factor equal to said constant. Consequently, If $W$ is a random variable of unit variance and $X = c.W$, then $X$ has standard deviation $c.1$ and variance $c^2$. This multiplication is a linear transform. In the multivariate case, we are expressing our desired covariance matrix as a product

$\mathbf{\Sigma} = \mathbf U^T\mathbf U$

where U$^T$ = ΦΛ$^{0.5}$. This U$^T$ is our linear transform and is analogous to $c$ in the scalar case.

I am aware that MATLAB comes with the mvnrnd command that can generate data points from any multivariate Gaussian distribution but hopefully, this material provides sufficient background to allow the reader to not only whiten and color data but also to transform data from one distribution to another in other programs that are not equipped with the same out of the box capabilities as MATLAB. A summary of this process is presented in figure 3.

Fig 3: Summary diagram of whitening and coloring process.

## References

[1]. Charles A. Bouman, Digital Image Processing Laboratory: Eigen-decomposition of Images. February 22, 2013

[2]. Mireille Boutin, "ECE662: Statistical Pattern Recognition and Decision Making Processes," Purdue University, Spring 2014.

[3]. Gilbert Strang, Linear Algebra and its Applications. Thomson Brooks/Cole. 4th Edition, 2006.