Introduction

This approach consist in using certain models for clusters and attempting to optimize the fit between the data and the model. In practice, each cluster can be mathematically represented by a parametric distribution, like a Gaussian. Thus, the entire data set is modelled by a mixture of these distributions.

Mixture of Gaussians

This is the most widely used clustering method of this kind is the one based on learning a mixture of Gaussians. The algorithms works in this way:

1) it chosses the component (the Gaussian) at random with probability $ P(w_i) $.

2) it samples a point $ N(m_i,std^2I) $ So basically it is trying to choose clusters so that the average distance between the data vectors and the chosen means (basically the model parameter) is a minimum. Tesentation of the way the data is distributed in the space.

3) Using Expectation-Maximization_OldKiwi algorithm we maximise the likelihood function.

Alumni Liaison

Questions/answers with a recent ECE grad

Ryne Rayburn