The ISODATA algorithm has some further refinements by splitting and merging of clusters (JENSEN, 1996). Clusters are merged if either the number of members (pixel) in a cluster is less than a certain threshold or if the centers of two clusters are closer than a certain threshold. Clusters are split into two different clusters if the cluster standard deviation exceeds a predefined value and the number of members (pixels) is twice the threshold for the minimum number of members.

The ISODATA algorithm is similar to the k-means algorithm with the distinct difference that the ISODATA algorithm allows for different number of clusters while the k-means assumes that the number of clusters is known a priori.


The objective of the k-means algorithm is to minimize the within cluster variability. The objective function (which is to be minimized) is the sums of squares distances (errors) between each pixel and its assigned cluster center.

$ SS_{distances} = \sum_{all x}[ x - C(x)]^2 $


where C(x) is the mean of the cluster that pixel x is assigned to.

Minimizing the SSdistances is equivalent to minimizing the Mean Squared Error (MSE). The MSE is a measure of the within cluster variability.

$ MSE = \frac{\sum_{all x}{[x - C(x)]^2} }{(N - c)^b} = \frac{SS_{distances}}{(N-c)^b} $


where N is the number of pixels, c indicates the number of clusters, and b is the number of spectral bands.

Note that the MSE is not the objective function of the ISODATA algorithm. However, the ISODATA algorithm tends to also minimize the MSE.

K-means (just as the ISODATA algorithm) is very sensitive to initial starting values. For two classifications with different initial values and resulting different classification one could choose the classification with the smallest MSE (since this is the objective function to be minimized). However, as we show later, for two different initial values the differences in respects to the MSE are often very small while the classifications are very different. Visually it is often not clear that the classification with the smaller MSE is truly the better classification.


From a statistical viewpoint, the clusters obtained by k-mean can be interpreted as the Maximum Likelihood Estimates (MLE) for the cluster means if we assume that each cluster comes from a spherical Normal distribution with different means but identical variance (and zero covariance).

This touches upon a general disadvantage of the k-means algorithm (and similarly the ISODATA algorithm): k-means works best for images with clusters that are spherical and that have the same variance. This is often not true for remote sensing images. For example, a cluster with "desert" pixels is compact/circular. A "forest" cluster, however, is usually more or less elongated/oval with a much larger variability compared to the "desert" cluster. While the "desert" cluster is usually very well detected by the k-means algorithm as one distinct cluster, the "forest" cluster is often split up into several smaller cluster. The way the "forest" cluster is split up can vary quite a bit for different starting values and is thus arbitrary.



  • Here are some useful links for ISODATA clustering

Alumni Liaison

Ph.D. 2007, working on developing cool imaging technologies for digital cameras, camera phones, and video surveillance cameras.

Buyue Zhang