
Fuzzy C-means

Summarized notes from Introduction to Data Mining (Int'l ed), Chapter 8.2

Unsupervised learning > clustering

  • assign a weight to each pair of object and cluster that indicates the degree to which the object belongs to the cluster => \(w_{ij}\) is the weight with which object \(x_i\) belongs to cluster \(C_j\)

  • additional constraints for fuzzy clusters:

    • each weight is between 0 and 1
    • the sum of all weights for each point \(x_i\) is 1
    • each cluster contains at least one point with non-zero weight, but no cluster contains all points with a weight of 1

Pseudo code

FUZZY_C_MEANS(k):
    select an initial fuzzy pseudo-partition, i.e. assign initial values to all weights w_ij
    do:
        compute the centroid of each cluster using the fuzzy pseudo-partition
        recompute the fuzzy pseudo-partition, i.e. the weights w_ij
    while(centroids change)
    # alternatively stop when the change is below some threshold

  • random initialization is often used for the initial weights \(w_{ij}\)
  • the choice of the initial partition matters, just as the choice of initial centroids does in K-means
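
The pseudo code above translates fairly directly into a small implementation. Below is a minimal sketch in Python/NumPy (the name `fuzzy_c_means` and the parameter choices are illustrative, not from the book); it assumes Euclidean distance and uses the centroid and weight update formulas given further down.

```python
import numpy as np

def fuzzy_c_means(X, k, p=2.0, tol=1e-6, max_iter=300, seed=None):
    """X: (m, n) data matrix, k: number of clusters, p: fuzzifier (> 1)."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]

    # initial fuzzy pseudo-partition: random weights, each row sums to 1
    W = rng.random((m, k))
    W /= W.sum(axis=1, keepdims=True)

    centroids = None
    for _ in range(max_iter):
        Wp = W ** p

        # compute the centroid of each cluster using the fuzzy pseudo-partition
        new_centroids = (Wp.T @ X) / Wp.sum(axis=0)[:, None]

        # recompute the fuzzy pseudo-partition from the squared distances
        d2 = ((X[:, None, :] - new_centroids[None, :, :]) ** 2).sum(axis=2)
        inv = (1.0 / np.maximum(d2, 1e-12)) ** (1.0 / (p - 1.0))
        W = inv / inv.sum(axis=1, keepdims=True)

        # stop when the centroids (almost) no longer change
        if centroids is not None and np.allclose(centroids, new_centroids, atol=tol):
            centroids = new_centroids
            break
        centroids = new_centroids

    return W, centroids
```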

The algorithm attempts to minimize a fuzzy version of the sum of squared errors (SSE):

\[ \displaystyle SSE(C_1,C_2,\dots,C_k) = \sum_{j=1}^{k}\sum_{i=1}^m w_{ij}^{p} dist(x_i, c_j)^2 \]

where \(x_i\) is a data point, \(c_j\) is the centroid of cluster \(C_j\), \(m\) is the number of points in the dataset, \(k\) is the number of clusters, and \(p\), a value between 1 and \(\infty\), determines the influence of the weights.
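
The objective can also be computed directly as a sanity check; a small sketch assuming NumPy arrays (the function name `fuzzy_sse` and the array shapes are illustrative):

```python
import numpy as np

def fuzzy_sse(X, W, C, p=2.0):
    """X: (m, n) points, W: (m, k) membership weights, C: (k, n) centroids."""
    # squared Euclidean distance from every point to every centroid, shape (m, k)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # weight each squared distance by w_ij^p and sum over all points and clusters
    return float((W ** p * d2).sum())
```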

Considerations for choosing \(p\):

  • as \(p\) approaches 1, fuzzy c-means behaves like traditional K-means
  • as \(p\) increases, all the cluster centroids approach the global centroid of all the data points

Computing centroids:

\[ c_j = \sum_{i=1}^{m} w_{ij}^p x_i / \sum_{i=1}^{m} w_{ij}^p \]

The fuzzy centroid is similar to a traditional centroid, except that all points are taken into account (not only those assigned to the cluster), and each point's contribution is weighted by its membership weight \(w_{ij}^p\).
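
A minimal sketch of this centroid update in NumPy (names are illustrative; W is the (m, k) weight matrix, X the (m, n) data matrix):

```python
import numpy as np

def update_centroids(X, W, p=2.0):
    Wp = W ** p
    # each centroid is a weighted average of all points, with weights w_ij^p
    return (Wp.T @ X) / Wp.sum(axis=0)[:, None]
```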

Computing weight update:

\[ w_{ij} = (1 / dist(x_i,c_j)^2)^{\frac{1}{p-1}} / \sum_{q=1}^k (1 / dist(x_i,c_q)^2)^{\frac{1}{p-1}} \]

If \(p > 2\), the exponent \(1/(p-1)\) decreases the weight assigned to clusters that are close to the point. As \(p\) approaches 1, the membership weight goes to 1 for the closest cluster and to 0 for all the other clusters.
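
A corresponding sketch of the weight update in NumPy (assuming Euclidean distance and \(p > 1\); names are illustrative):

```python
import numpy as np

def update_weights(X, C, p=2.0, eps=1e-12):
    """X: (m, n) points, C: (k, n) centroids; returns the (m, k) weight matrix."""
    # squared distance from every point to every centroid, shape (m, k)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    inv = (1.0 / np.maximum(d2, eps)) ** (1.0 / (p - 1.0))
    # normalize so that each point's weights sum to 1 across clusters
    return inv / inv.sum(axis=1, keepdims=True)
```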

Analysis

  • able to represent the degree to which each point belongs to any particular cluster
  • other advantages similar to K-means
  • computationally more intensive than K-means