Fuzzy C-means
Summarized notes from Introduction to Data Mining (Int'l ed), Chapter 8.2
- fuzzy clustering assigns a weight to each object and each cluster that indicates the degree to which the object belongs to the cluster: $w_{ij}$ is the weight with which object $\mathbf{x}_i$ ($i = 1,\dots,m$) belongs to cluster $C_j$ ($j = 1,\dots,k$)
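additional constraints for fuzzy clusters (weights satisfying them form a fuzzy pseudo-partition):
- each weight is between 0 and 1: $0 \le w_{ij} \le 1$
- the weights of each point $\mathbf{x}_i$ sum to 1: $\sum_{j=1}^{k} w_{ij} = 1$
- every cluster contains at least one point with non-zero weight, but no cluster contains all points with weight 1: $0 < \sum_{i=1}^{m} w_{ij} < m$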
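A small check of these three constraints can make them concrete. This is a sketch that assumes the weights are stored as a list of rows, `W[i][j]` being the weight of point i in cluster j, and uses a floating-point tolerance `tol`; the function name `is_fuzzy_pseudo_partition` is illustrative and not from the book.

```python
from typing import Sequence


def is_fuzzy_pseudo_partition(W: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """Return True if the m x k weight matrix W (W[i][j] = weight of point i in
    cluster j) satisfies the fuzzy clustering constraints."""
    m = len(W)
    if m == 0:
        return False
    k = len(W[0])
    # Every weight lies between 0 and 1.
    if any(w < -tol or w > 1.0 + tol for row in W for w in row):
        return False
    # The weights of each point sum to 1.
    if any(abs(sum(row) - 1.0) > tol for row in W):
        return False
    # Each cluster holds some point with non-zero weight,
    # but does not hold every point with weight 1.
    for j in range(k):
        column_sum = sum(row[j] for row in W)
        if not (tol < column_sum < m - tol):
            return False
    return True


print(is_fuzzy_pseudo_partition([[0.7, 0.3], [0.2, 0.8]]))  # True
print(is_fuzzy_pseudo_partition([[1.0, 0.0], [1.0, 0.0]]))  # False: one cluster holds every point with weight 1, the other is empty
```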
Pseudo code
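1. Select an initial fuzzy pseudo-partition, i.e., assign values to all the $w_{ij}$.
2. repeat
3.     Compute the centroid of each cluster using the fuzzy pseudo-partition.
4.     Recompute the fuzzy pseudo-partition, i.e., the $w_{ij}$.
5. until the centroids don't change
(alternative stopping conditions: the change in the error, or the absolute change in any $w_{ij}$, falls below a given threshold)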
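- random initialization is often used: the initial weights are chosen randomly, subject to the weights of each point summing to 1
- as with K-means, random initialization is simple but often ends in a local minimum of the SSE, so the K-means strategies for choosing initial centroids are relevant here too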
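One way to draw random initial weights that respect the constraints is sketched below. It assumes each point's raw weights are drawn uniformly and then normalized to sum to 1; the notes only require random weights that sum to 1 per point, so this particular scheme and the helper name `random_fuzzy_weights` are assumptions.

```python
import random
from typing import List, Optional


def random_fuzzy_weights(m: int, k: int, seed: Optional[int] = None) -> List[List[float]]:
    """Random initial weights for m points and k clusters; each row sums to 1."""
    rng = random.Random(seed)
    W = []
    for _ in range(m):
        # 1.0 - random() lies in (0, 1], so every raw weight is strictly positive.
        raw = [1.0 - rng.random() for _ in range(k)]
        total = sum(raw)
        W.append([r / total for r in raw])
    return W


W = random_fuzzy_weights(m=4, k=3, seed=42)
for row in W:
    print([round(w, 3) for w in row], round(sum(row), 6))
```

Because every raw weight is positive, with k ≥ 2 each normalized weight lies strictly between 0 and 1, so the per-cluster constraint also holds automatically.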
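The algorithm attempts to minimize the fuzzy version of the SSE:
$$SSE(C_1,\dots,C_k) = \sum_{j=1}^{k}\sum_{i=1}^{m} w_{ij}^{p}\,\mathrm{dist}(\mathbf{x}_i,\mathbf{c}_j)^2$$
where $\mathbf{c}_j$ is the centroid of cluster $C_j$ and $p \in (1,\infty)$ is an exponent that determines the influence of the weights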
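The fuzzy SSE can be computed directly from its definition. This sketch assumes Euclidean distance and stores points, centroids, and weights as plain lists; `fuzzy_sse` and `squared_distance` are illustrative names.

```python
from typing import Sequence


def squared_distance(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed choice of dist)."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def fuzzy_sse(X: Sequence[Sequence[float]],
              C: Sequence[Sequence[float]],
              W: Sequence[Sequence[float]],
              p: float = 2.0) -> float:
    """SSE = sum over clusters j and points i of w_ij^p * dist(x_i, c_j)^2."""
    return sum(
        W[i][j] ** p * squared_distance(X[i], C[j])
        for i in range(len(X))
        for j in range(len(C))
    )


X = [[0.0], [2.0]]
C = [[0.0], [2.0]]
print(fuzzy_sse(X, C, [[0.5, 0.5], [0.5, 0.5]]))  # 2.0
print(fuzzy_sse(X, C, [[1.0, 0.0], [0.0, 1.0]]))  # 0.0
```

With the even 0.5/0.5 weights each point still pays 0.25 times its squared distance to the other centroid, which is why the fuzzier assignment has the larger error here.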
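Considerations for choosing $p$:
- as $p \to 1$, fuzzy c-means behaves more and more like traditional K-means (the weights approach crisp 0/1 memberships)
- as $p$ increases, all the cluster centroids approach the global centroid of all the data points, i.e., the partition becomes fuzzier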
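The effect of $p$ shows up in the memberships of a single point. This sketch assumes the standard fuzzy c-means membership rule, where a point's weight for cluster j is proportional to $(1/\mathrm{dist}_j^2)^{1/(p-1)}$, and that the point does not sit exactly on a centroid; `memberships` is a hypothetical helper.

```python
from typing import List, Sequence


def memberships(sq_dists: Sequence[float], p: float) -> List[float]:
    """Weights of one point given its squared distances to the k centroids.

    Assumes p > 1 and all distances > 0.
    """
    if p <= 1.0:
        raise ValueError("p must be greater than 1")
    if any(d <= 0.0 for d in sq_dists):
        raise ValueError("this sketch assumes the point is not on a centroid")
    d_min = min(sq_dists)
    # Scaling by d_min leaves the normalized weights unchanged but avoids
    # overflow when 1/(p-1) is large (p close to 1).
    raw = [(d_min / d) ** (1.0 / (p - 1.0)) for d in sq_dists]
    total = sum(raw)
    return [r / total for r in raw]


# A point at distance 1 from centroid A and distance 2 from centroid B
# (squared distances 1 and 4).
for p in (1.1, 2.0, 10.0):
    print(p, [round(w, 3) for w in memberships([1.0, 4.0], p)])
```

The weight for the nearer centroid falls from almost 1 at p = 1.1 to 0.8 at p = 2 and to about 0.54 at p = 10. As the weights even out toward 1/k, every fuzzy centroid tends to the same weighted average of all points, the global centroid.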
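The fuzzy centroid is defined like the traditional centroid, except that every point contributes (any point can belong to any cluster to some degree), weighted by its membership degree raised to the power $p$:
$$\mathbf{c}_j = \frac{\sum_{i=1}^{m} w_{ij}^{p}\,\mathbf{x}_i}{\sum_{i=1}^{m} w_{ij}^{p}}$$
With crisp weights (every $w_{ij}$ either 0 or 1) this reduces to the traditional centroid.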
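The centroid formula translates into a short loop over clusters. This is a sketch over plain lists; the name `fuzzy_centroids` is illustrative, and it relies on the constraint that every cluster has some non-zero weight so the denominator is positive.

```python
from typing import List, Sequence


def fuzzy_centroids(X: Sequence[Sequence[float]],
                    W: Sequence[Sequence[float]],
                    p: float = 2.0) -> List[List[float]]:
    """c_j = sum_i w_ij^p x_i / sum_i w_ij^p for every cluster j."""
    m, k, dim = len(X), len(W[0]), len(X[0])
    C = []
    for j in range(k):
        wp = [W[i][j] ** p for i in range(m)]  # every point contributes w_ij^p
        total = sum(wp)                        # > 0 under the fuzzy constraints
        C.append([sum(wp[i] * X[i][d] for i in range(m)) / total for d in range(dim)])
    return C


X = [[0.0, 0.0], [4.0, 0.0]]
print(fuzzy_centroids(X, [[1.0, 0.0], [0.0, 1.0]]))      # crisp: the points themselves
print(fuzzy_centroids(X, [[0.75, 0.25], [0.25, 0.75]]))  # fuzzy: pulled toward each other
```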
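Computing the weight update:
$$w_{ij} = \frac{\left(1/\mathrm{dist}(\mathbf{x}_i,\mathbf{c}_j)^2\right)^{1/(p-1)}}{\sum_{q=1}^{k}\left(1/\mathrm{dist}(\mathbf{x}_i,\mathbf{c}_q)^2\right)^{1/(p-1)}}$$
If $p = 2$, this simplifies to
$$w_{ij} = \frac{1/\mathrm{dist}(\mathbf{x}_i,\mathbf{c}_j)^2}{\sum_{q=1}^{k} 1/\mathrm{dist}(\mathbf{x}_i,\mathbf{c}_q)^2}$$
i.e., a point's weight for a cluster is proportional to the inverse squared distance to that cluster's centroid.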
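Putting initialization, the centroid computation, and the weight update together gives a compact fuzzy c-means loop. This is a sketch, not the book's implementation: it assumes Euclidean distance, uniform random initial weights, and stopping when no centroid moves more than `tol`. Splitting a point's weight equally among centroids it coincides with (where the update formula is undefined) is also an assumption. All function names are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(x: Point, c: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, c))


def _centroids(X: Sequence[Point], W: List[List[float]], p: float) -> List[List[float]]:
    """Fuzzy centroids: c_j = sum_i w_ij^p x_i / sum_i w_ij^p."""
    k, dim = len(W[0]), len(X[0])
    C = []
    for j in range(k):
        wp = [row[j] ** p for row in W]
        total = sum(wp)
        if total == 0.0:
            raise ValueError(f"cluster {j} has zero total weight")
        C.append([sum(w * x[d] for w, x in zip(wp, X)) / total for d in range(dim)])
    return C


def _weights(X: Sequence[Point], C: List[List[float]], p: float) -> List[List[float]]:
    """Membership update: w_ij proportional to (1 / dist(x_i, c_j)^2)^(1/(p-1))."""
    W = []
    for x in X:
        d2 = [_sq_dist(x, c) for c in C]
        on_centroid = [j for j, d in enumerate(d2) if d == 0.0]
        if on_centroid:
            # Assumption: split the weight equally among coinciding centroids.
            row = [1.0 / len(on_centroid) if j in on_centroid else 0.0 for j in range(len(C))]
        else:
            d_min = min(d2)
            # Dividing by d_min keeps the ratios (and normalized weights) but avoids overflow.
            raw = [(d_min / d) ** (1.0 / (p - 1.0)) for d in d2]
            total = sum(raw)
            row = [r / total for r in raw]
        W.append(row)
    return W


def fuzzy_c_means(X: Sequence[Point], k: int, p: float = 2.0, max_iter: int = 100,
                  tol: float = 1e-6, seed: Optional[int] = None
                  ) -> Tuple[List[List[float]], List[List[float]]]:
    if p <= 1.0:
        raise ValueError("p must be greater than 1")
    rng = random.Random(seed)
    # Random initial weights: strictly positive, each point's weights sum to 1.
    W = []
    for _ in X:
        raw = [1.0 - rng.random() for _ in range(k)]
        s = sum(raw)
        W.append([r / s for r in raw])
    C = _centroids(X, W, p)
    for _ in range(max_iter):
        W = _weights(X, C, p)        # recompute the w_ij from the current centroids
        new_C = _centroids(X, W, p)  # recompute the centroids from the new weights
        shift = max(math.dist(a, b) for a, b in zip(C, new_C))
        C = new_C
        if shift < tol:              # centroids have (almost) stopped changing
            break
    return C, W


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
C, W = fuzzy_c_means(X, k=2, seed=0)
print([[round(v, 2) for v in c] for c in C])
print([[round(w, 3) for w in row] for row in W])
```

On two well-separated groups like these, each point ends up with a weight close to 1 for its own group's cluster, while the weights still record a small degree of membership in the other cluster.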
Analysis
- able to represent the degree to which each point belongs to any particular cluster
- otherwise, it has much the same strengths and weaknesses as K-means
- computationally more intensive than K-means