Skip to content

Disjoint Sets

Summarized notes on Introduction to Algorithms, Chapter 21

  • two primary operations:
    1. finding set containing x
    2. uniting two sets
  • n - number of make-set operations
  • m - total number of operations, where mn

Disjoint-set operations

  • disjoint-set data structure is a collection S1,S2,...,Sk disjoint dynamic sets
  • each set can be identified by a representative
  • supported operations
    1. Make-set(x) - creates new set whose only member is x; x cannot exist in some other set
    2. Union(x,y) - unites two sets into a new set, then destroy x and y
    3. Find-set(x) - returns pointer to representative of set that contains x
  • since sets are disjoint, performing union reduces number of sets by 1
  • max number of unions is n1
  • assume make-set operations are performed first

Linked list representation

  • 1 list to represent 1 set
  • set object has attributes head and tail that point to first/last element
  • each set member has pointer to next element and pointer back to set object
  • members can be in any order
  • representative is the first object in the list (below set representative is f)

img

Complexity

Running time
O(1) make-set, find-set
Θ(n2) union
O(m+nlgn) sequence m with weighted-union
  • union is costly because merging two sets requires updating pointers of one to point to the other
  • example worst case: make-set n times then perform union n1 times → last union requires n1 pointer updates → n(n1)=Θ(n2)
  • issue: too many pointer updates!
  • weighted-union heuristic: append shorter list onto longer and break ties randomly to achieve O(m+nlgn) → see pg. 567 for proof

Disjoint-set forests

  • set represented by rooted tree where
    • each individual tree is a set
    • each tree node is a set member
    • root is representative and its own parent

img

  • make-set creates tree with root
  • find-set follows pointer to parent until reaching the root - find path includes nodes visited during this search
  • union results in root of one tree pointing to root of another
  • this implementation is not yet any faster than linked list

Heuristics to improve running time

  • union by rank - maintain a rank which is the upper bound on height of a node → make root with smaller rank point to root with larger rank
    • rank will only increment during union of two roots of equal rank
    • if ranks are different, union does not change ranks
  • path compression - during find-set, make each node on find path point to root - does not change any ranks
    • two-pass method - one pass to find the root, then back to update each node
  • either heuristic improves running time; using both is optimal

Complexity

Running time
O(mlgn) union by rank only
Θ(n+f(1+log2+f/nn)) path compression only, where f is find-set
O(mα(n)) union by rank and path compression

Additional Notes

  • For all nodes x, we have x.rankx.p.rank with strict inequality if xx.p
  • Rank is initially 0 and increases until xx.p;
  • x.rank does not change when x is not root
  • Value of x.p.rank increases monotonically over time
  • Every node has rank at most n1 and at least 0