Skip to content

Disjoint Sets

Summarized notes on Introduction to Algorithms, Chapter 21

  • two primary operations:
    1. finding set containing \(x\)
    2. uniting two sets
  • \(n\) - number of make-set operations
  • \(m\) - total number of operations, where \(m \geq n\)

Disjoint-set operations

  • disjoint-set data structure is a collection \({S_1, S_2, ..., S_k}\) disjoint dynamic sets
  • each set can be identified by a representative
  • supported operations
    1. Make-set(\(x\)) - creates new set whose only member is \(x\); \(x\) cannot exist in some other set
    2. Union(\(x, y\)) - unites two sets into a new set, then destroy \(x\) and \(y\)
    3. Find-set(\(x\)) - returns pointer to representative of set that contains \(x\)
  • since sets are disjoint, performing union reduces number of sets by 1
  • max number of unions is \(n-1\)
  • assume make-set operations are performed first

Linked list representation

  • 1 list to represent 1 set
  • set object has attributes head and tail that point to first/last element
  • each set member has pointer to next element and pointer back to set object
  • members can be in any order
  • representative is the first object in the list (below set representative is \(f\))

img

Complexity

Running time
\(O(1)\) make-set, find-set
\(Θ(n^2)\) union
\(O(m + n \lg n)\) sequence \(m\) with weighted-union
  • union is costly because merging two sets requires updating pointers of one to point to the other
  • example worst case: make-set \(n\) times then perform union \(n-1\) times → last union requires \(n-1\) pointer updates → \(n \cdot (n-1) = Θ(n^2)\)
  • issue: too many pointer updates!
  • weighted-union heuristic: append shorter list onto longer and break ties randomly to achieve \(O(m + n \lg n)\) → see pg. 567 for proof

Disjoint-set forests

  • set represented by rooted tree where
    • each individual tree is a set
    • each tree node is a set member
    • root is representative and its own parent

img

  • make-set creates tree with root
  • find-set follows pointer to parent until reaching the root - find path includes nodes visited during this search
  • union results in root of one tree pointing to root of another
  • this implementation is not yet any faster than linked list

Heuristics to improve running time

  • union by rank - maintain a rank which is the upper bound on height of a node → make root with smaller rank point to root with larger rank
    • rank will only increment during union of two roots of equal rank
    • if ranks are different, union does not change ranks
  • path compression - during find-set, make each node on find path point to root - does not change any ranks
    • two-pass method - one pass to find the root, then back to update each node
  • either heuristic improves running time; using both is optimal

Complexity

Running time
\(O(m \lg n)\) union by rank only
\(\Theta(n + f \cdot (1 + \log_{2+f/n}n))\) path compression only, where \(f\) is find-set
\(O(m \cdot \alpha(n))\) union by rank and path compression

Additional Notes

  • For all nodes \(x\), we have \(x.rank \leq x.p.rank\) with strict inequality if \(x \neq x.p\)
  • Rank is initially 0 and increases until \(x \neq x.p\);
  • \(x.rank\) does not change when \(x\) is not root
  • Value of \(x.p.rank\) increases monotonically over time
  • Every node has rank at most \(n-1\) and at least 0