Disjoint Sets
Summarized notes on Introduction to Algorithms, Chapter 21
- two primary operations:
- finding set containing \(x\)
- uniting two sets
- \(n\) - number of make-set operations
- \(m\) - total number of operations, where \(m \geq n\)
Disjoint-set operations
- disjoint-set data structure is a collection \({S_1, S_2, ..., S_k}\) disjoint dynamic sets
- each set can be identified by a representative
- supported operations
- Make-set(\(x\)) - creates new set whose only member is \(x\); \(x\) cannot exist in some other set
- Union(\(x, y\)) - unites two sets into a new set, then destroy \(x\) and \(y\)
- Find-set(\(x\)) - returns pointer to representative of set that contains \(x\)
- since sets are disjoint, performing union reduces number of sets by 1
- max number of unions is \(n-1\)
- assume make-set operations are performed first
Linked list representation
- 1 list to represent 1 set
- set object has attributes head and tail that point to first/last element
- each set member has pointer to next element and pointer back to set object
- members can be in any order
- representative is the first object in the list (below set representative is \(f\))
Complexity
Running time |
|
\(O(1)\) |
make-set, find-set |
\(Θ(n^2)\) |
union |
\(O(m + n \lg n)\) |
sequence \(m\) with weighted-union |
- union is costly because merging two sets requires updating pointers of one to point to the other
- example worst case: make-set \(n\) times then perform union \(n-1\) times → last union requires \(n-1\) pointer updates → \(n \cdot (n-1) = Θ(n^2)\)
- issue: too many pointer updates!
- weighted-union heuristic: append shorter list onto longer and break ties randomly to achieve \(O(m + n \lg n)\) → see pg. 567 for proof
Disjoint-set forests
- set represented by rooted tree where
- each individual tree is a set
- each tree node is a set member
- root is representative and its own parent
- make-set creates tree with root
- find-set follows pointer to parent until reaching the root - find path includes nodes visited during this search
- union results in root of one tree pointing to root of another
- this implementation is not yet any faster than linked list
Heuristics to improve running time
- union by rank - maintain a rank which is the upper bound on height of a node → make root with smaller rank point to root with larger rank
- rank will only increment during union of two roots of equal rank
- if ranks are different, union does not change ranks
- path compression - during find-set, make each node on find path point to root - does not change any ranks
- two-pass method - one pass to find the root, then back to update each node
- either heuristic improves running time; using both is optimal
Complexity
Running time |
|
\(O(m \lg n)\) |
union by rank only |
\(\Theta(n + f \cdot (1 + \log_{2+f/n}n))\) |
path compression only, where \(f\) is find-set |
\(O(m \cdot \alpha(n))\) |
union by rank and path compression |
Additional Notes
- For all nodes \(x\), we have \(x.rank \leq x.p.rank\) with strict inequality if \(x \neq x.p\)
- Rank is initially 0 and increases until \(x \neq x.p\);
- \(x.rank\) does not change when \(x\) is not root
- Value of \(x.p.rank\) increases monotonically over time
- Every node has rank at most \(n-1\) and at least 0