Disjoint Sets

Summarized notes on Introduction to Algorithms, Chapter 21

Disjoint-set operations

disjoint-set data structure is a collection \({S_1, S_2, ..., S_k}\) disjoint dynamic sets
each set can be identified by a representative
supported operations
1. Make-set(\(x\)) - creates new set whose only member is \(x\); \(x\) cannot exist in some other set
2. Union(\(x, y\)) - unites two sets into a new set, then destroy \(x\) and \(y\)
3. Find-set(\(x\)) - returns pointer to representative of set that contains \(x\)
since sets are disjoint, performing union reduces number of sets by 1
max number of unions is \(n-1\)
assume make-set operations are performed first

1 list to represent 1 set
set object has attributes head and tail that point to first/last element
each set member has pointer to next element and pointer back to set object
members can be in any order
representative is the first object in the list (below set representative is \(f\))

Running time
\(O(1)\)	make-set, find-set
\(Θ(n^2)\)	union
\(O(m + n \lg n)\)	sequence \(m\) with weighted-union

union is costly because merging two sets requires updating pointers of one to point to the other
example worst case: make-set \(n\) times then perform union \(n-1\) times → last union requires \(n-1\) pointer updates → \(n \cdot (n-1) = Θ(n^2)\)
issue: too many pointer updates!
weighted-union heuristic: append shorter list onto longer and break ties randomly to achieve \(O(m + n \lg n)\) → see pg. 567 for proof

set represented by rooted tree where
- each individual tree is a set
- each tree node is a set member
- root is representative and its own parent

make-set creates tree with root
find-set follows pointer to parent until reaching the root - find path includes nodes visited during this search
union results in root of one tree pointing to root of another
this implementation is not yet any faster than linked list

union by rank - maintain a rank which is the upper bound on height of a node → make root with smaller rank point to root with larger rank
- rank will only increment during union of two roots of equal rank
- if ranks are different, union does not change ranks
path compression - during find-set, make each node on find path point to root - does not change any ranks
- two-pass method - one pass to find the root, then back to update each node
either heuristic improves running time; using both is optimal

Running time
\(O(m \lg n)\)	union by rank only
\(\Theta(n + f \cdot (1 + \log_{2+f/n}n))\)	path compression only, where \(f\) is find-set
\(O(m \cdot \alpha(n))\)	union by rank and path compression

For all nodes \(x\), we have \(x.rank \leq x.p.rank\) with strict inequality if \(x \neq x.p\)
Rank is initially 0 and increases until \(x \neq x.p\);
\(x.rank\) does not change when \(x\) is not root
Value of \(x.p.rank\) increases monotonically over time
Every node has rank at most \(n-1\) and at least 0