Disjoint Sets
Summarized notes on Introduction to Algorithms, Chapter 21
- two primary operations:
- finding set containing
- uniting two sets
- - number of make-set operations
- - total number of operations, where
Disjoint-set operations
- disjoint-set data structure is a collection disjoint dynamic sets
- each set can be identified by a representative
- supported operations
- Make-set() - creates new set whose only member is ; cannot exist in some other set
- Union() - unites two sets into a new set, then destroy and
- Find-set() - returns pointer to representative of set that contains
- since sets are disjoint, performing union reduces number of sets by 1
- max number of unions is
- assume make-set operations are performed first
Linked list representation
- 1 list to represent 1 set
- set object has attributes head and tail that point to first/last element
- each set member has pointer to next element and pointer back to set object
- members can be in any order
- representative is the first object in the list (below set representative is )

Complexity
- union is costly because merging two sets requires updating pointers of one to point to the other
- example worst case: make-set times then perform union times → last union requires pointer updates →
- issue: too many pointer updates!
- weighted-union heuristic: append shorter list onto longer and break ties randomly to achieve → see pg. 567 for proof
Disjoint-set forests
- set represented by rooted tree where
- each individual tree is a set
- each tree node is a set member
- root is representative and its own parent

- make-set creates tree with root
- find-set follows pointer to parent until reaching the root - find path includes nodes visited during this search
- union results in root of one tree pointing to root of another
- this implementation is not yet any faster than linked list
Heuristics to improve running time
- union by rank - maintain a rank which is the upper bound on height of a node → make root with smaller rank point to root with larger rank
- rank will only increment during union of two roots of equal rank
- if ranks are different, union does not change ranks
- path compression - during find-set, make each node on find path point to root - does not change any ranks
- two-pass method - one pass to find the root, then back to update each node
- either heuristic improves running time; using both is optimal
Complexity
Additional Notes
- For all nodes , we have with strict inequality if
- Rank is initially 0 and increases until ;
- does not change when is not root
- Value of increases monotonically over time
- Every node has rank at most and at least 0