Algorithm W
- for synchronous systems with crashes
- works for any \(P \leq N\) setup
- input is array \(task[1...N]\) initially 0 and 1 when task is done
- uses 3 binary trees of size \(2n-1\) to:
- enumerate
- allocate processors
- estimate progress
- perform work is assumed \(O(1)\) but can be upto \(O(\log N)\) without impact on time/work complexity.
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13 | forall processors PID=1..n parbegin
Phase W3: Visit leaves based on PID to perform work,
i.e., write to the input tasks
Phase W4: Traverse the done tree bottom up to measure progress.
while the root of the done tree is not n do:
Phase W1: Traverse tree count bottom-up to enumerate processors
Phase W2: Traverse trees done, aux, count top down to
allocate processors
Phase W3: Perform work, i.e., write to the input tasks
Phase W4: Traverse the done tree bottom up to measure progress
od
parend.
|
Complexity
Work of the algorithm is determined by \(B_{N,P} \times C_{BS}\) (cost of block steps \(\times\) iterations).
Analysis |
|
work optimal when |
\(P = O(N \frac{\log \log N}{\log^2 N})\) |
Phase 1, 2, 4 |
\(O(\log N)\) |
Phase 3 |
\(O(1)\) |
Cost of block step |
\(C_{BS} = O(\log N)\) |
Block steps, \(P \leq N\) |
\(B_{N,P} = O(N + P \frac{\lg N}{\log\log N})\) |
Work (w/o chunking) |
\(W = O(N \log N + P \frac{\lg^2 N}{\log\log N})\) |
Work (w/ chunking) |
\(W = O(N + P \frac{\lg^2 N}{\log\log N})\) |