Skip to content

Algorithm W

  • for synchronous systems with crashes
  • works for any \(P \leq N\) setup
  • input is array \(task[1...N]\) initially 0 and 1 when task is done
  • uses 3 binary trees of size \(2n-1\) to:
    • enumerate
    • allocate processors
    • estimate progress
  • perform work is assumed \(O(1)\) but can be upto \(O(\log N)\) without impact on time/work complexity.

Pseudocode

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
forall processors PID=1..n parbegin
    Phase W3: Visit leaves based on PID to perform work, 
    i.e., write to the  input tasks
    Phase W4: Traverse the done tree bottom up to measure progress. 

    while the root of the done tree is not n do:
        Phase W1: Traverse tree count bottom-up to enumerate processors
        Phase W2: Traverse trees done, aux, count top down to 
                  allocate processors 
        Phase W3: Perform work, i.e., write to the input tasks
        Phase W4: Traverse the done tree bottom up to measure progress
    od 
parend.

Complexity

Work of the algorithm is determined by \(B_{N,P} \times C_{BS}\) (cost of block steps \(\times\) iterations).

Analysis
work optimal when \(P = O(N \frac{\log \log N}{\log^2 N})\)
Phase 1, 2, 4 \(O(\log N)\)
Phase 3 \(O(1)\)
Cost of block step \(C_{BS} = O(\log N)\)
Block steps, \(P \leq N\) \(B_{N,P} = O(N + P \frac{\lg N}{\log\log N})\)
Work (w/o chunking) \(W = O(N \log N + P \frac{\lg^2 N}{\log\log N})\)
Work (w/ chunking) \(W = O(N + P \frac{\lg^2 N}{\log\log N})\)