11. Crashes
Feb 17, 2021
Working with Crashes in PRAM
Processor enumeration: use an enumeration tree (e-tree), then run prefix sum on the processors. Enumeration produces and overestimate if processors crash.
-
When no failures occur: time: \(\Theta(\lg P)\), work: \(\Theta(P \lg P)\)
-
With failures: same time, less work gets done. Over time, work increases.
Work is optimal when \(\sum PT = P \cdot T_P = O(T_1^*)\).
Progress estimation: do bottom-up sum. If all tasks are not done, enumerate the processors that are left and load balance → compact processors into a more compact range. To load balance, go top to down along the same tree.
Time \(T_P = \lg N\), and work \(W = P \lg P\).
Progress estimate bottom-up sum is an underestimate when processors fail during summation: fewer tasks appear done than actual.
Algorithm W: Block step analysis
Tasks: \(\text{task}[1...N]\), \(N = P\)
1 2 3 4 5 6 7 8 9 |
|
- steps 1.-2. represent the "oracle"
- In load balancing step look for leaf with value 0, then do that task
- steps 1.-4. combined are called block step
What is the cost of a block step?
Allocating work is negligible if the task is large.
How many block steps do we need? If the number block steps is known, we can compute \(B_{N,P} = BS \times \Theta(\lg P + \lg N)\) and be done.
Handout pg. 147, lemma 10.23: The number of block steps of algorithm W using \(P\) processors on input size \(N\) and using a progress tree with \(N\) leaves, is
for \(P \leq N\).
Notation | Iteration \(j\) | End of Iteration \(j\) |
---|---|---|
Iteration | \(j\) | \(j + 1\) |
Overestimate of active processors | \(P_j\) | \(P_{j+1}\) |
Remaining unvisited leaves | \(U_j\) | \(U_{j + 1}\) |
- From Lemma 10.20: \(P_j \geq P_{j+1}\) (crash failures)
-
From Lemma 10.22: \(U_j > U_{j+1}\) (at least 1 task left)
-
Final iteration of algorithm: \(\tau\)
- number of unvisited elements: \(U_\tau > 0\)
- After iteration \(\tau\), number of unvisited elements: \(U_{\tau+1} = 0\)
Investigate 3 cases
Case 1: fewer processors than tasks
\(P_j < U_j\) - there are fewer processors than unfinished tasks.
Each leaf will be assigned no more than 1 processor.
Number of block steps will be no more than:
Case 2a: lots of progress, few failures
\(P_j \geq U_j\) - there are at least as many processors as tasks
\(\displaystyle U_{j+1} < \frac{U_j}{\log N/\log\log N}\)
Can occur no more than \(O(\log N/\log \log N)\) times, because \(U_{j+1} < U_j = N\).
No more than \(P\) processors complete such block steps, therefore total number of block steps is no more than:
Case 2b: many failures and slow progress
\(P_j \geq U_j\) - there are at least as many processors as tasks
\(\displaystyle U_{j+1} \geq \frac{U_j}{\log N/\log\log N}\)
By Lemma 10.21, number of assigned processors is no less than \(\lfloor{\frac{P_j}{U_j}}\rfloor\) and no more than \(\lceil{\frac{P_j}{U_j}}\rceil\).
Number of failed processors is at least:
and at most \(\tau\) times.
Number of processors completing step \(j\) is at most/no more than:
In general, for \(P\) initial processors, number of processors completing \(l^{th}\) occurrence of case 2b is no more than:
The number of block steps in case 2b is bounded by:
Total block steps
Total number of all block steps for all cases is