6. DO-ALL

Feb 1, 2021

DO-ALL Problem

Recall that the PRAM model comes with (1) synchrony, (2) high-bandwidth memory, and (3) no failures → we now remove synchrony.

In a DO-ALL problem, we have:

- $N$ tasks, and tasks are indivisible
- $P$ processors, and $N = P$
- all processors do the same task

for pid = 1...P pardo
    do task(pid)

Analysis

Uniprocessor time

Lower bound: $T_{1, LB}^{*} = Ω (N)$

Upper bound: $T_{1, UB}^{*} = O (N)$

Hence $T_{1}^{*} = Θ (N)$

Parallel time

$T_{P} = Θ (1)$

$S = \frac{T_{1}^{*} (N)}{T_{P} (N)} = \frac{Θ (N)}{Θ (1)} = Θ (N) = Θ (P)$

$W = P \cdot T_{P} (N) = Θ (P \cdot 1) = Θ (N) = O (T_{1}^{*} (N))$

Work is optimal when the parallel work matches the best sequential time, i.e. $W = Θ (T_{1}^{*} (N))$.

Handling asynchrony

One of the biggest problems in parallel computing is distinguishing failed processors from slow processors. If a crash occurs, it is detectable.

Fail-stop processors never perform an erroneous state transformation due to a failure. Instead, the processor halts and its state is irretrievably lost.

How does asynchrony happen? It can arise from, e.g., page faults.

for pid = 1...P pardo
    for i = 1...N
        do task(i)

As long as at least one processor does not crash, all $N$ tasks will complete, because every processor attempts all $N$ tasks.

With this strategy:

- $W = P \cdot Θ (N) = Θ (N^{2})$ → no speedup!
- Work lower bound is $Ω (N)$
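The accounting above can be checked with a minimal Python sketch. It is a sequential stand-in for the pardo loop, not a real parallel run. The function name `naive_do_all` and the `crash_after` map (how many tasks a processor finishes before it fail-stops) are hypothetical names introduced only for illustration.

```python
def naive_do_all(n, p, crash_after=None):
    """Every processor tries tasks 0..n-1 in order.

    crash_after maps pid -> number of tasks that processor completes
    before it fail-stops (hypothetical input; a missing pid never crashes).
    Returns (all_tasks_done, total_work).
    """
    crash_after = crash_after or {}
    done = [False] * n
    work = 0
    for pid in range(p):                 # sequential stand-in for pardo
        limit = min(crash_after.get(pid, n), n)
        for i in range(limit):
            done[i] = True               # tasks are idempotent, so redoing is harmless
            work += 1
    return all(done), work


if __name__ == "__main__":
    print(naive_do_all(4, 4))                        # no crashes
    print(naive_do_all(4, 4, {0: 0, 1: 2, 2: 1}))    # one survivor
```

With no crashes and $N = P = 4$ the sketch reports work $16 = N^{2}$. With a single surviving processor all tasks still complete, which is the guarantee this strategy buys at quadratic cost.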

What we want is to find solutions whose work lies in the space between $N$ and $N^{2}$.

The work lower bound for asynchronous PRAM DO-ALL is $Ω (N \lg N)$, so optimal $Θ (N)$ work is not achievable in the asynchronous world. No algorithm meets this lower bound.

An algorithm with $T_{P} = O (\lg N)$ has $W = P \cdot O (\lg N) = O (P \lg N)$ . An algorithm is log-efficient when its work exceeds the best sequential $W$ by only a log factor.
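For a sense of scale, take $N = P = 1024$ (an illustrative value, not one from the lecture), so $\lg N = 10$:

$N = 1024$ (optimal work)

$N \lg N = 1024 \cdot 10 = 10240$ (log-efficient work)

$N^{2} = 1024 \cdot 1024 = 1048576$ (every processor does every task)

A log-efficient algorithm would therefore cost about 10 times the sequential work here, while the naive strategy costs about 1000 times.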

Oracle vs. Adversary

Goal: finish all tasks.

Challenge: dealing with an adversary who kills processors and tries to slow down progress.

All tasks are:

- similar size
- idempotent
  - tasks can be performed repeatedly without harm
- independent
  - order of completion does not matter

The oracle does perfect load balancing and divides tasks equally between processors.

The adversary kills $1 / 2$ of the processors in each round.

The adversary loses when $\frac{N}{2^{k - 1}} = 1$ , where $k$ is the number of rounds:

$N = 2^{k - 1}$

$\lg N = k - 1$

$k = Θ (\lg N)$

Work per round: $\frac{N}{2} + \frac{N}{2} + \frac{N}{2} + \dots$ → $W = \frac{N}{2} \cdot Θ (\lg N) = Θ (N \lg N)$
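The round count and the work sum can be reproduced with a small Python simulation. This is a sketch under one particular reading of the game, stated here as assumptions: $N = P$ is a power of two; in each round the oracle spreads all processors evenly over the unfinished tasks; the adversary stops the processors assigned to half of those tasks; and the stopped processors are available again in the next round (they behave as slow rather than permanently dead), which is what keeps the per-round work at $\frac{N}{2}$. The name `halving_game` is hypothetical.

```python
def halving_game(n):
    """Simulate oracle vs. halving adversary for N = P = n (n a power of two).

    Returns (halving_rounds, work), counting only the rounds in which
    the adversary can still halve the set of unfinished tasks.
    """
    unfinished = list(range(n))
    rounds = 0
    work = 0
    while len(unfinished) > 1:
        # Oracle: processor p is assigned an unfinished task, evenly spread.
        assignment = [unfinished[p % len(unfinished)] for p in range(n)]
        # Adversary (assumed strategy): stop every processor that was
        # assigned to one of the first half of the unfinished tasks.
        blocked = set(unfinished[: len(unfinished) // 2])
        # Only processors that were not stopped contribute work this round.
        work += sum(1 for task in assignment if task not in blocked)
        # Blocked tasks stay unfinished; the others are now done.
        unfinished = [task for task in unfinished if task in blocked]
        rounds += 1
    return rounds, work


if __name__ == "__main__":
    print(halving_game(8))
```

For `n = 8` the sketch returns `(3, 12)`: $\lg 8 = 3$ halving rounds, each costing $\frac{N}{2} = 4$ units of work, in line with $W = \frac{N}{2} \cdot Θ (\lg N)$. One further round then finishes the last task.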