14. Groote Algorithm

Mar 1, 2021

Algorithm X (cont.)

Recall when $N = P$ algorithm X has work complexity $W = O (N^{1.59})$ . Next consider case where $P \leq N$ .

Subtree strategy

First solve the problem for $P$ processors, and continue until the subtree is done:

( $l o g_{2} 3 - 1 = l o g_{2} 3 - l o g_{2} 2 = l o g_{2} \frac{3}{2}$ )

W_{N, P} = \frac{N}{P} W_{P, P} + P \frac{N}{P} = \frac{N}{P} W + \frac{N}{P} \cdot P^{l o g_{2} \frac{3}{2}} = O (N \cdot P^{0.59})

$= O (T_{1}^{*} \cdot P^{0.59})$ . This is polynomial work efficiency.

Polynomial when $W = O (T_{1}^{*} \cdot N^{ϵ})$ where $0 < ϵ < 1$ . Also remember solutions of interest are between $N \lg N$ and $N^{2}$ .

Chunking strategy

Split into 2 subtrees. One will be done first. Then send all processors to the remaining side. (doppelganger effect).

for use soon: $3^{\log_{2} P} = P^{\log_{2} 3}$

(some mistake here; should be $\log_{2}^{\frac{3}{2}}$ since 0.59)

$W_{\frac{N}{P}, 1} = O (\frac{N}{P})$

W_{N, P} = W_{L} + W_{R} = W_{\frac{N}{2}, \frac{P}{2}} + 2 W_{\frac{N}{2}, \frac{P}{2}} = 3 W_{\frac{N}{2}, \frac{P}{2}} = 9 W_{\frac{N}{2}, \frac{P}{2}}

⋮

= 3^{k} W_{\frac{N}{2}, \frac{P}{2}}

= 3^{l o g_{2} P} W_{\frac{N}{P}, 1} = O (\frac{N}{P} \cdot 3^{l o g_{2} P}) = O (N \cdot P^{l o g_{2} 3}) = O (N \cdot P^{0.59})

Same result, which is better? If there are no failures: - subtree strategy: $W = P \log P \cdot \frac{N}{P} = N \log P$ - chunking strategy: $W = \frac{N}{P} \cdot \log P \cdot P = N + P \log P$ (better)

New Notion for Optimality

If work of an algorithm is $W_{P, N} = (T_{1}^{*}) + o (N)$ .

$lim_{n \to + \infty} \frac{f (N)}{N} \to 0$

Measure in precise units DO-ALL $W_{P, N} = N + o (N)$ .

Algorithms of this complexity are optimal.

Groote Algorithm

This is a collision based DO-ALL algorithm.

Case: 2 processors

Setup: $P = 2$ , $N$ tasks to complete

Trivial solution: both processors do all tasks, $W = 2 N = O (T_{1}^{*})$ .

Another strategy:

two processors start from opposite ends of array.
when finished with task, set bit-array to 1
check next bit and do work until collision
if at collision both see 1, the last task is done twice

$W_{P 2} = N + 1$

Case: 3 processors

Work on 1 half is: $W_{R} = \frac{N}{2} + 1$

After the collision, the middle processor works exclusively in one direction. The end jumps in the middle of the remaining half.

$W_{N} = \frac{N}{2} + 1 + W_{l e f t} = \frac{N}{2} + 1 + W_{\frac{N}{2}}$

Solve this recurrence:

$= \frac{N}{2} + 1 + (\frac{N}{4} + 1 + W_{\frac{N}{4}})$

$= \frac{N}{2} + \frac{N}{4} + 2 + (\frac{N}{8} + 1 + W_{\frac{N}{8}})$

$= \sum_{1}^{k} \frac{N}{2} + k + W_{\frac{N}{2^{k}}}$

$= N \cdot \sum_{1}^{\infty} \frac{1}{2} + \log_{2} N + W_{1}$

This recurrence is done when $\frac{N}{2^{k}} = 1$ or $k = l o g_{2} N$ . Work of 1 processor is $W_{1} = O (3)$ .

$= N + l o g_{2} N + 3$

$= N + o (N)$ ← this is optimal

This strategy works for a few processors, but becomes tedious when number of processors increases.

Case: $P = 2^{k}$ processors

k	P	W
1	1	$W = N + 1$
2	4	$W = N^{2} + \sqrt{N} + 1$
3	8	$W = (M + 1) 3$
d	$2^{d}$	$W_{d} = (M + 1)^{d}$

Example k=2: 4 processors

Divide processors into 2 groups. There are $M \times M = N$ tasks.

Processors work vertically on each slice. Ultimately a collision occurs.

$W_{p a i r} = W_{p} = M + 1$ - for doing 1 vertical chunk

$W_{t o t a l} = W_{c o l} \cdot (M + 1) = (M + 1) (M + 1) = M^{2} + 2 M + 1$

$= N + 2 \cdot \sqrt{N} + 1 = N + o (N)$ ← optimal

Example k=3: 8 processors

Divide processors into groups of two and work on a cube from both directions

$M^{3} = N$ and $N^{1 / 3} = M$

$W_{s l i c e} = (M + 1) = N + 2 \cdot \sqrt{N} + 1 = (M + 1)^{2}$

$W_{t o t a l} = (M + 1)^{2} (M + 1) = M^{3} + 3 M^{2} + 3 M + 1$

$= N + 3 N^{\frac{2}{3}} + 3 N^{\frac{1}{3}} + 1$

$= N + o (N)$

Example k=4: 16 processors

$N = M^{4}$ and $N^{\frac{1}{4}} = M$

$W = W_{P 8} \cdot (M + 1) = (M + 1)^{4}$

$= N + 4 N^{\frac{3}{4}} + 6 N^{\frac{2}{4}} + 4 N^{\frac{1}{4}} + 1$

$= N + o (N)$

Notice: each time we increase $k$ the coefficients increase. The result is a sum that does not converge.

In the general case $k = d$ : $P = 2^{d}$ $N = M^{d}$ and $W_{d} = (M + 1)^{d}$

Example k = d

$d =$ dimensions, $P = 2^{d}$ and $N = M^{d}$ , $d = \log_{2} P$

$W_{d} = (M + 1)^{d} = (M + 1)^{\log P} = \frac{N}{N} (M + 1)^{\log P}$

$= N \cdot M^{\frac{1}{d}} \cdot (M + 1)^{\log_{2} P}$

$= N \cdot (\frac{M + 1}{M})^{\log_{2} P}$

$= N \cdot P^{\log 2 (\frac{M + 1}{M})}$

If $M = 1$ then $N = 1$ and $1 \cdot P^{\log_{2} \frac{2}{1}} = P$ .

If $M \to \infty$ then $W \to N$ .

If $M = 2$ then $W_{d} = N + P^{\log_{2} \frac{3}{2}}$ ← polynomially efficient