Skip to content

14. Groote Algorithm

Mar 1, 2021

Algorithm X (cont.)

Recall when \(N = P\) algorithm X has work complexity \(W = O(N^{1.59})\). Next consider case where \(P \leq N\).

Subtree strategy

First solve the problem for \(P\) processors, and continue until the subtree is done:

img

(\(log_{2}3-1 = log_23- log_22 = log_2\frac{3}{2}\))

\[ W_{N, P} = \frac{N}{P} W_{P,P} + P \frac{N}{P} = \frac{N}{P}W + \frac{N}{P} \cdot P^{log_{2}\frac{3}{2}} = O(N \cdot P^{0.59}) \]

\(= O(T_1^* \cdot P^{0.59})\). This is polynomial work efficiency.

Polynomial when \(W = O(T_1^* \cdot N^\epsilon)\) where \(0 < \epsilon < 1\). Also remember solutions of interest are between \(N \lg N\) and \(N^2\).

Chunking strategy

Split into 2 subtrees. One will be done first. Then send all processors to the remaining side. (doppelganger effect).

img

for use soon: \(3^{\log_2P} = P^{\log_23}\)

(some mistake here; should be \(\log_2^{\frac{3}{2}}\) since 0.59)

\(W_{\frac{N}{P},1} = O(\frac{N}{P})\)

\[ W_{N, P} = W_L + W_R = W_{\frac{N}{2},\frac{P}{2}} + 2W_{\frac{N}{2},\frac{P}{2}} = 3W_{\frac{N}{2},\frac{P}{2}} = 9W_{\frac{N}{2},\frac{P}{2}} \\ \]
\[ \vdots \\ \]
\[ = 3^kW_{\frac{N}{2},\frac{P}{2}} \]
\[ = 3^{log_2P}W_{\frac{N}{P},1} = O(\frac{N}{P} \cdot 3^{log_2P}) = O(N \cdot P^{log_23}) = O(N \cdot P^{0.59}) \]

Same result, which is better? If there are no failures: - subtree strategy: \(W = P \log P \cdot \frac{N}{P} = N \log P\) - chunking strategy: \(W = \frac{N}{P} \cdot \log P \cdot P = N + P \log P\) (better)

New Notion for Optimality

If work of an algorithm is \(W_{P, N} = (T_1^*) + o(N)\).

\(\lim_{n \to +\infty} \frac{f(N)}{N} \to 0\)

Measure in precise units DO-ALL \(W_{P,N} = N + o(N)\).

Algorithms of this complexity are optimal.

Groote Algorithm

This is a collision based DO-ALL algorithm.

Case: 2 processors

Setup: \(P = 2\), \(N\) tasks to complete

Trivial solution: both processors do all tasks, \(W = 2N = O(T_1^*)\).

Another strategy:

  • two processors start from opposite ends of array.
  • when finished with task, set bit-array to 1
  • check next bit and do work until collision
  • if at collision both see 1, the last task is done twice

\(W_{P2} = N + 1\)

Case: 3 processors

img

Work on 1 half is: \(W_R = \frac{N}{2} + 1\)

After the collision, the middle processor works exclusively in one direction. The end jumps in the middle of the remaining half.

\(W_N = \frac{N}{2} + 1 + W_{left} = \frac{N}{2} + 1 + W_{\frac{N}{2}}\)

Solve this recurrence:

\(= \frac{N}{2} + 1 + ({\frac{N}{4}} + 1 + W_\frac{N}{4})\)

\(= \frac{N}{2} + \frac{N}{4} + 2 + ({\frac{N}{8}} + 1 + W_\frac{N}{8})\)

\(= \sum_1^k \frac{N}{2} + k + W_\frac{N}{2^k}\)

\(= N \cdot \sum_1^\infty \frac{1}{2} + \log_2N + W_1\)

This recurrence is done when \(\frac{N}{2^k} = 1\) or \(k = log_2N\). Work of 1 processor is \(W_1 = O(3)\).

\(= N + log_2N + 3\)

\(= N + o(N)\)   ← this is optimal

This strategy works for a few processors, but becomes tedious when number of processors increases.

Case: \(P = 2^k\) processors

k P W
1 1 \(W= N +1\)
2 4 \(W = N^2 + \sqrt{N} + 1\)
3 8 \(W = (M + 1)3\)
d \(2^d\) \(W_d = (M + 1)^d\)

Example k=2: 4 processors

Divide processors into 2 groups. There are \(M \times M = N\) tasks.

img

Processors work vertically on each slice. Ultimately a collision occurs.

\(W_{pair} = W_p = M+1\)   - for doing 1 vertical chunk

\(W_{total} = W_{col} \cdot (M+1) = (M + 1)(M + 1) = M^2 + 2M + 1\)

\(= N + 2 \cdot \sqrt{N} + 1 = N + o(N)\)   ← optimal

Example k=3: 8 processors

Divide processors into groups of two and work on a cube from both directions

img

\(M^3 = N\) and \(N^{1/3} = M\)

\(W_{slice} = (M+1) = N + 2 \cdot \sqrt{N} + 1 = (M+1)^2\)

\(W_{total} = (M+1)^2 (M+1) = M^3 + 3M^2 + 3M + 1\)

\(= N + 3N^\frac{2}{3} + 3N^\frac{1}{3} + 1\)

\(= N + o(N)\)

Example k=4: 16 processors

img

\(N = M^4\) and \(N^\frac{1}{4} = M\)

\(W = W_{P8} \cdot (M+1) = (M + 1)^4\)

\(= N + 4N^\frac{3}{4} + 6N^\frac{2}{4} + 4N^\frac{1}{4} + 1\)

\(= N + o(N)\)

Notice: each time we increase \(k\) the coefficients increase. The result is a sum that does not converge.

In the general case \(k = d\): \(P = 2^d\) \(N = M^d\) and \(W_d = (M+1)^d\)

Example k = d

\(d =\) dimensions, \(P = 2^d\) and \(N = M^d\), \(d = \log_{2}P\)

\(W_d = (M + 1)^d = (M + 1)^{\log P} = \frac{N}{N} (M+1)^{\log P}\)

\(= N \cdot M^{\frac{1}{d}} \cdot (M+1)^{\log_2P}\)

\(= N \cdot (\frac{M + 1}{M})^{\log_2P}\)

\(= N \cdot P^{\log2(\frac{M + 1}{M})}\)

If \(M = 1\) then \(N=1\) and \(1 \cdot P^{\log_2\frac{2}{1}} = P\).

If \(M \to \infty\) then \(W \to N\).

If \(M = 2\) then \(W_d = N + P^{\log_2\frac{3}{2}}\)   ← polynomially efficient