Skip to content

14. Groote Algorithm

Mar 1, 2021

Algorithm X (cont.)

Recall when N=P algorithm X has work complexity W=O(N1.59). Next consider case where PN.

Subtree strategy

First solve the problem for P processors, and continue until the subtree is done:

img

(log231=log23log22=log232)

WN,P=NPWP,P+PNP=NPW+NPPlog232=O(NP0.59)

=O(T1P0.59). This is polynomial work efficiency.

Polynomial when W=O(T1Nϵ) where 0<ϵ<1. Also remember solutions of interest are between NlgN and N2.

Chunking strategy

Split into 2 subtrees. One will be done first. Then send all processors to the remaining side. (doppelganger effect).

img

for use soon: 3log2P=Plog23

(some mistake here; should be log232 since 0.59)

WNP,1=O(NP)

WN,P=WL+WR=WN2,P2+2WN2,P2=3WN2,P2=9WN2,P2
=3kWN2,P2
=3log2PWNP,1=O(NP3log2P)=O(NPlog23)=O(NP0.59)

Same result, which is better? If there are no failures: - subtree strategy: W=PlogPNP=NlogP - chunking strategy: W=NPlogPP=N+PlogP (better)

New Notion for Optimality

If work of an algorithm is WP,N=(T1)+o(N).

limn+f(N)N0

Measure in precise units DO-ALL WP,N=N+o(N).

Algorithms of this complexity are optimal.

Groote Algorithm

This is a collision based DO-ALL algorithm.

Case: 2 processors

Setup: P=2, N tasks to complete

Trivial solution: both processors do all tasks, W=2N=O(T1).

Another strategy:

  • two processors start from opposite ends of array.
  • when finished with task, set bit-array to 1
  • check next bit and do work until collision
  • if at collision both see 1, the last task is done twice

WP2=N+1

Case: 3 processors

img

Work on 1 half is: WR=N2+1

After the collision, the middle processor works exclusively in one direction. The end jumps in the middle of the remaining half.

WN=N2+1+Wleft=N2+1+WN2

Solve this recurrence:

=N2+1+(N4+1+WN4)

=N2+N4+2+(N8+1+WN8)

=1kN2+k+WN2k

=N112+log2N+W1

This recurrence is done when N2k=1 or k=log2N. Work of 1 processor is W1=O(3).

=N+log2N+3

=N+o(N)   ← this is optimal

This strategy works for a few processors, but becomes tedious when number of processors increases.

Case: P=2k processors

k P W
1 1 W=N+1
2 4 W=N2+N+1
3 8 W=(M+1)3
d 2d Wd=(M+1)d

Example k=2: 4 processors

Divide processors into 2 groups. There are M×M=N tasks.

img

Processors work vertically on each slice. Ultimately a collision occurs.

Wpair=Wp=M+1   - for doing 1 vertical chunk

Wtotal=Wcol(M+1)=(M+1)(M+1)=M2+2M+1

=N+2N+1=N+o(N)   ← optimal

Example k=3: 8 processors

Divide processors into groups of two and work on a cube from both directions

img

M3=N and N1/3=M

Wslice=(M+1)=N+2N+1=(M+1)2

Wtotal=(M+1)2(M+1)=M3+3M2+3M+1

=N+3N23+3N13+1

=N+o(N)

Example k=4: 16 processors

img

N=M4 and N14=M

W=WP8(M+1)=(M+1)4

=N+4N34+6N24+4N14+1

=N+o(N)

Notice: each time we increase k the coefficients increase. The result is a sum that does not converge.

In the general case k=d: P=2d N=Md and Wd=(M+1)d

Example k = d

d= dimensions, P=2d and N=Md, d=log2P

Wd=(M+1)d=(M+1)logP=NN(M+1)logP

=NM1d(M+1)log2P

=N(M+1M)log2P

=NPlog2(M+1M)

If M=1 then N=1 and 1Plog221=P.

If M then WN.

If M=2 then Wd=N+Plog232   ← polynomially efficient