Skip to content

4. Analysing

Jan 25, 2021

Analysing parallel algorithms

Alarm Example

Consider the following algorithm:

1
2
3
4
for P = 1 ... PID parbegin:
    if Sensor[PID] > SPEC:
        then alarm = true
parend

It may cause multiple alarms to sound at once.

Then compare to this version:

1
2
3
4
5
for i = 1 ... P parbegin:
    if i == PID:
        then if Sensor[PID] > SPEC:
            then alarm = true
parend

Will only alarm once, but the algorithm is linear — not good.

Logical-OR on PRAM

Given array X[1...N], set r=1 if some X[i]=1 and 0 otherwise.

Sequential version

1
2
3
4
5
bool r init 0
for i...N
    if x[i] == 1
        then r := 1
end for

Analysis of sequential version time: - lower bound Ω(N) - upper bound O(N) - → sequential time complexity is Θ(N) - this is time-optimal implementation T1

Parallel version

r is in shared memory, initialized to 0, N=P

1
2
3
4
5
6
for PID=i...N parbegin
    if X[PID] = 1
        then r := 1
    else
        noop
parend

Parallel version analysis: - TP(N)=Θ(1) - S=T1TP(N)=Θ(N)Θ(1)=Θ(N)=Θ(P)

  • T describes time and number of machine instructions (effort)
  • Work = W=PTP - processors × time product
  • T1(N) - time on a single processor
  • superlinear speedup may occur due to caches but not realistic otherwise, because it would imply there is some T1>T1 and we should always use T1

Parallel Sum

This example is for summation but can be done with any associative operation e.g. ×, and, or, max, min...

Setup have X[1...N] → want 1NX[i]

  1. Create a summation tree
    • this is a binary tree with N leaves and h=lgN
    • initialize leaves with values of X
    • sum children bottom to top; store value at parent

img

In code map the tree to an array then sum with neighbor:

Bounds
TP(N)=Θ(lgN)
S=T1(N)TP(N)=Θ(N)Θ(lgN)=Θ(NlgN)<Θ(P)
W=PTP=NΘ(logN)=Θ(NlgN)>Θ(N)
  • algorithm is log-efficient when PTP=O(T1(N)lgN)
  • work is polylog efficient when PTP=O(T1lgk(N)) where k is some constant
  • work is optimal when PTP=O(T1(N))

optimal work implies linear speedup

Example when N=1024

  • T1(N)=1024
  • TP(N)=lgN=10
  • S=1024/10>100x speedup