Skip to content

3. Speedup & PRAMs

Jan 20, 2021

Speedup

  • on a uniprocessor: \(P_1 = T_1(N)\)
  • with multiple processors: \(P = T_P(N)\)

Speedup \(S = \frac{T_1(N)}{T_P(N)}\)

Goal is to achieve linear speedup

  • \(T_1^{*}\) is the best known sequential time
    • when both upper (\(O\)) and lower (\(\Omega\)) bound, then we have \(\Theta\) bound
    • for example: linear search of an array with \(N\) values is \(\Theta(N)\)
    • \(T_1\) is always expressed as optimal time \(T_1^*\)

Example 1: PRAM find value

Start with \(P = N\) processors on (concurrent read) PRAM model and same size input

1
2
3
4
5
6
shared X[1...N]

for P = 1 ... PID parbegin:
    if X[PID] == value:
        print("found")
parend

Time: \(T_P = \Theta(1)\)

Speedup: \(S = \frac{T_1^*}{T_P} = \frac{\Theta(N)}{\Theta(1)} = \Theta(N) = \Theta(P)\)

Important! - never divide by \(O\) - always express speedup in terms of \(P\) - always compare to time-optimal \(T_1^*\)

Example 2: PRAM find value v. 2

Similar setting but assume CREW PRAM. Use this different algorithm, this time (try to) use variable to determine found state and index of where the match was found:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
shared X[1...N]
found = false
index_where = None

for P = 1 ... PID parbegin:
    if X[PID] == value:
    then:
        found = true
        index_where = PID  # issue!
    else:
        noop, noop
parend

if found
    do Proc(A)
else
    do Proc(B)

What is the problem?

Simultaneous writes by multiple processors into same memory location → runtime errors

PRAM types

  • EREW PRAM = exclusive read, exclusive write: no two processors allowed to read/write to/from same shared memory location
  • CREW PRAM = concurrent read, exclusive write
  • CRCW PRAM = concurrent read and write
    • the most popular model
    • there are subcategories of CRCW PRAM:
      • Common CRCW PRAM: concurrent writes must write the same value
      • Priority CRCW PRAM: processor with higher priority gets to write
      • arbitrary CRCW PRAM: concurrent writes resolved arbitrarily