Multithreaded Algorithms

Summarized notes on Introduction to Algorithms, Chapter 27

General Notes

  • parallel algorithms run on multiprocessor computers → multiple instructions can execute concurrently
  • parallel computer has many processors → different types:
    • chip multiprocessors - single multicore chip with multiple cores
    • clusters - built from individual computers connected by an interconnection network
    • supercomputers
  • no parallel model has widespread acceptance
    • shared memory - each processor can directly access any location of memory
    • distributed memory - each processor's memory is private; message passing is used to communicate between processors
    • trend toward shared memory model
  • static threading - threads sharing common memory
    • threads execute independently of other threads
    • threads usually exist for duration of the computation
    • programming directly with static threads is difficult and error-prone
  • concurrency platforms provide software layer for coordinating, scheduling, and managing parallel computing resources
    • include dynamic multithreading: enable parallelism in applications without worrying about load balancing and other issues associated with static-thread programming
    • concurrency platform includes a scheduler which load-balances computation automatically
    • two common features: nested parallelism and parallel loops
    • nested parallelism: spawn child thread while parent continues execution
    • parallel loops: iterations execute concurrently

1. Dynamic multithreading

  • serialization of a multithreaded algorithm → delete keywords spawn, sync and parallel
  • nested parallelism - when spawn precedes a procedure call → parent may continue to execute parallel to child
  • keyword spawn only indicates logical parallelism → scheduler will determine actual execution
  • sync is used to wait for all spawned children to finish computation (see the sketch below)
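
A minimal Go sketch of the spawn/sync pattern, using a naive recursive Fibonacci as the illustration (the name `pfib` and the goroutine/channel mapping are assumptions for illustration, not the chapter's pseudocode):

```go
package main

import "fmt"

// pfib computes the nth Fibonacci number with nested parallelism:
// the "spawned" child runs in its own goroutine while the parent keeps
// executing, and receiving from the channel plays the role of sync.
func pfib(n int) int {
	if n <= 1 {
		return n
	}
	ch := make(chan int, 1)
	go func() { ch <- pfib(n - 1) }() // "spawn": child proceeds concurrently
	y := pfib(n - 2)                  // parent continues in parallel with the child
	x := <-ch                         // "sync": wait for the spawned child to finish
	return x + y
}

func main() {
	fmt.Println(pfib(20)) // prints 6765
}
```

Spawning a goroutine per call only mirrors the logical parallelism expressed by spawn; the concurrency platform's scheduler (e.g. a work-stealing pool) decides how such logical tasks actually map onto processors.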

Computation DAG

  • computation DAG - representation of multithreaded computation as a graph

    • vertices are instructions
    • edges are dependencies between instructions, i.e. (u, v) ∈ E means u must execute before v
    • strand is a sequence of instructions with no parallel control keywords
    • if G has a directed path from strand u to strand v, they are in series → otherwise u and v are (logically) in parallel
  • classification of edges in a computation DAG

    • continuation edge (horizontal) connects a strand u to its successor u′ within the same procedure
    • spawn edge (downward with horizontal continuation edge) when u spawns a strand v
    • call edge (downward) normal procedure call
    • return edge (upward) when u returns to its calling procedure and x is the next strand in the caller: edge (u, x)
    • initial strand single starting point of computation
    • final strand single ending point of computation


Performance measures

| Equation | Meaning |
| --- | --- |
| $T_P \geq T_1 / P$ | work law |
| $T_P \geq T_\infty$ | span law |
| $T_1 / T_P$ | speedup |
| $T_1 / T_P = \Theta(P)$ | linear speedup |
| $T_1 / T_P = P$ | perfect linear speedup |
| $T_1 / T_\infty$ | parallelism of an algorithm |
| $(T_1 / T_\infty) / P = T_1 / (P \, T_\infty)$ | slackness |
  • work - total time to execute the entire computation on one processor
  • span - the longest time to execute the strands along any path of the DAG → assuming unit-time strands, the span is the number of vertices on the critical path of the DAG
  • speedup - how many times faster the computation is on P processors than on 1 processor → at most P
  • linear speedup - when speedup increases linearly with the number of processors
  • parallelism can be viewed in 3 ways
    • average amount of work along each step of critical path
    • upper bound on maximum possible speedup
    • limit on the possibility of attaining perfect linear speedup → once the number of processors exceeds the parallelism, perfect linear speedup cannot be achieved
  • slackness - the factor by which the parallelism of the computation exceeds the number of processors in the machine

    • if slackness < 1, perfect linear speedup cannot be achieved
    • if slackness > 1, the work per processor is the limiting factor
  • concurrency platform's scheduler determines execution in practice → operates on-line without advance knowledge of when threads will be spawned or complete

  • greedy scheduler assigns as many strands to processors as possible in each time step
  • complete step if at least P strands are ready to execute in a time step
  • incomplete step when fewer than P strands are ready to execute
  • the running time $T_P$ of any multithreaded computation scheduled by a greedy scheduler on an ideal parallel computer with P processors is within a factor of 2 of optimal
Running time

| Equation | Meaning |
| --- | --- |
| $T_P \leq T_1 / P + T_\infty$ | ideal parallel computer w/ greedy scheduler |
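
A small worked example with assumed (made-up) numbers ties these bounds together: suppose a computation has work $T_1 = 100$ and span $T_\infty = 10$, so its parallelism is $T_1 / T_\infty = 10$. On $P = 4$ processors the work law gives $T_P \geq 25$, the span law gives $T_P \geq 10$, and a greedy scheduler guarantees $T_P \leq 100/4 + 10 = 35$, within a factor of 2 of the work-law bound of 25. The slackness is $10/4 = 2.5$; with $P = 20$ processors the slackness drops to $0.5 < 1$, so perfect linear speedup is impossible there (speedup can never exceed the parallelism, 10).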

Parallel loops

(pseudocode figures omitted)
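
Since the original pseudocode figures are not reproduced, here is a minimal Go sketch (an assumed example, not the book's code) of a parallel loop: the iterations of a matrix-vector product's outer loop run concurrently, and a WaitGroup acts as the implicit sync at the end of the loop.

```go
package main

import (
	"fmt"
	"sync"
)

// matVec computes y = A*x, running the iterations of the outer loop
// concurrently, in the spirit of a "parallel for" over the rows of A.
func matVec(A [][]float64, x []float64) []float64 {
	y := make([]float64, len(A))
	var wg sync.WaitGroup
	for i := range A {
		wg.Add(1)
		go func(i int) { // each row is an independent iteration
			defer wg.Done()
			for j := range x {
				y[i] += A[i][j] * x[j]
			}
		}(i)
	}
	wg.Wait() // the loop "completes" only after every iteration has finished
	return y
}

func main() {
	A := [][]float64{{1, 2}, {3, 4}}
	x := []float64{1, 1}
	fmt.Println(matVec(A, x)) // [3 7]
}
```

In practice, iterations are usually grouped into chunks (or spawned recursively) so that scheduling overhead stays proportional to the number of processors rather than to the number of iterations.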

Race Conditions

  • deterministic algorithm - performs the same operations on the same input, independent of scheduling
  • nondeterministic - behavior may vary between executions → typically caused by a race condition
  • determinacy race - when two or more logically parallel instructions access the same memory location and at least one performs a write (example below)
  • strands are independent when they have no determinacy races
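
A minimal Go sketch (assumed example) of a determinacy race: two logically parallel increments of the same variable, where the final value depends on how the reads and writes interleave.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	x := 0
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() { // two logically parallel strands ...
			defer wg.Done()
			x = x + 1 // ... read and write the same location: a determinacy race
		}()
	}
	wg.Wait()
	fmt.Println(x) // usually 2, but 1 is possible if both reads happen before either write
}
```

Go's race detector (`go run -race`) reports this access pattern as a data race.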