Multithreaded Algorithms
Summarized notes on Introduction to Algorithms, Chapter 27
General Notes
- parallel algorithms run on multiprocessor computers → multiple instructions can execute concurrently
- parallel computer has many processors → different types:
- chip multiprocessors - single multicore chip with multiple cores
- clusters - built from individual computers interconnected by a network
- supercomputers
- no parallel model has widespread acceptance
- shared memory - each processor can directly access any location of memory
- distributed memory - each processor's memory is private; message passing is used to communicate between processors
- trend toward shared memory model
- static threading - threads sharing common memory
- threads execute independently of other threads
- threads usually exist for the duration of the computation
- programming directly with static threads is difficult and error-prone
- concurrency platforms provide software layer for coordinating, scheduling, and managing parallel computing resources
- include dynamic multithreading: lets applications specify parallelism without worrying about load balancing and other issues associated with static-thread programming
- concurrency platform includes a scheduler which load-balances computation automatically
- two common features: nested parallelism and parallel loops
- nested parallelism: spawn child thread while parent continues execution
- parallel loops: iterations execute concurrently
1. Dynamic multithreading
- serialization of a multithreaded algorithm → delete the keywords spawn, sync, and parallel
- nested parallelism - when spawn precedes a procedure call → the parent may continue to execute in parallel with the spawned child
- keyword spawn only indicates logical parallelism → the scheduler determines the actual parallel execution
- sync is used to wait for all spawned children to finish their computation
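To make spawn and sync concrete, here is a minimal Go sketch (an illustrative analogue, not CLRS's pseudocode) of the chapter's parallel Fibonacci: the recursive call on n-1 is "spawned" as a goroutine while the parent continues with n-2, and receiving from the channel plays the role of sync.

```go
package main

import "fmt"

// pfib mirrors the structure of a spawn/sync Fibonacci computation:
// the call on n-1 is spawned, the parent keeps working on n-2,
// and the channel receive acts as the sync before combining results.
func pfib(n int) int {
	if n < 2 {
		return n
	}
	ch := make(chan int, 1)
	go func() { ch <- pfib(n - 1) }() // "spawn": child may run in parallel with the parent
	y := pfib(n - 2)                  // parent continues executing concurrently
	x := <-ch                         // "sync": wait for the spawned child to finish
	return x + y
}

func main() {
	fmt.Println(pfib(10)) // 55
}
```

Deleting the goroutine and channel and calling pfib(n-1) directly yields the serialization, matching the note above about removing spawn, sync, and parallel.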
Computation DAG
- computation DAG $G = (V, E)$ - representation of a multithreaded computation as a directed acyclic graph
- vertices are instructions
- edges are dependencies between instructions, i.e. $(u, v) \in E$ means $u$ must execute before $v$
- strand - a (maximal) sequence of instructions containing no parallel control keywords (spawn, sync, return from a spawn)
- if $G$ has a directed path from strand $u$ to strand $v$, they are in series → otherwise $u$ and $v$ are (logically) in parallel
- classification of edges in a computation DAG:
- continuation edge $(u, u')$ (horizontal) connects a strand $u$ to its successor $u'$ within the same procedure instance
- spawn edge $(u, v)$ (downward, paired with a horizontal continuation edge in the parent) when strand $u$ spawns strand $v$
- call edge (downward) - a normal procedure call
- return edge $(u, x)$ (upward) when strand $u$ returns to its calling procedure and $x$ is the strand immediately following the next sync
- initial strand - the single starting point of the computation
- final strand - the single ending point of the computation
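To tie these definitions together, here is a small runnable sketch (the four-strand DAG below is a made-up example, and unit-time strands are assumed) that counts the vertices on the critical path of a computation DAG with a longest-path pass; that count is the span discussed in the next section.

```go
package main

import "fmt"

func main() {
	// Hypothetical computation DAG: vertices are strands, edges are dependencies.
	// edges[u] lists the strands that may start only after strand u finishes.
	edges := map[int][]int{
		0: {1, 2}, // strand 0 spawns strand 1 (spawn edge) and continues to strand 2 (continuation edge)
		1: {3},    // return edge into the strand after the sync
		2: {3},    // continuation edge into the sync strand
		3: {},     // final strand
	}
	n := 4

	// Longest path in a DAG, processed in a vertex order that is already topological here.
	// depth[v] = number of edges on the longest path ending at v.
	depth := make([]int, n)
	for u := 0; u < n; u++ {
		for _, v := range edges[u] {
			if depth[u]+1 > depth[v] {
				depth[v] = depth[u] + 1
			}
		}
	}
	span := 0
	for _, d := range depth {
		if d+1 > span {
			span = d + 1
		}
	}
	// With unit-time strands: work = number of strands, span = vertices on the critical path.
	fmt.Printf("work = %d strands, span = %d strands on the critical path\n", n, span)
}
```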
Performance measures
| Equation | Meaning |
|---|---|
| $T_P \ge T_1/P$ | work law |
| $T_P \ge T_\infty$ | span law |
| $T_1/T_P$ | speedup |
| $T_1/T_P = \Theta(P)$ | linear speedup |
| $T_1/T_P = P$ | perfect linear speedup |
| $T_1/T_\infty$ | parallelism of an algorithm |
| $(T_1/T_\infty)/P$ | slackness |
- work $T_1$ - total time to execute the entire computation on one processor
- span $T_\infty$ - the longest time to execute the strands along any path of the DAG → assuming each strand takes unit time, the span is the number of vertices on the critical path of the DAG
- speedup $T_1/T_P$ - how many times faster the computation is on $P$ processors than on 1 processor → at most $P$
- linear speedup - when speedup increases linearly wrt. the number of processors, i.e. $T_1/T_P = \Theta(P)$
- parallelism $T_1/T_\infty$ can be viewed in 3 ways
- average amount of work along each step of the critical path
- upper bound on the maximum possible speedup
- limit on the possibility of attaining perfect linear speedup → once the number of processors exceeds the parallelism, the computation cannot achieve perfect linear speedup
- slackness - factor by which the parallelism of the computation exceeds the number of processors $P$ in the machine: $(T_1/T_\infty)/P$
- if slackness $< 1$ → cannot achieve perfect linear speedup
- if slackness $> 1$ → the work per processor is the limiting factor
- if slackness grows well beyond 1 → a good scheduler achieves closer and closer to perfect linear speedup
- concurrency platform's scheduler determines the execution in practice → operates on-line, without advance knowledge of when threads will be spawned or when they will complete
- greedy scheduler assigns as many strands to processors as possible in each time step
- complete step - at least $P$ strands are ready to execute in a time step
- incomplete step - fewer than $P$ strands are ready to execute
- the running time $T_P$ of any multithreaded computation scheduled by a greedy scheduler on an ideal parallel computer with $P$ processors is within a factor of 2 of optimal
| Running time | |
|---|---|
| ideal parallel computer w/ greedy scheduler | $T_P \le T_1/P + T_\infty$ |
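A worked example with hypothetical numbers (not from the book): take $T_1 = 100$, $T_\infty = 10$, and $P = 4$; the laws above and the greedy-scheduler bound bracket the running time:

$$
T_P \ge \max\!\left(\tfrac{T_1}{P},\, T_\infty\right) = \max(25, 10) = 25,
\qquad
T_P \le \tfrac{T_1}{P} + T_\infty = 25 + 10 = 35.
$$

Here parallelism $= T_1/T_\infty = 10$ and slackness $= 10/4 = 2.5 > 1$, so the work term $T_1/P$ dominates, and the speedup $T_1/T_P$ lies between $100/35 \approx 2.9$ and $4$.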
Parallel loops
- the parallel keyword before a for loop indicates that the loop's iterations may execute concurrently → the compiler/runtime typically implements this by recursive divide-and-conquer spawning over the iteration range, which adds $\Theta(\lg n)$ to the span of an $n$-iteration loop (see the sketch below)
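As a rough Go analogue of a parallel loop (the matrix-vector product is CLRS's example, but launching one goroutine per row with a WaitGroup standing in for the loop's implicit sync is an illustrative simplification, not the book's recursive spawning):

```go
package main

import (
	"fmt"
	"sync"
)

// matVec computes y = A*x with the outer loop run "in parallel":
// each row is handled by its own goroutine, and wg.Wait() plays the
// role of the implicit sync at the end of a parallel for loop.
func matVec(A [][]float64, x []float64) []float64 {
	n := len(A)
	y := make([]float64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) { // iterations may execute concurrently
			defer wg.Done()
			for j := range x {
				y[i] += A[i][j] * x[j] // each goroutine writes only y[i], so the iterations are independent
			}
		}(i)
	}
	wg.Wait()
	return y
}

func main() {
	A := [][]float64{{1, 2}, {3, 4}}
	x := []float64{1, 1}
	fmt.Println(matVec(A, x)) // [3 7]
}
```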
Race Conditions
- deterministic algorithm performs same operations on same input independent of scheduling
- nondeterministic when behavior may vary between executions → typically caused by a race condition
- determinacy race - when two logically parallel instructions access the same memory location and at least one of them performs a write
- strands are independent when they have no determinacy races
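A minimal sketch of a determinacy race (a made-up example, not from the book): two logically parallel goroutines update the same variable, so the final value depends on how their reads and writes interleave; Go's race detector (`go run -race`) flags exactly this kind of access pattern.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	x := 0
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			x = x + 1 // the read and write of x race with the other goroutine
		}()
	}
	wg.Wait()
	// Depending on the interleaving, x may end up as 1 or 2.
	fmt.Println(x)
}
```

The two goroutines are not independent in the sense above: both access x and at least one of the accesses is a write.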